from typing import Tuple

import torch
from torch import nn


class KVCache(nn.Module):
    """
    Standalone ``nn.Module`` containing a kv-cache to cache past keys and values during inference.

    Args:
        batch_size (int): batch size the model will be run with
        max_seq_len (int): maximum sequence length the model will be run with
        num_kv_heads (int): number of key/value heads
        head_dim (int): per-attention-head embedding dimension
        dtype (torch.dtype): dtype for the caches
    """

    def __init__(
        self,
        batch_size: int,
        max_seq_len: int,
        num_kv_heads: int,
        head_dim: int,
        dtype: torch.dtype,
    ) -> None:
        super().__init__()
        # Caches are laid out as [B, H, S_max, D].
        cache_shape = (batch_size, num_kv_heads, max_seq_len, head_dim)
        self.register_buffer(
            "k_cache", torch.zeros(cache_shape, dtype=dtype), persistent=False
        )
        self.register_buffer(
            "v_cache", torch.zeros(cache_shape, dtype=dtype), persistent=False
        )
        # Positions that the next update will write to; starts as 0..max_seq_len-1.
        self.register_buffer(
            "cache_pos", torch.arange(0, cache_shape[2]), persistent=False
        )
        self.batch_size = batch_size

    def reset(self) -> None:
        """Reset the cache to zero."""
        self.k_cache.zero_()
        self.v_cache.zero_()
        # Shift the write positions back so the next update starts at position 0.
        self.cache_pos -= self.size

    @property
    def size(self) -> int:
        # Number of positions filled so far (the first pending write position).
        return self.cache_pos[0].item()

    def update(
        self, k_val: torch.Tensor, v_val: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Update the KV cache with the new ``k_val``, ``v_val`` and return the updated cache.

        Note:
            When updating the KV cache, it is assumed that subsequent updates fill
            consecutive sequence positions. If you wish to overwrite cache values
            which have already been filled, use ``.reset()``, which resets the cache
            to the zero-th position.

        Example:
            >>> cache = KVCache(batch_size=2, max_seq_len=16, num_kv_heads=4, head_dim=32, dtype=torch.bfloat16)
            >>> keys, values = torch.ones((2, 4, 8, 32)), torch.ones((2, 4, 8, 32))
            >>> _ = cache.update(keys, values)
            >>> # now positions 0 through 7 are filled
            >>> cache.size
            8
            >>> keys, values = torch.ones((2, 4, 1, 32)), torch.ones((2, 4, 1, 32))
            >>> _ = cache.update(keys, values)
            >>> # this will fill at position 8
            >>> cache.size
            9

        Args:
            k_val (torch.Tensor): Current key tensor with shape [B, H, S, D]
            v_val (torch.Tensor): Current value tensor with shape [B, H, S, D]

        Returns:
            Tuple[torch.Tensor, torch.Tensor]: Updated key and value cache tensors, respectively.

        Raises:
            ValueError: if the batch size of the new key (or value) tensor is greater than
                the batch size used during cache setup.

        Note:
            This function will raise an ``AssertionError`` if writing ``k_val`` would exceed
            the maximum cache sequence length, i.e. if the current cache size plus the
            sequence length of ``k_val`` is greater than ``max_seq_len``.
        """
        bsz, _, seq_len, _ = k_val.shape
        if bsz > self.k_cache.shape[0]:
            raise ValueError(
                f"The current cache has been set up with a batch size of {self.k_cache.shape[0]}"
                f", but found new key tensors with batch size {k_val.shape[0]}!"
            )

        assert (self.cache_pos[0] + seq_len) <= self.k_cache.shape[2]
        k_out = self.k_cache
        v_out = self.v_cache

        # Write the new keys/values into the next seq_len free positions.
        k_out[:, :, self.cache_pos[:seq_len]] = k_val
        v_out[:, :, self.cache_pos[:seq_len]] = v_val

        # Advance every write position by seq_len so the next update lands
        # directly after this one, without relying on a Python int counter.
        self.cache_pos.add_(seq_len)

        return k_out, v_out
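

# Illustrative sketch (hypothetical, not part of torchtune's API): the helper
# below replays, on plain tensors, the ``cache_pos`` bookkeeping that
# ``KVCache.update`` and ``KVCache.reset`` rely on. Instead of a Python int,
# the write position is an index tensor that is shifted in place with
# ``add_``, so consecutive updates land in consecutive slots. The name
# ``_sketch_cache_positions`` and its default sizes are assumptions chosen
# only for this example.
def _sketch_cache_positions(max_seq_len: int = 6, head_dim: int = 2):
    # Imported locally so the sketch runs on its own.
    import torch

    # [B, H, S_max, D] with B = H = 1, mirroring ``cache_shape`` above.
    cache = torch.zeros(1, 1, max_seq_len, head_dim)
    pos = torch.arange(0, max_seq_len)

    # "Prefill": three positions written in one call land at indices 0..2.
    cache[:, :, pos[:3]] = torch.ones(1, 1, 3, head_dim)
    pos.add_(3)

    # "Decode": one more position lands at index 3, the next free slot.
    cache[:, :, pos[:1]] = torch.full((1, 1, 1, head_dim), 2.0)
    pos.add_(1)

    # Equivalent of ``KVCache.size``: the first pending write position.
    size = int(pos[0].item())
    snapshot = cache.clone()

    # "Reset": zero the buffer and shift positions back to 0..max_seq_len-1.
    cache.zero_()
    pos -= size
    return size, snapshot, pos


# For example, ``_sketch_cache_positions()`` reports a size of 4 after a
# 3-token prefill and a 1-token decode step, and the positions are rewound
# to [0, 1, 2, 3, 4, 5] after the reset.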