from typing import Any, Literal

from pydantic import field_validator

from vllm.config.utils import config
from vllm.v1.attention.backends.registry import AttentionBackendEnum


@config
class AttentionConfig:
    """Configuration for attention mechanisms in vLLM."""

    backend: AttentionBackendEnum | None = None
    flash_attn_version: Literal[2, 3] | None = None
    use_prefill_decode_attention: bool = False
    flash_attn_max_num_splits_for_cuda_graph: int = 32
    use_cudnn_prefill: bool = False
    use_trtllm_ragged_deepseek_prefill: bool = True
    # The defaults of the next three fields did not survive in the bytecode
    # dump; the values shown are assumptions.
    use_trtllm_attention: bool | None = None
    disable_flashinfer_prefill: bool = True
    disable_flashinfer_q_quantization: bool = False

    def compute_hash(self) -> str:
        """
        Provide a hash that uniquely identifies all the configs
        that affect the structure of the computation
        graph from input ids/embeddings to the final hidden states,
        excluding anything before input ids/embeddings and after
        the final hidden states.
        """
        from vllm.config.utils import get_hash_factors, hash_factors

        # No fields are excluded from the hash.
        ignored_factors = []
        factors = get_hash_factors(self, ignored_factors)
        return hash_factors(factors)

    @field_validator("backend", mode="before")
    @classmethod
    def validate_backend_before(cls, value: Any) -> Any:
        """Enable parsing of the `backend` enum type from string."""
        if isinstance(value, str):
            # Look the backend up by its upper-cased enum member name.
            return AttentionBackendEnum[value.upper()]
        return value