o ÔÙ¾iã@stdZddlmZmZddlmZddlmZddlm Z ddl mZmZm Z e e¡ZGdd„deƒZeejd <d S)z.LFM2 (Liquid Foundation Model 2) configurationé)ÚListÚOptional)ÚCONFIG_MAPPING)Ú Lfm2Config)Úlogging)ÚMamba2CacheParamsÚMamba2StateShapeÚmamba2_state_dtypec@sdeZdZdZedeefdd„ƒZedeefdd„ƒZedefdd„ƒZ ede efd d „ƒZdS)rzÔ SGLang configuration for LFM2 models. Extends HuggingFace's Lfm2Config with hybrid model properties needed by SGLang. LFM2 uses a hybrid architecture mixing full attention and ShortConv layers. ÚreturncCódd„t|jƒDƒS)z0Return indices of attention layers for KV cache.cSsg|] \}}|dkr|‘qS)Úfull_attention©©Ú.0ÚiÚltr r úK/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/configs/lfm2.pyÚ +sz7Lfm2Config.full_attention_layer_ids..©Ú enumerateÚlayer_types©Úselfr r rÚfull_attention_layer_ids(sz#Lfm2Config.full_attention_layer_idscCr)z3Return indices of conv layers for conv state cache.cSsg|] \}}|dvr|‘qS))ÚconvÚ short_convr rr r rr0sz/Lfm2Config.linear_layer_ids..rrr r rÚlinear_layer_ids-sÿzLfm2Config.linear_layer_idscCsdS)zJReturn chunk size for Mamba2 backend. LFM2 doesn't use chunking, return 1.ér rr r rÚmamba_chunk_size4szLfm2Config.mamba_chunk_sizec Cszddlm}|j}|s dS|j}t|jƒ}z|ƒ}Wn ttfy'd}Ynwtj ||d||d|d}t ||t|ƒdS)zú Get cache params for HybridReqToTokenPool initialization. LFM2 uses ShortConv layers with a small fixed-size cache (kernel_size - 1). Unlike full Mamba2 models, LFM2 only uses the conv state, not SSM temporal state. r)Úget_attention_tp_sizeNr)Ú tp_world_sizeÚintermediate_sizeÚn_groupsÚ num_headsÚhead_dimÚ state_sizeÚconv_kernel)ÚshapeÚlayersÚdtype)Úsglang.srt.layers.dp_attentionrrÚhidden_sizeÚintÚconv_L_cacheÚAssertionErrorÚRuntimeErrorrÚcreaterr )rrÚconv_layer_idsr+r&Útp_sizer'r r rÚmamba2_cache_params9s2 ÿù ýzLfm2Config.mamba2_cache_paramsN) Ú__name__Ú __module__Ú__qualname__Ú__doc__Úpropertyrr,rrrrrr3r r r rr srÚlfm2N)r7ÚtypingrrÚtransformersrrÚHFLfm2ConfigÚtransformers.utilsrÚsglang.srt.configs.mamba_utilsrrr Ú get_loggerr4ÚloggerÚ_extra_contentr r r rÚs H