from typing import Any

import transformers


class UltravoxConfig(transformers.PretrainedConfig):
    """
    This is the configuration class to store the configuration of a
    [`UltravoxForConditionalGeneration`]. It is used to instantiate an
    Ultravox model according to the specified arguments, defining the model
    architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to
    control the model outputs. Read the documentation from [`PretrainedConfig`]
    for more information.

    Args:
        audio_config (`Union[AutoConfig, dict]`, *optional*):
            Custom audio config or dict.
        text_config (`Union[AutoConfig, dict]`, *optional*):
            The config object of the text backbone.
        audio_model_id (`str`, *optional*):
            The model ID of the audio backbone.
        text_model_id (`str`, *optional*):
            The model ID of the text backbone.
        ignore_index (`int`, *optional*, defaults to -100):
            The ignore index for the loss function.
        audio_token_index (`int`, *optional*, defaults to 32000):
            The audio token index to encode the audio prompt.
        stack_factor (`int`, *optional*, defaults to 8):
            Audio downsampling factor for the multimodal projector.
        norm_init (`float`, *optional*, defaults to 0.4):
            The initialization value for the layer normalization.
        projector_act (`str`, *optional*, defaults to `"swiglu"`):
            The activation function used by the multimodal projector.
        projector_ln_mid (`bool`, *optional*, defaults to `False`):
            Whether to apply layer normalization at the middle of the
            projector or at the end. Versions v0.4.1 and below use `False`,
            but v0.5 and above use `True`.
    """

    wrapped_model_config: transformers.PretrainedConfig
    model_type = "ultravox"
    audio_token = "<|audio|>"
    is_composition = False

    def __init__(
        self,
        audio_config: dict[str, Any] | None = None,
        text_config: dict[str, Any] | None = None,
        audio_model_id: str | None = None,
        text_model_id: str | None = None,
        ignore_index: int = -100,
        audio_token_index: int = 32000,
        hidden_size: int = 4096,
        stack_factor: int = 8,
        norm_init: float = 0.4,
        projector_act: str = "swiglu",
        projector_ln_mid: bool = False,
        num_projector_layers: int = 0,
        **kwargs,
    ):
        self.ignore_index = ignore_index
        self.audio_token_index = audio_token_index

        self.hidden_size = hidden_size
        self.stack_factor = stack_factor
        self.norm_init = norm_init
        self.projector_act = projector_act
        self.projector_ln_mid = projector_ln_mid
        self.num_projector_layers = num_projector_layers

        # Assigning text_model_id may set wrapped_model_config (see __setattr__).
        self.text_model_id = text_model_id
        if text_model_id is None:
            # No model ID given: build the text config from the dict,
            # falling back to a Llama config.
            text_config = text_config or {}
            self.wrapped_model_config = transformers.CONFIG_MAPPING[
                text_config.get("model_type", "llama")
            ](**text_config)

        # Assigning audio_model_id may set audio_config (see __setattr__).
        self.audio_model_id = audio_model_id
        if audio_model_id is None:
            # No model ID given: build the audio config from the dict,
            # falling back to a Whisper config.
            self.audio_model_id = None
            audio_config = audio_config or {}
            self.audio_config = transformers.CONFIG_MAPPING[
                audio_config.get("model_type", "whisper")
            ](**audio_config)

        super().__init__(**kwargs)

    def __setattr__(self, key, value):
        # Assigning a non-None model ID loads the matching config, so the
        # wrapped configs stay in sync with the IDs.
        if key == "text_model_id" and value is not None:
            from vllm.transformers_utils.config import get_config

            self.wrapped_model_config = get_config(value, trust_remote_code=False)
        elif key == "audio_model_id" and value is not None:
            from vllm.transformers_utils.config import get_config

            self.audio_config = get_config(value, trust_remote_code=False)

        return super().__setattr__(key, value)

    @property
    def text_config(self) -> transformers.PretrainedConfig:
        # Delegate to the wrapped model, which may itself be multimodal.
        return self.wrapped_model_config.get_text_config()