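import torch
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

from .base import BaseModel


class LMDeployModel(BaseModel):
    def __init__(self, device='cuda', cache_size_mb=100, model_path=None, **kwargs):
        assert device == 'cuda', "lmdeploy only supports cuda devices, consider changing device or using a different backend instead."
        # Convert the requested cache size (MB) into a fraction of total GPU memory.
        # The multiplier is not legible in the damaged source; 1024 ** 2 (MB to bytes) is assumed.
        cache_size_ratio = cache_size_mb * 1024 ** 2 / torch.cuda.get_device_properties('cuda').total_memory
        backend_config = TurbomindEngineConfig(cache_max_entry_count=cache_size_ratio)
        model_name_or_path = model_path if model_path else 'ekwek/Soprano-1.1-80M'
        self.pipeline = pipeline(model_name_or_path,
                                 log_level='ERROR',
                                 backend_config=backend_config)

    def infer(self, prompts, top_p=0.95, temperature=0.3, repetition_penalty=1.2):
        gen_config = GenerationConfig(output_last_hidden_state='generation',
                                      do_sample=True,
                                      top_p=top_p,
                                      temperature=temperature,
                                      repetition_penalty=repetition_penalty,
                                      # Not legible in the damaged source; 512 is assumed.
                                      max_new_tokens=512)
        responses = self.pipeline(prompts, gen_config=gen_config)
        res = []
        for response in responses:
            res.append({
                'finish_reason': response.finish_reason,
                'hidden_state': response.last_hidden_state,
            })
        return res

    def stream_infer(self, prompt, top_p=0.95, temperature=0.3, repetition_penalty=1.2):
        gen_config = GenerationConfig(output_last_hidden_state='generation',
                                      do_sample=True,
                                      top_p=top_p,
                                      temperature=temperature,
                                      repetition_penalty=repetition_penalty,
                                      # Not legible in the damaged source; 512 is assumed.
                                      max_new_tokens=512)
        responses = self.pipeline.stream_infer([prompt], gen_config=gen_config)
        for response in responses:
            yield {
                'finish_reason': response.finish_reason,
                'hidden_state': response.last_hidden_state,
            }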