o *ºih?ã@s˜UddlZddlZddlmZmZmZddlZddlmZm Z m Z mZmZm Z mZddlmZddlmZmZmZddlmZddlmZddlmZmZmZdd lmZgd ¢ZdgZe e!¡Z"ej#e$d< dEd edededeedeeeff dd„Z% dEdededej&jdeefdd„Z'dede(defdd„Z)dede*fdd„Z+dede*fdd „Z,dede(fd!d"„Z-ed#d$dedeee(efddffd%d&„ƒZ.ed#d$ ' ( (dFded)e*d*e*d+e*deee(efddff d,d-„ƒZ/dejde0fd.d/„Z1dejd0d1de*fd2d3„Z2eƒd4ede*fd5d6„ƒZ3e eej4fd7ejd8ejd9ee d:ee d;eej5f de0d?eed@e*de0f dAdB„Z7d;ej5de8fdCdD„Z9dS)HéN)Ú GeneratorÚOptionalÚTuple)Ú FP4_E2M1_DATAÚ FP8_E4M3_DATAÚ FloatArgsÚQuantizationArgsÚQuantizationStrategyÚQuantizationTypeÚround_to_quantized_type_dtype)ÚQuantizationScheme)Úgenerate_mxfp4_scalesÚmaybe_convert_from_mxfp4_expÚshould_generatre_mxfp4_scales)Ú deprecated)Úlogger)ÚFloatTensorÚ IntTensorÚTensor)ÚModule)Úis_module_quantizedÚis_model_quantizedÚmodule_typeÚget_torch_bit_depthÚcan_quantizeÚKV_CACHE_TARGETSÚis_kv_cache_quant_schemeÚiter_named_leaf_modulesÚiter_named_quantizable_modulesÚcompute_dynamic_scales_and_zpÚcalculate_rangeÚcalculate_qparamsÚgenerate_gparamÚ strategy_cdivzre:.*self_attn$Ú_LOGGERÚmin_valsÚmax_valsÚquantization_argsÚglobal_scaleÚreturncCs†t |t |¡¡}t |t |¡¡}|j}t||ƒ\}}||}|jrMt t |¡t |¡¡}t|dr:t |d} n|t |ƒd} tj| j||j d} n$|jdkr\|jtjkr\tdƒ‚||t |ƒ} ||| } t | ||¡} |dury|| } |jdur…t| |jd} t|| ƒ} t|jdur“|jn| j d}t | d ktj|| j |d | ¡} t| |jdd} | jd kr¿| d ¡} | d ¡} | | fS)aý :param min_vals: tensor of min value(s) to calculate scale(s) and zero point(s) from :param max_vals: tensor of max value(s) to calculate scale(s) and zero point(s) from :param quantization_args: settings to quantization :param global_scale: additional global scale to scale the locally generated scale currently only applied/supported for Fp4 :return: tuple of the calculated scale(s) and zero point(s). For FP4, the calculated scale is of dtype FP8 )Úargs)Úxé)ÚdeviceÚdtypeéz0Asymmetric Quantization is not supported for FP4N©r.r)r.r-F)r.Úcast_to_original_dtypeé)ÚtorchÚminÚ zeros_likeÚmaxr-r Ú symmetricÚabsrr ÚfloatÚzerosÚshaper.Únum_bitsÚtyper ÚFLOATÚNotImplementedErrorÚclampÚscale_dtyperrÚ_get_dtype_epsÚwhereÚtensorÚzp_dtypeÚndimÚreshape)r%r&r'r(r-Úbit_minÚbit_maxÚ bit_rangeÚmax_val_posÚscalesÚzero_pointsÚeps©rOúj/home/ubuntu/veenaModal/venv/lib/python3.10/site-packages/compressed_tensors/quantization/utils/helpers.pyr!AsV ÿ ÿ ÿýýÿ r!Úvaluer*Úmodulec sðd}|jtjkrddh‰t‡fdd„t|jƒDƒƒ}n;|jtjkr$d}n2|jtjtjfvrFd}d}t |jd|j¡|jf}| d|¡}ntjtjtjtjf}td |›ƒ‚|s`t |¡\}} ntj|||d }tj|||d } t|| ||dS)aµ Returns the computed scales and zero points for dynamic activation quantization. :param value: tensor to calculate quantization parameters for :param args: quantization args :param reduce_dims: optional tuple of dimensions to reduce along, returned scale and zero point will be shaped (1,) along the reduced dimensions :return: tuple of scale and zero point derived from the observed tensor Trr2c3s|] }|ˆvr|VqdS©NrO)Ú.0Úidx©ÚdimrOrPÚ ¬s€z0compute_dynamic_scales_and_zp..NéÿÿÿÿFz+Dynamic quantization is only supported for )rWÚkeepdims)r()Ústrategyr ÚTOKENÚtupleÚrangerFÚTENSORÚTENSOR_GROUPÚGROUPÚmathÚceilr;Ú group_sizeÚ unflattenÚ ValueErrorr3ÚaminmaxÚaminÚamaxr!) rQr*rRr(Ú keep_dimsÚreduce_dimsÚ reshaped_dimsÚsupported_strategiesÚmin_valÚmax_valrOrVrPr—s<þþüþrr-cCsÐ|jtjkr$d|j}tj|dd|d}tj|d|d}||fS|jtjkr`|jdkrCtjtj|d}tjtj |d}||fS|jdkr\tjt j|d}tjt j |d}||fStdƒ‚td|j›ƒ‚)a Calculated the effective quantization range for the given Quantization Args :param quantization_args: quantization args to get range of :param device: device to store the range to :return: tuple endpoints for the given quantization range r,r2)r-ér/z1Range calculation only supported for 4 and 8 bitszInvalid quantization type ) r=r ÚINTr<r3rDr>rr6r4rr?rf)r'r-rJÚq_maxÚq_minrOrOrPr Òs$ ò öúÿr cCsBt|dƒsdS|jjdurdS|jjdurdS|jjdurdSdS)zÍ Check if a module is quantized, based on the existence of a non-empty quantization scheme :param module: pytorch module to check :return: True if module is quantized, False otherwise Úquantization_schemeFNT)ÚhasattrrtÚweightsÚinput_activationsÚoutput_activations©rRrOrOrPrïs rÚmodelcCstdd„| ¡DƒƒS)zç Check if any modules in a model are quantized, based on the existence of a non-empty quantization scheme in at least one module :param model: pytorch model :return: True if model is quantized, False otherwise css|]}t|ƒVqdSrS)r)rTÚ submodulerOrOrPrXs€z%is_model_quantized..)ÚanyÚmodules)rzrOrOrPrsrcCs t|ƒjS)zˆ Gets a string representation of a module type :module: pytorch module to get type of :return: module type as a string )r=Ú__name__ryrOrOrPrs rz“This function will be removed in a future release. Please use `model.named_modules()` and filter by compressed_tensors.InternalModule if neceessary)Úmessageccsœ| ¡D]F\}}t| ¡ƒ}t|ƒdkrd|vr||fVqt|ƒdkr/tt| ¡ƒŽ\}}d}tt|ƒƒD]}||}d|vrCd}q7|sK||fVqdS)zÞ Yields modules that do not have any submodules except observers. The observers themselves are not yielded :param model: model to get leaf modules of :returns: generator tuple of (name, leaf_submodule) rÚobserverFTN)Ú named_modulesÚlistÚchildrenÚlenÚzipÚnamed_childrenr^)rzÚnamer{rƒr†Úhas_non_observer_childrenÚiÚ child_namerOrOrPrs"€€ €ðrTFÚinclude_childrenÚinclude_attnÚinclude_mlpccsÐ| ¡D]`\}}|rMt| ¡ƒ}t|ƒdkr!d|vr!||fVn,t|ƒdkr1tt| ¡ƒŽ\}}d}tt|ƒƒD]} || } d| vrEd}q9|sM||fV|rY| d¡rY||fV|re| d¡re||fVqdS)aU Yield name and submodule of - leaf modules, set by include_children - attention modyles, set by include_attn :param model: model to get leaf modules of :param include_children: flag to get the leaf modules :param inlcude_attn: flag to get the attention modules :returns: generator tuple of (name, submodule) rr€FTÚ self_attnÚmlpN)rr‚rƒr„r…r†r^Úendswith)rzr‹rŒrr‡r{rƒr†rˆr‰rŠrOrOrPr:s0€€ €ércCs8z t |j¡j}W|Styt |j¡j}Y|Sw)z¹ Determine the number of bits used to represent the dtype of a tensor :param value: tensor to check bit depth of :return: bit depth of each element in the value tensor )r3Úfinfor.ÚbitsÚ TypeErrorÚiinfo)rQÚ bit_depthrOrOrPrhsýýrÚ quant_argsrcCs:t|ƒ}|j}||jkrt d|›d|›d¡||jkS)aI Checks if value can be quantized by quant_args. :param value: tensor to check for quantization :param quant_args: QuantizationArgs to use for quantization :return: False if value is already quantized to quant_args or value is incompatible with quant_args, True if value can be quantized with quant_args z%Can't quantize tensor with bit depth z to zH.The QuantizationArgs provided are not compatible with the input tensor.)rr<r$Úwarn)rQr–r•Úrequested_depthrOrOrPrws ÿ rÚschemecCs|jD] }|tvrdSqdS)a Check whether the QuantizationScheme targets the kv cache. It does if all the following criteria are met: - the scheme targets either exactly match the KV_CACHE_TARGETS or the match KV_CACHE_TARGETS regex pattern - the scheme quantizes output_activations (we want to quantize the outputs from the KV_CACHE_TARGETS, as their correspond to the keys and values that are to be saved in the cache) :param scheme: The QuantizationScheme to investigate :return: boolean flag TF)Útargetsr)r™ÚtargetrOrOrPr‹s ÿrÚupdated_min_valÚupdated_max_valÚ scale_dataÚ quant_datar.c Cs^t |t |¡¡}t |t |¡¡}t t |¡t |¡¡}|j|j|}| |¡ dg¡S)ah Generate a global scale for an entire tensor (input_tensor). Goal of the scale is to ensure that the quantization (local) scale falls into the approproiate dtype range. E.g. for NVFP4, group (local) scales are in dtype FP8. The global_scale attempts to use the entire FP8 dtype range while mapping a per-group max to the FP4 max. r2)r3r4r5r6r8ÚtorG) rœrržrŸr.r%r&rKr(rOrOrPr" s r"Údivisorr[ÚstrictcCsVt ||¡}|||kr)|›d|›d|›d|›}|r t|ƒ‚tjdd |¡|S)NzJ quantization strategy requires strict division of weight/activation size z and group/block size z[. consider reducing the group/block size or ignoring modules with weights not divisible by T)Úlog_once)rbrcrfrÚbindÚwarning)rQr¡r[r¢ÚdividendrrOrOrPr#·sÿÿýÿr#cCs@|tjkrdS|tjkrdSt tjg|d¡rt |¡jSdS)NgÀ?gÐ?r0r2)rr.rr3Úis_floating_pointrDr‘rNr0rOrOrPrBÎs rBrS)TFF)F):ÚloggingrbÚtypingrrrr3Ú*compressed_tensors.quantization.quant_argsrrrrr r rÚ,compressed_tensors.quantization.quant_schemerÚ1compressed_tensors.quantization.utils.mxfp4_utilsr rrÚcompressed_tensors.utilsrÚlogururrrrÚtorch.nnrÚ__all__rÚ getLoggerr~r$ÚLoggerÚ__annotations__r!ÚnnrÚstrr ÚboolrrrrrÚintrrrÚfloat32r.r"r#r9rBrOrOrOrPÚs¸ $ üÿþýü ûZüÿþý ü; ÿ&ÿüÿþýüû)ûÿþýü ûüÿþýü û