import torch
from torch.utils._pytree import tree_map, tree_flatten, tree_unflatten
from .module_tracker import ModuleTracker
from typing import List, Any, Dict, Optional, Union, Tuple, Iterator
from collections import defaultdict
from torch.utils._python_dispatch import TorchDispatchMode
from torch._decomp import register_decomposition
from math import prod
from functools import wraps
import warnings

__all__ = ["FlopCounterMode", "register_flop_formula"]

aten = torch.ops.aten


def get_shape(i):
    # Tensors are reduced to their shapes; any other value passes through.
    if isinstance(i, torch.Tensor):
        return i.shape
    return i


flop_registry: Dict[Any, Any] = {}

def shape_wrapper(f):
    """Wrap a flop formula so it receives shapes instead of live tensors."""
    @wraps(f)
    def nf(*args, out_val=None, **kwargs):
        args, kwargs, out_shape = tree_map(get_shape, (args, kwargs, out_val))
        return f(*args, out_shape=out_shape, **kwargs)
    return nf


def register_flop_formula(targets, get_raw=False):
    """Register a flop formula for an aten operator (or a list of operators)."""
    def register_fun(flop_formula):
        if not get_raw:
            flop_formula = shape_wrapper(flop_formula)
        register_decomposition(targets, registry=flop_registry, unsafe=True)(flop_formula)
        return flop_formula

    return register_fun


@register_flop_formula(aten.mm)
def mm_flop(a_shape, b_shape, *args, out_shape=None, **kwargs) -> int:
    """Count flops for matmul."""
    # Inputs contain the shapes of two matrices.
    m, k = a_shape
    k2, n = b_shape
    assert k == k2
    # NB: Should technically be 2 * k - 1 for exact FLOPs (k muls, k - 1 adds).
    return m * n * 2 * k
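# Illustrative sanity check of the formula above (our addition, not part of
# the upstream file): an (8, 16) @ (16, 4) matmul multiplies and accumulates
# across k = 16 for each of the 8 * 4 output elements.
# assert mm_flop((8, 16), (16, 4)) == 8 * 4 * 2 * 16  # == 1024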
   t ||S )zCount flops for addmm.)r9   
self_shaper3   r4   r"   r$   r   r   r   
addmm_flop5   s   
r=   c                 K   sD   | \}}}|\}}}	||ksJ ||ksJ || |	 d | }
|
S )z"Count flops for the bmm operation.r2   r   )r3   r4   r"   r$   br5   r6   b2r7   r8   flopr   r   r   bmm_flop:   s   

rA   c                 K   r:   )z&Count flops for the baddbmm operation.rA   r;   r   r   r   baddbmm_flopG   s   
rC   x_shapew_shaper"   
transposedc           
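# Illustrative check of the batched formula above (our addition, not part of
# the upstream file): a (4, 8, 16) @ (4, 16, 32) bmm repeats an (8, 16) x
# (16, 32) matmul over 4 batch elements.
# assert bmm_flop((4, 8, 16), (4, 16, 32)) == 4 * 8 * 32 * 2 * 16  # == 32768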
def conv_flop_count(
    x_shape: List[int],
    w_shape: List[int],
    out_shape: List[int],
    transposed: bool = False,
) -> int:
    """Count flops for convolution.

    Note that only multiplications are counted; computation for bias is
    ignored. Flops for a transposed convolution are calculated over the input
    spatial positions rather than the output ones, i.e. with prod(x_shape[2:])
    in place of prod(out_shape[2:]) in the formula below.
    Args:
        x_shape (list(int)): The input shape before convolution.
        w_shape (list(int)): The filter shape.
        out_shape (list(int)): The output shape after convolution.
        transposed (bool): is the convolution transposed
    Returns:
        int: the number of flops
    """
    batch_size = x_shape[0]
    conv_shape = (x_shape if transposed else out_shape)[2:]
    c_out, c_in, *filter_size = w_shape

    # For a regular conv, each output spatial position convolves the filter
    # with an input patch (prod(conv_shape) * prod(filter_size) ops), repeated
    # over the batch and over the c_out x c_in channel pairs. For a transposed
    # conv, the same holds over the *input* spatial positions instead.
    # NB: Should technically be 2 * c_in - 1 for exact FLOPs (muls + adds).
    flop = prod(conv_shape) * prod(filter_size) * batch_size * c_out * c_in * 2
    return flop


@register_flop_formula([aten.convolution, aten._convolution])
def conv_flop(x_shape, w_shape, _bias, _stride, _padding, _dilation, transposed, *args, out_shape=None, **kwargs) -> int:
    """Count flops for convolution."""
    return conv_flop_count(x_shape, w_shape, out_shape, transposed=transposed)
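# Illustrative check of conv_flop_count (our addition, not part of the
# upstream file; the shapes are arbitrary): a 3x3 conv taking a (1, 2, 8, 8)
# input to a (1, 4, 8, 8) output with weight shape (4, 2, 3, 3) costs
# prod((8, 8)) * prod((3, 3)) * 1 * 4 * 2 * 2 flops.
# assert conv_flop_count([1, 2, 8, 8], [4, 2, 3, 3], [1, 4, 8, 8]) == 9216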
@register_flop_formula(aten.convolution_backward)
def conv_backward_flop(
        grad_out_shape,
        x_shape,
        w_shape,
        _bias,
        _stride,
        _padding,
        _dilation,
        transposed,
        _output_padding,
        _groups,
        output_mask,
        out_shape) -> int:
    """Count flops for convolution backward."""

    def t(shape):
        # Swap the first two (channel) dimensions of a shape.
        return [shape[1], shape[0]] + list(shape[2:])

    flop_count = 0

    # grad_input is itself a convolution of grad_out with the weights, with
    # the "transposedness" flipped relative to the forward pass.
    if output_mask[0]:
        grad_input_shape = get_shape(out_shape[0])
        flop_count += conv_flop_count(grad_out_shape, w_shape, grad_input_shape, not transposed)

    # grad_weight is a convolution between the input and grad_out (with the
    # channel dimensions swapped); it is never a transposed convolution.
    if output_mask[1]:
        grad_weight_shape = get_shape(out_shape[1])
        if transposed:
            flop_count += conv_flop_count(t(grad_out_shape), t(x_shape), t(grad_weight_shape), transposed=False)
        else:
            flop_count += conv_flop_count(t(x_shape), t(grad_out_shape), t(grad_weight_shape), transposed=False)

    return flop_count


def sdpa_flop_count(query_shape, key_shape, value_shape):
    """
    Count flops for self-attention.

    NB: We can assume that value_shape == key_shape
    """
    b, h, s_q, d_q = query_shape
    _b2, _h2, s_k, _d2 = key_shape
    _b3, _h3, _s3, d_v = value_shape
    assert b == _b2 == _b3 and h == _h2 == _h3 and d_q == _d2 and s_k == _s3
    total_flops = 0
    # q @ k.T: [b*h, s_q, d_q] @ [b*h, d_q, s_k] -> scores [b*h, s_q, s_k]
    total_flops += bmm_flop((b * h, s_q, d_q), (b * h, d_q, s_k))
    # scores @ v: [b*h, s_q, s_k] @ [b*h, s_k, d_v] -> out [b*h, s_q, d_v]
    total_flops += bmm_flop((b * h, s_q, s_k), (b * h, s_k, d_v))
    return total_flops


@register_flop_formula([aten._scaled_dot_product_efficient_attention, aten._scaled_dot_product_flash_attention])
def sdpa_flop(query_shape, key_shape, value_shape, *args, out_shape=None, **kwargs) -> int:
    """Count flops for self-attention."""
    # NB: We aren't accounting for causal attention here.
    return sdpa_flop_count(query_shape, key_shape, value_shape)


def _unpack_flash_attention_nested_shapes(
    *,
    query,
    key,
    value,
    grad_out=None,
    cum_seq_q,
    cum_seq_k,
    max_q,
    max_k,
) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...], Optional[Tuple[int, ...]]]]:
    """
    Given inputs to a flash_attention_(forward|backward) kernel, this will handle behavior for
    NestedTensor inputs by effectively unbinding the NestedTensor and yielding the shapes for
    each batch element.

    In the case that this isn't a NestedTensor kernel, then it just yields the original shapes.
    """
    if cum_seq_q is not None:
        # This is a NestedTensor (jagged) kernel: the inputs are packed as
        # (sum(sequence lengths), heads, dim), and the cumulative sequence
        # length tensors mark where each batch element begins and ends.
        assert len(key.shape) == 3
        assert len(value.shape) == 3
        assert grad_out is None or grad_out.shape == query.shape
        _, h_q, d_q = query.shape
        _, h_k, d_k = key.shape
        _, h_v, d_v = value.shape
        assert cum_seq_q is not None
        assert cum_seq_k is not None
        assert cum_seq_q.shape == cum_seq_k.shape
        seq_q_lengths = (cum_seq_q[1:] - cum_seq_q[:-1]).tolist()
        seq_k_lengths = (cum_seq_k[1:] - cum_seq_k[:-1]).tolist()
        for seq_q_len, seq_k_len in zip(seq_q_lengths, seq_k_lengths):
            new_query_shape = (1, h_q, seq_q_len, d_q)
            new_key_shape = (1, h_k, seq_k_len, d_k)
            new_value_shape = (1, h_v, seq_k_len, d_v)
            new_grad_out_shape = new_query_shape if grad_out is not None else None
            yield new_query_shape, new_key_shape, new_value_shape, new_grad_out_shape
        return

    yield query.shape, key.shape, value.shape, grad_out.shape if grad_out is not None else None


def _unpack_efficient_attention_nested_shapes(
    *,
    query,
    key,
    value,
    grad_out=None,
    cu_seqlens_q,
    cu_seqlens_k,
    max_seqlen_q,
    max_seqlen_k,
) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...], Optional[Tuple[int, ...]]]]:
    """
    Given inputs to an efficient_attention_(forward|backward) kernel, this will handle behavior for
    NestedTensor inputs by effectively unbinding the NestedTensor and yielding the shapes for
    each batch element.

    In the case that this isn't a NestedTensor kernel, then it just yields the original shapes.
    """
    if cu_seqlens_q is not None:
        # Unlike the flash kernel, the efficient kernel packs jagged inputs as
        # (1, sum(sequence lengths), heads, dim), so the shapes are 4-D here.
        assert len(key.shape) == 4
        assert len(value.shape) == 4
        assert grad_out is None or grad_out.shape == query.shape
        _, _, h_q, d_q = query.shape
        _, _, h_k, d_k = key.shape
        _, _, h_v, d_v = value.shape
        assert cu_seqlens_q is not None
        assert cu_seqlens_k is not None
        assert cu_seqlens_q.shape == cu_seqlens_k.shape
        seqlens_q = (cu_seqlens_q[1:] - cu_seqlens_q[:-1]).tolist()
        seqlens_k = (cu_seqlens_k[1:] - cu_seqlens_k[:-1]).tolist()
        for len_q, len_k in zip(seqlens_q, seqlens_k):
            new_query_shape = (1, h_q, len_q, d_q)
            new_key_shape = (1, h_k, len_k, d_k)
            new_value_shape = (1, h_v, len_k, d_v)
            new_grad_out_shape = new_query_shape if grad_out is not None else None
            yield new_query_shape, new_key_shape, new_value_shape, new_grad_out_shape
        return

    yield query.shape, key.shape, value.shape, grad_out.shape if grad_out is not None else None


@register_flop_formula(aten._flash_attention_forward, get_raw=True)
def _flash_attention_forward_flop(
    query,
    key,
    value,
    cum_seq_q,
    cum_seq_k,
    max_q,
    max_k,
    *args,
    out_shape=None,
    **kwargs
) -> int:
    """Count flops for self-attention."""
    # NB: We aren't accounting for causal attention here.
    # In case this is a nested tensor, we unpack the individual batch elements
    # and then sum the flops per batch element.
    sizes = _unpack_flash_attention_nested_shapes(
        query=query,
        key=key,
        value=value,
        cum_seq_q=cum_seq_q,
        cum_seq_k=cum_seq_k,
        max_q=max_q,
        max_k=max_k,
    )
    return sum(
        sdpa_flop_count(query_shape, key_shape, value_shape)
        for query_shape, key_shape, value_shape, _ in sizes
    )


@register_flop_formula(aten._efficient_attention_forward, get_raw=True)
def _efficient_attention_forward_flop(
    query,
    key,
    value,
    bias,
    cu_seqlens_q,
    cu_seqlens_k,
    max_seqlen_q,
    max_seqlen_k,
    *args,
    **kwargs
) -> int:
    """Count flops for self-attention."""
    # NB: We aren't accounting for causal attention here.
    sizes = _unpack_efficient_attention_nested_shapes(
        query=query,
        key=key,
        value=value,
        cu_seqlens_q=cu_seqlens_q,
        cu_seqlens_k=cu_seqlens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
    )
    return sum(
        sdpa_flop_count(query_shape, key_shape, value_shape)
        for query_shape, key_shape, value_shape, _ in sizes
    )


def sdpa_backward_flop_count(grad_out_shape, query_shape, key_shape, value_shape):
    total_flops = 0
    b, h, s_q, d_q = query_shape
    _b2, _h2, s_k, _d2 = key_shape
    _b3, _h3, _s3, d_v = value_shape
    _b4, _h4, _s4, _d4 = grad_out_shape
    assert b == _b2 == _b3 == _b4 and h == _h2 == _h3 == _h4 and d_q == _d2
    assert d_v == _d4 and s_k == _s3 and s_q == _s4
    # Step 1: recompute the scores matrix, q @ k.T.
    total_flops += bmm_flop((b * h, s_q, d_q), (b * h, d_q, s_k))
    # Step 2: propagate gradients through scores @ v.
    # grad_out @ v.T -> grad_scores
    total_flops += bmm_flop((b * h, s_q, d_v), (b * h, d_v, s_k))
    # scores.T @ grad_out -> grad_v
    total_flops += bmm_flop((b * h, s_k, s_q), (b * h, s_q, d_v))
    # Step 3: propagate gradients through q @ k.T.
    # grad_scores @ k -> grad_q
    total_flops += bmm_flop((b * h, s_q, s_k), (b * h, s_k, d_q))
    # q.T @ grad_scores -> grad_k
    total_flops += bmm_flop((b * h, d_q, s_q), (b * h, s_q, s_k))
    return total_flops


@register_flop_formula([aten._scaled_dot_product_efficient_attention_backward, aten._scaled_dot_product_flash_attention_backward])
def sdpa_backward_flop(grad_out_shape, query_shape, key_shape, value_shape, *args, out_shape=None, **kwargs) -> int:
    """Count flops for self-attention backward."""
    return sdpa_backward_flop_count(grad_out_shape, query_shape, key_shape, value_shape)


@register_flop_formula(aten._flash_attention_backward, get_raw=True)
def _flash_attention_backward_flop(
    grad_out,
    query,
    key,
    value,
    out,
    logsumexp,
    cum_seq_q,
    cum_seq_k,
    max_q,
    max_k,
    *args,
    **kwargs,
) -> int:
    # In case this is a nested tensor, we unpack the individual batch elements
    # and then sum the flops per batch element.
    shapes = _unpack_flash_attention_nested_shapes(
        query=query,
        key=key,
        value=value,
        grad_out=grad_out,
        cum_seq_q=cum_seq_q,
        cum_seq_k=cum_seq_k,
        max_q=max_q,
        max_k=max_k,
    )
    return sum(
        sdpa_backward_flop_count(grad_out_shape, query_shape, key_shape, value_shape)
        for query_shape, key_shape, value_shape, grad_out_shape in shapes
    )


@register_flop_formula(aten._efficient_attention_backward, get_raw=True)
def _efficient_attention_backward_flop(
    grad_out,
    query,
    key,
    value,
    bias,
    cu_seqlens_q,
    cu_seqlens_k,
    max_seqlen_q,
    max_seqlen_k,
    *args,
    **kwargs,
) -> int:
    shapes = _unpack_efficient_attention_nested_shapes(
        query=query,
        key=key,
        value=value,
        grad_out=grad_out,
        cu_seqlens_q=cu_seqlens_q,
        cu_seqlens_k=cu_seqlens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
    )
    return sum(
        sdpa_backward_flop_count(grad_out_shape, query_shape, key_shape, value_shape)
        for query_shape, key_shape, value_shape, grad_out_shape in shapes
    )


flop_registry = {
    aten.mm: mm_flop,
    aten.addmm: addmm_flop,
    aten.bmm: bmm_flop,
    aten.baddbmm: baddbmm_flop,
    aten.convolution: conv_flop,
    aten._convolution: conv_flop,
    aten.convolution_backward: conv_backward_flop,
    aten._scaled_dot_product_efficient_attention: sdpa_flop,
    aten._scaled_dot_product_flash_attention: sdpa_flop,
    aten._scaled_dot_product_efficient_attention_backward: sdpa_backward_flop,
    aten._scaled_dot_product_flash_attention_backward: sdpa_backward_flop,
    aten._flash_attention_forward: _flash_attention_forward_flop,
    aten._efficient_attention_forward: _efficient_attention_forward_flop,
    aten._flash_attention_backward: _flash_attention_backward_flop,
    aten._efficient_attention_backward: _efficient_attention_backward_flop,
}


def normalize_tuple(x):
    if not isinstance(x, tuple):
        return (x,)
    return x


# Suffixes for successive factors of 1000 when displaying flop counts.
suffixes = ["", "K", "M", "B", "T"]


def get_suffix_str(number):
    # Find the index of the appropriate suffix based on the number of digits,
    # with some overflow: e.g. 1.01B is displayed as 1010M rather than 1.01B.
    index = max(0, min(len(suffixes) - 1, (len(str(number)) - 2) // 3))
    return suffixes[index]


def convert_num_with_suffix(number, suffix):
    index = suffixes.index(suffix)
    # Divide the number by 1000**index and format it to three decimal places.
    value = f"{number / 1000 ** index:.3f}"
    return value + suffixes[index]


def convert_to_percent_str(num, denom):
    if denom == 0:
        return "0%"
    return f"{num / denom:.2%}"


def _pytreeify_preserve_structure(f):
    @wraps(f)
    def nf(args):
        flat_args, spec = tree_flatten(args)
        out = f(*flat_args)
        return tree_unflatten(out, spec)

    return nf
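# Illustrative behavior of the formatting helpers above (our addition, not
# part of the upstream file): a count of 123_456_789 gets the "M" suffix and
# renders with three decimal places.
# assert get_suffix_str(123_456_789) == "M"
# assert convert_num_with_suffix(123_456_789, "M") == "123.457M"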
class FlopCounterMode(TorchDispatchMode):
    """
    ``FlopCounterMode`` is a context manager that counts the number of flops within its context.

    It does this using a ``TorchDispatchMode``.

    It also supports hierarchical output by passing a module (or list of
    modules) to FlopCounterMode on construction. If you do not need hierarchical
    output, you do not need to use it with a module.

    Example usage

    .. code-block:: python

        mod = ...
        with FlopCounterMode(mod) as flop_counter:
            mod.sum().backward()

    """

    def __init__(
            self,
            mods: Optional[Union[torch.nn.Module, List[torch.nn.Module]]] = None,
            depth: int = 2,
            display: bool = True,
            custom_mapping: Optional[Dict[Any, Any]] = None):
        self.flop_counts: Dict[str, Dict[Any, int]] = defaultdict(lambda: defaultdict(int))
        self.depth = depth
        self.display = display
        if custom_mapping is None:
            custom_mapping = {}
        if mods is not None:
            warnings.warn("mods argument is not needed anymore, you can stop passing it", stacklevel=2)
        # Custom formulas are shape-wrapped unless they opt out via `_get_raw`.
        self.flop_registry = {
            **flop_registry,
            **{k: v if getattr(v, "_get_raw", False) else shape_wrapper(v) for k, v in custom_mapping.items()}
        }
        self.mod_tracker = ModuleTracker()

    def get_total_flops(self) -> int:
        return sum(self.flop_counts['Global'].values())

    def get_flop_counts(self) -> Dict[str, Dict[Any, int]]:
        """Return the flop counts as a dictionary of dictionaries.

        The outer dictionary is keyed by module name, and the inner
        dictionary is keyed by operation name.

        Returns:
            Dict[str, Dict[Any, int]]: The flop counts as a dictionary.
        """
        return {k: dict(v) for k, v in self.flop_counts.items()}

    def get_table(self, depth=None):
        if depth is None:
            depth = self.depth
        if depth is None:
            depth = 999999

        import tabulate
        tabulate.PRESERVE_WHITESPACE = True
        header = ["Module", "FLOP", "% Total"]
        values = []
        global_flops = self.get_total_flops()
        global_suffix = get_suffix_str(global_flops)
        is_global_subsumed = False

        def process_mod(mod_name, depth):
            nonlocal is_global_subsumed

            total_flops = sum(self.flop_counts[mod_name].values())

            is_global_subsumed |= total_flops >= global_flops

            padding = " " * depth
            values = []
            values.append([
                padding + mod_name,
                convert_num_with_suffix(total_flops, global_suffix),
                convert_to_percent_str(total_flops, global_flops)
            ])
            for k, v in self.flop_counts[mod_name].items():
                values.append([
                    padding + " - " + str(k),
                    convert_num_with_suffix(v, global_suffix),
                    convert_to_percent_str(v, global_flops)
                ])
            return values

        for mod in sorted(self.flop_counts.keys()):
            if mod == 'Global':
                continue
            mod_depth = mod.count(".") + 1
            if mod_depth > depth:
                continue

            cur_values = process_mod(mod, mod_depth - 1)
            values.extend(cur_values)

        # Only output the "Global" row if it contains flops that aren't
        # already fully attributed to some module.
        if 'Global' in self.flop_counts and not is_global_subsumed:
            for idx, value in enumerate(values):
                values[idx][0] = " " + values[idx][0]

            values = process_mod('Global', 0) + values

        if len(values) == 0:
            values = [["Global", "0", "0%"]]

        return tabulate.tabulate(values, headers=header, colalign=("left", "right", "right"))

    def __enter__(self):
        self.flop_counts.clear()
        self.mod_tracker.__enter__()
        super().__enter__()
        return self

    def __exit__(self, *args):
        super().__exit__(*args)
        self.mod_tracker.__exit__()
        if self.display:
            print(self.get_table(self.depth))

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs if kwargs else {}
        out = func(*args, **kwargs)
        return self._count_flops(func._overloadpacket, out, args, kwargs)

    def _count_flops(self, func_packet, out, args, kwargs):
        if func_packet in self.flop_registry:
            flop_count_func = self.flop_registry[func_packet]
            flop_count = flop_count_func(*args, **kwargs, out_val=out)
            # Attribute the flops to every module currently on the call stack,
            # plus the implicit "Global" parent.
            for par in set(self.mod_tracker.parents):
                self.flop_counts[par][func_packet] += flop_count

        return out
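# Illustrative usage sketch (our addition, not part of the upstream file; the
# module and shapes below are arbitrary):
#
#     import torch.nn as nn
#
#     mod = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
#     with FlopCounterMode(display=False) as flop_counter:
#         mod(torch.randn(8, 16)).sum().backward()
#     total = flop_counter.get_total_flops()
#     per_module = flop_counter.get_flop_counts()
#
# Custom formulas can be supplied per operator via ``custom_mapping``, e.g.
# ``FlopCounterMode(custom_mapping={torch.ops.aten.mm: my_mm_formula})``,
# where ``my_mm_formula`` is a hypothetical function taking the same shape
# arguments as ``mm_flop``.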