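# Source: nemo/utils/debug_hook.py
import os

import torch

# Only floating-point tensors have their norms recorded.
_FLOAT_DTYPES = (torch.float, torch.half, torch.bfloat16)


def get_forward_hook(name, trainer, rank, logger, dump_to_file=False):
    """
    A forward hook to dump all of the module input and output norms. It is called every time after forward() has computed an output.
    Only float type input/output tensor norms are computed.
    For more details about the forward hook, check https://pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_hook.html

    Args:
        name: tensor name
        trainer: PTL trainer
        rank: worker rank
        logger: PTL log function
        dump_to_file: whether to dump the csv file to the disk
    """
    fp = None
    header = False
    if dump_to_file:
        os.makedirs('debug_info', exist_ok=True)
        fp = open(f'debug_info/forward_{name}_rank{rank}.txt', 'w')

    def forward_hook(module, inputs, outputs):
        nonlocal header
        if trainer.training:
            values = []
            headers = []
            for n, i in enumerate(inputs):
                if isinstance(i, torch.Tensor) and i.dtype in _FLOAT_DTYPES:
                    if not header:
                        headers.append('input')
                    input_norm = i.data.norm()
                    values.append(f'{input_norm}')
                    logger(f'debug_info_forward/{name}_rank{rank}_input{n}', input_norm)
            if isinstance(outputs, tuple):
                for n, i in enumerate(outputs):
                    if isinstance(i, torch.Tensor) and i.dtype in _FLOAT_DTYPES:
                        if not header:
                            headers.append('output')
                        output_norm = i.data.norm()
                        values.append(f'{output_norm}')
                        logger(f'debug_info_forward/{name}_rank{rank}_output{n}', output_norm)
            else:
                headers.append('output')
                values.append(f'{outputs.data.norm()}')
            values.append(f'{trainer.global_step}')
            # The file exists only when dump_to_file is set, so guard every write.
            if dump_to_file:
                if not header:
                    headers.append('step')
                    fp.write(','.join(headers) + '\n')
                    header = True
                fp.write(','.join(values) + '\n')
                fp.flush()

    return forward_hook


def get_backward_hook(name, trainer, rank, logger, dump_to_file=False):
    """
    A backward hook to dump all of the module input and output grad norms. The hook will be called every time the gradients with respect to module inputs are computed.
    Only float type input/output grad tensor norms are computed.
    For more details about the backward hook, check https://pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_full_backward_hook.html

    Args:
        name: tensor name
        trainer: PTL trainer
        rank: worker rank
        logger: PTL log function
        dump_to_file: whether to dump the csv file to the disk
    """
    fp = None
    header = False
    if dump_to_file:
        os.makedirs('debug_info', exist_ok=True)
        fp = open(f'debug_info/backward_{name}_rank{rank}.txt', 'w')

    def backward_hook(module, inputs, outputs):
        nonlocal header
        if trainer.training:
            values = []
            headers = []
            for n, i in enumerate(inputs):
                if isinstance(i, torch.Tensor) and i.dtype in _FLOAT_DTYPES:
                    if not header:
                        headers.append('input')
                    input_norm = i.data.norm()
                    values.append(f'{input_norm}')
                    logger(f'debug_info_backward/{name}_rank{rank}_input{n}', input_norm)
            if isinstance(outputs, tuple):
                for n, i in enumerate(outputs):
                    if isinstance(i, torch.Tensor) and i.dtype in _FLOAT_DTYPES:
                        if not header:
                            headers.append('output')
                        output_norm = i.data.norm()
                        values.append(f'{output_norm}')
                        logger(f'debug_info_backward/{name}_rank{rank}_output{n}', output_norm)
            else:
                headers.append('output')
                values.append(f'{outputs.data.norm()}')
            values.append(f'{trainer.global_step}')
            # The file exists only when dump_to_file is set, so guard every write.
            if dump_to_file:
                if not header:
                    headers.append('step')
                    fp.write(','.join(headers) + '\n')
                    header = True
                fp.write(','.join(values) + '\n')
                fp.flush()

    return backward_hook


def get_tensor_hook(module, name, trainer, rank, logger, dump_to_file=False):
    """
    A tensor hook to dump all of the tensor weight norms and grad norms at the end of each of the backward steps.
    For more details about the tensor hook, check https://pytorch.org/docs/stable/generated/torch.Tensor.register_hook.html

    Args:
        module: the model module
        name: tensor name
        trainer: PTL trainer
        rank: worker rank
        logger: PTL log function
        dump_to_file: whether to dump the csv file to the disk
    """
    fp = None
    header = False
    if dump_to_file:
        os.makedirs('debug_info', exist_ok=True)
        fp = open(f'debug_info/tensor_{name}_rank{rank}.csv', 'w')

    def tensor_hook(grad):
        nonlocal header
        values = []
        headers = []

        weight = module.get_parameter(name)
        weight_norm = weight.data.norm()
        grad_norm = grad.data.norm()
        logger(f'debug_info_tensors/{name}_rank{rank}_grad_norm', grad_norm)
        logger(f'debug_info_tensors/{name}_rank{rank}_weight_norm', weight_norm)
        values.append(f'{weight_norm}')
        values.append(f'{grad_norm}')
        values.append(f'{trainer.global_step}')
        if dump_to_file:
            if not header:
                headers.append('weight')
                headers.append('grad')
                headers.append('step')
                fp.write(','.join(headers) + '\n')
                header = True
            fp.write(','.join(values) + '\n')
            fp.flush()
        # Return the gradient unchanged so the backward pass is not altered.
        return grad

    return tensor_hook


def register_debug_hooks(module, trainer, logger, dump_to_file=False):
    """
    Register debug hooks. It can
    1. track the module forward step input/output norm
    2. track the module backward step input/output grad norm
    3. track the parameter weight norm and grad norm.
    """
    # Default to rank 0 when torch.distributed has not been initialized.
    rank = 0
    if torch.distributed.is_initialized():
        rank = torch.distributed.get_rank()
    for name, tensor in module.named_parameters():
        if name != '':
            tensor.register_hook(get_tensor_hook(module, name, trainer, rank, logger, dump_to_file))
    for name, layer in module.named_modules():
        if name != '':
            layer.register_forward_hook(get_forward_hook(name, trainer, rank, logger, dump_to_file))
            layer.register_full_backward_hook(get_backward_hook(name, trainer, rank, logger, dump_to_file))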
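
# Illustrative sketch only, not part of the module: each hook above writes a CSV
# header on its first call and a row of values on every call, tracking that state
# in a closure variable. The example below isolates this pattern using only the
# standard library. The names make_norm_writer and write_row are hypothetical, and
# an in-memory buffer stands in for the per-layer file under debug_info/.

```python
import io


def make_norm_writer(fp, columns):
    """Return a closure that writes `columns` as a header once, then one row per call."""
    header_written = False

    def write_row(values):
        nonlocal header_written
        if not header_written:
            fp.write(','.join(columns) + '\n')
            header_written = True
        # Format every value as a string, as the hooks do with f-strings.
        fp.write(','.join(f'{v}' for v in values) + '\n')
        fp.flush()

    return write_row


buffer = io.StringIO()  # stands in for the debug file opened by a hook factory
write_row = make_norm_writer(buffer, ['weight', 'grad', 'step'])
write_row([1.5, 0.25, 0])
write_row([1.4, 0.2, 1])
csv_text = buffer.getvalue()
```

# Keeping the header flag in the enclosing scope lets one hook instance per layer
# or parameter manage its own file without any shared global state.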