o ¢iÒã@sbddlmZddlZddlZddlmZddlmZddlmZGdd„dƒZ Gd d „d eƒZ dS)é)ÚOptionalN)ÚTensor)ÚModuleé)ÚErrorsc@s\eZdZUeed<eeed<defdd„Zedefdd„ƒZde fdd „Z ed d„ƒZdS) Ú AttentionMaskÚ bool_maskÚ_logit_maskcCs2|jtjkr tdƒ‚||_tj ttd¡|_ dS)Nz7Expected the attention mask to be of dtype 'torch.bool') ÚdtypeÚtorchÚboolÚ ValueErrorrÚjitÚannotaterrr )Úselfr©rúY/home/ubuntu/.local/lib/python3.10/site-packages/curated_transformers/models/attention.pyÚ__init__szAttentionMask.__init__ÚreturncCs4|jdurd|j ¡d|_|j}|dusJ‚|S)Ngð?gàÿÿïÇ)r rÚint)rÚ logit_maskrrrrs zAttentionMask.logit_maskcCs |j ¡S©N)rÚdim©rrrrr s zAttentionMask.dimcCs|jjSr)rÚshaperrrrr#szAttentionMask.shapeN)Ú__name__Ú __module__Ú__qualname__rÚ__annotations__rrÚpropertyrrrrrrrrr s rc sFeZdZddœdef‡fdd„Zdededed ed ef dd„Z‡ZS) ÚScaledDotProductAttentiongš™™™™™¹?)Údropout_probr!cstƒ ¡tjj|d|_dS)N)Úp)ÚsuperrrÚnnÚDropoutÚdropout)rr!©Ú __class__rrr*s z"ScaledDotProductAttention.__init__ÚkÚqÚvÚ attn_maskrcCsz| ¡dkr tdƒ‚|jd}|| dd¡}|t |¡}|j\}}||j |dd|¡7}|jdd} | | |¡} | S)zw Shapes: k, q, v - (batch, heads, seq_len, width) attn_mask - (batch, seq_len) rz@The attention mask must be a 2D-tensor of shape [batch, seq_len]éÿÿÿÿéþÿÿÿé)r) rr rÚ transposeÚmathÚsqrtrÚviewÚsoftmaxr&)rr)r*r+r,Ú model_dimÚattn_scoresÚbatchÚseq_lenÚattn_weightsÚattn_valuesrrrÚforward.s ÿ z!ScaledDotProductAttention.forward) rrrÚfloatrrrr;Ú __classcell__rrr'rr )sÿÿÿÿþr )Útypingrr1rrÚtorch.nnrÚerrorsrrr rrrrÚs