o .wÖij&ã@sUddlmZddlmZmZddlZddlZddlmZddl m Z ddlmZddl mZiZeed<d ed ejdefdd „Zdededeeeffdd„Zdededeeeffdd„Z d"dededede dde ddedeeeffdd„Zdededefd d!„ZdS)#é)Úpermutations)ÚAnyÚCallableN)ÚTensor)ÚLiteral)Úrank_zero_warn)Ú_SCIPY_AVAILABLEÚ_ps_dictÚspk_numÚdeviceÚreturncCsJt|ƒt|ƒ}|tvrtjttt|ƒƒƒ|d}|t|<|St|}|S)N)r)Ústrr ÚtorchÚtensorÚlistrÚrange)r rÚkeyÚps©rú^/home/ubuntu/sommelier/.venv/lib/python3.10/site-packages/torchmetrics/functional/audio/pit.pyÚ_gen_permutationssÿrÚ metric_mtxÚ eval_funccsvddlm‰| ¡ ¡}t t ‡‡fdd„|Dƒ¡¡}| |j ¡}t |d|dd…dd…df¡ ddg¡}||fS) a¢Solves the linear sum assignment problem. This implementation uses scipy and input is therefore transferred to cpu during calculations. Args: metric_mtx: the metric matrix, shape [batch_size, spk_num, spk_num] eval_func: the function to reduce the metric values of different the permutations Returns: best_metric: shape ``[batch]`` best_perm: shape ``[batch, spk]`` r)Úlinear_sum_assignmentcs g|]}ˆ|ˆtjkƒd‘qS)é)rÚmax)Ú.0Úpwm©rrrrÚ >s z<_find_best_perm_by_linear_sum_assignment..éNéÿÿÿÿéþÿÿÿ)Úscipy.optimizerÚdetachÚcpurrÚnpÚarrayÚtorÚgatherÚmean)rrÚmmtxÚ best_permÚbest_metricrrrÚ(_find_best_perm_by_linear_sum_assignment*s *r.cCsˆ|jdd…\}}t||jd}|jd}|jd |||¡}t |d|¡}|jdd}||dd\} } | ¡} || dd…f}| |fS)aSolves the linear sum assignment problem using exhaustive method. This is done by exhaustively calculating the metric values of all possible permutations, and returns the best metric values and the corresponding permutations. Args: metric_mtx: the metric matrix, shape ``[batch_size, spk_num, spk_num]`` eval_func: the function to reduce the metric values of different the permutations Returns: best_metric: shape ``[batch]`` best_perm: shape ``[batch, spk]`` Nr ©r rr)N.r©Údim) ÚshaperrÚTÚexpandrr)r*r$)rrÚ batch_sizer rÚperm_numÚbpsÚmetric_of_ps_detailsÚmetric_of_psr-Úbest_indexesr,rrrÚ$_find_best_perm_by_exhaustive_methodDs r;úspeaker-wiserÚpredsÚtargetÚmetric_funcÚmode©r<úpermutation-wise©rÚminÚkwargscKsx|jdd…|jdd…krtdƒ‚|dvrtd|›ƒ‚|dvr(td|›ƒ‚|jdkr:td|j›d |j›d ƒ‚|dkrAtjntj}|jdd…\}}|dkr¬t||jd } | jd} tj |d| d¡dj || g|jdd…¢RŽ}|j| dd}|||fi|¤Ž} tj| |t | ƒd¡dd} || dd\}}| ¡}| |dd…f}||fS||dd…ddf|dd…ddffi|¤Ž}tj|||f|j|jd}||dd…ddf<t|ƒD]0}t|ƒD])}|dkrî|dkrîqã||dd…|df|dd…|dffi|¤Ž|dd…||f<qãqÝ|dksts1|dkr&ts&td|›dƒt||ƒ\}}||fSt||ƒ\}}||fS)aUCalculate `Permutation invariant training`_ (PIT). This metric can evaluate models for speaker independent multi-talker speech separation in a permutation invariant way. Args: preds: float tensor with shape ``(batch_size,num_speakers,...)`` target: float tensor with shape ``(batch_size,num_speakers,...)`` metric_func: a metric function accept a batch of target and estimate. if `mode`==`'speaker-wise'`, then ``metric_func(preds[:, i, ...], target[:, j, ...])`` is called and expected to return a batch of metric tensors ``(batch,)``; if `mode`==`'permutation-wise'`, then ``metric_func(preds[:, p, ...], target[:, :, ...])`` is called, where `p` is one possible permutation, e.g. [0,1] or [1,0] for 2-speaker case, and expected to return a batch of metric tensors ``(batch,)``; mode: can be `'speaker-wise'` or `'permutation-wise'`. eval_func: the function to find the best permutation, can be ``'min'`` or ``'max'``, i.e. the smaller the better or the larger the better. kwargs: Additional args for metric_func Returns: Tuple of two float tensors. First tensor with shape ``(batch,)`` contains the best metric value for each sample and second tensor with shape ``(batch,)`` contains the best permutation. Example: >>> from torchmetrics.functional.audio import scale_invariant_signal_distortion_ratio >>> # [batch, spk, time] >>> preds = torch.tensor([[[-0.0579, 0.3560, -0.9604], [-0.1719, 0.3205, 0.2951]]]) >>> target = torch.tensor([[[ 1.0958, -0.1648, 0.5228], [-0.4100, 1.1942, -0.5103]]]) >>> best_metric, best_perm = permutation_invariant_training( ... preds, target, scale_invariant_signal_distortion_ratio, ... mode="speaker-wise", eval_func="max") >>> best_metric tensor([-5.1091]) >>> best_perm tensor([[0, 1]]) >>> pit_permutate(preds, best_perm) tensor([[[-0.0579, 0.3560, -0.9604], [-0.1719, 0.3205, 0.2951]]]) rr z_Predictions and targets are expected to have the same shape at the batch and speaker dimensionsrCz-eval_func can only be "max" or "min" but got rAz>mode can only be "speaker-wise" or "permutation-wise" but got z/Inputs must be of shape [batch, spk, ...], got z and z insteadrrBr/rr!)r1ÚindexN)Úrepeatsr1r0.)ÚdtyperézIn pit metric for speaker-num z8>3, we recommend installing scipy for better performance)r2ÚRuntimeErrorÚ ValueErrorÚndimrrrDrrÚindex_selectÚreshapeÚrepeat_interleaver*Úlenr$ÚemptyrHrrrr;r.)r=r>r?r@rrEÚeval_opr5r Úpermsr6ÚppredsÚptargetr9r-r:r,Ú first_elerÚ target_idxÚ preds_idxrrrÚpermutation_invariant_trainingksb2ÿ ÿÿ. ÿÿý ÿþrYÚpermcCst dd„t||ƒDƒ¡S)a"Permutate estimate according to perm. Args: preds: the estimates you want to permutate, shape [batch, spk, ...] perm: the permutation returned from permutation_invariant_training, shape [batch, spk] Returns: Tensor: the permutated version of estimate cSsg|]\}}t |d|¡‘qS)r)rrM)rÚpredÚprrrrãsz!pit_permutate..)rÚstackÚzip)r=rZrrrÚ pit_permutateØsr_)r<r)Ú itertoolsrÚtypingrrÚnumpyr&rrÚtyping_extensionsrÚtorchmetrics.utilitiesrÚtorchmetrics.utilities.importsrr ÚdictÚ__annotations__ÚintrrÚtupler.r;rYr_rrrrÚsT ÿþ ýÿþ ý+ûÿþýüûú ùm