o y“©iUã@sxUddlZddlmZddlmZmZmZmZmZm Z m Z mZddlZddlm Z mZddlmZddlmZddgiZd d ddd dddddddœZeeeeeffed<dZdLdd„Zdede efdd„Zdedededeee ffd d!„Z "dMd#e ed$e ed%edeee e effd&d'„Zd(e e ed#e ed$e ede efd)d*„Zd+e e ed$e ede efd,d-„Z dNd.ed/eed0eegefd1eege efde ef d2d3„Z d4e ed5e ed6edeee ffd7d8„Z!d4e ed5e edeee ffd9d:„Z"d4e e ed5e e edeee ffd;d<„Z# dNd=e ed5e e ed>eeeefd?ed/eed0eegefd1eege efdeeeefeeee fffd@dA„Z$dBeeee fdeee ffdCdD„Z% E " FdOd=eee efd5eee ee e efd?eddGed0eegefd1eege efdHeee edIffdeee ffdJdK„Z&dS)PéN)ÚCounter)ÚAnyÚCallableÚDictÚListÚOptionalÚSequenceÚTupleÚUnion)ÚTensorÚtensor)ÚLiteral)Ú_NLTK_AVAILABLE)Úrouge_scoreÚ_rouge_score_updateÚnltkééééééééé ÚLÚLsum)Úrouge1Úrouge2Úrouge3Úrouge4Úrouge5Úrouge6Úrouge7Úrouge8Úrouge9ÚrougeLÚ rougeLsumÚALLOWED_ROUGE_KEYS)ÚavgÚbestÚreturncCs^ddl}z |j d¡WdSty.z|jddddddWYdSty-tdƒ‚ww) zxCheck whether `nltk` `punkt` is downloaded. If not, try to download if a machine is connected to the internet. rNztokenizers/punkt.zipÚpunktTF)ÚquietÚforceÚ halt_on_errorÚraise_on_errorzz`nltk` resource `punkt` is not available on a disk and cannot be downloaded as a machine is not connected to the internet.)rÚdataÚfindÚLookupErrorÚdownloadÚ ValueErrorÚOSError)r©r7úV/home/ubuntu/.local/lib/python3.10/site-packages/torchmetrics/functional/text/rouge.pyÚ _ensure_nltk_punkt_is_downloaded*sÿÿýr9ÚxcCs2tstdƒ‚ddl}tƒt dd|¡| |¡S)zdThe sentence is split to get rougeLsum scores matching published rougeL scores for BART and PEGASUS.zQROUGE-Lsum calculation requires that `nltk` is installed. Use `pip install nltk`.rNzÚ)rÚModuleNotFoundErrorrr9ÚreÚsubÚ sent_tokenize)r:rr7r7r8Ú_split_sentence=s r@Úhits_or_lcsÚpred_lenÚ target_lencCsp||}||}||krdkr!nn ttdƒtdƒtdƒdSd||||}tt|ƒt|ƒt|ƒdS)akThis computes precision, recall and F1 score based on hits/lcs, and the length of lists of tokenizer predicted and target sentences. Args: hits_or_lcs: A number of matches or a length of the longest common subsequence. pred_len: A length of a tokenized predicted sentence. target_len: A length of a tokenized target sentence. ç©Ú precisionÚrecallÚfmeasurer)Údictr)rArBrCrFrGrHr7r7r8Ú_compute_metricsIs rJFÚpred_tokensÚ target_tokensÚreturn_full_tablecsÆ‡fdd„tt|ƒdƒDƒ}tdt|ƒdƒD]@}tdtˆƒdƒD]4}||dˆ|dkrB||d|dd|||<q#t||d||||dƒ|||<q#q|r]|S|ddS)zÅCommon DP algorithm to compute the length of the longest common subsequence. Args: pred_tokens: A tokenized predicted sentence. target_tokens: A tokenized target sentence. csg|]}dgtˆƒd‘qS)rr)Úlen)Ú.0Ú_©rKr7r8Ú dsz_lcs..réÿÿÿÿ)ÚrangerNÚmax)rKrLrMÚlcsÚiÚjr7rQr8Ú_lcs[s ",ürYÚ lcs_tablecCs¤t|ƒ}t|ƒ}g}|dkrP|dkrP||d||dkr/| d|d¡|d8}|d8}n|||d||d|krD|d8}n|d8}|dkrP|dks|S)zöBacktrack LCS table. Args: lcs_table: A table containing information for the calculation of the longest common subsequence. pred_tokens: A tokenized predicted sentence. target_tokens: A tokenized target sentence. rr)rNÚinsert)rZrKrLrWrXÚbacktracked_lcsr7r7r8Ú_backtracked_lcsps ø r]Úpred_tokens_listcsndttdttdttfdd„‰dtttdttfdd„}‡‡fd d „|Dƒ}‡fdd „||ƒDƒ}|S)zþFind union LCS between a target sentence and iterable of predicted tokens. Args: pred_tokens_list: A tokenized predicted sentence split by ' '. target_tokens: A tokenized single part of target sentence split by ' '. Return: rKrLr+cSst||dd}t|||ƒ}|S)zSReturns one of the longest of longest common subsequence via backtracked lcs table.T)rM)rYr])rKrLrZÚbacktracked_lcs_tabler7r7r8Úlcs_ind“sz_union_lcs..lcs_indÚ lcs_tablescSstttƒj|ŽƒƒS)z#Find union LCS given a list of LCS.)ÚsortedÚlistÚsetÚunion)rar7r7r8Ú find_union™sz_union_lcs..find_unioncsg|]}ˆ|ˆƒ‘qSr7r7)rOrK©r`rLr7r8rRsz_union_lcs..csg|]}ˆ|‘qSr7r7©rOrW)rLr7r8rRžó)rÚstrÚint)r^rLrfraÚ union_lcsr7rgr8Ú _union_lcs‰s " rmÚtextÚstemmerÚ normalizerÚ tokenizercsft|ƒr||ƒnt dd| ¡¡}t|ƒr||ƒnt d|¡}ˆr*‡fdd„|Dƒ}dd„|Dƒ}|S)aRouge score should be calculated only over lowercased words and digits. Optionally, Porter stemmer can be used to strip word suffixes to improve matching. The text normalization follows the implemantion from `Rouge score_Text Normalizition`_ Args: text: An input sentence. stemmer: Porter stemmer instance to strip word suffixes to improve matching. normalizer: A user's own normalizer function. If this is ``None``, replacing any non-alpha-numeric characters with spaces is default. This function must take a ``str`` and return a ``str``. tokenizer: A user's own tokenizer function. If this is ``None``, splitting by spaces is default This function must take a ``str`` and return ``Sequence[str]`` z [^a-z0-9]+ú z\s+cs&g|]}t|ƒdkrˆ |¡n|‘qS)r)rNÚstem©rOr:©ror7r8rR¿ó&z0_normalize_and_tokenize_text..cSs&g|]}t|tƒrt|ƒdkr|‘qS)r)Ú isinstancerjrNrtr7r7r8rRÂrv)Úcallabler=r>ÚlowerÚsplit)rnrorprqÚtokensr7rur8Ú_normalize_and_tokenize_text¢s"r|ÚpredÚtargetÚn_gramcs¤dttdtdtfdd„}|||ƒ|||ƒ‰‰tˆ ¡ƒtˆ ¡ƒ}}d||fvr8ttdƒtdƒtdƒdSt‡‡fd d „tˆƒDƒƒ}t |t |dƒt |dƒƒS)z»This computes precision, recall and F1 score for the Rouge-N metric. Args: pred: A predicted sentence. target: A target sentence. n_gram: N-gram overlap. r{Únr+csDtƒ}‡‡fdd„ttˆƒˆdƒDƒD] }||d7<q|S)Nc3s$|] }tˆ||ˆ…ƒVqdS©N)Útuplerh©r€r{r7r8Ú Òs€"z9_rouge_n_score.._create_ngrams..r)rrTrN)r{r€ÚngramsÚngramr7rƒr8Ú_create_ngramsÐs(z&_rouge_n_score.._create_ngramsrrDrEc3s"|]}tˆ|ˆ|ƒVqdSr)Úmin)rOÚw©Úpred_ngramsÚ target_ngramsr7r8r„Üs€ z!_rouge_n_score..r)rrjrkrÚsumÚvaluesrIrrdrJrU)r}r~rr‡rBrCÚhitsr7rŠr8Ú_rouge_n_scoreÇs rcCsNt|ƒt|ƒ}}d||fvrttdƒtdƒtdƒdSt||ƒ}t|||ƒS)z›This computes precision, recall and F1 score for the Rouge-L metric. Args: pred: A predicted sentence. target: A target sentence. rrDrE)rNrIrrYrJ)r}r~rBrCrVr7r7r8Ú_rouge_l_scoreàs r‘cCsÚttt|ƒƒ}ttt|ƒƒ}d||fvr!ttdƒtdƒtdƒdSdtttdtfdd„}||ƒ}||ƒ}d}|D],}t||ƒ} | D]"} || dkre|| dkre|d7}|| d8<|| d8<qCq:t |||ƒS) a:This computes precision, recall and F1 score for the Rouge-LSum metric. More information can be found in Section 3.2 of the referenced paper [1]. This implementation follow the official implementation from: https://github.com/google-research/google-research/blob/master/rouge/rouge_scorer.py Args: pred: An iterable of predicted sentence split by ' '. target: An iterable target sentence split by ' '. References [1] ROUGE: A Package for Automatic Evaluation of Summaries by Chin-Yew Lin. https://aclanthology.org/W04-1013/ rrDrEÚ sentencesr+cSstƒ}|D]}| |¡q|Sr)rÚupdate)r’r…Úsentencer7r7r8Ú_get_token_countssz,_rouge_lsum_score.._get_token_countsr) rÚmaprNrIrrrjrrmrJ)r}r~rBrCr•Úpred_tokens_countÚtarget_tokens_countrÚtgtrVÚtokenr7r7r8Ú_rouge_lsum_scoreïs$ €ür›ÚpredsÚrouge_keys_valuesÚ accumulatecsdd„|Dƒ}t||ƒD]ú\}} dd„|Dƒ} dd„|Dƒ}g}t|ˆˆˆƒ} d|vr8‡‡‡fdd„t|ƒDƒ}| D]P}t|ˆˆˆƒ}d|vrT‡‡‡fdd„t|ƒDƒ}|D],}t|tƒrdt| ||ƒ}n|d krnt| |ƒ}n |dkrwt||ƒ}|| |<|| |¡qV| | ¡¡q:|d kr¹|d‰t ‡fdd„|Dƒ¡}tt |¡ ¡ƒ}|D] }|| |||¡qªq|d krdd„|Dƒ}| ¡D].\}}i‰|D]}| ¡D]\}}|ˆvrãgˆ|<ˆ| |¡q×qÑ‡fdd„ˆDƒ||<qÉ|D]}|| ||¡qúq|S)aÆ Update the rouge score with the current set of predicted and target sentences. Args: preds: An iterable of predicted sentences. target: An iterable of iterable of target sentences. rouge_keys_values: List of N-grams/'L'/'Lsum' arguments. accumulate: Useful incase of multi-reference rouge score. ``avg`` takes the avg of all references with respect to predictions ``best`` takes the best fmeasure score obtained between prediction and multiple corresponding references. Allowed values are ``avg`` and ``best``. stemmer: Porter stemmer instance to strip word suffixes to improve matching. normalizer: A user's own normalizer function. If this is ``None``, replacing any non-alpha-numeric characters with spaces is default. This function must take a `str` and return a `str`. tokenizer: A user's own tokenizer function. If this is ``None``, spliting by spaces is default This function must take a `str` and return `Sequence[str]` Example: >>> preds = "My name is John".split() >>> target = "Is your name John".split() >>> from pprint import pprint >>> score = _rouge_score_update(preds, target, rouge_keys_values=[1, 2, 3, 'L'], accumulate='best') >>> pprint(score) {1: [{'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}], 2: [{'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}], 3: [{'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}], 'L': [{'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}, {'fmeasure': tensor(0.), 'precision': tensor(0.), 'recall': tensor(0.)}]} cSói|]}|g“qSr7r7©rOÚ rouge_keyr7r7r8Ú Józ'_rouge_score_update..cSói|]}|i“qSr7r7r r7r7r8r¢Mr£cSrŸr7r7r r7r7r8r¢Nr£rcóg|] }t|ˆˆˆƒ‘qSr7©r|)rOÚ pred_sentence©rprorqr7r8rRRóÿÿz'_rouge_score_update..cr¥r7r¦)rOÚtgt_sentencer¨r7r8rR[r©rr*rcsg|]}|ˆd‘qS)rHr7)rOÚv)Úkey_currr7r8rRmsr)cSr¤r7r7r r7r7r8r¢tsÿcs i|]}|t ˆ|¡ ¡“qSr7)ÚtorchrÚmean)rOÚ_type)Ú_dict_metric_score_batchr7r8r¢sÿ)Úzipr|r@rwrkrr‘r›ÚappendÚcopyrrÚargmaxÚitemÚitems)rœr~rržrorprqÚresultsÚpred_rawÚ target_rawÚresult_innerÚ result_avgÚlist_resultsr}Ú pred_lsumÚtarget_raw_innerr™Útarget_lsumr¡ÚscoreÚall_fmeasureÚhighest_idxÚnew_result_avgÚmetricsÚmetricr¯Úvaluer7)r°r¬rprorqr8rsh3þþ ÿ ÿý ÿ€rÚsentence_resultscCs8i}|ikr|S| ¡D] \}}t |¡ ¡||<q|S)zÇCompute the combined ROUGE metric for all the input set of predicted and target sentences. Args: sentence_results: Rouge-N/Rouge-L/Rouge-LSum metrics calculated for single sentence. )r¶rrr®)rÇr·r¡Úscoresr7r7r8Ú_rouge_score_compute‰srÉr*©rrr&r'Úuse_stemmerÚ rouge_keys.c Csj|rtstdƒ‚ddl}|r|jj ¡nd}t|tƒs|f}|D]} | t ¡vr6t d| ›dtt ¡ƒ›ƒ‚q dd„|Dƒ} t|tƒr[tdd „|Dƒƒr[t|t ƒrT|gnd d„|Dƒ}t|t ƒrc|g}t|t ƒrl|gg}t||| ||||d}i}| D]} dD]}g|d | ›d|›<qq{| ¡D]\} }|D]}| ¡D]\}}|d | ›d|› |¡qq—q‘t|ƒS)aw Calculate `Calculate Rouge Score`_ , used for automatic summarization. Args: preds: An iterable of predicted sentences or a single predicted sentence. target: An iterable of iterables of target sentences or an iterable of target sentences or a single target sentence. accumulate: Useful incase of multi-reference rouge score. - ``avg`` takes the avg of all references with respect to predictions - ``best`` takes the best fmeasure score obtained between prediction and multiple corresponding references. use_stemmer: Use Porter stemmer to strip word suffixes to improve matching. normalizer: A user's own normalizer function. If this is ``None``, replacing any non-alpha-numeric characters with spaces is default. This function must take a ``str`` and return a ``str``. tokenizer: A user's own tokenizer function. If this is ``None``, spliting by spaces is default This function must take a ``str`` and return ``Sequence[str]`` rouge_keys: A list of rouge types to calculate. Keys that are allowed are ``rougeL``, ``rougeLsum``, and ``rouge1`` through ``rouge9``. Return: Python dictionary of rouge scores for each input rouge key. Example: >>> from torchmetrics.functional.text.rouge import rouge_score >>> preds = "My name is John" >>> target = "Is your name John" >>> from pprint import pprint >>> pprint(rouge_score(preds, target)) {'rouge1_fmeasure': tensor(0.7500), 'rouge1_precision': tensor(0.7500), 'rouge1_recall': tensor(0.7500), 'rouge2_fmeasure': tensor(0.), 'rouge2_precision': tensor(0.), 'rouge2_recall': tensor(0.), 'rougeL_fmeasure': tensor(0.5000), 'rougeL_precision': tensor(0.5000), 'rougeL_recall': tensor(0.5000), 'rougeLsum_fmeasure': tensor(0.5000), 'rougeLsum_precision': tensor(0.5000), 'rougeLsum_recall': tensor(0.5000)} Raises: ModuleNotFoundError: If the python package ``nltk`` is not installed. ValueError: If any of the ``rouge_keys`` does not belong to the allowed set of keys. References: [1] ROUGE: A Package for Automatic Evaluation of Summaries by Chin-Yew Lin. https://aclanthology.org/W04-1013/ zBStemmer requires that `nltk` is installed. Use `pip install nltk`.rNzGot unknown rouge key z. Expected to be one of cSsg|]}t|‘qSr7)r()rOÚkeyr7r7r8rRårizrouge_score..css|]}t|tƒVqdSr)rwrj©rOr™r7r7r8r„çs€zrouge_score..cSsg|]}|g‘qSr7r7rÎr7r7r8rRèr£)rorprqrž)rHrFrGÚrougerP)rr<rrsÚporterÚ PorterStemmerrwr‚r(Úkeysr5rcÚallrjrr¶r²rÉ)rœr~ržrËrprqrÌrrorÍrrÇÚoutputr¡ÚtprÄrÅrÆr7r7r8ršsN? ÿ ù ÿÿÿr)r+N)F)NNN)r*FNNrÊ)'r=ÚcollectionsrÚtypingrrrrrrr r rrrÚtyping_extensionsr Útorchmetrics.utilities.importsrÚ__doctest_requires__r(rjrkÚ__annotations__ÚALLOWED_ACCUMULATE_VALUESr9r@rJÚboolrYr]rmr|rr‘r›rrÉrr7r7r7r8ÚsÐ ( õ "ÿÿÿÿ þ ÿÿÿ þ&üÿþýü û*%&.-ùÿ þýüûúù ø&rùÿþýüûúù ø