Use this prompt. It describes **our current implementation**, not MER:

```text
You are helping me audit a custom ASR metric from our codebase named `space_norm_wer`.
Explain the metric in depth, but describe exactly what our current code does, not what
a better metric should do.

Important:
- Do NOT redefine it as MER.
- Do NOT simplify it into "CER on text with spaces removed".
- Do NOT describe standard WER only.
- Describe the exact implementation behavior.

Context:
This metric is computed after a standard normalization pass. A custom function then
aligns the reference and hypothesis at the character level with all spaces removed,
and maps those character edits back to reference words.

This is the normalization pipeline applied before scoring:
1. Remove a leading language tag like `<|te|>` if present.
2. Remove bracketed spans like `[noise]`.
3. Remove angle-bracket markup like `<...>`.
4. Remove punctuation: `[।॥,.\-!?;:'"()\[\]{}\/\\@#$%^&*+=~`|]`
5. Remove digits `[0-9]`
6. Lowercase the text
7. Collapse multiple whitespace into a single space and trim

After normalization:
- `ref_norm = normalized reference`
- `hyp_norm = normalized hypothesis`

Our current `space_norm_wer` is computed like this per sample:
1. Split `ref_norm` by spaces into `ref_words`.
2. If `ref_words` is empty, return `(0, 0)`.
3. Remove all spaces:
   - `ref_nospace = ref_norm.replace(" ", "")`
   - `hyp_nospace = hyp_norm.replace(" ", "")`
4. If `ref_nospace == hyp_nospace`, return `(0, len(ref_words))`.
   - This means pure spacing differences cause zero errors.
5. Build a `char_to_word` mapping:
   - For every character position in `ref_nospace`, record which reference word that
     character originally belonged to.
6. Compute standard Levenshtein edit distance DP over characters of:
   - `ref_nospace`
   - `hyp_nospace`
7. Backtrack through the DP and mark which reference words were touched by edits:
   - Match: mark nothing
   - Substitution: mark the reference word owning that substituted reference character
   - Deletion from reference: mark the reference word owning that deleted reference character
   - Insertion in hypothesis: mark the nearest reference word
     - if `i > 0`, mark the previous reference word
     - else if `i < n`, mark the next reference word
8. Store marked words in a set so the same reference word is counted at most once,
   even if multiple character edits hit it.
9. Return:
   - `error_words = number of distinct reference words touched by edits`
   - `total_words = len(ref_words)`

So the per-sample score is effectively:
- `space_norm_wer_sample = error_words / total_words`

Corpus-level aggregation in our benchmark:
- Only samples where both `ref_norm` and `hyp_norm` are non-empty contribute to this metric.
- The benchmark accumulates:
  - `space_err_words += error_words`
  - `space_total_words += total_words`
- Final corpus metric is:
  - `space_norm_wer = space_err_words / space_total_words`

This means:
- It is a micro-average over all reference words.
- It is NOT the average of per-utterance percentages.
- It is NOT CER on the joined strings.
- It is a word-level metric after space-insensitive character alignment.

Explain all of the following in plain English:
1. The intuition behind the metric
2. The exact normalization pipeline
3. The exact DP + backtracking logic
4. Why insertions are assigned to a nearby reference word
5. Why a single wrong character can make one whole reference word count as wrong
6. Why this metric forgives split/join boundary errors
7. Why this metric is different from MER
8. How the corpus-level aggregation works
9. Why this metric is stricter/coarser than character-level MER

Then work through these examples explicitly and compute the outcomes:

Example 1: English spacing only
- ref: `new york city`
- hyp: `newyork city`

Example 2: English spacing + content error
- ref: `money transfer limit`
- hyp: `moneytransfer limt`

Example 3: Telugu spacing only
- ref: `చాలా మంచి రోజు`
- hyp: `చాలామంచి రోజు`

Example 4: Telugu spacing + content error
- ref: `చాలా మంచి రోజు`
- hyp: `చాలామంచి రొజు`

For each example, show:
- normalized reference
- normalized hypothesis
- space-stripped reference
- space-stripped hypothesis
- which reference words are counted as wrong
- current `space_norm_wer`
- what MER would have been if MER were defined as `cer(ref_nospace, hyp_nospace)`

End with a short comparison table between:
- standard normalized WER
- our current `space_norm_wer`
- MER = CER on joined text

Final instruction:
Emphasize that our current metric is:
"word error rate after forgiving whitespace boundaries by doing character alignment on
space-stripped text and then counting how many reference words are touched by real
content edits."
```

If you want, I can also give you a second shorter version of this prompt for Claude that's half the length.
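For reference, here is a minimal Python sketch of the pipeline and metric as the prompt describes them. This is a reconstruction from the prompt's own step list, not our actual codebase; the function names `normalize` and `space_norm_wer` and the regex details are assumptions that mirror the description above.

```python
import re


def normalize(text: str) -> str:
    # Sketch of the normalization pipeline (steps 1-7 in the prompt).
    text = re.sub(r"^<\|[a-z]{2}\|>", "", text)   # 1. leading language tag like <|te|>
    text = re.sub(r"\[[^\]]*\]", "", text)        # 2. bracketed spans like [noise]
    text = re.sub(r"<[^>]*>", "", text)           # 3. angle-bracket markup
    text = re.sub(r"[।॥,.\-!?;:'\"()\[\]{}/\\@#$%^&*+=~`|]", "", text)  # 4. punctuation
    text = re.sub(r"[0-9]", "", text)             # 5. digits
    text = text.lower()                           # 6. lowercase
    return re.sub(r"\s+", " ", text).strip()      # 7. collapse whitespace and trim


def space_norm_wer(ref_norm: str, hyp_norm: str) -> tuple[int, int]:
    # Returns (error_words, total_words) for one already-normalized sample.
    ref_words = ref_norm.split()
    if not ref_words:
        return (0, 0)
    ref_nospace = ref_norm.replace(" ", "")
    hyp_nospace = hyp_norm.replace(" ", "")
    if ref_nospace == hyp_nospace:
        return (0, len(ref_words))  # pure spacing differences cost nothing

    # char_to_word: owning reference-word index for each char of ref_nospace.
    char_to_word: list[int] = []
    for w_idx, word in enumerate(ref_words):
        char_to_word.extend([w_idx] * len(word))

    n, m = len(ref_nospace), len(hyp_nospace)
    # Standard character-level Levenshtein DP table.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref_nospace[i - 1] == hyp_nospace[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion from reference
                           dp[i][j - 1] + 1,         # insertion in hypothesis
                           dp[i - 1][j - 1] + cost)  # match / substitution

    # Backtrack, marking every reference word touched by a real edit.
    marked: set[int] = set()
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and ref_nospace[i - 1] == hyp_nospace[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1                  # match: mark nothing
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            marked.add(char_to_word[i - 1])      # substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            marked.add(char_to_word[i - 1])      # deletion
            i -= 1
        else:                                    # insertion in hypothesis
            if i > 0:
                marked.add(char_to_word[i - 1])  # previous reference word
            elif i < n:
                marked.add(char_to_word[i])      # next reference word
            j -= 1

    return (len(marked), len(ref_words))
```

With this sketch, Example 1 (`new york city` vs `newyork city`) returns `(0, 3)` because the space-stripped strings are identical, and Example 2 (`money transfer limit` vs `moneytransfer limt`) returns `(1, 3)` because only the word `limit` owns the deleted character.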