After checking `260k.json`: this is **not a broad English collapse**. The headline `56.31` is being dominated by a small set of catastrophic short-clip failures.

## Main Finding

All `500` English hypotheses in `260k.json` are still detected as English. I did **not** see prompt-token leakage, byte-token garbage, or English switching into Indic script.

What I did see is a very specific failure mode on ultra-short English clips:

- `8/500` English samples go into runaway repetition to `128-256` tokens.
- Those `8` samples alone account for about **59% of all English errors**.
- If I remove just those `8`, English WER drops from `56.31` to about **23.03**.
- If I evaluate only English clips with `duration >= 0.5s`, WER is about **22.61**.

A representative example from `260k.json`:

```10643:10650:/workspace/maya-asr/260k.json
      "id": "en_0064",
      "language": "english",
      "lang_code": "en",
      "source": "Svarah",
      "duration": 0.2343125,
      "reference": "Up",
      "hypothesis": "okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay,