After checking `260k.json`: this is **not a broad English collapse**. The headline English WER of `56.31` is dominated by a small set of catastrophic short-clip failures.

## Main Finding

All `500` English hypotheses in `260k.json` are still detected as English. I did **not** see prompt-token leakage, byte-token garbage, or English switching into Indic script. What I did see is a very specific failure mode on ultra-short English clips:

- `8/500` English samples fall into runaway repetition, with hypotheses running to `128-256` tokens.
- Those `8` samples alone account for about **59% of all English errors**.
- If I remove just those `8`, English WER drops from `56.31` to about **23.03**.
- If I evaluate only English clips with `duration >= 0.5s`, WER is about **22.61**.
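
The filtered numbers above come from recomputing corpus-level WER on subsets of the English samples. A minimal sketch of that recomputation follows, assuming each sample is a dict with the `reference`, `hypothesis`, and `duration` fields shown in the example below. The helper names (`edit_distance`, `corpus_wer`, `is_runaway`), the lowercase-and-split normalization, and the word-count threshold used as a stand-in for the `128-256` token range are all illustrative assumptions, not the actual evaluation code, so exact figures may differ from the ones reported here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(samples):
    """Corpus WER in percent: total word errors / total reference words."""
    errors = 0
    ref_words = 0
    for s in samples:
        # Assumption: lowercase + whitespace split is the only normalization.
        ref = s["reference"].lower().split()
        hyp = s["hypothesis"].lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    return 100.0 * errors / ref_words if ref_words else 0.0


def is_runaway(sample, min_words=100):
    """Hypothetical proxy for runaway repetition: a very long hypothesis.

    Counts whitespace-separated words, not model tokens.
    """
    return len(sample["hypothesis"].split()) >= min_words


def wer_report(english_samples, min_duration=0.5):
    """WER on all samples, without runaway samples, and on longer clips only."""
    no_runaway = [s for s in english_samples if not is_runaway(s)]
    long_enough = [s for s in english_samples if s["duration"] >= min_duration]
    return {
        "all": corpus_wer(english_samples),
        "without_runaway": corpus_wer(no_runaway),
        "duration_filtered": corpus_wer(long_enough),
    }
```

Because corpus WER divides total errors by total reference words, a single one-word reference paired with a hypothesis hundreds of words long contributes hundreds of insertion errors while adding only one word to the denominator. That is why a handful of samples can move the headline number so much.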

A representative example from `260k.json`:

```10643:10650:/workspace/maya-asr/260k.json
      "id": "en_0064",
      "language": "english",
      "lang_code": "en",
      "source": "Svarah",
      "duration": 0.2343125,
      "reference": "Up",
      "hypothesis": "okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay, okay,