--- Task: Add MER (Meaningful Error Rate) to benchmark outputs

MER is a space-normalized metric that measures content accuracy while ignoring word boundary errors. It's already defined in /home/ubuntu/training/benchmark_schema/BENCHMARK_SCHEMA.md; read it for the full schema.

What MER is: CER computed on space-stripped text. After standard normalization (NFKC, punctuation removal, case fold), remove ALL spaces from both reference and hypothesis, then compute character error rate on the resulting single character streams.

```python
from jiwer import cer

def norm_standard(text, language):
    """Your existing normalization: NFKC, strip zero-width chars,
    collapse whitespace, remove punctuation, case fold."""
    # ... whatever you already have for wer_norm ...

def compute_mer(ref, hyp, language):
    ref_stripped = norm_standard(ref, language).replace(" ", "")
    hyp_stripped = norm_standard(hyp, language).replace(" ", "")
    if not ref_stripped:
        return 0.0 if not hyp_stripped else 100.0
    if not hyp_stripped:
        return 100.0
    return round(cer(ref_stripped, hyp_stripped) * 100, 2)
```

That's it. It's cer() on space-removed text, not wer(). Since removing spaces makes everything one "word", WER would just give 0% or 100%, which is useless. CER on the joined character stream is the correct computation.

Why it matters: "कोत्तागुडेम" → "कोत्ता गुड़ेम" is 0% MER but multiple WER errors. For Dravidian languages (Tamil/Malayalam/Kannada), wer_norm is 40-50% but MER is 8-12%: the model knows the content, it just doesn't know word boundaries.
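To make the computation concrete, here is a self-contained sketch you can run without the benchmark codebase. It stands in a minimal `norm_standard` (NFKC, punctuation removal, case fold, whitespace collapse; the real pipeline's details will differ) and a tiny Levenshtein-based `cer` in place of `jiwer.cer`, purely so the example runs standalone. The Hindi strings illustrate a pure word-boundary error: identical characters, different segmentation.

```python
import unicodedata

def _levenshtein(a, b):
    # Standard dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # Stand-in for jiwer.cer: edit distance / reference length.
    return _levenshtein(ref, hyp) / len(ref)

def norm_standard(text, language):
    # Stand-in normalization: NFKC, strip punctuation, case fold,
    # collapse whitespace. The real wer_norm pipeline may differ.
    text = unicodedata.normalize("NFKC", text)
    text = "".join(c for c in text
                   if not unicodedata.category(c).startswith("P"))
    return " ".join(text.casefold().split())

def compute_mer(ref, hyp, language):
    ref_stripped = norm_standard(ref, language).replace(" ", "")
    hyp_stripped = norm_standard(hyp, language).replace(" ", "")
    if not ref_stripped:
        return 0.0 if not hyp_stripped else 100.0
    if not hyp_stripped:
        return 100.0
    return round(cer(ref_stripped, hyp_stripped) * 100, 2)

# Same characters, split across a space: MER sees no error.
print(compute_mer("कोत्तागुडेम", "कोत्ता गुडेम", "hi"))  # 0.0
# A real character error survives space stripping.
print(compute_mer("hello world", "helo world", "en"))   # 10.0
```

Note how the second call still reports an error: stripping spaces only forgives segmentation mistakes, not genuine character substitutions or deletions.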