● Let me look at how the training pipeline handles language tags and transcription format so I can give your inference agent accurate guidance.

  Searched for 2 patterns, read 1 file (ctrl+o to expand)

● Now let me check the inference pipeline to understand how Qwen3ASRModel works:

  Searched for 2 patterns, read 2 files (ctrl+o to expand)

● Now let me check how the model's transcribe() works and what it sends as the prompt:

  Searched for 2 patterns, read 1 file (ctrl+o to expand)

● Now I have the full picture. Here are my suggestions for your inference agent:

  ---

  Key issues in the benchmark script:

  1. validate_language() will crash on Indic languages

     The biggest problem: utils.py:SUPPORTED_LANGUAGES only has the 30 base Qwen3-ASR languages. It does NOT include Assamese, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, or Telugu. When the benchmark calls model.transcribe(language="Tamil"), it hits validate_language() → ValueError: Unsupported language: Tamil.

     Fix: the inference agent needs to either
     - add the Indic languages to SUPPORTED_LANGUAGES in qwen_asr/inference/utils.py, or
     - pass language=None and let the model auto-detect (the model was trained with language_tag_mode=auto, which embeds language names in the target text, so auto-detection should work).

  2. Language name mismatch

     The benchmark script's LANG_CODE_TO_NAME matches the training script's exactly, which is good. But _MODEL_SUPPORTED_LANGS in the benchmark only has the base 30 languages, so the force_language logic falls back to None for every Indic language except Hindi and English. This means:
     - Hindi/English: forced language → the model outputs the raw text only.
     - Other Indic languages: auto-detect → the model's output keeps the `language Tamil` prefix in front of the actual text.

     This inconsistency will skew WER, because parse_asr_output strips the language prefix when user_language is set but keeps it otherwise.

     Fix: either force all languages (after adding them to SUPPORTED_LANGUAGES) or force none.

  3. How the model actually generates output

     During training, the target format is:

       language Tamilநான் வீட்டுக்கு போகிறேன்

     At inference with force_language="Tamil", the prompt is pre-filled with `language Tamil` and the model generates just the transcription text. Without forcing, the model generates the full `language X` + text sequence, and parse_asr_output extracts the text part.

  4. The transcribe() API expects (np.ndarray, sr) tuples, not raw arrays

     The benchmark passes audio_inputs.append((wav, sr)), which is correct.

  5. Recommendation for the benchmark:

       # Option A: Add Indic languages to SUPPORTED_LANGUAGES and force language.
       # This is cleaner and gives better results, since the model doesn't waste
       # tokens on language detection.
       # In qwen_asr/inference/utils.py, add to SUPPORTED_LANGUAGES:
       #   "Assamese", "Bengali", "Gujarati", "Kannada", "Malayalam",
       #   "Marathi", "Odia", "Punjabi", "Tamil", "Telugu"

       # Option B: Don't force language; let auto-detect work.
       # Simpler, but the model generates the `language X` prefix before the text,
       # and parse_asr_output handles extraction.
       # Use: model.transcribe(audio=audio_inputs, language=None)

  6. max_new_tokens=256 may be too low for longer Indic transcriptions

     Some of the 10-15 s samples in the training data have 280+ character transcripts. Consider 448 or 512.

  7. The checkpoint needs the base model files

     The MakeEveryCheckpointInferableCallback copies the required HF files (preprocessor_config, chat_template, etc.) into each checkpoint. Verify these exist in the checkpoint being benchmarked, otherwise from_pretrained will fail.

  Want me to patch SUPPORTED_LANGUAGES in utils.py to include the Indic languages?

✻ Cogitated for 1m 45s

❯ yes please.
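The issue-1 failure mode can be sketched in a few lines. This is a toy stand-in, not the real qwen_asr/inference/utils.py: it assumes validate_language is a plain membership check against SUPPORTED_LANGUAGES (the actual function may differ), and the base list is truncated for brevity.

```python
# Toy reproduction of issue 1. SUPPORTED_LANGUAGES stands in for the real
# (base, 30-language) list; validate_language is an assumed implementation.
SUPPORTED_LANGUAGES = ["English", "Chinese", "Hindi", "Macedonian"]  # no Indic names

INDIC_LANGUAGES = [
    "Assamese", "Bengali", "Gujarati", "Kannada", "Malayalam",
    "Marathi", "Odia", "Punjabi", "Tamil", "Telugu",
]

def validate_language(language):
    """Assumed behaviour: reject any name missing from SUPPORTED_LANGUAGES."""
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    return language

validate_language("Hindi")         # in the base list, passes
try:
    validate_language("Tamil")     # not in the base list, raises
except ValueError as err:
    print(err)                     # Unsupported language: Tamil

# The proposed fix (Option A): extend the list, after which Tamil validates.
SUPPORTED_LANGUAGES.extend(INDIC_LANGUAGES)
validate_language("Tamil")         # now passes
```

The same membership check is why Option B (language=None) sidesteps the crash entirely: None is never compared against the list.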
● Read 1 file (ctrl+o to expand)

● Update(qwen_asr/inference/utils.py)
  ⎿ Added 12 lines, removed 1 line
       64            "Greek",
       65            "Romanian",
       66            "Hungarian",
       67 -          "Macedonian"
       67 +          "Macedonian",
       68 +          # Indic languages (added via Phase 2 fine-tuning)
       69 +          "Assamese",
       70 +          "Bengali",
       71 +          "Gujarati",
       72 +          "Kannada",
       73 +          "Malayalam",
       74 +          "Marathi",
       75 +          "Odia",
       76 +          "Punjabi",
       77 +          "Tamil",
       78 +          "Telugu",
       79        ]
       80    _ASR_TEXT_TAG = ""
       81    _LANG_PREFIX = "language "

● Bash(python3 -m py_compile /root/data/Qwen3-ASR-official/qwen_asr/inference/utils.py && echo "OK")
  ⎿ OK

● Bash(git add qwen_asr/inference/utils.py && git commit -m "$(cat <<'EOF'
  Add Indic languages to SUPPORTED_LANGUAGES for inference…)
  ⎿ [main 915a4ba] Add Indic languages to SUPPORTED_LANGUAGES for inference
     1 file changed, 12 insertions(+), 1 deletion(-)
     To https://github.com/Maya-AI-ML-Trainings-Cookbooks/qwen3-asr-1.7b-phase2-sft.git
        fb9c6e1..915a4ba  main -> main

● Done. Pushed to the repo. Your inference agent can now pass force_language="Tamil" etc. without hitting the ValueError.
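A footnote on why the forced/auto inconsistency in issue 2 would have skewed the benchmark: when the `language X` prefix survives into the hypothesis, every scored utterance gains spurious insertions. The word-level WER below is illustrative only (it is not the benchmark's actual scorer) and assumes the prefix ends up whitespace-separated from the transcript.

```python
# Toy word-level WER via Levenshtein distance over tokens, to show how a
# leftover "language X" prefix inflates the score.
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    # classic edit-distance DP table: d[i][j] = cost to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[len(r)][len(h)] / len(r)

ref = "naan veettukku pogiren"
clean  = wer(ref, "naan veettukku pogiren")                  # 0.0
tagged = wer(ref, "language Tamil naan veettukku pogiren")   # 2 insertions / 3 words ≈ 0.67
```

With the prefix stripped consistently (force all or force none), both hypotheses score identically, which is the point of the fix above.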