1. Checkpoint format

The .ckpt files are Lightning checkpoints. No .nemo files are in R2; only .ckpt files were uploaded. Load via config + state_dict:

```python
import torch
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Build the model from the training config, then load weights from the .ckpt
cfg = OmegaConf.load("configs/train/stage1_prod_8xh200.yaml")
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel(cfg=cfg.model, trainer=None)

ckpt = torch.load("path_to.ckpt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["state_dict"], strict=False)
```

The 5.2 GB files in ckpt-20000 are from an earlier run with the encoder frozen (no optimizer state for encoder params, hence the smaller file). The 13.7 GB files carry full optimizer state. Both contain the same state_dict keys; just load ckpt["state_dict"].

Important: the tokenizer is NOT in R2. It lives at tokenizers/stage1_prod_bpe/tokenizer.model in the git repo. Make sure that directory exists locally or the model won't init.

2. Language conditioning at inference

Skip it for benchmarking from R2 checkpoints. The language embedding is a 12K-parameter additive bias with negligible impact; the model works fine without it.

The complication: the _lang_embed_module weights ARE in the state_dict, but a model constructed from config alone won't have that module registered. Use strict=False when loading; it will silently skip the lang-embed keys, and the model will run in language-agnostic mode.

If you DO want language conditioning, copy the LanguageEmbedding class and patch_encoder_for_lang() from scripts/train_prod.py (lines ~46-75 and ~510-525 in git), register the module before loading the state_dict, and set model._current_lang_ids per sample.

Recommendation: just use strict=False and skip lang conditioning. Simpler, works fine.

3. Which checkpoints to benchmark

Just ckpt-60000; it has the best val_loss (140.7). The others are superseded:

- ckpt-20000: val_loss 209.6 (early training)
- ckpt-40000: val_loss 164.9
- ckpt-60000: val_loss 140.7 (best in R2)
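To see what strict=False actually does when checkpoint and model keys disagree, here is a minimal, self-contained sketch using a toy torch module (not the NeMo model): extra checkpoint keys like _lang_embed_module.* come back as unexpected_keys and are skipped instead of raising. The module and key names here are illustrative only.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for a model built from config alone (no lang-embed module)."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)


model = TinyModel()

# Simulate a checkpoint that carries an extra key the fresh model never registered,
# analogous to the _lang_embed_module weights described above.
ckpt_state = dict(model.state_dict())
ckpt_state["_lang_embed_module.weight"] = torch.zeros(12, 4)

# strict=False returns the mismatches instead of raising a RuntimeError.
result = model.load_state_dict(ckpt_state, strict=False)
print(result.unexpected_keys)  # ['_lang_embed_module.weight']
print(result.missing_keys)     # []
```

After a real load it is worth printing both lists once, to confirm that only the lang-embed keys were skipped and nothing else silently failed to load.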
4. model_id

- model_id: maya-asr-tdt-1.1b
- checkpoint name: ckpt-60000

5. Decoding strategy

TDT greedy (primary). The config in git already has the right settings:

```yaml
decoding:
  strategy: greedy
  model_type: tdt
  durations: [0, 1, 2, 3, 4]
  greedy:
    max_symbols: 10
```

CTC decoding is optional for comparison; use model.change_decoding_strategy(decoder_type="ctc") if NeMo supports it on the hybrid model.

6. GPU availability

The inference machine needs ~5 GB VRAM in bf16 for a single forward pass, so any GPU with 8+ GB works. The R2 checkpoint files are ~13.7 GB on disk, but that is mostly optimizer state; the model weights alone are ~2.3 GB in bf16.

Download from R2:

```bash
rclone copy r2:ptcheckpoints/hybrid-tdt-gemma/03-26-2026/ckpt-60000/ ./ckpt-60000/ \
  --include "*.ckpt" --progress
```

Only download the *-last.ckpt or the best-val_loss .ckpt; you don't need every file in the directory.
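As a back-of-envelope check of the sizes quoted above, assuming ~1.1B parameters (read off the model_id, not confirmed from the config) and Adam-style optimizer state (fp32 master weights plus two fp32 moments per parameter):

```python
# Rough size accounting; the 1.1e9 parameter count is an assumption.
params = 1.1e9
GB = 1e9  # decimal gigabytes, matching the sizes quoted above

weights_bf16 = params * 2 / GB   # ~2.2 GB: bf16 weights, all that inference needs
fp32_weights = params * 4 / GB   # ~4.4 GB: fp32 master copy in the optimizer
adam_moments = params * 8 / GB   # ~8.8 GB: two fp32 moments per param

full_checkpoint = fp32_weights + adam_moments  # ~13.2 GB
print(f"bf16 weights: {weights_bf16:.1f} GB, full ckpt: {full_checkpoint:.1f} GB")
```

The ~13.2 GB estimate lands close to the observed 13.7 GB files, and the bf16 weight footprint close to the ~2.3 GB quoted above, which supports the claim that most of the on-disk size is optimizer state you can ignore for benchmarking.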