1. Checkpoint format

The .ckpt files are Lightning checkpoints. No .nemo files are in R2; only .ckpt files were uploaded. Load via config + state_dict:

```python
import torch
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Build the model from the training config, then load weights from the .ckpt
cfg = OmegaConf.load("configs/train/stage1_prod_8xh200.yaml")
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel(cfg=cfg.model, trainer=None)

ckpt = torch.load("path_to.ckpt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["state_dict"], strict=False)
```

The 5.2 GB files in ckpt-20000 are from an earlier run with the encoder frozen (no optimizer state for encoder params, hence the smaller file). The 13.7 GB files carry full optimizer state. Both contain the same state_dict keys; just load ckpt["state_dict"].

Important: the tokenizer is NOT in R2. It lives at tokenizers/stage1_prod_bpe/tokenizer.model in the git repo. Make sure that directory exists locally or the model won't init.

2. Language conditioning at inference

Skip it for benchmarking from R2 checkpoints. The language embedding is a 12K-parameter additive bias with negligible impact; the model works fine without it.

The complication: the _lang_embed_module weights ARE in the state_dict, but a model constructed from config alone won't have that module registered. Use strict=False when loading; it will silently skip the lang-embed keys, and the model will run in language-agnostic mode.

If you DO want language conditioning, copy the LanguageEmbedding class and patch_encoder_for_lang() from scripts/train_prod.py (lines ~46-75 and ~510-525 in git), register the module before loading the state_dict, and set model._current_lang_ids per sample.

Recommendation: just use strict=False and skip lang conditioning. Simpler, works fine.

3. Which checkpoints to benchmark

Just ckpt-60000; it has the best val_loss (140.7). The others are superseded:

- ckpt-20000: val_loss 209.6 (early training)
- ckpt-40000: val_loss 164.9
- ckpt-60000: val_loss 140.7 (best in R2)
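To see what strict=False actually does when checkpoint and model keys disagree, here is a minimal, self-contained sketch using a toy torch module (not the NeMo model): extra checkpoint keys like _lang_embed_module.* come back as unexpected_keys and are skipped instead of raising. The module and key names here are illustrative only.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for a model built from config alone (no lang-embed module)."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)


model = TinyModel()

# Simulate a checkpoint that carries an extra key the fresh model never registered,
# analogous to the _lang_embed_module weights described above.
ckpt_state = dict(model.state_dict())
ckpt_state["_lang_embed_module.weight"] = torch.zeros(12, 4)

# strict=False returns the mismatches instead of raising a RuntimeError.
result = model.load_state_dict(ckpt_state, strict=False)
print(result.unexpected_keys)  # ['_lang_embed_module.weight']
print(result.missing_keys)     # []
```

After a real load it is worth printing both lists once, to confirm that only the lang-embed keys were skipped and nothing else silently failed to load.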
4. model_id

- model_id: maya-asr-tdt-1.1b
- checkpoint name: ckpt-60000

5. Decoding strategy

TDT greedy (primary). The config in git already has the right settings:

```yaml
decoding:
  strategy: greedy
  model_type: tdt
  durations: [0, 1, 2, 3, 4]
  greedy:
    max_symbols: 10
```

CTC decoding is optional for comparison; use model.change_decoding_strategy(decoder_type="ctc") if NeMo supports it on the hybrid model.

6. GPU availability

The inference machine needs ~5 GB VRAM in bf16 for a single forward pass, so any GPU with 8+ GB works. The R2 checkpoint files are ~13.7 GB on disk, but that is mostly optimizer state; the model weights alone are ~2.3 GB in bf16.

Download from R2:

```bash
rclone copy r2:ptcheckpoints/hybrid-tdt-gemma/03-26-2026/ckpt-60000/ ./ckpt-60000/ \
  --include "*.ckpt" --progress
```

Only download the *-last.ckpt or the best-val_loss .ckpt; you don't need every file in the directory.
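As a back-of-envelope check of the sizes quoted above, assuming ~1.1B parameters (read off the model_id, not confirmed from the config) and Adam-style optimizer state (fp32 master weights plus two fp32 moments per parameter):

```python
# Rough size accounting; the 1.1e9 parameter count is an assumption.
params = 1.1e9
GB = 1e9  # decimal gigabytes, matching the sizes quoted above

weights_bf16 = params * 2 / GB   # ~2.2 GB: bf16 weights, all that inference needs
fp32_weights = params * 4 / GB   # ~4.4 GB: fp32 master copy in the optimizer
adam_moments = params * 8 / GB   # ~8.8 GB: two fp32 moments per param

full_checkpoint = fp32_weights + adam_moments  # ~13.2 GB
print(f"bf16 weights: {weights_bf16:.1f} GB, full ckpt: {full_checkpoint:.1f} GB")
```

The ~13.2 GB estimate lands close to the observed 13.7 GB files, and the bf16 weight footprint close to the ~2.3 GB quoted above, which supports the claim that most of the on-disk size is optimizer state you can ignore for benchmarking.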