● Here are the answers for your inference agent:

1. Codec weights location: On this machine at /root/maya-tts/checkpoints/xcodec2/ckpt/epoch=4-step=1400000.ckpt — a single checkpoint file that contains both the encoder and decoder. The decoding.create() and encoding.create() functions both take this same path. There is also a model_config.json in the same directory (/root/maya-tts/checkpoints/xcodec2/ckpt/model_config.json). If your inference agent needs these files, they must be copied or downloaded separately from the LLM checkpoints — they are not in R2's ptcheckpoints bucket.

2. Checkpoint format at ckpt-140000: It is the raw DeepSpeed ZeRO-3 training checkpoint, NOT a converted HF checkpoint. It contains sharded optimizer states (bf16_zero_pp_rank_*_optim_states.pt), sharded model states (zero_pp_rank_*_model_states.pt), plus a consolidated model.safetensors. The model.safetensors + config.json + tokenizer.json should be directly loadable by AutoModelForCausalLM.from_pretrained() without running convert_checkpoint.py. The DeepSpeed shard files are only needed for resuming training.

3. Extended tokenizer: Yes, confirmed. The tokenizer is saved with every checkpoint — tokenizer.json (~28MB) contains the full 193,856-token vocabulary, including all <|s_0|> through <|s_65535|> speech tokens and the 8 control tokens.

4. Codec variant: Stock XCodec2 checkpoint (epoch=4-step=1400000.ckpt), not custom-trained. The codec produces 65,536 discrete codes at 50 tokens/sec from 16 kHz audio.

5. Quality at 140K steps: This is an early checkpoint — the run is currently at step ~155K of 550K (28%). Per-source eval at step 140K showed audio loss ~7.1 and paired loss ~7.0. Text loss is strong (2.3) since the base Llama already knew text. Speech output will likely be intelligible but noisy and repetitive — the model is still learning audio codebook patterns. The pre-SFT gate on a 20-step smoke checkpoint scored nondegenerate_fraction=0.71 (below the 0.9 production threshold).
At 140K steps it should be better than that smoke checkpoint, but don't expect production quality — it is a valid sanity-check checkpoint.

Two corrections to relay:
- Your inference agent noted 1,271,565 max steps; that figure is from an older config. The current production run is 550,000 max steps with BS=12 per GPU.
- The base model is meta-llama/Llama-3.2-1B (base), NOT Llama-3.2-1B-Instruct.
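A minimal sketch of the speech-token layout described above: 65,536 discrete codec codes rendered as `<|s_0|>` through `<|s_65535|>`, produced at 50 tokens per second from 16 kHz audio. The helper names here are illustrative, not from the maya-tts codebase:

```python
import re

# Facts from the discussion above; helper names are hypothetical.
NUM_SPEECH_CODES = 65_536   # discrete codec codes
FRAME_RATE_HZ = 50          # speech tokens per second of audio

def code_to_token(code: int) -> str:
    """Render a codec code index as its special-token string, e.g. 0 -> <|s_0|>."""
    if not 0 <= code < NUM_SPEECH_CODES:
        raise ValueError(f"code {code} out of range [0, {NUM_SPEECH_CODES})")
    return f"<|s_{code}|>"

def token_to_code(token: str) -> int:
    """Parse a speech token string back to its code index."""
    m = re.fullmatch(r"<\|s_(\d+)\|>", token)
    if m is None:
        raise ValueError(f"not a speech token: {token!r}")
    return int(m.group(1))

def audio_seconds(num_speech_tokens: int) -> float:
    """Duration of audio represented by a run of speech tokens, at 50 tok/sec."""
    return num_speech_tokens / FRAME_RATE_HZ
```

For example, a 500-token generation corresponds to 10 seconds of 16 kHz audio under the 50 tok/sec frame rate.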
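Since point 2 says the consolidated files are directly loadable, a hedged sketch of what the inference agent would run, skipping convert_checkpoint.py. The checkpoint path is a placeholder; the transformers calls are standard, but this assumes model.safetensors, config.json, and tokenizer.json all sit in the same directory as described:

```python
# Hypothetical local path to the ckpt-140000 directory; adjust as needed.
CKPT_DIR = "/path/to/ckpt-140000"

def load_model_and_tokenizer(ckpt_dir: str = CKPT_DIR):
    """Load the consolidated HF-format checkpoint (model.safetensors +
    config.json + tokenizer.json) directly with transformers.

    The DeepSpeed shard files (*_optim_states.pt, *_model_states.pt) in the
    same directory are ignored; they are only needed to resume training."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
    model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)
    return model, tokenizer
```

If loading fails, that would indicate the consolidated files are incomplete and convert_checkpoint.py is needed after all; the load call itself is the cheapest way to verify.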