Is this looking okay?

W0404 23:22:37.924000 1113384 torch/distributed/run.py:792]
W0404 23:22:37.924000 1113384 torch/distributed/run.py:792] *****************************************
W0404 23:22:37.924000 1113384 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0404 23:22:37.924000 1113384 torch/distributed/run.py:792] *****************************************
2026-04-04 23:22:48 [INFO] World size: 8, Device: cuda:0
2026-04-04 23:22:48 [INFO] Loading model: /workspace/training/tokenizer_extension/extended_model
Loading weights: 0%| | 0/2150 [00:00<
<|startoftranscript|><|emo:undefined|><|hi|><|hi|><|pnc|><|noitn|><|notimestamp|><|nodiarize|> → 9 tokens
2026-04-04 23:25:45 [INFO] DataLoader: 8 workers, max_batch_mel_frames=800000, max_batch_utterances=64
2026-04-04 23:25:46 [INFO] Optimizer: AdamW, base_lr=0.0002, LLRD=0.9, max_encoder_depth=47
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: bharathkumar1922001 (maya-research-maya-research) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: setting up run 85dzyawa
wandb: Tracking run with wandb version 0.25.1
wandb: Run data is saved locally in /workspace/training/wandb/run-20260404_232549-85dzyawa
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run indic-lang-extension-v1
wandb: ⭐️ View project at https://wandb.ai/maya-research-maya-research/maya-asr-cohere-transcribe
wandb: 🚀 View run at https://wandb.ai/maya-research-maya-research/maya-asr-cohere-transcribe/runs/85dzyawa
2026-04-04 23:25:58 [INFO] Starting training: 2 epochs, ~142574 estimated steps
2026-04-04 23:25:58 [INFO] Warmup: 2851 steps (2%)
2026-04-04 23:25:58 [INFO] Strategy: ddp, BF16: True
2026-04-04 23:25:58 [INFO] Gradient accumulation: 2
2026-04-04 23:25:58 [INFO] === Epoch 1/2 ===
2026-04-04 23:28:47 [INFO] Step 50 | loss=15.4257 | lr=3.44e-06 | grad=78.50 | throughput=181.6x RT
2026-04-04 23:31:02 [INFO] Step 100 | loss=13.5723 | lr=6.94e-06 | grad=40.25 | throughput=150.6x RT
2026-04-04 23:33:39 [INFO] Step 150 | loss=9.6411 | lr=1.05e-05 | grad=4.12 | throughput=89.8x RT
2026-04-04 23:36:34 [INFO] Step 200 | loss=8.9768 | lr=1.40e-05 | grad=1.89 | throughput=82.1x RT
2026-04-04 23:39:48 [INFO] Step 250 | loss=8.3982 | lr=1.75e-05 | grad=2.12 | throughput=81.1x RT
2026-04-04 23:42:42 [INFO] Step 300 | loss=7.8483 | lr=2.10e-05 | grad=1.70 | throughput=35.9x RT
2026-04-04 23:46:01 [INFO] Step 350 | loss=7.5591 | lr=2.45e-05 | grad=1.73 | throughput=52.8x RT
2026-04-04 23:49:09 [INFO] Step 400 | loss=7.2173 | lr=2.80e-05 | grad=1.81 | throughput=40.0x RT
2026-04-04 23:52:19 [INFO] Step 450 | loss=7.0046 | lr=3.15e-05 | grad=2.11 | throughput=31.1x RT
2026-04-04 23:56:07 [INFO] Step 500 | loss=6.9517 | lr=3.50e-05 | grad=3.59 | throughput=41.1x RT
2026-04-04 23:59:29 [INFO] Step 550 | loss=6.7215 | lr=3.85e-05 | grad=3.20 | throughput=20.0x RT
2026-04-05 00:03:12 [INFO] Step 600 | loss=6.6965 | lr=4.20e-05 | grad=1.27 | throughput=27.8x RT
2026-04-05 00:07:01 [INFO] Step 650 | loss=6.5202 | lr=4.55e-05 | grad=6.28 | throughput=26.9x RT
2026-04-05 00:10:26 [INFO] Step 700 | loss=6.3383 | lr=4.90e-05 | grad=2.41 | throughput=16.7x RT
2026-04-05 00:14:20 [INFO] Step 750 | loss=6.3453 | lr=5.25e-05 | grad=5.12 | throughput=26.4x RT
2026-04-05 00:18:05 [INFO] Step 800 | loss=6.1364 | lr=5.61e-05 | grad=5.62 | throughput=15.5x RT
2026-04-05 00:21:43 [INFO] Step 850 | loss=6.1014 | lr=5.96e-05 | grad=2.44 | throughput=16.7x RT

I don't want to disturb the run. What could explain the throughput decline pattern (from ~181x RT at step 50 down to ~16x RT by step 850)?
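For context, the logged lr values look consistent with plain linear warmup from 0 to base_lr over the 2851 warmup steps. A minimal sketch (the `(step - 1)` indexing is inferred from the log values, not confirmed from the training code):

```python
# Sketch of the lr schedule implied by the log: linear warmup from 0
# to base_lr over 2851 steps. The (step - 1) offset is an inference
# from matching the logged values, not taken from the actual code.
base_lr = 2e-4
warmup_steps = 2851

def warmup_lr(step: int) -> float:
    """Learning rate at a given optimizer step during warmup."""
    return base_lr * min((step - 1) / warmup_steps, 1.0)

print(f"step 50:  {warmup_lr(50):.2e}")   # 3.44e-06, matches the log
print(f"step 500: {warmup_lr(500):.2e}")  # 3.50e-05, matches the log
```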
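The optimizer line (base_lr=0.0002, LLRD=0.9, max_encoder_depth=47) suggests layer-wise learning rate decay, where each layer below the top trains at 0.9x the rate of the one above it. A hypothetical sketch of what that implies (the function name and top-down indexing are my assumptions):

```python
# Hypothetical sketch of layer-wise LR decay (LLRD) with the logged
# settings: base_lr=2e-4, decay factor 0.9, 47 encoder layers.
# `layer_lr` and the depth convention are assumptions for illustration.
base_lr = 2e-4
decay = 0.9
max_encoder_depth = 47

def layer_lr(depth: int) -> float:
    """LR for a layer `depth` levels below the top (top layer = depth 0)."""
    return base_lr * decay ** depth

# The top layer trains at the full base LR, while the deepest layer gets
# a far smaller one, keeping pretrained low-level features nearly frozen.
print(f"top layer:     {layer_lr(0):.2e}")                  # 2.00e-04
print(f"deepest layer: {layer_lr(max_encoder_depth):.2e}")  # 1.41e-06
```

The ~140x spread between top and bottom is a design choice: higher layers adapt quickly to the new Indic-language tokens while the acoustic front end stays stable.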
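On the throughput figure itself: an "Nx RT" number in ASR training is usually audio seconds processed per wall-clock second, so a decline can come either from slower steps or from shorter audio per batch. A sketch under that assumption (this logger's exact definition is unknown to me):

```python
# Assumed definition of the "Nx RT" metric: seconds of audio consumed
# divided by wall-clock seconds spent. If the logger uses this, a drop
# can mean slower steps OR less audio per batch (e.g. shorter clips
# from a length-bucketed sampler), not necessarily slower GPUs.
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall time."""
    return audio_seconds / wall_seconds

# e.g. a logging window covering 400 s of audio in 2.2 s of wall time
print(f"{realtime_factor(400.0, 2.2):.1f}x RT")  # 181.8x RT
```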