================================================================
LAB 2 — FULL OUTPUT LOG
Thu Mar 5 11:57:33 AM UTC 2026
================================================================

================================================================
PART A: C1+C2 — Training with Timing
================================================================
Device: cuda
ResNet-18 params: 11,173,962
============================================================
C1/C2: Training — 5 epochs
optimizer=sgd workers=4 bn=on device=cuda
============================================================
Ep   Loss     Acc%     Data(s)   Train(s)   Total(s)
 1   1.8894   31.12%      0.65      13.41      15.77
 2   1.3499   50.39%      1.01      10.92      13.18
 3   1.0827   61.14%      0.87      11.63      13.88
 4   0.8919   68.36%      0.61      12.90      15.23
 5   0.7512   73.83%      0.65      12.94      15.28
[Q3] Trainable params : 11,173,962
[Q3] Params w/ grads  : 11,173,962
[Q3] Optimizer states : 62

================================================================
PART A: C3 — I/O Worker Sweep
================================================================
Device: cuda
ResNet-18 params: 11,173,962
============================================================
C3: I/O Optimization — Worker Sweep (5 epochs each)
============================================================
Workers   AvgData(s)   AvgTrain(s)   AvgTotal(s)
      0       13.396        12.060        27.086
      4        0.797        11.638        13.928
      8        0.852        11.728        14.123
     12        1.276        12.501        15.560
     16        1.367        12.548        15.719
/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:624: UserWarning: This DataLoader will create 20 worker processes in total. Our suggested max number of worker in current system is 16, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  if max_num_worker_suggest is None:
     20        1.201        12.477        15.488
C3.2 => Best num_workers = 4 (avg total: 13.928s, avg data: 0.797s)
Plot saved -> c3_workers.png
=> Use --num_workers 4 for subsequent experiments

================================================================
PART A: C4 — GPU vs CPU
================================================================
Device: cuda
ResNet-18 params: 11,173,962
============================================================
C4: GPU vs CPU
============================================================
[CPU] Training 5 epochs...
Ep 1: loss=1.9320 acc=30.70% time=579.23s
Ep 2: loss=1.4355 acc=47.51% time=431.86s
Ep 3: loss=1.1419 acc=59.25% time=432.04s
Ep 4: loss=0.9589 acc=66.19% time=429.39s
Ep 5: loss=0.8158 acc=71.12% time=467.04s
[CPU] Avg epoch time: 467.91s
[CUDA] Training 5 epochs...
Ep 1: loss=1.9007 acc=30.89% time=7.07s
Ep 2: loss=1.4480 acc=46.87% time=6.49s
Ep 3: loss=1.2061 acc=56.13% time=6.50s
Ep 4: loss=0.9932 acc=64.87% time=6.56s
Ep 5: loss=0.8537 acc=69.84% time=6.52s
[CUDA] Avg epoch time: 6.63s
============================================================
C4 Summary: GPU vs CPU (workers=4)
============================================================
CPU avg: 467.91s/epoch
GPU avg: 6.63s/epoch
GPU speedup: 70.6x

================================================================
PART A: C5 — Optimizer Comparison
================================================================
Device: cuda
ResNet-18 params: 11,173,962
============================================================
C5: Optimizer Comparison
============================================================
[SGD]
Ep   Loss     Acc%     Train(s)   Total(s)
 1   1.9082   31.52%       5.80      6.82
 2   1.4078   48.34%       5.37      6.42
 3   1.1673   57.93%       5.37      6.40
 4   0.9827   65.60%       5.38      6.38
 5   0.8566   69.93%       5.38      6.39
[SGD_NESTEROV]
Ep   Loss     Acc%     Train(s)   Total(s)
 1   1.9148   31.59%       5.42      6.46
 2   1.3372   51.10%       5.41      6.44
 3   1.0420   62.62%       5.42      6.51
 4   0.8773   69.03%       5.43      6.50
 5   0.7533   73.69%       5.42      6.47
[ADAM]
Ep   Loss     Acc%     Train(s)   Total(s)
 1   2.2094   20.82%       5.55      6.62
 2   1.8813   27.24%       5.52      6.61
 3   1.8430   28.43%       5.52      6.60
 4   1.8157   30.26%       5.52      6.59
 5   1.8078   30.47%       5.54      6.60
============================================================
C5 Summary: Optimizer Comparison (workers=4)
============================================================
Optimizer      AvgLoss   AvgAcc%   AvgTrain(s)
sgd             1.2645    54.66%          5.46
sgd_nesterov    1.1849    57.61%          5.42
adam            1.9114    27.44%          5.53

================================================================
PART A: C6 — Without Batch Norm
================================================================
Device: cuda
ResNet-18 params: 11,173,962
============================================================
C6: Without Batch Norm — 5 epochs (SGD, workers=4)
============================================================
Ep   Loss     Acc%     Train(s)   Total(s)
 1   1.9335   26.70%       4.63      5.62
 2   1.5540   42.90%       4.21      5.28
 3   1.3556   51.21%       4.22      5.26
 4   1.1640   58.84%       4.21      5.18
 5   1.0096   64.68%       4.21      5.19
C6 Summary => avg loss: 1.4034, avg acc: 48.87%

================================================================
PART B: C7–C10 — TorchScript
================================================================
Device: cuda
Training 5 epochs before scripting...
Ep 1: loss=1.8304 acc=32.97%
Ep 2: loss=1.3699 acc=49.58%
Ep 3: loss=1.0921 acc=60.89%
Ep 4: loss=0.9062 acc=67.85%
Ep 5: loss=0.7824 acc=72.45%
C7: Scripted model saved -> resnet18_scripted.pt
C7: Save/load verification — max diff: 0.0e+00
============================================================
C8: TorchScript Model Graph
============================================================
graph(%self.1 : __torch__.lab2.ResNet,
      %x.1 : Tensor):
  %42 : int = prim::Constant[value=-1]()
  %13 : Function = prim::Constant[name="relu"]()
  %12 : bool = prim::Constant[value=0]() # :0:0
  %41 : int = prim::Constant[value=1]() # /home/ubuntu/hpml_nyu/lab2.py:80:29
  %bn1.1 : __torch__.torch.nn.modules.batchnorm.BatchNorm2d = prim::GetAttr[name="bn1"](%self.1)
  %conv1.1 : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv1"](%self.1)
  %9 : Tensor = prim::CallMethod[name="forward"](%conv1.1, %x.1) # /home/ubuntu/hpml_nyu/lab2.py:74:28
  %10 : Tensor = prim::CallMethod[name="forward"](%bn1.1, %9) # /home/ubuntu/hpml_nyu/lab2.py:74:19
  %x0.1 : Tensor = prim::CallFunction(%13, %10, %12) # :0:0
  %layer1.1 : __torch__.torch.nn.modules.container.___torch_mangle_1.Sequential = prim::GetAttr[name="layer1"](%self.1)
  %x1.1 : Tensor = prim::CallMethod[name="forward"](%layer1.1, %x0.1) # /home/ubuntu/hpml_nyu/lab2.py:75:12
  %layer2.1 : __torch__.torch.nn.modules.container.___torch_mangle_9.Sequential = prim::GetAttr[name="layer2"](%self.1)
  %x2.1 : Tensor = prim::CallMethod[name="forward"](%layer2.1, %x1.1) # /home/ubuntu/hpml_nyu/lab2.py:76:12
  %layer3.1 : __torch__.torch.nn.modules.container.___torch_mangle_17.Sequential = prim::GetAttr[name="layer3"](%self.1)
  %x3.1 : Tensor = prim::CallMethod[name="forward"](%layer3.1, %x2.1) # /home/ubuntu/hpml_nyu/lab2.py:77:12
  %layer4.1 : __torch__.torch.nn.modules.container.___torch_mangle_25.Sequential = prim::GetAttr[name="layer4"](%self.1)
  %x4.1 : Tensor = prim::CallMethod[name="forward"](%layer4.1, %x3.1) # /home/ubuntu/hpml_nyu/lab2.py:78:12
  %avgpool.1 : __torch__.torch.nn.modules.pooling.AdaptiveAvgPool2d = prim::GetAttr[name="avgpool"](%self.1)
  %x5.1 : Tensor = prim::CallMethod[name="forward"](%avgpool.1, %x4.1) # /home/ubuntu/hpml_nyu/lab2.py:79:12
  %x6.1 : Tensor = aten::flatten(%x5.1, %41, %42) # /home/ubuntu/hpml_nyu/lab2.py:80:12
  %fc.1 : __torch__.torch.nn.modules.linear.Linear = prim::GetAttr[name="fc"](%self.1)
  %48 : Tensor = prim::CallMethod[name="forward"](%fc.1, %x6.1) # /home/ubuntu/hpml_nyu/lab2.py:81:15
  return (%48)
============================================================
C9: Test Set Accuracy
============================================================
PyTorch model:     70.94%
TorchScript model: 70.94%
============================================================
C10: Latency Comparison (single image, ms)
============================================================
              CPU (ms)   GPU (ms)
PyTorch          12.17       2.03
TorchScript      11.22       1.26
CPU speedup: 1.08x
CUDA speedup: 1.61x

================================================================
DONE — Thu Mar 5 12:50:13 PM UTC 2026
================================================================
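NOTE (C1/C3 methodology sketch): the Data(s)/Train(s) split above can be measured by timing the DataLoader fetch separately from the forward/backward/step work inside each epoch. This is a minimal sketch, not the lab's actual script — it uses a tiny synthetic dataset and linear model as stand-ins (the log's runs used ResNet-18), with `num_workers=0` so it runs anywhere; the C3 sweep simply reruns this with different `num_workers` values.

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny synthetic stand-ins so the sketch is self-contained.
ds = TensorDataset(torch.randn(256, 3, 8, 8), torch.randint(0, 10, (256,)))
loader = DataLoader(ds, batch_size=32, num_workers=0)  # C3 varies num_workers
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def run_epoch(model, loader):
    data_s = train_s = 0.0
    epoch_start = t0 = time.perf_counter()
    for x, y in loader:                 # time spent in this fetch = data time
        t1 = time.perf_counter()
        data_s += t1 - t0
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        # on GPU, call torch.cuda.synchronize() here before reading the clock
        t0 = time.perf_counter()
        train_s += t0 - t1
    total_s = time.perf_counter() - epoch_start
    return data_s, train_s, total_s

data_s, train_s, total_s = run_epoch(model, loader)
print(f"Data(s)={data_s:.3f} Train(s)={train_s:.3f} Total(s)={total_s:.3f}")
```

Timing the iterator this way explains why `num_workers=0` shows Data(s) comparable to Train(s): all decoding happens synchronously in the fetch, whereas with 4 workers it overlaps with compute and Data(s) collapses to under a second.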
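NOTE (C10 methodology sketch): single-image latency numbers like those above are only meaningful with warm-up iterations (to absorb JIT/cuDNN/allocator one-time costs) and, on GPU, `torch.cuda.synchronize()` around the timed region, since CUDA kernels launch asynchronously. A minimal sketch under those assumptions, demoed on CPU with a small stand-in model:

```python
import time
import torch
import torch.nn as nn

def measure_latency_ms(model, x, iters=50, warmup=5):
    """Median forward-pass latency in milliseconds for a single input."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm-up runs are not timed
            model(x)
        times = []
        for _ in range(iters):
            if x.is_cuda:
                torch.cuda.synchronize()   # flush pending async GPU work
            t0 = time.perf_counter()
            model(x)
            if x.is_cuda:
                torch.cuda.synchronize()   # wait for the kernel to finish
            times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]          # median is robust to outliers

# CPU demo; the log compared the eager and TorchScript ResNet-18 this way.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
print(f"latency: {measure_latency_ms(model, x):.3f} ms")
```

Running the same function on the eager and `torch.jit.load`-ed models, on CPU and CUDA inputs, yields the four cells of the C10 table.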