APEX FusedRMSNorm not available, using native implementation
/home/ubuntu/vibevoice/vibevoice/processor/vibevoice_asr_processor.py:23: UserWarning: audio_utils not available, will fall back to soundfile for audio loading
  warnings.warn("audio_utils not available, will fall back to soundfile for audio loading")
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'Qwen2Tokenizer'.
The class this function is called from is 'VibeVoiceTextTokenizerFast'.
Loading checkpoint shards: 0%| | 0/3 [00:00 main()
  File "/home/ubuntu/vibevoice/cuda_graph_engine.py", line 312, in main
    r, out = benchmark_cuda_graphs(model, processor, voice_path, text, ddpm_steps=10, cfg_scale=1.3)
  File "/home/ubuntu/vibevoice/cuda_graph_engine.py", line 194, in benchmark_cuda_graphs
    cg_diffusion = CUDAGraphDiffusion(
  File "/home/ubuntu/vibevoice/cuda_graph_engine.py", line 50, in __init__
    self._capture()
  File "/home/ubuntu/vibevoice/cuda_graph_engine.py", line 72, in _capture
    with torch.cuda.graph(g):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/cuda/graphs.py", line 186, in __exit__
    self.cuda_graph.capture_end()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/cuda/graphs.py", line 84, in capture_end
    super().capture_end()
RuntimeError: CUDA error: operation failed due to a previous error during capture
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
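
The error message suggests setting CUDA_LAUNCH_BLOCKING=1 for debugging. The variable is read when CUDA initializes, so it has to be in the environment before the process starts. Below is a minimal sketch of one way to do that from Python by relaunching the script in a child process. The helper names `blocking_env` and `rerun_command` are hypothetical and not part of the original script, and the variable may not pinpoint the failing operation, since the failure here occurs during graph capture.

```python
import os
import sys


def blocking_env(base=None):
    """Return a copy of an environment mapping with CUDA_LAUNCH_BLOCKING=1.

    `base` defaults to the current process environment. The input mapping
    is copied, not modified.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_command(script, *args):
    """Build an argv list that runs `script` with the current interpreter."""
    return [sys.executable, script, *args]


# Hypothetical usage (not executed here), relaunching the failing script:
#
#   import subprocess
#   subprocess.run(rerun_command("cuda_graph_engine.py"),
#                  env=blocking_env(), check=False)
```

The equivalent from a shell is to prefix the command with `CUDA_LAUNCH_BLOCKING=1`. Building the environment for a child process avoids the case where the variable is set only after CUDA has already been initialized in the current process.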