APEX FusedRMSNorm not available, using native implementation
/home/ubuntu/vibevoice/vibevoice/processor/vibevoice_asr_processor.py:23: UserWarning: audio_utils not available, will fall back to soundfile for audio loading
  warnings.warn("audio_utils not available, will fall back to soundfile for audio loading")
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'Qwen2Tokenizer'.
The class this function is called from is 'VibeVoiceTextTokenizerFast'.
Loading checkpoint shards: 0%| | 0/3 [00:00 main()
  File "/home/ubuntu/vibevoice/rtf_optimize.py", line 116, in main
    _ = measure(model, processor, voice_path, "Speaker 1: warmup test.", ddpm_steps=20, cfg_scale=1.3)
  File "/home/ubuntu/vibevoice/rtf_optimize.py", line 50, in measure
    outputs = model.generate(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/vibevoice/vibevoice/modular/modeling_vibevoice_inference.py", line 628, in generate
    positive_condition = outputs.last_hidden_state[diffusion_indices, -1, :]
RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 259, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 225, in forward
    return self.weight * hidden_states.to(input_dtype). To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.
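
The RuntimeError message names two remedies: clone the tensor outside of torch.compile(), or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. The sketch below combines both in one small wrapper. It is only an illustration under assumptions: `run_step` and `scale` are hypothetical names, `scale` stands in for the compiled language-model forward, and the log does not show where torch.compile is applied. In the traceback's case the value to clone would presumably be `outputs.last_hidden_state` rather than a bare tensor.

```python
import torch


def run_step(fn, *args, **kwargs):
    """Call a (possibly torch.compile'd) function as one CUDA-graph step.

    Hypothetical helper for illustration; not part of VibeVoice or PyTorch.
    """
    if torch.cuda.is_available():
        # Mark the start of a new iteration before invoking the compiled
        # function, as the error message recommends.
        torch.compiler.cudagraph_mark_step_begin()
    out = fn(*args, **kwargs)
    # Copy the result so a later graph replay cannot overwrite the memory
    # this tensor points at.
    return out.clone() if isinstance(out, torch.Tensor) else out


def scale(x):
    # Stand-in for the model forward (assumption, not the real model).
    return x * 2.0


use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# mode="reduce-overhead" is the torch.compile mode that enables CUDA graphs.
# On a CPU-only machine the eager function is used so the sketch still runs.
step_fn = torch.compile(scale, mode="reduce-overhead") if use_cuda else scale

first = run_step(step_fn, torch.ones(4, device=device))
second = run_step(step_fn, torch.full((4,), 3.0, device=device))

# `first` is still readable after the second invocation because it was cloned.
print(first.cpu(), second.cpu())
```

Cloning costs one extra copy per call. If every output is consumed before the next invocation, marking the step alone may be enough, but that depends on how generate() reuses the hidden states.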