"VibeVoiceForConditionalGeneration" Instantiating VibeVoiceForConditionalGenerationInference model under default dtype torch.bfloat16. "VibeVoiceForConditionalGeneration" Instantiating VibeVoiceForConditionalGenerationInference model under default dtype torch.bfloat16. Generate config GenerationConfig {} [ERROR] : ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2. Traceback (most recent call last): model = VibeVoiceForConditionalGenerationInference.from_pretrained( raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}") ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2. Error loading the model. Trying to use SDPA. However, note that only flash_attention_2 has been fully tested, and using SDPA may result in lower audio quality. All model checkpoint weights were used when initializing VibeVoiceForConditionalGenerationInference. All the weights of VibeVoiceForConditionalGenerationInference were initialized from the model checkpoint at microsoft/VibeVoice-1.5B. If your task is similar to the task the model of the checkpoint was trained on, you can already use VibeVoiceForConditionalGenerationInference for predictions without further training. Generation config file not found, using a generation config created from the model config. Generating: 0%| | 0/936 [00:00