COMPARISON: VibeVoice vs Fish S2 Pro vs Sooktam-2
Reference audio: /home/ubuntu/vibevoice/demo/voices/modi.wav
Language: Hindi

============================================================
SOOKTAM-2 (BharatGen)
============================================================
Running setup-cls.sh...
setup-cls.sh stderr: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
Loading Sooktam-2 model...
Encountered exception while importing f5_tts: No module named 'f5_tts'
Sooktam-2 failed: This modeling file requires the following packages that were not found in your environment: f5_tts. Run `pip install f5_tts`
Traceback (most recent call last):
  File "/home/ubuntu/compare_tts.py", line 81, in test_sooktam2
    model = AutoModel.from_pretrained(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 549, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1346, in from_pretrained
    config_class = get_class_from_dynamic_module(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 604, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 427, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 260, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: f5_tts. Run `pip install f5_tts`

============================================================
VIBEVOICE 1.5B (our current best)
============================================================
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1773512317.647956 1891727 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/ubuntu/.local/lib/python3.10/site-packages/google/api_core/_python_version_support.py:275: FutureWarning: You are using a Python version (3.10.12) which Google will stop supporting in new releases of google.api_core once it reaches its end of life (2026-10-04). Please upgrade to the latest Python version, or at least Python 3.11, to continue receiving updates for google.api_core past that date.
  warnings.warn(message, FutureWarning)
APEX FusedRMSNorm not available, using native implementation
/home/ubuntu/vibevoice/vibevoice/processor/vibevoice_asr_processor.py:23: UserWarning: audio_utils not available, will fall back to soundfile for audio loading
  warnings.warn("audio_utils not available, will fall back to soundfile for audio loading")
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'Qwen2Tokenizer'.
The class this function is called from is 'VibeVoiceTextTokenizerFast'.
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/3 [00:00