# Issue: fishaudio/fish-speech #843

- Repository: fishaudio/fish-speech | SOTA Open Source TTS | 27K stars | Python

## How to continue training with finetune? What are the steps to continue training

- Author: [@HimanshuRepozitory](https://github.com/HimanshuRepozitory)
- State: open
- Created: 2025-01-22T04:23:25Z
- Updated: 2025-03-20T02:21:30Z

### Discussed in https://github.com/fishaudio/fish-speech/discussions/577

Originally posted by **WithAngleOrDemon**, September 21, 2024

After stopping a fine-tuning run, I am unable to resume the previous training. How do I continue training? I tried using the newly generated LoRA model, but it still reported an error:

```
[2024-09-21 05:55:53,084][fish_speech.models.text2semantic.llama][INFO] - [rank: 0] Loaded weights with error: _IncompatibleKeys(missing_keys=['embeddings.lora_A', 'embeddings.lora_B', 'codebook_embeddings.lora_A', 'codebook_embeddings.lora_B', 'layers.0.attention.wqkv.lora_A', 'layers.0.attention.wqkv.lora_B', 'layers.0.attention.wo.lora_A', 'layers.0.attention.wo.lora_B', 'layers.0.feed_forward.w1.lora_A', 'layers.0.feed_forward.w1.lora_B', 'layers.0.feed_forward.w3.lora_A', 'layers.0.feed_forward.w3.lora_B', 'layers.0.feed_forward.w2.lora_A', 'layers.0.feed_forward.w2.lora_B', 'layers.1.attention.wqkv.lora_A', 'layers.1.attention.wqkv.lora_B', 'layers.1.attention.wo.lora_A', 'layers.1.attention.wo.lora_B', 'layers.1.feed_forward.w1.lora_A', 'layers.1.feed_forward.w1.lora_B', 'layers.1.feed_forward.w3.lora_A', 'layers.1.feed_forward.w3.lora_B', 'layers.1.feed_forward.w2.lora_A', 'layers.1.feed_forward.w2.lora_B', 'layers.2.attention.wqkv.lora_A', 'layers.2.attention.wqkv.lora_B', 'layers.2.attention.wo.lora_A', 'layers.2.attention.wo.lora_B', 'layers.2.feed_forward.w1.lora_A', 'layers.2.feed_forward.w1.lora_B', 'layers.2.feed_forward.w3.lora_A', 'layers.2.feed_forward.w3.lora_B', 'layers.2.feed_forward.w2.lora_A', 'layers.2.feed_forward.w2.lora_B', 
'layers.3.attention.wqkv.lora_A', 'layers.3.attention.wqkv.lora_B', 'layers.3.attention.wo.lora_A', 'layers.3.attention.wo.lora_B', 'layers.3.feed_forward.w1.lora_A', 'layers.3.feed_forward.w1.lora_B', 'layers.3.feed_forward.w3.lora_A', 'layers.3.feed_forward.w3.lora_B', 'layers.3.feed_forward.w2.lora_A', 'layers.3.feed_forward.w2.lora_B', 'layers.4.attention.wqkv.lora_A', 'layers.4.attention.wqkv.lora_B', 'layers.4.attention.wo.lora_A', 'layers.4.attention.wo.lora_B', 'layers.4.feed_forward.w1.lora_A', 'layers.4.feed_forward.w1.lora_B', 'layers.4.feed_forward.w3.lora_A', 'layers.4.feed_forward.w3.lora_B', 'layers.4.feed_forward.w2.lora_A', 'layers.4.feed_forward.w2.lora_B', 'layers.5.attention.wqkv.lora_A', 'layers.5.attention.wqkv.lora_B', 'layers.5.attention.wo.lora_A', 'layers.5.attention.wo.lora_B', 'layers.5.feed_forward.w1.lora_A', 'layers.5.feed_forward.w1.lora_B', 'layers.5.feed_forward.w3.lora_A', 'layers.5.feed_forward.w3.lora_B', 'layers.5.feed_forward.w2.lora_A', 'layers.5.feed_forward.w2.lora_B', 'layers.6.attention.wqkv.lora_A', 'layers.6.attention.wqkv.lora_B', 'layers.6.attention.wo.lora_A', 'layers.6.attention.wo.lora_B', 'layers.6.feed_forward.w1.lora_A', 'layers.6.feed_forward.w1.lora_B', 'layers.6.feed_forward.w3.lora_A', 'layers.6.feed_forward.w3.lora_B', 'layers.6.feed_forward.w2.lora_A', 'layers.6.feed_forward.w2.lora_B', 'layers.7.attention.wqkv.lora_A', 'layers.7.attention.wqkv.lora_B', 'layers.7.attention.wo.lora_A', 'layers.7.attention.wo.lora_B', 'layers.7.feed_forward.w1.lora_A', 'layers.7.feed_forward.w1.lora_B', 'layers.7.feed_forward.w3.lora_A', 'layers.7.feed_forward.w3.lora_B', 'layers.7.feed_forward.w2.lora_A', 'layers.7.feed_forward.w2.lora_B', 'layers.8.attention.wqkv.lora_A', 'layers.8.attention.wqkv.lora_B', 'layers.8.attention.wo.lora_A', 'layers.8.attention.wo.lora_B', 'layers.8.feed_forward.w1.lora_A', 'layers.8.feed_forward.w1.lora_B', 'layers.8.feed_forward.w3.lora_A', 'layers.8.feed_forward.w3.lora_B', 
'layers.8.feed_forward.w2.lora_A', 'layers.8.feed_forward.w2.lora_B', 'layers.9.attention.wqkv.lora_A', 'layers.9.attention.wqkv.lora_B', 'layers.9.attention.wo.lora_A', 'layers.9.attention.wo.lora_B', 'layers.9.feed_forward.w1.lora_A', 'layers.9.feed_forward.w1.lora_B', 'layers.9.feed_forward.w3.lora_A', 'layers.9.feed_forward.w3.lora_B', 'layers.9.feed_forward.w2.lora_A', 'layers.9.feed_forward.w2.lora_B', 'layers.10.attention.wqkv.lora_A', 'layers.10.attention.wqkv.lora_B', 'layers.10.attention.wo.lora_A', 'layers.10.attention.wo.lora_B', 'layers.10.feed_forward.w1.lora_A', 'layers.10.feed_forward.w1.lora_B', 'layers.10.feed_forward.w3.lora_A', 'layers.10.feed_forward.w3.lora_B', 'layers.10.feed_forward.w2.lora_A', 'layers.10.feed_forward.w2.lora_B', 'layers.11.attention.wqkv.lora_A', 'layers.11.attention.wqkv.lora_B', 'layers.11.attention.wo.lora_A', 'layers.11.attention.wo.lora_B', 'layers.11.feed_forward.w1.lora_A', 'layers.11.feed_forward.w1.lora_B', 'layers.11.feed_forward.w3.lora_A', 'layers.11.feed_forward.w3.lora_B', 'layers.11.feed_forward.w2.lora_A', 'layers.11.feed_forward.w2.lora_B', 'layers.12.attention.wqkv.lora_A', 'layers.12.attention.wqkv.lora_B', 'layers.12.attention.wo.lora_A', 'layers.12.attention.wo.lora_B', 'layers.12.feed_forward.w1.lora_A', 'layers.12.feed_forward.w1.lora_B', 'layers.12.feed_forward.w3.lora_A', 'layers.12.feed_forward.w3.lora_B', 'layers.12.feed_forward.w2.lora_A', 'layers.12.feed_forward.w2.lora_B', 'layers.13.attention.wqkv.lora_A', 'layers.13.attention.wqkv.lora_B', 'layers.13.attention.wo.lora_A', 'layers.13.attention.wo.lora_B', 'layers.13.feed_forward.w1.lora_A', 'layers.13.feed_forward.w1.lora_B', 'layers.13.feed_forward.w3.lora_A', 'layers.13.feed_forward.w3.lora_B', 'layers.13.feed_forward.w2.lora_A', 'layers.13.feed_forward.w2.lora_B', 'layers.14.attention.wqkv.lora_A', 'layers.14.attention.wqkv.lora_B', 'layers.14.attention.wo.lora_A', 'layers.14.attention.wo.lora_B', 'layers.14.feed_forward.w1.lora_A', 
'layers.14.feed_forward.w1.lora_B', 'layers.14.feed_forward.w3.lora_A', 'layers.14.feed_forward.w3.lora_B', 'layers.14.feed_forward.w2.lora_A', 'layers.14.feed_forward.w2.lora_B', 'layers.15.attention.wqkv.lora_A', 'layers.15.attention.wqkv.lora_B', 'layers.15.attention.wo.lora_A', 'layers.15.attention.wo.lora_B', 'layers.15.feed_forward.w1.lora_A', 'layers.15.feed_forward.w1.lora_B', 'layers.15.feed_forward.w3.lora_A', 'layers.15.feed_forward.w3.lora_B', 'layers.15.feed_forward.w2.lora_A', 'layers.15.feed_forward.w2.lora_B', 'layers.16.attention.wqkv.lora_A', 'layers.16.attention.wqkv.lora_B', 'layers.16.attention.wo.lora_A', 'layers.16.attention.wo.lora_B', 'layers.16.feed_forward.w1.lora_A', 'layers.16.feed_forward.w1.lora_B', 'layers.16.feed_forward.w3.lora_A', 'layers.16.feed_forward.w3.lora_B', 'layers.16.feed_forward.w2.lora_A', 'layers.16.feed_forward.w2.lora_B', 'layers.17.attention.wqkv.lora_A', 'layers.17.attention.wqkv.lora_B', 'layers.17.attention.wo.lora_A', 'layers.17.attention.wo.lora_B', 'layers.17.feed_forward.w1.lora_A', 'layers.17.feed_forward.w1.lora_B', 'layers.17.feed_forward.w3.lora_A', 'layers.17.feed_forward.w3.lora_B', 'layers.17.feed_forward.w2.lora_A', 'layers.17.feed_forward.w2.lora_B', 'layers.18.attention.wqkv.lora_A', 'layers.18.attention.wqkv.lora_B', 'layers.18.attention.wo.lora_A', 'layers.18.attention.wo.lora_B', 'layers.18.feed_forward.w1.lora_A', 'layers.18.feed_forward.w1.lora_B', 'layers.18.feed_forward.w3.lora_A', 'layers.18.feed_forward.w3.lora_B', 'layers.18.feed_forward.w2.lora_A', 'layers.18.feed_forward.w2.lora_B', 'layers.19.attention.wqkv.lora_A', 'layers.19.attention.wqkv.lora_B', 'layers.19.attention.wo.lora_A', 'layers.19.attention.wo.lora_B', 'layers.19.feed_forward.w1.lora_A', 'layers.19.feed_forward.w1.lora_B', 'layers.19.feed_forward.w3.lora_A', 'layers.19.feed_forward.w3.lora_B', 'layers.19.feed_forward.w2.lora_A', 'layers.19.feed_forward.w2.lora_B', 'layers.20.attention.wqkv.lora_A', 
'layers.20.attention.wqkv.lora_B', 'layers.20.attention.wo.lora_A', 'layers.20.attention.wo.lora_B', 'layers.20.feed_forward.w1.lora_A', 'layers.20.feed_forward.w1.lora_B', 'layers.20.feed_forward.w3.lora_A', 'layers.20.feed_forward.w3.lora_B', 'layers.20.feed_forward.w2.lora_A', 'layers.20.feed_forward.w2.lora_B', 'layers.21.attention.wqkv.lora_A', 'layers.21.attention.wqkv.lora_B', 'layers.21.attention.wo.lora_A', 'layers.21.attention.wo.lora_B', 'layers.21.feed_forward.w1.lora_A', 'layers.21.feed_forward.w1.lora_B', 'layers.21.feed_forward.w3.lora_A', 'layers.21.feed_forward.w3.lora_B', 'layers.21.feed_forward.w2.lora_A', 'layers.21.feed_forward.w2.lora_B', 'layers.22.attention.wqkv.lora_A', 'layers.22.attention.wqkv.lora_B', 'layers.22.attention.wo.lora_A', 'layers.22.attention.wo.lora_B', 'layers.22.feed_forward.w1.lora_A', 'layers.22.feed_forward.w1.lora_B', 'layers.22.feed_forward.w3.lora_A', 'layers.22.feed_forward.w3.lora_B', 'layers.22.feed_forward.w2.lora_A', 'layers.22.feed_forward.w2.lora_B', 'layers.23.attention.wqkv.lora_A', 'layers.23.attention.wqkv.lora_B', 'layers.23.attention.wo.lora_A', 'layers.23.attention.wo.lora_B', 'layers.23.feed_forward.w1.lora_A', 'layers.23.feed_forward.w1.lora_B', 'layers.23.feed_forward.w3.lora_A', 'layers.23.feed_forward.w3.lora_B', 'layers.23.feed_forward.w2.lora_A', 'layers.23.feed_forward.w2.lora_B', 'output.lora_A', 'output.lora_B', 'fast_embeddings.lora_A', 'fast_embeddings.lora_B', 'fast_layers.0.attention.wqkv.lora_A', 'fast_layers.0.attention.wqkv.lora_B', 'fast_layers.0.attention.wo.lora_A', 'fast_layers.0.attention.wo.lora_B', 'fast_layers.0.feed_forward.w1.lora_A', 'fast_layers.0.feed_forward.w1.lora_B', 'fast_layers.0.feed_forward.w3.lora_A', 'fast_layers.0.feed_forward.w3.lora_B', 'fast_layers.0.feed_forward.w2.lora_A', 'fast_layers.0.feed_forward.w2.lora_B', 'fast_layers.1.attention.wqkv.lora_A', 'fast_layers.1.attention.wqkv.lora_B', 'fast_layers.1.attention.wo.lora_A', 
'fast_layers.1.attention.wo.lora_B', 'fast_layers.1.feed_forward.w1.lora_A', 'fast_layers.1.feed_forward.w1.lora_B', 'fast_layers.1.feed_forward.w3.lora_A', 'fast_layers.1.feed_forward.w3.lora_B', 'fast_layers.1.feed_forward.w2.lora_A', 'fast_layers.1.feed_forward.w2.lora_B', 'fast_layers.2.attention.wqkv.lora_A', 'fast_layers.2.attention.wqkv.lora_B', 'fast_layers.2.attention.wo.lora_A', 'fast_layers.2.attention.wo.lora_B', 'fast_layers.2.feed_forward.w1.lora_A', 'fast_layers.2.feed_forward.w1.lora_B', 'fast_layers.2.feed_forward.w3.lora_A', 'fast_layers.2.feed_forward.w3.lora_B', 'fast_layers.2.feed_forward.w2.lora_A', 'fast_layers.2.feed_forward.w2.lora_B', 'fast_layers.3.attention.wqkv.lora_A', 'fast_layers.3.attention.wqkv.lora_B', 'fast_layers.3.attention.wo.lora_A', 'fast_layers.3.attention.wo.lora_B', 'fast_layers.3.feed_forward.w1.lora_A', 'fast_layers.3.feed_forward.w1.lora_B', 'fast_layers.3.feed_forward.w3.lora_A', 'fast_layers.3.feed_forward.w3.lora_B', 'fast_layers.3.feed_forward.w2.lora_A', 'fast_layers.3.feed_forward.w2.lora_B', 'fast_output.lora_A', 'fast_output.lora_B'], unexpected_keys=[])
[2024-09-21 05:55:53,093][__main__][INFO] - [rank: 0] Instantiating callbacks...
[2024-09-21 05:55:53,093][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback
[2024-09-21 05:55:53,098][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback
[2024-09-21 05:55:53,099][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback
[2024-09-21 05:55:53,099][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback
[2024-09-21 05:55:53,125][__main__][INFO] - [rank: 0] Instantiating loggers...
[2024-09-21 05:55:53,125][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating logger
[2024-09-21 05:55:53,129][__main__][INFO] - [rank: 0] Instantiating trainer
Trainer already configured with model summary callbacks: []. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[2024-09-21 05:55:53,188][__main__][INFO] - [rank: 0] Logging hyperparameters!
2024-09-21 05:55:53.504280: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-21 05:55:53.520652: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-21 05:55:53.525473: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-21 05:55:53.537436: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-21 05:55:54.685740: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2024-09-21 05:55:55,703][__main__][INFO] - [rank: 0] Starting training!
[2024-09-21 05:55:55,709][__main__][INFO] - [rank: 0] Resuming from checkpoint: results/checkpoints/last.ckpt
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered.
Starting with 1 processes
----------------------------------------------------------------------------------------------------

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/results/checkpoints exists and is not empty.
Restoring states from the checkpoint path at results/checkpoints/last.ckpt
[2024-09-21 05:55:56,694][fish_speech.utils.utils][ERROR] - [rank: 0]
Traceback (most recent call last):
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 66, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 110, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks
    self.restore_model()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model
    self.trainer.strategy.load_model_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TextToSemantic:
    Missing key(s) in state_dict: "model.embeddings.weight", "model.codebook_embeddings.weight", "model.layers.0.attention.wqkv.weight", "model.layers.0.attention.wo.weight", "model.layers.0.feed_forward.w1.weight", "model.layers.0.feed_forward.w3.weight", "model.layers.0.feed_forward.w2.weight", "model.layers.0.ffn_norm.weight", "model.layers.0.attention_norm.weight", "model.layers.1.attention.wqkv.weight", "model.layers.1.attention.wo.weight", "model.layers.1.feed_forward.w1.weight", "model.layers.1.feed_forward.w3.weight", "model.layers.1.feed_forward.w2.weight", "model.layers.1.ffn_norm.weight", "model.layers.1.attention_norm.weight", "model.layers.2.attention.wqkv.weight", "model.layers.2.attention.wo.weight", "model.layers.2.feed_forward.w1.weight", "model.layers.2.feed_forward.w3.weight", "model.layers.2.feed_forward.w2.weight", "model.layers.2.ffn_norm.weight", "model.layers.2.attention_norm.weight", "model.layers.3.attention.wqkv.weight", "model.layers.3.attention.wo.weight", "model.layers.3.feed_forward.w1.weight", "model.layers.3.feed_forward.w3.weight", "model.layers.3.feed_forward.w2.weight", "model.layers.3.ffn_norm.weight", "model.layers.3.attention_norm.weight", "model.layers.4.attention.wqkv.weight", "model.layers.4.attention.wo.weight", "model.layers.4.feed_forward.w1.weight", "model.layers.4.feed_forward.w3.weight", 
"model.layers.4.feed_forward.w2.weight", "model.layers.4.ffn_norm.weight", "model.layers.4.attention_norm.weight", "model.layers.5.attention.wqkv.weight", "model.layers.5.attention.wo.weight", "model.layers.5.feed_forward.w1.weight", "model.layers.5.feed_forward.w3.weight", "model.layers.5.feed_forward.w2.weight", "model.layers.5.ffn_norm.weight", "model.layers.5.attention_norm.weight", "model.layers.6.attention.wqkv.weight", "model.layers.6.attention.wo.weight", "model.layers.6.feed_forward.w1.weight", "model.layers.6.feed_forward.w3.weight", "model.layers.6.feed_forward.w2.weight", "model.layers.6.ffn_norm.weight", "model.layers.6.attention_norm.weight", "model.layers.7.attention.wqkv.weight", "model.layers.7.attention.wo.weight", "model.layers.7.feed_forward.w1.weight", "model.layers.7.feed_forward.w3.weight", "model.layers.7.feed_forward.w2.weight", "model.layers.7.ffn_norm.weight", "model.layers.7.attention_norm.weight", "model.layers.8.attention.wqkv.weight", "model.layers.8.attention.wo.weight", "model.layers.8.feed_forward.w1.weight", "model.layers.8.feed_forward.w3.weight", "model.layers.8.feed_forward.w2.weight", "model.layers.8.ffn_norm.weight", "model.layers.8.attention_norm.weight", "model.layers.9.attention.wqkv.weight", "model.layers.9.attention.wo.weight", "model.layers.9.feed_forward.w1.weight", "model.layers.9.feed_forward.w3.weight", "model.layers.9.feed_forward.w2.weight", "model.layers.9.ffn_norm.weight", "model.layers.9.attention_norm.weight", "model.layers.10.attention.wqkv.weight", "model.layers.10.attention.wo.weight", "model.layers.10.feed_forward.w1.weight", "model.layers.10.feed_forward.w3.weight", "model.layers.10.feed_forward.w2.weight", "model.layers.10.ffn_norm.weight", "model.layers.10.attention_norm.weight", "model.layers.11.attention.wqkv.weight", "model.layers.11.attention.wo.weight", "model.layers.11.feed_forward.w1.weight", "model.layers.11.feed_forward.w3.weight", "model.layers.11.feed_forward.w2.weight", 
"model.layers.11.ffn_norm.weight", "model.layers.11.attention_norm.weight", "model.layers.12.attention.wqkv.weight", "model.layers.12.attention.wo.weight", "model.layers.12.feed_forward.w1.weight", "model.layers.12.feed_forward.w3.weight", "model.layers.12.feed_forward.w2.weight", "model.layers.12.ffn_norm.weight", "model.layers.12.attention_norm.weight", "model.layers.13.attention.wqkv.weight", "model.layers.13.attention.wo.weight", "model.layers.13.feed_forward.w1.weight", "model.layers.13.feed_forward.w3.weight", "model.layers.13.feed_forward.w2.weight", "model.layers.13.ffn_norm.weight", "model.layers.13.attention_norm.weight", "model.layers.14.attention.wqkv.weight", "model.layers.14.attention.wo.weight", "model.layers.14.feed_forward.w1.weight", "model.layers.14.feed_forward.w3.weight", "model.layers.14.feed_forward.w2.weight", "model.layers.14.ffn_norm.weight", "model.layers.14.attention_norm.weight", "model.layers.15.attention.wqkv.weight", "model.layers.15.attention.wo.weight", "model.layers.15.feed_forward.w1.weight", "model.layers.15.feed_forward.w3.weight", "model.layers.15.feed_forward.w2.weight", "model.layers.15.ffn_norm.weight", "model.layers.15.attention_norm.weight", "model.layers.16.attention.wqkv.weight", "model.layers.16.attention.wo.weight", "model.layers.16.feed_forward.w1.weight", "model.layers.16.feed_forward.w3.weight", "model.layers.16.feed_forward.w2.weight", "model.layers.16.ffn_norm.weight", "model.layers.16.attention_norm.weight", "model.layers.17.attention.wqkv.weight", "model.layers.17.attention.wo.weight", "model.layers.17.feed_forward.w1.weight", "model.layers.17.feed_forward.w3.weight", "model.layers.17.feed_forward.w2.weight", "model.layers.17.ffn_norm.weight", "model.layers.17.attention_norm.weight", "model.layers.18.attention.wqkv.weight", "model.layers.18.attention.wo.weight", "model.layers.18.feed_forward.w1.weight", "model.layers.18.feed_forward.w3.weight", "model.layers.18.feed_forward.w2.weight", 
"model.layers.18.ffn_norm.weight", "model.layers.18.attention_norm.weight", "model.layers.19.attention.wqkv.weight", "model.layers.19.attention.wo.weight", "model.layers.19.feed_forward.w1.weight", "model.layers.19.feed_forward.w3.weight", "model.layers.19.feed_forward.w2.weight", "model.layers.19.ffn_norm.weight", "model.layers.19.attention_norm.weight", "model.layers.20.attention.wqkv.weight", "model.layers.20.attention.wo.weight", "model.layers.20.feed_forward.w1.weight", "model.layers.20.feed_forward.w3.weight", "model.layers.20.feed_forward.w2.weight", "model.layers.20.ffn_norm.weight", "model.layers.20.attention_norm.weight", "model.layers.21.attention.wqkv.weight", "model.layers.21.attention.wo.weight", "model.layers.21.feed_forward.w1.weight", "model.layers.21.feed_forward.w3.weight", "model.layers.21.feed_forward.w2.weight", "model.layers.21.ffn_norm.weight", "model.layers.21.attention_norm.weight", "model.layers.22.attention.wqkv.weight", "model.layers.22.attention.wo.weight", "model.layers.22.feed_forward.w1.weight", "model.layers.22.feed_forward.w3.weight", "model.layers.22.feed_forward.w2.weight", "model.layers.22.ffn_norm.weight", "model.layers.22.attention_norm.weight", "model.layers.23.attention.wqkv.weight", "model.layers.23.attention.wo.weight", "model.layers.23.feed_forward.w1.weight", "model.layers.23.feed_forward.w3.weight", "model.layers.23.feed_forward.w2.weight", "model.layers.23.ffn_norm.weight", "model.layers.23.attention_norm.weight", "model.norm.weight", "model.output.weight", "model.fast_embeddings.weight", "model.fast_layers.0.attention.wqkv.weight", "model.fast_layers.0.attention.wo.weight", "model.fast_layers.0.feed_forward.w1.weight", "model.fast_layers.0.feed_forward.w3.weight", "model.fast_layers.0.feed_forward.w2.weight", "model.fast_layers.0.ffn_norm.weight", "model.fast_layers.0.attention_norm.weight", "model.fast_layers.1.attention.wqkv.weight", "model.fast_layers.1.attention.wo.weight", 
"model.fast_layers.1.feed_forward.w1.weight", "model.fast_layers.1.feed_forward.w3.weight", "model.fast_layers.1.feed_forward.w2.weight", "model.fast_layers.1.ffn_norm.weight", "model.fast_layers.1.attention_norm.weight", "model.fast_layers.2.attention.wqkv.weight", "model.fast_layers.2.attention.wo.weight", "model.fast_layers.2.feed_forward.w1.weight", "model.fast_layers.2.feed_forward.w3.weight", "model.fast_layers.2.feed_forward.w2.weight", "model.fast_layers.2.ffn_norm.weight", "model.fast_layers.2.attention_norm.weight", "model.fast_layers.3.attention.wqkv.weight", "model.fast_layers.3.attention.wo.weight", "model.fast_layers.3.feed_forward.w1.weight", "model.fast_layers.3.feed_forward.w3.weight", "model.fast_layers.3.feed_forward.w2.weight", "model.fast_layers.3.ffn_norm.weight", "model.fast_layers.3.attention_norm.weight", "model.fast_norm.weight", "model.fast_output.weight". [2024-09-21 05:55:56,701][fish_speech.utils.utils][INFO] - [rank: 0] Output dir: results/ Error executing job with overrides: ['project=', '+lora@model.model.lora_config=r_8_alpha_16'] Traceback (most recent call last): File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 137, in main train(cfg) File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 77, in wrap raise ex File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 66, in wrap metric_dict, object_dict = task_func(cfg=cfg) File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 110, in train trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path) File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit call._call_and_handle_interrupt( File 
"/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch return function(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl self._run(model, ckpt_path=ckpt_path) File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path) File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks self.restore_model() File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model self.trainer.strategy.load_model_state_dict( File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2215, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for TextToSemantic: Missing key(s) in state_dict: "model.embeddings.weight", "model.codebook_embeddings.weight", "model.layers.0.attention.wqkv.weight", "model.layers.0.attention.wo.weight", "model.layers.0.feed_forward.w1.weight", "model.layers.0.feed_forward.w3.weight", "model.layers.0.feed_forward.w2.weight", "model.layers.0.ffn_norm.weight", "model.layers.0.attention_norm.weight", "model.layers.1.attention.wqkv.weight", "model.layers.1.attention.wo.weight", "model.layers.1.feed_forward.w1.weight", 
"model.layers.1.feed_forward.w3.weight", "model.layers.1.feed_forward.w2.weight", "model.layers.1.ffn_norm.weight", "model.layers.1.attention_norm.weight", "model.layers.2.attention.wqkv.weight", "model.layers.2.attention.wo.weight", "model.layers.2.feed_forward.w1.weight", "model.layers.2.feed_forward.w3.weight", "model.layers.2.feed_forward.w2.weight", "model.layers.2.ffn_norm.weight", "model.layers.2.attention_norm.weight", "model.layers.3.attention.wqkv.weight", "model.layers.3.attention.wo.weight", "model.layers.3.feed_forward.w1.weight", "model.layers.3.feed_forward.w3.weight", "model.layers.3.feed_forward.w2.weight", "model.layers.3.ffn_norm.weight", "model.layers.3.attention_norm.weight", "model.layers.4.attention.wqkv.weight", "model.layers.4.attention.wo.weight", "model.layers.4.feed_forward.w1.weight", "model.layers.4.feed_forward.w3.weight", "model.layers.4.feed_forward.w2.weight", "model.layers.4.ffn_norm.weight", "model.layers.4.attention_norm.weight", "model.layers.5.attention.wqkv.weight", "model.layers.5.attention.wo.weight", "model.layers.5.feed_forward.w1.weight", "model.layers.5.feed_forward.w3.weight", "model.layers.5.feed_forward.w2.weight", "model.layers.5.ffn_norm.weight", "model.layers.5.attention_norm.weight", "model.layers.6.attention.wqkv.weight", "model.layers.6.attention.wo.weight", "model.layers.6.feed_forward.w1.weight", "model.layers.6.feed_forward.w3.weight", "model.layers.6.feed_forward.w2.weight", "model.layers.6.ffn_norm.weight", "model.layers.6.attention_norm.weight", "model.layers.7.attention.wqkv.weight", "model.layers.7.attention.wo.weight", "model.layers.7.feed_forward.w1.weight", "model.layers.7.feed_forward.w3.weight", "model.layers.7.feed_forward.w2.weight", "model.layers.7.ffn_norm.weight", "model.layers.7.attention_norm.weight", "model.layers.8.attention.wqkv.weight", "model.layers.8.attention.wo.weight", "model.layers.8.feed_forward.w1.weight", "model.layers.8.feed_forward.w3.weight", 
"model.layers.8.feed_forward.w2.weight", "model.layers.8.ffn_norm.weight", "model.layers.8.attention_norm.weight", "model.layers.9.attention.wqkv.weight", "model.layers.9.attention.wo.weight", "model.layers.9.feed_forward.w1.weight", "model.layers.9.feed_forward.w3.weight", "model.layers.9.feed_forward.w2.weight", "model.layers.9.ffn_norm.weight", "model.layers.9.attention_norm.weight", "model.layers.10.attention.wqkv.weight", "model.layers.10.attention.wo.weight", "model.layers.10.feed_forward.w1.weight", "model.layers.10.feed_forward.w3.weight", "model.layers.10.feed_forward.w2.weight", "model.layers.10.ffn_norm.weight", "model.layers.10.attention_norm.weight", "model.layers.11.attention.wqkv.weight", "model.layers.11.attention.wo.weight", "model.layers.11.feed_forward.w1.weight", "model.layers.11.feed_forward.w3.weight", "model.layers.11.feed_forward.w2.weight", "model.layers.11.ffn_norm.weight", "model.layers.11.attention_norm.weight", "model.layers.12.attention.wqkv.weight", "model.layers.12.attention.wo.weight", "model.layers.12.feed_forward.w1.weight", "model.layers.12.feed_forward.w3.weight", "model.layers.12.feed_forward.w2.weight", "model.layers.12.ffn_norm.weight", "model.layers.12.attention_norm.weight", "model.layers.13.attention.wqkv.weight", "model.layers.13.attention.wo.weight", "model.layers.13.feed_forward.w1.weight", "model.layers.13.feed_forward.w3.weight", "model.layers.13.feed_forward.w2.weight", "model.layers.13.ffn_norm.weight", "model.layers.13.attention_norm.weight", "model.layers.14.attention.wqkv.weight", "model.layers.14.attention.wo.weight", "model.layers.14.feed_forward.w1.weight", "model.layers.14.feed_forward.w3.weight", "model.layers.14.feed_forward.w2.weight", "model.layers.14.ffn_norm.weight", "model.layers.14.attention_norm.weight", "model.layers.15.attention.wqkv.weight", "model.layers.15.attention.wo.weight", "model.layers.15.feed_forward.w1.weight", "model.layers.15.feed_forward.w3.weight", 
"model.layers.15.feed_forward.w2.weight", "model.layers.15.ffn_norm.weight", "model.layers.15.attention_norm.weight", "model.layers.16.attention.wqkv.weight", "model.layers.16.attention.wo.weight", "model.layers.16.feed_forward.w1.weight", "model.layers.16.feed_forward.w3.weight", "model.layers.16.feed_forward.w2.weight", "model.layers.16.ffn_norm.weight", "model.layers.16.attention_norm.weight", "model.layers.17.attention.wqkv.weight", "model.layers.17.attention.wo.weight", "model.layers.17.feed_forward.w1.weight", "model.layers.17.feed_forward.w3.weight", "model.layers.17.feed_forward.w2.weight", "model.layers.17.ffn_norm.weight", "model.layers.17.attention_norm.weight", "model.layers.18.attention.wqkv.weight", "model.layers.18.attention.wo.weight", "model.layers.18.feed_forward.w1.weight", "model.layers.18.feed_forward.w3.weight", "model.layers.18.feed_forward.w2.weight", "model.layers.18.ffn_norm.weight", "model.layers.18.attention_norm.weight", "model.layers.19.attention.wqkv.weight", "model.layers.19.attention.wo.weight", "model.layers.19.feed_forward.w1.weight", "model.layers.19.feed_forward.w3.weight", "model.layers.19.feed_forward.w2.weight", "model.layers.19.ffn_norm.weight", "model.layers.19.attention_norm.weight", "model.layers.20.attention.wqkv.weight", "model.layers.20.attention.wo.weight", "model.layers.20.feed_forward.w1.weight", "model.layers.20.feed_forward.w3.weight", "model.layers.20.feed_forward.w2.weight", "model.layers.20.ffn_norm.weight", "model.layers.20.attention_norm.weight", "model.layers.21.attention.wqkv.weight", "model.layers.21.attention.wo.weight", "model.layers.21.feed_forward.w1.weight", "model.layers.21.feed_forward.w3.weight", "model.layers.21.feed_forward.w2.weight", "model.layers.21.ffn_norm.weight", "model.layers.21.attention_norm.weight", "model.layers.22.attention.wqkv.weight", "model.layers.22.attention.wo.weight", "model.layers.22.feed_forward.w1.weight", "model.layers.22.feed_forward.w3.weight", 
"model.layers.22.feed_forward.w2.weight", "model.layers.22.ffn_norm.weight", "model.layers.22.attention_norm.weight", "model.layers.23.attention.wqkv.weight", "model.layers.23.attention.wo.weight", "model.layers.23.feed_forward.w1.weight", "model.layers.23.feed_forward.w3.weight", "model.layers.23.feed_forward.w2.weight", "model.layers.23.ffn_norm.weight", "model.layers.23.attention_norm.weight", "model.norm.weight", "model.output.weight", "model.fast_embeddings.weight", "model.fast_layers.0.attention.wqkv.weight", "model.fast_layers.0.attention.wo.weight", "model.fast_layers.0.feed_forward.w1.weight", "model.fast_layers.0.feed_forward.w3.weight", "model.fast_layers.0.feed_forward.w2.weight", "model.fast_layers.0.ffn_norm.weight", "model.fast_layers.0.attention_norm.weight", "model.fast_layers.1.attention.wqkv.weight", "model.fast_layers.1.attention.wo.weight", "model.fast_layers.1.feed_forward.w1.weight", "model.fast_layers.1.feed_forward.w3.weight", "model.fast_layers.1.feed_forward.w2.weight", "model.fast_layers.1.ffn_norm.weight", "model.fast_layers.1.attention_norm.weight", "model.fast_layers.2.attention.wqkv.weight", "model.fast_layers.2.attention.wo.weight", "model.fast_layers.2.feed_forward.w1.weight", "model.fast_layers.2.feed_forward.w3.weight", "model.fast_layers.2.feed_forward.w2.weight", "model.fast_layers.2.ffn_norm.weight", "model.fast_layers.2.attention_norm.weight", "model.fast_layers.3.attention.wqkv.weight", "model.fast_layers.3.attention.wo.weight", "model.fast_layers.3.feed_forward.w1.weight", "model.fast_layers.3.feed_forward.w3.weight", "model.fast_layers.3.feed_forward.w2.weight", "model.fast_layers.3.ffn_norm.weight", "model.fast_layers.3.attention_norm.weight", "model.fast_norm.weight", "model.fast_output.weight". Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. 
```

---

### Timeline

**@Stardust-minus** commented · Jan 26, 2025 at 8:21am

> Add `ckpt_path` in config.yml

**@med1844** commented · Jan 28, 2025 at 2:16am

> `tree | grep config.yml` yields no result, i.e. there's no such file.

**@abhisirka2001** commented · Jan 28, 2025 at 9:41am

> I am facing the same issue.
> Can somebody help?
> I have passed the merged model directory in config.yaml: `pretrained_ckpt_path: /workspace/FISHSPEECH/lora_weights`.
> @Stardust-minus @HimanshuRepozitory

**Stardust-minus** was mentioned · Jan 28, 2025 at 9:41am

**HimanshuRepozitory** was mentioned · Jan 28, 2025 at 9:41am

**abhisirka2001** was mentioned · Feb 11, 2025 at 3am

**@ootsuka-repos** commented · Mar 3, 2025 at 9:02am

> @Stardust-minus @HimanshuRepozitory @med1844 @abhisirka2001
>
> I ran into the same problem.
> I checked, and it seems that the checkpoint saved during training contains keys used by the LoRA training. The mismatch between these keys and the pretrained model's keys causes the error.
>
> Put `model._strict_loading = False` before `trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path)` in train.py.

**med1844** was mentioned · Mar 3, 2025 at 9:02am

**Stardust-minus** was mentioned · Mar 3, 2025 at 9:02am

**abhisirka2001** was mentioned · Mar 3, 2025 at 9:02am

**HimanshuRepozitory** was mentioned · Mar 3, 2025 at 9:02am

**@Pandede** commented · Mar 20, 2025 at 2:21am

> @ootsuka-repos Could this method introduce any underlying issue, such as some layers being re-trained from scratch instead of resuming from the checkpoint?

**ootsuka-repos** was mentioned · Mar 20, 2025 at 2:21am
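The sketch below addresses the question of whether non-strict loading re-trains layers from scratch. It shows how plain PyTorch `load_state_dict(strict=False)` behaves. Lightning's strict-loading switch is assumed to forward to this call, but that is not confirmed here, and this is not fish-speech's code.

Two further assumptions apply:
- `LoRALinear` is a hypothetical toy layer.
- The LoRA training checkpoint is assumed to hold only the `lora_*` tensors. This is inferred from the error above, which complains about base-model keys such as `model.layers.*.attention.wqkv.weight`, and is not verified against the fish-speech source.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical toy layer: a frozen base weight plus trainable LoRA factors."""

    def __init__(self, dim: int, rank: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim), requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(dim, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = frozen base + low-rank update B @ A.
        return x @ (self.weight + self.lora_B @ self.lora_A).T


torch.manual_seed(0)

# "Pretrained" base weights: contain no LoRA keys at all.
pretrained = {"weight": torch.randn(4, 4)}

# Assumption: a resumed LoRA checkpoint stores only the LoRA tensors.
trained = LoRALinear(4, 2)
lora_ckpt = {k: v.clone() for k, v in trained.state_dict().items() if "lora" in k}

model = LoRALinear(4, 2)
# Load base weights first; lora_A / lora_B are reported as missing here.
model.load_state_dict(pretrained, strict=False)

try:
    # strict=True (the default) rejects a checkpoint that lacks "weight".
    model.load_state_dict(lora_ckpt)
except RuntimeError as err:
    print("strict load failed:", str(err).splitlines()[0])

# Non-strict: copy whatever matches and report the rest instead of raising.
result = model.load_state_dict(lora_ckpt, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

In this toy case, keys absent from the checkpoint are neither reset nor retrained from scratch. They keep whatever values the module already held, here the pretrained base weight. The flip side is that `strict=False` also hides genuine mismatches, such as a renamed prefix. After resuming, it is worth checking that `missing_keys` contains no `lora_*` entries.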