Skipping import of cpp extensions due to incompatible torch version 2.7.0+cu126 for torchao version 0.16.0 Please see https://github.com/pytorch/ao/issues/2919 for more info
`torch_dtype` is deprecated! Use `dtype` instead!
Some weights of the model checkpoint at /home/ubuntu/training/checkpoints/cohere-transcribe-ckpt-10000 were not used when initializing CohereAsrForConditionalGeneration: ['encoder_encoder_decoder_proj.bias', 'encoder_encoder_decoder_proj.weight', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.0.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.out_projection.weight',
'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.0.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.0.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.0.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.0.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.0.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.1.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.key_net.weight', 
'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.1.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.1.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.1.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.1.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.1.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.2.layer_norm_3.weight', 
'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.2.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.2.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.2.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.2.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.2.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_2.weight', 
'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.3.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.3.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.3.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.3.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.3.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.3.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_1.weight', 
'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.4.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.4.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.4.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.4.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.4.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.4.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.first_sub_layer.value_net.weight', 
'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.5.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.5.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.5.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.5.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.5.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.5.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.query_net.weight', 
'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.6.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.6.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.6.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.6.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.6.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.6.third_sub_layer.dense_out.weight', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.out_projection.weight', 
'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.first_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_1.bias', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_1.weight', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_2.bias', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_2.weight', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_3.bias', 'transf_decoder._transf_decoder._decoder.layers.7.layer_norm_3.weight', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.key_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.key_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.out_projection.bias', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.out_projection.weight', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.query_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.query_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.value_net.bias', 'transf_decoder._transf_decoder._decoder.layers.7.second_sub_layer.value_net.weight', 'transf_decoder._transf_decoder._decoder.layers.7.third_sub_layer.dense_in.bias', 'transf_decoder._transf_decoder._decoder.layers.7.third_sub_layer.dense_in.weight', 'transf_decoder._transf_decoder._decoder.layers.7.third_sub_layer.dense_out.bias', 'transf_decoder._transf_decoder._decoder.layers.7.third_sub_layer.dense_out.weight']
- This IS expected if you are initializing CohereAsrForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CohereAsrForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CohereAsrForConditionalGeneration were not initialized from the model checkpoint at /home/ubuntu/training/checkpoints/cohere-transcribe-ckpt-10000 and are newly initialized: ['encoder_decoder_proj.bias', 'encoder_decoder_proj.weight', 'transf_decoder._decoder.layers.0.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.0.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.0.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.0.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.0.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.0.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.0.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.0.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.0.layer_norm_1.bias', 'transf_decoder._decoder.layers.0.layer_norm_1.weight', 'transf_decoder._decoder.layers.0.layer_norm_2.bias', 'transf_decoder._decoder.layers.0.layer_norm_2.weight', 'transf_decoder._decoder.layers.0.layer_norm_3.bias', 'transf_decoder._decoder.layers.0.layer_norm_3.weight', 'transf_decoder._decoder.layers.0.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.0.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.0.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.0.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.0.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.0.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.0.second_sub_layer.value_net.bias',
'transf_decoder._decoder.layers.0.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.0.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.0.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.0.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.0.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.1.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.1.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.1.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.1.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.1.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.1.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.1.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.1.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.1.layer_norm_1.bias', 'transf_decoder._decoder.layers.1.layer_norm_1.weight', 'transf_decoder._decoder.layers.1.layer_norm_2.bias', 'transf_decoder._decoder.layers.1.layer_norm_2.weight', 'transf_decoder._decoder.layers.1.layer_norm_3.bias', 'transf_decoder._decoder.layers.1.layer_norm_3.weight', 'transf_decoder._decoder.layers.1.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.1.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.1.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.1.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.1.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.1.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.1.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.1.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.1.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.1.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.1.third_sub_layer.dense_out.bias', 
'transf_decoder._decoder.layers.1.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.2.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.2.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.2.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.2.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.2.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.2.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.2.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.2.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.2.layer_norm_1.bias', 'transf_decoder._decoder.layers.2.layer_norm_1.weight', 'transf_decoder._decoder.layers.2.layer_norm_2.bias', 'transf_decoder._decoder.layers.2.layer_norm_2.weight', 'transf_decoder._decoder.layers.2.layer_norm_3.bias', 'transf_decoder._decoder.layers.2.layer_norm_3.weight', 'transf_decoder._decoder.layers.2.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.2.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.2.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.2.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.2.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.2.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.2.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.2.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.2.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.2.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.2.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.2.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.3.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.3.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.3.first_sub_layer.out_projection.bias', 
'transf_decoder._decoder.layers.3.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.3.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.3.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.3.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.3.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.3.layer_norm_1.bias', 'transf_decoder._decoder.layers.3.layer_norm_1.weight', 'transf_decoder._decoder.layers.3.layer_norm_2.bias', 'transf_decoder._decoder.layers.3.layer_norm_2.weight', 'transf_decoder._decoder.layers.3.layer_norm_3.bias', 'transf_decoder._decoder.layers.3.layer_norm_3.weight', 'transf_decoder._decoder.layers.3.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.3.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.3.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.3.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.3.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.3.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.3.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.3.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.3.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.3.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.3.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.3.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.4.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.4.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.4.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.4.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.4.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.4.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.4.first_sub_layer.value_net.bias', 
'transf_decoder._decoder.layers.4.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.4.layer_norm_1.bias', 'transf_decoder._decoder.layers.4.layer_norm_1.weight', 'transf_decoder._decoder.layers.4.layer_norm_2.bias', 'transf_decoder._decoder.layers.4.layer_norm_2.weight', 'transf_decoder._decoder.layers.4.layer_norm_3.bias', 'transf_decoder._decoder.layers.4.layer_norm_3.weight', 'transf_decoder._decoder.layers.4.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.4.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.4.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.4.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.4.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.4.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.4.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.4.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.4.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.4.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.4.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.4.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.5.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.5.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.5.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.5.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.5.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.5.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.5.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.5.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.5.layer_norm_1.bias', 'transf_decoder._decoder.layers.5.layer_norm_1.weight', 'transf_decoder._decoder.layers.5.layer_norm_2.bias', 'transf_decoder._decoder.layers.5.layer_norm_2.weight', 
'transf_decoder._decoder.layers.5.layer_norm_3.bias', 'transf_decoder._decoder.layers.5.layer_norm_3.weight', 'transf_decoder._decoder.layers.5.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.5.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.5.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.5.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.5.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.5.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.5.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.5.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.5.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.5.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.5.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.5.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.6.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.6.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.6.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.6.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.6.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.6.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.6.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.6.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.6.layer_norm_1.bias', 'transf_decoder._decoder.layers.6.layer_norm_1.weight', 'transf_decoder._decoder.layers.6.layer_norm_2.bias', 'transf_decoder._decoder.layers.6.layer_norm_2.weight', 'transf_decoder._decoder.layers.6.layer_norm_3.bias', 'transf_decoder._decoder.layers.6.layer_norm_3.weight', 'transf_decoder._decoder.layers.6.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.6.second_sub_layer.key_net.weight', 
'transf_decoder._decoder.layers.6.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.6.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.6.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.6.second_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.6.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.6.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.6.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.6.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.6.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.6.third_sub_layer.dense_out.weight', 'transf_decoder._decoder.layers.7.first_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.7.first_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.7.first_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.7.first_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.7.first_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.7.first_sub_layer.query_net.weight', 'transf_decoder._decoder.layers.7.first_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.7.first_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.7.layer_norm_1.bias', 'transf_decoder._decoder.layers.7.layer_norm_1.weight', 'transf_decoder._decoder.layers.7.layer_norm_2.bias', 'transf_decoder._decoder.layers.7.layer_norm_2.weight', 'transf_decoder._decoder.layers.7.layer_norm_3.bias', 'transf_decoder._decoder.layers.7.layer_norm_3.weight', 'transf_decoder._decoder.layers.7.second_sub_layer.key_net.bias', 'transf_decoder._decoder.layers.7.second_sub_layer.key_net.weight', 'transf_decoder._decoder.layers.7.second_sub_layer.out_projection.bias', 'transf_decoder._decoder.layers.7.second_sub_layer.out_projection.weight', 'transf_decoder._decoder.layers.7.second_sub_layer.query_net.bias', 'transf_decoder._decoder.layers.7.second_sub_layer.query_net.weight', 
'transf_decoder._decoder.layers.7.second_sub_layer.value_net.bias', 'transf_decoder._decoder.layers.7.second_sub_layer.value_net.weight', 'transf_decoder._decoder.layers.7.third_sub_layer.dense_in.bias', 'transf_decoder._decoder.layers.7.third_sub_layer.dense_in.weight', 'transf_decoder._decoder.layers.7.third_sub_layer.dense_out.bias', 'transf_decoder._decoder.layers.7.third_sub_layer.dense_out.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the latest cached version of the dataset since BayAreaBoys/indic-asr-benchmark-6k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /home/ubuntu/training/datasets/indic-asr-benchmark-6k/BayAreaBoys___indic-asr-benchmark-6k/default/0.0.0/4d4bc89b7c915a6d80a1efbee3a708d58e688c81 (last modified on Tue Mar 31 15:10:13 2026).
Loading checkpoint: /home/ubuntu/training/checkpoints/cohere-transcribe-ckpt-10000
Model loaded in 6.4s
Loading dataset: BayAreaBoys/indic-asr-benchmark-6k
Filter: 0%| | 0/6000 [00:00 main()
  File "/home/ubuntu/training/benchmark_cohere_transcribe.py", line 439, in main
    results = run_inference(
  File "/home/ubuntu/training/benchmark_cohere_transcribe.py", line 327, in run_inference
    features = processor.feature_extractor(
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/cohere_hyphen_transcribe_hyphen_ckpt_hyphen_10000/processing_cohere_asr.py", line 436, in __call__
    input_features, length = self.filterbank(audio_tensor, seq_len)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/cohere_hyphen_transcribe_hyphen_ckpt_hyphen_10000/processing_cohere_asr.py", line 256, in forward
    x = self._apply_dither(x, seq_len_time)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/cohere_hyphen_transcribe_hyphen_ckpt_hyphen_10000/processing_cohere_asr.py", line 166, in _apply_dither
    noise = torch.randn(
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
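
Note on the loading warnings earlier in this log: every unused checkpoint key appears to differ from a newly initialized model key only by a repeated prefix segment (`encoder_encoder_decoder_proj` versus `encoder_decoder_proj`, and `transf_decoder._transf_decoder._decoder` versus `transf_decoder._decoder`). If that reading is right, the decoder weights were not loaded from the checkpoint, independently of the `RuntimeError` that ends the run. Below is a minimal sketch of renaming the keys before loading, assuming the mismatch is purely a naming issue. `PREFIX_FIXES`, `remap_key`, and `remap_state_dict` are hypothetical names, the prefix pairs are inferred from the log rather than from the model's source, and the placeholder strings stand in for tensors.

```python
# Hypothetical helper: the prefix pairs below are inferred from the two key
# lists in the log, not taken from the model's source code.
PREFIX_FIXES = (
    ("encoder_encoder_decoder_proj.", "encoder_decoder_proj."),
    ("transf_decoder._transf_decoder.", "transf_decoder."),
)


def remap_key(key):
    """Return the key with a duplicated prefix segment collapsed, if present."""
    for bad, good in PREFIX_FIXES:
        if key.startswith(bad):
            return good + key[len(bad):]
    return key  # keys without a duplicated prefix pass through unchanged


def remap_state_dict(state_dict):
    """Build a new mapping with remapped keys, refusing silent overwrites."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = remap_key(key)
        if new_key in remapped:
            raise KeyError(f"duplicate key after remapping: {new_key}")
        remapped[new_key] = value
    return remapped


# Placeholder values stand in for the checkpoint tensors.
sample = {
    "encoder_encoder_decoder_proj.bias": "tensor-a",
    "transf_decoder._transf_decoder._decoder.layers.0.layer_norm_1.weight": "tensor-b",
}
fixed = remap_state_dict(sample)
for name in sorted(fixed):
    print(name)
```

The remapped dictionary could then be passed to the model's `load_state_dict`, ideally with `strict=True` so that any remaining mismatch surfaces as an error instead of a warning.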