Skipping import of cpp extensions due to incompatible torch version 2.9.1+cu128 for torchao version 0.16.0 Please see https://github.com/pytorch/ao/issues/2919 for more info
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287]
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287] █ █ █▄ ▄█
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.16.0
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287] █▄█▀ █ █ █ █ model Scicom-intl/Multilingual-Expressive-TTS-0.6B
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:287]
(APIServer pid=2487454) INFO 03-30 07:40:02 [utils.py:223] non-default args: {'host': '0.0.0.0', 'port': 9093, 'model': 'Scicom-intl/Multilingual-Expressive-TTS-0.6B', 'dtype': 'bfloat16', 'max_model_len': 1024, 'gpu_memory_utilization': 0.95, 'kv_cache_dtype': 'fp8', 'enable_prefix_caching': True, 'max_num_seqs': 1024}
(APIServer pid=2487454) INFO 03-30 07:40:03 [model.py:529] Resolved architecture: Qwen3ForCausalLM
(APIServer pid=2487454) INFO 03-30 07:40:03 [model.py:1871] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=2487454) INFO 03-30 07:40:03 [model.py:1549] Using max model len 1024
(APIServer pid=2487454) INFO 03-30 07:40:03 [cache.py:214] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
(APIServer pid=2487454) INFO 03-30 07:40:03 [scheduler.py:224] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=2487454) INFO 03-30 07:40:03 [vllm.py:689] Asynchronous scheduling is enabled.
Skipping import of cpp extensions due to incompatible torch version 2.9.1+cu128 for torchao version 0.16.0 Please see https://github.com/pytorch/ao/issues/2919 for more info (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:12 [core.py:97] Initializing a V1 LLM engine (v0.16.0) with config: model='Scicom-intl/Multilingual-Expressive-TTS-0.6B', speculative_config=None, tokenizer='Scicom-intl/Multilingual-Expressive-TTS-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=fp8, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Scicom-intl/Multilingual-Expressive-TTS-0.6B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': , 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 
'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False, 'fuse_act_padding': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:14 [parallel_state.py:1234] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://216.81.248.184:36719 backend=nccl (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:14 [parallel_state.py:1445] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:14 [gpu_model_runner.py:4124] Starting to load model Scicom-intl/Multilingual-Expressive-TTS-0.6B... (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:15 [cuda.py:367] Using FLASHINFER attention backend out of potential backends: ['FLASHINFER', 'TRITON_ATTN']. 
(EngineCore_DP0 pid=2487641) :1184: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (EngineCore_DP0 pid=2487641) :1184: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. (EngineCore_DP0 pid=2487641) INFO 03-30 07:40:16 [weight_utils.py:579] No model.safetensors.index.json found in remote. (EngineCore_DP0 pid=2487641) Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00 (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated. (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [2/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem 
/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu:1:
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h>
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [3/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu:1:
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h>
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [4/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu:1:
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h>
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [5/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu:1:
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h>
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [6/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu:1:
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h>
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [7/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu:1: (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [8/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 
[core.py:1006] FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu:1: (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [9/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/page.cuh:26, (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_config.inc:2, (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu:16: (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/utils.cuh:21:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 21 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] [10/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] In file included from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/page.cuh:26, (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/pos_enc.cuh:27, (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/scheduler.cuh:30, (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu:17: (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/utils.cuh:21:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] 21 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] compilation terminated. (EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] ninja: build stopped: subcommand failed.
(EngineCore_DP0 pid=2487641) ERROR 03-30 07:40:35 [core.py:1006] (EngineCore_DP0 pid=2487641) Process EngineCore_DP0: (EngineCore_DP0 pid=2487641) Traceback (most recent call last): (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/jit/cpp_ext.py", line 328, in run_ninja (EngineCore_DP0 pid=2487641) subprocess.run( (EngineCore_DP0 pid=2487641) File "/usr/lib/python3.10/subprocess.py", line 526, in run (EngineCore_DP0 pid=2487641) raise CalledProcessError(retcode, process.args, (EngineCore_DP0 pid=2487641) subprocess.CalledProcessError: Command '['ninja', '-v', '-C', '/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False', '-f', '/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/build.ninja']' returned non-zero exit status 1. 
(EngineCore_DP0 pid=2487641) (EngineCore_DP0 pid=2487641) The above exception was the direct cause of the following exception: (EngineCore_DP0 pid=2487641) (EngineCore_DP0 pid=2487641) Traceback (most recent call last): (EngineCore_DP0 pid=2487641) File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=2487641) self.run() (EngineCore_DP0 pid=2487641) File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=2487641) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1010, in run_engine_core (EngineCore_DP0 pid=2487641) raise e (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 996, in run_engine_core (EngineCore_DP0 pid=2487641) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=2487641) return func(*args, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 740, in __init__ (EngineCore_DP0 pid=2487641) super().__init__( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 113, in __init__ (EngineCore_DP0 pid=2487641) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=2487641) return func(*args, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 275, in _initialize_kv_caches (EngineCore_DP0 pid=2487641) self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 
pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config (EngineCore_DP0 pid=2487641) self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc (EngineCore_DP0 pid=2487641) result = run_method(self.driver_worker, method, args, kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method (EngineCore_DP0 pid=2487641) return func(*args, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=2487641) return func(*args, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 467, in compile_or_warm_up_model (EngineCore_DP0 pid=2487641) kernel_warmup(self) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 72, in kernel_warmup (EngineCore_DP0 pid=2487641) worker.model_runner._dummy_run( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context (EngineCore_DP0 pid=2487641) return func(*args, **kwargs) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4788, in _dummy_run (EngineCore_DP0 pid=2487641) attn_metadata, _ = self._build_attention_metadata( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1883, in _build_attention_metadata (EngineCore_DP0 pid=2487641) _build_attn_group_metadata(kv_cache_gid, attn_gid, cm) (EngineCore_DP0 pid=2487641) File 
"/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1834, in _build_attn_group_metadata (EngineCore_DP0 pid=2487641) attn_metadata_i = builder.build( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/attention/backends/flashinfer.py", line 1084, in build (EngineCore_DP0 pid=2487641) prefill_wrapper.plan( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/prefill.py", line 1918, in plan (EngineCore_DP0 pid=2487641) self._cached_module = get_batch_prefill_module( (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/prefill.py", line 404, in get_batch_prefill_module (EngineCore_DP0 pid=2487641) module = gen_batch_prefill_module(backend, *args).build_and_load() (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/jit/core.py", line 317, in build_and_load (EngineCore_DP0 pid=2487641) self.build(verbose, need_lock=False) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/jit/core.py", line 303, in build (EngineCore_DP0 pid=2487641) run_ninja(self.build_dir, self.ninja_path, verbose) (EngineCore_DP0 pid=2487641) File "/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/jit/cpp_ext.py", line 340, in run_ninja (EngineCore_DP0 pid=2487641) raise RuntimeError(msg) from e (EngineCore_DP0 pid=2487641) RuntimeError: Ninja build failed. 
Ninja output: (EngineCore_DP0 pid=2487641) ninja: Entering directory `/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False' (EngineCore_DP0 pid=2487641) [1/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c 
/home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated.
(EngineCore_DP0 pid=2487641) [2/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) [3/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include
--compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem 
/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated.
(EngineCore_DP0 pid=2487641) [4/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) [5/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include
--compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem 
/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated.
(EngineCore_DP0 pid=2487641) [6/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) [7/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include
--compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem 
/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated.
(EngineCore_DP0 pid=2487641) [8/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu:1: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/prefill.cuh:22:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 22 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) [9/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC
--expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem 
/home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/page.cuh:26, (EngineCore_DP0 pid=2487641) from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_config.inc:2, (EngineCore_DP0 pid=2487641) from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_binding.cu:16: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/utils.cuh:21:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 21 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 
pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) [10/11] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) FAILED: [code=1] 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /usr/include/python3.10 -isystem /usr/include/cccl -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/csrc -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -std=c++17 --threads=1 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -O3 -c /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o 
/home/ubuntu/.cache/flashinfer/0.6.3/80/cached_ops/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o (EngineCore_DP0 pid=2487641) In file included from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/page.cuh:26, (EngineCore_DP0 pid=2487641) from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/pos_enc.cuh:27, (EngineCore_DP0 pid=2487641) from /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/scheduler.cuh:30, (EngineCore_DP0 pid=2487641) from /home/ubuntu/.cache/flashinfer/0.6.3/80/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu:17: (EngineCore_DP0 pid=2487641) /home/ubuntu/.local/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/utils.cuh:21:10: fatal error: cuda_fp8.h: No such file or directory (EngineCore_DP0 pid=2487641) 21 | #include <cuda_fp8.h> (EngineCore_DP0 pid=2487641) | ^~~~~~~~~~~~ (EngineCore_DP0 pid=2487641) compilation terminated. (EngineCore_DP0 pid=2487641) ninja: build stopped: subcommand failed. (EngineCore_DP0 pid=2487641) [rank0]:[W330 07:40:36.155440072 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) (APIServer pid=2487454) Traceback (most recent call last): (APIServer pid=2487454) File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (APIServer pid=2487454) return _run_code(code, main_globals, None, (APIServer pid=2487454) File "/usr/lib/python3.10/runpy.py", line 86, in _run_code (APIServer pid=2487454) exec(code, run_globals) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 531, in <module> (APIServer pid=2487454) uvloop.run(run_server(args)) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run (APIServer pid=2487454) return loop.run_until_complete(wrapper()) (APIServer pid=2487454) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper (APIServer pid=2487454) return await main (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 457, in run_server (APIServer pid=2487454) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 476, in run_server_worker (APIServer pid=2487454) async with build_async_engine_client( (APIServer pid=2487454) File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__ (APIServer pid=2487454) return await anext(self.gen) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=2487454) async with build_async_engine_client_from_engine_args( (APIServer pid=2487454) File "/usr/lib/python3.10/contextlib.py", line 199, in 
__aenter__ (APIServer pid=2487454) return await anext(self.gen) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=2487454) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 222, in from_vllm_config (APIServer pid=2487454) return cls( (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 148, in __init__ (APIServer pid=2487454) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=2487454) return func(*args, **kwargs) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 124, in make_async_mp_client (APIServer pid=2487454) return AsyncMPClient(*client_args) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=2487454) return func(*args, **kwargs) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 835, in __init__ (APIServer pid=2487454) super().__init__( (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 490, in __init__ (APIServer pid=2487454) with launch_core_engines(vllm_config, executor_class, log_stats) as ( (APIServer pid=2487454) File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__ (APIServer pid=2487454) next(self.gen) (APIServer pid=2487454) File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 925, in launch_core_engines (APIServer pid=2487454) wait_for_engine_startup( (APIServer pid=2487454) 
File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 984, in wait_for_engine_startup (APIServer pid=2487454) raise RuntimeError( (APIServer pid=2487454) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [1]+ Exit 1 python3 -m vllm.entrypoints.openai.api_server --model Scicom-intl/Multilingual-Expressive-TTS-0.6B --host 0.0.0.0 --port 9093 --max-model-len 1024 --gpu-memory-utilization 0.95 --dtype bfloat16 --kv-cache-dtype fp8 --enable-prefix-caching --max-num-seqs 1024 2>&1