# 🐛 LLaMA 3.2 Vision — RuntimeError: `cutlassF: no kernel found to launch!`

## 🔍 Problem Description

While running Meta's `Llama-3.2-11B-Vision` model with Hugging Face Transformers and PyTorch, the following error occurred during inference:

```bash
\anaconda3\envs\llama3vision\lib\site-packages\transformers\models\mllama\modeling_mllama.py", line 316, in forward
    attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
RuntimeError: cutlassF: no kernel found to launch!
```

This error typically occurs when calling scaled dot-product attention in PyTorch:

```python
attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
```

## 📦 Setup Details

* Torch version: 2.2.0+cu121 (2.1.2+cu121 causes the error)

```bash
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
```
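
To confirm which build is actually active in the environment after installing, a small check like the one below may help. It is only a sketch: the helper names `parse_torch_version` and `is_known_good` are hypothetical, and the `2.2.0` threshold is an assumption taken from the versions listed in this note (2.1.2+cu121 failed, 2.2.x worked), not an officially documented minimum.

```python
import re

# Assumption: based only on this note, 2.1.2+cu121 fails and 2.2.x works,
# so 2.2.0 is treated as the lowest version expected to work.
MIN_WORKING = (2, 2, 0)


def parse_torch_version(version: str) -> tuple:
    """Turn a version string such as '2.2.0+cu121' into (2, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_known_good(version: str) -> bool:
    """Return True if the version is at least the assumed working minimum."""
    return parse_torch_version(version) >= MIN_WORKING


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # torch.version.cuda is None for CPU-only builds.
        print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
        print("at or above assumed minimum:", is_known_good(torch.__version__))
```

If the script reports a version below the assumed minimum, the environment is probably still using the older wheel, and rerunning the `pip install` command inside the `llama3vision` environment would be the first thing to try.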