
Weird error log

2024-07-12


https://github.com/meta-llama/llama3/issues/80

Loading the model works fine, but the following occurs during inference:
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
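The error string comes from PyTorch's CUDA triu/tril kernel, which some older torch builds do not implement for bfloat16, and causal-attention-mask code commonly builds the mask with torch.triu in the model's dtype. A minimal sketch of the failing op (my own illustration, assuming a CUDA device and an affected torch build):

import torch

# On affected builds, triu/tril on a CUDA bfloat16 tensor raises exactly this error.
mask = torch.ones(4, 4, dtype=torch.bfloat16, device="cuda")
torch.triu(mask, diagonal=1)  # RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'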

————————————————

Cause of the incident

When I tried to import AutoProcessor from transformers, it threw:
RuntimeError: Failed to import transformers.models.auto.processing_auto because of the following error (look up to see its traceback):
Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.8 and torchvision has CUDA Version=11.7. Please reinstall the torchvision that matches your PyTorch install.
So the CUDA versions of my torch and torchvision don't match? But I installed them following the PyTorch instructions in the first place...
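Before reinstalling anything, you can print what each package actually reports; a quick check (only standard attributes, nothing version-specific):

import torch
import torchvision

print(torch.__version__)        # e.g. 2.0.0+cu118
print(torch.version.cuda)       # CUDA version torch was built against, e.g. 11.8
print(torchvision.__version__)  # e.g. 0.15.1 -- no +cu118 suffix, i.e. the default cu117 wheel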

My installed versions were as follows:
torch 2.0.0+cu118
torchaudio 2.0.1
torchvision 0.15.1

It's strange that the latter two have no +cu118 suffix. So I went back to the PyTorch site and reinstalled from the cu118 index:
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

This time it came out right. I had only uninstalled torchvision beforehand, so torchaudio was not reinstalled:
torch 2.0.0+cu118
torchaudio 2.0.1
torchvision 0.15.1+cu118

And that is when the error at the top of this post appeared.

————————

Temporary remedies

I am loading Qwen1.5-7B with torch_dtype=torch.bfloat16. After changing it to torch_dtype=torch.float16, inference works; rolling torchvision back to the previous version also works.
But torch.float16 and torch.bfloat16 are two completely different 16-bit formats, so simply swapping them doesn't feel right...
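A sketch of the temporary workaround, with the checkpoint name and loading code as illustrative assumptions (any Qwen1.5-7B checkpoint loaded the usual way):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # was torch.bfloat16; float16 sidesteps the triu/tril error
).to("cuda")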

——————————————

The best remedy

With torch_dtype="auto", transformers automatically picks up bfloat16 from the model's config.
I also made some observations, printing model.config under different conditions (a sketch of the check follows the list):

  1. I used AutoConfig and passed the config file that ships with Qwen to AutoModelForCausalLM.from_pretrained. It reported bfloat16, but the model actually took up float32-sized memory.
  2. Set torch_dtype=torch.float16: it takes up the memory of a 16-bit dtype (half of float32) and reports float16.
  3. Set torch_dtype="auto": it takes up the memory of a 16-bit dtype and reports bfloat16.
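A minimal sketch of this kind of check (the checkpoint name is illustrative); get_memory_footprint() and the parameter dtype show what is actually allocated, while config.torch_dtype only shows what the config declares:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat", torch_dtype="auto")

print(model.config.torch_dtype)              # what the config declares, e.g. torch.bfloat16
print(next(model.parameters()).dtype)        # what the weights actually are
print(model.get_memory_footprint() / 2**30)  # actual size in GiB: ~2 bytes/param for fp16/bf16, ~4 for fp32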

Ugh, what a hassle.