Large language models were the hottest area of AI in the first half of 2024, and Qwen2 is among the most capable open-source LLMs released in China. This is the first article in my LLM series: the goal is to deploy the recently released model quickly and see how well it works. If the results are good, the next step will be fine-tuning my own GPT.
1. Download and install Ollama
Open the Ollama website (https://ollama.com), select Windows, and click Download for Windows (Preview). Downloading through a VPN is much faster.
Run the installer and accept the default installation.
2. Download and install Qwen2
1. Download Qwen2
Enter the official tutorial: https://qwen.readthedocs.io/zh-cn/latest/getting_started/installation.html
First, click the efficiency evaluation link at the bottom of the page to see how much video memory each model occupies, and pick one that fits your card. For example, my graphics card is an RTX 4070 with 12 GB of VRAM, so the model I chose is Qwen2-7B-Instruct-GPTQ-Int4.
Among the GGUF downloads you will see different suffixes: "q" + the number of bits used to store the weights (the precision) + a variant letter. The larger the number, the higher the precision. In this scheme, "k" raises the precision by 2 bits on all attention and feed_forward tensors, while "m" raises it by 2 bits on half of those tensors.
Choose a model according to your needs; I went straight for q8 here.
2. Run Qwen2
Create a new folder with an ASCII name (e.g. qwen) and move the downloaded qwen2-7b-instruct-q8_0.gguf into it.
Create a file named Modelfile in the folder with the following content:
FROM ./qwen2-7b-instruct-q8_0.gguf
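The single FROM line above is all that is required, but Ollama's Modelfile format also supports optional directives such as PARAMETER and SYSTEM. A minimal sketch (the values below are illustrative assumptions, not part of the original setup):

```
FROM ./qwen2-7b-instruct-q8_0.gguf

# Optional: sampling parameter (value is illustrative)
PARAMETER temperature 0.7

# Optional: a system prompt baked into the model
SYSTEM You are a helpful assistant.
```

If you only want the defaults, the bare FROM line is enough.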
Then create the Qwen2-7B model through ollama on the command line:
ollama create Qwen2-7B -f ./Modelfile
When "success" is printed, the model has been created.
To run it, enter the command:
ollama run Qwen2-7B
When the interactive prompt appears, you can start chatting.
Some useful management commands:
To list the large models available locally: ollama list
To delete a model: ollama rm <name>
To see which models are currently running: ollama ps
But chatting in a terminal window feels like chatting in the last century, so to recapture the ChatGPT feel, we will next put it on the web.
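As a preview of the web direction: Ollama also serves a local REST API on port 11434, so the model can be called from code instead of the terminal. A minimal Python sketch, assuming the Qwen2-7B model created above and a running Ollama server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires `ollama run Qwen2-7B` or `ollama serve` to be active):
# ask("Qwen2-7B", "Hello, please introduce yourself.")
```

A web frontend only needs to call this same endpoint from its backend, which is what the next article will build on.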