
Large Model: Qwen2-7B Local Deployment (Web Version, Windows)

2024-07-12


Large Model Series Article Catalog


Qwen2-7B local deployment (WEB version)

Preface

Large language models are the defining AI technology of the first half of 2024, and Qwen2 is among the strongest open-source large models from China. This is the first article in the large-model series; the goal is to deploy quickly and see how the recently released model performs. If the results are good, the next step will be fine-tuning a custom GPT-style assistant.

1. Download and install Ollama

  1. Go to the official Ollama website
  2. Click Download
  3. Select Windows and click Download for Windows (Preview). A VPN makes the download much faster.
  4. Install with the default settings

2. Download and install Qwen2

1. Download Qwen2

  1. Enter the official tutorial: https://qwen.readthedocs.io/zh-cn/latest/getting_started/installation.html
  2. First, open the efficiency evaluation page linked at the bottom to see how much video memory each model occupies, and choose one that fits your card. For example, my graphics card is a 4070 with 12 GB of VRAM, so the model I chose is Qwen2-7B-Instruct-GPTQ-Int4.
  3. Open the download link
  4. You will see files with different suffixes: "q" + the number of bits used to store the weights (the quantization precision) + a variant letter, with higher numbers giving better quality.
  5. The larger the number, the higher the precision. The "k" variant stores all attention and feed_forward tensors with 2 extra bits, while the "m" variant does so on half of the attention and feed_forward tensors.
  6. Choose a model according to your needs. I went straight for q8 here.
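As a rough sanity check before downloading, you can estimate the size of the weight file from the bit width. This is only a back-of-the-envelope sketch; the efficiency evaluation table in the Qwen docs is the authoritative source:

```shell
# Rough size estimate: weight bytes ≈ parameter count × bits / 8.
# KV cache and runtime overhead come on top of this, so leave headroom.
params=7000000000   # Qwen2-7B
bits=8              # q8_0 quantization
bytes=$((params * bits / 8))
gib=$((bytes / 1073741824))
echo "q8_0 weights: roughly ${gib} GiB"
```

With 12 GB of VRAM, roughly 7 GB of q8_0 weights plus the KV cache still fits, which is consistent with the choice above.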

2. Run Qwen2

  1. Create a new folder with an English name (e.g. qwen) and move qwen2-7b-instruct-q8_0.gguf into it.
  2. Create a file named Modelfile in the folder containing the single line:
FROM ./qwen2-7b-instruct-q8_0.gguf
  3. Then create the Qwen2-7B model with ollama on the command line:
ollama create Qwen2-7B -f ./Modelfile

A success message indicates the model was created.
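The folder and Modelfile steps above can also be scripted; a minimal sketch, using the same folder name qwen:

```shell
# Create the model folder and the one-line Modelfile.
mkdir -p qwen
cat > qwen/Modelfile <<'EOF'
FROM ./qwen2-7b-instruct-q8_0.gguf
EOF
# With qwen2-7b-instruct-q8_0.gguf copied into qwen/, register the model:
# cd qwen && ollama create Qwen2-7B -f ./Modelfile
```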

  4. Run the model with the command
ollama run Qwen2-7B

When the chat prompt appears, you can start chatting.

If you want to see which large models are available locally: ollama list
If you want to delete a model: ollama rm <name>
If you want to see which models are running: ollama ps

But chatting in a DOS-style terminal feels like the last century, so to get the GPT feel we will move the chat to the web.
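The web UI in the next section talks to Ollama's local REST API, which listens on port 11434 by default. A hedged sketch of calling that API directly with curl; it skips cleanly when no server is reachable:

```shell
# Query the local Ollama REST API directly (default port 11434).
api=http://localhost:11434/api/generate
payload='{"model": "Qwen2-7B", "prompt": "Hello", "stream": false}'
if command -v curl >/dev/null 2>&1 && curl -s --max-time 2 http://localhost:11434 >/dev/null 2>&1; then
  curl -s "$api" -d "$payload"
else
  echo "Ollama server not reachable - start it with: ollama serve"
fi
```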

Node.js

1. Download and install Node.js

  1. Go to the official Node.js website, download Node.js, and install it
  2. Verify the node version:
node -v

Anything from v20 up is fine.

  3. Download the ollama-webui code
  4. Enter the ollama-webui folder and set a domestic npm mirror to speed up package downloads:
npm config set registry http://mirrors.cloud.tencent.com/npm/
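Equivalently, the mirror can be pinned per project in an .npmrc file instead of the user-level config that the npm config command writes; a minimal sketch:

```shell
# Project-level .npmrc with the Tencent mirror (equivalent to the command above).
cat > .npmrc <<'EOF'
registry=http://mirrors.cloud.tencent.com/npm/
EOF
```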
  5. Install the Node.js dependencies:
npm install

If npm reports that an audit is required, run these in order:

npm audit
npm audit fix
  6. Start the web interface:
npm run dev

Open the web page, select your model, and start the conversation.