WebUI and DeepSeek 671B Deployment

Creating a GPU-Enabled Container

Create the container:

docker run -d --name deepseek-webui -p 8001:8080 --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 sleep infinity
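
Before going further, it is worth confirming that the container actually sees the GPU. Assuming the NVIDIA Container Toolkit is set up on the host (a prerequisite for --gpus all), nvidia-smi should work inside the container:

docker exec deepseek-webui nvidia-smi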

Enter the container:

docker exec -it deepseek-webui /bin/bash

Installing WebUI and Ollama Inside the Container

Download and install Miniconda:

apt update
apt install -y wget
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
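
If the conda command is not found after this, the installer may not have initialized your shell; assuming the default install path of ~/miniconda3, you can initialize it manually:

~/miniconda3/bin/conda init bash
source ~/.bashrc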

Create a conda environment and install Open WebUI:

conda create -n webui python=3.11
conda activate webui
pip install open-webui
open-webui serve
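
open-webui serve listens on port 8080 by default, which is what the 8001:8080 port mapping above expects. Because it occupies the foreground, one option is to run it in the background and keep the logs (a sketch; the log path is arbitrary):

nohup open-webui serve > /var/log/open-webui.log 2>&1 &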

This deployment runs DeepSeek inference on the CPU, using the deepseek-r1-671b-1.58bit quantized model.
First, install Ollama:

apt install -y curl
apt install -y systemd
curl -fsSL https://ollama.com/install.sh -o ollama_install.sh
chmod +x ollama_install.sh
sed -i 's|https://ollama.com/download/|https://github.com/ollama/ollama/releases/download/v0.5.7/|' ollama_install.sh
sh ollama_install.sh
systemctl start ollama

(The sed line redirects the installer's downloads to the pinned v0.5.7 GitHub release, which can help when downloads from ollama.com are slow or unreachable.)
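
In a plain Docker container, systemd is usually not running as PID 1, so systemctl start ollama may fail with a message like "System has not been booted with systemd". In that case, start the server directly in the background instead (a simple sketch; the log path is arbitrary):

nohup ollama serve > /var/log/ollama.log 2>&1 &
ollama -v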

Pulling and Running the Model

Pull the model:

ollama pull SIGJNF/deepseek-r1-671b-1.58bit
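
Even at 1.58-bit quantization this model is very large (on the order of 130 GB), so check free disk space before pulling. After the pull completes, it should show up locally:

ollama list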

Create a Modelfile:

vim cpu.modelfile

Write the following content (PARAMETER num_gpu 0 tells Ollama to offload zero layers to the GPU, forcing pure CPU inference):

FROM SIGJNF/deepseek-r1-671b-1.58bit:latest

PARAMETER num_gpu 0
SYSTEM """A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>"""

Create the CPU variant of the model:

ollama create SIGJNF/deepseek-r1-671b-1.58bit:cpu -f cpu.modelfile
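
To double-check that the CPU variant carries the intended parameters, you can print its Modelfile back:

ollama show --modelfile SIGJNF/deepseek-r1-671b-1.58bit:cpu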

Set the load timeout (loading a model this large from disk into memory can take far longer than Ollama's default; note that this variable must be visible to the ollama server process, not just the client shell):

export OLLAMA_LOAD_TIMEOUT=360m

If the Ollama server is not already running, start it:

ollama serve
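
If you launch the server this way, remember that OLLAMA_LOAD_TIMEOUT must be set in the server's environment, e.g. exported in the same shell or passed inline:

OLLAMA_LOAD_TIMEOUT=360m ollama serve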

Run the model:

ollama run SIGJNF/deepseek-r1-671b-1.58bit:cpu
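
Once loaded, the model can also be queried through Ollama's HTTP API on its default port 11434 (a minimal sketch; CPU generation at this model size is very slow, so expect long response times):

curl http://localhost:11434/api/generate -d '{
  "model": "SIGJNF/deepseek-r1-671b-1.58bit:cpu",
  "prompt": "Why is the sky blue?",
  "stream": false
}'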

You can also use the model through the WebUI by visiting the host machine's IP at port 8001 (the host port mapped to the container's 8080 earlier).
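
Open WebUI connects to Ollama at http://localhost:11434 by default, which works here because both run in the same container. If your Ollama server lives elsewhere, the OLLAMA_BASE_URL environment variable (set before open-webui serve) points the WebUI at it; the address below is an example:

export OLLAMA_BASE_URL=http://127.0.0.1:11434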

