WebUI and DeepSeek 671B Deployment

Creating a GPU-Enabled Container

Create the container:

docker run -d --name deepseek-webui -p 8001:8080 --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 sleep infinity
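
Before going further, it is worth confirming that the container actually sees the GPU. Assuming the NVIDIA Container Toolkit is set up on the host (a prerequisite for --gpus all), nvidia-smi should work inside the container:

docker exec deepseek-webui nvidia-smi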

Enter the container:

docker exec -it deepseek-webui /bin/bash

Installing WebUI and Ollama Inside the Container

Download and install Miniconda:

apt update
apt install -y wget
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
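
If the conda command is not found after this, the installer may not have initialized your shell; assuming the default install path of ~/miniconda3, you can initialize it manually:

~/miniconda3/bin/conda init bash
source ~/.bashrc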

Create a conda environment and install Open WebUI:

conda create -n webui python=3.11
conda activate webui
pip install open-webui
open-webui serve
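
open-webui serve listens on port 8080 by default, which is what the 8001:8080 port mapping above expects. Because it occupies the foreground, one option is to run it in the background and keep the logs (a sketch; the log path is arbitrary):

nohup open-webui serve > /var/log/open-webui.log 2>&1 &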

This deployment runs DeepSeek inference on the CPU, using the deepseek-r1-671b-1.58bit quantized model.
First, install Ollama:

apt install -y curl
apt install -y systemd
curl -fsSL https://ollama.com/install.sh -o ollama_install.sh
chmod +x ollama_install.sh
sed -i 's|https://ollama.com/download/|https://github.com/ollama/ollama/releases/download/v0.5.7/|' ollama_install.sh
sh ollama_install.sh
systemctl start ollama

(The sed line redirects the installer's downloads to the pinned v0.5.7 GitHub release, which can help when downloads from ollama.com are slow or unreachable.)
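
In a plain Docker container, systemd is usually not running as PID 1, so systemctl start ollama may fail with a message like "System has not been booted with systemd". In that case, start the server directly in the background instead (a simple sketch; the log path is arbitrary):

nohup ollama serve > /var/log/ollama.log 2>&1 &
ollama -v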

Pulling and Running the Model

Pull the model:

ollama pull SIGJNF/deepseek-r1-671b-1.58bit
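
Even at 1.58-bit quantization this model is very large (on the order of 130 GB), so check free disk space before pulling. After the pull completes, it should show up locally:

ollama list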

Create a Modelfile:

vim cpu.modelfile

Write the following content (PARAMETER num_gpu 0 tells Ollama to offload zero layers to the GPU, forcing pure CPU inference):

FROM SIGJNF/deepseek-r1-671b-1.58bit:latest

PARAMETER num_gpu 0
SYSTEM """A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>"""

Create the CPU variant of the model:

ollama create SIGJNF/deepseek-r1-671b-1.58bit:cpu -f cpu.modelfile
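
To double-check that the CPU variant carries the intended parameters, you can print its Modelfile back:

ollama show --modelfile SIGJNF/deepseek-r1-671b-1.58bit:cpu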

Set the load timeout (loading a model this large from disk into memory can take far longer than Ollama's default; note that this variable must be visible to the ollama server process, not just the client shell):

export OLLAMA_LOAD_TIMEOUT=360m

If the Ollama server is not already running, start it:

ollama serve
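
If you launch the server this way, remember that OLLAMA_LOAD_TIMEOUT must be set in the server's environment, e.g. exported in the same shell or passed inline:

OLLAMA_LOAD_TIMEOUT=360m ollama serve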

Run the model:

ollama run SIGJNF/deepseek-r1-671b-1.58bit:cpu
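
Once loaded, the model can also be queried through Ollama's HTTP API on its default port 11434 (a minimal sketch; CPU generation at this model size is very slow, so expect long response times):

curl http://localhost:11434/api/generate -d '{
  "model": "SIGJNF/deepseek-r1-671b-1.58bit:cpu",
  "prompt": "Why is the sky blue?",
  "stream": false
}'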

You can also use the model through the WebUI by visiting the host machine's IP at port 8001 (the host port mapped to the container's 8080 earlier).
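
Open WebUI connects to Ollama at http://localhost:11434 by default, which works here because both run in the same container. If your Ollama server lives elsewhere, the OLLAMA_BASE_URL environment variable (set before open-webui serve) points the WebUI at it; the address below is an example:

export OLLAMA_BASE_URL=http://127.0.0.1:11434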

