
AI Training Class: DeepSeek Deployment Test

AI Training: China Unicom Cloud PC

Username       Password
cuichaoyang    xxxxxx

Training VM Delivery Sheet

VM Details

VM Name        Note                          Root Password
cuichaoyang    initial root password         xxxxxx
cuichaoyang    root password after change    xxxxxx

Port Mapping Table

Item: IP Address
Public address: x.x.x.x

VM Name        VM ID                 Mapped Port   VM Port
cuichaoyang    8600037278844887040   12380         22
cuichaoyang    8600037278844887040   12396         1088
cuichaoyang    8600037278844887040   12412         8080
cuichaoyang    8600037278844887040   12428         11434
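
With the mappings above, a service on the VM is reached through the public address at its mapped port. For example, SSH (VM port 22) is exposed on public port 12380; a minimal sketch, with x.x.x.x standing in for the elided public address:

# Connect to the VM's SSH service (port 22) via the NAT-mapped public port 12380.
ssh -p 12380 root@x.x.x.x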

Local Deployment

Check the file sizes in the current directory:

root@cuichaoyang:~#
ls -lh

Output:

total 2.0G
drwxr-xr-x  2 root root 4.0K Apr 16 10:29 install_env
drwxr-xr-x 11 root root 4.0K Sep 20  2022 NVIDIA_CUDA-11.0_Samples
-rw-r--r--  1 root root 348M Oct  2  2022 NVIDIA-Linux-x86_64-515.76.run
-rw-r--r--  1 root root 1.7G Apr 12 23:02 ollama-linux-amd64.tgz
-rw-r--r--  1 root root 1.3K Apr 11 15:11 run_cluster.sh
drwx------  3 root root 4.0K Sep 19  2022 snap
-rw-r--r--  1 root root  770 Apr 16 10:57 start-vllm.sh

1. Ollama Local Deployment

1.1 Ollama Deployment on Windows

TBD

1.2 Ollama Deployment on Linux

TBD
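
Although this subsection is still TBD, the home-directory listing above shows ollama-linux-amd64.tgz already downloaded, and the port-mapping table exposes VM port 11434, Ollama's default listen port. A minimal sketch of the manual tarball install, following Ollama's documented manual-install steps (the /usr install prefix and the foreground start are assumptions):

# Unpack the pre-downloaded tarball under /usr (the binary lands in /usr/bin/ollama).
tar -C /usr -xzf ollama-linux-amd64.tgz

# Start the server in the foreground; it listens on 11434,
# which the NAT table maps to public port 12428.
ollama serve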

1.3 Running Ollama with Docker

TBD

2. vLLM Local Deployment

2.1 Single-Node vLLM Deployment with Docker

What is vLLM?

vLLM (Virtual Large Language Model) is a high-performance inference and serving engine designed for large language models (LLMs). Its core goal is to significantly raise inference throughput and resource utilization while lowering latency, using innovative memory-management and parallel-computation techniques, so that real-time applications can be served under high concurrency. It has since grown into a community-driven project with contributions from both academia and industry.

Download Docker image 1

root@cuichaoyang:~# 
docker pull vllm/vllm-openai:v0.7.3

Output:

v0.7.3: Pulling from vllm/vllm-openai
Digest: sha256:4f4037303e8c7b69439db1077bb849a0823517c0f785b894dc8e96d58ef3a0c2
Status: Image is up to date for vllm/vllm-openai:v0.7.3
docker.io/vllm/vllm-openai:v0.7.3

Download Docker image 2

root@cuichaoyang:~# 
docker pull ghcr.io/open-webui/open-webui:cuda

Output:

cuda: Pulling from open-webui/open-webui
Digest: sha256:b768633c723f01a2d566d090710a853f39c569c362b913476dfe4977addd613e
Status: Image is up to date for ghcr.io/open-webui/open-webui:cuda
ghcr.io/open-webui/open-webui:cuda
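
The open-webui:cuda image pulled here provides the chat web UI, although this walkthrough never starts it. A minimal run sketch; the host port, volume name, and container name are assumptions, with 8080 chosen to match the container's default UI port and the 8080 entry in the port-mapping table:

# Hypothetical: serve Open WebUI on host port 8080 with GPU support.
docker run -d --gpus all \
    -p 8080:8080 \
    -v open-webui:/app/backend/data \
    --name open-webui \
    ghcr.io/open-webui/open-webui:cuda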

Download Docker image 3

root@cuichaoyang:~# 
docker pull gpustack/gpustack

Output:

Using default tag: latest
latest: Pulling from gpustack/gpustack
Digest: sha256:8c0521de972f1f16494a87e3de696d4c06cc3d2a08716aa990656132bb9d2eb0
Status: Image is up to date for gpustack/gpustack:latest
docker.io/gpustack/gpustack:latest

List the Docker images

root@cuichaoyang:~#
docker images

Output:

REPOSITORY                      TAG        IMAGE ID       CREATED        SIZE
ghcr.io/open-webui/open-webui   cuda       b768633c723f   47 hours ago   13.4GB
gpustack/gpustack               0.6.0rc1   292a17d3032b   4 days ago     23.7GB
vllm/vllm-openai                v0.7.3     4f4037303e8c   7 weeks ago    25.4GB
gpustack/gpustack               latest     8c0521de972f   2 months ago   21.3GB

Download the 1.5B inference model

Download the deepseek-r1-1.5B model from Alibaba's ModelScope community. The download script 01_model_download-1.5B.sh is as follows:

#!/bin/bash

# Install dependencies
apt install python3-pip -y
pip3 install modelscope

# Create the model storage directory
#mkdir -p /data01/snapshot_download/deepseekr1_Qwen_32B
mkdir -p /data01/snapshot_download/deepseekr1_Qwen_1.5B

# Run the Python download code
python3 - <<EOF
from modelscope import snapshot_download

model_dir = snapshot_download(
    'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B',
    cache_dir='/data01/snapshot_download/deepseekr1_Qwen_1.5B',
    revision='master'
)
print(f"Model downloaded to: {model_dir}")
EOF
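
A usage sketch; the commented-out mkdir line above hints that the same script can be repointed at the 32B model by swapping the model ID and cache_dir:

# Run the download script from the current directory.
bash 01_model_download-1.5B.sh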

Inspect the downloaded 1.5B model directory:

root@cuichaoyang:~# 
ls /data01/snapshot_download/deepseekr1_Qwen_1.5B/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/ -lh

Output:

total 3.4G
-rw-r--r-- 1 root root  679 Apr 16 10:29 config.json
-rw-r--r-- 1 root root   73 Apr 16 10:29 configuration.json
drwxr-xr-x 2 root root 4.0K Apr 16 10:29 figures
-rw-r--r-- 1 root root  181 Apr 16 10:29 generation_config.json
-rw-r--r-- 1 root root 1.1K Apr 16 10:29 LICENSE
-rw-r--r-- 1 root root 3.4G Apr 16 10:32 model.safetensors
-rw-r--r-- 1 root root  16K Apr 16 10:29 README.md
-rw-r--r-- 1 root root 3.0K Apr 16 10:29 tokenizer_config.json
-rw-r--r-- 1 root root 6.8M Apr 16 10:29 tokenizer.json

Install the GPU driver

Check the NVIDIA GPU status. (vLLM requires CUDA 12.1 or later; the version can be checked with nvidia-smi. If the installed version already meets the requirement, this step can be skipped.)

root@cuichaoyang:~#
nvidia-smi

Output:

Wed Apr 16 16:12:49 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:07.0 Off |                    0 |
| N/A   72C    P0             34W /   70W |   12571MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1236      G   /usr/lib/xorg/Xorg                              4MiB |
|    0   N/A  N/A     15817      C   /usr/bin/python3                            12550MiB |
+-----------------------------------------------------------------------------------------+

If no driver is installed, download the appropriate driver from the NVIDIA website and install it.
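
The home-directory listing above already contains NVIDIA-Linux-x86_64-515.76.run, so a minimal install sketch using that file follows. One caveat to verify: the 515-series driver reports CUDA 11.x in nvidia-smi, so a newer .run release would be needed to meet vLLM's CUDA 12.1+ requirement.

# Make the bundled installer executable and run it.
# Caveat: driver 515.76 pairs with CUDA 11.x; download a newer driver
# from the NVIDIA site if nvidia-smi must report CUDA >= 12.1.
chmod +x NVIDIA-Linux-x86_64-515.76.run
./NVIDIA-Linux-x86_64-515.76.run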

Start the container with Docker

Set up the port mapping by replacing 8000:8000 with 12412:8000:

root@cuichaoyang:~# 
docker run -d --runtime nvidia --gpus all  \
    -v /data01/snapshot_download/deepseekr1_Qwen_1.5B/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B:/home/llm_deploy \
    --env "HF_HUB_OFFLINE=1" \
    -p 12412:8000 \
    --ipc=host \
    vllm/vllm-openai:v0.7.3 \
    --model /home/llm_deploy \
    --served_model_name DeepSeek-R1-1.5B \
    --max_model_len 2048 \
    --tensor-parallel-size 1 \
    --dtype=half \
    --gpu-memory-utilization=0.9 \
    --api_key OPENWEBUI123
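
Once the container is up, a quick health check confirms the OpenAI-compatible endpoint is serving (a sketch, assuming the 12412:8000 mapping above and the API key set in the run command):

# List the served models; the response should include DeepSeek-R1-1.5B.
curl http://localhost:12412/v1/models \
    -H "Authorization: Bearer OPENWEBUI123"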

Check the local IP address with ifconfig:

root@cuichaoyang:~# 
ifconfig

Output:

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::389a:d5ff:fe36:188e  prefixlen 64  scopeid 0x20<link>
        ether 3a:9a:d5:36:18:8e  txqueuelen 0  (Ethernet)
        RX packets 34  bytes 3299 (3.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40  bytes 10068 (10.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.3.229  netmask 255.255.255.0  broadcast 192.168.3.255
        inet6 fe80::f816:3eff:fe8b:25b9  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:8b:25:b9  txqueuelen 1000  (Ethernet)
        RX packets 3313814  bytes 11559062108 (11.5 GB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 2134191  bytes 136495740 (136.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4538  bytes 538920 (538.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4538  bytes 538920 (538.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth9e2d49d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::8c:26ff:fe5e:6325  prefixlen 64  scopeid 0x20<link>
        ether 02:8c:26:5e:63:25  txqueuelen 0  (Ethernet)
        RX packets 19  bytes 3145 (3.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 38  bytes 8764 (8.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethb0eb23a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::b0ab:b9ff:fe2c:30f  prefixlen 64  scopeid 0x20<link>
        ether b2:ab:b9:2c:03:0f  txqueuelen 0  (Ethernet)
        RX packets 3  bytes 126 (126.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1777 (1.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Q&A (request command). Note that this request targets port 8000 rather than 12412: per the docker ps output below, an earlier container that still used the 8000:8000 mapping is the one actually serving.

root@cuichaoyang:~# 
 curl http://192.168.3.229:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer OPENWEBUI123" \
    -d '{
        "model": "DeepSeek-R1-1.5B",
        "messages": [
            {"role": "system", "content": "世界上最高的山峰是哪座山?"},
            {"role": "user", "content": "你好"}
        ]
    }'

Output:

{"id":"chatcmpl-0b7f8076e98d4178a8309a9641049e04","object":"chat.completion","created":1744788362,"model":"DeepSeek-R1-1.5B","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"好,现在我要回答用户的问题:“世界上最高的山峰是哪座山?”首先,我想确认一下用户是在寻找这个问题的答案,可能他们想知道答案的来源,或者对火星的名称产生兴趣。\n\n接下来,我需要思考这个问题在历史上和科学上的相关性。我认为这个问题涉及到多个方面,包括已知山峰的高度,以及科学史上的故事。我应该先整理一下已知的最高的山峰,比如珠穆朗玛峰,然后我记得(lineshape尔峰的高度了。但为了准确回答,最好在正式上下文中补充相关信息,但这样可能会让用户有点困惑。\n\n我还应该考虑用户可能关心的是这座山是否有特别的意义,比如向地球、人类或者其他生命形式对的一种认同。这也是一个值得探讨的话题,可以增加回答的深度和趣味性。\n\n最后,我应该保持语言简洁明了,避免使用过于专业或晦涩的词汇,让用户容易理解。同时,考虑到用户可能没有直接回答问题的意图,我需要确保回答不仅满足他们的需求,还能为他们提供更多有用的信息,比如更多关于珠穆朗玛峰等的背景知识。\n\n总结一下,我的回答应该是:\n世界上最高的山峰是珠穆朗玛峰,高度超过8848米,秋季达到最高峰,年均气温36.2摄氏度。它以其独特的山顶king角色而闻名,被用来向地球、人类与其他生命形式呈现一种认同感。此外, Lineshape尔峰也是世界上的山峰,高度超过8811米,位于引发这个词意义的地方。\n</think>\n\n世界上最高的山峰之一是珠穆朗玛峰,其海拔为8848米,是山Tán的 Namesake之一。 blenders of geothermal fragile and engineering campaigns have made supporter the revered geothermal wafer. spectral properties and members ofING качество.德意志联合档案机构保留了老 (^(speed这个部分似乎有些疑问。来自_lineshape尔峰。.md attribute: Lineshape尔峰. 此山也是世界上的山峰,当我们将它视为这一概念时,这反映了它对为何能成为某个误会或误解的组成部分。 Dallas & lifelong Andrea T. Crow. 🐔讽刺了Lineshape尔峰与其他山峰生长之间的关系。图片体现Lineshape尔峰。青灰色 ,来自_lineshape尔峰。的click以外的颜色的 Purchase Conflict Dogxt;.graphs主题图片 file sailing logo elifaces it无直接的使用可能要展示意图。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":526,"completion_tokens":511,"prompt_tokens_details":null}

List the Docker containers:

root@cuichaoyang:~# 
docker ps -a

Output:

CONTAINER ID   IMAGE                     COMMAND                  CREATED          STATUS                      PORTS                                         NAMES
734a3ff89d8c   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   39 minutes ago   Exited (1) 39 minutes ago                                                 clever_hermann
5bc72edf47f9   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   43 minutes ago   Up 43 minutes               0.0.0.0:8000->8000/tcp, [::]:8000->8000/tcp   dreamy_pascal
01eb5b671c82   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Created                                                                   exciting_cannon
453430ead584   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Exited (1) 5 hours ago                                                    stoic_golick
8afccd73a918   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Created                                                                   keen_noether
5df6ef72b29a   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Exited (1) 5 hours ago                                                    interesting_knuth
5148204b5c14   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Created                                                                   practical_lehmann
e9a538621323   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   5 hours ago      Exited (1) 5 hours ago                                                    mystifying_tharp
592e2e324e11   vllm/vllm-openai:v0.7.3   "python3 -m vllm.ent…"   3 days ago       Exited (1) 3 days ago                                                     strange_bhabha
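
The listing shows several exited and never-started containers left over from earlier attempts. A cleanup sketch using standard Docker commands:

# Remove all stopped containers (asks for confirmation).
docker container prune

# Or remove only containers in the exited state.
docker rm $(docker ps -aq -f status=exited)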

2.2 vLLM Cluster Deployment

TBD

3. GPUStack Local Deployment

What is GPUStack?

GPUStack builds a unified, centrally managed compute cluster out of heterogeneous GPUs: whether the target GPUs run in Apple Macs, Windows PCs, or Linux servers, GPUStack can bring them all under unified management as a single compute cluster. A GPUStack administrator can easily deploy any LLM from popular model repositories such as Hugging Face and ModelScope. Developers can then call an OpenAI-compatible API to access the deployed private LLMs, just as easily as they would call the public LLM services offered by providers such as OpenAI or Microsoft Azure.
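
The gpustack/gpustack image was already pulled in section 2.1. A minimal single-node start sketch; the flags follow GPUStack's Docker quick-start pattern, but the volume name and network settings here are assumptions to verify against the official documentation:

# Hypothetical single-node GPUStack server with GPU access and persistent data.
docker run -d --name gpustack \
    --gpus all \
    --network=host \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack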

3.1 GPUStack Cluster Mode

TBD