vLLM
Version as of 18 December 2025, 19:45
Description
Install Docker as usual.
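Before pulling the images below, it can help to confirm that Docker is installed and that the ROCm device nodes the containers need actually exist. A minimal sketch (the paths are the usual ones on a ROCm host; adjust if your setup differs):

```shell
# Check for a device node and report the result.
check_dev() { [ -e "$1" ] && echo "$1 present" || echo "$1 missing"; }

# Docker itself, plus the two device nodes passed to the containers below.
command -v docker >/dev/null 2>&1 && echo "docker found" || echo "docker missing"
check_dev /dev/kfd
check_dev /dev/dri
```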
Download
Standard (ROCm):
docker pull rocm/vllm-dev:nightly
For gfx906 (Vega 20, e.g. Radeon VII, MI50/MI60):
docker pull nalanzeyu/vllm-gfx906

Run
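Which of the two images above to pull depends on the GPU architecture. A tiny helper sketch (the function name is made up) that encodes the choice:

```shell
# Hypothetical helper: map a ROCm GPU architecture string (as reported
# by rocminfo, e.g. "gfx906") to the matching vLLM image from above.
pick_vllm_image() {
  case "$1" in
    gfx906) echo "nalanzeyu/vllm-gfx906" ;;  # Vega 20: Radeon VII, MI50/MI60
    *)      echo "rocm/vllm-dev:nightly" ;;  # other ROCm GPUs
  esac
}

pick_vllm_image gfx906
```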
Variant 1:
docker run -it --rm --shm-size=8g --device=/dev/kfd --device=/dev/dri \
--group-add video -p 8086:8000 \
-v /mnt/share/models:/models \
nalanzeyu/vllm-gfx906 \
vllm serve /models/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit --served-model-name Homelab --max-model-len 30000 --enable-auto-tool-choice --tool-call-parser hermes

Variant 2, tested 2025-12-18:
sudo docker run -it --rm --network=host \
--group-add=video --ipc=host --cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined --device /dev/kfd \
--device /dev/dri \
-v /home/hendrik/.lmstudio/models/:/app/models \
-e HF_HOME="/app/models" \
-e HF_TOKEN="<TOKEN>" \
-e NCCL_P2P_DISABLE=1 \
-e VLLM_CUSTOM_OPS=all \
-e VLLM_ROCM_USE_AITER=0 \
-e SAFETENSORS_FAST_GPU=1 \
-e PYTORCH_TUNABLEOP_ENABLED=1 \
rocm/vllm-dev:nightly

Without tensor parallelism:
vllm serve Qwen/Qwen3-VL-8B-Thinking --max_model_len 4096 --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser qwen3

With tensor parallelism (--tp 2):
vllm serve Qwen/Qwen3-VL-8B-Thinking --tp 2 --max_model_len 4096 --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser qwen3
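Once one of the serve commands above is running, the OpenAI-compatible API can be smoke-tested. The URL and model name below match Variant 1 (host port 8086 mapped to the container's 8000, --served-model-name Homelab); adjust both for the other variants.

```shell
# Minimal chat-completions request against the running vLLM server.
# Prints the JSON response, or a notice if the server is not reachable.
payload='{"model": "Homelab", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'

curl -s --max-time 10 http://localhost:8086/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -d "$payload" \
  || echo "vLLM server not reachable on localhost:8086"
```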