Running a Chatbot
This document provides step-by-step instructions on setting up and running a LLaMA chatbot on your Hyperstack virtual machine. By following these instructions, you can configure your environment, install the necessary tools, and deploy the chatbot.
Prerequisites
- These instructions are for a Debian-based Linux virtual machine, like Ubuntu or Debian.
Prepare the operating system
Open a terminal on your virtual machine, update the package list, and install the necessary packages which include Docker:
sudo apt update
sudo apt install nvidia-utils-515 nvidia-driver-515 docker.io git-lfs
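Before moving on, it can help to confirm that the tools installed above are actually on your PATH. The following is a minimal sketch, not part of the original instructions: the helper name missing_cmds is hypothetical, and the command names in the example (nvidia-smi, docker, git-lfs) are the binaries these packages are assumed to provide.

```shell
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH. Prints nothing if all are present.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example call (assumed binary names; adjust if your packages differ):
#   missing_cmds nvidia-smi docker git-lfs
```

If the example call prints nothing, all three commands were found. Any name it prints points to a package that did not install as expected.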
Install Docker Compose
Install Docker Compose by running the following commands:
VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | grep -Po '"tag_name": "\K.*\d')
DESTINATION=/usr/local/bin/docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/${VERSION}/docker-compose-$(uname -s)-$(uname -m) -o $DESTINATION
sudo chmod +x /usr/local/bin/docker-compose
These commands download and install Docker Compose, a tool used for defining and running multi-container Docker applications. They retrieve the latest version of Docker Compose, place it in a designated location, and make it executable.
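The VERSION line above works by pulling the "tag_name" field out of the JSON that the GitHub API returns. As an illustration of that step only, here is a sketch that does the same extraction with sed instead of grep -P. The function name extract_tag is hypothetical, and the sketch assumes the API response is pretty-printed with one field per line.

```shell
# Hypothetical helper: read GitHub "releases/latest" JSON on stdin and
# print the value of the first "tag_name" field. Assumes one field per line.
extract_tag() {
  sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Example call against the same endpoint used above:
#   VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | extract_tag)
```

Unlike the grep pattern, this variant stops at the closing quote of the value rather than at the last digit on the line. After installation, running docker-compose --version is a quick way to confirm the binary is executable.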
Install Nvidia Toolkit for Docker
Configure Nvidia Toolkit for Docker with the following commands:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
These commands are responsible for installing and configuring the Nvidia Toolkit for Docker. They ensure that Docker containers have access to Nvidia GPU resources by setting up the necessary package repositories, installing the toolkit, and configuring the runtime environment.
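Two parts of the repository setup are easy to misread: how the distribution string is built, and what the sed expression does to each repository line. The sketch below isolates both so you can try them on sample input. The function names distribution_id and add_signed_by are hypothetical and are not part of the Nvidia tooling.

```shell
# Hypothetical helper: print ID and VERSION_ID from an os-release style
# file, joined together (for example "ubuntu" and "22.04" give "ubuntu22.04").
# Pass a path containing a slash, such as /etc/os-release.
distribution_id() {
  # The subshell keeps the sourced variables out of the caller's environment.
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper: rewrite "deb https://..." lines read from stdin so that
# they name the keyring given as $1, mirroring the sed step above.
# The keyring path must not contain the "#" character.
add_signed_by() {
  sed "s#deb https://#deb [signed-by=$1] https://#g"
}

# Example calls:
#   distribution_id /etc/os-release
#   echo 'deb https://example.invalid/repo /' | add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```

The signed-by option restricts the repository to packages signed with that one key, which is why the key is stored in its own keyring file first.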
Download the LLaMA Chatbot web user interface
Execute the following commands to download and install the web UI components necessary for the LLaMA chatbot.
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
mkdir installers
cd installers
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb
These commands download and set up the web user interface (UI) for the LLaMA chatbot. They clone the text-generation-webui repository, create an "installers" directory inside it, and download the CUDA 11.7.1 local installer package that the Dockerfile installs later.
For additional information about the text generation web UI, see the project repository at https://github.com/oobabooga/text-generation-webui.
Download the chatbot model
Navigate to the "models" directory and clone the chatbot model by running the following commands:
cd ../models
git-lfs clone https://huggingface.co/decapoda-research/llama-13b-hf
Prepare Docker files
- Return to the repository root and create a Dockerfile by running the following commands:
cd ..
nano Dockerfile
- In the text editor that opens, paste the following content into the Dockerfile:
FROM python:3.10.6-slim-bullseye
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y git software-properties-common gnupg
COPY . /app
WORKDIR /app
RUN dpkg -i /app/installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb
RUN cp /var/cuda-repo-debian11-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
RUN add-apt-repository contrib
RUN apt-get update
RUN apt-get -y install cuda \
&& apt -y remove nvidia-* \
&& rm -rf /var/cuda-repo-debian11-11-7-local
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/google_translate/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/silero_tts/requirements.txt
CMD python server.py --auto-devices --cai-chat --load-in-8bit --bf16 --listen --listen-port=8888
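This Dockerfile builds an image based on Python 3.10.6 on Debian Bullseye. It installs Git and other essential packages, installs CUDA from the local installer package downloaded earlier, and installs the Python requirements of the web UI and of its google_translate and silero_tts extensions.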
- Save and exit the text editor (press Ctrl+O, then Enter, and Ctrl+X).
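The Dockerfile copies the whole repository into the image and then reads several files from it, so a missing file only shows up partway through the build. As a rough pre-build check, the sketch below lists any of those files that are absent. The function name missing_build_inputs is hypothetical, and the file list is taken from the Dockerfile above.

```shell
# Hypothetical helper: print each path the Dockerfile reads that does not
# exist under the directory given as $1. Prints nothing if all are present.
missing_build_inputs() {
  local root="$1" rel
  for rel in \
    installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb \
    requirements.txt \
    extensions/google_translate/requirements.txt \
    extensions/silero_tts/requirements.txt \
    server.py
  do
    [ -e "$root/$rel" ] || printf '%s\n' "$rel"
  done
}

# Example call from the repository root:
#   missing_build_inputs .
```

An empty result means every file the Dockerfile refers to is in place.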
Create a Docker Compose file
- Create a Docker Compose file by running the following command:
nano docker-compose.yml
- In the text editor, paste the following content into the docker-compose.yml file:
version: "3.3"
services:
  text-generation-webui:
    build: .
    ports:
      - "8889:8888"
    stdin_open: true
    tty: true
    volumes:
      - .:/app
    command: python server.py --auto-devices --cai-chat --model "llama-13b-hf" --listen --listen-port=8888 --gpu-memory 15 15 15 15
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
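This Docker Compose file sets up the environment for the LLaMA chatbot. It builds the image from the Dockerfile, publishes the web UI's container port 8888 on host port 8889, mounts the repository into /app, and reserves all Nvidia GPUs for the container. The --gpu-memory value "15 15 15 15" sets a limit of 15 GiB of memory on each of 4 GPUs; supply one value per GPU available on your virtual machine.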
- Save and exit the text editor (press Ctrl+O, then Enter, and Ctrl+X).
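The "15 15 15 15" value in the Compose file has one entry per GPU, so it needs to change if your virtual machine has a different number of GPUs. The sketch below builds that string for a given GPU count and per-GPU limit. The function name gpu_memory_args is hypothetical, and counting GPUs with nvidia-smi -L is an assumption about your setup rather than something the original instructions prescribe.

```shell
# Hypothetical helper: print the per-GPU limit $2 repeated $1 times,
# separated by spaces, in the form expected after --gpu-memory.
gpu_memory_args() {
  local count="$1" gib="$2" i=0 out=""
  while [ "$i" -lt "$count" ]; do
    out="${out:+$out }$gib"   # add a separating space only after the first value
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Example calls:
#   gpu_memory_args 4 15
#   gpu_memory_args "$(nvidia-smi -L | wc -l)" 15   # assumes one line per GPU
```

Calling gpu_memory_args 4 15 reproduces the value used in the Compose file above.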
Update the requirements file
Open the "requirements.txt" file in the repository root and append the following lines to it:
--extra-index-url https://download.pytorch.org/whl/cu117
torchaudio
torch==1.13.1+cu117
torchvision==0.14.1+cu117
These lines pin PyTorch and torchvision to builds compiled for CUDA 11.7, matching the CUDA version installed in the Dockerfile, and add torchaudio. The extra index URL tells pip where to find those builds.
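If you prefer to script this step instead of editing the file by hand, the lines can be appended from the shell. The sketch below adds a line only when it is not already present, so running it twice does not create duplicates. The function name append_once is hypothetical, and the sketch assumes the file already ends with a newline.

```shell
# Hypothetical helper: append line $2 to file $1 unless an identical line
# is already there. The "--" stops grep from reading a line that starts
# with dashes as an option.
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls from the repository root:
#   append_once requirements.txt "--extra-index-url https://download.pytorch.org/whl/cu117"
#   append_once requirements.txt "torchaudio"
#   append_once requirements.txt "torch==1.13.1+cu117"
#   append_once requirements.txt "torchvision==0.14.1+cu117"
```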
Run the chatbot
Execute the following command to start the chatbot within the Docker container:
docker-compose up
The chatbot should now be running on your virtual machine. With the port mapping in the Docker Compose file, the web UI is served on port 8889 of the virtual machine.
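The first start can take a while because the image has to be built and the model loaded, so the web UI may not answer immediately. As a rough way to wait for it, the sketch below retries a command until it succeeds. The function name wait_until is hypothetical, and the curl probe in the example assumes the host port 8889 from the Docker Compose file.

```shell
# Hypothetical helper: run the command given after the first two arguments
# up to $1 times (at least once), sleeping $2 seconds between failed
# attempts. Returns 0 on the first success and 1 if every attempt fails.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    sleep "$delay"
    i=$((i + 1))
  done
}

# Example call: probe the web UI every 10 seconds, up to 30 times.
#   wait_until 30 10 curl -fsS -o /dev/null http://localhost:8889/
```

If the probe never succeeds, the output of docker-compose up in the other terminal is the first place to look for errors.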
Chatbot user interface
To learn about the various features of the text generation web UI, see the documentation in the project repository at https://github.com/oobabooga/text-generation-webui.
Chatbot UI documentation quick-links:
- Chat mode - Used to have multi-turn conversations with the model.
- Default and Notebook modes - Used to generate raw completions starting from your prompt.
- Parameters Tab - Contains parameters that control the text generation.
- Model Tab - This is the section where you can load models, apply LoRAs to a loaded model, and download new models.
- Training Tab - For training your own LoRAs.
- Session Tab - Used to restart the UI with new settings.