Running a Chatbot

This document provides step-by-step instructions on setting up and running a LLaMA chatbot on your Hyperstack virtual machine. By following these instructions, you can configure your environment, install the necessary tools, and deploy the chatbot.

Prerequisites

These instructions are for a Debian-based Linux virtual machine, like Ubuntu or Debian.
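
One way to confirm that a machine is Debian-based before continuing is to read /etc/os-release. The following is a minimal sketch and not part of the original instructions: the function name is_debian_based is hypothetical, and it assumes the file uses the standard ID and ID_LIKE fields.

```shell
# Hypothetical helper: succeeds if the given os-release file describes
# Debian or a Debian derivative such as Ubuntu.
is_debian_based() {
    file="${1:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Source the file in a subshell so its variables do not leak.
    (
        ID=""; ID_LIKE=""
        . "$file"
        case " $ID $ID_LIKE " in
            *" debian "*) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}

if is_debian_based; then
    echo "Debian-based system detected"
else
    echo "This guide assumes a Debian-based system"
fi
```

Ubuntu reports ID=ubuntu with ID_LIKE=debian, and Debian reports ID=debian, so both pass the check.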

Prepare the operating system

Open a terminal on your virtual machine, update the package list, and install the required packages (the Nvidia driver and utilities, Docker, and Git LFS):

sudo apt update
sudo apt install nvidia-utils-515 nvidia-driver-515 docker.io git-lfs
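
After the installation finishes, it can help to confirm that the expected commands are available. The sketch below is illustrative only: check_tools is a hypothetical helper, and the three command names are the ones these packages are assumed to provide.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Commands assumed to come from the packages installed above.
check_tools nvidia-smi docker git-lfs || echo "Install the missing packages before continuing."
```

If nvidia-smi is found but reports that it cannot communicate with the driver, a reboot of the virtual machine may be needed so that the new driver is loaded.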

Install Docker Compose

Install Docker Compose by running the following commands:

VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | grep -Po '"tag_name": "\K.*\d')
DESTINATION=/usr/local/bin/docker-compose

sudo curl -fL "https://github.com/docker/compose/releases/download/${VERSION}/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o "$DESTINATION"
sudo chmod +x "$DESTINATION"

These commands download and install Docker Compose, a tool for defining and running multi-container Docker applications. They look up the latest release tag through the GitHub API, download the matching binary to /usr/local/bin/docker-compose, and make it executable.
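
To see how the release tag is extracted, the grep expression can be tried on a single line of sample input. This is a small sketch: the tag value below is made up for illustration, and it assumes GNU grep, which supports the -P option.

```shell
# Sample response fragment; the tag value here is made up for illustration.
sample='  "tag_name": "v2.20.2",'

# \K discards the text matched so far, and .*\d extends the match to the
# last digit on the line, which leaves only the tag itself.
tag=$(printf '%s\n' "$sample" | grep -Po '"tag_name": "\K.*\d')
echo "$tag"   # prints v2.20.2
```

Once the binary is installed, running docker-compose --version should print the installed version.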

Install Nvidia Toolkit for Docker

Install and configure the Nvidia Toolkit for Docker with the following commands:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
         sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
         sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

These commands install and configure the Nvidia Toolkit for Docker. They give Docker containers access to Nvidia GPU resources by adding the toolkit's package repository, installing the toolkit, configuring Docker to use the Nvidia container runtime, and restarting Docker to apply the change.
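
A quick way to check the result is to look for the Nvidia runtime entry in the Docker daemon configuration. The sketch below is an assumption-laden example: has_nvidia_runtime is a hypothetical helper, and it assumes Docker reads its configuration from the default path /etc/docker/daemon.json.

```shell
# Hypothetical helper: succeeds if the given Docker daemon configuration
# file mentions an "nvidia" runtime entry.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "Docker is configured with the nvidia runtime"
else
    echo "No nvidia runtime found in the Docker daemon configuration"
fi
```

For an end-to-end check, running nvidia-smi inside a container, for example with sudo docker run --rm --gpus all ubuntu nvidia-smi, should list the GPUs of the virtual machine.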

Download the LLaMA Chatbot web user interface

Execute the following commands to download and install the web UI components necessary for the LLaMA chatbot.

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
mkdir installers
cd installers
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb

These commands download the web user interface (UI) for the LLaMA chatbot. They clone the project repository, create an "installers" directory inside it, and download the CUDA 11.7.1 local installer package that the Dockerfile installs later.

For additional information about the text generation web UI, see the project repository at https://github.com/oobabooga/text-generation-webui.

Download the chatbot model

Navigate to the "models" directory and clone the chatbot model by running the following commands:

cd ../models
git-lfs clone https://huggingface.co/decapoda-research/llama-13b-hf
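
If Git LFS was not active during the clone, the large weight files are left as small text pointers instead of the real data. The following sketch shows one way to detect that. The helper name is_lfs_pointer is hypothetical, and the model directory name is an assumption based on the repository cloned above.

```shell
# Hypothetical helper: succeeds if the file is a Git LFS pointer, meaning
# the real content has not been downloaded yet.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    # Read only the first bytes so large weight files are not scanned.
    first_bytes=$(head -c 64 "$1" | tr -d '\000')
    case "$first_bytes" in
        "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# List any pointer files left in the model directory (path is an assumption).
model_dir="llama-13b-hf"
if [ -d "$model_dir" ]; then
    for f in "$model_dir"/*; do
        if is_lfs_pointer "$f"; then echo "not downloaded: $f"; fi
    done
fi
```

Any file reported here can usually be fetched by running git lfs pull inside the model directory.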

Prepare Docker files

Return to the root of the repository and create a Dockerfile by running the following commands:

cd ..
nano Dockerfile

In the text editor that opens, paste the following content into the Dockerfile:

FROM python:3.10.6-slim-bullseye
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y git software-properties-common gnupg
COPY . /app
WORKDIR /app
RUN dpkg -i /app/installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb
RUN cp /var/cuda-repo-debian11-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
RUN add-apt-repository contrib
RUN apt-get update
RUN apt-get -y install cuda \
&& apt -y remove nvidia-* \
&& rm -rf /var/cuda-repo-debian11-11-7-local
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/google_translate/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/silero_tts/requirements.txt
CMD python server.py --auto-devices --cai-chat --load-in-8bit --bf16 --listen --listen-port=8888

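This Dockerfile builds an image from Python 3.10.6 on Debian Bullseye. It installs Git, installs CUDA 11.7 from the local installer package, installs the Python requirements for the web UI and two of its extensions, and starts the server on port 8888. The RUN --mount=type=cache lines need Docker BuildKit to be enabled for the build.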

Save and exit the text editor (press Ctrl + O, then Enter, and Ctrl + X).
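
Because the Dockerfile refers to the CUDA installer by its exact file name, a mismatch between that name and the downloaded file makes the build fail. The sketch below checks for that. It assumes it is run from the repository root, and installer_from_dockerfile is a hypothetical helper.

```shell
# Hypothetical helper: print the installers/ file name that the Dockerfile
# passes to dpkg -i, so it can be compared with what was downloaded.
installer_from_dockerfile() {
    sed -n 's#^RUN dpkg -i /app/installers/\(.*\.deb\)$#\1#p' "$1"
}

if [ -f Dockerfile ]; then
    deb=$(installer_from_dockerfile Dockerfile)
    if [ -z "$deb" ]; then
        echo "no dpkg -i line found in Dockerfile"
    elif [ -f "installers/$deb" ]; then
        echo "installer present: installers/$deb"
    else
        echo "installer missing: installers/$deb"
    fi
fi
```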

Create a Docker Compose file

Create a Docker Compose file by running the following command:

nano docker-compose.yml

In the text editor, paste the following content into the docker-compose.yml file:

version: "3.3"
services:
  text-generation-webui:
    build: .
    ports:
      - "8889:8888"
    stdin_open: true
    tty: true
    volumes:
      - .:/app
    command: python server.py --auto-devices --cai-chat --model "llama-13b-hf" --listen --listen-port=8888 --gpu-memory 15 15 15 15
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

This Docker Compose file sets up the environment for the LLaMA chatbot. It builds the image from the Dockerfile, publishes the web UI's container port 8888 on host port 8889, and gives the container access to all Nvidia GPUs. The --gpu-memory 15 15 15 15 option sets a limit of 15 GiB of memory on each of the 4 available GPUs.

Save and exit the text editor (press Ctrl + O, then Enter, and Ctrl + X).
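
On a virtual machine with a different number of GPUs, the --gpu-memory option needs one value per GPU. The sketch below builds that list. It is only an illustration: gpu_memory_args is a hypothetical helper, and the GPU count and per-GPU limit are passed in by hand.

```shell
# Hypothetical helper: print one memory limit per GPU, separated by spaces.
# $1 is the number of GPUs, $2 is the limit in GiB for each GPU.
gpu_memory_args() {
    count="$1"
    gib="$2"
    args=""
    i=0
    while [ "$i" -lt "$count" ]; do
        args="${args:+$args }$gib"
        i=$((i + 1))
    done
    echo "$args"
}

gpu_memory_args 4 15   # prints: 15 15 15 15
```

The GPU count can be read with nvidia-smi --list-gpus | wc -l, and the per-GPU limit should stay below the memory that each GPU actually has.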

Update the requirements file

Open the "requirements.txt" file in the root of the repository and append the following lines to it:

--extra-index-url https://download.pytorch.org/whl/cu117
torchaudio
torch==1.13.1+cu117
torchvision==0.14.1+cu117

These lines add the PyTorch package index for CUDA 11.7 builds and pin PyTorch and torchvision to versions built for CUDA 11.7, which matches the CUDA version installed by the Dockerfile.
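
The same change can be made from the command line instead of a text editor. The sketch below appends each line only if it is not already present, so it is safe to run more than once. It assumes it is run from the repository root and that requirements.txt ends with a newline. The helper append_once is hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    file="$1"
    line="$2"
    # -x matches whole lines, -F treats the line as a fixed string,
    # and -e keeps a leading "--" from being parsed as an option.
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

if [ -f requirements.txt ]; then
    append_once requirements.txt "--extra-index-url https://download.pytorch.org/whl/cu117"
    append_once requirements.txt "torchaudio"
    append_once requirements.txt "torch==1.13.1+cu117"
    append_once requirements.txt "torchvision==0.14.1+cu117"
fi
```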

Run the chatbot

From the root of the repository, execute the following command to build the image and start the chatbot within the Docker container:

docker-compose up

The chatbot should now be running on your virtual machine and serving its web UI on port 8889, as shown below.

Chatbot user interface

Chatbot UI image 1

Chatbot UI image 2
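
The address of the interface follows from the port mapping in the Docker Compose file, where the first number is the port on the virtual machine and the second is the port inside the container. The sketch below illustrates this with a hypothetical helper, host_port, and uses localhost as a placeholder host name.

```shell
# Hypothetical helper: print the host side of a "HOST:CONTAINER" port mapping.
host_port() {
    echo "${1%%:*}"
}

port=$(host_port "8889:8888")
echo "http://localhost:${port}"   # prints http://localhost:8889
```

To open the interface from another machine, replace localhost with the public IP address of the virtual machine, assuming its firewall rules allow inbound traffic on that port.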

To learn about the various features of the text generation web UI, see the documentation in the project repository.

Chatbot UI documentation quick-links:

Chat mode - Used to have multi-turn conversations with the model.
Default and Notebook modes - Used to generate raw completions starting from your prompt.
Parameters Tab - Contains parameters that control the text generation.
Model Tab - This is the section where you can load models, apply LoRAs to a loaded model, and download new models.
Training Tab - For training your own LoRAs.
Session Tab - Used to restart the UI with new settings.
