Running a Chatbot: How-to Guide

This document provides step-by-step instructions for setting up and running a LLaMA chatbot on your NexGen virtual machine. By following these instructions, you will configure your environment, install the necessary tools, and deploy the chatbot.


  • These instructions are for a Debian-based Linux virtual machine, such as Ubuntu or Debian.

Prepare the operating system

Open a terminal on your virtual machine, update the package list, and install the necessary packages, which include Docker, the Nvidia driver and utilities, and Git LFS:

sudo apt update
sudo apt install docker.io nvidia-utils-515 nvidia-driver-515 git-lfs

Install Docker Compose

Install Docker Compose by running the following commands:

VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | grep -Po '"tag_name": "\K.*\d')
DESTINATION=/usr/local/bin/docker-compose

sudo curl -L https://github.com/docker/compose/releases/download/${VERSION}/docker-compose-$(uname -s)-$(uname -m) -o $DESTINATION
sudo chmod +x $DESTINATION

These commands download and install Docker Compose, a tool used for defining and running multi-container Docker applications. They retrieve the latest version of Docker Compose, place it in a designated location, and make it executable.
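As an aside, the `grep -Po` pipeline above extracts the release tag from the JSON that the GitHub releases API returns. A minimal, self-contained illustration of that extraction (the version number in the sample is made up):

```shell
# Sample fragment in the style of the GitHub releases API response
# (the tag value here is invented for illustration).
api_sample='
  "tag_name": "v2.24.5",
  "draft": false,'

# Same extraction as in the install command above: \K discards the
# matched prefix, leaving only the tag value.
VERSION=$(printf '%s\n' "$api_sample" | grep -Po '"tag_name": "\K.*\d')
echo "$VERSION"   # v2.24.5
```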

Install Nvidia Toolkit for Docker

Configure Nvidia Toolkit for Docker with the following commands:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

These commands are responsible for installing and configuring the Nvidia Toolkit for Docker. They ensure that Docker containers have access to Nvidia GPU resources by setting up the necessary package repositories, installing the toolkit, and configuring the runtime environment.
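The `sed` step in particular rewrites each repository line so apt only trusts packages signed with the dearmored Nvidia key. A standalone sketch of that rewrite on one sample line (the exact repository path may differ on your distribution):

```shell
# A sample line in the style of libnvidia-container.list.
line='deb https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /'

# Apply the same rewrite used above: insert the signed-by option
# pointing at the keyring created by the gpg --dearmor step.
out=$(echo "$line" | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g')
echo "$out"
```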

Download the LLaMA Chatbot web user interface

Execute the following commands to download the web UI components necessary for the LLaMA chatbot and to fetch the CUDA installer used later by the Docker build:

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
mkdir installers
cd installers
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb

These commands download and set up the web user interface (UI) for the LLaMA chatbot. They clone the necessary files and components from a repository, create required directories, and fetch additional dependencies.

For additional information about the text generation web UI, see the project's documentation.

Download the chatbot model

Navigate to the "models" directory and clone the chatbot model by running the following commands:

cd ../models
git-lfs clone https://huggingface.co/decapoda-research/llama-7b-hf
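If Git LFS is not installed or initialized, the clone completes but leaves small text pointer files where the multi-gigabyte weight files should be. A quick, self-contained way to recognize one (the file below is a simulated pointer, not a real weight file):

```shell
# Simulate what a weight file looks like when LFS did not fetch the
# real content: a small text pointer instead of binary weights.
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:deadbeef\nsize 13476939516\n' > sample-weight.bin

# Real weights are large binaries; an LFS pointer always starts with
# this "version" header line.
if head -n 1 sample-weight.bin | grep -q '^version https://git-lfs'; then
  echo "pointer file detected: run 'git lfs pull' to fetch the real weights"
fi
```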

Prepare Docker files

  1. Create a Dockerfile by running the following command:

    nano Dockerfile
  2. In the text editor that opens, paste the following content into the Dockerfile:

    FROM python:3.10.6-slim-bullseye
    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update && apt-get install -y git software-properties-common gnupg
    COPY . /app
    WORKDIR /app
    RUN dpkg -i /app/installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb
    RUN cp /var/cuda-repo-debian11-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
    RUN add-apt-repository contrib
    RUN apt-get update
    RUN apt-get -y install cuda \
    && apt -y remove nvidia-* \
    && rm -rf /var/cuda-repo-debian11-11-7-local
    RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/requirements.txt
    RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/google_translate/requirements.txt
    RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/silero_tts/requirements.txt
    CMD python server.py --auto-devices --cai-chat --load-in-8bit --bf16 --listen --listen-port=8888

    This Dockerfile initializes an image based on Python 3.10.6 and Debian Bullseye, setting up essential packages, including Git and CUDA support.

  3. Save and exit the text editor (press Ctrl + O, then Enter, and Ctrl + X).
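Note that the `dpkg -i` line in the Dockerfile expects the CUDA repository package to already sit in the "installers" directory, and the build aborts at that step if it is missing. A small pre-build check along these lines can save a long wait (the `touch` merely simulates the downloaded installer so the sketch is self-contained):

```shell
DEB=installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb

# Simulate the downloaded installer for this illustration; in practice
# the real .deb must already be present in installers/.
mkdir -p installers && touch "$DEB"

# The docker build fails at the "dpkg -i" step if this file is absent,
# so verify it before starting a lengthy image build.
if [ -f "$DEB" ]; then
  echo "CUDA installer present; the image can be built"
else
  echo "CUDA installer missing from installers/" >&2
fi
```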

Create a Docker Compose file

  1. Create a Docker Compose file by running the following command:

    nano docker-compose.yml
  2. In the text editor, paste the following content into the docker-compose.yml file:

    version: "3.3"
    services:
      # The service name below is arbitrary.
      text-generation:
        build: .
        ports:
          - "8889:8888"
        stdin_open: true
        tty: true
        volumes:
          - .:/app
        command: python server.py --auto-devices --cai-chat --model "llama-7b-hf" --listen --listen-port=8888 --gpu-memory 15 15 15 15
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]

    This Docker Compose file sets up the environment for the LLaMA chatbot. It builds the image from the local Dockerfile, maps host port 8889 to the web UI's port 8888 inside the container, mounts the project directory at /app, and gives the container access to all available Nvidia GPUs. The "--gpu-memory 15 15 15 15" setting allocates 15GB of memory on each of the 4 available GPUs.

  3. Save and exit the text editor (press Ctrl + O, then Enter, and Ctrl + X).
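For orientation, the `--gpu-memory 15 15 15 15` flag in the command above sets a per-GPU budget, so the model may claim at most the sum of those values. A trivial sanity check of the total (values copied from the compose file):

```shell
# Per-GPU budgets taken from the --gpu-memory flag in docker-compose.yml.
total=0
for g in 15 15 15 15; do
  total=$((total + g))
done
echo "GPU memory budget: ${total}GB across 4 GPUs"   # 60GB in total
```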

Update the requirements file

If you have a "requirements.txt" file, open it and append the following lines to it:


These lines pin package versions, including PyTorch and related libraries, to ensure compatibility and enable the LLaMA chatbot to function as expected.

Run the chatbot

Execute the following command to start the chatbot within the Docker container:

docker-compose up

Chatbot user interface

Chatbot UI image 1

Chatbot UI image 2

To learn about the various features of the text generation web UI, see the project's documentation.