Introduction

The Hugging Face Inference API makes it easy to send prompts to large language models (LLMs) hosted on the Hugging Face Hub. By combining this with Gradio, you can quickly build interactive chatbots and demos with a simple, web-based user interface—without worrying about backend frameworks.

In this tutorial, you’ll learn how to integrate the Hugging Face Inference API into a Gradio app, securely store your API key, and create an interactive chatbot that handles user input and returns AI-generated text.

💡 Why use Gradio?

Gradio is a lightweight, open-source Python library that makes it easy to create interactive demos for machine learning models. It’s known for:

✅ Quick setup: Build and launch interactive interfaces in minutes.
✅ Customizable components: Supports text, audio, video, and more.
✅ Shareable links: Great for showcasing models with a public URL.
✅ Integration with popular ML frameworks: Including Hugging Face and TensorFlow.

💡 Combining Gradio with the Hugging Face Inference API allows you to rapidly prototype and share LLM-powered chatbots.
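To see just how little code Gradio needs, here is a minimal standalone sketch (a throwaway example, separate from the chatbot we build below) that wraps a plain Python function in a web UI:

import gradio as gr

# A trivial function to demonstrate how little code a Gradio UI needs
def greet(name):
    return f"Hello, {name}!"

# inputs/outputs shortcuts like "text" map to the corresponding components
gr.Interface(fn=greet, inputs="text", outputs="text").launch()

Running this script opens a local web page with a text box and a submit button.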

✅ Prerequisites

Before you start, make sure you have:

✅ A recent Python installation (current Gradio releases target Python 3.10 or later).
✅ A Hugging Face account and an access token (created under Settings → Access Tokens on the Hub).
✅ Basic familiarity with Python.

🛠️1️⃣ Install Libraries

First, install the required Python packages:

pip install huggingface_hub python-dotenv gradio

This will install everything you need to interact with the Hugging Face Inference API and build a Gradio app.

📦2️⃣ Import Packages

Create a file named app.py and import the required packages:

import os
from dotenv import load_dotenv
import gradio as gr
from huggingface_hub import InferenceClient

🔑3️⃣ Store Your API Key

To use the Hugging Face Inference API, you need to pass your access token as the API key. It’s important to keep the access token safe and out of version control. A recommended way to manage it locally is by using a .env file and the python-dotenv package.

Create a file named .env in the root of your project and add your token:

HF_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
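If your project is under Git (an assumption here), make sure this file is never committed by listing it in your .gitignore:

# .gitignore
.env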

In your Python code, load the key:

load_dotenv()
api_key = os.getenv("HF_API_TOKEN")

This approach makes your credentials easy to manage across different environments and keeps them secure.
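It also helps to fail fast if the variable is missing. This small optional check (not part of the original snippet) replaces a confusing downstream API error with a clear message:

# Stop early with a clear message instead of an opaque authentication error
if api_key is None:
    raise ValueError("HF_API_TOKEN is not set. Did you create a .env file?")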

🚀4️⃣ Initialize the Inference Client

Next, initialize the InferenceClient with your API key and select the desired inference provider:

client = InferenceClient(
    provider="sambanova",
    api_key=api_key,
)
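Note that "sambanova" is only one of several inference providers available through the Hub. Depending on your huggingface_hub version, you can also let Hugging Face route the request for you:

# Optional: let Hugging Face choose a provider automatically
# (supported in recent huggingface_hub releases; availability varies by model)
client = InferenceClient(provider="auto", api_key=api_key)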

💬5️⃣ Build the Gradio Chatbot

Now, define a function that sends the user's message, along with the conversation history, to the LLM and returns its response:

def chat_with_llm(prompt, history):
    try:
        # Prepend prior turns so the model sees the full conversation.
        # This assumes history arrives as OpenAI-style message dicts,
        # which gr.ChatInterface provides when type="messages" (see below).
        messages = history + [{"role": "user", "content": prompt}]
        response = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=messages,
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
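Wiring this function into a chat UI takes only a few lines. Here is a minimal sketch using gr.ChatInterface; the title is illustrative, and type="messages" matches the OpenAI-style history format assumed in chat_with_llm above:

demo = gr.ChatInterface(
    fn=chat_with_llm,
    type="messages",  # history arrives as [{"role": ..., "content": ...}] dicts
    title="Hugging Face LLM Chatbot",
)

demo.launch()  # pass share=True for a temporary public URL

Run the app with python app.py, open the local URL Gradio prints, and start chatting with the model.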
