📘 Introduction

When you build AI apps, some users ask the same question more than once. Your app may also send the same prompt during testing, retries, demos, or repeated workflows. If every identical prompt goes back to the model provider, you can waste time and money.

In this tutorial, you will learn how to cache LangChain chat model responses in Python. We will start with a normal ChatOpenAI call, then add an in-memory cache, and finally use a small SQLite cache file so cached responses can survive between script runs.

💡 Note: Caching is useful when the exact same prompt and model settings are used again. It is not the same thing as chat memory, RAG, or provider-side prompt caching.

💡 Why caching matters

A cache stores a previous result and reuses it when the same request appears again. In an LLM app, this can make repeated responses faster and reduce unnecessary API calls.

For example, imagine a learning app that explains the same concept to many users. If the prompt, model name, and model settings are identical, the app can reuse the cached answer instead of calling the model again.

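LangChain model caching is best for exact repeats. With the exact-match caches used in this tutorial, a prompt that differs by even one character, or a change to the model settings, is treated as a new request and goes back to the model.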
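
To make the exact-repeat idea concrete, here is a minimal sketch of an exact-match cache in plain Python. It is a toy illustration, not LangChain's implementation: `ExactMatchCache`, `fake_model`, and the `"demo-model"` name are hypothetical, and the stand-in model never calls an API.

```python
from typing import Callable, Dict, Tuple

# The cache key combines the prompt with the settings that affect the answer.
CacheKey = Tuple[str, str, float]


class ExactMatchCache:
    """Toy cache keyed on the exact prompt, model name, and temperature."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, str] = {}
        self.misses = 0  # counts how often the model had to be called

    def get_or_call(
        self,
        prompt: str,
        model: str,
        temperature: float,
        call_model: Callable[[str], str],
    ) -> str:
        key = (prompt, model, temperature)
        if key not in self._store:
            self.misses += 1
            self._store[key] = call_model(prompt)
        return self._store[key]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"Answer to: {prompt}"


cache = ExactMatchCache()
cache.get_or_call("What is caching?", "demo-model", 0.0, fake_model)   # miss
cache.get_or_call("What is caching?", "demo-model", 0.0, fake_model)   # hit
cache.get_or_call("What is caching? ", "demo-model", 0.0, fake_model)  # miss: trailing space
cache.get_or_call("What is caching?", "demo-model", 0.7, fake_model)   # miss: different setting
print(cache.misses)  # 3
```

Only the second call is served from the cache. The trailing space and the changed temperature each produce a different key, so the stand-in model is called again.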

✅ Prerequisites

Before getting started, make sure you have:

☑️ Python installed
☑️ Basic Python knowledge
☑️ An OpenAI API key
☑️ A terminal or command prompt

⚙️1️⃣ Create a project folder

Create a new project folder for this tutorial:

mkdir langchain-response-cache
cd langchain-response-cache

🐍2️⃣ Create a virtual environment

Create a virtual environment, then activate it. On macOS or Linux:

python -m venv .venv
source .venv/bin/activate

On Windows, activate it with:

.venv\Scripts\activate

📦3️⃣ Install libraries

Install the LangChain packages we need. The community package is used later for the SQLite cache example.

pip install langchain-core langchain-openai langchain-community
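
If you want to confirm the install worked before moving on, a small check like the following can help. This is an optional sketch: `missing_modules` is a hypothetical helper, and it only checks that the three import names can be found in the active environment, not that their versions are compatible.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the three packages installed above.
REQUIRED = ["langchain_core", "langchain_openai", "langchain_community"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, check that the virtual environment is activated and run the pip command again.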

🔐4️⃣ Set your API key

Set your OpenAI API key as an environment variable. Replace the placeholder with your own key and never commit real secrets to a repository. A variable set this way lasts only for the current terminal session. On macOS or Linux, run:

export OPENAI_API_KEY="your_api_key_here"

On Windows PowerShell, use:

$env:OPENAI_API_KEY="your_api_key_here"
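
A quick way to confirm the variable is visible to Python is sketched below. `has_api_key` is a hypothetical helper written for this tutorial. It only checks that the variable is set and is not the placeholder text; it does not verify that the key is valid.

```python
import os
from typing import Mapping


def has_api_key(env: Mapping[str, str], name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the variable is set to something other than the placeholder."""
    value = env.get(name, "").strip()
    return bool(value) and value != "your_api_key_here"


if __name__ == "__main__":
    if has_api_key(os.environ):
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing or still the placeholder.")
```

Run it in the same terminal where you set the variable, since a new terminal window will not see it.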

🧪5️⃣ Create a normal LangChain response script

First, create a normal script without caching. This gives us a baseline so the cache behavior is easier to understand.

Create a file named normal_response.py and add this code:

import time

from langchain_openai import ChatOpenAI

# temperature=0 keeps the output as consistent as possible between runs.
model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)
question = "Explain LangChain caching in two short sentences."

# Send the identical prompt twice and time each call.
for run_number in range(1, 3):
    start = time.perf_counter()
    response = model.invoke(question)  # no cache yet, so this calls the API
    elapsed = time.perf_counter() - start

    print(f"Run {run_number}: {elapsed:.2f} seconds")
    print(response.content)
    print("-" * 40)


Run the script:

python normal_response.py

Both runs call the model. The timing will vary, but the second run still needs to wait for a model response because no LangChain cache is active yet.
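
If you plan to repeat this measurement later, the timing loop can be pulled into a small helper. The sketch below is one possible shape, not part of LangChain: `time_calls` and `fake_model` are hypothetical names, and `fake_model` sleeps briefly to mimic network latency so the example runs without an API key.

```python
import time
from typing import Callable, List


def time_calls(call: Callable[[str], str], prompt: str, runs: int = 2) -> List[float]:
    """Call `call(prompt)` `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call(prompt)
        durations.append(time.perf_counter() - start)
    return durations


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to mimic latency."""
    time.sleep(0.05)
    return f"Answer to: {prompt}"


if __name__ == "__main__":
    for run_number, seconds in enumerate(time_calls(fake_model, "Explain caching."), start=1):
        print(f"Run {run_number}: {seconds:.2f} seconds")
```

To time the real model instead, you could pass `lambda p: model.invoke(p).content` as `call`. Without a cache, every run pays the full latency, which matches the baseline behavior described above.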

Next, we add LangChain caching, compare the results, persist responses with SQLite, and cover common troubleshooting issues.
