📘 Introduction

Most beginner LLM examples wait for the full response before printing anything. That is fine for small scripts, but it can feel slow in a real AI app. When the model writes a longer answer, users often expect to see the text appear step by step.

In this tutorial, you will learn how to stream LangChain responses in Python. We will first call a model with a normal request, then update the script so it prints response chunks as they arrive.

💡
Streaming does not make the model generate text any faster. The total time stays roughly the same, but the experience feels faster because the user can start reading before the full answer is complete.

💡 What are we implementing?

We will build this small workflow:

User prompt -> ChatOpenAI -> response chunks -> terminal output

A normal model call returns one complete message. A streaming call returns smaller chunks while the answer is being generated. Your application can print those chunks in the terminal, send them to a web app, or show them in a chat interface.
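To make the difference concrete before touching the API, here is a minimal sketch that simulates a stream with a plain Python generator. Nothing here calls LangChain or OpenAI: `fake_stream` and `measure` are hypothetical helpers invented for this illustration, and the delay is an arbitrary value. It shows that the first chunk arrives well before the full text is done.

```python
import time
from typing import Iterator, List


def fake_stream(text: str, delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a model: yield one word at a time with a pause."""
    for word in text.split(" "):
        time.sleep(delay)  # pretend the model is generating the next piece
        yield word + " "


def measure(chunks: Iterator[str]) -> dict:
    """Consume a stream and record when the first and last chunks arrived."""
    start = time.perf_counter()
    first_chunk_at = None
    parts: List[str] = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    return {
        "text": "".join(parts).strip(),
        "first_chunk_seconds": first_chunk_at,
        "total_seconds": time.perf_counter() - start,
    }


result = measure(fake_stream("Streaming lets users read while the model writes"))
print(result)
```

A non-streaming call would only hand you the text after roughly `total_seconds`. With streaming, the user sees something after `first_chunk_seconds`.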

✅ Prerequisites

Before getting started, make sure you have:

☑️ Python installed
☑️ Basic Python knowledge
☑️ An OpenAI API key
☑️ A terminal or command prompt

⚙️1️⃣ Create a project folder

Create a new local project folder for this tutorial:

mkdir langchain-streaming-responses
cd langchain-streaming-responses

🐍2️⃣ Create a virtual environment

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

On Windows, activate it with:

.venv\Scripts\activate

📦3️⃣ Install libraries

Install LangChain and the OpenAI integration package:

pip install -U langchain langchain-openai
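If you want to confirm that both packages landed in the active virtual environment, a small check like the one below can help. It is only a sketch that uses the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages this tutorial installs.
for name in ("langchain", "langchain-openai"):
    print(name, installed_version(name) or "NOT INSTALLED")
```

If either line prints NOT INSTALLED, check that the virtual environment is activated and run the install command again.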

🔐4️⃣ Set your API key

Set your OpenAI API key as an environment variable in the terminal session you will run the scripts from. Replace the placeholder with your own key and never commit real secrets to a repository. On macOS or Linux, use:

export OPENAI_API_KEY="your_api_key_here"

On Windows PowerShell, use:

$env:OPENAI_API_KEY="your_api_key_here"

🔐
Use environment variables for credentials. Do not paste private API keys directly into Python files.
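ChatOpenAI reads `OPENAI_API_KEY` from the environment on its own, so the scripts below never need the key in code. If you would like a clearer error when the variable is missing, a small guard such as this sketch is one option. The function name `get_api_key` and the error message are assumptions for illustration, not part of LangChain.

```python
import os
from typing import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment or fail with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it in this terminal before running the script."
        )
    return key
```

Calling `get_api_key()` at the top of a script makes it stop early with a readable message instead of failing inside the first model call.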

🧱5️⃣ Create a normal response script

First, create a file named normal_response.py. This version uses a regular model call and waits until the full answer is ready.

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)

response = model.invoke(
    "Explain why streaming responses are useful in AI apps in three short bullet points."
)

print(response.content)

normal_response.py

Run the script:

python normal_response.py

You should see the full response after the model finishes generating it. The wording will vary from run to run, but it should look something like this:

- Users can start reading before the full response is complete.
- Long answers feel faster and more interactive.
- Streaming is useful for chatbots, copilots, and AI assistants.

✍️6️⃣ Stream the response chunk by chunk

Now create a file named stream_response.py. This version uses the streaming interface and prints each chunk as it arrives.

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)

stream = model.stream(
    "Write a short beginner-friendly explanation of streaming in AI apps."
)

for chunk in stream:
    print(chunk.content, end="", flush=True)

print()

stream_response.py

Run it:

python stream_response.py

This time, the answer should appear piece by piece. The exact chunks can vary, but the important behavior is that the terminal starts printing before the complete message is finished.
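In a real app you usually want the complete text as well, for example to save it in chat history after it has been displayed. The sketch below prints chunks as they arrive and also returns the joined text. `print_and_collect` is a hypothetical helper, and it assumes each chunk exposes a string `.content` attribute, as in the script above. The demo uses fake chunks so it runs without an API key.

```python
import sys
from types import SimpleNamespace
from typing import Iterable, TextIO


def print_and_collect(chunks: Iterable, out: TextIO = sys.stdout) -> str:
    """Print each chunk's text as it arrives and return the full text at the end."""
    parts = []
    for chunk in chunks:
        text = chunk.content
        if not isinstance(text, str) or not text:
            continue  # skip empty chunks and any non-string content
        out.write(text)
        out.flush()  # push the text to the terminal immediately
        parts.append(text)
    out.write("\n")
    return "".join(parts)


# Demo with fake chunks standing in for what a model stream would yield.
demo_chunks = [SimpleNamespace(content=piece) for piece in ("Stream", "ing ", "", "works.")]
full_text = print_and_collect(demo_chunks)
```

To use it with the real model, pass the stream from stream_response.py as the argument, for example `print_and_collect(model.stream("..."))`.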

🎓
Want to go deeper? In the Academy section, we add async streaming, inspect usage metadata, and troubleshoot common streaming issues.

⚡7️⃣ Stream asynchronously with astream

For many web apps and backend services, async code is useful because it lets your application handle other work while waiting for model chunks. Create a file named async_stream_response.py.
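The original walkthrough for this step is not included here, so what follows is only a sketch of what async_stream_response.py could look like, not the author's version. It relies on the `astream` method that LangChain chat models expose, which is consumed with `async for`. The helper names `print_async_stream` and `fake_astream` are made up for this example, and the demo at the bottom uses fake chunks so it runs without an API key.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator


async def print_async_stream(chunks: AsyncIterator) -> str:
    """Print chunks from an async stream as they arrive and return the full text."""
    parts = []
    async for chunk in chunks:
        text = chunk.content
        if isinstance(text, str) and text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


async def fake_astream() -> AsyncIterator:
    """Hypothetical stand-in for model.astream(...), used only for the demo."""
    for piece in ("Async ", "streaming ", "works."):
        await asyncio.sleep(0.01)  # other tasks could run during this wait
        yield SimpleNamespace(content=piece)


async def main() -> str:
    """Real API call. Requires langchain-openai and OPENAI_API_KEY."""
    from langchain_openai import ChatOpenAI  # imported here so the demo runs without it

    model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)
    return await print_async_stream(
        model.astream("Write a short beginner-friendly explanation of streaming in AI apps.")
    )


# Demo run with fake chunks. To call the real model instead, use: asyncio.run(main())
demo_text = asyncio.run(print_async_stream(fake_astream()))
```

While the coroutine is waiting for the next chunk, the event loop is free to run other tasks, which is the main reason async streaming suits web backends.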
