Quickstart

Basic Usage

from ritellm import completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

response = completion(
    model="openai/gpt-3.5-turbo",
    messages=messages
)

print(response["choices"][0]["message"]["content"])
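
You can also pass optional generation parameters such as temperature and max_tokens alongside model and messages (both appear again in the Complete Example further down). A minimal sketch; the parameter values here are illustrative:

from ritellm import completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
]

# temperature and max_tokens are optional; values chosen for illustration
response = completion(
    model="openai/gpt-3.5-turbo",
    messages=messages,
    temperature=0.7,
    max_tokens=500
)

print(response["choices"][0]["message"]["content"])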

Streaming

To enable streaming, set stream=True:

from ritellm import completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short poem."}
]

response = completion(
    model="openai/gpt-3.5-turbo",
    messages=messages,
    stream=True  # Enable streaming
)

# Iterate over chunks as they arrive
for chunk in response:
    if "choices" in chunk and len(chunk["choices"]) > 0:
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content", "")
        if content:
            print(content, end="", flush=True)

print()  # New line after streaming completes

Response Format

Non-Streaming Response

When stream=False (default), you receive a complete response dictionary:

{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 10,
        "total_tokens": 30
    }
}
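
Because the return value is a plain dictionary, individual fields such as the message content, finish reason, and token usage can be read directly. A short sketch, assuming the structure shown above:

content = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
total_tokens = response["usage"]["total_tokens"]

print(f"Finished with '{finish_reason}' after {total_tokens} tokens:")
print(content)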

Streaming Response

When stream=True, you receive an iterator of chunk dictionaries:

# First chunk (typically contains the role and empty content)
{
    "id": "chatcmpl-...",
    "object": "chat.completion.chunk",
    "created": 1234567890,
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "delta": {
                "role": "assistant",
                "content": ""
            },
            "finish_reason": None
        }
    ]
}

# Content chunks
{
    "id": "chatcmpl-...",
    "object": "chat.completion.chunk",
    "created": 1234567890,
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": "Hello"
            },
            "finish_reason": None
        }
    ]
}

# Final chunk
{
    "id": "chatcmpl-...",
    "object": "chat.completion.chunk",
    "created": 1234567890,
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "delta": {},
            "finish_reason": "stop"
        }
    ]
}

Complete Example

Here's a complete example that handles streaming responses gracefully:

from ritellm import completion

def stream_completion(messages, model="openai/gpt-3.5-turbo"):
    """Stream a completion and print the response."""
    response = completion(
        model=model,
        messages=messages,
        stream=True,
        temperature=0.7,
        max_tokens=500
    )

    print("Assistant: ", end="", flush=True)
    full_response = ""

    for chunk in response:
        if "choices" not in chunk or len(chunk["choices"]) == 0:
            continue

        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        content = delta.get("content", "")

        if content:
            print(content, end="", flush=True)
            full_response += content

        # Check if streaming is complete
        if choice.get("finish_reason") == "stop":
            break

    print()  # New line
    return full_response


# Usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]

response_text = stream_completion(messages)
print(f"\n\nFull response length: {len(response_text)} characters")

Best Practices

  1. Always handle missing content: Not all chunks will have content, especially the first and last chunks.

content = chunk["choices"][0]["delta"].get("content", "")
if content:
    print(content, end="", flush=True)

  2. Use flush=True: When printing streaming content, pass flush=True so output appears immediately.

  3. Check finish_reason: Monitor the finish_reason field to know when streaming is complete.

  4. Error handling: Wrap streaming in a try-except block to handle network issues gracefully.

try:
    for chunk in response:
        # Process chunk
        pass
except Exception as e:
    print(f"\nStreaming error: {e}")

  5. Accumulate the response: If you need the full response text, accumulate it from the chunks:

full_text = ""
for chunk in response:
    content = chunk["choices"][0]["delta"].get("content", "")
    full_text += content

Supported Providers

Currently, streaming is supported for:

  • ✅ OpenAI (openai/ prefix)

More providers will be added in future releases.