Python

ritellm.completion(model, messages, temperature=None, max_tokens=None, base_url=None, stream=False, additional_params=None)

Clean Python wrapper around the completion_gateway function.

This function provides a convenient interface for calling various LLM providers' chat completion APIs through the Rust-backed completion_gateway binding. The model string must include a provider prefix, e.g. "openai/gpt-3.5-turbo".

Example

from ritellm import completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

# Non-streaming
response = completion(model="openai/gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])

# Streaming
response = completion(model="openai/gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk["choices"][0]["delta"].get("content", ""), end="")

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | The model to use, with provider prefix (e.g., "openai/gpt-4", "openai/gpt-3.5-turbo") | required |
| messages | list | A list of message dictionaries with "role" and "content" keys | required |
| temperature | float | Sampling temperature (0.0 to 2.0) | None |
| max_tokens | int | Maximum tokens to generate | None |
| base_url | str | Base URL for the API endpoint | None |
| stream | bool | Enable streaming responses | False |
| additional_params | str | Additional parameters as a JSON string | None |
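
A sketch of how the optional parameters compose. Since additional_params is documented only as "a JSON string", the payload is built with json.dumps; the "top_p" key inside it is an illustrative assumption about what the provider accepts, not a documented parameter:

import json
from ritellm import completion

messages = [{"role": "user", "content": "Summarize the Rust borrow checker in one sentence."}]

response = completion(
    model="openai/gpt-3.5-turbo",
    messages=messages,
    temperature=0.2,   # sampling temperature, 0.0 to 2.0
    max_tokens=64,     # cap on generated tokens
    additional_params=json.dumps({"top_p": 0.9}),  # illustrative extra parameter
)
print(response["choices"][0]["message"]["content"])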

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] or Iterator[dict[str, Any]] | A dictionary containing the API response, or an iterator of response chunks if stream=True |
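
When stream=True, each chunk carries an incremental "delta" payload, as the streaming example above shows; a minimal sketch of accumulating the chunks back into the full message:

from ritellm import completion

messages = [{"role": "user", "content": "Hello!"}]

parts = []
for chunk in completion(model="openai/gpt-3.5-turbo", messages=messages, stream=True):
    # a delta may or may not carry a "content" fragment
    parts.append(chunk["choices"][0]["delta"].get("content", ""))
full_text = "".join(parts)
print(full_text)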

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the provider prefix is not supported |

Environment Variables

OPENAI_API_KEY: Required for OpenAI models
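
A minimal sketch tying the two together: the key is checked in the environment before the call, and an unsupported provider prefix surfaces as ValueError (the "unknown/" prefix below is deliberately invalid):

import os
from ritellm import completion

assert "OPENAI_API_KEY" in os.environ, "export OPENAI_API_KEY before calling OpenAI models"

try:
    completion(model="unknown/some-model",
               messages=[{"role": "user", "content": "Hi"}])
except ValueError as err:
    print(f"Unsupported provider prefix: {err}")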

async ritellm.acompletion(model, messages, temperature=None, max_tokens=None, base_url=None, stream=False, additional_params=None)

Async Python wrapper around the async_completion_gateway function.

This function provides an async interface for calling various LLM providers' chat completion APIs through the Rust-backed async_completion_gateway binding. The model string must include a provider prefix, e.g. "openai/gpt-3.5-turbo".

Example

import asyncio
from ritellm import acompletion

async def main():
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]

    # Non-streaming
    response = await acompletion(model="openai/gpt-3.5-turbo", messages=messages)
    print(response["choices"][0]["message"]["content"])

    # Streaming
    response = await acompletion(model="openai/gpt-3.5-turbo", messages=messages, stream=True)
    for chunk in response:
        print(chunk["choices"][0]["delta"].get("content", ""), end="")

asyncio.run(main())

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | The model to use, with provider prefix (e.g., "openai/gpt-4", "openai/gpt-3.5-turbo") | required |
| messages | list | A list of message dictionaries with "role" and "content" keys | required |
| temperature | float | Sampling temperature (0.0 to 2.0) | None |
| max_tokens | int | Maximum tokens to generate | None |
| base_url | str | Base URL for the API endpoint | None |
| stream | bool | Enable streaming responses | False |
| additional_params | str | Additional parameters as a JSON string | None |

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] or Iterator[dict[str, Any]] | A dictionary containing the API response, or an iterator of response chunks if stream=True |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the provider prefix is not supported |

Environment Variables

OPENAI_API_KEY: Required for OpenAI models
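
Because acompletion is awaitable, independent requests can run concurrently; a sketch using asyncio.gather (the prompts are illustrative):

import asyncio
from ritellm import acompletion

async def ask(prompt):
    response = await acompletion(
        model="openai/gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

async def main():
    # both requests are in flight at the same time
    answers = await asyncio.gather(ask("What is PyO3?"), ask("What is Tokio?"))
    for answer in answers:
        print(answer)

asyncio.run(main())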