GPT python SDK introduces massive overhead / incorrect timeout

March 20, 2024

I’ve been using openai python packge v0.28.1 with the requests_timeout param which worked OK.
I then updated to the ^1. version only to find out that the timeout no longer works as expected (they have changed the param name from requests_timeout to timeout.

Here is an odd behavior with the current newest version (1.14.1):

from openai import OpenAI, APITimeoutError
import os

client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
)

for timeout in [0.001, 0.1, 1, 2]:
    with log_duration('openai query') as duration_context:
        try:
            response = client.chat.completions.create(  # type: ignore[call-overload]
                model="gpt-4-0125-preview",
                messages=[{'content': 'describe the universe in 10000 characters', 'role': 'system'}],
                temperature=0.0,
                max_tokens=450,
                top_p=1,
                timeout=timeout
            )
        except APITimeoutError as e:
            continue

log_duration just measure the time it takes. the result are :

2024-03-20 14:59:19 [info     ] openai query duration=2.805093 duration=2.8050930500030518 name=openai query
2024-03-20 14:59:22 [info     ] openai query duration=2.844164 duration=2.8441641330718994 name=openai query
2024-03-20 14:59:29 [info     ] openai query duration=6.396946 duration=6.396945953369141 name=openai query
2024-03-20 14:59:38 [info     ] openai query duration=9.387082 duration=9.387081861495972 name=openai query

which is way more then the timeouts. We have been getting a bunch of timeouts on our lambdas without understanding why as the timeout on openai is supposed to be so much lower.

what am I missing? is there such a big overhead in OpenAI’s >1 python SDK?

>Solution :

Requests that time out are retried twice by default, with a short exponential backoff.

You can use the max_retries option to configure or disable retry settings:

from openai import OpenAI

# Configure the default for all requests:
client = OpenAI(
    # default is 2
    max_retries=0,
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "How can I get the name of the current day in Node.js?",
        }
    ],
    model="gpt-3.5-turbo",
)