The OpenAI Create Chat Completion endpoint suffered a near-complete outage lasting almost two hours on Wednesday, 8 November 2023.
APImetrics makes two scheduled test calls to this endpoint every five minutes. One call is a simple query that returns a small payload and one is a more complex (and thus compute-intensive) call that returns a larger payload.
APImetrics observed the outage beginning at 5:43 PST on 8 November. The outage ended at 7:38 PST. The duration was 1 hour and 55 minutes.
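The duration quoted above, and the number of scheduled calls per test that fall inside the outage window, can be checked with a short Python sketch. The start and end times come from this report; the fixed UTC-8 offset for PST and the variable names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is modelled as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8))

# Start and end times as reported above (minute precision).
outage_start = datetime(2023, 11, 8, 5, 43, tzinfo=PST)
outage_end = datetime(2023, 11, 8, 7, 38, tzinfo=PST)

duration = outage_end - outage_start
minutes = int(duration.total_seconds() // 60)
hours, remainder = divmod(minutes, 60)
print(f"Outage duration: {hours} h {remainder} min")

# Each test is called every five minutes, so this many scheduled
# calls per test fall between the first failure and the first pass.
CALL_INTERVAL = timedelta(minutes=5)
calls_per_test = duration // CALL_INTERVAL
print(f"Scheduled calls per test in the window: {calls_per_test}")
```

At a five-minute cadence, a 115-minute window covers 23 scheduled calls per test.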
During the outage, we initially observed a mixture of HTTP status 502 Bad Gateway and 503 Service Unavailable responses.
From about 7:00 PST until the end of the outage, we observed HTTP status code 429 Too Many Requests responses.
We also observed one pass during the outage, at 6:43 PST. This was for the test call that returns a smaller payload and requires less compute.
We did not observe any service interruption for the OpenAI Get Models endpoint that we also monitor.
The outage is clearly seen in these graphs of the response size over time.
OpenAI Create Chat Completion (Short Response) test
OpenAI Create Chat Completion (Long Response) test
Detailed Breakdown of OpenAI Outage
OpenAI Create Chat Completion (Long Response) test
Outage began at 5:43:03 PST. We recorded 16 FAILs (a call is made every 5 minutes for each test): 12 HTTP 502 Bad Gateway responses and 4 HTTP 503 Service Temporarily Unavailable responses. Error messages were terse:
{
  "error": {
    "code": 502,
    "message": "Bad gateway.",
    "param": null,
    "type": "cf_bad_gateway"
  }
}
{
  "error": {
    "code": 503,
    "message": "Service Unavailable.",
    "param": null,
    "type": "cf_service_unavailable"
  }
}
From 7:03:43 PST, we received 6 WARNINGs with HTTP response 429 Too Many Requests and the warning message:
{
  "status": "come back later"
}
The first passing call was at 7:38:09 PST. The outage lasted 1 hour and 55 minutes.
The outage was immediately preceded by two SLOW responses.
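The response bodies quoted above for this test come in two shapes: the 502 and 503 bodies wrap their details in an "error" object, while the 429 body carries only a "status" field. A minimal Python sketch of how a monitoring script might label them follows. The function name `error_label` and the `status:` prefix are hypothetical choices for illustration and do not describe how APImetrics processes responses.

```python
import json
from typing import Optional


def error_label(body: str) -> Optional[str]:
    """Return a short label for a response body, or None if unrecognised.

    Hypothetical helper: handles only the two body shapes quoted in this report.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # body is not JSON at all
    if not isinstance(payload, dict):
        return None

    error = payload.get("error")
    if isinstance(error, dict):
        # 502/503 shape: {"error": {"code": ..., "type": ...}}
        return error.get("type")
    if "status" in payload:
        # 429 shape: {"status": "come back later"}
        return f"status:{payload['status']}"
    return None


bad_gateway = (
    '{"error": {"code": 502, "message": "Bad gateway.", '
    '"param": null, "type": "cf_bad_gateway"}}'
)
rate_limited = '{"status": "come back later"}'

print(error_label(bad_gateway))
print(error_label(rate_limited))
```

Labelling on the body as well as the status code helps separate gateway-level errors (the `cf_` types) from the rate-limit response seen later in the outage.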
OpenAI Create Chat Completion (Short Response) test
Outage began at 5:43:30 PST. We recorded 12 FAILs (5 502s and 7 503s), then one pass at 6:43:30 PST, followed by two more FAILs (both 502s) at 6:48:30 PST and 6:53:30 PST. We then received 7 WARNINGs (429s) from 6:58:28 to 7:28:28 PST and one FAIL at 7:33:38 PST (a Connection Error with no HTTP status code returned). Calls passed again from 7:38:37 PST.
The outage lasted 1 hour and 55 minutes, but included one passing call that returned the correct information. For this test, the outage was not preceded by any SLOW responses.
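The per-call outcomes described above for the Short Response test can be tallied to confirm that they account for every scheduled call in the outage window. This Python sketch only restates the counts given in this section; the list layout and variable names are illustrative.

```python
from collections import Counter

# Outcome of each Short Response call during the outage, in call order,
# using the counts reported in this section.
short_test_outcomes = (
    ["FAIL"] * 12      # 502s and 503s starting at 5:43:30 PST
    + ["PASS"]         # single pass at 6:43:30 PST
    + ["FAIL"] * 2     # 502s at 6:48:30 and 6:53:30 PST
    + ["WARNING"] * 7  # 429s from 6:58:28 to 7:28:28 PST
    + ["FAIL"]         # Connection Error at 7:33:38 PST
)

counts = Counter(short_test_outcomes)
total_calls = len(short_test_outcomes)

print(f"Calls during outage: {total_calls}")
for outcome in ("FAIL", "WARNING", "PASS"):
    print(f"{outcome}: {counts[outcome]}")
```

The total of 23 calls matches a five-minute cadence from 5:43 to 7:33 PST, with the first sustained pass on the following call at 7:38 PST.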
OpenAI Get Models endpoint
This endpoint was unaffected by the outage.
24-Hour performance summary
Across 900 calls for all three tests from 2023-11-07 16:13 PST to 2023-11-08 16:13 PST, we observed 8 Connection Errors and 5 HTTP status 522 Connection Timed Out errors from Cloudflare (2 at 2023-11-07 19:23 PST and 3 at 2023-11-08 11:58 PST). All of the 503 Service Unavailable responses occurred during the outage period.
We observed two 502 Bad Gateway responses before the outage and three after the outage (all for the OpenAI Get Models endpoint). All of the 429 Too Many Requests WARNINGs occurred during the outage period.