THE CLOUD FOR APIS: 2022 IN REVIEW
The Definitive Annual Cloud API Performance Industry Report from the team, ecosystem, and data behind APIContext
Data Scientist @ APIContext
Introduction
In 2022, APIContext made over a billion API calls to more than 8,400 different API endpoints from 70 geographically diverse cloud data centers across AWS, Azure, Google, and IBM. This report builds on the work of our previous reports; the 2021 Cloud API Performance report can be found here.
In this report, we build on the unique and ever-expanding API dataset generated by the APIContext platform to establish an unbiased, industry-wide baseline for API quality scoring.
The APIContext Cloud API Performance report 2022 focuses on data from leading API services, including those from prominent corporate infrastructure providers, financial services institutions, social networks, and search engines.
Five Key Findings for 2022
- Overall, API performance and availability improved, with six APIs achieving 99.99% service availability (compared with just two in 2021).
- Service quality also improved, with 70% of providers achieving a Cloud API Service Consistency (CASC) score of 9.00+, indicating very good performance.
  - 28% of providers had a CASC score between 8.00 and 8.99, indicating good performance.
  - Only 2% had a CASC score between 7.00 and 7.99, indicating an API in need of investigation.
- AWS had the best network performance of the major clouds.
- Only AWS showed a decrease in DNS Lookup Time; Azure and IBM Cloud showed significant increases.
- For the fourth year in a row, AWS US-East (N. Virginia) was the fastest cloud location for Time to Connect, with an average time of 1.23 ms, showing that many solutions continue to be hosted there and highlighting a potential concentration risk for cloud services.
Summary
Cloud API performance has been stable over 2020-2022, albeit with evidence of deterioration in certain aspects of performance, particularly DNS Lookup Time. This period has likely been impacted by COVID-19 and geopolitical uncertainties, and performance does still vary between clouds, regions, and locations.
QUALITY IS STABLE: Most services are rated as excellent, with a CASC (quality) score of 9.00 or more. Overall quality is similar to 2020 and 2021, suggesting that improvements in performance may be plateauing. There is no excuse for not having a highly stable and consistently performant API.
AVAILABILITY: Five 9s is a tough target, but 99.99% is a goal that should be achievable for most APIs, and six of the services studied reached this level, up from two in 2021. This indicates an area of future focus for quality improvements.
CLOUD PERFORMANCE: We see significant differences in performance between clouds. In 2022, Azure was consistently >70 ms slower than AWS. In an API-first economy where every millisecond increasingly counts, can you and your customers afford to be using a slow cloud?
As an integrator with an API-based service, you should consider where best to integrate from to maximize performance. As a provider, you should offer insight and resources for your customers and partners on architectural choices as part of your standard developer information.
DNS DETERIORATION: DNS resolution times have slowed across all clouds except AWS and all regions except South America. If AWS can have a median DNS Time of 3 ms, so can the other clouds.
In 2023, we want to see DNS Times getting better again across all clouds and regions.
ABSOLUTE REGIONAL DIFFERENCES HAVE DECREASED: In our first report, covering 2017, there was a 10x difference between South America and Europe in median TCP Connect Time; South America is now 40% quicker than Europe and over 3x faster than North America.
This is excellent and shows that API and cloud providers continue to strive for improvement.
That said, you will still pay geographic penalties depending on how the APIs you call are architected and where they are hosted.
Interestingly, the relative regional differences have remained more or less the same over the six-year period. Oceania and South Asia are still ~3x slower than North America for Total Time, even though Total Time is down across the board; only Europe has significantly improved relative to North America.
Many services are hosted in North America, and further performance improvements would be helped by multi-hosting services in additional regions to reduce the distance from the end user and thus the download time.
The API Supply Chain
API performance is good across a wide range of popular services, with more services delivering better availability than in 2021.
But the problem of the API Supply Chain, as John Musser calls it, remains significant. There are meaningful geographic differences, such as physical distances across oceans and continents; and cloud performance variations, such as the amount of bandwidth available through fiber optic cables and the capacity of network equipment. DNS Lookup Times, which have always been a problem, seem to be getting worse.
Using an API is not just relying on a black box. The API you provide or use exists in a universe of components, including its own cloud service, a CDN provider, probably a gateway, a backend server architecture, and potentially a security and identity service. And each of those components has its own configuration and cloud dependencies. Poorly performing APIs need up to 2.5x as much engineering work as well-performing ones, and over the course of a year a single poor API endpoint could, all by itself, be costing tens of thousands of dollars in lost engineering resources. Across the industry, we calculate this could be costing OVER $90 BILLION a year.
What Is a CASC Score?
There are many metrics that can be used to understand an API's performance profile, but the complexities can lead to confusion. Cloud API Service Consistency (CASC) scoring provides a single score, out of 10, that shows how well any API is functioning beyond the simple pass rate. Think of the CASC score as a combination of an API speed test and a credit rating. We take all the metrics we have for your API, blend them together, then compare the result against our unrivaled dataset of historical API speed test records. This gives a single number benchmarked against all the APIs observed by APImetrics.
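The exact CASC formula is patented and proprietary, but the underlying idea of blending metrics and benchmarking against a historical distribution can be sketched. The following is purely illustrative, with hypothetical weights and a toy benchmark set, and is not the actual CASC calculation:

```python
# Purely illustrative: the real CASC formula is patented and proprietary.
# Idea: blend pass rate with how the API's latency ranks against a
# historical benchmark distribution, then scale to a 0-10 score.
from bisect import bisect_left

def percentile_rank(value: float, benchmark: list[float]) -> float:
    """Fraction of historical observations strictly below `value` (0.0-1.0)."""
    ordered = sorted(benchmark)
    return bisect_left(ordered, value) / len(ordered)

def blended_score(pass_rate: float, median_latency_ms: float,
                  latency_benchmark: list[float]) -> float:
    """Hypothetical 0-10 score: a weighted mix of pass rate and latency
    rank (ranking below the benchmark earns a higher score)."""
    latency_quality = 1.0 - percentile_rank(median_latency_ms, latency_benchmark)
    return round(10.0 * (0.6 * pass_rate + 0.4 * latency_quality), 2)

# Example: a 99.9% pass rate and 180 ms median latency against a small
# hypothetical benchmark set of historical median latencies (in ms).
history = [90, 120, 150, 200, 250, 400, 800]
print(blended_score(0.999, 180.0, history))  # -> 8.28
```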
DNS Is STILL a Problem
DNS Lookup Time has increased across most cloud services and regions, suggesting there are issues with network infrastructure and configuration for Azure, Google, and IBM Cloud, particularly in Europe and North America.
For example, in 2022, DocuSign had a DNS time of 199.35 ms and Capital One 339.2 ms, compared to an average of 10.05 ms. In 2021, DocuSign was the slowest at 252.8 ms against an average of 16.975 ms, while Capital One was at 143.75 ms.
From our analysis, most services use a CDN provider. However, a CDN doesn't just work out of the box; it needs to be configured. If you use or provide such services, this is an area to focus on.
Top Achievers
From January 2022 – December 2022, PagerDuty had 21.9 minutes of measurable downtime on their APIs. (The best performer in 2021 was DocuSign with 26.5 minutes of downtime). Congratulations to the DevOps team and everyone else at PagerDuty involved!
The next-best performer had a failure equivalent of just 31.6 minutes of downtime over 12 months, which is still a pretty good effort. To put that in perspective, the worst-performing API had over 3.2 days of downtime. In other words, if an API is being exercised at an average rate of 50 calls per second, that would mean more than 13 million attempted calls were lost.
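The arithmetic behind that estimate is straightforward:

```python
# Worked version of the arithmetic above: 3.2 days of downtime at an
# average rate of 50 calls per second.
downtime_seconds = 3.2 * 24 * 60 * 60   # 276,480 s
lost_calls = downtime_seconds * 50      # calls that would have failed
print(f"{lost_calls:,.0f}")             # 13,824,000 -> "more than 13 million"
```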
Look For Yourself
High-level data for 2022 is provided for free in the API Directory and Ranking Lists at https://apicontext.com/api-directory/. If you would like to dive deeper into the details, please drop us a line for licensing access.
Methodology
API calls were made on a regularly scheduled basis from 70 data centers around the world using APImetrics observer agents running on application servers provided by the cloud computing services of Amazon (AWS), Google, IBM Cloud, and Microsoft (Azure).
The sample sizes for each API called are roughly the same and are equivalent to a call from each cloud location made to each endpoint every five minutes throughout the year.
We logged the data using our API performance and quality monitoring and management platform. Latency, pass rates, and quality scores were recorded in the same way for all APIs. For most APIs, data is available for the whole of the period.
For analysis, we have grouped the APIs and endpoints we monitored into the services they represented.
Pass Rates
In calculating the pass rate, we define failures to include the following:
- 5xx server-side errors
- Network errors in which no response is returned
- Content errors where the API did not return the correct content, e.g. an empty JSON body or incorrect data returned
- Slow errors in which a response is received only after an exceptionally long period
- Redirect errors in which a 3xx redirect HTTP status code is returned
We ignored call-specific application errors, such as issues with the returned content, and client-side HTTP status code 4xx warnings caused by authentication problems such as expired tokens.
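As a rough illustration of the pass/fail rules above, a minimal classifier might look like the following sketch. The field names, the 30-second "slow" threshold, and the result structure are assumptions for illustration, not the actual APImetrics implementation:

```python
# Minimal sketch of the pass/fail rules above; field names and the
# slow-call threshold are hypothetical assumptions for illustration.
from dataclasses import dataclass

SLOW_THRESHOLD_S = 30.0  # assumed cutoff for an "exceptionally long" response

@dataclass
class CallResult:
    status: int | None   # HTTP status, or None if no response was returned
    elapsed_s: float     # total call duration in seconds
    body_valid: bool     # content checks passed (e.g. non-empty, correct JSON)

def is_failure(r: CallResult) -> bool:
    if r.status is None:                # network error: no response returned
        return True
    if 500 <= r.status < 600:           # 5xx server-side error
        return True
    if 300 <= r.status < 400:           # 3xx redirect counted as failure
        return True
    if 400 <= r.status < 500:           # 4xx client-side warnings are ignored
        return False
    if not r.body_valid:                # empty or incorrect content returned
        return True
    if r.elapsed_s > SLOW_THRESHOLD_S:  # slow error
        return True
    return False
```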
If an API call fails, an immediate retry may succeed when the outage is transitory. Even so, our methodology gives a general indication of availability issues.
n-9s Reliability
The traditional telecommunications standard for service availability is five 9s: at least 99.999% uptime, or just over five minutes of downtime in a year. Of the 34 services analyzed in this study, no API achieved five 9s of service availability. Six services achieved four 9s, up from two in 2021.
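For reference, an availability percentage converts to allowed downtime per year as follows:

```python
# Converting an availability target into allowed downtime per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

for label, availability in [("three 9s", 0.999), ("four 9s", 0.9999),
                            ("five 9s", 0.99999)]:
    downtime_min = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{label}: {downtime_min:,.1f} minutes of downtime per year")
# three 9s: ~525.6 min, four 9s: ~52.6 min, five 9s: ~5.3 min
```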
Table 1: Number of APIs by service availability
In 2022, 18% of major corporate services measured scored less than three 9s. There was a nearly 18-hour difference in unscheduled downtime observed between two leading file management services, compared to just 28 minutes between them in 2021.
Quality
APImetrics uses CASC, our patented quality scoring system, to compare the quality of different APIs. CASC (Cloud API Service Consistency) blends multiple factors to derive a “credit rating” for an API, benchmarked against our unmatched historical dataset of API test call records.
- Scores over 9.00 are evidence of exceptional quality of operation.
- Scores over 8.00 indicate a healthy, well-functioning API that will give few problems to users.
- Scores between 6.00 and 8.00 indicate some significant issues that will lead to a degraded user experience and increased engineering support costs.
- A CASC score below 6.00 is considered poor, and urgent attention is required.
It is important to note that CASC scores do not fall on a normal curve. The scores are absolute, and we see no engineering reasons why prominent APIs should not consistently reach an 8.00+ CASC score.
Most services studied are of very good quality, with CASC scores of 9.00 or over, indicating excellent performance over 2020-2022.
API performance in 2022 was very similar to 2020. In both years, only 4% of services had CASC scores between 7.00 and 7.99, indicating significant issues that need attention. The consistency of quality over three years might indicate that a plateau in API performance has been reached, or that investment in infrastructure has been curtailed by uncertainties around the global COVID-19 pandemic and the geopolitical situation.
Latency
Some calls will be faster than others because of the nature of the backend processing involved, so total call duration, even over a sample size of tens of millions of calls, can only give a partial view of the behavior of the APIs.
In 2022, we added a huge number of fast calls for the Serinus Cloud Watch service. This had the effect of reducing the average Total Time, but it does not affect the DNS Time, TCP Connect Time, or SSL Handshake Time, which are determined by the cloud, region, and location rather than by the call being made. Those three components make up the network setup of the call and are the same for all calls (they vary with cloud, region, and location and over time, but not by type of call).
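These network-setup components can be observed in most HTTP clients. As a minimal illustration (not the APImetrics observer agent), the sketch below uses pycurl's per-phase timers; note that libcurl reports each timer cumulatively from the start of the request:

```python
# Illustrative breakdown of a single call's latency components using
# pycurl. Each timer is cumulative from the start of the request.
import io
import pycurl

def timing_breakdown(url: str) -> dict[str, float]:
    buf = io.BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, buf)  # discard the body into a buffer
    c.perform()
    t = {
        "dns_ms":   c.getinfo(pycurl.NAMELOOKUP_TIME) * 1000,
        "tcp_ms":   c.getinfo(pycurl.CONNECT_TIME) * 1000,        # incl. DNS
        "tls_ms":   c.getinfo(pycurl.APPCONNECT_TIME) * 1000,     # incl. TCP
        "ttfb_ms":  c.getinfo(pycurl.STARTTRANSFER_TIME) * 1000,  # first byte
        "total_ms": c.getinfo(pycurl.TOTAL_TIME) * 1000,
    }
    c.close()
    return t

print(timing_breakdown("https://apicontext.com/"))
```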
The average Total Time has gone down, but the trends between clouds and regions remain. Remember, we are working with a snapshot of the data we have, from which we can extract broad trends; we are not claiming that everything is like-for-like over a period of six years. We can only really do that for UK Open Banking.
- AWS was the fastest cloud by median Total Time from mid-2021 through the whole of 2022.
- Azure has been consistently the slowest cloud by median time since 2018.
- AWS’s median DNS Lookup Time has continued to fall and is now about 3 ms.
- The DNS Lookup Times for Azure and IBM Cloud have increased from mid-2021, bucking the long-term trend and also no longer demonstrating the marked quantized behavior seen since mid-2018. This suggests that there was some fundamental change made by Azure and IBM Cloud in 2021 to the configuration of their DNS services.
- Google's DNS Lookup Time has only increased slightly in this period but, like Azure and IBM Cloud, shows more variance and is no longer as obviously quantized, suggesting that they too made configuration changes.
- DNS Lookup Time by region also showed a slight increase in 2022 and more variance, which reflects the changes made by Azure, Google, and IBM Cloud.
DNS Matters
From mid-2021, DNS lookup times increased whether measured by cloud (except for AWS) or by region. They have also shown more short-term variation and, in most cases, are less obviously quantized than in the three-year period from mid-2018 through mid-2021.
South America was the fastest region for median DNS Lookup Time in 2022, at ~8 ms (down from ~9 ms in 2021), and South Asia was the slowest at ~16 ms (up from 10 ms in 2021). Europe was the fastest region in 2021 at ~6 ms but rose to ~9 ms in 2022.
South America was the only region to have faster DNS in 2022 than in 2021. Given that faster performance has been achieved in the past, this is an area of focus for all cloud network engineering teams, who should monitor DNS times on an ongoing basis and investigate whenever lookup times from any region or location become longer or less stable.
Recommendations
- Actively monitor the availability and latency of all APIs you expose and consume. If you’re not, you don’t know how your APIs are performing right now for users in the real world. And if you’re not actively monitoring your APIs, you’re not managing them.
- Benchmark the performance and quality of your APIs against those of your peers/competitors. Because you really don’t want to discover that they are so much better than you.
- Know the differences between cloud locations and user locations. Your service might be hosted in Virginia, but your users might be in Vienna and Vietnam. Make sure your choice of cloud isn't wasting valuable milliseconds for your users.
- Not all clouds are the same, and they change over time. 70 ms or more of latency can come down to your choice of cloud. Your API users shouldn't suffer tens or hundreds of milliseconds of latency just because of a decision made years ago.
- Ensure no specific issues affect the DNS Lookup Time for your domain (it should be 12 ms or less; a quick spot check is sketched after this list). DNS should always be fast. If it isn't, do something about it, because slow DNS is just money down the drain!
- Understand what factors impact call latency and where to focus improvements. What’s the latency component most impacting user experience, and what can you do to improve it?
- Track performance outliers and determine their causes. Slow outliers can greatly impact user experience. Are some calls taking 30 seconds or more to complete? How can you stop that?
- Be aware of the impact of API failures and errors on user experience and business costs. All API performance and quality issues cost money. Bad APIs mean lost customers. Can your organization afford not to have the best possible APIs providing the best possible user experience?
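As promised above, here is a quick, standard-library-only spot check of DNS Lookup Time for a domain. It exercises the operating system's resolver, so resolver caching will make repeat lookups unrealistically fast; the first (cold) sample is usually the interesting one:

```python
# Stdlib-only spot check of DNS lookup time. This measures the system
# resolver, so OS/resolver caching makes repeat lookups unrealistically
# fast -- the first (cold) sample is usually the interesting one.
import socket
import time

def dns_lookup_ms(host: str) -> float:
    start = time.perf_counter()
    socket.getaddrinfo(host, 443)  # resolve only; no connection is made
    return (time.perf_counter() - start) * 1000

samples = [dns_lookup_ms("apicontext.com") for _ in range(5)]
print(", ".join(f"{ms:.1f} ms" for ms in samples))
```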
Appendix - Detailed Findings
Failure Rates By Cloud/Region
The graph below shows the overall failure rate, excluding 4xx client-side warnings.
Failures here include 5xx server-side errors, network errors in which no response is returned, slow errors in which a response is received after an exceptionally long period, and redirect errors in which a 3xx redirect HTTP status code is returned.
Figure A1: Overall failure rate by cloud, 2019-2022
Over the four-year period 2019-2022, the overall failure rate is nearly identical for all clouds.
Figure A2: Overall failure rate by region, 2019-2022
This is even more evident for regions, with very little difference in failure rates between regions in 2022.
Latency Data Per Cloud/Region
The variation in median Total Time shows a roughly similar pattern between clouds. This is to be expected, as the variation is primarily driven by changes on the server side for API providers. (The median is a better metric here as it excludes the effect of outliers.)
The fall in median latency at the beginning of 2022 is owing to the introduction of the Serinus Cloud Watch tests. These tests are both fast and high frequency, so they have the effect of reducing average Total Time (Time to First Byte is a generally representative measure of server-side processing time, and the Serinus Cloud Watch tests have reduced the average value of this component).
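A tiny example shows why the median is preferred: a single slow outlier moves the mean dramatically while leaving the median almost untouched, and the same robustness applies in reverse when a flood of very fast calls is added:

```python
# One 30-second outlier among otherwise ~125 ms calls: the mean balloons,
# the median barely moves.
from statistics import mean, median

latencies_ms = [120, 122, 125, 128, 130, 30_000]
print(f"mean:   {mean(latencies_ms):,.1f} ms")   # ~5,104.2 ms
print(f"median: {median(latencies_ms):.1f} ms")  # 126.5 ms
```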
Figure A3: Median Total Time by cloud, 2019-2022
The tables below show the median Total Times, DNS Times, and Connect Times for the four clouds for the period 2017-2022. Median Connect Time has decreased for all four clouds over the last three years (2020-2022), but DNS Time stayed the same or increased for three clouds; only AWS managed a close-to-optimal median DNS Time in 2022 at 3 ms, down from 4 ms in 2021.
Figure A4: Median DNS Time by cloud, 2019-2022
From mid-2018 to mid-2021, DNS Time was quantized for all four clouds, but improvements to the APImetrics observer network implemented in 2021 have allowed for a more granular analysis.
Azure and IBM Cloud in particular show marked increases in DNS Time. Google's increases only slightly, while best-in-class AWS manages a decrease.
Median Total Time by Cloud
Figure A5: Comparison of Median Total Time by year by cloud, 2019-2022
Median DNS Time by Cloud
Figure A6: Comparison of Median DNS Time by year by cloud, 2019-2022
Median Connect Time by Cloud
Figure A7: Comparison of Median Connect Time by cloud, 2019-2022
Figure A8: Median DNS Time by region, 2019-2022
Of the six regions, Europe and North America have consistently been the fastest over the four-year period, likely because of a combination of relative geographic compactness and infrastructure investment. In 2022, East Asia and South America comprised an intermediate group, with Oceania and South Asia a slow group. All regions showed a decrease in Connect Time in 2022 compared to 2021, although there is likely still room for improvement for Oceania and South America locations. Only South America decreased DNS Time in 2022 (Oceania was constant) while the other regions increased, so this is clearly an area where improvements are possible to return to the optimal performance seen in 2021.
Figure A9: Median DNS Time by region, 2019-2022
We see the same behavior as noted for clouds: a slight upward trend in DNS Time from mid-2021, with the clear quantized pattern seen from mid-2018 no longer evident.
Median Total Time by Region
Figure A10: Comparison of Median Total Time by year by region, 2019-2022
Median DNS Time by Region
Figure A11: Comparison of Median DNS Time by year by region, 2019-2022
Median Connect Time by Region
Figure A12: Comparison of Median Connect Time by year by region, 2019-2022
Figure A13: Median DNS Time by region by year, relative to Oceania, January 2019 – October 2022
Glossary of Terms
APImetrics Has You Covered
APImetrics provides run-time API governance solutions for organizations offering API services across the Financial Services, Open Banking, Telecoms, Software, and IoT sectors. By enabling a holistic, end-to-end view of performance, quality, and functional issues across the API surface, we allow organizations to better serve their customers and end users.
Our patented technology automates the process of producing regulator-ready reports for financial services providers around the world.
Our active monitoring platform integrates with many of the leading developer operations suites and provides an API-centric view of:
- Real-time API performance from more than 80 locations worldwide on four clouds and six continents
- Fully integrated security monitoring designed and built for the needs of the financial services industry
- Machine learning based analysis driven by a database of more than a billion real API calls
- Integrated reporting, analysis, and alerting
- 360-degree visibility with Cloud API Service Consistency (CASC) scoring, allowing for at-a-glance service and competitor comparisons
Contact Us
Learn how we bring best practices and open standards to API monitoring with integrated API workflows, performance assurance, and conformance analysis.