
API Monitoring – the basics

With the average enterprise now using over 1,000 APIs, why are we still being asked what API monitoring is and how to do it?

The reason is simple: it’s a complex topic that falls between different groups inside the enterprise, which can lead to a variety of approaches, not all of them equally effective.

One challenge is that several audiences need to worry about API monitoring, and each is concerned with different API KPIs (Key Performance Indicators). As a result, they have different goals for API monitoring.

So, let’s outline some insights into the approaches and some methodologies to make API monitoring easier and more effective. We’re going to look at this from the perspective of working out what you want to achieve with API monitoring and then look at how to deliver that.

  • What are the goals of my API Monitoring?
  • Who is the audience?
  • What do I want or need to monitor? What should I monitor?
  • How do I want to monitor it (and does it matter)?
 

This article will answer these, and other, questions and suggest some clear strategies for enterprises looking to add dedicated API monitoring or build on what they have already.

 

API Monitoring: Setting your Goals

First, what is your pain point, and what type of monitoring is going to help you understand it? For many, this is purely knowing your “API availability.” If you want to know whether your APIs are available, or to learn about problems before your customers notice them, it’s the perfect place to begin.

One potential issue is relying on internal observations through application performance monitoring (APM) tools like Dynatrace, New Relic, Splunk or others. APM tools allow you to spot outages, but it often takes some time for the outage to become apparent. An example would be time-based use patterns (i.e. customers in a single time zone or set of time zones) where low load during the night masks an emergent problem after an update at midnight.

In practice, we’ve seen as much as 4 hours difference between using log-based API monitoring and purpose-built synthetic monitoring.

The second issue with log-based monitoring is that many APIs can appear to function but not be working as expected. This is an issue with all APIs, but it is particularly bad with GraphQL, where calls return an HTTP 200 code even when there is an error message in the body. Your goal should be to monitor APIs for function, not just status codes.
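To illustrate why status codes alone are not enough, here is a minimal sketch of how a functional check might classify a GraphQL response. This is an illustrative helper, not APIContext's implementation; the field names follow the GraphQL response format, where errors appear in an `errors` array even on HTTP 200.

```python
# Sketch: a GraphQL service can return HTTP 200 yet still carry an error
# payload, so a functional monitor must inspect the body, not just the
# transport status. Illustrative only.

def graphql_call_succeeded(status_code: int, body: dict) -> bool:
    """Treat a call as failed if the transport failed OR the GraphQL
    body contains an 'errors' array, even on HTTP 200."""
    if status_code != 200:
        return False
    if body.get("errors"):          # per the GraphQL spec, a list of errors
        return False
    return body.get("data") is not None
```

A log that only records status codes would score the second case below as a success; a functional check does not:

```python
graphql_call_succeeded(200, {"data": {"user": {"id": 1}}})            # healthy
graphql_call_succeeded(200, {"errors": [{"message": "boom"}]})        # broken despite the 200
```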

Third, if your goal is to validate a Service Level Objective, you need to monitor it from where your customers are. In our experience of monitoring billions of API calls, it is extremely difficult to reach agreement over performance or quality when each party is looking at different data. When your customer shows you data that they generated locally, you will not get very far responding with your own data that is designed to show your services in the most favorable light.

Finally, it is important to monitor all aspects of your APIs. Because security is complex, it’s common for engineering teams to skip parts of the API monitoring process in order to simplify the work. This creates gaps, and gaps can lead to unexpected outages.

Moreover, highly secure services protected by OAuth and other strong authentication systems can be complex and brittle (i.e. prone to break), so engineering teams sometimes install workarounds for API security in order to simplify the testing process. This has led to credentials being exposed in the wild and other security problems being introduced to the API surface area.

Overall, your goal should be to monitor your APIs the way your end customers consume them and to generate data that replicates their experience.

 

API Monitoring: Who is the audience?

Most teams that monitor APIs do so for an internal audience. If that is the case for you, consider who your stakeholders are. APIs now cross all aspects of business operations. Product and engineering teams need completely different data and analysis than the SRE team or the compliance and risk team. If you are monitoring for Service Level Objectives where external customer stakeholders are involved (possibly even ones that pay you for API access), then their needs, and the data they want to see, are likely to be very different again.

If you are in a regulated industry, it’s entirely possible that your API monitoring has a business risk impact too, which needs to be managed in addition to all of the technical considerations we have discussed.

While it increases the complexity of your API monitoring solution, the right product should be able to meet the needs of all the stakeholders and deliver the data directly to the tools and systems needed for accurate reporting.

Your API monitoring solution needs to ensure that it can output to the systems you use and gather the data that solves all your business needs for all the stakeholders involved.

 

API Monitoring: What to Monitor?

Because most API calls look like HTTP web calls, there is a natural tendency to treat them like webpages, and there are classes of API Monitoring tools that have evolved from web monitoring. This approach has some drawbacks because fundamentally API calls and web calls are different things. They work differently, they have different security and they’re often used in very different ways. We’re not just saying that because our API monitoring is purpose-built from the ground up for even the most complex API use cases.

Let’s start with considering the basic things people tend to think of when talking about API Monitoring:

  • Availability – is it up or not? 
  • Performance (latency, usually in milliseconds) – how fast is a call?
 

But what are you missing here? There’s a lot more to the question of how well an API works than just two metrics.

  • Functional – did the call actually deliver the result expected?
  • Slow calls – how many passing calls fell outside of the expected performance range?
  • Did the call return the correct context as per your specification?
  • Are the security settings working to your expectations?
  • Is it performing uniformly across all possible locations? i.e. does one particular cloud or data center underperform, and is that one critical to your customers?
  • Is the content you are sending correct for the call? Can you tell?
  • Are you returning the content you expected?
 
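The metrics above can be combined into a single per-call verdict. Here is a minimal sketch of how a monitor might score one call against several of these KPIs at once; the field names and the 500 ms slow-call threshold are illustrative assumptions, not a fixed standard.

```python
# Sketch: score a single monitored call against several KPIs at once.
# Thresholds and field names are illustrative, not recommendations.

def evaluate_call(status: int, latency_ms: float, body: dict,
                  expected_fields: set, slow_threshold_ms: float = 500.0) -> dict:
    """Return a per-KPI verdict for one API call."""
    return {
        "available": 200 <= status < 300,                  # did it respond at all?
        "functional": expected_fields.issubset(body.keys()),  # did it deliver the expected result?
        "slow": latency_ms > slow_threshold_ms,            # passing, but outside the performance range
        "latency_ms": latency_ms,
    }
```

A call can be "available" yet fail the functional or slow-call checks, which is exactly the distinction that availability-only monitoring misses.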

Depending on the audience, you might need to answer lots of questions and in a regulated environment this could involve a government agency with the power to fine you.

It’s also worth mentioning at this point that you might want to monitor the services you consume – are you getting what you pay for, and could you prove to a third party that they missed their SLA?

As more APIs fall under the purview of a regulator, especially in healthcare and banking, teams need to pay careful attention to the quality of the data returned by the API. Does it match the OpenAPI specification? Many engineering groups use linting tools in their test process but if something changes between releases, would they know?
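As a sketch of what checking a response against its specification can look like, here is a deliberately minimal, hand-rolled validator for a fragment of an OpenAPI schema (required fields plus property types). A production monitor would use a full validator such as the jsonschema library; the schema and field names below are hypothetical examples.

```python
# Sketch: check a response body against the 'required' and 'properties'
# declarations of an OpenAPI schema fragment. Minimal and illustrative;
# real deployments should use a complete JSON Schema validator.

TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def matches_schema(body: dict, schema: dict) -> bool:
    # Every required field must be present...
    for field in schema.get("required", []):
        if field not in body:
            return False
    # ...and every declared field that is present must have the declared type.
    for field, spec in schema.get("properties", {}).items():
        if field in body and not isinstance(body[field], TYPE_MAP[spec["type"]]):
            return False
    return True
```

Run between releases, a check like this catches the case where a field silently changes type or disappears, even though the linting that passed at build time never re-runs in production.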

As a further complication, if you are expected to return accurate content AND be able to prove it is accurate, could you do so? Checking the content for meeting the standard AND returning the correct data could be essential in some industries to avoid fines.

 

API Monitoring: How to Monitor?

When considering what to monitor, we reviewed two types of monitoring. 

The first is simply an availability check – a ‘ping.’ This will tell you quickly if an endpoint exists and responds in the way you expect. A ping, if conducted from the right locations, might even answer some of the performance questions. Uptime monitoring is a simple check on the availability of a specific endpoint.

Typically, uptime checks use simplified security (which might not work with complex OAuth-based APIs) and check nothing beyond the availability of the service and the overall latency of the call.
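A basic ping of this kind is only a few lines. The sketch below uses only the Python standard library and records the two things an uptime check reports: availability and round-trip latency. The `fetch` parameter is an assumption added here so the logic can be exercised without live network access.

```python
# Sketch: a bare-bones uptime "ping" that records availability and
# round-trip latency. Illustrative only; `fetch` is injectable for testing.

import time
import urllib.request

def ping(url: str, timeout: float = 5.0, fetch=None) -> dict:
    fetch = fetch or (lambda u: urllib.request.urlopen(u, timeout=timeout).status)
    start = time.monotonic()
    try:
        status = fetch(url)
        up = 200 <= status < 400
    except Exception:
        up, status = False, None          # timeout, DNS failure, refused connection...
    latency_ms = (time.monotonic() - start) * 1000.0
    return {"up": up, "status": status, "latency_ms": latency_ms}
```

Note everything a check like this cannot see: whether the body is correct, whether authentication works as an end user would experience it, or whether a sequence of dependent calls succeeds.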

For a richer picture, we would recommend full synthetic API monitoring, where you generate real calls against production systems to test all the sequences a user would be interested in. This should also use the same security and authentication settings as an end user would use – requiring a system that handles not just OAuth-style authentication but also the process of token refreshes.
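The token-refresh requirement is the part teams most often underestimate. The following is a minimal sketch of the caching-and-refresh behavior a synthetic monitor needs, under the assumption of a standard OAuth 2.0 flow where the token endpoint returns an access token and an `expires_in` lifetime; the token-endpoint call is injected so the flow can be shown without a live identity provider.

```python
# Sketch: cache an OAuth access token and refresh it shortly before it
# expires, the way a real API client would. Illustrative, not a client
# library; `request_token` stands in for the token-endpoint call.

import time

class TokenManager:
    def __init__(self, request_token, skew_seconds: int = 30):
        self._request_token = request_token   # callable -> (token, expires_in_seconds)
        self._skew = skew_seconds             # refresh this early to avoid edge-of-expiry failures
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or time.monotonic() >= self._expires_at - self._skew:
            token, expires_in = self._request_token()
            self._token = token
            self._expires_at = time.monotonic() + expires_in
        return self._token
```

A monitor that skips this and uses a long-lived hand-issued token is exactly the kind of security workaround described earlier: it stops testing what end users actually experience.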

Calls should be made from the outside in, at a frequency that enables you to catch outages when they start. An APIContext client found that by using APIContext at a frequency of 1 functional sequence every 10 minutes they were able to reduce the time for alerting to potential outages from 4 hours (with their gateway and APM logs in a major data analysis tool) to under 15 minutes.

In addition to monitoring at the correct frequency, we recommend building retry logic into your monitoring to handle transient failures. API calls don’t succeed first time, 100% of the time – a call will often fail on the first attempt and pass on the second. Within APIContext we offer a range of retry conditions to capture the exact moment when an outage begins.

Don’t we have a Monitoring tool for that?

A final consideration is the assumption that all monitoring and testing tools are created equal. They are not. Something that works for a website in certain situations might miss critical API scenarios because it’s not designed to handle that type of security system.

Every team faces pressure to save costs, especially in SaaS and Tooling, but we find that good teams use different tools for different jobs, and APIs are complex enough that they do merit their own tooling.

Recommendations

  1. A fully functional API monitoring synthetics tool is essential for any enterprise, especially in a regulated industry
  2. Different audiences depend on different data – the needs of the SRE team will be different to the needs of the product or compliance teams – but the data ought to come from the same source
  3. Monitoring should be done outside-in, in the cloud locations from which your customers and end users call you
  4. Service Level Objectives should be measured independently of your infrastructure to avoid contaminating the data
  5. Data generated by your solution should be available in the tools and services you use
 

APIContext is a world leader in synthetic API monitoring for regulated and secure APIs with a system designed to save time, effort and money for all stakeholders.

API Monitoring

See how APIContext monitors the APIs you rely on and provides peace of mind.

KPIs for APIs

There are lots of things you can measure with APIs. What things are you likely missing?

Ready To Start Monitoring?

Want to learn more? Check out our technical documentation, our API directory, or start using the product immediately. Sign up instantly, and monitor your first API call in minutes.