API Rants: Test in Prod or give up.
This is a public service rant.
For the love of Mike/whatever being is responsible for your monitoring, please monitor your APIs in their production environment!
I had a little rant recently about how I thought the monitoring industry had become somewhat self-serving. Now I’m going to shift to something that isn’t the fault of the monitoring industry but is a HUGE problem in the Open Finance sector.
Monitoring Production Systems
You release a software service. Own it. Period. We can go home now.
What’s that?
Group risk and security won’t let you monitor a real account?
Then give up, you’ve failed before you’ve even started.
No seriously, stick a fork in your open digital strategy, you’re done.
Harsh? Perhaps, but let’s break this down.
A modern IT setup for something as complex as, say, a bank or large credit card provider, has many moving parts. Some of it might be quite old, and some of it will be outside of your direct control because a third party or an uninterested department hidden deep in the maze of your business history owns it.
However, a complex IT solution is, in fact, more than the sum of its parts. Just because you know that all the parts are working doesn’t tell you anything about the system as a whole.
Same goes for any individual part of the solution. If you’re an API engineering group, your gateway will tell you a lot, but it won’t tell you why a partner in Finland gets 403 Forbidden errors when trying to access your systems from their Google Compute application – in fact, it won’t even see their calls.
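To make that blind spot concrete, here is a minimal sketch in Python. It assumes each external probe tags its call with a unique request ID and that you can export the IDs your gateway logged; the `Probe` layout, the location labels, and the function name are all hypothetical, not taken from any real gateway product. Anything the probe saw fail that the gateway never logged is a call that died somewhere you can’t see.

```python
from typing import Iterable, List, NamedTuple


class Probe(NamedTuple):
    request_id: str  # hypothetical unique ID the external monitor attaches to each call
    location: str    # hypothetical label for where the call was made from
    status: int      # HTTP status the external caller actually received


def invisible_failures(probes: Iterable[Probe], gateway_ids: Iterable[str]) -> List[Probe]:
    """Return failed probes whose request IDs never appear in the gateway's logs."""
    seen = set(gateway_ids)
    return [p for p in probes if p.status >= 400 and p.request_id not in seen]
```

Feed it the day’s external probes and the gateway’s logged IDs; every entry that comes back is a failure your own dashboards had no way of showing you.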
We (APImetrics) sit on a certification committee where this recently became a significant topic of conversation, not least because many of the members didn’t want to talk about it. They’d been shut down about having test accounts because their internal risk and compliance teams didn’t want accounts available for this purpose. But this is a sector where the risks of holding a test account are outweighed by the risk of software failure.
It would be like knowing there was a potentially dangerous software bug in an aircraft and ignoring it because the risk was considered low… no, that could never happen… The point is that not monitoring in production opens organizations up to more institutional risk than monitoring does.
This extends to many parts of the financial services API delivery sector.
How often should you retest?
Do you verify that all the parts of your security delivery system work correctly?
Do your tokens last as long as you expect them to?
Do they renew?
Can you access protected resources – IN PRODUCTION – without any security?
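Those questions can be turned into checks you run on a schedule. The sketch below is one way to do it in Python, not a definitive implementation: `fetch` is a hypothetical stand-in for however you call your API (a small `urllib`-based version is included), the function names are made up for the example, and the 30-second tolerance is an arbitrary assumption. `expires_in` is the lifetime in seconds that an OAuth 2.0 token response advertises.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# A fetcher takes (url, bearer token or None) and returns the HTTP status code.
Fetcher = Callable[[str, Optional[str]], int]


def http_fetch(url: str, token: Optional[str]) -> int:
    """Call the URL for real, with or without a bearer token."""
    request = urllib.request.Request(url)
    if token is not None:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the status code is on the exception


def check_access(fetch: Fetcher, url: str, token: str) -> Dict[str, bool]:
    """Does a valid token get in, and does a caller with no token get turned away?"""
    return {
        "authorized_ok": 200 <= fetch(url, token) < 300,
        "anonymous_rejected": fetch(url, None) in (401, 403),
    }


def lifetime_matches(expires_in: float, observed_seconds: float,
                     tolerance: float = 30.0) -> bool:
    """Compare the advertised lifetime with how long the token really worked."""
    return abs(observed_seconds - expires_in) <= tolerance
```

In a scheduled check you would pass `http_fetch` as `fetch` and alert when either flag comes back `False`. Renewal can be tested the same way: refresh the token, then run `check_access` again with the new one.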
Financial services APIs have to be monitored, externally, for lots of reasons, and if you’re skipping that, you’re leaving your entire strategy open to failure.
- Monitor for functionality: do the calls work as expected, the way external users expect them to?
- Monitor for performance: does everything work correctly from ANYWHERE your customers and partners could be?
- Monitor for security: does the security work correctly for people to access production and, perhaps more importantly, does it prevent access in the right way?
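As a rough illustration of the first two bullets, the sketch below grades a batch of external probe results by location. The `ProbeResult` fields, the location labels, and the one-second latency budget are all assumptions made for the example. The security bullet could be graded the same way by probing without credentials and setting the expected status to 401 or 403.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ProbeResult:
    location: str      # where the call was made from (hypothetical labels)
    status: int        # HTTP status observed by the external caller
    latency_ms: float  # end-to-end time as seen by that caller


def grade(results: Iterable[ProbeResult], expected_status: int = 200,
          latency_budget_ms: float = 1000.0) -> Dict[str, List[str]]:
    """Return {location: [problems]} for every location with at least one problem."""
    problems: Dict[str, List[str]] = {}
    for r in results:
        issues = []
        if r.status != expected_status:
            issues.append(f"status {r.status}, expected {expected_status}")
        if r.latency_ms > latency_budget_ms:
            issues.append(
                f"latency {r.latency_ms:.0f} ms over budget {latency_budget_ms:.0f} ms"
            )
        if issues:
            problems.setdefault(r.location, []).extend(issues)
    return problems
```

An empty result means every location saw what it was supposed to see, within budget. Anything else tells you where to start looking.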
You can’t mock these things. Sandboxes don’t accurately mimic real-life servers. The risks to a brand, to your bottom line, and to your relationships with government regulators mean this is something that needs to be taken MUCH more seriously.
Consider this all-too-realistic scenario, one that we have, in fact, seen:
A TPP (third-party provider) complains to your national regulator that something doesn’t work. You counter with evidence that, for the whole period the TPP is complaining about, you were in fact working. Except – and here’s the kicker – as far as they were concerned, you were actually down, and down hard. Not only is this not worth a potential fine, but the loss of brand value and standing in the industry is a very real threat.
Production monitoring has a value to your operation that exceeds the potential risk of testing against a real but very locked-down test account.
All too often we observe that a continuous testing ethos ends at launch, and DevOps takes over with an eye on the various systems they have in their remit. But that isn’t good enough for critical open systems where the delivery chain is long, complex, and often outside of your direct control and sight.
Monitor everything. Holistically. End-to-End and in production. It’s 2020, honestly, you cannot afford not to.