How "Microservices Architecture" Went From Being Useful to A Sickness

You Shouldn't Become Middleware Unintentionally

Aug 30, 2024

How Bad Can A Dependency Outage Be?

In July 2024, the world witnessed what might be the largest IT outage in history.[1]

It wasn't a cyberattack or a natural disaster,[2] but a routine software update from security company CrowdStrike that brought millions of Windows systems grinding to a halt.

A critical flaw in CrowdStrike's Falcon platform triggered the blue screen of death on millions of devices—airline terminals, banking portals, utility company sites—disrupting critical services across the globe.

Airlines were grounded, public transit systems halted, hospitals faced delays, and financial institutions struggled to keep operations running.

The outage revealed how our growing reliance on third-party services has made modern businesses vulnerable to cascading systemic failures.

The fallout from this incident is expected to cost U.S. Fortune 500 companies ALONE an estimated $5.4 billion, highlighting the ~~potential~~ actual risks of placing too much trust in a single point of failure.

The CrowdStrike outage shows how a single point of failure in a specialized service can turn a seemingly anodyne integration into massive consequences.

Why Do We Build This Way?

Key Takeaway: Businesses generally aren’t stupid. Specialized offshoring of technical, physical, and mental effort helps them offer cheaper and better products during normal business operations. Microservices thinking has led to great business-as-usual (BAU) gains since the 1960s.

Before I go out of my way to criticize microservices thinking, we need to understand why so many businesses have embraced this model in the first place.

The appeal of microservice architecture, which allows businesses to break down complex systems into smaller, independent services, is undeniable.

Across industries, practices, regions—they all see the advantage.

Even I’ve discussed the significant merits of this structure in my post about composability.

Why Composability Matters in Your Tech Stack

Ward Rushton

November 3, 2023

In 2020, 77% of businesses reported using microservice architecture, and 92% of those said the adoption was successful.
Some of the most successful companies, such as Amazon, Netflix, Uber, and Etsy, have attributed their success in part to microservices.[3]

For many companies, this modular approach allows for faster development and deployment of new features, reduces the complexity of large monolithic systems,[4] and enables teams to work on different parts of a codebase without stepping on each other’s toes.[5]

By integrating third-party services into their architecture, businesses can also leverage specialized capabilities without having to build them in-house, saving time and reducing costs.

Think of a tree mulcher: it might not make sense for an individual to own one because it’s not used very often, but it definitely makes sense for an arborist company that needs to break down trees.

It would be crazy to own this on your own, but it’s a great tool to be able to use across projects.

Take time zone management: you’re offshoring your team’s difficulties to an external source that specializes in that problem.

That company may have special tools, processes, or intellectual property that they developed to solve that exact problem—why not let them do the heavy lifting?

This logic extends beyond technology to all modern businesses.

Instead of having a generalist company gardener, hire landscaping companies specialized in each aspect of landscaping you need.
Instead of having a local machinist who makes the parts you need, re-engineer your parts to ship in from specialized factories in Asia.
Instead of having extra inventory storage, have your parts arrive just-in-time.

All of these examples create better, cheaper results during normal business operations. They’re partially responsible for the huge savings in production costs since the mid-20th century.

But as these practices have seen widespread adoption, many organizations have become increasingly dependent on these specialized external services.[6]

This dependency, while initially advantageous, introduces a new set of risks—risks that are often overlooked until a failure occurs.

The very architecture that promises speed and flexibility can also create blind spots, where businesses are so focused on immediate gains that they lose sight of the vulnerabilities that they introduce into a system.

I call this kind of thinking “Microservice Myopia.”

The Risks of Microservice Myopia

Key Takeaway: Microservice Myopia occurs when businesses shift too much responsibility onto specialized systems, leaving humans to manage complex dependencies they are not naturally equipped to handle. Instead of adapting, most businesses and people remain blind to the risks.

While the appeal of microservice architecture is rooted in its ability to break down complex systems into manageable, specialized tasks, it also creates a situation where businesses over-rely on these specialized services or, even worse, depend on third-party services so thoroughly that they forget those services even exist.

This over-reliance introduces significant cascading risks, particularly when those services fail or change unexpectedly. The crux of the problem lies in the mismatch between what humans and computers are naturally good at.

Dependency — If you build your business like this, you should certainly **be aware** of the tenuous support you rely on

Humans excel at generalized tasks—we are adaptable, capable of seeing the big picture, and can pivot when faced with unexpected challenges.

In contrast, computers are incredibly efficient at performing highly specialized tasks with precision and speed, but they lack the flexibility and broader perspective that humans bring to the table.

Microservices architecture, with its emphasis on specialization, often pushes humans into roles where they are managing an increasingly complex web of dependencies, rather than applying their strengths in adaptability and holistic thinking.

By over-relying on this architecture, we inadvertently shift the burden of managing these intricate systems onto human operators, who are then tasked with coordinating multiple microservices, each with its own potential points of failure.

This setup can leave businesses vulnerable to disruptions that humans are poorly equipped to manage—like the kind seen in the CrowdStrike outage.

When one specialized service fails, the ripple effects can be catastrophic, and the very people responsible for maintaining operations may find themselves overwhelmed by the complexity of the situation.

But the worst part is that people aren’t bracing for impact. They’re walking into it blind.

This is the danger of “Microservice Myopia”: by focusing too narrowly on the benefits of specialization and speed, businesses can end up creating environments where humans are expected to manage highly specialized systems without the necessary tools or perspectives to do so effectively.

The result is a fragile ecosystem where a single point of failure can lead to widespread disruption, as humans struggle to navigate the complex interdependencies that arise from this architectural approach.
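
One way to make those interdependencies visible is to write them down and walk the graph. What follows is a minimal sketch, not a definitive tool: the service names and the `DEPENDS_ON` map are hypothetical, invented purely for illustration, and a real inventory would likely come from deployment configs or vendor contracts. It only answers one question: if this one dependency fails, what else goes down with it?

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each service lists the services or vendors it calls directly.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["fraud-vendor"],
    "inventory": ["warehouse-api"],
    "kiosk": ["checkout", "endpoint-agent"],
    "support-portal": ["endpoint-agent"],
}


def blast_radius(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed dependency.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)

    # If the graph has a cycle, the failed node can reach itself; exclude it.
    affected.discard(failed)
    return affected


print(sorted(blast_radius("endpoint-agent", DEPENDS_ON)))
print(sorted(blast_radius("fraud-vendor", DEPENDS_ON)))
```

In this made-up map, the vendor that nobody thinks about (`fraud-vendor`) takes down more services than the obvious one, which is the kind of blind spot the myopia hides.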

Now that you know the risks of cascading microservice failure, I’ll discuss strategies to mitigate them in a future post.

[1] …largest outage so far

[2] But I certainly assumed it was at the time

[3] Ironically, while software (which is easier to change) moved towards microservices architecture, physical products (which are hard to change) have shifted away from replace-one-part construction. See below for an example.

[4] Monolithic systems accrue technical debt, even outside of technology. Imagine trying to change Coca-Cola’s shipping infrastructure. It would be extremely difficult on both a physical and a technical level. If they had simply offshored their shipping to FedEx, a shift to UPS would be much easier.

[5] (Image: “when git merge” programming meme)

[6] This is also why ‘price gouging’ during a supply-constrained event isn’t inherently bad. When demand spikes, it proportionately rewards the businesses that bear the ongoing costs of excess supply during normal operations.

For example, Home Depot should only carry the minimum number of generators needed to satisfy their customers during normal operations. Otherwise they’re spending a lot of money to ship, store, and insure complicated machinery on their property.

But in a hurricane, wouldn’t it be useful to be able to pay extra and be guaranteed to find a generator when you need it?

Price controls generally lead to shortages rather than lower prices.

MarTech for Humans
