“When everyone digs for gold, sell shovels”
Pretty good advice.
A lot like “Buy low; sell high”.
Or “Maintain a calorie deficit to lose weight”—all advice that’s correct and simple (but not easy).
When it comes to AI, the belief—according to stock prices—is that chip manufacturers are selling shovels.
And that customer data is the new oil. No wait, it’s the new gold ore.
This shovel-gold dynamic is great if you’re looking to make money in stocks.
But what if you want to use your data to make money? You know, like a business does?
Technology as a Mine
Successful gold miners—they do exist—plan where to dig long before they buy a shovel.
They survey the land to know what’s down there.
They’ll research to know what to expect underground.
Then they decide where the return per dig is highest.
People outside the mining industry hear about it.
Shares of the mine are created, bundled, and packaged for sale.
Sell-side bankers tell stories to the public. The bankers spin yarns about how the new mine could be El Dorado.
With starry eyes, the public bids up the price of these shares—pricing them with irrational exuberance—until the mine needs to pour out solid gold to be profitable.
The first hole is dug. Gold isn’t gleaming just below the surface. Grey reality sets in.
Investors order an audit of the preliminary digs. It reveals that the mine needs significant CapEx to properly process the gold ore from the tailings.
Share prices plummet. Investors flee.
“Why would I want that? It requires investment now to return anything.”
Yep, it does, but so does everything worth doing.
Taking Your AI Past The Orphan Period
Just like the late 90s dot-com bubble[1], there’s a lot of hype about AI.
Even here, we’ve talked about it time and again.
Industry thought leaders[2] have correctly identified that generative AI has a lot of potential to change how business is done.
By eliminating many routine manual tasks, AI can speed up slow processes.
But with this hype, one essential element is overlooked: the quality of the data that feeds these AI systems.
Imagine a gold miner who neglects to survey the land. He digs randomly without knowing where the richest veins lie.
This approach won’t lead to success.
Imagine a business leader who neglects to survey his existing data. He directs IT randomly without knowing where the useful data lies.
This approach won’t lead to success.
Poor data quality in AI projects leads to flawed decisions.
Poor data quality leads to wasted investments.
Poor data quality leads to angry conversations with your boss’s boss.
Getting your AI projects to realize any value—beyond hype—requires some boring, nose-to-the-grindstone foundational data work.
Data Quality: The AI Mine Location
For AI, data is ore.
To extract anything of value, this data must have some useful material.
More is not better.
High-quality data is accurate, complete, consistent, and relevant.
High-quality data is reliable and representative of the real world.
Before setting up an AI mine, you need to know where to dig and what kind of equipment will be needed.
In AI, this means data cleaning before model training begins.
Imagine you have an AI calendar tool that’s supposed to schedule people’s meetings. It uses historical Outlook data to predict future schedules.
But all your employees use Google Calendar for everything outside of rec sports.
You won’t hit gold.
When an AI project skips this first step, it's mining without a map.
You may hit a rich vein occasionally, but more often, you'll waste time and resources on barren ground.
Don’t bother starting an AI project without understanding what data will go in and what data will come out[3].
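What does surveying the land look like in practice? Here is a minimal sketch, assuming your records can be loaded as plain Python dictionaries. The `survey` function, the field names, and the sample meetings are all hypothetical, made up to echo the calendar example. It reports how complete each field is, how many exact duplicates exist, and which system the records came from.

```python
from collections import Counter


def survey(records, required_fields, source_field="source"):
    """Profile a list of dict records before any model sees them."""
    total = len(records)

    # Completeness: share of records with a non-empty value per field.
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total if total else 0.0

    # Duplicates: records identical across all required fields.
    keys = Counter(tuple(r.get(f) for f in required_fields) for r in records)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)

    # Relevance check: where did the records actually come from?
    sources = Counter(r.get(source_field) for r in records)

    return {
        "rows": total,
        "completeness": completeness,
        "duplicates": duplicates,
        "sources": dict(sources),
    }


# Hypothetical meeting records (illustrative only).
meetings = [
    {"organizer": "ana", "start": "2024-03-01T09:00", "source": "outlook"},
    {"organizer": "ana", "start": "2024-03-01T09:00", "source": "outlook"},
    {"organizer": "raj", "start": "", "source": "google"},
    {"organizer": None, "start": "2024-03-02T14:00", "source": "google"},
]

report = survey(meetings, ["organizer", "start", "source"])
print(report)
```

None of this is sophisticated, and that is the point. If the report says a quarter of your start times are missing, or that half your records come from a system nobody uses for real meetings, you have learned that before paying for the shovel.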
Realized Value: Reaping the Rewards
In mining, after all the hard work, the gold is sold for a profit.
A successful AI project can deliver valuable insights and drive business value.
But, just like a mine, the results don’t come without hard work.
Not every mine strikes it rich; not every AI project will yield immediate returns.
But a mine without any ore CAN’T strike it rich; an AI project without good data CAN’T perform well.
Getting past the hype to realize real returns requires patience, continuous investment in data quality (and good engineers), and a willingness to iterate and improve.
[1] FYI, the dot-com bomb was BAD. Inflation-adjusted, the market took ~20 years to recover.
[2] Notably different from real leaders because thoughts don’t have to be subjected to the scrutiny of real life.
Without criticism it would be nothing but one 'hosannah.' But nothing but hosannah is not enough for life, the hosannah must be tried in the crucible of doubt
—The Brothers Karamazov, Part IV.
Chapter 9: The Devil. Ivan's Nightmare
[3] Data preparation is also really important, but that’s less of a precursor activity. Preparation involves defining what happens when encountering missing values, duplicates, and differing formats. This process ensures that the data fed into AI models actually represents what you think it does, leading to better performance and more reliable predictions.
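To make "defining what happens" concrete, here is a rough sketch of preparation rules written down as code. Everything in it is an assumption for illustration: the `customer` and `date` fields, the three accepted date formats, and the choice to drop incomplete rows rather than fill them in. Your own rules will differ; what matters is that they are explicit.

```python
from datetime import datetime

# Assumed input formats; a real dataset needs its own list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Normalize differing date formats to ISO 8601, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def prepare(rows):
    """Apply explicit rules for missing values, duplicates, and formats."""
    seen = set()
    cleaned = []
    for row in rows:
        customer = (row.get("customer") or "").strip().lower()
        date = parse_date(row.get("date") or "")
        # Rule 1 (missing values): drop rows lacking a customer or a usable date.
        if not customer or date is None:
            continue
        # Rule 2 (duplicates): keep only the first row per (customer, date).
        key = (customer, date)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"customer": customer, "date": date})
    return cleaned


# Hypothetical order rows (illustrative only).
rows = [
    {"customer": "Acme", "date": "2024-03-01"},
    {"customer": "acme ", "date": "03/01/2024"},  # same order, different format
    {"customer": "", "date": "2024-03-02"},       # missing customer
    {"customer": "Globex", "date": "01.03.2024"},
    {"customer": "Initech", "date": "soon"},      # unparseable date
]

cleaned = prepare(rows)
print(cleaned)
```

Dropping rows is the bluntest possible rule for missing values. Filling in a default or flagging the row for review are equally valid choices, as long as someone made the choice on purpose.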