Silicon Valley has a new competitive sport, and it’s exactly the mistake you think it is.
The New York Times reported last week that engineers at Meta, OpenAI, and other major tech companies are competing on internal leaderboards that track how many AI tokens they consume. The trend has a name: tokenmaxxing. One OpenAI engineer burned through 210 billion tokens in a single week. A software engineer in Stockholm says his employer spends more on his Claude Code usage than on his salary. Jensen Huang told the GTC audience that engineers should receive up to half their base salary in AI tokens as a fourth pillar of compensation.
I’ve been building and scaling engineering teams for fifteen years, and I’ve seen this exact pattern before: a different metric, the same trap.
We’ve Done This Before
In the early 2000s, some shops measured developer productivity by lines of code written. The metric was legible, easy to track, and felt like it captured something real. It did, just not in the way anyone thought. The engineers who wrote the most code weren’t necessarily shipping the best features. They were often shipping the most bloated, hardest-to-maintain ones. The metric rewarded volume when the job required judgment.
We learned that lesson and moved on to velocity points, which we promptly gamed by inflating estimates. Then we tried deployment frequency, which incentivized small, low-risk changes over meaningful ones. Every generation of engineering leadership discovers that measuring the most visible input is the fastest way to get more of that input and less of the outcome you actually wanted.
Tokenmaxxing is the agentic-era version of this exact failure mode. When companies put token consumption on a leaderboard, they aren’t measuring productivity. They’re measuring compute spend and calling it performance.
The Input/Output Confusion
I wrote a few weeks ago about the minutes-added-to-workforce framing: the idea that AI’s real value is expanding your team’s effective capacity, not replacing headcount. That framing is about output: what did AI enable your team to accomplish that they couldn’t have otherwise?
Tokenmaxxing flips that framing on its head. It measures the input (how much compute your team consumed) and assumes the output followed. That assumption is the entire problem.
An engineer who burns through 10 billion tokens running unsupervised agent swarms that generate code nobody reviews isn’t more productive than one who uses 100 million tokens with intentional prompting and actually ships. They’re just more expensive. And if your performance review system can’t tell the difference, you have a management problem dressed up as an AI strategy.
What Engineering Leaders Should Actually Be Worried About
Here’s the part of this conversation that most of the coverage is missing, and the part that should make every CTO and VP of Engineering sit up straight.
Token budgets as compensation are a CFO conversation, not just a recruiting perk. When Jensen Huang says engineers should get half their base salary in tokens, that sounds like a benefit, but compute budgets don’t vest, don’t appreciate, and don’t compound. A $100,000 token budget on top of a $375,000 total comp package looks generous in a term sheet, but it’s a line item that benefits the company as much as the employee.
When per-engineer token spend approaches salary, the headcount math changes. This is the uncomfortable part. If a company is spending $200K on an engineer’s salary and $200K on the compute that makes them productive, a CFO is going to ask a very natural question: what happens if we reduce the human side of that equation and increase the compute side? Tokenmaxxing culture doesn’t just measure the wrong thing; it accelerates the case for headcount reduction by making the compute cost per employee visible and large.
Leaderboards create perverse incentives for your least experienced engineers. Junior engineers who are already anxious about proving their value in the AI era will grind tokens to stay off the bottom of the board. This is the opposite of what you want. You want junior engineers asking better questions, reviewing AI-generated output carefully, and building judgment. You don’t want them running agent swarms they can’t evaluate to generate a number that makes them look busy.
The Real Metric
I’ve argued consistently that the right way to measure AI’s impact is capacity generated: the delta between what your team could accomplish before AI tooling and what they can accomplish now, measured in outcomes, not inputs. Minutes added to workforce, not tokens consumed.
If you want to know whether your AI investment is working, ask these questions: Is the time from idea to production shorter? Are your engineers spending more of their day on high-judgment work and less on scaffolding? Is your team shipping things that were previously impossible at your headcount level?
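To make that concrete, here is a minimal sketch of what an outcome-centric rollup could look like: lead time and tokens per shipped outcome, rather than raw consumption. It is illustrative only; the data model, field names, and numbers are hypothetical, not a prescription for any particular stack.

```python
# Illustrative sketch only. Field names and figures are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass
class ShippedOutcome:
    name: str
    idea_date: date          # when the idea was committed to
    production_date: date    # when it reached production
    tokens_used: int         # compute consumed along the way

    @property
    def lead_time_days(self) -> int:
        return (self.production_date - self.idea_date).days


def summarize(outcomes: list[ShippedOutcome]) -> dict:
    """Report outcome-centric numbers, not total tokens burned."""
    total_tokens = sum(o.tokens_used for o in outcomes)
    avg_lead_time = sum(o.lead_time_days for o in outcomes) / len(outcomes)
    return {
        "outcomes_shipped": len(outcomes),
        "avg_lead_time_days": round(avg_lead_time, 1),
        "tokens_per_outcome": total_tokens // len(outcomes),
    }


if __name__ == "__main__":
    quarter = [
        ShippedOutcome("billing-migration", date(2025, 1, 6), date(2025, 1, 20), 40_000_000),
        ShippedOutcome("search-relevance-fix", date(2025, 1, 13), date(2025, 1, 17), 8_000_000),
    ]
    print(summarize(quarter))
```

The point of the sketch is the shape of the question, not the code: every number it reports is about what shipped and how fast, and token spend only appears as a cost per outcome.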
None of those questions are answered by a token leaderboard!
The best engineering teams I’ve worked with and built have always been the ones that optimized for signal over noise. That was true when the noise was lines of code, it was true when the noise was story points, and it’s true now that the noise is token consumption. The metric changes. The mistake doesn’t.
The companies that win the agentic era won’t be the ones that burned the most tokens. They’ll be the ones that turned tokens into outcomes with the least waste and the clearest intent. That’s not a new idea. That’s what good engineering leadership has always been.
This post connects to ideas from Minutes Added To Workforce, Your AI Notetaker Is the Biggest Security Decision You’re Not Making, and Are You Managing Your Manager? (The Agentic Update).