Why software testing just became the most important skill in technology


Ash Gawthorp, Ten10’s Chief Technology Officer, discusses why the recent rise of AI agents proves the lasting importance of software testing skills in the modern tech workforce

If you’re working in technology right now and you feel a bit lost—like everyone around you has figured out this AI thing while you’re still trying to understand what MCP stands for, or whether you should be using agents, or how to keep ChatGPT from making things up—you’re not alone.

In fact, you’re in remarkably good company.

The admission that validated an industry’s anxiety

In late December 2025, a post on X (formerly Twitter) immediately went viral, garnering over 55,000 likes, 10,000 retweets, and 16.4 million views. The message sparked intense discussions across the entire tech industry because of both what it said and who said it.

The message was this:

[Screenshot of Andrej Karpathy’s post on X]

The person who wrote this? Andrej Karpathy.

For those unfamiliar with the name, Andrej Karpathy is not just some developer having imposter syndrome. He is:

  • A founding member of OpenAI (2015-2017), where he worked on the fundamental AI research that led to GPT and other breakthrough models.
  • Former Director of AI at Tesla (2017-2022), where he led the computer vision team that built one of the world’s most advanced self-driving AI systems, now deployed in millions of cars.
  • PhD from Stanford under legendary AI researcher Fei-Fei Li.
  • Designer and primary instructor of Stanford’s CS231n course—the first deep learning class at Stanford, which became one of its most popular courses and trained an entire generation of AI researchers.
  • Named to MIT Technology Review’s Innovators Under 35 (2020) and TIME Magazine’s 100 Most Influential People in AI (2024).
  • Currently founder of Eureka Labs, an AI+education company.

In other words: Karpathy didn’t just observe the AI revolution—he helped build it. He was there at the beginning of OpenAI. He architected Tesla’s autonomous driving neural networks. He literally taught deep learning to thousands of Stanford students. He popularised the concept of “Software 2.0”—the idea that neural networks are becoming the new way we write code.

And now, in 2026, even he feels behind.

His description of AI as “some powerful alien tool…with no manual” and the transformation as a “magnitude 9 earthquake” captures perfectly the disorientation that everyone—from junior developers to elite AI researchers—is experiencing right now.

What happened in the comments under Karpathy’s post was extraordinary. Thousands of developers, engineers, CTOs, and technical leaders essentially said: “Thank God someone said it. I thought it was just me.”

One comment summarised it perfectly: “Karpathy sounds like a gaslighting victim. First the AI bros tell us we’re close to AGI and the Bots are at PhD level. Then, when the tools randomly break it’s our fault for not instructing them correctly.”

The feeling is universal because the shift is real. This isn’t about individuals falling behind—it’s about the entire profession being fundamentally refactored in real time.


The phase transition: From authorship to orchestration

What Karpathy is describing, and what everyone is feeling, is a profound “phase transition” in technical leverage.

For decades, software engineering was defined by deterministic authorship: you wrote exact instructions, and machines executed them identically every time. If your code compiled and ran, you knew exactly what it would do. Trust was inherent in the system.

Now we’ve entered an era defined by orchestrating probabilistic intelligence. Large language models are stochastic token generators rather than logic-based programs. Their internal reasoning cannot be “single-stepped” or fully inspected. You can ask the same question twice and get different answers. Sometimes the model makes mistakes. Sometimes it misunderstands context. Sometimes it just hallucinates entirely plausible falsehoods.

In this new world, trust is no longer inherent—it must be engineered.

And that’s where the opportunity for software testers becomes extraordinary.

The reveal: This is all about testing

Here’s the crucial insight that transforms Karpathy’s admission from anxiety-inducing to opportunity-revealing:

Every single thing Karpathy describes as the “new programmable layer of abstraction” is fundamentally about quality assurance and verification.

When he talks about needing to build “an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities,” he’s describing the exact problem that software testers have been solving for decades: How do you maintain authority and ensure correctness when the underlying system is unpredictable?

The skills that testers have developed (specification, verification, observability, failure mode analysis, continuous monitoring) haven’t just become relevant. They’ve become the foundational skills for making AI work at all.

Let me show you what I mean by walking through the four levels of this new technical hierarchy. Watch how each level is fundamentally about testing.

Level 1: Conditioning and steering = Test case design

The AI challenge

Karpathy describes needing to master “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills” as part of this new layer. Breaking this down into practical capabilities:

  • Intent specification: Ambiguity causes models to hallucinate. You must develop tight problem contracts with specific definitions and constraints.
  • Context engineering: Managing what material enters the context window—the “new IO and databases” of the AI stack.
  • Constraint design: Without constraints like schemas, rubrics, and stop conditions, a probabilistic system is merely a “slot machine.”

This is test case design

Look at what’s actually being described here:

  • Intent specification = Writing test requirements with zero ambiguity
  • Context engineering = Managing test data, fixtures, and environmental conditions
  • Constraint design = Defining acceptance criteria and boundary conditions

Testers have always been the people who look at a vague requirement like “the system should be fast” and say, “Fast how? Under what conditions? What’s acceptable? What’s the failure threshold?”

That skill—eliminating ambiguity—has always been the tester’s superpower. In deterministic code, developers could sometimes get away with fuzzy specifications because the code would at least behave consistently. In probabilistic systems, ambiguity is catastrophic. You’re not just risking bugs—you’re risking complete nonsense.

The foundational skill of AI engineering is actually the foundational skill of software testing: making the implicit explicit.
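
To make that concrete, here is a minimal sketch of a problem contract enforced in code: a function that only accepts model output satisfying an explicit schema. The `call_model` stub is hypothetical, standing in for whatever LLM client you actually use; the point is that the contract, not the model, decides what counts as acceptable.

```python
import json

# Hypothetical stub standing in for whatever LLM client you use.
def call_model(prompt: str) -> str:
    raise NotImplementedError

# The problem contract: required fields, their types, and allowed values.
REQUIRED_FIELDS = {"severity": str, "summary": str, "reproducible": bool}
ALLOWED_SEVERITIES = {"low", "medium", "high"}

def triage_bug(report: str, max_attempts: int = 3) -> dict:
    """Ask a model to triage a bug report, accepting only output that
    satisfies the contract. Ambiguity is rejected, never tolerated."""
    prompt = (
        "Triage this bug report. Respond with JSON only, using exactly the "
        'keys "severity" (low/medium/high), "summary" (string) and '
        '"reproducible" (boolean).\n\n' + report
    )
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)  # non-JSON output fails immediately
        except json.JSONDecodeError:
            continue
        if (isinstance(data, dict)
                and all(isinstance(data.get(k), t) for k, t in REQUIRED_FIELDS.items())
                and data["severity"] in ALLOWED_SEVERITIES):
            return data  # acceptance criteria met
    raise ValueError("model never produced contract-conforming output")
```

Notice that this is simply acceptance criteria and boundary analysis applied to a stochastic component: the same discipline as a well-written test case.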

Level 2: Preserving authority = The testing function itself

The AI challenge

In traditional software, the person who wrote the code naturally held authority over it. You could trace every line, understand every decision. In the age of AI, this “chain of custody” is broken. Restoring authority now requires you to:

  • Verify outputs: Since models produce “plausible falsehoods,” you must design explicit verification mechanisms—some deterministic (like unit tests) and some procedural (like human review).
  • Establish provenance: Systems must provide traceability—citations, retrieved documents, audit trails that justify the model’s claims.
  • Enforce permissions: Models should never be a security boundary; permissions must be deterministic and based on least privilege.

This is literally testing

This level is entirely the testing discipline:

  • Verification design – Karpathy explicitly calls out “unit tests” as deterministic verification. This is the core testing function: deciding what to verify, how to verify it, and when verification can be automated versus requiring human judgment.
  • Provenance – Test traceability and audit trails have always been quality engineering concerns. Being able to say “this requirement was tested by these test cases, and here’s the evidence” is Testing 101.
  • Permission envelopes – This is security testing and authorization validation—ensuring systems respect boundaries.

The phrase “plausible falsehoods” is brilliantly descriptive. It’s the AI equivalent of bugs that pass through traditional testing because they look correct. Testers have always been the discipline that understands “it compiled” doesn’t mean “it works,” and “it works” doesn’t mean “it’s correct.”

That distinction just became existential. Without explicit verification, AI systems are literally unusable.
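
As a sketch of what explicit verification can look like, here is a deterministic provenance check: every citation the model attaches to an answer must point at a document that was actually retrieved, and the quoted text must appear verbatim in it. The data shapes (`{"doc_id": ..., "quote": ...}`) are invented for illustration.

```python
def verify_answer(answer: str, citations: list[dict],
                  retrieved: dict[str, str]) -> list[str]:
    """Deterministic provenance check for a model-generated answer.

    `citations` is a list of {"doc_id": ..., "quote": ...} produced
    alongside the answer; `retrieved` maps doc_id to full document text.
    Returns a list of failures; an empty list means the claim is traceable.
    """
    failures = []
    if not answer.strip():
        failures.append("empty answer")
    if not citations:
        failures.append("answer carries no citations at all")
    for c in citations:
        doc = retrieved.get(c.get("doc_id"))
        if doc is None:
            failures.append(f"cited document {c.get('doc_id')!r} was never retrieved")
        elif c.get("quote", "") not in doc:
            # A plausible falsehood: the quoted text does not exist in the source.
            failures.append(f"quote not found verbatim in {c['doc_id']!r}")
    return failures

# Gate the pipeline on the check, never on the model's say-so:
# if verify_answer(answer, citations, retrieved) is non-empty,
# escalate to human review rather than shipping the answer.
```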

Level 3: Workflow design = Test architecture

The AI challenge

Karpathy describes needing to master “workflows, IDE integrations” and build systems where “fundamentally stochastic, fallible, unintelligible and changing entities” are “intermingled with what used to be good old-fashioned engineering.”

This breaks down into:

  • Decomposition: Breaking tasks into pipeline steps where failures remain local rather than global.
  • Failure mode taxonomy: Debugging shifts from tracing code logic to classifying why a model failed (was the task underspecified? was retrieval wrong? did the model lack capability?).
  • Observability: Because internal reasoning is opaque, the surrounding system must be made extremely legible through traces of tool calls and intermediate validations.

This is test architecture

Every piece of this is systems testing and quality engineering at scale:

  • Decomposition – Designing systems for testability: breaking complex systems into components with isolated failure modes, clear interfaces, and independent verification points. This is how you make large systems testable.
  • Failure mode taxonomy – Root cause analysis and defect classification. Testers have always asked “why did this fail?” and built taxonomies (environmental issue? requirements defect? integration problem? data issue?) to make debugging systematic.
  • Observability – Instrumenting systems for testability—adding logging, tracing, and monitoring so that failures can be diagnosed and reproduced. The insight that you can’t inspect internal reasoning, so you must make the surrounding system legible, is classic test architecture thinking.

When Karpathy describes the challenge of debugging systems where you can’t see inside the black box, he’s describing what testers have dealt with forever. When you can’t step through the code (because it’s a model, or because it’s proprietary, or because it’s too complex), you instrument what goes in and what comes out. You build harnesses. You create observability.
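
Here is what that instrumentation might look like in miniature: a decorator that records every pipeline step’s inputs, outputs and failures as a JSON line. Only the standard library is used, and the `print` call is a stand-in for whatever trace sink you actually run.

```python
import functools
import json
import time
import uuid

def traced(step_name: str):
    """Decorator that records each step's inputs, outputs and failures
    as a JSON line: the harness around a component you cannot single-step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "trace_id": str(uuid.uuid4()),
                "step": step_name,
                "input": repr((args, kwargs))[:500],
                "started": time.time(),
            }
            try:
                result = fn(*args, **kwargs)
                record.update(status="ok", output=repr(result)[:500])
                return result
            except Exception as exc:
                # Failure-mode classification starts with knowing exactly
                # which step failed and with what input.
                record.update(status="error", error=str(exc))
                raise
            finally:
                print(json.dumps(record))  # stand-in for a real trace sink
        return wrapper
    return decorator

@traced("summarise_ticket")
def summarise_ticket(text: str) -> str:
    return text[:80]  # imagine a model call here
```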

Level 4: Compounding leverage = Continuous quality

The AI challenge

Karpathy talks about the challenge of systems that are “constantly changing”—models that update, capabilities that shift, and the need to “not fall behind” in a profession experiencing a “magnitude 9 earthquake.”

This requires:

  • Evaluation harnesses: Without “evals”—golden sets of examples or regression tests—changing a prompt or model is like playing “Russian roulette” with the system’s performance.
  • Feedback loops: Getting the model to critique and revise its own work before final shipment.
  • Drift management: Governing systems in a state of continuous change due to model updates and evolving requirements.

This is modern quality engineering

This is the entire modern quality engineering toolkit:

  • Evaluation harnesses – Karpathy literally describes regression test suites, golden master testing, and benchmark sets. His “Russian roulette” metaphor is exactly what happens when you deploy without automated testing.
  • Feedback loops – Continuous testing, quality gates in CI/CD, and shift-left testing practices that catch issues before production. Self-reviewing systems are just automated test-and-fix loops.
  • Drift management – Production monitoring, synthetic testing, and quality metrics that track system behaviour over time as the world changes. This is the recognition that quality isn’t a one-time achievement—it’s a continuous practice.

This level acknowledges something testers have always known: shipping the software isn’t the end—it’s the beginning. Systems drift. Requirements change. The environment evolves. Quality is ongoing vigilance.
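
A minimal sketch of such an evaluation harness: a golden set of prompts paired with deterministic checks, and a pass-rate gate that fails loudly when a prompt or model change regresses behaviour. The examples, threshold and `model_fn` parameter are all invented for illustration.

```python
# A golden set: inputs paired with deterministic checks on the output.
GOLDEN_SET = [
    ("Classify sentiment: 'the release broke everything'",
     lambda out: "negative" in out.lower()),
    ("Extract the version number from 'deployed v2.3.1 to prod'",
     lambda out: "2.3.1" in out),
]

def run_evals(model_fn, threshold: float = 0.9) -> float:
    """Run the golden set against the system under test and fail loudly
    if the pass rate regresses. Changing a prompt or model without this
    is the 'Russian roulette' Karpathy describes."""
    passed = sum(1 for prompt, check in GOLDEN_SET if check(model_fn(prompt)))
    rate = passed / len(GOLDEN_SET)
    if rate < threshold:
        raise AssertionError(f"eval pass rate {rate:.0%} below {threshold:.0%}")
    return rate
```

Wired into CI, this is indistinguishable from a regression suite, because that is exactly what it is.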

The new definition of “technical” = Quality engineering

Karpathy writes: “There’s a new programmable layer of abstraction to master… and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities.”

Read that sentence again, but replace his description with this: “You need to build systems that produce reliable outcomes even when the underlying components are unreliable, opaque, and constantly changing.”

That’s not a new problem. That’s the definition of software testing.

The traditional divide between “technical” (people who write code) and “non-technical” (everyone else) is collapsing. In Karpathy’s new landscape, being “technical” simply means the ability to orchestrate uncertainty without losing authority.

And orchestrating uncertainty without losing authority has always been what testers do.

The opportunity: From behind to ahead

Here’s where the narrative flips from anxiety to opportunity:

Everyone feels behind because the skills that matter have fundamentally changed. But testers aren’t starting from scratch—they’re starting from mastery.

The skills Karpathy describes as this bewildering “new programmable layer” aren’t new to testers:

  • Eliminating ambiguity in specifications? Testers do this daily.
  • Designing verification mechanisms for systems you can’t fully inspect? That’s black-box testing.
  • Building observability into opaque systems? That’s test instrumentation.
  • Creating evaluation harnesses and regression suites? That’s test automation.
  • Classifying failure modes and doing root cause analysis? That’s debugging and triage.
  • Monitoring for drift and continuous quality? That’s production testing.

When Karpathy says “some powerful alien tool was handed around except it comes with no manual,” he’s describing the experience of developers who are accustomed to deterministic systems. For testers, who have always worked with uncertainty, unreliability, and the need to verify rather than trust, this isn’t alien at all—it’s just Tuesday.


Three critical insights for testers

1. Everyone needs testing skills now

Karpathy’s post went viral because it articulated what everyone is feeling. But here’s what’s underneath that feeling: everyone now needs to do what testers have always done.

AI systems don’t just benefit from testing; they are unusable without it. Without verification mechanisms, AI systems are “slot machines”: unpredictable, unreliable, and dangerous.

Quality engineering hasn’t gone from “nice to have” to “important.” It’s gone from “important” to “table stakes for AI working at all.”

2. The build/test divide has collapsed

In the deterministic era, there was a clear division: some people built the system (wrote the code), and other people tested it (verified it worked). This created an organisational structure where testing was downstream from building.

That structure is dying.

When building itself becomes about orchestrating unreliable components, building and testing become the same activity. You’re not writing code that you know will work—you’re composing systems from parts that might work, then verifying they work, then monitoring whether they keep working.

Every engineer now needs testing skills. Every AI system needs quality engineering architecture. The distinction has collapsed.

3. Vendor independence requires verification capability

Karpathy describes needing to build “mental models for strengths and pitfalls” of these systems. Why? Because you can’t trust vendor claims.

When OpenAI says GPT-4 is “82% better” at something, or when vendors claim their model “rarely hallucinates,” what they mean is: under certain conditions, with certain prompts, on certain benchmarks, the performance improved. Will it work for your use case? You have no idea until you test it.

Organisations building AI capability without deep quality engineering expertise are taking vendors at their word. That’s not strategy—that’s faith.

The person who can design evaluation harnesses, build observability into agent workflows, create feedback loops that catch hallucinations before production, and establish provenance and verification mechanisms? That’s not just a tester anymore—that’s the most strategic role in the AI era.

The bottom line: You’re not behind—you’re ahead

Karpathy’s post resonated because everyone feels behind. The profession is being “dramatically refactored.” There’s a “magnitude 9 earthquake.” The tools are “powerful but come with no manual.”

But here’s what testers need to understand: The skills required for this new era are exactly the skills you’ve been building your entire career.

When Karpathy describes orchestrating uncertainty without losing authority, he’s not describing a new discipline. He’s describing software testing.

The phase transition in technical leverage isn’t creating something new. It’s revealing that the discipline of quality engineering—of building trust into unreliable systems—was always the hard part. We just didn’t notice because deterministic code made trust automatic.

Everyone feels behind because everyone needs to learn to think like a tester.

And testers? You’re not behind. You’ve been training for this moment for years. The rest of the industry is just now catching up to what you’ve always known:

  • Trust must be verified
  • Specifications must be precise
  • Systems must be observable
  • Quality must be continuous
  • Uncertainty must be managed

The future belongs to people who can look at a probabilistic black box and say: “Here’s how we’ll verify it. Here’s how we’ll monitor it. Here’s how we’ll know when it’s drifting. And here’s how we’ll maintain trust at scale.”

That’s always been the tester’s job.

It’s just that now, finally, the entire industry realises that it’s the most important job there is.

Follow Ash Gawthorp on LinkedIn

Read more insights from our Chief Technology Officer by following Ash on LinkedIn, and message him directly with any questions you have about AI and the changing landscape of software testing.