The narrative around AI and human intelligence has settled into two camps. One says AI will make us all smarter. The other says it’s turning us into intellectual dependents. Both are wrong, and the research now shows exactly why.
AI amplifies whichever mode the user brings to it. Passive, uncritical use measurably degrades cognition. Active, intentional use measurably sharpens it. The choice between those two modes is itself a higher-order thinking act — and that’s the whole point.
AI amplifies whichever cognitive mode you bring to it. The tool doesn’t determine the outcome. Your approach does.
The evidence for cognitive erosion is real
The research on unguided AI use is damning. A Microsoft Research and Carnegie Mellon University study of 319 knowledge workers found that higher confidence in AI tools correlated with less critical thinking. An MIT Media Lab EEG study tracked 54 participants over four months: the ChatGPT group showed up to 55% reduced brain connectivity, and 83% couldn't quote from their own essays afterward. The researchers called this cognitive debt: short-term efficiency purchased at long-term cognitive cost.
A peer-reviewed German study compared students using ChatGPT with students using Google. The ChatGPT users experienced lower cognitive load but produced lower-quality reasoning. The easier path produced weaker thinking.
Students performed worse after unrestricted AI access was removed than peers who never used it at all: genuine skill atrophy, measured at scale.
The most powerful finding comes from a PNAS field experiment with nearly 1,000 Turkish high school maths students. Unrestricted ChatGPT access improved practice scores by 48% — but when AI was removed, students performed 17% worse than the control group. Not worse than when they had AI. Worse than students who never used it at all.
How AI is used changes everything
That same PNAS study had a third group using a “GPT Tutor” that gave hints instead of answers — a Socratic approach. This group improved by 127%, more than double the unrestricted group. And the negative effects when AI was removed were largely mitigated. Same tool. Same students. Different design. Radically different outcomes.
The MIT brain study showed the same pattern. When the brain-only group was finally given LLM access, they showed increased neural activity and more sophisticated prompting strategies. Strong cognitive foundations let people use AI productively. Weak foundations lead to dependency.
A cross-country experiment across Germany, Switzerland, and the UK confirmed it: unguided AI fostered cognitive offloading without improving reasoning, while structured prompting significantly enhanced both critical reasoning and reflective engagement. A meta-analysis of 29 experimental studies found generative AI exerts a moderate positive effect on higher-order thinking — but only when learners had high self-regulated learning capacities.
The variable isn’t the AI. It’s the user’s approach.
Prompting is thinking made visible
Most people treat prompting as a technical skill. The research says something more interesting. A Frontiers in Education paper identified prompt engineering as requiring metacognitive monitoring and cognitive regulation. A Springer Nature paper showed prompt crafting aligns with computational thinking: abstraction, iteration, generalisation, debugging. This isn’t about memorising template prefixes. It’s about decomposing a problem clearly enough that another intelligence — artificial or human — can work with it.
AI wasn’t failing me; it was holding up a mirror to the fuzziness of my own thoughts.
A Rev survey of 1,000+ AI users backs this up. People who feel they’re getting better at prompting are 64% more likely to say they never experience hallucinations. Daily users are 14 times more likely than casual users to double-check AI’s work. Power users spend dramatically more time iterating. The people getting the best results from AI are working harder, not less.
The confident wrongness problem
AI is more confident when it’s wrong. MIT research found AI models are 34% more likely to use definitive language when generating incorrect information. A 2025 mathematical proof confirmed hallucinations cannot be fully eliminated under current architectures. OpenAI’s o3 reasoning model hallucinated 33% of the time on factual questions. Stanford found LLMs hallucinated at least 75% of the time on court ruling questions, inventing over 120 non-existent cases.
The consequences are already measured. Researcher Damien Charlotin’s database has identified 1,081 court cases involving AI-hallucinated content — 90% from 2025 alone. Major international law firms have been fined for citing non-existent rulings. Knowledge workers spend 4.3 hours per week fact-checking AI outputs. Global financial losses from hallucinations reached €67.4 billion in 2024.
Every technology that automates a cognitive task creates the same fork. GPS users have measurably worse spatial memory. Younger Inuit hunters who traded deep-rooted wayfinding skills for GPS have seen a rise in serious accidents in the Arctic. But planning a route before a trip supplements spatial knowledge rather than replacing it. Same tool, different cognitive engagement, radically different outcomes. AI is no different.
The adoption-capability gap
Only a minority of people globally have received any AI training, yet 66% use AI regularly. The gap between adoption and capability is where the damage happens.
A KPMG/University of Melbourne study across 47 countries found that 66% of people use AI regularly, yet fewer than half trust it; 66% rely on its output without evaluating accuracy, and 56% are making mistakes because of AI. The World Economic Forum ranks analytical thinking as the number one core skill, with 7 in 10 companies calling it essential. AI literacy is the fastest-growing skill on LinkedIn, up 600% in 2024, and 77% of employers plan to upskill workers rather than cut roles.
The market is not rewarding people who can use AI. It’s rewarding people who can think clearly and use AI as one of their tools.
Beyond the chat box: LLMs are not agents
Everything above describes how people interact with large language models — the chat-box pattern. You type a prompt, the model generates text, you evaluate it. That’s where the cognitive risk lives, and where all the research was conducted.
But the AI landscape is splitting in two, and the distinction matters more than most people realise.
LLMs generate text. You ask a question, you get an answer. The quality depends entirely on your prompt, your evaluation, and your domain knowledge. This is what ChatGPT, Gemini, and most “AI tools” are — powerful, but fundamentally reactive.
Agentic AI does work. It reads your codebase, plans a sequence of actions, executes them, checks the results, and iterates. It uses real tools — file systems, APIs, databases, version control — and produces artefacts you can inspect, test, and deploy. The cognitive demand on the human shifts from "evaluate this text" to "direct this process and verify the outcomes."
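The difference can be sketched as control flow. Here is a minimal, illustrative Python loop, with stand-in functions in place of a real model and real project tests, showing the act-verify-iterate cycle that separates an agent from a single prompt-response exchange:

```python
# Minimal sketch of the agentic loop: act, verify, iterate.
# "propose_fix" and "run_checks" are toy stand-ins, not real model
# or test-runner calls; the point is the control flow.

def propose_fix(code: str) -> str:
    """Stand-in for a model proposing an edit (here: a trivial bug fix)."""
    return code.replace("add(a, b): return a - b",
                        "add(a, b): return a + b")

def run_checks(code: str) -> bool:
    """Stand-in for the agent running the project's tests."""
    namespace = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5

def agent_loop(code: str, max_iterations: int = 3) -> tuple[str, bool]:
    """Check the result, apply a proposed change, and repeat until it passes."""
    for _ in range(max_iterations):
        if run_checks(code):
            return code, True        # verified: hand back to the human
        code = propose_fix(code)     # act: apply a proposed change
    return code, run_checks(code)

buggy = "def add(a, b): return a - b"
fixed, ok = agent_loop(buggy)
```

A chat box stops after one `propose_fix`; the agent owns the loop, and the human's job moves to directing it and reviewing `fixed` before it ships.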
I build with agentic AI every day. The tool I use most is Claude Code, Anthropic’s agentic coding environment. Not a chat window where I paste prompts and edit what comes back. A workflow where the AI reads an entire project, proposes specific changes, runs builds, catches its own errors, and I direct, review, and decide.
That demands more thinking, not less. And the tooling rewards that thinking in ways a chat box never could.
What working with agentic AI actually looks like
Claude Code doesn’t just respond to prompts. It operates inside your project. It reads files, edits code, runs terminal commands, manages version control, and deploys to production. But what makes it genuinely different from the chat-box pattern is how much it forces you to think clearly before anything happens.
Take custom slash commands — reusable workflows you define as markdown instructions that the AI executes on demand. I’ve built a /start command that gathers project context, checks the GitHub backlog, reviews the last session’s handoff notes, asks what I’m building today, and sets up a clean git branch. A /deploy command that runs the full build-test-deploy pipeline to Google Cloud Run. A /verify-grants command that audits our grant database against live government sources for stale data. A /simplify command that reviews changed code for duplication and unnecessary complexity before every commit.
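Custom commands in Claude Code are plain markdown files under `.claude/commands/`. As a hedged illustration, not the author's actual command, a `/simplify`-style command file might look like this (the specific steps are invented for the example):

```markdown
<!-- .claude/commands/simplify.md — hypothetical example -->
Review the code changed on this branch before commit:

1. Run `git diff main...HEAD` to see what changed.
2. Flag duplicated logic and unnecessary abstraction.
3. Propose simplifications, but do not apply any without approval.
4. If the diff is already clean, say so and stop.
```

The file is the thinking, written down once: the sequence, the decision points, and where the AI must stop and ask.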
Each of these required me to think through the exact sequence of steps, the decision points, the failure modes, the edge cases. What should happen if the build fails. What context the AI needs before it starts. What questions it should ask me versus what it should decide on its own. That’s systems thinking — the kind the research says matters most — and the AI just executes it faithfully every time.
Planning mode takes it further. Before any significant build, I switch Claude Code into a dedicated planning mode where it can’t write code — only think. It analyses the codebase, maps the files that need to change, considers architectural trade-offs, identifies risks, and proposes a step-by-step implementation plan. I review it, challenge the assumptions, push back on the approach, adjust the scope. Only when I’m satisfied does implementation begin. The AI forces me to articulate what I actually want and why before a single line of code changes — the same metacognitive process the research identified as the key differentiator between productive and passive AI use.
Then there’s MCP — Model Context Protocol — an open standard that lets the AI connect to external tools and services in real time. Through MCP servers, Claude Code can query up-to-date library documentation instead of relying on training data. It can interact with GitHub — creating branches, opening pull requests, reading review comments. It can manage databases, generate podcast audio through Google’s NotebookLM, fetch live web content for research. Each integration is an architectural decision: what tools does this workflow need, what data should the AI have access to, what permissions does it get, what guardrails prevent it from doing something destructive. You’re not just prompting — you’re designing a system of capabilities and constraints.
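Concretely, MCP servers are declared in a project-level configuration file. A minimal sketch, assuming a project `.mcp.json` wiring up a GitHub server via a commonly used package and an environment variable for the token (the package name and variable are illustrative assumptions, not details from the text):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Each entry is one of those architectural decisions made explicit: which capability the workflow gets, and which credentials scope what it can touch.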
Memory and context persistence is another layer entirely. Claude Code maintains a persistent memory system across sessions — project decisions, user preferences, feedback corrections, external references. When I correct its approach, it remembers for next time. When a partner’s contact details change, it updates its own records. When a strategic decision is made, it logs the reasoning so future sessions have context. This means the AI gets sharper over time, but only because I’m continuously curating what it knows and challenging what it assumes. The memory system is a mirror of my own thinking about the project — if my thinking is sloppy, the memory is sloppy, and every future session inherits that sloppiness.
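Project memory in Claude Code is anchored by a `CLAUDE.md` file read at the start of each session. A hypothetical sketch of the kind of curated entries described above (every entry here is invented for illustration):

```markdown
<!-- CLAUDE.md — project memory loaded at the start of every session -->
## Conventions
- Deployments go through /deploy; never push straight to main.

## Decisions
- Chose Cloud Run over GKE: lower ops overhead at our current scale.

## Corrections
- Grant IDs are strings, not integers; never cast them.
```

Every line is a past judgement made durable, which is exactly why sloppy curation compounds across sessions.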
Guardrails and permission systems close the loop. You define what the AI can do autonomously and what requires human approval. File edits, terminal commands, git operations, external API calls — each has a permission level. The AI proposes, the human approves. Every destructive action — deleting files, force-pushing branches, modifying shared infrastructure — requires explicit confirmation. This isn’t just safety engineering. It’s a forcing function for intentionality. You can’t sleepwalk through a workflow that asks you to approve every consequential action.
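In Claude Code, these permission levels live in `.claude/settings.json` as allow, ask, and deny rules. A hedged sketch of the pattern described above (the specific rules are illustrative, not the author's actual configuration):

```json
{
  "permissions": {
    "allow": ["Read", "Bash(npm test:*)", "Bash(git diff:*)"],
    "ask": ["Edit", "Bash(git push:*)"],
    "deny": ["Bash(rm -rf:*)", "Read(.env)"]
  }
}
```

Reads and test runs flow freely, edits and pushes pause for approval, and destructive or secret-touching operations are off the table entirely.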
The pattern across all of these is the same. The AI handles the implementation noise — the boilerplate, the syntax, the configuration, the repetitive plumbing. What’s left for the human is pure decision-making. Product architecture. System design. Integration logic. Quality judgement. The thinking that actually matters.
Agentic AI strips away the noise. What’s left is pure decision-making — the thinking that actually matters.
The role that didn’t exist five years ago
This is already reshaping what technical roles look like. We’re hiring for a position right now that sits at the intersection of product architecture, automation engineering, and communication. The person we need is equally comfortable building complex workflows and explaining them to non-technical founders. They architect integrations between CRMs, payment systems, and project management tools. They design automation systems that transform how businesses operate. And they document everything clearly enough that someone else can maintain it.
Five years ago, that was three separate roles. Today, agentic AI makes it one — because the implementation bottleneck has moved. The scarce resource is no longer someone who can write the code or configure the platform. It’s someone who can think clearly enough to direct the work, verify the outcomes, and explain the decisions to a room full of people who don’t share their technical vocabulary.
That’s the shift the research is pointing toward without quite naming it. The cognitive risks of passive AI use are real and documented. But they describe one mode of interaction — the chat-box mode — and that mode is already being overtaken by something that demands sharper thinking, clearer communication, and deeper domain expertise from the human in the loop.
The businesses and professionals who will thrive aren’t the ones with the best AI tools. They’re the ones who can think clearly enough to direct them.
