Nate B. Jones — AI News & Strategy Daily · Zeitgeist

About #

20-year product leader and AI strategist. Former Head of Product, Amazon Prime Video. Daily AI briefings across YouTube, Substack, and podcast. ~450k+ followers across platforms.

2026-06-26 — I Built an Open Engine That Connects Claude, ChatGPT, and Codex Together #

YouTube/Substack · YouTube · Read

The bottleneck in modern AI workflows isn’t model capability but the integration layer: humans manually shuttle work between Claude, Codex, and ChatGPT, maintaining context across each handoff (“the transcript commutes while you wait”).
Open Engine is a copy-paste handoff framework (no API engineering required) — a seven-part task record that preserves source decisions, visible constraints, an audit trail, and “receipts” (evidence of completion), so each subsequent agent inherits full context rather than starting blind.
The “one-loop audit” reframes handoff friction: instead of asking “is this automatable?” ask “is the handoff structure good enough for an agent to claim this work?” — the bottleneck is specification quality, not capability.
Practical takeaway: build task records that outlive individual model sessions; the Open Engine’s handoff structure is the infrastructure layer beneath the model choice.

2026-06-24 — I Stopped Prompting AI One Task At A Time. This Works Better. #

YouTube/Podcast · YouTube · Podcast · Read

Identifies the invisible work between tasks — remembering, connecting, following up across email, Slack, calendar — as the integration load that lives in your head, not in any tool. AI handles the tasks; no AI handles the transitions.
Defines an AI “loop” as a recurring job with built-in memory, information sources, safe actions, and clear scope boundaries — distinct from an agent (autonomous, open-ended) or a prompt (one-shot). A “loop of loops” notices when changes in one area affect another.
The beginner-safe implementation principle: loops that draft outputs but pause before sending — human approval stays in the chain until the loop has proven itself across enough cycles to trust automation.
Practical takeaway: the five-question framework for turning a messy recurring obligation into a loop focuses on what the job notices and remembers, not just what it does — the memory and trigger design are the hard part.

2026-06-23 — The Doing Got Cheap. Now What? | Claude Fable 5 Changes Work #

YouTube/Podcast/Substack · YouTube · Podcast · Read

Claude Fable 5’s “detailed task imagination” capability — the ability to specify substantial work rather than individual prompts — shifts the bottleneck from execution to specification: “the doing got cheap; now the thinking about what to do is expensive again.”
The nine-field task specification format (covered in the Substack guide) structures complete job delegation: scope, constraints, success criteria, failure modes, artifacts, dependencies, timeline, review triggers, and handoff format. Benchmark scores matter less than the delegation contract.
The review queue becomes the new constraint: when a model can absorb whole jobs, the bottleneck shifts to the human capacity to review completed work rather than generate it — management of AI output queues is the emergent skill.
Practical takeaway: restructure work around complete jobs delegated upfront rather than iterative prompt-response sequences; the nine-field format converts vague instructions into agent-claimable specifications.
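
As a rough illustration, the nine fields named above can be treated as a record that is checked for gaps before a job is handed off. This is a minimal Python sketch, not the format from the Substack guide: the `TaskSpec` class, the `missing_fields` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TaskSpec:
    """One record per delegated job; field names follow the nine listed above."""
    scope: str = ""
    constraints: str = ""
    success_criteria: str = ""
    failure_modes: str = ""
    artifacts: str = ""
    dependencies: str = ""
    timeline: str = ""
    review_triggers: str = ""
    handoff_format: str = ""


def missing_fields(spec: TaskSpec) -> List[str]:
    """Return the names of blank fields, in declaration order."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


# Hypothetical draft: only two of the nine fields are filled in.
draft = TaskSpec(
    scope="Migrate billing reports to the new warehouse",
    success_criteria="All reports match legacy totals",
)
print(missing_fields(draft))
```

A non-empty result is a signal that the job is not yet specified well enough to delegate in one pass.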

2026-06-22 — Why Anthropic Actually Won the Month (Yes, Really) #

Podcast · ~8 min · Podcast

Competitive analysis framed around talent movement rather than benchmark comparisons — argues that who moves where reveals more about organisational trajectory than model release scores.
The “recursive self-improvement” signal: talent flowing toward Anthropic’s safety and interpretability teams is read as evidence of where the community believes the next capability ceiling will be addressed, not just where pay is highest.
Practical takeaway: watch talent migration across AI labs as a leading indicator of technical direction — it captures bets that benchmark announcements obscure.

2026-06-21 — Every AI Agent Needs an Owner #

Podcast · ~14 min · Podcast

Production agents fail not through model degradation but through context drift — expanding tool access, broadening scope, accumulating edge-case prompts — until the agent no longer reliably does the original job.
Seven maintenance components that deteriorate: job definition, contextual diet, memory systems, tool access, scope boundaries, validation methods, and measured value delivery. Any one drifting silently causes failure.
“Agent maintenance is the grown-up AI skill for 2026” — the organisations deploying reliable agents are those running scheduled maintenance reviews that audit each component and remove rather than add.
Practical takeaway: assign named human owners to production agents who run periodic audits against the original job definition — the Vercel sales agent case study (removing 80% of tools → better reliability) is the canonical example.

2026-06-19 — Your AI skills are leaving your hands. Here’s how to own them. #

YouTube/Substack/Podcast · ~17 min · YouTube · Read · Podcast

Three-tier distinction between prompt, memory, and skill: prompts and memory travel between AI tools; skills remain trapped in proprietary platform formats — Claude skills don’t transfer to Codex and vice versa.
“Ownership vs. rental”: skills embedded in platform-native workflows become career capital you must rebuild from scratch every time you switch tools; skills documented as portable procedures (SKILL.md, MCP configs) remain yours.
The One-Question Test for a genuinely owned workflow: “Is it visible, movable, inspectable, testable, and available wherever I work?” Platform-embedded chat history fails all five criteria; an MCP-exportable procedure passes.
Practical takeaway: document operating procedures outside your AI platform — in SKILL.md files or MCP-compatible config formats — rather than relying on chat memory or platform-specific skill libraries.

2026-06-17 — Vercel deleted 80% of its agent’s tools and the agent got better. #

YouTube/Substack · YouTube · Read

Vercel’s sales agent case study: removing 80% of its available tools improved reliability — constraint and curation produce better agents than capability accumulation.
Frames agent maintenance as a continuous discipline analogous to physical systems maintenance: agents drift not through model changes but through expanding context and tool access.
Identifies seven components that deteriorate over time: job definition, contextual diet, memory systems, tool access, scope boundaries, validation methods, and measured value delivery.
Practical takeaway: schedule regular “agent maintenance” reviews that audit each of the seven components and remove rather than add — drift toward complexity is the failure mode, not drift toward simplicity.
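
One way to picture a maintenance review is as a checklist over the seven components plus a diff of the agent's tool list against its original job definition. The Python sketch below is an assumption about how such a review could be recorded, not a description of Vercel's process; `drifted_components`, `tools_added_since_baseline`, and the tool names are hypothetical.

```python
from typing import Dict, List, Set

# The seven components named in the summary above.
COMPONENTS = (
    "job_definition", "contextual_diet", "memory_systems", "tool_access",
    "scope_boundaries", "validation_methods", "measured_value_delivery",
)


def drifted_components(review: Dict[str, bool]) -> List[str]:
    """Components not confirmed healthy; an unreviewed component counts as drifted."""
    return [name for name in COMPONENTS if not review.get(name, False)]


def tools_added_since_baseline(baseline: Set[str], current: Set[str]) -> List[str]:
    """Tools granted after the original job definition: candidates for removal."""
    return sorted(current - baseline)


# Hypothetical review: one component failed, one was never looked at.
review = {name: True for name in COMPONENTS}
review["tool_access"] = False
review.pop("validation_methods")

print(drifted_components(review))   # ['tool_access', 'validation_methods']
print(tools_added_since_baseline(
    {"crm_lookup", "send_email"},
    {"crm_lookup", "send_email", "web_search", "calendar"},
))                                  # ['calendar', 'web_search']
```

Treating a missing entry as drift keeps the review biased toward removal: a component nobody checked is not assumed to be fine.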

2026-06-15 — The Harness Is the Business: Inside the OpenAI and Anthropic IPO Bet #

Podcast/Substack · ~11 min · Podcast · Read

Deconstructs OpenAI’s IPO valuation through four competing narratives: software company (recurring revenue), utility (indispensable infrastructure), infrastructure provider (compute layer), and deployment specialist — each narrative implies different multiples and different failure modes.
“The hardest part of the AI market may be installing intelligence inside real organizations” — the IPO framing bets on deployment capability, not model capability, which is the harder thing to replicate.
The Executive Briefing angle: cheap intelligence (commodity inference) is not the same as effective intelligence deployment; the organisational “harness” surrounding models — integration, governance, workflow redesign — is the actual constraint on value capture.
Practical takeaway: enterprises evaluating AI investments should assess deployment infrastructure (the harness) separately from model capability; intelligence abundance is already here, deployment capacity is the variable.

2026-06-11 — Fable 5 is here — but who is it for? [Short] #

YouTube Short · YouTube

Quick assessment of whether Fable 5 justifies the hype for professional workflows; promises a full review Saturday covering model capabilities and real-world applications.
Framing positions Fable 5 as a question of audience fit, not raw capability — “who is it for” rather than “how good is it.”

2026-06-10 — Claude vs. Codex isn’t about code. It’s about whether you steer or dispatch. #

YouTube/Podcast · ~16 min · YouTube · Podcast

Core argument: the choice between Claude Code and Codex is not a capability comparison — it’s a philosophical question about how you want to work with agents. Claude trains a steering model (you stay close, redirect, and supervise); Codex trains a dispatching model (you write clear specifications upfront and demand verifiable outputs).
The steering/dispatch split changes what you reach for when problems arise: Claude Code users escalate through dialogue; Codex users improve their spec and re-dispatch. Neither is universally better — the choice depends on task ambiguity and your tolerance for mid-task intervention.
Names “agent literacy” (knowing when to steer vs. dispatch) as the critical professional skill for 2026 — more important than model selection.
Practical takeaway: match your working style to the tool’s philosophy before evaluating output quality; mismatched philosophy produces frustration that gets attributed to model capability.

2026-06-09 — Fix your operating model or lose at AI [Short] #

YouTube Short · YouTube

Companies viewing high token costs as evidence that “AI doesn’t work” are misdiagnosing the problem: the real issue is operations that have not been redesigned around agents.
Reframes the token cost question: not a verdict on AI viability but a signal that the operating model needs restructuring to route agent work to high-ROI tasks.

2026-06-08 — Beyond The Hype: Why Meta And Block Are Firing People #

YouTube · ~14 min · YouTube

Decodes different layoff categories across tech companies: hyperscaler GPU spending reallocation (Meta), visionary strategic pivots (Block), and operational restructuring. Argues the layoff motivation is legible if you look at the capital allocation pattern, not the press release.
Practical framework for reading workforce decisions as strategy signals — the layoff type reveals whether the company is substituting AI for labour, investing in AI infrastructure, or executing a product strategy change.

2026-06-07 — Executive Briefing: Uber Burned Its Entire AI Budget Early #

Substack · Read

Uber depleting its AI token budget ahead of schedule is a failure of budgeting model, not of AI economics. When AI becomes embedded operational labour rather than a purchased tool, seat-based licensing and fixed AI line items structurally undercount actual usage.
Framework: companies must shift to understanding delegated intelligence cost — what did you ask AI to do, and did it produce customer value? The token-per-outcome metric, not token-per-seat, is the right unit for AI budget governance.
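
The difference between the two units can be shown with a small calculation. The sketch below is illustrative only: the team names and figures are made up, and "outcome" stands for whatever unit of customer value a team chooses to count.

```python
from typing import Optional


def tokens_per_seat(total_tokens: int, seats: int) -> float:
    """The seat-based view: spend divided by licensed users."""
    return total_tokens / seats


def tokens_per_outcome(total_tokens: int, outcomes: int) -> Optional[float]:
    """The outcome-based view; None when nothing of value was delivered."""
    return total_tokens / outcomes if outcomes > 0 else None


# Hypothetical teams: (tokens spent, seats, outcomes delivered).
teams = {
    "support": (4_000_000, 10, 800),
    "research": (4_000_000, 10, 0),
}

for name, (tokens, seats, outcomes) in teams.items():
    print(name, tokens_per_seat(tokens, seats), tokens_per_outcome(tokens, outcomes))
```

Both teams look identical per seat (400,000 tokens each), while the per-outcome figure separates a team that shipped 800 outcomes from one that shipped none.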

2026-06-05 — You can’t trust one token number across your tools #

Substack · Read

Token counts are meaningless without outcome context — the same token spend might be productive investment, learning overhead, or pure waste depending on what was accomplished.
Guides building a dashboard that tracks token spend across Codex, Claude, and ChatGPT alongside work completed, enabling teams to classify spend into three buckets: productive, exploratory, and waste. The classification enables budget governance without cutting productive AI usage.
Takeaway: the monitoring infrastructure for AI-as-labour requires the same outcome tracking you’d apply to any other operational cost centre.
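
The three-bucket classification can be sketched as a simple roll-up. This is a minimal Python example under stated assumptions: the `SpendRecord` shape, the idea that a reviewer assigns each bucket by hand, and all figures are hypothetical and not taken from the guide.

```python
from dataclasses import dataclass
from typing import Dict, List

BUCKETS = ("productive", "exploratory", "waste")


@dataclass
class SpendRecord:
    tool: str     # e.g. "Codex", "Claude", "ChatGPT"
    tokens: int
    bucket: str   # assumed to be assigned by a reviewer, alongside the work completed


def totals_by_bucket(records: List[SpendRecord]) -> Dict[str, int]:
    """Sum token spend per bucket, rejecting unknown bucket names."""
    totals = {bucket: 0 for bucket in BUCKETS}
    for record in records:
        if record.bucket not in totals:
            raise ValueError(f"unknown bucket: {record.bucket}")
        totals[record.bucket] += record.tokens
    return totals


def shares(totals: Dict[str, int]) -> Dict[str, float]:
    """Each bucket's fraction of total spend (all zeros if nothing was spent)."""
    grand_total = sum(totals.values())
    return {b: (t / grand_total if grand_total else 0.0) for b, t in totals.items()}


records = [
    SpendRecord("Codex", 600_000, "productive"),
    SpendRecord("Claude", 300_000, "exploratory"),
    SpendRecord("ChatGPT", 100_000, "waste"),
]
print(shares(totals_by_bucket(records)))
```

Keeping the exploratory bucket separate from waste is what lets a budget review trim the latter without cutting learning spend.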

2026-06-04 — Don’t let your AI output go to waste [Short] #

YouTube Short · Watch

AI output needs to be directed into a system or it disappears — the failure mode is generating good AI output with no downstream capture mechanism.
Framing: the bottleneck in AI-assisted work is not generation quality but output routing and retention.

2026-06-03 — Opus 4.8 Won Our Benchmark. I Still Wouldn’t Use It For Everything. #

YouTube · Podcast · YouTube · Substack

Opus 4.8 scored 81 in Jones’s practitioner benchmark suite (GPT-5.5: 71; Opus 4.7: 54). Excels at source discipline, operational judgment, canary handling, provenance, and self-correction. Weaknesses: visualisation and front-end tasks.
Andon Labs finding: Opus 4.8 on max effort performed worse than Opus 4.8 on high effort, and both performed worse than Opus 4.7 on long-horizon business benchmarks — maximum reasoning effort is not a monotonic improvement lever.
Nine-factor model routing framework: task type and duration; source material requirements; tool integration; artifact inspection; state preservation; supervision demands; uncertainty handling; failure costs; visual/front-end requirements. Route to Codex or GPT-5.5 for certain long-running workflows despite Opus 4.8’s benchmark lead.

2026-06-03 — AI didn’t fix your meetings, it broke your team size [Short] #

YouTube Short · Watch

AI tools increase individual output capacity, which means the optimal meeting size should decrease — the same meeting room now represents more total cognitive capacity than before.
Organisations running the same meeting cadence and group sizes as pre-AI are leaving leverage on the table.

2026-06-03 — AI didn’t fix your meetings, it broke them [Short] #

YouTube Short · Watch

AI-generated pre-reads and summaries allow participants to come to meetings already up to speed — but meetings designed for information transfer (the majority) become redundant, not better.
The meeting format that survives AI: decision and alignment sessions only. All other meeting types should be replaced by asynchronous AI-assisted workflows.

2026-06-01 — Why I’m moving this Substack from daily coverage to deeper weekly work #

Substack · Article

Nate announces a format shift from daily AI briefings to three weekly deep-dive pieces: comprehensive analysis of major developments, practical build guides, and executive briefings.
Rationale: AI models and tools are now widely available; the real challenge has shifted to understanding what to build and developing genuine fluency rather than surface-level awareness. Daily coverage no longer provides the synthesis value it once did.
Signals a broader maturation of AI practitioner media: the “breaking news” cadence served the 2023–2025 discovery phase; 2026’s challenge is depth and application, not awareness.

2026-06-01 — The death of traditional databases [Short] #

YouTube · YouTube

Enterprise data platform transformation: “trillion-token organisational context” as competitive advantage. The insight is that enterprises with large, well-structured internal knowledge bases are the ones that extract disproportionate value from LLM agents.
RAG at scale has limitations that become visible only when the context volume exceeds what traditional retrieval architectures can handle — the “death of traditional databases” framing is about knowledge architecture, not storage infrastructure.

2026-06-01 — This is how AI agents actually take over enterprises [Short] #

YouTube · YouTube

Analysis of OpenAI’s enterprise strategy vs. Anthropic’s competitive positioning around organisational context utilisation and enterprise software lock-in mechanisms.
Key insight: enterprise AI adoption is not primarily a capability race — it is a data lock-in and workflow integration race. Whoever owns the organisational context owns the agent value.

2026-06-02 — Why your meetings are actually destroying your output [Short] #

YouTube · YouTube

“AI raised coordination costs by the same order as output” — the structural unit of the AI era is the five-person strike team, not the large coordinated department.
The argument: AI collapses the time cost of individual output, but coordination overhead scales with headcount regardless. Meeting overhead amplifies existing team size problems; the solution is structural reduction in coordination surface, not better meeting facilitation.

2026-06-02 — Is your AI team actually efficient? [Short] #

YouTube · YouTube

Addresses misconceptions about AI team structures: the opportunity is “expanding ambition, not shrinking headcount.” Efficient AI teams are not smaller teams doing the same work — they are the same-sized teams attempting work that was previously impossible.
Pairs with the format-shift announcement: Nate’s strategic positioning is increasingly executive-framing rather than practitioner-tactics.

2026-05-31 — Prove Your Value at Work in the AI Era: Judgment Artifacts #

Podcast/Substack · ~10 min · Podcast

Core thesis: AI has eroded traditional competence signals — polished documents and prototypes no longer demonstrate judgment because AI can produce them without the underlying understanding.
The replacement signal: “portable judgment evidence” — making invisible decision-making visible through whiteboard-style conversations, situation-decision-risk frameworks, and documented reasoning traces.
Distinction between deliverable and judgment: AI automates the deliverable; human value lies in the judgment that determined what deliverable to produce. Hiring and career advancement must assess the latter.

2026-05-29 — Product Management When Software Creation Is Cheap #

YouTube/Substack · YouTube · Article

Core thesis: the cost of a first software version has collapsed, shifting the PM job from rationing scarce engineering capacity to classifying an abundance of rapidly-built tools. Microsoft’s 1M+ Power Platform assets are the canonical case study — half-real tools nobody owns, spreading into systems of record.
Introduces a four-state classification ladder: personal tool → team beta → supported internal product → customer-facing product. Specific user-count and risk thresholds gate promotion between levels.
Key new PM skill: identifying demotion triggers — recognising when a supported tool no longer justifies maintenance costs. “Supported” is not a permanent state.
Practical takeaway: two ready-to-use prompts for classifying employee-built tools into their actual production tier and auditing existing tools for demotion eligibility. The PM job in the era of cheap software is governance, not allocation.

2026-05-28 — Agent Product Analytics: What Your Dashboard Can’t See #

YouTube/Substack · YouTube · Article

Frame: standard dashboards show green metrics (active users, long sessions, chat messages) while missing critical failures inside agent runs — exemplified by a Cursor agent deleting a production database in nine seconds without triggering any alerts.
The unit of product behaviour is shifting from the session to the agent run. Analytics must now track: what work users delegate, what tools agents access, what boundaries they hit, and how often users correct them.
Agent systems compress traditional feedback cycles from weeks to minutes, enabling mid-flight course correction — but only if proper instrumentation exists. “Speed is the engine. Analytics is the rudder.”
Most teams classify agent analytics as engineering telemetry, not product analytics — explaining why runs “go fast in the wrong direction” without steering mechanisms.
Practical takeaway: build analytics around three categories: agent events (replacing clicks), completed vs. trusted tasks (a key distinction), and workflow autonomy earned through user acceptance patterns.
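
The completed-versus-trusted distinction can be made concrete with two rates over a log of agent runs. The sketch below assumes one possible definition, which is not from the source: a run counts as "trusted" when it completed and the user made no corrections. The `AgentRun` fields and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentRun:
    task: str
    completed: bool
    user_corrections: int  # how many times the user stepped in to fix the run


def completion_rate(runs: List[AgentRun]) -> float:
    """Share of runs that finished, regardless of how much help they needed."""
    return sum(1 for r in runs if r.completed) / len(runs) if runs else 0.0


def trusted_rate(runs: List[AgentRun]) -> float:
    """Share of runs that finished with zero corrections (assumed definition)."""
    trusted = sum(1 for r in runs if r.completed and r.user_corrections == 0)
    return trusted / len(runs) if runs else 0.0


runs = [
    AgentRun("draft renewal email", True, 0),
    AgentRun("update CRM record", True, 2),
    AgentRun("reconcile invoices", False, 1),
    AgentRun("summarise call notes", True, 0),
]
print(completion_rate(runs), trusted_rate(runs))  # 0.75 0.5
```

The gap between the two numbers is the part a session-level dashboard would not show.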

2026-05-28 — Shorts: Claude AI Prompting + Why People Switch to Claude #

YouTube · Shorts · The ultimate Claude AI prompting trick · Why millions are switching to Claude

Two short-form pieces reinforcing the Claude interaction model theme: (1) the constitutional AI framing makes Claude measurably more likely to identify flaws in a plan; (2) switching to Claude requires a mental model shift, not just a tool swap.
Consistent with the longer 2026-05-27 piece on Claude’s distinct interaction design — these appear to be distribution cuts from that content.

2026-05-27 — Claude Interaction Model: Two Shorts #

YouTube · Shorts · Why you’re using Claude completely wrong · The mistake everyone makes switching to Claude

Claude’s constitutional AI training makes it measurably more likely to identify flaws in a plan — users who treat Claude like ChatGPT miss the distinct interaction model and get worse outputs.
The core behavioural shift: describe your situation (context + constraints) rather than prescribing the output you want — Claude responds to framing, not to instruction-following, as its primary mode.
Interaction habit gaps compound: small misalignments between how you prompt and how the model was trained widen into large capability gaps over weeks of use.
Practical takeaway: before switching to a new model, study its interaction design — treat it as onboarding to a new colleague’s working style, not swapping one text interface for another.

2026-05-26 — Public AI Work: How Teams Actually Learn From AI #

Podcast/Substack · ~16 min · Article · YouTube

AI work in private chats is invisible to the organisation — it cannot be learned from, replicated, or scaled. Visibility is the precondition for institutional AI learning.
Key case study: Shopify’s “River” workflow routes agent work through public Slack channels, creating real-time apprenticeship infrastructure where colleagues observe AI sessions as they happen.
The “apprenticeship gap”: teams using private AI chats widen skill gaps between individuals; teams using public-facing AI workflows narrow them by making tacit knowledge visible.
Nate provides three concrete prompts for capturing and sharing AI sessions as institutional knowledge assets — turning individual productivity gains into organisational capital.
Practical takeaway: move AI work into shared spaces before optimising prompts. Organisational leverage comes from visibility, not from personal prompt quality.

2026-05-25 — AI Agents Create a Hidden Platform Team Bottleneck #

Podcast/Substack · ~46 min · Article · YouTube

AI agents accelerate application teams 10× but platform/infrastructure teams receive no corresponding headcount — a structural bottleneck that compounds as agent adoption scales.
Agents behave adversarially toward infrastructure not by design but because they generate work volumes the infrastructure was never sized for. Based on an interview with Emma (OpenAI data platform).
Application and platform teams accelerate at different rates under AI adoption; the gap compounds unless platform teams proactively build control layers and evaluation frameworks.
Teams must build private eval suites capable of testing agent behaviour across model upgrades — each new model version can silently change agent behaviour at scale.
Practical takeaway: the first infrastructure investment for scaling AI agents should be platform team capacity and eval tooling, not more application-layer agent features.

2026-05-24 — Why Big Tech Now Runs an AI Factory #

Podcast/Substack · ~23 min · Article · YouTube

AI is no longer a software business — it is an industrial supply operation constrained by physical manufacturing capacity (HBM chips, packaging lines), not code.
Microsoft plans to spend ~$190B in 2026 on AI infrastructure and still expects capacity shortfalls. Vendor agreements written as pure software contracts do not account for physical supply risk.
Key framework: shift from seat-based budgeting to token forecasting; treat AI capacity like a commodity with supply risk, not a SaaS subscription.
HBM bottlenecks and chip packaging complexity are the near-term constraints affecting availability SLAs — enterprise contracts with no supply assurance clauses are exposed.
Practical takeaway: renegotiate AI vendor contracts to include supply assurance clauses, utilisation discipline, and capacity reservation. Standard SaaS terms leave organisations exposed in a crunch.

2026-05-24–26 — Mini-series: Platform-Agnostic AI Memory Architecture #

YouTube Shorts · Why switching AI models is now impossible · How to build a 10-cent AI brain · Why you should never trust ChatGPT’s memory · Are AI Agents Actually Boosting Productivity?

AI platform memory systems are isolated silos — context built in ChatGPT cannot transfer to Claude, Gemini, or custom agents. Vendor lock-in through memory fragmentation is a structural risk, not a feature gap.
Solution architecture: Postgres with vector embeddings as a model-agnostic memory layer, accessible via MCP servers across tools. Costs 10–30 cents/month; the infrastructure barrier to persistent cross-tool AI memory is essentially zero.
The real productivity gap is not task completion speed — it is accumulated context. Agents that retain six months of context compound advantages exponentially; session-reset agents restart from zero every time.
Switching cost is not technical (APIs are similar) — it is contextual. Teams that build model-agnostic context architectures gain a structural advantage that grows over time.
Practical takeaway: build memory architecture outside the AI platform using open infrastructure (Postgres + vectors). Context portability decisions made now determine model flexibility for years.
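
The retrieval idea behind a model-agnostic memory layer can be sketched without any database. In the Python example below an in-memory list stands in for the Postgres table, and the three-number vectors stand in for real embeddings from an embedding model; `MemoryStore`, its methods, and the sample notes are hypothetical and show only the shape of the approach.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Stand-in for a shared table of (source tool, text, embedding) rows."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str, List[float]]] = []

    def add(self, source_tool: str, text: str, embedding: List[float]) -> None:
        self._rows.append((source_tool, text, embedding))

    def search(self, query_embedding: List[float], k: int = 1) -> List[Tuple[str, str]]:
        """Return the k most similar memories, whichever tool wrote them."""
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_embedding, row[2]),
            reverse=True,
        )
        return [(tool, text) for tool, text, _ in ranked[:k]]


store = MemoryStore()
store.add("chatgpt", "Decided to price the pilot per outcome", [1.0, 0.0, 0.0])
store.add("claude", "Hiring plan for the platform team", [0.0, 1.0, 0.0])
print(store.search([0.9, 0.1, 0.0]))
```

Because every tool reads and writes the same store, a note captured in one assistant is retrievable from another, which is the portability property the series argues for.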

2026-05-23 — Claude’s AI Town Voted Yes On Everything #

YouTube · YouTube

Analysis of Emergence AI’s 15-day virtual town experiment: five AI models in a simulated social environment. Claude’s behaviour was anomalous — it voted affirmatively on everything, a pattern Nate reads as alignment training creating measurable behavioural signatures in multi-agent social contexts.
Core structural insight: “the harness, not the model, does the heavy lifting” — long-running agent deployments require orchestration infrastructure (memory, goal persistence, re-prompting cadence) to remain coherent; the model alone cannot sustain goal-directed behaviour over 15 days.
Harness design quality determines outcome quality more than model selection in extended multi-agent deployments.
Practical takeaway: when evaluating multi-agent architectures, benchmark the harness separately from the model. Harness quality is the dominant variable in extended deployments.

2026-05-22 — Build the Room Before You Write the Memo #

Substack · Article

When AI produces a mediocre draft, the problem is almost never the prompt — it is the quality and organisation of source materials fed to the model.
Framework: treat AI generation as a function of input quality; organising documents, removing noise, and structuring sources before prompting is the highest-leverage intervention available.
The memo analogy: you wouldn’t write a memo without a brief; similarly, don’t generate content without a structured source folder — the model’s output quality is bounded by its inputs.
Practical takeaway: invest prep time in source organisation before generation. This returns better outputs than prompt engineering applied to a messy input set.

2026-05-21 — MIT Says Half Your AI Gains Come From How You Ask. Not the Model. #

Podcast/Substack · Article

Core reframe: the bottleneck for AI productivity is not model capability but the quality of the assignment. Generic AI output reflects weak briefs, not weak models. The piece frames prompt-writing as “briefing” rather than “prompting”: you’re assigning work to a senior partner, not typing into a search box.
The “six-field brief” template: goal, context, constraints, quality standards, format, and autonomy level. Providing all six transforms extended agent work from vague to actionable; skipping any field transfers ambiguity back to the model.
Unexpected side effect: improving AI briefing discipline improves communication with human colleagues — the same clarity that makes AI output useful makes management clearer. The skill generalises.
Practical takeaway: treat every AI assignment failure as a brief-quality failure first before attributing it to model capability. The model is rarely the bottleneck.
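
The six fields can be enforced mechanically before a brief is sent. The Python sketch below is one possible rendering, not the template from the article: `render_brief`, the output layout, and the example brief are hypothetical.

```python
from typing import Dict

# The six fields named in the summary above.
BRIEF_FIELDS = (
    "goal", "context", "constraints",
    "quality_standards", "format", "autonomy_level",
)


def render_brief(brief: Dict[str, str]) -> str:
    """Turn a complete brief into labelled text; refuse if any field is blank."""
    missing = [name for name in BRIEF_FIELDS if not brief.get(name, "").strip()]
    if missing:
        raise ValueError("brief is missing: " + ", ".join(missing))
    lines = [
        f"{name.replace('_', ' ').title()}: {brief[name].strip()}"
        for name in BRIEF_FIELDS
    ]
    return "\n".join(lines)


example = {
    "goal": "Summarise Q2 churn drivers for the leadership review",
    "context": "Source folder holds exit surveys and support tickets",
    "constraints": "Use only the provided sources; no customer names",
    "quality_standards": "Every claim cites a source document",
    "format": "One page, three headed sections",
    "autonomy_level": "Draft only; ask before dropping any data source",
}
print(render_brief(example))
```

Raising an error on a blank field makes the ambiguity visible to the person writing the brief instead of passing it on to the model.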

2026-05-20 — I Asked Seven Questions About Our AI Agent. We Failed Five. #

Podcast/Substack · Article

Seven control-layer questions that determine whether an AI agent ships to production: where does it reside, what state does it remember, who does it act for, when is approval required, what are the spending limits, what’s the kill switch, and what audit trail exists. Most teams can answer two.
The “control layer” is the infrastructure sitting between models and production systems — runtime, identity, payments, state, approval flows. Companies like Cloudflare, Okta, Stripe, and Datadog are becoming AI-era gatekeepers by providing this missing governance layer.
Practical diagnostic: run the seven questions on any agent proposal before committing to build. If five fail, the agent isn’t ready because the infrastructure around it isn’t ready.
Cross-column note: the control layer framework maps directly to the Claude Compliance API launch (May 21) — Anthropic is providing the governance layer for Claude Enterprise that Nate identifies as the critical missing piece for production agents.

2026-05-18 — Marketing for Humans and AI Agents in 2026 #

Podcast/Substack · YouTube · Article · YouTube

Core reframe: B2B marketing now must serve two simultaneous audiences — humans (persuasion logic) and AI agents performing vendor research (legibility/verifiability logic). 69% of software buyers chose different vendors based on chatbot recommendations; one-third selected previously unknown companies.
The “Truth Layer” concept: marketing becomes the steward of claims-evidence mapping, not just communications. Overstated AI capabilities create “trust-debt” that agents surface faster than traditional fact-checking; the AI-washing enforcement wave (SEC class actions) is the downstream consequence.
“Make More Stuff” trap: AI-driven content velocity is a commodity play that diminishes brand value and misses the structural shift. The strategic response is positioning marketing to touch product strategy, not just production throughput.
Practical diagnostic: three checks on what an AI agent “sees” when evaluating your company, amounting to a structured, verifiable claims audit run before agent-mediated buyers encounter inconsistencies.
Career signal: assess whether leadership understands the two-audience model before taking a marketing role; reposition marketing careers toward claims-evidence governance rather than content production.

2026-05-17 — Stop asking if AI can do this. Start asking what shape the work is. #

Substack · Podcast · Article

Core reframe: “can AI do this?” is the wrong investment gate. The right question is “what shape is this work?” — workflow structure determines whether to automate, build, buy, hire, or wait, not model capability.
Six-dimension classification framework: repetition frequency, cost of errors, judgment requirements, model maturity trajectory, market solution availability, company specificity. Two-axis decision matrix maps market maturity vs company specificity to five investment motions.
Warning: 40%+ of agentic AI projects are forecast to be cancelled by the end of 2027 due to cost, unclear value, or inadequate controls; most of these stem from committing capital before classifying work shape.
Practical takeaway: score your workflow against the six dimensions before committing budget; use four diagnostic prompts (decomposer, scorer, pressure-test, describability gate) to route capital to the correct motion.

2026-05-16 — Claude Recovered $400K in Bitcoin. That’s Not Even the Big Story. #

Podcast · Podcast

Five developments covered: Notion’s transformation into an agent platform; Claude usage limits destabilising subscription models; Anthropic surpassing OpenAI on business customer metrics; Mythos and GPT-5.5 advancing AI cybersecurity; emerging challenges in agent pricing, security posture, and AI stack selection.
The actual big story: not the Bitcoin recovery (a dramatic but isolated demonstration) but the hard operational choices now facing organisations — which AI stack to commit to, how to price agentic work differently from SaaS, how to secure systems handling autonomous decisions.
Practical takeaway: real workflow leverage requires moving beyond model announcements to deployment architecture, agent governance, and commercial unit redesign — these are the variables that determine whether agents create value.

2026-05-16 — Exclusive: a conversation with Tibo from Codex on what your company has to become when the model can actually do the work #

Substack · Article

Core argument: AI capability has shifted the bottleneck from whether models can do technical work to where human judgment sits within organisations. “The question of where human judgment lives inside a company stops being a developer question and starts being a leadership one.”
Two organisational failure modes: over-restriction (agents rendered useless) and under-restriction (board-level incidents). Competitive advantage comes from “the quiet work of building the five layers” — unremarkable initially, but creating operational separation from competitors who skip governance.
Practical takeaway: architect human oversight structures across multiple leadership functions, not concentrated in technical teams alone — governance is a leadership design problem, not a tooling problem.

2026-05-15 — The 2 prompts I’d run before any 2026 SaaS renewal (especially if you’re deploying agents) #

Substack · Article

Seat-based SaaS pricing is shifting: vendors are wrapping traditional per-user licenses in usage meters for agent-delegated work. “The seat is not dead. It is being wrapped in a meter for delegated work.” Salesforce agent revenue grew roughly 48% QoQ ($540M → $800M); Microsoft adds a $15/user agent governance license on top of the $30 Copilot seat.
Analyses eight vendors — Salesforce, Microsoft, SAP, ServiceNow, Workday, Zendesk, HubSpot, Atlassian — each building agent pricing layers atop existing seat models differently.
Critical timing warning: once agents embed into workflows and support metrics, vendor negotiating power increases sharply — turning off proven systems becomes operationally painful. Negotiate before deployment, not after.
Practical takeaway: run two diagnostic prompts before renewal — one mapping which systems AI agents will touch, one framing the CFO conversation about total cost of AI-augmented workflows.

2026-05-14 — 95% of AI pilots never reach production. The implementation audit that finds out why before your next budget cycle #

Substack · Article

Core argument: the strategic moat in enterprise AI is not model access but implementation architecture — the technical and operational infrastructure that transforms demos into production workflows handling real business processes. “95% of AI pilots never reach production” because companies confuse the two.
Identifies a mid-market opportunity: companies with real workflow complexity but insufficient internal engineering to operationalise AI — where major players (Anthropic, OpenAI, private equity) are now investing in deployment services.
Frames the diagnostic question as: does your AI product “own a workflow or decorate a model?” — i.e., does it have a specific role in a specific workflow with the right data, permissions, review process, and success metric, or is it an impressive internal showcase?
Includes an implementation architecture audit tool intended to score readiness across six components before a budget cycle.

2026-05-13 — Your AI agent is rediscovering 85% of its context every run. Here’s the architecture fix #

Substack · Podcast · Article · Podcast

Argues that production agents fail not because vector search is flawed, but because they lack proper context assembly before acting. Classic RAG finds semantically similar text; the problem is assembling what the agent actually needs at runtime — current records, user permissions, active policies, decision trails.
Proposes a “knowledge layer” framing broader than RAG: encompasses retrieval, document structure, semantic data models, access control, provenance, and memory — vector search becomes one component in this architecture, not the core solution.
Failure pattern without the knowledge layer: agents improvise on missing context, producing wrong refunds, stale policies, outdated metrics, and excessive token waste. The “85% rediscovery” waste is structural, not a prompting problem.
Delivers practical artefacts: retrieval contracts (defining what the agent is guaranteed to receive), failure triage frameworks, and architecture decision records for teams implementing knowledge systems.
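
A retrieval contract of the kind described above could look roughly like this. It is a minimal sketch, not the article's artefact: `RetrievalContract`, `assemble_or_refuse`, and the key names are hypothetical, chosen to mirror the four context types listed (current records, user permissions, active policies, decision trails).

```python
from dataclasses import dataclass


@dataclass
class RetrievalContract:
    """What the agent is guaranteed to receive before it acts."""
    required: tuple = (
        "current_records",
        "user_permissions",
        "active_policies",
        "decision_trail",
    )

    def missing(self, context: dict) -> list:
        # A key that is absent or empty counts as missing context.
        return [key for key in self.required if not context.get(key)]


def assemble_or_refuse(contract: RetrievalContract, context: dict) -> dict:
    gaps = contract.missing(context)
    if gaps:
        # Fail loudly instead of letting the agent improvise on gaps.
        raise LookupError(f"context incomplete: {gaps}")
    return context


context = {
    "current_records": [{"order": 1042, "status": "delivered"}],
    "user_permissions": ["refund:read"],
    "active_policies": [],  # nothing retrieved for this key
    "decision_trail": ["2026-05-01 refund denied"],
}
print(RetrievalContract().missing(context))
```

The design choice is that an incomplete context stops the run before the model is called, which is cheaper than rediscovering or guessing the missing pieces mid-task.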

2026-05-12 — While Execs Panic, This Skill Gets Rare #

YouTube (short-form) · YouTube

Revisits the capability-adoption gap as the core opportunity: regulatory, organisational, cultural, and trust inertia hold AI integration back while capability development keeps advancing.
Uses Shopify’s integration timeline collapse as a concrete data point: the window between capability and broad adoption is compressing, concentrating asymmetric returns on early movers.

2026-05-11 — Your AI Agent Doesn’t Need A Better Prompt. It Needs A Judge. #

YouTube/Podcast · Substack · YouTube · Podcast · Substack

Frames the core production agent problem: chat demos exist in “suggestion space” (rejection is free), but agents with real tool access — send emails, update records, spend money — need architectural guardrails, not better prompts.
Root cause of standard controls failing: a single model cannot simultaneously pursue a task and police itself; approval modals either cause habituation (ignored) or abandonment.
Architectural solution: a separate “judge” layer — a distinct component evaluating whether proposed actions should execute, placed at action boundaries and built in from the start, not retrofitted.
Judge toolkit: action classification, proposal generation, specialist judges for high-risk boundaries, evaluation mechanisms, and durable memory governance persisting context across sessions.
Implementation path: start with highest-risk action boundaries using structured prompts + provenance tracking, so the judge can reference prior decisions over time.
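
The separation between the component that proposes and the component that judges can be sketched as follows. This is a toy under stated assumptions: the judge here is a fixed rule where the episode describes a distinct evaluating component, and `Proposal`, `Judge`, `HIGH_RISK_ACTIONS`, and the refund limit are hypothetical names and values.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class Proposal:
    action: str  # e.g. "send_email", "issue_refund"
    amount: float = 0.0


# Hypothetical classification table; real boundaries are workflow-specific.
HIGH_RISK_ACTIONS = {"issue_refund", "send_email"}


def classify(p: Proposal) -> Risk:
    return Risk.HIGH if p.action in HIGH_RISK_ACTIONS else Risk.LOW


class Judge:
    def __init__(self, refund_limit: float):
        self.refund_limit = refund_limit
        self.decisions = []  # provenance: every verdict is recorded

    def review(self, p: Proposal) -> bool:
        risk = classify(p)
        approved = risk is Risk.LOW or (
            p.action == "issue_refund" and p.amount <= self.refund_limit
        )
        self.decisions.append((p, risk, approved))
        return approved


def execute(p: Proposal, judge: Judge) -> str:
    # The worker only proposes; nothing runs without the judge's verdict.
    return f"executed {p.action}" if judge.review(p) else f"blocked {p.action}"
```

Because `decisions` persists on the judge, later reviews can reference earlier verdicts, which is the provenance idea in the implementation path.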

2026-05-10 — Anthropic And OpenAI Just Admitted The Model Isn’t Enough #

YouTube · YouTube

Analyses a McKinsey platform security incident as an organisational design failure, not a technical one — the model behaved as intended; the procurement and integration process failed to account for agent/human boundary distinctions.
Key directive: bring developers into procurement decisions before contracts are signed; security calculus changes when agents (not just human users) are platform actors.
The framing from both Anthropic and OpenAI implicitly concedes that model capability alone cannot guarantee safe deployment; system-level architecture is the missing layer.

2026-05-09 — Frontier vs Comfortable: Where Do You Actually Sit? #

YouTube · YouTube

Both doomer and boomer AI narratives miss the speed dynamics — the real opportunity lies in the gap between capability development and societal adoption.
Asymmetric returns accrue to those building AI fluency now: regulatory, organisational, cultural, and trust inertia are holding integration back while technical development keeps advancing.

2026-05-08 — 271 Vulnerabilities: What Mozilla’s AI Found Changes Everything #

YouTube/Podcast · ~30 min · Podcast · YouTube · Substack

Mozilla’s Mythos (built on Anthropic tooling) identified 271 security vulnerabilities in Firefox — a 12× increase over previous manual scans; zero written by a human attacker.
Core argument: reliance on human code authorship as a security trust anchor is becoming obsolete. AI-generated code verified through adversarial machine review is approaching the reliability of trusted human code.
Organisations have a narrow window to improve code interpretability before the trust assumption fully flips — connects directly to the comprehension debt theme: code that passes all tests but that no human understands is also code that no human can verify as secure.
Practical implication: security teams need to shift from “who wrote this code” to “how can this code be adversarially reviewed at scale.”

2026-05-08 — While Markets Panic, This Happens #

YouTube · YouTube

Short-form exploration of the capability-adoption gap as an opportunity window: regulatory, organisational, cultural, and trust inertia all hold AI integration well behind the pace of capability development.
Market panic is a distraction from the real signal — the gap between what AI can do and what organisations have integrated is widening, not closing.

2026-05-07 — Your AI Agent Is Locked To One Model. OpenClaw Just Killed That. #

YouTube/Podcast · ~25 min · Podcast · YouTube · Substack

OpenClaw evolved from a chatbot wrapper into a runtime abstraction layer — agents can now swap the underlying model between tasks without redesigning the workflow.
Strategic insight: memory and state management become the durable competitive advantage, not the specific model selected. Workflows built on OpenClaw remain portable across provider changes.
Design implication for enterprise: build workflows that treat model selection as a runtime parameter, not an architectural commitment. The model is ephemeral; the memory and permissions layer is what compounds.
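
Treating model selection as a runtime parameter can be illustrated with a small workflow class. This is not OpenClaw's API; `Workflow`, `stub_a`, and `stub_b` are hypothetical, and a "model" is assumed to be any callable from prompt to text.

```python
from typing import Callable, Dict, List

# Assumption: real providers would sit behind adapters with this shape.
Model = Callable[[str], str]


class Workflow:
    def __init__(self, models: Dict[str, Model]):
        self.models = models
        self.memory: List[str] = []  # state lives outside any one model

    def run(self, task: str, model_name: str) -> str:
        context = " | ".join(self.memory)
        reply = self.models[model_name](f"{context} :: {task}")
        self.memory.append(f"{task} -> {reply}")
        return reply


def stub_a(prompt: str) -> str:
    return "A:" + str(len(prompt))


def stub_b(prompt: str) -> str:
    return "B:" + str(len(prompt))


wf = Workflow({"a": stub_a, "b": stub_b})
wf.run("draft", "a")
wf.run("review", "b")  # a different model, same accumulated memory
```

Swapping the model is a change to one argument, while the memory list carries across both calls. That is the sense in which the model is ephemeral and the state layer compounds.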

2026-05-07 — 16 Million Fake Accounts Stealing AI Capabilities #

YouTube · YouTube

Automated model capability extraction via systematic API usage — the “off-manifold probe” concept: probing regions of a model’s capability space not reached by normal usage to extract frontier behaviour.
Performance gaps between frontier and distilled models are predictable from provenance — production systems need model provenance tracking, not just performance benchmarks.

2026-05-06 — Your AI Fails At Real Work. The Model Isn’t Why. #

YouTube/Podcast · ~23 min · Podcast · YouTube · Substack

Three-layer framework for AI agent integration: access (what the agent can reach), meaning (what actions signify in context), authority (who defines the semantics). Most agents have access; almost none have semantic depth.
“Access without meaning requires constant supervision.” The difference between Perplexity and Salesforce as agent platforms: Salesforce exposes actual business semantics; Perplexity gives access to information without organisational context.
The durable competitive advantage is not the best model — it’s the platform exposing the richest work semantics. This reaches the same conclusion as the OpenClaw episode from the opposite direction: model is ephemeral, semantics compound.

2026-05-06 — Nuclear Weapons vs AI: Which Is Actually Harder to Stop? #

YouTube · YouTube

Model capability extraction framed as a “Napster problem”: the economic ratio of extraction cost vs development cost is thousands-to-one in the attacker’s favour.
The nuclear analogy inverted: AI model capabilities are easier to copy than weapons-grade material because the signal is all-software and copies are perfect — connects to the Anthropic/OpenAI distillation controversy.

2026-05-05 — Consumer AI Has a Problem Nobody’s Naming #

YouTube/Podcast · ~32 min · Podcast · YouTube · Substack

The “anticipation gap”: current AI agents remain reactive — users must remember, translate tasks into prompts, and supervise results. The agent does not anticipate what you need next.
Permission ladder framework: read-only → notify + propose → execute with confirmation → autonomous. Most consumer products are stuck at levels 1–2; the gap to level 4 is the real retention and habit-formation frontier.
Why consumer AI retention is weak despite high initial engagement: reactive agents are useful but not habit-forming. Anticipatory agents would be both — but require the trust infrastructure (permission ladders) that most products haven’t built.
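
The four rungs of the permission ladder map naturally onto an ordered enum. A minimal sketch, assuming the level names from the summary; the `handle` function and its return strings are illustrative only.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ_ONLY = 1
    NOTIFY_AND_PROPOSE = 2
    EXECUTE_WITH_CONFIRMATION = 3
    AUTONOMOUS = 4


def handle(action: str, level: Permission, confirmed: bool = False) -> str:
    if level == Permission.READ_ONLY:
        return "observed"
    if level == Permission.NOTIFY_AND_PROPOSE:
        return f"proposed: {action}"
    if level == Permission.EXECUTE_WITH_CONFIRMATION:
        if confirmed:
            return f"done: {action}"
        return f"awaiting confirmation: {action}"
    return f"done: {action}"  # AUTONOMOUS


print(handle("rebook flight", Permission.EXECUTE_WITH_CONFIRMATION))
```

Using `IntEnum` keeps the levels comparable, so a product could promote a user one rung at a time as trust accumulates.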

2026-05-05 — This Is Why Distilled Models Collapse #

YouTube · YouTube

Distilled models occupy “narrower capability manifolds” — they appear capable on benchmarks but fail on edge cases and agentic task compositions that define real production work.
Model provenance matters for production reliability, not just ethics: a distilled model trained on frontier model outputs may fail unpredictably on tasks the frontier model handled robustly.

2026-05-04 — AI’s ‘Thin Ice’ Moment: Is Your Job Already Gone? #

YouTube/Podcast · ~34 min · Podcast · YouTube · Substack

Job audit framework: categorise weekly work into Theater (visible but performative, easily replaceable), Commodity (routine AI-executable tasks), Leverage (human-in-the-loop tasks that amplify outcomes), Durable (relational, judgement-intensive work AI cannot replicate).
The “thin ice” argument: AI doesn’t need to replace entire roles to create vulnerability. Eliminating enough Commodity work creates instability during the next organisational disruption — roles that look secure today may not survive the next restructuring cycle.
Proactive audit: map your own weekly tasks before your organisation does. The window to reposition from Commodity to Leverage work is narrowing as AI capability expands.
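
The weekly audit amounts to tagging tasks and totalling hours per category. A small sketch, with made-up tasks and hours; only the four category names come from the episode.

```python
from collections import defaultdict

CATEGORIES = ("Theater", "Commodity", "Leverage", "Durable")


def audit(tasks):
    """tasks: iterable of (name, category, hours). Returns share of hours per category."""
    hours = defaultdict(float)
    for _name, category, spent in tasks:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        hours[category] += spent
    total = sum(hours.values())
    if not total:
        return {}
    return {c: hours[c] / total for c in CATEGORIES}


week = [
    ("status deck", "Theater", 4),
    ("expense coding", "Commodity", 16),
    ("reviewing agent drafts", "Leverage", 12),
    ("client calls", "Durable", 8),
]
shares = audit(week)
```

In this invented week, Commodity work takes the largest share of hours, which is the exposure the thin-ice argument points at.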

2026-05-04 — AI Is Cheaper to Copy Than Create #

YouTube · YouTube

“$2 million in API costs can extract capabilities that cost $2 billion to develop” — the distillation economics argument that makes open-weight model competition structurally asymmetric.
Capability collapse in distilled models and provenance implications for production systems — direct context for the DeepSeek/Anthropic distillation controversy.

2026-05-03 — Stripe, Visa, Mastercard, Microsoft, Meta. All Building The Same Thing. #

YouTube · YouTube

Agentic commerce infrastructure thesis: payment authority is relocating from seller-controlled environments to buyer agents. Power is shifting from platforms that control purchase flows to agents that act on behalf of buyers.
Brand repositioning and fraud protection are the first casualties: when an agent makes purchase decisions, seller-controlled brand presentation and traditional fraud signals both become less effective.

2026-05-03 — The $60M AI Win That Wasn’t / AI Works Too Well at the Wrong Thing #

YouTube · Short 1 · Short 2

Klarna’s AI deployment automated work equivalent to 853 employees, saving $60M — but optimised for the wrong objectives. “74% of companies report no tangible value from AI” because they measure efficiency, not value.
The distinction: context engineering (what information the AI has) vs intent engineering (what the AI is being asked to achieve for the organisation). Klarna solved for cost reduction; whether customer value followed is the unresolved question.

2026-05-02 — Anthropic Might Buy Atlassian For $40B. Here’s Why It Makes Sense. #

YouTube · YouTube

Issue trackers (Linear, Atlassian’s Jira) are becoming agent control infrastructure: they manage state, permissions, ownership tracking, and task routing that autonomous agents need to operate.
Tools built for human project management prove even more valuable to AI agents because agents need structured state management more than humans do. CRMs and service desks follow the same pattern.

2026-05-01 — The Buying Rule for Your Personal AI Computer #

YouTube/Podcast · ~33 min · Podcast · YouTube · Substack

Six-layer personal AI stack: Hardware → Runtime → Models → Memory → Applications → Workflows — the buying rule is to own what compounds in value for your specific work patterns, and rent frontier models (Claude, ChatGPT) as specialists.
The “$5,000 mistake” framing: avoid expensive hardware without clear use cases; the open-weight ecosystem (Llama, DeepSeek, Qwen) makes local inference practical, but only if your workflows actually require it.
Three concrete build profiles — knowledge worker, privacy maximalist, local developer — with routing maps to classify workflows as local, cloud, or hybrid.
“The deeper AI reaches into your work, the more valuable it becomes to own the substrate underneath” — the strategic case for local compute mirrors the enterprise sovereign AI argument at the individual scale.

2026-04-30 — Microsoft Is Testing Claude Against Its Own Copilot. Here’s Why. #

YouTube · YouTube

Microsoft is internally benchmarking Claude against Copilot — a signal that even the company that built Copilot (on GPT infrastructure) is evaluating alternatives for specific enterprise workloads.
The competitive dynamic: Microsoft’s OpenAI investment creates loyalty but not exclusivity — Anthropic’s enterprise push is landing inside the largest Microsoft accounts.
Practical implication: enterprise AI strategy is shifting from “pick a platform” to “route by task” — Claude for some workloads, Copilot for others, depending on where each model’s verifiable strengths land.

2026-04-30 — Salesforce Killed The Browser. Every Agent Runs Your CRM Now. #

Podcast · ~23 min · Podcast · Substack

Core argument: “The agent conversation stopped being about models two quarters ago. It is about infrastructure now.” Salesforce Headless 360 is named as the most important launch of the month — not for model quality but for data-fabric and workflow integration depth.
Five-question filter for evaluating agent launches: data accessibility, workflow integration, agent stacking capability, enterprise adoption potential, licence ROI — filters demos from deployments.
Routing guidance: Copilot, Perplexity, Claude, Salesforce for different task classes — the professional’s AI stack is a layered routing architecture, not a single-platform bet.
The infrastructure-over-model shift means enterprise tool selection criteria have fundamentally changed: benchmark performance is now a threshold condition, not a differentiator.

2026-04-30 — What to Do When Your Company’s AI Tool Is Bad at Your Job #

Podcast · ~25 min · Podcast · Substack

Corporate AI defaults (Copilot, etc.) frequently underperform for specific roles; complaints get dismissed as preference rather than performance data. The fix is reframing with measurable evidence: “Copilot is bad” is not actionable, but “the four-hour-a-week tax you’re paying because IT picked the wrong default” creates urgency through quantification.
One-job, one-week measurement: pick a recurring task, run it through both tools, log four data columns — the data is the argument, not the subjective frustration.
Three-altitude escalation: manager, CTO, and executive levels each require different reasoning — wrong altitude means identical requests fail regardless of evidence quality.
Practical takeaway: the barrier to AI tool change is political, not technical — evidence-based quantification plus altitude-matched messaging are the two mechanisms that actually move procurement decisions.

2026-04-27 — Apple Just Positioned Itself for the Next Trillion Dollars #

YouTube/Podcast/Substack · ~21 min · Podcast · YouTube · Substack

Apple’s elevation of hardware engineers to CEO (Ternus) and CHO (Srouji) is framed as a structural break — the company is changing which AI race it runs, not trying harder at cloud-based AI where it’s losing ground.
Core economic thesis: cloud inference economics are currently “subsidised” and unsustainable; on-device computing becomes defensible as those subsidies unwind — parallels Apple’s earlier move of computing off the mainframe in the 1970s.
Demand is already visible: law firms buying Mac Minis for compliance-driven local AI reveal appetite for on-device products that don’t yet exist at mainstream scale.
Leaders must evaluate infrastructure dependency: who controls the inference stack your organisation runs on matters increasingly as cloud subsidy models unwind.

2026-04-25 — Your Design Workflow Has Three Steps. ChatGPT Just Made It One. #

Podcast/Substack · ~26 min · Podcast · Substack

GPT-Image-2 is architecturally distinct from prior image models — it plans composition, searches the live web, and self-verifies before generating pixels, joining the reasoning stack that was previously text-only.
Scored 1,512 on Image Arena (242 points above competitors, largest recorded leap); seven previously non-viable creative workflows are now viable, including localised-at-launch campaigns, UI-spec-as-render-target, and coherent design systems from a single prompt.
Critical risk: “screenshots-as-proof just ended” — the model can cleanly forge pharmacy labels and Slack screenshots; trust/verification controls built on image authenticity need immediate review.
Role shift from execution to specification: product, design, engineering, and marketing roles all move toward spec and oversight functions — the article provides brand-system documents and red-team exercises per role.

2026-04-24 — Claude Design Just Killed the Mockup. Is Your Team Next? #

YouTube/Podcast · ~24 min · Podcast · YouTube

Claude Design is the third piece in a coordinated Anthropic stack (Claude Code + Cowork + Design) — not a standalone Figma replacement but the completion of an end-to-end build motion.
Core shift: the prototype is no longer an approximation of the product — it is the product. The mockup-to-production handoff that teams have used for twenty years is going extinct.
Role-by-role breakdown: PMs, designers, engineers, and founders each face different changes as the cost the mockup represented simply disappears.
Google Stitch is already responding with design.markdown — early signal of how the ecosystem is adapting to design-as-prompt workflows.
Leaders framing this as “Figma killer” are misreading it — the threat isn’t to a tool but to an entire workflow category.

2026-04-23 — Your Apps Don’t Need an API Anymore. Codex Just Proved It. #

YouTube/Podcast · ~21 min · Podcast · YouTube

OpenAI’s Codex desktop agent can operate Mac applications autonomously, bypassing the API layer entirely — a qualitative shift in how agents interact with software.
Contrasts Codex’s approach with Claude’s computer use: different architectural philosophies with distinct implications for enterprise integration and control.
Core argument: the ability to interact with software as a human does (UI-level) rather than via API represents a new competitive dimension, not just a convenience feature.
Practical takeaway: teams designing agent workflows around API-first assumptions may need to rethink integration strategy as UI-native agent operation matures.

2026-04-23 — Dark Factories vs Everyone Else: The Real AI Divide #

YouTube (short-form) · YouTube

Surfaces a productivity paradox: most developers using AI tools are measurably slower despite faster tooling, while elite teams achieve fully autonomous code generation.
The divide is not tool access but process maturity — elite teams have restructured workflows around AI output, while mainstream teams add AI to existing habits.
Frames “dark factory” as a benchmark: autonomous, minimal-human-touch production pipelines that most orgs are not close to achieving.
Warning: treating AI as an individual productivity add-on rather than a workflow redesign will leave teams on the wrong side of the divide as the gap widens.

2026-04-23 — Karpathy’s Wiki vs. Open Brain. One Fails When You Need It Most. #

YouTube/Podcast · ~41 min · Podcast · YouTube

Contrasts two memory architecture philosophies: write-time compilation (pre-process knowledge into structured formats at ingestion) vs. query-time synthesis (derive answers dynamically at retrieval).
Write-time compilation delivers precision and token efficiency but is brittle when query intent deviates from pre-compiled assumptions; query-time synthesis is flexible but expensive and inconsistent.
Argues the choice is not aesthetic — it determines system behaviour under pressure, specifically when users need the system most (novel queries, edge cases).
Practical takeaway: pick the architecture that matches your failure tolerance, not your optimistic use-case; most enterprise systems are implicitly query-time and don’t know it.
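
The brittleness trade-off can be seen in a toy pair of stores. This is a deliberately simplified sketch: exact key lookup stands in for write-time compilation and word overlap stands in for query-time synthesis, and both class names are hypothetical.

```python
class WriteTimeStore:
    """Compiles each note into a keyed fact at ingestion: precise, cheap to read."""

    def __init__(self):
        self.facts = {}

    def ingest(self, note: str):
        key, _, value = note.partition(":")
        self.facts[key.strip().lower()] = value.strip()

    def query(self, question: str):
        # Only answers questions that match a pre-compiled key exactly.
        return self.facts.get(question.strip().lower())


class QueryTimeStore:
    """Keeps raw notes and scans all of them at retrieval: flexible, costlier."""

    def __init__(self):
        self.notes = []

    def ingest(self, note: str):
        self.notes.append(note)

    def query(self, question: str):
        words = set(question.lower().split())
        hits = [
            n for n in self.notes
            if words & set(n.lower().replace(":", " ").split())
        ]
        return hits or None


compiled, raw = WriteTimeStore(), QueryTimeStore()
for store in (compiled, raw):
    store.ingest("refund window: 30 days")
```

Asking `compiled.query("refund window")` returns the fact directly, while a rephrased question such as "how long for a refund" returns nothing from the compiled store and still finds the note in the raw one, at the cost of scanning every note.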

2026-04-22 — Why Manual Testing Is Dead (This Architecture Proves It) #

YouTube · YouTube

Examines automated testing architectures and digital simulation environments that make traditional QA processes obsolete at AI development velocities.
Core claim: specification quality is now the binding constraint on software quality — testing catches what bad specs cause, not what bad code causes.
As code generation speed increases, the bottleneck moves permanently upstream to requirements and intent capture.
Practical takeaway: invest in specification tooling and review processes now; manual testing investment is largely wasted at current agent output speeds.

2026-04-21 — Your Prompts Didn’t Change. Opus 4.7 Did. #

YouTube/Podcast · ~52 min · Podcast · YouTube

Claude Opus 4.7 introduces improvements to persistence alongside a notable increase in literalism — prompts that previously worked through implication now require explicit instruction.
Benchmarks across enterprise knowledge work categories show meaningful gains; web research tasks show regressions.
Tokenizer changes affect cost-efficiency calculations — teams should revalidate their token budgets under Opus 4.7 rather than assuming continuity.
Practical takeaway: treat model upgrades as breaking changes for production prompts; regression-test before deploying, especially for tasks relying on model inference of intent.
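
Treating an upgrade as a breaking change implies a regression suite for prompts. A minimal sketch with stub models: `old_model` and `new_model` are hypothetical stand-ins whose behaviour is invented to mimic the literalism shift, not measurements of any real release.

```python
def regression_check(model, cases):
    """cases: list of (prompt, predicate). Returns the prompts whose output fails."""
    return [prompt for prompt, ok in cases if not ok(model(prompt))]


def old_model(prompt: str) -> str:
    # Infers that a summary should be short even when not told.
    return "short summary"


def new_model(prompt: str) -> str:
    # More literal: only keeps it short when the prompt says so.
    if "brief" in prompt:
        return "short summary"
    return "a very long and detailed summary " * 5


CASES = [
    ("Summarise this ticket.", lambda out: len(out) < 40),
    ("Summarise this ticket in a brief sentence.", lambda out: len(out) < 40),
]

failures = regression_check(new_model, CASES)
```

With these stubs, the prompt that relied on implied brevity fails under the new model while the explicit one passes, which is the class of regression worth catching before deployment.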

2026-04-21 — AI Tools Got Faster But Developers Didn’t #

YouTube · YouTube

References studies showing experienced programmers took 19% longer on tasks when using AI tools, while believing themselves to be 24% faster — a confidence/performance inversion.
The gap is attributed to workflow friction, context-switching overhead, and over-reliance on AI output without adequate review.
Speed gains from AI tools are real at the individual task level but often negative at the workflow level due to integration costs and rework.
Practical takeaway: measure actual throughput including rework and review time, not perceived speed; the productivity dividend requires workflow redesign, not just tool adoption.
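
Measuring throughput with rework and review included is simple arithmetic. The minute values below are invented, chosen only so the result reproduces the 19% figure cited above; the function names are hypothetical.

```python
def effective_minutes(generation: float, review: float, rework: float) -> float:
    """Total time to a shippable result, not just time to a first draft."""
    return generation + review + rework


def slowdown(baseline: float, with_ai: float) -> float:
    """Positive means the AI-assisted path took longer than the baseline."""
    return (with_ai - baseline) / baseline


baseline = 100.0  # minutes without AI (illustrative)
with_ai = effective_minutes(generation=40.0, review=35.0, rework=44.0)
print(round(slowdown(baseline, with_ai), 2))
```

In this example the draft arrives in 40 minutes instead of 100, which feels much faster, yet the full path takes 119 minutes once review and rework are counted.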

2026-04-20 — Nobody Knows What You’re Worth Anymore | The AI Job Market Reality #

YouTube/Podcast · ~21 min · Podcast · YouTube

Following 60,000 Q1 tech layoffs, the labour market can no longer price roles where AI makes production cost approach zero — output volume is no longer a differentiator.
Comprehension depth — the ability to understand, explain, and take accountability for AI-generated work — becomes the primary differentiator for human workers.
Working transparently (showing reasoning, creating comprehension artifacts) signals irreplaceable value in an environment where portfolios of AI-generated output are indistinguishable.
Practical takeaway: shift from accumulating deliverables to producing understanding artifacts; the market will pay for comprehension that AI cannot substitute.

2026-04-20 — Why Nothing Going Wrong Is Actually the Scariest Part #

YouTube · YouTube

Addresses the failure mode where autonomous agents execute instructions correctly but cause harm — the system worked as designed, but the design was wrong.
Structural alignment failures can be invisible in testing and emerge only in production at scale, especially when agents operate across trust boundaries.
Safety instructions embedded in prompts are insufficient; alignment must be built into architecture (constraints, oversight hooks, escalation paths).
Practical takeaway: the absence of visible errors is not a safety signal for autonomous agents — design for failure detectability, not just failure prevention.

2026-04-19 — Block Laid Off Half Its Company for AI. AI Can’t Do the Job. #

YouTube/Podcast · ~20 min · Podcast · YouTube

Examines world model implementations — AI systems designed to replace management judgment — across three distinct architectural approaches with documented failure modes.
Core finding: world models fail at the judgment layer; they can route information and even synthesise sense-making, but cannot hold accountability or adapt to contextually novel situations.
Block’s restructuring created a capability vacuum that AI systems were architecturally unable to fill, not just inadequately trained for.
Practical takeaway: before removing human roles, decompose what those roles actually do — world model capability maps poorly onto management functions as traditionally defined.

2026-04-19 — The Web Is About to Look Completely Different #

YouTube · YouTube

Infrastructure providers are building agent-native web interaction primitives: cryptocurrency wallets for agents, fraud detection tuned to AI traffic patterns, authentication flows that bypass human-facing UI.
The shift parallels mobile web but is more fundamental — it changes what a “web request” is, not just what device makes it.
Current fraud detection and rate-limiting infrastructure treats AI traffic as anomalous; this will be resolved at the infrastructure layer within the near term.
Practical takeaway: web products built around human interaction patterns (CAPTCHAs, session flows, UI affordances) need an agent-native access layer or risk losing AI-driven traffic.

2026-04-18 — OpenAI Just Gave Agents the Ability to Do Everything — The Consequences Are Massive #

YouTube · YouTube

OpenAI’s simultaneous infrastructure launches enable autonomous agents to install software, write files, and execute financial transactions — collapsing the gap between AI capability and real-world action.
New trust boundaries emerge between human and AI capabilities: what was previously a human-gated action is now agent-accessible, requiring architectural trust enforcement rather than UI-level friction.
The velocity of capability expansion outpaces most organisations’ governance frameworks — trust architecture is now a product requirement, not a compliance exercise.
Practical takeaway: revisit agent permission models immediately; last month’s capability assumptions are already outdated, and the blast radius of agent errors has materially increased.

2026-04-18 — Karpathy’s Agent Ran 700 Experiments While He Slept. It’s Coming For You. #

YouTube/Podcast · ~27 min · Podcast · YouTube

Autonomous research agents running iterative experiments overnight represent a step-change in research throughput — 700 experiments in one sleep cycle is not a demo, it is a new baseline.
Memory architecture determines whether such systems compound knowledge or accumulate noise: write-time compilation vs. query-time synthesis creates fundamentally different knowledge curves over time.
Teams without agent-scale evaluation infrastructure will be unable to process the output these systems generate, creating a new bottleneck at the review and interpretation layer.
Practical takeaway: agent infrastructure investment must pair with evaluation infrastructure investment — generation capacity without review capacity produces noise at scale.

2026-04-18 — Every Tech Giant Is Building the Same Thing Right Now #

YouTube · YouTube

Google, Microsoft, Amazon, and OpenAI are converging on agent-native infrastructure: identity systems for agents, permission frameworks, and inter-agent communication protocols.
The convergence suggests an emerging platform layer analogous to the mobile OS wars — whoever controls agent identity and permission infrastructure controls the ecosystem.
Unlike mobile web, the interaction paradigm shift is bidirectional: agents initiate actions, not just respond to user requests, fundamentally changing what infrastructure must support.
Practical takeaway: vendor platform choices made now will carry agent-identity lock-in; evaluate infrastructure vendors on their agent-native roadmap, not their current human-facing product.

2026-04-17 — Anthropic And OpenAI Are Fighting Over Your Memory. You’re Going To Lose. #

YouTube/Podcast · ~30 min · Podcast · YouTube

Accumulated professional context in AI platforms constitutes a new category of capital — the fifth category, after financial, social, human, and reputational capital.
Vendor lock-in mechanisms operate through context layers: the more a system knows about how you work, the more painful switching becomes, independently of model quality.
There is no portable working identity standard; users building deep context on closed platforms are creating capital they do not own and cannot extract.
Practical takeaway: maintain personal context databases outside vendor platforms; extract working context regularly via structured prompts to preserve portability as the lock-in deepens.

2026-04-17 — Tech Talent Is About to Get Ugly Thanks to This Memo #

YouTube · YouTube

Selection pressure for AI fluency is reshaping hiring, creating a U-shaped talent market: experienced practitioners (who understand what AI gets wrong) and AI-native developers (who never worked without it) are both valued; mid-career workers in between are most exposed.
Hiring memos explicitly prioritising AI fluency over domain seniority are now circulating at major tech companies — this is policy, not aspiration.
The compress-or-replace dynamic means headcount reductions will continue to accelerate in roles where AI can substitute task execution without requiring the judgment layer.
Practical takeaway: mid-career workers should explicitly build the comprehension and judgment artifacts that demonstrate the value AI cannot substitute, not just the AI-augmented output.

2026-04-16 — Your AI Is 50x Faster. You’re Getting 2x. You’re Fixing the Wrong Thing. #

YouTube/Podcast · ~20 min · Podcast · YouTube

The gap between model speed (50x faster) and productivity gain (2x) is not a model problem — it is an interface and organisational overhead problem.
Human interface overhead — approval steps, context handoffs, review cycles — consumes the speed dividend that faster models deliver.
Details four durable human roles in agentic systems: goal specification, edge-case adjudication, accountability holding, and taste/aesthetic judgment.
Practical takeaway: redesign the human-in-the-loop touchpoints before optimising for model speed; the bottleneck is the interface layer, not the inference layer.
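
One way to read the 50x versus 2x gap is through Amdahl's law. That framing is an assumption of this sketch, not something the episode is confirmed to use. The function names and the example shares are hypothetical; only the 50x and 2x figures come from the entry.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the work gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Time left = untouched human overhead + accelerated model work.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / remaining


def fraction_for_speedup(target: float, factor: float) -> float:
    """Invert Amdahl's law: share of work that must be accelerated to reach target."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / factor)


if __name__ == "__main__":
    model_factor = 50.0  # figure quoted in the entry
    for share in (0.25, 0.50, 0.75, 0.95):  # hypothetical shares of model-bound time
        gain = overall_speedup(share, model_factor)
        print(f"model share {share:.0%}: overall {gain:.2f}x")
    needed = fraction_for_speedup(2.0, model_factor)
    print(f"share of work that is model-bound if the net gain is 2x: {needed:.1%}")
```

Under these assumptions, a 2x net gain from a 50x model implies that only about 51% of the original cycle time was model-bound. The rest is the approval, handoff, and review overhead the entry describes.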

2026-04-15 — The Real Problem With AI Agents Nobody’s Talking About #

YouTube/Podcast · ~38 min · Podcast

The binding constraint on agent deployment is not installation, capability, or cost — it is requirements definition: clearly specifying what the agent should do, in which contexts, with what constraints.
“Installing an agent is trivial; defining what it should do is the hard part” — this inverts the conventional wisdom that implementation is the bottleneck.
Proposes interviewer agents as an intermediate architecture: agents whose job is to elicit and formalise requirements before a task agent is deployed, surfacing the specification problem explicitly.
Practical takeaway: treat requirements definition as a first-class engineering problem for agent systems; a SOUL.md or equivalent specification document is not optional overhead, it is the product.

2026-04-14 — 3 Model Drops. $15M/Day in Burn. One Product Dead. Nobody Connected Them. #

Podcast · ~21 min · Podcast

Sora’s shutdown exposed the unsustainable unit economics underlying many AI capability showcases: $15M/day burn against $2.1M lifetime revenue is a cautionary structural failure, not a market timing problem.
March 2026’s headline model releases (ChatGPT 5.4, Gemini 3.1 Ultra) masked five quieter developments signalling a shift from capability competition to economic sustainability competition.
AI ad placements converting at 1.5x efficiency directly threaten Google’s search revenue model — the economic disruption is now hitting the incumbents’ core business, not just startups.
Practical takeaway: the relevant metric is now “inference cost per delivered unit of revenue,” not benchmark performance; leaders tracking capability announcements without tracking unit economics are navigating blind.

2026-04-13 — I Looked At Amazon After They Fired 16,000 Engineers. Their AI Broke Everything. #

Podcast · ~19 min · Podcast

AI-generated code at scale creates “dark code” — software that ships but that nobody on the team fully understands — representing an organisational capability crisis, not a code quality problem.
Amazon’s post-layoff codebase illustrates what happens when comprehension is not a deployment gate: systems run but the organisation loses the ability to modify, debug, or extend them safely.
Introduces a three-layer framework: spec-driven development (comprehension before generation), self-describing architectures (code that documents its own intent), and comprehension gates (mandatory checkpoints before deployment).
Practical takeaway: code generation velocity amplifies the cost of unclear specifications upstream; invest in spec tooling and comprehension gates now, before dark code accumulates to the point of organisational fragility.

2026-04-12 — I Watched 3 Companies Lay Off Their Managers. All 3 Hit the Same Wall. #

Podcast · ~33 min · Podcast

Decomposes management into three distinct functions: information routing (which AI handles readily), sense-making (which resists automation and requires contextual judgment), and accountability and feedback (which LLMs cannot replace at current capability levels).
Kimi, Block, and Meta represent three different experiments in flattening management — all three encountered the same wall: removing the sense-making and accountability layers causes coordination failures that AI cannot patch.
The failure mode is cutting “load-bearing structure” — teams confuse information routing (automatable) with sense-making (not automatable) and remove both simultaneously.
Practical takeaway: use a decomposition playbook before restructuring; map which management functions you are automating vs. eliminating vs. preserving — the distinction determines whether the restructure succeeds or collapses.

2026-04-11 — Google’s New Quantization Is a Game Changer #

Podcast · ~22 min · Podcast

Google’s TurboQuant achieves 6x KV cache compression with zero data loss — a software-only breakthrough that changes LLM deployment economics without requiring hardware upgrades.
Memory (specifically KV cache storage) is a structural bottleneck in LLM deployment at scale; TurboQuant is the first production-grade lossless solution, not an incremental improvement.
The asymmetric advantage: operators who take control of context layer optimisation before this moves to mainstream production will hold a structural cost advantage over competitors waiting for vendors to solve it.
Practical takeaway: treat memory management as core infrastructure strategy, not a vendor problem to be solved later; the window to build competitive advantage here is open now and will close as this becomes commoditised.
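
To make the memory claim concrete, the standard KV cache size estimate can be sketched as below. The model configuration is hypothetical and is not taken from the episode or from any TurboQuant documentation. The 6x figure is simply applied as a divisor, so this says nothing about how the compression itself works.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size for a decoder-only transformer."""
    # Factor of 2: one key tensor and one value tensor are cached per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical configuration, chosen only for illustration.
    baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                              seq_len=131_072, batch=1, bytes_per_value=2)
    compression_ratio = 6  # ratio quoted in the entry
    print(f"baseline KV cache: {baseline / GIB:.2f} GiB")
    print(f"at 6x compression: {baseline / compression_ratio / GIB:.2f} GiB")
```

For this made-up configuration the cache drops from 16.00 GiB to roughly 2.67 GiB per 128k-token sequence, which is the kind of difference that changes how many concurrent sessions fit on one accelerator.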

2026-04-10 — There Are Only 5 Safe Places to Build in AI Right Now. Are You in One? #

Podcast · ~26 min · Podcast

Most AI application builders are “functionally thin wrappers” — marginally better UI over a commodity API — and face rapid commoditisation as supply becomes infinite.
Five durable structural positions: trust as routing layer (responsible agentic systems), context ownership (platforms like Notion or Salesforce as data chokepoints), distribution scarcity, taste/aesthetic judgment requiring human accountability, and liability ownership AI cannot assume.
Lovable shipping 100,000 projects per day at a $6.6B valuation exemplifies the growth-without-moat trap: high velocity but no structural ownership.
Practical takeaway: evaluate your market position against the five categories; if you cannot claim at least one, you are in the commoditisation path regardless of current traction.

2026-04-09 — Nasdaq Quietly Changed Its Rules. Now Your 401(k) Pays for SpaceX’s IPO. #

Podcast · ~23 min · Podcast

Nasdaq indexing rule changes now route retirement account flows into AI company IPOs regardless of float constraints or lock-up mechanics — passive investors are involuntarily exposed to AI burn rates.
Float constraints mean most index-included AI companies have illiquid share structures; the index inclusion creates price signals disconnected from fundamental valuation.
Burn rate implications for retail investors are material: the gap between paper valuation and cash sustainability is being obscured by index-driven inflows.
Practical takeaway: if you hold broad index funds, you are now implicitly invested in AI infrastructure burn rates — understand the exposure, even if you cannot easily opt out.

2026-04-09 — I Analyzed 512,000 Lines of Leaked Code. It Shows What’s Coming for Your AI Tools. #

Podcast · ~25 min · Podcast

Anthropic’s Conway agent system (leaked via code) reveals a five-layer platform strategy: domain encoding, workflow calibration, behavioural relationship, artifact history, and a proprietary extension format creating ecosystem lock-in.
The proprietary extension system makes tools built for Conway incompatible with competing agent platforms — a deliberate “Active Directory” move creating foundational enterprise dependency.
Behavioural lock-in operates through accumulated context: four compounding layers create more switching friction than data portability laws can address.
Practical takeaway: platform selection for agentic systems carries lock-in gravity exceeding previous software migrations; evaluate platforms on their lock-in architecture, not their current capability benchmarks.

2026-04-07 — A Polymarket Bot Made $438,000 In 30 Days. Your Industry Is Next. Here’s What to Do About It. #

Podcast · ~29 min · Podcast

AI is closing arbitrage windows that historically took decades to close — speed gaps, reasoning gaps, and discipline gaps are collapsing in weeks, not years.
The Polymarket example illustrates intelligence arbitrage replacing labour arbitrage as the dominant economic dynamic: the edge is no longer access to information or processing capacity, it is structural position.
Value is migrating upstream to judgment and taste — the structural gaps AI cannot close on a quarterly update cycle.
Practical takeaway: informational or cognitive arbitrage is now a liability, not an asset — it invites automated competition. Durable competitive positions require structural ownership AI cannot replicate through iteration.

2026-04-06 — You’re Building AI Agents on Layers That Won’t Exist in 18 Months #

YouTube/Podcast · ~12 min · Podcast

Walks through the six-layer agent infrastructure stack currently under development.
Argues the shift to agent-first primitives is comparable in scale to the cloud migration.
Key insight: different layers are maturing at wildly different speeds, and the orchestration layer enterprise deployments need is largely missing.
Warns that teams prioritising shipping speed over stack literacy will hit reliability failures as transitional lock-in and agent sprawl compound through 2026.
Practical takeaway: invest in foundational stack understanding now rather than patching later.

2026-04-06 — Your Agent Produces at 100x. Your Org Reviews at 3x. That’s the Problem #

YouTube/Podcast · ~10 min · Substack

Examines the mismatch in real-world agent deployments where AI output generation vastly outpaces organisational review capacity.
Breaks down four failure modes from OpenClaw deployments: clarity of intent determining output quality, hidden data integrity disasters, the skill-call vs hardwired-workflow distinction, and org redesign failures when AI scales output without scaling human oversight.
Core argument: treating agents as shortcuts rather than systems leads to predictable month-two failures.
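
The production and review mismatch can be illustrated with a simple backlog model. This is a sketch under stated assumptions: the baseline of 10 items per day and the 30-day horizon are hypothetical, and only the 100x and 3x multipliers come from the entry title.

```python
def backlog_history(days: int, baseline_per_day: float,
                    production_multiplier: float,
                    review_multiplier: float) -> list[float]:
    """Track unreviewed items per day when output outpaces review capacity."""
    produced = baseline_per_day * production_multiplier
    reviewed = baseline_per_day * review_multiplier
    backlog = 0.0
    history = []
    for _ in range(days):
        # Backlog never goes negative: reviewers cannot review work that does not exist.
        backlog = max(0.0, backlog + produced - reviewed)
        history.append(backlog)
    return history


if __name__ == "__main__":
    history = backlog_history(days=30, baseline_per_day=10,  # hypothetical baseline
                              production_multiplier=100, review_multiplier=3)
    print(f"unreviewed after day 1: {history[0]:.0f}")
    print(f"unreviewed after day 30: {history[-1]:.0f}")
    print(f"share of output reviewed: {3 / 100:.0%}")
```

With these numbers the queue grows by 970 items a day and reaches 29,100 by day 30, with only 3% of output ever reviewed. That is consistent with the entry's point that failures show up around month two rather than on day one.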

2026-04-04 — Wall Street Just Bet $285 Billion on AI Agents. The Best One Barely Works #

YouTube/Podcast · ~15 min · Podcast

Despite massive Wall Street investment, most AI agents cannot answer three fundamental questions about their own capabilities.
Analyses specific tools — Lindy, Google Opal, Sauna, Obvious — separating those delivering real outcomes from those running on “demo energy.”
Introduces a three-layer architecture framework for builders who want control, with verifiability as the non-negotiable foundation.
Advice: apply rigorous evaluation before committing resources to any agent platform.

2026-04-03 — I Broke Down Anthropic’s $2.5 Billion Leak. Your Agent Is Missing 12 Critical Pieces #

YouTube/Podcast · ~14 min · Podcast

Deep analysis of leaked Claude Code architecture, revealing that successful agents are “80% plumbing and 20% model.”
Details twelve essential primitives including tool registries with metadata-first design, eighteen-module security architectures protecting individual tools, session persistence, and workflow state management.
Key warning: builders chasing glamorous AI components while neglecting foundational infrastructure will keep shipping demos that crash in production.
Argues against premature complexity.

2026-04-02 — Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit #

YouTube/Podcast · ~11 min · Podcast

Token efficiency deep-dive ahead of new pricing models.
Reveals how users typically waste 8-10x the necessary tokens through poor habits: raw PDFs inflating token counts, conversation sprawl compounding waste, plugin overhead costs, and ignoring model mixing strategies.
Provides concrete approaches to reduce session costs significantly.
Warning: wasteful token practices will become much more expensive as advanced models arrive at higher price points.

2026-05-05 — The Anticipation Gap: Why 4 Problems Have to Be Solved Together for Consumer AI to Work #

Substack · Read

Consumer AI agents remain reactive, not anticipatory — the agent waits for you rather than acting ahead of you. Nate identifies four structural problems that must be solved simultaneously (not sequentially) to flip this.
The framing is useful: “anticipation gap” as a named concept for why consumer AI doesn’t feel like having an assistant yet even though the raw capability is there.
Connects to enterprise agentic infrastructure — the same problems (state persistence, trigger architecture, intent modelling, trust) appear in enterprise contexts at higher stakes.

2026-05-04 — 55-75% of your week is on thin ice. Here is the audit that shows you which part. #

Substack · Read

A framework for categorising knowledge work into four buckets: theater (performative work with no real output), commodity (easily automatable), at-risk (automatable but not yet automated), and durable (judgment, relationships, context that AI can’t replicate).
The 55-75% estimate is deliberately provocative — the point is that most knowledge workers have not honestly audited which category their actual daily tasks fall into.
Complements the ai-societal-impact layoff data: the audit framework turns macro statistics into an individual professional diagnostic.
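
A minimal sketch of how such an audit could be tallied. The four bucket names come from the entry; the sample tasks, the hours, and the choice to count everything except "durable" as exposed are assumptions made for illustration.

```python
from collections import defaultdict

BUCKETS = ("theater", "commodity", "at-risk", "durable")


def audit_week(tasks):
    """Return each bucket's share of total hours.

    tasks: iterable of (name, hours, bucket) tuples.
    """
    totals = defaultdict(float)
    for name, hours, bucket in tasks:
        if bucket not in BUCKETS:
            raise ValueError(f"unknown bucket for {name!r}: {bucket!r}")
        totals[bucket] += hours
    total_hours = sum(totals.values())
    if total_hours == 0:
        raise ValueError("no hours recorded")
    return {bucket: totals[bucket] / total_hours for bucket in BUCKETS}


if __name__ == "__main__":
    # Hypothetical 40-hour week, invented for this example.
    week = [
        ("status meetings", 6, "theater"),
        ("report formatting", 8, "commodity"),
        ("first-draft analysis", 10, "at-risk"),
        ("stakeholder negotiation", 16, "durable"),
    ]
    shares = audit_week(week)
    exposed = 1.0 - shares["durable"]  # assumption: non-durable work counts as exposed
    for bucket in BUCKETS:
        print(f"{bucket:>10}: {shares[bucket]:.0%}")
    print(f"   exposed: {exposed:.0%}")
```

The invented week above lands at 60% exposed, inside the 55-75% range the post cites, but the useful output is the per-bucket breakdown rather than the headline number.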

2026-05-02 — AI agents are about to route around every tool that can’t pass 5 structural tests #

Substack · Read

Tools become agent infrastructure when they have: clean data structures, predictable schemas, programmatic access, reliable state management, and composable outputs. Nate uses Linear and Symphony as case studies.
The practical implication: software tools not built for agent interaction will be bypassed, not upgraded. This is a product strategy warning for any SaaS tool relying on human-only workflows.
Cross-column: the “5 structural tests” are implicitly the criteria for a good MCP connector target, which makes them directly relevant to the claude-integrations topic.
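
The five tests can be written down as a checklist. This is a sketch only: the field names paraphrase the criteria listed above, and the example tool and its scores are hypothetical rather than drawn from the post's Linear or Symphony case studies.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentReadiness:
    """One boolean per structural test named in the entry."""
    clean_data_structures: bool
    predictable_schemas: bool
    programmatic_access: bool
    reliable_state_management: bool
    composable_outputs: bool


def failed_tests(tool: AgentReadiness) -> list[str]:
    """List the tests a tool fails, in the order the fields are declared."""
    return [f.name for f in fields(tool) if not getattr(tool, f.name)]


if __name__ == "__main__":
    # Hypothetical assessment of an unnamed human-only SaaS tool.
    legacy_tool = AgentReadiness(
        clean_data_structures=True,
        predictable_schemas=True,
        programmatic_access=False,
        reliable_state_management=True,
        composable_outputs=False,
    )
    gaps = failed_tests(legacy_tool)
    print("agent-ready" if not gaps else f"likely to be routed around: {gaps}")
```

Treating any single failure as disqualifying is a design choice in this sketch; the post's wording ("can't pass 5 structural tests") suggests that reading but does not spell out a scoring rule.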

2026-04-28 — ChatGPT 5.5 scored 87 where the next best model scored 67 #

Substack · Read

GPT-5.5 performance review with routing guidance: excels at multi-step knowledge work synthesis; Claude remains superior for long-context reasoning and instruction-following precision.
The “score 87 vs 67” framing drives the headline but the useful content is the task-routing heuristics — when to use which model for which class of work.
Practical routing logic is rare in coverage that tends toward binary “which model wins?” framing.

2026-04-24 — Claude Design just cut 60% of your designer’s week #

Substack · Read

Nate evaluates Claude Design alongside Claude Code and Claude Cowork as an integrated pipeline that eliminates the mockup-to-production handoff — the most expensive seam in product development.
The organisational implication: design review cycles, handoff meetings, and spec translation work are the immediate casualties; the durable roles are taste, direction-setting, and final judgment.
First serious practitioner analysis of Claude Design as part of a complete Anthropic product suite rather than an isolated tool.