Talk Summary and Notes - Software in the Era of AI

YouTube: https://www.youtube.com/watch?v=LCEmiRjPEtQ


Andrej Karpathy argues that software is undergoing its most fundamental shift in decades. We are moving from manually written code to new paradigms in which software is learned from data or programmed in natural language.

Talk Outline

  • Part 1: The Three Paradigms of Software (1.0, 2.0, and 3.0)
  • Part 2: What are LLMs? Exploring the Right Analogies (Utilities, Fabs, and Operating Systems)
  • Part 3: The Psychology of LLMs: Working with ‘People Spirits’
  • Part 4: Building Partial Autonomy Apps
  • Part 5: The Next Frontier: Building for Agents
  • Part 6: The Vibe Coding Revolution & Conclusion

Part 1: The Three Paradigms of Software

Andrej Karpathy begins with a powerful thesis: after roughly 70 years of stability at the fundamental level, software has now changed twice in quick succession, leaving us with three distinct programming paradigms.

Software 1.0: Classical Programming

This is the traditional way of writing software that has been dominant for decades.

  • Concept: A human explicitly writes instructions in a formal programming language (like Python, C++, JavaScript).
  • The Program: The source code itself.
  • The Computer: The hardware that executes these explicit instructions.
  • Example: The vast collection of repositories on GitHub represents the world of Software 1.0.

Software 2.0: Neural Networks

This paradigm emerged with the rise of deep learning.

  • Concept: Instead of a human writing the logic, the logic is learned from data. The human’s job shifts from writing code to curating datasets and running optimization algorithms.
  • The Program: The weights of the neural network. These millions (or billions) of numbers encode the program’s behavior.
  • The Computer: A fixed-function neural network architecture (e.g., AlexNet for image recognition).
  • Example: Hugging Face is essentially the “GitHub for Software 2.0,” a hub for sharing and using these neural network weights.

Software 3.0: Large Language Models (LLMs)

This is the new, emergent paradigm that we are currently entering.

  • Concept: LLMs introduce a new kind of computer that is programmable in context.
  • The Program: The prompt itself, written in a natural language like English.
  • The Computer: The Large Language Model (LLM).
  • The Key Shift: Unlike a fixed-function neural net (Software 2.0), an LLM is a general-purpose, programmable computer. The most remarkable feature is that its programming language is English (or any human language).

A Concrete Example: Sentiment Classification

To make these paradigms clear, Karpathy uses the task of classifying text sentiment (positive or negative).

Software 1.0 Approach

Write a Python function that checks for words from a hardcoded list of positive and negative keywords. This is brittle and hard to maintain.

def simple_sentiment(review: str) -> str:
    positive_words = {"good", "excellent", "amazing", "loved", "great"}
    negative_words = {"bad", "terrible", "awful", "boring", "incoherent"}
    words = set(review.lower().split())
    score = len(words & positive_words) - len(words & negative_words)
    return "positive" if score >= 0 else "negative"

Software 2.0 Approach

Gather thousands of examples of positive and negative reviews. Train a binary classifier (like a simple neural network) on this dataset. The resulting model weights are the program. This is more robust but requires a lot of labeled data.
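
As a minimal sketch of what this looks like in practice (the four inline examples stand in for a real labeled dataset, and the scikit-learn pipeline is just one possible choice):

# Software 2.0 in miniature: the "program" is the fitted weights, not this code.
# The tiny inline dataset is an illustrative stand-in for thousands of labeled reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "I absolutely loved this film",
    "An amazing, heartfelt story",
    "The plot was incoherent",
    "Terrible pacing and awful acting",
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)  # optimization over data replaces hand-written logic

print(model.predict(["I loved it, what an amazing film"])[0])  # expected: "positive"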

Software 3.0 Approach

Write a simple prompt in English that gives the LLM instructions and a few examples (few-shot prompting). The LLM is programmed on the fly to become a sentiment classifier.

You are a sentiment classifier. For every review... respond with either POSITIVE or NEGATIVE.

Example 1
<REVIEW>I absolutely loved this film...</REVIEW>
POSITIVE

Example 2
<REVIEW>The plot was incoherent...</REVIEW>
NEGATIVE

Now classify the next review.

This illustrates the radical shift in how we create functionality—from explicit code, to training with data, to simply instructing in plain language.

Part 2: What are LLMs? Exploring the Right Analogies

To understand how to build with LLMs, it’s crucial to have the correct mental model. Karpathy evaluates three common analogies: utilities, fabs, and operating systems.

Analogy 1: LLMs as a Utility (like Electricity)

This analogy, famously proposed by Andrew Ng when he called AI “the new electricity,” captures some aspects well but is ultimately incomplete.

Where the Analogy Works

  • Massive Upfront Cost (CapEx): LLM labs (OpenAI, Google, Anthropic) spend billions on massive GPU clusters, which is similar to building an electrical grid or power plants.
  • Ongoing Operational Cost (OpEx): There are continuous costs to serve intelligence through APIs to millions of users.
  • Metered Access: Users pay for what they consume, typically priced per million tokens, similar to a utility bill based on kilowatt-hours (a short cost sketch follows this list).
  • Reliability Demands: Users demand low latency, high uptime, and consistent quality, much like wanting consistent voltage from the grid.
  • “Intelligence Brownouts”: When a major LLM provider like OpenAI goes down, the “planet gets dumber,” highlighting our growing dependence.
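
As a concrete back-of-the-envelope for the metered-access point, here is a tiny cost helper; the default per-million-token prices are placeholders, not any provider’s actual rates:

# Utility-style metering: pay per token consumed, like kilowatt-hours.
# The default prices below are illustrative placeholders, not real rates.
def api_cost_usd(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float = 3.0,
                 usd_per_m_output: float = 15.0) -> float:
    return (input_tokens / 1e6) * usd_per_m_input \
         + (output_tokens / 1e6) * usd_per_m_output

print(api_cost_usd(50_000, 10_000))  # 0.15 + 0.15 = 0.3 (USD)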

Where the Analogy Fails

Electricity is a simple, undifferentiated commodity: power from one outlet is interchangeable with power from any other. LLMs are far from this. They are complex, differentiated, and have unique “personalities.”

Analogy 2: LLMs as Fabs (Chip Manufacturing)

This analogy compares LLM training centers to semiconductor fabrication plants (fabs).

Where the Analogy Works

  • Huge CapEx: The cost of building a cutting-edge training cluster is on the same scale as building a new fab.
  • Deep Tech R&D and Secrecy: The process involves highly specialized knowledge, trade secrets, and intense research and development.
  • Process Nodes: The concept of a “4nm process node” in chipmaking has a loose parallel in the scale and sophistication of an LLM training cluster (e.g., a 10^20 FLOPS cluster).

Fabless vs. Integrated Models

  • Companies training on NVIDIA GPUs are like fabless chip designers (they design the software/model but don’t own the underlying hardware manufacturing).
  • Google, by training on its own custom TPUs, is more like an integrated manufacturer like Intel, which owns its own fabs.

Where the Analogy Fails

The output of a fab is physical hardware. The output of an LLM lab is software. Software is infinitely and trivially copyable, distributable, and modifiable, which makes it far less defensible than physical silicon.

Analogy 3: LLMs as Operating Systems (The Strongest Analogy)

Karpathy argues that the most accurate and useful analogy for an LLM is a new kind of Operating System, specifically one from the mainframe and time-sharing era of the 1960s.

Where the Analogy Works

  • Centralized, Expensive Compute: LLM compute is currently too expensive for personal devices, so it runs in the “cloud” (like a mainframe).
  • Time-Sharing and Batching: We are all “thin clients” accessing the central LLM over the network. Our requests are batched together to maximize the utilization of the expensive hardware.
  • The OS Kernel: The LLM acts as the core kernel or CPU of this new computer.
  • RAM: The context window is the equivalent of RAM—it’s the working memory for the OS.

I/O and Peripherals

The LLM can connect to peripheral devices:

  • Disk: A file system (like a vector database with embeddings).
  • Classical compute: It can call out to traditional Software 1.0 tools, such as a calculator or a Python interpreter (a toy sketch follows this list).
  • Ethernet: It can access the internet via a browser.
  • Video/Audio: It can process multimodal inputs.
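
To make the “LLM OS” wiring concrete, here is a toy dispatch loop; every function and tool name below is a hypothetical stand-in, not a real model or API:

# Toy "LLM OS" step: the LLM acts as the kernel and routes work to peripherals.
# llm_kernel is a stand-in; a real LLM would choose the tool and argument itself.
def llm_kernel(context: str) -> dict:
    return {"tool": "python", "arg": "2 ** 10"}  # hardcoded decision for the demo

PERIPHERALS = {
    "python": lambda arg: str(eval(arg)),          # classical compute (toy eval)
    "disk":   lambda arg: f"<contents of {arg}>",  # file system / embeddings
    "web":    lambda arg: f"<page at {arg}>",      # browser / internet access
}

def step(context: str) -> str:
    call = llm_kernel(context)                     # the kernel decides on an action
    return PERIPHERALS[call["tool"]](call["arg"])  # a peripheral executes it

print(step("What is 2 to the 10th power?"))  # -> 1024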

Ecosystem Bifurcation

The landscape is splitting, just as it did with operating systems:

  • Closed-Source (Windows/macOS): OpenAI (GPT), Anthropic (Claude), Google (Gemini).
  • Open-Source (Linux): The Llama model ecosystem.

This “LLM OS” is a powerful framework because it positions the LLM not as a mere tool, but as the central orchestrator of a new computing platform.

Part 3: The Psychology of LLMs: Working with ‘People Spirits’

Karpathy proposes a memorable and intuitive way to think about LLMs: they are “people spirits” or, more technically, stochastic simulations of people. Because they are trained on the vast corpus of human text on the internet, they don’t just learn facts; they learn to simulate the patterns, biases, and cognitive quirks of the humans who wrote that text.

This leads to an emergent “psychology”. The LLM is a kind of lossy, averaged-out simulation of a human—specifically, a savant with significant cognitive issues. To program them effectively (i.e., to prompt them), we must understand these psychological traits.

Key Psychological Traits of an LLM

1. Encyclopedic Knowledge & Perfect Memory (The Savant)

  • LLMs have read more text than any human ever could and can recall it with high fidelity. This gives them superhuman knowledge in many domains.
  • Analogy: This is the “Rain Man” aspect. The LLM can “read” an entire phone book (or all of Wikipedia) and remember the contents.

2. Jagged Intelligence (Uneven Capabilities)

  • An LLM’s intelligence is not smooth. It can perform incredibly complex tasks (like writing sophisticated code) and then fail at something a child could do.
  • Famous Examples: An LLM might confidently insist that 9.11 is greater than 9.9, or that the word “strawberry” contains only two ‘r’s.
  • Implication: You can’t assume a consistent level of competence. The model has “rough edges” you can trip on unexpectedly.

3. Anterograde Amnesia (No Long-Term Learning)

  • This is one of the most critical limitations. LLMs do not learn from your interactions. Their weights are fixed.
  • The Context Window is Working Memory: Any information you provide exists only for the current conversation. Once the context window slides or the session ends, the LLM “forgets” everything. It has no mechanism to consolidate new experiences into its long-term memory (the weights).
  • Analogy: This is like the characters in the movies Memento or 50 First Dates. The protagonist’s weights are “frozen,” and their context window gets wiped clean every morning. This makes building lasting relationships or expertise impossible without external memory systems (like tattoos or video tapes).
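
A minimal sketch of why external memory is the application’s job, not the model’s (call_llm is a hypothetical stand-in for any chat-completion API):

# The model persists nothing between calls; the app must replay the history.
# call_llm stands in for a real chat-completion API with frozen weights.
def call_llm(messages: list[dict]) -> str:
    return f"(reply conditioned on {len(messages)} prior messages)"

history: list[dict] = []  # this list IS the working memory (the "RAM")

def chat(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    reply = call_llm(history)  # only the replayed context carries state
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Ada."))
print(chat("What is my name?"))  # answerable only because history was replayed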

4. Gullibility (Susceptibility to Manipulation)

  • Because an LLM is a simulator of text patterns, it has no true “self” or “belief system.” It is highly suggestible.
  • Prompt Injection: This is a major security risk. A malicious user can craft a prompt that tricks the LLM into ignoring its original instructions and leaking private data or performing unintended actions.
  • Implication: You can’t simply “trust” the LLM to follow your system prompt if it encounters conflicting instructions in the user prompt. It’s like a very smart but incredibly naive person who will believe almost anything you tell them.

Summary of LLM Psychology

Karpathy synthesizes these traits into a single, powerful mental model:

An LLM is a lossy simulation of a savant with cognitive issues.

When you are prompting an LLM, you are not talking to a database or a traditional computer program. You are interacting with a simulated entity that is simultaneously superhuman and deeply flawed. Understanding this psychology is the key to effective prompting and building reliable applications on top of this technology.

Part 4: Building Partial Autonomy Apps

The core idea is not to build fully autonomous agents that run wild, but rather to create tools that function as “Iron Man suits”—powerful augmentations that keep the human in control. This contrasts with “Iron Man robots,” which are fully autonomous agents. Given the current fallibility of LLMs, the “suit” model is far more practical and valuable today.

The design goal for any partial autonomy app is to optimize the human-AI collaboration loop. This loop consists of two main phases:

  1. Generation: The AI performs a task, such as drafting text, writing code, or creating a plan.
  2. Verification: The human reviews, edits, and validates the AI’s output.

The success of an LLM-powered product depends on making this cycle as fast, easy, and efficient as possible.
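
Schematically, the loop looks like this (generate is a hypothetical stand-in for the AI call; the human plays the verifier at the input prompt):

# Generation-verification loop: the AI proposes, the human stays in control.
def generate(task: str, feedback: list[str]) -> str:
    return f"<draft for: {task!r}, after {len(feedback)} revisions>"  # stand-in

def run_loop(task: str) -> str:
    feedback: list[str] = []
    while True:
        draft = generate(task, feedback)           # 1. Generation (AI)
        print(draft)                               # render for quick review
        if input("accept? [y/N] ").strip().lower() == "y":
            return draft                           # 2. Verification passed
        feedback.append(input("what should change? "))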

The Anatomy of a Successful LLM App

Karpathy breaks down the key components that successful partial autonomy apps, like Cursor (for coding) and Perplexity (for research), share. These components are essential for managing the LLM’s capabilities and limitations.

1. Package State into a Context Window

  • The app’s most crucial job is to act as a context builder. It must automatically gather all the relevant information a human would need for a task and package it perfectly for the LLM.
  • Example (Cursor): Instead of the user manually copy-pasting relevant code files, error messages, and documentation, Cursor automatically pulls this context into the prompt before calling the LLM.
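
A stripped-down sketch of such a context builder (the all-.py heuristic is invented for illustration; a real tool like Cursor uses embedding search, ranking, and truncation):

# Toy context builder: package task, relevant files, and errors into one prompt.
from pathlib import Path

def build_context(task: str, repo_dir: str, error_log: str = "") -> str:
    parts = [f"TASK:\n{task}"]
    for path in sorted(Path(repo_dir).rglob("*.py")):  # naive "relevance" filter
        parts.append(f"FILE {path}:\n{path.read_text(errors='ignore')}")
    if error_log:
        parts.append(f"RECENT ERRORS:\n{error_log}")
    return "\n\n".join(parts)  # handed to the LLM as its working context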

2. Orchestrate and Call Multiple Models

  • A single LLM call is often not enough. Sophisticated apps orchestrate a sequence of calls to different, specialized models.

Example (Cursor/Perplexity):

  • Embedding Models: To find relevant files or search results (semantic search).
  • Chat/Reasoning Models: To generate the main response, code, or plan.
  • Diff/Apply Models: Specialized models that take a proposed change and apply it cleanly to a file.

This orchestration is hidden from the user, creating a seamless experience.
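
The shape of that pipeline, with every model replaced by a trivial stand-in (none of these are real APIs):

# Multi-model orchestration: embed -> retrieve -> reason -> apply.
def embed_model(text: str) -> list[float]:
    return [float(len(text) % 101)]          # stand-in for a real embedding

def chat_model(prompt: str) -> str:
    return f"<proposed change for: {prompt[:40]}...>"  # stand-in reasoner

def apply_model(change: str, file_text: str) -> str:
    return file_text + "\n" + change         # stand-in for a diff/apply model

def handle(query: str, files: dict[str, str]) -> dict[str, str]:
    q = embed_model(query)[0]
    # retrieve the two files "closest" to the query (toy ranking)
    relevant = sorted(files, key=lambda n: abs(embed_model(files[n])[0] - q))[:2]
    change = chat_model(query + "\n" + "\n".join(files[n] for n in relevant))
    return {n: apply_model(change, files[n]) for n in relevant}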

3. Application-Specific GUI and UI/UX

  • This is a critical, often-underestimated component. The UI should not just be a chat box. It must be designed to make the verification step effortless for the human.
  • Why it Matters: Humans have a powerful “computer vision GPU” in their brains. Visual information is processed much faster and more intuitively than text.
  • Example: It is far easier for a developer to approve a visual code diff (with green for additions and red for deletions) than to read a paragraph describing the changes. Similarly, Perplexity provides direct links to sources, allowing for quick visual verification. A custom GUI speeds up the human’s part of the loop dramatically.

4. The Autonomy Slider

This is a core concept for partial autonomy. The user should be able to control the level of autonomy they delegate to the AI based on the task’s complexity and their trust in the model.

Example (Cursor):

  • Low Autonomy: Tab for single-line autocomplete. (Small, concrete change).
  • Medium Autonomy: Cmd+K to edit a selected block of code. (Larger, but contained change).
  • High Autonomy: Cmd+L to edit an entire file.
  • Full Autonomy: Cmd+I for “agent mode,” where the AI can modify the entire codebase.

Example (Perplexity):

The user can choose between a quick search, a more thorough research mode, or deep research, which takes longer but is more comprehensive.
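
One way to picture the slider in code is as an explicit scope the user grants per request (the enum mirrors the Cursor keybindings above; the gating function is hypothetical):

# The autonomy slider as an explicit, user-granted scope.
from enum import IntEnum

class Autonomy(IntEnum):
    LINE = 1   # Tab: single-line completion
    BLOCK = 2  # Cmd+K: selected block of code
    FILE = 3   # Cmd+L: a whole file
    REPO = 4   # Cmd+I: the entire codebase (agent mode)

def allowed(change_scope: Autonomy, granted: Autonomy) -> bool:
    return change_scope <= granted  # reject edits beyond what the user granted

assert allowed(Autonomy.FILE, granted=Autonomy.BLOCK) is False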

By combining these four elements, startups can build powerful “Iron Man suits” that augment human capabilities, manage the LLM’s flaws, and create a fast, effective generation-verification workflow.

Part 5: The Next Frontier: Building for Agents

So far, digital information has been designed for two kinds of consumers: humans, who use visual GUIs with buttons, images, and complex layouts, and computers, which interact through structured APIs. Karpathy argues that we are now seeing the emergence of a third category of consumer: Agents.

Agents are a hybrid: they are computers, but they are human-like. They understand natural language and need information presented in a way that is legible to them, not just to a traditional computer or a human eye.

The Problem: The Web is Built for People, Not LLMs

Currently, when an agent (like Devin or an LLM-powered browser) tries to use a website, it faces major challenges:

  • Parsing HTML is Hard: It has to scrape a webpage designed for human vision, full of complex HTML, CSS, and JavaScript. This is error-prone and inefficient. The LLM has to “guess” what the important content is.
  • “Click” is Not an Action for an Agent: Documentation often says “click the ‘Create Project’ button.” An agent can’t “click.” It needs a concrete, machine-executable action, like a cURL command.

The Opportunity: Make the Digital World Agent-Legible

The massive opportunity for startups is to bridge this gap and create the infrastructure that makes the world readable and actionable for LLMs. This involves two key ideas:

1. Create LLM-Friendly Documentation (The llms.txt Standard)

  • Just as robots.txt tells web crawlers how to behave, Karpathy proposes a new standard: /llms.txt.
  • This would be a simple Markdown file on a website’s server that describes its functionality in a way that is easy for an LLM to parse. Markdown is perfect because it’s structured but still human-readable.
  • Example (Vercel & Stripe): These companies are early pioneers. You can go to vercel.com/docs/llms.txt and see their entire documentation neatly formatted in Markdown, ready for an LLM to ingest. This is far superior to scraping their visually-oriented docs pages.
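
As a sketch, a hypothetical llms.txt might look like the following (contents invented for illustration; see the Vercel and Stripe files for the real thing):

# ExampleCo

ExampleCo hosts and deploys web projects.

## Key actions
- Create a project: POST /v1/projects
- Deploy a project: POST /v1/projects/{id}/deployments
- Check a deployment: GET /v1/deployments/{id}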

2. Translate Human Actions into Agent Actions (“Click” → cURL)

  • In addition to making information readable, we need to make it actionable.
  • Example (Vercel): In their documentation, wherever they used to say “click,” they are now adding the equivalent cURL command. This provides a direct, unambiguous instruction that an agent can execute.
  • Example (Stripe’s MCP server): Stripe’s implementation of the Model Context Protocol (MCP) is a more formal version of this idea. The protocol lets an agent (like Cursor) discover and call Stripe’s API by interacting with a local server, abstracting away the complexity for both the user and the agent.
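
For instance, a docs page might pair the two forms like this (endpoint, payload, and key are invented for illustration; the pattern mirrors what Vercel is doing):

Human docs:  Click the "Create Project" button.
Agent docs:  curl -X POST https://api.example.com/v1/projects \
               -H "Authorization: Bearer $API_KEY" \
               -d '{"name": "my-project"}'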

Context Builders: A New Class of Tools

These ideas fall under a broader category of tools Karpathy calls “context builders.” Their job is to take complex, human-oriented information and transform it into a clean, LLM-friendly format.

Examples

  • Gitingest: A simple tool that takes a GitHub repository URL and converts the entire codebase into a single, flat text file with a directory structure at the top. This makes it trivial to copy-paste an entire repo into an LLM’s context window.
  • Devin DeepWiki: Takes this a step further by not just flattening the code but also generating a high-level summary and architectural diagrams, creating an even richer context for the LLM.
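
In the spirit of Gitingest (a guess at the shape, not its actual code), such a flattener fits in a few lines:

# Flatten a repo into one LLM-pasteable string: directory tree first, then files.
from pathlib import Path

def flatten_repo(root: str) -> str:
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    tree = "\n".join(str(p.relative_to(root)) for p in files)
    bodies = "\n\n".join(
        f"=== {p.relative_to(root)} ===\n{p.read_text(errors='ignore')}"
        for p in files
    )
    return f"DIRECTORY STRUCTURE:\n{tree}\n\n{bodies}"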

The key takeaway is that we need to stop thinking of our digital world as being just for humans. By creating agent-first interfaces, we can unlock a new level of automation and capability. There is a huge opportunity for startups to build the tools, protocols, and services that facilitate this transition.

Part 6: The Vibe Coding Revolution & Conclusion

Karpathy introduces a concept he humorously calls “vibe coding,” which perfectly encapsulates the new, intuitive, and sometimes chaotic way of building software with LLMs.

What is Vibe Coding?

It’s a new kind of programming where you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

The Experience

Instead of meticulously writing and debugging code, you simply talk to the LLM (e.g., Cursor Composer). You ask for what you want in plain English, even for “dumb” things like “decrease the padding on the sidebar by half,” because it’s faster than finding the setting yourself. You “Accept All” on diffs and just copy-paste error messages back into the chat.

The Result

The code grows beyond your direct comprehension. You are no longer a programmer in the traditional sense; you are a manager, a director, and a QA tester, guiding a very talented but quirky “people spirit.”

This became a viral meme after Karpathy tweeted about it, resonating with many developers who were experiencing the same shift. It even got its own Wikipedia page, much to his amusement.

The core of vibe coding is that it makes software highly accessible. It lowers the barrier to entry so much that even kids can start building things. Karpathy shares a wholesome video from Thomas Wolf (Hugging Face) of a “vibe coding event” with 9- to 13-year-olds. These kids, with no formal training, are building apps and websites.

Takeaway: AI is unleashing a generation of wildly creative builders beyond anything we could have imagined. And they grow up “knowing” they can build anything.

Karpathy’s Vibe Coded Projects

Karpathy shows his own “vibe coded” projects:

  • An iOS calorie tracking app he built in a day without knowing Swift.
  • MenuGen.app, a tool that takes a picture of a restaurant menu and generates images of the food, solving his personal problem of not knowing what to order.

The fascinating insight from building MenuGen was that the code was the easiest part. The hard part was all the “Software 1.0” grunt work: dealing with API keys, Vercel deployments, domain names, authentication, and payments—all tasks that involved clicking around in web browsers. This highlights the opportunity to build for agents so they can handle this tedious work too.

Final Summary: Building the Future of Autonomous Software

Karpathy concludes by summarizing the key takeaways of his talk into a powerful vision for the future.

What we should be building

  • ✅ Iron Man suits, not Iron Man robots.
  • ✅ Partial autonomy products that keep humans in the loop.
  • ✅ Custom GUI and UI/UX to make the human verification step fast and intuitive.
  • ✅ A fast Generation-Verification loop.
  • ✅ An autonomy slider to give users control.

What we should avoid for now

  • ❌ Flashy demos of fully autonomous agents that are brittle and not yet practical products.
  • ❌ Getting bogged down by premature discussions of AGI in 2027.

The next decade will be about taking the autonomy slider and gradually moving it from augmentation on the left to full agency on the right.

The talk ends on an optimistic and inspiring note. We are in a unique and unprecedented moment. A new kind of computer has been invented, one that is programmable in our native language. And it has been instantly distributed to billions of people, flipping the usual diffusion path, in which new technology reaches governments and corporations long before it reaches consumers.

Now, it is our time—the time for builders, startups, and everyone in the audience—to program them.