All Posts

A Developer's Roadmap to AI: 5 Stages to Get There

You don't need a PhD. You don't need to go back to school. Here's a practical, staged roadmap for developers who want to break into AI and start building agents.

You’re a Developer. Now What?

Over the last few weeks, people have been DM’ing me asking the same question: “How do I get started in AI?” It’s hard to get a job as a developer right now without any AI experience, and people want to know how to break in. I wanted to write something that explains it well enough that I can just point people to it.

Here’s my honest take. I’ve been working in AI since ChatGPT dropped on November 30th, 2022. In that time, I’ve watched coding get solved. People are still skeptical. It hasn’t fully rolled out yet. But personally, I’m never touching a keyboard to write code again. I voice-prompt agents and use multiple agents to build specs and write code. That’s just how it works now. Most likely, we’re all going to be building agents in the future, and the more people who can get into agentic AI, the better.

If you’re a developer, you need to start understanding how AI works, and not in the “make a REST call and parse the response” way. An LLM is not a database, it’s not a cache, it’s not an API you just call and forget about. It’s its own thing, and you need to start treating it that way.

The good news: you do not need a PhD to do this. Getting a PhD or a master’s degree at this point would honestly be a waste of time because the field moves too fast. As Jad Tarifi, founder of Google’s first generative AI team, put it: “AI itself is going to be gone by the time you finish a PhD. Even things like applying AI to robotics will be solved by then.” Understanding the math behind it is helpful in some niche cases (like reducing latency when you’re processing billions of documents at scale for a telco), but for most work? Understanding the theory is more than enough. The math is a hobby, or for when you’re going really, really deep.

I’ve structured this as 5 stages. You can start looking for jobs at the end of Stage 4. Stage 5 is about continuous learning and happens in parallel with your career. Don’t wait until you’ve mastered everything. You’ll never be done learning in this field, and nobody expects you to be.


Stage 1: Learn How LLMs Work

This is the “just play with it” stage. No frameworks, no fancy patterns. Just you and an LLM.

What to Do

Set up a local model or a cloud option. You have two good paths:

  • Ollama with a small local model. Look for models that’ll run on modest hardware. You don’t need a beefy GPU for this. Check the Ollama model library for options.

  • Google Gemini — Sign up for Google AI Studio and you get a free tier that lets you start experimenting right away with lower rate limits. If you need higher throughput, you can use Vertex AI with pay-as-you-go pricing. Either way, of all the top-tier providers, Gemini is hard to beat on cost, availability, and speed. Here’s what the pricing looks like:

    | Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
    |---|---|---|---|---|
    | Gemini 2.5 Flash Lite | $0.10 | $0.40 | 1M tokens | Learning, experimenting, high volume |
    | Gemini 3 Flash | $0.50 | $3.00 | 1M tokens | Pro-grade reasoning at flash speed |
    | Gemini 3.1 Pro | $2.00 | $12.00 | 1M tokens | Complex reasoning and planning agents (don't start here) |

    For getting started, Gemini 3 Flash or Gemini 2.5 Flash Lite are the ones to use. Gemini 3 Flash is a beast — it outperforms even Gemini 2.5 Pro on several benchmarks, with full tool call support and high availability. Gemini 2.5 Flash Lite is even more economical with optional reasoning capabilities you can toggle on when needed. You probably don’t need Gemini 3.1 Pro at this stage, but it’s there if you want frontier-level intelligence for more complex tasks down the road. These are incredible models for the price, and they’re perfect for getting started since you’re just learning and don’t need to burn through a ton of money while you experiment.

Build a chatbot. This is your first project. Keep the message history, pass messages back and forth, and have an actual conversation with the model. Build it in the terminal first. If you’re feeling ambitious, stand up a simple React UI for it. With tools like Claude Code or Cursor, this should be straightforward.
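The core of the chatbot is simpler than it sounds: keep a list of messages and resend the whole list on every turn. Here's a minimal sketch with the model call injected as a function, so it works with any provider. The commented-out Ollama lines show roughly how the local-model path would plug in (check the SDK docs for the exact shape); the model name is just an example.

```python
# Minimal terminal chatbot that keeps the full conversation history.
# The model call is injected so any provider fits. With the Ollama
# Python SDK it would look roughly like this (an assumption -- verify
# against the SDK docs):
#
#     import ollama
#     def call_model(messages):
#         return ollama.chat(model="llama3.2", messages=messages)["message"]["content"]

def chat_turn(history, user_input, call_model):
    """Append the user message, call the model with the FULL history,
    append the reply. The model only 'remembers' what you resend."""
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

def repl(call_model):
    """Terminal loop: type 'quit' or 'exit' to stop."""
    history = []
    while True:
        user_input = input("you> ")
        if user_input.strip().lower() in {"quit", "exit"}:
            break
        print("bot>", chat_turn(history, user_input, call_model))
```

The key lesson is in `chat_turn`: there is no server-side memory. If you stop resending old messages, the model forgets them.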

Add tool calls. Once your chatbot works, give it tools. Have it fetch the weather, query a SQLite database, store things. This is where it gets interesting. You’re not just chatting anymore; you’re having the LLM do things.
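Whichever SDK you use, the model returns a structured request ("call this tool with these arguments") and your code does the actual work. A sketch of the dispatch side, with made-up tool names and stub implementations:

```python
# Hand-rolled tool dispatch. Real SDKs return structured tool calls;
# the routing on your side looks roughly like this regardless of
# provider. Tool names and bodies here are stubs for illustration.
import json
import sqlite3

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub -- a real tool would hit a weather API

def query_db(sql: str) -> list:
    # Throwaway in-memory database so the example is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (text TEXT)")
    conn.execute("INSERT INTO notes VALUES ('hello')")
    return conn.execute(sql).fetchall()

TOOLS = {"get_weather": get_weather, "query_db": query_db}

def dispatch(tool_call: dict) -> str:
    """Run the tool the model asked for and serialize the result so it
    can go back into the conversation as a tool-result message."""
    fn = TOOLS[tool_call["name"]]
    result = fn(**tool_call["arguments"])
    return json.dumps({"tool": tool_call["name"], "result": result})
```

The loop then feeds the serialized result back to the model as another message, and the model decides what to do next.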

Experiment with the context window. This is critical. The context window is the amount of text the LLM can “see” at once; under the hood, your input is split into tokens, the numeric chunks the model actually processes. Every model has a hard limit on how much can go in the window and how much it can generate. But there are hidden limits beyond the raw size. Context poisoning is a real problem — if you tell the model to do something, then later tell it not to do that thing, then tell it to do it again, it’s not going to have a clear picture of what you want. Contradictory messages in the history will degrade its ability to follow your intent. As the context grows, it gets more unwieldy, and you need to figure out where that threshold is for yourself. Give the model a clear goal, see how it performs, then flood the context with data and watch it struggle. This is something you need to feel and understand viscerally, because it’ll bite you later if you don’t.

Play with system prompts and temperature. Change the system prompt and see how it changes behavior. Crank the temperature up and down. Temperature controls how “creative” or random the model’s responses are. You don’t need to understand exactly why; just observe what happens.

What to Avoid

  • Don’t study the math. Not yet. You’ll get to it eventually, but right now it’ll slow you down.
  • Don’t worry about what tokens or embeddings actually are. You just need to know that tokenization is how the LLM understands what you’re saying — your input gets converted into a mathematical representation. That’s it for now. The details can wait.
  • Don’t use any frameworks yet. Just the SDK for whatever provider you picked (Ollama SDK or Google Gemini SDK) and Python.

Goals

By the end of this stage:

  1. You’ve built a chatbot that maintains conversation history
  2. You’ve made tool calls (weather, database, etc.)
  3. You’ve seen what happens when the context window gets crowded
  4. You’ve experimented with system prompts and temperature
  5. You can explain every piece of the code you wrote

That last point matters. You need to be able to talk through these projects as if you understand everything around them. Not the math, not the theory of temperature. Just: what does this code do, why did I build it this way, and what did I learn?

Suggested Resources

  • Ollama model library - Small models that run on modest hardware
  • Google AI Studio - Free tier for experimenting with Gemini models

Stage 2: Basic AI Agents

Now you’re going to start building something that feels more like a real product. This is where frameworks come in.

What to Do

Pick a framework. I’d recommend Pydantic AI. It’s great for building basic agents, it makes it trivial to swap between providers, and the streaming support is excellent. LangChain is another option that connects to a lot of things and makes it easy to get started, but I wouldn’t use it in production. The industry has mixed feelings about LangChain. Its original purpose (chaining LLM calls when context windows were tiny) isn’t really relevant anymore since modern context windows are enormous. It’s fine for learning, but reach for something like Pydantic AI when you want to build for real.

Build for provider flexibility. When you’re building something, you need to understand that swapping models and providers is going to happen. A new model comes out on a different provider and you need to switch over to it. How you’ve constructed your application is going to determine how easily you can do that. With a good framework like Pydantic AI, it can be as simple as changing the model name string. But if you’ve pieced together raw LLM provider SDKs, you might find that switching to a new model requires major changes to how you handle messages, tool calls, and responses. This is one of the biggest reasons to use a framework — it abstracts away the provider differences so you can move between models without rewriting your code.
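To see what that abstraction buys you, here's the idea hand-rolled as a sketch: application code talks to one interface, and concrete providers live behind it. The provider classes are fakes standing in for real SDK clients; with Pydantic AI you don't write any of this, because swapping can be as simple as changing the model string you pass to the agent.

```python
# What a framework buys you, sketched by hand: code against one
# interface, swap providers behind it. The providers here are fakes
# standing in for real Gemini/Ollama clients.
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, messages: list) -> str: ...

class FakeGemini:
    def complete(self, messages):  # stand-in for a real Gemini client
        return "gemini: " + messages[-1]["content"]

class FakeOllama:
    def complete(self, messages):  # stand-in for a real Ollama client
        return "ollama: " + messages[-1]["content"]

def run_agent(provider: ChatProvider, prompt: str) -> str:
    # Application code never names a concrete provider,
    # so swapping one out touches a single line at the call site.
    return provider.complete([{"role": "user", "content": prompt}])
```

If your message handling, tool calls, and streaming all go through an interface like this (or through a framework that provides one), a model swap is a config change instead of a rewrite.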

Hook into streaming events. This is where a lot of the actual work lives. When your agent is doing things, the user needs to see what’s happening immediately. People aren’t going to wait 30 seconds. They’re not even going to wait 10 seconds. They need to start seeing results quickly, and LLMs will stream partial results as they generate them. This is an important thing to learn.

If someone interacts with your AI and sits there with no feedback, they’ll assume it’s broken. They’ll kill it. They’ll complain. Even if it was working perfectly behind the scenes.

You need to stream token output as it arrives. You need to show tool calls as they happen. You need to surface how many tokens are being used and how long the agent has been running. In Pydantic AI, you can hook into all of these streaming events. Half of the work in building AI applications is this infrastructure around getting events up to the user.
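The shape of that infrastructure is worth internalizing: the provider gives you an iterator of chunks, and your job is to emit UI events immediately instead of waiting for the full response. A sketch with a fake stream standing in for an SDK's streaming iterator (the event fields are invented for illustration):

```python
# Streaming sketch: the provider yields chunks; your job is to surface
# them immediately, plus running metadata the user can see. The fake
# stream stands in for a real SDK streaming iterator.
import time

def fake_stream(text, chunk=4):
    """Pretend provider: yields the response a few characters at a time."""
    for i in range(0, len(text), chunk):
        yield text[i:i + chunk]

def render_stream(stream):
    """Collect chunks while emitting progress events a UI could show."""
    start = time.monotonic()
    events, buffer = [], []
    for piece in stream:
        buffer.append(piece)
        events.append({
            "type": "token",
            "text": piece,
            "chars_so_far": sum(len(p) for p in buffer),
            "elapsed_s": round(time.monotonic() - start, 3),
        })
    return "".join(buffer), events
```

In a real app those events go over a WebSocket or server-sent events to the browser; the point is that they exist from the first chunk, not after the last one.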

Start doing evals. Evals are how you measure whether your agent is actually doing what you told it to do. I want to be upfront here: evals are a very new field of study and nobody has the right answer yet. If someone tells you there’s one correct way to do evals, they’re wrong (the Pydantic AI eval docs say the same thing, and it’s good advice to follow). There are approaches and tools, but this is still being figured out across the entire industry. You need to try things out for yourself and see what works for your specific use case.

That said, you still need to start. Begin with 5 to 10 examples. Give the agent a task, measure whether it completed that task correctly. You can score this programmatically (did it return the right answer?) or use an LLM as a judge (have another model evaluate whether the agent did a good job). Don’t worry about measuring speed yet. Focus on accuracy: is it doing the right thing, and is it doing anything crazy?
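A starter eval harness really can be this small. The agent is any callable from prompt to answer (swap in your real one), and each case pairs a prompt with a programmatic check; the example checks are invented for illustration:

```python
# Tiny eval harness: a handful of cases, programmatic scoring.
# `agent` is any callable prompt -> answer; swap in your real agent.
def run_evals(agent, cases):
    """cases: list of (prompt, check) where check(answer) -> bool."""
    results = []
    for prompt, check in cases:
        answer = agent(prompt)
        results.append({"prompt": prompt, "answer": answer, "passed": check(answer)})
    accuracy = sum(r["passed"] for r in results) / len(results)
    return accuracy, results

# Example cases: contains-style checks, which tolerate phrasing changes
# better than exact string matches.
CASES = [
    ("What is 2+2?", lambda a: "4" in a),
    ("Capital of France?", lambda a: "paris" in a.lower()),
]
```

Rerun this after every prompt change. The scores won't be perfect, but a sudden drop from 9/10 to 5/10 tells you something broke.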

What to Avoid

  • Don’t build your own agent framework. Do not piece together LLM provider SDKs and try to build your own agent loop. It will come out to be garbage. You need something that handles the agent loop, connects to streaming events, and plugs into observability. Use a library like Pydantic AI that does all of this out of the box. Find one that works well enough and get moving fast.
  • Don’t add memory or state. No key-value stores, no long-term memory, no persisted state. Keep it simple. Just the chat history and tool calls.
  • Don’t over-engineer the eval pipeline. Start small, 5 examples is fine.

Goals

By the end of this stage:

  1. You’ve built an agent with a framework like Pydantic AI
  2. You can swap one provider for another without issues
  3. You’re catching streaming events and surfacing them to the user
  4. You’ve written basic evals
  5. You’ve started reading Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems by Antonio Gullí. Go through each pattern, try some of them out for yourself

Suggested Resources

  • Pydantic AI - The agent framework I’d recommend
  • LangChain - Good for learning, has lots of connectors
  • Shotgun (GitHub) - A Pydantic AI application with sub-agents, compaction, and more. Good codebase to study
  • LLM as a Judge - Pydantic AI’s guide on using LLMs to evaluate agent outputs
  • Agentic Design Patterns by Antonio Gullí - Essential reading on agentic AI patterns

Stage 3: Sub-Agents

From the work you did in Stage 1, you saw what happens when the context window gets crowded: calls get more expensive and the model starts doing weird things. Sub-agents are the solution.

What to Do

Build agents as tool calls. A sub-agent is just another LLM with a clean context window that handles a specific piece of work. In practice, this means wrapping an agent inside a tool call. When your main agent needs to do something complex, it calls a tool that spins up a new agent with a fresh context window, gives it a focused prompt, and gets back the result.

In Pydantic AI, this is straightforward. Your tool call function encapsulates the sub-agent, runs it with a clean window, and one of the inputs to that tool call is the prompt describing what the sub-agent should do.
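Stripped of any framework, the pattern looks like this: the tool function builds a brand-new message list (the clean context window) and returns only the sub-agent's result to the parent. `call_model` stands in for whatever provider call you're using, and the tool name is made up:

```python
# Sub-agent-as-a-tool sketch: the tool spins up a fresh context with
# only a system prompt and the focused task, then hands back just the
# result. `call_model` stands in for any provider call.
def make_subagent_tool(call_model, system_prompt):
    def tool(task: str) -> str:
        # Fresh context window: none of the parent's conversation
        # comes along -- only what the parent puts in `task`.
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task},
        ]
        return call_model(messages)
    return tool

# A parent agent would register `research` like any other tool.
research = make_subagent_tool(
    lambda msgs: f"[answered with {len(msgs)} messages in context]",
    "You are a research sub-agent. Answer concisely.",
)
```

Notice the design pressure this creates: everything the sub-agent needs must be squeezed into `task`, which is exactly where the telephone problem below comes from.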

Experience the telephone problem. When one agent tells another agent what to do, context gets lost. It’s literally a game of telephone. The main agent might not explain the full situation to the sub-agent. The sub-agent might misunderstand the task. The further down the sub-agent tree you go, the weirder things get.

Set up a complex scenario: have a sub-agent query data from a database, verify it with another tool call, and return or write the results. This sounds simple but it isn’t. Even the best frameworks still struggle with sub-agent coordination. You’ll lose a lot in this game of telephone, and you need to feel that pain firsthand.

Experiment with planning. Have a smarter model (like a larger, more capable model) create a plan, then pass that plan to a router agent, which delegates work to sub-agents. This pattern (planner -> router -> workers) is fundamental and you’ll find it covered extensively in books on agentic AI patterns.
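The delegation structure itself is simple; a sketch with stub callables shows the shape. In a real system each callable is an LLM agent (and the planner's output comes from the model, not hard-coded steps):

```python
# Planner -> router -> workers, sketched with stubs. A real version
# puts an LLM behind each callable; the delegation shape is the same.
def planner(goal):
    # A capable model would generate this plan; hard-coded here.
    return [
        {"worker": "search", "task": f"find sources on {goal}"},
        {"worker": "write", "task": f"summarize findings on {goal}"},
    ]

WORKERS = {
    "search": lambda task: f"sources: {task}",
    "write": lambda task: f"summary: {task}",
}

def router(plan):
    """Delegate each step to its worker, collecting results in order."""
    return [WORKERS[step["worker"]](step["task"]) for step in plan]
```

The hard parts are everything this sketch omits: what happens when a worker fails, when the plan is wrong halfway through, or when step two needs context that only existed inside step one.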

What to Avoid

  • Don’t reach for agent-to-agent protocols yet. A2A (Agent-to-Agent) communication is an active field of study. It’s interesting but not something you need right now.
  • Don’t expect it to work perfectly. Sub-agent coordination is genuinely hard. The point of this stage is to understand why it’s hard.

Goals

By the end of this stage:

  1. You’ve implemented agents as tool calls with clean context windows
  2. You’ve experienced the “telephone problem” and understand why context degrades
  3. You’ve experimented with planning patterns (planner -> router -> workers)
  4. You can articulate when and why to use sub-agents vs. keeping everything in one context

Suggested Resources

  • Agentic Design Patterns by Antonio Gullí - Covers planning and multi-agent coordination patterns

Stage 4: Agentic Frameworks and MCP

There are a lot of agentic frameworks right now: Google ADK (Agent Developer Kit), Crew AI, Microsoft AutoGen, and more. These are powerful tools, but you absolutely cannot start here. If you don’t understand how LLMs work, how tool calls work, and how sub-agents coordinate, these frameworks will be a black box of confusion.

What to Do

Play with the big frameworks. Try Google ADK. Try Crew AI. Try AutoGen. Set up multi-agent applications with memory, planning, and sub-agents. These frameworks have things like artifacts, long-term memory, and built-in sub-agent orchestration. If you’ve done the first three stages, you’ll understand what’s happening under the hood. If one of these works for you and you can get started with it, great.

But you’ll also see the problems. When you add long-term memory or persistent planning, things can break in subtle ways. Say you’re building a coding agent and your long-term plan says “use JavaScript.” Later you tell it “switch to TypeScript.” If the memory and plans don’t get updated, the agent might ignore your instruction. Sub-agents might not get the message. This is the kind of thing that’s maddening if you don’t understand the underlying mechanics, but manageable if you do.

You don’t necessarily need these orchestration frameworks. It’s important to play with them so you know what’s out there and can say you’ve used them. But you might end up building your own orchestration, memory, or planning layer on top of a solid agent framework like Pydantic AI, and that’s fine. That’s different from what I warned against in Stage 2 — there, I said don’t piece together raw LLM provider SDKs to build your own agent loop. That’s still true. But building your own memory system or workflow orchestration on top of a proven agent framework? That’s a completely reasonable thing to do at this point. You might have business rules or logic that lets you cut down on cost or do things more efficiently without the overhead of a full orchestration framework.

Here’s the thing about memory and long-term planning that these frameworks make seem complicated: most of it doesn’t need to be. Long-term planning can be as simple as a file with checkboxes. The agent writes to the file and checks the box when it’s done. An agent like Claude Code can do this because the models are smart enough and have the fidelity to write accurate output. It doesn’t need to be more complex than putting an X somewhere in a text file.
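To make that concrete, here's the whole "planning system" as a couple of string operations over a markdown checklist. This is a sketch of the idea, not any framework's API:

```python
# Plan-as-a-text-file: markdown checkboxes the agent reads and ticks
# off. No database, no orchestration framework.
def check_off(plan_text: str, step: str) -> str:
    """Mark `- [ ] step` as done by rewriting it to `- [x] step`."""
    return plan_text.replace(f"- [ ] {step}", f"- [x] {step}")

def remaining(plan_text: str) -> list:
    """List the steps that haven't been checked off yet."""
    return [line[6:] for line in plan_text.splitlines()
            if line.startswith("- [ ] ")]
```

In practice the agent itself does the editing through a file-write tool; these helpers just show how little machinery the format needs.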

Memory is the same story. Most of these frameworks use key-value stores and fancy persistence layers. When you’re starting out, you don’t need any of that. You can just use a text file. Have the agent update a markdown file — figure out which section to update, do a partial update, and move on. Read the memory from the file when you need it. Write to it when something changes. That’s enough. The framework-provided memory is useful and it’s good to understand how it works, but it’s also a bit of a black box. You need observability to understand what these things are actually storing and retrieving, and you probably don’t need the complexity for most use cases.
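The partial-update step is the only mildly fiddly part, and even that fits in a few lines: find the `## Section` heading, replace just its body, leave the rest of the file untouched. A sketch (the section names in the test data are invented):

```python
# Memory-as-a-markdown-file: replace one `## Section` body and leave
# everything else alone. Deliberately simple -- no nested headings.
def update_section(memory: str, section: str, new_body: str) -> str:
    lines = memory.splitlines()
    out, i = [], 0
    while i < len(lines):
        out.append(lines[i])
        if lines[i].strip() == f"## {section}":
            i += 1
            # Skip the old body up to the next heading (or EOF)...
            while i < len(lines) and not lines[i].startswith("## "):
                i += 1
            out.append(new_body)  # ...and write the new one in its place.
        else:
            i += 1
    return "\n".join(out)
```

Read the file into context when the agent starts, call something like this when a fact changes, and you have a memory system you can inspect with `cat`.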

Start using MCP. MCP (Model Context Protocol) is an open standard for connecting AI agents to external tools and data sources. Think of it like a USB port for AI: instead of building custom integrations for every tool your agent needs to talk to, MCP provides a standardized way to connect to anything. There are already pre-built MCP servers for things like GitHub, Slack, Google Drive, Postgres, and more. Your agent connects to an MCP server, discovers what tools are available, and can start calling them. You don’t have to set up the tool call framework yourself — you’re just calling into a server, passing data back and forth, and the agent understands how to use it. Connect your agents to MCP servers and have them start doing real work through these integrations.
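Under the hood MCP is JSON-RPC messages. Simplified to the two that matter most here: the first asks the server what tools it offers, the second invokes one of them. The method names come from the MCP spec; the `search_issues` tool and its arguments are invented for illustration, and real sessions also include an initialization handshake that's omitted.

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
 "params": {"name": "search_issues", "arguments": {"repo": "octocat/hello-world"}}}
```

Your agent framework or MCP client library handles this wire format for you; seeing it once demystifies what "the agent discovered a tool" actually means.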

Add observability. This is where you want to start understanding what your agents are actually doing in production. Tools like Logfire let you trace agent execution, see where time is being spent, and debug issues. This is a skill that matters a lot in production environments.

Build portfolio projects. You should have projects in your GitHub that demonstrate you understand how agents work. Not toy examples but real applications that show you can build multi-agent systems, connect them to external services, handle streaming, and do evals.

What to Avoid

  • Don’t start here. I can’t stress this enough. If you jump straight to Crew AI or Google ADK without understanding the foundations, you’re going to have a very hard time.

Goals

By the end of this stage:

  1. You’ve built applications with at least one major agentic framework
  2. You’ve connected agents to MCP servers
  3. You’ve added observability to your agent workflows
  4. You have portfolio projects on GitHub that demonstrate your skills
  5. You should start looking for a job. This is more than enough. If you can explain all the concepts from Stages 1 through 4, you’re in really good shape.

Suggested Resources

  • Google ADK - Google’s Agent Developer Kit
  • Crew AI - Multi-agent framework
  • MCP - Model Context Protocol
  • Logfire - Observability for AI applications

Stage 5: Continuous Learning

You do not need to know absolutely everything about AI to get a job. You do not need to understand the math behind it. Nobody is going to expect you to be able to train an LLM from scratch — a handful of people at frontier labs do this with billions of dollars in compute budgets. That’s not you, and that’s fine. The important thing at this point is to stay current because the field changes very quickly.

The Concepts Behind LLMs

One of the things worth learning over time is the concepts behind how LLMs actually work. What an embedding is. How the context window actually works. How training, fine-tuning, and reinforcement learning shape a model’s behavior. As Dario Amodei, CEO of Anthropic, wrote: “People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work. They are right to be concerned: this lack of understanding is essentially unprecedented in the history of technology.” It’s a series of equations with weights (numbers) that get adjusted in complicated ways, and changing those weights changes how the model responds.

You’re probably never going to build your own LLM from scratch. But it is likely that at some point you’ll take an existing model and do post-training on it — fine-tuning, reinforcement learning, and whatever new techniques come out next. Post-training is a heavily active area of research with new papers coming out constantly, and people are finding ways to do a lot with it without spending a ton of money on compute. Understanding where prompting ends and where post-training could help is a valuable thing to learn over time. Courses like Deep Learning AI cover these concepts well. You don’t need this knowledge to land your first AI role, but as you get more experience, people are going to expect that you know this stuff. Someone from an AI or ML team in an interview will ask you “how does this work?” and you need to be able to talk about the concepts.

Stay Current

Every few weeks there’s something new. New training techniques, new model architectures, new tools, new protocols. The fundamentals from Stages 1 through 4 don’t change much, but the tools and techniques evolve constantly.

Show that you’re keeping up. Go to events. Do hackathons. Build things. That’s what employers want to see: that you’re on top of the field, not that you have a degree in it.

Suggested Resources

  • Deep Learning AI - Great courses on the math and theory
  • arXiv - Where new AI research gets published
  • Local AI meetups and hackathons - Nothing beats building with other people

Final Thoughts

You don’t need to go back to school. You don’t need a PhD. You don’t need to understand every equation behind transformer architectures. You need to build things, understand how the tools work, and be able to explain what you built and why.

Start at Stage 1. Don’t skip ahead. Every stage builds on the last, and the gaps will show up in interviews if you skip the fundamentals.

And start looking for jobs at the end of Stage 4. Don’t wait until you feel “ready.” You’ll never feel fully ready in a field that changes this fast. But if you can build agents, swap providers, stream events, do evals, coordinate sub-agents, and connect to MCP servers, you’re ahead of most candidates.

Go build something.