Complete Guide to Reasoning Model Prompting — How to Use o1, o3, Claude Extended Thinking
AI that thinks for itself: Reasoning Models don't respond to conventional prompting methods. A complete guide to o1, o3, and Claude Extended Thinking.
Introduction: The Question Left by Episode 13
In episode 13, while covering multi-agent systems, we left this preview:
"How can we make these agents perform more complex reasoning?"
Throughout this series so far, we've learned various prompting techniques. Clear instructions, step-by-step thinking with CoT, few-shot examples, role distinction with XML tags. All these techniques are based on one premise:
"The model thinks the way I guide it to."
But recently emerged AI models overturn this premise.
Before producing an answer, they think to themselves at length—sometimes using thousands of tokens. Without being asked to. These models are called Reasoning Models. OpenAI's o1·o3 and Claude's Extended Thinking are prime examples.
And these models don't respond well to conventional prompting methods. In fact, techniques that worked well before can sometimes diminish performance.
This episode covers from start to finish what Reasoning Models are, why conventional methods don't work, and how to prompt them correctly.
1. What is a Reasoning Model?
1.1. A One-Line Difference from Existing Models
The simplest explanation goes like this:
# Existing Models (GPT-4o, Claude Sonnet, etc.)
Input → [Single forward pass] → Output
# Reasoning Models (o1, o3, Extended Thinking)
Input → [Internal reasoning process (hundreds to thousands of tokens)] → Output
Existing models generate output immediately upon receiving input, like a person answering off the cuff.
Reasoning Models first "think" before outputting. This thinking process is either invisible to users (o1) or shown as a separate "thinking" block (Claude Extended Thinking). Only after thinking is complete do they produce the final answer.
1.2. What Happens Inside
The "thinking" process of a Reasoning Model resembles the Chain-of-Thought (CoT) technique we covered in episode 5. The principles are indeed connected, but there's an important difference:
CoT is a technique where we instruct the model to think step-by-step.
The internal reasoning of a Reasoning Model is behavior learned into the model itself.
# CoT Prompt — humans give instructions
"Solve the following problem step by step.
Step 1: Identify conditions
Step 2: Set up equations
Step 3: Calculate"
# Reasoning Model — does it on its own
User: "Solve the problem for me"
Internal reasoning: (model independently repeats condition identification → equations → calculation)
Output: "The answer is 42."
In human terms: CoT is telling an existing model "think carefully, step by step" to push it into deeper thought. A Reasoning Model is a person who naturally thinks that way without being told.
1.3. Current State of Major Reasoning Models
| Model | Developer | Characteristics |
|---|---|---|
| o1 | OpenAI | First commercial Reasoning Model. Strong in math and coding |
| o1-mini | OpenAI | Lightweight version of o1. Speed and cost trade-off |
| o3 | OpenAI | Successor to o1. Significant improvement in general reasoning ability |
| o3-mini | OpenAI | Lightweight version of o3 |
| Claude Extended Thinking | Anthropic | Supported from Claude 3.7 Sonnet. Thinking blocks controllable via API |
| DeepSeek R1 | DeepSeek | Open-source Reasoning Model developed by a Chinese company |
2. Why Existing Prompting Techniques Don't Work
2.1. "Think Step by Step" — They're Already Doing It
The core of CoT is prompting the model to think step-by-step with phrases like "Let's think step by step." But what happens when you add this instruction to a Reasoning Model?
# CoT applied to existing models — effective
"Find the bug in this code. Analyze step by step."
→ Model starts analyzing line by line ✅
# CoT applied to Reasoning Models — ineffective or backfires
"Find the bug in this code. Analyze step by step."
→ Model is already analyzing step-by-step internally
→ External instruction may conflict with internal reasoning ⚠️
Reasoning Models already reason step-by-step on their own. Adding CoT instructions provides no additional benefit and may even disrupt the model's internal reasoning flow.
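One practical guard, if you're migrating prompts between model families, is to lint for CoT boilerplate before sending a prompt to a Reasoning Model. A minimal sketch; the phrase list and the `warn_cot_phrases` helper are illustrative assumptions, not part of any SDK:

```python
import re

# Phrases that add nothing (or interfere) when the target is a Reasoning Model.
# This list is illustrative, not exhaustive.
COT_PHRASES = [
    r"think step by step",
    r"analyze step by step",
    r"step \d+:",
]

def warn_cot_phrases(prompt: str) -> list[str]:
    """Return the CoT-style patterns found in a prompt (case-insensitive)."""
    found = []
    for pattern in COT_PHRASES:
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            found.append(pattern)
    return found

prompt = "Find the bug in this code. Analyze step by step."
print(warn_cot_phrases(prompt))  # → ['analyze step by step']
```

Flagged prompts can simply have the offending instructions dropped; per the principle above, the model needs only the goal.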
2.2. Few-shot Examples — Can Be Harmful
Few-shot is a technique of showing desired answer formats through examples. It's highly effective with existing models. But adding few-shot with detailed reasoning processes to a Reasoning Model creates problems.
# Few-shot with existing models — effective
Example:
Q: What is 5 × 7?
A: Add 5 seven times. 5+5+5+5+5+5+5 = 35. Answer: 35
---
Q: What is 8 × 6?
→ Model follows example format and solves step-by-step ✅
# Same few-shot with Reasoning Model
→ Model's internal reasoning is already far more complex and sophisticated
→ Forcing a simple reasoning pattern example actually degrades performance ❌
For Reasoning Models, it's better to avoid few-shot entirely or, if used, show only input-output format without reasoning processes.
# Few-shot for Reasoning Models — show format only
Example (for format reference):
Q: Analyze contract
A: {"risk_clauses": [...], "recommended_changes": [...], "overall_risk": "high"}
---
Q: Analyze the following contract.
[contract content]
2.3. Excessive Instructions — Interferes with Model Reasoning
Detailed instructions like "think this way, analyze that way, answer in this order" help existing models. But what happens with Reasoning Models?
# Existing models — detailed instructions help
"Analyze in this order:
1. Problem definition
2. Hypothesis setting
3. Evidence review
4. Conclusion derivation
Output each step as a ## heading."
→ Model provides structured analysis as instructed ✅
# Reasoning Models — excessive instructions backfire
Same instructions
→ Model's internal reasoning becomes constrained by external instructions
→ Flexible reasoning paths are blocked, resulting in oversimplified answers ❌
With Reasoning Models, the key is to clarify the goal only and let the process be the model's responsibility.
3. Reasoning Model Prompting Strategies
3.1. Principle: Less is More
This is the core principle of Reasoning Model prompting.
# Existing model style — long, detailed prompt
You are a senior software engineer.
When reviewing code, follow this order:
1. First understand the overall structure
2. Confirm each function's role
3. Mark parts with bug potential
4. Finally suggest improvements
Answer in Korean only.
# Reasoning Model style — short, clear prompt
As a senior software engineer, review this code and present bugs and improvements in Korean.
The second prompt is shorter, but with Reasoning Models it often produces better results. The model finds the most appropriate reasoning path on its own.
3.2. Clarify Goals, Delegate Processes
# ❌ Specifying processes
"First parse the data, then detect anomalies,
and finally provide statistical summary."
# ✅ Clarifying goals only
"Detect anomalies in this sales data and provide a statistical summary."
# ❌ Specifying thinking approach
"Use inductive reasoning: analyze individual cases first, then derive general principles."
# ✅ Specifying results only
"Find common patterns from these 10 cases and organize them as principles."
3.3. Specify Output Format Clearly
Delegate the process, but specify the result format concretely. This part works the same as with existing models.
# No format specification — model decides arbitrarily
"Analyze risk factors in this contract."
→ Sometimes lists, sometimes narrative, sometimes tables
# Format clearly specified
"Analyze risk factors in this contract.
Output results only in the following JSON format:
{
  "high_risk": ["clause name and reason", ...],
  "medium_risk": ["clause name and reason", ...],
  "recommendations": ["modification suggestions", ...]
}"
3.4. Give Difficult Problems to See True Capability
Reasoning Models are overkill for simple tasks. Their true value emerges with complex problems.
# Tasks unsuitable for Reasoning Models
"Translate hello to English."
"Summarize this text."
"How do you sort a list in Python?"
→ Existing models suffice. Reasoning Models waste cost and time
# Tasks where Reasoning Models shine
"Analyze all security vulnerabilities in this system from every attack vector perspective."
"Derive all possible edge cases in this business logic."
"Mathematically prove the time complexity of this algorithm."
→ Complex reasoning tasks show clear performance gap over existing models
4. Claude Extended Thinking: Practical Usage
Claude Extended Thinking allows explicit control of the thinking block via API. This is a differentiating feature from other Reasoning Models.
4.1. Basic Structure
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # max tokens for internal reasoning
    },
    messages=[{
        "role": "user",
        "content": "Analyze the vulnerabilities in this encryption algorithm."
    }]
)

# thinking block and final answer are returned separately
for block in response.content:
    if block.type == "thinking":
        print("Internal reasoning:", block.thinking)
    elif block.type == "text":
        print("Final answer:", block.text)
4.2. budget_tokens Setting Guide
budget_tokens is the maximum tokens the model can use for internal reasoning. Adjust based on problem complexity.
| Task Type | Recommended budget_tokens | Examples |
|---|---|---|
| Simple reasoning | 1,000 ~ 3,000 | Simple math problems, basic code review |
| Medium complexity | 5,000 ~ 10,000 | Algorithm design, business logic analysis |
| High difficulty | 10,000 ~ 32,000 | Security audits, complex proofs, strategic planning |
# Simple task — small budget
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "Explain the recurrence relation of the Fibonacci sequence."}]
)

# Complex task — large budget
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 15000},
    messages=[{"role": "user", "content": "Identify all failure scenarios in this distributed system and design response strategies."}]
)
4.3. What to Avoid with Extended Thinking
# ❌ Attempting to change temperature with thinking enabled
# Extended Thinking is fixed at temperature=1. Cannot be changed.
response = client.messages.create(
    model="claude-opus-4-5",
    thinking={"type": "enabled", "budget_tokens": 5000},
    temperature=0.5,  # Error occurs
    ...
)

# ❌ Using a very large budget without streaming
# Response time may exceed tens of seconds → use streaming
response = client.messages.create(
    model="claude-opus-4-5",
    thinking={"type": "enabled", "budget_tokens": 30000},
    stream=True,  # Streaming essential for long reasoning
    ...
)
5. Reasoning Model Failure Patterns
5.1. Wasting on Simple Tasks
Reasoning Models are slower and more expensive than existing models. Using them for simple tasks wastes time and money.
# Reasoning Models are overkill for these questions
"How do you use the print function in Python?"
"What is JSON in one sentence?"
"Find the typo in this code."
# These are where they should be used
"Analyze architectural flaws and long-term maintenance risks in this codebase."
"Design an algorithm that satisfies all the following conditions and prove it."
5.2. The Mistake of Still Adding CoT
Out of old habit, people still add "step by step", "carefully", or "first do X, then do Y" to their prompts.
# ❌ Prompt with CoT instructions added
"Review this mathematical proof.
Step 1: Confirm prerequisites
Step 2: Review logic of each step
Step 3: Search for counterexamples
Step 4: Derive conclusion"
# ✅ Clarifying goal only
"Review whether this mathematical proof is correct. If there are errors, present them with counterexamples."
The second prompt is shorter but produces better results with Reasoning Models.
5.3. Ignoring Costs in Design
Reasoning Model internal reasoning tokens also incur costs. If all requests are designed to use Reasoning Models unconditionally, costs explode.
# ❌ Applying Extended Thinking to all requests
def handle_request(user_message: str) -> str:
    return claude_extended_thinking(user_message)  # Same for simple questions

# ✅ Branch by complexity
def handle_request(user_message: str) -> str:
    if needs_deep_reasoning(user_message):
        return claude_extended_thinking(user_message)  # Complex reasoning needed
    else:
        return claude_standard(user_message)  # Simple tasks use existing model
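The `needs_deep_reasoning` check above is left abstract. Here is one deliberately crude way it could look; the keyword list and length threshold are illustrative assumptions, and a production router might use a cheap classifier model instead:

```python
# Signals that a request probably needs multi-step reasoning (illustrative)
DEEP_KEYWORDS = (
    "prove", "edge case", "vulnerability", "architecture",
    "trade-off", "failure scenario",
)

def needs_deep_reasoning(user_message: str) -> bool:
    """Crude router: long messages or reasoning-heavy keywords → Reasoning Model."""
    text = user_message.lower()
    if len(text) > 2000:  # long, context-heavy requests
        return True
    return any(kw in text for kw in DEEP_KEYWORDS)

print(needs_deep_reasoning("What is JSON in one sentence?"))            # → False
print(needs_deep_reasoning("Prove the time complexity of this sort."))  # → True
```

Even a heuristic this simple keeps the common, cheap requests off the expensive path; the router only has to be right often enough to pay for itself.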
5.4. The Mistake of Not Specifying Output Format
Delegate the process, but always specify the output format. Without one, answers come back in a different structure each time.
# ❌ Format not specified
"Analyze the problems in this code."
→ Sometimes numbered lists, sometimes narrative, sometimes tables
# ✅ Format clearly specified
"Analyze the problems in this code.
Output in the following format only:
**Severity**: High/Medium/Low
**Issues**: (list)
**Fixes**: (with code)"
6. When to Use Reasoning Models and When Not to
6.1. Cases Where Reasoning Models Are Appropriate
✅ Mathematical proofs, complex calculations
✅ Multi-step code debugging (tracking bugs across multiple files)
✅ Comprehensive security vulnerability analysis
✅ Deriving all edge cases in complex business logic
✅ Decision analysis with intertwined conditions
✅ Detecting contradictions and logical errors in long documents
6.2. Cases Where Existing Models Are Better
❌ Simple translation, summarization, format conversion
❌ Short code writing (single function)
❌ FAQ answers, information retrieval
❌ Emotional writing, creative work
❌ Real-time responses where speed matters (chatbots)
❌ High-volume processing with budget constraints
6.3. Decision Criterion in One Line
"Is this a problem that expert humans would also need time to think through?"
If yes, use Reasoning Models. If no, existing models are better.
7. Complete Comparison: Existing Models vs Reasoning Models
| Aspect | Existing Models | Reasoning Models |
|---|---|---|
| Response method | Input → immediate output | Input → internal reasoning → output |
| CoT effectiveness | High (direct guidance needed) | Low to none (already built-in) |
| Few-shot effectiveness | High | Low (show format only) |
| Optimal prompt length | Longer is better | Short and clear is better |
| Response speed | Fast | Slow (reasoning time) |
| Cost | Low | High (includes reasoning tokens) |
| Suitable tasks | General tasks | Complex reasoning tasks |
| Creative writing | Good | Average |
| Math and logic proofs | Average | Excellent |
Conclusion: Understand the Tool's Premise
From episode 1 through 13, we covered how to write good prompts. Giving clear instructions, establishing structure, and providing examples were hallmarks of good prompting.
Reasoning Models change this premise. Reducing instructions, not imposing structure, and clarifying only the goal produce better results. They're still AI models, but their fundamental mode of operation differs.
The most dangerous mistake in prompt engineering is applying the old approach without understanding the tool's new premise, like driving a screw with a hammer.
The key when using Reasoning Models is simple: trust that the model thinks deeply on its own, clarify only the goal, and specify only the result format.
Summary of Core Principles
| Principle | Core |
|---|---|
| Less is More | Prompts should be short and clear. Minimize process instructions |
| No explicit CoT | Instructions like "think step by step" are unnecessary or counterproductive |
| Goals and Format | Delegate processes but always specify output format clearly |
| Hard Problems Only | Use only for tasks requiring complex reasoning. Use existing models for simple tasks |
| Cost Design | Never unconditionally apply to all requests. Branch by complexity required |
Toward Episode 15
Having covered Reasoning Models, the next frontier is prompting that expands beyond text to images, audio, and video.
"How does prompting differ when instructing AI with images or videos instead of text?"
In [Episode 15: Multimodal Prompting — AI Strategy Beyond Text with Images, Audio, and Video] we'll cover prompting strategies in multimodal environments beyond text.
