Dechive

Complete Guide to Reasoning Model Prompting — How to Use o1, o3, Claude Extended Thinking

AI that thinks for itself: Reasoning Models don't work with conventional prompt methods. Complete mastery of o1·o3·Claude Extended Thinking.

Introduction: The Question Left by Episode 13

In episode 13, while covering multi-agent systems, we left this preview:

"How can we make these agents perform more complex reasoning?"

Throughout this series so far, we've learned various prompting techniques. Clear instructions, step-by-step thinking with CoT, few-shot examples, role distinction with XML tags. All these techniques are based on one premise:

"The model thinks the way I guide it to."

But recently emerged AI models overturn this premise.

Before producing an answer, they think to themselves at length—sometimes using thousands of tokens. Without being asked to. These models are called Reasoning Models. OpenAI's o1·o3 and Claude's Extended Thinking are prime examples.

And these models don't respond well to conventional prompting methods. In fact, techniques that worked well before can sometimes diminish performance.

This episode covers from start to finish what Reasoning Models are, why conventional methods don't work, and how to prompt them correctly.


1. What is a Reasoning Model?

1.1. A One-Line Difference from Existing Models

The simplest explanation goes like this:

# Existing Models (GPT-4o, Claude Sonnet, etc.)
Input → [Single forward pass] → Output

# Reasoning Models (o1, o3, Extended Thinking)
Input → [Internal reasoning process (hundreds to thousands of tokens)] → Output

Existing models generate output immediately upon receiving input, like someone answering off the cuff.

Reasoning Models first "think" before outputting. This thinking process is either invisible to users (o1) or shown as a separate "thinking" block (Claude Extended Thinking). Only after thinking is complete do they produce the final answer.

1.2. What Happens Inside

The "thinking" process of a Reasoning Model resembles the Chain-of-Thought technique we covered in episode 5. The two are indeed related in principle, but there's an important difference:

CoT is a technique where we instruct the model to think step-by-step.
The internal reasoning of a Reasoning Model is behavior learned into the model itself.

# CoT Prompt — humans give instructions
"Solve the following problem step by step.
 Step 1: Identify conditions
 Step 2: Set up equations
 Step 3: Calculate"

# Reasoning Model — does it on its own
User: "Solve the problem for me"
Internal reasoning: (model independently repeats condition identification → equations → calculation)
Output: "The answer is 42."

By analogy with humans: CoT is telling an existing model "think carefully step by step" to make it think deeply. A Reasoning Model is a person who thinks that way naturally without being told.

1.3. Current State of Major Reasoning Models

| Model | Developer | Characteristics |
|---|---|---|
| o1 | OpenAI | First commercial Reasoning Model. Strong in math and coding |
| o1-mini | OpenAI | Lightweight version of o1. Trades capability for speed and cost |
| o3 | OpenAI | Successor to o1. Significant improvement in general reasoning ability |
| o3-mini | OpenAI | Lightweight version of o3 |
| Claude Extended Thinking | Anthropic | Supported from Claude 3.7 Sonnet onward. Thinking blocks controllable via API |
| DeepSeek R1 | DeepSeek | Open-source Reasoning Model developed by a Chinese company |

2. Why Existing Prompting Techniques Don't Work

2.1. "Think Step by Step" — They're Already Doing It

The core of CoT is prompting the model to think step-by-step with phrases like "Let's think step by step." But what happens when you add this instruction to a Reasoning Model?

# CoT applied to existing models — effective
"Find the bug in this code. Analyze step by step."
→ Model starts analyzing line by line ✅

# CoT applied to Reasoning Models — ineffective or backfires
"Find the bug in this code. Analyze step by step."
→ Model is already analyzing step-by-step internally
→ External instruction may conflict with internal reasoning ⚠️

Reasoning Models already reason step-by-step on their own. Adding CoT instructions provides no additional benefit or may even disrupt the model's internal reasoning flow.

2.2. Few-shot Examples — Can Be Harmful

Few-shot is a technique of showing desired answer formats through examples. It's highly effective with existing models. But adding few-shot with detailed reasoning processes to a Reasoning Model creates problems.

# Few-shot with existing models — effective
Example:
Q: What is 5 × 7?
A: Add 5 seven times. 5+5+5+5+5+5+5 = 35. Answer: 35
---
Q: What is 8 × 6?
→ Model follows example format and solves step-by-step ✅

# Same few-shot with Reasoning Model
→ Model's internal reasoning is already far more complex and sophisticated
→ Forcing a simple reasoning pattern example actually degrades performance ❌

For Reasoning Models, it's better to avoid few-shot entirely or, if used, show only input-output format without reasoning processes.

# Few-shot for Reasoning Models — show format only
Example (for format reference):
Q: Analyze contract
A: {"risk_clauses": [...], "recommended_changes": [...], "overall_risk": "high"}
---
Q: Analyze the following contract.
[contract content]

2.3. Excessive Instructions — Interferes with Model Reasoning

Detailed instructions like "think this way, analyze that way, answer in this order" help existing models. But what happens with Reasoning Models?

# Existing models — detailed instructions help
"Analyze in this order:
 1. Problem definition
 2. Hypothesis setting
 3. Evidence review
 4. Conclusion derivation
 Output each step as a ## heading."
→ Model provides structured analysis as instructed ✅

# Reasoning Models — excessive instructions backfire
Same instructions
→ Model's internal reasoning becomes constrained by external instructions
→ Flexible reasoning paths are blocked, resulting in oversimplified answers ❌

With Reasoning Models, the key is to clarify the goal only and let the process be the model's responsibility.


3. Reasoning Model Prompting Strategies

3.1. Principle: Less is More

This is the core principle of Reasoning Model prompting.

# Existing model style — long, detailed prompt
You are a senior software engineer.
When reviewing code, follow this order:
1. First understand the overall structure
2. Confirm each function's role
3. Mark parts with bug potential
4. Finally suggest improvements
Answer in Korean only.

# Reasoning Model style — short, clear prompt
As a senior software engineer, review this code and present bugs and improvements in Korean.

The second prompt is shorter, but with Reasoning Models it often produces better results. The model finds the most appropriate reasoning path on its own.

3.2. Clarify Goals, Delegate Processes

# ❌ Specifying processes
"First parse the data, then detect anomalies,
 and finally provide statistical summary."

# ✅ Clarifying goals only
"Detect anomalies in this sales data and provide a statistical summary."
# ❌ Specifying thinking approach
"Use inductive reasoning: analyze individual cases first, then derive general principles."

# ✅ Specifying results only
"Find common patterns from these 10 cases and organize them as principles."

3.3. Specify Output Format Clearly

Delegate the process, but specify the result format concretely. This is the same as with existing models.

# No format specification — model decides arbitrarily
"Analyze risk factors in this contract."
→ Sometimes lists, sometimes narrative, sometimes tables

# Format clearly specified
"Analyze risk factors in this contract.
 Output results only in the following JSON format:
 {
   "high_risk": ["clause name and reason", ...],
   "medium_risk": ["clause name and reason", ...],
   "recommendations": ["modification suggestions", ...]
 }"
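Even with a clearly specified format, reasoning models sometimes wrap the JSON in a sentence of prose, so it pays to validate before using the result. Here is a minimal sketch; the `extract_json` and `validate_analysis` helpers are illustrative (not part of any SDK), and the key names match the format above:

```python
import json

# The keys we asked the model for in the prompt above
REQUIRED_KEYS = {"high_risk", "medium_risk", "recommendations"}

def extract_json(text: str) -> dict:
    """Pull the first {...} span out of a model response and parse it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start : end + 1])

def validate_analysis(payload: dict) -> dict:
    """Check that the parsed object has exactly the keys we requested."""
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return payload

# A response that wraps the JSON in prose still parses cleanly:
raw = ('Here is the analysis:\n'
       '{"high_risk": ["clause 3: unlimited liability"], '
       '"medium_risk": [], "recommendations": ["cap liability"]}')
result = validate_analysis(extract_json(raw))
print(result["high_risk"])
```

Raising on a malformed response also gives you a natural retry point: re-send the request with a reminder to output JSON only.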

3.4. Give Difficult Problems to See True Capability

Reasoning Models are overkill for simple tasks. Their true value emerges with complex problems.

# Tasks unsuitable for Reasoning Models
"Translate hello to English."
"Summarize this text."
"How do you sort a list in Python?"
→ Existing models suffice. Reasoning Models waste cost and time

# Tasks where Reasoning Models shine
"Analyze all security vulnerabilities in this system from every attack vector perspective."
"Derive all possible edge cases in this business logic."
"Mathematically prove the time complexity of this algorithm."
→ Complex reasoning tasks show clear performance gap over existing models

4. Claude Extended Thinking: Practical Usage

Claude Extended Thinking allows explicit control of the thinking block via API. This is a differentiating feature from other Reasoning Models.

4.1. Basic Structure

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # max tokens for internal reasoning
    },
    messages=[{
        "role": "user",
        "content": "Analyze the vulnerabilities in this encryption algorithm."
    }]
)

# thinking block and final answer are returned separately
for block in response.content:
    if block.type == "thinking":
        print("Internal reasoning:", block.thinking)
    elif block.type == "text":
        print("Final answer:", block.text)

4.2. budget_tokens Setting Guide

budget_tokens is the maximum tokens the model can use for internal reasoning. Adjust based on problem complexity.

| Task Type | Recommended budget_tokens | Examples |
|---|---|---|
| Simple reasoning | 1,000 ~ 3,000 | Simple math problems, basic code review |
| Medium complexity | 5,000 ~ 10,000 | Algorithm design, business logic analysis |
| High difficulty | 10,000 ~ 32,000 | Security audits, complex proofs, strategic planning |

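The tiers above can be encoded as a small helper so the budget choice lives in one place. The tier names and exact numbers here are illustrative choices, not an official Anthropic setting:

```python
# Illustrative tiers matching the table above (values are our own picks)
BUDGET_TIERS = {
    "simple": 2_000,   # simple math, basic code review
    "medium": 8_000,   # algorithm design, business logic analysis
    "hard": 16_000,    # security audits, complex proofs, planning
}

def thinking_config(tier: str) -> dict:
    """Build the `thinking` parameter for client.messages.create()."""
    if tier not in BUDGET_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return {"type": "enabled", "budget_tokens": BUDGET_TIERS[tier]}

print(thinking_config("medium"))
# used as: client.messages.create(..., thinking=thinking_config("medium"))
```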
# Simple task — small budget
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "Explain the recurrence relation of the Fibonacci sequence."}]
)

# Complex task — large budget
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 15000},
    messages=[{"role": "user", "content": "Identify all failure scenarios in this distributed system and design response strategies."}]
)

4.3. What to Avoid with Extended Thinking

# ❌ Attempting to change temperature with thinking enabled
# Extended Thinking is fixed at temperature=1. Cannot be changed.
response = client.messages.create(
    model="claude-opus-4-5",
    thinking={"type": "enabled", "budget_tokens": 5000},
    temperature=0.5,  # Error occurs
    ...
)

# ❌ Using very large budget without streaming
# Response time may exceed tens of seconds → use streaming
response = client.messages.create(
    model="claude-opus-4-5",
    thinking={"type": "enabled", "budget_tokens": 30000},
    stream=True,  # Streaming essential for long reasoning
    ...
)
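When streaming, thinking and answer text arrive as separate delta types, so the consumer has to route them. The sketch below separates the two with a plain helper of our own (`collect_deltas`); the event shapes in the comment follow the Anthropic streaming API, where `content_block_delta` events carry either a `thinking_delta` or a `text_delta`:

```python
def collect_deltas(events):
    """Split a stream of (delta_type, text) pairs into reasoning vs. answer."""
    thinking, answer = [], []
    for delta_type, text in events:
        if delta_type == "thinking_delta":
            thinking.append(text)
        elif delta_type == "text_delta":
            answer.append(text)
    return "".join(thinking), "".join(answer)

# With the real SDK this loop would be fed from the stream, roughly:
#   with client.messages.stream(model=..., max_tokens=...,
#                               thinking=..., messages=...) as s:
#       for event in s:
#           if event.type == "content_block_delta":
#               if event.delta.type == "thinking_delta":
#                   ...  # event.delta.thinking
#               elif event.delta.type == "text_delta":
#                   ...  # event.delta.text
fake_events = [
    ("thinking_delta", "Check key size... "),
    ("thinking_delta", "rotation policy looks weak."),
    ("text_delta", "The key rotation interval is the main weakness."),
]
thinking, answer = collect_deltas(fake_events)
print(answer)
```

Showing the accumulated answer text incrementally (and hiding or summarizing the thinking deltas) keeps the UI responsive during a long reasoning phase.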

5. Reasoning Model Failure Patterns

5.1. Wasting on Simple Tasks

Reasoning Models are slower and more expensive than existing models. Using them for simple tasks wastes time and money.

# Reasoning Models are overkill for these questions
"How do you use the print function in Python?"
"What is JSON in one sentence?"
"Find the typo in this code."

# These are where they should be used
"Analyze architectural flaws and long-term maintenance risks in this codebase."
"Design an algorithm that satisfies all the following conditions and prove it."

5.2. The Mistake of Still Adding CoT

Out of old habit, it's easy to keep adding "step by step", "carefully", or "first do X, then do Y" to prompts.

# ❌ Prompt with CoT instructions added
"Review this mathematical proof.
 Step 1: Confirm prerequisites
 Step 2: Review logic of each step
 Step 3: Search for counterexamples
 Step 4: Derive conclusion"

# ✅ Clarifying goal only
"Review whether this mathematical proof is correct. If there are errors, present them with counterexamples."

The second prompt is shorter but produces better results with Reasoning Models.

5.3. Ignoring Costs in Design

Reasoning Model internal reasoning tokens also incur costs. If all requests are designed to use Reasoning Models unconditionally, costs explode.

# ❌ Applying Extended Thinking to all requests
def handle_request(user_message: str) -> str:
    return claude_extended_thinking(user_message)  # Same for simple questions

# ✅ Branch by complexity
def handle_request(user_message: str) -> str:
    if needs_deep_reasoning(user_message):
        return claude_extended_thinking(user_message)  # Complex reasoning needed
    else:
        return claude_standard(user_message)  # Simple tasks use existing model
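The `needs_deep_reasoning` gate above can start as something very simple. This sketch uses a keyword list and a length threshold that are purely illustrative heuristics we chose; in production you might instead route with a cheap classifier model:

```python
# Hypothetical heuristic: route by reasoning-heavy keywords and prompt length
REASONING_HINTS = ("prove", "edge case", "vulnerabilit", "architect",
                   "trade-off", "design an algorithm", "failure scenario")

def needs_deep_reasoning(user_message: str, min_length: int = 400) -> bool:
    """Heuristic gate: long prompts or reasoning-heavy keywords go to the
    Reasoning Model; everything else goes to the cheaper standard model."""
    msg = user_message.lower()
    return len(msg) >= min_length or any(h in msg for h in REASONING_HINTS)

print(needs_deep_reasoning("How do you use the print function in Python?"))  # → False
print(needs_deep_reasoning("Prove the time complexity of this algorithm."))  # → True
```

Misrouting here is cheap in one direction (a simple question answered slowly) and cheap in the other (a hard question answered shallowly, which the user can retry), so a rough heuristic is a reasonable starting point.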

5.4. The Mistake of Not Specifying Output Format

Delegate the process, but always specify the output format. Without a format specification, answers come back in a different structure each time.

# ❌ Format not specified
"Analyze the problems in this code."
→ Sometimes numbered lists, sometimes narrative, sometimes tables

# ✅ Format clearly specified
"Analyze the problems in this code.
 Output in the following format only:
 **Severity**: High/Medium/Low
 **Issues**: (list)
 **Fixes**: (with code)"

6. When to Use Reasoning Models and When Not to

6.1. Cases Where Reasoning Models Are Appropriate

✅ Mathematical proofs, complex calculations
✅ Multi-step code debugging (tracking bugs across multiple files)
✅ Comprehensive security vulnerability analysis
✅ Deriving all edge cases in complex business logic
✅ Decision analysis with intertwined conditions
✅ Detecting contradictions and logical errors in long documents

6.2. Cases Where Existing Models Are Better

❌ Simple translation, summarization, format conversion
❌ Short code writing (single function)
❌ FAQ answers, information retrieval
❌ Emotional writing, creative work
❌ Real-time responses where speed matters (chatbots)
❌ High-volume processing with budget constraints

6.3. Decision Criterion in One Line

"Is this a problem that expert humans would also need time to think through?"

If yes, use Reasoning Models. If no, existing models are better.


7. Complete Comparison: Existing Models vs Reasoning Models

| Aspect | Existing Models | Reasoning Models |
|---|---|---|
| Response method | Input → immediate output | Input → internal reasoning → output |
| CoT effectiveness | High (direct guidance needed) | None (already built-in) |
| Few-shot effectiveness | High | Low (show format only) |
| Optimal prompt length | Longer is better | Short and clear is better |
| Response speed | Fast | Slow (reasoning time) |
| Cost | Low | High (includes reasoning tokens) |
| Suitable tasks | General tasks | Complex reasoning tasks |
| Creative writing | Good | Average |
| Math and logic proofs | Average | Excellent |

Conclusion: Understand the Tool's Premise

From episode 1 through 13, we covered how to write good prompts. Giving clear instructions, establishing structure, and providing examples were hallmarks of good prompting.

Reasoning Models overturn this premise. Fewer instructions, less imposed structure, and a clearly stated goal produce better results. They are still AI models, but their fundamental mode of operation differs.

The most dangerous mistake in prompt engineering is applying the old approach without understanding the tool's new premise. It's like trying to drive a screw with a hammer.

The key when using Reasoning Models is simple: trust that the model thinks deeply on its own, clarify only the goal, and specify only the result format.

Summary of Core Principles

| Principle | Core |
|---|---|
| Less is More | Prompts should be short and clear. Minimize process instructions |
| CoT Forbidden | Instructions like "think step by step" are unnecessary or counterproductive |
| Goals and Format | Delegate processes but always specify output format clearly |
| Hard Problems Only | Use only for tasks requiring complex reasoning. Use existing models for simple tasks |
| Cost Design | Never unconditionally apply to all requests. Branch by complexity |

Toward Episode 15

Having covered Reasoning Models, the remaining frontier is prompting that expands beyond text to images, audio, and video.

"How does prompting differ when instructing AI with images or videos instead of text?"

In [Episode 15: Multimodal Prompting — AI Strategy Beyond Text with Images, Audio, and Video] we'll cover prompting strategies in multimodal environments beyond text.
