SQL Complete Mastery Part 2: Everything About Relational Databases and SQL — From Definitions to Data Types
Definition of database, history, characteristics of relational DB, the birth of SQL and types of commands, table and data types all organized at once
Introduction
In Part 1, we covered the concepts of data modeling and ERD. It was about designing how to structure data.
This part enters the space where we actually implement that design — the database. We'll cover from the beginning what a database is, how it has evolved, and where the SQL we use every day came from.
1. What is a Database — Why Are There Multiple Definitions
It might seem easy to define a database in one line, but in reality, the definition varies depending on perspective. Understanding what each definition emphasizes reveals the true nature of databases.
Definition 1: Integrated Data
Data scattered in various places is collected into one place to minimize duplication. If the same data is stored separately in multiple files, the moment you miss updating one, the data becomes inconsistent. A database controls this duplication.
Definition 2: Stored Data
Data recorded on a storage medium that a computer can access. It must be physically stored on a disk or in memory, not in someone's head or on paper.
Definition 3: Operational Data
It's not just about storing data, but rather data that is essential for an organization to conduct its business. Not temporary files or test dummy data, but data actually used in real business operations.
Definition 4: Shared Data
Data used together by multiple users and applications. Unlike a personal Excel file, a database must allow multiple people to access and modify simultaneously.
Combining the Definitions
A database is a collection of data that is integrated and managed for the purpose of being shared and used by multiple people.
This is the most comprehensive definition. The key is the three words: sharing, integration, management.
2. The Evolution of Databases
Databases haven't always looked like they do now. They have evolved over decades.
1960s: The File System Era
The way computers stored data was initially just simple files. Each program created its own file to store data. The problem was clear. The same data was stored redundantly across files, and whenever a program changed, the file structure had to change too. Data and programs were tightly coupled.
Late 1960s: The Emergence of the First DBMS
The true database era begins when IBM develops IMS (Information Management System). IMS used a Hierarchical Model. This method of storing data in a tree structure was efficient for data with clear parent-child relationships. It was first used in managing components for the Apollo moon exploration project.
1970s: The Birth of the Relational Model
In 1970, IBM researcher Edgar F. Codd publishes a paper.
"A Relational Model of Data for Large Shared Data Banks"
This paper changes the history of databases. It proposes a relational model that stores data in table form rather than trees or graphs, and handles data based on mathematical set theory. The theory existed, but actual implementation was yet to come.
Late 1970s~1980s: The Birth of SQL and Commercial RDBMS
Based on Codd's theory, IBM starts the System R project, and SEQUEL, the predecessor of SQL (Structured Query Language), is born. Later, Oracle releases the first commercial relational database, and IBM's DB2 follows. Relational databases begin to establish themselves as the standard.
1990s: The Golden Age of RDBMS
With the emergence of Oracle, MySQL, PostgreSQL, Microsoft SQL Server, and others, relational databases become the core of enterprise IT infrastructure. SQL is established as an ANSI/ISO standard, bringing linguistic unity.
2000s and Beyond: The Emergence and Coexistence of NoSQL
As the internet explodes in growth, massive unstructured data that existing RDBMS struggled to process pours in. Google, Amazon, and Facebook develop their own distributed databases, and NoSQL databases like MongoDB, Redis, and Cassandra emerge. Relational DBs haven't disappeared; rather, it became an era of choosing the right tool for the job.
3. Characteristics of Relational Databases
There's a reason why relational databases (RDBMS: Relational Database Management System) have survived for over 50 years.
Characteristic 1: Data is represented in tables
All data is stored in tables consisting of rows and columns. It's intuitive and easy to understand. If you've worked with Excel, the structure of a table is familiar.
Characteristic 2: Relationships between data are defined
Tables don't exist independently; they are connected through Foreign Keys. For example, a customer table and an order table are connected by customer ID. This relationship allows for flexible representation of complex data structures.
Characteristic 3: A standard language called SQL is used
SQL, a standard language, is used to query, insert, modify, and delete data. SQL learned in Oracle works mostly the same in MySQL and PostgreSQL. There's no need to learn a new language when switching databases.
Characteristic 4: Integrity is guaranteed
Data Integrity means that data always maintains an accurate and consistent state. RDBMS enforces this through various constraints.
- Primary Key: Uniquely identifies each row
- Foreign Key: Maintains consistency in relationships between tables
- NOT NULL: Prevents empty values
- UNIQUE: Prevents duplicate values
Characteristic 5: Transactions are supported
A Transaction is a unit of work that guarantees "all succeed or all fail". For example, with a money transfer — the withdrawal from account A and the deposit to account B must succeed together. If an error occurs, both are cancelled. This is the ACID principle.
- Atomicity: All succeed or all fail
- Consistency: Data integrity is maintained before and after the transaction
- Isolation: Concurrently running transactions don't affect each other
- Durability: Completed transactions are permanently reflected
4. The Background and Characteristics of SQL
Where Did SQL Come From
After Codd's relational model paper in 1970, IBM begins the System R project to actually implement it. During this process, a language was needed to handle data, and SEQUEL (Structured English Query Language) is developed. Later, due to trademark issues, the name is changed to SQL (Structured Query Language).
SQL becomes the lingua franca of database languages when ANSI adopts it in 1986 and ISO in 1987.
Characteristics of SQL
1. Non-Procedural Language
It specifies "what to retrieve" rather than "how to retrieve". General programming languages require you to write loops, conditionals, and pointer manipulations to fetch data. With SQL, you just declare the desired result, and the DBMS figures out the optimal way.
-- "Fetch the name and email of customers aged 20 and above"
-- The DBMS decides how to find them
SELECT name, email FROM customers WHERE age >= 20;
2. Set-Oriented
SQL processes data not one at a time, but at the set level. It fetches all matching rows at once, modifies them at once, and deletes them at once.
3. Case Insensitive
SELECT and select are the same. By convention, SQL keywords are often written in uppercase, but it's not enforced.
4. Standards exist, but dialects do too
While ANSI SQL is the standard, each DBMS has its own extended syntax — Oracle's PL/SQL, Microsoft's T-SQL, MySQL's dialect, and so on. Basic syntax is mostly compatible, but differences appear in advanced features.
5. Types of SQL Statements
SQL statements are classified into four categories by purpose.
DDL (Data Definition Language)
Commands that define the structure of a database such as tables, indexes, and views.
| Command | Description |
|---|---|
CREATE | Create a new table, database |
ALTER | Modify an existing structure (add/remove columns, etc.) |
DROP | Delete a table, database |
TRUNCATE | Keep the table structure but delete all data |
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE
);
DDL is automatically committed. You cannot undo execution. Use DROP carefully.
DML (Data Manipulation Language)
Commands that retrieve, insert, modify, and delete data within tables. This is the most commonly used in practice.
| Command | Description |
|---|---|
SELECT | Retrieve data |
INSERT | Insert data |
UPDATE | Modify data |
DELETE | Delete data |
-- Retrieve
SELECT * FROM users WHERE age > 20;
-- Insert
INSERT INTO users (id, name, email) VALUES (1, 'Hong Gildong', 'hong@email.com');
-- Update
UPDATE users SET email = 'new@email.com' WHERE id = 1;
-- Delete
DELETE FROM users WHERE id = 1;
DCL (Data Control Language)
Commands that control access permissions to the database.
| Command | Description |
|---|---|
GRANT | Grant permissions |
REVOKE | Revoke permissions |
-- Grant SELECT permission to a user
GRANT SELECT ON users TO 'john'@'localhost';
-- Revoke permission
REVOKE SELECT ON users FROM 'john'@'localhost';
TCL (Transaction Control Language)
Commands that control transactions. Some classifications include this in DCL, but many treat it separately.
| Command | Description |
|---|---|
COMMIT | Confirm transaction (permanently apply) |
ROLLBACK | Cancel transaction (undo changes) |
SAVEPOINT | Set a rollback point |
BEGIN;
UPDATE accounts SET balance = balance - 10000 WHERE id = 1;
UPDATE accounts SET balance = balance + 10000 WHERE id = 2;
COMMIT; -- If both succeed, confirm
-- On error: ROLLBACK; -- Cancel both
6. What is a Table
In a relational database, all data is stored in tables. A table is the most basic unit of data storage.
Components of a Table
┌──────┬──────────┬─────────────────────┬─────┐
│ id │ name │ email │ age │ ← Column (Field, Attribute)
├──────┼──────────┼─────────────────────┼─────┤
│ 1 │ Hong GD │ hong@example.com │ 28 │ ← Row (Record, Tuple)
│ 2 │ Kim CS │ kim@example.com │ 35 │
│ 3 │ Lee YH │ lee@example.com │ 22 │
└──────┴──────────┴─────────────────────┴─────┘
- Column: Defines an attribute of the data. Each column has a name and data type.
- Row: A single piece of actual data. Each row has values for all columns in the table.
- Cell: A single value at the intersection of a row and column.
Important Concepts in Table Design
Primary Key: A column that uniquely identifies each row. It cannot be duplicated or NULL. In the example above, id serves as the primary key.
Foreign Key: A column that references the primary key of another table. It creates relationships between tables.
-- The user_id column in the orders table references the id column in the users table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
product VARCHAR(100),
FOREIGN KEY (user_id) REFERENCES users(id)
);
7. Data Types — Importance and Kinds
Why Data Types Matter
When defining a column, you must specify a Data Type. Data types aren't just about matching format.
1. Storage Space Efficiency: Storing numbers as characters wastes unnecessary space.
2. Calculation Accuracy: Only numeric types allow mathematical operations. Comparing the character-stored "10" and "9" results in "9" > "10" by string standards.
3. Index Performance: Using the right type makes indexes work more efficiently.
4. Data Integrity: It automatically prevents "Hello" from being entered into a date column.
Major Data Types
Numeric
| Type | Description | Range |
|---|---|---|
INT / INTEGER | Integer | Approx. ±2.1 billion |
BIGINT | Large integer | Approx. ±920 quadrillion |
DECIMAL(p, s) | Fixed-point decimal | Use for monetary calculations |
FLOAT / DOUBLE | Floating-point | Approximate values, scientific calculations |
Always use DECIMAL for amounts. FLOAT causes problems like 0.1 + 0.2 = 0.30000000000000004 due to floating-point errors.
Character
| Type | Description |
|---|---|
CHAR(n) | Fixed-length string |
VARCHAR(n) | Variable-length string |
TEXT | Long string with no length limit |
Date/Time
| Type | Description | Example |
|---|---|---|
DATE | Date only | 2026-04-09 |
TIME | Time only | 14:30:00 |
DATETIME | Date + time | 2026-04-09 14:30:00 |
TIMESTAMP | DATETIME + timezone | Can automatically record current time |
Boolean
| Type | Description |
|---|---|
BOOLEAN / BOOL | True/false (MySQL treats as TINYINT(1)) |
8. CHAR vs VARCHAR — When to Use What
This is the most frequently confused type. Both store strings, but they work differently.
CHAR(n) — Fixed Length
Always occupies the declared length n. If the actual data is shorter, the rest is padded with spaces.
-- Storing 'ABC' in CHAR(10)
-- 'ABC ' (7 spaces added, uses 10 bytes total)
When to use:
- Data with always consistent length — social security numbers (13 digits), country codes (
KR,US), gender codes (M,F) - Short code values with fixed length
Advantage: Fixed length makes storage and retrieval faster. Good index performance.
VARCHAR(n) — Variable Length
Stores only as much space as the actual data needs. However, there's overhead for storing length information.
-- Storing 'ABC' in VARCHAR(100)
-- 'ABC' + length info (uses approximately 3-4 bytes)
When to use:
- Data with varying length — names, emails, addresses, titles
- Most general string data
Advantage: Storage space efficient.
Quick Comparison
| Aspect | CHAR(10) | VARCHAR(10) |
|---|---|---|
| Storing 'A' | 10 bytes | 2 bytes (1 + length info) |
| Storing 'ABCDEFGHIJ' | 10 bytes | 11 bytes (10 + length info) |
| Speed | Slightly faster | Slightly slower |
| Space efficiency | Low | High |
| Suitable data | Fixed-length codes | General strings |
Practical Decision Making
Is the length always the same?
→ YES: CHAR
→ NO: VARCHAR
Can the data length exceed 255 characters?
→ YES: TEXT
→ NO: VARCHAR
Conclusion
A database is not just a storage facility. It originates from Codd's theory, becomes concrete through the SQL language, and holds data through the structure of tables and data types.
The content covered in this part is the most fundamental foundation for learning SQL. In the next part, we'll cover practical SQL syntax — actually creating tables, retrieving data, applying conditions, and connecting multiple tables.
