SQL Complete Mastery Part 2: Everything About Relational Databases and SQL — From Definitions to Data Types

Introduction

In Part 1, we covered the concepts of data modeling and ERD. It was about designing how to structure data.

This part enters the space where we actually implement that design — the database. We'll cover from the beginning what a database is, how it has evolved, and where the SQL we use every day came from.

1. What is a Database — Why Are There Multiple Definitions

It might seem easy to define a database in one line, but in reality, the definition varies depending on perspective. Understanding what each definition emphasizes reveals the true nature of databases.

Definition 1: Integrated Data

Data scattered in various places is collected into one place to minimize duplication. If the same data is stored separately in multiple files, the moment you miss updating one, the data becomes inconsistent. A database controls this duplication.

Definition 2: Stored Data

Data recorded on a storage medium that a computer can access. It must be physically stored on a disk or in memory, not in someone's head or on paper.

Definition 3: Operational Data

It's not just about storing data, but rather data that is essential for an organization to conduct its business. Not temporary files or test dummy data, but data actually used in real business operations.

Definition 4: Shared Data

Data used together by multiple users and applications. Unlike a personal Excel file, a database must allow multiple people to access and modify simultaneously.

Combining the Definitions

A database is a collection of data that is integrated and managed for the purpose of being shared and used by multiple people.

This is the most comprehensive definition. The key is the three words: sharing, integration, management.

2. The Evolution of Databases

Databases haven't always looked like they do now. They have evolved over decades.

1960s: The File System Era

The way computers stored data was initially just simple files. Each program created its own file to store data. The problem was clear. The same data was stored redundantly across files, and whenever a program changed, the file structure had to change too. Data and programs were tightly coupled.

Late 1960s: The Emergence of the First DBMS

The true database era begins when IBM develops IMS (Information Management System). IMS used a Hierarchical Model. This method of storing data in a tree structure was efficient for data with clear parent-child relationships. It was first used in managing components for the Apollo moon exploration project.

1970s: The Birth of the Relational Model

In 1970, IBM researcher Edgar F. Codd publishes a paper.

"A Relational Model of Data for Large Shared Data Banks"

This paper changes the history of databases. It proposes a relational model that stores data in table form rather than trees or graphs, and handles data based on mathematical set theory. The theory existed, but actual implementation was yet to come.

Late 1970s~1980s: The Birth of SQL and Commercial RDBMS

Based on Codd's theory, IBM starts the System R project, and SEQUEL, the predecessor of SQL (Structured Query Language), is born. Later, Oracle releases the first commercial relational database, and IBM's DB2 follows. Relational databases begin to establish themselves as the standard.

1990s: The Golden Age of RDBMS

With the emergence of Oracle, MySQL, PostgreSQL, Microsoft SQL Server, and others, relational databases become the core of enterprise IT infrastructure. SQL is established as an ANSI/ISO standard, bringing linguistic unity.

2000s and Beyond: The Emergence and Coexistence of NoSQL

As the internet explodes in growth, massive unstructured data that existing RDBMS struggled to process pours in. Google, Amazon, and Facebook develop their own distributed databases, and NoSQL databases like MongoDB, Redis, and Cassandra emerge. Relational DBs haven't disappeared; rather, it became an era of choosing the right tool for the job.

3. Characteristics of Relational Databases

There's a reason why relational databases (RDBMS: Relational Database Management System) have survived for over 50 years.

Characteristic 1: Data is represented in tables

All data is stored in tables consisting of rows and columns. It's intuitive and easy to understand. If you've worked with Excel, the structure of a table is familiar.

Characteristic 2: Relationships between data are defined

Tables don't exist independently; they are connected through Foreign Keys. For example, a customer table and an order table are connected by customer ID. This relationship allows for flexible representation of complex data structures.

Characteristic 3: A standard language called SQL is used

SQL, a standard language, is used to query, insert, modify, and delete data. SQL learned in Oracle works mostly the same in MySQL and PostgreSQL. There's no need to learn a new language when switching databases.

Characteristic 4: Integrity is guaranteed

Data Integrity means that data always maintains an accurate and consistent state. RDBMS enforces this through various constraints.

Primary Key: Uniquely identifies each row
Foreign Key: Maintains consistency in relationships between tables
NOT NULL: Prevents empty values
UNIQUE: Prevents duplicate values

Characteristic 5: Transactions are supported

A Transaction is a unit of work that guarantees "all succeed or all fail". For example, with a money transfer — the withdrawal from account A and the deposit to account B must succeed together. If an error occurs, both are cancelled. This is the ACID principle.

Atomicity: All succeed or all fail
Consistency: Data integrity is maintained before and after the transaction
Isolation: Concurrently running transactions don't affect each other
Durability: Completed transactions are permanently reflected

4. The Background and Characteristics of SQL

Where Did SQL Come From

After Codd's relational model paper in 1970, IBM begins the System R project to actually implement it. During this process, a language was needed to handle data, and SEQUEL (Structured English Query Language) is developed. Later, due to trademark issues, the name is changed to SQL (Structured Query Language).

SQL becomes the lingua franca of database languages when ANSI adopts it in 1986 and ISO in 1987.

Characteristics of SQL

1. Non-Procedural Language

It specifies "what to retrieve" rather than "how to retrieve". General programming languages require you to write loops, conditionals, and pointer manipulations to fetch data. With SQL, you just declare the desired result, and the DBMS figures out the optimal way.

-- "Fetch the name and email of customers aged 20 and above"
-- The DBMS decides how to find them
SELECT name, email FROM customers WHERE age >= 20;

2. Set-Oriented

SQL processes data not one at a time, but at the set level. It fetches all matching rows at once, modifies them at once, and deletes them at once.

3. Case Insensitive

SELECT and select are the same. By convention, SQL keywords are often written in uppercase, but it's not enforced.

4. Standards exist, but dialects do too

While ANSI SQL is the standard, each DBMS has its own extended syntax — Oracle's PL/SQL, Microsoft's T-SQL, MySQL's dialect, and so on. Basic syntax is mostly compatible, but differences appear in advanced features.

5. Types of SQL Statements

SQL statements are classified into four categories by purpose.

DDL (Data Definition Language)

Commands that define the structure of a database such as tables, indexes, and views.

Command	Description
`CREATE`	Create a new table, database
`ALTER`	Modify an existing structure (add/remove columns, etc.)
`DROP`	Delete a table, database
`TRUNCATE`	Keep the table structure but delete all data

CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  email VARCHAR(100) UNIQUE
);

DDL is automatically committed. You cannot undo execution. Use DROP carefully.

DML (Data Manipulation Language)

Commands that retrieve, insert, modify, and delete data within tables. This is the most commonly used in practice.

Command	Description
`SELECT`	Retrieve data
`INSERT`	Insert data
`UPDATE`	Modify data
`DELETE`	Delete data

-- Retrieve
SELECT * FROM users WHERE age > 20;

-- Insert
INSERT INTO users (id, name, email) VALUES (1, 'Hong Gildong', 'hong@email.com');

-- Update
UPDATE users SET email = 'new@email.com' WHERE id = 1;

-- Delete
DELETE FROM users WHERE id = 1;

DCL (Data Control Language)

Commands that control access permissions to the database.

Command	Description
`GRANT`	Grant permissions
`REVOKE`	Revoke permissions

-- Grant SELECT permission to a user
GRANT SELECT ON users TO 'john'@'localhost';

-- Revoke permission
REVOKE SELECT ON users FROM 'john'@'localhost';

TCL (Transaction Control Language)

Commands that control transactions. Some classifications include this in DCL, but many treat it separately.

Command	Description
`COMMIT`	Confirm transaction (permanently apply)
`ROLLBACK`	Cancel transaction (undo changes)
`SAVEPOINT`	Set a rollback point

BEGIN;
UPDATE accounts SET balance = balance - 10000 WHERE id = 1;
UPDATE accounts SET balance = balance + 10000 WHERE id = 2;
COMMIT; -- If both succeed, confirm
-- On error: ROLLBACK; -- Cancel both

6. What is a Table

In a relational database, all data is stored in tables. A table is the most basic unit of data storage.

Components of a Table

┌──────┬──────────┬─────────────────────┬─────┐
│  id  │   name   │        email        │ age │  ← Column (Field, Attribute)
├──────┼──────────┼─────────────────────┼─────┤
│  1   │ Hong GD  │ hong@example.com    │ 28  │  ← Row (Record, Tuple)
│  2   │ Kim CS   │ kim@example.com     │ 35  │
│  3   │ Lee YH   │ lee@example.com     │ 22  │
└──────┴──────────┴─────────────────────┴─────┘

Column: Defines an attribute of the data. Each column has a name and data type.
Row: A single piece of actual data. Each row has values for all columns in the table.
Cell: A single value at the intersection of a row and column.

Important Concepts in Table Design

Primary Key: A column that uniquely identifies each row. It cannot be duplicated or NULL. In the example above, id serves as the primary key.

Foreign Key: A column that references the primary key of another table. It creates relationships between tables.

-- The user_id column in the orders table references the id column in the users table
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  user_id INT,
  product VARCHAR(100),
  FOREIGN KEY (user_id) REFERENCES users(id)
);

7. Data Types — Importance and Kinds

Why Data Types Matter

When defining a column, you must specify a Data Type. Data types aren't just about matching format.

1. Storage Space Efficiency: Storing numbers as characters wastes unnecessary space.

2. Calculation Accuracy: Only numeric types allow mathematical operations. Comparing the character-stored "10" and "9" results in "9" > "10" by string standards.

3. Index Performance: Using the right type makes indexes work more efficiently.

4. Data Integrity: It automatically prevents "Hello" from being entered into a date column.

Major Data Types

Numeric

Type	Description	Range
`INT` / `INTEGER`	Integer	Approx. ±2.1 billion
`BIGINT`	Large integer	Approx. ±920 quadrillion
`DECIMAL(p, s)`	Fixed-point decimal	Use for monetary calculations
`FLOAT` / `DOUBLE`	Floating-point	Approximate values, scientific calculations

Always use DECIMAL for amounts. FLOAT causes problems like 0.1 + 0.2 = 0.30000000000000004 due to floating-point errors.

Character

Type	Description
`CHAR(n)`	Fixed-length string
`VARCHAR(n)`	Variable-length string
`TEXT`	Long string with no length limit

Date/Time

Type	Description	Example
`DATE`	Date only	`2026-04-09`
`TIME`	Time only	`14:30:00`
`DATETIME`	Date + time	`2026-04-09 14:30:00`
`TIMESTAMP`	DATETIME + timezone	Can automatically record current time

Boolean

Type	Description
`BOOLEAN` / `BOOL`	True/false (MySQL treats as `TINYINT(1)`)

8. CHAR vs VARCHAR — When to Use What

This is the most frequently confused type. Both store strings, but they work differently.

CHAR(n) — Fixed Length

Always occupies the declared length n. If the actual data is shorter, the rest is padded with spaces.

-- Storing 'ABC' in CHAR(10)
-- 'ABC       ' (7 spaces added, uses 10 bytes total)

When to use:

Data with always consistent length — social security numbers (13 digits), country codes (KR, US), gender codes (M, F)
Short code values with fixed length

Advantage: Fixed length makes storage and retrieval faster. Good index performance.

VARCHAR(n) — Variable Length

Stores only as much space as the actual data needs. However, there's overhead for storing length information.

-- Storing 'ABC' in VARCHAR(100)
-- 'ABC' + length info (uses approximately 3-4 bytes)

When to use:

Data with varying length — names, emails, addresses, titles
Most general string data

Advantage: Storage space efficient.

Quick Comparison

Aspect	CHAR(10)	VARCHAR(10)
Storing 'A'	10 bytes	2 bytes (1 + length info)
Storing 'ABCDEFGHIJ'	10 bytes	11 bytes (10 + length info)
Speed	Slightly faster	Slightly slower
Space efficiency	Low	High
Suitable data	Fixed-length codes	General strings

Practical Decision Making

Is the length always the same?
  → YES: CHAR
  → NO:  VARCHAR

Can the data length exceed 255 characters?
  → YES: TEXT
  → NO:  VARCHAR

Conclusion

A database is not just a storage facility. It originates from Codd's theory, becomes concrete through the SQL language, and holds data through the structure of tables and data types.

The content covered in this part is the most fundamental foundation for learning SQL. In the next part, we'll cover practical SQL syntax — actually creating tables, retrieving data, applying conditions, and connecting multiple tables.