Dechive
Dev #GA4 #Google Analytics #Data Analysis #Web Analytics #Marketing

What is GA4 — Complete Mastery of Google Analytics 4 Concepts and Structure Part 1

Complete Mastery of GA4 from Beginning to End: What It Is, Why It Was Created, and How It Works

You open GA4 every day and look at the numbers: visitors, sessions, events. But do you know where and how those numbers are created?

You can run a site without knowing this. But there is a big difference between reading the numbers with an understanding of GA4's structure and just looking at them. Given the same data, some people extract insights while others say "visitors increased" and move on.

This series covers GA4 from beginning to end. In Part 1, we'll explore why GA4 was created, what's different from the previous version, and how data is collected and stored structurally. Once you grasp the concepts, you'll be able to read subsequent reports correctly.


The Background of GA4's Emergence

The history of Google Analytics began in 2005, when Google acquired Urchin Software. The tool went on to become the standard in web analytics. In 2012 it received a major upgrade under the name UA (Universal Analytics), and for about 10 years it was installed on tens of millions of sites worldwide.

Then in 2020, Google released a completely redesigned analytics tool: GA4.

Why did Google abandon the well-functioning UA and create something new? There are three reasons.

The Rise of Smartphones and Apps

When UA was being designed, most web usage happened on PC browsers. But as smartphones became widespread, the situation changed. People started using the internet through apps, mobile browsers, and tablets.

UA was structurally designed as a tool for web browsers. It was very difficult to integrate and analyze apps and web in a single tool. For example, if a user viewed a product on a smartphone app and later purchased it on a PC browser, UA had difficulty connecting these two actions to the same person. This was because the devices were different, the sessions were different, and the cookies were different.

Strengthening Privacy Regulations

UA relied heavily on browser cookies to identify users. But from the late 2010s onward, restrictions on cookies tightened worldwide.

Europe's GDPR, Apple's ITP (Intelligent Tracking Prevention), and Chrome's announcement that it would phase out third-party cookies all pointed in the same direction. This trend undermined cookie-based tracking itself, and UA's structure made it hard to adapt.

The Need for Machine Learning and Predictive Analytics

The paradigm of data analysis was changing. It was evolving from seeing "what happened in the past" to predicting "what will happen in the future." UA's structure was strong at aggregating historical data, but the design itself wasn't suited to incorporating predictive analytics using machine learning.

GA4 was redesigned from scratch to solve these three problems. It officially launched in October 2020, and on July 1, 2023, standard UA properties stopped processing new data. GA4 is now the only standard.


What's Different Between UA and GA4

On the surface, both look like "tools that show visitor numbers." But the way data is collected and stored is fundamentally different. If you don't understand this difference, you can't interpret GA4's numbers correctly.

[Figure: UA session-based vs. GA4 event-based model comparison]

Session-Based vs Event-Based

UA's data unit is the session. When a user accesses a site, a session is created, and all page views, events, and conversions occurring within that session are attributed to that session.

GA4 is different. GA4's data unit is the event. Everything that happens on a site is an event.

| Action | GA4 Event Name |
|---|---|
| Opened a page | page_view |
| Scrolled a page to 90% | scroll |
| Clicked an external link | click |
| Started playing a video | video_start |
| Completed a purchase | purchase |

Sessions still exist in GA4, but they're no longer the basic unit of data. A session starts with a session_start event, and all subsequent events have a session ID attached as a parameter. The session has become an attribute that describes the relationship between events.
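To make "the session is an attribute of events" concrete, here is a minimal sketch that rebuilds sessions purely by grouping events on a session ID parameter. The event dictionaries are a simplified, hypothetical shape chosen for illustration, not GA4's wire format; `ga_session_id` is the parameter name GA4 uses for the session ID.

```python
from collections import defaultdict

# Hypothetical, simplified events: each one carries its session ID as a parameter.
events = [
    {"name": "session_start", "params": {"ga_session_id": 1001}},
    {"name": "page_view", "params": {"ga_session_id": 1001}},
    {"name": "scroll", "params": {"ga_session_id": 1001}},
    {"name": "session_start", "params": {"ga_session_id": 1002}},
    {"name": "page_view", "params": {"ga_session_id": 1002}},
]


def group_by_session(events):
    """Rebuild sessions by grouping event names on the session ID parameter."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["params"]["ga_session_id"]].append(event["name"])
    return dict(sessions)


sessions = group_by_session(events)
print(sessions)
```

Nothing in the data is a "session record": the session exists only as a shared value across events, which is the point of the event-based model.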

Thanks to this structure, GA4 can handle web and apps in the same way. The screen_view event when viewing a screen in an app and the page_view event when viewing a page on the web have the same structure. This is why you can integrate and analyze web and app data in one property.

Changes in User Identification Method

In UA, the primary way to identify users was the client ID stored in browser cookies. If the cookie was deleted, a different browser was used, or a different device was used, the user was recognized as a different person.

GA4 uses three methods hierarchically for user identification.

[Figure: GA4 user identification priority, 3 levels]

| Priority | Method | Description |
|---|---|---|
| 1st | User ID | Assign unique ID to logged-in users. Same user recognized across devices |
| 2nd | Google Signals Data | If Google account is logged in + ad personalization is enabled, Google identifies by account |
| 3rd | Device-Based Identification | Browser cookie (client ID) or app instance ID. Similar to UA method |

This identification priority is controlled by GA4's Reporting Identity setting. You can select which method to prioritize in GA4 Property → Data Display → Reporting Identity.
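The fallback order described above can be sketched as a small function. This is an illustrative model of the priority only, not GA4's actual implementation; the function name `resolve_identity`, the method labels, and the sample identifiers are all hypothetical.

```python
from typing import Optional, Tuple


def resolve_identity(
    user_id: Optional[str] = None,
    signals_id: Optional[str] = None,
    device_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (method, identifier) using the first identifier that is available."""
    candidates = (
        ("user_id", user_id),            # 1st: ID assigned to a logged-in user
        ("google_signals", signals_id),  # 2nd: Google account based identification
        ("device", device_id),           # 3rd: client ID cookie or app instance ID
    )
    for method, value in candidates:
        if value:
            return method, value
    raise ValueError("no identifier available for this event")


print(resolve_identity(user_id="member-42", device_id="cid.123"))
print(resolve_identity(device_id="cid.123"))
```

A logged-in visitor resolves to the User ID even though a device identifier is also present, while an anonymous visitor falls through to the device level.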

Flexibility of Event Structure

In UA, to collect custom events, you had to fit into 4 fixed fields: category, action, label, and value. Data that didn't fit this framework had to be forced in.

GA4 allows free design of event names and parameters. Using a blog post reading event as an example, it can be structured like this:

Event Name: post_read
Parameters:
  post_title: "What is GA4"
  post_category: "Dev"
  read_percentage: 85
  reading_time_seconds: 420

To contain the same data in UA, you had to put "post_read" in category, "Dev" in action, the title in label, and separately set up a custom dimension for read percentage. GA4's approach is much more intuitive.
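The difference is easier to see side by side. The sketch below holds the post_read event as a plain dictionary and then squeezes it into UA's four fixed fields following the mapping just described. The helper `to_ua_hit` is hypothetical and exists only to show what gets left over.

```python
# GA4 style: one event name plus whatever parameters the analysis needs.
ga4_event = {
    "name": "post_read",
    "params": {
        "post_title": "What is GA4",
        "post_category": "Dev",
        "read_percentage": 85,
        "reading_time_seconds": 420,
    },
}


def to_ua_hit(event):
    """Fit a GA4-style event into UA's category/action/label/value fields."""
    params = dict(event["params"])  # copy so the original event is untouched
    hit = {
        "category": event["name"],
        "action": params.pop("post_category"),
        "label": params.pop("post_title"),
        "value": None,
    }
    # Anything still in params has no slot in UA's fixed structure and
    # would need a separately configured custom dimension or metric.
    return hit, params


ua_hit, leftover = to_ua_hit(ga4_event)
print(ua_hit)
print(leftover)
```

The leftover dictionary still contains `read_percentage` and `reading_time_seconds`, which is exactly the data that had to be handled outside the event in UA.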

Free BigQuery Integration

In UA, BigQuery integration was only possible with Analytics 360 (the paid version). The ability to export raw data was a paid feature.

GA4 provides BigQuery export for free. The reports in the GA4 interface show data that Google has pre-aggregated, and sampling may occur depending on circumstances. Raw data exported to BigQuery is stored as individual records one by one without sampling. Direct analysis with SQL allows for a depth of analysis impossible in the GA4 interface.

Below is a table summarizing the key differences between UA and GA4.

| Item | UA (Universal Analytics) | GA4 |
|---|---|---|
| Data Unit | Session | Event |
| Web+App Integration | Difficult (separate tool needed) | Natively supported |
| User Identification | Cookie-based | Hierarchical (User ID → Google Signals → Cookie) |
| Event Structure | Fixed category/action/label/value | Free design of name + parameters |
| BigQuery | Paid version only | Free |
| Predictive Analysis | Not supported | Machine learning-based prediction supported |
| Service Status | Ended July 2023 | Current standard |

Event-Based Data Structure

GA4's core is events. Understanding how events are classified and what structure they have makes subsequent setup and analysis much clearer.

Four Types of Events

GA4 events are divided into four types based on collection method.

[Figure: GA4 event types hierarchy]

1. Automatically Collected Events

These are automatically collected simply by installing the GA4 tag. No separate configuration is needed.

| Event | When It Occurs |
|---|---|
| first_visit | When a user visits the site for the first time |
| session_start | When a session starts |
| user_engagement | When the page is in focus (or the app is in the foreground) for at least one second |

2. Enhanced Measurement Events

These are additionally collected when you enable enhanced measurement in GA4 data stream settings. GA4 automatically detects them without code writing.

| Event | When It Occurs |
|---|---|
| page_view | When a page loads (enabled by default) |
| scroll | When a page is scrolled 90% or more |
| click | When an external link is clicked |
| view_search_results | When a site search results page is viewed |
| video_start | When YouTube video playback starts |
| video_progress | When YouTube video reaches 10%, 25%, 50%, 75% |
| video_complete | When YouTube video playback completes |
| file_download | When a file is downloaded |

3. Recommended Events

Events that Google recommends by industry. GA4 doesn't automatically collect them, but if you follow the specified name and parameter format, GA4's standard reports will automatically aggregate them.

Taking e-commerce related recommended events as examples:

| Event | Meaning | Required Parameters |
|---|---|---|
| view_item | Product detail page view | items |
| add_to_cart | Added to cart | items, value, currency |
| begin_checkout | Checkout started | items, value, currency |
| purchase | Purchase completed | transaction_id, value, currency, items |

These events must have the exact name and parameters to be processed correctly in GA4 reports.
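Because a misspelled name or a missing parameter fails silently, it can help to check events before they are sent. The following is a minimal sketch of such a check. `REQUIRED_PARAMS` simply mirrors the table above, and `missing_params` is a hypothetical helper, not part of any Google library.

```python
# Required parameters per recommended event, copied from the table above.
REQUIRED_PARAMS = {
    "view_item": {"items"},
    "add_to_cart": {"items", "value", "currency"},
    "begin_checkout": {"items", "value", "currency"},
    "purchase": {"transaction_id", "value", "currency", "items"},
}


def missing_params(event_name, params):
    """Return the required parameters that are absent, sorted by name.

    Event names not listed in REQUIRED_PARAMS are treated as having
    no required parameters.
    """
    required = REQUIRED_PARAMS.get(event_name, set())
    return sorted(required - set(params))


purchase_params = {"transaction_id": "T-1001", "value": 39000, "items": []}
print(missing_params("purchase", purchase_params))
```

Here the purchase event lacks `currency`, so the helper reports it. A check like this could run in a test suite or a GTM preview workflow before a release.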

4. Custom Events

These are events that don't fall into the three types above, so you design and implement them yourself. You define the behaviors you want to collect based on your site's characteristics, then write code or configure Google Tag Manager (GTM) to send events when those behaviors occur.

One important note: custom event data doesn't automatically appear in GA4's standard reports. To analyze it in reports, you must set up a custom definition to register that parameter as a dimension or metric. This is covered in detail in Part 5.

Event Structure: Name and Parameters

All events in GA4 consist of two elements.

  • Event Name: Indicates what action occurred. Things like page_view, scroll, purchase.
  • Parameters: Additional information about that event. Information like which page was viewed or which product was purchased.

When a user views a specific post on this blog, the page_view event that occurs actually has this structure:

Event Name: page_view
Parameters:
  page_location: "https://dechive.info/archive/ga4-introduction"
  page_referrer: "https://www.google.com"
  page_title: "What is GA4"
  engagement_time_msec: 0

Every time an event occurs, this data is sent to Google's servers, where GA4 stores it, aggregates it, and displays it in reports.

Parameters come in two types.

| Type | Description | Example |
|---|---|---|
| Event Parameter | Context information when that event occurs | Page URL, product price, category |
| User Property | Information about the user itself, independent of events | Login status, membership grade, language setting |

Once user properties are set, they're automatically attached to all subsequent events and sent.
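One place where this name-plus-parameters shape is visible is the JSON body accepted by GA4's Measurement Protocol, Google's API for sending events from a server. The sketch below only builds such a body and does not send it. The browser tag uses its own request format, and the client ID value and the `membership_grade` user property are made-up examples.

```python
import json


def build_payload(client_id, event_name, event_params, user_properties):
    """Build a Measurement Protocol style body: one event plus user properties."""
    return {
        "client_id": client_id,
        # Each user property value is wrapped in an object with a "value" key.
        "user_properties": {
            key: {"value": value} for key, value in user_properties.items()
        },
        "events": [{"name": event_name, "params": event_params}],
    }


payload = build_payload(
    client_id="1234567890.1700000000",  # made-up client ID
    event_name="page_view",
    event_params={
        "page_location": "https://dechive.info/archive/ga4-introduction",
        "page_title": "What is GA4",
    },
    user_properties={"membership_grade": "gold"},  # hypothetical property
)
print(json.dumps(payload, indent=2))
```

Event parameters live inside the individual event, while user properties sit at the top level of the body, which matches the distinction in the table above.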


GA4's Data Collection Flow

Understanding the process GA4 goes through from data collection to report display helps you identify causes when data looks unusual later.

[Figure: GA4 data collection flow, 5 steps]

Step 1: User Action Occurs

Users access the site, view pages, scroll, click, and make purchases. All these actions are the source of data.

Step 2: Event Generation on the Client

The GA4 tag installed on the site detects the user's action and creates event data. This process happens with JavaScript executing inside the user's browser.

If enhanced measurement is enabled, the GA4 tag automatically detects scrolling, clicking, page transitions, and so on. For custom events, code written by developers or tags configured in GTM generate data at the appropriate time.

Step 3: Transmission to Google Servers

The generated event data is sent to Google's collection servers. It's sent via HTTP POST request, and you can confirm it in the Network tab of browser developer tools as a google-analytics.com/g/collect request.

There's one important thing to know here. If this transmission fails, data is lost. If users navigate away too quickly or ad blockers prevent GA4 requests, data may not be collected. GA4's collected data doesn't reflect 100% of actual user behavior. It should be understood as a tool for understanding trends.

Step 4: Data Processing and Storage

Event data that reaches Google's servers goes through a processing phase. Sessions are calculated, bot traffic is filtered, geographic location and device information are categorized. Filters you've set up or event modifications are also applied during this step.

This processing takes time. According to Google's official documentation, most GA4 report data is fully processed and displayed within 24-48 hours of the event occurring. Real-time reports are the exception: they show data within minutes. To see today's data accurately in standard reports, it's safest to check the next day.

Step 5: Display in Reports

Processed data is displayed in the GA4 interface. GA4 reports don't show raw data as-is, but rather pre-aggregated data. When the amount of data is large, sampling may occur.

To see raw data—individual records of each event—BigQuery integration is necessary. In BigQuery, raw event data is stored by date in tables with the format events_YYYYMMDD. This will be covered in detail in the latter part of the series.
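As a preview of what that raw data looks like, the sketch below works on one exported row held as a Python dictionary. In the export, each event's parameters are stored as a repeated list of key/value records with one typed value field populated per entry. The sample row is hand-written for illustration, and both helper functions are hypothetical.

```python
from datetime import date

# Typed value fields used by the export for each parameter entry.
TYPED_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def daily_table_name(day: date) -> str:
    """Name of the daily export table for a given date (events_YYYYMMDD)."""
    return f"events_{day:%Y%m%d}"


def flatten_event_params(event_params):
    """Turn the repeated key/value records into a flat dictionary."""
    flat = {}
    for param in event_params:
        for field_name in TYPED_FIELDS:
            value = param["value"].get(field_name)
            if value is not None:  # 0 is a valid value, so compare against None
                flat[param["key"]] = value
                break
    return flat


# Hand-written sample row, not real exported data.
exported_row = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_title", "value": {"string_value": "What is GA4", "int_value": None}},
        {"key": "engagement_time_msec", "value": {"string_value": None, "int_value": 0}},
    ],
}

flat = flatten_event_params(exported_row["event_params"])
print(daily_table_name(date(2024, 1, 5)), flat)
```

In practice this flattening is usually done in SQL inside BigQuery; the Python version is only meant to show the shape of the data.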


GA4's Account Structure

When first setting up GA4, the account structure can be confusing. GA4 has a 3-level structure: account, property, and data stream.

[Figure: GA4 account > property > data stream structure]

| Level | Description | Example |
|---|---|---|
| Account | Top-level unit. One business or organization | Dechive Account |
| Property | Basic unit of analysis. Web+app integration possible | Dechive Property |
| Data Stream | Channel where data flows in. Divided into Web / Android / iOS. Each web stream is assigned a Measurement ID (G-XXXXXXXXXX) | Dechive Web Stream |

You can create multiple properties within one account, and add multiple data streams within one property.


GA4 Key Terminology Summary

These are terms you'll encounter frequently in GA4 reports. Understanding their precise meaning is essential for reading the numbers correctly.

User

GA4 aggregates users in two ways.

| Category | Description |
|---|---|
| Total Users | Users who visited at least once during the selected period |
| Active Users | Users who started an engaged session, triggered a conversion event, or first launched an app |

The "users" number displayed by default in GA4 reports is the active users count. Don't directly compare with UA's user figures.

Session

A unit of interaction when a user accesses a site. In GA4, a session starts with a session_start event and ends after 30 minutes of no interaction. Unlike UA, sessions aren't forcibly split at midnight.
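The timeout rule can be sketched as follows. This is a simplified model that splits one user's event timestamps into sessions using only the 30-minute inactivity rule; the function name is hypothetical, and treating a gap of exactly 30 minutes as a new session is an assumption made for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(timestamps):
    """Group timestamps into sessions separated by 30+ minutes of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        # Continue the current session if the gap since its last event is short.
        if sessions and ts - sessions[-1][-1] < SESSION_TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


hits = [
    datetime(2024, 1, 1, 23, 50),
    datetime(2024, 1, 2, 0, 5),  # crosses midnight, but only 15 minutes later
    datetime(2024, 1, 2, 1, 0),  # 55 minutes later, so a new session starts
]
sessions = split_into_sessions(hits)
print(len(sessions))
```

The first two hits stay in one session even though the date changes between them, which is the behavior that differs from UA.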

Engaged Session is a concept newly introduced in GA4. It refers to a session meeting at least one of these three conditions:

  • Session where user stayed on site for 10+ seconds
  • Session where a conversion event occurred
  • Session where 2+ pages or screens were viewed
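The three conditions combine with a logical OR, which a short function makes explicit. This is a sketch based on the conditions as listed here; the function and argument names are hypothetical, and the inclusive 10-second boundary follows the "10+" wording above.

```python
def is_engaged_session(duration_seconds, conversion_count, view_count):
    """A session is engaged if it meets at least one of the three conditions."""
    return (
        duration_seconds >= 10   # stayed on the site for 10+ seconds
        or conversion_count >= 1  # at least one conversion event occurred
        or view_count >= 2        # 2+ pages or screens were viewed
    )


print(is_engaged_session(duration_seconds=4, conversion_count=0, view_count=1))
print(is_engaged_session(duration_seconds=4, conversion_count=0, view_count=2))
```

A 4-second visit to a single page is not engaged, but the same short visit becomes engaged once a second page is viewed.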

Conversion

Important events related to business goals. When you mark a desired event as a conversion, GA4 reports aggregate it separately, and you can use it as an optimization objective when linking with Google Ads. This is the equivalent of UA's "goal."

Dimensions and Metrics

The two most fundamental concepts for understanding GA4 reports.

| Concept | Description | Form | Example |
|---|---|---|---|
| Dimension | Criteria for classifying data | Text (string) | Country, device type, channel, page URL |
| Metric | Measurable numerical values | Numbers | User count, session count, conversion rate |

Reports always combine "what dimension to divide by" and "what metric to view." Examples include "active users by country" or "conversion rate by channel."
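The pairing of a dimension with a metric amounts to a group-by followed by an aggregation. Here is a minimal sketch using made-up rows, where the dimension is a country code and the metric is the number of distinct users. The row shape and the `users_by_dimension` helper are hypothetical.

```python
from collections import defaultdict

# Made-up rows: one row per event, with a user identifier and a country.
rows = [
    {"user": "u1", "country": "KR"},
    {"user": "u2", "country": "KR"},
    {"user": "u1", "country": "KR"},  # same user again, counted once
    {"user": "u3", "country": "US"},
]


def users_by_dimension(rows, dimension):
    """Count distinct users (the metric) for each value of a dimension."""
    buckets = defaultdict(set)
    for row in rows:
        buckets[row[dimension]].add(row["user"])
    return {value: len(users) for value, users in buckets.items()}


report = users_by_dimension(rows, "country")
print(report)
```

Swapping `"country"` for another key changes the dimension while the metric stays the same, which is what happens when you change the breakdown in a report.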

Engagement Rate and Bounce Rate

| Indicator | Definition | Better When |
|---|---|---|
| Engagement Rate | Ratio of engaged sessions to all sessions | Higher |
| Bounce Rate | Ratio of non-engaged sessions to all sessions (= 100% - Engagement Rate) | Lower |

UA's bounce rate and GA4's bounce rate are calculated differently. Don't directly compare bounce rate figures between the two tools.
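The two GA4 definitions can be written out as a short calculation. This sketch uses made-up session counts, and returning 0.0 when there are no sessions is an assumption made to avoid dividing by zero.

```python
def engagement_rate(engaged_sessions, total_sessions):
    """Share of sessions that were engaged, as a fraction between 0 and 1."""
    if total_sessions == 0:
        return 0.0
    return engaged_sessions / total_sessions


def bounce_rate(engaged_sessions, total_sessions):
    """Share of sessions that were not engaged, as a fraction between 0 and 1."""
    if total_sessions == 0:
        return 0.0
    return (total_sessions - engaged_sessions) / total_sessions


# Made-up example: 75 engaged sessions out of 100.
print(engagement_rate(75, 100), bounce_rate(75, 100))
```

Because the two rates always sum to 1, GA4's bounce rate carries no information beyond the engagement rate. That is one reason it cannot be compared with UA's figure.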


Summary

The content covered in Part 1 can be condensed into three points.

First, GA4 is event-centric. Everything that happens on the site, including page views, clicks, scrolls, and purchases, is an event. These events are the raw material for GA4 analysis.

Second, GA4's analysis is centered on users. It aims to connect actions across multiple devices to a single user. Unlike UA, which relied only on cookies, GA4 hierarchically leverages User IDs and Google Signals data.

Third, GA4's data goes through a processing phase. From event occurrence to report display takes 24-48 hours, and the data in reports is in aggregated form. If you need raw data, use BigQuery.

In Part 2, we'll dive deeper into GA4's event design. We'll cover what events you should collect, how to design events and parameters so analysis produces useful data, and what the criteria are for conversion design.

Written by Dechive 사서 (Librarian)