#2 - Z Chart Analysis

μ‹œμž‘ν•˜κΈ° 전에

규λͺ¨μžˆλŠ” 쿼리λ₯Ό μž‘μ„±ν•˜κΈ° μœ„ν•΄μ„œλŠ” μΌκ΄€λœ μ½”λ“œ ꡬ쑰와 μŠ€νƒ€μΌμ˜ μœ μ§€κ°€ μ€‘μš”ν•˜λ‹€. 이 κΈ€μ˜ μ˜ˆμ‹œλŠ” λ‹€μŒ BigQuery μŠ€νƒ€μΌ κ°€μ΄λ“œλ₯Ό μ°Έκ³ ν•˜κ³  μžˆλ‹€.

BI 뢄석 ν•Έμ¦ˆμ˜¨

κΈ°μ—…μ˜ 맀좜과 같이 μ‹œκ°„μ˜ 경과에 따라 λ³€ν™”ν•˜λŠ” λ°μ΄ν„°λ‘œλΆ€ν„° κ³Όκ±° μ‹€μ μ΄λ‚˜ 좔이 등을 νŒŒμ•…ν•˜κ³  미래의 싀적을 μ˜ˆμΈ‘ν•˜κΈ° μœ„ν•œ λͺ©μ μœΌλ‘œ μ‹œκ³„μ—΄ 뢄석이 ν™œμš©λœλ‹€.

이번 ν•Έμ¦ˆμ˜¨μ€ μ‹œκ³„μ—΄ 뢄석 기법 μ€‘μ˜ ν•˜λ‚˜μΈ Z-Chart 뢄석을 BigQuery μ–Έμ–΄λ‘œ μ§„ν–‰ν•˜λ©° SQL ν™œμš© λŠ₯λ ₯을 ν‚€μš°λŠ” 것을 λͺ©ν‘œλ‘œ ν•œλ‹€.

μ•„λž˜μ˜ λ‹¨κ³„λ³„λ‘œ Z-Chart 뢄석에 μš”κ΅¬λ˜λŠ” μ§€ν‘œλ“€μ„ μ‚°μΆœν•˜κ³  μ‹œκ°ν™”ν•˜λŠ” 과정을 따라가 보자.

raw data --> data mart --> visualization

데이터 μ€€λΉ„

ν•Έμ¦ˆμ˜¨μ— μ‚¬μš©ν•  맀좜 λ°μ΄ν„°λŠ” μ•„λž˜ μ°Έκ³ λ„μ„œμ—μ„œ κ³΅κ°œν•œ 뢀둝/μ˜ˆμ œμ†ŒμŠ€λ₯Ό μ°Έκ³ ν•˜μ—¬ BigQuery ν™˜κ²½μ—μ„œ μ‹€μŠ΅ κ°€λŠ₯ν•˜λ„λ‘ μ •λ¦¬ν•˜μ˜€λ‹€.

[μ°Έκ³ λ„μ„œ] 데이터 뢄석을 μœ„ν•œ SQL λ ˆμ‹œν”Ό - 4μž₯

맀좜 데이터 뢄석

1. μ˜ˆμ‹œ ν…Œμ΄λΈ” 생성

μš°μ„  예제 μ†ŒμŠ€ 쀑 9-1-data.sql 을 μ΅œμ†Œν•œμ˜ λ³€κ²½μœΌλ‘œ BigQuery ν…Œμ΄λΈ”λ‘œ 생성해 보도둝 ν•˜μž.

CREATE OR REPLACE TABLE learning_club.purchase_log ( 
    dt              STRING,
    order_id        INT64,
    user_id         STRING,
    purchase_amount INT64,
);
INSERT INTO learning_club.purchase_log VALUES 
  ('2014-01-01',  1, 'rhwpvvitou', 13900),
  ('2014-01-01',  2, 'hqnwoamzic', 10616),
  ('2014-01-02',  3, 'tzlmqryunr', 21156),
  ('2014-01-10', 26, 'cyxfgumkst', 11339)
-- ...

무료둜 μ‚¬μš©μ€‘μΈ GCP Project free tier μ—μ„œ μœ„μ˜ μΏΌλ¦¬λŠ” INSERT 문을 μ‚¬μš©ν•˜μ§€ λͺ»ν•œλ‹€λŠ” μ—λŸ¬λ₯Ό λŒλ €μ€€λ‹€.

λ°°μ›€μ˜ 길은 멀고도 ν—˜ν•˜λ‹€. μ–΄μ©” 수 μ—†λ‹€. μ•½κ°„μ˜ 고급진 κΈ°λŠ₯을 μ‚¬μš©ν•˜μ—¬ INSERT같은 DML 문을 μ‚¬μš©ν•˜μ§€ μ•Šκ³  μ˜ˆμ‹œ ν…Œμ΄λΈ”μ„ λ§Œλ“€μ–΄ 보자.

BigQuery Scripting은 λ²”μš© ν”„λ‘œκ·Έλž˜λ° μ–Έμ–΄μ²˜λŸΌ λ¬Έμž₯λ‚΄μ—μ„œ λ³€μˆ˜λ₯Ό μ„ μ–Έν•˜κ³  μ‚¬μš©ν•  수 μžˆλ„λ‘ ν•΄μ€€λ‹€.

이 κ°€μš΄λ° DECLAREλŠ” λ³€μˆ˜λ₯Ό μ„ μ–Έν•˜λŠ” ꡬ문인데 이런게 μžˆκ΅¬λ‚˜ μ •λ„λ‘œ λ„˜μ–΄κ°€μž. BigQuery Scripting은 λ³„λ„μ˜ 문법 κ°•μ˜μ—μ„œ λ‹€λ£° μ˜ˆμ •μ΄λ‹€.

λ‹€λ§Œ, 데이터 ν˜•μ‹μ΄ ꡬ쑰체의 λ°°μ—΄(Array of Struct)인 λ³€μˆ˜λŠ” BigQueryλ‚΄μ—μ„œ ν…Œμ΄λΈ”κ³Ό 거의 λ™λ“±ν•˜κ²Œ λ‹€λ£¨μ–΄μ§„λ‹€λŠ” κ²ƒλ§Œ κΈ°μ–΅ν•΄ 두면 μ’‹κ² λ‹€.

μ•„λž˜ κ·Έλ¦Όμ—μ„œ λ°°μ—΄μ˜ 각 μš”μ†Œ(element)λŠ” ν…Œμ΄λΈ”μ—μ„œ ν•˜λ‚˜μ˜ ν–‰(row)으둜, 각 μš”μ†Œμ— ν•΄λ‹Ήν•˜λŠ” κ΅¬μ‘°μ²΄λŠ” 4개의 멀버을 κ°€μ§€λŠ”λ° ν…Œμ΄λΈ”μ˜ 컬럼(column)으둜 생각할 수 μžˆλ‹€.

Nested Table 생성 쿼리

DECLARE ꡬ문을 ν†΅ν•΄μ„œ purchase_logλΌλŠ” ꡬ쑰체 λ°°μ—΄ λ³€μˆ˜λ₯Ό ν•˜λ‚˜ μ„ μ–Έν•˜μž. κ΅¬μ‘°μ²΄λŠ” κ΅¬λ§€λ‚ μ§œ, 주문번호, μ‚¬μš©μžμ‹λ³„μž, κ΅¬λ§€κΈˆμ•‘μ„ ν¬ν•¨ν•˜λŠ” λ ˆμ½”λ“œμ΄λ‹€.

DECLARE purchase_log ARRAY<STRUCT<
    dt STRING,
    order_id INT64,
    user_id STRING,
    purchase_amount INT64
>>
DEFAULT [
  ('2014-01-01',  1, 'rhwpvvitou', 13900),
  ('2014-01-01',  2, 'hqnwoamzic', 10616),
  ('2014-01-02',  3, 'tzlmqryunr', 21156),
  ('2014-01-02',  4, 'wkmqqwbyai', 14893),
  ('2014-01-03',  5, 'ciecbedwbq', 13054),
  ('2014-01-03',  6, 'svgnbqsagx', 24384),
  ('2014-01-03',  7, 'dfgqftdocu', 15591),
  ('2014-01-04',  8, 'sbgqlzkvyn',  3025),
  ('2014-01-04',  9, 'lbedmngbol', 24215),
  ('2014-01-04', 10, 'itlvssbsgx',  2059),
  ('2014-01-05', 11, 'jqcmmguhik',  4235),
  ('2014-01-05', 12, 'jgotcrfeyn', 28013),
  ('2014-01-05', 13, 'pgeojzoshx', 16008),
  ('2014-01-06', 14, 'msjberhxnx',  1980),
  ('2014-01-06', 15, 'tlhbolohte', 23494),
  ('2014-01-06', 16, 'gbchhkcotf',  3966),
  ('2014-01-07', 17, 'zfmbpvpzvu', 28159),
  ('2014-01-07', 18, 'yauwzpaxtx',  8715),
  ('2014-01-07', 19, 'uyqboqfgex', 10805),
  ('2014-01-08', 20, 'hiqdkrzcpq',  3462),
  ('2014-01-08', 21, 'zosbvlylpv', 13999),
  ('2014-01-08', 22, 'bwfbchzgnl',  2299),
  ('2014-01-09', 23, 'zzgauelgrt', 16475),
  ('2014-01-09', 24, 'qrzfcwecge',  6469),
  ('2014-01-10', 25, 'njbpsrvvcq', 16584),
  ('2014-01-10', 26, 'cyxfgumkst', 11339)
];
-- ν…Œμ΄λΈ” 생성 쿼리
CREATE OR REPLACE TABLE learning_club.purchase_log AS
SELECT log.* FROM UNNEST(purchase_log) log;

UNNEST() ν•¨μˆ˜λ‘œ ꡬ쑰체 λ°°μ—΄(Array of Struct)의 μš”μ†ŒμΈ ꡬ쑰체λ₯Ό 각각의 ν–‰(row)으둜 ν’€μ–΄λ‚΄κ³ , 이어 ꡬ쑰체 μ•ˆμ˜ ν•„λ“œλ“€μ„ dot operator λ˜λŠ” member field access operator(멀버 μ ‘κ·Ό μ—°μ‚°μž) 라고 λΆˆλ¦¬μš°λŠ” ν‘œκΈ°λ²•λ₯Ό ν†΅ν•΄μ„œ μ—΄(column)둜 λΆ„λ¦¬μ‹œν‚€λ©΄ 일반 ν…Œμ΄λΈ”μ„ SELECT ν•˜λŠ” 것과 λ™μΌν•œ κ²°κ³Όκ°€ λ§Œλ“€μ–΄μ§„λ‹€.

μœ„ FROM 절의 UNNEST() ꡬ문과 SELECT 절의 log.* ν‘œκΈ°μ— μ£Όλͺ©ν•˜μž.

이후에 CTAS(Create Table As Select) ꡬ문으둜 Nested Table이 BigQuery의 Permanant Table이 λ˜λ„λ‘ λ§Œλ“ λ‹€.

이제 데이터가 μ€€λΉ„λ˜μ—ˆμœΌλ‹ˆ 뢄석을 μœ„ν•œ λ‹€μŒ λ‹¨κ³„λ‘œ λ„˜μ–΄κ°€μž.

2. λ‚ μ§œλ³„ 맀좜 집계

λ‹€μŒμœΌλ‘œ μ•žμ„œ λ§Œλ“€μ–΄μ§„ ꡬ맀이λ ₯ 데이터λ₯Ό νƒμƒ‰ν•˜κ³  μ§‘κ³„ν•¨μˆ˜(aggregate function)와 λΆ„μ„ν•¨μˆ˜(analytic function)을 μ‚¬μš©ν•˜μ—¬ λͺ‡κ°€μ§€ μ§€ν‘œλ“€μ„ μ‚°μΆœν•΄ 보자.

GROUP BY와 μ§‘κ³„ν•¨μˆ˜μ„ μ‚¬μš©ν•˜μ—¬ λ‚ μ§œλ³„ 맀좜 μ§€ν‘œλ₯Ό μ‚°μΆœν•˜λŠ” μΏΌλ¦¬λŠ” λ‹€μŒκ³Ό κ°™λ‹€.

dtλŠ” λ‚ μ§œλ₯Ό λ‚˜νƒ€λ‚΄λŠ” ν•˜λ‚˜μ˜ 차원(dimension)이 되고, COUNT(), SUM(), AVG() μ§‘κ³„ν•¨μˆ˜μ˜ 결과듀은 각각 ν•΄λ‹Ή μ°¨μ›μ—μ„œ κ΄€μΈ‘λœ 값인 λ©”νŠΈλ¦­(metric)이 λœλ‹€.

SELECT dt,
       COUNT(1) AS purchase_count,
       SUM(purchase_amount) AS total_amount, -- 일별 맀좜
       AVG(purchase_amount) AS avg_amount, -- 평균 ꡬ맀앑
  FROM learning_club.purchase_log
 GROUP BY 1
 ORDER BY dt;

쿼리 μˆ˜ν–‰μ˜ κ²°κ³Όλ₯Ό Explore Data (데이터 탐색) λ₯Ό 톡해 Data Studioμ—μ„œ κ°„λ‹¨ν•˜κ²Œ μ‹œκ°ν™”λ₯Ό ν•˜λ©΄ λ‹€μŒκ³Ό κ°™λ‹€.

3. 이동평균을 μ‚¬μš©ν•œ λ‚ μ§œλ³„ 좔이

μ•žμ„  μ‹œκ°ν™”λ₯Ό ν†΅ν•΄μ„œ 주쀑과 주말의 맀좜 차이λ₯Ό 확인할 수 μžˆλ‹€. 이제 μ‹œκ³„μ—΄ λ°μ΄ν„°μ˜ κ³„μ ˆμ„±(seasonality)을 μ™„ν™”μ‹œν‚€κ³  μΆ”μ„Έλ₯Ό 확인할 수 μžˆλ„λ‘ 7일 이동평균 으둜 ν‘œν˜„ν•΄ 보자.

SELECT dt,
       COUNT(1) AS purchase_count,
       SUM(purchase_amount) AS purchase_amount,
       AVG(SUM(purchase_amount)) OVER (ORDER BY dt ROWS 6 PRECEDING) AS avg_seven_day,
  FROM learning_club.purchase_log
 GROUP BY 1
 ORDER BY dt;

이동 평균을 κ΅¬ν•˜κΈ° μœ„ν•΄μ„œλŠ” λΆ„μ„ν•¨μˆ˜(λ˜λŠ” μœˆλ„μš°ν•¨μˆ˜)λ₯Ό ν•„μš”λ‘œ ν•˜λŠ”λ° μ•„λž˜ ꡬ문은 λ­”κ°€ λ³΅μž‘ν•˜λ‹€.

AVG(SUM(purchase_amount)) OVER (ORDER BY dt ROWS 6 PRECEDING)

  1. purchase_amountλŠ” purchase_log ν…Œμ΄λΈ”μ˜ 컬럼일까 μ•„λ‹ˆλ©΄ μ§μ „μ˜ AS purchase_amount에 μ˜ν•΄μ„œ μƒˆλ‘œ 이름지어진 별칭(alias)일까? μ •λ‹΅: ν…Œμ΄λΈ”μ˜ 컬럼

  2. AVG(SUM(....)) μ—μ„œ SUM()은 μ§‘κ³„ν•¨μˆ˜μΌκΉŒ? λΆ„μ„ν•¨μˆ˜μΌκΉŒ? μ •λ‹΅: μ§‘κ³„ν•¨μˆ˜

  3. AVG()λŠ” 그럼 무슨 ν•¨μˆ˜μΌκΉŒ? μ •λ‹΅: λΆ„μ„ν•¨μˆ˜

SELECT λ¬Έ(statment)을 μ΄λ£¨λŠ” 각 절(clause)의 ν•΄μ„μˆœμ„œμ™€ 컬럼 μ΄λ¦„μ˜ ν†΅μš©λ²”μœ„(scope) λ•Œλ¬ΈμΈλ° λ³΅μž‘ν•˜λ‹ˆ 일단 λ„˜μ–΄κ°€μž. BigQuery 문법 κ°•μ˜μ—μ„œ 이것도 닀뀄보도둝 ν•˜κ² λ‹€.

μœ„ ꡬ문은 ν…Œμ΄λΈ” 전체λ₯Ό λ‚ μ§œλ³„λ‘œ μ˜€λ¦„ 차순으둜 μ •λ ¬μ‹œν‚¨ 후에 각 ν–‰μ˜ μ—¬μ„― 번째 전에 μœ„μΉ˜ν•œ ν–‰λΆ€ν„° ν˜„μž¬ ν–‰κΉŒμ§€ 7μΌκ°„μ˜ SUM(purchase_amount) κ°’μ˜ 평균을 κ΅¬ν•˜λŠ” ꡬ문이닀. μ—¬κΈ°μ„œ SUM(purchase_amount)λŠ” GROUP BY에 μ˜ν•΄μ„œ λ¨Όμ € κ³„μ‚°λ˜μ–΄μ§€λŠ” λ‚ μ§œλ³„ ꡬ맀앑에 ν•΄λ‹Ήν•œλ‹€.

4. λ‹Ήμ›” λˆ„κ³„ 맀좜

Z Chart λΆ„μ„μ—μ„œ μš”κ΅¬ν•˜λŠ” μ§€ν‘œλ“€μ„ κ΅¬ν•˜λŠ” 방법듀을 ν•˜λ‚˜μ”© μ‚΄νŽ΄λ³΄μž.

λ‹€μŒμ€ 일별 맀좜 데이터가 μ£Όμ–΄μ‘Œμ„ λ•Œ ν•΄λ‹Ή μ›”μ˜ λˆ„κ³„ λ§€μΆœμ„ κ΅¬ν•˜λŠ” 쿼리이닀.

μœˆλ„μš° ν•¨μˆ˜ κ΅¬λ¬Έμ—μ„œ PARTITION BY SUBSTR(dt, 1, 7)은 μœˆλ„μš° ν•¨μˆ˜κ°€ 적용될 νŒŒν‹°μ…˜μ„ μ›” λ‹¨μœ„λ‘œ λ‚˜λˆˆλ‹€λŠ” 의미이고, ROWS UNBOUNDED PRECEDING은 μ›” λ‹¨μœ„λ‘œ λ‚˜λ‰˜μ–΄μ§„ νŒŒν‹°μ…˜μ˜ 첫번째 ν–‰ (맀월 1일에 ν•΄λ‹Ή)λΆ€ν„° ν˜„μž¬ ν–‰ (맀월 ν˜„μž¬μΌ)κΉŒμ§€ μœˆλ„μš°λ₯Ό μ„€μ •ν•œλ‹€. μ„€μ •λœ μœˆλ„μš°μ— ν¬ν•¨λœ 행듀을 SUM() λΆ„μ„ν•¨μˆ˜λ₯Ό ν˜ΈμΆœν•˜κ²Œ λ˜λŠ”λ° μ—¬κΈ°μ„œλŠ” 일별 맀좜(SUM(purchase_amount))을 λ‹€μ‹œ SUM() λΆ„μ„ν•¨μˆ˜λ‘œ λˆ„κ³„μ‹œν‚€κ³  μžˆλ‹€.

-- total_amountλŠ” 일별 λ§€μΆœμ•‘, agg_amountλŠ” ν•΄λ‹Ή μ›”μ˜ ν•΄λ‹Ή μΌκΉŒμ§€μ˜ 맀좜 λˆ„κ³„
SELECT dt,
       SUBSTR(dt, 1, 7) AS year_month,
       SUM(purchase_amount) AS total_amount,
       SUM(SUM(purchase_amount)) OVER (
         PARTITION BY SUBSTR(dt, 1, 7) ORDER BY dt ROWS UNBOUNDED PRECEDING
       ) AS agg_amount
  FROM learning_club.purchase_log
 GROUP BY dt
 ORDER BY dt       
;

-- μœ„μ™€ λ™λ“±ν•œ 결과의 쿼리, daily_purchase μ„œλΈŒμΏΌλ¦¬μ—μ„œ 일별 λ§€μΆœμ•‘μ„ λ¨Όμ € κ΅¬ν•˜κ³ ,
-- 메인 μΏΌλ¦¬μ—μ„œ μœˆλ„μš° ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜μ—¬ ν•΄λ‹ΉμΌκΉŒμ§€μ˜ 맀좜 λˆ„κ³„ μ‚°μΆœ
WITH daily_purchase AS (
  SELECT dt,
         SUBSTR(dt, 1, 4) AS year,
         SUBSTR(dt, 6, 2) AS month,
         SUBSTR(dt, 9, 2) AS day,
         SUM(purchase_amount) AS purchase_amount,
         COUNT(order_id) AS orders
    FROM learning_club.purchase_log
   GROUP BY 1
)
SELECT dt,
       year || month AS year_month,
       purchase_amount,
       SUM(purchase_amount) OVER (
         PARTITION BY year, month ORDER BY dt ROWS UNBOUNDED PRECEDING
       ) AS agg_amount
  FROM daily_purchase
;

5. 월별 맀좜의 μž‘λŒ€λΉ„(μž‘λ…„λŒ€λΉ„λΉ„μœ¨)

μ§€κΈˆκΉŒμ§€ μ‚¬μš©ν•œ λ°μ΄ν„°λŠ” μ—΄ν˜κ°„μ˜ 맀좜 싀적 λ°μ΄ν„°λ‘œ 월별 맀좜 좔이λ₯Ό λΆ„μ„ν•˜κΈ°μ—λŠ” λΆ€μ‘±ν•˜λ‹€.

λ‹€μŒμ€ 2014λ…„ 1μ›”λΆ€ν„° 2015λ…„ 12μ›”κΉŒμ§€ 맀좜 λ°μ΄ν„°λ‘œ 이λ₯Ό μ‚¬μš©ν•˜μ—¬ 월별 맀좜의 μž‘λŒ€λΉ„λ₯Ό ꡬ해보도둝 ν•˜μž.

μ˜ˆμ‹œ λ°μ΄ν„°λŠ” λ„μž…λΆ€μ—μ„œ μ–ΈκΈ‰ν•œ λ„μ„œμ˜ 곡개 데이터λ₯Ό μ‚¬μš©ν•˜μ˜€λ‹€.

DECLARE purchase_log_yr ARRAY<STRUCT<
  dt STRING,
  order_id INT64,
  user_id STRING,
  purchase_amount INT64
>> DEFAULT [
  ('2014-01-01',    1, 'rhwpvvitou', 13900),
  ('2014-02-08',   95, 'chtanrqtzj', 28469),
  ('2014-03-09',  168, 'bcqgtwxdgq', 18899),
  ('2014-04-11',  250, 'kdjyplrxtk', 12394),
  ('2014-05-11',  325, 'pgnjnnapsc',  2282),
  ('2014-06-12',  400, 'iztgctnnlh', 10180),
  ('2014-07-11',  475, 'eucjmxvjkj',  4027),
  ('2014-08-10',  550, 'fqwvlvndef',  6243),
  ('2014-09-10',  625, 'mhwhxfxrxq',  3832),
  ('2014-10-11',  700, 'wyrgiyvaia',  6716),
  ('2014-11-10',  775, 'cwpdvmhhwh', 16444),
  ('2014-12-10',  850, 'eqeaqvixkf', 29199),
  ('2015-01-09',  925, 'efmclayfnr', 22111),
  ('2015-02-10', 1000, 'qnebafrkco', 11965),
  ('2015-03-12', 1075, 'gsvqniykgx', 20215),
  ('2015-04-12', 1150, 'ayzvjvnocm', 11792),
  ('2015-05-13', 1225, 'knhevkibbp', 18087),
  ('2015-06-10', 1291, 'wxhxmzqxuw', 18859),
  ('2015-07-10', 1366, 'krrcpumtzb', 14919),
  ('2015-08-08', 1441, 'lpglkecvsl', 12906),
  ('2015-09-07', 1516, 'mgtlsfgfbj',  5696),
  ('2015-10-07', 1591, 'trgjscaajt', 13398),
  ('2015-11-06', 1666, 'ccfbjyeqrb',  6213),
  ('2015-12-05', 1741, 'onooskbtzp', 26024)
];
CREATE OR REPLACE TABLE learning_club.purchase_log_yr AS
SELECT l.* FROM UNNEST(purchase_log_yr) l;

μœ„μ™€ 같이 UNNESTν•¨μˆ˜μ™€ λ©€λ²„μ ‘κ·Όμ—°μ‚°μžλ₯Ό μ‚¬μš©ν•˜μ—¬ on-the-flyν•˜κ²Œ μ •μ˜λœ nested table을 BigQuery μ˜κ΅¬ν…Œμ΄λΈ”λ‘œ λ¨Όμ € λ§Œλ“ λ‹€.

LAG(col, n) λ„€λΉ„κ²Œμ΄μ…˜ ν•¨μˆ˜λŠ” νŒŒν‹°μ…˜μ˜ ν˜„μž¬ 행을 κΈ°μ€€μœΌλ‘œ n만큼 μ•žμ„  ν–‰μ˜ col값을 κ°€μ Έμ˜¨λ‹€. λ”°λΌμ„œ LAG(purchase_amount, 12)λŠ” λ‚ μ§œμˆœμœΌλ‘œ μ •λ ¬λœ 월별 λ§€μΆœμ•‘ ν…Œμ΄λΈ”μ—μ„œ 일년 μ „μ˜ λ§€μΆœμ•‘μ„ 돌렀주게 λœλ‹€. 이 κ°’μœΌλ‘œ ν˜„μž¬ μ›”λ§€μΆœμ•‘μ„ λ‚˜λˆ„κ²Œ 되면 μž‘λŒ€λΉ„(purchase_amount_rate)κ°€ μ‚°μΆœλœλ‹€.

CREATE OR REPLACE TABLE learning_club.purchase_log_dm AS
WITH daily_purchase AS (  
  SELECT dt,
         SUBSTR(dt, 1, 4) AS year,
         SUBSTR(dt, 6, 2) AS month,
         SUBSTR(dt, 9, 2) AS day,
         SUM(purchase_amount) AS purchase_amount,
         COUNT(order_id) AS orders
    FROM learning_club.purchase_log_yr
   GROUP BY 1
),
year_month_purchase AS (
  SELECT year,
         month,
         SUM(purchase_amount) AS purchase_amount,
         SUM(orders) AS orders
    FROM daily_purchase
   GROUP BY 1, 2
)
SELECT year, month, purchase_amount,
       LAG(purchase_amount, 12) OVER (ORDER BY year, month) AS purchase_amount_prev,
       ROUND(purchase_amount / LAG(purchase_amount, 12) OVER (ORDER BY year, month), 2) AS purchase_amount_rate,
  FROM year_month_purchase
 ORDER BY year, month;

λͺΈν’€κΈ°λ₯Ό λ§ˆμ³€μœΌλ‹ˆ 이제 본격적으둜 Z-Chart 뢄석에 λ“€μ–΄κ°€ 보도둝 ν•˜μž.

Z-Chart 뢄석

Z-Chart 뢄석은 μ›”λ³„λ§€μΆœ, (λ…„)λ§€μΆœλˆ„κ³„, 이동(λ…„κ°„)합계 λΌλŠ” 3가지 μ§€ν‘œλ₯Ό 꺾은선 κ·Έλž˜ν”„λ‘œ λ‚˜νƒ€λ‚Έ κ²ƒμœΌλ‘œ Z ν˜•νƒœλ₯Ό 이룬닀 ν•˜μ—¬ Z-Chart 둜 λΆˆλ¦°λ‹€.

경영 λΆ„μ„μ΄λ‚˜ λ§ˆμΌ€νŒ… λΆ„μ•Όμ—μ„œ μ›” λ³€λ™μ΄λ‚˜ κ³„μ ˆ 변동을 λ°°μ œν•œ 맀좜 κ²½ν–₯을 λΆ„μ„ν•˜κΈ° μœ„ν•΄ 자주 μ΄μš©λœλ‹€. Z-Chart 뢄석을 μœ„ν•΄μ„œλŠ” μ•„λž˜ 3가지 μ§€ν‘œκ°€ ν•„μš”ν•˜λ‹€.

  1. μ›”λ³„λ§€μΆœ

  2. λ§€μΆœλˆ„κ³„

  3. 이동년계

맀좜 이λ ₯ λ°μ΄ν„°λ‘œλΆ€ν„° 각각의 μ§€ν‘œλ₯Ό μ‚°μΆœν•˜λŠ” 과정은 λ‹€μŒκ³Ό κ°™λ‹€.

1. Z-Chart 뢄석 쿼리

CREATE OR REPLACE TABLE learning_club.zchart_dm AS
WITH daily_purchase AS (  
  SELECT dt,
         SUBSTR(dt, 1, 4) AS year,
         SUBSTR(dt, 6, 2) AS month,
         SUBSTR(dt, 9, 2) AS day,
         SUM(purchase_amount) AS purchase_amount,
         COUNT(order_id) AS orders
    FROM learning_club.purchase_log_yr
   GROUP BY 1
   ORDER BY 1
),
year_month_purchase AS (
  SELECT year,
         month,
         SUM(purchase_amount) AS purchase_amount,
         SUM(orders) AS orders
    FROM daily_purchase
   GROUP BY 1, 2
)
SELECT year, month,
       purchase_amount,
       SUM(purchase_amount) OVER (
         PARTITION BY year ORDER BY month ROWS UNBOUNDED PRECEDING
       ) AS purchase_cumulative,
       SUM(purchase_amount) OVER (
         ORDER BY year, month ROWS 11 PRECEDING
       ) AS purchase_moving_yr,
  FROM year_month_purchase
 ORDER BY year, month; 

μ›”λ³„λ§€μΆœμ€ GROUP BY 집계λ₯Ό 톡해 μ‰½κ²Œ κ³„μ‚°λœλ‹€. λ§€μΆœλˆ„κ³„μ™€ μ΄λ™λ…„κ³„λŠ” μ•žμ„œ λͺΈν’€κΈ°μ—μ„œ μ‚΄νŽ΄λ΄€λ˜ 일별 맀좜 λˆ„κ³„μ™€ 이동평균λ₯Ό μ‚°μΆœν•  λ•Œ μ‚¬μš©ν–ˆλ˜ λΆ„μ„ν•¨μˆ˜(μœˆλ„μš°ν•¨μˆ˜)λ₯Ό μ‚¬μš©ν•˜μ—¬ μ‚°μΆœ κ°€λŠ₯ν•˜λ‹€.

μœˆλ„μš° ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•  λ•ŒλŠ” 항상 νŒŒν‹°μ…˜μ„ μ–΄λ–»κ²Œ λ‚˜λˆŒμ§€, νŒŒν‹°μ…˜ μ•ˆμ—μ„œ μœˆλ„μš°μ˜ λ²”μœ„λ₯Ό μ–΄λ–»κ²Œ μž‘μ„μ§€λ₯Ό κ³ λ―Όν•΄μ•Ό ν•œλ‹€. μ•žμ˜ λ‚΄μš©μ„ λ°”νƒ•μœΌλ‘œ 각자 생각해 보자.

2. Z-Chart 뢄석 μ‹œκ°ν™”

μœ„μ—μ„œ μ‚°μΆœν•œ 3가지 μ§€ν‘œλ“€λ“€μ„ λ§ˆνŠΈν™”ν•˜μ—¬ BigQuery ν…Œμ΄λΈ”λ‘œ μƒμ„±ν•œ ν›„ Data Studioμ—μ„œ μ‹œκ°ν™”λ₯Ό ν•˜λ©΄ μ•„λž˜μ™€ 같이 Z 자 λͺ¨μ–‘μ˜ κ·Έλž˜ν”„κ°€ λ§Œλ“€μ–΄μ§„λ‹€.

Last updated