When working with time data in a DataFrame, the pandas resample() function can be used to regroup the rows into a different time frequency.

import pandas as pd
df = pd.read_csv("../input/bigdatacertificationkr/basic2.csv", parse_dates=['Date'], index_col=0)

The code above loads the time series data.
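The CSV file itself is not shown here, so the following is only a minimal sketch of the same loading step using a small in-memory CSV. The column names (Date, Sales, PV) and the values are made up for illustration. The point is that parse_dates combined with index_col yields a DatetimeIndex, which resample() requires.

```python
import io
import pandas as pd

# Hypothetical stand-in for basic2.csv: the real file's columns are not shown.
csv_text = """Date,Sales,PV
2022-01-01,10,100
2022-01-02,20,150
2022-01-03,30,120
"""

# Parse the Date column as datetimes and use it as the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

print(type(sample.index).__name__)  # DatetimeIndex
```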

Let's compute the column sums by week.

Using resample('W')

df_w = df.resample('W').sum()
df_w

Looking at the Date column (the index), you can see that the rows are now weekly. With 'W', each label is the Sunday that ends the week.
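To make the weekly grouping concrete, here is a small self-contained sketch. The daily DataFrame and its sales column are hypothetical and are not the data from the CSV above.

```python
import pandas as pd

# Hypothetical daily data: 14 consecutive days starting on Monday 2023-01-02.
dates = pd.date_range('2023-01-02', periods=14, freq='D')
daily = pd.DataFrame({'sales': range(1, 15)}, index=dates)

# 'W' means weeks ending on Sunday; each bin is labeled with that Sunday.
weekly = daily.resample('W').sum()
print(weekly)
```

The 14 days collapse into two rows labeled 2023-01-08 and 2023-01-15, holding the sums 28 (1 through 7) and 77 (8 through 14).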

Let's compute the column sums by month.

Using resample('M')

df_m = df.resample('M').sum()
df_m

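Looking at the Date column (the index), you can see that the rows are now monthly, labeled with the last day of each month. Note that pandas 2.2 and later deprecate the 'M' alias in favor of 'ME'.

Using resample('MS')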

df_ms = df.resample('MS').sum()
df_ms

Looking at the Date column (the index), you can see that the rows are now monthly, labeled with the first day of each month.
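The difference between month-end and month-start labels can be sketched on hypothetical data as follows. The sums are identical and only the index labels differ. The month-end case passes an offset object rather than the 'M' string, which is an assumption made here so the sketch runs on both older and newer pandas versions.

```python
import pandas as pd

# Hypothetical daily data covering January and February 2023, one unit per day.
dates = pd.date_range('2023-01-01', '2023-02-28', freq='D')
daily = pd.DataFrame({'sales': 1}, index=dates)

# Month-end labels. An offset object is used instead of the 'M' string
# because newer pandas versions deprecate that alias in favor of 'ME'.
by_month_end = daily.resample(pd.offsets.MonthEnd()).sum()

# Month-start labels.
by_month_start = daily.resample('MS').sum()

print(by_month_end)
print(by_month_start)
```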

Summary of frequency units

Reference: https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html

The aliases below are the older spellings. pandas 2.2 and later rename several of them (for example 'M' to 'ME', 'Q' to 'QE', 'A' to 'YE', 'H' to 'h', 'T' to 'min', 'S' to 's').

Date offset                               Frequency string  Description
DateOffset                                None              Generic offset class, defaults to an absolute 24 hours
BDay or BusinessDay                       'B'               Business day (weekday)
CDay or CustomBusinessDay                 'C'               Custom business day
Week                                      'W'               One week, optionally anchored on a day of the week
WeekOfMonth                               'WOM'             The x-th day of the y-th week of each month
LastWeekOfMonth                           'LWOM'            The x-th day of the last week of each month
MonthEnd                                  'M'               Calendar month end
MonthBegin                                'MS'              Calendar month begin
BMonthEnd or BusinessMonthEnd             'BM'              Business month end
BMonthBegin or BusinessMonthBegin         'BMS'             Business month begin
CBMonthEnd or CustomBusinessMonthEnd      'CBM'             Custom business month end
CBMonthBegin or CustomBusinessMonthBegin  'CBMS'            Custom business month begin
SemiMonthEnd                              'SM'              15th (or other day_of_month) and calendar month end
SemiMonthBegin                            'SMS'             15th (or other day_of_month) and calendar month begin
QuarterEnd                                'Q'               Calendar quarter end
QuarterBegin                              'QS'              Calendar quarter begin
BQuarterEnd                               'BQ'              Business quarter end
BQuarterBegin                             'BQS'             Business quarter begin
FY5253Quarter                             'REQ'             Retail (aka 52-53 week) quarter
YearEnd                                   'A'               Calendar year end
YearBegin                                 'AS' or 'BYS'     Calendar year begin
BYearEnd                                  'BA'              Business year end
BYearBegin                                'BAS'             Business year begin
FY5253                                    'RE'              Retail (aka 52-53 week) year
Easter                                    None              Easter holiday
BusinessHour                              'BH'              Business hour
CustomBusinessHour                        'CBH'             Custom business hour
Day                                       'D'               One absolute day
Hour                                      'H'               One hour
Minute                                    'T' or 'min'      One minute
Second                                    'S'               One second
Milli                                     'L' or 'ms'       One millisecond
Micro                                     'U' or 'us'       One microsecond
Nano                                      'N'               One nanosecond
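To check which offset class a frequency string maps to, one option is to_offset from pandas.tseries.frequencies. The sketch below deliberately uses only a handful of aliases whose spelling has not changed between pandas versions. The choice of aliases is an assumption for illustration, not an exhaustive check of the table.

```python
from pandas.tseries.frequencies import to_offset

# Aliases picked because their spelling is the same in old and new pandas.
aliases = ['D', 'B', 'W', 'MS', 'QS']

# Map each frequency string to the name of the offset class it resolves to.
resolved = {alias: type(to_offset(alias)).__name__ for alias in aliases}

for alias, class_name in resolved.items():
    print(f"{alias!r} -> {class_name}")
```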

df.cumsum(axis=None, skipna=True, *args, **kwargs): cumulative sum
df.cumprod(axis=None, skipna=True, *args, **kwargs): cumulative product


axis: the axis along which the cumulative sum or product is computed (0 accumulates down each column, 1 accumulates across each row).
skipna: whether to ignore missing values (default True).

Example

import numpy as np
df = pd.DataFrame({'col1':[2,-2,4,5,6,8],'col2':[3,4,np.nan,7,4,5]})
print(df)
   col1  col2
0     2   3.0
1    -2   4.0
2     4   NaN
3     5   7.0
4     6   4.0
5     8   5.0

 

๋ˆ„์ ํ•ฉ cumsum() 

print(df.cumsum())
 col1  col2
0     2   3.0
1     0   7.0
2     4   NaN
3     9  14.0
4    15  18.0
5    23  23.0

 

๋ˆ„์ ๊ณฑ cumprod()

print(df.cumprod())
   col1    col2
0     2     3.0
1    -4    12.0
2   -16     NaN
3   -80    84.0
4  -480   336.0
5 -3840  1680.0

 

Using the skipna argument

print(df.cumsum(skipna=False))
   col1  col2
0     2   3.0
1     0   7.0
2     4   NaN # from the first NaN onward the sum cannot be computed, so NaN is returned
3     9   NaN
4    15   NaN
5    23   NaN
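The examples so far all use the default axis. As a minimal sketch of the axis parameter, the same DataFrame can be accumulated across each row with axis=1. The variable name row_cumsum is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, -2, 4, 5, 6, 8],
                   'col2': [3, 4, np.nan, 7, 4, 5]})

# axis=0 (the default) accumulates down each column;
# axis=1 accumulates left to right within each row,
# so col2 becomes col1 + col2 and col1 is unchanged.
row_cumsum = df.cumsum(axis=1)
print(row_cumsum)
```

In row 2 the col2 value stays NaN because the input there is missing, while col1 keeps its value of 4.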

์™œ๋„(Skewness)

: ์‹ค์ˆ˜ ๊ฐ’ ํ™•๋ฅ  ๋ณ€์ˆ˜์˜ ํ™•๋ฅ  ๋ถ„ํฌ ๋น„๋Œ€์นญ์„ฑ์„ ๋‚˜ํƒ€๋‚ด๋Š” ์ง€ํ‘œ

์™œ๋„ < 0:

ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜์˜ ์™ผ์ชฝ ๋ถ€๋ถ„์— ๊ธด ๊ผฌ๋ฆฌ๋ฅผ ๊ฐ€์ง€๋ฉฐ ์ค‘์•™๊ฐ’์„ ํฌํ•จํ•œ ์ž๋ฃŒ๊ฐ€ ์˜ค๋ฅธ์ชฝ์— ๋” ๋งŽ์ด ๋ถ„ํฌ

์™œ๋„ > 0:

ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜์˜ ์˜ค๋ฅธ์ชฝ ๋ถ€๋ถ„์— ๊ธด ๊ผฌ๋ฆฌ๋ฅผ ๊ฐ€์ง€๋ฉฐ ์ค‘์•™๊ฐ’์„ ํฌํ•จํ•œ ์ž๋ฃŒ๊ฐ€ ์™ผ์ชฝ์— ๋” ๋งŽ์ด ๋ถ„ํฌ

 

์ฒจ๋„(Kurtosis):

ํ™•๋ฅ ๋ถ„ํฌ์˜ ๊ผฌ๋ฆฌ๊ฐ€ ๋‘๊บผ์šด ์ •๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์ฒ™๋„

k = 3:

์‚ฐํฌ๋„๊ฐ€ ์ •๊ทœ๋ถ„ํฌ์— ๊ฐ€๊นŒ์›€

k < 3:

์ •๊ทœ๋ถ„ํฌ๋ณด๋‹ค ๊ผฌ๋ฆฌ๊ฐ€ ์–‡์€ ๋ถ„ํฌ

k > 3:

์ •๊ทœ๋ถ„ํฌ๋ณด๋‹ค ๊ผฌ๋ฆฌ๊ฐ€ ๋‘๊บผ์šด ๋ถ„ํฌ

Numeric example

df.skew() # skewness
df.kurt() # kurtosis; pandas returns excess kurtosis (normal distribution = 0), so add 3 to compare with k above
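To connect the sign and threshold rules above to actual numbers, the following sketch draws samples from a few well-known distributions and computes both statistics. The seed, sample size, and choice of distributions are assumptions made purely for illustration. Because pandas kurt() uses Fisher's definition (normal distribution = 0), 3 is added to compare against the k = 3 rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 100_000

samples = pd.DataFrame({
    'normal': rng.normal(size=n),            # symmetric, reference tails
    'uniform': rng.uniform(-1, 1, size=n),   # symmetric, thin tails
    'laplace': rng.laplace(size=n),          # symmetric, heavy tails
    'exponential': rng.exponential(size=n),  # long right tail
})

skewness = samples.skew()
excess_kurtosis = samples.kurt()   # Fisher definition: normal is about 0
kurtosis = excess_kurtosis + 3     # Pearson definition: normal is about 3

print(pd.DataFrame({'skewness': skewness, 'kurtosis': kurtosis}))
```

With these samples, the exponential column should show clearly positive skewness, the uniform column a kurtosis below 3, and the Laplace column a kurtosis above 3.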