DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

fillna method: replaces missing values (NaN) in a DataFrame with the values you specify.

  • value : The value used to replace missing values. A dict is also accepted.
  • method : How missing values are filled. With 'bfill', each missing value takes the value directly below it. With 'ffill', each missing value takes the value directly above it.
  • axis : {0 : index / 1 : columns} The axis along which fillna is applied.
  • inplace : Whether to modify the original. If True, the original DataFrame is changed.
  • limit : The number of missing values to fill. Only the first limit missing values, counted from the top, are replaced.
  • downcast : Whether to downcast the result. With downcast='infer', float64 is converted to int64 where possible.
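
The examples further down never exercise the axis parameter, so here is a minimal sketch of the difference between the two directions. The frame `small` and its column names are made up for illustration, and the sketch calls `ffill` directly rather than `fillna(method='ffill')` on the assumption that a recent pandas version is installed.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame used only for this illustration.
small = pd.DataFrame(
    [[1.0, np.nan, 3.0],
     [np.nan, 5.0, np.nan]],
    columns=["a", "b", "c"],
)

# axis=0 (the default): each gap takes the value from the row above, column by column.
down = small.ffill(axis=0)

# axis=1: each gap takes the value from the column to its left, row by row.
across = small.ffill(axis=1)

print(down)
print(across)
```

A gap with nothing before it along the chosen axis (for example the first cell of the second row when axis=1) stays NaN.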

 

Example

First, create a 5x5 DataFrame.

import numpy as np
import pandas as pd

col  = ['col1','col2','col3','col4','col5']
row  = ['row1','row2','row3','row4','row5']
na = np.nan  # shorthand for a missing value
data = [[na, 2,na, 4,na],
        [ 6, 7,na, 9,na],
        [11,na,na,14,15],
        [na,17,na,na,20],
        [na,22,na,na,25]]
df = pd.DataFrame(data, index=row, columns=col)
print(df)
      col1  col2  col3  col4  col5
row1   NaN   2.0   NaN   4.0   NaN
row2   6.0   7.0   NaN   9.0   NaN
row3  11.0   NaN   NaN  14.0  15.0
row4   NaN  17.0   NaN   NaN  20.0
row5   NaN  22.0   NaN   NaN  25.0

 

Usage by type of value

When value is a number or a string, every missing value is replaced with it as-is. Here we replace them with 'A'.

print(df.fillna('A'))
      col1  col2 col3  col4  col5
row1     A   2.0    A   4.0     A
row2   6.0   7.0    A   9.0     A
row3  11.0     A    A  14.0  15.0
row4     A  17.0    A     A  20.0
row5     A  22.0    A     A  25.0

When a dict is passed, each column label can be given its own replacement value.

fill_values = {'col1':'A', 'col2':'B', 'col3':'C', 'col4':'D', 'col5':'E'}
print(df.fillna(value=fill_values))
# print(df.fillna(fill_values)) also works
      col1  col2 col3  col4  col5
row1     A   2.0    C   4.0     E
row2   6.0   7.0    C   9.0     E
row3  11.0     B    C  14.0  15.0
row4     A  17.0    C     D  20.0
row5     A  22.0    C     D  25.0
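
The dict does not have to name every column. As a small sketch (the frame `scores` and its columns are hypothetical), columns left out of the dict keep their missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three columns, each with one gap.
scores = pd.DataFrame({
    "x": [1.0, np.nan],
    "y": [np.nan, 4.0],
    "z": [np.nan, 6.0],
})

# Only "x" and "y" are listed, so "z" is not touched.
partial = scores.fillna({"x": 0.0, "y": -1.0})
print(partial)
```

This is handy when only some columns have a sensible default value.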

 

Using the method argument

When method is set to 'bfill', each missing value is set to the value directly below it.

※ This behaves exactly like df.backfill() or df.bfill().

print(df.fillna(method='bfill'))
      col1  col2  col3  col4  col5
row1   6.0   2.0   NaN   4.0  15.0
row2   6.0   7.0   NaN   9.0  15.0
row3  11.0  17.0   NaN  14.0  15.0
row4   NaN  17.0   NaN   NaN  20.0
row5   NaN  22.0   NaN   NaN  25.0

 

When method is set to 'ffill', each missing value is set to the value directly above it.

※ This behaves exactly like df.pad() or df.ffill().

print(df.fillna(method='ffill'))
      col1  col2  col3  col4  col5
row1   NaN   2.0   NaN   4.0   NaN
row2   6.0   7.0   NaN   9.0   NaN
row3  11.0   7.0   NaN  14.0  15.0
row4  11.0  17.0   NaN  14.0  20.0
row5  11.0  22.0   NaN  14.0  25.0
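
Recent pandas releases (2.1 and later, as far as I know) emit a deprecation warning for fillna(method=...), so the sketch below calls df.ffill() and df.bfill() directly. The one-column frame `gaps` is made up for illustration. It also shows that chaining the two fills the gaps that a single direction leaves at the edges.

```python
import numpy as np
import pandas as pd

# Hypothetical column with gaps at the start, middle, and end.
gaps = pd.DataFrame({"v": [np.nan, 2.0, np.nan, 4.0, np.nan]})

forward = gaps.ffill()           # leading NaN remains: nothing above it
backward = gaps.bfill()          # trailing NaN remains: nothing below it
both = gaps.ffill().bfill()      # forward fill first, then back-fill what is left

print(forward)
print(backward)
print(both)
```

The order of the chain matters: ffill().bfill() prefers the value above and only falls back to the value below for leading gaps.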

 

Using the limit argument

The limit argument is the number of missing values to fill for each label. Filling proceeds from the left when working along rows, and from the top when working along columns.

print(df.fillna('A', limit=2))
      col1  col2 col3  col4  col5
row1     A   2.0    A   4.0     A
row2   6.0   7.0    A   9.0     A
row3  11.0     A  NaN  14.0  15.0
row4     A  17.0  NaN     A  20.0
row5   NaN  22.0  NaN     A  25.0
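
limit also applies to forward and backward filling, where it caps how many consecutive gaps are filled. A minimal sketch on a made-up one-column frame `run`, comparing the two uses:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a run of three consecutive gaps.
run = pd.DataFrame({"v": [1.0, np.nan, np.nan, np.nan, 5.0]})

# Constant fill: only the first two gaps from the top are replaced.
by_value = run.fillna(0.0, limit=2)

# Forward fill: only one gap after each valid value is replaced.
by_ffill = run.ffill(limit=1)

print(by_value)
print(by_ffill)
```

In both cases the gaps beyond the limit are left as NaN rather than raising an error.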

 

Using the downcast argument

Setting the downcast argument to 'infer' converts float64 columns to int64 once the missing values are filled.

print(df.fillna(0, downcast='infer'))
      col1  col2  col3  col4  col5
row1     0     2     0     4     0
row2     6     7     0     9     0
row3    11     0     0    14    15
row4     0    17     0     0    20
row5     0    22     0     0    25
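
To my knowledge the downcast argument is deprecated in newer pandas releases (2.2 and later). One alternative, sketched here on a hypothetical frame `counts`, is to fill first and then convert explicitly with astype. This swaps in a different technique from the one shown above and assumes every value is a whole number after filling.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: float64 columns only because of the NaN entries.
counts = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Fill the gaps, then convert every column to int64 explicitly.
as_int = counts.fillna(0).astype("int64")

print(as_int)
print(as_int.dtypes)
```

Unlike downcast='infer', astype("int64") truncates any fractional values instead of leaving the column as float, so it is best used only when the data is known to be integral.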

 

Using inplace

As with other pandas methods, inplace=True modifies the original object instead of returning a new one.
In other words, df.fillna(0, inplace=True) has the same effect as df = df.fillna(0).

df.fillna('A', inplace=True)
print(df)
      col1  col2 col3  col4  col5
row1     A   2.0    A   4.0     A
row2   6.0   7.0    A   9.0     A
row3  11.0     A    A  14.0  15.0
row4     A  17.0    A     A  20.0
row5     A  22.0    A     A  25.0
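
The difference is easiest to see side by side. In this minimal sketch (the frame `original` is made up, and a numeric fill value is used to keep the column dtype unchanged), calling fillna without inplace leaves the source untouched, while inplace=True on a copy ends up with the same result as reassignment:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single gap.
original = pd.DataFrame({"a": [1.0, np.nan]})

# Without inplace, a new frame is returned and `original` keeps its NaN.
returned = original.fillna(0.0)

# With inplace=True, the frame it is called on is modified directly.
work = original.copy()
work.fillna(0.0, inplace=True)

print(original)
print(returned)
print(work)
```

Reassignment (df = df.fillna(0)) is usually the safer habit, since it keeps working when calls are chained.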

 
