Pandas: .loc[...] = value returns SettingWithCopyWarning

Created on 8 Sep 2017  ·  8 Comments  ·  Source: pandas-dev/pandas

Code Sample

# My code
df.loc[0, 'column_name'] = 'foo bar'

Problem description

This code in Pandas 20.3 raises a SettingWithCopyWarning and suggests:

"Try using .loc[row_indexer,col_indexer] = value instead".

I am already doing that, so it looks like there is a small bug. I use Jupyter.
Thank you! :)

Output of pd.show_versions()


commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Indexing Usage Question

All 8 comments

@NadiaRom Can you provide a full example? It's hard to say for sure, but I suspect that df came from an operation that may return either a view or a copy. For example:

In [8]: df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [4, 5]})

In [9]: df1 = df[['A', 'B']]

In [10]: df1.loc[0, 'A'] = 5
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/indexing.py:180: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
/Users/taugspurger/Envs/pandas-dev/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3.6

So df1 is updated correctly. The ambiguity is whether df will be updated as well. I think something similar is happening to you, but without a reproducible example it's hard to say for sure.
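A minimal sketch of the ambiguity described above, reusing the same made-up frame. It records warnings with the standard `warnings` module instead of relying on them, because whether a SettingWithCopyWarning is emitted depends on the pandas version (releases that use Copy-on-Write may not emit it at all). The variable name `swc` is only illustrative.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [4, 5]})

# Selecting a list of columns returns a new object derived from df.
df1 = df[["A", "B"]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df1.loc[0, "A"] = 5  # may emit SettingWithCopyWarning, depending on the version

# Match on the class name so the check does not depend on where pandas defines it.
swc = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print("SettingWithCopyWarning count:", len(swc))

print(df1.loc[0, "A"])  # df1 itself is updated
print(df.loc[0, "A"])   # the original frame keeps its value
```

The warning, when it appears, is about this second question (what happens to df), not about whether the assignment to df1 succeeded.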

@TomAugspurger Here is the code. In general, I never assign values in pandas without .loc.

df = pd.read_csv('df_unicities.tsv', sep='\t')
df.replace({'|': '--'}, inplace=True)

df_c = df.loc[df.encountry == country, : ]

df_c['sort'] = (df_c.encities_ua == 'all').astype(int) # new column
df_c['sort'] += (df_c.encities_foreign == 'all').astype(int)
df_c.sort_values(by='sort', inplace=True)

# ---end of chunk, everything is fine ---

if df_c.encities_foreign.str.contains('all').sum() < len(df_c):
    df_c.loc[df_c.encities_foreign.str.contains('all'), 'encities_foreign'] = 'other'
    df_c.loc[df_c.cities_foreign.str.contains('всі'), 'cities_foreign'] = 'інші'
else:
    df_c.loc[df_c.encities_foreign.str.contains('all'), 'encities_foreign'] = country
    df_c.loc[df_c.cities_foreign.str.contains('всі'), 'cities_foreign'] = df_c.country.iloc[0]

if df_c.encities_ua.str.contains('all').sum() < len(df_c):
    df_c.loc[df_c.encities_ua.str.contains('all'), 'encities_ua'] = 'other'
    df_c.loc[df_c.cities_ua.str.contains('всі'), 'cities_ua'] = 'інші'
else:
    df_c.loc[df_c.encities_ua.str.contains('all'), 'encities_ua'] = 'Ukraine'
    df_c.loc[df_c.cities_ua.str.contains('всі'), 'cities_ua'] = 'Україна'

# Warning after it

Thank you for the quick reply!

The issue here is that you first slice your dataframe with .loc on line 4, and then attempt to assign values to that slice.

df_c = df.loc[df.encountry == country, :]

Pandas isn't 100% sure whether you want to assign values to just your df_c slice or have them propagate all the way back up to the original df. To avoid this, when you first assign df_c, tell pandas that it is its own dataframe (and not a slice) by using

df_c = df.loc[df.encountry == country, :].copy()

Doing this will remove the warning. I've noticed that a lot of users get confused by this aspect of pandas, so I'll add a brief example to help explain the above.

Example with made-up data

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,4,5], 'B':list('QQQCC')})
>>> df
   A  B
0  1  Q
1  2  Q
2  3  Q
3  4  C
4  5  C
>>> df.loc[df['B'] == 'Q', 'new_col'] = 'hello'
>>> df
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q   hello
3  4  C     NaN
4  5  C     NaN

So the above works as we expect! Now let's try an example that mirrors what you attempted to do with your data.

>>> df = pd.DataFrame({'A':[1,2,3,4,5], 'B':list('QQQCC')})
>>> df_q = df.loc[df['B'] == 'Q']
>>> df_q
   A  B
0  1  Q
1  2  Q
2  3  Q
>>> df_q.loc[df['A'] < 3, 'new_col'] = 'hello'
/Users/riddellcd/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)

>>> df_q
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN

Looks like we hit the same warning! But df_q was changed as we expected! This is because df_q is a slice of df, so even though we're using .loc[] on df_q, pandas warns us that it won't propagate the changes up to df. To avoid this, we need to be more explicit and declare that df_q is its own dataframe, separate from df.

Let's start again from df_q, but use .copy() this time.

>>> df_q = df.loc[df['B'] == 'Q'].copy()
>>> df_q
   A  B
0  1  Q
1  2  Q
2  3  Q

Let's try to reassign our value now!
>>> df_q.loc[df['A'] < 3, 'new_col'] = 'hello'
>>> df_q
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN

This works without a warning because we've told pandas that df_q is separate from df.
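One way to check the effect of .copy() without reading console output is to record warnings around the assignment. The sketch below assumes the same made-up data; the helper name `count_setting_with_copy` is hypothetical and not part of pandas. The count for the plain slice depends on the pandas version, so no value is asserted for it here.

```python
import warnings

import pandas as pd


def count_setting_with_copy(frame):
    """Assign into `frame` via .loc and count SettingWithCopyWarning records."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame.loc[frame["A"] < 3, "new_col"] = "hello"
    # Compare by class name so this works wherever pandas defines the warning.
    return sum(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": list("QQQCC")})

plain = df.loc[df["B"] == "Q"]           # subset without an explicit copy
copied = df.loc[df["B"] == "Q"].copy()   # independent dataframe

n_plain = count_setting_with_copy(plain)    # version dependent
n_copied = count_setting_with_copy(copied)  # the explicit copy should not warn

print(n_plain, n_copied)
print("new_col" in df.columns)  # the original df is untouched in both cases
```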

If you do in fact want these changes to df_c to propagate up to df, that's another point entirely, and I will answer it if you want.

@CRiddler Great, thank you!
As you mentioned, chained .loc has never returned unexpected results. As I understand it, .copy() assures pandas that we treat the selected df_sliced_once as a separate object and do not intend to change the initial full df. Please correct me if I mixed anything up.

The documentation is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy, and @CRiddler has a nice explanation. In general, you should not use inplace at all.
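As a rough illustration of the advice about inplace, the first chunk of the code posted earlier in the thread could be written by rebinding results instead. This is only a sketch: the small frame below is a hypothetical stand-in for the TSV file, and the value of `country` is made up.

```python
import pandas as pd

# Hypothetical stand-in for the TSV file used in the thread.
df = pd.DataFrame({
    "encountry": ["Poland", "Poland", "Spain"],
    "encities_ua": ["all", "Kyiv", "all"],
    "encities_foreign": ["all", "|", "Madrid"],
})
country = "Poland"

# Rebind the result instead of calling replace(..., inplace=True).
df = df.replace({"|": "--"})

# Take an explicit copy so later assignments clearly target df_c only.
df_c = df.loc[df["encountry"] == country, :].copy()

# Build the sort key, then rebind instead of sort_values(..., inplace=True).
df_c["sort"] = (df_c["encities_ua"] == "all").astype(int)
df_c["sort"] += (df_c["encities_foreign"] == "all").astype(int)
df_c = df_c.sort_values(by="sort")

print(df_c)
```

Rebinding keeps each step an ordinary expression that returns a new object, which makes it easier to see which dataframe a later assignment modifies.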

"If you do in fact want these changes to df_c to propagate up to df, that's another point entirely, and I will answer it if you want."

@CRiddler Thanks, your answer is better than the ones on Stack Overflow. Could you add what to do when you want to propagate to the initial dataframe, or indicate how it's done?

@persep In general, the problem [...]

Original data:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,4,5], 'B':list('QQQCC')})
>>> df
   A  B
0  1  Q
1  2  Q
2  3  Q
3  4  C
4  5  C

Creating a temporary dataframe: changes do not propagate.
As we saw in the previous example, this changes only df_q and raises a pandas warning (not copied/pasted here), and it does not propagate any changes to df.

>>> df_q = df.loc[df["B"] == "Q"]
>>> df_q.loc[df["A"] < 3, "new_column"] = "hello"

# df remains unchanged because we only made changes to `df_q`
>>> df
   A  B
0  1  Q
1  2  Q
2  3  Q
3  4  C
4  5  C

To my knowledge, there is no way to use the same code as above and force the changes to propagate back to the original dataframe.

However, if we change our thinking a bit and work with masks instead of complete subsets, we can achieve the desired result. This isn't necessarily "propagating" changes from a subset to the original dataframe, but it ensures that any changes we make happen in the original dataframe df. To do this, we create the masks first, then apply them whenever we want to change that subset of df.

>>> q_mask = df["B"] == "Q"
>>> a_mask = df["A"] < 3

# Combine masks (in this case we used "&") to achieve what a nested subset would look like
#  In the same step we add in our item assignment. Instructing pandas to create a new column in `df` and assign
#  the value "hello" to the rows in `df` where `q_mask` & `a_mask` overlap.
>>> df.loc[q_mask & a_mask, "new_col"] = "hello"

# Successful "propagation" of new values to the original dataframe
>>> df
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN
3  4  C     NaN
4  5  C     NaN

Lastly, if we ever want to see what df_q would look like, we can always subset it from the original dataframe using q_mask.

>>> df.loc[q_mask, :]
   A  B new_col
0  1  Q   hello
1  2  Q   hello
2  3  Q     NaN

While this isn't necessarily "propagating" from df_q to df, we achieve the same result. Actual propagation would need to be done explicitly and would be less efficient than just working with masks.
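For completeness, explicit propagation could look roughly like the sketch below. It is not taken from the thread: it assumes the subset is an independent copy that keeps the row labels of df, so that those labels can be used to write the new column back.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": list("QQQCC")})

# Work on an independent copy of the subset.
df_q = df.loc[df["B"] == "Q"].copy()
df_q.loc[df_q["A"] < 3, "new_col"] = "hello"

# Explicitly write the new column back. df_q kept df's row labels,
# so its index selects the matching rows in df.
df.loc[df_q.index, "new_col"] = df_q["new_col"]

print(df)
```

Compared with the mask approach, this keeps a second dataframe in memory and needs an extra write-back step, which is the inefficiency mentioned above.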

@CRiddler Thanks, you've been very helpful.
