Pandas: 'column' not in index, but hell it is. Looks like a bug...

Created on 18 Aug 2017  ·  24 Comments  ·  Source: pandas-dev/pandas

I have a dataframe called delivery, and when I print(delivery.columns) I get this:

Index(['Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
   'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
   'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
   'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
   'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
   'CNAEs_secundários', 'Pessoas', 'percent'],
  dtype='object')

Well, we can clearly see that there is a 'Rua' column.

Also, if I print(delivery.Rua), I get the proper result:

82671                         R JUDITE MELO DOS SANTOS
817797                                R DOS GUAJAJARAS
180081           AV MARCOS PENTEADO DE ULHOA RODRIGUES
149373                                 AL MARIA TEREZA
455511                               AV RANGEL PESTANA
...

If I write "if 'Rua' in delivery.columns: print('here I am')", it does print 'here I am'. So 'Rua' really is there.

Well, on the very next line I have this code:

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço','Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

And voilà, I get this weird error:

Traceback (most recent call last):
File "/file.py", line 45, in <module>
'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 1991, in __getitem__
return self._getitem_array(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2035, in _getitem_array
indexer = self.ix._convert_to_indexer(key, axis=1)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/indexing.py", line 1214, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['Rua'] not in index"

Can someone help? I tried stackoverflow, but no one could help. I'm starting to think I'm crazy and 'Rua' is an illusion of my confused mind.

Additional info

I'm using this code right before the error line:

delivery=pd.DataFrame()

for i in selection.index:
    sample=groups.get_group(selection['#CNAE'].loc[i]).sample(selection['samples'].loc[i])
    delivery=pd.concat((delivery,sample)).sort_values('Capital_Social',ascending=False)


print(delivery.columns)
print(delivery.Rua)
print(delivery.set_index('cnpj').columns)

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço',
                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

Edit

New weird stuff:
I gave up and removed 'Rua' from that last piece of code, hoping it would work. To my surprise, I got the same problem, but now with the 'Número' column.

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Número','Complemento_endereço',
                                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica' ]]

KeyError: "['Número'] not in index"

Edit 2

Then I gave up on 'Número' and took it out. The same problem then happened with 'Complemento_endereço'. So I removed 'Complemento_endereço', and it happened with 'Telefone', and so on.

* Edit 3 *

If I run pd.show_versions(), I get this output:

INSTALLED VERSIONS

commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 18.2
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: 0.7.11.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
None

Indexing

Most helpful comment

This is for those who arrived here from a Google search while trying to find out what is wrong.

If you are working with CSV or XLSX, make 100% sure that your column names have no leading or trailing whitespace.

I found that I had trouble selecting columns after importing a CSV. When you export the df to csv and open it in Excel, you cannot see trailing or leading spaces. You need to open it in Notepad or Notepad++.

Again, this is for people who landed here from a Google search. Make sure that all leading and trailing whitespace is removed from the column header names in whatever csv, xlsx, or other dataframe file template you are using.
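A minimal sketch of that check, assuming a hypothetical DataFrame whose headers picked up stray spaces on import (the data and names below are made up for illustration). `Index.tolist()` prints each name quoted so padding becomes visible, and `Index.str.strip()` removes leading and trailing whitespace.

```python
import pandas as pd

# Hypothetical frame: two of the headers carry invisible padding.
df = pd.DataFrame({" Rua": ["AL MARIA TEREZA"], "Numero ": [10], "CEP": ["01000"]})

# tolist() shows every name in quotes, which exposes the padding.
raw_names = df.columns.tolist()
print(raw_names)

# Strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()
clean_names = df.columns.tolist()
print(clean_names)

# A selection by the clean names now works.
subset = df[["Rua", "Numero"]]
```

This only handles whitespace; other stray characters, such as a trailing dot, would need their own cleanup.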

All 24 comments

@abutremutante : Thanks for reporting this! That does look very odd, but I can't replicate it at this point because I can't run your code. Could you provide a complete code sample?

Also, it would be good if you could provide the output of pd.show_versions in the initial issue box.

Hi! Thanks for answering.
I'd rather not make it public on github. Can I send it to you by email?

Preferably not, since anyone trying to fix this issue will need to see the code. Can you try to replicate it with a different table (or DataFrame) that doesn't contain sensitive information?

Here's my attempt:

import pandas as pd
import FindCos.FindCos_Functions as find  # that's a file where I write some functions
import datetime
import pdb

target=find.get_full_basics(business='select * from sqltable;',test_mode=False)

CNAE=['23.30-3-01','26','27','49.30-2-03','37.02-9-00','46.45','47.73','46.44-3-01' ]
hired_cos=200

# selecting the items in CNAE

selection=pd.DataFrame()
for i in CNAE:
    x=target.loc[target['#CNAE'].str.startswith(i) == True]
    selection=pd.concat((selection,x))

# filtering

selection=selection.loc[selection['Capital_Social'] < 100000000].loc[selection['situacao_cadastral'] == 'ATIVA']\
.loc[selection['situacao_especial'].isnull() == True].loc[selection['Natureza_Juridica'] != 'EMPRESA INDIVIDUAL DE RESP.LIMITADA(DE NATUREZA EMPRESARIA)']\
.loc[selection['Natureza_Juridica'] != 'EMPRESARIO(INDIVIDUAL)']\
.loc[selection['Estado'] != 'PA'].loc[selection['Estado'] != 'AM']\
.loc[selection['Estado'] != 'RR'].loc[selection['Estado'] != 'AC'].loc[selection['Estado'] != 'RO'].loc[selection['Estado'] != 'AP']\
.loc[selection['Estado'] != 'TO']

# duplicates control

lista=['file.csv']
selection=find.exclude_business(selection,lista)

# profile checking

groups=selection.groupby('#CNAE')
selection['percent']=groups['#CNAE'].transform('size')/len(selection)
selection=selection[['#CNAE','percent']].drop_duplicates().sort_values('percent',ascending=False)
selection['samples']=round(((hired_cos*1.05)*selection['percent']))

delivery=pd.DataFrame()
for i in selection.index:
    sample=groups.get_group(selection['#CNAE'].loc[i]).sample(selection['samples'].loc[i])
    delivery=pd.concat((delivery,sample)).sort_values('Capital_Social',ascending=False)#.rename(columns={'Capital_Social':'Score_Tamanho'})

# checking if RUA really exists

print(delivery.columns)
print(delivery.Rua)
print(delivery.set_index('cnpj').columns)
delivery=delivery.rename(columns={'Rua':'Rua'})
if 'Rua' in delivery.columns:
    print('here I am')

# problematic line

delivery=delivery.set_index('cnpj')[['cnpj','Razao_social','Nome_Fantasia','Data_fundacao','CEP','Estado','Cidade','Bairro','Rua','Numero ','Complemento_endereco','Telefone','email','Capital_Social','CNAE','#CNAE','Natureza_Juridica']]

@abutremutante : Thanks, but unfortunately this code can't be used to reproduce the issue, because I can't run import FindCos.FindCos_Functions. Try creating a DataFrame from scratch and replicating the issue with it.

Also, it would be good if you could provide the output of pd.show_versions in the initial issue box.

@gfyoung : I've updated the initial issue.
Regarding the dataframe, it's a pretty long one. I made a csv with the first 10 rows, right here:

target=find.get_full_basics(business='select * from sqltable limit 10;',test_mode=False)
target.to_csv('target10items.csv')

I'm attaching it here:
target10items.csv.zip

1) Can you replicate the issue using this smaller DataFrame?
2) It looks like you are using a pretty old version of pandas (the current one is 0.20.3). Can you try upgrading and see whether that fixes the problem?

'Bairro' isn't in the output of print(delivery.columns), but it is in the list you provide after set_index. It's a little suspicious that 'Bairro' appears immediately before 'Rua' in that list. Is there an issue with how the error message picks out the missing column?

Okay, so the issue is that 'Bairro' is actually the missing key, but pandas 0.18.1 has a bug where the error message shows the wrong item as the missing key.

Using the following code:

import pandas as pd
import numpy as np

cols = pd.Index(['Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
   'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
   'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
   'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
   'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
   'CNAEs_secundários', 'Pessoas', 'percent'],
  dtype='object')
delivery = pd.DataFrame(np.random.random(size=(5, len(cols))), columns=cols)

delivery = delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço','Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

On pandas 0.18.1 this raises the following error:

KeyError: "['Rua'] not in index"

On pandas 0.20.3, however, the corrected error is raised:

KeyError: "['Bairro'] not in index"
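
Whichever key the error message names, the missing columns can be computed directly rather than read from the exception. A minimal sketch, assuming a shortened version of the column list above; `wanted`, `missing`, and `present` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the column list from the thread.
cols = ["cnpj", "Rua", "Número", "Telefone", "name_bairro"]
delivery = pd.DataFrame(np.random.random(size=(3, len(cols))), columns=cols)

wanted = ["Bairro", "Rua", "Número", "Telefone"]

# Compare the requested names against the columns left after set_index.
indexed = delivery.set_index("cnpj")
missing = [name for name in wanted if name not in indexed.columns]
print(missing)

# Select only the columns that really exist.
present = [name for name in wanted if name in indexed.columns]
subset = indexed[present]
```

Running a check like this before the selection makes the real culprit ('Bairro' here) obvious regardless of the pandas version.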

You nailed it, @jschendel

Thanks @gfyoung

Thank you so much.

The issue seems to be resolved, so I'm closing it.

Hello
I don't see any practical idea here for solving the problem. @gfyoung, why close this? I still have this issue. I'm not complaining; I'm just tired of this error.

@wangxuesong29 do you have a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

I have the same problem as you. I noticed that the error occurs when I modify the .csv data in OpenOffice. Instead, I downloaded the data from the internet and edited it in the plain Notepad++ editor, and then it works fine. This solution may not help you, but you may need to change the text editor or program you use for .csv files.

Error:
pandas version 0.23.4
Running the same code as above, I hit the same problem.

After running the code I get:

'Bairro' not in index

Code:
`import pandas as pd
import numpy as np

cols = pd.Index(['Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
'CNAEs_secundários', 'Pessoas', 'percent'],
dtype='object')
delivery = pd.DataFrame(np.random.random(size=(5, len(cols))), columns=cols)

delivery = delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço','Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]
`

๋‚˜๋Š” ๋˜ํ•œ์ด ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
์—ด์˜ ์ด๋ฆ„์€ correctlz์ด์ง€๋งŒ csv ํŒŒ์ผ๊ณผ ํ•จ๊ป˜ seaborn์„ ์‚ฌ์šฉํ•  ๋•Œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค(๋‚ด ์—ด์€ ์šฐ๋ฆฌ์˜ ์ธ๋ฑ์Šค์ž„).

import seaborn as sns
import pandas as pd
Data = pd.read_csv('test.csv',delimiter=',') 
sns.lmplot(x='predLabel', y='trueLabel', data=Data)

Error message:
KeyError: "['predLabel' 'trueLabel'] not in index"

I have the same problem.
The column names are correct, but I get the error when using seaborn with a csv file (my columns are reported as not in the index).

import seaborn as sns
import pandas as pd
df = pd.read_csv('lawma1.csv', index_col =[0, 1], delimiter=', ')
sns.lmplot(x='WEEK1', y='FLEET', data=df).savefig('law.png')

Error message:
KeyError: "['FLEET'] not in index"
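
One thing worth checking in a snippet like the one above, offered as an assumption because the CSV itself is not shown: `index_col` moves the listed columns out of `df.columns` and into the index, so code that looks them up by column name cannot find them. A sketch with made-up data; the `REGION` column and the values are hypothetical.

```python
import io
import pandas as pd

# Made-up CSV standing in for the real file.
csv_text = "WEEK1,REGION,FLEET\n1,north,10\n2,south,12\n"

# index_col=[0, 1] turns the first two columns into a MultiIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
columns_with_index = df.columns.tolist()
print(columns_with_index)   # only 'FLEET' is left as a column

# reset_index() moves the index levels back into regular columns.
flat = df.reset_index()
columns_flat = flat.columns.tolist()
print(columns_flat)
```

Printing `df.columns.tolist()` right after `read_csv` is a quick way to see which names are actually available to the plotting call.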

๋‚˜๋Š”์ด ์˜ค๋ฅ˜๊ฐ€ ์žˆ์—ˆ๊ณ  ์  "."์ด ์žˆ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ์—ด ์ด๋ฆ„์˜ ๋์—์„œ ์ œ๊ฑฐํ•œ ํ›„์— ์ž‘๋™ํ–ˆ์Šต๋‹ˆ๋‹ค.
podaci = pd.read_csv('data/fifa19a.csv', names=['id', 'ime', 'godine', 'ocjena', 'potencijal.', 'bodovi', 'stopalo', 'placa_tis_eur', 'cijena_mil_eur'])
"potencijal"๋ž€์— ์ด๋Ÿฐ ์‹์ด์—ˆ์Šต๋‹ˆ๋‹ค.
๋ฒ„๊ทธ ์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‚˜๋Š” ๊ฐ™์€ ๋ฌธ์ œ์— ์ง๋ฉดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚˜๋Š” ์‹ฌ์ง€์–ด print(X.columns)๋ฅผ ์‚ฌ์šฉํ–ˆ๊ณ  ์ธ๋ฑ์Šค 'exposure_end'๋ฅผ ๋ณด์—ฌ์ฃผ์—ˆ์ง€๋งŒ ๋‚ด๊ฐ€ ๊ทธ๊ฒƒ์„ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ centroids_new=X.groupby(["clusters"]).mean()[[" Exposure_end","Duration"]] ์ธ๋ฑ์Šค์— ์—†๋Š” 'exposure_end' ์˜ค๋ฅ˜๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. ์ง€๋‚œ 2์‹œ๊ฐ„ ๋™์•ˆ ์—ฌ๊ธฐ์— ๊ฐ‡ํ˜”์Šต๋‹ˆ๋‹ค. ๋„์™€์ฃผ์„ธ์š”.

๋‚ด ๋ฌธ์ œ์— ๋Œ€ํ•œ ํ•ด๊ฒฐ์ฑ…์„ ์ฐพ์•˜์Šต๋‹ˆ๋‹ค.
๋‚˜๋Š” centroids_new=X.groupby(["clusters"]).mean()[["exposure_end","Duration"]] ์œ„์˜ ์ง„์ˆ ์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์—ˆ๋Š”๋ฐ ์ด ์ง„์ˆ  ์œ„์— x.mean(axis=1)์„ ์‚ฌ์šฉํ•œ ๋‹ค์Œ ์ง„์ˆ ์„ ์‚ฌ์šฉ
centroids_new=X.groupby(["clusters"]).mean()[["exposure_end","Duration"]] ํ‰๊ท  ์—†์ด ์ž˜ ์ž‘๋™ํ–ˆ์Šต๋‹ˆ๋‹ค. groupby์™€ ํ•จ๊ป˜ ์ž‘๋™ํ•˜์ง€ ์•Š์•„ ๋‘ ๋‹จ๊ณ„๋กœ ์ˆ˜ํ–‰ํ•ด์•ผํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด์ „ ๋ฌธ์—์„œ ์ถ•์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์—ˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋ฆฌ๊ณ  ๊ทธ๊ฒƒ์ด ์ผ์–ด๋‚œ ์ฃผ๋œ ๋ฌธ์ œ๋Š” ์ถ•์ด 1๋กœ ์„ค์ •๋˜์—ˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๋ฌธ์ œ์— ๋Œ€ํ•œ ํ•ด๊ฒฐ์ฑ…์„ ์ฐพ์•˜๊ณ  ์™„๋ฒฝํ•˜๊ฒŒ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

csv ํŒŒ์ผ์ด ' , ' ๋˜๋Š” ' ๋กœ ๊ตฌ๋ถ„๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค. ' . ์ œ ๊ฒฝ์šฐ์—๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ' , '๋กœ ๊ตฌ๋ถ„๋˜์—ˆ์ง€๋งŒ ์ €๋Š” ' ; '.

๊ทธ๋ž˜์„œ ๊ฐ„๋‹จํžˆ ์ถ”๊ฐ€

Df= pd.read_csv('C:\Users\user\Desktop\data.csv', sep=", ")
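
To illustrate the delimiter point, here is a small sketch with in-memory data (the file contents are made up). When `sep` does not match the file, pandas reads each line as a single column, so every expected column name is missing.

```python
import io
import pandas as pd

# Made-up, comma-separated data.
csv_text = "id,name,score\n1,ana,7\n2,rui,9\n"

# Wrong separator: the whole header becomes one column name.
wrong = pd.read_csv(io.StringIO(csv_text), sep=";")
wrong_columns = wrong.columns.tolist()
print(wrong_columns)

# Matching separator: three separate columns.
right = pd.read_csv(io.StringIO(csv_text), sep=",")
right_columns = right.columns.tolist()
print(right_columns)
```

A column list with a single long name containing commas or semicolons is a strong hint that the separator is wrong.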

"column " and "column"

are two different things. The first one contains a space. Just include the space in the name wherever the header has one.

Example:
df["column "] worked for me.
