Pandas: Use multi-row inserts for massive speedups on to_sql over high latency connections

Created on 1 Dec 2014  ·  48 Comments  ·  Source: pandas-dev/pandas

I have been trying to insert ~30k rows into a mysql database using pandas-0.15.1, oursql-0.9.3.1 and sqlalchemy-0.9.4. Because the machine is across the Atlantic from me, calling data.to_sql was taking more than 1 hour to insert the data. On inspecting with wireshark, the issue is that it is sending an insert for every row, then waiting for the ACK before sending the next, and, long story short, the ping times are killing me.

However, following the directions from SQLAlchemy, I changed

def _execute_insert(self, conn, keys, data_iter):
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement(), data)

to

def _execute_insert(self, conn, keys, data_iter):
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

and the entire operation completes in less than a minute. (To save you a click, the difference is between multiple calls to insert into foo (columns) values (rowX) and one massive insert into foo (columns) VALUES (row1), (row2), (row3).) Given how often people are likely to use pandas to insert large volumes of data, this feels like a huge win that would be great to include more widely.
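To make the difference between the two forms concrete, here is a minimal sketch using only Python's built-in sqlite3 module instead of pandas or SQLAlchemy. The tables foo_single and foo_multi and their columns are made up for illustration. Against a local in-memory database there is no network round trip, so this only shows the shape of the statements, not the speedup.

```python
import sqlite3

rows = [(1, "a"), (2, "b"), (3, "c")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_single (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE foo_multi (id INTEGER, name TEXT)")

# Form 1: the same single-row INSERT executed once per row.
conn.executemany("INSERT INTO foo_single (id, name) VALUES (?, ?)", rows)

# Form 2: one statement, INSERT ... VALUES (?, ?), (?, ?), (?, ?),
# with the parameters of all rows flattened into one list.
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat_params = [value for row in rows for value in row]
multi_sql = "INSERT INTO foo_multi (id, name) VALUES " + placeholders
conn.execute(multi_sql, flat_params)

single = conn.execute("SELECT id, name FROM foo_single ORDER BY id").fetchall()
multi = conn.execute("SELECT id, name FROM foo_multi ORDER BY id").fetchall()
```

Both tables end up with the same content; only the number of statements sent to the database differs.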

Some challenges:

  • Not every database supports multi-row inserts (SQLite and SQLServer didn't in the past, though they do now). I don't know how to check for this via SQLAlchemy.
  • The MySQL server I was using didn't allow me to insert the data all in one go, so I had to set the chunksize (5k worked fine, but I guess the full 30k was too much). If we made this the default insert, most people would have to add a chunksize (which might be hard to calculate, as it may be determined by the maximum packet size of the server).
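The chunking described in the second point can be sketched as follows, again with plain sqlite3 so that it runs anywhere. The helper names chunked and insert_in_chunks are hypothetical, and the table and column names are assumed to be trusted identifiers, since they are interpolated into the SQL text.

```python
import sqlite3

def chunked(rows, chunksize):
    """Yield successive lists of at most `chunksize` rows."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]

def insert_in_chunks(conn, table, columns, rows, chunksize):
    """Send one multi-row INSERT per chunk; return the number of statements."""
    row_template = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for chunk in chunked(rows, chunksize):
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, ", ".join(columns), ", ".join([row_template] * len(chunk)))
        # Flatten the chunk so each "?" receives one value.
        conn.execute(sql, [value for row in chunk for value in row])
        statements += 1
    return statements

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
rows = [(i, i * 2) for i in range(1050)]
n_statements = insert_in_chunks(conn, "t", ["a", "b"], rows, chunksize=100)
n_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

With 1050 rows and a chunksize of 100 this sends 11 statements instead of 1050, which is where the saving on a high-latency connection would come from.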

The easiest way to do this would be to add a multirow= boolean parameter (default False) to the to_sql function, and then leave the user responsible for setting the chunksize, but perhaps there's a better way?

Thoughts?

IO SQL Performance

Most helpful comment

I've figured out how to monkey-patch; it might be useful to someone else. Have this code before importing pandas.

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

SQLTable._execute_insert = _execute_insert

All 48 comments

This seems reasonable. Thanks for investigating this!

For the implementation, it will depend on how sqlalchemy deals with database flavors that do not support this. I can't test this at the moment, but it seems that sqlalchemy raises an error (e.g. http://stackoverflow.com/questions/23886764/multiple-insert-statements-in-mssql-with-sqlalchemy). Also, if it has the consequence that a lot of people will have to set chunksize, it is indeed not a good idea to make this the default (unless we set chunksize to a value by default).
So adding a keyword seems maybe better.

@artemyk @mangecoeur @hayd @danielballan

Apparently SQLAlchemy has a flag dialect.supports_multivalues_insert (see e.g. http://pydoc.net/Python/SQLAlchemy/0.8.3/sqlalchemy.sql.compiler/ , possibly called supports_multirow_insert in other versions, https://www.mail-archive.com/[email protected]/msg202880.html ).

Since this has the potential to speed up inserts a lot, and we can check for support easily, I'm thinking maybe we could do it by default, and also set chunksize to a default value (e.g. 16kb chunks... not sure what's too big in most situations). If the multi-row insert fails, we could throw an exception suggesting lowering the chunksize?

Now I just need to persuade the SQLAlchemy folks to set supports_multivalues_insert to true on SQL Server >2005 (I hacked it into the code and it works fine, but it's not on by default).

On a more on-topic note, I think the chunksize could be tricky. On my mysql setup (which I probably configured to allow large packets), I can set chunksize=5000. On my SQLServer setup, 500 was too large, but 100 worked fine. However, it's probably true that most of the benefit of this technique comes from going from inserting 1 row at a time to 100, rather than from 100 to 1000.

What if chunksize=None meant "adaptively choose a chunksize"? Attempt something like 5000, 500, 50, 1. Users could turn this off by specifying a chunksize. If the overhead from these attempts is too large, I like @maxgrenderjones' suggestion: chunksize=10 is a better default than chunksize=1.
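A rough sketch of what "adaptively choose a chunksize" could look like, written against sqlite3 so it is runnable without a server. This is not how pandas behaves; insert_multirow and insert_adaptive are hypothetical names, and the only failure handled is sqlite3.OperationalError (a real backend would raise its own exception type). Each attempt runs in one transaction, so a failed attempt leaves no partial data behind.

```python
import sqlite3

def insert_multirow(conn, table, columns, rows):
    """Insert `rows` with a single multi-row INSERT statement."""
    row_template = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_template] * len(rows)))
    conn.execute(sql, [value for row in rows for value in row])

def insert_adaptive(conn, table, columns, rows, candidates=(5000, 500, 50, 1)):
    """Try each candidate chunksize in turn; return the first one that works."""
    for size in candidates:
        try:
            with conn:  # commit on success, roll back the whole attempt on error
                for start in range(0, len(rows), size):
                    insert_multirow(conn, table, columns, rows[start:start + size])
            return size
        except sqlite3.OperationalError:
            continue  # e.g. "too many SQL variables": retry with smaller chunks
    raise RuntimeError("no candidate chunksize worked")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c0, c1, c2, c3, c4, c5, c6, c7, c8, c9)")
columns = ["c{}".format(i) for i in range(10)]
rows = [tuple(range(10)) for _ in range(12000)]
used = insert_adaptive(conn, "t", columns, rows)
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Which candidate ends up being used depends on the limits of the SQLite build, which illustrates the point made in the thread: the workable chunksize is a property of the backend, not of pandas.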

On that last comment, "chunksize=10 is a better default than chunksize=1" -> that is not fully true, I think. The current situation is to do _one_ execute statement that consists of many single-row insert statements (which is not a chunksize of 1), while chunksize=10 would mean doing a lot of execute statements, each with one multi-row insert.
And I don't know if this is necessarily faster; much depends on the situation. For example, with the current code and a local sqlite database:

In [4]: engine = create_engine('sqlite:///:memory:') #, echo='debug')

In [5]: df = pd.DataFrame(np.random.randn(50000, 10))

In [6]: %timeit df.to_sql('test_default', engine, if_exists='replace')
1 loops, best of 3: 956 ms per loop

In [7]: %timeit df.to_sql('test_default', engine, if_exists='replace', chunksize=10)
1 loops, best of 3: 2.23 s per loop

But of course this does not use the multi-row feature.

We could start by adding this feature through a new multirow=True keyword (with a default of False for now), and then we can always see later whether we can enable it by default.

@maxgrenderjones @nhockham interested in doing a PR to add this?

@jorisvandenbossche I think it's risky to start adding keyword arguments to address specific performance profiles. If you can guarantee that it's faster in all cases (if necessary by having it determine the best method based on the inputs), then you don't need a flag at all.

Different DB setups may have different performance optimizations (different DB performance profiles, local vs network, big memory vs fast SSD, etc.), and if you start adding keyword flags for each one it becomes a mess.

I would suggest creating subclasses of SQLDatabase and SQLTable to address performance-specific implementations; they would be used through the object-oriented API. Perhaps a "backend switching" method could be added, but frankly using the OO API is very simple, so this is probably overkill for what is already a specialized use case.

I created such a subclass for loading large datasets into Postgres (it's actually much faster to save the data to CSV and then use the built-in, non-standard COPY FROM sql command than to use inserts, see https://gist.github.com/mangecoeur/1fbd63d4758c2ba0c470#file-pandas_postgres-py). To use it you just do PgSQLDatabase(engine, <args>).to_sql(frame, name, <kwargs>).
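The CSV-then-COPY idea can be sketched without writing a file, by streaming an in-memory CSV buffer to COPY ... FROM STDIN. This is not the code from the gist; copy_rows is a hypothetical helper, and it assumes that `cursor` is a psycopg2 cursor, whose copy_expert(sql, file) method does the streaming. The small recording class stands in for a real cursor so the sketch runs without a database.

```python
import csv
import io

def copy_rows(cursor, table, columns, rows):
    """Stream rows to PostgreSQL with COPY ... FROM STDIN instead of INSERT.

    `cursor` is assumed to be a psycopg2 cursor (it must provide copy_expert).
    `table` and `columns` are assumed to be trusted identifiers.
    """
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # None is written as an empty field
    buf.seek(0)
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns))
    cursor.copy_expert(sql, buf)
    return sql

class RecordingCursor:
    """Stand-in for a psycopg2 cursor; it only records what it is given."""
    def __init__(self):
        self.calls = []

    def copy_expert(self, sql, file):
        self.calls.append((sql, file.read()))

cursor = RecordingCursor()
statement = copy_rows(cursor, "demo", ["id", "name"], [(1, "a"), (2, None)])
sent_sql, sent_payload = cursor.calls[0]
```

In PostgreSQL's CSV format an unquoted empty field is read as NULL by default, which is why None values survive the round trip in this form.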

Just for reference, I tried running the code by @jorisvandenbossche (Dec 3rd post) using the multi-row feature. It's quite a bit slower. So the speed trade-offs here are not trivial:

In [4]: engine = create_engine('sqlite:///:memory:') #, echo='debug')

In [5]: df = pd.DataFrame(np.random.randn(50000, 10))

In [6]: %timeit df.to_sql('test_default', engine, if_exists='replace')
1 loops, best of 3: 1.05 s per loop

In [7]: from pandas.io.sql import SQLTable

In [8]: def _execute_insert(self, conn, keys, data_iter):
   ...:         data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
   ...:         conn.execute(self.insert_statement().values(data))
   ...:     

In [9]: SQLTable._execute_insert = _execute_insert

In [10]: reload(pd)
Out[10]: <module 'pandas' from '/usr/local/lib/python2.7/site-packages/pandas/__init__.pyc'>

In [11]: %timeit df.to_sql('test_default', engine, if_exists='replace', chunksize=10)
1 loops, best of 3: 9.9 s per loop

Also, I agree that adding keyword parameters is risky. However, the multi-row feature seems pretty fundamental. Also, 'monkey-patching' is probably not more robust to API changes than keyword parameters.

It's as I suspected. Monkey patching isn't the solution I was suggesting. Rather, we would ship a number of performance-oriented subclasses that the informed user could use through the OO interface (to avoid loading the functional API with too many options).

As per the initial ticket title, I don't think this approach is going to be preferable in all cases, so I wouldn't make it the default. However, without it, pandas to_sql is unusable for me, so it's important enough for me to keep requesting the change. (It has also become the first thing I change when I upgrade my pandas version.) As for sensible chunksize values, I don't think there is one true n, as the packet size will depend on how many columns there are (and what's in them) in hard-to-predict ways. Unfortunately, SQLServer fails with an error message that looks totally unrelated if you set the chunksize too high (which is probably why multi-row inserts aren't turned on there except with a patch in SQLAlchemy), but it works fine with mysql. Users may need to experiment to determine what value of n is likely to result in an acceptably large packet size (for whatever their backing database is). Having pandas choose n is likely to get us much further down into the implementation details than we want to be (i.e. the opposite direction from the maximum-possible-abstraction SQLAlchemy approach).

In short, my recommendation would be to add it as a keyword, with some helpful commentary about how to use it. This wouldn't be the first time a keyword was used to select an implementation (see http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html), though that perhaps isn't the best example, as I didn't have the first idea what raw= means, even having read the explanation!

I have noticed that it also consumes a huge amount of memory. For example, a 1.6+ GB DataFrame with some 700,000 rows and 301 columns requires almost 34 GB during the insert! That is over-the-top inefficient. Any ideas on why that might be the case? Here is a screen clip:

image

Hi guys,
any progress on this issue?

I am trying to insert around 200K rows using to_sql, but it takes forever and consumes a huge amount of memory! Using chunksize helps with the memory, but the speed is still very slow.

My impression, looking at the MSSQL DBase trace, is that the insertion is actually performed one row at a time.

The only viable approach right now is to dump to a csv file on a shared folder and use BULK INSERT. But that is very annoying and inelegant!

@andreacassioli You can use odo to insert a DataFrame into an SQL database through an intermediary CSV file. See "Loading CSVs into SQL Databases".

I don't think you can come even close to BULK INSERT performance using ODBC.

@ostrokach thank you, indeed I am using csv files now. If I could get close, I would trade a bit of time for simplicity!

I thought this might help somebody:
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

@indera pandas does not use the ORM, only sqlalchemy Core (which is what that doc entry suggests using for large inserts).

Is there any consensus on how to work around this in the meantime? I'm inserting several million rows into postgres and it takes forever. Is CSV / odo the way to go?

@russlamb a practical way to solve this problem is simply to bulk upload. This is somewhat db-specific though, so odo has solutions for postgresql (and maybe mysql), I think. For something like sqlserver you have to 'do this yourself' (IOW you have to write it).

For sqlserver I used the FreeTDS driver ( http://www.freetds.org/software.html and https://github.com/mkleehammer/pyodbc ) with SQLAlchemy entities, which gives me very fast inserts (20K rows per data frame):

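# The original snippet omitted its imports; `db` is assumed to be the sqlalchemy module.
import logging
from urllib import parse

import sqlalchemy as db
from sqlalchemy.ext.declarative import declarative_base

logger = logging.getLogger(__name__)
Base = declarative_base()

# DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASS are assumed to be defined elsewhere.
DB_POOL = None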


class DemographicEntity(Base):
    __tablename__ = 'DEMOGRAPHIC'

    patid = db.Column("PATID", db.Text, primary_key=True)
    birth_date = db.Column("BIRTH_DATE", db.Date)
    birth_time = db.Column("BIRTH_TIME", db.Text(5))
    sex = db.Column("SEX", db.Text(2))

def get_db_url(db_host, db_port, db_name, db_user, db_pass):
    params = parse.quote(
        "Driver={{FreeTDS}};Server={};Port={};"
        "Database={};UID={};PWD={};"
        .format(db_host, db_port, db_name, db_user, db_pass))
    return 'mssql+pyodbc:///?odbc_connect={}'.format(params)

def get_db_pool():
    """
    Create the database engine connection.
    @see http://docs.sqlalchemy.org/en/latest/core/engines.html

    :return: Dialect object which can either be used directly
            to interact with the database, or can be passed to
            a Session object to work with the ORM.
    """
    global DB_POOL

    if DB_POOL is None:
        url = get_db_url(db_host=DB_HOST, db_port=DB_PORT, db_name=DB_NAME,
                         db_user=DB_USER, db_pass=DB_PASS)
        DB_POOL = db.create_engine(url,
                                   pool_size=10,
                                   max_overflow=5,
                                   pool_recycle=3600)

    try:
        DB_POOL.execute("USE {db}".format(db=DB_NAME))
    except db.exc.OperationalError:
        logger.error('Database {db} does not exist.'.format(db=DB_NAME))

    return DB_POOL


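def save_frame(df):
    # Insert all rows of `df` in one executemany-style call on the entity's table.
    # The original referenced undefined names `df` and `entity`; DemographicEntity
    # is assumed to be the intended entity.
    db_pool = get_db_pool()
    records = df.to_dict(orient='records')
    result = db_pool.execute(DemographicEntity.__table__.insert(), records)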

"Is CSV / odo the way to go?"

This solution will almost always be faster, I think, regardless of the multi-row / chunksize settings.

But, @russlamb, it is always interesting to hear whether such a multi-row keyword would be an improvement in your case. See e.g. https://github.com/pandas-dev/pandas/issues/8953#issuecomment-76139975 for a way to easily test this out.

I think there is agreement that we want to have a way to specify this (without necessarily changing the default). So if somebody wants to make a PR for this, that is certainly welcome.
There was only some discussion on how to add this ability (a new keyword vs a subclass using the OO API).

@jorisvandenbossche The document I linked above mentions: "Alternatively, the SQLAlchemy ORM offers the Bulk Operations suite of methods, which provide hooks into subsections of the unit of work process in order to emit Core-level INSERT and UPDATE constructs with a small degree of ORM-based automation."

What I am suggesting is to implement a sqlserver-specific version of to_sql which, under the hood, uses the SQLAlchemy ORM for speedups, as in the code I posted above.

This was proposed before. The way to go is to implement a pandas sql class optimised for a backend. I posted a gist in the past for using the postgres COPY FROM command, which is much faster. However, something similar is now available in odo, and built in a more robust way. There isn't much point IMHO in duplicating work from odo.

Also noticed you mentioned sqlalchemy could use core instead. Unless something has changed a lot, only sqlalchemy core is used in any case. If you want to speed things up more than core allows, you have to go to lower-level, db-specific optimisation.

Is this getting fixed/taken care of? As of now, inserting pandas dataframes into a SQL db is extremely slow unless it's a toy dataframe. Let's decide on a solution and push it forward?

@dfernan As mentioned above, you may want to look at odo. Using an intermediary CSV file will always be much faster than going through sqlalchemy, no matter what kind of improvements happen here...

@ostrokach, I'm unconvinced that odo's behavior is what the typical Pandas user wants. Multi-row inserts over ODBC are probably fast enough for most analysts.

To speak for myself, I just spent a few hours switching from the monkey patch above to odo. Plain pandas runtime was 10+ hours, RBAR (row by agonizing row). The monkey patch runs in 2 on the same data set.
The odo/CSV route was faster, as expected, but not by enough to make it worth the effort. I fiddled with CSV conversion issues I didn't care much about, all in the name of avoiding the monkey patch. I'm importing 250K rows from ~10 mysql and PG DBs into a common area in Postgres, for NLP analysis.

I'm intimately familiar with the bulk loading approaches. I've used them for years in cases where I'm starting with CSV data. They have some key limitations:

  1. For df->CSV->Postgres, shell access and an scp step are needed to get the CSV onto the PG host. It looks like @mangecoeur has gotten around this with a stream to STDIN.
  2. For my purpose (250K rows of comments, with lots of special cases in the text content) I struggled to get the CSV parameters right. I didn't want the performance gains badly enough to keep investing in this.

I swapped back to the patch so I could get on with the analysis work.

I agree with @jorisvandenbossche and @maxgrenderjones. An option (not a default) to choose this would be immensely useful. @artemyk's point about dialect.supports_multivalues_insert might even make this a reasonable default.

I'd be glad to submit a PR if that would move this forward.

Just to add my experience with odo: it has not worked for MS SQL bulk inserts because of a known issue with encoding. IMHO multi-row inserts are a good practical solution for most people.

@markschwarz an option to make this work faster would be very welcome!

Tracing the queries using sqlite, I do seem to be getting multi-inserts when using chunksize:

2017-09-28 00:21:39,007 INFO sqlalchemy.engine.base.Engine INSERT INTO country_hsproduct_year (location_id, product_id, year, export_rca, import_value, cog, export_value, distance, location_level, product_level) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2017-09-28 00:21:39,007 INFO sqlalchemy.engine.base.Engine ((75, 1237, 1996, 1.7283086776733398, 273487116.0, 0.0, 514320160.0, 0.5413745641708374, 'country', '4digit'), (75, 1237, 1997, 1.7167805433273315, 312047528.0, 0.0, 592372864.0, 0.5314807891845703, 'country', '4digit'), (75, 1237, 1998, 1.2120152711868286, 341676961.0, 0.0, 468860608.0, 0.5472233295440674, 'country', '4digit'), (75, 1237, 1999, 1.236651062965393, 334604240.0, 0.0, 440722336.0, 0.5695921182632446, 'country', '4digit'), (75, 1237, 2000, 1.189828872680664, 383555023.0, 0.0, 426384832.0, 0.5794379711151123, 'country', '4digit'), (75, 1237, 2001, 0.9920380115509033, 374157144.0, 0.3462945520877838, 327031392.0, 0.6234743595123291, 'country', '4digit'), (75, 1237, 2002, 1.0405025482177734, 471456583.0, 0.0, 377909376.0, 0.6023964285850525, 'country', '4digit'), (75, 1237, 2003, 1.147829532623291, 552441401.0, 0.0, 481313504.0, 0.5896202325820923, 'country', '4digit')  ... displaying 10 of 100000 total bound parameter sets ...  (79, 1024, 2015, 0.0, None, 0.8785018920898438, 0.0, 0.9823430776596069, 'country', '4digit'), (79, 1025, 1995, 0.0, None, 0.5624096989631653, 0.0, 0.9839603304862976, 'country', '4digit'))

(without the monkey patch, that is)

Interestingly, with the monkey patch, it breaks when I give it a chunksize of 10^5, but not 10^3. The error is "too many SQL variables" on sqlite.
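The arithmetic behind this error is simple: in a multi-row insert every value becomes one bound parameter, so a chunk needs rows × columns parameters. SQLite's compile-time limit, SQLITE_MAX_VARIABLE_NUMBER, defaults to 999 before SQLite 3.32.0 and to 32766 from 3.32.0 on, and some builds raise it further, so the limit of a given installation has to be checked rather than assumed. A minimal sketch, with max_rows_per_statement as a hypothetical helper name:

```python
def max_rows_per_statement(n_columns, max_variables=999):
    """Largest chunksize whose multi-row INSERT stays within the parameter limit.

    `max_variables` is the backend's bound-parameter limit; 999 is only the
    historical SQLite default and is an assumption about your build.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return max(1, max_variables // n_columns)

# A frame with 10 columns written with index=False:
conservative = max_rows_per_statement(10)          # historical default limit
newer_sqlite = max_rows_per_statement(10, 32766)   # default since SQLite 3.32.0
```

Note that to_sql writes the DataFrame index as an extra column unless index=False is passed, so it has to be counted in n_columns.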

@makmanalp, I haven't traced on PG to check for that behavior yet, but I nearly always set chunksize on insert. In my example above, I randomly set it to 5 values between 200 and 5000. I didn't see drastic elapsed-time differences among those choices without the monkey patch. With the patch, elapsed time dropped ~80%.

https://github.com/pandas-dev/pandas/issues/8953#issuecomment-76139975

Is this monkey patch still working? I tried it on MS SQL Server but didn't see an improvement. Also, it throws an exception:

(pyodbc.Error) ('07002', '[07002] [Microsoft][SQL Server Native Client 11.0]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')

@hangyao I think that patch is implementation-specific, one of those things that the python DBAPI leaves to the DBAPI driver to handle. So it could be faster or not. RE: the syntax error, I'm unsure about that.

I am thinking about adding the function below to the /io/sql.py file, inside the _engine_builder function within a new IF clause on line 521, making the 'new' _engine_builder the snippet below. I've tested it briefly in my environment and it works great for MSSQL databases, achieving >100x speed-ups. I haven't tested it on other databases yet.

The thing keeping me from making a PR is that I think it takes more effort to make this neat and safe than just inserting it as below. This might not always be the desired behavior, and adding a boolean switch that turns this setting on and off (e.g. fast_executemany=True) to to_sql seemed like too large an endeavour to just do without asking.

So my questions are:

  • Does the function below work, and does it also increase the INSERT speed for PostgreSQL?

  • Does pandas even want this snippet in its source? If so:

  • Is it desired to add this function to the default sql.py functionality, or is there a better place to add this?

Love to hear some comments.

Source for the answer: https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc/48861231#48861231

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

def _engine_builder(con):
    """
    Returns a SQLAlchemy engine from a URI (if con is a string)
    else it just return con without modifying it.
    """
    global _SQLALCHEMY_INSTALLED
    if isinstance(con, string_types):
        try:
            import sqlalchemy
        except ImportError:
            _SQLALCHEMY_INSTALLED = False
        else:
            con = sqlalchemy.create_engine(con)

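    # Register on `con` (an engine at this point); the original snippet
    # referenced the name `engine`, which is not defined in this function.
    @event.listens_for(con, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    return con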
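The listener logic can be tried in isolation, outside pandas. In the sketch below the handler is a plain function exercised with a stand-in cursor object, so it runs without SQLAlchemy, pyodbc or a database; the name enable_fast_executemany is made up for illustration. The registration call is shown only as a comment and assumes an existing mssql+pyodbc engine named engine. As far as I know, SQLAlchemy 1.3 and later also accept create_engine(url, fast_executemany=True) for that dialect, which would make the listener unnecessary.

```python
from types import SimpleNamespace

def enable_fast_executemany(conn, cursor, statement, params, context, executemany):
    """before_cursor_execute handler: switch on pyodbc's fast_executemany
    only for executemany-style calls."""
    if executemany:
        cursor.fast_executemany = True

# With SQLAlchemy installed, registration would look like this (not run here):
#     from sqlalchemy import event
#     event.listen(engine, "before_cursor_execute", enable_fast_executemany)

# Stand-in cursors so the handler can be exercised without a driver.
bulk_cursor = SimpleNamespace(fast_executemany=False)
single_cursor = SimpleNamespace(fast_executemany=False)
enable_fast_executemany(None, bulk_cursor, "INSERT ...", [(1,), (2,)], None, True)
enable_fast_executemany(None, single_cursor, "SELECT 1", (), None, False)
```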

@tsktsktsk123 A PR related to this was merged recently: https://github.com/pandas-dev/pandas/pull/19664. I haven't looked at your post in detail yet, and it is certainly not exactly the same (it uses the supports_multivalues_insert attribute of the sqlalchemy engine), but I wanted to make sure you are aware of it in case it already helps.

That's great news! I haven't looked into the PR, but I will compare it this weekend and get back with the results. Thanks for the heads up.

I was just trying 0.23.0 RC2 (on postgresql) and, instead of getting a performance boost, my script got significantly slower. The DB query got much faster, but measuring the time for to_sql() it actually became up to 1.5 times slower (e.g. from 7 to 11 seconds)...

I'm not sure the slowdown comes from this PR though, as I only tested the RC.

Has anyone else experienced the same problem?

@schettino72 How much data were you inserting?

Around 30K rows with 10 columns. But really, almost anything I try is slower (the SQL is faster, but overall it is slower). It is creating a huge SQL statement with value interpolation for EVERY value. Something like

 %(user_id_m32639)s, %(event_id_m32639)s, %(colx_m32639)s,

I found d6tstack much simpler to use. It's a one-liner, d6tstack.utils.pd_to_psql(df, cfg_uri_psql, 'benchmark', if_exists='replace'), and it's much faster than df.to_sql(). It supports postgres and mysql. See https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb

๋‚˜๋Š” ์›์ˆญ์ด ํŒจ์น˜ ์†”๋ฃจ์…˜์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค:

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    # Build one {column: value} dict per row, then send a single multi-row INSERT.
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

# Replace the private pandas method with the multi-row version.
SQLTable._execute_insert = _execute_insert

This worked for a while, but now I get an error:

TypeError: insert_statement() missing 2 required positional arguments: 'data' and 'conn'

Is anyone else getting this? I'm using Python 3.6.5 (Anaconda) and pandas==0.23.0.
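
For readers hitting the same TypeError: pandas 0.24 and later accept a callable for the `method` argument of `to_sql`, called once per chunk as `method(pd_table, conn, keys, data_iter)`, which avoids patching private methods. Below is a minimal sketch of such a callable, assuming `con` is a SQLAlchemy engine so that `pd_table.table` is a SQLAlchemy Table. The function name `insert_multi`, the table name, and the chunk size are placeholders. The built-in `method="multi"` does essentially the same thing.

```python
def insert_multi(pd_table, conn, keys, data_iter):
    """Insert one chunk of rows with a single multi-row INSERT statement.

    pandas calls this once per chunk. `pd_table.table` is assumed to be the
    SQLAlchemy Table that pandas builds when `con` is a SQLAlchemy engine.
    """
    # Turn each row tuple into a {column: value} mapping.
    data = [dict(zip(keys, row)) for row in data_iter]
    # One statement of the form INSERT ... VALUES (...), (...), ...
    conn.execute(pd_table.table.insert().values(data))
    return len(data)


# Hypothetical usage (table name, engine and chunk size are placeholders):
# df.to_sql("my_table", engine, if_exists="append", index=False,
#           method=insert_multi, chunksize=5000)
```

Because the callable only relies on the documented `method` hook, it should be less likely to break across pandas releases than an override of `SQLTable._execute_insert`.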

Is this going to be fixed? Currently df.to_sql is extremely slow and cannot be used at all for many practical use cases. The Odo project seems to have been abandoned already.
I have the following use cases in financial time series where df.to_sql is pretty much unusable:
1) Copying historical csv data to a postgres database - I can't use df.to_sql and had to write custom code around the psycopg2 copy_from functionality.
2) Streaming data (arriving in batches of ~500-3000 rows per second) to be dumped into a postgres database - again, df.to_sql performance is quite disappointing, because it takes too long to insert these natural batches of data into postgres.
The only place where I find df.to_sql useful now is creating tables automatically!!! - which is not the use case it was designed for.
I'm not sure whether other people share the same concern, but this issue needs some attention for the "dataframe-to-database" interface to work smoothly.
Looking forward to it.
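
The copy_from approach mentioned in use case 1 generally relies on first serializing the rows into a file-like buffer. A minimal sketch of that step, using only the standard library, might look like the following. The helper name `rows_to_csv_buffer`, the `prices` table, its columns, and the sample values are all made up for illustration, and the psycopg2 call appears only as a comment because no database connection is created here.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize an iterable of row tuples into an in-memory CSV text buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    buf.seek(0)  # rewind so a consumer reads from the start
    return buf


# Made-up sample rows: (day, ticker, close)
rows = [("2020-01-01", "AAA", 1.5), ("2020-01-02", "AAA", 2.5)]
buf = rows_to_csv_buffer(rows)
csv_text = buf.getvalue()

# With a live psycopg2 cursor `cur` (not created here), the buffer could be
# streamed to the server in a single COPY command, for example:
#   cur.copy_expert("COPY prices (day, ticker, close) FROM STDIN WITH CSV", buf)
print(csv_text)
```

The appeal of this route is that the whole batch travels as one stream rather than as one INSERT per row, which suits the 500-3000 row batches described above.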

I get an error when I try to perform a multi-insert into a SQLite database.

This is my code:
df.to_sql("financial_data", con=conn, if_exists="append", index=False, method="multi")

And this is the error:

Traceback (most recent call last):

  File "<ipython-input-11-cf095145b980>", line 1, in <module>
    handler.insert_financial_data_from_df(data, "GOOG")

  File "C:\Users\user01\Documents\Code\FinancialHandler.py", line 110, in insert_financial_data_from_df
    df.to_sql("financial_data", con=conn, if_exists="append", index=False, method="multi")

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2531, in to_sql
    dtype=dtype, method=method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
    chunksize=chunksize, dtype=dtype, method=method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 1547, in to_sql
    table.insert(chunksize, method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 686, in insert
    exec_insert(conn, keys, chunk_iter)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 609, in _execute_insert_multi
    conn.execute(self.table.insert(data))

TypeError: insert() takes exactly 2 arguments (1 given)

Why is this happening? I'm using Python 3.7.3 (Anaconda), pandas 0.24.2 and sqlite3 2.6.0.

Thank you very much in advance!

@jconstanzo Can you open this as a new issue?
And if possible, can you provide a reproducible example? (e.g. a small example dataframe that shows the problem)

@jconstanzo I have the same issue here. Using method='multi' (in my case together with chunksize) seems to trigger this error when inserting into a SQLite database.

Unfortunately I can't really provide an example dataframe because my dataset is huge, which is why I'm using method and chunksize in the first place.
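
One plausible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: when `con` is a plain sqlite3 connection rather than a SQLAlchemy engine, `self.table` may be an ordinary Python list of SQL strings, in which case `self.table.insert(data)` resolves to `list.insert`, which needs both an index and a value. The sketch below reproduces only that Python-level failure and does not involve pandas; the list contents are made up.

```python
# Stand-in for what `self.table` might hold in the sqlite3 fallback path
# (an assumption for illustration): a list of SQL strings, not a Table object.
table = ["CREATE TABLE financial_data (price REAL)"]
data = [{"price": 1.0}, {"price": 2.0}]

try:
    # list.insert expects (index, object); passing only one argument fails.
    table.insert(data)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

If that reading is right, passing a SQLAlchemy engine (for example `sqlalchemy.create_engine("sqlite:///my.db")`, with a hypothetical file name) instead of a raw sqlite3 connection would avoid this code path.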

Sorry for the delay. I just opened an issue for this problem: https://github.com/pandas-dev/pandas/issues/29921
