Pandas: to_sql is too slow

Created on 1 Feb 2017  ·  24 comments  ·  Source: pandas-dev/pandas

Code Sample,

df_name.to_sql('table_name',
               schema='public',
               con=engine,
               index=False,
               if_exists='replace')

문제 μ„€λͺ…

λ‚˜λŠ” 500,000 ν–‰ 데이터 ν”„λ ˆμž„μ„ postgres AWS λ°μ΄ν„°λ² μ΄μŠ€μ— μ“°κ³  μžˆλŠ”λ° 데이터λ₯Ό ν‘Έμ‹œν•˜λŠ” 데 맀우 였랜 μ‹œκ°„μ΄ κ±Έλ¦½λ‹ˆλ‹€.

μƒλ‹Ήνžˆ 큰 SQL μ„œλ²„μ΄κ³  λ‚΄ 인터넷 연결이 μš°μˆ˜ν•˜λ―€λ‘œ λ¬Έμ œμ— κΈ°μ—¬ν•˜λŠ” κ²ƒμœΌλ‘œ λ°°μ œν–ˆμŠ΅λ‹ˆλ‹€.

이에 λΉ„ν•΄ csv2sql을 μ‚¬μš©ν•˜κ±°λ‚˜ λͺ…λ Ή μ€„μ—μ„œ cat 및 piping을 psql에 μ‚¬μš©ν•˜λŠ” 것이 훨씬 λΉ λ¦…λ‹ˆλ‹€.

IO SQL Usage Question

Most helpful comment

Add this code below engine = create_engine(connection_string):

from sqlalchemy import event

# Turn on the cursor's fast_executemany flag for every bulk (executemany) statement.
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

In my code, the to_sql function was taking 7 minutes to execute, and now it takes only 5 seconds.

All 24 comments

See here: http://stackoverflow.com/questions/33816918/write-large-pandas-dataframes-to-sql-server-database

With SQL Server you need to import via CSV with a bulk upload for efficiency.

You might find this useful: http://odo.pydata.org/en/latest/perf.html

ODO wouldn't work for me; it generates errors I wasn't able to fix. But d6tstack worked fine: https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb

Thanks @llautert!
That helped a lot!

# don't forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

이 μˆ˜μ • ν”„λ‘œκ·Έλž¨μ„ μ‹€ν–‰ν•˜λ €κ³ ν–ˆμ§€λ§Œ 였λ₯˜ λ©”μ‹œμ§€κ°€ ν‘œμ‹œλ©λ‹ˆλ‹€.

AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'fast_executemany'

무슨 일인지 μ•„λŠ” μ‚¬λžŒ μžˆλ‚˜μš”?

μ•ˆλ…•ν•˜μ„Έμš” @ tim-sauchuk, λ™μΌν•œ 였λ₯˜κ°€ λ°œμƒν–ˆμŠ΅λ‹ˆλ‹€. pandas.io.sql.py νŒŒμΌμ„ μ•½κ°„ νŽΈμ§‘ν•˜λŠ” λ“± ν›Œλ₯­ν•˜κ²Œ μž‘λ™ν•˜λŠ” μ†”λ£¨μ…˜μ„ μ°Ύμ•˜μ§€λ§Œ λ‹€μ‹œ κ°€μ Έ 였기 전에 __pycache__μ—μ„œ .pyc νŒŒμΌμ„ μ‚­μ œν•˜μ‹­μ‹œμ˜€. μ••μΆ• νŒŒμΌμ— μƒˆ 버전을 κΈ°λ‘ν•˜λŠ”μ§€ 확인)

https://github.com/pandas-dev/pandas/issues/8953

Issue #8953 that @bsaunders23 mentioned also shows a way to "monkey patch" (fix it at run time). I tried it, and a 20k dataset that took more than 10 minutes to upload then took only 4 seconds.

Does anyone know how I can implement this solution inside a class with a self.engine instance?

Works for me by referring to self.engine.

Example:

    self.engine = sqlalchemy.create_engine(connectionString, echo=echo)
    self.connection = self.engine.connect()

    @event.listens_for(self.engine, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        print("Listen before_cursor_execute - executemany: %s" % str(executemany))
        if executemany:
            cursor.fast_executemany = True
            cursor.commit()

Does not work for me. What pandas and sqlalchemy versions are you using?

I tried running sqlalchemy 1.2.4-py35h14c3975_0 and 1.2.11-py35h7b6447c_0,

but I am getting

AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'fast_executemany'

νŠΈμœ— λ‹΄μ•„ κ°€κΈ°

이 μ»¨ν…μŠ€νŠΈμ—μ„œ ν•¨μˆ˜ ν˜ΈμΆœμ€ μ–΄λ–»κ²Œ μƒκ²ΌμŠ΅λ‹ˆκΉŒ? 즉, ν…Œμ΄λΈ”μ„ μ„±κ³΅μ μœΌλ‘œ μ—…λ‘œλ“œν•˜κΈ° μœ„ν•΄ 인수둜 무엇을 μ‚¬μš©ν•˜κ³  μžˆμŠ΅λ‹ˆκΉŒ?

<# dont forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()>``

That AttributeError occurs because you are using psycopg2, which is a PostgreSQL driver. This issue and fix pertain to Microsoft SQL Server using the pyodbc driver.

What about adding the 'dtype' parameter?

방법을 μ•Œμ•„ λƒˆμŠ΅λ‹ˆκΉŒ?

pandas dataframe λ₯Ό μ €μž₯ν•˜λ €λŠ” 경우 정닡은 https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2 -batch-mode-fast-execution을 μ‚¬μš©ν•˜λŠ” κ²ƒμž…λ‹ˆλ‹€. pandas dataframe ~ postgres

Newer versions of pandas include a method parameter that can be set to 'multi'. This makes the code run much faster.
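To illustrate why method='multi' can help, here is a minimal sketch that uses the standard-library sqlite3 module rather than pandas itself. It assumes a hypothetical two-column table t, and insert_row_by_row and insert_multi are illustrative names, not pandas functions. The first sends one single-row INSERT per row through executemany; the second packs each chunk into one INSERT carrying many VALUES tuples, which is roughly the kind of statement a multi-row insert produces.

```python
import sqlite3


def insert_row_by_row(conn, rows):
    """Send one single-row INSERT per row via executemany."""
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)


def insert_multi(conn, rows, chunksize=100):
    """Send one INSERT per chunk, each carrying many VALUES tuples."""
    for start in range(0, len(rows), chunksize):
        chunk = rows[start:start + chunksize]
        # One "(?, ?)" group per row in this chunk.
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        # Flatten [(a1, b1), (a2, b2), ...] into [a1, b1, a2, b2, ...].
        flat = [value for row in chunk for value in row]
        conn.execute(f"INSERT INTO t (a, b) VALUES {placeholders}", flat)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
rows = [(i, f"row{i}") for i in range(250)]

insert_multi(conn, rows, chunksize=100)
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(row_count)
```

With pandas 0.24 or later, the corresponding call would be something like df_name.to_sql('table_name', con=engine, index=False, if_exists='replace', method='multi', chunksize=1000). The chunk size of 1000 is only an example value; some backends cap the number of bound parameters per statement, so wide frames may need a smaller one.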

fast_executemany can now be enabled in a single step (sqlalchemy >= 1.3.0):

engine = sqlalchemy.create_engine(connection_string, fast_executemany=True)

Maybe it's worth mentioning somewhere in the docs, or with an example? It is a particular case not related to pandas, but it is a small addition that can drastically improve performance in many cases.

You would think that setting the chunksize parameter would be enough to make to_sql do batch inserts.

An alternative for MS SQL users is to use turbodbc.Cursor.insertmanycolumns. I have explained that in the linked StackOverflow post: https://stackoverflow.com/a/62671681/1689261

For future readers: there are two options for using a 'batch mode' with to_sql. These are the two combinations:

create_engine(connection_string, executemany_mode='batch', executemany_batch_page_size=x)

or

create_engine(connection_string, executemany_mode='values', executemany_values_page_size=x)

Details on these arguments can be found here: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-fast-execution-helpers
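Since the options mentioned in this thread are driver-specific, a small helper can pick them from the connection string. This is only a sketch: engine_kwargs_for is a hypothetical name, the URLs are placeholders, and the keyword names are the ones quoted above for SQLAlchemy 1.3 (later releases may accept different values, so check the docs for the version you run).

```python
def engine_kwargs_for(connection_string, page_size=1000):
    """Return extra create_engine() keyword arguments for the given URL.

    Hypothetical helper: it only inspects the dialect+driver prefix.
    """
    # The part before "://" names the dialect and driver, e.g. "mssql+pyodbc".
    dialect_driver = connection_string.split("://", 1)[0].lower()

    if dialect_driver == "mssql+pyodbc":
        # pyodbc-only flag; psycopg2 cursors do not have it.
        return {"fast_executemany": True}

    if dialect_driver in ("postgresql", "postgresql+psycopg2"):
        # psycopg2 fast execution helpers, as named in the SQLAlchemy 1.3 docs.
        return {
            "executemany_mode": "values",
            "executemany_values_page_size": page_size,
        }

    # Unknown driver: pass nothing extra.
    return {}


mssql_kwargs = engine_kwargs_for("mssql+pyodbc://user:password@my_dsn")
pg_kwargs = engine_kwargs_for("postgresql+psycopg2://user:password@host/db")
print(mssql_kwargs)
print(pg_kwargs)

# Intended use (requires SQLAlchemy, not imported here):
#   engine = create_engine(url, **engine_kwargs_for(url))
```

Keeping the choice in one place avoids the AttributeError reported earlier in this thread, which came from applying the pyodbc-only flag to a psycopg2 connection.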

For postgres users, I recommend setting method to a callable:

callable with signature (pd_table, conn, keys, data_iter): This can be used to implement a more performant insertion method based on specific backend dialect features.

and calling the function from the example code here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method

Using COPY FROM is really a lot faster 🚀
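For reference, a callable along the lines of the example in the pandas user guide linked above could look like this. It is a sketch that assumes the psycopg2 driver (copy_expert is a psycopg2 cursor method) and values that survive a plain CSV round trip; psql_insert_copy is just the function name used here.

```python
import csv
from io import StringIO


def psql_insert_copy(table, conn, keys, data_iter):
    """Insert rows with PostgreSQL COPY instead of INSERT statements.

    table     : pandas.io.sql.SQLTable (only .schema and .name are used)
    conn      : SQLAlchemy connection; .connection is the raw DBAPI connection
    keys      : list of column names
    data_iter : iterable of row tuples
    """
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        # Serialize the rows into an in-memory CSV buffer.
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ", ".join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = "{}.{}".format(table.schema, table.name)
        else:
            table_name = table.name

        # Stream the buffer to the server in a single COPY command.
        sql = "COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
```

It would then be passed as df_name.to_sql('table_name', con=engine, schema='public', index=False, if_exists='replace', method=psql_insert_copy).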

이 νŽ˜μ΄μ§€κ°€ 도움이 λ˜μ—ˆλ‚˜μš”?
0 / 5 - 0 λ“±κΈ‰