Pandas: Adding (Insert or update if key exists) option to `.to_sql`

Created on 1 Nov 2016  ·  42 comments  ·  Source: pandas-dev/pandas

Suppose you have an existing SQL table called person_age, where id is the primary key.

    age
id  
1   18
2   42

You also have new data in a DataFrame called extra_data.

    age
id  
2   44
3   95

Then it would be useful to have an option on extra_data.to_sql() that allows passing the DataFrame to SQL with an INSERT or UPDATE option on the rows, based on the primary key.

In this case, the id=2 row would get updated to age=44 and the id=3 row would get added.

Expected Output

    age
id  
1   18
2   44
3   95

(Maybe) helpful code references

I looked at the pandas sql.py source code to come up with a solution, but I couldn't follow it.

Code to replicate the example above

(Apologies for mixing sqlalchemy and sqlite)

import pandas as pd
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect('example.db')

c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
          CREATE TABLE person_age
          (id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
          ''')
conn.commit()
conn.close()

##### Create original table

engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})

sql_df.to_sql('person_age', engine, if_exists='append', index=False)


#### Extra data to insert/update

extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)

#### extra_data.to_sql()  with row update or insert option

expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)
Enhancement IO SQL

All 42 comments

This would be nice functionality, but the main problem is that, for inclusion in pandas itself, we want it to be database-flavor independent and based on sqlalchemy core (not the sqlalchemy ORM).
That will make this difficult to implement.

Yeah, I think this is out of scope for pandas since upserts aren't supported by all db engines.

While an INSERT OR UPDATE isn't supported by all engines, an INSERT OR REPLACE can be made engine agnostic by deleting rows from the target table for the set of primary keys in the DataFrame index, followed by an insert of all rows in the DataFrame. You'd want to do this in a transaction.
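The delete-then-insert idea from this comment can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions, not a pandas API: `replace_rows` is a hypothetical helper, the `rows` list stands in for `df.reset_index().to_dict(orient='records')`, and the table and key names are trusted identifiers.

```python
import sqlite3


def replace_rows(conn, table, key, rows):
    """Delete rows whose key appears in `rows`, then insert all of `rows`."""
    columns = list(rows[0])
    col_list = ", ".join(columns)
    placeholders = ", ".join(f":{c}" for c in columns)
    with conn:  # one transaction: both statements commit, or neither does
        conn.executemany(f"DELETE FROM {table} WHERE {key} = :{key}", rows)
        conn.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER NOT NULL)")
conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
conn.commit()

replace_rows(conn, "person_age", "id", [{"id": 2, "age": 44}, {"id": 3, "age": 95}])
final_rows = conn.execute("SELECT id, age FROM person_age ORDER BY id").fetchall()
print(final_rows)
```

Because the delete and the insert share one transaction, a failure in the insert leaves the original rows in place.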

@TomAugspurger Could we add the upsert option for supported db engines and throw an error for unsupported db engines?

I'd like to see this as well. I am caught between using pure SQL and SQLAlchemy. I use psycopg2 COPY to bulk insert, but I would love to use pd.to_sql for tables where values might change over time.

insert_values = df.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(table).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='fact_case_pkey',
    set_= df.to_dict(orient='dict')
)

And pure SQL:

def create_update_query(df, table=FACT_TABLE):
    """Build a PostgreSQL INSERT ... ON CONFLICT ... DO UPDATE query for the table.

    DATABASE_COLUMNS and PRIMARY_KEY are module-level lists of column names.
    """
    columns = ', '.join(DATABASE_COLUMNS)
    constraint = ', '.join(PRIMARY_KEY)
    placeholder = ', '.join(f'%({col})s' for col in DATABASE_COLUMNS)
    updates = ', '.join(f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS)
    query = f"""INSERT INTO {table} ({columns}) 
    VALUES ({placeholder}) 
    ON CONFLICT ({constraint}) 
    DO UPDATE SET {updates};"""
    # Collapse the multi-line template into a single line
    query = ' '.join(query.split())
    return query

def load_updates(df, connection=DATABASE):
    """Upserts the dataframe into Postgres one row at a time
     :param df: The dataframe whose rows are upserted into the database
     :param connection: Refers to a PostgresHook
    """
    conn = connection.get_conn()
    cursor = conn.cursor()
    # Replace NaN with None so that NULL is written to the database
    df1 = df.where((pd.notnull(df)), None)
    insert_values = df1.to_dict(orient='records')
    # The query text is the same for every row, so build it once
    query = create_update_query(df)
    for row in insert_values:
        cursor.execute(query, row)
        conn.commit()
    cursor.close()
    conn.close()
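A self-contained variant of the query builder can help when checking the generated SQL by eye. The sketch below is an assumption-laden illustration rather than the commenter's code: `build_upsert_query` and the column names are hypothetical, and unlike the function above it leaves key columns out of the `SET` clause.

```python
def build_upsert_query(table, columns, primary_key):
    """Return a PostgreSQL-style INSERT ... ON CONFLICT statement as a string."""
    col_list = ", ".join(columns)
    key_list = ", ".join(primary_key)
    placeholders = ", ".join(f"%({col})s" for col in columns)
    # Key columns identify the row, so only non-key columns are updated.
    updates = ", ".join(
        f"{col} = EXCLUDED.{col}" for col in columns if col not in primary_key
    )
    # With nothing left to update, fall back to ignoring the conflicting row.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key_list}) {action};"
    )


# Hypothetical table and column names, used only for illustration.
query = build_upsert_query("fact_case", ["case_id", "status", "updated_at"], ["case_id"])
composite = build_upsert_query("t", ["a", "b", "v"], ["a", "b"])
print(query)
print(composite)
```

Passing several key columns produces a composite `ON CONFLICT (a, b)` target, so the same builder covers multi-column primary keys.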

@ldacey this style worked for me:

insert_values = merged_transactions_channels.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(orders_to_channels).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='orders_to_channels_pkey',
    set_={'channel_owner': insert_statement.excluded.channel_owner}
)

@cdagnino This snippet might not work in the case of composite keys; that scenario has to be taken care of as well. I'll try to find a way to do the same.

One way to solve this update issue is to use sqlalchemy's bulk_update_mappings:

session.bulk_update_mappings(
  Table,
  pandas_df.to_dict(orient='records')
)

I agree with @neilfrndes. A nice feature like this shouldn't go unimplemented just because some DBs don't support it. Is there any chance this feature might happen?

Probably, if someone makes a PR. On further consideration, I don't think I'm opposed to this on the principle that some databases don't support it. However, I'm not too familiar with the sql code, so I'm not sure what the best approach is.

One possibility is to provide some examples for upserts using the method callable if this PR is introduced: https://github.com/pandas-dev/pandas/pull/21401

For postgres that would look something like (untested):

from sqlalchemy.dialects import postgresql

def pg_upsert(table, conn, keys, data_iter):
    # `table` is the pandas SQLTable wrapper; `table.table` is the SQLAlchemy Table
    for row in data_iter:
        row_dict = dict(zip(keys, row))
        stmt = postgresql.insert(table.table).values(**row_dict)
        upsert_stmt = stmt.on_conflict_do_update(
            index_elements=table.index,
            set_=row_dict)
        conn.execute(upsert_stmt)

Something similar could be done for mysql.

For postgres I am using execute_values. In my case, the query is a jinja2 template that flags whether it should do update set or do nothing. This has been quite quick and flexible. It is not as fast as using COPY or copy_expert, but it works well.

from psycopg2.extras import execute_values

df = df.where((pd.notnull(df)), None)
tuples = [tuple(x) for x in df.values]

with pg_conn:
    with pg_conn.cursor() as cur:
        execute_values(cur=cur, sql=insert_query, argslist=tuples, template=None)

@danich1 can you please give an example of how this would work?

I tried to look into bulk_update_mappings, but I got really lost and couldn't make it work.

@critianionescu92 An example would be this:
I have a table called User with the following fields: id and name.

| id | name |
| --- | --- |
| 0 | John |
| 1 | Joe |
| 2 | Harry |

I have a pandas data frame with the same columns but updated values:

| id | name |
| --- | --- |
| 0 | Chris |
| 1 | James |

Let's also assume that we have a session variable open to access the database. By calling this method:

session.bulk_update_mappings(
User,
<pandas dataframe above>.to_dict(orient='records')
)

Pandas will convert the table into a list of dictionaries, [{id: 0, name: "chris"}, {id: 1, name: "james"}], that sql will use to update the rows of the table. So the final table will look like:

| id | name |
| --- | --- |
| 0 | Chris |
| 1 | James |
| 2 | Harry |
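Conceptually, the call above issues one UPDATE per dictionary, keyed on the primary key. The sketch below reproduces that effect with the standard-library `sqlite3` module instead of a SQLAlchemy session; the `users` table name is an assumption, and the `records` list stands in for `df.to_dict(orient='records')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(0, "John"), (1, "Joe"), (2, "Harry")]
)
conn.commit()

# Stand-in for df.to_dict(orient="records")
records = [{"id": 0, "name": "Chris"}, {"id": 1, "name": "James"}]

with conn:  # commit on success, roll back on error
    # One UPDATE per record; rows whose id is not listed stay untouched.
    conn.executemany("UPDATE users SET name = :name WHERE id = :id", records)

users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(users)
```

Note that this only updates: a record whose id is not yet in the table is silently skipped, so on its own it is not an upsert.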

Hi, @danich1, and thanks a lot for your response. I figured out the mechanics of how the update would work myself. Unfortunately, I don't know how to work with a session; I am quite a beginner.

Let me show you what I am doing:

import pypyodbc
# These are 2 functions I found on GitHub; unfortunately I cannot remember the link. clean_df_db_dups excludes from a dataframe the rows that already exist in an SQL table by checking several key columns, and to_sql_newrows inserts the new rows into SQL.
from to_sql_newrows import clean_df_db_dups, to_sql_newrows

from sqlalchemy import create_engine
engine = create_engine("engine_connection_string")

#Write data to SQL
Tablename = 'Dummy_Table_Name'
Tablekeys = Tablekeys_string
dftoupdateorinsertinSQL= random_dummy_dataframe

#Connect to sql server db using pypyodbc
cnxn = pypyodbc.connect("Driver={SQL Server};"
                        "Server=ServerName;"
                        "Database=DatabaseName;"
                        "uid=userid;pwd=password")

newrowsdf= clean_df_db_dups(dftoupdateorinsertinSQL, Tablename, engine, dup_cols=Tablekeys)
newrowsdf.to_sql(Tablename, engine, if_exists='append', index=False, chunksize = 140)
end=timer()

tablesize = (len(newrowsdf.index))

print('inserted %r rows '%(tablesize))

The above code basically excludes from the dataframe the rows that already exist in SQL and inserts only the new rows. What I need is to update the rows that exist. Can you please help me understand what I should do next?

Motivation for a better TO_SQL
Having to_sql integrate better with database practices is increasingly valuable as data science grows and mixes with data engineering.

upsert is one of them, in particular because many people find that the workaround is to use replace instead, which drops the table and, with it, all its views and constraints.

The alternative I've seen among more experienced users is to stop using pandas at this stage. This tends to propagate upstream and makes the pandas package lose retention among experienced users. Is this the direction Pandas wants to go?

I understand we want to_sql to remain as database agnostic as possible and to use SQLAlchemy core. A method that truncates or deletes instead of doing a true upsert would still add a lot of value, though.

Integration with the Pandas product vision
A lot of the above debate took place before the method argument was introduced (as mentioned by @kjford with psql_insert_copy).

I'd happily contribute to either the core pandas functionality or, failing that, documentation on a solution / best practice for achieving upsert functionality within Pandas, such as the below:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

What is the preferred way forward for the Pandas core devs / product managers?

๋‚˜๋Š” ์šฐ๋ฆฌ๊ฐ€ ์—”์ง„์— ํŠน์ •ํ•œ ๊ตฌํ˜„์— ์—ด๋ ค ์žˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. method='upsert' ๋ฅผ ์‚ฌ์šฉํ•˜์ž๋Š” ์ œ์•ˆ์€ ํƒ€๋‹นํ•ด ๋ณด์ด์ง€๋งŒ ํ˜„์‹œ์ ์—์„œ ๋ช…ํ™•ํ•œ ๋””์ž์ธ ์ œ์•ˆ์„ ํ•ด์ค„ ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ํ•„์š”ํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

I have a similar requirement where I want to update existing data in a MySQL table from multiple CSVs over time.

I thought I could use df.to_sql() to insert the new data into a newly created temporary table and then run a MySQL query to control how to append/update data in the existing table.

MySQL reference: https://stackoverflow.com/questions/2472229/insert-into-select-from-on-duplicate-key-update?answertab=active#tab-top

Disclaimer: I started using Python and Pandas only a few days ago.
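The staging-table idea in this comment can be sketched as follows. SQLite (3.24+) is assumed here so that the example is self-contained; its `ON CONFLICT ... DO UPDATE` clause plays the role of MySQL's `ON DUPLICATE KEY UPDATE`, and the `prices` table and its columns are hypothetical. With pandas, `df.to_sql` could fill the staging table instead of `executemany`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])
conn.commit()

new_rows = [("b", 2.5), ("c", 3.0)]  # e.g. rows parsed from the latest CSV

with conn:
    # 1. Stage the incoming rows in a temporary table.
    conn.execute("CREATE TEMP TABLE prices_staging (sku TEXT, price REAL)")
    conn.executemany("INSERT INTO prices_staging VALUES (?, ?)", new_rows)
    # 2. Merge the staged rows into the real table with a single statement.
    #    SQLite needs "WHERE true" to parse INSERT ... SELECT ... ON CONFLICT.
    conn.execute(
        """
        INSERT INTO prices (sku, price)
        SELECT sku, price FROM prices_staging WHERE true
        ON CONFLICT(sku) DO UPDATE SET price = excluded.price
        """
    )
    # 3. Clean up the staging table.
    conn.execute("DROP TABLE prices_staging")

merged = conn.execute("SELECT sku, price FROM prices ORDER BY sku").fetchall()
print(merged)
```

The database does the merge itself, so no per-row round trips are needed after the staging insert.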

Hey Pandas folk: I have had this same issue, needing to frequently update a local database with records that I ultimately load and manipulate in pandas. I built a simple library to do this. It is basically a stand-in for df.to_sql and pd.read_sql_table that uses the DataFrame index as a primary key by default. It uses sqlalchemy core only.

https://pypi.org/project/pandabase/0.2.1/
https://github.com/notsambeck/pandabase

This tool is fairly opinionated and probably not appropriate to include in Pandas as-is. But for my specific use case it solves the problem. If there is interest in massaging this to make it fit in Pandas, I am happy to help.

For now, the following works (in the limited case of current-ish pandas and sqlalchemy, a named index as primary key, a SQLite or Postgres back end, and supported datatypes):

pip install pandabase
pandabase.to_sql(df, table_name, con_string, how='upsert')

Working on a general solution to this with cvonsteg. Planning to come back with a proposed design in October.

As suggested by @TomAugspurger, @rugg2 and I have come up with the following design proposal for an upsert option in to_sql().

Interface Proposal

2 new values to be added as possible method arguments in the to_sql() method:
1) upsert_update - on row match, update the row in the database (for knowingly updating records; represents most use cases)
2) upsert_ignore - on row match, do not update the row in the database (for cases where datasets overlap and you do not want to override data in tables)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("connection string")
df = pd.DataFrame(...)

df.to_sql(
    name='table_name', 
    con=engine, 
    if_exists='append', 
    method='upsert_update' # (or upsert_ignore)
)
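The difference between the two proposed modes can be made concrete with SQLite's native upsert clause (SQLite 3.24+ assumed). This sketch uses only the standard-library `sqlite3` module and a hypothetical `upsert` helper; it illustrates the intended semantics, not the proposed pandas implementation.

```python
import sqlite3


def upsert(conn, rows, mode):
    """Insert (id, age) rows; on a key match either update or keep the old row."""
    if mode == "upsert_update":
        action = "DO UPDATE SET age = excluded.age"
    elif mode == "upsert_ignore":
        action = "DO NOTHING"
    else:
        raise ValueError(f"Invalid mode: {mode}")
    with conn:
        conn.executemany(
            f"INSERT INTO person_age (id, age) VALUES (?, ?) ON CONFLICT(id) {action}",
            rows,
        )


def fresh_table():
    """Create the person_age example table with its two original rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
    conn.commit()
    return conn


results = {}
for mode in ("upsert_update", "upsert_ignore"):
    conn = fresh_table()
    upsert(conn, [(2, 44), (3, 95)], mode)
    results[mode] = conn.execute(
        "SELECT id, age FROM person_age ORDER BY id"
    ).fetchall()
print(results)
```

In both modes the new id=3 row is appended; only the treatment of the existing id=2 row differs.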

Implementation Proposal

To implement this, the SQLTable class would receive 2 new private methods containing the upsert logic, which would be called from the SQLTable.insert() method:

from functools import partial

def insert(self, chunksize=None, method=None):

    # set insert method
    if method is None:
        exec_insert = self._execute_insert
    elif method == "multi":
        exec_insert = self._execute_insert_multi
    # new upsert methods <<<
    elif method == "upsert_update":
        exec_insert = self._execute_upsert_update
    elif method == "upsert_ignore":
        exec_insert = self._execute_upsert_ignore
    # >>>
    elif callable(method):
        exec_insert = partial(method, self)
    else:
        raise ValueError("Invalid parameter 'method': {}".format(method))

    ...

We propose the following implementation, with the rationale outlined in detail below (all points are open for discussion):

(1) Engine agnostic using SQLAlchemy core, via an atomic sequence of DELETE and INSERT

  • Only some dbms natively support upsert, and implementations can vary across flavours.
  • As a first implementation, we believe it would be easier to test and maintain one implementation across all dbms. In future, if demand exists, engine-specific implementations can be added.
  • For upsert_ignore, these operations would obviously be skipped on matching records.
  • It will be worth comparing an engine-agnostic implementation with engine-specific implementations in terms of performance.

(2) Upsert on primary key only

  • Upserts default to primary key clashes unless otherwise specified.
  • Some DBMS allow users to specify non-primary-key columns against which to check for uniqueness. Whilst this grants the user more flexibility, it comes with potential pitfalls. If these columns do not have a UNIQUE constraint, multiple rows may match the upsert condition. In this case no upsert should be performed, as it is ambiguous which record should be updated. To enforce this from pandas, each row would need to be individually assessed to check that only 1 or 0 rows match before it is inserted. While this functionality is reasonably straightforward to implement, it results in each record requiring a read and a write operation (plus a delete if a 1-record clash is found), which is highly inefficient for larger datasets.
  • In a future enhancement, if the community calls for it, we could extend upsert to work not only on the primary key but also on user-specified fields. This is a longer-term question for the core dev team: should Pandas remain simple to protect users who have a poorly designed database, or should it have more functionality?
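The ambiguity described under point (2) can be illustrated with a single grouped query that reports match values occurring more than once. This is a sketch of one possible up-front check, not part of the proposal: `ambiguous_keys`, the `person` table, and its columns are hypothetical, and identifiers are assumed to be trusted.

```python
import sqlite3


def ambiguous_keys(conn, table, match_columns):
    """Return value combinations of `match_columns` that occur more than once."""
    cols = ", ".join(match_columns)
    query = f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "a@example.com", 18), (2, "b@example.com", 42), (3, "a@example.com", 30)],
)

# email has no UNIQUE constraint, so an upsert matched on it could hit two rows.
duplicates = ambiguous_keys(conn, "person", ["email"])
# The primary key is unique by definition, so nothing is reported.
pk_duplicates = ambiguous_keys(conn, "person", ["id"])
print(duplicates, pk_duplicates)
```

Restricting upserts to the primary key avoids this check entirely, which is the trade-off the proposal argues for.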

@TomAugspurger, if the upsert proposal we designed with @cvonsteg suits you, we will proceed with the implementation in code (incl. tests) and raise a pull request.

Let us know if you would like to proceed differently.

Reading through the proposal is on my todo list. I'm a bit behind on my email right now.

๋‚˜๋Š” ๊ฐœ์ธ์ ์œผ๋กœ ๊ทธ๊ฒƒ์— ๋Œ€ํ•ด ๋ฐ˜๋Œ€ํ•˜๋Š” ๊ฒƒ์ด ์—†์œผ๋ฏ€๋กœ PR์„ ํ™˜์˜ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. SQLAlchemy ์ฝ”์–ด๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋“  DBM์— ๋Œ€ํ•œ ํ•œ ๊ฐ€์ง€ ๊ตฌํ˜„์€ ๋‚ด๊ฐ€ ๊ท€ํ•˜์˜ ์š”์ ์„ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ฝ๊ณ  ๊ธฐ๋ณธ ํ‚ค์™€ ๋™์ผํ•œ ๊ฒฝ์šฐ ํ™•์‹คํžˆ ์‹œ์ž‘ํ•ด์•ผ ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

ํ•ญ์ƒ ์ž‘๊ฒŒ ์‹œ์ž‘ํ•˜์—ฌ ์ง‘์ค‘ํ•˜๊ณ  ํ™•์žฅํ•˜๊ธฐ๊ฐ€ ๋” ์‰ฝ์Šต๋‹ˆ๋‹ค.

์ด ๊ธฐ๋Šฅ์ด ์ ˆ์‹คํžˆ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

์šฐ๋ฆฌ๊ฐ€ cvonsteg๋กœ ์ž‘์„ฑํ•œ PR์€ ์ด์ œ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: ์ง€๊ธˆ ๋ฆฌ๋ทฐ๋กœ!

์ด ๊ธฐ๋Šฅ์€ ์ ˆ๋Œ€์ ์œผ๋กœ ์˜๊ด‘์Šค๋Ÿฌ์šธ ๊ฒƒ์ž…๋‹ˆ๋‹ค! ๋‚˜๋Š” github์˜ ์–ดํœ˜์— ์ •ํ†ตํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ธฐ๋Šฅ์ด "์ง€๊ธˆ ๋ฆฌ๋ทฐ๋กœ ์ง„ํ–‰ ์ค‘"์ด๋ผ๋Š” @rugg2 ์˜ ์˜๊ฒฌ์€ ํ•ด๋‹น ๊ธฐ๋Šฅ์„ ๊ฒ€ํ† ํ•˜๋Š” ๊ฒƒ์ด ํŒฌ๋” ํŒ€์— ๋‹ฌ๋ ค ์žˆ๋‹ค๋Š” ์˜๋ฏธ์ธ๊ฐ€์š”? ์Šน์ธ๋˜๋ฉด ์„ค์น˜ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ ๋ฒ„์ „์˜ ํŒ๋‹ค๋ฅผ ํ†ตํ•ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๊นŒ? ์•„๋‹ˆ๋ฉด git์„ ํ†ตํ•ด ์ง์ ‘ ์ปค๋ฐ‹์„ ์ ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๊นŒ? (๋‚˜๋Š” conda๋ฅผ ํ†ตํ•ด ์ด๊ฒƒ์— ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์œผ๋ฏ€๋กœ ์ด ๊ฒฝ์šฐ ์ด ๊ธฐ๋Šฅ์ด ์ค€๋น„๋  ๋•Œ๊นŒ์ง€ ์†๋„๋ฅผ ๋†’์ด๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค.) ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค!!

@ pmgh2345 - ๋„ค, "์ง€๊ธˆ ๋ฆฌ๋ทฐ๊นŒ์ง€"๋Š” ํ’€ ๋ฆฌํ€˜์ŠคํŠธ๊ฐ€ ์ œ๊ธฐ๋˜์—ˆ๊ณ  ํ•ต์‹ฌ ๊ฐœ๋ฐœ์ž๋กœ๋ถ€ํ„ฐ ๊ฒ€ํ† 

์šฐ๋ฆฌ๊ฐ€ cvonsteg๋กœ ์ž‘์„ฑํ•œ PR์€ ์ด์ œ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: ์ง€๊ธˆ ๋ฆฌ๋ทฐ๋กœ!

if_exists ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์‹  to_sql ๋ฉ”์„œ๋“œ์— ์ƒˆ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ์ด์œ ๋Š” if_exists ๊ฐ€ ํ–‰์ด ์•„๋‹Œ ํ…Œ์ด๋ธ”์˜ ์กด์žฌ ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

@cvonsteg๋Š” ์›๋ž˜ method= ์‚ฌ์šฉ์„ ์ œ์•ˆ ํ–ˆ๋Š”๋ฐ, ์ด๋Š” if_exists ๋Œ€ํ•ด ๋‘ ๊ฐ€์ง€ ์˜๋ฏธ๋ฅผ ๊ฐ–๋Š” ๋ชจํ˜ธ์„ฑ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค.

df.to_sql(
    name='table_name', 
    con=engine, 
    if_exists='append', 
    method='upsert_update' # (or upsert_ignore)
)

@brylie it's true that we could add a new parameter, but as you know, each new parameter makes an API clunkier. There is a trade-off.

If we have to choose among the current parameters, we initially thought of using the method argument, but after more thought we realised that both (1) the usage and (2) the logic better fit the if_exists argument.

1) From an API usage point of view
The user will want to choose method="multi" or None on the one hand, and "upsert" on the other. However, there are no equally strong use cases, if any, for using the "upsert" functionality at the same time as if_exists="append" or "replace".

2) From a logic point of view

  • method currently works on _how_ the data is being inserted: row by row or "multi"
  • if_exists captures the business logic of how we manage our records: "replace", "append", "upsert_update" (upsert when the key exists, append when new), "upsert_ignore" (ignore when the key exists, append when new). Although replace and append look at table existence, they can also be understood by their impact at the record level.

Let me know if I understood your point well, and please shout if you think that the current implementation under review (PR #29636) would be a net negative!

Yes, you understand my point. The current implementation is a net positive, but it is slightly diminished by ambiguous semantics.

I still maintain that if_exists should continue to refer to only one thing: table existence. Ambiguity in parameters negatively impacts readability and may lead to convoluted internal logic. Adding a new parameter such as upsert=True, on the other hand, is clear and explicit.

Hello!

If you want to see a non-agnostic implementation for doing upserts, see my library.

I share this thinking it could give a few ideas to collaborators. A speed comparison would also be interesting once the PR of @cvonsteg goes through.
Please mind that I am not a long-time sqlalchemy expert!

I really want this feature. I agree that method='upsert_update' is a good idea.

Is this still planned? Pandas really needs this feature.

Yes, this is still planned, and we are almost there!

The code is written, but there is one test that doesn't pass. Help welcome!
https://github.com/pandas-dev/pandas/pull/29636

Hello! Is the functionality ready, or is something still missing? If something is still missing, please let me know if I can help with anything!

Any news?))

Coming from the Java world, I never thought this simple functionality might turn my codebase upside down.

Hi everyone,

I've looked into how upserts are implemented in SQL across dialects and found a number of techniques that can inform design decisions here. But first, I want to warn against using DELETE ... INSERT logic. If there are foreign keys or triggers, other records across the database will end up being deleted or otherwise messed up. In MySQL, REPLACE does the same damage. I've actually spent hours fixing data because I used REPLACE. That said, here are the techniques implemented in SQL:

| Dialect | Technique |
| --- | --- |
| MySQL | INSERT ... ON DUPLICATE KEY UPDATE |
| PostgreSQL | INSERT ... ON CONFLICT |
| SQLite | INSERT ... ON CONFLICT |
| DB2 | MERGE |
| SQL Server | MERGE |
| Oracle | MERGE |
| SQL:2016 | MERGE |
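The warning about DELETE ... INSERT can be reproduced in a few lines. The sketch below assumes SQLite with foreign-key enforcement switched on and a hypothetical `pet` table that references `person` with ON DELETE CASCADE; it compares delete-then-insert with an in-place UPDATE.

```python
import sqlite3


def make_db():
    """Create a parent row (person) with one dependent child row (pet)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.execute(
        "CREATE TABLE pet (name TEXT, "
        "owner_id INTEGER REFERENCES person(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO person VALUES (2, 42)")
    conn.execute("INSERT INTO pet VALUES ('Rex', 2)")
    conn.commit()
    return conn


def count_pets(conn):
    return conn.execute("SELECT COUNT(*) FROM pet").fetchone()[0]


# Variant 1: DELETE then INSERT. The delete cascades to the child table.
conn = make_db()
with conn:
    conn.execute("DELETE FROM person WHERE id = 2")
    conn.execute("INSERT INTO person VALUES (2, 44)")
pets_after_delete_insert = count_pets(conn)

# Variant 2: in-place UPDATE. The child row is untouched.
conn = make_db()
with conn:
    conn.execute("UPDATE person SET age = 44 WHERE id = 2")
pets_after_update = count_pets(conn)

print(pets_after_delete_insert, pets_after_update)
```

Both variants leave `person` with the same final row, yet only the UPDATE keeps the dependent `pet` row, which is the kind of silent damage the comment describes.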

With such wildly varying syntax, I understand the temptation to use DELETE ... INSERT to make the implementation dialect agnostic. But there's another way: we can imitate the logic of the MERGE statement using a temp table and basic INSERT and UPDATE statements. The SQL:2016 MERGE syntax is as follows:

MERGE INTO target_table 
USING source_table 
ON search_condition
    WHEN MATCHED THEN
        UPDATE SET col1 = value1, col2 = value2,...
    WHEN NOT MATCHED THEN
        INSERT (col1,col2,...)
        VALUES (value1,value2,...);

Borrowed from Oracle Tutorial and adjusted to conform to SQL Wikibook.

Since every dialect supported by SQLAlchemy supports temp tables, a safer, dialect-agnostic approach to doing an upsert would be to do the following in a single transaction:

  1. Create a temp table.
  2. Insert the data into that temp table.
  3. Do an UPDATE ... JOIN.
  4. INSERT where the key (PRIMARY or UNIQUE) doesn't match.
  5. Drop the temp table.

Besides being dialect agnostic, this technique has the advantage that it can be expanded upon by letting the end user choose how to insert or update the data, as well as on which key to join the data.

While the syntax of temp tables and update joins might differ slightly between dialects, they should be supported everywhere.

Below is a proof of concept I wrote for MySQL:

import uuid

import pandas as pd
from sqlalchemy import create_engine


# This proof of concept uses this sample database
# https://downloads.mysql.com/docs/world.sql.zip


# Arbitrary, unique temp table name to avoid possible collision
source = str(uuid.uuid4()).split('-')[-1]

# Table we're doing our upsert against
target = 'countrylanguage'

db_url = 'mysql://<{user: }>:<{passwd: }>.@<{host: }>/<{db: }>'

df = pd.read_sql(
    f'SELECT * FROM `{target}`;',
    db_url
)

# Change for UPDATE, 5.3->5.4
df.at[0,'Percentage'] = 5.4
# Change for INSERT
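# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat(
    [df, pd.DataFrame([{'CountryCode': 'ABW','Language': 'Arabic','IsOfficial': 'F','Percentage':0.0}])],
    ignore_index=True
)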

# List of PRIMARY or UNIQUE keys
key = ['CountryCode','Language']

# Do all of this in a single transaction
engine = create_engine(db_url)
with engine.begin() as con:
    # Create temp table like target table to stage data for upsert
    con.execute(f'CREATE TEMPORARY TABLE `{source}` LIKE `{target}`;')
    # Insert dataframe into temp table
    df.to_sql(source,con,if_exists='append',index=False,method='multi')
    # INSERT where the key doesn't match (new rows)
    con.execute(f'''
        INSERT INTO `{target}`
        SELECT
            *
        FROM
            `{source}`
        WHERE
            (`{'`, `'.join(key)}`) NOT IN (SELECT `{'`, `'.join(key)}` FROM `{target}`);
    ''')
    # Create a doubled list of tuples of non-key columns to template the update statement
    non_key_columns = [(i,i) for i in df.columns if i not in key]
    # Whitespace for aesthetics
    whitespace = '\n\t\t\t'
    # Do an UPDATE ... JOIN to set all non-key columns of target to equal source
    con.execute(f'''
        UPDATE
            `{target}` `t`
                JOIN
            `{source}` `s` ON `t`.`{"` AND `t`.`".join(["`=`s`.`".join(i) for i in zip(key,key)])}`
        SET
            `t`.`{f"`,{whitespace}`t`.`".join(["`=`s`.`".join(i) for i in non_key_columns])}`;
    ''')
    # Drop our temp table.
    con.execute(f'DROP TABLE `{source}`;')

Here, I make the following assumptions:

  1. The structure of your source and target are the same.
  2. You want to do simple inserts using the data in your dataframe.
  3. You want to simply update all non-key columns with the data from your dataframe.
  4. You don't want to make any changes to data in key columns.

Despite the assumptions, I hope my MERGE-inspired technique informs efforts to build a flexible, robust upsert option.
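The same five steps can also be tried without a MySQL server. The sketch below is an assumption-based SQLite variant using the standard-library `sqlite3` module: a correlated subquery replaces `UPDATE ... JOIN`, which SQLite does not offer in that form, and the table is a cut-down, hypothetical version of `countrylanguage` with a composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countrylanguage (CountryCode TEXT, Language TEXT, "
    "Percentage REAL, PRIMARY KEY (CountryCode, Language))"
)
conn.executemany(
    "INSERT INTO countrylanguage VALUES (?, ?, ?)",
    [("ABW", "Dutch", 5.3), ("ABW", "English", 9.5)],
)
conn.commit()

incoming = [("ABW", "Dutch", 5.4), ("ABW", "Arabic", 0.0)]
on_key = (
    "s.CountryCode = countrylanguage.CountryCode "
    "AND s.Language = countrylanguage.Language"
)

with conn:  # single transaction for all five steps
    # 1-2. Create the temp table and load the incoming rows into it.
    conn.execute(
        "CREATE TEMP TABLE staging (CountryCode TEXT, Language TEXT, Percentage REAL)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", incoming)
    # 3. Update the non-key column of target rows whose key exists in staging.
    conn.execute(
        f"""
        UPDATE countrylanguage
        SET Percentage = (SELECT s.Percentage FROM staging AS s WHERE {on_key})
        WHERE EXISTS (SELECT 1 FROM staging AS s WHERE {on_key})
        """
    )
    # 4. Insert staged rows whose key is not yet in the target.
    conn.execute(
        f"""
        INSERT INTO countrylanguage
        SELECT s.* FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM countrylanguage WHERE {on_key})
        """
    )
    # 5. Drop the temp table.
    conn.execute("DROP TABLE staging")

languages = conn.execute(
    "SELECT * FROM countrylanguage ORDER BY Language"
).fetchall()
print(languages)
```

No row of the target table is ever deleted, so foreign keys and delete triggers on it are not affected.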

๋‚˜๋Š” ์ด๊ฒƒ์ด ์œ ์šฉํ•œ ๊ธฐ๋Šฅ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜์ง€๋งŒ ๋ฒ”์œ„๋ฅผ ๋ฒ—์–ด๋‚˜ ํ…Œ์ด๋ธ”์— ํ–‰์„ ์ถ”๊ฐ€ํ•˜๋ฉด์„œ ์ด๋Ÿฌํ•œ ๊ณตํ†ต ๊ธฐ๋Šฅ์„ ๊ฐ–๋Š” ๊ฒƒ์ด ์ง๊ด€์ ์ธ ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ž…๋‹ˆ๋‹ค.

์ด ๊ธฐ๋Šฅ์„ ์ถ”๊ฐ€ํ•˜๋ ค๋ฉด ๋‹ค์‹œ ์ƒ๊ฐํ•ด ๋ณด์‹ญ์‹œ์˜ค. ๊ธฐ์กด ํ…Œ์ด๋ธ”์— ํ–‰์„ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค.
Alas Pangres๋Š” Python 3.7 ์ด์ƒ์œผ๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค. ๋‚ด ๊ฒฝ์šฐ์ฒ˜๋Ÿผ(๋‚˜๋Š” ์˜ค๋ž˜๋œ Python 3.4๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•จ) ํ•ญ์ƒ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ์†”๋ฃจ์…˜์€ ์•„๋‹™๋‹ˆ๋‹ค.

๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค, @GoldstHa - ์ •๋ง ์œ ์šฉํ•œ ์ •๋ณด์ž…๋‹ˆ๋‹ค. MERGE์™€ ๊ฐ™์€ ๊ตฌํ˜„์„ ์œ„ํ•ด POC๋ฅผ ๋งŒ๋“ค๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

DELETE/INSERT ์ ‘๊ทผ ๋ฐฉ์‹์˜ ๋ฌธ์ œ์™€ MySQL DB์˜ @GoldstHa MERGE ์ ‘๊ทผ ๋ฐฉ์‹์— ๋Œ€ํ•œ ์ž ์žฌ์  ์ฐจ๋‹จ์„ ๊ฐ์•ˆํ•  ๋•Œ ์กฐ๊ธˆ ๋” ํŒŒ๊ณ  sqlalchemy ์—…๋ฐ์ดํŠธ ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐœ๋… ์ฆ๋ช…์„ ํ•จ๊ป˜

์ˆ˜์ •๋œ ์ ‘๊ทผ ๋ฐฉ์‹ ์ œ์•ˆ

API์™€ upsert๊ฐ€ ์‹ค์ œ๋กœ ์–ด๋–ป๊ฒŒ ํ˜ธ์ถœ๋˜์–ด์•ผ ํ•˜๋Š”์ง€์— ๋Œ€ํ•œ ๋ช‡ ๊ฐ€์ง€ ์ข‹์€ ๋…ผ์˜๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค(์ฆ‰, if_exists ์ธ์ˆ˜๋ฅผ ํ†ตํ•ด ๋˜๋Š” ๋ช…์‹œ์ ์ธ upsert ์ธ์ˆ˜๋ฅผ ํ†ตํ•ด). ์ด๊ฒƒ์€ ๊ณง ๋ฐํ˜€์งˆ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ˜„์žฌ๋กœ์„œ๋Š” SqlAlchemy upsert ๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ๋Šฅ์ด ์ž‘๋™ํ•˜๋Š” ๋ฐฉ์‹์— ๋Œ€ํ•œ ์˜์‚ฌ ์ฝ”๋“œ ์ œ์•ˆ์ž…๋‹ˆ๋‹ค.

Identify primary key(s) and existing pkey values from DB table (if no primary key constraints identified, but upsert is called, return an error)

Make a temp copy of the incoming DataFrame

Identify records in incoming DataFrame with matching primary keys

Split temp DataFrame into records which have a primary key match, and records which don't

if upsert:
    Update the DB table using `update` for only the rows which match
else:
    Ignore rows from DataFrame with matching primary key values
finally:
    Append remaining DataFrame rows with non-matching values in the primary key column to the DB table
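The pseudocode above can be turned into a small runnable sketch. It uses the standard-library `sqlite3` module rather than SQLAlchemy, hard-codes `id` as the primary key instead of discovering it from the table, and takes a list of dictionaries in place of a DataFrame; `upsert_rows` is a hypothetical name.

```python
import sqlite3


def upsert_rows(conn, rows, update_matches=True):
    """Split rows by primary-key match, update or ignore matches, append the rest."""
    existing_ids = {r[0] for r in conn.execute("SELECT id FROM person_age")}
    matching = [r for r in rows if r["id"] in existing_ids]
    new = [r for r in rows if r["id"] not in existing_ids]
    with conn:
        if update_matches:
            conn.executemany(
                "UPDATE person_age SET age = :age WHERE id = :id", matching
            )
        # Otherwise rows with matching keys are simply ignored.
        conn.executemany(
            "INSERT INTO person_age (id, age) VALUES (:id, :age)", new
        )
    return len(matching), len(new)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER NOT NULL)")
conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
conn.commit()

counts = upsert_rows(conn, [{"id": 2, "age": 44}, {"id": 3, "age": 95}])
table = conn.execute("SELECT id, age FROM person_age ORDER BY id").fetchall()
print(counts, table)
```

Reading the existing keys and writing the rows are separate steps here, so a concurrent writer could change the table in between; a real implementation would need to account for that.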