Say I have an existing SQL table called `person_age`, where `id` is the primary key:

| id | age |
| --- | --- |
| 1 | 18 |
| 2 | 42 |
And I also have new data in a DataFrame called `extra_data`:

| id | age |
| --- | --- |
| 2 | 44 |
| 3 | 95 |
It would be convenient if `extra_data.to_sql()` had an option to pass the DataFrame to SQL with an INSERT-or-UPDATE behavior for rows, based on the primary key. In this case, the `id=2` row would be updated to `age=44`, and the `id=3` row would be inserted.
The expected result:

| id | age |
| --- | --- |
| 1 | 18 |
| 2 | 44 |
| 3 | 95 |
For now I use `merge`. (I dug through the pandas `sql.py` source code looking for a solution, but couldn't follow it.) Apologies for mixing `sqlalchemy` and `sqlite` in the example below:
import pandas as pd
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
CREATE TABLE person_age
(id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
''')
conn.commit()
conn.close()
#### Create original table
engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})
sql_df.to_sql('person_age', engine, if_exists='append', index=False)
#### Extra data to insert/update
extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)
#### extra_data.to_sql() with row update or insert option
expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)
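For reference (not part of the original report), the behavior wished for above can be approximated today with SQLite's native `INSERT ... ON CONFLICT` clause (SQLite 3.24+); a minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER NOT NULL)")
conn.executemany("INSERT INTO person_age (id, age) VALUES (?, ?)", [(1, 18), (2, 42)])

# Upsert the extra data: update age on primary-key collision, insert otherwise
extra_data = [(2, 44), (3, 95)]
conn.executemany(
    "INSERT INTO person_age (id, age) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET age = excluded.age",
    extra_data,
)
conn.commit()

print(conn.execute("SELECT id, age FROM person_age ORDER BY id").fetchall())
# [(1, 18), (2, 44), (3, 95)]
```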
This would be a nice feature, but the main problem with including it in pandas itself is that it is database-flavor dependent, while pandas is based on sqlalchemy core (not sqlalchemy ORM). That makes it somewhat difficult to implement. And since upserts are not supported by all database engines anyway, I think this is out of scope for pandas.
`INSERT OR UPDATE` is not supported by all engines, but an `INSERT OR REPLACE` can be made engine-agnostic by deleting the rows from the target table that match the DataFrame-index primary keys, and then inserting all the rows of the DataFrame. You would want to do that in a transaction.
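That delete-then-insert sequence can be sketched with the standard library `sqlite3` (an illustration of the idea only; `insert_or_replace` is a hypothetical helper, not something pandas provides):

```python
import sqlite3

def insert_or_replace(conn, table, key, rows):
    """Delete target rows whose key matches the incoming data, then insert
    everything -- both steps inside one transaction (hypothetical helper)."""
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.executemany(f"DELETE FROM {table} WHERE {key} = ?",
                         [(r[0],) for r in rows])
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
insert_or_replace(conn, "person_age", "id", [(2, 44), (3, 95)])
print(conn.execute("SELECT * FROM person_age ORDER BY id").fetchall())
# [(1, 18), (2, 44), (3, 95)]
```

Note the caveat raised later in this thread: if other tables have foreign keys or triggers on the target, the DELETE step can have side effects that a true upsert would not.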
@TomAugspurger I would love to see this in the supported db engines. I sit somewhere between using pure SQL and SQLAlchemy (it still works; I think it has to do with how I pass dicts). I do bulk inserts with psycopg2 COPY, but for small tables that change over time, where slightly slower inserts don't matter, I think I would use pd.to_sql.
insert_values = df.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(table).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='fact_case_pkey',
    set_=df.to_dict(orient='dict')
)
And with pure SQL:
def create_update_query(df, table=FACT_TABLE):
    """This function takes the Airflow execution date and passes it to other functions"""
    columns = ', '.join([f'{col}' for col in DATABASE_COLUMNS])
    constraint = ', '.join([f'{col}' for col in PRIMARY_KEY])
    placeholder = ', '.join([f'%({col})s' for col in DATABASE_COLUMNS])
    values = placeholder
    updates = ', '.join([f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS])
    query = f"""INSERT INTO {table} ({columns})
    VALUES ({placeholder})
    ON CONFLICT ({constraint})
    DO UPDATE SET {updates};"""
    query = ' '.join(query.split())
    return query

def load_updates(df, connection=DATABASE):
    """Uses COPY from STDIN to load to Postgres
    :param df: The dataframe which is written to StringIO, then loaded to the database
    :param connection: Refers to a PostgresHook
    """
    conn = connection.get_conn()
    cursor = conn.cursor()
    df1 = df.where((pd.notnull(df)), None)
    insert_values = df1.to_dict(orient='records')
    for row in insert_values:
        cursor.execute(create_update_query(df), row)
        conn.commit()
    cursor.close()
    del cursor
    conn.close()
@ldacey that style worked for me (insert_statement.excluded is an alias for the row of data that violated the constraint):
insert_values = merged_transactions_channels.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(orders_to_channels).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='orders_to_channels_pkey',
    set_={'channel_owner': insert_statement.excluded.channel_owner}
)
@cdagnino that snippet may not work for composite keys, so those scenarios need care. I'll see if I can find a better way of doing the same thing.
One way to solve this update problem is to use sqlalchemy's bulk_update_mappings:

session.bulk_update_mappings(
    Table,
    pandas_df.to_dict(orient='records')
)
I agree with @neilfrndes: I don't think a nice feature like this should be rejected on the principle that some databases don't support it. If someone makes a PR, it will probably be considered. However, I'm not very proficient with SQL code, so I don't know what the best approach would be.
One possibility, if that PR is introduced, would be to provide some examples of doing upserts via the callable `method`: https:

For postgres it would look something like this (not tested):
from sqlalchemy.dialects import postgresql

def pg_upsert(table, conn, keys, data_iter):
    for row in data_iter:
        row_dict = dict(zip(keys, row))
        stmt = postgresql.insert(table).values(**row_dict)
        upsert_stmt = stmt.on_conflict_do_update(
            index_elements=table.index,
            set_=row_dict)
        conn.execute(upsert_stmt)
Something similar could be done for mysql.
On postgres I use execute_values. In my case the query is a jinja2 template that flags whether to do an update set or do nothing:
from psycopg2.extras import execute_values

df = df.where((pd.notnull(df)), None)
tuples = [tuple(x) for x in df.values]

with pg_conn:
    with pg_conn.cursor() as cur:
        execute_values(cur=cur,
                       sql=insert_query,
                       argslist=tuples,
                       template=None)
@danich1 please could you give an example of how this would work?
I tried to look into bulk_update_mappings, but it got really confusing and I couldn't get it to work.
@cristianionescu92 An example would be this:
I have a table called `User` with the following fields: `id` and `name`.
| id | name |
| --- | --- |
| 0 | John |
| 1 | Joe |
| 2 | Harry |
I also have a DataFrame with updated values in the same columns:
| id | name |
| --- | --- |
| 0 | Chris |
| 1 | James |
Let's assume we have a session variable open to access the database. By calling this method:
session.bulk_update_mappings(
    User,
    <pandas dataframe above>.to_dict(orient='records')
)
Pandas will convert your table into a list of dictionaries, `[{id: 0, name: "chris"}, {id: 1, name: "james"}]`, which SQL will use to update the rows of the table. So the final table will look like:
| id | name |
| --- | --- |
| 0 | Chris |
| 1 | James |
| 2 | Harry |
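Putting the pieces of that example together, here is a self-contained sketch against an in-memory SQLite database (SQLAlchemy 1.4+ ORM assumed; table and names taken from the example above):

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(id=0, name="John"),
                     User(id=1, name="Joe"),
                     User(id=2, name="Harry")])
    session.commit()

    # DataFrame of updates, keyed on the primary key column
    updates = pd.DataFrame({"id": [0, 1], "name": ["Chris", "James"]})
    session.bulk_update_mappings(User, updates.to_dict(orient="records"))
    session.commit()

    print([(u.id, u.name) for u in session.query(User).order_by(User.id)])
    # [(0, 'Chris'), (1, 'James'), (2, 'Harry')]
```

Note that `bulk_update_mappings` only updates rows whose primary keys already exist; it does not insert new rows, so it covers the UPDATE half of an upsert.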
Hello, @danich1, and thank you for your reply.
Let me show you what I am doing:
import pypyodbc
from to_sql_newrows import clean_df_db_dups, to_sql_newrows

(These are 2 functions I found on GitHub; unfortunately I can't remember the link. clean_df_db_dups excludes from a DataFrame the rows that already exist in the SQL table, by checking several key columns, and to_sql_newrows is a function that inserts the new rows into SQL.)
from sqlalchemy import create_engine

engine = create_engine("engine_connection_string")

# Write data to SQL
Tablename = 'Dummy_Table_Name'
Tablekeys = Tablekeys_string
dftoupdateorinsertinSQL = random_dummy_dataframe

# Connect to sql server db using pypyodbc
cnxn = pypyodbc.connect("Driver={SQL Server};"
                        "Server=ServerName;"
                        "Database=DatabaseName;"
                        "uid=userid;pwd=password")

newrowsdf = clean_df_db_dups(dftoupdateorinsertinSQL, Tablename, engine, dup_cols=Tablekeys)
newrowsdf.to_sql(Tablename, engine, if_exists='append', index=False, chunksize=140)
end = timer()
tablesize = (len(newrowsdf.index))
print('inserted %r rows' % (tablesize))
The code above basically excludes from the DataFrame the rows that are already in SQL and inserts only the new rows. What I need is to update the existing rows as well. Can you help me understand what I should do next?
Motivation for a better to_sql

Integrating database practice with `to_sql` is becoming more and more valuable as data science matures and blends into data engineering.
Upsert is one example, especially because many people's workaround is to use `replace` instead, which drops the table, and with it all views and constraints.
The alternative I have seen more experienced users take is to stop using pandas at this stage. That tends to propagate downstream and keeps adoption of the pandas package loose among experienced users. Is that the direction pandas wants to take?
I understand that to_sql uses the core SQL dialect and stays database-agnostic wherever possible. Still, a true upsert, rather than a discard-or-delete workaround, would add a lot of value.
Integration with the pandas product vision

Much of the discussion above took place before it became possible to pass a callable to the `method` argument (introduced around @kjford's psql_insert_copy contribution).
I would be happy to contribute to pandas core functionality or, failing that, to documentation on solutions/best practices for achieving upsert functionality within pandas, along the lines of:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
What would be the preferred way forward for the pandas core devs/product managers?
I think we would accept an engine-specific implementation. The suggestion to use method='upsert' seems reasonable, but at this point I think we need someone to come up with a clear design proposal.
I have a similar requirement: updating existing data in a MySQL table from multiple CSVs over time.
I thought I could use df.to_sql() to insert the new data into a newly created temporary table, and then run a MySQL query to control appending/updating the data in the existing table.
MySQL reference: https://stackoverflow.com/questions/2472229/insert-into-select-from-on-duplicate-key-update
Disclaimer: I started using Python and Pandas only a few days ago.
Hey pandas folks: I had this same problem, needing to frequently update my local database with records that I eventually load and manipulate in pandas. I built a simple library to do this - it's basically a stand-in for df.to_sql and pd.read_sql_table that uses the DataFrame index as the primary key by default. It uses sqlalchemy core only.
https://pypi.org/project/pandabase/0.2.1/
https://github.com/notsambeck/pandabase
This tool is fairly opinionated and is probably not appropriate to merge into Pandas as-is. But for my specific use case it solves the problem... If there's interest in massaging it to fit within pandas, I'd be delighted to help.
For now, it works as follows (within the limits of current pandas and sqlalchemy: the index must be specified as the primary key; SQLite or Postgres backend; supported datatypes only):
pip install pandabase
pandabase.to_sql(df, table_name, con_string, how='upsert')
We are working towards a general solution to this with cvonsteg. We plan to come back with the proposed design in October.
@TomAugspurger as suggested, @rugg2 and I came up with the following design proposal for an upsert option in to_sql().

Two new values to be added as possible `method` arguments of the to_sql() method:

1) upsert_update - on row match, update the row in the database (for knowingly updating records - represents most use cases)
2) upsert_ignore - on row match, do not update the row in the database (for datasets that overlap, where you do not want to overwrite the table's data)
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("connection string")
df = pd.DataFrame(...)

df.to_sql(
    name='table_name',
    con=engine,
    if_exists='append',
    method='upsert_update'  # (or upsert_ignore)
)
To implement this, the SQLTable class would receive 2 new private methods containing the upsert logic, which would be called from the SQLTable.insert() method:
def insert(self, chunksize=None, method=None):
    # set insert method
    if method is None:
        exec_insert = self._execute_insert
    elif method == "multi":
        exec_insert = self._execute_insert_multi
    # new upsert methods <<<
    elif method == "upsert_update":
        exec_insert = self._execute_upsert_update
    elif method == "upsert_ignore":
        exec_insert = self._execute_upsert_ignore
    # >>>
    elif callable(method):
        exec_insert = partial(method, self)
    else:
        raise ValueError("Invalid parameter 'method': {}".format(method))
    ...
We propose the implementation below, together with the rationale outlined in more detail underneath (all points open for discussion):
- Use SQLAlchemy core for an engine-agnostic upsert via an atomic DELETE-and-INSERT sequence, rather than a natively supported upsert, whose implementation differs by flavor. For upsert_ignore, these operations obviously skip the matched records.
- If there is no UNIQUE constraint, multiple rows could match the upsert condition. In that case there is no way to know which record should be updated, so no upsert should be performed. For pandas to enforce this, every row would have to be evaluated individually before insertion, verifying that only 1 or 0 rows match - a read and a write per record (plus a delete when a single-record collision is found) - which would be very inefficient for large datasets.

@TomAugspurger if the upsert proposal looks reasonable, we will raise a pull request to continue the implementation in code (including tests). Let us know if you would like to proceed differently.
Reading through the proposal is on my todo list. I'm running a bit behind on email right now.
On Wed, Oct 9, 2019 at 9:18 AM, [email protected] wrote:

> @TomAugspurger https://github.com/TomAugspurger as per the design proposed above with @cvonsteg https://github.com/cvonsteg, we would like to proceed with the implementation in code (including tests).
> Let us know if you would like to proceed differently.
> —
> You are receiving this because you were mentioned.
> Reply to this email directly or view it on GitHub:
> https://github.com/pandas-dev/pandas/issues/14553?email_source=notifications&email_token=AAKAOITBNTWOQRBW3OWDEZDQNXR25A5CNFSM4CU2M7O2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN
> or mute the thread:
> https://github.com/notifications/unsubscribe-auth/AAKAOIRZQEQWUY36PQ36QTLQNXR25ANCNFSM4CU2M7OQ
No objections from me personally, so I think a PR would be welcome. A single implementation across all DBMs using SQLAlchemy core is how this would likely start, if I'm reading the points correctly, and just primary keys is fine too.
It's always easier to start small and focused and expand later.
I need this feature badly.
The PR written with cvonsteg should deliver the functionality (in review now)!
This feature would be absolutely fantastic! I'm not very well versed in github vocabulary, though: what does @rugg2's comment that the functionality is "in review now" mean for the pandas roadmap?
@pmgh2345 - yes, as you said, "in review now" means a pull request has been raised and is being reviewed by the core devs. You can see the PR above (#29636). You could technically fork the branch with the approved, updated code and compile your own local version of pandas with the functionality built in. However, I would recommend waiting until it is merged to master and released, and then just pip installing the newest version of pandas.
> The PR written with cvonsteg should deliver the functionality (in review now)!
Rather than using `if_exists`, it may be worth adding a new parameter to the `to_sql` method, since `if_exists` checks for the existence of the table, not of the rows.
@cvonsteg originally proposed using `method=`, which would avoid `if_exists` having two meanings:
df.to_sql(
    name='table_name',
    con=engine,
    if_exists='append',
    method='upsert_update'  # (or upsert_ignore)
)
@brylie we could add a brand-new parameter, but as you know, every new parameter makes the API a bit clunkier. There are trade-offs.
As you say, we initially thought of using the `method` argument if we had to choose among the current parameters, but after further review we realized that both (1) usage and (2) logic fit `if_exists` better.
1) From an API usage point of view:
users will want to choose method="multi" or None on the one hand, and "upsert" on the other. By contrast, there is no strong use case for using the upsert functionality at the same time as if_exists="append" or "replace".
2) From a logical point of view:
let me know whether I have understood your point correctly - and please don't hesitate to say so if you think the current implementation in review (PR #29636) would be a net negative!
Perhaps you do understand my point. The current implementation is likely a net positive, but slightly diminished by the ambiguous semantics.
I still maintain that `if_exists` should keep referring to only one thing: the existence of the table. Overloading a parameter can lead to complicated internal logic that hurts readability. Adding a new parameter like `upsert=True`, on the other hand, is clear and explicit.
Hi!
If you want to see a non-agnostic implementation of upserts, there is an example in my library pangres. It handles PostgreSQL and MySQL with sqlalchemy functions specific to those database types. For SQlite (and other database types that allow a similar upsert syntax) it uses a compiled regular sqlalchemy Insert.
I'm sharing this thinking it might give contributors a few ideas (while realizing it is very SQL-flavor-oriented). Also, when @cvonsteg's PR is done, a speed comparison would be interesting.
Note that I am by no means a sqlalchemy expert or anything!
I really want this feature. I agree that method='upsert_update' is a good idea.
Is this still planned? Pandas really needs this feature.
Yes, it seems it's still planned! The code is written, but there is one failing test, I believe:
https://github.com/pandas-dev/pandas/pull/29636
On Tue, May 5, 2020, 7:18 PM Leonel Atencio [email protected] wrote:

> Is this still planned? Pandas really needs this feature
> —
> You are receiving this because you were mentioned.
> Reply to this email directly or view it on GitHub:
> https://github.com/pandas-dev/pandas/issues/14553#issuecomment-624223231
> or unsubscribe:
> https://github.com/notifications/unsubscribe-auth/AI5X625A742YTYFZE7YW5A3RQBJ6NANCNFSM4CU2M7OQ
Hi! Is the feature ready, or is something still missing? If something is missing, let me know if there is anything I can help with.
Any news???
Coming from the Java world, I never imagined such a simple feature could turn my codebase upside down.
Hello everyone,

Looking into how upserts are implemented in SQL across dialects, I found a few techniques that can inform the design decision here. First, though, I want to warn against using DELETE ... INSERT logic. If there are foreign keys or triggers, other records across the database will end up deleted or otherwise confused. In MySQL, REPLACE does the same damage. I have actually created hours of work for myself fixing data because I used REPLACE. So, the techniques as implemented in SQL are:
| Dialect | Technique |
| --- | --- |
| MySQL | INSERT ... ON DUPLICATE KEY UPDATE |
| PostgreSQL | INSERT ... ON CONFLICT |
| SQLite | INSERT ... ON CONFLICT |
| Db2 | MERGE |
| SQL Server | MERGE |
| Oracle | MERGE |
| SQL:2016 | MERGE |
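For illustration (my own sketch, not from the thread), SQLAlchemy can already render the first two dialect-specific forms from a single table definition, with no live server needed:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mysql, postgresql

metadata = MetaData()
user = Table(
    "user", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
row = {"id": 1, "name": "James"}

# PostgreSQL (and SQLite >= 3.24): INSERT ... ON CONFLICT ... DO UPDATE
pg_stmt = postgresql.insert(user).values(**row)
pg_stmt = pg_stmt.on_conflict_do_update(
    index_elements=["id"],
    set_={"name": pg_stmt.excluded.name},
)
print(pg_stmt.compile(dialect=postgresql.dialect()))

# MySQL: INSERT ... ON DUPLICATE KEY UPDATE
my_stmt = mysql.insert(user).values(**row)
my_stmt = my_stmt.on_duplicate_key_update(name=my_stmt.inserted.name)
print(my_stmt.compile(dialect=mysql.dialect()))
```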
The syntaxes vary widely, and I understand the temptation to use DELETE ... INSERT to keep the implementation dialect-agnostic. But there is another way: you can imitate the logic of the MERGE statement using a temp table and basic INSERT and UPDATE statements. The SQL:2016 MERGE syntax is as follows:
MERGE INTO target_table
USING source_table
ON search_condition
WHEN MATCHED THEN
UPDATE SET col1 = value1, col2 = value2,...
WHEN NOT MATCHED THEN
INSERT (col1,col2,...)
VALUES (value1,value2,...);
Syntax taken from the Oracle tutorial and adjusted to conform to the SQL Wikibook.
Since all dialects supported by SQLAlchemy support temp tables, a safer, dialect-agnostic upsert can be performed in a single transaction as shown below.

Besides being dialect-agnostic, this technique has the advantage of being extensible: the end user can choose how to insert, how to update, and on which keys to join the data.

Temp table syntax and update joins may differ slightly between dialects, but they should be supported everywhere.

Below is a proof of concept I wrote for MySQL.
import uuid
import pandas as pd
from sqlalchemy import create_engine
# This proof of concept uses this sample database
# https://downloads.mysql.com/docs/world.sql.zip
# Arbitrary, unique temp table name to avoid possible collision
source = str(uuid.uuid4()).split('-')[-1]
# Table we're doing our upsert against
target = 'countrylanguage'
db_url = 'mysql://<{user: }>:<{passwd: }>.@<{host: }>/<{db: }>'
df = pd.read_sql(
    f'SELECT * FROM `{target}`;',
    db_url
)

# Change for UPDATE, 5.3->5.4
df.at[0, 'Percentage'] = 5.4
# Change for INSERT
df = df.append(
    {'CountryCode': 'ABW', 'Language': 'Arabic', 'IsOfficial': 'F', 'Percentage': 0.0},
    ignore_index=True
)

# List of PRIMARY or UNIQUE keys
key = ['CountryCode', 'Language']

# Do all of this in a single transaction
engine = create_engine(db_url)
with engine.begin() as con:
    # Create temp table like target table to stage data for upsert
    con.execute(f'CREATE TEMPORARY TABLE `{source}` LIKE `{target}`;')
    # Insert dataframe into temp table
    df.to_sql(source, con, if_exists='append', index=False, method='multi')
    # INSERT where the key doesn't match (new rows)
    con.execute(f'''
        INSERT INTO `{target}`
        SELECT
            *
        FROM
            `{source}`
        WHERE
            (`{'`, `'.join(key)}`) NOT IN (SELECT `{'`, `'.join(key)}` FROM `{target}`);
    ''')
    # Create a doubled list of tuples of non-key columns to template the update statement
    non_key_columns = [(i, i) for i in df.columns if i not in key]
    # Whitespace for aesthetics
    whitespace = '\n\t\t\t'
    # Do an UPDATE ... JOIN to set all non-key columns of target to equal source
    con.execute(f'''
        UPDATE
            `{target}` `t`
        JOIN
            `{source}` `s` ON `t`.`{"` AND `t`.`".join(["`=`s`.`".join(i) for i in zip(key, key)])}`
        SET
            `t`.`{f"`,{whitespace}`t`.`".join(["`=`s`.`".join(i) for i in non_key_columns])}`;
    ''')
    # Drop our temp table.
    con.execute(f'DROP TABLE `{source}`;')
I'm making a few assumptions here.

Hypotheticals aside, hopefully my MERGE-inspired technique helps the effort to build a flexible and robust upsert option.
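To show the same temp-table idea in a dialect that is easy to test, here is a SQLite translation of the approach (my own sketch with a hypothetical `merge_upsert` helper; SQLite has no `UPDATE ... JOIN`, so correlated subqueries do the update step):

```python
import sqlite3
import uuid

def merge_upsert(conn, target, key_cols, all_cols, rows):
    """Emulate MERGE: stage rows in a temp table, INSERT the unmatched
    keys, then UPDATE the matched ones (hypothetical helper)."""
    source = "stage_" + uuid.uuid4().hex  # unique temp table name
    cols = ", ".join(all_cols)
    ph = ", ".join("?" * len(all_cols))
    keys = ", ".join(key_cols)
    with conn:  # single transaction
        conn.execute(f"CREATE TEMP TABLE {source} AS "
                     f"SELECT {cols} FROM {target} WHERE 0")
        conn.executemany(f"INSERT INTO {source} ({cols}) VALUES ({ph})", rows)
        # INSERT rows whose key does not yet exist in the target
        conn.execute(f"INSERT INTO {target} ({cols}) "
                     f"SELECT {cols} FROM {source} "
                     f"WHERE ({keys}) NOT IN (SELECT {keys} FROM {target})")
        # UPDATE non-key columns via correlated subqueries
        match = " AND ".join(f"s.{k} = {target}.{k}" for k in key_cols)
        sets = ", ".join(
            f"{c} = (SELECT s.{c} FROM {source} s WHERE {match})"
            for c in all_cols if c not in key_cols)
        conn.execute(f"UPDATE {target} SET {sets} "
                     f"WHERE ({keys}) IN (SELECT {keys} FROM {source})")
        conn.execute(f"DROP TABLE {source}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
merge_upsert(conn, "person_age", ["id"], ["id", "age"], [(2, 44), (3, 95)])
print(conn.execute("SELECT * FROM person_age ORDER BY id").fetchall())
# [(1, 18), (2, 44), (3, 95)]
```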
I think this would be a handy feature; using such a common operation when adding rows to a table seems intuitive to me, yet it appears to be out of scope.

Please reconsider adding this function - it would be extremely useful when adding rows to an existing table.
Alas, Pangres is restricted to Python 3.7+. In cases like mine (I am forced to use the old Python 3.4), that is not always a viable solution.
Thanks @GoldstHa - that is really helpful input. I will attempt to create a POC for a MERGE-like implementation.
Given the issues with the DELETE/INSERT approach, and the potential blockers for @GoldstHa's MERGE approach on MySQL DBs, I have dug a bit deeper. I put together a proof of concept using the sqlalchemy update functionality, which looks promising. This week I will try to implement that approach properly in the Pandas codebase, making sure it works across all DB flavors.
There have been some good discussions around the API and how an upsert should actually be invoked (i.e. via the `if_exists` argument, or via an explicit `upsert` argument). That will be clarified soon. For now, here is the pseudocode proposal of how the functionality would work, using SqlAlchemy upsert statements:
Identify primary key(s) and existing pkey values from DB table (if no primary key constraints identified, but upsert is called, return an error)
Make a temp copy of the incoming DataFrame
Identify records in incoming DataFrame with matching primary keys
Split temp DataFrame into records which have a primary key match, and records which don't
if upsert:
    Update the DB table using `update` for only the rows which match
else:
    Ignore rows from DataFrame with matching primary key values
finally:
    Append remaining DataFrame rows with non-matching values in the primary key column to the DB table
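The pseudocode above can be sketched concretely with pandas and SQLite (an illustrative, hypothetical `upsert_df` helper - not the API under review):

```python
import sqlite3
import pandas as pd

def upsert_df(df, table, key, conn, update=True):
    """Split incoming rows on primary-key match: UPDATE the matches
    (or ignore them when update=False), then append the rest."""
    existing = pd.read_sql(f"SELECT {key} FROM {table}", conn)[key]
    matches = df[key].isin(existing)
    if update:
        cols = [c for c in df.columns if c != key]
        sets = ", ".join(f"{c} = ?" for c in cols)
        # astype(object) converts numpy scalars to plain Python values for sqlite3
        rows = list(df.loc[matches, cols + [key]].astype(object)
                      .itertuples(index=False, name=None))
        conn.executemany(f"UPDATE {table} SET {sets} WHERE {key} = ?", rows)
        conn.commit()
    # Rows with no key match are appended; matched rows were updated or ignored
    df[~matches].to_sql(table, conn, if_exists="append", index=False)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_age (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO person_age VALUES (?, ?)", [(1, 18), (2, 42)])
upsert_df(pd.DataFrame({"id": [2, 3], "age": [44, 95]}), "person_age", "id", conn)
print(conn.execute("SELECT * FROM person_age ORDER BY id").fetchall())
# [(1, 18), (2, 44), (3, 95)]
```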