I am using Python pandas to load data from a MySQL database, modify it, and then update another table. There are 100,000+ rows, so the UPDATE queries take some time.
Is there a more efficient way to update the data in the database than iterating with df.iterrows() and running an UPDATE query for each row?
Answer:
The problem here is not pandas; it is the UPDATE operations. Each row fires its own UPDATE query, which means a lot of per-statement overhead for the database connector and the server to handle.
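To make the per-row cost concrete, here is a minimal sketch of the pattern from the question. It uses Python's built-in sqlite3 in-memory database purely as a stand-in for MySQL so it runs anywhere; the table `items` and its `id`/`price` columns are hypothetical. The point is only that the loop sends one separate UPDATE statement per DataFrame row, so 100,000 rows means 100,000 statements.

```python
import sqlite3

import pandas as pd


def update_row_by_row(conn: sqlite3.Connection, df: pd.DataFrame) -> int:
    """Issue one UPDATE per DataFrame row (the slow pattern) and return how many were sent."""
    statements = 0
    for _, row in df.iterrows():
        # Every iteration is a separate statement the connector sends and the server executes.
        conn.execute(
            "UPDATE items SET price = ? WHERE id = ?",
            (float(row["price"]), int(row["id"])),
        )
        statements += 1
    conn.commit()
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, 0.0) for i in range(5)])
    df = pd.DataFrame({"id": range(5), "price": [1.5 * i for i in range(5)]})
    print(update_row_by_row(conn, df))  # one statement per row
```

With a MySQL server over a network, each of those statements also pays a client/server round trip, which is where most of the time goes.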
You are better off using the df.to_csv('filename.csv') method to dump your DataFrame into a CSV file, and then reading that CSV file into your MySQL database with the LOAD DATA INFILE statement.
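A minimal sketch of the dump-and-load step, under a few assumptions that are not in the answer itself: the CSV is written on the client and loaded with LOAD DATA LOCAL INFILE, the file keeps a header row (skipped with IGNORE 1 LINES), missing values are written as `\N` so MySQL reads them as NULL, and the target table name `items_new` is hypothetical. The helper `dump_for_load_data` is illustrative, not part of any library; it only writes the file and builds the SQL, which you would then run through whatever MySQL DB-API connection you already use.

```python
import os
import tempfile

import pandas as pd


def dump_for_load_data(df: pd.DataFrame, csv_path: str, table: str) -> str:
    """Write df to csv_path and return a LOAD DATA statement that reads it back.

    Assumptions (not part of the original answer): the file is sent from the
    client (LOCAL), its first line is a header row, and NULLs are written as \\N.
    """
    # index=False keeps the DataFrame index out of the file; \N is MySQL's NULL marker.
    df.to_csv(csv_path, index=False, na_rep="\\N", lineterminator="\n")
    columns = ", ".join(f"`{c}`" for c in df.columns)
    # Forward slashes also work on Windows; escape any single quote in the path.
    path = csv_path.replace("\\", "/").replace("'", "\\'")
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 LINES "
        f"({columns})"
    )


if __name__ == "__main__":
    df = pd.DataFrame({"id": [1, 2], "name": ["a", "b,c"]})
    with tempfile.TemporaryDirectory() as tmp:
        sql = dump_for_load_data(df, os.path.join(tmp, "items.csv"), "items_new")
        print(sql)
        # With a real MySQL connection that permits LOCAL INFILE (see your connector's docs):
        # cursor.execute(sql); connection.commit()
```

Without LOCAL, as in the answer's wording, LOAD DATA INFILE reads the file from the MySQL server's own filesystem and is subject to the server's `secure_file_priv` setting, so that variant only fits when the CSV is written on the database host. The `lineterminator` keyword assumes pandas 1.5 or newer.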
Load the CSV into a new table, then DROP the old table and RENAME the new one to the old one's name.
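The new-table-then-swap approach can be sketched as a short list of MySQL statements. This is an illustration, not the answerer's exact procedure: the table names are hypothetical, `CREATE TABLE ... LIKE` is assumed as a convenient way to get an empty copy of the live table's columns and indexes, and the CSV load is assumed to run between the CREATE and the DROP. `atomic_swap` is a variation not mentioned in the answer: MySQL's multi-table `RENAME TABLE` swaps both names in one statement, so readers never see the table missing.

```python
def drop_and_rename(table: str, staging: str) -> list[str]:
    """Statements for the approach described above: fill a new table, then replace the old one."""
    return [
        # Empty table with the same columns and indexes as the live one.
        f"CREATE TABLE `{staging}` LIKE `{table}`",
        # ... the LOAD DATA ... INTO TABLE `staging` step runs here ...
        f"DROP TABLE `{table}`",
        f"RENAME TABLE `{staging}` TO `{table}`",
    ]


def atomic_swap(table: str, staging: str) -> list[str]:
    """Variation: swap names in one RENAME TABLE, then drop the previous data."""
    backup = f"{table}_old"  # hypothetical name for the table being retired
    return [
        f"RENAME TABLE `{table}` TO `{backup}`, `{staging}` TO `{table}`",
        f"DROP TABLE `{backup}`",
    ]


if __name__ == "__main__":
    for stmt in drop_and_rename("items", "items_new"):
        print(stmt)  # e.g. cursor.execute(stmt) with a real MySQL connection
```

With the plain DROP-then-RENAME order there is a short window where the table does not exist, which matters only if other clients query it during the reload.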
Furthermore, I suggest you do the same when loading data into pandas. Use the MySQL SELECT ... INTO OUTFILE statement, and then load that file into pandas using the
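The answer is cut off before naming the pandas function, so the reading side below is an assumption: it uses `pandas.read_csv`. The sketch builds a `SELECT ... INTO OUTFILE` statement with hypothetical table and column names and reads the resulting file. Keep in mind that MySQL writes the file on the server host, refuses to overwrite an existing file, writes no header row, and writes NULL as `\N`, hence `header=None`, explicit `names`, and `na_values`. It also assumes the data contains no embedded quotes or backslashes, since MySQL escapes those with a backslash that this simple `read_csv` call does not undo.

```python
import os
import tempfile

import pandas as pd


def select_into_outfile_sql(table: str, columns: list[str], path: str) -> str:
    """Build a SELECT ... INTO OUTFILE statement (MySQL writes the file on the server host)."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"SELECT {cols} FROM `{table}` "
        f"INTO OUTFILE '{path}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


def read_outfile(path: str, columns: list[str]) -> pd.DataFrame:
    """Load the dump into pandas; MySQL writes no header row and writes NULL as \\N."""
    return pd.read_csv(path, header=None, names=columns, na_values=["\\N"])


if __name__ == "__main__":
    print(select_into_outfile_sql("items", ["id", "name"], "/tmp/items.csv"))
    # Simulate a file in the format the statement above would produce.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "items.csv")
        with open(path, "w", newline="") as f:
            f.write('1,"alpha"\n2,\\N\n')
        print(read_outfile(path, ["id", "name"]))
```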