I am trying to make a script for a call centre that needs to upload millions of records from a CSV file to a database, filtering out all duplicate phone numbers during the upload. To do this I am using pandas and SQLAlchemy:
```python
import pandas as pd

df = pd.read_csv('test.csv')
rd = models.session.query(Test).all()
```
I know pandas has drop_duplicates(), but I can only find examples of removing duplicates within a single DataFrame. Is it even applicable in my case?
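From what I can tell, drop_duplicates() can be made to work across two frames by concatenating them first. A minimal sketch with made-up data (the column name `tel` and the toy values are just assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-ins: `new` plays the CSV upload,
# `existing` plays the numbers already in the database.
new = pd.DataFrame({'tel': ['111', '222', '333']})
existing = pd.DataFrame({'tel': ['222']})

# Tag each row with its source, concatenate, then drop *every* copy of any
# duplicated number (keep=False). Whatever survives from the CSV side has
# no match in the database and is safe to upload.
combined = pd.concat([existing.assign(src='db'), new.assign(src='csv')])
deduped = combined.drop_duplicates(subset='tel', keep=False)
to_upload = deduped[deduped['src'] == 'csv'].drop(columns='src')
```

Here `to_upload` keeps only the numbers not already present in `existing`.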
This is what I have so far, thanks to the help of others:
```python
df = read_csv('phones.csv')
result_dict = [u.__dict__ for u in models.session.query(Dedupe).all()]
df['tel'] = df.index
rd = DataFrame.from_dict(result_dict)
print df[~df['tel'].isin(rd['tel'].unique())]
```
It is still printing out all of the CSV rows, even when there are duplicates.
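For what it's worth, the isin() filter itself behaves as I expect on toy data when I leave out the `df['tel'] = df.index` line (which, I suspect, overwrites the phone numbers with row indices). A minimal sketch, all names and values made up:

```python
import pandas as pd

# Hypothetical stand-ins: `df` plays the CSV, `rd` plays the DB rows.
df = pd.DataFrame({'tel': ['111', '222', '333'], 'name': ['a', 'b', 'c']})
rd = pd.DataFrame({'tel': ['222']})

# Keep only rows whose phone number is NOT already in the database.
# Note there is no `df['tel'] = df.index` here; that assignment would
# replace the numbers with 0, 1, 2, ... and isin() would never match.
new_rows = df[~df['tel'].isin(rd['tel'].unique())]
```

On this data `new_rows` drops the shared number, so I am unsure why the full CSV still prints in my real script.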