I have a pandas DataFrame my_data that looks like:

       event_id  user_id  attended
    0        13      345         1
    1        14      654         0
    ...

event_id and user_id both have duplicates because there is an entry for each user and event combination. What I want to do is reshape this into a DataFrame where the index (rows) holds the distinct user_id values, the columns are the distinct event_id values, and the value at a given (row, col) is just the 0 or 1 indicating whether that user attended that event.
It seems that the pivot method is appropriate, but when I tried my_data.pivot(index='user_id', columns='event_id', values='attended') I got an error saying that the index has duplicates.
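
For reference, here is a minimal sketch that reproduces the error. The rows are made up for illustration, and the repeated (user_id, event_id) pair is only my assumption about what triggers the complaint; I have not confirmed that my real data contains such repeats.

```python
import pandas as pd

# Hypothetical sample rows (not my real data).
# The pair (user_id=345, event_id=13) appears twice on purpose.
my_data = pd.DataFrame({
    "event_id": [13, 14, 13],
    "user_id": [345, 654, 345],
    "attended": [1, 0, 1],
})

try:
    wide = my_data.pivot(index="user_id", columns="event_id", values="attended")
    error_message = None
except ValueError as exc:
    # pivot cannot place two rows into the same (row, column) cell
    error_message = str(exc)

print(error_message)
```

If the repeated row is dropped from this sample, the same pivot call runs without complaint, which is why I suspect repeated pairs rather than repeated user_id values alone.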
I was thinking I should do some kind of groupby on user_id first, but I don't want to add up all the attended 1's and 0's for each user. I specifically want the distinct event_id values as my new columns, so that it stays clear which events each user attended.
Any help would be greatly appreciated, thanks!