I have a large csv file, which is a log of caller data.
A short snippet of my file:
CompanyName  High Priority  QualityIssue
Customer1    Yes            User
Customer1    Yes            User
Customer2    No             User
Customer3    No             Equipment
Customer1    No             Neither
Customer3    No             User
Customer3    Yes            User
Customer3    Yes            Equipment
Customer4    No             User
I want to sort the entire list by the frequency of occurrence of customers so it will be like:
CompanyName  High Priority  QualityIssue
Customer3    No             Equipment
Customer3    No             User
Customer3    Yes            User
Customer3    Yes            Equipment
Customer1    Yes            User
Customer1    Yes            User
Customer1    No             Neither
Customer2    No             User
Customer4    No             User
I've tried groupby, but that only prints the company name and its frequency, not the other columns. I also tried:
df['Totals']= [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
df = [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
But these give me an error: ValueError: Wrong number of items passed 1, indices imply 24
I've looked at something like this:
for key, value in sorted(mydict.iteritems(), key=lambda (k,v): (v,k)):
    print "%s: %s" % (key, value)
but this only prints two columns, and I want to sort my entire CSV. My output should be the entire CSV, with its rows ordered by how often each CompanyName (the first column) occurs, most frequent first.
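To make the goal concrete, here is a minimal sketch of the result I'm after. It assumes pandas, a comma-separated file, and the column names from the snippet above; the inline `csv_text` and the names `counts`, `order` and `sorted_df` are made up for illustration. It is one possible approach, not necessarily the best one.

```python
import io

import pandas as pd

# Assumption: the real file is comma-separated; this inline text stands in for it.
csv_text = """CompanyName,High Priority,QualityIssue
Customer1,Yes,User
Customer1,Yes,User
Customer2,No,User
Customer3,No,Equipment
Customer1,No,Neither
Customer3,No,User
Customer3,Yes,User
Customer3,Yes,Equipment
Customer4,No,User
"""
df = pd.read_csv(io.StringIO(csv_text))

# For every row, look up how many rows share its CompanyName.
counts = df["CompanyName"].map(df["CompanyName"].value_counts())

# Sort the negated counts ascending with a stable sort (mergesort), so the
# most frequent company comes first and ties keep their original file order.
order = (-counts).sort_values(kind="mergesort").index

# Reorder the whole frame, keeping every column.
sorted_df = df.loc[order].reset_index(drop=True)

print(sorted_df.to_string(index=False))
```

With a real file, `pd.read_csv("calls.csv")` (a hypothetical filename) would replace the `io.StringIO` part, and `sorted_df.to_csv(...)` would write the sorted rows back out.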
Thanks for the help in advance!