I have a matrix consisting of a vector of which each element representing the rows is composed of a vector representing the columns of the matrix. I would like to sort the rows according to the 1st column.
Each element inside this matrix is a
double, although the first column contains a number that serves as an identifier (but is not unique).
My goal is to have something like the aggregate functions available in SQL, such as count() and sum() when I group by the first column.
For instance, if I have:
I would like to get:
ID COUNT MEAN
1 2 20
2 2 30
3 1 60
However, I am stuck in the very first step: how do I sort the rows according to the value of the first element of each row?
I found a clue on this topic, and changed adapted the comparator to:
bool compareFunction (double i,double j)
But the compiler was not very happy about that (making a reference to the stl_algo.h file):
error: cannot convert 'std::vector<double>' to 'double' in argument passing
I was therefore wondering if there is a way to sort such a
vectors when it contains
Best How To :
Answer (imho): use a different datastructure. What you are trying to do is setup a multimap. Oh hey look:
stl::multimap - how do i get groups of data?
It'll be faster for large numbers of elements. And is actually a map rather than a vector of vector of double.
Either that, or skip the sorting all together, and count by key using std::map, std::unordered_map, or (if you know the number of keys and/or the keys are offset by 1 with no breaks) std::vector.
To expand, sorting your list to get means will be slow. Sorting (using std::sort) is O(nlogn), and will be O(nlogn) every time you compute this mean. And it is an unessisary step: your stuff is grouped by key reguardless of order. std::map and std::multimap will "sort as you go" which will be just a little faster than sorting every time, but you won't have to sort the whole thing to get the list. Then you can just iterate the multimap to get the mean, O(n) each mean calculation. (It is still O(nlg(n)) to add all the elements to the multimap)
But if you know the key output is going to be 1,2,3...n-1,n, than sorting is a complete waste of time. Just make a counter for each key (since you know what the keys can be) and add to the key while iterating the array.
BUT WAIT THERE IS MORE
If the keys are actually setup the way you are thinking, than the best way from the get go is to forget the table structure, and make build it like this:
Count is now constant time for each row. Mean for each row is O(n). Getting a list is constant time for each row. EVERYBODY WINS.