I have a table which roughly looks like so:
id | unique | group ---------------------------------- 1 | aaa | group1 2 | bbb | group1 3 | ggg | group1 4 | iii | group2 5 | ooo | group2 6 | eee | group3 7 | ttt | group4 8 | qqq | group4 9 | nnn | group4
I would like to choose a somewhat random subset of roughly size
N from this table. The
N entries should belong to a subset of
groups, no matter which
group, no matter the group size and they should be randomly sorted on group-level. For example, for
N==3, I want either of
[ooo, iii, eee] (i.e. entries from group2, then group3) or
[eee, iii, ooo] (i.e. entries from group3, then group2) or
[ttt, nnn, qqq] (i.e. entries from group4) or
[qqq, nnn, ttt] (i.e. entries from group4) or
[eee, ggg, bbb, aaa] (i.e. entries from group3, then group1) or
I'm fine with not having a group "completely" selected.
I do not do this very often and I prefer multiple queries over one complex.
N will be around
100 or so, the table has <10k entries and the programm runs once per day.
N does not need to be exacly matched. I am doing this in Python/Mysqldb but I'm rather interested in the general algorithm/approach.
Additionally, once the
N entries are processed, I would like to ignore them on the next run, either by simply removing them or by setting some additonal attribute. In the latter case, the queries would need to be adapted.
EDIT: Changed for clarification.