I have a couple of long lists of lists of related objects that I'd like to group to reduce redundancy. Pseudocode:
```
>>> list_of_lists = [[1,2,3],[3,4],[5,6,7],[1,8,9,10]...]
```
So lists that share any element (like [1,2,3] and [3,4]) would be collapsed into a single list. Collapsing them is easy: once I find lists to combine, I can turn them into sets and take their union. What I'm not sure about is how to compare the lists. Do I need to do a series of …
My first thought was to loop through and check whether each item in a sublist appears in any of the other lists, and if it does, merge the lists and start over. That seems terribly inefficient, though. I did some searching and found "Python - dividing a list-of-lists to groups", but my data isn't structured that way. Also, my actual data is a series of strings and thus not sortable in any meaningful sense.
I can write some gnarly looping code to make this work, but I was wondering if there are any built-in functions that would make this sort of comparison easier. Maybe something in list comprehensions?
Answer:
I think this is a reasonably efficient way of doing it, if I understand your question correctly. The result here will be a list of sets.
Maybe the missing bit of knowledge was `d & g` (also written `d.intersection(g)`) for finding the set intersection, along with the fact that an empty set is "falsy" in Python.
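A quick check of both facts, using nothing beyond Python 3's built-in `set` type (the sets `a`, `b` and `c` are just illustrative values):

```python
a = {1, 2, 3}
b = {3, 4}
c = {5, 6, 7}

# & and .intersection() both return the elements the two sets share
print(a & b)               # {3}
print(a.intersection(b))   # {3}

# No shared elements gives an empty set, which is falsy
print(a & c)               # set()
print(bool(a & c))         # False

# ...so the intersection can be used directly as an if-condition
if a & b:
    print("a and b overlap")
```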
```python
data = [[1, 2, 3], [3, 4], [5, 6, 7], [1, 8, 9, 10]]

result = []
for d in data:
    d = set(d)

    matched = [d]
    unmatched = []
    # first divide into matching and non-matching groups
    for g in result:
        if d & g:
            matched.append(g)
        else:
            unmatched.append(g)

    # then combine all matching groups into one group
    # while leaving unmatched groups intact
    result = unmatched + [set().union(*matched)]

print(result)
# [{5, 6, 7}, {1, 2, 3, 4, 8, 9, 10}]
```
We start with no groups at all (`result = []`). Then we take each list from the data in turn and check which of the existing groups intersect it and which don't. We merge all of the matching groups together with the list itself (achieved by starting with `matched = [d]`). We don't touch the non-matching groups, though some of them may end up being merged in a later iteration. If you add a line `print(result)` at the end of each loop iteration, you should be able to see how the groups are built up.
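One way to watch the groups build up is to wrap the loop in a function that prints `result` after each input list and also keeps a snapshot of it. This is a sketch: the name `merge_groups_traced` and the `history` list are hypothetical additions, while the loop body is the same as in the answer's code.

```python
def merge_groups_traced(data):
    """Merge overlapping lists into sets, printing the groups after each step.

    Hypothetical helper: same loop as the answer, plus a history of snapshots.
    """
    result = []
    history = []  # copy of result after each input list is processed
    for d in data:
        d = set(d)
        matched = [d]
        unmatched = []
        # split existing groups into ones that overlap d and ones that don't
        for g in result:
            if d & g:
                matched.append(g)
            else:
                unmatched.append(g)
        # merge d with every overlapping group; keep the rest untouched
        result = unmatched + [set().union(*matched)]
        print(result)
        history.append(list(result))
    return result, history


final, steps = merge_groups_traced([[1, 2, 3], [3, 4], [5, 6, 7], [1, 8, 9, 10]])
```

With the example data it prints:

```
[{1, 2, 3}]
[{1, 2, 3, 4}]
[{1, 2, 3, 4}, {5, 6, 7}]
[{5, 6, 7}, {1, 2, 3, 4, 8, 9, 10}]
```

The last step shows why the non-matching groups are kept around: `{5, 6, 7}` survives untouched, while `[1, 8, 9, 10]` pulls the earlier `{1, 2, 3, 4}` group into a single merged set.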
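The union of all the sets in `matched` is computed by `set().union(*matched)`. The `*` unpacks the list so that each set becomes a separate argument to `union`, which accepts any number of iterables. Calling it on an empty `set()` gives `union` something to be called on, and it returns a new set rather than modifying any of the existing groups.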
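A small illustration of that call on its own. The `matched` value here is made up for the example, and `sorted()` is used only so the printed output has a fixed order:

```python
matched = [{3, 4}, {1, 2, 3}, {9}]

# Equivalent to set().union({3, 4}, {1, 2, 3}, {9})
merged = set().union(*matched)
print(sorted(merged))                        # [1, 2, 3, 4, 9]

# union() accepts any iterables, not only sets
print(sorted(set().union([1, 2], [2, 5])))   # [1, 2, 5]

# With nothing to unpack, the result is just the empty starting set
print(set().union(*[]))                      # set()

# The input sets are left unchanged; union() builds a new set
print(matched[0] == {3, 4})                  # True
```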