I have a table with two columns of interest,
bucket_id. There are a fixed number of values for
bucket_id and I'm okay with listing them out if I need to.
item_id can appear multiple times, but each occurrence will have a separate
bucket_id value. For example, the
123 can appear twice in the table, once under
A, once under
My goal is to determine how much overlap exists between each pair of
bucket_id values and display it as an N-by-N matrix.
For example, consider the following small example table:
item_id bucket_id ========= =========== 111 A 111 B 111 C 222 B 222 D 333 A 333 C 444 C
So for this dataset, buckets
B have one
item_id in common, buckets
D have no items in common, etc.
I would like to get the above table formatted into something like the following:
A B C D =================================== A 2 1 2 0 B 1 2 1 1 C 2 1 3 0 D 0 1 0 1
In the above table, the intersect of a row and column tells you how many records exist in both
bucket_id values. For example, where the
A row intersects the
C column we have a
2, because there are 2 records that exist in both
bucket_id A and C. Because the intersection of X and Y is the same as the intersection of Y and X, the above table is mirrored across the diagonal.
I imagine the query involves a
PIVOT, but I can't for the life of me figure out how to get it working.