I have a code written in Matlab that uses 'intersect' to find the vectors (and their indices) that intersect in two large matrices. I found that 'intersect' is the slowest line (by a large difference) in my code. Unfortunately I couldn't find a faster alternative so far.

As an example running the code below takes approx 5 seconds on my pc:

```
profile on
for i = 1 : 500
a = rand(10000,5);
b = rand(10000,5);
[intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer
```

I was wondering if there is a faster way. Note that the matrices (a) and (b) have 5 columns. The number of rows don't necessary have to be the same for the two matrices.

Any help would be great. Thanks

# Best How To :

## Discussion and solution codes

You can use an approach that leverages `fast matrix multiplication in MATLAB`

to convert those `5`

columns of input arrays into one column by considering each column as a significant "digit" of a single number. Thus, you would end up with an array with only column and then, you can use `intersect`

or `ismember`

without `'rows'`

and that must speedup the codes in a big way!

Here are the promised implementations as function codes for easy usage -

**intersectrows_fast_v1.m:**

```
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);
return;
```

**intersectrows_fast_v2.m:**

```
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);
%// Now, with ismember, duplicate items are not taken care of automatically as
%// are done with intersect. So, we need to find the duplicate items and
%// remove those from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed
match_a(a_rm_ind)=0;
intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);
return;
```

## Quick tests and conclusions

With the datasizes listed in the question, the runtimes were -

```
-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.
```

The results seem to suggest a big advantage in favour of the `version - I`

of the two proposed approaches with a whooping speedup of around `6.7x`

over the original approach!!

Also, please note that if you don't need any one or two of the three outputs from the original `intersect`

with 'rows' based approach, then both the proposed approaches could be further shortened for better runtime performances!