I have written a function to calculate the entropy of a vector in which each element is the number of samples belonging to one class.
    function x = Entropy(a)
        t = sum(a);
        t = repmat(t, [1, size(a, 2)]);
        x = sum(-a./t .* log2(a./t));
    end
For example, if a = [4 0], then
entropy = -(0/4)*log2(0/4) - (4/4)*log2(4/4)
But the function above returns NaN when the split is pure, because log2(0) is -Inf and 0 * -Inf evaluates to NaN, as in the example above. The entropy of a pure split should be zero.
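For reference, here is a minimal sketch that reproduces the NaN for the example above and then shows one possible workaround, which is an assumption on my part and has not been benchmarked: drop the zero-count classes before taking the logarithm, using the usual convention that 0*log2(0) counts as 0. The variable names p, terms, nz and H are only illustrative.

```matlab
a = [4 0];                 % class counts for a pure split
p = a ./ sum(a);           % class probabilities: [1 0]

% Reproduce the problem: log2(0) is -Inf, and 0 * (-Inf) is NaN
terms = -p .* log2(p);
disp(terms)                % the second term is NaN
disp(sum(terms))           % NaN propagates through sum

% Possible workaround (assumption: treat 0*log2(0) as 0):
% keep only the classes with nonzero probability
nz = p > 0;                            % logical mask of non-empty classes
H = -sum(p(nz) .* log2(p(nz)));        % entropy over non-empty classes only
disp(H)
```

The mask is a single vectorized comparison, so it should add little overhead, but that would need to be measured on the real data.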
How can I fix this with the least impact on performance? The data set is very large. Thanks.