I'm having a strange problem training an SVM with an RBF kernel in MATLAB. When I run a grid search over the C and sigma values using 10-fold cross-validation, I always get AUC values of approximately .50 (varying between .48 and .54). I obtained these from:

    [X,Y,T,AUC] = perfcurve(dSet1Label(test), label, 1);

where `dSet1Label(test)` are the actual test set labels and `label` are the predicted labels. The classifier only predicts the majority class, which makes up just over 90% of the data.
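For reference, AUC can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it that way in base MATLAB to show why hard labels that are all the same class give exactly 0.5, while continuous scores can actually rank the classes. The helper `pairAUC` and the toy fold are hypothetical, not part of the original code. With the Statistics and Machine Learning Toolbox, the analogous call would pass a score column instead of `label`, e.g. `perfcurve(dSet1Label(test), score(:,2), 1)`, assuming class 1 is the second entry of `svmStruct.ClassNames`. The last lines compute the accuracy of always predicting the majority class from the counts in the question (6,453 instances, 542 positive).

```matlab
% Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs ranked
% correctly, with ties counted as half. Base MATLAB only (R2016b+ for
% implicit expansion). pairAUC is a hypothetical helper.
pairAUC = @(s, y) mean(mean((s(y == 1) > s(y ~= 1).') + 0.5 * (s(y == 1) == s(y ~= 1).')));

% Hypothetical toy fold: 2 positives (label 1) and 4 negatives (label 0).
y = [1; 0; 0; 0; 1; 0];
hardLabels = zeros(6, 1);                    % always predicts the majority class
scores     = [0.9; 0.2; 0.4; 0.1; 0.7; 0.3]; % hypothetical continuous scores

aucHard  = pairAUC(hardLabels, y);   % every positive/negative pair is a tie
aucScore = pairAUC(scores, y);       % every positive outranks every negative

% Accuracy of a constant majority-class classifier on the question's data.
baselineAcc = (6453 - 542) / 6453;

fprintf('AUC from hard labels:   %.3f\n', aucHard);
fprintf('AUC from scores:        %.3f\n', aucScore);
fprintf('Majority-class accuracy: %.4f\n', baselineAcc);
```

In other words, an accuracy around 91.6% paired with an AUC around 0.5 is what a constant classifier would produce on this data set.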
Upon further investigation, I looked at the scores, obtained from

    [label,score] = predict(svmStruct, dSet1(test,:));

where `svmStruct` is a model trained on 9/10ths of the data and `dSet1(test,:)` is the remaining 1/10th. They are all the same:

    0.8323   -0.8323
    0.8323   -0.8323
    0.8323   -0.8323
    ...
    0.8323   -0.8323
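One way to see how identical scores can arise: `fitcsvm` divides the predictors by `KernelScale` before applying the Gaussian kernel, so the kernel is K(x, z) = exp(-||x - z||^2 / s^2). The sketch below evaluates that kernel on hypothetical stand-in data (200 random points with 443 features drawn uniformly from [0,1], not the real data set) for each `KernelScale` value in the `rbfVals` grid and reports the largest similarity between two distinct points. Where that value is effectively 0, the kernel matrix is essentially the identity, so no training point carries information about any other point.

```matlab
% Hypothetical stand-in for the real data: 200 points, 443 features in [0,1].
% The real features will be distributed differently.
Xs = rand(200, 443);

% Pairwise squared Euclidean distances using base MATLAB only.
sq = sum(Xs.^2, 2);
D2 = max(sq + sq.' - 2 * (Xs * Xs.'), 0);

% The KernelScale values used in the grid search.
scales = [.0001, .01, .1, 1, 2, 3, 5, 10, 20];
maxOffDiag = zeros(size(scales));
for t = 1:numel(scales)
    K = exp(-D2 / scales(t)^2);    % Gaussian kernel after dividing predictors by the scale
    K(1:size(K, 1) + 1:end) = 0;   % zero the diagonal (self-similarity is always 1)
    maxOffDiag(t) = max(K(:));     % largest similarity between two distinct points
end

disp([scales; maxOffDiag]);
```

On this synthetic data the smaller scales give off-diagonal values that underflow to 0. `fitcsvm` also accepts `'KernelScale','auto'`, which picks a scale with a subsampling heuristic; comparing the grid values with that choice may be a useful sanity check.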
The data set consists of 6,453 instances with 443 features; 542 instances belong to the positive class. The features have been scaled to the range [0,1], as is standard for SVMs. The classes are represented by
My code is as follows:

    % Load the data (dSet1 = features, dSet1Label = class labels)
    load('datafile.m');

    % Grid of BoxConstraint (C) and KernelScale (sigma) values to search
    boxVals = [1,2,5,10,20,50,100,200,500,1000];
    rbfVals = [.0001,.01,.1,1,2,3,5,10,20];

    [m,n] = size(dSet1);
    [c,v] = size(boxVals);
    [e,r] = size(rbfVals);

    auc_holder = [];        % mean AUC for each (C, sigma) pair
    accuracy_holder = [];   % mean accuracy for each (C, sigma) pair

    for i = 1:v
        curBox = boxVals(i)
        for j = 1:r
            curRBF = rbfVals(j)
            % Assign each instance to one of 10 folds
            valInd = crossvalind('Kfold', m, 10);
            temp_auc = [];
            temp_acc = [];
            cp = classperf(dSet1Label);
            for k = 1:10
                test = (valInd==k);
                train = ~test;
                % Train on 9 folds, evaluate on the held-out fold
                svmStruct = fitcsvm(dSet1(train,:), dSet1Label(train), ...
                    'KernelFunction', 'rbf', ...
                    'BoxConstraint', curBox, ...
                    'KernelScale', curRBF);
                [label,score] = predict(svmStruct, dSet1(test,:));
                accuracy = sum(dSet1Label(test) == label)/numel(dSet1Label(test));
                [X,Y,T,AUC] = perfcurve(dSet1Label(test), label, 1);
                temp_auc = [temp_auc AUC];
                temp_acc = [temp_acc accuracy];
            end
            avg_auc = mean(temp_auc);
            avg_acc = mean(temp_acc);
            auc_holder = [auc_holder avg_auc];
            accuracy_holder = [accuracy_holder avg_acc];
        end
    end
*Edit 1:* It appears that, no matter what I set the box constraint to, all data points end up as support vectors.
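A possible link between this observation and the constant scores: the two-class SVM decision function has the form f(x) = Σ_i α_i y_i K(x_i, x) + b, and the box constraint C only bounds each α_i to [0, C]. The sketch below uses hypothetical training and test points, labels, weights α and bias b (none taken from the real model) together with a small `KernelScale` of 0.01 from the grid. Because every test-to-training kernel value underflows to 0, f(x) equals b for every test point whatever the α values are, and therefore whatever C is.

```matlab
% Hypothetical data: 100 training and 20 test points, 443 features in [0,1].
Xtr = rand(100, 443);
Xte = rand(20, 443);
ytr = [ones(10, 1); -ones(90, 1)];  % hypothetical labels, about 10% positive
alpha = rand(100, 1) * 5;           % arbitrary dual weights in [0, C] with C = 5
b = -0.8323;                        % arbitrary bias, chosen to echo the reported score
s = 0.01;                           % a small KernelScale value from the grid

% Squared distances between every test point and every training point.
D2 = max(sum(Xte.^2, 2) + sum(Xtr.^2, 2).' - 2 * (Xte * Xtr.'), 0);
K = exp(-D2 / s^2);                 % all entries underflow to 0 at this scale

% SVM decision function for each test point.
f = K * (alpha .* ytr) + b;
disp(unique(f));
```

Changing `alpha` (or the C that bounds it) leaves `f` unchanged here, since every term of the sum is multiplied by a kernel value of 0.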