python,machine-learning,scikit-learn,svm

This is a problem of the SVC using a One-vs-One strategy, and therefore the decision function having shape (n_samples, n_classes * (n_classes - 1) / 2). A possible workaround would be to do CalibratedClassifierCV(OneVsRestClassifier(SVC())). If you want to use sigmoidal calibration, you can also do SVC(probability=True) and not use CalibratedClassifierCV....

c++,opencv,machine-learning,svm,opencv3.0

With OpenCV 3.0, it's definitely different, but not difficult: Ptr<ml::SVM> svm = ml::SVM::create(); // edit: the params struct got removed, // we use setter/getter now: svm->setType(ml::SVM::C_SVC); svm->setKernel(ml::SVM::POLY); svm->setGamma(3); Mat trainData; // one row per feature Mat labels; svm->train( trainData , ml::ROW_SAMPLE , labels ); // ... Mat query; // input,...

java,machine-learning,svm,encog

In Encog, SVM is just a classification or regression model and can be used mostly interchangeably with other model types. I modified the Hello World XOR example to use it, you can see the results below. This is a decent intro to them: http://webdoc.nyumc.org/nyumc/files/chibi/user-content/Final.pdf This is a more basic intro...

why it claims that i have 4 and 5 features respectively The extra space symbols at the end of lines are interpreted as extra features by http://hunch.net/~vw/validate.html. (Yes, the last line in your sample has two extra spaces.) Note that validate.html reports an empty name of the extra features:...

machine-learning,weka,svm,libsvm

Yes, the default kernel is RBF with gamma equal to 1/k. See other defaults in javadocs here or here. NB: Weka contains its own implementation - SMO, but it also provides a wrapper for libsvm, and "LibSVM runs faster than SMO" (note that it requires libsvm to be installed, see docs)....

KMeans().predict(X) ..docs here Predict the closest cluster each sample in X belongs to. In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book. Parameters: (New data to predict) X : {array-like, sparse...

machine-learning,svm,feature-extraction,feature-selection

You can transform the different time series to live in the same coordinate system by solving the orthogonal Procrustes problem. Here are the five arrays of Euler angles that you gave me (they are stored in arr[0] through arr[4] as 169x3 numpy arrays): Now we solve the orthogonal Procrustes problem...

Your basic setup seems correct, especially given that you are getting 85-95% accuracy. Now, it's just a matter of tuning your procedure. Unfortunately, there is no way to do this other than testing a variety of parameters, examining the results, and repeating. I am going to break this answer into two...

r,machine-learning,statistics,svm

From the documentation: For the x argument: a data matrix, a vector, or a sparse matrix (object of class Matrix provided by the Matrix package, or of class matrix.csr provided by the SparseM package, or of class simple_triplet_matrix provided by the slam package). For the y argument: a response vector with one...

The reason you're not getting output predictions is that you are calling svmpredict incorrectly. There are two ways to call it: [predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options') [predicted_label] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options') That is, with one output argument or with three, but not two. So to fix...

matlab,machine-learning,svm,cross-validation

You can use kfoldPredict for this purpose. It operates on the ClassificationPartitionedModel class. You can retrieve the classification loss using the related kfoldLoss function.

matlab,order,svm,training-data

No, it does not depend on the order of the data, but it does depend on the number of training samples. If you want to experiment, you may try different values of method and kernel_function.

debugging,machine-learning,neural-network,svm,hidden-markov-models

What you refer to as "debugging" is known as optimizing in the machine learning community. While there are certain ways to optimize a classifier depending on the classifier and the problem, there is no standard way for this. For example, in a text classification problem you might find out through...

There are some papers that show some ways to do it: Financial time series forecasting using support vector machine Using Support Vector Machines in Financial Time Series Forecasting Financial Forecasting Using Support Vector Machines I really recommend that you go through the existing literature, but just for fun I will...

you need float data (and integer labels) 1 row per feature, 1 label per row. float f1,f2; for (int i=(0+(68*count_FOR)); i<(num_landCKplus+(68*count_FOR)); i++) { fin_land >> f1; fin_land >> f2; trainData.push_back(f1); // pushing the 1st thing will determine the type of trainData trainData.push_back(f2); } trainData = trainData.reshape(1, numItems); SVM.train(trainData, trainLabels,...

reducing a small set is a bad idea. keep all samples. if the classes are separable everything is fine. if not you can use the 'weight' feature to boost classes with little representation.

machine-learning,kernel,svm,ranking

This is known as grid search. I don't know if you're familiar with python and scikit-learn, but either way, I think their description and examples are very good and language agnostic. Basically, you specify some values you're interested in for each parameter (or an interval from which to take random...
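The idea can be sketched in plain Python. This is only an illustration: toy_score is a hypothetical stand-in for "train a model with these parameters and return its validation score", and the candidate values for C and gamma are made up.

```python
import itertools

def grid_search(score, grid):
    """Try every combination of parameter values and keep the best-scoring one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

# Hypothetical stand-in for "train with these parameters, return validation score".
def toy_score(C, gamma):
    return -((C - 10) ** 2) - (gamma - 0.1) ** 2

# Illustrative candidate values; real grids are usually spaced logarithmically.
grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

In practice the score function would run cross-validation, so the cost grows with the product of the number of values per parameter.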

Say you have a vector of x-coordinates X, and y-coordinates Y, and an indicator vector k of 1's and -1's, you could do plot(X(k>0),Y(k>0),'b',X(k<0),Y(k<0),'g') which uses logical indexing to pick out the elements with k=1 and k=-1 separately, or use scatter and use the k vector to colour the points....

To train an SVM you would need a matrix X with your features and a vector y with your labels. It should look like this for 3 images and two features: >>> from sklearn import svm >>> X = [[0, 0], <- negative 0 [1, 3], <- positive 1 2,...

matlab,validation,svm,cross-validation

First they produce cross-validated datasets. Then they train 5 models (one for each fold) and repeatedly train-test. You can do this as follows: % I assume use of LIBSVM for svm training-testing in this code snippet % Create a random data data=1+2*randn(1000,10); labels=randi(12,[1000,1]); %do 5-fold cross-validation ind=crossvalind('Kfold',labels,5); for i=1:5 %...

It is impossible to calculate the margin given only the optimal decision plane. You should provide the support vectors or at least samples of the classes. Anyway, you can follow these steps: 1- Calculate Lagrange Multipliers (alphas) I don't know which environment you work in but you can use Quadratic...

amazon-web-services,machine-learning,apache-spark,svm,mllib

Summary / TL;DR: The hardcoded methods for SVMWithSGD are: private val gradient = new HingeGradient() private val updater = new SquaredL2Updater() Since these are hard-coded, you cannot configure them the way you are used to in R. Details: At the "bare metal" level the mllib SVMWithSGD supports the following...

python,machine-learning,scikit-learn,svm

They simply don't exist for kernels that are not linear: The kernel SVM is solved in the dual space, so in general you only have access to the dual coefficients. In the linear case this can be translated to primal feature space coefficients. In the general case these coefficients would...

matlab,machine-learning,classification,svm,libsvm

The output of an SVM is not a probability! The score's sign indicates whether it belongs to class A or class B. And if the score is 1 or -1 it is on the margin, although that is not particularly useful to know. If you really need probabilities, you can convert...

image-processing,machine-learning,svm,feature-detection,feature-extraction

In regard solely to the difference in scales: this seems relatively straightforward. Set up classifier(s) based on the characteristics of the images in terms of the shapes/ pixel densities/ arrangements of artifacts. Those aspects are scale-independent. You may also wish to introduce rotation and shear invariant capabilities into the svm.

opencv,machine-learning,svm,training-data

There can't be a broad definitive answer to your question because (as always) it depends on the specifics of the application scenario. So tl;dr version: Analyze your problem. 3 methods come to mind to gather evidence that your training set is useful: Using unsupervised techniques on your training data to...

I found a solution. I used the operator "Polynominal by Binominal Classification". This operator trains a model with 3 classes using SVM. Here is an example: <?xml version="1.0" encoding="UTF-8" standalone="no"?> <process version="5.3.015"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="5.3.015"...

python,properties,classification,svm,simplecv

Did you try to include nu in the parameter list?: classifier = SVMClassifier(feature_extractors,{'KernelType':'Linear','SVMType':'C','nu':None})? ...

scikit-learn,svm,feature-selection

I think LinearSVC() does return features with non-zero coefficients. Could you please upload the sample data file and code script (for example, via dropbox sharelink) that can reproduce the inconsistency you saw? from sklearn.datasets import make_classification from sklearn.datasets import load_svmlight_file from sklearn.svm import LinearSVC import numpy as np X, y...

Add another level (but not data) to geslacht. x <- factor(c("A", "A"), levels = c("A", "B")) x [1] A A Levels: A B or x <- factor(c("A", "A")) levels(x) <- c("A", "B") x [1] A A Levels: A B ...

algorithm,machine-learning,signal-processing,classification,svm

The key to evaluating your heuristic is to develop a model of the behaviour of the system. For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time? What is the model for the sensor output?...

Not every parameter has an exact equivalent when porting from LibSVM in matlab to OpenCV SVM. The term criteria is one of them. Keep in mind that the SVM of opencv might have some bugs depending on the version you use (not an issue with the latest version). You should...

python,scipy,nlp,scikit-learn,svm

LinearSVC.fit and its predict method can both handle a sparse matrix as the first argument, so just removing the toarray calls from your code should work. All estimators that take sparse inputs are documented as doing so. E.g., the docstring for LinearSVC states: Parameters ---------- X : {array-like, sparse matrix},...

Your conversion code doesn't seem right. It should be something like: Mat eyes_train_data; eyes_train_data.convertTo(eyes_train_data, CV_32FC1); What's the type of Eyes.features? It seems that it should be already a Mat1f. However, are you sure that features.push_back works as expected? It seems that push_back needs a const Mat& m. You can get...

machine-learning,classification,svm,libsvm

In the case of C-SVM, you should use a linear kernel and a very large C value (or nu = 0.999... for nu-SVM). If you still have slacks with this setting, probably your data is not linearly separable. Quick explanation: the C-SVM optimization function tries to find the hyperplane having...

computer-vision,neural-network,svm,image-recognition

Do you think such a task is achievable? Yes. What would be a suitable image classifier algorithm? You need to train a neural network or something like that on the patterns in your images ( http://upload.wikimedia.org/wikipedia/commons/thumb/0/09/Mallampati.svg/220px-Mallampati.svg.png ). Would you think of any special requirement for the input picture, such as...

python,machine-learning,scikit-learn,svm

You need to give the predict function the kernel between the test data and the training data. The easiest way to do that is to pass a callable to the kernel parameter: kernel=chi2_kernel. Using K_test = chi2_kernel(X_test_scaled) will not work. It needs to be K_test = chi2_kernel(X_test_scaled, X_train_scaled) ...
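The shape requirement can be sketched without any library. Here chi2_kernel_plain is a hypothetical plain-Python stand-in for a chi-squared style kernel (it is not the scikit-learn function), and the data are made up; the point is only that the test kernel has one row per test sample and one column per training sample.

```python
import math

def chi2_kernel_plain(A, B, gamma=1.0):
    """Plain-Python stand-in: one row per sample in A, one column per sample in B."""
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 / (a + b)
                                  for a, b in zip(x, y) if a + b > 0))
            for y in B
        ]
        for x in A
    ]

X_train = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]   # 3 training samples (illustrative)
X_test = [[1.0, 1.0], [2.0, 2.0]]                # 2 test samples (illustrative)

K_train = chi2_kernel_plain(X_train, X_train)    # 3 x 3, used for fitting
K_test = chi2_kernel_plain(X_test, X_train)      # 2 x 3, used for predicting
K_wrong = chi2_kernel_plain(X_test, X_test)      # 2 x 2, test data compared only to itself

print(len(K_test), len(K_test[0]))
```

With a precomputed kernel, each column of K_test lines up with one training sample, which is why a test-versus-test matrix cannot be used for prediction.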

Image classification can be quite general. In order to define good features, first you need to be clear what kind of output you want. For example, images can be categorized according to the scenes in them into nature view, city view, indoor view etc. Different kinds of classification may require...

You have to first train a support vector machine classifier using fitcsvm, with standardization of predictors set to true, as input to your CompactClassificationSVM. The syntax is mySVMModel = fitcsvm(X,Y,'Standardize',true) where X is your vector of predictors, and Y your vector of class labels. Standardization is set to false by...

gnuplot,classification,svm,libsvm

Replace your colours with numerical indices, e.g., like this: 5.1 3.5 1.4 0.2 0 4.9 3 1.4 0.2 0 7 3.2 4.7 1.4 1 6.4 3.2 4.5 1.5 1 7.1 3 5.9 2.1 2 6.3 2.9 5.6 1.8 2 A simple search-and-replace script should be able to do this for...
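A minimal sketch of such a script in Python, assuming the colour name is the last whitespace-separated field on each line; the colour names and rows below are made up for illustration.

```python
def colours_to_indices(lines):
    """Replace the last whitespace-separated field of each line with a numeric index."""
    mapping = {}
    converted = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label = fields[-1]
        # The first colour seen gets index 0, the next new colour 1, and so on.
        index = mapping.setdefault(label, len(mapping))
        converted.append(" ".join(fields[:-1] + [str(index)]))
    return converted, mapping

# Illustrative rows; the colour names are assumptions, not taken from the question.
rows = [
    "5.1 3.5 1.4 0.2 red",
    "7 3.2 4.7 1.4 green",
    "7.1 3 5.9 2.1 blue",
    "4.9 3 1.4 0.2 red",
]
converted, mapping = colours_to_indices(rows)
for row in converted:
    print(row)
```

Keeping the mapping around makes it possible to label the plot key with the original colour names afterwards.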

python,machine-learning,scikit-learn,svm

Because your y_train is (301, 1) and not (301,) numpy does broadcasting, so (y_train == model.predict(X_train)).shape == (301, 301) which is not what you intended. The correct version of your code would be np.mean(y_train.ravel() == model.predict(X_train)) which will give the same result as model.score(X_train, y_train) ...
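The broadcasting effect is easy to reproduce on a tiny example. The arrays below are made up and have 4 samples instead of 301, with y_pred standing in for the output of model.predict.

```python
import numpy as np

y_train = np.array([[0], [1], [1], [0]])   # shape (4, 1), like the (301, 1) case
y_pred = np.array([0, 1, 0, 0])            # shape (4,), stand-in for model.predict output

broadcast = (y_train == y_pred)            # shape (4, 4): every label vs every prediction
elementwise = (y_train.ravel() == y_pred)  # shape (4,): label i vs prediction i

print(broadcast.shape, elementwise.shape)
print(np.mean(elementwise))                # fraction of matching positions
```

Taking the mean of the (4, 4) array would average over all label/prediction pairs, which is not an accuracy.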

python,numpy,pandas,scikit-learn,svm

The error here is in the df you pass as your labels: y_trainN if you compare against the sample docs version and your code: In [40]: n_samples, n_features = 10, 5 np.random.seed(0) y = np.random.randn(n_samples) print(y) y_trainN.values [ 1.76405235 0.40015721 0.97873798 2.2408932 1.86755799 -0.97727788 0.95008842 -0.15135721 -0.10321885 0.4105985 ] Out[40]:...

opencv,machine-learning,svm,libsvm

1) Length of features does not matter per se, what matters is predictive quality of features 2) No, it does not depend on number of samples, but it depends on number of features (prediction is generally very fast) 3) Normalization is required if features are in very different ranges of...

python,machine-learning,svm,libsvm

Here's a step-by-step guide for how to train an SVM using your data and then evaluate using the same dataset. It's also available at http://nbviewer.ipython.org/gist/anonymous/2cf3b993aab10bf26d5f. At the url you can also see the output of the intermediate data and the resulting accuracy (it's an iPython notebook) Step 0: Install dependencies...

machine-learning,computer-vision,svm

It looks like in the paper they needed 1.7 GB of RAM to train a classifier. To do that they had to load about 14000 64x128 RGB image patches, which end up taking about 1.5 GB when they are stored using integers. Once the classifier is computed you are right, only one...

python,c++,opencv,image-processing,svm

You can use the OpenCV function calcHist to compute histograms. calcHist(&bgr_planes[0], 1, 0, Mat(), b_hist, 1, &histSize, &histRange, uniform, accumulate ); where, &bgr_planes[0]: The source array(s) 1: The number of source arrays 0: The channel (dim) to be measured. In this case it is just the intensity so we just...

r,svm,ranking,feature-selection

Ok, I got the explanation because of help provided in comments by @BondedDust. The sparseM doesn't store the 0 value cells and hence the features having 0 weights are not stored here. ra: Object of class numeric, a real array of nnz elements containing the **non-zero elements** of A. ja:...

As the error message says, you need a numeric vector or ordered factor in lr.pred. The problem here is that predict (for the svm) returns the predicted class, making the ROC exercise pretty much useless. What you need is to get an internal score, like the class probabilities: lr.pred <-...

opencv,dataset,computer-vision,svm,training-data

The optimal image size is one at which you can easily classify the object yourself. Yes, classifiers work better after normalization, and there are several options. The most popular way is to center the dataset (subtract the mean) and normalize the range of values to, say, [-1:1]. Another popular way of normalization is similar to the previous one but...
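The first normalization (centering, then scaling into [-1, 1]) could look like this in plain Python. It is a sketch under the assumption that the values of one feature are given as a flat list; the pixel intensities are made up.

```python
def center_and_scale(values):
    """Subtract the mean, then divide by the largest absolute value so results lie in [-1, 1]."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    largest = max(abs(c) for c in centered)
    if largest == 0:
        return centered  # all values identical; nothing to scale
    return [c / largest for c in centered]

pixels = [0, 64, 128, 255]  # illustrative intensities
normalized = center_and_scale(pixels)
print(normalized)
```

The mean and scale should be computed on the training set and then reused unchanged for the test data.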

python,machine-learning,scikit-learn,svm

Take a look at the related links referenced in the docs: Public datasets in svmlight / libsvm format: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ Faster API-compatible implementation: https://github.com/mblondel/svmlight-loader If you click on the first link you'll find example data sets such as this one: -1 3:1 11:1 14:1 19:1 39:1 42:1 55:1 64:1 67:1 73:1...

The cost parameter penalizes large residuals. So a larger cost will result in a more flexible model with fewer misclassifications. In effect the cost parameter allows you to adjust the bias/variance trade-off. The greater the cost parameter, the more variance in the model and the less bias. So the answer...

Mapping to feature space requires you to have a weight for each of the distinct features that determine the classes of your input. Getting the weights depends on clearly understanding the theoretical basis of your project, e.g. your financial worth is determined by money in the bank and investment....

matlab,machine-learning,classification,svm,libsvm

If you want to use liblinear for multi-class classification, you can use the one-vs-all technique. For more information, look at this. But if you have a large database, then using an SVM is not recommended, as the run-time complexity of SVM is O(N * N * m) N =...

python,machine-learning,numbers,scikit-learn,svm

From the doc: The dataset generation functions and the svmlight loader share a simplistic interface, returning a tuple (X, y) consisting of a n_samples x n_features numpy array X and an array of length n_samples containing the targets y. So you have to construct two numpy arrays: first one with...

opencv,svm,surf,object-detection

First of all, using the same parameters from an existing project doesn't prove that you are using correct parameters. In fact, in my opinion it is a completely nonsensical approach (no offense). That is because SVM parameters are directly affected by the dataset and the descriptor extraction method. In order to get correct...

matlab,classification,svm,libsvm,vlfeat

There isn't a way to represent the data to the vl_svmtrain method other than the D x N matrix that it's talking about. However, what you can do is unroll the cell array and transform each feature matrix so that it becomes a column vector. You would then construct your...

In short - because every machine learning method depends on the representation. In particular, it is true that for any reasonable (able to learn linearly separable data) classifier there exists data representation which results in this classifier having 50% accuracy (random classifier, assuming that classes are balanced) and there exists...

machine-learning,classification,svm,svmlight

You probably want to look at the class.weights argument (which is explained on the help page).

python,classification,nltk,svm,naivebayes

Following what superbly proposed about the features extraction, you could use the TfidfVectorizer in the scikit-learn library to extract the important words from the tweets. Using the default configuration, coupled with a simple LogisticRegression, it gives me 0.8 accuracy. Hope that helps. Here is an example on how to use it for...

r,machine-learning,nlp,svm,text-mining

This isn't really a programming question, but anyway: If your goal is prediction, as opposed to text classification, usual methods are backoff models (Katz Backoff) and interpolation/smoothing, e.g. Kneser-Ney smoothing. More complicated models like Random Forests are AFAIK not absolutely necessary and may pose problems if you need to make...

python,scikit-learn,svm,cross-validation,multilabel-classification

If you intend to solve a multilabel task with scikit-learn, it's advised to first transform your output to a label binary indicator using MultiLabelBinarizer. Stratified k-fold doesn't support the multilabel format, as it would require balancing the proportion of positives for each label. Instead, you can use a K-folds or shuffle...

python,scikit-learn,svm,libsvm,svc

So after a bit more digging and head scratching, I've figured it out. As I mentioned above z is a test datum that's been scaled. To scale it I had to extract .mean_ and .std_ attributes from the preprocessing.StandardScaler() object (after calling .fit() on my training data of course). I...

The grid_search.estimator that you are looking at is the unfitted pipeline. The classes_ attribute only exists after fitting, as the classifier needs to have seen y. What you want is the estimator that was trained using the best parameter settings, which is grid_search.best_estimator_. The following will work: clf = grid_search.best_estimator_.named_steps['svm']...

r,statistics,svm,predict,kernlab

From the documentation: argument scaled: A logical vector indicating the variables to be scaled. If scaled is of length 1, the value is recycled as many times as needed and all non-binary variables are scaled. Per default, data are scaled internally (both x and y variables) to zero mean and...

machine-learning,svm,libsvm,gate,svmlight

The problem is in the multiClassiﬁcation2Binary string. There is a single glyph ﬁ that contains two joined characters "fi" together. You probably copied the text from some pdf... Simply replace ﬁ by fi and the error should go away.
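If the text was copied from a PDF and may contain more such ligatures, Unicode compatibility normalization is one way to expand them all at once. A small sketch with Python's standard unicodedata module; note that NFKC also rewrites other compatibility characters, so it is an assumption here that nothing else in the file relies on them.

```python
import unicodedata

broken = "multiClassi\ufb01cation2Binary"   # contains the single-glyph ligature U+FB01
fixed = unicodedata.normalize("NFKC", broken)

print(len(broken), len(fixed))  # the fixed string is one character longer
print(fixed)
```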

You are using a kernel-svm with rbf kernel without adjusting gamma or C. That rarely works. Also, rbf kernel SVMs are not really a good match for text data. Try LinearSVC.

Unfortunately, you can not do it directly with the current interface. One solution would be to use the library libsvm instead. You may do it in opencv, but it will require a little bit of work. First, you must know that OpenCV uses a "1-against-1" strategy for multi-class classification. For...

Setting the probability argument to TRUE for both model fitting and prediction returns, for each prediction, the vector of probabilities of belonging to each class of the response variable. These are stored in a matrix, as an attribute of the prediction object. For example: library(e1071) model <- svm(Species ~ .,...

machine-learning,svm,libsvm,deep-learning

In machine learning applications it is hard to say if an algorithm will improve the results or not because the results really depend on the data. There is no best algorithm. You should follow the steps given below: Analyze your data Apply the appropriate algorithms by the help of your...

c++,opencv,machine-learning,svm

Please check your training and test feature vectors. I'm assuming your feature data is some form of cv::Mat containing features on each row. In which case you want your training matrix to be a concatenation of each feature matrix from each image. This line doesn't look right: trainingDataFloat[i][0] = trainFeatures.rows;...

svm,libsvm,n-gram,rapidminer,concept

You have to use the Support Vector Machine (LibSVM) Operator. In contrast to the classic SVM which only supports two class problems, the LibSVM implementation (http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf) supports multi-class classification as well as regression.

python,scikit-learn,svm,pickle

pickle.dumps doesn't take the file argument. pickle.dump does. The interpreter is assuming that both open('svm.p', 'wb') and protocol=pickle.HIGHEST_PROTOCOL are being passed in as the protocol version, based on the order of parameters in the method definition. Use pickle.dump, as that will write the svm.p file....
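The difference between the two calls, as a short sketch. The dictionary is a hypothetical stand-in for the fitted classifier, and the file is written to a temporary directory rather than the working directory.

```python
import os
import pickle
import tempfile

model = {"kernel": "rbf", "C": 1.0}  # hypothetical stand-in for a fitted classifier

path = os.path.join(tempfile.mkdtemp(), "svm.p")

# pickle.dump writes to an open binary file object.
with open(path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.dumps returns the pickled bytes instead and takes no file argument.
as_bytes = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)

# Reading the file back restores an equal object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```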

c++,opencv,image-processing,machine-learning,svm

well, the coarse idea would go like this: for each image make a (flat, 1d) feature vector from your stasm points, x,y,x,y,x,y,x,y ... (you might have to normalize them, like get the boundingrect and subtract the position, divide by size) (taking the diff to the base model / neutral might...

You have called the argument costs and not cost. Here's an example using the sample data in ?svm so you can try this: model <- svm(Species ~ ., data = iris, cost=.6) model$cost # [1] 0.6 model <- svm(Species ~ ., data = iris, costs=.6) model$cost # [1] 1 R...

matlab,svm,activity-recognition

Your question is a bit unclear about how you want to go about classifying human motion from your video. You have two options, Look at each frame in isolation. This would classify each frame separately. Basically, it would be a pose classifier Build a new feature that treats the data...

c#,machine-learning,svm,accord.net

I will give an example on how to perform sequence classification using the DynamicTimeWarping kernel combined with a Gaussian kernel, which should hopefully give better results. The first task in a sequence classification problem is to organize the sequences properly to feed the learning algorithms. Each sequence can be composed...

Best to look at the available literature first: http://www.pyoudeyer.com/emotionsIJHCS.pdf http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.467.7166&rep=rep1&type=pdf http://www.researchgate.net/profile/Theodoros_Iliou/publication/267698141_Classification_on_Speech_Emotion_Recognition_-_A_Comparative_Study/links/5519be060cf244e9a4584c07.pdf http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6531196...

c++,opencv,machine-learning,svm

During training, an SVM tries to find a separating hyperplane such that the training examples lie on different sides. There could be many such hyperplanes (or none), so to select the "best" we look for the one for which the total distance from all classes is maximized. Indeed, the further away from the...

There is a load function CvSVM *SVM = new CvSVM; SVM->load("SVM.xml"); ...

I think this is what you want: library(e1071) data(iris) df <- iris df <- subset(df , Species=='setosa') #choose only one of the classes x <- subset(df, select = -Species) #make x variables y <- df$Species #make y variable(dependent) model <- svm(x, y,type='one-classification') #train an one-classification model print(model) summary(model) #print summary...

Look at the return value for the function load in the help file: Value: A character vector of the names of objects created, invisibly. So "model" is indeed the expected value of M. Your svm has been restored under its original name, which is model. If you find it a...

Unless you have some implementation bug (test your code with synthetic, well separated data), the problem might lie in the class imbalance. This can be solved by adjusting the misclassification cost (See this discussion in CV). I'd use the cost parameter of fitcsvm to increase the misclassification cost of the...

machine-learning,nlp,scikit-learn,svm,confusion-matrix

The classification report should be straightforward - a report of P/R/F-Measure for each class in your test data. In multiclass problems, it is not a good idea to read Precision/Recall and F-Measure over the whole data, because any imbalance would make you feel you've reached better results. That's where such reports help....

You measure accuracy by holding back some data (not using it for training) and measuring the performance of the model on that data.
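A minimal hold-out sketch in plain Python. The threshold "classifier" and the data are hypothetical stand-ins for a real SVM and a real dataset; only the split-then-evaluate pattern is the point.

```python
import random

def holdout_accuracy(samples, labels, train, predict, test_fraction=0.25, seed=0):
    """Train on one part of the data and report accuracy on the held-back part."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
    return correct / n_test

# Hypothetical stand-in classifier: threshold halfway between the two class means.
def train_threshold(xs, ys):
    neg = [x for x, y in zip(xs, ys) if y == -1]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else -1

# Illustrative, well separated one-dimensional data.
xs = [0.1, 0.2, 0.3, 0.4, 1.6, 1.7, 1.8, 1.9]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
accuracy = holdout_accuracy(xs, ys, train_threshold, predict_threshold)
print(accuracy)
```

The held-back samples must not influence training in any way, including parameter tuning and feature scaling.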

The multiclass solution has one SVM per class, not two. So you have SVM(A) to SVM(Y). SVM(A) tries to separate A from B-Y, SVM(Y) tries to separate Y from A-X. A hierarchy can be a better solution. If the V and Y are similar, you can have an SVM(VY) first,...

image,matlab,image-processing,svm,imread

The code should output the warning: "Warning: Escape sequence '\U' is not valid. See 'help sprintf' for valid escape sequences. " You need to escape the \ when using sprintf. With your code, path is C:. For examples of how proper escaping is done, please check the documentation for sprintf. Instead...

The framework automatically builds a kernel function cache to help speed up computations during SVM learning. However, there are cases that this cache may take too much memory and lead to OutOfMemoryExceptions. To make a balance between memory consumption and CPU speed, set the CacheSize property to a lower value....

I think the response variable sprd_cross_dir should be a factor: require(e1071) #dummy data usd28 <- structure(list(ask_sprd0 = c(5L, 65L, 65L, 10L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 10L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 25L, 10L, 25L, 25L, 10L, 10L, 25L), bid_sprd0 = c(5L, 5L, 5L, 15L, 5L,...

machine-learning,scikit-learn,svm

There are probably some implementation bugs as well, but in general I think hashing will cause your data to be way too spread out, so basic hashing will not let you accurately predict outliers in this case since everything will be so far away from everything else that, well, everything...

matlab,svm,fold,cross-validation

You can check wiki, In k-fold cross-validation, the original sample is randomly partitioned into k equal size subsamples. and The k results from the folds can then be averaged (or otherwise combined) to produce a single estimation. So no worries about different error rates of randomly selecting folds. Of course...
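The two quoted steps (random partition into k folds, then averaging the k results) can be sketched in plain Python. toy_evaluate is a hypothetical placeholder; a real one would train on train_idx and return the error rate measured on test_idx.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, evaluate_fold):
    """Use each fold once as the test set and average the k results."""
    folds = kfold_indices(n_samples, k)
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        results.append(evaluate_fold(train_idx, test_idx))
    return sum(results) / k

# Hypothetical evaluation that only reports the share of data used for testing.
def toy_evaluate(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))

folds = kfold_indices(10, 5)
estimate = cross_validate(10, 5, toy_evaluate)
print(folds)
print(estimate)
```

Different seeds give different partitions and slightly different per-fold results, which is the variation the averaging is meant to smooth out.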

python,numpy,machine-learning,scikit-learn,svm

Here is the problem: df.label.unique() Out[50]: array([ 5., 4., 3., 1., 2., nan]) The sample code: import pandas as pd from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.svm import SVC # replace your own data file_path df = pd.read_csv('data1.csv', header=0) df[df.label.isnull()] Out[52]: id content label 900 Daewoo_DWD_M1051__Opinio... 5 NaN 1463 Indesit_IWC_5105_B_it__O... 1...

Probably found the answer. Question 1. What this tool does is: given sets of label/feature_parameters, chooses the most "efficient" and "minimum" feature_parameters by performing grid search. Am I correct? The answer is No. grid.py performs grid search and estimates the best cost and gamma value. So it helps making SVM...