For binary classification, you should use a suitable loss function (--loss_function=logistic or --loss_function=hinge). The --binary switch only makes the reported loss the 0/1 loss; you cannot optimize for the 0/1 loss directly (and note that the default is --loss_function=squared). I recommend trying --nn as one of the...
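A minimal sketch of such a run (file names are illustrative; labels must be -1/1 for logistic loss):

```shell
# Train a binary classifier with logistic loss.
vw -d train.vw --loss_function=logistic --binary -f model.vw

# Evaluate on held-out data; with --binary the reported average loss
# is the 0/1 loss, i.e. the error rate.
vw -d test.vw -t -i model.vw --binary -p predictions.txt
```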

Feature values in Vowpal Wabbit can only be real numbers. If you have a categorical feature with n possible values, you simply represent it as n binary features (so e.g. color=red is the name of a binary feature, and its value is 1 by default). If you have a...
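For example, a categorical feature color with values {red, green, blue} becomes three binary features in the VW input format (the namespace and feature names here are illustrative; a feature listed without an explicit value has value 1):

```
1 |colors color=red
-1 |colors color=blue
1 |colors color=green
```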

Your speculation is correct. Without --save_resume during the first training on train1.vw, the model /tmp/weights does not contain the learning rate and other training state (e.g. --adaptive is on by default in VW, so there is a separate learning rate for each feature). This may influence the quality of the...
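A sketch of incremental training with the state preserved (file names follow the question; the second output name is illustrative):

```shell
# First pass: store the full training state, not just the weights.
vw -d train1.vw --save_resume -f /tmp/weights

# Second pass: continue training from the saved state.
vw -d train2.vw -i /tmp/weights --save_resume -f /tmp/weights2
```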

For boosting, use --boosting N (added recently, so use VW from GitHub). For bagging, use --bootstrap M. See Gradient boosting on Vowpal Wabbit. I don't see how recall and precision can be defined for classification into 3 classes. Let's assume for now you have a standard binary classification (with two...
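For example (file and model names are illustrative, with N=10 and M=10):

```shell
# Boosting with 10 weak learners (requires a recent VW build from GitHub):
vw -d train.vw --boosting 10 -f boosted.model

# Bagging with 10 bootstrap rounds:
vw -d train.vw --bootstrap 10 -f bagged.model
```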

As you noticed, the first presentation contains an error/typo in the AdaGrad formula. The formula should be w_{i,t+1} := w_{i,t} - \eta * g_{i,t} / \sqrt{\sum_{t'=1}^{t} g_{i,t'}^2}. In Vowpal Wabbit, --adaptive (corresponding to the AdaGrad idea) is on by default. But --normalized and --invariant are...

vw -d train.dat --invert_hash traindModel

No contextual bandit is specified here, so vw does a simple linear regression.

"How to interpret those results?" See https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial#vws-diagnostic-information

"There are also 8 constant rows.. to what do they correspond?" Contextual bandit in VW is implemented using a reduction to (in this case)...

If you want to predict probabilities, you should train with --loss_function=logistic and test with --link=logistic. The hinge loss (used in SVM) results in a max-margin classifier, which is not suitable for predicting probabilities. Note that just using --loss_function=hinge does not make an SVM out of VW (there is no kernel support). If you want...
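Put together (file names are illustrative; labels must be -1/1 for logistic loss):

```shell
# Train with logistic loss:
vw -d train.vw --loss_function=logistic -f model.vw

# At test time, map raw scores to probabilities in [0, 1]:
vw -d test.vw -t -i model.vw --link=logistic -p probabilities.txt
```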

The idea of gradient boosting is that an ensemble model is built from weak black-box models. You can surely use VW as the black box, but note that VW does not offer decision trees, which are the most popular choice for the weak black-box models in boosting. Boosting in general...

"why it claims that i have 4 and 5 features respectively"

The extra space characters at the ends of the lines are interpreted as extra features by http://hunch.net/~vw/validate.html. (Yes, the last line in your sample has two extra spaces.) Note that validate.html reports an empty name for the extra features:...


Given that you have a 'predicted vs actual' value pair for each example, you can use Rich Caruana's KDD perf utility to compute these (and many other) metrics. In the multi-class case, you should simply consider every correctly classified case a success and every class mismatch a failure to...

The reason your PATH=... setting doesn't work is that you used a file name (rather than a directory path) there. In your particular case, the correct setting is: export PATH=/Users/williamliu/GitHub/vowpal_wabbit/utl:$PATH (Please make sure that this is indeed the directory where the utilities reside.)...

The following one-liner replaces the value in the first column with a small integer id (assigned in order of first appearance), writing the converted data to stdout and the value-to-id mapping to stderr:

cat input.data | perl -nale '$i=$m{$F[0]}; $i or $i=$m{$F[0]}=++$n; $F[0]=$i; print "@F"; END{warn "$_ $m{$_}\n" for sort {$m{$a}<=>$m{$b}} keys %m}' > output.data 2> mapping.txt ...

On systems supporting /dev/stdout (and /dev/stderr), you may try this: vw -t -i model.vw --daemon --port 26542 --link=logistic -r /dev/stdout The daemon will write the raw predictions into its standard output, which in this case ends up in the same place as the replies on localhost port 26542. The relative order of lines is guaranteed...

Yes, you are correct. This representation would definitely work with Vowpal Wabbit, but under some conditions it may not be optimal (it depends). To represent non-ordered, categorical variables (with discrete values), the standard Vowpal Wabbit trick is to use a logical/boolean feature for each possible (name, value) combination (e.g. person_is_good, color_blue, color_red)....
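For contrast, an illustrative snippet of the input format: ordered/numeric variables can keep their values, while non-ordered categorical ones are expanded into boolean features (all names here are made up):

```
1 |person age:33 height:1.82 person_is_good
-1 |car year:2007 color_blue
```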

(2^k)-1 in binary is "k ones", e.g. (2^6)-1 = 111111 (in binary). When you apply a logical AND to the original hash number and (2^k)-1, you effectively keep only the k lower-order bits of the hash. It is the same operation as mod 2^k.
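A quick sanity check in plain shell arithmetic (the hash value is chosen arbitrarily):

```shell
h=123456789   # some hash value
k=6           # keep the 6 lower-order bits

mask=$(( (1 << k) - 1 ))   # (2^k)-1 = 63 = 0b111111
echo $(( h & mask ))       # low-order bits via logical AND
echo $(( h % (1 << k) ))   # same result via mod 2^k
```

Both echo lines print 21, confirming the two operations agree.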