machine-learning,amazon,prediction,ibm-watson,predictionio

No, prediction does not only run on numerical fields. It could be anything including text. My guess is that the MovieLens data uses IDs instead of actual user and movie names because this saves storage space (this dataset has been around for a long time and back then storage was definitely...

You are doing Machine Learning and in Machine Learning you never use the training data to evaluate your model. To answer your question, whether you are overfitting, or whether this is normal: If you don't split your dataset into basically training and test, you will be overfitting. First step: Split...
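The split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 25% test fraction and the toy data are assumptions made for the example, not part of the original answer.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for labelled examples (hypothetical).
data = list(range(20))
train, test = train_test_split(data, test_fraction=0.25)
```

The model is then fitted on `train` only, and its error is reported on `test`, which it has never seen.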

From the manual page of predict.loess: When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors. If you change the surface parameter to "direct" you can...

Since I got the answer from the segmented package maintainer, I decided to share it here. First, update the package to version 0.3-1.0 by running install.packages("segmented",type="source") After updating, running the same commands leads to: > Y<-c(13,21,12,11,16,9,7,5,8,8) > X<-c(74,81,80,79,89,96,69,88,53,72) > age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870, + 51.44643,38.20279,59.76469) > dat=data.frame(Y=Y,off.set.term=log(X),age=age) >...

sas,probability,prediction,logistic-regression

There are two ways to get predicted values: 1. Using the Score method in proc logistic 2. Adding the data to the original data set, minus the response variable, and getting the prediction in the output dataset. Both are illustrated in the code below: *Create a dataset with the values you want predictions...

java,plugins,prediction,collaborative-filtering,lenskit

First, the traditional way to do this is via cross-validation, where you do a robust randomized splitting of the data into training data and test data. The LensKit Evaluator supports doing this. The Quick Start describes how to get started; also there is a quick start that includes current best...
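To make the idea of randomized splitting concrete, here is a generic k-fold sketch in plain Python. It is not the LensKit Evaluator API; the function name `k_fold_indices` and the choice of 5 folds are assumptions made for illustration.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every item is tested exactly once."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [
            j
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for j in fold
        ]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, k=5))
```

Each pair trains on four folds and evaluates on the fifth; averaging the metric over the five pairs gives the cross-validated estimate.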

Here's a quick hack of a class of linear functions. I'm fairly sure something better must exist somewhere... But anyway: linear <- function(betas){ betas = matrix(betas, ncol=1) ret = list( pred = function(z){ (cbind(1,z) %*% betas)[,1] } ) class(ret)="linear" ret } predict.linear <- function(object, newdata, ...){ object$pred(newdata) } Then you...

clear all program define press, rclass syntax varlist(fv) [if] [in] /// [fweight aweight pweight iweight] , /// [nodots] gettoken y x : varlist marksample touse preserve quietly keep if `touse' if "`weight'" != "" { local wgt "[`weight'`exp']" } tempvar pred temp prs quietly gen double `pred' = . if...

neural-network,time-series,prediction

Verba Docent, Exempla Trahunt Solution of Problem #1 While no one may constrain you from constructing a super-NN to train, cross-validate & forward-test/evaluate all of A, B, C & D at once / in parallel. While NN-s are forgiving in terms of (in)dependence of A,B,C,D ( as opposed to some...

Trivial Training-validation-test Create two datasets from your labelled instances. One will be training set and the other will be validation set. The training set will contain about 60% of the labelled data and the validation will contain 40% of the labelled data. There is no hard and fast rule for...

r,prediction,logistic-regression

Do you have NA values in your variables? If so, you'll get NA for the predicted value.

image-processing,classification,bayesian,prediction,matlab-cvst

There are many possible approaches to a problem like this. One common method is the bag-of-features model. Take a look at this example using the Computer Vision System Toolbox in MATLAB.

This is a problem of using different names between your data and your newdata and not a problem between using vectors or dataframes. When you fit a model with the lm function and then use predict to make predictions, predict tries to find the same names on your newdata. In...

java,mongodb,prediction,recommendation-engine,predictionio

The admin GUI was removed during the re-architecture of 0.8.x. However, you can accomplish all the admin tasks through the command line: http://docs.prediction.io/resources/command-line/ -Isabelle ...

r,statistics,prediction,lm,predict

There are ways to transform your response variable, G in this case, but there needs to be a good reason to do this. For example, if you want the output to be probabilities between 0 and 1 and your response variable is binary (0,1) then you need a logistic regression....

r,maps,prediction,cross-validation,maxent

Sorry to be the bearer of bad news, but based on the source code, it looks like Dismo's predict function does not have the ability to generate a summary map. Nitty-gritty details for those who care: When you call maxent with replicates set to something greater than 1, the maxent...

I can answer only the first part of your question. how can I attach the prediction results to the data frame To do this you can use the cbind function: Considering the results of your predictions: predict(rf) Turn them into a data frame predResults <- data.frame(predict(rf)) And update your original...

python,machine-learning,scikit-learn,probability,prediction

Per the SVC documentation, it looks like you need to change how you construct the SVC: model = SVC(probability=True) and then use the predict_proba method: class_probabilities = model.predict_proba(sub_main) ...

python,numpy,prediction,kalman-filter

The 2D generalization of the 1-sigma interval is the confidence ellipse which is characterized by the equation (x-mx).T P^{-1}.(x-mx)==1, with x being the parameter 2D-Vector, mx the 2D mean or ellipse center and P^{-1} the inverse covariance matrix. See this answer on how to draw one. Like the sigma-intervals the...
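As a rough sketch of the ellipse equation above, the points on the 1-sigma ellipse can be generated from the eigen-decomposition of a 2x2 covariance matrix. The code below uses only the standard library; the function names and the example mean and covariance are hypothetical values chosen for illustration.

```python
import math


def ellipse_points(mean, cov, n=100):
    """Return n points x satisfying (x-mx).T P^{-1} (x-mx) == 1."""
    (a, b), (_, c) = cov  # cov is assumed symmetric positive definite
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    l1, l2 = half_trace + radius, half_trace - radius  # eigenvalues of cov
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # direction of the major axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        u = math.sqrt(l1) * math.cos(t)  # coordinate along the major axis
        v = math.sqrt(l2) * math.sin(t)  # coordinate along the minor axis
        points.append((mean[0] + cos_t * u - sin_t * v,
                       mean[1] + sin_t * u + cos_t * v))
    return points


def mahalanobis_sq(point, mean, cov):
    """Evaluate (x-mx).T P^{-1} (x-mx) for a 2x2 covariance P."""
    (a, b), (_, c) = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    det = a * c - b * b
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


mx = (1.0, -2.0)                # hypothetical state estimate
P = [[2.0, 0.6], [0.6, 1.0]]    # hypothetical covariance
pts = ellipse_points(mx, P)
```

Passing `pts` to any plotting routine draws the ellipse; `mahalanobis_sq` is only there to check that each point satisfies the equation.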

sql-server,prolog,prediction,swi-prolog,expert-system

Yes, you can train a classifier using a machine learning algorithm. Algorithms which work well in Prolog are ones that make rule models, for example a decision tree or a rule learner such as RIPPER. http://www.amazon.co.uk/Programming-Artificial-Intelligence-International-Computer/dp/0321417461 chapter 18 is a good start. There is a LOT of literature on the...

performance,regression,prediction,forecasting

Yes - I would use linear regression as a starting point. For an example, see How can I predict memory usage and time based on historical values. I found Data Analysis Using Regression and Multilevel/Hierarchical Models to be a highly readable introduction to the subject (you probably won't need multilevel...
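For a sense of what that starting point looks like, here is a minimal ordinary-least-squares fit of a straight line in plain Python. The function name `fit_line` and the sample numbers are made up for illustration and are not measurements from the question.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if every x is identical
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: input size versus run time in seconds.
sizes = [0.0, 1.0, 2.0, 3.0]
seconds = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(sizes, seconds)

# Forecast for an input size that has not been seen yet.
predicted = intercept + slope * 5.0
```

If the residuals from a fit like this show clear structure, that is the cue to try transformed predictors or a richer model.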

predict(LinearModel.1, data.frame(Xt = seq(0, 100, 10)), interval = "confidence") ...

statistics,genetic-algorithm,prediction,generalization

The problem with over-fitting is that, within a single data-set it's pretty challenging to tell over-fitting apart from actually getting better in the general case. In many ways, this is more of an art than a science, but here are some general guidelines: A GA will learn to do exactly...

r,statistics,probability,prediction,calibration

The warning is telling you that predict.gam doesn't recognize the value you passed to the type parameter. Since it didn't understand, it decided to use the default value of type, which is "terms". Note that predict.gam with type="terms" returns information about the model terms, not probabilities. Hence the output values...

You should add the errors in quadrature, see for instance this link. For instance: total_error = sqrt(sig_1^2 + sig_2^2 + sig_3^2 ...) ...
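The same formula as a small Python helper, assuming the individual errors are independent (which is the case where adding in quadrature applies). The function name is an assumption for the example.

```python
import math


def combine_in_quadrature(*sigmas):
    """Combine independent errors: sqrt(sig_1^2 + sig_2^2 + ...)."""
    return math.sqrt(sum(s * s for s in sigmas))


total_error = combine_in_quadrature(3.0, 4.0)
```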

r,ggplot2,prediction,confidence-interval,holtwinters

Typically you call confidence intervals for predictions "prediction intervals". The predict.HoltWinters function will give those to you if you ask for them with prediction.interval=T. So you can do pred <- predict(hw, n.ahead = 10, prediction.interval = TRUE) Now this will change the shape of the values returned. Rather than a...

Your original data does not have any cases where ApacheData$daily == 2. The lm object has no coefficient associated with it, so it throws an error.

machine-learning,simulator,prediction

I would suggest the R language and the forecast package. Please check this SO question as well. It also has some nice graphing features implemented. Here's more info on time series forecasting....

r,machine-learning,glm,prediction,random-forest

You need to specify type='response' for this to happen: Check this example: y <- rep(c(0,1),c(100,100)) x <- runif(200) df <- data.frame(y,x) fitgbm <- gbm(y ~ x, data=df, distribution = "bernoulli", n.trees = 100) predgbm <- predict(fitgbm, df, n.trees=100, type='response') Too simplistic but look at the summary of predgbm: > summary(predgbm)...
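To illustrate what the response scale means for a Bernoulli model: assuming the untransformed predictions are log-odds, converting them to probabilities is an application of the logistic function. The sketch below is plain Python rather than gbm itself, and the function name is hypothetical.

```python
import math


def log_odds_to_probability(log_odds):
    """Map a log-odds value to a probability in [0, 1] (logistic function)."""
    # Branch on the sign so math.exp never overflows for large magnitudes.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


probabilities = [log_odds_to_probability(z) for z in (-2.0, 0.0, 2.0)]
```

Values that fall outside the 0 to 1 range in a prediction summary are a hint that they are still on the log-odds scale.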

python,time-series,scikit-learn,regression,prediction

Here is my guess about what is happening in your two types of results: .days does not convert your index into a form that repeats itself between your train and test samples. So it becomes a unique value for every date in your dataset. As a consequence your models either...

php,prediction,recommendation-engine

Here's the outline of the simple algorithm I mentioned in the comments. Let's say a user's sliders are: cost=2.3, usability=2.1, functionality=4 You can construct a SQL query that will try to minimise the "total distance" of these values from values in your table. Pseudo-SQL-code: SELECT (cost - 2.3)^2 + (usability...
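The same ranking idea can be sketched outside the database. This Python version sorts candidate rows by their summed squared distance from the slider values; the product rows and the function name are hypothetical, and only the slider values come from the example above.

```python
def rank_by_distance(items, target):
    """Sort items by summed squared distance from the target slider values."""
    def squared_distance(item):
        return sum((item[key] - value) ** 2 for key, value in target.items())
    return sorted(items, key=squared_distance)


# Slider values from the example above.
target = {"cost": 2.3, "usability": 2.1, "functionality": 4}

# Hypothetical table rows.
products = [
    {"name": "B", "cost": 5.0, "usability": 1.0, "functionality": 1.0},
    {"name": "C", "cost": 2.5, "usability": 3.0, "functionality": 3.5},
    {"name": "A", "cost": 2.0, "usability": 2.0, "functionality": 4.0},
]

ranked = rank_by_distance(products, target)
best_match = ranked[0]["name"]
```

Squaring the differences penalises one large mismatch more heavily than several small ones, which is usually what you want for a closest-match recommendation.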

matlab,machine-learning,sequence,prediction,hidden-markov-models

From what I understand, I'm assuming you're training 200 different classes (HMMs) and each class has 500 training examples (observation sequences). O is the dimensionality of vectors, seems to be correct. There is no need to have a fixed T, it depends on the observation sequences you have. M is...