python,logistic-regression,statsmodels

The fitted result has a predict() method; that is what you use to predict new values, for example: result = sm.Logit(outcomes, values).fit() result.predict([82,45,2]) ...
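A Logit model's predicted probability is the logistic function of the linear predictor. The plain-Python sketch below computes it by hand. The coefficient values are hypothetical, and it assumes the first coefficient is an intercept. statsmodels' Logit does not add an intercept automatically; you would normally add a constant column, e.g. with sm.add_constant.

```python
import math

def predict_proba(params, row):
    """Predicted P(y=1) for one row: logistic function of the linear predictor."""
    eta = sum(b * x for b, x in zip(params, row))  # linear predictor x'beta
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted coefficients: intercept followed by three slopes.
params = [-4.0, 0.03, 0.02, 0.1]
# Leading 1.0 plays the role of the constant column for the intercept.
new_row = [1.0, 82, 45, 2]
p = predict_proba(params, new_row)
print(round(p, 4))
```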

python,scikit-learn,logistic-regression,feature-selection

You should use the get_support method (note that RandomizedLogisticRegression was deprecated in scikit-learn 0.19 and removed in 0.21, so this needs an older version): from sklearn.datasets import load_iris from sklearn.linear_model import RandomizedLogisticRegression iris = load_iris() X, y = iris.data, iris.target clf = RandomizedLogisticRegression() clf.fit(X,y) print(clf.get_support()) #prints [False True True True] Alternatively, you can get the indices of the supported features: print(clf.get_support(indices=True)) #prints [1 2 3] ...

r,parallel-processing,logistic-regression

I used speedglm and the results are very good: glm took 14.5 seconds and speedglm took 1.5 seconds, roughly a 90% reduction in run time. The code is very simple: m <- speedglm(y ~ s1 + s2, da). Just don't forget to install and call the...

As @MrFlick commented, V1 is probably a factor, so first you have to convert it to numeric. This removes the "%" sign and divides by 100, so you get proportions as a numeric vector: vvv$V1<-as.numeric(sub("%","",vvv$V1))/100 Doing this, you can use your own code and you will have a...

stanford-nlp,logistic-regression

Yes; you can simply define a new feature (e.g., "bias" or "intercept") that takes the value 1 on every example, and set its weight to the intercept value from scikit-learn.
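This works because an intercept is just the weight of a feature whose value is always 1. A plain-Python sketch with made-up weights; the feature name "bias" is hypothetical:

```python
def score(weights, features):
    """Linear score: sum of weight * value over the features that are present."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Made-up weights and intercept, as they might come out of scikit-learn.
coef = {"f1": 0.8, "f2": -1.5}
intercept = 0.25

x = {"f1": 2.0, "f2": 1.0}
with_intercept = score(coef, x) + intercept

# Fold the intercept into an extra "bias" feature that is always 1.0.
coef_with_bias = dict(coef, bias=intercept)
x_with_bias = dict(x, bias=1.0)
folded = score(coef_with_bias, x_with_bias)

print(with_intercept, folded)
```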

python,scikit-learn,logistic-regression

Is this what you are looking for? from sklearn.datasets import make_classification from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression X, y = make_classification(n_samples=1000, n_features=100, weights=[0.1, 0.9], random_state=0) X.shape # build pipe: first standardize by subtracting mean and dividing by std # next do classification pipe = make_pipeline(StandardScaler(),...
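StandardScaler learns a mean and a (population, ddof=0) standard deviation from the data it is fit on. Inside a pipeline, that data is the training set, and the same statistics are reused when transforming new data. A plain-Python sketch of that behaviour for one column, with made-up numbers and hypothetical helper names:

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation (ddof=0)."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, sd):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mean) / sd for v in column]

train = [1.0, 2.0, 3.0, 4.0]
test = [2.5, 5.0]

mean, sd = fit_scaler(train)             # statistics learned from training data only
train_scaled = transform(train, mean, sd)
test_scaled = transform(test, mean, sd)  # test data reuses the training statistics
print(train_scaled, test_scaled)
```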

r,missing-data,logistic-regression

If you have missing data (NAs), R will strip these when the modelling functions go formula -> model.frame -> model.matrix() etc., because the default in all these functions is na.action = na.omit. In other words, rows with NAs are deleted before the actual computations are performed. This deletion...
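The deletion is listwise: a row is dropped if any variable used by the model is missing. A small Python sketch of that rule, using None to stand in for NA. R only looks at the variables the formula uses; the sketch assumes every column is in the model, and the data are made up:

```python
def na_omit(rows):
    """Drop every row that contains a missing value (None), like R's na.omit."""
    return [row for row in rows if all(v is not None for v in row)]

data = [
    (1, 0.5, 3.2),
    (0, None, 1.1),    # dropped: missing predictor
    (1, 0.7, 2.8),
    (None, 0.1, 0.9),  # dropped: missing response
]
complete = na_omit(data)
print(len(data), "->", len(complete))
```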

r,plot,ggplot2,logistic-regression

If you really want to loop, you could use lapply. p <- lapply(names(df)[-1], function(nm){ ggplot(df, aes_string(x="xvar", y=nm)) + geom_point() + stat_smooth(method="glm", method.args=list(family="binomial"), se=TRUE) }) print(p) However, I suspect that reshaping your data and displaying all the graphs together may be better. # reshaping data require(reshape2) df.melt <- melt(df, id.var='xvar') #...

python-3.x,scikit-learn,logistic-regression

The problem you are facing is related to the fact that scikit-learn uses regularized logistic regression (an L2 penalty by default). The regularization term controls the trade-off between fitting the training data and generalizing to unseen data. The parameter C controls the regularization; it is the inverse of the regularization strength, so smaller values of C mean stronger regularization. In your case:...
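In scikit-learn's documentation the L2-penalized binary objective is written as 0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i (x_i . w + c))), with labels y_i in {-1, +1}. The sketch below only evaluates that objective in plain Python for a fixed, made-up w, leaving the intercept unpenalized. It shows how C weighs the data term against the penalty; it is not how the solver minimizes it.

```python
import math

def l2_logistic_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of log-losses, with labels y in {-1, +1}.

    Smaller C gives the penalty relatively more weight (stronger regularization);
    larger C gives fitting the training data more weight.
    """
    penalty = 0.5 * sum(wj * wj for wj in w)
    data_loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        data_loss += math.log1p(math.exp(-margin))
    return penalty + C * data_loss

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1, -1, 1]
w = [2.0, -1.0]
for C in (0.01, 1.0, 100.0):
    print(C, l2_logistic_objective(w, 0.0, X, y, C))
```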

scikit-learn,cluster-analysis,logistic-regression

I would recommend clustering the log odds ratios for each of the explanatory variables. This way, for models that don't include certain regressors, you can fill in the empty values with 0.0 (this can be done quite easily with pandas). Assume you have a list of all the models...
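The alignment step can be sketched in plain Python: collect the union of regressor names and fill the missing ones with 0.0. (With pandas, pd.DataFrame(list_of_dicts).fillna(0.0) does the same.) The model names and coefficients below are hypothetical. The resulting rows can then be passed to a clustering routine such as k-means.

```python
def coefficient_matrix(models):
    """Align log-odds coefficients from several models into rows over a shared
    set of regressors, filling regressors a model lacks with 0.0."""
    names = sorted({name for coefs in models.values() for name in coefs})
    rows = {m: [coefs.get(name, 0.0) for name in names] for m, coefs in models.items()}
    return names, rows

# Hypothetical fitted models: regressor name -> log odds ratio.
models = {
    "model_a": {"age": 0.03, "income": -0.2},
    "model_b": {"age": 0.05, "smoker": 1.1},
    "model_c": {"income": -0.1, "smoker": 0.9},
}
names, rows = coefficient_matrix(models)
print(names)
for m, row in rows.items():
    print(m, row)
```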

I suppose that depends on exactly what you want to do with the model once you have it back in R. At one point I helped someone create a pseudo-glm object that knew the coefficients for the variables and could be used with predict(). Many other functions required the fill...

r,logistic-regression,graph-coloring

Does this do what you want? plot(predict(reg),residuals(reg),col=ifelse(residuals(reg)<0,"blue","red")) I just test whether each residual is negative or not. The main idea is to create a vector of colours with the same length as your data....

You could simulate some data where you know the true effects ... ?simulate.merMod makes this relatively easy. In any case, the effects are interpreted in terms of their effect on the log-odds of a response of 1; e.g., a slope of 0.5 implies that a 1-unit increase in the predictor...
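On the odds scale, a slope of 0.5 on the log-odds means each 1-unit increase multiplies the odds by exp(0.5). In a mixed model this holds conditional on the random effects. A plain-Python check using a hypothetical intercept of -1:

```python
import math

def odds(p):
    return p / (1 - p)

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

slope = 0.5
odds_ratio = math.exp(slope)  # factor by which the odds change per 1-unit increase

# Hypothetical linear predictors: intercept -1, predictor value 2 versus 3.
p2 = inv_logit(-1 + slope * 2)
p3 = inv_logit(-1 + slope * 3)
print(round(odds_ratio, 4), round(odds(p3) / odds(p2), 4))
print(p2, p3)  # the change in probability is not constant, only the odds ratio is
```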

r,logistic-regression,survival-analysis,coefficients

Another approach is to use the summary function: summary() returns the model coefficients as a matrix. > is.matrix(summary(B)$coefficients) [1] TRUE At this point you can store summary(B)$coefficients in an object and then subset it as you wish. summary(B)$coefficients[1,1] ...

You can use the add = TRUE argument to the plot function to plot multiple ROC curves. Make up some fake data: library(pROC) a=rbinom(100, 1, 0.25) b=runif(100) c=rnorm(100) Get model fits: fit1=glm(a~b+c, family='binomial') fit2=glm(a~c, family='binomial') Predict on the same data you trained the model with (or hold some out to test...
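Each ROC curve is the set of (false positive rate, true positive rate) pairs obtained by sweeping a threshold over a model's predicted scores. Drawing two curves on the same axes compares the models. A plain-Python sketch of the computation, not pROC's implementation; the labels and scores are made up:

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR), sweeping the threshold over every distinct score
    from highest to lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= t and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 0, 1, 1, 0, 0]
fit1_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical predictions, model 1
fit2_scores = [0.6, 0.5, 0.4, 0.7, 0.8, 0.1]  # hypothetical predictions, model 2
for name, s in (("fit1", fit1_scores), ("fit2", fit2_scores)):
    print(name, round(auc(roc_points(labels, s)), 4))
```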

r,classification,logistic-regression,glmnet

No, you cannot do that within glmnet, but you can do it very easily just before you run the function, using model.matrix: a <- factor( rep(c("cat1", "cat2", "cat3", "no-cat"),50) ) #make a factor a <- factor(a, levels = c("no-cat", "cat1", "cat2", "cat3")) #change the order of the levels because #the...
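What model.matrix produces for a factor is treatment (dummy) coding: one 0/1 column per level except the reference, which is coded as all zeros. The plain-Python sketch below mimics that with "no-cat" as the reference. Unlike R, which orders levels alphabetically unless told otherwise, it keeps levels in first-appearance order, a simplification for illustration. dummy_code is a hypothetical helper.

```python
def dummy_code(values, ref):
    """Treatment coding: one 0/1 column per non-reference level."""
    levels = [lvl for lvl in dict.fromkeys(values) if lvl != ref]
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

a = ["cat1", "cat2", "cat3", "no-cat"] * 50  # like rep(c(...), 50)
levels, rows = dummy_code(a, ref="no-cat")
print(levels)
print(rows[:4])  # the "no-cat" row is all zeros
```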

Assuming you have class saved as a factor, use the relevel function: auth$class <- relevel(auth$class, ref="YES")...

python,scikit-learn,logistic-regression

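You are looking for the partial_fit method. LogisticRegression does not support it. You can use MultinomialNB (or any other Naive Bayes classifier) or SGDClassifier instead; SGDClassifier with a logistic loss (loss="log_loss", or loss="log" before scikit-learn 1.1) fits a logistic regression model.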
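To show what partial_fit means, here is a from-scratch sketch of logistic regression trained by stochastic gradient descent, one batch at a time. The class is hypothetical and far simpler than scikit-learn's SGDClassifier (no regularization, fixed learning rate). The key point is that each call updates the existing weights instead of refitting from scratch.

```python
import math

class OnlineLogisticRegression:
    """Logistic regression updated incrementally with plain SGD (illustration only)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y):
        """Update the current weights with one pass over a new batch (y in {0, 1})."""
        for x, yi in zip(X, y):
            err = self.predict_proba(x) - yi  # gradient of the log-loss w.r.t. z
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err
        return self

model = OnlineLogisticRegression(n_features=1)
batches = [([[-2.0], [2.0]], [0, 1]), ([[-1.0], [1.0]], [0, 1])]
for _ in range(50):  # data arriving in chunks, processed repeatedly
    for X, y in batches:
        model.partial_fit(X, y)
print(model.predict_proba([-2.0]), model.predict_proba([2.0]))
```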

By default, proc logistic uses "effect coding" for classification variables. The parameters represent the difference between the class effect and the average effect across all classes. If you want to interpret the parameters on your class variable as dummy variables, you could use (param=ref ref='0') in your class statement.
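The two parameterizations differ only in how the design columns are built. A plain-Python sketch for a three-level variable with levels "0", "1" and "2"; the helper names are hypothetical. Under effect coding, the reference level (the last one by default) gets -1 in every column, so each coefficient is a deviation from the average effect. Under reference coding with ref='0', level "0" is all zeros, so each coefficient is a difference from level "0".

```python
def effect_code(values, levels, ref):
    """Effect coding: one column per non-reference level; ref is -1 everywhere."""
    cols = [lvl for lvl in levels if lvl != ref]
    return [[-1 if v == ref else (1 if v == c else 0) for c in cols] for v in values]

def reference_code(values, levels, ref):
    """Reference-cell (dummy) coding: the reference level is all zeros."""
    cols = [lvl for lvl in levels if lvl != ref]
    return [[1 if v == c else 0 for c in cols] for v in values]

levels = ["0", "1", "2"]
values = ["0", "1", "2"]
print(effect_code(values, levels, ref="2"))     # default effect coding, last level as reference
print(reference_code(values, levels, ref="0"))  # analogous to (param=ref ref='0')
```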

d <- as.data.frame(matrix(runif(15000),ncol=10)) m <- model.matrix(~.^6,data=d) ncol(m) ## 848 However, this will not handle higher-order self-interaction terms (e.g. for continuous variables x and y it will include x, y and x*y, but neither x^2 nor y^2), which for continuous variables are arguably necessary for a consistent model. The . stands for "all variables in...
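The 848 can be checked by counting: with 10 continuous variables, ~.^6 gives the intercept plus every product of 1 to 6 distinct variables, i.e. sum over k = 0..6 of C(10, k). A short Python check; n_columns is a hypothetical helper:

```python
from math import comb

def n_columns(n_vars, max_order):
    """Columns of model.matrix(~ .^max_order) for n_vars continuous variables:
    intercept plus every interaction of up to max_order distinct variables."""
    return sum(comb(n_vars, k) for k in range(max_order + 1))

print(n_columns(10, 6))
```

The same count gives 1 + 3 + 3 = 7 columns for three variables up to order 2, matching an intercept, three main effects and three pairwise interactions.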

c#,math,regression,logistic-regression,exponential

Some general advice: use linear regression as your basic regression algorithm. Higher-order regressions (polynomials, splines) tend to produce structure that is not really supported by the data, especially if you have just a handful of data points. If you want to model exponential or logarithmic data, then take the...
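For exponential data, one common version of this idea is to fit an ordinary linear regression to (x, ln y) and transform back, which gives y = a * exp(b * x). A Python sketch on noise-free made-up data. Note that least squares on the log scale minimizes relative rather than absolute error, so with noisy data it is not identical to a nonlinear least-squares fit.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on (x, ln y).
    Requires every y > 0."""
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(ls) / n
    sxy = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(ml - b * mx)
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.3 * x) for x in xs]  # noise-free data with a = 2, b = 0.3
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```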

r,prediction,logistic-regression

Do you have NAs in your variables? If so, you'll get NA as the predicted value for those rows.

r,logistic-regression,coefficients

Another solution I discovered is converting the results to a data frame, then extracting the row names as follows: > allpredsincld<-as.data.frame(summary(step1)$coefficients) > allpredsincld Estimate Std. Error z value Pr(>|z|) (Intercept) -7.998346 1.216048 -6.577327 4.789808e-11 i1 3.928425 0.695920 5.644939 1.652402e-08 then: > allpredsincld<-allpredsincld[-1,] > allpredsincld Estimate Std. Error z value Pr(>|z|)...

r,plot,logistic-regression,spline,cox-regression

If you want odds ratios, you need to add a fun= argument to transform to the odds-ratio scale: plot(Predict(fit,fun=exp), anova=an, pval=TRUE, ylab="Odds ratio") I'm not sure I know what you mean by changing to the "probability of mortality" and "mortality rate" for "fit". The inverse logit function is...
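Two transformations are involved here. exp turns a difference in log-odds into an odds ratio, and the inverse logit (plogis in R) turns a log-odds value into a probability. A plain-Python sketch with hypothetical log-odds values:

```python
import math

def inv_logit(eta):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

log_odds = [-2.0, 0.0, 1.5]                                  # hypothetical predicted log-odds
probs = [inv_logit(e) for e in log_odds]                     # probability scale
odds_ratios = [math.exp(e - log_odds[0]) for e in log_odds]  # relative to the first value
print(probs)
print(odds_ratios)
```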

r,logistic-regression,posthoc,lsmeans

There must be a reason you want to adjust for only 6 comparisons, and I'm guessing it is because you want to break down the comparisons conditionally on one of the factors. This is easy to do using lsmeans: lsmeans(logmixed_ranks[[i]], pairwise ~ rating_ranks | indicator_var, adjust =...

r,regression,stata,logistic-regression,standard-error

The default so-called "robust" standard errors in Stata correspond to what sandwich() from the package of the same name computes. The only difference is how the finite-sample adjustment is done. In the sandwich(...) function no finite-sample adjustment is done at all by default, i.e., the sandwich is scaled by 1/n...

python,logistic-regression,theano

p_y_given_x and y_pred are symbolic variables (just Python objects from Theano). The Python variables that point to the Theano objects do not get updated; they just represent the computation we want to do. Think of them as pseudo-code. They will be used when compiling the Theano function. It is only then...

python,sample,weight,statsmodels,logistic-regression

It took me a while to work this out, but it is actually quite easy to create a logit model in statsmodels with weighted rows / multiple observations per row. Here's how it's done: import statsmodels.api as sm logmodel=sm.GLM(trainingdata[['Successes', 'Failures']], trainingdata[['const', 'A', 'B', 'C', 'D']], family=sm.families.Binomial()).fit() (the logit link is the Binomial family's default) ...
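Why a [Successes, Failures] response works: a row with s successes and f failures contributes the same log-likelihood (up to a constant binomial coefficient that does not depend on the parameters) as s + f individual 0/1 rows, so the fitted coefficients are the same. A plain-Python check for a single probability p with made-up counts:

```python
import math

def grouped_loglik(p, successes, failures):
    """Log-likelihood of one row with counts, dropping the constant binomial coefficient."""
    return successes * math.log(p) + failures * math.log(1 - p)

def expanded_loglik(p, successes, failures):
    """Same data expanded to one Bernoulli observation per trial."""
    outcomes = [1] * successes + [0] * failures
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

p = 0.3
print(grouped_loglik(p, 7, 13), expanded_loglik(p, 7, 13))
```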

python,machine-learning,scikit-learn,svm,logistic-regression

Edit: I forgot that you were looking for an sklearn solution, but I think a simple weighted moving average could be a good start. I usually try to start with something simple and only move to more complicated approaches if that does not give me the desired results. There are fancier approaches, but...

fmincg takes the handle of the objective function as the first argument, which in this case is a handle to lrCostFunction. If you go inside fmincg.m, you will find the following lines: argstr = ['feval(f, X']; % compose string used to call function %---Code will not enter the following loop---%...

r,logistic-regression,lme4,mixed-models

You could try one of several optimizers available through the nloptr and optimx packages. There's even an allFit function, available through the afex package, that tries them for you (see the allFit help file), e.g.: all_mod <- allFit(exist_model) That will let you check how stable your estimates are....

machine-learning,glm,logistic-regression

The main benefit of GLM over logistic regression is overfitting avoidance. (Strictly speaking, logistic regression is itself a GLM, with a binomial family and a logit link.) GLMs usually try to extract linear relationships between the input variables and thereby avoid overfitting your model. Overfitting means very good performance on training data and poor performance on test data.

stats::ksmooth gives you the Nadaraya–Watson kernel regression: with(cars, { plot(speed, dist>40) lines(ksmooth(speed, dist>40, "normal", bandwidth = 2), col = 2) lines(ksmooth(speed, dist>40, "normal", bandwidth = 6), col = 3) }) ...
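The Nadaraya–Watson estimate at a point is a kernel-weighted average of the responses. For a 0/1 outcome it estimates P(y = 1 | x) without assuming a logistic shape. A plain-Python sketch with a Gaussian kernel of standard deviation h. This h is not the same as ksmooth's bandwidth, which is scaled so that the kernel's quartiles sit at ±0.25 * bandwidth. The data are made up.

```python
import math

def nadaraya_watson(xs, ys, x0, h):
    """Kernel-weighted average of ys at x0 with a Gaussian kernel of standard deviation h."""
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Made-up binary outcome against a predictor.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
for h in (0.5, 2.0):
    print(h, [round(nadaraya_watson(xs, ys, x0, h), 3) for x0 in (2, 4.5, 7)])
```

Larger h gives a smoother curve, just as a larger bandwidth does in the ksmooth call above.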

The proper R syntax for such a formula is y~(V1+V2+V3)^2. For example: set.seed(15) dd <- data.frame(V1=runif(50), V2=runif(50), V3=runif(50), y=runif(50)) lm(y~(V1+V2+V3)^2, dd) Call: lm(formula = y ~ (V1 + V2 + V3)^2, data = dd) Coefficients: (Intercept) V1 V2 V3 V1:V2 V1:V3 V2:V3 0.54169 -0.10030 -0.01226 -0.10150 0.38521 -0.03159 0.01200 Or,...
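The ^2 operator expands to all main effects plus every interaction of two distinct variables, with no squared terms. A short Python sketch of that expansion (expand_formula is a hypothetical helper, not part of R) reproduces the term order in the lm output above:

```python
from itertools import combinations

def expand_formula(variables, order):
    """Terms generated by (v1 + ... + vk)^order: all products of up to `order`
    distinct variables, with no squared terms."""
    terms = []
    for k in range(1, order + 1):
        terms.extend(":".join(c) for c in combinations(variables, k))
    return terms

print(expand_formula(["V1", "V2", "V3"], 2))
```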

machine-learning,neural-network,logistic-regression

This is more a problem with the function being minimized than with the method used. If finding the true global minimum is important, then use a method such as simulated annealing. This can find the global minimum, but may take a very long time to do...
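A minimal one-dimensional simulated annealing sketch. The step size, starting temperature, cooling rate and test function are all arbitrary choices for illustration. Worse moves are accepted with probability exp(-delta / T), which lets the search climb out of local minima while T is still high. Finding the global minimum is not guaranteed in a finite number of iterations.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, iters=5000, seed=0):
    """Minimize f from x0, returning the best point visited and its value."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        delta = fc - fx
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

def f(x):
    """Several local minima; the global minimum is near x = -0.3."""
    return x * x + 2 * math.sin(5 * x)

best_x, best_f = simulated_annealing(f, 2.0)
print(round(best_x, 3), round(best_f, 3))
```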

r,ggplot2,regression,logistic-regression

You're not going to be able to plot the "S"-shaped curves that you get with logistic regression because you do not have a continuous variable to plot over. Instead, you're only going to be able to plot predicted values and CIs around those predicted values. Create a column in...

summary(o)$coefficients[order(summary(o)$coefficients[,4]),] # Estimate Std. Error z value Pr(>|z|) #Var1 1.1931750 1.1774564 1.0133497 0.3108932 #Var3 -0.1085742 0.2252867 -0.4819379 0.6298501 #Var2 -0.4337253 1.2724925 -0.3408470 0.7332187 #(Intercept) 0.2177110 1.5984713 0.1361995 0.8916635 ...
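The same ordering in plain Python, using the rows from the output above. In R, column 4 of the coefficient matrix is Pr(>|z|). Here each row also carries the term name, so the p-value sits at index 4.

```python
# Rows copied from the output above: (term, estimate, std. error, z value, p-value).
coefs = [
    ("(Intercept)", 0.2177110, 1.5984713, 0.1361995, 0.8916635),
    ("Var1", 1.1931750, 1.1774564, 1.0133497, 0.3108932),
    ("Var2", -0.4337253, 1.2724925, -0.3408470, 0.7332187),
    ("Var3", -0.1085742, 0.2252867, -0.4819379, 0.6298501),
]
# Sort ascending by p-value, like order(summary(o)$coefficients[,4]).
by_p = sorted(coefs, key=lambda row: row[4])
for row in by_p:
    print(row[0], row[4])
```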

scala,machine-learning,apache-spark,logistic-regression,mllib

What you're looking for is called one-hot encoding. Spark's MLlib has a one-hot encoder (OneHotEncoder) which can do this for you....

python,scikit-learn,logistic-regression

It's not really desired, but it's a known issue that is very hard to fix. The thing is that LogisticRegression models are trained with Liblinear (the default solver before scikit-learn 0.22), which does not allow setting its random seed in a completely robust way. When you explicitly set the random_state, a best effort is made to...

Maybe this helps: data.frame(strata= rep(1:nrow(df1), each=2), outcome=c(t(df1[2:3]))) ...

machine-learning,svm,bayesian,logistic-regression

This is a trivial linear model, where you don't even fit the weights of the model, but instead use constant values. Linear models make decisions using cl(x) = sgn(<w,x> + b) = sgn(SUM_i w_i x_i + b), where x is your data point (x_i is its i-th feature). In your...
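The decision rule cl(x) = sgn(<w,x> + b) written out in Python. The weights and bias are hypothetical fixed values, not fitted, matching the "constant values" situation described above:

```python
def sgn(v):
    return 1 if v > 0 else (-1 if v < 0 else 0)

def classify(w, b, x):
    """cl(x) = sgn(<w, x> + b) = sgn(sum_i w_i * x_i + b)."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical fixed weights: every feature counts equally.
w = [1.0, 1.0, 1.0]
b = -1.5
print(classify(w, b, [1, 1, 0]), classify(w, b, [1, 0, 0]))
```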

r,plot,ggplot2,logistic-regression

The logistf objects differ somewhat in structure from glm objects, but not by much. I've added support for logistf-fitted models; however, 1) model summaries can't be printed and 2) predicted probability plots are currently not supported with logistf models. I'll update the code on GitHub tonight, so you can try...

sas,probability,prediction,logistic-regression

Two ways to get predicted values: 1. using the SCORE statement in PROC LOGISTIC; 2. adding the new data to the original data set without the response variable, and getting the predictions in the output data set. Both are illustrated in the code below: *Create a dataset with the values you want predictions...

Simply: ff <- lrm(Target ~ var1, data = dat, subset = var1 >= quantile(var1, 0.07) & var1 <= quantile(var1, 0.09)) ...

You don't need to compute p in your data set at all. Just let it be a logical node in your model. I prefer the R2jags interface, which allows you to specify a BUGS model in the form of an R function ... jagsdata <- data.frame(y=rbinom(10, 500, 0.2), n=sample(500:600, 10),...

For relevel you need to specify the level label exactly as it appears in the factor: glm( TargetVar ~ relevel(cut_Var3,"(3,6]"), data = dat) Call: glm(formula = TargetVar ~ relevel(cut_Var3, "(3,6]"), data = dat) Coefficients: (Intercept) relevel(cut_Var3, "(3,6]")(0,3] 0.75 -0.35 relevel(cut_Var3, "(3,6]")(6,9] -0.50 Degrees of Freedom: 12 Total (i.e. Null); 10...

r,loops,glm,logistic-regression

Rather than messing around with building a formula dynamically, I might suggest subsetting the columns of your data.frame and not bothering with building strings with pluses. #SAMPLE DATA train.data<-data.frame(class=sample(1:5, 50, replace=TRUE), matrix(runif(50*12), ncol=12)) library(VGAM) varlist <- list("X2", c("X8","X2"), c("X8","X2","X11")) models <- lapply(varlist, function(x) { vglm(class ~ ., data = train.data[,...

python,scikit-learn,logistic-regression,coefficients

The feature names can be accessed from vect using the get_feature_names method (get_feature_names_out in scikit-learn 1.0 and later; get_feature_names was removed in 1.2). You can zip them with the coefficients, for example: list(zip(vect.get_feature_names(), d.coef_[0])) This returns (token, coefficient) tuples...

If you are talking about the interpretation of the glm() output and you remain on the log-odds scale, then it is exactly analogous to how you would interpret the output from lm(). In both cases it is better to talk about predictions rather than trying to separately interpret the coefficients. When...

machine-learning,scikit-learn,classification,logistic-regression

Logistic regression chooses the class that has the biggest probability. In case of 2 classes, the threshold is 0.5: if P(Y=0) > 0.5 then obviously P(Y=0) > P(Y=1). The same stands for the multiclass setting: again, it chooses the class with the biggest probability (see e.g. Ng's lectures, the bottom...
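A plain-Python sketch of "pick the most probable class". The multiclass part assumes a multinomial (softmax) model; one-vs-rest models normalize differently but still take the argmax. In the binary case, softmax over scores [0, z] reduces to the sigmoid of z, so choosing the larger probability is the same as thresholding P(Y=1) at 0.5.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores):
    """Index of the class with the highest predicted probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

def predict_binary(z):
    """Binary case: P(Y=1) = sigmoid(z); predict 1 when it exceeds 0.5."""
    p1 = 1 / (1 + math.exp(-z))
    return 1 if p1 > 0.5 else 0

print(predict_class([0.1, 2.0, -1.0]), predict_binary(0.3), predict_binary(-0.3))
```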

python,r,logistic-regression,statsmodels

I'm not sure what your data manipulations are intended to do, but they seem to be losing information in the R run. If I keep all the rank information in, then I get this on the original data object (and the results look very similar in the areas where they overlap). (Likelihoods are only...