lm(A ~ B + C, data = Sample_table, subset = D == 4) ...

There's no need to create a matrix. Stata has commands that facilitate the task. Try estimates store and estimates restore. An example: clear set more off sysuse auto // initial regression/predictions regress price weight estimates store myest predict double resid, residuals // second regression/prediction regress price mpg predict double residdiff,...

neural-network,regression,pybrain

Try normalizing the values (both input and output) to the range (-1, +1).
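
A minimal sketch of that rescaling in plain Python, in case it helps. The helper names (scale_to_range, unscale) are made up for illustration and are not part of PyBrain; the second one maps network outputs back to the original units.

```python
def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly rescale values so the minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


def unscale(scaled, v_min, v_max, lo=-1.0, hi=1.0):
    """Invert scale_to_range, given the original minimum and maximum."""
    return [v_min + (s - lo) * (v_max - v_min) / (hi - lo) for s in scaled]


inputs = [10.0, 20.0, 35.0, 50.0]
scaled_inputs = scale_to_range(inputs)
restored = unscale(scaled_inputs, min(inputs), max(inputs))
```

Remember to scale any new data with the minimum and maximum of the training data, not with its own.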

python,r,machine-learning,scikit-learn,regression

The best_score_ is the best score from the cross-validation. That is, the model is fit on part of the training data, and the score is computed by predicting the rest of the training data. This is because you passed X_train and y_train to fit; the fit process thus does not...

python,numpy,scipy,time-series,regression

You need to distinguish a data timeline (input) and a fit timeline (output). Once you do that, the approach is fairly clear. Below I called them tdata and tfit: import numpy as np from scipy.optimize import curve_fit import matplotlib.pyplot as plt tdata = np.linspace(0, 10) timeSeries = np.sin(tdata) + .4*np.random.random(tdata.shape)...

r,packages,regression,decision-tree

The CHAID package uses partykit (recursive partitioning) tree structures. You can walk the tree using party nodes: a node can be terminal or have a list of child nodes, with information about the decision rule (split) and fitted data. The code below walks the tree and creates the decision table. It...

In my tests, fitted(inputfit) returned the same thing (a named numeric vector) whether I ran lm on a ts object or an xts object. So I'm skeptical that a ts object is really being returned. Regardless, you can convert the output of fitted(inputfit) back to xts using as.xts: > head(as.xts(fitted(inputfit),...

No, you are not correct. You would be correct if you had done this: lm( yvar ~ xvar + as.numeric(xfac) +I(as.numeric(xfac)^2), data=dat) But that's not the same as what R does when it encounters such a situation. Whether or not the quadratic term will "weaken" the linear estimate really depends...

r,loops,variables,statistics,regression

The answer is quite straightforward. You're using the wrong structure to store your results. You need to use a list. Note the use of [[ vs [ ... lm_store <- list() while (counter<length(SP)) { lm_store[[counter-lookback]] <- lm(SP[(counter-lookback):counter]~TS[(counter-lookback):counter]); counter <- counter+1; } print(lm_store[[5]]) ...

In the car package you can use the 'scatterplotMatrix' function. Here's an example using the Prestige dataset in that package: library(car) scatterplotMatrix(~prestige +income +education + women, data= Prestige)...

Ok all, I have figured it out. The issue is that R does not like to compute R^2 values for data indexed by time. By regressing the data values against time, an error in difftime() occurs. I solved this by changing the index from time values to a standard integer...

I think that I have an answer, that was close to my try: results <- mtcars %>% group_by(cyl) %>% do({ mod = lm(mpg ~ wt + qsec, data = filter(., vs == 0)) print(mod) Pred <- predict(mod, .) data.frame(. , Pred) }) print(results, n=100) ...

It seems to me that coefplot(fitsur$eq[[1]]) might solve your problem. Here is the reproducible example: library(systemfit) library(coefplot) # this paragraph was borrowed from the systemfit manual data( "Kmenta" ) eqDemand <- consump ~ price + income eqSupply <- consump ~ price + farmPrice + trend system <- list( demand =...

It objects to one of your variables (as gvrocha said); you might have a factor with only one level, or a string. A tip to quickly track down the offending variable(s) is to do interval bisection and increase/decrease the col indices till you trigger the error. Best to use the...

r,optimization,regression,rscript

This seems to work fine: opt1 <- optim(startparam, fn=ls,method="L-BFGS-B", Val=Val,Hi=Hi,Di=Di, lower =c(20,10,0), upper =c(100,70,25)) note that the values of Val, Hi, Di get passed through optim to the objective function....

Here is a try: N = 100; % Number of points n = 10; % Number of x-bins % Define and plot points x = rand(N,1); y = x.*rand(N,1); scatter(x, y, '+'); % Define errorbar bin = linspace(min(x), max(x), n+1); ind = sum(bsxfun(@minus, x, ones(N,1)*bin)>=0,2); m = NaN(n,1); e =...

matlab,regression,numerical-methods,linear

Judging from the link you provided, and my understanding of your problem, you want to calculate the line of best fit for a set of data points. You also want to do this from first principles. This will require some basic Calculus as well as some linear algebra for solving...
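
For reference, here is a small sketch of where the first-principles derivation ends up, written in Python rather than MATLAB. Setting the partial derivatives of the sum of squared residuals to zero gives slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name fit_line is just for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept from the closed-form solution."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxx and Sxy are the centered sums of squares and cross-products.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The points above lie exactly on y = 2x + 1, so the fit recovers that line.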

It would be nice if you gave a reproducible example. I think you're looking for cc <- coef(summary(step1))[2,,drop=FALSE] as.data.frame(cc) Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly...

r,regression,mathematical-optimization,linear-programming,minimization

Because Quw and ruw are both data, all constraints as well as the objective are linear in the yuw decision variables. As a result, this is a linear programming problem that can be approached by the lpSolve package. To abstract this out a bit, let's assume R=20 and C=10 describe...

r,loops,statistics,data.frame,regression

Here is a quick rewrite of your code; this should give you what you are looking for. Assigning a value of each column is unnecessary since myData should be a data.frame, so you can access each column by its column name. rm(list=ls()) myData <-read.csv(file="C:/Users/Documents/myfile.csv",header=TRUE, sep=",") for(i in names(myData)) {...

Try this: fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1], lambda=cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.min) coef(fit) Or you can specify a lambda value in coef: fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1]) coef(fit, s = cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.min) You need to pick a "best" lambda. In this case, I used lambda.min, but you could use cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.1se or any...

matlab,matrix,regression,numerical-methods

This sounds like a regression problem. Assuming that the unexplained errors in measurements are Gaussian distributed, you can find the parameters via least squares. Basically, you'd have to rewrite the equation so that you get this to the form of ma + nb + oc = p and then you...

c#,math,regression,logistic-regression,exponential

Some general advice: use only linear regression as your basic regression algorithm. Higher-order regressions (polynomials, splines) tend to produce information that is not really supported by the data, especially if you have just a handful of data points. If you want to model exponential or logarithmic data, then take the...
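
The sentence above is cut off, so the following is only my assumption about the usual trick: for y = a * exp(b * x), take the logarithm of y and fit a straight line to (x, ln y). A rough Python sketch (the name fit_exponential is hypothetical):

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by linear regression of ln(y) on x."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform needs strictly positive y values")
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(log_ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xl = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, log_ys))
    b = s_xl / s_xx                      # slope in log space
    a = math.exp(l_mean - b * x_mean)    # intercept in log space is ln(a)
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Note that this minimizes the squared error of ln(y), not of y itself, so large y values get less weight than in a direct nonlinear fit.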

The problem arises from your mixture of subsetting types here: df$target[which(df$snakes=='a'),] Once you use $ the output is no longer a data.frame, and the two parameter [ subsetting is no longer valid. You are better off compacting it to: sum(df[df$snakes=="a","target"]) [1] 23 As for your model, you can just create...

I had done something similar. You will need to modify the loop for your need. Let me know if you need help with that. vars=colnames(mydata)[-1] for (i in vars) { for (j in vars) { if (i != j) { factor= paste(i,j,sep='*')} lm.fit <- lm(paste("Sales ~", factor), data=mydata) print(summary(lm.fit)) }}...

You have two problems with the code as provided. To get the trendline to start in the correct place on the x-axis you simply must fix what I assume was a typo in your code: curve(qdata$intercept3 + 0*x, add=T, from=qdata$intecept3, to=max(f$age), col=2) should be written as: curve(qdata$intercept3 + 0*x, add=T,...

You can use mapply to iterate over several arguments. Keep in mind that you need to define all the other variables if you want to do this. Also, this isn't the most memory-efficient way, but it should work for smaller sizes of combinations. predictshrine<- function(rain,citysize,wonders,chief,children,deaths,crops) { 0*rain-399.8993+5*crops+50.4296*log(citysize)+ 4.5071*wonders*chief+.02301*children*deaths+1.806*children+ .10799*deaths-2.0755*wonders-.0878*children^2+.001062*children^3-...

What about the following using the optim? f <- function(p){ sum((y - (p[1]*x+p[2]*z+p[3])^p[4]+p[5])^2) } p <- optim(rep(.5, 5), f)$par p [1] 3.5539397 0.8423521 0.1872422 0.6287906 -0.1863242 So, a5 is -0.1863242. The fitted values look as follows: plot(seq_along(y), y) lines((p[1]*x+p[2]*z+p[3])^p[4]+p[5]) ...

G. Grothendieck and Brian Borchers at CrossValidated suggested the onls package, which is exactly what I was looking for. Thanks everyone, the help is much appreciated. see here for more info: http://www.r-bloggers.com/introducing-orthogonal-nonlinear-least-squares-regression-in-r/ Here's an example using the same data from above - this gives the same fitted parameters as the...

You could try to implement save/load the coefficients in res.params to a JSON file but the easiest way would be to use the native methods: res.save('results.pickle') Later you can do: import statsmodels.api as smf res = smf.load('results.pickle') res.predict(...) ...

The easiest way might be to manipulate the model matrix to remove the unwanted columns: xx <- model.matrix(y ~ 0 + x + x*f) omit <- grep("[:]fs[^45]", colnames(xx)) xx <- xx[, -omit] lm(y ~ 0 + xx) Output: Call: lm(formula = y ~ 0 + xx) Coefficients: xxx xxfs1 xxfs2...

You could add at the end of the loop: x = [1 2 3 4 5; 4 5 6 8 9; 8 7 6 3 1; 5 6 7 9 1; 6 4 2 9 6]; y = [10 30 24 35 40]'; one=[]; for ii=1:5 one = x(:,ii); [b_LS,...

r,loops,regression,lag,dummy-data

Maybe this can help #store your model model<-your_model #get the last pt observation last<-dato[nrow(dato), c('pt', 'age')] years<-12/4 #create dummy t1<-rep(c(1,0,0,0) , years) t2<-rep(c(0,1,0,0) , years) t3<-rep(c(0,0,1,0) , years) t4<-rep(c(0,0,0,1) , years) #create pt observation pt<-c(last$pt, rep(NA, length(t1)-1 )) df<-data.frame(t1=t1,t2=t2,t3=t3,t4=t4,lag_pt=pt, age=last$age) df$predict<-NA for (i in 1:nrow(df) ) { df$predict[i]<-predict(model, newdata=df[i,]) if...

r,neural-network,regression,r-caret

train doesn't support multiple outcomes so the intended symbolic formula x + y resolves to a literal one adding x and y. Max...

python,scikit-learn,regression

Take a look at this question. It explains how to obtain a leaf's ID. Then you can use clf.tree_.value and clf.tree_.n_node_samples.

You need to make a proper formula from your character values. The easiest way in this case is the reformulate() function reformulate(y,x) # body ~ brain then you can use this in your lm() call lm(reformulate(y,x), data = Animals) # # Call: # lm(formula = reformulate(y, x), data = Animals)...

matlab,regression,curve-fitting,spline

I'm not sure how to display any equations on the graph, but you should be able to replicate the cubic spline interpolation using the commands spline and unmkpp. % returns the piecewise polynomial form of the cubic spline interpolant pp = spline(x,Y) % use unmkpp(pp) to get the piecewise polynomial...

This combines both options from my earlier posting (interactions and polynomial terms) in a hypothetical situation where the column names look like "X1", "X2", ...., "X30". You would take out the terms() call which is just in there to demonstrate that it was successful: terms( as.formula( paste(" ~ (", paste0("X",...

average,regression,effect,spss,coefficients

If you test an interaction hypothesis, you have to include a number of terms in your model. In this case, you would have to include: Base effect of price Base effects of brands (dummies) Interaction effects of brand dummies * price. Since you have 5 brands, you will have to...

You can do this with one line of code: apply(df[-1], 2, function(x) summary(lm(x ~ df$Year))$coef[1,c(1,4)]) L1 L2 L3 L4 Estimate -160.0660000 -382.2870000 136.4690000 106.9820000 Pr(>|t|) 0.6069965 0.3886881 0.7340981 0.7030296 ...

This isn't pure "dplyr", but rather, "dplyr" + "tidyr" + "data.table". Still, I think it should be pretty easily readable. library(data.table) library(dplyr) library(tidyr) mtcars %>% gather(var, val, cyl:carb) %>% as.data.table %>% .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var] # var Estimate Std. Error # 1: cyl -2.87579014 0.322408883 #...

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and to dynamically build formulas dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

python,matplotlib,scipy,regression,odr

import numpy as np import matplotlib.pyplot as plt from scipy.odr import ODR, Model, RealData fig = plt.figure() ax1 = fig.add_subplot(111) x = np.linspace(0,1E15,10) y = 1E-15*x-2 ax1.set_xlim(-0.05E15,1.1E15) ax1.set_ylim(-2.1, -0.7) ax1.plot(x, y, 'o') # Fit using odr def f(B, x): return B[0]*x + B[1] sx = np.std(x) sy = np.std(y) linear = Model(f) mydata = RealData(x=x,y=y, sx=sx,...

Dates in R are actually just numeric values recast into a date format according to certain rules. For example, Date format is the number of days elapsed since Jan 1, 1970 and POSIXct format is the number of seconds elapsed since Jan 1, 1970 referenced to the UTC time zone....
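
The same idea can be illustrated outside R. Here is a short Python sketch of the "days since Jan 1, 1970" and "seconds since Jan 1, 1970 UTC" conventions described above; the helper names are invented for this example.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)


def days_since_epoch(d):
    """Whole days elapsed since 1970-01-01 (the convention R's Date class uses)."""
    return (d - EPOCH_DATE).days


def date_from_days(n):
    """Inverse of days_since_epoch."""
    return EPOCH_DATE + timedelta(days=n)


def seconds_since_epoch(dt):
    """Seconds elapsed since 1970-01-01 00:00:00 UTC for a timezone-aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return dt.timestamp()


day_number = days_since_epoch(date(2015, 1, 1))
second_number = seconds_since_epoch(datetime(1970, 1, 2, tzinfo=timezone.utc))
```

Because the underlying values are plain numbers, they can be used directly as a regression predictor; only the units (days versus seconds) change the scale of the slope.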

You can predict into a new data set of whatever length you want; you just need to make sure you assign the results to an existing vector of appropriate size. This line causes a problem: stackloss$predict1[-1] <- predict(stackloss.lm, newdata) because you can't assign and subset a non-existing vector at...

performance,regression,prediction,forecasting

Yes - I would use linear regression as a starting point. For an example, see How can I predict memory usage and time based on historical values. I found Data Analysis Using Regression and Multilevel/Hierarchical Models to be a highly readable introduction to the subject (you probably won't need multilevel...

python,time-series,scikit-learn,regression,prediction

Here is my guess about what is happening in your two types of results: .days does not convert your index into a form that repeats itself between your train and test samples. So it becomes a unique value for every date in your dataset. As a consequence your models either...

r,regression,lm,beta,standardized

There is a convenience function in the QuantPsyc package for that, called lm.beta. However, I think the easiest way is to just standardize your variables. The coefficients will then automatically be the standardized "beta"-coefficients (i.e. coefficients in terms of standard deviations). For instance, lm(scale(your.y) ~ scale(your.x), data=your.Data) will give you...
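
To see why standardizing works, here is a small Python sketch for the single-predictor case (all function names are mine, chosen for illustration). It z-scores both variables with the sample standard deviation (n - 1 in the denominator, as R's scale() does) and compares the resulting slope with the raw slope rescaled by sd(x) / sd(y).

```python
import math


def sample_sd(values):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = sample_sd(values)
    return [(v - mean) / sd for v in values]


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return s_xy / s_xx


def standardized_beta(xs, ys):
    """Slope after z-scoring both variables."""
    return slope(zscore(xs), zscore(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta = standardized_beta(x, y)
rescaled_raw = slope(x, y) * sample_sd(x) / sample_sd(y)
```

The two numbers agree, and the standardized slope does not change if x is measured in different units.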

python,syntax,regression,linear

Your last variable, or rather pair of variables, is invalid. Syy/Sxx If that is supposed to be a single variable, know that you cannot have / in your variable name. Python identifiers and keywords Identifiers (also referred to as names) are described by the following lexical definitions: identifier ::= (letter|"_")...
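
You can check this yourself with the standard library. A quick sketch (the candidate names other than Syy/Sxx are just examples I picked):

```python
import ast
import keyword


def is_valid_variable_name(name):
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["Syy/Sxx", "Syy_over_Sxx", "2fast", "_ok", "class"]
validity = {name: is_valid_variable_name(name) for name in candidates}

# Python parses "Syy/Sxx" as a division of two names, not as one name.
expr = ast.parse("Syy/Sxx", mode="eval").body
is_division = isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Div)
```

So either rename the variable (for example Syy_over_Sxx) or make sure Syy and Sxx are both defined before dividing them.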

You'd find the rpart package useful, particularly the where element. where: an integer vector of the same length as the number of observations in the root node, containing the row number of frame corresponding to the leaf node that each observation falls into. library(rpart) fit <- rpart(Kyphosis ~ Age +...

The fit methods of the linear models, discrete models and GLM take cov_type and cov_kwds arguments for specifying robust covariance matrices. This will be attached to the results instance and used for all inference and statistics reported in the summary table. Unfortunately, the documentation doesn't really show this...

matlab,for-loop,plot,regression

The first part could be done in a number of ways. I would test the second column for zeroness zerodata = A(:,2) == 0; which will give you a logical array of ones and zeros like [1 1 1 0 1 0 0 ...]. Then you can use this to...

You can use the caret package to do so: Data: library(rpart) train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10), RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE), Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE), Activity = factor(c("active", "very active", "very active", "inactive", "very inactive",...

matlab,regression,linear-regression

regress expects its inputs as column vectors. Transposing (.') your inputs should do the trick: >> b = regress( Y.', X.' ) b = 0.4291 ...

r,loops,dataframes,regression,linear

This is a slight modification of @BondedDust's comment. models <- sapply(unique(as.character(df$country)), function(cntry)lm(BirthRate~US.,df,subset=(country==cntry)), simplify=FALSE,USE.NAMES=TRUE) # to summarize all the models lapply(models,summary) # to run anova on all the models lapply(models,anova) This produces a named list of models, so you could extract the model for Aruba as: models[["Aruba"]] ...

matlab,regression,curve-fitting,ellipse,best-fit-curve

You can also try fminsearch, but to avoid getting stuck in local minima you will need a good starting point given the number of coefficients (try to eliminate some of them). Here is an example with a 2D ellipse: % implicit equation fxyc = @(x, y, c_) c_(1)*x.^2 + c_(2).*y.^2...

NeweyWest calculates the 'lag' with this code: lag <- floor(bwNeweyWest(x, order.by = order.by, prewhite = prewhite, ar.method = ar.method, data = data)) ... and when called with the default arguments it reproduces your error (which I was also able to replicate): >bwNeweyWest(m2,lag = NULL, order.by = NULL, prewhite = TRUE, adjust =...

r,ggplot2,regression,logistic-regression

You're not going to be able to plot the "S" shaped curves that you get with logistic regression because you do not have a continuous variable to plot over. Instead you're only going to be able to plot predicted values and CIs around those predicted values. Create a column in...

r,function,if-statement,regression

You have two problems. The error you're encountering is because you're trying to use the weigh variable without referencing it as coming from the mydata dataset. Try using mydata$weig. This will solve your first error, but you then get the actual one related to using the weights argument, which is:...

Perhaps this will help: If your version of Excel is 2007 or later, you can use the TREND function which I will demonstrate below. But to do this graphically, first Insert a Scatter Graph using your x-y coordinates; add a polynomial order 2 trendline, and show the formula. (I also...

r,machine-learning,classification,regression,caret

Look at the help page ?models. Also, here are some links too. Also: > is_class <- unlist(lapply(mods, function(x) any(x$type == "Classification"))) > class_mods <- names(is_class)[is_class] > head(class_mods) [1] "ada" "AdaBag" "AdaBoost.M1" "amdai" "avNNet" [6] "bag" ...

Found the answer myself. It's so easy: hexbinplot(yy~xx, main = "(text) Scatterplot: text", xlab="wind",ylab="other wind",style="nested.centroids", type=c("r"), col.line = "yourcolor", lwd="3") ...

python,regression,linear-regression

If sm is supposed to come from statsmodels, you need to import it before you can use it. By convention, sm is an alias for the statsmodels.api module: import statsmodels.api as sm. After that you can call sm directly.

r,regression,curve-fitting,data-fitting,nls

Here's one way to do it. (On edit: this works fine, a typo in my original code made it seem like it wasn't working, thanks to @MrFlick and @Gregor for pointing this out). First replicate your code with a fixed random seed: set.seed(1) x<-seq(0,120, by=5) y<-100/50*exp(-0.02*x)*rnorm(25, mean=1, sd=0.05) y2<-(1*100/50)*(0.1/(0.1-0.02))*(exp(-0.02*x)-exp(-0.1*x))*rnorm(25, mean=1,...

Trying to predict with a data.frame after fitting an lm model with variables not inside a data.frame (especially matrices) is not fun. It's better if you always fit your model from data in a data.frame. For example if you did seasonalfit <- lm(airp ~ ., data.frame(airp=airp,SIN=SIN,COS=COS)) Then your predict would...

r,ggplot2,regression,quantile,vgam

How about something like this: # required packages library(VGAM) require(reshape2) require(ggplot2) # fitted values from vgam fit4 <- vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), data = bmi.nz, trace = TRUE) fitted.values <- data.frame(qtplot.lmscreg(fit4, percentiles = c(5,50,90,99))$fitted.values) fitted.values[, 'age'] <- bmi.nz[, 'age'] # melt data.frame dmelt <-...

It looks like you are missing the l in substitute(). That is, use substitute(yourFormula, l). Here's a MWE without the r^2 that parallels the one you're looking at (which I think is at ggplot2: Adding Regression Line Equation and R2 on graph). library(ggplot2) # Function to generate correlated data. GenCorrData...

r,optimization,regression,least-squares

Use the fact that vec(AXA') = (A ⊗ A ) vec(X) so: k <- ncol(A) AA1 <- kronecker(A, A)[, c(diag(k)) == 1] lm.fit(AA1, c(V)) ...
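
If you want to convince yourself of the identity, here is a dependency-free Python check with small integer matrices. The helpers are written out by hand and are only a sketch; vec stacks columns, as in the formula. The last part reflects my reading of the column selection c(diag(k)) == 1, namely that X is assumed to be diagonal so only the columns matching its diagonal entries are needed.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def vec(P):
    """Stack the columns of P into one long vector (column-major order)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]


def kron(P, Q):
    """Kronecker product of P and Q."""
    rq, cq = len(Q), len(Q[0])
    return [[P[i // rq][j // cq] * Q[i % rq][j % cq]
             for j in range(len(P[0]) * cq)]
            for i in range(len(P) * rq)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
X = [[2, 1], [1, 3]]              # 2 x 2
kron_AA = kron(A, A)              # 9 x 4

lhs = vec(matmul(matmul(A, X), transpose(A)))
rhs = matvec(kron_AA, vec(X))

# Diagonal case: keep only the columns that multiply diagonal entries of X.
k = len(A[0])
identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
keep = [idx for idx, flag in enumerate(vec(identity)) if flag == 1]
reduced = [[row[c] for c in keep] for row in kron_AA]
D = [[2, 0], [0, 5]]
lhs_diag = vec(matmul(matmul(A, D), transpose(A)))
rhs_diag = matvec(reduced, [2, 5])
```

Since everything is integer arithmetic, both comparisons hold exactly.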

With the default R formula syntax the * not only includes the interaction terms, but also includes the individual terms. If you just want the interaction term, then use the : operator. So in your case, you want fe1 <- summary(lm(qnorm(y) ~ factor(Bank) -1 + factor(Country):x ,data=PDwideHPI)) ...

Besides the elegant method from G.G there are other ways to handle ranges. You could use paste or sprintf to construct names or grep or match, all of those options potentially effective within "[" calls to restrict columns passed to the data argument. More complete answers would be possible after...

Maybe this will be of help: set.seed(1) X1 <- runif(50, 0, 1) X2 <- runif(50, 0, 10) # I included another variable just for a better demonstration Y <- runif(50, 0, 1) df <- data.frame(X1,X2,Y) rolling_lms <- lapply( seq(20,nrow(df) ), function(x) lm( Y ~ X1+X2, data = df[1:x , ])...

regression,matlab,gradient-descent

Not sure I'm following your logic, but it's quite obvious that 'e' (the error) should not be squared. Let's see what you should be using. theta is a column vector of unknowns, y is a column vector of measurements and X is the model matrix where each row is an...
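
To make the point about the unsquared error concrete, here is a rough batch gradient descent sketch in Python (not MATLAB). It assumes the usual cost J = sum((X*theta - y).^2) / (2*m), whose gradient is X' * (X*theta - y) / m: the square appears in the cost, not in the update. The data and step size are made up for the example.

```python
def predict(X, theta):
    """Model output X * theta, one value per row of X."""
    return [sum(x_ij * t for x_ij, t in zip(row, theta)) for row in X]


def cost(X, y, theta):
    """Half the mean squared error; this is where the square belongs."""
    m = len(y)
    return sum((p - t) ** 2 for p, t in zip(predict(X, theta), y)) / (2 * m)


def gradient_step(X, y, theta, alpha):
    """One update theta <- theta - alpha / m * X' * (X * theta - y)."""
    m = len(y)
    errors = [p - t for p, t in zip(predict(X, theta), y)]  # e is NOT squared here
    grad = [sum(errors[i] * X[i][j] for i in range(m)) / m
            for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grad)]


# First column of ones models the intercept; the data follow y = 1 + 2 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = [0.0, 0.0]
initial_cost = cost(X, y, theta)
for _ in range(5000):
    theta = gradient_step(X, y, theta, alpha=0.1)
final_cost = cost(X, y, theta)
```

With a step size this small relative to the data scale the iteration settles on theta close to [1, 2]; a larger alpha can make it diverge.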

python,machine-learning,scikit-learn,regression,decision-tree

According to the documentation, there's a tree_ attribute; you can traverse that tree to find any properties of interest. In particular, the children_right and children_left properties seem to be useful.

Upon further thinking (and reading an old article by Nick Cox), it occurred to me that statsby can be used to avoid the loop and speed up the program. Here's a comparison of their speed. Let's first prepare example data. set more off timer clear webuse nlswork,clear keep idcode ln_wage...

This brings up another important question: Is whether a variable is numeric or categorical a property of the data or a property of the analysis? Back in the early days of statistical computing it was easier to store categorical variables as numbers and therefore it was necessary to designate at...

python,r,machine-learning,statistics,regression

Note that this is not an exact answer. I seriously have no idea what you are trying to do. But I can suggest an approach. Assuming that there is only one peak in the graph and you have all the 2D point data, i.e. (X1,Y1)...(Xn,Yn)... Try calculating the differences...

The mefp/monitor functions can only deal with ts time series. Hence, you can either supply a data argument that is a (multivariate) ts, a data.frame where the response variable is a ts or a standalone ts without a data argument. In your case, the data appears to be quarterly and...

r,regression,decision-tree,non-linear-regression

Simulate some data to make a reproducible example: A=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(rep("False",100))) cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"]) cubist code called exit with value 1 Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds Great, now we're on the same page. What bothers me is that the second argument, the outcome, is all "False". I'm not...

When you create your normal stochastic with pymc.Normal('w0', 0, 0.000001), PyMC2 initializes the value with a random draw from the prior distribution. Since your prior is so diffuse, this can be a value which is so unlikely that the posterior is effectively zero. To fix, just request a reasonable...

c++,opencv,machine-learning,regression,random-forest

To use it for regression, you just have to set the var_type to CV_VAR_ORDERED, i.e. var_type.at<uchar>(ATTRIBUTES_PER_SAMPLE, 0) = CV_VAR_ORDERED; and you might want to set the regression_accuracy to a very small number like 0.0001f. ...

javascript,jquery,highcharts,regression

The fitData() function that you are using doesn't support this data format. When the x axis is a category axis, the data should be an array of numbers (e.g. var sourceData = [100,200,300,400];) Example: http://jsfiddle.net/t2tc93zh/ Source for plugin: http://rawgit.com/virtualstaticvoid/highcharts_trendline/master/regression.js...

TSTAT is a vector of dimension K+1, where K is the number of independent variables, therefore the following code should work: K = length(X(1,:)) n = 30; Tstat = zeros(n,K+1); # A matrix to store the n TSTAT vectors, each one of size K+1 for i = 1:n [B,TSTAT]...

r,regression,linear-regression,rms

They're computed on the fly. Digging inside rms:::print.fastbw (the print method for objects of class fastbw) you can find: cof <- coef(x) vv <- if (length(cof) > 1) diag(x$var) else x$var z <- cof/sqrt(vv) stats <- cbind(cof, sqrt(vv), z, 1 - pchisq(z^2, 1)) (if you want more accurate small p-values,...

You should look into Regression\Matrix methods as Mark Baker suggests in the comment: there should be some method exposing the protected MainMatrix member. And if there isn't any... it looks like an object can be cast to an (associative) array, in which case the protected members have keys prefixed with chr(0).'*'.chr(0) (see @fardelian's comment here)....

r,statistics,regression,regression-testing

I haven't used regsubsets() before, but the way I see it you can simply set the intercept parameter to FALSE, check ?regsubsets. Example: data(swiss) a.fit <- regsubsets(Fertility ~ ., data = swiss, nvmax = 10, intercept = F) minimum <- which.min((summary(a.fit)$cp)) # 4 coef(a.fit, minimum) Agriculture Education Catholic Infant.Mortality 0.11714390...

r,regression,stata,logistic-regression,standard-error

The default so-called "robust" standard errors in Stata correspond to what sandwich() from the package of the same name computes. The only difference is how the finite-sample adjustment is done. In the sandwich(...) function no finite-sample adjustment is done at all by default, i.e., the sandwich is divided by 1/n...

If your data is in a data.frame x, and each row corresponds to an observation, then the way to go about this is to identify complete cases via complete.cases(x). Conversely, to find missing values in an observation, do ! complete.cases(x). To find out which observation contains missing values, do which(!...
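
For comparison, the same logic can be sketched in plain Python, where a missing value is taken to be None or NaN (that choice, and the function names, are assumptions of this example). Note that the indices are 0-based here, whereas R's which() is 1-based.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows):
    """One boolean per row: True if the row has no missing values."""
    return [not any(is_missing(v) for v in row) for row in rows]


rows = [
    (1.0, 2.0, "a"),
    (None, 3.0, "b"),
    (4.0, float("nan"), "c"),
    (5.0, 6.0, "d"),
]

mask = complete_cases(rows)
incomplete_idx = [i for i, ok in enumerate(mask) if not ok]
clean_rows = [row for row, ok in zip(rows, mask) if ok]
```

The mask plays the role of complete.cases(x), and incomplete_idx corresponds to which(!complete.cases(x)) shifted by one.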

You are using predict.segmented incorrectly. As with nearly all predict() functions, your newdata parameter should be a data.frame, not a vector. Also, it needs to have names that match the variables used in your regression. Try predict.segmented(seg_model, data.frame(x=xtest)) instead. When using a function for the first time, be sure the...

r,ggplot2,regression,linear-equation

You can use annotate to place text on your figure library(ggplot2) ggplot(file, aes(x, y, color=outlier)) + geom_point() + annotate("text", c(-1,-1), c(3,4), label=equation_end$formula) If you want the text the same color as some lines, try using geom_text, ggplot(file, aes(x, y, color=outlier)) + geom_point() + geom_smooth(fill=NA) + geom_text(data=equation_end, aes(x=c(-1,-1), y=c(3,4), label=formula), show_guide=F)...

A simplification of your method, using dictionary.columns. This is completely identical to the proc contents method in what ultimately happens, it just makes it take a bit less code and a bit less time. proc sql; select name into :reg_varlist separated by ' ' from dictionary.columns where libname='WORK' and memname='MYDAT'...

I would probably recommend using predict() for this. The intercept is just the value at x=0, and the slope is the difference in the values between x=1 and x=0. So you can do int <- predict(m, cbind(groups,x=0)) t1 <- predict(m, cbind(groups,x=1)) data.frame(group=groups$groups, int=int, slope=t1-int) You didn't set a seed...
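
The trick itself is language independent. A tiny Python sketch, with two hypothetical per-group prediction functions standing in for the fitted model:

```python
def intercept_and_slope(predict):
    """Recover (intercept, slope) of a model that is linear in x from two predictions."""
    at_zero = predict(0.0)   # prediction at x = 0 is the intercept
    at_one = predict(1.0)    # one unit further along x
    return at_zero, at_one - at_zero


# Hypothetical fitted models, one per group; in practice these would wrap predict().
group_models = {
    "a": lambda x: 2.0 + 0.5 * x,
    "b": lambda x: -1.0 + 3.0 * x,
}

summary = {group: intercept_and_slope(f) for group, f in group_models.items()}
```

This only works when the model is linear in x; with polynomial or spline terms the difference between x=1 and x=0 is no longer "the" slope.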

The glmnet package should be useful. There is a great tutorial by the authors. But, here is a quick start using your code. require(glmnet) est <- glmnet(as.matrix(data[,2:16]), data$PlacedN , family="binomial") summary(est) plot(est) last <- dim(coef(est))[2] coef(est)[last] Hope this helps!...

It sounds as if you want to suppress the linear predictor scale altogether. You can do that with nomogram(..., lp=FALSE). To add a scale with predicted probabilities use nomogram(..., fun=plogis, funlabel='Predicted Risk'). Your question confused the linear predictor with your notion of a "final" output of logistic regression, the predicted...

r,regression,correlation,weighted-average

The answer can be found in this CERN paper: ftp://ftp.desy.de/pub/preprints/cern/ppe/ppe94-185.ps.gz The procedure is a generalised least squares regression. See equation (2) on page 1 for the result....

regression,development-environment,agile,scrum

An extensive period of regression testing is something you should try to avoid at all cost when you want to practice Scrum. Not fully testing your software during or at the end of each sprint breaks one of the most important rules of Scrum, namely that the increment you deliver...

I would recommend a Fourier regression, rather than polynomial regression, i.e. rho = a0 + a1 * cos(theta) + a2 * cos(2*theta) + a3 * cos(3*theta) + ... + b1 * sin(theta) + b2 * sin(2*theta) + b3 * sin(3*theta) + ... for example, given the following points >> plot(x, y,...
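
Here is a self-contained Python sketch of such a Fourier regression, under my own assumptions: the model is truncated at a chosen order, the coefficients are found by ordinary least squares via the normal equations, and the small Gaussian-elimination solver is hand-written so that no extra library is needed. The test data are synthetic.

```python
import math


def fourier_design_row(theta, order):
    """Row [1, cos(theta), ..., cos(K*theta), sin(theta), ..., sin(K*theta)]."""
    row = [1.0]
    row += [math.cos(k * theta) for k in range(1, order + 1)]
    row += [math.sin(k * theta) for k in range(1, order + 1)]
    return row


def solve_linear_system(M, rhs):
    """Solve M * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a lower order or more points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_fourier(thetas, rhos, order):
    """Least-squares coefficients [a0, a1..aK, b1..bK] via the normal equations."""
    rows = [fourier_design_row(t, order) for t in thetas]
    p = 2 * order + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    moment = [sum(r[i] * rho for r, rho in zip(rows, rhos)) for i in range(p)]
    return solve_linear_system(gram, moment)


# Synthetic closed curve: rho = 3 + 0.5*cos(theta) - 0.2*sin(2*theta).
thetas = [2 * math.pi * i / 40 for i in range(40)]
rhos = [3.0 + 0.5 * math.cos(t) - 0.2 * math.sin(2 * t) for t in thetas]
coeffs = fit_fourier(thetas, rhos, order=2)   # [a0, a1, a2, b1, b2]
```

The model is linear in the coefficients, so no starting values are needed, and the fitted curve is automatically periodic in theta, which a polynomial in theta is not.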