r,data.table,linear-regression

Here is a data.table solution using dcast.data.table, which takes data in the long format (your input) and converts it to the wide format required for the lm call. lm(`1` ~ ., dcast.data.table(dtData, Date ~ SecId, fill=0)) Here is the output of the dcast call: Date 1 2 3 1: 2014-01-02...

r,data.frame,linear-regression

Here's a pretty general strategy dd<-read.table(text="a1 a2 a3 a4 1 3 3 5 5 2 4 3 5 5 3 5 4 6 5 4 6 5 7 3", header=T) mm<-diag(ncol(dd)) mm[lower.tri(mm)] <- combn(dd, 2, function(x) coef(lm(x[,2]~x[,1]+0))) mm[upper.tri(mm)] <- rev(combn(dd[length(dd):1], 2, function(x) coef(lm(x[,2]~x[,1]+0)))) This gives the matrix mm # [,1]...

python,machine-learning,scikit-learn,linear-regression,multivariate-testing

This is a mathematical/stats question, but I will try to answer it here anyway. The outcome you see is absolutely expected. A linear model like this won't take correlation between dependent variables into account. If you had only one dependent variable, your model would essentially consist of a weight vector...

r,statistics,plyr,apply,linear-regression

I don't know how helpful this will be in a linear regression, but you could do something like this: df <- read.table(header=T, text="Assay Sample Dilution meanresp number 1 S 0.25 68.55 1 1 S 0.50 54.35 2 1 S 1.00 44.75 3") Using lapply: > lapply(2:nrow(df), function(x) df[(x-1):x,] ) [[1]]...

This one is pretty simple, but only after going round in circles for a while with that second if() statement in crunch(). Looking at the summary method for caic, it's just a subset of the entire summary / model > summary.caic function (object, ...) { summary(object$mod, ...) } <environment: namespace:caper>...

Note that lm() returns an object of class "lm" and summary() on that object produces a "summary.lm" object. There are custom print.lm() and print.summary.lm() methods, so whatever is printed to the console may differ from what's in the object itself. When you manually concatenate (c()) two summary.lm objects,...

python,numpy,matplotlib,linear-regression

The most common approach is to vary the color and/or size of the scatter symbols. For example: import numpy as np import matplotlib.pyplot as plt np.random.seed(2) ## generate a random data set x, y = np.random.randn(2, 30) y *= 100 z = 11*x + 3.4*y - 4 + np.random.randn(30) ##the...

The Apache Commons Math class SimpleRegression does it all. You use addData() to input the x and y values. You use getSlope() and getIntercept() to get the equation of the regression line. You use getR() to get the correlation coefficient. It couldn't be easier!...
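
For readers who want to see what those three getters compute, here is a minimal pure-Python sketch of the same quantities (slope, intercept, correlation coefficient). The function name simple_regression and the sample data are assumptions made for illustration; this is not the Commons Math implementation.

```python
import math


def simple_regression(xs, ys):
    """Return (slope, intercept, r) of the ordinary least-squares line.

    Hypothetical helper for illustration; it mirrors what getSlope(),
    getIntercept() and getR() report, not how the library computes them.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up data lying exactly on y = 2x + 1.
slope, intercept, r = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

The sketch assumes the x values are not all identical and the y values are not all identical, since either case would divide by zero.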

r,design,matrix,linear-regression

Yes. table_design<-data.frame( Treatment=rep(c("A","B"), each=10), Time=rpois(20,20) ) all.equal( model.matrix(~ Treatment + Time + Treatment*Time, table_design), model.matrix(~ Treatment*Time, table_design) ) # [1] TRUE ...

python,pandas,regression,linear-regression

pandas and statsmodels work beautifully together for things like this; see this example: In [16]: import statsmodels.formula.api as smf In [17]: df=pd.DataFrame(np.random.random((10,2)), columns=['A','B']) In [18]: df.index=pd.date_range('1/1/2014', periods=10) In [19]: dfd=df.diff().dropna() In [20]: print df A B 2014-01-01 0.455924 0.375653 2014-01-02 0.585738 0.864693 2014-01-03 0.201121 0.640144 2014-01-04 0.685951 0.256225 2014-01-05 0.203623...

python,statistics,linear-regression

If you can use a least squares fit, you can calculate the slope, y-intercept, correlation coefficient, standard deviation of the slope, and standard deviation of the y-intercept with the following function: import numpy as np def lsqfity(X, Y): """ Calculate a "MODEL-1" least squares fit. The line is fit by...

python,linear-regression,least-squares

What makes you so sure your residuals are not normally distributed? One way to check for this assumption is to use a Q-Q plot. From a pragmatic perspective, most people will just look at a scatterplot of their data to see whether residuals are normally distributed. Often a violation of...

machine-learning,statistics,linear-regression

Linear regression assumes your data is linear, even in multiple dimensions. It won't be possible to visualize high-dimensional data unless you use some method to reduce its dimensionality. PCA can do that, but bringing it down to 2 dimensions won't be helpful. You should do Cross...

models <- lapply(dsets, function(data){ lm(reformulate(termlabels=".", response=names(data)[1]), data) }) reformulate allows you to construct a formula from character strings....

Try this: in ui.R sliderInput(inputId="XValues", label="This slider determines Xinf and Xsup or whatever", min=0, max=1000, value=c(0,1000)) And in your server.R: FitWeibull <- function(data, xinf, xsup){ sub.data <- data[data$X >= log(input$XValues[1]) & data$X <= log(input$XValues[2]), ] my.lm <- lm(Y~X, data = sub.data) return(my.lm) } I just put in 0 and 1000 as...

machine-learning,linear-regression

Yes. You will essentially have 10 Million predictor variables. This is unavoidable if you are doing regression/classification unless you want to club "similar" keywords together to reduce the number of predictor variables. E.g. you can club keyword_1, keyword_2, keyword_3 into a single keyword if they share a specific relation among...

If you only need a single prediction, you can grab the coefficients with coef(mod), or you can build a simple equation like this, where Your_Value is the numeric predictor value: coef(mod)[1] + Your_Value*coef(mod)[2] ...

Here's a vote for the plyr package and ddply(). plyrFunc <- function(x){ mod <- lm(b~c, data = x) return(summary(mod)$coefficients[2,3]) } tStats <- ddply(dF, .(a), plyrFunc) tStats a V1 1 a 1.6124515 2 b -0.1369306 3 c 0.6852483 ...

r,plot,dataset,linear-regression

Your DATE variable is a factor (or possibly a character). You need to recode it to numeric - but be careful to do this right and not get the internal factor coding, so recode first as Date, then as numeric. Read the data: d <- read.csv("regresion.csv",sep=";") Convert your dates (which...

python,regression,linear-regression

sm is not defined until you import it. By convention, sm is an alias for statsmodels.api, so use import statsmodels.api as sm and you can then call sm directly.

c++,algorithm,machine-learning,artificial-intelligence,linear-regression

Looks like everything is behaving as expected, but you are having problems selecting a reasonable learning rate. That's not a totally trivial problem, and there are many approaches ranging from pre-defined schedules that progressively reduce the learning rate (see e.g. this paper) to adaptive methods such as AdaGrad or AdaDelta....
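
As a rough illustration of a pre-defined schedule, here is a minimal Python sketch of batch gradient descent on a one-variable linear regression where the learning rate shrinks as base_rate / (1 + decay * epoch). The function name, the schedule form, the constants and the data are all assumptions chosen for the example; they are not taken from the question or from any particular paper.

```python
def fit_line_gradient_descent(xs, ys, base_rate=0.1, decay=0.01, epochs=5000):
    """Fit y = w*x + b by batch gradient descent with a decaying step size.

    Hypothetical sketch: the inverse-time decay schedule below is just one
    of many possible pre-defined schedules.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for epoch in range(epochs):
        # Learning rate shrinks progressively as training proceeds.
        rate = base_rate / (1.0 + decay * epoch)
        # Gradients of the half mean squared error with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= rate * grad_w
        b -= rate * grad_b
    return w, b


# Made-up data lying exactly on y = 2x + 1.
w, b = fit_line_gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

If base_rate is set too high for the scale of the inputs, the early iterations can still diverge before the decay takes effect, so the starting value has to be tuned to the data.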

data,weka,data-mining,equation,linear-regression

If the vendor is equal to any of the nominal values listed on the line, the indicator is one; otherwise it is zero. For example, in line 1: -152.7641 * vendor=microdata,prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl 152.7641 would be subtracted from the predicted value if and only if the vendor is equal to one of...
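
The indicator logic can be sketched in a few lines of Python. The helper name nominal_term is hypothetical, the set of levels is an abbreviated subset of the list above, and "apollo" is simply assumed to be a vendor that does not appear in that list.

```python
def nominal_term(coefficient, value, levels):
    """Contribution of one nominal term of the model equation.

    The term contributes its coefficient when the value is one of the
    listed levels (indicator = 1) and nothing otherwise (indicator = 0).
    """
    return coefficient if value in levels else 0.0


# Abbreviated subset of the vendors listed on line 1 of the model output.
line1_levels = {"microdata", "prime", "ibm"}

listed = nominal_term(-152.7641, "ibm", line1_levels)       # indicator is 1
unlisted = nominal_term(-152.7641, "apollo", line1_levels)  # indicator is 0
```

A full prediction would sum one such term per line of the model, plus the numeric terms and the constant.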

python,numpy,scikit-learn,linear-regression

First of all, a suggestion not directly related to your question: You don't need to do x = np.array(list(data.charge_time)), you can directly call x = np.array(data.charge_time) or, even better, x = data.charge_time.values which directly returns the underlying ndarray. It is also not clear to me why you're adding a dimension...

r,for-loop,apply,linear-regression,lapply

Replace the columns which are all NA with zeros: Coef <- function(x) { DF <- setNames(as.data.frame(t(x[-(1:2)])), x$Variable) DF[colSums(is.na(DF)) == nrow(DF)] <- 0 coef(lm(var1 ~., DF)) } do.call(rbind, by(Q, Q$Country, Coef)) giving: (Intercept) var2 var3 CountryA 0.01863015 NA -0.1982462 CountryB 0.26296826 -0.35416216 NA CountryC 0.11098809 -0.07225439 0.1439667 ...

r,closures,wrapper,linear-regression

The error you are getting is from either the subset or na.action arguments, which are optional but don't have defaults. Hence, when you call the function without specifying them, they are passed as they are from the wrapper, which is type closure. The simplest solution is to pass everything to...

java,classification,weka,linear-regression

Linear Regression should accept both nominal and numeric data types. It is simply that the target class cannot be a nominal data type. The Model's toString() method should be able to spit out the model (other classifier options may also be required depending on your needs), but if you...

r,dynamic,linear-regression,predict

Unfortunately, the dynlm package does not provide a predict() method. At the moment the package completely separates the data pre-processing (which knows about functions like d(), L(), trend(), season() etc.) and the model fitting (which itself is not aware of the functions). A predict() method has been on my wishlist...

With the Apache Commons Math 3 library you can do it as shown below. The sample is based on the SimpleRegression class: import org.apache.commons.math3.stat.regression.SimpleRegression; public class Try_Regression { public static void main(String[] args) { // creating regression object, passing true to have intercept term SimpleRegression simpleRegression = new SimpleRegression(true); // passing data to the model // model...

You can turn "myForm" into a formula using as.formula(): myForm <- "Species~Petal.Length" class(myForm) # [1] "character" myForm <- as.formula(myForm) class(myForm) # [1] "formula" myForm # Species ~ Petal.Length lda(formula=myForm, data=iris) # Call: # lda(myForm, data = iris) # Prior probabilities of groups: # setosa versicolor virginica # 0.3333333 0.3333333 0.3333333...

It seems to me you are describing a form of dummy variable coding. This is not necessary in R at all, since any factor column in your data will automatically be dummy coded for you. Recreate your data: dat <- read.table(text=" InputName InputValue Output Oxide 35 0.4 Oxide 35.2 0.42...

c#,math,linear-regression,mathnet

The exception text is really bad in this case, we should fix this. There are two concrete problems causing this to fail: The system with 3 unknowns but only 2 equations/samples (2x3 matrix) is under-defined; Applying a regression to such a problem does not actually make any sense as there...

matlab,regression,linear-regression

regress expects its inputs as column vectors. Transposing (.') your inputs should do the trick: >> b = regress( Y.', X.' ) b = 0.4291 ...

r,optimization,linear-regression,least-squares

You are describing linear regression, which can be done with the lm function: coefficients(lm(v~t(B)+0)) # t(B)1 t(B)2 t(B)3 # 0.2280676 -0.1505233 0.7431653 ...

matlab,neural-network,linear-regression,backpropagation,perceptron

A neural network will generally not find or encode a formula like t = a + b*X1 + c*X2, unless you built a really simple one with no hidden layers and linear output. If you did then you could read the values [a,b,c] from the weights attached to bias, input...

List comprehensions are a good, clean solution: sublist = [[a[0], a[1]] for a in list] subarray = [a[2] for a in list] ...

The difference is due to the presence or absence of an intercept: in statsmodels.formula.api, similarly to the R approach, a constant is automatically added to your data and an intercept is fitted; in statsmodels.api, you have to add a constant yourself (see the documentation here). Try using add_constant from statsmodels.api x1...

numpy,linear-regression,standard-deviation

As already mentioned by @ebarr in the comments, you can use np.polyfit to return the residuals by using the keyword argument full=True. Example: x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]) y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0]) z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 3, full=True) residuals...

Following this example you could do this: lapply(1:3, function(i){ lm(as.formula(sprintf("y ~ x%i + x4 + x5", i)), a) }) ...

sql,sql-server,linear-regression

Short answer is: calculating trend line for dates is pretty much the same as calculating trend line for floats. For dates you can choose some starting date and use number of days between the starting date and your dates as an X. I didn't check your function itself and I...
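
The date-to-number conversion described above can be sketched in Python as follows. The function name trend_line, the choice of the earliest date as the starting date, and the sample data are assumptions for illustration; the same arithmetic carries over to SQL using a day-difference function.

```python
from datetime import date


def trend_line(dates, values):
    """Least-squares trend line with X = days elapsed since the first date.

    Hypothetical helper: returns (start_date, slope_per_day, intercept).
    """
    start = min(dates)
    # Turn each date into a plain number: days since the starting date.
    xs = [(d - start).days for d in dates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return start, slope, intercept


# Made-up data: the value grows by exactly 2 per day starting from 10.
dates = [date(2014, 1, 1), date(2014, 1, 3), date(2014, 1, 6)]
values = [10.0, 14.0, 20.0]
start, slope, intercept = trend_line(dates, values)
```

To evaluate the trend for a new date, convert it to days since start in the same way and compute intercept + slope * days.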

python,pandas,scikit-learn,linear-regression

The train_test_split function from sklearn (see docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) is random, so it is logical you get different results each time. You can pass an argument to the random_state keyword to have it the same each time.
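
The effect of fixing the seed can be illustrated with a small standard-library analogue. This is only a sketch of the idea: seeded_split is a hypothetical helper, not scikit-learn's implementation, and its shuffling will not reproduce the splits that train_test_split produces.

```python
import random


def seeded_split(items, test_fraction=0.25, seed=None):
    """Shuffle items and split them into (train, test) lists.

    With seed=None every call shuffles differently; with a fixed seed
    the same split is returned on every call.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Same seed, same split, which is the role random_state plays in sklearn.
train_a, test_a = seeded_split(range(20), seed=0)
train_b, test_b = seeded_split(range(20), seed=0)
```

Fixing the seed makes results reproducible, but a model score that changes a lot between seeds is a sign that the evaluation itself is unstable.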

Yes, you should. If you want to normalize it between 0 and 1, you could use the mat2gray function (assuming "vector" is your list of variables). norm_vect = mat2gray(vector); This function is meant for converting a matrix into a grayscale image, but it works well if you don't want to write your own. You also...
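
If you would rather write the scaling yourself, min-max normalization to the range 0 to 1 is only a few lines. Here is a minimal Python sketch; the function name and the handling of a constant input (returning zeros) are assumptions of this example, not a description of mat2gray's behaviour in that edge case.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant input has no spread, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 10.0])
```

When normalizing features for a regression, compute the minimum and maximum on the training data and reuse them for new data, so both are on the same scale.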

Your data function runs for the first time when your Shiny app starts. Because the input file is missing, it returns NULL. But your plotting function doesn't test its input, so it is equivalent to plot(NULL) in this case. If you test the input with !is.null(), it should work: output$contents <-...

As dennis mentioned, a different set of basis functions might do better. However, you can improve the polynomial fit with QR factorisation, rather than just using \ to solve the matrix equation. It is a badly conditioned problem no matter what you do, however, and using smooth basis functions won't allow...

r,statistics,linear-regression

This is the typical problem of having different levels in the factor variables between the folds in the cross validation. The algorithm creates dummy variables for the training set but the test set has different levels to the training set and thus the error. The solution is to create the...

It's the equation you solve to find the coefficients, y=Vc, now you know V and c so use that to find the corresponding y: yFit=[xs.^3 xs.^2 xs xs.^0]*c; Are you happy with the setup to find c using V\y, do you understand how/why it works? EDIT: To fit higher-degree functions...

You can see the code that is used to print the summary by typing class(sumres) #> "summary.lm" to get the class, and then get the code for the print method by typing stats:::print.summary.lm into the console which includes these lines: cat(...lots of stuff..., "p-value:", format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE),...

machine-learning,classification,linear-regression

Neither of these datasets can be modeled using linear classification/regression. As for the "input data transformation": as long as the dataset is consistent (there are no two identical points with different labels), there always exists a transformation after which the data is linearly separable. In particular one can construct it with:...

python-2.7,numpy,machine-learning,linear-regression

All you need is x = data[['Col1', 'Col2']] ...

python,statistics,scikit-learn,linear-algebra,linear-regression

Using this kronecker product identity it becomes a classic linear regression problem. But even without that, it is just the transpose of a linear regression setup. import numpy as np m, n = 3, 4 N = 100 # num samples rng = np.random.RandomState(42) W = rng.randn(m, n) X =...

python,scikit-learn,linear-regression

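Make a list for each row, so that every sample is itself a list (note x.append([ii]) rather than x.append(ii)); the value passed to predict must be 2D in the same way: x = [] y = [] for ii in range(0,100): x.append([ii]) y.append(ii) clf = LinearRegression() clf.fit(x, y) clf.predict([[101]]) Output: array([ 101.])...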

r,regression,linear-regression,glmnet

This can be achieved by providing a penalty.factor vector, as described in ?glmnet. A penalty factor of 0 indicates that the "variable is always included in the model", while 1 is the default. glmntfit <- cv.glmnet(mydata[,-1], mydata[, 1], penalty.factor=c(0, rep(1, ncol(mydata) - 2))) ...

r,regression,linear-regression

Sometimes (with certain versions of R, as Andrew points out) float entries in a CSV are long enough that it thinks they are strings and not floats. In this case, you can do the following data <- read.csv("filename.csv") data$some.column <- as.numeric(as.character(data$some.column)) Or you could pass stringsAsFactors=F to the read.csv call,...

matlab,oop,time-series,linear-regression,superclass

To determine this, you can use the superclasses function: superclasses('LinearModel') superclasses('GeneralizedLinearMixedModel') This will return the names of the visible superclasses for each case. As you'll see, both inherit from the abstract superclass classreg.regr.ParametricRegression. You can also view the actual classdef files and look at the inheritances. In your Command Window,...

If you read the documentation for predict.lm, you will see the following. So, use the newdata argument to pass the newmodel data you imported to get predictions. predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf, interval = c("none", "confidence", "prediction"), level = 0.95, type = c("response", "terms"),...

r,linear-regression,data-analysis

If the columns are exactly the same, you should be able to do something like this: data_2013_and_2014 <- rbind(data2013, data2014) new_model <- lm(x ~ y, data = data_2013_and_2014) Look up ?rbind for more details....

python,scikit-learn,linear-regression,statsmodels

Try lm.params instead of lm.params(). The latter tries to call params as a function (which it isn't)...

Replace the zeros with NAs. In that case the default na.action = na.omit argument to lm will drop them automatically. RETS.na <- replace(RETS, RETS == 0, NA) or if you want to drop all rows for which there is an NA in any column then: RETS.na <- na.omit(RETS.na) To get...

r,loops,repeat,linear-regression

You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint. # number of columns in the Lung and Blood data.frames. 22,000 for you? n <- 10 # dummy data obs <- 50 # observations Lung <- data.frame(matrix(rnorm(obs*n), ncol=n)) Blood <- data.frame(matrix(rnorm(obs*n),...

First of all, you really should avoid using attach. And for functions that have data= parameters (like plot and lm), it's usually wiser to use that parameter rather than with(). Also, abline() is a function that should be called after plot(). Putting it in as a parameter to plot() doesn't really...

Set the weights of those points to zero, then update the model: w <- abs(rstudent(lm1)) < 3 & abs(cooks.distance(lm1)) < 4/nrow(lm1$model) lm2 <- update(lm1, weights=as.numeric(w)) This is probably a weak approach statistically, but at least the code isn't too hard......

Here is one approach, adapted from ?raster::localFun set.seed(0) b <- stack(system.file("external/rlogo.grd", package="raster")) x <- flip(b[[2]], 'y') + runif(ncell(b)) y <- b[[1]] + runif(ncell(b)) # local regression: rfun <- function(x, y, ...) { d <- na.omit(data.frame(x, y)) if (nrow(d) < 3) return(NA) m <- lm(y~x, data=d) # return slope coefficients(m)[2] }...

python,statistics,linear-regression,statsmodels

A linear hypothesis has the form R params = q where R is the matrix that defines the linear combination of parameters and q is the hypothesized value. In the simple case where we want to test whether some parameters are zero, the R matrix has a 1 in the...

r,data.frame,subset,linear-regression

First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

r,regression,linear-regression

You need good starting values: #starting values from linearization fit0 <- lm(log(y) ~ log(x1) + log(x2) +log(x3), data=dat) # fit a nonlinear model fm <- nls(y ~ f(x1,x2,x3,a,b1,b2,b3), data = dat, start = list(a=exp(coefficients(fit0)[1]), b1=coefficients(fit0)[2], b2=coefficients(fit0)[3], b3=coefficients(fit0)[4])) summary(fm) # Parameters: # Estimate Std. Error t value Pr(>|t|) # a 265.19567...

statistics,calculator,linear-regression,ti-basic

Never mind, I figured it out. One only needs to use ShowStats and hit enter and it will show you the regression stuff....

r,statistics,linear-regression

I wouldn't use a polynomial to model variances. Among the variance functions offered by package nlme is varConstPower. Let's try this: n <- c(1, 2, 4, 8, 16, 32) v <- c(5.85, 6.35, 6.55, 6.85, 7.02, 7.15) plot(v ~ n) fit_ConstPower <- nls(v ~ n^(2*theta) + c, start = list(theta...

python,numpy,machine-learning,linear-regression

The problem is: features.transpose().dot(features) may not be invertible, and numpy.linalg.inv works only for full-rank matrices according to the documentation. However, a (non-zero) regularization term always makes the equation nonsingular. By the way, you are right about the implementation. But it is not efficient. An efficient way to solve this equation...
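
To see why the regularization term removes the singularity, here is a small pure-Python sketch with two identical feature columns, so that the Gram matrix X^T X has determinant zero. The helper names, the data and the value lam = 0.1 are assumptions for illustration; the 2x2 system is solved with Cramer's rule only to keep the example dependency-free.

```python
def gram_matrix(rows):
    """Return X^T X for a list of 2-feature rows."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    return [[a, b], [b, d]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def solve2(m, v):
    """Solve the 2x2 system m w = v with Cramer's rule (needs det != 0)."""
    det = det2(m)
    w0 = (v[0] * m[1][1] - m[0][1] * v[1]) / det
    w1 = (m[0][0] * v[1] - v[0] * m[1][0]) / det
    return [w0, w1]


# Two identical feature columns make X^T X rank-deficient.
rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
targets = [2.0, 4.0, 6.0]

gram = gram_matrix(rows)
det_plain = det2(gram)  # singular: inverting this matrix would fail

lam = 0.1  # hypothetical regularization strength
ridge = [[gram[0][0] + lam, gram[0][1]], [gram[1][0], gram[1][1] + lam]]
det_ridge = det2(ridge)  # strictly positive once lam > 0

xty = [sum(r[0] * t for r, t in zip(rows, targets)),
       sum(r[1] * t for r, t in zip(rows, targets))]
weights = solve2(ridge, xty)
```

In this example the regularized solution splits the weight evenly between the two duplicated features, which is the usual ridge behaviour for perfectly collinear columns.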

r,regression,linear-regression,rms

They're computed on the fly. Digging inside rms:::print.fastbw (the print method for objects of class fastbw) you can find: cof <- coef(x) vv <- if (length(cof) > 1) diag(x$var) else x$var z <- cof/sqrt(vv) stats <- cbind(cof, sqrt(vv), z, 1 - pchisq(z^2, 1)) (if you want more accurate small p-values,...

r,loops,window,time-series,linear-regression

So the way I figured out how to carry out the regression was like this. I formatted my data into a zoo matrix. library(zoo) Yvar.nd <- Yvar[,-1] Yvar.mt <- as.matrix(Yvar.nd) Yvar.mt.zoo <- zoo(Yvar.mt) #Steprollbig big model rolling no coef coercsion steprollbig4 <- rollapply(Yvar.mt.zoo, width = 36, function(x) step(lm(Yvar ~ ACWI...