python,matplotlib,statsmodels,seaborn

As I mention in my comments, there are two ways I would think about doing this. The first is to define a function that does the fit and then plots and pass it to FacetGrid.map: import pandas as pd import seaborn as sns tips = sns.load_dataset("tips") def plot_good_tip(day, total_bill, **kws):...

python,r,time-series,statsmodels

Clearly you have some seasonality in your data, so ARMA models and stationarity tests need to be done carefully. Apparently, the reason for the difference in the ADF test between Python and R is the default number of lags each package uses. > (nobs=length(dataseries)) [1] 91 > 12*(nobs/100)^(1/4) #python default [1]...

python,statistics,models,statsmodels

I don't see any problem. Generalized Linear Models are Maximum Likelihood models, if the scale is the one implied by the family. statsmodels.GLM doesn't currently implement Quasi-Likelihood methods where the scale can deviate from those of the underlying family, e.g. overdispersed Poisson, so the Likelihood Ratio test can be applied....

python,documentation,statsmodels

Inference for parameters is the same across models and is mostly inherited from the base classes. Quantile regression has a model specific covariance matrix of the parameters. tvalues, pvalues, confidence intervals, t_test and wald_test are all based on the assumption of an asymptotic normal distribution of the estimated parameters with...

results.params contains the "slopes" for all the variables used.

python,numpy,pandas,statsmodels

You can use a dictionary: >>> fit.predict({"b": example_df["c"]}) array([ 0.84770672, -0.35968269, 1.19592387, -0.77487812, -0.98805215, 0.90584753, -0.15258093, 1.53721494, -0.26973941, 1.23996892]) or create a numpy array for the prediction, although that is much more complicated if there are categorical explanatory variables: >>> fit.predict(sm.add_constant(example_df["c"].values), transform=False) array([ 0.84770672, -0.35968269, 1.19592387, -0.77487812, -0.98805215, 0.90584753, -0.15258093,...

You need to import statsmodels.api as sm

pandas,matplotlib,statsmodels,mosaic-plot

According to the docs the first parameter should be a contingency table. The fact that your way of doing things works at all seems to be an undocumented feature. The behaviour you're seeing (including your "funny" looking labels) is because many of the entries in your contingency table are zero,...

You can find all the examples directly on GitHub. The only example I found that uses wald_test was ex_ols_robustcov.py.

I used your data and this code: mosaic(myDataframe, ['size', 'length']) and got the chart like this: ...

As the documentation shows, if the keyword typ is passed to the predict method, the answer can be shown in the levels of the original endogenous variable: typ : str {‘linear’, ‘levels’} ‘linear’ : Linear prediction in terms of the differenced endogenous variables. ‘levels’ : Predict the levels of the original endogenous variables....

python,time-series,forecasting,statsmodels

Two problems. As the error message indicates, '2014-1-3' isn't in your data. You need to start the prediction within one time step of your data, as the docs should mention. Second problem, your data doesn't have a defined frequency. By removing the holidays from the business day frequency data, you...

python,sample,weight,statsmodels,logistic-regression

Took me a while to work this out, but it is actually quite easy to create a logit model in statsmodels with weighted rows / multiple observations per row. Here's how it's done: import statsmodels.api as sm logmodel=sm.GLM(trainingdata[['Successes', 'Failures']], trainingdata[['const', 'A', 'B', 'C', 'D']], family=sm.families.Binomial(sm.families.links.logit)).fit() ...

That seems to be a misunderstanding. You can either convert a whole summary into latex via summary.as_latex() or convert its tables one by one by calling table.as_latex_tabular() for each table. The following example code is taken from statsmodels documentation. Note that you cannot call as_latex_tabular on a summary object. import...

python,logistic-regression,statsmodels

The result of the fit should have a method predict(). That is what you need to use to predict future values, for example: result = sm.Logit(outcomes, values).fit() result.predict([82,45,2]) ...

Use try: ... except: ... to catch the exception and continue for p in range(6): for d in range(2): for q in range(4): try: arima_mod=sm.tsa.ARIMA(df,(p,d,q)).fit(transparams=True) x=arima_mod.aic x1= p,d,q print (x1,x) aic.append(x) pdq.append(x1) except: pass # ignore the error and go on ...

As best I can tell, statsmodels 0.5.0 simply doesn't work with Python 3.4, even with Cython 0.20.1 (latest) installed. The latest master installed fine, however, so here's one approach if you're willing to use an unreleased version: git clone https://github.com/statsmodels/statsmodels cd statsmodels pip install . Update: This shouldn't be necessary...

python,pandas,statistics,data-analysis,statsmodels

Credit to @behzad.nouri, who penned this answer originally. I chose to post the entire answer rather than a link to it because links can sometimes change or be deleted. Perhaps you could detect high multicollinearity by inspecting the eigenvalues of the correlation matrix. A very low eigenvalue shows that the...
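A rough sketch of the eigenvalue check, assuming simulated data in which the third column is almost exactly the sum of the first two (the data and names are made up for illustration):

```python
import numpy as np

# Simulated regressors (illustrative only): x3 is nearly x1 + x2
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=1e-3, size=n)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)      # 3x3 correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order

smallest = eigvals[0]
direction = eigvecs[:, 0]                # eigenvector for the smallest eigenvalue
print(smallest)
print(direction)
```

The eigenvector that goes with the near-zero eigenvalue has sizeable entries for the columns involved in the near-linear dependence, which is how you can tell which variables are responsible.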

GMM and related IV estimators are still in the sandbox and have not been included in the statsmodels API yet. The import needs to be directly from the module from statsmodels.sandbox.regression import gmm Then, these classes can be accessed with, for example gmm.GMM The main models that are currently available...

python,arrays,numpy,statsmodels

The weights aren't normalized in the model in any way. You passed negative weights and as the docstring says, the sqrt of weights is used. This introduces NaNs, which is usually what the SVD convergence failure indicates.
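To see where the NaNs come from, here is the square root step on its own with plain numpy. This is only an illustration of the arithmetic, not the statsmodels internals, and the weights are made up:

```python
import numpy as np

weights = np.array([1.0, 2.0, -0.5])   # one negative weight (illustrative)

# Suppress the RuntimeWarning that numpy emits for sqrt of a negative number
with np.errstate(invalid="ignore"):
    sqrt_w = np.sqrt(weights)

has_nan = bool(np.isnan(sqrt_w).any())
print(sqrt_w, has_nan)
```

A quick check such as (weights >= 0).all() before fitting would catch this early.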

python,python-3.x,numpy,scipy,statsmodels

The three are very different but overlap in the parameter estimation for the very simple example with only one explanatory variable. By increasing generality: scipy.stats.linregress only handles the case of a single explanatory variable with specialized code and calculates a few extra statistics. numpy.polynomial.polynomial.polyfit estimates the regression for a polynomial...

python,scikit-learn,linear-regression,statsmodels

Try lm.params instead of lm.params() The latter tries to call the params as a function (which it isn't)...

python,time-series,statsmodels,autoregressive-models

There is nothing wrong. That's the behavior of a stationary ARMA process where predictions converge to the mean. If you have fixed seasonality, then you could difference the time series at the seasonal lag, i.e. use a SARIMA, and the prediction would converge to a fixed seasonal structure. If you...

The fit method of the linear models, discrete models and GLM, take a cov_type and a cov_kwds argument for specifying robust covariance matrices. This will be attached to the results instance and used for all inference and statistics reported in the summary table. Unfortunately, the documentation doesn't really show this...

Assuming df2 is your new out of sample DataFrame: model = sm.OLS(Y, X).fit() new_x = df2.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']].values new_x = sm2.add_constant(new_x) # sm2 = statsmodels.api y_predict = model.predict(new_x) >>> y_predict array([ 4.61319034, 5.88274588, 6.15220225]) You can assign the results directly to df2 as follows: df2.loc[:, 'Sales'] = model.predict(new_x) To...

See the typ keyword of predict in the docstring. It determines whether you get predictions in terms of differences or levels. The default is 'linear', i.e. differences, not levels. As an aside, your start should not be greater than your end. If this works, then this may NOT be giving you...

python,python-2.7,pandas,statsmodels

I think the information you're looking for is in design_info.column_names: >>> dm = dmatrix("carbs + score", dta) >>> dm.design_info DesignInfo(['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score'], term_slices=OrderedDict([(Term([]), slice(0, 1, None)), (Term([EvalFactor('carbs')]), slice(1, 3, None)), (Term([EvalFactor('score')]), slice(3, 4, None))]), builder=<patsy.build.DesignMatrixBuilder at 0xb03f8cc>) >>> dm.design_info.column_names ['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score']...

python,arrays,numpy,statsmodels

There's no such thing as WLS for one observation. The single weight would simply become 1 when they're normalized to sum to 1. If you want to do this, though I suspect you don't, just use OLS. The solution will be a consequence of the SVD, not any actual relationship...

python,matplotlib,pandas,statsmodels

This is a result of different common definitions between statistics and signal processing. Basically, the signal processing definition assumes that you're going to handle the detrending. The statistical definition assumes that subtracting the mean is all the detrending you'll do, and does it for you. First off, let's demonstrate the...

python,statistics,glm,statsmodels

There isn't, unfortunately. However, you can roll your own by using the model's hypothesis testing methods on each of the terms. In fact, some of their ANOVA methods do not even use the attribute ssr (which is the model's sum of squared residuals, thus obviously undefined for a binomial GLM)....

The vertical lines are pointwise prediction intervals. Prediction intervals combine the estimation error for the parameters and the variance for the inherent noise to get the confidence interval for the observed response variable, i.e. Murder Rate in this case. This is similar to the usual prediction confidence band. However, since...

python,time-series,forecasting,statsmodels

The code: predict_price1 = arma_mod1.predict(start_pred, end_pred, exog=True, dynamic=True) print ('Predicted Price (ARMAX): {}' .format(predict_price1)) has to be changed into: predict_price1 = arma_mod1.predict(start_pred, end_pred, external_df, dynamic=True) print ('Predicted Price (ARMAX): {}' .format(predict_price1)) that way it works! I compared the values without external_df and they were different which can be seen as a...

python,object,typeerror,result,statsmodels

prsquared is an attribute, not a function. Try: print(result.prsquared) ...

python,time-series,statsmodels

The constant is the zero-th element in params. E.g., params[0]. Your code should be fit = [] for t in range(result.k_ar, len(data)): value = result.params[0] for i in range(2, result.k_ar + 2): value += result.params[i - 1] * data[t - i + 1] fit.append(value) Or even easier, since we've made...

python,pandas,least-squares,statsmodels

statsmodels is not directly of any help here, at least not yet. I think your linearized non-linear least square optimization is essentially what scipy.optimize.leastsq does internally. It has several more user friendly or extended wrappers, for example scipy.optimize.curve_fit or the lmfit package. Statsmodels currently does not have a generic version...

python,r,logistic-regression,statsmodels

I'm not sure what your data manipulations are intended to do, but they seem to be losing information in the R run. If I keep all the rank information in, then I get this on the original data object, and the results look very similar in the areas where they overlap. (Likelihoods are only...

python,r,statistics,statsmodels

This link says it is determined using repeated KPSS tests. I see no reason why it couldn't be implemented in Python, it would just need to be written. Otherwise, you could use rpy2 and just call auto.arima from python. from rpy2 import * import rpy2.robjects as RO RO.r('library(forecast)') # use...

Ah, I see the issue. You don't have an ARIMA model. You have an ARMA model because d=0. ARMA.predict doesn't take a typ keyword argument because it doesn't need one.

python,statistics,scikit-learn,statsmodels,cvxopt

statsmodels has had for some time a fit_regularized for the discrete models including NegativeBinomial. http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.fit_regularized.html which doesn't have the docstring (I just saw). The docstring for Poisson has the same information http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.Poisson.fit_regularized.html and there should be some examples available in the documentation or unit tests. It uses an interior...

python,statistics,linear-regression,statsmodels

A linear hypothesis has the form R params = q where R is the matrix that defines the linear combination of parameters and q is the hypothesized value. In the simple case where we want to test whether some parameters are zero, the R matrix has a 1 in the...

python,numpy,time-series,statsmodels

I assume you want to test for expanding cointegration? Note that you should use sm.tsa.coint to test for cointegration. You could test for historical cointegrating relationship between realgdp and realdpi using pandas like so import pandas as pd import statsmodels.api as sm data = sm.datasets.macrodata.load_pandas().data def rolling_coint(x, y): yy =...

python,statsmodels,autoregressive-models

It works after adding a bit of noise, for example signal = np.ones(20) + 1e-6 * np.random.randn(20) My guess is that the constant is not added properly because of perfect collinearity with the signal. You should open an issue to handle this corner case better. https://github.com/statsmodels/statsmodels/issues My guess is also...

I'm running a development branch so things may have changed, but the results class returned by MixedLM.fit() should have an attribute called 'llf'. That is the value of the log-likelihood function at the estimated parameters. If you have two nested models and take -2 times the difference in their llf...

You could do something like this ... import pandas as pd import statsmodels.api as sm for product in linear_regression_df.product_desc.unique(): tempdf = linear_regression_df[linear_regression_df.product_desc == product] X = tempdf['period_num'] y = tempdf['TOTALS'] model = sm.OLS(y, X) results = model.fit() print(results.params) # Or whatever summary info you want ...

python,latex,regression,stata,statsmodels

Well, there is summary_col in statsmodels; it doesn't have all the bells and whistles of estout, but it does have the basic functionality you are looking for (including export to LaTeX): import statsmodels.api as sm from statsmodels.iolib.summary2 import summary_col p['const'] = 1 reg0 = sm.OLS(p['p0'],p[['const','exmkt','smb','hml']]).fit() reg1 = sm.OLS(p['p2'],p[['const','exmkt','smb','hml']]).fit() reg2 =...

There are two predict methods. logit in your example is the model instance. The model instance doesn't know about the estimation results. The model predict has a different signature because it needs the parameters also logit.predict(params, exog). This is mainly interesting for internal usage. What you want is the predict...

machine-learning,glm,statsmodels

Sourceforge is down right now. When it's back up, you should read through the documentation and examples. There are plenty of usage notes for prediction and GLM. How to label your target is up to you and probably a question for cross-validated. Poisson is intended for counts but can be...

statistics,tableau,statsmodels

Trying out the example, it looks like the improvement is around 6%, but with a wide confidence interval. A break in trend doesn't look significant. The first models below are estimated with OLS with a shift in the constant. In the first case also a shift in trend. I use...

boston_df["lstat^4"] = np.power(boston_df["lstat"], 4) boston_df["lstat^3"] = np.power(boston_df["lstat"], 4) Why are both of these 4-th order poly-terms? Is this a typo or intentional? I ask because, in the result, the coefficients for the 3-rd and 4-th order terms have exactly the same magnitude and differ only in sign. This is typically due to the multicollinearity...

python,statistics,time-series,statsmodels

I believe you have to use the development version (0.6) to do the following in statsmodels: import pandas as pd import numpy as np import statsmodels.api as sm df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9], 'b':[3,5,6,2,4,6,7,8,7,8,9]}) results = sm.OLS(df.a, sm.add_constant(df.b)).fit() new = results.get_robustcov_results(cov_type='HAC',maxlags=1) print new.summary() OLS Regression Results ============================================================================== Dep. Variable: a R-squared: 0.281...

python,pandas,time-series,statsmodels,trend

Quick and dirty ... # get some data import pandas.io.data as web import datetime start = datetime.datetime(2015, 1, 1) end = datetime.datetime(2015, 4, 30) df=web.DataReader("F", 'yahoo', start, end) # a bit of munging - better column name - Day as integer df = df.rename(columns={'Adj Close':'AdjClose'}) dayZero = df.index[0] df['Day'] =...

python,time-series,forecasting,statsmodels

The documentation in the Notes section explicitly states how you can speed things up...See the docstring for fit_kw to change arguments given to the ARMA.fit method. This is going to be slow for high numbers of models. It's a naive implementation and just does a pairwise fit of them all....

From the lowess documentation: Definition: lowess(endog, exog, frac=0.6666666666666666, it=3, delta=0.0, is_sorted=False, missing='drop', return_sorted=True) [...] Parameters ---------- endog: 1-D numpy array The y-values of the observed points exog: 1-D numpy array The x-values of the observed points It accepts arguments in the other order. It also doesn't only return y: >>>...

python,pandas,regression,statsmodels

An example with time fixed effects using pandas' PanelOLS (which is in the plm module). Notice, the import of PanelOLS: >>> from pandas.stats.plm import PanelOLS >>> df y x date id 2012-01-01 1 0.1 0.2 2 0.3 0.5 3 0.4 0.8 4 0.0 0.2 2012-02-01 1 0.2 0.7 2 0.4...

python,scipy,statsmodels,goodness-of-fit

An approximate solution for equal probability bins: Estimate the parameters of the distribution Use the inverse cdf, ppf if it's a scipy.stats.distribution, to get the binedges for a regular probability grid, e.g. distribution.ppf(np.linspace(0, 1, n_bins + 1), *args) Then, use np.histogram to count the number of observations in each bin...
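The steps above can be sketched roughly as follows, assuming a normal distribution fitted to simulated data. The final scipy.stats.chisquare call and the replacement of the infinite outer edges by the data range are my assumptions about how the counts would be used, not necessarily what the rest of the answer does.

```python
import numpy as np
from scipy import stats

# Simulated sample (illustrative only)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)
n_bins = 10

# 1. Estimate the parameters of the distribution
args = stats.norm.fit(data)                      # (loc, scale)

# 2. Bin edges from the inverse cdf on a regular probability grid
edges = stats.norm.ppf(np.linspace(0, 1, n_bins + 1), *args)
# ppf(0) and ppf(1) are -inf and +inf for the normal; use the data range
# instead so that all edges are finite and every observation is counted.
edges[0] = data.min()
edges[-1] = data.max()

# 3. Count the observations in each bin
counts, _ = np.histogram(data, bins=edges)

# Under the fitted distribution each bin has the same expected count
expected = np.full(n_bins, data.size / n_bins)
chi2_stat, p_value = stats.chisquare(counts, f_exp=expected, ddof=len(args))
print(counts, chi2_stat, p_value)
```

The ddof argument reduces the degrees of freedom by the number of estimated parameters, which is why the solution is only approximate.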