r,ggplot2,time-series,timeserieschart

This is the solution: library(ggplot2) library(reshape2) library(ecp) synthetic_control.data <- read.table("/Users/geoHeil/Dropbox/6.Semester/BachelorThesis/rResearch/data/synthetic_control.data.txt", quote="\"", comment.char="") n <- 2 s <- sample(1:100, n) idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s) sample2 <- synthetic_control.data[idx,] df = as.data.frame(t(as.matrix(sample2))) #calculate the change points changeP <- e.divisive(as.matrix(df[1]), k=8, R = 400, alpha = 2, min.size = 3)...

database,rest,time-series,publish-subscribe,iot

If you want a single solution, try ATSD, it does all of the above.

matlab,oop,time-series,linear-regression,superclass

To determine this, you can use the superclasses function: superclasses('LinearModel') superclasses('GeneralizedLinearMixedModel') This will return the names of the visible superclasses for each case. As you'll see, both inherit from the abstract superclass classreg.regr.ParametricRegression. You can also view the actual classdef files and look at the inheritances. In your Command Window,...

r,nested,time-series,lapply,sapply

Using plyr: As a matrix (time in cols, rows corresponding to rows of df): aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE) As a list: alply(df, 1, function(x) weisurv(t, x$sc, x$shp)) As a data frame (structure as per matrix above): adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t)) As a...

You can use data.table too. data.table is a very powerful data manipulation package. You can get started here. library("data.table") as.data.table(testdata)[, lapply(.SD, function(x)x/shift(x) - 1), .SDcols = 2:4] gdp cpi_index rpi_index 1: NA NA NA 2: 0.006427064 0.0072281257 0.009296686 3: 0.007166400 0.0030245056 0.004805767 4: 0.004061822 0.0061020008 0.006377043 5: 0.006772674 0.0009282349 0.005544554...

python,pandas,match,time-series,multi-index

This method is a little messy, but I am trying to make it more robust to account for missing data. First, we'll remove duplicates in the data and then convert the dates to Pandas Timestamps: df = df.drop_duplicates() df.SampleDate = [pd.Timestamp(ts) for ts in df.SampleDate] Then let's arrange your DataFrame...

set.seed(1) r <- rnorm(20,0,1) z <- c(1,1,1,1,1,-1,-1,-1,1,-1,1,1,1,-1,1,1,-1,-1,1,-1) data <- as.data.frame(na.omit(cbind(z, r))) series1 <- ts(cumsum(c(1,data[,2]*data[,1]))) series2 <- ts(cumsum(c(1,data[,2]))) d1y <- seq(as.Date("1991-01-01"),as.Date("2015-01-01"),length.out=24) matplot(cbind(series1, series2), xaxt = "n", xlab = "Time", ylab = "Value", col = 1:3, ann = TRUE, type = 'l', lty = 1) axis(1, at=seq(2,20,2), labels=format(d1y[seq(2,20,2)],"%Y")) ...

NeweyWest calculates the 'lag' with this code: lag <- floor(bwNeweyWest(x, order.by = order.by, prewhite = prewhite, ar.method = ar.method, data = data)) ... and when called with the default arguments it replicates your (and my replication of it) error: >bwNeweyWest(m2,lag = NULL, order.by = NULL, prewhite = TRUE, adjust =...

cassandra,time-series,composite-key

My intention of this question was more like this. Cassandra storage internal Check it out....

On Saturday, May 16, 2015 at 6:09:20 AM UTC-4, Rick wrote: Your assessment is generally correct, "nX" and "b" parameters do indeed correspond to the exogenous input data "x(t)". The number of columns (i.e., time series) in x(t) is "nX" and is what SAS calls "r", and the coefficient vector...

Your values variable is a factor (usually used for categorical values). Convert values to numeric before creating time series: values <- as.numeric(levels(values))[values] ...

r,graph,plot,ggplot2,time-series

Two ways of doing this: If sample data created as follows: Full.df <- data.frame(Date = as.Date("2006-01-01") + as.difftime(0:364, units = "days")) Full.df$Month <- as.Date(format(Full.df$Date, "%Y-%m-01")) Full.df[paste0("Count.", c("S", "G", "W", "F"))] <- matrix(sample(100, 365 * 4, replace = TRUE), ncol = 4) Optimal way using reshape2 package: molten <- melt(Full.df, id.vars...

There is an answer on the Mathworks website that I think you will find helpful: http://www.mathworks.com/matlabcentral/answers/92565-how-do-i-control-axis-tick-labels-limits-and-axes-tick-locations. Basically what you want to do is manipulate the XTick or XTickLabel attributes of the current axis handle. Let's say I have a plot that spans 100 years from 1900 - 2000. After creating...

sql,.net,sql-server,time-series

1) You probably want to explore the use of partitions. This will allow very effective inserts (it's a meta operation if you do the partitioning correctly) and very fast (2). You may want to explore columnstore indexes because the data (once collected) will never change and you will have very...

Try this (assuming your data is called df) ts(df$Number, start = c(2010, 01), frequency = 12) ## Jan Feb Mar ## 2010 1 19 1 Edit: this will work only if you don't have missing dates and your data is in correct order. For a more general solution see @Anandas...

r,time-series,shiny,forecasting

The issue wound up being that I was using the arima(...) function instead of Arima(...). It turns out that they are two different functions. The issue that I was experiencing was a result of differences in how the functions store their data. More information about this can be found in...

matlab,statistics,time-series,histogram,fractals

Your code seems to be generally bug-free, but I made some changes since you perform needless repetitions over loops. I moved the outer loop inside and "vectorized" it, since all moment calculations can be performed simultaneously for a given histogram. Also, it is building the histogram that takes longest. intel...

As was pointed out in the documentation from ?lag.xts, this is the intended behavior.

sql-server,duplicates,time-series,sql-delete

You can do this using a CTE and ROW_NUMBER: SQL Fiddle WITH CteGroup AS( SELECT *, grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS) FROM YourTable ), CteFinal AS( SELECT *, RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS), RN_LAST = ROW_NUMBER()...
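
The grouping idea behind the two ROW_NUMBER calls (finding runs of consecutive identical values) can be sketched outside SQL as well. The following is a minimal Python sketch, not the answer's query: the sample rows are hypothetical, and it assumes the goal is to keep only the first and last row of each run, which is what the RN_FIRST and RN_LAST columns suggest.

```python
from itertools import groupby

# Hypothetical sample: (MS, Value) pairs already ordered by MS, the time column.
rows = [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 10), (7, 10)]


def first_and_last_of_runs(rows):
    """Keep only the first and last row of each run of consecutive equal values."""
    kept = []
    # groupby starts a new group whenever the key changes, so a value that
    # reappears later forms a separate run, like grp in the CTE.
    for _, run in groupby(rows, key=lambda r: r[1]):
        run = list(run)
        kept.append(run[0])
        if len(run) > 1:
            kept.append(run[-1])
    return kept


result = first_and_last_of_runs(rows)
print(result)
```

Here the middle row of the first run, (2, 10), is the only one dropped.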

I think the answer is that you can specify the alpha for each test via the iParams and sparams arguments. Without such a user specification, each test has a default alpha. The button to "Answer Your Question" doesn't seem to be working, so here it is, in the Comments.

For a ggplot2 plot first convert df to long form (using melt from the reshape2 package), convert the date column to "Date" class and the value column to a factor and then use geom_tile: library(ggplot2) library(reshape2) long <- melt(df, measure.var = 2:4) long <- transform(long, date = as.Date(long$date, "%d/%m/%Y"), value...

I found a way of doing this, not too happy about it though: full_index = [] for g in all_genders: for s in all_states: for m in all_months: full_index.append((g, s, m)) df = df.set_index(['Gender', 'State', 'Month']) df = df.reindex(full_index) # fill in all missing values So basically, instead of dealing...
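
The three nested loops build a Cartesian product, so the same list can probably be produced more compactly with itertools.product. A small sketch, with hypothetical category values standing in for all_genders, all_states and all_months:

```python
from itertools import product

# Hypothetical category values; in the answer these come from the data.
all_genders = ["F", "M"]
all_states = ["CA", "NY"]
all_months = [1, 2, 3]

# Same tuples, in the same order, as the three nested for loops.
full_index = list(product(all_genders, all_states, all_months))

print(len(full_index))   # 2 * 2 * 3 combinations
print(full_index[:3])
```

The resulting list can be passed to df.reindex exactly as before. pandas also offers pd.MultiIndex.from_product for the same purpose, if you prefer to stay inside pandas.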

If your data has the value 4199, this means that you included the date column when trying to form your ts object. Since you specified the start and frequency of your time series in your ts function, you no longer need the date values as it will be generated by...

r,datetime,merge,data.frame,time-series

It is sometimes hard to avoid loops, especially when you have conditions like you do. Sometimes we end up spending a lot of effort avoiding them while they are probably either the best we can do, or are not too far behind in terms of performance and/or readability. Having said that, this...

Using rowsum seems to be faster (at least for this small example dataset) than the data.table approach: sgibb <- function(datframe) { data.frame(Group = unique(datframe$Group), Avg = rowsum(datframe$Weighted_Value, datframe$Group)/rowsum(datframe$SumVal, datframe$Group)) } Adding the rowsum approach to @platfort's benchmark: library(microbenchmark) library(dplyr) library(data.table) microbenchmark( Nader = df %>% group_by(Group) %>% summarise(res = sum(Weighted_Value)...

r,datetime,time-series,forecasting

Here is a simple example assuming weekly data: x <- ts(rnorm(200), frequency=52) endx <- end(x) window(x, end=c(endx[1],endx[2]-3)) Of course, there are not actually 52 weeks in a year, but that is probably a complication that can be overlooked for most analyses....

the following worked for me: # create some random data with datetime index spanning 17 months s = pd.Series(index=pd.date_range(start=dt.datetime(2014,1,1), end = dt.datetime(2015,6,1)), data = np.random.randn(517)) In [25]: # now calc the mean for each month s.groupby(s.index.month).mean() Out[25]: 1 0.021974 2 -0.192685 3 0.095229 4 -0.353050 5 0.239336 6 -0.079959 7...

r,time-series,permutation,quantmod

If you wanted to get a list of data frames, one for each pair, you could try: dfs <- lapply(seq_len(ncol(perm)), function(x) close[,paste0(perm[,x], ".Close")]) Now you can get the 2-column data frames for each pair with dfs[[1]], dfs[[2]], etc. You can perform statistical analyses on each pair using the lapply function....

Try this (short is the name of your 2nd matrix): res <- as.matrix(merge(long.date.col, short, all.x = T)) res[is.na(res)] <- "-9999" ...

You can try Reduce(function(...) merge(..., by=c('Date', 'Month', 'Week', 'Year'), all=TRUE), list(Standard.df, Guardian.df, Welt.df)) ...

You could use matplot as follows: matplot(cbind(xtsplot1, xtsplot2, xtsplot3), xaxt = "n", xlab = "Time", ylab = "Value", col = 1:3, ann = FALSE, type = 'l') ...

python,pandas,time-series,freeze

Your approach looks a little complicated ... I hope my simplification is what you need ... # get an index of pandas Timestamps df.index = pd.to_datetime(df.Date + ' ' + df.Time) # get the column we want as a pandas Series called price price = df['Close'] Update # use...

I had to augment your example to get something to play with, but here is something that works. And I just changed it to eliminate lubridate... library(xts) d1 <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),"years") d2 <- rnorm(21,10,1) Dollar <- data.frame(d1,d2) dates <- as.Date(Dollar[,1], "%d.%m.%Y",tz="GMT") xtsplot <- as.xts(Dollar[,2], dates) plot(xtsplot, xaxt = "n", main="SMA", ann...

python,pandas,time-series,ipython-notebook

Your problem (as spotted by @ J Richard Snape) is that your dates are in fact strings so it's ordered lexicographically. You should convert to datetime dtype: df1['Ship_date'] = pd.to_datetime(df1['Ship_date']) After which it should maintain the expected order....
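
To see why string dates sort in the wrong order, here is a small standard-library sketch. The date strings and the "%m/%d/%Y" format are assumptions for illustration, not taken from the question's data.

```python
from datetime import datetime

# Hypothetical date strings; the "%m/%d/%Y" format is an assumption.
ship_dates = ["10/01/2014", "2/15/2015", "9/30/2014"]

# Strings are compared character by character, so "10/..." sorts before "2/...".
as_strings = sorted(ship_dates)

# Parsing first gives chronological order.
as_dates = sorted(ship_dates, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))

print(as_strings)
print(as_dates)
```

pd.to_datetime does the parsing step for a whole column at once, which is why the converted column keeps the expected order.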

Here's a very straightforward solution that isn't pretty but does the job. First, just a change to your data to make comparisons easier: mtable<-data.frame(date,t.1,t.2,m.result, stringsAsFactors = FALSE) Edited in: If you want to assure the matches are ordered by date, you can use order as pointed out by @eipi10: mtable...

Use aggregate.ts with sum, mean or whatever summary function desired. See ?aggregate.ts > aggregate(tser, 4, sum) Qtr1 Qtr2 Qtr3 Qtr4 2010 10.21558 15.22923 21.98924 30.94460 2011 39.81982 45.00208 61.26129 73.03194 2012 87.63780 97.27455 104.69757 115.09325 2013 126.71070 138.39925 145.47344 159.00137 ...
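
For readers outside R, the monthly-to-quarterly aggregation amounts to applying a summary function to consecutive blocks of three values. A rough Python sketch of that idea, with a hypothetical helper name (aggregate_blocks) and made-up data:

```python
def aggregate_blocks(values, block, func=sum):
    """Apply func to consecutive, non-overlapping blocks of `block` values."""
    if len(values) % block != 0:
        raise ValueError("series length must be a multiple of the block size")
    return [func(values[i:i + block]) for i in range(0, len(values), block)]


# Hypothetical monthly series for one year.
monthly = list(range(1, 13))

# Three months per quarter gives four quarterly totals.
quarterly = aggregate_blocks(monthly, 3)
print(quarterly)
```

Passing func=max or a mean function instead of sum mirrors the "whatever summary function desired" part of the answer.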

The error message says that you should use as.POSIXct on lims. You also need to add the date (year, month and day) in lims, because by default it will be 2015, which is off limits. lims <- as.POSIXct(strptime(c("2011-01-01 03:00","2011-01-01 16:00"), format = "%Y-%m-%d %H:%M")) ggplot(df, aes(x=dates, y=times)) + geom_point() +...

You don't need any code to change the text that looks like dates into real dates. Select the column of dates, then click Data > Text To Columns > Next > Next In this dialog select Date as the data type and choose the order of MDY if the text...

python,time-series,statsmodels,autoregressive-models

There is nothing wrong. That's the behavior of a stationary ARMA process where predictions converge to the mean. If you have fixed seasonality, then you could difference the time series at the seasonal lag, i.e. use a SARIMA, and the prediction would converge to a fixed seasonal structure. If you...
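
The convergence is easy to see in the simplest stationary case. The sketch below uses a plain AR(1) with hypothetical parameters (it does not call statsmodels); the h-step forecast is mean + phi**h * (last_value - mean), which shrinks toward the mean as h grows.

```python
def ar1_forecasts(last_value, mean, phi, steps):
    """h-step-ahead forecasts of a stationary AR(1) process for h = 1..steps."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]


# Hypothetical parameters; |phi| < 1 is what makes the process stationary.
forecasts = ar1_forecasts(last_value=10.0, mean=4.0, phi=0.8, steps=60)

print(forecasts[0])    # first forecast is still pulled toward the last observation
print(forecasts[-1])   # far-ahead forecast is essentially the mean
```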

matlab,time-series,forecasting

The term y_real-y_pred is the vector of errors. The expression squares each element of it, and then sqrts each element of it, thus having the effect of abs(). Then std() is run on the vector of errors. Thus, this is computing the S.D. of the (absolute) error. That is a...
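
A quick numerical check of that reading, written in Python rather than MATLAB and using made-up values: squaring and then taking the square root of each error gives its absolute value, and the standard deviation is then taken over those absolute errors.

```python
import math
import statistics

# Hypothetical observed and predicted values.
y_real = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 7.0, 2.0, 4.0]

errors = [r - p for r, p in zip(y_real, y_pred)]
abs_errors = [math.sqrt(e ** 2) for e in errors]   # element-wise, same as abs(e)

# Sample standard deviation (n - 1 denominator, like MATLAB's default std).
sd_abs_error = statistics.stdev(abs_errors)

print(abs_errors)
print(sd_abs_error)
```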

First, we will define the two thresholds you specified. (I set the second one to 4 so we can work consistently with "<" and ">", instead of the error-prone "<" and ">="). threshold.data <- 10 threshold.NA <- 4 Now, the key is to work with run length encoding on is.na(y)....
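
Run length encoding of the missingness indicator can be sketched in Python as follows. This is only an illustration of the rle(is.na(y)) idea, with None standing in for NA and a made-up series:

```python
from itertools import groupby


def run_lengths(flags):
    """Run-length encode a sequence as a list of (value, length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(flags)]


# Hypothetical series; None stands in for NA.
y = [1.2, None, None, 3.4, 5.6, None, 7.8]
is_missing = [v is None for v in y]

runs = run_lengths(is_missing)
print(runs)
```

Each pair says whether a stretch is missing and how long it is, which is the information the two thresholds are then compared against.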

Create a new data table with the sundays data: MUSEUS_PLOT_SUNDAYS <- MUSEUS_PLOT[weekdays(MUSEUS_PLOT$VisitDate) == "Sunday"] And change the geom_vline for this: geom_vline(data = MUSEUS_PLOT_SUNDAYS,aes(xintercept = as.numeric(VisitDate)),colour = "black") ...

The easiest approach is to fit a nested model with interactions rather than two separate models. So you can first generate a factor that encodes the two segments: fac <- factor(as.numeric(time(zoop) > as.Date("2005-01-24"))) fac <- zoo(fac, time(zoop)) And then you can fit a model where all coefficients are constrained to...

r,statistics,time-series,correlation,xts

What about using rollapply in a different way? As you don't supply the complete dataset, here is a demonstration of what I mean: set.seed(123) m <- matrix(rnorm(100), ncol = 10) rollapply(1:nrow(m), 5, function(x) cor.mean(m[x,])) [1] -0.080029692 -0.038168840 -0.058443824 0.005699772 -0.014459878 -0.021569173 As I just figured out, you can also use the function...

d3.js,time-series,timeline,timeserieschart

Here is an augmentation of your JS fiddle, Demo: http://jsfiddle.net/robschmuecker/c8txLxo9/ It takes the data you have and then parses it to get a collection of years so that we only insert one dom element per year rather than several. Then we can conditionally add events for years which have more...

Upgrade comment You can change the x-axis labels using scale_x_date and formats from the scales package. Using the code from the ggplot2 scale_x_date help pages library(ggplot2) library(scales) # to access breaks/formatting functions # Change Date to date format aaci$dt <- as.Date(aaci$Date) # Plot # You can change the format to...

As per ?zoo: Subscripting by a zoo object whose data contains logical values is undefined. So you need to wrap the subsetting in a which call: log_ret[which(!is.finite(log_ret))] <- 0 log_ret x y z s p t 2005-01-01 0.234 -0.012 0 0 0.454 0 ...

r,math,statistics,time-series,forecasting

You seem to be confused between modelling and simulation. You are also wrong about auto.arima(). auto.arima() does allow exogenous variables via the xreg argument. Read the help file. You can include the exogenous variables for future periods using forecast.Arima(). Again, read the help file. It is not clear at all...

I think there are two critical points: (1) sorting by Year and Term so that the order corresponds to temporal order; and (2) using groupby to collect on IDs before selecting and shifting the Rating. So, from a frame like >>> df ID Year Term Rating 0 1 2010 0...

javascript,django,python-2.7,highcharts,time-series

This is a good case for the pointStart and pointInterval properties, on a datetime type x axis. Example: http://jsfiddle.net/jlbriggs/92gkjwo3/ You can use the axis label formatter function, and the tickInterval property to define the placement and format of the labels. References: http://api.highcharts.com/highcharts#xAxis.labels.formatter http://api.highcharts.com/highcharts#plotOptions.series.pointInterval http://api.highcharts.com/highcharts#plotOptions.series.pointStart ...

Here is one solution to what I think you are after. Generate data. myData <- mapply(rnorm, 1000, 200, mean=seq(-50,50,0.5)) This is a matrix with 1000 rows (observations) and 201 time points. In each time point the mean of data there shifts gradually from -50 to 50. By 0.5 each time....

1 You can aggregate with a data.table library(data.table) # This turns all Jans to 1 and Decs to 12 for example mth <- month(as.Date(df$date)) dt2 <- as.data.table(df) # turn df into data table dt dt2[, mth := mth] # pop month into your data frame setkey(dt2, "mth") # data tables...

Now I see what you mean. One way to handle this would be to create two time series, and use one for your calculations and plotting your data, and the other for the tic marks. Like this: library(xts) n <- 1000 d1 <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),length.out=n) d1y <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),length.out=21) d2 <- rnorm(n,10,1)...

Give this a try. Using map to pull directly from your series of averages df["diff"] = df["snow_depth"] - df["month"].map(nameofyourseries) year month snow_depth diff 0 1979 1 18.322581 3.937382 1 1979 2 11.535714 -3.776587 2 1979 3 5.322581 -1.187855 3 1979 4 0.300000 0.031092 4 1979 5 0.000000 -0.005819 5 1979...

I will try to iterate toward an answer, but since there are so many branches of discussion, I prefer to respond directly in this format. In any case, this is a constructive process, which is the purpose of this forum... Some previous "clarifications": The Output Covariance from EstSpec.Q after and before running the...

One method after converting to datetime64, if frequency sampling rate is the same then we could call diff() to calculate the difference between all rows which should be the same and compare this with a np.timedelta64 type, so for your sample data this would be: In [277]: all(df.datetime.diff()[1:] == np.timedelta64(1,...
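
The same check can be written with only the standard library, which may help to see what the diff() comparison is doing. The timestamps and the one-hour step below are hypothetical:

```python
from datetime import datetime, timedelta


def is_regular(timestamps, step):
    """True if every consecutive pair of timestamps is exactly `step` apart."""
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


# Hypothetical hourly series, and a copy with one observation removed.
start = datetime(2015, 1, 1)
regular = [start + timedelta(hours=i) for i in range(5)]
irregular = regular[:2] + regular[3:]

print(is_regular(regular, timedelta(hours=1)))
print(is_regular(irregular, timedelta(hours=1)))
```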

java,android,gps,time-series,kalman-filter

The answer is simple, the SensorEvent.timestamp has an arbitrary zero reference: It turns out after a bit of Googling (tip o' the hat to StackOverflow, as usual) that the timestamp one receives isn't based off of any particular 0-point defined in the Android OS or the API; it's an arbitrary...

You could try something like this: # make an index of the latest events last_event_index <- cumsum(df$event) + 1 # shift it by one to the right last_event_index <- c(1, last_event_index[1:length(last_event_index) - 1]) # get the dates of the events and index the vector with the last_event_index, # added an...

python,date,datetime,pandas,time-series

After understanding what you want this is much simpler, so we calculate whether the difference between the current and previous rows is larger than 5 days giving us a boolean series, we use this to filter the df and then use the index value to perform slicing: In [57]: inactive_index =...
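
As a plain-Python illustration of the "difference larger than 5 days" test (not the pandas code from the answer), the positions that follow a long gap can be found like this. The dates are made up:

```python
from datetime import date, timedelta

# Hypothetical, already sorted activity dates.
dates = [
    date(2015, 1, 1),
    date(2015, 1, 3),
    date(2015, 1, 10),
    date(2015, 1, 12),
    date(2015, 1, 20),
]


def gap_positions(dates, max_gap=timedelta(days=5)):
    """Positions i where dates[i] is more than max_gap after dates[i - 1]."""
    return [i for i in range(1, len(dates)) if dates[i] - dates[i - 1] > max_gap]


positions = gap_positions(dates)
print(positions)
```

Slicing the data at those positions then separates the stretches of activity.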

python,datetime,numpy,pandas,time-series

You can do this using the groupby, just subtract each group's mean from the values for that group: average_diff = ts.groupby([ts.index.month, ts.index.day]).apply( lambda g: g - g.mean() ) ...

Mannat, here is an answer using the data.table package to help you aggregate. Use install.packages("data.table") to first get it into your R. library(data.table) # For others # I copied your data into a csv file, Mannat you will not need this step, # other helpers look at data in DATA section...

ts.intersect determines whether an object is a ts object by looking for the tsp attribute. as.xts.ts removes the tsp attribute, which is why it is not coerced back to a ts object. This looks like a bug in xts->ts->xts conversion, but I need to take a closer look. As a...

python,pandas,time-series,shift

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007], 'value':[5,10,8,72,12,13] }) In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ] In [590]: df Out[590]: date value previous_value 0 2000 5 NaN 1 2001 10 5 2 2003 8 NaN 3 2004 72 8 4 2005 12 72 5 2007 13 NaN...

I don't see the xts frequency argument doing the same thing as the ts frequency argument. So, I assume you need to convert your data into a ts object before you use decompose. The way I got it to work is the following: Using the following data: data(sample_matrix) df <-...

mongodb,mapreduce,time-series,mongodb-query,nosql-aggregation

This will be hard to achieve using the aggregation framework. But it "works" well with MapReduce. Something along the lines of that (untested): // collect *individual* values map = function() { for (var min in this.values) for (sec in this.values[min]) data = {value: {}, count: {}} data.value[this.name] = this.values[min][sec] data.count[this.name]...

cassandra,apache-spark,time-series,cql

I'm trying to understand what exactly happens internally in storage engine level when a row(columns) is inserted in a CQL style table. Let's say that I build tables with both of your PRIMARY KEYs, and INSERT some data: [email protected]:stackoverflow2> SELECT userid, time, dateof(time), category, subcategory, itemid, count, price FROM...

You do not need time series, just tapply: res=tapply(AVG_LOSCAT2$AVG_LOSCAT, list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH), round,2) res month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 NA NA NA NA NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30 2013 5.74 7.89 6.49 7.09 5.91...

R has multiple ways of representing time series. Since you're working with daily prices of stocks, you may wish to consider that financial markets are closed on weekends and business holidays so that trading days and calendar days are not the same. However, you may need to work with your...

fitted gives in-sample one-step forecasts. The "right" way to do what you want is via a Kalman smoother. A rough approximation good enough for most purposes is obtained using the average of the forward and backward forecasts for the missing section. Like this: x <- AirPassengers x[90:100] <- NA fit...

So (X,y) is your train set (356 data instances with their labels), to forecast the first month of the next year your SVR Model needs a data set X_nextMonth (30 data instances with the same features as those of X) to pass as argument to its .predict() method that he...

Use the dynlm package. Here is an example using the data you supplied: library(dynlm) dfX = read.table( textConnection( "Date YY XX ZZ MM 03.01.2005 2.154 2.089 0.001 344999 04.01.2005 2.151 2.084 0.006 344999 05.01.2005 2.151 2.087 -0.007 333998 06.01.2005 2.15 2.085 -0.005 333998 07.01.2005 2.146 2.086 -0.006 333998 10.01.2005 2.146...

matlab,for-loop,return,time-series,volatility

I think you are a bit confused about how matrix indexing works in Matlab. If I understood correctly, you have a variable TR_t with which you want to store the value for time t. You then try to do the following: TR_t = TR_{t-1} * exp(R_t); I will try to explain...
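
The recursion TR_t = TR_{t-1} * exp(R_t) needs each value stored at its own position so that the next step can read the previous one. A small Python sketch of that recursion (not MATLAB, and with hypothetical returns and a hypothetical starting value):

```python
import math
from itertools import accumulate

# Hypothetical log returns R_t and starting value TR_0.
returns = [0.01, -0.02, 0.015]
tr0 = 100.0

# TR_t = TR_{t-1} * exp(R_t), applied one step at a time.
# The result keeps TR_0 as its first element (requires Python 3.8+ for `initial`).
tr = list(accumulate(returns, lambda prev, r: prev * math.exp(r), initial=tr0))

print(tr)
```

Because the updates are multiplicative, the last element equals TR_0 times exp of the summed returns.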

database,cassandra,time-series,data-modeling,cql

Writing this question has helped me sort out some of my problems. I've come up with an alternative solution which I am more-or-less happy with but will need some fine-tuning. There is the possibility of calculating all of the time buckets we need to access, making a query for each...

time-series,forecasting,state-space

The state vector is exactly the same in the multiplicative case as in the additive case. All the equations are given here: https://www.otexts.org/fpp/7/7 For the ETS(M,Md,N) model, ...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

time-series,sampling,measurement,probability-theory

I'm going to approach this problem as if it were on a test. First, let's name the variables. Bx is value of the boolean variable after x opportunities to flip (and B0 is the initial state). P is the chance of changing to a different value every opportunity. Given that...
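
For the setup just described (a boolean that changes value with chance P at every opportunity), one quantity that can be computed directly is the probability that Bx equals B0. The sketch below is not taken from the answer: it uses the standard two-state recursion and the closed form 0.5 * (1 + (1 - 2P)^x), and checks that they agree. Function names are made up.

```python
def prob_same_recursive(p, x):
    """P(Bx == B0), updating one flip opportunity at a time."""
    same = 1.0  # before any opportunity the value certainly equals B0
    for _ in range(x):
        # stay equal with chance 1 - p, or return from "different" with chance p
        same = same * (1 - p) + (1 - same) * p
    return same


def prob_same_closed_form(p, x):
    """Closed form of the same recursion: 0.5 * (1 + (1 - 2p)**x)."""
    return 0.5 * (1 + (1 - 2 * p) ** x)


print(prob_same_recursive(0.3, 7))
print(prob_same_closed_form(0.3, 7))
```

With P = 0.5 the value is 0.5 after a single opportunity, meaning the current state carries no information about B0.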

Use GetFitARpMLE(z,4) You will get > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...

I solved the direct question so this is technically the answer while I don't completely understand why. I read through the HTS code on using the trace() function and found the line causing issues: else if (fmethod == "arima") { models <- auto.arima(x, lambda = lambda, xreg = xreg, parallel...

Try using mean instead of sum, like this: ggplot(data = df, aes(x = Month, y = Count.V)) + stat_summary(fun.y = mean, geom ="line")+ stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) + geom_point()+ scale_x_date(labels = date_format("%m-%y"), breaks = "3 months") ...

r,data.frame,time-series,weekend

Find the first Saturday in your data, then assign a week ID to all dates in your data set based on that : library(lubridate) # for the wday() and ymd() functions daily_FWIH$Date <- ymd(daily_FWIH$Date) saturdays <- daily_FWIH[wday(daily_FWIH$Date) == 7, ] # filter for Saturdays startDate <- min(saturdays$Date) # select first...
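
The week-ID logic can be sketched in Python with the standard library. This is an illustration of the idea only: the function name and dates are hypothetical, and days before the first Saturday get the ID -1 here, which may differ from how the R code treats them.

```python
from datetime import date, timedelta


def week_ids(dates):
    """Week number of each date, counting 7-day blocks from the first Saturday."""
    # date.weekday() uses Monday = 0, so Saturday is 5.
    start = min(d for d in dates if d.weekday() == 5)
    return [(d - start).days // 7 for d in dates]


# Hypothetical daily data: three weeks starting Monday, 2015-06-01.
days = [date(2015, 6, 1) + timedelta(days=i) for i in range(21)]
ids = week_ids(days)

print(ids)
```

Each ID then covers a Saturday-to-Friday block, so grouping by it gives weeks that start on the weekend.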

python,time-series,scikit-learn,regression,prediction

Here is my guess about what is happening in your two types of results: .days does not convert your index into a form that repeats itself between your train and test samples. So it becomes a unique value for every date in your dataset. As a consequence your models either...

matlab,time-series,libsvm,forecasting

A Support-Vector-Regression based predictor is used for exactly that. It shall stand for PH >= 1. The value of epsilon in the epsilon-SVR model specifies the epsilon-tube, within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value Y(t)....
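
The epsilon-tube corresponds to the epsilon-insensitive loss, which is zero inside the tube and grows linearly outside it. A minimal sketch in Python (not libsvm code, and the numbers are made up):

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon-tube, linear in the excess distance outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Prediction within epsilon of the actual value: no penalty.
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)

# Prediction 0.5 away: only the part beyond epsilon is penalized.
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)

print(inside, outside)
```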

There must be more than 2 periods, so frequency must be less than n/2 n = 1000 x = ts(0.1*rnorm(n) + sin(6*pi*(1:n)/n) + (1:n)/n, frequency=n/2.1) plot(x) stl(x,"per") ...

python,pandas,group-by,time-series

(Am a bit amused, as this question caught me doing the exact same thing.) You could do something like valgdata\ .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\ .mean()\ .unstack() which would reverse the groupby and unstack the news sites to be columns. To plot, just do the previous snippet immediately followed by .plot(): valgdata\ .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\...

You are very close. You want the x-axis to be a measure of where in the year you are, but you have it as a character vector and so are getting every single point labelled. If you instead make a continuous variable represent this, you could have better results. One...

Do not use the dates in your plot, use a numeric sequence as x axis. You can use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) The result depending on your data will be something similar: ...

We can remove the 'NA' elements with !is.na(x), but the lag(x) will return NA as the first element, which can be removed by using na.rm=TRUE in the sd volcalc= function (x) { x <- x[!is.na(x)] returns=log(x)-log(lag(x)) vol=sd(returns, na.rm=TRUE)*sqrt(252) return(vol) } apply(dataexample, 2, volcalc) # x y #3.012588 1.030484 ...
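
The same calculation (drop missing prices, take log returns, annualize the sample standard deviation with sqrt(252)) can be sketched in Python. None stands in for NA, the function name is made up, and 252 trading days per year is the assumption carried over from the R code:

```python
import math
import statistics


def annualized_volatility(prices, periods_per_year=252):
    """Sample standard deviation of log returns, scaled to a yearly figure."""
    clean = [p for p in prices if p is not None]          # drop missing prices
    returns = [math.log(b) - math.log(a) for a, b in zip(clean, clean[1:])]
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical prices with one missing observation.
vol = annualized_volatility([100.0, None, 102.0, 101.0, 103.5])
print(vol)
```

Pairing each price with its successor after filtering avoids the leading NA that lag() introduces, so no na.rm equivalent is needed.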

python,pandas,time-series,statsmodels,trend

Quick and dirty ... # get some data import pandas.io.data as web import datetime start = datetime.datetime(2015, 1, 1) end = datetime.datetime(2015, 4, 30) df=web.DataReader("F", 'yahoo', start, end) # a bit of munging - better column name - Day as integer df = df.rename(columns={'Adj Close':'AdjClose'}) dayZero = df.index[0] df['Day'] =...

You are not allowing dynlm to use the same amount of data as in lm. The latter model contains two fewer observations. dim(model.frame(reg1)) # [1] 24 7 dim(model.frame(lmx)) # [1] 22 7 The reason is that with lm you are transforming the variables (differencing) with the entire data set (31 observations),...

Your timestamps are in milliseconds. You need to convert them to seconds to be able to use them with as.POSIXct. And there's no point in calling strptime on a POSIXct vector. Also, it's good practice to explicitly set the timezone, rather than leave it set to "". df$datetime <- as.POSIXct(df$timestamp/1000,...
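
The same two points (divide by 1000, and set the timezone explicitly) apply in other languages too. A small Python sketch with a hypothetical helper name and a round example value:

```python
from datetime import datetime, timezone


def from_millis(ms):
    """Convert a Unix timestamp in milliseconds to a timezone-aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# 10**12 milliseconds is 10**9 seconds after the epoch.
stamp = from_millis(1_000_000_000_000)
print(stamp.isoformat())
```

Passing the millisecond value without dividing would be interpreted as seconds and land tens of thousands of years in the future (or raise an error), which is the kind of symptom the question describes.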

On Friday, May 29, 2015 at 2:05:06 PM UTC-4, Rick wrote: (1) No, not necessarily. Turning off a flag (i.e., setting a particular element of an input "solve" flag to logical FALSE) holds the corresponding parameter value fixed throughout the estimation. For example, if, say, the 3rd element of the...

By default, time series plots in R use type = "l", which means that you get a line but no point characters. To get both, you can change your type to "b". xyplot(a1, col = "red", pch = 2, type = "b") This yields: The same logic applies to the...

Here's something that's almost your dataframe (I avoided copying the dates): df = pd.DataFrame({ 'col1': [1, 1, 1, 2, 2, 2], 'col2': [1, 2, 3, 1, 2, 3], 'date': [1, 9, 10, 10, 10, 25] }) With this, define: def max_diff_date(g): g = g.sort(columns=['date']) return g.col2.ix[(g.date.ix[1: ] - g.date.shift(1)).argmax() -...

I took the liberty of defining a simple fnc function. The idea is to loop over the indices of n_lens and not over the values of n_lens. Nested for loops may be (will be?) slower in R than other approaches. It produces the required output. fnc <-...