Use GetFitARpMLE(z,4). You will get: > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...
Upgrade comment You can change the x-axis labels using scale_x_date and formats from the scales package. Using the code from the ggplot2 scale_x_date help pages library(ggplot2) library(scales) # to access breaks/formatting functions # Change Date to date format aaci$dt <- as.Date(aaci$Date) # Plot # You can change the format to...
One method after converting to datetime64, if frequency sampling rate is the same then we could call diff() to calculate the difference between all rows which should be the same and compare this with a np.timedelta64 type, so for your sample data this would be: In [277]: all(df.datetime.diff()[1:] == np.timedelta64(1,...
matlab,time-series,forecasting
The term y_real-y_pred is the vector of errors. The expression squares each element of it and then takes the square root of each element, which has the same effect as abs(). Then std() is run on that vector. So this is computing the S.D. of the (absolute) error. That is a...
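To make that concrete, here is a minimal numpy sketch of the same computation (the names y_real and y_pred come from the snippet above; the data is made up, and this is not the poster's original MATLAB code):

import numpy as np

y_real = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical observations
y_pred = np.array([1.1, 1.8, 3.4, 3.7])   # hypothetical forecasts

err = y_real - y_pred                     # vector of errors
abs_err = np.sqrt(err ** 2)               # square then square-root == absolute value
print(np.std(abs_err))                    # S.D. of the absolute error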
We can remove the NA elements with !is.na(x), but lag(x) will return NA as the first element, which can be handled by using na.rm=TRUE in the sd() call: volcalc= function (x) { x <- x[!is.na(x)] returns=log(x)-log(lag(x)) vol=sd(returns, na.rm=TRUE)*sqrt(252) return(vol) } apply(dataexample, 2, volcalc) # x y #3.012588 1.030484 ...
On Saturday, May 16, 2015 at 6:09:20 AM UTC-4, Rick wrote: Your assessment is generally correct, "nX" and "b" parameters do indeed correspond to the exogenous input data "x(t)". The number of columns (i.e., time series) in x(t) is "nX" and is what SAS calls "r", and the coefficient vector...
python,pandas,time-series,shift
In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007], 'value':[5,10,8,72,12,13] }) In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ] In [590]: df Out[590]: date value previous_value 0 2000 5 NaN 1 2001 10 5 2 2003 8 NaN 3 2004 72 8 4 2005 12 72 5 2007 13 NaN...
Is this what you want? Code: p <- ggplot(data = dtm, aes(x = asDate, y = mortes, group=interaction(date, trmt))) p + geom_boxplot(aes(fill = factor(dtm$trmt))) The key is to group by interaction(date, trmt) so that you get all of the boxes, and not cast asDate to a factor, so that ggplot...
Here's a very straightforward solution that isn't pretty but does the job. First, just a change to your data to make comparisons easier: mtable<-data.frame(date,t.1,t.2,m.result, stringsAsFactors = FALSE) Edited in: If you want to ensure the matches are ordered by date, you can use order as pointed out by @eipi10: mtable ...
r,math,statistics,time-series,forecasting
You seem to be confused between modelling and simulation. You are also wrong about auto.arima(). auto.arima() does allow exogenous variables via the xreg argument. Read the help file. You can include the exogenous variables for future periods using forecast.Arima(). Again, read the help file. It is not clear at all...
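For comparison only, the same workflow (exogenous regressors supplied at fit time and again for the forecast horizon) can be sketched in Python with statsmodels; this is a hedged illustration with simulated data, not the forecast-package code the answer refers to:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.RandomState(1)
exog = rng.rand(100, 2)                        # exogenous variables, like xreg
y = 5 + exog @ np.array([2.0, -1.0]) + rng.randn(100) * 0.3

fit = SARIMAX(y, exog=exog, order=(1, 0, 0)).fit(disp=False)

exog_future = rng.rand(10, 2)                  # exogenous values for the future periods
print(fit.forecast(steps=10, exog=exog_future))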
fitted gives in-sample one-step forecasts. The "right" way to do what you want is via a Kalman smoother. A rough approximation good enough for most purposes is obtained using the average of the forward and backward forecasts for the missing section. Like this: x <- AirPassengers x[90:100] <- NA fit...
By default, time series plots in R use type = "l", which means that you get a line but no point characters. To get both, you can change your type to "b". xyplot(a1, col = "red", pch = 2, type = "b") This yields: The same logic applies to the...
sql-server,duplicates,time-series,sql-delete
You can do this using a CTE and ROW_NUMBER: SQL Fiddle WITH CteGroup AS( SELECT *, grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS) FROM YourTable ), CteFinal AS( SELECT *, RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS), RN_LAST = ROW_NUMBER()...
You don't need any code to change the text that looks like dates into real dates. Select the column of dates, then click Data > Text To Columns > Next > Next. In this dialog, select Date as the data type and choose the order of MDY if the text...
Do not use the dates in your plot; use a numeric sequence as the x axis and use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) Depending on your data, the result will look something like this: ...
You could try something like this: # make an index of the latest events last_event_index <- cumsum(df$event) + 1 # shift it by one to the right last_event_index <- c(1, last_event_index[1:length(last_event_index) - 1]) # get the dates of the events and index the vector with the last_event_index, # added an...
Use aggregate.ts with sum, mean or whatever summary function desired. See ?aggregate.ts > aggregate(tser, 4, sum) Qtr1 Qtr2 Qtr3 Qtr4 2010 10.21558 15.22923 21.98924 30.94460 2011 39.81982 45.00208 61.26129 73.03194 2012 87.63780 97.27455 104.69757 115.09325 2013 126.71070 138.39925 145.47344 159.00137 ...
python,pandas,group-by,time-series
(I am a bit amused, as this question caught me doing the exact same thing.) You could do something like valgdata\ .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\ .mean()\ .unstack() which will reverse the groupby, unstacking the news sites to be columns. To plot, just do the previous snippet immediately followed by .plot(): valgdata\ .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\...
Try this (assuming your data is called df) ts(df$Number, start = c(2010, 01), frequency = 12) ## Jan Feb Mar ## 2010 1 19 1 Edit: this will work only if you don't have missing dates and your data is in correct order. For a more general solution see @Anandas...
r,ggplot2,time-series,timeserieschart
This is the solution: library(ggplot2) library(reshape2) library(ecp) synthetic_control.data <- read.table("/Users/geoHeil/Dropbox/6.Semester/BachelorThesis/rResearch/data/synthetic_control.data.txt", quote="\"", comment.char="") n <- 2 s <- sample(1:100, n) idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s) sample2 <- synthetic_control.data[idx,] df = as.data.frame(t(as.matrix(sample2))) #calculate the change points changeP <- e.divisive(as.matrix(df[1]), k=8, R = 400, alpha = 2, min.size = 3)...
r,datetime,merge,data.frame,time-series
It is sometimes hard to avoid loops, especially when you have conditions like you do. Sometimes we end up spending a lot of effort avoiding them when they are probably either the best we can do, or not too far behind in terms of performance and/or readability. Having said that, this...
Here is one solution to what I think you are after. Generate data: myData <- mapply(rnorm, 1000, 200, mean=seq(-50,50,0.5)) This is a matrix with 1000 rows (observations) and 201 time points. At each time point the mean of the data shifts gradually from -50 to 50, by 0.5 each step....
python,pandas,time-series,statsmodels,trend
Quick and dirty ... # get some data import pandas.io.data as web import datetime start = datetime.datetime(2015, 1, 1) end = datetime.datetime(2015, 4, 30) df=web.DataReader("F", 'yahoo', start, end) # a bit of munging - better column name - Day as integer df = df.rename(columns={'Adj Close':'AdjClose'}) dayZero = df.index[0] df['Day'] =...
Now I see what you mean. One way to handle this would be to create two time series, and use one for your calculations and plotting your data, and the other for the tic marks. Like this: library(xts) n <- 1000 d1 <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),length.out=n) d1y <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),length.out=21) d2 <- rnorm(n,10,1)...
You could use matplot as follows: matplot(cbind(xtsplot1, xtsplot2, xtsplot3), xaxt = "n", xlab = "Time", ylab = "Value", col = 1:3, ann = FALSE, type = 'l') ...
matlab,for-loop,return,time-series,volatility
I think you are a bit confused about how matrix indexing works in Matlab. If I understood correctly, you have a variable TR_t in which you want to store the value for time t. You then try to do the following: TR_t = TR_{t-1} * exp(R_t); I will try to explain...
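As a side note, the recursion TR_t = TR_{t-1} * exp(R_t) is just a cumulative product of exp(R) stored by index; a small sketch of that idea (numpy with made-up returns, not the poster's MATLAB code):

import numpy as np

R = np.random.randn(10) * 0.01            # hypothetical returns
TR = np.empty(len(R) + 1)
TR[0] = 1.0                               # initial value
for t in range(1, len(TR)):
    TR[t] = TR[t - 1] * np.exp(R[t - 1])  # store by index rather than overwriting one variable

# equivalent, without the loop
TR_vec = np.concatenate(([1.0], np.exp(np.cumsum(R))))
assert np.allclose(TR, TR_vec)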
As was pointed out in the documentation from ?lag.xts, this is the intended behavior.
javascript,django,python-2.7,highcharts,time-series
This is a good case for the pointStart and pointInterval properties, on a datetime type x axis. Example: http://jsfiddle.net/jlbriggs/92gkjwo3/ You can use the axis label formatter function, and the tickInterval properly to define the placement and format of the labels. References: http://api.highcharts.com/highcharts#xAxis.labels.formatter http://api.highcharts.com/highcharts#plotOptions.series.pointInterval http://api.highcharts.com/highcharts#plotOptions.series.pointStart ...
This seemed to work. I got tripped up by trying to use the date as the bottom window cutoff, but then the duplicates add 0's as a min where there wouldn't be otherwise. as.POSIXct might not be necessary depending on what format your date is. I also used 60 seconds...
First, we will define the two thresholds you specified. (I set the second one to 4 so we can work consistently with "<" and ">", instead of the error-prone "<" and ">="). threshold.data <- 10 threshold.NA <- 4 Now, the key is to work with run length encoding on is.na(y)....
cassandra,apache-spark,time-series,cql
"I'm trying to understand what exactly happens internally at the storage engine level when a row (columns) is inserted in a CQL-style table." Let's say that I build tables with both of your PRIMARY KEYs, and INSERT some data: cqlsh:stackoverflow2> SELECT userid, time, dateof(time), category, subcategory, itemid, count, price FROM...
r,time-series,shiny,forecasting
The issue wound up being that I was using the arima(...) function instead of Arima(...). It turns out that they are two different functions. The issue that I was experiencing was a result of differences in how the functions store their data. More information about this can be found in...
Your timestamps are in milliseconds. You need to convert them to seconds to be able to use them with as.POSIXct. And there's no point in calling strptime on a POSIXct vector. Also, it's good practice to explicitly set the timezone, rather than leave it set to "". df$datetime <- as.POSIXct(df$timestamp/1000,...
sql,.net,sql-server,time-series
1) You probably want to explore the use of partitions. This will allow very effective inserts (it's a metadata operation if you do the partitioning correctly) and is very fast. 2) You may want to explore columnstore indexes, because the data (once collected) will never change and you will have very...
The easiest approach is to fit a nested model with interactions rather than two separate models. So you can first generate a factor that encodes the two segments: fac <- factor(as.numeric(time(zoop) > as.Date("2005-01-24"))) fac <- zoo(fac, time(zoop)) And then you can fit a model where all coefficients are constrained to...
python,pandas,time-series,freeze
Your approach looks a little complicated ... I hope my simplification is what you need ... # get an index of pandas Timestamps df.index = pd.to_datetime(df.Date + ' ' + df.Time) # get the column we want as a pandas Series called price price = df['Close'] Update # use...
matlab,statistics,time-series,histogram,fractals
Your code seems to be generally bug-free but I made some changes since you perform needless repetitions over loops (I moved the outer loop inside and "vectorized" it since all moment calculations can be performed simultaneously for a given histogram. Also, it is building the histogram that takes longest). intel...
Since you are specifying the time and date in the keys, you can do this by projecting the keys you want displayed. So if you wanted the week from 16 to 22 February, you could do something like this: db.servers.find( { "_id": "i-09484d47_201502" }, { "values.16": 1, "values.17": 1, "values.18":...
Try this (short is the name of your 2nd matrix): res <- as.matrix(merge(long.date.col, short, all.x = T)) res[is.na(res)] <- "-9999" ...
You do not need time series, just tapply: res=tapply(AVG_LOSCAT2$AVG_LOSCAT, list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH), round,2) res month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 NA NA NA NA NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30 2013 5.74 7.89 6.49 7.09 5.91...
I took the liberty of defining a simple fnc function. The idea is to loop over the indices of n_lens and not over its values. Nested for loops may be (will be?) slower in R compared to other approaches. It produces the required output. fnc <-...
cassandra,time-series,composite-key
What I intended with this question was more like this: Cassandra storage internals. Check it out....
matlab,oop,time-series,linear-regression,superclass
To determine this, you can use the superclasses function: superclasses('LinearModel') superclasses('GeneralizedLinearMixedModel') This will return the names of the visible superclasses for each case. As you'll see, both inherit from the abstract superclass classreg.regr.ParametricRegression. You can also view the actual classdef files and look at the inheritances. In your Command Window,...
python,date,datetime,pandas,time-series
After understanding what you want, this is much simpler: we calculate whether the difference between the current and previous rows is larger than 5 days, giving us a boolean series; we use this to filter the df and then use the index value to perform slicing: In [57]: inactive_index =...
I found a way of doing this, though I'm not too happy about it: full_index = [] for g in all_genders: for s in all_states: for m in all_months: full_index.append((g, s, m)) df = df.set_index(['Gender', 'State', 'Month']) df = df.reindex(full_index) # fill in all missing values So basically, instead of dealing...
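A slightly tidier variant of the same idea, assuming the column names from the snippet, is to build the full index with pd.MultiIndex.from_product instead of nested loops:

import pandas as pd

# stand-ins for all_genders / all_states / all_months and the original frame
all_genders, all_states, all_months = ['F', 'M'], ['CA', 'NY'], [1, 2, 3]
df = pd.DataFrame({'Gender': ['F', 'M'], 'State': ['CA', 'NY'],
                   'Month': [1, 2], 'Value': [10, 20]})

full_index = pd.MultiIndex.from_product([all_genders, all_states, all_months],
                                        names=['Gender', 'State', 'Month'])
df = df.set_index(['Gender', 'State', 'Month']).reindex(full_index)  # fill in all missing rows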
python,pandas,time-series,ipython-notebook
Your problem (as spotted by @J Richard Snape) is that your dates are in fact strings, so they are ordered lexicographically. You should convert to datetime dtype: df1['Ship_date'] = pd.to_datetime(df1['Ship_date']) After that, it should maintain the expected order....
Not really sure why you want this, but here you go: library(data.table) dt = as.data.table(E) # or convert in place using setDT dt[, .(contract_len = as.numeric(difftime(Date[.N], Date[1], unit = 'days')), first_pay = Date[1], last_pay = Date[.N], num_payments = .N, payment = sum(Amt), summary = list(data.table(Date, Amt))) , by = ID]...
There is an answer on the Mathworks website that I think you will find helpful: http://www.mathworks.com/matlabcentral/answers/92565-how-do-i-control-axis-tick-labels-limits-and-axes-tick-locations. Basically what you want to do is manipulate the XTick or XTickLabel attributes of the current axis handle. Let's say I have a plot that spans 100 years from 1900 to 2000. After creating...
Here's something that's almost your dataframe (I avoided copying the dates): df = pd.DataFrame({ 'col1': [1, 1, 1, 2, 2, 2], 'col2': [1, 2, 3, 1, 2, 3], 'date': [1, 9, 10, 10, 10, 25] }) With this, define: def max_diff_date(g): g = g.sort(columns=['date']) return g.col2.ix[(g.date.ix[1: ] - g.date.shift(1)).argmax() -...
I think the answer is that you can specify the alpha for each test via the iParams and sparams arguments. Without such a user specification, each test has a default alpha. The button to "Answer Your Question" doesn't seem to be working, so here it is, in the Comments.
Give this a try. Using map to pull directly from your series of averages df["diff"] = df["snow_depth"] - df["month"].map(nameofyourseries) year month snow_depth diff 0 1979 1 18.322581 3.937382 1 1979 2 11.535714 -3.776587 2 1979 3 5.322581 -1.187855 3 1979 4 0.300000 0.031092 4 1979 5 0.000000 -0.005819 5 1979...
r,graph,plot,ggplot2,time-series
Two ways of doing this. If the sample data is created as follows: Full.df <- data.frame(Date = as.Date("2006-01-01") + as.difftime(0:364, units = "days")) Full.df$Month <- as.Date(format(Full.df$Date, "%Y-%m-01")) Full.df[paste0("Count.", c("S", "G", "W", "F"))] <- matrix(sample(100, 365 * 4, replace = TRUE), ncol = 4) The optimal way, using the reshape2 package: molten <- melt(Full.df, id.vars ...
Create a new data table with the sundays data: MUSEUS_PLOT_SUNDAYS <- MUSEUS_PLOT[weekdays(MUSEUS_PLOT$VisitDate) == "Sunday"] And change the geom_vline for this: geom_vline(data = MUSEUS_PLOT_SUNDAYS,aes(xintercept = as.numeric(VisitDate)),colour = "black") ...
R has multiple ways of representing time series. Since you're working with daily prices of stocks, you may wish to consider that financial markets are closed on weekends and business holidays, so that trading days and calendar days are not the same. However, you may need to work with your...
I solved the direct question, so this is technically the answer, although I don't completely understand why. I read through the hts code using the trace() function and found the line causing the issue: else if (fmethod == "arima") { models <- auto.arima(x, lambda = lambda, xreg = xreg, parallel...
r,nested,time-series,lapply,sapply
Using plyr: As a matrix (time in cols, rows corresponding to rows of df): aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE) As a list: alply(df, 1, function(x) weisurv(t, x$sc, x$shp)) As a data frame (structure as per matrix above): adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t)) As a...
python,time-series,statsmodels,autoregressive-models
There is nothing wrong. That's the behavior of a stationary ARMA process where predictions converge to the mean. If you have fixed seasonality, then you could difference the time series at the seasonal lag, i.e. use a SARIMA, and the prediction would converge to a fixed seasonal structure. If you...
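A minimal sketch of the seasonal-differencing suggestion, using statsmodels on simulated monthly data (not the asker's series; the model orders here are illustrative only):

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

idx = pd.date_range('2000-01', periods=120, freq='M')
y = pd.Series(10 + 3 * np.sin(2 * np.pi * np.asarray(idx.month) / 12)
              + np.random.randn(120) * 0.5, index=idx)

# seasonal differencing at lag 12, so forecasts keep the seasonal shape
# instead of converging to a flat mean
fit = SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12)).fit(disp=False)
print(fit.forecast(steps=24))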
try using mean instead of sum like this ggplot(data = df, aes(x = Month, y = Count.V)) + stat_summary(fun.y = mean, geom ="line")+ stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) + geom_point()+ scale_x_date(labels = date_format("%m-%y"), breaks = "3 months") ...
You can use data.table too. data.table is a very powerful data manipulation package. You can get started here. library("data.table") as.data.table(testdata)[, lapply(.SD, function(x)x/shift(x) - 1), .SDcols = 2:4] gdp cpi_index rpi_index 1: NA NA NA 2: 0.006427064 0.0072281257 0.009296686 3: 0.007166400 0.0030245056 0.004805767 4: 0.004061822 0.0061020008 0.006377043 5: 0.006772674 0.0009282349 0.005544554...
Your values variable is a factor (usually used for categorical values). Convert values to numeric before creating time series: values <- as.numeric(levels(values))[values] ...
For a ggplot2 plot first convert df to long form (using melt from the reshape2 package), convert the date column to "Date" class and the value column to a factor and then use geom_tile: library(ggplot2) library(reshape2) long <- melt(df, measure.var = 2:4) long <- transform(long, date = as.Date(long$date, "%d/%m/%Y"), value...
I will try to iterate toward an answer, but since there are so many branches of discussion, I prefer to address them directly in this format. In any case, this is a constructive process, which is the purpose of this forum... Some preliminary "clarifications": the output covariance from EstSpec.Q after and before running the...
You are not allowing dynlm to use the same amount of data as in lm. The latter model contains two fewer observations. dim(model.frame(reg1)) # [1] 24 7 dim(model.frame(lmx)) # [1] 22 7 The reason is that with lm you are transforming the variables (differencing) with the entire data set (31 observations),...
If your data has the value 4199, this means that you included the date column when trying to form your ts object. Since you specified the start and frequency of your time series in your ts function, you no longer need the date values as it will be generated by...
matlab,time-series,libsvm,forecasting
A Support-Vector-Regression based predictor is used for exactly that; it works for a prediction horizon PH >= 1. The value of epsilon in the epsilon-SVR model specifies the epsilon-tube, within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value Y(t)....
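To make the epsilon-tube and the PH >= 1 setup concrete, here is a hedged scikit-learn sketch (the question concerns libsvm in MATLAB; this is only the analogous epsilon-SVR call, with made-up lag features):

import numpy as np
from sklearn.svm import SVR

y = np.sin(np.linspace(0, 20, 300))       # hypothetical series
PH, lags = 1, 3                           # prediction horizon and number of lag features
X = np.column_stack([y[i:len(y) - PH - lags + i + 1] for i in range(lags)])
target = y[lags + PH - 1:]                # value PH steps after the last lag

model = SVR(kernel='rbf', C=1.0, epsilon=0.01)  # epsilon sets the no-penalty tube width
model.fit(X, target)
print(model.predict(X[-5:]))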
python,pandas,match,time-series,multi-index
This method is a little messy, but I am trying to make it more robust to account for missing data. First, we'll remove duplicates in the data and then convert the dates to Pandas Timestamps: df = df.drop_duplicates() df.SampleDate = [pd.Timestamp(ts) for ts in df.SampleDate] Then let's arrange your DataFrame...
There must be more than 2 periods, so frequency must be less than n/2 n = 1000 x = ts(0.1*rnorm(n) + sin(6*pi*(1:n)/n) + (1:n)/n, frequency=n/2.1) plot(x) stl(x,"per") ...
set.seed(1) r <- rnorm(20,0,1) z <- c(1,1,1,1,1,-1,-1,-1,1,-1,1,1,1,-1,1,1,-1,-1,1,-1) data <- as.data.frame(na.omit(cbind(z, r))) series1 <- ts(cumsum(c(1,data[,2]*data[,1]))) series2 <- ts(cumsum(c(1,data[,2]))) d1y <- seq(as.Date("1991-01-01"),as.Date("2015-01-01"),length.out=24) matplot(cbind(series1, series2), xaxt = "n", xlab = "Time", ylab = "Value", col = 1:3, ann = TRUE, type = 'l', lty = 1) axis(1, at=seq(2,20,2), labels=format(d1y[seq(2,20,2)],"%Y")) ...
r,datetime,time-series,forecasting
Here is a simple example assuming weekly data: x <- ts(rnorm(200), frequency=52) endx <- end(x) window(x, end=c(endx[1],endx[2]-3)) Of course, there are not actually 52 weeks in a year, but that is probably a complication that can be overlooked for most analyses....
As per ?zoo: Subscripting by a zoo object whose data contains logical values is undefined. So you need to wrap the subsetting in a which call: log_ret[which(!is.finite(log_ret))] <- 0 log_ret x y z s p t 2005-01-01 0.234 -0.012 0 0 0.454 0 ...
d3.js,time-series,timeline,timeserieschart
Here is an augmentation of your JS fiddle, Demo: http://jsfiddle.net/robschmuecker/c8txLxo9/ It takes the data you have and then parses it to get a collection of years so that we only insert one dom element per year rather than several. Then we can conditionally add events for years which have more...
mongodb,mapreduce,time-series,mongodb-query,nosql-aggregation
This will be hard to achieve using the aggregation framework. But it "works" well with MapReduce. Something along these lines (untested): // collect *individual* values map = function() { for (var min in this.values) for (sec in this.values[min]) data = {value: {}, count: {}} data.value[this.name] = this.values[min][sec] data.count[this.name]...
Mannat, here is an answer using the data.table package to help you aggregate. Use install.packages("data.table") to install it first. library(data.table) # For others # I copied your data into a csv file, Mannat you will not need this step, # other helpers look at data in DATA section...
I think there are two critical points: (1) sorting by Year and Term so that the order corresponds to temporal order; and (2) using groupby to collect on IDs before selecting and shifting the Rating. So, from a frame like >>> df ID Year Term Rating 0 1 2010 0...
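A minimal sketch of those two steps, with the column names from the snippet and made-up data:

import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2],
                   'Year': [2010, 2010, 2011, 2010, 2011],
                   'Term': [0, 1, 0, 1, 0],
                   'Rating': [3, 4, 5, 2, 1]})

# (1) sort so rows are in temporal order, (2) shift Rating within each ID
df = df.sort_values(['ID', 'Year', 'Term'])
df['prev_Rating'] = df.groupby('ID')['Rating'].shift(1)
print(df)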
r,statistics,time-series,correlation,xts
What about using rollapply in a different way? Since you don't supply the complete dataset, here is a demonstration of what I mean: set.seed(123) m <- matrix(rnorm(100), ncol = 10) rollapply(1:nrow(m), 5, function(x) cor.mean(m[x,])) [1] -0.080029692 -0.038168840 -0.058443824 0.005699772 -0.014459878 -0.021569173 As I just figured out, you can also use the function...
I don't see the xts frequency argument doing the same thing as the ts frequency argument. So, I assume you need to convert your data into a ts object before you use decompose. The way I got it to work is the following: Using the following data: data(sample_matrix) df <-...
ts.intersect determines whether an object is a ts object by looking for the tsp attribute. as.xts.ts removes the tsp attribute, which is why it is not coerced back to a ts object. This looks like a bug in the xts->ts->xts conversion, but I need to take a closer look. As a...
java,android,gps,time-series,kalman-filter
The answer is simple: SensorEvent.timestamp has an arbitrary zero reference. It turns out after a bit of Googling (tip o' the hat to StackOverflow, as usual) that the timestamp one receives isn't based off of any particular zero point defined in the Android OS or the API; it's an arbitrary...
time-series,sampling,measurement,probability-theory
I'm going to approach this problem as if it were on a test. First, let's name the variables. Bx is the value of the boolean variable after x opportunities to flip (B0 being the initial state), and P is the chance of changing to a different value at every opportunity. Given that...
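For reference, under the usual assumption that the flip opportunities are independent, this two-state process has the standard closed form P(Bx = B0) = (1 + (1 - 2P)^x) / 2; a quick simulation check in Python with made-up parameters:

import random

P, x, trials = 0.3, 5, 200000
same = 0
for _ in range(trials):
    state = True                     # B0
    for _ in range(x):
        if random.random() < P:      # flip with probability P at each opportunity
            state = not state
    same += state                    # count runs that end equal to B0
print(same / trials, (1 + (1 - 2 * P) ** x) / 2)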
Use the dynlm package. Here is an example using the data you supplied: library(dynlm) dfX = read.table( textConnection( "Date YY XX ZZ MM 03.01.2005 2.154 2.089 0.001 344999 04.01.2005 2.151 2.084 0.006 344999 05.01.2005 2.151 2.087 -0.007 333998 06.01.2005 2.15 2.085 -0.005 333998 07.01.2005 2.146 2.086 -0.006 333998 10.01.2005 2.146...
r,time-series,permutation,quantmod
If you wanted to get a list of data frames, one for each pair, you could try: dfs <- lapply(seq_len(ncol(perm)), function(x) close[,paste0(perm[,x], ".Close")]) Now you can get the 2-column data frames for each pair with dfs[[1]], dfs[[2]], etc. You can perform statistical analyses on each pair using the lapply function....
r,data.frame,time-series,weekend
Find the first Saturday in your data, then assign a week ID to all dates in your data set based on that : library(lubridate) # for the wday() and ymd() functions daily_FWIH$Date <- ymd(daily_FWIH$Date) saturdays <- daily_FWIH[wday(daily_FWIH$Date) == 7, ] # filter for Saturdays startDate <- min(saturdays$Date) # select first...
Using rowsum seems to be faster (at least for this small example dataset) than the data.table approach: sgibb <- function(datframe) { data.frame(Group = unique(df$Group), Avg = rowsum(df$Weighted_Value, df$Group)/rowsum(df$SumVal, df$Group)) } Adding the rowsum approach to @platfort's benchmark: library(microbenchmark) library(dplyr) library(data.table) microbenchmark( Nader = df %>% group_by(Group) %>% summarise(res = sum(Weighted_Value)...
time-series,forecasting,state-space
The state vector is exactly the same in the multiplicative case as in the additive case. All the equations are given here: https://www.otexts.org/fpp/7/7 For the ETS(M,Md,N) model, ...
The error message says that you should use as.POSIXct on lims. You also need to add the date (year, month and day) in lims, because by default it will be 2015, which is outside the limits. lims <- as.POSIXct(strptime(c("2011-01-01 03:00","2011-01-01 16:00"), format = "%Y-%m-%d %H:%M")) ggplot(df, aes(x=dates, y=times)) + geom_point() +...
You are very close. You want the x-axis to be a measure of where in the year you are, but you have it as a character vector and so are getting every single point labelled. If you instead make a continuous variable represent this, you could have better results. One...
You can try Reduce(function(...) merge(..., by=c('Date', 'Month', 'Week', 'Year'), all=TRUE), list(Standard.df, Guardian.df, Welt.df)) ...
database,rest,time-series,publish-subscribe,iot
If you want a single solution, try ATSD; it does all of the above.
database,cassandra,time-series,data-modeling,cql
Writing this question has helped me sort out some of my problems. I've come up with an alternative solution which I am more-or-less happy with but will need some fine-tuning. There is the possibility of calculating all of the time buckets we need to access, making a query for each...
So (X, y) is your train set (356 data instances with their labels). To forecast the first month of the next year, your SVR model needs a data set X_nextMonth (30 data instances with the same features as those of X) to pass as an argument to its .predict() method, so that it...
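A small sketch of that train/forecast split (the feature matrix is made up; X_nextMonth stands for whatever features you actually have for the 30 days to be forecast):

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(356, 4)                          # training features, one row per day
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.randn(356) * 0.1

model = SVR(kernel='rbf', C=10.0, epsilon=0.05).fit(X, y)

X_nextMonth = rng.rand(30, 4)                 # features for the next 30 days
y_nextMonth = model.predict(X_nextMonth)
print(y_nextMonth.shape)                      # (30,)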
NeweyWest calculates the 'lag' with this code: lag <- floor(bwNeweyWest(x, order.by = order.by, prewhite = prewhite, ar.method = ar.method, data = data)) ... and when called with the default arguments it reproduces your error (and my replication of it): >bwNeweyWest(m2,lag = NULL, order.by = NULL, prewhite = TRUE, adjust =...
I had to augment your example to get something to play with, but here is something that works. And I just changed it to eliminate lubridate... library(xts) d1 <- seq(as.Date("2001-01-01"),as.Date("2021-01-01"),"years") d2 <- rnorm(21,10,1) Dollar <- data.frame(d1,d2) dates <- as.Date(Dollar[,1], "%d.%m.%Y",tz="GMT") xtsplot <- as.xts(Dollar[,2], dates) plot(xtsplot, xaxt = "n", main="SMA", ann...
tabout from SSC may work for you: clear set more off *----- example data set ----- input /// id year occup 1 1999 1 1 2000 1 1 2001 1 2 1999 1 2 2000 2 2 2001 1 3 1999 1 3 2000 2 3 2001 2 4 1999...
the following worked for me: # create some random data with datetime index spanning 17 months s = pd.Series(index=pd.date_range(start=dt.datetime(2014,1,1), end = dt.datetime(2015,6,1)), data = np.random.randn(517)) In [25]: # now calc the mean for each month s.groupby(s.index.month).mean() Out[25]: 1 0.021974 2 -0.192685 3 0.095229 4 -0.353050 5 0.239336 6 -0.079959 7...
You can aggregate with a data.table: library(data.table) # This turns all Jans to 1 and Decs to 12, for example mth <- month(as.Date(df$date)) dt2 <- as.data.table(df) # turn df into data table dt2 dt2[, mth := mth] # pop month into your data table setkey(dt2, "mth") # data tables...
If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...
On Friday, May 29, 2015 at 2:05:06 PM UTC-4, Rick wrote: (1) No, not necessarily. Turning off a flag (i.e., setting a particular element of an input "solve" flag to logical FALSE) holds the corresponding parameter value fixed throughout the estimation. For example, if, say, the 3rd element of the...
python,datetime,numpy,pandas,time-series
You can do this using the groupby, just subtract each group's mean from the values for that group: average_diff = ts.groupby([ts.index.month, ts.index.day]).apply( lambda g: g - g.mean() ) ...
python,time-series,scikit-learn,regression,prediction
Here is my guess about what is happening in your two types of results: .days does not convert your index into a form that repeats itself between your train and test samples. So it becomes a unique value for every date in your dataset. As a consequence your models either...
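One hedged way to get features that do repeat between train and test samples is to expand each date into calendar components instead of a raw day count, for example:

import pandas as pd

dates = pd.date_range('2014-01-01', '2015-06-30', freq='D')
X = pd.DataFrame({'month': dates.month,
                  'dayofweek': dates.dayofweek,
                  'dayofyear': dates.dayofyear}, index=dates)
# these columns take the same values in the train and test years,
# unlike (date - start).days, which is unique for every row
print(X.head())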