I think the error is caused by a misunderstanding of R's syntax for defining a function (and a further error in not knowing that column names such as "month" are not available as global variables). Try instead: multifactorglm <- function(x){ glm(rained ~ temp + humidity, data=x, family="binomial") } do.call(rbind, do(df,...
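A minimal sketch of the same idea, assuming the built-in mtcars data and with lm() swapped in for the glm() call above (fit_one, models and coefs are illustrative names, not from the original question). The model function takes the data frame as its argument and passes it through data=, so the column names are resolved inside that data frame rather than as global variables:

```r
# Illustrative only: mtcars and lm() stand in for the original data and glm() call.
fit_one <- function(x) {
  # mpg and wt are looked up in x via data = x, not in the global environment
  lm(mpg ~ wt, data = x)
}

# Split the data by cylinder count and fit one model per piece
models <- lapply(split(mtcars, mtcars$cyl), fit_one)

# Bind the per-group coefficients into one matrix, one row per group
coefs <- do.call(rbind, lapply(models, coef))
```

Calling fit_one(mtcars) on its own works the same way; the split/lapply step is just one way to apply it per group.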
Edit: I got bit by a subtle data.table behavior. data.table keeps keys on summarized data, but only the ones you summarized on. So the join wasn't doing what I thought it was doing. Here is the exact same logic, but with one interim step to unset the partial key on...
r,plyr,tapply,split-apply-combine
Answer using data.table package: > dt <- data.table(eg = letters[1:8], Type=rep(c("F","W"), 4)) > a <- dt[, paste(eg, collapse=" "), by=Type] > a Type V1 1: F a c e g 2: W b d f h The bonus of using data.table is that this will still run in a few...
Normally the output is ordered, but you can come up with examples where it is not, for example if you have a factor whose levels are not in sorted order. df <- data.frame(Name = factor(c('Ben', 'Al'), levels = c('Ben', 'Al')), Earnings = c(1, 4)) tapply(df$Earnings, df$Name, sum) ## Ben Al ## 1 4 In that...
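If a predictable order is needed regardless of how the levels were defined, one option is to sort the result by its names afterwards. A small sketch using the same df (res and sorted are illustrative names):

```r
# Same data as above: factor levels deliberately not in alphabetical order
df <- data.frame(Name = factor(c("Ben", "Al"), levels = c("Ben", "Al")),
                 Earnings = c(1, 4))

res <- tapply(df$Earnings, df$Name, sum)
names(res)      # follows the level order: "Ben" "Al"

# Reorder the result alphabetically by group name
sorted <- res[order(names(res))]
names(sorted)   # "Al" "Ben"
```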
r,matrix,rstudio,scatter-plot,tapply
You can subset using typical methods for row subsetting; using which() is simple. For example, I want a scatterplot matrix of a few columns of mtcars, but I'm only interested in the rows where cyl is 4. pairs(mtcars[which(mtcars$cyl==4),c('disp','hp','drat')]) ...
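One practical difference between which() and a bare logical index shows up when the filter column contains NA. mtcars has no missing values, so the sketch below uses a made-up data frame d (hypothetical, not from the question):

```r
# Hypothetical toy data with a missing value in the filter column
d <- data.frame(cyl = c(4, NA, 6, 4), hp = c(90, 100, 110, 95))

by_logical <- d[d$cyl == 4, ]         # the NA comparison yields an all-NA row
by_which   <- d[which(d$cyl == 4), ]  # which() keeps only the TRUE positions

nrow(by_logical)  # 3
nrow(by_which)    # 2
```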
Could try doing this via tapply tapply(seq_len(ncol(X)), cluster, function(x) f(T%*%X[, x])) # 0 1 # 3840.681 1238.826 ...
As mentioned in the comments, you missed an rnorm(). You can also use the vector c("control","ConditionB","ConditionC") and times = 300 instead of repeating rep() 3 times. Column1=rep(c("control","ConditionB","ConditionC"), times = 300) Column2=rnorm(900,mean=100,sd=10) data=data.frame(Column1,Column2) tapply(data$Column2,data$Column1,mean) ...
You may need to use cut mat <- tapply(moneyspent, list(gender, age=cut(age, breaks=c(20,30,40), include.lowest=TRUE)), mean) nm1 <- outer(rownames(mat), colnames(mat), FUN=paste) setNames(c(mat), nm1) #female [20,30] male [20,30] female (30,40] male (30,40] # 300 150 450 150 Other options include library(dplyr) data %>% group_by(gender, age=cut(age, breaks=c(20,30,40), include.lowest=TRUE)) %>% summarise(moneyspent=mean(moneyspent)) Or library(data.table) setDT(data)[, list(moneyspent=mean(moneyspent)),...
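To see what cut() does with these breaks on its own, here is a small sketch with made-up ages (the age vector is hypothetical). With include.lowest=TRUE the first interval is closed on the left, which is why the labels read [20,30] and (30,40]:

```r
age <- c(20, 25, 30, 31, 40)   # hypothetical ages

bins <- cut(age, breaks = c(20, 30, 40), include.lowest = TRUE)
levels(bins)   # "[20,30]" "(30,40]"
table(bins)    # 3 values in the first bin, 2 in the second

# Without include.lowest, a value equal to the lowest break falls outside every bin
cut(20, breaks = c(20, 30, 40))   # NA
```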
To get the Tukey HSD for each compound as you've specified, try this: lapply(unique(t.df$Compound), function(x, df) TukeyHSD(aov(glm(Proportion ~ Treatment, data = df, subset = Compound == x)))[[1]], df = t.df) For each unique compound, this calls TukeyHSD() on an ANOVA for a general linear model fit on the subset of...
To build on @jvcasill's original function and on other users' responses: dPrime <- function (data, subj = 1, stimDiff = 2, stimSame = 3) { # dPrime() returns a vector of the length of the number of subjects #+ in data[, subj] that contains the sensitivity index "d'" for each....
Try this... set.seed(42) df0 <- data.frame( X = rnorm(100,10,10), Y = rnorm(100), Z = rnorm(100)) df0$seq <- as.integer(df0$X) library(data.table) dt = data.table(df0) dt[,lapply(.SD, sum), by=seq ] seq X Y Z 1: 23 164.8144774 1.293768670 -3.74807730 2: 4 8.9247301 1.909529066 -0.06277254 3: 13 40.2090180 -2.036599633 0.88836392 4: 16 147.8571697 -2.571487358 -1.35542918...
You can use by and reference the rownames of the row returned by which.max: Data[by(Data, Data$G, function(dat) rownames(dat)[which.max(dat$X)] ),] # X Y G #4 1.595281 -0.3309078 1 #61 2.401618 0.9510128 2 #147 2.087167 0.9160193 3 #171 2.307978 -0.3887222 4 (This assumes set.seed(1) for reproducibility's sake)...
To give you an example of how this would work with the dplyr package if you want to calculate the means of Handle by groups of Period AND Queue: require(dplyr) ctrlmeans <- #data.frame to store your results ttp1 %>% #data.frame to use for analysis group_by(Period,Queue) %>% #grouping variables (you can...
r,plot,histogram,rstudio,tapply
Try this: par(mfrow = c(2, 2)) tapply(iris$Sepal.Length, iris$Species, hist) However, for multi-panel plots you might find the lattice or ggplot2 packages more suitable. library(lattice) histogram(~ Sepal.Length | Species, iris) library(ggplot2) ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_wrap(~ Species) ...
Try with(iris[!iris$Petal.Width %in% c('1.3', '1.5'),], tapply(Sepal.Length, Species, median)) # setosa versicolor virginica # 5.0 5.8 6.5 The idea here is to operate on the subsetted data in the first place. Your line didn't work because the FUN argument should be applied on X (Sepal.Length in your case) rather than over the...
You can wrap it with invisible so that the list elements don't get printed in the R console. invisible(sapply(rownames(c), function(x) cat("Gear", x, "contains", c[x], "qsec\n"))) Gear 3 contains 15 qsec Gear 4 contains 12 qsec Gear 5 contains 5 qsec ...
You tagged your question with tapply, so here's a tapply answer: tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean) # 1 2 3 # 1 -0.42965680 0.6943236 0.04505399 # 2 0.55021401 -0.3138895 -0.40966078 # 3 0.05676266 0.5212944 0.12521106 data.frame(as.table( tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean))) # Var1 Var2 Freq #...
If you use the data.table package, this is very easy: install.packages("data.table") library(data.table) DF = data.table(DF) DF[,No_Days:=unlist(lapply(rle(Runoff>0.05)$lengths,function(x) rev(seq(x:1)))),by=Soil] DF[Runoff <= 0.05, No_Days:=0] ...
There is a drop argument in dcast that you can use in this situation, but you'll need to convert subject to a factor. Here is a dataset with a second subject ID that has no values that meet your condition that the absolute value of z.score is less than one....
Try mapply(function(x,y) tapply(x,y, FUN=mean) , Example[seq(1, ncol(Example), 2)], Example[seq(2, ncol(Example), 2)]) Or instead of seq(1, ncol(Example), 2) just use c(TRUE, FALSE) and c(FALSE, TRUE) for the second case...
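The logical-vector shortcut works because R recycles c(TRUE, FALSE) across all columns. A quick sketch with a made-up Example data frame (its layout, value columns alternating with grouping columns, is an assumption about the original data):

```r
# Hypothetical stand-in for Example: value columns alternate with grouping columns
Example <- data.frame(v1 = c(1, 2, 3, 4),     g1 = c("a", "a", "b", "b"),
                      v2 = c(10, 20, 30, 40), g2 = c("x", "y", "x", "y"))

odd_by_seq  <- Example[seq(1, ncol(Example), 2)]
odd_by_lgl  <- Example[c(TRUE, FALSE)]   # recycled: picks columns 1, 3, ...
even_by_lgl <- Example[c(FALSE, TRUE)]   # recycled: picks columns 2, 4, ...

# Group means of each value column by the grouping column next to it
res <- mapply(function(x, y) tapply(x, y, FUN = mean), odd_by_lgl, even_by_lgl)
```

Both indexing styles select the same columns; the logical form simply avoids spelling out ncol(Example).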
You do not need time series, just tapply: res=tapply(AVG_LOSCAT2$AVG_LOSCAT, list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH), round,2) res month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 NA NA NA NA NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30 2013 5.74 7.89 6.49 7.09 5.91...
Your example is slightly problematic, since if you set: df <- data.frame(my_vector = rnorm(150), my_factor1 = gl(3,50), my_factor2 = gl(2,75) ) You will have only one unique value for my_factor2 when my_factor1 = 1 or 3 because of how your repetitions overlap. See ?gl. So do: df <- data.frame(my_vector =...
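The overlap is easy to check by cross-tabulating the two gl() calls from the first definition (f1, f2 and tab are illustrative names):

```r
f1 <- gl(3, 50)   # level 1 x 50, then level 2 x 50, then level 3 x 50
f2 <- gl(2, 75)   # level 1 x 75, then level 2 x 75

# Count how often each combination of levels occurs
tab <- table(my_factor1 = f1, my_factor2 = f2)
tab
# Rows 1 and 3 each contain an empty cell; only row 2 is split 25/25
```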
I think you want this: tapply(q,# the variable to be summarized v,# the variable that defines the bins function(x) # the function to calculate the summary statistics within each bin sum(x)/length(x)) ...
I also found that I can simply convert the tapply() output with strptime() afterwards, via a data.frame(), rather than trying to do it beforehand, and then order() by date: Data$Date <- as.factor(Data$Date) DAVEH <- tapply(Data$Humidity,Data$Date, FUN = mean) site.daily<-data.frame(c(names(DAVEH)),c(DAVEH)) rownames(site.daily)<-seq_len(nrow(site.daily)) colnames(site.daily)<-c("Date","DAVEH") site.daily$Date<-strptime(site.daily$Date, format="%d/%m/%Y") site.daily<-site.daily[order(site.daily$Date),]...
Using dplyr and reshape2: library(dplyr) library(reshape2) df %>% group_by(Transect) %>% summarise(A = sum(Area[Point %in% c(1, 2, 3)]), B = sum(Area[Point %in% c(3, 4, 5)]), C = sum(Area[Point %in% c(5, 6, 1)])) %>% melt() ...