R how to do multiple GLMs for each level for a factor in my data.frame?

r,loops,glm,tapply

I think the error comes from a misunderstanding of R's syntax for defining a function (and a further error in not knowing that column names such as "month" are not available as global variables). Try instead: multifactorglm <- function(x){ glm(rained ~ temp + humidity, data=x, family="binomial") } do.call(rbind, do(df,...
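
As a rough illustration of the point about `data=x`, here is a minimal base-R sketch that fits one model per factor level using `split()` and `lapply()`. The data frame is simulated and hypothetical; only the column names (`month`, `rained`, `temp`, `humidity`) are taken from the excerpt, and this is not necessarily how the truncated answer continues.

```r
# Hypothetical data: only the column names come from the excerpt above.
set.seed(1)
df <- data.frame(
  month    = factor(rep(c("Jan", "Feb", "Mar"), each = 40)),
  temp     = rnorm(120, mean = 15, sd = 5),
  humidity = runif(120, min = 30, max = 90)
)
df$rained <- rbinom(120, size = 1, prob = plogis(-3 + 0.04 * df$humidity))

# The function receives one sub-data-frame; columns are looked up via `data = x`,
# not as global variables.
multifactorglm <- function(x) {
  glm(rained ~ temp + humidity, data = x, family = "binomial")
}

# One fitted model per level of `month`.
fits <- lapply(split(df, df$month), multifactorglm)

# Stack the coefficient vectors into one row per month.
coefs <- do.call(rbind, lapply(fits, coef))
```

`split()` orders the pieces by the factor's levels, so the rows of `coefs` follow `levels(df$month)`.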

R - Finding minimum values based on multiple conditions and returning one or multiple created strings based on the minimum value

r,tapply

Edit: I got bit by a subtle data.table behavior. data.table keeps keys on summarized data, but only the ones you summarized on. So the join wasn't doing what I thought it was doing. Here is the exact same logic, but with one interim step to unset the partial key on...

Collapse a character vector by value in another column r [duplicate]

r,plyr,tapply,split-apply-combine

Answer using data.table package: > dt <- data.table(eg = letters[1:8], Type=rep(c("F","W"), 4)) > a <- dt[, paste(eg, collapse=" "), by=Type] > a Type V1 1: F a c e g 2: W b d f h The bonus of using data.table is that this will still run in a few...

R - Is the result of tapply always in alphabetical order

r,tapply

Normally the output is ordered, but you can come up with examples where it is not. For example if you have factors with unordered levels. df <- data.frame(Name = factor(c('Ben', 'Al'), levels = c('Ben', 'Al')), Earnings = c(1, 4)) tapply(df$Earnings, df$Name, sum) ## Ben Al ## 1 4 In that...
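
To make the ordering rule concrete, the sketch below reuses the data frame from the excerpt and shows that the result follows the factor's level order. Re-sorting by name afterwards is one possible way to get alphabetical order; it is an addition for illustration, not part of the original answer.

```r
df <- data.frame(
  Name     = factor(c("Ben", "Al"), levels = c("Ben", "Al")),
  Earnings = c(1, 4)
)

# The result is ordered by levels(df$Name), not alphabetically.
res <- tapply(df$Earnings, df$Name, sum)
names(res)

# Reorder by name if alphabetical order is required.
res_sorted <- res[order(names(res))]
names(res_sorted)
```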

How to make a scatterplot matrix in R using subset of data

r,matrix,rstudio,scatter-plot,tapply

You can subset using typical methods for row subsetting; using which() is simple. For example, I want a scatterplot matrix of a few columns of mtcars, but I'm only interested in the rows where cyl is 4. pairs(mtcars[which(mtcars$cyl==4),c('disp','hp','drat')]) ...

Tapply over matrix using matrix math

r,matrix,tapply

Could try doing this via tapply tapply(seq_len(ncol(X)), cluster, function(x) f(T%*%X[, x])) # 0 1 # 3840.681 1238.826 ...

tapply with non numeric values

r,character,dataframes,tapply

As mentioned in the comments, you missed an rnorm(). You can also use the vector c("control","ConditionB","ConditionC") and times = 300 instead of repeating rep() 3 times. Column1=rep(c("control","ConditionB","ConditionC"), times = 300) Column2=rnorm(900,mean=100,sd=10) data=data.frame(Column1,Column2) tapply(data$Column2,data$Column1,mean) ...

using tapply in multiple variables

r,tapply

You may need to use cut mat <- tapply(moneyspent, list(gender, age=cut(age, breaks=c(20,30,40), include.lowest=TRUE)), mean) nm1 <- outer(rownames(mat), colnames(mat), FUN=paste) setNames(c(mat), nm1) #female [20,30] male [20,30] female (30,40] male (30,40] # 300 150 450 150 Other options include library(dplyr) data %>% group_by(gender, age=cut(age, breaks=c(20,30,40), include.lowest=TRUE)) %>% summarise(moneyspent=mean(moneyspent)) Or library(data.table) setDT(data)[, list(moneyspent=mean(moneyspent)),...

Applying consecutive functions to a dataframe and outputting results of each into a table

r,tapply

To get the Tukey HSD for each compound as you've specified, try this: lapply(unique(t.df$Compound), function(x, df) TukeyHSD(aov(glm(Proportion ~ Treatment, data = df, subset = Compound == x)))[[1]], df = t.df) For each unique compound, this calls TukeyHSD() on an ANOVA for a general linear model fit on the subset of...

How to apply a custom function to each participant in a data frame

r,function,tapply

To build on @jvcasill's original function and on other users' responses: dPrime <- function (data, subj = 1, stimDiff = 2, stimSame = 3) { # dPrime() returns a vector of the length of the number of subjects #+ in data[, subj] that contains the sensitivity index "d'" for each....

Apply sum by integer factor after make a round in R

r,integer,numerical,tapply

Try this... set.seed(42) df0 <- data.frame( X = rnorm(100,10,10), Y = rnorm(100), Z = rnorm(100)) df0$seq <- as.integer(df0$X) library(data.table) dt = data.table(df0) dt[,lapply(.SD, sum), by=seq ] seq X Y Z 1: 23 164.8144774 1.293768670 -3.74807730 2: 4 8.9247301 1.909529066 -0.06277254 3: 13 40.2090180 -2.036599633 0.88836392 4: 16 147.8571697 -2.571487358 -1.35542918...

R function which.max with tapply

r,which,tapply

You can use by and reference the rownames of the row returned by which.max: Data[by(Data, Data$G, function(dat) rownames(dat)[which.max(dat$X)] ),] # X Y G #4 1.595281 -0.3309078 1 #61 2.401618 0.9510128 2 #147 2.087167 0.9160193 3 #171 2.307978 -0.3887222 4 (This assumes set.seed(1) for reproducibility's sake)...

How to use “with” and “tapply” to calculate a new variable based on multiple factors

r,with-statement,tapply

To give you an example of how this would work with the dplyr package if you want to calculate the means of Handle by groups of Period AND Queue: require(dplyr) ctrlmeans <- #data.frame to store your results ttp1 %>% #data.frame to use for analysis group_by(Period,Queue) %>% #grouping variables (you can...

How to merge tapply() and hist() in R?

r,plot,histogram,rstudio,tapply

Try this: par(mfrow = c(2, 2)) tapply(iris$Sepal.Length, iris$Species, hist) However, for multi-panel plots you might find the lattice or ggplot2 packages more suitable. library(lattice) histogram(~ Sepal.Length | Species, iris) library(ggplot2) ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_wrap(~ Species) ...

Combining tapply and 'not in' logic, using R

r,notin,tapply

Try with(iris[!iris$Petal.Width %in% c('1.3', '1.5'),], tapply(Sepal.Length, Species, median)) # setosa versicolor virginica # 5.0 5.8 6.5 The idea here is to operate on the subsetted data in the first place. Your line didn't work because the FUN argument should be applied on X (Sepal.Length in your case) rather than over the...

cat() returning undesired list when using tapply

r,cat,tapply

You can wrap it with invisible so that the list elements don't get printed in the R console. invisible(sapply(rownames(c), function(x) cat("Gear", x, "contains", c[x], "qsec\n"))) Gear 3 contains 15 qsec Gear 4 contains 12 qsec Gear 5 contains 5 qsec ...

average of multiple numbers in R

r,aggregate,average,tapply

You tagged your question with tapply, so here's a tapply answer: tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean) # 1 2 3 # 1 -0.42965680 0.6943236 0.04505399 # 2 0.55021401 -0.3138895 -0.40966078 # 3 0.05676266 0.5212944 0.12521106 data.frame(as.table( tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean))) # Var1 Var2 Freq #...

Using rle() according to other variable group R

r,time-series,tapply

If you use the data.table package, this is very easy: install.packages("data.table") library(data.table) DF = data.table(DF) DF[,No_Days:=unlist(lapply(rle(Runoff>0.05)$lengths,function(x) rev(seq(x:1)))),by=Soil] DF[Runoff <= 0.05, No_Days:=0] ...

Insert NA's in case there are no observations when using subset() and then dcast or tapply

r,subset,na,reshape2,tapply

There is a drop argument in dcast that you can use in this situation, but you'll need to convert subject to a factor. Here is a dataset with a second subject ID that has no values that meet your condition that the absolute value of z.score is less than one....
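
The same idea can be sketched in base R with `tapply()` instead of `dcast()` (a swapped-in technique, not the answer's own code): once `subject` is a factor, levels with no remaining observations are kept and reported as `NA`. The data below are hypothetical; only the names `subject` and `z.score` come from the excerpt.

```r
# Hypothetical data: subject 2 has no |z.score| below 1.
dat <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  z.score = c(0.2, -0.5, 1.8, 1.5, -2.1, 1.2)
)

# Convert to a factor so the level "2" survives subsetting.
dat$subject <- factor(dat$subject)
kept <- subset(dat, abs(z.score) < 1)

# Empty levels appear in the result as NA rather than being dropped.
res <- tapply(kept$z.score, kept$subject, mean)
```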

Averaging values between paired columns across a large data frame

r,apply,mean,sapply,tapply

Try mapply(function(x,y) tapply(x,y, FUN=mean) , Example[seq(1, ncol(Example), 2)], Example[seq(2, ncol(Example), 2)]) Or instead of seq(1, ncol(Example), 2) just use c(TRUE, FALSE) and c(FALSE, TRUE) for the second case...
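
A small sketch of the logical-index variant mentioned above, on a hypothetical `Example` data frame in which odd columns hold values and even columns hold the matching group labels (the layout is an assumption):

```r
# Hypothetical layout: value column, then its grouping column, repeated.
Example <- data.frame(
  v1 = c(1, 2, 3, 4),     g1 = c("a", "a", "b", "b"),
  v2 = c(10, 20, 30, 40), g2 = c("a", "b", "a", "b")
)

# c(TRUE, FALSE) is recycled across columns and picks the odd ones;
# c(FALSE, TRUE) picks the even ones.
res <- mapply(function(x, y) tapply(x, y, FUN = mean),
              Example[c(TRUE, FALSE)],
              Example[c(FALSE, TRUE)])
```

Each column of `res` holds the group means for one value/group pair.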

Converting time series to data frame, matrix, or table

r,time-series,tapply

You do not need time series, just tapply: res=tapply(AVG_LOSCAT2$AVG_LOSCAT, list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH), round,2) res month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 NA NA NA NA NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30 2013 5.74 7.89 6.49 7.09 5.91...

R programming how to apply individual elements in list to a function

r,list,lapply,sapply,tapply

Maybe this helps: lapply(dat.list, function(x) matrix(unlist(lapply(x$z, function(y) forecast(ets(y, lambda=x$lam), h=12)$mean)), ncol=12, byrow=TRUE)) ...

How to perform t-tests for each level of a factor with tapply

r,tapply

Your example is slightly problematic, since if you set: df <- data.frame(my_vector = rnorm(150), my_factor1 = gl(3,50), my_factor2 = gl(2,75) ) You will have only one unique value for my_factor2 when my_factor1 = 1 or 3 because of how your repetitions overlap. See ?gl. So do: df <- data.frame(my_vector =...
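
One way to run a t-test within each level of a factor is sketched below. The excerpt is cut off, so this is not necessarily how the answer continues: the `gl(2, 1, 150)` call (alternating levels, so both groups occur in every level of `my_factor1`) and the `split()`/`lapply()` approach are assumptions made for illustration.

```r
set.seed(1)
df <- data.frame(
  my_vector  = rnorm(150),
  my_factor1 = gl(3, 50),      # three blocks of 50 rows
  my_factor2 = gl(2, 1, 150)   # alternates 1, 2, 1, 2, ... (assumed design)
)

# One two-sample t-test per level of my_factor1.
tests <- lapply(split(df, df$my_factor1),
                function(d) t.test(my_vector ~ my_factor2, data = d))

# Collect the p-values into a named vector.
p_values <- sapply(tests, function(tt) tt$p.value)
```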

calculate error rate by category

r,level,scoring,tapply

I think you want this:
tapply(q,  # the variable to be summarized
       v,  # the variable that defines the bins
       function(x)  # the function to calculate the summary statistics within each bin
         sum(x)/length(x))
...
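
For a concrete feel, here is a tiny worked example with made-up vectors. The names `q` and `v` come from the excerpt; the 0/1 coding of errors is an assumption.

```r
# Assumed coding: 1 = error, 0 = correct.
q <- c(1, 0, 0, 1, 1, 0)
v <- c("a", "a", "b", "b", "b", "c")

# Proportion of errors within each category of v.
rates <- tapply(q, v, function(x) sum(x) / length(x))

# With 0/1 data this is the same as the group mean.
rates_mean <- tapply(q, v, mean)
```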

using tapply() with strptime() formatted date

r,strptime,tapply

I also found that I can simply convert the tapply() output with strptime() afterwards, via data.frame(), rather than trying to do it beforehand, and then order() by date: Data$Date <- as.factor(Data$Date) DAVEH <- tapply(Data$Humidity,Data$Date, FUN = mean) site.daily<-data.frame(c(names(DAVEH)),c(DAVEH)) rownames(site.daily)<-seq_len(nrow(site.daily)) colnames(site.daily)<-c("Date","DAVEH") site.daily$Date<-strptime(site.daily$Date, format="%d/%m/%Y") site.daily<-site.daily[order(site.daily$Date),]...

Summing overlapping rows from a single column in R

r,sum,plyr,tapply

Using dplyr and reshape2: library(dplyr) library(reshape2) df %>% group_by(Transect) %>% summarise(A = sum(Area[Point %in% c(1, 2, 3)]), B = sum(Area[Point %in% c(3, 4, 5)]), C = sum(Area[Point %in% c(5, 6, 1)])) %>% melt() ...