Since you want the tableGrob outside the plot panel rather than inside it, you shouldn't use annotation_custom; use arrangeGrob to arrange a plot and a table on a page instead. The list of grobs can then be printed page by page in the pdf device. library(plyr) plots <- dlply(iris, "Species",...

You could try base R's Map, which is more compact than the llply version. Basically, you have two lists with the same number of elements and want to subset each element of the first list ("list1") using the index stored in the corresponding element of the second list ("list2")....

library(dplyr) a1<-group_by(df,tel) mutate(a1,mycol=intentos/lag(intentos,1)) Source: local data frame [10 x 5] Groups: tel tel hora intentos contactos mycol 1 1 1 1 0 NA 2 1 2 5 1 5.0000000 3 1 4 1 0 0.2000000 4 1 4 4 0 4.0000000 5 2 1 9 0 NA 6 2 1...

You can use the apply() function in row mode on your data frame. You can pass each row to the function classify_emotion like this: result <- data.frame(apply(mydata, 1, function(x) { y <- classify_emotion(x, "bayes", 1.0) return(y) })) ...

Try this: library(dplyr) df %>% group_by(Nest) %>% mutate(Change = c(Weight[1], diff(Weight))) or with just the base of R transform(df, Change = ave(Weight, Nest, FUN = function(x) c(x[1], diff(x)))) ...
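As a minimal sketch of the base R variant, assuming a small made-up data frame (the Nest and Weight values below are invented for illustration), ave() applies the function within each Nest and returns the results in the original row order:

```r
# Hypothetical toy data; the values are invented for illustration
df <- data.frame(Nest = c("a", "a", "a", "b", "b"),
                 Weight = c(10, 12, 15, 20, 18))

# Within each Nest: keep the first weight, then take successive differences
out <- transform(df, Change = ave(Weight, Nest,
                                  FUN = function(x) c(x[1], diff(x))))
print(out$Change)
# 10  2  3 20 -2
```

The first row of each group keeps its own weight because diff() returns one element fewer than its input.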

r,list,data.frame,plyr,reshape2

Expanding my comment to include blank row, I suggest the following, assuming mylist is the object: mylist[vapply(mylist,length,1L)==0]<-list(list(rep("",4))) x<-do.call(rbind,unlist(mylist,recursive=FALSE)) colnames(x)<-names(mylist[[c(1,1)]]) ...

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table: sample1 <- data.frame(A=1:10, B=letters[1:10]) sample2 <- data.frame(B=11:20, C=letters[11:20]) rename_map <- c("A"="var1", "B"="var2", "C"="var3") names(sample1) <- rename_map[names(sample1)] str(sample1) names(sample2) <- rename_map[names(sample2)] str(sample2) Fundamentally the algorithm is simple: Build...

r,data.frame,data.table,plyr,dplyr

Here's a data.table way: library(data.table) setDT(df)[,`:=`( del = Pt_A - Pt_A[1], perc = Pt_A/Pt_A[1]-1 ),by=ID] which gives ID Pt_A del perc 1: 101 50 0 0.0000000 2: 101 100 50 1.0000000 3: 101 150 100 2.0000000 4: 102 20 0 0.0000000 5: 102 30 10 0.5000000 6: 102 40 20...

You're having troubles because strsplit() returns a list which we then need to apply as.data.frame.list() to each element to get it into the proper format that dplyr requires. Even then it would still require a bit more work to get usable results. Long story short, it doesn't seem like a...

You can use sapply: sapply(funcs, function(f) {tmp <- f(2); setNames(list(tmp$val), tmp$ref)}) # $XX1 # [1] 1 # # $XX55 # [1] 341 # # $XX3 # [1] 11 ...

Here's how I'd do it using data.table: require(data.table) setkey(setDT(ev1), test_id) ev1[ev2, .(ev2.time=i.time, ev1.time=time[which.min(abs(i.time-time))]), by=.EACHI] # test_id ev2.time ev1.time # 1: 0 6 3 # 2: 0 1 1 # 3: 0 8 3 # 4: 1 4 4 # 5: 1 5 4 # 6: 1 11 4 In joins...

First of all, what's with assign() in the graph1/graph2 functions? That seems completely unnecessary. So just change those to graph1 <- function(df) {xyplot (deltaxz ~ dates | SPEC, data=df, type=c("p","g"), col=c("black"), layout=c(1,3))} graph2 <- function(df) {xyplot (1/deltaxz ~ dates | SPEC, data=df, type=c("p","g"), col=c("red"), layout=c(1,3))} and secondly, the d_ply is...

You get the error: Error: object of type 'closure' is not subsettable because ddply tries to resolve t in the local environment before the global environment. It finds the transpose function t (a closure) rather than your global variable t. You just need to change it to something other...
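The error itself is easy to reproduce outside ddply. This is a minimal base R sketch, not the original poster's code; the variable name thresholds is a hypothetical replacement name chosen for illustration:

```r
# In a fresh session `t` is base R's transpose function (a closure),
# so trying to index it fails with the same error message.
msg <- tryCatch(t[1], error = function(e) conditionMessage(e))
print(msg)

# Hypothetical fix: use a name that does not clash with a base function
thresholds <- c(1, 2, 3)
print(thresholds[1])
```

Any name that is not already taken by a function on the search path avoids the ambiguity.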

r,function,aggregate,plyr,multi-level

If I understand correctly, what you want is to compute a variable at the district level and then attribute it to the school level. I hardly understand the rest of your post. You can do that in base R by using aggregate and then merge. Given that you already...

r,data.table,aggregate,plyr,summary

You can use dplyr's summarise_each function combined with group_by: library(dplyr) soil %>% group_by(CODE_PLOT) %>% summarise_each(funs(mean = mean(., na.rm = TRUE), sd = sd(., na.rm = TRUE), N = n()), 4:30) This will summarise the columns 4:30 of your data. If you want to supply a vector of column names to...

You say some teams "reappear" and at that point I thought the little intergroup helper function from this answer might be just the right tool here. It is useful when in your case, there are teams e.g. "w" that reappear in the same year, e.g. 2013, after another team has...

You could for example subset the data you pass into ddply: ddply(subset(data, Source != "abc"), .(Source), summarize, Cost= sum(Cost)) Or ddply(subset(data, !Source %in% c("abc", "def")), .(Source), summarize, Cost= sum(Cost)) Of course you could use [ instead of subset. Or you could give dplyr a try: library(dplyr) data %>% filter(!Source %in%...

The problem with plyr_test is that df_2 is defined in plyr_test, which isn't accessible from the doParallel package, and therefore it fails when it tries to export df_2. So that is a scoping issue. plyr_test2 avoids this problem because it doesn't try to use the .export option, but as you...

If you need a date-time object library(data.table) setDT(df)[, as.POSIXct(paste(v1[1], v1[-1]), format='%Y-%m-%d %H:%M'), by=gr] # gr V1 #1: 1 2014-12-01 07:00:00 #2: 1 2014-12-01 08:00:00 #3: 1 2014-12-01 09:00:00 #4: 2 2014-12-02 06:00:00 #5: 2 2014-12-02 09:00:00 Or if you need it to be in the format as shown in the...

r,dataframes,plyr,split-apply-combine

This is in dplyr and may be able to be cleaned up some, but it looks like it works: library(dplyr) newdf <- data %>% group_by(serial) %>% mutate( cidx = year == 1985 & moved == 0, urban.rural.code = ifelse(year == 1984 & isTRUE(cidx[year==1985]), urban.rural.code[year == 1985], urban.rural.code) ) ...

What you need to be doing: ddply(adhd_p, "pid", summarise, hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])), falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]]))) Why you need to be doing it: When you call ddply, the function works within the .data (adhd_p in your case) as the local namespace. This is similar to calling attach(adhd_p); calling the name of a...

r,warnings,plyr,r-package,package-development

There are several workarounds. The easiest is to just assign NULL to all variables with no visible binding. VarA <- VarB <- VarC <- VarD <- VarE <- NULL A more elegant solution is to use as.quoted and substitute. UPDATE by @Dr. Mike: the call to as.quoted needs to be...

You can try Reduce(function(...) merge(..., by='id'), list(df1, df2, df3)) # id score1 score2 score3 #1 1 50 33 50 #2 2 23 23 23 #3 4 68 64 68 #4 5 82 12 82 If you have many dataset object names with pattern 'df' followed by number Reduce(function(...) merge(..., by='id'),...
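To see what Reduce does here, a minimal sketch with three small invented data frames (the ids and scores are made up and do not match the output above): merge() is applied pairwise, so only ids present in every data frame survive the default inner join.

```r
# Hypothetical toy inputs; only ids 1 and 2 occur in all three
df1 <- data.frame(id = c(1, 2, 4), score1 = c(50, 23, 68))
df2 <- data.frame(id = c(1, 2, 3), score2 = c(33, 23, 64))
df3 <- data.frame(id = c(2, 1, 5), score3 = c(23, 50, 82))

# Equivalent to merge(merge(df1, df2, by = "id"), df3, by = "id")
merged <- Reduce(function(...) merge(..., by = "id"), list(df1, df2, df3))
print(merged)
```

Passing all = TRUE inside the anonymous function would keep unmatched ids and fill the gaps with NA.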

library(dplyr) library(tidyr) df$Date <- as.Date(df$Date) Step 1: Generate a full list of {Date, Index} pairs full_dat <- expand.grid( Date = date_range, Index = indices, stringsAsFactors = FALSE ) %>% arrange(Date, Index) %>% tbl_df Step 2: Define a cumsum() function that ignores NA cumsum2 <- function(x){ x[is.na(x)] <- 0 cumsum(x) }...

You are passing the string "Species" to ddply, so you need to get its value inside the function with get(). ddply then recognizes the column name: library(plyr) IG_test <-function(data, feature){ dd<-ddply(data, feature, here(summarise), N=length(get(feature))) return(dd) } IG_test(iris, "Species") ...

Here's an alternative > sites <- gsub("\\.com$", "", sites) > ifelse(sites %in% c("facebook", "google"), sites, "other") [1] "other" "other" "facebook" "google" "other" ...

r,statistics,plyr,apply,linear-regression

I don't know how this will be helpful in a linear regression but you could do something like that: df <- read.table(header=T, text="Assay Sample Dilution meanresp number 1 S 0.25 68.55 1 1 S 0.50 54.35 2 1 S 1.00 44.75 3") Using lapply: > lapply(2:nrow(df), function(x) df[(x-1):x,] ) [[1]]...

One option would be to use data.table. We can convert the data.frame to data.table (setDT(df1)), get the mean (lapply(.SD, mean)) for the selected columns ('var2' and 'var3') by specifying the column index in .SDcols, grouped by 'geo'. Create new columns by assigning the output (:=) to the new column names...

1) rollapply works on data frames too so it is not necessary to convert df to zoo. 2) lm uses na.action, not na.rm, and its default is na.omit so we can just drop this argument. 3) rollapplyr is a more concise way to write rollapply(..., align = "right"). Assuming that...

r,classification,plyr,quantile

I don't see anything wrong with your computation. Each quartile contains same number of elements only when you have a very large data set. Below I extracted Diameter for the first group. table(findInterval(diameter, quantile(diameter, c(.25, .5, .75)))) # 0 1 2 3 #22 23 24 23 sum(diameter < quantile(diameter, .25))...
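The uneven group sizes show up even on a tiny input. A minimal sketch, using the integers 1 to 10 as invented stand-in data rather than the Diameter values from the question:

```r
# Hypothetical data: ten evenly spaced values
x <- 1:10

# Default (type 7) quartile cut points: 3.25, 5.50, 7.75
cuts <- quantile(x, c(.25, .5, .75))

# Count how many values fall into each of the four intervals
counts <- table(findInterval(x, cuts))
print(counts)
# 0 1 2 3
# 3 2 2 3
```

With only ten values the cut points fall between observations, so the four bins cannot all hold the same count.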

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

I think this works.... you should check the results are as desired... dta %>% group_by(A) %>% do(fn(.)) # A B #1 A 0.22276975 #2 A 0.01183619 #3 A 1.84315247 #4 A 0.19809142 #5 A 0.08114770 #6 A 1.48606944 #7 A 0.84864389 #8 A 0.60060566 #9 A 0.25362720 #10 A 1.68528202...

Is the following your desired output? lapply(mydf[-1], function(x) lm(x ~ 0 + mydf[,1])) $x1 Call: lm(formula = x ~ 0 + mydf[, 1]) Coefficients: mydf[, 1]A mydf[, 1]B mydf[, 1]C 2.511 2.608 2.405 $x2 Call: lm(formula = x ~ 0 + mydf[, 1]) Coefficients: mydf[, 1]A mydf[, 1]B mydf[, 1]C...

You could try dplyr library(dplyr) library(lazyeval) mydf %>% group_by(colors) %>% summarise_(sum_val=interp(~sum(var), var=as.name(mycol))) # colors sum_val #1 Blue 5 #2 Green 9 #3 Red 7 Or using ddply from plyr library(plyr) ddply(mydf, .(colors), summarize, sum_val=eval(substitute(sum(var), list(var=as.name(mycol)))) ) # colors sum_val #1 Blue 5 #2 Green 9 #3 Red 7 Regarding the...

I downloaded your data and had a look. If I am not mistaken, all you need is to subset data using Time.h. Here you have a range of time (10-23) you want. I used dplyr and did the following. You are asking R to pick up rows which have values...

I would also use dplyr for bigger datasets (similar to @jlhoward's answer) data <- read.csv('data2.csv') library(dplyr) data %>% group_by(year) %>% summarise(Observations=n(), Total_Monitors=n_distinct(indivID),#n_distinct contributed by @beginneR Urban=round(length(urban==1)/n_distinct(fips)), Counties=n_distinct(fips), RVPI_Counties=length(unique(fips[RVPI==1]))) # year Observations Total_Monitors Urban Counties RVPI_Counties #1 1989 147 2 74 2 2 #2 1990 209 4 52 4 4 #3...

You can achieve the desired result by using merge: merge(df.A,df.B,by='Category',all=T) which will produce the following output: # Category Number.x Number.y #1 A 1 5 #2 B 2 6 #3 C 3 7 #4 D 4 NA ...

r,alias,plyr,pipeline,magrittr

use_series is just an alias for $. You can see that by typing the function name without the parentheses: use_series # .Primitive("$") The $ primitive function does not have formal argument names in the same way a user-defined function does. It would be easier to use extract2 in this case...

You can use cSplit_e from my "splitstackshape" package, like this: library(splitstackshape) cSplit_e(mydata, "NAMES", sep = ",", type = "character", fill = 0) # ID NAMES NAMES_333 NAMES_4444 NAMES_456 NAMES_765 # 1 1 4444, 333, 456 1 1 1 0 # 2 2 333 1 0 0 0 # 3 3...

python,r,pandas,plyr,split-apply-combine

Here's the two-line solution: import itertools for grpname,grpteams in df.groupby('group')['team']: # No need to use grpteams.tolist() to convert from pandas Series to Python list print list(itertools.combinations(grpteams, 2)) [('Canada', 'Netherlands'), ('Canada', 'China'), ('Canada', 'New Zealand'), ('Netherlands', 'China'), ('Netherlands', 'New Zealand'), ('China', 'New Zealand')] [('Germany', 'Norway'), ('Germany', 'Thailand'), ('Germany', 'Ivory Coast'), ('Norway',...

Here's a way with dplyr. I first join the tomatch data.frame with the FIPS names by state (allowing only in-state matches): require(dplyr) df <- tomatch %>% left_join(fips, by="state") Next, I noticed that a lot of counties don't have 'Saint' but 'St.' in the FIPS dataset. Cleaning that up first should...

I think you have a few nested mistakes which are causing you problems. The biggest one is using count() instead of summarise(). I'm guessing you wanted n(): weighting.function <- function(dataframe, variable){ dataframe %>% group_by_(variable) %>% summarise_( freq = ~n(), freq_weighted = ~sum(survey_weight) ) } weighting.function(test_dataframe, ~gender) You also had a few...

What about this? res <- t(sapply(AP, function(y) sapply(unique(unlist(AP)), function(x) sum(x == y)))) colnames(res) <- unique(unlist(AP)) res 411050384 411050456 411058568 428909002 CMP1 1 2 1 0 CMP2 1 1 0 0 CMP3 1 1 1 2 ...

You can use the subset operator [<-: x <- texts is1997 <- str_detect(names(texts), "1997") x[is1997] <- lapply(texts[is1997], str_extract, regexp) x # $AB1997R.txt # [1] "abcdef" # # $BG2000S.txt # [1] "mnopqrstuvwxyz" # # $MN1999R.txt # [1] "ghijklmnopqrs" # # $DC1997S.txt # [1] "abcdef" # ...

You were almost there: lapply(x, function(z) z[! (z %in% bad.words)]) Alternatively, you could do lapply(x, function(z) setdiff(z,bad.words)) which seems more elegant to me....
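One difference between the two forms is worth knowing: setdiff() also drops duplicates, while the %in% filter keeps them. A minimal sketch, with an invented x and bad.words:

```r
# Hypothetical inputs; note the repeated "good" in the first element
x <- list(a = c("good", "bad", "good"), b = c("ugly", "fine"))
bad.words <- c("bad", "ugly")

keep_in      <- lapply(x, function(z) z[!(z %in% bad.words)])
keep_setdiff <- lapply(x, function(z) setdiff(z, bad.words))

print(keep_in$a)       # "good" "good"
print(keep_setdiff$a)  # "good"
```

If repeated words matter, for example for counting, the %in% version is the safer choice.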

Thanks to Joran, installing from CRAN actually solved the issue. install.packages("plyr") ...

r,formatting,plyr,percentage,crosstab

pt <- percent(c(round(prop.table(tab), 3))) dim(pt) <- dim(tab) dimnames(pt) <- dimnames(tab) This should work. c being used here for its property of turning a table or matrix into a vector. Alternative using sprintf: pt <- sprintf("%0.1f%%", prop.table(tab) * 100) dim(pt) <- dim(tab) dimnames(pt) <- dimnames(tab) If you want the table written...

llply goes down to loop_apply which eventually calls this function: // [[Rcpp::export]] List loop_apply(int n, Function f) { List out(n); for(int i = 0; i < n; ++i) { out[i] = f(i + 1); } return out; } So, as part of the business of Rcpp::export we get a call...

With base R you can do: aggregate(Wert ~ ., df, sum) # GN Datum Land Wert #1 11747 2012-01-04 Thailand 17187 If you want to preserve other columns you have in your data, you can for example do (using dplyr): df %>% group_by(GN, Datum, Land) %>% mutate(Wert = sum(Wert)) %>%...

Maybe try library(dplyr) dat %>% group_by(group) %>% summarise(V1 = sum((value - mean(value))^2)) %>% summarise(V1 = sum(V1)) %>% .$V1 # [1] 1372.8 or, if you want to use do: dat %>% group_by(group) %>% do({data.frame(V1 = sum((.$value-mean(.$value))^2))}) %>% ungroup() %>% summarise(V1 = sum(V1)) %>% .$V1 # [1] 1372.8 ...

I'm biased in favor of cSplit from the "splitstackshape" package, but you might be interested in unnest from "tidyr" in conjunction with "dplyr": library(dplyr) library(tidyr) df %>% mutate(b = strsplit(b, ";")) %>% unnest(b) # a b # 1 1 g # 2 1 j # 3 1 n # 4...

You can use this code: library(dplyr) d %>% mutate(before=ifelse(event,lag(amount),NA), after =ifelse(event,lead(amount),NA)) # amount event before after #1 3 FALSE NA NA #2 4 FALSE NA NA #3 6 TRUE 4 7 #4 7 FALSE NA NA #5 3 FALSE NA NA #6 4 TRUE 3 8 #7 8 FALSE NA...

This is a simple operation within the data.table scope dt[, .(b = sum(b), c = c[which.max(b)]), by = a] # a b c # 1: a 6 p # 2: b 9 r A similar option would be dt[order(b), .(b = sum(b), c = c[.N]), by = a] ...

You want to do two things with your code: Use dlply instead of ddply, since you want a list of rpart objects instead of a data frame of (?). ddply would be useful if you wanted to show predicted values of the original data, since that can be formatted into...

You can add a "keep" column that is TRUE only if the standard deviation is below 2. Then, you can use a left join (merge) to add the "keep" column to the initial dataframe. In the end, you just select with keep equal to TRUE. # add the keep column...

This is an alternate way (one of many) to get country names from lat/lon. This won't require API calls out to a server. (Save the GeoJSON file locally for real/production use): library(rgdal) library(magrittr) world <- readOGR("https://raw.githubusercontent.com/AshKyd/geojson-regions/master/data/source/ne_50m_admin_0_countries.geo.json", "OGRGeoJSON") places %>% select(place_lon, place_lat) %>% coordinates %>% SpatialPoints(CRS(proj4string(world))) %over% world %>% select(iso_a2, name)...

rollapply passes a matrix to the function so only pass the numeric columns. Using rolled from my prior answer and the setup in that question: do.call("rbind", by(dat[c("x", "y")], dat[c("w", "z")], rolled)) Added Another way to do it is to perform the rollapply over the row indexes instead of over the...

I think you were close, you just misplaced the sep argument: gather(df9, pt.num.type, value, 2:17) %>% separate(pt.num.type, c("type", "pt.num"), sep=1) Using dplyr you could do something like: df9 %>% gather(pt.num.type, value, 2:5) %>% separate(pt.num.type, c("type", "pt.num"), sep=1) %>% group_by(GeneID, type) %>% summarise(sum = sum(value)) # GeneID type sum # 1 A2M...

If you wish to return your tree counts at each point as class table, you need to use dlply with an anonymous function. This will result in a list with one element per point, each containing a table: dlply(df, .(Point), function(x) table(x$Species)) # $`99` # # Ulmus.alata # 1 #...

Here is one way. I am sure there are better ways. First, I grouped the data by gr. Second, I checked if there is any row which has identical values in x1 and x2. If there is such a row, I asked R to assign 1, otherwise 0. Finally, I...

If you want to stick with plyr: df.ddply <- ddply(df, "name", summarise, counter=length(var[var == 1])) ...

Try mtcars %>% mutate(mpg=replace(mpg, cyl==4, NA)) %>% as.data.frame() ...

Maybe you need df1 <- subset(df, specialty %in% c('Real Estate', 'Tort')) library(reshape2) dM <- melt(df1, id.var='specialty')[,-2] dM[] <- lapply(dM, factor) table(dM) # value #specialty 38564 44140 44950 49000 49255 49419 NULL # Real Estate 0 0 0 2 0 1 3 # Tort 1 1 1 1 2 0...

r,plyr,tapply,split-apply-combine

Answer using data.table package: > dt <- data.table(eg = letters[1:8], Type=rep(c("F","W"), 4)) > a <- dt[, paste(eg, collapse=" "), by=Type] > a Type V1 1: F a c e g 2: W b d f h The bonus of using data.table is that this will still run in a few...

To fix your code you only need parse(): func_list <- list( total_count = parse(text="sum(count)"), avg_amnt = parse(text="mean(amnt)")) This will tell the interpreter that the text should be evaluated as code and not as strings....
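To check that the parsed expressions behave as intended, they can be evaluated directly against a data frame with eval(). This is a minimal base R sketch; the data frame and its values are invented for illustration and stand in for whatever the summarising function receives:

```r
func_list <- list(
  total_count = parse(text = "sum(count)"),
  avg_amnt    = parse(text = "mean(amnt)"))

# Hypothetical data with the two columns the expressions refer to
df <- data.frame(count = c(1, 2, 3), amnt = c(10, 20, 30))

# Evaluate each expression with the data frame's columns in scope
results <- lapply(func_list, function(e) eval(e, df))
print(results$total_count)  # 6
print(results$avg_amnt)     # 20
```

Had the list held plain strings, eval() would simply have returned the strings unchanged.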

aggregate(Volume~., data=df, sum) MRN Product Transfusion.Date Volume 1 1 PRBC 2004-12-02 50 2 2 PRBC 2004-12-02 150 3 3 FFP 2004-12-03 2 4 3 FFP 2004-12-04 1 ...

Moving some comments to the correct place (answers), the two most common solutions would be: c(list_1, list_2) or append(list_1, list_2) Since you had already tried: list(list_1, list_2) and found that this had created a nested list, you can also unlist the nested list with the argument recursive = FALSE. unlist(list(list_1,...
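A quick way to see the difference between the three calls is to compare lengths. A minimal sketch with two invented lists:

```r
# Hypothetical inputs
list_1 <- list(a = 1, b = "x")
list_2 <- list(c = TRUE)

flat     <- c(list_1, list_2)                  # one flat list
appended <- append(list_1, list_2)             # same result as c()
nested   <- list(list_1, list_2)               # a list of two lists
unnested <- unlist(nested, recursive = FALSE)  # flattened by one level

print(length(flat))      # 3
print(length(nested))    # 2
print(length(unnested))  # 3
```

recursive = FALSE matters: with the default recursive = TRUE, unlist() would flatten all the way down to an atomic vector and coerce the elements to a common type.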

You can determine the number of times each site appeared at each time with the table function: (tab <- table(df$time, df$site)) # A B C D E # 1 1 1 1 1 0 # 2 1 1 1 0 0 # 3 1 1 1 1 1 With some...

Had a similar issue, my answer was sorting on groups and the relevant ranked variable(s) in order to then use row_number() when using group_by. # Sample dataset df <- data.frame(group=rep(c("GROUP 1", "GROUP 2"),10), value=as.integer(rnorm(20, mean=1000, sd=500))) require(dplyr) print.data.frame(df[0:10,]) group value 1 GROUP 1 1273 2 GROUP 2 1261 3 GROUP...

DT[setkey(DT[, min(period):max(period), by = project], project, V1)] # project period v3 v4 # 1: 6 1 a red # 2: 6 2 b yellow # 3: 6 3 NA NA # 4: 6 4 NA NA # 5: 6 5 c red # 6: 6 6 d yellow # 7:...

You could try data.table. Using a set.seed(20) for creating the "df" (for reproducibility). Instead of the "wide" format, I am reshaping "df" to "long" using melt, converted to "data.table" (as.data.table), set the key columns (setkey(..)), join the "lookup" dataset, convert it back to "wide" format with dcast.data.table, and finally join...

Something like this maybe, where I've extended the patterns you are looking for to show how it could become adaptable: library(stringr) patterns <- c("Two","Four","Three") hits <- lapply(myList[is1997], function(x) { out <- sapply(patterns, str_extract, string=x) paste(out[!is.na(out)],collapse="££") }) myList[is1997] <- hits #[[1]] #[1] "Two££Four££Three" # #[[2]] #[1] "mnopqrstuvwxyz" # #[[3]] #[1] "ghijklmnopqrs"...

What about using ifelse to select the scaling direction, based on the value of variable: tableau.m = ddply(tableau.m, .(variable), transform, rescale = ifelse(variable=="B", rescale(value, to=c(1,0)), rescale(value))) Net variable value rescale 1 a B 1.88 1.00000000 2 b B 2.05 0.32000000 3 c B 2.09 0.16000000 4 d B 2.07 0.24000000...

. in do is a data frame, which is why you get the error. This works: df %>% rowwise() %>% do(data.frame(as.list(quantile(unlist(.),probs = c(0.1,0.5,0.9))))) but risks being horrendously slow. Why not just: apply(df, 1, quantile, probs = c(0.1,0.5,0.9)) Here are some timings with larger data: df <- as.data.frame(matrix(rbinom(100000,10,0.5),nrow = 1000)) library(microbenchmark)...
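One thing to watch with the apply() version is the shape of the result: it comes back with one column per input row, so it may need transposing. A minimal sketch on an invented two-row data frame:

```r
# Hypothetical toy data: two rows, three columns
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

res <- apply(df, 1, quantile, probs = c(0.1, 0.5, 0.9))
print(dim(res))     # 3 2  (quantiles in rows, original rows in columns)
print(res["50%", ]) # row medians: 3 and 4

# Transpose to get one row per original row
res_t <- t(res)
print(dim(res_t))   # 2 3
```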

r,group-by,aggregate,plyr,dplyr

One option: library(dplyr) df %>% group_by(session_id) %>% mutate(rank = dense_rank(-seller_feedback_score)) dense_rank is "like min_rank, but with no gaps between ranks" so I negated the seller_feedback_score column in order to turn it into something like max_rank (which doesn't exist in dplyr). If you want the ranks with gaps so that you...

There are many tools for split-apply-combine in R. I'd be inclined to use the data.table package: require(data.table) mydt <- data.table(data) mycols <- c('C','E','M','P') newcols <- paste0(mycols,'new') my1vec <- c(1.1,.9,1,1.01) my0vec <- c(.8,1.05,1.01,1.01) mydt[FF==1,(newcols):=mapply(`*`,my1vec,.SD,SIMPLIFY=FALSE),.SDcols=mycols] mydt[FF==0,(newcols):=mapply(`*`,my0vec,.SD,SIMPLIFY=FALSE),.SDcols=mycols] I put the new values in new columns. If instead you want to overwrite the old...

This is all about writing a modified na.locf function. After that you can plug it into data.table like any other function. new.locf <- function(x){ # might want to think about the end of this loop # this works here but you might need to add another case # if there...

The dplyr package was created for this purpose: handling large datasets. Try this: library(dplyr) df %>% group_by(firstword) %>% arrange(desc(Freq)) %>% top_n(6) ...

Try this: #data df <- read.table(text=" tissueA tissueB tissueC gene1 4.5 6.2 5.8 gene2 3.2 4.7 6.6") #result apply(df,1,function(i){ my.max <- max(i) my.statistic <- (1-log2(i)/log2(my.max)) my.sum <- sum(my.statistic) my.answer <- my.sum/(length(i)-1) my.answer }) #result # gene1 gene2 # 0.1060983 0.2817665 ...

I think you need to split the column before you can use it to order the data frame: library("reshape2") ## for colsplit() library("gtools") Construct test data: dat <- data.frame(matrix(1:25,5)) names(dat) <- c('Testname1.1', 'Testname1.100', 'Testname1.11','Testname1.2','Testname2.99') Split and order: cdat <- colsplit(names(dat),"\\.",c("name","num")) dat[,order(mixedorder(cdat$name),cdat$num)] ## Testname1.1 Testname1.2 Testname1.11 Testname1.100 Testname2.99 ## 1 1...

ddply has largely been superseded by dplyr: library(dplyr) a1$variable <- as.character(a1$variable) a1 %>% group_by(variable) %>% summarise(mvalue = mean(value, na.rm=TRUE), medvalue = median(value, na.rm=TRUE), sd = sd(value, na.rm=TRUE), n = sum(!is.na(value)), se = sd/sqrt(n)) %>% ggplot(., aes(x=variable, y=mvalue, fill=variable)) + geom_bar(stat='identity', position='dodge')+ geom_errorbar(aes(ymin=mvalue-se, ymax=mvalue+se))+ scale_fill_grey() ...

I updated the package "Rcpp" and now it is working for me. install.packages("Rcpp")

Is this what you are after... library(plyr) library(dplyr) Data set.seed(1) df <-data.frame(category=rep(LETTERS[1:5],each=10), superregion=sample(c("EMEA","LATAM","AMER","APAC"),100,replace=T), country=sample(c("Country1","Country2","Country3","Country4","Country5","Country6","Country7","Country8"),100,replace=T), market=sample(c("Market1","Market2","Market3","Market4","Market5","Market6","Market7","Market8","Market9","Market10","Market11","Market12"),100,replace=T),...

Per your comment, if the subgroups are unique you can do library(dplyr) group_by(df, group) %>% mutate(percent = value/sum(value)) # group subgroup value percent # 1 A a 1 0.1250000 # 2 A b 4 0.5000000 # 3 A c 2 0.2500000 # 4 A d 1 0.1250000 # 5 B...

In your call, when you do sum(Total) you are using the total value of the group, which when used with Total/sum(Total) simply produces 1 for this data/grouping. You could calculate the total sum from the entire data set by using df$Total in the sum() call. With ddply this would be...

Here's my answer, using the built-in functions quantile and boxplot.stats. geom_boxplot does the calculations for the boxplot slightly differently than boxplot.stats. Read ?geom_boxplot and ?boxplot.stats to understand my implementation below #Function to calculate boxplot stats to match ggplot's implementation as in geom_boxplot. my_boxplot.stats <-function(x){ quantiles <-quantile(x, c(0, 0.25, 0.5, 0.75, 1)) labels...

You may want to do this in two steps, as in: #initialize the new variable df$new <- df$percent # Add 10% from code == 1 to code == 3 df$new[df$code == 3] <- df$new[df$code == 3] + 0.1 * df$percent[df$code == 1] # subtract off 10% from code 1 where...

r,data.table,aggregate,plyr,dplyr

Using dplyr: df %>% group_by(ID, Type) %>% summarise_each(funs(sum(., na.rm = T))) Or df %>% group_by(ID, Type) %>% summarise(Point_A = sum(Point_A, na.rm = T), Point_B = sum(Point_B, na.rm = T)) Or f <- function(x) sum(x, na.rm = T) df %>% group_by(ID, Type) %>% summarise(Point_A = f(Point_A), Point_B = f(Point_B)) Which gives:...

You could try library(plyr) ddply(dat, "company", summarise, ratingMax = max(rating), ID = ID[which.max(rating)]) # company ratingMax ID #1 CompA 4 A12 #2 CompB 5 A22 #3 CompC 4 A31 Or using dplyr library(dplyr) dat %>% group_by(company) %>% summarise(ratingMax=max(rating), ID=ID[which.max(rating)]) # company ratingMax ID #1 CompA 4 A12 #2 CompB 5...

Here's an option using do: i <- 2:5 n <- c(names(dat), paste0("power_", i)) dat %>% do(data.frame(., sapply(i, function(x) .$a^x))) %>% setNames(n) # a power_2 power_3 power_4 power_5 #1 1 1 1 1 1 #2 2 4 8 16 32 #3 3 9 27 81 243 #4 4 16 64 256...