First of all, what's with assign() in the graph1/graph2 functions? That seems completely unnecessary. So just change those to graph1 <- function(df) {xyplot (deltaxz ~ dates | SPEC, data=df, type=c("p","g"), col=c("black"), layout=c(1,3))} graph2 <- function(df) {xyplot (1/deltaxz ~ dates | SPEC, data=df, type=c("p","g"), col=c("red"), layout=c(1,3))} and secondly, the d_ply is...

Something like this maybe, where I've extended the patterns you are looking for to show how it could become adaptable: library(stringr) patterns <- c("Two","Four","Three") hits <- lapply(myList[is1997], function(x) { out <- sapply(patterns, str_extract, string=x) paste(out[!is.na(out)],collapse="££") }) myList[is1997] <- hits #[[1]] #[1] "Two££Four££Three" # #[[2]] #[1] "mnopqrstuvwxyz" # #[[3]] #[1] "ghijklmnopqrs"...

You can use the subset operator [<-: x <- texts is1997 <- str_detect(names(texts), "1997") x[is1997] <- lapply(texts[is1997], str_extract, regexp) x # $AB1997R.txt # [1] "abcdef" # # $BG2000S.txt # [1] "mnopqrstuvwxyz" # # $MN1999R.txt # [1] "ghijklmnopqrs" # # $DC1997S.txt # [1] "abcdef" # ...

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table: sample1 <- data.frame(A=1:10, B=letters[1:10]) sample2 <- data.frame(B=11:20, C=letters[11:20]) rename_map <- c("A"="var1", "B"="var2", "C"="var3") names(sample1) <- rename_map[names(sample1)] str(sample1) names(sample2) <- rename_map[names(sample2)] str(sample2) Fundamentally the algorithm is simple: Build...
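
A minimal sketch of the same lookup idea wrapped in a helper, assuming you also want columns that are missing from the map to keep their names (indexing the map directly turns them into NA). The function name rename_with_map and the toy data frame are made up for illustration:

```r
# Hypothetical helper: rename only the columns that appear in the lookup map.
rename_with_map <- function(df, map) {
  old <- names(df)
  hit <- old %in% names(map)          # which columns have an entry in the map
  old[hit] <- unname(map[old[hit]])   # translate only those
  names(df) <- old
  df
}

rename_map <- c(A = "var1", B = "var2", C = "var3")
demo <- data.frame(A = 1:3, Z = letters[1:3])  # "Z" is not in the map
renamed <- rename_with_map(demo, rename_map)
print(names(renamed))
```

Keeping the map in one place means every data frame is renamed by the same rule, whichever subset of columns it carries.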

Here's an alternative > sites <- gsub("\\.com$", "", sites) > ifelse(sites %in% c("facebook", "google"), sites, "other") [1] "other" "other" "facebook" "google" "other" ...

To fix your code you only need parse(): func_list <- list( total_count = parse(text="sum(count)"), avg_amnt = parse(text="mean(amnt)")) This will tell the interpreter that the text should be evaluated as code and not as strings....
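
To see what parse() buys you, here is a small base R sketch that evaluates the parsed expressions against a data frame with eval(). The toy data frame is an assumption, and the original question may feed func_list to a different function:

```r
# Parsed expressions are code objects, not strings.
func_list <- list(
  total_count = parse(text = "sum(count)"),
  avg_amnt    = parse(text = "mean(amnt)")
)

# Made-up data with the two columns the expressions refer to.
toy <- data.frame(count = c(1, 2, 3), amnt = c(10, 20, 30))

# eval() looks up `count` and `amnt` inside the data frame.
results <- lapply(func_list, eval, envir = toy)
str(results)
```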

If you need a date-time object library(data.table) setDT(df)[, as.POSIXct(paste(v1[1], v1[-1]), format='%Y-%m-%d %H:%M'), by=gr] # gr V1 #1: 1 2014-12-01 07:00:00 #2: 1 2014-12-01 08:00:00 #3: 1 2014-12-01 09:00:00 #4: 2 2014-12-02 06:00:00 #5: 2 2014-12-02 09:00:00 Or if you need it to be in the format as shown in the...

rollapply passes a matrix to the function so only pass the numeric columns. Using rolled from my prior answer and the setup in that question: do.call("rbind", by(dat[c("x", "y")], dat[c("w", "z")], rolled)) Added Another way to do it is to perform the rollapply over the row indexes instead of over the...

I updated the package "Rcpp" and now it is working for me. install.packages("Rcpp")

Try mtcars %>% mutate(mpg=replace(mpg, cyl==4, NA)) %>% as.data.frame() ...

You are passing a string name "Species" to a ddply function, so you need to get its value inside the function; then ddply recognizes the column name. library(plyr) IG_test <-function(data, feature){ dd<-ddply(data, feature, here(summarise), N=length(get(feature))) return(dd) } IG_test(iris, "Species") ...

You can achieve the desired result by using merge: merge(df.A,df.B,by='Category',all=T) which will produce the following output: # Category Number.x Number.y #1 A 1 5 #2 B 2 6 #3 C 3 7 #4 D 4 NA ...

With base R you can do: aggregate(Wert ~ ., df, sum) # GN Datum Land Wert #1 11747 2012-01-04 Thailand 17187 If you want to preserve other columns you have in your data, you can for example do (using dplyr): df %>% group_by(GN, Datum, Land) %>% mutate(Wert = sum(Wert)) %>%...

r,list,data.frame,plyr,reshape2

Expanding my comment to include blank row, I suggest the following, assuming mylist is the object: mylist[vapply(mylist,length,1L)==0]<-list(list(rep("",4))) x<-do.call(rbind,unlist(mylist,recursive=FALSE)) colnames(x)<-names(mylist[[c(1,1)]]) ...

since you want the tableGrob to be outside the plot panel and not inside, you shouldn't use annotation_custom, but arrangeGrob to arrange a plot and a table on a page. The list of grobs can then be printed page by page in the pdf device. library(plyr) plots <- dlply(iris, "Species",...

I think you have a few nested mistakes which are causing you problems. The biggest one is using count() instead of summarise(). I'm guessing you wanted n(): weighting.function <- function(dataframe, variable){ dataframe %>% group_by_(variable) %>% summarise_( freq = ~n(), freq_weighted = ~sum(survey_weight) ) } weighting.function(test_dataframe, ~gender) You also had a few...

Use do.call() with rbind() to convert the data to a single data frame, then reshape2::dcast() for the reshaping: dat <- do.call(rbind, raw.data) dat$obs <- gsub(".*?\\.", "", row.names(dat)) library(reshape2) dcast(dat, obs ~ Year, fun.aggregate = sum, value.var = "Pop") obs 1920 1921 1922 1923 1924 1 1 1927433 1915576 1902111 0...

If you want to stick with plyr: df.ddply <- ddply(df, "name", summarise, counter=length(var[var == 1])) ...
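
One caveat, shown here as a base R sketch with made-up values: if var can contain NA, length(var[var == 1]) also counts the NA entries, because subsetting with an NA index keeps an NA element. Whether that matters depends on your data:

```r
var <- c(1, NA, 1, 0)

# Subsetting by `var == 1` keeps the NA, so it is counted too.
n_subset <- length(var[var == 1])

# Summing the logical vector with na.rm = TRUE counts only real matches.
n_sum <- sum(var == 1, na.rm = TRUE)

print(c(n_subset = n_subset, n_sum = n_sum))
```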

library(dplyr) library(tidyr) df$Date <- as.Date(df$Date) Step 1: Generate a full list of {Date, Index} pairs full_dat <- expand.grid( Date = date_range, Index = indices, stringsAsFactors = FALSE ) %>% arrange(Date, Index) %>% tbl_df Step 2: Define a cumsum() function that ignores NA cumsum2 <- function(x){ x[is.na(x)] <- 0 cumsum(x) }...

You could try data.table. Using a set.seed(20) for creating the "df" (for reproducibility). Instead of the "wide" format, I am reshaping "df" to "long" using melt, converted to "data.table" (as.data.table), set the key columns (setkey(..)), join the "lookup" dataset, convert it back to "wide" format with dcast.data.table, and finally join...

I'm biased in favor of cSplit from the "splitstackshape" package, but you might be interested in unnest from "tidyr" in conjunction with "dplyr": library(dplyr) library(tidyr) df %>% mutate(b = strsplit(b, ";")) %>% unnest(b) # a b # 1 1 g # 2 1 j # 3 1 n # 4...

Is the following your desired output? lapply(mydf[-1], function(x) lm(x ~ 0 + mydf[,1])) $x1 Call: lm(formula = x ~ 0 + mydf[, 1]) Coefficients: mydf[, 1]A mydf[, 1]B mydf[, 1]C 2.511 2.608 2.405 $x2 Call: lm(formula = x ~ 0 + mydf[, 1]) Coefficients: mydf[, 1]A mydf[, 1]B mydf[, 1]C...

The dplyr package was created for this purpose, to handle large datasets. Try this: library(dplyr) df %>% group_by(firstword) %>% arrange(desc(Freq)) %>% top_n(6) ...

You can use this code: library(dplyr) d %>% mutate(before=ifelse(event,lag(amount),NA), after =ifelse(event,lead(amount),NA)) # amount event before after #1 3 FALSE NA NA #2 4 FALSE NA NA #3 6 TRUE 4 7 #4 7 FALSE NA NA #5 3 FALSE NA NA #6 4 TRUE 3 8 #7 8 FALSE NA...

I think you were close, you just misplaced the sep argument: gather(df9, pt.num.type, value, 2:17) separate(pt.num.type, c("type", "pt.num"), sep=1) Using dplyr you could do something like: df9 %>% gather(pt.num.type, value, 2:5) %>% separate(pt.num.type, c("type", "pt.num"), sep=1) %>% group_by(GeneID, type) %>% summarise(sum = sum(value)) # GeneID type sum # 1 A2M...

If you wish to return your tree counts at each point as class table, you need to use dlply with an anonymous function. This will result in a list with one element per point, each containing a table: dlply(df, .(Point), function(x) table(x$Species)) # $`99` # # Ulmus.alata # 1 #...

I think this works.... you should check the results are as desired... dta %>% group_by(A) %>% do(fn(.)) # A B #1 A 0.22276975 #2 A 0.01183619 #3 A 1.84315247 #4 A 0.19809142 #5 A 0.08114770 #6 A 1.48606944 #7 A 0.84864389 #8 A 0.60060566 #9 A 0.25362720 #10 A 1.68528202...

Here is one way. I am sure there will be better ways. First, I grouped the data by gr. Second, I checked if there is any row which has identical values in x1 and x2. If there is such a row, I asked R to assign 1, otherwise 0. Finally, I...

You say some teams "reappear" and at that point I thought the little intergroup helper function from this answer might be just the right tool here. It is useful when in your case, there are teams e.g. "w" that reappear in the same year, e.g. 2013, after another team has...

r,formatting,plyr,percentage,crosstab

pt <- percent(c(round(prop.table(tab), 3))) dim(pt) <- dim(tab) dimnames(pt) <- dimnames(tab) This should work. c being used here for its property of turning a table or matrix into a vector. Alternative using sprintf: pt <- sprintf("%0.1f%%", prop.table(tab) * 100) dim(pt) <- dim(tab) dimnames(pt) <- dimnames(tab) If you want the table written...

To apply the differential equation solver to every subject: First: write step 4 as a function: simulate.conc <- function(simeventdfi) { #Initial values - compartments and time-dependent parameters A_0i <- c("A1"=0,"A2"=0, "Rate"=simeventdfi$value[1], "CL"=simeventdfi$value[simeventdfi$var==4 & simeventdfi$time==0], "V1"=simeventdfi$value[simeventdfi$var==5 & simeventdfi$time==0], "Q"= simeventdfi$value[simeventdfi$var==6 & simeventdfi$time==0], "V2"=simeventdfi$value[simeventdfi$var==7 & simeventdfi$time==0]) #Run...

r,group-by,aggregate,plyr,dplyr

One option: library(dplyr) df %>% group_by(session_id) %>% mutate(rank = dense_rank(-seller_feedback_score)) dense_rank is "like min_rank, but with no gaps between ranks" so I negated the seller_feedback_score column in order to turn it into something like max_rank (which doesn't exist in dplyr). If you want the ranks with gaps so that you...

Here's how I'd do it using data.table: require(data.table) setkey(setDT(ev1), test_id) ev1[ev2, .(ev2.time=i.time, ev1.time=time[which.min(abs(i.time-time))]), by=.EACHI] # test_id ev2.time ev1.time # 1: 0 6 3 # 2: 0 1 1 # 3: 0 8 3 # 4: 1 4 4 # 5: 1 5 4 # 6: 1 11 4 In joins...

I would also use dplyr for bigger datasets (similar to @jlhoward's answer) data <- read.csv('data2.csv') library(dplyr) data %>% group_by(year) %>% summarise(Observations=n(), Total_Monitors=n_distinct(indivID),#n_distinct contributed by @beginneR Urban=round(length(urban==1)/n_distinct(fips)), Counties=n_distinct(fips), RVPI_Counties=length(unique(fips[RVPI==1]))) # year Observations Total_Monitors Urban Counties RVPI_Counties #1 1989 147 2 74 2 2 #2 1990 209 4 52 4 4 #3...
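
One detail in the Urban line may be worth double-checking: length(urban==1) returns the number of observations whatever their values are, since it measures the logical vector rather than counting its TRUE entries. If the intent was to count urban observations (an assumption about the intent, not something the answer states), sum() would do that. A base R sketch with made-up values:

```r
urban <- c(1, 0, 0, 1, 0)

n_length <- length(urban == 1)  # length of the logical vector: every row
n_true   <- sum(urban == 1)     # number of TRUE entries: urban rows only

print(c(n_length = n_length, n_true = n_true))
```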

Here's my answer, using built-in functions quantile and boxplot.stats. geom_boxplot does the calculations for boxplot slightly differently than boxplot.stats. Read ?geom_boxplot and ?boxplot.stats to understand my implementation below #Function to calculate boxplot stats to match ggplot's implementation as in geom_boxplot. my_boxplot.stats <-function(x){ quantiles <-quantile(x, c(0, 0.25, 0.5, 0.75, 1)) labels...

You want to do two things with your code: Use dlply instead of ddply, since you want a list of rpart objects instead of a data frame of (?). ddply would be useful if you wanted to show predicted values of the original data, since that can be formatted into...

Try this: library(dplyr) df %>% group_by(Nest) %>% mutate(Change = c(Weight[1], diff(Weight))) or with just the base of R transform(df, Change = ave(Weight, Nest, FUN = function(x) c(x[1], diff(x)))) ...

r,statistics,plyr,apply,linear-regression

I don't know how this will be helpful in a linear regression but you could do something like that: df <- read.table(header=T, text="Assay Sample Dilution meanresp number 1 S 0.25 68.55 1 1 S 0.50 54.35 2 1 S 1.00 44.75 3") Using lapply: > lapply(2:nrow(df), function(x) df[(x-1):x,] ) [[1]]...

This is an alternate way (one of many) to get country names from lat/lon. This won't require API calls out to a server. (Save the GeoJSON file locally for real/production use): library(rgdal) library(magrittr) world <- readOGR("https://raw.githubusercontent.com/AshKyd/geojson-regions/master/data/source/ne_50m_admin_0_countries.geo.json", "OGRGeoJSON") places %>% select(place_lon, place_lat) %>% coordinates %>% SpatialPoints(CRS(proj4string(world))) %over% world %>% select(iso_a2, name)...

r,data.frame,data.table,plyr,dplyr

Here's a data.table way: library(data.table) setDT(df)[,`:=`( del = Pt_A - Pt_A[1], perc = Pt_A/Pt_A[1]-1 ),by=ID] which gives ID Pt_A del perc 1: 101 50 0 0.0000000 2: 101 100 50 1.0000000 3: 101 150 100 2.0000000 4: 102 20 0 0.0000000 5: 102 30 10 0.5000000 6: 102 40 20...

You can try Reduce(function(...) merge(..., by='id'), list(df1, df2, df3)) # id score1 score2 score3 #1 1 50 33 50 #2 2 23 23 23 #3 4 68 64 68 #4 5 82 12 82 If you have many dataset object names with pattern 'df' followed by number Reduce(function(...) merge(..., by='id'),...

You're having trouble because strsplit() returns a list, and we would then need to apply as.data.frame.list() to each element to get it into the format that dplyr requires. Even then it would still require a bit more work to get usable results. Long story short, it doesn't seem like a...

Moving some comments to the correct place (answers), the two most common solutions would be: c(list_1, list_2) or append(list_1, list_2) Since you had already tried: list(list_1, list_2) and found that this had created a nested list, you can also unlist the nested list with the argument recursive = FALSE. unlist(list(list_1,...
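
The difference between the three calls is easy to check on two small lists. This is a base R sketch with made-up list contents:

```r
list_1 <- list(a = 1, b = "x")
list_2 <- list(c = TRUE)

flat      <- c(list_1, list_2)                 # one list with three elements
appended  <- append(list_1, list_2)            # same result as c()
nested    <- list(list_1, list_2)              # a list holding two lists
flattened <- unlist(nested, recursive = FALSE) # removes one level of nesting

print(length(flat))
print(length(nested))
print(names(flattened))
```

With recursive = FALSE only the outer level is removed, so any element that is itself a list or vector stays intact.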

This is all about writing a modified na.locf function. After that you can plug it into data.table like any other function. new.locf <- function(x){ # might want to think about the end of this loop # this works here but you might need to add another case # if there...

One option would be to use data.table. We can convert the data.frame to data.table (setDT(df1)), get the mean (lapply(.SD, mean)) for the selected columns ('var2' and 'var3') by specifying the column index in .SDcols, grouped by 'geo'. Create new columns by assigning the output (:=) to the new column names...

What about using ifelse to select the scaling direction, based on the value of variable: tableau.m = ddply(tableau.m, .(variable), transform, rescale = ifelse(variable=="B", rescale(value, to=c(1,0)), rescale(value))) Net variable value rescale 1 a B 1.88 1.00000000 2 b B 2.05 0.32000000 3 c B 2.09 0.16000000 4 d B 2.07 0.24000000...

Here's an option using do: i <- 2:5 n <- c(names(dat), paste0("power_", i)) dat %>% do(data.frame(., sapply(i, function(x) .$a^x))) %>% setNames(n) # a power_2 power_3 power_4 power_5 #1 1 1 1 1 1 #2 2 4 8 16 32 #3 3 9 27 81 243 #4 4 16 64 256...

1) rollapply works on data frames too so it is not necessary to convert df to zoo. 2) lm uses na.action, not na.rm, and its default is na.omit so we can just drop this argument. 3) rollapplyr is a more concise way to write rollapply(..., align = "right"). Assuming that...

ddply has largely been superseded by dplyr: library(dplyr) a1$variable <- as.character(a1$variable) a1 %>% group_by(variable) %>% summarise(mvalue = mean(value, na.rm=TRUE), medvalue = median(value, na.rm=TRUE), sd = sd(value, na.rm=TRUE), n = sum(!is.na(value)), se = sd/sqrt(n)) %>% ggplot(., aes(x=variable, y=mvalue, fill=variable)) + geom_bar(stat='identity', position='dodge')+ geom_errorbar(aes(ymin=mvalue-se, ymax=mvalue+se))+ scale_fill_grey() ...

You can determine the number of times each site appeared at each time with the table function: (tab <- table(df$time, df$site)) # A B C D E # 1 1 1 1 1 0 # 2 1 1 1 0 0 # 3 1 1 1 1 1 With some...

Per your comment, if the subgroups are unique you can do library(dplyr) group_by(df, group) %>% mutate(percent = value/sum(value)) # group subgroup value percent # 1 A a 1 0.1250000 # 2 A b 4 0.5000000 # 3 A c 2 0.2500000 # 4 A d 1 0.1250000 # 5 B...

What you need to be doing: ddply(adhd_p, "pid", summarise, hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])), falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]]))) Why you need to be doing it: When you call ddply, the function works within the .data (adhd_p in your case) as the local namespace. This is similar to calling attach(adhd_p); calling the name of a...

You may want to do this in two steps, as in: #initialize the new variable df$new <- df$percent # Add 10% from code == 1 to code == 3 df$new[df$code == 3] <- df$new[df$code == 3] + 0.1 * df$percent[df$code == 1] # subtract off 10% from code 1 where...

You can use cSplit_e from my "splitstackshape" package, like this: library(splitstackshape) cSplit_e(mydata, "NAMES", sep = ",", type = "character", fill = 0) # ID NAMES NAMES_333 NAMES_4444 NAMES_456 NAMES_765 # 1 1 4444, 333, 456 1 1 1 0 # 2 2 333 1 0 0 0 # 3 3...

Using dplyr library(dplyr) mtcars %>% group_by(cyl) %>% do(data.frame(as.list(quantile(.$mpg,probs=probs)), check.names=FALSE)) # cyl 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% #1 4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9 #2 6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4 #3 8...

You could try with base R using Map, which would be more compact than the one with llply. Basically, you have two lists with the same number of list elements and want to subset each list element of the first list ("list1") based on the corresponding index element of the second ("list2")....

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

DT[setkey(DT[, min(period):max(period), by = project], project, V1)] # project period v3 v4 # 1: 6 1 a red # 2: 6 2 b yellow # 3: 6 3 NA NA # 4: 6 4 NA NA # 5: 6 5 c red # 6: 6 6 d yellow # 7:...

r,data.table,aggregate,plyr,summary

You can use dplyr's summarise_each function combined with group_by: library(dplyr) soil %>% group_by(CODE_PLOT) %>% summarise_each(funs(mean = mean(., na.rm = TRUE), sd = sd(., na.rm = TRUE), N = n()), 4:30) This will summarise the columns 4:30 of your data. If you want to supply a vector of column names to...

You can use sapply: sapply(funcs, function(f) {tmp <- f(2); setNames(list(tmp$val), tmp$ref)}) # $XX1 # [1] 1 # # $XX55 # [1] 341 # # $XX3 # [1] 11 ...

Try this: #data df <- read.table(text=" tissueA tissueB tissueC gene1 4.5 6.2 5.8 gene2 3.2 4.7 6.6") #result apply(df,1,function(i){ my.max <- max(i) my.statistic <- (1-log2(i)/log2(my.max)) my.sum <- sum(my.statistic) my.answer <- my.sum/(length(i)-1) my.answer }) #result # gene1 gene2 # 0.1060983 0.2817665 ...

You could try dplyr library(dplyr) library(lazyeval) mydf %>% group_by(colors) %>% summarise_(sum_val=interp(~sum(var), var=as.name(mycol))) # colors sum_val #1 Blue 5 #2 Green 9 #3 Red 7 Or using ddply from plyr library(plyr) ddply(mydf, .(colors), summarize, sum_val=eval(substitute(sum(var), list(var=as.name(mycol)))) ) # colors sum_val #1 Blue 5 #2 Green 9 #3 Red 7 Regarding the...

The problem with plyr_test is that df_2 is defined in plyr_test, which isn't accessible from the doParallel package, and therefore it fails when it tries to export df_2. So that is a scoping issue. plyr_test2 avoids this problem because it doesn't try to use the .export option, but as you...

Interesting question. It appears that your decay factor, if we can call it that, is 0.25. The following two steps do what is intended (first 10 observations printed, the result is called z): In [67]: z = df.groupby('x').y.apply(lambda x: np.convolve(x, np.power(0.25, range(len(x)))[:len(x)], mode='full')[:len(x)]) print(z) x 1 [1.0, 2.25, 3.5625, 4.890625, 6.22265625]...

library(dplyr) a1<-group_by(df,tel) mutate(a1,mycol=intentos/lag(intentos,1)) Source: local data frame [10 x 5] Groups: tel tel hora intentos contactos mycol 1 1 1 1 0 NA 2 1 2 5 1 5.0000000 3 1 4 1 0 0.2000000 4 1 4 4 0 4.0000000 5 2 1 9 0 NA 6 2 1...

Maybe you need df1 <- subset(df, specialty %in% c('Real Estate', 'Tort')) library(reshape2) dM <- melt(df1, id.var='specialty')[,-2] dM[] <- lapply(dM, factor) table(dM) # value #specialty 38564 44140 44950 49000 49255 49419 NULL # Real Estate 0 0 0 2 0 1 3 # Tort 1 1 1 1 2 0...

python,r,pandas,plyr,split-apply-combine

Here's the two-line solution: import itertools for grpname,grpteams in df.groupby('group')['team']: # No need to use grpteams.tolist() to convert from pandas Series to Python list print(list(itertools.combinations(grpteams, 2))) [('Canada', 'Netherlands'), ('Canada', 'China'), ('Canada', 'New Zealand'), ('Netherlands', 'China'), ('Netherlands', 'New Zealand'), ('China', 'New Zealand')] [('Germany', 'Norway'), ('Germany', 'Thailand'), ('Germany', 'Ivory Coast'), ('Norway',...

Thanks to Joran, installing from CRAN actually solved the issue. install.packages("plyr") ...

You could for example subset the data you pass into ddply: ddply(subset(data, Source != "abc"), .(Source), summarize, Cost= sum(Cost)) Or ddply(subset(data, !Source %in% c("abc", "def")), .(Source), summarize, Cost= sum(Cost)) Of course you could use [ instead of subset. Or you could give dplyr a try: library(dplyr) data %>% filter(!Source %in%...

r,alias,plyr,pipeline,magrittr

use_series is just an alias for $. You can see that by typing the function name without the parentheses: use_series # .Primitive("$") The $ primitive function does not have formal argument names in the same way a user-defined function does. It would be easier to use extract2 in this case...
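
The point about primitives can be checked in base R without magrittr. The sketch below inspects `$` and `[[` directly (extract2 is magrittr's alias for `[[`); the list x is made up for illustration:

```r
# `$` is a primitive, so it has no formal argument list to match names against.
is_prim    <- is.primitive(`$`)
no_formals <- is.null(formals(`$`))

x <- list(a = 1, b = 2)

# Called as functions, both extract the element positionally.
via_dollar  <- `$`(x, a)      # the name is taken literally, like x$a
via_bracket <- `[[`(x, "a")   # takes a character string, so it composes better

print(c(is_prim, no_formals))
print(c(via_dollar, via_bracket))
```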

r,dataframes,plyr,split-apply-combine

This is in dplyr and may be able to be cleaned up some, but it looks like it works: library(dplyr) newdf <- data %>% group_by(serial) %>% mutate( cidx = year == 1985 & moved == 0, urban.rural.code = ifelse(year == 1984 & isTRUE(cidx[year==1985]), urban.rural.code[year == 1985], urban.rural.code) ) ...

You were almost there: lapply(x, function(z) z[! (z %in% bad.words)]) Alternatively, you could do lapply(x, function(z) setdiff(z,bad.words)) which seems more elegant to me....
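
The two versions are not fully interchangeable: setdiff() also drops duplicates, while the %in% filter keeps repeated words. A base R sketch with made-up words shows the difference:

```r
x <- list(c("good", "bad", "good"), c("ugly", "fine"))
bad.words <- c("bad", "ugly")

kept_in      <- lapply(x, function(z) z[!(z %in% bad.words)])  # keeps repeats
kept_setdiff <- lapply(x, function(z) setdiff(z, bad.words))   # returns unique values

print(kept_in[[1]])
print(kept_setdiff[[1]])
```

If word counts matter later on, the %in% form is the safer choice.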

You can use an ifelse statement to scale the E.e and F.f values based on their combined range, rather than the range of each individual group of values: tableau.m = ddply(tableau.m, .(variable), transform, rescale = ifelse(variable %in% c("E.e","F.f"), rescale(value, from=range(value[variable %in% c("E.e","F.f")])), rescale(value))) UPDATE: After seeing your comment, I realized...

r,warnings,plyr,r-package,package-development

There are several workarounds. The easiest is to just assign NULL to all variables with no visible binding. VarA <- VarB <- VarC <- VarD <- VarE <- NULL A more elegant solution is to use as.quoted and substitute. UPDATE by @Dr. Mike: the call to as.quoted needs to be...
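
A small sketch of the NULL workaround inside a function, using base R's subset() instead of plyr so it runs without extra packages. The function name count_above_one and the toy data are assumptions for illustration:

```r
count_above_one <- function(df) {
  # Dummy binding so code checkers see VarA as a local variable.
  VarA <- NULL
  # subset() evaluates the condition inside df first, so the column wins
  # over the NULL placeholder.
  nrow(subset(df, VarA > 1))
}

n <- count_above_one(data.frame(VarA = 1:3))
print(n)
```

The placeholder only silences the check; it is never used as long as the column exists in the data.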

r,data.table,aggregate,plyr,dplyr

Using dplyr: df %>% group_by(ID, Type) %>% summarise_each(funs(sum(., na.rm = T))) Or df %>% group_by(ID, Type) %>% summarise(Point_A = sum(Point_A, na.rm = T), Point_B = sum(Point_B, na.rm = T)) Or f <- function(x) sum(x, na.rm = T) df %>% group_by(ID, Type) %>% summarise(Point_A = f(Point_A), Point_B = f(Point_B)) Which gives:...

r,classification,plyr,quantile

I don't see anything wrong with your computation. Each quartile contains the same number of elements only when you have a very large data set. Below I extracted Diameter for the first group. table(findInterval(diameter, quantile(diameter, c(.25, .5, .75)))) # 0 1 2 3 #22 23 24 23 sum(diameter < quantile(diameter, .25))...
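
The same effect shows up on a tiny vector. This base R sketch uses made-up values with ties, so the quartile groups cannot all have the same size:

```r
# Made-up diameters; the repeated 2s straddle the first quartile break.
diameter <- c(1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9)

breaks <- quantile(diameter, c(.25, .5, .75))
counts <- table(findInterval(diameter, breaks))

print(breaks)
print(counts)
```

Every observation lands in exactly one group, but ties and a sample size that is not a multiple of four make the group sizes uneven.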

r,plyr,tapply,split-apply-combine

Answer using data.table package: > dt <- data.table(eg = letters[1:8], Type=rep(c("F","W"), 4)) > a <- dt[, paste(eg, collapse=" "), by=Type] > a Type V1 1: F a c e g 2: W b d f h The bonus of using data.table is that this will still run in a few...

You can use the apply() function in row mode on your data frame. You can pass each row to the function classify_emotion like this: result <- data.frame(apply(mydata, 1, function(x) { y <- classify_emotion(x, "bayes", 1.0) return(y) })) ...

I think you need to split the column before you can use it to order the data frame: library("reshape2") ## for colsplit() library("gtools") Construct test data: dat <- data.frame(matrix(1:25,5)) names(dat) <- c('Testname1.1', 'Testname1.100', 'Testname1.11','Testname1.2','Testname2.99') Split and order: cdat <- colsplit(names(dat),"\\.",c("name","num")) dat[,order(mixedorder(cdat$name),cdat$num)] ## Testname1.1 Testname1.2 Testname1.11 Testname1.100 Testname2.99 ## 1 1...

What about this? res <- t(sapply(AP, function(y) sapply(unique(unlist(AP)), function(x) sum(x == y)))) colnames(res) <- unique(unlist(AP)) res 411050384 411050456 411058568 428909002 CMP1 1 2 1 0 CMP2 1 1 0 0 CMP3 1 1 1 2 ...

aggregate(Volume~., data=df, sum) MRN Product Transfusion.Date Volume 1 1 PRBC 2004-12-02 50 2 2 PRBC 2004-12-02 150 3 3 FFP 2004-12-03 2 4 3 FFP 2004-12-04 1 ...

Maybe try library(dplyr) dat %>% group_by(group) %>% summarise(V1 = sum((value - mean(value))^2)) %>% summarise(V1 = sum(V1)) %>% .$V1 # [1] 1372.8 or, if you want do: dat %>% group_by(group) %>% do({data.frame(V1 = sum((.$value-mean(.$value))^2))}) %>% ungroup() %>% summarise(V1 = sum(V1)) %>% .$V1 # [1] 1372.8 ...

You can add a "keep" column that is TRUE only if the standard deviation is below 2. Then, you can use a left join (merge) to add the "keep" column to the initial dataframe. In the end, you just select with keep equal to TRUE. # add the keep column...

Here's a way with dplyr. I first join the tomatch data.frame with the FIPS names by state (allowing only in-state matches): require(dplyr) df <- tomatch %>% left_join(fips, by="state") Next, I noticed that a lot of counties don't have 'Saint' but 'St.' in the FIPS dataset. Cleaning that up first should...

Had a similar issue, my answer was sorting on groups and the relevant ranked variable(s) in order to then use row_number() when using group_by. # Sample dataset df <- data.frame(group=rep(c("GROUP 1", "GROUP 2"),10), value=as.integer(rnorm(20, mean=1000, sd=500))) require(dplyr) print.data.frame(df[0:10,]) group value 1 GROUP 1 1273 2 GROUP 2 1261 3 GROUP...

llply goes down to loop_apply which eventually calls this function: // [[Rcpp::export]] List loop_apply(int n, Function f) { List out(n); for(int i = 0; i < n; ++i) { out[i] = f(i + 1); } return out; } So, as part of the business of Rcpp::export we get a call...

You can replicate the data 4 times: - including sex and group - including sex - including group - not including any column The columns that are not included become "all" require(plyr) dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)), sex = sample(c("M", "F"), size = 29,...

There are many tools for split-apply-combine in R. I'd be inclined to use the data.table package: require(data.table) mydt <- data.table(data) mycols <- c('C','E','M','P') newcols <- paste0(mycols,'new') my1vec <- c(1.1,.9,1,1.01) my0vec <- c(.8,1.05,1.01,1.01) mydt[FF==1,(newcols):=mapply(`*`,my1vec,.SD,SIMPLIFY=FALSE),.SDcols=mycols] mydt[FF==0,(newcols):=mapply(`*`,my0vec,.SD,SIMPLIFY=FALSE),.SDcols=mycols] I put the new values in new columns. If instead you want to overwrite the old...

You get the error: Error: object of type 'closure' is not subsettable because ddply tries to resolve t in the local environment before the global environment. It therefore finds the transpose function t (a closure) and not your global variable t. You just need to change it to something other...
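
The error itself can be reproduced in base R without ddply: in a session where no variable named t is visible, t refers to the transpose function, and indexing a function fails. The replacement name t_values is a made-up example:

```r
# Indexing base::t (a function) raises the "closure" error; capture its message.
msg <- tryCatch(t[1], error = function(e) conditionMessage(e))
print(msg)

# A name that does not clash with an existing function avoids the ambiguity.
t_values <- c(10, 20, 30)
first <- t_values[1]
print(first)
```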

This is a simple operation within the data.table scope dt[, .(b = sum(b), c = c[which.max(b)]), by = a] # a b c # 1: a 6 p # 2: b 9 r A similar option would be dt[order(b), .(b = sum(b), c = c[.N]), by = a] ...

I downloaded your data and had a look. If I am not mistaken, all you need is to subset data using Time.h. Here you have a range of time (10-23) you want. I used dplyr and did the following. You are asking R to pick up rows which have values...

In your call, when you do sum(Total) you are using the total value of the group, which when used with Total/sum(Total) simply produces 1 for this data/grouping. You could calculate the total sum from the entire data set by using df$Total in the sum() call. With ddply this would be...
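
A base R sketch of the two denominators, using ave() for the per-group version and a made-up data frame with one row per group (the situation in which the ratio collapses to 1):

```r
df <- data.frame(g = c("a", "b", "c"), Total = c(10, 30, 60))

# Share within each group: each group has a single row, so every share is 1.
within_group <- ave(df$Total, df$g, FUN = function(x) x / sum(x))

# Share of the grand total across the whole data set.
overall <- df$Total / sum(df$Total)

print(within_group)
print(overall)
```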

r,function,aggregate,plyr,multi-level

If I understand correctly, what you want is to compute a variable at the level of the district and then attribute it to the school level. I hardly understand the rest of your post. You can do that in base R by successively using aggregate and merge. Given that you already...

. in do is a data frame, which is why you get the error. This works: df %>% rowwise() %>% do(data.frame(as.list(quantile(unlist(.),probs = c(0.1,0.5,0.9))))) but risks being horrendously slow. Why not just: apply(df, 1, quantile, probs = c(0.1,0.5,0.9)) Here are some timings with larger data: df <- as.data.frame(matrix(rbinom(100000,10,0.5),nrow = 1000)) library(microbenchmark)...
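
Note the shape of what apply() returns here: one column per original row and one row per quantile, so you may want t() to get back to one row per observation. A base R sketch on a made-up two-row data frame:

```r
df <- data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6))

# Each row of df is passed to quantile(); results are bound as columns.
q <- apply(df, 1, quantile, probs = c(0.1, 0.5, 0.9))
print(dim(q))

# Transpose to get one row per original row of df.
q_by_row <- t(q)
print(q_by_row)
```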

Is this what you are after... library(plyr) library(dplyr) Data set.seed(1) df <-data.frame(category=rep(LETTERS[1:5],each=10), superregion=sample(c("EMEA","LATAM","AMER","APAC"),100,replace=T), country=sample(c("Country1","Country2","Country3","Country4","Country5","Country6","Country7","Country8"),100,replace=T), market=sample(c("Market1","Market2","Market3","Market4","Market5","Market6","Market7","Market8","Market9","Market10","Market11","Market12"),100,replace=T),...

You could try library(plyr) ddply(dat, "company", summarise, ratingMax = max(rating), ID = ID[which.max(rating)]) # company ratingMax ID #1 CompA 4 A12 #2 CompB 5 A22 #3 CompC 4 A31 Or using dplyr library(dplyr) dat %>% group_by(company) %>% summarise(ratingMax=max(rating), ID=ID[which.max(rating)]) # company ratingMax ID #1 CompA 4 A12 #2 CompB 5...