Update: Here's the easiest dplyr method I've found so far. And I'll add a stringi function to speed things up. Provided there are no identical sentences in df$text, we can group by that column and then apply mutate(). Note: Package versions are dplyr 0.4.1 and stringi 0.4.1. library(dplyr) library(stringi)...

Perhaps sapply(strsplit(df$qual,split="/") , "[[", 1) ? The explanation: strsplit generates a list of results, i.e. a character vector for each character element in the original input. The "[[" is a short-hand way to call the indexing operator, and 1 says to pass the additional argument 1 to [[ -- i.e.,...
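
A minimal sketch of the idea above on made-up data; the vector `qual` is hypothetical and only stands in for `df$qual`:

```r
# Hypothetical input standing in for df$qual
qual <- c("a/b/c", "d/e", "f")

# strsplit() returns a list with one character vector per input element
parts <- strsplit(qual, split = "/")

# "[[" is applied to each list element with the extra argument 1,
# so each call is effectively parts[[k]][[1]] (the first piece of each split)
first <- sapply(parts, "[[", 1)
print(first)
```

Writing `sapply(parts, function(p) p[[1]])` should be equivalent; passing `"[["` directly just avoids the anonymous function.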

This is a typical ave + seq_along type of problem, but we need to convert the data to vectors first: t(`dim<-`(ave(rep(1, prod(dim(x[, -1]))), c(t(x[, -1])), FUN = seq_along) - 1, rev(dim(x[, -1])))) # [,1] [,2] [,3] # [1,] 0 0 0 # [2,] 1 1 0 # [3,] 1 2...

r,data.frame,unique,lapply,sapply

I will give +1 to BondedDust's answer, as I was about to write almost the same answer. Also, as John wanted a list of such colnames for a given list of data.frames, I have added the following two lines: #dfList is list of dataframes for which operation is needed myfun =...

r,parallel-processing,lapply,sapply

The ?mclapply help page says that this is possible (argument SIMPLIFY), although only for mcmapply. As you've already figured out, (mc)mapply with only one object passed is a special case and is equivalent to (mc)lapply.
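
A small sketch of that special case, using base `mapply` and `lapply` rather than the `mc*` versions so it runs on any platform (that substitution is my assumption; the input `1:3` and the squaring function are made up):

```r
square <- function(x) x^2

# mapply() with a single vectorised argument and SIMPLIFY = FALSE ...
via_mapply <- mapply(square, 1:3, SIMPLIFY = FALSE)

# ... gives the same list as lapply() over the same input
via_lapply <- lapply(1:3, square)

print(identical(via_mapply, via_lapply))

# With the default SIMPLIFY = TRUE the result is collapsed to a vector
simplified <- mapply(square, 1:3)
print(simplified)
```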

Try this: Just to clear the confusion. dat1=as.data.frame(matrix(rnorm(25),ncol=5)) dat5=as.data.frame(matrix(rnorm(25),ncol=5)) dat7=as.data.frame(matrix(rnorm(25),ncol=5)) my_fun <- function(dataframe){ rowMeans( dataframe[ , c("V1","V2")],na.rm=TRUE) } dfList<-list(dat1,dat5,dat7) Vars <- grep("dat", ls(), value=TRUE) Vars #[1] "dat1" "dat5" "dat7" res <- lapply(dfList, function(x) transform(x,V6=my_fun(x))) for(i in 1:length(Vars)){ assign(Vars[i], res[[i]],envir=.GlobalEnv) } A Second function: my_funSD <-...

One attempt using Map and sweep, which I think gives the intended result: Map(function(x,y) abs(sweep(x,2,y,FUN="-"))/(sweep(abs(x),2,abs(y),FUN="+")), listA, listB) E.g.: listA <- list(x=matrix(1:9, nrow=3), y=matrix(1:9, nrow=3)) listB <- list(x=matrix(1:3, nrow=1), y=matrix(4:6, nrow=1)) Map(function(x,y) abs(sweep(x,2,y,FUN="-"))/(sweep(abs(x),2,abs(y),FUN="+")), listA, listB) #$x # [,1] [,2] [,3] #[1,] 0.0000000 0.3333333 0.4000000 #[2,] 0.3333333 0.4285714 0.4545455 #[3,] 0.5000000 0.5000000...

If you want each of the matrices in a list: res <- lapply(seq_len(ncol(rainfall)), function(i) matrix(rainfall[,i], ncol=24, byrow=TRUE) ) sapply(res, dim) # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] #[1,] 13149 13149 13149 13149 13149 13149 13149 13149 13149 13149 13149 13149 #[2,] 24 24 24 24...

Some parts of the code are not clear. Maybe you attached the dataset. Also, there is the problem of using the wrong dat instead of dat.temp, as commented by @BrodieG. Regarding the error, it could be because the column County is a factor and the levels were not dropped. You...

You can try lapply lapply(spmtcars, function(x) x[order(-x$mpg),]) ...

Perhaps, you are looking for mapply. You can edit the FUN argument of mapply to do what you want on each RD and CD argument. mapply passes one element each of data$Record_Date and data$Compare_Date into RD and CD arguments of FUN respectively. mapply(FUN = function(RD, CD) { d <- as.numeric(CD...

You can replace sdat <- with(dat, split(dat, strat.var)) with sdat <- split(dat, dat[strat.var]) in myFun. The previous code was not splitting as intended; instead, you were getting the sum for the whole data, i.e. sum(with(warpbreaks, tapply(breaks, tension, FUN=mean))) #[1] 84.44444 Using the corrected myFun myFun(warpbreaks, strat.var='wool', PSU='tension',...

bucket declared outside of the function and bucket inside of the function are not necessarily the same thing. When inside the function, your call of bucket <- c(bucket, genelist.info.u[x, "Gene"]) updates the bucket in that function. Because you do not return bucket at the end, the one you initialized at...
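
To illustrate the scoping point, here is a minimal sketch; the function names `add_local` and `add_returning` and the value "geneA" are made up for the example:

```r
bucket <- character(0)

# Assigning with <- inside a function creates/updates a local `bucket`;
# the global one is untouched once the function returns
add_local <- function(x) {
  bucket <- c(bucket, x)
  invisible(NULL)
}
add_local("geneA")
n_after_local <- length(bucket)   # still the global, unchanged vector

# Returning the updated vector and reassigning it makes the change visible
add_returning <- function(bucket, x) c(bucket, x)
bucket <- add_returning(bucket, "geneA")
print(bucket)
```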

This does the trick: vec = addr$addr testData$addr = apply(testData, 1, function(u){ bool = sapply(vec, function(x) grepl(x, u[['content']])) if(any(bool)) vec[bool] else NA }) ...

Here is an option: funs <- list(sd=sd, mean=mean) sapply(funs, function(x) sapply(df, x, na.rm=T)) Produces: sd mean col1.value 39.34826 39.42857 col2.value 28.33946 51.625 If you want to get cute with the functional library: sapply(funs, Curry(sapply, X=df), na.rm=T) Does the same thing....
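
A self-contained sketch of the nested `sapply` approach; the data frame below is invented (it is not the asker's data), and `na.rm = TRUE` is spelled out instead of `T`:

```r
# Made-up data with one missing value
df <- data.frame(col1 = c(1, 2, NA), col2 = c(4, 6, 8))

funs <- list(sd = sd, mean = mean)

# Outer sapply loops over the functions, inner sapply over the columns;
# na.rm = TRUE is forwarded to each summary function
res <- sapply(funs, function(f) sapply(df, f, na.rm = TRUE))
print(res)
```

The result is a matrix with one row per column of `df` and one column per function.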

r,nested,time-series,lapply,sapply

Using plyr: As a matrix (time in cols, rows corresponding to rows of df): aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE) As a list: alply(df, 1, function(x) weisurv(t, x$sc, x$shp)) As a data frame (structure as per matrix above): adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t)) As a...

string,r,vectorization,apply,sapply

This uses the data.table package and should be relatively quick. Do check your column types because the example data you gave gets converted to a factor variable (so I used stringsAsFactors=FALSE when recreating it). require(data.table) dt <- data.table( data , key = "ID" ) dt[ dt[ , list( Keyword =...

You may need simplify=FALSE in the sapply sapply(token, function(x) str_trim(x, side='both'), simplify=FALSE) Or better would be to use lapply lapply(token, function(x) str_trim(x, side='both')) ...
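
A quick sketch of the difference, using base `trimws()` in place of `str_trim()` so that no extra package is needed (that swap, and the `token` list itself, are assumptions for illustration):

```r
# Made-up list of character vectors with stray whitespace
token <- list(c(" a ", "b "), c(" c", " d "))

# Default sapply() simplifies equal-length results into a matrix
as_matrix <- sapply(token, trimws)

# simplify = FALSE keeps the list structure ...
as_list <- sapply(token, trimws, simplify = FALSE)

# ... which is what lapply() returns in the first place
via_lapply <- lapply(token, trimws)

print(is.matrix(as_matrix))
print(identical(as_list, via_lapply))
```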

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'): Methods [...]: "[ : select subset of (sets of) polygons; NAs are not permitted in the row index". So using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

You are doing two expensive things in an otherwise reasonable algorithm: (1) you are recreating a matrix from your list for every iteration, which is likely slow; and (2) you are recomputing the entire row sums repeatedly, when in reality you just need to calculate the marginal changes. Here is an alternative. We...

You could try something like: for( v in ivs){ eval(parse(text=paste0(v,"$state <- tolower(",v,"$state)"))) } ...

In cases like this, I find it easier to use indices, instead of the data itself: sapply((1:ncol(data_2))[-2], function(i) { c(mean(data_2[,i]), sd(data_2[,i])) # add other functions }) ...

sapply iterates through the supplied vector or list and supplies each member in turn to the function. In your case, you're getting the values 2 and 4 and then trying to index your vector again using its own values. Since the oth_let1 vector has only two members, you get NA....
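
A minimal reproduction of that behaviour, assuming the vector holds the values 2 and 4 as described (the exact contents of the original `oth_let1` are not shown, so this is a guess):

```r
oth_let1 <- c(2, 4)

# sapply() hands over the *values* 2 and 4, so this asks for
# oth_let1[2] (which is 4) and oth_let1[4] (out of range, so NA)
by_value <- sapply(oth_let1, function(i) oth_let1[i])
print(by_value)

# Iterating over positions with seq_along() indexes each element once
by_index <- sapply(seq_along(oth_let1), function(i) oth_let1[i])
print(by_index)
```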

Here is the solution: For each group, identify subgroups using cut and drop the absent subgroups using droplevels. Allocate weights as (x/2^n)/freq. Then identify the minimum weights and adjust them such that the sum of the weights in a group adds up to 1. dat <- read.table("clipboard", header = T) groupIDs <- unique(dat$GroupID)...

c$secs <- ifelse(nchar(as.character(rt502[,8]))==7, substr(rt502[,8],6,6), substr(rt502[,8],6,7)) Is this what you are looking for?...

Use loop and paste file names. for(i in 1:777){ infile <- paste0("nsamplescluster.split",i,".adjusted") outfile <- paste0("newsplit",i,".txt") all <- read.table(infile, header=TRUE, sep=";") all <- all[, -grep("GType", colnames(all))] write.table(all, outfile, sep=";") } ...

You can try sapply(gregexpr("\\S+", x), length) ## [1] 6 2 1 1 Or as suggested in comments you can try sapply(strsplit(x, "\\s+"), length) ## [1] 6 2 1 1 ...
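
Both variants can be tried on a made-up vector (the `x` below is hypothetical, not the one from the question):

```r
x <- c("the quick brown fox", "hello world", "one")

# Count runs of non-whitespace characters
n_gregexpr <- sapply(gregexpr("\\S+", x), length)

# Split on runs of whitespace and count the pieces
n_strsplit <- sapply(strsplit(x, "\\s+"), length)

print(n_gregexpr)
print(n_strsplit)
```

The two approaches agree on clean input like this. They can differ on edge cases: for example, a string with leading whitespace gets an extra empty first piece from `strsplit`.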

Why i is different... It looks like there were changes in R 3.2. An index variable i has been added to the current environment of lapply (which is what sapply actually calls). This goes along with the new behavior to force evaluation of the parameters passed along to the function...

Try mapply(function(x,y) tapply(x,y, FUN=mean) , Example[seq(1, ncol(Example), 2)], Example[seq(2, ncol(Example), 2)]) Or instead of seq(1, ncol(Example), 2) just use c(TRUE, FALSE) and c(FALSE, TRUE) for the second case...
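
As a rough sketch of how this behaves, here is a made-up `Example` data frame with alternating value and grouping columns (the column names and values are assumptions):

```r
Example <- data.frame(
  v1 = c(1, 2, 3, 4),     g1 = c("a", "a", "b", "b"),
  v2 = c(10, 20, 30, 40), g2 = c("x", "y", "x", "y")
)

# c(TRUE, FALSE) is recycled across the columns, so it picks columns 1, 3, ...
odd_cols  <- Example[c(TRUE, FALSE)]
even_cols <- Example[c(FALSE, TRUE)]

# Pair each value column with the grouping column next to it
res <- mapply(function(x, y) tapply(x, y, FUN = mean), odd_cols, even_cols)
print(res)
```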

I would use xpath instead, maybe... library(rentrez) x <- entrez_fetch("pubmed", "xml", id=c(11841882,11841881)) doc <- xmlParse(x) pubs <- getNodeSet(doc, "//PubmedArticle") y <- lapply(pubs, function(x) data.frame( pmid = xpathSApply(x, ".//MedlineCitation/PMID", xmlValue), mesh = xpathSApply(x, ".//MeshHeading/DescriptorName", xmlValue)) ) do.call("rbind", y) pmid mesh 1 11841882 Cardiopulmonary Resuscitation 2 11841882 Child, Preschool 3 11841882 Female...

Assuming x is pre-initialized to be all ones, and df is the data frame that contains a, b, and c, then a simple solution is: x[with(df, a + b + c == 0)] <- 0 Here we generate an index vector that contains TRUE whenever the desired condition is met...
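
A small worked example with invented values for `a`, `b`, and `c`:

```r
df <- data.frame(a = c(0, 1, 0, 0),
                 b = c(0, 0, 0, 2),
                 c = c(0, 0, 0, 0))

# x starts out as all ones, one entry per row of df
x <- rep(1, nrow(df))

# Logical index: TRUE for rows where a + b + c is zero
idx <- with(df, a + b + c == 0)
x[idx] <- 0
print(x)
```

Note that `a + b + c == 0` only means "all three are zero" when the values cannot be negative.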

Yes, you can use scale: scale(x, center=FALSE, scale=y) or sweep: sweep(x, 2, y, FUN='/') ...
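
A quick check, on a made-up matrix `x` and divisor vector `y`, that the two calls give the same numbers:

```r
x <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
y <- c(1, 2, 3)              # one divisor per column

scaled <- scale(x, center = FALSE, scale = y)
swept  <- sweep(x, 2, y, FUN = "/")

# scale() attaches a "scaled:scale" attribute, so compare values only
print(all.equal(scaled, swept, check.attributes = FALSE))
```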

One approach - the advantage is conciseness, but it is clearly not functional programming oriented, since it has the side effect of modifying i: mapply(function(u,v) i<<-gsub(u,v,i), sf, rp) #> i #[1] "one 6 5 4" "7 4 three one" Or here is a pure functional programming approach: library(functional) Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v),...

r,apply,mathematical-optimization,sapply

list_dat is not a list, it is an array of lists. Your definition of min.RSS defines data as its argument, but then refers to list. # You don't really need to preallocate the list, but if you insist list_dat <- vector(length=2, mode='list') list_dat[[1]] =data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,6,8,12,15,19)) list_dat[[2]] =data.frame(x=c(1,2,3,4,5,6), y=c(1,3,5,6,8,12)) min.RSS...

r,sum,vectorization,apply,sapply

If you want to generate all of the values of L(.) for varying values of r and s, a loop-less method might be: rs <- expand.grid(r=r,s=s); rm(r); rm(s) #edit rs$qrs <- with(rs, L(r, s, R, S)^2 ) q <- sum(rs$qrs) I'm not convinced this will be faster. There is a...

How about dd <- as.data.frame(mat) dd[sapply(dd,function(x) all(x>=0))] ? sapply(...) returns a logical vector (in this case TRUE TRUE FALSE TRUE) that states whether each column has only non-negative values. When used with a data frame (not a matrix), single-bracket indexing with a logical vector treats the data frame as a...
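
For illustration, a made-up matrix whose third column contains a negative value, which reproduces the TRUE TRUE FALSE TRUE pattern mentioned above:

```r
mat <- matrix(c(1, 2,
                3, 4,
               -1, 5,
                6, 7), nrow = 2)   # filled column by column

dd <- as.data.frame(mat)           # columns are named V1..V4

# One logical per column: does it contain only non-negative values?
keep <- sapply(dd, function(x) all(x >= 0))

# Single-bracket indexing with a logical vector selects columns
res <- dd[keep]
print(res)
```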

You can use the apply family of functions for this purpose. Here is a tutorial: http://www.r-bloggers.com/using-apply-sapply-lapply-in-r/ sapply(obj, function(x) x$logL ) ...

r,functional-programming,lapply,sapply

lapply returns a list by default: From documentation: lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. sapply returns a vector by default: From documentation: sapply is a user-friendly version and wrapper of...
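
A tiny comparison of the two return types on a made-up input:

```r
square <- function(x) x^2

# lapply() always returns a list, one element per input element
as_list <- lapply(1:3, square)

# sapply() tries to simplify; here each result has length 1,
# so the list is collapsed into a plain numeric vector
as_vector <- sapply(1:3, square)

str(as_list)
str(as_vector)
```

When the results cannot be simplified (for example, results of differing lengths), `sapply` falls back to returning a list as well.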