You can try sapply(data.matrix, function(x) min(x$P)). If the min values should replace the P column: lapply(data.matrix, function(x) {x$P <- min(x$P); x}) ...

Try mapply(function(x,y) tapply(x,y, FUN=mean) , Example[seq(1, ncol(Example), 2)], Example[seq(2, ncol(Example), 2)]) Or instead of seq(1, ncol(Example), 2) just use c(TRUE, FALSE) and c(FALSE, TRUE) for the second case...

Using rowsum seems to be faster (at least for this small example dataset) than the data.table approach: sgibb <- function(df) { data.frame(Group = unique(df$Group), Avg = rowsum(df$Weighted_Value, df$Group)/rowsum(df$SumVal, df$Group)) } Adding the rowsum approach to @platfort's benchmark: library(microbenchmark) library(dplyr) library(data.table) microbenchmark( Nader = df %>% group_by(Group) %>% summarise(res = sum(Weighted_Value)...

I'm not a big fan of by(). I'd tackle this task with split() and lapply(). do.call(rbind, lapply(split(df, list(df$A, df$B)), function(d) { l <- lm(C~D, data=d)$coef data.frame(A=d$A[1], B=d$B[1], COR=cor(d$C, d$D), LM1=l[1], LM2=l[2]) } )) This gives: A B COR LM1 LM2 x.a x a 1 -5.000000 2.0000000 y.a y a 1...

Try this: #data df <- read.table(text=" tissueA tissueB tissueC gene1 4.5 6.2 5.8 gene2 3.2 4.7 6.6") #result apply(df,1,function(i){ my.max <- max(i) my.statistic <- (1-log2(i)/log2(my.max)) my.sum <- sum(my.statistic) my.answer <- my.sum/(length(i)-1) my.answer }) #result # gene1 gene2 # 0.1060983 0.2817665 ...

In base R, you could use merge and rowMeans (assuming that the 'score' column is 'numeric'). res <- merge(test1, test2[-1], by='studentName') res # studentName id score.x score.y #1 Alice 1 100 90 #2 Bob 2 98 95 #3 Josh 3 64 80 We are interested in averaging the rows of...

You could try data_list <- lapply(data_list, function(x) {x$year <- substr(x$year, 1, 4); x}) ...

We don't need apply with MARGIN=1. Instead, we can paste the columns using with(birds, paste(year, month, day, sep="-")) and wrap it with as.Date to convert to the 'Date' class. The output of ymd is of POSIXct class; within apply, it will be coerced to 'numeric' form. library(lubridate) library(dplyr) mutate(birds, date=ymd(paste(year, month,...

I made some minor changes to your function. You should just return the object and save the result of the function rather than using <<- #example data element1 <- c("control", "control", "variation", "variation") element2 <- c("control", "variation", "variation", "control") element3 <- c("variation", "control", "variation", "variation") metric <- c(10,15,20,25) other <-...

Using the Matrix package (which ships with a standard installation of R) nums <- c(1,2,3,4,5,1,2,4,3,5) apply(Matrix::sparseMatrix(i=seq_along(nums), j=nums), 2, cumsum) # [,1] [,2] [,3] [,4] [,5] # [1,] 1 0 0 0 0 # [2,] 1 1 0 0 0 # [3,] 1 1 1 0 0 # [4,] 1 1...

You could also use Reduce instead of apply, as it is generally more efficient. You just need to slightly modify your function to use cbind instead of c: f <- function (a, b) { cbind(a + b, a * b) # modified to use `cbind` instead of `c` } df[c('apb',...

If your end goal is to combine them into a single data.table, then in the latest version (1.9.5+) you can do it all in one step: rbindlist(test, idcol = 'Site') # Site x y # 1: a 1.907162564 -1.28512736 # 2: a 1.144876890 0.03482725 # 3: a -0.764530737 1.57029534 #...

The answer to your question is no, you can't distinguish between Letter.type and Letter.apply, so you need a workaround. I would suggest using the type-class pattern for this; it's much more extensible than adding a random-generating apply method to the companion object: trait RandomGenerator[T] { def gen: T } implicit object RandomLetter ...

Just try: outer(a,b,"==")+0 # [,1] [,2] [,3] [,4] [,5] #[1,] 1 0 0 0 0 #[2,] 0 1 0 0 0 #[3,] 0 0 1 0 0 If you want row and column names: res<-outer(a,b,"==")+0 dimnames(res)<-list(a,b) EDIT Just a funnier one: `[<-`(matrix(0,nrow=length(a),ncol=length(b)), cbind(seq_along(a),match(a,b)), 1) ...

If ls1 and ls2 have equal length: lapply( seq_along(ls1), function(i) { rbind.fill.matrix(ls1[[i]], ls2[[i]]) } ) Result: # [[1]] # A B C D E F G H I J W X Y Z # [1,] 0 1 1 0 0 0 0 1 1 0 NA NA NA NA #...

I am not clear on why, but it seems the problem is that you are returning a series. This seems to work in your given example: def make_mask(s): if s.unique().shape[0] == 2: # If binary, return all-false mask return np.zeros(s.shape[0], dtype=bool) else: # Otherwise, identify outliers return s >= np.percentile(s,...
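A minimal runnable sketch of the same mask idea, with made-up column names and a hypothetical 95th-percentile cutoff standing in for the truncated part of the answer:

```python
import numpy as np
import pandas as pd

def make_mask(s):
    # If the column is binary (exactly two unique values),
    # return an all-False mask; otherwise flag values at or
    # above the 95th percentile as outliers.
    if s.unique().shape[0] == 2:
        return pd.Series(np.zeros(s.shape[0], dtype=bool), index=s.index)
    return s >= np.percentile(s, 95)

df = pd.DataFrame({
    "binary": [0, 1, 0, 1],           # two unique values -> all False
    "metric": [1.0, 2.0, 3.0, 100.0]  # only 100.0 clears the cutoff
})
mask = df.apply(make_mask)
print(mask["binary"].any())  # False
print(mask["metric"].sum())  # 1
```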


Thanks, DSM, for pointing that out. Lesson learned: pandas is not good for arbitrary Python objects. So this is what I wound up doing: temp = zip(list(data.geom), list(data.address)) output = map(lambda x: {'geometry': x[0], 'properties':{'address':x[1]}}, temp) ...
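The same zip/map construction with stand-in data (the geometries and addresses here are made up; a list comprehension is the more idiomatic spelling, and it also works unchanged in Python 3, where map returns an iterator):

```python
# Hypothetical stand-ins for data.geom and data.address.
geoms = [{"type": "Point", "coordinates": [0, 0]},
         {"type": "Point", "coordinates": [1, 1]}]
addresses = ["10 Main St", "22 Oak Ave"]

# Pair each geometry with its address and build the feature dicts.
features = [{"geometry": g, "properties": {"address": a}}
            for g, a in zip(geoms, addresses)]
print(features[0]["properties"]["address"])  # 10 Main St
```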

Not sure why you used sqldf, see this example: #dummy data set.seed(12) datwe <- data.frame(replicate(37,sample(c(1,2,99),10,rep=TRUE))) #convert to Yes/No res <- as.data.frame( sapply(datwe[,23:37], function(i) ifelse(i==1, "Yes", ifelse(i==2, "No", ifelse(i==99,NA,"Name itttt"))))) #update dataframe datwe <- cbind(datwe[, 1:22],res) #output, just showing first 2 columns datwe[,23:24] # X23 X24 # 1 No Yes #...

We could split the 'bar' by 'column' (col(bar)) and with mapply we can apply 'foo' for the corresponding 'a', 'b', 'c' values to each column of 'bar' mapply(foo, split(bar, col(bar)), a, b, c) Or without using apply ind <- col(bar) (a[ind]*bar +b[ind])^c[ind] ...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

You could do this with dplyr like this: library(dplyr) help %>% group_by(deid) %>% mutate(epi = cumsum(ifelse(days.since.last>90,1,0))+1) deid session.number days.since.last epi 1 1 1 0 1 2 1 2 7 1 3 1 3 12 1 4 5 1 0 1 5 5 2 7 1 6 5 3 14 1...

Here is an implementation using while, although it takes much longer than the nested for loops, which is a bit counterintuitive. f1 <- function() { n <- 1500 d <- 250 f = runif(n,1,5) f = embed(f, d) f = f[-(n-d+1),] count = rep(0, n-d) for(i in 1:(n-d)) {...

If I understand correctly you want to evaluate the first expression with the first value of x, the second with the second etc. You could do: mapply(function(ex, x) eval(ex, envir = list(x = x)), funs.list[1:2], c(7, 60)) ...


Try using argument unpacking. self["commands"][values[0]](*values[1:]) ...
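A self-contained sketch of that dispatch-with-unpacking pattern (the command table and values list are made up for illustration):

```python
def move(x, y):
    return (x, y)

# A command name followed by its arguments, as in the question.
commands = {"move": move}
values = ["move", 3, 4]

# *values[1:] unpacks the remaining list items as positional arguments.
result = commands[values[0]](*values[1:])
print(result)  # (3, 4)
```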


You can try outer f1 <- function(x,y) x^2+x^y-3 outer(1:5, 12:16, f1) which would be similar to t(Vectorize(function(x) f1(x,12:16))(1:5)) ...

You can try with(tradeData, ave(AL, Login, FUN=function(x) -1*c(0, diff(x)))) #[1] 0 0 0 1 0 0 0 -1 0 1 -1 0 1 0 1 0 0 -1 0 0 1 0 -1 Or an option using data.table. Convert the "data.frame" to "data.table" with setDT. Take the difference between current...

@alexis_laz answered the question (Thanks!) by linking to this. I'm posting it here since it was mentioned in the comments section.

The following preserves the original data structure. Is it what's looked for? df = data.frame('test'=c(0,0,1,0)) df[] <- apply(df,2,function(j){sub(0,'00',j)}) df[] <- apply(df,2,function(j){sub(1,'01',j)}) df[] <- apply(df,2,function(j){sub(2,'10',j)}) df # test # 1 00 # 2 00 # 3 01 # 4 00 df1 = t(data.frame('test'=c(0,0,1,0))) df1[] <- apply(df1,2,function(j){sub(0,'00',j)}) df1[] <- apply(df1,2,function(j){sub(1,'01',j)}) df1[] <-...

You could try using apply with MARGIN=2 to loop over the columns of m. The code below is similar to the one you used for "m.low", except that it uses the replace function to replace the elements in each column based on the condition argument i < sort(i).. to...

You can split rs by days, apply aggregatets on them and rbind l <- split.xts(rs, f="days") ts <- do.call("rbind", lapply(l, function(x){ aggregatets(x ,on="minutes", k=15)})) ...

One option would be to compare with equally sized elements. For this we can replicate the elements in 'nv', each by the number of rows of 'df' (rep(nv, each=nrow(df))), and compare with df, or use the col function, which produces output similar to rep. which(df > nv[col(df)], arr.ind=TRUE) If you need...

If I understand what you want correctly it's just a matter of making sure your function returns a vector of values rather than a data.frame object. I think this function will do what you want when run through the mutate() step: idw_w=function(x,y,z){ geog2 <- data.frame(x,y,z) coordinates(geog2) = ~x+y geog.grd <-...

How about dd <- as.data.frame(mat) dd[sapply(dd,function(x) all(x>=0))] ? sapply(...) returns a logical vector (in this case TRUE TRUE FALSE TRUE) that states whether the columns have all non-negative values. When used with a data frame (not a matrix), single-bracket indexing with a logical vector treats the data frame as a...

You can improve the speed of your function by using data.table. However, you would still have to use for loops (which is not a bad thing). library(data.table) simdiffuse <- function(a, b, c, d) { endo <- 1/a # innovation endogenous effect endomacro <- 1/b # category endogenous effect appeal <-...


Here is a way in dplyr to replicate the same process that you showed with base R library(dplyr) fake.dat %>% summarise_each(funs(sum(.[(which.max(.)+1):n()]>5, na.rm=TRUE))) # samp1 samp2 samp3 samp4 #1 2 2 4 3 If you need it as two steps: datNA <- fake.dat %>% mutate_each(funs(replace(., seq_len(which.max(.)), NA))) datNA %>% summarise_each(funs(sum(.>5, na.rm=TRUE)))...

First of all, your return statement should really give you an error. You probably mean containedin <- function(t1,t2){ length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) } Anyway, you can use mapply to solve your problem. mapply(containedin, as.character(data.selected[, 'keywords']), as.character(data.selected[, 'title'])) The as.character is only necessary if class(data.selected[, 'keywords']) is factor (instead of character)...

You can compute the mean by group using ave(). Assuming your data frame is called df, you can do the following: df$Mean <- with(df, ave(Value, ID, FUN=mean)) This adds Mean as another column in your data frame....

After some trial and error, I found a solution. In order to make comb_apply work, I needed to unname each exp value before using it. Here is the code: comb_apply <- function(f,...){ exp <- expand.grid(...,stringsAsFactors = FALSE) apply(exp,1,function(x) do.call(f,unname(x))) } Now, executing str(comb_apply(testFunc,l1,l2)) I get the desired result,...


You can accomplish the same thing by passing the function directly into the apply apply(test, 1, function(x) if(x[1] > 0) sum(x) else x[1] - x[2] - x[3]) [1] 4 7 10 If you want to use your UDF you need to modify it. testfn = function(mydf){ if(mydf[1] > 0){y =...

You can try do.call(`c`,apply(splitData, 1, function(x) list(test[x,]))) Or lapply(seq_len(nrow(splitData)), function(i) test[unlist(splitData[i,]),]) From ?apply If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’ returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’. If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length 1...

I think you want to use outer() and take advantage of lexical scoping so that you don't have to pass myData to the function being called with the longitude and latitude: myData <- read.table(...) # or whatever outer(seq.int(dim(mydata)[1]), seq.int(dim(mydata)[2]), function(longitude,latitude){ do things that depend on myData[longitude,latitude,] }) ...

One value in 'start' was '0', so I changed it to '1', then created a matrix ('m1') of 1000 columns and 6 rows (the number of unique elements in the 'id' column). Using Map, created a sequence for each 'start', 'end' value; the output is a list ('lst'). We rbind the 'lst' ('d2'),...

Here's another way with a more traditional loop: for (i in 2:length(log_return)) { assign(names(log_return[i]), xts(log_return[i], log_return$Date)) } This will create an xts object for each column name in the data.frame -- that is, an xts object named AUS.Yield, BRA.Yield, etc......

Put the if block in a function: plotGG <- function(i,j) { if (i != j) { ... } else{ ... } } Then call it: mapply(plotGG,8:11,8:11) And it works. Your code will not work due to a scoping issue with ggplot. But you can view the solution here: Local Variables...

I think this is what you're looking for. The easiest way to refer to columns of a data frame functionally is to use quoted column names. In principle, what you're doing is this data[, "weight"] / data[, "height"]^2 but inside a function you might want to let the user specify...


I believe dplyr can help you here. library(dplyr) dfData <- data.frame(ID = c(1, 2, 3, 4, 5), DistA = c(100, 239, 392, 700, 770), DistB = c(200, 390, 550, 760, 900)) dfData <- mutate(dfData, comparison = DistA - lag(DistB)) This results in... dfData ID DistA DistB comparison 1 1 100...

There's no clean and easy functional solution in ES5. Here's the simplest I have: var myary = Array.apply(0,Array(N)).map(function(_,i){return i}); Edit: Be careful that expressions of this kind, while being sometimes convenient, can be very slow. This commit I made was motivated by performance issues. An old style and boring for...

apply returns an array, and thus your output. Convert it to a data.frame and you'll be fine: #example data df <- data.frame(a=rep('[1, 4)',50) ) > df a 1 [1, 4) 2 [1, 4) 3 [1, 4) 4 [1, 4) 5 [1, 4) 6 [1, 4) 7 [1, 4) 8...

I would use outer instead of *apply: res <- outer( 1:nrow(VectIndVar), 1:nrow(VectClasses), Vectorize(function(i,k) sum(VectIndVar[i,-1]==VectClasses[k,-1])) ) (Thanks to this Q&A for clarifying that Vectorize is needed.) This gives > head(res) # with set.seed(1) before creating the data [,1] [,2] [,3] [,4] [1,] 1 1 2 1 [2,] 0 0 1 0...

You were close, but need to do mean(x[x > 150]) rather than mean(x > 150): test<- apply(Example,2,function(x) {mean(x[x > 150])}) This works because x[x > 150] says "take all values of x where x is above 150"....

Using a matrix. Using a matrix operation on a matrix is not slow: mat <- t(as.matrix(dt0[,-1,with=FALSE])) colnames(mat) <- dt0[[1]] mat[] <- na.spline(mat,na.rm=FALSE) which gives TOTAL,F,AD TOTAL,F,AL TOTAL,F,AM TOTAL,F,AT TOTAL,F,AZ 2014 32832 1409931 1692440 4351253 4755163 2013 37408 1409931 1688458 4328238 4707690 2012 38252 1409931 1684000 4309977 4651601 2011 38252 1409931...


Generally, a pandas DataFrame is good if you want to iterate over rows and treat each row as a vector. I would suggest that you use a 2-dimensional numpy array. Once you have the array, you can iterate over its rows and columns very easily. Here is the sample code: `for index,...
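A short sketch of iterating the rows and columns of a 2-D numpy array (toy data; the original sample code is truncated above):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Iterating a 2-D array yields its rows; each row is a 1-D vector.
row_sums = [int(row.sum()) for row in arr]
print(row_sums)  # [6, 15]

# Columns are just as easy via the transpose.
col_sums = [int(col.sum()) for col in arr.T]
print(col_sums)  # [5, 7, 9]
```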


This produces your desired output and should be quite a bit faster than your initial approach with for-loops and if .. else .. statements: library(dplyr) dataset %>% group_by(ParticleId) %>% mutate(Volume = Volume[1L] - cumsum(lag(reduction, default = 0L)*flag)) #Source: local data frame [20 x 5] #Groups: ParticleId # # X1.20 ParticleId...

apply takes matrix arguments. data.frames will be coerced to matrix before anything else is done. Hence the conversion of everything to the same type (character). lapply takes list arguments. Therefore it coerces the data.frame to a list and does not have to convert the arguments. ...

I believe you want to use apply rather than lapply, which applies a function to a list. Try this: Null_Counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "NA" | x == "-999" | x == "0"))/length(x)) Null_Name <- colnames(indata)[Null_Counter >= 0.3] ...

Why reinvent the wheel? You have several library packages to choose from with functions that return a character matrix with one column for each capturing group in your pattern. stri_match_all_regex — stringi x <- c('[hg19:21:34809787-34809808:+]', '[hg19:11:105851118-105851139:+]', '[hg19:17:7482245-7482266:+]', '[hg19:6:19839915-19839936:+]') do.call(rbind, stri_match_all_regex(x, '\\[[^:]+:(\\d+):(\\d+)-(\\d+):([-+])]')) # [,1] [,2] [,3] [,4] [,5] # [1,] "[hg19:21:34809787-34809808:+]"...

Some example data: import numpy as np import pandas as pd my_target = 25 df = pd.DataFrame({'column1': np.random.normal(25, 3, 20), 'weight_column': np.random.random_integers(1, 10, 20)}) df Out[4]: column1 weight_column 0 23.147356 6 1 24.361162 5 2 25.665186 4 3 20.059039 1 4 28.573390 5 5 26.543743 1 6 23.177928 2 #...
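The excerpt cuts off before the actual calculation. If the goal is a weighted mean of column1 (a sketch of one common approach, not necessarily what the original answer went on to do), numpy's average handles the weighting directly:

```python
import numpy as np
import pandas as pd

# Small fixed data instead of the random sample above.
df = pd.DataFrame({"column1": [24.0, 26.0, 25.0],
                   "weight_column": [1, 1, 2]})

# Weighted mean: sum(value * weight) / sum(weight).
weighted_mean = np.average(df["column1"], weights=df["weight_column"])
print(weighted_mean)  # 25.0
```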

Just apply the same procedure over a list: out_list <- lapply(lst, function(x) { lapply(x, fevd,type="GEV",method = c("MLE"))# fit GEV to each column }) You can find the model for the 2nd data frame and 3rd column like this: out_list[[2]][[3]] I'm not sure what exactly to average. If you want average values per...

This is ideal for ?rowsum, which should be fast Using RStudent's data rowsum(m, rep(1:3, each=5), na.rm=TRUE) The second argument, group, defines the rows which to apply the sum over. More generally, the group argument could be defined rep(1:nrow(m), each=5, length=nrow(m)) (sub nrow with length if applying over a vector)...


list_dat is not a list, it is an array of lists. Your definition of min.RSS defines data as its argument, but then refers to list # You don't really need to preallocate the list, but if you insist list_dat <- vector(length=2, mode='list') list_dat[[1]] =data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,6,8,12,15,19)) list_dat[[2]] =data.frame(x=c(1,2,3,4,5,6), y=c(1,3,5,6,8,12)) min.RSS...

You are almost there, as the error says, you just need to define a function in apply: apply(df, 2, function(u) table(factor(u, levels=vec))) # V1 V2 V3 #x 2 1 0 #y 1 1 1 #z 0 1 2 You can also use lapply function which iterates over the columns of...

Here's a try. Turn the outliers data frame into a named vector: out <- outliers$outlier names(out) <- outliers$subject Then use it as a lookup table to select all the rows of data where the RT column is less than the outlier value for the subject: data[data$RT < out[as.character(data$subject)], ] The...

Since mapply uses the ellipsis ... to pass vectors (atomic or lists), and not a named argument (X) as in sapply, lapply, etc., you don't need to name the parameter X = trees if you use mapply instead of sapply: funs <- list(sd = sd, mean = mean) x...

This is easier than you're making it # Which are the rows with bad values for mm? Create an indexing vector: bad_mm <- is.na(zooplankton$length_mm) # Now, for those rows, replace length_mm with length_units/10 zooplankton$length_mm[bad_mm] <- zooplankton$length_units[bad_mm]/10 Remember to use is.na(x) instead of x==NA when checking for NA vals. Why? Take...


Try this { "query": { "filtered": { "query": { "match_all": {} }, "filter": { "bool": { "must": [ { "range": { "date": { "from": "2015-06-01", "to": "2015-06-30" } } }, { "bool": { "should": [ { "missing": { "field": "groups" } }, { "bool": { "must": { "term": { "groups.sex":...


I don't know how this will be helpful in a linear regression but you could do something like that: df <- read.table(header=T, text="Assay Sample Dilution meanresp number 1 S 0.25 68.55 1 1 S 0.50 54.35 2 1 S 1.00 44.75 3") Using lapply: > lapply(2:nrow(df), function(x) df[(x-1):x,] ) [[1]]...

There are a number of things that I don't think you're entirely understanding about how all these elements of Scheme fit together. First of all, the term "tuple" is a little ambiguous. Scheme does not have any formal tuple type—it has pairs and lists. Lists are themselves built from pairs,...

I think I understand what you're after. This is actually slightly more complex than it may seem, because months are not regular periods of time; they vary in number of days, and February varies between years due to leap years. Thus a simple regular logical or numeric index vector will...

You could use ave from base R test$meanbyname <- with(test, ave(value, name)) Or using mutate from dplyr or := in data.table, can get the results i.e. library(dplyr) group_by(test, name) %>% mutate(meanbyname=mean(value)) Or library(data.table) setDT(test)[, meanbyname:= mean(value), by=name] ...

Scala starts to look for implicit conversions only when it can't find an existing method with the required signature. But in the Try companion object there is already a suitable method, def apply[T](r: ⇒ T): Try[T], so Scala infers T in apply as Future[Something] and doesn't check for implicit conversions. Also,...


Let us name the anonymous function in the question as follows. Then the Map statement at the end applies aggregate to df[1:3] separately by each grouping variable: mean.sd.n <- function(x) c(m = mean(x, na.rm=T), sd = sd(x, na.rm=T), n = length(x)) Map(function(nm) aggregate(df[1:3], df[nm], mean.sd.n), names(df)[4:6]) giving: $g1 g1 s1.m...


This is my implementation of your dist.JSD_2 dist0 <- function(m) { ncol <- ncol(m) result <- matrix(0, ncol, ncol) for (i in 2:ncol) { for (j in 1:(i-1)) { x <- m[,i]; y <- m[,j] result[i, j] <- sqrt(0.5 * (sum(x * log(x / ((x + y) / 2))) +...

you could also use cut as in: cut(unclass(x)$hour-7,c(0,15,24)-8,c('night','morning')) (note that you have to shift your frame of reference so that you don't have two 'night' categories with this solution)...

You can use rolling join from data.table package library(data.table) setkey(setDT(df), x) df1 <- data.table(x=a, id1=1:length(a)) setkey(df1, x) df1[df, roll="nearest"] id1 column will give you the desired result....

Rescaled pop2010 in order to avoid integer overflow. with(county, tapply((pop2010/10000)*per_capita_income, state, function(x) x/length(x))) answer posted by jbaums...

As mentioned by @DavidArenburg, there are better ways to do this. If you are really after factors, then you can do as @David recommended: df[] <- lapply(df, factor, levels = levels, labels = labels) The [] preserves the structure of the input while assigning the value returned from the function/s...

You have called the argument costs and not cost. Here's an example using the sample data in ?svm so you can try this: model <- svm(Species ~ ., data = iris, cost=.6) model$cost # [1] 0.6 model <- svm(Species ~ ., data = iris, costs=.6) model$cost # [1] 1 R...

There is a function seq.Date in the base package that will allow you to make a sequence for a Date object. But a matrix will still only take atomic vectors, so you will either just have to call as.Date() again whenever you need to use the Date, or just store...

Using sapply over the number of rows (essentially just hiding the for loop) gives you what you want: values = sapply(1:nrow(true), function(i) cut(true[i,], br[i,], labels=FALSE, include.lowest=TRUE)) values = t(values) Unfortunately we need an extra transpose step to get the matrix the right way around. Regarding the for loop in your question, when...

Aha! Seconds after posting this I arrived at the answer: I needed to include the parens on the function call, i.e. methodReturnsArray()(0): scala> methodReturnsArray()(0) res22: Double = 1.0 ...

You can use mapply: mapply(FUN= distancePointSegment, point_coords[1,], point_coords[2,], MoreArgs = list(x1=x1, x2=x2, y1=y1, y2=y2)) Or change your function and use apply: # Function that I want to apply: distancePointSegment <- function(p, x1, y1, x2, y2) { px <- p[1] #the coordinates are passed as a vector to the function py...

Try this do.call(rbind.data.frame, lapply(1:length(List), function(i) cbind(List[[i]][[8]]))) ...

Just use window functions for these calculations: SELECT DISTINCT tmp.Arrival, tmp.Flight, COUNT(*) OVER (PARTITION BY Flight) as NumPassengers, SUM(CASE WHEN SegmentNumber = 1 AND LegNumber = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY Flight, Arrival) as NumLocalPassengers, STD, STA FROM #TempLocalOrg tmp; ...


Here's an attempt, but on a dataframe instead of a matrix: df <- data.frame(replicate(100,sample(1:10, 10e4, rep=TRUE))) I tried a dplyr approach: library(dplyr) df %>% mutate(rs = rowSums(.)) %>% mutate_each(funs(. / rs), -rs) %>% select(-rs) Here are the results: library(microbenchmark) mbm = microbenchmark( dplyr = df %>% mutate(rs = rowSums(.)) %>%...


You could use pd.rolling_apply: import numpy as np import pandas as pd df = pd.read_table('data', sep='\s+') def foo(x, df): window = df.iloc[x] # print(window) c = df.ix[int(x[-1]), 'c'] dvals = window['a'] + window['b']*c return bar(dvals) def bar(dvals): # print(dvals) return dvals.mean() df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,)) print(df) yields a...
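Note that pd.rolling_apply comes from older pandas and has since been removed in favor of the .rolling(...).apply(...) method. A minimal sketch of the modern equivalent on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]})

# pd.rolling_apply(series, window, func) is now spelled
# series.rolling(window).apply(func); raw=True passes plain
# ndarrays to func instead of Series.
df["mean3"] = df["a"].rolling(window=3).apply(np.mean, raw=True)
print(df["mean3"].tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```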

In [22]: pd.set_option('max_rows',20) In [33]: N = 10000000 In [34]: df = DataFrame({'A' : np.random.randint(0,100,size=N), 'B' : np.random.randint(0,100,size=N)}) In [35]: df[df.groupby('A')['B'].transform('max') == df['B']] Out[35]: A B 161 30 99 178 53 99 264 58 99 337 96 99 411 44 99 428 85 99 500 84 99 598 98 99...
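The transform('max') trick is easier to see on a tiny frame; a self-contained sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2],
                   "B": [5, 9, 3, 3]})

# transform('max') broadcasts each group's maximum back to every
# row of the group, so the comparison keeps the rows where B
# equals its group-wise maximum (ties are all kept).
out = df[df.groupby("A")["B"].transform("max") == df["B"]]
print(out["B"].tolist())  # [9, 3, 3]
```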

Use mapply: Airlines$Tref <- mapply( FUN = FUN_Tref, Airlines$AC_MODEL, Airlines$CalcAlt) # AC_MODEL CalcAlt Tref #1 320-232 200 30.76923 #2 321-231 200 14.76000 #3 320-232 400 30.53846 #4 321-231 400 14.52000 #5 320-232 600 30.30769 #6 321-231 600 14.28000 #7 320-232 800 30.07692 #8 321-231 800 14.04000 #9 320-232 1000 29.84615...


x <- sample.df[ lapply( sample.df, function(x) sum(is.na(x)) / length(x) ) < 0.1 ] ...


To make my remarks in comment column clear, suppose we have dfmat as a list of matrices. It is almost always easier to work with a list of matrices than one big named matrix. Also if you want to fully vectorize the solution given here, you might want to obtain...

There is no generic way to write a function which will seamlessly handle both DataFrames and Series. You would either need to use an if-statement to check the type, or use try..except to handle exceptions. Instead of doing either of those things, I think it is better to make sure...
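A sketch of the explicit type-check option mentioned above (the function and data are hypothetical):

```python
import pandas as pd

def handle(obj):
    # One explicit code path per type, instead of a single body
    # that silently behaves differently for DataFrame vs Series.
    if isinstance(obj, pd.DataFrame):
        return obj.sum().sum()   # grand total over all columns
    if isinstance(obj, pd.Series):
        return obj.sum()
    raise TypeError("expected a pandas DataFrame or Series")

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": s, "b": s * 2})
print(handle(s))   # 6
print(handle(df))  # 18
```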


You can try to use a self-defined function in aggregate sum1sttwo<-function (x){ return(x[1]+x[2]) } aggregate(count~id+group, data=df,sum1sttwo) and the output is: id group count 1 2 A 14 2 8 A 11 3 10 B 12 4 11 B 11 5 16 C 8 6 18 C 7 04/2015 edit: dplyr...


You want to check if any of the variables in a row are 0, so you need to use any(x==0) instead of x == 0 in the ifelse statement: apply(data, 1, function(x) {ifelse(any(x == 0), NA, length(unique(x)))}) # [1] 1 NA 2 Basically ifelse returns a vector of length n...