python,function,arguments,apply

Try using argument unpacking. self["commands"][values[0]](*values[1:]) ...
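As a minimal sketch of the unpacking idea above (the `commands` table, the `dispatch` helper and the two sample functions are hypothetical names chosen for illustration, not taken from the question), the first item of a list selects a callable and the remaining items are spread as positional arguments:

```python
# Hypothetical command table: names mapped to plain functions.
def add(a, b):
    return a + b


def greet(name):
    return "hello " + name


commands = {"add": add, "greet": greet}


def dispatch(values):
    # values[0] picks the function; *values[1:] unpacks the rest of the
    # list into separate positional arguments for that function.
    return commands[values[0]](*values[1:])


print(dispatch(["add", 2, 3]))
print(dispatch(["greet", "world"]))
```

The same call works for functions with different numbers of parameters, as long as the list carries exactly as many extra items as the chosen function expects.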

If ls1 and ls2 have equal length: lapply( seq_along(ls1), function(i) { rbind.fill.matrix(ls1[[i]], ls2[[i]]) } ) Result: # [[1]] # A B C D E F G H I J W X Y Z # [1,] 0 1 1 0 0 0 0 1 1 0 NA NA NA NA #...

By using anonymous functions, we are returning only the value of that function, and not the value of 'x'. We have to specify return(x) or simply x. lapply(lst, function(x) { length(x) <- max(lengths(lst)) x}) #$a #[1] 1 NA #$b #[1] 2 3 ...

Just use window functions for these calculations: SELECT DISTINCT tmp.Arrival, tmp.Flight, COUNT(*) OVER (PARTITION BY Flight) as NumPassengers, SUM(CASE WHEN SegmentNumber = 1 AND LegNumber = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY Flight, Arrival) as NumLocalPassengers, STD, STA FROM #TempLocalOrg tmp; ...

You can improve the speed of your function by using data.table. However, you would still have to use for loops (which is not a bad thing). library(data.table) simdiffuse <- function(a, b, c, d) { endo <- 1/a # innovation endogenous effect endomacro <- 1/b # category endogenous effect appeal <-...

I think I understand what you're after. This is actually slightly more complex than it may seem, because months are not regular periods of time; they vary in number of days, and February varies between years due to leap years. Thus a simple regular logical or numeric index vector will...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

Using rowsum seems to be faster (at least for this small example dataset) than the data.table approach: sgibb <- function(datframe) { data.frame(Group = unique(datframe$Group), Avg = rowsum(datframe$Weighted_Value, datframe$Group)/rowsum(datframe$SumVal, datframe$Group)) } Adding the rowsum approach to @platfort's benchmark: library(microbenchmark) library(dplyr) library(data.table) microbenchmark( Nader = df %>% group_by(Group) %>% summarise(res = sum(Weighted_Value)...

r,aggregate,nested-loops,apply,summary

Let us name the anonymous function in the question as follows. Then the Map statement at the end applies aggregate to df[1:3] separately by each grouping variable: mean.sd.n <- function(x) c(m = mean(x, na.rm=T), sd = sd(x, na.rm=T), n = length(x)) Map(function(nm) aggregate(df[1:3], df[nm], mean.sd.n), names(df)[4:6]) giving: $g1 g1 s1.m...

You can use the negated rowSums() for the subset df[!rowSums(df[-1] > 0.7), ] # zips ABC DEF GHI JKL # 4 4 0.6 0.4 0.2 0.3 # 6 6 0.2 0.7 0.3 0.4 df[-1] > 0.7 gives us a logical matrix telling us which df[-1] are greater than 0.7 rowSums()...

You can try sapply(data.matrix, function(x) min(x$P)) If the min values should replace the P column lapply(data.matrix, function(x) {x$P <- min(x$P);x}) ...

As mentioned by @DavidArenburg, there are better ways to do this. If you are really after factors, then you can do as @David recommended: df[] <- lapply(df, factor, levels = levels, labels = labels) The [] preserves the structure of the input while assigning the value returned from the function/s...

You could use ave from base R test$meanbyname <- with(test, ave(value, name)) Or using mutate from dplyr or := in data.table, can get the results i.e. library(dplyr) group_by(test, name) %>% mutate(meanbyname=mean(value)) Or library(data.table) setDT(test)[, meanbyname:= mean(value), by=name] ...

Use mapply: Airlines$Tref <- mapply( FUN = FUN_Tref, Airlines$AC_MODEL, Airlines$CalcAlt) # AC_MODEL CalcAlt Tref #1 320-232 200 30.76923 #2 321-231 200 14.76000 #3 320-232 400 30.53846 #4 321-231 400 14.52000 #5 320-232 600 30.30769 #6 321-231 600 14.28000 #7 320-232 800 30.07692 #8 321-231 800 14.04000 #9 320-232 1000 29.84615...

r,performance,algorithm,matrix,apply

This is my implementation of your dist.JSD_2 dist0 <- function(m) { ncol <- ncol(m) result <- matrix(0, ncol, ncol) for (i in 2:ncol) { for (j in 1:(i-1)) { x <- m[,i]; y <- m[,j] result[i, j] <- sqrt(0.5 * (sum(x * log(x / ((x + y) / 2))) +...

r,function,vectorization,apply,mapply

You can try outer f1 <- function(x,y) x^2+x^y-3 outer(1:5, 12:16, f1) which would be similar to t(Vectorize(function(x) f1(x,12:16))(1:5)) ...

r,loops,matrix,vectorization,apply

To make my remarks in comment column clear, suppose we have dfmat as a list of matrices. It is almost always easier to work with a list of matrices than one big named matrix. Also if you want to fully vectorize the solution given here, you might want to obtain...

@alexis_laz answered the question (Thanks!) by linking to this. I'm posting it here since it was mentioned in the comments section.

Using sapply over the number of rows (essentially just hiding the for loop) gives you what you want: values = sapply(1:nrow(true), function(i) cut(true[i,], br[i,], labels=FALSE, include.lowest=TRUE)) values = t(values) Unfortunately we need an extra transpose step to get the matrix the correct way. Regarding your for loop in your question, when...

Try: integers <- as.data.table(apply(dt, 1, function(x) as.integer(substr(x, 50, 51)))) The apply family of functions accepts other functions and executes them over vectors and arrays. These functions are sometimes already defined, but an interesting feature was added to the apply functions: you can write the function right there at the line...

r,matrix,apply,matrix-multiplication

t3 <- apply(t2, 2, function(v) v/max(v)) or for (i in 1:ncol(t2)) t2[,i] <- t2[,i]/t2[i,i] I'm assuming you want the asymmetric matrix, i.e. percentage of people who purchased product X who also purchased product Y (which is different from percentage of people who purchased product Y who also purchased product X)....

You could try using the apply with "MARGIN=2" to loop over the columns of m. The below code is similar to the one you used for "m.low" except that it is using replace function to replace the elements in each column based on the condition argument i < sort(i).. to...

Just apply the same procedure over a list: out_list <- lapply(lst, function(x) { lapply(x, fevd,type="GEV",method = c("MLE"))# fit GEV to each column }) You can find the model of 2nd df and 3rd column like this: out_list[[2]][[3]] I'm not sure what exactly to average. If you want average values per...

Using a matrix. Using a matrix operation on a matrix is not slow: mat <- t(as.matrix(dt0[,-1,with=FALSE])) colnames(mat) <- dt0[[1]] mat[] <- na.spline(mat,na.rm=FALSE) which gives TOTAL,F,AD TOTAL,F,AL TOTAL,F,AM TOTAL,F,AT TOTAL,F,AZ 2014 32832 1409931 1692440 4351253 4755163 2013 37408 1409931 1688458 4328238 4707690 2012 38252 1409931 1684000 4309977 4651601 2011 38252 1409931...

This is easier than you're making it # Which are the rows with bad values for mm? Create an indexing vector: bad_mm <- is.na(zooplankton$length_mm) # Now, for those rows, replace length_mm with length_units/10 zooplankton$length_mm[bad_mm] <- zooplankton$length_units[bad_mm]/10 Remember to use is.na(x) instead of x==NA when checking for NA vals. Why? Take...

r,if-statement,nested,data.frame,apply

You want to check if any of the variables in a row are 0, so you need to use any(x==0) instead of x == 0 in the ifelse statement: apply(data, 1, function(x) {ifelse(any(x == 0), NA, length(unique(x)))}) # [1] 1 NA 2 Basically ifelse returns a vector of length n...

Rescaled pop2010 in order to avoid integer overflow. with(county, tapply((pop2010/10000)*per_capita_income, state, function(x) x/length(x))) answer posted by jbaums...

Some example data: import numpy as np import pandas as pd my_target = 25 df = pd.DataFrame({'column1': np.random.normal(25, 3, 20), 'weight_column': np.random.random_integers(1, 10, 20)}) df Out[4]: column1 weight_column 0 23.147356 6 1 24.361162 5 2 25.665186 4 3 20.059039 1 4 28.573390 5 5 26.543743 1 6 23.177928 2 #...

I think you want to use outer() and take advantage of lexical scoping so that you don't have to pass myData to the function being called with the longitude and latitude: myData <- read.table(...) # or whatever outer(seq.int(dim(myData)[1]), seq.int(dim(myData)[2]), function(longitude,latitude){ do things that depend on myData[longitude,latitude,] }) ...

The answer to your question is no: you can't distinguish between Letter.type and Letter.apply, so you need a workaround. I would suggest using the type-class pattern for it; it's much more extensible than adding a random-generating apply method to the companion object: trait RandomGenerator[T] { def gen: T } implicit object RandomLetter...

r,user-defined-functions,apply,udf,multiple-arguments

You can accomplish the same thing by passing the function directly into the apply apply(test, 1, function(x) if(x[1] > 0) sum(x) else x[1] - x[2] - x[3]) [1] 4 7 10 If you want to use your UDF you need to modify it. testfn = function(mydf){ if(mydf[1] > 0){y =...

r,performance,if-statement,for-loop,apply

This produces your desired output and should be quite a bit faster than your initial approach with for-loops and if .. else .. statements: library(dplyr) dataset %>% group_by(ParticleId) %>% mutate(Volume = Volume[1L] - cumsum(lag(reduction, default = 0L)*flag)) #Source: local data frame [20 x 5] #Groups: ParticleId # # X1.20 ParticleId...

You have called the argument costs and not cost. Here's an example using the sample data in ?svm so you can try this: model <- svm(Species ~ ., data = iris, cost=.6) model$cost # [1] 0.6 model <- svm(Species ~ ., data = iris, costs=.6) model$cost # [1] 1 R...

Here is an implementation using while, although it is taking much longer than nested for loops which is a bit counter intuitive. f1 <- function() { n <- 1500 d <- 250 f = runif(n,1,5) f = embed(f, d) f = f[-(n-d+1),] count = rep(0, n-d) for(i in 1:(n-d)) {...

Here's a try. Turn the outliers data frame into a named vector: out <- outliers$outlier names(out) <- outliers$subject Then use it as a lookup table to select all the rows of data where the RT column is less than the outlier value for the subject: data[data$RT < out[as.character(data$subject)], ] The...

How about dd <- as.data.frame(mat) dd[sapply(dd,function(x) all(x>=0))] ? sapply(...) returns a logical vector (in this case TRUE TRUE FALSE TRUE) that states whether the columns have all non-negative values. When used with a data frame (not a matrix), single-bracket indexing with a logical vector treats the data frame as a...

There's no clean and easy functional solution in ES5. Here's the simplest I have: var myary = Array.apply(0,Array(N)).map(function(_,i){return i}); Edit: Be careful that expressions of this kind, while being sometimes convenient, can be very slow. This commit I made was motivated by performance issues. An old style and boring for...

There is a function seq.Date in the base package that will allow you to make a sequence for a Date object. But a matrix will still only take atomic vectors, so you will either just have to call as.Date() again whenever you need to use the Date, or just store...

You can use mapply: mapply(FUN= distancePointSegment, point_coords[1,], point_coords[2,], MoreArgs = list(x1=x1, x2=x2, y1=y1, y2=y2)) Or change your function and use apply: # Function that I want to apply: distancePointSegment <- function(p, x1, y1, x2, y2) { px <- p[1] #the coordinates are passed as a vector to the function py...

One value in 'start' was '0'. So, I changed to '1', created a matrix ('m1') of 1000 columns and 6 rows (length of unique elements in the 'id' column). Using Map, created a sequence for each 'start', 'end' value, the output is a list ('lst'). We rbind the 'lst' ('d2'),...

You could do this with dplyr like this: library(dplyr) help %>% group_by(deid) %>% mutate(epi = cumsum(ifelse(days.since.last>90,1,0))+1) deid session.number days.since.last epi 1 1 1 0 1 2 1 2 7 1 3 1 3 12 1 4 5 1 0 1 5 5 2 7 1 6 5 3 14 1...

Put the if block in a function: plotGG <- function(i,j) { if (i != j) { ... } else{ ... } } Then call it: mapply(plotGG,8:11,8:11) And it works. Your code will not work due to a scoping issue with ggplot. But you can view the solution here: Local Variables...

In base R, you could use merge and rowMeans (assuming that the 'score' column is 'numeric'). res <- merge(test1, test2[-1], by='studentName') res # studentName id score.x score.y #1 Alice 1 100 90 #2 Bob 2 98 95 #3 Josh 3 64 80 We are interested in averaging the rows of...

python,pandas,dataframes,apply

Using the Series constructor within the apply usually does the trick: In [11]: df[['new_1','new_2']] = df[['A','B','C']].apply(lambda x: pd.Series([x[1]/2,x[2]*2]), axis=1) In [12]: df Out[12]: A B C new_1 new_2 0 11 21 31 10 62 1 12 22 31 11 62 I see a different error without it (before assignment): In...
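A small sketch of the same idea, assuming a toy frame with columns A, B and C (the helper name `two_values` and the `join` step are illustrative choices, not part of the original answer): when the function passed to `apply(..., axis=1)` returns a `pd.Series`, each entry of that Series becomes its own column in the result.

```python
import pandas as pd

# Toy data with the same column names as the answer above.
df = pd.DataFrame({"A": [11, 12], "B": [21, 22], "C": [31, 31]})


def two_values(row):
    # Returning a Series with named entries makes apply build
    # one output column per name.
    return pd.Series({"new_1": row["B"] / 2, "new_2": row["C"] * 2})


# apply(axis=1) calls two_values once per row and stacks the results.
new_cols = df[["A", "B", "C"]].apply(two_values, axis=1)

# Attach the two computed columns to the original frame by index.
df = df.join(new_cols)
print(df)
```

Naming the entries of the returned Series avoids relying on positional column matching during assignment. Note that `/` is true division in Python 3, so the halves come out as floats.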

r,matrix,probability,apply,frequency-distribution

Here's an attempt, but on a dataframe instead of a matrix: df <- data.frame(replicate(100,sample(1:10, 10e4, rep=TRUE))) I tried a dplyr approach: library(dplyr) df %>% mutate(rs = rowSums(.)) %>% mutate_each(funs(. / rs), -rs) %>% select(-rs) Here are the results: library(microbenchmark) mbm = microbenchmark( dplyr = df %>% mutate(rs = rowSums(.)) %>%...

Aha! Seconds after posting this I arrived at the answer: you need to include the parens on the function call, i.e. methodReturnsArray()(0): scala> methodReturnsArray()(0) res22: Double = 1.0 ...

I am not clear on why, but it seems the problem is that you are returning a series. This seems to work in your given example: def make_mask(s): if s.unique().shape[0] == 2: # If binary, return all-false mask return np.zeros(s.shape[0], dtype=bool) else: # Otherwise, identify outliers return s >= np.percentile(s,...

Here's another way with a more traditional loop: for (i in 2:length(log_return)) { assign(names(log_return[i]), xts(log_return[i], log_return$Date)) } This will create an xts object for each column name in the data.frame -- that is, an xts object named AUS.Yield, BRA.Yield, etc......

You can try do.call(`c`,apply(splitData, 1, function(x) list(test[x,]))) Or lapply(seq_len(nrow(splitData)), function(i) test[unlist(splitData[i,]),]) From ?apply If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’ returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’. If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length 1...

r,apply,mathematical-optimization,sapply

list_dat is not a list, it is an array of lists. Your definition of min.RSS defines data as its argument, but then refers to list # You don't really need to preallocate the list, but if you insist list_dat <- vector(length=2, mode='list') list_dat[[1]] =data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,6,8,12,15,19)) list_dat[[2]] =data.frame(x=c(1,2,3,4,5,6), y=c(1,3,5,6,8,12)) min.RSS...

One option would be to compare with equally sized elements. For this we can replicate the elements in 'nv' each by number of rows of 'df' (rep(nv, each=nrow(df))) and compare with df or use the col function that does similar output as rep. which(df > nv[col(df)], arr.ind=TRUE) If you need...

You are almost there, as the error says, you just need to define a function in apply: apply(df, 2, function(u) table(factor(u, levels=vec))) # V1 V2 V3 #x 2 1 0 #y 1 1 1 #z 0 1 2 You can also use lapply function which iterates over the columns of...

If I understand what you want correctly it's just a matter of making sure your function returns a vector of values rather than a data.frame object. I think this function will do what you want when run through the mutate() step: idw_w=function(x,y,z){ geog2 <- data.frame(x,y,z) coordinates(geog2) = ~x+y geog.grd <-...

You can use rolling join from data.table package library(data.table) setkey(setDT(df), x) df1 <- data.table(x=a, id1=1:length(a)) setkey(df1, x) df1[df, roll="nearest"] id1 column will give you the desired result....

There are a number of things that I don't think you're entirely understanding about how all these elements of Scheme fit together. First of all, the term "tuple" is a little ambiguous. Scheme does not have any formal tuple type—it has pairs and lists. Lists are themselves built from pairs,...

There is no generic way to write a function which will seamlessly handle both DataFrames and Series. You would either need to use an if-statement to check for type, or use try..except to handle exceptions. Instead of doing either of those things, I think it is better to make sure...

Try this: #data df <- read.table(text=" tissueA tissueB tissueC gene1 4.5 6.2 5.8 gene2 3.2 4.7 6.6") #result apply(df,1,function(i){ my.max <- max(i) my.statistic <- (1-log2(i)/log2(my.max)) my.sum <- sum(my.statistic) my.answer <- my.sum/(length(i)-1) my.answer }) #result # gene1 gene2 # 0.1060983 0.2817665 ...

Try mapply(function(x,y) tapply(x,y, FUN=mean) , Example[seq(1, ncol(Example), 2)], Example[seq(2, ncol(Example), 2)]) Or instead of seq(1, ncol(Example), 2) just use c(TRUE, FALSE) and c(FALSE, TRUE) for the second case...

Since mapply use ellipsis ... to pass vectors (atomics or lists) and not a named argument (X) as in sapply, lapply, etc ... you don't need to name the parameter X = trees if you use mapply instead of sapply : funs <- list(sd = sd, mean = mean) x...

Just try: outer(a,b,"==")+0 # [,1] [,2] [,3] [,4] [,5] #[1,] 1 0 0 0 0 #[2,] 0 1 0 0 0 #[3,] 0 0 1 0 0 If you want row and column names: res<-outer(a,b,"==")+0 dimnames(res)<-list(a,b) EDIT Just a funnier one: `[<-`(matrix(0,nrow=length(a),ncol=length(b)), cbind(seq_along(a),match(a,b)), 1) ...

I would use outer instead of *apply: res <- outer( 1:nrow(VectIndVar), 1:nrow(VectClasses), Vectorize(function(i,k) sum(VectIndVar[i,-1]==VectClasses[k,-1])) ) (Thanks to this Q&A for clarifying that Vectorize is needed.) This gives > head(res) # with set.seed(1) before creating the data [,1] [,2] [,3] [,4] [1,] 1 1 2 1 [2,] 0 0 1 0...

This is ideal for ?rowsum, which should be fast Using RStudent's data rowsum(m, rep(1:3, each=5), na.rm=TRUE) The second argument, group, defines the rows which to apply the sum over. More generally, the group argument could be defined rep(1:nrow(m), each=5, length=nrow(m)) (sub nrow with length if applying over a vector)...

If your end goal is to combine them into a single data.table, then in the latest version (1.9.5+) you can do it all in one step: rbindlist(test, idcol = 'Site') # Site x y # 1: a 1.907162564 -1.28512736 # 2: a 1.144876890 0.03482725 # 3: a -0.764530737 1.57029534 #...

If I understand correctly you want to evaluate the first expression with the first value of x, the second with the second etc. You could do: mapply(function(ex, x) eval(ex, envir = list(x = x)), funs.list[1:2], c(7, 60)) ...

The following preserves the original data structure. Is it what's looked for? df = data.frame('test'=c(0,0,1,0)) df[] <- apply(df,2,function(j){sub(0,'00',j)}) df[] <- apply(df,2,function(j){sub(1,'01',j)}) df[] <- apply(df,2,function(j){sub(2,'10',j)}) df # test # 1 00 # 2 00 # 3 01 # 4 00 df1 = t(data.frame('test'=c(0,0,1,0))) df1[] <- apply(df1,2,function(j){sub(0,'00',j)}) df1[] <- apply(df1,2,function(j){sub(1,'01',j)}) df1[] <-...

You can split rs by days, apply aggregatets on them and rbind l <- split.xts(rs, f="days") ts <- do.call("rbind", lapply(l, function(x){ aggregatets(x ,on="minutes", k=15)})) ...

You were close, but need to do mean(x[x > 150]) rather than mean(x > 150): test<- apply(Example,2,function(x) {mean(x[x > 150])}) This works because x[x > 150] says "take all values of x where x is above 150"....

r,loops,apply,subsetting,multiple-conditions

You can try to use a self-defined function in aggregate sum1sttwo<-function (x){ return(x[1]+x[2]) } aggregate(count~id+group, data=df,sum1sttwo) and the output is: id group count 1 2 A 14 2 8 A 11 3 10 B 12 4 11 B 11 5 16 C 8 6 18 C 7 04/2015 edit: dplyr...

I made some minor changes to your function. You should just return the object and save the result of the function rather than using <<- #example data element1 <- c("control", "control", "variation", "variation") element2 <- c("control", "variation", "variation", "control") element3 <- c("variation", "control", "variation", "variation") metric <- c(10,15,20,25) other <-...

elasticsearch,conditional,apply,exists

Try this { "query": { "filtered": { "query": { "match_all": {} }, "filter": { "bool": { "must": [ { "range": { "date": { "from": "2015-06-01", "to": "2015-06-30" } } }, { "bool": { "should": [ { "missing": { "field": "groups" } }, { "bool": { "must": { "term": { "groups.sex":...

Why reinvent the wheel? You have several library packages to choose from with functions that return a character matrix with one column for each capturing group in your pattern. stri_match_all_regex — stringi x <- c('[hg19:21:34809787-34809808:+]', '[hg19:11:105851118-105851139:+]', '[hg19:17:7482245-7482266:+]', '[hg19:6:19839915-19839936:+]') do.call(rbind, stri_match_all_regex(x, '\\[[^:]+:(\\d+):(\\d+)-(\\d+):([-+])]')) # [,1] [,2] [,3] [,4] [,5] # [1,] "[hg19:21:34809787-34809808:+]"...

I think this is what you're looking for. The easiest way to refer to columns of a data frame functionally is to use quoted column names. In principle, what you're doing is this data[, "weight"] / data[, "height"]^2 but inside a function you might want to let the user specify...

We could split the 'bar' by 'column' (col(bar)) and with mapply we can apply 'foo' for the corresponding 'a', 'b', 'c' values to each column of 'bar' mapply(foo, split(bar, col(bar)), a, b, c) Or without using apply ind <- col(bar) (a[ind]*bar +b[ind])^c[ind] ...

You can compute the mean by group using ave(). Assuming your data frame is called df, you can do the following: df$Mean <- with(df, ave(Value, ID, FUN=mean)) This adds Mean as another column in your data frame....

Not sure why you used sqldf, see this example: #dummy data set.seed(12) datwe <- data.frame(replicate(37,sample(c(1,2,99),10,rep=TRUE))) #convert to Yes/No res <- as.data.frame( sapply(datwe[,23:37], function(i) ifelse(i==1, "Yes", ifelse(i==2, "No", ifelse(i==99,NA,"Name itttt"))))) #update dataframe datwe <- cbind(datwe[, 1:22],res) #output, just showing first 2 columns datwe[,23:24] # X23 X24 # 1 No Yes #...

I believe you want to use apply rather than lapply, which applies a function to a list. Try this: Null_Counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "NA" | x == "-999" | x == "0"))/length(x)) Null_Name <- colnames(indata)[Null_Counter >= 0.3] ...

Scala starts to look for implicit conversions, only when it can't find an existing method with the required signature. But in Try companion object there is already a suitable method: def apply[T](r: ⇒ T): Try[T], so Scala infers T in apply as Future[Something] and doesn't check for implicit conversions. Also,...

r,function,data.frame,apply,difference

I believe dplyr can help you here. library(dplyr) dfData <- data.frame(ID = c(1, 2, 3, 4, 5), DistA = c(100, 239, 392, 700, 770), DistB = c(200, 390, 550, 760, 900)) dfData <- mutate(dfData, comparison = DistA - lag(DistB)) This results in... dfData ID DistA DistB comparison 1 1 100...

We don't need apply with MARGIN=1. Instead, we can paste the columns by with(birds, paste(year, month, day, sep="-")) and wrap it with as.Date to convert to 'Date' class. The output of ymd is POSIXct class, within the apply, it will be coerced to 'numeric' form. library(lubridate) library(dplyr) mutate(birds, date=ymd(paste(year, month,...

you could also use cut as in: cut(unclass(x)$hour-7,c(0,15,24)-8,c('night','morning')) (note that you have to shift your frame of reference so that you don't have two 'night' categories with this solution)...

You can try with(tradeData, ave(AL, Login, FUN=function(x) -1*c(0, diff(x)))) #[1] 0 0 0 1 0 0 0 -1 0 1 -1 0 1 0 1 0 0 -1 0 0 1 0 -1 Or an option using data.table. Convert the "data.frame" to "data.table" with setDT. Take the difference between current...

Using the Matrix package (which ships with a standard installation of R) nums <- c(1,2,3,4,5,1,2,4,3,5) apply(Matrix::sparseMatrix(i=seq_along(nums), j=nums), 2, cumsum) # [,1] [,2] [,3] [,4] [,5] # [1,] 1 0 0 0 0 # [2,] 1 1 0 0 0 # [3,] 1 1 1 0 0 # [4,] 1 1...

I'm not a big fan of by(). I'd tackle this task with split() and lapply(). do.call(rbind, lapply(split(df, list(df$A, df$B)), function(d) { l <- lm(C~D, data=d)$coef data.frame(A=d$A[1], B=d$B[1], COR=cor(d$C, d$D), LM1=l[1], LM2=l[2]) } )) This gives: A B COR LM1 LM2 x.a x a 1 -5.000000 2.0000000 y.a y a 1...

python,pandas,dataframes,apply

You could use pd.rolling_apply: import numpy as np import pandas as pd df = pd.read_table('data', sep='\s+') def foo(x, df): window = df.iloc[x] # print(window) c = df.ix[int(x[-1]), 'c'] dvals = window['a'] + window['b']*c return bar(dvals) def bar(dvals): # print(dvals) return dvals.mean() df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,)) print(df) yields a...

After some trial and error, I found a solution. In order to make comb_apply work, I needed to unname each exp value before using it. Here is the code: comb_apply <- function(f,...){ exp <- expand.grid(...,stringsAsFactors = FALSE) apply(exp,1,function(x) do.call(f,unname(x))) } Now, executing str(comb_apply(testFunc,l1,l2)) I get the desired result,...

python,pandas,dataframes,apply

Generally, Pandas Dataframe is good if you want to iterate over rows and treat each row as a vector. I would suggest that you use 2-dimensional numpy array. Once you have the array, you can iterate over each row and columns very easily. Here is the sample code: `for index,...

In [22]: pd.set_option('max_rows',20) In [33]: N = 10000000 In [34]: df = DataFrame({'A' : np.random.randint(0,100,size=N), 'B' : np.random.randint(0,100,size=N)}) In [35]: df[df.groupby('A')['B'].transform('max') == df['B']] Out[35]: A B 161 30 99 178 53 99 264 58 99 337 96 99 411 44 99 428 85 99 500 84 99 598 98 99...
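To make the `transform('max')` filter above easier to follow, here is a tiny sketch on made-up data (the five-row frame and the names `group_max` and `top_rows` are assumptions for illustration). `transform` returns a Series aligned to the original rows, so it can be compared element-wise with column B to keep the rows that hold each group's maximum:

```python
import pandas as pd

# Small made-up frame: A is the grouping key, B the value.
df = pd.DataFrame({"A": [1, 1, 2, 2, 2], "B": [5, 9, 3, 7, 7]})

# Per-group maximum of B, broadcast back to every row of its group.
group_max = df.groupby("A")["B"].transform("max")

# Keep the rows whose B equals the maximum of their group (ties are kept).
top_rows = df[group_max == df["B"]]
print(top_rows)
```

Because the comparison is a plain boolean mask, tied maxima within a group are all retained; dropping duplicates afterwards would be one way to keep a single row per group.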

r,data.frame,apply,na,missing-data

x <- sample.df[ lapply( sample.df, function(x) sum(is.na(x)) / length(x) ) < 0.1 ] ...