one way: iris$Sepal.Width[ iris$Species %in% "virginica"] Probably best to google subsetting though as this is all easily available in tutorials everywhere, and this will have been asked elsewhere on SO....

Try df[grep('1{3,}', df$history),] ...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

Read up on mapcar et al: (defparameter a (list 1 2 3 4)) (mapcon (lambda (tail) (mapcar (lambda (x) (cons (car tail) x)) (cdr tail))) a) ==> ((1 . 2) (1 . 3) (1 . 4) (2 . 3) (2 . 4) (3 . 4)) ...

r,logic,aggregate,subset,subsetting

The problem is that == is alternating between the values of "honda" and "harley" and comparing with the value in the relevant position of your "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | are checking across the entire "manufacturer" variable before deciding which values to...

Adapting my comment into an answer, taking into account the presented data and OP's comments. Note, code is not checked as dput of data was not obtained. library("dplyr") data_summarised <- data %>% mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date,...

With your example, it is now clearer what you want, so I'll give it a second try. I use dataGL_all as defined in your question and then define stations <- rep(c("FLI","FBE"),each=2) directions <- rep(c("in","out"),times=length(stations)/2) You could also extract the stations and directions from your data frame. Using your example, the...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

You mentioned data.table, so here are two possible approaches for both requests library(data.table) For 1. setDT(df)[, .SD[all(V2 != "X")], by = V1] # V1 V2 V3 # 1: HIJ P 40 # 2: HIJ Y 41 For 2. df[, .SD[.N == 1L], by = V1] # V1 V2 V3 # 1:...

r,vector,replace,subset,readline

You can build the new set of file lines as follows: new.r.lines <- c(r.lines[1:27],r.lines[28:1027][grepl(var,r.lines[28:1027])],r.lines[1028:length(r.lines)]); This combines lines 1:27 with the subset of the following 28:1027 lines that match your search pattern, then further combines with lines 1028 to the end of the file. Thus, you can pass that to writeLines()...

You could try dat <- subset(wg, Year > 2007 & Year < 2010 & hour == 23 & mint %in% c(30, 32, 39, 40, 41, 49, 31)) ...

Your command depends on equality, which requires a single atomic value, but you gave it a vector instead. Instead of this, you can use the operators @davidArenburg has mentioned. You can do this by first forming a list of T/F values and then retrieving the elements according to the corresponding value of the...

r,data.frame,subset,linear-regression

First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

I downloaded your data and had a look. If I am not mistaken, all you need is to subset data using Time.h. Here you have a range of time (10-23) you want. I used dplyr and did the following. You are asking R to pick up rows which have values...

I created some sample data. When you use subset(), you need a data frame and a condition. When you use lapply(), you make your function anonymous. That is, you write function(x) followed by the code you want R to loop through. In your case, you want to loop through...

You can try df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),] # A B #2 1A 2 #3 1A 2 #5 2 4 #6 2 4 #7 3A 0 #8 3A 0 #9 4A 1 #10 4A 1 data df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"), B =...

Use a regular expression. For example: myd <- subset(df, grepl("-01$", Date_ID)) or myd <- df[grep("-01$", df$Date_ID),] ...

r,matrix,filter,data.frame,subset

Try this: (I suspect it will be faster than any apply approach) mat[ rowSums(mat == mat[,1])!=ncol(mat) , ] # ---with your object--- [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 ...

you can try this with(df, df[ (x==1 & y>15) | (x==2 & y>5), ]) x y 1 1 30 4 2 10 5 2 18 or with dplyr library(dplyr) filter(df, (x==1 & y>15) | (x==2 & y>5)) ...

Here's another idea using dplyr: library(dplyr) my_df %>% filter(lead(let == "b", 5) | lag(let == "b", 5)) Or as per @akrun suggestion using the devel version of data.table: setDT(my_df)[shift(let == "b", 5) | shift(let == "b", type = "lead", 5)] Which gives: # num let #1 0.36723709 a #2 0.24743170...

You can try subset(df, V1 %in% l) # V1 V2 #1 a54 hi #3 sdx637 hi intersect can be used to get the common elements intersect(df$V1, l) #[1] "a54" "sdx637" but this will not give a logical index to subset the data, df[intersect(df$V1, l),] # V1 V2 #NA <NA> <NA>...

You can use := to create a new column ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, V1:=100*(zgrp/sum(zgrp)), by=zip][, zgrp:=NULL] # cgrp zip V1 #1: 3 12007 19.35484 #2: 4 12007 48.38710 #3: 1 12007 32.25806 #4: 1 12008 57.89474 #5: 4 12008 31.57895 #6: 3 12008 10.52632 Or as @Frank commented, you...

It seems to me that almost all of your questions concern a list of data frames with the same columns, which causes you to use lapply loops on every single operation (which seems highly inefficient). Alternatively, you could vectorize most of your operations by simply binding all the lists into...

Use split and cumsum: ccc <- data.frame(ccc) split(ccc[ccc$aaa==1,], cumsum(ccc$aaa!=1)[ccc$aaa==1]) #$`0` # aaa bbb #1 1 4 #2 1 4 #3 1 4 # #$`2` # aaa bbb #6 1 3 # #$`3` # aaa bbb #8 1 3 #9 1 2 # #$`4` # aaa bbb #11 1 2 #12...

It sounds like you're looking for get: x[get(cname) > cutoff,] # val1 val2 # 1: 2 5 # 2: 3 4 # 3: 4 2 # 4: 5 4 # 5: 3 5 ...

If there are trailing/leading spaces, this could occur. Remove those and it should work. library(stringr) data[,5] <- str_trim(data[,5]) Or data[,5] <- gsub('^\\s+|\\s+$', '', data[,5]) data[data[,5]=='Y',] Another option without removing the spaces would be grep data[grep('\\bY\\b', data[,5]),] ...

We can use Reduce with intersect in base R lapply(my.list, function(x) x[with(x, Letters %in% Reduce(intersect, split(Letters, Numbers))),]) Or using dplyr library(dplyr) lapply(my.list, function(x) x %>% group_by(Letters) %>% filter(n_distinct(Numbers)==2)) Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the...

A little googling would have solved your problem; for example, read this about logical operators. Do you mean something like this? ITEproduction_2014.2015<-subset(ITEproduction_2014.2015,Date.Difference>3 & Date.Difference<40) ...

Try this WantedData=Data2[Data2$accession_number %in% SubsetData1$accession_number, ] ...

This will give you the desired output df. givenStr <- "that" row <- df[df$strs==givenStr,] df[,c(1,1+which(row[,-1]==1))] ...

constraints,scheduling,subset,cplex,opl

It is not clear what your problem is, but I am guessing your problem is to do with modelling things like products(j) in constraint 2. Try using sets for these - so create an array of sets of products in each product family. There are examples of this in the...

Use Mongo aggregation as follows: first use $unwind to unwind things and stuff, then use $match to find elements greater than 4. After that, $group the data by things.name and add the required fields in $project. The query will be as follows: db.collection.aggregate([ { $unwind: "$things" }, { $unwind: "$things.stuff"...

Short answer is: do not use subset but something like employ.data[employ.data[salary_string]>23000,] ...

You can try a[,setdiff(colnames(a), S)] Or a[,!colnames(a) %in% S] ...

Using dplyr library(dplyr) df %>% filter(!grepl("13",Date)) ...

this should work: library(dplyr) inner_join(dfA, dfB) %>% anti_join(dfC) which gives: Efficiency Value 1 8 7 2 2 4 ...

I think this is what you want. I've done it using dplyr's group_by and summarize here. For each Batch/ID it calculates the number of observations, the number of observations where measurement is between 6 and 7 and the ratio of those two. library(dplyr) # example data set set.seed(10) Measurement <-...

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and to dynamically build formulas dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

library(ggplot2) library(dplyr) diamonds %>% group_by(clarity) %>% summarise(mean_price = mean(price) , min_price =min(price) ,max_price = max(price) , median_price = median(as.numeric(price)), count = n()) %>% arrange(clarity) for arranging in descending order use arrange(desc(clarity)) instead of arrange(clarity)...

This is not a subsetting issue, it's a formatting/presentation issue. You're in the first circle of Burns's R Inferno ("[i]f you are using R and you think you’re in hell, this is a map for you"): another aspect of virtuous pagan beliefs—what is printed is all that there is If...

python,algorithm,python-3.x,set,subset

Solution: def partitions(A): if not A: yield [] else: a, *R = A for partition in partitions(R): yield partition + [[a]] for i, subset in enumerate(partition): yield partition[:i] + [subset + [a]] + partition[i+1:] Explanation: The empty set only has the empty partition. For a non-empty set, take out one...
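
As a quick sanity check of the recursive idea above, here is the same generator laid out as a runnable Python 3 sketch. The helper count_partitions is a hypothetical name added only for illustration; the counts it produces should match the Bell numbers, which count the partitions of an n-element set.

```python
def partitions(A):
    """Yield every partition of the list A as a list of blocks."""
    if not A:
        yield []  # the empty set has exactly one (empty) partition
    else:
        a, *R = A
        for partition in partitions(R):
            # either put `a` into a new block of its own ...
            yield partition + [[a]]
            # ... or add it to each existing block in turn
            for i, subset in enumerate(partition):
                yield partition[:i] + [subset + [a]] + partition[i + 1:]


def count_partitions(n):
    # hypothetical helper: number of partitions of an n-element set
    return sum(1 for _ in partitions(list(range(n))))


print([count_partitions(n) for n in range(5)])
```

Because partitions is a generator, the partitions are produced one at a time and never all held in memory at once.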

It isn't the nicest solution, but it does what you want. library(MuMIn) options(na.action = na.fail) fm1 <- lm(y ~ X1 + X2, Cement) m1 <- dredge(fm1) ms1 <- subset(m1, delta < 32) fm2 <- lm(y ~ X3 + X4, Cement) m2 <- dredge(fm2) ms2 <- subset(m2, delta < 20) a1 <-...

First, if you want a dataframe, you should use data.frame, not c: df <- data.frame(id, problem, solution1, solution2) Then you can subset like this for instance (no need to use subset per se) df2 <- df[!(grepl("a", df$problem) & (grepl("eat", df$solution1) | grepl("eat", df$solution2))),] # id problem solution1 solution2 # 2...

You can solve this by sorting first: import operator ranges=[[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]] sorted_ranges = sorted(ranges,key=operator.itemgetter(-1),reverse=True) sorted_ranges = sorted(sorted_ranges,key=operator.itemgetter(0)) filtered = [] i,j = 0,0 while i < len(sorted_ranges): filtered.append(sorted_ranges[i]) j = i+1 while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]: print "Remove " ,...
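
The sort-then-scan idea can be sketched in Python 3 as follows. This is not the code from the answer above, only a minimal illustration that assumes every range is a non-empty list of consecutive integers; the function name drop_contained is made up for the example.

```python
def drop_contained(ranges):
    """Remove ranges that lie entirely inside another range.

    Assumes each range is a non-empty list of consecutive integers,
    so it is fully described by its first and last element.
    """
    # Sort by start ascending; for equal starts put the longer range first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))
    kept = []
    max_end = None
    for r in ordered:
        # Every range seen so far starts at or before r, so r is contained
        # in one of them exactly when it does not extend past max_end.
        if max_end is None or r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return kept


ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
print(drop_contained(ranges))
```

Sorting dominates the cost, so the sketch runs in O(n log n) for n ranges.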

This should work: dfOutput <- dfInput[apply(dfInput[,3:19]>0.00000001 & dfInput[,3:19]<0.300, 1, all, na.rm=TRUE), ] And now for a reproducible example, I'll explain what is going on: # data df <- data.frame(x = c(1:3, NA, 3:1), y=c(NA, NA, NA, 3, 3, 2, 3)) # this returns a matrix! df[, 1:2] > 2 #...

You can use list slicing to get top 10 elements from the ids list. Students.objects.filter(studentid__in=[p[0] for p in ids[:10]]) ...
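
To see what the slicing expression produces on its own, here is a small sketch without Django. It assumes ids is a list of tuples whose first element is the student id; the sample data is invented for illustration.

```python
# Hypothetical data: (studentid, name) tuples, already in the desired order.
ids = [(i, "student%d" % i) for i in range(1, 26)]

# ids[:10] keeps the first ten tuples; p[0] pulls the id out of each one.
top_ids = [p[0] for p in ids[:10]]
print(top_ids)
```

The resulting plain list of ids is what gets passed to the studentid__in lookup.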

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for: http://stackoverflow.com/a/123481/2966951 http://stackoverflow.com/a/121435/2966951 Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value....

The question is quite badly formulated but if I understand it correctly it would mean doing something like that: col=3 #col is the # of the column where the numbers should be odd dat[dat[,col]%%2==1,] #dat is the data frame or matrix containing your data Or in the example you give...

I might do it as follows. Only end dates seem to be necessary, as start dates are just 1 year before. The loop is made using lapply(), which iterates over all end dates. Subsetting is done mainly with difftime() by filtering any non-zero time difference between the two dates. set.seed(24) df1 <-...

This will return only those rows beginning with a capital "S" using the substr()-ing function: dat[ substr( dat$City, 1 ,1) == "S" , ] Could also have used: dat[ grepl("^S", dat$City) , ] The second option is a very simple regular expression. Look at ?regex and ?grep....

java,algorithm,recursion,dynamic-programming,subset

Here's the super naive solution that simply generates a power set on your input array and then iterates over each set to see if the sum satisfies the given total. I hacked it together with code already available on StackOverflow. O(2^n) in time and space. Gross. You can use the...
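
For comparison, the same brute-force idea can be sketched in a few lines of Python (not the Java from the answer). The function name subsets_with_sum is an assumption for the example. It still inspects every subset, so the running time stays exponential, but it only stores the subsets that hit the target.

```python
from itertools import combinations


def subsets_with_sum(values, total):
    """Return every subset of `values` (as a tuple) whose sum equals `total`."""
    hits = []
    # Walk through the power set one subset size at a time.
    for size in range(len(values) + 1):
        for combo in combinations(values, size):
            if sum(combo) == total:
                hits.append(combo)
    return hits


print(subsets_with_sum([1, 2, 3, 4], 5))
```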

I think I spot a couple of problems. In grep you don't want to set value to TRUE. Setting value to TRUE returns the matched word instead of the index of the row. Also you are missing a comma (hence the undefined columns error). Try this: LakeK_all[grep("^S", LakeK_all$Lake), ]...

You could try something like this: select user_id from user_assets where asset_id = 1 or asset_id = 2 ... group by user_id having count(distinct asset_id) = (number of assets you are looking for) demo here showing your required output. the distinct isn't necessary if (user_id, asset_id) is a unique key...
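
The group by / having count(distinct ...) pattern can be tried out with Python's built-in sqlite3 module. This is only a sketch: the rows are invented, and the or chain from the query above is written as an equivalent in (...) list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_assets (user_id integer, asset_id integer)")
conn.executemany(
    "insert into user_assets values (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 2), (3, 1), (3, 3), (4, 3)],  # made-up rows
)

wanted = [1, 2]  # the assets a user must own
placeholders = ", ".join("?" for _ in wanted)
query = (
    "select user_id from user_assets "
    "where asset_id in (" + placeholders + ") "
    "group by user_id "
    "having count(distinct asset_id) = ? "
    "order by user_id"
)
# The last parameter is the number of assets being looked for.
users = [row[0] for row in conn.execute(query, wanted + [len(wanted)])]
print(users)
```

Users who own extra assets still qualify, because the where clause discards rows for other assets before the count is taken.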

Try this : subset(raw_data,eval(parse(text=keep_rows))) Test : keep_rows <- "Blok>1" raw_data<- data.frame(Blok=c(1,2,3,0)) subset(raw_data,eval(parse(text=keep_rows))) Blok 2 2 3 3 ...

Try foverlaps from data.table library(data.table) setkey(setDT(df1), Chromosome, start, end) setkey(setDT(df2), Chromosome, start, end) setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[] # Chromosome start end #1: 1 1 450 #2: 2 3500 3585 #3: 2 7850 10000 Or as @Arun commented, we can use which=TRUE (to extract the indices) and subset 'df1'...

If we already created 3 datasets and want to subset the first "a" based on the elements of "c/c1", one option is anti_join from dplyr library(dplyr) anti_join(a, c1, by=c('A', 'B', 'C')) Update Or we could use a base R option with interaction to paste the columns of interest together in...

The comment by @false is correct, as far as I can see: given two sets S and S': if S is a subset of S', then the intersection of the complement of S' and S should be the empty set (there are no elements outside of S' that are elements...
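
The set identity described above can be checked with Python sets. In this minimal sketch, s - s_prime plays the role of the intersection of S with the complement of S'; the function name is_subset is an assumption for illustration.

```python
def is_subset(s, s_prime):
    # Elements of s that fall outside s_prime, i.e. s intersected with
    # the complement of s_prime.
    outside = s - s_prime
    # s is a subset of s_prime exactly when nothing is left outside.
    return not outside


print(is_subset({1, 2}, {1, 2, 3}))
print(is_subset({1, 4}, {1, 2, 3}))
```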

r,loops,double,conditional,subset

Or simply with base R aggregate(Julian_Day ~., df, min) # Year Id Julian_Day # 1 1901 1 40 # 2 1968 1 200 # 3 1901 5 56 Or library(dplyr) df %>% group_by(Id, Year) %>% summarise(Julian_Day = min(Julian_Day)) # Source: local data frame [3 x 3] # Groups: Id #...

You want the ifelse function, which is a vectorized conditional: > x <- c(1, 1, 0, 0, 1) > y <- c(1, 2, 3, 4, 5) > z <- c(6, 7, 8, 9, 10) > ifelse(x == 1, y, z) [1] 1 2 8 9 5 You will have to...

If you just want to keep, say, the first row of treat for each value of ID, then you can use slice: DATA_clean <- treat %>% group_by(ID) %>% slice(1) Your original code didn't work because n() returns the total number of rows for each value of ID. If every...

#one way is to use `filter` from `dplyr` package (and assuming Date is already in Date format) library(dplyr) wg %>% filter(year %in% c(2008,2009) & months(Date) %in% c("July","August","September")) #If you want to stick to subset, replace second & with |: subset(wg, date >= "2008-07-01" & date <= "2008-09-30" | date >=...

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

Creating the Test/Sample dataset data test; infile datalines dlm=','; input Over : 8. Ball : 8. Bowling : $15. Runs_scored : 8. Count : 8. ; datalines; 39,1,Ali,1,1 39,2,Ali,1,2 39,3,Ali,2,3 39,4,Ali,1,4 39,5,Ali,1,5 39,6,Ali,1,6 36,1,Anderson,1,1 36,2,Anderson,1,2 36,3,Anderson,1,3 36,4,Anderson,0,4 36,6,Anderson,0,6 ; run; Selecting the distinct overs(as I understand Cricket, each over would...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

dat <- read.table(text="Patient ID,Disease Score 101,5 101,2 101,2 105,1 110,5 115,1 115,1", stringsAs=FALSE, header=TRUE, sep=",") # one way in base dat[dat$Patient.ID %in% names(which(table(dat$Patient.ID)>2)),] # one way in dplyr library(dplyr) dat %>% group_by(Patient.ID) %>% mutate(n=n()) %>% ungroup() %>% filter(n>=2) %>% select(Patient.ID, Disease.Score) ...

You can use the function pmatch: x[-pmatch(y,x)] #[1] "A" "C" "A" "B" "D" Edit If your data can be strings of more than 1 character, here is an option to get what you want: xNew <- unlist(sapply(x[!duplicated(x)], function(item, tab1, tab2) { rep(item, tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0)) },...

algorithm,recursion,combinations,subset

I think this can help you: void subset(vector<int> &input, vector<int> output, int current) { static int n=0; if (current == input.size()) { cout<<n++<<":\t {"; for(int i=0;i<output.size();i++) cout<<output[i]<<", "; cout<<"\b\b}\n"; } else { subset(input, output, current+1); //exclude current'th item output.push_back(input[current]); subset(input, output, current+1); //include current'th item } } and first time...
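
The same exclude/include recursion can be sketched in Python as a generator. This is an illustration of the technique rather than a translation of the C++ above; the parameter names are chosen for the example.

```python
def subsets(items, current=0, chosen=None):
    """Yield every subset of `items`, deciding on one item per recursion level."""
    if chosen is None:
        chosen = []
    if current == len(items):
        yield list(chosen)  # all items decided: emit this subset
        return
    # Branch 1: exclude the current item.
    yield from subsets(items, current + 1, chosen)
    # Branch 2: include the current item.
    yield from subsets(items, current + 1, chosen + [items[current]])


for number, subset in enumerate(subsets([1, 2])):
    print(number, subset)
```

Each item doubles the number of branches, so n items give 2^n subsets.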

Using data.table, we'd do: setDT(data)[colA == "ABC", ColB := "XXXX"] and the values are modified in-place, unlike if-else, which'd copy the entire column to replace just those rows where the condition satisfies. We call this sub-assign by reference. You can read more about it in the new HTML vignettes....

data = sub240 is an assignment statement. You can assign things on their own line or in function definitions and calls, but you can only provide logical statements in a while loop definition. If you want logical equality, you need ==. But unless data changes in the loop AND you...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

r,conditional,condition,subset

I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be: df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a'...

You may try with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(a)), grouped by 'var1', we get a logical index for 'var2' elements that are greater than or equal to corresponding 'var2' elements for which 'var3' is TRUE and subset the dataset .SD. library(data.table) setDT(a)[,.SD[var2 >= var2[var3]], var1] #...

You can use transform() and ave() to add a column indicating how many observations are in each group and then use the subset= parameter to only keep those with more than 1 obs. For example boxplot(height~group, transform(test, groupcount=ave(ID, group, FUN=length)), subset=groupcount>1) Note that you can only use the subset= parameter...

I hope this is what you want (updated): function subset() { var arr1 = [1, 9, 3, 5, 4, 8, 2, 6, 3, 4] var arr2 = [5, 2, 4, 8, 2, 6, 4] var arr3 = []; var minSize=2; // the intersection must contain at least 2 elements var findedMax = "";...

You can use top_n from dplyr to select the 'n' top rows library(dplyr) top_n(AB, 4, BETAdn) Or use order from base R and then subset the top 'n' rows AB[order(-AB$BETAdn),][1:4,] ...

We could create a column 'MonthYr' from the 'date' column after converting it to 'Date' class. Get the number of observations ('n') per group ('permno', 'MonthYr') and use that to remove the IDs ('permno') that have at least one 'n' less than 10. library(dplyr) res <- df1 %>% mutate(MonthYr=format(as.Date(date, format='%m/%d/%Y'),...

One solution in base R: #using as.character since one$x and two$x are factors in this case > two[ as.character(one$x) != as.character(two$x), ] x y z 11 k 11 -0.6680130 12 l 12 -1.0501888 13 m 13 -1.0987269 14 n 14 1.0045557 15 o 15 -0.6002310 16 p 16 1.3162201 17...

There is no problem. Look at nrow(Sin). You should see that it has fewer rows after subsetting. The first column in the output is the "row name". It is not a cumulative index that tells you how many rows there are. Row names are preserved after subsetting (ie they will...

Based on your stated use of pandas: import numpy as np colsToUse = ['col1', 'col2', 'col3'] rowsToUse = np.random.choice(range(len(df1)), 500) df2 = df1.loc[:, colsToUse] df3 = df1.iloc[rowsToUse, :] Here df1.loc selects by label and df1.iloc by position (the older df1.ix indexer has been removed from current pandas); df1.xs is another DataFrame helper function for indexing. It's also helpful to look at the guide NumPy for MATLAB Users...

You could do: library(dplyr) df %>% # create a hypothetical "customer.name" column mutate(customer.name = sample(LETTERS[1:10], size = n(), replace = TRUE)) %>% # group data by "Parcel.." group_by(Parcel..) %>% # apply sum() to the selected columns mutate_each(funs(sum(.)), one_of("X.11", "X.13", "X.15", "num_units")) %>% # likewise for mean() mutate_each(funs(mean(.)), one_of("Acres", "Ttl_sq_ft", "Mtr.Size"))...

r,loops,data.frame,pattern-matching,subset

Your question boils down to searching for sequences of "ABC" within the sequences of the IDs: (matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]]) # [1] 8 # ... This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered matches, the...

You can change the 'Time' column to 'POSIXct' class and then subset datasubcolrow$Time <- as.POSIXct(datasubcolrow$Time, format='%d/%m/%Y %H:%M:%OS') subset(datasubcolrow, Time < as.POSIXct('15/05/2015 13:30:15.417', format='%d/%m/%Y %H:%M:%OS')) data datasubcolrow <- structure(list(Time = c("15/05/2015 13:30:07.291", "15/05/2015 13:30:08.307", "15/05/2015 13:30:09.323", "15/05/2015 13:30:10.338", "15/05/2015 13:30:11.354", "15/05/2015 13:30:12.370", "15/05/2015 13:30:13.386", "15/05/2015 13:30:14.402", "15/05/2015 13:30:15.417",...

You'll want to use something like the following: new_data <- Data[sample(nrow(Data), N, prob = (1 - Data$Prob), replace = F),] ...

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

r,if-statement,matrix,subset,covariance

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple solutions, pick your poison. The first relies on a nested for loop which could be slow and further optimized if you knew for sure your matrix was symmetric. m <- read.table(header=T, stringsAsFactors=F,...

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'): "Methods [...]: [ : select subset of (sets of) polygons; NAs are not permitted in the row index" So using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

Using lapply() student.count = 2 # depends on your choice out = do.call(rbind, lapply(split(df, f = df$Schools), function(x){ x$no.of.students = length(x$Students); x = subset(x, no.of.students > student.count) })) #> out # Schools Students no.of.students #SchA.1 SchA st1 5 #SchA.2 SchA st2 5 #SchA.3 SchA st3 5 #SchA.4 SchA st4 5...

With the sample data dd<-read.table(text="Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000", header=T) you can do this with base R subset(dd, Count>.25*ave(Count, Group, FUN=sum)) or the dplyr library library(dplyr) dd %>% group_by(Group) %>% filter(Count > .25 * sum(Count)) perhaps you'll find one more...

You could try sampling the indices of players to construct the first team instead of sampling the names. idx1 <- sample(1:nrow(players), 5) You can actually use these indices to grab all the information about each team: team1 <- players[idx1,] team2 <- players[-idx1,] The score for each team can be computed...

You can do this using a variant of the compare-cumsum-groupby pattern. Starting from >>> df["markers"].isin(["x","y"]) 0 False 1 False 2 True 3 False 4 False 5 False 6 True 7 False 8 False 9 True Name: markers, dtype: bool We can shift and take the cumulative sum to get: >>>...
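
The compare, shift and cumulative-sum steps can be mimicked with the standard library alone, which makes the group numbering easy to follow. The markers list below is invented so that it reproduces the boolean column shown above; it is not the original data.

```python
from itertools import accumulate

# Hypothetical marker column; "x" and "y" are the rows that close a group.
markers = ["a", "b", "x", "c", "d", "e", "y", "f", "g", "x"]

# Compare: flag the marker rows (the isin step).
flags = [m in ("x", "y") for m in markers]

# Shift by one row so that a marker row still belongs to the group it ends.
shifted = [False] + flags[:-1]

# Cumulative sum: each True starts a new group id.
group_ids = list(accumulate(int(f) for f in shifted))
print(group_ids)
```

Rows sharing a group id can then be grouped together, which is the groupby part of the pattern.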

c#,arrays,mongodb,subset,mongodb-csharp

Use the Mongo set operator $setIsSubset in an aggregation to get your result. Check the following query: db.collectionName.aggregate({ "$project": { "Name": 1, "Tags": 1, "match": { "$setIsSubset": ["$Tags", ["A", "B", "C"]] // check whether Tags is a subset of the given array; in your case the array is ["A","B","C"] } } }, { "$match": {...
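
What the $setIsSubset check means can be illustrated with plain Python sets. This sketch does not talk to MongoDB; the documents are invented and only mirror the Name and Tags fields used in the query.

```python
allowed = {"A", "B", "C"}

# Hypothetical documents with the same fields as in the query.
docs = [
    {"Name": "first", "Tags": ["A", "B"]},
    {"Name": "second", "Tags": ["A", "D"]},
    {"Name": "third", "Tags": ["C"]},
]

# A document matches when every one of its tags is in the allowed set.
matching = [d["Name"] for d in docs if set(d["Tags"]) <= allowed]
print(matching)
```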

python,list,filtering,subset,subsetting

You need to implement the __hash__ and __eq__ methods on your class: class A: def __init__(self, a): self.attr1 = a def __hash__(self): return hash(self.attr1) def __eq__(self, other): return self.attr1 == other.attr1 def __repr__(self): return str(self.attr1) Example: l = [A(5), A(4), A(4)] print(list(set(l))) print(list(set(l))[0].__class__) # ==> __main__.A. It's an object of class...

Another option is to use Filter within lapply lapply(list12, Filter, f = is.numeric) # [[1]] # x1 z1 # 1 1 0 # 2 2 1 # 3 3 0 # 4 4 1 # # [[2]] # x2 y2 # 1 0 0 # 2 1 1 # 3...

you can read about argument drop in the help page: ?'[' M[which(rownames(M) != "A"), ,drop=FALSE] ...

geom_boxplot calls boxplot.stats to calculate the positions of the upper and lower whiskers. You can do it too: > boxplot.stats(v) $stats [1] 93.340 96.069 97.876 99.087 100.359 $n [1] 24 $conf [1] 96.90265 98.84935 $out [1] -234.347 75.764 (v is assumed to be your input data vector): From the boxplot.stats...