Read up on mapcar et al: (defparameter a (list 1 2 3 4)) (mapcon (lambda (tail) (mapcar (lambda (x) (cons (car tail) x)) (cdr tail))) a) ==> ((1 . 2) (1 . 3) (1 . 4) (2 . 3) (2 . 4) (3 . 4)) ...
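For comparison, here is a rough Python analogue of the mapcon idiom. It is a sketch and not part of the original answer, and the name `tail_pairs` is hypothetical. It walks the successive tails of the list and pairs each tail's head with every later element. The standard library's `itertools.combinations(items, 2)` yields the same pairs in the same order.

```python
def tail_pairs(items):
    """Pair the head of each successive tail with every later element."""
    result = []
    for i in range(len(items)):
        head, rest = items[i], items[i + 1:]
        # Counterpart of the inner mapcar over (cdr tail).
        result.extend((head, x) for x in rest)
    return result

print(tail_pairs([1, 2, 3, 4]))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```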

It isn't the nicest solution, but it does what you want. library(MuMIn) options(na.action = na.fail) fm1 <- lm(y ~ X1 + X2, Cement) m1 <- dredge(fm1) ms1 <- subset(m1, delta < 32) fm2 <- lm(y ~ X3 + X4, Cement) m2 <- dredge(fm2) ms2 <- subset(m2, delta < 20) a1 <-...


Or simply with base R aggregate(Julian_Day ~., df, min) # Year Id Julian_Day # 1 1901 1 40 # 2 1968 1 200 # 3 1901 5 56 Or library(dplyr) df %>% group_by(Id, Year) %>% summarise(Julian_Day = min(Julian_Day)) # Source: local data frame [3 x 3] # Groups: Id #...

Use a regular expression. For example: myd <- subset(df, grepl("-01$", Date_ID)) or myd <- df[grep("-01$", df$Date_ID),] ...

One way: iris$Sepal.Width[ iris$Species %in% "virginica"] It's probably best to search for "R subsetting", though, as this is covered in tutorials everywhere and has almost certainly been asked elsewhere on SO.

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

First, if you want a data frame, you should use data.frame, not c: df <- data.frame(id, problem, solution1, solution2) Then you can subset like this, for instance (no need to use subset per se): df2 <- df[!(grepl("a", df$problem) & (grepl("eat", df$solution1) | grepl("eat", df$solution2))),] # id problem solution1 solution2 # 2...


I was told the problem with my code was that I needed indexing on both sides of the assignment. Without indexing on the right-hand side, R does not know which rows to take the values from. So the correct code in this case would be: df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a'...


You can build the new set of file lines as follows: new.r.lines <- c(r.lines[1:27],r.lines[28:1027][grepl(var,r.lines[28:1027])],r.lines[1028:length(r.lines)]); This keeps lines 1 to 27, then only those of lines 28 to 1027 that match your search pattern, then lines 1028 to the end of the file. You can then pass the result to writeLines()...

You can try df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),] # A B #2 1A 2 #3 1A 2 #5 2 4 #6 2 4 #7 3A 0 #8 3A 0 #9 4A 1 #10 4A 1 data df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"), B =...

This is not a subsetting issue; it's a formatting/presentation issue. You're in the first circle of Burns's R Inferno ("[i]f you are using R and you think you’re in hell, this is a map for you"): another aspect of virtuous pagan beliefs—what is printed is all that there is If...

You could try something like this: select user_id from user_assets where asset_id = 1 or asset_id = 2 ... group by user_id having count(distinct asset_id) = (number of assets you are looking for). The distinct isn't necessary if (user_id, asset_id) is a unique key...
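Here is a runnable sketch of that group-by/having query using Python's built-in `sqlite3`. The rows are made-up sample data; only the table and column names come from the answer. The `or` chain is written as an equivalent `IN (...)` list so the number of wanted assets can vary. A user qualifies only if the count of distinct matching assets equals the number of assets requested.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_assets (user_id INTEGER, asset_id INTEGER)")
conn.executemany(
    "INSERT INTO user_assets VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 2), (3, 1), (3, 3), (4, 2)],
)

wanted = (1, 2)  # every returned user must hold all of these assets
placeholders = ", ".join("?" for _ in wanted)
query = f"""
    SELECT user_id
    FROM user_assets
    WHERE asset_id IN ({placeholders})
    GROUP BY user_id
    HAVING COUNT(DISTINCT asset_id) = ?
    ORDER BY user_id
"""
rows = conn.execute(query, (*wanted, len(wanted))).fetchall()
print([r[0] for r in rows])  # [1, 3]
```

User 3 also holds asset 3, but the `WHERE` clause filters that row out before grouping, so user 3 still qualifies.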


I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple of solutions; pick your poison. The first relies on a nested for loop, which could be slow; it could be optimized further if you knew for sure your matrix was symmetric. m <- read.table(header=T, stringsAsFactors=F,...

You can use the function pmatch: x[-pmatch(y,x)] #[1] "A" "C" "A" "B" "D" Edit: if your data can be strings of more than 1 character, here is an option to get what you want: xNew <- unlist(sapply(x[!duplicated(x)], function(item, tab1, tab2) { rep(item, tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0)) },...


First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

Use split and cumsum: ccc <- data.frame(ccc) split(ccc[ccc$aaa==1,], cumsum(ccc$aaa!=1)[ccc$aaa==1]) #$`0` # aaa bbb #1 1 4 #2 1 4 #3 1 4 # #$`2` # aaa bbb #6 1 3 # #$`3` # aaa bbb #8 1 3 #9 1 2 # #$`4` # aaa bbb #11 1 2 #12...

You mentioned data.table, so here are two possible approaches, one for each request: library(data.table) For 1. setDT(df)[, .SD[all(V2 != "X")], by = V1] # V1 V2 V3 # 1: HIJ P 40 # 2: HIJ Y 41 For 2. df[, .SD[.N == 1L], by = V1] # V1 V2 V3 # 1:...

This should work: library(dplyr) inner_join(dfA, dfB) %>% anti_join(dfC) which gives: Efficiency Value 1 8 7 2 2 4 ...

With the sample data dd<-read.table(text="Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000", header=T) you can do this with base R subset(dd, Count>.25*ave(Count, Group, FUN=sum)) or the dplyr library library(dplyr) dd %>% group_by(Group) %>% filter(Count > .25 * sum(Count)) perhaps you'll find one more...

Here's another idea using dplyr: library(dplyr) my_df %>% filter(lead(let == "b", 5) | lag(let == "b", 5)) Or, as per @akrun's suggestion, using the devel version of data.table: setDT(my_df)[shift(let == "b", 5) | shift(let == "b", type = "lead", 5)] Which gives: # num let #1 0.36723709 a #2 0.24743170...

You could try dat <- subset(wg, Year > 2007 & Year < 2010 & hour == 23 & mint %in% c(30, 32, 39, 40, 41, 49, 31)) ...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

Adapting my comment into an answer, taking into account the presented data and OP's comments. Note, code is not checked as dput of data was not obtained. library("dplyr") data_summarised <- data %>% mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date,...

You could try sampling the indices of players to construct the first team instead of sampling the names. idx1 <- sample(1:nrow(players), 5) You can actually use these indices to grab all the information about each team: team1 <- players[idx1,] team2 <- players[-idx1,] The score for each team can be computed...

I might do it as follows. Only the end dates seem to be necessary, since each start date is just one year earlier. The loop uses lapply(), which iterates over all end dates. Subsetting is done mainly with difftime(), by filtering on non-zero time differences between the two dates. set.seed(24) df1 <-...

You can try this: with(df, df[ (x==1 & y>15) | (x==2 & y>5), ]) x y 1 1 30 4 2 10 5 2 18 Or with dplyr: library(dplyr) filter(df, (x==1 & y>15) | (x==2 & y>5)) ...

Try foverlaps from data.table library(data.table) setkey(setDT(df1), Chromosome, start, end) setkey(setDT(df2), Chromosome, start, end) setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[] # Chromosome start end #1: 1 1 450 #2: 2 3500 3585 #3: 2 7850 10000 Or as @Arun commented, we can use which=TRUE (to extract the indices) and subset 'df1'...

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'), which says under Methods: "[ : select subset of (sets of) polygons; NAs are not permitted in the row index". So, using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. If you still want to pass it as a string, you need to parse and eval it in the right place, for example: cond...

It sounds like you're looking for get: x[get(cname) > cutoff,] # val1 val2 # 1: 2 5 # 2: 3 4 # 3: 4 2 # 4: 5 4 # 5: 3 5 ...

You can change the 'Time' column to 'POSIXct' class and then subset datasubcolrow$Time <- as.POSIXct(datasubcolrow$Time, format='%d/%m/%Y %H:%M:%OS') subset(datasubcolrow, Time < as.POSIXct('15/05/2015 13:30:15.417', format='%d/%m/%Y %H:%M:%OS')) data datasubcolrow <- structure(list(Time = c("15/05/2015 13:30:07.291", "15/05/2015 13:30:08.307", "15/05/2015 13:30:09.323", "15/05/2015 13:30:10.338", "15/05/2015 13:30:11.354", "15/05/2015 13:30:12.370", "15/05/2015 13:30:13.386", "15/05/2015 13:30:14.402", "15/05/2015 13:30:15.417",...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

You want the ifelse function, which is a vectorized conditional: > x <- c(1, 1, 0, 0, 1) > y <- c(1, 2, 3, 4, 5) > z <- c(6, 7, 8, 9, 10) > ifelse(x == 1, y, z) [1] 1 2 8 9 5 You will have to...

You can use the dplyr package: library(dplyr) intersect(df1,df2) # a b #1 1 m #2 3 f Edit: for the new data frames with a c column, you can use semi_join (also from dplyr): semi_join(df1,df2,by=c("a","b")) # a b c #1 1 m df1 #2 3 f df1 Another option, in base R:...

Another option is to use Filter within lapply lapply(list12, Filter, f = is.numeric) # [[1]] # x1 z1 # 1 1 0 # 2 2 1 # 3 3 0 # 4 4 1 # # [[2]] # x2 y2 # 1 0 0 # 2 1 1 # 3...

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and build formulas dynamically: dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

Short answer: don't use subset; use something like employ.data[employ.data[[salary_string]] > 23000, ] (the [[ ]] extracts the column as a vector, so the comparison gives a plain logical row index) ...

A little searching would have solved your problem; for example, read up on R's logical operators. Something like this? ITEproduction_2014.2015<-subset(ITEproduction_2014.2015,Date.Difference>3 & Date.Difference<40) ...

Try this: WantedData=Data2[Data2$accession_number %in% SubsetData1$accession_number, ] ...

You can do this using a variant of the compare-cumsum-groupby pattern. Starting from >>> df["markers"].isin(["x","y"]) 0 False 1 False 2 True 3 False 4 False 5 False 6 True 7 False 8 False 9 True Name: markers, dtype: bool We can shift and take the cumulative sum to get: >>>...
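The answer above is cut off, so what follows is an assumption about where it is heading: each group ends at a row whose marker is "x" or "y". Under that assumption, here is a pure-Python sketch of the same compare, shift, cumsum and group steps, without pandas. The non-marker values are made up; only the marker positions (2, 6, 9) mirror the output shown.

```python
from itertools import accumulate

# Hypothetical marker column; only the positions of "x"/"y" mirror the output above.
markers = ["a", "b", "x", "c", "d", "e", "y", "f", "g", "x"]

is_marker = [m in ("x", "y") for m in markers]          # compare step (like isin)
shifted = [False] + is_marker[:-1]                      # shift down by one, first slot False
group_ids = list(accumulate(int(s) for s in shifted))   # cumulative-sum step

groups = {}
for gid, value in zip(group_ids, markers):              # group-by step
    groups.setdefault(gid, []).append(value)

print(group_ids)  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
print(groups)     # {0: ['a', 'b', 'x'], 1: ['c', 'd', 'e', 'y'], 2: ['f', 'g', 'x']}
```

Shifting before the cumulative sum makes each marker close its own group instead of opening the next one.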

The question is not very clearly formulated, but if I understand it correctly, you want something like this: col=3 #col is the index of the column whose numbers should be odd dat[dat[,col]%%2==1,] #dat is the data frame or matrix containing your data Or, in the example you give...

Use a MongoDB aggregation like the following. First use $unwind (on things and then on things.stuff), then use $match to find elements greater than 4. After that, $group the data by things.name and add the required fields in $project. The query is as follows: db.collection.aggregate([ { $unwind: "$things" }, { $unwind: "$things.stuff"...

With your example, it is now clearer what you want, so I'll give it a second try. I use dataGL_all as defined in your question and then define stations <- rep(c("FLI","FBE"),each=2) directions <- rep(c("in","out"),times=length(stations)/2) You could also extract the stations and directions from your data frame. Using your example, the...

library(ggplot2) library(dplyr) diamonds %>% group_by(clarity) %>% summarise(mean_price = mean(price) , min_price =min(price) ,max_price = max(price) , median_price = median(as.numeric(price)), count = n()) %>% arrange(clarity) For descending order, use arrange(desc(clarity)) instead of arrange(clarity)...

Based on your stated use of pandas: import numpy as np colsToUse = ['col1', 'col2', 'col3'] rowsToUse = np.random.choice(range(len(df1)), 500) df2 = df1.loc[:, colsToUse] df3 = df1.iloc[rowsToUse, :] The older .ix indexer has been removed from pandas (as of version 1.0); use df1.loc (label-based), df1.iloc (position-based) or df1.xs instead. It's also helpful to look at the guide NumPy for MATLAB Users...

You were close, try this mean(iris$Sepal.Length[which(iris$Species != 'setosa')]) or mean(iris$Sepal.Length[iris$Species != 'setosa']) or mean(iris[iris$Species!= "setosa", "Sepal.Length"]) ...

There is no problem. Look at nrow(Sin). You should see that it has fewer rows after subsetting. The first column in the output is the "row name". It is not a cumulative index that tells you how many rows there are. Row names are preserved after subsetting (i.e., they will...

data = sub240 is an assignment statement. You can use = for assignment on its own line or for arguments in function definitions and calls, but the condition of a while loop must be a logical expression. If you want logical equality, you need ==. But unless data changes in the loop AND you...

I created some sample data. When you use subset(), you need a data frame and a condition. When you use lapply(), you can pass it an anonymous function: you write function(x) followed by the code you want R to run for each element. In your case, you want to loop through...

You can try a[,setdiff(colnames(a), S)] Or a[,!colnames(a) %in% S] ...

If you just want to keep, say, the first row of treat for each value of ID, then you can use slice: DATA_clean <- treat %>% group_by(ID) %>% slice(1) Your original code didn't work because n() returns the total number of rows for each value of ID. If every...

You could do: library(dplyr) df %>% # create a hypothetical "customer.name" column mutate(customer.name = sample(LETTERS[1:10], size = n(), replace = TRUE)) %>% # group data by "Parcel.." group_by(Parcel..) %>% # apply sum() to the selected columns mutate_each(funs(sum(.)), one_of("X.11", "X.13", "X.15", "num_units")) %>% # likewise for mean() mutate_each(funs(mean(.)), one_of("Acres", "Ttl_sq_ft", "Mtr.Size"))...

If we have already created the 3 datasets and want to subset the first one, "a", based on the elements of "c/c1", one option is anti_join from dplyr: library(dplyr) anti_join(a, c1, by=c('A', 'B', 'C')) Update: we could also use a base R option with interaction to paste the columns of interest together in...

This will give you the desired output df. givenStr <- "that" row <- df[df$strs==givenStr,] df[,c(1,1+which(row[,-1]==1))] ...

You may try with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(a)), grouped by 'var1', we get a logical index for 'var2' elements that are greater than or equal to corresponding 'var2' elements for which 'var3' is TRUE and subset the dataset .SD. library(data.table) setDT(a)[,.SD[var2 >= var2[var3]], var1] #...

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for: http://stackoverflow.com/a/123481/2966951 http://stackoverflow.com/a/121435/2966951 Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value....

This should work: dfOutput <- dfInput[apply(dfInput[,3:19]>0.00000001 & dfInput[,3:19]<0.300, 1, all, na.rm=TRUE), ] And now for a reproducible example, I'll explain what is going on: # data df <- data.frame(x = c(1:3, NA, 3:1), y=c(NA, NA, NA, 3, 3, 2, 3)) # this returns a matrix! df[, 1:2] > 2 #...

# One way is to use `filter` from the `dplyr` package (assuming Date is already in Date format) library(dplyr) wg %>% filter(year %in% c(2008,2009) & months(Date) %in% c("July","August","September")) #If you want to stick to subset, replace the second & with |: subset(wg, date >= "2008-07-01" & date <= "2008-09-30" | date >=...

The comment by @false is correct, as far as I can see: given two sets S and S': if S is a subset of S', then the intersection of the complement of S' and S should be the empty set (there are no elements outside of S' that are elements...
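Here is a small Python check of that reasoning. It assumes finite sets inside an explicit universe `U`, and the helper name `is_subset_via_complement` is hypothetical. S is a subset of S' exactly when S has no elements in the complement of S'. For S drawn from U, this agrees with Python's built-in `s <= s_prime`.

```python
def is_subset_via_complement(s, s_prime, universe):
    """S is a subset of S' iff S intersected with (U minus S') is empty."""
    complement = universe - s_prime
    return not (s & complement)

U = set(range(10))
print(is_subset_via_complement({1, 2}, {1, 2, 3}, U))  # True
print(is_subset_via_complement({1, 4}, {1, 2, 3}, U))  # False
```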

If there are leading/trailing spaces, this can occur. Remove them and it should work: library(stringr) data[,5] <- str_trim(data[,5]) Or data[,5] <- gsub('^\\s+|\\s+$', '', data[,5]) data[data[,5]=='Y',] Another option, without removing the spaces, would be grep: data[grep('\\bY\\b', data[,5]),] ...

We can use Reduce with intersect in base R lapply(my.list, function(x) x[with(x, Letters %in% Reduce(intersect, split(Letters, Numbers))),]) Or using dplyr library(dplyr) lapply(my.list, function(x) x %>% group_by(Letters) %>% filter(n_distinct(Numbers)==2)) Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the...

Your command relies on equality, which expects a single (atomic) value, but you gave it a vector instead. You can use the operators @davidArenburg mentioned instead. You can do this by first forming a logical (TRUE/FALSE) vector and then retrieving the values according to the corresponding value of the...


The problem is that == is alternating between the values of "honda" and "harley" and comparing with the value in the relevant position of your "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | are checking across the entire "manufacturer" variable before deciding which values to...


You can easily solve this problem with a simple recursive function:

    def F(arr):
        if len(arr) == 1:
            return (arr[0], 1)
        else:
            r = F(arr[:-1])
            return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1)

So, how does it work? It is simple. Let's say we want...
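The explanation is cut off, so the problem statement is not shown. Reading the recurrence, F appears to return two things: the sum of the numbers formed by concatenating the digits of every non-empty subsequence of `arr`, and how many such subsequences there are. That interpretation is an assumption. The sketch below tests it against a brute-force enumeration (the helper `brute_force` is hypothetical). The recurrence needs only n recursive calls, whereas the brute force enumerates 2^n - 1 subsequences.

```python
from itertools import combinations

def F(arr):
    # Recurrence from the answer: (sum of subsequence numbers, subsequence count).
    if len(arr) == 1:
        return (arr[0], 1)
    r = F(arr[:-1])
    return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1)

def brute_force(arr):
    """Sum the numbers formed by every non-empty subsequence of the digits."""
    total, count = 0, 0
    for k in range(1, len(arr) + 1):
        for combo in combinations(arr, k):  # keeps the original digit order
            total += int("".join(map(str, combo)))
            count += 1
    return (total, count)

for digits in ([1, 2], [1, 2, 3], [4, 0, 7, 9]):
    assert F(digits) == brute_force(digits), digits
print(F([1, 2, 3]))  # (177, 7): 1+2+3+12+13+23+123 over 7 subsequences
```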


Your question boils down to searching for sequences of "ABC" within the sequences of the IDs: (matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]]) # [1] 8 # ... This indicates that the only match begins at row 8. You now know that the information for Sensor1 is at rows numbered matches, the...


It is not clear what your problem is, but I am guessing your problem is to do with modelling things like products(j) in constraint 2. Try using sets for these - so create an array of sets of products in each product family. There are examples of this in the...

I think this should work: library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...


You need to implement __hash__ and __eq__ on class A:

    class A:
        def __init__(self, a):
            self.attr1 = a

        def __hash__(self):
            return hash(self.attr1)

        def __eq__(self, other):
            if not isinstance(other, A):
                return NotImplemented
            return self.attr1 == other.attr1

        def __repr__(self):
            return str(self.attr1)

Example:

    l = [A(5), A(4), A(4)]
    print(list(set(l)))
    print(list(set(l))[0].__class__)  # <class '__main__.A'>

It's an object of class...

You can use %in% instead of ==: subset(data, x %in% 1:3) In general, when comparing two vectors of unequal length, use %in%. In some cases we can take advantage of recycling (though it can also fail) if the length of one of the vectors is...


Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

I hope this is what you want (updated): function subset() { var arr1 = [1, 9, 3, 5, 4, 8, 2, 6, 3, 4] var arr2 = [5, 2, 4, 8, 2, 6, 4] var arr3 = []; var minSize=2; // at least 2 elements must be in the intersection var findedMax = "";...


Use the MongoDB set operator $setIsSubset in an aggregation to get your result; see the following query: db.collectionName.aggregate({ "$project": { "Name": 1, "Tags": 1, "match": { "$setIsSubset": ["$Tags", ["A", "B", "C"]] //check whether Tags is a subset of the given array, in your case ["A","B","C"] } } }, { "$match": {...


Here's the super naive solution that simply generates the power set of your input array and then iterates over each subset to see whether its sum matches the given total. I hacked it together with code already available on StackOverflow. O(2^n) in time and space. Gross. You can use the...
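The answer's own code is not shown here. Below is the same naive power-set idea as a short Python sketch; the function name `subsets_with_sum` is hypothetical. It enumerates all 2^n subsets with `itertools.combinations` and keeps those whose sum equals the target. Note that the empty subset is included, so a target of 0 returns `[[]]`.

```python
from itertools import chain, combinations

def subsets_with_sum(nums, total):
    """Naive power-set search: 2**n subsets, each summed in O(n)."""
    all_subsets = chain.from_iterable(
        combinations(nums, k) for k in range(len(nums) + 1)
    )
    return [list(s) for s in all_subsets if sum(s) == total]

print(subsets_with_sum([3, 34, 4, 12, 5, 2], 9))  # [[4, 5], [3, 4, 2]]
```

If you only need to know whether some subset reaches the total, a dynamic-programming table over the achievable sums avoids the exponential enumeration.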

One solution in base R: #using as.character since one$x and two$x are factors in this case > two[ as.character(one$x) != as.character(two$x), ] x y z 11 k 11 -0.6680130 12 l 12 -1.0501888 13 m 13 -1.0987269 14 n 14 1.0045557 15 o 15 -0.6002310 16 p 16 1.3162201 17...

Using data.table, we'd do: setDT(data)[colA == "ABC", ColB := "XXXX"] and the values are modified in place, unlike ifelse, which would copy the entire column just to replace the rows where the condition is satisfied. We call this sub-assign by reference. You can read more about it in the new HTML vignettes....


Solution:

    def partitions(A):
        if not A:
            yield []
        else:
            a, *R = A
            for partition in partitions(R):
                yield partition + [[a]]
                for i, subset in enumerate(partition):
                    yield partition[:i] + [subset + [a]] + partition[i+1:]

Explanation: The empty set only has the empty partition. For a non-empty set, take out one...
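One way to sanity-check a set-partition generator is to compare how many partitions it yields against the Bell numbers, which count the partitions of an n-element set. The sketch below computes them with the Bell triangle (the name `bell` is hypothetical). `len(list(partitions(range(n))))` should match `bell(n)` for small n.

```python
def bell(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]          # each row starts with the previous row's last entry
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]

print([bell(n) for n in range(6)])  # [1, 1, 2, 5, 15, 52]
```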

We could create a column 'MonthYr' from the 'date' column after converting it to 'Date' class. Get the number of observations ('n') per group ('permno', 'MonthYr') and use that to remove the IDs ('permno') that have at least one 'n' less than 10. library(dplyr) res <- df1 %>% mutate(MonthYr=format(as.Date(date, format='%m/%d/%Y'),...

I think this is what you want. I've done it using dplyr's group_by and summarize here. For each Batch/ID it calculates the number of observations, the number of observations where measurement is between 6 and 7 and the ratio of those two. library(dplyr) # example data set set.seed(10) Measurement <-...

Try df[grep('1{3,}', df$history),] ...

Creating the test/sample dataset:

    data test;
        infile datalines dlm=',';
        input Over : 8. Ball : 8. Bowling : $15. Runs_scored : 8. Count : 8. ;
        datalines;
    39,1,Ali,1,1
    39,2,Ali,1,2
    39,3,Ali,2,3
    39,4,Ali,1,4
    39,5,Ali,1,5
    39,6,Ali,1,6
    36,1,Anderson,1,1
    36,2,Anderson,1,2
    36,3,Anderson,1,3
    36,4,Anderson,0,4
    36,6,Anderson,0,6
    ;
    run;

Selecting the distinct overs (as I understand cricket, each over would...

Try this: subset(raw_data,eval(parse(text=keep_rows))) Test: keep_rows <- "Blok>1" raw_data<- data.frame(Blok=c(1,2,3,0)) subset(raw_data,eval(parse(text=keep_rows))) Blok 2 2 3 3 ...


I think this can help you: void subset(vector<int> &input, vector<int> output, int current) { static int n=0; if (current == input.size()) { cout<<n++<<":\t {"; for(int i=0;i<output.size();i++) cout<<output[i]<<", "; cout<<"\b\b}\n"; } else { subset(input, output, current+1); //exclude current'th item output.push_back(input[current]); subset(input, output, current+1); //include current'th item } } and first time...
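Here is the same include/exclude recursion sketched in Python; the function name `subsets` is hypothetical. At each index it first recurses without the current item, then with it. A complete subset is recorded once every item has been decided. Unlike the C++ version, which copies `output` on every call, this one shares a single list and undoes each choice with `pop()` (backtracking).

```python
def subsets(items, current=0, chosen=None, out=None):
    """Collect every subset by deciding, per item, to exclude or include it."""
    if chosen is None:
        chosen, out = [], []
    if current == len(items):
        out.append(list(chosen))              # copy, since `chosen` keeps changing
        return out
    subsets(items, current + 1, chosen, out)  # exclude items[current]
    chosen.append(items[current])
    subsets(items, current + 1, chosen, out)  # include items[current]
    chosen.pop()                              # undo the choice before returning
    return out

print(subsets([1, 2, 3]))
# [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
```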

First, you are trying to modify a reactive object outside the reactive expression. I would suggest defining the column names inside the expression. Second, I don't think that modifying bc()$Yield is an authorized operation, so I would try to generate Yield inside a reactive expression as well. Below is an edited...

You can use := to create a new column ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, V1:=100*(zgrp/sum(zgrp)), by=zip][, zgrp:=NULL] # cgrp zip V1 #1: 3 12007 19.35484 #2: 4 12007 48.38710 #3: 1 12007 32.25806 #4: 1 12008 57.89474 #5: 4 12008 31.57895 #6: 3 12008 10.52632 Or as @Frank commented, you...

You can try subset(df, V1 %in% l) # V1 V2 #1 a54 hi #3 sdx637 hi intersect can be used to get the common elements intersect(df$V1, l) #[1] "a54" "sdx637" but this will not give a logical index to subset the data, df[intersect(df$V1, l),] # V1 V2 #NA <NA> <NA>...

Using lapply() student.count = 2 # depends on your choice out = do.call(rbind, lapply(split(df, f = df$Schools), function(x){ x$no.of.students = length(x$Students); x = subset(x, no.of.students > student.count) })) #> out # Schools Students no.of.students #SchA.1 SchA st1 5 #SchA.2 SchA st2 5 #SchA.3 SchA st3 5 #SchA.4 SchA st4 5...

You'll want to use something like the following: new_data <- Data[sample(nrow(Data), N, prob = (1 - Data$Prob), replace = F),] ...

This returns only the rows whose City begins with a capital "S", using substr(): dat[ substr( dat$City, 1 ,1) == "S" , ] You could also use: dat[ grepl("^S", dat$City) , ] The second option uses a very simple regular expression. Look at ?regex and ?grep....


Try this (I suspect it will be faster than any apply approach): mat[ rowSums(mat == mat[,1])!=ncol(mat) , ] # ---with your object--- [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 ...

You can solve this by sorting first: import operator ranges=[[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]] sorted_ranges = sorted(ranges,key=operator.itemgetter(-1),reverse=True) sorted_ranges = sorted(sorted_ranges,key=operator.itemgetter(0)) filtered = [] i,j = 0,0 while i < len(sorted_ranges): filtered.append(sorted_ranges[i]) j = i+1 while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]: print "Remove " ,...
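Here is a compact variant of the same sort-then-sweep idea, as a sketch. It assumes each range is a non-empty list of consecutive integers, so only its first and last elements matter; the name `drop_contained` is hypothetical. Sorting by start ascending and end descending puts every container before the ranges it contains. A single pass then keeps a range only if it extends past the furthest end seen so far.

```python
def drop_contained(ranges):
    """Remove every range whose [first, last] span lies inside another range."""
    # Start ascending, end descending: a container always precedes what it contains.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))
    kept, max_end = [], None
    for r in ordered:
        if max_end is None or r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return kept

ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
print(drop_contained(ranges))  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [6, 7]]
```

Tracking only the maximum end avoids the inner while loop, so the whole pass is dominated by the O(n log n) sort.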