This is not a subsetting issue, it's a formatting/presentation issue. You're in the first circle of Burns's R Inferno ("[i]f you are using R and you think you’re in hell, this is a map for you"): another aspect of virtuous pagan beliefs—what is printed is all that there is If...

Use split and cumsum: ccc <- data.frame(ccc) split(ccc[ccc$aaa==1,], cumsum(ccc$aaa!=1)[ccc$aaa==1]) #$`0` # aaa bbb #1 1 4 #2 1 4 #3 1 4 # #$`2` # aaa bbb #6 1 3 # #$`3` # aaa bbb #8 1 3 #9 1 2 # #$`4` # aaa bbb #11 1 2 #12...

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for: http://stackoverflow.com/a/123481/2966951 http://stackoverflow.com/a/121435/2966951 Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value....

It seems to me that almost all of your questions concern a list of data frames with the same columns, which forces you to use lapply loops for every single operation (which seems highly inefficient). Alternatively, you could vectorize most of your operations by simply binding all the lists into...

You can do this using a variant of the compare-cumsum-groupby pattern. Starting from >>> df["markers"].isin(["x","y"]) 0 False 1 False 2 True 3 False 4 False 5 False 6 True 7 False 8 False 9 True Name: markers, dtype: bool We can shift and take the cumulative sum to get: >>>...
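The compare, shift, cumulative-sum and group-by steps can also be sketched without pandas. This is only an illustration: the `markers` values are hypothetical (chosen so the flags match the boolean series shown above), and it assumes the shift moves each flag down by one row so that a marker row closes its own group.

```python
from itertools import accumulate, groupby

# Hypothetical marker column; "x" and "y" are the values that close a group.
markers = ["a", "b", "x", "c", "d", "e", "y", "f", "g", "x"]

# Compare: flag the rows that hold a marker.
is_marker = [m in ("x", "y") for m in markers]

# Shift the flags down by one row (assumption), so a marker row stays in the
# group it closes, then take the cumulative sum to get a group id per row.
shifted = [False] + is_marker[:-1]
group_ids = list(accumulate(int(flag) for flag in shifted))

# Group by: collect consecutive rows that share a group id.
groups = [
    [m for _, m in rows]
    for _, rows in groupby(zip(group_ids, markers), key=lambda pair: pair[0])
]
```

In pandas the same three steps would typically be `isin`, `shift` plus `cumsum`, and `groupby`.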

You can try df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),] # A B #2 1A 2 #3 1A 2 #5 2 4 #6 2 4 #7 3A 0 #8 3A 0 #9 4A 1 #10 4A 1 data df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"), B =...

geom_boxplot calls boxplot.stats to calculate the positions of the upper and lower whiskers. You can do it too: > boxplot.stats(v) $stats [1] 93.340 96.069 97.876 99.087 100.359 $n [1] 24 $conf [1] 96.90265 98.84935 $out [1] -234.347 75.764 (v is assumed to be your input data vector): From the boxplot.stats...

r,if-statement,matrix,subset,covariance

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple solutions, pick your poison. The first relies on a nested for loop which could be slow and further optimized if you knew for sure your matrix was symmetric. m <- read.table(header=T, stringsAsFactors=F,...

I might do as following. Only end dates seem to be necessary as start dates are just 1 year before. Loop is made using lapply() which iterates over all end dates. Subsetting is done mainly with difftime() by filtering any non-zero time difference between the two dates. set.seed(24) df1 <-...

You can change the 'Time' column to 'POSIXct' class and then subset datasubcolrow$Time <- as.POSIXct(datasubcolrow$Time, format='%d/%m/%Y %H:%M:%OS') subset(datasubcolrow, Time < as.POSIXct('15/05/2015 13:30:15.417', format='%d/%m/%Y %H:%M:%OS')) data datasubcolrow <- structure(list(Time = c("15/05/2015 13:30:07.291", "15/05/2015 13:30:08.307", "15/05/2015 13:30:09.323", "15/05/2015 13:30:10.338", "15/05/2015 13:30:11.354", "15/05/2015 13:30:12.370", "15/05/2015 13:30:13.386", "15/05/2015 13:30:14.402", "15/05/2015 13:30:15.417",...

python,list,filtering,subset,subsetting

You need to implement the __hash__ and __eq__ methods on the class: class A: def __init__(self, a): self.attr1 = a def __hash__(self): return hash(self.attr1) def __eq__(self, other): return self.attr1 == other.attr1 def __repr__(self): return str(self.attr1) Example: l = [A(5), A(4), A(4)] print list(set(l)) print list(set(l))[0].__class__ # ==> __main__.A. It's an object of class...
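For reference, here is a Python 3 sketch of the same idea, laid out over several lines. The `isinstance` guard in `__eq__` and the names `items` and `unique_items` are additions for illustration, not part of the original answer.

```python
class A:
    def __init__(self, a):
        self.attr1 = a

    def __hash__(self):
        # Objects that compare equal must also hash equal.
        return hash(self.attr1)

    def __eq__(self, other):
        # Added guard (assumption): only compare against other A instances.
        if not isinstance(other, A):
            return NotImplemented
        return self.attr1 == other.attr1

    def __repr__(self):
        return str(self.attr1)


items = [A(5), A(4), A(4)]

# set() uses __hash__ and __eq__ to drop the duplicate A(4).
unique_items = list(set(items))
```

The elements of `unique_items` are still instances of `A`, so their other attributes remain available after deduplication.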

You mentioned data.table, so here's two possible approaches for both requests library(data.table) For 1. setDT(df)[, .SD[all(V2 != "X")], by = V1] # V1 V2 V3 # 1: HIJ P 40 # 2: HIJ Y 41 For 2. df[, .SD[.N == 1L], by = V1] # V1 V2 V3 # 1:...

one way: iris$Sepal.Width[ iris$Species %in% "virginica"] Probably best to google subsetting though as this is all easily available in tutorials everywhere, and this will have been asked elsewhere on SO....

r,loops,double,conditional,subset

Or simply with base R aggregate(Julian_Day ~., df, min) # Year Id Julian_Day # 1 1901 1 40 # 2 1968 1 200 # 3 1901 5 56 Or library(dplyr) df %>% group_by(Id, Year) %>% summarise(Julian_Day = min(Julian_Day)) # Source: local data frame [3 x 3] # Groups: Id #...

You can solve this by sorting first: import operator ranges=[[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]] sorted_ranges = sorted(ranges,key=operator.itemgetter(-1),reverse=True) sorted_ranges = sorted(sorted_ranges,key=operator.itemgetter(0)) filtered = [] i,j = 0,0 while i < len(sorted_ranges): filtered.append(sorted_ranges[i]) j = i+1 while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]: print "Remove " ,...
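A compact Python 3 variant of the sort-then-filter idea is sketched below. It is not the loop from the answer: it assumes every range is a run of consecutive integers, so containment depends only on the first and last element, and it tracks the largest end seen so far instead of using a nested loop. The function name `drop_contained` is hypothetical.

```python
def drop_contained(ranges):
    """Keep only the ranges that are not contained in another range.

    Assumes each range is a non-empty list of consecutive integers.
    """
    # Sort by start ascending; for equal starts, put the longest range first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))

    kept = []
    max_end = float("-inf")
    for r in ordered:
        # Every earlier range starts at or before r[0], so r is contained in
        # one of them exactly when its end does not exceed max_end.
        if r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return kept


ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
filtered = drop_contained(ranges)
```

With this rule an exact duplicate of a kept range is also dropped, which may or may not be what you want.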

The question is quite badly formulated but if I understand it correctly it would mean doing something like that: col=3 #col is the # of the column where the numbers should be odd dat[dat[,col]%%2==1,] #dat is the data frame or matrix containing your data Or in the example you give...

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

You can use the function pmatch: x[-pmatch(y,x)] #[1] "A" "C" "A" "B" "D" Edit If your data can be strings of more than 1 character, here is an option to get what you want: xNew <- unlist(sapply(x[!duplicated(x)], function(item, tab1, tab2) { rep(item, tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0)) },...

You could do: library(dplyr) df %>% # create an hypothetical "customer.name" column mutate(customer.name = sample(LETTERS[1:10], size = n(), replace = TRUE)) %>% # group data by "Parcel.." group_by(Parcel..) %>% # apply sum() to the selected columns mutate_each(funs(sum(.)), one_of("X.11", "X.13", "X.15", "num_units")) %>% # likewise for mean() mutate_each(funs(mean(.)), one_of("Acres", "Ttl_sq_ft", "Mtr.Size"))...

You want the ifelse function, which is a vectorized conditional: > x <- c(1, 1, 0, 0, 1) > y <- c(1, 2, 3, 4, 5) > z <- c(6, 7, 8, 9, 10) > ifelse(x == 1, y, z) [1] 1 2 8 9 5 You will have to...

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'): Methods [...]: [ : select subset of (sets of) polygons; NAs are not permitted in the row index" So using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

Based on your stated use of pandas colsToUse = ['col1', 'col2', 'col3'] rowsToUse = np.random.choice(range(len(df1)), 500) df2 = df1.ix[:, colsToUse] df3 = df1.ix[rowsToUse, :] There are also some other DataFrame helper functions for indexing: df1.loc, df1.iloc, and df1.xs. It's also helpful to look at the guide NumPy for MATLAB Users...

r,matrix,filter,data.frame,subset

Try this: (I suspect will be faster than any apply approach) mat[ rowSums(mat == mat[,1])!=ncol(mat) , ] # ---with your object--- [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 ...

You can use top_n from dplyr to select the top 'n' rows library(dplyr) top_n(AB, 4, BETAdn) Or use order from base R and then subset the top 'n' rows AB[order(-AB$BETAdn),][1:4,] ...

We could create a column 'MonthYr' from the 'date' column after converting it to 'Date' class. Get the number of observations ('n') per group ('permno', 'MonthYr') and use that to remove the IDs ('permno') that have at least one 'n' less than 10. library(dplyr) res <- df1 %>% mutate(MonthYr=format(as.Date(date, format='%m/%d/%Y'),...

I created some sample data. When you use subset(), you need a data frame and a condition. When you use lapply(), you can pass an anonymous function. That is, you write function(x) followed by the code you want R to run for each element. In your case, you want to loop through...

You can use list slicing to get top 10 elements from the ids list. Students.objects.filter(studentid__in=[p[0] for p in ids[:10]]) ...

You can use := to create a new column ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, V1:=100*(zgrp/sum(zgrp)), by=zip][, zgrp:=NULL] # cgrp zip V1 #1: 3 12007 19.35484 #2: 4 12007 48.38710 #3: 1 12007 32.25806 #4: 1 12008 57.89474 #5: 4 12008 31.57895 #6: 3 12008 10.52632 Or as @Frank commented, you...

You can try a[,setdiff(colnames(a), S)] Or a[,!colnames(a) %in% S] ...

#one way is to use `filter` from `dplyr` package (and assuming Date is already in Date format) library(dplyr) wg %>% filter(year %in% c(2008,2009) & months(Date) %in% c("July","August","September")) #If you want to stick to subset, replace second & with |: subset(wg, date >= "2008-07-01" & date <= "2008-09-30" | date >=...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

I downloaded your data and had a look. If I am not mistaken, all you need is to subset data using Time.h. Here you have a range of time (10-23) you want. I used dplyr and did the following. You are asking R to pick up rows which have values...

With the sample data dd<-read.table(text="Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000", header=T) you can do this with base R subset(dd, Count>.25*ave(Count, Group, FUN=sum)) or the dplyr library library(dplyr) dd %>% group_by(Group) %>% filter(Count > .25 * sum(Count)) perhaps you'll find one more...

r,conditional,condition,subset

I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be: df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a'...

Your command relies on an equality test, which expects a single value, but you supplied a vector. Instead, you can use the operators that @davidArenburg mentioned. You can do this by first forming a list of T/F values and then retrieving the list according to the corresponding value of the...

Using dplyr library(dplyr) df %>% filter(!grepl("13",Date)) ...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

Try foverlaps from data.table library(data.table) setkey(setDT(df1), Chromosome, start, end) setkey(setDT(df2), Chromosome, start, end) setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[] # Chromosome start end #1: 1 1 450 #2: 2 3500 3585 #3: 2 7850 10000 Or as @Arun commented, we can use which=TRUE (to extract the indices) and subset 'df1'...

you can read about argument drop in the help page: ?'[' M[which(rownames(M) != "A"), ,drop=FALSE] ...

Short answer is: do not use subset but something like employ.data[employ.data[salary_string]>23000,] ...

You could try sampling the indices of players to construct the first team instead of sampling the names. idx1 <- sample(1:nrow(players), 5) You can actually use these indices to grab all the information about each team: team1 <- players[idx1,] team2 <- players[-idx1,] The score for each team can be computed...

library(ggplot2) diamonds %>% group_by(clarity) %>% summarise(mean_price = mean(price) , min_price =min(price) ,max_price = max(price) , median_price = median(as.numeric(price)), count = n()) %>% arrange(clarity) for arranging in descending order use arrange(desc(clarity)) instead of arrange(clarity)...

We can use Reduce with intersect in base R lapply(my.list, function(x) x[with(x, Letters %in% Reduce(intersect, split(Letters, Numbers))),]) Or using dplyr library(dplyr) lapply(my.list, function(x) x %>% group_by(Letters) %>% filter(n_distinct(Numbers)==2)) Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the...

Use a regular expression. For example: myd <- subset(df, grepl("-01$", Date_ID)) or myd <- df[grep("-01$", df$Date_ID),] ...

With your example, it is now clearer what you want, so I'll give it a second try. I use dataGL_all as defined in your question and then define stations <- rep(c("FLI","FBE"),each=2) directions <- rep(c("in","out"),times=length(stations)/2) You could also extract the stations and directions from your data frame. Using your example, the...

r,vector,replace,subset,readline

You can build the new set of file lines as follows: new.r.lines <- c(r.lines[1:27],r.lines[28:1027][grepl(var,r.lines[28:1027])],r.lines[1028:length(r.lines)]); This combines lines 1:27 with the subset of the following 28:1027 lines that match your search pattern, then further combines with lines 1028 to the end of the file. Thus, you can pass that to writeLines()...

You were close, try this mean(iris$Sepal.Length[which(iris$Species != 'setosa')]) or mean(iris$Sepal.Length[iris$Species != 'setosa']) or mean(iris[iris$Species!= "setosa", "Sepal.Length"]) ...

java,algorithm,recursion,dynamic-programming,subset

Here's the super naive solution that simply generates a power set on your input array and then iterates over each set to see if the sum satisfies the given total. I hacked it together with code already available on StackOverflow. O(2^n) in time and space. Gross. You can use the...
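The brute-force idea can be sketched in a few lines of Python (the original question is tagged java, so this illustrates the approach and is not the answer's code). The function name `subsets_with_sum` is hypothetical. It enumerates all 2^n subsets, so it is only practical for small inputs.

```python
from itertools import combinations


def subsets_with_sum(values, total):
    """Return every subset of `values` (as a tuple) whose sum equals `total`."""
    matches = []
    # Walk the power set by subset size: 0, 1, ..., len(values).
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            if sum(subset) == total:
                matches.append(subset)
    return matches


result = subsets_with_sum([1, 2, 3, 4], 5)
```

Because the subsets are generated lazily by `combinations`, this version needs exponential time but only stores the matching subsets.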

You may try with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(a)), grouped by 'var1', we get a logical index for 'var2' elements that are greater than or equal to corresponding 'var2' elements for which 'var3' is TRUE and subset the dataset .SD. library(data.table) setDT(a)[,.SD[var2 >= var2[var3]], var1] #...

If there are leading/trailing spaces, this could occur. Remove those and it should work. library(stringr) data[,5] <- str_trim(data[,5]) Or data[,5] <- gsub('^\\s+|\\s+$', '', data[,5]) data[data[,5]=='Y',] Another option without removing the spaces would be grep data[grep('\\bY\\b', data[,5]),] ...

Use mongo aggregation like following : First use $unwind this will unwind stuff and then use $match to find elements greater than 4. After that $group data based on things.name and add required fields in $project. The query will be as following: db.collection.aggregate([ { $unwind: "$things" }, { $unwind: "$things.stuff"...

First, you are trying to modify a reactive object outside the reactive expression. I would suggest defining the column names inside the expression. Second, I don't think that modifying bc()$Yield is an allowed operation, so I would try to generate Yield inside a reactive expression as well. Below is an edited...

You can use %in% instead of == subset(data, x %in% 1:3) In general, if we are comparing two vectors of unequal sizes, %in% would be used. There are cases where we can take advantage of the recycling (it can fail too) if the length of one of the vector is...

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

this should work: library(dplyr) inner_join(dfA, dfB) %>% anti_join(dfC) which gives: Efficiency Value 1 8 7 2 2 4 ...

r,data.frame,subset,linear-regression

First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

data = sub240 is an assignment statement. You can assign things on their own line or in function definitions and calls, but you can only provide logical statements in a while loop definition. If you want logical equality, you need ==. But unless data changes in the loop AND you...

c#,arrays,mongodb,subset,mongodb-csharp

Use mongo Set Operator using $setIsSubset in aggregation you will get your result, check following query : db.collectionName.aggregate({ "$project": { "Name": 1, "Tags": 1, "match": { "$setIsSubset": ["$Tags", ["A", "B", "C"]] //check Tags is subset of given array in your case array is ["A","B","C"] } } }, { "$match": {...

I think I spot a couple of problems. In grep you don't want to set value to TRUE. Setting value to TRUE returns the matched strings instead of the row indices. Also, you are missing a comma (hence the undefined columns error). Try this: LakeK_all[grep("^S", LakeK_all$Lake), ]...

r,loops,data.frame,pattern-matching,subset

Your question boils down to searching for sequences of "ABC" within the sequences of the IDs: (matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]]) # [1] 8 # ... This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered matches, the...

It sounds like you're looking for get: x[get(cname) > cutoff,] # val1 val2 # 1: 2 5 # 2: 3 4 # 3: 4 2 # 4: 5 4 # 5: 3 5 ...

Adapting my comment into an answer, taking into account the presented data and OP's comments. Note, code is not checked as dput of data was not obtained. library("dplyr") data_summarised <- data %>% mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date,...

python,algorithm,python-3.x,set,subset

Solution: def partitions(A): if not A: yield [] else: a, *R = A for partition in partitions(R): yield partition + [[a]] for i, subset in enumerate(partition): yield partition[:i] + [subset + [a]] + partition[i+1:] Explanation: The empty set only has the empty partition. For a non-empty set, take out one...
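Laid out over several lines, the generator above can be run as follows. The code is the same as in the answer; only the usage lines at the bottom, including the name `all_partitions`, are added for illustration.

```python
def partitions(A):
    if not A:
        # The empty set only has the empty partition.
        yield []
    else:
        a, *R = A
        for partition in partitions(R):
            # Either put `a` in a new block of its own ...
            yield partition + [[a]]
            # ... or add `a` to each existing block in turn.
            for i, subset in enumerate(partition):
                yield partition[:i] + [subset + [a]] + partition[i + 1:]


all_partitions = list(partitions([1, 2, 3]))
```

The number of partitions grows as the Bell numbers (1, 1, 2, 5, 15, ...), so a three-element list yields five partitions.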

It isn't the nicest solution, but it does what you want. library(MuMIn) options(na.action = na.fail) fm1 <- lm(y ~ X1 + X2, Cement) m1 <- dredge(fm1) ms1 <- subset(m1, delta < 32) fm2 <- lm(y ~ X3 + X4, Cement) m2 <- dredge(fm2) ms2 <- subset(m2, delta < 20) a1 <-...

Read up on mapcar et al: (defparameter a (list 1 2 3 4)) (mapcon (lambda (tail) (mapcar (lambda (x) (cons (car tail) x)) (cdr tail))) a) ==> ((1 . 2) (1 . 3) (1 . 4) (2 . 3) (2 . 4) (3 . 4)) ...

This will give you the desired output df. givenStr <- "that" row <- df[df$strs==givenStr,] df[,c(1,1+which(row[,-1]==1))] ...

Here's another idea using dplyr: library(dplyr) my_df %>% filter(lead(let == "b", 5) | lag(let == "b", 5)) Or as per @akrun suggestion using the devel version of data.table: setDT(my_df)[shift(let == "b", 5) | shift(let == "b", type = "lead", 5)] Which gives: # num let #1 0.36723709 a #2 0.24743170...

algorithm,recursion,combinations,subset

I think this can help you: void subset(vector<int> &input, vector<int> output, int current) { static int n=0; if (current == input.size()) { cout<<n++<<":\t {"; for(int i=0;i<output.size();i++) cout<<output[i]<<", "; cout<<"\b\b}\n"; } else { subset(input, output, current+1); //exclude current'th item output.push_back(input[current]); subset(input, output, current+1); //include current'th item } } and first time...
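The same exclude-then-include recursion can be sketched in Python. This is a translation for illustration, not the answer's code: it yields the subsets instead of printing them, and the name `subsets` is hypothetical.

```python
def subsets(items, current=0, chosen=None):
    """Yield every subset of `items`, excluding each element before including it."""
    if chosen is None:
        chosen = []
    if current == len(items):
        # All elements have been decided: emit a copy of the current choice.
        yield list(chosen)
        return
    # Branch 1: exclude items[current].
    yield from subsets(items, current + 1, chosen)
    # Branch 2: include items[current], then undo the choice afterwards.
    chosen.append(items[current])
    yield from subsets(items, current + 1, chosen)
    chosen.pop()


all_subsets = list(subsets([1, 2]))
```

The C++ version passes `output` by value, so each call works on its own copy; the sketch above shares one list and undoes the `append` with `pop` instead.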

If we already created 3 datasets and want to subset the first "a" based on the elements of "c/c1", one option is anti_join from dplyr library(dplyr) anti_join(a, c1, by=c('A', 'B', 'C')) Update Or we could use a base R option with interaction to paste the columns of interest together in...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

dat <- read.table(text="Patient ID,Disease Score 101,5 101,2 101,2 105,1 110,5 115,1 115,1", stringsAs=FALSE, header=TRUE, sep=",") # one way in base dat[dat$Patient.ID %in% names(which(table(dat$Patient.ID)>2)),] # one way in dplyr library(dplyr) dat %>% group_by(Patient.ID) %>% mutate(n=n()) %>% ungroup() %>% filter(n>=2) %>% select(Patient.ID, Disease.Score) ...

constraints,scheduling,subset,cplex,opl

It is not clear what your problem is, but I am guessing your problem is to do with modelling things like products(j) in constraint 2. Try using sets for these - so create an array of sets of products in each product family. There are examples of this in the...

Try this : subset(raw_data,eval(parse(text=keep_rows))) Test : keep_rows <- "Blok>1" raw_data<- data.frame(Blok=c(1,2,3,0)) subset(raw_data,eval(parse(text=keep_rows))) Blok 2 2 3 3 ...

Try df[grep('1{3,}', df$history),] ...

You can try subset(df, V1 %in% l) # V1 V2 #1 a54 hi #3 sdx637 hi intersect can be used to get the common elements intersect(df$V1, l) #[1] "a54" "sdx637" but this will not give a logical index to subset the data, df[intersect(df$V1, l),] # V1 V2 #NA <NA> <NA>...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

First, if you want a dataframe, you should use data.frame, not c: df <- data.frame(id, problem, solution1, solution2) Then you can subset like this for instance (no need to use subset per se) df2 <- df[!(grepl("a", df$problem) & (grepl("eat", df$solution1) | grepl("eat", solution2))),] # id problem solution1 solution2 # 2...

If you just want to keep, say, the first row of treat for each value of ID, then you can use slice: DATA_clean <- treat %>% group_by(ID) %>% slice(1) Your original code didn't work because n() returns the total number of rows for each value of ID. If every...

You'll want to use something like the following: new_data <- Data[sample(nrow(Data), N, prob = (1 - Data$Prob), replace = F),] ...

This should work: dfOutput <- dfInput[apply(dfInput[,3:19]>0.00000001 & dfInput[,3:19]<0.300, 1, all, na.rm=TRUE), ] And now for a reproducible example, I'll explain what is going on: # data df <- data.frame(x = c(1:3, NA, 3:1), y=c(NA, NA, NA, 3, 3, 2, 3)) # this returns a matrix! df[, 1:2] > 2 #...

arrays,algorithm,sum,big-o,subset

You can easily solve this problem with a simple recursion. def F(arr): if len(arr) == 1: return (arr[0], 1) else: r = F(arr[:-1]) return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1) So, how does it work? It is simple. Let's say we want...

Another option is to use Filter within lapply lapply(list12, Filter, f = is.numeric) # [[1]] # x1 z1 # 1 1 0 # 2 2 1 # 3 3 0 # 4 4 1 # # [[2]] # x2 y2 # 1 0 0 # 2 1 1 # 3...

One solution in base R: #using as.character since one$x and two$x are factors in this case > two[ as.character(one$x) != as.character(two$x), ] x y z 11 k 11 -0.6680130 12 l 12 -1.0501888 13 m 13 -1.0987269 14 n 14 1.0045557 15 o 15 -0.6002310 16 p 16 1.3162201 17...

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and to dynamically build formulas dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

I hope this is what you want (updated): function subset() { var arr1 = [1, 9, 3, 5, 4, 8, 2, 6, 3, 4] var arr2 = [5, 2, 4, 8, 2, 6, 4] var arr3 = []; var minSize=2; // the intersection must contain at least 2 elements var findedMax = "";...

r,logic,aggregate,subset,subsetting

The problem is that == is alternating between the values of "honda" and "harley" and comparing with the value in the relevant position of your "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | are checking across the entire "manufacturer" variable before deciding which values to...

You could try dat <- subset(wg, Year > 2007 & Year < 2010 & hour == 23 & mint %in% c(30, 32, 39, 40, 41, 49, 31)) ...

You could try something like this: select user_id from user_assets where asset_id = 1 or asset_id = 2 ... group by user_id having count(distinct asset_id) = (number of assets you are looking for) demo here showing your required output. the distinct isn't necessary if (user_id, asset_id) is a unique key...
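To try the GROUP BY / HAVING pattern end to end, here is a small sketch using Python's built-in sqlite3 module. The table and column names come from the query above; the sample rows and the `wanted` list are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (user_id, asset_id) pairs.
rows = [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_assets (user_id INTEGER, asset_id INTEGER)")
conn.executemany("INSERT INTO user_assets VALUES (?, ?)", rows)

# Assets every returned user must own.
wanted = [1, 2]
placeholders = ", ".join("?" for _ in wanted)

query = f"""
    SELECT user_id
    FROM user_assets
    WHERE asset_id IN ({placeholders})
    GROUP BY user_id
    HAVING COUNT(DISTINCT asset_id) = ?
    ORDER BY user_id
"""

# The last parameter is the number of assets being looked for.
matching_users = [r[0] for r in conn.execute(query, (*wanted, len(wanted)))]
conn.close()
```

Users 1 and 4 own both assets; users 2 and 3 each own only one of them and are filtered out by the HAVING clause.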

This will return only those rows beginning with a capital "S" using the substr()-ing function: dat[ substr( dat$City, 1 ,1) == "S" , ] Could also have used: dat[ grepl("^S", dat$City) , ] The second option is a very simple regular expression. Look at ?regex and ?grep....

I think this is what you want. I've done it using dplyr's group_by and summarize here. For each Batch/ID it calculates the number of observations, the number of observations where measurement is between 6 and 7 and the ratio of those two. library(dplyr) # example data set set.seed(10) Measurement <-...

Using data.table, we'd do: setDT(data)[colA == "ABC", ColB := "XXXX"] and the values are modified in-place, unlike if-else, which'd copy the entire column to replace just those rows where the condition holds. We call this sub-assign by reference. You can read more about it in the new HTML vignettes....

Using lapply() student.count = 2 # depends on your choice out = do.call(rbind, lapply(split(df, f = df$Schools), function(x){ x$no.of.students = length(x$Students); x = subset(x, no.of.students > student.count) })) #> out # Schools Students no.of.students #SchA.1 SchA st1 5 #SchA.2 SchA st2 5 #SchA.3 SchA st3 5 #SchA.4 SchA st4 5...

A little googling would have solved your problem; for example, read this about logical operators. Like this? ITEproduction_2014.2015<-subset(ITEproduction_2014.2015,Date.Difference>3 & Date.Difference<40) ...

The comment by @false is correct, as far as I can see: given two sets S and S': if S is a subset of S', then the intersection of the complement of S' and S should be the empty set (there are no elements outside of S' that are elements...
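The set identity can be checked with a tiny Python sketch. This only illustrates the logic, not the language used in the question; the function name `is_subset` is hypothetical. The set difference `s - s_prime` is exactly the intersection of S with the complement of S'.

```python
def is_subset(s, s_prime):
    """True when s is a subset of s_prime."""
    # s - s_prime holds the elements of s that lie outside s_prime,
    # i.e. the intersection of s with the complement of s_prime.
    outside = s - s_prime
    return len(outside) == 0


check_true = is_subset({1, 2}, {1, 2, 3})
check_false = is_subset({1, 4}, {1, 2, 3})
```

Python's built-in `s <= s_prime` performs the same test; the explicit version is only there to mirror the argument.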

You can use package dplyr: library(dplyr) intersect(df1,df2) # a b #1 1 m #2 3 f Edit for the new data.frames with c column: you can use function semi_join (also from dplyr): semi_join(df1,df2,by=c("a","b")) # a b c #1 1 m df1 #2 3 f df1 Other option, in base R:...