The question is quite badly formulated but if I understand it correctly it would mean doing something like that: col=3 #col is the # of the column where the numbers should be odd dat[dat[,col]%%2==1,] #dat is the data frame or matrix containing your data Or in the example you give...

You can change the 'Time' column to 'POSIXct' class and then subset datasubcolrow$Time <- as.POSIXct(datasubcolrow$Time, format='%d/%m/%Y %H:%M:%OS') subset(datasubcolrow, Time < as.POSIXct('15/05/2015 13:30:15.417', format='%d/%m/%Y %H:%M:%OS')) data datasubcolrow <- structure(list(Time = c("15/05/2015 13:30:07.291", "15/05/2015 13:30:08.307", "15/05/2015 13:30:09.323", "15/05/2015 13:30:10.338", "15/05/2015 13:30:11.354", "15/05/2015 13:30:12.370", "15/05/2015 13:30:13.386", "15/05/2015 13:30:14.402", "15/05/2015 13:30:15.417",...

If you just want to keep, say, the first row of treat for each value of ID, then you can use slice: DATA_clean <- treat %>% group_by(ID) %>% slice(1) Your original code didn't work because n() returns to the total number of rows for each value of ID. If every...

r,logic,aggregate,subset,subsetting

The problem is that == is alternating between the values of "honda" and "harley" and comparing with the value in the relevant position of your "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | are checking across the entire "manufacturer" variable before deciding which values to...

This should work: dfOutput <- dfInput[apply(dfInput[,3:19]>0.00000001 & dfInput[,3:19]<0.300, 1, all, na.rm=TRUE), ] And now for a reproducible example, I'll explain what is going on: # data df <- data.frame(x = c(1:3, NA, 3:1), y=c(NA, NA, NA, 3, 3, 2, 3)) # this returns a matrix! df[, 1:2] > 2 #...

You could try dat <- subset(wg, Year > 2007 & Year < 2010 & hour == 23 & mint %in% c(30, 32, 39, 40, 41, 49, 31)) ...

python,list,filtering,subset,subsetting

You need realize methods hash and eq on object class A: def __init__(self, a): self.attr1 = a def __hash__(self): return hash(self.attr1) def __eq__(self, other): return self.attr1 == other.attr1 def __repr__(self): return str(self.attr1) Example: l = [A(5), A(4), A(4)] print list(set(l)) print list(set(l))[0].__class__ # ==> __main__.A. It's a object of class...

Try df[grep('1{3,}', df$history),] ...

#one way is to use `filter` from `dplyr` package (and assuming Date is already in Date format) library(dplyr) wg %>% filter(year %in% c(2008,2009) & months(Date) %in% c("July","August","September") #If you want to stick to subset, replace second & with |: subset(wg, date >= "2008-07-01" & date <= "2008-09-30" | date >=...

r,data.frame,subset,linear-regression

First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

You may try with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(a)), grouped by 'var1', we get a logical index for 'var2' elements that are greater than or equal to corresponding 'var2' elements for which 'var3' is TRUE and subset the dataset .SD. library(data.table) setDT(a)[,.SD[var2 >= var2[var3]], var1] #...

this should work: library(dplyr) inner_join(dfA, dfB) %>% anti_join(dfC) which gives: Efficiency Value 1 8 7 2 2 4 ...

Try this : subset(raw_data,eval(parse(text=keep_rows))) Test : keep_rows <- "Blok>1" raw_data<- data.frame(Blok=c(1,2,3,0)) subset(raw_data,eval(parse(text=keep_rows))) Blok 2 2 3 3 ...

one way: iris$Sepal.Width[ iris$Species %in% "virginica"] Probably best to google subsetting though as this is all easily available in tutorials everywhere, and this will have been asked elsewhere on SO....

If we already created 3 datasets and want to subset the first "a" based on the elements of "c/c1", one option is anti_join from dplyr library(dplyr) anti_join(a, c1, by=c('A', 'B', 'C')) Update Or we could use a base R option with interaction to paste the columns of interest together in...

Try this WantedData=Data2[Data2$ccession_number %in% SubsetData1$accession_number, ] ...

You'll want to use something like the following: new_data <- Data[sample(nrow(Data), N, prob = (1 - Data$Prob), replace = F),] ...

First, you are trying to modify a reactive object outside the reactive expression. I would suggest to define column names inside the expression. Second, I don't think that modifying bc()$Yield is an authorized operation. So I would try do generate Yield also inside a reactive expression. Below is an edited...

r,if-statement,matrix,subset,covariance

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple solutions, pick your poison. The first relies on a nested for loop which could be slow and further optimized if you knew for sure your matrix was symmetric. m <- read.table(header=T, stringsAsFactors=F,...

constraints,scheduling,subset,cplex,opl

It is not clear what your problem is, but I am guessing your problem is to do with modelling things like products(j) in constraint 2. Try using sets for these - so create an array of sets of products in each product family. There are examples of this in the...

There is no problem. Look at nrow(Sin). You should see that is has fewer rows after subsetting. The first column in the output is the "row name". It is not a cumulative index that tells you how many rows there are. Row names are preserved after subsetting (ie they will...

With your example, it is now clearer what you want and I give it a second try. I use dataGL_all as defined in your question and the define stations <- rep(c("FLI","FBE"),each=2) directions <- rep(c("in","out"),times=length(stations)/2) You could also extract the stations and directions from your data frame. Using your example, the...

Based on your stated use of pandas colsToUse = ['col1', 'col2', 'col3'] rowsToUse = np.random.choice(range(len(df1)), 500) df2 = df1.ix[:, colsToUse] df3 = df1.ix[rowsToUse, :] There are also some other DataFrame helper functions for indexing: df1.loc, df1.iloc, and df1.xs. It's also helpful to look at the guide NumPy for MATLAB Users...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

python,algorithm,python-3.x,set,subset

Solution: def partitions(A): if not A: yield [] else: a, *R = A for partition in partitions(R): yield partition + [[a]] for i, subset in enumerate(partition): yield partition[:i] + [subset + [a]] + partition[i+1:] Explanation: The empty set only has the empty partition. For a non-empty set, take out one...

you can read about argument drop in the help page: ?'[' M[which(rownames(M) != "A"), ,drop=FALSE] ...

just a little googling would have solved your problem, for example read this about logical operators, like this? ITEproduction_2014.2015<-subset(ITEproduction_2014.2015,Date.Difference>3 & Date.Difference<40) ...

You could try something like this: select user_id from user_assets where asset_id = 1 or asset_id = 2 ... group by user_id having count(distinct asset_id) = (number of assets you are looking for) demo here showing your required output. the distinct isn't necessary if (user_id, asset_id) is a unique key...

Using data.table, we'd do: setDT(data)[colA == "ABC", ColB := "XXXX"] and the values are modified in-place, unlike if-else, which'd copy the entire column to replace just those rows where the condition satisfies. We call this sub-assign by reference. You can read more about it in the new HTML vignettes....

You can do this using a variant of the compare-cumsum-groupby pattern. Starting from >>> df["markers"].isin(["x","y"]) 0 False 1 False 2 True 3 False 4 False 5 False 6 True 7 False 8 False 9 True Name: markers, dtype: bool We can shift and take the cumulative sum to get: >>>...

c#,arrays,mongodb,subset,mongodb-csharp

Use mongo Set Operator using $setIsSubset in aggregation you will get your result, check following query : db.collectionName.aggregate({ "$project": { "Name": 1, "Tags": 1, "match": { "$setIsSubset": ["$Tags", ["A", "B", "C"]] //check Tags is subset of given array in your case array is ["A","B","C"] } } }, { "$match": {...

You can use := to create a new column ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, V1:=100*(zgrp/sum(zgrp)), by=zip][, zgrp:=NULL] # cgrp zip V1 #1: 3 12007 19.35484 #2: 4 12007 48.38710 #3: 1 12007 32.25806 #4: 1 12008 57.89474 #5: 4 12008 31.57895 #6: 3 12008 10.52632 Or as @Frank commented, you...

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

Adapting my comment into an answer, taking into account the presented data and OP's comments. Note, code is not checked as dput of data was not obtained. library("dplyr") data_summarised <- data %>% mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date,...

algorithm,recursion,combinations,subset

I think this can help you: void subset(vector<int> &input, vector<int> output, int current) { static int n=0; if (current == input.size()) { cout<<n++<<":\t {"; for(int i=0;i<output.size();i++) cout<<output[i]<<", "; cout<<"\b\b}\n"; } else { subset(input, output, current+1); //exclude current'th item output.push_back(input[current]); subset(input, output, current+1); //include current'th item } } and first time...

r,vector,replace,subset,readline

You can build the new set of file lines as follows: new.r.lines <- c(r.lines[1:27],r.lines[28:1027][grepl(var,r.lines[28:1027])],r.lines[1028:length(r.lines)]); This combines lines 1:27 with the subset of the following 28:1027 lines that match your search pattern, then further combines with lines 1028 to the end of the file. Thus, you can pass that to writeLines()...

Use split and cumsum: ccc <- data.frame(ccc) split(ccc[ccc$aaa==1,], cumsum(ccc$aaa!=1)[ccc$aaa==1]) #$`0` # aaa bbb #1 1 4 #2 1 4 #3 1 4 # #$`2` # aaa bbb #6 1 3 # #$`3` # aaa bbb #8 1 3 #9 1 2 # #$`4` # aaa bbb #11 1 2 #12...

Your commands depends on equality and this requires an atomic data, but instead you gave a vector. Instead of this, you can use operators as @davidArenburg has mentioned. You can do this by first forming a list of T/F and then retrieve the list according to corresponding value of the...

We could create a column 'MonthYr' from the 'date' column after converting it to 'Date' class. Get the number of observations ('n') per group ('permno', 'MonthYr') and use that to remove the IDs ('permno') that have at least one 'n' less than 10. library(dplyr) res <- df1 %>% mutate(MonthYr=format(as.Date(date, format='%m/%d/%Y'),...

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

arrays,algorithm,sum,big-o,subset

You can easily solve this problem with a simple recursive. def F(arr): if len(arr) == 1: return (arr[0], 1) else: r = F(arr[:-1]) return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1) So, how does it work? It is simple. Let say we want...

library(ggplot2) diamonds %>% group_by(clarity) %>% summarise(mean_price = mean(price) , min_price =min(price) ,max_price = max(price) , median_price = median(as.numeric(price)), count = n()) %>% arrange(clarity) for arranging in descending order use arrange(desc(clarity)) instead of arrange(clarity)...

You mentioned data.table, so here's two possible approaches for both requests library(data.table) For 1. setDT(df)[, .SD[all(V2 != "X")], by = V1] # V1 V2 V3 # 1: HIJ P 40 # 2: HIJ Y 41 For 2. df[, .SD[.N == 1L], by = V1] # V1 V2 V3 # 1:...

I think I spot a couple problems. In grep you don't want to set value to be TRUE. Setting value to be true returns the matched word instead of the index of the row. Also you are missing a comma (hence the undefinied columns error). Try This: LakeK_all[grep("^S", LakeK_all$Lake), ]...

you can try this with(df, df[ (x==1 & y>15) | (x==2 & y>5), ]) x y 1 1 30 4 2 10 5 2 18 or with dplyr library(dplyr) filter(df, (x==1 & y>15) | (x==2 & y>5)) ...

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for: http://stackoverflow.com/a/123481/2966951 http://stackoverflow.com/a/121435/2966951 Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value....

I might do as following. Only end dates seem to be necessary as start dates are just 1 year before. Loop is made using lapply() which iterates over all end dates. Subsetting is done mainly with difftime() by filtering any non-zero time difference between the two dates. set.seed(24) df1 <-...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

r,conditional,condition,subset

I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be: df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a'...

You could try sampling the indices of players to construct the first team instead of sampling the names. idx1 <- sample(1:nrow(players), 5) You can actually use these indices to grab all the information about each team: team1 <- players[idx1,] team2 <- players[-idx1,] The score for each team can be computed...

r,loops,data.frame,pattern-matching,subset

Your question boils down to searching for sequences of "ABC" within the sequences of the IDs: (matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]]) # [1] 8 # ... This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered matches, the...

You could do: library(dplyr) df %>% # create an hypothetical "customer.name" column mutate(customer.name = sample(LETTERS[1:10], size = n(), replace = TRUE)) %>% # group data by "Parcel.." group_by(Parcel..) %>% # apply sum() to the selected columns mutate_each(funs(sum(.)), one_of("X.11", "X.13", "X.15", "num_units")) %>% # likewise for mean() mutate_each(funs(mean(.)), one_of("Acres", "Ttl_sq_ft", "Mtr.Size"))...

I hope that you want; (renewed :) function subset() { var arr1 = [1, 9, 3, 5, 4, 8, 2, 6, 3, 4] var arr2 = [5, 2, 4, 8, 2, 6, 4] var arr3 = []; var minSize=2; // minimum 2 element must in intersection var findedMax = "";...

If there are lagging/leading spaces, this could occur. Remove those and it should work. library(stringr) data[,5] <- str_trim(data[,5]) Or data[,5] <- gsub('^\\s+|\\s+$', '', data[,5]) data[data[,5]=='Y',] Another option without removing the spaces would be grep data[grep('\\bY\\b', data[,5]),] ...

The comment by @false is correct, as far as I can see: given two sets S and S': if S is a subset of S', then the intersection of the complement of S' and S should be the empty set (there are no elements outside of S' that are elements...

It isn't a nicest solution, but does what you want. library(MuMIn) options(na.action = na.fail) fm1 <- lm(y ~ X1 + X2, Cement) m1 <- dredge(fm1) ms1 <- subset(m1, delta < 32) fm2 <- lm(y ~ X3 + X4, Cement) m2 <- dredge(fm2) ms2 <- subset(m2, delta < 20) a1 <-...

Use a regular expression. For example: myd <- subset(df, grepl("-01$", Date_ID)) or myd <- df[grep("-01$", df$Date_ID),] ...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

r,matrix,filter,data.frame,subset

Try this: (I suspect will be faster than any apply approach) mat[ rowSums(mat == mat[,1])!=ncol(mat) , ] # ---with your object--- [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 ...

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'): Methods [...]: [ : select subset of (sets of) polygons; NAs are not permitted in the row index" So using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

Short answer is: do not use subset but something like employ.data[employ.data[salary_string]>23000,] ...

You can solve this by sorting first: import operator ranges=[[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]] sorted_ranges = sorted(ranges,key=operator.itemgetter(-1),reverse=True) sorted_ranges = sorted(sorted_ranges,key=operator.itemgetter(0)) filtered = [] i,j = 0,0 while i < len(sorted_ranges): filtered.append(sorted_ranges[i]) j = i+1 while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]: print "Remove " ,...

java,algorithm,recursion,dynamic-programming,subset

Here's the super naive solution that simply generates a power set on your input array and then iterates over each set to see if the sum satisfies the given total. I hacked it together with code already available on StackOverflow. O(2n) in time and space. Gross. You can use the...

Using lapply() student.count = 2 # depends on your choice out = do.call(rbind, lapply(split(df, f = df$Schools), function(x){ x$no.of.students = length(x$Students); x = subset(x, no.of.students > student.count) })) #> out # Schools Students no.of.students #SchA.1 SchA st1 5 #SchA.2 SchA st2 5 #SchA.3 SchA st3 5 #SchA.4 SchA st4 5...

It sounds like you're looking for get: x[get(cname) > cutoff,] # val1 val2 # 1: 2 5 # 2: 3 4 # 3: 4 2 # 4: 5 4 # 5: 3 5 ...

You can try subset(df, V1 %in% l) # V1 V2 #1 a54 hi #3 sdx637 hi intersect can be used to get the common elements intersect(df$V1, l) #[1] "a54" "sdx637" but this will not give a logical index to subset the data, df[intersect(df$V1, l),] # V1 V2 #NA <NA> <NA>...

You want the ifelse function, which is a vectorized conditional: > x <- c(1, 1, 0, 0, 1) > y <- c(1, 2, 3, 4, 5) > z <- c(6, 7, 8, 9, 10) > ifelse(x == 1, y, z) [1] 1 2 8 9 5 You will have to...

Use mongo aggregation like following : First use $unwind this will unwind stuff and then use $match to find elements greater than 4. After that $group data based on things.name and add required fields in $project. The query will be as following: db.collection.aggregate([ { $unwind: "$things" }, { $unwind: "$things.stuff"...

This will return only those rows beginning with a capital "S" using the substr()-ing function: dat[ substr( dat$City, 1 ,1) == "S" , ] Could also have used: dat[ grepl("^S", dat$City) , ] The second option is a very simple regular expression. Look at ?regex and ?grep....

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and to dynimcally build formulas dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

One solution in base R: #using as.character since one$x and two$x are factors in this case > two[ as.character(one$x) != as.character(two$x), ] x y z 11 k 11 -0.6680130 12 l 12 -1.0501888 13 m 13 -1.0987269 14 n 14 1.0045557 15 o 15 -0.6002310 16 p 16 1.3162201 17...

data = sub240 is an assignment statement. You can assign things on their own line or in function definitions and calls, but you can only provide logical statements in a while loop definition. If you want logical equality, you need ==. But unless data changes in the loop AND you...

r,loops,double,conditional,subset

Or simply with base R aggregate(Julian_Day ~., df, min) # Year Id Julian_Day # 1 1901 1 40 # 2 1968 1 200 # 3 1901 5 56 Or library(dplyr) df %>% group_by(Id, Year) %>% summarise(Julian_Day = min(Julian_Day)) # Source: local data frame [3 x 3] # Groups: Id #...

You were close, try this mean(iris$Sepal.Length[which(iris$Species != 'setosa')]) or mean(iris$Sepal.Length[iris$Species != 'setosa']) or mean(iris[iris$Species!= "setosa", "Sepal.Length"]) ...

With the sample data dd<-read.table(text="Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000", header=T) you can do this with base R subset(dd, Count>.25*ave(Count, Group, FUN=sum)) or the dplyr library library(dplyr) dd %>% group_by(Group) %>% filter(Count > .25 * sum(Count)) perhaps you'll find one more...

This will give you the desired output df. givenStr <- "that" row <- df[df$strs==givenStr,] df[,c(1,1+which(row[,-1]==1))] ...

You can use %in% instead of == subset(data, x %in% 1:3) In general, if we are comparing two vectors of unequal sizes, %in% would be used. There are cases where we can take advantage of the recycling (it can fail too) if the length of one of the vector is...

I think this is what you want. I've done it using dplyr's group_by and summarize here. For each Batch/ID it calculates the number of observations, the number of observations where measurement is between 6 and 7 and the ratio of those two. library(dplyr) # example data set set.seed(10) Measurement <-...

We can use Reduce with intersect in base R lapply(my.list, function(x) x[with(x, Letters %in% Reduce(intersect, split(Letters, Numbers))),]) Or using dplyr library(dplyr) lapply(my.list, function(x) x %>% group_by(Letters) %>% filter(n_distinct(Numbers)==2)) Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the...

First, if you want a dataframe, you should use data.frame, not c: df <- data.frame(id, problem, solution1, solution2) Then you can subset like this for instance (no need to use subset per se) df2 <- df[!(grepl("a", df$problem) & (grepl("eat", df$solution1) | grepl("eat", solution2))),] # id problem solution1 solution2 # 2...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

Creating the Test/Sample dataset data test; infile datalines dlm=','; input Over : 8. Ball : 8. Bowling : $15. Runs_scored : 8. Count : 8. ; datalines; 39,1,Ali,1,1 39,2,Ali,1,2 39,3,Ali,2,3 39,4,Ali,1,4 39,5,Ali,1,5 39,6,Ali,1,6 36,1,Anderson,1,1 36,2,Anderson,1,2 36,3,Anderson,1,3 36,4,Anderson,0,4 36,6,Anderson,0,6 ; run; Selecting the distinct overs(as I understand Cricket, each over would...

Here's another idea using dplyr: library(dplyr) my_df %>% filter(lead(let == "b", 5) | lag(let == "b", 5)) Or as per @akrun suggestion using the devel version of data.table: setDT(my_df)[shift(let == "b", 5) | shift(let == "b", type = "lead", 5)] Which gives: # num let #1 0.36723709 a #2 0.24743170...

Another option is to use Filter within lapply lapply(list12, Filter, f = is.numeric) # [[1]] # x1 z1 # 1 1 0 # 2 2 1 # 3 3 0 # 4 4 1 # # [[2]] # x2 y2 # 1 0 0 # 2 1 1 # 3...

You can try a[,setdiff(colnames(a), S)] Or a[,!colnames(a) %in% S] ...

You can try df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),] # A B #2 1A 2 #3 1A 2 #5 2 4 #6 2 4 #7 3A 0 #8 3A 0 #9 4A 1 #10 4A 1 data df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"), B =...

You can use the function pmatch: x[-pmatch(y,x)] #[1] "A" "C" "A" "B" "D" Edit If your data can be strings of more than 1 character, here is an option to get what you want: xNew <- unlist(sapply(x[!duplicated(x)], function(item, tab1, tab2) { rep(item, tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0)) },...

You can use package dplyr: library(dplyr) intersect(df1,df2) # a b #1 1 m #2 3 f Edit for the new data.frames with c column: you can use function semi_join (also from dplyr): semi_join(df1,df2,by=c("a","b")) # a b c #1 1 m df1 #2 3 f df1 Other option, in base R:...

Read up on mapcar et al: (defparameter a (list 1 2 3 4)) (mapcon (lambda (tail) (mapcar (lambda (x) (cons (car tail) x)) (cdr tail))) a) ==> ((1 . 2) (1 . 3) (1 . 4) (2 . 3) (2 . 4) (3 . 4)) ...

Try foverlaps from data.table library(data.table) setkey(setDT(df1), Chromosome, start, end) setkey(setDT(df2), Chromosome, start, end) setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[] # Chromosome start end #1: 1 1 450 #2: 2 3500 3585 #3: 2 7850 10000 Or as @Arun commented, we can use which=TRUE (to extract the indices) and subset 'df1'...

This is not a subsetting issue, it's a formatting/presentation issue. You're in the first circle of Burns's R Inferno ("[i]f you are using R and you think you’re in hell, this is a map for you"): another aspect of virtuous pagan beliefs—what is printed is all that there is If...

I created a sample data. When you use subset(), you need a data frame and a condition. When you use lapply(), you make your function anonymous. That is, you write function(x) and further write codes which you want R to loop through. In your case, you want to loop through...