Your commands depends on equality and this requires an atomic data, but instead you gave a vector. Instead of this, you can use operators as @davidArenburg has mentioned. You can do this by first forming a list of T/F and then retrieve the list according to corresponding value of the...

Read up on mapcar et al: (defparameter a (list 1 2 3 4)) (mapcon (lambda (tail) (mapcar (lambda (x) (cons (car tail) x)) (cdr tail))) a) ==> ((1 . 2) (1 . 3) (1 . 4) (2 . 3) (2 . 4) (3 . 4)) ...

one way: iris$Sepal.Width[ iris$Species %in% "virginica"] Probably best to google subsetting though as this is all easily available in tutorials everywhere, and this will have been asked elsewhere on SO....

r,conditional,condition,subset

I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be: df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a'...

First, if you want a dataframe, you should use data.frame, not c: df <- data.frame(id, problem, solution1, solution2) Then you can subset like this for instance (no need to use subset per se) df2 <- df[!(grepl("a", df$problem) & (grepl("eat", df$solution1) | grepl("eat", solution2))),] # id problem solution1 solution2 # 2...

I'm not sure it makes sense to copy your covariates into a new list like that. Here's a way to loop over columns and to dynimcally build formulas dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) dat1 <- dat[-9,] #x.list not used fit...

Use a regular expression. For example: myd <- subset(df, grepl("-01$", Date_ID)) or myd <- df[grep("-01$", df$Date_ID),] ...

I think this is what you want. I've done it using dplyr's group_by and summarize here. For each Batch/ID it calculates the number of observations, the number of observations where measurement is between 6 and 7 and the ratio of those two. library(dplyr) # example data set set.seed(10) Measurement <-...

data = sub240 is an assignment statement. You can assign things on their own line or in function definitions and calls, but you can only provide logical statements in a while loop definition. If you want logical equality, you need ==. But unless data changes in the loop AND you...

You mentioned data.table, so here's two possible approaches for both requests library(data.table) For 1. setDT(df)[, .SD[all(V2 != "X")], by = V1] # V1 V2 V3 # 1: HIJ P 40 # 2: HIJ Y 41 For 2. df[, .SD[.N == 1L], by = V1] # V1 V2 V3 # 1:...

You can use the function pmatch: x[-pmatch(y,x)] #[1] "A" "C" "A" "B" "D" Edit If your data can be strings of more than 1 character, here is an option to get what you want: xNew <- unlist(sapply(x[!duplicated(x)], function(item, tab1, tab2) { rep(item, tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0)) },...

As you'd like to produce boxplots for each group and year in the same graph, I think your dataset is ready for that and you can do the following: p <- ggplot(tmp.data, aes(factor(year), fill=group, value)) p + geom_boxplot() ...

r,vector,replace,subset,readline

You can build the new set of file lines as follows: new.r.lines <- c(r.lines[1:27],r.lines[28:1027][grepl(var,r.lines[28:1027])],r.lines[1028:length(r.lines)]); This combines lines 1:27 with the subset of the following 28:1027 lines that match your search pattern, then further combines with lines 1028 to the end of the file. Thus, you can pass that to writeLines()...

You can try a[,setdiff(colnames(a), S)] Or a[,!colnames(a) %in% S] ...

constraints,scheduling,subset,cplex,opl

It is not clear what your problem is, but I am guessing your problem is to do with modelling things like products(j) in constraint 2. Try using sets for these - so create an array of sets of products in each product family. There are examples of this in the...

One solution in base R: #using as.character since one$x and two$x are factors in this case > two[ as.character(one$x) != as.character(two$x), ] x y z 11 k 11 -0.6680130 12 l 12 -1.0501888 13 m 13 -1.0987269 14 n 14 1.0045557 15 o 15 -0.6002310 16 p 16 1.3162201 17...

algorithm,recursion,combinations,subset

I think this can help you: void subset(vector<int> &input, vector<int> output, int current) { static int n=0; if (current == input.size()) { cout<<n++<<":\t {"; for(int i=0;i<output.size();i++) cout<<output[i]<<", "; cout<<"\b\b}\n"; } else { subset(input, output, current+1); //exclude current'th item output.push_back(input[current]); subset(input, output, current+1); //include current'th item } } and first time...

Try this WantedData=Data2[Data2$ccession_number %in% SubsetData1$accession_number, ] ...

If we already created 3 datasets and want to subset the first "a" based on the elements of "c/c1", one option is anti_join from dplyr library(dplyr) anti_join(a, c1, by=c('A', 'B', 'C')) Update Or we could use a base R option with interaction to paste the columns of interest together in...

You may try with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(a)), grouped by 'var1', we get a logical index for 'var2' elements that are greater than or equal to corresponding 'var2' elements for which 'var3' is TRUE and subset the dataset .SD. library(data.table) setDT(a)[,.SD[var2 >= var2[var3]], var1] #...

You could do: library(dplyr) df %>% # create an hypothetical "customer.name" column mutate(customer.name = sample(LETTERS[1:10], size = n(), replace = TRUE)) %>% # group data by "Parcel.." group_by(Parcel..) %>% # apply sum() to the selected columns mutate_each(funs(sum(.)), one_of("X.11", "X.13", "X.15", "num_units")) %>% # likewise for mean() mutate_each(funs(mean(.)), one_of("Acres", "Ttl_sq_ft", "Mtr.Size"))...

You could try dat <- subset(wg, Year > 2007 & Year < 2010 & hour == 23 & mint %in% c(30, 32, 39, 40, 41, 49, 31)) ...

You were close, try this mean(iris$Sepal.Length[which(iris$Species != 'setosa')]) or mean(iris$Sepal.Length[iris$Species != 'setosa']) or mean(iris[iris$Species!= "setosa", "Sepal.Length"]) ...

you can try this with(df, df[ (x==1 & y>15) | (x==2 & y>5), ]) x y 1 1 30 4 2 10 5 2 18 or with dplyr library(dplyr) filter(df, (x==1 & y>15) | (x==2 & y>5)) ...

You can do this using a variant of the compare-cumsum-groupby pattern. Starting from >>> df["markers"].isin(["x","y"]) 0 False 1 False 2 True 3 False 4 False 5 False 6 True 7 False 8 False 9 True Name: markers, dtype: bool We can shift and take the cumulative sum to get: >>>...

Try df[grep('1{3,}', df$history),] ...

The comment by @false is correct, as far as I can see: given two sets S and S': if S is a subset of S', then the intersection of the complement of S' and S should be the empty set (there are no elements outside of S' that are elements...

We could create a column 'MonthYr' from the 'date' column after converting it to 'Date' class. Get the number of observations ('n') per group ('permno', 'MonthYr') and use that to remove the IDs ('permno') that have at least one 'n' less than 10. library(dplyr) res <- df1 %>% mutate(MonthYr=format(as.Date(date, format='%m/%d/%Y'),...

java,algorithm,recursion,dynamic-programming,subset

Here's the super naive solution that simply generates a power set on your input array and then iterates over each set to see if the sum satisfies the given total. I hacked it together with code already available on StackOverflow. O(2n) in time and space. Gross. You can use the...

You can try subset(df, V1 %in% l) # V1 V2 #1 a54 hi #3 sdx637 hi intersect can be used to get the common elements intersect(df$V1, l) #[1] "a54" "sdx637" but this will not give a logical index to subset the data, df[intersect(df$V1, l),] # V1 V2 #NA <NA> <NA>...

just a little googling would have solved your problem, for example read this about logical operators, like this? ITEproduction_2014.2015<-subset(ITEproduction_2014.2015,Date.Difference>3 & Date.Difference<40) ...

You can change the 'Time' column to 'POSIXct' class and then subset datasubcolrow$Time <- as.POSIXct(datasubcolrow$Time, format='%d/%m/%Y %H:%M:%OS') subset(datasubcolrow, Time < as.POSIXct('15/05/2015 13:30:15.417', format='%d/%m/%Y %H:%M:%OS')) data datasubcolrow <- structure(list(Time = c("15/05/2015 13:30:07.291", "15/05/2015 13:30:08.307", "15/05/2015 13:30:09.323", "15/05/2015 13:30:10.338", "15/05/2015 13:30:11.354", "15/05/2015 13:30:12.370", "15/05/2015 13:30:13.386", "15/05/2015 13:30:14.402", "15/05/2015 13:30:15.417",...

You can use package dplyr: library(dplyr) intersect(df1,df2) # a b #1 1 m #2 3 f Edit for the new data.frames with c column: you can use function semi_join (also from dplyr): semi_join(df1,df2,by=c("a","b")) # a b c #1 1 m df1 #2 3 f df1 Other option, in base R:...

Based on your stated use of pandas colsToUse = ['col1', 'col2', 'col3'] rowsToUse = np.random.choice(range(len(df1)), 500) df2 = df1.ix[:, colsToUse] df3 = df1.ix[rowsToUse, :] There are also some other DataFrame helper functions for indexing: df1.loc, df1.iloc, and df1.xs. It's also helpful to look at the guide NumPy for MATLAB Users...

I downloaded your data and had a look. If I am not mistaken, all you need is to subset data using Time.h. Here you have a range of time (10-23) you want. I used dplyr and did the following. You are asking R to pick up rows which have values...

We can use Reduce with intersect in base R lapply(my.list, function(x) x[with(x, Letters %in% Reduce(intersect, split(Letters, Numbers))),]) Or using dplyr library(dplyr) lapply(my.list, function(x) x %>% group_by(Letters) %>% filter(n_distinct(Numbers)==2)) Instead of having a list, it can be changed to a single dataset with an additional grouping column and then do the...

Subsetting can be done by using []. See the SpatialPolygons-class help (?'SpatialPolygons-class'): Methods [...]: [ : select subset of (sets of) polygons; NAs are not permitted in the row index" So using your data: library(sp) Sr1 = Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2))) Sr2 = Polygon(cbind(c(5,4,2,5),c(2,3,2,2))) Sr3 = Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5))) Sr4 = Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)...

You want the ifelse function, which is a vectorized conditional: > x <- c(1, 1, 0, 0, 1) > y <- c(1, 2, 3, 4, 5) > z <- c(6, 7, 8, 9, 10) > ifelse(x == 1, y, z) [1] 1 2 8 9 5 You will have to...

This will return only those rows beginning with a capital "S" using the substr()-ing function: dat[ substr( dat$City, 1 ,1) == "S" , ] Could also have used: dat[ grepl("^S", dat$City) , ] The second option is a very simple regular expression. Look at ?regex and ?grep....

It sounds like you're looking for get: x[get(cname) > cutoff,] # val1 val2 # 1: 2 5 # 2: 3 4 # 3: 4 2 # 4: 5 4 # 5: 3 5 ...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

r,loops,double,conditional,subset

Or simply with base R aggregate(Julian_Day ~., df, min) # Year Id Julian_Day # 1 1901 1 40 # 2 1968 1 200 # 3 1901 5 56 Or library(dplyr) df %>% group_by(Id, Year) %>% summarise(Julian_Day = min(Julian_Day)) # Source: local data frame [3 x 3] # Groups: Id #...

Use split and cumsum: ccc <- data.frame(ccc) split(ccc[ccc$aaa==1,], cumsum(ccc$aaa!=1)[ccc$aaa==1]) #$`0` # aaa bbb #1 1 4 #2 1 4 #3 1 4 # #$`2` # aaa bbb #6 1 3 # #$`3` # aaa bbb #8 1 3 #9 1 2 # #$`4` # aaa bbb #11 1 2 #12...

r,logic,aggregate,subset,subsetting

The problem is that == is alternating between the values of "honda" and "harley" and comparing with the value in the relevant position of your "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | are checking across the entire "manufacturer" variable before deciding which values to...

It isn't a nicest solution, but does what you want. library(MuMIn) options(na.action = na.fail) fm1 <- lm(y ~ X1 + X2, Cement) m1 <- dredge(fm1) ms1 <- subset(m1, delta < 32) fm2 <- lm(y ~ X3 + X4, Cement) m2 <- dredge(fm2) ms2 <- subset(m2, delta < 20) a1 <-...

I hope that you want; (renewed :) function subset() { var arr1 = [1, 9, 3, 5, 4, 8, 2, 6, 3, 4] var arr2 = [5, 2, 4, 8, 2, 6, 4] var arr3 = []; var minSize=2; // minimum 2 element must in intersection var findedMax = "";...

geom_boxplot calls boxplot.stats to calculate the positions of the upper and lower whiskers. You can do it too: > boxplot.stats(v) $stats [1] 93.340 96.069 97.876 99.087 100.359 $n [1] 24 $conf [1] 96.90265 98.84935 $out [1] -234.347 75.764 (v is assumed to be your input data vector): From the boxplot.stats...

I think I spot a couple problems. In grep you don't want to set value to be TRUE. Setting value to be true returns the matched word instead of the index of the row. Also you are missing a comma (hence the undefinied columns error). Try This: LakeK_all[grep("^S", LakeK_all$Lake), ]...

The question is quite badly formulated but if I understand it correctly it would mean doing something like that: col=3 #col is the # of the column where the numbers should be odd dat[dat[,col]%%2==1,] #dat is the data frame or matrix containing your data Or in the example you give...

This will give you the desired output df. givenStr <- "that" row <- df[df$strs==givenStr,] df[,c(1,1+which(row[,-1]==1))] ...

You can use := to create a new column ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, V1:=100*(zgrp/sum(zgrp)), by=zip][, zgrp:=NULL] # cgrp zip V1 #1: 3 12007 19.35484 #2: 4 12007 48.38710 #3: 1 12007 32.25806 #4: 1 12008 57.89474 #5: 4 12008 31.57895 #6: 3 12008 10.52632 Or as @Frank commented, you...

c#,arrays,mongodb,subset,mongodb-csharp

Use mongo Set Operator using $setIsSubset in aggregation you will get your result, check following query : db.collectionName.aggregate({ "$project": { "Name": 1, "Tags": 1, "match": { "$setIsSubset": ["$Tags", ["A", "B", "C"]] //check Tags is subset of given array in your case array is ["A","B","C"] } } }, { "$match": {...

Using lapply() student.count = 2 # depends on your choice out = do.call(rbind, lapply(split(df, f = df$Schools), function(x){ x$no.of.students = length(x$Students); x = subset(x, no.of.students > student.count) })) #> out # Schools Students no.of.students #SchA.1 SchA st1 5 #SchA.2 SchA st2 5 #SchA.3 SchA st3 5 #SchA.4 SchA st4 5...

You could try sampling the indices of players to construct the first team instead of sampling the names. idx1 <- sample(1:nrow(players), 5) You can actually use these indices to grab all the information about each team: team1 <- players[idx1,] team2 <- players[-idx1,] The score for each team can be computed...

Here's another idea using dplyr: library(dplyr) my_df %>% filter(lead(let == "b", 5) | lag(let == "b", 5)) Or as per @akrun suggestion using the devel version of data.table: setDT(my_df)[shift(let == "b", 5) | shift(let == "b", type = "lead", 5)] Which gives: # num let #1 0.36723709 a #2 0.24743170...

Using data.table, we'd do: setDT(data)[colA == "ABC", ColB := "XXXX"] and the values are modified in-place, unlike if-else, which'd copy the entire column to replace just those rows where the condition satisfies. We call this sub-assign by reference. You can read more about it in the new HTML vignettes....

With the sample data dd<-read.table(text="Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000", header=T) you can do this with base R subset(dd, Count>.25*ave(Count, Group, FUN=sum)) or the dplyr library library(dplyr) dd %>% group_by(Group) %>% filter(Count > .25 * sum(Count)) perhaps you'll find one more...

r,if-statement,matrix,subset,covariance

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple solutions, pick your poison. The first relies on a nested for loop which could be slow and further optimized if you knew for sure your matrix was symmetric. m <- read.table(header=T, stringsAsFactors=F,...

arrays,algorithm,sum,big-o,subset

You can easily solve this problem with a simple recursive. def F(arr): if len(arr) == 1: return (arr[0], 1) else: r = F(arr[:-1]) return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1) So, how does it work? It is simple. Let say we want...

I created a sample data. When you use subset(), you need a data frame and a condition. When you use lapply(), you make your function anonymous. That is, you write function(x) and further write codes which you want R to loop through. In your case, you want to loop through...

Try foverlaps from data.table library(data.table) setkey(setDT(df1), Chromosome, start, end) setkey(setDT(df2), Chromosome, start, end) setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[] # Chromosome start end #1: 1 1 450 #2: 2 3500 3585 #3: 2 7850 10000 Or as @Arun commented, we can use which=TRUE (to extract the indices) and subset 'df1'...

Creating the Test/Sample dataset data test; infile datalines dlm=','; input Over : 8. Ball : 8. Bowling : $15. Runs_scored : 8. Count : 8. ; datalines; 39,1,Ali,1,1 39,2,Ali,1,2 39,3,Ali,2,3 39,4,Ali,1,4 39,5,Ali,1,5 39,6,Ali,1,6 36,1,Anderson,1,1 36,2,Anderson,1,2 36,3,Anderson,1,3 36,4,Anderson,0,4 36,6,Anderson,0,6 ; run; Selecting the distinct overs(as I understand Cricket, each over would...

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

If you just want to keep, say, the first row of treat for each value of ID, then you can use slice: DATA_clean <- treat %>% group_by(ID) %>% slice(1) Your original code didn't work because n() returns to the total number of rows for each value of ID. If every...

r,loops,data.frame,pattern-matching,subset

Your question boils down to searching for sequences of "ABC" within the sequences of the IDs: (matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]]) # [1] 8 # ... This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered matches, the...

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for: http://stackoverflow.com/a/123481/2966951 http://stackoverflow.com/a/121435/2966951 Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value....

r,matrix,filter,data.frame,subset

Try this: (I suspect will be faster than any apply approach) mat[ rowSums(mat == mat[,1])!=ncol(mat) , ] # ---with your object--- [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 ...

this should work: library(dplyr) inner_join(dfA, dfB) %>% anti_join(dfC) which gives: Efficiency Value 1 8 7 2 2 4 ...

You can try df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),] # A B #2 1A 2 #3 1A 2 #5 2 4 #6 2 4 #7 3A 0 #8 3A 0 #9 4A 1 #10 4A 1 data df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"), B =...

python,algorithm,python-3.x,set,subset

Solution: def partitions(A): if not A: yield [] else: a, *R = A for partition in partitions(R): yield partition + [[a]] for i, subset in enumerate(partition): yield partition[:i] + [subset + [a]] + partition[i+1:] Explanation: The empty set only has the empty partition. For a non-empty set, take out one...

python,list,filtering,subset,subsetting

You need realize methods hash and eq on object class A: def __init__(self, a): self.attr1 = a def __hash__(self): return hash(self.attr1) def __eq__(self, other): return self.attr1 == other.attr1 def __repr__(self): return str(self.attr1) Example: l = [A(5), A(4), A(4)] print list(set(l)) print list(set(l))[0].__class__ # ==> __main__.A. It's a object of class...

Try this : subset(raw_data,eval(parse(text=keep_rows))) Test : keep_rows <- "Blok>1" raw_data<- data.frame(Blok=c(1,2,3,0)) subset(raw_data,eval(parse(text=keep_rows))) Blok 2 2 3 3 ...

you can read about argument drop in the help page: ?'[' M[which(rownames(M) != "A"), ,drop=FALSE] ...

#one way is to use `filter` from `dplyr` package (and assuming Date is already in Date format) library(dplyr) wg %>% filter(year %in% c(2008,2009) & months(Date) %in% c("July","August","September") #If you want to stick to subset, replace second & with |: subset(wg, date >= "2008-07-01" & date <= "2008-09-30" | date >=...

If there are lagging/leading spaces, this could occur. Remove those and it should work. library(stringr) data[,5] <- str_trim(data[,5]) Or data[,5] <- gsub('^\\s+|\\s+$', '', data[,5]) data[data[,5]=='Y',] Another option without removing the spaces would be grep data[grep('\\bY\\b', data[,5]),] ...

You could try something like this: select user_id from user_assets where asset_id = 1 or asset_id = 2 ... group by user_id having count(distinct asset_id) = (number of assets you are looking for) demo here showing your required output. the distinct isn't necessary if (user_id, asset_id) is a unique key...

You'll want to use something like the following: new_data <- Data[sample(nrow(Data), N, prob = (1 - Data$Prob), replace = F),] ...

If you do the subsetting yourself via data = zooX[...,], then dynlm() doesn't see the full sample and hence has to lose two observations. If you supply the full data = zooX and then set end = 14 and start = 15 respectively, then dynlm() can first put together the...

This should work: dfOutput <- dfInput[apply(dfInput[,3:19]>0.00000001 & dfInput[,3:19]<0.300, 1, all, na.rm=TRUE), ] And now for a reproducible example, I'll explain what is going on: # data df <- data.frame(x = c(1:3, NA, 3:1), y=c(NA, NA, NA, 3, 3, 2, 3)) # this returns a matrix! df[, 1:2] > 2 #...

This is not a subsetting issue, it's a formatting/presentation issue. You're in the first circle of Burns's R Inferno ("[i]f you are using R and you think you’re in hell, this is a map for you"): another aspect of virtuous pagan beliefs—what is printed is all that there is If...

r,data.frame,subset,linear-regression

First, you might want to write a function that can calculate the slope for three consecutive values, like this: slope <- function(x){ if(all(is.na(x))) # if x is all missing, then lm will throw an error that we want to avoid return(NA) else return(coef(lm(I(1:3)~x))[2]) } Then you can use the apply()...

I might do as following. Only end dates seem to be necessary as start dates are just 1 year before. Loop is made using lapply() which iterates over all end dates. Subsetting is done mainly with difftime() by filtering any non-zero time difference between the two dates. set.seed(24) df1 <-...

library(ggplot2) diamonds %>% group_by(clarity) %>% summarise(mean_price = mean(price) , min_price =min(price) ,max_price = max(price) , median_price = median(as.numeric(price)), count = n()) %>% arrange(clarity) for arranging in descending order use arrange(desc(clarity)) instead of arrange(clarity)...

Short answer is: do not use subset but something like employ.data[employ.data[salary_string]>23000,] ...

It seems to me that almost all of your questions are regarding a list of data frames with same columns which cause you to use lapply loops on every single operation (which seem highly inefficient). Alternatively, you could vectorize most of your operations by simply binding all the lists into...

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

Use mongo aggregation like following : First use $unwind this will unwind stuff and then use $match to find elements greater than 4. After that $group data based on things.name and add required fields in $project. The query will be as following: db.collection.aggregate([ { $unwind: "$things" }, { $unwind: "$things.stuff"...

There is no problem. Look at nrow(Sin). You should see that is has fewer rows after subsetting. The first column in the output is the "row name". It is not a cumulative index that tells you how many rows there are. Row names are preserved after subsetting (ie they will...

You can use %in% instead of == subset(data, x %in% 1:3) In general, if we are comparing two vectors of unequal sizes, %in% would be used. There are cases where we can take advantage of the recycling (it can fail too) if the length of one of the vector is...

You can use top_n from dplyr to select the 'n` top rows library(dplyr) top_n(AB, 4, BETAdn) Or use order from base R and then subset the top 'n' rows AB[order(-AB$BETAdn),][1:4,] ...

You can solve this by sorting first: import operator ranges=[[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]] sorted_ranges = sorted(ranges,key=operator.itemgetter(-1),reverse=True) sorted_ranges = sorted(sorted_ranges,key=operator.itemgetter(0)) filtered = [] i,j = 0,0 while i < len(sorted_ranges): filtered.append(sorted_ranges[i]) j = i+1 while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]: print "Remove " ,...