SELECT IF(total > 50, category, 'Others') AS category, SUM(total) AS total FROM (SELECT category, COUNT(*) AS total FROM table_name GROUP BY category) AS subquery GROUP BY category ...

java,arrays,sorting,summarization

The indices of frequency correspond to the number being counted, and the value at one of those indices is the frequency of that number. It works because the maximum number in responses is 10, and the length of frequency is 11, meaning that 10 is a valid index into frequency...
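The counting idea described above can be sketched as follows. This is only a minimal illustration, not the original poster's code: the names `responses` and `frequency` come from the excerpt, while the class `FrequencyCount`, the method `countFrequencies`, the parameter `maxValue`, and the sample data are assumptions made for the example.

```java
import java.util.Arrays;

final class FrequencyCount {

    // Counts how often each value in 0..maxValue occurs in responses.
    // The returned array has maxValue + 1 slots, so maxValue itself is a valid index.
    static int[] countFrequencies(int[] responses, int maxValue) {
        int[] frequency = new int[maxValue + 1];
        for (int response : responses) {
            if (response < 0 || response > maxValue) {
                throw new IllegalArgumentException("response out of range: " + response);
            }
            // The response value is used directly as the index to increment.
            frequency[response]++;
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; every value lies between 1 and 10.
        int[] responses = {1, 2, 6, 4, 8, 5, 9, 7, 8, 10, 1, 6, 3, 8, 6, 10};
        int[] frequency = countFrequencies(responses, 10);

        // Index 0 is unused here because no response is 0.
        for (int rating = 1; rating < frequency.length; rating++) {
            System.out.println(rating + ": " + frequency[rating]);
        }
        System.out.println(Arrays.toString(frequency));
    }
}
```

With `maxValue` set to 10 the array has 11 slots, which is why a response of 10 can be used as an index without going out of bounds.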

You could try left_join:
    library(dplyr)
    left_join(expand.grid(type=unique(df$type), day=unique(df$day)), df1) %>%
      group_by(day, type) %>%
      summarise(sum=sum(value, na.rm=TRUE))
    #  day type       sum
    #1   1    a 0.0000000
    #2   1    b 0.5132914
    #3   2    a 1.2482210
    #4   2    b 0.9232343
    #5   3    a 2.0381779
    #6   3    b 0.7558351
where df1 is
    df1 <- df[day != 1...

I agree with the above comment that sticking them at the end of your data frame doesn't seem like a good idea. Anyway, you could take this opportunity to expand your R-pertoire with rapply:
    str(iris)
    # 'data.frame': 150 obs. of 5 variables:
    # $ Sepal.Length: num 5.1 4.9 4.7 4.6...

If I understand correctly, and since you're already using dplyr, you could do it like this:
    library(dplyr); library(tidyr)
    unite(df, X, ID1:ID2, sep = ".") %>%
      mutate(Position = row_number()) %>%
      group_by(X) %>%
      slice(which.min(value))
    #Source: local data frame [4 x 3]
    #Groups: X
    #
    #    X value Position
    #1 1.1     1        1...

The key is to start grouping by both a and b to compute the frequencies and then take only the most frequent per group of a, for example like this:
    df %>% count(a, b) %>% slice(which.max(n))
    Source: local data frame [2 x 3]
    Groups: a
      a b n
    1 1...

DF <- read.table(text="
Client Q Sales Date
A 2 30 01/01/2014
A 3 24 02/01/2014
A 1 10 03/01/2014
B 4 10 01/01/2014
B 1 20 02/01/2014
B 3 30 03/01/2014", header=TRUE)
library(plyr)
ddply(DF, .(Client), summarise,
      Q = sum(Q),
      `Sales03/01/2014` = Sales[Date=="03/01/2014"],
      Sales = sum(Sales))
# Client Q Sales03/01/2014 Sales...

MEAD from the University of Michigan (and others) is available as public domain software, and according to the documentation, it supports multi-document summarization: http://www.summarization.com/mead/
However, you might be better off implementing your own solution. A simple approach could be:
- Concatenate all of the documents into a single file
- Use TF-IDF...

You may try
    library(ggplot2)
    library(dplyr)
    library(tidyr)
    gather(df1, Var, Val, -TRA) %>%
      group_by(TRA, Var) %>%
      summarise(Mean=mean(Val), SD=sd(Val)) %>%
      ggplot(., aes(x=TRA, y=Mean, fill=Var))+
      geom_bar(position=position_dodge(), stat='identity')+
      geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2, position=position_dodge(.9))
data
    df1 <- structure(list(TRA = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"), PRO1 = c(83L,...

Here is a quick solution using the data.table package:
Step 1: Create the data.table:
    library(data.table)
    DT <- data.table(
      BillN=c('B1','B1','B1','B1','B2','B2','B2','B2','B3','B3','B3','B3'),
      Item_Name=c('Prod A','Prod B','Prod C','Prod D','Prod A','Prod B','Prod C','Prod D','Prod A','Prod B','Prod C','Prod D'), # going on to Product(n)
      Quantity=c(1,2,1,2,1,2,1,1,1,2,1,1)
    )
Step 2: Set the appropriate key:
    setkey(DT,BillN)
Step 3: Make sure that the string vector Item_Name...