MySQL count all items, aggregate less than as others

mysql,pie-chart,summarization

SELECT IF(total > 50, category, 'Others') AS category, SUM(total) AS total FROM (SELECT category, COUNT(*) AS total FROM table_name GROUP BY category) AS subquery GROUP BY category ...
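The same two-step idea (count per category, then fold the small categories into one label) can be sketched outside SQL. This is only an illustration in Python: the function name, the sample data and the `threshold` and `other_label` parameters are assumptions, not part of the original answer.

```python
from collections import Counter

def fold_small_categories(categories, threshold=50, other_label="Others"):
    """Count items per category, then merge categories at or below the threshold."""
    counts = Counter(categories)  # plays the role of the inner COUNT(*) ... GROUP BY
    folded = Counter()
    for category, total in counts.items():
        # Mirrors IF(total > 50, category, 'Others') in the outer query.
        key = category if total > threshold else other_label
        folded[key] += total      # mirrors SUM(total) per resulting label
    return dict(folded)

# Hypothetical data: 60 apples, 55 pears, 3 kiwis, 2 limes.
rows = ["apple"] * 60 + ["pear"] * 55 + ["kiwi"] * 3 + ["lime"] * 2
summary = fold_small_categories(rows)
print(summary)
```

Folding after counting keeps the pie chart readable while the slices still add up to the full row count.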

Array summary (working code), not grasping a line of code.

java,arrays,sorting,summarization

The indices of frequency correspond to the number being counted, and the value at one of those indices is the frequency of that number. It works because the maximum number in responses is 10, and the length of frequency is 11, meaning that 10 is a valid index into frequency...
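The indexing trick described above can be sketched as follows. The original question is in Java; this is a Python rendering, and the contents of `responses` are made up for illustration.

```python
# Hypothetical survey responses, each between 1 and 10 inclusive.
responses = [1, 2, 6, 4, 8, 5, 9, 7, 8, 10, 1, 6, 3, 8, 6, 10]

# Length 11 so that indices 0..10 all exist; index 0 simply stays unused.
frequency = [0] * 11

for answer in responses:
    frequency[answer] += 1  # the response value itself is the index

for rating in range(1, len(frequency)):
    print(rating, frequency[rating])
```

With a length of 10 instead of 11, a response of 10 would be out of range, which is why the array is one element longer than the largest value.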

Avoiding missing row after summarise

r,dplyr,summarization

You could try left_join:

    library(dplyr)
    left_join(expand.grid(type=unique(df$type), day=unique(df$day)), df1) %>%
      group_by(day, type) %>%
      summarise(sum=sum(value, na.rm=TRUE))
    #   day type       sum
    # 1   1    a 0.0000000
    # 2   1    b 0.5132914
    # 3   2    a 1.2482210
    # 4   2    b 0.9232343
    # 5   3    a 2.0381779
    # 6   3    b 0.7558351

where df1 is

    df1 <- df[day != 1...
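The idea behind joining against every (day, type) combination can be sketched in plain Python: build the full grid first, then add the observed values, so that missing combinations show up as 0 instead of disappearing. The rows below are hypothetical and do not come from the original question.

```python
from itertools import product

# Hypothetical rows of (day, type, value); the (1, "a") combination is absent.
rows = [(1, "b", 0.5), (2, "a", 1.0), (2, "a", 0.25),
        (2, "b", 0.9), (3, "a", 2.0), (3, "b", 0.75)]

days = sorted({day for day, _, _ in rows})
types = sorted({kind for _, kind, _ in rows})

# Start every (day, type) pair at 0; this plays the role of expand.grid + left_join.
totals = {pair: 0.0 for pair in product(days, types)}
for day, kind, value in rows:
    totals[(day, kind)] += value

for (day, kind), total in sorted(totals.items()):
    print(day, kind, total)
```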

R: summarizing a data.frame in the last row, with characters

r,summarization

I agree with the above comment that sticking them at the end of your data frame doesn't seem like a good idea. Anyway, you could take this opportunity to expand your R-pertoire with rapply str(iris) # 'data.frame': 150 obs. of 5 variables: # $ Sepal.Length: num 5.1 4.9 4.7 4.6...

summarise by group of columns using min and maintaining row number

r,position,min,summarization

If I understand correctly, and since you're already using dplyr, you could do it like this:

    library(dplyr); library(tidyr)
    unite(df, X, ID1:ID2, sep = ".") %>%
      mutate(Position = row_number()) %>%
      group_by(X) %>%
      slice(which.min(value))
    #Source: local data frame [4 x 3]
    #Groups: X
    #
    #    X value Position
    #1 1.1     1        1...
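As a rough illustration of the same steps (combine the ID columns into one key, remember each row's position, keep the row with the smallest value per key), here is a Python sketch. The input rows are hypothetical.

```python
# Hypothetical rows of (ID1, ID2, value).
rows = [(1, 1, 5), (1, 1, 1), (1, 2, 4), (2, 1, 7), (1, 2, 2), (2, 1, 3)]

best = {}  # group key -> (value, position)
for position, (id1, id2, value) in enumerate(rows, start=1):
    key = f"{id1}.{id2}"  # mirrors unite(..., sep = ".")
    # Strict "<" keeps the first minimum on ties, as which.min does.
    if key not in best or value < best[key][0]:
        best[key] = (value, position)

for key, (value, position) in sorted(best.items()):
    print(key, value, position)
```

Recording the position before grouping is what lets the original row number survive the summary.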

Return most frequent string value for each group

r,summarization

The key is to start by grouping on both a and b to compute the frequencies, and then take only the most frequent per group of a, for example like this:

    df %>% count(a, b) %>% slice(which.max(n))
    Source: local data frame [2 x 3]
    Groups: a
      a b n
    1 1...
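The two-level counting described above can be sketched in Python with nested counters: count each (a, b) pair, then pick the most frequent b within each a. The pairs below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (a, b) pairs.
pairs = [(1, "x"), (1, "y"), (1, "x"), (2, "z"), (2, "y"), (2, "z"), (2, "z")]

counts = defaultdict(Counter)
for a, b in pairs:
    counts[a][b] += 1  # similar in spirit to count(a, b)

# most_common(1) returns the top (value, n) pair for each group of a.
most_frequent = {a: counter.most_common(1)[0] for a, counter in counts.items()}
print(most_frequent)
```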

Summarize variable for different time periods and by group using ddply

r,plyr,summarization

    DF <- read.table(text="
    Client Q Sales Date
    A 2 30 01/01/2014
    A 3 24 02/01/2014
    A 1 10 03/01/2014
    B 4 10 01/01/2014
    B 1 20 02/01/2014
    B 3 30 03/01/2014", header=TRUE)
    library(plyr)
    ddply(DF, .(Client), summarise,
          Q = sum(Q),
          `Sales03/01/2014` = Sales[Date=="03/01/2014"],
          Sales = sum(Sales))
    # Client Q Sales03/01/2014 Sales...
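For readers who want to trace the per-client logic by hand, the three summaries (total Q, sales on one date, total sales) can be sketched in Python using the same six rows as the snippet. The dictionary layout and the "Sales on date" label are assumptions made for this sketch.

```python
# Rows copied from the snippet above: (Client, Q, Sales, Date).
rows = [
    ("A", 2, 30, "01/01/2014"),
    ("A", 3, 24, "02/01/2014"),
    ("A", 1, 10, "03/01/2014"),
    ("B", 4, 10, "01/01/2014"),
    ("B", 1, 20, "02/01/2014"),
    ("B", 3, 30, "03/01/2014"),
]

target_date = "03/01/2014"
summary = {}
for client, q, sales, date in rows:
    entry = summary.setdefault(client, {"Q": 0, "Sales on date": None, "Sales": 0})
    entry["Q"] += q           # whole-period total of Q
    entry["Sales"] += sales   # whole-period total of Sales
    if date == target_date:
        entry["Sales on date"] = sales  # Sales restricted to the single date

for client, entry in sorted(summary.items()):
    print(client, entry)
```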

Open Source Multi-Document Summarization

nlp,summarization

MEAD from the University of Michigan (and others) is available as public domain software, and according to the documentation, it supports multi-document summarization.

http://www.summarization.com/mead/

However, you might be better off implementing your own solution. A simple approach could be:

  • Concatenate all of the documents into a single file
  • Use TF-IDF...

Summarizing data in R

r,matrix,summarization

You may try

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    gather(df1, Var, Val, -TRA) %>%
      group_by(TRA, Var) %>%
      summarise(Mean=mean(Val), SD=sd(Val)) %>%
      ggplot(., aes(x=TRA, y=Mean, fill=Var)) +
      geom_bar(position=position_dodge(), stat='identity') +
      geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2, position=position_dodge(.9))

data

    df1 <- structure(list(TRA = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"), PRO1 = c(83L,...
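The summarising step that feeds the bars and error bars (mean and standard deviation per TRA and variable) can be sketched in Python. The rows are hypothetical long-format data, i.e. what the table might look like after gather(); only the column names TRA, Var, Val and PRO1 come from the snippet.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format rows of (TRA, Var, Val).
rows = [("a", "PRO1", 2.0), ("a", "PRO1", 4.0), ("a", "PRO1", 6.0),
        ("b", "PRO1", 1.0), ("b", "PRO1", 1.0), ("b", "PRO1", 4.0)]

groups = defaultdict(list)
for tra, var, val in rows:
    groups[(tra, var)].append(val)

# stdev is the sample standard deviation (n - 1 denominator), like R's sd().
summary = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}

# Error bar limits as used in geom_errorbar: Mean - SD and Mean + SD.
bars = {key: (m - s, m + s) for key, (m, s) in summary.items()}
print(summary)
print(bars)
```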

Aggregating Categorical Variable Values in a single variable in R

r,plyr,summarization

Here is a quick solution using the data.table package:

Step 1: Create the data.table

    library(data.table)
    DT <- data.table(
      BillN=c('B1','B1','B1','B1','B2','B2','B2','B2','B3','B3','B3','B3'),
      Item_Name=c('Prod A','Prod B','Prod C','Prod D','Prod A','Prod B','Prod C','Prod D','Prod A','Prod B','Prod C','Prod D'), # going on to Product(n)
      Quantity=c(1,2,1,2,1,2,1,1,1,2,1,1)
    )

Step 2: Set appropriate key:

    setkey(DT,BillN)

Step 3: Make sure that the string vector Item_Name...