This question was embedded in another question I asked here, but as it goes beyond the scope of what I wanted to know in the initial inquiry, I thought it might deserve a separate thread.

I've been trying to come up with a solution for this problem based on the answers I have received here and here using `dplyr`

and the functions written by Khashaa and Jaap.

Using the solutions provided to me (especially from Jaap), I have been able to summarize the raw data I received into a matrix-looking data table

```
dput(SO_Example_v1)
structure(list(Type = structure(c(3L, 1L, 2L), .Label = c("Community",
"Contaminant", "Healthcare"), class = "factor"), hosp1_WoundAssocType = c(464L,
285L, 24L), hosp1_BloodAssocType = c(73L, 40L, 26L), hosp1_UrineAssocType = c(75L,
37L, 18L), hosp1_RespAssocType = c(137L, 77L, 2L), hosp1_CathAssocType = c(80L,
34L, 24L), hosp2_WoundAssocType = c(171L, 115L, 17L), hosp2_BloodAssocType = c(127L,
62L, 12L), hosp2_UrineAssocType = c(50L, 29L, 6L), hosp2_RespAssocType = c(135L,
142L, 6L), hosp2_CathAssocType = c(95L, 24L, 12L)), .Names = c("Type",
"hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp1_UrineAssocType",
"hosp1_RespAssocType", "hosp1_CathAssocType", "hosp2_WoundAssocType",
"hosp2_BloodAssocType", "hosp2_UrineAssocType", "hosp2_RespAssocType",
"hosp2_CathAssocType"), class = "data.frame", row.names = c(NA,
-3L))
```

Which looks as follows

```
require(dplyr)
df <- tbl_df(SO_Example_v1)
head(df)
Type hosp1_WoundAssocType hosp1_BloodAssocType hosp1_UrineAssocType
1 Healthcare 464 73 75
2 Community 285 40 37
3 Contaminant 24 26 18
Variables not shown: hosp1_RespAssocType (int), hosp1_CathAssocType (int), hosp2_WoundAssocType
(int), hosp2_BloodAssocType (int), hosp2_UrineAssocType (int), hosp2_RespAssocType (int),
hosp2_CathAssocType (int)
```

The column `Type`

is the type of bacteria, the following columns represent where they were cultured. The digits represent the number of times the respective type of bacteria were detected.

I know what my final table should look like, but until now I have been doing it step by step for each comparison and variable and there must undoubtedly be a way to do this by piping multiple functions in `dplyr`

- but alas, I have not found the answer on SO to this.

Example of what final table should look like

```
Wound
Type n Hospital 1 (%) n Hospital 2 (%) p-val
Healthcare associated bacteria 464 (60.0) 171 (56.4) 0.28
Community associated bacteria 285 (36.9) 115 (38.0) 0.74
Contaminants 24 (3.1) 17 (5.6) 0.05
```

Where the first grouping variable "Wound" is then subsequently replaced by "Urine", "Respiratory", ... and then there's a final column termed "All/Total", which is the total number of times each variable in the rows of "Type" was found and summarized across Hospital 1 and 2 and then compared.

What I have done until now is the following and very tedious, as it's calculated "by hand" and then I poulate the table with all of the results manually.

```
### Wound cultures & healthcare associated (extracted manually)
# hosp1 464 (yes), 309 (no), 773 wound isolates in total; (% = 464 / 309 * 100)
# hosp2 171 (yes), 132 (no), 303 would isolates in total; (% = 171 / 303 * 100)
### Then the chisq.test of my contingency table
chisq.test(cbind(c(464,309),c(171,132)),correct=FALSE)
```

I appreciate that if I run a piped `dplyr`

on the raw data.frame I won't be able to get the exact formatting of my desired table, but there must be a way to at least automate all the steps here and put the results together in a final table that I can export as a .csv file and then just do some final column editing etc.?

Any help is greatly appreciated.