I am trying to calculate three association statistics (the phi coefficient, Cramer's V, and the contingency coefficient) using the RPy module for Python. I can do this in R, but I am at my wits' end trying to replicate it in Python.
In R:

    library(vcd)
    data <- read.csv("test.csv")
    assocstats(table(data$var_4, data$target))

Output:

                        X^2 df P(> X^2)
    Likelihood Ratio 113.28  1        0
    Pearson          112.51  1        0

    Phi-Coefficient   : 0.15
    Contingency Coeff.: 0.148
    Cramer's V        : 0.15
Implementation in Python:

    import rpy

    # Already connected to MySQL; cur is an open cursor
    q = "Select var_4, target from test"
    cur.execute(q)
    data = cur.fetchall()

    # Split the two selected columns into separate lists
    ls1 = []
    ls2 = []
    for i in range(len(data)):
        ls1.append(data[i][0])
        ls2.append(data[i][1])

    rpy.r.library("vcd")
    rpy.r.assocstats(rpy.r.table(ls1,ls2))
    Traceback (most recent call last):
      File "<pyshell#14>", line 1, in <module>
        rpy.r.assocstats(rpy.r.table(ls1,ls2))
    RPy_RException: Error in sort.list(y) : 'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?
The other approach I am considering is to compute phi squared with the scipy module and then derive Cramer's V and the other coefficients from the standard formulas. However, I intend to use RPy heavily in my project going forward, so I would really appreciate it if you could point out the problem in the approach above. I suspect I am not passing the input to the function in the proper format.

Thanks in advance.
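For reference, the formula-based fallback I have in mind would look roughly like the sketch below. It is only a minimal illustration, written in plain Python rather than scipy so that it has no dependencies. The function name `assoc_stats` and the dictionary keys are my own placeholders, not part of any library. It computes the uncorrected Pearson chi-square from the two columns and then applies the textbook definitions phi = sqrt(X^2 / n), C = sqrt(X^2 / (X^2 + n)) and V = sqrt(X^2 / (n * min(r - 1, c - 1))).

```python
import math
from collections import Counter


def assoc_stats(xs, ys):
    """Pearson chi-square (no continuity correction) and derived
    association measures for two equally long categorical sequences.

    Hypothetical helper for illustration; not part of RPy, vcd, or scipy.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = float(len(xs))

    cell = Counter(zip(xs, ys))  # observed count per (row, column) pair
    row = Counter(xs)            # row totals
    col = Counter(ys)            # column totals

    k = min(len(row), len(col)) - 1
    if k < 1:
        raise ValueError("each variable needs at least two distinct levels")

    # Sum (observed - expected)^2 / expected over every cell of the table
    chi2 = 0.0
    for r, r_total in row.items():
        for c, c_total in col.items():
            expected = r_total * c_total / n
            observed = cell.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),
        "contingency": math.sqrt(chi2 / (chi2 + n)),
        "cramers_v": math.sqrt(chi2 / (n * k)),
    }
```

With the lists from my code above this would be called as `assoc_stats(ls1, ls2)`. If I went through scipy instead, I would have to remember that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so `correction=False` would be needed to get the uncorrected Pearson statistic.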