Python, Dictionaries, And Chi-square Contingency Table
This is a problem I've been racking my brains on for a long time, so any help would be great. I have a file which contains several lines in the following format (word, time that th
Solution 1:
Your 4 numbers for apple/1 add up to 12, more than the total number of observations (11)! There are only 5 documents outside time '1' that don't contain the word 'apple'.
You need to partition the observations into 4 disjoint subsets:
a: apple and 1 => 3
b: not-apple and 1 => 2
c: apple and not-1 => 1
d: not-apple and not-1 => 5
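To see why the four cells have to be disjoint, the partition for 'apple' at time 1 can be checked by hand. This is only a minimal sketch: it uses the five sample records from the example data further down, and the helper name `two_by_two` is made up for illustration.

```python
# Sample records as (word, time, frequency), copied from the example data.
records = [
    ("apple", 1, 3),
    ("banana", 1, 2),
    ("apple", 2, 1),
    ("banana", 2, 4),
    ("orange", 3, 1),
]


def two_by_two(records, word, time):
    """Return (a, b, c, d) for one word/time pair (hypothetical helper)."""
    a = b = c = d = 0
    for w, t, n in records:
        if w == word and t == time:
            a += n  # word and time
        elif t == time:
            b += n  # not-word and time
        elif w == word:
            c += n  # word and not-time
        else:
            d += n  # not-word and not-time
    return a, b, c, d


cells = two_by_two(records, "apple", 1)
total = sum(n for _, _, n in records)
print(cells, sum(cells) == total)  # (3, 2, 1, 5) True
```

Every record falls into exactly one branch, so the four cells always add up to the grand total (11 here).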
Here is some code that shows one way of doing it:
from collections import defaultdict


class Crosstab(object):
    """Accumulates cell counts plus row, column, and grand totals."""

    def __init__(self):
        self.count = defaultdict(lambda: defaultdict(int))
        self.row_tot = defaultdict(int)
        self.col_tot = defaultdict(int)
        self.grand_tot = 0

    def add(self, r, c, n):
        self.count[r][c] += n
        self.row_tot[r] += n
        self.col_tot[c] += n
        self.grand_tot += n


def load_data(line_iterator, conv_funcs):
    # Each line is "row, column, count"; conv_funcs converts the three fields.
    ct = Crosstab()
    for line in line_iterator:
        r, c, n = [func(s) for func, s in zip(conv_funcs, line.split(','))]
        ct.add(r, c, n)
    return ct


def display_all_2x2_tables(crosstab):
    for rx in crosstab.row_tot:
        for cx in crosstab.col_tot:
            a = crosstab.count[rx][cx]          # rx and cx
            b = crosstab.col_tot[cx] - a        # not-rx and cx
            c = crosstab.row_tot[rx] - a        # rx and not-cx
            d = crosstab.grand_tot - a - b - c  # not-rx and not-cx
            assert all(x >= 0 for x in (a, b, c, d))
            print(",".join(str(x) for x in (rx, cx, a, b, c, d)))


if __name__ == "__main__":
    # input file
    # <word, time, frequency>
    lines = """\
apple, 1, 3
banana, 1, 2
apple, 2, 1
banana, 2, 4
orange, 3, 1""".splitlines()
    ct = load_data(lines, (str.strip, int, int))
    display_all_2x2_tables(ct)
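And here is the output, one line per word/time pair in the form word,time,a,b,c,d (the order of the lines depends on dictionary iteration order, so it may differ when you run it):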
orange,1,0,5,1,5
orange,2,0,5,1,5
orange,3,1,0,0,10
apple,1,3,2,1,5
apple,2,1,4,3,3
apple,3,0,1,4,6
banana,1,2,3,4,2
banana,2,4,1,2,4
banana,3,0,1,6,4
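Once you have a, b, c and d for a pair, the chi-square statistic follows directly from them. The sketch below assumes you want the plain Pearson statistic for a 2x2 table without continuity correction, using the shortcut formula N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). The function name `chi_square_2x2` is my own and not part of the code above.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Returns None when any marginal total is zero, because the statistic
    is undefined in that case.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None
    return n * (a * d - b * c) ** 2 / float(denom)


# apple at time 1, from the output above: a=3, b=2, c=1, d=5
print(round(chi_square_2x2(3, 2, 1, 5), 3))  # 2.213
```

With counts as small as in this sample the chi-square approximation is rough, so treat the number as illustrative rather than as a real test result.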