How To Create A Frequency Matrix?
Solution 1:
As far as I understand, you want to create a matrix that shows, for each pair of words, the number of lists in which the two words appear together.
First of all we should fix the set of unique words:
# `list` is a built-in name in Python; don't use it as a variable name
lst = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...]
words = set()
for sublst in lst:
    words |= set(sublst)
words = list(words)
Second, we should define a matrix of zeros:
result = [[0] * len(words) for _ in range(len(words))]  # N x N zeros matrix with independent rows
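A small sketch of why the rows of a zeros matrix need to be separate list objects in Python. The names `n`, `shared` and `independent` are only illustrative and are not part of the answer.

```python
n = 3

# Multiplying the outer list copies references: every row is the same object.
shared = [[0] * n] * n
shared[0][0] += 1
print(shared)       # the change shows up in every row

# A list comprehension builds a fresh row on each iteration.
independent = [[0] * n for _ in range(n)]
independent[0][0] += 1
print(independent)  # only the first row changes
```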
And finally we fill the matrix going through the given list:
for sublst in lst:
    sublst = list(set(sublst))  # selecting unique words only
    for i in range(len(sublst)):
        for j in range(i + 1, len(sublst)):
            index1 = words.index(sublst[i])
            index2 = words.index(sublst[j])
            result[index1][index2] += 1
            result[index2][index1] += 1
print(result)
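The three steps above can also be sketched as one self-contained function. This is a variant under a few assumptions rather than the answer's exact code: it targets Python 3, sorts the vocabulary so the row order is reproducible, and swaps the repeated `words.index(...)` lookups for a dictionary. The names `cooccurrence_matrix` and `index_of` are made up for illustration, and the sample input is borrowed from Solution 3.

```python
from itertools import combinations


def cooccurrence_matrix(lists):
    """Return (words, matrix); matrix[i][j] counts the lists containing both words."""
    words = sorted(set(w for sub in lists for w in sub))
    index_of = {w: i for i, w in enumerate(words)}  # word -> row/column index
    n = len(words)
    matrix = [[0] * n for _ in range(n)]  # independent rows
    for sub in lists:
        # each unordered pair of distinct words is counted once per list
        for a, b in combinations(sorted(set(sub)), 2):
            i, j = index_of[a], index_of[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return words, matrix


words, matrix = cooccurrence_matrix(
    [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
)
print(words)
for row in matrix:
    print(row)
```

The dictionary lookup avoids scanning the word list for every pair, which may matter once the vocabulary grows.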
Solution 2:
I find it hard to understand what you're really asking for, but I'll try by making some assumptions:
- (1) You have a list (A) containing other lists (b) of multiple words (w).
- (2) For each b-list in the A-list:
  - (3) For each w in b:
    - (3.1) count the total number of appearances of w across all of the b-lists
    - (3.2) count the number of b-lists in which w appears only once
If these assumptions are correct, then the table doesn't correspond correctly to the list you've provided. If my assumptions are wrong, then I still believe my solution may give you inspiration or some ideas on how to solve it correctly. Finally, I do not claim my solution to be optimal with respect to speed or similar.
Note: I use Python's built-in dictionaries, which may become terribly slow if you intend to fill them with thousands of words! Have a look at:
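frq_dict = {}  # num of appearances / frequency
uqe_dict = {}  # unique: num of lists in which the word appears only once

# list_A is the input: a list of word lists (the b-lists)
for list_b in list_A:
    temp_dict = {}  # appearances of each word within this list
    for word in list_b:
        if word in temp_dict:
            temp_dict[word] += 1
        else:
            temp_dict[word] = 1
    # frq is the number of appearances of word in list_b
    for word, frq in temp_dict.items():
        # (3.1) add this list's count to the overall total
        if word in frq_dict:
            frq_dict[word] += frq
        else:
            frq_dict[word] = frq
        # (3.2) the word appears only once in this list
        if frq == 1:
            if word in uqe_dict:
                uqe_dict[word] += 1
            else:
                uqe_dict[word] = 1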
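The same two counts, under assumptions (3.1) and (3.2) as listed above, can be sketched with `collections.Counter`. This is only one possible reading of the question; the function name `word_counts` is hypothetical and the sample input is borrowed from Solution 3.

```python
from collections import Counter


def word_counts(list_a):
    """Return (totals, once_only) for a list of word lists."""
    totals = Counter()     # word -> appearances across all lists
    once_only = Counter()  # word -> number of lists where it appears exactly once
    for list_b in list_a:
        per_list = Counter(list_b)
        totals.update(per_list)  # update() adds the per-list counts
        once_only.update(w for w, n in per_list.items() if n == 1)
    return totals, once_only


totals, once_only = word_counts(
    [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
)
print(dict(totals))
print(dict(once_only))
```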
Solution 3:
I managed to come up with the right answer to my own question:
lst = [["Word1","Word2","Word2"],["Word2", "Word3", "Word4"],["Word2","Word3"]]
# Names of all dicts
all_words = sorted(set([w for sublist in lst for w in sublist]))
# Creating the dicts
dicts = []
for i in all_words:
    dicts.append([i, dict.fromkeys([w for w in all_words if w != i], 0)])
# Updating the dicts
for l in lst:
    for word in sorted(set(l)):
        ind = [w[0] for w in dicts].index(word)
        for w in dicts[ind][1]:
            dicts[ind][1][w] += l.count(w)
print(dicts)
This gives the result:
[['Word1', {'Word4': 0, 'Word3': 0, 'Word2': 2}], ['Word2', {'Word4': 1, 'Word1': 1, 'Word3': 2}], ['Word3', {'Word4': 1, 'Word1': 0, 'Word2': 2}], ['Word4', {'Word1': 0, 'Word3': 1, 'Word2': 1}]]
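For comparison, a shorter sketch that builds the same counts as a dictionary of dictionaries. It assumes Python 3 and uses the hypothetical name `pair_counts`; the counting rule (for each word in a sublist, add `count(other)` for every other word) is taken from the code above.

```python
def pair_counts(lists):
    """Map each word to {other: times `other` occurs in the lists that contain the word}."""
    all_words = sorted(set(w for sub in lists for w in sub))
    counts = {w: {o: 0 for o in all_words if o != w} for w in all_words}
    for sub in lists:
        for word in set(sub):
            for other in counts[word]:
                counts[word][other] += sub.count(other)
    return counts


counts = pair_counts(
    [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
)
for word in sorted(counts):
    print(word, counts[word])
```

Note that this table is not symmetric: duplicates inside a sublist are counted, so `counts["Word1"]["Word2"]` is 2 while `counts["Word2"]["Word1"]` is 1.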
