
How To Create A Frequency Matrix?

I just started using Python and I just came across the following problem: Imagine I have the following list of lists: list = [['Word1','Word2','Word2','Word4566'],['Word2', 'Word3

Solution 1:

As far as I understand, you want to create a matrix that shows, for each pair of words, the number of lists in which the two words occur together.

First of all we should fix the set of unique words:

lst = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...] # list is the name of a built-in type in Python, so don't reuse it as a variable name

words = set()
for sublst in lst:
    words |= set(sublst)
words = list(words)

Second we should define a matrix with zeros:

result = [[0] * len(words) for _ in range(len(words))] # zeros matrix N x N; each row is a separate list
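
A note on building such a matrix, as a minimal sketch: in Python, multiplying an outer list repeats references to one and the same inner list, so each row has to be created separately if the rows are to be updated independently. The names `n`, `aliased` and `independent` are illustrative only and do not come from the answer.

```python
n = 3  # illustrative size; in the answer this would be len(words)

# Repeating the outer list copies references: all rows are the same object.
aliased = [[0] * n] * n
aliased[0][0] += 1

# A list comprehension builds a fresh row on every iteration.
independent = [[0] * n for _ in range(n)]
independent[0][0] += 1

print(aliased)      # the increment shows up in every row
print(independent)  # the increment shows up in the first row only
```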

And finally we fill the matrix going through the given list:

for sublst in lst:
    sublst = list(set(sublst)) # selecting unique words only
    for i in range(len(sublst)):
        for j in range(i + 1, len(sublst)):
            index1 = words.index(sublst[i])
            index2 = words.index(sublst[j])
            result[index1][index2] += 1
            result[index2][index1] += 1

print(result)
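
The same pair counting can also be sketched with the standard library's `collections.Counter` and `itertools.combinations`, assuming Python 3. The sample data is illustrative (the third sublist is made up), and `pair_counts` and `matrix` are names introduced here, not part of the answer above.

```python
from collections import Counter
from itertools import combinations

# Illustrative sample data; the third sublist is made up for this sketch.
lst = [["Word1", "Word2", "Word2", "Word4566"],
       ["Word2", "Word3", "Word4"],
       ["Word2", "Word3"]]

# Sorted list of unique words, so row and column order is predictable.
words = sorted({w for sublst in lst for w in sublst})

# Count each unordered pair once per sublist in which both words occur.
pair_counts = Counter()
for sublst in lst:
    pair_counts.update(combinations(sorted(set(sublst)), 2))

# Expand the pair counts into a symmetric N x N matrix (zero diagonal).
matrix = [[0 if a == b else pair_counts[tuple(sorted((a, b)))] for b in words]
          for a in words]

for word, row in zip(words, matrix):
    print(word, row)
```

Storing only the unordered pairs avoids the repeated `words.index` lookups, which may matter once the vocabulary grows.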

Solution 2:

I find it really hard to understand what you're really asking for, but I'll try by making some assumptions:

  • (1) You have a list (A), containing other lists (b) of multiple words (w).
  • (2) For each b-list in A-list
    • (3) For each w in b:
      • (3.1) count the total number of appearances of w in all of the b-lists
      • (3.2) count the number of b-lists in which w appears exactly once

If these assumptions are correct, then the table doesn't correspond correctly to the list you've provided. If my assumptions are wrong, then I still believe my solution may give you inspiration or some ideas on how to solve it correctly. Finally, I do not claim my solution to be optimal with respect to speed or similar.

Note: I use Python's built-in dictionaries. Lookups and insertions take constant time on average, so they remain practical even with many thousands of words. Have a look at: https://docs.python.org/2/tutorial/datastructures.html#dictionaries

    frq_dict = {} # num of appearances / frequency
    uqe_dict = {} # unique

    # list_A is the outer list (A) from the assumptions above
    for list_b in list_A:
        # count the appearances of each word within this b-list
        temp_dict = {}
        for word in list_b:
            if word in temp_dict:
                temp_dict[word] += 1
            else:
                temp_dict[word] = 1

        # frq is the number of appearances
        for word, frq in temp_dict.items():
            if frq > 1:
                if word in frq_dict:
                    frq_dict[word] += frq
                else:
                    frq_dict[word] = frq
            else:
                if word in uqe_dict:
                    uqe_dict[word] += 1
                else:
                    uqe_dict[word] = 1
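
The same bookkeeping can be sketched more compactly with `collections.Counter`, assuming Python 3. Here `list_A` is filled with the sample list that appears in Solution 3, purely as a stand-in input; the branching on `frq > 1` mirrors the code above.

```python
from collections import Counter

# Stand-in input for the outer list (A); this sample is an assumption.
list_A = [["Word1", "Word2", "Word2"],
          ["Word2", "Word3", "Word4"],
          ["Word2", "Word3"]]

frq_dict = Counter()  # summed counts for words repeated within a b-list
uqe_dict = Counter()  # number of b-lists in which a word appears exactly once

for list_b in list_A:
    # Counter(list_b) plays the role of temp_dict above.
    for word, frq in Counter(list_b).items():
        if frq > 1:
            frq_dict[word] += frq
        else:
            uqe_dict[word] += 1

print(dict(frq_dict))
print(dict(uqe_dict))
```

A `Counter` returns 0 for missing keys, which removes the need for the explicit `if word in ...` checks.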

Solution 3:

I managed to come up with the right answer to my own question:

lst = [["Word1","Word2","Word2"],["Word2", "Word3", "Word4"],["Word2","Word3"]]

# Names of all dicts
all_words = sorted(set(w for sublist in lst for w in sublist))

# Creating the dicts
dicts = []
for i in all_words:
    dicts.append([i, dict.fromkeys([w for w in all_words if w != i], 0)])

# Updating the dicts
for l in lst:
    for word in sorted(set(l)):
        ind = [w[0] for w in dicts].index(word)

        # add how often each other word occurs in this sublist
        for w in dicts[ind][1]:
            dicts[ind][1][w] += l.count(w)

print(dicts)

This gives the result:

[['Word1', {'Word4': 0, 'Word3': 0, 'Word2': 2}], ['Word2', {'Word4': 1, 'Word1': 1, 'Word3': 2}], ['Word3', {'Word4': 1, 'Word1': 0, 'Word2': 2}], ['Word4', {'Word1': 0, 'Word3': 1, 'Word2': 1}]]
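
To view these counts as an actual matrix, one possible sketch (assuming Python 3) stores them in a dict of dicts and prints one row per word. The names `lists`, `counts` and `width` are introduced here for illustration, and the `-` on the diagonal is just a placeholder for the pairing of a word with itself.

```python
from collections import Counter

lists = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
all_words = sorted({w for sub in lists for w in sub})

# counts[a][b]: total occurrences of b across the sublists that contain a
counts = {a: {b: 0 for b in all_words if b != a} for a in all_words}
for sub in lists:
    occurrences = Counter(sub)
    for a in occurrences:
        for b in counts[a]:
            counts[a][b] += occurrences[b]

# Print a header row, then one row per word.
width = max(len(w) for w in all_words) + 2
print("".ljust(width) + "".join(w.ljust(width) for w in all_words))
for a in all_words:
    row = [counts[a].get(b, "-") for b in all_words]
    print(a.ljust(width) + "".join(str(v).ljust(width) for v in row))
```

Keying the outer structure by word avoids rebuilding the list of names to find each index.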
