
Count The Number Occurrences Of Each Word In A Text - Python

I know that I can find a word in a text/array with this: if word in text: print 'success' What I want to do is read a word in a text, and keep counting as many times as the wo

Solution 1:

Now that we've established what you're trying to achieve, I can give you an answer. The first thing you need to do is convert the text into a list of words. While the split method might look like a good solution, it causes problems in the actual counting whenever a word is immediately followed by a full stop, a comma or any other punctuation, because the punctuation stays attached to the word. A good solution for this problem is NLTK. Assume that the text you have is stored in a variable called text. The code you are looking for would look something like this:

from itertools import chain
from collections import Counter
from nltk.tokenize import sent_tokenize, word_tokenize

text = "This is an example text. Let us use two sentences, so that it is more logical."
wordlist = list(chain(*[word_tokenize(s) for s in sent_tokenize(text)]))
print(Counter(wordlist))
# Counter({'.': 2, 'is': 2, 'us': 1, 'more': 1, ',': 1, 'sentences': 1, 'so': 1, 'This': 1, 'an': 1, 'two': 1, 'it': 1, 'example': 1, 'text': 1, 'logical': 1, 'Let': 1, 'that': 1, 'use': 1})
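Note that the counts above include punctuation tokens such as '.' and ','. If those are not wanted, they can be filtered out before counting. Here is a minimal sketch, assuming wordlist holds tokens like the ones produced above (hard-coded here so the snippet runs without NLTK). Lower-casing the tokens is an extra assumption of this sketch, not part of the original answer:

```python
from collections import Counter

# Tokens like those produced by the tokenization step above
# (hard-coded for illustration so that NLTK is not required here).
wordlist = ['This', 'is', 'an', 'example', 'text', '.', 'Let', 'us', 'use',
            'two', 'sentences', ',', 'so', 'that', 'it', 'is', 'more',
            'logical', '.']

# Keep only tokens containing at least one letter or digit, lower-cased.
words_only = [tok.lower() for tok in wordlist
              if any(ch.isalnum() for ch in tok)]

word_counts = Counter(words_only)
print(word_counts['is'])   # → 2
print('.' in word_counts)  # → False
```

Whether to lower-case depends on whether "This" and "this" should count as the same word.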

Solution 2:

sentence = 'a quick brown fox jumped a another fox'
words = sentence.split(' ')

Option 1:

result = {i: words.count(i) for i in set(words)}

Option 2:

result = {}
for word in words:
    result[word] = result.get(word, 0) + 1

Option 3:

from collections import Counter
result = dict(Counter(words))
Solution 3:

I would use one of these methods:

1) If the word doesn't contain spaces, but the text does, use

for piece in text.split(" "):
   ...

Then your word should occur at most once in each piece and be counted correctly. This fails if, for example, you want to count "Baden" twice in "Baden-Baden".

2) Use the string method 'find' to get not only whether the word is there, but where it is. Count it, and then continue searching from beyond that point. text.find(word) returns either a position, or -1.
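The find-based approach from 2) could be sketched roughly as follows. The function name count_occurrences is hypothetical, and the sketch counts non-overlapping matches only:

```python
def count_occurrences(text, word):
    """Count non-overlapping occurrences of word in text using str.find."""
    if not word:
        return 0  # an empty search string would otherwise match everywhere
    count = 0
    position = text.find(word)
    while position != -1:
        count += 1
        # Continue searching just past the match that was found.
        position = text.find(word, position + len(word))
    return count


print(count_occurrences("Baden-Baden lies in Baden", "Baden"))  # → 3
```

Keep in mind that this counts substrings, not whole words, so searching for "fox" would also match inside "foxes".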

Solution 4:

What I understand is that you want to keep track of the words already read, so that you can detect when you encounter a new word. Is that right? The easiest solution for that is to use a set, as it automatically removes duplicates. For instance:

known_words = set()
for word in text:
    if word not in known_words:
        print('found new word:', word)
    known_words.add(word)

On the other hand, if you need the exact number of occurrences for each word (this is called a "histogram" in maths), you have to replace the set with a dictionary:

histo = {}
for word in text:
    histo[word] = histo.get(word, 0) + 1
print(histo)

Note: In both solutions, I assume that text is an iterable of words, not a raw string. As other comments point out, str.split() is not totally safe for producing one.
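If all you have is a raw string, one possible way to get such an iterable of words is a regular expression. This is a sketch using the standard re module rather than anything from the answer itself, and the sample sentence is only an illustration:

```python
import re

raw_text = "This is an example text. Let us use two sentences, so that it is more logical."

# \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
words = re.findall(r"\w+", raw_text.lower())

histo = {}
for word in words:
    histo[word] = histo.get(word, 0) + 1

print(histo["is"])  # → 2
```

This pattern is crude: it splits contractions such as "don't" into "don" and "t", so it may need adjusting for real text.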

Solution 5:

Several options can be used, but I suggest you do the following:

  • Replace special characters in your text with spaces in order to normalize it.
  • Split the cleaned text.
  • Use collections.Counter

And the code will look like...

from collections import Counter

my_text = "Lorem ipsum; dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut. labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

# Replace each special character with a space so split() sees clean words.
special_characters = ',.;'
for char in special_characters:
    my_text = my_text.replace(char, ' ')

print(Counter(my_text.split()))

I believe the safer approach would be to use the answer with NLTK, but sometimes, understanding what you are doing feels great.
