How Do I Count All Occurrences Of A Phrase In A Text File Using Regular Expressions?

I am reading in multiple files from a directory and attempting to find how many times a specific phrase (in this instance 'at least') occurs in each file (not just that it occurs,

Solution 1:

You can get rid of the regex entirely; the count method of string objects is enough. Much of the other code can be simplified as well.

You're also not changing data to lower case, only printing a lower-case copy of it. Note how I use data = data.lower() to actually change the variable.

Try this code:

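import glob
import os

# Raw string, so the backslashes in the Windows path are taken literally.
path = r'c:\script\lab\Tests'

k = 0  # number of files that contain the phrase at least once

# The surrounding spaces mean the phrase is missed at the very start or
# end of the text and directly before punctuation.
substring = ' at least '
for filename in glob.glob(os.path.join(path, '*.txt')):
    # glob already limits the results to .txt files.
    with open(filename) as f:  # the with block closes the file
        data = f.read()
    data = data.lower()  # lower() returns a new string, so keep the result
    count = data.count(substring)  # non-overlapping occurrences in this file
    if count:
        k = k + 1
        print("'{}' match".format(filename), count)
    else:
        print("'{}' no match".format(filename))
print("Number of files with a match", k)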

If anything is unclear feel free to ask!
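
One caveat with the space-padded substring is worth checking on a small string before running the script over a directory. The sketch below uses a made-up sample text (not taken from the question) to compare counting ' at least ' with counting 'at least'. The padded form misses the phrase at the start of the text and before punctuation, while the unpadded form also matches inside "that least".

```python
# Hypothetical sample text, invented for illustration only.
# The phrase "at least" really occurs three times in it.
sample = "At least one.\nWe need at least two, at least.\nNot that least."

lowered = sample.lower()  # str.lower() returns a new string

# Requires a space on both sides of the phrase.
padded_count = lowered.count(' at least ')

# Plain substring match, which also fires inside "that least".
plain_count = lowered.count('at least')

print(padded_count)
print(plain_count)
```

Neither number equals three here, so which form is acceptable depends on how the phrase appears in your files.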


Solution 2:

You make multiple mistakes in your code. data.split() and data.lower() have no effect at all, since neither modifies data; each returns a modified copy. Because you don't assign the return value to anything, it is lost. Also, you should always close a resource (e.g. a file) when you don't need it anymore.
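
Both points can be checked in a few lines. This is a minimal sketch with an invented string and a temporary file (the name demo.txt is hypothetical): it shows that calling lower() or split() without an assignment leaves the variable untouched, and that a with block closes the file for you.

```python
import os
import tempfile

data = "At Least once"
data.lower()   # the lower-case copy is discarded
data.split()   # the list of words is discarded
unchanged = data          # still "At Least once"

data = data.lower()       # rebinding the name keeps the result

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write(data)
    with open(path) as f:
        text = f.read()
    # Leaving the with block has already closed the file.
    closed_after_block = f.closed

print(unchanged, '|', data, '|', closed_after_block)
```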

You also append every string found by re.search to a list S that you never use again. It would be pointless anyway, because it would only contain the string you are looking for, repeated once per match. Instead, take the list returned by re.findall and compute its length, which gives the number of times the phrase occurs in the text. Then increase your counter variable k by that amount and move on to the next file. You can still have your print statements by printing the temporary num_found variable.

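import re
import glob
import os

path = 'D:/Test'

k = 0  # total number of matches across all files

for filename in glob.glob(os.path.join(path, '*.txt')):
    # glob already limits the results to .txt files.
    # The with block closes the file once it has been read.
    with open(filename) as f:
        text = f.read()
    # findall returns one list entry per non-overlapping match.
    # re.IGNORECASE stands in for lower-casing the text first.
    num_found = len(re.findall(r' at least ', text, re.IGNORECASE))
    k += num_found
    print("'{}': {}".format(filename, num_found))
print("Total number of matches", k)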
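
If the phrase can also appear at the start of a line, before punctuation, or split across a line break, a pattern with word boundaries may be a better fit than the space-padded literal. The following is a sketch, not part of either answer: count_phrase and PHRASE are hypothetical names, and the sample text is invented.

```python
import re

# \b is a word boundary, and \s+ allows any whitespace (including a line
# break) between the two words. re.IGNORECASE makes lower-casing unnecessary.
PHRASE = re.compile(r'\bat\s+least\b', re.IGNORECASE)


def count_phrase(text):
    """Return the number of non-overlapping matches of the phrase in text."""
    return len(PHRASE.findall(text))


# Hypothetical sample: three real occurrences, plus "that least",
# which should not count.
sample = "At least one.\nWe need at\nleast two, at least.\nNot that least."
print(count_phrase(sample))
```

Inside the file loop, num_found = count_phrase(text) would take the place of the re.findall line.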
