
Python Finding Most Common Pattern In List Of Strings

I have a large list of API calls stored as strings, which have been stripped of all common syntax ('http://', '.com', '.', etc.). I would like to return a dictionary of the most c

Solution 1:

Use collections.Counter: join the calls and split on the dot, count the pieces, then build the result with a dict comprehension.

>>> from collections import Counter
>>> calls = ['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']
>>> l = '.'.join(calls).split(".")
>>> d = Counter(l)
>>> {k: v for k, v in d.most_common(2)}
{'admob': 3, 'api': 2}
>>> {k: v for k, v in d.most_common(4)}
{'admob': 3, 'api': 2, 'oauthcert': 1, 'newsession': 1}
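The same idea can be wrapped in a small reusable function. This is only a minimal sketch: `top_tokens` is a hypothetical name, and `itertools.chain.from_iterable` stands in for the join-then-split step so that each call is split on its own.

```python
from collections import Counter
from itertools import chain


def top_tokens(calls, n):
    """Return a dict of the n most frequent dot-separated tokens."""
    # Split every call on "." and flatten the pieces into one stream of tokens.
    tokens = chain.from_iterable(call.split(".") for call in calls)
    return dict(Counter(tokens).most_common(n))


calls = ['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']
print(top_tokens(calls, 2))  # {'admob': 3, 'api': 2}
```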

Or

>>> import re
>>> from collections import Counter
>>> d = re.findall(r'\w+', "['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']")
>>> {k: v for k, v in Counter(d).most_common(2)}
{'admob': 3, 'api': 2}

Or

>>> from collections import Counter
>>> import re
>>> s = "['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']"
>>> w = [i for sb in re.findall(r'(?=(mob)|(api)|(admob))', s) for i in sb]  # change (mob)|(api)|(admob) to the patterns you want
>>> {k: v for k, v in Counter(filter(bool, w)).most_common()}
{'admob': 3, 'mob': 3, 'api': 2}
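The lookahead is what allows overlapping hits such as 'admob' and 'mob' to both be counted. One caveat: with a single alternation, only the first alternative that matches at a given position is reported, so two patterns starting at the same character would not both be counted. A possible way around that, sketched here with a hypothetical helper `count_patterns`, is to run one lookahead per pattern:

```python
import re
from collections import Counter


def count_patterns(strings, patterns):
    """Count occurrences of each known pattern, overlaps included."""
    counts = Counter()
    for pattern in patterns:
        # A zero-width lookahead matches at every start position without
        # consuming text, so overlapping hits are all found.
        finder = re.compile('(?=' + re.escape(pattern) + ')')
        for s in strings:
            counts[pattern] += len(finder.findall(s))
    return dict(counts)


calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']
print(count_patterns(calls, ['admob', 'mob', 'api']))
# {'admob': 3, 'mob': 3, 'api': 2}
```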

Solution 2:

Is this what you wanted? It gives the most common tokens after splitting each string on a dot.

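calls = ['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']
from collections import Counter
from functools import reduce  # reduce is not a builtin in Python 3
# Split each call on "." and concatenate the pieces into one list, then count them
Counter(reduce(lambda x, y: x + y, map(lambda x: x.split("."), calls))).most_common(2)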

O/P: [('admob', 3), ('api', 2)]

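# Keep only the tokens that occur more than once (list() is needed in Python 3, where filter is lazy)
list(filter(lambda x: x[1] > 1, Counter(reduce(lambda x, y: x + y, map(lambda x: x.split("."), calls))).most_common()))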
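Since the question asks for a dictionary, the filter above could also be written as a dict comprehension. A short sketch, with `counts` and `repeated` as illustrative names:

```python
from collections import Counter

calls = ['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']

# Count every dot-separated token across all calls.
counts = Counter(token for call in calls for token in call.split("."))

# Keep only the tokens that occur more than once.
repeated = {token: n for token, n in counts.items() if n > 1}
print(repeated)  # {'admob': 3, 'api': 2}
```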

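Update: I don't know if this would work for you. It counts every substring of every string and keeps those longer than two characters that occur more than once: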

calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercamp']
# Every substring z[g:x+1] of every call (x is the end index, g the start index)
subs = [z[g:x + 1] for z in calls for x in range(len(z)) for g in range(x + 1)]
# Substrings longer than two characters that occur more than once
[p for p in Counter(subs).most_common() if p[1] > 1 and len(p[0]) > 2]

O/P:

[('admo', 3), ('admob', 3), ('adm', 3), ('mob', 3), ('dmob', 3), ('dmo', 3), ('bapi', 2), ('dmobapi', 2), ('dmoba', 2), ('api', 2), ('obapi', 2), ('admobap', 2), ('admoba', 2), ('mobap', 2), ('dmobap', 2), ('bap', 2), ('mobapi', 2), ('moba', 2), ('obap', 2), ('oba', 2), ('admobapi', 2)]
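When the patterns are not known in advance, the substring count can be written as a small function. This is a sketch under assumptions, not the code above: `common_substrings`, `min_len` and `min_count` are hypothetical names, and it counts the number of strings that contain a substring rather than the total number of occurrences. For this input the two happen to coincide.

```python
from collections import Counter


def common_substrings(strings, min_len=3, min_count=2):
    """Map each substring of at least min_len characters to the number of
    strings containing it, keeping those found in at least min_count strings."""
    counts = Counter()
    for s in strings:
        # A set counts a substring once per string, even if it repeats inside it.
        seen = {s[i:j]
                for i in range(len(s))
                for j in range(i + min_len, len(s) + 1)}
        counts.update(seen)
    return {sub: n for sub, n in counts.most_common() if n >= min_count}


calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercamp']
result = common_substrings(calls)
print(result['admob'], result['api'])  # 3 2
print(max(result, key=len))            # admobapi
```

The number of substrings grows quadratically with the length of each string, so this is only practical for short strings or modest lists.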
