Hashtag Counter Python

October 29, 2022

I am a Python beginner. For an exercise, I have to write a Python function that scans a list of strings, counts how many times each hashtag appears, and puts the counts into a dictionary.

Solution 1:

Fundamentally, your function doesn’t work because this line

hash_index = post_string.find(char)

will always find the index of the first hashtag in the string, no matter which occurrence the loop has reached. This could be fixed by passing a start index to str.find or, better, by not calling str.find at all and instead tracking the index while iterating over the string (you can use enumerate for that). Better yet, don't use an index: you don't need one if you restructure your parser as a state machine.
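To illustrate the first two fixes, here is a minimal sketch. The original function is not shown in this thread, so the helper names hash_positions and hash_positions_enumerate are hypothetical; only hash_index, post_string and char are taken from the quoted line.

```python
def hash_positions(post_string):
    """Return the index of every '#' using str.find with a start index."""
    positions = []
    start = 0
    while True:
        # Resume the search just past the previous match instead of
        # always searching from the beginning of the string.
        hash_index = post_string.find("#", start)
        if hash_index == -1:  # str.find returns -1 when nothing is found
            break
        positions.append(hash_index)
        start = hash_index + 1
    return positions


def hash_positions_enumerate(post_string):
    """Same result without str.find: enumerate yields (index, char) pairs."""
    return [i for i, char in enumerate(post_string) if char == "#"]


print(hash_positions("spend my #weekend in #zurich"))
print(hash_positions_enumerate("spend my #weekend in #zurich"))
```

Both calls report the two distinct positions, whereas a bare post_string.find(char) would return the first one every time.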

That said, a Pythonic implementation would replace the whole function with a regular expression, which would make it drastically shorter, correct, more readable, and likely more efficient.

Solution 2:

This should work:

import string
alpha = string.ascii_letters + string.digits

def analyze(posts):
    hashtag_dict = {}

    for post in posts:
        for i in post.split():
            if i[0] == '#':
                current_hashtag = sanitize(i[1:])

                if len(current_hashtag) > 0:
                    if current_hashtag in hashtag_dict:
                        hashtag_dict[current_hashtag] += 1
                    else:
                        hashtag_dict[current_hashtag] = 1

    return hashtag_dict


def sanitize(s):
    s2 = ''
    for i in s:
        if i in alpha:
            s2 += i
        else:
            break
    return s2


posts = [
        "hi #weekend",
        "good morning #zurich #limmat",
        "spend my #weekend in #zurich",
        "#zurich <3",
        "#lindehof4Ever(lol)"
        ]

print(analyze(posts))

Solution 3:

With your help, I managed to get 2.75 points out of 4. Thanks a lot! I didn't copy-paste any of your solutions into the correction tool, I used my own version that I tried to improve with your suggestions. (I am sure if I posted any of your solutions I would've gotten 4/4.)

According to them, the official solution would have been:

def analyze(posts):
    tags = {}

    for post in posts:
        curHashtag = None
        for c in post:
            is_allowed_char = c.isalnum()

            if curHashtag != None and not is_allowed_char:
                if len(curHashtag) > 0 and not curHashtag[0].isdigit():
                    if curHashtag in tags.keys():
                        tags[curHashtag] += 1
                    else:
                        tags[curHashtag] = 1
                curHashtag = None

            if c == "#":
                curHashtag = ""
                continue

            if c.isalnum() and curHashtag != None:
                curHashtag += c

        if curHashtag != None:
            if len(curHashtag) > 0 and not curHashtag[0].isdigit():
                if curHashtag in tags.keys():
                    tags[curHashtag] += 1
                else:
                    tags[curHashtag] = 1

    return tags

This is of course not an elegant solution, but a solution using exclusively what we have learned so far. Maybe this helps another beginner, who wants to use the tools they have to solve this exercise.
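For anyone who wants to tidy this up while staying with basic tools, here is one possible variant. It is only a sketch, not the official solution. It assumes the same rules (alphanumeric characters only, no leading digit) and removes the duplicated counting block by appending a space as a sentinel, so that a hashtag at the end of a post is flushed inside the loop. The name current and the use of dict.get are my own choices.

```python
def analyze(posts):
    tags = {}
    for post in posts:
        current = None  # None means "not inside a hashtag"
        # The trailing space is a sentinel: it ends a hashtag that
        # runs up to the very end of the post.
        for c in post + " ":
            if c.isalnum() and current is not None:
                current += c
                continue
            # Any other character ends the hashtag in progress; count it
            # only if it is non-empty and does not start with a digit.
            if current and not current[0].isdigit():
                tags[current] = tags.get(current, 0) + 1
            # '#' opens a new hashtag, anything else leaves hashtag mode.
            current = "" if c == "#" else None
    return tags


posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3",
    "#lindehof4Ever(lol)",
]
print(analyze(posts))
```

With these posts, the function counts weekend twice, zurich three times, and limmat and lindehof4Ever once each.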


Solution 4:

This task can be done with regular expressions, so don't be afraid to use them ;) Here is a quick solution.

#!/usr/bin/python3.4
import re

PATTERN = re.compile(r'#(\w+)')
posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]

container = {}
for post in posts:
    for element in PATTERN.findall(post):
        container[element] = container.get(element, 0) + 1
print(container)

Result:

{'zurich': 3, 'limmat': 1, 'weekend': 2}

EDIT

I would also like to show a version that uses Counter from collections.

#!/usr/bin/python3.4
import re
from collections import Counter

PATTERN = re.compile(r'#(\w+)')
posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]

words = [word for post in posts for word in PATTERN.findall(post)]

counted = Counter(words)
print(counted)

# Result: Counter({'zurich': 3, 'weekend': 2, 'limmat': 1})
