How to Extract URLs from a Pandas DataFrame?

February 02, 2024

I need to extract URLs from a column of a DataFrame that was created from the following values:

creation_date,tweet_id,tweet_text
2020-06-06 03:01:37,1269102116364324865,#Webinar: Sig

Solution 1:

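The main problem is that your URL pattern contains capturing groups where you need non-capturing ones. You need to replace every group-opening ( with (?: in the pattern. Literal parentheses inside character classes, such as [a-zA-Z0-9()], stay as they are.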
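To illustrate the difference, here is a minimal sketch using Python's re module. The original pattern from the question is not shown here, so both patterns below are simplified, hypothetical stand-ins, and the sample text is made up.

```python
import re

# Made-up sample text for illustration only.
text = "Join us at https://www.example.com/webinar today"

# Hypothetical simplified patterns, not the pattern from the question.
capturing = r"https?://(www\.)?[\w.-]+(/\S*)?"
non_capturing = r"https?://(?:www\.)?[\w.-]+(?:/\S*)?"

# With capturing groups, findall returns the groups, not the whole match.
groups_only = re.findall(capturing, text)

# With non-capturing groups, findall returns the whole matched URL.
full_matches = re.findall(non_capturing, text)

print(groups_only)
print(full_matches)
```

The first call yields only the pieces captured by the groups, which is usually not what is wanted when the goal is the full URL.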

However, that alone is not enough, because str.extract requires at least one capturing group in the pattern in order to return any value at all. Thus, you need to wrap the whole pattern in a single capturing group.
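The requirement can be checked with a small sketch. The Series contents and the short pattern https?://\S+ are assumptions chosen for brevity, not the pattern recommended in this answer.

```python
import pandas as pd

# Made-up sample data for illustration only.
s = pd.Series(["see https://example.com now", "no link here"])

# Without a capturing group, str.extract has nothing to return
# and raises ValueError.
try:
    s.str.extract(r"https?://\S+")
    raised = False
except ValueError:
    raised = True

# Wrapping the whole pattern in one capturing group yields one value per row
# (NaN where nothing matched). expand=False returns a Series.
urls = s.str.extract(r"(https?://\S+)", expand=False)

print(raised)
print(urls)
```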

You may use

pattern = r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()@:%_+.~#?&/=]*)'

Note that + does not need to be escaped inside a character class. Also, there is no need to write // inside a character class; a single / is enough, since repeating a character in a class has no effect.
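As a usage sketch, the pattern can be applied to a tweet_text column like this. The DataFrame contents and the new column name url are assumptions for illustration, since the question's data is truncated.

```python
import pandas as pd

pattern = r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()@:%_+.~#?&/=]*)'

# Made-up rows; only the column name tweet_text comes from the question.
df = pd.DataFrame({
    "tweet_text": [
        "#Webinar: register at https://www.example.com/events?id=42 today",
        "No link in this tweet",
    ]
})

# expand=False returns a Series, which can be assigned to a new column.
# Rows without a match get NaN.
df["url"] = df["tweet_text"].str.extract(pattern, expand=False)

print(df["url"])
```

Keep in mind that str.extract returns only the first match in each row. If a tweet may contain several URLs, str.extractall or str.findall is likely the better fit.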

