How to Extract URLs from a Pandas DataFrame?
I need to extract URLs from a column of a DataFrame that was created from the following values: creation_date,tweet_id,tweet_text 2020-06-06 03:01:37,1269102116364324865,#Webinar: Sig
Solution 1:
The main problem is that your URL pattern contains capturing groups where you need non-capturing ones. You need to replace every ( with (?: in the pattern.
However, that alone is not enough, since str.extract requires at least one capturing group in the pattern in order to return any value at all. Thus, you need to wrap the whole pattern in a capturing group.
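To illustrate the point about groups, here is a minimal sketch. The sample string and the shortened \S+ pattern are made up for illustration and are not the pattern from the question; only the group behaviour of str.extract matters here.

```python
import pandas as pd

# Hypothetical one-row Series, used only to show how groups affect str.extract
s = pd.Series(["see http://www.example.org now"])

# 1) No capturing group at all: str.extract refuses the pattern
try:
    s.str.extract(r'https?://(?:www\.)?\S+')
    raised = False
except ValueError:
    raised = True

# 2) Only an inner capturing group: the result is that group, not the URL
inner_only = s.str.extract(r'https?://(www\.)?\S+', expand=False)

# 3) Inner group made non-capturing, whole pattern wrapped in one group
whole = s.str.extract(r'(https?://(?:www\.)?\S+)', expand=False)

print(raised)
print(inner_only.iloc[0])
print(whole.iloc[0])
```

With expand=False and a single group, the result is a Series rather than a one-column DataFrame.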
You may use
pattern = r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()@:%_+.~#?&/=]*)'
Note that + does not need to be escaped inside a character class. Also, there is no need to use // inside a character class; one / is enough.
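A short sketch of how the pattern might be applied. The DataFrame contents below are invented sample data (the original tweets are not shown in full), and the url column name is an assumption.

```python
import pandas as pd

pattern = r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()@:%_+.~#?&/=]*)'

# Hypothetical sample rows standing in for the tweet_text column
df = pd.DataFrame({
    "tweet_text": [
        "Join our #Webinar at https://example.com/events?id=1 today",
        "No link in this tweet",
    ]
})

# expand=False returns a Series because the pattern has exactly one group;
# rows without a match get NaN
df["url"] = df["tweet_text"].str.extract(pattern, expand=False)

print(df["url"])
```

str.extract returns only the first match in each row. If a tweet may contain several URLs, str.findall or str.extractall with the same pattern would return all of them.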