
Removing From A String All The Characters Included Between Two Specific Characters In Python

What's a fast way in Python to remove all the characters between two specific characters from a string?

Solution 1:

You can use this regular expression: \(.*?\). Demo here: https://regexr.com/3jgmd

Then you can remove the part with this code:

import re
test_string = 'This is a string (here is a text to remove), and here is a text not to remove'
new_string = re.sub(r" \(.*?\)", "", test_string)
print(new_string)
# This is a string, and here is a text not to remove

This regular expression (regex) looks for a space followed by any text (without line breaks) enclosed in parentheses. The non-greedy *? makes each match stop at the first closing parenthesis.
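
To see why the non-greedy *? matters, here is a minimal sketch comparing it with a greedy .* on a string that contains two parenthesized parts (the sample text is made up for illustration):

```python
import re

text = 'keep (drop this) keep too (drop that) end'

# Greedy: .* runs on to the LAST ')' and swallows the text between the groups
greedy = re.sub(r" \(.*\)", "", text)
# Non-greedy: .*? stops at the FIRST ')' after each ' ('
lazy = re.sub(r" \(.*?\)", "", text)

print(greedy)  # keep end
print(lazy)    # keep keep too end
```

With the greedy version, everything from the first ( to the last ) disappears, including "keep too".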


Solution 2:

You will most probably use a regular expression like

\s*\([^()]*\)\s*

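for that (see a demo on regex101.com).
The expression removes everything in parentheses together with the surrounding whitespace. Because [^()]* excludes parentheses, it only matches pairs that contain no other parentheses (the innermost ones).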


In Python this could be:
import re
test_string = 'This is a string (here is a text to remove), and here is a text not to remove'
new_string = re.sub(r'\s*\([^()]*\)\s*', '', test_string)
print(new_string)
# This is a string, and here is a text not to remove
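
One side effect of consuming the whitespace on both sides: when the parentheses sit between two words, those words end up glued together. A minimal sketch on a made-up sentence, contrasting an empty replacement with a single-space replacement (the latter is only one possible workaround; on the test string above it would leave a space before the comma instead):

```python
import re

pattern = r'\s*\([^()]*\)\s*'
text = 'keep this (drop this) and this'

# Empty replacement: the spaces on both sides go too, so "this" and "and" merge
glued = re.sub(pattern, '', text)
# Single-space replacement: keeps the neighbouring words apart
spaced = re.sub(pattern, ' ', text)

print(glued)   # keep thisand this
print(spaced)  # keep this and this
```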


However, for learning purposes, you could just as well use the built-in string methods:
test_string = 'This is a string (here is a text to remove), and here is a text not to remove'
left = test_string.find('(')
right = test_string.find(')', left)

# str.find() returns -1 when nothing is found (and 0 is a valid index), so compare explicitly
if left != -1 and right != -1:
    new_string = test_string[:left] + test_string[right+1:]
    print(new_string)
    # This is a string , and here is a text not to remove

Problem with the latter: it only removes the first occurrence and does not remove the surrounding whitespace, but it runs faster.
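
With built-in methods only, the multiple-occurrence limitation can be lifted by repeating the find() calls from where the previous match ended. A minimal sketch; remove_all_between is a hypothetical helper name, and it assumes non-nested pairs and leaves an unmatched opening character untouched. Like the version above, it does not tidy up whitespace:

```python
def remove_all_between(text, start='(', end=')'):
    """Remove every start...end span, delimiters included, using str.find only.

    Hypothetical helper: assumes pairs are not nested.
    """
    pieces = []
    pos = 0
    while True:
        left = text.find(start, pos)
        if left == -1:
            break  # no more opening characters
        right = text.find(end, left + 1)
        if right == -1:
            break  # unmatched opening character: keep the rest as-is
        pieces.append(text[pos:left])  # keep the text before this span
        pos = right + 1                # continue after the closing character
    pieces.append(text[pos:])
    return ''.join(pieces)


print(remove_all_between('a (b) c (d) e'))  # a  c  e
```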


Executing the regex solution and the find()-based solution 100k times each, the measurements yield:
0.578398942947 # regex solution
0.121736049652 # non-regex solution
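
The answer does not show how these numbers were measured. A sketch of one way to run such a comparison with the standard-library timeit module; the function names are hypothetical, and absolute figures will differ between machines and Python versions:

```python
import re
import timeit

TEST_STRING = ('This is a string (here is a text to remove), '
               'and here is a text not to remove')


def with_regex():
    # Same pattern as the regex solution above
    return re.sub(r'\s*\([^()]*\)\s*', '', TEST_STRING)


def with_find():
    # Same logic as the built-in methods solution above
    left = TEST_STRING.find('(')
    right = TEST_STRING.find(')', left)
    if left != -1 and right != -1:
        return TEST_STRING[:left] + TEST_STRING[right + 1:]
    return TEST_STRING


if __name__ == '__main__':
    for name, func in (('regex', with_regex), ('non-regex', with_find)):
        # timeit.timeit returns the total time in seconds for all runs
        seconds = timeit.timeit(func, number=100_000)
        print(f'{seconds:.6f}  # {name} solution')
```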

Solution 3:

To remove all text in parentheses, you can use the findall() function from re and then remove each match with str.replace():

import re
test_string = 'This is a string (here is a text to remove), and here is a (second one) text not to remove'
remove = re.findall(r" \(.*?\)", test_string)
for r in remove:
    test_string = test_string.replace(r, '')
print(test_string)
# result: This is a string, and here is a text not to remove
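
All three answers hard-code parentheses, while the question asks about any two specific characters. A minimal sketch that generalizes the non-greedy pattern from Solution 1 with re.escape(), so that regex metacharacters such as [ or * are matched literally; strip_delimited is a hypothetical name, and the sketch assumes the pairs are not nested:

```python
import re


def strip_delimited(text, start, end):
    """Remove every start...end span, delimiters included (non-nested pairs)."""
    # re.escape makes characters like '(', '[' or '*' match literally
    pattern = re.escape(start) + '.*?' + re.escape(end)
    return re.sub(pattern, '', text)


print(strip_delimited('a [b] c [d] e', '[', ']'))  # a  c  e
print(strip_delimited('x*y*z', '*', '*'))          # xz
```

Since . does not match a newline by default, pass flags=re.DOTALL to re.sub if a span can cross line breaks.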
