Splitting A Python String
I have a string in Python that I want to split in a very particular manner. I want to split it into a list containing each separate word, except for the case when a group of words
Solution 1:
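There is no out-of-the-box solution for this, but the following function should handle it. It splits the string on the pipe-delimited groups (with an optional leading minus sign), keeps each group as a single entry, and splits everything in between into individual words.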
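import re

def extract_groups(s):
    # The capturing group makes re.split keep the matched groups in the result.
    separator = re.compile(r"(-?\|[\w ]+\|)")
    components = separator.split(s)
    groups = []
    for component in components:
        component = component.strip()
        if len(component) == 0:
            continue
        elif component[0] in ['-', '|']:
            # A pipe-delimited group (possibly with a leading minus): drop the pipes only.
            groups.append(component.replace('|', ''))
        else:
            # Plain text between groups: one entry per word.
            groups.extend(component.split())
    return groups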
Using your examples:
>>> extract_groups('Jimmy threw his ball through the window.')
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his ball| through the window.')
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his| ball -|through the| window.')
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
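The function depends on re.split keeping the separators in its result when the pattern contains a capturing group. As a small illustration (the variable name parts is only for this sketch), this is the intermediate list the loop iterates over for the last example:

```python
import re

# Same pattern as in extract_groups; the outer parentheses form a capturing
# group, so re.split returns the matched groups along with the text between them.
separator = re.compile(r"(-?\|[\w ]+\|)")

parts = separator.split('Jimmy |threw his| ball -|through the| window.')
print(parts)
# ['Jimmy ', '|threw his|', ' ball ', '-|through the|', ' window.']
```

Each element is then stripped and either kept whole (if it starts with - or |) or split into words.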
Solution 2:
There's probably some regular expression solving your problem. You might get the idea from the following example:
import re
s = 'Jimmy -|threw his| ball |through the| window.'
r = re.findall(r'-?\|.+?\||[\w\.]+', s)
print(r)
print([i.replace('|', '') for i in r])
Output:
['Jimmy', '-|threw his|', 'ball', '|through the|', 'window.']
['Jimmy', '-threw his', 'ball', 'through the', 'window.']
Explanation:
-? : optional minus sign
\|.+?\| : pipes with at least one character in between
| : or
[\w\.]+ : at least one "word" character or .
If , or ' can appear in the original string, the expression needs some fine-tuning.
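One possible way to do that fine-tuning, as a sketch rather than the answer's own code: replace the [\w\.]+ alternative with [^\s|]+, so that any run of characters that is neither whitespace nor a pipe counts as a word. The sample sentence below is made up for illustration.

```python
import re

# Hypothetical input containing an apostrophe and a comma.
s = "Jimmy's -|threw his| ball, |through the| window."

# Second alternative changed from [\w\.]+ to [^\s|]+ (assumption: every
# non-whitespace, non-pipe character belongs to a word).
r = re.findall(r"-?\|.+?\||[^\s|]+", s)
result = [i.replace('|', '') for i in r]
print(result)
# ["Jimmy's", '-threw his', 'ball,', 'through the', 'window.']
```

Punctuation stays attached to the neighbouring word here, just as the full stop stays attached to window. in the original output.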
Solution 3:
You can parse that format using a regex, although your choice of delimiter makes it rather an ugly one!
This code finds all sequences that consist either of a pair of pipe characters | enclosing zero or more non-pipe characters, or of one or more characters that are neither pipes nor whitespace.
import re
text = 'Jimmy |threw his| ball -|through the| window.'
for seq in re.finditer(r' \| [^|]* \| | [^|\s]+ ', text, flags=re.X):
    print(seq.group())
Output:
Jimmy
|threw his|
ball
-
|through the|
window.
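Note that the minus sign comes out as a separate item in the output above. If it should stay attached to the group that follows it and the pipes should be dropped, as in the other solutions, a variant along these lines might work. This is a sketch under that assumption; the names text, pattern and words are only illustrative.

```python
import re

text = 'Jimmy |threw his| ball -|through the| window.'

# Verbose pattern: whitespace and # comments inside it are ignored (re.X).
pattern = re.compile(r'''
    -? \| [^|]* \|    # optional minus sign, then a pipe-delimited group
  | [^|\s]+           # or a run of non-pipe, non-whitespace characters
''', flags=re.X)

# Strip the pipes from each match, keeping a leading minus sign if present.
words = [m.group().replace('|', '') for m in pattern.finditer(text)]
print(words)
# ['Jimmy', 'threw his', 'ball', '-through the', 'window.']
```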