Regex To Extract Nested Patterns
Solution 1:
C# (.NET) regular expressions can match nested constructs using balancing groups, but Python's built-in re module cannot. You could re-run the regex search on previous results, but that is probably less efficient than writing a small custom parser, because the regex engine adds overhead to what is a very simple search. The text you're searching for, "[@" and "]", isn't very complex.
Here's a custom parser (in JavaScript) that would do the job.
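const txt = "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet";

// Returns every "[@ ... ]" group, including nested ones, innermost first.
function parse(s) {
  const stack = [];  // partial text of each currently open "[@" group
  const result = []; // completed groups, in the order they close
  for (let x = 0; x < s.length; x++) {
    const c = s.charAt(x);
    if (c === '[' && x + 1 < s.length && s.charAt(x + 1) === '@') {
      // "[@" opens a group: add it to every open group, then start a new one.
      for (let y = 0; y < stack.length; y++)
        stack[y] += "[@";
      stack.push("[@");
      x++; // skip the '@'
    } else if (c === ']' && stack.length > 0) {
      // "]" closes the innermost open group.
      for (let y = 0; y < stack.length; y++)
        stack[y] += "]";
      result.push(stack.pop());
    } else {
      // Any other character belongs to every open group.
      for (let y = 0; y < stack.length; y++)
        stack[y] += c;
    }
  }
  return result;
}

// Returns ["[@c xxx yyy]", "[@b xxx yyy [@c xxx yyy]]", "[@a xxx yyy [@b xxx yyy [@c xxx yyy]]]"]
console.log(parse(txt));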
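It makes a single pass over the text and keeps a stack holding the partial text of every currently open "[@" group. The if branch pushes a new group when it reaches "[@", the else if branch pops the innermost group into the result when it reaches "]", and the else branch appends any other character to every open group. Groups therefore come out innermost first. Because each character is copied into every open group, the cost grows with the nesting depth as well as with the length of the text.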
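A variant of the same single-pass stack idea is sketched below, assuming only the matched text of each group is needed. Instead of a growing string per open group, it stores each group's start index and slices the input when the group closes. The name parseByIndex is hypothetical and not part of the original answer. It returns the same innermost-first list without copying every character into every open group.

```javascript
// Hypothetical variant of the stack parser: remember where each open "[@"
// group starts, and slice the original text when its "]" is reached.
function parseByIndex(s) {
  const openStarts = []; // start indices of currently open "[@" groups
  const result = [];     // completed groups, innermost first
  for (let x = 0; x < s.length; x++) {
    if (s[x] === "[" && s[x + 1] === "@") {
      openStarts.push(x);
      x++; // skip the '@'
    } else if (s[x] === "]" && openStarts.length > 0) {
      const start = openStarts.pop();
      result.push(s.slice(start, x + 1)); // include the closing "]"
    }
  }
  return result;
}

console.log(parseByIndex(
  "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet"));
```

Sibling groups such as "[@x [@y] [@z]]" also come out innermost first: "[@y]", "[@z]", then the outer group.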
Solution 2:
Coming from a C# background, I'm not sure this will help, but since you have to parse the inner commands anyway, why not store the contents of each command and then run your regex again on that inner data? I'm probably missing something, but that's what I would try.
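One way to try the "run the regex again" idea is sketched below. It is an interpretation, not the answerer's code, and it works bottom-up: each pass matches only innermost "[@...]" groups, records them, and replaces each with a numbered placeholder so that its parent becomes innermost on the next pass. It assumes groups contain no other square brackets and that the input never contains the NUL character (\u0000), which serves as a hypothetical placeholder marker. The function name extractByRepeatedRegex is also hypothetical.

```javascript
// Hypothetical sketch of the "run the regex again" idea, applied bottom-up.
// Assumptions: groups contain no square brackets other than nested "[@...]"
// groups, and the input never contains the NUL character used as a marker.
function extractByRepeatedRegex(text) {
  const innermost = /\[@[^\[\]]*\]/g;       // a "[@...]" group with no brackets inside
  const placeholder = /\u0000(\d+)\u0000/g; // marker left behind for a matched group
  const groups = [];                        // full group texts, innermost first
  let current = text;
  let previous;
  do {
    previous = current;
    current = current.replace(innermost, (match) => {
      // Put back the original text of any groups matched on earlier passes.
      const full = match.replace(placeholder, (_, i) => groups[Number(i)]);
      groups.push(full);
      // Replace this group with a numbered marker so its parent becomes innermost.
      return "\u0000" + (groups.length - 1) + "\u0000";
    });
  } while (current !== previous); // stop once a pass finds nothing to replace
  return groups;
}

console.log(extractByRepeatedRegex(
  "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet"));
```

Each pass rescans the whole string, so the number of scans grows with the nesting depth, which is in line with the efficiency concern raised in Solution 1.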
Solution 3:
It's no wonder you can't wrap your head around the problem. Formal language theory deals with exactly this question. Noam Chomsky described four classes of languages, known as the Chomsky hierarchy. Regular expressions can describe the simplest class, the regular languages. Languages with arbitrarily deep nested, paired structures (such as balanced brackets) lie outside the regular languages, so they cannot be described or recognized by regular expressions in the formal sense.
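As an illustration of why nesting escapes regular languages, the standard textbook argument uses the pumping lemma on the simplest nested language, balanced square brackets. This is a sketch of that general argument, not something specific to the "[@" syntax in the question.

```latex
\begin{aligned}
&L = \{\, \mathtt{[}^{n}\,\mathtt{]}^{n} \mid n \ge 0 \,\} \\
&\text{Assume } L \text{ is regular with pumping length } p;\ \text{take } w = \mathtt{[}^{p}\,\mathtt{]}^{p} \in L. \\
&\text{Every split } w = xyz \text{ with } |xy| \le p,\ |y| \ge 1 \text{ has } y = \mathtt{[}^{k},\ k \ge 1. \\
&\text{Then } xy^{2}z = \mathtt{[}^{p+k}\,\mathtt{]}^{p} \notin L \text{ since } p + k \ne p, \text{ a contradiction.}
\end{aligned}
```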
One of the easiest kinds of parser to implement is a recursive descent parser, in which each element of the language is parsed by its own function and nested elements are handled by recursive calls.
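A minimal recursive descent sketch in JavaScript (the language of Solution 1) follows. The grammar in the comments and the names parseDocument and parseGroup are assumptions for illustration, inferred from the example text in Solution 1, not taken from the original answers. parseGroup calls itself whenever it meets a nested "[@", so the call stack plays the role of the explicit stack in Solution 1, and the result is a tree rather than a flat list.

```javascript
// Hypothetical grammar, inferred from the example text:
//   document := ( group | other )*
//   group    := "[@" ( group | other )* "]"
// Each group becomes { text, children }, where children are its nested groups.
function parseDocument(s) {
  let pos = 0; // shared cursor into s

  const atGroupStart = () => s.startsWith("[@", pos);

  // group := "[@" ( group | other )* "]"
  function parseGroup() {
    const start = pos;
    pos += 2; // consume "[@"
    const children = [];
    while (pos < s.length && s[pos] !== "]") {
      if (atGroupStart()) {
        children.push(parseGroup()); // recursive call handles the nesting
      } else {
        pos++; // ordinary character inside the group
      }
    }
    if (pos >= s.length) {
      throw new SyntaxError('Unclosed "[@" group starting at index ' + start);
    }
    pos++; // consume "]"
    return { text: s.slice(start, pos), children };
  }

  const groups = [];
  while (pos < s.length) {
    if (atGroupStart()) {
      groups.push(parseGroup());
    } else {
      pos++; // text outside any group is skipped
    }
  }
  return groups;
}

console.log(JSON.stringify(parseDocument(
  "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet"), null, 2));
```

Unlike the stack parser in Solution 1, which silently ignores an unclosed "[@", this sketch throws a SyntaxError; that is a design choice, not a requirement. Very deep nesting is limited by the size of the JavaScript call stack.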