Using Regular Expressions In Python To Determine C++ Functions And Their Parameters
Solution 1:
The grammar of C++ is far too complex to be handled by simple
regular expressions. You'll need at least a minimal parser.
I've found that for restricted cases, where I'm not concerned
with C++ in general, but only my own style, I can often get away
with a flex based tokenizer and a simple state machine. This
will fail in many cases of legal C++—for starters, of
course, if someone uses the pre-processor to modify the syntax;
but also because <
can have different meanings, depending on
what precedes it names a template or not. But it's often
adequate for a specific job.
Solution 2:
I've used a PEG parser with great success when trying to do simple format parsing. pyPeg is a very simple implementation of such a parser written in Python.
Example Python code for C++ function parser:
EDIT: Address template parameters. Tested with input from SK-logic and output is correct.
import pyPEG
from pyPEG import parseLine
import re
defsymbol(): return re.compile(r"[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*][\w:]+")
deftype(): return symbol
deffunctionName(): return symbol
deftemplatedType(): return symbol, "<", -1, [templatedType, symbol, ","], ">"defparameter(): return [templatedType, type], symbol
deftemplate(): return"<", -1, [symbol, template], ">"deffunction(): return [type, templatedType], functionName, -1, template, "(", -1, [",", parameter], ")"# -1 -> zero or more repetitions.
sourceCode = "std::string foobar(std::vector<int> &A, std::map<std::string, std::vector<std::string> > &B)"
results = parseLine(sourceCode, function(), [], packrat=True)
When this is executed results is:
([(u'type', [(u'symbol', 'std::string')]), (u'functionName', [(u'symbol', 'foobar')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]), (u'symbol', '&A')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'), (u'symbol', 'std::string'), (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'std::string')])]), (u'symbol', '&B')])], '')
Solution 3:
C++ cannot really be parsed by a (sane) regular expression: they are a nightmare as soon as nesting is concerned.
There is another concern too, determining when to parse and when not to. A function may be declared:
- at file scope
- in a namespace
- in a class
And the two last can be nested at arbitrary depths.
I would propose to use CLang here. It's a real C++ front-end with a full-featured parser and there are:
- a C API, with (notably) an API to the Indexing Library
- Python bindings on top of the C API
The C API and Python bindings are far from fully exposing the underlying C++ model, but for a task as simple as listing functions it should be enough.
That said, I would question the usefulness of the project: if the documentation can be generated by a simple parser, then it is redundant with the code. And redundancy is at best, useless, and worst dangerous: it introduces the potential threat of desynchronization...
If the function is tricky enough that its use requires documentation, then a developer, who knows the limitations and al, has to write this documentation.
Post a Comment for "Using Regular Expressions In Python To Determine C++ Functions And Their Parameters"