Genia Tagger File Not Found Error In Anaconda/nltk
Solution 1:
TL;DR:
# Install Genia Tagger (C code).
$ git clone https://github.com/saffsd/geniatagger && cd geniatagger && make && cd ..
# Install Genia Tagger (python wrapper)
$ git clone https://github.com/informationsea/geniatagger-python.git && cd geniatagger-python && sudo python setup.py install && cd ..
$ python
>>> from geniatagger import GeniaTagger
>>> tagger = GeniaTagger('./geniatagger/geniatagger')
loading morphdic...done.
loading pos_models................done.
loading chunk_models....done.
loading named_entity_models..done.
>>> print(tagger.parse('This is a pen.'))
[('This', 'This', 'DT', 'B-NP', 'O'), ('is', 'be', 'VBZ', 'B-VP', 'O'), ('a', 'a', 'DT', 'B-NP', 'O'), ('pen', 'pen', 'NN', 'I-NP', 'O'), ('.', '.', '.', 'O', 'O')]
I'm not sure whether the conda packages for Genia Tagger work out of the box, so I think a native Python/pip fix is simpler.
Firstly, there's no support for Genia Tagger in NLTK (at least not yet =) ), so it isn't a problem with the NLTK installation or its modules.
The problem might lie in a missing include in the original GeniaTagger C++ code (http://www.nactem.ac.uk/tsujii/GENIA/tagger/).
To resolve it, you have to add #include <cstdlib> to the original code, but thankfully @saffsd has already done so and put it in his GitHub repo (https://github.com/saffsd/geniatagger/blob/master/morph.cpp).
Then comes installing the Python wrapper. You can either:
install from the official PyPI release with:
pip install https://pypi.python.org/packages/source/g/geniatagger-python/geniatagger-python-0.1.tar.gz
or install from a GitHub repo, e.g. https://github.com/informationsea/geniatagger-python, which is the first result in a Google search.
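Whichever route you take, it can help to confirm that the wrapper is importable from the interpreter you are actually running (easy to get wrong with several conda environments). A minimal sketch, assuming Python 3.4+ and that the wrapper installs a top-level module named geniatagger, as the import in the TL;DR suggests; is_installed is a hypothetical helper name:

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found by the current interpreter."""
    # find_spec returns None when a top-level module is not on sys.path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "geniatagger" is the module name used by the wrapper's import statement.
    print("geniatagger importable:", is_installed("geniatagger"))
```

If this prints False, the install most likely went into a different Python than the one you are running.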
Lastly, the GeniaTagger initialization in Python is rather odd: it takes the path to the tagger executable itself, not the directory that contains it, and assumes that the model files are in the same directory as the executable (see https://github.com/informationsea/geniatagger-python/blob/master/geniatagger.py#L19).
It possibly also expects a leading './' in the path, so you would have to initialize the tagger as GeniaTagger('./geniatagger/geniatagger').
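Since a "file not found" error usually comes down to the path that reaches the wrapper, one option is to build and check that path before constructing the tagger. This is only a sketch: tagger_executable is a hypothetical helper, not part of the wrapper, and it assumes the compiled binary is called geniatagger and sits inside the cloned directory, as in the TL;DR.

```python
import os


def tagger_executable(install_dir, name="geniatagger"):
    """Return the absolute path to the tagger binary inside install_dir.

    Raises OSError if the file is missing or not executable, so the
    problem shows up before the wrapper tries to launch it.
    """
    path = os.path.abspath(os.path.join(install_dir, name))
    if not os.path.isfile(path):
        raise OSError("tagger binary not found: " + path)
    if not os.access(path, os.X_OK):
        raise OSError("tagger binary is not executable: " + path)
    return path
```

You would then pass the result to the wrapper, e.g. GeniaTagger(tagger_executable('./geniatagger')). I haven't verified how the wrapper treats every form of path, but an absolute path with an explicit directory part should sidestep the './' question.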
Beyond the installation issues: if you use the Python wrapper, the GeniaTagger object exposes only one function, parse(). It takes a single sentence string as input and returns a list of tuples, one per token. The items in each tuple are:
- token (surface word)
- lemma (see Stemmers vs Lemmatizers)
- POS tag (looks like Penn Treebank tagset, see What are all possible pos tags of NLTK?)
- Chunk tag in IOB format (not only noun chunks, note the B-VP in the output above; see Output results in conll format (POS-tagging, stanford pos tagger))
- Named Entity chunk
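To make the five fields easier to work with, you can wrap each tuple in a namedtuple and group tokens by their chunk tags. A small sketch using the output from the TL;DR as sample data, so it runs without the tagger installed; Token and group_chunks are names I made up for illustration, they are not part of the wrapper:

```python
from collections import namedtuple

# Field names are my own labels for the five positions listed above.
Token = namedtuple("Token", ["word", "lemma", "pos", "chunk", "ner"])

# Output of tagger.parse('This is a pen.') copied from the TL;DR.
sample = [('This', 'This', 'DT', 'B-NP', 'O'), ('is', 'be', 'VBZ', 'B-VP', 'O'),
          ('a', 'a', 'DT', 'B-NP', 'O'), ('pen', 'pen', 'NN', 'I-NP', 'O'),
          ('.', '.', '.', 'O', 'O')]

tokens = [Token(*fields) for fields in sample]


def group_chunks(tokens):
    """Group words into (label, [words]) phrases using the B-/I-/O chunk tags."""
    phrases = []
    current = None  # the phrase currently being extended, if any
    for tok in tokens:
        tag = tok.chunk
        if tag.startswith("B-"):
            # B- always opens a new phrase.
            current = (tag[2:], [tok.word])
            phrases.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            # I- continues the open phrase with the same label.
            current[1].append(tok.word)
        else:
            # O (or a stray I- tag) closes the open phrase.
            current = None
    return phrases


if __name__ == "__main__":
    print(group_chunks(tokens))
```

For the sample sentence this groups "a pen" into a single NP and leaves the final period outside any phrase.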