
Genia Tagger File Not Found Error In Anaconda/nltk

I need to perform text pre-processing tasks such as sentence splitting, tokenization and tagging using NLTK. I want to use GENIA tagger for tagging. I am using Anaconda version 3.1

Solution 1:

TL;DR:

# Install Genia Tagger (C code).
$ git clone https://github.com/saffsd/geniatagger && cd geniatagger && make && cd ..
# Install Genia Tagger (python wrapper)
$ git clone https://github.com/informationsea/geniatagger-python.git && cd geniatagger-python && sudo python setup.py install && cd ..
$ python
>>> from geniatagger import GeniaTagger
>>> tagger = GeniaTagger('./geniatagger/geniatagger')
loading morphdic...done.
loading pos_models................done.
loading chunk_models....done.
loading named_entity_models..done.

>>> print(tagger.parse('This is a pen.'))
[('This', 'This', 'DT', 'B-NP', 'O'), ('is', 'be', 'VBZ', 'B-VP', 'O'), ('a', 'a', 'DT', 'B-NP', 'O'), ('pen', 'pen', 'NN', 'I-NP', 'O'), ('.', '.', '.', 'O', 'O')]

I'm not sure whether the conda packages for Genia Tagger work out of the box, so I think a native python/pip fix is simpler.

Firstly, there's no support for Genia Tagger in NLTK (at least not yet =) ), so it isn't a problem with the NLTK installation or its modules.

The problem might lie in some outdated includes that the original GeniaTagger code uses (http://www.nactem.ac.uk/tsujii/GENIA/tagger/).

So to resolve the problem, you have to add #include <cstdlib> to the original code, but thankfully @saffsd has already done so and put it nicely in his github repo (https://github.com/saffsd/geniatagger/blob/master/morph.cpp).

Then comes installing the python wrapper. You can either:

  • install from the official pypi with: pip install https://pypi.python.org/packages/source/g/geniatagger-python/geniatagger-python-0.1.tar.gz

  • or install from some other github repo, e.g. https://github.com/informationsea/geniatagger-python, which is the first hit on a Google search

Lastly, the GeniaTagger initialization in python is rather weird: it doesn't take the path to the tagger's directory but the path to the tagger executable itself, and it assumes that the model files are in the same directory as the executable, see https://github.com/informationsea/geniatagger-python/blob/master/geniatagger.py#L19 .

It possibly also expects './' at the start of the path, so you would have to initialize the tagger like this: GeniaTagger('./geniatagger/geniatagger').
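Since the wrapper is handed the executable rather than its directory, it can help to check the path before constructing the tagger. The sketch below is only an illustration: resolve_tagger_path is a hypothetical helper, not part of geniatagger-python, and it assumes that an absolute path sidesteps the './' question, which I haven't verified against every version of the wrapper.

```python
import os


def resolve_tagger_path(path):
    """Return an absolute path to the geniatagger executable, or raise.

    Hypothetical helper, not part of geniatagger-python.
    """
    full = os.path.abspath(path)
    # The wrapper wants the executable itself, not the directory holding it.
    if os.path.isdir(full):
        raise ValueError("expected the geniatagger executable, got a directory: %s" % full)
    if not os.path.isfile(full):
        raise IOError("geniatagger executable not found: %s" % full)
    # A file that exists but is not executable usually means the build failed.
    if not os.access(full, os.X_OK):
        raise IOError("file is not executable (did make succeed?): %s" % full)
    return full
```

You would then call GeniaTagger(resolve_tagger_path('./geniatagger/geniatagger')), which fails early with a readable message instead of a bare "file not found" from the subprocess.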


Beyond the installation issues: if you use the python wrapper for the GeniaTagger, there's only one function in the GeniaTagger object, i.e. parse(). It takes one sentence string as input and outputs a list of tuples for that sentence, one tuple per token. Judging by the example output above, the items in each tuple are, in order: the word, its base form, the POS tag, the chunk tag and the named-entity tag.
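To make the tuples easier to work with, here is a minimal sketch that wraps them in a namedtuple and tags several sentences in a loop, since parse() takes one sentence at a time. The field names (word, base, pos, chunk, ne) are my own labels inferred from the sample output in the TL;DR, and Token, to_tokens and tag_sentences are hypothetical helpers, not part of geniatagger-python.

```python
from collections import namedtuple

# Field names are illustrative labels inferred from the sample output,
# not identifiers defined by geniatagger-python.
Token = namedtuple("Token", ["word", "base", "pos", "chunk", "ne"])


def to_tokens(parsed):
    """Wrap each 5-tuple returned by parse() in a Token."""
    return [Token(*fields) for fields in parsed]


def tag_sentences(tagger, sentences):
    """Call tagger.parse() once per sentence and return one Token list each."""
    return [to_tokens(tagger.parse(s)) for s in sentences]


# Sample output copied from the TL;DR session, so this runs without the tagger.
sample = [('This', 'This', 'DT', 'B-NP', 'O'), ('is', 'be', 'VBZ', 'B-VP', 'O'),
          ('a', 'a', 'DT', 'B-NP', 'O'), ('pen', 'pen', 'NN', 'I-NP', 'O'),
          ('.', '.', '.', 'O', 'O')]
tokens = to_tokens(sample)
# Pick out the words whose POS tag starts with NN (nouns).
nouns = [t.word for t in tokens if t.pos.startswith("NN")]
```

With a real tagger you would pass the sentences from your sentence splitter, e.g. tag_sentences(tagger, sentences), where sentences is a list of strings.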
