Improved Tomita Parser

e-mail

This is an implementation of the Tomita's parsing algorithm, Unlike other parsing algorithms like LALR(1) or LL(1), this algorithm can handle any ambiguous or cyclic context free grammar.

As a part of my linguistics project, I hacked the source code of this parser. I ported the code to C++, fixed a bug, throw in some performance improvements.

I was trying to use this to parse a large grammar from Penn Treebank, but I eventually abandoned that goal as this implementation doesn't seem to work very well when the grammar is large.

Anyway, here is the original tom parser source code and my version (including a binary for Win32). I just hope somebody would find it useful in some way.

License

The original source was in the public domain, and I don't know who the author is. I follow the original author and hereby disclaim any copyright to my modification as well. Happy hacking!


Back to home