BitPar


BitPar is a parser for highly ambiguous probabilistic context-free grammars (such as treebank grammars). BitPar uses bit-vector operations to parallelize, and thereby speed up, the basic parsing operations.
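To illustrate what bit-vector parallelization can mean in chart parsing, here is a minimal recognizer sketch in Python. It is not BitPar's actual algorithm, data layout or interface. It assumes a grammar given as lexical, unary and binary rules, and the function `recognize`, the rule formats and the toy grammar are hypothetical. For each nonterminal X and start position i, an integer serves as a bit vector whose bit j is set when X can span the words from i to j. A single bitwise OR then adds all end positions of a constituent at once instead of testing them one at a time.

```python
from collections import defaultdict


def recognize(words, lexicon, unary, binary, start="S"):
    """Return True if `start` derives `words` (bit-vector recognizer sketch).

    ends[X][i] is a Python int used as a bit vector:
    bit j is set iff nonterminal X can span words[i:j].
    """
    n = len(words)
    # One bit vector per (nonterminal, start position); index n is an empty slot.
    ends = defaultdict(lambda: [0] * (n + 1))

    # Lexical rules X -> word: X spans positions i..i+1.
    for i, word in enumerate(words):
        for x in lexicon.get(word, ()):
            ends[x][i] |= 1 << (i + 1)

    changed = True
    while changed:  # apply all rules until no new constituent appears (fixpoint)
        changed = False
        for parent, child in unary:  # X -> Y: copy every end position at once
            for i in range(n):
                new = ends[parent][i] | ends[child][i]
                if new != ends[parent][i]:
                    ends[parent][i] = new
                    changed = True
        for parent, left, right in binary:  # X -> Y Z
            for i in range(n):
                mids = ends[left][i]  # every split point k where Y spans i..k
                acc = ends[parent][i]
                while mids:
                    k = (mids & -mids).bit_length() - 1  # index of lowest set bit
                    acc |= ends[right][k]  # add all ends of Z starting at k in one OR
                    mids &= mids - 1  # clear that bit
                if acc != ends[parent][i]:
                    ends[parent][i] = acc
                    changed = True

    # Accept if the start symbol spans the whole sentence, i.e. 0..n.
    return (ends[start][0] >> n) & 1 == 1


# Toy grammar (hypothetical, for illustration only).
LEXICON = {"she": ["NP"], "eats": ["V"], "fish": ["NP", "V"]}
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP")]
```

With this toy grammar, `recognize(["she", "eats", "fish"], LEXICON, [], BINARY)` returns `True`. Python integers have arbitrary precision, so the sketch handles sentences of any length; a C or C++ implementation would more typically pack the bits into fixed-size machine words.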

You can download the BitPar source code as a gzip-compressed tar file. It is freely available for research and education.

You might also want to download the English trace grammar (4.5 MB), which was extracted from the Penn Treebank, or a similar grammar for German (5.3 MB), which was extracted from the TIGER Treebank. A UTF-8 version of the German grammar is also available.

Older versions of the parser and the grammars are available here:

old BitPar software
old English grammar
old German grammar

Publications

Please cite the following publications if you want to refer to the BitPar parser:

Trace Prediction and Recovery With Unlexicalized PCFGs and Slash Features. Proceedings of COLING-ACL 2006, Sydney, Australia.

Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors. Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland.

Acknowledgements

Many thanks to Andreas van Cranenburgh, who implemented a parser option for turning off the built-in smoothing functions.



Please send comments, suggestions, and bug reports to Helmut Schmid at LastName@cis.uni-muenchen.de.