mwetoolkit
THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/
The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword expressions (MWEs) in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc.
Even though it focuses on multiword expressions, the framework is fairly comprehensive and can also be useful for corpus-based studies in computational linguistics in general.
The mwetoolkit can be applied to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as part of a PhD thesis, and the project remains active (see the SVN logs).
Up-to-date documentation and details about the tool can be found on the mwetoolkit website: http://mwetoolkit.sourceforge.net/