AGTK | A suite of software components for building tools to annotate linguistic signals: time-series data (e.g., audio, video) that documents any kind of linguistic behavior. The internal data structures are based on annotation graphs. |
Arithmetic Coding | A Java package for arithmetic coding and PPM (prediction by partial matching: adaptive, variable-length n-gram language models for compression) |
ComLinToo | A set of Perl tools for computational linguistics (especially corpus handling and statistics, including permutation statistics). |
Attribute-Logic Engine (ALE) | A freeware system for logic programming and for grammar parsing and generation |
EDG | A Lisp system for developing and displaying HPSG grammars |
Ellogon | An LGPL-licensed, component-based natural language engineering platform written in C, C++, Java, Tcl, Perl, and Python |
Emdros | A text database engine for analyzed or annotated text. |
FreeLing | An open source suite of language analyzers. |
Heart of Gold | XML-based middleware for integrating deep (HPSG parsing) and shallow NLP components. |
Leo | A project providing an architecture for defining XML specifications of grammars for different natural language parsing systems, together with tools for converting grammars automatically between those systems |
LKB | A grammar and lexicon development environment for use with constraint-based linguistic formalisms. |
Mallet | A Machine Learning for Language Toolkit, written in Java |
MinorThird | A collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text. |
Ngram Statistics Package | Counts n-grams in text and computes statistical measures of association for them. |
NLTK | A Python package intended to simplify the task of programming natural language systems. |
nlpFarm | A collection of NLP libraries, tools, and demo applications, currently focused mainly on parsing and dialogue systems. |
SenseRelate | Implements a word sense disambiguation algorithm based on WordNet::Similarity measures. |
Tiger API | A library that allows Java programmers to access the structure of any corpus encoded as a TIGER-XML file. |
Web as Corpus Toolkit | A collection of programs that can be used to create a (large) text corpus from a list of URLs. |
Weka | A collection of machine learning algorithms for data mining tasks. |
Weta | The Waikato Environment for Text Analysis |
WordNet::Similarity | Provides measures of semantic relatedness using WordNet. |
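As a rough illustration of what "counting n-grams and measuring their association" (the Ngram Statistics Package entry) involves, here is a minimal standard-library Python sketch. It is not the interface of that package or of any tool listed here; the function names `count_ngrams` and `pmi` are hypothetical, and pointwise mutual information with maximum-likelihood probability estimates is just one of many possible association measures.

```python
import math
from collections import Counter


def count_ngrams(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams (stored as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pmi(bigram: tuple[str, str], unigrams: Counter, bigrams: Counter) -> float:
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))).

    Probabilities are relative frequencies (maximum-likelihood estimates).
    The bigram must have been observed, otherwise PMI is undefined.
    """
    if bigrams[bigram] == 0:
        raise ValueError(f"unseen bigram: {bigram}")
    x, y = bigram
    p_xy = bigrams[bigram] / sum(bigrams.values())
    n_uni = sum(unigrams.values())
    p_x = unigrams[(x,)] / n_uni
    p_y = unigrams[(y,)] / n_uni
    return math.log2(p_xy / (p_x * p_y))


# Usage on a toy token sequence (hypothetical data).
tokens = "the cat sat on the mat the cat ran".split()
unigrams = count_ngrams(tokens, 1)
bigrams = count_ngrams(tokens, 2)
print(bigrams[("the", "cat")])                           # 2
print(round(pmi(("the", "cat"), unigrams, bigrams), 3))  # 1.755
```

Here "the cat" occurs in 2 of 8 bigrams while "the" and "cat" make up 3 and 2 of 9 tokens, so its PMI is log2(0.25 / (3/9 × 2/9)) = log2(27/8), meaning the pair co-occurs more often than independence would predict.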