Free Ontologies and Lexicons

OpenCyc The Cyc project has finally released its "free to use" version, OpenCyc. Currently a 40 MB Linux binary executable, OpenCyc claims to be a resource for commonsense reasoning. It supports a full logic engine capable of reasoning with contradictions by defining local non-contradictory (i.e. locally consistent) subdomains. A Windows version and axiom sets are coming soon. Tutorial, documentation, and download are available at the site.
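The idea of locally consistent subdomains can be sketched in a few lines. This is a conceptual illustration only, not the OpenCyc API: each named context holds its own assertions, so two contexts may contradict each other while each stays internally consistent.

```python
# Conceptual sketch (NOT the OpenCyc API): a knowledge base partitioned
# into contexts. A contradiction is only an error *within* one context;
# across contexts it is tolerated.

class ContextKB:
    def __init__(self):
        self.contexts = {}  # context name -> {literal: truth value}

    def assert_fact(self, context, literal, truth=True):
        facts = self.contexts.setdefault(context, {})
        if facts.get(literal) == (not truth):
            raise ValueError(f"contradiction within context {context!r}")
        facts[literal] = truth

    def holds(self, context, literal):
        # True/False if asserted in this context, None if unknown here
        return self.contexts.get(context, {}).get(literal)

kb = ContextKB()
kb.assert_fact("NaiveZoologyMt", "birds_can_fly")          # true in this context
kb.assert_fact("PenguinMt", "birds_can_fly", truth=False)  # false in this one
# The KB as a whole tolerates the contradiction; each context is consistent.
```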

ResearchCyc Now at academic research strength! Millions of facts and rules, with the engine to run them. Linux and Windows flavors. Weighs in at 300+ MB for the zip. Intended for advanced research and requires signing an NDA, but it is still no-cost and a big improvement over OpenCyc. Soon to contain the content of full Cyc.

WordNet An on-line lexical reference system widely used in computational linguistics. Words are categorized into synonym sets, each representing an underlying concept.
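The synonym-set idea can be shown with a toy model. This is an illustration of the concept, not the WordNet database or its API; the glosses and lemma sets below are made-up examples. A polysemous word simply appears in more than one synset.

```python
# Toy illustration (NOT WordNet itself): each synset stands for one
# concept and lists the words (lemmas) that can express it.

synsets = {
    "car.n.01": {"gloss": "a motor vehicle with four wheels",
                 "lemmas": {"car", "auto", "automobile"}},
    "car.n.02": {"gloss": "a wheeled vehicle adapted to rails",
                 "lemmas": {"car", "railcar", "railway_car"}},
}

def senses(word):
    """Return the ids of every synset whose lemma set contains the word."""
    return sorted(sid for sid, s in synsets.items() if word in s["lemmas"])
```

Looking up "car" returns both synsets, while "auto" maps to only the motor-vehicle sense.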

Moby Grady Ward’s public domain Moby project includes thesauri, part-of-speech dictionaries for building parsers, and extensive word lists. Both old and big. Also on-line at Etext.IceWire.

OpenMind The principal goal of Open Mind is to develop "intelligent" software, in part by collecting large data sets and providing an open infrastructure where different ideas can be tried out. The data and resulting software are made available to the public.

ThoughtTreasure An advanced story understanding system and framework.

Edinburgh Associative Thesaurus Contains the most common word associations for the 8000 most common words and phrases in Britain. Not a developed semantic network like WordNet, but empirical association data. Real people, real responses.
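Data of this kind is naturally modeled as stimulus-to-response counts. The figures below are illustrative placeholders, not actual EAT numbers.

```python
# Sketch of word-association data: for each stimulus word, how many
# respondents produced each response. Counts here are invented examples.
from collections import Counter

associations = {
    "bread": Counter({"butter": 54, "food": 11, "jam": 7, "dough": 4}),
}

def top_associates(stimulus, n=3):
    """Return the n most frequent responses for a stimulus word."""
    return [word for word, _ in associations[stimulus].most_common(n)]
```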

Daxtron's AIML Web Census (3 MB Zip) Contains the number of times a merged set of patterns (the AIML AAA response set plus the header patterns of 300,000 questions) can be found on the web. Data is provided both as CSV and as AIML templates, with web frequency info on over 68,000 common English patterns. Similar to the AliceBot.org Superbot database, but instead of summarizing the data on 2 million bot inputs, the census asks the web how frequently existing bot patterns and subpatterns occur. Should you build a bot without checking either one out? The data may also be of interest to those doing word frequency and n-gram analysis.
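Loading pattern-frequency data like the census CSV is straightforward with the standard library. The column names and sample rows below are assumptions for illustration, not the file's actual schema or contents.

```python
# Sketch of reading pattern-frequency CSV data. The header names
# ("pattern", "web_hits") and the rows are hypothetical examples.
import csv
import io

sample = """pattern,web_hits
WHAT IS YOUR NAME,120000
DO YOU LIKE *,45000
TELL ME ABOUT *,30000
"""

def load_frequencies(text):
    """Map each AIML-style pattern to its web hit count."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["pattern"]: int(row["web_hits"]) for row in reader}

freqs = load_frequencies(sample)
```

From here the counts can feed directly into word-frequency or n-gram analysis.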