
LexPredict Open Sources the 1910 Version of Black’s Law

I.  Purpose

Many academic and commercial applications of natural language processing and machine learning benefit from a controlled lexicon of expert-selected terms (i.e., a dictionary). This is especially true of highly technical language, such as legal text. However, after surveying the existing landscape, we were unable to find a high-quality legal dictionary that is open source or freely available. The best existing versions, when they can be obtained at all, are subject to restrictive licensing conditions.

Thus, to further the goals of the entire legal profession, as well as a range of legal technology providers and solutions, we are announcing the next step in the broader open source plan that we outlined earlier this month. The 1910 version of Black’s Law Dictionary (2nd Edition) is one of the premier legal dictionaries, and LexPredict is now making it available on GitHub as a structured data object.

Arguably the best legal dictionary, Black’s Law Dictionary is now available under the open source Creative Commons CC-BY-SA-4.0 license, which allows both researchers and commercial providers to use it with limited restrictions. Our own open-source document analytics platform, ContraxSuite, is just one of the many projects that will benefit from Black’s becoming open source. We expect many other researchers and companies both to benefit from it and to contribute to it.

II.  Contents

The current contents of the dictionary are organized by language, locale, and topic, and can be accessed in CSV, JSON, or Excel format. The contents and organization of this legal dictionary repository will expand as the community participates and our products evolve. Our vision is to work towards multi-domain, multi-lingual, and cross-lingual resources for legal and regulatory text. To this end, we are still actively cleaning and correcting the current version, and we anticipate that this refinement process alone may take several weeks.
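As a rough illustration of how a CSV export like this might be used as a controlled lexicon, here is a minimal Python sketch. It is not taken from the repository: the column names `term` and `definition`, the inline sample rows, and the helper names `load_lexicon` and `find_terms` are all assumptions made for the example, and the definitions are placeholders rather than text from the dictionary. Check the actual files on GitHub for the real layout.

```python
import csv
import io

# Hypothetical sample standing in for one of the repository's CSV files.
# The column names "term" and "definition" are assumptions, not the real schema,
# and the definition text is a placeholder, not quoted from the dictionary.
SAMPLE_CSV = """term,definition
ABANDONMENT,(placeholder definition text)
BAILMENT,(placeholder definition text)
"""


def load_lexicon(handle):
    """Read term/definition rows into a dict keyed by lower-cased term."""
    reader = csv.DictReader(handle)
    return {
        row["term"].strip().lower(): row["definition"].strip()
        for row in reader
    }


def find_terms(text, lexicon):
    """Return the single-word lexicon terms that appear in the text, sorted."""
    # Split on whitespace and trim surrounding punctuation from each token.
    words = {w.strip(".,;:()\"'").lower() for w in text.split()}
    return sorted(words & set(lexicon))


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))
matches = find_terms("The bailment ended upon abandonment of the goods.", lexicon)
print(matches)
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, and adjust the column names to match the file. This sketch only matches single-word terms; multi-word legal phrases would need a phrase matcher instead of whitespace tokenization.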

III. Final Thoughts

We believe that the absence of such high-quality linguistic inputs represents a significant bottleneck in the academic and commercial space. We hope this offering will help. Please check back for more as we continue down the path to #OpenSourceLegal.