Why We’re Open-Sourcing Our Contract Analytics Platform – ContraxSuite


Over the last decade, we’ve spent many thousands of hours developing the contract analytics and document analytics tools that we use with clients. These tools, based on enterprise-quality open source frameworks for natural language processing, machine learning, and optical character recognition, have allowed us to quickly and easily attack many problems, from securities filings and court opinions to articles of incorporation and lease agreements.

Today, we are proud to announce that we plan to open source the development of our core platform for contract analytics and document analytics – ContraxSuite. This code base and our public development roadmap are hosted on GitHub under a permissive open-source licensing model that will allow most organizations to quickly and freely implement and customize their own contract and document analytics, or purchase a license with or without support services included. Like Red Hat does for Linux, we will provide support, customization, and data services to “cover the last mile” for those organizations that need it.

We believe that the future of law lies in its central role in facilitating and regulating the modern information economy. But unless we start treating law itself like the production of information, we’ll never get there. Before we can solve big problems with smart contracts, we need to start by structuring existing legacy contracts. We hope our actions today will help lawyers, companies, and other LegalTech providers accelerate the pace of improvement and innovation through more open collaboration.


Over the course of human history, many models of economic development and innovation have emerged. Some of these models, like the public stock company, are quite new in the scheme of things. Others, like the community of “natural” philosophers (scientists) or mutual insurance, are very old. But in all cases – whether intangible insurance or tangible iPad – there is a flow from idea to execution to economic value.

One crude way of comparing these economic models is to examine how information is held – publicly or privately. Information, not opposable thumbs, as César Hidalgo elegantly explains, is the secret weapon of our species – our only defense against the intrinsic chaos and decay of the universe. And in the mode of economic analysis descended from Adam Smith, private information is the key. Knowledge and know-how enable enterprises to produce and profit. This principle of private information has guided most enterprises over the last few centuries, from the coveted trade route maps of the 18th century to the modern Coca-Cola recipe.

So why, especially in the last half-century, have we seen the “open source” or “peer-to-peer” model of information grow? Google, Apple, and Facebook, three companies worth nearly two trillion dollars combined, have given away thousands of software projects worth billions of dollars in effort-hour cost. Have we all gone mad? Or is there something else going on – an emergent value to more public knowledge?

There are many attempts to answer this fascinating question, and these attempts both challenge and enlighten our understanding of human behavior, economics, and law. However, especially as it relates to software, we lean on the words of Prof. David Agrawal:

For many […], software is only a semi-finished good that generates little value until the code has undergone revision by the user. Thus, creating the ultimate finished product will require a sequence of motivating incentives.


The quote above is as true of legal documents – articles of incorporation, operating agreements, corporate filings, contracts and agreements – as it is of software. While there are thousands of resources for corporate governance or contracting available online, none of these are “finished” goods for another party. In some cases, “finishing” may simply require updating the parties and dates, like a restaurant microwaving a pre-cooked cut of meat. In other cases, “finishing” may actually involve novel structures or more creative redrafting, like a chef growing his own produce or combining flavors in unexpected ways.

As to the second sentence – “[…] the ultimate finished product will require a sequence of motivating incentives” – we arrive back at ContraxSuite. Software products designed to assist in the drafting and analysis of legal documents are perfect examples of semi-finished goods. Alone, no software product can incorporate an entity or enter into a sales agreement or manage the geopolitical risk of supplier networks (let’s ignore smart contracts and DAO-like organizations, for now). Only with the assistance of legal professionals can this semi-finished software deliver value to the organization and its clients.

In our experience, few legal departments and law firms debate that their legal documents contain valuable information. Analytics can provide insights into a wide array of opportunities and risks. Standardization can remove frictions for core business operations and increase the rate and quality of transactions.

But when you ask these organizations to pay a per-document fee for software that almost always requires additional customization or produces “unfinished” results, their excitement turns into hesitation. Contract analytics software, like Google’s TensorFlow or the Linux kernel, does not generate value by itself; human capital is required to “cover the last mile” that actually solves the problem. This is why data scientists have not yet been put out of business by TensorFlow and why Red Hat and Oracle still “sell Linux” for billions of dollars per year.

We have a “last mile” problem in the legal arena. Much like Linux and data science, contract analytics is largely about the combination of well-known practices with large-scale, high-quality data. Contracts are natural language encoded in either analog or digital format, and this language is unlocked and encoded with technologies like optical character recognition (OCR) and natural language processing (NLP). These encodings are then mapped back to real-world business problems through techniques like clustering or classification, two types of machine learning (ML) algorithms. None of the technologies or techniques above is proprietary or novel, and some versions of these ideas have been available nearly as long as there have been digital computers.

The real challenge in contract analytics is to develop the so-called “training data” – the set of documents and labels used to “teach” the machine what separates a lease agreement from a purchase/sale agreement from a retirement benefits plan. Herein lies the true value of the current software and service providers. But, paradoxically, almost all providers get their information from one of two sources – either from public sources of agreements, like the SEC’s EDGAR database and evidence from public courts, or from private sources of agreements: their clients. Many organizations have therefore paid for the privilege of giving away their own information so that someone else can profit.
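To make the role of training data concrete, here is a minimal sketch of a document classifier in plain Python. It is an illustration under stated assumptions, not ContraxSuite's implementation: the technique shown is a textbook multinomial naive Bayes over word counts, the function names (`tokenize`, `train`, `classify`) are hypothetical, and the six "documents" are invented one-sentence placeholders rather than real agreements. The point is that the algorithm fits in a few dozen lines, while everything it "knows" comes from the labeled examples.

```python
import math
import re
from collections import Counter, defaultdict

# Invented placeholder snippets standing in for labeled training documents.
TRAINING_DATA = [
    ("The landlord leases the premises to the tenant for a monthly rent", "lease"),
    ("Tenant shall pay rent to landlord and maintain the leased premises", "lease"),
    ("The seller agrees to sell and the buyer agrees to purchase the goods", "purchase_sale"),
    ("Buyer shall pay the purchase price to seller upon delivery of the goods", "purchase_sale"),
    ("The plan provides retirement benefits to eligible employees upon vesting", "retirement_plan"),
    ("Each participant accrues pension benefits under the retirement plan", "retirement_plan"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Count words per label; these counts are the entire 'model'."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: how common this label is in the training data.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that never appeared under this label.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(classify("Tenant must pay rent for the premises each month", model))  # lease
```

Swapping in different labeled examples changes what the classifier can distinguish without touching a line of the algorithm, which is why the quality and ownership of the training data matter more than the code.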

By open-sourcing ContraxSuite, we hope to change this dynamic. The analysis and standardization of contracts and corporate governance material is key to the transformation of our economy. But blockchain and smart contracts aside, there are significant improvements in risk management, compliance, and profitability that can be gained by treating contracts as valuable data. Until legal departments and law firms can be “sequentially motivated,” to borrow Professor Agrawal’s language, we will not see this maturation of the industry.

In the near future, we’ll be revealing more details about this open-source strategy – including more details on academic and industry partnerships, support and customization services, and our open-source license model. In the meantime, we hope to get everyone thinking fundamentally about how we do business in legal tech. What does the client really want – your software license or a sustainable solution?


While some in the legal technology community will certainly tout their ability to outperform our open source framework on some use cases today, we believe that they will find themselves on the wrong side of history. We encourage others in legal technology and related domains to follow our lead and think hard about whether closed code is really the right strategy. By committing to open source and allowing others to validate, improve, and maintain our shared infrastructure, any gaps in quality or functionality will soon fade away.

More critically, the hype and “vaporware” that pervade legal technology do a disservice to the overall effort towards innovation. We would like to move the field forward through iterative and open methodological improvement. Vaporware has been enabled by a lack of transparency and, to be honest, a lack of sophistication on the part of customers, but going forward, both the consumer and developer communities should be able to inspect the quality of any existing offering. Our co-founders, as leading educators of the next generation of attorneys, have committed to making sure that the General Counsel and Managing Partner of the future won’t be fooled. Others can chase the temporary rents of the short game, but we will continue to make moves that support the long-term maturation of the overall legal industry. We hope you’ll join us.

– Michael J Bommarito & Daniel Martin Katz
CEO & CSO @ LexPredict