LexPredict Releases OpenEDGAR – Software to Build SEC EDGAR Databases

Our ContraxSuite platform has been generating a lot of attention lately, especially as we continue to work on improving the user experience over the coming months. But we are also hard at work on other projects designed to make the world’s data more accessible to anyone who needs it.

The amount of data in the world is increasing at a staggering rate, and the SEC’s EDGAR database is a powerful example. It is too vast to skim unless you know exactly what you’re looking for and exactly where to find it. This is the sort of problem LexPredict addresses with our data products and services.

To that end, LexPredict is releasing a new project to make the SEC’s EDGAR database easier to search – OpenEDGAR. OpenEDGAR is a comprehensive framework for building databases from EDGAR, and can automate the retrieval and parsing of all EDGAR forms. OpenEDGAR uses the same software that powers many of our data products, including the LexPredict Agreement Database. As with our pioneering ContraxSuite software platform, OpenEDGAR will be open source.
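To give a rough sense of what "retrieval and parsing" involves, here is a minimal Python sketch of one small step: reading rows from an EDGAR master index, which lists filings as pipe-delimited lines (CIK, company name, form type, date filed, file path). This is not OpenEDGAR code. The names FilingEntry and parse_master_index are hypothetical, and the sample rows are invented placeholders rather than real filings.

```python
from collections import Counter
from typing import Iterator, NamedTuple


class FilingEntry(NamedTuple):
    """One row of an EDGAR master index (hypothetical helper type)."""
    cik: int
    company: str
    form_type: str
    date_filed: str
    path: str


def parse_master_index(text: str) -> Iterator[FilingEntry]:
    """Yield one FilingEntry per data row of a pipe-delimited master index."""
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have exactly five fields and a numeric CIK; this skips
        # the preamble, the column header, and the dashed separator line.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, path = parts
        yield FilingEntry(int(cik), company, form_type, date_filed, path)


# Invented sample in the master index layout; these are not real filings.
SAMPLE_INDEX = """\
Description:           Master Index of EDGAR Dissemination Feed

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|EXAMPLE CORP|10-K|2018-03-01|edgar/data/1000001/0001000001-18-000001.txt
1000001|EXAMPLE CORP|8-K|2018-03-15|edgar/data/1000001/0001000001-18-000002.txt
1000002|SAMPLE HOLDINGS INC|10-K|2018-03-20|edgar/data/1000002/0001000002-18-000001.txt
"""

entries = list(parse_master_index(SAMPLE_INDEX))
# Count filings per form type, a first step toward handling each type separately.
form_counts = Counter(entry.form_type for entry in entries)
print(form_counts)
```

Grouping index rows by form type like this is only a starting point. A full pipeline would also need to download each filing and apply a parser suited to its form type.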

As we like to point out from time to time, the benefits of an open source business model generally outweigh the potential downsides. Open source code is a great asset for the developer community, but the amount of data in the EDGAR database is huge and the variety of form types can be overwhelming. Our data scientists have extensive experience with legal and regulatory data, so for organizations that need assistance, LexPredict can use OpenEDGAR to build custom data sets and parsers, provide real-time search API access, and even deliver terabytes of historical bulk data.

Part of our continued interest in the EDGAR database is that its size and its content are ideal for training machine learning and natural language processing tools. Both LexNLP and ContraxSuite are built and tested on training data from EDGAR, and we frequently rely on our own Agreement Database to accelerate client projects.

Future Updates

As part of our commitment to putting valuable software into the open source community, we will release OpenEDGAR on GitHub in the next 30 days. In the meantime, if you’re interested in custom data and parsers, real-time search APIs, or terabytes of historical legal and financial data, please check out our Agreement Database.

Please email us to be notified when we launch.
