Preparing for a Post-LIBOR Future with ContraxSuite, Part 5: Word Embedding and Regular Expressions

LIBOR goes dark sometime toward the end of 2021, and even now, financial institutions are preparing for it. As discussed in Part 1, Part 2, Part 3, and Part 4, the potential consequences are substantial and wide-ranging, but there are also many options during this transitional period.

Last week, we discussed how LexNLP can locate certain phrases or extract terms in the many forms that drafters might use. Revolver or revolving? No problem matching either. 150 basis points or 1.5%? No difference to LexNLP. But what happens when what you’re looking for is something more general? What happens when drafters frequently use different synonyms with different root words? Unlike most other contract analytics and E-Discovery software solutions, LexNLP and ContraxSuite don’t have to miss a beat when challenged in circumstances like this. Their secret? A digital cousin of the simple thesaurus: word embedding.

Word Embedding

Word embedding approximates conceptual understanding by building a thesaurus, from scratch, out of large amounts of data. Critically, these tools can be built with little to no human supervision; the only requirement is a large sample of similar language. Like many of the most effective models in machine learning and natural language processing, the approach is based on a simple idea, as demonstrated with the following example:

The parties to this ____________.

Fill in the blank with how you think this sentence should end. You can probably think of at least three ways this sentence might end in the context of a legal document:

The parties to this agreement
The parties to this contract
The parties to this amendment

Word embedding models are trained by reviewing many, many phrases like this. Whenever words appear in a similar context – like “agreement,” “contract,” or “amendment” – the model takes note of the similarity. This technique is not limited to nouns or single words, either. It is equally effective at handling verbs, noun phrases, or more complex concepts like “date” or “expression of distance”.

Two different phrases in a simple word embedding
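
The idea of learning similarity from shared context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus is a handful of invented sentences, and simple co-occurrence counts stand in for a trained embedding model. It is not how LexNLP builds its models, and the names `context_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus: invented sentences, not real training data.
corpus = [
    "the parties to this agreement shall comply",
    "the parties to this contract shall comply",
    "the parties to this amendment shall comply",
    "the borrower shall repay the loan",
    "the lender shall fund the loan",
]

def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors(corpus)
same_context = cosine(vectors["agreement"], vectors["contract"])
different_context = cosine(vectors["agreement"], vectors["loan"])
print(same_context, different_context)
```

Because "agreement" and "contract" are surrounded by the same words in this toy corpus, their vectors point in the same direction, while "loan" shares no context with either. Real embedding models learn dense vectors from far more text, but the intuition is the same.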

The ability to use a pre-trained thesaurus or word embedding – and even to quickly train your own – is what sets LexNLP and ContraxSuite apart from many other contract analytics and E-Discovery tools. Most systems rely on exact phrase matching, perhaps with some stemming, and they fall apart when confronted with more complex circumstances. When searching for the concept of a “purchase,” for example, results like “acquisition,” “transfer,” or “repurchase” should be found. Most tools, however, are clueless.

Let’s return to our example above. A straight-up search function that accounts for “contract” and “agreement” will return a large portion of documents that a user is looking for. A tool that uses word embedding, like LexNLP, will not only find these documents, but will also find documents even when the exact phrase isn’t used. In our example, what if our contract analytics software is missing phrases like “The parties to this lease”? We probably want those sentences found, even though they do not contain the words “contract” or “agreement”. When configured with an appropriate word embedding model, LexNLP can recognize the phrase “The parties to this lease” as equivalent to “The parties to this agreement.”

A word embedding model matching “lease” as a semantically similar concept

The concept of “lease” wasn’t expressly included in the search query, but because it is used in contexts similar to “contract” and “agreement”, LexNLP knows to incorporate it into the model and treat it accordingly.

Word embedding model successfully integrating “lease” as a concept related to “contract” and “agreement”

Trying to guess what words will be relevant and important is difficult. LexNLP lets a machine with a full map of word embedding associations do the heavy lifting.
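
To make the “lease” example concrete, here is a minimal sketch of expanding a search query with embedding neighbors. Everything in it is an assumption for illustration: the three-dimensional vectors are invented by hand, the 0.9 threshold is arbitrary, and `expand_query` and `matches` are hypothetical helpers rather than LexNLP functions.

```python
from math import sqrt

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A trained model would learn many more dimensions from a large corpus.
EMBEDDINGS = {
    "agreement": (0.90, 0.10, 0.05),
    "contract": (0.85, 0.15, 0.10),
    "lease": (0.80, 0.20, 0.15),
    "amendment": (0.75, 0.10, 0.30),
    "borrower": (0.10, 0.90, 0.20),
    "thursday": (0.05, 0.10, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def expand_query(seed_terms, embeddings, threshold=0.9):
    """Return the seed terms plus every word close to at least one seed."""
    expanded = set(seed_terms)
    for word, vector in embeddings.items():
        if any(cosine(vector, embeddings[seed]) >= threshold for seed in seed_terms):
            expanded.add(word)
    return expanded

def matches(sentence, terms):
    """True if any word of the sentence is in the expanded term set."""
    words = {w.strip('.,;:"').lower() for w in sentence.split()}
    return bool(words & terms)

terms = expand_query(["contract", "agreement"], EMBEDDINGS)
print(sorted(terms))
print(matches("The parties to this lease", terms))
```

The query only names “contract” and “agreement,” yet the sentence about a lease is still found, because “lease” sits close to both seeds in the vector space. Unrelated words like “borrower” stay out of the expanded query.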

Regular Expressions

Casting a wide net is an integral part of contract analytics and discovery, and newer technologies like word embedding are often useful. In many contexts, however, simpler patterns like those found in the phrase:

Acme, Inc. AS Borrower or Acme, Inc. (the “Borrower”)

contain valuable information and don’t require any fancy machine learning or natural language processing to handle. Frequently, users even know exactly what pattern they’re looking for. Regular expressions, often shortened to regex or regexp, provide a concise way to express these patterns and capture specific information. ContraxSuite allows for easy implementation of such extraction rules.
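
As a rough illustration, both phrasings quoted above can be captured with one Python regular expression. The pattern below is a hypothetical sketch written for these two forms only; it assumes party names are made of capitalized tokens, and it is not taken from ContraxSuite or LexNLP.

```python
import re

# Hypothetical pattern covering the two phrasings quoted in the text.
PARTY_ROLE = re.compile(
    r"""
    (?P<party>
        (?:[A-Z][\w&.'-]*,?\s+)*?      # leading capitalized tokens, matched lazily
        [A-Z][\w&.'-]*                 # final capitalized token of the name
    )
    ,?\s+
    (?:
        (?i:as)\s+(?P<role_as>[A-Z]\w+)                 # Acme, Inc. AS Borrower
      | \((?:the\s+)?["“](?P<role_def>[^"”]+)["”]\)     # Acme, Inc. (the quoted role)
    )
    """,
    re.VERBOSE,
)

def find_party_roles(text):
    """Return (party, role) pairs for every match in the text."""
    pairs = []
    for match in PARTY_ROLE.finditer(text):
        role = match.group("role_as") or match.group("role_def")
        pairs.append((match.group("party"), role))
    return pairs

sample = "This Agreement is between Acme, Inc. (the “Borrower”) and Big Bank, N.A. as Lender."
print(find_party_roles(sample))
```

A production rule would need to handle lowercase words inside names, multiple defined terms, and other drafting variations, but the sketch shows how much a single pattern can capture without any machine learning.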

ContraxSuite interface showing a regex document field detector for the phrase “governed by”

In this example, a ContraxSuite field detector searching for governing law is written to find the word “governed”, followed by anywhere from 1 to 5 characters of white space, followed by the word “by”. Many of LexNLP’s methods also incorporate regular expressions, either to locate or extract certain information.
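
The rule described above translates directly into a regular expression. The sketch below uses Python's `re` module; the exact pattern stored in ContraxSuite is not shown in this post, so the word boundaries and case-insensitive flag here are assumptions.

```python
import re

# "governed", then 1 to 5 whitespace characters, then "by".
# The \b anchors and IGNORECASE flag are assumptions, not part of the rule as described.
GOVERNED_BY = re.compile(r"\bgoverned\s{1,5}by\b", re.IGNORECASE)

samples = [
    "This Agreement shall be governed by the laws of the State of New York.",
    "This Agreement shall be governed  \n by the laws of England.",
    "The Borrower is governed strictly by its charter.",
]

hits = [bool(GOVERNED_BY.search(sample)) for sample in samples]
print(hits)
```

Allowing up to five whitespace characters lets the rule survive line breaks and stray spacing, which are common in scanned or converted documents, while still rejecting text where other words sit between “governed” and “by.”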

Combining The Evidence

Putting these tools together is even more powerful than using either alone. For example, many organizations are interested in restrictions on assignment. Some tools do an excellent job of finding all assignment clauses, but when asked to extract the number of days required for notice and response, they fall flat. Other tools can extract the number of days in a limited number of clauses, but miss many other examples. With ContraxSuite, users can combine tools like word embedding (capturing terms such as “assign,” “convey,” and “transfer”) with LexNLP extractors like get_duration to get the best of both worlds.
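
A minimal sketch of that combination follows. It makes two loud assumptions: the set of assignment terms is hard-coded to stand in for what an embedding model might return, and a small regular expression stands in for a duration extractor. Neither is the LexNLP implementation, and `notice_periods` is a hypothetical helper.

```python
import re

# Stand-in for embedding output: terms assumed to be close to "assign".
ASSIGNMENT_TERMS = {"assign", "convey", "transfer"}

# Simplified stand-in for a duration extractor: captures a number of days,
# including the "thirty (30) days" style.
DAYS = re.compile(r"(\d+)\)?\s+(?:(?:business|calendar)\s+)?days?\b", re.IGNORECASE)

def notice_periods(clauses, terms=ASSIGNMENT_TERMS):
    """Keep clauses that mention an assignment concept and pull out day counts."""
    alternatives = "|".join(sorted(re.escape(term) for term in terms))
    concept = re.compile(r"\b(?:%s)\w*" % alternatives, re.IGNORECASE)
    results = []
    for clause in clauses:
        if concept.search(clause):
            days = [int(number) for number in DAYS.findall(clause)]
            results.append((clause, days))
    return results

clauses = [
    "Neither party may assign this Agreement without thirty (30) days prior written notice.",
    "Tenant shall not transfer its interest unless Landlord responds within 10 business days.",
    "Payment is due within 45 days of invoice.",
]

for clause, days in notice_periods(clauses):
    print(days, clause)
```

The first step casts the wide net by concept, and the second step extracts the specific value. The payment clause contains a duration but is skipped because it has nothing to do with assignment.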

Sadly, many systems treat these two approaches religiously; some believe only in cutting-edge machine learning approaches, while others have faith in tried-and-true regular expression patterns. Systems that rely on one technology may portray themselves as opposed to and superior to the other approach. The true power of LexNLP and ContraxSuite is in the open-minded integration of both approaches, allowing a user to pick and choose the right tools for the right problems.

In the months to come, it will become increasingly important for lenders and borrowers to analyze the language of their loan agreements for their relationship with LIBOR rates. LexNLP can find those relationships, one word embedding at a time. Next week, we will dive deeper into the ContraxSuite UI, how to implement regex field detectors, and more. Stay tuned.

Put ContraxSuite On The Case

In response to the growing need for LIBOR-related contract analytics, we are building a LIBOR-focused version of ContraxSuite. Trained on tens of thousands of financial contracts, ContraxSuite can find and label important LIBOR-related clauses, including fallback provisions. ContraxSuite provides a user-friendly interface for users of all backgrounds and experience levels, but your organization can opt to use LexNLP separately from ContraxSuite to customize how you extract and structure information in your documents. Contact us to find out how best to implement ContraxSuite, or develop a solution with LexNLP by itself, in your organization.


About LexPredict

LexPredict is an enterprise legal technology and consulting firm, part of the Elevate family of businesses. Our consulting teams specialize in legal analytics, legal data science and training, risk management, and legal data strategy consulting. We work with corporate legal departments and law firms to empower better organizational decision-making by improving processes, technology, and the ways people interact with both. We develop software and data tools, including ContraxSuite, LexSemble, CounselTracker, and LexReserve, that assist organizations with contract analytics and workflows, early case assessment and decision trees, outside counsel spend management, and case valuation. Discover more at
