Parsing legal clauses from contracts


(Erik Chan) #1

Hello,

I’m building a model to classify clauses within legal documents. Instead of classifying the entire document (searching for a needle in a haystack), I’m thinking of providing better supervision by training a model to classify each paragraph or text snippet.

How would you suggest splitting a variety of legal documents into their separate clauses? My impression is that a solution should exist, because the equivalent is possible with images (e.g. bounding box detection). But NLP seems to work a bit differently.

I’m considering training a seq-to-seq RNN to automatically annotate a document with clause beginning and ending tags. Would that work, given that legal documents are long texts?

So the input document could be:

1. This is some important clause.

2. Cool guys include:
i) Erik
ii) Sam
iii) Teddy

3. Bad guys include:
i) Gary
ii) Jennifer
But only when Gary is drunk.

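And the output of the model would be as below. {*} and {/*} are the annotations added by the model to break the document into separate parts. Note that each sub-item is emitted together with its parent clause and any trailing qualifier, so every part can be read on its own.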

{*}1. This is some important clause.{/*}

{*}2. Cool guys include:
i) Erik{/*}

{*}2. Cool guys include:
ii) Sam{/*}

{*}2. Cool guys include:
iii) Teddy{/*}

{*}3. Bad guys include:
i) Gary
But only when Gary is drunk.{/*}

{*}3. Bad guys include:
ii) Jennifer
But only when Gary is drunk.{/*}
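
For comparison, when the numbering is as regular as in this toy example, a rule-based baseline might already produce the same split without any training. The sketch below is only an illustration under strong assumptions: top-level clauses start with "1.", "2.", etc., sub-items start with lowercase Roman numerals followed by ")", and any non-item line after the first sub-item is a qualifier shared by all sub-items of that clause. The names split_clauses, TOP_LEVEL and SUB_ITEM are made up for this example, and real contracts would likely break these rules.

```python
import re

# Assumed numbering conventions (hypothetical, taken from the toy example only).
TOP_LEVEL = re.compile(r"^\d+\.\s")        # e.g. "1. ", "2. "
SUB_ITEM = re.compile(r"^[ivxlcdm]+\)\s")  # e.g. "i) ", "ii) "


def split_clauses(text):
    """Split a numbered document into self-contained clause snippets."""
    # Step 1: group non-empty lines into blocks, one per top-level clause.
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TOP_LEVEL.match(line) or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)

    # Step 2: within each block, separate header, sub-items and trailing text.
    clauses = []
    for block in blocks:
        header, items, trailer = [], [], []
        for line in block:
            if SUB_ITEM.match(line):
                items.append(line)
            elif items:
                # Assumption: text after the first sub-item applies to all sub-items.
                trailer.append(line)
            else:
                header.append(line)
        if not items:
            clauses.append("\n".join(header))
            continue
        # Step 3: emit one snippet per sub-item, repeating header and trailer.
        for item in items:
            clauses.append("\n".join(header + [item] + trailer))
    return clauses


document = """1. This is some important clause.

2. Cool guys include:
i) Erik
ii) Sam
iii) Teddy

3. Bad guys include:
i) Gary
ii) Jennifer
But only when Gary is drunk.
"""

for clause in split_clauses(document):
    print("{*}" + clause + "{/*}\n")
```

A baseline like this could also be used to generate draft annotations for training the tagging model, which would then only be needed for documents whose formatting the rules cannot handle.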

Are there any other possible solutions I should consider?


#2

Check out the Python Natural Language Toolkit (NLTK): http://www.nltk.org/


(Erik Chan) #3

@clipmaker is there something in the library that you think would help solve this problem? I’m not aware of anything.