The body of a patent specification often contains considerable matter that is not claimed, for example descriptions of prior art that may be in the public domain. That matter is nonetheless used to understand the definitions that form the bounds of the exclusionary rights claimed. More sophisticated analysis of freedom to operate therefore needs to focus on parsing the technical language of the claims, carried out in light of the specification.
As outlined below, the formal technical style of claims language allows certain computational linguistic techniques to be employed that are not applicable to patent documents as a whole.
Types of claims
Claims fall into two broad types: independent claims and dependent claims. They can be further divided into categories according to what is claimed, such as a method or a device. Because national patent laws differ, the techniques and strategies needed to determine the metes and bounds of claims may also differ. In much of Europe, claims tend to be interpreted according to a central claiming system, in which claims identify the “centre” of the invention. In the United States and England, a peripheral claiming system applies, in which claims identify the limits and bounds of what is claimed.35
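As a rough illustration of the independent/dependent distinction, the sketch below flags a claim as dependent when its text refers back to another claim by number. This is an assumed heuristic, not a legal test: the function names and the regular expression are hypothetical, only the first number in a range such as "claims 2 to 4" is captured, and a claim that merely cites another claim (for example a product-by-process claim) would also be flagged.

```python
import re

# Assumed heuristic: a back-reference such as "of claim 1" or
# "according to claims 2 to 4" marks a claim as dependent.
CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def parent_claims(claim_text):
    """Return the claim numbers referred to in the text.

    For a range like "claims 2 to 4" only the first number (2) is returned.
    """
    return [int(n) for n in CLAIM_REF.findall(claim_text)]


def is_dependent(claim_text):
    """True if the claim text refers back to at least one other claim."""
    return bool(parent_claims(claim_text))
```

A caller might use `parent_claims` to build a simple claim tree, treating claims with no back-reference as roots.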
There are a number of special types of claims, which can be identified by reference to their structure and content. “Jepson claims” typically include the phrase “wherein the improvement comprises”. Markush claims, often used in chemistry but occasionally seen in other fields of art, may include the phrase “consisting of” or “comprising” followed by a variable list of possible substituents. Product-by-process claims typically use phrases like “Product obtained by the process of claim…”. Beauregard-type claims often include the phrase “computer-readable medium”.
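The phrase-based identification described above can be sketched with a few regular expressions. The patterns below are illustrative assumptions built from the cue phrases in the text. In particular, the Markush pattern is narrowed to "group consisting of" or "group comprising", because "comprising" alone appears in most claims. The names `PATTERNS` and `claim_types` are hypothetical, and real claims vary enough in wording that matches should be treated as candidates for review.

```python
import re

# Illustrative cue-phrase patterns; not exhaustive and not a legal test.
PATTERNS = {
    "Jepson": re.compile(r"wherein\s+the\s+improvement\s+comprises", re.I),
    # Assumption: require the word "group" to avoid matching every "comprising".
    "Markush": re.compile(r"\bgroup\s+(?:consisting\s+of|comprising)\b", re.I),
    "Product-by-process": re.compile(
        r"\bobtained\s+by\s+the\s+process\s+of\s+claim\b", re.I
    ),
    "Beauregard": re.compile(r"computer[-\s]readable\s+medium", re.I),
}


def claim_types(claim_text):
    """Return the names of all special claim types whose cue phrase occurs."""
    return [name for name, pat in PATTERNS.items() if pat.search(claim_text)]
```

Returning a list rather than a single label allows for claims that match more than one pattern.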
Computational linguistic techniques have been applied to patent claims with a number of aims in mind, including computer-aided drafting of claims, the machine translation of claims from one language to another, the conversion of claims into more readable forms by reformatting them into more linguistically natural structures, and the alignment of sections of the description with the relevant sections of the claims.36
Patent claims have a peculiar structure that challenges natural language processing (NLP) techniques. Claims are an extreme example of very long sentences with an abundance of telescopically embedded clauses. There are, however, aspects of claims grammar, such as the use of predicates,37 and aspects of the linguistic specificity of claims drafting style that allow the use of NLP techniques such as Rhetorical Structure Analysis (RSA) using cue phrases,38 and combinations of methods exploiting grammatical formalisms with predicate lexicons.39 Although many of these techniques have advanced in recent years, they are yet to be widely adopted.
As the volume of patent applications increases and the need for the translation of patents and non-patent literature grows as a result of trade and patent co-operation treaties, computational linguistics will play an increasingly important role. In the areas of patentability, freedom to operate (FTO), invalidation and infringement searching across multiple languages, NLP techniques have the potential to give a significant competitive advantage to those organizations that successfully deploy the technology.
Quality of data for claims analysis
While some patent offices, such as the USPTO and EPO, publish patents in a format that makes claims easy to identify and search programmatically, this is not universally true. In the case of Australia, where claims are published as part of an image facsimile of the whole patent document, optical character recognition (OCR) and text parsing techniques are required just to identify the claims section. If claims have been amended when a PCT application enters national phase, programmatic identification of claims depends on the conventions and procedures adopted by the publishing authority of that nation. If all that is provided is a cover sheet with handwritten amendments or sections of claims ruled out, programmatic access to and analysis of claims becomes problematic.
Publication of the claims sections of patents in machine-readable form (e.g. Standard ST.36 Recommendation for the Processing of Patent Information using XML) can be seen as a first step in facilitating innovative software that adds functionality such as linking terms used in the claims to definitions in the specification, file wrapper, or external references.
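To show why machine-readable claims matter, the sketch below pulls claim numbers, text, and back-references out of a small XML fragment using only the Python standard library. The sample markup is invented for illustration, and the element and attribute names (`claims`, `claim`, `claim-text`, `claim-ref`, `num`, `idref`) are assumed to follow ST.36-style conventions; they should be checked against the DTD of the actual data source before use. The function name `extract_claims` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumed, not taken from a real publication.
SAMPLE = """<claims>
  <claim id="c1" num="1">
    <claim-text>A device comprising a housing and a lid.</claim-text>
  </claim>
  <claim id="c2" num="2">
    <claim-text>The device of <claim-ref idref="c1">claim 1</claim-ref>,
    wherein the lid is hinged.</claim-text>
  </claim>
</claims>"""


def extract_claims(xml_text):
    """Return one dict per claim with its number, flattened text, and references."""
    root = ET.fromstring(xml_text)
    claims = []
    for claim in root.iter("claim"):
        # Join all nested text nodes, then collapse runs of whitespace.
        text = " ".join("".join(claim.itertext()).split())
        refs = [ref.get("idref") for ref in claim.iter("claim-ref")]
        claims.append({"num": claim.get("num"), "text": text, "refs": refs})
    return claims
```

With the claims available in this form, linking claim terms to definitions elsewhere in the specification becomes a matter of text matching rather than OCR and layout recovery.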
WIPO has announced that as of January 2006 it will produce “Weekly Published International Applications” on DVD (formerly known as Rule 87 DVDs) in a new format that includes search quality OCR data with the claims clearly marked.40 The “wo-published-application.dtd” could be seen as setting a minimum benchmark for the publication of patent documents.41