Text mining using natural language processing

ITI Scotland (now Scottish Enterprise), with the University of Edinburgh School of Informatics, has developed a combination of advanced text mining software technologies known as TXM.

Text mining harnesses computing power to speed up the analysis of large quantities of unstructured text. It identifies, locates and extracts critical information and presents it clearly and succinctly, enabling the creation of comprehensive, searchable databases that support novel discovery.

Text Mining, or Text Analytics as it is increasingly known, is both a technology and a process. It is a mechanism for the discovery of knowledge from documents. It is a means of finding value in text.

The technology mines documents and other forms of ‘unstructured’ electronic data. It does this by analysing linguistic structure and by applying statistical and machine-learning techniques to discern entities (names, dates, places, terms, proteins, etc.) and their attributes, as well as relationships, concepts, and even sentiments. These ‘features’ are extracted to databases for further analysis, automated classification and processing of the source documents.

The resulting databases can then be explored with visualisation techniques, supporting exploratory analysis of the discovered information.
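The extraction step described above can be illustrated with a deliberately simplified sketch. TXM itself relies on linguistic analysis and statistical and machine-learning models; here, purely as an assumption for illustration, a small hand-written dictionary and a regular expression for dates stand in for those models. The extracted ‘features’ are written to an in-memory SQLite table so they can be queried like any other structured data. The dictionary contents, table and column names, and sample sentences are all hypothetical.

```python
import re
import sqlite3

# Hypothetical gazetteer: a stand-in for the statistical/ML entity taggers
# described in the text. Keys are surface forms, values are entity types.
GAZETTEER = {
    "Edinburgh": "PLACE",
    "Glasgow": "PLACE",
    "p53": "PROTEIN",
    "BRCA1": "PROTEIN",
}

# ISO-style dates only (e.g. 2007-03-15); a real system would cover far more formats.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_entities(doc_id, text):
    """Return (doc_id, surface_text, entity_type, char_offset) tuples."""
    found = []
    for surface, etype in GAZETTEER.items():
        # Word-boundary match so that "p53" does not match inside "p530".
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            found.append((doc_id, m.group(0), etype, m.start()))
    for m in DATE_PATTERN.finditer(text):
        found.append((doc_id, m.group(0), "DATE", m.start()))
    # Sort by position so the output follows the order of the text.
    return sorted(found, key=lambda row: row[3])


def build_database(documents):
    """Load the extracted features into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entities "
        "(doc_id TEXT, surface TEXT, entity_type TEXT, char_offset INTEGER)"
    )
    for doc_id, text in documents.items():
        conn.executemany(
            "INSERT INTO entities VALUES (?, ?, ?, ?)",
            extract_entities(doc_id, text),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    docs = {
        "doc1": "On 2007-03-15 researchers in Edinburgh reported that p53 binds BRCA1.",
        "doc2": "A follow-up study in Glasgow on 2008-06-01 examined p53 again.",
    }
    conn = build_database(docs)
    # Once in a table, the originally unstructured text can be analysed with ordinary queries.
    for row in conn.execute(
        "SELECT entity_type, COUNT(*) FROM entities "
        "GROUP BY entity_type ORDER BY entity_type"
    ):
        print(row)
    # prints ('DATE', 2), ('PLACE', 2), ('PROTEIN', 3), one per line
```

Storing the output as rows rather than annotated text is what makes later classification, aggregation and visualisation straightforward: any tool that can read a database table can work with the results.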

Commercial opportunity

We are seeking engagement with organisations and/or individuals who may have an interest in developing this technology for commercial applications. The technology, protected by six patent families, can be accessed as a complete system or in its component parts. We seek to exploit these intellectual assets for the benefit of the Scottish economy.

TXM employs advanced NLP (Natural Language Processing) technologies to analyse text and cross-reference the results with large databases of background knowledge. The highly scalable TXM system is accessed through a web-based interface, so multiple users can work with it simultaneously from different sites anywhere in the world.

Key features include:

  • A flexible, generic text mining system that creates structure from unstructured information
  • Automated extraction of valuable information
  • Centralised document management in a single electronic format
  • Conversion of PDF source documents to machine-readable XML, followed by linguistic pre-processing, extraction of entities and their relationships, and assignment of identifiers
  • Task-specific NLP modules for particular domain applications
  • Modular design for scalability and manageability
  • Adaptability across a wide variety of domains such as legal, financial, intellectual property and life sciences
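The processing chain listed above (XML conversion, linguistic pre-processing, entity and relationship extraction, identifier assignment) can be sketched in miniature. This is an illustrative sketch under stated assumptions, not TXM's implementation: the PDF-to-XML step is omitted because it needs a third-party PDF library, so the sketch starts from plain text; sentence splitting and tokenisation are naive regular expressions; the knowledge base and its `KB:` identifiers are hypothetical placeholders; and relations are found by simple sentence-level co-occurrence, a common baseline chosen here for illustration since the source does not describe how TXM extracts relationships.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical background-knowledge table mapping names to database identifiers.
# The identifiers are placeholders, not real accession numbers.
KNOWLEDGE_BASE = {
    "p53": ("PROTEIN", "KB:0001"),
    "MDM2": ("PROTEIN", "KB:0002"),
    "BRCA1": ("PROTEIN", "KB:0003"),
}


def split_sentences(text):
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Words (letters, digits, underscore) and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_xml(doc_id, text):
    """Pre-process text into XML, tag known entities and co-occurrence relations."""
    root = ET.Element("document", id=doc_id)
    for s_index, sentence in enumerate(split_sentences(text), start=1):
        s_elem = ET.SubElement(root, "s", id=f"s{s_index}")
        entity_ids = []
        for t_index, token in enumerate(tokenize(sentence), start=1):
            w = ET.SubElement(s_elem, "w", id=f"s{s_index}.w{t_index}")
            w.text = token
            if token in KNOWLEDGE_BASE:
                etype, norm_id = KNOWLEDGE_BASE[token]
                w.set("type", etype)
                w.set("norm", norm_id)  # identifier assigned from background knowledge
                entity_ids.append(w.get("id"))
        # Relation sketch: link every pair of entities found in the same sentence.
        for i in range(len(entity_ids)):
            for j in range(i + 1, len(entity_ids)):
                ET.SubElement(s_elem, "rel", arg1=entity_ids[i], arg2=entity_ids[j])
    return root


if __name__ == "__main__":
    doc = to_xml("doc1", "MDM2 binds p53. BRCA1 was not examined.")
    print(ET.tostring(doc, encoding="unicode"))
```

In this example `MDM2` and `p53` receive placeholder identifiers and are linked by one `<rel>` element, while `BRCA1` is tagged but, being alone in its sentence, takes part in no relation. A fuller system would likely replace the dictionary lookup with a trained tagger and a normalisation step, and the co-occurrence rule with a relation classifier, while keeping the same XML-in, XML-out shape that makes each stage a separate, replaceable module.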

Potential market applications include:

  • Recruitment
  • Patents and Intellectual Property
  • Legal, Tax and Regulatory
  • Financial Services
  • Healthcare
  • Games Industry
  • Corporate Compliance
  • Engineering and Manufacturing
  • Science and Social Science
  • Intelligence and Counter-terrorism
  • Law Enforcement
  • Life Sciences

Benefits of TXM technologies

  • Centralises information and converts it to a single electronic format, improving document handling and selection
  • More efficient research: reduced time and greater productivity
  • Greater accuracy with reduced human error
  • Reduces task tedium and frustration for the end user
  • Fewer items of importance are missed
  • Valuable, but otherwise hidden, items are discovered
  • Output quality is increased

The next step

If you are interested in exploring this opportunity further, please contact us.