Information extraction

Synonyms: text extraction

Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information from unstructured machine-readable documents, generally human-language texts, by means of natural language processing (NLP). Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted domains. An example is the extraction of corporate mergers from news wire reports, where each merger is denoted by a formal relation between the companies involved and is recovered from an online news sentence such as: "Yesterday, New York-based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.
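
As a rough illustration of turning such a sentence into structured data, the following minimal sketch applies a single hand-written pattern to the example sentence. The Acquisition record, its field names, the list of corporate suffixes, and the pattern itself are assumptions made for illustration; they are not the formal relation referred to in the text, and practical IE systems generally rely on much richer NLP pipelines than one regular expression.

```python
import re
from typing import NamedTuple, Optional


class Acquisition(NamedTuple):
    """Hypothetical structured record for one extracted relation."""
    acquirer: str
    target: str


# Assumption: a company name is one or more capitalized words followed by
# a corporate suffix such as "Inc." or "Corp.".
COMPANY = r"(?:[A-Z][\w&]*\s)+(?:Inc|Corp|Ltd|LLC)\."

# Assumption: the relation is signalled by the phrase
# "<company> announced their/its acquisition of <company>".
PATTERN = re.compile(
    rf"(?P<acquirer>{COMPANY})\s+announced\s+(?:their|its)\s+"
    rf"acquisition\s+of\s+(?P<target>{COMPANY})"
)


def extract_acquisition(sentence: str) -> Optional[Acquisition]:
    """Return the acquisition stated in the sentence, or None if absent."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return Acquisition(match.group("acquirer"), match.group("target"))


sentence = (
    "Yesterday, New York-based Foo Inc. announced "
    "their acquisition of Bar Corp."
)
print(extract_acquisition(sentence))
# Acquisition(acquirer='Foo Inc.', target='Bar Corp.')
```

Once the sentence has been reduced to a record like this, ordinary computation becomes possible on it, for example counting acquisitions per company or joining the records against other tables. The narrowness of the pattern also shows why such approaches tend to be confined to restricted domains: any rephrasing of the sentence outside the anticipated wording yields no match.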

Related terms:
Named entity recognition: Named entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, ...

Broader terms:
Text mining: Text mining, sometimes alternately referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the divining of patterns and trends through means such as ...

