PoolParty Extractor (PPX) is part of the PoolParty product family and builds the basis for state-of-the-art text mining applications.

The idea behind PPX is to underpin automatic text mining algorithms with domain-specific knowledge from thesauri and linked data sources. This is the precondition to extract meaning from unstructured information more precisely and with higher performance. PoolParty Extractor supports the following application scenarios:
- automatic document categorisation
- named entity extraction based on concepts from thesauri or other knowledge models
- text analysis to improve semantic indexing
- automatic transformation of unstructured text to an RDF-based linked data source
- linking and enrichment of text with structured data from databases or XML documents
- extended indexing by using inflected forms of words and by splitting of compound words
- generation and continuous improvement of thesauri by text corpus analysis
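
To make the thesaurus-based scenarios above more concrete, here is a minimal Java sketch of concept extraction by label lookup. It is not PPX code: the class `ThesaurusExtractorSketch`, its methods, and the `urn:example:` concept identifiers are assumptions made purely for illustration, and a real extractor would add linguistic processing on top.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Illustrative sketch only; none of these names come from PPX. */
final class ThesaurusExtractorSketch {

    /** Lower-cased label (preferred or alternative) mapped to a hypothetical concept identifier. */
    private final Map<String, String> labelToConcept = new LinkedHashMap<>();
    private Pattern pattern; // rebuilt lazily whenever the thesaurus changes

    void addConcept(String conceptId, String... labels) {
        for (String label : labels) {
            labelToConcept.put(label.toLowerCase(Locale.ROOT), conceptId);
        }
        pattern = null;
    }

    private Pattern pattern() {
        if (pattern == null) {
            // Longest labels first, so "text mining" is preferred over "text".
            String alternation = labelToConcept.keySet().stream()
                    .sorted(Comparator.<String>comparingInt(String::length).reversed())
                    .map(Pattern::quote) // labels are literals, not regex syntax
                    .collect(Collectors.joining("|"));
            pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
        }
        return pattern;
    }

    /** Returns the concept identifier for every label occurrence, in text order. */
    List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        if (labelToConcept.isEmpty()) {
            return found;
        }
        Matcher m = pattern().matcher(text);
        while (m.find()) {
            String concept = labelToConcept.get(m.group().toLowerCase(Locale.ROOT));
            if (concept != null) {
                found.add(concept);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        ThesaurusExtractorSketch extractor = new ThesaurusExtractorSketch();
        extractor.addConcept("urn:example:text-mining", "text mining", "text analytics");
        extractor.addConcept("urn:example:thesaurus", "thesaurus", "thesauri");
        System.out.println(extractor.extract("Thesauri make text mining more precise."));
    }
}
```

Mapping several labels to one identifier is what lets synonyms and plural forms resolve to the same concept, which is the main benefit of driving extraction from a thesaurus rather than from plain keyword lists.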

PoolParty Extractor can be integrated smoothly with third-party systems such as CMS, DMS, communication platforms and wikis. PPX is fully based on Java and provides an HTTP API. Integrations with SharePoint, Confluence, WordPress and others exist. Please tell us about your use case!

The latest release 2.1.1 of PPX further extends the capabilities to extract meaning from text with high precision and high performance:
- Use of tf-idf (term frequency, inverse document frequency)
  - Create a text corpus for tf-idf
  - Use the tf-idf calculation during extraction
- Corpus / thesaurus alignment
  - Show missing concepts
  - Show unused concepts
- Use regular expressions to match specific patterns in texts
  - Use parts of the thesaurus as dynamic components for regular expressions
- Calculate inflected forms (currently for German only)
  - Word forms are added to the extraction model and used during extraction
  - The list of inflected forms can be imported into the thesaurus
- Split compound words (currently for German only)
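
The tf-idf items in the list above can be illustrated with a small worked sketch. The source does not say which tf-idf variant PPX uses, so the code below assumes the common textbook form: tf(t, d) is the count of term t in document d divided by the number of terms in d, and idf(t) = ln(N / df(t)), where N is the number of corpus documents and df(t) the number of documents containing t. The class `TfIdfSketch` and its methods are hypothetical and not part of PPX.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; the weighting variant is an assumption, not PPX's. */
final class TfIdfSketch {

    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int documentCount = 0;

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Step 1: build the corpus statistics (each term counts once per document). */
    void addToCorpus(String document) {
        documentCount++;
        for (String term : new HashSet<>(tokenize(document))) {
            documentFrequency.merge(term, 1, Integer::sum);
        }
    }

    /** Step 2: score the terms of one document against the corpus. */
    Map<String, Double> score(String document) {
        List<String> tokens = tokenize(document);
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int df = documentFrequency.getOrDefault(entry.getKey(), 0);
            if (df == 0) {
                continue; // term unknown to the corpus: no idf available in this sketch
            }
            double tf = (double) entry.getValue() / tokens.size();
            double idf = Math.log((double) documentCount / df);
            scores.put(entry.getKey(), tf * idf);
        }
        return scores;
    }

    public static void main(String[] args) {
        TfIdfSketch corpus = new TfIdfSketch();
        corpus.addToCorpus("the thesaurus links concepts");
        corpus.addToCorpus("the extractor reads the text");
        corpus.addToCorpus("the text mentions the thesaurus");
        System.out.println(corpus.score("the thesaurus links concepts"));
    }
}
```

In this toy corpus a word that appears in every document, such as "the", receives a weight of zero, while a word that appears in only one document is weighted highest. That is the effect that makes tf-idf useful for ranking extracted terms.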

PPX can be tested online as a web service. Please send us a short note describing your interest and we will provide further details.