MwTExt: Automatic Extraction Of Multi-word Terms For Ontology Learning

Pratik Thanawala
SCS-WP-2017-02-001
pratik.thanawala@ahduni.edu.in SCHOOL OF COMPUTER STUDIES

Abstract

Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is of significant importance in Ontology and in various applications. This paper presents MwTExt, an architecture for the automatic extraction of multi-word terms (MWTs) as compound concepts from un-annotated English natural-language text corpora, for the automatic construction of an Ontology. Shallow parsing and syntactic structure analysis are used to extract compound concepts, with a specific focus on the lexical patterns (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The lexical descriptions of the MWTs are then encoded in the Web Ontology Language, OWL/XML. MwTExt has been tested on a Computer Science domain corpus, and the results obtained are compared with those of Text2Onto, a prominent Ontology learning tool. The results indicate that MwTExt performs better at extracting accurate and realistic lexicalized MWTs, with an average precision of 97%.
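To make the three lexical patterns concrete, the following minimal Python sketch matches them against text that has already been part-of-speech tagged. It is an illustration only and not the MwTExt implementation: the abstract does not say which tagger or parser is used, so the Penn Treebank tags (NN* for nouns, IN for prepositions), the names PATTERNS, coarse and extract_mwts, and the greedy longest-pattern-first strategy are all assumptions made here.

```python
# Illustrative sketch only; not the MwTExt implementation.
# Assumption: input tokens carry Penn Treebank POS tags.
# N = any noun tag (NN, NNS, NNP, NNPS), P = preposition tag (IN).

# Hypothetical pattern table, listed longest first so that a longer
# pattern is preferred over a shorter one that shares its prefix.
PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]


def coarse(tag):
    """Map a Penn Treebank tag to a coarse class: N, P, or O (other)."""
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


def extract_mwts(tagged):
    """Return candidate multi-word terms from a list of (token, tag) pairs."""
    classes = [coarse(tag) for _, tag in tagged]
    terms = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(classes[i:i + n]) == pattern:
                terms.append(" ".join(tok for tok, _ in tagged[i:i + n]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no pattern starts here; move on one token
    return terms


sentence = [
    ("the", "DT"), ("theory", "NN"), ("of", "IN"),
    ("computation", "NN"), ("is", "VBZ"), ("old", "JJ"),
]
print(extract_mwts(sentence))
```

Matching on coarse tag classes keeps the pattern table readable. A real pipeline would obtain the tags from a tagger or shallow parser and would probably filter the candidates further before encoding them in OWL/XML.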
 

Keywords

Multi-word Terms, Compound concepts, Ontology, NLP