Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is of significant importance for ontologies and for various applications. This paper presents MwTExt, an architecture for the automatic extraction of multi-word terms (MWTs) as compound concepts from unannotated English natural-language text corpora, aimed at the automatic construction of ontologies. Shallow parsing and syntactic structure analysis are used to extract compound concepts, with a specific focus on the lexical patterns (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The lexical descriptions of the extracted MWTs are then encoded in the Web Ontology Language (OWL) using its OWL/XML serialization. MwTExt has been tested on a Computer Science domain corpus, and its results are compared with those of Text2Onto, a prominent ontology learning tool. The results indicate that MwTExt performs better at extracting accurate and realistic lexicalized MWTs, achieving an average precision of 97%.
Keywords: Multi-word terms. Compound concepts. Ontology. NLP.
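To make the three lexical patterns concrete, the following is a minimal sketch of pattern matching over POS-tagged tokens. It is not MwTExt's implementation. It assumes that the input has already been tagged with Penn Treebank tags by some upstream tagger, that nouns are `NN`/`NNS`/`NNP`/`NNPS` and prepositions are `IN`, and that "(Noun Preposition Noun + Noun)" means a preposition followed by a two-noun compound. The names `extract_mwts`, `coarse` and `PATTERNS` are hypothetical.

```python
from typing import List, Tuple

# Assumption: Penn Treebank tags supplied by an upstream POS tagger.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PREP_TAGS = {"IN"}

# The three patterns from the abstract, as coarse class sequences.
# Ordered longest first so that the longest match wins at each position.
# Assumption: "N P N + N" is read as N P followed by a two-noun compound.
PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]


def coarse(tag: str) -> str:
    """Map a fine-grained POS tag to N (noun), P (preposition) or O (other)."""
    if tag in NOUN_TAGS:
        return "N"
    if tag in PREP_TAGS:
        return "P"
    return "O"


def extract_mwts(tagged: List[Tuple[str, str]]) -> List[str]:
    """Scan left to right and greedily collect non-overlapping pattern matches."""
    classes = [coarse(tag) for _, tag in tagged]
    terms: List[str] = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(classes[i:i + n]) == pattern:
                terms.append(" ".join(word for word, _ in tagged[i:i + n]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no pattern starts here; advance one token
    return terms


tagged = [
    ("The", "DT"), ("theory", "NN"), ("of", "IN"), ("computation", "NN"),
    ("covers", "VBZ"), ("analysis", "NN"), ("of", "IN"),
    ("data", "NNS"), ("structures", "NNS"), (".", "."),
]
print(extract_mwts(tagged))  # ['theory of computation', 'analysis of data structures']
```

Trying the longest pattern first keeps "analysis of data structures" from being cut short to "analysis of data". A real pipeline would also need the shallow-parsing, filtering and OWL/XML encoding steps that the abstract describes. This sketch leaves those out.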