Problems of Extracting the Terminological Core of a Subject Domain from Electronic Encyclopedic Dictionaries

This paper addresses the automatic construction of a terminological system for a subject domain. A method for extracting domain terms from electronic encyclopedic data sources is proposed. The approach is distinguished by a thorough analysis of term structure, the recognition of errors based on their linguistic classification, the automatic generation of lexical-syntactic patterns that represent multi-component terms, and a set of heuristic methods for processing "special" terms. A reference list of concept names is formed automatically by analyzing encyclopedic dictionaries and is used to assess the quality of the dictionaries being developed.