Research Class: Multilingual Keyword Extraction

Date: Friday, 27 September 2019, at 12:00, room O-357
Speaker: Slobodan Beliga, Department of Informatics, University of Rijeka
Talk title: Multilingual Keyword Extraction


Automatic keyword extraction is the initial step in many systems for natural language processing (NLP), text mining (TM), and information retrieval (IR). Keywords concisely and compactly describe the subject of a text. This talk will present the issues of automatic keyword extraction and introduce an unsupervised graph-based method for this task. Within the novel Selectivity-Based Keyword Extraction (SBKE) method, new centrality measures for the keyword extraction task will be proposed and tested. The SBKE method extracts keywords from a source text represented as a complex network of language. The node-level centrality measure called selectivity is calculated from a weighted network as the average weight distributed on the links of a single node, and it is used in the procedure of keyword candidate ranking and extraction. It will be shown that selectivity-based keyword extraction slightly outperforms extraction based on the standard centrality measures: in/out-degree, betweenness, and closeness.
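
The selectivity measure described above (average weight on the links of a node, i.e. node strength divided by node degree) can be illustrated with a minimal Python sketch. This is not the SBKE implementation: it assumes a simplified undirected co-occurrence network in which two words are linked when they are adjacent in the text, and it covers only the candidate-ranking idea. The function names and the toy sentence are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Assumed network model: undirected links between adjacent words,
    weighted by how many times the two words occur next to each other."""
    weights = defaultdict(int)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops
            weights[frozenset((left, right))] += 1
    return weights


def selectivity(weights):
    """Selectivity of a node = strength (sum of link weights) / degree (number of links)."""
    strength = defaultdict(int)
    degree = defaultdict(int)
    for link, weight in weights.items():
        for node in link:
            strength[node] += weight
            degree[node] += 1
    return {node: strength[node] / degree[node] for node in degree}


def rank_candidates(tokens, top_n=5):
    """Rank words by selectivity, highest first; ties are broken alphabetically."""
    scores = selectivity(build_cooccurrence_network(tokens))
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_n]


# Toy input, made up for illustration only.
tokens = ("keyword extraction is the first step and "
          "keyword extraction is unsupervised").split()
scores = selectivity(build_cooccurrence_network(tokens))
top_two = rank_candidates(tokens, top_n=2)
```

In this toy text, "extraction" has two links with a total weight of 4, so its selectivity is 2.0, while "keyword" has two links with a total weight of 3 and a selectivity of 1.5. Because the score depends only on link weights and counts, no language-specific resource is needed to compute it.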

Furthermore, it will be shown that the method does not require external linguistic knowledge, which is commonly used in similar methods: since SBKE is derived purely from the network structure, it is suitable for use in different natural languages and in a multilingual scenario. In addition, the results indicate that selectivity-based keyword extraction has excellent potential for the collection-oriented keyword extraction task as well. The talk will present the results achieved in terms of standard IR measures and Kappa statistics, tested for different natural languages (Croatian, English, Serbian, and Italian) and for various domains (scientific publications in the field of mining and geology; essays and critiques in architecture and design; news from politics, sports, culture, and economy; and technical texts from Wikipedia in the field of computer science). This will demonstrate that the method is suitable for language- and domain-independent keyword extraction.
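
As a rough illustration of the standard IR measures mentioned above, the following sketch compares a set of extracted keywords with a set of human-annotated keywords using precision, recall, and F1. The exact evaluation protocol of the talk is not given here, so exact string matching, the function name, and the example keyword sets are all assumptions; Kappa statistics are not covered.

```python
def precision_recall_f1(extracted, annotated):
    """Compare extracted keywords with annotated ones by exact match (an assumption)."""
    extracted, annotated = set(extracted), set(annotated)
    hits = len(extracted & annotated)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical keyword sets, made up for illustration only.
extracted = {"network", "keyword", "selectivity", "text"}
annotated = {"keyword", "selectivity", "centrality"}
precision, recall, f1 = precision_recall_f1(extracted, annotated)
```

Here two of the four extracted keywords match the annotation, giving a precision of 1/2, a recall of 2/3, and an F1 score of 4/7.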