Documents found: 1
Information
Registration number: 2120U007886, materials of publications and local repositories
Category: Preprint
Title: Matching Red Links with Wikidata Items
Author: Liubonko Kateryna
Publication date: 2020-01-01
Information provider: Ukrainian Catholic University
Original source: https://hdl.handle.net/20.500.14570/2051
Description: This work tackles the problem of matching Wikipedia red links with existing articles. A link on a Wikipedia page is considered red when it leads to a nonexistent article. Articles that correspond to such red links may exist in other Wikipedia editions. We propose a way to match red links in one Wikipedia edition to existing pages in another edition, and we solve this task for Ukrainian red links and existing English pages. We created a dataset of the 3,171 most frequent Ukrainian red links and a dataset of 2,957,927 pairs of red links and the most probable candidates for the corresponding pages in English Wikipedia. This dataset is publicly released. We define the task as a Named Entity Linking problem: red links are named entities, and we link Ukrainian red links to English Wikipedia pages. We provide a thorough analysis of the data and define the conceptual characteristics to exploit in entity resolution. These characteristics are graph properties (connections with the pages where red links occur and connections with the pages that occur on the same pages as red links) and word properties (titles). We applied the BabelNet knowledge base to this task, evaluated its performance in terms of F1 score (29%), and regarded it as a baseline for our approach. To improve the results, we introduced several similarity metrics based on the red link characteristics mentioned above. Combined in a linear model, they resulted in an F1 score of 85%, which is our best result. In the thesis we also discuss bottlenecks and limitations of the current approach and outline ideas for future improvements.
To the best of our knowledge, we are the first to state the problem and propose a solution for red links in the Ukrainian Wikipedia edition. All the code for this project is publicly released on GitHub.
Added to NRAT: 2025-11-05
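Illustrative note: the description mentions similarity metrics built from word properties (titles) and graph properties (shared connections), combined in a linear model. The sketch below shows one possible shape of such a combination; it is not the author's implementation. The function names, the equal weights, the use of `difflib` for title similarity, the Jaccard overlap for graph similarity, and the assumption that titles are already transliterated and that neighbors share one identifier space are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity of two titles, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set, b: set) -> float:
    """Overlap of two neighbor sets, in [0, 1]; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(word_sim: float, graph_sim: float,
          w_word: float = 0.5, w_graph: float = 0.5, bias: float = 0.0) -> float:
    """Linear combination of the two similarities (weights are placeholders)."""
    return w_word * word_sim + w_graph * graph_sim + bias


def best_candidate(red_title: str, red_neighbors: set, candidates: dict) -> str:
    """Return the candidate title with the highest combined score.

    `candidates` maps a candidate page title to the set of its neighbor ids.
    """
    return max(
        candidates,
        key=lambda title: score(
            title_similarity(red_title, title),
            jaccard(red_neighbors, candidates[title]),
        ),
    )


# Toy data, invented for this example only.
red_title = "Dzhon Smit"              # assumed transliteration of a red link title
red_neighbors = {"Q1", "Q2", "Q3"}    # hypothetical shared identifiers
candidates = {
    "John Smith": {"Q1", "Q2", "Q4"},
    "Jane Doe": {"Q9"},
}
best = best_candidate(red_title, red_neighbors, candidates)
print(best)
```

In practice the weights would be fitted on labeled red link and candidate pairs rather than fixed by hand, and a score threshold would decide whether any candidate is accepted at all.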
Materials
Preprint
Liubonko Kateryna. Matching Red Links with Wikidata Items: publication 2020-01-01; Ukrainian Catholic University, 2120U007886

Updated: 2026-03-15