Documents found: 1
Information
Registration number: 2124U004709, Materials of publications and local repositories
Category: Preprint
Title: Large Language Models adaptation to Hutsul dialect
Author: Maksymiuk Yuliia
Publication date: 01-01-2024
Information provider: Ukrainian Catholic University (Український католицький університет)
Original source: https://hdl.handle.net/20.500.14570/4858
Publication:
Description: Linguistic diversity is a significant concern, as numerous languages worldwide are low-resource, including Ukrainian. Large language models (LLMs) have the potential to preserve these languages, yet their capacity to handle dialects remains unexplored. This study focuses on adapting LLMs to the Hutsul dialect of Ukrainian, for which no parallel corpus or dictionary existed before this research. The initial version of the parallel corpus, consisting of 9,852 Ukrainian-Hutsul sentence pairs, was created, along with the baseline version of the Hutsul-Ukrainian dictionary comprising 7,320 word pairs. The Retrieval-Augmented Generation (RAG) approach has been applied to enrich the corpus with synthetic data, achieving a statistically similar quality to the original data and expanding the corpus to 16,342 sentence pairs. This synthetic dataset demonstrated a positive effect on the performance of Mistral-7B, suggesting that synthetic data effectively enhances model performance. Three large language models (LLaMA-3-8B, Mistral-7B, and Gemma-7B) were adapted to the Hutsul dialect using the parameter-efficient fine-tuning (PEFT) technique LoRA. The proposed approach, which combines the adaptation of language models and the creation of synthetic data, provides a first version of the basis for creating a Ukrainian dialect corpus that will be expanded. The resources and adapted models will be available in the public domain for further research on dialects and low-resource languages. Although this is an initial work, it opens up new opportunities for preserving linguistic diversity and creating Ukrainian dialect data, which is one of the first steps towards Ukrainian NLP for dialects.
Added to NRAT: 2025-05-09
Materials
Preprint
Maksymiuk Yuliia. Large Language Models adaptation to Hutsul dialect : publication 2024-01-01; Ukrainian Catholic University (Український католицький університет), 2124U004709

Updated: 2026-03-14