Registration number: 2124U004709
Type: Article
Category: Preprint (Препринт)
Author: Maksymiuk Yuliia
Publication date: 2024-01-01
Source institution: Ukrainian Catholic University (Український католицький університет)
Source: https://hdl.handle.net/20.500.14570/4858

Description: Linguistic diversity is a significant concern, as numerous languages worldwide are low-resource, including Ukrainian. Large language models (LLMs) have the potential to preserve these languages, yet their capacity to handle dialects remains unexplored. This study focuses on adapting LLMs to the Hutsul dialect of Ukrainian, for which no parallel corpus or dictionary existed before this research. An initial version of the parallel corpus, consisting of 9,852 Ukrainian-Hutsul sentence pairs, was created, along with a baseline version of the Hutsul-Ukrainian dictionary comprising 7,320 word pairs. The Retrieval-Augmented Generation (RAG) approach was applied to enrich the corpus with synthetic data, achieving quality statistically similar to that of the original data and expanding the corpus to 16,342 sentence pairs. This synthetic dataset had a positive effect on the performance of Mistral-7B, suggesting that synthetic data effectively enhances model performance. Three large language models (LLaMA-3-8B, Mistral-7B, and Gemma-7B) were adapted to the Hutsul dialect using LoRA, a parameter-efficient fine-tuning (PEFT) technique. The proposed approach, which combines the adaptation of language models with the creation of synthetic data, provides a first version of the basis for a Ukrainian dialect corpus that will be expanded. The resources and adapted models will be available in the public domain for further research on dialects and low-resource languages. Although this is an initial work, it opens up new opportunities for preserving linguistic diversity and creating Ukrainian dialect data, which is one of the first steps towards Ukrainian NLP for dialects.
NRAT date: 2025-05-09
Article
Preprint
Maksymiuk Yuliia. Published 2024-01-01; Ukrainian Catholic University (Український католицький університет), 2124U004709

Updated: 2026-03-22