Reducing Hallucination in Multilingual Voice Agents Using Instruction-Tuned Models
Abstract
In widely deployed multilingual voice agents for customer service and interactive AI systems, one persistent problem haunts the field: hallucinations, responses that are syntactically fluent but factually incorrect or contextually irrelevant. The problem is amplified in multilingual settings, where disparate linguistic structures, domain-specific vocabularies, and underrepresented languages compound the risk. This paper investigates how far hallucinations in multilingual voice agents can be reduced through instruction-tuned models fine-tuned to follow explicit task guidelines. We fine-tune large multilingual transformer-based models (mT5, XGLM, and BLOOMZ) and evaluate them across ten languages for comprehension, factual accuracy, and language consistency.
To assess our methodology, we propose a hybrid evaluation framework that combines automated metrics (BLEU, COMET, and factual consistency scores) with human evaluation adapted to the cultural and linguistic characteristics of each language under study. Our experiments show that instruction tuning substantially reduces hallucinations, particularly when combined with retrieval-augmented generation (RAG) and task-completion instructions. We also analyze how instruction tuning behaves differently in high-resource and low-resource languages and how hallucination patterns vary across language families, identifying trends at both the structural and syntactic levels. Finally, we propose best practices for calibrating instruction-tuned pipelines in production multilingual voice systems.
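A hybrid framework of this kind ultimately needs to aggregate its automated metrics into a single comparable score per response. The sketch below illustrates one minimal way to do so as a weighted combination; the `hybrid_score` helper and its weights are illustrative assumptions, since the paper's abstract does not specify how the metrics are aggregated.

```python
def hybrid_score(bleu: float, comet: float, factual: float,
                 weights: tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """Combine normalized automatic metrics (each in [0, 1]) into one score.

    The weighting scheme here is a hypothetical example: factual
    consistency and COMET are weighted above BLEU on the assumption
    that hallucination reduction is the primary evaluation target.
    """
    w_bleu, w_comet, w_factual = weights
    return w_bleu * bleu + w_comet * comet + w_factual * factual

# Example: a response that is fluent but weakly grounded in facts
score = hybrid_score(bleu=0.65, comet=0.80, factual=0.40)  # ~0.61
```

In practice the BLEU, COMET, and factual consistency inputs would come from their respective scorers (e.g. sacreBLEU and a COMET checkpoint), normalized to a common range before aggregation.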
Our findings suggest that the future success of multilingual voice assistants depends on adapting instruction-tuned models to deliver greater factual reliability and cross-lingual consistency, enabling safer and more responsible conversational AI agents across the language divide.