Data Engineering in the Age of Large Language Models: Transforming Data Access, Curation, and Enterprise Interpretation
Abstract
For many years, the guiding principle in enterprise data management has been "garbage-in, garbage-out": the quality of downstream reporting and analyses can only be as good as the quality of the data that enter the system. These words remain true as organizations struggle to gain insight from their enterprise data. Yet 2024 is shaping up to be the year of "garbage-in, garbage-read" for enterprise data interpretation. Large language models (LLMs), such as ChatGPT, have demonstrated an unprecedented capability to convert unstructured text into human-like natural language, answering questions regardless of the source of the input text and summarizing content as part of higher-level reasoning. The impact of LLMs extends well beyond natural language processing alone; they also affect how organizations curate and access information. This article summarizes research on AI techniques that help users get the right data in the right format; automate the evaluation and curation of data; and, finally, apply natural language processing directly to the transformed data to provide enterprise intelligence.