Automating Data Governance and PII Compliance Using Unity Catalog in AI-Driven Data Ecosystems
Abstract
Tailored data ecosystems let organizations derive business value continuously by turning data into usable information through AI workflows and pipelines. Wider adoption of AI, however, introduces new risks, such as model bias and unethical decision-making. Recent developments underline the importance of an AI governance framework that minimizes these risks and supports accessible, secure, and responsible AI systems. Numerous high-profile cases have shown AI models leaking personally identifiable information (PII), and such exposure carries significant legal compliance risk and reputational damage. As a result, a risk-based, regulator-driven approach to PII governance and protection in AI pipelines is gaining traction. Advances in Zero Trust Architecture and the principle of least privilege, together with the concurrent evolution of PII-related regulations, add further impetus by advocating data minimization.
Against this backdrop, automating data governance and PII compliance across all AI workflows in a tailored data ecosystem becomes a pressing need. Automation improves accessibility and efficiency; it lets organizations run their own cloud-based data ecosystems while "outsourcing" the implementation of regulatory compliance; and, with adequate governance, it supports responsible AI development. A formal analysis defines the goals of establishing automated governance in a Unity Catalog environment of the kind organizations typically employ to create tailored data ecosystems.