Using a Machine Learning Data Catalog to Reboot Data Governance

Early attempts at data governance were largely preoccupied with “organization”: developing organization charts that mapped out the roles of the “data governance council,” the data owners, and the data stewards, and ironing out the processes for defining and approving data policies. In some cases, these organizational activities were accompanied by rote operational tasks such as manually surveying tables and documenting data element metadata. In general, these activities focused on what could be called the “data production lifecycle”: the processes applied to data from its acquisition or creation through its delivery to a database, data warehouse, or other reporting system.

The challenge is that limiting data governance to manipulating org charts or performing mindless manual tasks does not advance any of the key objectives of operational data governance, such as:
