As mentioned in my post describing the major business processes that comprise a data governance function, the Define process documents the data definitions and business context associated with business terminology, taxonomies, and relationships, as well as the policies, rules, standards, processes, and measurement strategy that must be defined to operationalize data governance efforts. This stage runs in parallel with the Discover stage, and the two are iterative: Discovery drives Definition, and Definition drives a more targeted focus for Discovery.
The most relevant processes that comprise the Define stage include:
Business Glossary Creation. Collaborative process to capture and share full business context around critical data. In addition to the expected definitions of core data entities and attributes, context can also include rules, policies, reference data, free form annotation, links, and data owners, to name a few.
Data Classification. In the world of structured data, this process is often referred to as metadata management – the act of capturing relevant supporting business and IT context about data in the form of metadata. For unstructured content, classification plays a critical role in tagging and categorizing content with appropriate context to deliver relevant search results.
Data Relationships Definition. Process of defining data relationships, mappings and hierarchies at both the metadata (data modeling) and data (business hierarchies) levels.
Reference Data Management. Process for defining and standardizing reference data to be used within and across applications to ensure consistent data capture and usage. Reference data can include business-defined lists of values for internally managed attributes (e.g., customer type, product color), industry standardized values (e.g., ISO 3166 standards for State and Country codes), or industry regulated standards (e.g., GS1 GDSN data synchronization code lists).
Business Rules Definition. The process of creating and documenting logical business requirements to build the rules and policies for data validation, cleansing, enrichment, matching, merging, masking, archiving, standardization, etc. These rules define both the automated machine-supported and manual human-centric processes.
Data Governance Policies Definition. Process of defining data governance-driven policies such as data accountability and ownership, organizational roles and responsibilities, data capture and validation standards, data access and usage, as well as data masking, archiving, subsetting and retention. (For more on data governance policies, see my post “Data Governance Policies Shape Organizational Behaviors.”)
Other Dependent Policies Alignment. It’s likely that other efforts within an organization already document business- or IT-driven policies that set parameters on how enterprise data should be managed and used, but do not currently label them “data governance” policies. These include policies for information security, data privacy, GRC, IT governance and others. The Define stage should identify these policies and align the data governance policies with them.
Key Performance Indicator (KPI) Definition. Process of defining service level agreements (SLAs), operational baseline metrics for data quality (DQ) and policy compliance, return on investment (ROI) and total cost of ownership (TCO) measures, and other measures used to gauge the effectiveness of, and value delivered by, the data governance efforts. (For more on measuring data governance effectiveness, see my post “Measuring Data Governance: Lies, Damned Lies, and ROI.”)
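To make a few of these processes a bit more concrete, here is a minimal Python sketch of how reference data, business rules, and a baseline DQ metric might fit together once they have been defined. Everything in it is an assumption made for illustration: the names (CUSTOMER_TYPES, COUNTRY_CODES, BusinessRule, compliance_rate, dq_report), the record layout, and the choice of "share of records passing a rule" as the metric are hypothetical and not taken from any particular governance tool. The country list is only a tiny subset of ISO 3166-1 alpha-2 codes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical reference data: a business-defined list of values
# for an internally managed attribute (customer type).
CUSTOMER_TYPES = {"RETAIL", "WHOLESALE", "PARTNER"}

# Illustrative subset of ISO 3166-1 alpha-2 country codes (not the full list).
COUNTRY_CODES = {"US", "CA", "DE", "JP"}


@dataclass(frozen=True)
class BusinessRule:
    """A documented validation rule: a name, business context, and a check."""
    name: str
    description: str
    check: Callable[[dict], bool]


# Hypothetical rules that validate records against the reference data above.
RULES: List[BusinessRule] = [
    BusinessRule(
        name="valid_customer_type",
        description="Customer type must be one of the approved values.",
        check=lambda record: record.get("customer_type") in CUSTOMER_TYPES,
    ),
    BusinessRule(
        name="valid_country_code",
        description="Country must be a recognized two-letter code.",
        check=lambda record: record.get("country") in COUNTRY_CODES,
    ),
]


def compliance_rate(records: List[dict], rule: BusinessRule) -> float:
    """Return the fraction of records that pass the rule (1.0 if no records)."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if rule.check(record))
    return passed / len(records)


def dq_report(records: List[dict]) -> Dict[str, float]:
    """Compute a per-rule compliance rate, usable as an operational baseline."""
    return {rule.name: compliance_rate(records, rule) for rule in RULES}


if __name__ == "__main__":
    sample = [
        {"customer_type": "RETAIL", "country": "US"},
        {"customer_type": "VIP", "country": "DE"},      # unapproved customer type
        {"customer_type": "PARTNER", "country": "XX"},  # unknown country code
        {"customer_type": "WHOLESALE", "country": "JP"},
    ]
    for rule_name, rate in dq_report(sample).items():
        print(f"{rule_name}: {rate:.0%} compliant")
```

Keeping the description alongside each check reflects the point of the Define stage: the business context is documented together with the rule rather than buried in application code. A baseline captured this way could later be compared against an SLA target, though how those targets are set is a policy decision and not something the code can answer.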
Stay tuned next week for the "Apply" process stage deep dive...