As mentioned in my post describing the major business processes that comprise a data governance function, the Define process documents the data definitions and business context associated with business terminology, taxonomies, and relationships, as well as the policies, rules, standards, processes, and measurement strategy that must be defined to operationalize data governance efforts. This stage runs in parallel with the Discover stage, and the two are iterative: Discovery drives Definition, and Definition drives a more targeted focus for Discovery.
The most relevant processes that comprise the Define stage include:
Business Glossary Creation. Collaborative process to capture and share full business context around critical data. In addition to the expected definitions of core data entities and attributes, context can also include rules, policies, reference data, free-form annotations, links, and data owners, to name a few.
- Ensures everyone is on the same page: data architects, modelers, developers, stewards, and data consumers, including business process owners as well as operational and strategic decision-makers.
Data Classification. In the world of structured data, this process is often referred to as metadata management – the act of capturing relevant supporting business and IT context about data in the form of metadata. For unstructured content, classification plays a critical role in tagging and categorizing content with appropriate context to deliver relevant search results.
- Effective data classification delivers context and fast-tracks information access for business users, allowing them to respond quickly to regulatory and compliance requirements, reduce costs and inefficiencies, and gain improved insights into the business and customers. Trusted data classification benefits IT by reducing integration complexity, providing the transparency often missing from black-box or custom coding, and ultimately improving collaboration, agility, and time to value.
Data Relationships Definition. Process of defining data relationships, mappings and hierarchies at both the metadata (data modeling) and data (business hierarchies) levels.
- A data model without relationships is just a data inventory. Defining the expected relationships between master data, transactional data and reference data, along with the applications and processes that depend on them, ultimately defines an organization’s business model. Data hierarchies (e.g., organizational, bill of materials, customer, product, sales, and channel) represent the foundation for an organization’s planning, decision-making, and customer engagement.
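To make the idea of a business hierarchy concrete, here is a minimal sketch of a product hierarchy stored as child-to-parent relationships, with sales rolled up from the lowest level to every ancestor. The category names, the `PARENT` mapping, and the `roll_up` helper are all hypothetical, chosen only for illustration; a real hierarchy would typically live in a master data or modeling tool.

```python
# Hypothetical product hierarchy: each node points to its parent (None = root).
PARENT = {
    "Laptops": "Computers",
    "Desktops": "Computers",
    "Computers": "Electronics",
    "Phones": "Electronics",
    "Electronics": None,
}


def path_to_root(node, parent=PARENT):
    """Return the node followed by each of its ancestors, ending at the root."""
    path, seen = [], set()
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
        node = parent[node]  # raises KeyError for a node missing from the hierarchy
    return path


def roll_up(leaf_sales, parent=PARENT):
    """Add each leaf-level amount to the leaf and to every ancestor above it."""
    totals = {}
    for leaf, amount in leaf_sales.items():
        for node in path_to_root(leaf, parent):
            totals[node] = totals.get(node, 0) + amount
    return totals


totals = roll_up({"Laptops": 100, "Desktops": 50, "Phones": 30})
```

Without the defined relationships, the three sales figures are just an inventory of numbers; with them, the same figures answer questions at the "Computers" and "Electronics" levels as well.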
Reference Data Management. Process for defining and standardizing reference data to be used within and across applications to ensure consistent data capture and usage. Reference data can include business-defined lists of values for internally managed attributes (e.g., customer type, product color), industry standardized values (e.g., ISO 3166 standards for State and Country codes), or industry regulated standards (e.g., GS1 GDSN data synch code lists).
- Common, standardized reference data ensures the ability to reconcile, aggregate and gain insight from data shared across disparate application, functional, geographic or organizational silos or across B2B trading partners.
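As a rough illustration of why standardized reference data matters for aggregation, the sketch below maps locally used customer-type values from two systems onto one shared code list before counting. The system names, the local values, and the `CUSTOMER_TYPE_MAP` list are invented for this example and are not drawn from any real standard.

```python
from collections import Counter

# Hypothetical cross-reference: local values (lowercased) -> shared standard code.
CUSTOMER_TYPE_MAP = {
    "ret": "RETAIL",
    "retail": "RETAIL",
    "whs": "WHOLESALE",
    "wholesale": "WHOLESALE",
}


def standardize_customer_type(raw):
    """Return the shared code for a raw value, or None if the value is unmapped."""
    return CUSTOMER_TYPE_MAP.get(raw.strip().lower())


def aggregate_by_type(records):
    """Count records per standard code; collect unmapped values for steward review."""
    counts, unmapped = Counter(), []
    for system, raw in records:
        code = standardize_customer_type(raw)
        if code is None:
            unmapped.append((system, raw))
        else:
            counts[code] += 1
    return counts, unmapped


records = [("crm", "RET"), ("erp", "Retail "), ("erp", "WHS"), ("crm", "partner")]
counts, unmapped = aggregate_by_type(records)
```

Values that do not map are surfaced rather than silently dropped, so a data steward can decide whether the shared list needs a new entry.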
Business Rules Definition. The process of creating and documenting logical business requirements to build the rules and policies for data validation, cleansing, enrichment, matching, merging, masking, archiving, standardization, etc. These rules define both the automated machine-supported and manual human-centric processes.
- When operationally implemented in the Apply process stage, business rules are the key to ensuring data is trusted, secure, and ultimately fit for business usage.
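One way to picture a documented business rule is as a small, named object that carries both its plain-language description and an automated check. The following is only a sketch under that assumption: the `Rule` class, the two sample rules, and the record fields are hypothetical, and the checks are deliberately simplistic rather than production-grade validation.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str                      # short identifier used in reports
    description: str               # the documented business requirement
    check: Callable[[dict], bool]  # returns True when the record passes


def _matches(pattern, value):
    """True when the value (treated as text) fully matches the pattern."""
    return re.fullmatch(pattern, str(value or "")) is not None


# Hypothetical validation rules, for illustration only.
RULES = [
    Rule("email_format", "Email must have text on both sides of a single @",
         lambda r: _matches(r"[^@\s]+@[^@\s]+", r.get("email"))),
    Rule("country_code", "Country must be a two-letter uppercase code",
         lambda r: _matches(r"[A-Z]{2}", r.get("country"))),
]


def validate(record, rules=RULES):
    """Return the names of all rules the record fails, in rule order."""
    return [rule.name for rule in rules if not rule.check(record)]


failures = validate({"email": "not-an-email", "country": "usa"})
```

Keeping the description next to the check means the same definition can serve the business glossary and the automated process, while failed records can still be routed to a person for manual review.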
Data Governance Policies Definition. Process of defining data governance-driven policies such as data accountability and ownership, organizational roles and responsibilities, data capture and validation standards, data access and usage, as well as data masking, archiving, subsetting and retention. (For more on data governance policies, see my post “Data Governance Policies Shape Organizational Behaviors”.)
- When defined, approved, evangelized and enforced appropriately, these policies have the power to evolve your corporate culture to one that manages data as an asset. Without these top-down, executive-driven policy mandates, it will be difficult to change past behavior impacting data quality and security.
Other Dependent Policies Alignment. It’s likely that other efforts within an organization already document business- or IT-driven policies that set parameters on how enterprise data should be managed and used but do not currently label them “data governance” policies. These include policies for information security, data privacy, governance, risk, and compliance (GRC), IT governance and others.
- When a data governance effort is ready to scope and define what policies it must document and implement, it can start with the work that’s already been done and reconcile which policies should be owned by the data governance effort, which should simply be recognized and complied with, and which should be replaced or improved.
Key Performance Indicator (KPI) Definition. Processes that define service level agreements (SLAs), operational baseline metrics for data quality (DQ) and policy compliance, return on investment (ROI) and total cost of ownership (TCO) measures, and other measures used to gauge the effectiveness and value delivered by the data governance efforts. (For more on measuring data governance effectiveness, see my post “Measuring Data Governance: Lies, Damned Lies, and ROI”.)
- Data governance efforts will not receive sponsorship, resources, funding or prioritization without a means to measure the value and effectiveness of the efforts.
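As a simple example of an operational baseline metric, the sketch below measures field completeness for a set of records and compares it against an SLA target. The field name, the sample records, and the 95% target are assumptions made up for illustration; actual metrics and thresholds would come out of the KPI definition process itself.

```python
def completeness(records, field):
    """Share of records (0.0 to 1.0) where the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def meets_sla(measured, target):
    """True when the measured value reaches or exceeds the agreed target."""
    return measured >= target


# Hypothetical sample: one of four customer records is missing an email.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

email_completeness = completeness(customers, "email")
within_sla = meets_sla(email_completeness, target=0.95)  # assumed 95% target
```

Capturing a baseline like this before any remediation begins gives later measurements something to be compared against.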
Stay tuned next week for the "Apply" process stage deep dive...