Practical Data Mapping and Governance Whitepaper - How To Guide - 02JUNE2022
Defined Terms
▪ Data = personal information (“PI”: California), protected health information (“PHI”: HIPAA), and
personal data (“PD”: other state laws/GDPR)
▪ DPLC (data processing lifecycle) “follows the data” through collection (sources/suppliers), usage/access,
sharing/transfer, storage (resources) and retention/disposal (“processing”), as well as the purposes for such processing
▪ Resource means assets (apps/software, databases, systems, technologies), internal resources (file
cabinets) and external resources (service providers/data processors, third parties) housing/containing data
▪ Data mapping is about developing a visual DPLC diagram and inventories of associated resources/data and
service providers (data processors/HIPAA business associates)
Key Objectives
1. Define and name discrete data processing lifecycles (DPLCs) within an organization.
2. Develop a thorough understanding of what data is collected, how and from whom (source) it is
collected, processed/used, shared/transferred and retained/disposed, including backups.
3. Create a visual data flow diagram for each DPLC. This helps you tell your data story to key internal
and external stakeholders, including business partners, investigators and auditors.
4. Develop comprehensive inventories of resources, data, and service providers and classify the data’s
sensitivity level.
5. Establish a practical data governance strategy with clear roles and responsibilities for DPLC process
owners and resource owners/custodians to facilitate accurate privacy notices, Privacy/Security-by-Design,
proper privacy rights fulfillment, etc.
Effective data mapping and data governance are foundational to a sustainable and defensible privacy
and security program, to complying with privacy and security laws, regulations, standards and policies, and
to protecting the company, including its brand/reputation, consumers, investors, board/executives and
other stakeholders.
SIPOC (Suppliers, Inputs, Process, Outputs, Customers)
▪ Suppliers (S): Data Suppliers / Data Sources; Data Resource From & How / Data Location
▪ Inputs (I): Data Inputs, Formats & How Moved / Transferred
▪ Process (P): DPLC Process Steps:
  Notice
  Collection, How & Purpose
  Used / Processed / Accessed & Purpose
  Shared / Disclosed & Purpose
  Cross-Border Data Transfers & Purpose
  Stored / Backed-up
  Disposed
▪ Outputs (O): Data Outputs, Formats & How Moved / Transferred; Data Resource To & How / Data Location
▪ Customers (C): Data Customers / Endpoints
Partially filled-in SIPOC: This is a fictitious clinical laboratory processing example. Using business
process mapping architecture, this is an example of a “level 0” process. It could be further broken down
into “level 1”, “level 2”, “level 3”, etc., sub-level processes.
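For teams that want to capture SIPOC interview output in machine-readable form before diagramming, a minimal Python sketch follows. The field names mirror the SIPOC columns above; the clinical_lab values are illustrative assumptions echoing the fictitious lab example, not output from an actual engagement.

    from dataclasses import dataclass, field

    @dataclass
    class DPLCStep:
        """One row of the SIPOC: a single step in the data processing lifecycle."""
        name: str                    # e.g., "Collection", "Shared / Disclosed"
        purpose: str = ""            # why the data is processed at this step
        suppliers: list = field(default_factory=list)  # S: data suppliers / sources
        inputs: list = field(default_factory=list)     # I: data inputs, formats, how moved
        outputs: list = field(default_factory=list)    # O: data outputs, formats, how moved
        customers: list = field(default_factory=list)  # C: data customers / endpoints

    # Fictitious clinical-lab example; all names and values are assumptions.
    clinical_lab = [
        DPLCStep(name="Notice", purpose="Inform patients of processing"),
        DPLCStep(name="Collection", purpose="Order intake",
                 suppliers=["Ordering physician"], inputs=["HL7 order via SFTP"]),
        DPLCStep(name="Used / Processed / Accessed", purpose="Specimen analysis"),
        DPLCStep(name="Shared / Disclosed", purpose="Result reporting",
                 outputs=["PDF result via portal"], customers=["Physician portal"]),
        DPLCStep(name="Stored / Backed-up", purpose="Record retention"),
        DPLCStep(name="Disposed", purpose="End of retention period"),
    ]

    for step in clinical_lab:
        print(f"{step.name}: {step.purpose}")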
Outputs/Deliverables (collectively along with the SIPOC, these are the “Data Mapping Documents”)
1. Data Flow Diagram: Convert the SIPOC information to a Visio or similar diagram with swim lanes
representing organizational or functional control of the appropriate parts of the data flow. Diagramming
is part art and part science. When multiple data collection points merge into a single
infrastructure using common resources, e.g., an ERP or Salesforce platform, this may be a single
DPLC represented by a single data flow diagram, although significant complexity may require multiple
layers. Complex collection points might be treated as mini-DPLCs. When processes use different
infrastructures and/or resources, these should be separate DPLCs requiring separate data flow
diagrams, e.g., B2B and D2C business channels, or acting as data controller vs. service provider.
This is about mapping the core data processing lifecycle. One-off processes
(exceptions) are normally not included; however, if certain exception processes occur frequently
enough, they can also be mapped.
Sample data flow diagram created from SIPOC: The fictitious clinical laboratory processing
data flow diagram follows.
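Where Visio is unavailable, the same swim-lane structure can be drafted with open tooling. Below is a minimal Python sketch that emits Graphviz DOT text, with clusters standing in for swim lanes; the lane and step names are hypothetical stand-ins for the fictitious lab example, not the diagram referenced above.

    # Minimal sketch: emit Graphviz DOT for a swim-lane-style data flow diagram.
    # Clusters approximate swim lanes (organizational/functional control).
    # All lane and node names below are illustrative assumptions.

    lanes = {
        "Ordering Physician": ["Order entry"],
        "Clinical Lab": ["Accessioning", "Analysis", "Result reporting"],
        "IT / Records": ["Storage & backup", "Disposal"],
    }
    flows = [
        ("Order entry", "Accessioning"),
        ("Accessioning", "Analysis"),
        ("Analysis", "Result reporting"),
        ("Result reporting", "Storage & backup"),
        ("Storage & backup", "Disposal"),
    ]

    print("digraph DPLC {")
    print("  rankdir=LR; node [shape=box];")
    for i, (lane, steps) in enumerate(lanes.items()):
        print(f"  subgraph cluster_{i} {{")
        print(f'    label="{lane}";')
        for s in steps:
            print(f'    "{s}";')
        print("  }")
    for src, dst in flows:
        print(f'  "{src}" -> "{dst}";')
    print("}")

Rendering the printed output with Graphviz (e.g., dot -Tpng) produces a basic left-to-right flow grouped by controlling function.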
2. Resource and Data Inventory: Develop a comprehensive inventory of the resources identified in each
DPLC and the data types they house, classifying each data type’s sensitivity level and assigning
resource owners/custodians.
3. Service Provider Inventory: If no contract management system is in place, this inventory should be
created to track agreements/amendments and initial and any periodic assessments to assure and document
compliance. For each service provider, the inventory should also identify related data types, data
sensitivity classification, and assigned resource owner.
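Even a flat file can serve as a starting inventory until a contract management system exists. Below is a minimal sketch using Python’s standard csv module; the column names and the single row are hypothetical examples of the fields described above.

    import csv

    # Minimal service provider inventory sketch; columns follow the fields
    # described above. The row content is a hypothetical example.
    COLUMNS = [
        "service_provider", "agreement_refs", "last_assessment_date",
        "data_types", "data_sensitivity", "resource_owner",
    ]

    rows = [{
        "service_provider": "Example Courier Co. (fictitious)",
        "agreement_refs": "MSA 2021-04; BAA 2021-04",
        "last_assessment_date": "2022-01-15",
        "data_types": "PHI: patient name, specimen ID",
        "data_sensitivity": "High",
        "resource_owner": "Lab Operations Manager",
    }]

    with open("service_provider_inventory.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)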
DPLC Dashboard: For organizations with multiple DPLCs, creating and maintaining a DPLC side-by-side
dashboard (Excel works) facilitates visibility and governance. It should identify the DPLC’s name, the
DPLC Process Owner, business model (B2B, D2C), role (controller, processor), countries/states,
applicable laws/regulations, data classification types, etc.
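Before committing to a spreadsheet layout, the dashboard fields listed above can be prototyped as simple records and reviewed side by side. A minimal Python sketch follows; both DPLC entries are hypothetical.

    # Minimal DPLC dashboard sketch; fields mirror the list above.
    # Both entries are hypothetical examples.
    dashboard = [
        {"dplc_name": "Clinical Lab Processing", "process_owner": "J. Doe",
         "business_model": "B2B", "role": "controller",
         "jurisdictions": "US: CA, NY", "laws": "HIPAA, CCPA",
         "data_classifications": "PHI, PI"},
        {"dplc_name": "D2C Wellness Portal", "process_owner": "A. Smith",
         "business_model": "D2C", "role": "controller",
         "jurisdictions": "US, EU", "laws": "CCPA, GDPR",
         "data_classifications": "PI, PD"},
    ]

    # Print one line per field, with each DPLC as a side-by-side column.
    for field_name in dashboard[0]:
        values = " | ".join(entry[field_name] for entry in dashboard)
        print(f"{field_name:22} {values}")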
Manual Data Mapping Limitations: The quality and accuracy of data maps developed through this
process depend upon having the right people identified and participating in the data mapping
interviews, properly vetting the diagrams for sign-off, and ultimately owning the resulting documents
going forward.
Automated Data Mining and Mapping Limitations
▪ Automated data discovery tools: We have observed clients use automated data discovery reports
that produced many false positives and did not capture all types of personal data (a toy illustration
follows this list). However, these tools may be useful to check and validate the data elements captured
during the data mapping interview process. The effectiveness of these tools should improve over time with AI.
▪ Automated data mapping tools: Many of today’s automated tools simply create “data lineage”
maps showing how data moves from one resource to another. While helpful, the data lineage maps
we have observed do not represent complete data flow diagrams of data processing lifecycles, so they
cannot fully support establishing governance around DPLCs as described in this whitepaper.
It remains to be seen whether future automated data mapping tools can replicate all the benefits
of the interactive, highly participatory data mapping interview process espoused at the
beginning of this whitepaper.
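To illustrate the false-positive problem noted in the first bullet above, consider a toy discovery scanner; the regex and sample strings below are contrived for illustration.

    import re

    # Toy data-discovery sketch: a naive SSN pattern flags any ddd-dd-dddd
    # string, so a part number is reported as personal data (false positive),
    # while an unformatted SSN is missed entirely (false negative).
    SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    samples = [
        "Patient SSN: 123-45-6789",    # true positive
        "Order part no. 555-12-0042",  # false positive
        "SSN on file: 123456789",      # false negative (no delimiters)
    ]

    for text in samples:
        hits = SSN_PATTERN.findall(text)
        print(f"{text!r} -> flagged: {hits}")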
Automated scanning, identifying and inventorying of specific data elements makes sense for large
organizations with vast amounts of data residing in numerous data repositories. However, complete
reliance on such automation is not advisable at this time.