Data Inventory Guide - GovEx Labs
A data inventory is a fully described record of the data assets maintained by a city. The inventory records basic information about a data asset including its name, contents, update
frequency, use license, owner/maintainer, privacy considerations, data source, and other relevant details. The details about a dataset are known as metadata.
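As a rough illustration, a single inventory record could be represented as a small data structure holding the metadata fields listed above. This is only a sketch: the class name, field names, and example values are hypothetical and are not a schema prescribed by this guide.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InventoryEntry:
    """One data asset, described by the metadata fields named in the guide."""
    name: str
    contents: str
    update_frequency: str
    use_license: str
    owner: str
    privacy_considerations: str
    data_source: str


# Hypothetical example values, not taken from any real city inventory.
entry = InventoryEntry(
    name="Building Permits",
    contents="Permits issued, with type, status, and issue date",
    update_frequency="daily",
    use_license="CC BY 4.0",
    owner="Department of Buildings",
    privacy_considerations="Applicant names must be removed before release",
    data_source="Permitting system export",
)

# Serialize the record to JSON, one of the machine-readable formats
# a city might choose for its inventory.
entry_json = json.dumps(asdict(entry), indent=2)
print(entry_json)
```

Keeping every record in the same shape is what makes the metadata consistent across departments; a city would add or rename fields to match its own requirements.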
Because cities may have thousands of datasets across multiple servers, databases, and computers, it’s helpful to narrow down which datasets should be included in the inventory overall
and how to plan for inventory updates in the future. The datasets worth inventorying are those which are considered assets to employees, departments, executive leadership, and the
general public. Data assets can range from individual datasets that are connected to forms that people fill out, to integrated databases that track a city’s operations in any given field
(building permits, public safety responses, etc.).
The first step to treating your city’s data as an asset is to create a comprehensive data inventory with consistent metadata. Knowing what data your city collects improves efficiency and
increases accountability. It also eases citywide reporting, decision making, and performance optimization.
Managing a data inventory reduces risk and uncertainty by creating a checklist for security and compliance requirements and improves a city’s ability to designate accountability for the
quality of the data collected and created. Just as it is important for cities to know what data they have, it’s equally important to know what data a city does not have. With a complete
picture, cities can begin to collect and use city data to better align mission goals, increase consistency and confidence in decision making, and build performance intelligence.
Managing a data inventory is crucial to better information sharing and integration and a sustainable comprehensive open data program. Providing a public data inventory will make city
employees’ jobs easier when they need information from another department - they will know what exists and how to find it. The same benefits apply to the public regarding its search for
city information. Having a complete inventory is also important when determining which datasets to release publicly. It’s not feasible to release all of a city’s public datasets at once, so
decisionmakers need a prioritization strategy. The data inventory can be used to prioritize the release of data according to strategic priorities, public interest, etc.
Step 1: Establish an Oversight Authority - Conducting a data inventory across departments requires coordination, oversight, and leadership. The first step to conducting an
inventory is establishing who will manage the inventory process. Oversight authorities can come in a variety of shapes and sizes and are often defined in a city’s Open Data Policy.
Some are led by a Chief Data Officer (or similar role), others leverage existing enterprise data management bodies, and others are working groups that include public representation.
While the breadth and depth of data governance authorities can range to best suit your city’s needs, establishing a clear authority body to oversee the data inventory process is key
to success. In the absence of a data governance committee, you may wish to identify a lead liaison, preferably within the Mayor or City Manager’s office, to interact with
departments and facilitate this process.
Note: Establishing a data governance committee, or repurposing an existing committee, is an optional but highly recommended step in successfully
completing the inventory process.
Step 2: Determine the Data Inventory Scope and Plan - The oversight authority, such as a data governance committee, should manage the inventory process by providing an
unambiguous scope, deadlines, performance metrics, and guidelines.
Scope: If the scope is not already defined in your city’s Open Data Policy, the oversight authority should determine the scope of the data inventory at hand. If your city does
not already have a data inventory in place, creating a city-wide comprehensive data inventory can range in difficulty depending on how many data assets your city manages,
how siloed those assets are managed, and your available capacity to conduct the inventory. When defining the scope of the data inventory, the oversight authority should
consider the following:
Any relevant data definitions or inventory requirements that are included in your city’s Open Data Policy
Any government records definitions outlined in your local Records Management policies (i.e., distinguish government datasets from non-record data and personal data
notes)
All data assets
Strategic-priority-specific assets
Individual departmental assets
Plan: The data inventory plan should specify the following:
Required Metadata
Deadlines
Guidance
Performance Metrics
Step 3: Catalog Data Assets in Accordance with Inventory Plan - Liaisons in each city department or agency catalog and describe the data assets within their departments.
Liaisons are employees who are responsible for managing the inventory process at the departmental/agency level. The lead manager of the data inventory compiles the individual
department inventories into a larger citywide data inventory. Inventories should be structured in a machine-readable format (spreadsheet, CSV, JSON, etc.).
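The compilation step described above can be sketched as follows. This is a minimal example, assuming each liaison submits rows with the same column names; the function names, column names, and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical column set; a real inventory plan would define its own.
FIELDS = ["department", "name", "update_frequency", "owner"]


def compile_citywide_inventory(department_inventories):
    """Merge {department: [rows]} into one list, sorted by department and name."""
    citywide = []
    for department, rows in department_inventories.items():
        for row in rows:
            # Tag each row with the department that submitted it.
            citywide.append({"department": department, **row})
    return sorted(citywide, key=lambda r: (r["department"], r["name"]))


def to_csv(rows):
    """Write the rows as CSV text, a machine-readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical submissions from two department liaisons.
submissions = {
    "Public Works": [
        {"name": "Street Trees", "update_frequency": "annual", "owner": "Forestry"},
    ],
    "Buildings": [
        {"name": "Permits", "update_frequency": "daily", "owner": "Permit Office"},
        {"name": "Inspections", "update_frequency": "weekly", "owner": "Inspections Unit"},
    ],
}

citywide = compile_citywide_inventory(submissions)
csv_text = to_csv(citywide)
print(csv_text)
```

Requiring identical columns from every department keeps the merge trivial; `csv.DictWriter` raises an error if a row carries a column outside `FIELDS`, which surfaces template drift early.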
The data governance committee establishes the extent to which the inventory is made public.
Data governance committees can publish high-level inventory summaries that specify which datasets are to be published or remain unpublished.
Philadelphia Example
Philadelphia’s inventory lists all datasets that it uncovers, even those that will never be released due to sensitive content such as Personally Identifiable Information (PII)
or security concerns. Philadelphia lists the following notes with datasets that contain sensitive information: “Some data sets in this inventory cannot be published as
open data. Others could be published after sensitive data is removed (such as personal information).”
https://labs.centerforgov.org/data-governance/data-inventory/ Page 1 of 3
Data Inventory Guide - GovEx Labs 2/2/23, 12:56 PM
Step 5: Initiate Data Prioritization Efforts - The data governance committee establishes the priority and scheduling of the publication of datasets described in the inventory.
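One possible way to turn an inventory into a publication order is a simple weighted score. The sketch below assumes the committee has rated each dataset on strategic priority and public interest, the two criteria this guide names as examples; the field names, weights, and sample ratings are hypothetical, and a real committee would choose its own criteria.

```python
def release_order(entries, strategic_weight=2.0, interest_weight=1.0):
    """Return publishable entries sorted from highest to lowest score."""
    publishable = [e for e in entries if e["publishable"]]

    def score(entry):
        # Hypothetical scoring rule: a weighted sum of two committee ratings.
        return (strategic_weight * entry["strategic_priority"]
                + interest_weight * entry["public_interest"])

    return sorted(publishable, key=score, reverse=True)


# Hypothetical ratings on a 1-5 scale.
inventory = [
    {"name": "Building Permits", "strategic_priority": 3, "public_interest": 2, "publishable": True},
    {"name": "311 Requests", "strategic_priority": 2, "public_interest": 5, "publishable": True},
    {"name": "Employee Records", "strategic_priority": 1, "public_interest": 1, "publishable": False},
    {"name": "Street Trees", "strategic_priority": 1, "public_interest": 2, "publishable": True},
]

schedule = [entry["name"] for entry in release_order(inventory)]
print(schedule)
```

Datasets that cannot be published stay in the inventory but are left out of the schedule.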
Governments complete data inventories for a variety of reasons. Data inventories are a great way to figure out what data is being collected (and if there is any duplication among
departments), determine what systems are in use and their analytics capabilities, promote transparency, develop data publishing plans, and learn about current challenges and
opportunities within the organization that might affect its open data goals. Because many inventorying efforts require participation from a wide variety of staff, inventorying is also a
great opportunity to build relationships and convey the importance of inventorying and the open data program.
You don’t need an open data policy to complete and find value in a data inventory. GovEx surveyed municipalities with and without open data inventories. The consensus is that
having an open data policy that calls for a data inventory is helpful, but not necessary, for completing the inventory in a timely manner and demonstrating its importance across the
organization.
There’s no one-size-fits-all approach for data inventories. Inventories should be customized to fit the government’s needs and open data goals. Some governments begin with a
targeted approach to inventorying in one department, one IT system, or around one strategic priority; other governments dive right in attempting to inventory all their data systems
and datasets in one go. Some governments have their open data coordinators complete the inventory; others hire third-party auditors. It’s important to take time to determine what
the right approach is for your organization. This includes exploring how familiar staff are with open data, how bought in they are to your organization’s open data goals, their capacity
to assist in completing an inventory, any open data legislation that might deal with inventorying, and how you plan to share the results of the data inventory.
Inventorying works best as an organization-wide effort. Inventorying data is a chance to connect with city staff, relay the importance of the city’s open data program, and provide
training around open data. Making this a citywide effort can be a unifying process that thoroughly addresses concerns about open data, builds buy-in throughout the city, and
generates conversation among frontline staff, managers, and senior leadership about data.
Training is the first step to creating a good inventorying experience. Members of your organization undoubtedly have different understandings and knowledge about open data
and the importance of completing an inventory. Providing information about open data in general, the city’s open data program, its goals, and why it’s doing an inventory ensures
that everyone is on the same page and motivated to contribute to the inventorying process.
Inventorying is a continual process. Some cities have mandates which require them to update their inventories on an annual basis, but all the local governments we spoke with
plan to update their inventory routinely and regularly.
A note about privacy. Do not exclude any datasets based on privacy or confidentiality concerns. To make the data inventory as useful as possible, it should include data that may
be sensitive, private, or unlikely to be released. Always include a description of the sensitivity concerns.
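A check like the following could help enforce the rule that sensitive datasets stay in the inventory and carry a description of the concern. It is a sketch only: the `sensitive` and `sensitivity_notes` field names and the sample entries are hypothetical.

```python
def missing_sensitivity_notes(entries):
    """Return names of entries flagged sensitive that lack a description."""
    return [
        entry["name"]
        for entry in entries
        # Treat a missing, None, or whitespace-only note as absent.
        if entry.get("sensitive")
        and not (entry.get("sensitivity_notes") or "").strip()
    ]


# Hypothetical inventory entries.
inventory = [
    {"name": "Street Trees", "sensitive": False},
    {"name": "Employee Records", "sensitive": True,
     "sensitivity_notes": "Contains personally identifiable information"},
    {"name": "Security Camera Locations", "sensitive": True, "sensitivity_notes": ""},
]

needs_follow_up = missing_sensitivity_notes(inventory)
print(needs_follow_up)
```

The check only reports entries for follow-up; it never removes a dataset from the inventory.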
Kansas City, MO
Intro to Open Data presentation (https://drive.google.com/file/d/0B3D_5mo12oglREJQOGpxbnpzWXM/view?usp=sharing)
Sample Memos (https://drive.google.com/open?id=0B3D_5mo12oglNzlUa0dxRC1KU1U)
Chattanooga, TN
Data Inventory Guide (https://docs.google.com/document/d/1ApDBPeyIb1OfyGB-sP18USlg1miKgE0Vw8rV9gPCfX4/edit?usp=sharing)
Montgomery County, MD
Montgomery County Government Open Data Implementation Plan (including Dataset Inventory Process)
(http://montgomerycountymd.gov/open/Resources/Files/OpenDataImplementationPlan_FY14.pdf)
Philadelphia, PA
Open Data Strategic Plan (http://cityofphiladelphia.github.io/slash-data/phl_opendata_plan.pdf?_ga=1.264434959.101783352.1408377064)
San Diego, CA
Open Data Implementation Update (https://datasd.gitbooks.io/council_report/)
San Francisco, CA
5 Ways to Scale the Mountain of Data in Your Organization blog (discusses alternatives to a full scale inventory) (http://datasf.org/blog/5-ways-to-scale-mountain-of-data/)
One Page Summary of Data Inventory Process (https://drive.google.com/file/d/0B3XTBxBQSd0hX0MzWnRodTVVNG8/view)
San Jose, CA
Data Coordinator Description (https://drive.google.com/file/d/0B3D_5mo12oglaFVfQWZFc2p1MlU/view)
Data Inventory Process Diagram (https://drive.google.com/file/d/0B3D_5mo12oglNDV1aEJtVDA4SDQ/view?usp=sharing)
Data Coordinator Guidebook (https://drive.google.com/file/d/0B3D_5mo12oglcTJ3TGRza3JHdXc/view?usp=sharing)
GovEx Sample Data Sources
Sample data sources (https://github.com/govex/govex.github.io/tree/master/data-governance/data-inventory/sample-data-sources)
Step 3: Catalog Data Assets in Accordance with Inventory Plan
Chattanooga, TN
Data Inventory Template (https://docs.google.com/spreadsheets/d/19CDZVvmLm0KJTBMcXN5ULk39T6Aag6u6-gI4SjiwVVM/edit)
Chicago, IL
Data Dictionary (http://datadictionary.cityofchicago.org/)
Philadelphia, PA
Metadata Catalog (in beta) (http://cityofphiladelphia.github.io/metadata-catalog/#home/)
Inventory Template (https://docs.google.com/spreadsheets/d/19CDZVvmLm0KJTBMcXN5ULk39T6Aag6u6-gI4SjiwVVM/edit?usp=sharing)
Open Data Inventory (http://cityofphiladelphia.github.io/slash-data/inventory/?_ga=1.264434959.101783352.1408377064)
San Francisco, CA
DataSF Guidebook: Data Coordinators Edition (https://docs.google.com/document/d/1CJ2uZSYEYcPb6bpcr24kcRCV0zDN-9xYE-o7FA23EMk/edit#)
DataSF Guidebook: Detailed Inventory Guide Steps 2 & 3 (https://docs.google.com/document/d/1W5C5oO2TrVnmOgLe81_KYgmbghj6hDs9-4SC-ygMDV4/edit)
San Jose, CA
Inventory Template (https://drive.google.com/file/d/0B3D_5mo12oglaTNOXzl4TktnVm8/view?usp=sharing)
Toronto, ON, Canada
Data Catalogue (http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=1a66e03bb8d1e310VgnVCM10000071d60f89RCRD)
Step 4: Data Inventory Quality Checks
Montgomery County, MD
Montgomery County Data Publishing Plan (https://data.montgomerycountymd.gov/Government/dataMontgomery-Publishing-Plan/xb2w-gwkm)
New York City, NY
NYC Open Data Plan (https://data.cityofnewyork.us/City-Government/NYC-Open-Data-Plan/v475-8jcj) (released 2013, 2014, 2015
(http://www1.nyc.gov/assets/home/downloads/pdf/reports/2015/NYC-Open-Data-Plan-2015.pdf))
NYC Open Data Plan - List of Datasets Removed (https://data.cityofnewyork.us/City-Government/NYC-Open-Data-Plan-List-Of-Datasets-Removed/unw7-yyit) (NYC
released a public inventory/schedule in 2013, but later withdrew its plans to make some datasets publicly available)
NYC just passed a law (http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=2460488&GUID=4D8BEE2E-106A-4752-85EE-ACC115233069&FullText=1) requiring a
series of investigations and audits to see how compliant agencies are with the open data law requirements.
Philadelphia, PA
OpenDataPhilly.org (http://opendataphilly.org/)
Open Data Census (http://cityofphiladelphia.github.io/slash-data/census/?_ga=1.264434959.101783352.1408377064)
San Francisco, CA
DataSF Guidebook: Data Coordinators Edition (https://docs.google.com/document/d/1CJ2uZSYEYcPb6bpcr24kcRCV0zDN-9xYE-o7FA23EMk/edit#)
Step 5: Initiate Data Prioritization Efforts
Denver, CO
Procedure for Evaluating Open Data Value (https://drive.google.com/file/d/0B2Evzkhx_rmSUDhHczBTS1dVaTg/view?usp=sharing)
Philadelphia, PA
Open Data Google Forum (http://cityofphiladelphia.github.io/slash-data/discuss/?_ga=1.264434959.101783352.1408377064)
San Francisco, CA
DataSF Guidebook: Data Coordinators Edition (https://docs.google.com/document/d/1CJ2uZSYEYcPb6bpcr24kcRCV0zDN-9xYE-o7FA23EMk/edit#)
This work is licensed under a Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/).
Last updated: 2022-04-12 19:33:25 +0000 | Johns Hopkins University (https://www.jhu.edu/) | Center for Government Excellence (http://govex.jhu.edu/) | Labs
(http://labs.centerforgov.org/) | find us on: twitter (https://www.twitter.com/gov_ex) | facebook (https://www.facebook.com/centerforgov) | github (https://www.github.com/govex)