Chapter 03 - Data Classification
Cont.
• Rights and responsibilities of data ownership and custody:
• Data owners remain legally responsible for all data they own.
• Data processors do not necessarily have direct relationships
with data owners; processors can be third parties.
• Ownership, custody, rights, responsibilities, and liability are all
relative to the dataset.
• E.g., a cloud provider is usually the data processor for a cloud customer’s
data, but the provider is the data owner for information that the provider
collects and creates, such as the provider’s own customer list, asset
inventory, and billing information.
The Data Lifecycle
• The data lifecycle refers to the entire period of
time that data exists in the system. (Chapter 4)
• Knowing the phases of the cloud data lifecycle in
sequence is important for understanding data
security concepts in cloud computing.
• The data owner is identified in the Create
phase.
• Many data security and management
responsibilities require action on the part
of the data owner at this point of the lifecycle.
Data Categorization
• Data can be categorized based on how the data is going to be used by the
organization.
• This allows the data owner to appropriately categorize the data.
• Some ways an organization might categorize data:
1. Regulatory Compliance
• The organization may want to create categories based on which regulations apply to a specific
dataset.
• These regulations and standards might include:
• Gramm-Leach-Bliley Act (GLBA), which concerns financial institutions
• Payment Card Industry (PCI) standards
• Sarbanes-Oxley Act (SOX)
• Health Insurance Portability and Accountability Act (HIPAA)
• General Data Protection Regulation (GDPR)
Cont.
2. Business Function
• The organization might want to have specific categories for different uses of
data:
• Perhaps the data is tagged based on its use in billing, marketing, or operations.
3. Functional Unit
• Each department or office might have its own category and keep all data it
controls within its own category.
4. By Project
• Some organizations might define datasets by the projects they are associated
with as a means of creating discrete, compartmentalized projects.
Data Classification
• Data classification is the process of analyzing data for certain attributes to determine
the appropriate policies and controls to apply to ensure its security.
• Within a cloud environment, proper data classification is more crucial
than in a traditional data center because many customers and hosts
are located within the same multitenant environment.
• The Cloud Security Professional is responsible for ensuring that data is
classified immediately upon creation.
Types of classification
1. Sensitivity
• This is the classification model used by the US military.
• Data is assigned a classification according to its sensitivity.
• A classification must be assigned to all data; material that is not deemed
sensitive must be assigned the “unclassified” label.
Cont.
2. Jurisdiction
• The geophysical location of the source or storage point of the data might have
significant bearing on how that data is treated and handled.
• E.g., personally identifiable information (PII) of EU citizens is subject
to EU privacy laws, which are much stricter and more comprehensive than
privacy laws in the United States.
3. Criticality
• Data that is critical to organizational survival is classified in a manner different from
trivial, basic operational data.
• Business Impact Analysis (BIA) helps us determine which material would be
classified this way.
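The three classification types above can be recorded together on one dataset. The following is a minimal sketch, assuming the four familiar US military sensitivity levels; the `Classification` type, the jurisdiction codes, and the control names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels; every dataset gets one, even if it is UNCLASSIFIED."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Classification:
    sensitivity: Sensitivity
    jurisdiction: str   # hypothetical code such as "EU" or "US"
    critical: bool      # True if a BIA marked the data as critical


def required_controls(c: Classification) -> set[str]:
    """Map a classification to illustrative control names (assumed, not prescriptive)."""
    controls = {"access-control"}
    if c.sensitivity >= Sensitivity.CONFIDENTIAL:
        controls.add("encryption-at-rest")
    if c.jurisdiction == "EU":
        controls.add("eu-privacy-review")
    if c.critical:
        controls.add("bc-dr-backup")
    return controls
```

Using an ordered enum lets policy code compare levels directly, so a rule written for "confidential and above" does not need to list every level.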
Data Mapping
• Data exchanged between organizations (or sometimes even between
departments) must be normalized and translated so that it is
meaningful to both parties.
• In the context of classification, mapping is necessary so that data
known as sensitive in one system/organization is recognized as such
by the receiving system/organization.
• Without proper mapping efforts, data classified at a specific level
might be exposed to undue risk or threats.
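Classification mapping can be as simple as a lookup table agreed on by both parties. The sketch below assumes two hypothetical label schemes; none of the level names come from the slides.

```python
# Hypothetical level names for a partner organization and the local one.
PARTNER_TO_LOCAL = {
    "public": "unclassified",
    "internal": "confidential",
    "restricted": "secret",
}

STRICTEST_LOCAL_LEVEL = "secret"


def map_classification(partner_level: str) -> str:
    """Translate a partner's label into the local scheme.

    Unknown labels fall back to the strictest local level, so unmapped
    data is over-protected rather than exposed.
    """
    key = partner_level.strip().lower()
    return PARTNER_TO_LOCAL.get(key, STRICTEST_LOCAL_LEVEL)
```

Falling back to the strictest level is one possible design choice; rejecting unmapped data outright would be another.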
Data Labeling
• When the data owner creates, categorizes, and classifies the data, the
data also needs to be labeled
• The label should indicate who the data owner is, usually in terms of the
office or role instead of an individual name or identity
• Labels on data in hardcopy might be printed headers and footers,
whereas labels on electronic files might be embedded in the filename
• Labels may include the following information:
• Date of creation
• Date of scheduled destruction/disposal
• Confidentiality level
• Handling directions
• Dissemination/distribution instructions
• Access limitations
• Source
• Jurisdiction
• Applicable regulation
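For electronic files, the label fields listed above can be kept as a small structured record. This is a minimal sketch, assuming a hypothetical `DataLabel` type and JSON as the storage format; neither is prescribed by the slides.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class DataLabel:
    owner_role: str               # office or role, not an individual's name
    created: date
    scheduled_disposal: date
    confidentiality: str
    handling: str
    distribution: str
    access_limitations: str
    source: str
    jurisdiction: str
    applicable_regulation: str

    def to_json(self) -> str:
        """Serialize the label so it can be stored alongside the file."""
        fields = asdict(self)
        fields["created"] = self.created.isoformat()
        fields["scheduled_disposal"] = self.scheduled_disposal.isoformat()
        return json.dumps(fields, sort_keys=True)
```

A machine-readable label like this is what makes label-based discovery practical later in the lifecycle.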
Data Discovery Methods
• Data discovery can refer to several kinds of tasks:
• It might mean that the organization is attempting to create an initial inventory
of data it owns, or that the organization is involved in electronic discovery.
• It can also mean the modern use of data mining tools to discover trends and
relations in the data already in the organization’s inventory.
1. Label-Based Discovery
• The labels created by data owners in the Create phase of the data lifecycle
greatly aid any data discovery effort.
• With labels, the organization can determine what data it controls and what
amounts of each kind.
Cont.
2. Metadata-Based Discovery
• Metadata (data about data) can be useful for discovery purposes.
• Metadata is a listing of traits and characteristics about specific data elements
or sets.
• Metadata is often automatically created at the same time as the data, often
by the hardware or software used to create the parent data.
• E.g., most modern digital cameras create metadata every time a photograph is taken,
such as the date, time, and location where the photo was shot and the make and model of the
camera.
• This metadata is embedded in the file and is copied and transferred whenever the image itself is copied
or moved.
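Metadata-based discovery can start from the metadata the file system already keeps. The sketch below builds a simple inventory from file names, sizes, and modification times; the function names are hypothetical, and reading embedded metadata such as camera EXIF fields would need additional tooling that is not shown here.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: Path) -> dict:
    """Return basic file system metadata for one file."""
    st = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix.lower(),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }


def inventory(root: Path) -> list[dict]:
    """Walk a directory tree and collect metadata for every regular file."""
    return [file_metadata(p) for p in sorted(root.rglob("*")) if p.is_file()]
```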
Cont.
3. Content-Based Discovery
• Even without labels or metadata, discovery tools can be used to locate and
identify specific kinds of data by delving into the content of datasets.
• This technique can be as basic as term searches or can use sophisticated
pattern-matching technology.
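A basic form of content-based discovery combines pattern matching with a validity check. The sketch below looks for candidate payment card numbers (filtered with the Luhn checksum) and email addresses; the two patterns are illustrative assumptions, and real discovery tools use far richer rule sets.

```python
import re

# Illustrative patterns only.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to weed out random runs of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def discover(text: str) -> dict[str, list[str]]:
    """Return candidate card numbers and email addresses found in text."""
    cards = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"card_numbers": cards, "emails": EMAIL.findall(text)}
```

The checksum step shows why content-based tools go beyond plain term searches: it cuts false positives that a pattern alone would report.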
Data Analytics
• Current technology provides additional options for finding and
assigning types to data.
• Modern tools can create new data feeds from sets of data.
• These options include the following:
1. Data mining
• This kind of data analysis is an outgrowth of the possibilities offered by
regular use of the cloud, also known as “big data.”
• It is extremely useful for detecting and analyzing previously unknown trends and
patterns in data collected from various data streams in an organization.
Cont.
2. Real-Time Analytics
• In some cases, tools can provide data mining functionality concurrently with data
creation and use.
• These tools rely on automation and require efficiency to perform properly.
• Examples:
• Viewing orders as they happen for better tracking and to identify trends.
• Continually updated customer activity like page views and shopping cart use to
understand user behavior.
3. Agile Business Intelligence
• State-of-the-art data mining involves recursive, iterative tools and processes that
can detect trends and identify even more patterns in historical and recent data.
Structured vs. Unstructured Data
• Data that is sorted according to meaningful, discrete types and
attributes, such as data in a database, is said to be structured
• Unsorted data (e.g. content of various emails in Sent folder) is
considered unstructured
• It is typically much easier to perform data discovery actions on
structured data because that data is already situated and arranged.
Information Rights Management (IRM)
• IRM is the use of specific controls that act in addition to the organization’s other
access control mechanisms to protect certain types of assets, usually at the
file level.
• A related term used in the same context is DRM: “digital rights management” or
“data rights management.”
• DRM is encapsulated within the concept of IRM:
• DRM applies to the protection of consumer media, such as music, publications,
video, and movies.
• IRM applies to the organizational side to protect information and privacy, whereas
DRM applies to the distribution side to protect intellectual property rights and
control the extent of distribution.
Cont.
• In a typical environment, access controls placed on a data object (file)
determine who on the system can read or modify that object.
• With IRM, additional control layers are applied to the object that allow for
much more granular and powerful control over what can be done with it.
• With IRM, functions such as copying, renaming, printing, and sending can be
further controlled and restricted.
• This makes data storage more removed from data consumption and allows for
more flexibility in choosing hosting platforms and providers.
• IRM controls and ACLs can be placed on data immediately at the time of
creation.
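The layering of IRM on top of ordinary access control can be pictured as two checks that must both pass. This is a minimal sketch with hypothetical names (`ProtectedObject`, `is_allowed`) and is not modeled on any specific IRM product.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedObject:
    name: str
    acl_users: set[str]                                          # ordinary access control list
    irm_rights: dict[str, set[str]] = field(default_factory=dict)  # user -> allowed actions


def is_allowed(obj: ProtectedObject, user: str, action: str) -> bool:
    """An action succeeds only if the ACL admits the user AND IRM grants the action."""
    if user not in obj.acl_users:
        return False
    return action in obj.irm_rights.get(user, set())
```

In this sketch a user who passes the ACL can still be denied printing or copying, which is the extra granularity IRM adds.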
Intellectual Property Protections
• Intellectual property is the class of valuable belongings that are intangible: assets of the mind
• Copyright
• The legal protection for expressions of ideas is known as copyright
• It is granted to anyone who first creates an expression of an idea.
• This involves literary works, films, music, software, and artistic works.
• Copyright does not cover ideas, specific words, slogans, recipes, or formulae.
• Those things can often be secured with other intellectual property protections
• Copyright protects the tangible expression of an idea, not the form of an idea. For
instance, copyright protects the content of a book, not the hardcopy version of a
book itself
Cont.
• Trademarks
• Trademark protection is intended to be applied to specific words and graphics.
• Trademarks are representations of an organization—its brand.
• A trademark can be the name of an organization, a logo, a phrase associated
with an organization, even a specific color or sound, or some combination of
these.
• In order to have a trademark protected by law, it must be registered within a
jurisdiction.
• Commonly, that is the US Patent and Trademark Office (USPTO)
Cont.
• Patents
• A patent is the legal mechanism for protecting intellectual property in the form of
inventions, processes, materials, decorations, and plant life.
• The patent owner gains exclusive rights to the production, sale, and importation of the
patented property.
• Patents typically last for 20 years from the time of the patent application.
• Trade Secrets
• Trade secrets are intellectual property that involve many of the same aspects
as patented material: processes, formulas, commercial methods, and so forth.
• However, unlike other intellectual property protections, material considered
trade secrets must be just that: secret.
Cont.
• They cannot be disclosed to the public, and efforts must be made to maintain
secrecy in order to keep this legal protection.
• Anyone who tries to acquire trade secrets by theft or misappropriation can be
sued in civil court.
• Anyone who discovers or invents similar methods, processes, and
information through legal means is legally free to use that
knowledge for their own benefit.
IRM Tool Traits
• IRM can be implemented in enterprises by manufacturers, vendors, or
content creators.
• Material protected by IRM solutions needs some form of labeling or
metadata in order for the IRM tool to function properly.
• Some ways that IRM has been or could be applied:
1. Rudimentary Reference Checks
• The content itself can automatically check for proper usage or ownership.
• E.g., many computer games would pause until the player
entered some information that could only have been acquired with the
purchase of a licensed copy of the game.
Cont.
2. Online Reference Checks
• Microsoft software packages, including Windows operating systems and
Office programs, are often locked in the same manner, requiring users to
enter a product key at installation
• The program then later checks the product key against an online
database when the system connects to the Internet.
3. Local Agent Checks
• The user installs a reference tool that checks the protected content against the
user’s license.
• Gaming engines often work this way: the agents check the user’s system
against the online license database to ensure the games are not pirated.
Data Control
• The organization also needs to protect data across the lifecycle phases.
• This requires making, using, and enforcing a set of data management policies
and practices for data retention, audit, and disposal.
• Data Retention:
• A data retention policy should include the following:
• Retention Periods
• The retention period is the length of time that the organization should keep data.
• The retention period is often expressed in a number of years and is frequently set by
regulation.
• Data retention periods can also be mandated or modified by contractual agreements.
Cont.
• Applicable Regulation
• The retention policy should refer to all applicable regulatory guidance.
• Retention Formats
• The policy should describe how the data is archived:
what type of media it is stored on and any handling specifications particular to the data.
• E.g., some types of data are required by regulation to be kept encrypted while in
storage.
• Data Classification
• Highly sensitive or regulated data may entail specific retention periods, by mandate,
contract, or best practice.
• The organization can use the classification level of data to determine how long specific
datasets or types of data need to be retained.
• Conversely, the classification policy can use retention as a means to define classification levels (e.g., the longer
the retention period, the higher the classification).
Cont.
• Archiving and Retrieval Procedures
• Having data in storage is useful.
• Stored data can be used to correct production errors, can serve as business
continuity and disaster recovery (BC/DR) backups, and can be datamined for
business intelligence purposes.
• The policy should describe in detail the processes both for sending data into storage and for
recovering it.
• Monitoring, Maintenance, and Enforcement
• The policy should state how often it will be reviewed and amended, by whom, the
consequences for failure to adhere to the policy, and which entity within the
organization is responsible for enforcement.
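A retention period tied to classification level can be evaluated mechanically. The sketch below assumes hypothetical retention periods per level; actual periods come from regulation, contract, or organizational policy.

```python
from datetime import date

# Hypothetical retention periods in years per classification level.
RETENTION_YEARS = {"unclassified": 1, "confidential": 5, "secret": 7}


def retention_end(created: date, classification: str) -> date:
    """Return the first date on which the data may be disposed of."""
    years = RETENTION_YEARS[classification]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # created is 29 February and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)


def retention_expired(created: date, classification: str, today: date) -> bool:
    """True once the retention period for this data has fully elapsed."""
    return today >= retention_end(created, classification)
```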
Cont.
• Data Retention in Cloud
• Managing data retention in the cloud can be especially tricky
• It may be difficult to ensure, for instance, that the cloud provider is
not retaining the organization’s data beyond the retention period
• The organization should make sure that the provider can support the
organization’s retention policy when
• considering cloud migration, and
• negotiating with potential cloud providers
Data Control – Cont.
• Legal Hold
• The organization must suspend all data destruction activities when it is notified that
either
• a law enforcement or regulatory entity is commencing an investigation, or
• a private entity is commencing litigation against the organization.
• A “legal hold” severely affects an organization’s data retention and
destruction policies because it supersedes them.
• Data destruction remains suspended until the investigation or lawsuit has been fully
resolved.
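In an automated disposal workflow, a legal hold acts as an override on whatever the retention schedule says. The following is a minimal sketch with a hypothetical `DisposalQueue` class and made-up dataset identifiers.

```python
class DisposalQueue:
    """Tracks datasets due for destruction and honours legal holds."""

    def __init__(self) -> None:
        self._due: set[str] = set()
        self._holds: dict[str, str] = {}   # dataset id -> reason for the hold

    def mark_due(self, dataset_id: str) -> None:
        self._due.add(dataset_id)

    def place_hold(self, dataset_id: str, reason: str) -> None:
        self._holds[dataset_id] = reason

    def release_hold(self, dataset_id: str) -> None:
        self._holds.pop(dataset_id, None)

    def destroyable(self) -> list[str]:
        """Datasets that are due and not under hold; a hold overrides the schedule."""
        return sorted(self._due - set(self._holds))
```

Keeping the reason with each hold makes it easier to audit why scheduled destruction did not happen.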
Cont.
• Data Audit
• A data audit is a powerful tool for regularly reviewing,
inventorying, and inspecting the usage and condition of the data the organization owns.
• The data audit policy should include detailed descriptions of the following items:
• Audit period
• Audit scope
• Audit responsibilities (internal and/or external)
• Audit processes and procedures
• Applicable regulations
• Monitoring, maintenance, and enforcement
Cont.
• Audits can draw on logs.
• Forms of logging include
• event logging, security logging, traffic logging, and so on
Cont.
• Data Destruction/Disposal
• When the organization has ownership and control of all the infrastructure, including the
data, hardware, and software, data disposal options are direct and straightforward.
• In the cloud, data disposal is much more difficult and riskier.
• Following are the data disposal options in the traditional environment:
• Physical Destruction of Media and Hardware
• Degaussing: applying strong magnetic fields to the hardware and media where
the data resides, effectively making them blank
• Overwriting:
• Multiple passes of random characters are written to the storage areas where the
data resides, with a final pass of all zeroes or ones.
• This is extremely time-consuming for large storage areas.
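The overwriting procedure can be sketched for a single file on storage the organization controls. The function name, pass count, and chunk size below are assumptions. Note that on SSDs, journaling or copy-on-write file systems, and cloud storage, an in-place overwrite like this does not guarantee that the old blocks are gone.

```python
import os


def overwrite_file(path: str, random_passes: int = 3, chunk: int = 1 << 20) -> None:
    """Overwrite a file in place: random passes, then a final pass of zeroes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pass_no in range(random_passes + 1):
            final = pass_no == random_passes
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(b"\x00" * n if final else os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())   # push this pass to the device before the next
```

Every pass rewrites the full file, which is why the time cost grows quickly with the size of the storage area.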
Cont.
• Crypto-Shredding (Cryptographic Erasure)
• Two rounds of encryption:
• First round: encrypt the data with a strong encryption engine and
keep the keys generated in that process.
• Second round: encrypt the keys from the first round with a different
encryption engine, then destroy the resulting keys of the second
round of encryption.
• With the second-round keys destroyed, the data can no longer be decrypted.
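The two rounds can be illustrated with a toy example. The sketch below uses a simple hash-based keystream built from the standard library purely as a stand-in for a real cipher such as AES; it is not suitable for production use, and all function names are hypothetical.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Illustration only; a real system would use a vetted cipher.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def protect(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (ciphertext, wrapped_data_key, wrapping_key)."""
    data_key = secrets.token_bytes(32)
    ciphertext = _keystream_xor(data_key, plaintext)            # first round
    wrapping_key = secrets.token_bytes(32)
    wrapped_data_key = _keystream_xor(wrapping_key, data_key)   # second round
    return ciphertext, wrapped_data_key, wrapping_key


def recover(ciphertext: bytes, wrapped_data_key: bytes, wrapping_key: bytes) -> bytes:
    """Undo both rounds; only possible while the wrapping key still exists."""
    data_key = _keystream_xor(wrapping_key, wrapped_data_key)
    return _keystream_xor(data_key, ciphertext)
```

Shredding here means securely destroying `wrapping_key`: the ciphertext and the wrapped data key can remain on the provider's media, yet neither is usable without it.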
Data Destruction/Disposal Difficulties in the Cloud
• Many of the traditional disposal options are unavailable or not feasible in the cloud.
• Because the cloud provider owns the hardware, physical destruction is usually
out of the question.
• It is nearly impossible to determine all the components and media that would
need to be destroyed because of the difficulty of knowing the actual specific
physical locations of the data.
• Similarly, overwriting is not a practical means of sanitizing data in the cloud.
• In the cloud, a customer cannot physically destroy or overwrite storage
space/media, as that would affect other customers’ data.
Data Destruction/Disposal Solution in the Cloud
• Crypto-shredding is the sole pragmatic option for data disposal in the
cloud.
• The organization needs to create a policy for data disposal.
• This policy should include detailed descriptions of the following:
• The process for data disposal
• Applicable regulations
• Clear direction of when data should be destroyed
Summary
• Discussed data management functions within the data lifecycle,
including data retention, auditing, and disposal
• Described various roles, rights, and responsibilities associated with data
ownership
• Reviewed intellectual property concepts and legal protections for intellectual
property
• Reviewed IRM solutions
• Discussed inventorying data assets and the added value data discovery offers the
organization
• Touched on some jurisdictional concerns for data