
Data Profiling

Data profiling examines, analyzes, and reviews data to collect statistics about data quality and structure. It involves collecting descriptive statistics, data types, patterns, and performing quality assessments. There are three main types of data profiling: structure discovery focuses on data formatting and consistency; content discovery assesses data quality; and relationship discovery detects connections between data sources. Data profiling leads to higher quality data and insights that help with analytics and decision making.

Uploaded by

Alex

Data Profiling

What is Data Profiling?

Data profiling is the process of examining, analyzing and reviewing data to collect statistics surrounding the quality and hygiene of the dataset.

Data profiling may also be known as data archeology, data assessment, data discovery or data quality analysis.

Data profiling involves
• Collecting descriptive statistics like min, max, count and sum.
• Collecting data types, length and recurring patterns.
• Tagging data with keywords, descriptions or categories.
• Performing data quality assessment, including assessing the risk of
performing joins on the data.
• Discovering metadata and assessing its accuracy.
• Identifying distributions, key candidates, foreign-key candidates,
functional dependencies, embedded value dependencies, and
performing inter-table analysis.
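
As a rough illustration of the first few activities listed above, the sketch below profiles one column at a time using only the Python standard library. The sample rows, the column names, the function names `to_pattern` and `profile_column`, and the "9 for digit, A for letter" pattern notation are all assumptions made for this example. It is a minimal sketch and not a substitute for a dedicated profiling tool.

```python
import re
from collections import Counter


def to_pattern(value):
    """Generalize a value into a shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_column(values):
    """Collect basic statistics for one column of strings (None or '' = missing)."""
    present = [v for v in values if v is not None and v != ""]

    # Try to read every present value as a number to infer the data type.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except ValueError:
            pass
    all_numeric = bool(present) and len(numeric) == len(present)

    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "inferred_type": "number" if all_numeric else "text",
        "min": min(numeric) if all_numeric else None,
        "max": max(numeric) if all_numeric else None,
        "sum": sum(numeric) if all_numeric else None,
        "min_length": min(len(v) for v in present) if present else None,
        "max_length": max(len(v) for v in present) if present else None,
        # The most frequent shapes reveal recurring patterns and odd values.
        "patterns": Counter(to_pattern(v) for v in present).most_common(3),
        # A column with no missing and no repeated values is a key candidate.
        "key_candidate": bool(present) and len(set(present)) == len(values),
    }


# Hypothetical sample data, invented for this example.
rows = [
    {"customer_id": "101", "zip": "30301", "amount": "19.50"},
    {"customer_id": "102", "zip": "30302", "amount": "5.00"},
    {"customer_id": "103", "zip": "3030A", "amount": None},
    {"customer_id": "104", "zip": "30301", "amount": "7.25"},
]

profiles = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
for col, stats in profiles.items():
    print(col, stats)
```

In this sample, `customer_id` comes out as a key candidate, `zip` is inferred as text because one value breaks the all-digits pattern, and `amount` reports one missing value.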
Types of Data Profiling

• Structure discovery: This focuses on the formatting of the data, making sure everything is uniform and consistent. It also uses basic statistical analysis to return information about the validity of the data.
• Content discovery: This process assesses the quality of individual pieces of data. For example, ambiguous, incomplete and null values are identified.
• Relationship discovery: This detects connections, similarities, differences and associations between data sources.
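
One way to picture the three types is as three small checks over the same data. The sketch below is illustrative only: the `customers` and `orders` tables, the expected phone format, and the three function names are assumptions made for this example. The content check covers only null and empty values, not ambiguous ones.

```python
import re

# Hypothetical sample tables, invented for this example.
customers = [
    {"customer_id": "C1", "email": "ann@example.com", "phone": "555-0101"},
    {"customer_id": "C2", "email": "", "phone": "5550102"},
    {"customer_id": "C3", "email": "cy@example.com", "phone": "555-0103"},
]
orders = [
    {"order_id": "O1", "customer_id": "C1"},
    {"order_id": "O2", "customer_id": "C3"},
    {"order_id": "O3", "customer_id": "C9"},
]

# Assumed expected format for phone numbers: three digits, a dash, four digits.
PHONE_FORMAT = re.compile(r"\d{3}-\d{4}")


def structure_discovery(rows, column, pattern):
    """Structure: return the values that do not match the expected format."""
    return [r[column] for r in rows if not pattern.fullmatch(r[column] or "")]


def content_discovery(rows, column):
    """Content: return the indexes of rows whose value is null or empty."""
    return [i for i, r in enumerate(rows) if r[column] in (None, "")]


def relationship_discovery(child_rows, parent_rows, column):
    """Relationship: return child values that have no match in the parent table.

    An empty result suggests the column is a foreign-key candidate.
    """
    parent_keys = {r[column] for r in parent_rows}
    return {r[column] for r in child_rows} - parent_keys


print(structure_discovery(customers, "phone", PHONE_FORMAT))
print(content_discovery(customers, "email"))
print(relationship_discovery(orders, customers, "customer_id"))
```

Here the structure check flags the phone number written without a dash, the content check flags the row with an empty email, and the relationship check reports the order whose `customer_id` has no matching customer.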
Advantages of Data Profiling
• Leads to higher quality, more credible data.
• Helps with more accurate predictive analytics and decision making.
• Makes better sense of the relationships between different datasets and sources.
• Keeps company information centralized and organized.
• Eliminates errors associated with high costs, such as missing values or outliers.
• Highlights areas within a system that experience the most data quality issues, such as data corruption or user input errors.
• Produces insights surrounding risks, opportunities and trends.
Disadvantages of Data Profiling
1. Unjustified discrimination
2. Stigmatization
3. Dehumanization
4. De-individualization
5. Loss of privacy
6. Loss of autonomy
7. Being confronted with unwanted information
