Data Profiling
What is Data Profiling?
Data profiling is the process of examining, analyzing and reviewing data to collect statistics about the quality and hygiene of a dataset.
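As a rough illustration of "collecting statistics" about a dataset, here is a minimal sketch that profiles each column of a small in-memory table. It uses only the Python standard library. The sample records and the helper name `profile_column` are hypothetical, and the chosen statistics (count, nulls, distinct values, min/max/mean) are just a common starting set:

```python
from statistics import mean

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Oslo"},
    {"id": 2, "age": None, "city": "Lima"},
    {"id": 3, "age": 29, "city": "Oslo"},
    {"id": 4, "age": 41, "city": None},
]

def profile_column(rows, column):
    """Collect simple summary statistics for one column (hypothetical helper)."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Numeric summaries only make sense when every present value is a number.
    if present and all(isinstance(v, (int, float)) for v in present):
        stats.update(min=min(present), max=max(present), mean=mean(present))
    return stats

for col in ("id", "age", "city"):
    print(col, profile_column(records, col))
```

Dedicated profiling tools compute many more statistics, but they start from counts like these.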
• Structure discovery: This focuses on the formatting of the data, making sure everything is uniform and consistent. It also uses basic statistical analysis to return information about the validity of the data.
• Content discovery: This process assesses the quality of individual pieces of data. For example, ambiguous, incomplete and null values are identified.
• Relationship discovery: This detects connections, similarities, differences and associations between data sources.
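The three kinds of discovery described above can be sketched as three small checks. This is an illustrative sketch only, not a standard implementation. The two data sources, the ISO date pattern, and the function names are hypothetical. The structure check only tests format, not whether a date is real. The content check treats an empty string as an "ambiguous" value, which is an assumption.

```python
import re

# Two hypothetical data sources that should share customer IDs.
customers = [
    {"customer_id": "C1", "signup": "2023-01-15", "email": "ana@example.com"},
    {"customer_id": "C2", "signup": "15/01/2023", "email": ""},
    {"customer_id": "C3", "signup": "2023-02-01", "email": None},
]
orders = [
    {"order_id": "O1", "customer_id": "C1"},
    {"order_id": "O2", "customer_id": "C3"},
    {"order_id": "O3", "customer_id": "C9"},
]

# Assumed target format (YYYY-MM-DD); checks shape only, not calendar validity.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def structure_check(rows, column, pattern):
    """Structure discovery: return values that do not match the expected format."""
    return [r[column] for r in rows
            if r[column] is not None and not pattern.fullmatch(r[column])]

def content_check(rows, column):
    """Content discovery: count null values and empty (assumed ambiguous) values."""
    nulls = sum(1 for r in rows if r[column] is None)
    empties = sum(1 for r in rows if r[column] == "")
    return {"null": nulls, "empty": empties}

def relationship_check(parent, child, key):
    """Relationship discovery: find child keys with no matching parent row."""
    parent_keys = {r[key] for r in parent}
    return sorted({r[key] for r in child} - parent_keys)

print(structure_check(customers, "signup", ISO_DATE))        # ['15/01/2023']
print(content_check(customers, "email"))                     # {'null': 1, 'empty': 1}
print(relationship_check(customers, orders, "customer_id"))  # ['C9']
```

Here the relationship check flags order `O3` as pointing at a customer that does not exist in the other source.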
Benefits of Data Profiling
• Leads to higher-quality, more credible data.
• Helps with more accurate predictive analytics.
• Helps eliminate costly errors, such as missing values or outliers.
• Highlights areas within a system that experience the most data quality issues, such as data corruption or user input errors.
• Produces insights into risks, opportunities and trends.
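To make the "missing values or outliers" point concrete, here is a minimal sketch that counts missing entries and flags outliers in one numeric column. The slides do not name an outlier method. This sketch uses Tukey's IQR fences (values more than 1.5 × IQR beyond the quartiles), one common choice. The order totals and the helper name `iqr_outliers` are hypothetical. `statistics.quantiles` needs at least two non-missing values.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartiles; needs >= 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

# Hypothetical order totals; None is a missing value, 950.0 looks suspicious.
order_totals = [20.0, 22.5, 19.0, 21.0, None, 23.0, 950.0, 20.5]

missing = sum(1 for v in order_totals if v is None)
print("missing:", missing)
print("outliers:", iqr_outliers(order_totals))
```

A flagged value is only a candidate error. Someone still has to decide whether 950.0 is a typo or a genuine large order.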
Disadvantages of Data Profiling
1. Unjustified discrimination
2. Stigmatization
3. Dehumanization
4. De-individualization
5. Loss of privacy
6. Loss of autonomy
7. Being confronted with unwanted information