Data Quality
Introduction
Data needs to be collected in one place and cleaned up.
Why does data quality matter?
Good data is your most valuable asset, and bad data can seriously harm
your business and credibility…
Data quality is the result of going through the data and scrubbing it,
standardizing it, de-duplicating records, and enriching it where
possible.
1. Maintain complete data.
2. Clean up your data by standardizing it using rules.
3. Use matching algorithms to detect duplicates.
4. Avoid entry of duplicate leads and contacts.
5. Merge existing duplicate records.
6. Use roles for security.
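Steps 2 to 5 above can be illustrated with a minimal sketch. It assumes
hypothetical contact records with name, email, and phone fields, and it
treats two records as duplicates when their standardized email addresses
match exactly. That is a deliberately simple stand-in for the more
sophisticated duplicate detection mentioned in step 3, not a
recommendation of a specific algorithm.

```python
import re


def standardize(record):
    """Apply simple, rule-based cleanup to one contact record."""
    return {
        # Collapse repeated whitespace and use consistent capitalization.
        "name": " ".join(record["name"].split()).title(),
        # Email addresses are compared case-insensitively here (an assumption).
        "email": record["email"].strip().lower(),
        # Keep digits only so differently formatted numbers compare equal.
        "phone": re.sub(r"\D", "", record["phone"]),
    }


def deduplicate(records):
    """Keep the first record seen for each standardized email address."""
    seen = {}
    for rec in map(standardize, records):
        seen.setdefault(rec["email"], rec)
    return list(seen.values())


# Hypothetical sample data: the first two rows describe the same person.
contacts = [
    {"name": "  alice  SMITH", "email": "Alice@Example.com ", "phone": "(555) 010-1234"},
    {"name": "Alice Smith", "email": "alice@example.com", "phone": "555-010-1234"},
    {"name": "bob jones", "email": "bob@example.com", "phone": "555 010 9999"},
]

clean = deduplicate(contacts)
for rec in clean:
    print(rec)
```

Standardizing before comparing is what makes the duplicate visible: the
raw email strings of the first two rows differ, but their cleaned forms
are identical.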
[Figure: inconsistent data before cleaning up; consistent data after cleaning up]
Data Profiling
Data profiling is the process of statistically examining and analyzing the
content of a data source in order to collect information about the data.
It consists of techniques for assessing the accuracy and completeness of
the data we have.
Data profiling involves statistical analysis of the data at the source and
of the data being loaded, as well as analysis of metadata. These statistics
can then be used for a variety of analyses.
How to conduct Data Profiling?
Common examples of analyses to be done are:
NULL values: Check the number of NULL values in an attribute.
Candidate keys: Analyzing how distinct the values in certain columns are will
give the developer useful information for selecting candidate keys.
Primary key selection: Check that the candidate key column does not violate the
basic requirements of having no NULL values and no duplicate values.
Empty string values: A string column may contain NULL or even empty string values
that may create problems later.
String length: Analyzing the longest and shortest lengths, as well as the
average length, of the values in a string-type column can help us decide what
data type would be most suitable for that column.
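The checks listed above can be sketched in a few lines of plain Python.
This is a minimal illustration, assuming the table is available as a list
of dictionaries and that NULL is represented by None; the function name
profile_column and the sample rows are hypothetical.

```python
from statistics import mean


def profile_column(values):
    """Collect basic profiling statistics for one column (a list of values)."""
    non_null = [v for v in values if v is not None]
    strings = [v for v in non_null if isinstance(v, str)]
    lengths = [len(s) for s in strings]
    distinct = len(set(non_null))
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "empty_strings": sum(1 for s in strings if s == ""),
        "distinct": distinct,
        # A key candidate has no NULLs and no duplicate values.
        "is_key_candidate": len(non_null) == len(values) and distinct == len(values),
        # String-length statistics are only defined for string columns.
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
        "avg_len": mean(lengths) if lengths else None,
    }


# Hypothetical sample table.
rows = [
    {"id": 1, "email": "a@example.com", "city": "Oslo"},
    {"id": 2, "email": None, "city": ""},
    {"id": 3, "email": "c@example.com", "city": "Oslo"},
]

report = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
for col, stats in report.items():
    print(col, stats)
```

In this sample, only the id column qualifies as a key candidate: email
contains a NULL, and city contains both a duplicate and an empty string.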