Data Profiling & Data Quality

The document discusses data profiling and data quality. It explains that data profiling examines data sources and collects statistics about the data, including frequency, data types, lengths, discrete values, and patterns. This helps improve data quality, shorten project timelines, and increase user understanding of the data. The document also demonstrates two open source data quality tools - Open Source Data Quality (OSDQ) and Talend Data Quality. Both are free Java-based tools that connect to databases and allow ad-hoc profiling, analysis, and reporting to assess data quality.

Uploaded by

Niraj
Copyright
© All Rights Reserved

Technical Knowledge Sharing Session

Data Profiling & Data Quality


Data Profiling
Data profiling is the process of examining the data available from a source
and collecting statistics or summary information about that data.

What do you get?


• Frequency
• Additional metadata, such as:
  • data types
  • length
  • discrete values
  • uniqueness and occurrence of null values
  • typical string patterns
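
The statistics listed above can be sketched for a single column in a few lines of Python. This is a minimal illustration using only the standard library, not how OSDQ or Talend compute their profiles. The helper names `string_pattern` and `profile_column`, the pattern notation (9 for a digit, A for a letter) and the sample phone numbers are all assumptions made up for this example.

```python
from collections import Counter


def string_pattern(value):
    """Map a string to a shape: digits -> 9, letters -> A, other characters kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_column(values):
    """Collect summary statistics for one column, given as a list of raw values."""
    non_null = [v for v in values if v is not None]
    as_text = [str(v) for v in non_null]
    lengths = [len(s) for s in as_text]
    distinct = set(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        "data_types": Counter(type(v).__name__ for v in non_null),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "frequency": Counter(non_null),
        "patterns": Counter(string_pattern(s) for s in as_text),
    }


# Hypothetical sample column, invented for illustration only.
phone_numbers = ["555-0101", "555-0102", None, "5550103", "555-0101"]
profile = profile_column(phone_numbers)

print(profile["null_count"])               # 1
print(profile["distinct_count"])           # 3
print(profile["patterns"].most_common(1))  # [('999-9999', 3)]
```

Here the pattern count shows that one value ("5550103") does not follow the dominant 999-9999 shape, which is the kind of inconsistency profiling is meant to surface before a project depends on the data.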

Benefits of data profiling


• Improve data quality
• Shorten the implementation cycle of major projects
• Improve users' understanding of the data
Data Quality
Tooling Demo
Open Source Data Quality [OSDQ]

• Java-based free tool.
• Connects to a few databases. Runs ad hoc and does not save your connections.
• Quick analysis and reports for profiling.
• Freedom to write your own logic and update data directly in the databases.
• Colourful charts for each column you profile, which you can save in various formats.
Talend Data Quality

• Community-based free tool.
• Many data connectors.
• Quick analysis and reports for profiling, with many built-in capabilities.
• Freedom to write your own logic and update data directly in the databases.
• Bar charts for each column you profile. You cannot save them in the free version.
Questions, Feedback
