
Data Explorer: A Data Profiling Tool

The document describes a data profiling tool called Data Explorer. It allows users to perform various types of data profiling, such as column profiling, constant analysis, null rule analysis, frequency analysis, and primary/composite key analysis, on databases such as MS SQL Server. The tool is being developed in .NET using C# and will have a simple user interface, support multiple databases, and allow profiling results to be exported to Excel.

Uploaded by Yashvir Yadav
Copyright: © All Rights Reserved

Data Explorer: A Data Profiling Tool

Agenda
 Introduction
 Existing System
 Limitations of Existing System
 Proposed Solution
 Project Scope
 Architecture Diagram
 Implementation
 Technology
 Hardware and Software Requirements
 Features and Benefits
 Future Enhancement

Data Explorer – A Data Profiling Tool
Introduction (1/2)
Data Profiling
 Data profiling is the process of examining the data available in an existing data source
(e.g. a database or a file) and collecting statistics and information about that data.
 Data profiling analyzes the candidate data sources for a data warehouse to clarify the
structure, content, relationships and derivation rules of the data. Profiling helps not only
to understand anomalies and assess data quality, but also to discover, register, and assess
enterprise metadata.
 The purpose of data profiling is both to validate metadata when it is available and to
discover metadata when it is not.
 The results of the analysis are used both strategically, to determine the suitability of the
candidate source systems and provide the basis for an early go/no-go decision, and
tactically, to identify problems for later solution design and to set sponsors’
expectations.

Introduction (2/2)
Purpose of Data Profiling
 Find out whether existing data can easily be used for other purposes
 Improve the ability to search the data by tagging it with keywords or descriptions, or by
assigning it to a category
 Give metrics on data quality, including whether the data conforms to particular
standards or patterns
 Assess the risk involved in integrating data for new applications, including the
challenges of joins
 Assess whether metadata accurately describes the actual values in the source
database
 Understand data challenges early in any data-intensive project so that late-project
surprises are avoided. Finding data problems late in a project can lead to delays and
cost overruns.

Existing System
 Initially, data profiling was done by writing complicated SQL queries
 This works for analysts or users who know how to write SQL queries
 Many users do not know the proper syntax and format for writing SQL queries
 To overcome this, data profiling tools were introduced
 Data profiling tools remove, to some extent, the need to write complex queries
 Not all types of profiling activities were supported by these tools
 Users have to understand and learn how to use each tool

Limitations of Existing System
SQL Queries
 Development time is high.
 The user needs to understand the functionality in order to develop the queries.
 Results need to be exported to Excel or Notepad for analysis.
 Traditional approach
Existing Tools
 Complex user interface
 Limited functionality
 Setup and installation
 License cost
 Minimum server requirements

Proposed Solution
 Develop an application that performs all the types of profiling
 Easy-to-use interface
 Minimum system requirements
 Export of profiling results to Excel
 An additional Data Quality Indicator feature that shows the quality of the data
 Support for multiple databases such as Oracle 10g, Oracle 11g, MS SQL Server 2005, MS
SQL Server 2008, MySQL, etc.
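The export feature listed above could be approximated by writing the profiling results as CSV, a format Excel opens directly. The following is a minimal Python sketch, not the planned C# code. It assumes CSV output rather than a native .xlsx workbook, which would need a library such as openpyxl. The function `export_profile_csv` and the result field names are hypothetical.

```python
import csv
import io


def export_profile_csv(rows, stream):
    """Write profiling results (dicts that share the same keys) as CSV with a header row.

    Hypothetical helper: CSV is assumed as the Excel-readable format.
    """
    if not rows:
        return  # nothing to export
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical profiling results for two columns of a "customer" table.
    results = [
        {"table": "customer", "column": "status", "null_pct": 20.0, "unique_pct": 40.0},
        {"table": "customer", "column": "fax", "null_pct": 100.0, "unique_pct": 0.0},
    ]
    buffer = io.StringIO()
    export_profile_csv(results, buffer)
    print(buffer.getvalue())
```

To produce a file for Excel, pass `open("profile_results.csv", "w", newline="", encoding="utf-8-sig")` instead of the in-memory buffer. The byte-order mark written by `utf-8-sig` helps Excel recognize the file as UTF-8.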

Project Scope
 Given the timeline and other constraints, the project will initially support only
MS SQL Server.
 The project will provide the following types of profiling:
 Column Profiling
 Empty Column Analysis
 Null Rule Analysis
 Constant Analysis
 Frequency Analysis
 Uniqueness Analysis
 Primary/Composite Key Analysis

Architecture Diagram
[Figure: The Analysis Team, Business Users and Management work with Data Explorer, which provides Data Profiling, a Central Metadata Repository, Capture Issues and Notes, and Reporting, on top of MS SQL Server and other databases.]

Implementation
 The project will be implemented module by module.
 Each module will be developed and unit tested individually.
 After all modules are complete and unit tested, they will be integrated and system
integration testing will be performed.
 There will be separate modules for retrieving the databases on a server, the tables in a
selected database, and the columns in a selected table.
 There will be a separate module for each type of profiling discussed.
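The three retrieval modules can be sketched as follows. Each step works from the database and table selected at the previous step. This is a minimal Python illustration rather than the planned C# implementation. It assumes an in-memory SQLite database as a stand-in for MS SQL Server, and `list_databases`, `list_tables` and `list_columns` are hypothetical names. On SQL Server, the same metadata would be read from catalog views such as `sys.databases`, `INFORMATION_SCHEMA.TABLES` and `INFORMATION_SCHEMA.COLUMNS`.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so names read from metadata are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


def list_databases(conn):
    """Module 1: databases visible on the connection.

    SQLite stand-in: PRAGMA database_list yields (seq, name, file) per attached schema.
    """
    return [row[1] for row in conn.execute("PRAGMA database_list")]


def list_tables(conn, database="main"):
    """Module 2: user tables in the selected database, sorted by name."""
    sql = (
        f"SELECT name FROM {quote_ident(database)}.sqlite_master "
        "WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql)]


def list_columns(conn, table, database="main"):
    """Module 3: (column name, declared data type) pairs for the selected table."""
    # pragma_table_info(table, schema) is SQLite's table-valued form of PRAGMA table_info.
    sql = "SELECT name, type FROM pragma_table_info(?, ?) ORDER BY cid"
    return list(conn.execute(sql, (table, database)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    for db in list_databases(conn):
        for table in list_tables(conn, db):
            print(db, table, list_columns(conn, table, db))
```

Because each function takes only the previous selection as input, each step can be developed and unit tested on its own, as the module plan above describes.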

Implementation - Profiling Details
 Column Profiling
 Discovers the total number of records, the null percentage, the unique
percentage, the minimum and maximum values in the column, the documented data type,
etc.
 Constant Analysis
 Discovers the columns that have more than 0 and fewer than 4 distinct values.
 Null Rule Analysis
 Finds all columns in a table whose values are 100% NULL.
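Each of the three analyses above comes down to an aggregate SQL query per column. The sketch below is a hypothetical Python illustration that again uses SQLite as a stand-in for MS SQL Server, and the function names are invented. The slide does not define "unique percentage", so it is assumed here to mean distinct non-NULL values as a share of all records. Column metadata comes from SQLite's `pragma_table_info` table-valued function.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier taken from metadata."""
    return '"' + name.replace('"', '""') + '"'


def columns_of(conn, table):
    """Column names of a table, in declaration order."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,))]


def column_profile(conn, table, column):
    """Column Profiling: record count, null %, unique %, min, max and declared type."""
    t, c = quote_ident(table), quote_ident(column)
    total, non_null, distinct, low, high = conn.execute(
        f"SELECT COUNT(*), COUNT({c}), COUNT(DISTINCT {c}), MIN({c}), MAX({c}) FROM {t}"
    ).fetchone()
    (declared_type,) = conn.execute(
        "SELECT type FROM pragma_table_info(?) WHERE name = ?", (table, column)
    ).fetchone()
    return {
        "total_records": total,
        "null_pct": 100.0 * (total - non_null) / total if total else 0.0,
        # Assumption: unique % = distinct non-NULL values as a share of all records.
        "unique_pct": 100.0 * distinct / total if total else 0.0,
        "min": low,
        "max": high,
        "data_type": declared_type,
    }


def constant_columns(conn, table):
    """Constant Analysis: columns with more than 0 and fewer than 4 distinct values."""
    result = []
    for col in columns_of(conn, table):
        (distinct,) = conn.execute(
            f"SELECT COUNT(DISTINCT {quote_ident(col)}) FROM {quote_ident(table)}"
        ).fetchone()
        if 0 < distinct < 4:
            result.append(col)
    return result


def all_null_columns(conn, table):
    """Null Rule Analysis: columns that are 100% NULL in a non-empty table."""
    result = []
    for col in columns_of(conn, table):
        total, non_null = conn.execute(
            f"SELECT COUNT(*), COUNT({quote_ident(col)}) FROM {quote_ident(table)}"
        ).fetchone()
        if total > 0 and non_null == 0:
            result.append(col)
    return result
```

`COUNT(column)` and `COUNT(DISTINCT column)` both skip NULLs. That is why the null percentage and the all-NULL check reduce to a comparison with `COUNT(*)`. Note also that `MIN` and `MAX` on text columns compare values as strings.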

Implementation - Profiling Details
 Unique Analysis
 Finds all columns in a table whose values are 100% unique.
 Primary Key / Composite Key Analysis
 Finds the columns, or combinations of columns, whose values are unique and which
are therefore possible primary or composite keys.
 Frequency Analysis
 Finds the number of distinct values in a column and how many times each value is
repeated.
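Uniqueness, key and frequency analysis can be sketched the same way. This is a hypothetical Python sketch with SQLite as the stand-in database and invented function names. A column set counts as unique only if no row has a NULL in it and no value combination repeats. That NULL rule is an assumption, chosen because primary key columns cannot be NULL. Composite keys are found by brute force over column combinations, up to an assumed `max_width`.

```python
import sqlite3
from itertools import combinations


def quote_ident(name):
    """Quote an SQL identifier taken from metadata."""
    return '"' + name.replace('"', '""') + '"'


def columns_of(conn, table):
    """Column names of a table, in declaration order."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,))]


def is_unique(conn, table, cols):
    """True if every row has non-NULL values in cols and no combination repeats."""
    t = quote_ident(table)
    col_list = ", ".join(quote_ident(c) for c in cols)
    not_null = " AND ".join(f"{quote_ident(c)} IS NOT NULL" for c in cols)
    (total,) = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()
    (distinct,) = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM {t} WHERE {not_null})"
    ).fetchone()
    return total > 0 and distinct == total


def unique_columns(conn, table):
    """Unique Analysis: single columns that are 100% unique."""
    return [c for c in columns_of(conn, table) if is_unique(conn, table, [c])]


def candidate_keys(conn, table, max_width=2):
    """Primary/Composite Key Analysis: minimal unique column combinations."""
    cols = columns_of(conn, table)
    keys = []
    for width in range(1, max_width + 1):
        for combo in combinations(cols, width):
            # A superset of a key already found is unique but not minimal: skip it.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(conn, table, combo):
                keys.append(combo)
    return keys


def frequency(conn, table, column):
    """Frequency Analysis: each distinct value with its number of occurrences."""
    c = quote_ident(column)
    return conn.execute(
        f"SELECT {c}, COUNT(*) FROM {quote_ident(table)} "
        f"GROUP BY {c} ORDER BY COUNT(*) DESC, {c}"
    ).fetchall()
```

The number of combinations grows quickly with table width, so a real tool would need a cap such as `max_width` or smarter pruning. A unique combination in the current data is also only a candidate key: it guarantees nothing about rows inserted later.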

Technology
 Data Explorer will be developed on the .NET platform using C# as the programming language.
 .NET is Microsoft's platform for developing advanced and robust applications.
 .NET provides a wide range of library classes, which reduces development effort so that
more time can be spent on other activities.
 .NET is called a language-independent platform because it supports 4 native languages and 21
non-native languages.
 Native languages are the Microsoft-created languages, i.e. C#, VB.NET, J#, and VC++.
 Non-native (non-Microsoft) languages supported include Perl, Ruby, etc.

Hardware and Software Requirements

HARDWARE
• Pentium Core 2 Duo processor or above
• 2 GB RAM
• 20 GB HDD
• Printer
• Router for Internet connection

SOFTWARE
• Windows 2000 / Windows XP / Windows Vista / Windows 7
• Microsoft .NET Framework 3.5
• Microsoft Visual Studio 2008

Features
 Supports multiple databases such as MS SQL Server and Oracle
 Different types of profiling:
 Column Profiling
 Constant Analysis
 Unique Analysis
 Null Rule Analysis
 Frequency Analysis
 Empty Column Analysis
 Primary / Composite Key Analysis
 Quickly analyze and validate data issues

Benefits
 Quick discovery of data issues
 No need to write queries to profile data
 Time efficient
 Shorter implementation cycles for major projects
 Improved understanding of the data for users
 Discovery of business knowledge
 Improved data accuracy in corporate databases

Future Enhancement
 Data Explorer can be further extended to support unstructured or semi-structured data
sources such as flat files and .csv files
 It can also be extended to support other relational databases such as MS Access, MySQL,
Sybase, etc.
 It can also be enhanced with Data Quality reports built on top of the Data Quality
results
 A mechanism could be added to store the profiling results so that they can be referred to
later at any time
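The last enhancement, storing profiling results for later reference, could be as simple as appending each run to a history table. This is a hypothetical Python sketch that uses SQLite as the store. The `profile_history` table, its columns and both functions are invented for illustration, and values are stored as text for simplicity.

```python
import sqlite3
from datetime import datetime, timezone


def save_results(store, table, results):
    """Append one profiling run to a history table and return its timestamp.

    results maps column name -> {metric name: value} (hypothetical layout).
    """
    store.execute(
        "CREATE TABLE IF NOT EXISTS profile_history ("
        "run_at TEXT, table_name TEXT, column_name TEXT, metric TEXT, value TEXT)"
    )
    run_at = datetime.now(timezone.utc).isoformat()
    store.executemany(
        "INSERT INTO profile_history VALUES (?, ?, ?, ?, ?)",
        [(run_at, table, col, metric, str(value))
         for col, metrics in results.items()
         for metric, value in metrics.items()],
    )
    store.commit()
    return run_at


def load_results(store, table, run_at):
    """Read back the (column, metric, value) rows of one stored run."""
    return store.execute(
        "SELECT column_name, metric, value FROM profile_history "
        "WHERE table_name = ? AND run_at = ? ORDER BY column_name, metric",
        (table, run_at),
    ).fetchall()
```

Keying each run by its timestamp lets earlier results be compared with later ones, for example to see whether a column's null percentage is improving.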

Thank You

