Data Cleaning With SSIS

This document discusses using SQL Server Integration Services (SSIS) and Data Quality Services (DQS) for data cleaning. It covers why data cleaning is important, the various SSIS components that can be used for data cleaning including built-in transformations, lookups, scripts and DQS. It then explains how to set up DQS with SQL Server, build knowledge bases in DQS and use the DQS cleaning task in SSIS projects to cleanse data.

Data Cleaning with SSIS

SQL Server 2012


About Me

• I have been in the I.T. field for almost 15 years
• Currently system administrator / DBA for LSU Highway Safety Research Group
• Manage Business Intelligence systems
• Adjunct instructor teaching B.I. in ISDS department
Agenda

• Why do we need data cleaning?
• SSIS tools
– Built-in components
– DQS
– Custom components
• DQS Setup
• Resources
Data Quality / Cleaning

• Aspects of Data Quality:
– Accuracy
– Completeness
– Reliability
– Accessibility
– Consistency
– Timeliness
Data Quality / Cleaning

• Data quality is an ongoing process that involves:
– Discovery
– Standardizing
– De-duplicating
• The goal is to create a single view even if the data is stored in disparate systems.
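The three steps above can be sketched outside of SSIS. The following is a minimal illustration in Python, not part of the original slides; the records, field names, and cleaning rules are all made up for the example.

```python
from collections import Counter

# Hypothetical customer records gathered from two systems (made-up data).
records = [
    {"name": "  Ann Smith ", "state": "la"},
    {"name": "ann smith", "state": "LA"},
    {"name": "Bob Jones", "state": "TX"},
]


def discover(rows, field):
    """Discovery: count the distinct raw values seen in one field."""
    return Counter(row[field] for row in rows)


def standardize(row):
    """Standardizing: collapse whitespace and normalize letter case."""
    return {
        "name": " ".join(row["name"].split()).title(),
        "state": row["state"].strip().upper(),
    }


def deduplicate(rows):
    """De-duplicating: keep the first occurrence of each standardized record."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["state"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


state_profile = discover(records, "state")
clean = deduplicate([standardize(r) for r in records])
```

Standardizing before de-duplicating matters here: the two "Ann Smith" rows only compare equal once whitespace and case have been normalized.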
SSIS Components

• Data Conversion transformation – Ex. Convert a Unicode string to a non-Unicode string
• Derived Column transformation – Ex. Convert the word “one” to the number “1”
• Lookup transformation – Ex. Clean state names against a dataset containing clean names
• Fuzzy Lookup transformation – Ex. Match similar spellings to a clean reference value so duplicates can be removed
• Script Component – Ex. Regular expression
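To make the examples in this list concrete, here is a rough Python sketch of the logic each component applies. It is not SSIS code and not from the original slides; the reference data, the similarity cutoff, and the phone-number rule are assumptions chosen for illustration.

```python
import difflib
import re

# Derived-column style rule: map number words to digits (illustrative subset).
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}

# Lookup style rule: reference data of clean state names (illustrative subset).
CLEAN_STATES = {
    "la": "Louisiana",
    "louisiana": "Louisiana",
    "tx": "Texas",
    "texas": "Texas",
}

NON_DIGITS = re.compile(r"\D")


def to_code_page(value, code_page="cp1252"):
    """Data conversion: encode a Unicode string into a single-byte code page."""
    return value.encode(code_page, errors="replace")


def derive_number(value):
    """Derived column: replace a number word with its digit, else pass through."""
    return NUMBER_WORDS.get(value.strip().lower(), value)


def lookup_state(value):
    """Lookup: exact match against the reference data, None when unmatched."""
    return CLEAN_STATES.get(value.strip().lower())


def fuzzy_state(value, cutoff=0.8):
    """Fuzzy lookup: closest clean name by string similarity, None if too far."""
    names = sorted(set(CLEAN_STATES.values()))
    matches = difflib.get_close_matches(
        value.strip().title(), names, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def clean_phone(value):
    """Script component with a regex: keep digits, require exactly ten."""
    digits = NON_DIGITS.sub("", value)
    return digits if len(digits) == 10 else None
```

A misspelling such as "Lousiana" fails the exact lookup but is close enough to "Louisiana" for the fuzzy match, which is the usual reason to chain a fuzzy step after an exact one.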
SSIS component demo

• SSIS built-in component demo 1
Data Quality Services

• DQS enables building of knowledge bases which can be used for data:
– Correction
– Enrichment
– Standardization
– De-duplication
SQL 2012 Data Quality Services

• Knowledge-driven solution
• 1st generation product
• SQL editions
– Enterprise
– Business Intelligence
DQS Components

• Components
– Server
– Client
– SSIS DQS Cleansing transformation
• DQS Building Blocks
– Domain
– Knowledge Base
Data Cleaning - SSIS

• Installed as part of Integration Services
• Integration options for DQS
– SSIS task component
– Master Data Services (MDS)
– API – Not in the current iteration
DQS SSIS Demo

• DQS SSIS Demo 2
Custom Component

• Regex component demo 3
DQS - Server

• DQS server is part of SQL Server 2012
• Two-step installation:
– Select the Data Quality Services component during SQL Server setup
– Run the “Data Quality Server Installer”
• Installation documentation:
– http://msdn.Microsoft.com/en-us/library/gg492277(v=sql.110).aspx
DQS - Server

• What just happened?
• Now what?
– Grant users access to roles in DQS_MAIN database
– Roles:
• dqs_administrator
• dqs_kb_editor
• dqs_kb_operator
– Enable TCP/IP for DQS Client connection
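Granting access to the DQS roles can be scripted. The sketch below is an assumption-laden illustration in Python, not from the original slides. It only builds T-SQL text and does not connect to a server. The login name in the usage line is hypothetical, and the script assumes that the login already exists on the instance and has no user in DQS_MAIN yet.

```python
# The three DQS roles that exist in the DQS_MAIN database.
DQS_ROLES = ("dqs_administrator", "dqs_kb_editor", "dqs_kb_operator")


def quote_name(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def grant_dqs_role(login, role):
    """Return a T-SQL script that maps a login into DQS_MAIN and adds it to a role."""
    if role not in DQS_ROLES:
        raise ValueError(f"unknown DQS role: {role}")
    user = quote_name(login)
    return "\n".join(
        [
            "USE [DQS_MAIN];",
            f"CREATE USER {user} FOR LOGIN {user};",
            f"ALTER ROLE {quote_name(role)} ADD MEMBER {user};",
        ]
    )


# Hypothetical Windows login used only for illustration.
script = grant_dqs_role("DOMAIN\\analyst", "dqs_kb_editor")
```

The generated script could then be reviewed and run by a DBA in a query window. Validating the role name against a fixed list keeps a typo from producing a script that fails halfway through.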
DQS - Client

• First step is to create a knowledge base
– KB included with installation
• Use as is
• Edit as needed
– Use source data to create KB
– Manually create entries in KB
DQS - Client

• Creating and managing domains
• Single and composite domains
• Values:
– Correct
– Error
– Invalid
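One way to picture how the three value types drive cleansing is a small lookup model. The Python sketch below is a simplified illustration, not the DQS implementation and not from the original slides. The domain contents are made up, and the output labels ("Corrected", "New", and so on) are only loosely modeled on what a cleansing run reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainValue:
    status: str                       # "Correct", "Error", or "Invalid"
    correct_to: Optional[str] = None  # replacement used for "Error" values


# Hypothetical "State" domain in a knowledge base (made-up entries).
STATE_DOMAIN = {
    "Louisiana": DomainValue("Correct"),
    "Lousiana": DomainValue("Error", correct_to="Louisiana"),
    "N/A": DomainValue("Invalid"),
}


def cleanse(value, domain):
    """Return (output_value, label) for one source value."""
    entry = domain.get(value)
    if entry is None:
        # Value not in the knowledge base yet: pass it through for review.
        return value, "New"
    if entry.status == "Error" and entry.correct_to is not None:
        return entry.correct_to, "Corrected"
    return value, entry.status


results = [cleanse(v, STATE_DOMAIN) for v in ["Louisiana", "Lousiana", "N/A", "Texas"]]
```

In this model an Error value belongs to the domain but is misspelled, so it can be corrected, while an Invalid value does not belong to the domain at all and is only flagged.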
DQS - Client

• Manage knowledge base
• Create domains (composite domains)
• Cleanse data
• Profile data
• Match data
• Configuration
• View activity
Resources

• Data Quality Services Blog – Microsoft
• SQL Server 2012 Resources – Microsoft
• Books Online for SQL Server 2012 – Microsoft
• DQS Performance and Best Practices – Microsoft
• SearchSQLServer
• SQLServerPedia
• SQLServerCentral
Contact Information

• Email: [email protected]
• Twitter: @markverret
• Web: HSRG.LSU.EDU