Data Profiling PPT - How To
IDQ CONFERENCE
NOVEMBER 4, 2013
LITTLE ROCK, AR
Class Overview
• Meet your instructor, class demographics
• Data profiling: an overview
• Tools
• Where does data profiling fit?
• Modern data profiling techniques
• Basic Data Profiling
• Advanced Data Profiling Techniques
• Subjects and Subject Level Data Profiling
• Reporting
• Tools
• Discussion, Questions, Wrap-Up
• Mathematical
• Geometrical (your instructor has been described as having a ‘Roman Profile’)
• Racial
• Ignore other factors when deciding whom to stop, frisk, scan, etc.
• Data
• Simplest example: study one data element regardless of any other attributes
• We will not dwell on the tools, just give you a quick feel for how these
techniques have been implemented by the software development
community (in the cases where they have)
• We will demo by example using Talend since it’s free and you guys
can go ahead and download it yourselves if you like
• Key Constraints
• Frequency Distributions
• Outlier Study
• Frequent Values, Infrequent Values
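• To make these basic checks concrete before we touch the tools, here is a minimal sketch in Python/pandas. The file name and the column names (EMPLID, BIRTHDATE, ANNUAL_RT) are illustrative assumptions, not anything from the demo data.

# Minimal sketch of the basic profiling checks above, using pandas.
# The file name and column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("personal_data_extract.csv")

# Key constraint check: is the candidate key unique and non-null?
key = "EMPLID"
print("duplicate key values:", df[key].duplicated().sum())
print("null key values:", df[key].isna().sum())

# Frequency distribution of a single column
freq = df["BIRTHDATE"].value_counts(dropna=False)
print(freq.head(10))          # most frequent values
print(freq.tail(10))          # least frequent values

# Simple outlier study on a numeric column via the 1st/99th percentiles
low, high = df["ANNUAL_RT"].quantile([0.01, 0.99])
outliers = df[(df["ANNUAL_RT"] < low) | (df["ANNUAL_RT"] > high)]
print(len(outliers), "values outside the 1st-99th percentile band")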
• I have played around with three tools, but I can't make any claims about which is better than the others. I will talk about the three, compare them, and look at their pluses and minuses.
• Tools can operate as desktop versions and/or client/server installs, which can facilitate collaboration.
• I will demonstrate Talend and give you some handouts I have prepared, which should give you a flavor of Trillium and DataFlux.
• The bottom line is that the tool will provide you with a great deal of metadata; the art here is how you arrange and disseminate that metadata.
Reporting
• The situation in the ‘real world’ is no different. If you cannot take the output of
your data profile and create some simple and easy to swallow summaries, your
project sponsors will feel lost. This will lead to bad things.
• The best reporting tools start at a very high level and allow drill down so interested
parties can dig and see the detail, but the details are not provided until asked for.
• A lot of good work has been done with regard to this sort of drill-down reporting; the entire field of BI is essentially (as I understand it) designed around the careful extraction and dissemination of information.
• Anybody can press a button and create a bunch of meta-data, the art of this
business is preparing useful, usable, and actionable reports.
Reporting
• Anybody can press a button and create a bunch of meta-data, the art of this
business is preparing useful, usable, and actionable reports… how to do this?
• I find the simple approach best. Create a single table (or as few as possible) to hold all your results; this is especially easy at the subject level.
• This table itself can then be profiled to provide summaries and overviews, but since it contains all the metadata, it can also allow drill-down to the metadata and, if you're careful, to the data itself.
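• To make the "one results table" idea concrete, here is a small pandas sketch that stacks one row of metrics per column into a single running table. The metric names and the extract file names are my own assumptions.

# Sketch of the "one table holds all results" idea: each row is one
# column-level metric, so the result set can itself be profiled,
# summarized, or drilled into. Names are illustrative assumptions.
import pandas as pd

def profile_table(df: pd.DataFrame, table_name: str) -> pd.DataFrame:
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "table_name": table_name,
            "column_name": col,
            "row_count": len(s),
            "null_count": int(s.isna().sum()),
            "distinct_count": int(s.nunique(dropna=True)),
            "min_value": s.min(skipna=True),
            "max_value": s.max(skipna=True),
        })
    return pd.DataFrame(rows)

# Append results for every table profiled into one running results table.
results = pd.concat(
    [profile_table(pd.read_csv(f), f) for f in ["table1224.csv", "table1216.csv"]],
    ignore_index=True,
)
print(results.head())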
Demonstration using Talend
At this point we will fire up Talend and run through some examples of the
data profiling techniques we discussed.
• Questions/feedback?
Dataflux, Talend, Trillium
We’ve attached some screen shots and notes for you to read later at the bar
or on the plane back home
The desktop looks like this; the product shown here is called the Data Management Studio.
Let’s use the PS_PERSONAL_DATA table since it has a lot of recognizable data fields.
From the main screen, select New -> Profile.
We can also profile the relationships between two or more fields in a table.
Start by selecting a field name from the list under Standard Metrics (we’re checking EMPLID vs EMPLID_INT here)
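• Outside the tool, the same two-field comparison can be approximated in a few lines of pandas; the extract file name below is an assumption.

# Rough equivalent of the two-field comparison outside the tool:
# how often do EMPLID and EMPLID_INT agree? (The extract file name
# is an assumption.)
import pandas as pd

df = pd.read_csv("ps_personal_data.csv", dtype=str)
both_present = df.dropna(subset=["EMPLID", "EMPLID_INT"])
matches = (both_present["EMPLID"] == both_present["EMPLID_INT"]).sum()
print(f"{matches} of {len(both_present)} rows have matching EMPLID / EMPLID_INT")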
• Finally, we can save any and all of these reports to Excel for easy distribution.
• In fact, you can set up a job to run this profile on a schedule, create the Excel report, and email it around.
• Start by selecting Export… from the file menu.
• Configure the next menu like this. Be careful: if you select all tables and fields, you'll get 1,500 Excel reports.
• It takes a few seconds, but you'll get a nice Excel report with some useful stats.
• I find this report useful for a file delivery; it provides a good overview of the structure of the data: maxes and mins, and the numbers of unique and null values.
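• If you want to approximate that Excel deliverable without the tool, a pandas sketch like the following produces similar structure stats (it assumes the openpyxl package is installed and uses an extract file name of my own choosing).

# A rough stand-in for the Excel deliverable described above:
# per-column structure stats written to a spreadsheet with pandas.
# (Requires the openpyxl package; file names are assumptions.)
import pandas as pd

df = pd.read_csv("ps_personal_data.csv")
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "non_null": df.notna().sum(),
    "nulls": df.isna().sum(),
    "distinct": df.nunique(),
    "min": df.min(numeric_only=True),
    "max": df.max(numeric_only=True),
})
summary.to_excel("ps_personal_data_profile.xlsx")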
The freeware product we will demo today is called Talend Open Studio for Data
Quality (TOS-DQ)
Oracle Tables:
PS_JOB – TABLE1214
PS_EMPLOYMENT – TABLE1216
PS_EARNINGS_BAL – TABLE1218
PS_PAY_CHECK – TABLE1220
PS_PERS_NID – TABLE1222
PS_PERSONAL_DATA – TABLE1224
Select TABLE1224.
This table has a great many columns; select 15 or 20 or so.
Now we need to tell Talend what to analyze, by default it will do nothing and complain that you
didn’t set the ‘indicators’.
Click the 'Select indicators for each column' hyperlink. This will allow you to analyze specific things for each column you selected in the prior step.
Run the job and review the results.
• Just for fun, let’s profile another table that we derived from the personal data table.
• We created a table that has two columns, one for SSN and one for FIRST_NAME | LAST_NAME | BIRTHDATE
• We can use Talend to look for duplicate entries in this table
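• Here is a rough pandas sketch of that duplicate check; the file name and the NAME_DOB column name for the concatenated field are assumptions based on the description above.

# Sketch of the duplicate check on the derived two-column table:
# flag SSNs tied to more than one FIRST_NAME|LAST_NAME|BIRTHDATE string,
# and composite strings tied to more than one SSN. (File/column names
# are assumptions based on the description above.)
import pandas as pd

df = pd.read_csv("ssn_vs_name_dob.csv", dtype=str)

ssn_dupes = df.groupby("SSN")["NAME_DOB"].nunique()
print("SSNs with >1 name/birthdate combo:", (ssn_dupes > 1).sum())

combo_dupes = df.groupby("NAME_DOB")["SSN"].nunique()
print("name/birthdate combos with >1 SSN:", (combo_dupes > 1).sum())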
Trillium operates as a client/server product, but for today's exercise we have the source databases, the server, and the client all running on the same box.
• Next, we will have to define the database connections we will use; you can also define flat-file connections here.
• Right click on Loader Connections and select Add Loader Connection…
• Now we will exit out of the Repository Manager and start the TSS-13 Control Center.
• We will now load some data. Trillium performs its analysis on the data as it is loaded.
The parameters of this analysis are set to default values that can be adjusted based on
your particular situation
• Select the ‘Entities’ tab, right click and select ‘Create Entity’
• IMPORTANT! To save time and space, we selected a 10% sample of the records from
the source table.
• This will take a few minutes to load and analyze the tables, so we'll jump to another repository that has already been created and had its data analyzed.
• You can track the status if you select Analysis -> Background Tasks.
• When the jobs are done, there is a ton of metadata collected; getting through it can seem daunting at first.
• Start by getting back to the main screen and selecting 'Entities', then pick a table (for example, Tdwi Owner Table1216) and click on it.
• Select 'Relationships' and you'll find the results of Trillium's key analysis.
• Here, the software has correctly identified Emplid Int and Emplid as table keys.
• One of the more interesting things Trillium uncovers is the relationships between different pairs of data elements; it picks up correlations.
• Like most of its results, Trillium overshoots and some of what it picks up can be thrown away, but in my example it did uncover some non-obvious relationships between data elements.
• Go to Relationship Summary and select Discovered under Dependencies.
• Go ahead and click around and see what else you can find, you can do no damage
here.
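• For the curious, here is a brute-force pandas sketch of the pairwise dependency idea that Trillium automates. Like the tool, it over-reports on small samples, so treat hits as leads to investigate; the file name is an assumption.

# Brute-force sketch of pairwise dependency discovery: column A
# "determines" column B if every distinct A value maps to exactly one
# B value among the non-null rows. (File name is an assumption.)
import pandas as pd
from itertools import permutations

df = pd.read_csv("table1216.csv")

for a, b in permutations(df.columns, 2):
    pairs = df[[a, b]].dropna().drop_duplicates()
    if len(pairs) and pairs[a].nunique() == len(pairs):
        print(f"{a} appears to determine {b}")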
• Click a field name on the left and wait a few seconds and you’ll get some interesting
breakdowns.
• Let's say you want to know what one of these breakdowns is telling you. Click on the diagram.
• Now, if you right click on the row listed, you can drill down to the data.
• The Soundex* analysis is interesting also. If you find a column with, say, a city name,
like in our Table1224, you can see how the algorithm is used during the match address
analysis.
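• For reference, here is a simplified sketch of the classic Soundex code; the matching engines use their own tuned variants, so treat this as illustrative only.

# A simplified sketch of the classic Soundex code (real matching engines
# use tuned variants; the h/w special case is ignored here).
def soundex(word: str) -> str:
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}
    codes = {ch: d for letters, d in groups.items() for ch in letters}
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    prev = codes.get(word[0].lower(), "")
    for ch in word[1:].lower():
        d = codes.get(ch, "")
        if d and d != prev:
            result += d
        prev = d
    return (result + "000")[:4]

print(soundex("Pittsburgh"), soundex("Pittsburg"))  # both code to P321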
• The Min is $0.01, which, even during our current economic situation, seems low for an annual rate. I assume this is a 'rate', not actual compensation received.
• So I’d like to create a business rule to see how many tiny annual rates we have.
• Configure thusly:
• Now the cool thing about Trillium is that you can quickly drill down and see the records
that passed or failed this business rule. This allows you to research the ‘bad’ data,
tweak your rule and so on.
• Notice anything?
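• The same business rule, with the pass/fail drill-down, can be sketched in pandas like this; the file name, column names, and the $100 cut-off are assumptions for illustration.

# Sketch of the tiny-annual-rate business rule outside the tool: flag
# annual rates below a threshold and keep the failing rows handy for
# drill-down. (Names and the $100 cut-off are assumptions.)
import pandas as pd

df = pd.read_csv("ps_job.csv")
failed = df[df["ANNUAL_RT"] < 100]

print(f"{len(failed)} of {len(df)} rows fail the tiny-annual-rate rule")
print(failed[["EMPLID", "ANNUAL_RT"]].head())   # drill down to the 'bad' data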
• As an example, we have a file of data containing electric usage data. The data contains a meter ID and a number of readings for
each meter.
• First, we profile this file and record the profile of each data attribute (AKA column or field).
• In doing so we see that the data spans 6 weeks or so
• Now we use a different product to do the subject-level profile and determine the breakdown of the number of reads per meter (here the meter is the subject)
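• The reads-per-meter breakdown is a simple groupby once the data is in a frame; the file and column names below are assumptions based on the description above.

# Subject-level sketch: reads per meter over the extract, done with a
# simple groupby. (File/column names are assumptions.)
import pandas as pd

reads = pd.read_csv("meter_reads.csv", parse_dates=["READ_TS"])
print("date span:", reads["READ_TS"].min(), "to", reads["READ_TS"].max())

reads_per_meter = reads.groupby("METER_ID").size()
print(reads_per_meter.describe())              # typical vs. sparse meters
print(reads_per_meter.nsmallest(10))           # meters reporting least often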
Looking now at the frequency distribution of the voltages, we can see these are all very low (within the bottom 0.1%), so we have discovered why these meters aren't giving out reads as often as they should. They have dead batteries.
• In order to study the time component, we need to look at subsequent records in the data, ordered by time, for a specific subject. (Subsequent records that cross into a different subject are meaningless.)
• I do not know how to do this with any of the tools we have here, but I suspect it is possible; I am investigating. Meanwhile, I have created a metadata table with a simple script (sketched at the end of this section) and then profiled it to understand the state transition behavior of the JOB table.
• Profiling this table of metadata yields the time-ordered pairs present in the data. Often these pairs differ from the 'allowed' values given to the DQ analyst by the IT guys; working through any discrepancies between the actual and the expected values is a useful exercise that can yield a few business rules.
• Again, we look for very frequent and very infrequent values; here we have no standouts.
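• One way to build the transition metadata table described above (ordering within each subject so that pairs never cross subjects) is sketched below; the file and column names (EMPLID, EFFDT, ACTION) are assumptions modeled on a PS_JOB-style table.

# Order each subject's rows by time, pair every status with the next one,
# then profile the pair frequencies. (File/column names are assumptions.)
import pandas as pd

job = pd.read_csv("ps_job.csv", parse_dates=["EFFDT"])
job = job.sort_values(["EMPLID", "EFFDT"])

# shift(-1) within each EMPLID so pairs never cross into another subject
job["NEXT_ACTION"] = job.groupby("EMPLID")["ACTION"].shift(-1)
transitions = job.dropna(subset=["NEXT_ACTION"])

pair_counts = transitions.groupby(["ACTION", "NEXT_ACTION"]).size().sort_values()
print(pair_counts.tail(10))   # most frequent transitions
print(pair_counts.head(10))   # rare transitions worth questioning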