Analytics Engineering Case Study

We have a collection of 7 source systems (each responsible for different aspects of our
broader product offering) operating somewhat independently. As part of regular processes
they each generate a report showing the status of customers (for the activity covered by that
system). Each report is a “point in time” view generated independently at source, so its
metrics will change over time.

The reports are generated at varying cadences (some daily, others weekly) and arrive in
multiple formats (flat files, database connections and CDC interfaces). They all have the
same columns.

The requirement is to blend these reports into one master report, which allows data
consumers to see all the latest rows for each company (as represented in each of the
sources) side by side.

Report Samples
Below are samples from 2 of the 7 source systems (here called A and B). Each report has
the columns:
● Company name : Text
● Internal Company Reference : An ID from the respective source system. It can be a
number, or it can be text
● Status Summary : Text description
● Metrics 1 - 5 (can we think of some good examples) : They are numeric (some
integers, some fixed-point numbers).

System A
Name                 Ref    Status  M1  M2    M3  M4  M5
Bobs Widgets         BOBW1  OPEN    2   44.5  4   4   4
Bobs Widgets Events  BOBW2  OPEN    3   39.6  6   1   11.5
Jukebox Studios      JBS01  NEW     7   38.7  2   6   8

System B
Name              Ref   Status    M1  M2     M3  M4  M5
Bobs Widgets Ltd  2001  ACTIVE    1   0      2   2   1
Gig Management    1001  INACTIVE  4   0      0   0   0
Jukebox Services  2021  ACTIVE    2   44.21  1   0   0


Output
This is the output as suggested by the stakeholder:
Name              System  Ref    Status    M1  M2     M3  M4  M5
Bobs Widgets LTD  A       BOBW1  ACTIVE    2   44.5   4   4   4
Bobs Widgets LTD  A       BOBW2  ACTIVE    3   39.6   6   1   11.5
Bobs Widgets LTD  B       2001   ACTIVE    1   0      2   2   1
Jukebox Studios   A       JBS01  NEW       7   38.7   2   6   8
Gig Management    B       1001   INACTIVE  4   0      0   0   0
Jukebox Services  B       2021   ACTIVE    2   44.21  1   0   0
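
As a minimal sketch of the blending step (not a definitive implementation), the latest report from each system can be stacked into one list, with every row tagged by its source system. The names `ReportRow`, `MasterRow` and `blend` are hypothetical, and the sketch assumes each report has already been parsed into rows. It carries each source row's name and status through unchanged; the common naming shown in the stakeholder's output would need the mapping data and is not attempted here.

```python
from typing import NamedTuple, Union


class ReportRow(NamedTuple):
    """One row of a source report (hypothetical structure)."""
    name: str
    ref: Union[str, int]   # the reference can be a number or text
    status: str
    metrics: tuple         # M1..M5


class MasterRow(NamedTuple):
    """One row of the master report: a source row plus its system."""
    name: str
    system: str
    ref: str
    status: str
    metrics: tuple


def blend(reports: dict[str, list[ReportRow]]) -> list[MasterRow]:
    """Stack the reports, tagging each row with the system it came from."""
    master = []
    for system in sorted(reports):
        for row in reports[system]:
            # Store every reference as text so numeric and text IDs share a column.
            master.append(
                MasterRow(row.name, system, str(row.ref), row.status, row.metrics)
            )
    return master


# Sample rows from Systems A and B, as listed in the report samples.
system_a = [
    ReportRow("Bobs Widgets", "BOBW1", "OPEN", (2, 44.5, 4, 4, 4)),
    ReportRow("Bobs Widgets Events", "BOBW2", "OPEN", (3, 39.6, 6, 1, 11.5)),
    ReportRow("Jukebox Studios", "JBS01", "NEW", (7, 38.7, 2, 6, 8)),
]
system_b = [
    ReportRow("Bobs Widgets Ltd", 2001, "ACTIVE", (1, 0, 2, 2, 1)),
    ReportRow("Gig Management", 1001, "INACTIVE", (4, 0, 0, 0, 0)),
    ReportRow("Jukebox Services", 2021, "ACTIVE", (2, 44.21, 1, 0, 0)),
]

master = blend({"A": system_a, "B": system_b})
for row in master:
    print(row.name, row.system, row.ref, row.status, *row.metrics)
```

Every input row appears exactly once in the result, so the reporting interface remains free to filter or reorder the rows.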

Customer Relationships & Hierarchy


Looking at the sample data, we can immediately see a question about combining data for a
common customer: in this case, “Bobs Widgets” and “Bobs Widgets Ltd” are the same
overall customer, whereas “Jukebox Studios” and “Jukebox Services”, despite their similar
names, are entirely independent and unrelated.

Hierarchies are used to represent different parts of a business, or aspects of the business
activity within a system. In some cases this is because each brand is represented
independently but rolled up to a parent “group”.
● Systems A and D have a 2-tier hierarchy (one parent entity can have multiple
children).
● Systems B, F and G have a 1-tier hierarchy (each entity is independent and has no
children).
● Systems C and E have a 3-tier hierarchy (each parent will have 1 or more child
entities, each of which will in turn have 1 or more children).

We have access to the individual system hierarchy information as a list of key/value pairs
(child references to parent references).
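
One way to use these child-to-parent pairs, sketched here under assumptions, is to walk up from any reference until an entry with no parent is reached. The function name `top_level_parent` and the references in `hierarchy_c` are hypothetical and invented for illustration; the cycle check is a defensive addition, not something the source data is known to need.

```python
def top_level_parent(ref: str, child_to_parent: dict[str, str]) -> str:
    """Follow child -> parent links until a reference with no parent is found."""
    seen = {ref}
    current = ref
    while current in child_to_parent:
        current = child_to_parent[current]
        if current in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
    return current


# Hypothetical 3-tier hierarchy for a system such as C (references are invented).
hierarchy_c = {
    "C-TEAM-1": "C-BRAND-1",
    "C-BRAND-1": "C-GROUP-1",
    "C-BRAND-2": "C-GROUP-1",
}

print(top_level_parent("C-TEAM-1", hierarchy_c))
print(top_level_parent("C-GROUP-1", hierarchy_c))
```

The same function covers the 1-tier systems: with an empty mapping, every reference is its own top-level parent.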

We also have some independent mapping data (from multiple sources) for relationships between
systems. These are stored as key/value pairs, which can be used to map between customers
as represented in different systems.
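
Because the mapping pairs come from multiple sources, a customer may be linked through a chain of pairs (A to B, then B to D). A possible approach, swapped in here as an illustration rather than taken from the brief, is a union-find (disjoint-set) structure over `(system, ref)` keys. The class name `CustomerGroups` and the mapping pairs below are hypothetical.

```python
Key = tuple[str, str]  # (system, reference)


class CustomerGroups:
    """Union-find over (system, ref) keys to group records of one customer."""

    def __init__(self) -> None:
        self._parent: dict[Key, Key] = {}

    def find(self, key: Key) -> Key:
        """Return the representative key of the group containing `key`."""
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited key straight at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def union(self, a: Key, b: Key) -> None:
        """Record that `a` and `b` refer to the same customer."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller key as root so the result is deterministic.
            self._parent[max(root_a, root_b)] = min(root_a, root_b)


# Hypothetical mapping pairs; the real mapping data is not shown in the brief.
mapping_pairs = [
    (("A", "BOBW1"), ("B", "2001")),
    (("B", "2001"), ("D", "1001")),
]

groups = CustomerGroups()
for left, right in mapping_pairs:
    groups.union(left, right)

print(groups.find(("D", "1001")))
```

Keying on the system as well as the reference keeps records apart when two systems happen to reuse the same reference value.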

Assumptions
At this point, it can be assumed:
1. The data from any individual system can be generated at any level in the hierarchy,
but it will be consistent within each customer's representation in that system, and
there will be no overlap, so an individual report needs no deduping.
2. Each new report from each system will completely overwrite the previous one; only
the data points in the latest report are required.
3. An individual company can exist in multiple source systems, and its name may be
inconsistent between them.
4. All metrics are additive.
5. All input rows are to be included in the output (the reporting interface will do any
required filtering).
6. No further aggregations or calculations will be required on the data for the output.
7. The “Internal Company Reference” is unique within each source system; however, it
is not unique across systems (company 1001 in System B may not be the same as
company 1001 in System D).
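
Assumptions 2 and 7 can be sketched together: keep only the most recent report per system, and identify rows by a composite `(system, reference)` key. This is an illustrative sketch only. The class `LatestReports`, the `generated_on` date, the dictionary row layout and all values below are hypothetical, and ignoring a report older than the one already held is an added assumption about late arrivals.

```python
from datetime import date


class LatestReports:
    """Holds only the most recent report for each source system."""

    def __init__(self) -> None:
        # system -> (date the report was generated, its rows)
        self._latest: dict[str, tuple[date, list[dict]]] = {}

    def load(self, system: str, generated_on: date, rows: list[dict]) -> None:
        """Replace the stored report for `system` unless the new one is older."""
        current = self._latest.get(system)
        if current is None or generated_on >= current[0]:
            self._latest[system] = (generated_on, list(rows))

    def rows_by_key(self) -> dict[tuple[str, str], dict]:
        """Index every stored row by its (system, reference) composite key."""
        keyed = {}
        for system, (_, rows) in self._latest.items():
            for row in rows:
                keyed[(system, str(row["ref"]))] = row
        return keyed


store = LatestReports()
store.load("A", date(2024, 1, 1), [{"ref": "BOBW1", "m1": 2}, {"ref": "JBS01", "m1": 7}])
store.load("A", date(2024, 1, 8), [{"ref": "JBS01", "m1": 9}])   # overwrites the first
store.load("B", date(2024, 1, 8), [{"ref": 1001, "m1": 4}])
store.load("D", date(2024, 1, 7), [{"ref": "1001", "m1": 5}])
store.load("A", date(2024, 1, 2), [{"ref": "BOBW1", "m1": 3}])   # older, so ignored

latest = store.rows_by_key()
print(sorted(latest))
```

Reference 1001 appears under both System B and System D, and the composite key keeps the two rows distinct.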
