
[Project - Property Analysis]

BI Developer/Data Analyst Competition Sprint Part 1 - Property Data Engineering.

Introduction:

Welcome! The Property Analysis Competition Sprint follows an industry-level, best-practice design for a Business Intelligence project. It helps interns gain familiarity with an end-to-end Business Intelligence solution and apply industry-level best practices in Business Intelligence and Data Analysis projects. While the instructions tell you what to do, they do not always tell you how to do it. This is deliberate, as it is important for you to develop an independent drive to solve problems on your own. You should have a good idea of where to start from your 6 weeks of training and the on-boarding tasks.

Download the Excel files from the link below.

RawDataSet

Part 1

Data Engineering: Design Data Warehouse, Data Model, ELT / ETL Data pipeline.

1. Click the RawDataSet link above and download the 3 datasets: AUS_SubCityDistrictState_Data, NSW_PropertyMedainValue, and NSW-Public-Schools-Master-Dataset.

2. Use Visual Studio to create an SSIS package, and design dataflows to extract data from AUS_SubCityDistrictState_Data, NSW_PropertyMedainValue and NSW-Public-Schools-Master-Dataset (csv), and load the data into tables in your Data Warehouse. Use "load_" as the prefix to name all of your data load tables in the Data Warehouse.

3. Following the Kimball dimensional modeling methodology, design a Snowflake Schema Model or a Star Schema Model, including dimension and fact tables, in the Data Warehouse. Use a Bus Matrix and think about which dimension and fact tables you need to create. If you use the Star Schema Model you might need to consider using a factless fact table; do some research about the factless fact table (a short illustrative sketch follows this list).

4. Create another SSIS package and design dataflows to extract the data from the load tables, then transform it and load it into the Snowflake Schema Model or Star Schema Model.

5. You need "Category" as a dimension table (or a degenerate dimension in the fact table). In SSIS, categorize Median Value into the categories $0-$750k, $750k-$1.5M, $1.5M-$2.5M and $2.5M+, and load the "Category" values into the dimension table (or the degenerate dimension in the fact table). This is a transformation of the business requirements for Property Analysis.
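For item 3, note that a factless fact table records a relationship between dimensions without carrying any numeric measure. The T-SQL below is a minimal illustrative sketch only; the table and column names (Fact_Property_School, Property_Key, School_Key) are hypothetical and are not part of the provided datasets.

-- Hypothetical factless fact table: which public schools relate to a property's suburb.
-- It carries only foreign keys to dimension tables and no numeric measures.
CREATE TABLE dbo.Fact_Property_School (
    Property_Key INT NOT NULL,   -- FK to a property/location dimension
    School_Key   INT NOT NULL,   -- FK to a school dimension
    CONSTRAINT PK_Fact_Property_School PRIMARY KEY (Property_Key, School_Key)
    -- FOREIGN KEY constraints to the dimension tables would be added here
);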

Note: follow the same process as the on-boarding task to submit your work to your Mentor for review.

-------------------------------------------------------------------------------------------------------------------------

Answers

This project is a comprehensive task designed to test your ability to work on a Business Intelligence
(BI) project, from data engineering to designing data models for analysis. Here’s a breakdown of the
steps with more details to help you approach each part effectively:

1. Download and Explore the Datasets

• Datasets:

  o AUS_SubCityDistrictState_Data: Likely contains demographic or regional data.

  o NSW_PropertyMedianValue: Contains property median values.

  o NSW-Public-Schools-Master-Dataset: Contains data related to public schools in NSW.

• Objective: Understand the structure, relationships, and types of data in each dataset. Identify primary keys, foreign keys, and data quality issues (a sample profiling query is sketched below).
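Once the load_ tables from step 2 exist, a quick T-SQL profile can surface duplicates and missing values. This is a sketch only: the column names Suburb and Median_Value, and the table name load_NSW_PropertyMedianValue, are assumptions about the raw files and may differ in practice.

-- Assumed table and column names; adjust to the actual load_ tables.
-- Row count, distinct suburbs, and missing median values in the staging data.
SELECT
    COUNT(*)                                               AS total_rows,
    COUNT(DISTINCT Suburb)                                 AS distinct_suburbs,
    SUM(CASE WHEN Median_Value IS NULL THEN 1 ELSE 0 END)  AS missing_median_values
FROM dbo.load_NSW_PropertyMedianValue;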

2. Create Data Load Tables in Data Warehouse

• Using SSIS:

  o Open Visual Studio and create an SSIS project.

  o Design Data Flows:

    1. Extract: Load data from the CSV files into staging tables in your database. Use the "load_" prefix for table names (e.g., load_AUS_SubCityDistrictState_Data).

    2. Transform: Clean and format the data (e.g., handle missing values, format dates).

    3. Load: Insert the cleaned data into the staging tables.

• Tips:

  o Use Data Flow Tasks for ETL.

  o Configure Flat File Sources for the CSV files.

  o Use Data Conversion or Derived Column transformations for data type consistency. A sample staging-table definition is sketched below.
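A minimal sketch of one staging table that a Flat File Destination could load into, assuming the property file carries suburb, postcode and median-value columns (column names and data types are assumptions, not taken from the actual files):

-- Hypothetical staging table for the property median value file.
-- Match column names and types to the real CSV layout before using.
CREATE TABLE dbo.load_NSW_PropertyMedianValue (
    Suburb       NVARCHAR(100)  NULL,
    Postcode     NVARCHAR(10)   NULL,
    Median_Value DECIMAL(18, 2) NULL,
    Load_Date    DATETIME2      NOT NULL DEFAULT SYSDATETIME()  -- audit column
);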

3. Design a Snowflake or Star Schema

• Choose Schema:

  o Star Schema: Simplified, efficient for queries. Fact tables connect directly to dimension tables.

  o Snowflake Schema: More normalized, reduces redundancy but adds complexity.

• Steps:

  o Use the Kimball methodology:

    1. Identify business processes (e.g., property sales, school performance).

    2. Identify the grain (lowest level of detail in the fact table).

    3. Create dimension tables (e.g., Location, School, Property Category).

    4. Create fact tables (e.g., PropertyAnalysis_Fact).

• Fact Table Example: PropertyAnalysis_Fact with columns like Property_ID, Median_Value, Location_ID, Category_ID, School_ID.

• Dimension Tables:

  o Location_Dim: City, district, state.

  o School_Dim: School name, type, and location.

  o Category_Dim: Categories based on median value.

A sketch of the corresponding DDL appears below.
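As a hedged illustration of the tables listed above (the table names follow the examples in this section; exact columns, data types and keys are assumptions to refine against the real data):

-- Dimension and fact table sketch based on the names used above.
-- Surrogate-key strategy and data types are assumptions.
CREATE TABLE dbo.Category_Dim (
    Category_ID   INT IDENTITY(1,1) PRIMARY KEY,
    Category_Name NVARCHAR(20) NOT NULL            -- e.g. '$0-$750k'
);

CREATE TABLE dbo.Location_Dim (
    Location_ID INT IDENTITY(1,1) PRIMARY KEY,
    Suburb      NVARCHAR(100) NULL,
    District    NVARCHAR(100) NULL,
    State       NVARCHAR(50)  NULL
);

CREATE TABLE dbo.PropertyAnalysis_Fact (
    Property_ID  INT IDENTITY(1,1) PRIMARY KEY,
    Location_ID  INT NOT NULL REFERENCES dbo.Location_Dim (Location_ID),
    Category_ID  INT NOT NULL REFERENCES dbo.Category_Dim (Category_ID),
    School_ID    INT NULL,      -- FK to a School_Dim defined in the same way
    Median_Value DECIMAL(18, 2) NULL
);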

4. Implement Transformation and Load Data into the Schema

• SSIS Dataflows:

  1. Create a new SSIS package.

  2. Extract data from the load_ tables.

  3. Transform:

     - Categorize Median_Value into ranges ($0-$750k, $750k-$1.5M, etc.) using a Derived Column or Script Component.

     - Normalize and map keys between tables.

  4. Load the transformed data into the dimension and fact tables.

• Key Tasks:

  o Implement Lookups to map keys between staging and dimension tables.

  o Use Conditional Splits for value categorization. A set-based equivalent of the key lookup is sketched below.
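For reference, the Lookup pattern is equivalent to a set-based T-SQL insert. This is a sketch under clear assumptions: the staging table dbo.load_NSW_PropertyMedianValue and the dimensions from step 3 exist, Suburb is the join key, and a derived Property_Category text value has already been added by the Category transformation described in step 5.

-- Sketch: resolve surrogate keys with joins, mirroring the SSIS Lookup components.
INSERT INTO dbo.PropertyAnalysis_Fact (Location_ID, Category_ID, Median_Value)
SELECT
    l.Location_ID,
    c.Category_ID,
    s.Median_Value
FROM dbo.load_NSW_PropertyMedianValue AS s
JOIN dbo.Location_Dim AS l ON l.Suburb = s.Suburb                    -- assumed join key
JOIN dbo.Category_Dim AS c ON c.Category_Name = s.Property_Category; -- derived in step 5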

5. Include “Category” Dimension

• Transformation:

  o Create a Category dimension table (or a degenerate dimension within the fact table).

  o Assign a category to each Median_Value using a transformation:

    - $0-$750k: Category 1.

    - $750k-$1.5M: Category 2.

    - $1.5M-$2.5M: Category 3.

    - $2.5M+: Category 4.

• Load:

  o Insert the transformed data into the Category dimension, or into the fact table as the derived column. A T-SQL equivalent of the banding logic is sketched below.
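The Derived Column logic corresponds to a simple CASE expression. The sketch below uses the band thresholds from the brief; the staging table and column names are assumptions carried over from the earlier examples.

-- Band each median value into the four categories from the brief.
-- Equivalent to the SSIS Derived Column / Conditional Split logic.
SELECT
    Suburb,
    Median_Value,
    CASE
        WHEN Median_Value <  750000 THEN '$0-$750k'
        WHEN Median_Value < 1500000 THEN '$750k-$1.5M'
        WHEN Median_Value < 2500000 THEN '$1.5M-$2.5M'
        ELSE '$2.5M+'
    END AS Property_Category
FROM dbo.load_NSW_PropertyMedianValue;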

6. Submission and Review

• Package your SSIS project.

• Include:

  o ER diagrams showing the schema.

  o Documentation of your process, decisions, and assumptions.

  o Screenshots of SSIS packages and transformations.

• Submit to your mentor for feedback.
