Power BI Project
Introduction:
Welcome! The Property Analysis Competition Sprint follows an industry-level best-practice
design for a Business Intelligence project. It helps interns become familiar with an
end-to-end Business Intelligence solution and apply industry-level best practices in
Business Intelligence and Data Analysis projects. While the instructions tell you what to do,
they do not always tell you how to do it. This is deliberate, as it is important for
you to develop an independent drive to solve problems on your own. Your 6 weeks of training
and the on-boarding tasks should give you a good idea of where to start.
RawDataSet
Part 1
Data Engineering: Design Data Warehouse, Data Model, ELT / ETL Data pipeline.
2. Use Visual Studio to create an SSIS package and design dataflows that extract data from
NSW-Public-Schools-Master-Dataset (csv) and load it into tables in your Data
Warehouse. Use “load_” as the prefix for the names of all data load tables in your Data
Warehouse.
Star Schema Model including dimension and fact tables in Data Warehouse. Use a Bus
Matrix and think about what dimension and fact tables you need to create. If you use the
Star Schema Model, you might need to consider using a factless fact table; do some
4. Create another SSIS package and design dataflows that extract data from the load
tables, then transform it and load it into the Snowflake Schema Model or Star Schema
Model.
Note: follow the same process as in the onboarding task to submit your work to your Mentor
for review.
-------------------------------------------------------------------------------------------------------------------------
Answers
This project is a comprehensive task designed to test your ability to work on a Business Intelligence
(BI) project, from data engineering to designing data models for analysis. Here’s a breakdown of the
steps with more details to help you approach each part effectively:
Datasets:
Objective: Understand the structure, relationships, and types of data in each dataset.
Identify primary keys, foreign keys, and data quality issues.
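As a rough illustration of this profiling step, the sketch below counts missing and distinct values per column and flags columns that could serve as a primary key. It is only one possible approach: the column names and the three sample rows are made up for the example, and the real datasets would be read from the provided CSV files instead.

```python
import csv
import io

# Made-up sample rows (assumption); replace with the project's real CSV files.
SAMPLE_CSV = """id,name,suburb,postcode
1,Example School A,Northtown,2000
2,Example School B,,2001
3,Example School C,Southtown,2001
"""


def profile(rows):
    """Return per-column missing/distinct counts and a primary-key candidate flag."""
    report = {}
    for col in rows[0].keys():
        values = [r[col].strip() for r in rows]
        missing = sum(1 for v in values if v == "")
        distinct = len({v for v in values if v != ""})
        report[col] = {
            "missing": missing,
            "distinct": distinct,
            # A key candidate must be complete (no blanks) and unique per row.
            "key_candidate": missing == 0 and distinct == len(rows),
        }
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows)
for col, stats in report.items():
    print(col, stats)
```

A column flagged as a key candidate on a small sample still needs checking against the full dataset, and columns that repeat across files (such as a postcode) are the usual starting point when looking for foreign keys.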
Using SSIS:
1. Extract: Load data from CSV files into staging tables in your database. Use
the “load_” prefix for table names (e.g.,
load_AUS_SubCityDistrictState_Data).
2. Transform: Clean and format the data (e.g., handle missing values, format
dates).
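The project itself uses SSIS for these steps, so the following Python sketch is not an SSIS package. It only illustrates the same extract-then-transform logic, with an in-memory SQLite database standing in for the Data Warehouse. The source name Property_Sample, the column names, the sample rows, and the dd/mm/yyyy date format are all assumptions made for the example.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Made-up sample data (assumption); the real source is the project's CSV files.
SAMPLE_CSV = """suburb,state,sold_date,price
Northtown,NSW,03/01/2021,820000
Southtown,NSW,,1500000
Eastville,VIC,15/02/2021,
"""


def staging_name(source_name):
    """Apply the brief's "load_" prefix to a source name."""
    return "load_" + source_name


def extract_to_staging(conn, source_name, csv_text):
    """Copy CSV rows unchanged into a staging table; every column is kept as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    table = staging_name(source_name)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return table


def clean_row(row):
    """Turn blanks into None and dd/mm/yyyy dates (assumed format) into ISO dates."""
    suburb, state, sold_date, price = row
    iso_date = None
    if sold_date:
        iso_date = datetime.strptime(sold_date, "%d/%m/%Y").date().isoformat()
    return (suburb, state, iso_date, int(price) if price else None)


conn = sqlite3.connect(":memory:")
table = extract_to_staging(conn, "Property_Sample", SAMPLE_CSV)
cleaned = [clean_row(r) for r in conn.execute(f'SELECT * FROM "{table}"')]
```

Keeping the staging table as an untyped copy of the file and doing the cleaning in a second pass mirrors the two-package structure the brief asks for: one package to load, another to transform.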
Tips:
Choose Schema:
o Star Schema: Simplified, efficient for queries. Fact tables connect directly to
dimension tables.
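To make "fact tables connect directly to dimension tables" concrete, here is a minimal sketch of a star schema in an in-memory SQLite database. The table names (dim_suburb, dim_date, fact_sale), their columns, and the rows are hypothetical and are not taken from the project datasets; the point is only that every dimension is one join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_suburb (suburb_key INTEGER PRIMARY KEY, suburb TEXT, state TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sale (
    suburb_key INTEGER REFERENCES dim_suburb(suburb_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    price INTEGER
);
INSERT INTO dim_suburb VALUES (1, 'Northtown', 'NSW'), (2, 'Southtown', 'NSW');
INSERT INTO dim_date VALUES (202101, 2021, 1), (202102, 2021, 2);
INSERT INTO fact_sale VALUES (1, 202101, 800000), (1, 202102, 900000), (2, 202101, 1200000);
""")

# One join from the fact table to a dimension is enough to group by suburb.
rows = conn.execute("""
SELECT d.suburb, AVG(f.price)
FROM fact_sale AS f
JOIN dim_suburb AS d ON d.suburb_key = f.suburb_key
GROUP BY d.suburb
ORDER BY d.suburb
""").fetchall()
print(rows)
```

In a snowflake variant, a dimension such as dim_suburb would itself reference a further table (for example a state table), which adds a join to the same query.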
Steps:
Dimension Tables:
SSIS Dataflows:
3. Transform:
Key Tasks:
Transformation:
o Create a Category table (or store the category as a degenerate dimension within the
fact table).
$0–$750k: Category 1.
$750k–$1.5M: Category 2.
$1.5M–$2.5M: Category 3.
$2.5M+: Category 4.
Load:
o Insert transformed data into the Category dimension or the fact table with the
derived column.
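The banding above can be sketched as a small function that derives the category from a price. The source does not say which band a price exactly on a boundary belongs to, so this sketch assumes each band includes its lower bound and excludes its upper bound; the function name price_category is hypothetical.

```python
def price_category(price):
    """Map a sale price in dollars to its category number (1 to 4).

    Assumption: lower bounds are inclusive, so exactly $750,000 is Category 2.
    """
    if price < 0:
        raise ValueError("price must be non-negative")
    if price < 750_000:
        return 1
    if price < 1_500_000:
        return 2
    if price < 2_500_000:
        return 3
    return 4


# Derive the column for a few example prices.
for p in (500_000, 750_000, 1_499_999, 2_500_000):
    print(p, price_category(p))
```

In SSIS the same logic would typically live in a Derived Column transformation or in a lookup against the Category table; whichever is used, the boundary rule should be stated once and applied consistently.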
Include: