
[Project - Property Analysis]

BI Developer/Data Analyst Competition Sprint Part 1 - Property Data Engineering.

Introduction:

Welcome! The Property Analysis Competition Sprint follows an industry-level, best-practice design for a Business Intelligence project. It helps interns gain familiarity with an end-to-end Business Intelligence solution and apply industry-level best practices in Business Intelligence and Data Analysis projects. While the instructions tell you what to do, they do not always tell you how to do it. This is deliberate, as it is important for you to develop an independent drive to solve problems on your own. You should have a good idea of where to start from your 6 weeks of training and the on-boarding tasks.

Download the Excel files from the link below.

RawDataSet

Part 1

Data Engineering: Design Data Warehouse, Data Model, ELT / ETL Data pipeline.

1. Click the RawDataSet link above and download the 3 datasets: AUS_SubCityDistrictState_Data, NSW_PropertyMedainValue, and NSW-Public-Schools-Master-Dataset.

2. Use Visual Studio to create an SSIS package, and design dataflows to extract data from AUS_SubCityDistrictState_Data, NSW_PropertyMedainValue and NSW-Public-Schools-Master-Dataset (csv), and load the data into tables in your Data Warehouse. Use "load_" as the prefix to name all of your data load tables in the Data Warehouse.

3. Following the Kimball dimensional modeling methodology, design a Snowflake Schema Model or a Star Schema Model, including dimension and fact tables, in the Data Warehouse. Use a Bus Matrix and think about which dimension and fact tables you need to create. If you use the Star Schema Model you might need to consider using a factless fact table; do some research about the factless fact table (a short illustrative sketch follows this list).

4. Create another SSIS package and design dataflows to extract the data from the load tables, then transform it and load it into the Snowflake Schema Model or Star Schema Model.

5. You need "Category" as a dimension table (or a degenerate dimension in the fact table). In SSIS, categorize Median Value into the categories $0-$750k, $750k-$1.5M, $1.5M-$2.5M and $2.5M+, and load the "Category" values into the dimension table (or the degenerate dimension in the fact table). This is a transformation of the business requirements for Property Analysis.
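For item 3, note that a factless fact table records a relationship between dimensions without carrying any numeric measure. The T-SQL below is a minimal illustrative sketch only; the table and column names (Fact_Property_School, Property_Key, School_Key) are hypothetical and are not part of the provided datasets.

-- Hypothetical factless fact table: which public schools relate to a property's suburb.
-- It carries only foreign keys to dimension tables and no numeric measures.
CREATE TABLE dbo.Fact_Property_School (
    Property_Key INT NOT NULL,   -- FK to a property/location dimension
    School_Key   INT NOT NULL,   -- FK to a school dimension
    CONSTRAINT PK_Fact_Property_School PRIMARY KEY (Property_Key, School_Key)
    -- FOREIGN KEY constraints to the dimension tables would be added here
);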

Note: follow the same process as the on-boarding task to submit your work to your Mentor for review.

-------------------------------------------------------------------------------------------------------------------------

Answers

This project is a comprehensive task designed to test your ability to work on a Business Intelligence
(BI) project, from data engineering to designing data models for analysis. Here’s a breakdown of the
steps with more details to help you approach each part effectively:

1. Download and Explore the Datasets

• Datasets:

  o AUS_SubCityDistrictState_Data: Likely contains demographic or regional data.

  o NSW_PropertyMedianValue: Contains property median values.

  o NSW-Public-Schools-Master-Dataset: Contains data related to public schools in NSW.

• Objective: Understand the structure, relationships, and types of data in each dataset. Identify primary keys, foreign keys, and data quality issues (a sample profiling query is sketched below).
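Once the load_ tables from step 2 exist, a quick T-SQL profile can surface duplicates and missing values. This is a sketch only: the column names Suburb and Median_Value, and the table name load_NSW_PropertyMedianValue, are assumptions about the raw files and may differ in practice.

-- Assumed table and column names; adjust to the actual load_ tables.
-- Row count, distinct suburbs, and missing median values in the staging data.
SELECT
    COUNT(*)                                               AS total_rows,
    COUNT(DISTINCT Suburb)                                 AS distinct_suburbs,
    SUM(CASE WHEN Median_Value IS NULL THEN 1 ELSE 0 END)  AS missing_median_values
FROM dbo.load_NSW_PropertyMedianValue;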

2. Create Data Load Tables in Data Warehouse

• Using SSIS:

  o Open Visual Studio and create an SSIS project.

  o Design Data Flows:

    1. Extract: Load data from the CSV files into staging tables in your database. Use the "load_" prefix for table names (e.g., load_AUS_SubCityDistrictState_Data).

    2. Transform: Clean and format the data (e.g., handle missing values, format dates).

    3. Load: Insert the cleaned data into the staging tables.

• Tips:

  o Use Data Flow Tasks for ETL.

  o Configure Flat File Sources for the CSV files.

  o Use Data Conversion or Derived Column transformations for data type consistency. A sample staging-table definition is sketched below.
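A minimal sketch of one staging table that a Flat File Destination could load into, assuming the property file carries suburb, postcode and median-value columns (column names and data types are assumptions, not taken from the actual files):

-- Hypothetical staging table for the property median value file.
-- Match column names and types to the real CSV layout before using.
CREATE TABLE dbo.load_NSW_PropertyMedianValue (
    Suburb       NVARCHAR(100)  NULL,
    Postcode     NVARCHAR(10)   NULL,
    Median_Value DECIMAL(18, 2) NULL,
    Load_Date    DATETIME2      NOT NULL DEFAULT SYSDATETIME()  -- audit column
);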

3. Design a Snowflake or Star Schema

• Choose Schema:

  o Star Schema: Simplified, efficient for queries. Fact tables connect directly to dimension tables.

  o Snowflake Schema: More normalized, reduces redundancy but adds complexity.

• Steps:

  o Use the Kimball methodology:

    1. Identify business processes (e.g., property sales, school performance).

    2. Identify the grain (lowest level of detail in the fact table).

    3. Create dimension tables (e.g., Location, School, Property Category).

    4. Create fact tables (e.g., PropertyAnalysis_Fact).

• Fact Table Example: PropertyAnalysis_Fact with columns like Property_ID, Median_Value, Location_ID, Category_ID, School_ID.

• Dimension Tables:

  o Location_Dim: City, district, state.

  o School_Dim: School name, type, and location.

  o Category_Dim: Categories based on median value.

A sketch of the corresponding DDL appears below.
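As a hedged illustration of the tables listed above (the table names follow the examples in this section; exact columns, data types and keys are assumptions to refine against the real data):

-- Dimension and fact table sketch based on the names used above.
-- Surrogate-key strategy and data types are assumptions.
CREATE TABLE dbo.Category_Dim (
    Category_ID   INT IDENTITY(1,1) PRIMARY KEY,
    Category_Name NVARCHAR(20) NOT NULL            -- e.g. '$0-$750k'
);

CREATE TABLE dbo.Location_Dim (
    Location_ID INT IDENTITY(1,1) PRIMARY KEY,
    Suburb      NVARCHAR(100) NULL,
    District    NVARCHAR(100) NULL,
    State       NVARCHAR(50)  NULL
);

CREATE TABLE dbo.PropertyAnalysis_Fact (
    Property_ID  INT IDENTITY(1,1) PRIMARY KEY,
    Location_ID  INT NOT NULL REFERENCES dbo.Location_Dim (Location_ID),
    Category_ID  INT NOT NULL REFERENCES dbo.Category_Dim (Category_ID),
    School_ID    INT NULL,      -- FK to a School_Dim defined in the same way
    Median_Value DECIMAL(18, 2) NULL
);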

4. Implement Transformation and Load Data into the Schema

• SSIS Dataflows:

  1. Create a new SSIS package.

  2. Extract data from the load_ tables.

  3. Transform:

     - Categorize Median_Value into ranges ($0-$750k, $750k-$1.5M, etc.) using a Derived Column or Script Component.

     - Normalize and map keys between tables.

  4. Load the transformed data into the dimension and fact tables.

• Key Tasks:

  o Implement Lookups to map keys between staging and dimension tables.

  o Use Conditional Splits for value categorization. A set-based equivalent of the key lookup is sketched below.
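For reference, the Lookup pattern is equivalent to a set-based T-SQL insert. This is a sketch under clear assumptions: the staging table dbo.load_NSW_PropertyMedianValue and the dimensions from step 3 exist, Suburb is the join key, and a derived Property_Category text value has already been added by the Category transformation described in step 5.

-- Sketch: resolve surrogate keys with joins, mirroring the SSIS Lookup components.
INSERT INTO dbo.PropertyAnalysis_Fact (Location_ID, Category_ID, Median_Value)
SELECT
    l.Location_ID,
    c.Category_ID,
    s.Median_Value
FROM dbo.load_NSW_PropertyMedianValue AS s
JOIN dbo.Location_Dim AS l ON l.Suburb = s.Suburb                    -- assumed join key
JOIN dbo.Category_Dim AS c ON c.Category_Name = s.Property_Category; -- derived in step 5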

5. Include “Category” Dimension

• Transformation:

  o Create a Category dimension table (or a degenerate dimension within the fact table).

  o Assign a category to each Median_Value using a transformation:

    - $0-$750k: Category 1.

    - $750k-$1.5M: Category 2.

    - $1.5M-$2.5M: Category 3.

    - $2.5M+: Category 4.

• Load:

  o Insert the transformed data into the Category dimension, or into the fact table as the derived column. A T-SQL equivalent of the banding logic is sketched below.
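The Derived Column logic corresponds to a simple CASE expression. The sketch below uses the band thresholds from the brief; the staging table and column names are assumptions carried over from the earlier examples.

-- Band each median value into the four categories from the brief.
-- Equivalent to the SSIS Derived Column / Conditional Split logic.
SELECT
    Suburb,
    Median_Value,
    CASE
        WHEN Median_Value <  750000 THEN '$0-$750k'
        WHEN Median_Value < 1500000 THEN '$750k-$1.5M'
        WHEN Median_Value < 2500000 THEN '$1.5M-$2.5M'
        ELSE '$2.5M+'
    END AS Property_Category
FROM dbo.load_NSW_PropertyMedianValue;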

6. Submission and Review

• Package your SSIS project.

• Include:

  o ER diagrams showing the schema.

  o Documentation of your process, decisions, and assumptions.

  o Screenshots of SSIS packages and transformations.

• Submit to your mentor for feedback.
