(BI-2425) Project Assignment
BUSINESS INTELLIGENCE
PROJECT
BUILDING AND MINING DATA WAREHOUSE
1. General Information
Assignment ID: PROJECT
Estimated duration: 10 weeks
Submission deadline: 22/12/2024
Assignment type: Student Group
Submission channel: Moodle
Teachers: Hồ Thị Hoàng Vy, Tiết Gia Hồng, Nguyễn Ngọc Minh Châu
Contacts: [email protected], [email protected], [email protected]
2. Learning outcomes
This assignment targets the following learning outcomes:
- G3.3 Design a Star or Snowflake data model diagram through
Multidimensional Design, based on the analytical business requirements and
the OLTP system.
- G5.1 Deploy the ETL procedure to extract data from disparate
databases and data sources, and then transform the data for effective
integration into a data warehouse using the SSIS tool.
- G5.2 Operate the basic OLAP technologies using SSAS tools.
- G5.3 Create a dashboard and other visualizations to analyze and
communicate the data from the DW using SSRS, Excel, etc.
- G5.4 Apply the data mining algorithms in Analysis Services to your
data.
3. Requirements and submission rules
The objective of this project is to create a data warehouse utilizing air quality data from
the Environmental Protection Agency's (EPA) Daily Summary AQI by County across 10
States. The aim is to analyze this data warehouse in order to identify trends and patterns
in U.S. air quality over 3 years (2021-2023).
3.2 Design the data warehouse (DW), integrate and load data from the
sources into the DW, then design and build the Cube
Suggestions:
- Map the above data sources to get the values for building the Geography dimension
with the following dimensional hierarchy: State > County
- Transform the datetime data to create the Date dimension with the following
dimensional hierarchy: Year > Quarter > Month > Day
-
- Define and design other dimensional hierarchies to meet OLAP and Report
requirements
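The Date dimension suggested above can be sketched outside SSIS to check the hierarchy logic before building the package. The following is only an illustration in Python, not the required SSIS implementation; the function name `build_date_dimension`, the column names, and the `date_key` format (YYYYMMDD) are assumptions made for this example.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end (inclusive).

    Each row carries the Year > Quarter > Month > Day hierarchy levels.
    Column names and the YYYYMMDD surrogate key are illustrative assumptions.
    """
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer key such as 20210131 (assumed convention)
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current,
            "year": current.year,
            # Months 1-3 -> quarter 1, 4-6 -> quarter 2, and so on
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
        })
        current += timedelta(days=1)
    return rows


# Cover the three years analyzed in this project (2021-2023).
date_dim = build_date_dimension(date(2021, 1, 1), date(2023, 12, 31))
```

Generating the dimension from a fixed date range, rather than only from dates present in the source files, keeps the hierarchy complete even if some days have no AQI record.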
- The report should include a visual representation of the data and an analysis
based on the results of the questionnaire.
- The analysis should be concise and to the point, avoiding lengthy and verbose
writing.
- Tips: You could present your analysis based on the hints in each question.
1. Report the minimum and maximum AQI values for each State during each quarter of
each year. Analysis hints: How do the AQI values fluctuate during the year? Pay
attention to the values (max, min). Are any unusually large or small?
2. Report the mean and the standard deviation of the AQI value for each State during
each quarter of each year. Analysis hints: How do the AQI values fluctuate during the
year? Pay attention to the values (mean, std, max, min). Are any unusually large or
small?
3. Report the number of days and the mean AQI value for days on which the air quality
is rated as "very unhealthy" or worse, for each State and County. Analysis hint: What
is the AQI limit above which air quality is "very unhealthy" or worse?
4. For the following four states: Hawaii, Alaska, Illinois and Delaware, count the
number of days in each air quality Category (Good, Moderate, etc.) by County.
Analysis hints: Compare the data of the states and counties, focusing on the
distribution of harmful air conditions. What could you conclude about the
differences?
5. For the following four states: Hawaii, Alaska, Illinois and Delaware, compute the
mean AQI value by quarter. Analysis hints: Compare the data of the states
over the year. What could you conclude about the fluctuations?
6. Design a report to demonstrate the AQI fluctuation trends over the year for the
following four states: Hawaii, Alaska, Illinois and California. Analysis hint: Give your
opinion about the fluctuations of the AQI value.
8. Use a regional map to visually represent (by color) the mean AQI value in regions
during a year. (Example map not reproduced here.)
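The aggregations asked for in the questions above (min, max, mean and standard deviation per State and quarter, and the count of "very unhealthy" days per County) can be prototyped on a few rows before writing the cube measures or report queries. The sketch below is a hedged illustration only: the sample rows are invented, the row layout `(state, county, date, aqi)` is an assumption, and the State-level figures are computed over all county-day records of the State, which is one possible interpretation. The threshold of 201 follows the EPA AQI scale, where 201-300 is "Very Unhealthy" and 301 and above is "Hazardous". The population standard deviation (`pstdev`) is used here; the sample standard deviation may be preferred.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

# Hypothetical sample rows: (state, county, date, aqi). Values are made up.
SAMPLE_ROWS = [
    ("Hawaii", "Honolulu", date(2021, 1, 5), 30),
    ("Hawaii", "Honolulu", date(2021, 2, 5), 50),
    ("Hawaii", "Maui", date(2021, 4, 1), 40),
    ("Alaska", "Anchorage", date(2021, 1, 9), 20),
    ("Alaska", "Anchorage", date(2021, 1, 10), 220),
]

# EPA AQI scale: 201-300 is "Very Unhealthy", 301+ is "Hazardous".
VERY_UNHEALTHY_MIN = 201


def quarterly_stats(rows):
    """Group AQI values by (state, year, quarter) and summarize each group."""
    groups = defaultdict(list)
    for state, _county, day, aqi in rows:
        quarter = (day.month - 1) // 3 + 1
        groups[(state, day.year, quarter)].append(aqi)
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "std": pstdev(values),  # population std; an assumption of this sketch
        }
        for key, values in groups.items()
    }


def very_unhealthy_days(rows):
    """Count county-days with AQI >= 201 and their mean AQI per (state, county)."""
    groups = defaultdict(list)
    for state, county, _day, aqi in rows:
        if aqi >= VERY_UNHEALTHY_MIN:
            groups[(state, county)].append(aqi)
    return {
        key: {"days": len(values), "mean": mean(values)}
        for key, values in groups.items()
    }


stats = quarterly_stats(SAMPLE_ROWS)
bad_days = very_unhealthy_days(SAMPLE_ROWS)
```

A prototype like this gives reference numbers that the SSAS measures and SSRS reports can be checked against on a small extract of the data.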
Questions for bonus points:
9. Report the mean, the standard deviation, min and max of the AQI value grouped by
State and County during each quarter of the year. Analysis hints: Pay attention
to the values (mean, std, max, min). Are any unusually large or small? Compare
the standard deviation values between questions 1 and 2 and explain.
False: Otherwise
Report the mean AQI value by State, Category and DayLightSaving over the years.
Analysis hint: Is there any notable difference in the air quality during the Daylight
Saving period compared to the rest of the year?
Caution: The Category in the data set is calculated for each County, not for each State.
12. Report the number of days by Category and Defining Parameter. Analysis hints:
What is your opinion on the pollution situation in the United States as a whole?
Additionally, please identify the primary factors that the country should consider in
order to improve air quality.
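The DayLightSaving attribute used in the bonus questions could be derived from the date alone. The sketch below is a loudly labeled assumption, since the exact True/False definition is not fully given in this text: it assumes the current U.S. rule, under which daylight saving time starts on the second Sunday of March and ends on the first Sunday of November, and it works at day granularity. It also ignores that some states, such as Hawaii, do not observe daylight saving time. The helper names `nth_sunday` and `is_daylight_saving` are hypothetical.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday (n >= 1) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday = 0 ... Sunday = 6
    days_to_first_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_sunday + 7 * (n - 1))


def is_daylight_saving(day: date) -> bool:
    """True if the day falls in the assumed U.S. daylight saving period.

    Assumption: period runs from the second Sunday of March (inclusive)
    to the first Sunday of November (exclusive).
    """
    start = nth_sunday(day.year, 3, 2)
    end = nth_sunday(day.year, 11, 1)
    return start <= day < end


# Example: flag a summer day and a winter day
summer_flag = is_daylight_saving(date(2021, 7, 4))
winter_flag = is_daylight_saving(date(2021, 1, 15))
```

If the assignment defines the flag differently, only the two boundary dates in `is_daylight_saving` need to change; storing the result as a column of the Date dimension keeps the rule in one place.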
- Students propose a data mining application for any case of their choice and explain
the algorithm used, why it was chosen, what the results are, etc.
3.5 Conclusion:
A brief summary of the group's project outcomes including the following elements:
-
- Based on the data and reports, please give an overview of air quality in US counties
in 2023 and explain the aforementioned arguments.
- This section should also include a summary of the project's achievements and
suggestions for potential improvement areas.
4. Assessment
- Midterm Q&A: ETL process (data flow, data cleaning, ETL of data from the sources to the DW)
- Final Q&A: Completed project (mining the DW with reports, OLAP, data mining, creation
of a periodic automated job to perform ETL)
- You can refer to any documents or ask for help during the assignment, but cheating
or plagiarism will always result in a 0 for the project. No exceptions!!!
- The teacher evaluates the total score for each group; the group determines the
percentage of each member's score depending on the level of contribution to the
project.
5. References
Air quality index - Wikipedia
6. Other rules
- Students work in groups and post the source code on GitHub.
- Project includes:
▪ Members Information
o Main content:
o Video: