Assignment! DS EN22CS301186
EN22CS301186
Assignment 02
Anushtha Rathore
EN22CS301186
INDEX
02 What are the important challenges and scope of Data Science project management?
Inspiring Industry Projects on Data Science in detail, with at least 3 examples
Important Difficulties in Project Management for Data Science:
Scope of Data Science Project Management
Reference
The analysis of air quality data gathered from multiple monitoring stations located around a
metropolis is the main goal of this case study. The objective is to comprehend how different
elements affect air pollution levels, spot long-term patterns, and investigate connections
between meteorological and air quality indicator data.
Description of Data
The hourly air quality measurements in the dataset include:
DateTime: The measurement's timestamp.
PM2.5: The concentration of fine particulate matter, 2.5 µm or smaller in diameter (µg/m³).
PM10: The concentration of particulate matter, 10 µm or smaller in diameter (µg/m³).
NO2: The concentration of nitrogen dioxide (µg/m³).
Temperature: The ambient temperature in degrees Celsius.
Humidity: Percentage of relative humidity.
Wind Speed: The wind speed (km/h).
Location: The position of the monitoring station.
Techniques
1. Data Cleaning
Interpolated missing data, particularly for continuous measurements.
DateTime was converted to the proper datetime objects for time series analysis.
2. Descriptive Statistics
Summary statistics (mean, median, max, min) for each air quality metric were
computed.
Computed monthly averages to assess seasonal patterns.
3. Data Visualization
Time Series Analysis: Line plots were made to show how PM2.5 levels have
changed over time.
Correlation Heatmap: A heatmap was created to investigate correlations between
the air pollutants and the weather variables.
4. Trend Analysis
Rolling averages were used to smooth out daily volatility and reveal
longer-term patterns.
Examined changes in the quality of the air at various times of the year and in
different places.
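The cleaning and summary steps above can be sketched in plain Python. This is a minimal illustration on made-up readings, not the code used in the case study: the helper names `interpolate_gaps` and `monthly_means`, the timestamp format, and the sample values are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def interpolate_gaps(values):
    """Linearly interpolate None entries that lie between two known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def monthly_means(timestamps, values):
    """Average the readings per (year, month), skipping missing values."""
    buckets = defaultdict(list)
    for ts, v in zip(timestamps, values):
        if v is not None:
            buckets[(ts.year, ts.month)].append(v)
    return {month: mean(vals) for month, vals in buckets.items()}


# Hypothetical hourly PM2.5 readings (µg/m³); None marks a missing measurement.
raw_times = ["2023-01-31 22:00", "2023-01-31 23:00",
             "2023-02-01 00:00", "2023-02-01 01:00"]
raw_pm25 = [40.0, None, 50.0, 30.0]

# Convert the DateTime strings to datetime objects for time series analysis.
times = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in raw_times]
pm25 = interpolate_gaps(raw_pm25)

# Summary statistics and monthly averages for the cleaned series.
summary = {"mean": mean(pm25), "median": median(pm25),
           "max": max(pm25), "min": min(pm25)}
by_month = monthly_means(times, pm25)
print(summary)
print(by_month)
```

Gaps at the very start or end of a series are left as None here, since linear interpolation needs a known value on both sides.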
Results
Trends in Air Quality: PM2.5 levels showed a noticeable seasonal variation, with
higher concentrations during the winter owing to stable meteorological conditions
and increased heating.
Correlation Findings: There was a strong positive correlation (0.85) between PM2.5
and NO2, suggesting that vehicle emissions contribute substantially to particulate
matter pollution.
Weather Impact: Lower PM2.5 levels were linked to higher humidity levels, suggesting
that moisture in the air may aid in particulate settling.
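The correlation figures reported above are Pearson coefficients, which can be computed as in the following sketch. The readings are invented for illustration and will not reproduce the 0.85 value from the study; the function name `pearson` is likewise an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up readings for illustration only; these are not the study's data.
columns = {
    "PM2.5": [10.0, 20.0, 30.0, 40.0],
    "NO2": [12.0, 18.0, 33.0, 37.0],
    "Humidity": [80.0, 60.0, 40.0, 20.0],
}

# Pairwise coefficients: the numbers a correlation heatmap would colour.
matrix = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}

for pair, r in matrix.items():
    print(pair, round(r, 3))
```

A coefficient near +1 or -1 only shows that two series move together; it does not by itself establish the cause, which is why the findings above are worded as suggestions.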
Perspectives
According to the analysis,
Reducing vehicle emissions through policy might greatly enhance air quality.
Public awareness initiatives ought to concentrate on tactics for reducing pollution
throughout the winter.
Increased monitoring in key months can help reduce the health hazards linked to air
pollution.
Description: This line plot shows the trend of PM2.5 levels over time, with the x-axis
representing the date and the y-axis representing the PM2.5 concentration (µg/m³).
You can use a rolling average to smooth out daily fluctuations.
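The rolling average mentioned above can be sketched as a trailing moving mean. The function name `rolling_mean`, the window size, and the sample values are assumptions chosen for illustration.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing moving average; None is returned until the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # the oldest value is about to leave the window
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical daily PM2.5 values (µg/m³).
daily_pm25 = [10.0, 20.0, 30.0, 40.0, 50.0]
smoothed = rolling_mean(daily_pm25, window=3)
print(smoothed)
```

A wider window gives a smoother line but reacts more slowly to real changes, so the window length is a trade-off rather than a fixed rule.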
In this case study, we examine a dataset that includes movie-related data, such as ratings,
genres, and box office results. The goal is to identify trends in the box office performance of
films and determine the elements that lead to high attendance and profits.
Description of Data
The dataset includes details on five thousand films, including:
MovieID: A unique identifier assigned to every film.
Title: Film title.
Genre: The film's genre, such as comedy or action.
Rating: Average user rating, out of ten.
BoxOffice: The total amount made at the box office (millions).
ReleaseYear: The year of the film's premiere.
Runtime: The length of the film, expressed in minutes.
Director: The film's director.
Approach
1. Data Cleaning:
Deleted unnecessary columns and duplicate entries.
Handled missing values by imputing the mean or median for box office receipts
and ratings.
2. Descriptive Statistics
Examined box office results and average ratings by genre.
Looked at runtime distributions and how they related to revenue and ratings.
3. Data Visualization:
Box Plot: Box plots were made to compare box office receipts and ratings for
various genres.
Scatter Plot: To investigate the connections between runtime, box office
receipts, and ratings, scatter plots were created.
4. Trend Analysis:
Examined patterns in revenue and ratings over time to determine whether
certain genres became more well-liked.
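The cleaning and per-genre summary steps above can be sketched as follows. The records are invented, the column name `Rating` is assumed (the dataset description does not name that field), and the helper names are hypothetical; median imputation is shown, although the mean could be used in the same way.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records; None marks a missing value.
movies = [
    {"MovieID": 1, "Genre": "Action", "Rating": 7.0, "BoxOffice": 300.0},
    {"MovieID": 1, "Genre": "Action", "Rating": 7.0, "BoxOffice": 300.0},  # duplicate
    {"MovieID": 2, "Genre": "Action", "Rating": None, "BoxOffice": 200.0},
    {"MovieID": 3, "Genre": "Romance", "Rating": 6.0, "BoxOffice": None},
    {"MovieID": 4, "Genre": "Romance", "Rating": 8.0, "BoxOffice": 100.0},
]


def drop_duplicates(rows, key="MovieID"):
    """Keep the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Replace missing values in a column with the median of the known ones."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def mean_by_genre(rows, column):
    """Average a numeric column within each genre."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["Genre"]].append(r[column])
    return {genre: mean(vals) for genre, vals in groups.items()}


clean = drop_duplicates(movies)
for col in ("Rating", "BoxOffice"):
    clean = impute_median(clean, col)

avg_rating = mean_by_genre(clean, "Rating")
avg_box_office = mean_by_genre(clean, "BoxOffice")
print(avg_rating)
print(avg_box_office)
```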
Results
Genre Performance: Compared to genres like romance and documentaries, action
and adventure films often earned higher average scores and performed better at the
box office.
Runtime Analysis: Mid-length films, with runtimes between 90 and 120 minutes,
appear to be more popular with audiences and generate higher revenue.
Director Influence: Movies by well-known directors typically have better box
office results and ratings, suggesting the value of name recognition in the industry.
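One way to check a runtime finding like the one above is to group films into runtime bands and compare the average revenue per band. The sketch below uses invented figures; the band boundaries follow the 90 to 120 minute range mentioned in the results, and the function name `runtime_band` is an assumption.

```python
from collections import defaultdict
from statistics import mean


def runtime_band(minutes):
    """Assign a film to a runtime band based on its length in minutes."""
    if minutes < 90:
        return "short (<90)"
    if minutes <= 120:
        return "mid (90-120)"
    return "long (>120)"


# Hypothetical (Runtime in minutes, BoxOffice in millions) pairs.
films = [(85, 40.0), (95, 120.0), (110, 180.0), (150, 90.0)]

by_band = defaultdict(list)
for runtime, revenue in films:
    by_band[runtime_band(runtime)].append(revenue)

avg_revenue = {band: mean(vals) for band, vals in by_band.items()}
print(avg_revenue)
```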
Perspectives
According to the analysis, it appears that:
To optimize their appeal to audiences, movie companies ought to concentrate on
making mid-length action and adventure films.
Working with well-known directors could improve a movie's chances of success.
Genre trends should be highlighted in marketing plans to correspond with audience
preferences.
Reference:
https://research.ibm.com/publications/advances-in-exploratory-data-analysis-visualisation-and-quality-for-data-centric-ai-systems
Question 2) What are the important challenges and scope of Data Science project
management?
Managing a data science project raises a distinct set of difficulties, ranging from
scalability and legal constraints to data collection and model interpretability. The scope
of such projects is also broad, extending from collaboration across departments to
post-deployment monitoring and iteration. Because data science is a fast-changing field,
project management in this discipline must be highly collaborative, flexible, and agile.
References: https://iabac.org/blog/challenges-solutions-in-implementing-data-science-projects-in-industry