Phase1. (Team 11) Document
1. Problem Statement
Air pollution poses serious health and environmental risks. Accurate predictions of air quality
levels help authorities act in time to protect public health and limit harmful effects. This
project uses machine learning techniques to forecast air quality and to identify the factors
that contribute to it. The objectives are:
❖ Predict future air quality index (AQI) levels using historical environmental data.
❖ Identify key pollutants and features that influence AQI.
❖ Provide insights and visualizations to aid decision-making and public awareness.
4. Data Sources
Variables:
PM2.5, PM10, NO2, SO2, CO, temperature, humidity, wind speed, etc.
5. High-Level Methodology
Data Collection:
Download a public dataset from Kaggle or the UCI Machine Learning Repository, or retrieve readings via an API.
Data Cleaning:
Handle missing values, remove duplicates, standardize formats.
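The cleaning step could be sketched as follows. This is a minimal illustration with pandas, not the team's final pipeline: the sample values, the column names, the "n/a" placeholder, and the choice of time-based interpolation for missing readings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings; values and column names are made up for illustration.
raw = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04"],
    " PM2.5 ": [35.0, 40.0, 40.0, np.nan, 50.0],
    "NO2": ["20", "22", "22", "n/a", "26"],
})

def clean(df):
    out = df.copy()
    # Standardize formats: tidy column names, parse dates, coerce text to numbers.
    out.columns = [c.strip().lower().replace(".", "_") for c in out.columns]
    out["date"] = pd.to_datetime(out["date"])
    for col in out.columns.drop("date"):
        out[col] = pd.to_numeric(out[col], errors="coerce")  # "n/a" becomes NaN
    # Remove duplicate rows, then index by time.
    out = out.drop_duplicates().sort_values("date").set_index("date")
    # Handle missing values (assumed strategy): interpolate along the time axis,
    # then fill any gaps left at the start or end of the series.
    return out.interpolate(method="time").ffill().bfill()

cleaned = clean(raw)
print(cleaned)
```

Interpolation suits short gaps in slowly varying sensor data; for long outages, dropping the affected period may be the safer choice.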
EDA:
Use seaborn and matplotlib for correlation heatmaps and time series plots.
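As a rough sketch of the EDA step, the example below computes a correlation matrix with pandas and draws a heatmap and a time series plot with matplotlib alone (seaborn's `heatmap` would be a shorter alternative for the first panel). The data is synthetic and the column names are assumptions; real readings would replace them.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; the relationships between columns are invented.
rng = np.random.default_rng(0)
n = 200
dates = pd.date_range("2024-01-01", periods=n, freq="D")
pm25 = 40 + 10 * np.sin(np.arange(n) / 10) + rng.normal(0, 3, n)
df = pd.DataFrame({
    "pm2_5": pm25,
    "pm10": 1.5 * pm25 + rng.normal(0, 5, n),
    "wind_speed": 10 - 0.1 * pm25 + rng.normal(0, 1, n),
}, index=dates)

corr = df.corr()  # pairwise Pearson correlation between columns

fig, (ax_heat, ax_ts) = plt.subplots(1, 2, figsize=(11, 4))

# Left panel: correlation heatmap.
im = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Correlation heatmap")
fig.colorbar(im, ax=ax_heat)

# Right panel: time series of one pollutant.
df["pm2_5"].plot(ax=ax_ts, title="PM2.5 over time")
ax_ts.set_ylabel("PM2.5")
fig.tight_layout()
```

Calling `fig.savefig(...)` writes the figure to disk; in an interactive session, drop the `Agg` line and call `plt.show()` instead.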
Feature Engineering:
Add rolling averages, time lags, or weather indices.
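Rolling averages and time lags might be built along these lines. This is a sketch under assumptions: the column name `pm2_5`, the 3-day window, and the lag set are illustrative, and the helper `add_features` is hypothetical. Weather indices are not shown.

```python
import pandas as pd

# Hypothetical daily PM2.5 series, for illustration only.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame({"pm2_5": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}, index=dates)

def add_features(frame, col="pm2_5", window=3, lags=(1, 2)):
    out = frame.copy()
    # Shift by one step before averaging so each row's rolling mean
    # uses only earlier observations, never the value being predicted.
    out[f"{col}_roll{window}"] = out[col].shift(1).rolling(window).mean()
    # Lag features: the value observed `lag` steps earlier.
    for lag in lags:
        out[f"{col}_lag{lag}"] = out[col].shift(lag)
    # The first rows lack enough history and are dropped.
    return out.dropna()

features = add_features(df)
print(features)
```

Shifting before rolling is a deliberate choice: without it, the rolling mean would include the current day's reading and leak the target into the features.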
Model Building:
Random Forest, XGBoost, LSTM (for time series).
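To make the modelling step concrete, here is a minimal sketch using only the Random Forest option from the list, via scikit-learn's `RandomForestRegressor`. The data is synthetic, the feature names and the relationship between features and target are invented, and the 80/20 chronological split is an assumption. XGBoost and LSTM are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in features; names and ranges are assumptions.
feature_names = ["pm2_5_lag1", "temperature", "wind_speed"]
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.uniform(10, 150, n),  # previous day's PM2.5
    rng.uniform(0, 35, n),    # temperature
    rng.uniform(0, 12, n),    # wind speed
])
# Invented target: depends on lagged PM2.5 and wind speed, plus noise.
y = 0.8 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0, 5, n)

# Chronological split: train on the first 80%, test on the last 20%.
# Rows are assumed to be in time order, so they are not shuffled.
split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Impurity-based importances give a first look at which inputs drive predictions.
importances = dict(zip(feature_names, model.feature_importances_))
print(importances)
```

The feature importances also serve the objective of identifying which pollutants and features influence AQI, though impurity-based scores should be read as a rough ranking rather than a causal statement.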
Model Evaluation:
RMSE, MAE, R² score.
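The three metrics can be computed directly from their definitions, as in the sketch below. The helper name `regression_metrics` and the sample values are hypothetical; scikit-learn's metrics module provides equivalent functions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(err)))         # mean absolute error
    ss_res = float(np.sum(err ** 2))          # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                # undefined if y_true is constant
    return {"rmse": rmse, "mae": mae, "r2": r2}

# Made-up AQI values and predictions, for illustration only.
metrics = regression_metrics([50, 60, 70, 80], [52, 58, 73, 79])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model's error is dominated by a few bad forecasts.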
Deployment:
Optional: deploy with Streamlit or Flask for demo purposes.