DAC Phase2
DAC Phase2
Phase 2: innovation
Project Definition:
• The project involves analyzing air quality data to assess the
suitability of air for specific purposes, such as breathing.
• The objective is to identify potential issues or deviations from
regulatory standards and determine air probability based on
various parameters.
• This project involves analysis objectives , collecting air quality
data, designing relevant visualizations, building a predictive
model.
Dataset: https://tn.data.gov.in/resource/location-wise-daily-ambient-air-
quality-tamil-nadu-year-2014
Features Engineering:
Create relevant features of air quality or transform existing ones.
For Example: we might calculate daily average or identify seasonal trends.
import numpy as np
Analytics in Action:
This data is often structured and organized to support data analysis and
decision-making. Analytical data in Cognos can come from various sources and
be transformed and modeled to meet the specific needs of business users. Here
are key aspects related to analytical data in Cognos
• Data Sources:
Analytical data can originate from various sources, including databases,
data warehouses, spreadsheets, and external data feeds. Cognos can
connect to these sources to access and analyze data.
• Data Modeling:
Data modeling in Cognos involves defining the structure and
relationships within the data. It helps users understand how different data
elements are related and organized, making it easier to query and analyze
the data.
• ETL (Extract, Transform, Load):
Data extraction, transformation, and loading are crucial processes to
prepare data for analysis. Cognos provides tools and capabilities to
extract data from source systems, transform it to suit analytical needs,
and load it into its data model.
• Data Exploration:
Cognos offers features for data exploration, allowing users to navigate,
filter, and interact with the data to identify patterns, trends, and
anomalies.
• Data Visualization:
Cognos enables users to create various data visualizations, such as charts,
graphs, and dashboards, to represent data in a visually meaningful way.
• Reporting:
Cognos allows users to create reports based on the analytical data. These
reports can be customized, scheduled, and distributed to relevant
stakeholders.
• Ad-Hoc Analysis:
Business users can perform ad-hoc analysis by creating their own queries
and reports, enabling them to explore data and generate insights without
relying on IT or data analysts.
• Data Security and Governance:
Cognos provides tools for managing data security and governance. This
ensures that only authorized users have access to sensitive data, and data
remains compliant with organizational policies and regulations.
• Data Integration:
Analytical data in Cognos may involve the integration of data from
various sources, making it possible to analyze and report on a
comprehensive set of data.
• Data Performance:
Cognos optimizes data retrieval and analysis performance, ensuring that
analytical processes run efficiently, even with large datasets.
Visualization:
Histogram:
A histogram can be defined as a set of rectangles with bases
along with the intervals between class boundaries. Each rectangle bar
depicts some sort of data and all the rectangles are adjacent. The
heights of rectangles are proportional to corresponding frequencies of
similar as well as for different classes.
import matplotlib.pyplot as plt
import numpy as np
▪ Missing Values:
Identify and handle missing values in your dataset. You can
drop rows with missing values, impute values, or use other
strategies depending on the context.
print(data.head())
print(data.info())
print(data.describe())
▪ Duplicate Data:
Check for and remove duplicate rows if necessary.
data.dropna() # To remove rows with missing values
data.fillna(value) # To fill missing values with a specific value
▪ Outliers:
Identify and handle outliers in your data. You can use statistical
methods or visualization techniques to detect outliers.
# Example using Z-score for outlier detection
from scipy import stats
z_scores = stats.zscore(data)
data_no_outliers = data[(z_scores < 3).all(axis=1)]
▪ Data Transformation:
Transform data if needed, such as converting data types,
encoding categorical variables, or scaling numerical features.
# Example: Encoding categorical variables
data = pd.get_dummies(data, columns=['categorical_column'])
▪ Feature Engineering:
Create new features from existing ones to improve model
performance.
data['new_feature'] = data['feature1'] + data['feature2']
▪ Data Splitting:
Split your dataset into training, validation, and test sets for
model development and evaluation.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
▪ Data Analysis:
Perform the specific analysis you need, such as building
machine learning models, statistical analysis, or hypothesis
testing.
Conclusion:
In this phase, we have defined the problem, objectives, and outlined
the design thinking process for the project. The ultimate goal is to
provide meaningful insights that can inform policies and initiatives
aimed at improving the lives of Air Q Assessment in Tamil Nadu.
Document your entire analysis process, including data sources,
methods, and assumptions. This documentation is crucial for
transparency and reproducibility.