
DAC Phase2

Uploaded by

poovizhi27be


AIR Q ASSESSMENT TN

Phase 2: Innovation
Project Definition:
• The project involves analyzing air quality data to assess the
suitability of air for specific purposes, such as breathing.
• The objective is to identify potential issues or deviations from
regulatory standards and to determine air quality based on
various parameters.
• The project involves defining analysis objectives, collecting air
quality data, designing relevant visualizations, and building a
predictive model.
Dataset: https://tn.data.gov.in/resource/location-wise-daily-ambient-air-quality-tamil-nadu-year-2014

Data Collection and Preprocessing:

• Use an air quality dataset. Such a dataset may include air pollutants
such as PM2.5, PM10, CO, SO2, and NO2, along with temperature, humidity, etc.
• Preprocess the data:
Handle missing values and outliers, and ensure the data is in a format
suitable for machine learning.
Example:
Check for missing values using isnull() and notnull()
Fill missing values using fillna() and replace()
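
The checks and fills named above can be sketched as follows. This is a minimal example on a made-up sample: the column names, the values, and the use of -1 as a "no reading" marker are assumptions for illustration, not properties of the Tamil Nadu dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the column names are illustrative, not taken from the dataset
sample = pd.DataFrame({
    "SO2": [11.0, np.nan, 15.0, 13.0],
    "NO2": [17.0, 20.0, np.nan, 23.0],
})

# Count missing and present values per column
missing_counts = sample.isnull().sum()
present_counts = sample.notnull().sum()

# Fill missing values with each column's mean
filled = sample.fillna(sample.mean())

# replace() swaps a sentinel value (here -1, an assumed marker for "no reading") with NaN
with_sentinel = pd.DataFrame({"SO2": [11.0, -1.0, 15.0]})
cleaned = with_sentinel.replace(-1.0, np.nan)

print(missing_counts)
print(filled)
```

Filling with the column mean is only one option; a median or a forward fill may suit skewed or time-ordered readings better.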

Feature Engineering:
Create relevant air quality features or transform existing ones.
For example, we might calculate a daily average or identify seasonal trends.
import pandas as pd

# Sample air quality data (replace with your own data)
# Path as given in the source; the file name is missing, so point this at your own CSV
csv_path = r"c:\users\ELCOT\Downloads"
air_quality_data = pd.read_csv(csv_path)

# Calculate the average of every numeric column
average_air_quality = air_quality_data.mean(numeric_only=True)

# Print the result
print("Average Air Quality:")
print(average_air_quality)
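
A daily average and a simple seasonal view could be computed along these lines. The sketch uses made-up readings, and the column names "Sampling Date" and "SO2" are assumptions; adjust them to the columns in your own file.

```python
import pandas as pd

# Hypothetical readings; "Sampling Date" and "SO2" are assumed column names
readings = pd.DataFrame({
    "Sampling Date": ["2014-01-01", "2014-01-01", "2014-01-02", "2014-07-01"],
    "SO2": [10.0, 14.0, 12.0, 20.0],
})
readings["Sampling Date"] = pd.to_datetime(readings["Sampling Date"])

# Daily average: mean of all readings taken on the same date
daily_average = readings.groupby("Sampling Date")["SO2"].mean()

# Monthly average as a simple view of seasonal variation
monthly_average = readings.groupby(readings["Sampling Date"].dt.month)["SO2"].mean()

print(daily_average)
print(monthly_average)
```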

Analytics in Action:
Analytical data in Cognos is typically structured and organized to support
data analysis and decision-making. It can come from various sources and be
transformed and modeled to meet the specific needs of business users. The key
aspects of analytical data in Cognos are:
• Data Sources:
Analytical data can originate from various sources, including databases,
data warehouses, spreadsheets, and external data feeds. Cognos can
connect to these sources to access and analyze data.
• Data Modeling:
Data modeling in Cognos involves defining the structure and
relationships within the data. It helps users understand how different data
elements are related and organized, making it easier to query and analyze
the data.
• ETL (Extract, Transform, Load):
Data extraction, transformation, and loading are crucial processes to
prepare data for analysis. Cognos provides tools and capabilities to
extract data from source systems, transform it to suit analytical needs,
and load it into its data model.
• Data Exploration:
Cognos offers features for data exploration, allowing users to navigate,
filter, and interact with the data to identify patterns, trends, and
anomalies.
• Data Visualization:
Cognos enables users to create various data visualizations, such as charts,
graphs, and dashboards, to represent data in a visually meaningful way.
• Reporting:
Cognos allows users to create reports based on the analytical data. These
reports can be customized, scheduled, and distributed to relevant
stakeholders.
• Ad-Hoc Analysis:
Business users can perform ad-hoc analysis by creating their own queries
and reports, enabling them to explore data and generate insights without
relying on IT or data analysts.
• Data Security and Governance:
Cognos provides tools for managing data security and governance. This
ensures that only authorized users have access to sensitive data, and data
remains compliant with organizational policies and regulations.

• Data Integration:
Analytical data in Cognos may involve the integration of data from
various sources, making it possible to analyze and report on a
comprehensive set of data.
• Data Performance:
Cognos optimizes data retrieval and analysis performance, ensuring that
analytical processes run efficiently, even with large datasets.

Visualization:

Data visualization is the graphical representation of information
and data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data. It also gives employees and
business owners a clear way to present data to non-technical
audiences. In the world of Big Data, data visualization tools and
technologies are essential for analyzing massive amounts of
information and making data-driven decisions.

Histogram:
A histogram is a set of adjacent rectangles whose bases lie on the
intervals between class boundaries. Each rectangle represents one
class of the data. For classes of equal width, the height of each
rectangle is proportional to the frequency of its class.
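import matplotlib.pyplot as plt
import pandas as pd

# Sample air quality data (replace with your own data)
# Path as given in the source; the file name is missing, so point this at your own CSV
csv_path = r"c:\users\ELCOT\Downloads"
air_quality_data = pd.read_csv(csv_path)

# Use the first numeric column as Y (an assumption; choose the pollutant you need)
y = air_quality_data.select_dtypes(include="number").iloc[:, 0].dropna()

# Create a histogram
plt.hist(y, bins=10, edgecolor='black', alpha=0.7)
plt.title("Histogram of Y")
plt.xlabel("Y")
plt.ylabel("Frequency")
plt.grid(True)
# Display the histogram
plt.show()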
Pie Charts:

Pie charts are used to show the distribution of qualitative
(categorical) data. They show the frequency or relative frequency of
values in the data. Frequency is the number of times a value
appears in the data. Relative frequency is its percentage of the
total. Each category is represented by a slice of the 'pie' (circle). The
size of each slice represents the frequency of values from that
category in the data.
[Figure: pie chart by City; legend text as extracted: Total Rural Urban And Rural Urban]
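
Frequency and relative frequency, and the pie chart built from them, can be sketched as follows. The category labels are made up for illustration and are not taken from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column; the labels are illustrative
location_type = pd.Series(
    ["Residential", "Industrial", "Residential", "Residential", "Industrial", "Sensitive"]
)

# Frequency: number of times each category appears
frequency = location_type.value_counts()

# Relative frequency: share of the total, as a percentage
relative_frequency = frequency / frequency.sum() * 100

# One slice per category, sized by frequency; autopct prints the relative frequency
fig, ax = plt.subplots()
ax.pie(frequency, labels=frequency.index, autopct="%1.1f%%")
ax.set_title("Readings by location type")
# Call plt.show() to display the chart in an interactive session
```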

Analyzing and Handling the Dataset:

▪ Load the Dataset:
• Import the necessary libraries (e.g., Pandas, NumPy) in
Python.
• Load your dataset into a data structure (e.g., a Pandas
DataFrame) and inspect it.
import pandas as pd
data = pd.read_csv('your_dataset.csv')
print(data.head())
data.info() # prints its own summary
print(data.describe())

▪ Missing Values:
Identify and handle missing values in your dataset. You can
drop rows with missing values, impute values, or use other
strategies depending on the context.
print(data.isnull().sum()) # To count missing values per column
data = data.dropna() # To remove rows with missing values
# data = data.fillna(0) # Alternatively, fill missing values with a specific value (0 here)

▪ Duplicate Data:
Check for and remove duplicate rows if necessary.
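
For the duplicate check, a minimal sketch using pandas' duplicated() and drop_duplicates() is shown here. The sample rows and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with one repeated row; column names are illustrative
sample = pd.DataFrame({
    "Station": ["A", "A", "B"],
    "SO2": [11.0, 11.0, 15.0],
})

# duplicated() marks every repeat of an earlier row as True
duplicate_count = sample.duplicated().sum()

# drop_duplicates() keeps the first occurrence of each row
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print("Duplicate rows:", duplicate_count)
print(deduplicated)
```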
▪ Outliers:
Identify and handle outliers in your data. You can use statistical
methods or visualization techniques to detect outliers.
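# Example using Z-score for outlier detection
# (assumes missing values have already been handled)
import numpy as np
from scipy import stats
numeric_data = data.select_dtypes(include='number')
z_scores = np.abs(stats.zscore(numeric_data))
data_no_outliers = data[(z_scores < 3).all(axis=1)]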
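
As another statistical method, the interquartile range (IQR) rule can be used in place of the Z-score. This is a different technique from the one above, sketched here on made-up readings; the 1.5 multiplier is a common convention, not a requirement.

```python
import pandas as pd

# Hypothetical readings with one extreme value
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 95.0], name="PM10")

# Interquartile range: spread of the middle 50% of the data
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Common convention: flag points more than 1.5 * IQR outside the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

values_no_outliers = values[~is_outlier]
print(values[is_outlier])
```

The IQR rule relies on quartiles rather than the mean and standard deviation, so a single extreme reading has less influence on the thresholds.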

▪ Data Transformation:
Transform data if needed, such as converting data types,
encoding categorical variables, or scaling numerical features.
# Example: Encoding categorical variables
data = pd.get_dummies(data, columns=['categorical_column'])
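
A type conversion followed by one-hot encoding might look like this on a small sample. The column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative
sample = pd.DataFrame({
    "Type of Location": ["Residential", "Industrial", "Residential"],
    "SO2": ["11", "15", "13"],  # numbers stored as text
})

# Convert a text column to a numeric type; unparseable entries become NaN
sample["SO2"] = pd.to_numeric(sample["SO2"], errors="coerce")

# One-hot encode the categorical column into indicator columns
encoded = pd.get_dummies(sample, columns=["Type of Location"])

print(encoded.columns.tolist())
```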

▪ Feature Engineering:
Create new features from existing ones to improve model
performance.
data['new_feature'] = data['feature1'] + data['feature2']

▪ Data Splitting:
Split your dataset into training, validation, and test sets for
model development and evaluation.
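from sklearn.model_selection import train_test_split
# 'target' is a placeholder name for the column to predict
X = data.drop(columns=['target'])
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)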

▪ Scaling and Normalization:

Scale or normalize numerical features to ensure that they have
a similar scale and distribution.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
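
The splitting and scaling steps above can be combined into a three-way split. The sketch below uses randomly generated data, and the 60/20/20 proportions are an assumption; train_test_split is simply called twice to obtain a validation set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 3 features, binary target
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# First hold out 20% as the test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remaining 80% into 60% training and 20% validation (0.25 * 0.8 = 0.2)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42
)

# Fit the scaler on the training set only, then reuse it for the other sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_val.shape, X_test.shape)
```

Fitting the scaler on the training set alone keeps information from the validation and test sets out of the model.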

▪ Data Analysis:
Perform the specific analysis you need, such as building
machine learning models, statistical analysis, or hypothesis
testing.
Conclusion:
In this phase, we have defined the problem and objectives and outlined
the design thinking process for the project. The ultimate goal is to
provide meaningful insights that can inform policies and initiatives
aimed at improving air quality in Tamil Nadu.
The entire analysis process, including data sources, methods, and
assumptions, should be documented. This documentation is crucial for
transparency and reproducibility.
