Data Preprocessing-AIML Algorithm1

This document provides information about a course on Disruptive Technologies - I at the Bachelor of Engineering (Computer Science & Engineering) program. The course objectives include developing an understanding of artificial intelligence building blocks and data science/analytics. Students will learn about data processing, AIML algorithms, and disruptive technologies. The course outcomes are remembering characteristics of disruptive technologies and understanding AI/ML algorithms and applications. Students will experiment with data visualization and work with data through the data science process. They will also analyze and evaluate solutions using AI/ML and design solutions for selected domains.

Uploaded by

Manic Console

INSTITUTE - UIE

DEPARTMENT- ACADEMIC UNIT-1

Bachelor of Engineering (Computer Science & Engineering)
SUBJECT NAME:- Disruptive Technologies – I
SUBJECT CODE- 23ECH102
Prepared By: Dr. Prashant Upadhyaya

Data Preprocessing-AIML Algorithm
DISCOVER . LEARN . EMPOWER

Course Objectives
S. No. Objectives
1 To develop an understanding of the building blocks of AI.
2 To build awareness of Data Science/Analytics.
3 To provide knowledge about data processing.
4 To build familiarity with AI/ML algorithms.
5 To give brief knowledge about Disruptive Technologies.

Course Outcomes
CO Number | Title | Level
CO1 | Recognise the characteristics of disruptive technologies and understand building blocks of data science, artificial intelligence, and machine learning. | Remember
CO2 | Describe AI/ML algorithms and techniques to demonstrate their applications. | Understand
CO3 | Experiment with effective data visualizations, and explain how to work with data through the entire data science process. | Apply
CO4 | Analyse and evaluate solutions to address real-time problems using AI/ML for different applications. | Analyze and evaluate
CO5 | Design, formulate and integrate, as a team, a proposed solution for the selected domain. | Create

TASK OBJECTIVES
• Introduction
• Running Python
• Python Programming
• Data types
• Control flows
• Classes, functions, modules
• Hands-on Exercises
Course Goals
• To understand the basic structure and syntax of Python programming
language
• To learn how to run Python scripts on our research computing facility,
the Emerald Linux cluster
• To write your own simple Python scripts.
• To serve as the starting point for more advanced training on Python
coding
Unit-1 Data Pre-processing and AIML Algorithms Contact Hours:16
Experiment No 1 Explore data pre-processing packages and AIML algorithms.
Experiment No 2 Exploring Pandas and Numpy library for Data analysis.
Experiment No 3 Understanding the Data analysis/Visualization using AIML algorithms and Matplotlib.

Experiment No 4 Explore, transform and summarize input datasets for building Classification/Regression/Prediction models.

Unit-2 AIML Applications Contact Hours:16

Experiment No 1 Understand supervised learning to train and develop classifier models.
Experiment No 2 Build a classification model by using different machine learning algorithms.
Experiment No 3 Develop a prediction model based on linear/logistic regression.
Experiment No 4 Derive insights from images using pre-trained Vision API models for image classification.

Unit-3 AI Applications Contact Hours:16

Experiment No 1 Design an AI-enabled application for emotion/gesture detection.
Experiment No 2 Create an artificial intelligence powered Chat Bot to mimic human interactions for different application domains.

Experiment No 3 Apply appropriate machine learning model for accurate prediction of air quality index.
Experiment No 4 Develop an engineered solution to socially relevant problem(s) with technical report.

Theory
Python Libraries for Data Science
Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn

All these libraries are installed on the SCC.

Visualization libraries
• matplotlib
• Seaborn

and many more …

1. Python Libraries for Data Science
NumPy:
• introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
• many other Python libraries are built on NumPy

Link: http://www.numpy.org/

2. Python Libraries for Data Science
SciPy:
• collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
• part of SciPy Stack
• built on NumPy

Link: https://www.scipy.org/scipylib/
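As a minimal sketch of the kinds of routines listed above, the snippet below calls two SciPy functions, scipy.integrate.quad and scipy.optimize.minimize_scalar. The integrand and the quadratic being minimized are illustrative choices, not part of the course material.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integral of sin(x) from 0 to pi (exact value is 2)
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize the illustrative quadratic f(x) = (x - 3)^2 + 1
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(round(area, 6))      # close to 2.0
print(round(result.x, 4))  # close to 3.0
```

Both calls accept ordinary Python callables, which is why a NumPy function and a lambda can be passed directly.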

3. Python Libraries for Data Science
Pandas:
• adds data structures and tools designed to work with table-like data (Series and DataFrames, similar to vectors and data frames in R)
• provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
• allows handling missing data

Link: http://pandas.pydata.org/
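The points above can be sketched on a tiny table. The column names and values below are made up for illustration; the snippet fills a missing value with the column mean, then sorts and aggregates.

```python
import numpy as np
import pandas as pd

# Small illustrative table; the column names and values are invented
df = pd.DataFrame({
    "city": ["Delhi", "Pune", "Delhi", "Pune"],
    "aqi": [180.0, np.nan, 220.0, 95.0],
})

# Handle missing data: fill the gap with the mean of the known values
df["aqi"] = df["aqi"].fillna(df["aqi"].mean())

# Sorting and aggregation
sorted_df = df.sort_values("aqi", ascending=False)
mean_by_city = df.groupby("city")["aqi"].mean()

print(sorted_df)
print(mean_by_city)
```

Filling with the mean is only one possible strategy; dropping incomplete rows with dropna() is another common choice.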

4. Python Libraries for Data Science
SciKit-Learn:
• provides machine learning algorithms: classification, regression, clustering, model validation etc.
• built on NumPy, SciPy and matplotlib

Link: http://scikit-learn.org/
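A minimal classification sketch, assuming the Iris dataset bundled with scikit-learn and a logistic regression model (both chosen here only for illustration). It trains on one part of the data and validates on the rest.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled Iris dataset: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and measure accuracy on the held-out samples
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"test accuracy: {accuracy:.2f}")
```

Other scikit-learn estimators follow the same fit / score pattern, so a different classifier can be swapped in by changing the line that builds the model.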

5. Python Libraries for Data Science
matplotlib:
• Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats
• a set of functionalities similar to those of MATLAB
• line plots, scatter plots, bar charts, histograms, pie charts etc.
• relatively low-level; some effort needed to create advanced visualization

Link: https://matplotlib.org/

6. Python Libraries for Data Science
Seaborn:
• based on matplotlib
• provides high-level interface for drawing attractive statistical graphics
• similar (in style) to the popular ggplot2 library in R

Link: https://seaborn.pydata.org/

Introduction to Matplotlib

• Matplotlib is a visualization library in Python for 2D plots of arrays.
• It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in the year 2002.
• One of the greatest benefits of visualization is that it presents huge amounts of data in an easily digestible visual form.
• Matplotlib offers several plot types, such as line, bar, scatter and histogram.
Theory
1. Importing matplotlib :
• from matplotlib import pyplot as plt

• import matplotlib.pyplot as plt

2. Basic plots in Matplotlib :

• Matplotlib comes with a wide variety of plots.
• Plots help to understand trends and patterns, and to find correlations.
• They are typically instruments for reasoning about quantitative information.
A. Line plot :
# importing matplotlib module
from matplotlib import pyplot as plt

# x-axis values
x = [5, 2, 9, 4, 7]

# y-axis values
y = [10, 5, 8, 4, 2]

# plot y against x as a line
plt.plot(x, y)

# show the plot
plt.show()
B. Bar plot :
# importing matplotlib module
from matplotlib import pyplot as plt

# x-axis values
x = [5, 2, 9, 4, 7]

# y-axis values
y = [10, 5, 8, 4, 2]

# plot the bars
plt.bar(x, y)

# show the plot
plt.show()
3. Functional Approach

Figure 1. Line graph through Matplotlib

Figure 2. Line graph through Matplotlib with labels
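The original figures are not reproduced here. As a rough sketch of what a labelled line graph in the functional (pyplot) style might look like, the snippet below uses made-up data, an illustrative output file name, and the non-interactive "Agg" backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so draw off-screen
from matplotlib import pyplot as plt

# Illustrative data: y = x squared
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Functional approach: every call acts on the current figure and axes
plt.plot(x, y, marker="o")
plt.title("y = x squared")
plt.xlabel("x")
plt.ylabel("y")

# Write the figure to a file (the file name is an arbitrary choice)
plt.savefig("line_plot_with_labels.png")
```

In an interactive session, the matplotlib.use line can be dropped and plt.savefig replaced with plt.show(), as in the earlier line and bar examples.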
INTRODUCTION - Numpy

• NumPy, SciPy and Matplotlib provide MATLAB-like functionality in Python.
• NumPy features:
  • Typed multi-dimensional arrays (matrices)
  • Fast numerical computations (matrix math)
  • High-level math functions

Need of Numpy
• Pure Python performs numerical computations slowly.
• Example: a 1000 x 1000 matrix multiply (exact timings depend on the machine)
  • A Python triple loop takes > 10 min.
  • NumPy takes ~0.03 seconds.
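The comparison above can be tried at a smaller scale. This sketch uses a 100 x 100 matrix rather than 1000 x 1000 so that the loop version finishes quickly; the helper name matmul_loops is hypothetical, and the measured times will vary by machine.

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Multiply two square matrices (lists of lists) with a pure-Python triple loop."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c

n = 100  # kept small so the loop version finishes quickly
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

# Time the pure-Python triple loop
start = time.perf_counter()
C_loops = matmul_loops(A.tolist(), B.tolist())
loop_seconds = time.perf_counter() - start

# Time the NumPy matrix product
start = time.perf_counter()
C_numpy = A @ B
numpy_seconds = time.perf_counter() - start

print(f"loops: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.6f} s")
print(np.allclose(C_loops, C_numpy))  # both methods give the same result
```

The loop performs n^3 multiply-add steps in the interpreter, so going from n = 100 to n = 1000 multiplies its work by 1000.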

NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting

Arrays
Structured lists of numbers:
• Vectors
• Matrices
• Images
• Tensors
• ConvNets

Figure 1. Sample Image[1]

Figure 2. Sample Image[1]

Figure 3. Sample Image[1]

A. Arrays, Basic Properties
import numpy as np

# 2 x 3 array of 32-bit floats
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(a.ndim, a.shape, a.dtype)  # 2 (2, 3) float32

1. Arrays can have any number of dimensions, including zero (a scalar).

2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64
3. Arrays are dense. Each element of the array exists and has the same type.
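A short sketch of these three properties; the variable names are illustrative.

```python
import numpy as np

scalar = np.array(7)                          # zero-dimensional array (a scalar)
vector = np.array([1, 2, 3], dtype=np.uint8)  # explicitly typed array
mixed = np.array([1, 2.5, 3])                 # ints are upcast so all elements share one type

print(scalar.ndim, scalar.shape)  # 0 ()
print(vector.dtype)               # uint8
print(mixed.dtype)                # float64
```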

B. Arrays Creation
• np.ones, np.zeros
• np.arange
• np.concatenate
• ndarray.astype (a method of the array, e.g. a.astype(np.int64))
• np.zeros_like, np.ones_like
• np.random.random
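Each of the functions listed above can be sketched in one line; the shapes and values are arbitrary examples.

```python
import numpy as np

zeros = np.zeros((2, 3))                         # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))                           # 2 x 3 array filled with 1.0
steps = np.arange(0, 10, 2)                      # evenly spaced integers: 0, 2, 4, 6, 8
stacked = np.concatenate([zeros, ones], axis=0)  # join along rows, shape (4, 3)
as_int = stacked.astype(np.int64)                # astype returns a converted copy
like = np.zeros_like(steps)                      # zeros with the shape and dtype of steps
noise = np.random.random((2, 2))                 # uniform samples in [0, 1)

print(stacked.shape, as_int.dtype, like.shape)
```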

Arrays Creation