Data Preprocessing-AIML Algorithm1
Course Outcomes
CO Number    Title    Level
TASK OBJECTIVES
• Introduction
• Running Python
• Python Programming
• Data types
• Control flows
• Classes, functions, modules
• Hands-on Exercises
Course Goals
• To understand the basic structure and syntax of Python programming
language
• To learn how to run Python scripts on our research computing facility,
the Emerald Linux cluster
• To write your own simple Python scripts.
• To serve as the starting point for more advanced training on Python
coding
Unit-1 Data Pre-processing and AIML Algorithms Contact Hours:16
Experiment No 1 Explore data pre-processing packages and AIML algorithms.
Experiment No 2 Explore the Pandas and NumPy libraries for data analysis.
Experiment No 3 Understand data analysis and visualization using AIML algorithms and Matplotlib.
Experiment No 4 Explore, transform and summarize input datasets for building Classification/Regression/Prediction models.
Experiment No 5 Apply an appropriate machine learning model for accurate prediction of the air quality index.
Experiment No 6 Develop an engineered solution to socially relevant problem(s) with a technical report.
Theory
Python Libraries for Data Science
Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn
Visualization libraries
• matplotlib
• Seaborn
All these libraries are installed on the SCC.
Link: http://www.numpy.org/
2. Python Libraries for Data Science
SciPy:
collection of algorithms for linear algebra, differential equations, numerical
integration, optimization, statistics and more
built on NumPy
Link: https://www.scipy.org/scipylib/
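A minimal sketch of three of the areas named above (numerical integration, optimization, linear algebra). The integrand, the quadratic being minimized, and the small linear system are illustrative choices, not taken from the course material.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: area under sin(x) from 0 to pi (the exact value is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize a simple one-variable quadratic with its minimum at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print("integral:", area)
print("minimizer:", result.x)
print("solution:", x)
```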
3. Python Libraries for Data Science
Pandas:
adds data structures (Series and DataFrame) and tools designed to work with
table-like data (similar to data frames in R)
Link: http://pandas.pydata.org/
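As a rough illustration of the two main Pandas structures, the sketch below builds a Series and a DataFrame and summarizes the table. The column names (`city`, `aqi`) and all values are hypothetical, chosen only to echo the air-quality experiment in this unit.

```python
import pandas as pd

# A Series is a labelled one-dimensional array.
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame is a table whose columns are Series (hypothetical readings).
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "aqi": [80, 120, 95, 110],
})

# Summarize: mean air quality index per city.
mean_by_city = df.groupby("city")["aqi"].mean()

# Filter: keep only the rows with an index above 100.
high = df[df["aqi"] > 100]

print(temps)
print(mean_by_city)
print(high)
```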
4. Python Libraries for Data Science
SciKit-Learn:
provides machine learning algorithms: classification, regression, clustering,
model validation etc.
Link: http://scikit-learn.org/
5. Python Libraries for Data Science
matplotlib:
Python 2D plotting library which produces publication-quality figures in a
variety of hardcopy formats
Link: https://matplotlib.org/
6. Python Libraries for Data Science
Seaborn:
based on matplotlib
Link: https://seaborn.pydata.org/
Introduction to Matplotlib
Matplotlib is a Python visualization library for 2D plots of arrays.
It is a multi-platform data visualization library built on
NumPy arrays and designed to work with the broader SciPy
stack. It was introduced by John Hunter in 2002.
One of the greatest benefits of visualization is that it presents
huge amounts of data in an easily digestible visual form.
Matplotlib supports several plot types, such as line, bar, scatter,
and histogram.
Theory
1. Importing matplotlib:
• from matplotlib import pyplot as plt
Or
• import matplotlib.pyplot as plt
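Once pyplot is imported, a basic line plot might look like the sketch below. The sine data, the axis labels, and the output file name `sine_plot.png` are illustrative assumptions. In an interactive session, `plt.show()` could be used in place of saving to a file.

```python
import numpy as np
from matplotlib import pyplot as plt

# Sample data: 100 points of sin(x) on [0, 2*pi].
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with one set of axes and draw a line plot.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot of a NumPy array")
ax.legend()

# Write the figure to disk (hypothetical file name).
fig.savefig("sine_plot.png")
```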
Need for NumPy
Pure Python performs numerical computations slowly.
Example: multiplying two 1000 x 1000 matrices
• A Python triple loop takes > 10 min.
• NumPy takes ~0.03 seconds.
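The comparison can be tried on a smaller scale with the sketch below. It uses 100 x 100 matrices (an assumption made so the loop version finishes quickly), and the measured times will vary by machine, so no specific timings are claimed here.

```python
import time

import numpy as np


def matmul_loops(a, b):
    """Multiply two square matrices, given as lists of lists, with a triple loop."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]
            out[i][j] = total
    return out


n = 100  # kept small on purpose; the slide's figures refer to n = 1000
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

start = time.perf_counter()
slow = matmul_loops(a.tolist(), b.tolist())
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b  # NumPy matrix multiplication
numpy_seconds = time.perf_counter() - start

print(f"triple loop: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.6f} s")
```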
NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting
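A short sketch touching each of the five topics above on one small array. The values are arbitrary.

```python
import numpy as np

# 1. Arrays: a 1-D array holding 0..5.
a = np.arange(6)

# 2. Shaping and transposition: view it as 2 rows x 3 columns, then transpose.
m = a.reshape(2, 3)      # [[0 1 2], [3 4 5]]
mt = m.T                 # shape (3, 2)

# 3. Mathematical operations: elementwise arithmetic and reductions.
doubled = m * 2
col_sums = m.sum(axis=0)  # [3 5 7]

# 4. Indexing and slicing: first row and last column.
first_row = m[0]          # [0 1 2]
last_col = m[:, -1]       # [2 5]

# 5. Broadcasting: the length-3 row is added to every row of m.
shifted = m + np.array([10, 20, 30])  # [[10 21 32], [13 24 35]]

print(mt.shape, col_sums, first_row, last_col)
print(shifted)
```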
Arrays
Structured lists of numbers.
• Vectors
• Matrices
• Images
• Tensors
• ConvNets
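Each item in the list can be represented as a NumPy array that differs mainly in its number of dimensions. The shapes below are illustrative assumptions. In particular, the (filters, height, width, channels) layout for the ConvNet weights is just one possible convention.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])              # 1-D: shape (3,)
matrix = np.array([[1, 2, 3], [4, 5, 6]])       # 2-D: shape (2, 3)
image = np.zeros((4, 6, 3), dtype=np.uint8)     # 3-D: height x width x RGB channels
tensor = np.zeros((2, 4, 6, 3))                 # 4-D: a batch of two such images
conv_weights = np.zeros((8, 3, 3, 3))           # hypothetical bank of 8 filters, 3x3x3 each

for name, arr in [("vector", vector), ("matrix", matrix), ("image", image),
                  ("tensor", tensor), ("conv_weights", conv_weights)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}, dtype={arr.dtype}")
```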
Array Creation
• np.ones, np.zeros
• np.arange
• np.concatenate
• np.astype
• np.zeros_like, np.ones_like
• np.random.random
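The creation routines listed above can be exercised as in this sketch. The shapes and values are arbitrary, and `astype` is called here in its usual form as an array method.

```python
import numpy as np

ones = np.ones((2, 3))                  # 2 x 3 array filled with 1.0
zeros = np.zeros((2, 3))                # 2 x 3 array filled with 0.0
steps = np.arange(0, 10, 2)             # [0 2 4 6 8]

# Join arrays along an existing axis (here: stack rows).
stacked = np.concatenate([ones, zeros], axis=0)   # shape (4, 3)

# Convert the element type; astype returns a new array.
as_int = ones.astype(np.int32)

# Create arrays with the same shape and dtype as an existing one.
zeros_like_steps = np.zeros_like(steps)
ones_like_zeros = np.ones_like(zeros)

# Uniform random samples in the half-open interval [0, 1).
noise = np.random.random((2, 3))

print(stacked.shape, as_int.dtype, zeros_like_steps, noise.shape)
```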
Machine Learning Algorithms
Machine learning algorithms are programs that learn hidden patterns
from data, predict outputs, and improve their performance with
experience. Different algorithms suit different tasks: simple linear
regression can be used for prediction problems such as stock market
prediction, and the KNN algorithm can be used for classification problems.
Machine learning algorithms can be broadly classified into three types:
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Reinforcement Learning Algorithms
Supervised Learning Algorithm
Supervised learning is a type of machine learning in which the machine
needs external supervision to learn. Supervised learning models are
trained on a labelled dataset. Once training is complete, the model is
tested on sample test data to check whether it predicts the correct
output.
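The train-then-test workflow described above might be sketched with scikit-learn's KNN classifier as follows. The two-cluster dataset is synthetic and hypothetical, and the choices of 3 neighbours and a 25% test split are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic labelled data: two well-separated 2-D clusters (hypothetical).
rng = np.random.default_rng(0)
class0 = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
class1 = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)   # the labels act as the "supervision"

# Hold out a quarter of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train on the labelled training set.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Check the predictions against the held-out labels.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```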
Unsupervised Learning Algorithm
Unsupervised learning is a type of machine learning in which the machine
does not need any external supervision to learn from the data. Unsupervised
models are trained on an unlabelled dataset that is neither classified nor
categorized, and the algorithm must act on that data without any
supervision. The model has no predefined output; instead, it tries to find
useful insights in large amounts of data.
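As one possible illustration, the sketch below clusters unlabelled points with scikit-learn's KMeans. KMeans is an illustrative choice that the slides do not name, and the two-cluster data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data: two well-separated 2-D groups (hypothetical); no y is provided.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Ask for two clusters; the algorithm discovers the grouping by itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
print("cluster centres:\n", kmeans.cluster_centers_)
```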
Reinforcement Learning Algorithm
In reinforcement learning, an agent interacts with its environment by
producing actions and learns from feedback. The feedback is given to the
agent in the form of rewards: for each good action it receives a positive
reward, and for each bad action it receives a negative reward. No
supervision is provided to the agent. Q-Learning is one algorithm used in
reinforcement learning.
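A minimal tabular Q-learning sketch, assuming a hypothetical five-cell corridor in which the agent starts at the left end and is rewarded for reaching the right end. The environment, the reward values (+1 at the goal, -0.1 per other step), and the hyperparameters are all illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 5, 2          # corridor cells 0..4; actions: 0 = left, 1 = right
goal = n_states - 1
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)
q_table = np.zeros((n_states, n_actions))


def step(state, action):
    """Move one cell left or right; return (next_state, reward, done)."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), goal)
    reward = 1.0 if next_state == goal else -0.1
    return next_state, reward, next_state == goal


for episode in range(500):
    state = 0
    for _ in range(100):            # cap on episode length
        # Epsilon-greedy: mostly exploit the table, sometimes explore at random.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate towards reward + discounted best next value.
        target = reward if done else reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
        if done:
            break

# Greedy action for each non-goal cell after training.
greedy_policy = np.argmax(q_table[:goal], axis=1)
print(greedy_policy)
```

In this setup the learned greedy policy is expected to choose "right" in every non-goal cell, since that is the shortest path to the positive reward.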
Learning Outcomes
On completion of the course students will be able to understand
Summary
Python is a popular, high-level, general-purpose programming language.
Python 3, the current version, is used in web development, machine
learning applications, and other cutting-edge technology across the
software industry. It is well suited to beginners as well as to
programmers experienced in other languages such as C++ and Java.
THANK YOU
For queries
Email: [email protected]