
Data Analytics Using Python

The document provides an overview of Python's applications in data analytics and artificial intelligence, highlighting its ease of use and versatility for tasks ranging from data preprocessing to machine learning and natural language processing. It discusses various Python libraries such as NumPy, Pandas, and Scikit-learn, which facilitate data manipulation, analysis, and visualization. Additionally, it outlines specific projects and applications that can be developed using Python in fields like healthcare, finance, and social media.

Uploaded by aditi31.kapil
Copyright © All Rights Reserved

Data Analytics Using Python
ELC Activity
Thapar Institute of Engineering and Technology
By: Dr. Aditi Sharma
Assistant Professor
Python

• A high-level programming language, as well as a scripting language.
• Python is easy to learn because of its simple syntax.
• It can be used for simple tasks as well as complex tasks such as machine learning.
• Different data types are available: primitive (int, float, bool), string, list, tuple, set, and dictionary.
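A minimal sketch of the data types listed above. The variable names and values are illustrative only; Python's own names for the "primitive" types are int, float, and bool.

```python
# Primitive (scalar) types
count = 42            # int
price = 19.99         # float
is_valid = True       # bool

# Built-in collection types
name = "Python"                        # str: immutable text
scores = [88, 92, 75]                  # list: ordered, mutable
point = (3, 4)                         # tuple: ordered, immutable
tags = {"data", "ai", "data"}          # set: unordered, duplicates removed
person = {"name": "Alice", "age": 25}  # dict: key -> value mapping

scores.append(60)                      # lists can grow in place
print(type(count).__name__, type(price).__name__, type(is_valid).__name__)
print(len(tags))                       # the duplicate "data" is stored once
print(person["age"])
```

Lists and dictionaries are mutable, while strings and tuples are not; that difference is usually what decides between a list and a tuple.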
Applications of Python for AI
Data Preprocessing: Python libraries like Pandas and NumPy are widely used for cleaning,
transforming, and preprocessing raw data into a suitable format for machine learning models.

Machine Learning Libraries: Python offers powerful machine learning libraries such as
scikit-learn, TensorFlow, and PyTorch. Scikit-learn provides simple and efficient tools for data
mining and data analysis, while TensorFlow and PyTorch are deep learning frameworks that
allow users to build and train complex neural network models.
Natural Language Processing (NLP): Python's NLTK (Natural Language Toolkit) and spaCy
libraries are extensively used for processing and analyzing human language data. These
libraries are crucial for applications like sentiment analysis, language translation, and
chatbots.

Computer Vision: Libraries like OpenCV and Dlib are widely used for computer vision tasks
such as image and video analysis, facial recognition, object detection, and image segmentation.
Reinforcement Learning: Python is often used in reinforcement learning
applications, and libraries like OpenAI Gym (now maintained as Gymnasium) provide
environments for developing and testing reinforcement learning algorithms.

Big Data Processing: Python can be integrated with big data processing
frameworks such as Apache Hadoop and Apache Spark for large-scale
machine learning tasks on big datasets.

Web Development and APIs: Python frameworks like Flask and Django
are used to deploy machine learning models as web applications or APIs,
allowing easy integration of machine learning functionalities into web
services.
Automated Machine Learning (AutoML): Python has several AutoML libraries
like TPOT and Auto-sklearn that automate the process of selecting the best machine
learning model and hyperparameters for a given dataset, making it easier for non-
experts to work on machine learning projects.

Data Visualization: Libraries like Matplotlib, Seaborn, and Plotly enable data
visualization, helping data scientists and researchers to understand complex
patterns and relationships in data, which is crucial for feature selection and model
evaluation.

Predictive Analytics: Python is used for building predictive models in various
domains such as finance, healthcare, marketing, and sales, helping businesses make
data-driven decisions.
Python Libraries

• NumPy
• Pandas
• SciPy
• Scikit-learn
• Matplotlib
• Seaborn
NumPy

NumPy is a powerful Python library for numerical computing.
It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions that operate on these arrays.
NumPy is a fundamental package for scientific computing in Python and is widely used in fields such as physics, engineering, data science, and machine learning.
Arrays: NumPy provides a multidimensional, homogeneous array of fixed size (the ndarray); all of its elements share a single data type.
import numpy as np

# Creating a 1D array
a = np.array([1, 2, 3, 4, 5])
# Creating a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])

# Element-wise operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b  # array([5, 7, 9])
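To make the "homogeneous, fixed-size" property concrete, here is a small sketch with arbitrary values: mixed inputs are upcast to one dtype, and np.append returns a new array rather than growing the original.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)          # float64

# A 2D array: shape is (rows, columns), ndim is the number of axes
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape, b.ndim)      # (2, 3) 2

# Arrays have a fixed size; "appending" builds a new array instead
a = np.array([1, 2, 3])
longer = np.append(a, 4)
print(a.size, longer.size)  # 3 4

# Element-wise operations also work between an array and a scalar
print(a * 10)               # [10 20 30]
```

Because np.append copies the data each time, building a large array in a loop is usually done by collecting values in a Python list first and converting once.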
NumPy Functions

• Shape and dimensions
• Indexing and slicing
• Universal functions
• Linear algebra
• Scientific computing
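A compact sketch touching each category in the list. The matrices are arbitrary, and "scientific computing" is represented here only by a simple statistic; SciPy builds much more on top of NumPy.

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)   # [[1 2 3] [4 5 6] [7 8 9]]

# Shape and dimensions
print(m.shape, m.ndim, m.size)       # (3, 3) 2 9

# Indexing and slicing (row, column); slices exclude the stop index
print(m[0, 2])                       # 3
print(m[1:, :2])                     # rows 1-2, columns 0-1: [[4 5] [7 8]]

# Universal functions (ufuncs) apply element-wise
print(np.sqrt(np.array([1, 4, 9])))  # [1. 2. 3.]

# Linear algebra: solve A x = y, then check with a matrix product
A = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 5.0])
x = np.linalg.solve(A, y)            # x is approximately [0.8, 1.4]
print(A @ x)                         # recovers y (up to rounding)

# Scientific computing helpers, e.g. summary statistics along an axis
print(m.mean(axis=0))                # column means: [4. 5. 6.]
```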
Pandas

Pandas is a popular open-source data analysis and manipulation library for Python.

It provides easy-to-use data structures such as Series and DataFrame, along with data analysis tools for cleaning, transforming, and analyzing structured data.

Pandas is widely used in data science, machine learning, and finance for handling and analyzing data efficiently.
Series & DataFrame
A Series is a one-dimensional labeled array that can hold any data
type. It is like a column in a DataFrame or a single attribute of an
object.
import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 5, 6, 8])
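The "labeled" part of a Series is what separates it from a plain list. A minimal sketch, with hypothetical item names and prices:

```python
import pandas as pd

# A Series pairs values with index labels (default labels are 0..n-1)
s = pd.Series([1, 3, 5, 6, 8])
print(s.index.tolist())    # [0, 1, 2, 3, 4]

# Custom labels allow dict-like access
prices = pd.Series([120, 95, 180], index=['apple', 'bread', 'cheese'])
print(prices['bread'])     # 95

# Arithmetic aligns on labels; a label missing on either side gives NaN
discount = pd.Series([20, 30], index=['apple', 'cheese'])
net = prices - discount
print(net)                 # apple 100.0, bread NaN, cheese 150.0
```

Label alignment is also why a result can turn into floats: NaN has no integer representation, so pandas upcasts the values to float64.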

A DataFrame is a two-dimensional labeled data structure whose columns can be of different data types. You can think of it like a spreadsheet, a table in a relational (SQL) database, or a dictionary of Series objects.
DataFrame

import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)
DataFrame

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)
print(frame)

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9

A DataFrame can be treated as an ordered collection of columns: each column can have a different data type, and the DataFrame has both row and column indices.
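The "each column can have a different data type" point can be checked directly with the dtypes attribute; a small sketch reusing the same state/year/pop data:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# Each column keeps its own dtype (text, integers, floats here)
print(frame.dtypes)

# Both axes are indexed: row labels and column labels
print(frame.index.tolist())    # [0, 1, 2, 3, 4]
print(frame.columns.tolist())  # ['state', 'year', 'pop']
```

The exact dtype shown for the text column depends on the pandas version, so it is left without an expected-output comment.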
DataFrame: Retrieving a Column

A column in a DataFrame can be retrieved as a Series either by dict-like notation or as an attribute.

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

print(frame['state'])
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object

print(frame.state)
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object
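Attribute access is convenient but has limits; bracket notation always works. A small sketch with hypothetical column names chosen to hit both pitfalls:

```python
import pandas as pd

df = pd.DataFrame({'first name': ['Ada', 'Alan'], 'count': [3, 5]})

# Bracket notation works for any column label
names = df['first name']           # df.first name would be a SyntaxError
print(names.tolist())              # ['Ada', 'Alan']

# A column named like an existing DataFrame method is shadowed by it:
# df.count is the DataFrame.count method, not the 'count' column
print(callable(df.count))          # True
print(df['count'].tolist())        # [3, 5]
```

Attribute access also cannot create a new column (df.new = ... sets a Python attribute), so assignments are normally written with brackets.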
DataFrame: Fetching Rows

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['A', 'B', 'C', 'D', 'E'])

print(frame2.loc[['A', 'B']])
   year state  pop debt
A  2000  Ohio  1.5  NaN
B  2001  Ohio  1.7  NaN

print(frame2)
   year   state  pop debt
A  2000    Ohio  1.5  NaN
B  2001    Ohio  1.7  NaN
C  2002    Ohio  3.6  NaN
D  2001  Nevada  2.4  NaN
E  2002  Nevada  2.9  NaN

print(frame2.loc['A':'E', ['state', 'pop']])
    state  pop
A    Ohio  1.5
B    Ohio  1.7
C    Ohio  3.6
D  Nevada  2.4
E  Nevada  2.9

print(frame2.loc['A'])
year     2000
state    Ohio
pop       1.5
debt      NaN
Name: A, dtype: object

print(frame2.iloc[:, 1:3])
    state  pop
A    Ohio  1.5
B    Ohio  1.7
C    Ohio  3.6
D  Nevada  2.4
E  Nevada  2.9

print(frame2.iloc[1:3])
   year state  pop debt
B  2001  Ohio  1.7  NaN
C  2002  Ohio  3.6  NaN
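One detail worth noticing in these examples is that loc slices by label and includes the end label, while iloc slices by position and excludes the end position. A minimal sketch (the debt column is omitted for brevity):

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002, 2001, 2002],
                       'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                       'pop': [1.5, 1.7, 3.6, 2.4, 2.9]},
                      index=['A', 'B', 'C', 'D', 'E'])

# loc slices by label and INCLUDES the end label
print(frame2.loc['B':'C'].index.tolist())   # ['B', 'C']

# iloc slices by position and EXCLUDES the end position (like Python lists)
print(frame2.iloc[1:3].index.tolist())      # ['B', 'C']
print(frame2.iloc[1:2].index.tolist())      # ['B']

# Single scalar lookups: by label, then by position (row 3, column 1)
print(frame2.loc['D', 'state'])             # Nevada
print(frame2.iloc[3, 1])                    # Nevada
```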
DataFrame: Modifying Columns

frame2['debt'] = 0
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     0
C  2002    Ohio  3.6     0
D  2001  Nevada  2.4     0
E  2002  Nevada  2.9     0

frame2['debt'] = range(5)
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     1
C  2002    Ohio  3.6     2
D  2001  Nevada  2.4     3
E  2002  Nevada  2.9     4

val = pd.Series([10, 10, 10], index=['A', 'C', 'D'])
frame2['debt'] = val
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5  10.0
B  2001    Ohio  1.7   NaN
C  2002    Ohio  3.6  10.0
D  2001  Nevada  2.4  10.0
E  2002  Nevada  2.9   NaN

Assigning a Series aligns on the index labels, so rows whose labels are missing from val (B and E) become NaN.
• Rows or individual elements can be modified similarly, using loc or iloc.
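A minimal sketch of modifying rows and single elements with loc and iloc; the smaller three-row frame and the new values are illustrative only.

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002],
                       'state': ['Ohio', 'Ohio', 'Nevada'],
                       'pop': [1.5, 1.7, 2.4]},
                      index=['A', 'B', 'C'])

# Modify a single element by label, then by position
frame2.loc['A', 'pop'] = 1.6
frame2.iloc[2, 0] = 2003            # row 'C', column 'year'

# Replace every value in one row at once (one value per column, in order)
frame2.loc['B'] = [2010, 'Ohio', 1.9]

# Modify only the elements that satisfy a condition (boolean mask)
frame2.loc[frame2['state'] == 'Ohio', 'pop'] *= 10

print(frame2)
```

Selecting rows and column in a single loc call, as in the last assignment, modifies frame2 itself; a chained form such as frame2[mask]['pop'] = ... may only change a temporary copy.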
DataFrame: Removing Columns

del frame2['debt']
print(frame2)
   year   state  pop
A  2000    Ohio  1.5
B  2001    Ohio  1.7
C  2002    Ohio  3.6
D  2001  Nevada  2.4
E  2002  Nevada  2.9
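Besides del, pandas offers drop(), which returns a new DataFrame and can remove rows as well as columns, and pop(), which removes a column in place and hands it back. A small sketch on a two-row frame:

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001], 'state': ['Ohio', 'Ohio'],
                       'pop': [1.5, 1.7], 'debt': [0, 1]},
                      index=['A', 'B'])

# drop() returns a NEW DataFrame and leaves frame2 unchanged
trimmed = frame2.drop(columns=['debt'])
print(trimmed.columns.tolist())    # ['year', 'state', 'pop']
print(frame2.columns.tolist())     # ['year', 'state', 'pop', 'debt']

# The same method removes rows by index label
print(frame2.drop(index=['A']).index.tolist())   # ['B']

# pop() removes a column in place (like del) and returns it as a Series
debt = frame2.pop('debt')
print(debt.tolist())               # [0, 1]
```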
Data Reading/Writing

Pandas provides functions to read data from various file formats, such as CSV, Excel, and SQL databases, and to write data back out to these formats.

data = pd.read_csv('data.csv')
data.to_csv('output.csv', index=False)
pd.read_excel('myfile.xlsx', sheet_name='sheet1', index_col=None, na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5', 'df')
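A self-contained sketch of the CSV round trip. It assumes an in-memory io.StringIO buffer as a stand-in for the data.csv file, so it runs without touching the disk; the column names and values are made up.

```python
import io
import pandas as pd

# In-memory text stands in for 'data.csv' (hypothetical contents)
csv_text = "name,score\nAlice,90\nBob,NA\n"

# 'NA' is already a default missing-value marker; na_values shows how to add others
data = pd.read_csv(io.StringIO(csv_text), na_values=['NA'])
print(data['score'].isna().tolist())   # [False, True]

# Write back out without the row index, then read it again
buffer = io.StringIO()
data.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.columns.tolist())     # ['name', 'score']
```

Without index=False, to_csv also writes the row index, which comes back on the next read as an extra "Unnamed: 0" column.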
Data Cleaning and Preprocessing

Pandas provides functions for handling missing data, dropping unnecessary columns, filling missing values, and performing other data cleaning tasks.

# Handling missing data (both calls return a new DataFrame)
df.dropna()          # Drop rows with missing values
df.fillna(value=0)   # Fill missing values with 0
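A short end-to-end sketch of the cleaning steps named above, on a hypothetical three-row frame. Filling the age column with its mean and the city column with a placeholder is one common choice, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, np.nan, 35],
                   'city': ['Paris', 'London', None],
                   'notes': ['x', 'y', 'z']})

# Count missing values per column (age, city, notes)
print(df.isna().sum().tolist())       # [1, 1, 0]

# Drop an unnecessary column
df = df.drop(columns=['notes'])

# dropna() and fillna() return new DataFrames; assign the result to keep it
complete_rows = df.dropna()
print(len(complete_rows))             # 1 (only the first row has no gaps)

# Fill each column with its own default instead of a single value
filled = df.fillna({'age': df['age'].mean(), 'city': 'Unknown'})
print(filled['age'].tolist())         # [25.0, 30.0, 35.0]
print(filled['city'].tolist())        # ['Paris', 'London', 'Unknown']
```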
Projects

• Fraud Detection
• Healthcare
• Automated Machine Learning
• Social Media Analytics
• Voice Recognition
• Customer Segmentation
• Text Summarization
• Handwritten Data Recognition
• Object Identification
• Emotion Analysis
• Game Development
• Sentiment Analysis
• Recommender System
Thank You
