Data Analytics Using Python
ELC Activity
Thapar Institute of Engineering and Technology
By:
Dr. Aditi Sharma
Assistant Professor
Applications of Python for AI
Machine Learning Libraries: Python offers powerful machine learning libraries such as
scikit-learn, TensorFlow, and PyTorch. Scikit-learn provides simple and efficient tools for data
mining and data analysis, while TensorFlow and PyTorch are deep learning frameworks that
allow users to build and train complex neural network models.
Natural Language Processing (NLP): Python's NLTK (Natural Language Toolkit) and spaCy
libraries are extensively used for processing and analyzing human language data. These
libraries are crucial for applications like sentiment analysis, language translation, and
chatbots.
Computer Vision: Libraries like OpenCV and Dlib are widely used for computer vision tasks
such as image and video analysis, facial recognition, object detection, and image segmentation.
Reinforcement Learning: Python is often used in reinforcement learning
applications, and libraries like OpenAI Gym provide environments for
developing and testing reinforcement learning algorithms.
Big Data Processing: Python can be integrated with big data processing
frameworks such as Apache Hadoop and Apache Spark for large-scale
machine learning tasks on big datasets.
Web Development and APIs: Python frameworks like Flask and Django
are used to deploy machine learning models as web applications or APIs,
allowing easy integration of machine learning functionalities into web
services.
Automated Machine Learning (AutoML): Python has several AutoML libraries
like TPOT and Auto-sklearn that automate the process of selecting the best machine
learning model and hyperparameters for a given dataset, making it easier for non-
experts to work on machine learning projects.
Data Visualization: Libraries like Matplotlib, Seaborn, and Plotly enable data
visualization, helping data scientists and researchers to understand complex
patterns and relationships in data, which is crucial for feature selection and model
evaluation.
DataFrame
A DataFrame can be treated as an ordered collection of columns. Each column can
have a different data type, and the DataFrame has both row and column indices.
• import pandas as pd
• from pandas import DataFrame
• data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
          'year': [2000, 2001, 2002, 2001, 2002],
          'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
• frame = DataFrame(data)   # equivalent to: df = pd.DataFrame(data)
• print(frame)
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
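The claim that each column has its own data type, and that the frame carries both a row index and a column index, can be checked directly. The following is a minimal sketch that reuses the slide's data; the inspection calls (`dtypes`, `index`, `columns`, `shape`) are standard pandas attributes, but the choice of what to print is only an illustration.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame = pd.DataFrame(data)

# Each column keeps its own data type (text, integer, float).
print(frame.dtypes)

# Row index: the default integer labels 0..4.
print(list(frame.index))

# Column index: the dict keys, in insertion order.
print(list(frame.columns))

# (number of rows, number of columns)
print(frame.shape)
```

Because no `index` argument is passed, pandas assigns the default integer row labels shown in the printed table above.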
DataFrame: Retrieving a Column
A column in a DataFrame can be retrieved as a Series by dict-like notation or as an attribute.
• data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
          'year': [2000, 2001, 2002, 2001, 2002],
          'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
• frame = DataFrame(data)
• print(frame['state'])
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object
• print(frame.state)
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object
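The two notations are interchangeable for a column such as state, but attribute access has a limit worth knowing. In this particular data set the column name pop collides with the built-in `DataFrame.pop` method, so `frame.pop` returns the method and not the column. A small sketch of both behaviours, reusing the slide's data:

```python
import pandas as pd

frame = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
})

by_key = frame["state"]   # dict-like notation
by_attr = frame.state     # attribute notation

print(type(by_key).__name__)    # Series
print(by_key.equals(by_attr))   # True

# 'pop' is also the name of a DataFrame method, so attribute access
# finds the method first; only bracket notation reaches the column.
print(callable(frame.pop))      # True
pop_column = frame["pop"]
print(pop_column.tolist())
```

Bracket notation works for any column name, including names with spaces or names that clash with methods, which makes it the safer default.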
DataFrame: Fetching Rows
• data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
          'year': [2000, 2001, 2002, 2001, 2002],
          'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
• frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                     index=['A', 'B', 'C', 'D', 'E'])
• print(frame2)
   year   state  pop debt
A  2000    Ohio  1.5  NaN
B  2001    Ohio  1.7  NaN
C  2002    Ohio  3.6  NaN
D  2001  Nevada  2.4  NaN
E  2002  Nevada  2.9  NaN
• print(frame2.loc[['A', 'B']])
   year state  pop debt
A  2000  Ohio  1.5  NaN
B  2001  Ohio  1.7  NaN
• print(frame2.loc['A'])
year     2000
state    Ohio
pop       1.5
debt      NaN
Name: A, dtype: object
• print(frame2.loc['A':'E', ['state', 'pop']])
    state  pop
A    Ohio  1.5
B    Ohio  1.7
C    Ohio  3.6
D  Nevada  2.4
E  Nevada  2.9
• print(frame2.iloc[:, 1:3])
    state  pop
A    Ohio  1.5
B    Ohio  1.7
C    Ohio  3.6
D  Nevada  2.4
E  Nevada  2.9
• print(frame2.iloc[1:3])
   year state  pop debt
B  2001  Ohio  1.7  NaN
C  2002  Ohio  3.6  NaN
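The examples above mix `loc` (label-based) and `iloc` (position-based) selection. One difference that is easy to miss is how slices treat the end point: a label slice includes it, a position slice excludes it. A minimal sketch using the same frame2:

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["A", "B", "C", "D", "E"],
)

# Label slice: both 'B' and 'C' are included.
by_label = frame2.loc["B":"C"]
print(list(by_label.index))      # ['B', 'C']

# Position slice: position 1 is included, position 3 is excluded.
by_position = frame2.iloc[1:3]
print(list(by_position.index))   # ['B', 'C']

# A single cell, addressed by label and by position.
print(frame2.loc["D", "state"])  # Nevada
print(frame2.iloc[3, 1])         # Nevada
```

This is why `frame2.loc['A':'E', ...]` returns all five rows while `frame2.iloc[1:3]` returns only two.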
DataFrame: Modifying Columns
• frame2['debt'] = 0
• print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     0
C  2002    Ohio  3.6     0
D  2001  Nevada  2.4     0
E  2002  Nevada  2.9     0
• frame2['debt'] = range(5)
• print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     1
C  2002    Ohio  3.6     2
D  2001  Nevada  2.4     3
E  2002  Nevada  2.9     4
# val is not defined on the slide; a Series like the one below (assuming
# pandas is imported as pd) reproduces the output shown
• val = pd.Series([10, 10, 10], index=['A', 'C', 'D'])
• frame2['debt'] = val
• print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5  10.0
B  2001    Ohio  1.7   NaN
C  2002    Ohio  3.6  10.0
D  2001  Nevada  2.4  10.0
E  2002  Nevada  2.9   NaN
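The three assignments above follow different rules: a scalar is broadcast to every row, a plain sequence must match the number of rows, and a Series is aligned on its index, with NaN filled in wherever a label is missing. The sketch below walks through all three. The definition of `val` is an assumption chosen to match the NaN pattern in the last output; the slide does not show how `val` was built.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2001, 2002],
        "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
    },
    index=["A", "B", "C", "D", "E"],
)

# Scalar: the value is broadcast to every row.
frame2["debt"] = 0

# Sequence: its length must equal the number of rows.
frame2["debt"] = range(5)
try:
    frame2["debt"] = [1, 2, 3]   # too short, so pandas raises ValueError
except ValueError as exc:
    print("length mismatch:", exc)

# Series: values are matched by index label, not by position.
# Hypothetical val; labels B and E are absent, so they become NaN.
val = pd.Series([10, 10, 10], index=["A", "C", "D"])
frame2["debt"] = val
print(frame2["debt"].isna().tolist())   # [False, True, False, False, True]
```

The column becomes floating point after the last assignment because NaN cannot be stored in an integer column, which is why the slide shows 10.0 and not 10.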
DataFrame: Removing Columns
frame2 after the debt column is removed:
   year   state  pop
A  2000    Ohio  1.5
B  2001    Ohio  1.7
C  2002    Ohio  3.6
D  2001  Nevada  2.4
E  2002  Nevada  2.9
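The slide shows only the result, not the statement that removed the column. Two standard pandas options are the `del` statement, which modifies the frame in place, and `DataFrame.drop`, which returns a new frame and leaves the original untouched. Which one the slide used is not known; the sketch below shows both under that caveat.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2001, 2002],
        "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
        "debt": [0, 0, 0, 0, 0],
    },
    index=["A", "B", "C", "D", "E"],
)

# Option 1: drop returns a new DataFrame; frame2 itself is unchanged.
without_debt = frame2.drop(columns=["debt"])
print(list(without_debt.columns))   # ['year', 'state', 'pop']
print(list(frame2.columns))         # ['year', 'state', 'pop', 'debt']

# Option 2: del removes the column from frame2 in place.
del frame2["debt"]
print(list(frame2.columns))         # ['year', 'state', 'pop']
```

`drop` is convenient in method chains, while `del` is the shortest form when the original frame no longer needs the column.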
Data Reading/Writing
Pandas provides functions to read data from various file formats, such as CSV,
Excel, and SQL databases, and to write data back to these formats.
• data = pd.read_csv('data.csv')
• data.to_csv('output.csv', index=False)
• pd.read_excel('myfile.xlsx', sheet_name='sheet1',
                index_col=None, na_values=['NA'])
• pd.read_stata('myfile.dta')
• pd.read_sas('myfile.sas7bdat')
• pd.read_hdf('myfile.h5', 'df')
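The calls above need files on disk. To try the CSV round trip without creating any, `read_csv` and `to_csv` also accept file-like objects. The sketch below uses an in-memory `io.StringIO` buffer; the CSV text is made up for illustration and is not from the slides.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = (
    "state,year,pop\n"
    "Ohio,2000,1.5\n"
    "Ohio,2001,NA\n"
    "Nevada,2001,2.4\n"
)

# na_values marks which strings should be read as missing values.
data = pd.read_csv(io.StringIO(csv_text), na_values=["NA"])
print(data)

# index=False leaves the row labels out of the written file.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
print(buffer.getvalue())

# Reading the written text back gives a frame of the same shape.
reread = pd.read_csv(io.StringIO(buffer.getvalue()))
print(reread.shape)   # (3, 3)
```

Replacing the `StringIO` objects with file paths such as 'data.csv' and 'output.csv' gives the on-disk form used on the slide.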
Data Cleaning and Preprocessing
Pandas provides functions for handling missing data, dropping unnecessary
columns, filling missing values, and performing other data cleaning tasks.
# Handling missing data
df.dropna()   # Drop rows with missing values
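The slide shows only `dropna`. The other tasks named in the text (filling missing values, dropping unnecessary columns) can be sketched with `fillna` and `drop`. The small frame and its notes column are invented for this example, and filling with the column mean is just one possible strategy.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data with two missing population values.
df = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Nevada", "Nevada"],
    "pop": [1.5, nan, 2.4, nan],
    "notes": ["a", "b", "c", "d"],
})

# Drop every row that contains at least one missing value.
dropped = df.dropna()
print(list(dropped.index))   # [0, 2]

# Fill missing 'pop' values with the mean of the observed ones.
filled = df.fillna({"pop": df["pop"].mean()})
print(filled["pop"].tolist())

# Drop a column that is not needed for the analysis.
trimmed = df.drop(columns=["notes"])
print(list(trimmed.columns))   # ['state', 'pop']
```

All three calls return new DataFrames, so `df` still holds the original data, including its missing values, afterwards.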
• Automated Machine Learning
• Fraud Detection
• Healthcare
• Social Media Analytics
• Sentiment Analysis
• Game Development
• Recommender System
Thank You