2-Introduction to exploratory data analysis using R or Python-14-12-2024
import numpy as np
numpy.org: NumPy is the fundamental package for scientific computing with Python.
NumPy arrays are designed to handle large data sets efficiently and
with a minimum of fuss. The NumPy library has a large set of routines
for creating, manipulating, and transforming NumPy arrays.
Core Python has an array data structure, but it’s not nearly as versatile,
efficient, or useful as the NumPy array.
NumPy – Multidimensional Arrays
• NumPy's main object is a multi-dimensional array (ndarray).
data = np.array( [ 1, 2, 3 ] )
data = np.array( [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] )
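As a minimal sketch of what these two calls produce, the example below builds the same 1D and 2D arrays and inspects them. The variable names (vector, matrix, doubled, center, first_column) are chosen for illustration and are not from the slides.

```python
import numpy as np

# 1D array built from a Python list
vector = np.array([1, 2, 3])

# 2D array (3 rows x 3 columns) built from a list of lists
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ndim is the number of dimensions, shape is the size along each one
print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (3, 3)

# Arithmetic is applied element by element, with no explicit loop
doubled = matrix * 2

# Indexing is [row, column], both counted from 0
center = matrix[1, 1]              # 5
first_column = matrix[:, 0]        # the values 1, 4, 7
print(doubled)
print(center, first_column)
```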
import pandas as pd
Used for:
• Data Analysis
• Data Manipulation
• Data Visualization
PyData.org: pandas provides high-performance, easy-to-use data structures and data analysis tools for the
Python programming language.
Pandas – Indexed Arrays
• Pandas is used to build indexed arrays (1D, called Series) and tables (2D, called DataFrames),
where columns and rows are labeled (named) and can be accessed
via the labels (names).
Raw data (3 rows of 4 values) becomes a table with a row index (samples) and a column index (features):

index   x1   x2   x3   x4
one      1    2    3    4
two      4    5    6    7
three    8    9   10   11
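The labeled table above can be built roughly as follows. This is a sketch using the pd.DataFrame constructor with its index and columns arguments; the variable names raw_data, df, and value are illustrative.

```python
import pandas as pd

# Raw data: three samples (rows), four features (columns)
raw_data = [[1, 2, 3, 4],
            [4, 5, 6, 7],
            [8, 9, 10, 11]]

# Attach row labels (index) and column labels to the raw values
df = pd.DataFrame(raw_data,
                  index=["one", "two", "three"],
                  columns=["x1", "x2", "x3", "x4"])
print(df)

# A single cell is addressed by its row label and column label
value = df.loc["two", "x3"]
print(value)   # 6
```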
• A Series is a 1D labeled (indexed) array. It can hold any data type,
including a mix of data types.
x1 = df[ 'x1' ]    # select the column labeled x1; the result is a Series
Resulting Series (index label, value):
one      1
two      4
three    8
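A short sketch of both ideas: a Series created directly with mixed value types, and a Series obtained by selecting one DataFrame column. The names mixed and df, and the index labels a, b, c, are assumptions made for this example.

```python
import pandas as pd

# A Series built directly, with explicit index labels and mixed value types
mixed = pd.Series([1, "two", 3.0], index=["a", "b", "c"])
print(mixed["b"])          # access by label: two

# The same DataFrame as in the table above
df = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]],
                  index=["one", "two", "three"],
                  columns=["x1", "x2", "x3", "x4"])

# Selecting one column by its label returns a Series
x1 = df["x1"]
print(type(x1).__name__)   # Series
print(x1["two"])           # value in row "two": 4
```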
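• Selecting Multiple Columns
Select columns labeled x1 and x3 for all rows (values 1 3 / 4 6 / 8 10):
x13 = df[ [ 'x1', 'x3' ] ]
Select columns labeled x1 through x3 for all rows (values 1 2 3 / 4 5 6 / 8 9 10):
x1to3 = df.loc[ :, 'x1':'x3' ]
In df.loc[ rows, columns ], the ':' in the first position selects all rows, and 'x1':'x3' is a label slice that includes both endpoints.
Note: df[ 'x1':'x3' ] does not select columns, because a slice inside df[ ] is applied to the rows.
Note: older code uses df.ix for this selection. df.ix was removed in pandas 1.0; use df.loc instead.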
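The two column selections can be sketched with label-based indexing. This assumes a recent pandas version, where df.loc is the label-based indexer; the df.iloc line is an extra, position-based equivalent added for comparison, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]],
                  index=["one", "two", "three"],
                  columns=["x1", "x2", "x3", "x4"])

# A list of labels picks exactly those columns, for all rows
x1_and_x3 = df[["x1", "x3"]]

# A label slice with .loc picks x1 through x3, both endpoints included
x1_to_x3 = df.loc[:, "x1":"x3"]

# Position-based equivalent: columns 0, 1, 2 (the end position is excluded)
by_position = df.iloc[:, 0:3]

print(x1_and_x3)
print(x1_to_x3)
```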
Used for:
• Plots
• Histograms
• Bar Charts
• Scatter Plots
• etc
matplotlib.org: Matplotlib is a Python 2D plotting library which produces publication quality
figures in a variety of hardcopy formats and interactive environments across platforms.
Matplotlib - Plot
• The function plot plots a 2D graph.
plt.plot( x, y )    # plt is matplotlib.pyplot; plot is the function to plot; x: X values to plot; y: Y values to plot
• Example: [Figure: line plot of Y against X; X-axis ticks 1, 2, 3]
Matplotlib – Plot Labels
• Add Labels for X and Y Axis and Plot Title (caption)
import matplotlib.pyplot as plt

plt.plot( [ 1, 2, 3 ], [ 4, 6, 8 ] )    # X values, then Y values
plt.xlabel( "X Numbers" )               # Label on the X-axis
plt.ylabel( "Y Numbers" )               # Label on the Y-axis
plt.title( "My Plot of X and Y" )       # Title for the Plot
plt.show()                              # Display the figure
[Figure: line plot titled 'My Plot of X and Y'; X-axis 'X Numbers' with ticks 1, 2, 3; Y-axis 'Y Numbers']
Matplotlib – Multiple Plots and Legend
• You can draw multiple lines on one graph and identify each with a legend entry.
[Figure: plot titled 'My Plot of X and Y' with two lines and a legend ('1st Line', '2nd Line'); X-axis 'X Numbers' with ticks 1, 2, 3; Y-axis 'Y Numbers']
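One way to get two lines and a legend is to call plt.plot once per line with a label argument and then call plt.legend. The sketch below assumes this approach; the Y values of the second line and the output file name are made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; drop this line for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3]

plt.figure()                               # start a fresh figure
plt.plot(x, [4, 6, 8], label="1st Line")   # first line, values from the earlier example
plt.plot(x, [2, 5, 7], label="2nd Line")   # second line, hypothetical values
plt.xlabel("X Numbers")
plt.ylabel("Y Numbers")
plt.title("My Plot of X and Y")
plt.legend()                               # builds the legend from the label arguments
plt.savefig("two_lines.png")               # hypothetical file name
```

In an interactive session, plt.show() can replace the plt.savefig call to open the figure in a window.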
Matplotlib – Bar Chart
• The function bar plots a bar graph.
[Figure: bar chart; X-axis ticks 1, 2, 3]
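A minimal sketch of a bar chart with plt.bar, which takes the bar positions followed by the bar heights. The heights, axis labels, title, and output file name here are assumptions for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; drop this line for interactive use
import matplotlib.pyplot as plt

positions = [1, 2, 3]   # where each bar sits on the X-axis
heights = [4, 6, 8]     # hypothetical bar heights

plt.figure()                    # start a fresh figure
plt.bar(positions, heights)     # one bar per (position, height) pair
plt.xticks(positions)           # show a tick under each bar
plt.xlabel("X Numbers")
plt.ylabel("Y Numbers")
plt.title("My Bar Chart")       # hypothetical title
plt.savefig("bar_chart.png")    # hypothetical file name
```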