4/14/25, 10:04 PM Python Libraries - Jupyter Notebook
Getting Started with Python Libraries
A Python library is a collection of pre-written code, often organized into modules and
packages, that provides reusable functions and classes for performing specific tasks,
thereby simplifying and accelerating the development process.
The Python programming language comes with a variety of built-in functions. Among these are
several common functions, including:
print() which prints expressions out
abs() which returns the absolute value of a number
int() which converts another data type to an integer
len() which returns the length of a sequence or collection
These built-in functions, however, are limited, and we can make use of modules to make more
sophisticated programs.
A module is a file with the .py extension that contains Python code, such as functions,
classes and variables. A library is a collection of related modules or packages. Libraries
are used by programmers, researchers and other members of the community.
In [1]: # Let's try to import a module, math
        import math
Since math is part of the Python standard library, your interpreter should complete the import
with no feedback, returning to the prompt. This means you don’t need to install anything to
start using the math module.
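Once imported, the functions and constants of the module are reached through the math. prefix. A minimal sketch, using only names from the standard math module (the variable names are illustrative):

```python
import math

# Functions and constants live in the module's namespace
root = math.sqrt(16)             # square root, returned as a float
rounded_up = math.ceil(2.1)      # smallest integer >= 2.1
circle_area = math.pi * 2 ** 2   # area of a circle with radius 2

print(root, rounded_up, circle_area)
```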
Let’s run the import statement with a module that you may not have installed, like the 2D plotting
library matplotlib:
In [2]: import matplotlib
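If matplotlib is not installed, the import fails with a ModuleNotFoundError (a subclass of ImportError) instead of returning silently. One way to check availability beforehand is sketched here with the standard importlib module. The helper name is_installed is hypothetical, and the pip command in the message assumes pip is your package manager:

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ["math", "matplotlib"]:
    if is_installed(name):
        print(f"{name} is available")
    else:
        # Assumption: pip is used for installation (conda users would use conda install)
        print(f"{name} is missing; try: pip install {name}")
```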
Some widely used Python libraries and their typical uses:
NumPy:
Numerical computing, array manipulation, and scientific computing.
Pandas:
Data manipulation and analysis, especially with DataFrames.
Matplotlib:
Data visualization, creating static, interactive, and animated visualizations.
localhost:8888/notebooks/anaconda3/0_BigData_2025/Python Libraries.ipynb# 1/9
Seaborn:
Statistical data visualization, building on Matplotlib for more visually appealing and informative
plots.
Scikit-learn (imported as sklearn):
Machine learning algorithms, providing tools for classification, regression, clustering, and more.
TensorFlow:
Deep learning framework, used for building and training neural networks.
SciPy:
Scientific computing, containing modules for optimization, integration, and signal processing.
PyTorch:
Another popular deep learning framework, known for its flexibility and dynamic computation graphs.
Keras:
High-level API for building and training neural networks, most often used as a front-end for
TensorFlow (older versions also supported Theano).
In [3]: import pandas as pd
Basic data structures in pandas
Pandas provides two classes for handling data:
1. Series: a one-dimensional labeled array holding data of any type, such as integers, strings,
Python objects, etc.
2. DataFrame: a two-dimensional data structure that holds data like a two-dimensional array or a
table with rows and columns.
In [4]: # let's create a pandas series from a list of integers

        s = pd.Series([1, 3, 5, 0, 6, 8])
In [5]: s
Out[5]: 0    1
        1    3
        2    5
        3    0
        4    6
        5    8
        dtype: int64
In [6]: # let's create a pandas series that contains a missing value
        # nan marks a blank cell; it is defined in the numpy library

        s = pd.Series([1, 3, 5, np.nan, 6, 8])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_84444\2328450563.py in <module>
      2 # nan marks a blank cell; it is defined in the numpy library
      3
----> 4 s = pd.Series([1, 3, 5, np.nan, 6, 8])
NameError: name 'np' is not defined
In [7]: # we need to import the numpy library to execute the code correctly
        import numpy as np
In [8]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
In [9]: s
Out[9]: 0    1.0
        1    3.0
        2    5.0
        3    NaN
        4    6.0
        5    8.0
        dtype: float64
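The dtype changed from int64 to float64 because np.nan is itself a floating-point value, so pandas stores the whole Series as floats. A short sketch of how the missing entry can be detected and handled (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

missing_mask = s.isna()               # True where the value is NaN
n_missing = int(missing_mask.sum())   # count of missing entries
without_missing = s.dropna()          # drops the NaN row, keeps the original labels
filled = s.fillna(0)                  # replaces NaN with 0; the dtype stays float64

print(n_missing)
print(without_missing.index.tolist())
print(filled.tolist())
```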
In [10]: # import the pandas library
         import pandas as pd
In [11]: # create a dataframe df
         df = pd.DataFrame(
             {
                 "A": 1.0,
                 "B": pd.Timestamp("20130102"),
                 "C": pd.Series(1, index=list(range(4)), dtype="float32"),
                 "D": np.array([3] * 4, dtype="int32"),
                 "E": pd.Categorical(["test", "train", "test", "train"]),
                 "F": "foo",
             }
         )
In [12]: df
Out[12]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
In [13]: # get the number of rows and columns of a dataframe as a (rows, columns) tuple
         df.shape
Out[13]: (4, 6)
In [14]: # position 0 of the shape tuple is the number of rows
         df.shape[0]
Out[14]: 4
In [15]: # position 1 of the shape tuple is the number of columns
         df.shape[1]
Out[15]: 6
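Because shape is an ordinary tuple, both values can be unpacked in one step. A minimal sketch on a small stand-in frame (the frame and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1.0, 1.0, 1.0], "F": ["foo"] * 4})

n_rows, n_cols = df.shape   # shape is a plain tuple: (rows, columns)
print(n_rows, n_cols)
print(len(df))              # len() of a DataFrame is its number of rows
print(len(df.columns))      # number of columns
```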
In [16]: # get the data type of each column
         df.dtypes
Out[16]: A           float64
         B    datetime64[ns]
         C           float32
         D             int32
         E          category
         F            object
         dtype: object
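In pandas, the object dtype marks a column that stores references to arbitrary Python objects
rather than values of a fixed numeric type. In practice it most often indicates string data, as in
column F above. Each value is held as a separate Python object in memory, so object columns are
generally larger and slower to process than numeric ones.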
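A small sketch of when pandas reports the object dtype. The example uses a Series that mixes strings and numbers, which is stored as object; newer pandas releases may report a dedicated string dtype for text-only columns, so that case is left out here. The variable names are illustrative:

```python
import pandas as pd

mixed = pd.Series(["foo", 1, 2.5])     # strings and numbers together
numbers = pd.Series([1.0, 2.0, 3.0])   # a single numeric type

print(mixed.dtype)
print(numbers.dtype)

# pandas offers a helper to test for the object dtype
print(pd.api.types.is_object_dtype(mixed))
print(pd.api.types.is_object_dtype(numbers))

# each element of an object Series keeps its own Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```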
In [17]: # after uploading the dataset onto your anaconda 3 folder (or the folder
         # if the file you want to read is in .csv (comma separated) then use pd.r
         # Import an Excel file using the read_excel () function from the pandas library
         # Optionally set a column index while reading your data into memory (index_col).

         data = pd.read_excel('B Example 1 - Data Encoding.xlsx')
         data
Out[17]:
S.N Country Hours Salary House
0 0 France 34.0 12000.0 No
1 1 Spain 37.0 49000.0 Yes
2 2 Germany 20.0 34000.0 No
3 3 Spain 58.0 41000.0 No
4 4 Germany 40.0 43333.3 Yes
5 5 France 45.0 28000.0 Yes
6 6 Spain 39.8 51000.0 No
7 7 France 28.0 89000.0 Yes
8 8 Germany 50.0 53000.0 No
9 9 France 47.0 33000.0 Yes
In [18]: data.dtypes
Out[18]: S.N          int64
         Country     object
         Hours      float64
         Salary     float64
         House       object
         dtype: object
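The Excel file itself is not reproduced here, so the following sketch rebuilds the first three rows of the table above as CSV text and reads it with pd.read_csv. Using io.StringIO as a stand-in for a file on disk is an assumption made for illustration. index_col is the parameter of read_csv (and read_excel) that sets a column as the index while reading:

```python
import io

import pandas as pd

# First three rows of the table above, written as comma-separated text
csv_text = """S.N,Country,Hours,Salary,House
0,France,34.0,12000.0,No
1,Spain,37.0,49000.0,Yes
2,Germany,20.0,34000.0,No
"""

# Default: pandas adds its own 0..n-1 index and keeps S.N as a regular column
plain = pd.read_csv(io.StringIO(csv_text))

# index_col: use the S.N column as the row index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col="S.N")

print(plain.columns.tolist())
print(indexed.columns.tolist())
print(indexed.index.name)
```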
In [19]: column_name = df.columns
         column_name
Out[19]: Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
In [20]: # in bigger datasets, you may want to automatically check if a column has data type 'object'
         # check the data in one column (here "F"); df.columns.dtype would only
         # describe the column labels, not the values stored in the column
         column_type = df["F"].dtype
         if column_type == 'object':
             print('The column contains string data')
         else:
             print('The column does not contain string data')
The column contains string data
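To run the same kind of check across every column of a DataFrame, one option is to loop over its dtypes. A minimal sketch on a small stand-in frame; the frame and the name object_columns are illustrative, and the F column is created with dtype="object" explicitly so the result does not depend on the pandas version:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "A": [1.0, 1.0],
        "D": np.array([3, 3], dtype="int32"),
        "F": pd.Series(["foo", "foo"], dtype="object"),
    }
)

# dtypes is a Series mapping each column label to its dtype
object_columns = [name for name, dtype in frame.dtypes.items() if dtype == "object"]
print(object_columns)

# select_dtypes returns only the columns of the requested dtype
selected = frame.select_dtypes(include="object").columns.tolist()
print(selected)
```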
In pandas, axis=0 refers to the rows (the index) and axis=1 refers to the columns.
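The axis convention is easiest to see on a small example. The sketch below uses a hypothetical two-column frame: an aggregation with axis=0 runs down the rows and gives one result per column, while axis=1 runs across the columns and gives one result per row.

```python
import pandas as pd

frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0: operate down the rows, giving one result per column
column_totals = frame.sum(axis=0)

# axis=1: operate across the columns, giving one result per row
row_totals = frame.sum(axis=1)

# drop follows the same convention: axis=1 removes a column, axis=0 removes a row
without_y = frame.drop("y", axis=1)
without_first_row = frame.drop(0, axis=0)

print(column_totals.tolist())
print(row_totals.tolist())
print(without_y.columns.tolist(), without_first_row.index.tolist())
```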
In [21]: d = {
             "A": 1.0,
             "B": pd.Timestamp("20130102"),
             "C": pd.Series(1, index=list(range(4)), dtype="float32"),
             "D": np.array([3] * 4, dtype="int32"),
             "E": pd.Categorical(["test", "train", "test", "train"]),
             "F": "foo",
         }
         d
Out[21]: {'A': 1.0,
'B': Timestamp('2013-01-02 00:00:00'),
'C': 0 1.0
1 1.0
2 1.0
3 1.0
dtype: float32,
'D': array([3, 3, 3, 3]),
'E': ['test', 'train', 'test', 'train']
Categories (2, object): ['test', 'train'],
'F': 'foo'}
In [22]: pd.DataFrame(data=d)
Out[22]:
A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo
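In the dictionary above, the scalar values (1.0, the Timestamp and "foo") are repeated to match the four-row index supplied by the Series and array entries. A short sketch of that behavior follows; when every value is a scalar there is no length to infer, so pandas raises a ValueError unless an index is passed. The variable names are illustrative:

```python
import pandas as pd

# A scalar is broadcast to the length of the other entries
frame = pd.DataFrame(
    {"A": 1.0, "C": pd.Series(1, index=list(range(4)), dtype="float32")}
)
print(frame.shape)

# With only scalars, pandas cannot infer the number of rows
try:
    pd.DataFrame({"A": 1.0, "F": "foo"})
    all_scalars_failed = False
except ValueError:
    all_scalars_failed = True
print(all_scalars_failed)

# Passing an index resolves the ambiguity
fixed = pd.DataFrame({"A": 1.0, "F": "foo"}, index=[0, 1])
print(fixed.shape)
```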
In [23]: df.A
Out[23]: 0    1.0
         1    1.0
         2    1.0
         3    1.0
         Name: A, dtype: float64
In [24]: df.B
Out[24]: 0   2013-01-02
         1   2013-01-02
         2   2013-01-02
         3   2013-01-02
         Name: B, dtype: datetime64[ns]
In [25]: df [A]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_84444\4086650841.py in <module>
----> 1 df [A]
NameError: name 'A' is not defined
In [26]: # column labels are strings, so the label must be written in quotes
         df["A"]
Out[26]: 0    1.0
         1    1.0
         2    1.0
         3    1.0
         Name: A, dtype: float64
In [27]: df.columns
Out[27]: Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
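Both df.A and df["A"] return the same column here, but attribute access only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A minimal sketch with two hypothetical column labels chosen to show the difference:

```python
import pandas as pd

frame = pd.DataFrame({"count": [1, 2], "my col": [3, 4]})

# Bracket access works for any label
print(frame["count"].tolist())
print(frame["my col"].tolist())

# Attribute access finds the DataFrame.count method, not the column
print(callable(frame.count))
```

For this reason, bracket access is the safer default in scripts that handle arbitrary column names.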
In [28]: df.index
Out[28]: Int64Index([0, 1, 2, 3], dtype='int64')
In [29]: import pandas as pd
         data = pd.read_excel('B Example 1 - Data Encoding.xlsx')

         data['Country']
Out[29]: 0 France
1 Spain
2 Germany
3 Spain
4 Germany
5 France
6 Spain
7 France
8 Germany
9 France
Name: Country, dtype: object
In [30]: type(data['Country'])
Out[30]: pandas.core.series.Series
In [31]: data[['Country']]
Out[31]:
Country
0 France
1 Spain
2 Germany
3 Spain
4 Germany
5 France
6 Spain
7 France
8 Germany
9 France
In [32]: type(data[['Country']])
Out[32]: pandas.core.frame.DataFrame
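Single brackets with one label return a Series, while double brackets pass a list of labels and return a DataFrame, even when the list has only one element. A short sketch on a stand-in frame built from the first rows of the table above (the variable names are illustrative):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Country": ["France", "Spain", "Germany"],
        "Hours": [34.0, 37.0, 20.0],
    }
)

as_series = data["Country"]     # one label -> Series (one-dimensional)
as_frame = data[["Country"]]    # list of labels -> DataFrame (two-dimensional)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# A list with several labels selects several columns at once
two_columns = data[["Country", "Hours"]]
print(two_columns.columns.tolist())
```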