Essential Python Libraries and Frameworks
https://www.amazon.com/dp/B0BW2MGYG4
Murat Durmus
A Hands-On Introduction to
Essential Python
Libraries and Frameworks
(With Code Samples)
Copyright © 2023 Murat Durmus
All rights reserved. No part of this publication may be reproduced, distributed, or
transmitted in any form or by any means, including photocopying, recording, or other
electronic or mechanical methods, without the prior written permission of the
publisher, except in the case of brief quotations embodied in critical reviews and certain
other noncommercial uses permitted by copyright law.
Cover design:
Murat Durmus
▪ LinkedIn: https://www.linkedin.com/in/ceosaisoma/
▪ E-Mail: [email protected]
Note:
The code examples and their description in this book were written
with the support of ChatGPT (OpenAI).
"Python is not just a language,
it's a community
where developers can learn,
collaborate and create wonders."
- Guido van Rossum
(Creator of Python)
A BRIEF HISTORY OF PYTHON PROGRAMMING LANGUAGE
PANDAS
    Pros and Cons
NUMPY
    Pros and Cons
SEABORN
    Pros and Cons
SCIPY
    Pros and Cons
MATPLOTLIB
    Pros and Cons
SCIKIT-LEARN
    Pros and Cons
PYTORCH
    Pros and Cons
TENSORFLOW
    Pros and Cons
XGBOOST
    Pros and Cons
LIGHTGBM
    Pros and Cons
KERAS
    Pros and Cons
PYCARET
    Pros and Cons
MLOPS
MLFLOW
    Pros and Cons
KUBEFLOW
    Pros and Cons
ZENML
    Pros and Cons
EXPLAINABLE AI
SHAP
    Pros and Cons
LIME
    Pros and Cons
INTERPRETML
    Pros and Cons
SPACY
    Pros and Cons
NLTK
    Pros and Cons
TEXTBLOB
    Pros and Cons
CORENLP
    Pros and Cons
GENSIM
    Pros and Cons
REGEX
    Pros and Cons
IMAGE PROCESSING
SELENIUM
    Pros and Cons
A PRIMER TO THE 42 MOST COMMONLY USED MACHINE LEARNING ALGORITHMS (WITH CODE SAMPLES)
MINDFUL AI
INSIDE ALAN TURING: QUOTES & CONTEMPLATIONS
A BRIEF HISTORY OF PYTHON
PROGRAMMING LANGUAGE
Python is a popular high-level programming language used
for a wide range of applications, including web development,
scientific computing, data analysis, and machine learning.
Its simplicity, readability, and versatility have made it a
common choice for programmers at every level of expertise.
Here is a brief history of the Python programming language.
At a glance:
DATA SCIENCE
Data science is an interdisciplinary field that involves
extracting, analyzing, and interpreting large, complex data
sets. It combines elements of statistics, computer science,
and domain expertise to extract insights and knowledge
from data.
PANDAS
pandas is an open-source data manipulation and
analysis library for the Python programming language. It
provides data structures for efficiently storing and
manipulating large data sets, chiefly the Series and
DataFrame types, as well as a variety of tools
for data analysis, cleaning, and preprocessing.
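A minimal sketch of the kind of workflow described above, assuming pandas is installed. The column names and values are invented for illustration and do not come from the book's own example.

```python
import pandas as pd

# Hypothetical sales records; the names and numbers are illustrative only.
data = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 3, None, 7, 5],
    "price": [2.5, 4.0, 2.5, 4.0, 3.0],
})

# Cleaning: replace the missing unit count with 0.
data["units"] = data["units"].fillna(0)

# Preprocessing: derive a revenue column from existing columns.
data["revenue"] = data["units"] * data["price"]

# Analysis: total revenue per region.
revenue_by_region = data.groupby("region")["revenue"].sum()

print(data.head())
print(revenue_by_region)
```

The same three steps (clean, derive, aggregate) tend to recur in most tabular workflows, whatever the actual data source is.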
print(data.head())
Cons:
NUMPY
NumPy is a Python library for numerical computing. It
provides powerful data structures, such as n-dimensional
arrays or "ndarrays", and a wide range of mathematical
functions for working with these arrays efficiently.
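A short sketch of element-wise arithmetic and slicing on ndarrays, assuming NumPy is installed. The variable names are chosen here for illustration, and the script is not necessarily the one the book used.

```python
import numpy as np

# One-dimensional array and element-wise operations.
arr = np.array([1, 2, 3, 4, 5])
doubled = arr * 2      # multiply every element by 2
squared = arr ** 2     # square every element
sines = np.sin(arr)    # apply a math function element-wise

# Two-dimensional array and slicing.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sub = arr2d[:2, 1:]    # first two rows, last two columns

print("Original array:", arr)
print("Array multiplied by 2:", doubled)
print("Array squared:", squared)
print("Array sine values:", sines)
print("Original 2D array:")
print(arr2d)
print("Subarray:")
print(sub)
```

No explicit loop appears anywhere: each operation is applied to the whole array at once, which is where most of NumPy's efficiency comes from.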
Output:
Original array: [1 2 3 4 5]
Array multiplied by 2: [ 2 4 6 8 10]
Array squared: [ 1 4 9 16 25]
Array sine values: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
Original 2D array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Subarray:
[[2 3]
 [5 6]]
Cons:
SEABORN
Seaborn is a Python data visualization library built on top
of Matplotlib. It provides a high-level interface for creating
informative and attractive statistical graphics in Python.
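import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Load dataset
df = pd.read_csv('my_dataset.csv')
# Show plot (current Seaborn releases do not expose sns.plt; use Matplotlib directly)
plt.show()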
Cons:
SCIPY
SciPy is an open-source scientific computing library for
Python that provides a collection of functions for
mathematics, science, and engineering. It is built on top of
the NumPy library, which provides efficient array
operations for numerical computing.
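As one illustration of these functions, the sketch below performs numerical integration with scipy.integrate.quad, assuming SciPy and NumPy are installed. The integrand and the interval are arbitrary choices made here, not taken from the book.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative integrand: f(x) = exp(-x^2). Any callable of one variable works.
def integrand(x):
    return np.exp(-x ** 2)

# Integrate from 0 to 1; quad returns the estimate and an absolute error bound.
result, error = quad(integrand, 0, 1)

# Print result
print("Result:", result)
print("Error:", error)
```

Returning an error estimate alongside the value makes it possible to check whether the default tolerances are adequate for the problem at hand.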
# Print result
print("Result:", result)
print("Error:", error)
Cons:
MATPLOTLIB
Matplotlib is a popular data visualization library for the
Python programming language. It provides a way to create
a wide range of static, animated, and interactive
visualizations in Python.
Finally, we add some labels and a title to the plot using the
set_xlabel, set_ylabel, and set_title functions. We then
use the show function to display the plot.
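The labelling steps described above can be sketched as follows, assuming Matplotlib and NumPy are installed. The sine-wave data, the label text, and the headless "Agg" backend are assumptions made here so that the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: one period of a sine wave.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the line.
fig, ax = plt.subplots()
ax.plot(x, y)

# Label the axes and add a title.
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("Sine wave")

# With a non-interactive backend the figure is rendered to an in-memory PNG.
# On an interactive backend, plt.show() would display it in a window instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```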
Cons:
MACHINE LEARNING
Machine learning is a subfield of artificial intelligence that
develops algorithms that can automatically learn and
improve from data.
SCIKIT-LEARN
scikit-learn (imported as sklearn) is a popular
machine learning library for the Python programming
language. It provides a range of supervised and
unsupervised learning algorithms for data
analysis tasks such as classification, regression, clustering,
and dimensionality reduction.
• Naive Bayes
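A minimal sketch of a supervised classification workflow, assuming scikit-learn is installed. The Iris dataset, the Gaussian Naive Bayes classifier, and the 70/30 split are illustrative choices made here rather than the book's own example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load a small built-in dataset (150 flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the classifier on the training split.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict labels for unseen samples and measure accuracy.
predicted_y = model.predict(X_test)
print(predicted_y)
print("Accuracy:", accuracy_score(y_test, predicted_y))
```

Most scikit-learn estimators share this fit/predict interface, so swapping in a different algorithm usually means changing only the line that constructs the model.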
print(predicted_y)
Cons:
PYTORCH
PyTorch is a popular open-source machine learning library
for the Python programming language. It is primarily used
for developing deep learning models and provides a range
of tools and features for building, training, and deploying
neural networks.
transforms.Normalize((0.1307,), (0.3081,))])
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
correct = 0
total = 0
# Evaluate on the test set without tracking gradients
with torch.no_grad():
    for data in testloader:
        inputs, labels = data
        inputs = inputs.view(-1, 28*28)  # flatten each 28x28 image
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)  # index of the highest score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy: {correct / total}")
Cons:
TENSORFLOW
TensorFlow is a popular open-source machine learning
library developed by Google. It is primarily used for
building and training deep neural networks, although it
also includes a range of tools and features for other
machine learning tasks.
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
Cons:
XGBOOST
XGBoost is an open-source software library which provides
a gradient boosting framework for machine learning. It
was developed by Tianqi Chen and his colleagues at the
University of Washington and is now maintained by DMLC.
XGBoost is designed to be scalable, portable and efficient,
making it popular for use in a wide range of applications,
including prediction, classification, and ranking problems
in industry and academia.
Cons:
LIGHTGBM
LightGBM ("Light Gradient Boosting Machine") is a gradient
boosting framework that uses tree-based learning
algorithms. Developed by Microsoft, it is designed to be
efficient and fast on large-scale data and can handle
millions of rows and thousands of features.
Cons:
KERAS
Keras is a high-level neural networks API, written in Python
and capable of running on top of popular deep learning
frameworks such as TensorFlow. Keras was designed to
enable fast experimentation with deep neural networks,
and it has become one of the most popular deep learning
libraries. It is particularly well-suited for building and
training deep learning models for computer vision and
natural language processing (NLP) tasks. Keras is open-
source and is maintained by a community of contributors
on GitHub.
Cons:
PYCARET
PyCaret is an open-source machine learning library in
Python that automates the end-to-end machine learning
process. It is designed to be an easy-to-use library that
requires minimal coding effort while providing maximum
flexibility and control to the user. PyCaret has a wide range
of features, including data preprocessing, classification,
regression, clustering, anomaly detection, natural
language processing, time series forecasting, and model
deployment.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models
# load data
data = get_data('diabetes')
# setup model
clf = setup(data, target='Class variable')
# compare models
compare_models()
Cons:
MLOPS
MLOps (Machine Learning Operations) is a set of practices
and tools that streamline the machine learning (ML)
development lifecycle, from development to deployment
and maintenance.
MLFLOW
MLflow is an open-source platform for managing and
tracking machine learning experiments. It provides a
simple and flexible interface for tracking experiments,
packaging code into reproducible runs, and sharing and
deploying models.
# Define a model
model = LinearRegression()
Cons:
KUBEFLOW
Kubeflow is an open-source platform for running machine
learning workloads on Kubernetes. Kubernetes is a
command=['python', '/app/load_data.py'],
arguments=[
    '--data-path', data_path,
    '--output-path', '/mnt/data/raw_data.csv'
]
)
pipeline_filename = pipeline_func.__name__ + '.yaml'
kfp.compiler.Compiler().compile(pipeline_func, pipeline_filename)
Cons:
ZENML
ZenML is an open-source MLOps framework that provides
a pipeline-based approach to managing end-to-end
machine learning workflows. It is designed to
simplify the development and deployment of machine
learning models by providing a high-level API for common
machine learning tasks.
from zenml.steps.preprocesser.standard_scaler import StandardScaler
from zenml.steps.splitter.random_split import RandomSplit
from zenml.steps.trainer.tf_trainer import TFTrainer
from zenml.backends.orchestrator.tf_local_orchestrator import TFLocalOrchestrator
# Define splitter
split = RandomSplit(split_map={'train': 0.7, 'eval': 0.2, 'test': 0.1})
# Define preprocesser
preprocesser = StandardScaler()
# Define trainer
trainer = TFTrainer(
    loss='categorical_crossentropy',
    last_activation='softmax',
    epochs=10,
    batch_size=32
)
# Define evaluator
evaluator = TFEvaluator()
# Define pipeline
pipeline = SimplePipeline(
    datasource=ds,
    splitter=split,
    preprocesser=preprocesser,
    trainer=trainer,
    evaluator=evaluator,
    name='my-pipeline'
)
# Define orchestrator
orchestrator = TFLocalOrchestrator()
# Run pipeline
orchestrator.run(pipeline)
Cons:
EXPLAINABLE AI
Explainable AI (XAI) is a set of techniques and practices
that aim to make machine learning models and their
decisions more transparent and understandable to
humans.
SHAP
SHAP (SHapley Additive exPlanations) is a popular open-
source library for interpreting and explaining the
predictions of machine learning models. SHAP is based on
the concept of Shapley values, which are a method from
cooperative game theory used to determine the
contribution of each player to a cooperative game. In the
context of machine learning, SHAP computes the
contribution of each feature to a particular prediction,
providing insight into how the model is making its
predictions.
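To make the idea of a Shapley value concrete, the sketch below computes exact values by brute force for a tiny hypothetical model. It does not use the shap library or its API. The model, the instance, and the all-zero baseline are assumptions made here for illustration, and enumerating every ordering is only feasible for a handful of features, which is why the library relies on far more efficient estimators.

```python
from itertools import permutations

def shapley_values(value, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical model f(x) = 2*x1 + 3*x2 + x1*x3, explained at one instance.
instance = {"x1": 1.0, "x2": 2.0, "x3": 4.0}
baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}

def model(x):
    return 2 * x["x1"] + 3 * x["x2"] + x["x1"] * x["x3"]

def value(coalition):
    # Features in the coalition take the instance's value; the rest stay at baseline.
    x = {k: (instance[k] if k in coalition else baseline[k]) for k in instance}
    return model(x)

phi = shapley_values(value, list(instance))
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term x1*x3 is split equally between the two features involved.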
Finally, we plot the SHAP values for the first instance using
the waterfall function from the shap.plots module. This
generates a waterfall plot showing the contribution of
each feature to the model's prediction for the first
instance.
Cons:
LIME
LIME (Local Interpretable Model-Agnostic
Explanations) is an open-source library for explaining the
predictions of machine learning models. Like SHAP,
LIME provides a way to understand how a model is making
its predictions by generating explanations for individual
instances. However, while SHAP attributes a prediction to
features using Shapley values, which can also be aggregated
into global feature importance measures, LIME fits a simple
interpretable surrogate model around a single instance, so
its explanations are local to that instance.
Cons:
INTERPRETML
InterpretML is an open-source Python library for
interpreting and explaining machine learning models. It
provides a range of tools and techniques for
understanding how a model is making its predictions,
including global feature importance, local explanations,
and counterfactual reasoning. The library is designed to be
model-agnostic and can be used with a wide range of
machine learning models, including regression,
classification, and time series models.
ebm = ExplainableBoostingClassifier(random_state=42)
ebm.fit(X, y)
Cons:
TEXT PROCESSING
Text processing is the analysis and manipulation of textual data
to extract useful information or insights. It involves various
techniques and tools, including natural language
processing (NLP), machine learning, and statistical
analysis.
SPACY
spaCy is an open-source library for advanced natural
language processing (NLP) in Python. It provides a wide
range of NLP capabilities, including tokenization, part-of-
speech tagging, named entity recognition, dependency
parsing, and more. spaCy is designed to be fast, efficient,
and user-friendly, making it a popular choice for
developing NLP applications.
# Text to process
text = "Apple is looking at buying U.K. startup for $1 billion"
Cons:
NLTK
NLTK stands for Natural Language Toolkit. It is a popular
open-source library for natural language processing (NLP)
tasks in Python. It provides a wide range of functionalities
for processing human language such as tokenization,
stemming, lemmatization, POS tagging, and more. It also
includes a number of pre-built corpora and resources for
training machine learning models for NLP tasks. NLTK is
widely used for various applications such as text
classification, sentiment analysis, machine translation, and
information extraction.
# sample text
text = "This is an example sentence for tokenization."
Output:
['This', 'is', 'an', 'example', 'sentence', 'for', 'tokenization', '.']
Cons:
TEXTBLOB
TextBlob is a popular open-source Python library
for processing textual data. It provides a simple API
for natural language processing tasks like sentiment
analysis, part-of-speech tagging, noun phrase extraction,
and more. It is built on top of the Natural Language Toolkit
(NLTK) library and provides an easy-to-use interface for
text processing.
# Sentiment Analysis
sentiment_polarity = blob.sentiment.polarity
sentiment_subjectivity = blob.sentiment.subjectivity
print("Sentiment Polarity:", sentiment_polarity)
print("Sentiment Subjectivity:", sentiment_subjectivity)
# Text Translation
# Note: translate() was deprecated and later removed from TextBlob,
# so this call only works on older releases.
translation = blob.translate(to='fr')
print("Translation to French:", translation)
Pros:
Cons:
CORENLP
Python CoreNLP is a Python wrapper for Stanford
CoreNLP, a Java-based natural language processing toolkit
developed by Stanford University. It provides a set of tools
for various natural language processing tasks such as part-
of-speech tagging, named entity recognition, dependency
parsing, sentiment analysis, and more. It can be used to
analyze and extract information from text data in different
formats like plain text, HTML, and XML.
nlp = StanfordCoreNLP(r'/path/to/corenlp', memory='8g')
Output:
John PERSON
Google ORGANIZATION
California STATE_OR_PROVINCE
In this example, we first import the StanfordCoreNLP class
from the stanfordcorenlp package. Then, we create a
Cons:
GENSIM
Gensim is an open-source library for unsupervised topic
modeling and natural language processing. It provides a
suite of algorithms and models for tasks such as document
similarity analysis, document clustering, and topic
modeling. The library is designed to be scalable and
efficient, with support for streaming data and distributed
computing.
Output:
[(0,
'0.082*"and" + 0.082*"broccoli" + 0.082*"eat"
+ 0.082*"to" + 0.082*"bananas" + 0.060*"i" +
Cons:
REGEX
Python's built-in re module provides support for regular
expressions, a powerful tool for pattern matching and text
processing. A regular expression is a sequence of
characters that defines a search pattern. The module's
functions, together with the metacharacters of the pattern
syntax, allow us to search and manipulate strings using
complex patterns. It is widely used for
text manipulation tasks such as string matching, searching,
parsing, and replacing.
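A short sketch of searching, extracting, and replacing with the re module. The sample text and the two patterns are hypothetical and deliberately simple; real phone and email formats vary far more than these patterns allow.

```python
import re

text = "Contact Ana at ana@example.com or 555-123-4567; Bob is at bob@example.org."

# Illustrative patterns only; they are not robust validators.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
email_pattern = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# search() returns the first match object, or None if nothing matches.
phone_match = phone_pattern.search(text)

# findall() returns every non-overlapping match as a list of strings.
emails = email_pattern.findall(text)

# sub() replaces each match; here phone numbers are masked.
masked = phone_pattern.sub("XXX-XXX-XXXX", text)

if phone_match:
    print("Phone number found:", phone_match.group())
print("Emails found:", emails)
print(masked)
```

Compiling a pattern once with re.compile is a common choice when the same pattern is applied repeatedly, since the compiled object can be reused across calls.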
else:
    print("Phone number not found.")
if email_match:
    print("Email found:", email_match.group())
else:
    print("Email not found.")
Output:
Cons:
IMAGE PROCESSING
Image processing analyzes and manipulates digital images
to extract useful information or improve their quality. It
involves various techniques and tools, including computer
vision, machine learning, and signal processing.
OPENCV
OpenCV (Open Source Computer Vision Library) is a library
of programming functions mainly aimed at real-time
computer vision. It provides many useful and powerful
algorithms and techniques for computer vision and
machine learning applications, including image and video
processing, object detection and recognition, camera
calibration, and more.
while True:
    # Read a frame from the camera
    ret, frame = cap.read()
Cons:
SCIKIT-IMAGE
Python scikit-image is an open-source image processing
library that provides algorithms for image processing and
computer vision tasks such as filtering, segmentation,
object detection, and more. It is built on top of the
scientific Python ecosystem, including NumPy, SciPy, and
matplotlib.
# Load image
image = io.imread('example.jpg', as_gray=True)
Cons:
PILLOW
Pillow is a popular Python library used for image
processing tasks. It is a fork of the Python Imaging Library
(PIL) and supports many of its features, while also
including additional functionality and bug fixes. Pillow
provides a comprehensive set of functions for opening,
manipulating, and saving image files in a wide variety of
formats, including BMP, PNG, JPEG, TIFF, and GIF.
Cons:
MAHOTAS
Python Mahotas is an image processing library that
provides a set of algorithms for image processing and
computer vision tasks. It is built on top of numpy and scipy
and provides functions to perform operations like filtering,
segmentation, feature extraction, morphology, and other
image processing tasks.
• Watershed segmentation
import mahotas as mh
import numpy as np
from skimage import data
# Apply thresholding
thresh = mh.thresholding.otsu(image)
# Label regions
labeled, nr_objects = mh.label(image > thresh)
# Display results
print("Number of objects:", nr_objects)
for region in regions:
    print("Object:", region.label)
    print("Area:", region.area)
    print("Perimeter:", region.perimeter)
    print("Eccentricity:", region.eccentricity)
    print("Intensity mean:", region.mean_intensity)
    print("")
Cons:
SIMPLEITK
SimpleITK is a high-level interface to the Insight
Segmentation and Registration Toolkit (ITK). It is a Python
library used for image processing, analysis, and computer
vision tasks. SimpleITK allows for easy manipulation of
images, such as filtering, segmentation, registration, and
feature extraction.
# Read an image
image = sitk.ReadImage("image.nii")
size = image.GetSize()
Cons:
WEB FRAMEWORK
A web framework is a software framework designed to
simplify the development of web applications by providing
a set of reusable components and tools for building and
managing web-based projects. It provides a standardized
way to build and deploy web applications by providing a
structure, libraries, and pre-written code to handle
everyday tasks such as request handling, routing, form
processing, data validation, and database access.
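To show what request handling and routing amount to underneath a framework, here is a minimal sketch built only on the standard library's WSGI support. The route table, the handler names, and the call helper are hypothetical; real frameworks add much more on top (templating, validation, database access).

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical handlers: each returns a status line and a response body.
def home(environ):
    return "200 OK", "Hello, World!"

def about(environ):
    return "200 OK", "About this site"

# Route table mapping URL paths to handler functions.
ROUTES = {"/": home, "/about": about}

def application(environ, start_response):
    """A WSGI callable: look up the handler for the path and build the response."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, body = "404 Not Found", "Not found"
    else:
        status, body = handler(environ)
    payload = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]

def call(path):
    """Invoke the app in-process with a fake request, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")

print(call("/"))
print(call("/missing"))
```

Frameworks such as Flask and Pyramid expose the same kind of WSGI callable, which is what lets them run behind any WSGI-compatible server.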
FLASK
Flask is a micro web framework written in Python. It is
classified as a microframework because it does not require
particular tools or libraries. It has no database abstraction
layer, form validation, or any other components where
pre-existing third-party libraries provide common
functions. However, Flask supports extensions that can
add application features as if they were implemented in
Flask itself. There are extensions for object-relational
mappers, form validation, upload handling, various open
authentication technologies, and more.
from flask import Flask
app = Flask(__name__)
# Map the root URL to a view function
@app.route('/')
def hello():
    return 'Hello, World!'
if __name__ == '__main__':
    app.run()
Cons:
FASTAPI
FastAPI is a modern, fast (high-performance) web
framework for building APIs with Python 3.6+ based on
standard Python type hints. It is designed to be easy to use
and understand, with a focus on developer productivity
and code quality.
app = FastAPI()
@app.get("/")
This command starts the server with the main module and
app instance as the application. The --reload option will
automatically reload the server on code changes.
Cons:
DJANGO
Django is a high-level Python web framework that allows
for rapid development of secure and maintainable
websites. It follows the model-template-view (MTV)
pattern, Django's variant of model-view-controller (MVC),
and provides an extensive set of tools
and libraries for handling common web development tasks
such as URL routing, form validation, and database schema
migrations.
def hello(request):
    return HttpResponse("Hello, World!")
urlpatterns = [
    path('hello/', views.hello, name='hello'),
]
urlpatterns = [
    path('admin/', admin.site.urls),
    path('myapp/', include('myapp.urls')),
]
7. Start the Django server by running the command
python manage.py runserver in your command
prompt or terminal.
Cons:
DASH
Dash is a web application framework for building
interactive, web-based dashboards. It is built on top of
Flask, Plotly.js, and React.js, which makes it easy to build
complex, data-driven web applications. Dash lets
users create dashboards with interactive
graphs, tables, and widgets in Python, without needing to
write HTML, CSS, or JavaScript directly.
# Dash 2.x import style (older releases used dash_core_components and dash_html_components)
import dash
from dash import dcc, html
app = dash.Dash()
app.layout = html.Div(children=[
    html.H1(children='Hello Dash'),
    html.Div(children='''
        Dash: A web application framework for Python.
    '''),
    dcc.Graph(
        id='example-graph',
        figure={
            'data': [
                {'x': [1, 2, 3], 'y': [4, 1, 2], 'type': 'bar', 'name': 'SF'},
                {'x': [1, 2, 3], 'y': [2, 4, 5], 'type': 'bar', 'name': u'Montréal'},
            ],
            'layout': {
                'title': 'Dash Data Visualization'
            }
        }
    )
])
if __name__ == '__main__':
    app.run_server(debug=True)
Cons:
PYRAMID
Pyramid is a lightweight web framework that aims to make
web application development more accessible through a
simple and flexible approach. It is
easy to learn and use, is based on the WSGI standard, and
provides many features, including URL routing,
templating, authentication, and database integration.
Then, create a new file called app.py and add the following
code:
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response
def home(request):
    return Response('Hello, Pyramid!')
if __name__ == '__main__':
    with Configurator() as config:
        config.add_route('home', '/')
        config.add_view(home, route_name='home')
        app = config.make_wsgi_app()
    server = make_server('localhost', 8000, app)
    print('Server running at http://localhost:8000')
    server.serve_forever()
Cons:
WEB SCRAPING
Web scraping is the process of extracting data from
websites automatically using software or a script. It
involves fetching web pages, parsing the HTML or XML
content, and extracting useful information from the web
pages, such as text, images, links, and other data.
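The parsing and extraction steps can be sketched with the standard library alone. The example below skips the fetching step and parses a small inline page, so it runs without network access. The LinkExtractor class and the sample HTML are hypothetical and written here for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href="..."> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when an anchor tag opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Store the finished link when the anchor closes.
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# Inline sample page so the sketch needs no HTTP request.
page = """
<html><body>
  <h1>News</h1>
  <a href="/world">World</a>
  <a href="/tech">Tech <b>today</b></a>
  <p>No link here.</p>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```

Dedicated scraping libraries wrap this kind of event-driven parsing in a tree that can be searched by tag, attribute, or CSS selector, which is usually more convenient for real pages.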
BEAUTIFULSOUP
BeautifulSoup is a Python library used for web scraping
purposes to pull the data out of HTML and XML files. It
creates a parse tree from page source code that can be
used to extract data in a hierarchical and more readable
manner.
import requests
from bs4 import BeautifulSoup
url = "https://www.nytimes.com/"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# Keep the first five <article> elements
articles = soup.find_all('article')[:5]
print(title)
print(link)
print()
Output:
https://www.nytimes.com/2022/01/20/nyregion/new-york-city-vaccine-mandate.html
https://www.nytimes.com/2022/01/20/business/wall-street-banks-q4-earnings.html
https://www.nytimes.com/2022/01/20/us/politics/afghanistan-refugees.html
https://www.nytimes.com/2022/01/20/world/europe/eu-russia-ukraine.html
https://www.nytimes.com/2022/01/20/us/politics/elliott-abrams-dead.html
Cons:
SCRAPY
Scrapy is an open-source web crawling framework that is
used to extract data from websites. It is built on top of the
Twisted framework and provides an easy-to-use API for
crawling web pages and extracting information. Scrapy is
designed to handle large-scale web crawling tasks and can
be used to extract data for a wide range of applications,
including data mining, information processing, and even
for building intelligent agents.
import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]
    def parse(self, response):
        # Quote extraction is not shown in this excerpt.
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
This spider defines the name of the spider, the starting URL
to scrape, and a parse method which is responsible for
extracting the quotes from each page and following links
to the next page if they exist.
Cons:
SELENIUM
Selenium is a library that enables web automation and
testing by providing a way to interact with web pages
programmatically. It allows developers to automate web
browsers, simulate user interactions with websites, and
scrape web data.
Cons:
A PRIMER TO THE 42 MOST COMMONLY USED
MACHINE LEARNING ALGORITHMS (WITH CODE SAMPLES)
Available on Amazon:
https://www.amazon.com/dp/B0BT911HDM
Kindle: (B0BT8LP2YW)
Paperback: (ISBN-13: 979-8375226071)
MINDFUL AI
Reflections on Artificial Intelligence
Inspirational Thoughts & Quotes on Artificial Intelligence
(Including 13 illustrations, articles & essays for the fundamental
understanding of AI)
Available on Amazon:
https://www.amazon.com/dp/B0BKMK6HLJ
INSIDE ALAN TURING:
QUOTES & CONTEMPLATIONS
"We can only see a short distance ahead, but we can see
plenty there that needs to be done." ~ Alan Turing
Available on Amazon:
https://www.amazon.com/dp/B09K25RTQ6