Data Ingestion with Python
Cookbook
Gláucia Esppenchutz
BIRMINGHAM—MUMBAI
Data Ingestion with Python Cookbook
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-83763-260-2
www.packtpub.com
This book means a lot to me and wouldn't have been possible without my loving husband, Lincoln, and his support and understanding during this challenging endeavor. I want to thank all my friends who didn't let me give up and always boosted my spirits, along with my grandmother, who always believed in me, helped me, and said I would do big things one day. Finally, I want to thank my beloved four-pawed best friend, Minduim, now at peace, for “helping” me to write this book.
– Gláucia Esppenchutz
Contributors
I want to thank my patient and beloved husband and my friends. Thanks also to my mentors in the
Python open source community and the DataBootCamp founders, who guided me at the beginning
of my journey.
Thanks to the Packt team, who helped me through some hard times; you were terrific!
About the reviewers
Bitthal Khaitan is currently working as a big data and cloud engineer with CVS Health, a Fortune 4 organization. He has a demonstrated history of working in the cloud, data, and analytics industry for over 12 years. His primary certified skills are Google Cloud Platform (GCP), the big data ecosystem (Hadoop, Spark, and so on), and data warehousing on Teradata. He has worked in all phases of the SDLC of DW/BI and big data projects, with strong expertise in the US healthcare, insurance, and retail domains. He actively helps new graduates with mentoring, resume reviews, and job-hunting tips in the data engineering domain. Over 20,000 people follow Bitthal on LinkedIn. He is currently based in Dallas, Texas, USA.
Jagjeet Makhija is a highly accomplished technology leader with over 20 years of experience. They are not only skilled in various domains, including AI, data warehouse architecture, and business analytics, but also have a strong passion for staying ahead of technology trends such as AI and ChatGPT.
Jagjeet is recognized for their significant contributions to the industry, particularly in complex proof
of concepts and integrating Microsoft products with ChatGPT. They are also an avid book reviewer
and have actively shared their extensive knowledge and expertise through presentations, blog articles,
and online forums.
Krishnan Raghavan is an IT professional with over 20 years of experience in the area of software development and delivery excellence across multiple domains and technologies, ranging from C++ to Java, Python, data warehousing, and big data tools and technologies. Krishnan tries to give back to the community by being part of the GDG – Pune volunteer group, helping the team organize events. When not working, Krishnan likes to spend time with his wife and daughter, as well as reading fiction, non-fiction, and technical books. Currently, he is unsuccessfully trying to learn how to play the guitar. You can connect with Krishnan at krishnan@gmail.com or via LinkedIn: www.linkedin.com/in/krishnan-raghavan.
I would like to thank my wife, Anita, and daughter, Ananya, for giving me the time and space to
review this book.
Table of Contents

Preface

Chapter 1, Introduction to Data Ingestion
Technical requirements
Installing Python
Installing PySpark
Configuring Docker for MongoDB
Configuring Docker for Airflow
Creating schemas

Chapter 2, Principles of Data Access – Accessing Your Data
Technical requirements
Implementing governance in a data access workflow
Accessing databases and data warehouses
Accessing SSH File Transfer Protocol (SFTP) files
Retrieving data using API authentication
Managing encrypted files
Accessing data from AWS using S3
Accessing data from GCP using Cloud Storage
Further reading

Chapter 3, Data Discovery – Understanding Our Data before Ingesting It
Technical requirements
Documenting the data discovery process
Configuring OpenMetadata

Chapter 4, Reading CSV and JSON Files and Solving Problems
Technical requirements
Reading a CSV file
Reading a JSON file
Creating a SparkSession for PySpark
Using PySpark to read CSV files
Using PySpark to read JSON files
Further reading

Chapter 5, Ingesting Data from Structured and Unstructured Databases
Technical requirements
Configuring a JDBC connection

Chapter 6, Using PySpark with Defined and Non-Defined Schemas
Technical requirements
Applying schemas to data ingestion

Chapter 7, Ingesting Analytical Data
Technical requirements
Ingesting Parquet files
Ingesting Avro files
Applying schemas to analytical data
Filtering data and handling common issues
Ingesting partitioned data
Applying reverse ETL
Selecting analytical data for reverse ETL

Chapter 8, Designing Monitored Data Workflows

Chapter 9, Putting Everything Together with Airflow
Technical requirements
Installing Airflow
Configuring Airflow
Creating DAGs
Creating custom operators
Configuring sensors
Creating connectors in Airflow

Chapter 10, Logging and Monitoring Your Data Ingest in Airflow
Technical requirements
Installing and running Airflow
Creating basic logs in Airflow
Storing log files in a remote location
Configuring logs in airflow.cfg
Designing advanced monitoring
Using notification operators
Using SQL operators for data quality
Further reading

Chapter 11, Automating Your Data Ingestion Pipelines
Technical requirements
Installing and running Airflow
Scheduling daily ingestions

Chapter 12, Using Data Observability for Debugging, Error Handling, and Preventing Downtime
Technical requirements
Docker images
Setting up StatsD for monitoring
Setting up Prometheus for storing metrics
Creating an observability dashboard
Setting custom alerts or notifications

Index

Each recipe follows the standard subsections described in the Preface: Getting ready, How to do it…, How it works…, There's more…, and See also.
Preface

What this book covers

Chapter 2, Principles of Data Access – Accessing Your Data, explores data access concepts related to data
governance, covering workflows and management of familiar sources such as SFTP servers, APIs,
and cloud providers. It also provides examples of creating data access policies in databases, data
warehouses, and the cloud.
Chapter 3, Data Discovery – Understanding Our Data Before Ingesting It, teaches you the significance of
carrying out the data discovery process before data ingestion. It covers manual discovery, documentation,
and using an open-source tool, OpenMetadata, for local configuration.
Chapter 4, Reading CSV and JSON Files and Solving Problems, introduces you to ingesting CSV and JSON
files using Python and PySpark. It demonstrates handling varying data volumes and infrastructures
while addressing common challenges and providing solutions.
Chapter 5, Ingesting Data from Structured and Unstructured Databases, covers fundamental concepts
of relational and non-relational databases, including everyday use cases. You will learn how to read
and handle data from these models, understand vital considerations, and troubleshoot potential errors.
Chapter 6, Using PySpark with Defined and Non-Defined Schemas, delves deeper into common PySpark
use cases, focusing on handling defined and non-defined schemas. It also explores reading and
understanding complex logs from Spark (PySpark core) and formatting techniques for easier debugging.
Chapter 7, Ingesting Analytical Data, introduces you to analytical data and common formats for reading
and writing. It explores reading partitioned data for improved performance and discusses Reverse
ETL theory with real-life application workflows and diagrams.
Chapter 8, Designing Monitored Data Workflows, covers logging best practices for data ingestion, facilitating error identification and debugging. Techniques such as monitoring file size, row count, and object count feed improved dashboards, alerts, and insights.
Chapter 9, Putting Everything Together with Airflow, consolidates the previously presented information
and guides you in building a real-life data ingestion application using Airflow. It covers essential
components, configuration, and issue resolution in the process.
Chapter 10, Logging and Monitoring Your Data Ingest in Airflow, explores advanced logging and
monitoring in data ingestion with Airflow. It covers creating custom operators, setting up notifications,
and monitoring for data anomalies. Configuration of notifications for tools such as Slack is also covered
to stay updated on the data ingestion process.
Chapter 11, Automating Your Data Ingestion Pipelines, focuses on automating data ingests using
previously learned best practices, enabling reader autonomy. It addresses common challenges with
schedulers or orchestration tools and provides solutions to avoid problems in production clusters.
Chapter 12, Using Data Observability for Debugging, Error Handling, and Preventing Downtime,
explores data observability concepts, popular monitoring tools such as Grafana, and best practices
for log storage and data lineage. It also covers creating visualization graphs to monitor data source
issues using Airflow configuration and data ingestion scripts.
For almost all recipes in this book, you can use a Jupyter notebook to execute the code. Even though it is not mandatory to install it, this tool can help you test the code and experiment with it, thanks to its friendly interface.
If you are using the digital version of this book, we advise you to type the code yourself or access
the code via the GitHub repository (link available in the next section). Doing so will help you
avoid any potential errors related to the copying and pasting of code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Then
we proceeded with the with open statement.”
Any command-line input or output is written as follows:
$ python3 --version
Python 3.8.10
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words
in menus or dialog boxes appear in the text like this. Here is an example: “Then, when we selected
showString at NativeMethodAccessorImpl.java:0, which redirected us to the
Stages page.”
Sections
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How
it works..., There’s more..., and See also).
To give clear instructions on how to complete a recipe, we use these sections as follows:
Getting ready
This section tells you what to expect in the recipe and describes how to set up any software or any
preliminary settings required for the recipe.
How to do it…
This section contains the steps required to follow the recipe.
How it works…
This section usually consists of a detailed explanation of what happened in the previous section.
There’s more…
This section consists of additional information about the recipe in order to make you more knowledgeable
about the recipe.
See also
This section provides helpful links to other useful information for the recipe.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the
subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would
be grateful if you would provide us with the location address or website name. Please contact us at
[email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit authors.packtpub.com.
You can download a free PDF copy of this book at https://packt.link/free-ebook/9781837632602.
In this part, you will be introduced to the fundamentals of data ingestion and data engineering, covering the basic definition of an ingestion pipeline, the common types of data sources, and the technologies involved. This part begins with Chapter 1, Introduction to Data Ingestion.

Introduction to Data Ingestion
Technical requirements
The commands inside the recipes of this chapter use Linux syntax. If you don't use a Linux-based system, you may need to adapt the commands.
You can find the code from this chapter in this GitHub repository: https://github.com/PacktPublishing/Data-Ingestion-with-Python-Cookbook.
Note
Windows users might get an error message such as Docker Desktop requires a newer WSL kernel version. This can be fixed by following the steps here: https://docs.docker.com/desktop/windows/wsl/.
Getting ready
Let’s create a folder for our project:
1. First, open your system command line. Since I use the Windows Subsystem for Linux (WSL),
I will open the WSL application.
2. Go to your home directory and create a folder as follows:
$ mkdir my-project
3. Next, check whether Python is already installed by running the version command:
$ python --version
Depending on your operating system, you might or might not have output here – for example, WSL 20.04 users might have the following output:
Command 'python' not found, did you mean:
  command 'python3' from deb python3
  command 'python' from deb python-is-python3
4. If your Python path is configured to use the python command, you will see output similar to this:
Python 3.9.0
Sometimes, your Python path might be configured to be invoked using python3. You can try it using the following command:
$ python3 --version
5. Now, let’s check our pip version. This check is essential, since some operating systems have
more than one Python version installed:
$ pip --version
If your operating system (OS) uses a Python version below 3.8 or doesn't have the language installed, proceed to the How to do it… steps; otherwise, you are ready to start the following Installing PySpark recipe.
How to do it…
We are going to use the official installer from Python.org. You can find the link for it here: https://www.python.org/downloads/:
1. Download the installer version compatible with your OS from the downloads page.
Note
For Windows users, it is important to check your OS version, since Python 3.10 may not yet be compatible with Windows 7, or with your processor type (32-bit or 64-bit).
2. After downloading the installation file, double-click it and follow the instructions in the wizard window. To avoid complexity, choose the recommended settings displayed.
The following screenshot shows how it looks on Windows:
3. If you are a Linux user, you can install it from the source using the following commands:
$ wget https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz
$ tar -xzf Python-3.9.1.tgz
$ cd Python-3.9.1
$ ./configure --enable-optimizations
$ make -j 9
$ sudo make altinstall
After installing Python, you should be able to execute the pip command. If not, refer to the pip official documentation page here: https://pip.pypa.io/en/stable/installation/.
How it works…
Python is an interpreted language, and its interpreter can be extended with functions written in C or C++. The language package also comes with several built-in libraries and, of course, the interpreter. The interpreter works like a Unix shell and can be found in the /usr/local/bin directory: https://docs.python.org/3/tutorial/interpreter.html.
Lastly, note that many Python third-party packages in this book require the pip command to be
installed. This is because pip (an acronym for Pip Installs Packages) is the default package manager
for Python; therefore, it is used to install, upgrade, and manage the Python packages and dependencies
from the Python Package Index (PyPI).
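As a small illustration (the package choice here is arbitrary and not from the book), pip fetches a package from PyPI that you can then import and use in Python:

# Install a third-party package from PyPI (run in a shell):
# $ pip install requests

import requests

# A trivial request to confirm the package was installed correctly
response = requests.get("https://pypi.org/simple/")
print(response.status_code)  # 200 means PyPI answered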
There’s more…
Even if you don’t have any Python versions on your machine, you can still install them using the
command line or HomeBrew (for macOS users). Windows users can also download them from the
MS Windows Store.
Note
If you choose to download Python from the Windows Store, ensure you use an application
made by the Python Software Foundation.
See also
You can use pip to install convenient third-party applications, such as Jupyter. This is an open source, web-based, interactive (and user-friendly) computing platform, often used by data scientists and data engineers. You can install it from the official website here: https://jupyter.org/install.
Installing PySpark
To process, clean, and transform vast amounts of data, we need a tool that provides resilient and distributed processing, and that's why PySpark is a good fit. It provides an API over the Spark library that lets you use its capabilities from Python.
Getting ready
Before starting the PySpark installation, we need to check the Java version on our operating system:
1. First, check the installed Java version:
$ java -version
If everything is correct, you should see output similar to that shown in step 3, with OpenJDK version 1.8 or higher. However, some systems don't have any Java version installed by default, and to cover this, we need to proceed to step 2.
2. Now, we download the Java Development Kit (JDK).
Go to https://www.oracle.com/java/technologies/downloads/, select your OS, and download the most recent version of the JDK. At the time of writing, it is JDK 19.
The download page of the JDK will look as follows:
Execute the downloaded file by double-clicking it to start the installation process. The following window will appear:
Note
Depending on your OS, the installation window may appear slightly different.
Click Next for the following two questions, and the application will start the installation. You don't need to worry about where the JDK will be installed. By default, the installer is configured to be compatible with other tools' installations.
3. Next, we again check our Java version. When executing the command again, you should see
the following version:
$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-
0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
How to do it…
Here are the steps to perform this recipe:
1. First, install PySpark from PyPI using pip:
$ pip install pyspark
If the command runs successfully, the installation output's last lines will look like this:
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.9.5 pyspark-3.3.2
2. Execute the pyspark command to open the interactive shell. When executing the pyspark
command in your command line, you should see this message:
$ pyspark
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more
information.
22/10/08 15:06:11 WARN Utils: Your hostname, DESKTOP-DVUDB98
resolves to a loopback address: 127.0.1.1; using 172.29.214.162
instead (on interface eth0)
22/10/08 15:06:11 WARN Utils: Set SPARK_LOCAL_IP if you need to
bind to another address
22/10/08 15:06:13 WARN NativeCodeLoader: Unable to load native-
hadoop library for your platform... using builtin-java classes
where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-
defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For
SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.2
      /_/
You can observe some interesting messages here, such as the Spark version and the Python version used by PySpark.
3. Finally, we exit the interactive shell as follows:
>>> exit()
$
How it works…
As seen at the beginning of this recipe, Spark is a robust framework that runs on top of the JVM. It is also an open source tool for creating resilient, distributed processing over vast amounts of data. With
the growth in popularity of the Python language in the past few years, it became necessary to have a
solution that adapts Spark to run alongside Python.
PySpark is an interface that interacts with Spark APIs via Py4J, dynamically allowing Python code to
interact with the JVM. We first need to have Java installed on our OS to use Spark. When we install
PySpark, it already comes with Spark and Py4J components installed, making it easy to start the
application and build the code.
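As a quick smoke test, here is a minimal sketch (not taken from the book; the application name is arbitrary) that starts a local SparkSession through the Py4J bridge and runs a tiny DataFrame operation:

from pyspark.sql import SparkSession

# Build (or reuse) a local SparkSession; Py4J forwards these calls to the JVM
spark = (
    SparkSession.builder
    .appName("pyspark-smoke-test")  # arbitrary application name
    .master("local[*]")             # use all local cores
    .getOrCreate()
)

# Create a tiny DataFrame to confirm the installation works end to end
df = spark.createDataFrame([(1, "ingest"), (2, "monitor")], ["id", "task"])
df.show()

spark.stop()

If the two-row table prints, Spark, Py4J, and Python are wired together correctly.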
There’s more…
Anaconda is a convenient way to install PySpark and other data science tools. This tool encapsulates all the manual processes and has a friendly interface for interacting with and installing Python components, such as NumPy, pandas, or Jupyter.
For more detailed information about how to install Anaconda and other powerful commands, refer to https://docs.anaconda.com/.
It is possible to configure and use virtualenv with PySpark, and Anaconda does it automatically if you choose this type of installation. However, for the other installation methods, we need to take some additional steps to make our Spark cluster (local or on a server) use it, which include indicating the virtualenv's /bin/ folder and where your PySpark path is.
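As a rough illustration of those steps (the virtualenv path here is hypothetical), you can point both the driver and the executors at the virtualenv's interpreter through environment variables before creating the session:

import os
from pyspark.sql import SparkSession

# Hypothetical virtualenv location; replace with your own /bin/ folder
VENV_PYTHON = "/home/user/my-venv/bin/python"

# Tell Spark which Python interpreter the driver and executors should use
os.environ["PYSPARK_PYTHON"] = VENV_PYTHON
os.environ["PYSPARK_DRIVER_PYTHON"] = VENV_PYTHON

spark = SparkSession.builder.master("local[*]").getOrCreate()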
See also
There is a nice article about this topic, Using VirtualEnv with PySpark, by jzhang, here: https://community.cloudera.com/t5/Community-Articles/Using-VirtualEnv-with-PySpark/ta-p/245932.
Configuring Docker for MongoDB
Getting ready
Following the good practice of code organization, let's start by creating a folder inside our project directory to store the MongoDB Docker image and data:
my-project$ mkdir mongo-local
my-project$ cd mongo-local
How to do it…
Here are the steps to try out this recipe:
Note
If you are a WSL user, an error might occur if you use the WSL 1 version instead of version 2. You can easily fix this by following the steps here: https://learn.microsoft.com/en-us/windows/wsl/install.
1. First, we create and run the MongoDB container in detached mode. A typical invocation (your flags may differ) publishes MongoDB's default port, 27017:
my-project/mongo-local$ docker run --name mongodb-local -p 27017:27017 -d mongo
2. We then check our server. To do this, we can use the command line to see which Docker containers are running:
my-project/mongo-local$ docker ps
We can even check the Docker Desktop application to see whether our container is running:
Figure 1.6 – The Docker Desktop view of the MongoDB container running
3. Finally, we need to stop our container. We need to use the container ID to stop it, which we saw previously when listing the running containers. We will rerun it in Chapter 5:
my-project/mongo-local$ docker stop 427cc2e5d40e
How it works…
MongoDB's architecture uses the concept of distributed processing, where the main node interacts with clients' requests, such as queries and document manipulation. It distributes the requests automatically among its shards, where each shard is a subset of a larger data collection.
Since we may also have other running projects or software applications inside our machine, isolating
any database or application server used in development is a good practice. In this way, we ensure
nothing interferes with our local servers, and the debug process can be more manageable.
This Docker image setting creates a MongoDB server locally and even allows us to make additional
changes if we want to simulate any other scenario for testing or development.
The commands we used are standard Docker CLI commands: docker run creates and starts the container, docker ps lists the running containers, and docker stop halts a container by its ID.
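As a quick sanity check (a sketch, not from the book; it assumes the container publishes MongoDB's default port, as in the docker run command shown earlier), you can connect to the local server with pymongo:

from pymongo import MongoClient

# Connect to the MongoDB server exposed by the local container
client = MongoClient("mongodb://localhost:27017/")

# Insert and read back a test document to confirm the server responds
db = client["test_db"]
db["test_collection"].insert_one({"status": "ok"})
print(db["test_collection"].find_one({"status": "ok"}))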
There’s more…
For frequent users, manually configuring other parameters for the MongoDB container, such as the
version, image port, database name, and database credentials, is also possible.
A version of this image with example values is also available as a docker-compose file in the official documentation here: https://hub.docker.com/_/mongo.
The docker-compose file for MongoDB looks similar to this:
# Use your own values for username and password
version: '3.1'

services:

  mongo:
    image: mongo
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example

  mongo-express:
    image: mongo-express
    restart: always
    ports:
      - 8081:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: root
      ME_CONFIG_MONGODB_ADMINPASSWORD: example
      ME_CONFIG_MONGODB_URL: mongodb://root:example@mongo:27017/
See also
You can check out the complete MongoDB Docker Hub documentation here: https://hub.docker.com/_/mongo.
Configuring Docker for Airflow
Airflow requires some additional configuration steps. Thankfully, the Apache Foundation also provides a docker-compose file that contains all the other requirements to make Airflow work. We just need to complete a few more steps.
Getting ready
Let’s start by initializing our Docker application on our machine. You can use the desktop version or
the CLI command.
Make sure you are inside your project folder for this. Create a folder to store Airflow internal components
and the docker-compose.yaml file:
my-project$ mkdir airflow-local
my-project$ cd airflow-local
How to do it…
1. First, we fetch the docker-compose.yaml file directly from the Airflow official docs:
my-project/airflow-local$ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.3.0/docker-compose.yaml'
Note
Check the most stable version of this docker-compose file when you download it, since
new, more appropriate versions may be available after this book is published.
Note
If you have any error messages related to the AIRFLOW_UID variable, you can create a .env
file in the same folder where your docker-compose.yaml file is and define the variable
as AIRFLOW_UID=50000.
Next, we initialize the Airflow metadata database and create the default user by running the following command:
my-project/airflow-local$ docker-compose up airflow-init
After executing the command, you should see output similar to this:
Creating network "airflow-local_default" with the default driver
Creating volume "airflow-local_postgres-db-volume" with default
driver
Pulling postgres (postgres:13)...
13: Pulling from library/postgres
(...)
Status: Downloaded newer image for postgres:13
Pulling redis (redis:latest)...
latest: Pulling from library/redis
bd159e379b3b: Already exists
(...)
Status: Downloaded newer image for redis:latest
Pulling airflow-init (apache/airflow:2.3.0)...
2.3.0: Pulling from apache/airflow
42c077c10790: Pull complete
(...)
Status: Downloaded newer image for apache/airflow:2.3.0
Creating airflow-local_postgres_1 ... done
Creating airflow-local_redis_1 ... done
Creating airflow-local_airflow-init_1 ... done
Attaching to airflow-local_airflow-init_1
(...)
airflow-init_1 | [2022-10-09 09:49:26,250] {manager.py:213} INFO - Added user airflow
airflow-init_1 | User "airflow" created with role "Admin"
(...)
airflow-local_airflow-init_1 exited with code 0
6. Then, we need to check the Docker processes. Using the following CLI command, you will see the containers running:
my-project/airflow-local$ docker ps
In the Docker Desktop application, you can also see the same containers running but with a
more friendly interface:
7. Next, we access the Airflow UI by opening http://localhost:8080 in a browser.
8. Then, we log in to the Airflow platform. Since it's a local application used for testing and learning, the default credentials (username and password) for administrative access in Airflow are airflow.
When logged in, the following screen will appear:
9. Then, we stop our containers. We can keep them stopped until we reach Chapter 9, where we will explore data ingestion in Airflow:
my-project/airflow-local$ docker-compose stop
How it works…
Airflow is an open source platform that allows batch data pipeline development, monitoring, and
scheduling. However, it requires other components, such as an internal database, to store metadata to
work correctly. In this example, we use PostgreSQL to store the metadata and Redis to cache information.
All this can be installed directly in our machine environment one by one. Even though it seems quite simple, it may not be, due to compatibility issues with the OS, other software versions, and so on.
Docker can create an isolated environment and provide all the requirements to make it work. With docker-compose, it becomes even simpler, since we can declare dependencies between the components so that each one is only created once the others are healthy.
You can also open the docker-compose.yaml file we downloaded for this recipe and take a look
to explore it better. We will also cover it in detail in Chapter 9.
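To give a taste of what this setup will run, here is a minimal DAG sketch (illustrative only; real ingestion DAGs are built in Chapter 9) that Airflow could schedule daily:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def say_hello():
    # Placeholder task; a real pipeline would ingest data here
    print("Hello from Airflow!")

with DAG(
    dag_id="hello_ingestion",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)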
See also
If you want to learn more about how this docker-compose file works, you can look at the Apache Airflow official Docker documentation on the Apache Airflow documentation page: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html.
Creating schemas
Schemas are considered blueprints of a database or table. While some databases strictly require
schema definition, others can work without it. However, in some cases, it is advantageous to work
with data schemas to ensure that the application data architecture is maintained and can receive the
desired data input.
Getting ready
Let’s imagine we need to create a database for a school to store information about the students, the
courses, and the instructors. With this information, we know we have at least three tables so far.
In this recipe, we will cover how schemas work using the Entity Relationship Diagram (ERD), a visual
representation of relationships between entities in a database, to exemplify how schemas are connected.
How to do it…
Here are the steps to try this:
1. We define the type of schema. An ERD of the school database (the students, courses, and instructors entities and their relationships) helps us understand how to go about this.
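To make the idea concrete, here is a hedged sketch (not from the book; field names and types are illustrative) of how the students entity from our school example could be declared as an explicit schema in PySpark:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Illustrative schema for the 'students' entity of the school database
students_schema = StructType([
    StructField("student_id", IntegerType(), False),   # primary key
    StructField("name", StringType(), False),
    StructField("course_id", IntegerType(), True),     # relates to courses
    StructField("instructor_id", IntegerType(), True), # relates to instructors
])

A DataFrame created or read with this schema helps ensure the ingested data matches the declared structure, which is exactly the guarantee discussed above.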