Hands-On Data Science for
Biologists Using Python

Yasha Hasija and Rajkumar Chakraborty


First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and
apologize to copyright holders if permission to publish in this form has not been obtained. If any copyrighted
material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under US Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
For works that are not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only
for identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Names: Hasija, Yasha, author. | Chakraborty, Rajkumar, author.
Title: Hands on data science for biologists using Python / Yasha Hasija and Rajkumar Chakraborty.
Description: First edition. | Boca Raton : CRC Press, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020044939 | ISBN 9780367546793 (hardback) | ISBN 9780367546786 (paperback) |
ISBN 9781003090113 (ebook)
Subjects: LCSH: Biology–Data processing. | Python (Computer program language)
Classification: LCC QH324.2 .H373 2021 | DDC 570.285‐‐dc23
LC record available at https://lccn.loc.gov/2020044939
ISBN: 978-0-367-54679-3 (hbk)
ISBN: 978-0-367-54678-6 (pbk)
ISBN: 978-1-003-09011-3 (ebk)

Typeset in Times
by MPS Limited, Dehradun
Contents

Preface................................................................................................................................ xi
Author Bio ........................................................................................................................ xii

1. Python: Introduction and Environment Setup ....................................................................1


Why Learn Python..................................................................................................................... 1
Installing Python ........................................................................................................................ 2
Installing Anaconda Distribution .............................................................................................. 3
Running the Jupyter Notebook ................................................................................................. 3
The Building Blocks of Programs ............................................................................................ 5
Errors in Python......................................................................................................................... 5
Exercise ...................................................................................................................................... 6

2. Basic Python Programming....................................................................................................7


Datatypes and Operators ........................................................................................................... 7
Variables .................................................................................................................................... 9
Strings ...................................................................................................................................... 11
Lists and Tuples....................................................................................................................... 16
Dictionary in Python ............................................................................................................... 22
Conditional Statements ............................................................................................................ 26
Loops in Python....................................................................................................................... 29
Functions .................................................................................................................................. 33
Classes and Objects ................................................................................................................. 37
File Handling in Python .......................................................................................................... 40
Exercise .................................................................................................................................... 43

3. Biopython ................................................................................................................................45
Introduction .............................................................................................................................. 45
Installing Biopython ................................................................................................................ 45
Biopython Seq Class ............................................................................................................... 45
Parsing Sequence Files ............................................................................................................ 47
Writing Files ............................................................................................................................ 51
Pairwise Sequence Alignment................................................................................................. 53
BLAST with Biopython .......................................................................................................... 57
Multiple Sequence Alignment................................................................................................. 59
Construction of a Phylogenetic Tree ...................................................................................... 62
Handling PDB Files................................................................................................................. 64
Exercise .................................................................................................................................... 70

4. Python for Data Analysis......................................................................................................71


Introduction .............................................................................................................................. 71
NumPy ..................................................................................................................................... 71
NumPy Arrays versus Lists..................................................................................................... 71
Two-Dimensional Matrices ..................................................................................................................... 73
Matrix Operations .................................................................................................................... 74
Comparing Matrices ................................................................................................................ 77
Generating Data Using NumPy............................................................................................... 78
Speed Test................................................................................................................................ 79
“Pandas” Dataframe................................................................................................................. 80
Selecting Rows and Columns ................................................................................................. 81
Conditional Filtering in Dataframe ......................................................................................... 84
Writing CSV Files from Pandas Dataframe ........................................................................... 85
Apply() Function ..................................................................................................................... 85
Concatenating and Merging .................................................................................................... 87
Exercise .................................................................................................................................... 89

5. Python for Data Visualization..............................................................................................91


Introduction .............................................................................................................................. 91
Matplotlib................................................................................................................................. 91
Matplotlib Functional Method................................................................................................. 92
Matplotlib Object-Oriented Method........................................................................................ 93
Resolution and Saving Figures ............................................................................................... 97
Legend...................................................................................................................................... 98
Customization of the Plot Appearance ................................................................................... 99
Scatterplot .............................................................................................................................. 102
Histogram............................................................................................................................... 102
Boxplot................................................................................................................................... 104
Seaborn................................................................................................................................... 104
Distribution Plots ................................................................................................................... 105
Joint Plots............................................................................................................................... 106
Pairplot ................................................................................................................................... 108
Barplot.................................................................................................................................... 111
Boxplot................................................................................................................................... 113
Violin Plot.............................................................................................................................. 114
Heatmaps................................................................................................................................ 114
Cluster Maps .......................................................................................................................... 116
Regression Plot ...................................................................................................................... 117
Plotly – Interactive Data Visualization................................................................................. 118
Geographical Plotting ............................................................................................................ 120
Exercise .................................................................................................................................. 122

6. Principal Component Analysis ...........................................................................................123


Introduction ............................................................................................................................ 123
Variance as Information ........................................................................................................ 123
Data Transformation .............................................................................................................. 124
Case Study ............................................................................................................................. 125
PCA: Step-by-Step................................................................................................................. 127
Standardization of the Features............................................................................................. 127
Obtain the Eigenvectors and Eigenvalues ............................................................................ 128
Choosing Axes with Maximum Variance............................................................................. 130
Programming Drive ................................................................................................................ 133
Exercise .................................................................................................................................. 135

7. Hands-On Projects...............................................................................................................137
Differential Gene Expression Analysis................................................................................. 137
Quality Control ...................................................................................................................... 138
Normalization......................................................................................................................... 141
Differential Expression Analysis........................................................................................... 146
Cluster Map ........................................................................................................................... 151
Gene Enrichment Analysis .................................................................................................... 152
SNP Analysis ......................................................................................................................... 153
Exercise .................................................................................................................................. 160

8. Machine Learning and Linear Regression .......................................................................161


Introduction to Machine Learning and Its Applications in Biology ................................... 161
Types of Machine Learning Systems ................................................................................... 161
Optimization of Models......................................................................................................... 165
Challenges in Machine Learning Projects ............................................................................ 167
Linear Regression .................................................................................................................. 169
General Workflow of a Machine Learning Project .............................................................. 171
Implementation of Linear Regression Using Scikit-Learn................................................... 172
Loading Dataset ..................................................................................................................... 172
Train-Test Split ...................................................................................................................... 173
Training Model ...................................................................................................................... 173
Model Evaluation................................................................................................................... 174
Predicting Child Height Based on Parents' Height .............................................................. 176
Predicting the Height of Sons ............................................................................................... 178
Predicting the Height of Daughters ...................................................................................... 180
Exercise .................................................................................................................................. 181
References .............................................................................................................................. 181

9. Logistic Regression ..............................................................................................................183


Introduction ............................................................................................................................ 183
Implementation of Logistic Regression Using Sklearn........................................................ 184
Train-Test Split ...................................................................................................................... 187
Training the Logistic Regression Model .............................................................................. 187
Evaluation of Model .............................................................................................................. 187
Retrieving Intercept and Coefficient ..................................................................................... 188
Data Scaling........................................................................................................................... 189
Predicting a New Result........................................................................................................ 192
Breast Cancer Prediction Using Logistic Regression........................................................... 193
Model Evaluation................................................................................................................... 194
Exercise .................................................................................................................................. 196
References .............................................................................................................................. 196

10. K-Nearest Neighbors (K-NN) .............................................................................................. 197


Introduction ............................................................................................................................ 197
Implementation of K-NN Using Sklearn ............................................................................... 198
Loading the Dataset............................................................................................................... 198
Splitting the Dataset into the Training Set and the Test Set ............................................... 198
Training the K-NN Model on the Training Set.................................................................... 199
Evaluation with K = 1 ........................................................................................................... 199
Choosing a K-Value .............................................................................................................. 199
Data Scaling........................................................................................................................... 201
Predicting New Values .......................................................................................................... 202
Diagnosing the Liver Disease Using K-NN ......................................................................... 203
Missing Value Imputation ..................................................................................................... 204
Data Scaling........................................................................................................................... 205
Splitting the Dataset into the Training Set and the Test Set ............................................... 205
Choosing a K-Value .............................................................................................................. 205
Evaluation of the Model........................................................................................................ 206
Exercise .................................................................................................................................. 208
References .............................................................................................................................. 208

11. Decision Trees and Random Forests .................................................................................209


Introduction ............................................................................................................................ 209
Random Forests ..................................................................................................................... 211
Implementation of Decision Tree and Random Forest Using Sklearn................................ 212
Train Test Split ...................................................................................................................... 212
Decision Trees ....................................................................................................................... 212
Prediction and Evaluation ..................................................................................................... 213
Predicting New Values .......................................................................................................... 213
Random Forests ..................................................................................................................... 214
Prediction and Evaluation of Random Forest Model........................................................... 214
Predicting Prognosis of Diabetes Using Random Forest ..................................................... 214
Loading Dataset ..................................................................................................................... 214
Train-Test Split ...................................................................................................................... 215
Training Classifier ................................................................................................................. 215
Cross-Validation .................................................................................................................... 216
Exercise .................................................................................................................................. 217
Reference ............................................................................................................................... 217

12. Support Vector Machines ...................................................................................................219


Introduction ............................................................................................................................ 219
Kernel Trick........................................................................................................................... 219
Implementation of Support Vector Machines Using Sklearn .............................................. 221
Train Test Split ...................................................................................................................... 221
Train the Support Vector Classifier ...................................................................................... 221
Predictions and Evaluations .................................................................................................. 221
Grid Search ............................................................................................................................ 222
Prediction of Wheat Species Based on Wheat Seed Data ................................................... 223
Train Test Split ...................................................................................................................... 224
Training Support Vector Classifier and Tuning Its Parameters Using a Grid Search ........ 224
Exercise .................................................................................................................................. 225
References .............................................................................................................................. 225

13. Neural Nets and Deep Learning ........................................................................................227


Introduction ............................................................................................................................ 227
Neural Networks Architecture............................................................................................... 227
The Working Principle of Neural Networks ........................................................................ 228
Activation Functions.............................................................................................................. 228
Steps of Forward Propagation............................................................................................... 229
Gradient Descent ................................................................................................................... 229
Backpropagation .................................................................................................................... 230
Implementing Neural Networks Using TensorFlow............................................................. 231
Data Scaling
TensorFlow 2.0
Creating a Model
Model – As a List of Layers
Model – Adding in Layers One by One
Building Model
Training Model
Overfitting
Dropout and Early Stopping
Model Evaluation
Predicting New Instance
Predicting Breast Cancer Using Neural Networks
Separating the Dependent and Independent Dataset
Data Scaling
Splitting the Dataset into the Training Set and Test Set
Creating the Model
Model Evaluation
Convolutional Neural Network
Implementation of CNN Using TensorFlow
Import Libraries
Importing the Dataset
Splitting the Dataset into the Training Set and Test Set
Building Model
Training Model
Model Evaluation
Exercise
Reference

14. The Machine Learning Project

Introduction
Importing the Libraries
Importing the Dataset
PCA
Splitting the Dataset into the Training Set and the Test Set
Training the Logistic Regression Model and Evaluation
Training the K-NN Model and Evaluation
Choosing K-Value
Training the Random Forest Model and Evaluation
Training the SVM Model and Evaluation
Training the ANN Model and Evaluation
Exercise
Reference

15. Natural Language Processing

Introduction
Vectorizing the Text
Bag of Words
TF-IDF
Classification of Abstracts into Various Categories Using NLP
Importing the Dataset
Text Processing
Label Encoding
Text Tokenization Bag-of-Words
Splitting the Dataset into the Training Set and the Test Set
Building Model
Model Evaluation
TF-IDF Implementation
Splitting the Dataset into the Training Set and the Test Set
Building Model
Model Evaluation
Artificial Neural Networks in NLP
Splitting the Dataset into the Training Set and the Test Set
Building Model
Training Model
Model Evaluation
New Prediction
Exercise
References

16. K-Means Clustering

Introduction
Implementation of K-Means Clustering Using Sklearn
Choosing the Number of Clusters
K-Means Clustering of Genes Based on the Co-Expression
Exercise

Index
Preface

Yasha Hasija Ph.D.

Data science is rapidly becoming a vital discipline involving the use of big data to extract meaningful
information. With the advent of high throughput technologies in the field of healthcare, it is becoming
increasingly imperative for life science researchers to analyze the massive amount of data being
generated. Researchers with little or no computational skills often find the task challenging. In order to
overcome this challenge, we have meticulously drafted this book, using illustrative examples, as a
stepwise guide to ease newcomers from the field of life sciences to the field of data science. We have
chosen Python as our programming language of choice because of its easy accessibility on all operating
systems, versatility, comprehensible interface, ease of use, object-oriented features, and wide range of
applicability.
This book will serve as a beginner’s guide for anyone interested in the basics of programming, data
science, and Machine Learning. Every topic has an intuitive explanation of concepts and is accompanied
by the implementation of the concepts using biological examples. This book can also serve as a
handbook for biological data analysis using standard Python code templates for model building -
facilitated with supplementary files for each chapter. The text is made to be as interactive as possible
with accompanying Jupyter Notebooks for every section, to help readers practice the code on their local
systems. Each chapter is specially designed with examples.
The book is divided into two sections. The first section deals with an introduction to basic Python
programming and a hands-on tutorial for data handling. Chapters in this section elaborate on the usage
of some of the basic Python libraries and packages. One of the important libraries for life sciences data -
Biopython - is explained in this section with examples of reading and writing various biological file
formats, performing Pairwise and Multiple Sequence Alignments, handling protein and sequence data,
etc. The subsequent chapters elaborate on data handling using NumPy and Pandas, data visualization
techniques, and dimensionality reduction methods that are common to all data analyses, and also provide
illustrative examples for biological data.
Machine Learning is an integral part of several research projects today and has numerous applications in
the present-day era. Almost all of the disciplines of technology have been transformed by Machine
Learning and artificial neural networks, and life sciences are no exception, with Machine Learning
applications in fields ranging from agriculture to diagnostics to personalized medicine to drug development
to biological imaging - the list is growing. The second section of the book deals with Python implementation in
Machine Learning algorithms. Chapters in this section contain an introduction to Machine Learning to
make readers comfortable with the various terminologies used in Machine Learning. This section also
explores popular supervised and unsupervised Machine Learning algorithms - such as logistic regression,
k-nearest neighbors, decision trees, random forests, support vector machines, artificial neural networks,
convolutional neural networks, natural language processing, and k-means clustering - and shows their
implementation in Python.
The book is written considering the need for biologists to learn programming in light of handling
massive data, analyzing it, and deriving useful insights from it. I hope our readers will benefit from this
hands-on book on data science for biologists using Python.

Yasha Hasija, Ph.D.

Author Bio

Dr. Yasha Hasija (B.Tech, M.Tech, Ph.D.) is an Associate Professor at the Department of
Biotechnology and the Associate Dean of Alumni Affairs at the Delhi Technological University. Her
research interests include genome informatics, genome annotation, microbial informatics, integration of
genome-scale data for systems biology, and personalized genomics. Several of her works have been
published in international journals of high repute, and she has made noteworthy contributions in the area
of biotechnology and bioinformatics as author and editor of notable books. Her expertise, through her
book chapters and conference papers, is of significance to other academic scholarship and teaching. She
is also on the editorial boards of numerous international journals.
Dr. Hasija’s work has brought her recognition and several prestigious awards - including the Human
Gene Nomenclature Award at the Human Genome Meeting (2010) held in Montpellier, France. She is
the project investigator for several research projects sponsored by the Government of India - including
DST-SERB, CSIR-OSDD, and DBT. As Dr. Hasija continues conducting research, her passion for
finding the translational implications of her findings grows.
Mr. Rajkumar Chakraborty (B.Tech, M.Tech) received his Bachelor of Technology Degree in
Biotechnology from the Bengal College of Engineering and Technology, West Bengal, India and
completed his Master of Technology Degree in Bioinformatics at the Delhi Technological
University, Delhi, India. He is currently pursuing his Ph.D. in the field of bioinformatics. He was
part of the 4-member team which won the “Promising Innovative Implementable Idea Award” at the
SAMHAR-COVID19 Hackathon 2020 for a solution towards drug repurposing against
COVID-19. His research interests are in applied Machine Learning and the integration of big data in
biological science.

1
Python: Introduction and Environment Setup

Why Learn Python


Before learning about Python, we should first understand why people working in the life sciences
should learn to program. In the era of information technology, we have seen a massive explosion in
biological data such as sequences, annotations, interactions, and biologically active compounds. For
instance, when this chapter was being written in April 2019, GenBank (NCBI) - one of the largest
databases of nucleotide sequences - contained 212 million sequences in its repository
(https://www.ncbi.nlm.nih.gov/genbank/statistics/). EMBL, which is also a raw nucleotide sequence
repository, contained 2,253.8 million annotated sequences, a number expected to double in about
19.9 months (https://www.ebi.ac.uk/ena/about/statistics). This extensive data is being generated by
high-throughput technologies. To analyze this massive amount of data, we need the help of computers.
A computer consists of a central processing unit (CPU), a primary memory, and a secondary memory
storage device. The CPU is the component that performs operations on the data held in memory.
Primary memory is fast and designed to keep up with the CPU, but it loses its contents as soon as the
power is switched off. A secondary memory storage device retains data after the computer shuts down.
Together, these make up our digital assistant - which is fast and accurate in its tasks and does not get
bored with repetitive jobs. However, to assign a job to a computer and receive the desired output, we
need to comprehend its language, which is known as a programming language. Every biological research
project involves different datasets and has unique problems to solve - filtering, merging, subsetting, or
finding commonalities between lists - and may even require customized data formats for preserving
and using information. Programming gives users a free hand to devise and implement innovative
algorithms and solve various problems.
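As a small illustration of the list operations mentioned above, the following sketch finds the entries shared by two lists and filters one of them. The gene lists and variable names are made up for this example and are not taken from any dataset used in the book.

```python
# Hypothetical gene lists from two analyses (names chosen for illustration).
experiment_a_genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
experiment_b_genes = ["KRAS", "TP53", "PTEN", "EGFR"]

# Converting the lists to sets lets Python find the shared entries directly.
common_genes = sorted(set(experiment_a_genes) & set(experiment_b_genes))

# Filtering: keep only the genes whose names start with "BR".
filtered_genes = [gene for gene in experiment_a_genes if gene.startswith("BR")]

print(common_genes)
print(filtered_genes)
```

Doing the same comparison by hand for lists with thousands of entries would be slow and error-prone, which is where a few lines of code pay off.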
Over time, data science has also found applications in the life sciences. Data science helps in finding
patterns in huge amounts of structured or unstructured data, which can provide valuable insights in
almost all frontiers of biology - finding putative variations, predicting the consequences of amino acid
substitutions, diagnosing diseases quickly, predicting lead drug toxicity, predicting pharmacophores,
personalized or precision medicine, predicting protein secondary and tertiary structure, microRNA
interactions with their targets, epigenetics, etc. The very first step in generating a hypothesis from a
large amount of data is the curation of large datasets. Curating data is tedious and time-consuming
work. It consists of repetitive searching for data on database websites, in the literature, and in other
sources. Here our digital assistant comes to the rescue, saving us from this tedious job, as it can work
much faster than humans can think and perform tasks manually. A 3.0-gigahertz CPU runs through
3 billion clock cycles per second - an example of the tremendous power of computing.
The central theme of this book is to provide biologists with a practical approach to applying data science
techniques to omics data. Data science usually consists of data analysis, data visualization, data
preparation, Machine Learning, and more. We will discuss each aspect in relation to relevant biological
problems along with their solutions - starting with basic Python programming so that readers can get
accustomed to programming terminologies.


Programming skills are a valuable asset for any biologist. Many programming languages have been
developed. Some are specialized for tasks such as instantaneous computation, website creation, or
database generation, and some are general-purpose programming languages that were developed to be
used in a variety of application domains. Python is one example of a general-purpose programming
language. Guido van Rossum developed it as a hobby in the Netherlands around 30 years ago and
named it after the British comedy group Monty Python, known for the television series “Monty
Python’s Flying Circus”. Now, Python has applications in various domains like data science, web
development, data visualization, and desktop applications, to name a few. Python is one of the most
popular programming languages in data science and Machine Learning, and it is community-driven.
Since it has a gentle learning curve, many experts recommend it to beginners as their first
programming language. Primarily, Python has a simple, English-like, readable syntax which is easily
understood by users. For example, if one wants to find the proportion of the amino acid leucine,
denoted by the symbol “L”, in a protein sequence, the following Python code will do that:

Protein = "MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERY"
Leu_contain = Protein.count('L')/len(Protein)
print(Leu_contain)

The code reads much like the English language. The first line stores the protein sequence. The
second line calculates the proportion of leucine residues (denoted by the letter “L”) by counting the
number of times “L” appears in the sequence and then dividing it by the total length of the sequence.
The last line prints the value, which turns out to be approximately 0.109 (5 leucines in 46 residues).
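The same idea can be extended to every amino acid at once. The sketch below is one possible way to do this using `Counter` from Python's standard library; the variable names are chosen for this illustration and are not part of the original example.

```python
from collections import Counter

protein = "MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERY"

# Count every residue once, then convert the counts to proportions.
residue_counts = Counter(protein)
composition = {
    residue: count / len(protein) for residue, count in residue_counts.items()
}

# Leucine ("L") occurs 5 times in this 46-residue sequence.
print(residue_counts["L"], len(protein))
print(round(composition["L"], 3))
```

Rounding with `round(..., 3)` is only for display; the unrounded proportion stays available in `composition`.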
Thanks to the readability of Python code, learners can concentrate on the concepts of programming
and on the problems at hand rather than on learning the syntax of the language. As Python is
community-driven and has one of the largest communities, it has evolved to contain several important
libraries that are preinstalled or freely available to install. These libraries help in the quick and efficient
development of complex applications, because common functionality does not need to be written from
scratch.
Another advantage of learning Python is that it can be used for various purposes due to the
development of popular libraries, such as:

• Frameworks like Django, Flask, and Pylons are used for creating static and dynamic websites.
• Libraries like Pandas, NumPy, and Matplotlib are available for data science and visualization.
• Scikit-Learn and TensorFlow are advanced libraries for Machine Learning and deep learning.
• Desktop applications can be built using packages like PyQt, PyGObject (GTK), and wxPython, among others.
• Tools like BeeWare and Kivy are taking the lead in mobile applications.

Learning programming is like learning a new language; we first have to understand the
vocabulary and syntax. Next, we learn how to construct some meaningful but terse sentences.
Using those sentences, we then form paragraphs, and finally, we write our own story. In this book,
we will start with Python syntax and vocabulary. Then, we will construct small programs with
biological relevance to help biologists learn programming with problems that are important
to them.

Installing Python
We are using Python 3.7, which was the current stable version of Python at the time of writing. Many
operating systems already have Python installed by default; otherwise, it can be downloaded from the
Python Software Foundation’s website (https://www.python.org/), where it is freely available. After
installing Python, open the Python Shell in Windows or type “python3” in the terminal of Mac or Linux.
A prompt like the following appears:

Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>

Instructions are typed after “>>>”. Let us type our first instruction and press Enter.

>>> print('Welcome to Python')
Welcome to Python

Our first instruction was simple - to print “Welcome to Python”. If it runs correctly, then Python has
been successfully installed and we are all set and ready to go!
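To confirm which version of Python is actually running, a short check such as the following can be typed at the prompt or saved in a script. It is a minimal sketch using only the standard `sys` module; the variable names are chosen for illustration.

```python
import sys

# sys.version_info holds the version of the interpreter running this code.
major, minor = sys.version_info.major, sys.version_info.minor
print(f"Running Python {major}.{minor}")

# The examples in this book assume Python 3.
is_python3 = major == 3
print("Python 3 detected:", is_python3)
```

If the check reports Python 2, the "python3" command (rather than "python") is probably the one to use in the terminal.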

Installing Anaconda Distribution


As we have discussed, Python has various packages that help us write fewer lines of code. Installing
each package one by one is a time-consuming job. Moreover, because this book is centered on data
science applications, we will require many widely used packages along with their dependencies. To
spend less time setting up the coding environment, we will install the Anaconda distribution of Python.
The Anaconda distribution comes with preinstalled packages for data science and is very popular
among data scientists. Most of the statistics, data visualization, and Machine Learning packages are
included with the installation of the Anaconda distribution. It is basically Python with a set of useful
tools and packages preinstalled. We will also get packages such as IPython (an interactive Python
shell) and Jupyter Notebook along with it. Jupyter Notebook will be used throughout this book for
writing and executing code. Jupyter Notebook is an interactive notebook built on IPython. As a
server-client application, the Jupyter Notebook App enables us to write, edit, and run our code in
notebooks through an internet browser. The application can be executed on a personal computer even
without internet access. It offers features familiar from an Integrated Development Environment (IDE),
such as auto-completion of variables and packages. The Jupyter Notebook is also an easy way to share
code, so the code used in this book may be downloaded and executed on the machines of users.
For more information about the Anaconda distribution, one can visit their official website
(https://www.Anaconda.com/distribution/). To install Anaconda on the computer, go to
(https://www.Anaconda.com/distribution/#download-section). Choose Python 3.x version, where x is
equal to or greater than 7, and then download the graphical installer according to the user’s operating
system (i.e. Windows, Linux, or Mac OS). Follow the instructions for the graphical installer and keep
all of the default options ticked.
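After installation, one way to check that some of the data science packages mentioned earlier are available is sketched below. The list of package names is an assumption made for this illustration; `importlib.util.find_spec` is part of the standard library and only looks a package up without importing it.

```python
import importlib.util

# Package names assumed for illustration; adjust the list as needed.
packages = ["numpy", "pandas", "matplotlib", "sklearn"]

# find_spec returns None when a top-level package cannot be located.
availability = {
    name: importlib.util.find_spec(name) is not None for name in packages
}

for name, found in availability.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing can usually be added later from the Anaconda command prompt.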

Running the Jupyter Notebook


After installing the Anaconda distribution, we may now proceed to opening the Jupyter Notebook and
writing our first line of code. To do this, open the Anaconda command prompt in Windows or the
terminal for Linux or Mac OS users. Type “jupyter notebook”, and the application should open in the
default internet browser at the address http://localhost:8888, provided that port 8888 is not already in
use. The user can also open Jupyter Notebook using the “Anaconda Navigator” by searching for the
same term in the applications. Compare the page with the screenshot in Figure 1.1 to confirm that it
has opened correctly.
The “Files” tab shows the browsable list of files and folders in the working directory. The “Running”
tab displays the currently active Jupyter Notebooks or terminals. The “Clusters” tab is for multiple
assemblies of computers connected to a node. To create a new file, folder, or Jupyter Notebook, click the
“New” button in the upper right corner of the page. Upon clicking the “New” button, a dropdown menu
will appear. Under the notebook section, choose “Python 3” to create a Python 3-compatible Jupyter
Notebook (Figure 1.2).

FIGURE 1.1 A Screenshot of Root Directory Shown in a Jupyter Notebook.

FIGURE 1.2 New Notebook with Python 3.
As indicated here, the current name of the notebook is “Untitled”. To rename it, click on the “Untitled”
text itself. The cells will run Python 3 code, as Python 3 was selected as the kernel. Try writing the
same code that we initially typed in the Python shell:

print('Welcome to Python')

Click on the “Run” button above or press “Shift” + “Enter” to execute the code.
The following output should appear in the notebook:

Welcome to Python

When a notebook contains several cells and the cells are run in order, variables and imports defined in
one cell are available to the cells run after it, because all cells share the same kernel. This makes it
simple to separate the code into logical pieces without needing to reimport libraries, recreate variables,
or redefine functions in each cell.
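The sharing of variables and imports between cells can be pictured with the sketch below, in which each commented section stands for a separate notebook cell run from top to bottom. The sequence and variable names are chosen for illustration only.

```python
# --- Cell 1: import a library and define a variable ---
import math

sequence = "ATGAATGC"

# --- Cell 2: reuse the variable defined in Cell 1 ---
gc_count = sequence.count("G") + sequence.count("C")
gc_fraction = gc_count / len(sequence)

# --- Cell 3: reuse the import from Cell 1 and the results from Cell 2 ---
print(gc_count, math.floor(gc_fraction * 100))
```

Running Cell 3 before Cell 2 in a fresh kernel would fail with a NameError, because `gc_count` would not exist yet; the order in which cells are run matters more than their position on the page.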
The Jupyter Notebook has a few menus that the user can utilize to interact with their Notebook. The
menus are as follows:

• File
The File menu is used to create new notebooks, save notebooks, and open previously saved
notebooks. A Jupyter Notebook is typically saved in the “.ipynb” format, but the user can also save it
in other formats by using the “Download as” option. Options for saving and reverting to checkpoints
are also available here.
• Edit
The Edit menu consists of typical editing options like cut, copy, paste, merge cells, and others.
• View
The View menu is useful in toggling the header and the toolbar.
• Insert
The Insert menu is used for inserting cells below and above the current cell.
• Cell
The Cell menu contains options for running cells and changing the cell type.
• Kernel
The Kernel menu is mostly used in debugging to interrupt and to restart the Python 3 kernel.
• Widgets
JavaScript widgets can be added to our cells to create dynamic content using Python. This menu is
for saving and clearing the widget state.
• Help
The Help menu is used for learning about Jupyter Notebook, its documentation, shortcuts, etc.

We can also add rich content to the Jupyter Notebook by changing a cell’s type to “Markdown” using the
“Cell” menu and writing Markdown in the cell. Markdown is a lightweight markup language that also
accepts HTML and is used for styling text, inserting mathematical equations, etc. To learn more about
Jupyter Notebook, the user can always refer to the documentation.

The Building Blocks of Programs


In the next few chapters, we will learn more about vocabulary, syntax, and problem-solving with
Python. We will discover the powerful abilities of Python and amalgamate those capabilities to create
exciting programs.
There are some basic patterns used for building a program. These building blocks are not just for
Python programs, rather, these are more or less the same for any programming language. These are
discussed below:
Input: Accepting data/information from the user. Input might come from typing on the keyboard,
reading from a file such as FASTQ or PDB, or even acquiring information from some form of a sensor -
for example, from biomedical devices or color detection.
Output: Displaying the result, storing it in a file, or sometimes giving commands to other devices
such as in robotics or automation.
Serial Execution: Executing statements one by one, in the order in which they are
written in the script.
Conditional Execution: Checking for specific conditions and running or skipping some of the
commands accordingly.
Repeated Execution: Executing a group of statements repeatedly, possibly with slight variations.
Reuse: Writing a batch of instructions once, naming it, and then reusing those statements when
needed throughout the program.
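The building blocks listed above can be seen together in one short program. The following is only a sketch: the function name `gc_content`, the sequences, and the 0.5 threshold are assumptions made for this illustration, and the constructs it uses (functions, loops, conditions) are introduced properly in the next chapter.

```python
def gc_content(sequence):
    """Reuse: a named block of instructions that can be called many times."""
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


# Input: here the data is written in the script; it could equally come from
# the keyboard or a file. The sequences are made up for illustration.
sequences = {"seq1": "ATGCGC", "seq2": "ATATAT", "seq3": "GGCC"}

# Serial execution: statements run one after another, top to bottom.
gc_rich = []

# Repeated execution: the loop body runs once per sequence.
for name, sequence in sequences.items():
    fraction = gc_content(sequence)
    # Conditional execution: the next line runs only when the test is true.
    if fraction > 0.5:
        gc_rich.append(name)
    # Output: display the result on the screen.
    print(name, round(fraction, 2))

print(gc_rich)
```

Only "seq1" and "seq3" end up in `gc_rich`, since "seq2" contains no G or C at all.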

Errors in Python
Python gives rather detailed error messages, pinpointing the statement and the library involved.
Understanding and correcting errors is sometimes bothersome, but this process hones one into a
successful programmer. There are different types of errors. Some are detected by Python, which reports
them as errors or warnings, but many are not detected by Python at all, so programs sometimes run
with unexpected results. Three major types of errors are discussed here:
Syntax Error: These are the errors that are the simplest to understand and correct. They usually
happen when the user breaks the grammar rules of Python, and Python cannot make sense of the
malformed statement. Python will tell the user where the point of confusion is, giving the line and the
position within it, so that the user can correct it. For learners, this is the most common error message.
Structuring the statements correctly is an essential requirement for proper execution.
Logic Errors: These are errors that Python cannot detect: the program runs, but it produces
unexpected results. They happen when the user’s statements are grammatically correct, but their
meaning is not what was intended. Logic errors are bugs, and the debugging process will help here.
The user must look through all the steps to find the bug.
Semantic Errors: These mistakes happen when the user gives grammatically correct statements in the
proper order, but a statement asks for an operation that is not meaningful. For example, the user may
try to add a number to a string or subtract a string from a number. This kind of operation is not
possible, and Python will raise an error when it reaches the statement (a TypeError in this example),
pointing out the operation or statement.
Readers will encounter a lot of errors, and correcting them requires the skill of working out which of
the types discussed above has occurred.
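The three kinds of error described above can be reproduced in a few lines. The sketch below uses `try` and `except`, which are not covered until later, purely so that all three cases can run in one script without stopping; the example values are made up for illustration.

```python
# 1. Syntax error: the closing parenthesis is missing, so Python refuses to
#    run the statement. compile() lets us check it without stopping the script.
try:
    compile("print('Welcome to Python'", "<example>", "exec")
    syntax_ok = True
except SyntaxError as error:
    syntax_ok = False
    print("SyntaxError:", error.msg)

# 2. Logic error: the code runs, but the result is not what was intended.
#    The proportion should be count / length, not length / count.
protein = "MKLFWLLF"
wrong_proportion = len(protein) / protein.count("L")
right_proportion = protein.count("L") / len(protein)
print(wrong_proportion, right_proportion)

# 3. Adding a number to a string is grammatically valid but not meaningful;
#    Python stops at run time and reports a TypeError.
try:
    total = 5 + "ATGC"
except TypeError as error:
    total = None
    print("TypeError:", error)
```

Note that only the first and third cases produce a message from Python; the logic error passes silently, which is why it is usually the hardest of the three to find.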
With this prerequisite knowledge and environment setup, we are now ready to take a deep dive into
the exciting world of the Python language and start writing our programs. In this book, the code is
explained step by step so that it is understandable and applicable in solving individual problems. Data
science techniques sometimes require a great depth of mathematical and statistical understanding,
which is beyond the scope of this book. However, we will provide a necessary and intuitive explanation
in every section. We hope that this wealth of knowledge helps the readers understand and appreciate
the usability of programming for a biologist, the features of the Python language, and the combination
of Python and data science for biologists, and ultimately discover a fun way of learning all of this.

Exercise
1. Why do biologists need to learn programming?
2. What are the types of errors generally encountered by programmers?
3. Comment on the building blocks of a typical computer program.
4. What are the advantages of installing Anaconda distribution over vanilla Python?
5. What are the applications of data science in the field of biology?
6. Write the functions of menus present in the Jupyter Notebook.
7. Calculate the serine (S) content in the given peptide sequence using the Python programming
language.

>sp|Q9SE35|20-107
QSIADLAAANLSTEDSKSAQLISADSSDDASDSSVESVDAASSDVSGSSVESVDVSGSSL
ESVDVSGSSLESVDDSSEDSEEEELRIL
2
Basic Python Programming

In this chapter, we will go through an overview of basic Python programming, which is a prerequisite
for any form of data analysis. Variables and operators, strings, lists and tuples, dictionaries,
conditions, loops, functions, and objects are some of the topics covered in this chapter.
Let us begin with a simple piece of Python syntax: the comment. If a line begins with “#”, Python will
ignore it. Comments are useful to make any code self-explanatory. We will use comments wherever
required to make the code more understandable to readers.

Code:
#Let’s print “Hi there!”
print('Hi there!')

Output:
Hi there!

After executing the above code in the Jupyter Notebook by pressing “ctrl + enter”, the first statement
or line starting with “#” will be ignored, and as a result, we will see “Hi there!”. Therefore, the first line
is a comment which describes that the code will print “Hi there!”.

Datatypes and Operators


Datatypes
As the name suggests, datatypes are the different types of data, such as numbers, characters, and
Booleans that can be stored and analyzed in Python. Among various datatypes, the four most
common are:

• int (integers or whole numbers)


• float (decimal numbers or floating-points)
• bool (Boolean or True/False)
• str (string or a collection of characters like a text)

There are two important ways in which Python represents a number: int and float. Decimal numbers
(float), such as 1.0, 3.14, -2.33, etc., are stored differently from integers or whole numbers, like
1, 3, -4, 0, etc. Think of it this way: between 0 and 1 there are only two whole numbers, 0 and 1,
but there are infinitely many decimal numbers between 0.0 and 1.0, so a float can hold only an
approximation of most of them. Next, we have the Boolean datatype, which is either “True” or “False”;
Booleans are used in making conditions, which we will learn about shortly. Lastly, “str” or the string
datatype is the one that biologists will need and encounter the most: DNA, RNA, and protein
sequences, as well as names, are mostly text or strings. Therefore, we have a separate section for
strings in this chapter. It is important to mention here that string-type data always remains inside
quotes, i.e. ‘<string data>’. For example, ‘ATGAATGC’ will be a string for Python.


To know the datatype of values, we can write “type(<value>)” to get its datatype in the Jupyter
coding cell.

Code:
print(type(4)) # integer, or a whole number
print(type(4.0)) # floating point, or decimal number
print(type(True)) # boolean, or a True/False
print(type('ATGAATGC')) # means string, or ‘a piece of text’

Output:
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'str'>

In these statements, we can see that 4 is of the “int” type, whereas 4.0 is a “float”. For now,
ignore the word “class”. We will learn about this in the succeeding parts of this book. The key
takeaways here are the datatypes and the way to identify the datatype of a value in Python.
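As a small additional sketch (not part of the original examples), type() can also be applied to the result of an expression. The last two lines illustrate that a float is stored with limited precision, which is why comparisons between floats are usually made within a small tolerance; the tolerance of 1e-9 is an arbitrary choice for this illustration.

```python
# type() works on the result of an expression as well as on a literal value
print(type(4 + 6))        # two integers give an integer
print(type(4 + 6.0))      # an integer and a float give a float
print(type('ATG' + 'C'))  # joining two strings gives a string

# Floats are stored with limited precision, so results can be approximate
print(0.1 + 0.2 == 0.3)                # False
print(abs((0.1 + 0.2) - 0.3) < 1e-9)   # True
```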

Code:
# Addition
print(4 + 6) #Integer + Integer
print(4 + 6.0) #Integer + Float
# Subtraction
print(6-3) #Integer - Integer
print(6-3.0) #Integer - Float
# Multiplication
print(2 * 5) #Integer * Integer
print(2 * 5.0) #Integer * Float
# Division
print(24/3) #Integer / Integer

# Power
print(2**8) #Integer ** Integer
print(2**8.0) #Integer ** Float
# '%' is the modulo (remainder) operator: it gives the remainder left over
# after dividing the first number by the second.
print(8 % 3) #Integer % Integer

Output:
10
10.0
3
3.0
10
10.0
8.0
256
256.0
2

Operators
In this section, we will discuss some of the standard operators in Python. We are familiar with some of
the operators like “+”, “‒”, “*”, “/”, “=”, and “**”.

TABLE 2.1
Some Common Operators in Python.
Symbol    Name
+         Addition
-         Subtraction
*         Multiplication
/         Division
**        Power
%         Modulo
=         Assignment (equal to)

Operations with an integer and a float will always return float-type results, and operations with two
integers will return integers, except for division, which still returns a float. However, we can
obtain an integer result from dividing two integers by using the integer (floor) division operator,
“//”.
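The difference between the two division operators can be sketched as follows. The variable names and the codon example are illustrative assumptions added here, not taken from the original text.

```python
# True division always returns a float, even when the result is whole
print(24 / 3)     # 8.0
# Floor (integer) division with two integers returns an integer
print(24 // 3)    # 8
print(25 // 3)    # 8, the fractional part is dropped
# The remainder of the same division comes from the modulo operator
print(25 % 3)     # 1
# Floor division involving a float still returns a float
print(25 // 3.0)  # 8.0

# Hypothetical use: how many complete codons fit in a 1300-base sequence?
sequence_length = 1300
complete_codons = sequence_length // 3
leftover_bases = sequence_length % 3
print(complete_codons, leftover_bases)
```

Using “//” together with “%” gives both the whole-number quotient and the remainder of the same division.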

Order of Operation – PEMDAS


For a complex calculation involving two or more operators, the order of operation is determined by the
rule of PEMDAS:

1. Parentheses ()
2. Exponent **
3. Multiplication and Division * / // % (same level)
4. Addition and Subtraction + - (same level)

Operators on the same level are evaluated from left to right. For example, try to evaluate
“2 + 5*4/2”. According to PEMDAS, first calculate “5*4”, then “5*4/2”, and lastly
“2 + 5*4/2 = 12.0” (a float, because “/” always returns a float). To override this order, use
parentheses, just as when solving equations with pen and paper.
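The order of operations can be checked directly in a coding cell. The following is a minimal sketch; the names “result” and “grouped” are illustrative only.

```python
# Multiplication and division are evaluated before addition, left to right
result = 2 + 5 * 4 / 2
print(result)             # 12.0 (a float, because "/" is used)

# Parentheses change the order of evaluation
grouped = (2 + 5) * 4 / 2
print(grouped)            # 14.0

# Exponentiation is evaluated before multiplication
print(2 * 3 ** 2)         # 18
print((2 * 3) ** 2)       # 36

# Operators on the same level run left to right
print(24 / 4 * 2)         # 12.0, not 3.0
```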

Variables
Variables in Python are like the variables of algebra in mathematics. We can think of a variable as a
box with a name on it that can hold a value of any datatype. A variable behaves exactly like the value
stored inside it, so anything that can be done with the value can be done with the variable.
Variables consist of two parts: the name and the value. We assign a name to a value by using the “=”
(equal to) operator. The name is on the left side, and the value is on the right side.

Code:
length_of_gene = 1300
print (length_of_gene)

Output:
1300

Once we assign a variable, we can recall it. In the example below, the variables “length_of_gene” and
“length_of_introns” are assigned and then used for finding the mRNA length, which is stored in
another variable called “length_of_mrna”.

Code:
length_of_gene = 1300
length_of_introns = 350
length_of_mrna = length_of_gene - length_of_introns
print(length_of_mrna)

Output:
950

From this point forward, we will use these variables in other programs.
Variables make our programs easier to read, and they are reusable. For example, if the user has to
use a long protein or nucleotide sequence, it would not be wise to write it out every time. Instead,
we can assign it to a variable and reuse it whenever it is required. Variables can be assigned to
other variables and reassigned at any time to different values. Let us explain this in code:

Code:
some_var = 100
another_var = some_var
some_var = 300
length_of_gene,length_of_introns = 1300,350

In the code, “another_var” is assigned the same value as “some_var”, and on the next line “some_var”
is reassigned to another value. When a new value is assigned to a variable, the variable no longer
refers to the old value, so the old value cannot be retrieved through that name (here,
“another_var” still holds 100). This reassignment can also be done with a different datatype. For
example, a variable containing an integer can be reassigned to a string and vice versa. This is not
allowed in many other programming languages. In the last statement, two variables are assigned values
in the same statement, a convenient feature of Python that not every programming language offers.
Last but not least, variable names are case sensitive: for example, a variable named “protein_id”
cannot be called “Protein_ID” or “PROTEIN_ID”.
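The behaviours described above can be verified with a short sketch. The names “sample” and “protein_id” and their values are illustrative assumptions, and the try/except statement used to catch the error is only a convenience here; it is not explained in this section.

```python
# Reassignment replaces the old value; the copy made earlier is unaffected
some_var = 100
another_var = some_var
some_var = 300
print(some_var, another_var)

# A name can be rebound to a value of a different datatype
sample = 42
sample = 'ATGCGTCA'
print(type(sample))

# Several names can be assigned in a single statement
length_of_gene, length_of_introns = 1300, 350
print(length_of_gene - length_of_introns)

# Names are case sensitive: protein_id and Protein_ID are different names
protein_id = 'Q9SE35'
try:
    print(Protein_ID)
except NameError as error:
    print('NameError:', error)
```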

Rules for Variable Naming


Variable names in Python must follow a set of rules:

• Variable names must start with a letter or an underscore.
• The rest of the name should consist of letters, numbers, or underscores. No special characters like
“@”, “.”, etc. are permissible.
• Python variables are case sensitive, as discussed earlier.
• Reserved words, known as keywords, cannot be used as variable names because they are part of
Python’s own vocabulary. Table 2.2 lists 33 of them; Python 3.7 and later also reserve “async” and
“await”, for a total of 35.

TABLE 2.2
Keywords in Python
False     else      import    pass      yield
None      break     except    in        raise
True      class     finally   is        return
and       continue  for       lambda    try
as        def       from      nonlocal  while
assert    del       global    not       with
elif      if        or
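The naming rules and the keyword list can also be checked from within Python itself. The sketch below uses the standard-library “keyword” module and the string method isidentifier(); the candidate names are made up for illustration, and the for loop is used only for brevity (loops are covered later in this chapter).

```python
import keyword

# keyword.kwlist holds the reserved words of the Python version being run
print(len(keyword.kwlist))

# keyword.iskeyword() tells whether a word is reserved
print(keyword.iskeyword('lambda'))      # True
print(keyword.iskeyword('gene_name'))   # False

# str.isidentifier() checks the character rules for a name
candidates = ['protein_id', '_temp', '2nd_gene', 'gene@home', 'class']
for name in candidates:
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable)
```

Because the number of keywords depends on the Python version, printing the length of keyword.kwlist is a more reliable check than memorizing a fixed count.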

Most Python programmers prefer to name variables according to the following guidelines:

• Most variables should be in “snake_case”, which means there is an underscore between words.
• Most variables are in lowercase; constants are usually written in uppercase.
• CamelCase is conventionally used for class names, while function names usually follow snake_case
like variables. Please note that we have a dedicated section for classes and functions in this
chapter, so just remember this part for now.

Strings
For computer programmers, strings are collections of characters or, more commonly, any text. In
bioinformatics, handling strings is very common: reading sequence files, finding patterns in
sequences, mining data from text, processing data from various file formats, etc. A string object is
constructed in Python by enclosing a sequence of characters between a pair of single quotes, double
quotes, triple-single quotes, or triple-double quotes. While characters enclosed between single or
double quotes must stay on a single line, characters between triple-single or triple-double quotes
can span multiple lines.
Let us take a look at the following example:

Code:
# A string within a pair of single quotes
seq_1 = 'ATGCGTCA'
print(seq_1)
print('---------')

# A string within a pair of double quotes
seq_2 = "ATGCGTCA"
print(seq_2)
print('---------')

# A string within a pair of triple single quotes
seq_3 = '''ATGCGTCA'''
print(seq_3)
print('---------')

# A string within a pair of triple double quotes
seq_4 = """ATGCGTCA"""
print(seq_4)
print('---------')

# A string within a pair of triple single quotes, can have multiple lines
seq_5 = '''MALNSGSPPA
IGPYYENHGY'''
print(seq_5)
print('---------')

# A string within a pair of triple double quotes, can have multiple lines
seq_6 = """IGPYYENHGY
IGPYYENHGY"""
print(seq_6)

Output:
ATGCGTCA
---------
ATGCGTCA
---------
ATGCGTCA
---------
ATGCGTCA
---------
MALNSGSPPA
IGPYYENHGY
---------
IGPYYENHGY
IGPYYENHGY

The characters should be enclosed within the same type of quote - usually single or double quotes -
for defining a string datatype.

Escape Sequence Characters


Suppose the user has to print text on different lines while still using single or double quotes,
which programmers generally prefer for defining text, or wants to use quotes inside quotes. Here,
escape sequence characters come to our aid. These are not unique to Python and are found in various
other languages. Given below is the list of escape sequences and their meanings:

Escape Sequence    Meaning
\newline           Ignored
\\                 Backslash (\)
\'                 Single quote (')
\"                 Double quote (")
\a                 ASCII Bell (BEL)
\b                 ASCII Backspace (BS)
\f                 ASCII Formfeed (FF)
\n                 ASCII Linefeed (LF)
\r                 ASCII Carriage Return (CR)
\t                 ASCII Horizontal Tab (TAB)
\v                 ASCII Vertical Tab (VT)
\ooo               ASCII character with octal value ooo
\xhh…              ASCII character with hex value hh…

Although most of these are not commonly used, we will try out some of the examples.

Code:
# Escape Sequence Characters
print(' Hey Ashok, "How\'re you?" ') #escaping single quotes
print('---------')
print('First line\nSecond line') #escaping new line
print('---------')
print('\\') #escaping Backslash

Output:
Hey Ashok, "How're you?"
---------
First line
Second line
---------
\

This example shows some applications of escape characters. They are mostly used when writing text
files with Python.
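As a further sketch of how escape sequences appear in practice, the lines below build two tab-separated rows of the kind that would be written to a text file. The column names and values are made up for illustration. The raw-string prefix “r” is an addition not discussed in the text above; it tells Python to keep backslashes as ordinary characters.

```python
# "\t" inserts a tab and "\n" ends the line, as in a tab-separated (TSV) file
header = 'gene\tlength\n'
row = 'example_gene\t1300\n'
print(header + row)

# An escape sequence counts as a single character
print(len('\n'))
print(len('\\'))

# A raw string (prefix r) keeps backslashes as ordinary characters
print(r'First line\nSecond line')
print(len(r'\n'))
```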

String Indexing:
A string is a collection or sequence of characters, so it is possible in Python to grab single characters as
well as a part of the text by using their indexes. For grabbing the character, we have to place the index
number inside the square bracket pair after the string name.
Given below is an example of String Indexing in the DNA sequence “ATGCGTCA” to print the
second nucleotide.
It may be noted that the index of any string starts with 0, starting with the leftmost character -
meaning that the index of first nucleotide “A” is 0, that of the second nucleotide “T” is 1, and so on. In
backward indexing, the indexing starts with -1 from the rightmost character, meaning the backward
index of last nucleotide “A” is -1, that of second last nucleotide “C” is -2, and so on.

Code:
dna_seq = 'ATGCGTCA'
print(dna_seq[1])

Output:
T

In the output we got “T”, but the first nucleotide was “A”. It is because, unlike our customary
practice, Python counting starts with zero.
Another example of character index for Python is shown in the figure below, where the first row is the
sequence; the second row is the forward index of nucleotides, and the third row shows the backward
index (Figure 2.1):

FIGURE 2.1 String Indexing in Python.



Below is the code for extracting the first character of the string:

Code:
# Extracting the first nucleotide
dna_seq = 'ATGCGTCA'
print(dna_seq[0])
print('---------')

# Grab the second last nucleotide
print(dna_seq[-2])

Output:
A
---------
C

To grab a part of a text or string, the notation used is “string_name[start:end]”, where “start” is
the starting index and “end” is the index at which to stop; the character at “end” itself is not
included.

• dna_seq[3:6] is “CGT” - characters starting at index 3 and extending up to but not including
index 6
• dna_seq[3:] is “CGTCA” - leaving a blank for either index defaults to the start or end index of the
string
• dna_seq[:] is “ATGCGTCA” - leaving both fields empty always produces a copy of the whole string
• dna_seq[1:50] is “TGCGTCA” - an index that is too big is truncated to the string length
• dna_seq[:-4] is “ATGC” - selecting up to but not including the last four characters
• dna_seq[-4:] is “GTCA” - starting with the fourth character from the right end to the right end
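The slices listed above can be tried out directly. The last three lines are an added illustration, with hypothetical variable names, of how consecutive slices divide a sequence into codons.

```python
dna_seq = 'ATGCGTCA'

print(dna_seq[3:6])    # CGT
print(dna_seq[3:])     # CGTCA
print(dna_seq[:])      # ATGCGTCA
print(dna_seq[1:50])   # TGCGTCA, the oversized end index is truncated
print(dna_seq[:-4])    # ATGC
print(dna_seq[-4:])    # GTCA

# Consecutive slices split the sequence into codons
first_codon = dna_seq[0:3]
second_codon = dna_seq[3:6]
print(first_codon, second_codon)
```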

String Concatenation
There are a few ways to concatenate or join strings. The easiest and most common way is to use the
plus symbol (+), that is, to simply add the strings together.

Code:
#String concatenation
dna_1 = 'ATGCGTCA'
dna_2 = 'ACTGCGTC'
full_dna = dna_1 + dna_2
print('The sequence of DNA is ' + full_dna + '.')

Output:
The sequence of DNA is ATGCGTCAACTGCGTC.

We can add any number of strings using the “+” operator. An important thing to note here is that all
of the values must be strings when adding strings. For example, if we add a string to an integer,
like “ACTGCGTC” + 4, there will be an error message stating that the “str” type and the “int” type
cannot be added. To add a number, we have to convert it to the “str” type by using the str(number)
function. While we cannot add an integer to a string, we can repeat the same string multiple times
using the “*” operator with an “int” datatype. For example, “ACTGCGTC”*2 will double the string into
“ACTGCGTCACTGCGTC”.
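A minimal sketch of these three cases follows. The variable names “combined”, “label”, and “repeated” are illustrative, and the try/except statement is used here only to display the error without stopping the program.

```python
dna_1 = 'ATGCGTCA'
dna_2 = 'ACTGCGTC'

# Adding a string and an integer raises a TypeError
try:
    combined = dna_2 + 4
except TypeError as error:
    print('TypeError:', error)

# Converting the number with str() makes the addition valid
label = 'Length of joined DNA: ' + str(len(dna_1 + dna_2))
print(label)

# Multiplying a string by an integer repeats it
repeated = dna_2 * 2
print(repeated)
```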

Commands in Strings
Various commands are available to make the desired modifications in strings or to carry out analyses. We
will discuss some of the most common methods in this section. Remember that these methods do not
modify the string itself but, rather, produce a new string, because the string is an immutable datatype.
Let us return to the Jupyter Notebook and try out the following codes:

Code:
#Converting a string into lowercase letters
dna_seq = 'ATGCGTCA'
print(dna_seq.lower())
print('---------')
print(dna_seq)

Output:
atgcgtca
---------
ATGCGTCA

In the example above, the lower() method is used. It returns the string in lowercase letters. We can
also observe that the original variable “dna_seq” is not changed after applying the lower() method to
it. In the same way, the str.upper() method returns the string in uppercase letters.
A few more commands for analyzing strings include count(), find(), and len(). Their usage is
described below:

Code:
dna_seq = 'ATGCGTCA'
# str.count() counts all the occurrences of the selected string in the parent string.
print(dna_seq.count('A'))
# str.find() returns the index of the first occurrence of the selected string in the parent string.
print(dna_seq.find('GT'))
# len() returns the length of the string.
print(len(dna_seq))

Output:
2
4
8
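As an added illustration of how these commands combine, the sketch below estimates the GC content of the same sequence. The names “gc_count” and “gc_content” are hypothetical, and the calculation is a simple proportion rather than a method prescribed by the text.

```python
dna_seq = 'ATGCGTCA'

# Count G and C, then divide by the total length
gc_count = dna_seq.count('G') + dna_seq.count('C')
gc_content = gc_count / len(dna_seq)
print(gc_count)
print(gc_content)

# find() returns -1 when the pattern is absent
print(dna_seq.find('TTT'))

# Methods can be chained: lowercase first, then count
print(dna_seq.lower().count('a'))
```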

In the above examples, “len()” is a function that returns the length of the string. Another important
method is str.split(), which is very frequently used for extracting data from delimited text file
formats like CSV, TSV, etc. CSV stands for comma-separated values, where the values of each column
are separated by a comma delimiter, and TSV stands for tab-separated values, where the values of each
column are separated by a tab delimiter.

FIGURE 2.2 Example of a CSV-Formatted File.



Figure 2.2 is an example of a CSV-formatted file, where the first row is known as the header row and
consists of column names, and the rest of the rows are instances that have values separated by a comma for
each column. We can extract the values of each row if we consider each row as a string using str.split() method:

Code:
#str.split()
first_row = '6,148,72,35,0,33.6,0.627,50,1'
# Parentheses let the list of variable names continue over several lines
(Pregnancies, Glucose, BloodPressure,
 SkinThickness, Insulin, BMI, DiabetesPedigreeFunction,
 Age, Outcome) = first_row.split(',')
print(Glucose)
print(BloodPressure)
print(Insulin)
print('---------')
print(first_row.split(','))

Output:
148
72
0
---------
['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1']

Let us study the above code line by line. First, we assign the first observation (i.e. the second row
of the CSV file shown in Figure 2.2) to a variable named “first_row”. Second, we use the
multiple-variable assignment feature of Python to set up each column as a variable, with the values
of the first observation assigned to them in order. Here the split(‘,’) method takes a string and
returns a list of the values that were separated by commas. Note that each value is still a string
(for example ‘148’, not the number 148). We can then print the variables named after the columns in
the header of the CSV file. Also, the last line of the output is the list of split values. The list
is a particular datatype in Python, which we are going to discuss in the next section. We are barely
scratching the surface, and it should be noted that there are other useful methods available for the
string datatype. Readers can always refer to the Python documentation to find all of the methods
available for strings.
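Because split() returns text, values usually need converting before any arithmetic. The sketch below shows one way this might be done. The “header_row” string is an assumption reconstructed from the variable names used above (the actual file is only shown as a figure), and the square-bracket indexing of the resulting list is explained in the next section.

```python
header_row = ('Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,'
              'BMI,DiabetesPedigreeFunction,Age,Outcome')
first_row = '6,148,72,35,0,33.6,0.627,50,1'

column_names = header_row.split(',')
values = first_row.split(',')
print(len(column_names), len(values))

# split() returns strings, so numbers must be converted before arithmetic
glucose = int(values[1])
bmi = float(values[5])
print(glucose + 1)
print(bmi * 2)

# The same method handles tab-separated text when given '\t'
tsv_row = '6\t148\t72'
print(tsv_row.split('\t'))
```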

Lists and Tuples


Now that we are familiar with the datatypes like integers, strings, and Booleans, we will discuss two
more datatypes in this section - lists and tuples.
Lists and tuples act like containers that store or hold multiple values of any datatype. These are
also known as data structures, because they store data in a particularly convenient way so that it
can be retrieved easily later.

Lists
While strings are a collection or sequence of characters, lists are a series of values that are
similar to arrays in other programming languages but comparatively more flexible. Values in lists
are known as items or elements. Some essential features of lists are:

• Lists are ordered. A list remembers the order in which items were inserted, so they can be
retrieved in that order later.
• Objects in a list can be accessed with an index.
• Lists can contain any entity - numbers, strings, tuples, and even other lists.
• Lists are mutable, i.e. they can be modified. Changes may be made in the list; new items can be
added; existing items removed or revised.
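The mutability described in the last point can be sketched as follows. The methods append() and remove() are standard list methods; the list contents are illustrative.

```python
drug_names = ['Metformin', 'Acarbose']

# Add a new item to the end of the list
drug_names.append('Canagliflozin')

# Revise an existing item through its index
drug_names[1] = 'Dapagliflozin'

# Remove an item by value
drug_names.remove('Metformin')
print(drug_names)

# A list can mix datatypes and even hold another list
mixed = [1, 'ATG', 4.0, ['nested', 'list']]
print(len(mixed))
print(mixed[3][0])
```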

There are different ways to build a new list. The simplest way is to put the elements inside square
brackets (‘[’ and ‘]’) and separate them by commas.

Code:
# Creating an empty list called "my_list"
# (the name "list" is avoided because it would hide Python's built-in list type)
my_list = []
# Adding the values inside the list
my_list = [1, 2, 3, 4, 5]
# Printing the list
my_list

Output:
[1, 2, 3, 4, 5]

Lists can hold any datatype or object and can be assigned to any variable.

Code:
# Adding the values irrespective of their datatype: Integer, String, float.
my_list = [1, 2, 3, 'Metformin', 4.0, 4/2]
my_list

Output:
[1, 2, 3, 'Metformin', 4.0, 2.0]

Code:
# Creating a list called drug_name
drug_name = ['Metformin', 'Acarbose', 'Canagliflozin', 'Dapagliflozin']
print(drug_name)

Output:
['Metformin', 'Acarbose', 'Canagliflozin', 'Dapagliflozin']

Accessing Values in a List


Like strings, list items also have indexes starting with “0” for forward indexing and “-1” for backward
indexing (Figure 2.3).

Item:            'Metformin'   'Acarbose'   'Canagliflozin'   'Dapagliflozin'
Forward index:   0             1            2                 3
Backward index:  -4            -3           -2                -1

FIGURE 2.3 Python List Indexes.

We can access items inside a list using brackets [] and indexes.

Code:
# Accessing the elements in the list
print(drug_name[0]) # Metformin
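A slightly fuller sketch of list access is given below, using the same “drug_name” list. The slice and the out-of-range example are additions for illustration, and try/except is used only to show the error message without stopping the program.

```python
drug_name = ['Metformin', 'Acarbose', 'Canagliflozin', 'Dapagliflozin']

print(drug_name[0])     # first item
print(drug_name[-1])    # last item
print(drug_name[1:3])   # a slice returns a new list

# An index outside the list raises an IndexError
try:
    print(drug_name[4])
except IndexError as error:
    print('IndexError:', error)
```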
vexations—the dark weather of life, that beset even such a humble
career as mine.
So much for the introduction—and now to business.
The following letter is very welcome. Can Harriet venture to tell
us who the author of this capital riddle really is?
Newport, March 28, 1842.
Friend Merry:
In looking over, a few days since, some old papers
belonging to my father, I found the following riddle. My father
informs me that it was written many years ago, by a school-
boy of his, then about fifteen years old, and who now
occupies a prominent place in the literary and scientific world.
If you think it will serve to amuse your many black-eyed and
blue-eyed readers, you will, by giving it a place in the
Museum, much oblige a blue-eyed subscriber to, and a
constant reader of, your valuable and interesting Magazine.
Harriet.
riddle.

Take a word that’s much used,—’tis a masculine name,


That backward or forward doth spell just the same;
Then a verb used for dodging—a right it will claim
That backward or forward it spells just the same;
The form of an adjective, none can exclaim
That backward or forward it spells not the same;
Then a chief Turkish officer’s title or name,
That backward or forward doth spell just the same;
The name of a liquor, its friends all will claim
That backward or forward is still just the same;
Then a word used for jest, or doth triumph proclaim,
That backward or forward still spells just the same;
Then a verb in the imperfect, which also doth claim
That backward or forward it spells just the same;
The name of a place which geographers fame,
That backward or forward doth still spell the same;
Then a very queer word, ’t is a Spanish ship’s name,
That backward or forward doth spell just the same;
Then a verb that’s well known, I refer to the same,
That, backward or forward spelt, makes but one name;
Then a name that is given to many a dame
That backward or forward still spells just the same.
A Set of initials the above will afford—
R-Ove through them in order, they form a droll word.
I L-eave you to solve it—’t will cure a disease;
De-Velop the riddle—’t will set you at ease.
D-Espair not, but hope; ’t is easily guessed:
L-Ike etching on copper in gay colors dressed,
E-Tch it down on your hearts, and there let it rest.

Elizabeth Town, N. J., April 9, 1842.


Dear Sir:
Though perhaps not so young as the generality of your
admiring readers, I am confident that there can be none who
are more delighted than myself with your works, and
particularly your Museum, which is now being published. Of
course, I was the more pleased when I noticed the addition
of a “puzzle column,” of which I am decidedly fond. I have
solved with correctness all the puzzles that have appeared in
your Museum, with the exception of Puzzle No. 5 in the April
number, which so far passes my comprehension, that, after
repeated endeavors after its solution, I have flattered myself
that it is a hoax; but if it is not, I must confess it is the
hardest puzzle I have seen for some time. Are not the
following correct answers to the April puzzles?—No. 1,
“Mother.” No. 2, “Charles Dickens.” No. 3, “Boston and
Worcester Railroad.” No. 4, “Prince de Joinville;” and Master
Bare-Head’s, “Massachusetts.” I forward you an original
puzzle, for which I do not profess any very extraordinary
difficulty.
I am a name of 23 letters.

My 5th, 21st, 7th, 10th, 22d, is a Russian noble.


My 17th, 18th, 20th, 20th, 12th, 2d, is a .
My 1st, 10th, 15th, 16th, is a legal writing.
My 4th, 14th, 13th, 17th, 12th, is a pleasant amusement.
My 11th, 3d, 8th, is seen whenever it is not invisible.
My 2d, 12th, 21st, 4th, 12th, 2d, is what if all men were,
the world would be happier.
My 19th, 12th, 7th, 7th, 23d, 9th, 19th, 6th, 9th, 12th,
6th, 19th, is the title of a justly celebrated periodical.
My 22d, 3d, 9th, 9th, 14th, 6th, is a street where my
whole is found.

If you think the above worthy a place, you can publish it.
You may hear from me again soon. My sheet is full, so I have
but to subscribe myself,
Very respectfully,
W. F. W.

Saturday, April 8, 1842.


Dear Sir:
I have taken the liberty to send you this puzzle, which I
suppose almost any of your readers can unravel.
I am a name of 13 letters.

My 1st, 5th, 6th, 4th, and 2d, is a girl’s name.


My 3d, 5th, 10th, and 11th, is what every bird has.
My 9th, 6th, 4th, 10th, 11th, 12th, and 13th, is what
physicians often use.
My 3d, 4th, 3d, and 5th, is a number.
My 11th, 5th, and 3d, is also a number.
My 13th, 8th, and 1st, is a color.
My whole is the name of a distinguished orator and
statesman.
From a constant reader, who signs himself,
Respectfully yours,
Alexis.

Dear Mr. Merry:


I have been trying my hand at puzzles since the reception
of the April number of the Museum. I have guessed out No.
4, as you will see below.
Sarah.
Answer to Puzzle No. 4, in the April number of the Museum.

The first, the “mechanic,” I doubt not a bit,


Is the joiner, well known by rustic and cit;
The second, a word highly prized by us all,
For all would be loved, whether great, whether small;
The third, Mr. Puzzler, a pin, I should guess,
For fastening a plank, or a fair lady’s dress;
The fourth—let me see; I’ll think in a trice—
I have it at last! it is very fine rice;
The fifth, it is said, “is French for a city,”—
Now that must be ville—how exceedingly pretty!
The sixth, and the last, it seems very clear,
Will never spell Yankee, but p-e-e-r.
Prince de Joinville.

Gloucester, April, 1842.


Mr. Merry:
I have found out the answers to the puzzles in the April
number, as follows: 1st puzzle, the answer is, Mother; 2d,
Charles Dickens; 3d, Boston and Worcester Railroad; 4th,
Prince de Joinville; 5th, ——; 6th, Massachusetts. And now,
Mr. Merry, I take the liberty to send you one, which, if you
think worthy, I should like to have you publish in your
Magazine, and oblige
Your blue-eyed Friend,
F. W. C.
I am a sentence of 11 letters.

My 6th, 4th, 7th, and 8th, is a fruit.


My 1st, 10th, 7th, and 3d, is used for fuel.
My 11th, 2d, 9th, and 9th, is a loud screech.
My 2d, 7th, and 3d, is what every one does.
My 9th, 4th, 7th, and 1st, is a long stride.
My 1st and 7th is an abbreviation for father.
My 3d, 7th, 6th, 10th, and 8th, is a small light.
My 4th, 7th, 5th, and 9th, is a person of rank.
My whole has written many interesting books.

Dear Sir:
My little daughter has handed me the following puzzle to
send to you for your next number, which please insert, and
oblige
A Subscriber.

My 8th, 2d, 9th, 19th, 24th, 4th, was a celebrated


English poet.
My 3d, 26th, 14th, 16th, 27th, is one of the elements.
My 21st, 11th, 6th, 7th, 26th, 8th, exists only in
imagination.
My 14th, 9th, 10th, 5th, 19th, is a gaudy flower.
My 4th, 11th, 20th, 13th, 17th, 16th, 26th, 9th, was a
Swiss philosopher.
My 19th, 1st, 5th, 22d, is various in form and expression.
My 9th, 15th, 28th, 26th, 14th, is an article of extensive
commerce.
My 12th, 13th, 9th, 4th, 19th, 24th, 27th, was strikingly
exemplified in
My 4th, 7th, 8th, 1st, 26th, 4th, 6th, 14th, 1st, 16th,
14th, 15th, 5th, 4th, 6th.
My 19th, 26th, 19th, 26th, 3d, is a foreign production.
My 14th, 16th, 23d, 10th, was a famous archer.
My 13th, 14th, 26th, 14th, 9th, 16th, is pale and
motionless.
My 24th, 26th, 25th, 18th, 23d, is much used in one of
the polite arts.
My 6th, 2d, 13th, 14th, 14th, 1st, 2d, 9th, 26th, 8th, 8th,
2d, 22d, 6th, asks your opinion of my whole.

Philadelphia, April 6, 1842.


Mr. Merry:
You will pardon the liberty that one of your juvenile
admirers has taken, by sending you a puzzle for your
invaluable Museum. The subject is one that you are very
familiar with, and as I have but just made it my subject,
perhaps full justice may not have been done to its character. I
have at least tried to make the best of it.
Elizabeth.
I am composed of 9 letters.

My 4th, 8th, 6th, is the retreat of a wild beast.


My 9th, 2d, 4th, is the name of the Creator.
My 4th, 2d, 5th, is a female deer.
My 6th, 8th, 4th, is a nickname for a boy.
My 3d, 2d, 8th, is what cloth is made from.
My 1st, 3d, 5th, is a scripture denunciation.
My 7th, 5th, 9th, is a part of the human frame.
My 9th, 2d, 9th, is a record kept by seamen.
My 2d, 4th, 8th, is a piece of poetry.
My 4th, 3d, 6th, is a Spanish title.
My 4th, 2d, 9th, is a sagacious animal.
My 9th, 8th, 7th, 6th, is a romantic spot.
My 6th, 3d, 4th, is where Adam’s first son went and
dwelt.
My 7th, 8th, 6th, 4th, is an act of friendship.
My 4th, 3d, 1st, 6th, is an article of commerce.
My 9th, 3d, 1st, 6th, is a female dress.
My 9th, 2d, 1st, 8th, 6th, is a Scottish name for a small
flower.
My 8th, 4th, 5th, 6th, is the first spot inhabited by human
beings.
My 9th, 3d, 2d, 4th, is what all people should be.
My whole is what my friend Robert Merry has found very
useful to himself in moving through the world.

Utica, April 9, 1842.


Mr. Merry:
I am a subscriber to your Museum and have been very
much pleased with it. I write to let you know that I wish very
much to have you continue the story of Philip Brusque. I wish
to know whether the people lived contented under the
government of M. Bonfils, and if they ever got away from the
island. I live at Utica, and was much pleased with the account
of your visit to this place thirty-five years ago.
From a Blue-eyed Friend,
Samuel L********.

Dear Mr. Merry:


If it is not too much trouble, I should like to know what
became of Brusque, and if Mr. Bonfils made a good king. With
some assistance, I have found out the answers to three of
those puzzles which were in the last Magazine. The first is
Mother, the second Charles Dickens, and the fourth Prince de
Joinville.
If the following be worthy a place in your Magazine, by
inserting it you will oblige
A New Hampshire Boy.
I am a name of 11 letters.

My 10th, 11th, 8th, is a useful grain.


My 3d, 4th, 8th, is an industrious insect.
My 1st, 2d, 7th, 4th, is an ancient city.
My 6th, 2d, 5th, 11th, is a name often given to a royalist
in the Revolution.
My 9th, 2d, 3d, 3d, 4th, 5th, is a bad man.
My whole, Mr. Merry, you know better than I do.

I offer my best thanks for the letters from the following friends:
“One of your blue-eyed readers in New York;” “A little subscriber in
Canandaigua,” whom I shall always be happy to hear from; E. D. H
——s, of Saugus; C. W., of Millbury; C. A. S. and L. B. S., of
Sandwich; L. W——e, and W. B. W——e; and “A Subscriber.”
S. L.’s letter about the postage, dated Utica, April 22, was duly
received.
H. E. M. thinks that Puzzle No. 5, in the April number, is either a
hoax, or that the solution is Nantucket. We think it is a little of both:
that is, that our friend who sent it to us intended it for Nantucket;
but about that time it was “all fools day,” and the unlucky types of
the printer seem to have made a very good puzzle, as sent to us,
into “an April fool.”
ROBERT MERRY’S
MUSEUM.

edited by

S. G. GOODRICH,

author of peter parley’s tales .

VOLUME IV.

BOSTON:
BRADBURY, SODEN, & CO.,
No. 10 School Street, and 127 Nassau Street, New York. 1842.
CONTENTS OF VOLUME IV.
JULY TO DECEMBER, 1842.

The Sense of Taste, 1
The Siberian Sable-Hunter, 2, 50, 88, 109, 146, 166
Hay-Making, 8
Limby Lumpy, 9
Lime, 11
The Voyages, Travels and Experiences of
Thomas Trotter, 12, 58, 92, 136, 170
Similes, 16
Proverbs and Sayings of the Chinese, 16
Indians of America, 17, 38, 72, 141
Ruins of Babylon, 24
Adam and Eve, 25
Merry’s Adventures, 26, 34, 66, 104, 132,
161
Gaza, 29
Knights Templars and other orders of
Knighthood, 30
A Page for Little Readers, 30
Bob O’Linkum’s Song to the Mower, 32
The Sense of Touch, 33
That thing I cannot do, 45
Skeleton of a Bird, 47
A Tragedy in the Woods, 48
Frogs, 49
Walled Cities, 55
Bells, 55
A Mother’s Affection, 56
To Correspondents, 63, 128
Puzzles, 64, 128
Seeing, 65
The Stock-Dove, &c., 79
Story of Philip Brusque, 80, 151, 181
Ingenious Contrivances of Nature, 84
Don’t be too Positive, 86
A Melancholy Event, 96
Sketches of Bible Scenes, 97
Bethesda, 97
Jerusalem, 98
Valley of Jehoshaphat, 102
Joppa or Jaffa, 103
Mount Carmel, 104
Ruins of Jericho as they now appear, 129
Askelon, 130
Bethlehem, 131
The Hippopotamus, 107
The Flying Dragon, 108
The Snail, 108
Varieties, 126, 160, 187
Rivers, 135
Boy and Bird, 135
Gall Insects, 140
Anecdote of the Natives of Porto Rico, 143
Winter Sport, 144
Clouds, 144
The Orang-Outang, 145
Field Teachers, 154
Life and Character of Alexander the Great, 157
Discovery of the Mines of Potosi, 165
Wild Geese, 169
The Two Friends, 175
The Selfish Boy, 177
Story of Little Dick and the Giant, 178
The Flowers, 179
Christmas, 180
Winter is coming, 182
Liberty, 183
Dress and other matters in France, in the
time of Henry IV., 185
The Last Leaf of Autumn, 186
Reflections, 188

Entered, according to Act of Congress, in the year 1842, by S. G. Goodrich,
in the Clerk’s Office of the District Court of Massachusetts.
KNIGHTS TEMPLARS.
MERRY’S MUSEUM.

VOLUME IV.—No. 1.

The Sense of Taste.


The tongue, which has so much to do with talking, has a good
deal to do with tasting. It is indeed one of the chief instruments by
which the sensation of taste is experienced. The palate is also
another organ of importance in the perception of taste.
The tongue is always moistened with saliva, which instantly
dissolves the surface of anything that is put into the mouth. Some
portion of the particles being taken upon the tongue, this latter is
pressed against the roof of the mouth, thus bringing them in contact
with the nerves which coat the surface of the mouth and palate. It is
by means of these nerves that the qualities of substances are
perceived and the sensation which we call taste is excited.
It will be perceived that the saliva of the mouth is one great
cause of all taste. When the tongue is rendered dry by disease, or
any other circumstance, the sense of taste is either imperfect or lost.
The pressure of the tongue against the surface of the mouth seems
also to be important in producing the sense of taste; for if you put
anything into your mouth, and hold it open, the sensation is hardly
produced. It is from the effect of this pressure that the act of
chewing and swallowing gives us so much pleasure.
There is a great difference in people, as to the degree of
perfection in which they possess this sense; for in some, it is very
blunt, while in others, it is very acute. There is a difference also as
to the things that people like. Some are fond of cheese, and others
cannot endure it. The Esquimaux are delighted with the flavor of
blubber oil; the Indians of Guiana feast upon monkeys; the negroes
of south-western Africa are fond of baked dogs; the Chinese eat
rats, lizards and puppies; the French rank snails and frogs among
their nicest tit-bits; yet all these things are revolting to us.
This diversity arises chiefly from custom and habit; for originally
our perceptions are, no doubt, nearly the same. It is certainly so
with animals; for every horse and every ox, in a natural state, eats
or rejects the same species of food.
The word taste is frequently used in what is called a metaphorical
sense, for the purpose of expressing the feelings of the mind. A
person who loves poetry is said to have a taste for poetry; by which
is meant that he has a mind which feels and appreciates the
qualities of poetry, just as the tongue feels or appreciates the
qualities of food.
It is in the same sense that we say, a person has a taste for
painting, or music, or any other art. When we say a person has fine
taste, we mean that his mental perceptions are very acute.
The Siberian Sable-Hunter.

CHAPTER IX.

Agreeably to their plan, the sable-hunters continued at the hut,
following the game, day after day, with the greatest ardor. The forest
proved to be very extensive, stretching out for miles upon both sides
of a little river that flowed into the Lena. It was the depth of winter,
and snow fell almost every day; yet they were seldom prevented
from going forth by the weather. They were very successful in their
hunting, and a day seldom passed in which they did not bring home
some game. They killed several bears and wolves, and a great
number of sables, ermines, martens, squirrels and lynxes.
In all their expeditions, Alexis was among the most active,
persevering, and skilful of the party. It was a great object in
obtaining the finer furs, to kill the animals without breaking the skin
of the body. In this art, Alexis excelled; for he could shoot with such
precision, as to bring down his game, by putting only a single shot
through the head. But he was of an ardent temper, and sometimes
his zeal led him into danger. One day, being at a distance from his
party, he saw a silver fox, and he pursued him for several hours,
entirely forgetting that he was separated from his friends, and
wandering to a great distance, amid the mazes of the woods.
At last, in pursuing the fox, he entered a wild and rocky dell,
where perpendicular cliffs, fringed by cedars and hemlocks, frowned
over the glen. Plunging into the place, which seemed like a vast
cavern, he soon came near the object of his pursuit, and brought
him to the ground. Before he had time to pick up his game, he saw
a couple of sables peering through a crevice in a decayed oak that
had rooted itself in the rocks above. Loading his gun, he fired, and
the animals immediately disappeared within the cavity. Believing that
they were killed, he clambered up the steep face of the precipice
with great labor and no little danger. At length, he reached the foot
of the tree which leaned from the cliff, over the dark valley beneath.
Immediately he began to ascend it, hardly observing, in his
eagerness, that it was rotten to the very root, and trembled
throughout its whole extent, as he ascended.
Up he went, heedless of all but the game, until he reached the
crevice, where two sables, of the largest kind, lay dead. He took
them out, and, for the first time, looked beneath. He was touched
with a momentary thrill of fear as he gazed down and perceived the
gulf that yawned beneath him. At the same moment, he heard a
crackling at the roots of the tree, and perceived a descending motion
in the limbs to which he clung. He now knew that he was falling, and
that, with the vast mass, he must descend into the valley beneath.
The moment was almost too awful for thought: yet his mind turned
to his father and sister, with a feeling of farewell, and a prayer to
Heaven for his soul. How swift is the wing of thought in the moment
of peril! He felt himself rushing downward through the air; he closed
his eyes; there was a horrid crash in his ears, and he knew no more.
The sound of the falling oak rung through the glen, and in the space
of a few minutes the figure of a man, clothed in furs, was seen
emerging from one of the caverns, at a little distance. He
approached the spot where Alexis had fallen; but at first nothing was
to be seen save the trunk of the tree, now completely imbedded in
the snow. The man was about to turn away, when he saw the fox
lying at a little distance, and then remarked one of the sables, also
buried in the snow. Perceiving that the animal was warm, as if just
killed, he looked around for the hunter. Not seeing him, the truth
seemed at once to flash upon his mind; and he began to dig in the
snow beneath the trunk of the tree. Throwing off his bear-skin coat
and a huge wolf-skin cap, and seizing upon a broken limb of the
tree, he labored with prodigious strength and zeal. A large
excavation was soon made, and pretty soon he found the cap of
Alexis. This increased his zeal, and he continued to dig with
unabated ardor for more than an hour. Buried at the depth of eight
feet in the snow, he found the young man, and with great labor took
him out from the place in which he was imbedded, and which, but
for this timely aid, had been his grave. The surface of the snow was
so hard as to bear the man’s weight, provided as he was with the
huntsman’s broad-soled shoes of skins. Still it was with great
difficulty that he could carry Alexis forward. He, however, succeeded
in bearing him to his cave. Here he had the satisfaction of soon
finding that the youth was still alive; that he was indeed only
stunned, and otherwise entirely unhurt. He soon awoke from his
insensibility, and looking around, inquired where he was. “You are
safe,” said the stranger, “and in my castle, where no one will come
to molest you. You are safe; and now tell me your name.”
For a moment, Alexis was bewildered, and could not recollect his
name, but after a little time, he said falteringly, “Pultova,—my name
is Alexis Pultova.”
“Pultova!” said the stranger, with great interest; “are you of
Warsaw—the son of Paul Pultova?”
“I am,” was the reply.
“Yes,” said the other, “you are, I see by your resemblance, you
are the son of my noble friend, General Pultova. And what brought
you here?”
“I am a hunter,” said Alexis.
“Alas, alas,” said the man, “and so it is with the brave, and the
noble, and the chivalrous sons of poor stricken Poland: scattered
over this desolate region of winter—this wild and lone Siberia—
banished, forgotten, save only to be pursued, crushed by the
vengeful heel of power. Oh God! O Heaven! how long will thy justice
permit such cruelty toward those whose only crime is, that they
loved their country too well?” Saying these words, the stranger’s
bosom heaved convulsively, the tears fell fast down his cheeks, and,
as if ashamed of his emotion, he rushed out of the cavern.
Alexis was greatly moved, yet his curiosity was excited, and he
began to look around to ascertain what all this might mean. He now,
for the first time, recollected his fall from the tree. He perceived that
he was in a lofty cavern, in which he saw a bed made of skins, a
gun, and various other trappings belonging to a hunter. He justly
concluded that he had been rescued by the stranger; and when he
returned, as he did in a few minutes, he poured out his grateful
thanks to him for saving his life.
The two now fell into conversation: and Alexis heard the details of
his own rescue, as well as the story of the hunter. He was a Polish
nobleman, who had taken part in the struggle for liberty, and who
had also shared in the doom of those patriots who survived the
issue. While they were conversing, they thought they heard sounds
without, and going to the mouth of the cave, they perceived voices
in the glen. Alexis soon recognised the piercing tones of Linsk, and
immediately answered him. The old hunter, with his two sons, soon
came up, and there was a hearty shaking of hands all round. The
whole story was soon told, and the hunters were invited by the
stranger into the cave.
The evening was now approaching, and Linsk, with his party,
being pressed to spend the night at the cave, cheerfully accepted
the request. A fire was soon kindled, a haunch of fat bear’s meat
was roasted, and the company sat down to their meal. There was
for a time a good deal of hilarity; for, even in comfortless situations,
a sense of deliverance from peril breaks into the heart, scattering
with its brief sunshine the gloom that is around. So it was with the
hunters, in the bosom of that dark cavern, and in that scene and
season of winter; the laugh, the joke, and the story passed from one
to the other. Even the stern and stony brow of the stranger relaxed
at some of the droll remarks and odd phrases of Linsk, and
unconsciously he became interested in the passing scene.
When Linsk had done ample justice to the meal, he hitched back
a little from the circle which sat around, and, wiping his greasy lips
and hands, using the sleeve of his wolf-skin coat instead of a pocket-
handkerchief, he said, “Well, master Alexis, this jump of yours, from
the top of a mountain into the middle of a valley, beats all the capers
of that kind which I ever heard of; but as to your going eight feet
into the snow, that’s nothing. I once knew a fellow who spent a
winter at Kamschatka, and he says that the snow falls there to such
a depth as sometimes to cover up houses. He told one thumping
story of what happened to himself.”
“What was it?—tell it,” was uttered by several voices. Thus
invited, Linsk proceeded to relate the following tale.
“The man I spoke of was one of your short, tough little runts, and
very like a weasel—hard to catch, hard to kill, and worth very little
when you’ve got him. I forget now what it was led him off to such a
wild place as Kamschatka; but I believe it was because he was of a
restless make, and so, being always moving, he finally got to the
end of the world. Nor was this restlessness his only peculiarity—he
was one of those people to whom something odd is always
happening; for you know that there are folks to whom ill-luck sticks
just as natural as a burr to a bear’s jacket.
“Well, Nurly Nutt—for that was the young fellow’s name—found
himself one winter at Kamschatka. It was far to the north, where the
sun goes down for six months at a time, and brandy freezes as hard
as a stone. However, the people find a way to melt the brandy; and,
by the rays of the moon, or the northern lights, which make it
almost as light as day, they have their frolics, as well as other
people.
“It chanced to be a hard winter, and the snow was very deep.
However, the people tackled up their dogs, hitched them to their
sledges, and cantered away over the snow like so many witches.
Nurly was a great hand at a frolic, especially if the girls were of the
mess; and he went on at such a rate as to become quite a favorite
with the softer sex. But it so happened, that, just as the girls
became eager to catch Nurly, he wouldn’t be caught, you know—a
thing that’s very disobliging, though it’s very much the way of the
world.
“There was one black-eyed girl that particularly liked our little
hero; and he liked her well enough, but still he wouldn’t come to the
point of making her an offer of his heart. Well, they went on flirting
and frolicking for some time, and a great many moonlight rides they
had over the snow-crust. Well, one night they were out with a party,
skimming over the vast plain, when they came to a steep ridge, and
the leader of the train of sledges must needs go over it. It was hard
work for the dogs, but they scrabbled up one after another.
“Now Nurly and his little lass were behind all the rest, and, for
some reason of their own, they were a good deal behind. However,
they ascended the hill; but, as luck would have it, just as they got to
the top, the sledge slipped aside, and tipped the pair over. The
sledge went on, and all the more swiftly that the dogs had a lighter
load; but down the hillside went Nurly and the girl, her arms around
him, as if she had been a bear and he a cub. At last they came to
the bottom with a terrible thump, the crust broke through, and in a
moment they were precipitated down some five and twenty feet!
Both were stunned; but soon recovering, they looked around. What
was their amazement to find themselves in a street, and before a
little church! Just by their side was an image of the Virgin!
“‘What can it mean?’ said Nurly.
“‘It is a warning!’ said the lass.
“‘And what must we do?’ said the other.
“‘Why, Nurly, don’t you understand?’ replied the girl.
“‘I’ll be hanged if I do,’ said the youth.
“‘Shall I tell you?’ said the girl.
“‘Certainly,’ said he.
“‘Well, Nurly,’ replied the lass, ‘we have been a good deal
together, and we like each other very well, and yet we go on, and
nothing comes of it. We dance and ride, and ride and dance, and still
nothing comes of it. Well, one night we go forth in the sledge; the
train passes on; it courses over a hill. They all go safely. You and I
alone meet with a miracle. We are hurled to the valley—we descend
into a new world; a church is before us—we are alone—saving the
presence of the blessed Virgin, and she smiles upon us.’ The girl
hesitated.
“‘Go on,’ said Nurly.
“‘Well—the Virgin smiles—and here is a church—’
“‘Well, and what of it—pray what does it all mean?’ said the
fellow.
“‘You are as stupid as a block!’ said the lass, weeping.
“‘I can’t help it,’ said Nurly Nutt.
“‘You can help it—you must help it!’ replied the girl, smartly. ‘We
must make a vow. Take my hand and say after me.’ He now obeyed.
“‘We do here take a most holy vow, before the blessed Virgin, and
at the door of the church, that we will love each other till death,
and, as soon as we can find a priest, that we will mutually pledge
our vows as man and wife, forever: and so may Heaven help us.’
“‘Whew!’ said Nurly; but at the same time he kissed his
betrothed.
“They then began to look around. They saw a passage leading to
some houses. They passed along, and there found a village all
buried beneath the snow. There were paths dug out along the
streets and from house to house. Here the people dwelt, as if
nothing had happened. They had herds of deer, and plenty of bear’s
meat; and thus they lived till spring came to melt away the snow,
and deliver them from their prison. Nurly and his little wife stayed in
the village till spring, and then went to their friends. They had been
given up as lost;—so there was great rejoicing when they got back.
Nurly was laughed at a little for the advantage taken of his
ignorance and surprise by the lass of the black eyes; but he was still
content, for she made him a good little wife. He brought her all the
way to Okotsk, and settled there. It was at that place I saw him, and
heard the story. It sounds queer—but I believe it true.”
When Linsk had done, the stranger made some remarks, alluding
to his own history. Linsk, in a very respectful manner, begged him to
state the adventures of which he spoke, and the man went on as
follows:—
“I am a native of Poland. You see me here, clothed in skins, and a
mere hunter like yourselves. I am but a man, and a very poor one,
though the noblest blood of my country flows in my veins. I had a
vast estate, situated almost thirty miles from Warsaw. I there
became acquainted with a Russian princess, and loved her. My love
was returned, and we vowed fidelity to each other for life. The
revolution broke out, and I took an active part in it. My suit had
been favored by the emperor before, but now I was informed that
he frowned upon my hopes and wishes, and that he looked upon me
with a special desire of vengeance. Twice was I assailed by ruffians
in the streets of Warsaw, hired to take my life. In battle, I was
repeatedly set upon by men, who had been offered large rewards if
they would kill or capture me; but I escaped all these dangers.
“The princess whom I loved was in the Russian camp. I was one
of a party who broke in, by a desperate assault, and surrounded the
house where she dwelt. We took her captive, and carried her to
Warsaw. She was offended, and would not see me. She contrived
her escape; but I was near her all the time, even during her flight.
As we were about to part, I made myself known to her, and asked
her forgiveness. She wept, and leaned on my breast.
“Warsaw had that day fallen; the hopes of liberty had perished;
Poland was conquered; the emperor was master over the lives and
fortunes of the people, and too well did we know his cruel nature to
have any other hope than that of the gallows, the dungeon, or
Siberia.
“I told these things to the princess. She heard me, and said she
would share my fate. While we were speaking, a close carriage and
six horses came near. It was night, but the moon was shining
brightly. I perceived it to be the carriage of Nicholas, the emperor;
but at the moment I recognised it, it was set upon by four men on
horseback, who rushed out of an adjacent thicket. They were heavily
armed, and, discharging their pistols, killed the postillion and one of
the guard. There were but three of the emperor’s men left, and
these would have been quickly despatched, had I not dashed in,
with my two attendants, to the rescue. One of the robbers was
killed, and the others fled.
“Though Nicholas is harsh, he is no coward. He had just leaped
from the carriage, when the ruffians had escaped. He was perfectly
cool, and, turning to me, surveyed me for an instant. He had often
seen me at court, and I think he recognised me. ‘To whom do I owe
my safety?’ said he. ‘To a rebel!’ said I; and we parted.
“The carriage passed on. The princess had witnessed the whole
scene, though she had not been observed by the emperor’s party. I
returned to her. She seemed to have changed her mind, and begged
me to see her conducted to the emperor’s camp. ‘You are now safe,’
said she. ‘You have saved the Czar’s life, and that insures you his
forgiveness—his gratitude. I know him well. In matters of
government he is severe; but in all personal things he is noble and
generous. I will plead your cause, and I know I shall prevail. Your
life, your fortune, your honor, are secure.’
“I adopted her views, though with much anxiety. I conducted her
near to the Russian camp, and she was then taken in safety to the
Czar’s tent. Soon after, she went to St. Petersburgh, since which I
have heard nothing of her. The judgment of the enraged emperor
fell like a thunderbolt upon the insurgents of Poland. The blood of
thousands was shed upon the scaffold. Thousands were shut up in
dungeons, never more to see the light or breathe the air of heaven.
Thousands more were banished to Siberia, and myself among the
number. The emperor’s hard heart knew no mercy. Here I am, and
here, alone, am I resolved to die.”
This story was told with such energy, and with an air so lofty and
stern, as to make all the party afraid to speak. Soon after, the
stranger left the cave for a short time, as if the thoughts excited by
his narrative could not brook the confinement of the cavern. He soon
returned, and all retired to rest. In the morning the hunters took
leave, Alexis bearing with him a rich present of furs from the hermit,
several of them the finest of sables. One of these was carefully
rolled up, and Alexis was instructed in a whisper to see that, if
possible, it should be sent to the princess Lodoiska! At the same
time, he was told never to reveal the name and character of the
stranger whom he had met, and was also requested to enjoin
secrecy upon his companions.
Linsk and his party went back to their hut; and in a few weeks,
having obtained a large amount of rich furs, they took advantage of
the sledges of some Tungusians, going to Yakoutsk, and returned to
that place, making a brisk and rapid journey of several hundred
miles in a few days. Alexis little expected the news which awaited his
arrival.

The following complimentary toast to the ladies was given at a
railroad celebration in Pennsylvania: “Woman—the morning star of
our youth; the day star of our manhood; the evening star of our old
age. God bless our stars!”
Hay-Making.

No part of the business of farming is more pleasant than
hay-making. It is true, that to mow the grass, and make the hay in the
broiling sun of July, is rather hard work; yet, after all, hay-makers
are usually a cheerful, merry, frolicsome set of people.
There are few sounds more pleasant than those produced by the
whetting of the mower’s scythe. This proceeds from the ideas that
are associated with it. It is then that the summer flowers are in full
bloom; it is then that their sweet perfume is borne upon every
breeze; it is then that the song of the bobolink, the meadow-lark,
the oriole, and the robin, is heard from every bush, and field, and
tree.
When, therefore, we hear the ringing of the mower’s scythe,
ideas of the flowers, of their fair forms, and lovely hues, and
delicious fragrance; of the birds, and their joyous minstrelsy, come
thronging into the mind, thus producing very agreeable emotions.
Nor is this all—the hay-making season is a time when children can
go forth to roam in freedom where they will; to chase the butterfly,
or pluck the flowers, or dabble in the brook, or stoop down and
drink from the rivulet, or sit at leisure beneath the cooling shade of
the trees. It is a time when the poor are relieved from the pinches of
Jack Frost; when the young are gay, and the old are cheerful. It is
the time when people saunter forth at evening, and feel that they
might live in the open air,—when the merry laugh is heard in the
village, at sunset; when the notes of the flute steal through the
valley, and many a musical sound comes down from the hill.
Hay-making, then, is a season of many pleasures, and the word
brings to our minds, perhaps, more agreeable associations, than
almost any other.
Limby Lumpy;
or, The Boy who was Spoiled by his Mamma.

Limby Lumpy was the only son of his mamma. His father was called
the “pavier’s assistant;” for he was so large and heavy, that, when
he used to walk through the streets, the men who were ramming
the stones down, with a large wooden rammer, would say, “Please to
walk over these stones, sir.” And then the men would get a rest.
Limby was born on the 1st of April; I do not know how long ago;
but, before he came into the world, such preparations were made!
There was a beautiful cradle; and a bunch of coral, with bells on it;
and lots of little caps; and a fine satin hat; and nice porringers for
pap; and two nurses to take care of him. He was, too, to have a little
chaise, when he grew big enough; after that, he was to have a
donkey, and then a pony. In short, he was to have the moon for a
plaything, if it could be got; and as to the stars, he would have had
them, if they had not been too high to reach.
Limby made a rare to do when he was a little baby. But he never
was a little baby—he was always a big baby; nay, he was a big baby
till the day of his death.
“Baby Big,” his mamma used to call him; he was “a noble baby,”
said his aunt; he was “a sweet baby,” said old Mrs. Tomkins, the
nurse; he was “a dear baby,” said his papa,—and so he was, for he
cost a good deal. He was “a darling baby,” said his aunt, by the
mother’s side; “there never was such a fine child,” said everybody,
before the parents; when they were at another place, they called
him “a great, ugly, fat child.”
We call it polite in this world to say a thing to please people,
although we think exactly the contrary. This is one of the things the
philosopher Democritus, that you may have heard of, would have
laughed at.
Limby was almost as broad as he was long. He had what some
people call an open countenance; that is, one as broad as a full
moon. He had what his mamma called beautiful auburn locks, but
what other people said were carroty;—not before the mother, of
course.
Limby had a flattish nose and a widish mouth, and his eyes were
a little out of the right line. Poor little dear, he could not help that,
and, therefore, it was not right to laugh at him.
Everybody, however, laughed to see him eat his pap; for he would
not be fed with the patent silver pap-spoon which his father bought
him; but used to lay himself flat on his back, and seize the pap-boat
with both hands, and never let go of it till its contents were fairly in
his dear little stomach.
So Limby grew bigger and bigger every day, till at last he could
scarcely draw his breath, and was very ill; so his mother sent for
three apothecaries and two physicians, who looked at him,—told his
mamma there were no hopes; the poor child was dying of over-
feeding. The physicians, however, prescribed for him—a dose of
castor oil!
His mamma attempted to give him the castor oil; but Limby,
although he liked sugar plums, and cordial, and pap, and
sweetbread, and oysters, and other things nicely dished up, had no
fancy for castor oil, and struggled, and kicked, and fought, every
time his nurse or mamma attempted to give it to him.
“Limby, my darling boy,” said his mamma, “my sweet cherub, my
only dearest, do take the oily poily—there’s a ducky, deary—and it
shall ride in a coachy poachy.”
“Oh! the dear baby,” said the nurse, “take it for nursey. It will take
it for nursey—that it will.”
