Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools, 2nd Edition, by Jeroen Janssens
Praise for Data Science at the Command Line
Traditional computer and data science curricula all too often
mistake the command line as an obsolete relic instead of teaching
it as the modern and vital toolset that it is. Only well into my
career did I come to grasp the elegance and power of the
command line for easily exploring messy datasets and even
creating reproducible data pipelines for work. The first edition of
Data Science at the Command Line was one of the most
comprehensive and clear references when I was a novice in the
art, and now with the second edition, I’m again learning new tools
and applications from it.
—Dan Nguyen, data scientist, former news
application developer at ProPublica, and former Lorry
I. Lokey Visiting Professor in Professional Journalism
at Stanford University
The Unix philosophy of simple tools, each doing one job well, then
cleverly piped together, is embodied by the command line. Jeroen
expertly discusses how to bring that philosophy into your work in
data science, illustrating how the command line is not only the
world of file input/output, but also the world of data manipulation,
exploration, and even modeling.
—Chris H. Wiggins, associate professor in the
department of applied physics and applied
mathematics at Columbia University, and chief data
scientist at The New York Times
This book explains how to integrate common data science tasks
into a coherent workflow. It’s not just about tactics for breaking
down problems, it’s also about strategies for assembling the
pieces of the solution.
—John D. Cook, consultant in applied mathematics,
statistics, and technical computing
Despite what you may hear, most practical data science is still
focused on interesting visualizations and insights derived from flat
files. Jeroen’s book leans into this reality, and helps reduce
complexity for data practitioners by showing how time-tested
command-line tools can be repurposed for data science.
—Paige Bailey, principal product manager code
intelligence at Microsoft, GitHub
It’s amazing how fast so much data work can be performed at the
command line before ever pulling the data into R, Python, or a
database. Older technologies like sed and awk are still incredibly
powerful and versatile. Until I read Data Science at the Command
Line, I had only heard of these tools but never saw their full
power. Thanks to Jeroen, it’s like I now have a secret weapon for
working with large data.
—Jared Lander, chief data scientist at Lander
Analytics, organizer of the New York Open Statistical
Programming Meetup, and author of R for Everyone
The command line is an essential tool in every data scientist’s
toolbox, and knowing it well makes it easy to translate questions
you have of your data to real-time insights. Jeroen not only
explains the basic Unix philosophy of how to chain together single-
purpose tools to arrive at simple solutions for complex problems,
but also introduces new command-line tools for data cleaning,
analysis, visualization, and modeling.
—Jake Hofman, senior principal researcher at
Microsoft Research, and adjunct assistant professor
in the department of applied mathematics at
Columbia University
Data Science at the Command Line
SECOND EDITION
Jeroen Janssens
Data Science at the Command Line
by Jeroen Janssens
Copyright © 2021 Jeroen Janssens. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
[email protected].
Tim O’Reilly
May 2021
Preface
Data science is an exciting field to work in. It’s also still relatively
young. Unfortunately, many people, and many companies as well,
believe that you need new technology to tackle the problems posed
by data science. However, as this book demonstrates, many things
can be accomplished by using the command line instead, and
sometimes in a much more efficient way.
During my PhD program, I gradually switched from using Microsoft
Windows to using Linux. Because this transition was a bit scary at
first, I started with having both operating systems installed next to
each other (known as a dual-boot). The urge to switch back and
forth between Microsoft Windows and Linux eventually faded, and at
some point I was even tinkering around with Arch Linux, which
allows you to build up your own custom Linux machine from scratch.
All you’re given is the command line, and it’s up to you what to
make of it. Out of necessity, I quickly became very comfortable using
the command line. Eventually, as spare time got more precious, I
settled down with a Linux distribution known as Ubuntu because of
its ease of use and large community. However, the command line is
still where I’m spending most of my time.
It actually wasn’t too long ago that I realized that the command line
is not just for installing software, configuring systems, and searching
files. I started learning about tools such as cut, sort, and sed.
These are examples of command-line tools that take data as input,
do something to it, and print the result. Ubuntu comes with quite a
few of them. Once I understood the potential of combining these
small tools, I was hooked.
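
To make the idea of combining small tools concrete, here is a minimal sketch of such a pipeline. The sample data (names and cities) and the variable names are hypothetical and chosen only for illustration; the tools themselves (cut, sort, uniq, and sed) are standard Unix utilities. Each tool takes data as input, does one thing to it, and prints the result for the next tool.

```shell
# Hypothetical sample data in "name,city" form (an assumption for illustration).
data='alice,Utrecht
bob,Amsterdam
carol,Utrecht
dave,Rotterdam
erin,Amsterdam
frank,Utrecht'

# cut keeps the second comma-separated field (the city),
# sort places identical cities next to each other,
# uniq -c collapses each run of identical lines and prefixes a count,
# sort -rn orders the counts from high to low,
# sed strips the leading spaces that uniq -c adds.
result=$(printf '%s\n' "$data" |
  cut -d, -f2 |
  sort |
  uniq -c |
  sort -rn |
  sed 's/^ *//')

printf '%s\n' "$result"
```

None of these tools knows anything about the others; the pipe character is what turns five single-purpose programs into a small frequency count.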
After earning my PhD, when I became a data scientist, I wanted to
use this approach to do data science as much as possible. Thanks to
a couple of new, open source command-line tools including
xml2json, jq, and json2csv, I was even able to use the
command line for tasks such as scraping websites and processing
lots of JSON data.
In September 2013, I decided to write a blog post titled “7
Command-Line Tools for Data Science”. To my surprise, the blog post
got quite some attention, and I received a lot of suggestions of other
command-line tools. I started wondering whether the blog post
could be turned into a book. I was pleased that, some 10 months
later, and with the help of many talented people (see the
acknowledgments), the answer was yes.
I am sharing this personal story not so much because I think you
should know how this book came about, but because I want you to
know that I had to learn about the command line as well. Because
the command line is so different from using a graphical user
interface, it can seem scary at first. But if I could learn it, then you
can as well. No matter what your current operating system is and no
matter how you currently work with data, after reading this book
you will be able to do data science at the command line. If you’re
already familiar with the command line, or even if you’re already
dreaming in shell scripts, chances are that you’ll still discover a few
interesting tricks or command-line tools to use for your next data
science project.
Constant width
Used for code and commands, as well as within paragraphs to
refer to command-line tools and their options.
It did not require the narration of this stirring tale to nerve our
forward movement, but it certainly increased our determination to
proceed at all hazard.
Our next halt was made at the cabin, some miles further on, from
which, as mentioned in the first chapter, the young man whom we
all knew and counted as one of us had been borne off a prisoner. As
soon as it was made known, by the usual signs, that we were
friends, we were joyfully if tearfully greeted. The family, consisting of
aged parents, sister, brother's wife and little children, were in
despair. Dreadful anxiety filled their minds. It was an illustration of
the saying that "to know the worst is better than suspense." If in the
great cause then firing their hearts this family had seen that son and
brother shot down before their eyes, they would have borne the
affliction silently and with submission. But the terrible uncertainty as
to his fate wrought upon them. A price had previously been set upon
the young man's head, and they had reason to fear the worst for
him.
It must be added, in passing, that his beloved ones never saw him
again alive. The good fortune fell to us to liberate him the next day
from his captors, when we found him bound upon his horse, with his
hands lashed behind him and his feet tied together under the
animal; but, alas! his liberation gave him only a short respite from
death. He fell, only a few days after, heroically fighting at the battle
of Osawatomie.
Some miles beyond we had to make that ford of the Pottawatomie
river of unenviable fame, and which we looked upon as the danger-
point of all others in our journey; for there our enemy, we thought,
would most likely be in ambush. But we swam the swift, dark,
muddy stream, swelled by recent rains to a flood, with the water up
to our horses' backs, luckily without hindrance or serious mishap.
That ford was the notorious Dutch Henry's crossing, so-called,—
surely a gloomy, gruesome, and dreaded spot at that dark midnight
hour. There, close by, had been enacted, just two months prior, the
rightly named Pottawatomie tragedy, which made that locality, on
account of this bloody event, verily for the time the "storm center"
of the Kansas conflict. But, terrible as it was, it served a great
purpose and was speedily followed by good.
The hero of our sketch was the central figure in this tragic act of the
Kansas drama, as he was in most others at this trying period. Brown
was the cyclonic force, the lightning's flash in the darkness, that
cleared and lighted the way for the men of that day.
Despite all delays on the way, we made our forced night-march of
twenty-two or more miles in remarkably good time, and arrived at
our destination about two o'clock in the morning, as weary,
exhausted, and hungry a set of troopers as ever drew rein and
slipped stirrup to seek rest and refreshment.
The Adair Log Cabin.
It will be of interest to our readers to learn here that, a couple of
miles from the town,—our halting place,—we passed the log cabin of
the Adair family, which has such historic interest gathered about it,
and which we shall have occasion to mention again later.
It so happened, as we learned afterward, that the hero of our story
lodged under that roof that night. He was aroused from his slumbers
and watched us from the window as we marched past,—having been
reliably assured, by our advanced guard, that we were no
threatening foe, but his firmest and safest friends.
A photographic view of the cabin's exterior is given on the opposite
page, as it appears to-day; and nearly the same as it existed at that
early date, now almost fifty years ago.
The town referred to was Osawatomie, soon to be made famous by
the man who is the principal subject of these sketches.
We were challenged by friendly pickets on guard, who escorted us to
the old "block-house" reared for town defense, where we were glad
to find shelter, and especially to find food, for hungry we were
indeed.
To what a sumptuous feast were we welcomed on that occasion!
And yet, strange to relate, the recollection of it is not calculated to
make one's mouth water. It so happened that a side of bacon and a
barrel of hardtack were stored there, for just such emergencies as
the present one, and these were now pressed into our service.
Their edible condition was such as naturally to suggest certain
Scripture phrases as descriptive thereof;—of the bacon, "ancient of
days"; and of the biscuit, "fullness of life." As we crunched the latter
between our teeth, the peculiar, fresh, sweet-and-bitter taste,
commingling at every mouthful, told us too well of the "life"
ensconced therein. No comments were made, however, except the
ejaculation occasionally, by one and another, "Wormy!" "Wormy!"
However, nothing daunted, we paused not in our eating till our
ravenous hunger was appeased. And then, on the bare floor of
boards, rived roughly out of forest trees,—though it was a little
difficult to fit our forms to their ridges and hollows,—we gained a
few hours of as sweet and refreshing slumber as ever visited mortal
eyes.
VI
The Battle
In less time than it takes to relate it, the plan of battle was
arranged.
Our men were divided into three companies. Two divisions were to
make flank movements, one on the right and the other on the left of
the foe, while the third was to assault directly in front. The plan of
attack was well conceived and as successfully executed.
We had a circuit of some miles to make to gain the flank positions. It
was quickly and silently traveled. In our division, detailed on the left
flank, hardly a word was spoken during a two hours' march. Each
man was busy with his own thoughts. It is said that persons in
critical situations will sometimes have their whole lives pass before
them. I believe that most of us, during this march, recalled nearly all
we had ever done or seen, known or felt.
We were suddenly awakened, at length, from such reveries, by the
crack of rifles and the clash of musketry, and by bullets actually
whizzing about our ears. So closely had we stolen the march on
them that when we opened fire we were actually more in danger
from the guns of our friends than from those of our foes.
The enemy were taken completely by surprise. As prisoners whom
we took told us afterward, they thought that "Old Brown" was surely
upon them; and their next and only thought was of escape. They left
all, and ran for dear life, some on foot, shoeless and hatless; others
springing to their horses, and, even without bridle or saddle,
desperately making the trial of flight. Perfectly bewildered, they ran
this way and that; and naturally, as our forces were positioned,
many ran directly into our hands.
The one thing they did not do well was to fight, except in the case of
a few desperate ones and of the leaders, who called in vain upon
their men to rally. Then they gave up all for lost, and each looked
out for himself. Many discharged their pieces at the first onslaught,
but so much at random that not a man of our number was fatally
injured, though several were more or less severely wounded. We
took many prisoners, and captured some thirty horses, all the
enemy's wagons and luggage, and much ammunition and arms. The
victory was complete.
Not until all was over did Captain Brown and his reserve come up,
though they had ridden hard to lend us a helping hand. He warmly
congratulated us, however, upon our good success, saying that he
could not have done it better himself, and that he was just as glad
and proud of our victory as though he had won it.
VIII
There were incidents not a few, connected with the day and with
the central figure of our sketch, which would add interest to our
pages. One there was which especially impressed itself upon all
witnesses of it.
This relates to one of the enemy who was fatally wounded in the
battle. He desired very much, he said, to see "Old Brown" before he
died.
Captain Brown was informed of the wish, whereupon he rode up to
the wagon which served as ambulance, and, with somewhat of
sternness in his manner, said to the prisoner, "You wish to see me.
Here I am. Take a good look at me, and tell your friends, when you
get back to Missouri, what sort of man I am."
Then he added in a gentler tone, "We wish no harm to you or to
your companions. Stay at home, let us alone, and we shall be
friends. I wish you well."
The prisoner meanwhile had raised himself with great difficulty, and
viewed the old man from head to foot as if feasting his eyes on a
great curiosity. Then he sank back, pale and exhausted, as he
answered, "I don't see as you are so bad. You don't talk like it."
The countenance of Brown as he viewed the sufferer had changed
to a look of commiseration. The wounded man saw it, and, reaching
out his hand, said, "I thank you." Brown tenderly clasped it, and
replied, "God bless you," while he turned with tears in his eyes and
rode away.
The present writer was standing within a few feet of Brown at the
time, and naturally drank in the scene with a boy's eager curiosity
and susceptibility to impression.
It was a scene for a painter, and the artist could with
appropriateness have called his work, "The Conqueror Conquered."
But it was perfectly illustrative of the man and of the hero. Brown
was as brave as a lion. He seemed absolutely not to know fear. Yet
withal he possessed a heart tender as a child's or as the tenderest
woman's.
IX
An Intrepid Charge
Hard Lines