Hands-On Entity Resolution
A Practical Guide to Data Matching with Python
With Early Release ebooks, you get books in their earliest form—the author’s
raw and unedited content as they write—so you can take advantage of these
technologies long before the official release of these titles.
Michael Shearer
Hands-On Entity Resolution
by Michael Shearer
The views expressed in this work are those of the author, and do not represent
the publisher’s views. While the publisher and the author have used good
faith efforts to ensure that the information and instructions contained in this
work are accurate, the publisher and the author disclaim all responsibility for
errors or omissions, including without limitation responsibility for damages
resulting from the use of or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any code samples or
other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to
ensure that your use thereof complies with such licenses and/or rights.
978-1-098-14842-3
Chapter 1. Introduction to Entity
Resolution
With Early Release ebooks, you get books in their earliest form—the author’s
raw and unedited content as they write—so you can take advantage of these
technologies long before the official release of these titles.
This will be the 1st chapter of the final book. If you have comments about
how we might improve the content and/or examples in this book, or if you
notice missing material within this chapter, please reach out to the author at
[email protected].
All around the world vast quantities of data are being collected and stored.
This data records the world we live in, the changing attributes and
characteristics of the people, places and things around us. More data is being
added every day.
Companies and institutions seek to derive valuable insights from this raw
data. Advanced analytical techniques have been developed to discern patterns
in the data, extract meaning and even attempt to predict the future. The
performance of these algorithms depends on the quality and richness of the
data fed into them. By combining data from more than one organisation, a
richer and more complete data set can often be created, from which more
valuable conclusions can be drawn.
This book will guide you through how to join these heterogeneous data sets
together to create richer sets of data about the world in which we live. This
process of joining data sets together is known by a variety of names including
name matching, fuzzy matching, record linking, entity reconciliation and
entity resolution. In this book we will use the term entity resolution to
describe the overall process of resolving, that is joining, data together that
refers to real-world entities.
Entity resolution is a key analytic technique to identify data records that refer
to the same real-world entity. This matching process enables the removal of
duplicate entries within a single source and the joining of disparate data
sources together when common unique identifiers are not available.
For example, healthcare providers often need to join records from across
different practices or historical archives held on different platforms. In
financial services, customer databases need to be reconciled to offer the most
relevant products and services or to enable the detection of fraud. To enhance
resilience or provide transparency on environmental and social issues,
corporations need to join up supply chain records with sources of risk
intelligence.
Organisations typically assign each of us a unique identifier, such as an
account or membership number. Within the confines of their own domain these
identifiers usually work well: I identify myself with my unique number and
it’s clear that I’m the same returning individual. This identifier allows a
common context to be quickly established between two parties and reduces the
possibility of misunderstandings. Between organisations, however, these
identifiers typically have nothing in common with each other; they vary in
length and format and are assigned according to different schemes. There is no
mechanism to translate between them or to identify that, individually and
collectively, they refer to me and not another individual.
However, when business is depersonalized, and I don’t know the person I’m
dealing with and they don’t know me, what happens if I register for the same
service more than once? Perhaps I’ve forgotten to identify with my unique
number or a new application is being submitted on my behalf. A second
number will be created that also identifies me. This duplication makes it more
difficult for the service provider to offer a personalized service as they must
now join together two different records to understand fully who I am and
what my needs might be.
Within larger organisations the problem of matching up customer records
becomes even more challenging. Different functions or business lines may
maintain their own records that are specifically tailored to their purpose but
were designed independently of each other. A common problem is how to
construct a comprehensive (or 360º) view of a customer. Customers may
have interacted with different parts of an organisation over many years. They
may have done so in different contexts; as an individual, as part of a joint
household or perhaps in an official capacity associated with a company or
other legal entity. In the course of these different interactions, the same
person may have been assigned a multiplicity of identifiers in various
systems.
Michael Shearer
Michael William Shearer
Michael W R Shearer
M W R Shearer
M W Shearer
None of these names exactly match with each other but all refer to the same
person, the same real-world entity. Titles, nicknames, shortened forms or
accented characters all frustrate the process of finding an exact
match. Double‑barrelled or hyphenated surnames add further permutations.
The process of capturing and recording names or labels usually reflects the
data standards of the acquirer. At the most basic level some data acquisition
processes will employ upper case characters only, others lowercase whilst
many will permit mixed case with initial letters capitalized.
Worked Example
Consider two records of the same person’s name:

Name
Michael Shearer
Micheal Shearer
What if we add another attribute, can that help improve our matching
accuracy? If you can’t remember your membership number a service provider
will often ask for a date of birth to help identify you (they also do this for
security reasons). Date of birth is a particularly helpful attribute as it doesn’t
change and has a large number of potential values (known as high
cardinality). Also the composite structure of individual values for day, month
and year may give us clues to the likelihood of a match when an exact
equivalence isn’t established. For example, consider:

Name              Date of Birth
Michael Shearer   4/1/1970
Micheal Shearer   14/1/1970
At first glance the Date of Birth is not equivalent between the two records
and so we might be tempted to discount the match. If these two individuals
are born 10 days apart, they are unlikely to be the same person! However,
there is only a single digit difference between the two, with the former
lacking the leading digit 1 in the day sub-field. Could this be a typo? It’s
hard to tell. If the records were from different sources we would also have to
consider whether the date format was consistent: is it the UK format of
DD/MM/YYYY or perhaps the US format of MM/DD/YYYY?
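A minimal sketch using pandas illustrates the ambiguity; the dayfirst flag
below is one way to make the assumed convention explicit:

import pandas as pd

# The same string parses to different dates under UK and US conventions
print(pd.to_datetime('4/1/1970', dayfirst=True))   # 1970-01-04 (UK: 4 January)
print(pd.to_datetime('4/1/1970', dayfirst=False))  # 1970-04-01 (US: 1 April)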
What if we add a place of birth? Again, this attribute shouldn’t change but it
can be expressed at different levels of granularity or with different
punctuation. For example:

Stow-on-the-Wold
Stow on the Wold
Stow on the Wold, Gloucestershire
Here there is no exact match on the Place of Birth between any of the records
although all could be factually correct.
Therefore Place of Birth, which may be recorded at different levels of
specificity, doesn’t help us as much as we thought it might. What about
something more personal, like a phone number? Of course, many of us do
change our phone number throughout our life but with the ability to keep a
cherished and well‑socialized mobile phone number when swapping between
providers, this number is a stickier attribute that we can use. However,
even here we have challenges. Individuals may possess more than one
number (a work and a personal number, for example), and the identifier may be
recorded in a variety of formats, with or without spaces or hyphens. It may
also include or exclude an international dialing prefix.
[Table: Name, Date of Birth, Place of Birth, Mobile Number]
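A minimal sketch of one way to standardize these variations, assuming UK
mobile numbers and a hypothetical helper function:

import re

def normalize_uk_mobile(number):
    # Strip spaces and hyphens from the recorded number
    digits = re.sub(r'[\s-]', '', number)
    # Replace an international +44 prefix with the domestic leading zero
    if digits.startswith('+44'):
        digits = '0' + digits[3:]
    return digits

print(normalize_uk_mobile('+44 7700-900999'))  # 07700900999
print(normalize_uk_mobile('07700 900999'))     # 07700900999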
Deliberate Obfuscation
The vast majority of data inconsistencies that frustrate the matching process
arise through inattentive but well-meaning data capture processes. However,
for some use cases we must consider the scenario where data has been
maliciously obfuscated to disguise the true identity of the entity and prevent
associations that might reveal a criminal intent or association.
Match Permutations
If I asked you to match your name against a simple table of, say, 30 names
you could probably do so within a few seconds. A longer list might take
minutes but is still a practical task, however what if I asked you to compare a
list of 100 names with a second list of 100 names the task becomes a lot more
laborious and prone to error.
Not only does the number of potential comparisons expand to 10,000
(100 × 100), but if you want to do so in one pass through the second table you
have to hold all 100 names from the first table in your head – not easy!

Even deduplicating a single list of 100 names against itself requires
100 × 99 ÷ 2 = 4,950 comparisons. At one per second that’s about 80 minutes of
work just to compare two short lists. For much larger datasets the number of
potential combinations becomes impractical, even for the most performant
hardware.
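A short sketch makes the arithmetic concrete (the list contents here are
placeholders):

from itertools import combinations, product

list_a = [f"name_{i}" for i in range(100)]
list_b = [f"name_{i}" for i in range(100)]

# Comparing two lists: every record in one against every record in the other
print(sum(1 for _ in product(list_a, list_b)))  # 10000

# Deduplicating a single list: each unordered pair compared once
print(sum(1 for _ in combinations(list_a, 2)))  # 4950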
Blind Matching?
So far we have assumed that the sets of data we seek to match are fully
transparent to us – that the values of the attributes are readily available, in full
and have not been obscured or masked in any way. In some cases this ideal is
not possible due to privacy constraints or geopolitical factors that prevent
data moving across borders. How can we find matches without being able to
see the data? This feels like magic but as we will see in Chapter 9 there are
cryptographic techniques that enable matching to still take place without
requiring full exposure of the list to be matched against.
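To give a flavor of the idea, here is a toy sketch in which two parties share
only salted hashes of their standardized names; the real privacy-preserving
techniques covered in Chapter 9 are considerably more sophisticated than this:

import hashlib

def token(value, salt='shared-secret'):
    # Both parties hash standardized values with an agreed salt
    return hashlib.sha256((salt + value.lower()).encode()).hexdigest()

# Each party can publish hashes and look for collisions
# without revealing the underlying names
print(token('Michael Shearer') == token('Michael Shearer'))  # True
print(token('Michael Shearer') == token('Micheal Shearer'))  # False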
The basic entity resolution process consists of the following steps:

Data Standardization
Record Blocking
Attribute Comparison
Match Classification
Clustering
Canonicalization
Data Standardization
The first step is to cleanse the raw data and reformat its attributes into a
consistent form so that records can be compared on a like-for-like basis.
Record Blocking
Because comparing every record against every other quickly becomes
impractical, blocking restricts comparisons to candidate pairs that share a
common characteristic. We will return to this in Chapter 5.
Attribute Comparison
Candidate record pairs are then compared attribute by attribute, either
exactly or according to a similarity measure.
Match Classification
The final step in the basic entity resolution process is to conclude whether the
collective similarity between individual attributes is sufficient to declare two
records a match, i.e. to resolve that they refer to the same real-world entity.
This judgement can be made according to a set of manually defined rules or
can be based on a machine-learning probabilistic approach.
Clustering
Once our match classification is complete we may group our records into
connected clusters via their matching pairs. The inclusion of a record pair in a
cluster may be determined by an additional match confidence
threshold. Records without pairs above this threshold will form standalone
clusters. If our matching criteria allow for different equivalence criteria then
our clusters may be intransitive, i.e. record A may be paired with record B,
and record B paired with record C, but record C may not be paired with record A. As a
result clusters may be highly interconnected or more loosely coupled.
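A minimal sketch of this grouping, using the networkx package (an
implementation choice assumed here) and hypothetical record identifiers:

import networkx as nx

# Matched pairs that passed the confidence threshold (hypothetical IDs)
pairs = [('A', 'B'), ('B', 'C'), ('D', 'E')]

G = nx.Graph()
G.add_edges_from(pairs)

# Each connected component becomes a cluster, even where A and C
# were never directly paired
print(list(nx.connected_components(G)))  # e.g. [{'A', 'B', 'C'}, {'D', 'E'}]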
Canonicalization
Finally, each cluster of matched records can be reduced to a single canonical
view of the entity, selecting the most representative value for each
attribute.
Worked Example
Returning to our simple example, let’s apply the steps to our data. Firstly, let’s
standardize our data, splitting the name attribute, standardizing the date of
birth and removing the extra characters in the place of birth and mobile
number fields:
[Table: Firstname, Lastname, Date of Birth, Place of Birth, Mobile Number]
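A sketch of these standardization steps applied to one of our hypothetical
records:

import pandas as pd

record = {'Name': 'Michael Shearer',
          'Date of Birth': '4/1/1970',
          'Place of Birth': 'Stow-on-the-Wold',
          'Mobile Number': '07700 900999'}

# Split the name into Firstname and Lastname
firstname, lastname = record['Name'].split(' ', 1)

# Standardize the date of birth, assuming UK day-first format
dob = pd.to_datetime(record['Date of Birth'], dayfirst=True)

# Remove extra characters from the place of birth and mobile number
place = record['Place of Birth'].replace('-', ' ')
mobile = record['Mobile Number'].replace(' ', '')

print(firstname, lastname, dob.date(), place, mobile)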
In this simple example we only have one pair to consider so we don’t need to
apply blocking. We’ll return to this in Chapter 5.
Attribute         Record 1           Record 2           Result
Firstname         Michael            Micheal            No Match
Lastname          Shearer            Shearer            Match
Date of Birth     4/1/1970           14/1/1970          No Match
Place of Birth    Stow on the Wold   Stow on the Wold   Match
Mobile Number     07700 900999       07700 900999       Match
We can take this approach a step further and assign a relative weighting to
each of our attribute comparisons; a mobile number is worth perhaps twice as
much as a Date of Birth match and so on… Combining these weighted
scores produces an overall match score which can be considered against a
given confidence threshold.
This probabilistic approach works particularly well when some of the values
of a categorical attribute (one with a finite set of values) are significantly
more common than others. If we consider a City attribute as part of an
address match in a UK dataset then London is likely to occur much more
frequently than, say, Bath and therefore may be weighted less.
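A minimal sketch of such a weighted score, with illustrative weights of my own
choosing applied to the comparison results above:

# Illustrative weights: more discriminating attributes count for more
weights = {'Firstname': 1.0, 'Lastname': 1.0, 'Date of Birth': 1.5,
           'Place of Birth': 1.0, 'Mobile Number': 3.0}

# Comparison results from the table above (1 = match, 0 = no match)
results = {'Firstname': 0, 'Lastname': 1, 'Date of Birth': 0,
           'Place of Birth': 1, 'Mobile Number': 1}

score = sum(weights[a] * results[a] for a in weights)
print(score / sum(weights.values()))  # ~0.67, compared against a threshold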
Measuring Performance
Statistical approaches may help us to decide how to evaluate and combine all
the clues that comparing individual attributes gives us but how do we decide
whether the combination is good enough or not? How do we set the
confidence threshold to declare a match? This depends on what is important
to us and how we propose to use our newly found matches.
Do we care more about being sure we spot every potential match, even if in the
process we declare a few matches that turn out to be false? This emphasis is
measured by recall. Or do we not want to waste our time on incorrect matches,
accepting that we may miss a few true matches along the way? That emphasis is
measured by precision.
Comparing two records, there are four different scenarios that can arise:

True Positive: we declare a match and the records do refer to the same entity.
False Positive: we declare a match but the records refer to different entities.
True Negative: we declare no match and the records refer to different entities.
False Negative: we declare no match but the records do refer to the same entity.
If our recall measure is high then we are declaring relatively few False
Negatives, i.e. we rarely overlook a true match.
If our precision is high then when we declare a match we nearly always get it
right.
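These two measures are simple to compute from the counts of the four
scenarios; a minimal sketch:

def precision_recall(tp, fp, fn):
    # Precision: of the matches we declared, how many were correct?
    precision = tp / (tp + fp)
    # Recall: of the true matches present, how many did we declare?
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 90 true positives, 10 false positives,
# 20 false negatives
print(precision_recall(90, 10, 20))  # (0.9, 0.818...)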
Ideally of course we’d like high recall and precision simultaneously, our
matches are both correct and comprehensive but this is tricky to achieve!
Chapter 6 describes this process in more detail.
Getting Started
So how can we solve these challenges?
Hopefully this chapter has given you a good understanding of what entity
resolution is, why it is needed and the main steps in the process. Subsequent
chapters will guide you, hands-on, through a set of worked real-world
examples based on publicly available data.
1. https://www.fbiic.gov/public/2008/nov/Naming_practice_guide_UK_2006.pdf
Chapter 2. Data Standardization
With Early Release ebooks, you get books in their earliest form—the author’s
raw and unedited content as they write—so you can take advantage of these
technologies long before the official release of these titles.
This will be the 2nd chapter of the final book. If you have comments about
how we might improve the content and/or examples in this book, or if you
notice missing material within this chapter, please reach out to the author at
[email protected].
In this Chapter we will get hands-on and work through a real-world example
of this process. We will create our working environment, acquire the data we
need, cleanse that data and then perform a simple entity resolution exercise to
allow us to perform some simple analysis. We will conclude by examining
the performance of our data matching process and consider how we might
improve it.
But first let’s introduce our example and why we need entity resolution to
solve it!
Sample Problem
The TheyWorkForYou website is run by mySociety, a UK charity who build web tools that make
democracy a little more accessible. MySociety is not politically-aligned, and its projects are for
everyone to use.
Our challenge is to combine a Wikipedia list of the MPs returned at the 2019
UK general election with the list of current MPs published by TheyWorkForYou.
So how can we join these two datasets together? Although both datasets
include the name of the Constituency that each MP represents we can’t use
this as a common key because since the 2019 general election a number of
by-elections1 have taken place, returning new MPs. These new members may
have Facebook accounts but should not be considered in the re-election
population as this might skew our analysis. Therefore we need to connect our
data by matching the names of the MPs between the two sets of records, i.e.
resolving these entities so we can create a single combined record for each
MP.
Environment Setup
Our first task is to set up our entity resolution environment. In this book we
will be using Python and the JupyterLab interactive development environment.
To begin you’ll need Python installed on your machine. If you don’t already
have it you can download it from www.python.org.2
If installing Python for the first time make sure to select the ‘Add Python to PATH’ option to ensure
you can run Python from your command line.
You will also need the git version control system, which you can download
from https://git-scm.com. Once git is installed you can clone (that is, take a
copy of) the GitHub repository that accompanies this book onto your machine.
Run this command from the parent directory of your choice:

>>>git clone https://github.com/mshearer0/HandsOnEntityResolution

>>>cd HandsOnEntityResolution

Next, create a virtual environment (assumed here to be named venv) to keep
this book’s dependencies separate from the rest of your system, and activate
it:

>>>python -m venv venv

>>>.\venv\Scripts\activate.bat (Windows)

>>>source venv/bin/activate (Linux)

This will prefix your command prompt to show the environment name based on the directory name:

>>>(HandsOnEntityResolution) your_path\HandsOnEntityResolution

When you have finished working you can exit the environment with:

>>>deactivate (Windows)

>>>deactivate (Linux)
To set up our JupyterLab code environment and the packages we will need,
we’ll use the Python package manager pip. Pip should be included with your
Python installation. You can check using:

>>>pip --version

You can then install the packages you will need throughout the book from the
requirements.txt file using:

>>>pip install -r requirements.txt

Next, configure a Python kernel associated with our virtual environment for
our notebooks to pick up, for example:

>>>python -m ipykernel install --user --name=HandsOnEntityResolution

Finally, launch JupyterLab:

>>>jupyter-lab
Acquiring Data
Now we have our environment configured, our next task is to acquire the data
we need. It’s often the case that the data we need comes in a variety of
formats and presentations. The examples included in this book will illustrate
how to deal with some of the most common formats we encounter.
Wikipedia Data
Then we can import the requests and BeautifulSoup Python packages and use
them to download a copy of the Wikipedia page (assumed here to be the list of
MPs elected at the 2019 general election) and then run an HTML parser to
extract all the tables present on the page:

import requests
from bs4 import BeautifulSoup

# Wikipedia list of MPs elected at the 2019 UK general election (assumed URL)
url = "https://en.wikipedia.org/wiki/List_of_MPs_elected_in_the_2019_United_Kingdom_general_election"

website_url = requests.get(url).text
soup = BeautifulSoup(website_url, 'html.parser')
tables = soup.find_all('table')
BEAUTIFUL SOUP
Beautiful Soup is a library that makes it easy to scrape information from web
pages. More details are available at:
https://www.crummy.com/software/BeautifulSoup/
Next, we need to find the table we want within the page. In this case we
select the table that includes the text ‘Member returned’ (a column name).
Within this table we extract the column names as headers and then iterate
through all the remaining rows and elements building a list of lists. These
lists are then loaded into a Pandas dataframe, setting the extracted headers as
dataframe column names.
import pandas as pd
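A minimal sketch of this extraction, assuming the first row of the target
table holds the column headers:

# Select the table that includes the text 'Member returned'
table = next(t for t in tables if 'Member returned' in t.get_text())

# Treat the first row as column headers
rows = table.find_all('tr')
headers = [cell.get_text() for cell in rows[0].find_all(['th', 'td'])]

# Build a list of lists from the remaining rows
# (assumes every row has the same number of cells as the header)
data = [[cell.get_text() for cell in row.find_all(['td', 'th'])]
        for row in rows[1:]]

df_w = pd.DataFrame(data, columns=headers)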
The result is a Pandas dataframe which we can examine using the info()
method:
Figure 2-1. Wikipedia MP Info
Finally we can simplify our dataset by only retaining the columns we need:
Now we can move on to download our second dataset and load it into a
separate dataframe:
url = "https://www.theyworkforyou.com/mps/?f=csv"
df_t = pd.read_csv(url, header=0)
Figure 2-2. They Work For You MP Info
If you are reading this book after the 2024/25 UK general election then the TheyWorkForYou website
will likely be updated with the new MPs. If you are following along on your own machine then please
use the ‘mps_they_raw.csv’ file supplied in the Github repository that accompanies this book. The raw
Wikipedia data ‘mps_wiki_raw.csv’ is also provided.
If we examine the first few rows of this data we can see what these fields
contain:
Figure 2-3. First five rows of the They Work For You dataset
def facelink(url):
    # Download the MP's TheyWorkForYou page and parse it
    website_url = requests.get(url).text
    soup = BeautifulSoup(website_url, 'html.parser')
    # Collect all links to facebook.com found on the page
    flinks = [f"{item['href']}" for item in
              soup.select("a[href*='facebook.com']")]
    # The site's own account means no MP Facebook page is listed
    if flinks and flinks[0] != "https://www.facebook.com/TheyWorkForYou":
        return flinks[0]
    return ""
The function uses the same BeautifulSoup package we used to parse the
Wikipedia webpage. In this case we extract all the links to facebook.com and
examine the first one. If this link is TheyWorkForYou’s own account then the
site doesn’t have a Facebook account listed for the MP, so we return an empty
string; otherwise we return that link.
We can apply this function to every row in the dataframe using the apply()
method to call the facelink function, passing the URI value as the url. The
value returned from the function is added to a new column Flink appended to
the dataframe.
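A one-line sketch of this step, assuming the dataframe’s page-link column is
named URI as in the downloaded CSV:

# Look up each MP's Facebook link from their TheyWorkForYou page
df_t['Flink'] = df_t['URI'].apply(facelink)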
Be patient, this function has to do quite a bit of work so may take a few
minutes to run on your machine. Once this completes we can view our first
few rows again to check if we are getting Facebook links we expect:
Figure 2-4. First five rows of the They Work For You dataset with Facebook links
Finally we can simplify our dataset by only retaining the columns we need:
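A sketch of this selection, with column names assumed from the downloaded
CSV:

# Keep only the attributes we need for matching
df_t = df_t[['First name', 'Last name', 'Constituency', 'Flink']]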
Cleansing Data
Now we have our raw datasets we can begin our data cleansing process. We
will perform some initial cleansing on the Wikipedia dataset first and then the
TheyWorkForYou data. We will then attempt to join these datasets together
and see what further inconsistencies are revealed that we need to standardize.
Wikipedia
Let’s have a look at the first and last few rows in the Wikipedia dataset:
The first task in our cleansing process is to standardize our column names:
We can also see that the output of our parser has a blank row at the start and
the end of our dataframe and it appears we have ‘\n’ characters appended to
each element. These additions would clearly interfere with our match so need
to be removed.
df_w = df_w.dropna()
df_w['Constituency'] = df_w['Constituency'].str.rstrip("\n")
df_w['Fullname'] = df_w['Fullname'].str.rstrip("\n")
To be sure we now have a clean Fullname we can check for any other '\n'
characters.
df_w[df_w['Fullname'].astype(str).str.contains('\n')]
This simple check shows we also have leading '\n' characters that we need to
remove:
df_w['Fullname'] = df_w['Fullname'].str.lstrip("\n")
Our next task is to split our Fullname into Firstname and Lastname so we can
match these values independently. For the purposes of this example we are
going to use a simple method, selecting the first substring as the Firstname
and the remaining substrings, separated by spaces, as the Lastname.
df_w['Firstname'] = df_w['Fullname'].str.split().str[0]
df_w['Lastname'] = df_w['Fullname'].astype(str).apply(
    lambda x: ' '.join(x.split()[1:]))
We can check how well this basic method has worked by looking for
Lastnames that contain spaces:
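A minimal sketch of this check:

# Lastnames containing a space may indicate middle names or suffixes
df_w[df_w['Lastname'].str.contains(' ')]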
Attribute Comparison
Now we have two similarly formatted dataframes we can experiment with the
next stage of the Entity Resolution process. As our datasets are small we
don’t need to employ record blocking and so we can proceed directly to try a
simple exact match of Firstname, Lastname and Constituency. The merge
method (similar to a database join) does this exact matching for us:
len(df_w.merge(df_t, on=['Constituency', 'Firstname', 'Lastname']))

599
We find 599 of 650 are perfect matches of all three attributes – not bad!!
Matching on just Constituency and Lastname gives us 607 perfect matches so
we clearly have 8 mismatching Firstnames:
len(df_w.merge(df_t, on=['Constituency','Lastname']))
607
We have 599 matches out of 650 so far, but can we do better? Let’s start with
examining the Constituency attribute in our datasets. As a categorical
variable we would expect this should be pretty easy to match.
len(df_w.merge(df_t, on=['Constituency'] ))
623
We have 623 matches, leaving 27 unmatched. Why? Surely we’d expect
the same Constituencies to be present in both datasets, so what is going wrong?
Constituency
Let’s have a look at the first 5 of the unmatched population in both datasets.
To do this we perform an outer join between the dataframes using the
Constituency attribute and then select those records found in either the right
(Wikipedia) or left (TheyWorkForYou) dataframe accordingly:
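A sketch of this comparison using pandas’ indicator option (the exact
selection shown here is an assumption):

merged = df_t.merge(df_w, on=['Constituency'], how='outer', indicator=True)

# Constituencies found only in the TheyWorkForYou (left) dataframe
print(merged[merged['_merge'] == 'left_only'].head())

# Constituencies found only in the Wikipedia (right) dataframe
print(merged[merged['_merge'] == 'right_only'].head())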
Figure 2-8. Constituency Mismatches
We can see that the first dataset from the TheyWorkForYou website has
commas embedded in the Constituency names whereas the Wikipedia data
does not. This explains why they don’t match. To ensure consistency let’s
remove any commas from both dataframes:
df_t['Constituency'] = df_t['Constituency'].str.replace(',', '')
df_w['Constituency'] = df_w['Constituency'].str.replace(',', '')
After applying this cleansing we have a perfect match on all 650
constituencies.
len(df_w.merge(df_t, on=['Constituency']))
650
CASE SENSITIVITY
In this simple example we have matching case conventions (e.g. initial Capitalization) between the two
datasets. In many situations this won’t be the case and you’ll need to standardize on upper or lower
case characters. We’ll see how this can be done in later chapters.
Repeating our perfect match on all three attributes we can now match 624
records.
len(df_w.merge(df_t, on=['Constituency', 'Firstname', 'Lastname']))

624