PROBABILISTIC
DATA STRUCTURES AND ALGORITHMS
FOR BIG DATA APPLICATIONS

ANDRII GAKHOV
Probabilistic Data Structures and Algorithms
for Big Data Applications

1st edition, 2019

Bibliographic information published by the Deutsche Nationalbibliothek:
The Deutsche Nationalbibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data are available on the
Internet at http://dnb.dnb.de.

© 2019 Andrii Gakhov

All rights reserved; no part of this book may be reproduced or transmitted
by any means, electronic, mechanical, photocopying or otherwise, without
the prior permission of the author. All product names and trademarks
referred to are the property of their respective owners.

The paperback edition is printed and published by
BoD – Books on Demand GmbH
22848 Norderstedt
Germany

The publisher and the author assume no responsibility for errors or
omissions, or for damages resulting from the use of the information
contained in this book.

ISBN (paperback): 978-37-48190-48-6
ASIN (ebook): B07MYKTY8W

To my wife Larysa
and my son Gabriel.

Table of Contents

Preface

1 Hashing
1.1 Cryptographic hash functions
1.2 Non-Cryptographic hash functions
1.3 Hash tables
Bibliography

2 Membership
2.1 Bloom filter
2.2 Counting Bloom filter
2.3 Quotient filter
2.4 Cuckoo filter
Bibliography

3 Cardinality
3.1 Linear Counting
3.2 Probabilistic Counting
3.3 LogLog and HyperLogLog
Bibliography

4 Frequency
4.1 Majority algorithm
4.2 Frequent algorithm
4.3 Count Sketch
4.4 Count–Min Sketch
Bibliography

5 Rank
5.1 Random sampling
5.2 q-digest
5.3 t-digest
Bibliography

6 Similarity
6.1 Locality–Sensitive Hashing
6.2 MinHash
6.3 SimHash
Bibliography

Index
Preface

Big data is characterized by three fundamental dimensions, known as
the Three V’s of Big Data: Volume, Velocity, and Variety. Volume
expresses the amount of data, Velocity describes the speed at which data
is arriving and being processed, and Variety refers to the number of
types of data.
The data could come from anywhere, including social media, various
sensors, financial transactions, etc. IBM has stated¹ that people create
2.5 quintillion bytes of data every day. This number is growing
constantly, and most of the data cannot be stored and is usually wasted
without being processed. Today, it is not uncommon to process terabyte-
or petabyte-sized corpora and gigabit-rate streams.
On the other hand, nowadays every company wants to fully
understand the data it has in order to find value and act on it. This has
led to rapid growth in the Big Data software market. However,
traditional technologies, including data structures and algorithms,
become ineffective when dealing with Big Data. Therefore, many software
practitioners turn to computer science again and again for the most
appropriate solutions, and one option is to use probabilistic data
structures and algorithms.
Probabilistic data structures is a common name for data structures
based mostly on different hashing techniques. Unlike regular (or
deterministic) data structures, they always provide approximated
answers but with reliable ways to estimate possible errors. Fortunately,
the potential losses and errors are fully compensated for by extremely
low memory requirements, constant query time, and scaling, the factors
that become essential in Big Data applications.

¹ What Is Big Data? https://www.ibm.com/software/data/bigdata/what-is-big-data.html

About this book


The purpose of this book is to introduce technology practitioners,
including software architects and developers, as well as technology
decision makers, to probabilistic data structures and algorithms. Reading
this book, you will get a theoretical and practical understanding of
probabilistic data structures and learn about their common uses.
This is not a book for scientists, but to gain the most out of it you
will need to have basic mathematical knowledge and an understanding
of the general theory of data structures and algorithms. If you do not
have any “computer science” experience, it is highly recommended you
read Introduction to Algorithms by Thomas H. Cormen, Charles E.
Leiserson, Ronald L. Rivest, and Clifford Stein (MIT), which provides
a comprehensive introduction to the modern study of computer
algorithms.
While it is impossible to cover all the existing amazing solutions,
this book aims to highlight their common ideas and important areas of
application, including membership querying, counting, stream mining,
and similarity estimation.

Organization of the book


This book consists of six chapters, each preceded by an introduction
and followed by a brief summary and bibliography for further reading
relating to that chapter. Every chapter is dedicated to one particular
problem in Big Data applications. It starts with an in-depth explanation
of the problem and then introduces data structures and algorithms
that can be used to solve it efficiently.
The first chapter gives a brief overview of popular hash functions
and hash tables that are widely used in probabilistic data structures.
Chapter 2 is devoted to approximate membership queries, the most
well-known use case of such structures. Chapter 3 discusses data
structures that help to estimate the number of unique elements. Chapters
4 and 5 are dedicated to important frequency- and rank-related metrics
computations in streaming applications. Chapter 6 covers data
structures and algorithms for similarity problems, particularly
the nearest neighbor search.

This book on the Web


You can find errata, examples, and additional information at
https://pdsa.gakhov.com. If you have a comment or a technical question
about the book, would like to report an error you found, or have any
other issue, send email to [email protected].
If you are also interested in a Cython implementation that includes
many of the data structures and algorithms from this book, please
check out our free and open-source Python library called PDSA at
https://github.com/gakhov/pdsa. Everybody is welcome to contribute
at any time.

About the author


Andrii Gakhov is a mathematician and software engineer holding a Ph.D.
in mathematical modeling and numerical methods. He has been a teacher
in the School of Computer Science at V. Karazin Kharkiv National
University in Ukraine for a number of years and currently works as
a software practitioner for ferret go GmbH, the leading community
moderation, automation, and analytics company in Germany. His fields
of interest include machine learning, stream mining, and data analysis.
The best way to reach the author is via Twitter @gakhov or by visiting
his webpage at https://www.gakhov.com.

Acknowledgments
The author would like to thank Asmir Mustafic, Jean Vancoppenolle,
and Eugen Martynov for their contribution to reviewing this book and
for their useful recommendations. Many thanks to the academic reviewers
Dr. Kateryna Nesvit and Dr. Dharavath Ramesh for their invaluable
suggestions and remarks.
Special thanks to Ted Dunning, the author of the t-digest algorithm,
for a very precise review of the corresponding chapter, the insightful
questions, and discussion.
Finally, thanks to all the people who provided feedback and helped
make this book possible.

1 Hashing

Hashing plays a central role in probabilistic data structures, as they
use it for randomization and compact representation of the data.
A hash function compresses blocks of input data of an arbitrary size by
generating an identifier of a smaller (and in most cases fixed) size, called
the hash value or simply the hash.
The choice of hash functions is crucial to avoid bias. Although
the selection decision is mostly based on the input and particular
use cases, there are certain common properties that a hash function
should fulfill in order to be applicable for hash-based selection.

Hash functions compress the input, therefore, cases where they generate
the same hash values for two different blocks of data are unavoidable and
known as hash collisions.

In 1979 J. Lawrence Carter and Mark Wegman proposed the universal


hash functions whose mathematical properties can guarantee a low
expected number of collisions, even if the input data are chosen
randomly from the universe.
The universal hash functions family H maps elements of the universe
to the range {0, 1, . . . , m – 1} and guarantees that by randomly picking
a hash function from the family the probability of collisions is limited:

$$\Pr\bigl(h(x) = h(y)\bigr) \le \frac{1}{m}, \quad \text{for any } x, y : x \ne y. \tag{1.1}$$
Thus, with a hash function chosen at random from a family with
property (1.1), two distinct elements collide no more often than they
would if their hash values were chosen uniformly at random.
An important universal hash functions family, designed to hash integers,
can be defined as

$$h_{\{k,q\}}(x) = \bigl((k \cdot x + q) \bmod p\bigr) \bmod m, \tag{1.2}$$

where $k$ and $q$ are randomly chosen integers modulo $p$ with $k \ne 0$.
The value of $p$ should be selected as a prime $p \ge m$, and the common
choice is to take one of the known Mersenne prime numbers, e.g., for
$m = 10^9$ we choose $p = M_{31} = 2^{31} - 1 \approx 2 \cdot 10^9$.
Many applications can use the simpler version of the family (1.2):

$$h_{\{k\}}(x) = (k \cdot x \bmod p) \bmod m, \tag{1.3}$$

which is only approximately universal, but still provides a good
probability of collisions, smaller than $2/m$ in expectation.
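The family (1.2) can be sketched in a few lines of Python. This is a minimal illustration rather than code from the book: the helper name `make_universal_hash`, its `seed` argument, and the use of Python's `random` module to draw k and q are assumptions made for the example.

```python
import random

# Mersenne prime M31 = 2^31 - 1, the example prime mentioned in the text.
P = 2**31 - 1


def make_universal_hash(m, p=P, seed=None):
    """Draw one function h(x) = ((k*x + q) mod p) mod m from family (1.2).

    Hypothetical helper: k is drawn from {1, ..., p-1}, q from {0, ..., p-1}.
    """
    if not 0 < m <= p:
        raise ValueError("need 0 < m <= p")
    rng = random.Random(seed)
    k = rng.randrange(1, p)  # k must be non-zero
    q = rng.randrange(0, p)

    def h(x):
        return ((k * x + q) % p) % m

    return h


h = make_universal_hash(m=1000, seed=42)
values = [h(x) for x in range(10)]  # every value lies in {0, ..., 999}
```

Fixing the seed only makes the sketch reproducible; in practice the point of the construction is that k and q are drawn at random, independently of the input.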
However, the above families of hash functions are limited to integers,
which is not enough for most practical applications: they need to
hash variable-sized vectors and demand fast and reliable hash
functions with certain guaranteed properties.
There are many classes of hash functions used in practice and the choice
mainly depends on their design and particular use. In the current
chapter we provide an overview of popular hash functions and simple
data structures that are prevalent in various probabilistic data structures.

1.1 Cryptographic hash functions


Practically, cryptographic hash functions are defined as fixed mappings
from variable input bit strings to fixed length output bit strings.
As stated previously, hash collisions are unavoidable, but a secure hash
function is required to be collision resistant, meaning that it should be
hard to find collisions. Of course, a collision can be found accidentally
or computed in advance. This is why such a class of functions always
requires mathematical proofs.
Cryptographic hash functions are very important in cryptography and
are used in many applications such as digital signatures, authentication
schemes, and message integrity.
There are three main requirements that cryptographic hashes are
expected to satisfy:

• Work factor: to make brute force inversion hard, a cryptographic
hash should be computationally expensive.
• Sticky state: a cryptographic hash should not have a state in
which it can stick for a plausible input pattern.
• Diffusion: every output bit of a cryptographic hash should be
an equally complex function of every input bit.
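The diffusion requirement can be observed empirically: flipping a single input bit should change roughly half of the output bits. The sketch below uses SHA-256 from Python's standard `hashlib` module; the helper name `bit_difference` and the sample message are assumptions chosen for illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = bytearray(b"probabilistic data structures")
digest_before = hashlib.sha256(message).digest()

message[0] ^= 0x01  # flip a single bit of the input
digest_after = hashlib.sha256(message).digest()

changed = bit_difference(digest_before, digest_after)
print(f"{changed} of 256 output bits changed")
```

A count close to 128 is what good diffusion would predict; a count near 0 or 256 would indicate that the output bits depend on the input in a simple way.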

Theoretically, cryptographic functions can be further divided into
keyed hash functions, which use a secret key, and unkeyed hash functions,
which do not. Probabilistic data structures use only unkeyed hash
functions, which include One–Way hash functions, Collision Resistant
hash functions, and Universal One–Way hash functions. These functions
differ only in some additional properties.
One–Way hash functions satisfy the following requirements:

• They can be applied to blocks of data of any length (of course,
in practice, it’s bounded by some huge constant).
• They produce a fixed-length output.
• They should have preimage resistance (one-way property): it
should be computationally infeasible to find an input which hashes
to the specified output.

Additionally, for Collision Resistant hash functions it should be
extremely unlikely for two different inputs to generate the same hash
value.
If not collision resistant, Universal One–Way hash functions need to
be target collision resistant, or second-preimage resistant: it
should be computationally infeasible to find a second distinct input that
hashes to the same output as the specified input.
Note that being collision resistant implies that the function is
second-preimage resistant, but the generic complexity of finding
a second preimage is much higher than that of finding
a colliding pair.
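For a rough sense of that gap, the standard generic estimates for an ideal hash function with an n-bit output can be written as below. These are textbook bounds assumed here for illustration, not figures taken from this chapter:

```latex
\text{collision search (birthday bound):} \quad \approx 2^{n/2} \text{ hash evaluations}, \\
\text{second-preimage search:} \quad \approx 2^{n} \text{ hash evaluations}.
```

For example, with n = 128 a generic collision search needs about 2^64 evaluations, while a generic second-preimage search needs about 2^128.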

Because of their design (particularly, the work factor requirement),
cryptographic hash functions are much slower than non-cryptographic
ones. For instance, the function SHA–1, discussed below, has a throughput
on the order of 540 MiB/second¹, while popular non-cryptographic
functions reach the order of 2500 MiB/second and more.

¹ Crypto++ 6.0.0 Benchmarks https://www.cryptopp.com/benchmarks.html

Message–Digest Algorithms

The popular Message–Digest Algorithm, MD5, was invented by Ron
Rivest in 1991 to replace the old MD4 standard. It is a cryptographic
hash algorithm, defined in IETF RFC 1321, that takes a message of
an arbitrary length and produces as an output a 128-bit hash
of the input.
The MD5 algorithm is based on the Merkle–Damgård scheme. At
the first stage, it converts the input of an arbitrary size to a number of
blocks of a fixed size (512-bit blocks, or sixteen 32-bit words) using
an MD-compliant padding function. Afterwards, the blocks are processed one
by one using a special compression function, and every next block uses
the result of the previous output. To make the compression secure,
the algorithm applies Merkle–Damgård strengthening, in which the padding
includes the encoded length of the original message. The final MD5 hash
digest is the 128-bit value generated after the processing of the last block.
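The padding stage described above can be sketched as follows. This is an illustrative reading of the padding rule in RFC 1321 (a single 1 bit, zero bits up to 448 mod 512, then the 64-bit message length with the low-order byte first); the function name `md5_pad` is an assumption, and the compression function itself is not shown.

```python
def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits), MD5 style."""
    bit_length = (8 * len(message)) % 2**64      # length is kept modulo 2^64
    padded = message + b"\x80"                    # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64) # fill up to 448 bits mod 512
    padded += bit_length.to_bytes(8, "little")    # Merkle-Damgard strengthening
    return padded


padded = md5_pad(b"abc")
blocks = [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

Note that padding is always added, so a message that is already 56 bytes long grows to two 512-bit blocks.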
The MD5 algorithm is often used to verify the integrity of a file —
instead of confirming that the file is unchanged by examining its raw
data, it is enough to compare the MD5 hashes.
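Such an integrity check can be sketched with Python's standard `hashlib` module. The helper name `md5_of_stream` and the in-memory sample data are assumptions for the example; with real files, the streams would come from `open(path, "rb")`.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


original = io.BytesIO(b"The quick brown fox jumps over the lazy dog")
altered = io.BytesIO(b"The quick brown fox jumps over the lazy cog")

reference = md5_of_stream(original)
unchanged = md5_of_stream(io.BytesIO(original.getvalue())) == reference
tampered = md5_of_stream(altered) == reference
print(unchanged, tampered)  # True False
```

Reading in chunks keeps memory usage constant regardless of the file size, which is the usual reason to prefer `update` over hashing the whole content at once.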

As stated in Vulnerability Note VU#836068², the MD5 algorithm is
vulnerable to collision attacks. The discovered weaknesses in the algorithm
allow for the construction of different messages with the same MD5 hash.
As a result, attackers can generate cryptographic tokens or other data that
illegitimately appear authentic. It is not advisable to use MD5 as a secure
cryptographic algorithm anymore; however, this vulnerability doesn’t
have a big impact on probabilistic data structures, where MD5 can still
be used.

² VU#836068 http://www.kb.cert.org/vuls/id/836068

Secure Hash Algorithms

Secure Hash Algorithms were developed by the US National Security
Agency (NSA) and published by the National Institute of Standards and
Technology (NIST). The first algorithm from the family, called SHA–0,
was published in 1993 and quickly replaced by its successor SHA–1,
which became widely accepted globally. SHA–1 produces a longer 160-bit
(20-byte) hash value, while its security has been increased by fixing
the weaknesses of SHA–0.
SHA–1 was widely used for years in various applications, and most
websites were signed using algorithms based on it. However, in 2005
a weakness in SHA–1 was discovered; NIST deprecated it for government
use in 2010, and it has been deprecated on the Internet since 2011.
As with MD5, the discovered weaknesses do not prevent its use as
a hash function for probabilistic data structures.
SHA–2 was published in 2001 and includes six hash functions with
varying digest sizes: SHA–224, SHA–256, SHA–384, SHA–512, and
others. SHA–2 is stronger than SHA–1, and successful attacks against
SHA–2 are unlikely with current computing power.
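The digest sizes mentioned in this section can be checked directly with Python's standard `hashlib` module. The snippet is only an illustration; the selection of algorithm names is an assumption of the example, and all of them are among the algorithms `hashlib` guarantees to provide.

```python
import hashlib

# Digest size in bits for MD5, SHA-1, and four members of the SHA-2 family.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in digest_bits.items():
    print(f"{name}: {bits}-bit digest")
```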

RadioGatún

The cryptographic hash function family called RadioGatún was presented


at the Second Cryptographic Hash Workshop in 2006 [Be06]. The design
of RadioGatún improved the known Panama hash function.
Similar to other popular hash functions, the input is split into
a sequence of blocks which are injected into the algorithm’s internal
state using a special function, that is followed by an iterative application
of a single non-cryptographic round function (called the belt-and-mill
round function). At every round, the state is represented as two parts,
the belt and the mill, that are treated differently by the round function.
The application of the round function consists of four operations in
parallel: 1) non-linear function applied to the mill, 2) simple linear
function applied to the belt, 3) feedforward some bits of the mill to
the belt in a linear way, 4) feedforward some bits of the belt to the mill
in a linear way. After injection of all input blocks, the algorithm
performs a number of rounds without input or output (blank rounds),
after which a part of the state is returned as the final hash value.
Among the family, RadioGatún64, with 64-bit words, is the default
choice and is optimal for 64-bit platforms. For best performance on
32-bit platforms, RadioGatún32, with 32-bit words, can also be used.

For the same clock frequency, RadioGatún32 is claimed to be 12 times faster
than SHA–256 for long inputs and 3.2 times faster for short inputs, while
having fewer gates. RadioGatún64 is even 24 times faster than SHA–256
for long inputs but has about 50% more logic gates.

1.2 Non-Cryptographic hash functions


In contrast to cryptographic hash functions, non-cryptographic functions
are not designed to fend off attacks aimed at finding a collision, hence
don’t require security and high collision resistance.
Such functions simply have to be fast and guarantee a low probability
of collisions, allowing a lot of data to be quickly hashed with a reasonable
error probability.

Fowler/Noll/Vo

The basis of the Fowler/Noll/Vo (FNV or FNV1) non-cryptographic
hash algorithm was taken from an idea sent, as a reviewer comment, to
the IEEE POSIX P1003.2 committee by Glenn Fowler and Phong Vo
back in 1991 and afterward improved on by Landon Curt Noll [Fo18].
The FNV algorithm maintains an internal state that is initialized
to a special offset basis. After that, it iterates over the input in blocks
of 8 bits: for each block, it multiplies the state by a large
numerical constant, called the FNV Prime, and then applies the logical
exclusive OR (XOR) of the state with the input block. After the last
input block is processed, the resulting value of the state is reported
as the hash.

The FNV Prime and the offset basis constants are design parameters
and depend on the bit length of the produced hash values. As mentioned
by Landon Curt Noll, the selection of the primes is the part of the magic
of the FNV algorithm, and some primes do hash better than others for
the same hash size.
The alternate FNV1a algorithm, which is currently preferred, is
a minor variation of the FNV algorithm that differs only in the order of
the internal XOR and multiplication operations. Although FNV1a uses
the same parameters and FNV Prime as FNV1, its XOR–folding
provides slightly better dispersion without affecting CPU
performance.
Currently, the FNV family includes algorithms for 32-, 64-, 128-, 256-,
512-, and 1024-bit hash values.
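Both variants can be sketched for the 32-bit case as shown below. The offset basis and the FNV Prime are the published 32-bit FNV parameters, which are not quoted in this chapter; the function names `fnv1_32` and `fnv1a_32` are assumptions made for the example.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # 2166136261, published 32-bit offset basis
FNV32_PRIME = 0x01000193         # 16777619, published 32-bit FNV Prime
MASK32 = 0xFFFFFFFF              # keep the state within 32 bits


def fnv1_32(data: bytes) -> int:
    """FNV1: multiply the state by the FNV Prime, then XOR the input byte."""
    state = FNV32_OFFSET_BASIS
    for byte in data:
        state = (state * FNV32_PRIME) & MASK32
        state ^= byte
    return state


def fnv1a_32(data: bytes) -> int:
    """FNV1a: XOR the input byte first, then multiply by the FNV Prime."""
    state = FNV32_OFFSET_BASIS
    for byte in data:
        state ^= byte
        state = (state * FNV32_PRIME) & MASK32
    return state


print(hex(fnv1_32(b"a")), hex(fnv1a_32(b"a")))  # 0x50c5d7e 0xe40c292c
```

The two functions differ only in the order of the two statements inside the loop, which is exactly the difference between FNV1 and FNV1a described above.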
FNV is very simple to implement, and the high dispersion of
its hash values makes it well suited for hashing nearly identical
strings. It is widely used in DNS servers, Twitter, database indexing
hashes, web search engines, and many other places. Some years ago,
the 32-bit version of FNV1a was recommended as the hash algorithm
for IPv6 flow label generation [An12].

MurmurHash

Another well-known family of hash functions, called MurmurHash, was


published by Austin Appleby in 2008 and finalized as the MurmurHash3
algorithm in 2011 [Ap11].
The MurmurHash algorithms use a special probabilistic technique for
approximating the global optimum to find a hash function that mixes
the bits of the input value in the best way to produce the bits of the output
hash. The various generations of the algorithm differ mainly in their
mixing functions.
The algorithm is claimed to be twice as fast as the speed-optimized
lookup3 hash function³. MurmurHash3 includes 32- and 128-bit versions
for x86 and x64 platforms.
Currently, MurmurHash3 is one of the most popular algorithms and is
used in Apache Hadoop, Apache Cassandra, Elasticsearch, libstdc++,
nginx, and others.

CityHash and FarmHash

In 2011, Google published a new family of hash functions for strings,
called CityHash, developed by Geoff Pike and Jyrki Alakuijala [Pi11].
CityHash functions are simple non-cryptographic hash functions that are
based on the MurmurHash2 algorithm.
The CityHash family was developed with a focus on short strings
(e.g., up to 64 bytes), which are of the most interest in probabilistic
data structures and hash tables. It includes 32-, 64-, 128- and 256-bit
versions.
For such short strings, the 64-bit version CityHash64 is faster than
MurmurHash and outperforms the 128-bit CityHash128. However, while
for long strings with at least a few hundred bytes the CityHash128 is
preferred over other hash functions of the CityHash family, in practice,
it is better to use MurmurHash3 instead.
One of the downsides of CityHash is that it is fairly complex, which
leads to non-optimal behavior on different compilers and can significantly
degrade its speed.
In 2014 Google published a successor to CityHash called
FarmHash, developed by Geoff Pike [Pi14]. The new algorithm
included most of the techniques used in CityHash (and, unfortunately,
inherited its complexity) and the new generation of MurmurHash.
FarmHash functions mix the input bits thoroughly, but not well enough
to be used in cryptography.
FarmHash uses CPU-specific optimizations, still requires tuning
of the compiler to get the best performance, and is platform dependent.
Notably, the computed hash values also differ across platforms.
The FarmHash functions come in many versions, and the 64-bit version
Farm64 outperforms algorithms such as CityHash, MurmurHash3, and
FNV in tests on many platforms, including mobile phones.

³ Hash Functions and Block Ciphers https://burtleburtle.net/bob/hash/

1.3 Hash tables


A hash table is a dictionary data structure that comprises an unordered
associative array of length m whose entries are called buckets and are
indexed by a key in the range {0, 1, . . . , m – 1}. To insert an element
into the hash table, a hash function is used to compute the key that
selects the appropriate bucket to store the value.
Typically, the universe from which we draw the input elements is much
bigger than the capacity m of the hash table, hence collisions in keys
are unavoidable. Additionally, when the number of elements in the hash
table grows, the number of collisions rises as well.
The critical concept of hash tables is the load factor α, the ratio of
the number of used keys n to the table’s total length m:
$$\alpha := \frac{n}{m}.$$

The load factor is a measure of how full the hash table is and since n
cannot exceed the capacity of the hash table it is upper bounded by
one. When α approaches its maximal value, the probability of collision
increases significantly which can necessitate an increase in capacity.
All hash table implementations need to address the problem of collisions
and provide a strategy on how to handle them. There are two main
techniques:

• Closed addressing: to store collided elements under the same
keys in a secondary data structure.
• Open addressing: to store collided elements in positions other
than their preferred positions and provide a way to address them.

The closed addressing technique is the most obvious way to resolve
collisions. There are many different implementations, for instance,
separate chaining, which stores collided elements in a linked list, and
perfect hashing, which uses special hash functions and secondary hash
tables of different lengths.
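Separate chaining can be sketched as below. This is a minimal illustration, not an implementation from the book: the class name `ChainedHashTable` is hypothetical, a Python list stands in for the linked list, and `len` is used as a deliberately poor hash function so that collisions are easy to see.

```python
class ChainedHashTable:
    """Closed addressing: every bucket holds a chain of collided elements."""

    def __init__(self, m, hash_function=hash):
        self.m = m
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(m)]  # one chain per key

    def _key(self, element):
        return self.hash_function(element) % self.m

    def insert(self, element):
        chain = self.buckets[self._key(element)]
        if element not in chain:  # do not store duplicates
            chain.append(element)

    def __contains__(self, element):
        return element in self.buckets[self._key(element)]


# Strings of equal length collide under the toy hash function len().
table = ChainedHashTable(m=4, hash_function=len)
for color in ("red", "tan", "green"):
    table.insert(color)

print(table.buckets[3])  # ['red', 'tan']
```

A lookup only scans the chain of one bucket, so its cost grows with the chain length rather than with the total number of stored elements.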
Instead of creating a secondary data structure in either form, it is
possible to resolve collisions by storing the collided elements elsewhere in
the primary table and providing an algorithm on how to address them.
Since the address of the element is not known from the beginning, this
technique is known as open addressing.
Now we will cover two open addressing implementations that are useful
in the probabilistic data structures listed in this book.

Linear probing

One of the most straightforward hash table implementations that uses


open addressing is the Linear probing algorithm, invented by Gene
Amdahl, Elaine M. McGraw, and Arthur Samuel in 1954 and analyzed
by Donald Knuth in 1963. The idea of the algorithm is to place collided
elements into the next empty bucket. Its name originates from the fact
that the final position of the element will be linearly shifted from
the preferable bucket since we probe one bucket after another.
A LinearProbing hash table can be seen as a circular array that
stores indexed values in buckets. To insert a new element x , we compute
its key k = h(x ) using a single hash function h. If the bucket that
corresponds to that key is non-empty and contains a different value,
meaning a collision, we keep looking clockwise at the next buckets until
we find a free space where we can index the element x . Monitoring of
the load factor of the hash table can guarantee that we will definitely
find a free space at some point.
Similarly, when we want to look up some element x, we compute
its key k using the same hash function h and check the buckets
clockwise, starting at the preferable bucket with the key k = h(x),
until we find the wanted element x or the first empty bucket appears,
resulting in the decision that the element is not in the table.

Example 1.1: Linear probing


Consider a LinearProbing hash table of length m = 12 and a hash
function based on 32-bit MurmurHash3 that maps the universe to the range
{0, 1, . . . , m – 1}:

h(x ) := MurmurHash3(x ) mod m.

Suppose that we want to store different names of colors in the hash table,
starting from red. The value of the hash function for the element is

h = h(red) = 2352586584 mod 12 = 0.

Since the LinearProbing hash table is empty at the beginning, the bucket
with the key k = 0 contains no elements, therefore we just index the element
there:

[Figure: circular LinearProbing hash table with buckets 0 to 11; bucket 0 contains red.]

Next, we take the element green, whose hash value is

h = h(green) = 150831125 mod 12 = 5.

The key is k = 5, as this bucket is empty we again freely store the element.

[Figure: circular LinearProbing hash table with buckets 0 to 11; bucket 0 contains red and bucket 5 contains green.]

Now, consider the element white. Its hash value is

h = h(white) = 16728905 mod 12 = 5.

The preferable bucket for that element is the one with the key k = 5.
However, the bucket is already occupied by a different element, meaning
a collision has appeared. In this case, we apply the Linear probing
algorithm and try to find the next empty bucket going clockwise from
the preferable bucket position. Fortunately, the next bucket, under key
k = 6, is free and we store the element white there.

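[Figure: circular array of 12 buckets (keys 0 to 11); bucket 0 holds red, bucket 5 holds green, and white, which collides at bucket 5, is stored in bucket 6.]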

When we look up the element white in the LinearProbing hash table,
we first check its preferable bucket, with the key k = 5. Since that bucket
contains a value that differs from the element, we start checking buckets
in a clockwise direction, starting from the key k + 1 = 6. Fortunately,
the next bucket with the key k = 6 contains the wanted value and we can
conclude that the element is present in the hash table.
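
The arithmetic of Example 1.1 can be replayed with a short script. The sketch below does not compute MurmurHash3 itself; it takes the three 32-bit hash values quoted in the example as given, and the helper names preferred_bucket and insert are hypothetical.

```python
# 32-bit MurmurHash3 values as quoted in Example 1.1 (not recomputed here).
HASHES = {"red": 2352586584, "green": 150831125, "white": 16728905}
M = 12  # length of the hash table


def preferred_bucket(x: str) -> int:
    """h(x) := MurmurHash3(x) mod m, using the quoted hash values."""
    return HASHES[x] % M


def insert(table: list, x: str) -> int:
    """Linear probing: walk clockwise from the preferable bucket."""
    k = preferred_bucket(x)
    for step in range(M):
        i = (k + step) % M
        if table[i] is None or table[i] == x:
            table[i] = x
            return i
    raise RuntimeError("hash table is full")


table = [None] * M
positions = {name: insert(table, name) for name in ("red", "green", "white")}
print(positions)  # {'red': 0, 'green': 5, 'white': 6}
```

The element white has the same preferable bucket as green (key 5), so it ends up one step clockwise, in the bucket with the key 6, exactly as in the example.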

The algorithm requires O(1) expected time for each operation, as long as
the LinearProbing hash table is not full (the load factor is strictly
less than one). The longest probe sequence in Linear probing is of
expected length O(log n).

The Linear probing algorithm is very sensitive to the choice of the hash
function h because it relies on an ideally uniform distribution of keys.
Unfortunately, this is not achievable in practice, and the performance of
the algorithm degrades rapidly as the actual distribution diverges from
the uniform one. To address this problem, a variety of techniques for
additional randomization are widely used.

Cuckoo hashing

Another implementation of open addressing is Cuckoo hashing, introduced
by Rasmus Pagh and Flemming Friche Rodler in 2001 and published
in 2004 [Pa04]. The main idea of the algorithm is to use two hash
functions instead of one.
The Cuckoo hash table is an array of buckets, where instead of one
preferable bucket as in Linear probing and many other algorithms, each
element has two candidate buckets determined by two different hash
functions.
To index a new element x into the Cuckoo table, we compute keys
for two candidate buckets with the hash functions h1 and h2. If at least
one of those buckets is empty, we insert the element into that bucket.
Otherwise, we randomly choose one of those buckets and store element x
there, while moving the element from that bucket to its alternative
candidate bucket. We repeat this procedure until an empty bucket is
found, or until a maximum number of displacements is reached. If there
are no empty buckets, the hash table is considered full.

Although Cuckoo hashing may execute a sequence of displacements,
an insertion still completes in expected constant time O(1).

The lookup procedure is straightforward and can be done in constant
time. We simply need to determine the candidate buckets for the input
element by computing its hashes h1 and h2 and check if such an element is
present in one of those buckets. The deletion procedure can be performed
in a similar way.
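
The three Cuckoo operations can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class CuckooTable and its parameters are hypothetical, CRC32 and Adler-32 from the standard library stand in for h1 and h2, and the rollback on a failed insertion is our own addition so that the sketch never loses a stored element (a real table would typically be rebuilt with new hash functions instead).

```python
import random
import zlib
from typing import List, Optional


class CuckooTable:
    """Cuckoo hash table with two candidate buckets per element."""

    def __init__(self, m: int, max_displacements: int = 32,
                 seed: Optional[int] = None) -> None:
        self.m = m
        self.max_displacements = max_displacements
        self.buckets: List[Optional[str]] = [None] * m
        self.rng = random.Random(seed)

    # Assumption: CRC32 and Adler-32 are stand-ins for the two hash functions.
    def h1(self, x: str) -> int:
        return zlib.crc32(x.encode("utf-8")) % self.m

    def h2(self, x: str) -> int:
        return zlib.adler32(x.encode("utf-8")) % self.m

    def lookup(self, x: str) -> bool:
        """Check only the two candidate buckets."""
        return self.buckets[self.h1(x)] == x or self.buckets[self.h2(x)] == x

    def delete(self, x: str) -> bool:
        for k in (self.h1(x), self.h2(x)):
            if self.buckets[k] == x:
                self.buckets[k] = None
                return True
        return False

    def insert(self, x: str) -> bool:
        """Return True if x is stored, False if the table is considered full."""
        if self.lookup(x):
            return True
        for k in (self.h1(x), self.h2(x)):
            if self.buckets[k] is None:
                self.buckets[k] = x
                return True
        k = self.rng.choice((self.h1(x), self.h2(x)))  # random victim bucket
        moved = []  # (bucket, previous occupant), kept for rollback
        for _ in range(self.max_displacements):
            moved.append((k, self.buckets[k]))
            x, self.buckets[k] = self.buckets[k], x  # place x, pick up the evicted one
            k1, k2 = self.h1(x), self.h2(x)
            k = k2 if k == k1 else k1  # alternative bucket of the evicted element
            if self.buckets[k] is None:
                self.buckets[k] = x
                return True
        for k, previous in reversed(moved):  # undo all displacements
            self.buckets[k] = previous
        return False
```

A lookup touches at most two buckets regardless of how many displacements earlier insertions caused, which is the main attraction of the scheme.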

Example 1.2: Cuckoo hashing


Consider a Cuckoo hash table of length m = 12 with two 32-bit
hash functions MurmurHash3 and FNV1a that produce values in
the range {0, 1, . . . , m – 1}:

h1(x) := MurmurHash3(x) mod m,
h2(x) := FNV1a(x) mod m.

As in Example 1.1, we index color names in the hash table starting
with red. We obtain the keys of the candidate buckets by applying those
hash functions:

h1(red) = 2352586584 mod 12 = 0,
h2(red) = 1089765596 mod 12 = 8.

The Cuckoo hash table is empty, so we use one of the candidate buckets,
for instance, the bucket with the key k = h1(red) = 0 and index
the element.

[Table: buckets 0 to 11; bucket 0 holds red.]

Next, we index element black whose candidate buckets are h1(black) = 6
and h2(black) = 0. Since the bucket with the key k = 0 is occupied by
another element, we can only index it into the alternative bucket k = 6,
which is free.

[Table: buckets 0 to 11; bucket 0 holds red, bucket 6 holds black.]

There is a similar situation with the element silver with h1(silver) = 5
and h2(silver) = 0. We store this element in the bucket with the key
k = 5 since 0 is occupied.

[Table: buckets 0 to 11; bucket 0 holds red, bucket 5 holds silver, bucket 6 holds black.]



Now consider the element white. The hash values of this element are

h1(white) = 16728905 mod 12 = 5,
h2(white) = 3724674918 mod 12 = 6.

As we can see, both candidate buckets for this element are occupied, and we
have to perform the displacements according to the Cuckoo hashing scheme.
First, we randomly select one of the candidate buckets, let’s say the bucket
with the key k = 5, and put the element white into it. The element silver
from bucket 5 has to be relocated to its alternative bucket, which
is 0. As we can see, the bucket with the key 0 is not empty; therefore,
we store element silver there and move element red from that bucket to its
other candidate bucket. Fortunately, the alternative bucket with the key
8 for element red is free, and after storing it in that bucket, we finish
the insertion procedure.

[Table: buckets 0 to 11; bucket 0 holds silver (moved from bucket 5), bucket 5 holds white, bucket 6 holds black, bucket 8 holds red (moved from bucket 0).]

For instance, when we want to look up the element silver, we check only
its candidate buckets, which are 5 and 0, as we computed earlier. Since this
element is present in one of them, in the bucket with the key 0 in this case,
we conclude that the element silver is present in the Cuckoo hash table.
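
The chain of displacements in Example 1.2 can be replayed step by step. The sketch below uses the candidate bucket keys exactly as stated in the example instead of computing MurmurHash3 and FNV1a. One assumption differs from the algorithm as described: where the text selects the victim bucket randomly, this sketch always starts with the h1 bucket, which is the choice the example happens to make, so that the run is deterministic. The helper name insert is hypothetical.

```python
# Candidate buckets (h1, h2) for each element, as stated in Example 1.2.
CANDIDATES = {"red": (0, 8), "black": (6, 0), "silver": (5, 0), "white": (5, 6)}
M = 12  # length of the hash table


def insert(table: list, x: str, max_displacements: int = 10) -> None:
    k1, k2 = CANDIDATES[x]
    for k in (k1, k2):
        if table[k] is None:  # a free candidate bucket: no displacement needed
            table[k] = x
            return
    k = k1  # assumption: deterministic choice instead of a random one
    for _ in range(max_displacements):
        x, table[k] = table[k], x  # store x, pick up the evicted element
        k1, k2 = CANDIDATES[x]
        k = k2 if k == k1 else k1  # alternative bucket of the evicted element
        if table[k] is None:
            table[k] = x
            return
    raise RuntimeError("hash table is considered full")


table = [None] * M
for name in ("red", "black", "silver", "white"):
    insert(table, name)

print({i: v for i, v in enumerate(table) if v is not None})
# {0: 'silver', 5: 'white', 6: 'black', 8: 'red'}
```

Inserting white triggers two displacements: silver moves from bucket 5 to bucket 0, and red moves from bucket 0 to bucket 8, matching the final state of the example.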

Cuckoo hashing ensures high space occupancy but requires the length
of the hash table to be slightly larger than the space needed to keep
all elements. A modification of the Cuckoo hash scheme is used in
a probabilistic data structure called the Cuckoo filter, which we will
describe in detail in the next chapter.

Conclusion
In this chapter we gave an overview of hashing, its problems, and its
importance in data structures. We discussed cryptographic versus
non-cryptographic hash functions, reviewed a list of the functions that are
most used in practice, and learned about universal hashing, which is
very important theoretically. As an application of hash functions, we
have considered hash tables, which are simple data structures that map
keys to values and answer membership queries. We studied examples
of open addressing hash tables that we will use in the next chapters for
probabilistic data structures.
If you are interested in more information about the material covered
here, please take a look at the list of references that follows this chapter.
In the next chapter we will discuss the first probabilistic data
structures and study extensions of hash tables, called filters, that are
used to answer membership queries under requirements that are
common for Big Data applications, such as when storage is at
a premium and lookups must be as fast as possible.
Bibliography

[An12] Anderson, L., et al. (2012) “Comparing hash function algorithms
for the IPv6 flow label”, Computer Science Technical Reports, 2012.
[Ap11] Appleby, A. (2011) “MurmurHash”, sites.google.com,
https://sites.google.com/site/murmurhash/, Accessed Sept. 18,
2018.
[Ap16] Appleby, A. (2016) “SMHasher”, github.com,
https://github.com/aappleby/smhasher, Accessed Sept. 18, 2018.
[Be06] Bertoni, G., et al. (2006) “RadioGatún, a belt-and-mill hash
function”, Presented at the Second Cryptographic Hash Workshop,
Santa Barbara - August 24–25, 2006.
[Fo18] Fowler, G., et al. (2018) “The FNV Non-Cryptographic
Hash Algorithm”, IETF Internet-Draft. Version 15,
https://tools.ietf.org/html/draft-eastlake-fnv-15, Accessed Sept. 18,
2018.
[Fr84] Fredman, M. L., Komlós, J., and Szemerédi, E. (1984) “Storing
a Sparse Table with O(1) Worst Case Access Time”, Journal of
the whole arrived at the crest in a mass. The enemy, who had been
waiting behind the sky-line, suddenly appeared at the critical
moment, and opened such a heavy and effective fire that the 120th
crumpled up and rolled down hill. The Allies, contented with the
result of their salvo, did not pursue, but stepped back behind the
crest. Gauthier rallied the defeated regiment half-way down the
slope, and brought up the 122nd to assist: he then repeated the
assault over the same ground, and with better success, for the 120th
reached the crest, and broke up a Portuguese regiment (it was really
two Spanish battalions), and came to a deadly musketry contest with
the English regiment posted on the highest ground. There was a
fusillade almost muzzle to muzzle, but the French regiment finally
gave way ‘whether from the disadvantage of the position or from
over-fatigue after twice climbing such steep slopes’. The 122nd,
coming up just too late, then delivered a similar attack, and suffered
a similar repulse. Both regiments were then rallied half-way down the
slope, and kept up from thence a scattering fire, until Soult’s orders
came to withdraw all the line, in consequence of the defeat of
Clausel’s divisions. This exactly tallies with the narratives of the
British officers of the 40th, who also speak of three attacks, the first
easily foiled (a mere rush of skirmishers), the second very serious, and
rendered almost fatal by the incomprehensible panic of the
Spaniards, who, after behaving very well both on the previous day
and during the first attack, suddenly broke and fled—‘all attempts to
rally them being ineffectual’—over the whole face of the hill behind.
The rout was only stopped by a desperate charge against the front of
the leading French battalion, which was successful contrary to every
expectation and probability. For the 40th, who had suffered
considerable loss in the combat of Linzoain two days before, had only
10 officers and 400 men in line, and were attacking a column of
nearly 2,000 men. This column had been cast down hill, and the men
of the 40th had barely been re-formed—they showed a great wish to
pursue and came back reluctantly—when the third French attack, that
of the 122nd, was delivered with resolution and steadiness but
without success. Even then the fight was not over, for after an
interval the enemy came up the hill again, in disorder but with drums
beating and eagles carried to the front, the officers making incredible
efforts to push the men forward. They did not, however, get to the
crest, but, after rolling up to within twenty-five yards of it, stood still
under the heavy musketry fire, and then fell back, completely ‘fought
out[952].’
Reille’s report declares that Gauthier’s brigade only lost ‘50 killed
and several hundred wounded’—say 350 in all—in this combat. The
British 40th had 129 casualties—the Spanish battalions on their flanks
192. If a brigade of five battalions and 3,300 bayonets allowed itself
to be stopped by a single battalion in the last phases of the combat,
after suffering a loss of only one man in nine, there must have been
something wrong with it, beside bad guidance. One would suspect
that Reille is understating casualties in the most reckless fashion.
While this fight was going on by the banks of the Arga, there was
another in progress on the other flank of the hill of Oricain, on the
banks of the Ulzama. The 6th Division had been intermittently
engaged with Conroux’s troops during the whole time of the French
assaults on the heights. When it was seen that Clausel’s men were
‘fought out’ and falling back, Pack made an effort to utilize the
moment of the French débâcle by capturing Sorauren. He brought up
his divisional guns (Brandreth’s battery) to a position close to the
village, and sent forward the light companies of the two British
brigades to press in upon its south side, while Madden’s Portuguese,
on the other bank of the river, tried to get into it from the rear on the
north side. The attack failed, indeed was never pushed home,
Sorauren being too strongly held. The guns had to be drawn back,
many horses and some gunners having been shot down. Pack himself
was severely wounded in the head, and Madden’s brigade lost 300
men. Wellington sent down from the hill to order the attack to cease,
for even if Sorauren had been taken, the rest of his front-line troops
were in no condition to improve the advantage[953].
While this was going on upon the extreme left, an almost
bloodless demonstration was in progress on the extreme right, where
Foy, as on the previous day, had been ordered to keep Picton
employed. He showed his infantry in front of Alzuza, and pushed
forward the considerable body of light cavalry which had been lent
him to his left flank by Elcano, till their skirmishers had got into
collision with those of the British Hussar Brigade, along the river
Egues. There was much tiraillade but few casualties on either side;
the 10th Hussars were driven across the river, but were replaced by
the 18th, who kept the French in check for the rest of the day. Pierre
Soult showed no intention of closing, and Stapleton Cotton had been
warned by Wellington that his four brigades were intended for flank-
protection not for taking the offensive. The afternoon, therefore,
passed away in noisy but almost harmless bickering between lines of
vedettes. Foy in his report expressed himself contented with having
kept a larger force than his own occupied all day.
Thus ended this second Bussaco, a repetition in its main lines of
the first, and a justification of the central theory of Wellington’s
tactical system. Once more the line, in a well-chosen position, and
with proper precautions taken, had proved itself able to defeat the
column. The French made a most gallant attempt to storm a position
held by much inferior numbers, but extremely strong. They were
beaten partly—as all the critics insisted—by the fact that men who
have just scaled a hill of 1,000 feet are inevitably exhausted at the
moment when they reach its crest, but much more by the superiority
of fire of the line over the column when matters came to the
musketry duel. The French generals had learnt one thing at least
from previous experience—they tried to sheathe and screen the
column by exceptionally heavy skirmishing lines, but even so they
could not achieve their purpose. The only risk in Wellington’s game
was that the enemy’s numbers might be too overwhelming—if, for
example, the 6th Division had not been up in time on July 28th, and
Clausel had been able to put in Conroux’s division (7,000 extra
bayonets) along with the rest, operating against Ross’s extreme flank,
it is not certain that the heights of Oricain could have been held. But
Wellington only offered battle, as he did, because he was relying on
the arrival of the 6th Division. If he had known on the night of the
27th that it could not possibly come up in time, he would probably
have accepted the unsatisfactory alternative policy of which he
speaks in several dispatches, that of raising the siege of Pampeluna
and falling back on Yrurzun. ‘I hope we should in any case have
beaten the French at last, but it must have been further back
certainly, and probably on the Tolosa road[954].’
Soult is said to have felt from the 26th onward—his original
project of a surprise followed by a very rapid advance having failed
—‘une véritable conviction de non-réussite[955].’ We could well
understand this if he really believed—as he wrote to Clarke on the
evening after the battle—that Wellington had 50,000 men already in
line. But this was an ex post facto statement, intended to explain his
defeat to the Minister; and we may be justified in thinking that if he
had really estimated the hostile army at any such a figure, he would
never have attacked. His long delay in bringing on the action may be
explained by the fact that Reille’s divisions were not on the field
before evening on the 27th, and that on the 28th it took many hours
to rearrange the troops on a terrain destitute of any roads, rather
than by a fear of a defeat by superior numbers. It might have been
supposed on the 27th that he was waiting for the possible arrival of
D’Erlon, but on the morning of the 28th he had heard overnight from
his lieutenant, and knew that he could not reach the battle-front on
that day. In his self-exculpatory dispatch to Clarke, Soult complains
that D’Erlon told him that he was blocked by British divisions at
Irurita, ‘but I have no doubt that these are the same troops which fell
upon General Clausel’s flank this afternoon[956].’ In this he was wrong
—D’Erlon was speaking of Hill’s and Dalhousie’s divisions, while it was
Pack (whom D’Erlon had never seen) that rendered a French success
at Sorauren impossible.
The loss of the Allied Army was 2,652—of whom 1,358 were
British, 1,102 Portuguese, and 192 Spaniards. The heaviest casualties
fell on the 3/27th in Anson’s brigade, who first repulsed Maucune,
and then swept away Vandermaesen, and the 1/7th in Ross’s brigade,
the regiment whose flank was exposed by the breaking of the 10th
Portuguese—which corps also, as was natural, was very hard hit. But
all the front-line battalions, both British and Portuguese, had
considerable losses. Soult (as at Albuera) made a most mendacious
understatement of his casualties, putting them at 1,800 only. As
Clausel alone had reported about 2,000, Maucune about 700, and
Lamartinière at least 350, it is certain that the Marshal’s total loss
was over 3,000—how much over it is impossible to say, since the only
accessible regimental casualty-lists include all men killed, wounded,
or missing between July 25th and August 2nd. But the chances are
that 4,000 would have been nearer the mark than 3,000[957].
SECTION XXXVIII: CHAPTER V
SOULT’S RETREAT, JULY 30-31.
THE SECOND BATTLE OF SORAUREN

While the battle of July 28th was being fought, the outlying
divisions of both Soult’s and Wellington’s armies were at last
beginning to draw in towards the main bodies.
Hill, as we have already seen, had received the orders written by
Wellington on Sorauren bridge at 11 a.m. by the afternoon of the
same day, and had started off at once with his whole force—the
three 2nd Division brigades, Silveira’s one brigade, and Barnes’s
three battalions of the 7th Division. His directions were to endeavour
to cross the Puerto de Velate that night, so as to sleep at Lanz, the
first village on the south side of the pass. He was to leave a
detachment at the head of the defile, to check D’Erlon’s probable
movement of pursuit. The supplementary order, issued at 4 p.m.
from the heights of Oricain, directed Hill to march from the place
where his corps should encamp on the night of the 27th (Lanz as
was hoped) to Lizaso, abandoning the high road for the side road
Olague-Lizaso, since the former was known to be cut by the French
at Sorauren. If the men were not over-fatigued when they reached
Lizaso, Hill must try to bring them on farther, to Ollocarizqueta on
the flank of the Sorauren position, where the 6th Division would
have preceded them.
Similarly Dalhousie with the 7th Division (minus the three
battalions with Hill) was to march that same evening from
Santesteban over the Puerto de Arraiz on to Lizaso, to sleep there
and to march on Ollocarizqueta, like Hill, on the morning of the 28th,
if the state of the troops allowed it. All the baggage, sick, stores,
and other impedimenta from the Bastan were also directed on
Lizaso, but they were to go through it westward and turn off to
Yrurzun, not to follow the fighting force to Ollocarizqueta.
None of these directions worked out as was desired, the main
hindering cause being the fearful thunderstorm already recorded,
which raged during the twilight hours of the evening of the 27th. Hill
had started from Irurita, as directed, keeping as a rearguard
Ashworth’s Portuguese, who were intended to hold the Puerto de
Velate when the rest of the column should have crossed it. He was
nearing the watershed, in the roughest part of the road, where it
has many precipitous slopes on the left hand, when the storm came
down, completely blotting out the evening light with a deluge of
rain, and almost sweeping men off their feet. One of Barnes’s
officers describes the scene as follows: ‘So entangled were we
among carts, horses, vicious kicking mules, baggage, and broken-
down artillery, which lined the road, that we could not extricate
ourselves. Some lighted sticks and candles only added to the
confusion, for we were not able to see one yard beyond the lights,
owing to the thick haze, which seemed to render darkness still more
dark. In this bewildered state many who could not stand were
obliged from fatigue to sit down in the mire: to attempt going on
was impossible, except by climbing over the different vehicles that
blocked the road. In this miserable plight, I seated myself against a
tree, when weariness caused me, even amidst this bustle, mud, and
riot, to fall fast asleep[958].’ All sorts of disasters happened: one of
Tulloh’s 9-pounders went over the precipice with the shaft animals
drawn down with it, when the side of the roadway crumbled in[959].
Ross’s battery lost another gun in a similar way, owing to the sudden
breaking of a wheel, and many carts and mules blundered over the
edge. The only thing that could be done was to stick to the track, sit
down, and wait for daylight, which was fortunately early in July.
When it came, the drenched and miserable column picked itself
up from the mire, and straggled down the defile of the Velate,
passing Lanz and turning off at Olague towards Lizaso, as ordered.
Troops and baggage were coming in all day to this small and
overcrowded mountain village, in very sorry plight. It was of course
quite impossible for them to move a mile farther on the 28th, and
Hill had to write to Wellington that he could only hope to move his
four brigades on the early morning of the 29th.[960]
Lord Dalhousie had fared, as it seems, a little better in crossing
the Puerto de Arraiz; he had a less distance to cover, but the
dispatch from Sorauren bridge only reached him at 7 p.m. when the
rain was beginning. With laudable perseverance he kept marching all
night, and reached Lizaso at 12 noon on the 28th, with all his men,
except a battalion of Caçadores left behind to watch the pass. The
condition of the division was so far better than that of Hill’s column,
that Dalhousie wrote to Wellington that he could march again in the
late afternoon, after six hours’ rest, and would be at or near
Ollocarizqueta by dawn on the 29th[961], though two successive night
marches would have made the men very weary. This the division
accomplished, much to its credit, and reached the appointed
destination complete, for Hill had returned to it the three battalions
which had saved the day at Maya on the 25th.
Lizaso on the afternoon of July 28th was a dismal sight, crammed
with the drenched and worn-out men of seven brigades, who had
just finished a terrible night march, with large parties of the Maya
wounded, much baggage and transport with terrified muleteers in
charge, and a horde of the peasants of the Bastan, who had loaded
their more precious possessions on ox-wagons, and started off to
escape the French. It took hours to sort out the impedimenta and
start them on the Yrurzun road. There was a general feeling of
disaster in the air, mainly owing to physical exhaustion, which even
the report of the victory of Sorauren arriving in the evening could
not exorcise. Rumours were afloat that Wellington was about to
retire again, despite of the successes of the afternoon. However, the
day being fine, the men were able to cook and to dry themselves,
and the 7th Division duly set out for another night march.
It was fortunate that the retiring columns were not troubled by
any pursuit either on the 27th or the 28th. The storm which had so
maltreated Hill’s column seems to have kept the French from
discovering its departure. Darmagnac had moved forward on the
27th from Elizondo to ground facing Hill’s position at Irurita, but had
not attempted to attack it, D’Erlon having decided that it was ‘très
forte par elle-même, et inattaquable, étant gardée par autant de
troupes.’ So badly was touch kept owing to the rainy evening, that
Hill got away unobserved, and it was only on the following morning
that Darmagnac reported that he had disappeared. This news at last
inspired D’Erlon with a desire to push forward, and on the morning
of the 28th Abbé took over the vanguard, passed Almandoz, and
crossed the Velate, with Darmagnac following in his rear, while
Maransin—kept back so long at Maya—came down in the wake of
the other divisions to Elizondo. The head of Abbé’s division passed
the Puerto de Velate, pushing before it Ashworth’s Portuguese, who
had been left as a detaining force. Abbé reported that the pass was
full of wrecked baggage, and that he had seen guns shattered at the
foot of precipices. D’Erlon says that 400 British stragglers were
gleaned by the way—no doubt Maya wounded and footsore men. On
the night of the 28th Abbé and Darmagnac bivouacked at and about
Lanz, Maransin somewhere by Irurita. He left behind one battalion at
Elizondo[962], to pick up and escort the convoy of food expected from
Urdax and Ainhoue.
The light cavalry[963] who accompanied Abbé duly reported that
Hill had not followed the great road farther than Olague, but had
turned off along the valley of the upper Ulzama to Lizaso; his fires
were noted at evening all along the edge of the woods near that
village. One patrol of chasseurs, pushing down the main road to
Ostiz, sighted a similar exploring party of Ismert’s dragoons coming
from Sorauren, but could not get in touch with them, as the
dragoons took them for enemies and decamped. However, the two
bodies of French cavalry met again and recognized each other at
dawn on the 29th, so that free communication between the two
parts of Soult’s army was now established.
D’Erlon has been blamed by every critic for his slow advance
between the 26th and the 29th. He was quite aware that it was
slow, and frankly stated in his reports that he did not attack Hill
because he could not have turned him out of the position of Irurita,
not having a sufficient superiority of numbers to cancel Hill’s
advantage of position. This is quite an arguable thesis—D’Erlon had,
deducting Maya losses, some 18,000 men: Hill counting in Dalhousie
who had been placed on his flank by Wellington for the purpose of
helping him if he were pressed, had (also deducting Maya losses)
four British and three Portuguese brigades, with a strength of about
14,000. If Cameron’s and Pringle’s regiments had been terribly cut
up at Maya, Darmagnac’s and Maransin’s had suffered a perceptibly
larger loss, though one distributed over more units. D’Erlon was
under the impression that Hill had three divisions with him, having
received false information to that effect. If this had been true, an
assault on the position of Irurita would have been very reckless
policy. It was not true—only two divisions and one extra Portuguese
brigade being in line. Yet still the event of the combat of Beunza,
only three days later, when D’Erlon’s three divisions attacked Hill’s
own four brigades—no other allied troops being present—and were
held in check for the better part of a day, suggests that D’Erlon may
have had good justification for not taking the offensive on the 26th
or 27th, while the 7th Division was at Hill’s disposition. The most odd
part of his tactics seems to have been the way in which he kept
Maransin back at Maya and Elizondo so long: this was apparently the
result of fear that Graham might have something to say in the
contest—an unjustifiable fear as we know now, but D’Erlon cannot
have been so certain as we are! The one criticism on the French
general’s conduct which does seem to admit of no adequate reply, is
that in his original orders from Soult, which laid down the scheme of
the whole campaign, he was certainly directed to get into touch with
the main army as soon as possible, though he was also directed to
seize the pass of Maya and to pursue Hill by Elizondo and the Puerto
de Velate. If he drew the conclusion on the 26th that he could not
hope to dislodge Hill from the Irurita position, it was probably his
duty to march eastward by the Col de Berderis or the Col d’Ispegui
and join the main body by the Alduides. It is more than doubtful
whether, considering the character of the country and the tracks, he
could have arrived in time for the battle of Sorauren on the 28th.
And if Hill had seen him disappear eastward, he could have marched
to join Wellington by a shorter and much better road, and could
certainly have been in touch with his chief before D’Erlon was in
touch with Soult. Nevertheless, it cannot be said that the French
general obeyed the order which told him that ‘il ne perdra pas de
vue qu’il doit chercher à se réunir le plutôt possible au reste de
l’armée[964].’ And orders ought to be obeyed—however difficult they
may appear.
At dawn on the 29th, therefore, Soult’s troops were in their
position of the preceding day, and D’Erlon’s leading division was at
Lanz, requiring only a short march to join the main body. But
Wellington was in a better position, since he had already been joined
by the 6th Division, and would be joined in a few hours by the 7th
Division, which, marching all night, had reached Ollocarizqueta. Hill,
with his four original brigades, was at Lizaso, as near to Wellington
as D’Erlon was to Soult. The only other troops which could have
been drawn in, if Wellington had so originally intended, were the
brigades of the Light Division, for which (as we shall see) he made
another disposition. But omitting this unit, and supposing that the
other troops on both sides simply marched in to join their main
bodies on the 29th, it was clear that by night Wellington would have
a numerical equality with his adversary, and this would make any
further attempts to relieve Pampeluna impossible on the part of
Soult.
This was the Marshal’s conviction, and he even overrated the
odds set against him, if (as he said in his dispatch to Clarke) he
estimated his adversary’s force on the evening of the 28th at 50,000
men, and thought that several other divisions would join him ere
long. In face of such conditions, what was to be the next move?
Obviously a cautious general would have decided that his bolt had
been shot, and that he had failed. He had a safe retreat before him
to Roncesvalles, by a road which would take all his cavalry and guns;
and D’Erlon had an equally secure retreat either by Elizondo and
Maya on a good road, or by the Alduides, if absolute security were
preferred to convenience in marching. The country was such that
strong rearguards could have held off any pursuit. And this may
have been the Marshal’s first intention, for in his letter to Clarke,
written on the evening after the battle, before he had any
knowledge of D’Erlon’s approach, he said that he was sending back
his artillery and dragoons on Roncesvalles, since it was impossible to
use them in Navarre, and that he was dispatching them to the
Bidassoa ‘where he could make better use of them in new
operations, which he was about to undertake’. He was intending to
remain a short time in his present position with the infantry, to see
what the enemy would do. No news had been received of D’Erlon
since the 27th, when he was still at Elizondo, declaring that he could
not move because he was blocked by a large hostile force—‘the
same force, as I believe, which fell on Clausel’s flank to-day.’ Further
comment the Marshal evidently judged superfluous[965]. The column
of guns and baggage actually marched off on the night of the 28th-
29th.
But early on the morning of the 29th D’Erlon’s cavalry was met,
and the news arrived that he was at Lanz, and marching on Ostiz,
where he would arrive at midday, and would be only five miles from
Sorauren. This seems to have changed the Marshal’s outlook on the
situation, and by the afternoon he had sketched out a wholly
different plan of campaign, and one of the utmost hazard. As the
critic quoted a few pages back observed, he recovered his
confidence when once he was twenty-four hours away from a
defeat, and his strategical conceptions were sometimes risky in the
extreme[966].
The new plan involved a complete change of direction. Hitherto
Soult had been aiming at Pampeluna, and D’Erlon was, so to speak,
his rearguard. Now he avowed another objective—the cutting of the
road between Pampeluna and Tolosa, with the purpose of throwing
himself between Wellington and Graham and forcing the latter to
raise the siege of St. Sebastian[967]. He had now, as he explained,
attracted to the extreme south nearly the whole of the Anglo-
Portuguese divisions: there was practically a gap between
Wellington, with the troops about to join him, and the comparatively
small force left on the Bidassoa. He would turn the British left, by
using D’Erlon’s corps as his vanguard and cutting in north of Lizaso,
making for Hernani or Tolosa, whichever might prove the more easy
goal. He hoped that Villatte and the reserves on the Bidassoa might
already be at Hernani, for D’Erlon had passed on to him an
untrustworthy report that an offensive had begun in that direction,
as his original orders had directed. The manœuvre would be so
unexpected that he ought to gain a full day on Wellington—D’Erlon
was within striking distance of Hill, and should be able to thrust him
out of the way, before the accumulation of British troops about
Sorauren, Villaba, and Huarte could come up. There was obviously
one difficult point in the plan: D’Erlon could get at Hill easily enough.
But was it certain that the rest of the troops, now in such very close
touch with Wellington’s main body—separated from it on one front of
two miles by no more than a narrow ravine—would be able to
disentangle themselves without risk, and to follow D’Erlon up the
valley of the Ulzama? When armies are so near that either can bring
on a general action by advancing half a mile, it is not easy for one of
them to withdraw, without exposing at least its rearguard or
covering troops to the danger of annihilation.
Soult took this risk, whether underrating Wellington’s initiative, or
overrating the manœuvring power of his own officers and men. It
was a gross tactical error, and he was to pay dearly for it on the
30th. In fact, having obliged his enemy with a second Bussaco on
the 28th, he presented him with the opportunity of a second
Salamanca two days after. For in its essence the widespread battle of
the 30th, which extended over ten miles from D’Erlon’s attack on Hill
near Lizaso to Wellington’s counter-attack on Reille near Sorauren,
was, like Salamanca, the sudden descent of an army in position
upon an enemy who has unwisely committed himself to a march
right across its front.
Soult made the most elaborate plans for his manœuvre—D’Erlon
was to move from Ostiz and Lanz against Hill: he was lent the whole
of Treillard’s dragoons, to give him plenty of cavalry for
reconnaissance purposes; Clausel was to march Taupin’s and
Vandermaesen’s troops behind Conroux, who was to hold Sorauren
village till they had passed him. Conroux would then be relieved by
Maucune’s division from the high ground opposite the Col, and when
it had taken over Sorauren from him, would follow the rest of
Clausel’s troops up the high road. Maucune’s vacant position would
be handed over to Foy and Lamartinière. The former was to
evacuate Alzuza and the left bank of the Arga, where he was not
needed, as the column of guns and baggage, with its escort of
cavalry, had got far enough away on the road to Zubiri and
Roncesvalles to be out of danger. He was then to go up on to the
heights where Maucune had been posted during the recent battle, as
was also Lamartinière. The latter was to leave one battalion, along
with the corps-cavalry (13th Chasseurs) of Reille’s wing, on the high
road north of Zabaldica, as an extra precaution to guard against any
attempt by the enemy to raid the retreating column of
impedimenta[968].
Finally, some orders impossible to execute were dictated—viz.
that all these movements were to be carried out in the night, and
that both Clausel and Reille were to be careful that the British should
get no idea that any change in the position of the army was taking
place; one general is told that ‘il opérera son mouvement de manière
à ce que l’ennemi ne puisse aucunement l’apercevoir,’ the other that
‘il disposera ses troupes de manière que l’ennemi ne puisse
soupçonner qu’il y a de changement ni de diminution.’ Reille was to
hold the line Sorauren-Zabaldica for the whole day of the 30th, and
then to follow Clausel with absolute secrecy and silence. Now,
unfortunately, one cannot move 35,000 men on pathless hillsides
and among woods and ravines in the dark, without many units losing
their way, and much noise being made. And when one is in touch
with a watchful enemy at a distance of half a mile only, one cannot
prevent him from seeing and hearing that changes are going on. The
orders were impracticable.
Such were Soult’s plans for the 30th. Wellington’s counter-plans
do not, on the 29th, show any signs of an assumption of the
offensive as yet, though it was certainly in his mind. They are
entirely precautionary, all of them being intended to guard against
the next possible move of the opponent. The troops which had
suffered severely in the battle were drawn back into second line,
Byng’s and Stubbs’s brigades taking over the ground held by Ross
and Campbell. The 6th Division—now under Pakenham, Pack having
been severely wounded on the previous day—occupied the heights
north-west of Sorauren, continuing the line held on the 28th to the
left. Both Cole and Pakenham were told to get their guns up, if
possible. And when Lord Dalhousie came up from Lizaso after his
night march, his division also was placed to prolong the allied left—
two brigades to the left-rear of the 6th Division, hidden behind a
high ridge, the third several miles westward over a hill which
commanded the by-road Ostiz-Marcalain—a possible route for a
French turning movement[969].
Originally Wellington had supposed that Hill would reach Lizaso
earlier than Dalhousie, and had intended that the 2nd Division
should come down towards Marcalain and Pampeluna, while the 7th
remained at Lizaso to protect the junction of roads. But the storm,
which smote Hill worse than it smote Dalhousie, had settled matters
otherwise. It was the latter, not the former, who turned up to join
hands with the main army. Accepting the change, Wellington
directed Hill to select a good fighting position by Lizaso, in which he
should place the two English brigades of the 2nd Division and
Ashworth, while Da Costa’s Portuguese were to move to
Marcalain[970], close to the rear brigade which the 7th Division had
left in the neighbourhood. Thus something like a covering line was
provided for any move which D’Erlon might make against the British
left. Wellington was feeling very jealous that day of attempts to turn
this flank, and took one more precaution which (as matters were to
turn out) he was much to regret on the 31st. Orders were sent to
Charles Alten to move the Light Division from Zubieta towards the
high road that goes from Tolosa to Yrurzun via Lecumberri. In
ignorance of the division’s exact position, the dispatch directed Alten
to come down so far as might seem best—Yrurzun being named as
the farthest point which should be taken into consideration[971]. This
caused Alten to move, by a very toilsome night march, from Zubieta,
where he would have been very useful later on, to Lecumberri,
where (as it chanced) he was not needed. But this was a hazard of
war—it was impossible to guess on the 29th the unlikely place where
Soult’s army was about to be on the 31st. A further movement towards
the Yrurzun road was that of Fane’s cavalry brigade, newly arrived from
the side of Aragon, which was directed to Berrioplano, behind Pampeluna
on the Vittoria road, with orders to get into touch with Lizaso,
Yrurzun, and Lecumberri.
Obviously all these arrangements were defensive, and
contemplated three possible moves on the part of the enemy—(1) a
renewal of the battle of the 28th at and north of Sorauren, with
which the 6th and 7th Divisions could deal; (2) an attempt to turn
the allied flank on the short circuit Ostiz-Marcalain; (3) a similar
attempt by a larger circuit via Lizaso, which would enable the enemy
to get on to the roads Lizaso-Yrurzun, and Lizaso-Lecumberri, and so
cut the communication between Wellington and Graham. The third
was the correct reading of Soult’s intentions. To foil it Hill was in
position at Lizaso, with the power to call in first Da Costa, then the
7th Division, and much later the Light Division, whose whereabouts
was uncertain, and whose arrival must obviously come very late. But
if the enemy should attack Hill with anything more than the body of
troops which had been seen at Ostiz and Lanz (D’Erlon’s corps),
Wellington intended to fall upon their main corps, so soon as it
began to show signs of detaching reinforcements to join the turning
or enveloping column.
Soult’s manœuvre had duly begun on the night of the 28th-29th
by the retreat of the artillery, the wounded, the train and heavy
baggage towards Roncesvalles, under the escort of the dragoon
regiments of Pierre Soult’s division, whose chasseur regiments,
however, remained with Reille as ‘corps cavalry’. On the news of this
move going round, a general impression prevailed that the whole
army was about to retire by the road on which it had come up. This
seemed all the more likely because the last of the rations brought
from St. Jean-Pied-de-Port had been distributed on the battle-
morning, and the troops had been told on the 29th to make raids for
food on all the mountain villages on their flanks and rear. They had
found little save wine—which was more a snare than a help. The
perspicacious Foy noted in his diary his impression that Soult’s new
move was not made with any real hope of relieving St. Sebastian or
cutting the Allies’ communication: it was in essence a retreat; but, to
save his face, the Marshal was trying to give it the appearance of a
strategic manœuvre[972]. This was a very legitimate deduction—it
certainly seemed unlikely that an army short of food, and almost
equally short of munitions, could be asked to conduct a long
campaign in a region where it was notorious that it could not hope
to live by requisitions, and would find communication with its base
almost impossible. It was true that convoys were coming up behind
D’Erlon—one was due at Elizondo—but roads were bad and
appointments hard to keep. There were orders sent to Bayonne on
the 28th for the start of another—but how many days would it take
before the army got the benefit?—a week at least. Men and officers
marched off on the new adventure grumbling and with stomachs ill-
filled.
The result of Soult’s orders was to produce two separate actions
on the 30th—one between D’Erlon and Hill behind Lizaso, in which
the French gained a tactical advantage of no great importance, the
other on the heights along the Ulzama between Wellington’s main
body and Clausel and Reille, in which the French rear divisions were
so routed and dispersed that Soult had to throw up all his ambitious
plans, and rush home for safety as fast as was possible.
The more important action, generally called the Second Battle of
Sorauren, may be dealt with first.
At midnight Clausel’s two divisions on the heights, separated
from Cole’s position by the great ravine, moved off, leaving their fires
burning. By dawn they had safely arrived on the high road between
Sorauren and Ostiz, and were ready to move on towards D’Erlon
when the third division—Conroux’s—should have been relieved at
Sorauren village by Maucune. But this had not yet been
accomplished, even by five in the morning: for Maucune’s troops
having woods and pathless slopes to cross, in great part lost their
way, and were straggling in small bands over the hills during the
night. They had only two miles to go in many cases, yet when the
light came some of them were still far from their destination and in
quite unexpected places. Conroux had disentangled one brigade
from Sorauren, while the other was still holding the outposts, and
Maucune was just beginning to file some of his men behind the
barricades and defences, when the whole of the village received a
salvo of shells, which brought death and confusion everywhere.
During the night Pakenham and Cole, obeying Wellington’s orders,
had succeeded in getting some guns up to the crest of the heights—
the 6th Division had hauled six guns up the hill on its left: they were
now mounted in front of Madden’s Portuguese, only 500 yards from
their mark[973]. Of the 4th Division battery[974] two guns and a
howitzer had been dragged with immense toil to the neighbourhood
of the chapel of San Salvador, overlooking Sorauren, the other three
guns to a point farther east, opposite the front from which
Vandermaesen had attacked on the 28th.
This was apparently the commencement of the action, but it
soon started on several other points. Foy had evacuated Alzuza at
midnight, but having rough country-paths in a steep hillside as his
only guides in the darkness, had not succeeded in massing his
division at Iroz till 5 a.m. He then mounted the heights in his
immediate front, pushing before him (as he says) fractions of
Maucune’s division which had lost their way. His long column had got
as far as the Col and the ground west of it, when—full daylight now
prevailing—he began to be shelled by guns from the opposite
heights—obviously the other half of Sympher’s battery[975].
Meanwhile Lamartinière’s division (minus the one battalion of the
122nd left to watch the Roncesvalles road) had moved very little in
the night, having only drawn itself up from Zabaldica to the heights
immediately to the left of the Col, where it lay in two lines of
brigades, the front one facing the Spaniards’ Hill, the rear one a few
hundred yards back. Foy, coming from Iroz, had marched past
Lamartinière’s rear, covered by him till he got west of the Col. This
division was not shelled, as Conroux, Maucune, and Foy had all
been, but noted with disquiet that British troops were streaming up
from Huarte on the Roncesvalles road, with the obvious intention of
turning its left, and getting possession of the main route to France.
Wellington, as it chanced, was in the most perfect condition to
take advantage of the mistake which Soult had made in planning to
withdraw his troops by a march right across the front of the allied
army in position. The night-movements of the French had been
heard, and under the idea that they might portend a new attack, all
the divisions had been put under arms an hour before dawn. The
guns, too, were in position, only waiting for their mark to become
visible—it had been intended to shell the French out of Sorauren in
any case that morning. Wellington had risen early as usual, and was
on the look-out place on the Oricain heights which he used as his
post on the 28th and 29th. When the panorama on the opposite
mountain became visible to him, he had only to send orders for a
general frontal attack, for which the troops were perfectly placed.
Accordingly, the 6th Division attacked Sorauren village at once,
while the troops on the Oricain heights descended, a little later, in
two lines—the front one formed of Byng’s brigade, Stubbs’s
Portuguese, and the 40th and Provisional Battalion of Anson’s—while
Ross’s brigade and Anson’s two other battalions (the troops which
had taken the worst knocks on the 28th) formed the reserve line.
Preceded by their skirmishers, Byng’s battalions made for Sorauren,
Cole’s marched straight for Foy’s troops on the opposite hills[976].
Farther to the right Picton had discovered already that there was no
longer any French force opposite him, and had got his division
assembled for an advance along the Roncesvalles road. He would
seem to have been inclined to go a little farther than Wellington
thought prudent, as there was a dispatch sent to him at 8 a.m.
bidding him to advance no farther, until it was certain that all was
clear on the left, though he might push forward light troops on the
heights east of the Arga river. He did not therefore get into touch
with Lamartinière on the mountain above Zabaldica for some time
after Foy and Maucune had been attacked.
Meanwhile it was not only to the old fighting-ground of the 28th
that the attack was confined. Wellington ordered Lord Dalhousie to
emerge from the shelter of the hills beyond the left of the 6th
Division, and to assail the troops below him in the Ulzama valley—
that is Taupin’s and Vandermaesen’s divisions, halted by Clausel
some way short of Ostiz, when the sound of firing began near
Sorauren.
The French units which were in the greatest danger were the
three divisions of Conroux, Maucune, and Foy, all suddenly caught
under the artillery fire of an enemy whom they had not believed to
possess any guns in line. Troops marching in column are very
helpless when saluted in this fashion. Foy writes ‘we had not been
intending to fight, and suddenly we found ourselves massed under
the fire of the enemy’s cannon. We were forced to go up the
mountain side to get out of range; we should have to retreat, and
we already saw that we should be turned on both flanks by the two
valleys on our left and right.... General Reille only sent part of my
division up on to the high crests after its masses had been well
played upon by an enemy whose artillery fire is most accurate[977].’
Now Foy could, after all, get out of range by going up hill—but
Maucune and Conroux were in and about a village which it was their
duty to hold, since they had to cover the retreat of Foy and
Lamartinière across their rear. If Sorauren were lost, Reille’s two left
divisions would be driven away from the main army—perhaps even
cut off. Yet it was a hard business to hold a village under a close-
range artillery fire from commanding ground, which enfiladed it on
both sides, when the enemy’s infantry intervened. Pakenham sent
Madden’s Portuguese round the village on the north—where they
drove in one regiment[978] which Conroux had thrown out as a flank
guard on the west bank of the Ulzama, and then proceeded to push
round the rear of the place. At the same time the left wing of the
troops which had descended from the chapel of San Salvador
(Byng’s brigade) began to envelop Sorauren on the eastern flank.
And frontally it was attacked by the light companies of Lambert’s
brigade of the 6th Division.
Conroux’s first brigade (Rey’s) had succeeded in getting away to
the north of the village before it was completely surrounded, and,
after bickerings with Madden’s Portuguese and other 6th Division
troops, finally straggled across the hills to Ostiz. His other brigade,
however, and all Maucune’s division were very nearly exterminated.
They made an obstinate defence of Sorauren for nearly two hours,
till the place was entirely encompassed: Maucune says that he
cleared a way for the more compromised units by a charge which
retook part of the village, and then led the whole mass up the hill.
The bulk of them scraped through, but were intercepted by 4th
Division troops in the hollow hillside above. One whole battalion was
forced to surrender, and great part of two others; there were 1,700
unwounded prisoners taken, of whom 1,100 belonged to Maucune
and 600 to Conroux’s rear brigade[979].
Maucune’s division was practically destroyed: having reported
600 casualties on the 28th, he reported 1,800 more on the 30th, and
as his total strength was only 4,186, it is clear that only 1,700 men
got away. The general and the survivors rallied in to Foy on the
heights above, along with the wrecks of Conroux’s second brigade,
which lost 1,000 men: the first brigade, though less hard hit, seems
to have had 500 or 600 casualties, and many stragglers. Altogether
both divisions were no longer a real fighting force during the rest of
the campaign.
Meanwhile, there had been a distinct and separate combat going
on farther up the Ulzama valley. When the fighting in and about
Sorauren began, Clausel had halted Vandermaesen’s and Taupin’s
divisions at a defile near the village of Olabe, knowing that if he
continued on his way towards Ostiz and Olague Reille’s wing would
be completely cut off from him. Having seen suspicious movements
in the hills on his left, he sent up two battalions from
Vandermaesen’s division to hold the heights immediately beyond the
river and cover his flank. The precaution was wise but insufficient:
somewhere about 8.30 a.m. the British 7th Division, having received
its orders to join in the general attack, came up from its concealed
position in the rear, and fell upon the covering troops, who were
driven off their steep position by Inglis’s brigade, and thrown down
on to the main body of Clausel’s troops in the valley. There was close
and bloody fighting in the bottom, ‘a small level covered with small
bushes of underwood,’ but after a time the French gave way.
Vandermaesen’s men soon got locked in a stationary fight with
Inglis’s battalions, but the two other 7th Division brigades (Le Cor
and Barnes) which had not descended to the river and the road,
were plying Taupin’s column with a steady fire from the slope of the
opposite bank, which made standing still impossible[980]. Having to
choose between attacking or retreating, Clausel opted for the easier
alternative, and drew off. In his report to Soult he gives as his
reason the fact that Sorauren had now been lost, that Reille’s troops
were streaming over the hills in disorder, and that it was no use
waiting any longer to cover their retreat, which was not going to be
by the road, but broadcast across the mountains. Accordingly he
disengaged himself as best he could, and retreated up the chaussée
followed by the 7th Division, which naturally took some time to get
into order. Inglis’s brigade followed by the road, presently supported
by Byng’s troops, who came up from the side of Sorauren at noon:
the other two brigades kept to the slopes on the west of the river,
turning each position which Clausel took up[981]. By one o’clock he
was back to Etulain, by dusk at Olague, where he was joined by
Conroux and the 3,000 men who represented the wreck of his
division. Vandermaesen had been much mauled—and had left
behind 300 prisoners, while many stragglers from his division, and
more from both of the brigades of Conroux, were loose in the hills.
It is doubtful whether Clausel had more than 8,000 men out of his
original 17,000 in hand that evening. The survivors were not fit for
further fighting[982].
Meanwhile, Reille was undergoing equally unpleasant
experiences. He had stopped behind to conduct the retreat of Foy’s
and Lamartinière’s divisions, when the unexpected cannonade,
followed by the advance of the British infantry, showed that he was
not to get away without hard fighting. When Pakenham attacked
Sorauren, and Cole a little later crossed the ravine to assail Foy’s
position, Reille tried for some time to maintain himself on the
heights, but soon saw that it was impossible. He sent Maucune
permission to evacuate Sorauren—of which the latter could not avail
himself, for he was pressed on all sides and no longer a free agent.
And having thus endeavoured to divest himself of responsibility for
the fate of his right division, he gave orders for the retreat of the
two others not by any regular route, but straight across country, up
hill and down dale. The reason for haste was not only that Cole was
now pressing hard upon Foy’s new position, but that Picton was
visible marching hard for Zabaldica, with the obvious intention of
turning Lamartinière’s flank. There was an ominous want of any
frontal attack on this division—it was clear that Picton intended to
encircle it, while Foy on the other flank was being driven in by Cole,
and there was a chance of the whole wing being surrounded.
Accordingly, at about 10 o’clock by Lamartinière’s reckoning, he
began to give way in échelon of brigades, much hindered after a
time by Picton’s light troops, who had now swerved up into the hills
after him, and pestered each battalion when it turned to retreat[983].
Foy reports that his rear brigade and part of his colleague’s division
were at one moment nearly cut off, and that he ran some danger of
being taken prisoner. It evidently became a helter-skelter business to
get away, and the French can have made no serious resistance, as
Picton’s three brigades that day had only just 110 casualties
between them[984]. The retreat was made more disorderly by the
arrival of a drove of some 4,000 fugitives of Conroux’s and
Maucune’s divisions, accompanied by the latter general himself, who
had escaped over the hills from Sorauren, and ran in for shelter on
Foy’s rear.
At about one o’clock Reille’s divisions, having outdistanced their
pursuers by their rapidity, halted in considerable disarray in a valley
by the village of Esain[985], where Reille tried to restore order, and to
settle a practicable itinerary, with the object of rejoining Clausel. For
reasons which he does not specify, he marched himself by a road
down the valley, which would lead him to Olague, but told Foy to
take a parallel track farther up the slope which should take him to
Lanz, higher in the same valley. The partition was obviously made in
a very haphazard fashion, for while the bulk of Lamartinière’s
division, and the poor remnant of Maucune’s which preserved any
order, accompanied the corps-commander, Foy found that he was
being followed by two stray battalions of Gauthier’s brigade of
Lamartinière, and by a great mass of stragglers, largely Conroux’s
men, but partly also Maucune’s and Lamartinière’s.
Reille got to Olague at dusk and joined Clausel, but brought with
him a mere wreck of his corps, probably not 6,000 men. For Foy
never appeared—and never was to appear again during the
campaign. He explained, not in the most satisfactory way, that part
of his track lay through woods, where the sense of direction is lost,
that he was worried by the reappearance of British light troops, who
had to be driven off repeatedly, and that the stragglers smothered
his marching columns and led them astray. Anyhow, he found
himself at dusk at Iragui in the upper valley of the Arga, instead of
at Lanz in the upper valley of the Ulzama—having marched ten miles
instead of the five that would have taken him from Esain to Lanz.
Picton’s light troops were still in touch with him, and he resolved that
it was hopeless to try to struggle over mountains in the dark.
Dropping into the pass that leads from Zubiri to the Alduides (the
Puerto de Urtiaga), he marched for some hours more, bivouacked,
and next morning descended into French territory[986]. He sent off
the stragglers to St. Jean-Pied-de-Port—they sacked all the mountain
villages on their way[987]—and took his own division at leisure down
the Val de Baigorry to Cambo—having lost only 550 men out of his
6,000 in the whole campaign. Some critics whispered that, seeing
disaster behind him, il a su trop bien tirer son épingle du jeu, and
had saved his division, regardless of what might happen to
colleagues—just as on the eve of Vittoria he had refused to join
Jourdan, and had managed a safe retreat for himself. That it was not
absolutely impossible to get away to the Bastan was shown by the
fact that the two stray battalions of Lamartinière’s division which had
followed Foy in error, branched off from him at Iragui and got to
Almandoz, Elizondo, and the pass of Maya. Another lost party—the
battalion and cavalry regiment which Lamartinière had left to cover
the Roncesvalles road[988]—turned off at Zubiri and followed Foy’s
route by the Alduides and Baigorry. Reille’s ‘main body’ was a poor
remnant by the night of the 30th, after all these deductions had
come to pass.
It must not be supposed that the operations of Pakenham,
Dalhousie, Cole, and Picton during the morning of the 30th were left
to the personal initiative of these officers. Wellington’s orders for the
first move had been made on the spur of the moment, as the
position of the French became visible at dawn. But after the capture
of Sorauren, probably between 9 and 10 a.m., he issued a definite
programme for the remainder of the day’s operations.
Picton was to pursue the enemy who had gone off north-
eastward (Lamartinière) toward the Roncesvalles road. He was given
two squadrons of hussars, and told to take his divisional battery with
him.
Cole was to act on the massif between the Arga and the Ulzama,
keeping touch with Picton on one flank and Pakenham on the other:
if the enemy in his front (Foy and the wrecks of Maucune) made a
strong stand, he was not to lose men by violent frontal attacks, but
to wait for the effect of Picton’s turning movement, ‘which will alarm
the enemy for his flank.’
Up the main road Ostiz-Olague there were to march Byng’s
brigade, the 6th Division, and O’Donnell’s troops from the San
Cristobal position (some six battalions). The 6th Division was to take
its divisional battery with it.
Dalhousie should operate on the west bank of the Ulzama,
keeping touch with Byng and Pakenham—he would be in a position
to turn all positions which the French (Taupin and Vandermaesen)
might take up, if they tried to hold back the main column on the
high road. Like Cole, he was not to attempt anything costly or
hazardous.
Finally, and here later news caused a complete alteration of the
programme, Hill was to ‘point a movement’ from Lizaso on Olague
and Lanz, if the situation of the enemy made it possible. Attacked
himself by the superior numbers of D’Erlon’s corps, Hill was (of
course) unable to do anything of the kind. Wellington seems to have
suspected that he might be too weak for his task, and in the general
rearrangement of the army sent him not only Campbell’s Portuguese
brigade (which properly belonged to the division of Silveira, who was
in person with Hill but only with one brigade, Da Costa’s), but also
the Spanish battalions which O’Donnell had detached to
Ollocarizqueta on the 27th[989], and finally Morillo’s division. These
7,000 men were started from their positions before noon, but did not
arrive in time to help Hill[990]. A separate supplementary order went
off to Charles Alten to tell him that the Light Division would not be
wanted at Lecumberri, and should return to Zubieta.
The whole of this series of orders is purely offensive in character,
and, as is easily seen, presupposes first, that a large section of the
French army is retiring on the Roncesvalles road, but secondly that
the main body is about to go back by Lanz, the Col de Velate,
Elizondo, and Maya. Hence the heaviness of the column directed on
the chaussée: if Wellington had dreamed that Soult was intending to
send nothing back by the Roncesvalles road, and had started a
vigorous attack on Hill that morning, the orders given would have
been different.
Meanwhile, during the hours which saw the destruction of
Clausel’s and Reille’s divisions, Soult himself was urging on D’Erlon’s
corps to overwhelm Hill, and hoping for the early arrival of Clausel’s
to lend assistance. Reille he had left behind as a containing force,
and did not expect to see for another twenty-four hours. Soult
informs us that he left Zabaldica and the left wing so early that he
had no reason to expect the trouble which was about to break out
behind him. He noticed Maucune’s division beginning to file into
Sorauren, and passed Clausel in march on Ostiz, before the British
guns opened, i.e. before 6 o’clock in the morning. But they must
have been sounding up the valley before he reached D’Erlon on the
heights by Etulain: he resolved to pay no attention to the ominous
noise, being entirely absorbed in the operation which he had in
hand.
D’Erlon was already in movement, by the valley of the Ulzama,
and had just been joined by the cavalry, which had come up from
the Arga valley by cross roads in the rear[991]. It was, of course, no
use to him in the sort of engagement on which he was launched.
The Marshal instructed him to push on and hurry matters, as there
were reports from deserters that three hostile divisions were on their
way to reinforce the British force at Lizaso. Accordingly D’Erlon,
having discovered his enemy’s position with some little difficulty, for
it lay all along the edge of woods, delivered his attack as soon as his
troops were up. Hill, on news of the French advance reaching him,
had evacuated Lizaso, which lies in a hole, and had drawn up his
four brigades along a wooded ridge half a mile to the south, with the
village of Gorron in front of his left wing, and that of Arostegui
behind his right wing. The Portuguese brigade of Ashworth formed
his centre: on the right was one regiment of Da Costa’s brigade
which had been called up from the Marcalain road, on the left the
other and the remains of Cameron’s brigade, which had suffered so
heavily at Maya: Fitzgerald of the 5/60th was in command, the
brigadier having been wounded on the 25th. Pringle’s brigade (under
O’Callaghan that day, for Pringle was acting as division-commander
vice William Stewart wounded at Maya) was in reserve—apparently
distributed by battalions along the rear of the line. The edge of the
woods was lined with a heavy skirmishing line of light companies,
and the Caçador battalion of Ashworth’s brigade. Altogether there
were under 9,000 men in line.
D’Erlon determined to demonstrate against Hill’s front with
Darmagnac’s division, who were to hold the Portuguese closely
engaged, but not to attack seriously. Meanwhile Abbé was to assail
the Allied left, and also to turn it by climbing the high hills beyond its
extreme flank, in the direction of the village of Beunza. He had
ample force to do this, having the strongest division in Soult’s army
—8,000 bayonets—and the only one which had not yet been
seriously engaged during the campaign. Maransin followed Abbé in
support. The arrangements being scientific, and the force put in
action more than double Hill’s, success seemed certain.
It was secured; but not so easily as D’Erlon had hoped.
Darmagnac, so Soult says, engaged himself much more deeply than
had been intended. Finding only Portuguese in his front, he made a
fierce attempt to break through, and was handsomely repulsed.
Meanwhile Abbé, groping among the wooded slopes to find the flank
of Fitzgerald’s brigade, missed it at first, and attacking the 50th and
92nd frontally, saw his leading battalions thrown back. But he put in
more, farther out to the right, and though the British brigade threw
back its flank en potence, and tried to hold on, it was completely
turned, and would have been cut off, but for a fierce charge by the
34th, who came up from the reserve and held the enemy in check
long enough for the rest to retire—with the loss of only 36 prisoners
(two of them officers). The retreat of the left wing compelled the
Portuguese in the centre to give back also—they had to make their
way across a valley and stream closely pursued, but behaved most
steadily, and lost less than might have been expected—though some
130 were cut off and captured. The right wing pivoted, in its
withdrawal, on an isolated hill held by Da Costa’s 2nd Line, which
was gallantly maintained to the end of the day. The centre and left
lost more than a mile of ground, but were in good enough order to
take up a new position, selected by Hill on a height in front of the
village of Yguaras, where they repulsed with loss a final attack made
by one of Darmagnac’s regiments which pursued too fiercely[992].
D’Erlon was re-forming his troops, much scattered in the woods,
when at 4 p.m. there arrived from Marcalain the head of Campbell’s
Portuguese brigade, followed by Morillo’s and O’Donnell’s Spaniards.
Their approach was observed, and no further attacks were made by
the French. D’Erlon winds up his account of the day by observing,
quite correctly, that he had driven Hill out of his position, inflicted
much loss on him, and got possession of the road to Yrurzun. So he
had—the Allies had lost 156 British and about 900 Portuguese, of
whom 170 were prisoners. The French casualties must have been
about 800 in all, if we may make a rough calculation from the fact
that they lost 39 officers—10 in Abbé’s division, 29 in
Darmagnac’s[993].
But it is not to win results such as these that 18,000 men attack
8,000, and fight them for seven hours[994]. And what was the use of
such a tactical success, when meanwhile Soult’s main body had been
beaten and scattered to the winds, so that Reille and Clausel were
bringing up 14,000 demoralized soldiers, instead of 30,000 confident
ones, to join the victorious D’Erlon?
This unpleasant fact stared Soult in the face, when he rode back
to Olague to receive the reports of his two lieutenants. It was
useless to think of further attempts on the Tolosa road, or
molestation of Graham. D’Erlon’s three divisions were now his only
intact force, capable of engaging in an action with confidence: the
rest were not only reduced to a wreck in number, but were ‘spent
troops’ from the point of view of morale. The only thing to be done
was to retreat as fast as possible, using the one solid body of
combatants to cover the retreat of the rest. All that Soult afterwards
wrote to Paris about his movements of July 31 being the logical
continuation of his design of July 30—‘de me rapprocher de la
frontière pour y prendre des subsistances, avec l’espoir de joindre la
réserve du Général Villatte[995],’ was of course mere insincerity. He
changed his whole plan, and fled in haste, merely because he was
forced to do so.
One strange resolve, however, he made on the evening of July
30. The safest and shortest way home was by the Puerto de Velate,
Elizondo, and Maya; and Clausel’s and Reille’s troops at Olague and
Lanz were well placed for taking this route. This was not the case
with D’Erlon’s men at Lizaso and the newly won villages in front of it.
Instead of bidding the routed corps hurry straight on, and bringing
D’Erlon down to cover them, the Marshal directed Reille and Clausel
to leave the great road, to cut across by Olague to Lizaso, and to get
behind D’Erlon, who would hold on till they were past his rear. All
would then take the route of the Puerto de Arraiz and go by
Santesteban. This was a much more dangerous line of retreat; so
much so that the choice excites surprise. Soult told Clarke that his
reason for taking the risk was that D’Erlon had got so far west that
there was no time to move him back to the Velate road—which
seems an unconvincing argument. For Clausel and Reille had to
transfer themselves to the Puerto de Arraiz road, which would take
just as much time; and D’Erlon could not retreat till they had cleared
his rear. The real explanation would seem to be that Soult thought
that the British column on the Velate road, being victorious, would
start sooner and pursue more vigorously than Hill’s troops, who had
just been defeated. If Reille and Clausel were pressed without delay,
their divisions would go to pieces: D’Erlon, on the other hand, could
be relied upon to stand his ground as long as was needful. If this
was Soult’s idea, his prescience was justified.
SECTION XXXVIII: CHAPTER VI
SOULT’S RETREAT, JULY 31-AUG. 3

When Soult’s orders of the evening of July 30th had been issued,
there was no longer any pretence kept up that the Army was
executing a voluntary strategical movement, planmässig as the
German of 1918 would have expressed the idea, and not absconding
under pressure of the enemy.
At 1 o’clock midnight Clausel’s and Reille’s harassed troops at
Olague and Lanz went off as fast as their tired legs would carry
them, leaving countless stragglers behind. D’Erlon could not
retire till the morning, when he sent off Darmagnac and Maransin to
follow the rest of the army, retaining Abbé’s division as his
rearguard, which held the heights north of Lizaso for some time
after their comrades had gone.
Wellington’s orders issued at nightfall[996] were such as suited
Soult fairly well, for the British general had not foreseen that which
was unlikely, and he had been deceived to some extent by the
reports which had come in. The deductions which he drew from
what he had ascertained were that a large body of the enemy had
retreated eastward, and would fall into the Roncesvalles road, but
that the main force would follow the Velate-Elizondo chaussée. That
Soult would lead all that survived to him of his army over the Puerto
de Arraiz passes, to Santesteban, had not struck him as a likely
contingency. Hence his detailed orders overnight were inappropriate
to the facts which appeared next morning. He directed Picton to
pursue whatever was before him on the Roncesvalles road—thinking
that Foy and Lamartinière would escape in that direction; but lest
they should have gone off by Eugui and the Col de Urtiaga he