Algorithms to Live By
• “Look-then-leap” is exactly what it sounds like - look at the first 37% of the options (or spend 37% of the total time available) to form an intuition and gather information, then leap at the next option that beats everything seen so far (see the sketch below)
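A minimal simulation of the look-then-leap rule (not from the book; it assumes candidates are distinct scores arriving in random order, and "success" means landing the single best one):

```python
import random

def look_then_leap(candidates, look_fraction=0.37):
    """Reject the first look_fraction of candidates, remember the best seen,
    then take the first later candidate who beats that benchmark."""
    cutoff = int(len(candidates) * look_fraction)
    benchmark = max(candidates[:cutoff]) if cutoff else float("-inf")
    for value in candidates[cutoff:]:
        if value > benchmark:
            return value
    return candidates[-1]  # forced to settle for the last option

def success_rate(n=100, trials=10_000):
    """How often look-then-leap lands the single best candidate."""
    wins = 0
    for _ in range(trials):
        pool = random.sample(range(n * 10), n)  # distinct scores in random order
        wins += look_then_leap(pool) == max(pool)
    return wins / trials

print(success_rate())  # hovers around 0.37, as the theory predicts
```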
• Variants - if there is a 50/50 chance of a proposal being accepted (the option has agency), start leaping at 25%; if immediate proposals are a sure thing but belated ones are 50/50, stay noncommittal until 61%
• If we know exactly what we are looking for, it's a full-information game - stop at the first option that matches
• Corollary - in the case of slim pickings, lower your standards; when there are more fish in the sea, raise them
• Turning no-information games into full-information ones - knowing where an option stands relative to the population at large turns look-then-leap into a threshold rule (with higher odds of finding the best option)
• Where to Park - Pass up all vacant spots available before a certain distance to the destination and then take the
first one that’s available after - choose distance based on occupancy rate (Rational brain does this amazingly
intuitively)
• Explore vs exploit depends on how much time you have left in the game - in the game of life, it depends on your age
• We are more likely to try a new restaurant when we move to a city than when we are leaving it
• Explore when you have the time to use the resulting knowledge, exploit when you are ready to cash in. The
interval makes the strategy
• Movie sequels are “exploit” strategies. They are short-termist, and maybe studios think they are at the end of the interval (their imminent demise)
• Win-stay, lose-shift strategy (multi-armed bandit problems) - stay with a winning option as long as it pays and shift when it doesn't. Though win-stay is generally close to optimal for most games, lose-shift may not be - it can be a rash move. Good options shouldn't be penalized for being imperfect (a bit of Bayes will help?)
• Gittins index for multi-armed bandit problems (Devised for R&D in pharma for drug trials) - Uses geometric
discounting. Play the arm with the highest index.
• An untested rookie is worth more (early in the season) than a veteran of seemingly equal ability, because we know less about him - taking the future into account drives us towards novelty
• Regret and optimism - “To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been” (Chester Barnard)
• Regret will never stop increasing, even if you pick the best possible strategy - because even the best possible strategy isn't perfect every time. Only the rate of growth of regret goes down over time with the best strategy
• The minimum possible regret grows at a logarithmic rate (as many mistakes in the first year as in the following nine, and as many in the first decade as in the following ninety)
• Upper confidence bound algorithm (optimism in the face of uncertainty) - cares less about past payoffs and more about what could perform better in the future (a restaurant with a single mediocre review has higher potential for greatness than one with hundreds of mediocre reviews); see the sketch below
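A small sketch of UCB1, one standard upper-confidence-bound strategy. The book stays qualitative, so the particular bonus term sqrt(2·ln t / n) and the Bernoulli-reward setup here are assumptions from the textbook algorithm, not the book's formula:

```python
import math
import random

def ucb1(true_means, horizon=10_000):
    """Play the arm with the highest average reward plus an uncertainty bonus.
    Rarely tried arms get a large bonus, so optimism drives exploration."""
    n_arms = len(true_means)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            arm = untried[0]  # try every arm once before trusting averages
        else:
            arm = max(range(n_arms),
                      key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

print(ucb1([0.3, 0.5, 0.7]))  # pull counts concentrate on the best arm over time
```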
• Assume the best about new people, things or options in the absence of evidence to the contrary - in the long
run, optimism is the best prevention for regret
• Adaptive trials as an alternative to current clinical trials - the chance of using a given treatment is increased with each win and decreased with each loss (a modification of win-stay, lose-shift)
• In general people tend to over-explore (could be because the world is a restless bandit and things change all the time - that bad restaurant doesn't have to remain bad - maybe it's better now)
• Having instincts tuned by evolution for a world in constant flux isn’t necessarily helpful in an era of industrial
standardization
• Childhood is designed in such a way that exploration can be done without concern for payoffs
• Our intuitions about rationality are informed more by exploitation than exploration - if you treat every decision as your last, only exploitation makes sense
• The elderly have fewer social connections by choice, focusing on the most meaningful ones (exploit)
3. Sorting
• The tabulation of the 1880 census took 8 years, barely finishing by the time the 1890 census (US) began
• Finding largest, smallest, rarest, indexing, finding dupes - they all start with a sort
• Sorting has negative operating leverage (the more there is to sort, the worse the cost grows). Corollary: do laundry more often if sorting socks is becoming a pain
• Worst case analysis lets us make guarantees on computing times - (Big-O notation in comp-sci always deals
with worst case times - as size of problem increases, how does the running time change?)
• Bubble sort - Sorting a book shelf pass after pass in quadratic time
• Insertion sort - Pull off all the books and put them back one at a time at the right place (Again quadratic time)
• Even checking whether a bookshelf is sorted takes linear time, involving a full scan of the shelf (so sorting in constant time O(1) is out of the question). The ideal solution lies between linear and quadratic time (mergesort)
• Mergesort runs in linearithmic time, O(n log n) - merging two already sorted stacks is a lot easier than sorting one big stack (see the sketch below). Excellent for parallelizing - call a bunch of your friends over, give them each a pile to sort, and merge the piles at the end (the difference in running time is staggering - something like 29 passes versus a few million for census-scale item counts)
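A minimal mergesort, mainly to make the "merging two sorted piles is easy" step concrete:

```python
def merge_sort(items):
    """O(n log n): split, sort each half, then merge the two sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merging sorted piles: repeatedly take the smaller front element.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```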
• Bucket sort - sorting by categories - can be done in close to linear time (though it depends on the number of categories, so O(m×n)); libraries do this all the time by genre (sketched below)
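A toy bucketing pass under the library-by-genre framing. The book names the idea; the book list and field names here are made up for illustration:

```python
from collections import defaultdict

def bucket_by_genre(books):
    """One linear pass drops each book into its genre bucket;
    any fine-grained sorting happens only within each (small) bucket."""
    shelves = defaultdict(list)
    for title, genre in books:
        shelves[genre].append(title)
    return {genre: sorted(titles) for genre, titles in shelves.items()}

books = [("Dune", "sci-fi"), ("Emma", "classic"), ("Foundation", "sci-fi")]
print(bucket_by_genre(books))
```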
• Sorting something you will never search is wasteful. Searching something unsorted is merely inefficient (Okay
if you do it infrequently). Err on the side of messiness (I do, with my bookshelf which is merely bucketed into
read and unread. Besides, I enjoy the searching :-))
• We search with our quick eyes and sort with our slow hands - important thing to consider as well. Sometimes
mess is the optimal choice
• Round-robin and ladder tournaments have quadratic complexity and quickly become overwhelming as the number of teams increases
• A single-elimination tournament can reliably decide only gold, not silver or bronze (still, we use it in the Olympics and in smartphone camera shootouts)
• Games with noise, like soccer (where fluke or luck plays a part), gain from the inefficiency of a bubble sort (the IPL uses a comparison counting sort)
• One of the important skills as a poker player is to be able to evaluate how good you are
• Blood sort - Pecking Order/Dominance hierarchies - Establishing a pecking order avoids a lot of confrontation
and bloodshed.
• Pecking order is easier to establish in a herd/flock/pack if group size is small. Ethical raising of livestock must
take this into account (keep group size small)
• When everyone knows their position in a pecking order, no games ensue (be it poker cash games or fights among monkeys)
• A race is fundamentally different from a fight - a marathon is cardinal rather than ordinal - it naturally orders the field by finish time and doesn't need pairwise comparisons
• Things like national GDP establish a dominance hierarchy among nations and avoid conflicts to some extent
4. Caching
• What to do when cache gets full is decided by an eviction policy (or replacement policy). Idea is to minimize
the number of times you can’t find what you are looking for in the cache
• The idea is to evict whichever item we won't need again for the longest time from now (which requires near-clairvoyance)
• Random eviction - As the name suggests - nuke whichever and make space, even if its your favorite shirt in
the daily rack
• FIFO - First In, First Out - get rid of whatever has been sitting around the longest
• LRU - Least Recently Used (the gold standard) - get rid of whichever item has not been used recently (a minimal sketch follows). Considering frequency of use alongside recency can improve it further
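A minimal LRU cache sketch built on Python's OrderedDict (an implementation choice of mine, not the book's):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used item once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.items))  # ['a', 'c'] — 'b' was the least recently used
```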
• People watch movies set close to where they live (So Netflix caches these movies in the closest CDN)
• The brain forgets to reduce cognitive load. Unlike eviction or replacement, the brain loses references (Ebbinghaus forgetting curve)
• Human society functions like human beings in forgetting (similar curve for newspaper headlines) -
Availability bias is caused by this
• Cognitive decline from ageing could be caused by the sheer amount of information that must be processed
• Minimizing maximum lateness - Pick the task with the earliest Due date first (or serve customers who arrived
first)
• Minimizing the number of foods that spoil in a fridge - eat food in order of earliest expiry date, but toss the largest item whenever doing so lets more items be consumed in time (Moore's algorithm)
• Minimizing length of TO DO list - start with the items with shortest processing time
• Prioritize a task that takes twice as long only if it is twice as important (form some rules of thumb; see the scheduling sketch below)
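A sketch of the scheduling rules above: earliest due date to minimize maximum lateness, and shortest processing time (with an importance-weighted variant) to keep the to-do list short. The task fields and numbers are illustrative:

```python
def earliest_due_date(tasks):
    """Minimize maximum lateness: run jobs in order of due date."""
    return sorted(tasks, key=lambda t: t["due"])

def shortest_processing_time(tasks):
    """Minimize the number of outstanding items: run the quickest jobs first.
    With weights, sort by time per unit of importance."""
    return sorted(tasks, key=lambda t: t["time"] / t.get("weight", 1))

tasks = [
    {"name": "report", "time": 4, "due": 10, "weight": 2},
    {"name": "email",  "time": 1, "due": 3,  "weight": 1},
    {"name": "slides", "time": 2, "due": 6,  "weight": 1},
]
print([t["name"] for t in earliest_due_date(tasks)])         # ['email', 'slides', 'report']
print([t["name"] for t in shortest_processing_time(tasks)])  # ['email', 'report', 'slides']
```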
• A man with one watch knows what time it is, a man with two is never sure
• Give a system an overwhelming number of trivial things to do and the important things get lost (denial-of-
service)
• Staying focused on getting the weighty, important things done can also be a non-optimal approach (the Mars Pathfinder issue due to priority inversion)
• Straightening out a to-do list can itself become an item on the to-do list when the system doing the scheduling is the same as the one being scheduled
• There is a price paid for switching tasks - the context switch (that's why one 16-hour day is sometimes more productive than two 8-hour days in writing and programming)
• Anyone you interrupt more than a few times an hour is in danger of getting nothing done at all
• Thrashing - when all you do is context switch without doing anything productive
• When a juggler takes one more ball, he doesn’t lose just that ball, he loses all
• Interrupt coalescing - responsiveness (how quickly can you respond) and throughput (how much work can you
get done) are always at odds - better to have someone answer the phone (responsive) while you get work done
(throughput)
• Donald Knuth - patron saint of minimal context switching (mail answered in batches, roughly once every three months)
• (w+1)/(n+2) - Laplace's law for estimating probabilities, where w is wins and n is attempts (with one win from one attempt you get 67%, a reasonably “optimistic” estimate rather than an over-confident 100%); computed below
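The rule as a one-liner:

```python
def laplace_estimate(wins, attempts):
    """Laplace's rule of succession: (w + 1) / (n + 2).
    Gentler than the raw win rate when attempts are few."""
    return (wins + 1) / (attempts + 2)

print(round(laplace_estimate(1, 1), 3))  # 0.667 — not an over-confident 100%
print(laplace_estimate(0, 0))            # 0.5   — no evidence, even odds
```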
• It was Laplace who did all the heavy lifting, though it's called Bayes's rule
• How long will something last - As long as it has already lasted is a useful rule of thumb (Gott’s Copernican
principle). This sounds very much like Lindy effect
• The richer the prior information we bring into Bayes rule, the better our predictions
• Be wary of which distributions your real-world priors draw from (power-law vs Gaussian). Populations and incomes follow power laws rather than normal distributions
• The Copernican principle is Bayes's rule with a power-law (uninformative) prior. Power-law priors give a multiplicative rule whose constant depends on the exponent - hence roughly 2x for lifetimes (Lindy) and about 1.4x for movie box-office collections, based on takings so far
• Things that are neither more nor less likely to end because they have gone on for a while (Erlang distribution) yield predictions based on an additive rule - always expect a constant amount more. A loose sketch of all three rules follows
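A loose sketch of the three prediction rules just described. The multiplier and offset values are placeholders, not derived constants, and the normal-prior branch simplifies the book's average rule to "predict roughly the population mean":

```python
def predict_total(observed_so_far, prior):
    """Illustrative prediction rules for three kinds of priors:
    power-law  -> multiply what you've seen so far by a constant
    normal     -> predict roughly the population average
    erlang     -> add a constant amount more, regardless of current age"""
    if prior["kind"] == "power_law":
        return observed_so_far * prior.get("multiplier", 2.0)  # Copernican: ~2x current age
    if prior["kind"] == "normal":
        return prior["mean"]                                   # regress to the average
    if prior["kind"] == "erlang":
        return observed_so_far + prior["offset"]               # always "a bit more"
    raise ValueError("unknown prior")

print(predict_total(10, {"kind": "power_law"}))           # 20
print(predict_total(10, {"kind": "normal", "mean": 78}))  # 78
print(predict_total(10, {"kind": "erlang", "offset": 15}))  # 25
```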
• Small data is big data in disguise - The reason why we are able to predict well from small data is because our
priors are so rich
• Our judgements betray our expectations and our expectations betray our experience
• The ability to resist temptation may be a matter of expectations more than willpower (if you know how long you have to wait, you can develop the will to wait)
• Priors are formed by experience, but once a species gains language and the ability to speak, priors are formed not just by personal experience but by shared experience, which may skew towards special/interesting things
• News reports interesting/special things, which may be infrequent - don't let infrequent things reported frequently distort your priors (protect your priors, for they are what you base your decisions on)
• How hard to think, how many factors to consider? (There’s wisdom in thinking less)
• Overfit taste - when food tastes excellent but is nutritionally poor (taste was a good fit for nutrition as required in the past)
• A company will build whatever the CEO decides to measure (Sam Altman)
• Ruthless and clever optimization of the wrong thing (Goodhart’s law, though the book doesn’t mention it by
name)
• Training scars (a cop habitually handed the pistol back to the assailant after grabbing it from him, as he had done hundreds of times in training)
• Cross-validation detects overfitting by seeing how well a model generalizes what it learnt (to check whether it was only taught to the test)
• Language forms a natural Lasso - Convey what you intend fast or you lose the audience’s attention
• Less information, computation and time can improve accuracy (Elevator pitches and investment thesis in few
lines)
• Nudge a model towards simplicity by controlling how quickly it adapts to new data
• A single most-important factor can lead to better predictions than multi-factor models in some cases
• Early Stopping provides the foundation for a reasoned argument against reasoning (thinking person’s case
against thought)
• The further along you are in a brainstorming session, the thinner the pen's stroke can be (simplification by stroke size - a thick pen keeps you from getting into minor details while sketching broad outlines)
• You can also regularize to the page (what doesn’t make the page isn’t important)
8. Relaxation
• The traveling salesman problem is an O(n!) problem. It's not that a computer can't find the shortest route, but that as the number of towns increases there are n! possible routes, and finding the best one is computationally hard
• Defining difficulty - any algorithm that runs in polynomial time, O(n^2) or O(n^3) or in general O(n^m), is considered efficient; O(n!) is considered intractable
• Relax the traveling salesman problem by allowing the salesman to backtrack and visit the same town multiple times - the shortest such route produces the minimum spanning tree (sketched below)
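One way to compute that relaxed answer is Prim's algorithm for the minimum spanning tree (my choice of algorithm, not the book's); the toy town graph is made up:

```python
import heapq

def minimum_spanning_tree(graph, start):
    """Prim's algorithm: grow a tree by always adding the cheapest edge
    that reaches a new town. `graph` maps town -> {neighbor: distance}."""
    visited = {start}
    edges = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(edges)
    tree = []
    while edges and len(visited) < len(graph):
        weight, u, v = heapq.heappop(edges)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, weight))
        for nxt, w in graph[v].items():
            if nxt not in visited:
                heapq.heappush(edges, (w, v, nxt))
    return tree

towns = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(minimum_spanning_tree(towns, "A"))  # [('A','C',2), ('C','B',1), ('B','D',5)]
```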
• Constraint relaxation like above lets us solve an easier version of a complex problem making the intractable,
tractable and making it a starting point
• If you are willing to accept compromises, even the hairiest problems can be tamed
• Lagrangian relaxation - “Do it, or else!” problems answered with “Or else what?” - coloring outside the lines at a cost to make the intractable tractable (converting impossibilities into penalties, and teaching the art of bending/breaking the rules and living with the consequences)
9. Randomness
• Choosing the random option feels like a cop-out, but it's far from it
• You need to know when to rely on chance, in what way and to what extent
• Complex quantities can be estimated by sampling (the value of pi can be estimated by repeatedly dropping a needle onto lined paper)
• Test of first-rate intelligence is the ability to hold two conflicting thoughts and still be able to function (F.
Scott Fitzgerald)
• Monte Carlo simulations - Replacing exhaustive probability calculations with sample simulations
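The needle-dropping idea in its simplest Monte Carlo form (random darts in a square rather than needles, but the same sampling principle):

```python
import random

def estimate_pi(samples=1_000_000):
    """Throw random darts at a unit square; the fraction landing inside
    the inscribed quarter-circle approaches pi/4."""
    inside = sum(1 for _ in range(samples)
                 if random.random() ** 2 + random.random() ** 2 <= 1)
    return 4 * inside / samples

print(estimate_pi())  # ~3.14, and the accuracy improves with more samples
```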
• The Sieve of Eratosthenes - one of the first algorithms for finding prime numbers, from ancient Greece
• It is much easier to multiply two primes than factor them back out (especially for very large ones) - forms the
basis of cryptography
• Testing for primality - Miller's approach (the Miller-Rabin test) is better than the Sieve of Eratosthenes for large numbers, but it offers probability rather than certainty
• Alongside the time and space tradeoff, a third dimension has been added in recent times: error probability (you can get an answer fast, but with a small chance of error)
• Bloom filters work like the Miller-Rabin primality test (used for detecting malicious sites); a sketch follows
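A minimal Bloom filter sketch; the bit-array size, hash count, and SHA-256-based hashing are illustrative assumptions, not parameters from the book:

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership with a small false-positive rate
    and no false negatives: 'probably seen' vs 'definitely not seen'."""
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
seen.add("https://example.com/known-bad")
print(seen.might_contain("https://example.com/known-bad"))  # True
print(seen.might_contain("https://example.com/fresh"))      # almost certainly False
```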
• Greedy/myopic algorithm - one that takes the best thing available at every step of the way (in the context of gradient descent in ML, or hill climbing as used in the book). To get from a local minimum to the global minimum, you have to worsen your solution a bit
• Use a little bit of randomness every time you make a decision (Metropolis algorithm)
• Simulated Annealing (heating and then slowly cooling) from metallurgy is used in ML for optimization
• Even if you are in the habit of acting on bad ideas, you should always act on the good ones (Hill Climbing
algo)
• Your likelihood of following a bad idea should be inversely proportional to how bad the idea is
• Front-load the randomness, rapidly cooling out of it, using less and less randomness as time goes by and lingering longest as you approach freezing (temper yourself, literally); a sketch follows
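A compact simulated-annealing sketch on a made-up bumpy function. The temperature schedule and the acceptance rule exp(-delta/temp) follow the standard Metropolis form; the book stays qualitative, so these specifics are assumptions:

```python
import math
import random

def simulated_annealing(cost, neighbor, start, temp=10.0, cooling=0.995, steps=10_000):
    """Start hot (accept almost any move, even bad ones), then cool:
    the chance of accepting a worse solution shrinks with how bad it is
    and with the falling temperature."""
    current, current_cost = start, cost(start)
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - current_cost
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
        temp *= cooling  # cool down: less and less randomness over time
    return current

# Toy example: a 1-D function with local minima that trap plain hill climbing.
f = lambda x: x ** 2 + 10 * math.sin(x)
step = lambda x: x + random.uniform(-1, 1)
print(simulated_annealing(f, step, start=8.0))  # ends near the global minimum (~ -1.3)
```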
10. Networking
• Circuit switching - constant bandwidth, always on - made sense for human conversation. But keeping a dedicated connection open for something that rarely talks, yet wants an immediate line when it does, gave rise to packet switching (like postcards moving at the speed of light)
• Packet switching - reliability increases exponentially with network size unlike circuit switching where calls
fail if any link gets disrupted
• Byzantine generals problem (confirmation of receipt of message requires another message causing endless
recursion)
• Exponential backoff algorithm - the algorithm of forgiveness, ubiquitous for reconnecting - the maximum delay grows exponentially (the actual delay before each reconnect attempt is a random number of seconds below that maximum); a sketch follows
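A sketch of retrying with exponential backoff; the doubling-with-a-cap schedule and the flaky-connection stand-in are illustrative assumptions:

```python
import random
import time

def retry_with_backoff(attempt_connection, max_delay=64):
    """Keep trying forever, but wait a random span whose ceiling doubles
    after every failure: finite patience per attempt, infinite mercy overall."""
    ceiling = 1
    while True:
        if attempt_connection():
            return True
        time.sleep(random.uniform(0, ceiling))  # random wait below the current ceiling
        ceiling = min(ceiling * 2, max_delay)   # ceiling grows exponentially, then caps

# Example: a flaky connection that succeeds about 20% of the time.
flaky = lambda: random.random() < 0.2
retry_with_backoff(flaky)
```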
• We should replace “three strikes and you're out” (finite forgiveness of infractions) with finite patience and infinite mercy (simply wait longer between attempts, but never give up)
• Control without hierarchy - the more ants that leave the nest, the more successful the foraging, but unsuccessful returnees result in diminishment
• Every employee tends to rise to his level of incompetence (the Peter Principle) - every employee keeps getting promoted until they stagnate in a position where they don't do well, so over time an organization fills with people doing their worst in the posts they hold (demotions would help)
• With poor feedback from the listener, the story falls apart
• Photons that miss the retina aren’t queued for later viewing - In real life, packet loss is total
• It used to be that people knocked on your door and went home if there was no answer; now they wait in a queue (in email)
• The move from circuit switching to packet switching has happened to society too - instead of dedicated lines we send messages, and instead of rejecting we defer
• Companies that advertise fast internet connections are really advertising higher bandwidth rather than lower latency
• Algorithmic game theory - cross-pollination between game theory and computer science
• Successful investing is anticipating the anticipation of others (Keynes), or what the average opinion expects the average opinion to be
• Halting problem - when a machine or mind tries to simulate something as complex as itself, it finds its
resources maxed out, by definition
• You really want to play only one level above your opponent - otherwise you will assume they have information they don't actually possess
• Equilibrium - Content with my strategy given yours, and you are content with yours given mine
• Every finite game has at least one (possibly mixed-strategy) equilibrium (John Nash)
• A low price of anarchy means the system is about as good on its own as it would be if carefully managed
• Tragedy of the commons - two player prisoner’s dilemma extended to many players - we can easily end up in
a terrible equilibrium with a clean conscience
• If the rules of the game force bad strategies, we shouldn’t change the strategies, we should change the game
(reverse game theory or mechanism design - what rules will give us behavior we want to see?)
• If a tree grows taller to get more sunlight and the rest of the trees do too, up to the same level, the canopy ends up getting the same light as before - except now each tree supports a taller trunk at a higher cost (Dawkins)
• Morality is herd instinct in the individual (Nietzsche). Emotion is mechanism design in the species
• Sealed-bid first-price auction, Dutch auction (descending), and English auction (ascending). In a Dutch auction, it's the absence of a bid that reveals information
• Information cascade (or infinite misinformation) - when players take others' actions as evidence of their beliefs and act accordingly, which in turn reinforces someone else's belief. Consensus comes unglued from reality (happens all the time in the stock market)
• Be wary of cases where public information exceeds private information (to avoid being trapped in information cascades); also be wary of situations where you know more about what people are doing than why they are doing it, and be hesitant to overrule your own doubts
• Vickrey auction (second-best bid) - winner pays second best bid - participants are incentivized to be honest
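The second-price mechanism in a few lines (the bidders and amounts are illustrative):

```python
def vickrey_winner(bids):
    """Sealed-bid second-price auction: the highest bidder wins
    but pays only the second-highest bid, so bidding your true value is optimal."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

print(vickrey_winner({"alice": 120, "bob": 100, "carol": 90}))  # ('alice', 100)
```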
• Any game that requires strategically masking the truth can be transformed into one that requires nothing but
simple honesty
• Hell is other people (Sartre) - (Not because of maliciousness but because of the way they affect our beliefs)
• Popularity is complicated, intractable, a recursion in a hall of mirrors; but beauty, in the eye of the beholder, is not. Adopting a strategy that doesn't require anticipating or predicting others' tactics is one way to cut the Gordian knot of recursion
• Seek out games where honesty is the dominant strategy. Then just be yourself
If you have made it this far, this book is for you. 11/10