Improving Coin Selection in BDK 0.3

César Alvarez Vallero is a fifth year dual degree student in computer science pursuing a bachelor's and master's degree in Argentina. He has been interested in Bitcoin since first learning about it at age 11. He is applying to the Summer of Bitcoin program to work on improving coin selection algorithms in BDK. Coin selection is important for fee efficiency, privacy, and memory usage. The project goals are achievable and there are reference implementations to build upon. César believes he can contribute ideas to improve the current coin selection framework in BDK.

Uploaded by

Rahul Singh
Improving coin selection in BDK

Personal Details
Name: César Alvarez Vallero
Twitter: https://twitter.com/csralvall

I am a fifth-year dual degree student pursuing a B.Sc. CS + M.S.C.S. at the National
University of Córdoba. I know that Summer of Bitcoin kicks off in early May and that it is
expected to be a full commitment during the 12-week period that the program lasts. My
semester is not aligned with the expected dates because I live in a different hemisphere, but I
think that this program is a unique opportunity to start working directly on the technology that
I think will revolutionise the world in the next fifty years (a change that is already happening). I am
willing to put great effort into making this project my first priority, and I think that I have the
right background and the right incentives to complete this program successfully and
continue contributing beyond the Summer of Bitcoin. Let me explain further:

I didn’t find SOB like any student casually browsing the web. I have been reading and
learning about Bitcoin since the first time that I read about it, when I was just an
eleven-year-old boy with no idea of game theory or cryptography, and when I didn’t
understand the whole whitepaper. In all these years, I have been able to evolve from the
idea of becoming rich to the idea of becoming a sovereign individual. I have been able to
appreciate the resilience of the whole system, to analyse the development methodology that
makes it possible to build an anti status quo tool in an open source and decentralised fashion,
and to mature my comprehension of Bitcoin. It was one of the things that made me decide
to study Computer Science, and it is today the thing that keeps me studying. It pushes me
outside my comfort zone.
When I say that I didn’t find Summer of Bitcoin by chance, that’s because I was already looking for it
before it even existed.
In 2019 (the second year of my degree) I was looking for Bitcoin contributors who could inspire me with
their work and show me that it was possible to start contributing to something like Bitcoin
Core. That’s when I discovered John Newbery’s article Contributing to Bitcoin Core, a
personal account. It gave me some useful and simple steps to advance in the Bitcoin
world (applicable in a myriad of other places too):
1. Start small
2. Hold yourself accountable
3. Sharpen the tools
4. Offer help and ask for help
5. Be a contributor, not a developer.

That year I found the Chaincode Labs residency, and it was a great disappointment to find that I
was not up to the task. I didn’t communicate well in English, and I had neither prior
development experience that I could capitalise on nor contributions to open source projects.
Even today there are things about Bitcoin that are hard for me to comprehend. That
doesn’t stop me.
In 2020, I continued reading about Bitcoin, from a not so technical perspective, but I
was on it. I started sharpening my tools, stacking sats and putting skin in the game.

2021 came with the great news of Summer of Bitcoin. Although it was only open to Indian
students, I realised that I had to start preparing myself for the next opportunity. I
started making contributions to Learning Bitcoin from the Command Line. It helped me a lot
to learn new ways to use Git to contribute to open source projects, to
communicate my ideas in plain English, to work in a distributed team, to sharpen my tools,
start small, hold myself accountable and be a contributor, not a developer. That year I built
my own node with a Raspberry Pi 4 and started on the path to becoming a sovereign individual.

This year, 2022, I discovered a super useful project from Francisco Calderon: a P2P no-KYC
exchange bot built on top of Telegram. It was wonderful to discover that contributing to that
project was not as hard as I thought and that I could make valuable code contributions to it.
I also provide support in the bot channel, where I have offered help and asked for help.

When this edition of Summer of Bitcoin was announced, I was already waiting for it.

I expect to finish my degree at the end of this year. I’m three subjects and one thesis away
from it. I will be enrolled in three subjects in my first semester, from 14/03 to 24/06. I expect
an overall academic demand of 24 hours per week, including study time. After 24/06 our
colleges enter winter recess, during which I will have a lot of time to contribute to the project. I
know the SOB schedule runs from 9/05 to 15/08, but if I left college to participate in it I would
be a university student who is not studying. If I am selected, I will still be able to work
around 40 hours a week on the project, and I am open to putting in more effort if the work
requires it. Below I share my expected calendar to make this crystal clear. I also want to
develop this project further and use it as the basis for my degree thesis.

I hope I have explained my reasons and incentives with enough clarity. As someone
said on Twitter: I came for the money, I stayed for the revolution.
My time schedule from 09/05 to 24/06

My time schedule from 27/06 to 15/08

Why this project?


Why BDK?
Wallets are the main way I interact with the Bitcoin network. They handle my
keys, generate addresses when I need them, build and sign my transactions and also select
the UTXOs when I want to pay for something, all without me noticing.
BDK’s goal of providing a modular, cross-platform system to build wallets seems to me the
best place to start with Bitcoin development. I feel that the learning curve in BDK is
friendlier than in the Bitcoin Core wallet module. Because the project is young, it's easier
to read the repository and follow the history of the files. I guess this will not be true in the
future, which means that now is a perfect time to join the project as a contributor.
Rust is the programming language most actively used in the Bitcoin ecosystem lately, in
projects like Miniscript, rust-lightning, Stratum, LDK, RGB or The Eye of Satoshi, so I think it
could be a good way to start contributing to more projects than this one in particular.

Why coin selection?
I think that coin selection is one of the most optimizable aspects of Bitcoin wallets, not only
with respect to fee efficiency, but also to privacy and memory bloat. I find optimization problems
more compelling and approachable than cryptographic ones. The coin selection proposal of
SOB 2022 in BDK is a great way to start acquiring knowledge about wallet development,
because its objectives are quite doable and there are reference implementations to build
upon. Also, I think I have some ideas that could improve the current coin selection framework,
but I need to gain experience with the current state of the implementations to know whether they
are practicable or not.

Technical Knowledge
I am a fifth-year dual degree student at the Universidad Nacional de Córdoba, enrolled in a
5-year B.Sc. + M.S. course. Here in Argentina we don’t speak in terms of minors or majors,
but my degree can be considered a major in Computer Science. The most relevant courses
that I have taken are:

- Algorithms and Data Structures
- Operating Systems
- Programming Languages
- Networks and Distributed Systems
- Software Engineering

I have taken some courses on e-learning platforms about React, Machine Learning
(probably the most relevant here) and AWS.

Some Bitcoin-related projects I have contributed to are:

- P2P No-KYC Lightning Network Telegram Bot: I contributed new features to the bot,
like a scoring system for the traders and range orders. I also give support to new users
and report bugs (if I cannot fix them myself) in the bot channel.
- Learning Bitcoin from the Command Line: I contributed corrections to the
original English version and to the Spanish translation. In this project I
developed solid skills with the Git workflow used for open source contributions.

I have also done some personal projects:

- Multitype singly linked list: I used C to implement a type-independent singly linked
list, taking advantage of C macros. It is documented with Doxygen and has its own
makefile.
- Marketdex: A frontend for an ecommerce platform developed with React.

Currently I’m doing an internship at a local company called Bitlogic here in Córdoba,
Argentina. I’m working on the development of an e-learning platform for physical
exercise courses called g-se. I provide full stack support for my assigned team and work with
Go, React (through NextJS) and AWS.

I’ve been selected, for four consecutive years, based on my academic performance, to be
a teaching assistant. I contribute mainly to the Algorithms and Data Structures subjects.

Programming languages I have previously worked with include C, Python, Scala, Go,
Javascript (Node, React and NextJS), ARMv8 assembly and Haskell.

I have set up Bitcoin testnet/regtest nodes on my laptop and have been working with the Lightning
Network through Polar and Docker containers. I also have my own node running RaspiBlitz,
with testnet and signet enabled.

Project
Synopsis

BDK currently uses two coin selection algorithms. Bitcoin Core uses three at the same time,
choosing among their results based on a metric that tracks the excess amount spent in each transaction.
That metric doesn’t take privacy into account in its final value and, until now, no one
has proposed another metric that accounts for those concerns. The people involved in
designing these metrics are quite knowledgeable about the topic, so I think that it may be
hard for a newcomer like me to come up with a new metric or make huge changes to the
current state of the topic. So, I have structured my proposal in simple steps that would bring
new features to the current implementation in BDK, improve the modularity and extensibility
of BDK, include tools to iterate fast in the development of future coin selection algorithms
and allow me to think more about this topic to, eventually, come up with new ideas to work
with.

Project Plan

The current implementation of the coin selection algorithms in BDK is built around the
following structures:
- The trait CoinSelectionAlgorithm, which at the moment only provides the signature
of one method, coin_select.
- The struct CoinSelectionResult, which encapsulates the result of coin_select for
each algorithm.
- The struct OutputGroup, which pairs each UTXO with its effective value, i.e., the
amount of bitcoin in that UTXO minus the fees required to spend it.
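
To make the notion of effective value concrete, here is a minimal sketch. The names SketchUtxo, input_weight and effective_value, as well as the fee rate unit (satoshis per virtual byte), are assumptions made for this example and are not BDK's actual types.

```rust
/// Hypothetical, simplified UTXO used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchUtxo {
    /// Amount held by the output, in satoshis.
    pub value: u64,
    /// Weight units that spending this output adds to a transaction.
    pub input_weight: u64,
}

/// Effective value = amount of the UTXO minus the fee paid to spend it.
/// `fee_rate` is expressed in satoshis per virtual byte (assumed unit).
pub fn effective_value(utxo: &SketchUtxo, fee_rate: f64) -> i64 {
    // Four weight units make one virtual byte; round up to whole vbytes.
    let vbytes = (utxo.input_weight + 3) / 4;
    let fee = (vbytes as f64 * fee_rate).ceil() as i64;
    // The result can be negative: such a UTXO costs more to spend than it holds.
    utxo.value as i64 - fee
}
```

The signed return type is deliberate: a negative effective value marks a UTXO that is uneconomical to spend at the given fee rate.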

As I said, I propose small steps to iterate over the coin selection module of BDK and improve it
gradually. The first of these small steps is centred on the main trait of the module,
CoinSelectionAlgorithm:

Removal of the generic type parameter D:

I consider that the generic type parameter D applied to this trait belongs to a lower level and
can be easily incorporated into the algorithms with different strategies.
To begin with, the currently implemented algorithms, Branch and Bound and Largest First,
don't use the database parameter. One of the first algorithms that could make use of the
parameter is Oldest First, currently under review in #557.
LLFourn also suggested removing the type parameter in #281, replacing it with a new
&dyn Database parameter in coin_select, like this:

pub trait CoinSelectionAlgorithm: std::fmt::Debug {
    // Trait methods take no `pub` qualifier; they share the trait's visibility.
    fn coin_select(&self, database: &dyn Database, ...) -> Result<...>;
    ...
}

The solution that came to my mind was to remove the database parameter from the
coin_select method and include the generic type parameter as part of the struct that
implements the CoinSelectionAlgorithm, making the database a field in the struct
implementing the trait. In that way, the coin_select implementation of that struct can make
use of the database through its own field. E.g.:

pub trait CoinSelectionAlgorithm: std::fmt::Debug {...}

#[derive(Debug, Default, Clone, Copy)]
pub struct OldestFirstCoinSelection<D: Database> {
    database: D,
    ...
}

impl<D: Database> CoinSelectionAlgorithm for OldestFirstCoinSelection<D> {...}
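
A self-contained sketch of this idea follows. The Database trait here is a hypothetical one-method stand-in (not BDK's real trait), UTXOs are reduced to (txid, value) pairs, and MemoryDb is invented only to exercise the code.

```rust
use std::collections::HashMap;
use std::fmt::Debug;

/// Hypothetical stand-in for BDK's `Database` trait, reduced to one query.
pub trait Database: Debug {
    /// Confirmation height of the transaction that created the UTXO, if known.
    fn confirmation_height(&self, txid: &str) -> Option<u32>;
}

/// Simplified trait: `coin_select` takes no database argument at all.
pub trait CoinSelectionAlgorithm: Debug {
    fn coin_select(&self, utxos: Vec<(String, u64)>, target: u64) -> Option<Vec<(String, u64)>>;
}

/// In-memory database invented for this example.
#[derive(Debug, Default)]
pub struct MemoryDb {
    pub heights: HashMap<String, u32>,
}

impl Database for MemoryDb {
    fn confirmation_height(&self, txid: &str) -> Option<u32> {
        self.heights.get(txid).copied()
    }
}

/// The database lives in the struct, so only algorithms that need it carry it.
#[derive(Debug)]
pub struct OldestFirstCoinSelection<D: Database> {
    database: D,
}

impl<D: Database> OldestFirstCoinSelection<D> {
    pub fn new(database: D) -> Self {
        Self { database }
    }
}

impl<D: Database> CoinSelectionAlgorithm for OldestFirstCoinSelection<D> {
    fn coin_select(&self, mut utxos: Vec<(String, u64)>, target: u64) -> Option<Vec<(String, u64)>> {
        // Oldest (lowest height) first; UTXOs with unknown height sort last.
        utxos.sort_by_key(|(txid, _)| self.database.confirmation_height(txid).unwrap_or(u32::MAX));
        let mut selected = Vec::new();
        let mut total = 0u64;
        for utxo in utxos {
            if total >= target {
                break;
            }
            total += utxo.1;
            selected.push(utxo);
        }
        if total >= target { Some(selected) } else { None }
    }
}
```

With this layout, Largest First and Branch and Bound would simply be structs without a database field, and the trait itself stays free of the type parameter.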

I also thought about a third solution combining LLFourn's idea with mine:
pub trait CoinSelectionAlgorithm: std::fmt::Debug {...}

// Default cannot be derived for a struct that holds a reference.
#[derive(Debug, Clone, Copy)]
pub struct OldestFirstCoinSelection<'a> {
    // A trait object replaces the generic parameter D.
    database: &'a dyn Database,
    ...
}

Another option, which avoids hiding the Database behind a struct field, would be to pass the
database parameter as an Option<&dyn Database>. In the case of the current
implementations of Branch and Bound and Largest First, that parameter would be None, but it would
make the connection with a possible database explicit. E.g.:

pub trait CoinSelectionAlgorithm: std::fmt::Debug {
    fn coin_select(&self, database: Option<&dyn Database>, ...) -> Result<...>;
    ...
}

All of the above was mentioned in #281. Any of these changes removes the generic
type parameter without preventing the use of a database by algorithms like Oldest
First.
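
For comparison, the Option<&dyn Database> variant could be sketched as below. Again, the Database trait and the (txid, value) representation of UTXOs are simplifications assumed for this example only.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for BDK's `Database` trait.
pub trait Database {
    fn confirmation_height(&self, txid: &str) -> Option<u32>;
}

pub trait CoinSelectionAlgorithm: Debug {
    /// `database` is `None` for algorithms that never consult it.
    fn coin_select(
        &self,
        database: Option<&dyn Database>,
        utxos: Vec<(String, u64)>,
        target: u64,
    ) -> Option<Vec<(String, u64)>>;
}

#[derive(Debug, Default, Clone, Copy)]
pub struct LargestFirstCoinSelection;

impl CoinSelectionAlgorithm for LargestFirstCoinSelection {
    fn coin_select(
        &self,
        _database: Option<&dyn Database>, // unused by this algorithm
        mut utxos: Vec<(String, u64)>,
        target: u64,
    ) -> Option<Vec<(String, u64)>> {
        // Largest value first.
        utxos.sort_by(|a, b| b.1.cmp(&a.1));
        let mut selected = Vec::new();
        let mut total = 0u64;
        for utxo in utxos {
            if total >= target {
                break;
            }
            total += utxo.1;
            selected.push(utxo);
        }
        if total >= target { Some(selected) } else { None }
    }
}
```

The trade-off is visible in the call site: every caller must pass something for the database, even when the algorithm ignores it.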

Addition of simulation method:

Another great improvement for this trait would be the addition of a method to simulate the
implemented coin selection algorithms under different scenarios, like the
CoinSelectionSimulator implemented by Murch, but natively in BDK. I think this will
encourage the development of new coin selection algorithms in BDK because of the quick
testing and comparison enabled by this method. The additional inputs that it will take are:
- UTXO arrival rate.
- UTXO exit rate.
- Mean and standard deviation of the transaction amount.
- Mean and standard deviation of the fee rate, or a file with a list of fee rates.

The return value of this method should be Result<(), Error> and it should write the results of
the simulation to a CSV file for further analysis.
Anthony Chow shared some of the results of simulations of coin selection algorithms and I
took note of the data he collected. I think that the CSV file should have, at least, the
following columns:
- Hash of HEAD
- Algorithm
- Last Balance
- Mean #UTXO
- Last #UTXO
- #Deposits
- #Inputs Spent
- #Withdraws
- #Uneconomical outputs spent
- #Change Created
- #Changeless
- Min Change Value
- Max Change Value
- Mean Change Value
- Std. Dev. of Change Value
- Total Fees
- Mean Fees per Withdraw
- Cost to Empty
- Total Cost
- Min Input Size
- Max Input Size
- Mean Input Size
- Std. Dev. of Input Size
- Mean Waste
- Std. Dev. of Waste
- Max Waste
- Min Waste
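
Several of these columns are summary statistics (min, max, mean, standard deviation) of a series collected during the simulation. The following sketch shows how one such group of columns could be computed and serialised. The names Summary, summarize and csv_row are hypothetical, and the population standard deviation is an assumption, since the proposal does not say which variant should be used.

```rust
/// Minimum, maximum, mean and population standard deviation of a sample.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub min: f64,
    pub max: f64,
    pub mean: f64,
    pub std_dev: f64,
}

/// Returns `None` for an empty sample, where the statistics are undefined.
pub fn summarize(values: &[f64]) -> Option<Summary> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    Some(Summary { min, max, mean, std_dev: variance.sqrt() })
}

/// Formats the algorithm name plus the four change-value columns as one CSV line.
pub fn csv_row(algorithm: &str, change_values: &[f64]) -> String {
    match summarize(change_values) {
        Some(s) => format!("{},{},{},{},{}", algorithm, s.min, s.max, s.mean, s.std_dev),
        // No change outputs were created: leave the four columns empty.
        None => format!("{},,,,", algorithm),
    }
}
```

The same helper could be reused for the input size and waste columns, since they ask for the same four statistics.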

The simulation function would finally look like this:

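// (mean, standard deviation) of the transaction amounts
type TxDistribution = (f64, f64);
// (mean, standard deviation) of the fee rates
type FeeRateDistribution = (f64, f64);

enum FeeRateSource {
    // Sample the fee rates from a distribution
    Distribution(FeeRateDistribution),
    // Read the fee rates from a file (the path type is an assumption)
    File(std::path::PathBuf),
}

// Declared inside the CoinSelectionAlgorithm trait
fn simulate(
    &self,
    utxo_arrival_rate: f64,
    utxo_exit_rate: f64,
    tx_distribution: TxDistribution,
    fee_rate_source: FeeRateSource,
) -> Result<(), Error>;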
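
As an illustration of how a (mean, standard deviation) pair could be consumed inside such a method, the sketch below draws normally distributed amounts with the Box-Muller transform. XorShift64 is a tiny hypothetical generator used only to keep the example dependency-free; a real implementation would more likely rely on an established random number crate. The choice of a normal distribution is also an assumption, since the proposal only names the two parameters.

```rust
/// Minimal xorshift generator, a hypothetical stand-in for a real RNG.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The state must never be zero.
        Self(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform sample in the half-open interval (0, 1].
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Same alias as in the proposal: (mean, standard deviation).
pub type TxDistribution = (f64, f64);

/// Draws one normally distributed value using the Box-Muller transform.
pub fn sample_normal(rng: &mut XorShift64, (mean, std_dev): TxDistribution) -> f64 {
    let u1 = rng.next_f64(); // in (0, 1], so ln(u1) is finite
    let u2 = rng.next_f64();
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    mean + std_dev * z
}

/// Draws a payment amount in satoshis, clamping negative draws to zero.
pub fn sample_amount(rng: &mut XorShift64, distribution: TxDistribution) -> u64 {
    sample_normal(rng, distribution).max(0.0).round() as u64
}
```

Seeding the generator explicitly keeps simulation runs reproducible, which matters when comparing algorithms on the same scenario.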

Expansion of OutputGroup structure:

The other structure that I want to modify is OutputGroup. At the moment that structure only
holds the effective value of each UTXO at a given fee rate. On the master branch of
Bitcoin Core, the struct OutputGroup covers more than the effective value. The conceptual
idea behind OutputGroup is, at least in Bitcoin Core, to join all the UTXOs, up to a certain
number, that were sent to the same address. This encapsulation allows the algorithms to
work over groups as well as individual UTXOs, and adds more privacy to the wallet because it
limits the exposure of the associated address on the public blockchain.
Adding those features to the OutputGroup struct of BDK would be a step
towards better privacy. I want to emphasise that this change is not restrictive, because the
maximum number of UTXOs per OutputGroup is configured by a parameter, the one
discussed here.
The basic structure that I propose, as a base for further development, is the following:

struct OutputGroup {
    // The list of UTXOs contained in this output group.
    weighted_utxos: Vec<WeightedUtxo>,
    // Amount of fees for spending these UTXOs, calculated using a
    // certain FeeRate.
    fee: Fee,
    // The value of the UTXOs after deducting the cost of spending them
    // at the effective fee rate.
    effective_value: i64,
}

This modification implies changing the signature of the coin_select method and, hence,
the implementation of the coin selection algorithms in BDK. At first sight, the
complexity of those changes seems moderate.
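
The grouping step itself could look like the following sketch, which buckets UTXOs by address and splits each bucket into groups of at most max_per_group elements. SketchUtxo, the simplified OutputGroup and group_by_address are hypothetical names for this example. The real structure would also carry fees and effective values.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified UTXO carrying only what the grouping needs.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchUtxo {
    pub address: String,
    pub value: u64,
}

/// Simplified output group: UTXOs sent to one address plus their total value.
#[derive(Debug, Clone, PartialEq)]
pub struct OutputGroup {
    pub utxos: Vec<SketchUtxo>,
    pub total_value: u64,
}

/// Groups UTXOs by address, with at most `max_per_group` UTXOs per group.
pub fn group_by_address(utxos: Vec<SketchUtxo>, max_per_group: usize) -> Vec<OutputGroup> {
    assert!(max_per_group > 0, "a group must hold at least one UTXO");
    // BTreeMap keeps the output order deterministic (sorted by address).
    let mut by_address: BTreeMap<String, Vec<SketchUtxo>> = BTreeMap::new();
    for utxo in utxos {
        by_address.entry(utxo.address.clone()).or_default().push(utxo);
    }
    let mut groups = Vec::new();
    for (_, same_address) in by_address {
        // An address with many UTXOs is split into several groups.
        for chunk in same_address.chunks(max_per_group) {
            groups.push(OutputGroup {
                total_value: chunk.iter().map(|u| u.value).sum(),
                utxos: chunk.to_vec(),
            });
        }
    }
    groups
}
```

Setting max_per_group to 1 reproduces the current behaviour of one UTXO per group, which is why the change is not restrictive.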

Addition of waste metric:

The current structure of CoinSelectionResult is the following:

pub struct CoinSelectionResult {
    /// List of outputs selected for use as inputs
    pub selected: Vec<Utxo>,
    /// Total fee amount in satoshi
    pub fee_amount: u64,
}

There is a proposed implementation of a waste metric for this structure that could be helpful
to compare the performance of different coin selection algorithms based on that metric. Also,
as the final output of the coin selection algorithms, this struct can provide more information
about the content it stores.

So, for this struct I took the implementation of the waste metric proposed by benthecarman
and refactored its code in PR #558. To implement that PR I considered that the waste metric
of each CoinSelectionResult shouldn’t be modifiable externally, but that the calculation of this
waste metric should be usable in multiple places.
So, I modified the function implemented by Ben and made it a general one, not associated
with any struct. I made a type alias, Waste, to represent the i64 value of the waste metric with
a meaningful name. At the same time I took inspiration from the discussion about this metric
in the PR code review club to make the code more explicit about each part of the metric, and
added some tests to express the border cases that this metric could have.
To integrate this function with the CoinSelectionResult struct I added a new private field,
waste, of type Cell<Option<Waste>>, with its own getter method, get_waste, which
returns the value of the field if it is already set, or computes and stores it if it is not.
All modifications to the field can only be made through the constructor or the get_waste
method of the struct.
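
The caching behaviour of the waste field can be sketched in isolation as follows. CachedResult is a hypothetical reduction of CoinSelectionResult, and get_waste takes a closure here instead of the real parameters so that the example stays short.

```rust
use std::cell::Cell;

pub type Waste = i64;

/// Hypothetical reduction of CoinSelectionResult to the fields needed here.
#[derive(Debug)]
pub struct CachedResult {
    pub fee_amount: u64,
    /// Private: only the constructor and `get_waste` can write to it.
    waste: Cell<Option<Waste>>,
}

impl CachedResult {
    pub fn new(fee_amount: u64, waste: Option<Waste>) -> Self {
        Self { fee_amount, waste: Cell::new(waste) }
    }

    /// Returns the cached waste, computing and storing it on first use.
    /// `Cell` allows the update through `&self`, without `&mut self`.
    pub fn get_waste(&self, compute: impl FnOnce() -> Waste) -> Waste {
        match self.waste.get() {
            Some(waste) => waste,
            None => {
                let waste = compute();
                self.waste.set(Some(waste));
                waste
            }
        }
    }
}
```

Because Option<i64> is Copy, Cell is enough here and no RefCell or locking is needed. The cost is that the struct is no longer Sync.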

The final signature of the above changes would look like this:

pub type Waste = i64;

/// Result of a successful coin selection
#[derive(Debug)]
pub struct CoinSelectionResult {
    /// List of outputs selected for use as inputs
    pub selected: Vec<Utxo>,
    /// Total fee amount in satoshi
    pub fee_amount: u64,
    /// Waste value of current coin selection
    waste: Cell<Option<Waste>>,
}

impl CoinSelectionResult {
    /// Create new CoinSelectionResult
    pub fn new(selected_utxos: Vec<Utxo>, fee_amount: u64,
        selection_waste: Option<Waste>) -> Self {...}

    /// The total value of the inputs selected.
    pub fn selected_amount(&self) -> u64 {...}

    /// The total value of the inputs selected from the local wallet.
    pub fn local_selected_amount(&self) -> u64 {...}

    /// The waste of this selection, computed and cached on first use.
    pub fn get_waste(
        &self,
        selected: Vec<WeightedUtxo>,
        cost_of_change: Option<u64>,
        target: u64,
        fee_rate: FeeRate,
        long_term_fee_rate: FeeRate,
    ) -> Waste {...}
}

/// Free function, not associated with any struct.
pub fn calculate_waste(
    selected: Vec<WeightedUtxo>,
    cost_of_change: Option<u64>,
    target: u64,
    fee_rate: FeeRate,
    long_term_fee_rate: FeeRate,
) -> Result<Waste, Error> {...}

Further information can be found in PR #558.
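
For readers unfamiliar with the metric, the sketch below follows the definition used in Bitcoin Core: the timing cost of the inputs (their size times the difference between the current and the long-term fee rate) plus either the cost of change or, for changeless selections, the excess over the target. WeightedInput, the integer fee rates in satoshis per virtual byte and the Option return type are simplifications assumed for this example. It is not the code of PR #558.

```rust
pub type Waste = i64;

/// Hypothetical, simplified input: its value and its size in virtual bytes.
pub struct WeightedInput {
    pub value: u64,
    pub vbytes: u64,
}

/// Computes the waste of a selection.
/// Fee rates are whole satoshis per virtual byte (assumed unit).
/// Returns `None` when a changeless selection does not cover the target.
pub fn calculate_waste(
    selected: &[WeightedInput],
    cost_of_change: Option<u64>,
    target: u64,
    fee_rate: i64,
    long_term_fee_rate: i64,
) -> Option<Waste> {
    let vbytes: i64 = selected.iter().map(|i| i.vbytes as i64).sum();
    // Negative when fees are currently below the long-term estimate:
    // spending more inputs now is then cheaper than spending them later.
    let timing_cost = vbytes * (fee_rate - long_term_fee_rate);
    let creation_cost = match cost_of_change {
        // With change: pay for creating and later spending the change output.
        Some(cost) => cost as i64,
        // Without change: everything above the target is lost to fees.
        None => {
            let total: i64 = selected.iter().map(|i| i.value as i64).sum();
            let effective_total = total - vbytes * fee_rate;
            let excess = effective_total - target as i64;
            if excess < 0 {
                return None;
            }
            excess
        }
    };
    Some(timing_cost + creation_cost)
}
```

A lower value is better, and the value can be negative, which is why Waste is a signed type.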

New coin selection algorithms for BDK:

A new coin selection algorithm, OldestFirst, is in the process of being included in BDK in PR #557.
Its inclusion was proposed by Murch in issue #120, along with random selection, to set
a baseline performance against which to compare future algorithms.

At the moment, there is a random selection algorithm, Single Random Draw, implemented
as a fallback for the BranchAndBound coin selection algorithm. I propose to
move it out of the BranchAndBound algorithm and make it an implementation of
CoinSelectionAlgorithm. This change will allow the simulate method to be run on
this algorithm to get the data needed to establish a baseline for the performance of coin
selection algorithms.
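
A standalone Single Random Draw could be sketched as follows: shuffle the candidates, then take them in order until the target is covered. UTXOs are reduced to their effective values, and XorShift64 is a tiny hypothetical generator used only to keep the example free of dependencies. This is not the code currently embedded in BranchAndBound.

```rust
/// Minimal xorshift generator, a hypothetical stand-in for a real RNG.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The state must never be zero.
        Self(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks UTXOs (given by effective value) in random order until `target` is met.
/// Returns `None` when all the candidates together cannot cover the target.
pub fn single_random_draw(
    mut effective_values: Vec<u64>,
    target: u64,
    rng: &mut XorShift64,
) -> Option<Vec<u64>> {
    // Fisher-Yates shuffle. The modulo introduces a negligible bias that is
    // acceptable for a sketch.
    for i in (1..effective_values.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        effective_values.swap(i, j);
    }
    let mut selected = Vec::new();
    let mut total = 0u64;
    for value in effective_values {
        if total >= target {
            break;
        }
        total += value;
        selected.push(value);
    }
    if total >= target { Some(selected) } else { None }
}
```

Passing the generator in as a parameter keeps the algorithm deterministic under a fixed seed, which is convenient both for tests and for simulations.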

Optional features
There are other algorithms proposed in different Bitcoin or altcoin projects whose
implementation I don’t think is feasible within the period of the project, but which may be worth
mentioning as possible future work:

- I find the idea of self organisation in bitcoin wallets attractive. The basic idea is to
produce the outputs that you use the most by generating change UTXOs
of an amount similar to what you are sending. The balance between the increase in
fees because of the extra output and the possible cost reduction caused by the
increase in exact matches is a topic that may require further study. This subject was
mentioned by Jameson Lopp in his blog article The Challenges of Optimising
Unspent Output Selection (Privacy, point D) and by Edsko de Vries in the article Self
Organisation in Coin Selection, which describes the coin selection algorithm
implemented in Cardano, Random Improve, which could serve as a base to improve
the idea. A tweak of this idea would be to model the transaction amount distribution
of each wallet and create change outputs based on that distribution.
- There is a proposal to explore the use of genetic algorithms for coin selection
in Bitcoin Core.
- A paper was published about a new variation of the Knapsack algorithm for coin
selection. I read it quickly and I think it didn’t add anything new to
the topic, but it may be worth reviewing again.
- Coin selection leveraged with L2 solutions, as mentioned in #332.
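
To give a flavour of the self organisation idea, here is a rough sketch of the improvement phase of Random Improve as I understand it from the article: after the payment is covered, extra randomly ordered inputs are added while they bring the selected total closer to twice the payment and keep it below three times the payment, so that the change ends up close in size to the payment itself. The function name improve and the representation of inputs as plain values are assumptions for this example, and the limit on the number of inputs is left out.

```rust
/// Improvement phase, given the total already selected to cover `payment`.
/// `candidates` must already be in random order; the function returns the
/// extra input values it decides to add.
pub fn improve(mut selected_total: u64, candidates: &[u64], payment: u64) -> Vec<u64> {
    // Ideal total: twice the payment, so the change is about the payment size.
    let ideal = payment.saturating_mul(2);
    // Upper bound: never select more than three times the payment.
    let upper = payment.saturating_mul(3);
    let mut added = Vec::new();
    for &candidate in candidates {
        let new_total = selected_total.saturating_add(candidate);
        let closer = new_total.abs_diff(ideal) < selected_total.abs_diff(ideal);
        if closer && new_total <= upper {
            selected_total = new_total;
            added.push(candidate);
        } else {
            // Stop at the first candidate that does not improve the selection.
            break;
        }
    }
    added
}
```

Whether the extra inputs pay for themselves depends on the fee rate, which is exactly the balance that would require further study.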

Addition of consolidation and diversification primitives:

Consolidation and diversification are related to the self organisation of the wallet UTXO set. There
are some time windows of low fees when these operations are economically feasible and
allow the improvement of the wallet UTXO set in terms of privacy and spend availability.
Some primitive functions to perform consolidation on wallet UTXO sets could be used in
some variation of Random Improve to join multiple small UTXOs (or one OutputGroup with
multiple UTXOs) into one or more UTXOs whose summed amount falls inside the range
of the outputs most commonly produced by that particular wallet. The same applies to
diversification with big UTXOs. The idea would be to build a distribution of UTXOs with the
average output amount of the wallet as the mean.

Improvement of documentation:

Finally, I want to produce self-explanatory code that is well documented. So, to develop this
idea I would start by improving the documentation of the currently implemented algorithms, structs and
trait. Providing one usage example for each algorithm should be enough,
and adding some information about the tradeoffs of each one would help too. Extracting
the errors specific to the coin selection module into their own enum might also be useful.
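
A dedicated error enum could look like the sketch below. The name CoinSelectionError, the chosen variants and the messages are illustrative assumptions and not a final design.

```rust
use std::fmt;

/// Hypothetical error type scoped to the coin selection module.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoinSelectionError {
    /// The candidate UTXOs cannot cover the requested amount plus fees.
    InsufficientFunds { needed: u64, available: u64 },
    /// Branch and Bound found no changeless solution.
    BnBNoExactMatch,
    /// Branch and Bound gave up after reaching its iteration limit.
    BnBTotalTriesExceeded,
}

impl fmt::Display for CoinSelectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InsufficientFunds { needed, available } => write!(
                f,
                "insufficient funds: {} sat available of {} sat needed",
                available, needed
            ),
            Self::BnBNoExactMatch => write!(f, "branch and bound found no exact match"),
            Self::BnBTotalTriesExceeded => write!(f, "branch and bound exceeded its maximum tries"),
        }
    }
}

impl std::error::Error for CoinSelectionError {}
```

Callers could then match on coin selection failures without having to consider unrelated wallet errors.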

Project Timeline

I’m used to the scrum methodology. I know it may not apply strictly to this kind of project, but
it can be a good inspiration to start drafting the project schedule.
The timeline provided on the Summer of Bitcoin website sets a 12-week schedule to finish
the project. Using a scrum mindset, that would be 6 sprints of two weeks each. The roadmap
should be designed prioritising the most complex and blocking features first, trying to
deliver one complete feature in each sprint.

Considering the complexity of each feature, the following scores have been given, using the
Fibonacci series, to each of the major structures subject to changes:

1. CoinSelectionResult +2
2. CoinSelectionAlgorithm database +3
3. Output Groups +5
4. CoinSelectionAlgorithm simulate +5
5. Single Random Draw algorithm +3
6. Random Improve algorithm +5
7. Documentation +2

The overall sum yields 25 complexity points.

To develop these features I have considered the following timeline:

- First sprint - 23/05 - 05/06:
As I have already started working on the waste metric in #558, I think I should work
on the final code changes during the first week of the sprint, and dedicate the second one
to more tests, if needed, and the documentation.

- Second sprint - 06/06 - 19/06:
The removal of the database parameter should happen in the first week, including the
modification of all the dependent structures, manual testing of the change and the
update of the automated tests and documentation.
The second week should be focused on OutputGroups and the addition of this new
parameter to LargestFirst and BranchAndBound. This change should be double
checked across all of BDK. The update of tests and documentation is also considered
part of this week's work.

- Third sprint - 20/06 - 03/07:
This sprint may be considered critical because its first week is when work on the
simulate function starts. The work on this feature may take both weeks. If it
does not, documentation and tests can be added in the second week of
this sprint.

- Fourth sprint - 04/07 - 17/07:
This sprint will be dedicated to the development of the Random Improve algorithm
used in Cardano. This work may be very similar to SRD, but focused on the change
outputs of the generated transactions.
As with the other sprints, the second week will be oriented to the addition of tests,
documentation and usage examples for this algorithm.

- Fifth sprint - 18/07 - 24/07:
In this sprint the Single Random Draw algorithm will be extracted from the Branch
And Bound algorithm. The major concern about this work is, again, the
dependencies on other structures in BDK. The work required by this refactoring may
take the whole first week of this sprint. Again, tests, corrections and documentation
will be scheduled for the second week of this sprint.

- Sixth sprint - 25/07 - 07/08:
I expect to dedicate the last sprint to the improvement of the work developed in all
the previous sprints. The first week will be focused on providing some results
obtained with the simulate method in a README file, comparing the different
algorithms through the data obtained.
The second week will be dedicated to the improvement of the coin selection error
module, adding descriptive names for each error and modularizing where necessary.
Also, a wrapper method for coin selection optimised by the waste metric, involving all the
algorithms implemented in BDK, could be added to show the benefits of the metric.
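
Such a wrapper could be as small as the sketch below: run every algorithm and keep the result with the lowest waste. Candidate, Selector, the two toy selectors and the simplified waste (only the excess over the target, ignoring fee rates and change) are assumptions made to keep the example short.

```rust
pub type Waste = i64;

/// Hypothetical result of one algorithm: what it selected and its waste.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub algorithm: &'static str,
    pub selected: Vec<u64>,
    pub waste: Waste,
}

/// A selector receives the UTXO values and the target amount.
pub type Selector = fn(&[u64], u64) -> Option<Candidate>;

/// Takes values in the given order until the target is covered.
/// The toy waste is just the excess over the target.
fn greedy(algorithm: &'static str, ordered: Vec<u64>, target: u64) -> Option<Candidate> {
    let mut selected = Vec::new();
    let mut total = 0u64;
    for value in ordered {
        if total >= target {
            break;
        }
        total += value;
        selected.push(value);
    }
    if total < target {
        return None;
    }
    Some(Candidate { algorithm, selected, waste: (total - target) as Waste })
}

pub fn largest_first(utxos: &[u64], target: u64) -> Option<Candidate> {
    let mut ordered = utxos.to_vec();
    ordered.sort_unstable_by(|a, b| b.cmp(a));
    greedy("largest_first", ordered, target)
}

pub fn smallest_first(utxos: &[u64], target: u64) -> Option<Candidate> {
    let mut ordered = utxos.to_vec();
    ordered.sort_unstable();
    greedy("smallest_first", ordered, target)
}

/// Runs every selector and keeps the successful result with the lowest waste.
pub fn select_by_waste(utxos: &[u64], target: u64, selectors: &[Selector]) -> Option<Candidate> {
    selectors
        .iter()
        .filter_map(|select| select(utxos, target))
        .min_by_key(|candidate| candidate.waste)
}
```

In BDK the selectors would be the CoinSelectionAlgorithm implementations and the waste would come from get_waste, but the selection logic would stay this simple.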
