Dev Notes
This is an unsorted collection of notes on several topics: CS, programming, people management, company dynamics, and software planning.
Important: these notes may contain information that is inaccurate or untrue.
Note to self
When linking a video: always include the title of the video and the author (if not already mentioned).
1 Software Planning
.001 Casey Muratori on planning software
How most of us view software architecture:
```
o---------------------o     o-------------------o
| software architect  | --> | programmer builds |
| creates a blueprint |     | the program       |
o---------------------o     o-------------------o
```
But software does not work that way. If you architect the software in full detail up front, you have basically already built it, just in a different language (usually UML diagrams).
A better way to view it: urban planning. You lay out a basic plan of where things should be, but leave the programmer the freedom to decide exactly how to build them.
```
o--------------------------o     o------------------o     o----------------------o
| urban planner            | --> | architect        | --> | builder / contractor |
| (aka software designer)  |     | (aka programmer) |     | (aka compiler)       |
o--------------------------o     o------------------o     o----------------------o
```
I'm now replacing the term "Software Architect" with "Software Planner" or "Software Designer".
- Title: Handmade Hero Day 026 - Introduction to Game Architecture
- Title: Handmade Hero Day 027 - Exploration-based Architecture
.002 Eskil Steenberg on designing for large projects
Note: E. Steenberg talks about very big projects, so not everything may apply to every code base.
Key points that are important (to me):
- every module you don't own/control: wrap it inside a custom platform layer (see the sketch at the end of this section)
- split the code into modules, each with exactly one responsible developer
- a developer can own multiple modules, but a module cannot have multiple developers
- keep your APIs consistent
- if an API needs modification, consider extending it with a new module instead
E. Steenberg talks about "plugins" where I referred to "modules". In his case, he
meant actual *.dlls.
In my case, I would keep the modularity inside the code ... but I have to think
more about it.
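A minimal sketch of such a platform layer in Python, assuming the third-party dependency is the `requests` HTTP library (my example; Steenberg's plugins are actual DLLs):

```python
# platform_http.py - the only file allowed to import the third-party library.
# If we ever swap requests for something else, only this module changes.
import requests

def http_get(url: str, timeout_s: float = 5.0) -> bytes:
    """Fetch a URL and return the raw body; raise on HTTP errors."""
    response = requests.get(url, timeout=timeout_s)
    response.raise_for_status()
    return response.content
```

The rest of the code base imports `platform_http`, never `requests`, so the dependency stays behind one API we control.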
2 Security in Software
.001 Felix "Fefe" von Leitner's Talks
.01 Trusted Computing Base (TCB)
tbd
.02 OS Privileges
tbd
.03 Security in general
tbd
.04 Writing Secure Software
tbd
.002 Security Engineering
tbd
- Book: Security Engineering: A Guide to Building Dependable Distributed Systems
- Author: Ross Anderson
- ISBN-13: 978-1119642787
.003 Structured and immutable Logging
In some systems it is important that the logs are structured, so we're able to correlate events.
Also, some logs, like audit logs, must be protected against tampering.
Typical techniques: hash chains (Merkle trees), append-only permissions, and timestamps from an NTP server.
The following points are from a Mastodon thread by Kris.
In case those posts disappear at some point, I'll summarize them here:
- structured logging is important - i.e. date, host, pid, program, error, ...
- NTP synced timestamps
- some logs are more important, like accounting or audit logs
- important logs need fields like: subject="Bob", role="admin", method="sudo"
- if violations are detected inside those logs, (disciplinary) action must be taken
- important logs must be append-only and protected against tampering
- a Merkle tree / hash chain is common for important logs
- journald provides such utilities
- the program shouldn't have direct control over the logging; it should use an API for that
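A minimal sketch of a hash-chained audit log in Python (the field names follow the thread above; the function itself is my assumption, not something from the thread):

```python
import hashlib
import json
import time

def append_entry(path: str, prev_hash: str, **fields) -> str:
    """Append one structured entry and return its hash, which must be
    passed as prev_hash to the next call - that is the chain."""
    entry = {"ts": time.time(), "prev": prev_hash, **fields}
    line = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256(line.encode()).hexdigest()
    with open(path, "a") as f:  # also enforce append-only perms at the OS level
        f.write(line + "\n")
    return entry_hash

# usage: any later edit to a stored line breaks every hash after it
h = append_entry("audit.log", "0" * 64, subject="Bob", role="admin", method="sudo")
h = append_entry("audit.log", h, subject="Bob", role="admin", method="passwd")
```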
3 Network
.001 Measuring Network Latency
Key points:
- most users will experience the 99th percentile at least once (see the formula below)
- we want 99.9... percentiles
- most measuring tools hide the important data
- when load testing, test latency at every load step, not only at max load
- find out when the tipping point occurs
- define your goals under specific loads
- widespread problem: coordinated omission
Coordinated Omission
describes scenarios where the important data is masked.
For example, you're measuring latency within your code, but some runtime-induced lag
hits (GC, buffer flush, thread block, context switch). Then your measurement is incorrect,
since it misses or skews exactly the data points you care about.
Or you send one request per $time_interval, but a request hangs for a bit, so your
interval is skewed.
Which is ironic, because maybe your server caused the hanging, but you never notice it, since
your tool just backs off and proceeds with the next request.
Formula for the chance (in percent) that a user making n requests experiences the 99th-percentile latency at least once:

```python
def f(n):
    # each request has a 99% chance of staying below the 99th percentile,
    # so all n requests stay below it with probability 0.99 ** n
    return (1 - (0.99 ** n)) * 100
```
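For example, f(100) ≈ 63.4: a user whose session makes 100 requests has roughly a two-in-three chance of hitting the 99th-percentile latency at least once.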
Service Time vs Response Time
Service Time -> the server doing its work (calculating things, transforming data, talking to the DB)
Response Time -> everything the client waits for: queueing/wait time plus the service time
In some scenarios only the service time gets measured, and it appears constant,
while the response time degrades and nobody sees it. So also account for the response time in your
measurements.
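E.g. if a request sits in a queue for 95 ms and the server then processes it in 5 ms, the service time is 5 ms but the response time the client sees is 100 ms.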
CTRL+Z Test
Pressing ctrl-z on a long-running task will suspend it. You can resume the task
via fg <job nr>, and via jobs you can see which jobs are suspended and which <job nr>
they have.
Suspending your server under load can reveal the presence of coordinated omission.
Say we issue a request every 10 ms for 100 s, giving us 10_000 data points. If we then suspend the server for another 100 s, the monitoring system will often store only a single data point for that period, because the measuring tool stops sending while the server does not respond.
This skews the percentiles and hides the really bad stuff. The correct way is to keep measuring, even while the server is not responding. Then the data reveals the problem.
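A sketch of one way to avoid coordinated omission (my own, not from the talk), assuming a blocking send() callable for the system under test: latency is measured against the intended send time of a fixed schedule, so a stalled server is charged to the data instead of silently shifting the schedule.

```python
import time

def measure(send, interval_s: float, n: int) -> list[float]:
    """Issue n requests on a fixed schedule; return per-request latencies."""
    latencies = []
    start = time.monotonic()
    for i in range(n):
        intended = start + i * interval_s  # fixed schedule, never shifted
        now = time.monotonic()
        if now < intended:
            time.sleep(intended - now)
        send()  # blocking call to the system under test
        # measuring from the intended start means backlog caused by a
        # stalled server counts as latency instead of being omitted
        latencies.append(time.monotonic() - intended)
    return latencies
```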
- Title: "How I Learned to Stop Worrying & Love Misery" by Gil Tene
- Title: "How NOT to Measure Latency" by Gil Tene
- Bench Tool without coordinated omission
- Repo: HdrHistogram
.002 About Latency #2
Quoting important sections of the blog here.
- Blog: Everything You Know About Latency Is Wrong
- Author: Tyler Treat
Latency is defined as the time it took one operation to happen. This means every operation has its own latency—with one million operations there are one million latencies.
As a result, latency cannot be measured as work units / time. What we’re interested in is how latency behaves. To do this meaningfully, we must describe the complete distribution of latencies.
Latency almost never follows a normal, Gaussian, or Poisson distribution, so looking at averages, medians, and even standard deviations is useless.
Remember that latency is not service time. If you plot your data with coordinated omission, there’s often a quick, high rise in the curve.
Run a “CTRL+Z” test to see if you have this problem. A non-omitted test has a much smoother curve. Very few tools actually correct for coordinated omission.
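A small illustration (mine, not from the blog) of describing the complete distribution: summarize synthetic latencies by their high percentiles instead of the mean, here with numpy.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic latencies in ms: mostly fast, plus a hand-made heavy tail of stalls
latencies = np.concatenate([
    rng.lognormal(mean=1.0, sigma=0.3, size=99_000),
    rng.lognormal(mean=4.0, sigma=0.5, size=1_000),
])

print(f"mean  : {latencies.mean():6.1f} ms")  # looks harmless
for p in (50, 90, 99, 99.9, 99.99):
    print(f"p{p:<5}: {np.percentile(latencies, p):6.1f} ms")  # exposes the tail
```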
4 Programming
5 Management / Company
.001 Managing People: What works, what doesn't
tbd
- Book: Peopleware: Productive Projects and Teams
- Author: Tom DeMarco, Tim Lister
- ISBN-13: 978-0321934116
.002 Why people don't want to work at your company
tbd
- Title: Es Es Ka Em: Warum gute Leute nicht bei euch arbeiten wollen ("Why good people don't want to work at your company")
- Author: Florian Haas