Out-of-distribution Bioattacks

The main goal of my work these days is trying to reduce the chances of individuals or small groups causing large-scale harm through engineered pandemics, potentially including civilizational collapse or extinction. One question in figuring out whether this is worth working on, or funding, is: how large is the risk?

One estimation approach would be to look at historical attacks, but while they've been terrible they haven't actually killed very many people. The deadliest was the September 11 attacks, at ~3k deaths. This is much smaller in scale than the most severe instances of other disasters, like dam failure (25k-250k dead after 1975's Typhoon Nina) or pandemics (75M-200M dead in the Black Death). If you tighten your reference class even further to include only historical biological attacks by individuals or small groups, the deadliest killed just five people, in the 2001 anthrax attacks.

Put that way, I'm making a pretty strong claim: while the deadliest small-group bio attack ever only killed five people, we're on track for a future where one could kill everyone. Why do I think the future might be so unlike the past?

Short version: I expect a technological change which expands which actors would try to cause harm.

Benchmarking Bowtie2 Threading

I've been using Bowtie2 to align reads to genomes, and one of its many settings is the number of threads. People sometimes advise using about as many threads as your machine has cores, but if I'm running on a big machine are there diminishing returns, or a point at which more threads are counterproductive? Am I better off running more samples in parallel with fewer threads each, or fewer samples with more threads each?

I decided to run a few tests on an AWS EC2 c6a.8xlarge, a 32-core AMD machine. The test consisted of running one 7.2Gb 48M read-pair sample (SRR23998356) from Crits-Christoph et al. 2021 through Bowtie2 2.5.2 with the "Human / CHM13plusY" database from Langmead's Index Zone. The files were streamed from AWS S3 and decompressed in a separate process. See the script for my exact test harness and configuration.
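This is not the script linked above. As a rough illustration of what a thread-count sweep could look like, here is a minimal Python sketch. It assumes the reads are already local and uncompressed (unlike the streamed setup described above), and the function names, index prefix, and file paths are all hypothetical. The only Bowtie2 options it relies on are `--threads`, `-x`, `-1`, `-2`, and `-S`.

```python
import subprocess
import time


def bowtie2_command(threads, index_prefix, reads_1, reads_2):
    """Build a paired-end Bowtie2 command line for a given thread count."""
    return [
        "bowtie2",
        "--threads", str(threads),
        "-x", index_prefix,  # hypothetical path prefix of the index files
        "-1", reads_1,       # hypothetical forward-read file
        "-2", reads_2,       # hypothetical reverse-read file
        "-S", "/dev/null",   # discard alignments; only the timing matters
    ]


def time_command(cmd):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.monotonic() - start


def sweep(thread_counts, index_prefix, reads_1, reads_2, runner=time_command):
    """Map each thread count to the time taken by one run at that setting."""
    return {
        n: runner(bowtie2_command(n, index_prefix, reads_1, reads_2))
        for n in thread_counts
    }


# Example call (requires bowtie2 and real input files):
# timings = sweep([1, 2, 4, 8, 16, 32], "chm13", "reads_1.fq", "reads_2.fq")
```

Passing the runner in as a parameter keeps the sweep logic separate from process execution, so it can be checked without Bowtie2 installed.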

What I found (sheet) was that allocating additional threads initially helped a lot, but the benefit plateaued after ~8 threads, and beyond ~10 additional threads started to hurt very slightly:

Losing Metaphors: Zip and Paste

In Python (and several other languages) if I have two lists and want to process corresponding elements together I can use zip:

>>> for number, letter in zip(
...         [1, 2, 3, 4], ["a", "b", "c", "d"]):
...     print(number, letter)
1 a
2 b
3 c
4 d

The metaphor is a zipper, taking the two sides and merging them together. It's not perfect, since a zipper interleaves instead of matching pairs, but it's pretty good.
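To make the pairing-versus-interleaving distinction concrete, here is a small sketch. The variable names are mine, and `itertools.chain.from_iterable` is just one way to flatten the pairs into what a physical zipper would produce.

```python
from itertools import chain

numbers = [1, 2, 3, 4]
letters = ["a", "b", "c", "d"]

# What zip does: match corresponding elements into pairs.
paired = list(zip(numbers, letters))

# What a physical zipper does: alternate teeth from each side.
interleaved = list(chain.from_iterable(zip(numbers, letters)))

print(paired)       # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
print(interleaved)  # [1, 'a', 2, 'b', 3, 'c', 4, 'd']
```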

In Unix, there's a command-line tool, paste, that does the same thing:

Process Substitution Without Shell?

While I still write a decent amount of shell, I generally try to avoid it. It's hard for others to read, has a lot of sharp edges, tends to swallow errors, and handles unusual situations poorly. But one thing that keeps me coming back to it is how easily I can set up trees of processes.

Say I have a program that reads two files together in a single pass [1] and writes something out. The inputs I have are compressed, so I'll need to decompress them, and the output needs to be compressed before I write it out to storage. I could do:

Accounting for Foregone Pay

While the effective altruism movement started out with a strong focus on donations, over time it has shifted more towards careers. If you're trying to understand how levels of commitment have changed over time, or you're just trying to get a ballpark estimate of the financial opportunity cost of choosing a lower-paying career, this can be quite tricky.

For someone earning to give this is relatively straightforward: AGB recently wrote a thoughtful post looking back at ten years of earning to give, and a statistic he gives is that he and his wife have donated an average of ~£150k per year over ten years, on a combined income averaging ~£320k per year. Clear cut! [1]

The case of someone choosing a lower-paying higher-impact career seems initially relatively simple: perhaps they're currently paid $100k, and if we look at their highest paying opportunity maybe they would be paid $300k, so we could say they're effectively sacrificing 2/3 or $200k. But this misses several factors that point in different directions:

Detecting What's Been Seen

Sometimes it makes sense for sites to treat things differently based on whether the user has seen them. For example, I like sites that highlight new comments (ex: EA Forum and LessWrong) and I'd like them even better if comments didn't lose their "highlighted" status in cases where I hadn't scrolled them into view. In writing my Mastodon client, Shrubgrazer, I wanted a version of this so it would show me posts that I hadn't seen before. The implementation is a bit fussy, so I wanted to write up the approach I took.

The code is on GitHub, and it counts posts as viewed if both the top and the bottom of the post have been on screen for at least half a second. Specifically, whenever the top or bottom of a post enters the viewport it sets a 500ms timer, and if that edge is still within the viewport when the timer fires, it keeps a record client side. If this now means that both the top and bottom have met the criteria, it sends a beacon back so the server can track the entry as viewed.
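The real client is browser JavaScript, but the rule itself is easy to model on its own. Here is a minimal Python sketch of the bookkeeping, with timers simulated by explicit calls. The class, method names, and post IDs are hypothetical and are not taken from Shrubgrazer.

```python
DWELL_MS = 500  # from the description above: half a second on screen


class SeenTracker:
    """Models the 'top and bottom both stayed on screen' rule."""

    def __init__(self, send_beacon):
        self.send_beacon = send_beacon  # called once per viewed post
        self.visible = set()            # (post_id, edge) currently in viewport
        self.dwelled = {}               # post_id -> edges that met the criteria
        self.reported = set()           # posts already reported as viewed

    def edge_entered(self, post_id, edge):
        # A real client would also start a DWELL_MS timer here.
        self.visible.add((post_id, edge))

    def edge_left(self, post_id, edge):
        self.visible.discard((post_id, edge))

    def timer_fired(self, post_id, edge):
        # Only record the edge if it is still on screen when the timer fires.
        if (post_id, edge) not in self.visible:
            return
        edges = self.dwelled.setdefault(post_id, set())
        edges.add(edge)
        if edges == {"top", "bottom"} and post_id not in self.reported:
            self.reported.add(post_id)
            self.send_beacon(post_id)


sent = []
tracker = SeenTracker(sent.append)
tracker.edge_entered("post-1", "top")
tracker.timer_fired("post-1", "top")     # top stayed visible for 500ms
tracker.edge_left("post-1", "top")       # user scrolls down
tracker.edge_entered("post-1", "bottom")
tracker.timer_fired("post-1", "bottom")  # bottom stayed visible too
print(sent)  # ['post-1']
```

Recording the two edges independently means a post taller than the viewport still counts as viewed once the reader has scrolled past both ends.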

Go back 4-7 years and this would have required a scroll listener, using a ton of CPU, but modern browsers now support the IntersectionObserver API. This lets us get callbacks whenever an entry enters or leaves the viewport.

I start by creating an IntersectionObserver:
