I accidentally found a security issue while benchmarking postgres changes.

If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.

https://www.openwall.com/lists/oss-security/2024/03/29/4

I was doing some micro-benchmarking at the time, and needed to quiesce the system to reduce noise. Saw that sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, which showed lots of CPU time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres a few weeks earlier, after package updates.

Really required a lot of coincidences.

Sure glad you did. This would have been unimaginably worse if it'd gone undetected for another six months.
I feel both confident and also kind of queasy when saying this: it seems extremely likely that this is not the first time something like this has happened, it's just the first time we have been lucky enough to notice.

That is true.

Binary artifacts have no business existing in Free Software (nor near-binary artifacts, considering how unauditable pre-generated configure scripts end up being). The way it was compromised in this case has almost certainly happened before, and reminds me of the SourceForge malware debacle (so arguably that's another famous example of it happening before).

I'm not sure if many other projects do as Guix does and record the checksum of the whole repository so as to ensure reproducibility purely from source.
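For anyone unfamiliar with the approach, here is a minimal sketch of the idea in Python. It is not Guix's actual algorithm (Guix has its own serialization and hash scheme); it only shows how pinning a whole source tree to a single digest rejects doctored release artifacts.

```python
# Minimal sketch of pinning a source tree by content hash. This is NOT
# Guix's actual algorithm; it only illustrates the principle.
import hashlib
import os

def tree_hash(root: str) -> str:
    """Hash every file's relative path and contents in a stable order."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # make traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            h.update(rel.encode())           # bind each file's bytes to its path
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

# A build recipe records this digest once; any later checkout or tarball
# that hashes differently (e.g. a doctored release artifact) is rejected.
print(tree_hash("."))
```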

In general this is reasonable, but there are some clear exceptions for test vectors in cryptographic and compression libraries (which this was).

In this case the actual malicious vector was near-binary code injected via the practically-binary, unaudited autotools vomit (always run autoreconf yourself), which was then bundled into the actual binary artifact: the compromised tarballs.

None should have ever been part of the project.

As for the test files, I still think that having a hex dump with comments explaining what flaws particular parts test would be desirable in a lot of cases.
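As a sketch of what that could look like (the annotated format and decoder here are hypothetical; the first twelve bytes are the real xz stream header, while the final "flawed" byte is made up):

```python
# Hypothetical format: test vectors stored as commented hex, rebuilt into
# bytes at test time so reviewers can audit every byte.
ANNOTATED_VECTOR = """
fd 37 7a 58 5a 00        # .xz magic bytes
00 04                    # stream flags: CRC64 check type
e6 d6 b4 46              # CRC32 of the stream flags
ff                       # made-up malformed byte: the flaw under test
"""

def decode_annotated_hex(text: str) -> bytes:
    out = bytearray()
    for line in text.splitlines():
        data = line.split("#", 1)[0]         # drop the human-readable comment
        out.extend(int(tok, 16) for tok in data.split())
    return bytes(out)

vector = decode_annotated_hex(ANNOTATED_VECTOR)
assert vector[:6] == b"\xfd7zXZ\x00"         # nothing opaque ever hits the repo
```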

@lispi314 @glyph The actual backdoor code was part of binary test vectors for the LZMA algorithm (and is an actual amd64 binary object file). Autotools goop was just how it was triggered/injected into the build system.

In this case the build system side was rather noisy, but you could make it a lot more subtle. Something like a subtly incorrect glob could be engineered to "accidentally" match a test data file that then gets linked into the build process in some way.
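To make that concrete, here is an illustrative sketch in Python; the file names are invented and the real attack did not work this way, but it shows how one loosened pattern quietly widens what reaches the link step.

```python
# Hypothetical illustration of an over-broad glob; file names are invented.
import glob
import pathlib

pathlib.Path("src").mkdir(exist_ok=True)
pathlib.Path("tests/files").mkdir(parents=True, exist_ok=True)
for p in ("src/crc32_fast.o", "src/crc64_fast.o",
          "tests/files/fixture_amd64.o"):    # the planted "test data" object
    pathlib.Path(p).touch()

# Intended pattern: only objects built from src/ get linked.
print(sorted(glob.glob("src/*.o")))
# ['src/crc32_fast.o', 'src/crc64_fast.o']

# "Accidentally" recursive pattern after an innocent-looking refactor:
# the planted test object now also lands on the link line.
print(sorted(glob.glob("**/*.o", recursive=True)))
# ['src/crc32_fast.o', 'src/crc64_fast.o', 'tests/files/fixture_amd64.o']
```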


Right, I suppose I did omit to mention that one.

Though that suggests my initial statement of "no binaries, full stop" is the right idea.

It's my understanding that at that point, though, you're trusting whoever wrote the assembly/byte/machine code for the CPU.

The problem isn't necessarily the binaries - it may have been harder to do, but they could've hidden it in a commonly distributed compiler à la Thompson's "Reflections on Trusting Trust" ( https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf ) - it's a matter of trust in the person doing the development.

Part of the problem is that the developer's submissions were pushed with less review than necessary.

Indeed, unfortunately until we have Libre FPGAs and the ability to verify their design and configuration, it is necessary to trust the hardware.

In my post here: https://udongein.xyz/notice/AgMU728f3awnLq7eqW

One of the articles (I'm not sure which again) explicitly links to the Trusting Trust problem & documentation.

It is possible, if tedious, to bootstrap a Forth implementation from a minimal amount of machine code entered in a hex editor or other basic input method, retaining the ability to audit the initial binary bootstrap seed, and then bootstrap everything else from source.

Of course in this case the Guix project contributors decided that a small Scheme interpreter (which iirc isn't standards-compliant) would do.
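As a toy illustration of the seed idea (this is not Guix's actual stage0/Mes bootstrap chain, just the shape of it, sketched in Python): the entire trusted base is one small, readable interpreter, and everything it executes is source you can inspect.

```python
# Toy bootstrap seed: the whole trusted base is this one small function.
def run(program, stack=None):
    """A tiny stack machine, small enough to audit in full."""
    stack = stack if stack is not None else []
    for op in program:
        if isinstance(op, int):
            stack.append(op)                 # literal: push onto the stack
        elif op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == ".":
            print(stack.pop())               # pop and print
    return stack

# Everything above is the auditable seed; everything below is "source"
# the seed executes, so nothing opaque ever enters the chain.
run([6, "dup", "*", 7, "+", "."])            # prints 43
```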

The commits weren't adequately reviewed, yes.

That said, if binary artifacts like tarballs weren't accepted, and autotools vomit (practically binary, given how difficult it is to audit) were deemed equally unacceptable (better to run autoreconf on whatever dev machine is used to build the project, as necessary), it would be far more difficult to replicate this incident.

@lispi314 @glyph @AT1ST You should look up Precursor:

https://www.bunniestudios.com/blog/?p=5921

TL;DR there is a very legitimate argument that backdooring an FPGA generically is impractical, so a libre FPGA configuration (and ideally a libre FPGA toolchain) plus inspectable hardware generally suffices for a high level of trust.

However, this will never be practical for high performance general purpose compute, only for very limited applications like this. I don't think anyone has any good ideas on how to solve the trustability problem for things like desktop PC or smartphone class hardware. It might never be possible.

@lispi314 Do I have to bootstrap the whole chain of programs, starting from GCC?
@AndresFreundTec @glyph @marcan
And a lot of persistence! Reminds me of one of the classics of the industry, Cliff Stoll's The Cuckoo's Egg - "Stoll traced the error to an unauthorized user who had apparently used nine seconds of computer time and not paid for it" - leading to a German hacker selling content to the KGB, 38 years ago. It is impressive (but uncommon) to see someone paying that level of attention to anomalies these days, with how thick tech stacks have gotten...
we were very, very lucky to have you on the case!
Thanks for your work on this. Everyone should follow your example.

One more aspect that I think emphasizes the number of coincidences that had to come together to find this:

I run a number of "buildfarm" instances for automatic testing of postgres, among them one using valgrind. For some other test instance I had used -fno-omit-frame-pointer, for some reason I do not remember. A year or so ago I moved all the test instances to a common base configuration, instead of duplicated configurations, and chose to make all of them use -fno-omit-frame-pointer.

Afaict valgrind would not have complained about the payload without -fno-omit-frame-pointer. It was because _get_cpuid() expected the stack frame to look a certain way.

Additionally, I chose to use debian unstable to find possible portability problems earlier. Without that, valgrind would have had nothing to complain about.

Without having seen the odd complaints in valgrind, I don't think I would have looked deeply enough when seeing the high CPU usage in sshd below _get_cpuid().

There are more coincidences that are even less interesting. But even the above should make it clear how unlikely it was that I found this thing.
Just to be clear: I didn't mean that I didn't do good - I did. I mean that we got unreasonably lucky here, and that we can't just bank on that going forward.
This is incredible work, thank you so much.
@CyrilBrulebois I concur; we are all in your debt for spotting it so early.
Major Cliff Stoll / Cuckoo's Egg vibes here. A minor system-usage matter ends up uncovering much more!

@bikewazowski omg yes 100% very much this.

Looking forward to the book and the six-part Netflix miniseries, Andres.

Get a literary agent!

@BD
thank you for your diligence.
congrats and thank you for the investigation - IMO this is going to go down as the vuln of the decade. What a find.
@dgilman Unfortunately I suspect we'll see a lot more such attacks going forward, in all likelihood with more success in some cases.

This is insane. I expect full-fledged articles out soon, but another interesting bit in https://news.ycombinator.com/item?id=39866275 :

"the apparent author of the backdoor was in communication with me over several weeks trying to get xz 5.6.x added to Fedora 40 & 41 because of it's "great new features""

This is CVE-2024-3094 for easier tracking.

#JiaT75 #CVE20243094


@dgilman
From the same thread:

"Fascinating. Just yesterday the author added a `SECURITY.md` file to the `xz-java` project.

> If you discover a security vulnerability in this project please report it privately. *Do not disclose it as a public issue.* This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released."

@richlv @dgilman It does not affect FreeBSD at all. Simply because this person was targeting a specific OS.
Congrats on going viral (ha!) and breaking open the biggest security deal in a *long* time
Big props, you benchmarked your way to averting a major cybersecurity catastrophe

@malwaretech saw a reddit comment to the effect of: don't cause perf issues for database people, they will wreck you.

Nice job Andres!

thank you, I see the paragraph that begins:

"To reproduce outside of systemd, …"

I have someone claiming:

"The exploit requires systemd. …"

Can both be true?

Postscript: thanks to @vi for helping me to realise my misunderstanding.

Apologies for the noise.


@grahamperrin the exploit only works if sshd is patched to depend on libsystemd as that's what pulls in the compromised liblzma.

You don't have to *run* sshd via systemd to be compromised, it can be started manually because the binary is still linked with the malicious library.
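A quick way to check a binary on your own system (Linux, assuming ldd is available; the sshd path below is just an example):

```python
# Does a binary pull in liblzma, directly or via libsystemd? Assumes a
# Linux system with ldd; /usr/sbin/sshd is an example path.
import subprocess

def links_liblzma(binary: str) -> bool:
    out = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return "liblzma" in out.stdout

print(links_liblzma("/usr/sbin/sshd"))       # True => liblzma is on the link path
```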

@AndresFreundTec

@vi @grahamperrin and OmniOS links with liblzma (via libxml2)