August 7, 2024

Open Source Software and the Terrible, Horrible, No Good, Very Bad Week

James Plouffe
Technical & Product Marketing @ BlueRock Systems

Hyperbole aside, #OpenSource Software (#OSS) did have some troubling disclosures during the final week of March.

Dirty Page* – The gift that keeps on giving

On March 26th, a vulnerability researcher using the handle notselwyn published an extremely detailed writeup of CVE-2024-1086, a high-severity Local Privilege Escalation (LPE) vulnerability, including Proof-of-Concept (PoC) exploit code. The exploit is noteworthy because:

  • It runs reliably—with success rates ranging from 93–99.4%, according to notselwyn—on kernels from 5.14.21–6.6.14 without needing to be (re)compiled for specific versions, making it a very attractive exploit chain component.
  • It builds on an earlier technique called Dirty Pagetable, the effects of which are similar to Kernel Space Mirroring Attacks (KSMA) but not affected by the mitigations developed to protect against KSMA. notselwyn has dubbed the expanded technique “Dirty Pagedirectory”.
  • It is a data-only attack that relies exclusively on user-space memory read/writes, so there are no changes to the execution flow of the targeted program/feature (in this case, nf_tables; one of the stated goals of the research and exploit development was to incorporate fileless execution for defense evasion).
  • It works on Google’s KernelCTF kernel, which has been hardened with both existing and experimental exploit mitigations.

There are some minor caveats related to kconfig values and the availability of a particular user capability (though it is enabled by default in some distributions), and, as of February, the underlying vulnerability was patched in the upstream -stable branches. It’s essential to remember, however, that a Git commit in an upstream repo is not the same thing as a patched kernel being available for a specific version of a particular distribution. The #patchgap—the time between when a vulnerability is disclosed and when a plurality of systems are patched against it—is lengthened by the fact that upstream fixes have to travel downstream before they can be installed, and that transit time is added to however long the deployment itself takes.
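For readers who want a rough, first-pass read on exposure, here is a minimal sketch. It assumes (per the public write-ups of CVE-2024-1086) that the capability in question is unprivileged user namespace creation and that the affected window is the 5.14.21–6.6.14 range quoted above; both the sysctl paths and the version boundaries are assumptions to verify against your distribution’s advisory, not an authoritative test, and the check knows nothing about backported fixes.

```python
# Rough exposure check for CVE-2024-1086 (sketch only; verify against your
# distro's advisory). Assumes the affected range 5.14.21-6.6.14 quoted above
# and that unprivileged user namespace creation is the relevant capability.
import platform
import re
from pathlib import Path

AFFECTED_MIN = (5, 14, 21)   # assumption: lower bound from the write-up
AFFECTED_MAX = (6, 6, 14)    # assumption: upper bound from the write-up

def kernel_version() -> tuple:
    """Parse something like '6.5.0-26-generic' into (6, 5, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", platform.release())
    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)

def unprivileged_userns_allowed() -> bool:
    """Best-effort check of the knobs that gate unprivileged user namespaces."""
    debian_knob = Path("/proc/sys/kernel/unprivileged_userns_clone")   # Debian/Ubuntu
    mainline_knob = Path("/proc/sys/user/max_user_namespaces")         # mainline
    if debian_knob.exists():
        return debian_knob.read_text().strip() == "1"
    if mainline_knob.exists():
        return int(mainline_knob.read_text().strip()) > 0
    return True  # assume allowed if neither knob is present

if __name__ == "__main__":
    version = kernel_version()
    in_range = AFFECTED_MIN <= version <= AFFECTED_MAX
    print(f"kernel {platform.release()}: in affected version range? {in_range}")
    print(f"unprivileged user namespaces allowed? {unprivileged_userns_allowed()}")
```

None of this substitutes for the vendor advisory, but it is a cheap way to decide whether a given fleet even needs to be in scope for the scramble.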

Moreover, determining which update contains a given patch can be a bit of a research project. For instance, some public recommendations suggest updating “past commit f342de4e2f33e0e39165d8639387aa6c19dff660”. Of course, that advice is only useful if one happens to be downloading and building their kernel directly from the main source tree—something that many enterprises not only don’t do (for very legitimate reasons) but are actively discouraged from doing by their vendors; i.e., maintaining support and service-level agreements means getting OS updates from the OS vendor, not rolling their own. But even vendor-supplied updates can be somewhat difficult to track. Although there are multiple kernel packages available, the relevant Ubuntu Advisory only emphasizes patched versions of the Hardware Enablement (HWE) kernel packages (linux-hwe-<kernel_version>), and—in the case of 22.04.4 LTS (Jammy Jellyfish)—a patched package was not available for the kernel running on one server I examined.
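To make the “research project” point concrete, the sketch below answers only the narrower, upstream half of the question: which stable tags actually contain commit f342de4e2f33e0e39165d8639387aa6c19dff660. It assumes you have a local clone of the stable kernel tree to point it at (the path is a placeholder), and it says nothing about when, or whether, a given distribution shipped that fix in a package, which is exactly the gap described above.

```python
# Sketch: list the upstream tags that contain the fix commit cited in public
# guidance for CVE-2024-1086. Assumes a local clone of the stable kernel tree;
# the path below is a placeholder, not a real location on your system.
import subprocess

FIX_COMMIT = "f342de4e2f33e0e39165d8639387aa6c19dff660"  # from the public guidance
STABLE_TREE = "/path/to/linux-stable"                     # placeholder clone path

def tags_containing(commit: str, repo: str) -> list[str]:
    """Return the tags in the given repository that include the given commit."""
    result = subprocess.run(
        ["git", "-C", repo, "tag", "--contains", commit],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(f"git failed: {result.stderr.strip()}")
    return [tag for tag in result.stdout.splitlines() if tag]

if __name__ == "__main__":
    try:
        print("\n".join(tags_containing(FIX_COMMIT, STABLE_TREE)))
    except RuntimeError as err:
        print(err)
```

Even with an answer in hand, you still have to map an upstream tag to a vendor kernel package, which is where the advisory tables come back into play.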

Given the details in the advisory, it’s possible to extrapolate what the patched kernel version is; it’s just not obvious from the vendor-supplied tables. The point is not that it is impossible to find the correct information; it is that finding information that accounts for the permutations of real-world installations can be unnecessarily time consuming.

It should also be noted that—even though the advisory was published on January 29th and the server in question has automatic upgrades enabled—the patched kernel was not installed until March 20th. The secondary point is this: if an organization has specific metrics for patching high-severity vulnerabilities, whether to ensure its infrastructure is actually protected or just to satisfy compliance requirements and the “best practices” of its cyber insurers, it has less control over the timing than it would probably like.
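For what it’s worth, that particular window is easy to put a number on with the two dates above:

```python
# The gap between the advisory being published (January 29th) and the patched
# kernel actually landing on the server in question (March 20th).
from datetime import date

advisory_published = date(2024, 1, 29)
patch_installed = date(2024, 3, 20)

print(f"patch gap: {(patch_installed - advisory_published).days} days")  # 51 days
```

Whether 51 days fits inside an organization’s patching targets is the organization’s call; the point is that the clock is not entirely in its hands.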

I’ll close this section with the periodic reminder that all complex systems have bugs, some of which manifest as exploitable security vulnerabilities. This is not the first LPE exploit affecting the Linux kernel and it will not be the last. Given that Dirty Pagedirectory builds on Dirty Pagetable which, in turn, built on or borrowed from KSMA—all of which have been successful in defeating various types of mitigations—as well as the increasing interest in data-only attacks, one would be well advised to “watch this space”. It’s definitely an area that the team here at BlueRock Systems is thinking a lot about. One of the ways we stop LPE exploits like this one is by implementing protections for memory pages without relying on either signatures or kernel-based mechanisms. We’re taking this approach because—like a Hollywood franchise—we know there will be a sequel (even though nobody is asking for one) and we want to provide vulnerability-agnostic protection, especially during the critical period of the patch gap.

An SSH backdoor in xz making waves

On March 29th, the details of a backdoor in the popular xz compression library were posted to an OSS security mailing list. There is quite a lot to unpack (pun intended) with this incident, but let’s start with the good news: although the malware was introduced at the very beginning of the supply chain, it probably doesn’t affect too many production installations because the backdoored versions of the library were mostly confined to the “bleeding-edge” versions of popular #Linux distros.
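For anyone who wants to check whether a given host picked up one of the affected builds, here is a minimal sketch. It assumes the backdoored releases are 5.6.0 and 5.6.1, as widely reported in the advisories for this incident (CVE-2024-3094); confirm that list against your distribution’s guidance. Note that the same `xz --version` output also reports the liblzma version, which is the component that actually matters for the sshd path discussed below.

```python
# Sketch: report the locally installed xz version and flag it if it matches one
# of the upstream releases reported as backdoored (assumed here to be 5.6.0 and
# 5.6.1; check your distro's guidance for the authoritative list).
import re
import subprocess

REPORTED_BAD = {"5.6.0", "5.6.1"}  # assumption: widely reported affected releases

def installed_xz_version() -> str | None:
    """Return the version string from `xz --version`, if xz is installed."""
    try:
        out = subprocess.run(["xz", "--version"], capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    match = re.search(r"xz \(XZ Utils\) (\S+)", out.stdout)
    return match.group(1) if match else None

if __name__ == "__main__":
    version = installed_xz_version()
    if version is None:
        print("xz not found (or --version failed)")
    elif version in REPORTED_BAD:
        print(f"xz {version}: matches a reported backdoored release")
    else:
        print(f"xz {version}: not in the reported list")
```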

The bad news starts with the potential breadth of the damage. Indeed, had this compromise played out the way its perpetrators likely hoped, it’s fair to say the potential impact could have rivaled the SolarWinds / SUNBURST attack of 2020 or the Microsoft Exchange / Hafnium attack of 2021 because the “playbook” (establishing a foothold in a widely deployed software package) is identical. This attack, however, differs in two important respects:

  1. The compromise has virtually no dependency on a target using a specific vendor.
  2. xz and/or its related libraries probably have more installations than either SolarWinds or Microsoft Exchange.

It is also perhaps more sinister in that the backdoor was committed by a project contributor who had been active for over two years. Brian Krebs made some interesting observations about the contributor’s “legend” (or apparent lack thereof) and raised some important questions about the likelihood that similar activities might be occurring in other projects.

Instead of ruining yet another Christmas for blue teams and incident responders everywhere, this caper took on a Scooby-Doo-esque quality. Although the villains haven’t been unmasked (yet), one can imagine them somewhere shouting “We [probably] would’ve gotten away with it too, if it hadn’t been for those pesky performance issues!” because—in what can only be characterized as luck (dumb or otherwise)—it was performance issues observed during testing that caused the first domino to fall. As Andres Freund, who initially discovered the malicious code, explained in the original post: “I am not a security researcher, nor a reverse engineer.”

Despite that proclamation, advocates of Linus’s Law were quick to point out how the nature of OSS helped avert this potential catastrophe. I have written previously (with a bit of skepticism) about whether or not we did/do have enough eyeballs, but these days my thinking more closely matches that of self-described Cybersecurity Person Marcus Hutchins (original post here).

Predictably, his observation generated a lot of “Well, actually…” replies. In a subsequent post, he puts a finer point on the issue: the malicious artifact in question was neither open, nor source; it was obfuscated bytecode that was included with the xz tarballs, and it was only discovered because of dynamic analysis of a seemingly unrelated package (sshd). Even though those details are plain in even the most casual reading of Andres Freund’s original post, that did little to quell the Internet’s ire.

Mr. Hutchins hardly needs me to rise in his defense, but there are a couple of points worth foot-stomping. First, this situation is a variation on the tragedy of the commons: things which are no specific person’s problem have a way of becoming everybody’s problem. There are many beneficiaries of open source projects and yet, as Aristotle observed, most only “pay attention to what is their own; they care less for what is common”, as demonstrated by the fact that many small but essential projects are chronically under-resourced (see also: Log4j). The second is that critics of Mr. Hutchins’s original statement are—without evidence—both crediting theoretical or imagined benefits of OSS and assuming that the ecosystem participants are virtually always good-faith actors. There is an enormous disconnect between what people say can be done and any concrete notion of how to do it.

The emphasis on the possibility that OSS can be inspected also ignores the fact that Software Composition Analysis (SCA) is Hard, Actually™. There’s certainly an appeal to the idea that sunshine is the best disinfectant, but the performance issues only manifested in the first place because some Linux distros patch OpenSSH to support systemd notifications and libsystemd has a dependency on liblzma, and that seems like a very narrow ray of sunshine indeed.
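To make that narrow ray of sunshine concrete, the sketch below asks the dynamic loader (via `ldd`) which shared objects an sshd binary ultimately pulls in and reports whether liblzma is among them. The /usr/sbin/sshd path and the library name are assumptions to adjust for your distribution, and the check only sees what the local build actually links; on a build without the systemd-notification patch, liblzma simply will not appear, which is rather the point.

```python
# Sketch: list the shared objects ldd resolves for an sshd binary and report
# whether liblzma appears anywhere in that set. Path and library name are
# assumptions; adjust for your distribution.
import subprocess

SSHD_PATH = "/usr/sbin/sshd"   # common location; adjust if different
NEEDLE = "liblzma"

def resolved_libraries(binary: str) -> list[str]:
    """Return the dependency lines ldd reports for the given binary."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(f"ldd failed: {result.stderr.strip()}")
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    try:
        libraries = resolved_libraries(SSHD_PATH)
    except (RuntimeError, FileNotFoundError) as err:
        print(err)
    else:
        hits = [lib for lib in libraries if NEEDLE in lib]
        print(f"{SSHD_PATH} resolves {len(libraries)} shared objects")
        print(f"{NEEDLE} present: {bool(hits)}")
        for hit in hits:
            print(f"  {hit}")
```

Even then, `ldd` only shows the resolved graph; it cannot tell you why liblzma is there, and tracing it back through libsystemd is exactly the kind of work that makes SCA harder than it looks.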

In the wake of this incident, it is very easy to imagine that “just verify all 3rd-party software components” will become the new “just make sure everything’s patched”. And one suspects that such advice will be just about as useful in the day-to-day work of preventing incidents and breaches.

Conclusion

All of this might seem like a lot of doom and gloom but, as always, the reality is more nuanced. Are CVE-2024-1086 and the techniques it advances dangerous? Yes. But this time the danger was discovered through research and not in-the-wild exploitation. Should we file the SSH backdoor in xz under “Near Miss”? Also yes. What both of these events tell us is that sometimes we get lucky, but also that we need to keep re-evaluating how we think about using and defending infrastructure on which so many things rely, especially when its failure modes are so difficult to quantify and analyze.
