The Dependency Tree is Actually More of a Jungle. And it’s Haunted.
Contributed by Dan Lorenc | originally posted on medium.com
I was looking through the Kubernetes go.mod file, and noticed something weird. A few strange-looking dependencies that didn’t seem to belong. I’m still not quite sure what caught my attention about these specific modules — there are over 300 direct and indirect dependencies required to build Kubernetes — but these particular ones really didn’t make sense to me.
On a normal day, I might have just gotten distracted and moved on, but I decided to really dig in here. I wanted to better understand the state of Go modules, the tooling around them, and what is going on in the dependency jungle of cloud-native Go projects. Hunting down these dependencies was a perfect opportunity to do that.
After a few days, I think I finally understand things a little better, and I hopefully improved the state of a few projects. But the more digging I did here, the more strange projects like this I found. I call them “ghosts”. They obviously aren’t really ghosts, but they’re half-dead dependencies, waiting to haunt anyone that dares to read the
go.mod file or the output of
This is Part One of a series that explains some interesting things I found in the module graph, some of the improvements I think I made, and some of the problems I faced along the way. For anyone hoping to try out a similar adventure through the Haunted Dependency Forest, I’d recommend bringing a machete.
Part 1: 99 Bottles Of Beer…
This first ghost was a small, innocuous-looking repository named rsc.io/sampler, hanging out toward the bottom of the main Kubernetes go.mod file. I wanted to see what this module was, and how/why it was being used.
Step one was to find the source code. The Go tool supports vanity imports, which allow repositories to declare import names like this one while still being served from GitHub. To find the source for this repo, we can use a simple curl command:
$ curl -L rsc.io/sampler?go-get=1 | grep go-import
<meta name="go-import" content="rsc.io/sampler git https://github.com/rsc/sampler">
This meta tag shows us that the canonical location for this package is on GitHub. Opening it up in a browser is when things started to seem weird. This package had only a few files, and the latest commit appeared to break it on purpose! What’s going on here?
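To make the lookup concrete, here is a minimal sketch in Go of pulling the three fields (import prefix, version control system, repository URL) out of a go-import meta tag like the one above. The type goImport and the function parseGoImport are hypothetical names for illustration, and the regular expression is a simplifying assumption: it expects the name attribute before content and straight double quotes, which is looser and less robust than whatever parsing the Go tool itself does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goImport holds the three space-separated fields of a go-import meta tag.
// (Hypothetical type, for illustration only.)
type goImport struct {
	Prefix string // import path prefix, e.g. rsc.io/sampler
	VCS    string // version control system, e.g. git
	Repo   string // repository root URL
}

// metaRE matches <meta name="go-import" content="...">.
// Assumption: name comes before content and both use straight double quotes.
var metaRE = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]*)"`)

// parseGoImport returns the first go-import tag found in page.
// ok is false if no tag is found or it does not have exactly three fields.
func parseGoImport(page string) (gi goImport, ok bool) {
	m := metaRE.FindStringSubmatch(page)
	if m == nil {
		return goImport{}, false
	}
	fields := strings.Fields(m[1])
	if len(fields) != 3 {
		return goImport{}, false
	}
	return goImport{Prefix: fields[0], VCS: fields[1], Repo: fields[2]}, true
}

func main() {
	// The tag returned by the curl command above.
	page := `<meta name="go-import" content="rsc.io/sampler git https://github.com/rsc/sampler">`
	if gi, ok := parseGoImport(page); ok {
		fmt.Printf("%s is served from %s via %s\n", gi.Prefix, gi.Repo, gi.VCS)
	}
}
```

The point of the sketch is only that the vanity name and the real hosting location are separate pieces of data, tied together by a small HTML page that someone has to keep serving.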
The hello.go file has this for contents:
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Translations by Google Translate.

package sampler

var hello = newText(`

English: en: 99 bottles of beer on the wall, 99 bottles of beer, ...

`)
That’s it. 99 bottles of beer on the wall. The sampler.go file is quite a bit longer, but it’s still complete gibberish. So why is this in Kubernetes? What could it possibly have been used for? If harmless things like this are lurking around, what else could be slipping in undetected? Can we remove it?
The first step to cleaning something up is understanding why it is there, and I was having trouble even with that. The go mod why command was giving me nothing.
$ go mod why rsc.io/sampler
(main module does not need package rsc.io/sampler)
So it’s in here, but it’s not needed? A quick search of the Kubernetes vendor directory showed that this was correct: the package is not pulled in by go mod vendor, so k8s can build without it. That made me feel a lot better, but it still wasn’t good enough. If this module isn’t needed, we should be able to remove it, right? Maybe it was just added there by accident.
I tried a go mod tidy, and got nothing. The file was already tidy. Maybe it was added explicitly for some reason, and the go tool preserved it? I tried removing it myself:
go mod edit -dropreplace rsc.io/sampler
But k8s wanted me to put it back! As soon as I ran hack/update-vendor.sh, it reappeared! So there must have been some reason it kept getting pulled in, even if go mod why couldn’t find it. Trying go mod graph gave me some better results. I’m still not completely sure of the difference between these two commands, but here’s the shortened result:
$ go mod graph | grep rsc.io/
From this output we can see that k8s depends on golang/mock, which depends on rsc.io/quote/v3, which depends on rsc.io/sampler. In case you’re wondering, the quote module is just as funny as sampler.
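As far as I understand it, the two commands answer different questions: go mod why walks the package import graph (which packages actually get imported), while go mod graph prints the module requirement graph (which modules are listed in go.mod files), so a module can show up in the second without any of its packages being needed by the first. Tracing a chain like the one above by eye gets tedious in a large graph, so here is a minimal Go sketch that reads go mod graph style lines and finds a shortest requirement chain to a module. The function names are hypothetical, and the version strings in the sample are placeholders, not the versions Kubernetes actually resolved.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample mirrors the chain described above. The "@v0.0.0" versions are
// placeholders for illustration, not real resolved versions.
const sample = `k8s.io/kubernetes github.com/golang/mock@v0.0.0
github.com/golang/mock@v0.0.0 rsc.io/quote/v3@v0.0.0
rsc.io/quote/v3@v0.0.0 rsc.io/sampler@v0.0.0
`

// parseGraph turns "from to" lines into an adjacency list.
// Lines that do not have exactly two fields are ignored.
func parseGraph(text string) map[string][]string {
	edges := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 2 {
			edges[f[0]] = append(edges[f[0]], f[1])
		}
	}
	return edges
}

// modulePath strips an "@version" suffix from a graph node, if present.
func modulePath(node string) string {
	if i := strings.Index(node, "@"); i >= 0 {
		return node[:i]
	}
	return node
}

// findPath does a breadth-first search from start and returns the first
// (shortest) chain ending at any version of target, or nil if none exists.
func findPath(edges map[string][]string, start, target string) []string {
	prev := map[string]string{start: ""} // node -> node we reached it from
	queue := []string{start}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if modulePath(node) == target {
			var path []string
			for n := node; n != ""; n = prev[n] {
				path = append([]string{n}, path...)
			}
			return path
		}
		for _, next := range edges[node] {
			if _, seen := prev[next]; !seen {
				prev[next] = node
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	path := findPath(parseGraph(sample), "k8s.io/kubernetes", "rsc.io/sampler")
	fmt.Println(strings.Join(path, " -> "))
}
```

To use this on a real project, the sample constant could be swapped for the actual output of go mod graph; the search itself does not care where the edges come from.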
Busting The Ghost
Now I could see where sampler was coming from, but I still didn’t know why. Things were starting to make more sense though: the mock library is used for tests, which usually use a separate set of build tags, and test dependencies don’t always get pulled into a vendor directory.
To get rid of it, I first took a look at the mock library on GitHub. Right at the top of the page, I saw that the dependency had recently been removed!
Great! This meant I should just be able to update golang/mock in k8s and drop this dependency. Thankfully, this was a pretty easy update: https://github.com/kubernetes/kubernetes/pull/97337. This PR ended up dropping both rsc.io/quote/v3 and rsc.io/sampler. Since these were never pulled into vendor, I don’t think this cleanup really had much of an impact. Still, it can’t hurt to remove things like this. At a minimum, they add constraints and complexity to the module resolution graph and make go mod work a little bit harder. Worst-case scenario, something malicious could eventually appear in these and make its way up into other programs. An update to an old dependency probably gets less scrutiny during review time than a change that introduces an entirely new one.
I still wanted to understand why this was ever introduced into k8s. It was bugging me that a completely innocuous library like this, with seemingly no purpose, ended up in the dependency graph for such a critical piece of software. Reading the sampler repo itself didn’t give me much info. I could have just asked Russ Cox what it was for, but I wanted to try to find out on my own.
A Google search turned up a Hacker News discussion from a few years ago on the Using Go Modules blog post, which showed how the new modules feature worked. It looked like rsc.io/quote/v3 and rsc.io/sampler were example repositories created to show how to use Go modules.
That explains why the repos existed in the first place, but not why they ended up in golang/mock or further downstream. A quick bisect on the golang/mock go.mod file showed me the PR where they were introduced.
It was a PR to fix a bug where golang/mock had some trouble parsing major versions out of import paths when run in a module-aware context. So, the author added some tests of this support using the example repos from the blog post. Completely logical. So the go.mod file contained a test dependency of a test dependency, and this repo propagated all over the Go dependency trees for anyone using mock.
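For context on what “parsing major versions out of import paths” involves: with Go modules, major versions 2 and above appear as a trailing /vN element in the module path, which is why rsc.io/quote/v3 makes a convenient test case. Below is a simplified Go sketch of splitting that suffix off. It is not golang/mock’s actual code; splitMajor is a hypothetical helper, and it deliberately ignores special cases such as gopkg.in style paths.

```go
package main

import (
	"fmt"
	"strings"
)

// splitMajor separates a trailing "/vN" element (N >= 2, no leading zero)
// from a module path. If there is no such suffix, it returns the path
// unchanged and an empty major version.
// Simplified sketch: gopkg.in style ".vN" suffixes are not handled.
func splitMajor(path string) (prefix, major string) {
	i := strings.LastIndex(path, "/")
	if i < 0 {
		return path, ""
	}
	last := path[i+1:]
	if len(last) < 2 || last[0] != 'v' {
		return path, ""
	}
	digits := last[1:]
	for _, c := range digits {
		if c < '0' || c > '9' {
			return path, ""
		}
	}
	// v0 and v1 are never written as a path suffix, and leading zeros
	// (such as v02) are not valid.
	if digits[0] == '0' || digits == "1" {
		return path, ""
	}
	return path[:i], last
}

func main() {
	for _, p := range []string{"rsc.io/quote/v3", "rsc.io/sampler"} {
		prefix, major := splitMajor(p)
		fmt.Printf("path=%q prefix=%q major=%q\n", p, prefix, major)
	}
}
```

A tool that generates code from import paths has to get this split right, because the package name a user writes (quote) is not the last element of the path (v3).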
So then why was it removed? Why was I able to just update? I initially guessed that it was introduced as some kind of accident and quickly deleted. That wasn’t the case, so why did it get deleted? Looking at the eventual PR to delete this, I stumbled on an even more interesting tidbit of Go history:
That PR linked to an issue:
which linked to another issue:
Remember the vanity URLs I described at the start? Russ Cox was hosting his vanity import redirector on Google App Engine, and it was built with a very old version of Go (1.6). At some point, App Engine dropped support for this version of Go. This ended up taking down his redirector, so Go tooling couldn’t find the rsc.io libraries and builds of golang/mock failed! The dependency was removed to fix this, not to actually clean up the dependency trees.
And, interestingly enough, the rsc.io/quote test case was never actually deleted; it was merely commented out! So while I was able to remove this ghost from my dependency tree, it could reappear at any time. This wasn’t really the closure I was hoping to find, but at this point, I had found enough other (potentially more serious) threads to pull on that I decided to move on to those. The next part of the series will cover a couple of actual CVEs I found lurking around.