This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

feat(executor): add option for additional mounts in docker runner #56434

Open: wants to merge 1 commit into main


@rdeusser commented Sep 7, 2023

Description

We use the code intel feature on a lot of Go code, and each time a repo was indexed, the Go module cache had to be repopulated from scratch, wasting time and putting unnecessary pressure on our Artifactory instance. To solve this, I added the ability to specify semicolon-separated mounts through the environment variable EXECUTOR_DOCKER_ADDITIONAL_MOUNTS (e.g. export EXECUTOR_DOCKER_ADDITIONAL_MOUNTS='type=volume,source=gocache,target=/gocache;type=volume,source=gomodcache,target=/gomodcache', quoted so that the shell does not treat ; as a command separator).

To fully solve the issue, I also had to add the environment variables GOCACHE and GOMODCACHE to the executor secrets under the Code Graph tab in the Sourcegraph UI, and add those variables to the requested_envvars array in the Code Graph inference section.
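A minimal sketch of how a semicolon-separated value like this could be turned into docker run arguments. This is not the PR's actual code: parseAdditionalMounts and hasTarget are hypothetical helpers, and the sketch assumes each entry is forwarded verbatim to Docker's --mount flag, checking only that it names a target path.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// additionalMountsEnv is the variable name from the PR description.
const additionalMountsEnv = "EXECUTOR_DOCKER_ADDITIONAL_MOUNTS"

// parseAdditionalMounts (hypothetical helper) splits a semicolon-separated
// list of docker --mount specs and turns each one into a pair of
// `docker run` arguments. Empty entries (e.g. from a trailing ";") are skipped.
func parseAdditionalMounts(raw string) ([]string, error) {
	var args []string
	for _, spec := range strings.Split(raw, ";") {
		spec = strings.TrimSpace(spec)
		if spec == "" {
			continue
		}
		if !hasTarget(spec) {
			return nil, fmt.Errorf("mount %q has no target", spec)
		}
		args = append(args, "--mount", spec)
	}
	return args, nil
}

// hasTarget reports whether a mount spec names a destination path, using the
// key aliases docker accepts for --mount (target, destination, dst).
func hasTarget(spec string) bool {
	for _, field := range strings.Split(spec, ",") {
		key, _, _ := strings.Cut(field, "=")
		switch strings.TrimSpace(key) {
		case "target", "destination", "dst":
			return true
		}
	}
	return false
}

func main() {
	args, err := parseAdditionalMounts(os.Getenv(additionalMountsEnv))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print the extra arguments that would be appended to `docker run`.
	fmt.Println(strings.Join(args, " "))
}
```

Each entry is passed through unchanged, so Docker itself still validates keys such as type and source; the sketch only rejects entries that could not mount anywhere.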

Test plan

Tested in production environment.

Screenshot showing that the volumes are referenced in the Docker container started by the executor:
[Screenshot 2023-09-11 at 12:05:24 PM]

Screenshot showing that the /gomodcache directory in the gomodcache volume is populated:
[Screenshot 2023-09-11 at 12:02:33 PM]

On one indexing job we saw an improvement of 92.4%.

@cla-bot commented Sep 7, 2023

We require contributors to sign our Contributor License Agreement (CLA), and we don't have yours on file. In order for us to review and merge your code, please sign the CLA to get yourself added.

Sourcegraph teammates should refer to Accepting contributions for guidance.

@rdeusser changed the title from "feat(executor): add option for additional bind mounts in docker runner" to "WIP feat(executor): add option for additional bind mounts in docker runner" on Sep 7, 2023

@rdeusser changed the title from "WIP feat(executor): add option for additional bind mounts in docker runner" to "WIP feat(executor): add option for additional mounts in docker runner" on Sep 8, 2023

@rdeusser changed the title from "WIP feat(executor): add option for additional mounts in docker runner" to "feat(executor): add option for additional mounts in docker runner" on Sep 11, 2023

@camdencheek (Member) commented:

Hi @rdeusser! Sorry about the delay on the review. Ownership of executors has shuffled around a bit, and this one got lost in the weeds. I've got this on my list for today 👍

@camdencheek requested review from @camdencheek and a team on October 11, 2023 14:59

@peterguy (Contributor) commented:

Adding a volume mount to Executors seems like a good way to solve persisting files, but because executors run in so many different environments, one of the main goals for executors is to decouple them from infrastructure. Adding volume mounts actually works against that.

Instead of modifying executors, there are a few options to solve this pain point. I'll list them in order of my preference:

  1. Maintain a custom Docker image containing the required libraries and build artifacts, and use that as the base image. Executors support private Docker registries, and the overhead of maintaining such an image can be mitigated with scripting in the release process or CI chain.
  2. Set up a remote GOCACHE. This relies on an experimental Go feature (GOCACHEPROG) that has the go command run a helper sub-command, which can do anything, including read from a remote cache.
  3. If maintaining a custom Docker image or setting up a remote cache is too onerous, another option is to pack the build artifacts into a single archive file, store it somewhere accessible over HTTP(S), and add job commands that retrieve and unpack the archive. This is a lightweight, home-brewed alternative to the first two options.
  4. Deviating considerably from the other options: instead of auto-indexing, you could set up your CI/CD process to do the indexing for you. See our GitHub workflow as an example. I'm not sure that's the direction you want to move in, but it is another option.

I'm sure there are other options as well; these are what the team here came up with.

I hate to rain on your parade, @rdeusser - I like the engineering work you've done in this PR - but I recommend this PR be closed and an approach that works with the existing capabilities be taken instead.

@rdeusser (Author) commented:

@peterguy In general, decoupling from the infrastructure seems like a great way to support different environments more easily, but the reality is that we still have to deal with that infrastructure. Virtually all of the supported environments have a concept of a volume, so adding volume mounts does not work against that goal.

  1. This will not work at any significant scale. After indexing most of our repos, the Go module cache is over 180 GB, and we have over 600 Go repositories.
  2. The Go build cache isn't the issue; the Go module cache is, and this option doesn't apply to it. So this won't work for us either.
  3. As with option 1, retrieving a 180 GB archive on every indexing job is infeasible.
  4. Having our CI do the indexing doesn't give us the option of having indexes for tags that have already been built. We pay for this feature so suggesting that we not use it is unacceptable.

None of the solutions provided solve the issue for us. Even moving to the Kubernetes-based executor doesn't solve it, because the PVCs are deleted after each job runs.

Additionally, the Go module cache isn't the only problem this solves for us. To clone repos or fetch them with go mod download, our certificates need to be present inside the container; otherwise, the executor fails when downloading dependencies. The other option is to maintain a custom scip-go image with our certificates in it, but the solution for enterprise customers can't be for all of us to maintain custom scip-go/lsif-go images.

There are no approaches that I'm aware of that work with the existing capabilities, hence the purpose of the PR. The gist of what's needed here is that we need to provide a place to persistently store cached Go modules to the indexing environment prepared by the executor.

@peterguy (Contributor) commented:

Thanks for adding more detail @rdeusser!

I appreciate your patience; this PR, and the concept of persistent volumes in executors, has sparked quite a lengthy discussion around here. 😄

We're adding this PR to our backlog, but given our workload now, we can't give an estimate of when we'll be able to work on it.

To give you some insight into the discussions this PR has generated, here are some comments from various engineers involved in the discussion:

  • Storing state on the host path makes executors stateful, and we would also need to build a way to clean up that state over time
  • This change would require quite a lot of testing because executors run on many different environments, so “just mount a directory from the host” is not as simple as it sounds
  • I don't believe it will work for Firecracker executors or k8s executors, and it feels weird to me if only one out of three runtimes supports it.

Notice that much of the discussion is around supporting volume mounts in all of the deployment scenarios. Testing volume mounts in all of the installation options will take a non-trivial amount of effort; if you get a chance to test out volume mounts in Firecracker and K8s, update this PR with the result!

@camdencheek removed their request for review on November 28, 2023 16:00
@bahrmichael removed the request for review from a team on March 25, 2024 09:02