Dzone Rc181 Repositorymanager
INTRODUCTION
Software development depends upon two distinct kinds of components: (1) source code, and (2) binary components.

This Refcard assumes basic familiarity with source control management and is intended to help you design and configure a Repository Manager to store, manage, and share binaries, optimize it for various workflows, and fit it smoothly into your DevOps pipeline.

Repository Managers are typically available as free OSS and paid professional versions. Most organizations will want to implement a paid professional version as they grow and mature their DevOps practices.

BASIC FEATURES (FREE)                                PRO FEATURES (PAID)
Proxy public repositories                            Enterprise support
Host local binaries or components                    High availability
Group repositories for access rights and security    Component intelligence

Components therefore fall into two super-types: (1) source and (2) binary. While it is possible to use a source repository to store binary components, some crucial differences between these two super-types make this solution non-ideal.

WHY SOURCE REPOSITORIES?
Source repositories are designed simply to manage source code. A well-built source repository therefore boasts a feature set tailored to source code management (e.g., differing versions, tracking deleted or overwritten files, branching, and tagging).

WHY (BINARY) REPOSITORY MANAGERS?
Repository Managers are to binaries what source repositories or VCS (Version Control Systems) are to sources. Where source repositories deal with relatively small code files that change constantly and are often cloned with abandon, binary repositories manage a completely different workflow.

Repository Managers provide one source of truth for the binaries used in waterfall, agile, and CI/CD processes.

Binary components are often orders of magnitude larger than source files.
Binary components usually need to store lots of metadata (package name, version, license, etc.).

While each build tool or component format may have its own purpose-built repository or Repository Manager, nearly all organizations are heterogeneous in terms of languages, build tools, and component formats. It's not uncommon to find multiple tools at play even within a single application these days. Take, for example, a Java application built with Maven, with a JavaScript UI whose dependencies are fetched from an npm repository, ultimately distributed to testing and production as a Docker container. That's three tools, formats, and repositories in a single application. Do you want to manage three separate servers, each with its own idiosyncrasies, requiring backup, permissions, etc.?

A Repository Manager is a hub for development teams across the whole organization, centralizing the management of all the components generated and used by the organization. The inevitable resulting diversity of component types, and their differing positions in the overall workflow, is one major reason to use a dedicated Repository Manager rather than just a simple file server.

Thus, the decision to use a Repository Manager generally revolves around how many repositories you need, what types of component formats are dictated by those repositories and build tools, and what higher-level functionality you need above the bare minimum repository. The common factors to consider are discussed below.

At its most basic, a Repository Manager is a caching proxy of remote repositories. The cached components can be served rapidly to other machines on the same network after the initial request, either to human coders or directly to Continuous Integration (CI) servers. The ability to cache the things you need locally isolates you from the inevitable network latency, internet connectivity issues, or components randomly disappearing from an improperly managed remote repository, à la "npm-gate".

ADVANCED MANAGEMENT OF THIRD-PARTY COMPONENTS
Some organizations may have a policy about which third-party dependencies may be used because of licensing or security concerns.

Here's a common example: third-party components need to be requested by a developer and approved by a legal department. Frequently we see attempts to manage this process with a simple whitelist/blacklist approach. This is bound to fail for several reasons:

The breadth and volume of third-party components is staggering. A typical enterprise can easily consume many hundreds of thousands of components that each release
Delays in human reviews of the list cause developers to do one of two things: (1) stop updating dependencies because the friction is too high, leading to increased exposure to vulnerabilities over time and making it harder to upgrade later, or (2) work around the system to get their job done, leading to decreased visibility of what is actually in use and ultimately defeating the entire purpose of the process.

Fortunately, there are ways to deal with this that don't have unintended side effects.

COMPONENT INTELLIGENCE
Some professional versions of Repository Managers include health checks to provide instant insight into potential component security, license, and quality risks so that development teams can take corrective action early and quickly.

This intelligence can help organizations identify known security, license, and architectural issues for each component. Health check capabilities can be used as an automated audit tool for build managers, architects, open source governance, security, and legal professionals.

COMPONENT FIREWALL
Building upon the component intelligence, some Repository Managers can provide a form of Firewall capability. It becomes possible to automate the decisions about which components to allow into the organization by combining the intelligence with a rules engine.

This allows you to stop bad components (e.g., ones with already known vulnerabilities, or ones with licenses incompatible with your business model) from being proxied and integrated, only to be ripped out later. This model is the only one that can scale without requiring an army of human reviewers.

Further, components that are known to be good when they are first used can become bad later when new vulnerabilities are discovered. In a manual review process, almost no organization

The automated rules engine and continuously updated Component Intelligence can alert you when these components go bad so that your developers can immediately triage and remediate the problem.

FACTOR 2: INTERNAL COMPONENTS
In addition to consuming third-party components, most modern build tools also need a repository to which they can push the artifacts of the build. This is because the internal artifacts themselves are often sub-assemblies or other dependencies of yet another build. This makes hosting internally developed components an equally important capability of a Repository Manager.

There are several factors to consider when structuring your internal hosted repositories. You'll want to partition things into different repositories. Doing this effectively requires you to balance ease of administration (not having too many repositories) against challenges like security (covered later) and partitioning by use case.

A typical use case worth partitioning is temporal components. In Maven, these are formalized as snapshots and are required to be separated from releases. In other tools, the separation isn't baked in. Administration and cleanup will be easier if you keep the components that are constantly churning out short-lived versions separate from the ones you intend to keep for a long time or forever. This allows easier purge policies (discussed later), as well as optimizations in how you store the components on disk that allow easier backups of the permanent things, less block-level fragmentation, and fewer other I/O-related concerns.

COMPONENT STAGING
Another common use case for internal components is to manage them through a staging and promotion lifecycle.

When a component is pushed to a repository, that repository may not be its final destination. Imagine a workflow where a release candidate component needs to go through integration testing and QA processes. Only components that go through this process should be available to other teams or clients.

A Repository Manager can enable this workflow by providing mechanisms to associate components and promote them through various phases, where each phase may result in them being available to different users on different known URLs.

This type of functionality is often used in conjunction with an automated CI/CD pipeline, which is discussed in more depth later.
SUPPORTING DISTRIBUTED TEAMS
When teams that access the repositories are located in different locations or distributed across the globe, it is also important to provide access to all the components, both internal and third-party. Recall that the basic essence of a Repository Manager is a caching proxy; to reap the benefits, you really want to have one located in each physical location where you have more than a few developers. Otherwise those developers may suffer slow and unreliable builds because they are fetching components from the internet and/or across the WAN.

In some cases, you will want to pre-emptively replicate some content to another location, either to prime the repo

Repository Managers have become integral to the DevOps pipeline and are included in almost every reference architecture found in organizations around the world.

Any component or build artifact that is produced or needed in the CI/CD process is stored in a Repository Manager. Repositories are integrated with Jenkins, Maven, Gradle, Puppet, Chef, and almost every other tool in the DevOps toolchain. Rundeck, for example, orchestrates the deployment of applications to production and relies on a Repository Manager to get the components it needs for deployment. Repository Managers are central and critical to implementing modern DevOps environments.
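
Tools in the pipeline usually locate a component in a repository by its coordinates. As one hedged illustration, the Python sketch below builds a download URL using the standard Maven repository layout (group ID with dots turned into path segments, then artifact ID, version, and file name). The function name, the base URL, and the coordinates are made up for the example; other component formats use different layouts.

```python
from urllib.parse import quote


def maven_artifact_url(base_url, group_id, artifact_id, version, extension="jar"):
    """Build a download URL following the standard Maven repository layout.

    base_url is assumed to point at a repository hosted or proxied by a
    Repository Manager; the value used in any call is up to the caller.
    """
    group_path = group_id.replace(".", "/")  # com.example -> com/example
    file_name = f"{artifact_id}-{version}.{extension}"
    parts = [group_path, artifact_id, version, file_name]
    # quote() leaves "/" untouched by default, so the group path survives.
    path = "/".join(quote(part) for part in parts)
    return f"{base_url.rstrip('/')}/{path}"
```

A deployment job could call this with the base URL of the nearest Repository Manager instance, so the same coordinates resolve to a local cache at every site.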
CI/CD PIPELINES
WHAT IS CONTINUOUS INTEGRATION?
With the advent of lean, agile, and more recently, Continuous Delivery and DevOps, projects no longer incubate for months in a waterfall-like development process. Instead, they undergo constant changes and releases. In many cases, these projects are distributed throughout an organization at different stages of the development lifecycle.

This always-on development means the volume of changes alone necessitates an automated approach to building applications. To support developers at scale, several tools have been developed to automate much of what goes into building and releasing applications. This includes the ability to package applications or components generated during the build, and to do so continuously as changes are made. Colloquially this is referred to as Continuous Integration, or CI.

PIPELINE
Building on the automation benefits of CI servers, teams are now able to completely customize what goes where, when, and how. In other words, teams add various checkpoints throughout the process to ensure applications are free of major defects. This isn't merely quality assurance anymore, but rather governance to avoid vulnerability, license, and architectural issues in the applications and components a team produces. The result is a pipeline-like approach.

REPOSITORY MANAGERS AND CI TOOLS
Repository Managers provide several interaction points with CI tools like Jenkins and Bamboo. This can range from simply requesting and storing proxied copies of components as part of the build process, to publishing internally developed components for distribution across a development organization.

BUILDING PROJECTS
When an application is built using Maven (or similar build tools), it gathers the components from a specified location (configured via Maven settings) and compiles them. The end result can be an application, or even another binary or component. Jenkins provides automation for this by allowing the inclusion of a Maven build step for freestyle and multi-configuration projects. When the build step is called, the Maven project will build, and if Maven has been configured to do so, it can request components from

PUBLISHING PROJECTS
For teams moving into more advanced continuous delivery and DevOps (pipeline) models, Jenkins provides support for pipeline-style projects. This becomes less about simply building or compiling projects (à la Maven or a similar tool), and more about automating tooling to take the necessary action and/or get things where they need to be throughout the development lifecycle.

For example, it could be building a Maven package and making sure it's published to a repository for testing, then removed once testing is complete. In more complex environments, it likely means multiple builds are taking place simultaneously and in parallel, with each build then assembled into a single package that is passed on through staging and eventually into production. In some instances, the product of those builds is moved forward. In others, once it's no longer needed, it's automatically removed.

The impact here is that it decouples the publishing process from the compilation tool, allowing greater customization and allowing components to be passed to the Repository Manager at any point in the development lifecycle, regardless of the development ecosystem.
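
The "publish for testing, then remove" pattern can be sketched in Python as follows. This is a simplified illustration, not a Jenkins or Repository Manager API: InMemoryRepository, published_for_testing, and run_pipeline are hypothetical names, and an in-memory dictionary stands in for a hosted repository that a real pipeline would reach over HTTP.

```python
from contextlib import contextmanager


class InMemoryRepository:
    """Stand-in for a hosted repository (assumption: real ones use HTTP)."""

    def __init__(self):
        self._components = {}

    def publish(self, key, payload):
        self._components[key] = payload

    def remove(self, key):
        self._components.pop(key, None)

    def contains(self, key):
        return key in self._components


@contextmanager
def published_for_testing(repo, key, payload):
    """Publish a build product, and always remove it when testing ends."""
    repo.publish(key, payload)
    try:
        yield key
    finally:
        # Runs whether the tests pass, fail, or raise.
        repo.remove(key)


def run_pipeline(repo, key, payload, test):
    """Publish the payload, run the test step against it, then clean up."""
    with published_for_testing(repo, key, payload):
        return test(repo, key)
```

Because the removal sits in a finally clause, the temporary component is cleaned up even when the test step raises, which keeps short-lived build products from accumulating in the repository.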
ABOUT THE AUTHOR
BRIAN FOX is the co-founder and CTO of Sonatype, and is also a member of the Apache
Software Foundation and former Chair of the Apache Maven project. As a direct contributor
to the Maven ecosystem, including the maven-dependency-plugin and maven-enforcer-plugin,
he has over 20 years of experience driving the vision behind, as well as developing and leading
the development of software for organizations ranging from startups to large enterprises. Brian
is a frequent speaker at national and regional events including Java User Groups and other
development related conferences.
Copyright 2017 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.