
Dzone Rc181 Repositorymanager

Repository Manager


BROUGHT TO YOU IN PARTNERSHIP WITH

181

Using Repository Managers
The best way to organize, store, and distribute software components.

UPDATED BY BRIAN FOX
ORIGINAL BY CARLOS SANCHEZ

CONTENTS
Repository Requirements
Third-party Binary Management
Internal Components
Security and Maintenance
Repository Managers in the DevOps Toolchain
...and more!

INTRODUCTION
Software development depends upon two distinct kinds of components: (1) source code, and (2) binary components. This Refcard assumes basic familiarity with source control management and is intended to help you design and configure a Repository Manager to store, manage, and share binaries, optimize it for various workflows, and fit it smoothly into your DevOps pipeline.

Repository Managers are typically available as free OSS and paid professional versions. Most organizations will want to implement a paid professional version as they grow and mature their DevOps practices.

BASIC FEATURES (FREE)                               PRO FEATURES (PAID)
Proxy public repositories                           Enterprise support
Host local binaries or components                   High availability
Group repositories for access rights and security   Component intelligence

INTRODUCTION: REPOSITORY REQUIREMENTS
A binary component is the output of any step in the development process. Many components result from builds, but other types are crucial as well. Common component types include:

ZIP or tarball files
RPM or DEB packages (Linux)
JAR, WAR, and EAR packages (Java)
npm (JavaScript)
NuGet packages (.NET)
RubyGems (Ruby)
PyPI packages (Python)
Docker Images
DLLs (Windows)
Source packages
Documentation packages

THE TWO TYPES OF PACKAGES
The various types listed above are clustered into two groups: (1) source and (2) binary component. And while it is possible to use a source repository to store components, some crucial differences between these two super-types make this solution non-ideal.

WHY SOURCE REPOSITORIES?
Source repositories are designed simply to manage source code. A well-built source repository therefore boasts a feature set tailored to source code management (e.g. differing versions, tracking deleted or overwritten files, branching, and tagging).

WHY (BINARY) REPOSITORY MANAGERS?
Repository Managers are to binaries what source repositories or VCS (Version Control Systems) are to sources. Where source repositories deal with relatively small code files that change constantly and are often cloned with abandon, binary repositories manage a completely different workflow. Repository managers provide one source of truth for the binaries used in waterfall, agile, and CI/CD processes.

Binary components are often orders of magnitude larger than source files.

From the point of view of the developer (though not the designer), binary packages don't need to be diffed.

Except in rare situations (e.g. snapshots and nightly builds), binary components are not deleted or overwritten.

DZONE.COM | DZONE, INC. VISIT DZONE.COM/REFCARDZ FOR MORE!


Binary components usually need to store lots of metadata (package name, version, license, etc.).

WHEN TO USE A REPOSITORY MANAGER
To begin, let's clarify that a Repository Manager is different from a repository. The Repository Manager is in charge of managing multiple repositories that it hosts, each with a set of specific functions and permissions. The design of your Repository Manager will help specify the role each repository serves for the development, QA/test, and operations tools and teams that access them.

The choice of how to best interact with dependencies, either in source or binary form, is usually predicated on what build system you're using. In fact, the ability to reuse binary dependencies without having to recompile them is usually a key criterion in selection of the build tool in the first place.

PLATFORM    BUILD TOOL WITH REPOS
JVM         Maven, Ivy, Maven Ant Tasks, Gradle
.NET        NuGet
OSGi        P2
Linux       Yum
Docker      Docker

While each different build tool or component format may have its own purpose-built repository or Repository Manager, nearly all organizations are heterogeneous in terms of languages/build tools and component formats. It's not uncommon to find multiple tools at play even within a single application these days. Take for example a Java application built with Maven, with a JavaScript UI with dependencies fetched from an npm repository, ultimately distributed to testing and production as a Docker container. That's three tools, formats, and repositories in a single application. Do you want to manage three separate servers, with their own idiosyncrasies, requiring backup, permissions, etc.?

A Repository Manager is a hub for development teams across the whole organization, centralizing the management of all the components generated and used by the organization. The inevitable resulting diversity of component types, and their differing positions in the overall workflow, is one major reason to use a dedicated Repository Manager, rather than just a simple file server.

Thus, the decision to use a Repository Manager generally revolves around how many repositories you need, what types of component formats are dictated by those repositories and build tools, and what higher level functionality you need above the bare minimum repository. The common factors to consider are discussed below.

FACTOR 1: THIRD-PARTY BINARY MANAGEMENT

PROXYING BINARIES
At some point, in most projects, you'll use third-party components that are hosted in a repository external to your organization. Network latency and bandwidth will affect development speed directly, especially when your external components are (in some cases, gigantic) binaries, even if your team is fully on-premises.

Now imagine you need to work every day with the latest build of several dependencies and each takes several minutes to download, possibly several times a day. Now consider a long chain of dependencies, and you're immediately (and with no payoff on the development side) in component download dependency hell.

Further, external dependencies introduce an element of unnecessary risk simply because you can't control access to them. To remove this risk, configure your Repository Manager to proxy these files. Keep a copy in your private repository; then dependency availability will be up to you. You can also apply your own backup and availability policies, guaranteeing access to the components even if the upstream repository goes down, or they disappear on the upstream repository. For those of you who remember npm gate in 2016, this became a real-world issue for thousands of development teams.

At its most basic essence, a Repository Manager is a caching proxy of these remote repositories. The cached components can be served rapidly to other machines on the same network after the initial request, either to human coders or directly to Continuous Integration (CI) servers themselves. The ability to cache the things you need locally isolates you from the inevitable network latency, internet connectivity issues, or components randomly disappearing from an improperly managed remote repository à la npm gate.

ADVANCED MANAGEMENT OF THIRD-PARTY COMPONENTS
Some organizations may have a policy about what third-party dependencies may be used because of licensing or security concerns.

Here's a common example: third-party components need to be requested by a developer and approved by a legal department. Frequently we see attempts to manage this process by a simple whitelist / blacklist approach. This is bound to fail for several reasons:

The breadth and volume of third-party components is staggering. A typical enterprise can easily consume many hundreds of thousands of components that each release

new versions four times a year on average. Having humans review each one is simply impossible.

Delays in human reviews of the list cause developers to do one of two things: 1) stop updating dependencies because the friction is too high, leading to increased exposure to vulnerabilities over time and making it harder to upgrade later, or 2) work around the system to get their job done, leading to decreased visibility of what is actually in use and ultimately defeating the entire purpose of the process.

Fortunately, there are ways to deal with this that don't have unintended side effects.

COMPONENT INTELLIGENCE
Some professional versions of Repository Managers include health checks to provide instant insight into potential component security, license, and quality risks so that development teams can take corrective action early and quickly. This intelligence can help organizations identify known security, license, and architectural issues for each component. Health check capabilities can be used as an automated audit tool for build managers, architects, open source governance, security, and legal professionals.

COMPONENT FIREWALL
Building upon the component intelligence, some Repository Managers can provide a form of Firewall capability. It becomes possible to automate the decisions of what components to allow into the organization by using the intelligence and combining it with a rules engine.

This allows you to stop bad components (e.g. ones with already existing known vulnerabilities, or ones with licenses incompatible with your business model) from being proxied and integrated only to be ripped out later. This model is the only one that can scale without requiring an army of human reviewers.

Further, components that are known to be good when they are first used become bad later when new vulnerabilities are discovered. In a manual review process, almost no organization can keep up with new requests and thus no one ever goes back to check the things previously requested and approved. The automated rules engine and continuously updated Component Intelligence can alert you when these components go bad so that your developers can immediately triage and remediate the problem.

FACTOR 2: INTERNAL COMPONENTS
In addition to consuming third-party components, most modern build tools also need a location to push the artifacts of the build to a repository. This is done because the internal artifacts themselves are often sub-assemblies or otherwise dependencies of yet another build. This makes hosting internally developed components an equally important capability of a Repository Manager.

There are several factors to consider when structuring your internal hosted repositories. You'll want to partition things into different repositories. Doing this effectively requires you to balance ease of administration by not having too many repositories vs. challenges like security (covered later) as well as partitioning by use case.

A typical use case worth partitioning is for temporal components. In Maven, these are formalized as snapshots and are required to be separated from releases. In other tools, the separation isn't baked in. Administration and cleanup will be easier if you keep the components that are constantly churning out short-lived versions separate from the ones you intend to keep for a long time or forever. This allows easier purge policies (discussed later), as well as optimizations in how you store the components on disk that allow easier backups of the permanent things, less block level fragmentation, and other I/O-related concerns.

COMPONENT STAGING
Another common use case for internal components is to manage them through a staging and promotion lifecycle.

When a component is pushed to a repository, that repository may not be its final destination. Imagine a workflow where a release candidate component needs to go through integration testing and QA processes. Only components that go through this process should be available for other teams or clients.

A Repository Manager can enable this workflow by providing mechanisms to associate components and promote them through various phases, where each phase may result in them being available to different users on different known URLs.

This type of functionality is often done in conjunction with an automated CI/CD pipeline that is discussed in more depth later.
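
The staging and promotion lifecycle described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how any particular Repository Manager implements staging: the phase names, the base URL, and the PromotionWorkflow class are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical phase names, in promotion order.
PHASES = ["staging", "qa", "release"]
# Hypothetical base URL; each phase is exposed under its own path.
BASE_URL = "https://repo.example.com/repository"


@dataclass
class PromotionWorkflow:
    # Maps each phase to the set of component identifiers visible there.
    repos: Dict[str, Set[str]] = field(
        default_factory=lambda: {phase: set() for phase in PHASES}
    )

    def push(self, component: str) -> None:
        """A build pushes a release candidate into the first phase."""
        self.repos[PHASES[0]].add(component)

    def phase_of(self, component: str) -> Optional[str]:
        """Return the phase that currently holds the component, if any."""
        for phase in PHASES:
            if component in self.repos[phase]:
                return phase
        return None

    def promote(self, component: str, checks_passed: bool) -> str:
        """Move the component one phase forward only if its checks passed."""
        current = self.phase_of(component)
        if current is None:
            raise KeyError(f"unknown component: {component}")
        index = PHASES.index(current)
        if not checks_passed or index == len(PHASES) - 1:
            return current  # stays where it is
        target = PHASES[index + 1]
        self.repos[current].remove(component)
        self.repos[target].add(component)
        return target

    def url_for(self, component: str) -> str:
        """Each phase exposes the component on a different known URL."""
        phase = self.phase_of(component)
        if phase is None:
            raise KeyError(f"unknown component: {component}")
        return f"{BASE_URL}/{phase}/{component}"
```

A component that fails integration testing stays in its current phase, so other teams resolving against the later phases never see it.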

FACTOR 3: SECURITY AND MAINTENANCE

AUTHENTICATION AND AUTHORIZATION
Since the Repository Manager stores project-related binaries, the same permissions enforced for the projects themselves (such as the source code access permissions) should be used for protecting the resulting binaries. In some cases, access to the binaries may be granted without granting access to the source and this can be managed at the repository level.

To simplify and centralize user management, configure your Repository Manager to integrate with other organization systems such as LDAP, Active Directory, or single sign on servers (SSO).

As with source traceability, binary traceability is equally important. Track changes in the repository (such as which user uploaded a component and when, or who is downloading components) for audit purposes.

PURGING POLICIES
Although most components are usually kept for a long time (the same as any other product or distribution), there are some cases when we can benefit from purging repository contents.

Snapshot repositories need to be purged from time to time to ensure reasonable disk usage, especially when using Continuous Integration heavily, since CI can easily generate dozens of new builds per day. Usually, snapshots can be purged when a new version is released, but that may be changed to just keep the most recent snapshots.

Proxied repositories for third-party components can also be purged when the components are not being used by any release, for instance, for components used during a proof of concept that is discarded. In these cases, it is a good practice to separate the components being used in production from the components used during development for trials or proof of concept (this can also be done during promotion: promote not only the built components, but also the dependencies). This will considerably simplify management downstream.

SUPPORTING DISTRIBUTED TEAMS
When teams that access the repositories are located in different locations or distributed across the globe, it is also important to provide access to all the components, both internal and third-party. Recall that the basic essence of a Repository Manager is a caching proxy and therefore to reap the benefits, you really want to have one located in each physical location where you have more than a few developers. Otherwise those developers may suffer slow and unreliable build times because they are fetching components from the internet and/or across the WAN.

In some cases, you will want to pre-emptively replicate some content to another location, either to prime the repo for developers, or to achieve some level of HA or DR. Some Repository Managers will force the complexity of replication onto the admins by requiring that you configure point-to-point mirroring, often for each repository individually. Other solutions separate the notion of what components need to be exposed in what logical repositories from the notion of where and how they are located geographically so that the tool can do the work intelligently and dynamically.

HIGH AVAILABILITY
Using a Repository Manager to hold all your development dependencies also means that your repository is a central piece to your infrastructure; any downtime means halting development, with all the consequences. In a CI/CD environment, when a Repository Manager is not available, a build cannot execute nor deploy to production, which could be disastrous to the business or organization.

Today, advanced Repository Managers use a private binary cloud storage and backend for all components. This component fabric intentionally decouples the physical node topology from the logical component topology. Any component deployed or proxied on one Repository Manager is immediately available to all others since the component fabric shares the knowledge about new components and their metadata; no custom setup is required to replicate components between repositories.

DISASTER RECOVERY
The component fabric is also a critical piece in disaster recovery. In the case with a loss of network connectivity to a data center, the component fabric stores all data in separate nodes in other datacenters, so Repository Manager requests can be directed to other Repository Managers. When the network issue is fixed and the data center comes back online, any new components and data in the fabric are automatically synced with the datacenter that is now available again.

FACTOR 4: REPOSITORY MANAGERS IN THE DEVOPS TOOLCHAIN
Repository Managers have become integral to the DevOps pipeline and are included in almost every reference architecture found in organizations around the world.

Any component or build artifact that is produced or needed in the CI/CD process is stored in a Repository Manager. Repositories are integrated to Jenkins, Maven, Gradle, Puppet, Chef, and almost every other tool in the DevOps toolchain. Rundeck, for example, orchestrates the deployment of applications to production and relies on a Repository Manager to get the components it needs for deployment. Repository Managers are central and critical to implementing modern DevOps environments.
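
To make the disaster-recovery behavior described above more concrete from the point of view of a deployment tool, here is a minimal sketch of a client that tries several Repository Manager nodes in turn. It is an illustration only: the text describes redirection handled by the component fabric, whereas this sketch assumes the client itself holds a list of nodes. The function name and the idea of passing each node as a callable are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Each node is modeled as a callable that returns the component bytes for a
# path, or raises ConnectionError when its data center is unreachable.
Node = Callable[[str], bytes]


def fetch_with_failover(path: str, nodes: Iterable[Node]) -> bytes:
    """Return the component from the first node that can serve it."""
    last_error: Optional[ConnectionError] = None
    for fetch in nodes:
        try:
            return fetch(path)
        except ConnectionError as err:
            # This node is down; remember why and try the next one.
            last_error = err
    raise ConnectionError(
        f"no Repository Manager node could serve {path}"
    ) from last_error
```

Because every node can serve the same components, the order of the list only affects latency, not which components are available.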

CI/CD PIPELINES

WHAT IS CONTINUOUS INTEGRATION?
With the advent of lean, agile, and more recently, Continuous Delivery and DevOps, projects no longer incubate for months in a waterfall-like development process. Instead, they undergo constant changes and releases. In many cases, these projects are distributed throughout an organization at different stages of the development lifecycle.

This always-on development means the volume of changes alone necessitates an automated approach to building applications. To support developers at scale, several tools have been developed to automate much of what goes into building and releasing applications. This includes the ability to package applications or components generated during build, and do so continuously as changes are made. Colloquially this is referred to as Continuous Integration, or CI.

PIPELINE
Building on the automation benefits from CI servers, teams are now able to completely customize what goes where, when, and how. In other words, teams add in various checkpoints throughout the process to ensure applications are free of major defects. This isn't merely quality assurance anymore, but rather governance to avoid vulnerability, license, and architectural issues in the applications and components a team produces. The result is a pipeline-like approach.

REPOSITORY MANAGERS AND CI TOOLS
Repository managers provide several interaction points with CI tools like Jenkins and Bamboo. This can range from simply requesting and storing a proxy of components as part of the build process, to publishing internally developed components for distribution across a development organization.

BUILDING PROJECTS
When an application is built using Maven (or similar build tools), it gathers the components from a specified location (configured via Maven settings) and compiles them. The end result can be an application, or even another binary or component. Jenkins provides automation for this by allowing the inclusion of a Maven build step for freestyle and multi-configuration projects. When the Build step is called, the Maven project will build, and if Maven has been configured to do so, it can request components from a Repository Manager. In a similar fashion, using Maven's deploy phase (the Maven build lifecycle has phases such as clean, compile, test, package, install, and deploy, each of which is carried out by plugin goals), components can be published as a build step to a desired location. It's important to note that this is Maven functionality and not something unique to Jenkins or any other system.

PUBLISHING PROJECTS
For teams moving into more advanced, continuous delivery and DevOps (Pipeline) models, Jenkins provides support for pipeline-style projects. This becomes less about simply building or compiling projects (à la Maven or similar tool), and more about automating tooling to take necessary action and/or get things where they need to be throughout the development lifecycle.

For example, it could be building a Maven package and making sure it's published to a repository for testing, then removed once testing is complete. In more complex environments it likely means multiple builds are taking place simultaneously, and in parallel, then assembling each build together in a single package that is passed on through to staging and eventually into production. In some instances, the product of those builds is moved forward. In others, once it's no longer needed it's automatically removed.

The impact here is that it decouples the publishing process from the compilation tool, allowing greater customization and for components to be passed to the Repository Manager at any point in the development lifecycle, regardless of the development ecosystem.
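
The publish, test, and remove flow described above can be sketched independently of any build tool. The example below is a minimal illustration, not a Jenkins or Maven API: the run_pipeline function, the callables it takes, and the use of plain dictionaries as stand-ins for a test repository and a staging repository are all assumptions made for the sketch.

```python
from typing import Callable, Dict


def run_pipeline(
    name: str,
    build: Callable[[], bytes],
    test: Callable[[bytes], bool],
    test_repo: Dict[str, bytes],
    staging_repo: Dict[str, bytes],
) -> bool:
    """Build a package, publish it for testing, then move it on or drop it."""
    artifact = build()

    # Publish the build output to a repository used only for testing.
    test_repo[name] = artifact
    try:
        # Tests consume the published copy, not the local build output.
        passed = test(test_repo[name])
    finally:
        # Once testing is complete the temporary copy is removed.
        del test_repo[name]

    if passed:
        # The product of the build is moved forward to staging.
        staging_repo[name] = artifact
    return passed
```

Because publishing is a pipeline step rather than part of the compile step, the same function works for any build callable, which is the decoupling the text describes.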

CONTAINER CONSIDERATIONS
Docker containers and their usage have revolutionized the way applications and the underlying operating system are packaged and deployed to development, testing, and production systems. Docker Hub is the public registry for Docker container images and it is being joined by more and more other publicly available registries such as the Google Container Registry.

In many ways, a container is just like other components, with a few key differences:

1. They can be huge. A Java component might be a few hundred kilobytes, but a container image can be many gigabytes or larger. This can put a strain on the underlying disk infrastructure if size and volume is not anticipated.

2. They are composed of many layers. Each layer is effectively a file system delta of changes upon the layers below it. This makes it interesting because all but the lowest level layer are unable to stand alone.

Repository Managers also offer support for Docker containers. The Repository Manager acts as a private Docker registry that is capable of hosting proprietary containers as well as proxying the public registries when non-proprietary containers need to be downloaded. You can expose these Docker repositories to the client-side tools directly or as a repository group, which is a repository that merges and exposes the contents of multiple repositories in one convenient URL.

This allows you to reduce time and bandwidth usage for accessing Docker images in a private registry as well as share proprietary images within your organization in a hosted repository. Users can then launch containers based on those images, resulting in a completely private Docker registry with all the features available in the Repository Manager.

Think of the Repository Manager as a heterogeneous location to store and manage your Docker images, open source software components, and other build artifacts. By comparison, other private container registry solutions act as homogeneous repositories.
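
The combination of hosted, proxy, and group repositories described above can be sketched with three small classes. This is a simplified in-memory model, not the Docker registry protocol or any product's API: the class names, the get method, and the image names in the usage note are hypothetical, and the upstream registry is represented by a plain function.

```python
from typing import Callable, Dict, List, Optional, Protocol


class Repo(Protocol):
    def get(self, name: str) -> Optional[bytes]: ...


class HostedRepo:
    """Holds proprietary images published inside the organization."""

    def __init__(self, images: Dict[str, bytes]) -> None:
        self._images = dict(images)

    def get(self, name: str) -> Optional[bytes]:
        return self._images.get(name)


class ProxyRepo:
    """Caches images from a public registry after the first request."""

    def __init__(self, fetch_upstream: Callable[[str], Optional[bytes]]) -> None:
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, name: str) -> Optional[bytes]:
        if name not in self._cache:
            self.upstream_calls += 1
            blob = self._fetch_upstream(name)
            if blob is None:
                return None  # not available upstream either
            self._cache[name] = blob
        return self._cache[name]


class GroupRepo:
    """Exposes several repositories behind one lookup point."""

    def __init__(self, members: List[Repo]) -> None:
        self._members = list(members)

    def get(self, name: str) -> Optional[bytes]:
        # Members are consulted in order; the first hit wins.
        for member in self._members:
            blob = member.get(name)
            if blob is not None:
                return blob
        return None
```

With a group built as GroupRepo([hosted, proxy]), a client asks one place for both a proprietary image and a public base image, and a repeated request for the public image is served from the proxy's cache without contacting the upstream registry again.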

ABOUT THE AUTHOR
BRIAN FOX is the co-founder and CTO of Sonatype, and is also a member of the Apache
Software Foundation and former Chair of the Apache Maven project. As a direct contributor
to the Maven ecosystem, including the maven-dependency-plugin and maven-enforcer-plugin,
he has over 20 years of experience driving the vision behind, as well as developing and leading
the development of software for organizations ranging from startups to large enterprises. Brian
is a frequent speaker at national and regional events including Java User Groups and other
development related conferences.


DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects and decision makers. DZone offers something for everyone, including news, tutorials, cheat sheets, research guides, feature articles, source code and more.
"DZone is a developer's dream," says PC Magazine.

DZONE, INC.
150 PRESTON EXECUTIVE DR.
CARY, NC 27513
888.678.0399
919.678.0300

REFCARDZ FEEDBACK WELCOME
[email protected]

SPONSORSHIP OPPORTUNITIES
[email protected]

Copyright 2017 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
