
AMIT KUMAR SINGH (MCA)

Allana Institute Of Management Sciences

Course Code: IT-41    Course Name: DevOps

Unit 1: Introduction to DevOps (Weightage: 10%)
1.1 Define DevOps
1.2 What is DevOps
1.3 SDLC models, Lean, ITIL, Agile
1.4 Why DevOps?
1.5 History of DevOps
1.6 DevOps Stakeholders
1.7 DevOps Goals
1.8 Important terminology
1.9 DevOps perspective
1.10 DevOps and Agile
1.11 DevOps Tools
1.12 Configuration management
1.13 Continuous Integration and Deployment
1.14 Linux OS Introduction
1.15 Importance of Linux in DevOps
1.16 Linux Basic Command Utilities
1.17 Linux Administration
1.18 Environment Variables
1.19 Networking
1.20 Linux Server Installation
1.21 RPM and YUM Installation

Unit 2: Version Control - GIT (Weightage: 15%)
2.1 Introduction to GIT
2.2 What is Git
2.3 About Version Control Systems and their types
2.4 Difference between CVCS and DVCS
2.5 A short history of GIT
2.6 GIT Basics
2.7 GIT Command Line
2.8 Installing Git
2.9 Installing on Linux
2.10 Installing on Windows
2.11 Initial setup
2.12 Git Essentials
2.13 Creating a repository
2.14 Cloning, check-in and committing
2.15 Fetch, pull and remote
2.16 Branching
2.17 Creating branches, switching branches, and merging branches

Unit 3: Chef for Configuration Management (Weightage: 25%)
3.1 Overview of Chef: common Chef terminology (server, workstation, client, repository, etc.); servers and nodes; Chef configuration concepts
3.2 Workstation setup: how to configure knife; execute some commands to test the connection between knife and the workstation
3.3 Organization setup: create an organization; add yourself and a node to the organization
3.4 Test node setup: create a server and add it to the organization; check node details using knife
3.5 Node objects and search: how to add a run list to a node; check node details
3.6 Environments: how to create environments; add servers to environments
3.7 Roles: create roles; add roles to the organization
3.8 Attributes: understanding attributes; creating custom attributes; defining them in cookbooks
3.9 Data bags: understanding data bags; creating and managing data bags; creating data bags using the CLI and the Chef console; sample data bags for creating users

Unit 4: Build Tool - Maven (Weightage: 20%)
4.1 Maven Installation
4.2 Maven Build requirements
4.3 Maven POM Builds (pom.xml)
4.4 Maven Build Life Cycle
4.5 Maven Local Repository (.m2)
4.6 Maven Global Repository
4.7 Group ID, Artifact ID, Snapshot
4.8 Maven Dependencies
4.9 Maven Plugins

Unit 5: Docker - Containers (Weightage: 30%)
5.1 Introduction: what is Docker; use cases of Docker; platforms for Docker; Docker vs. virtualization
5.2 Architecture: Docker architecture; understanding the Docker components
5.3 Installation: installing Docker on Linux; understanding installation of Docker on Windows; some Docker commands; provisioning
5.4 Docker Hub: downloading Docker images; uploading images to the Docker Registry and AWS ECS; understanding containers; running commands in a container; running multiple containers
5.5 Custom images: creating a custom image; running a container from the custom image; publishing the custom image
5.6 Docker Networking: accessing containers; linking containers; exposing container ports; container routing

Total Weightage: 100%

CHAPTER 1: Introduction to DevOps
Define DevOps
DevOps is the combination of cultural philosophies, practices, and tools that
increases an organization’s ability to deliver applications and services at high
velocity: evolving and improving products at a faster pace than organizations
using traditional software development and infrastructure management
processes. This speed enables organizations to better serve their customers and
compete more effectively in the market.

How DevOps Works


Under a DevOps model, development and operations teams are no longer
“siloed.” Sometimes, these two teams are merged into a single team where the
engineers work across the entire application lifecycle, from development and test
to deployment to operations, and develop a range of skills not limited to a single
function.

In some DevOps models, quality assurance and security teams may also become
more tightly integrated with development and operations and throughout the
application lifecycle. When security is the focus of everyone on a DevOps team,
this is sometimes referred to as DevSecOps.

These teams use practices to automate processes that historically have been
manual and slow. They use a technology stack and tooling which help them
operate and evolve applications quickly and reliably. These tools also help
engineers independently accomplish tasks (for example, deploying code or
provisioning infrastructure) that normally would have required help from other
teams, and this further increases a team’s velocity.

Benefits of DevOps
Move at high velocity so you can innovate for customers faster, adapt to changing markets better, and
grow more efficient at driving business results. The DevOps model enables your developers and
operations teams to achieve these results. For example, microservices and continuous delivery let teams
take ownership of services and then release updates to them quicker.

Increase the frequency and pace of releases so you can innovate and improve your product faster. The
quicker you can release new features and fix bugs, the faster you can respond to your customers’ needs
and build competitive advantage. Continuous integration and continuous delivery are practices that
automate the software release process, from build to deploy.

Ensure the quality of application updates and infrastructure changes so you can reliably deliver at a more
rapid pace while maintaining a positive experience for end users. Use practices like continuous
integration and continuous delivery to test that each change is functional and safe. Monitoring and
logging practices help you stay informed of performance in real-time.

Operate and manage your infrastructure and development processes at scale. Automation and
consistency help you manage complex or changing systems efficiently and with reduced risk. For
example, infrastructure as code helps you manage your development, testing, and production
environments in a repeatable and more efficient manner.

Build more effective teams under a DevOps cultural model, which emphasizes values such as ownership
and accountability. Developers and operations teams collaborate closely, share many responsibilities, and
combine their workflows. This reduces inefficiencies and saves time (e.g. reduced handover periods
between developers and operations, writing code that takes into account the environment in which it is
run).

Move quickly while retaining control and preserving compliance. You can adopt a DevOps model without
sacrificing security by using automated compliance policies, fine-grained controls, and configuration
management techniques. For example, using infrastructure as code and policy as code, you can define
and then track compliance at scale.

What is DevOps
DevOps combines development and operations to increase the efficiency, speed, and security of
software development and delivery compared to traditional processes. A more nimble software
development lifecycle results in a competitive advantage for businesses and their customers.

DevOps explained

DevOps can be best explained as people working together to conceive, build and deliver secure
software at top speed. DevOps practices enable software development (dev) and operations (ops)
teams to accelerate delivery through automation, collaboration, fast feedback, and iterative
improvement.
Stemming from an Agile approach to software development, a DevOps process expands on the
cross-functional approach of building and shipping applications in a faster and more iterative
manner. In adopting a DevOps development process, you are making a decision to improve the flow
and value delivery of your application by encouraging a more collaborative environment at all
stages of the development cycle.
DevOps represents a change in mindset for IT culture. In building on top of Agile, lean practices, and
systems theory, DevOps focuses on incremental development and rapid delivery of software.
Success relies on the ability to create a culture of accountability, improved collaboration, empathy,
and joint responsibility for business outcomes.
Core DevOps principles

The DevOps methodology comprises four key principles that guide the effectiveness and efficiency of
application development and deployment. These principles, listed below, center on the best aspects
of modern software development.
1. Automation of the software development lifecycle. This includes automating testing, builds,
releases, the provisioning of development environments, and other manual tasks that can slow
down or introduce human error into the software delivery process.
2. Collaboration and communication. A good DevOps team has automation, but a great DevOps
team also has effective collaboration and communication.
3. Continuous improvement and minimization of waste. From automating repetitive tasks to
watching performance metrics for ways to reduce release times or mean-time-to-recovery, high
performing DevOps teams are regularly looking for areas that could be improved.
4. Hyperfocus on user needs with short feedback loops. Through automation, improved
communication and collaboration, and continuous improvement, DevOps teams can take a
moment and focus on what real users really want, and how to give it to them.
By adopting these principles, organizations can improve code quality, achieve a faster time to
market, and engage in better application planning.
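
To make the first principle concrete, here is a minimal, hypothetical Python sketch of an automated build-and-test step of the kind a team would script once and run on every change instead of performing it by hand. The specific commands and directory names are illustrative assumptions, not part of any particular toolchain.

    import subprocess
    import sys

    def run_step(name, command):
        """Run one automated step and abort if it fails."""
        print(f"--- {name} ---")
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            print(f"{name} failed; stopping the automated process.")
            sys.exit(result.returncode)

    if __name__ == "__main__":
        # Steps that used to be manual become repeatable, scripted commands.
        run_step("Unit tests", "python -m pytest tests/")    # assumes pytest is installed
        run_step("Build package", "python -m build")         # assumes the 'build' package
        print("Build and tests completed; the artifact is ready for release.")

Running the same script locally and on a build server keeps the two environments consistent, which is the point of automating the lifecycle.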

SDLC models, Lean, ITIL, Agile


Agile SDLC model is a combination of iterative and incremental process models with focus on process
adaptability and customer satisfaction by rapid delivery of working software product. Agile Methods
break the product into small incremental builds. These builds are provided in iterations. Each
iteration typically lasts from about one to three weeks. Every iteration involves cross-functional
teams working simultaneously on various areas such as:

Planning

Requirements Analysis
Design
Coding
Unit Testing and
Acceptance Testing.
At the end of the iteration, a working product is displayed to the customer and important
stakeholders.

What is Agile?
Agile model believes that every project needs to be handled differently and the existing methods
need to be tailored to best suit the project requirements. In Agile, the tasks are divided into time boxes
(small time frames) to deliver specific features for a release.

Iterative approach is taken and working software build is delivered after each iteration. Each build is
incremental in terms of features; the final build holds all the features required by the customer.


Agile thinking started early in software development and became popular
over time due to its flexibility and adaptability.

The most popular Agile methods include Rational Unified Process (1994), Scrum (1995), Crystal Clear,
Extreme Programming (1996), Adaptive Software Development, Feature Driven Development, and
Dynamic Systems Development Method (DSDM) (1995). These are now collectively referred to
as Agile Methodologies, after the Agile Manifesto was published in 2001.

Following are the Agile Manifesto principles −

Individuals and interactions − In Agile development, self-organization and motivation are
important, as are interactions like co-location and pair programming.
Working software − Demo working software is considered the best means of communication
with the customers to understand their requirements, instead of just depending on
documentation.
Customer collaboration − As the requirements cannot be gathered completely in the
beginning of the project due to various factors, continuous customer interaction is very
important to get proper product requirements.
Responding to change − Agile Development is focused on quick responses to change and
continuous development.
Agile Vs Traditional SDLC Models
Agile is based on adaptive software development methods, whereas traditional SDLC
models like the waterfall model are based on a predictive approach. Predictive teams in the traditional
SDLC models usually work with detailed planning and have a complete forecast of the exact tasks and
features to be delivered in the next few months or during the product life cycle.

Predictive methods entirely depend on the requirement analysis and planning done at the beginning
of the cycle. Any changes to be incorporated go through strict change control management and
prioritization.

Agile uses an adaptive approach where there is no detailed planning and there is clarity on future
tasks only in respect of what features need to be developed. There is feature driven development
and the team adapts to the changing product requirements dynamically. The product is tested very
frequently, through the release iterations, minimizing the risk of any major failures in future.

Customer interaction is the backbone of the Agile methodology, and open communication with
minimal documentation is a typical feature of an Agile development environment. The agile
teams work in close collaboration with each other and are most often located in the same
geographical location.

Lean, ITIL, Agile


Lean In DevOps

Lean is a systematic approach to eliminating waste. It was created in the manufacturing world through W.
Edwards Deming's work and Taiichi Ohno's Toyota Production System. It revolutionized the Japanese
industrial economy after World War II and later spread to the United States. The book "Lean
Software Development: An Agile Toolkit," by Mary and Tom Poppendieck, adapted these lean techniques
to software development. They identified seven principles of lean that apply to software. In line with the
just-in-time tenet of lean manufacturing, and aligned with the agile idea of being flexible, you try to move
fast but delay decisions, relying on fast feedback loops and shared context. Building integrity in from the
start informs the approach to continuous integration and testing.

The basic philosophy of lean is about identifying which actions you and your organization perform
that add value to the product or service you produce and which do not. Activities that don’t add

value are called waste. Lean recognizes three significant types of waste, and they all have Japanese
names: muda, muri, and mura. Muda is the primary form of waste, and it comes in two types: type
one, which is technically waste but necessary for some reason, like compliance, and type two, which
is just plain wasteful. The Poppendiecks also defined seven primary wastes that are endemic in
software development. These include bugs and delays, but also effort spent on features
that aren't needed. Toyota didn't take long to adapt lean to product development. A recent
popular adaptation of that is found in Eric Ries’s book “Lean Startup.” In the book, he proposes the
build-measure-learn loop as a variation of the usual Kaizen plan-do-check-act cycle. You focus on
delivering the minimum viable product to customers, get their feedback, and iterate from there
instead of trying to analyze what the perfect product would have been upfront. There are a variety of
techniques that go along with lean.

ITIL, ITSM, and SDLC

The Information Technology Infrastructure Library is a set of detailed practices for IT activities such
as IT service management and IT asset management that focus on aligning IT services with the needs
of the business. DevOps stands on the shoulders of giants. And there are a lot of concepts from the
various ITSM and SDLC frameworks and maturity models that are worth learning. Teams should be
organized around the standard ITIL processes, with sections for change management, supplier
management, incident management, etc. But when you implement these processes, you want to use
a lean and agile mindset and need to craft them in a way that's people first and that doesn't
introduce waste or bottlenecks into the value stream in the name of a standard or best practices.

IT service management is a realization that service delivery is an integral part of the overall software
development life cycle. Engineers should properly manage it from design, development, deployment,
and maintenance to retirement. In the past, the software development life cycles focused on code
writing and tended to stop at handoff, or if they mentioned deployment and maintenance, they went
into very little detail on them. In this way, ITSM is one of DevOps’ ancestors. ITIL was the first ITSM
framework. It launched the idea of ITSM. So many folks still speak about them as if they’re the same
thing, even though other ITSM frameworks, like COBIT, have emerged since. ITIL is a UK government
standard that grew out of the Thatcherism of the 1980s, when the UK's previously organically managed IT
assets were brought under common guidelines.

ITIL Guidelines

The UK government didn’t have a lot of alignment and standards. And so, their central IT division
published guidelines on managing services in the late eighties and early nineties. The UK's central IT
group did this first version of ITIL so well that it piqued interest outside the UK government. In 2001,
ITIL v2 was published with the explicit intent of being used by others. V3 was published in 2007 and
updated in 2011. It uses a process model-based view of controlling and managing services. It can be
said to inherit from Deming's plan-do-check-act cycle. ITIL recognizes four primary phases of the
service lifecycle: service strategy, design, transition, and operation. It has guidance for every kind of
IT process you've ever heard of, from incident management to portfolio management, to capacity
management to service catalogs. At the same time, all of the high-level principles of ITIL make sense.
It's designed to be a reasonably prescriptive and top-down framework. While it's not technically
against ITIL to do agile development or perform continuous integration and deployment or other
such practices, honestly, much of the culture, advice, and consultancy around ITIL assumes a
waterfall push-driven model of the world. But it certainly doesn’t have to be that way.

Why DevOps?
DevOps implementation varies with each company, depending on their goals, processes, and even
corporate cultures. We can, however, identify a number of core DevOps principles that most teams
follow. The advantages of DevOps include:

Fostering a collaborative environment through communication, mutual trust, sharing of skills and
ideas, and problem-solving.
Establishing a culture of end-to-end accountability, in which the entire team is responsible for
the outcomes and there is no finger-pointing between the "Dev" and "Ops" experts.
Focusing on continual improvement based on customer input and evolving technologies in
order to optimize product quality, cost, and delivery speed.
Using automation whenever possible to streamline and speed up development and deployment
processes, as well as to enhance efficiency and dependability.
Providing a client-centric strategy with quick feedback loops to meet changing customer
needs.
Taking lessons from mistakes and fostering an environment where they can be turned into new
opportunities.

What is the importance of DevOps? Key DevOps Benefits

The most important DevOps benefits are discussed below

Reduced time to market:


DevOps is important for your company because it allows you to produce software faster through
improved procedures, automation, and release planning, among other things. If you have a shorter
time to market, you have a better chance of beating your competitors.

Faster innovation:
One of the DevOps benefits is faster innovation. Because of speedier product delivery to the market,
you can innovate faster than your competition. The DevOps culture allows the team to openly
contribute ground-breaking ideas and communicate their thoughts in real-time.

Increased efficiency in development:


DevOps eliminates the need for software engineers to spend time on things that are perfectly
automated. The quantity of manual labor is kept to a bare minimum. Parallel workflows, acceleration
tools, scalable infrastructure, continuous integration servers, and much more all help to ensure
efficient development and deployment.

Higher reliability:
The development, deployment, and other processes become more reliable and less prone to errors.
With DevOps and continuous testing ensuring faster development cycles, the team can quickly
identify any inconsistencies or problems in the program. It's simple to address issues swiftly thanks to
good communication and sharing of experience. It's also quite simple to undo a deployment at any
point.

Customer satisfaction:
Another significant argument for the importance of DevOps is that the customer-centric approach,
regular feedback, shorter time to market, and continuous improvement all lead to the most fulfilling
software development outcomes.

The Future of DevOps


DevOps' future will almost certainly bring changes in tooling and organizational techniques, but its
primary objective will remain the same.

Automation will play a major role


Automation will continue to play a key role in DevOps transformation, and artificial intelligence for
IT operations (AIOps) will help businesses achieve their DevOps goals. The essential features of
AIOps are machine learning, performance baselines, anomaly detection, automated root cause analysis
(RCA), and predictive insights, which all work together to speed up typical operational processes.
This novel technology has the potential to transform how IT operations teams monitor alerts and fix
issues, and will play an important role in DevOps' future.

AIOps will make service uptime easier to achieve


AIOps ingests metrics and employs inference models to draw meaningful insights from data, in
addition to leveraging data science and computational techniques to automate routine operations.
AIOps' automated capabilities may make service uptime much easier to achieve, from monitoring to
alerting to remediation. AIOps solutions can be utilized by DevOps teams for real-time event stream
analysis, proactive detection to prevent downtime, improved communication, faster deployments, and more.

DevOps teams will sharpen focus on cloud optimization

DevOps in the future will place a larger emphasis on maximizing the usage of cloud technology.
According to Deloitte Consulting analyst David Linthicum, the cloud's centralized nature enables
DevOps automation with a consistent platform for testing, deployment, and production.

Regardless of what new technologies the future brings, enterprises must understand that DevOps is all
about the journey and that their DevOps-related goals and expectations will change over time.

History of DevOps

The origins of DevOps trace back to when the term "DevOps" was first coined by Patrick
Debois in 2009, which is regarded as the DevOps origin year. Debois is now regarded as one of the
pioneering figures of DevOps and has gained significance over the years as one of its gurus, with
more and more organizations integrating DevOps into their operations. To answer the
looming question of when DevOps started, let us first look at how the term was formulated. The
term itself comes from the combination of the words "development" and
"operations," which provides a fundamental point for comprehending what
exactly people mean when they refer to "DevOps." One important thing you need to know about
the DevOps methodology is that it isn't a single technology, process, or standard.

DevOps is often referred to as a cultural viewpoint. Most importantly, the true meaning of
DevOps has widened to become an umbrella term referring to the culture, processes, and mindset
used for optimizing and shortening the life cycle of software development with the help of fast
feedback loops for offering features, updates, and fixes at a frequent pace. To know more about the
DevOps culture and find the answer to the question 'when did DevOps start?', you can check out
some of the best DevOps courses online to help you acquire a comprehensive idea about what it
is, how it functions, and how it is implemented.

The DevOps history is quite an interesting one considering how it was first implemented into the
workflow systems of organizations at large. In this section, we will discuss in detail when DevOps
started, along with the history and evolution of DevOps.

DevOps Stakeholders

DevOps (Development and Operations) involves a collaborative approach to software development
and deployment, emphasizing communication, integration, and automation between development
teams and operations teams. Stakeholders in a DevOps environment can vary depending on the
organization and project, but here are some common stakeholders involved:
Developers: Developers play a crucial role in the DevOps process. They are responsible for writing,
testing, and deploying code, and they work closely with operations teams to ensure smooth delivery
and integration of software.
Operations Team: The operations team, including system administrators and network engineers, is
responsible for managing the infrastructure, servers, and network. They collaborate with developers to
ensure that the software can be deployed and operated effectively.
Quality Assurance (QA) Team: The QA team is responsible for testing the software and ensuring its
quality. They work closely with developers to define test cases, perform testing, and provide feedback
on the software's functionality and performance.
Project Managers: Project managers oversee the entire software development lifecycle and are
responsible for coordinating activities between different teams. They ensure that the project stays on
track, manage resources, and meet business objectives.
Product Owners: Product owners are responsible for defining and prioritizing the features and
requirements of the software. They work closely with developers and project managers to ensure that
the software meets the needs of the end-users and aligns with the overall product vision.
IT Operations Management: IT operations management is responsible for overall IT service delivery
and ensuring that the software can be effectively deployed, monitored, and supported in production
environments.
Security Team: In the DevOps process, security is an essential aspect. The security team ensures that
the software is developed, deployed, and operated securely, following best practices and compliance
requirements.
Executives and Management: Executives and management stakeholders provide strategic direction,
allocate resources, and make decisions that impact the overall success of the DevOps initiative. They
set goals, define key performance indicators, and monitor progress.
End-users and Customers: End-users and customers are ultimately the beneficiaries of the software
being developed. Their feedback and satisfaction are essential, and involving them as stakeholders
helps ensure that the software meets their needs and expectations.
Third-Party Providers: In some cases, organizations may have third-party providers or vendors
involved in the DevOps process, such as cloud service providers or external consultants. These
stakeholders contribute to specific aspects of development, deployment, or infrastructure management.
Remember that the specific stakeholders and their roles can vary depending on the organization,
project, and the level of DevOps maturity. It's essential to identify and involve the relevant
stakeholders to ensure effective collaboration and successful implementation of DevOps practices.

DevOps Goals
The primary goals of DevOps are to improve collaboration, increase efficiency, and deliver high-
quality software more rapidly. Here are some specific goals that organizations strive to achieve
through DevOps practices:

Accelerated Software Delivery: DevOps aims to reduce the time it takes to develop, test, and deploy
software. By automating processes, streamlining workflows, and fostering collaboration between
development and operations teams, organizations can release new features and updates more
frequently, enabling faster time-to-market.
Continuous Integration and Continuous Deployment (CI/CD): CI/CD is a core principle of DevOps. It
involves automating the integration and testing of code changes, as well as the continuous deployment
of software to production environments. The goal is to ensure that changes are thoroughly tested,
validated, and deployed quickly and reliably.
Increased Collaboration and Communication: DevOps promotes closer collaboration and
communication between developers, operations teams, and other stakeholders. Breaking down silos
and fostering a culture of collaboration helps teams work together more effectively, share knowledge,
and resolve issues faster.
Improved Quality and Reliability: DevOps emphasizes the integration of quality assurance processes
throughout the software development lifecycle. By automating testing, performing continuous
monitoring, and using feedback loops, organizations can identify and address issues earlier, resulting
in higher-quality software and more reliable systems.
Infrastructure as Code (IaC): DevOps encourages the use of infrastructure as code, where
infrastructure resources, such as servers, networks, and configurations, are defined and managed
programmatically. This approach enables consistency, repeatability, and scalability, reducing manual
configuration errors and facilitating infrastructure changes.
Agile and Lean Practices: DevOps aligns with agile and lean principles, focusing on iterative
development, frequent feedback, and continuous improvement. By applying agile methodologies and
lean practices, organizations can adapt quickly to changing requirements, minimize waste, and
optimize processes for efficiency and value delivery.
Enhanced Scalability and Flexibility: DevOps promotes scalability and flexibility by leveraging cloud
computing and virtualization technologies. By provisioning and managing resources dynamically,
organizations can scale their infrastructure based on demand, optimize resource utilization, and
quickly adapt to changing business needs.
Improved Security and Compliance: Security is an integral part of DevOps. By incorporating security
practices throughout the software development lifecycle, organizations can proactively address
vulnerabilities, enforce compliance requirements, and enhance overall system security.
Monitoring and Feedback Loops: DevOps emphasizes continuous monitoring of applications and
infrastructure, collecting metrics and logs to gain insights into system performance, availability, and
user behavior. Feedback loops enable teams to detect issues, identify areas for improvement, and make
data-driven decisions to enhance the software and its delivery process.
Cultural Transformation: DevOps often requires a cultural shift within an organization. It promotes a
collaborative and cross-functional mindset, encourages transparency, and fosters a blameless culture
where learning from failures is valued. The goal is to create an environment that supports
experimentation, innovation, and continuous learning.
It's important to note that the specific goals and priorities of DevOps may vary based on organizational
context, industry, and project requirements. However, these goals collectively represent the core
objectives that organizations typically aim to achieve through adopting DevOps practices.
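
As a toy illustration of the infrastructure-as-code goal above, the sketch below (in Python) treats the desired infrastructure as plain data that can live in version control and reconciles it against the current state. It is a conceptual example using assumed, made-up data only; real infrastructure as code would use a dedicated tool such as Terraform, Puppet, or CloudFormation.

    # Desired state, declared as data (in practice this file would live in version control).
    desired_servers = {
        "web-1": {"cpu": 2, "memory_gb": 4, "role": "web"},
        "db-1": {"cpu": 4, "memory_gb": 16, "role": "database"},
    }

    # Pretend inventory of what currently exists in the environment.
    current_servers = {
        "web-1": {"cpu": 2, "memory_gb": 4, "role": "web"},
    }

    def reconcile(desired, current):
        """Compare desired and current state and report the changes that would be applied."""
        for name, spec in desired.items():
            if name not in current:
                print(f"CREATE {name}: {spec}")
            elif current[name] != spec:
                print(f"UPDATE {name}: {current[name]} -> {spec}")
            else:
                print(f"OK {name}: already matches the desired state")
        for name in current:
            if name not in desired:
                print(f"DELETE {name}: no longer in the desired state")

    reconcile(desired_servers, current_servers)

Because the desired state is versioned and the reconciliation is repeatable, the same definition produces consistent development, testing, and production environments.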
Important terminology
Here are some important terminologies and concepts commonly used in the DevOps domain:
Continuous Integration (CI): CI is the practice of frequently merging code changes from multiple
developers into a central repository. It involves automating the build and testing process to identify
integration issues early and ensure that the codebase remains in a working state.
Continuous Deployment (CD): CD is the process of automatically deploying software changes to
production environments after passing through the necessary testing and validation stages. It aims to
deliver software updates to end-users rapidly and reliably.
Infrastructure as Code (IaC): IaC is an approach where infrastructure resources, such as servers,
networks, and configurations, are defined and managed programmatically using code. It enables the
automation and reproducibility of infrastructure provisioning and management.
Microservices: Microservices is an architectural style where an application is composed of loosely
coupled, independently deployable services. Each service focuses on a specific business capability and
communicates with other services through well-defined APIs. Microservices enable flexibility,
scalability, and independent development and deployment of different parts of an application.
Orchestration: Orchestration refers to the coordination and management of various automated tasks,
workflows, and processes in a DevOps environment. It involves managing the execution order,
dependencies, and parallelization of tasks to achieve desired outcomes efficiently.
Configuration Management: Configuration management involves managing and maintaining
consistent configurations of infrastructure resources and software systems. It includes defining,
provisioning, and managing configurations, ensuring consistency, and facilitating efficient change
management.
Version Control: Version control, often implemented using tools like Git, is a system that tracks and
manages changes to source code and other files. It enables teams to collaborate, manage codebase
history, and revert to previous versions if needed.
Continuous Monitoring: Continuous monitoring involves collecting and analyzing data about system
performance, health, and user behavior in real-time. It helps detect issues, ensure system availability,
and provide insights for optimization and troubleshooting.
DevOps Pipeline: A DevOps pipeline represents the end-to-end process of software development,
testing, and deployment. It typically includes stages like code compilation, testing, artifact generation,
deployment, and monitoring. Automation and integration of these stages enable smooth and efficient
software delivery.
DevOps Culture: DevOps culture emphasizes collaboration, communication, and shared responsibility
between development and operations teams. It promotes a mindset of continuous learning,
experimentation, and a focus on delivering value to customers.
Agile: Agile is an iterative and incremental software development methodology that emphasizes
flexibility, customer collaboration, and rapid response to change. DevOps aligns with agile principles,
promoting short development cycles, continuous feedback, and adaptive planning.

Kanban: Kanban is a visual management framework used to visualize and optimize the flow of work.
It provides transparency, promotes collaboration, and helps identify and address bottlenecks in the
development and delivery process.
These are just a few key terminologies and concepts within the vast DevOps landscape. There are
many more specific tools, frameworks, and practices that organizations may adopt based on their
requirements and goals.
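
To illustrate the DevOps pipeline and CI/CD terms defined above, here is a minimal, hypothetical Python sketch of a pipeline runner that executes stages in order and stops at the first failure. The stage functions are placeholders standing in for real compilation, testing, packaging, and deployment commands.

    def compile_stage():
        print("Compiling source code...")
        return True          # return False to simulate a failing stage

    def test_stage():
        print("Running automated tests...")
        return True

    def package_stage():
        print("Producing a deployable artifact...")
        return True

    def deploy_stage():
        print("Deploying to the target environment...")
        return True

    # The pipeline is an ordered list of stages; each must pass before the next one runs.
    pipeline = [compile_stage, test_stage, package_stage, deploy_stage]

    for stage in pipeline:
        if not stage():
            print(f"Pipeline stopped: {stage.__name__} failed.")
            break
    else:
        print("Pipeline finished: the change was delivered.")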

DevOps perspective
From a theoretical point of view, DevOps is a very simple approach: just a set of practices that combine
software development (Dev) and information-technology operations (Ops) to shorten the systems
development life cycle and provide continuous delivery with high software quality.

However, practically implementing DevOps is easier said than done. The practice requires a shift in
mindset, a lot of patience, and effective management. The process of implementing DevOps is long
and consists of the following steps:

1. Creating the DevOps Infrastructure

The first step to DevOps implementation is to create the DevOps infrastructure on which the
application will run. But doing so is not easy. There is a lack of co-operation between the development
and the operations team. Both work in two different groups and silos. Developers want to deliver
changes as soon as possible and the operations team, on the other hand, aims for stability.

Now, we must bring them both together but also ensure they work towards the common goal of our
stakeholders, i.e. releasing valuable software as soon as possible with minimum risk involved.

For this, we will have to create a continuous delivery pipeline so that both the development and the
operation team can work together without any kind of confusion – thus releasing software sprints
faster without much risk. We should make the following small yet crucial changes to realize this goal:

We should audit even the smallest changes made to the deployment environment so that if anything
goes wrong, we can easily track what caused the problem.

We need to set strong monitoring systems to alert the development and operations team on time if
any abnormal event occurs. This will minimize the downtime if anything goes wrong.

We should ensure the application logs a WARNING every time a connection is unexpectedly closed
or times out, and an INFO or DEBUG message every time a connection is closed normally (a minimal
logging sketch appears after this list).

We should make sure our operation team can test the scenario if anything goes wrong so that the
same thing can be prevented from happening again in the future.

We must involve the operations team in the organizational IT service continuity plan right from the
start.

For creating the DevOps infrastructure, we should use technology with which the operations team is
already familiar, so that they can easily own and manage the environment.
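
A minimal sketch of the logging guidance mentioned in the list above, using Python's standard logging module. The connection-handling function is a hypothetical placeholder; real applications would log from wherever connections are actually managed.

    import logging

    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("connections")

    def close_connection(conn_id, unexpected=False):
        """Log connection closures at the severity levels suggested above."""
        if unexpected:
            # Unexpected close or timeout: something operations should be alerted about.
            log.warning("Connection %s closed unexpectedly or timed out", conn_id)
        else:
            # Normal close: useful detail for debugging, not an alert condition.
            log.info("Connection %s closed normally", conn_id)

    close_connection("conn-42", unexpected=True)
    close_connection("conn-43")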

2. Modeling & Managing the DevOps Infrastructure


The next thing we need after creating the DevOps infrastructure is to model and manage it. Even if we
don't have complete control over the selection of infrastructure, we must fully automate the build,
integration, testing, and deployment process. These are the questions we must ask at this stage:

How will we provision the DevOps infrastructure?

How are we going to deploy and configure the various bits of software that form our infrastructure?

How can we manage our infrastructure once provisioning and configuration are done?

Everything we need to create and maintain the infrastructure, such as operating system install
definitions, configuration for data centre automation tools like Puppet, general infrastructure
configurations like DNS files and SMTP settings, and the scripts for managing the infrastructure, will be
kept under version control.

All these files in version control will provide inputs to the deployment pipeline – whose job in case of
infrastructural changes is to:

Verify that the infrastructural changes will be run on all the applications before they are pushed into
the production environment. This will ensure that the new version of the infrastructure passes all
functional and non-functional tests before it is live.

Push changes to the production environment and the testing environment managed by the operations
team.

Perform tests for ensuring the successful deployment of the new infrastructure on the application.

For managing the DevOps infrastructure environment, we will need the following things:

A. FOR CONTROLLING THE ACCESS TO THE DEVOPS INFRASTRUCTURE

Controlling the access so that no one can make changes without approval.

Defining an automated process to make changes to the infrastructure.

Monitoring the infrastructure to detect and fix issues on time.

B. FOR MAKING CHANGES TO THE DEVOPS INFRASTRUCTURE

Even the smallest change, whether it's about updating the firewall or deploying a new version of
the software, should be run through the same change management process.

The DevOps Infrastructure modification process should be managed through a single ticketing
system everyone can log into.

Changes should be logged as they are made so that they can be easily audited.

We should be able to view the history of changes to every environment.

We need to test the changes in a production-like testing environment before pushing them live.

Apply all the DevOps infrastructure changes to the version control first and then apply them through
the automated process.

Run tests to verify if the changes we made have worked or not.

3. Managing the Server Provisioning and Configuration

Server provisioning and server configuration management are often overlooked in small and medium-
sized organizations. Yet they are very important factors in DevOps infrastructure and environment
management. Let's look at both in detail:

A. SERVER PROVISIONING

In server provisioning, we take a set of resources like appropriate systems and data and software to
build a server and make it ready for network operation. Typical tasks during server provisioning are
selecting a server from a pool of available servers, loading appropriate software, customizing and
configuring the system, changing a boot image for the server, and finally changing its parameters.

B. VIRTUALIZATION

Virtualization is the fundamental enabler of the cloud, allowing thousands of hosts to access
servers virtually over the internet. A virtual machine emulates a physical machine. The following are
benefits of virtualization:

Fast response to the changing environment

Consolidation

Hardware standardization

Baselines can be easily maintained

C. ONGOING SERVER MANAGEMENT

After installing the operating system, we need to ensure full control over its configuration. Configurations
should not change in an uncontrolled manner. Nobody should be able to log into the deployment
environment except the operations team, and no change should be made without an automated system.
We also need to apply OS service packages, upgrades, install new software, change necessary settings,
and perform deployments.

D. PARALLEL TESTING WITH VIRTUAL ENVIRONMENTS

Next, we need to run parallel tests in the deployment pipeline to see if everything is running smoothly
in the production environment or there are any issues.

4. Managing Data

We may also face a set of problems in Data management & organization while implementing the
DevOps infrastructure, such as:

There is a large volume of data involved, which makes it impossible to keep track of every piece of data
involved in software development.

The lifecycle of application data is different from other parts of the system.

One way to avoid this problem and effectively manage data is to delete the previous version or replace
the old version with a new copy.

However, doing so is not possible in real-time scenarios. Every single bit of data is important. There
can be scenarios when we might need to roll back to a previous state due to some issues. In that case,
we will still need the older versions of data. So, we will need some advanced approaches for data
management like:

A. DATABASE SCRIPTING

One great way to manage data in a DevOps infrastructure is to capture all database initialization and
migration steps as scripts and check them into version control. Then, we can use these scripts to manage
every database used in the delivery process.

However, we need to make sure all the database scripts are managed effectively so that there is no
issue while retrieving data from the databases.

B. DEPLOYING A DATABASE AFRESH

The most challenging yet crucial part of managing the DevOps infrastructure is to reproduce an
environment after an issue occurs. We must ensure the application behaves the way it was behaving
before the issue. And that's where the process of deploying a database comes into play. This is what
happens while we deploy a database afresh:

The old version of data is erased

The new database structure, instances, and schemas are created

Finally, the data is loaded into the database

5. Incremental Change

Incremental change is another effective technique to manage DevOps infrastructure data. It ensures an
application keeps working even while we are making changes to it, which is an important pre-
requisite of continuous integration (CI). Continuous delivery, on the other hand, demands the
successful deployment of every software release, including the changes to the database into
production. This means we must update the entire operational database while retaining the valuable
data held in it. So, we need an efficient rollback strategy so that we can easily take back control of
things if anything goes wrong.

For this, we follow the following data migration strategies:

A. DATABASE VERSIONING

It is one of the most efficient mechanisms for data migration in an automated fashion. All we need is
to create a table in the database which contains its version number. Now, every time we make a
change to the database, we will have to create two scripts:

A roll-forward script that takes the database from version x to version x+1.

A roll-backward script that takes the database from version x+1 to version x.

Another thing we will need is an application configuration setting which specifies the version of the
database with which it is designed to work.

Then during the deployment, we can use a tool which looks at the current version of the database and
the database version required by the application version being deployed. Then this tool will use the
roll-forward or roll-backward scripts to align both the application and the database version correctly.
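The following is a minimal sketch of the versioning scheme described above, assuming SQLite and a hypothetical pair of roll-forward and roll-backward SQL statements per version; a real migration tool would read these scripts from version control rather than hard-coding them.

    import sqlite3

    # Hypothetical migrations: version number -> (roll-forward SQL, roll-backward SQL).
    MIGRATIONS = {
        1: ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
            "DROP TABLE users"),
        2: ("ALTER TABLE users ADD COLUMN email TEXT",
            "ALTER TABLE users DROP COLUMN email"),   # DROP COLUMN needs SQLite 3.35+
    }

    def get_version(conn):
        conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
        row = conn.execute("SELECT version FROM schema_version").fetchone()
        return row[0] if row else 0

    def set_version(conn, version):
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))

    def migrate(conn, target):
        """Roll the database forward or backward to the version the application expects."""
        current = get_version(conn)
        while current < target:                      # roll forward
            conn.execute(MIGRATIONS[current + 1][0])
            current += 1
        while current > target:                      # roll backward
            conn.execute(MIGRATIONS[current][1])
            current -= 1
        set_version(conn, current)
        conn.commit()

    conn = sqlite3.connect(":memory:")
    migrate(conn, target=2)   # the application version being deployed expects schema version 2
    print("Database is now at schema version", get_version(conn))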

B. MANAGING ORCHESTRATED CHANGES

This is another common practice for data migration. But we are not in its favor because it's better if
applications can communicate directly, not through the database. Still, many companies are
following this practice and integrating all applications through a single database.

If you are doing the same, be careful because even a small change in the database can have a knock-on
effect on how other applications are working. We should test such changes in an orchestrated
environment before implementing them in the production environment.

C. ROLLING BACK THE DATABASES

With the help of roll-forward and roll-backward scripts, it's easy for an application, at deployment
time, to migrate the existing database to its correct version without losing any data.

Another effective data migration strategy is to perform the database migration process and the
application deployment process independently. This will also make sure data migration is done
without data loss or any change in the application behaviour.

6. Configuration Management

Configuration management is another crucial step in DevOps infrastructure management in which we
ensure that all the files and software which we are expecting on the machine are available, configured
correctly, and working as intended.

Managing configuration manually is simple for a single machine. However, when we are handling five
or ten servers to which 100-200 computers are connected, configuration management becomes a
nightmare. That's why we need a better way to manage things:

A. VERSION CONTROL

Version control is responsible for recording changes to a file or a set of files over time, so that we can
easily recall specific versions later. It's a wise thing to use because if we know the previous
versions of files, we can easily roll back to earlier versions of the project. Version control can also
help us recover in case we make mistakes and break things.

Best practices for version control

Use version control for everything (source code, tests, database scripts, builds & deployment scripts,
documentation, libraries, and configuration files).

Check in regularly to verify that all the versions are working properly.

Use detailed multi-paragraph commit messages during check-in. This can save hours of debugging in
case any error occurs later.
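
As a small illustration of these practices, the sketch below uses the third-party GitPython package (an assumption; running plain git commands achieves the same thing) to put a configuration file under version control with a detailed commit message.

    from pathlib import Path
    from git import Repo   # third-party package: GitPython

    # Initialise a repository for the project (or open an existing one with Repo(path)).
    repo = Repo.init("demo-project")

    # Everything goes under version control; here, an application configuration file.
    config = Path("demo-project") / "app.conf"
    config.write_text("max_connections = 50\n")

    repo.index.add(["app.conf"])
    repo.index.commit(
        "Add application configuration\n\n"
        "Set max_connections to 50 to match the capacity of the staging database."
    )
    print("Committed:", repo.head.commit.hexsha[:8])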

B. MANAGING COMPONENTS AND DEPENDENCIES

1. Managing external libraries

Since external libraries come in binary form, managing them can be a difficult task. Here are two ways
we can get this done:

Check the external libraries into version control.

Declare the external libraries and use a tool like Maven or Ivy to download them from Internet
repositories to our own artifact repository.

2. Managing components

The best way is to split the application into smaller components. This will limit the scope of the
changes to the application, reduce regression bugs, encourage reuse, and enable a much more efficient
development process on large projects.

C. MANAGING SOFTWARE CONFIGURATION

Software configuration is another crucial part of configuration management and should be managed
carefully. We should subject it to proper management and testing, and consider a few important software
configuration principles, such as:

Keep all the available application configuration options in the same repository as its source code.

Manage the values of configurations separately.

Perform configurations using an automated process with the help of values taken from the
configuration repository.

Use clear naming conventions to avoid confusion.

Do not repeat any information.

Keep the configuration information as simple as possible.

Do not over-engineer or over-optimize the configuration system.

Run all necessary configuration tests and keep a record of each.
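
A minimal sketch of these principles, assuming a hypothetical config/app.ini file kept in the same repository as the source code, with environment-specific values supplied separately through environment variables at deployment time.

    import configparser
    import os

    # Configuration options live in the source repository (config/app.ini);
    # environment-specific values are injected at deploy time via environment variables.
    parser = configparser.ConfigParser()
    parser.read("config/app.ini")   # hypothetical file checked into version control

    def setting(section, key, default=None):
        """Environment variable overrides the repository default, using a clear naming convention."""
        env_name = f"APP_{section.upper()}_{key.upper()}"
        return os.environ.get(env_name, parser.get(section, key, fallback=default))

    db_host = setting("database", "host", default="localhost")
    db_pool = int(setting("database", "pool_size", default="5"))
    print(f"Connecting to {db_host} with a pool of {db_pool} connections")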

That's how we establish a DevOps infrastructure management and software deployment environment.
The process is not easy, though. It requires a lot of patience and guidance because there are many
chances to go wrong.


DevOps and Agile

DevOps and Agile are two software development methodologies with similar aims: getting the
end product delivered as quickly and efficiently as possible. While many organizations are hoping to employ
these practices, there is often some confusion between both methodologies.

What does each methodology encompass? Where do they overlap? Can they work together, or should we
choose one over the other?

Before moving further, take a glance at DevOps and Agile.

What is DevOps?

DevOps is a combination of two words: software Development and Operations.
This allows a single team to handle the entire application lifecycle, from development to testing,
deployment, and operations. DevOps helps you to reduce the disconnection between software
developers, quality assurance (QA) engineers, and system administrators.

DevOps promotes collaboration between Development and Operations team to deploy code to
production faster in an automated & repeatable way.

DevOps helps to increase organization speed to deliver applications and services. It also allows
organizations to serve their customers better and compete more strongly in the market.

DevOps can also be defined as a sequence of development and IT operations with better
communication and collaboration.

DevOps has become one of the most valuable business disciplines for enterprises and organizations.
With the help of DevOps, the quality and speed of application delivery have improved to a great
extent.

DevOps is nothing but a practice or methodology of making "Developers" and "Operations" folks
work together. DevOps represents a change in the IT culture with a complete focus on rapid IT service
delivery through the adoption of agile practices in the context of a system-oriented approach.

What is Agile?
Agile involves continuous iteration of development and testing in the SDLC process. Both
development and testing activities are concurrent, unlike in the waterfall model. This software
development method emphasizes incremental, iterative, and evolutionary development.
It breaks the product into small pieces and integrates them for final testing. It can be implemented in
many ways, such as Kanban, XP, Scrum, etc.
Agile software development focuses on four core values:
o Individuals and interactions over processes and tools.
o Working software over comprehensive documentation.
o Customer collaboration over contract negotiation.
o Responding to change over following a plan.
Below are some essential differences between DevOps and Agile:

Definition
DevOps: DevOps is a practice of bringing development and operation teams together.
Agile: Agile refers to the continuous iterative approach, which focuses on collaboration, customer feedback, and small, rapid releases.

Purpose
DevOps: The DevOps purpose is to manage end-to-end engineering processes.
Agile: The Agile purpose is to manage complex projects.

Task
DevOps: It focuses on constant testing and delivery.
Agile: It focuses on constant changes.

Team size
DevOps: It has a large team size, as it involves all the stakeholders.
Agile: It has a small team size; the smaller the team, the fewer people work on it, so they can move faster.

Team skill set
DevOps: DevOps divides and spreads the skill set between the development and operation teams.
Agile: Agile development emphasizes training all team members to have a wide variety of similar and equal skills.

Implementation
DevOps: DevOps is focused on collaboration, so it does not have any commonly accepted framework.
Agile: Agile can be implemented within a range of tactical frameworks such as SAFe, Scrum, and sprints.

Duration
DevOps: The ideal goal is to deliver code to production daily or every few hours.
Agile: Agile development is managed in units of sprints, so each sprint lasts much less than a month.

Target areas
DevOps: End-to-end business solutions and fast delivery.
Agile: Software development.

Feedback
DevOps: Feedback comes from the internal team.
Agile: In Agile, feedback comes from the customer.

Shift-left principle
DevOps: It supports both shift left and shift right.
Agile: It supports only shift left.

Focus
DevOps: DevOps focuses on operational and business readiness.
Agile: Agile focuses on functional and non-functional readiness.

Importance
DevOps: In DevOps, developing, testing, and implementation are all equally important.
Agile: Developing software is inherent to Agile.

Quality
DevOps: DevOps contributes to creating better quality with automation and early bug removal. Developers need to follow coding and architectural best practices to maintain quality standards.
Agile: Agile produces better application suites with the desired requirements. It can quickly adapt to changes made on time during the project life.

Tools
DevOps: Puppet, Chef, AWS, Ansible, TeamCity, and OpenStack are popular DevOps tools.
Agile: Bugzilla, Kanboard, and JIRA are some popular Agile tools.

Automation
DevOps: Automation is the primary goal of DevOps. It works on the principle of maximizing efficiency when deploying software.
Agile: Agile does not emphasize automation.

Communication
DevOps: DevOps communication involves specs and design documents. It is essential for the operational team to fully understand the software release and its network implications in order to run the deployment process properly.
Agile: Scrum is the most common method of implementing Agile software development. Scrum meetings are carried out daily.

Documentation
DevOps: In DevOps, process documentation is foremost because the software is handed to an operational team for deployment. Automation minimizes the impact of insufficient documentation. However, in the development of sophisticated software, it is difficult to transfer all the required knowledge.
Agile: The Agile method gives priority to the working system over complete documentation. It is ideal when you are flexible and responsive. However, it can be harmful when you are trying to turn things over to another team for deployment.

DevOps Tools
Here are some of the most popular DevOps tools, with a brief explanation of each:
1) Puppet
Puppet is one of the most widely used DevOps tools. It allows technology changes to be delivered and released quickly and frequently. It has features for versioning, automated testing, and continuous delivery. It enables you to manage the entire infrastructure as code without expanding the size of the team.
Features
o Real-time context-aware reporting.
o Model and manage the entire environment.
o Define and continually enforce infrastructure configuration.
o Desired state conflict detection and remediation.
o It inspects and reports on packages running across the infrastructure.
o It eliminates manual work for the software delivery process.
o It helps the developer to deliver great software quickly.
2) Ansible
Ansible is a leading DevOps tool. Ansible is an open-source IT engine that automates application
deployment, cloud provisioning, intra service orchestration, and other IT tools. It makes it easier for
DevOps teams to scale automation and speed up productivity.
Ansible is easy to deploy because it does not use any agents or custom security infrastructure on the client side; it works by pushing modules to the clients. These modules are executed locally on the client side, and the output is pushed back to the Ansible server.
Features
o It is an easy-to-use, open-source tool for deploying applications.
o It helps in avoiding complexity in the software development process.
o It eliminates repetitive tasks.
o It manages complex deployments and speeds up the development process.
3) Docker
Docker is a high-end DevOps tool that allows you to build, ship, and run distributed applications on multiple systems. It also helps to assemble apps quickly from their components, and it is typically suitable for container management.
Features
o It makes configuring the system easier and faster.
o It increases productivity.
o It provides containers that are used to run the application in an isolated environment.
o It routes the incoming request for published ports on available nodes to an active container.
This feature enables the connection even if there is no task running on the node.
o It allows saving secrets into the swarm itself.
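As a quick, minimal sketch of how Docker is typically used from the command line (the image name and port numbers below are just illustrative examples):
$ docker pull nginx                          # download an image from a registry
$ docker run -d -p 8080:80 --name web nginx  # run a container, mapping host port 8080 to container port 80
$ docker ps                                  # list running containers
$ docker stop web && docker rm web           # stop and remove the container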
4) Nagios
Nagios is one of the more useful tools for DevOps. It can determine the errors and rectify them with
the help of network, infrastructure, server, and log monitoring systems.
Features
o It provides complete monitoring of desktop and server operating systems.
o The network analyzer helps to identify bottlenecks and optimize bandwidth utilization.
o It helps to monitor components such as services, application, OS, and network protocol.
o It also provides complete monitoring of Java Management Extensions (JMX).
5) CHEF
Chef is a useful tool for achieving scale, speed, and consistency. Chef is a cloud-based, open-source technology. It uses Ruby to develop essential building blocks such as recipes and cookbooks. Chef is used in infrastructure automation and helps in reducing manual and repetitive tasks in infrastructure management.
Chef has got its convention for different building blocks, which are required to manage and automate
infrastructure.
Features
o It maintains high availability.
o It can manage multiple cloud environments.
o It uses the popular Ruby language to create a domain-specific language.
o Chef does not make any assumptions about the current status of a node. It uses its own
mechanism to get the current state of the machine.
6) Jenkins
Jenkins is a DevOps tool for monitoring the execution of repeated tasks. Jenkins is software that enables continuous integration. Jenkins is installed on a server where the central build takes place. It helps to integrate project changes more efficiently by finding issues quickly.
Features
o Jenkins increases the scale of automation.
o It can easily set up and configure via a web interface.
o It can distribute the tasks across multiple machines, thereby increasing concurrency.
o It supports continuous integration and continuous delivery.
o It offers hundreds of plugins to support building and testing virtually any project.
o It requires little maintenance and has a built-in GUI tool for easy updates.
7) Git
Git is an open-source distributed version control system that is freely available to everyone. It is designed to handle projects from minor to major with speed and efficiency. It was developed to coordinate work among programmers. Version control allows you to track changes and work together with your team members in the same workspace. It is a critical distributed version-control tool for DevOps.
Features
o It is a free open source tool.
o It allows distributed development.
o It supports the pull request.
o It enables a faster release cycle.
o Git is very scalable.
o It is very secure and completes the tasks very fast.
8) SALTSTACK
SaltStack is a lightweight DevOps tool for configuration management, remote execution, and orchestration. It is an ideal solution for intelligent orchestration of the software-defined data center.
Features
o It eliminates messy configuration or data changes.
o It can trace detail of all the types of the web request.
o It allows us to find and fix the bugs before production.
o It provides secure access and configures image caches.
o It secures multi-tenancy with granular role-based access control.
o Flexible image management with a private registry to store and manage images.
9) Splunk
Splunk is a tool to make machine data usable, accessible, and valuable to everyone. It delivers
operational intelligence to DevOps teams. It helps companies to be more secure, productive, and
competitive.
Features
o It has the next-generation monitoring and analytics solution.
o It delivers a single, unified view of different IT services.
o It extends the Splunk platform with purpose-built solutions for security.
o It provides data-driven analytics with actionable insights.
10) Selenium
Selenium is a portable software testing framework for web applications. It provides an easy interface
for developing automated tests.
Features
o It is a free open source tool.
o It supports multiple platforms for testing, such as Android and iOS.
o It is easy to build a keyword-driven framework for a WebDriver.
o It creates robust browser-based regression automation suites and tests.

Configuration management
configuration management tools
There are a variety of configuration management tools available, and each has specific features that
make it better for some situations than others. Yet the top five configuration management tools,
presented below in alphabetical order, have several things in common that I believe are essential for
DevOps success: all have an open source license, use externalized configuration definition files, run
unattended, and are scriptable. All of the descriptions are based on information from the tools'
software repositories and websites.
Ansible
"Ansible is a radically simple IT automation platform that makes your applications and systems easier
to deploy. Avoid writing scripts or custom code to deploy and update your applications—automate in
a language that approaches plain English, using SSH, with no agents to install on remote systems." —
GitHub repository
Website
Documentation
Community
Ansible is one of my favorite tools; I started using it several years ago and fell in love with it. You can
use Ansible to execute the same command for a list of servers from the command line. You can also
use it to automate tasks using "playbooks" written into a YAML file, which facilitate communication
between teams and non-technical people. Its main advantages are that it is simple, agentless, and easy
to read (especially for non-programmers).
Because agents are not required, there is less overhead on servers. An SSH connection is necessary
when running in push mode (which is the default), but pull mode is available if needed. Playbooks can
be written with a minimal set of commands or they can be scaled for more elaborate automation tasks
that could include roles, variables, and modules written by other people.
You can combine Ansible with other tools to create a central console to control processes. Those tools
include Ansible Works (AWX), Jenkins, RunDeck, and ARA, which offers traceability when running
playbooks.
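For illustration, a couple of typical Ansible invocations might look like the following; the inventory file and playbook names here are hypothetical:
$ ansible all -i inventory.ini -m ping            # ad-hoc: check SSH connectivity to every host in the inventory
$ ansible-playbook -i inventory.ini site.yml      # run the tasks described in a playbook against those hosts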
CFEngine
"CFEngine 3 is a popular open source configuration management system. Its primary function is to
provide automated configuration and maintenance of large-scale computer systems." —GitHub
repository
Website
Documentation
Community
CFEngine was introduced by Mark Burgess in 1993 as a scientific approach to automated
configuration management. The goal was to deal with the entropy in computer systems' configuration
and resolve it with end-state "convergence." Convergence means converging on a desired end state, and it builds on idempotence as the capacity to reach that end state. Burgess' research evolved in 2004 when he
proposed the Promise theory as a model of voluntary cooperation between agents.
The current version of CFEngine incorporates Promise theory and uses agents running on each server
that pull the configuration from a central repository. It requires some expert knowledge to deal with
configurations, so it's best suited for technical people.
Chef
"A systems integration framework, built to bring the benefits of configuration management to your
entire infrastructure." —GitHub repository
Website
Documentation
Community
Chef uses "recipes" written in Ruby to keep your infrastructure running up-to-date and compliant. The
recipes describe a series of resources that should be in a particular state. Chef can run in client/server
mode or in a standalone configuration named chef-solo. It has good integration with the major cloud
providers to automatically provision and configure new machines.
Chef has a solid user base and provides a full toolset to allow people with different technical backgrounds and skills to interact around the recipes. But, at its base, it is a more technically oriented tool.
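As a small sketch, assuming a workstation with the Chef tools installed and a Chef server already configured, day-to-day usage often involves commands such as:
$ knife node list          # list the nodes registered with the Chef server
$ knife cookbook list      # list the cookbooks uploaded to the server
$ sudo chef-client         # run the client on a node so it converges to the state described in its recipes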
Puppet
"Puppet, an automated administrative engine for your Linux, Unix, and Windows systems, performs
administrative tasks (such as adding users, installing packages, and updating server configurations)
based on a centralized specification." —GitHub repository
Website
Documentation
Community
Conceived as a tool oriented toward operations and sysadmins, Puppet has consolidated as a
configuration management tool. It usually works in a client-server architecture, and an agent
communicates with the server to fetch configuration instructions.
Puppet uses a declarative language or Ruby to describe the system configuration. It is organized in modules, and manifest files contain the desired-state goals to keep everything as required. Puppet uses the pull model by default (agents periodically fetch their catalogs from the server), and a push model can be configured.
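For example, a minimal Puppet workflow on a node might look like this (the manifest file name is illustrative):
$ puppet apply manifest.pp      # apply a local manifest without a server (masterless mode)
$ sudo puppet agent --test      # trigger an immediate agent run that fetches the catalog from the server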
Salt
"Software to automate the management and configuration of any infrastructure or application at
scale." — GitHub repository
Website
Documentation
Community
Salt was created for high-speed data collection and scale beyond tens of thousands of servers. It uses
Python modules to handle configuration details and specific actions. These modules manage all of
Salt's remote execution and state management behavior. Some level of technical skill is required to configure the modules.
Salt uses a client-server topology (with the Salt master as server and Salt minions as clients).
Configurations are kept in Salt state files, which describe everything required to keep a system in the
desired state.
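As an illustrative sketch, once a Salt master and its minions are set up, typical commands from the master include (the target pattern and package name are examples):
$ salt '*' test.ping              # check that all minions respond
$ salt '*' state.apply            # apply the configured Salt states to all minions
$ salt 'web*' pkg.install nginx   # run a module function on the minions matching a pattern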

Continuous Integration and Deployment
Why are CI and CD Referred to as the Most Crucial DevOps Practices?
Continuous Integration and Continuous Delivery are among the most significant practices as they
create an active process of integrating and delivering the product to the market.
Small code changes can be made in the software code, making the entire process simpler and more
accessible.
CI and CD provide continuous feedback from the customers and the DevOps team, thus increasing
the transparency of any problem within the team or outside it.
The overall process ensures the faster release of the product.
The failures can now be detected faster and hence fixed effortlessly and quickly, which increases
the speed of release.

Next, let's talk about all three processes in detail.
What is Continuous Integration?
Continuous Integration (CI) is a DevOps software development practice that enables the developers to
merge their code changes in the central repository. That way, automated builds and tests can be run.
The amendments made by the developers are validated by creating a build and running automated tests against it. In the case of Continuous Integration, a tremendous amount of emphasis is placed on test automation to check whether the application breaks whenever new commits are integrated into the main branch.
What is Continuous Delivery?
Continuous Delivery (CD) is a DevOps practice that refers to building, testing, and delivering improvements to the software code. The phase is referred to as an extension of the Continuous Integration phase, making sure that new changes can be released to customers quickly and in a sustainable manner.
Put simply, in addition to automated testing, the release process is also automated, and a deployment can occur at any time with just one click of a button.
Continuous Delivery gives you the power to decide whether to make releases daily, weekly, or whenever the business requires it. The maximum benefits of Continuous Delivery can only be realized if you release in small batches, which are easy to troubleshoot if any glitch occurs.

What is Continuous Deployment?
When the step of Continuous Delivery is extended, it results in the phase of Continuous Deployment.
Continuous Deployment (CD) is the final stage in the pipeline that refers to the automatic release of any developer changes from the repository to production.
Continuous Deployment ensures that any change that passes through all the stages of the pipeline is released to the end users. Nothing other than a failed test can stop a new change from being deployed to production. This step is a great way to speed up the feedback loop with customers and is free from human intervention.
After the basics of all three concepts, it's essential to understand how these three processes relate to
each other.
Linux OS Introduction
Linux is a family of open-source Unix-like operating systems that are based on the Linux kernel. It was initially released by Linus Torvalds on September 17, 1991. It is a free and open-source operating system, and its source code can be modified and distributed by anyone, commercially or non-commercially, under the GNU General Public License.
Initially, Linux was created for personal computers, and gradually it came to be used in other machines like servers, mainframe computers, supercomputers, etc. Nowadays, Linux is also used in embedded systems like routers, automation controls, televisions, digital video recorders, video game consoles, smartwatches, etc. The biggest success of Linux is Android, an operating system based on the Linux kernel that runs on smartphones and tablets. Because of Android, Linux has the largest installed base of all general-purpose operating systems. Linux is generally packaged in a Linux distribution.
Linux Distribution
A Linux distribution is an operating system made up of a collection of software based on the Linux kernel; in other words, a distribution contains the Linux kernel along with supporting libraries and software. You can get a Linux-based operating system by downloading one of the Linux distributions, which are available for different types of devices like embedded devices, personal computers, etc. Around 600+ Linux distributions are available, and some of the popular ones are:
MX Linux
Manjaro
Linux Mint
elementary
Ubuntu
Debian
Solus
Fedora
openSUSE
Deepin
ARCHITECTURE OF LINUX
Linux architecture has the following components:
1. Kernel: Kernel is the core of the Linux based operating system. It virtualizes the common
hardware resources of the computer to provide each process with its virtual resources. This
makes the process seem as if it is the sole process running on the machine. The kernel is also
responsible for preventing and mitigating conflicts between different processes. Different types
of the kernel are:
Monolithic Kernel
Hybrid kernels
Exo kernels
Micro kernels
2. System Library: These are special types of functions that are used to implement the functionality of the operating system.
3. Shell: It is an interface to the kernel which hides the complexity of the kernel's functions from the users. It takes commands from the user and executes the kernel's functions.
4. Hardware Layer: This layer consists of all peripheral devices like RAM, HDD, CPU, etc.
5. System Utility: It provides the functionalities of an operating system to the user.
Advantages of Linux
The main advantage of Linux is that it is an open-source operating system. This means the source code is easily available to everyone, and you are allowed to contribute to, modify, and distribute the code to anyone without any permission.
In terms of security, Linux is more secure than other operating systems. This does not mean Linux is 100 percent secure; some malware exists for it, but it is less vulnerable than other operating systems, so it does not require any anti-virus software.
Software updates in Linux are easy and frequent.
Various Linux distributions are available, so you can choose one according to your requirements or taste.
Linux is freely available to use on the internet.
It has large community support.
It provides high stability. It rarely slows down or freezes, and there is no need to reboot it after a short time.
It maintains the privacy of the user.
The performance of a Linux system is much higher than that of other operating systems. It allows a large number of people to work at the same time and handles them efficiently.
It is network friendly.
The flexibility of Linux is high. There is no need to install a complete Linux suite; you are allowed to install only the required components.
Linux is compatible with a large number of file formats.
It is fast and easy to install from the web. It can also be installed on any hardware, even on an old computer system.
It performs all tasks properly even if it has limited space on the hard disk.
Disadvantages of Linux
It is not very user-friendly, so it may be confusing for beginners.
It has fewer peripheral hardware drivers compared to Windows.
Is There Any Difference between Linux and Ubuntu?
The answer is YES. The main difference between Linux and Ubuntu is that Linux is a family of open-source operating systems based on the Linux kernel, whereas Ubuntu is a free, open-source operating system and a Linux distribution based on Debian. In other words, Linux is the core system and Ubuntu is a distribution of Linux. Linux was developed by Linus Torvalds and released in 1991, while Ubuntu was developed by Canonical Ltd. and released in 2004.

Importance of Linux in DevOps
Linux plays a significant role in the DevOps ecosystem and is highly valued for several reasons:
Open-Source Nature: Linux is an open-source operating system, which means that its source code is
freely available and can be modified and distributed by anyone. This openness fosters collaboration
and innovation, allowing DevOps teams to customize and optimize Linux-based systems to meet their
specific needs.
Flexibility and Customizability: Linux provides a high degree of flexibility and customizability,
allowing DevOps teams to tailor their infrastructure and environments to their requirements. It
supports a wide range of distributions and flavors, each with its own set of features and package
management systems, enabling teams to choose the most suitable option for their projects.
Stable and Reliable: Linux is known for its stability and reliability. It is widely used in production
environments due to its robustness and ability to handle heavy workloads efficiently. This reliability is
crucial in DevOps, as it ensures that the systems and infrastructure powering software development
and delivery remain dependable and consistent.
Command-Line Power: Linux's command-line interface (CLI) provides powerful tools and utilities
that empower DevOps professionals to automate tasks, manage configurations, and perform complex
operations. The CLI allows for efficient scripting, automation, and system management, enabling
streamlined processes and reducing manual effort.
Rich Ecosystem: Linux has a vast ecosystem of open-source tools, frameworks, and utilities that are
integral to the DevOps toolchain. From version control systems like Git to containerization
technologies like Docker and orchestration platforms like Kubernetes, Linux is the foundation for
many key DevOps tools and technologies.
Containerization and Virtualization: Linux has been at the forefront of containerization and
virtualization technologies. Platforms like Docker, which leverage Linux kernel features, enable the
creation and management of lightweight, isolated containers. These containers facilitate
reproducibility, scalability, and portability of applications, allowing for efficient deployment and
management in DevOps workflows.
Security and Stability: Linux is well-regarded for its strong security features and a robust
permission-based model. It offers a solid foundation for building secure and stable systems, which are
critical in the context of DevOps. The security of the underlying operating system greatly impacts the
overall security posture of the software being developed and deployed.
Cloud and Infrastructure Compatibility: Linux has excellent compatibility with cloud platforms,
making it an ideal choice for DevOps in cloud environments. Many cloud providers offer Linux-based
virtual machine instances and container services, allowing seamless integration of Linux-based
systems with cloud infrastructure.
Community Support: The Linux community is vast and active, providing extensive support,
documentation, and resources for troubleshooting and knowledge sharing. DevOps teams can benefit
from the collective expertise and experience of the Linux community when working with Linux-based
systems.
Overall, Linux's open-source nature, flexibility, stability, command-line power, compatibility with
containerization and virtualization, and strong security make it a preferred choice in the DevOps
world. It serves as a reliable foundation for building and managing the infrastructure, tools, and
environments that drive modern software development and deployment practices.
Linux Basic Command Utilities
Linux provides a wide range of command-line utilities that are essential for managing and interacting
with the operating system. Here are some basic command-line utilities commonly used in Linux:
ls: Used to list files and directories in the current directory or a specified location. It provides
information about file permissions, ownership, size, and modification dates.
cd: Used to change the current working directory. It allows you to navigate through the directory
structure.
pwd: Prints the current working directory, displaying the full path of the directory you are currently
in.
mkdir: Creates a new directory or directories. It is used with the desired directory name(s) as an
argument.
rm: Removes files and directories. It can be used with options like -r to remove directories recursively
and -f to force removal without prompting for confirmation.
cp: Copies files and directories. It takes the source file/directory path and the destination path as
arguments.
mv: Moves or renames files and directories. It can be used to move files/directories to a different
location or rename them.
cat: Concatenates and displays the contents of one or more files. It is often used to view the contents
of text files.
grep: Searches for patterns within files or input. It is commonly used with regular expressions to find
specific text within files.
find: Searches for files and directories based on specified criteria such as name, size, or modification
time. It is a powerful tool for locating files in a directory hierarchy.
chmod: Changes the permissions of files and directories. It allows you to modify permissions for the
owner, group, and others using numeric or symbolic notation.
chown: Changes the ownership of files and directories. It is used to assign ownership to specific users
or groups.
sudo: Executes a command with superuser (root) privileges. It allows authorized users to perform
administrative tasks.
man: Displays the manual pages for a command, providing detailed information about its usage,
options, and examples.
tar: Archives files and directories into a single file (often compressed). It is used for bundling and
compressing files or directories.
wget: Downloads files from the internet using HTTP, HTTPS, or FTP protocols. It is commonly used
for downloading files or webpages.
ssh: Establishes secure shell (SSH) connections to remote servers. It enables secure remote login and
command execution on remote machines.
These are just a few examples of the numerous command-line utilities available in Linux. Each utility
has various options and functionalities that can be explored further by referring to their respective
manual pages or online documentation.
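For instance, a short session combining a few of these utilities might look like the following; the paths and file names are purely illustrative:
$ mkdir -p ~/projects/demo            # create a working directory
$ cd ~/projects/demo && pwd           # move into it and confirm the location
/home/user/projects/demo
$ cp /etc/hostname .                  # copy a file into the current directory
$ ls -l                               # list its details
$ grep -i "linux" hostname            # search for a pattern inside the file
$ chmod 644 hostname                  # adjust its permissions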

In the context of DevOps, here are some additional command-line utilities commonly used in
Linux:

1. git: A version control system used for tracking changes in source code. It allows collaboration,
branching, merging, and version management. Commands like git clone, git commit, git push, and git
pull are commonly used.
2. docker: A containerization platform that allows you to build, deploy, and manage containers. It
provides commands like docker build, docker run, and docker-compose for container management.
3. kubectl: The command-line interface for managing Kubernetes clusters. It allows you to deploy and
manage containerized applications, check cluster status, and interact with various Kubernetes
resources.
4. ansible: An automation tool used for configuration management, application deployment, and
orchestration. It enables the automation of tasks across multiple servers with simple and declarative
scripts called "playbooks."
5. terraform: A tool for provisioning and managing infrastructure as code. It allows you to define and
create infrastructure resources in various cloud platforms and data centers using a declarative
language.
6. curl: A command-line tool used for making HTTP requests and interacting with APIs. It can be used
to test API endpoints, download files, and perform various web-related operations.
7. jq: A lightweight command-line tool for parsing and manipulating JSON data. It allows you to extract,
filter, and transform JSON data, making it useful for processing API responses and working with
JSON files.
8. awk: A powerful text processing utility that enables pattern scanning and text manipulation. It is
commonly used for data extraction, transformation, and reporting tasks.
9. sed: A stream editor used for performing text transformations on input streams or files. It is
particularly useful for search and replace operations, text manipulation, and basic scripting.
10. ssh-keygen: A utility used for generating SSH key pairs. SSH keys are essential for secure remote
access and authentication to servers and systems.
11. systemctl: A command-line utility for managing system services in Linux. It allows you to start, stop,
restart, enable, or disable system services and view their status.
12. top and htop: These utilities provide real-time monitoring of system resources, including CPU,
memory, and process information. They are useful for monitoring system performance and identifying
resource-intensive processes.
13. traceroute: A command-line tool used to trace the path packets take from your computer to a
destination IP address or domain. It helps diagnose network connectivity issues and identify network
hops.
14. nc/netcat: A versatile networking utility that can be used for various purposes, such as establishing
TCP/UDP connections, port scanning, and transferring data between systems.
These are just a few examples of command-line utilities that are frequently used in DevOps
workflows. The choice of utilities may vary depending on the specific requirements of the DevOps
tasks and the technologies being used.
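To give a feel for how several of these tools fit together, here is a hypothetical sequence (the repository URL, image name, and health endpoint are made up for illustration):
$ git clone https://github.com/example/app.git      # fetch the source code
$ cd app
$ docker build -t app:latest .                      # build an image (assumes the repo contains a Dockerfile)
$ docker run -d -p 8080:80 app:latest               # run the application in a container
$ curl -s http://localhost:8080/health | jq '.'     # call an endpoint and pretty-print the JSON response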

Linux Administration
Linux is a major force in computing technology. Most web servers, mobile phones, personal computers, supercomputers, and cloud servers are powered by Linux. The job of a Linux systems administrator is to manage the operations of a computer system, such as maintaining and enhancing it, creating user accounts and reports, and taking backups, using Linux tools and command-line interface tools. Most computing devices are powered by Linux because of its high stability, high security, and open-source environment. Here are some of the things that a Linux system administrator should know and understand:
Linux File Systems
A Linux system administrator should have a solid knowledge and understanding of the
various Linux file systems used by Linux like Ext2, Ext3, and Ext4. Understanding the
difference between these file systems is important so that one can easily perform tasks and
partition disks or configure Linux file system permissions.

File System Hierarchy
The Filesystem Hierarchy Standard (FHS) defines the location and structure of directories and files on a Linux system. Understanding it is important for managing system files effectively.
Managing Root/Super User
The root user is the most powerful user on a Linux system because it has access to all the system files and directories, so managing it carefully is important for maintaining system security.
Basic Bash Command
The default shell of Linux is Bash, and it is used for executing commands on the command-line interface. A Linux system administrator should have a basic understanding of Bash commands to perform tasks.
Handling File, Directories, and Users
Managing files, directories and users is a critical part of Linux system administration. A
system administrator should be able to perform the basic file and directory management
tasks.
Duties of a Linux Administrator:
Linux System Administration has become a solid criterion for an organization and institute
that requires a solid IT foundation. Hence, the need for efficient Linux administrators is a
requirement of the time. The job profile might change from each organization as there may
be added responsibilities and duties to the role. Below are some duties of a Linux System
Administrator:
Maintain all internet-facing services, including DNS, RADIUS, Apache, MySQL, and PHP.
Take regular backups of data, create new stored procedures, and maintain backup listings.
Analyze and fix error logs, and provide excellent customer support to web hosting, ISP, and LAN customers when troubleshooting support issues.
Communicate with staff, vendors, and customers in a cultivated, professional manner at all times.
Enhance, maintain, and create tools for the Linux environment and its users.
Detecting and solving the service problems ranging from disaster recovery to login
problems.
Install the necessary systems and security tools. Work with the Data Network Engineer and other personnel/departments to analyze hardware requirements and make acquisition recommendations.
Troubleshoot, when a problem occurs in the server.
Steps to Start the Career as Linux System Administrator:
Install and learn to use Linux environment.
Get Certified in Linux administration.
Learn to do Documentation.
Joining up with a local Linux Users Group or Community for Support and Help
In short, the main role of the Linux systems administrator is to manage operations such as installing and monitoring software and hardware systems and taking backups, while also being able to demonstrate an in-depth understanding of technical knowledge. Even entry-level professionals have good prospects for the position of system administrator, with a yearly median salary of around INR 3 lakhs; the salary increases with job experience. To gain that experience, you need to keep up with the latest skills and learning in the Linux community.

Environment Variables
Environment variables, or ENVs, basically define the behavior of the environment. They can affect running processes or programs that are executed in the environment.

Scope of an environment variable

Scope of any variable is the region from which it can be accessed or over which it is
defined. An environment variable in Linux can have global or local scope.
Global
A globally scoped ENV that is defined in a terminal can be accessed from anywhere in
that particular environment which exists in the terminal. That means it can be used in all
kind of scripts, programs or processes running in the environment bound by that terminal.
Local
A locally scoped ENV that is defined in a terminal cannot be accessed by programs or processes started from that terminal. It can only be accessed by the terminal (in which it was defined) itself.

How to access ENVs?

SYNTAX:
$NAME
NOTE: Both local and global environment variables are accessed in the same way.

How to display ENVs?

To display any ENV

SYNTAX:
$ echo $NAME
To display all the Linux ENVs

SYNTAX:
$ printenv    # displays all the global ENVs
or
$ set         # displays all the ENVs (global as well as local)
or
$ env         # displays all the global ENVs

How to set environment variables?

To set a global ENV
$ export NAME=Value
or
$ set NAME=Value
EXAMPLE:
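For instance, using export so the variable is visible to child processes (the variable name and value below are arbitrary examples):
$ export MY_APP_ENV=production
$ echo $MY_APP_ENV
production
$ bash -c 'echo $MY_APP_ENV'      # exported, so child processes can see it too
production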

To set a local ENV

SYNTAX:
$ NAME=Value
EXAMPLE:
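For instance, again with an arbitrary example variable:
$ MY_LOCAL_VAR=hello
$ echo $MY_LOCAL_VAR
hello
$ bash -c 'echo $MY_LOCAL_VAR'    # prints nothing: the variable was not exported, so the child shell cannot see it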

To set user wide ENVs

These variables are set and configured in the ~/.bashrc, ~/.bash_profile, ~/.bash_login, or ~/.profile files according to the requirement. These variables can be accessed by a particular user and persist across reboots.
Following steps can be followed to do so:
Step 1: Open the terminal.
Step 2:
$ sudo vi ~/.bashrc
Step 3: Enter the password.
Step 4: Add the variable in the opened file.
export NAME=Value
Step 5: Save and close the file.
Step 6:
$ source ~/.bashrc
EXAMPLE:

To set system wide ENVs

These variables are set and configured in the /etc/environment, /etc/profile, /etc/profile.d/, or /etc/bash.bashrc files according to the requirement. These variables can be accessed by any user and persist across reboots.
Following steps can be followed to do so:
Step 1: Open the terminal.
Step 2:
$ sudo -H vi /etc/environment
Step 3: Enter the password.
Step 4: Add the variable in the opened file.
NAME=Value
Step 5: Save and close the file.
Step 6: Logout and Login again.

How to unset environment variables?

SYNTAX:
$ unset NAME
or
$ NAME=''
EXAMPLE:
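For instance, continuing the earlier illustrative variable:
$ export MY_APP_ENV=production
$ unset MY_APP_ENV
$ echo $MY_APP_ENV                # now prints an empty line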

NOTE: To unset permanent ENVs, you need to re-edit the files and remove the lines that
were added while defining them.

Some commonly used ENVs in Linux

$USER: Gives current user's name.
$PATH: Gives search path for commands.
$PWD: Gives the path of present working directory.
$HOME: Gives path of home directory.
$HOSTNAME: Gives name of the host.
$LANG: Gives the default system language.
$EDITOR: Gives default file editor.
$UID: Gives user ID of current user.
$SHELL: Gives location of current user's shell program.
EXAMPLE:
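For instance, the output on a typical system might look like this (the values shown are examples and will differ on your machine):
$ echo $USER
student
$ echo $HOME
/home/student
$ echo $SHELL
/bin/bash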
Networking

Every computer is connected to some other computer through a network whether internally
or externally to exchange some information. This network can be small as some computers
connected in your home or office, or can be large or complicated as in large University or the
entire Internet.

Maintaining a system's network is a task of System/Network administrator. Their task
includes network configuration and troubleshooting.

Here is a list of Networking and Troubleshooting commands:

ifconfig Display and manipulate route and network interfaces.
ip It is a replacement of ifconfig command.
traceroute Network troubleshooting utility.
tracepath Similar to traceroute but doesn't require root privileges.
ping To check connectivity between two nodes.
netstat Display connection information.
ss It is a replacement of netstat.
dig Query DNS related information.
nslookup Find DNS related query.
route Shows and manipulate IP routing table.
host Performs DNS lookups.
arp View or add contents of the kernel's ARP table.
iwconfig Used to configure wireless network interface.
hostname To identify a network name.
curl or wget To download a file from internet.
mtr Combines ping and tracepath into a single command.
whois Will tell you about the website's whois.
ifplugstatus Tells whether a cable is plugged in or not.
Explanation of the above commands:
ifconfig: ifconfig is short for interface configurator. This command is utilized in network
inspection, initializing the interface, enabling or disabling an IP address, and configuring an
interface with an IP address. Also, it is used to show the network and route interface.
The basic details shown with ifconfig are:
o MTU
o MAC address
o IP address
Syntax:
1. ifconfig
ip: It is the updated and newer replacement for the ifconfig command. The command provides the same kind of network information as ifconfig. Also, it can be used to get information about a particular interface.
Syntax:
1. ip a
2. ip addr
traceroute: The traceroute command is one of the most helpful commands in the networking field. It is used to troubleshoot the network. It identifies the delay and determines the pathway to our target. Basically, it aids in the below ways:
o It determines where network latency occurs and reports it.
o It follows the route to the destination.
o It gives the names of, and identifies, all the devices on the path.
Syntax:
1. traceroute <destination>
tracepath: The tracepath command is the same as the traceroute command, and it is used to
find network delays. Besides, it does not need root privileges. By default, it comes pre-
installed in Ubuntu. It traces the path to the destination and recognizes all hops in it. It
identifies the point at which the network is weak if our network is not strong enough.
Syntax:
1. tracepath <destination>
ping: It is short for Packet Internet Groper. The ping command is one of the widely used
commands for network troubleshooting. Basically, it inspects the network connectivity
between two different nodes.
Syntax:
1. ping <destination>
netstat: It is short for network statistics. It gives statistical figures of many interfaces, which
contain open sockets, connection information, and routing tables.
Syntax:
1. netstat
ss: This command is a substitute for the netstat command. The ss command is more informative and much faster than netstat, because it fetches its information directly from kernel space.
Syntax:
1. ss
nslookup: The nslookup command is an older counterpart of the dig command. It is also used for DNS-related queries.
Syntax:
1. nslookup <domainname>
dig: dig is short for Domain Information Groper. The dig command is an improved version of the nslookup command. It is used to perform DNS lookups and query DNS name servers, and to diagnose DNS-related problems. Mainly, it is used to verify DNS mappings, host addresses, MX records, and every other type of DNS record to get the best understanding of the DNS topology.
Syntax:
1. dig <domainname>
route: The route command shows and manipulates the IP routing table of our system. Basically, the routing table is used to determine the best way to forward packets toward a destination.
Syntax:
1. route
host: The host command shows the IP address for a hostname and the domain name for an
IP address. Also, it is used to get DNS lookup for DNS related issues.
Syntax:
1. host -t <resourceName>
arp: The arp command is short for Address Resolution Protocol. This command is used to view and add entries in the kernel's ARP table.
Syntax:
1. arp
iwconfig: It is used to view and configure wireless network interfaces.
Syntax:
1. iwconfig
hostname: It is a simple command which is used to see and set the system's hostname.
Syntax:
1. hostname
curl and wget: These commands are used to download files from the internet via the CLI. curl must be given the "-O" option to save the file, while wget is used directly.
curl Syntax:
1. curl -O <fileLink>
wget Syntax:
1. wget <fileLink>
mtr: The mtr command is a mix of the traceroute and ping commands. It regularly shows
information related to the packets transferred using the ping time of all hops. Also, it is used
to see network problems.
Syntax:
1. mtr <path>
whois: The whois command fetches a website's registration information. We can get details of a website, such as the owner and the registration information.
Syntax:
1. whois <websiteName>
ifplugstatus: The ifplugstatus command checks whether a cable is currently plugged into a
network interface. It is not available in Ubuntu directly. We can install it with the help of the
below command:
1. sudo apt-get install ifplugd
Syntax:
1. ifplugstatus
iftop: The iftop command is utilized in traffic monitoring.
tcpdump: The tcpdump command is widely used for network analysis along with other Linux networking commands. It analyzes the traffic passing through a network interface and displays it. When troubleshooting the network, this kind of packet-level access is crucial.
Syntax:
1. $ tcpdump -i <network_device>
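As an illustrative troubleshooting sequence using several of the commands above (example.com and eth0 are placeholders; your target host and interface name may differ):
$ ping -c 4 example.com           # check basic connectivity to a host
$ dig example.com +short          # resolve the domain to its IP address
$ traceroute example.com          # trace the path packets take to the host
$ ss -tuln                        # list listening TCP/UDP sockets on the local machine
$ sudo tcpdump -i eth0 -c 10      # capture ten packets on the interface for closer inspection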

Linux Server Installation
Installing a Linux server involves several steps, and the exact process can vary depending on
the distribution you choose. Here's a general step-by-step guide to installing a Linux server:

1. Choose a Linux Distribution: Select a Linux distribution suitable for server environments, such
as Ubuntu Server, CentOS, Debian, or Fedora Server. Visit the official website of the chosen
distribution and download the ISO image for the server edition.
2. Create Installation Media: Create a bootable installation media, such as a USB drive or DVD,
using the downloaded ISO image. You can use tools like Rufus (for Windows) or Etcher (for
Windows, macOS, and Linux) to create the bootable media.
3. Boot from Installation Media: Insert the installation media into the server's appropriate drive
(USB port or DVD drive) and restart the server. Ensure that the server is set to boot from the
installation media. You may need to change the boot order in the server's BIOS or UEFI
settings.
4. Start the Installation: Once the server boots from the installation media, you will be presented
with the Linux distribution's installer. Follow the on-screen instructions to proceed with the
installation.
5. Language and Keyboard Settings: Choose the language and keyboard layout for the
installation process.
6. Disk Partitioning: Select the disk or partition where you want to install the Linux server. You
can choose automatic partitioning or manual partitioning based on your requirements. If you
are unsure, automatic partitioning is generally recommended for beginners.
7. Set Hostname and Network Configuration: Provide a hostname for your server and configure
the network settings, including IP address, subnet mask, gateway, and DNS information. You
can choose DHCP if you want the server to obtain network settings automatically.
8. Set Time Zone: Select the appropriate time zone for your server.
9. Create User and Set Password: Create a user account for administering the server. Set a
strong password for the user.
10. Software Selection: Choose the packages and software you want to install on the server. For a
basic server setup, you can select options like SSH server, basic utilities, and possibly a web
server or database server if needed.
11. Begin Installation: Once you have configured the necessary settings, proceed with the
installation process. The installer will copy files, install packages, and configure the server.
12. Reboot and Login: After the installation completes, you will be prompted to reboot the
server. Remove the installation media and reboot the server. Once the server restarts, you can
log in with the user account you created during the installation process.
13. Post-Installation Configuration: After logging in, you may need to perform additional
configuration steps, such as updating packages, configuring firewall rules, setting up
additional services, and securing the server. Refer to the documentation and best practices
for your chosen distribution to ensure a secure and optimized server configuration.

Remember to consult the official documentation and guides specific to your chosen Linux
distribution for any distribution-specific installation instructions or recommendations.

RPM and YUM Installation
RPM (Red Hat Package Manager) and YUM (Yellowdog Updater Modified) are package
management systems used in Linux distributions like Red Hat, CentOS, and Fedora. Here's a
step-by-step guide on how to install software using RPM and YUM:

Installing Software with RPM:

1. Download the RPM Package: Locate the RPM package you want to install from a trusted
source or the official repository.
2. Open Terminal: Open a terminal or command-line interface on your Linux server.
3. Navigate to the Directory: Use the cd command to navigate to the directory where the RPM
package is located. For example:
cd /path/to/package/directory

4. Install the RPM Package: Use the rpm command with the -i flag to install the RPM package.
For example:
rpm -i package_name.rpm

5. Verify the Installation: After the installation completes, you can verify the installation by
running commands specific to the software you installed. Refer to the software's
documentation for the relevant commands.
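For example, assuming a hypothetical package called package_name, the following RPM queries are commonly used to confirm and inspect an installation:
$ rpm -qa | grep package_name     # check whether the package appears in the installed list
$ rpm -qi package_name            # show detailed information about the installed package
$ sudo rpm -e package_name        # remove (erase) the package if it is no longer needed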

Installing Software with YUM:

1. Open Terminal: Open a terminal or command-line interface on your Linux server.
2. Update the System: It's a good practice to update the system before installing any new
software. Run the following command to update the package repositories:
sudo yum update

3. Search for the Package: Use the yum search command to search for the package you want to
install. For example, to search for the package named "package_name," use the following
command:
yum search package_name

4. Install the Package: Once you find the package you want to install, use the yum install
command to install it. For example, to install the package named "package_name," use the
following command:
sudo yum install package_name

5. Confirm Installation: YUM will prompt you to confirm the installation by displaying the
package details and asking for confirmation. Type 'y' and press Enter to proceed with the
installation.
6. Verify the Installation: After the installation completes, you can verify the installation by
running commands specific to the software you installed. Refer to the software's
documentation for the relevant commands.

Note: When using YUM, it automatically resolves dependencies and installs any necessary
dependencies for the software you want to install.
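A few related YUM commands that are often useful afterwards (package_name is again a placeholder):
$ yum list installed | grep package_name    # confirm that the package is installed
$ yum info package_name                     # show version and repository details
$ sudo yum remove package_name              # uninstall the package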

Remember to run the commands with administrative privileges (using sudo) to install
packages system-wide.
CHAPTER 2: Version Control-GIT
Introduction to GIT
Git is a popular version control system. It was created by Linus Torvalds in 2005, and has been
maintained by Junio Hamano since then.
It is used for:
Tracking code changes
Tracking who made changes
Coding collaboration
What does Git do?
Manage projects with Repositories
Clone a project to work on a local copy
Control and track changes with Staging and Committing
Branch and Merge to allow for work on different parts and versions of a project
Pull the latest version of the project to a local copy
Push local updates to the main project
Working with Git
Initialize Git on a folder, making it a Repository
Git now creates a hidden folder to keep track of changes in that folder
When a file is changed, added or deleted, it is considered modified
You select the modified files you want to Stage
The Staged files are Committed, which prompts Git to store a permanent snapshot of the
files
Git allows you to see the full history of every commit.
You can revert back to any previous commit.
Git does not store a separate copy of every file in every commit, but keeps track of changes
made in each commit!
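A minimal sketch of this workflow on the command line might look as follows (the file name and commit message are illustrative):
$ git init                        # initialize Git on a folder, making it a repository
$ git status                      # see which files are untracked or modified
$ git add index.html              # stage the modified file
$ git commit -m "First commit"    # store a permanent snapshot of the staged files
$ git log --oneline               # view the history of commits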
Why Git?
Over 70% of developers use Git!
Developers can work together from anywhere in the world.
Developers can see the full history of the project.
Developers can revert to earlier versions of a project.
What is GitHub?
Git is not the same as GitHub.
GitHub makes tools that use Git.
GitHub is the largest host of source code in the world, and has been owned by Microsoft since
2018.
In this tutorial, we will focus on using Git with GitHub.

What is Git
Git is the most commonly used version control system. Git tracks the changes you make to files, so
you have a record of what has been done, and you can revert to specific versions should you ever
need to. Git also makes collaboration easier, allowing changes by multiple people to all be merged
into one source.
So regardless of whether you write code that only you will see, or work as part of a team, Git will be
useful for you.
Git is software that runs locally. Your files and their history are stored on your computer. You can also use online hosts (such as GitHub or Bitbucket) to store a copy of the files and their revision history. Having a centrally located place where you can upload your changes and download changes from others enables you to collaborate more easily with other developers. Git can automatically merge the changes, so two people can even work on different parts of the same file and later merge those changes without losing each other's work!
Ways to Use Git
Git is software that you can access via a command line (terminal), or through a desktop app that has a GUI (graphical user interface), such as Sourcetree.
Git Repositories
A Git repository (or repo for short) contains all of the project files and the entire revision history.
You'll take an ordinary folder of files (such as a website's root folder), and tell Git to make it a repository. This creates a .git subfolder, which contains all of the Git metadata for tracking changes. On Unix-based operating systems such as macOS, files and folders that start with a period (.) are hidden, so you will not see the .git folder in the macOS Finder unless you show hidden files, but it's there! You might be able to see it in some code editors.
Stage & Commit Files
Think of Git as keeping a list of changes to files. So how do we tell Git to record our changes? Each recorded change to a file or set of files is called a commit.
Before we make a commit, we must tell Git what files we want to commit. This is called staging and uses the add command. Why must we do this? Why can't we just commit the file directly? Let's say you're working on two files, but only one of them is ready to commit. You don't want to be forced to commit both files, just the one that's ready. That's where Git's add command comes in. We add files to a staging area, and then we commit the files that have been staged.
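For example, staging and committing only the file that is ready might look like this (the file names are illustrative):
$ git add style.css                       # stage only the file that is ready
$ git status                              # confirm which changes are staged and which are not
$ git commit -m "Update the stylesheet"   # commit just the staged file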
Remote Repositories (on GitHub & Bitbucket)
Storing a copy of your Git repo with an online host (such as GitHub or Bitbucket) gives you a
centrally located place where you can upload your changes and download changes from others,
letting you collaborate more easily with other developers. After you have a remote repository set up,
you upload (push) your files and revision history to it. After someone else makes changes to a remote
repo, you can download (pull) their changes into your local repo.
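As an illustrative sketch (the remote URL is a placeholder), working with a remote repository typically involves:
$ git remote add origin https://github.com/example/project.git   # link the local repo to a remote
$ git push -u origin master        # upload (push) local commits and history to the remote
$ git pull origin master           # download (pull) and merge changes made by others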

Branches & Merging
Git lets you branch out from the original code base. This lets you more easily work with other
developers, and gives you a lot of flexibility in your workflow.
Here's an example of how Git branches are useful. Let's say you need to work on a new feature for a website. You create a new branch and start working. You haven't finished your new feature, but you get a request to make a rush change that needs to go live on the site today. You switch back to the master branch, make the change, and push it live. Then you can switch back to your new feature branch and finish your work. When you're done, you merge the new feature branch into the master branch and both the new feature and rush change are kept!
When you merge two branches (or merge a local and remote branch) you can sometimes get a conflict. For example, you and another developer unknowingly both work on the same part of a file. The other developer pushes their changes to the remote repo. When you then pull them to your local repo you'll get a merge conflict. Luckily Git has a way to handle conflicts, so you can see both sets of changes and decide which you want to keep.
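The rush-change scenario above might play out on the command line roughly as follows (the branch names are illustrative):
$ git branch new-feature        # create a branch for the new feature
$ git checkout new-feature      # switch to it and start working
$ git checkout master           # switch back to master for the urgent change
$ git merge new-feature         # once the feature is finished, merge it into master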
Pull Requests
Pull requests are a way to discuss changes before merging them into your codebase. Let's say you're managing a project. A developer makes changes on a new branch and would like to merge that branch into the master. They can create a pull request to notify you to review their code. You can discuss the changes, and decide if you want to merge it or not.

About Version Control System and Types
As we know, a software product is developed in collaboration by a group of developers who might be located at different places, and each one of them contributes some specific kind of functionality or feature. In order to contribute to the product, they make modifications to the source code (either by adding or removing code). A version control system is a kind of software that helps the developer team to efficiently communicate and manage (track) all the changes that have been made to the source code, along with information like who made the changes and what changes were made. A separate branch is created for every contributor who makes changes, and the changes aren't merged into the original source code until they are analyzed; as soon as the changes are green-signaled, they are merged into the main source code. It not only keeps the source code organized but also improves productivity by making the development process smooth.
Basically Version control system keeps track on changes made on a particular software and take a
snapshot of every modification. Let‘s suppose if a team of developer add some new functionalities
in an application and the updated version is not working properly so as the version control system
keeps track of our work so with the help of version control system we can omit the new changes and
continue with the previous version.
Benefits of the version control system:
Enhances project development speed by providing efficient collaboration.
Leverages productivity, expedites product delivery, and improves the skills of employees through better communication and assistance.
Reduces the possibility of errors and conflicts during project development through traceability of every small change.
Employees or contributors of the project can contribute from anywhere, irrespective of their geographical location.
For each contributor to the project, a different working copy is maintained and is not merged into the main file until the working copy is validated. The most popular examples are Git, Helix Core, and Microsoft TFS.
Helps in recovery in case of any disaster or contingent situation.
Informs us about who, what, when, and why changes have been made.
Use of Version Control System:
A repository: It can be thought of as a database of changes. It contains all the edits and historical versions (snapshots) of the project.
Copy of work (sometimes called a checkout): It is the personal copy of all the files in a project. You can edit this copy without affecting the work of others, and you can finally commit your changes to the repository when you are done making them.
Working in a group: Consider yourself working in a company where you are asked to work on a live project. You can't change the main code as it is in production, and any change may cause inconvenience to the users; also, you are working in a team, so you need to collaborate with your team and adapt to their changes. Version control helps you with this by merging different requests into the main repository without introducing any undesirable changes. You can test the functionality without putting it live, and you don't need to download and set up everything each time: just pull the changes, make your changes, test them, and merge them back.
Types of Version Control Systems:
Local Version Control Systems
Centralized Version Control Systems
Distributed Version Control Systems
Local Version Control Systems: This is one of the simplest forms and has a database that keeps all the changes to files under revision control. RCS is one of the most common VCS tools of this kind. It keeps patch sets (differences between files) in a special format on disk. By adding up all the patches, it can re-create what any file looked like at any point in time.
Centralized Version Control Systems: Centralized version control systems contain just one repository globally, and every user needs to commit for their changes to be reflected in the repository. It is possible for others to see your changes by updating.
Two things are required to make your changes visible to others which are:
You commit
They update
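For instance, with Subversion (a typical CVCS, used here only as an illustration), that two-step exchange is roughly:
$ svn commit -m "Fix login bug"   # you commit: your change goes to the central repository
$ svn update                      # they update: teammates fetch your change from the server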
The benefit of a CVCS (Centralized Version Control System) is that it enables collaboration amongst developers and provides, to a certain extent, insight into what everyone else is doing on the project. It also allows administrators fine-grained control over who can do what.
It has some downsides as well, which led to the development of DVCS. The most obvious is the single point of failure that the centralized repository represents: if it goes down, collaboration and saving versioned changes are not possible during that period. And if the hard disk of the central database becomes corrupted and proper backups haven't been kept, you lose absolutely everything.
Distributed Version Control Systems: Distributed version control systems contain multiple
repositories. Each user has their own repository and working copy. Just committing your changes
will not give others access to your changes. This is because commit will reflect those changes in
your local repository and you need to push them in order to make them visible on the central
repository. Similarly, When you update, you do not get others‘ changes unless you have first pulled
those changes into your repository.
To make your changes visible to others, 4 things are required:
You commit
You push
They pull
They update
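A rough Git sketch of those four steps (the branch name master is assumed):
# on your machine
$ git commit -m "Add validation"   # 1. you commit to your local repository
$ git push origin master           # 2. you push to the central repository
# on a teammate's machine
$ git pull origin master           # 3. and 4. they pull, which fetches your commits and updates their copy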
The most popular distributed version control systems are Git, and Mercurial. They help us overcome
the problem of single point of failure.
Purpose of Version Control:
Multiple people can work simultaneously on a single project. Everyone works on and edits their
own copy of the files and it is up to them when they wish to share the changes made by them
with the rest of the team.
It also enables one person to use multiple computers to work on a project, so it is valuable even
if you are working by yourself.
It integrates the work that is done simultaneously by different members of the team. In some rare
cases, when conflicting edits are made by two people to the same line of a file, then human
assistance is requested by the version control system in deciding what should be done.
Version control provides access to the historical versions of a project. This is insurance against
computer crashes or data loss. If any mistake is made, you can easily roll back to a previous
version. It is also possible to undo specific edits that too without losing the work done in the
meanwhile. It can be easily known when, why, and by whom any part of a file was edited.
Difference between CVCS and DVCS
Central Version Control System (CVCS)
In CVCS, the central server stores all the data. This central server enables team collaboration. It just
contains a single repository, and each user gets their working copy. We need to commit, so the
changes get reflected in the repository. Others can check our changes by updating their local copy.
Benefits of CVCS
Easy to learn and manage
Works well with binary files
More control over users and their access.
CVS and SVN are some conventional Central Version Control systems.
Drawbacks of CVCS
It is not locally available, which means we must connect to the network to perform any operation.
If the central server crashes during an operation, there is a high chance of losing the data.
For every command, CVCS connects to the central server, which impacts the speed of operations.
The Distributed Version Control System is developed to overcome all these issues.
Distributed Version Control System (DVCS)

In a DVCS, we do not depend on a central server to hold all the data. Instead, we clone the remote repository to our local machine, which also gives us a full snapshot of the project history.
The user commits for the changes to be reflected in the local repository and then pushes the changes to the central repository. If other users want to see the changes, they pull the updated central repository into their local repository and then update their working copy.

Benefits of DVCS
Except for pushing and pulling the code, the user can work offline in DVCS
DVCS is fast compared to CVCS because you don't have to contact the central server for
every command
Merging and branching the changes in DVCS is very easy
Performance of DVCS is better
Even if the main server crashes, code will be stored in the local systems

Git and Mercurial are standard distributed version control systems. If we don't want to host a central repository on our own server, we can use GitHub or Bitbucket to store it and clone it to our local systems. GitHub and Bitbucket are the most popular companies that provide cloud hosting for software development version control using Git.
DVCS is critical for DevOps because of the following reasons:
Avoids dependency issues in modern containerized applications (Micro Services)
Improves the performance of DevOps SDLC
Supports in building more reliable applications
Version control is a fundamental concept in most companies, and its crucial role in DevOps cannot be overlooked.
A short history of GIT
Before version control systems, software developers did not have an efficient way to collaborate on their code. Developers had a hectic time trying to work on the same code at the same time. They improvised by mailing each other code, storing code on USB sticks and floppy disks as backups, and working in small teams on different parts of a system. This was manageable for small projects, but people needed something that could handle large systems. These challenges led to the need for a version control system with which developers could effectively collaborate on code and keep backups of the various versions of a project.
Birth of Git
Until April 2005 Linus Torvalds was using BitKeeper for version control of the Linux Kernel
development. He had a large number of volunteer developers working on the Linux Kernel and their
contributions had to be managed. BitKeeper was a nice tool for managing the enormous contribution
by the developers. The Linux developers used the tool for free after an agreement between the two
parties as BitKeeper was a proprietary source control management system which means you had to
pay for the use of the tool. There came a conflict of interest after Andrew Tridgell created an open-
source client for accessing the Bitkeeper version control system by reverse-engineering the
BitKeeper protocols. This caused the copyright holder to withdraw the free-to-use policy that they had earlier agreed upon, and many developers of the Linux kernel gave up access to BitKeeper. Linus knew he had to act fast to replace the version control system that he knew and loved, so he took a working vacation to decide what to do, as the free-to-use version control systems available at the time could not solve his problems. The result of his vacation was the birth of a new version control system named Git.
He had some goals in mind on how to make the next version control system that could manage a
large project like his own. He set out to build a version control system that was the complete
opposite of Concurrent Versions System (CVS), which could support distributed Version Control
system just like BitKeeper and one that Included very strong safeguards against corruption, either
accidental or malicious. The initial development of Git began on 3 April 2005. The project was announced on 6 April and became self-hosting the next day. Later that year, Linus Torvalds achieved his performance goal after a benchmark was performed, and Git managed the kernel 2.6.12 release. On 26 July 2005, maintenance was turned over to Junio Hamano, who was a major contributor to the project (responsible for the 1.0 release) and remains the project's core maintainer.
Several other volunteer contributors later came to work on Git full-time, such as Jeff King, who started contributing while he was a student; Shawn Pearce, who opened Git up to the Android and Java ecosystems with his work on JGit; and Johannes Schindelin, who opened Git up to the Windows community with his work on Git for Windows. In late 2007, Tom Preston-Werner, after being introduced to Git by a coworker, teamed up with Chris Wanstrath, Scott Chacon, and P.J. Hyett to start developing GitHub. He saw the need to offer source-code hosting based on Git with a modern web interface. Today GitHub has more developers than its competitors. Google adopted Git for their Linux-based operating system, Android, in March 2009.
Git at the time was not considered able to manage such a huge project, with many developers around the world working on a single open-source project, so they built Repo, which was not meant to replace Git but to make it easier to use. Microsoft followed suit several years later, though they had been known to dislike open-source tools. However, there was a cultural shift at the company, and they started embracing open source by contributing to libgit2, a library of Git development resources, to help speed up Git applications. The major boost to Git's popularity from Microsoft came in 2017, when the entire development effort for the Microsoft Windows suite of products moved to Git, creating the world's largest Git repository. In June 2018, Microsoft acquired GitHub for $7.5 billion in Microsoft stock. This took the development community by surprise, as Microsoft had previously been seen as against open source, which made many developers suspicious and led some to migrate to other platforms.
What is Git About?
Git is a free and open-source distributed version control system designed to handle everything
from small to very large projects with speed and efficiency.
Git relies on the basis of distributed development of software where more than one developer
may have access to the source code of a specific application and can modify changes to it that
may be seen by other developers.
Initially designed and developed by Linus Torvalds for Linux kernel development in 2005.
Every git working directory is a full-fledged repository with complete history and full version
tracking capabilities, independent of network access or a central server.
Git allows a team of people to work together, all using the same files. And it helps the team cope
up with the confusion that tends to happen when multiple people are editing the same files.
Characteristics of Git
1. Strong support for non-linear development
2. Distributed development
3. Compatibility with existing systems/protocol
4. Efficient handling of large projects
5. Data Assurance
6. Automatic Garbage Collection
7. Periodic explicit object packing
Future of Git
Today Git and GitHub are taking over the world, as many developers are adopting Git and GitHub for version control. There are about 56 million developers on GitHub according to its statistics. Developers are really changing the world, and Git and GitHub are part of the story. Software development has a bright future, and it will be interesting to see what Microsoft, a software company, will do with GitHub in the coming years.
GIT Basics
Version Control System
Version Control System (VCS) is a software that helps software developers to work together and
maintain a complete history of their work.
Listed below are the functions of a VCS −
Allows developers to work simultaneously.
Does not allow overwriting each other‘s changes.
Maintains a history of every version.
Following are the types of VCS −
Centralized version control system (CVCS).
Distributed/Decentralized version control system (DVCS).
In this chapter, we will concentrate only on distributed version control system and especially on Git.
Git falls under distributed version control system.
Distributed Version Control System
Centralized version control system (CVCS) uses a central server to store all files and enables team
collaboration. But the major drawback of CVCS is its single point of failure, i.e., failure of the central
server. Unfortunately, if the central server goes down for an hour, then during that hour, no one can
collaborate at all. And even in a worst case, if the disk of the central server gets corrupted and proper
backup has not been taken, then you will lose the entire history of the project. Here, distributed
version control system (DVCS) comes into picture.
DVCS clients not only check out the latest snapshot of the directory but they also fully mirror the
repository. If the server goes down, then the repository from any client can be copied back to the
server to restore it. Every checkout is a full backup of the repository. Git does not rely on the central
server and that is why you can perform many operations when you are offline. You can commit
changes, create branches, view logs, and perform other operations when you are offline. You require
network connection only to publish your changes and take the latest changes.
Advantages of Git
Free and open source
Git is released under the GPL open source license. It is available freely over the internet. You can use Git to manage proprietary projects without paying a single penny. As it is open source, you can download its source code and also modify it according to your requirements.
Fast and small
As most of the operations are performed locally, it gives a huge benefit in terms of speed. Git does not
rely on the central server; that is why, there is no need to interact with the remote server for every
operation. The core part of Git is written in C, which avoids runtime overheads associated with other
high-level languages. Though Git mirrors entire repository, the size of the data on the client side is
small. This illustrates the efficiency of Git at compressing and storing data on the client side.
Implicit backup
The chances of losing data are very rare when there are multiple copies of it. Data present on any
client side mirrors the repository, hence it can be used in the event of a crash or disk corruption.
Security
Git uses a common cryptographic hash function called secure hash function (SHA1), to name and
identify objects within its database. Every file and commit is check-summed and retrieved by its
checksum at the time of checkout. It implies that, it is impossible to change file, date, and commit
message and any other data from the Git database without knowing Git.
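A small sketch of this content addressing (the hash printed is whatever Git computes for that exact content):
$ echo 'hello devops' | git hash-object --stdin   # prints the SHA-1 name Git would give this content
$ git log -1 --format=%H                          # prints the full SHA-1 of the latest commit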
No need of powerful hardware
In case of CVCS, the central server needs to be powerful enough to serve requests of the entire team.
For smaller teams, it is not an issue, but as the team size grows, the hardware limitations of the server
can be a performance bottleneck. In case of DVCS, developers don‘t interact with the server unless
they need to push or pull changes. All the heavy lifting happens on the client side, so the server
hardware can be very simple indeed.
Easier branching
CVCS does not use a cheap copy mechanism: if we create a new branch, it copies all the code to the new branch, which is time-consuming and inefficient. Deletion and merging of branches in CVCS are also complicated and time-consuming. But branch management with Git is very simple. It takes only a few seconds to create, delete, and merge branches.
DVCS Terminologies
Local Repository
Every VCS tool provides a private workplace as a working copy. Developers make changes in their
private workplace and after commit, these changes become a part of the repository. Git takes it one
step further by providing them a private copy of the whole repository. Users can perform many
operations with this repository such as add file, remove file, rename file, move file, commit changes,
and many more.
Working Directory and Staging Area or Index
The working directory is the place where files are checked out. In other CVCS, developers generally
make modifications and commit their changes directly to the repository. But Git uses a different strategy. Git doesn't track each and every modified file. Whenever you perform a commit operation, Git looks for the files present in the staging area; only those files are considered for the commit, not all the modified files.
Let us see the basic workflow of Git.
Step 1 − You modify a file from the working directory.
Step 2 − You add these files to the staging area.
Step 3 − You perform the commit operation, which moves the files from the staging area. After a push operation, the changes are stored permanently in the Git repository.

Suppose you modified two files, namely "sort.c" and "search.c", and you want two different commits, one for each file. You can add one file to the staging area and commit it. After the first commit, repeat the same procedure for the other file.
# First commit
[bash]$ git add sort.c
# adds file to the staging area
[bash]$ git commit -m "Added sort operation"

# Second commit
[bash]$ git add search.c
# adds file to the staging area
[bash]$ git commit -m "Added search operation"
Blobs
Blob stands for Binary Large Object. Each version of a file is represented by blob. A blob holds the
file data but doesn‘t contain any metadata about the file. It is a binary file, and in Git database, it is
named as SHA1 hash of that file. In Git, files are not addressed by names. Everything is content-
addressed.
Trees
Tree is an object, which represents a directory. It holds blobs as well as other sub-directories. A tree is
a binary file that stores references to blobs and trees which are also named as SHA1 hash of the tree
object.
Commits
Commit holds the current state of the repository. A commit is also named by SHA1 hash. You can
consider a commit object as a node of the linked list. Every commit object has a pointer to the parent
commit object. From a given commit, you can traverse back by looking at the parent pointer to view
the history of the commit. If a commit has multiple parent commits, then that particular commit has
been created by merging two branches.
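You can inspect this structure directly; a minimal sketch, assuming you are inside a Git repository with at least one commit:
$ git cat-file -p HEAD        # shows the tree, parent, author, and message of the latest commit
$ git log --oneline --graph   # follows the parent pointers to display the commit history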
Branches
Branches are used to create another line of development. By default, Git has a master branch, which is
same as trunk in Subversion. Usually, a branch is created to work on a new feature. Once the feature is
completed, it is merged back with the master branch and we delete the branch. Every branch is
referenced by HEAD, which points to the latest commit in the branch. Whenever you make a commit,
HEAD is updated with the latest commit.
Tags
Tag assigns a meaningful name with a specific version in the repository. Tags are very similar to
branches, but the difference is that tags are immutable. It means, tag is a branch, which nobody intends
to modify. Once a tag is created for a particular commit, even if you create a new commit, it will not
be updated. Usually, developers create tags for product releases.
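For example, a hedged sketch of tagging a release (the tag name and message are illustrative):
$ git tag -a v1.0 -m "First product release"   # create an annotated tag at the current commit
$ git tag                                      # list existing tags
$ git push origin v1.0                         # share the tag with the remote repository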
Clone
Clone operation creates the instance of the repository. Clone operation not only checks out the
working copy, but it also mirrors the complete repository. Users can perform many operations with
this local repository. The only time networking gets involved is when the repository instances are
being synchronized.
Pull
Pull operation copies the changes from a remote repository instance to a local one. The pull operation
is used for synchronization between two repository instances. This is same as the update operation in
Subversion.
Push
Push operation copies changes from a local repository instance to a remote one. This is used to store
the changes permanently into the Git repository. This is same as the commit operation in Subversion.
HEAD
HEAD is a pointer, which always points to the latest commit in the branch. Whenever you make a
commit, HEAD is updated with the latest commit. The heads of the branches are stored
in .git/refs/heads/ directory.
[CentOS]$ ls -1 .git/refs/heads/
master
[CentOS]$ cat .git/refs/heads/master
570837e7d58fa4bccd86cb575d884502188b0c49
Revision
Revision represents the version of the source code. Revisions in Git are represented by commits.
These commits are identified by SHA1 secure hashes.
URL
URL represents the location of the Git repository. Git URL is stored in config file.
[tom@CentOS tom_repo]$ pwd
/home/tom/tom_repo
[tom@CentOS tom_repo]$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = [email protected]:project.git
fetch = +refs/heads/*:refs/remotes/origin/*
GIT Command Line
Git in the Command Line
What is Git?
Git is a free and open-source version control system used for managing projects, including those hosted on GitHub. It offers many more commands and much more flexibility than GitHub's online interface does.
Note: This tutorial assumes some knowledge of the basic commands of GitHub. To refresh on those, refer to GitHub Basics.
Installing Git
To install Git, follow the instructions on this page. For Windows, when installing Git through
the installer, it is recommended you select the ―Use Git from the Windows Command
Prompt‖ option. This will allow you to use all git commands through your terminal (CMD,
PowerShell, Anaconda) rather than having to use Git’s personal terminal, Git Bash.
Using the Command Line
If you are already familiar with using the command prompt, feel free to skip this section.
The command prompt/terminal is another way of interfacing with your computer, rather than
the way you typically would use a computer by clicking different buttons. While the terminal
can be confusing at first, and requires some memorization of some commands, it provides a
lot of power for using your computer in different ways. Knowing how to use the terminal
opens a lot of new doors and can ultimately make using your computer much easier and
more accessible.
Note that there are a few terminal options for Windows, such as CMD, Windows PowerShell,
and the Anaconda Prompt. This tutorial will use PowerShell, as it is most similar to the Mac
terminal, and has all the necessary functionality. However, all the other terminal options
should accomplish everything you want, just with slightly different commands.
To open your terminal in Windows, search for ―PowerShell‖ in your programs. On Mac, just
search for ―Terminal‖ in your programs. A prompt like the one below should open up.

Within this prompt, you will run commands by typing them directly and hitting "Enter". The path listed before where you type is the directory you are currently working in on your computer. In order to run commands on a specific folder (such as your subteam's repository) you will need to navigate to that folder. The two commands to do this are:
ls (list): Lists all files in the current directory you are in
cd (change directory): Changes your directory to the directory listed after cd
For example, say I want to move to the "aguaclara_demo" repository to make some git changes. First I use ls to see what files I can change to:
I see the CS folder, so I use cd CS to move to that folder. From there I continue to use these
commands until I find the folder I am looking for.
If you ever need to move up one folder, you can use cd .. to accomplish this. If you know
the file path of the folder you want, you can also add that directly to move to your desired
folder. For example, I could move directly to the aguaclara_demo folder as shown below:
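Since the original screenshots are not reproduced here, a rough equivalent of that session might look like this (folder names are only examples):
$ ls                    # list the folders in the current directory
$ cd CS                 # step into the CS folder
$ ls
$ cd aguaclara_demo     # step into the repository folder
$ cd ..                 # move back up one folder if needed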
One other important note for using the terminal is to always wait for the commands to finish
running. Sometimes when running a more complicated command, the computer will take a
while to run, and the terminal will slowly show commands as they run. In this case, make
sure to wait until all the commands are done running, and you can see the blinking cursor
before you type another command. If you ever want to stop a command while it's running, hold down the Control key and hit "c".
Using Git in the command line
Cloning a Repository
To clone a repository from GitHub to your local computer, you can use the command line to
accomplish this task. Let’s all clone the aguaclara_tutorial repository so that you get the hang
of it. First go to the repository and click on the green Clone or Download button. Copy the URL
shown.
Then, navigate to your command line. Go to the folder you would like your repository to be
stored on your local computer. Then, type in git clone <INSERT URL> . Wait until the command
line finishes cloning the repository. Then you’re done!
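Assuming the copied URL is the HTTPS one shown by GitHub (the organization name below is a placeholder), the commands would look roughly like:
$ cd Documents/projects                                      # example folder where the repo should live
$ git clone https://github.com/<organization>/aguaclara_tutorial.git
$ cd aguaclara_tutorial                                      # enter the freshly cloned repository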
Assuming you added Git to your command line during installation, you can run any Git command from the command line just by typing git, adding a space, and writing the command. For example, to use the simple command git status, which summarizes the state of your local changes compared with what has been committed, just type "git status" while in the folder you want to check.

Pulling
Most of what you will be doing with Git is pulling and pushing changes from Github. To pull,
just use the command git pull .

Pushing
To push your local changes, first stage your changes, then commit them to your branch, and
then push them to the origin.
To stage your changes, use git add -A The -A ensures you add all of your files you have
worked on.

To commit your changes, use the command git commit -m "Commit Message" and fill in the
commit message with whatever you want to say about your commit. Note that it is very
important to include the -m and the commit message. If you do not, Git will take you to an
interface using the text editor Vim, which is very challenging to use.
If you happen to accidentally type git commit without the -m and the commit message and
get taken to Vim, you can still write your commit message. Use your arrow keys to scroll
up to the top line where it is blank. Write your commit message, then to exit out of this
editor, press Escape. You cursor should appear in the bottom left corner. From there
type :x and hit enter to save your commit message.

Finally, to push your changes, use git push . If you have any merge errors, the terminal
will notify you and you can fix them manually.

Installing Git
Installing on Linux
Installing on Windows
Initial setup
Git Essentials
Creating repository
Creating a Repository
Create a folder on your desktop and name it "My Repo."
Create a New Folder
Next, create a text file containing this line of code in the folder:
print("Hello World!")
Name the file hello_world.py.
Adding Code to the Repo
You now have some code in a folder. Next, you'll turn it into a repository.
In the command line/terminal, go to the ―My Repo‖ folder.
Command Line in Folder Location
Before using Git, you’ll need to initialize, or configure, it by specifying who you are. This will
be used in the version history so it’s clear who is making the contribution to the code. Run
the command:
git config --global user.email "your email here"
The quotation marks are not needed as part of the actual command you will run.
Configuring Your User
Now, programmatically convert the folder into a repository, or initialize the repository. This is
simple to do. Simply run:
git init
Congratulations! You have successfully initialized a repo. Now, add your code into it.
Initializing Your Repository Using Git Init
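Putting the steps together, a minimal sketch of the whole sequence (the name and email are placeholders):
$ cd "My Repo"                                    # the folder containing hello_world.py
$ git config --global user.name "Your Name"       # identify yourself once per machine
$ git config --global user.email "you@example.com"
$ git init                                        # turn the folder into a repository
$ git add hello_world.py                          # stage the file
$ git commit -m "Add hello world script"          # record the first commit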
Cloning, check-in and committing
"Cloning" in simple English means producing identical individuals either naturally or
artificially. If you are familiar with the term, there is no surprise about what is going to be covered in this tutorial. Before coming to what cloning in Git (Git clone) is, the reader should be well-versed with the process of forking in GitHub.

Sometimes, non-technical people or people who have not yet worked with Git consider these two terms (Git clone and Git fork) to be similar. Actually, they are, but with some differences. It is better to refresh your understanding of forking before learning the concept of cloning in Git.

Also, since the basics of Git and GitHub have already been covered in this course, from
now on we will use both of them to perform the operations and procedures on our
code/files. On the greater circle, this tutorial will make you familiar with:

What is Cloning?
Purpose of Cloning
Importance of Cloning in Git
Cloning a repository and Git Clone command
What is Git Clone or Cloning in Git?
Cloning is a process of creating an identical copy of a Git Remote Repository to the local machine.

Now, you might wonder, that is what we did while forking the repository!!

When we clone a repository, all the files are downloaded to the local machine but the remote git
repository remains unchanged. Making changes and committing them to your local repository (cloned
repository) will not affect the remote repository that you cloned in any way. These changes made on
the local machine can be synced with the remote repository anytime the user wants.
Why Clone a Repository?
As mentioned in the previous sections, cloning downloads the complete source code to the local system. Let's go through the real reasons why cloning is required in the first place:

Contribute to Organizational Projects: A centralized system is required for organizations where multiple people work on the same code base. Cloning helps us achieve this: by cloning, people can edit the project code to either fix an issue or provide a modification, i.e. an extra or extended feature. This definitely helps in producing better software in less time with greater collaboration.
Make use of Open Source Repositories: The famous idiom "Do not reinvent the wheel" fits here. If someone wants to use functionality that has already been developed by someone else, why code it from scratch and waste time and resources? There are countless open-source repositories available which can fit directly into projects.

Since cloning is so important part of the Git and GitHub journey, it is better to see in detail how
cloning works. It is a very simple and straightforward process to which the next section is dedicated.
How does Cloning in Git works?
A lot of people want to set up a shared repository to allow a team of developers to publish their code
on GitHub / GitLab / BitBucket etc. A repository that is uploaded online for collaboration is called
an Upstream Repository or a Central Repository.

A central repository indicates that all the changes from all the contributors pushed into this repository
only. So, this is the most updated repository instance of itself. Sometimes this is often called
the original repository. Now, the image given below is pretty clear about the concept of cloning.

The cloning process works in these steps:
Clone a Repository: The user starts from the upstream repository on GitHub. The user navigated to the repository because they are interested in the project and would like to contribute. The process starts with cloning: they clone the repository onto their local machine. Now they have an exact copy of the project files on their system in which to make changes.
Make the desired changes: After cloning, contributors provide their contribution to the repository, in the form of editing the source files to fix a bug, add functionality, or perhaps optimize the code. But the bottom line is, everything happens on their local system.
Pushing the Changes: Once the changes are done, the modifications can be pushed to the upstream repository.
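A hedged command-line sketch of that three-step flow (URL, file name, and messages are illustrative):
$ git clone https://github.com/<org>/<project>.git   # 1. clone the upstream repository
$ cd <project>
$ git add fixed_file.py                              # 2. stage your local modification
$ git commit -m "Fix reported issue"
$ git push origin master                             # 3. push the changes back to the upstream repository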
Fetch pull and remote
Git Fetch is the command that tells the local repository that there are changes
available in the remote repository without bringing the changes into the local
repository. Git Pull on the other hand brings the copy of the remote directory
changes into the local repository. Let us look at Git Fetch and Git Pull separately
with the help of an example.
Git Fetch
Let us create a file called demo.txt with "Hello Geeks" as its content, initialize the directory as a git repository, and push the changes to a remote repository.
git init
git add <Filename>
git commit -m <Commit Message>
git remote add origin <Link to your remote repository>
git push origin <branch name>
Now, we have my demo.txt in the remote repository.

The local and the remote repositories are now in sync and have the same content at
both places. Let’s now update our demo.txt in the remote repository.
Now since we have updated our demo.txt remotely, let’s bring the changes to our
local repository. Our local repository has only 1 commit while the remote repository
now has 2 commits (observe the second commit starting from 4c4fcb8). Let’s use
the git fetch command to see in the local repository whether we have a change in
the remote repository or not. Before that let’s use the git log command to see our
previous commits.

We can see that after using git fetch we get the information that there is some
commit done in the remote repository. (notice the 4c4fcb8 which is the initials of our
2nd commit in a remote repository). To merge these changes into our local
repository, we need to use the git merge origin/<branch name> command.
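In command form, the sequence just described is roughly as follows (assuming the remote is named origin and the branch is master):
$ git log --oneline                  # local history before fetching
$ git fetch origin                   # download new commits without touching the working copy
$ git log --oneline origin/master    # review what arrived from the remote
$ git merge origin/master            # merge the fetched commits into the current branch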

Let us have a look at our commits in the local repository using the git log command.
And we got our remote repository commit in our local repository. This is how git
fetch works. Let us now have a look at the git pull command.
Git Pull
Let’s make more changes to our demo.txt file at the remote repository.
Now, we have 3 commits at our remote repository whereas 2 commits at our local
repository. (Notice the third commit starting with 09d828f). Let us now bring this
change to our local repository using the git pull origin <branch name> command.
We can see that with the help of just git pull command we directly fetched and
merged our remote repository with the local repository.
git pull = git fetch + git merge
Let us see what our demo.txt in the local repository looks like –
And now our remote and local repositories are again in sync with each other. So,
from the above examples, we can conclude that –
Difference Table

Git Fetch:
Gives information about new changes in the remote repository without merging them into the current branch.
Repository data is updated in the .git directory.
Review of commits and changes can be done immediately.
No possibility of merge conflicts.
Command: git fetch <remote>
Git fetch basically imports the commits to local branches so as to keep up to date with what everybody is working on.

Git Pull:
Brings a copy of all the changes from the remote repository and merges them into the current branch.
The local repository is updated directly.
Updates the changes to the local repository.
Merge conflicts are possible if the remote and the local repositories have changes at the same place.
Command: git pull <remote> <branch>
Git pull basically brings the local branch up to date with the remote copy and also updates the other remote-tracking branches.
Branching
A branch is a version of the repository that diverges from the main working project. It is a feature available in most modern version control systems. A Git project can have more than one branch. Each branch is a pointer to a snapshot of your changes. When you want to add a new feature or fix a bug, you spawn a new branch to encapsulate your changes. This makes it harder for unstable code to get merged into the main code base, and it also lets you clean up your history before merging with the main branch.
Git Master Branch
The master branch is the default branch in Git. It is created when the first commit is made on the project; the master branch then points to that starting commit. As you keep committing, the master branch pointer automatically moves forward. A repository can have only one master branch.
The master branch is the branch into which all the changes eventually get merged back. It can be called the official working version of your project.
Operations on Branches
We can perform various operations on Git branches. The git branch command allows you
to create, list, rename and delete branches. Many operations on branches are applied by git
checkout and git merge command. So, the git branch is tightly integrated with the git
checkout and git merge commands.
The Operations that can be performed on a branch:
You can create a new branch with the help of the git branch command. This command will be used as:
Syntax:
$ git branch <branch name>
Output:
This command will create the branch B1 locally in Git directory.

You can List all of the available branches in your repository by using the following command.
Either we can use git branch - list or git branch command to list the available branches in
the repository.
Syntax:
$ git branch --list
or
$ git branch
Output:

Here, both commands list the available branches in the repository. The symbol * represents the currently active branch.

You can delete the specified branch. It is a safe operation. In this command, Git prevents you
from deleting the branch if it has unmerged changes. Below is the command to do this.
Syntax:
$ git branch -d <branch name>
Output:

This command will delete the existing branch B1 from the repository.
The git branch -d command can be used in two formats. The other format of this command is git branch -D. The 'git branch -D' command deletes the specified branch even if it has unmerged changes.
$ git branch -D <branch name>
You can also delete a remote branch. The below command is used to delete a remote branch:
Syntax:
$ git push origin --delete <branch name>
Output:

As you can see in the above output, the remote branch named branch2 from my GitHub
account is deleted.

Git allows you to switch between the branches without making a commit. You can switch
between two branches with the git checkout command. To switch between the branches,
below command is used:
$ git checkout <branch name>
Switch from master Branch
You can switch from master to any other branch available on your repository without making
any commit.
Syntax:
$ git checkout <branch name>
Output:

As you can see in the output, branches are switched from master to branch4 without
making any commit.
Switch to master branch
You can switch to the master branch from any other branch with the help of below
command.
Syntax:
$ git checkout master
Output:
As you can see in the above output, branches are switched from branch1 to master without
making any commit.

We can rename the branch with the help of the git branch command. To rename a branch,
use the below command:
Syntax:
$ git branch -m <old branch name> <new branch name>
Output:

As you can see in the above output, branch4 renamed as renamedB1.

Git allows you to merge the other branch with the currently active branch. You can merge
two branches with the help of git merge command. Below command is used to merge the
branches:
Syntax:
$ git merge <branch name>
Output:

From the above output, you can see that the master branch was merged with renamedB1. Since no new commits were made before merging, the output shows "Already up to date".
Creating the Branches, switching the branches, merging
Basic Branching and Merging
Let’s go through a simple example of branching and merging with a workflow that you might
use in the real world. You’ll follow these steps:
1. Do some work on a website.
2. Create a branch for a new user story you’re working on.
3. Do some work in that branch.
At this stage, you’ll receive a call that another issue is critical and you need a hotfix. You’ll do
the following:
1. Switch to your production branch.
2. Create a branch to add the hotfix.
3. After it’s tested, merge the hotfix branch, and push to production.
4. Switch back to your original user story and continue working.
Basic Branching
First, let’s say you’re working on your project and have a couple of commits already on
the master branch.
Figure 18. A simple commit history
You’ve decided that you’re going to work on issue #53 in whatever issue-tracking system
your company uses. To create a new branch and switch to it at the same time, you can run
the git checkout command with the -b switch:
$ git checkout -b iss53
Switched to a new branch "iss53"
This is shorthand for:
$ git branch iss53
$ git checkout iss53
Figure 19. Creating a new branch pointer
You work on your website and do some commits. Doing so moves the iss53 branch
forward, because you have it checked out (that is, your HEAD is pointing to it):
$ vim index.html
$ git commit -a -m 'Create new footer [issue 53]'

Figure 20. The iss53 branch has moved forward with your work
Now you get the call that there is an issue with the website, and you need to fix it
immediately. With Git, you don’t have to deploy your fix along with the iss53 changes
you’ve made, and you don’t have to put a lot of effort into reverting those changes before you
can work on applying your fix to what is in production. All you have to do is switch back to
your master branch.
However, before you do that, note that if your working directory or staging area has
uncommitted changes that conflict with the branch you’re checking out, Git won’t let you
switch branches. It’s best to have a clean working state when you switch branches. There
are ways to get around this (namely, stashing and commit amending) that we’ll cover later
on, in Stashing and Cleaning. For now, let’s assume you’ve committed all your changes, so
you can switch back to your master branch:
$ git checkout master
Switched to branch 'master'
At this point, your project working directory is exactly the way it was before you started
working on issue #53, and you can concentrate on your hotfix. This is an important point to
remember: when you switch branches, Git resets your working directory to look like it did the
last time you committed on that branch. It adds, removes, and modifies files automatically to
make sure your working copy is what the branch looked like on your last commit to it.
Next, you have a hotfix to make. Let’s create a hotfix branch on which to work until it’s
completed:
$ git checkout -b hotfix
Switched to a new branch 'hotfix'
$ vim index.html
$ git commit -a -m 'Fix broken email address'
[hotfix 1fb7853] Fix broken email address
1 file changed, 2 insertions(+)
Figure 21. Hotfix branch based on master
You can run your tests, make sure the hotfix is what you want, and finally merge
the hotfix branch back into your master branch to deploy to production. You do this with
the git merge command:
$ git checkout master
$ git merge hotfix
Updating f42c576..3a0874c
Fast-forward
index.html | 2 ++
1 file changed, 2 insertions(+)
You’ll notice the phrase ―fast-forward‖ in that merge. Because the commit C4 pointed to by
the branch hotfix you merged in was directly ahead of the commit C2 you’re on, Git simply
moves the pointer forward. To phrase that another way, when you try to merge one commit
with a commit that can be reached by following the first commit’s history, Git simplifies things
by moving the pointer forward because there is no divergent work to merge together — this is
called a ―fast-forward.‖
Your change is now in the snapshot of the commit pointed to by the master branch, and
you can deploy the fix.
Figure 22. master is fast-forwarded to hotfix
After your super-important fix is deployed, you’re ready to switch back to the work you were
doing before you were interrupted. However, first you’ll delete the hotfix branch, because
you no longer need it — the master branch points at the same place. You can delete it with
the -d option to git branch:
$ git branch -d hotfix
Deleted branch hotfix (3a0874c).
Now you can switch back to your work-in-progress branch on issue #53 and continue
working on it.
$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'Finish the new footer [issue 53]'
[iss53 ad82d7a] Finish the new footer [issue 53]
1 file changed, 1 insertion(+)
Figure 23. Work continues on iss53
It’s worth noting here that the work you did in your hotfix branch is not contained in the
files in your iss53 branch. If you need to pull it in, you can merge your master branch into
your iss53 branch by running git merge master, or you can wait to integrate those
changes until you decide to pull the iss53 branch back into master later.
Basic Merging
Suppose you’ve decided that your issue #53 work is complete and ready to be merged into
your master branch. In order to do that, you’ll merge your iss53 branch into master,
much like you merged your hotfix branch earlier. All you have to do is check out the
branch you wish to merge into and then run the git merge command:
$ git checkout master
Switched to branch 'master'
$ git merge iss53
Merge made by the 'recursive' strategy.
index.html | 1 +
1 file changed, 1 insertion(+)
This looks a bit different than the hotfix merge you did earlier. In this case, your
development history has diverged from some older point. Because the commit on the branch
you’re on isn’t a direct ancestor of the branch you’re merging in, Git has to do some work. In
this case, Git does a simple three-way merge, using the two snapshots pointed to by the
branch tips and the common ancestor of the two.
Figure 24. Three snapshots used in a typical merge
Instead of just moving the branch pointer forward, Git creates a new snapshot that results
from this three-way merge and automatically creates a new commit that points to it. This is
referred to as a merge commit, and is special in that it has more than one parent.
Figure 25. A merge commit
Now that your work is merged in, you have no further need for the iss53 branch. You can
close the issue in your issue-tracking system, and delete the branch:
$ git branch -d iss53

The branches.
CHAPTER 3 : Chef for configuration management
Overview of Chef; Common Chef Terminology (Server, Workstation, Client, Repository Etc.), Servers and Nodes, Chef Configuration Concepts
Chef is a configuration management DevOps tool that manages infrastructure by writing code rather than using a manual process, so that it can be automated, tested, and deployed very easily. Chef has a client-server architecture and supports multiple platforms like Windows, Ubuntu, CentOS, and Solaris. It can also be integrated with cloud platforms like AWS, Google Cloud Platform, and OpenStack. Before getting into Chef deeply, let us understand configuration management.
Configuration Management

Let us take an example: suppose you are a system engineer in an organization and you want to deploy or update software or an operating system on more than a hundred systems in your organization in one day. This can be done manually, but it may cause multiple errors, some software may crash while updating, and we won't be able to revert back to the previous version. To solve such kinds of issues we use configuration management.
Configuration management keeps track of all the software- and hardware-related information of an organization, and it also repairs, deploys, and updates the entire application with its automated procedures. Configuration management does the work of multiple system administrators and developers who manage hundreds of servers and applications. Some tools used for configuration management are Chef, Puppet, Ansible, CFEngine, SaltStack, etc.
Let us take a scenario: suppose you have shifted your office into a different environment and you want your system administrator to install, update, and deploy software on hundreds of systems overnight. When the system engineer does this task manually it may cause human errors, and some software may not function properly. At this stage, we use Chef, which is a powerful automation tool that transforms infrastructure into code.
Chef automates application configuration, deployment, and management throughout the network, even if we are operating in the cloud or in a hybrid environment. We can use Chef to speed up application deployment. Chef is a great tool for accelerating software delivery; the speed of software delivery refers to how quickly the software is able to change in response to new requirements or conditions.
Benefits of Chef
Accelerating software delivery: when your infrastructure is automated, all the software requirements like testing and creating new environments for software deployment become faster.
Increased service resiliency: by making the infrastructure automated, it monitors for bugs and errors before they occur and can also recover from errors more quickly.
Risk management: Chef lowers risk and improves compliance at all stages of deployment. It reduces conflicts between the development and production environments.
Cloud adoption: Chef can be easily adapted to a cloud environment, and the servers and infrastructure can be easily configured, installed, and managed automatically by Chef.
Managing data centers and cloud environments: as discussed earlier, Chef can run on different platforms; under Chef you can manage all your cloud and on-premise platforms, including servers.
Streamlined IT operations and workflow: Chef provides a pipeline for continuous deployment, starting from building to testing and all the way through delivery, monitoring, and troubleshooting.
Features of Chef
Easily manage hundreds of servers with a handful of employees.
It runs on operating systems such as Linux, Windows, and FreeBSD.
It maintains a blueprint of the entire infrastructure.
It integrates with all major cloud service providers.
Centralized management, i.e., a single Chef server can be used as the center for deploying the policies.
Pros of Chef
One of the most flexible solutions for OS and middleware management.
Designed for programmers.
Chef offers hybrid and SaaS solutions for Chef Servers
Sequential execution order
Very stable, reliable and mature, especially for large deployments in both public and
private environments.
Cons of Chef
Requires steep learning curve
Initial setup is complicated.
Lacks push, so no immediate actions on change. The pull process follows a specified
schedule.
How Chef Works?
Chef basically consists of three components: the Chef server, workstations, and nodes. The Chef server is the central hub of all operations, where changes are stored. The workstation is the place where all the code is created or changed. Nodes are the machines that are managed by Chef.
The user interacts with Chef and the Chef server through the Chef workstation. Knife and the Chef command-line tools are used for interacting with the Chef server. A Chef node is a virtual or cloud machine managed by Chef, and each node is configured by the Chef-Client installed on it. The Chef server stores all parts of the configuration and ensures that all the elements are in the right place and working as expected.
Chef Components
Chef has major components such as Workstation, Cookbook, Node, Chef-Client, and Chef-
Server. Let us see the entire major component in detail.
Chef Server
Chef server contains all configuration data and it stores cookbooks, recipes, and metadata
that describe each node in the Chef-Client. Configuration details are given to node through
Chef-Client. Any changes made must pass through the Chef server to be deployed. Prior to
pushing the changes, it verifies that the nodes and workstation are paired with the server
through the use of authorization keys, and then allow for communication between
workstations and nodes.
Workstation
The workstation is used to interact with the Chef server and with Chef nodes. It is
also used to create cookbooks. The workstation is the place where all the interaction takes place:
cookbooks are created, tested, and deployed, and code is tested. The workstation is also used for
defining roles and environments based on the development and production environments. Some
components of the workstation are:
Development Kit (ChefDK) contains all the packages required for using Chef.
Chef command-line tool is where cookbooks are created, tested, and deployed, and through
which policies are uploaded to the Chef server.
Knife is used for interacting with Chef nodes.
Test Kitchen is for validating Chef code.
Chef-Repo is a repository in which cookbooks are created, tested, and maintained through the
Chef command-line tool.
Cookbooks
Cookbooks are created using the Ruby language, and a domain-specific language (DSL) is used for
specific resources. A cookbook contains recipes, which specify the resources to be used and the
order in which they are to be used. The cookbook contains all the details regarding the work, and
it changes the configuration of the Chef node.
Attributes are used for overriding default settings on a node.
Files are for transferring files from a sub-directory to a specific path on the chef-client.
Libraries are written in Ruby and are used for configuring custom resources and recipes.
Metadata contains information for deploying the cookbooks to each node.
Recipes are configuration elements that are stored in a cookbook. Recipes can also be
included in other recipes and executed based on the run-list. Recipes are written in the
Ruby language.
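For illustration only, a minimal recipe sketch written in the Chef Ruby DSL; the cookbook name (apache) and the package, template, and service names are assumptions rather than part of the original text:
# cookbooks/apache/recipes/default.rb
# Install the web server, manage its default site file, and keep the service running.
package 'apache2'

template '/etc/apache2/sites-available/000-default.conf' do
  source 'default-site.erb'   # template file assumed to exist in the cookbook
  owner 'root'
  group 'root'
  mode '0644'
  notifies :reload, 'service[apache2]'
end

service 'apache2' do
  action [:enable, :start]
end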
Nodes
Nodes are managed by Chef, and each node is configured by installing the Chef-Client on it. Chef
nodes can be physical machines, virtual machines, cloud instances, and so on.
Chef-Client is responsible for registering and authenticating the node, building node objects,
and configuring the node. The Chef-Client runs locally on every node to configure that node.
Ohai is used for determining the system state at the beginning of a Chef run in the Chef-Client.
It collects all the system configuration data.
Roles of Chef in DevOps
Chef is used for automating and managing the infrastructure. Chef IT automation can be done
using various Chef DevOps products such as Chef server and Chef client. Chef DevOps is a tool for
accelerating application delivery and DevOps collaboration. Chef helps solve these problems
by treating infrastructure as code: rather than manually changing anything, the machine
setup is described in a Chef recipe.
Conclusion
Chef is a powerful configuration management tool in DevOps, and it has strong features that keep
it among the best in the market. Chef keeps improving its features and delivering good results to
its customers. Chef is used by the world's leading IT organizations such as Facebook, AWS, and
HP Public Cloud, and job opportunities for Chef automation specialists are increasing day by day.
Workstation Setup: How to configure knife; execute some commands
to test the connection between knife and the workstation.
Chef follows the concept of client-server architecture, hence in order to start working with
Chef one needs to set up Chef on the workstation and develop the configuration locally.
Later it can be uploaded to the Chef server so that it takes effect on the Chef nodes that
need to be configured.
Opscode provides a fully packaged version, which does not have any external prerequisites.
This fully packaged Chef is called the omnibus installer.

On Windows Machine
Step 1 − Download the setup .msi file of chefDK on the machine.
Step 2 − Follow the installation steps and install it on the target location.
ChefDK Path Variable − once installed, the ChefDK directories appear in the PATH variable,
which can be checked with:
$ echo $PATH
/c/opscode/chef/bin:/c/opscode/chefdk/bin:
On Linux Machine
In order to set up on the Linux machine, we need to first get curl on the machine.
Step 1 − Once curl is installed on the machine, we need to install Chef on the workstation
using Opscode’s omnibus Chef installer.
$ curl -L https://www.opscode.com/chef/install.sh | sudo bash
Step 2 − Install Ruby on the machine.
Step 3 − Add Ruby to path variable.
$ echo 'export PATH="/opt/chef/embedded/bin:$PATH"' >> ~/.bash_profile &&
source ~/.bash_profile
The Omnibus Chef will install Ruby and all the required Ruby gems
into /opt/chef/embedded by adding /opt/chef/embedded/bin directory to the .bash_profile
file.
If Ruby is already installed, then install the Chef Ruby gem on the machine by running the
following command.
$ gem install chef
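As a quick, optional check (not part of the original steps), the installation can be verified by printing the installed versions; the chef command comes with the ChefDK/omnibus install, and the exact version numbers will vary:
$ chef --version
$ knife --version
$ chef-client --version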
Knife is Chef’s command-line tool to interact with the Chef server. One uses it for uploading
cookbooks and managing other aspects of Chef. It provides an interface between the chefDK
(Repo) on the local machine and the Chef server. It helps in managing −
Chef nodes
Cookbook
Recipe
Environments
Cloud Resources
Cloud Provisioning
Installation of Chef client on Chef nodes
Knife provides a set of commands to manage Chef infrastructure.
Bootstrap Commands
knife bootstrap [SSH_USER@]FQDN (options)
Client Commands
knife client bulk delete REGEX (options)
knife client create CLIENTNAME (options)
knife client delete CLIENT (options)
knife client edit CLIENT (options)
knife client key delete CLIENT KEYNAME (options)
knife client key edit CLIENT KEYNAME (options)
knife client key list CLIENT (options)
knife client key show CLIENT KEYNAME (options)
knife client list (options)
knife client reregister CLIENT (options)
knife client show CLIENT (options)
Configure Commands
knife configure (options)
knife configure client DIRECTORY
Cookbook Commands
knife cookbook bulk delete REGEX (options)
knife cookbook create COOKBOOK (options)
knife cookbook delete COOKBOOK VERSION (options)
knife cookbook download COOKBOOK [VERSION] (options)
knife cookbook list (options)
knife cookbook metadata COOKBOOK (options)
knife cookbook metadata from FILE (options)
knife cookbook show COOKBOOK [VERSION] [PART] [FILENAME] (options)
knife cookbook test [COOKBOOKS...] (options)
knife cookbook upload [COOKBOOKS...] (options)
Cookbook Site Commands
knife cookbook site download COOKBOOK [VERSION] (options)
knife cookbook site install COOKBOOK [VERSION] (options)
knife cookbook site list (options)
knife cookbook site search QUERY (options)
knife cookbook site share COOKBOOK [CATEGORY] (options)
knife cookbook site show COOKBOOK [VERSION] (options)
knife cookbook site unshare COOKBOOK
Data Bag Commands
knife data bag create BAG [ITEM] (options)
knife data bag delete BAG [ITEM] (options)
knife data bag edit BAG ITEM (options)
knife data bag from file BAG FILE|FOLDER [FILE|FOLDER..] (options)
knife data bag list (options)
knife data bag show BAG [ITEM] (options)
Environment Commands
knife environment compare [ENVIRONMENT..] (options)
knife environment create ENVIRONMENT (options)
knife environment delete ENVIRONMENT (options)
knife environment edit ENVIRONMENT (options)
knife environment from file FILE [FILE..] (options)
knife environment list (options)
knife environment show ENVIRONMENT (options)
Exec Commands
knife exec [SCRIPT] (options)
Help Commands
knife help [list|TOPIC]
Index Commands
knife index rebuild (options)
Node Commands
knife node bulk delete REGEX (options)
knife node create NODE (options)
knife node delete NODE (options)
knife node edit NODE (options)
knife node environment set NODE ENVIRONMENT
knife node from file FILE (options)
knife node list (options)
knife node run_list add [NODE] [ENTRY[,ENTRY]] (options)
knife node run_list remove [NODE] [ENTRY[,ENTRY]] (options)
knife node run_list set NODE ENTRIES (options)
knife node show NODE (options)
OSC Commands
knife osc_user create USER (options)
knife osc_user delete USER (options)
knife osc_user edit USER (options)
knife osc_user list (options)
knife osc_user reregister USER (options)
knife osc_user show USER (options)
Path-Based Commands
knife delete [PATTERN1 ... PATTERNn]
knife deps PATTERN1 [PATTERNn]
knife diff PATTERNS
knife download PATTERNS
knife edit [PATTERN1 ... PATTERNn]
knife list [-dfR1p] [PATTERN1 ... PATTERNn]
knife show [PATTERN1 ... PATTERNn]
knife upload PATTERNS
knife xargs [COMMAND]
Raw Commands
knife raw REQUEST_PATH
Recipe Commands
knife recipe list [PATTERN]
Role Commands
knife role bulk delete REGEX (options)
knife role create ROLE (options)
knife role delete ROLE (options)
knife role edit ROLE (options)
knife role env_run_list add [ROLE] [ENVIRONMENT] [ENTRY[,ENTRY]] (options)
knife role env_run_list clear [ROLE] [ENVIRONMENT]
knife role env_run_list remove [ROLE] [ENVIRONMENT] [ENTRIES]
knife role env_run_list replace [ROLE] [ENVIRONMENT] [OLD_ENTRY]
[NEW_ENTRY]
knife role env_run_list set [ROLE] [ENVIRONMENT] [ENTRIES]
knife role from file FILE [FILE..] (options)
knife role list (options)
knife role run_list add [ROLE] [ENTRY[,ENTRY]] (options)
knife role run_list clear [ROLE]
knife role run_list remove [ROLE] [ENTRY]
knife role run_list replace [ROLE] [OLD_ENTRY] [NEW_ENTRY]
knife role run_list set [ROLE] [ENTRIES]
knife role show ROLE (options)
Serve Commands
knife serve (options)
SSH Commands
knife ssh QUERY COMMAND (options)
SSL Commands
knife ssl check [URL] (options)
knife ssl fetch [URL] (options)
Status Commands
knife status QUERY (options)
Tag Commands
knife tag create NODE TAG ...
knife tag delete NODE TAG ...
knife tag list NODE
User Commands
knife user create USERNAME DISPLAY_NAME FIRST_NAME LAST_NAME EMAIL
PASSWORD (options)
knife user delete USER (options)
knife user edit USER (options)
knife user key create USER (options)
knife user key delete USER KEYNAME (options)
knife user key edit USER KEYNAME (options)
knife user key list USER (options)
knife user key show USER KEYNAME (options)
knife user list (options)
knife user reregister USER (options)
knife user show USER (options)
Knife Setup
In order to set up knife, one needs to move to the .chef directory and create a knife.rb inside the
chef-repo, which tells knife about the configuration details. This will have a couple of details.
current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name 'node_name'
client_key "#{current_dir}/USER.pem"
validation_client_name 'ORG_NAME-validator'
validation_key "#{current_dir}/ORGANIZATION-validator.pem"
chef_server_url 'https://api.chef.io/organizations/ORG_NAME'
cache_type 'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
cookbook_path ["#{current_dir}/../cookbooks"]
In the above code, we are using the hosted Chef server, which uses the following two keys.
validation_client_name 'ORG_NAME-validator'
validation_key "#{current_dir}/ORGANIZATION-validator.pem"
Here, knife.rb tells knife which organization to use and where to find the user's private key.
client_key "#{current_dir}/USER.pem"
The following line of code tells knife we are using the hosted server.
chef_server_url 'https://api.chef.io/organizations/ORG_NAME'
Using the knife.rb file, knife can now connect to your organization's hosted Opscode (Chef) server.
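To test the connection between knife on the workstation and the Chef server, a few commonly used commands are shown below (an illustrative sketch; the output depends on your organization):
$ knife ssl check     # verify the Chef server's SSL certificate
$ knife client list   # list the clients registered with the Chef server
$ knife node list     # list the nodes registered with the Chef server
If these commands return results without authentication errors, knife and the workstation are correctly configured to communicate with the Chef server.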
Organization Setup: Create organization; Add yourself and node to organization.
The Chef Infra Server uses role-based access control (RBAC) to restrict access to objects—
nodes, environments, roles, data bags, cookbooks, and so on. This ensures that only
authorized user and/or Chef Infra Client requests to the Chef Infra Server are allowed.
Access to objects on the Chef Infra Server is fine-grained, allowing access to be defined by
object type, object, group, user, and organization. The Chef Infra Server uses permissions
to define how a user may interact with an object, after they have been authorized to do so.

The Chef Infra Server uses organizations, groups, and users to define role-based access control:

Organization: An organization is the top-level entity for role-based access control in the Chef
Infra Server. Each organization contains the default groups (admins, clients, and users, plus
billing_admins for the hosted Chef Infra Server), at least one user and at least one node (on
which the Chef Infra Client is installed). The Chef Infra Server supports multiple organizations.
The Chef Infra Server includes a single default organization that is defined during setup.
Additional organizations can be created after the initial setup and configuration of the Chef
Infra Server.
Group: A group is used to define access to object types and objects in the Chef Infra Server and
also to assign permissions that determine what types of tasks are available to members of that
group who are authorized to perform them. Groups are configured by organization. Individual users
who are members of a group will inherit the permissions assigned to the group. The Chef Infra
Server includes the following default groups: admins, clients, and users. For users of the hosted
Chef Infra Server, an additional default group is provided: billing_admins.
User: A user is any non-administrator human being who will manage data that is uploaded to the
Chef Infra Server from a workstation or who will log on to the Chef management console web user
interface. The Chef Infra Server includes a single default user that is defined during setup and
is automatically assigned to the admins group.
Client: A client is an actor that has permission to access the Chef Infra Server. A client is
most often a node (on which the Chef Infra Client runs), but is also a workstation (on which knife
runs), or some other machine that is configured to use the Chef Infra Server API. Each request to
the Chef Infra Server that is made by a client uses a private key for authentication that must be
authorized by the public key on the Chef Infra Server.

When a user makes a request to the Chef Infra Server using the Chef Infra Server API,
permission to perform that action is determined by the following process:
1. Check if the user has permission to the object type
2. If no, recursively check if the user is a member of a security group that has
permission to that object
3. If yes, allow the user to perform the action

Permissions are managed using the Chef management console add-on in the Chef Infra
Server web user interface.

Organizations
A single instance of the Chef Infra Server can support many organizations. Each
organization has a unique set of groups and users. Each organization manages a unique set
of nodes, on which a Chef Infra Client is installed and configured so that it may interact with
a single organization on the Chef Infra Server.

A user may belong to multiple organizations under the following conditions:

Role-based access control is configured for each organization
For a single user to interact with the Chef Infra Server using knife from the same
chef-repo, that user may need to edit their config.rb file before that interaction

Using multiple organizations within the Chef Infra Server ensures that the same toolset,
coding patterns and practices, physical hardware, and product support effort is being
applied across the entire company, even when:

Multiple product groups must be supported—each product group can have its own
security requirements, schedule, and goals
Updates occur on different schedules—the nodes in one organization are managed
completely independently from the nodes in another
Individual teams have competing needs for object and object types —data bags,
environments, roles, and cookbooks are unique to each organization, even if they
share the same name
Permissions
Permissions are used in the Chef Infra Server to define how users and groups can interact
with objects on the server. Permissions are configured for each organization.

Object Permissions
The Chef Infra Server includes the following object permissions:

Delete: Use the Delete permission to define which users and groups may delete an object. This
permission is required for any user who uses the knife [object] delete [object_name] argument to
interact with objects on the Chef Infra Server.
Grant: Use the Grant permission to define which users and groups may configure permissions on an
object. This permission is required for any user who configures permissions using the
Administration tab in the Chef management console.
Read: Use the Read permission to define which users and groups may view the details of an object.
This permission is required for any user who uses the knife [object] show [object_name] argument
to interact with objects on the Chef Infra Server.
Update: Use the Update permission to define which users and groups may edit the details of an
object. This permission is required for any user who uses the knife [object] edit [object_name]
argument to interact with objects on the Chef Infra Server and for any Chef Infra Client to save
node data to the Chef Infra Server at the conclusion of a Chef Infra Client run.
Global Permissions

The Chef Infra Server includes the following global permissions:

Create: Use the Create global permission to define which users and groups may create the
following server object types: cookbooks, data bags, environments, nodes, roles, and tags. This
permission is required for any user who uses the knife [object] create argument to interact with
objects on the Chef Infra Server.
List: Use the List global permission to define which users and groups may view the following
server object types: cookbooks, data bags, environments, nodes, roles, and tags. This permission
is required for any user who uses the knife [object] list argument to interact with objects on
the Chef Infra Server.

These permissions set the default permissions for the following Chef Infra Server object
types: clients, cookbooks, data bags, environments, groups, nodes, roles, and sandboxes.

Client Key Permissions

Note

This is only necessary after migrating a client from one Chef Infra Server to
another. Permissions must be reset for client keys after the migration.

Keys should have DELETE, GRANT, READ and UPDATE permissions.

Use the following code to set the correct permissions:
#!/usr/bin/env ruby
require 'chef/knife'

# previously knife.rb
Chef::Config.from_file(File.join(Chef::Knife.chef_config_dir, 'knife.rb'))

rest = Chef::ServerAPI.new(Chef::Config[:chef_server_url])

Chef::Node.list.each do |node|
  %w(read update delete grant).each do |perm|
    ace = rest.get("nodes/#{node[0]}/_acl")[perm]
    ace['actors'] << node[0] unless ace['actors'].include?(node[0])
    rest.put("nodes/#{node[0]}/_acl/#{perm}", perm => ace)
    puts "Client \"#{node[0]}\" granted \"#{perm}\" access on node \"#{node[0]}\""
  end
end

Save it as a Ruby script (chef_server_permissions.rb, for example) in
the .chef/scripts directory located in the chef-repo, and then run a knife command similar
to:
knife exec chef_server_permissions.rb
Knife ACL
The knife plugin knife-acl provides a fine-grained approach to modifying permissions, by
wrapping API calls to the _acl endpoint and makes such permission changes easier to
manage.

Warning

Chef Manage is deprecated and no longer under active development. It is supported on Chef
Automate installations up to version 1.8 and replaced by Chef Automate 2.0. Contact your Chef
account representative for information about upgrading your system. See the Automate
documentation to learn more about Chef Automate 2.

knife-acl and the Chef Manage browser interface are incompatible. After engaging knife-acl,
you will need to discontinue using the Chef Manage browser interface from that point forward
due to possible incompatibilities.

Groups
The Chef Infra Server includes the following default groups:

admins: The admins group defines the list of users who have administrative rights to all objects
and object types for a single organization.
billing_admins: The billing_admins group defines the list of users who have permission to manage
billing information. This permission exists only for the hosted Chef Infra Server.
clients: The clients group defines the list of nodes on which a Chef Infra Client is installed
and under management by Chef. In general, think of this permission as "all of the non-human
actors---Chef Infra Client, in almost every case---that get data from, and/or upload data to, the
Chef server". Newly-created Chef Infra Client instances are added to this group automatically.
public_key_read_access: The public_key_read_access group defines which users and clients have
read permissions to key-related endpoints in the Chef Infra Server API.
users: The users group defines the list of users who use knife and the Chef management console
to interact with objects and object types. In general, think of this permission as "all of the
non-admin human actors who work with data that is uploaded to and/or downloaded from the Chef
server".
Example Default Permissions
The following sections show the default permissions assigned by the Chef Infra Server to
the admins , billing_admins , clients , and users groups.
Note

The creator of an object on the Chef Infra Server is
assigned create, delete, grant, read, and update permission to that object.
admins
The admins group is assigned the following:

Group Create Delete Grant Read Update


admins yes yes yes yes yes
clients yes yes yes yes yes
users yes yes yes yes yes
billing_admins
The billing_admins group is assigned the following:
Group Create Delete Read Update


billing_admins no no yes yes
clients
The clients group is assigned the following:

Object Create Delete Read Update


clients no no no no
cookbooks no no yes no
cookbook_artifacts no no yes no
data no no yes no
environments no no yes no
nodes yes no yes no
organization no no yes no
policies no no yes no
policy_groups no no yes no
roles no no yes no
sandboxes no no no no
public_key_read_access
The public_key_read_access group controls which users and clients have read permissions
to the following endpoints:

GET /clients/CLIENT/keys
GET /clients/CLIENT/keys/KEY
GET /users/USER/keys
GET /users/USER/keys/

By default, the public_key_read_access assigns all members of the users and clients group
permission to these endpoints:

Group Create Delete Grant Read Update


admins no no no no no
clients yes yes yes yes yes
users yes yes yes yes yes
users
The users group is assigned the following:
Object Create Delete Read Update


clients no yes yes no
cookbooks yes yes yes yes
cookbook_artifacts yes yes yes yes
data yes yes yes yes
environments yes yes yes yes
nodes yes yes yes yes
organization no no yes no
policies yes yes yes yes
policy_groups yes yes yes yes
roles yes yes yes yes
sandboxes yes no no no
chef-validator
Every request made by Chef Infra Client to the Chef Infra Server must be an authenticated
request using the Chef Infra Server API and a private key. When Chef Infra Client makes a
request to the Chef Infra Server, Chef Infra Client authenticates each request using a
private key located in /etc/chef/client.pem .

The chef-validator is allowed to do the following at the start of a Chef Infra Client run. After
the Chef Infra Client is registered with Chef Infra Server, that Chef Infra Client is added to
the clients group:

Object Create Delete Read Update


clients yes no no no
Server Admins
The server-admins group is a global group that grants its members permission to create,
read, update, and delete user accounts, with the exception of superuser accounts.
The server-admins group is useful for users who are responsible for day-to-day
administration of the Chef Infra Server, especially user management using the knife
user subcommand. Before members can be added to the server-admins group, they must
already have a user account on the Chef Infra Server.

Scenario
The following user accounts exist on the Chef Infra Server: pivotal (a superuser
account), alice, bob, carol, and dan. Run the following command to view a list of users on
the Chef Infra Server:

chef-server-ctl user-list
and it returns the same list of users:

pivotal
alice
bob
carol
dan

Alice is a member of the IT team whose responsibilities include day-to-day administration of
the Chef Infra Server, in particular managing the user accounts on the Chef Infra Server that
are used by the rest of the organization. From a workstation, Alice runs the following
command:
knife user list -c ~/.chef/alice.rb

and it returns the following error:

ERROR: You authenticated successfully to <chef_server_url> as alice
but you are not authorized for this action
Response: Missing read permission

Alice is not a superuser and does not have permissions on other users because user
accounts are global to organizations in the Chef Infra Server. Let's add Alice to the
server-admins group:
chef-server-ctl grant-server-admin-permissions alice

and it returns the following response:

User alice was added to server-admins.

Alice can now create, read, update, and delete user accounts on the Chef Infra Server, even
for organizations to which Alice is not a member. From a workstation, Alice re-runs the
following command:

knife user list -c ~/.chef/alice.rb

which now returns:

pivotal
alice
bob
carol
dan

Alice is now a server administrator and can use the following knife subcommands to
manage users on the Chef Infra Server:

knife user-create
knife user-delete
knife user-edit
knife user-list
knife user-show

For example, Alice runs the following command:

knife user edit carol -c ~/.chef/alice.rb

and the $EDITOR opens in which Alice makes changes, and then saves them.

Superuser Accounts
Superuser accounts may not be managed by users who belong to the server-admins group.
For example, Alice attempts to delete the pivotal superuser account:

knife user delete pivotal -c ~/.chef/alice.rb

and the following error is returned:

ERROR: You authenticated successfully to <chef_server_url> as user1
but you are not authorized for this action
Response: Missing read permission

Alice’s action is unauthorized even with membership in the server-admins group.

Manage server-admins Group
Membership of the server-admins group is managed with a set of chef-server-ctl subcommands:

chef-server-ctl grant-server-admin-permissions
chef-server-ctl list-server-admins
chef-server-ctl remove-server-admin-permissions
Add Members
The grant-server-admin-permissions subcommand is used to add a user to the server-
admins group. Run the command once for each user added.

This subcommand has the following syntax:

chef-server-ctl grant-server-admin-permissions USER_NAME

where USER_NAME is the user to add to the list of server administrators.

For example:

chef-server-ctl grant-server-admin-permissions bob
returns:

User bob was added to server-admins. This user can now list,
read, and create users (even for orgs they are not members of)
for this Chef Infra Server.
Remove Members
The remove-server-admin-permissions subcommand is used to remove a user from
the server-admins group. Run the command once for each user removed.

This subcommand has the following syntax:

chef-server-ctl remove-server-admin-permissions USER_NAME

where USER_NAME is the user to remove from the list of server administrators.

For example:

chef-server-ctl remove-server-admin-permissions bob

returns:

User bob was removed from server-admins. This user can no longer
list, read, and create users for this Chef Infra Server except for where
they have default permissions (such as within an org).

List Membership
The list-server-admins subcommand is used to return a list of users who are members of
the server-admins group.

This subcommand has the following syntax:

chef-server-ctl list-server-admins

and will return a list of users similar to:

pivotal
alice
bob
carol
dan

Manage Organizations
Use the org-create, org-delete, org-list, org-show, org-user-add and org-user-remove
commands to manage organizations.
org-create
The org-create subcommand is used to create an organization. (The validation key for the
organization is returned to STDOUT when creating an organization with this command.)

Syntax
This subcommand has the following syntax:

chef-server-ctl org-create ORG_NAME "ORG_FULL_NAME" (options)

where:

The name must begin with a lower-case letter or digit, may only contain lower-case
letters, digits, hyphens, and underscores, and must be between 1 and 255
characters. For example: chef .
The full name must begin with a non-white space character and must be between 1
and 1023 characters. For example: "Chef Software, Inc." .
Options
This subcommand has the following options:

-a USER_NAME, --association_user USER_NAME

Associate a user with an organization and add them to the admins and billing_admins
security groups.

-f FILE_NAME, --filename FILE_NAME

Write the ORGANIZATION-validator.pem to FILE_NAME instead of printing it to STDOUT.
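As a hypothetical example (the organization name devorg, its full name, the user alice, and the file path are assumptions), the following creates an organization, associates a user with it, and writes the validator key to a file:
chef-server-ctl org-create devorg "DevOps Organization" --association_user alice --filename /tmp/devorg-validator.pem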

org-delete
The org-delete subcommand is used to delete an organization.

Syntax
This subcommand has the following syntax:

chef-server-ctl org-delete ORG_NAME
org-list
The org-list subcommand is used to list all of the organizations currently present on the
Chef Infra Server.

Syntax
This subcommand has the following syntax:

chef-server-ctl org-list (options)
Options
This subcommand has the following options:

-a, --all-orgs

Show all organizations.

-w, --with-uri

Show the corresponding URIs.

org-show
The org-show subcommand is used to show the details for an organization.

Syntax
This subcommand has the following syntax:

chef-server-ctl org-show ORG_NAME
org-user-add
The org-user-add subcommand is used to add a user to an organization.

Syntax
This subcommand has the following syntax:

chef-server-ctl org-user-add ORG_NAME USER_NAME (options)
Options
This subcommand has the following options:

--admin

Add the user to the admins group.
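For illustration (the organization and user names are assumptions), the following adds the user alice to the organization devorg and places her in its admins group:
chef-server-ctl org-user-add devorg alice --admin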

org-user-remove
The org-user-remove subcommand is used to remove a user from an organization.

Syntax
This subcommand has the following syntax:

chef-server-ctl org-user-remove ORG_NAME USER_NAME (options)
Test Node Setup: Create a server and add to organization,
check node details using knife.
Chef is a tool used for Configuration Management which
closely competes with Puppet. Chef is an automation tool that
provides a way to define infrastructure as code.
1. Install chef-client
Either use the https://www.chef.io/chef/install.sh script or download and install the correct
chef-client package for your OS.

2. Create /etc/chef/client.rb
Perhaps you can use one of your bootstrapped nodes as a reference. The important bit is that
you have chef_server_url pointing at your Chef server.
Example:

/etc/chef/client.rb
chef_server_url "https://mychefserver.myorg.com/organizations/myorg"
validation_client_name "myorg-validator"
validation_key "/etc/chef/myorg-validator.pem"
log_level :info

3. Copy validation key
This is the key you got after running chef-server-ctl org-create. If lost, you can generate a new
one from Chef Manage.
Copy the key to /etc/chef/myorg-validator.pem (i.e., to what is configured as validation_key in client.rb).

4. Fetch SSL cert
Optionally, if the SSL certificate on your Chef server isn't signed (it probably isn't), you must
manually fetch it so that knife/chef-client will trust the certificate.

mkdir /etc/chef/trusted_certs
knife ssl fetch -c /etc/chef/client.rb
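To finish the test node setup and check the node details with knife (an illustrative sketch; the node name web1 is an assumption), run the Chef client once so the node registers itself with the organization, then inspect it from the workstation:
# On the new node: register with the Chef server and apply the (initially empty) run-list
sudo chef-client
# On the workstation: confirm the node joined the organization and view its details
knife node list
knife node show web1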
Node Objects and Search: How to Add Run list to Node Check
node Details.

A run-list defines all of the information necessary for Chef to configure a node
into the desired state. A run-list is:

An ordered list of roles and/or recipes that are run in the exact order
defined in the run-list; if a recipe appears more than once in the run-list,
Chef Infra Client will not run it twice
Always specific to the node on which it runs; nodes may have a run-list
that is identical to the run-list used by other nodes
Stored as part of the node object on the Chef server
Maintained using knife and then uploaded from the workstation to the Chef
Infra Server, or maintained using Chef Automate
Run-list Format

A run-list must be in one of the following formats: fully qualified, cookbook, or
default. Both roles and recipes must be in quotes, for example:

"role[NAME]"

or

"recipe[COOKBOOK::RECIPE]"

Use a comma to separate roles and recipes when adding more than one item to the
run-list:

"recipe[COOKBOOK::RECIPE],COOKBOOK::RECIPE,role[NAME]"
Empty Run-lists

Use an empty run-list to determine if a failed Chef Infra Client run has anything
to do with the recipes that are defined within that run-list. This is a quick way to
discover if the underlying cause of a Chef Infra Client run failure is a
configuration issue. If a failure persists even if the run-list is empty, check the
following:

Configuration settings in the config.rb file
Permissions for the user to both the Chef Infra Server and to the node on
which a Chef Infra Client run is to take place
Knife Commands

The following knife commands may be used to manage run-lists on the Chef Infra
Server.

Quotes, Windows

When running knife in Windows, a string may be interpreted as a wildcard pattern
when quotes are not present in the command. The number of quotes to use
depends on the shell from which the command is being run.

When running knife from the command prompt, a string should be surrounded by
single quotes ( ' '). For example:

knife node run_list set test-node 'recipe[iptables]'

When running knife from Windows PowerShell, a string should be surrounded by
triple single quotes ( ''' '''). For example:

knife node run_list set test-node '''recipe[iptables]'''
Import-Module chef

The Chef Infra Client 12.4 release adds an optional feature to the Microsoft
Installer Package (MSI) for Chef. This feature enables the ability to pass quoted
strings from the Windows PowerShell command line without the need for triple
single quotes (''' '''). This feature installs a Windows PowerShell module
(typically in C:\opscode\chef\modules ) that is also appended to
the PSModulePath environment variable. This feature is not enabled by default.
To activate this feature, run the following command from within Windows
PowerShell:

Import-Module chef

or add Import-Module chef to the profile for Windows PowerShell located at:

~\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

This module exports cmdlets that have the same name as the command-line
tools—chef-client, knife—that are built into Chef.

For example:
knife exec -E 'puts ARGV' """&s0meth1ng"""

is now:

knife exec -E 'puts ARGV' '&s0meth1ng'

and:

knife node run_list set test-node '''role[ssssssomething]'''

is now:

knife node run_list set test-node 'role[ssssssomething]'

To remove this feature, run the following command from within Windows
PowerShell:

Remove-Module chef
run_list add

Use the run_list add argument to add run-list items (roles or recipes) to a
node.

A run-list must be in one of the following formats: fully qualified, cookbook, or
default. Both roles and recipes must be in quotes, for example:

"role[NAME]"

or

"recipe[COOKBOOK::RECIPE]"

Use a comma to separate roles and recipes when adding more than one item to the
run-list:

"recipe[COOKBOOK::RECIPE],COOKBOOK::RECIPE,role[NAME]"
Syntax
This argument has the following syntax:

knife node run_list add NODE_NAME RUN_LIST_ITEM (options)
Options

This argument has the following options:

-a ITEM, --after ITEM

Add a run-list item after the specified run-list item.

-b ITEM, --before ITEM

Add a run-list item before the specified run-list item.

Note

See config.rb for more information about how to add certain knife options as
settings in the config.rb file.
Examples

The following examples show how to use this knife subcommand:

ADD A ROLE

To add a role to a run-list, enter:

knife node run_list add NODE_NAME 'role[ROLE_NAME]'
ADD ROLES AND RECIPES

To add roles and recipes to a run-list, enter:

knife node run_list add NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME],recipe[COOKBOOK::RECIPE_NAME],role[ROLE_NAME]'
ADD A RECIPE WITH A FQDN

To add a recipe to a run-list using the fully qualified format, enter:

knife node run_list add NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME]'
ADD A RECIPE WITH A COOKBOOK
To add a recipe to a run-list using the cookbook format, enter:

knife node run_list add NODE_NAME 'COOKBOOK::RECIPE_NAME'
ADD THE DEFAULT RECIPE

To add the default recipe of a cookbook to a run-list, enter:

knife node run_list add NODE_NAME 'COOKBOOK'
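After adding items, the node's run-list and other details can be checked from the workstation; a brief illustrative sketch (NODE_NAME is a placeholder):
knife node show NODE_NAME          # full node details (environment, run-list, platform, etc.)
knife node show NODE_NAME -r       # show only the node's run-list
knife node show NODE_NAME -a fqdn  # show a single attribute, in this case fqdn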
run_list remove

Use the run_list remove argument to remove run-list items (roles or recipes)
from a node. A recipe must be in one of the following formats: fully qualified,
cookbook, or default. Both roles and recipes must be in quotes, for
example: 'role[ROLE_NAME]' or 'recipe[COOKBOOK::RECIPE_NAME]'. Use a
comma to separate roles and recipes when removing more than one, like
this: 'recipe[COOKBOOK::RECIPE_NAME],COOKBOOK::RECIPE_NAME,role[ROLE_NAME]'.

Syntax

This argument has the following syntax:

knife node run_list remove NODE_NAME RUN_LIST_ITEM
Options

This command does not have any specific options.

Note

See config.rb for more information about how to add certain knife options as
settings in the config.rb file.
Examples

The following examples show how to use this knife subcommand:

REMOVE A ROLE

To remove a role from a run-list, enter:

knife node run_list remove NODE_NAME 'role[ROLE_NAME]'
REMOVE A RUN-LIST
To remove a recipe from a run-list using the fully qualified format, enter:

knife node run_list remove NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME]'
run_list set

Use the run_list set argument to set the run-list for a node. A recipe must be
in one of the following formats: fully qualified, cookbook, or default. Both roles
and recipes must be in quotes, for
example: "role[ROLE_NAME]" or "recipe[COOKBOOK::RECIPE_NAME]". Use a
comma to separate roles and recipes when setting more than one, like
this: "recipe[COOKBOOK::RECIPE_NAME],COOKBOOK::RECIPE_NAME,role[ROLE_NAME]".

Syntax

This argument has the following syntax:

knife node run_list set NODE_NAME RUN_LIST_ITEM
Options

This command does not have any specific options.

Examples

None.
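Although the documentation lists no examples here, a hypothetical illustration (the node, cookbook, and role names are assumptions) that replaces whatever run-list the node currently has would be:
knife node run_list set web1 'recipe[nginx],role[webserver]'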

status

The following examples show how to use the knife status subcommand to
verify the status of run-lists.

View status, include run-lists

To include run-lists in the status, enter:

knife status --run-list

to return something like:

20 hours ago, dev-vm.chisamore.com, ubuntu 10.04, dev-
vm.chisamore.com, 10.66.44.126, role[lb].
3 hours ago, i-225f954f, ubuntu 10.04, ec2-67-202-63-102.compute-
1.amazonaws.com, 67.202.63.102, role[web].
3 hours ago, i-a45298c9, ubuntu 10.04, ec2-174-129-127-206.compute-
1.amazonaws.com, 174.129.127.206, role[web].
3 hours ago, i-5272a43f, ubuntu 10.04, ec2-184-73-9-250.compute-
1.amazonaws.com, 184.73.9.250, role[web].
3 hours ago, i-226ca64f, ubuntu 10.04, ec2-75-101-240-230.compute-
1.amazonaws.com, 75.101.240.230, role[web].
3 hours ago, i-f65c969b, ubuntu 10.04, ec2-184-73-60-141.compute-
1.amazonaws.com, 184.73.60.141, role[web].
View status using a query

To show the status of a subset of nodes that are returned by a specific query,
enter:

knife status "role:web" --run-list

to return something like:

3 hours ago, i-225f954f, ubuntu 10.04, ec2-67-202-63-102.compute-
1.amazonaws.com, 67.202.63.102, role[web].
3 hours ago, i-a45298c9, ubuntu 10.04, ec2-174-129-127-206.compute-
1.amazonaws.com, 174.129.127.206, role[web].
3 hours ago, i-5272a43f, ubuntu 10.04, ec2-184-73-9-250.compute-
1.amazonaws.com, 184.73.9.250, role[web].
3 hours ago, i-226ca64f, ubuntu 10.04, ec2-75-101-240-230.compute-
1.amazonaws.com, 75.101.240.230, role[web].
3 hours ago, i-f65c969b, ubuntu 10.04, ec2-184-73-60-141.compute-
1.amazonaws.com, 184.73.60.141, role[web].
Run-lists, Applied

A run-list will tell Chef Infra Client what to do when bootstrapping that node for
the first time, and then how to configure that node on every subsequent Chef
Infra Client run.

Bootstrap Operations

The knife bootstrap command is a common way to install Chef Infra Client on a
node. The default for this approach assumes that a node can access the Chef
website so that it may download the Chef Infra Client package from that location.

The Chef Infra Client installer will detect the version of the operating system, and
then install the appropriate Chef Infra Client version using a single command to
install Chef Infra Client and all of its dependencies, including an embedded
version of Ruby, OpenSSL, parsers, libraries, and command line utilities.

The Chef Infra Client installer puts everything into a unique directory
(/opt/chef/ ) so that Chef Infra Client will not interfere with other applications
that may be running on the target machine. Once installed, Chef Infra Client
requires a few more configuration steps before it can perform its first Chef Infra
Client run on a node.

A node is any physical, virtual, or cloud device that is configured and maintained
by an instance of Chef Infra Client. Bootstrapping installs Chef Infra Client on a
target system so that it can run as a client and sets the node up to communicate
with a Chef Infra Server. There are two ways to do this:

Run the knife bootstrap command from a workstation.
Perform an unattended install to bootstrap from the node itself, without
requiring SSH or WinRM connectivity.

The stages of the bootstrap operation are described in greater detail below.
During a knife bootstrap operation, the following happens:

knife bootstrap: Enter the knife bootstrap subcommand from a workstation. Include the
hostname, IP address, or FQDN of the target node as part of this command. Knife will establish
an SSH or WinRM connection with the target system and run a bootstrap script.
Get the install script from Chef: The shell script will make a request to the Chef website to
get the most recent version of the Chef Infra Client install script (install.sh or install.ps1).
Get the Chef Infra Client package from Chef: The install script then gathers system-specific
information and determines the correct package for Chef Infra Client, and then downloads the
appropriate package from omnitruck-direct.chef.io.
Install Chef Infra Client: Chef Infra Client is installed on the target node using a system
native package (.rpm, .msi, etc).
Start a Chef Infra Client run: On UNIX and Linux-based machines, the second shell script
executes the chef-client binary with a set of initial settings stored within first-boot.json on
the node; first-boot.json is generated from the workstation as part of the initial knife
bootstrap subcommand. On Windows machines, the batch file that is derived from the
windows-chef-client-msi.erb bootstrap template executes the chef-client binary with a set of
initial settings stored within first-boot.json on the node; first-boot.json is generated from
the workstation as part of the initial knife bootstrap subcommand.
Complete a Chef Infra Client run: A Chef Infra Client run proceeds, using HTTPS (port 443),
and registers the node with the Chef Infra Server. The first Chef Infra Client run, by default,
contains an empty run-list. A run-list can be specified as part of the initial bootstrap
operation using the --run-list option of the knife bootstrap subcommand.
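A hypothetical bootstrap invocation (the IP address, user, node name, and run-list are all assumptions; older knife versions use -x/--ssh-user instead of -U/--connection-user) might look like:
knife bootstrap 203.0.113.10 -U ubuntu --sudo -N web1 --run-list 'recipe[nginx]'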
The Chef Infra Client Run
A “Chef Infra Client run” is the term used to describe the steps Chef Infra Client takes to
configure a node when the chef-client command is run. The various stages that occur during a
Chef Infra Client run are described below.

During every Chef Infra Client run, the following happens:

Get configuration data: Chef Infra Client gets process configuration data from the client.rb
file on the node, and then gets node configuration data from Ohai. One important piece of
configuration data is the name of the node, which is found in the node_name attribute in the
client.rb file or is provided by Ohai. If Ohai provides the name of a node, it is typically the
FQDN for the node, which is always unique within an organization.
Authenticate to the Chef Infra Server: Chef Infra Client authenticates to the Chef Infra Server
using an RSA private key and the Chef Infra Server API. The name of the node is required as part
of the authentication process to the Chef Infra Server. If this is the first Chef Infra Client
run for a node, the chef-validator will be used to generate the RSA private key.
Get, rebuild the node object: Chef Infra Client pulls down the node object from the Chef Infra
Server and then rebuilds it. A node object is made up of the system attributes discovered by
Ohai, the attributes set in Policyfiles or Cookbooks, and the run list of cookbooks. The first
time Chef Infra Client runs on a node, it creates a node object from the default run-list. A node
that has not yet had a Chef Infra Client run will not have a node object or a Chef Infra Server
entry for a node object. On any subsequent Chef Infra Client runs, the rebuilt node object will
also contain the run-list from the previous Chef Infra Client run.
Expand the run-list: Chef Infra Client expands the run-list from the rebuilt node object and
compiles a complete list of recipes in the exact order that they will be applied to the node.
Synchronize cookbooks: Chef Infra Client requests all the cookbook files (including recipes,
templates, resources, providers, attributes, and libraries) that it needs for every action
identified in the run-list from the Chef Infra Server. The Chef Infra Server responds to Chef
Infra Client with the complete list of files. Chef Infra Client compares the list of files to the
files that already exist on the node from previous runs, and then downloads a copy of every new
or modified file from the Chef Infra Server.
Reset node attributes: All attributes in the rebuilt node object are reset. All attributes from
attribute files, Policyfiles, and Ohai are loaded. Attributes that are defined in attribute files
are first loaded according to cookbook order. For each cookbook, attributes in the default.rb
file are loaded first, and then additional attribute files (if present) are loaded in lexical
sort order. If attribute files are found within any cookbooks that are listed as dependencies in
the metadata.rb file, these are loaded as well. All attributes in the rebuilt node object are
updated with the attribute data according to attribute precedence. When all the attributes are
updated, the rebuilt node object is complete.
Compile the resource collection: Chef Infra Client identifies each resource in the node object
and builds the resource collection. Libraries are loaded first to ensure that all language
extensions and Ruby classes are available to all resources. Next, attributes are loaded, followed
by custom resources. Finally, all recipes are loaded in the order specified by the expanded
run-list. This is also referred to as the "compile phase".
Converge the node: Chef Infra Client configures the system based on the information that has
been collected. Each resource is executed in the order identified by the run-list, and then by
the order in which each resource is listed in each recipe. Each resource defines an action to
run, which configures a specific part of the system. This process is also referred to as
convergence, or the "execution phase".
Update the node object, process exception and report handlers: When all the actions identified
by resources in the resource collection have been done and Chef Infra Client finishes
successfully, then Chef Infra Client updates the node object on the Chef Infra Server with the
node object built during a Chef Infra Client run. (This node object will be pulled down by Chef
Infra Client during the next Chef Infra Client run.) This makes the node object (and the data in
the node object) available for search. Chef Infra Client always checks the resource collection
for the presence of exception and report handlers. If any are present, each one is processed
appropriately.
Get, run Chef InSpec Compliance Profiles: After the Chef Infra Client run finishes, it begins
the Compliance Phase, which is a Chef InSpec run within the Chef Infra Client. Chef InSpec
retrieves tests from either a legacy audit cookbook or a current InSpec profile.
Send or Save Compliance Report: When all the InSpec tests finish running, Chef InSpec checks
the reporting handlers defined in the legacy audit cookbook or in a current InSpec profile and
processes them appropriately.
Stop, wait for the next run: When everything is configured and the Chef Infra Client run is
complete, Chef Infra Client stops and waits until the next time it is asked to run.

Environments: How to create Environments, Add servers to environments.
Environments are logical grouping of machines to which various applications can be mapped
and deployed. Using environments and templates, you can model the infrastructure and the
middleware available for various applications. Environments consist of logical groups of
resources with a similar function or role called environment tiers. An environment tier can be
a group of machines with a similar function or role, such all the web servers or application
servers or database servers for an environment. Resources are actual target end point
machines (such as physical servers), virtual machines, or mobile devices.
Environments can be static, dynamic, or hybrid. A static environment has resources that are
already provisioned and managed at the platform level. Each resource has its own logical
name to identify it from the other resources in the system. It also can be assigned to one or
more resource pools or to a zone (a collection of agents). Several resources can correspond
to the same physical host or agent machine. Resources can also be configured as standard or
proxy.
Standard resources are machines running the CloudBees CD/RO agent on a supported agent
platform, while proxy resources (agents and targets) are on remote platforms or hosts that exist
in your environment and require SSH keys for authentication. The CloudBees CD/RO agent does not
need to run on the remote platform or host.
See Configuring Resources for more information about how to create, configure, and manage
resources.
This example shows to how model a static environment to which the application will be
deployed. For information about dynamic environments and how to model them,
see Deploying Applications in Dynamic Environments.
What is Chef Environment?
It is always a good idea to have a separate environment for development, testing, and
production. Chef enables grouping nodes into separate environments to support an ordered
development flow.

For instance, one environment may be called "testing" and another may be called
"production". Since you don't want any code that is still in testing on your production
machines, each machine can only be in one environment. You can then have one
configuration for machines in your testing environment, and a completely different
configuration for computers in production.

Additional environments can be created to reflect each organization's patterns and workflow.
For example, creating production, staging, testing, and development environments.
Generally, an environment is also associated with one (or more) cookbook versions.

What is the _default environment?

By default, an environment called "_default" is created. Each node will be placed into this
environment unless another environment is specified. Environments can be created to tag a
server as part of a process group.

What are the contents of an environment?

name
description
cookbook (a version constraint for a single cookbook)
cookbook_versions (a version constraint for a group of cookbooks)
default_attributes
override_attributes

Chef Environment Syntax Patterns

name 'environment_name'
description 'environment_description'
cookbook OR cookbook_versions 'cookbook' OR 'cookbook' => 'cookbook_version'
default_attributes 'node' => { 'attribute' => [ 'value', 'value', 'etc.' ] }
override_attributes 'node' => { 'attribute' => [ 'value', 'value', 'etc.' ] }

How to create an environment?

1. Creating a Ruby file in the environments sub-directory of the chef-repo and then
pushing it to the Chef server
2. Creating a JSON file directly in the chef-repo and then pushing it to the Chef server
3. Using knife
4. Using the Chef management console web user interface
5. Using the Chef server REST API

Create and Manage chef environment using Knife


$ knife environment create development
$ knife environment create dev -d --description "The development
environment."
$ knife environment delete ENVIRONMENT_NAME
$ knife environment list -w
$ knife environment show
$ knife environment show devops -F json
$ knife environment list

{
"name": "development",
"description": "",
"cookbook_versions": {
},
"json_class": "Chef::Environment",
"chef_type": "environment",
"default_attributes": {
},
"override_attributes": {
}
}
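The section title also mentions adding servers (nodes) to environments; a brief illustrative sketch of the common ways to do this (the node and environment names are assumptions):
$ knife node environment set my_server development   # assign the node from the workstation
$ knife node show my_server -a chef_environment      # confirm the assignment
# Alternatively, set environment 'development' in /etc/chef/client.rb on the node,
# or run the client with an explicit environment:
$ sudo chef-client -E development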

Create chef environment using Ruby

$ cd ~/chef-repo/environments
$ nano development.rb
name "development"
description "The master development branch"
cookbook_versions({
  "nginx" => "<= 1.1.0",
  "apt" => "= 0.0.1"
})
override_attributes ({
  "nginx" => {
    "listen" => [ "80", "443" ]
  },
  "mysql" => {
    "root_pass" => "root"
  }
})

Creating an environment
To create an environment:
1. Select Deployment Automation Environments.
2. Select New + to add an environment. The New Environment dialog appears.
3. Select Create New or From traditional application… based on your requirements. If you
selected From traditional application choose an application from the list.
4. Enter a name into the Name field.
5. Select a project for the new environment and enter a description of the environment.
You can include hyperlinks as part of an object description for any CloudBees CD/RO
object.
6. Select OK.
The Environment Editor opens.

From here, you can create a tier with resources or a cluster with nodes.
Creating an environment tier
Environment tiers are used with traditional applications.
1. Define Tier 1:
a. Select Details from the vertical dots menu.
b. On the Details tab, enter the name and optional description.
c. On the Capabilities tab, optionally define a capability.
d. Select OK.
2. Assign resources to it:
a. Select + in Resources block.
b. In the New dialog box, click Add resources or Add resource pool. The Resources list
or Resource Pools list, respectively, displays.
c. Select one or more enabled resources or resource pools for this environment and then
click OK.
The Environment editor now has an environment tier called mysql with one resource.
Example:

Figure 1. mysql environment tier


Creating an environment cluster
Environment clusters are used with microservice applications.
Before creating an environment cluster, you must create a cluster configuration, which is a CloudBees
CD/RO object to hold common values, predefined parameter sets, and credentials for your cluster.
Creating a cluster configuration leverages the CloudBees CD/RO EC-Helm plugin. You must create
one configuration for each cluster in your deployment. For more information, refer to Create Helm
plugin configurations.
1. Select + at the bottom of Cluster 1. The Cluster Definition dialog appears.
2. Select a platform. For a microservice application, select Kubernetes (via Helm).
3. Enter the cluster attributes as needed. The following table describes the available attributes:
Configuration Name: The name of the previously defined configuration for the cluster.
Namespace: The name of the Kubernetes namespace. If not supplied, the default namespace is used.
Kubeconfig context: The name of the context to be used for this cluster deployment. If not specified, the one defined by the current-context inside the kubeconfig file is used.
4. Select OK when finished.
5. Link a utility resource to the cluster. This defines the static environment to which the cluster is
deployed.


a. Select the New Utility Resource button (with wrench and hammer) in the upper right.
The New dialog appears.
b. Select Add resources or Add resource pool. The Resources list or Resource Pools list,
respectively, displays.
c. Select one or more enabled resources or resource pools for this environment and then
select OK.
The new Utility Resource tile now appears in the editor field.

Chef helps in performing environment specific configuration. It is always a good idea to have
a separate environment for development, testing, and production.
Chef enables grouping nodes into separate environments to support an ordered
development flow.

Creating an Environment
An environment can be created on the fly using the knife utility. The following command
opens the shell's default editor, so that one can modify the environment definition.
vipin@laptop:~/chef-repo $ knife environment create book {
"name": "book",
"description": "",
"cookbook_versions": {
},
"json_class": "Chef::Environment",
"chef_type": "environment",
"default_attributes": {
},
"override_attributes": {
}
}
Created book

Testing a Created Environment


vipin@laptop:~/chef-repo $ knife environment list
_default
book

List Node for All Environments


vipin@laptop:~/chef-repo $ knife node list
my_server

_default Environment
Each organization always starts with at least a single environment, called the _default
environment, which is always available to the Chef server. The _default environment cannot be
modified in any way. Changes can only be made in the custom environments that we create.

Environment Attributes
An attribute can be defined in an environment and then used to override the default settings
on a node. When the Chef client run takes place, these attributes are compared with the
default attributes that are already present on the node. Where the environment attributes
take precedence over the default attributes, the Chef client applies those settings and values
during the run on each node.
An environment attribute can only be either a default_attribute or an override_attribute; it cannot
be a normal attribute. Use the default_attributes or override_attributes methods, as in the sketch below.
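
As a hedged sketch, an environment Ruby file combining both attribute levels might look like this; the 'apache' attribute names are purely illustrative:

name "staging"
description "Staging environment"
default_attributes "apache" => { "listen_ports" => [ "80" ] }
override_attributes "apache" => { "max_children" => "30" }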

Attribute Type
Default − A default attribute is always reset at the start of every Chef client run and has the
lowest attribute precedence.
Override − An override attribute is always reset at the start of every Chef client run and has
a higher attribute precedence than default, force_default and normal. An override attribute is
most often defined in the recipe but can also be specified in an attribute file for a role or for
an environment.

Order of Applying an Attribute

From lowest to highest precedence, attributes are applied in this order: default, force_default,
normal, override, force_override, automatic.
Roles: Create roles, Add Roles to organization.
What is Role?
A role is a way to define certain patterns and processes that exist across nodes in an organization as
belonging to a single job function. Each role consists of zero (or more) attributes and a run-list. Each
node can have zero (or more) roles assigned to it. When a role is run against a node, the configuration
details of that node are compared against the attributes of the role, and then the contents of that role‘s
run-list are applied to the node‘s configuration details. When a chef-client runs, it merges its own
attributes and run-lists with those contained within each assigned role.

How to use Roles in Chef?


1. Create a role and add the cookbooks (run list) into it.
2. Assign the role to each node, or bootstrap new nodes using the role.
3. Run chef-client on the nodes to apply the role's run list.

How to create Role?


Method 1: In Chef Server directly

> knife role create client1

This opens an editor. Add the run list, e.g. "recipe[nginx]", under "run_list", then save and exit.

The role will be created on the Chef server.

Example

name "web_servers"
description "This role contains nodes, which act as web servers"
run_list "recipe[webserver]"
default_attributes 'ntp' => {
'ntpdate' => {
'disable' => true
}
}
Let’s download the role from the Chef server so we have it locally in a Chef repository.
> knife role show client1 -d -Fjson > roles/client1.json

Now, let's bootstrap the node using knife with the role
> knife bootstrap --run-list "role[webserver]" --sudo hostname
How to edit the roles in chef Server?
> knife role edit client1
Method 2: In local repo under chef-repo folder
> vi webserver.rb
example –

name "web_servers"
description "This role contains nodes, which act as web servers"
run_list "recipe[webserver]"
default_attributes 'ntp' => {
'ntpdate' => {
'disable' => true
}
}
Then upload it to the Chef server using the following commands.

$ knife role from file path/to/role/file


$ knife role from file web_servers.rb
How to Assign Roles to Nodes?
> knife node list
$ knife node edit node_name
OR
# Assign the role to a node called server:
$ knife node run_list add server 'role[web_servers]'
This will bring up the node‘s definition file, which will allow us to add a role to its run_list:

{ "name": "client1", "chef_environment": "_default", "normal": { "tags": [ ] }, "run_list": [


"recipe[nginx]" ] }
For instance, we can replace our recipe with our role in this file:

{ "name": "client1", "chef_environment": "_default", "normal": { "tags": [ ] }, "run_list": [


"role[web_server]" ] }
How to bootstrap node using role?
> knife bootstrap {{address}} --ssh-user {{user}} --ssh-password '{{password}}' --sudo --use-sudo-
password --node-name node1 --run-list 'role[production]'
> knife bootstrap --run-list "role[phpapp-web]" --sudo hostname
How to run roles against nodes?
You can run chef-client on multiple nodes via knife ssh command like, To query for all nodes that
have the webserver role and then use SSH to run the command sudo chef-client, enter:

> knife ssh "role:webserver" "sudo chef-client"

To find the uptime of all of web servers running Ubuntu on the Amazon EC2 platform, enter:

> knife ssh "role:web" "uptime" -x ubuntu -a ec2.public_hostname


Method 3: Using the Chef Automate UI
Step 1 – Create a role

Step 2 – Add a List of Cookbooks

Step 3 – Edit a Node and Roles


Step 4 – Run knife command from workstation

$ knife ssh "role:webserver" "sudo chef-client"

How it works

You define a role in a Ruby file inside the roles folder of your Chef repository. A role consists of
a name attribute and a description attribute. Additionally, a role usually contains a role-specific run
list and role-specific attribute settings.
Every node, which has a role in its run list, will have the role‘s run list expanded into its own. This
means that all the recipes (and roles), which are in the role‘s run list, will be executed on your nodes.

You need to upload your role on your Chef server by using the knife role from
file command.

Only then should you add the role to your node‘s run list.

Running the Chef client on a node having your role in its run list will execute all the recipes listed in
the role.

Chef Attributes with Roles

Example: a node object with a role in its run_list


{
  "name": "rajesh-node-1",
  "chef_environment": "_default",
  "normal": {
    "tags": [ ]
  },
  "policy_name": null,
  "policy_group": null,
  "run_list": [
    "role[web-role]"
  ]
}

Attributes: Understanding of Attributes, Creating Custom
Attributes, Defining in Cookbooks.
About Attributes
An attribute is a specific detail about a node. Attributes are used by Chef Infra Client to
understand:

The current state of the node


What the state of the node was at the end of the previous Chef Infra Client run
What the state of the node should be at the end of the current Chef Infra Client run

Attributes are defined by:

The node as saved on the Chef Infra Server


Attributes passed using JSON on the command line
Cookbooks (in attribute files and/or recipes)
Policyfiles

During every Chef Infra Client run, Chef Infra Client builds the attribute list using:

Attributes passed using JSON on the command line


Data about the node collected by Ohai.
The node object that was saved to the Chef Infra Server at the end of the previous
Chef Infra Client run.
The rebuilt node object from the current Chef Infra Client run, after it is updated for
changes to cookbooks (attribute files and/or recipes) and/or Policyfiles, and updated
for any changes to the state of the node itself.

After the node object is rebuilt, all of the attributes are compared, and then the node is
updated based on attribute precedence. At the end of every Chef Infra Client run, the node
object that defines the current state of the node is uploaded to the Chef Infra Server so
that it can be indexed for search.

There are 6 different types of attributes:


default: A default attribute is automatically reset at the start of every chef-client run and has
the lowest attribute precedence. Use default attributes as often as possible in cookbooks.
force_default: Use the force_default attribute to ensure that an attribute defined in a cookbook
(by an attribute file or by a recipe) takes precedence over a default attribute set by a role or an
environment.
normal: A normal attribute is a setting that persists in the node object. A normal attribute has a
higher attribute precedence than a default attribute.
override: An override attribute is automatically reset at the start of every chef-client run and
has a higher attribute precedence than default, force_default, and normal attributes.
An override attribute is most often specified in a recipe, but can be specified in an attribute file,
for a role, and/or for an environment. A cookbook should be authored so that it
uses override attributes only when required.

force_override: Use the force_override attribute to ensure that an attribute defined in a
cookbook (by an attribute file or by a recipe) takes precedence over an override attribute set by
a role or an environment.
automatic: An automatic attribute contains data that is identified by Ohai at the beginning of
every chef-client run. An automatic attribute cannot be modified and always has the highest
attribute precedence.
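
The following recipe fragment is a minimal sketch of how these levels can be set on the node object; the 'demo' attribute names are illustrative only:

# Sketch of attribute precedence inside a recipe ('demo' keys are made up).
node.default['demo']['editor'] = 'vim'      # lowest precedence, reset each run
node.normal['demo']['editor'] = 'nano'      # persists in the node object
node.override['demo']['editor'] = 'emacs'   # wins over default and normal
# Automatic attributes come from Ohai and cannot be modified, for example:
log "This node runs #{node['platform']}"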

Attributes

Sometimes you might use hard-coded values (for example, directory name, filename, username, etc.)
at multiple locations inside your recipes. Later when you want to change this value, it becomes a
tedious process, as you have to browse through all the recipes that contains this value and change them
accordingly.
Instead, you can define the hard-code value as variable inside an attribute file, and use the attribute
name inside the recipe. This way when you want to change the value, you are changing only at one
place in the attribute file.
These are the different attribute types available: default, force_default, normal, override,
force_override, automatic
Inside your cookbook, for most situations, you‘ll be using the default attribute type.
The following is a sample attribute file, where I‘ve defined mysql related hard-coded values that I
need to eventually use in multiple recipes. In this example, the attribute file was created under ~/chef-
repo/cookbooks/thegeekstuff/attributes directory.
default['mysql']['dir'] = '/data/mysql'
default['mysql']['username'] = 'dbadmin'
default['mysql']['dbname'] = 'devdb'
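
A hedged sketch of how a recipe in the same cookbook could then reference these attributes (the directory resource and log message shown are illustrative):

# Sketch: consuming the attributes defined above inside a recipe.
directory node['mysql']['dir'] do
  owner  node['mysql']['username']   # assumes this user exists on the node
  mode   '0750'
  action :create
end

log "Provisioning database #{node['mysql']['dbname']} for #{node['mysql']['username']}"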
Resources

You‘ll see the resources directory only from Chef 12.5 version and above.
Chef provides several built-in resources for you to use. For example, using Chef's built-in resources you
can manage packages, services, files, and directories on your system.
But, if you have a complex requirement that is specific to your application or tool, you can create your
own custom resource and place it under the resources directory. Once you place your custom
resource under this directory, you can use it in your recipes just like any other built-in Chef resource.
The following is a simple custom resource example. This file was created under ~/chef-
repo/cookbooks/thegeekstuff/resources directory.
There are three parts to this example: 1) Declare custom properties in the beginning 2) Load current
property values 3) Create action blocks for this custom resource.
property :myapp_name, String, default: 'Default Name for My App'

load_current_value do
# write code to load the current value for your properties
end

action :create do
  file '/home/tomcat/myapp/config.cfg' do
    content 'location=west'
  end
  # Write additional code to define other aspects of your app creation
end

action :delete do
  # write code to delete your application
end
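
As a hedged usage sketch, assuming the resource file above is saved as myapp.rb inside the thegeekstuff cookbook (Chef derives the resource name from the cookbook and file names), a recipe could invoke it like this:

# Hypothetical invocation; the name thegeekstuff_myapp assumes the resource
# file is resources/myapp.rb in the thegeekstuff cookbook.
thegeekstuff_myapp 'web application config' do
  myapp_name 'My Web App'
  action :create
end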
Definition

If you are using chef-client 12.5 or above, Chef recommends that you don‘t use definitions anymore,
and use the custom resources (which is explained above). Definition might be deprecated in future
version.
A definition is similar to a compile-time macro that can be used in multiple recipes.
Definitions are processed when the resources are compiled, and they are not the same as resources, as
definitions don't support properties like only_if, not_if, etc.
The following is a simple definition example. In this example, app_config is the definition resource
name. This definition file was created under ~/chef-repo/cookbooks/thegeekstuff/definitions directory.

define :app_config do
  file '/home/tomcat/myapp/config.cfg' do
    content 'location=west'
  end
end
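
For illustration, once this definition exists it can be called from any recipe in the cookbook like an ordinary resource; a minimal sketch:

# Sketch: invoking the app_config definition from a recipe.
app_config 'myapp configuration'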
Files

If you want certain files to be copied over to all your remote nodes as part of Chef deployment, you
can copy those files over to the "files" directory under your cookbook.
A particular file that is located under the files directory can be copied over to one or more remote
nodes using cookbook_file resource.
In the following example, I want to copy the dblogin.php file to the remote server. In this case, I might
create a recipe using cookbook_file resource as shown below.
First, make sure you copy the file mentioned in the source property (i.e dblogin.php) in the following
recipe to your ~/chef-repo/cookbooks/{your-cookbook-name}/files directory.
cookbook_file '/home/tomcat/myapp/login/dblogin.php' do
  source 'dblogin.php'
  owner 'tomcat'
  group 'tomcat'
  mode '0755'
  action :create
end
Libraries

All custom library files that you create for a specific cookbook should be placed under the "libraries"
directory.
Library files should be written in ruby language.
You can write a library code that can either change the behavior of some of the existing chef
functionality, or to create a new functionality that is not currently satisfied by any of the existing chef
resources.
Basically you‘ll use libraries to create a custom chef resource that will solve some specific problem
based on your requirement.
You can start a brand new library without extending an existing library by simply starting your library
code (for example: MyCustomLibrary) with this line at the beginning: class MyCustomLibrary
If you are extending existing Chef functionality, you should extend the appropriate Chef classes. In
the following example, we are extending Chef database resource. This is just the first few lines of a
custom library code that shows how you start the library file definition.
class Chef
  class Resource
    class MyDBResource < Chef::Resource::Database
      # ...
Providers

All the providers that you write for your particular custom requirement should go under providers
directory.
You‘ll use custom providers when you want to inform chef-client how to manage a specific action.
You can define multiple actions for your custom provider, and inform chef-client how to manage
those actions.
You‘ll typically use custom provider when you are using LWRP (lightweight resource providers), in
which case, you‘ll first define a custom resource with your own set of actions, and then you‘ll write
custom providers, where you‘ll write ruby code to tell chef-client what exactly needs to be done for
those actions.
For example, a custom provider ( dbcluster.rb ) file located under the "providers" directory might have
few custom actions defined as shown below.
action :check do
  # ...
end

action :setmaster do
  # ...
end
Recipes

Chef recipe is the heart and soul of Chef functionality. This is where you‘ll specify all configurations
and setups that you want to be executed on your remote servers (nodes).
Recipes are written in the Ruby language. A recipe is stored inside a cookbook.
You can have multiple recipes (which is recommended for complex system configuration) inside one
cookbook.

You can also include an existing recipe in your current recipe. This way you can create a new recipe
that is dependent on another recipe.
In simple terms, recipes are a bunch of Chef resources that you'll call to set up a configuration. For every
resource that you include in your recipe, you can specify what actions from that resource you want to
be executed, and you can also set appropriate attribute values, etc.
You can also write your own custom logic using Ruby inside the Chef recipe for a specific resources
that you are calling.
Everything that you write inside a recipe is executed sequentially.
When you have multiple recipes inside a cookbook, the order in which these recipes will be executed
can be specified using a run-list.
The following is a simple recipe example file ( mysetup.rb ) that can be placed under the "recipes"
directory, which will install the given packages and start the httpd service on a remote node where this
recipe is executed.
package ['httpd', 'gcc', 'gcc-c++', 'nfs-utils'] do
  action :install
end

service 'httpd' do
  action [:enable, :start]
end
Templates

Template is similar to files, but the major difference is that using template we can dynamically
generate static text files.
Inside template we can have ruby programming language statements, which can then be used to
generate some content dynamically.
Chef templates are just ERB (Embedded Ruby) templates.
To use a template, first create the template files using ERB and place them under the "templates"
directory.
Also, inside the cookbook recipe we should use the template resource to call our template.
For example, place the following index.html.erb template file under ~/chef-repo/cookbooks/{your-
cookbook-name}/templates/default/ directory.
<html>
<body>
<h1>Hello world on <%= node[:fqdn] %></h1>
</body>
</html>
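
A hedged sketch of the matching template resource call in a recipe; the target path and ownership values below are illustrative, not part of the original example:

# Sketch: rendering the index.html.erb template shown above onto the node.
template '/var/www/html/index.html' do
  source 'index.html.erb'
  mode   '0644'
  action :create
end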
Cookbook Doc Files

When you create a cookbook, it creates these two documentation files at the top level of your cookbook
directory: 1) README.md 2) CHANGELOG.md
Maintaining these two files is very important when multiple people are working on your cookbook.

The first file README.md is where you‘ll document everything about your cookbook. By default,
when a cookbook is created, it already gives an excellent template inside the README.md file for
you to start your documentation.
Once you‘ve deployed a cookbook on production, as a best practice, any changes that you make to
your cookbook after that should have an updated version number. Inside the CHANGELOG.md
file, you‘ll document specifically what changes were done in each and every version of your
cookbook.
Both README.md and CHANGELOG.md use the Markdown format.
Cookbook Metadata File

As the name suggests, metadata.rb file is used to store certain metadata information about your
cookbook. This metadata.rb file is located at the top-level of the cookbook directory. I.e ~/chef-
repo/cookbooks/{your-cookbook-name}/metadata.rb
The information inside the metadata file is used by the chef server to make sure it deploys the correct
cookbook versions on the individual remote nodes.
When you upload your cookbook to the Chef server, the metadata file is compiled and stored in the
Chef server as JSON file.
By default when you create a cookbook using knife command, it generates the metadata.rb file.
There are certain metadata parameters that you can use inside this file. For example, you can specify
the current version number of your cookbook. You can also specify which chef-client versions will be
supported by this cookbook, etc. The following example shows a partial metadata.rb file.
name 'devdb'
maintainer_email '[email protected]'
description 'Setup the Development DB server'
version '2.5.1'
chef_version ">= 12.9"
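
Another commonly used metadata parameter is depends, which declares other cookbooks this cookbook relies on; a hedged addition to the file above might look like this (the cookbook names and version constraints are assumptions, not part of the original example):

# Illustrative dependency declarations (assumed, not from the original file).
depends 'mysql', '>= 8.0'
depends 'firewall'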

Attribute Sources
Chef Infra Client evaluates attributes in the order that they are defined in the run-list,
including any attributes that are in the run-list as cookbook dependencies.
Attributes are provided to Chef Infra Client from the following locations:
JSON files passed using the chef-client -j
Nodes (collected by Ohai at the start of each Chef Infra Client run)
Attribute files (in cookbooks)
Recipes (in cookbooks)
Environments
Roles
Policyfiles
Notes:
Many attributes are maintained in the chef-repo for Policyfiles, environments, roles,
and cookbooks (attribute files and recipes)
Many attributes are collected by Ohai on each individual node at the start of every Chef
Infra Client run
The attributes that are maintained in the chef-repo are uploaded to the Chef Infra Server
from the workstation, periodically
Chef Infra Client will pull down the node object from the Chef Infra Server and then
reset all the attributes except normal. The node object will contain the attribute data
from the previous Chef Infra Client run, including attributes set with JSON files using -j.
Chef Infra Client will update the cookbooks on the node (if required), which updates
the attributes contained in attribute files and recipes
Chef Infra Client will update the role and environment data (if required)
Chef Infra Client will rebuild the attribute list and apply attribute precedence while
configuring the node
Chef Infra Client pushes the node object to the Chef Infra Server at the end of a Chef
Infra Client run; the updated node object on the Chef Infra Server is then indexed for
search and is stored until the next Chef Infra Client run
Automatic Attributes (Ohai)
An automatic attribute is a specific detail about a node, such as an IP address, a host name, a
list of loaded kernel modules, and so on. Automatic attributes are detected by Ohai and are
then used by Chef Infra Client to ensure that they are handled properly during every Chef Infra
Client run. The most commonly accessed automatic attributes are:
node['platform']: The platform on which a node is running. This attribute helps determine which providers will be used.
node['platform_family']: The platform family is a Chef Infra specific grouping of similar platforms where cookbook code can often be shared. For example, `rhel` includes Red Hat Linux, Oracle Linux, CentOS, and several other platforms that are almost identical to Red Hat Linux.
node['platform_version']: The version of the platform. This attribute helps determine which providers will be used.
node['ipaddress']: The IP address for a node. If the node has a default route, this is the IPV4 address for the interface. If the node does not have a default route, the value for this attribute should be nil. The IP address for the default route is the recommended default value.
node['macaddress']: The MAC address for a node, determined by the same interface that detects the node['ipaddress'].
node['fqdn']: The fully qualified domain name for a node. This is used as the name of a node unless otherwise set.
node['hostname']: The host name for the node.
node['domain']: The domain for the node.
node['recipes']: A list of recipes associated with a node (and part of that node's run-list).
node['roles']: A list of roles associated with a node (and part of that node's run-list).
node['ohai_time']: The time at which Ohai was last run. This attribute is not commonly used in recipes, but it is saved to the Chef Infra Server and can be accessed using the knife status subcommand.
Ohai collects a list of automatic attributes at the start of each Chef Infra Client run. This list
will vary from organization to organization, by server type, and by the platform that runs those
servers. All the attributes collected by Ohai are unmodifiable by Chef Infra Client. Run
the ohai command on a system to see which automatic attributes Ohai has collected for a
particular node.
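
As a small hedged sketch, automatic attributes can be used directly in recipes to branch on platform details; the package choice below is illustrative:

# Sketch: branching on Ohai automatic attributes inside a recipe.
if node['platform_family'] == 'rhel'
  package 'httpd'
else
  package 'apache2'
end

log "Node #{node['fqdn']} has IP #{node['ipaddress']}"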
Attribute Files
An attribute file is located in the attributes/ sub-directory for a cookbook. When a cookbook is
run against a node, the attributes contained in all attribute files are evaluated in the context of
the node object. Node methods (when present) are used to set attribute values on a node. For
example, the apache2 cookbook contains an attribute file called default.rb , which contains the
following attributes:
default['apache']['dir'] = '/etc/apache2'
default['apache']['listen_ports'] = [ '80','443' ]
The use of the node object ( node) is implicit in the previous example; the following example
defines the node object itself as part of the attribute:
node.default['apache']['dir'] = '/etc/apache2'
node.default['apache']['listen_ports'] = [ '80','443' ]
Another (much less common) approach is to set a value only if an attribute has no value. This
can be done by using the _unless variants of the attribute priority methods:
default_unless
normal_unless
Use the _unless variants carefully (and only when necessary) because when they are used,
attributes applied to nodes may become out of sync with the values in the cookbooks as these
cookbooks are updated. This approach can create situations where two otherwise identical
nodes end up having slightly different configurations and can also be a challenge to debug.
File Methods
Use the following methods within the attributes file for a cookbook or within a recipe. These
methods correspond to the attribute type of the same name:
override
default
normal
_unless
attribute?
A useful method that is related to attributes is the attribute? method. This method will check
for the existence of an attribute, so that processing can be done in an attributes file or recipe,
but only if a specific attribute exists.
Using attribute?() in an attributes file:
if attribute?('ec2')
# ... set stuff related to EC2
end
Using attribute?() in a recipe:
if node.attribute?('ec2')
# ... do stuff on EC2 nodes
end
Attributes from Recipes
A recipe is the most fundamental configuration element within the organization. A recipe:
Is authored using Ruby, which is a programming language designed to read and behave
in a predictable manner
Is mostly a collection of resources, defined using patterns (resource names, attribute-
value pairs, and actions); helper code is added around this using Ruby, when needed
Must define everything that is required to configure part of a system
Must be stored in a cookbook
May be included in another recipe
May use the results of a search query and read the contents of a data bag (including an
encrypted data bag)
May have a dependency on one (or more) recipes
Must be added to a run-list before it can be used by Chef Infra Client
Is always executed in the same order as listed in a run-list
An attribute can be defined in a cookbook (or a recipe) and then used to override the default
settings on a node. When a cookbook is loaded during a Chef Infra Client run, these attributes
are compared to the attributes that are already present on the node. Attributes that are defined
in attribute files are first loaded according to cookbook order. For each cookbook, attributes in
the default.rb file are loaded first, and then additional attribute files (if present) are loaded in
lexical sort order. When the cookbook attributes take precedence over the default attributes,
Chef Infra Client applies those new settings and values during a Chef Infra Client run on the
node.
Attributes from Roles
A role is a way to define certain patterns and processes that exist across nodes in an
organization as belonging to a single job function. Each role consists of zero (or more)
attributes and a run-list. Each node can have zero (or more) roles assigned to it. When a role is
run against a node, the configuration details of that node are compared against the attributes of
the role, and then the contents of that role‘s run-list are applied to the node‘s configuration
details. When a Chef Infra Client runs, it merges its own attributes and run-lists with those
contained within each assigned role.
An attribute can be defined in a role and then used to override the default settings on a node.
When a role is applied during a Chef Infra Client run, these attributes are compared to the
attributes that are already present on the node. When the role attributes take precedence over
the default attributes, Chef Infra Client applies those new settings and values during a Chef
Infra Client run.
A role attribute can only be set to be a default attribute or an override attribute. A role
attribute cannot be set to be a normal attribute. Use
the default_attributes and override_attributes methods in the role's .rb file or
the default_attributes and override_attributes hashes in a JSON data file.
Attributes from Environments
An environment is a way to map an organization‘s real-life workflow to what can be
configured and managed when using Chef Infra. This mapping is accomplished by setting
attributes and pinning cookbooks at the environment level. With environments, you can
change cookbook configurations depending on the system‘s designation. For example, by
designating different staging and production environments, you can then define the correct
URL of a database server for each environment. Environments also allow organizations to
move new cookbook releases from staging to production with confidence by stepping releases
through testing environments before entering production.
Attributes can be defined in an environment and then used to override the default attributes in
a cookbook. When an environment is applied during a Chef Infra Client run, environment
attributes are compared to the attributes that are already present on the node. When the
environment attributes take precedence over the default attributes, Chef Infra Client applies
those new settings and values during a Chef Infra Client run.
Environment attributes can be set to either default attribute level or an override attribute level.

In a nutshell, server configuration management (also popularly referred to as IT


Automation) is a solution for turning your infrastructure administration into a codebase,
describing all processes necessary for deploying a server in a set of provisioning
scripts that can be versioned and easily reused. It can greatly improve the integrity of
any server infrastructure over time.

In a previous guide, we talked about the main benefits of implementing a configuration


management strategy for your server infrastructure, how configuration management
tools work, and what these tools typically have in common.

This part of the series will walk you through the process of automating server
provisioning using Chef, a powerful configuration management tool that leverages the
Ruby programming language to automate infrastructure administration and
provisioning. We will focus on the language terminology, syntax, and features
necessary for creating a simplified example to fully automate the deployment of an
Ubuntu 18.04 web server using Apache.

This is the list of steps we need to automate in order to reach our goal:
1. Update the apt cache
2. Install Apache
3. Create a custom document root directory
4. Place an index.html file in the custom document root
5. Apply a template to set up our custom virtual host
6. Restart Apache

We will start by having a look at the terminology used by Chef, followed by an


overview of the main language features that can be used to write recipes. At the end
of this guide, we will share the complete example so you can try it by yourself.

Note: this guide is intended to get you introduced to the Chef language and how to write
recipes to automate your server provisioning. For a more introductory view of Chef,
including the steps necessary for installing and getting started with this tool, please refer
to Chef’s official documentation.

##Getting Started

Before we can move to a more hands-on view of Chef, it is important that we get acquainted with important terminology and concepts introduced by this tool.

###Chef Terms

Chef Server: a central server that stores information and manages provisioning of the
nodes
Chef Node: an individual server that is managed by a Chef Server
Chef Workstation: a controller machine where the provisionings are created and
uploaded to the Chef Server
Recipe: a file that contains a set of instructions (resources) to be executed. A recipe
must be contained inside a Cookbook
Resource: a portion of code that declares an element of the system and what action
should be executed. For instance, to install a package we declare a package resource
with the action install
Cookbook: a collection of recipes and other related files organized in a pre-defined
way to facilitate sharing and reusing parts of a provisioning
Attributes: details about a specific node. Attributes can be automatic (see next
definition) and can also be defined inside recipes
Automatic Attributes: global variables containing information about the system, like
network interfaces and operating system (known as facts in other tools). These
automatic attributes are collected by a tool called Ohai
Services: used to trigger service status changes, like restarting or stopping a service

###Recipe Format

Chef recipes are written using Ruby. A recipe is basically a
collection of resource definitions that will create a step-by-step set of instructions to
be executed by the nodes. These resource definitions can be mixed with Ruby code
for more flexibility and modularity.

Below you can find a simple example of a recipe that will run apt-get update and
install vim afterwards:
execute "apt-get update" do
command "apt-get update"
end

apt_package "vim" do
action :install
end

##Writing Recipes

###Working with Variables

Local variables can be defined inside recipes as regular Ruby local variables. The example below shows how to create a local variable that is later used inside a resource definition:

package = "vim"

apt_package package do
action :install
end
These variables, however, have a limited scope, being valid only inside the file where
they were defined. If you want to create a variable and make it globally available, so
you can use it from any of your cookbooks or recipes, you need to define a custom
attribute.

####Using Attributes

Attributes represent details about a node. Chef has automatic
attributes, which are the attributes collected by a tool called Ohai and containing
information about the system (such as platform, hostname and default IP address), but
it also lets you define your own custom attributes.

Attributes have different precedence levels, defined by the type of attribute you
create. default attributes are the most common choice, as they can still be
overwritten by other attribute types when desired.
The following example shows how the previous example would look like with
a default node attribute instead of a local variable:
node.default['main']['package'] = "vim"

apt_package node['main']['package'] do
action :install
end

There are two details to observe in this example:

The recommended practice when defining node variables is to organize them as hashes using the current cookbook in use as the key. In this case, we used main, because we have a cookbook with the same name. This avoids confusion if you are working with multiple cookbooks that might have similarly named attributes.

Notice that we used node.default when defining the attribute, but when accessing its value later, we used node directly. The node.default usage defines that we are creating an attribute of type default. This attribute could have its value overwritten by another type with higher precedence, such as normal or override attributes.

The attributes’ precedence can be slightly confusing at first, but you will get used to it
after some practice. To illustrate the behavior, consider the following example:

node.normal['main']['package'] = "vim"

node.override['main']['package'] = "git"

node.default['main']['package'] = "curl"

apt_package node['main']['package'] do
action :install
end
Do you know which package will be installed in this case? If you guessed git, you
guessed correctly. Regardless of the order in which the attributes were defined, the
higher precedence of the type override will make the node['main']['package'] be
evaluated to git.

###Using Loops

Loops are typically used to repeat a task using
different input values. For instance, instead of creating 10 tasks for installing 10
different packages, you can create a single task and use a loop to repeat the task with
all the different packages you want to install.
Chef supports all Ruby loop structures for creating loops inside recipes. For simple
usage, each is a common choice:
['vim', 'git', 'curl'].each do |package|
apt_package package do
action :install
end
end

Instead of using an inline array, you can also create a variable or attribute for defining
the parameters you want to use inside the loop. This will keep things more organized
and easier to read. Below, the same example now using a local variable to define the
packages that should be installed:

packages = ['vim', 'git', 'curl']

packages.each do |package|
apt_package package do
action :install
end
end

###Using Conditionals

Conditionals can be used to dynamically decide whether or not a block of code should be executed, based on a variable or an output from a command, for instance.

Chef supports all Ruby conditionals for creating conditional statements inside recipes.
Additionally, all resource types support two special properties that will evaluate an
expression before deciding if the task should be executed or not: only_if and not_if.
The example below will check for the existence of php before trying to install the
extension php-pear. It will use the command which for verifying if there is

a php executable currently installed on this system. If the command which php returns
false, this task won’t be executed:
apt_package "php-pear" do
action :install
only_if "which php"
end
If we want to do the opposite, executing a command at all times except when a
condition is evaluated as true, we use not_if instead. This example will
install php5 unless the system is CentOS:
apt_package "php5" do
action :install
not_if { node['platform'] == 'centos' }
end
For performing more complex evaluations, or if you want to execute several tasks
under a specific condition, you may use any of the standard Ruby conditionals. The
following example will only execute apt-get update when the system is either
Debian or Ubuntu:
if node['platform'] == 'debian' || node['platform'] == 'ubuntu'
execute "apt-get update" do
command "apt-get update"
end
end
The attribute node['platform'] is an automatic attribute from Chef. The last example was
only to demonstrate a more complex conditional construction, however it could be
replaced by a simple test using the automatic attribute node['platform_family'], which
would return “debian” for both Debian and Ubuntu systems.

###Working with Templates

Templates are typically used to set up configuration files,
allowing for the use of variables and other features intended to make these files more
versatile and reusable.

Chef uses Embedded Ruby (ERB) templates, which is the same format used by Puppet.
They support conditionals, loops and other Ruby features.

Below is an example of an ERB template for setting up an Apache virtual host, using a
variable to define the document root for this host:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot <%= @doc_root %>

    <Directory <%= @doc_root %>>
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

In order to apply the template, we need to create a template resource. This is how you
would apply this template to replace the default Apache virtual host:
template "/etc/apache2/sites-available/000-default.conf" do
source "vhost.erb"
variables({ :doc_root => node['main']['doc_root'] })
action :create
end
Chef makes a few assumptions when dealing with local files, in order to enforce
organization and modularity. In this case, Chef would look for a vhost.erb template file
inside a templates folder that should be in the same cookbook where this recipe is
located.
Unlike the other configuration management tools we’ve seen so far, Chef has a more
strict scope for variables. This means you will have to explicitly provide any variables
you plan to use inside a template, when defining the template resource. In this
example, we used the variables method to pass along the doc_root attribute we need
at the virtual host template.

###Defining and Triggering Services

Service resources
are used to make sure services are initialized and enabled. They are also used to
trigger service restarts.

In Chef, service resources need to be declared before you try to notify them,
otherwise you will get an error.

Let’s take into consideration our previous template usage example, where we set up an
Apache virtual host. If you want to make sure Apache is restarted after a virtual host
change, you first need to create a service resource for the Apache service. This is how
such resource is defined in Chef:
service "apache2" do
action [ :enable, :start ]
end
Now, when defining the template resource, you need to include a notify option in
order to trigger a restart:
template "/etc/apache2/sites-available/000-default.conf" do
source "vhost.erb"
variables({ :doc_root => node['main']['doc_root'] })
action :create
notifies :restart, resources(:service => "apache2")
end

##Example Recipe

Now let's have a look at a recipe that will automate the installation of an Apache web server within an Ubuntu 18.04 system, as discussed in this guide's introduction.

The complete example, including the template file for setting up Apache and an HTML
file to be served by the web server, can be found on Github. The folder also contains a
Vagrantfile that lets you test the recipe in a simplified setup, using a virtual machine
managed by Vagrant.

Below you can find the complete recipe:


1. node.default['main']['doc_root'] = "/vagrant/web"
2.
3. execute "apt-get update" do
4.   command "apt-get update"
5. end
6.
7. apt_package "apache2" do
8.   action :install
9. end
10.
11. service "apache2" do
12.   action [ :enable, :start ]
13. end
14.
15. directory node['main']['doc_root'] do
16.   owner 'www-data'
17.   group 'www-data'
18.   mode '0644'
19.   action :create
20. end
21.
22. cookbook_file "#{node['main']['doc_root']}/index.html" do
23.   source 'index.html'
24.   owner 'www-data'
25.   group 'www-data'
26.   action :create
27. end
28.
29. template "/etc/apache2/sites-available/000-default.conf" do
30.   source "vhost.erb"
31.   variables({ :doc_root => node['main']['doc_root'] })
32.   action :create
33.   notifies :restart, resources(:service => "apache2")
34. end

###Recipe Explained

####line 1 The recipe starts with an attribute definition, node['main']['doc_root'].


We could have used a simple local variable here, however in most use case scenarios,
recipes need to define global variables that will be used from included recipes or other
files. For these situations, it is necessary to create an attribute instead of a local
variable, as the latter has a limited scope.
####lines 3-5 This execute resource runs an apt-get update.
####lines 7-9 This apt_package resource installs the package apache2.
####lines 11-13 This service resource enables and starts the service apache2. Later
on, we will need to notify this resource for a service restart. It is important that the
service definition comes before any resource that attempts to notify a service,
otherwise you will get an error.
####lines 15-20 This directory resource uses the value defined by the custom
attribute node['main']['doc_root'] to create a directory that will serve as
our document root.

####lines 22-27 A cookbook_file resource is used to copy a local file to a remote
server. This resource will copy our index.html file and place it inside the document
root we created in a previous task.
####lines 29-34 Finally, this template resource applies our Apache virtual host
template and notifies the service apache2 for a restart.

##Conclusion

Chef is a powerful configuration management tool that leverages the Ruby language to automate server provisioning and deployment. It gives you freedom to use the standard language features for maximum flexibility, while also offering custom DSLs for some resources.

Data bags: Understanding the data bags, Creating and managing


the Data bags, Creating the data bags using CLI and Chef Console,
Sample Data bags for Creating Users.
About Data Bags
Data bags store global variables as JSON data. Data bags are indexed for searching and
can be loaded by a cookbook or accessed during a search.
Create a Data Bag
A data bag can be created in two ways: using knife or manually. In general, using knife to
create data bags is recommended, but as long as the data bag folders and data bag item
JSON files are created correctly, either method is safe and effective.

Create a Data Bag with Knife


knife can be used to create data bags and data bag items when the knife data
bag subcommand is run with the create argument. For example:

knife data bag create DATA_BAG_NAME (DATA_BAG_ITEM)

knife can be used to update data bag items using the from file argument:

knife data bag from file BAG_NAME ITEM_NAME.json

As long as a file is in the correct directory structure, knife will be able to find the data bag
and data bag item with only the name of the data bag and data bag item. For example:

knife data bag from file BAG_NAME ITEM_NAME.json

will load the following file:

data_bags/BAG_NAME/ITEM_NAME.json

Continuing the example above, if you are in the “admins” directory and make changes to the
file charlie.json, then to upload that change to the Chef Infra Server use the following
command:

knife data bag from file admins charlie.json

In some cases, such as when knife is not being run from the root directory for the chef-
repo, the full path to the data bag item may be required. For example:

knife data bag from file BAG_NAME /path/to/file/ITEM_NAME.json
Manually
One or more data bags and data bag items can be created manually under
the data_bags directory in the chef-repo. Any method can be used to create the data bag
folders and data bag item JSON files. For example:

mkdir data_bags/admins

would create a data bag folder named “admins”. The equivalent command for using knife is:

knife data bag create admins

A data bag item can be created manually in the same way as the data bag, but by also
specifying the file name for the data bag item (this example is using vi, a visual editor for
UNIX):

vi data_bags/admins/charlie.json

would create a data bag item named “charlie.json” under the “admins” sub-directory in
the data_bags directory of the chef-repo. The equivalent command for using knife is:

knife data bag create admins charlie

Store Data in a Data Bag


When the chef-repo is cloned from GitHub, the following occurs:

A directory named data_bags is created.


For each data bag, a sub-directory is created that has the same name as the data
bag.
For each data bag item, a JSON file is created and placed in the appropriate sub-directory.

The data_bags directory can be placed under version source control.

When deploying from a private repository using a data bag, use the deploy_key option to
ensure the private key is present:

{
'id': 'my_app',
... (truncated) ...
'deploy_key': 'ssh_private_key'
}

where ssh_private_key is the same SSH private key as used with a private git repository
and the new lines converted to \n .

Directory Structure
All data bags are stored in the data_bags directory of the chef-repo. This directory structure
is understood by knife so that the full path does not need to be entered when working with
data bags from the command line. An example of the data_bags directory structure:

- data_bags
- admins
- charlie.json
- bob.json
- tom.json
- db_users
- charlie.json
- bob.json
- sarah.json
- db_config
- small.json
- medium.json
- large.json

where admins , db_users , and db_config are the names of individual data bags and all of the
files that end with .json are the individual data bag items.

Data Bag Items


A data bag is a container of related data bag items, where each individual data bag item is a
JSON file. knife can load a data bag item by specifying the name of the data bag to which
the item belongs and then the filename of the data bag item. The only structural
requirement of a data bag item is that it must have an id :

{
/* This is a supported comment style */
// This style is also supported
"id": "ITEM_NAME",
"key": "value"
}

where

key and value are the key:value pair for each additional attribute within the data bag
item
/* ... */ and // ... show two ways to add comments to the data bag item

Encrypt a Data Bag Item


A data bag item may be encrypted using shared secret encryption. This allows each data
bag item to store confidential information (such as a database password) or to be managed
in a source control system (without plain-text data appearing in revision history). Each data
bag item may be encrypted individually; if a data bag contains multiple encrypted data bag
items, these data bag items are not required to share the same encryption keys.

Note

Because the contents of encrypted data bag items are not visible to the Chef Infra Server,
search queries against data bags with encrypted items will not return any results.

Encryption Versions
The manner by which a data bag item is encrypted depends on the Chef Infra Client version
used. See the following:

Infra Client version   Encryption v0   Encryption v1   Encryption v2   Encryption v3
10.x                   R W
11.0+                  R               R W
11.6+                  R D             R D             R W
13.0                   R D             R D             R D             R W

R = read, W = write, D = disable

Version 0
Chef Infra Client 0.10+

Uses YAML serialization format to encrypt data bag items


Uses Base64 encoding to preserve special characters
Uses AES-256-CBC encryption, as defined by the OpenSSL package in the Ruby
Standard Library
Shared secret encryption; an encrypted file can only be decrypted by a node or a
user with the same shared secret
Recipes load encrypted data with access to the shared secret in a file on the node or
from a URI path
Decrypts only data bag item values. Keys are encrypted but searchable
Data bag id value is unencrypted for tracking data bag items
Version 1
Chef Infra Client 11.0+

All Version 0 features, plus:
Uses JSON serialization format instead of YAML to encrypt data bag items
Adds random initialization vector encryption for each value to protect against
cryptanalysis
Version 2
Chef Infra Client 11.6+

All Version 1 features, plus:
Option to disable versions 0 and 1
Adds Encrypt-then-MAC(EtM) protection
Version 3
Chef Infra Client 13.0+

Option to disable version 0, 1, and 2


Knife Options
knife can encrypt and decrypt data bag items when the knife data bag subcommand is run
with the create , edit , from file , or show arguments and the following options:

--secret SECRET: The encryption key that is used for values contained within a data bag item. If secret is not specified, Chef Infra Client looks for a secret at the path specified by the encrypted_data_bag_secret setting in the client.rb file.
--secret-file FILE: The path to the file that contains the encryption key.

Secret Keys
Encrypting a data bag item requires a secret key. A secret key can be created in any
number of ways. For example, OpenSSL can be used to generate a random number, which
can then be used as the secret key:

openssl rand -base64 512 | tr -d '\r\n' > encrypted_data_bag_secret

where encrypted_data_bag_secret is the name of the file which will contain the secret key.
For example, to create a secret key named “my_secret_key”:

openssl rand -base64 512 | tr -d '\r\n' > my_secret_key

The tr command eliminates any trailing line feeds. Doing so avoids key corruption when
transferring the file between platforms with different line endings.

Encrypt
A data bag item is encrypted using a knife command similar to:

knife data bag create passwords mysql --secret-file /tmp/my_data_bag_key

where “passwords” is the name of the data bag, “mysql” is the name of the data bag item,
and “/tmp/my_data_bag_key” is the path to the location in which the file that contains the
secret-key is located. knife will ask for user credentials before the encrypted data bag item
is saved.

Verify Encryption
When the contents of a data bag item are encrypted, they will not be readable until they are
decrypted. Encryption can be verified with a knife command similar to:

knife data bag show passwords mysql

where “passwords” is the name of the data bag and “mysql” is the name of the data bag
item. This will return something similar to:

id: mysql
pass:
cipher: aes-256-cbc
encrypted_data: JZtwXpuq4Hf5ICcepJ1PGQohIyqjNX6JBc2DGpnL2WApzjAUG9SkSdv75TfKSjX4
iv: VYY2qx9b4r3j0qZ7+RkKHg==
version: 1
user:
cipher: aes-256-cbc
encrypted_data: 10BVoNb/plkvkrzVdybPgFFII5GThZ3Op9LNkwVeKpA=
iv: uIqKHZ9skJlN2gpJoml6rQ==
version: 1
Decrypt
An encrypted data bag item is decrypted with a knife command similar to:

knife data bag show --secret-file /tmp/my_data_bag_key passwords mysql

that will return JSON output similar to:

{
"id": "mysql",
"pass": "thesecret123",
"user": "fred"
}

Edit a Data Bag Item


A data bag can be edited in two ways: using knife or by using the Chef management
console.

Edit a Data Bag with Knife


Use the edit argument to edit the data contained in a data bag. If encryption is being used,
the data bag will be decrypted, the data will be made available in the $EDITOR, and then
encrypted again before saving it to the Chef Infra Server.

To edit an item named “charlie” that is contained in a data bag named “admins”, enter:

knife data bag edit admins charlie

to open the $EDITOR. Once opened, you can update the data before saving it to the Chef
Infra Server. For example, by changing:

{
"id": "charlie"
}

to:

{
"id": "charlie",
"uid": 1005,
"gid": "ops",
"shell": "/bin/zsh",
"comment": "Crazy Charlie"
}

Use Data Bags


Data bags can be accessed in the following ways:

Search
Data bags store global variables as JSON data. Data bags are indexed for searching and
can be loaded by a cookbook or accessed during a search.

Any search for a data bag (or a data bag item) must specify the name of the data bag and
then provide the search query string that will be used during the search. For example, to
use knife to search within a data bag named “admin_data” across all items, except for the
“admin_users” item, enter the following:

knife search admin_data "(NOT id:admin_users)"

Or, to include the same search query in a recipe, use a code block similar to:

search(:admin_data, 'NOT id:admin_users')

It may not be possible to know which data bag items will be needed. It may be necessary to
load everything in a data bag (but not know what “everything” is). Using a search query is
the ideal way to deal with that ambiguity, yet still ensure that all of the required data is
returned. The following examples show how a recipe can use a series of search queries to
search within a data bag named “admins”. For example, to find every administrator:

search(:admins, '*:*')

Or to search for an administrator named “charlie”:

search(:admins, 'id:charlie')

Or to search for an administrator with a group identifier of “ops”:

search(:admins, 'gid:ops')

Or to search for an administrator whose name begins with the letter “c”:

search(:admins, 'id:c*')

Data bag items that are returned by a search query can be used as if they were a hash. For
example:
charlie = search(:admins, 'id:charlie').first
# => variable 'charlie' is set to the charlie data bag item
charlie['gid']
# => "ops"
charlie['shell']
# => "/bin/zsh"

The following recipe can be used to create a user for each administrator by loading all of
the items from the “admins” data bag, looping through each admin in the data bag, and then
creating a user resource so that each of those admins exist:

admins = data_bag('admins')

admins.each do |login|
admin = data_bag_item('admins', login)
home = "/home/#{login}"

user(login) do
uid admin['uid']
gid admin['gid']
shell admin['shell']
comment admin['comment']
home home
manage_home true
end
end

And then the same recipe, modified to load administrators using a search query (and using
an array to store the results of the search query):

admins = []

search(:admins, '*:*').each do |admin|


login = admin['id']

admins << login

home = "/home/#{login}"

user(login) do
uid admin['uid']
gid admin['gid']
shell admin['shell']
comment admin['comment']

home home
manage_home true
end
end
Environments
Values that are stored in a data bag are global to the organization and are available to any
environment. There are two main strategies that can be used to store shared environment
data within a data bag: by using a top-level key that corresponds to the environment or by
using separate items for each environment.

A data bag that is storing a top-level key for an environment might look something like this:

{
"id": "some_data_bag_item",
"production" : {
# Hash with all your data here
},
"testing" : {
# Hash with all your data here
}
}

When using the data bag in a recipe, that data can be accessed from a recipe using code
similar to:

# assuming the item above lives in a data bag named 'some_data_bag'
data_bag_item('some_data_bag', 'some_data_bag_item')[node.chef_environment]['some_other_key']

The other approach is to use separate items for each environment. Depending on the
amount of data, it may all fit nicely within a single item. If this is the case, then creating
different items for each environment may be a simple approach to providing shared
environment values within a data bag. However, this approach is more time-consuming and
may not scale to large environments or when the data must be stored in many data bag
items.
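
A minimal sketch of the separate-items approach (the data bag and key names below are made up for illustration): each environment gets its own item, and the recipe loads the item whose id matches node.chef_environment.

# Data bag 'app_config' with one item per environment, for example:
#   app_config/production.json => { "id": "production", "db_host": "db1.prod.example.com" }
#   app_config/testing.json    => { "id": "testing",    "db_host": "db1.test.example.com" }

# In a recipe, load the item named after the node's environment:
config = data_bag_item('app_config', node.chef_environment)
db_host = config['db_host']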

Recipes
Data bags can be accessed by a recipe in the following ways:

Loaded by name when using the Chef Infra Language. Use this approach when only a
single, known data bag item is required.
Accessed through the search indexes. Use this approach when more than one data
bag item is required or when the contents of a data bag are looped through. The
search indexes will bulk-load all of the data bag items, which will result in a lower
overhead than if each data bag item were loaded by name.
Load with Chef Infra Language
The Chef Infra Language provides access to data bags and data bag items (including
encrypted data bag items) with the following methods:

data_bag(bag), where bag is the name of the data bag.

data_bag_item('bag_name', 'item', 'secret'), where 'bag_name' is the name of the data bag
and 'item' is the name of the data bag item. If 'secret' is not specified, Chef Infra
Client will look for a secret at the path specified by
the encrypted_data_bag_secret setting in the client.rb file.

The data_bag method returns an array with a key for each of the data bag items that are
found in the data bag.

Some examples:

To load the secret from a file:

data_bag_item('bag', 'item', IO.read('secret_file'))

To load the data bag named admins:

data_bag('admins')

The contents of a data bag item named justin :

data_bag_item('admins', 'justin')

will return something similar to:

# => {'comment'=>'Justin Currie', 'gid'=>1005, 'id'=>'justin', 'uid'=>1005,
'shell'=>'/bin/zsh'}

If item is encrypted, data_bag_item will automatically decrypt it using the key specified
above, or (if none is specified) by the Chef::Config[:encrypted_data_bag_secret] method,
which defaults to /etc/chef/encrypted_data_bag_secret .

Create and Edit


Creating and editing the contents of a data bag or a data bag item from a recipe is not
recommended. The recommended method of updating a data bag or a data bag item is to
use knife and the knife data bag subcommand. If this action must be done from a recipe,
please note the following:

If two operations concurrently attempt to update the contents of a data bag, the
last-written attempt will be the operation to update the contents of the data bag.
This situation can lead to data loss, so organizations should take steps to ensure that
only one Chef Infra Client is making updates to a data bag at a time.
Altering data bags from the node when using the open source Chef Infra Server
requires the node’s API client to be granted admin privileges. In most cases, this is
not advisable.

In either case, take steps to ensure that any subsequent actions are done carefully. The following
examples show how a recipe can be used to create and edit the contents of a data bag or a
data bag item using the Chef::DataBag and Chef::DataBagItem objects.

To create a data bag from a recipe:

users = Chef::DataBag.new
users.name('users')
users.create

To create a data bag item from a recipe:

sam = {
'id' => 'sam',
'Full Name' => 'Sammy',
'shell' => '/bin/zsh',
}
databag_item = Chef::DataBagItem.new
databag_item.data_bag('users')
databag_item.raw_data = sam
databag_item.save

To edit the contents of a data bag item from a recipe:

sam = data_bag_item('users', 'sam')
sam['Full Name'] = 'Samantha'
sam.save
Create Users
Chef Infra Client can create users on systems based on the contents of a data bag. For
example, a data bag named “admins” can contain a data bag item for each of the
administrators that will manage the various systems that each Chef Infra Client is
maintaining. A recipe can load the data bag items and then create user accounts on the
target system with code similar to the following:

# Load the keys of the items in the 'admins' data bag
admins = data_bag('admins')

admins.each do |login|
# This causes a round-trip to the server for each admin in the data bag
admin = data_bag_item('admins', login)
homedir = "/home/#{login}" # double quotes so that #{login} is interpolated

# for each admin in the data bag, make a user resource


# to ensure they exist
user(login) do
uid admin['uid']
gid admin['gid']
shell admin['shell']
comment admin['comment']
home homedir
manage_home true
end
end

# Create an "admins" group on the system


# You might use this group in the /etc/sudoers file
# to provide sudo access to the admins
group 'admins' do
gid '999'
members admins # the array of admin item names loaded from the data bag
end
chef-solo

chef-solo can load data from a data bag as long as the contents of that data bag are
accessible from a directory structure that exists on the same machine as chef-solo. The
location of this directory is configurable using the data_bag_path option in the solo.rb file.
The name of each sub-directory corresponds to a data bag and each JSON file within a
sub-directory corresponds to a data bag item. Search is not available in recipes when they
are run with chef-solo; use the data_bag() and data_bag_item() functions to access data
bags and data bag items.
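
As a rough sketch (the paths and names below are examples only), the chef-solo setup described above might look like this:

# solo.rb
data_bag_path '/var/chef/data_bags'

# Directory layout: one sub-directory per data bag, one JSON file per item, e.g.
#   /var/chef/data_bags/admins/charlie.json
#
# In a recipe run by chef-solo, load the item directly by name:
charlie = data_bag_item('admins', 'charlie')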

CHAPTER 4 : Build tool- Maven
Maven Installation
Apache Maven is a build-automation tool designed to provide a comprehensive and easy-to-use way
of developing Java applications. It uses a POM (Project Object Model) approach to create a
standardized development environment for multiple teams.

In this tutorial, we will show you how to install Apache Maven on a system running
Windows.

Prerequisites

A system running Windows.


A working Internet connection.
Access to an account with administrator privileges.
Access to the command prompt.
A copy of Java installed and ready to use, with the JAVA_HOME environment variable set up (learn how to set up the JAVA_HOME environment variable in our guide to installing Java on Windows).

How to Install Maven on Windows
Follow the steps outlined below to install Apache Maven on Windows.

Step 1: Download Maven Zip File and Extract


1. Visit the Maven download page and download the version of Maven you want to install.
The Files section contains the archives of the latest version. Access earlier versions using the
archives link in the Previous Releases section.

2. Click on the appropriate link to download the binary zip archive of the latest version of
Maven. As of the time of writing this tutorial, that is version 3.8.4.

3. Since there is no installation process, extract the Maven archive to a directory of your choice
once the download is complete. For this tutorial, we are using C:\Program
Files\Maven\apache-maven-3.8.4.

Step 2: Add MAVEN_HOME System Variable


1. Open the Start menu and search for environment variables.

2. Click the Edit the system environment variables result.


3. Under the Advanced tab in the System Properties window, click Environment Variables.

4. Click the New button under the System variables section to add a new system environment variable.

5. Enter MAVEN_HOME as the variable name and the path to the Maven directory as the variable
value. Click OK to save the new system variable.
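
If you prefer the command line, the same variable can be set from an elevated Command Prompt; the path below assumes the extraction directory used in Step 1:

:: Run in an elevated Command Prompt; the /M switch writes a system-wide (machine) variable
setx /M MAVEN_HOME "C:\Program Files\Maven\apache-maven-3.8.4"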

Step 3: Add MAVEN_HOME Directory in PATH Variable
1. Select the Path variable under the System variables section in the Environment Variables window. Click the Edit button to edit the variable.


2. Click the New button in the Edit environment variable window.

3. Enter %MAVEN_HOME%\bin in the new field. Click OK to save changes to the Path variable.

Note: Not adding the path to the Maven home directory to the Path variable
causes the 'mvn' is not recognized as an internal or external
command, operable program or batch file error when using the mvn
command.

4. Click OK in the Environment Variables window to save the changes to the system variables.


Step 4: Verify Maven Installation


In the command prompt, use the following command to verify the installation by checking the
current version of Maven:

mvn -version

Conclusion
After reading this tutorial, you should have a copy of Maven installed and ready to use on your
Windows system.

Maven Build requirements


What we learnt in the Project Creation chapter is how to create a Java application using Maven.
Now we'll see how to build and test the application.
Go to the C:\MVN directory where you've created your Java application.
Open the consumerBanking folder. You will see the pom.xml file with the following contents.
Update it to reflect the current Java version.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<properties>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
</dependency>
</dependencies>
</project>
Here you can see that Maven has already added JUnit as the test framework. By default, Maven adds a
source file App.java and a test file AppTest.java in its default directory structure, as
discussed in the previous chapter.
Let's open the command console, go to the C:\MVN\consumerBanking directory and execute
the following mvn command.
C:\MVN\consumerBanking>mvn clean package
Maven will start building the project.
C:\MVN\consumerBanking>mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.bank:consumerBanking >----------------
[INFO] Building consumerBanking 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ consumerBanking ---
[INFO] Deleting C:\MVN\consumerBanking\target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ consumerBanking ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform
dependent!
[INFO] skip non existing resourceDirectory C:\MVN\consumerBanking\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ consumerBanking ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding Cp1252, i.e. build is platform
dependent!
[INFO] Compiling 1 source file to C:\MVN\consumerBanking\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ consumerBanking ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform
dependent!
[INFO] skip non existing resourceDirectory C:\MVN\consumerBanking\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ consumerBanking ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding Cp1252, i.e. build is platform
dependent!
[INFO] Compiling 1 source file to C:\MVN\consumerBanking\target\test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ consumerBanking ---
[INFO] Surefire report directory: C:\MVN\consumerBanking\target\surefire-reports

-------------------------------------------------------
TESTS
-------------------------------------------------------
Running com.companyname.bank.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ consumerBanking ---
[INFO] Building jar: C:\MVN\consumerBanking\target\consumerBanking-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.663 s
[INFO] Finished at: 2021-12-13T17:34:27+05:30
[INFO] ------------------------------------------------------------------------

C:\MVN\consumerBanking>
You've built your project and created the final jar file. Following are the key learning concepts:
We give maven two goals, first to clean the target directory (clean) and then package
the project build output as jar (package).
Packaged jar is available in consumerBanking\target folder as consumerBanking-1.0-
SNAPSHOT.jar.
Test reports are available in consumerBanking\target\surefire-reports folder.
Maven compiles the source code file(s) and then the test source code file(s).
Then Maven runs the test cases.
Finally, Maven creates the package.
Now open the command console, go to the C:\MVN\consumerBanking\target\classes directory
and execute the following java command.
>java com.companyname.bank.App
You will see the result as follows −
Hello World!
Adding Java Source Files
Let's see how we can add additional Java files in our project. Open
C:\MVN\consumerBanking\src\main\java\com\companyname\bank folder, create Util class in
it as Util.java.
package com.companyname.bank;

public class Util {


public static void printMessage(String message){
System.out.println(message);
}
}
Update the App class to use Util class.
package com.companyname.bank;

/**
* Hello world!
*
*/

public class App {


public static void main( String[] args ){
Util.printMessage("Hello World!");
}
}
Now open the command console, go to the C:\MVN\consumerBanking directory and execute
the following mvn command.
>mvn clean compile
After the Maven build is successful, go to the C:\MVN\consumerBanking\target\classes directory
and execute the following java command.
>java com.companyname.bank.App
You will see the result as follows −
Hello World!

Maven POM Builds (pom.xml)
POM is an acronym for Project Object Model. The pom.xml file contains information of
project and configuration information for the maven to build the project such as
dependencies, build directory, source directory, test source directory, plugin, goals etc.
Maven reads the pom.xml file, then executes the goal.
Before Maven 2, this file was named project.xml. Since Maven 2 (and also in Maven 3), it is
named pom.xml.

Elements of maven pom.xml file


For creating a simple pom.xml file, you need to have the following elements:
project - The root element of the pom.xml file.
modelVersion - A sub element of project. It specifies the model version; it should be set to 4.0.0.
groupId - A sub element of project. It specifies the id for the project group.
artifactId - A sub element of project. It specifies the id for the artifact (project). An artifact is something that is either produced or used by a project. Examples of artifacts produced by Maven for a project include JARs, source and binary distributions, and WARs.
version - A sub element of project. It specifies the version of the artifact under the given group.
File: pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javatpoint.application1</groupId>
  <artifactId>my-app</artifactId>
  <version>1</version>

</project>

Maven pom.xml file with additional elements
Here, we are going to add other elements to the pom.xml file, such as:
packaging - defines the packaging type, such as jar, war, etc.
name - defines the name of the maven project.
url - defines the url of the project.
dependencies - defines the dependencies for this project.
dependency - defines a dependency; it is used inside dependencies.
scope - defines the scope for this maven project. It can be compile, provided, runtime, test or system.
File: pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <groupId>com.javatpoint.application1</groupId>
  <artifactId>my-application1</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>Maven Quick Start Archetype</name>
  <url>http://maven.apache.org</url>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

</project>

POM stands for Project Object Model. This file contains the project information and the
configuration details Maven needs to build the project, such as dependencies, source directory,
build directory, plugins, goals, etc. Maven reads the pom.xml file and executes the desired goal.
In versions older than Maven 2 this file was named project.xml; since Maven 2 (and in Maven 3) it is
named pom.xml. The pom.xml also stores additional information such as the project version, mailing
lists, and description. When Maven executes goals and tasks, it searches for pom.xml in the current
directory, reads the configuration from the POM file, and executes the desired goal. The POM is the
fundamental unit of work in Maven.
The Super POM is Maven's default POM; it contains default values for most projects.
Basic Structure of POM.XML
The basic structure of pom.xml is given below:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.educba.examples</groupId>
<artifactId>example4</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Maven Archetype</name>
<url>http://maven.apache.org</url>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>

Basic Key Elements of pom.xml
The basic key elements that pom.xml contains are given below:
project - The root element of every pom.xml file; it is the top-level element.
modelVersion - Sub element of the project tag. Indicates the version of the object model that the current pom.xml is using. The model version changes very infrequently, but it is mandatory to ensure stability.
groupId - Sub element of the project tag; indicates the unique identifier of the group under which the project is created. Typically a fully qualified domain name.
artifactId - Indicates the unique base name of the primary artifact generated by a particular project.
version - Specifies the version for the artifact under the given group.
packaging - Indicates the package type to be used by this artifact, i.e. JAR, WAR, EAR.
name - Defines the name of the project; often used in Maven-generated documentation.
url - Defines the URL of the project.
description - Provides a basic description of the project.
dependencies - Defines the dependencies of the project.
dependency - Sub element of dependencies; used to declare a specific dependency.
scope - Defines the scope of a dependency (compile, provided, runtime, test, or system).
[Figure: graphical representation of the use of pom.xml in Maven]


To look at the default configuration of the pom.xml file, execute the following command:
open a command console, go to the directory containing pom.xml, and run the command there.
In our example, pom.xml exists in D:\Maven\Maven_projects.
D:\Maven\Maven_projects>mvn help:effective-pom
help:effective-pom prints the effective POM as XML for the build process.
This command produces the effective POM as output.
2. Minimal POM
Some of the important requirements for minimal POM files are given below.
Project root i.e. <project>
Model version i.e. <modelVersion>
Group ID i.e. <groupId>
Artifact Id i.e. <artifactId>
Version i.e. <version>
Consider the following POM file.
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>com.educba.example</groupId>
<artifactId>example4</artifactId>
<version>1</version>
</project>

In a POM file, groupId, artifactId, and version must be configured. These three values
form the fully qualified artifact name of the project. If configuration details are not
specified, Maven uses the default configuration specified in the Super POM file. For example, if
the packaging type is not specified, the default packaging type 'jar' is used.
Note: On project inheritance (with respect to POM files).
Maven allows two kinds of POM files: parent POM and child POM files. When inheriting from a parent
POM, the child POM file inherits (or merges) the following properties, which are also
contained in the parent POM file:
Plugin configuration.
Dependencies.
Resources.
Plugin lists.
Plugin execution ids.
3. Parent POM
The Super POM described above is one example of a parent POM file. Inheritance
is achieved through the parent POM file.
For parent and child POMs, Maven checks two things:
A POM file in the project root directory.
A reference from the child POM file that contains the same coordinates stated in the parent
POM.
Note: For an example of a Maven parent POM, please refer to the Super POM.
An important reason to use a parent POM file is to have a central place to store information
about artifacts, compiler settings, etc., and these are shared in all modules. A sketch of such a parent POM follows.
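
As a minimal sketch (the coordinates simply reuse the example values from above), a parent POM that centralizes compiler settings and a shared test dependency might look like this:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.educba.examples</groupId>
  <artifactId>example4</artifactId>
  <version>1</version>
  <!-- A parent POM is normally packaged as "pom" -->
  <packaging>pom</packaging>

  <properties>
    <!-- Compiler settings inherited by all child modules -->
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- Shared dependency inherited by all child modules -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>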
4. Child POM
A child POM file refers to the parent POM file using the <parent> tag. The groupId, artifactId, and version
attributes are compulsory in the child POM file. The child POM file inherits all dependencies and
properties from the parent POM file; additionally, it also inherits subproject dependencies.
Consider the following pom file.
<project>
<parent>
<groupId>com.educba.examples</groupId>
<artifactId>example4</artifactId>
<version>1</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<groupId>com.educba.examples</groupId>
<artifactId>module_1</artifactId>
<version>1</version>
</project>
Conclusion
One of the most important benefits Maven provides is handling project relationships,
inheritance, and dependencies. Dependency management used to be a complicated mess for
complex projects; Maven solves these problems through dependency management and repositories.
The most important feature of the POM file is its dependency list, to which a programmer can
add a new dependency easily and quickly. Using the POM hierarchy, duplication can be avoided;
POM inheritance saves time and reduces the complexity of multiple dependency declarations.
Quick project set-up is achieved through POM files, with no build.xml as in other tools. In some
cases pom.xml becomes very large for complex projects; in large projects it is sometimes difficult
to maintain the jars in a repository when several versions of jar files are in use.

Maven Build Life Cycle
What is Build Lifecycle?
A Build Lifecycle is a well-defined sequence of phases, which define the order in which the
goals are to be executed. Here phase represents a stage in life cycle. As an example, a
typical Maven Build Lifecycle consists of the following sequence of phases.
prepare-resources (resource copying): Resource copying can be customized in this phase.
validate (validating the information): Validates if the project is correct and if all necessary information is available.
compile (compilation): Source code compilation is done in this phase.
test (testing): Tests the compiled source code using a suitable testing framework.
package (packaging): This phase creates the JAR/WAR package as mentioned in the packaging element in pom.xml.
install (installation): This phase installs the package in the local/remote maven repository.
deploy (deploying): Copies the final package to the remote repository.
There are always pre and post phases to register goals, which must run prior to, or after a
particular phase.
When Maven starts building a project, it steps through a defined sequence of phases and
executes goals, which are registered with each phase.
Maven has the following three standard lifecycles −

clean
default(or build)
site
A goal represents a specific task which contributes to the building and managing of a
project. It may be bound to zero or more build phases. A goal not bound to any build phase
could be executed outside of the build lifecycle by direct invocation.
The order of execution depends on the order in which the goal(s) and the build phase(s) are
invoked. For example, consider the command below. The clean and package arguments
are build phases while the dependency:copy-dependencies is a goal.
mvn clean dependency:copy-dependencies package
Here the clean phase will be executed first, followed by the dependency:copy-
dependencies goal, and finally package phase will be executed.

Clean Lifecycle
When we execute mvn post-clean command, Maven invokes the clean lifecycle consisting of
the following phases.

pre-clean
clean
post-clean

Maven clean goal (clean:clean) is bound to the clean phase in the clean lifecycle.
Its clean:clean goal deletes the output of a build by deleting the build directory. Thus,
when the mvn clean command executes, Maven deletes the build directory.
We can customize this behavior by mentioning goals in any of the above phases of clean life
cycle.
In the following example, We'll attach maven-antrun-plugin:run goal to the pre-clean, clean,
and post-clean phases. This will allow us to echo text messages displaying the phases of the
clean lifecycle.
We've created a pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.pre-clean</id>
<phase>pre-clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>pre-clean phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.clean</id>
<phase>clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>clean phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.post-clean</id>
<phase>post-clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>post-clean phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Now open command console, go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn post-clean
Maven will start processing and displaying all the phases of clean life cycle.

C:\MVN>mvn post-clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.pre-clean) @ project ---
[INFO] Executing tasks
[echo] pre-clean phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ project ---
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.clean) @ project ---
[INFO] Executing tasks
[echo] clean phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.post-clean) @ project ---
[INFO] Executing tasks
[echo] post-clean phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.740 s
[INFO] Finished at: 2021-12-10T20:03:53+05:30
[INFO] ------------------------------------------------------------------------

C:\MVN>

You can try running the mvn clean command, which will display pre-clean and clean. Nothing will
be executed for the post-clean phase.

Default (or Build) Lifecycle


This is the primary life cycle of Maven and is used to build the application. It has the following
21 phases.
Sr.No. Lifecycle Phase & Description
1 validate
Validates whether project is correct and all necessary information is available to
complete the build process.
2 initialize
Initializes build state, for example set properties.
3 generate-sources
Generate any source code to be included in compilation phase.
4 process-sources
Process the source code, for example, filter any value.
5 generate-resources
Generate resources to be included in the package.
6 process-resources
Copy and process the resources into the destination directory, ready for packaging
phase.
7 compile
Compile the source code of the project.
8 process-classes
Post-process the generated files from compilation, for example to do bytecode
enhancement/optimization on Java classes.
9 generate-test-sources
Generate any test source code to be included in compilation phase.
10 process-test-sources
Process the test source code, for example, filter any values.
11 test-compile
Compile the test source code into the test destination directory.
12 process-test-classes
Process the generated files from test code file compilation.
13 test
Run tests using a suitable unit testing framework (Junit is one).
14 prepare-package
Perform any operations necessary to prepare a package before the actual
packaging.
15 package

Take the compiled code and package it in its distributable format, such as a JAR,
WAR, or EAR file.
16 pre-integration-test
Perform actions required before integration tests are executed. For example, setting
up the required environment.
17 integration-test
Process and deploy the package if necessary into an environment where
integration tests can be run.
18 post-integration-test
Perform actions required after integration tests have been executed. For example,
cleaning up the environment.
19 verify
Run any check-ups to verify the package is valid and meets quality criteria.
20 install
Install the package into the local repository, which can be used as a dependency in
other projects locally.
21 deploy
Copies the final package to the remote repository for sharing with other developers
and projects.

There are a few important concepts related to Maven lifecycles which are worth mentioning (a short illustration follows this list):
When a phase is called via Maven command, for example mvn compile, only phases
up to and including that phase will execute.
Different maven goals will be bound to different phases of Maven lifecycle depending
upon the type of packaging (JAR / WAR / EAR).
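
For instance, a quick illustration of the first point above (this simply invokes a standard phase of the default lifecycle; nothing project-specific is assumed):

mvn package

This runs every default lifecycle phase up to and including package (validate, compile, test, and the phases in between), but does not run install or deploy.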
In the following example, we will attach maven-antrun-plugin:run goal to few of the phases of
Build lifecycle. This will allow us to echo text messages displaying the phases of the lifecycle.
We've updated pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.validate</id>
<phase>validate</phase>

<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>validate phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.compile</id>
<phase>compile</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>compile phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.test</id>
<phase>test</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>test phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.package</id>
<phase>package</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>package phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.deploy</id>
<phase>deploy</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>deploy phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Now open command console, go the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn compile
Maven will start processing and display phases of build life cycle up to the compile phase.

C:\MVN>mvn compile
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.validate) @ project ---
[INFO] Executing tasks
[echo] validate phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ project ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform
dependent!
[INFO] skip non existing resourceDirectory C:\MVN\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ project ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.compile) @ project ---
[INFO] Executing tasks
[echo] compile phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.033 s
[INFO] Finished at: 2021-12-10T20:05:46+05:30
[INFO] ------------------------------------------------------------------------

C:\MVN>
Site Lifecycle
Maven Site plugin is generally used to create fresh documentation to create reports, deploy
site, etc. It has the following phases −

pre-site
site
post-site
site-deploy
In the following example, we will attach maven-antrun-plugin:run goal to all the phases of
Site lifecycle. This will allow us to echo text messages displaying the phases of the lifecycle.
We've updated pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-site-plugin</artifactId>
<version>3.7</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>2.9</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.pre-site</id>
<phase>pre-site</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>pre-site phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.site</id>
<phase>site</phase>

<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>site phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.post-site</id>
<phase>post-site</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>post-site phase</echo>
</tasks>
</configuration>
</execution>

<execution>
<id>id.site-deploy</id>
<phase>site-deploy</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>site-deploy phase</echo>
</tasks>
</configuration>
</execution>

</executions>
</plugin>
</plugins>
</build>
</project>
Now open the command console, go the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn site
Maven will start processing and displaying the phases of site life cycle up to site phase.
C:\MVN>mvn site
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (id.pre-site) @ project ---
[INFO] Executing tasks
[WARNING] [echo] pre-site phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-site-plugin:3.7:site (default-site) @ project ---
[WARNING] Input file encoding has not been set, using platform encoding Cp1252, i.e. build is platform
dependent!
[WARNING] No project URL defined - decoration links will not be relativized!
[INFO] Rendering site with org.apache.maven.skins:maven-default-skin:jar:1.2 skin.
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (id.site) @ project ---
[INFO] Executing tasks
[WARNING] [echo] site phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.323 s
[INFO] Finished at: 2021-12-10T20:22:31+05:30
[INFO] ------------------------------------------------------------------------

C:\MVN>
Maven Local Repository (.m2)
Introduction to Repositories
Artifact Repositories
A repository in Maven holds build artifacts and dependencies of varying types.
There are exactly two types of repositories: local and remote:
1. the local repository is a directory on the computer where Maven runs. It caches remote downloads
and contains temporary build artifacts that you have not yet released.
2. remote repositories refer to any other type of repository, accessed by a variety of protocols such
as file:// and https:// . These repositories might be a truly remote repository set up by a
third party to provide their artifacts for downloading (for example, repo.maven.apache.org). Other
"remote" repositories may be internal repositories set up on a file or HTTP server within your
company, used to share private artifacts between development teams and for releases.
Local and remote repositories are structured the same way so that scripts can run on either side, or
they can be synced for offline use. The layout of the repositories is completely transparent to the
Maven user, however.
Using Repositories
In general, you should not need to do anything with the local repository on a regular basis, except
clean it out if you are short on disk space (or erase it completely if you are willing to download
everything again).
For the remote repositories, they are used for both downloading and uploading (if you have the
permission to do so).
Downloading from a Remote Repository

Downloading in Maven is triggered by a project declaring a dependency that is not present in the
local repository (or for a SNAPSHOT , when the remote repository contains one that is newer). By
default, Maven will download from the central repository.

To override this, you need to specify a mirror as shown in Using Mirrors for Repositories.

You can set this in your settings.xml file to globally use a certain mirror. However, it is
common for a project to customise the repository in its pom.xml and that your setting will take
precedence. If dependencies are not being found, check that you have not overridden the remote
repository.
For more information on dependencies, see Dependency Mechanism.
Using Mirrors for the Central Repository
There are several official Central repositories geographically distributed. You can make changes to
your settings.xml file to use one or more mirrors. Instructions for this can be found in the
guide Using Mirrors for Repositories.
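As a sketch of such a settings.xml entry (the id, name, and URL below are placeholders for the mirror you actually want to use):

<settings>
  <mirrors>
    <mirror>
      <id>my-mirror</id>
      <name>Example mirror of Maven Central</name>
      <url>https://my-mirror.example.com/maven2</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>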
Building Offline
If you are temporarily disconnected from the internet and you need to build your projects offline,
you can use the offline switch on the CLI:
mvn -o package
Many plugins honor the offline setting and do not perform any operations that connect to the
internet. Some examples are resolving Javadoc links and link checking the site.
Uploading to a Remote Repository
While this is possible for any type of remote repository, you must have the permission to do so. To
have someone upload to the Central Maven repository, see Repository Center.
Internal Repositories
When using Maven, particularly in a corporate environment, connecting to the internet to download
dependencies is not acceptable for security, speed or bandwidth reasons. For that reason, it is
desirable to set up an internal repository to house a copy of artifacts, and to publish private artifacts
to.
Such an internal repository can be downloaded using HTTP or the file system (with
a file:// URL), and uploaded to using SCP, FTP, or a file copy.
As far as Maven is concerned, there is nothing special about this repository: it is another remote
repository that contains artifacts to download to a user's local cache, and is a publish destination for
artifact releases.
Additionally, you may want to share the repository server with your generated project sites. For
more information on creating and deploying sites, see Creating a Site.
Setting up the Internal Repository
To set up an internal repository just requires that you have a place to put it, and then copy required
artifacts there using the same layout as in a remote repository such as repo.maven.apache.org.

It is not recommended that you scrape or rsync:// a full copy of central as there is a large
amount of data there and doing so will get you banned. You can use a program such as those

described on the Repository Management page to run your internal repository's server, download
from the internet as required, and then hold the artifacts in your internal repository for faster
downloading later.
The other options available are to manually download and vet releases, then copy them to the
internal repository, or to have Maven download them for a user, and manually upload the vetted
artifacts to the internal repository which is used for releases. This step is the only one available for
artifacts where the license forbids their distribution automatically, such as several J2EE JARs
provided by Sun. Refer to the Guide to coping with SUN JARs document for more information.
It should be noted that Maven intends to include enhanced support for such features in the future,
including click through licenses on downloading, and verification of signatures.

Using the Internal Repository


Using the internal repository is quite simple. Simply make a change to add
a repositories element:

<project>
  ...
  <repositories>
    <repository>
      <id>my-internal-site</id>
      <url>https://myserver/repo</url>
    </repository>
  </repositories>
  ...
</project>

If your internal repository requires authentication, the id element can be used in your settings file
to specify login information.
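A rough sketch of that settings.xml entry, reusing the my-internal-site id from the example above (the credentials are placeholders):

<settings>
  <servers>
    <server>
      <!-- Must match the <id> of the repository declared in the POM -->
      <id>my-internal-site</id>
      <username>deployer</username>
      <password>placeholder-password</password>
    </server>
  </servers>
</settings>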
Deploying to the Internal Repository
One of the most important reasons to have one or more internal repositories is to be able to publish
your own private releases.
To publish to the repository, you will need to have access via one of SCP, SFTP, FTP, WebDAV, or
the filesystem. Connectivity is accomplished with the various wagons. Some wagons may need to
be added as extension to your build.
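
For illustration only (the URL and id are placeholders, and an SCP wagon extension may be required as noted above), publishing is typically configured with a distributionManagement section in the POM and then triggered with mvn deploy:

<project>
  ...
  <distributionManagement>
    <repository>
      <id>my-internal-site</id>
      <url>scp://myserver/path/to/repo</url>
    </repository>
  </distributionManagement>
  ...
</project>

mvn deploy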

Maven's local repository is a directory on the local machine that stores all the project
artifacts.
When we execute a Maven build, Maven automatically downloads all the dependency jars
into the local repository. Usually, this directory is named .m2.
Here's where the default local repository is located based on OS:
Windows: C:\Users\<User_Name>\.m2
Linux: /home/<User_Name>/.m2
Mac: /Users/<user_name>/.m2
And for Linux and Mac, we can write in the short form:
~/.m2
3. Custom Local Repository in settings.xml
If the repo isn't present in this default location, it's likely because of some pre-existing
configuration.
That config file is located in the Maven installation directory in a folder called conf, with the
name settings.xml.
Here's the relevant configuration that determines the location of our missing local repo:
<settings>
<localRepository>C:/maven_repository</localRepository>
...
This is essentially how we can change the location of the local repo. Of course, if we change
that location, we'll no longer find the repo at the default location.
The files stored in the earlier location won't be moved automatically.
Passing Local Repository Location via Command Line
Apart from setting the custom local repository in Maven's settings.xml, the mvn command
supports the maven.repo.local property, which allows us to pass the local repository location
as a command-line parameter:
mvn -Dmaven.repo.local=/my/local/repository/path clean install
In this way, we don't have to change Maven's settings.xml.
By default, the Maven local repository is the ${user.home}/.m2/repository folder:
1. Unix/Mac OS X – ~/.m2/repository
2. Windows – C:\Users\{your-username}\.m2\repository
When we compile a Maven project, Maven will download all of the project's dependency and
plugin jars into the local repository, saving time on the next compilation.
1. Find Maven Local Repository
1.1 If the default .m2 directory cannot be found, someone may have changed the default path. Issue the
following command to find out where the Maven local repository is:

mvn help:evaluate -Dexpression=settings.localRepository


1.2 Example :

Terminal

D:\> mvn help:evaluate -Dexpression=settings.localRepository

[INFO] Scanning for projects...


[INFO]

[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-help-plugin:3.1.0:evaluate (default-cli) @ standalone-pom ---
[INFO] No artifact parameter specified, using 'org.apache.maven:standalone-pom:pom:1' as project.
[INFO]

C:\opt\maven-repository

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.598 s
[INFO] Finished at: 2018-10-24T16:44:18+08:00
[INFO] ------------------------------------------------------------------------
In the above output, the Maven local repository has been relocated to C:\opt\maven-repository.

2. Update Maven Local Repository


2.1 Find this file {MAVEN_HOME}\conf\settings.xml and update the localRepository.

{MAVEN_HOME}\conf\settings.xml

<settings>
<!-- localRepository
| The path to the local repository maven will use to store artifacts.
|
| Default: ~/.m2/repository
<localRepository>/path/to/local/repo</localRepository>
-->

<localRepository>D:/maven_repo</localRepository>
Note
Issue mvn -version to find out where Maven is installed.
2.2 Save the file. The Maven local repository is now changed to D:/maven_repo.

The local repository is a directory on the computer where Maven runs. When you build a
project, it caches remote downloads to reduce network traffic, and it also contains
temporary build artifacts that you have not yet released.
1. Maven local repository default locations
By default, on all systems, the Maven local repository path
is .m2/repository under the user's home directory.
Unix / Linux – /home/{username}/.m2/repository, which you can also access
as ~/.m2/repository
2. Custom Maven local repository path
2.1. If you want to change the Maven local repository location, first you need to find
the Maven setup directory. You can find it from the command line: on Windows run echo
%MAVEN_HOME%, on Mac/Linux run echo $MAVEN_HOME. Alternatively, you can run the mvn
-version command to get the Maven setup location.
In Windows:
$ echo %MAVEN_HOME%
Z:\D\maven\apache-maven-3.6.3
In Linux or Mac:
$ echo $MAVEN_HOME
/home/admin/Documents/apache-maven-3.6.3
2.2. Now, you will find the conf directory under the setup path. Explore it and you will
find settings.xml.

2.3. Open settings.xml, specify a value for the localRepository property (as shown in the
earlier settings.xml example) and save the file. Your repository location now points to the
specified location.

Note: If the Maven local repository is not present in the default location
under {username}/.m2, look at the localRepository property value in settings.xml;
the default repository location might have been changed to a custom location.
Maven Global Repository
A Maven repository is a directory where all the packages, JAR files, plugins, and other artifacts
are stored along with their POM files. A repository in Maven holds build artifacts and dependencies
of various types. Maven provides three types of repositories.
Types of Repositories
Consider the following to understand the types and where they are stored.

1. Local Repositories
The Maven local repository is located on the local computer. It is created by Maven when
the user runs any Maven command; the default location is the %USER_HOME%/.m2 directory.
When a Maven build is executed, Maven automatically downloads all the dependency JARs into
the local repository. New versions are downloaded automatically; if the version declared
in the dependency tag of the pom.xml file is already cached locally, Maven simply uses it
without downloading it again.
Update and Setting the Maven Local Repository:
To update, find this file {MAVEN_HOME}\conf\settings.xml
And to set, use following code:
Code:
<settings>
<localRepository>/path/to/local/repo/</localRepository>
<interactiveMode>true</interactiveMode>
<offline>false</offline>
</settings>
The default value of the path is ${user.home}/.m2/repository.
interactiveMode is true if you want Maven to interact with the user for input, false if not.
offline is true if the build system should operate in offline mode, false if not.
Advantages
Reduced version conflicts.
Less manual intervention for the first-time build process.
A single central reference repository for all dependent software libraries, rather than
several independent local libraries.
Faster clean builds when using local repositories.
2. Central Repositories
The central repository is located on the web. It is provided by the Apache Maven project itself and
contains a large number of commonly used libraries. It is not necessary to configure the
Maven central repository URL, but internet access is required to search and download from the Maven
central repository. When Maven cannot find a dependency JAR in the local repository, it
starts searching the Maven central repository using the URL: http://repo1.maven.org/maven2/.


To override the default location, make changes in the settings.xml file to use one or more mirrors.
No special configuration is required to access the central repository, except when the system
is behind a firewall, in which case you need to change the proxy settings.
To set up a Maven proxy setting, follow the steps below:
Navigate to the path – {M2_HOME}/conf/settings.xml
Open the XML file in edit mode in any text editor.
Open and update the <proxy> section, as sketched below.
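A hedged sketch of what the <proxy> entry in settings.xml can look like (the host, port, and credentials below are placeholders, not real values):

<settings>
   <proxies>
      <proxy>
         <id>corporate-proxy</id>
         <active>true</active>
         <protocol>http</protocol>
         <host>proxy.example.com</host>
         <port>8080</port>
         <username>proxyuser</username>
         <password>proxypass</password>
         <nonProxyHosts>localhost|*.internal.example.com</nonProxyHosts>
      </proxy>
   </proxies>
</settings>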
3. Remote Repository
A remote repository is stored on the organization's internal network or server. The company maintains this
repository outside the developers' machines, and it is called a remote repository.
The following pom.xml declares the remote repository URLs and dependencies.
<project>
<dependencies>
<dependency>
<groupId>com.educba.lib</groupId>
<artifactId>library</artifactId>
<version>1.0.0</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>edu.lib_1</id>

<url> http:// (Organization URL)/maven2/lib_1</url>
</repository>
<repository>
<id>edu.lib_2</id>
<url>http:// (Organization URL)/maven2/lib_2</url>
</repository>
</repositories>
</project>
Adding Remote Repository
Not every library is stored in the Maven Central Repository; some libraries are available in
the Java.net or JBoss repositories.
1. Java.net Repository
<repositories>
<repository>
<id>java-net-repo</id>
<url>https://maven.java.net/content/repositories/public/</url>
</repository>
</repositories>
2. JBoss Repository
<repositories>
<repository>
<id>jboss-repo</id>
<url>http://repository.jboss.org/nexus/content/groups/public/</url>
</repository>
</repositories>
3. Spring Repository
<repositories>
<repository>
<id>spring-repo</id>
<url>https://repo.spring.io/release</url>
</repository>
</repositories>
Advantages
Artifact team sharing.
Effective separation of artifacts still under development from those in the release phase.
Centralized library management provides security: each client speaks with a single
global repository, avoiding the risk of different members of the team using different libraries.
The remote repository allows keeping the nature of third-party libraries used in projects
under control, thus avoiding the introduction of elements not compliant with company
policies.
Repository Manager
A repository manager is a dedicated server application designed to manage repositories. The
usage of a repository manager is considered an essential best practice for any significant
usage of Maven.
A repository manager acts as a proxy server for public Maven
repositories.
It allows repositories to be used as a destination for Maven project outputs.
Advantages
A repository manager reduces the complexity and time of downloading from remote
repositories, and so increases build performance.
By reducing reliance on external repositories, a repository manager increases build
stability.
By caching interactions with remote SNAPSHOT repositories, a repository manager
increases performance.
A repository manager controls the provided and consumed artifacts.
A repository manager acts as central storage and provides access to artifacts and
metadata.
A repository manager acts as a platform for sharing or exchanging binary artifacts.
Building artifacts from scratch is not required.
Available Repository Managers
The following are the open-source and commercial repository managers known to
support the repository format used by Maven.
Apache Archiva
CloudRepo
Cloudsmith Package
JFrog Artifactory Open Source
JFrog Artifactory Pro
MyGet
Sonatype Nexus OSS
Sonatype Nexus Pro
packagecloud.io

Maven Dependency Checking Process
First, Maven scans the local repository for all configured dependencies.
If a dependency is found, it continues with further execution.
If a dependency is absent from the local repository, Maven scans the central repository for
that particular dependency.
Dependencies that are found are downloaded into the local repository for future executions of
the project.
If dependencies are found in neither the local repository nor the central repository, Maven
starts scanning the remote repositories.
If dependencies are not available in any of the three repositories (local repository,
central repository, remote repository), Maven throws an error ("not able to find the
dependencies") and stops processing.
All found dependencies are downloaded into the local repository.
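For instance, declaring a dependency such as the following in pom.xml triggers exactly this lookup sequence (the Apache Commons Lang coordinates are used here purely as a familiar illustration):

<dependencies>
   <dependency>
      <!-- resolved from the local repository if cached, otherwise fetched from central -->
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.12.0</version>
   </dependency>
</dependencies>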
Note: The primary type of binary component stored in a repository is a JAR file containing Java
byte code, but there is no limit to the type of content that can be stored; users can easily
deploy any libraries to Maven repositories. When Maven downloads a component such as a
dependency, it also downloads that component's POM.

Conclusion – Maven Repository


Maven repositories permit an artifact's Javadoc to be distributed alongside the artifact's JAR and
consumed by the integrated development environment. Maven simplifies builds when code has
dependencies outside the source control folders. Maven's dependency handling is systematically
organized in a coordinated manner for identifying artifacts (software libraries or modules); for
example, a POM references the JUnit coordinates as a direct dependency. Consider another example:
if the Hibernate library is declared in the pom.xml file, Maven automatically downloads the
dependencies that the Hibernate dependency requires and stores them automatically in the local
repository. In an organization, a project developed on a single machine can be shared with
other machines through the repository.

Maven Central Repository


This documentation is for those that need to use or contribute to the
Maven central repository. This includes those that need dependencies for their own
build or projects that wish to have their releases added to the Maven central repository,
even if they don't use Maven as their build tool.
Discontinuing support for TLSv1.1 and below as of June 15th 2018 and Discontinuing
support for HTTP as of January 15th 2020
 Maintaining your Metadata - Information for third-party projects
 Guide to uploading artifacts - How to get things uploaded to the central repository
 Fixing Central Metadata - How to fix issues in content already uploaded


Group ID, Artifact ID, Snapshot


Maven
Maven is a widely popular open-source tool used to build, publish, and deploy various
projects. This build automation tool is used mainly with Java projects, but it can also be
used to develop and manage projects written in languages such as Ruby and C#.
What is GroupID?
Like names are to human beings, the GroupID is to projects developed using Maven. The
Maven GroupID is the ID of the entire project group. A GroupID is a unique entity among all the
projects.
The GroupID follows the naming convention of a Java package name, which starts with a
reversed domain name. Maven does not strictly enforce this rule: multiple legacy projects do not
follow the pattern and instead end up using single-word GroupIDs.

It is, however, rather challenging to get a new single-word GroupID approved for inclusion in the
Maven Central Repository.
To make things even more modular, Maven GroupID allows us to create multiple sub-groups
to make things easier. We can determine the granularity of the GroupID by using the Project
structure.
Project Configuration in Maven is done using the Project Object Model, represented by the
file pom.xml. The POM describes the dependencies managed by the Project and proves
helpful in the plugin configuration for software building.
The POM file also keeps a record of relationships between multi-module projects.

Maven GroupID Naming
When working with Maven, note that we do not have to pick names for the class files; each class
file automatically takes its name through the 1:1 mapping from its Java source file.
Maven only asks us to pick two names; therefore, it is relatively simple. Thus, to define the
Maven GroupID name, the steps below have to be followed:
Step 1: Create a Template for the Project in the Spring Initializer. The figure below shows a
template of the Maven GroupID naming project.
Enter the details as follows:
Group Name: com.Group_ID
Artifact: Maven_Group_ID
Name: Maven_Group_ID
Packaging: JAR
Java Version 8

Step 2: After creating the template, extract the template file and open the same in VS Code
or any other Editor or IDE with Spring Boot Functionalities.
The pom.xml File in the Project should be as follows.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>

</project>

In the example below, we define naming conventions as:


<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>

<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group_ID</name>
<description>Demo project for Spring Boot</description>

We can also modify the name/Group ID of the Project by changing the contents in the name
tag as:
<name>Maven_Group_ID</name>

The pom.xml file looks like below:


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>

<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group</name>
<description>Demo project for Spring Boot</description>

<properties>
<java.version>1.8</java.version>
</properties>

<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<dependencies>
<dependency>
<groupId> maven_group </groupId>
<artifactId> RestrictionRule </artifactId>
<version> 0.0.1 </version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>

</project>
Restricting the GroupID
Maven allows restricting the GroupIDs a project may use in order to enforce conventions on the
source code. Here is how a single-group restriction rule on the GroupID can be referenced in Maven:
<dependency>
<groupId> Maven_Group_ID </groupId>
<artifactId> RestrictionRule </artifactId>
<version> 0.0.1 </version>
</dependency>
How is GroupID different from ArtifactID?
The below table briefly discusses the differences between the Maven GroupID and the
ArtifactID.
Sr. No.   Group ID                                               Artifact ID
1         Projects are identified uniquely using the GroupID     The name of the JAR file without the version is the Artifact ID
2         A Group ID can have various versions                   An Artifact ID has no separate versions

Frequently Asked Questions
What is GroupID in Maven?
GroupID is a unique identifying feature of all the projects currently being worked on in
Maven. It can be generated automatically, or the user can define the GroupID themselves.
What is POM?
POM is the Project Object Model, the fundamental unit of work in Maven. It is an XML file
containing information about the configuration details and the Project that Maven uses for
building the projects.
What is the use of Maven GroupID?
GroupID is the unique identifying feature of all the Projects that starts from a reversed
Domain. Maven does not enforce this convention, as there are multiple projects where the
pattern is broken.
What is ArtifactID?
The Artifact ID is the name of the .JAR file without the version. If we are creating the JAR file, we
can name the ArtifactID however we like.
What is Spring Boot?
Spring Boot is a Java Based Open Source Framework that is used to create a microservice.
It is also used to build Standalone Production Ready Spring Applications.
Differences between Group ID, Artifact ID
artifactId is the name of the JAR without its version. If you created it, then you can choose
whatever name you want, with lowercase letters and no strange symbols. If it's a third-party JAR,
you have to take the name of the JAR as it is distributed, e.g. maven, commons-math.
groupId identifies your project uniquely across all projects, so we need to enforce a naming
schema. It has to follow the package-name rules, which means it has to be at least a domain
name you control, and you can create as many subgroups as you want (see the Java guidance on
package names for more information), e.g. org.apache.maven, org.apache.commons.

The main difference between groupId and artifactId in Maven is that the groupId specifies the id
of the project group while the artifactId specifies the id of the project.

It is required to use third party libraries when developing a project. The programmer can
download and add these third-party libraries to the project, but it is difficult to update them later.
Maven provides a solution to this issue. It helps to include all the dependencies required for the
project. Moreover, the programmer can specify the required dependencies in the POM.XML file. It
has the configuration information to build the project. Furthermore, this file consists of several
XML elements, and two of them are groupId and artifactId. example groupId : com.test.java
(similar to package name) artifactId : javaproject(project or module name)

What is SNAPSHOT?
SNAPSHOT is a special version that indicates a current development copy. Unlike regular
versions, Maven checks for a new SNAPSHOT version in a remote repository for every build.
For example, a data-service team will release a SNAPSHOT of its updated code to the repository every
time, say data-service:1.0-SNAPSHOT, replacing an older SNAPSHOT jar.

Snapshot vs Version
In the case of a regular version, once Maven has downloaded the mentioned version, say data-service:1.0,
it will never try to download a newer 1.0 available in the repository. To download the updated
code, the data-service version has to be upgraded to 1.1.
In the case of a SNAPSHOT, Maven will automatically fetch the latest SNAPSHOT (data-
service:1.0-SNAPSHOT) every time the app-ui team builds their project.

app-ui pom.xml
app-ui project is using 1.0-SNAPSHOT of data-service.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>app-ui</groupId>
<artifactId>app-ui</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>health</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>data-service</groupId>
<artifactId>data-service</artifactId>
<version>1.0-SNAPSHOT</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>

data-service pom.xml
data-service project is releasing 1.0-SNAPSHOT for every minor change.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>data-service</groupId>
<artifactId>data-service</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>health</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
</project>
Although, in the case of a SNAPSHOT, Maven automatically fetches the latest SNAPSHOT on a
daily basis, you can force Maven to download the latest snapshot build by adding the -U switch to any
Maven command.
mvn clean package -U
Let's open the command console, go to the C:\ > MVN > app-ui directory and execute the
following mvn command.
C:\MVN\app-ui>mvn clean package -U
Maven will start building the project after downloading the latest SNAPSHOT of data-service.
[INFO] Scanning for projects...
[INFO]--------------------------------------------
[INFO] Building consumerBanking
[INFO] task-segment: [clean, package]
[INFO]--------------------------------------------
[INFO] Downloading data-service:1.0-SNAPSHOT
[INFO] 290K downloaded.
[INFO] [clean:clean {execution: default-clean}]
[INFO] Deleting directory C:\MVN\app-ui\target
[INFO] [resources:resources {execution: default-resources}]

[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources,


i.e. build is platform dependent!

[INFO] skip non existing resourceDirectory C:\MVN\app-ui\src\main\resources


[INFO] [compiler:compile {execution:default-compile}]
[INFO] Compiling 1 source file to C:\MVN\app-ui\target\classes
[INFO] [resources:testResources {execution: default-testResources}]

[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources,


i.e. build is platform dependent!

[INFO] skip non existing resourceDirectory C:\MVN\app-ui\src\test\resources


[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 1 source file to C:\MVN\app-ui\target\test-classes
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: C:\MVN\app-ui\target\
surefire-reports

--------------------------------------------------
TESTS
--------------------------------------------------

Running com.companyname.bank.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar {execution: default-jar}]


[INFO] Building jar: C:\MVN\app-ui\target\
app-ui-1.0-SNAPSHOT.jar
[INFO]--------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO]--------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: 2015-09-27T12:30:02+05:30
[INFO] Final Memory: 16M/89M
[INFO]---------------------------------------------------------------
Maven Dependencies
Dependency management is a core feature of Maven. Managing dependencies for a single project is
easy. Managing dependencies for multi-module projects and applications that consist of hundreds of
modules is possible. Maven helps a great deal in defining, creating, and maintaining reproducible builds
with well-defined classpaths and library versions.
Learn more about:

 Transitive Dependencies
 Excluded/Optional Dependencies
 Dependency Scope
 Dependency Management
 Importing Dependencies
 Bill of Materials (BOM) POMs
 System Dependencies

Transitive Dependencies
Maven avoids the need to discover and specify the libraries that your own dependencies require by
including transitive dependencies automatically.
This feature is facilitated by reading the project files of your dependencies from the remote
repositories specified. In general, all dependencies of those projects are used in your project, as are
any that the project inherits from its parents, or from its dependencies, and so on.
There is no limit to the number of levels that dependencies can be gathered from. A problem arises
only if a cyclic dependency is discovered.
With transitive dependencies, the graph of included libraries can quickly grow quite large. For this
reason, there are additional features that limit which dependencies are included:

 Dependency mediation - this determines what version of an artifact will be chosen when multiple
versions are encountered as dependencies. Maven picks the "nearest definition". That is, it uses the
version of the closest dependency to your project in the tree of dependencies. You can always
guarantee a version by declaring it explicitly in your project's POM. Note that if two dependency
versions are at the same depth in the dependency tree, the first declaration wins.
 "nearest definition" means that the version used will be the closest one to your project in the
tree of dependencies. Consider this tree of dependencies:
A
├── B
│   └── C
│       └── D 2.0
└── E
    └── D 1.0

In text form, if the dependencies for A, B, and C are defined as A -> B -> C -> D 2.0 and A -> E -> D 1.0,
then D 1.0 will be used when building A because the path from A to D through E is shorter.
You could explicitly add a dependency to D 2.0 in A to force the use of D 2.0, as shown here:
A

├── B

│ └── C

│ └── D 2.0

├── E

│ └── D 1.0

└── D 2.0

 Dependency management - this allows project authors to directly specify the versions of artifacts
to be used when they are encountered in transitive dependencies or in dependencies where no
version has been specified. In the example in the preceding section a dependency was directly
added to A even though it is not directly used by A. Instead, A can include D as a dependency in
its dependencyManagement section and directly control which version of D is used when, or if, it
is ever referenced.
 Dependency scope - this allows you to only include dependencies appropriate for the current stage
of the build. This is described in more detail below.
 Excluded dependencies - If project X depends on project Y, and project Y depends on project Z,
the owner of project X can explicitly exclude project Z as a dependency, using the "exclusion"
element.
 Optional dependencies - If project Y depends on project Z, the owner of project Y can mark
project Z as an optional dependency, using the "optional" element. When project X depends on
project Y, X will depend only on Y and not on Y's optional dependency Z. The owner of project X
may then explicitly add a dependency on Z, at her option. (It may be helpful to think of optional
dependencies as "excluded by default.")

Although transitive dependencies can implicitly include desired dependencies, it is a good practice
to explicitly specify the dependencies your source code uses directly. This best practice proves its
value especially when the dependencies of your project change their dependencies.
For example, assume that your project A specifies a dependency on another project B, and project B
specifies a dependency on project C. If you are directly using components in project C, and you
don't specify project C in your project A, it may cause build failure when project B suddenly
updates/removes its dependency on project C.
Another reason to directly specify dependencies is that it provides better documentation for your
project: one can learn more information by just reading the POM file in your project, or by
executing mvn dependency:tree.
Maven also provides dependency:analyze plugin goal for analyzing the dependencies: it helps
making this best practice more achievable.
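Both goals can be run from the project directory, for example as shown below; the exact report contents depend on your own POM:

mvn dependency:tree      # prints the resolved dependency tree, including transitive dependencies
mvn dependency:analyze   # reports used-but-undeclared and declared-but-unused dependencies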


Dependency Scope
Dependency scope is used to limit the transitivity of a dependency and to determine when a
dependency is included in a classpath.
There are 6 scopes:

 compile
This is the default scope, used if none is specified. Compile dependencies are available in all
classpaths of a project. Furthermore, those dependencies are propagated to dependent projects.
 provided
This is much like compile , but indicates you expect the JDK or a container to provide the
dependency at runtime. For example, when building a web application for the Java Enterprise
Edition, you would set the dependency on the Servlet API and related Java EE APIs to
scope provided because the web container provides those classes. A dependency with this scope is
added to the classpath used for compilation and test, but not the runtime classpath. It is not
transitive.
 runtime
This scope indicates that the dependency is not required for compilation, but is for execution.
Maven includes a dependency with this scope in the runtime and test classpaths, but not the
compile classpath.
 test
This scope indicates that the dependency is not required for normal use of the application, and is
only available for the test compilation and execution phases. This scope is not transitive. Typically
this scope is used for test libraries such as JUnit and Mockito. It is also used for non-test libraries
such as Apache Commons IO if those libraries are used in unit tests (src/test/java) but not in the
model code (src/main/java).
 system
This scope is similar to provided except that you have to provide the JAR which contains it
explicitly. The artifact is always available and is not looked up in a repository.
 import
This scope is only supported on a dependency of type pom in the <dependencyManagement> section.
It indicates the dependency is to be replaced with the effective list of dependencies in the specified
POM's <dependencyManagement> section. Since they are replaced, dependencies with a scope
of import do not actually participate in limiting the transitivity of a dependency.
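As a brief illustration of how scopes are declared (the Servlet API and JUnit coordinates below are common public artifacts, used here only as examples):

<dependencies>
   <!-- provided: the web container supplies this at runtime -->
   <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>javax.servlet-api</artifactId>
      <version>3.1.0</version>
      <scope>provided</scope>
   </dependency>
   <!-- test: only on the test compilation and execution classpaths -->
   <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
      <scope>test</scope>
   </dependency>
</dependencies>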

Each of the scopes (except for import ) affects transitive dependencies in different ways, as is
demonstrated in the table below. If a dependency is set to the scope in the left column, a transitive
dependency of that dependency with the scope across the top row results in a dependency in the
main project with the scope listed at the intersection. If no scope is listed, it means the dependency
is omitted.
              compile      provided     runtime      test
compile       compile(*)   -            runtime      -
provided      provided     -            provided     -
runtime       runtime      -            runtime      -
test          test         -            test         -

(*) Note: it is intended that this should be runtime scope instead, so that all compile dependencies
must be explicitly listed. However, if a library you depend on extends a class from another library,
both must be available at compile time. For this reason, compile time dependencies remain as
compile scope even when they are transitive.

Dependency Management
The dependency management section is a mechanism for centralizing dependency information.
When you have a set of projects that inherit from a common parent, it's possible to put all
information about the dependency in the common POM and have simpler references to the artifacts
in the child POMs. The mechanism is best illustrated through some examples. Given these two
POMs which extend the same parent:
Project A:

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>group-a</groupId>
6. <artifactId>artifact-a</artifactId>
7. <version>1.0</version>
8. <exclusions>
9. <exclusion>
10. <groupId>group-c</groupId>
11. <artifactId>excluded-artifact</artifactId>
12. </exclusion>
13. </exclusions>
14. </dependency>
15. <dependency>
16. <groupId>group-a</groupId>
17. <artifactId>artifact-b</artifactId>
18. <version>1.0</version>
19. <type>bar</type>
20. <scope>runtime</scope>
21. </dependency>
22. </dependencies>
23. </project>

Project B:

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>group-c</groupId>
6. <artifactId>artifact-b</artifactId>
7. <version>1.0</version>
8. <type>war</type>

9. <scope>runtime</scope>
10. </dependency>
11. <dependency>
12. <groupId>group-a</groupId>
13. <artifactId>artifact-b</artifactId>
14. <version>1.0</version>
15. <type>bar</type>
16. <scope>runtime</scope>
17. </dependency>
18. </dependencies>
19. </project>

These two example POMs share a common dependency and each has one non-trivial dependency.
This information can be put in the parent POM like this:

1. <project>
2. ...
3. <dependencyManagement>
4. <dependencies>
5. <dependency>
6. <groupId>group-a</groupId>
7. <artifactId>artifact-a</artifactId>
8. <version>1.0</version>
9.
10. <exclusions>
11. <exclusion>
12. <groupId>group-c</groupId>
13. <artifactId>excluded-artifact</artifactId>
14. </exclusion>
15. </exclusions>
16.
17. </dependency>
18.
19. <dependency>
20. <groupId>group-c</groupId>
21. <artifactId>artifact-b</artifactId>
22. <version>1.0</version>
23. <type>war</type>
24. <scope>runtime</scope>
25. </dependency>
26.
27. <dependency>
28. <groupId>group-a</groupId>
29. <artifactId>artifact-b</artifactId>
30. <version>1.0</version>
31. <type>bar</type>
32. <scope>runtime</scope>
33. </dependency>
34. </dependencies>
35. </dependencyManagement>
36. </project>

Then the two child POMs become much simpler:

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>group-a</groupId>
6. <artifactId>artifact-a</artifactId>
7. </dependency>
8.
9. <dependency>
10. <groupId>group-a</groupId>
11. <artifactId>artifact-b</artifactId>
12. <!-- This is not a jar dependency, so we must specify type. -->
13. <type>bar</type>
14. </dependency>
15. </dependencies>
16. </project>

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>group-c</groupId>
6. <artifactId>artifact-b</artifactId>
7. <!-- This is not a jar dependency, so we must specify type. -->
8. <type>war</type>
9. </dependency>
10.
11. <dependency>
12. <groupId>group-a</groupId>
13. <artifactId>artifact-b</artifactId>
14. <!-- This is not a jar dependency, so we must specify type. -->
15. <type>bar</type>
16. </dependency>
17. </dependencies>
18. </project>

NOTE: In two of these dependency references, we had to specify the <type/> element. This is
because the minimal set of information for matching a dependency reference against a
dependencyManagement section is actually {groupId, artifactId, type, classifier}. In many cases,
these dependencies will refer to jar artifacts with no classifier. This allows us to shorthand the
identity set to {groupId, artifactId}, since the default for the type field is jar , and the default
classifier is null.
A second, and very important use of the dependency management section is to control the versions
of artifacts used in transitive dependencies. As an example consider these projects:
Project A:

1. <project>
2. <modelVersion>4.0.0</modelVersion>
3. <groupId>maven</groupId>
4. <artifactId>A</artifactId>
5. <packaging>pom</packaging>
6. <name>A</name>
7. <version>1.0</version>
8. <dependencyManagement>
9. <dependencies>
10. <dependency>
11. <groupId>test</groupId>
12. <artifactId>a</artifactId>
13. <version>1.2</version>
14. </dependency>
15. <dependency>
16. <groupId>test</groupId>
17. <artifactId>b</artifactId>
18. <version>1.0</version>
19. <scope>compile</scope>
20. </dependency>
21. <dependency>
22. <groupId>test</groupId>
23. <artifactId>c</artifactId>
24. <version>1.0</version>
25. <scope>compile</scope>
26. </dependency>
27. <dependency>
28. <groupId>test</groupId>
29. <artifactId>d</artifactId>
30. <version>1.2</version>
31. </dependency>
32. </dependencies>
33. </dependencyManagement>
34. </project>

Project B:

1. <project>
2. <parent>
3. <artifactId>A</artifactId>
4. <groupId>maven</groupId>
5. <version>1.0</version>
6. </parent>

7. <modelVersion>4.0.0</modelVersion>
8. <groupId>maven</groupId>
9. <artifactId>B</artifactId>
10. <packaging>pom</packaging>
11. <name>B</name>
12. <version>1.0</version>
13.
14. <dependencyManagement>
15. <dependencies>
16. <dependency>
17. <groupId>test</groupId>
18. <artifactId>d</artifactId>
19. <version>1.0</version>
20. </dependency>
21. </dependencies>
22. </dependencyManagement>
23.
24. <dependencies>
25. <dependency>
26. <groupId>test</groupId>
27. <artifactId>a</artifactId>
28. <version>1.0</version>
29. <scope>runtime</scope>
30. </dependency>
31. <dependency>
32. <groupId>test</groupId>
33. <artifactId>c</artifactId>
34. <scope>runtime</scope>
35. </dependency>
36. </dependencies>
37. </project>

When maven is run on project B, version 1.0 of artifacts a, b, c, and d will be used regardless of the
version specified in their POM.

 a and c both are declared as dependencies of the project so version 1.0 is used due to dependency
mediation. Both also have runtime scope since it is directly specified.
 b is defined in B's parent's dependency management section and since dependency management
takes precedence over dependency mediation for transitive dependencies, version 1.0 will be
selected should it be referenced in a or c's POM. b will also have compile scope.
 Finally, since d is specified in B's dependency management section, should d be a dependency (or
transitive dependency) of a or c, version 1.0 will be chosen - again because dependency
management takes precedence over dependency mediation and also because the current POM's
declaration takes precedence over its parent's declaration.

The reference information about the dependency management tags is available from the project
descriptor reference.


Importing Dependencies
The examples in the previous section describe how to specify managed dependencies through
inheritance. However, in larger projects it may be impossible to accomplish this since a project can
only inherit from a single parent. To accommodate this, projects can import managed dependencies
from other projects. This is accomplished by declaring a POM artifact as a dependency with a scope
of "import".
Project B:

1. <project>
2. <modelVersion>4.0.0</modelVersion>
3. <groupId>maven</groupId>
4. <artifactId>B</artifactId>
5. <packaging>pom</packaging>
6. <name>B</name>
7. <version>1.0</version>
8.
9. <dependencyManagement>
10. <dependencies>
11. <dependency>
12. <groupId>maven</groupId>
13. <artifactId>A</artifactId>
14. <version>1.0</version>
15. <type>pom</type>
16. <scope>import</scope>
17. </dependency>
18. <dependency>
19. <groupId>test</groupId>
20. <artifactId>d</artifactId>
21. <version>1.0</version>
22. </dependency>
23. </dependencies>
24. </dependencyManagement>
25.
26. <dependencies>
27. <dependency>
28. <groupId>test</groupId>
29. <artifactId>a</artifactId>
30. <version>1.0</version>
31. <scope>runtime</scope>
32. </dependency>
33. <dependency>
34. <groupId>test</groupId>
35. <artifactId>c</artifactId>
36. <scope>runtime</scope>
37. </dependency>
38. </dependencies>

39. </project>

Assuming A is the POM defined in the preceding example, the end result would be the same. All of
A's managed dependencies would be incorporated into B except for d since it is defined in this
POM.
Project X:

1. <project>
2. <modelVersion>4.0.0</modelVersion>
3. <groupId>maven</groupId>
4. <artifactId>X</artifactId>
5. <packaging>pom</packaging>
6. <name>X</name>
7. <version>1.0</version>
8.
9. <dependencyManagement>
10. <dependencies>
11. <dependency>
12. <groupId>test</groupId>
13. <artifactId>a</artifactId>
14. <version>1.1</version>
15. </dependency>
16. <dependency>
17. <groupId>test</groupId>
18. <artifactId>b</artifactId>
19. <version>1.0</version>
20. <scope>compile</scope>
21. </dependency>
22. </dependencies>
23. </dependencyManagement>
24. </project>

Project Y:

1. <project>
2. <modelVersion>4.0.0</modelVersion>
3. <groupId>maven</groupId>
4. <artifactId>Y</artifactId>
5. <packaging>pom</packaging>
6. <name>Y</name>
7. <version>1.0</version>
8.
9. <dependencyManagement>
10. <dependencies>
11. <dependency>
12. <groupId>test</groupId>
13. <artifactId>a</artifactId>

14. <version>1.2</version>
15. </dependency>
16. <dependency>
17. <groupId>test</groupId>
18. <artifactId>c</artifactId>
19. <version>1.0</version>
20. <scope>compile</scope>
21. </dependency>
22. </dependencies>
23. </dependencyManagement>
24. </project>

Project Z:

1. <project>
2. <modelVersion>4.0.0</modelVersion>
3. <groupId>maven</groupId>
4. <artifactId>Z</artifactId>
5. <packaging>pom</packaging>
6. <name>Z</name>
7. <version>1.0</version>
8.
9. <dependencyManagement>
10. <dependencies>
11. <dependency>
12. <groupId>maven</groupId>
13. <artifactId>X</artifactId>
14. <version>1.0</version>
15. <type>pom</type>
16. <scope>import</scope>
17. </dependency>
18. <dependency>
19. <groupId>maven</groupId>
20. <artifactId>Y</artifactId>
21. <version>1.0</version>
22. <type>pom</type>
23. <scope>import</scope>
24. </dependency>
25. </dependencies>
26. </dependencyManagement>
27. </project>

In the example above Z imports the managed dependencies from both X and Y. However, both X
and Y contain dependency a. Here, version 1.1 of a would be used since X is declared first and a is
not declared in Z's dependencyManagement.
This process is recursive. For example, if X imports another POM, Q, when Z is processed it will
simply appear that all of Q's managed dependencies are defined in X.


Bill of Materials (BOM) POMs


Imports are most effective when used for defining a "library" of related artifacts that are generally
part of a multiproject build. It is fairly common for one project to use one or more artifacts from
these libraries. However, it has sometimes been difficult to keep the versions in the project using the
artifacts in synch with the versions distributed in the library. The pattern below illustrates how a
"bill of materials" (BOM) can be created for use by other projects.
The root of the project is the BOM POM. It defines the versions of all the artifacts that will be
created in the library. Other projects that wish to use the library should import this POM into the
dependencyManagement section of their POM.

1. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/X


MLSchema-instance"
2. xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/
maven-4.0.0.xsd">
3. <modelVersion>4.0.0</modelVersion>
4. <groupId>com.test</groupId>
5. <artifactId>bom</artifactId>
6. <version>1.0.0</version>
7. <packaging>pom</packaging>
8. <properties>
9. <project1Version>1.0.0</project1Version>
10. <project2Version>1.0.0</project2Version>
11. </properties>
12.
13. <dependencyManagement>
14. <dependencies>
15. <dependency>
16. <groupId>com.test</groupId>
17. <artifactId>project1</artifactId>
18. <version>${project1Version}</version>
19. </dependency>
20. <dependency>
21. <groupId>com.test</groupId>
22. <artifactId>project2</artifactId>
23. <version>${project2Version}</version>
24. </dependency>
25. </dependencies>
26. </dependencyManagement>
27.
28. <modules>
29. <module>parent</module>
30. </modules>
31. </project>

The parent subproject has the BOM POM as its parent. It is a normal multiproject pom.

1. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/X


MLSchema-instance"
2. xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd
/maven-4.0.0.xsd">
3. <modelVersion>4.0.0</modelVersion>
4. <parent>
5. <groupId>com.test</groupId>
6. <version>1.0.0</version>
7. <artifactId>bom</artifactId>
8. </parent>
9.
10. <groupId>com.test</groupId>
11. <artifactId>parent</artifactId>
12. <version>1.0.0</version>
13. <packaging>pom</packaging>
14.
15. <dependencyManagement>
16. <dependencies>
17. <dependency>
18. <groupId>log4j</groupId>
19. <artifactId>log4j</artifactId>
20. <version>1.2.12</version>
21. </dependency>
22. <dependency>
23. <groupId>commons-logging</groupId>
24. <artifactId>commons-logging</artifactId>
25. <version>1.1.1</version>
26. </dependency>
27. </dependencies>
28. </dependencyManagement>
29. <modules>
30. <module>project1</module>
31. <module>project2</module>
32. </modules>
33. </project>

Next are the actual project POMs.

1. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/X


MLSchema-instance"
2. xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd
/maven-4.0.0.xsd">
3. <modelVersion>4.0.0</modelVersion>
4. <parent>
5. <groupId>com.test</groupId>
6. <version>1.0.0</version>
7. <artifactId>parent</artifactId>
8. </parent>
9. <groupId>com.test</groupId>
10. <artifactId>project1</artifactId>
11. <version>${project1Version}</version>
12. <packaging>jar</packaging>
13.
14. <dependencies>
15. <dependency>
16. <groupId>log4j</groupId>
17. <artifactId>log4j</artifactId>
18. </dependency>
19. </dependencies>
20. </project>
21.
22. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/X
MLSchema-instance"
23. xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd
/maven-4.0.0.xsd">
24. <modelVersion>4.0.0</modelVersion>
25. <parent>
26. <groupId>com.test</groupId>
27. <version>1.0.0</version>
28. <artifactId>parent</artifactId>
29. </parent>
30. <groupId>com.test</groupId>
31. <artifactId>project2</artifactId>
32. <version>${project2Version}</version>
33. <packaging>jar</packaging>
34.
35. <dependencies>
36. <dependency>
37. <groupId>commons-logging</groupId>
38. <artifactId>commons-logging</artifactId>
39. </dependency>
40. </dependencies>
41. </project>

The project that follows shows how the library can now be used in another project without having to
specify the dependent project's versions.

1. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/X


MLSchema-instance"
2. xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd
/maven-4.0.0.xsd">
3. <modelVersion>4.0.0</modelVersion>
4. <groupId>com.test</groupId>
5. <artifactId>use</artifactId>
6. <version>1.0.0</version>
7. <packaging>jar</packaging>
8.
9. <dependencyManagement>
10. <dependencies>
11. <dependency>
12. <groupId>com.test</groupId>
13. <artifactId>bom</artifactId>
14. <version>1.0.0</version>

15. <type>pom</type>
16. <scope>import</scope>
17. </dependency>
18. </dependencies>
19. </dependencyManagement>
20. <dependencies>
21. <dependency>
22. <groupId>com.test</groupId>
23. <artifactId>project1</artifactId>
24. </dependency>
25. <dependency>
26. <groupId>com.test</groupId>
27. <artifactId>project2</artifactId>
28. </dependency>
29. </dependencies>
30. </project>

Finally, when creating projects that import dependencies, beware of the following:

 Do not attempt to import a POM that is defined in a submodule of the current POM. Attempting to
do that will result in the build failing since it won't be able to locate the POM.
 Never declare the POM importing a POM as the parent (or grandparent, etc) of the target POM.
There is no way to resolve the circularity and an exception will be thrown.
 When referring to artifacts whose POMs have transitive dependencies, the project needs to specify
versions of those artifacts as managed dependencies. Not doing so results in a build failure since
the artifact may not have a version specified. (This should be considered a best practice in any
case as it keeps the versions of artifacts from changing from one build to the next).

System Dependencies
Important note: This is deprecated.

Dependencies with the scope system are always available and are not looked up in repository. They
are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus,
system dependencies are especially useful for resolving dependencies on artifacts which are now
provided by the JDK, but were available as separate downloads earlier. Typical examples are the
JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).
A simple example would be:

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>javax.sql</groupId>
6. <artifactId>jdbc-stdext</artifactId>
7. <version>2.0</version>
8. <scope>system</scope>
9. <systemPath>${java.home}/lib/rt.jar</systemPath>
10. </dependency>
11. </dependencies>
12. ...
13. </project>

If your artifact is provided by the JDK's tools.jar , the system path would be defined as follows:

1. <project>
2. ...
3. <dependencies>
4. <dependency>
5. <groupId>sun.jdk</groupId>
6. <artifactId>tools</artifactId>
7. <version>1.5.0</version>
8. <scope>system</scope>
9. <systemPath>${java.home}/../lib/tools.jar</systemPath>
10. </dependency>
11. </dependencies>
12. ...
13. </project>

Maven Plugins
Maven is actually a plugin execution framework where every task is done by plugins.
Maven Plugins are generally used to −
create jar file
create war file
compile code files
unit testing of code
create project documentation
create project reports
A plugin generally provides a set of goals, which can be executed using the following syntax

mvn [plugin-name]:[goal-name]
For example, a Java project can be compiled with the maven-compiler-plugin's compile-goal
by running the following command.
mvn compiler:compile
Plugin Types
Maven provides the following two types of plugins −
Sr.No. Type & Description
1 Build plugins
They execute during the build process and should be configured in the <build/>
element of pom.xml.
2 Reporting plugins
They execute during the site generation process and they should be configured in
the <reporting/> element of the pom.xml.
Following is a list of a few common plugins −
Sr.No. Plugin & Description
1 clean
Cleans up target after the build. Deletes the target directory.
2 compiler
Compiles Java source files.
3 surefire
Runs the JUnit unit tests. Creates test reports.
4 jar
Builds a JAR file from the current project.
5 war
Builds a WAR file from the current project.
6 javadoc
Generates Javadoc for the project.
7 antrun
Runs a set of ant tasks from any phase mentioned of the build.
Example
We've used maven-antrun-plugin extensively in our examples to print data on console.
Refer Build Profiles chapter. Let us understand it in a better way and create a pom.xml in
C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.clean</id>
<phase>clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>clean phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Next, open the command console and go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn clean
Maven will start processing and displaying the clean phase of clean life cycle.
C:\MVN>mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ project ---
[INFO] Deleting C:\MVN\target
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.clean) @ project ---
[INFO] Executing tasks
[echo] clean phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.266 s
[INFO] Finished at: 2021-12-13T13:58:10+05:30
[INFO] ------------------------------------------------------------------------

C:\MVN>
The above example illustrates the following key concepts −
Plugins are specified in pom.xml using plugins element.
Each plugin can have multiple goals.
You can define the phase from which the plugin should start its processing using its phase
element. We've used the clean phase.
You can configure tasks to be executed by binding them to goals of the plugin. We've
bound the echo task to the run goal of maven-antrun-plugin.
Maven will then download the plugin if not available in local repository and start its
processing.
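As another small, hedged example of a build plugin configuration, the maven-compiler-plugin can be pinned to a Java level in pom.xml (the plugin version and Java level shown are only illustrative):

<build>
   <plugins>
      <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-compiler-plugin</artifactId>
         <version>3.8.1</version>
         <configuration>
            <!-- compile sources as Java 8 -->
            <source>1.8</source>
            <target>1.8</target>
         </configuration>
      </plugin>
   </plugins>
</build>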

CHAPTER 5 : Docker– Containers & Build tool- Maven
Introduction: What is a Docker, Use case of Docker, Platforms for
Docker, Dockers vs. Virtualization
What is a Docker
Docker is an open source platform that enables developers to build, deploy, run, update and
manage containers—standardized, executable components that combine application source code with
the operating system (OS) libraries and dependencies required to run that code in any environment.
Containers simplify development and delivery of distributed applications. They have become
increasingly popular as organizations shift to cloud-native development and
hybrid multicloud environments. It's possible for developers to create containers without Docker, by
working directly with capabilities built into Linux and other operating systems. But Docker
makes containerization faster, easier and safer. At this writing, Docker reported over 13 million
developers using the platform.
Docker also refers to Docker, Inc., the company that sells the
commercial version of Docker, and to the Docker open source project to which Docker, Inc. and many
other organizations and individuals contribute.
Docker was created in 2013 by Solomon Hykes while working for dotCloud, a cloud hosting
company. It was originally built as an internal tool to make it easier to develop and deploy
applications.
Docker containers are based on Linux containers, which have been around since the early 2000s, but
they weren't widely used until Docker created a simple and easy-to-use platform for running
containers that quickly caught on with developers and system administrators alike.
In March of 2014, Docker open-sourced its technology and became one of the most popular projects
on GitHub, raising millions from investors soon after.
In an incredibly short amount of time, Docker has become one of the most popular tools for
developing and deploying software, and it has been adopted by pretty much everyone in the DevOps
community!
How Does Docker Work?

Docker architecture, by nhumrich


Docker is a technology that allows you to build, run, test, and deploy distributed applications. It uses
operating-system-level virtualization to deliver software in packages called containers.

The way Docker does this is by packaging an application and its dependencies in a virtual container
that can run on any computer. This containerization allows for much better portability and efficiency
when compared to virtual machines.
These containers are isolated from each other and bundle their own tools, libraries, and configuration
files. They can communicate with each other through well-defined channels. All containers are run by
a single operating system kernel, and therefore use few resources.
As mentioned, OS virtualization has been around for a while in the form of Linux
Containers (LXC), Solaris Zones, and FreeBSD jail. However, Docker took this concept further by
providing an easy-to-use platform that automated the deployment of applications in containers.
Here are some of the benefits of Docker containers over traditional virtual machines:
They're portable and can run on any computer that has a Docker runtime environment.
They're isolated from each other and can run different versions of the same software without
affecting each other.
They're extremely lightweight, so they can start up faster and use fewer resources.
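A few basic commands, shown as a hedged illustration, give a feel for working with Docker from the command line (hello-world is a small public test image):

docker --version        # check that the Docker CLI is installed
docker run hello-world  # pull and run a tiny test container
docker ps -a            # list containers, including stopped ones
docker images           # list the images cached locally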
Docker Components and Tools
Docker consists of three major components:
the Docker Engine, a runtime environment for containers
the Docker command line client, used to interact with the Docker Engine
the Docker Hub, a cloud service that provides registry and repository services for Docker
images
In addition to these core components, there‘s also a number of other tools that work with Docker,
including:
Swarm, a clustering and scheduling tool for dockerized applications
Docker Desktop, successor of Docker Machine, and the fastest way to containerize
applications
Docker Compose, a tool for defining and running multi-container Docker applications
Docker Registry, an on-premises registry service for storing and managing Docker images
Kubernetes, a container orchestration tool that can be used with Docker
Rancher, a container management platform for delivering Kubernetes-as-a-Service
There‘s even a number of services supporting the Docker ecosystem:
Amazon Elastic Container Service (Amazon ECS), a managed container orchestration service
from Amazon Web Services
Azure Kubernetes Service (AKS), a managed container orchestration service from Microsoft
Azure
Google Kubernetes Engine (GKE), a fully managed Kubernetes engine that runs in Google
Cloud Platform
Portainer, for deploying, configuring, troubleshooting and securing containers in minutes on
Kubernetes, Docker, Swarm and Nomad in any cloud, data center or device
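A quick way to see the client/engine split in practice (a small sketch, assuming a local Docker installation) is to query each component from the command line:

docker version        # reports the CLI client version and the Docker Engine (server) version
docker info           # summarizes the Engine's state: containers, images, storage driver, and so on
docker search nginx   # queries Docker Hub, the default public registry, from the same client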
Understanding Docker Containers
[Figure: Docker containers vs virtual machines. Source: Wikipedia]
Containers are often compared to virtual machines, but there are some important differences between
the two. Virtual machines run a full copy of an operating system, whereas containers share the host
kernel with other containers. This makes containers much more lightweight and efficient than virtual
machines.
For starters, a container is a self-contained unit of software that includes all the dependencies
required to run an application. This makes it easy to package and ship applications without having to
worry about compatibility issues. Docker containers can be run on any machine that has a Docker
engine installed.
These containers are isolated from one another and bundle their own tools, libraries, and configuration
files, and they can communicate with each other through well-defined channels.
Building a Docker Container with Docker Images
Docker containers are built from images, which are read-only templates with all the dependencies and
configurations required to run an application.
A container, in fact, is a runtime instance of an image — what the image becomes in memory when
actually executed. It runs completely isolated from the host environment by default, only accessing
host files and ports if configured to do so. As such, containers have their own networking, storage, and
process space; and this isolation makes it easy to move containers between hosts without having to
worry about compatibility issues.
Images can be created by either using a Dockerfile (which contains all the necessary instructions
for creating an image) or by using Docker commit, which takes an existing container and creates
an image from it.
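Both paths can be sketched with a few commands; myimage and mycontainer are hypothetical names, and the first command assumes a Dockerfile already exists in the current directory:

docker build -t myimage:1.0 .                      # build an image from the Dockerfile in this directory
docker run -it --name mycontainer alpine:3.19 sh   # start a container, make changes inside, then exit
docker commit mycontainer myimage:manual           # turn the modified container into a new image

In practice the Dockerfile route is preferred, because the image can be rebuilt reproducibly, whereas docker commit only captures a one-off state.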
What’s in a Docker Container?
Docker containers include everything an application needs to run, including:
the code
a runtime
libraries
environment variables
configuration files
A Docker container consists of three main parts:
the Dockerfile, used to build the image.
the image itself, a read-only template with instructions for creating a Docker container
the container, a runnable instance created from an image (you can create, start, stop, move or
delete a container using the Docker API or CLI)
A container shares the kernel with other containers and its host machine. This makes it much more
lightweight than a virtual machine.
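The lifecycle described above maps directly onto CLI commands; a minimal sketch, where web is a hypothetical container name and nginx:alpine is a small public image:

docker create --name web nginx:alpine   # create a container from an image without starting it
docker start web                        # start the container
docker stop web                         # stop it again
docker rm web                           # delete it from the host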
How containers work, and why they're so popular
Containers are made possible by process isolation and virtualization capabilities built into the Linux
kernel. These capabilities—such as control groups (Cgroups) for allocating resources among
processes, and namespaces for restricting a process's access to or visibility into other resources or areas
of the system—enable multiple application components to share the resources of a single instance of
the host operating system in much the same way that a hypervisor enables multiple virtual machines
(VMs) to share the CPU, memory and other resources of a single hardware server.
As a result, container technology offers all the functionality and benefits of VMs—including
application isolation, cost-effective scalability, and disposability—plus important additional
advantages:
Lighter weight: Unlike VMs, containers don‘t carry the payload of an entire OS instance and
hypervisor. They include only the OS processes and dependencies necessary to execute the
code. Container sizes are measured in megabytes (vs. gigabytes for some VMs), make better
use of hardware capacity, and have faster startup times.
Improved developer productivity: Containerized applications can be written once and run
anywhere. And compared to VMs, containers are faster and easier to deploy, provision and
restart. This makes them ideal for use in continuous integration and continuous
delivery (CI/CD) pipelines and a better fit for development teams adopting Agile
and DevOps practices.
Greater resource efficiency: With containers, developers can run several times as many copies
of an application on the same hardware as they can using VMs. This can reduce cloud
spending.
Companies using containers report other benefits including improved app quality, faster response to market changes and much more.
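These kernel features surface as ordinary docker run options; a small sketch (the alpine:3.19 tag is assumed to be available locally or on Docker Hub):

# The flags below become cgroup limits for this one container
docker run --rm --memory=256m --cpus=1.5 alpine:3.19 echo "capped memory and CPU"
# Namespaces mean the container sees only its own process tree, not the host's
docker run --rm alpine:3.19 ps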
Why use Docker?
Docker is so popular today that "Docker" and "containers" are used interchangeably. But the first container-related technologies were available for years, even decades, before Docker was released to the public in 2013.
Most notably, in 2008, Linux Containers (LXC) was implemented in the Linux kernel, fully enabling
virtualization for a single instance of Linux. While LXC is still used today, newer technologies using
the Linux kernel are available. Ubuntu, a modern, open-source Linux operating system, also provides
this capability.
Docker lets developers access these native containerization capabilities using simple commands, and
automate them through a work-saving application programming interface (API). Compared to LXC,
Docker offers:
Improved and seamless container portability: While LXC containers often reference
machine-specific configurations, Docker containers run without modification across any
desktop, data center and cloud environment.
Even lighter weight and more granular updates: With LXC, multiple processes can be combined within a single container, whereas a Docker container typically runs a single process. This makes it possible to build an application that can continue running while one of its parts is taken down for an update or repair.
Automated container creation: Docker can automatically build a container based on
application source code.
Container versioning: Docker can track versions of a container image, roll back to previous
versions, and trace who built a version and how. It can even upload only the deltas between an
existing version and a new one.
Container reuse: Existing containers can be used as base images—essentially like templates
for building new containers.
Shared container libraries: Developers can access an open-source registry containing thousands of user-contributed containers.
Today Docker containerization also works with Microsoft Windows and Apple MacOS. Developers
can run Docker containers on any operating system, and most leading cloud providers, including
Amazon Web Services (AWS), Microsoft Azure, and IBM Cloud offer specific services to help
developers build, deploy and run applications containerized with Docker.
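Several of these capabilities (image reuse, versioning and the shared registry) can be exercised with a few commands; myrepo is a hypothetical registry namespace:

docker pull ubuntu:22.04                 # reuse a shared base image from Docker Hub
docker history ubuntu:22.04              # inspect the layers that make up that image
docker tag ubuntu:22.04 myrepo/base:1.0  # record your own version label for later pushes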
Docker tools and terms
Some of the tools, terms and technologies developers encounter when using Docker include:
Dockerfile
Every Docker container starts with a simple text file containing instructions for how to build the Docker container image. The Dockerfile automates the process of Docker image creation. It‘s essentially a
list of command-line interface (CLI) instructions that Docker Engine will run in order to assemble the
image. The list of Docker commands is huge, but standardized: Docker operations work the same
regardless of contents, infrastructure, or other environment variables.
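A minimal, hypothetical Dockerfile, written here from the shell so the example stays self-contained, might look like the following; hello-app and app.py are illustrative names only:

echo 'print("Hello from a container")' > app.py

cat > Dockerfile <<'EOF'
# Base image layer
FROM python:3.11-slim
# Working directory inside the image
WORKDIR /app
# Copy the application source code into the image
COPY app.py .
# Default command the container will run
CMD ["python", "app.py"]
EOF

docker build -t hello-app:1.0 .   # Docker Engine executes the instructions from top to bottom
docker run --rm hello-app:1.0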
Docker images
Docker images contain executable application source code as well as all the tools, libraries, and
dependencies that the application code needs to run as a container. When you run the Docker image, it
becomes one instance (or multiple instances) of the container.
It‘s possible to build a Docker image from scratch, but most developers pull them down from common
repositories. Multiple Docker images can be created from a single base image, and they‘ll share the
commonalities of their stack.
Docker images are made up of layers, and each layer corresponds to a version of the image. Whenever
a developer makes changes to the image, a new top layer is created, and this top layer replaces the
previous top layer as the current version of the image. Previous layers are saved for rollbacks or to be
re-used in other projects.
Each time a container is created from a Docker image, yet another new layer called the container layer
is created. Changes made to the container—such as the addition or deletion of files—are saved to the
container layer only and exist only while the container is running. This iterative image-creation
process enables increased overall efficiency since multiple live container instances can run from just a
single base image, and when they do so, they leverage a common stack.
Docker containers
Docker containers are the live, running instances of Docker images. While Docker images are read-
only files, containers are live, ephemeral, executable content. Users can interact with them, and
administrators can adjust their settings and conditions using Docker commands.
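Interacting with a live instance happens entirely through the CLI; a short sketch, where websrv is a hypothetical container name:

docker run -d --name websrv nginx:alpine   # start a container in the background
docker ps                                  # list the running instances
docker exec -it websrv sh                  # open a shell inside the running container
docker update --cpus=2 websrv              # adjust a setting on the live container
docker stop websrv && docker rm websrv     # stop and remove it when finished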
Docker Hub
Docker Hub is the public repository of Docker images that calls itself the "world's largest library and community for container images." It holds over 100,000 container images sourced from commercial software vendors, open-source projects, and individual developers.
It includes images that have been produced by Docker, Inc., certified images belonging to the Docker
Trusted Registry, and many thousands of other images.
All Docker Hub users can share their images at will. They can also download predefined base images
from the Docker filesystem to use as a starting point for any containerization project.
Other image repositories exist, as well, notably GitHub. GitHub is a repository hosting service, well
known for application development tools and as a platform that fosters collaboration and
communication. Users of Docker Hub can create a repository (repo) which can hold many images. The
repository can be public or private, and can be linked to GitHub or BitBucket accounts.
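A typical Docker Hub round trip looks roughly like the following; myuser/myrepo is a hypothetical account and repository:

docker login                                # authenticate against Docker Hub
docker pull alpine:3.19                     # download a predefined base image
docker tag alpine:3.19 myuser/myrepo:demo   # re-tag it under your own repository
docker push myuser/myrepo:demo              # share the image through Docker Hub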
Docker Desktop
Docker Desktop is an application for Mac or Windows that includes
Docker Engine, Docker CLI client, Docker Compose, Kubernetes, and others. It also includes access
to Docker Hub.
Docker daemon
Docker daemon is a service that creates and manages Docker images, using the commands from the
client. Essentially Docker daemon serves as the control center of your Docker implementation. The
server on which Docker daemon runs is called the Docker host.
Docker registry
A Docker registry is a scalable, open-source storage and distribution system for Docker images. The registry enables you to track image versions in repositories, using tags for identification, much as a version control tool such as git tracks versions of source code.
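Running a private registry is itself just a container; a sketch using the official registry image, with localhost:5000 as the registry address:

docker run -d -p 5000:5000 --name registry registry:2   # start a local registry
docker tag alpine:3.19 localhost:5000/alpine:3.19       # address the image to that registry
docker push localhost:5000/alpine:3.19                  # versions are tracked by their tags
docker pull localhost:5000/alpine:3.19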
Docker deployment and orchestration
When running just a few containers, it‘s fairly simple to manage an application within Docker
Engine, the industry de facto runtime. But for deployments comprising thousands of containers and
hundreds of services, it‘s nearly impossible to manage the workflow without the help of some
purpose-built tools.
Docker plugins
Docker plugins can be used to make Docker even more functional. A number of Docker plugins are included in the Docker Engine plugin system, and third-party plugins
can be loaded as well.
Docker Compose
Developers can use Docker Compose to manage multi-container applications, where all containers run
on the same Docker host. Docker Compose reads a YAML (.yml) file that specifies which services are included in the application, and it can deploy and run all of those containers with a single command. Because YAML syntax is language-agnostic, YAML files can be used alongside programs written in Java, Python, Ruby and many other languages.
Developers can also use Docker Compose to define persistent volumes for storage, specify base nodes,
and document and configure service dependencies.
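A hypothetical two-service Compose file, written from the shell so the example stays self-contained, could look like this (service names, ports and the password are placeholders, and the Compose plugin is assumed to be installed):

cat > docker-compose.yml <<'EOF'
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
EOF

docker compose up -d   # start every service defined in the file with one command
docker compose down    # stop and remove them again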
Kubernetes
Monitoring and managing container lifecycles in more complex environments requires a container
orchestration tool. While Docker includes its own orchestration tool (called Docker Swarm), most
developers choose Kubernetes instead.
Kubernetes is an open-source container orchestration platform descended from a project developed for
internal use at Google. Kubernetes schedules and automates tasks integral to the management of
container-based architectures, including container deployment, updates, service discovery, storage
provisioning, load balancing, health monitoring, and more. In addition, the open source ecosystem of
tools for Kubernetes—which includes Istio and Knative—enables organizations to deploy a high-
productivity platform-as-a-service (PaaS) for containerized applications and a faster on-ramp
to serverless computing.
How to Run a Container?
Docker containers are portable and can be run on any host with a Docker engine installed (see How to Install Docker on Windows 10 Home).
To run a container, you need to first pull the image from a registry. Then, you can create and start the container using this image.
For example, let‘s say we want to launch an Alpine Linux container. We would first pull the Alpine
Docker image from Docker Hub. To do that, we use the docker pull command, followed by the
name of the repository and tag (version) that we want to download:
docker pull alpine:latest
This particular image is very small — only 5MB in size! After pulling it down to our system
using docker pull, we can verify that it exists locally by running docker images. This should
give us output similar to what‘s shown below:
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
alpine       latest   f70734b6b2f0   3 weeks ago   5MB
Now that we have the image locally, we can launch a container using it. To do so, we use the docker
run command, followed by the name of the image:
docker run alpine
This will give us an error message telling us that we need to specify a command to be executed inside
our container. By default, Docker containers won‘t launch any processes or commands when they‘re
created.
We can provide this command as an argument to docker run, like so:
docker run alpine echo "Hello, World!"
Here, all we‘re doing is running the echo program and passing in "Hello, World!" as its argument. When
you execute this line, you should see output similar to what‘s shown below:
Hello, World!
Great! We‘ve successfully launched our first Docker container. But what if we want to launch a shell
inside an Alpine container? To do so, we can pass in the sh command as input to docker run:
docker run -it alpine sh
The -i flag stands for "interactive" and is used to keep stdin open even if not attached. The -t flag
allocates a pseudo TTY device. Together, these two flags allow us to attach directly to our running
container and give us an interactive shell session:
/ #
From here, we can execute any commands that are available to us in the Alpine Linux distribution. For
example, let‘s try running ls:
/ # ls
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var
If we want to exit this shell, we can simply type exit:
/ # exit
And that‘s it! We‘ve now successfully launched and exited our first Docker container.
Why Would You Use a Container?
There are many good reasons for using containers:
Flexibility. Containers can be run on any platform that supports Docker, whether it‘s a laptop,
server, virtual machine, or cloud instance. This makes it easy to move applications around and
helps DevOps teams achieve consistent environments across development, testing, and
production.
Isolation. Each container runs in its own isolated environment and has its own set of
processes, file systems, and network interfaces. This ensures that one container can‘t interfere
with or access the resources of another container.
Density and Efficiency. Multiple containers can be run on the same host system without
requiring multiple copies of the operating system or extra hardware resources, and containers
are lightweight and require fewer resources than virtual machines, making them more efficient
to run. All this saves precious time and money when deploying applications at scale.
Scalability. Containers can be easily scaled up or down to meet changing demands. This
makes it possible to efficiently utilize resources when demand is high and quickly release them
when demand decreases.
Security. The isolation capabilities of containers help to secure applications from malicious
attacks and accidental leaks. By running each container in its own isolated environment, you
can further minimize the risk of compromise.
Portability. Containers can be easily moved between different hosts, making it easy to
distribute applications across a fleet of servers. This makes it possible to utilize resources
efficiently and helps ensure that applications are always available when needed.
Reproducibility. Containers can be easily replicated to create identical copies of an
environment. This is useful for creating testing and staging environments that match
production, or for distributing applications across a fleet of servers.
Speed. Containers can be started and stopped quickly, making them ideal for applications that
need to be up and running at a moment‘s notice.
Simplicity. The container paradigm is simple and easy to understand, making it easy to get
started with containers.
Ecosystem. The Docker ecosystem includes a wide variety of tools and services that make it
easy to build, ship, and run containers.
Use case of Docker
Docker allows you to instantly create and manage containers with ease, which facilitates faster
deployments. The ability to deploy and scale infrastructure using a simple YAML config file makes it
easy to use all while offering a faster time to market. Security is prioritized with each isolated
container.
Docker provides lightweight virtualization with almost zero overhead. Primarily, you can benefit from
an extra layer of abstraction offered by Docker without having to worry about the overhead. Many
containers can be run on a single machine than with virtualization alone. Containers can be started and
stopped within milliseconds.
In summary, Docker‘s functionality falls into several categories:
Portable deployment of applications
Support for automatic building of docker images
Built-in version tracking
Registry for sharing images
A growing tools ecosystem from the docker API
Consistency among different environments
Efficient utilisation of resources
The feature that really sets Docker apart is the layered file system and the ability to apply version control to entire containers. Being able to track, revert and view changes is a highly desirable and widely used feature in software development. Docker extends that same idea to a higher construct: the entire application, with all its dependencies, in a single environment.
There are many real-life use cases of Docker. They are
DevOps adoption: Docker standardizes the configuration setup interface and simplifies the DevOps process. Most of the collaboration between DevOps and Docker takes place in CI/CD pipelines, testing and production.
Recovering Files: When hardware fails, you normally have to prepare rollback steps ahead of time. Docker, however, can easily revert you to the last image version and replicate the files to new hardware.
Consolidating Servers: Docker can consolidate multiple servers just as virtualization consolidates multiple applications. Docker can provide a denser consolidation and
share unused memory across the instances.
Debugging: Besides container orchestration, Docker is also used to fix apps. Docker has a
debug mode and extensions. They give you an overview of where the problem is running.
Multi-tenancy: It is an architecture where a single instance of an app runs in multiple places.
Managing development operations becomes challenging for these apps. Docker creates an
isolated environment and gives developers a chance to run multiple instances of tiers on each
tenant.
Docker Use Cases 1: From Monolith to Microservices Architecture
Gone are the days when software was developed using only a monolith approach
(waterfall model) wherein the entire software was developed as a single entity. Although
monolith architecture facilitates the building, testing, deploying and horizontal scaling of
software, as the application gets bigger, management can become a challenge. Any bug
in any function can affect the entire app. Furthermore, making a simple change requires
rewriting, testing and deploying the entire application. As such, adopting new
technologies isn’t flexible.
On the other hand, Microservices break down the app into multiple independent and
modular services which each possess their own database schema and communicate
with each other via APIs. The microservices architecture suits the DevOps-enabled
infrastructures as it facilitates continuous delivery. By leveraging Docker, organizations
can easily incorporate DevOps best practices into the infrastructure allowing them to
stay ahead of the competition. Moreover, Docker allows developers to easily share
software along with its dependencies with operations teams and ensure that it runs the
same way on both ends. For instance, administrators can use the Docker images
created by the developers using Dockerfiles to stage and update production
environments. As such, the complexity of building and configuring CI/CD pipelines is
reduced allowing for a higher level of control over all changes made to the
infrastructure. Load balancing configuration becomes easier too.
Docker Use Cases 2: Increased Productivity
In a traditional development environment, the complexity usually lies in defining,
building and configuring development environments using manual efforts without
delaying the release cycles. The lack of portability causes inconsistent behavior in the
apps. Docker allows you to build containerized development environments using Docker
images and to easily set up and use the development environment, all while delivering
consistent performance throughout its lifecycle. Moreover, it offers seamless support
for all tools, frameworks and technologies used in the development environment.
Secondly, Docker environments facilitate automated builds, automated tests and
Webhooks. This means you can easily integrate Bitbucket or GitHub repos with the
development environment and create automatic builds from the source code and move
them into the Docker Repo. A connected workflow between developers and CI/CD tools
also means faster releases.
Docker comes with a cloud-managed container registry eliminating the need to manage
your own registry, which can get expensive when you scale the underlying
infrastructure. Moreover, the complexity in configuration becomes a thing of the past.
Implementing role-based access allows people across various teams to securely access
Docker images. Also, Slack integration allows teams to seamlessly collaborate and
coordinate throughout the product life cycle.
Offering accelerated development, automated workflows and seamless collaboration, there’s no doubt that Docker increases productivity.
Docker Use Cases 3: Infrastructure as Code
The microservice architecture enables you to break down software into multiple service
modules allowing you to work individually with each function. While this brings
scalability and automation, there’s a catch: it leaves you with hundreds of services to
monitor and manage. This is where Infrastructure as Code (IaC) comes to your rescue,
enabling you to manage the infrastructure using code. Basically, it allows you to define
the provisioning of resources for the infrastructure using config files and convert the
infrastructure into software, thereby taking advantage of software best practices such as
CI/CD processes, automation, reusability and versioning.
Docker brings IaC into the development phase of the CI/CD pipeline as developers can
use Docker-compose to build composite apps using multiple services and ensure that it
works consistently across the pipeline. IaC is a typical example of a Docker use case.
Docker Use Cases 4: Multi-Environment Standardization
Docker provides a production parity environment for all its members across the
pipeline. Consider an instance wherein a software development team is evolving. When
a new member joins the team, each member has to install/update the operating system,
database, node, yarn etc. It can take 1-2 days just to get the machines ready.
Furthermore, it’s a challenge to ensure that everyone gets the same OS, program
versions, database versions, node versions, code editor extensions and configurations.
For instance, if you use two different versions of a library for two different programs,
you need to install two versions. In addition, custom environment variables should be
specified before you execute these programs. Now, what if you make certain last minute
changes to dependencies in the development phase and forget to make those changes
in the production?
Docker packages all the required resources into a container and ensures that there are
no conflicts between dependencies. Moreover, you can monitor untracked elements
that break your environment. Docker standardizes the environment ensuring that
containers work similarly throughout the CI/CD pipeline.
Docker Use Cases 5: Loosely Coupled Architecture
Gone are the days of the traditional waterfall software development model. Today,
developers, enabled by the cloud and microservices architecture, are breaking
applications into smaller units and easily building them as loosely coupled services that
communicate with each other via REST APIs. Docker helps developers package each
service into a container along with the required resources making it easy to deploy,
move and update them.
Telecom industries are leveraging the 5G technology and Docker’s support for software-
defined network technology to build loosely coupled architectures. The new 5G
technology supports network function virtualization allowing telecoms to virtualize
network appliance hardware. As such, they can divide and develop each network
function into a service and package it into a container. These containers can be installed
on commodity hardware which allows telecoms to eliminate the need for expensive
hardware infrastructure thus significantly reducing costs. The fairly recent entrance of
public cloud providers into the telecom market has shrunk the profits of telecom
operators and ISVs. They can now use Docker to build cost-effective public clouds with
the existing infrastructure, thereby turning docker use cases into new revenue streams.
Docker Use Cases 6: For Multi-tenancy
Multi-tenancy is a cloud deployment model wherein a single installed application serves
multiple customers with the data of each customer being completely isolated. Software-
as-a-Service (SaaS) apps mostly use the multi-tenancy approach.
There are 4 common approaches to a multi-tenancy model:
Shared database – Isolated Schema: All tenants’ data is stored in a single database in a separate schema for each tenant. The isolation level is medium.
Shared Database – Shared Schema: All tenants’ data is stored in a single
database wherein each tenant’s data is identified by a “Foreign Key”. The isolation
level is low.
Isolated database – Shared App Server: The data related to each tenant is
stored in a separate database. The isolation level is high.
Docker-based Isolated tenants: A separate database stores each tenant’s data
and each tenant is identified by a new set of containers.
While the tenant data is separated, all of these approaches use the same application
server for all tenants. That said, Docker allows for complete isolation wherein each
tenant app code runs inside its own container for each tenant.
To do this, organizations can simply convert the app code into a Docker image to run
containers and use docker-compose.yaml to define the configuration for multi-container
and multi-tenant apps, thus enabling them to run a set of containers for each tenant. A separate Postgres database container and a separate app server container are used for each tenant, so two tenants, for example, need two database servers and two app servers. You can route requests to the right tenant’s containers by adding an NGINX server container.
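One way to sketch this with Compose is to reuse a single docker-compose.yml and give each tenant its own project name (tenant-a and tenant-b are hypothetical):

docker compose -p tenant-a up -d   # an isolated stack of containers for tenant A
docker compose -p tenant-b up -d   # a second, fully separate stack for tenant B
docker compose ls                  # each tenant appears as its own Compose project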
Docker Use Cases 7: Speed Up Your CI/CD Pipeline Deployments
Unlike monolith applications which take a few minutes to turn on, containers launch
within a few seconds seeing as they are lightweight. As such, you can quickly deploy
code at lightning speeds or rapidly make changes to codebases and libraries using
containers in the CI/CD pipelines. However, it’s important to note that long build times
can slow down the CI/CD deployments. This occurs because the CI/CD pipeline must
start from scratch every time meaning dependencies must be pulled on each occasion.
Luckily, Docker comes with a cache layer that makes it easy to overcome the build issue.
That said, it only works on local machines and therefore is not available for remote
runner machines.
To solve this issue, use the --cache-from option to instruct docker build to reuse the layers of an image that already exists locally. If you don’t have an existing local image, you can simply pull one just before executing the docker build command. It’s important to note that this method only uses the latest image as its cache source; to benefit from caching for earlier build stages as well, you should push and pull an image for each stage.
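A hedged sketch of this pattern in a CI job follows; myregistry/myapp is a hypothetical image name, and the BUILDKIT_INLINE_CACHE build argument embeds cache metadata so later builds can reuse the layers:

docker pull myregistry/myapp:latest || true    # ignore the error on the very first build
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from myregistry/myapp:latest \
  -t myregistry/myapp:latest .
docker push myregistry/myapp:latest            # the next runner pulls this image as its cache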
Docker Use Cases 8: Isolated App Infrastructure
One of Docker’s key advantages is its isolated application infrastructure. Each container
is packaged with all dependencies, therefore you don’t need to worry about dependency
conflicts. You can easily deploy and run multiple applications on one or multiple
machines with ease, regardless of the OS, platform and version of the app. Consider an
instance wherein two servers are using different versions of the same application. By
running these servers in independent containers, you can eliminate dependency issues.
You can also run an SSH server inside each isolated container for automation and debugging. Seeing as each service/daemon is isolated, it’s easy to monitor applications
and resources running inside the isolated container and quickly identify errors. This
allows you to run an immutable infrastructure, thereby minimizing any downtimes
resulting from infrastructure changes.
Docker Use Cases 9: Portability – Ship any Application Anywhere
Portability is one of the top five Docker use cases. Portability is the ability of a software
application to run on any environment regardless of the host OS, plugins or platform.
Containers offer portability seeing as they come packaged with all of the resources
required to run an application such as the code, system libraries, runtime, libraries and
configuration settings. Portability is also measured by the amount of tweaking needed
for an application to move to another host environment. For example, Linux containers
run on all Linux distributions but sometimes fail to work on Windows environments.
Docker offers complete portability allowing you to move an app between various
environments without making any significant changes to its configuration. Docker has
created a standard for containerization, it’s therefore no surprise that its containers are
highly portable. Moreover, Docker containers use the host machine OS kernel and
eliminate the need to add the OS. This makes them lightweight and easy to move
between different environments.
The foregoing is especially useful when developers want to test an application in various
operating systems and analyze the results. Any discrepancies in code will only affect a
single container and therefore won’t crash the entire operating system.
Docker Use Cases 10: Hybrid and Multi-cloud Enablement
According to Channel Insider, the top three drivers of Docker adoption in organizations
are hybrid clouds, VMware costs and pressure from testing teams. Although hybrid
clouds are flexible and allow you to run customized solutions, distributing the load
across multiple environments can be a challenge. In order to facilitate seamless
movement between clouds, cloud providers usually need to compromise on costs or
feature sets. Docker eliminates these interoperability issues seeing as its containers run
in the same way in both on-premise and cloud deployments. You can seamlessly move
them between testing and production environments or internal clouds built using
multiple cloud vendor offerings. Also, the complexity of deployment processes is
reduced.
Thanks to Docker, organizations can build hybrid and multi-cloud environments
comprising two or more public/private clouds from different vendors. Migrating from
AWS to the Azure cloud is easy. Plus, you can select services and distribute them across
different clouds based on security protocols and service-level agreements.
Docker Use Cases 11: Reduce IT/Infrastructure Costs
With virtual machines, you need to copy the entire guest operating system. Thankfully,
this is not the case with Docker. Docker allows you to provision fewer resources
enabling you to run more apps and facilitating efficient optimization of resources. For
example, developer teams can consolidate resources onto a single server thus reducing
storage costs. Furthermore, Docker comes with high scalability allowing you to provision
required resources for a precise moment and automatically scale the infrastructure on-
demand. You only pay for the resources you actually use. Moreover, apps running inside
Docker deliver the same level of performance across the CI/CD pipeline, from
development to testing, staging and production. As such, bugs and errors are
minimized. This environment parity enables organizations to manage the infrastructure
with minimal staff and technical resources therefore saving considerably on
maintenance costs. Basically, Docker enhances productivity which means you don’t need
to hire as many developers as you would in a traditional software development
environment. Docker also provides strong, isolation-based security and, most importantly, its core engine is open source and free.
Docker Use Cases 12: Security Practices
Docker containers come with sensible security defaults. When you create a container using Docker, it
will automatically create a set of namespaces and isolate the container. Therefore, a
container cannot access or affect processes running inside another container. Similarly,
each container gets its own network stack which means it cannot gain privileged access
to the network ports, sockets and interfaces of other containers unless certain
permissions are granted. In addition to resource accounting and limiting, control groups
handle the provisioning of memory, compute and disk I/O resources. Distributed-Denial-
of-Service (DDoS) attacks are thus successfully mitigated seeing as a resource-exhausted
container cannot crash the system.
When a container launches, the Docker daemon activates a set of restriction capabilities,
augmenting the binary root with fine-grained access controls. This provides higher
security seeing as a lot of processes that run as root don’t need real root privileges.
Therefore, they can operate with lesser privileges. Another important feature is running only signed images, using the Docker Content Trust signature verification feature, which can be enforced in the dockerd configuration file. If you want to add an extra layer of security and
harden the Docker containers, SELinux, Apparmor and GRSEC are notable tools that can
help you do so.
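Two of these measures can be sketched directly from the shell, assuming signed images are available for the tag being pulled:

export DOCKER_CONTENT_TRUST=1   # refuse unsigned images for the rest of this shell session
docker pull alpine:3.19         # the pull now verifies the publisher's signature
# Extra per-container hardening via run-time flags
docker run --rm --memory=128m --pids-limit=100 --read-only alpine:3.19 echo "hardened container"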
Docker Use Cases 13: Disaster Recovery
While hybrid and multi-cloud environments offer amazing benefits to organizations, they also pose certain challenges.
ensure business continuity, your applications must withstand errors and failures
without data losses. You can’t afford downtimes when a component fails, especially with
critical applications. As such, we recommend that you remove single points of failure
using redundant component resiliency and access paths for high availability.
Applications should also possess self-healing abilities. Containers can help you in this
regard. Nevertheless, for cases where unforeseen failures arise, you need a disaster
recovery plan that reduces business impact during human-created or natural failures.
Docker containers can be easily and instantly created or destroyed. When a container
fails, it can be replaced almost instantly, seeing as containers are built from Docker images based on Dockerfile configurations. Before moving to another environment, you can commit a running container to an image so that its state travels with it. You can also restore that data in case of a disaster.
All of this being said, it’s important to understand that the underlying hosts may be
connected to other components. Therefore, your disaster recovery plan should involve
spinning up a replacement host as well. In addition, you should consider things like
stateful servers, network and VPN configurations, etc.
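A very small disaster-recovery sketch with the CLI, where appserver and myapp/backup are hypothetical names:

docker commit appserver myapp/backup:snap1            # snapshot a running container as an image
docker save -o myapp-backup.tar myapp/backup:snap1    # export the image to a tar archive
docker load -i myapp-backup.tar                       # restore the image on a replacement host
docker run -d --name appserver myapp/backup:snap1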
Docker Use Cases 14: Easy Infrastructure Scaling
Docker augments the microservices architecture wherein applications are broken down
into independent services and packaged into containers. Organizations are taking
advantage of microservices and cloud architectures and building distributed
applications. Docker enables you to instantly spin up identical containers for an
application and horizontally scale the infrastructure. As the number of containers
increases, you’ll need to use a container orchestration tool such as Kubernetes or
Docker Swarm. These tools come with smart scaling abilities that allow them to
automatically scale up the infrastructure on-demand. They also help you optimize costs
seeing as they remove the need to run unnecessary containers. It’s important to fine-
grain components in order to make orchestration easier. In addition, stateless and
disposable components will enable you to monitor and manage the lifecycle of the
container with ease.
Docker Use Cases 15: Dependency Management
Isolation of dependencies is the strongest feature of containers. Consider an instance
where you have two applications that use different third party libraries. If the
applications depend on different versions of the same library, it can be a challenge to
keep tabs on the version difference throughout the product life cycle. You may need to
allow containers to talk to each other. For instance, an app needs to talk to a database
associated with another app. When you move an application to a new machine, you’ll
have to remember all of the dependencies. Furthermore, version and package conflicts
can be painful.
When trying to reproduce an environment, there are OS, language and package
dependencies that should be taken care of. If you work with Python language, you’ll
need dependency management tools such as virtualenv, venv and pyenv. If the new
environment doesn’t have a tool like git, you’ll need to create a script to install git CLI.
The script keeps changing for different OS and OS versions, therefore every team
member should be aware of these tools, which isn’t always easy.
Be it OS, language or CLI tool dependencies, Docker is the best tool for dependency
management. By simply defining the configuration in the dockerfile along with its
dependencies, you can seamlessly move an app to another machine or environment
without the need to remember the dependencies, worry about package conflicts or keep
track of user preferences and local machine configurations.
Companies Powered by Docker
Docker use cases are not limited by region or industry.
Paypal is a leading US-based financial technology company which offers online payment
services across the globe. The company processes around 200 payments per second
across three different systems; Paypal, Venmo and Braintree. As such, moving services
between different clouds and architectures used to delay deployment and maintenance
tasks. Paypal therefore implemented Docker and standardized its apps and operations
across the infrastructure. To this day, the company has migrated 700 apps to Docker
and works with 4000 software employees managing 200,000 containers and 8+ billion
transactions per year while achieving a 50% increase in productivity.
Adobe also uses Docker for containerization tasks. For instance, ColdFusion is an Adobe
web programming language and application server that facilitates communication
between web apps and backend systems. Adobe uses Docker to containerize and
deploy ColdFusion services. It uses Docker Hub and Amazon Elastic Container Registry
to host Docker images. Users can therefore pull these images to the local machine and
run Docker commands.
GE is one of the few companies that was bold enough to embrace the technology at its
embryonic stage and has become a leader over the years. As such, the company
operates multiple legacy apps which delay the deployment cycle. GE turned to Docker
and has since managed to considerably reduce development to deployment time.
Moreover, it is now able to achieve higher application density than VMs, which reduces
operational costs.
What’s Next After Docker?
Once you understand how Docker is impacting different business aspects, the next thing
you want to grasp is how to fully leverage Docker technology. As organization
operations evolve, the need for thousands of containers arises. Thankfully, Docker is highly scalable: when running in Swarm mode, you can easily scale services up and down, defining the number of replicas needed with the docker service scale command.
$ docker service scale frontend=50
You can also scale multiple services at once using the docker service scale command.
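For example (frontend and backend are hypothetical Swarm services):

$ docker service scale frontend=50 backend=30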
Container Management Systems
As business evolves, organizations need to scale operations on-demand. Furthermore,
as container clusters increase, it becomes challenging to orchestrate them. Container
management systems help you manage container tasks right from creation and
deployment all the way to scaling and destruction, allowing you to use automation
wherever applicable. Basically, they simplify container management. In addition to
creating and removing containers, these systems manage other container-related tasks
such as orchestration, security, scheduling, monitoring, storage, log management, load
balancing and network management. According to Datadog, organizations that use
container management systems host 11.5 containers per host on average compared to
6.5 containers per host when managed by non-orchestrated environments.
Popular Container Management Tools
Here are some of the most popular container managers for your business.
Kubernetes: Kubernetes is the most popular container orchestration tool
developed by Google. It wasn’t long before Kubernetes became a de facto
standard for container management and orchestration. Google moved the tool to
Cloud Native Computing Foundation (CNCF), which means the tool is now
supported by industry giants such as IBM, Microsoft, Google and Red Hat. It
enables you to quickly package, test, deploy and manage large clusters of
containers with ease. It’s also open-source, cost-effective and cloud-agnostic.
Amazon EKS: As Kubernetes became a standard for container management cloud
providers started to incorporate it into their platform offerings. Amazon Elastic
Kubernetes Service (EKS) is a managed Kubernetes service for managing
Kubernetes on AWS. With EKS organizations don’t need to install and configure
Kubernetes work nodes or planes seeing as it handles that for you. In a nutshell,
EKS acts as a container service and manages container orchestration for you.
However, EKS only works with AWS cloud.
Amazon ECS: Amazon Elastic Container Service (ECS) is a fully managed container
orchestration tool for AWS environments which helps organizations manage
microservices and batch jobs with ease. ECS looks similar to EKS but differs seeing
as it manages container clusters, unlike EKS which only performs Kubernetes
tasks. ECS is free while EKS charges $0.1 per hour. That said, seeing as it is built on open-source Kubernetes, EKS provides you with more support from the community. ECS, on the
other hand, is more of a proprietary tool. ECS is mostly useful for people who
don’t have extensive DevOps resources or who find Kubernetes to be complex.
Amazon Fargate: Amazon Fargate is a serverless compute engine for containers that enables organizations to run containers without having to manage servers or container clusters. It’s actually a part of ECS
but it also works with EKS. While ECS offers better control over infrastructure, it
has some management complexities. If you want to run specific tasks without
worrying about infrastructure management, we recommend Fargate.
Azure Kubernetes Service: Azure Kubernetes Service (AKS) is a fully-managed Kubernetes service offered by Microsoft for Azure environments. It’s built on open-source Kubernetes and mostly free seeing as you only pay
for the associated resources. AKS is integrated with the Azure Active Directory
(AD) and offers a higher security level with role-based access controls. It
seamlessly integrates with Microsoft solutions and is easy to manage using Azure
CLI or the Azure portal.
Google Kubernetes Engine: Google Kubernetes Engine (GKE) is a managed Kubernetes service launched by Google in 2015 to manage Google Compute Engine instances running Kubernetes. GKE was the first ever managed Kubernetes service, followed by AKS and EKS. GKE offers more features and
automation than its competitors. Google charges $0.15 per hour per cluster.
Platforms for Docker
The Docker platform runs natively on Linux (on x86-64, ARM and many
other CPU architectures) and on Windows (x86-64). Docker Inc. builds
products that let you build and run containers on Linux, Windows and
macOS.
Unleashing the power of Docker technology doesn’t have to be a daunting task.
With the right hosting solution, you can get up and running with your applications
quickly, deploy faster and scale quicker with minimal effort.
From Back4app Containers and Heroku to innovative Docker hosting providers –
there are plenty of options available for managing your Docker containers. But how
do you decide which one is best for you?
Well, look no further! We’ve got the top 10 Docker hosting platforms in 2023 all
lined up for you. Whether you’re looking for advanced features or an easy-to-use
solution, these picks offer guaranteed performance, scalability, and reliability.
So save yourself the hassle of researching each option in detail – this curated
selection has something to suit everyone’s needs. Get ready to welcome optimized
containerization into your infrastructure!
1. Back4app Containers
2. Heroku
3. Google Cloud Run
4. Kamatera
5. Amazon ECS
6. AppFleet
7. A2 Hosting
8. Digital Ocean
9. Linode
10. Conversio
Back4app Containers
Back4app Containers is an innovative cloud-based hosting platform perfect for managing your
Docker containers. With advanced features like automated deployment, self-healing functionality, and
custom scaling options, this platform offers robust scalability and reliability for any size of the project.
What makes Back4app Containers stand out is its ease of use. All it takes is a few clicks to get your
application up and running – no need to worry about complex configurations or software updates. Plus,
the intuitive dashboard makes it simple to check on stats at any time. And with failover redundancy
built in, there‘s no need to monitor your containers 24/7 – Back4app Containers will take care of that
for you.
If you‘re looking for a hassle-free way to manage all your Docker containers, look no further than
Back4app Containers. It‘s the ideal choice for businesses of any size who want reliable performance
from their applications without any coding or IT experience required.
By taking care of the difficult work for you, Back4app Containers lets you focus on what matters most
– running your business the way it should be run! Please read the article Deploying a Docker
Application for a detailed tutorial on this subject.
Heroku
Heroku is a PaaS, or cloud-based platform as a service that enables developers to build, deploy, scale,
and manage applications quickly. Heroku allows developers to focus on coding while its platform
automates the deployment of code and scales applications according to the user‘s needs.
The core features of Heroku include automated application scaling, one-click deployments, and easy
integration with third-party services such as databases and log management. Developers can use their
existing programming language, including Ruby, Java, Node.js, and Python. Heroku also provides
developers with access to an ever-growing range of add-ons for added functionality.
Heroku is highly recommended for managing Docker containers due to its ease of use and scalability
abilities. It eliminates the need for complex configuration and makes it easier to deploy applications
quickly without worrying about setting up environment variables every time you want to update your
application.
Overall, Heroku is the perfect choice for developers who are looking for an efficient way to manage
Docker containers while having the ability to quickly scale their apps when needed.
Google Cloud Run
Google Cloud Run is a serverless computing platform by Google that helps users manage and deploy
their Docker containers on the cloud. It provides an efficient way to run stateless containers that are
invocable via HTTP requests, allowing you to quickly build applications in your favorite language and
deploy them in seconds. With Cloud Run, you can focus on creating code without worrying about
managing the underlying infrastructure.
Cloud Run‘s core features include automatic scaling, which allows your application to scale up or down
based on demand, secure execution of containers with built-in authentication and authorization, and
high availability with no downtime during deployments. Additionally, Cloud Run supports multiple
languages such as Java, Node.js, Go, Python, .NET Core, and Ruby.
Overall, Google Cloud Run is a great choice for managing your Docker containers due to its ease of use
and scalability. It simplifies the process of deploying and managing applications by providing an
efficient way to run stateless containers with minimal effort.
Kamatera
Kamatera is an innovative cloud provider that specializes in managing Docker containers. It provides
an easy-to-use platform for businesses to manage their Docker services, allowing them to take
advantage of scalability and flexibility on demand.
Kamatera offers a wide range of features tailored for managing Docker containers, including port
assignment and mapping, container life cycle management, resource scheduling, and usage tracking.
Additionally, it also provides deep customization through configurable environments, such as using
different operating systems or customizing the memory allocation of the virtual machines within each
container.
For businesses that demand scalability without sacrificing control over their environment, Kamatera‘s
platform offers comprehensive control and real-time metrics with support for multiple clouds. This
gives companies the ability to manage complex architectures without requiring specialized personnel.
In addition to providing an intuitive platform for managing Docker containers, Kamatera also takes
security seriously by offering both physical and environmental protection against unauthorized access
or data breaches. This includes hourly scans of entire environments as well as two-factor authentication
for users connecting from remote locations. All of this helps keep businesses safe from unwelcome
intrusions or attacks while ensuring data is kept secure and accessible when needed.
Amazon ECS
Amazon Elastic Container Service (ECS) is an Amazon-managed container orchestration service that
provides a secure, efficient, and scalable way to run Docker containers. With ECS, customers can
easily configure their desired number of containers running on their clusters without requiring any
additional infrastructure or computing resources.
At the core of Amazon ECS are several benefits that make it a good choice for managing your Docker
containers:
Easy Deployment: Amazon ECS streamlines the process of deploying and managing
applications in production. It automates the steps involved in launching and scaling
containerized applications.
Scalability and Performance: ECS enables users to increase or decrease the number of
available resources depending on their workloads at any given time, ensuring that their
applications always remain up and running efficiently.
Security and Reliability: Amazon ECS uses its own security features designed to ensure that
customer data is stored securely while still allowing access to control the containerized
application‘s environment.
Cost Efficiency: Amazon ECS is highly cost-effective as compared to other similar services
due to its low operating costs, which include storage, computing power, and networking.
AppFleet
AppFleet is an intelligent platform for managing your Docker containers. It provides a powerful set of
features to simplify the process of deploying and maintaining applications in production environments.
With AppFleet, you can easily manage, deploy and scale your applications without worrying about
server maintenance or other external factors.
AppFleet offers advanced monitoring tools that make it easy to track application performance and
metrics over time. It also makes it easy to keep track of costs associated with running your applications.
The platform supports rolling updates, which lets the user make changes to the application code without
taking it offline. This helps reduce downtime and ensure a smoother transition when making changes to
production applications.
In addition, AppFleet‘s orchestration capabilities enable users to automatically scale containers based
on resource needs and allocate resources across multiple nodes when needed. Its cloud automation
capabilities allow users to quickly spin up resources in the cloud, saving them time and money in
managing containerized workloads. Finally, AppFleet‘s intuitive Web-based dashboard makes it simple
for users to review their container environment as well as troubleshoot any issues they may be having.
A2 Hosting
A2 Hosting offers quick and simple Docker hosting services to manage your containers. They provide
robust features for full scalability, including unlimited storage and bandwidth, free SSL certificates,
custom domains, and more. Their cloud hosting platform is fast and secure, providing instant scalability
with no setup costs or hardware investments.
A2 Hosting‘s core features make it an ideal choice for managing your Docker containers. All of their
plans come with a cPanel control panel to help you easily manage your containers and configurations.
They also offer excellent reliability and uptime with 24/7 support, and plenty of storage options
including SSDs, unlimited databases, and FTP accounts. Additionally, they use LXC virtualization
technology to ensure that each container runs smoothly within its own environment.

Digital Ocean
Digital Ocean is a cloud computing platform for developers and businesses. It provides powerful,
reliable infrastructure and services that make it easy to manage workloads and applications.
Digital Ocean offers a wide array of core features, such as on-demand virtual servers and block storage.
Its intuitive command line interface allows users to spin up new instances quickly in just 55 seconds. It
also features high availability, scalability, custom networking options, and comprehensive monitoring
capabilities.
Because of its simplicity and flexibility, Digital Ocean makes it easy to deploy applications by utilizing
Docker containers. With containerization technology, you can easily build, ship, and run your
applications without the need for a traditional server setup. This helps you save time and money when it
comes to managing multiple projects or development environments.
Linode
Linode is an innovative cloud-hosting service that has quickly become one of the leading choices for
managing your Docker containers. It offers reliable, secure, and scalable plans so that you can
customize your hosting experience according to your specific needs.
Linode‘s core features include an intuitive control panel for easy management, 24/7 monitoring and
support, DDoS protection, and fast SSD-based storage on the Akamai Connected Cloud. All these
features combine to make Linode the perfect choice for managing your Docker containers.
With its reliable performance, Linode provides users with complete control over their web hosting
choices, such as operating system installation, server configuration, and partitioning. The intuitive
interface also allows users to quickly set up custom applications on the cloud platform, such as web
servers or databases.
Conversio
Conversio is an intuitive, powerful platform that makes managing Docker containers incredibly easy
and convenient. With Conversio, you can manage your Docker containers with ease, thanks to its
robust feature set that allows you to do virtually anything.
At the core of Conversio are features like container scheduling and orchestration, health checks for
running containers and deployments, resource utilization and monitoring, as well as auto-scaling
capabilities. It even allows users to customize their container environment by setting up custom
configurations and templating resources.
What sets Conversio apart from other solutions on the market is its ease of use. The UI is user-friendly
and provides a streamlined way to keep track of all your applications in one place. Plus, it‘s designed to
work with popular automation tools such as Jenkins and Kubernetes for optimized performance across
multiple cloud providers. This makes it ideal for businesses looking for a comprehensive solution to
manage their Docker containers effectively.
Conclusion
Docker containers are an excellent way to efficiently manage applications and workloads, but it is
important to carefully consider the features each cloud provider offers. The key lies in finding the right
combination of features that fits your needs while delivering cost-effective performance. Each of these
providers offers a unique set of advantages, so be sure to compare all your options before making a
decision.

Dockers vs. Virtualization
What is Docker?
Docker is a popular containerization platform that helps its users develop, deploy, monitor, and run applications in Docker Containers with all their dependencies.
Docker containers include all dependencies (frameworks, libraries, etc.) needed to run an application in an efficient and bug-free manner.
Docker Containers have the following benefits:
Light-weight
Applications run in isolation
Occupies less space
Easily portable and highly secure
Short boot-up time

What is a Virtual Machine?


A virtual machine (VM) is a software-based computing environment that emulates a physical machine, allowing a complete guest operating system to run on top of the physical host.
Now, let's dig into the concept of Docker vs. virtual machine.
Docker vs. Virtual Machine
Depicted below is a diagrammatic representation of how an application looks when deployed on
Docker and virtual machines:

Now, let's have a look at the primary differences between Docker and virtual machines.
Operating system
Docker: Docker is a container-based model where containers are software packages used for executing an application on any operating system. In Docker, the containers share the host OS kernel, so multiple workloads can run on a single OS.
Virtual Machine: It is not a container-based model; VMs use user space along with the kernel space of an OS. A VM does not share the host kernel, and each workload needs a complete OS or hypervisor.
Performance
Docker: Docker containers give high performance as they use the same operating system with no additional software (like a hypervisor). Containers can start up quickly and need less boot-up time.
Virtual Machine: Since a VM uses a separate OS, it causes more resources to be used. Virtual machines don't start quickly and lead to poorer performance.
Portability
Docker: With Docker containers, users can create an application and store it in a container image, then run it across any host environment. A Docker container image is smaller than a VM, because of which transferring files on the host's filesystem is easier.
Virtual Machine: It has known portability issues. VMs don't have a central hub, and they require more memory space to store data. While transferring files, a VM must carry a copy of the OS and its dependencies, because of which the image size increases and sharing data becomes tedious.
Speed
Docker: The application in a Docker container starts with no delay since the OS is already up and running. These containers were basically designed to save time in the deployment process of an application.
Virtual Machine: It takes a much longer time than a container to run applications. To deploy a single application, a virtual machine needs to start the entire OS, which causes a full boot process.
Key Difference: Docker and Virtual Machine
There are many analogies for Docker and virtual machines. Docker containers and virtual machines differ in many ways; let's discuss one analogy using an apartment vs. a bungalow.
Apartment (e.g., containers): most amenities (binary and library) are shared with neighbors (applications); it can have multiple tenants (applications).
Bungalow (e.g., virtual machine): amenities (binary and library) cannot be shared with neighbors (applications); it cannot have multiple tenants (applications).
For a more in-depth understanding, we will look at the key differences between the two below:
Docker:
o Containers stop working when the "stop" command is executed.
o Docker has lots of snapshots, as it builds images upon layers.
o Images can be version controlled; Docker has a registry called Docker Hub.
o It can run multiple containers on a system.
o It can start multiple containers at a time on the Docker Engine.
Virtual machine:
o Virtual machines are always in the running state.
o A VM doesn't comprise many snapshots.
o VMs don't have a central hub; they are not version controlled.
o Only a limited number of VMs can run on a system.
o Only a single VM can start at a time on a VMX.

Architecture: Docker Architecture, Understanding the Docker Components

The Docker daemon runs on the host operating system. It is responsible for building and running containers and for managing Docker services. The Docker daemon can communicate with other daemons, and it manages Docker objects such as images, containers, networks, and storage.

Docker follows a client-server architecture, which includes three main components: the Docker Client, the Docker Host, and the Docker Registry.

The Docker client uses commands and REST APIs to communicate with the Docker daemon (server). When a user runs a docker command in the client terminal, the terminal sends that command to the Docker daemon, which receives it in the form of a REST API request.

Docker Client uses the Command Line Interface (CLI) to run the following commands (a typical sequence is sketched after the list) -

docker build

docker pull

docker run
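For illustration only (the image name myapp, the tag 1.0, and the port mapping are assumptions, not part of the original text), a typical sequence of these client commands might look like this:
$ docker build -t myapp:1.0 .              # build an image named myapp:1.0 from the Dockerfile in the current directory
$ docker pull nginx:latest                 # download the official nginx image from the configured registry
$ docker run -d -p 8080:80 nginx:latest    # start a container in the background, mapping host port 8080 to container port 80
Each of these commands is sent by the client to the Docker daemon, which actually performs the work.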


Docker Host is used to provide an environment to execute and run applications. It contains
the docker daemon, images, containers, networks, and storage.

Docker Registry manages and stores the Docker images.

There are two types of registries in the Docker -

Public Registry - The public registry is also called Docker Hub.

Private Registry - It is used to share images within the enterprise.
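As a hedged sketch (the registry address registry.example.com, the team path, and the image name myapp are placeholder assumptions), sharing images through the two kinds of registries might look like this:
$ docker pull hello-world                                      # pull a public image from Docker Hub
$ docker tag myapp:1.0 registry.example.com/team/myapp:1.0     # re-tag a local image with the private registry address
$ docker push registry.example.com/team/myapp:1.0              # push it so it can be shared within the enterprise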

Docker Objects
There are the following Docker Objects -

Docker images are read-only binary templates used to create Docker containers. An image can be shared through a private container registry within an enterprise, or through a public container registry with the whole world. Docker images also carry metadata that describes the container's capabilities.

Containers are the structural units of Docker; a container holds the entire package needed to run an application. The advantage of containers is that they require very few resources.

In other words, we can say that the image is a template, and the container is a copy of that
template.


Docker networking allows isolated containers to communicate with each other. Docker provides the following network drivers -

o Bridge - Bridge is the default network driver for a container. It is used when multiple containers communicate on the same Docker host.
o Host - It is used when we don't need network isolation between the container and the host.
o None - It disables all networking.
o Overlay - Overlay allows Swarm services to communicate with each other. It enables containers to run on different Docker hosts.
o Macvlan - Macvlan is used when we want to assign MAC addresses to the containers.

Docker storage is used to persist the data of a container. Docker offers the following storage options (a short usage sketch follows) -
o Data Volume - Data volumes provide the ability to create persistent storage. They also allow us to name volumes, list volumes, and see which containers are associated with a volume.
o Directory Mounts - One of the best options for Docker storage; it mounts a host's directory into a container.
o Storage Plugins - Storage plugins provide the ability to connect to external storage platforms.
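A minimal sketch, assuming a volume named app_data and the public mysql:8.0 and nginx images (all names here are illustrative, not from the original text):
$ docker volume create app_data                                                                  # create a named, persistent volume
$ docker volume ls                                                                               # list existing volumes
$ docker run -d --name db -e MYSQL_ROOT_PASSWORD=secret -v app_data:/var/lib/mysql mysql:8.0     # data volume mounted into a container
$ docker run -d --name web -v /home/user/site:/usr/share/nginx/html nginx                        # directory (bind) mount of a host folder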

Docker Architecture and its Components for Beginners
I assume you have a basic understanding of Docker and its importance in DevOps. Behind this fantastic tool, there has to be an amazing, well-thought-out architecture, isn't it?
But before I talk about that, let me showcase the previous and current virtualization systems.
Traditional vs. New-Generation Virtualization
Earlier, we used to create virtual machines, and each VM had a full OS, which took a lot of space and made it heavy.
In a Docker container's case, you have a single OS, and the resources are shared between the containers; hence it is lightweight and boots in seconds.


Docker Architecture
Below is a simple diagram of the Docker architecture.
Let me explain the components of the Docker architecture.


Docker Engine
It is the core part of the whole Docker system. Docker Engine is an application which
follows client-server architecture. It is installed on the host machine. There are three
components in the Docker Engine:
Server: It is the docker daemon, called dockerd. It creates and manages Docker images, containers, networks, etc.
REST API: It is used to instruct the docker daemon what to do.
Command Line Interface (CLI): It is a client which is used to enter docker commands.
Docker Client
Docker users can interact with Docker through a client. When any docker command runs, the client sends it to the dockerd daemon, which carries it out. Docker commands use the Docker API, and the Docker client can communicate with more than one daemon.
Docker Registries
It is the location where Docker images are stored. It can be a public docker registry or a private docker registry. Docker Hub is the default public registry for Docker images. You can also create and run your own private registry.
When you execute docker pull or docker run commands, the required docker image is pulled
from the configured registry. When you execute the docker push command, the docker image is stored in the configured registry.
Docker Objects
When you are working with Docker, you use images, containers, volumes, networks; all these
are Docker objects.
Images
Docker images are read-only templates with instructions to create a docker container. A Docker image can be pulled from Docker Hub and used as it is, or you can add additional instructions to the base image and create a new, modified Docker image. You can also create your own Docker images using a Dockerfile: write a Dockerfile with all the instructions needed to build and run the container, build it, and you get your custom Docker image.
A Docker image has a read-only base layer, and the top layer can be written to. When you edit a Dockerfile and rebuild it, only the modified part is rebuilt in the top layer. A minimal sketch follows.
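As a hedged sketch (the nginx:alpine base image and the index.html file are assumptions chosen only for illustration), a minimal Dockerfile could look like this:
# Dockerfile - build a small image that serves one static page
FROM nginx:alpine                        # read-only base layer pulled from Docker Hub
COPY index.html /usr/share/nginx/html/   # our change becomes a new layer on top of the base
EXPOSE 80                                # document the port the container listens on
Build the image and start a container from it:
$ docker build -t my-static-site:1.0 .
$ docker run -d -p 8080:80 my-static-site:1.0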
Containers
After you run a docker image, it creates a docker container. All the applications and their
environment run inside this container. You can use Docker API or CLI to start, stop, delete a
docker container.
Below is a sample command to run a ubuntu docker container:
docker run -i -t ubuntu /bin/bash
Volumes

The persistent data generated by Docker and used by Docker containers is stored in volumes. Volumes are completely managed by Docker through the Docker CLI or Docker API. Volumes work on both Windows and Linux containers. Rather than persisting data in a container's writable layer, it is always a better option to use volumes. A volume's content exists outside the lifecycle of a container, so using volumes does not increase the size of a container.

You can use the -v or --mount flag to start a container with a volume. In this sample command, you are using the geekvolume volume with the geekflare container.

docker run -d --name geekflare -v geekvolume:/app nginx:latest


Networks

Docker networking is the channel through which all the isolated containers communicate. There are mainly five network drivers in Docker (a short usage sketch follows the list):

1. Bridge: It is the default network driver for a container. You use this network when your application is running on standalone containers, i.e., multiple containers communicating with the same Docker host.
2. Host: This driver removes the network isolation between Docker containers and the Docker host. It is used when you don't need any network isolation between host and container.
3. Overlay: This network enables swarm services to communicate with each other. It is used when the containers are running on different Docker hosts or when swarm services are formed by multiple applications.
4. None: This driver disables all the networking.
5. macvlan: This driver assigns a MAC address to containers to make them look like physical devices. The traffic is routed between containers through their MAC addresses. This network is used when you want the containers to look like physical devices, for example, while migrating a VM setup.
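The following is a minimal usage sketch; the network name my-bridge and the container names web and app are assumptions used only for illustration:
$ docker network create -d bridge my-bridge                          # create a user-defined bridge network
$ docker network ls                                                  # list networks and their drivers
$ docker run -d --name web --network my-bridge nginx                 # attach a container to the network
$ docker run -d --name app --network my-bridge alpine sleep 3600     # a second container that can reach "web" by its name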

Installation: Installing Docker on Linux, Understanding Installation of Docker on Windows, Some Docker Commands, Provisioning
We can install Docker on any operating system, whether it is Mac, Windows, Linux, or any cloud. Docker Engine runs natively on Linux distributions. Here, we provide a step-by-step process to install Docker Engine on Ubuntu Xenial 16.04 [LTS].
Introduction to Docker
Docker can be described as an open platform to develop, ship, and run applications. Docker
allows us to isolate our applications from our infrastructure, so we can quickly deliver
software. We can manage our infrastructure in similar ways we manage our applications with
Docker. By taking advantage of Docker's methodologies for shipping, testing, and deploying code quickly, we can significantly decrease the delay between writing code and running it in production.
What is the Docker Platform?
Docker gives us the ability to package and run an application in a loosely isolated environment known as a container. This isolation and security allow us to run several containers simultaneously on a given host. Containers are lightweight and include everything required to run the application, so we don't need to depend on what is currently installed on the host. We can easily share containers while we work and be sure that everyone we share with receives the same container that operates in the same way.
Docker gives a platform and tooling facility to maintain the lifecycle of our containers:
o The container becomes a unit to distribute and test our application.
o Develop our application and its supporting elements using containers.
o When we are ready, deploy our application into our production environment as an
orchestrated service or a container. It works similarly whether our production
environment is a cloud provider, a local data center, or a combination of the two.

Usage of Docker
o Fast application delivery
Docker streamlines the development lifecycle by allowing developers to work in standardized environments with local containers, which provide our services and applications. Containers are ideal for continuous delivery and continuous integration workflows.
o Responsive scaling and deployment
The container-based platform of Docker allows for highly portable workloads. Containers can run on a local laptop, on cloud providers, on virtual or physical machines in the data center, or in a mixture of these environments.
The lightweight nature and portability of Docker also make it easy to dynamically manage workloads, scaling services and applications up or tearing them down as business requirements dictate.
o Running multiple workloads on the same hardware
Docker is fast and lightweight. It offers a cost-effective and viable alternative to hypervisor-based virtual machines, so we can use more of our server capacity to achieve our business objectives. Docker is great for high-density platforms and for medium and small deployments where we need to do more with fewer resources.

Docker has two important installation requirements:


o It only works on a 64-bit Linux installation.
o It requires Linux kernel version 3.10 or higher.
To check your current kernel version, open a terminal and type uname -r command to
display your kernel version:
Command:
1. $ uname -r

Update apt sources


Follow the instructions below to update the apt sources.
1. Open a terminal window.
2. Log in as the root user by using the sudo command.
3. Update package information and install CA certificates.
Command:
1. $ apt-get update
2. $ apt-get install apt-transport-https ca-certificates

See, the attached screen shot below.

4. Add the new GPG key. Following command downloads the key.
Command:
1. $ sudo apt-key adv \
2. --keyserver hkp://ha.pool.sks-keyservers.net:80 \
3. --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
Screen shot is given below.

5. Run the following command; it will substitute the repository entry for your operating system into the file.
1. $ echo "<REPO>" | sudo tee /etc/apt/sources.list.d/docker.list
See, the attached screen shot below.

6. Open the file /etc/apt/sources.list.d/docker.listand paste the following line into the file.
1. deb https://apt.dockerproject.org/repo ubuntu-xenial main


7. Now update your apt package index again.
1. $ sudo apt-get update

See, the attached screen shot below.


8. Verify that APT is pulling from the right repository.
1. $ apt-cache policy docker-engine
See, the attached screen shot below.

9. Install the recommended packages.


1. $ sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual

Install the latest Docker version.
1. Update your apt package index.
1. $ sudo apt-get update
See, the attached screen shot below.

2. Install docker-engine.
1. $ sudo apt-get install docker-engine
See, the attached screen shot below.

3. Start the docker daemon.


1. $ sudo service docker start
See, the attached screen shot below.

4. Verify that docker is installed correctly by running the hello-world image.


1. $ sudo docker run hello-world
See, the attached screen shot below.


The above command downloads a test image and runs it in a container. When the container runs, it prints a message and exits.

How to install docker on Windows


We can install Docker on any operating system, like Windows, Linux, or Mac. Here, we are going to install the Docker engine on Windows. On older versions of Windows, Docker cannot run natively, so to install Docker on Windows we download and install the Docker Toolbox, which runs the Docker engine inside a lightweight Linux virtual machine managed by Oracle VirtualBox.

Follow the below steps to install docker on windows -

Step 1: Click on the below link to download


DockerToolbox.exe: https://download.docker.com/win/stable/DockerToolbox.exe

Step 2: Once the DockerToolbox.exe file is downloaded, double click on that file. The
following window appears on the screen, in which click on the Next.

Step 3: Browse the location where you want to install the Docker Toolbox and click on the
Next.


Step 4: Select the components according to your requirement and click on the Next.

Step 5: Select Additional Tasks and click on the Next.


Step 6: The Docker Toolbox is ready to install. Click on Install.

Step 7: Once the installation is completed, the following Wizard appears on the screen, in
which click on the Finish.


Step 8: After the successful installation, three icons will appear on the screen that
are: Docker Quickstart Terminal, Kitematic (Alpha), and OracleVM VirtualBox. Double
click on the Docker Quickstart Terminal.

Step 9: A Docker Quickstart Terminal window appears on the screen.


To verify that the docker is successfully installed, type the below command and press enter
key.

1. docker run hello-world

If the installation succeeded, the following output will be visible on the screen.

You can check the Docker version using the following command.

1. docker --version


List of 57 Essential Docker Commands


Here are the top 57 essential/ basic docker commands with descriptions to
learn and use.
Command Usage
docker attach Attach local standard input, output, and error streams to a running container
docker build Build an image from a Dockerfile
docker builder Manage builds
docker checkpoint Manage checkpoints
docker commit Create a new image from a container’s changes
docker config Manage Docker configs
docker container Manage containers
docker context Manage contexts
docker cp Copy files/folders between a container and the local filesystem
docker create Create a new container
docker diff Inspect changes to files or directories on a container’s filesystem
docker events Get real time events from the server
docker exec Run a command in a running container
docker export Export a container’s filesystem as a tar archive
docker history Show the history of an image
docker image Manage images
docker images List images
docker import Import the contents from a tarball to create a filesystem image
docker info Display system-wide information
docker inspect Return low-level information on Docker objects
docker kill Kill one or more running containers
docker load Load an image from a tar archive or STDIN
docker login Log in to a Docker registry
docker logout Log out from a Docker registry
docker logs Fetch the logs of a container
docker manifest Manage Docker image manifests and manifest lists
docker network Manage networks
docker node Manage Swarm nodes
docker pause Pause all processes within one or more containers
docker plugin Manage plugins
docker port List port mappings or a specific mapping for the container
docker ps List containers
docker pull Pull an image or a repository from a registry
docker push Push an image or a repository to a registry
docker rename Rename a container
docker restart Restart one or more containers
docker rm Remove one or more containers
docker rmi Remove one or more images
docker run Run a command in a new container
docker save Save one or more images to a tar archive (streamed to STDOUT by default)
docker search Search the Docker Hub for images
docker secret Manage Docker secrets
docker service Manage services
docker stack Manage Docker stacks
docker start Start one or more stopped containers
docker stats Display a live stream of container(s) resource usage statistics
docker stop Stop one or more running containers
docker swarm Manage Swarm
docker system Manage Docker
docker tag Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
docker top Display the running processes of a container
docker trust Manage trust on Docker images
docker unpause Unpause all processes within one or more containers
docker update Update configuration of one or more containers
docker version Show the Docker version information
docker volume Manage volumes
docker wait Block until one or more containers stop, then print their exit codes
Let's understand a few of the above commands along with their usage in
detail. The following are the most used docker basic commands for
beginners and experienced docker professionals.
1. docker --version
This command is used to get the current version of Docker.
Syntax
docker --version [OPTIONS]
By default, this will render all version information in an easy-to-read layout.
2. docker pull
Pull an image or a repository from a registry.
Syntax
docker pull [OPTIONS] NAME[:TAG|@DIGEST]
To download an image or a set of images (i.e., a repository), one can use the docker pull command.
Example:
$ docker pull dockerimage
3. docker run
This command is used to create a container from an image.
Syntax
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
The docker run command creates a writeable container layer over the specified image and then starts it using the specified command.
The docker run command can be used with many variations; refer to the official docker run documentation.
4. docker ps
This command is used to list all the containers
Syntax
docker ps [OPTIONS]
The above command can be used with other options like --all or -a.
docker ps --all: lists all containers, including stopped ones.
Example:
$ docker ps
$ docker ps -a
5. docker exec
This command is used to run a command in a running container
Syntax
docker exec [OPTIONS] CONTAINER COMMAND [ARG...]
The docker exec command runs a new command in a running container.
Refer to the official documentation for more detail regarding the usage of the docker exec command.
6. docker stop
This command is used to stop one or more running containers.
Syntax:
docker stop [OPTIONS] CONTAINER [CONTAINER...]
The main process inside the container will receive SIGTERM, and after a
grace period, SIGKILL. The first signal can be changed with the STOPSIGNAL
instruction in the container’s Dockerfile, or the --stop-signal option to docker
run.
Example:
$ docker stop my_container
7. docker restart
This command is used to restart one or more containers.
Syntax: docker restart [OPTIONS] CONTAINER [CONTAINER...]
Example:
$ docker restart my_container
8. docker kill
This command is used to kill one or more containers.
Syntax: docker kill [OPTIONS] CONTAINER [CONTAINER...]
Example:

$ docker kill my_container
9. docker commit
This command is used to create a new image from the container image.
Syntax: docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
Docker commit command allows users to take an existing running container
and save its current state as an image
There are certain steps to be followed before running the command:
First, pull the image from Docker Hub.
Deploy the container using the image ID from the first step.
Modify the container (make any changes, if needed).
Commit the changes.
Example:
$ docker commit c3f279d17e0a dev/testimage:version3
10. docker push
This command is used to push an image or repository to a registry.
Syntax: docker push [OPTIONS] NAME[:TAG]
Use docker image push to share your images to the Docker Hub registry or to a self-hosted one.
Example:
$ docker image push registry-host:5000/myadmin/rhel-httpd:latest
Apart from the above commands, there are other commands whose details can be found in the official Docker reference.
Docker Use Cases
Let's understand a few of the docker use cases:
Use case 1: Developers write their code locally and can share it using docker
containers.
Use case 2: Fixing the bugs and deploying them into the respective
environments is as simple as pushing the image to the respective
environment.
Use case 3: Using docker one can push their application to the test
environment and execute automated and manual tests
Use case 4: One can make a deployment responsive and scalable by using Docker, since Docker can handle dynamic workloads with ease.
Let us take an example of an application,
When a company wants to develop a web application, they need an environment with a Tomcat server installed. Once the tester sets up a Tomcat environment and tests the application, it is deployed into a production environment, where Tomcat has to be set up once again to host the Java web application. There are some issues with this approach:
Loss of time and effort.
The developer and tester might use different Tomcat versions.
Now, let's see how a Docker container can be used to prevent this loss. To overcome these issues, the developer creates a Docker image using a base image that already exists on Docker Hub (Docker Hub has many base images available for free). This image can then be used by the developer, the tester, and the system admin to deploy an identical Tomcat environment. In this way, the Docker container solves the problem. A sketch of such an image is shown below.
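As a hedged illustration (the tomcat:9.0 base image tag and the myapp.war file name are assumptions, not part of the original example), a Dockerfile for such an image might look like this:
# Dockerfile - package a Java web application on top of a public Tomcat base image
FROM tomcat:9.0                               # base image pulled from Docker Hub
COPY myapp.war /usr/local/tomcat/webapps/     # deploy the application into Tomcat's webapps directory
EXPOSE 8080                                   # Tomcat listens on port 8080
The developer, the tester, and the admin then build and run the same image everywhere:
$ docker build -t mycompany/webapp:1.0 .
$ docker run -d -p 8080:8080 mycompany/webapp:1.0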
Docker Architecture
Docker architecture generally consists of a Docker Engine which is a client -
server application with three major components:
1. Generally, docker follows a client-server architecture
2. The client communicates with the daemon, which generally takes up the task of building, running, and shipping the Docker containers.
3. The client and daemon communicate using REST API calls. These calls act as an interface
between the client and daemon
4. A command-line interface, Docker CLI runs docker commands. Some basic docker commands
with examples are listed in the next section.
5. Registry stores the docker images

Docker Hub: Downloading Docker Images


You can download the IBM Connect:Direct® for UNIX Docker image from Fix Central.
To download the Docker image, complete the following steps:
1. Log on to the Fix Central web site by using necessary credentials.
2. Select the item to download:
6.0.0.2-IBMConnectDirectforUNIX-Certified-Container-Linux-
x86-iFix000.tar.gz
3. Untar the downloaded tar file to extract the following file:
cdu6.0_certified_container_6.0.0.2.tar
4. Load the Docker image into registry using the downloaded image file by running the
following command.
docker load -i cdu6.0_certified_container_6.0.0.2.tar
5. Invoke the docker images command to verify if the image is loaded successfully.

Docker Hub is the public repository that hosts a large number of Docker images. Docker
images are pre-built containers that can be easily downloaded and run on any system. Users
can also download the Docker images for offline use. Moreover, they can load the Docker
image onto another computer or keep a backup of the Docker image.
This blog will explain the method to download the official Docker image for offline use.
How to Download Docker Images for Offline Use?
To download Docker images for offline use, check out the provided steps:
Navigate to Docker Hub.
Search for the desired image and copy its "pull" command.
Pull the Docker image into the local repository using the "docker pull <image-name>" command.
Save the image to a file via the "docker save -o <output-file-name> <image-name>" command.
Load the image from the saved file using the "docker load -i <output-file-name>" command.
Run the Docker image for verification.
Step 1: Choose an Image and Copy its “pull” Command
First, go to Docker Hub and search for the desired Docker image. For instance, we have searched for the "hello-world" image. Then, copy its highlighted pull command:

Step 2: Pull Docker Image
Now, run the copied command in the Windows PowerShell to pull the selected Docker image
into the local repository:
docker pull hello-world

Step 3: Save Docker Image to a File


To save a Docker image to the file, utilize the below-listed command:
docker save -o hello-world_image.docker hello-world
Here:
The "docker save" command is used to save the Docker image to a tar archive file.
The "-o" option is utilized to specify the output file; in our case, it is "hello-world_image.docker".
"hello-world" specifies the name of the Docker container image to be saved.
This command will save the "hello-world" image to the "hello-world_image.docker" file:

Step 4: Verification
Follow the provided path in your PC to view the output file:
C:\Users\<user-name>
In the below image, the saved output file can be seen, i.e., "hello-world_image.docker":


Step 5: Load Docker Image from Saved File


Next, write out the following command to load the Docker image from the saved file on the
offline PC:
docker load -i hello-world_image.docker
Here:
The "docker load" command is used to load the Docker image from a tar archive file.
The "-i" option is utilized to specify the input file, i.e., "hello-world_image.docker".
This command will load the "hello-world" image from the "hello-world_image.docker" file:

Step 6: Run Docker Image


Lastly, run the Docker image to verify the installation:
docker run hello-world
The below output indicates that the Docker image has been downloaded and installed successfully for
offline use:


Conclusion
To download Docker images for offline use, first search for the desired image on Docker Hub and copy its "pull" command. Then, run the "docker pull <image-name>" command to pull the Docker image into the local repository. After that, save the Docker image to a file via the "docker save -o <output-file-name> <image-name>" command and load it from the saved file using the "docker load -i <output-file-name>" command. Lastly, run the Docker image for verification. This section explained the method to download the official Docker image for offline use.
Uploading the images in Docker Registry and AWS ECR
1. Create the AWS ECR repository
In the AWS console go to the AWS ECR page. Click the “Create repository” button.


AWS ECR list all repositories page


Then choose the visibility of your repository. I leave it as "private", so it will be managed by IAM and repository policy permissions and won't be accessible to the public. Then fill in the name and click Create repository at the bottom of the page.

Create AWS ECR repository form


An empty repository has been created!

The newly created AWS ECR repository


2. Prepare the image to be pushed.
In this example, I will push the image of a simple Node.js app that listens on port 8080 and displays its host/container name. The root directory of the project has a Dockerfile, which we will use to build the image. Before pushing the image to the repository, we need to tag it with the repository URL. In the root directory, run the build and tag commands in the terminal (a sketch is shown below).
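The exact commands appeared as screenshots in the original; the following is a hedged reconstruction, where the account ID 123456789012, the region us-east-1, and the repository name my-node-app are placeholder assumptions:
$ docker build -t my-node-app .
$ docker tag my-node-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-node-app:latest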
The result will be like this:


Result of commands execution


3. Authenticate the Docker CLI to your AWS ECR
Now we need to authenticate the Docker CLI to AWS ECR.
The login command
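The original shows this command only as a screenshot; a hedged reconstruction (the region and account ID are the same placeholder assumptions as above) is:
$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com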
For this command to execute successfully you have to have your AWS credentials stored in
the credentials file and your IAM principal has to have the necessary permission. Make sure
you use the right region, as it is a common mistake.
The result of the executed login command
This command retrieves an authentication token using the GetAuthorizationToken API and
then redirects it using the pipe (|) to the login command of the container client, the Docker
CLI in my case. The authorization token is valid for 12 hours.
4. Push the image to AWS ECR
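The push command itself also appeared as a screenshot; under the same placeholder assumptions it would be:
$ docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-node-app:latest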

Now the image is in my repository created in step 1.

The image in the repository

Understanding the Containers


Containers are technologies that allow the packaging and isolation of
applications with their entire runtime environment—all of the files
necessary to run. This makes it easy to move the contained application
between environments (dev, test, production, etc.) while retaining full
functionality. Containers are also an important part of IT security.
By building security into the container pipeline and defending
infrastructure, containers stay reliable, scalable, and trusted. You can also
easily move the containerized application between public, private and
hybrid cloud environments and data centers (or on-premises) with
consistent behavior and functionality.
A container is a standard unit of software that packages up code and all its
dependencies so the application runs quickly and reliably from one computing
environment to another. A Docker container image is a lightweight, standalone,
executable package of software that includes everything needed to run an
application: code, runtime, system tools, system libraries and settings. Container
images become containers at runtime and in the case of Docker containers –
images become containers when they run on Docker Engine. Available for both
Linux and Windows-based applications, containerized software will always run
the same, regardless of the infrastructure. Containers isolate software from its
environment and ensure that it works uniformly despite differences for instance
between development and staging. Docker containers that run on Docker Engine:
Standard: Docker created the industry standard for containers, so they could be
portable anywhere
Lightweight: Containers share the machine‘s OS system kernel and therefore do
not require an OS per application, driving higher server efficiencies and reducing
server and licensing costs
Secure: Applications are safer in containers and Docker provides the strongest
default isolation capabilities in the industry

What are Linux containers?
Imagine you’re developing an application. You do your work on a laptop and your
environment has a specific configuration. Other developers may have slightly different
configurations. The application you’re developing relies on that configuration and is
dependent on specific libraries, dependencies, and files. Meanwhile, your business has
development and production environments that are standardized with their own
configurations and their own sets of supporting files. You want to emulate those
environments as much as possible locally, but without all the overhead of recreating the
server environments. So, how do you make your app work across these environments, pass
quality assurance, and get your app deployed without massive headaches, rewriting, and
break-fixing? The answer: containers.

The container that holds your application has the necessary libraries, dependencies, and
files so you can move it through production without nasty side effects. In fact, the contents of
a container image—created using an open-source tool like Buildah—can be thought of as an
installation of a Linux distribution because it comes complete with RPM packages,
configuration files, etc. But, container image distribution is a lot easier than installing new
copies of operating systems. Crisis averted—everyone’s happy.

That’s a common example, but Linux containers can be applied to many different problems
where portability, configurability, and isolation is needed. The point of Linux containers is to
develop faster and meet business needs as they arise. In some cases, such as real-time
data streaming with Apache Kafka, containers are essential because they're the only way to
provide the scalability an application needs. No matter the infrastructure—on-premise, in
the cloud, or a hybrid of the two—containers meet the demand. Of course, choosing the right
container platform is just as important as the containers themselves.

Red Hat® OpenShift® includes everything needed for hybrid cloud, enterprise container, and
Kubernetes development and deployments. OpenShift is available as a cloud service with
major cloud providers, or you can manage OpenShift yourself for greater flexibility and
customization.

Running commands in a container


Docker is a containerization tool that helps developers create and manage portable,
consistent Linux containers.
When developing or deploying containers you’ll often need to look inside a running
container to inspect its current state or debug a problem. To this end, Docker provides
the docker exec command to run programs in containers that are already running.
In this tutorial we will learn about the docker exec command and how to use it to run
commands and get an interactive shell in a running Docker container.
Prerequisites
This tutorial assumes you already have Docker installed, and your user has permission
to run docker. If you need to run docker as the root user, please remember to
prepend sudo to the commands in this tutorial.
For more information on using Docker without sudo access, please see the Executing
the Docker Command Without Sudo section of our How To Install Docker tutorial.
Starting a Test Container
To use the docker exec command, you will need a running Docker container. If you
don’t already have a container, start a test container with the following docker
run command:
docker run -d --name container-name alpine watch "date >> /var/log/date.log"
This command creates a new Docker container from the official alpine image. This is a
popular Linux container image that uses Alpine Linux, a lightweight, minimal Linux
distribution.
We use the -d flag to detach the container from our terminal and run it in the
background. --name container-name will name the container container-name. You
could choose any name you like here, or leave this off entirely to have Docker
automatically generate a unique name for the new container.
Next we have alpine, which specifies the image we want to use for the container.
And finally we have watch "date >> /var/log/date.log". This is the command we
want to run in the container. watch will repeatedly run the command you give it, every
two seconds by default. The command that watch will run in this case is date >>
/var/log/date.log. date prints the current date and time, like this:
Output
Fri Jul 23 14:57:05 UTC 2021
The >> /var/log/date.log portion of the command redirects the output from date and
appends it to the file /var/log/date.log. Every two seconds a new line will be
appended to the file, and after a few seconds it will look something like this:
Output
Fri Jul 23 15:00:26 UTC 2021
Fri Jul 23 15:00:28 UTC 2021
Fri Jul 23 15:00:30 UTC 2021
Fri Jul 23 15:00:32 UTC 2021
Fri Jul 23 15:00:34 UTC 2021

In the next step we’ll learn how to find the names of Docker containers. This will be
useful if you already have a container you’re targeting, but you’re not sure what its
name is.

Finding the Name of a Docker Container


We’ll need to provide docker exec with the name (or container ID) of the container we
want to work with. We can find this information using the docker ps command:
docker ps

This command lists all of the Docker containers running on the server, and provides
some high-level information about them:

Output
CONTAINER ID IMAGE COMMAND CREATED STATUS
PORTS NAMES
76aded7112d4 alpine "watch 'date >> /var…" 11 seconds ago Up 10
seconds container-name
In this example, the container ID and name are highlighted. You may use either to
tell docker exec which container to use.
If you’d like to rename your container, use the docker rename command:
docker rename container-name new-name
Next, we’ll run through several examples of using docker exec to execute commands in
a running Docker container.
Running an Interactive Shell in a Docker Container
If you need to start an interactive shell inside a Docker Container, perhaps to explore
the filesystem or debug running processes, use docker exec with the -i and -t flags.
The -i flag keeps input open to the container, and the -t flag creates a pseudo-
terminal that the shell can attach to. These flags can be combined like this:
docker exec -it container-name sh

This will run the sh shell in the specified container, giving you a basic shell prompt. To
exit back out of the container, type exit then press ENTER:
exit
If your container image includes a more advanced shell such as bash, you could
replace sh with bash above.
Running a Non-interactive Command in a Docker Container
If you need to run a command inside a running Docker container, but don’t need any
interactivity, use the docker exec command without any flags:
docker exec container-name tail /var/log/date.log
This command will run tail /var/log/date.log on the container-name container, and
output the results. By default the tail command will print out the last ten lines of a file.
If you’re running the demo container we set up in the first section, you will see
something like this:
Output
Mon Jul 26 14:39:33 UTC 2021
Mon Jul 26 14:39:35 UTC 2021
Mon Jul 26 14:39:37 UTC 2021
Mon Jul 26 14:39:39 UTC 2021
Mon Jul 26 14:39:41 UTC 2021
Mon Jul 26 14:39:43 UTC 2021
Mon Jul 26 14:39:45 UTC 2021
Mon Jul 26 14:39:47 UTC 2021
Mon Jul 26 14:39:49 UTC 2021
Mon Jul 26 14:39:51 UTC 2021
This is essentially the same as opening up an interactive shell for the Docker container
(as done in the previous step with docker exec -it container-name sh) and then
running the tail /var/log/date.log command. However, rather than opening up a
shell, running the command, and then closing the shell, this command returns that
same output in a single command and without opening up a pseudo-terminal.
Running Commands in an Alternate Directory in a Docker
Container
To run a command in a certain directory of your container, use the --workdir flag to
specify the directory:
docker exec --workdir /tmp container-name pwd
This example command sets the /tmp directory as the working directory, then runs
the pwd command, which prints out the present working directory:
Output
/tmp
The pwd command has confirmed that the working directory is /tmp.

Running Commands as a Different User in a Docker Container
To run a command as a different user inside your container, add the --user flag:
docker exec --user guest container-name whoami
This will use the guest user to run the whoami command in the container.
The whoami command prints out the current user’s username:
Output
guest
The whoami command confirms that the container’s current user is guest.
Passing Environment Variables into a Docker Container
Sometimes you need to pass environment variables into a container along with the
command to run. The -e flag lets you specify an environment variable:
docker exec -e TEST=sammy container-name env
This command sets the TEST environment variable to equal sammy, then runs
the env command inside the container. The env command then prints out all the
environment variables:
Output
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=76aded7112d4
TEST=sammy
HOME=/root
The TEST variable is set to sammy.
To set multiple variables, repeat the -e flag for each one:
docker exec -e TEST=sammy -e ENVIRONMENT=prod container-name env
If you’d like to pass in a file full of environment variables you can do that with the --
env-file flag.
First, make the file with a text editor. We’ll open a new file with nano here, but you can
use any editor you’re comfortable with:
nano .env
We’re using .env as the filename, as that’s a popular standard for using these sorts of
files to manage information outside of version control.
Write your KEY=value variables into the file, one per line, like the following:
.env
TEST=sammy
ENVIRONMENT=prod
Save and close the file. To save the file and exit nano, press CTRL+O, then ENTER to save,
then CTRL+X to exit.
Now run the docker exec command, specifying the correct filename after --env-file:
docker exec --env-file .env container-name env
Output
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=76aded7112d4
TEST=sammy
ENVIRONMENT=prod
HOME=/root

The two variables in the file are set.

You may specify multiple files by using multiple --env-file flags. If the variables in the
files overlap each other, whichever file was listed last in the command will override the
previous files.
Common Errors
When using the docker exec command, you may encounter a few common errors:
Error: No such container: container-name
The No such container error means the specified container does not exist, and may
indicate a misspelled container name. Use docker ps to list out your running containers
and double-check the name.
Error response from daemon: Container
2a94aae70ea5dc92a12e30b13d0613dd6ca5919174d73e62e29cb0f79db6e4ab is not
running
This not running message means that the container exists, but it is stopped. You can
start the container with docker start container-name
Error response from daemon: Container container-name is paused, unpause the
container before exec
The Container is paused error explains the problem fairly well. You need to unpause
the container with docker unpause container-name before proceeding.

Running multiple containers.


Docker Compose is a tool that helps us efficiently handle multiple containers at once; it is used to manage several containers at the same time for the same application.
This tool can become very powerful and allow you to deploy applications
with complex architectures very quickly.


Docker (individual container) VS Docker-Compose (several containers)


With Compose, you use a Compose file to define and configure your
application’s components as services. Then, using a single command, you
can create and start all the services according to your configurations.
Compose works in all environments: production, staging, development,
testing, as well as CI workflows.
In short, Docker Compose works by applying many rules declared within a
single docker-compose.yaml configuration file.
Let’s go step by step.
First of all, you need to install docker-compose in your environment.
Create a docker-compose.yaml file that defines the services (containers) that
make up your application. So they can be run together in an isolated
environment. In this compose file, we define all the configurations that
need to build and run the services as docker containers.
There are several steps to follow to use docker-compose.
1. Split your app into services
The first thing to do is to think about how you’re going to divide the
components of your application into different services(containers).
In a simple client-server web application, it could contain three main layers
(frontend, backend, and the database). So we can split the app in that way.
Likewise, you will have to identify the services of your own application.
2. Pull or build images
For some of your services, you may not need to build from a
custom Dockerfile , and a public image on DockerHub will suffice.
For example, if you have a MySQL database in your application, you can
pull MySQL image from the hub instead of building it. For others, you will
have to create a Dockerfile and build them.

3. Configure environment variables, declare dependencies
Most applications use environment variables for initialization and startup.
And also, after we divide the application into services, they have
dependencies on each other. So we need to identify those things before
we declare the compose file.
4. Configure networking
Docker containers communicate with each other through their internal
network that is created by compose (eg service_name:port). If you want to
connect from your host machine, you will have to expose the service to a
host port.
5. Set up volumes
In most cases, we would not want our database contents to be lost each
time the database service is brought down. A simple way to persist our DB
data is to mount a volume.
6. Build & Run
Now, you are set to go and create the compose file and build the images
for your services and generate containers from those images.
A sample docker-compose.yaml file is shown below with all the
configurations discussed before. Get detailed service configuration
reference from the docker-compose file reference.
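The original sample file appeared as an image; the following is a hedged reconstruction of a typical three-service compose file (the service names, images, ports, and credentials are illustrative assumptions only):
version: "3.8"
services:
  frontend:
    build: ./frontend              # built from a custom Dockerfile
    ports:
      - "3000:3000"                # expose the UI to the host
    depends_on:
      - backend
  backend:
    build: ./backend
    environment:
      - DB_HOST=db                 # services reach each other by service name on the compose network
      - DB_PASSWORD=example
    depends_on:
      - db
  db:
    image: mysql:8.0               # pulled from Docker Hub instead of being built
    environment:
      - MYSQL_ROOT_PASSWORD=example
    volumes:
      - db_data:/var/lib/mysql     # named volume so data survives container restarts
volumes:
  db_data: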
These YAML rules, both human-readable and machine-optimized, provide
us an efficient way to snapshot the entire project within a few minutes.
After all those, in the end, we just need to run:
$ docker-compose up [options]
And compose will start and run your entire app. along with the above
command, you can use the following options,
-d, --detach       Detached mode: run containers in the background, print new container names.
--no-deps          Don't start linked services.
--no-build         Don't build an image, even if it's missing.
--build            Build images before starting containers.
--no-start         Don't start the services after creating them.
--no-recreate      If containers already exist, don't recreate them.
--force-recreate   Recreate containers even if their configuration and image haven't changed.
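For example, a common invocation during development rebuilds the images and starts every service in the background:
$ docker-compose up -d --build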
Other useful commands with compose
Compose has commands for managing the whole lifecycle of your
application:
$ docker-compose build : build or rebuild services
$ docker-compose config : validate and view the Compose file
$ docker-compose down : stop and remove containers, networks,
images, and volumes
$ docker-compose bundle : generate a Docker bundle from the
compose file
$ docker-compose logs <service_name> : stream the log output of
running services
$ docker-compose exec <service_name> <command> : execute a
command in a running container
$ docker-compose run <service_name> <command> : run a one-off
command
$ docker-compose stop <service_name(s)> : stop running containers
without removing them
$ docker-compose start <service_name(s)> : start existing containers
for a service.
$ docker-compose pull/push <service_name(s)> : pull/push service images
$ docker-compose kill <service_name(s)> : kill containers
$ docker-compose rm <service_name(s)> : remove stopped containers
$ docker-compose ps : list containers
$ docker-compose images : list images

Custom images: Creating a custom image.

You can create custom images from source disks, images, snapshots, or images stored in Cloud
Storage and use these images to create virtual machine (VM) instances. Custom images are ideal for
situations where you have created and modified a persistent boot disk or specific image to a certain
state and need to save that state for creating VMs.

Alternatively, you can use the virtual disk import tool to import boot disk images to Compute Engine
from your existing systems and add them to your custom images list.

Before you begin


If you want to use the command-line examples in this guide, do the following:
1. Install or update to the latest version of the Google Cloud CLI.
2. Set a default region and zone.
If you want to use the API examples in this guide, set up API access.
Read the Images document.

Create a custom image


This section describes how to create a custom image on a Linux VM. For information about creating a
Windows image, see Creating a Windows image.
Select an image storage location
When creating a custom image, you can specify the image's Cloud Storage location, excluding dual-
region locations. By specifying the image storage location, you can meet your regulatory and
compliance requirements for data locality as well as your high availability needs by ensuring
redundancy across regions. To create, modify, and delete images stored in Cloud Storage, you must
have roles/compute.storageAdmin.

The storage location feature is optional. If you don't select a location, Compute Engine stores your
image in the multi-region closest to the image source. For example, when you create an image from a
source disk that is located in us-central1 and if you don't specify a location for the custom image, then
Compute Engine stores the image in the us multi-region.

If the image is not available in a region where you are creating a VM, Compute Engine caches the
image in that region the first time you create a VM.

To see the location where an image is stored, use the images describe command from gcloud compute:

gcloud compute images describe IMAGE_NAME \
    --project=PROJECT_ID

Replace the following:

IMAGE_NAME: the name of your image.


PROJECT_ID: the project ID to which the image belongs.

All of your existing images prior to this feature launch remain where they are; the only change is that
you can now view the image location of all your images. If you have an existing image that you want to
move, you must recreate it in the desired location.

Prepare your VM for an image


You can create an image from a disk even while it is attached to a running VM. However, your image
is more reliable if you put the VM in a state that is easier for the image to capture. This section
describes how to prepare your boot disk for the image.

Minimize writing data to the persistent disk


Use one of the following processes to reduce the disk writes:

Stop the VM so that it can shut down and stop writing any data to the persistent disk.
If you can't stop your VM before you create the image, minimize the amount of writes to the disk and
sync your file system. To minimize writing to your persistent disk, follow these steps:
1. Pause apps or operating system processes that write data to that persistent disk.
2. Run an app flush to disk if necessary. For example, MySQL has a FLUSH statement. Other apps might
have similar processes.
3. Stop your apps from writing to your persistent disk.
4. Run sudo sync.

Disable the auto-delete option for the disk
By default, the auto-delete option is enabled on the boot disks. Before creating an image from a disk,
disable auto-delete to ensure that the disk is not automatically deleted when you delete the VM.

You can use the Google Cloud console, the Google Cloud CLI, or the Compute Engine API to disable
auto-delete for the disk.

Console
1. In the Google Cloud console, go to the VM instances page.
Go to the VM instances page
2. Click on the VM that you're using as the source for creating an image. The VM instance details page
displays.
3. Click Edit.
4. In the Boot disk section, for the Deletion rule, ensure that the Keep disk option is selected.
5. Click Save.
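If you prefer the command line, a gcloud sketch of the same change is the following (the VM name, disk name, and zone are placeholders):
$ gcloud compute instances set-disk-auto-delete VM_NAME --disk=DISK_NAME --zone=ZONE --no-auto-delete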

After you prepare the VM, create the image.

Create the image


You can create disk images from the following sources:
A persistent disk, even while that disk is attached to a VM
A snapshot of a persistent disk
Another image in your project
An image that is shared from another project
A compressed RAW image in Cloud Storage

You can create a disk image once every 10 minutes. If you want to issue a burst of requests to create a
disk image, you can issue at most 6 requests in 60 minutes. For more information, see Snapshot
frequency limits.

Console
1. In the Google Cloud console, go to the Create an image page.
Go to Create an image
2. Specify the Name of your image.
3. Specify the Source from which you want to create an image. This can be a persistent disk, a snapshot,
another image, or a disk.raw file in Cloud Storage.
4. If you are creating an image from a disk attached to a running VM, check Keep instance running to
confirm that you want to create the image while the VM is running. You can prepare your VM before
creating the image.
5. In the Based on source disk location (default) drop-down list, specify the location to store the image.
For example, specify us to store the image in the us multi-region, or us-central1 to store it in the us-
central1 region. If you don't make a selection, Compute Engine stores the image in the multi-region
closest to your image's source location.
6. Optional: specify the properties for your image.
Family: the image family this new image belongs to.

Description: a description for your custom image.
Label: a label to group together resources.
7. Specify the encryption key. You can choose between a Google-managed key, a Cloud Key
Management Service (Cloud KMS) key, or a customer-supplied encryption key (CSEK). If no
encryption key is specified, images are encrypted using a Google-managed key.
8. Click Create to create the image.
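Alternatively, a gcloud sketch for creating the image from a source disk (the image name, disk name, zone, and storage location are placeholders):
$ gcloud compute images create IMAGE_NAME --source-disk=SOURCE_DISK --source-disk-zone=ZONE --storage-location=us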

For more information about adding images, see the images reference.

Share the image


After creating a custom image, you can share it across projects. If you allow users from another
project to use your custom images, then they can access these images by specifying the image
project in their request.

Enable guest operating system features on custom images


Use guest operating system (OS) features to configure the following networking, security, storage, and
OS options on custom images. Custom images with these configured features are used as boot disks.

gcloud

Use the gcloud compute images create command with the --guest-os-features flag to create a new custom
image from an existing custom image.

gcloud compute images create IMAGE_NAME \
    --source-image=SOURCE_IMAGE \
    [--source-image-project=IMAGE_PROJECT] \
    --guest-os-features="FEATURES,..." \
    [--storage-location=LOCATION]

Replace the following:

IMAGE_NAME: the name for the new image

SOURCE_IMAGE: an image to base the new image on

IMAGE_PROJECT: Optional: the project containing the source image

Use this parameter to copy an image from another project.


FEATURES: guest OS tags to enable features for VMs that you create from images

To add multiple values, use commas to separate values. Set to one or more of the following values:
VIRTIO_SCSI_MULTIQUEUE. Use on local SSD devices as an alternative to NVMe. For more
information about images that support SCSI, see Choosing an interface.
For Linux images, you can enable multi-queue SCSI on local SSD devices on images with kernel
versions 3.17 or later. For Windows images, you can enable multi-queue SCSI on local SSD devices
on images with Compute Engine Windows driver version 1.2.
WINDOWS. Tag Windows Server custom boot images as Windows images.

MULTI_IP_SUBNET. Configure interfaces with a netmask other than /32. For more information about
multiple network interfaces and how they work, see Multiple network interfaces overview and
examples.
UEFI_COMPATIBLE. Boot with UEFI firmware and the following Shielded VM features:
Secure Boot: disabled by default
Virtual Trusted Platform Module (vTPM): enabled by default
Integrity monitoring: enabled by default
GVNIC. Support higher network bandwidths of 50 Gbps up to 100 Gbps. For more
information, see Using Google Virtual NIC.
SEV_CAPABLE. Use if you're creating a Confidential VM on the AMD Secure Encrypted Virtualization
(SEV) CPU platform. For more information, see Create a new Confidential VM instance.
SUSPEND_RESUME_COMPATIBLE. Support suspend and resume on a VM. For more information, see OS
compatibility.
LOCATION: Optional: region or multi-region in which to store the image

For example, specify us to store the image in the us multi-region, or us-central1 to store it in the us-
central1 region. If you don't make a selection, Compute Engine stores the image in the multi-region
closest to your image's source location.
Considerations for Arm images
Google offers the Tau T2A machine series, which runs on the Ampere Altra CPU platform. You can
start a VM with the T2A machine series and then use that source VM to create an Arm image. The
process for creating a custom Arm image is identical to creating an x86 image.

To help your users differentiate between Arm and x86 images, Arm images will have
an architecture field set to ARM64. Possible values for this field are:

ARCHITECTURE_UNSPECIFIED
X86_64
ARM64

Image users can then filter on this field to find x86 or Arm-based images.

Running a container from the custom image.


Docker is an open-source container management platform and one of the most popular
DevOps tools among deployment teams. Docker is mostly used in Agile-based projects
that require continuous delivery of software. The founder, Chief Technical Officer, and
Chief Architect of the Docker open-source project is Solomon Hykes. It was launched in
2013 by dotCloud, and since then it has become the world's leading software container
platform. For more details, refer to the earlier sections on containerization with Docker
and the Docker architecture.
How can we create our own customized Docker images, and how can we push them to
a Docker Hub profile? It is good practice to push your images to your Docker Hub profile:
you don't have to create them again, and you can pull those images on your own system
as well as in the cloud, with all your work saved in them.
Creating Docker images is not a tedious task; we can create a Docker image with just a
few commands. There are two ways of creating a Docker image, depending on the purpose
for which you want to create it: the first method uses the commit command, and the
second uses a Dockerfile. (See the earlier discussion of the components of Docker, i.e.,
Docker images and the Dockerfile, for more details.) A rough sketch of the Dockerfile
method is shown below for comparison.
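The Dockerfile approach, in outline (the base image, packages, and tag here are only illustrative assumptions):

# Dockerfile (illustrative)
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y git nmap
CMD ["/bin/bash"]

Build and tag it with a command such as:
$ docker build -t your_username/ubuntu-basicbundle:v1.0 .
The rest of this section focuses on the commit-command method.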
Now let’s start creating our own customized docker image using the commit command.
Before building our own Docker image, we should first set up and configure Docker on
our operating system (refer to the earlier section on how to install Docker). After
successful installation, let's learn some of the Docker commands.
Docker Commit
A new image can be produced using the Docker commit command based on modifications
made to an existing container. It is a practical technique to generate a fresh image that
incorporates any adjustments made to a container, like adding new packages or changing
files.
Note: Using docker commit we can create an image from the container.
Syntax:
docker commit <containerId/Name> <imageName>
Steps for Committing Changes to Docker Image
Now we will create our own image from the existing alex43/ubuntu-with-git:v1.0 image,
customize it to our needs, and upload it.
Commit a Container
Step 1: Pull a Docker Image
The very first step is to pull the image as shown in the below image. Use the command and
pull the image into your system as shown below.
docker pull alex43/ubuntu-with-git:v1.0

Step 2: Deploy The Container


After pulling the image, run the container by using the command below. The "-it"
flag instructs Docker to create an interactive bash shell in the container by allocating a
pseudo-TTY linked to the container's stdin. The command starts a new container and
moves you to a fresh shell prompt so you can start working inside of it.
docker run -it <Imagename/ImageID> /bin/bash

Step 3: Modify The Container


Now that we are in the container, we can install the required packages or modify the
image. Here we will try to install the Nmap software. Before you start installing it, check
whether the software has already been installed with the following command.
nmap --version
To install the Nmap use the following command.
apt-get install nmap

Once the installation is complete, confirm once more that the software was installed as
shown in the example below. And exit from the container.

Step 4: Commit Changes to The Image


Lastly, commit the changes by using the syntax shown below to produce a new image. Use
the container ID and tag the new image with a new tag.
sudo docker commit [CONTAINER_ID] [new_image_name]

Additional Options for Docker Commit Command
The first command is the pull command. This command will download/pull the complete
operating system within seconds depending on your internet connectivity. The syntax is
like, docker pull image_name. Here I am pulling alex43/ubuntu-with-git:v1.0 which is my
own customized image.
docker pull alex43/ubuntu-with-git:v1.0
The second command is the run command which we will use to run the pulled image. This
command will launch my image and we will get an interactive shell/terminal of that image.
The syntax is like this -it is for an interactive terminal, –name to give the reference name
for my image launched, and then my image_name.
docker run -it --name myos alex43/ubuntu-with-git:v1.0
The third command and the most important command for creating our own image is
the commit command. By using this command we can simply create our own image with
the packages which we want from the existing image. The syntax is like, docker commit
Nameof_RunningImage your_own_name: tag.
docker commit myos ubuntu-basicbundle:v1.0
The fourth command is the tag command. By using this command we need to rename our
image with the syntax username/image-name:tag. Before executing this command you
need to create an account on the Docker hub and you have to give the same username
which you have given in the Docker hub profile.
docker tag alex43/ubuntu-with-git:v1.0 alex43/ubuntu-basicbundle:v1.0
The fifth command is the login command. By using this command we will log in to the
docker hub account through our terminal and it is required to upload our docker image to
the docker hub profile.
docker login --username alex43 --password your_passwd
The sixth command is the push command. By using this command we can upload our own
created docker image to the docker hub profile and can use it anywhere from our local
system to the cloud by pulling it.
docker push alex43/ubuntu-basicbundle:v1.0
These are the commands and concepts that we will be using in this tutorial, and we will
upload a fresh image so that you can understand it better.
When to Commit New Changes to a New Container Image
By committing new changes to a new container image it will be useful in the
containerization process where you can make an image from the changes we have done to
a container. The timing of when to commit a new image depends upon a few factors:
1. Modifications are finished: Be sure that the modifications you’ve made are complete
and function as intended before committing new changes to a container image. You can
end up with an image that doesn’t perform properly or needs additional adjustments if
you commit insufficient changes.
2. Consistency of Changes: It’s crucial to make sure that the changes you’ve made to
the container are stable and won’t result in any problems when they’re deployed. Test
the container rigorously to confirm that it performs as expected before making
modifications to an image.
3. Frequency of Changes: Committing changes to a fresh container image more
regularly may make sense if you frequently modify the container. This can lessen the
chance of needing to roll back modifications if problems develop and ensure that each
new version of the container reflects the most recent changes.
In conclusion, only commit fresh changes to a fresh container image once they have been
fully finished, stable, and properly tested. When to commit new changes to an image
depends on the frequency of changes and your deployment workflow.
Conclusion
In this post, we’ve discussed the significance of the docker commits command and
provided step-by-step instructions with an example of how to use it. Docker commit is
mainly used to commit the image from the running container in which we have done some
modifications like installing some software or adding any variables in the container.

Publishing the custom image.

Publish your images


Follow this guide to learn how you can share your packaged application in an image using Docker Hub.

Step 1: Get an image


Before you publish your image, you need an image to publish. For this guide, use the welcome-to-docker image.
To get the image, use Docker Desktop to search for the welcome-to-docker image, and then select Pull.

Step 2: Sign in to Docker


To publish images publicly on Docker Hub, you first need an account. Select Sign in on the top-right of Docker
Desktop to either sign in or create a new account on Docker Hub.


Step 3: Rename your image


Before you can publish your image to Docker Hub, you need to rename it so that Docker Hub knows that the image
is yours. Run the following docker tag command in your terminal to rename your image. Replace YOUR-USERNAME
with your Docker ID.
$ docker tag docker/welcome-to-docker YOUR-USERNAME/welcome-to-docker

Step 4: Push your image to Docker Hub


In Docker Desktop, go to the Images tab and find your image. In the Actions column, select the Show image
actions icon and then select Push to Hub. Your image uploads to Docker Hub and is publicly available for anyone
to use.
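If you prefer the command line, the equivalent of this step (using the tag from Step 3) is:
$ docker push YOUR-USERNAME/welcome-to-docker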

Step 5: Verify the image is on Docker Hub
That's it! Your image is now shared on Docker Hub. In your browser, go to Docker Hub and verify that you see
the welcome-to-docker repository.

Docker Networking: Accessing containers,


When your deployment is up and running, you can access your Docker containers to begin
using the InfoSphere® MDM application.
Procedure
1. On the host machine, go to the Docker working directory where you earlier deployed the
Docker image package files (/mdm).
2. Run the Docker list command to get a list of all the Docker containers running in your system:
docker container ls
3. For terminal access, attach to each InfoSphere MDM Docker container, as needed.
docker exec -it <container name> /bin/bash
In each command, replace <container name> with the name of the Docker container you
wish to connect to (such
as mdm_container, db2_container, wb_container, bpmdb_container, mdmisc_container,
or clientapps_container).
The following are the default users used for attaching to various containers if you do not
specify another user in the terminal access command:
o The bpmadmin user connects to the mdmisc_container. This user does not have a set
password.
o The ws9admin user connects to mdm_container, clientapps_container,
and wb_container. This user does not have a set password.
o The db2inst1 user connects to db2_container and bpmdb_container.
4. Tip: Terminal access is useful for running commands such as Installation Verification Testing
(IVT). To run IVT from a terminal command, run the following script:
./verify.sh <dbuser> <dbpassword> <mdmadmin> <mdmpassword> <database SSL
enabled> <database trust store location> <database trust store password>
Where:
o dbuser is the database user.
o dbpassword is the password associated with the database user.
o mdmadmin is the default WebSphere® Application Server administration security user
name.
o mdmpassword is the password associated with the mdmadmin user.
o database SSL enabled is whether the database is SSL enabled or not
(true or false). For Deployed MDM, the database is always SSL enabled.
o database trust store location is the file path of the database trust store.
For Deployed MDM, the file path of the database trust store
is $WAS_PROFILE_HOME/config/cells/WASHOSTCell01/nodes/WASHOS
TNode01/trust.p12.

o database trust store password is the password of the database trust store.
For Deployed MDM, the trust store password is WebAS.
The SSL and trust store arguments cannot be left blank. For example, if SSL is not enabled on
the database and there is no trust store, then the command should be similar to the following:
verify.sh db2inst1 db2inst1 mdmadmin mdmadmin false none none
5. To access the various parts of the InfoSphere MDM application on the Docker containers, use
the following information:
MDM application container (mdm_container)
o InfoSphere MDM administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o WebSphere Application Server administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o Database administrator credentials:
1. user ID: db2inst1
2. password: db2inst1
Note: Use db2inst1 only if you are using IBM Db2 (db2_container). If you are
using another database system, use the appropriate user and password.
o Access the WebSphere Application Server Integrated Solutions Console (admin
console) at https://<hostname>:9043/ibm/console/logon.jsp
o Access the InfoSphere MDM Inspector user interface at
https://<hostname>:9443/inspector/application/inspector.html
o Access the InfoSphere MDM Web Reports user interface at
https://<hostname>:9443/webreports/common/login.html
o Access the InfoSphere MDM Enterprise Viewer user interface at
https://<hostname>:9443/accessweb/servlet/dousrlogin
o Access the InfoSphere MDM Business Administration user interface at
https://<hostname>:9443/CustomerBusinessAdminWeb/faces/
o Access the MDM AE/SE user interface at https://<hostname>:9443/mdm-aese/
MDM user interface container (mdmui_container)
o InfoSphere MDM administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o WebSphere Application Server administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o Access the WebSphere Application Server Integrated Solutions Console (admin
console) at https://<hostname>:9043/ibm/console/logon.jsp
o Access the InfoSphere MDM Inspector user interface at
https://<hostname>:39043/inspector/application/inspector.html
o Access the InfoSphere MDM Web Reports user interface at
https://<hostname>:39043/webreports/common/login.html
o Access the InfoSphere MDM Enterprise Viewer user interface at
https://<hostname>:39043/accessweb/servlet/dousrlogin
o Access the InfoSphere MDM Business Administration user interface at
https://<hostname>:39043/CustomerBusinessAdminWeb/faces/
o Access the MDM AE/SE user interface at https://<hostname>:8543/mdm-aese/
MDM Workbench container (wb_container)
The MDM Workbench container includes VNC server (and also a noVNC server) so that you
can access it remotely.
To access the MDM Workbench desktop using VNC in GUI mode, browse to
http://hostname:6080/vnc.html. Each open browser/tab at this URL opens a new session.
The default password for noVNC is temp4now. To change it, edit the Docker Compose
file mdm-wb.yml.
MDM Workbench workspaces are mapped under the Docker volume on the host machine
under workspace. This location contains all of the workspace assets and will survive even if
the wb_container goes down. It is mapped to
the wb_container location /opt/IBM/rationalsdp/workspace.
IBM® Stewardship Center container (mdmisc_container)
o IBM Business Process Manager administrator credentials:
1. user ID: bpmadmin
2. password: bpmadmin
o IBM Stewardship Center credentials:
1. user ID: dsuser1
2. password: password
o Access the BPM Process Center at the following URLs:
1. https://hostname:7001/ibm/console/logon.jsp
2. https://hostname:7026/ProcessCenter/login.jsp
3. https://hostname:7026/ProcessPortal/login.jsp
4. https://hostname:7026/ProcessAdmin/login.jsp
o Access the BPM Process Server at the following URLs:
1. https://hostname:8001/ibm/console/logon.jsp
2. https://hostname:8026/ProcessPortal/login.jsp
3. https://hostname:8026/ProcessAdmin/login.jsp
o Access the IBM Stewardship Center portal at
https://hostname:8026/ProcessPortal/login.jsp
IBM WebSphere MQ container (mdmmq_container)
o IBM WebSphere MQ administrator credentials:
1. user ID: admin
2. password: passw0rd

o IBM WebSphere MQ port number: 1414
o IBM WebSphere MQ queue manager: MDM.QUEUE.MGR
o IBM WebSphere MQ channel: MDM.SVR.CH

linking containers,
Docker is a set of platform-as-a-service (PaaS) products that use operating-system-level
virtualization to deliver software in packages called containers. There are times during the
development of our application when we need two containers to be able to communicate
with each other; it might be that the services of both containers depend on each other.
This can be done with the help of Container Linking.
Previously, containers were linked using the --link flag, but that flag has now been
deprecated and is considered a legacy option.

Connect with the Linking System


There are two ways of linking containers:
The default way
The user-defined way
To understand how a custom network is formed between two containers, we first need
to understand how Docker assigns the network automatically.

The Default Way

Once we install Docker and create a container, a default bridge network named docker0
is assigned to Docker. Its IP range is 172.17.0.0/16 (where 172.17.0.1 is assigned to the
docker0 interface).

The containers that we create will then get their IPs from this 172.17.0.0/16 range,
starting at 172.17.0.2.
Step 1: Create two new containers, webcon, and dbcon
$ docker run -it --name webcon -d httpd
$ docker run -it --name dbcon -e MYSQL_ROOT_PASSWORD=1234 -d mysql
You can use any image, we’ll be using MySQL and HTTPD images in our case.

Step 2: Check the IPs of the new containers.


$ docker network inspect bridge


With the help of these IPs, the docker host establishes a connection with the
containers.
Step 3: Get inside the webcon container and try to ping the dbcon container; if you
get a response back, this means that the default connection is established.
$ docker container exec -it webcon /bin/bash
(to get into the webcon container)
$ ping "172.17.0.3"
(ping the dbcon container)

User-Defined Way

Step 1: Create a custom bridge network.


$ docker network create <bridge_name>
(This creates a user-defined bridge network; Docker assigns its subnet and gateway automatically.)
We can also specify our own subnet and gateway:
$ docker network create --subnet <your_subnet>
--gateway <your_gateway> <bridge_name>


Step 2: Verify if your network has been created or not.


$ docker network ls
Step 3: Associate or link the two containers on the network that you just created by
using the --net flag.
$ docker run --name <container_name>
--net=<custom_net>
-d <image_name>

We have used httpd and Alpine images for our containers.


Step 4: Get inside the webnew container (IP 10.7.0.10) and ping the alpine
container (IP 10.7.0.2):
$ docker exec -it webnew /bin/bash
$ ping "10.7.0.2" (inside the webnew container)
If you start receiving packets from the Alpine container, then you have
successfully established a connection between both containers using your own
OUR-NET network. So this is how you can create your own custom bridge network,
which allows you to establish a connection between your containers. The commands
below put these steps together.
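A minimal end-to-end sketch (the network name, subnet, and images here are only illustrative):
$ docker network create --subnet 10.7.0.0/16 --gateway 10.7.0.1 our-net
$ docker run -d --name webnew --net=our-net --ip 10.7.0.10 httpd
$ docker run -d --name alpinenew --net=our-net alpine sleep 3600
$ docker exec -it alpinenew ping -c 3 webnew
The last command also shows that user-defined bridge networks resolve container names through Docker's built-in DNS.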
The Importance of Naming
Docker depends heavily upon the names of containers. As we can see in the above
example, whenever you create a new container a name is generated automatically, but
we can also name the container ourselves, which helps us in two ways:
By giving the container a name, we can keep track of the type of program that is
executed inside of it, such as a web application or a database.
If a web application wants to communicate with DB servers, for instance, the name can
act as the connection link between the two containers.
We can name our container with the --name flag, as shown in the command below:
docker run -d -P --name <container_name> <image_name:tag>

Environment Variables
Suppose the developer mentioned some environment variables (--env or -e) in the
source code through which we can connect to the database server, for example a
username and password. Then, while creating the container, we set the username and
password as shown in the command below:
docker run -d --name <name> -e USERNAME=<***> -e PASSWORD=<***> --network <****>
We can set the above-mentioned environment variables for the database container by
using the following command:
docker run -d -p <port> --name <name> -e HOSTNAME=<***> -e
USERNAME=<***> -e PASSWORD=<***> --network <***>
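For instance, a hypothetical concrete version of the above (all names, ports, and values are only illustrative):
$ docker run -d --name mydb --network our-net -e MYSQL_ROOT_PASSWORD=secret mysql
$ docker run -d -p 8080:80 --name myweb --network our-net -e HOSTNAME=mydb -e USERNAME=root -e PASSWORD=secret my-web-app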

Updating the /etc/hosts file


Apart from the environment variables, Docker adds a host entry for the source container
to /etc/hosts. The command to link two containers is shown below:
docker run -t -i --rm --link <source_container>:<alias> <image_name>
To check the list of entries in the /etc/hosts file, we can use the command below:
cat /etc/hosts
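A quick sketch using the legacy --link flag (the container names reuse the earlier example and the alias is only illustrative):
$ docker run -d --name dbcon -e MYSQL_ROOT_PASSWORD=1234 mysql
$ docker run -it --rm --link dbcon:db httpd /bin/bash
$ cat /etc/hosts     # inside the web container: now contains an entry mapping the alias "db" to dbcon's IP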

Container Linking allows multiple containers to link with each other and can be an alternative to
exposing ports. With the legacy linking system, if you attach to the receiving (linked) container and
run the env command, you will notice new environment variables that Docker has injected for the
link to the source container.

Exposing container ports,


The Docker open-source platform has revolutionized the way we create, deploy, and manage containers. To containerize an
application, you'll need to write a Dockerfile, which has the instructions Docker uses for building and running your application.

At times, you may need to set out some networking rules to enable smooth interaction between containers, or to
make your Docker ports accessible by services in the outside world.

You can do this in the following ways:

Add an EXPOSE instruction in the Dockerfile
Use the --expose flag at runtime to expose a port
Use the -p flag or -P flag in the docker run string to publish a port

Whereas each of the above rules may realize mostly similar results, they work differently.

So, which rule should you go for?

This article will demonstrate how to apply these different networking rules when exposing and publishing Docker ports.

Exposing Docker ports via EXPOSE or --expose


There are two ways of exposing ports in Docker:
Including an EXPOSE instruction in the Dockerfile
Using the --expose flag at runtime

While the two approaches are equivalent, they differ in how they work. Let's talk about each of them.

a) Using EXPOSE
With the EXPOSE rule, you can tell Docker that the container listens on the stated network ports at runtime.

Here is an example of how to expose a port in Dockerfile:
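EXPOSE 8080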

The above line will instruct Docker that the container's service can be connected to via port 8080.

You can also expose multiple ports:
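# the specific port numbers here are only illustrative
EXPOSE 8080 4000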

By default, the EXPOSE keyword specifies that the port listens on TCP protocol.

How EXPOSE and --expose work


Basically, EXPOSE is a documentation mechanism: it gives configuration information to other commands and developers,
indicates which incoming ports are intended to provide services, and informs the decisions that the container operator
makes. It does not give much networking control to an image developer.

As earlier explained, you can use the --expose flag in a docker run string to add to the exposed ports.
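For instance (the image name here is only illustrative):
$ docker run --expose 8080 my-image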

By default, the EXPOSE instruction does not make the container's ports accessible from the host; it only makes the
stated ports available for inter-container interaction.

For example, let's say you have a Node.js application and a Redis server deployed on the same Docker network. To ensure
that the Node.js application communicates with the Redis server, the Redis container should expose a port.

If you check the Dockerfile of the official Redis image, a line is included that says EXPOSE 6379. This is what allows the two containers to
talk with one another.

Therefore, when your Node.js application connects to port 6379 of the Redis container, the EXPOSE instruction is what ensures that the inter-
container communication takes place.
Publishing Docker ports via -P or -p
There are two ways of publishing ports in Docker:
Using the -P flag
Using the -p flag

Let's talk about each of them.

a) Using the -P flag


Using the -P (upper case) flag at runtime lets you publish all exposed ports to random ports on the host interfaces.

As earlier mentioned, EXPOSE is usually used as a documentation mechanism; that is, it hints to the container operator which ports are intended to
provide services.

Docker allows you to add -P at runtime and convert the EXPOSE instructions in the Dockerfile into specific published ports.

Docker identifies all ports exposed using the EXPOSE directive and those exposed using the --expose flag. Each exposed port
is then mapped automatically to a random port on the host interface. This automatic mapping also prevents port conflicts.
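A quick sketch (the image name is only illustrative); docker port shows the host port that was assigned to each exposed container port:
$ docker run -d -P --name web my-image
$ docker port web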

b) Using the -p flag


Using the -p (lower case) flag at runtime lets you publish a container's specific port(s) to the Docker host.

It allows you to map a container's port or a range of ports to the host explicitly, instead of exposing all Docker ports.
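For instance (the image name is only illustrative), the following maps host port 8080 to container port 80:
$ docker run -d -p 8080:80 --name web my-image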


How publishing ports works


By default, if you use the docker run or docker create command to run or create a container, none of the container's ports
are published, so the container is not accessible by services in the outside world.

So, while it's possible for your Docker containers to connect to the outside world without making any changes, it's not possible
for the outside world to connect to your Docker containers.

If you want to override this default behavior, you can use either the -P or the -p flag in your docker run string.

Publishing ports produces a firewall rule that binds a container port to a port on the Docker host, ensuring that the port is accessible to any
client that can communicate with the host.

It's what makes a port accessible to Docker containers that are not connected to the container's network, and to services outside of your
Docker environment.

Differences between EXPOSE and publish


Whereas publishing a port using either -P or -p also exposes it, exposing a port using EXPOSE or --expose does not publish it.

So, while exposed ports can only be accessed internally, published ports can be accessed by external services and hosts.

That's the main difference between exposing and publishing ports in Docker.

Container Routing.

Container routing determines how to transport containers from their origins to their
destinations in a liner shipping network. Take Figure 1 as an example, which shows a liner
shipping network consisting of three ship routes. Containers from Singapore to Hong Kong
can be transported on either ship route 1 or ship route 2. If there are many containers to be
transported from Singapore to Jakarta, then containers from Singapore to Hong Kong should
be transported on ship route 2 to reserve the capacity on ship route 1 for containers from
Singapore to Jakarta. In addition to different ship routes on which containers can be
transported from origin to destination, another complicating factor is transshipment. For
instance, containers from Hong Kong to Colombo can be transported on ship route 2, or they
can be transported on ship route 1 to Singapore and transshipped to ship route 2 and then
transported to Colombo. The choice of direct shipment on ship route 2 is preferable because
otherwise it would involve a high transshipment cost at Singapore. However, if there are many
containers to be transported from Hong Kong to Xiamen or from Xiamen to Singapore, then
the choice of transshipment at Singapore from ship route 1 to ship route 2 has to be adopted.
Consequently, it is not an easy task to determine the optimal container routing.


Figure 1
An illustrative liner shipping network [8].

Container routing determines the container handling cost. Table 1 shows the handling costs for two
types of laden containers at three ports: D20 means a dry 20 ft container, and D40 a dry 40 ft container.
In terms of cargo capacity, a D40 is equivalent to two D20s. However, the three "Ratio" rows of Table 1
clearly indicate that the ratio of the cost of handling a D40 to that of handling a D20 is strictly
less than 2. In fact, all the ratios in Table 1 are less than 1.5, and some ratios are even 1 or very close
to 1. This is because both the handling of a D20 and the handling of a D40 involve one quay crane
move (we note that nowadays some quay cranes can handle one D40 or two D20s in each move).
Therefore, to reduce container handling costs, a shipping line should try to transport more D40s
instead of D20s, as a D40 can hold as much cargo as two D20s.
Table 1
Laden container handling cost (USD/container) at three ports (source: [11]).

1.1. Container Repacking

As the handling cost of a D40 is much lower than that of two D20s, it might be advantageous to
unpack two D20s and repack them into one D40. In the sequel, we use "TEU" and "D20"
interchangeably and use "forty-foot equivalent unit (FEU)" and "D40" interchangeably. For each port
we consider the load, transshipment, and discharge costs (USD/container) of a TEU, and likewise the
load, transshipment, and discharge costs (USD/container) of an FEU, together with the cost of
repacking two TEUs into one FEU and the cost of unpacking one FEU into two TEUs. (Since multiple
rehandling of containers would increase the risk of damage and therefore may increase insurance
costs, the extra insurance costs can be included in the repacking and unpacking costs. Moreover,
repacking requires consent from shippers, and the rehandling cost can include a discount for shippers
who agree to have their cargo repacked.)

Figure 2 shows an example of transporting two TEUs from an origin port to a destination port, where
the TEUs need to be transshipped twice. If they are transported as two TEUs, as shown in Figure 2(a),
then two TEUs are loaded at the origin port, two TEUs are transshipped at each of the two intermediate
ports, and two TEUs are discharged at the destination port; the total container handling cost is the sum
of these loading, transshipment, and discharge costs. Figure 2(b) shows the alternative of repacking the
two TEUs into one FEU at the first transshipment port and unpacking the FEU back into two TEUs later
along the route.

Routing Traffic to Docker Containers

How to Setup Nginx Reverse Proxy for Routing Incoming Traffic to Different Containers
and Certbot for Auto-Renewing SSL Certificates

For small applications or test environments where separate machines for different web

servers are cost prohibitive, one option is to have different servers run on the same

machine in different Docker containers. Docker doesn’t support exposing the same port

to multiple containers simultaneously (source). Still, we can install Nginx on the host
machine and have it conditionally route the requests to the different containers.

Incoming traffic routed to two docker containers with different web servers on the same
machine

The problem

After exposing port 80 to one container, if we try to have that port exposed to another
container, we get the following error:

$ docker run -ti -d -p 80:80 httpd


docker: Error response from daemon: driver failed programming external connectivity on endpoint
hopeful_haibt (...): Bind for 0.0.0.0:80 failed: port is already allocated.

The solution: Reverse Proxy

In this example, we want to have one container to serve a Flask application

for flaskapp.example.com and another container to serve a Node.js application


for nodeapp.example.com; both from port 80 of the same machine.
We will set up a reverse proxy that routes the request for hosts to different containers:

1. Install Docker

There are plenty of tutorials out there for installing Docker, such as this one. This article
will focus on the routing part.

2. Install Nginx on the host machine

Again, there are plenty of other good tutorials on this. Here is one. (You can also install
Nginx in a separate container.)

3. Point your domains to the server

For our example, we would setup A records for flaskapp and nodeapp that point to the IP
of the server in the DNS records of example.com.

4. Run the Docker containers with the web servers you need, but on ports other than 80
and 443

Say we have two Docker images already built or pulled: flaskApp and nodeApp. We can
expose port 8080 for flaskApp and port 8081 for nodeApp:

$ docker run -dit --rm --name flaskApp -p 8080:80 my-flask-app

$ docker run -dit --rm --name nodeApp -p 8081:80 my-nodejs-app
At this point, you should be able to see your applications served
at flaskapp.example.com:8080 and nodeapp.example.com:8081.

5. Configure Nginx reverse proxy

To serve both these apps on port 80, we will set up server blocks in the host machine.

Create the server block for flaskapp.example.com:

$ sudo nano /etc/nginx/sites-available/flaskapp.example.com


Paste the following in this file:

server {
listen 80;
server_name flaskapp.example.com;
location / {
proxy_pass http://localhost:8080;
}
}
Create the server block for nodeapp.example.com:

$ sudo nano /etc/nginx/sites-available/nodeapp.example.com


Paste the following in this file:

server {
listen 80;
server_name nodeapp.example.com;
location / {
proxy_pass http://localhost:8081;
}
}
Create symlinks in the sites-enabled directory:

$ sudo ln -s /etc/nginx/sites-available/flaskapp.example.com /etc/nginx/sites-enabled/


$ sudo ln -s /etc/nginx/sites-available/nodeapp.example.com /etc/nginx/sites-enabled/
Ensure that the configuration we did is valid:

$ sudo nginx -t
Restart Nginx:

$ sudo systemctl restart nginx


6. Setup automatically renewing SSL certificates with Certbot

Install Certbot and its Nginx plugin.

$ sudo apt-get update


$ sudo apt-get install certbot
$ sudo apt-get install python3-certbot-nginx
Then generate certificates for your websites.

$ sudo certbot --nginx -d flaskapp.example.com -d www.flaskapp.example.com


$ sudo certbot --nginx -d nodeapp.example.com -d www.nodeapp.example.com
Now, if you look at the server blocks we’ve set up, you will see the new lines added for
the SSL configuration automatically.

Of course, “-d www.flaskapp.example.com” is not necessary if you are not using the

www version for the subdomain. After following the prompts, you will see a success
message as below.
Requesting a certificate for flaskapp.example.com
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/flaskapp.example.com/fullchain.pem
Key is saved at:
/etc/letsencrypt/live/flaskapp.example.com/privkey.pem
This certificate expires on 2022-10-25.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.
Deploying certificate
Successfully deployed certificate for flaskapp.example.com to /etc/nginx/sites-enabled/flaskapp.example.com
Congratulations! You have successfully enabled HTTPS on https://flaskapp.example.com
You can list your certificates with:

$ certbot certificates
… and delete them with:

$ sudo certbot delete --cert-name flaskapp.example.com


.. and delete the server blocks for that site:

$ sudo rm /etc/nginx/sites-enabled/flaskapp.example.com
$ sudo rm /etc/nginx/sites-available/flaskapp.example.com
