DevOps E-Book
AMIT KUMAR SINGH (MCA)
Allana Institute Of Management Sciences
Course Code: IT-41 Course Name: DevOps
Unit 1: Introduction to DevOps (Weightage: 10%)
1.1. Define DevOps
1.2. What is DevOps
1.3. SDLC models, Lean, ITIL, Agile
1.4. Why DevOps?
1.5. History of DevOps
1.6. DevOps Stakeholders
1.7. DevOps Goals
1.8. Important terminology
1.9. DevOps perspective
1.10. DevOps and Agile
1.11. DevOps Tools
1.12. Configuration management
1.13. Continuous Integration and Deployment
1.14. Linux OS Introduction
1.15. Importance of Linux in DevOps
1.16. Linux Basic Command Utilities
1.17. Linux Administration
1.18. Environment Variables
1.19. Networking
1.20. Linux Server Installation
1.21. RPM and YUM Installation
Unit 2: Version Control - GIT (Weightage: 15%)
2.1. Introduction to GIT
2.2. What is Git
2.3. About Version Control System and Types
2.4. Difference between CVCS and DVCS
2.5. A short history of GIT
2.6. GIT Basics
2.7. GIT Command Line
2.8. Installing Git
2.9. Installing on Linux
2.10. Installing on Windows
2.11. Initial setup
2.12. Git Essentials
2.13. Creating repository
2.14. Cloning, check-in and committing
2.15. Fetch pull and remote
2.16. Branching
2.17. Creating the branches, switching the branches, merging the branches.
CHAPTER 1: Introduction to DevOps
Define DevOps
DevOps is the combination of cultural philosophies, practices, and tools that
increases an organization’s ability to deliver applications and services at high
velocity: evolving and improving products at a faster pace than organizations
using traditional software development and infrastructure management
processes. This speed enables organizations to better serve their customers and
compete more effectively in the market.
In some DevOps models, quality assurance and security teams may also become
more tightly integrated with development and operations and throughout the
application lifecycle. When security is the focus of everyone on a DevOps team,
this is sometimes referred to as DevSecOps.
These teams use practices to automate processes that historically have been
manual and slow. They use a technology stack and tooling which help them
operate and evolve applications quickly and reliably. These tools also help
engineers independently accomplish tasks (for example, deploying code or
provisioning infrastructure) that normally would have required help from other
teams, and this further increases a team’s velocity.
Benefits of DevOps
Move at high velocity so you can innovate for customers faster, adapt to changing markets better, and
grow more efficient at driving business results. The DevOps model enables your developers and
operations teams to achieve these results. For example, microservices and continuous delivery let teams
take ownership of services and then release updates to them quicker.
Increase the frequency and pace of releases so you can innovate and improve your product faster. The
quicker you can release new features and fix bugs, the faster you can respond to your customers’ needs
and build competitive advantage. Continuous integration and continuous delivery are practices that
automate the software release process, from build to deploy.
Ensure the quality of application updates and infrastructure changes so you can reliably deliver at a more
rapid pace while maintaining a positive experience for end users. Use practices like continuous
integration and continuous delivery to test that each change is functional and safe. Monitoring and
logging practices help you stay informed of performance in real-time.
Operate and manage your infrastructure and development processes at scale. Automation and
consistency help you manage complex or changing systems efficiently and with reduced risk. For
example, infrastructure as code helps you manage your development, testing, and production
environments in a repeatable and more efficient manner.
Build more effective teams under a DevOps cultural model, which emphasizes values such as ownership
and accountability. Developers and operations teams collaborate closely, share many responsibilities, and
combine their workflows. This reduces inefficiencies and saves time (e.g. reduced handover periods
between developers and operations, writing code that takes into account the environment in which it is
run).
Move quickly while retaining control and preserving compliance. You can adopt a DevOps model without
sacrificing security by using automated compliance policies, fine-grained controls, and configuration
management techniques. For example, using infrastructure as code and policy as code, you can define
and then track compliance at scale.
What is DevOps
DevOps combines development and operations to increase the efficiency, speed, and security of
software development and delivery compared to traditional processes. A more nimble software
development lifecycle results in a competitive advantage for businesses and their customers.
DevOps explained
DevOps can be best explained as people working together to conceive, build and deliver secure
software at top speed. DevOps practices enable software development (dev) and operations (ops)
teams to accelerate delivery through automation, collaboration, fast feedback, and iterative
improvement.
Stemming from an Agile approach to software development, a DevOps process expands on the
cross-functional approach of building and shipping applications in a faster and more iterative
manner. In adopting a DevOps development process, you are making a decision to improve the flow
and value delivery of your application by encouraging a more collaborative environment at all
stages of the development cycle.
DevOps represents a change in mindset for IT culture. In building on top of Agile, lean practices, and
systems theory, DevOps focuses on incremental development and rapid delivery of software.
Success relies on the ability to create a culture of accountability, improved collaboration, empathy,
and joint responsibility for business outcomes.
Core DevOps principles
The DevOps methodology comprises four key principles that guide the effectiveness and efficiency of
application development and deployment. These principles, listed below, center on the best aspects
of modern software development.
1. Automation of the software development lifecycle. This includes automating testing, builds,
releases, the provisioning of development environments, and other manual tasks that can slow
down or introduce human error into the software delivery process.
2. Collaboration and communication. A good DevOps team has automation, but a great DevOps
team also has effective collaboration and communication.
3. Continuous improvement and minimization of waste. From automating repetitive tasks to
watching performance metrics for ways to reduce release times or mean-time-to-recovery, high
performing DevOps teams are regularly looking for areas that could be improved.
4. Hyperfocus on user needs with short feedback loops. Through automation, improved
communication and collaboration, and continuous improvement, DevOps teams can take a
moment and focus on what real users really want, and how to give it to them.
By adopting these principles, organizations can improve code quality, achieve a faster time to
market, and engage in better application planning.
In Agile, each iteration typically involves cross-functional teams working simultaneously on activities such as planning, requirements analysis, design, coding, unit testing, and acceptance testing. At the end of the iteration, a working product is displayed to the customer and important stakeholders.
What is Agile?
Agile model believes that every project needs to be handled differently and the existing methods
need to be tailored to best suit the project requirements. In Agile, the tasks are divided to time boxes
(small time frames) to deliver specific features for a release.
Iterative approach is taken and working software build is delivered after each iteration. Each build is
incremental in terms of features; the final build holds all the features required by the customer.
The Agile thought process had started early in the software development and started becoming
popular with time due to its flexibility and adaptability.
The most popular Agile methods include Rational Unified Process (1994), Scrum (1995), Crystal Clear,
Extreme Programming (1996), Adaptive Software Development, Feature Driven Development, and
Dynamic Systems Development Method (DSDM) (1995). These are now collectively referred to
as Agile Methodologies, after the Agile Manifesto was published in 2001.
The Agile Manifesto emphasizes four values:
Individuals and interactions − In Agile development, self-organization and motivation are
important, as are interactions like co-location and pair programming.
Working software − Demo working software is considered the best means of communication
with the customers to understand their requirements, instead of just depending on
documentation.
Customer collaboration − As the requirements cannot be gathered completely in the
beginning of the project due to various factors, continuous customer interaction is very
important to get proper product requirements.
Responding to change − Agile Development is focused on quick responses to change and
continuous development.
Agile Vs Traditional SDLC Models
Agile is based on adaptive software development methods, whereas traditional SDLC
models like the waterfall model are based on a predictive approach. Predictive teams in the traditional
SDLC models usually work with detailed planning and have a complete forecast of the exact tasks and
features to be delivered in the next few months or during the product life cycle.
Predictive methods entirely depend on the requirement analysis and planning done in the beginning
of cycle. Any changes to be incorporated go through a strict change control management and
prioritization.
Agile uses an adaptive approach where there is no detailed planning and there is clarity on future
tasks only in respect of what features need to be developed. There is feature driven development
and the team adapts to the changing product requirements dynamically. The product is tested very
frequently, through the release iterations, minimizing the risk of any major failures in future.
Customer Interaction is the backbone of this Agile methodology, and open communication with
minimum documentation are the typical features of Agile development environment. The agile
teams work in close collaboration with each other and are most often located in the same
geographical location.
Lean
Lean is a systematic approach to eliminating waste. It originated in manufacturing with W. Edwards Deming's work and Taiichi Ohno's Toyota Production System, revolutionized the Japanese industrial economy after World War II, and later returned to the United States. The book "Lean Software Development: An Agile Toolkit," by Mary and Tom Poppendieck, adapted these lean techniques to software development and identified seven principles of lean that apply to software. Like the just-in-time tenet of lean manufacturing, and aligned with the agile idea of staying flexible, you try to move fast but delay decisions until the last responsible moment, relying on fast feedback loops and shared context. Building integrity in from the start informs the approach to continuous integration and testing.
The basic philosophy of lean is to identify which actions you and your organization perform that add value to the product or service you produce and which do not. Activities that do not add value are called waste. Lean recognizes three significant categories of waste, all with Japanese names: muda, muri, and mura. Muda is the primary form of waste, and it comes in two types: type one, which is technically waste but necessary for some reason, such as compliance, and type two, which is simply wasteful. The Poppendiecks also defined seven primary wastes that are endemic in software development; these include bugs and delays, but also effort spent on features that are not needed. Toyota soon adapted lean thinking to product development. A popular recent adaptation is found in Eric Ries's book "The Lean Startup," which proposes the build-measure-learn loop as a variation of the usual kaizen plan-do-check-act cycle: you focus on delivering a minimum viable product to customers, gather their feedback, and iterate from there instead of trying to analyze upfront what the perfect product would have been. A variety of techniques go along with lean.
ITIL
The Information Technology Infrastructure Library (ITIL) is a set of detailed practices for IT activities such
as IT service management and IT asset management that focus on aligning IT services with the needs
of the business. DevOps stands on the shoulders of giants. And there are a lot of concepts from the
various ITSM and SDLC frameworks and maturity models that are worth learning. Teams should be
organized around the standard ITIL processes, with sections for change management, supplier
management, incident management, etc. But when you implement these processes, you want to use
a lean and agile mindset and need to craft them in a way that’s people first, and that doesn’t
introduce waste or bottlenecks, into the value stream, in the name of a standard or best practices.
IT service management is a realization that service delivery is an integral part of the overall software
development life cycle. Engineers should properly manage it from design, development, deployment,
and maintenance to retirement. In the past, the software development life cycles focused on code
writing and tended to stop at handoff, or if they mentioned deployment and maintenance, they went
into very little detail on them. In this way, ITSM is one of DevOps’ ancestors. ITIL was the first ITSM
framework. It launched the idea of ITSM. So many folks still speak about them as if they’re the same
thing, even though other ITSM frameworks, like COBIT, have emerged since. ITIL is a UK government standard that grew out of the Thatcher era of the 1980s, when the government's IT assets were still managed organically.
ITIL Guidelines
The UK government didn’t have a lot of alignment and standards. And so, their central IT division
published guidelines on managing services in the late eighties and early nineties. The UK’s central IT
group did this first version of ITIL so well that it piqued interest outside the UK government. In 2001,
ITIL v2 was published with the explicit intent of being used by others. V3 was published in 2007 and
updated in 2011. It uses a process model-based view of controlling and managing services, and it can be said to inherit from Deming's plan-do-check-act cycle. ITIL recognizes four primary phases of the service lifecycle: service strategy, design, transition, and operation. It has guidance for every kind of IT process you've ever heard of, from incident management to portfolio management, to capacity management, to service catalogs. At the same time, all of the high-level principles of ITIL make sense.
It’s designed to be a reasonably prescriptive and top-down framework. While it’s not technically
against ITIL to do agile development or perform continuous integration and deployment or other
such practices, honestly, much of the culture, advice, and consultancy around ITIL assumes a
waterfall push-driven model of the world. But it certainly doesn’t have to be that way.
Why DevOps?
DevOps implementation varies with each company, depending on their goals, processes, and even
corporate cultures. We can, however, identify a number of core DevOps principles that most teams
follow. The advantages of DevOps include:
o Fostering a collaborative environment through communication, mutual trust, sharing of skills and ideas, and problem-solving.
o Establishing a culture of end-to-end accountability, in which the entire team is responsible for the outcomes and there is no finger-pointing between the "Dev" and "Ops" experts.
o Focusing on continual improvement based on customer input and evolving technologies in order to optimize product quality, cost, and delivery speed.
o Using automation wherever possible to streamline and speed up development and deployment processes, as well as to enhance efficiency and dependability.
o Providing a client-centric strategy with quick feedback loops to meet changing customer needs.
o Taking lessons from mistakes and fostering an environment where they can be turned into new opportunities.
Faster innovation:
One of the DevOps benefits is faster innovation: because of speedier product delivery to the market,
you can innovate faster than your competition. The DevOps culture allows the team to openly
contribute ground-breaking ideas and communicate their thoughts in real-time.
Higher reliability:
The development, deployment, and other processes become more reliable and less prone to errors.
With DevOps and continuous testing ensuring faster development cycles, the team can quickly
identify any inconsistencies or problems in the program. It's simple to address issues swiftly thanks to good communication and sharing of experience. It's also quite simple to undo a deployment at any point.
Customer satisfaction:
Another significant argument for the importance of DevOps is that the customer-centric approach,
regular feedback, shorter time to market, and continuous improvement all lead to the most fulfilling
software development outcomes.
DevOps in the future will place a larger emphasis on maximizing the usage of cloud technology.
According to Deloitte Consulting analyst David Linthicum, the cloud's centralized nature enables
DevOps automation with a consistent platform for testing, deployment, and production.
Regardless of what new technologies the future brings, enterprises must understand that DevOps is all
about the journey and that their DevOps-related goals and expectations will change over time.
History of DevOps
The origins of DevOps trace back to 2009, when the term "DevOps" was first coined by Patrick Debois; 2009 is therefore regarded as the DevOps origin year. Debois is now regarded as one of the pioneering figures of DevOps and has gained significance over the years as one of its gurus, as more and more organizations integrate DevOps into their operations. To answer the looming question of when DevOps started, let us first look at how the term was formulated. The term arose from the combination of the words "development" and "operations," which provides a fundamental starting point for understanding what people mean when they refer to "DevOps." One important thing to know about the DevOps methodology is that it is not a single technology, process, or rigidly defined standard.
DevOps is often referred to as a cultural viewpoint. Most importantly, the meaning of DevOps has widened to become an umbrella term referring to the culture, processes, and mindset used for optimizing and shortening the software development life cycle with the help of fast feedback loops for offering features, updates, and fixes at a frequent pace.
The history of DevOps is an interesting one, considering how it was gradually adopted into the workflow systems of organizations at large.
DevOps Stakeholders
DevOps Goals
The primary goals of DevOps are to improve collaboration, increase efficiency, and deliver high-
quality software more rapidly. Here are some specific goals that organizations strive to achieve
through DevOps practices:
Accelerated Software Delivery: DevOps aims to reduce the time it takes to develop, test, and deploy
software. By automating processes, streamlining workflows, and fostering collaboration between
development and operations teams, organizations can release new features and updates more
frequently, enabling faster time-to-market.
Continuous Integration and Continuous Deployment (CI/CD): CI/CD is a core principle of DevOps. It
involves automating the integration and testing of code changes, as well as the continuous deployment
of software to production environments. The goal is to ensure that changes are thoroughly tested,
validated, and deployed quickly and reliably.
Increased Collaboration and Communication: DevOps promotes closer collaboration and
communication between developers, operations teams, and other stakeholders. Breaking down silos
and fostering a culture of collaboration helps teams work together more effectively, share knowledge,
and resolve issues faster.
Improved Quality and Reliability: DevOps emphasizes the integration of quality assurance processes
throughout the software development lifecycle. By automating testing, performing continuous
monitoring, and using feedback loops, organizations can identify and address issues earlier, resulting
in higher-quality software and more reliable systems.
Infrastructure as Code (IaC): DevOps encourages the use of infrastructure as code, where
infrastructure resources, such as servers, networks, and configurations, are defined and managed
programmatically. This approach enables consistency, repeatability, and scalability, reducing manual
configuration errors and facilitating infrastructure changes.
Agile and Lean Practices: DevOps aligns with agile and lean principles, focusing on iterative
development, frequent feedback, and continuous improvement. By applying agile methodologies and
lean practices, organizations can adapt quickly to changing requirements, minimize waste, and
optimize processes for efficiency and value delivery.
Enhanced Scalability and Flexibility: DevOps promotes scalability and flexibility by leveraging cloud
computing and virtualization technologies. By provisioning and managing resources dynamically,
organizations can scale their infrastructure based on demand, optimize resource utilization, and
quickly adapt to changing business needs.
Improved Security and Compliance: Security is an integral part of DevOps. By incorporating security
practices throughout the software development lifecycle, organizations can proactively address
vulnerabilities, enforce compliance requirements, and enhance overall system security.
Monitoring and Feedback Loops: DevOps emphasizes continuous monitoring of applications and
infrastructure, collecting metrics and logs to gain insights into system performance, availability, and
user behavior. Feedback loops enable teams to detect issues, identify areas for improvement, and make
data-driven decisions to enhance the software and its delivery process.
Cultural Transformation: DevOps often requires a cultural shift within an organization. It promotes a
collaborative and cross-functional mindset, encourages transparency, and fosters a blameless culture
where learning from failures is valued. The goal is to create an environment that supports
experimentation, innovation, and continuous learning.
It's important to note that the specific goals and priorities of DevOps may vary based on organizational
context, industry, and project requirements. However, these goals collectively represent the core
objectives that organizations typically aim to achieve through adopting DevOps practices.
Important terminology
Here are some important terminologies and concepts commonly used in the DevOps domain:
Continuous Integration (CI): CI is the practice of frequently merging code changes from multiple
developers into a central repository. It involves automating the build and testing process to identify
integration issues early and ensure that the codebase remains in a working state.
Continuous Deployment (CD): CD is the process of automatically deploying software changes to
production environments after passing through the necessary testing and validation stages. It aims to
deliver software updates to end-users rapidly and reliably.
Infrastructure as Code (IaC): IaC is an approach where infrastructure resources, such as servers,
networks, and configurations, are defined and managed programmatically using code. It enables the
automation and reproducibility of infrastructure provisioning and management.
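For illustration only, here is a minimal infrastructure-as-code workflow using Terraform from the shell (this assumes Terraform is installed and that a main.tf file describing the desired resources already exists; none of these names come from the text above):
terraform init                   # initialize the working directory and download provider plugins
terraform plan -out=plan.tfplan  # preview the changes needed to reach the declared state
terraform apply plan.tfplan      # apply the reviewed plan to create or update the infrastructure
Because the desired state lives in version-controlled files, the same commands can reproduce equivalent environments for development, testing, and production.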
Microservices: Microservices is an architectural style where an application is composed of loosely
coupled, independently deployable services. Each service focuses on a specific business capability and
communicates with other services through well-defined APIs. Microservices enable flexibility,
scalability, and independent development and deployment of different parts of an application.
Orchestration: Orchestration refers to the coordination and management of various automated tasks,
workflows, and processes in a DevOps environment. It involves managing the execution order,
dependencies, and parallelization of tasks to achieve desired outcomes efficiently.
Configuration Management: Configuration management involves managing and maintaining
consistent configurations of infrastructure resources and software systems. It includes defining,
provisioning, and managing configurations, ensuring consistency, and facilitating efficient change
management.
Version Control: Version control, often implemented using tools like Git, is a system that tracks and
manages changes to source code and other files. It enables teams to collaborate, manage codebase
history, and revert to previous versions if needed.
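As a simple, hedged illustration of these ideas, the following Git commands (the file name and messages are only examples) create a repository, record a change, inspect the history, and undo a commit:
git init myproject                # create a new local repository
cd myproject
echo "first draft" > notes.txt    # change something in the working directory
git add notes.txt                 # stage the change
git commit -m "Add notes file"    # record the change in the repository history
git log --oneline                 # review the commit history
git revert HEAD                   # roll back the last change with a new commit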
Continuous Monitoring: Continuous monitoring involves collecting and analyzing data about system
performance, health, and user behavior in real-time. It helps detect issues, ensure system availability,
and provide insights for optimization and troubleshooting.
DevOps Pipeline: A DevOps pipeline represents the end-to-end process of software development,
testing, and deployment. It typically includes stages like code compilation, testing, artifact generation,
deployment, and monitoring. Automation and integration of these stages enable smooth and efficient
software delivery.
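As a rough sketch of how such a pipeline can be automated, the shell script below chains the stages together; it assumes a Maven-based Java project, and deploy.sh and smoke_test.sh are hypothetical helper scripts, not tools named in this text:
#!/bin/sh
set -e                        # abort the pipeline if any stage fails
mvn compile                   # stage 1: code compilation
mvn test                      # stage 2: automated tests
mvn package                   # stage 3: artifact generation (a .jar file)
./deploy.sh target/app.jar    # stage 4: deployment (hypothetical script)
./smoke_test.sh               # stage 5: post-deployment health check (hypothetical script)
In practice, a CI/CD server such as Jenkins runs a script like this automatically on every code change.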
DevOps Culture: DevOps culture emphasizes collaboration, communication, and shared responsibility
between development and operations teams. It promotes a mindset of continuous learning,
experimentation, and a focus on delivering value to customers.
Agile: Agile is an iterative and incremental software development methodology that emphasizes
flexibility, customer collaboration, and rapid response to change. DevOps aligns with agile principles,
promoting short development cycles, continuous feedback, and adaptive planning.
Kanban: Kanban is a visual management framework used to visualize and optimize the flow of work.
It provides transparency, promotes collaboration, and helps identify and address bottlenecks in the
development and delivery process.
These are just a few key terminologies and concepts within the vast DevOps landscape. There are
many more specific tools, frameworks, and practices that organizations may adopt based on their
requirements and goals.
DevOps perspective
In theory, DevOps is a very simple approach: just a set of practices that combine software development (Dev) and information-technology operations (Ops) to shorten the system development life cycle and provide continuous delivery with high software quality.
However, practically implementing DevOps is easier said than done. The practice requires a shift in
mindset, a lot of patience, and effective management. The process of implementing DevOps is long
and consists of the following steps:
The first step to DevOps implementation is to create the DevOps infrastructure on which the
application will run. But doing so is not easy. There is a lack of co-operation between the development
and the operations team. Both work in two different groups and silos. Developers want to deliver
changes as soon as possible and the operations team, on the other hand, aims for stability.
Now, we must bring them both together but also ensure they work towards the common goal of our
stakeholders, i.e. releasing valuable software as soon as possible with minimum risk involved.
For this, we will have to create a continuous delivery pipeline so that both the development and the
operation team can work together without any kind of confusion – thus releasing software sprints
faster without much risk. We should make the following small yet crucial changes to realize this goal:
o Audit even the smallest changes made to the deployment environment so that if anything goes wrong, we can easily track what caused the problem.
o Set up strong monitoring systems to alert the development and operations teams in time if any abnormal event occurs. This will minimize the downtime if anything goes wrong.
o Ensure the application logs a WARNING every time a connection is unexpectedly closed or timed out, and an INFO or DEBUG message every time a connection is closed normally.
o Make sure the operations team can test the failure scenario so that the same thing can be prevented from happening again in the future.
o Involve the operations team in the organizational IT service continuity plan right from the start.
o For creating the DevOps infrastructure, use technology with which the operations team is well familiar, so that they can easily own and manage the environment.
How are we going to deploy and configure the various bits of software that form our infrastructure?
How can we manage our infrastructure once provisioning and configuration are done?
Everything we need to create and maintain the infrastructure, such as operating system install
definitions, configuration for data centre automation tools like Puppet, general infrastructure
configurations like DNS files & SMTP settings, and the scripts for managing the infrastructure will be
kept under the version control.
All these files in version control will provide inputs to the deployment pipeline – whose job in case of
infrastructural changes is to:
o Verify that the infrastructural changes run against all the applications before they are pushed into the production environment. This ensures that the new version of the infrastructure passes all functional and non-functional tests before it goes live.
o Push changes to the production environment and to the testing environment managed by the operations team.
o Perform tests to ensure the successful deployment of the new infrastructure for the application.
For managing the DevOps infrastructure environment, we will need the following things:
o Control access so that no one can make changes without approval.
o Run even the smallest change, whether it is updating the firewall or deploying a new version of the software, through the same change management process.
o Manage the DevOps infrastructure modification process through a single ticketing system everyone can log into.
o Log changes as they are made so that they can be easily audited.
o Test the changes in a production-like testing environment before pushing them live.
o Apply all DevOps infrastructure changes to version control first and then roll them out through the automated process.
o Run tests to verify whether the changes we made have worked.
Server provisioning and server configuration management are often overlooked in small and medium-sized organizations. Yet they are a very important part of DevOps infrastructure and environment management. Let's look at both in detail:
A. SERVER PROVISIONING
In server provisioning, we take a set of resources like appropriate systems and data and software to
build a server and make it ready for network operation. Typical tasks during server provisioning are
selecting a server from a pool of available servers, loading appropriate software, customizing and
configuring the system, changing a boot image for the server, and finally changing its parameters.
B. VIRTUALIZATION
Virtualization is the fundamental enabler of the cloud; it allows thousands of hosts to virtually access servers over the internet. A virtual machine emulates a physical machine. The benefits of virtualization include:
o Consolidation
o Hardware standardization
After installing the operating system, we need to ensure full control over its configuration; it should not change in an uncontrolled manner. Nobody should be able to log into the deployment
environment except the operations team and no change should be done without an automated system.
We also need to apply OS service packages, upgrades, install new software, change necessary settings,
and perform deployments.
Next, we need to run parallel tests in the deployment pipeline to see if everything is running smoothly
in the production environment or there are any issues.
4. Managing Data
We may also face a set of problems in Data management & organization while implementing the
DevOps infrastructure, such as:
There is a large volume of data involved, which makes it impossible to keep track of every piece of data involved in software development.
The lifecycle of application data is different from other parts of the system.
One way to avoid this problem and effectively manage data is to delete the previous version or replace
the old version with a new copy.
However, doing so is not possible in real-time scenarios. Every single bit of data is important. There
can be scenarios when we might need to roll back to a previous state due to some issues. In that case,
we will still need the older versions of data. So, we will need some advanced approaches for data
management like:
A. DATABASE SCRIPTING
One great way to manage data in DevOps Infrastructure is to capture all database initialization and
migration steps as scripts and check them into version control. Then, we can use these scripts to manage every
database used in the delivery process.
However, we need to make sure all the database scripts are managed effectively so that there is no
issue while retrieving data from the databases.
The most challenging yet crucial part of managing the DevOps infrastructure is to reproduce an environment after an issue occurs. We must ensure the application behaves the way it was behaving before the issue, and that's where the process of deploying a database afresh comes into play.
5. Incremental Change
Incremental change is another effective technique to manage DevOps infrastructure data. It ensures an application keeps working even while we are making changes to it, which is an important prerequisite of continuous integration (CI). Continuous delivery, on the other hand, demands the
successful deployment of every software release, including the changes to the database into
production. This means we must update the entire operational database while retaining the valuable
data held in it. So, we need an efficient rollback strategy so that we can easily take back control of
things if anything goes wrong.
A. DATABASE VERSIONING
It is one of the most efficient mechanisms for data migration in an automated fashion. All we need is
to create a table in the database which contains its version number. Now, every time we make a
change to the database, we will have to create two scripts:
A roll-forward script that takes the database from version x to version x+1.
A roll-backward script that takes the database from version x+1 to version x.
Another thing we will need is an application configuration setting which specifies the version of the
database with which it is designed to work.
Then during the deployment, we can use a tool which looks at the current version of the database and
the database version required by the application version being deployed. Then this tool will use the
roll-forward or roll-backward scripts to align both the application and the database version correctly.
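A minimal sketch of such a tool is shown below in shell. It assumes a PostgreSQL database reachable through the usual PG* environment variables, a schema_version table holding a single version number, and migration scripts named like 4_roll_forward.sql and 5_roll_backward.sql; all of these names are assumptions for illustration, not part of the original text:
#!/bin/sh
APP_VERSION=5                                                       # database version required by the application being deployed
DB_VERSION=$(psql -t -A -c "SELECT version FROM schema_version;")   # current database version

while [ "$DB_VERSION" -lt "$APP_VERSION" ]; do
  NEXT=$((DB_VERSION + 1))
  psql -f "${NEXT}_roll_forward.sql"            # roll forward one version at a time
  DB_VERSION=$NEXT
done
while [ "$DB_VERSION" -gt "$APP_VERSION" ]; do
  psql -f "${DB_VERSION}_roll_backward.sql"     # roll back one version at a time
  DB_VERSION=$((DB_VERSION - 1))
done
Each migration script is also expected to update the schema_version table so the recorded version always matches the schema that is actually deployed.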
Integrating all applications through a single shared database is another common practice for data migration. But we are not in its favor, because it is better if applications communicate directly, not through the database. Still, many companies follow this practice.
If you are doing the same, be careful because even a small change in the database can have a knock-on
effect on how other applications are working. We should test such changes in an orchestrated
environment before implementing them in the production environment.
With the help of roll-forward and roll-backward scripts, it is easy for an application at deploy time to migrate the existing database to its correct version, without losing any data.
Another effective strategy is to perform the database migration process and the application deployment process independently. This also makes sure data migration is done without data loss or any change in the application behaviour.
6. Configuration Management
Managing configuration manually is simple for a single machine. However, when we are handling five or ten servers with 100-200 computers connected to them, configuration management becomes a nightmare. That's why we need a better way to manage things:
A. VERSION CONTROL
Version control is responsible for recording changes to a file or a set of files over time – so that we can
easily remember specific versions later. It‘s a wise thing to use because if we know the previous
versions of files, we can easily roll back to the earlier versions of the project. Version control can also
help us recover in case we make mistakes and screw up things.
Use version control for everything (source code, tests, database scripts, builds & deployment scripts,
documentation, libraries, and configuration files).
Use detailed multi-paragraph commit messages during check-in. This can save hours of debugging in
case any error occurs later.
Since external libraries come in binary form, managing them can be a difficult task. One common way to handle them is to declare the external libraries and use a tool like Maven or Ivy to download them from Internet repositories into our own artifact repository.
B. MANAGING COMPONENTS
The best way is to split the application into smaller components. This will limit the scope of the
changes to the application, reduce regression bugs, encourage reuse, and enable a much more efficient
development process on large projects.
Keep all the available application configuration options in the same repository as its source code.
Perform configurations using an automated process with the help of values taken from the
configuration repository.
That's how we establish a DevOps infrastructure management and software deployment environment. The process is not easy, though; it requires a lot of patience and guidance because there are many ways to go wrong.
DevOps and Agile
DevOps and Agile are two software development methodologies with similar aims: getting the end product out as quickly and efficiently as possible. While many organizations are hoping to employ
these practices, there is often some confusion between both methodologies.
What does each methodology encompass? Where do they overlap? Can they work together, or should we
choose one over the other?
What is DevOps?
DevOps is a combination of two words: software "Development" and "Operations".
It allows a single team to handle the entire application lifecycle, from development to testing,
deployment, and operations. DevOps helps you to reduce the disconnection between software
developers, quality assurance (QA) engineers, and system administrators.
DevOps promotes collaboration between Development and Operations team to deploy code to
production faster in an automated & repeatable way.
DevOps helps to increase organization speed to deliver applications and services. It also allows
organizations to serve their customers better and compete more strongly in the market.
DevOps can also be defined as a sequence of development and IT operations with better
communication and collaboration.
DevOps has become one of the most valuable business disciplines for enterprises or organizations.
With the help of DevOps, quality, and speed of the application delivery has improved to a great
extent.
DevOps is nothing but a practice or methodology of making "Developers" and "Operations" folks
work together. DevOps represents a change in the IT culture with a complete focus on rapid IT service
delivery through the adoption of agile practices in the context of a system-oriented approach.
What is Agile?
Agile involves continuous iteration of development and testing in the SDLC process. Development and testing activities are concurrent, unlike in the waterfall model. This software development method emphasizes incremental, iterative, and evolutionary development.
It breaks the product into small pieces and integrates them for final testing. It can be implemented in
many ways, such as Kanban, XP, Scrum, etc.
Agile software development focuses on four core values:
o Working software over comprehensive documentation.
o Responding to change over following a plan.
o Customer collaboration over contract negotiation.
o Individual and team interaction over the process and tools.
Below are some essential differences between DevOps and Agile:
Definition: DevOps is a practice of bringing development and operations teams together. Agile refers to a continuous iterative approach that focuses on collaboration, customer feedback, and small, rapid releases.
Purpose: The purpose of DevOps is to manage end-to-end engineering processes. The purpose of Agile is to manage complex projects.
Task: DevOps focuses on constant testing and delivery. Agile focuses on constant changes.
Team size: DevOps involves a large team, since it includes all the stakeholders. Agile favours a small team; the smaller the team, the fewer people work on it, so they can move faster.
Team skillset: DevOps divides and spreads the skill set between the development and operations teams. Agile emphasizes training all team members to have a wide variety of similar and equal skills.
Implementation: DevOps is focused on collaboration, so it does not have any commonly accepted framework. Agile can be implemented within a range of tactical frameworks such as SAFe, Scrum, and sprints.
Duration: In DevOps, the ideal goal is to deliver code to production daily or every few hours. Agile development is managed in units of sprints, each typically lasting less than a month.
Target areas: DevOps targets end-to-end business solutions and fast delivery. Agile targets software development.
Feedback: In DevOps, feedback comes from the internal team. In Agile, feedback comes from the customer.
Shift-left principle: DevOps supports both shift left and shift right. Agile supports only shift left.
Focus: DevOps focuses on operational and business readiness. Agile focuses on functional and non-functional readiness.
Importance: In DevOps, developing, testing, and implementation are all equally important. Developing software is inherent to Agile.
Quality: DevOps contributes to better quality through automation and early bug removal; developers need to follow coding and architectural best practices to maintain quality standards. Agile produces application suites that match the desired requirements and can quickly adapt to changes made during the project's life.
Tools: Puppet, Chef, Ansible, AWS, TeamCity, and OpenStack are popular DevOps tools. Bugzilla, Kanboard, and JIRA are popular Agile tools.
Automation: Automation is the primary goal of DevOps; it works on the principle of maximizing efficiency when deploying software. Agile does not emphasize automation.
Communication: DevOps communication involves specs and design documents; it is essential for the operational team to fully understand the software release and its network implications in order to run the deployment process properly. In Agile, Scrum is the most common implementation method, and the Scrum meeting is carried out daily.
Documentation: In DevOps, process documentation is foremost because the software is handed over to an operations team for deployment; automation minimizes the impact of insufficient documentation, although for sophisticated software it is still difficult to transfer all the required knowledge. The Agile method gives priority to the working system over complete documentation, which is ideal when you are flexible and responsive but can hurt when you are trying to turn things over to another team for deployment.
DevOps Tools
Here are some of the most popular DevOps tools, with a brief explanation of each:
1) Puppet
Puppet is the most widely used DevOps tool. It allows the delivery and release of the technology
changes quickly and frequently. It has features of versioning, automated testing, and continuous
delivery. It enables you to manage the entire infrastructure as code without expanding the size of the team.
Features
o Real-time context-aware reporting.
o Model and manage the entire environment.
o Define and continually enforce infrastructure.
o Desired state conflict detection and remediation.
o It inspects and reports on packages running across the infrastructure.
o It eliminates manual work for the software delivery process.
o It helps the developer to deliver great software quickly.
2) Ansible
Ansible is a leading DevOps tool. Ansible is an open-source IT engine that automates application
deployment, cloud provisioning, intra service orchestration, and other IT tools. It makes it easier for
DevOps teams to scale automation and speed up productivity.
Ansible is easy to deploy because it does not use any agents or custom security infrastructure on the client side. It works by pushing modules to the clients; these modules are executed locally on the client side, and the output is pushed back to the Ansible server.
Features
o It is an easy-to-use, open-source tool for deploying applications.
o It helps in avoiding complexity in the software development process.
o It eliminates repetitive tasks.
o It manages complex deployments and speeds up the development process.
3) Docker
Docker is a high-end DevOps tool that allows you to build, ship, and run distributed applications on multiple systems. It also helps to assemble apps quickly from their components, and it is typically
suitable for container management.
Features
o It makes system configuration easier and faster.
o It increases productivity.
o It provides containers that are used to run the application in an isolated environment.
o It routes the incoming request for published ports on available nodes to an active container.
This feature enables the connection even if there is no task running on the node.
o It allows saving secrets into the swarm itself.
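As a brief illustration (the image name, tag, and ports below are examples, not taken from the text), a typical Docker workflow looks like this:
docker build -t myapp:1.0 .           # build an image from the Dockerfile in the current directory
docker run -d -p 8080:80 myapp:1.0    # run the app in an isolated container, publishing port 80 as 8080
docker ps                             # list running containers
docker logs <container-id>            # inspect a container's output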
4) Nagios
Nagios is one of the more useful tools for DevOps. It can determine the errors and rectify them with
the help of network, infrastructure, server, and log monitoring systems.
Features
o It provides complete monitoring of desktop and server operating systems.
o The network analyzer helps to identify bottlenecks and optimize bandwidth utilization.
o It helps to monitor components such as services, application, OS, and network protocol.
o It also provides complete monitoring of Java Management Extensions (JMX).
5) CHEF
Chef is a useful tool for achieving scale, speed, and consistency. Chef is a cloud-based, open-source technology that uses Ruby to define essential building blocks such as recipes and cookbooks. Chef is used in infrastructure automation and helps in reducing manual and repetitive tasks for infrastructure management.
Chef has its own conventions for the different building blocks, which are required to manage and automate
infrastructure.
Features
o It maintains high availability.
o It can manage multiple cloud environments.
o It uses popular Ruby language to create a domain-specific language.
o Chef does not make any assumptions about the current state of the node; it uses its own mechanism to get the current state of the machine.
6) Jenkins
Jenkins is a DevOps tool for monitoring the execution of repeated tasks. Jenkins is software that enables continuous integration. Jenkins is installed on a server where the central build will take
place. It helps to integrate project changes more efficiently by finding the issues quickly.
Features
o Jenkins increases the scale of automation.
o It can easily set up and configure via a web interface.
o It can distribute the tasks across multiple machines, thereby increasing concurrency.
o It supports continuous integration and continuous delivery.
o It offers 400 plugins to support building and testing virtually any project.
o It requires little maintenance and has a built-in GUI tool for easy updates.
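One common way to get a Jenkins server running quickly, shown here only as a hedged example, is to start the official long-term-support image with Docker:
docker run -d -p 8080:8080 -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  jenkins/jenkins:lts                 # the web interface becomes available on http://localhost:8080
Build jobs and pipelines are then configured through that web interface or via plugins.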
7) Git
Git is an open-source distributed version control system that is freely available for everyone. It is
designed to handle small to large projects with speed and efficiency. It was developed to coordinate work among programmers. Version control allows you to track changes and work together with your team members in the same workspace. Git is a critical distributed version-control tool for DevOps.
Features
o It is a free open source tool.
o It allows distributed development.
o It supports the pull request.
o It enables a faster release cycle.
o Git is very scalable.
o It is very secure and completes the tasks very fast.
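For example (the repository URL and branch name are placeholders), a typical distributed Git workflow looks like this:
git clone https://example.com/team/project.git   # get a full local copy of the repository
cd project
git checkout -b feature-login                    # create and switch to a feature branch
# ...edit files...
git commit -am "Implement login form"            # commit the work on the branch
git push origin feature-login                    # share the branch with the team
git checkout main                                # switch back to the main branch
git merge feature-login                          # merge the finished feature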
8) SALTSTACK
Stackify is a lightweight DevOps tool that shows real-time error queries, logs, and more directly in the workstation, while SaltStack is an ideal solution for intelligent orchestration of the software-defined data center.
Features
o It eliminates messy configuration or data changes.
o It can trace detail of all the types of the web request.
o It allows us to find and fix the bugs before production.
o It provides secure access and configures image caches.
o It secures multi-tenancy with granular role-based access control.
o Flexible image management with a private registry to store and manage images.
9) Splunk
Splunk is a tool to make machine data usable, accessible, and valuable to everyone. It delivers
operational intelligence to DevOps teams. It helps companies to be more secure, productive, and
competitive.
Features
o It has the next-generation monitoring and analytics solution.
o It delivers a single, unified view of different IT services.
o Extend the Splunk platform with purpose-built solutions for security.
o Data-driven analytics with actionable insights.
10) Selenium
Selenium is a portable software testing framework for web applications. It provides an easy interface
for developing automated tests.
Features
o It is a free open source tool.
o It supports multiple platforms for testing, such as Android and iOS.
o It is easy to build a keyword-driven framework for a WebDriver.
o It creates robust browser-based regression automation suites and tests.
Configuration management
Configuration management tools
There are a variety of configuration management tools available, and each has specific features that
make it better for some situations than others. Yet the top five configuration management tools,
presented below in alphabetical order, have several things in common that I believe are essential for
DevOps success: all have an open source license, use externalized configuration definition files, run
unattended, and are scriptable. All of the descriptions are based on information from the tools'
software repositories and websites.
Ansible
"Ansible is a radically simple IT automation platform that makes your applications and systems easier
to deploy. Avoid writing scripts or custom code to deploy and update your applications—automate in
a language that approaches plain English, using SSH, with no agents to install on remote systems." —
GitHub repository
Ansible is one of my favorite tools; I started using it several years ago and fell in love with it. You can
use Ansible to execute the same command for a list of servers from the command line. You can also
use it to automate tasks using "playbooks" written into a YAML file, which facilitate communication
between teams and non-technical people. Its main advantages are that it is simple, agentless, and easy
to read (especially for non-programmers).
Because agents are not required, there is less overhead on servers. An SSH connection is necessary
when running in push mode (which is the default), but pull mode is available if needed. Playbooks can
be written with a minimal set of commands or they can be scaled for more elaborate automation tasks
that could include roles, variables, and modules written by other people.
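For example (the inventory file, host group, and playbook names here are placeholders), a few typical Ansible invocations look like this:
ansible webservers -i inventory.ini -m ping       # run the ping module ad hoc against a group of hosts
ansible webservers -i inventory.ini -a "uptime"   # run a single shell command on every host in the group
ansible-playbook -i inventory.ini site.yml        # apply a YAML playbook to the whole inventory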
You can combine Ansible with other tools to create a central console to control processes. Those tools
include Ansible Works (AWX), Jenkins, RunDeck, and ARA, which offers traceability when running
playbooks.
CFEngine
"CFEngine 3 is a popular open source configuration management system. Its primary function is to
provide automated configuration and maintenance of large-scale computer systems." —GitHub
repository
CFEngine was introduced by Mark Burgess in 1993 as a scientific approach to automated
configuration management. The goal was to deal with the entropy in computer systems' configuration
and resolve it with end-state "convergence." Convergence means working toward a desired end state, and it elaborates on idempotence as the capacity to reach that end state. Burgess' research evolved in 2004 when he
proposed the Promise theory as a model of voluntary cooperation between agents.
The current version of CFEngine incorporates Promise theory and uses agents running on each server
that pull the configuration from a central repository. It requires some expert knowledge to deal with
configurations, so it's best suited for technical people.
Chef
"A systems integration framework, built to bring the benefits of configuration management to your
entire infrastructure." —GitHub repository
Chef uses "recipes" written in Ruby to keep your infrastructure running up-to-date and compliant. The
recipes describe a series of resources that should be in a particular state. Chef can run in client/server
mode or in a standalone configuration named chef-solo. It has good integration with the major cloud
providers to automatically provision and configure new machines.
Chef has a solid user base and provides a full toolset to allow people with different technical
backgrounds and skills to interact around the recipes. But, at its base, it is a more technically oriented tool.
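As a small, hedged illustration (the file path and content are invented for this example), a single recipe can be written and applied locally with chef-apply, without a Chef server:
cat > hello.rb <<'EOF'
# minimal Chef recipe: declare the desired state of one resource
file '/tmp/hello.txt' do
  content 'configured by Chef'
end
EOF
chef-apply hello.rb    # converge the local machine to the state declared in the recipe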
Puppet
"Puppet, an automated administrative engine for your Linux, Unix, and Windows systems, performs
administrative tasks (such as adding users, installing packages, and updating server configurations)
based on a centralized specification." —GitHub repository
Website
Documentation
Community
Conceived as a tool oriented toward operations and sysadmins, Puppet has consolidated as a
configuration management tool. It usually works in a client-server architecture, and an agent
communicates with the server to fetch configuration instructions.
Puppet uses a declarative language, or Ruby, to describe the system configuration. It is organized in modules, and manifest files contain the desired-state goals to keep everything as required. In the usual agent/master setup, agents pull their configuration from the server at regular intervals; push-style runs can also be triggered when needed.
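A brief sketch of both modes (manifest.pp is a placeholder manifest file):
$ puppet apply manifest.pp                      # apply a manifest locally, without a server
$ puppet agent --test                           # in client-server mode, fetch and apply the catalog once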
Salt
"Software to automate the management and configuration of any infrastructure or application at
scale." — GitHub repository
Website
Documentation
Community
Salt was created for high-speed data collection and execution, and to scale to tens of thousands of servers and beyond. It uses Python modules to handle configuration details and specific actions. These modules manage all of Salt's remote execution and state management behavior. Some level of technical skill is required to configure the modules.
Salt uses a client-server topology (with the Salt master as server and Salt minions as clients).
Configurations are kept in Salt state files, which describe everything required to keep a system in the
desired state.
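A minimal sketch of the master driving its minions (the '*' target matches every minion):
$ salt '*' test.ping                            # check that all minions respond
$ salt '*' state.apply                          # apply the configured Salt states to all minions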
After the basics of all three concepts, it's essential to understand how these three processes relate to
each other.
Linux OS Introduction
Linux is a family of open-source Unix-like operating systems based on the Linux kernel. It was initially released by Linus Torvalds on September 17, 1991. It is a free and open-
source operating system and the source code can be modified and distributed to anyone
commercially or noncommercially under the GNU General Public License.
Initially, Linux was created for personal computers and gradually it was used in other machines like
servers, mainframe computers, supercomputers, etc. Nowadays, Linux is also used in embedded
systems like routers, automation controls, televisions, digital video recorders, video game consoles,
smartwatches, etc. The biggest success of Linux is Android, an operating system based on the Linux kernel, which runs on smartphones and tablets. Because of Android, Linux has the largest installed base of all general-purpose operating systems. Linux is generally packaged in a Linux
distribution.
Linux Distribution
A Linux distribution is an operating system made up of a collection of software based on the Linux kernel; in other words, a distribution contains the Linux kernel plus supporting libraries and software. You can get a Linux-based operating system by downloading one of the Linux distributions, and these distributions are available for different types of devices like embedded devices, personal computers, etc. Around 600+ Linux distributions are available, and some of the popular ones are:
MX Linux
Manjaro
Linux Mint
elementary
Ubuntu
Debian
Solus
Fedora
openSUSE
Deepin
ARCHITECTURE OF LINUX
Linux architecture has the following components:
1. Kernel: Kernel is the core of the Linux based operating system. It virtualizes the common
hardware resources of the computer to provide each process with its virtual resources. This
makes the process seem as if it is the sole process running on the machine. The kernel is also
responsible for preventing and mitigating conflicts between different processes. Different types
of the kernel are:
Monolithic Kernel
Hybrid kernels
Exo kernels
Micro kernels
2. System Library: These are the special types of functions that are used to implement the functionality of the operating system.
3. Shell: It is an interface to the kernel which hides the complexity of the kernel's functions from the users. It takes commands from the user and executes the kernel's functions.
4. Hardware Layer: This layer consists of all peripheral devices like RAM, HDD, CPU, etc.
5. System Utility: It provides the functionalities of an operating system to the user.
Advantages of Linux
The main advantage of Linux is that it is an open-source operating system. This means the source code is easily available for everyone, and you are allowed to contribute to, modify, and distribute the code to anyone without any permissions.
In terms of security, Linux is considered more secure than most other operating systems. This does not mean Linux is 100 percent secure; there is some malware for it, but it is less vulnerable than most other operating systems, so it generally does not require anti-virus software.
The software updates in Linux are easy and frequent.
Various Linux distributions are available so that you can use them according to your
requirements or according to your taste.
Linux is freely available to use on the internet.
It has large community support.
It provides high stability. It rarely slows down or freezes and there is no need to reboot it after a
short time.
It maintains the privacy of the user.
The performance of the Linux system is much higher than other operating systems. It allows a
large number of people to work at the same time and it handles them efficiently.
It is network friendly.
The flexibility of Linux is high. There is no need to install a complete Linux suite; you are
allowed to install only required components.
Linux is compatible with a large number of file formats.
It is fast and easy to install from the web. It can also be installed on almost any hardware, even on an old computer system.
It performs all tasks properly even if it has limited space on the hard disk.
Disadvantages of Linux
It is not very user-friendly. So, it may be confusing for beginners.
It has fewer peripheral hardware drivers compared to Windows.
Is There Any Difference between Linux and Ubuntu?
The answer is YES. The main difference between Linux and Ubuntu is that Linux is the family of open-source operating systems based on the Linux kernel, whereas Ubuntu is a free, open-source operating system and a Linux distribution based on Debian. In other words, Linux is the core system and Ubuntu is a distribution of Linux. Linux was developed by Linus Torvalds and released in 1991, while Ubuntu was developed by Canonical Ltd. and released in 2004.
In the context of DevOps, here are some additional command-line utilities commonly used in
Linux:
1. git: A version control system used for tracking changes in source code. It allows collaboration,
branching, merging, and version management. Commands like git clone, git commit, git push, and git
pull are commonly used.
2. docker: A containerization platform that allows you to build, deploy, and manage containers. It
provides commands like docker build, docker run, and docker-compose for container management.
3. kubectl: The command-line interface for managing Kubernetes clusters. It allows you to deploy and
manage containerized applications, check cluster status, and interact with various Kubernetes
resources.
4. ansible: An automation tool used for configuration management, application deployment, and
orchestration. It enables the automation of tasks across multiple servers with simple and declarative
scripts called "playbooks."
5. terraform: A tool for provisioning and managing infrastructure as code. It allows you to define and
create infrastructure resources in various cloud platforms and data centers using a declarative
language.
6. curl: A command-line tool used for making HTTP requests and interacting with APIs. It can be used
to test API endpoints, download files, and perform various web-related operations.
7. jq: A lightweight command-line tool for parsing and manipulating JSON data. It allows you to extract,
filter, and transform JSON data, making it useful for processing API responses and working with
JSON files.
8. awk: A powerful text processing utility that enables pattern scanning and text manipulation. It is
commonly used for data extraction, transformation, and reporting tasks.
9. sed: A stream editor used for performing text transformations on input streams or files. It is
particularly useful for search and replace operations, text manipulation, and basic scripting.
10. ssh-keygen: A utility used for generating SSH key pairs. SSH keys are essential for secure remote
access and authentication to servers and systems.
11. systemctl: A command-line utility for managing system services in Linux. It allows you to start, stop,
restart, enable, or disable system services and view their status.
12. top and htop: These utilities provide real-time monitoring of system resources, including CPU,
memory, and process information. They are useful for monitoring system performance and identifying
resource-intensive processes.
13. traceroute: A command-line tool used to trace the path packets take from your computer to a
destination IP address or domain. It helps diagnose network connectivity issues and identify network
hops.
14. nc/netcat: A versatile networking utility that can be used for various purposes, such as establishing
TCP/UDP connections, port scanning, and transferring data between systems.
These are just a few examples of command-line utilities that are frequently used in DevOps
workflows. The choice of utilities may vary depending on the specific requirements of the DevOps
tasks and the technologies being used.
Linux Administration
Linux is a major strength in computing technology. Most web servers, mobile phones,
personal computers, supercomputers, and cloud servers are powered by Linux. The job of
a Linux systems administrator is to manage the operations of a computer system like
maintaining, enhancing, creating user accounts/reports, and taking backups using Linux
tools and command-line interface tools. Most computing devices are powered by Linux
because of its high stability, high security, and open-source environment. There are some
of the things that a Linux system administrator should know and understand:
Linux File Systems
A Linux system administrator should have a solid knowledge and understanding of the
various Linux file systems used by Linux like Ext2, Ext3, and Ext4. Understanding the
difference between these file systems is important so that one can easily perform tasks and
partition disks or configure Linux file system permissions.
Detecting and solving the service problems ranging from disaster recovery to login
problems.
Installing the necessary systems and security tools. Working with the Data Network Engineer and other personnel/departments to analyze hardware requirements and make procurement recommendations.
Troubleshooting when a problem occurs on the server.
Steps to Start the Career as Linux System Administrator:
Install and learn to use Linux environment.
Get Certified in Linux administration.
Learn to do Documentation.
Joining up with a local Linux Users Group or Community for Support and Help
In short, the main role of the Linux Systems Administrator is to manage operations such as installing and monitoring software and hardware systems and taking backups, along with the ability to explain the underlying technology in depth. Even entry-level professionals have good prospects for the position of System Administrator, with a yearly median salary of around INR 3 lakh; the salary increases with job experience. To gain that experience, keep up with the latest skills and learning in the Linux community.
Environment Variables
Environment variables or ENVs basically define the behavior of the environment. They
can affect ongoing processes or the programs that are executed in the environment. The scope of any variable is the region from which it can be accessed or over which it is defined. An environment variable in Linux can have global or local scope.
Global
A globally scoped ENV that is defined in a terminal can be accessed from anywhere in
that particular environment which exists in the terminal. That means it can be used in all
kind of scripts, programs or processes running in the environment bound by that terminal.
Local
A locally scoped ENV that is defined in a terminal cannot be accessed by programs or processes started from that terminal. It can only be accessed by the terminal (shell) in which it was defined.
SYNTAX:
$NAME
NOTE: Both local and global environment variables are accessed in the same way.
SYNTAX:
$ echo $NAME
To display all the Linux ENVs
SYNTAX:
$ printenv //displays all the global ENVs
or
$ set //display all the ENVs(global as well as local)
or
$ env //display all the global ENVs
To set an ENV for the current shell session:
SYNTAX:
$ NAME=Value
EXAMPLE:
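(MYVAR is just an illustrative name)
$ MYVAR=hello
$ echo $MYVAR
hello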
To unset an ENV:
SYNTAX:
$ unset NAME
or
$ NAME=''
EXAMPLE:
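(continuing with the illustrative variable MYVAR)
$ unset MYVAR
$ echo $MYVAR
The echo now prints an empty line, since MYVAR is no longer set.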
NOTE: To unset permanent ENVs, you need to re-edit the files and remove the lines that
were added while defining them.
Networking
Every computer is connected to some other computer through a network, whether internally or externally, to exchange information. This network can be as small as a few computers connected in your home or office, or as large and complicated as a big university network or the entire Internet.
ip: It is the modern replacement for the ifconfig command. Like ifconfig, it provides information about every network interface, and it can also be used to get information about a particular interface.
Syntax:
1. ip a
2. ip addr
traceroute: The traceroute command is one of the most helpful commands in the networking field. It is used to troubleshoot the network: it measures the delay at each hop and shows the pathway to our target. Basically, it aids in the below ways:
o It determines where network latency occurs and reports it.
o It follows the path to the destination.
o It gives the names of, and identifies, all devices on the path.
Syntax:
1. traceroute <destination>
tracepath: The tracepath command is similar to the traceroute command and is also used to find network delays. However, it does not need root privileges. By default, it comes pre-installed in Ubuntu. It traces the path to the destination and recognizes every hop in it. If the network is weak, it identifies the point at which the path degrades.
Syntax:
1. tracepath <destination>
ping: It is short for Packet Internet Groper. The ping command is one of the widely used
commands for network troubleshooting. Basically, it inspects the network connectivity
between two different nodes.
Syntax:
1. ping <destination>
netstat: It is short for network statistics. It gives statistics for the network interfaces, including open sockets, connection information, and routing tables.
Syntax:
1. netstat
ss: This command is the substitute for the netstat command. The ss command is more informative and much faster than netstat, because it fetches its information directly from kernel space instead of parsing files under /proc.
Syntax:
1. ss
nslookup: The nslookup command is an older tool than the dig command. It is likewise used for diagnosing DNS-related problems.
Syntax:
1. nslookup <domainname>
dig: dig is short for Domain Information Groper. The dig command is an improved version of the nslookup command. It is used to query DNS name servers and to troubleshoot DNS-related problems. Mainly, it is used to verify DNS mappings, host addresses, MX records, and every other type of DNS record, giving a clear picture of the DNS topology.
Syntax:
1. dig <domainname>
route: The route command displays and manipulates the routing table of our system. The routing table is used to determine how packets are forwarded toward a destination.
Syntax:
1. route
host: The host command shows the IP address for a hostname and the domain name for an IP address. It is also used to perform DNS lookups when troubleshooting DNS-related issues.
Syntax:
1. host <domainname>
2. host -t <recordType> <domainname>
arp: The arp command is short for Address Resolution Protocol. This command is used to view and add entries in the kernel's ARP table.
Syntax:
1. arp
hostname: It is a simple command which is used to see and set the system's hostname.
Syntax:
1. hostname
curl and wget: These commands are used to download files from the internet via the CLI. curl must be given the -O option to save the file, while wget downloads it directly.
curl Syntax:
1. curl -O <fileLink>
wget Syntax:
1. wget <fileLink>
mtr: The mtr command is a mix of the traceroute and ping commands. It continuously displays information about the packets sent, including the response time of every hop, and is useful for investigating network problems.
Syntax:
1. mtr <path>
whois: The whois command fetches registration information about a website or domain, such as the owner and the registration details.
Syntax:
1. whois <websiteName>
ifplugstatus: The ifplugstatus command checks whether a cable is currently plugged into a
network interface. It is not available in Ubuntu directly. We can install it with the help of the
below command:
1. sudo apt-get install ifplugd
Syntax:
1. ifplugstatus
iftop: The iftop command is utilized in traffic monitoring.
tcpdump: The tcpdump command is widely used in network analysis along with the other Linux networking commands. It captures the traffic passing through a network interface and displays it. This kind of packet-level access is crucial when troubleshooting the network.
Syntax:
1. $ tcpdump -i <network_device>
Linux Server Installation
1. Choose a Linux Distribution: Select a Linux distribution suitable for server environments, such
as Ubuntu Server, CentOS, Debian, or Fedora Server. Visit the official website of the chosen
distribution and download the ISO image for the server edition.
2. Create Installation Media: Create a bootable installation media, such as a USB drive or DVD,
using the downloaded ISO image. You can use tools like Rufus (for Windows) or Etcher (for
Windows, macOS, and Linux) to create the bootable media.
3. Boot from Installation Media: Insert the installation media into the server's appropriate drive
(USB port or DVD drive) and restart the server. Ensure that the server is set to boot from the
installation media. You may need to change the boot order in the server's BIOS or UEFI
settings.
4. Start the Installation: Once the server boots from the installation media, you will be presented
with the Linux distribution's installer. Follow the on-screen instructions to proceed with the
installation.
5. Language and Keyboard Settings: Choose the language and keyboard layout for the
installation process.
6. Disk Partitioning: Select the disk or partition where you want to install the Linux server. You
can choose automatic partitioning or manual partitioning based on your requirements. If you
are unsure, automatic partitioning is generally recommended for beginners.
7. Set Hostname and Network Configuration: Provide a hostname for your server and configure
the network settings, including IP address, subnet mask, gateway, and DNS information. You
can choose DHCP if you want the server to obtain network settings automatically.
8. Set Time Zone: Select the appropriate time zone for your server.
9. Create User and Set Password: Create a user account for administering the server. Set a
strong password for the user.
10. Software Selection: Choose the packages and software you want to install on the server. For a
basic server setup, you can select options like SSH server, basic utilities, and possibly a web
server or database server if needed.
11. Begin Installation: Once you have configured the necessary settings, proceed with the
installation process. The installer will copy files, install packages, and configure the server.
12. Reboot and Login: After the installation completes, you will be prompted to reboot the
server. Remove the installation media and reboot the server. Once the server restarts, you can
log in with the user account you created during the installation process.
13. Post-Installation Configuration: After logging in, you may need to perform additional
configuration steps, such as updating packages, configuring firewall rules, setting up
additional services, and securing the server. Refer to the documentation and best practices
for your chosen distribution to ensure a secure and optimized server configuration.
Remember to consult the official documentation and guides specific to your chosen Linux
distribution for any distribution-specific installation instructions or recommendations.
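Building on step 13, a minimal post-installation sketch (assuming an Ubuntu Server system; package and service names differ on other distributions):
$ sudo apt update && sudo apt upgrade -y        # update the package lists and installed packages
$ sudo ufw allow OpenSSH && sudo ufw enable     # open SSH in the firewall and turn the firewall on
$ sudo systemctl enable --now ssh               # make sure the SSH service starts now and on every boot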
RPM Installation
1. Download the RPM Package: Locate the RPM package you want to install from a trusted
source or the official repository.
2. Open Terminal: Open a terminal or command-line interface on your Linux server.
3. Navigate to the Directory: Use the cd command to navigate to the directory where the RPM
package is located. For example:
cd /path/to/package/directory
4. Install the RPM Package: Use the rpm command with the -i flag to install the RPM package.
For example:
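sudo rpm -i package_name.rpm
(Here package_name.rpm stands for the actual file name of the package you downloaded.)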
5. Verify the Installation: After the installation completes, you can verify the installation by
running commands specific to the software you installed. Refer to the software's
documentation for the relevant commands.
YUM Installation
2. Update the System: It's a good practice to update the system before installing any new
software. Run the following command to update the package repositories:
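sudo yum update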
3. Search for the Package: Use the yum search command to search for the package you want to
install. For example, to search for the package named "package_name," use the following
command:
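yum search package_name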
4. Install the Package: Once you find the package you want to install, use the yum install
command to install it. For example, to install the package named "package_name," use the
following command:
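sudo yum install package_name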
5. Confirm Installation: YUM will prompt you to confirm the installation by displaying the
package details and asking for confirmation. Type 'y' and press Enter to proceed with the
installation.
6. Verify the Installation: After the installation completes, you can verify the installation by
running commands specific to the software you installed. Refer to the software's
documentation for the relevant commands.
Note: When using YUM, it automatically resolves dependencies and installs any necessary
dependencies for the software you want to install.
Remember to run the commands with administrative privileges (using sudo) to install
packages system-wide.
CHAPTER 2: Version Control-GIT
Introduction to GIT
Git is a popular version control system. It was created by Linus Torvalds in 2005, and has been
maintained by Junio Hamano since then.
It is used for:
Tracking code changes
Tracking who made changes
Coding collaboration
What does Git do?
Manage projects with Repositories
Clone a project to work on a local copy
Control and track changes with Staging and Committing
Branch and Merge to allow for work on different parts and versions of a project
Pull the latest version of the project to a local copy
Push local updates to the main project
Working with Git
Initialize Git on a folder, making it a Repository
Git now creates a hidden folder to keep track of changes in that folder
When a file is changed, added or deleted, it is considered modified
You select the modified files you want to Stage
The Staged files are Committed, which prompts Git to store a permanent snapshot of the
files
Git allows you to see the full history of every commit.
You can revert back to any previous commit.
Git does not store a separate copy of every file in every commit, but keeps track of changes
made in each commit!
Why Git?
Over 70% of developers use Git!
Developers can work together from anywhere in the world.
Developers can see the full history of the project.
Developers can revert to earlier versions of a project.
What is GitHub?
Git is not the same as GitHub.
GitHub makes tools that use Git.
GitHub is the largest host of source code in the world, and has been owned by Microsoft since
2018.
In this tutorial, we will focus on using Git with GitHub.
What is Git
Git is the most commonly used version control system. Git tracks the changes you make to files, so
you have a record of what has been done, and you can revert to specific versions should you ever
need to. Git also makes collaboration easier, allowing changes by multiple people to all be merged
into one source.
So regardless of whether you write code that only you will see, or work as part of a team, Git will be
useful for you.
Git is software that runs locally. Your files and their history are stored on your computer. You can
also use online hosts (such as GitHub or Bitbucket) to store a copy of the files and their revision
history. Having a centrally located place where you can upload your changes and download changes from others enables you to collaborate more easily with other developers. Git can automatically merge the changes, so two people can even work on different parts of the same file and later merge those changes without losing each other's work!
Ways to Use Git
Git is software that you can access via a command line (terminal) or via a desktop app with a GUI (graphical user interface), such as Sourcetree.
Git Repositories
A Git repository (or repo for short) contains all of the project files and the entire revision history.
You'll take an ordinary folder of files (such as a website's root folder), and tell Git to make it a repository. This creates a .git subfolder, which contains all of the Git metadata for tracking changes. On Unix-based operating systems such as macOS, files and folders that start with a period (.) are hidden, so you will not see the .git folder in the macOS Finder unless you show hidden files, but it's there! You might be able to see it in some code editors.
Stage & Commit Files
Think of Git as keeping a list of changes to files. So how do we tell Git to record our changes? Each
recorded change to a file or set of files is called a commit.
Before we make a commit, we must tell Git what files we want to commit. This is called staging and
uses the add command. Why must we do this? Why can't we just commit the file directly? Let's say you're working on two files, but only one of them is ready to commit. You don't want to be forced to commit both files, just the one that's ready. That's where Git's add command comes in. We add
files to a staging area, and then we commit the files that have been staged.
Remote Repositories (on GitHub & Bitbucket)
Storing a copy of your Git repo with an online host (such as GitHub or Bitbucket) gives you a
centrally located place where you can upload your changes and download changes from others,
letting you collaborate more easily with other developers. After you have a remote repository set up,
you upload (push) your files and revision history to it. After someone else makes changes to a remote
repo, you can download (pull) their changes into your local repo.
Pull Requests
Pull requests are a way to discuss changes before merging them into your codebase. Let's say you're managing a project. A developer makes changes on a new branch and would like to merge that branch into the master. They can create a pull request to notify you to review their code. You can discuss the changes and decide whether you want to merge them or not.
Version control lets you share your work with your team and adapt to their changes. It helps you merge different requests into the main repository without introducing undesirable changes. You can test functionality without putting it live, and you don't need to download and set everything up each time; you just pull the changes, make your changes, test them, and merge them back.
The benefit of CVCS (Centralized Version Control Systems) is that it enables collaboration amongst developers while providing, to a certain extent, insight into what everyone else is doing on the project. It also allows administrators fine-grained control over who can do what.
CVCS has some downsides as well, which led to the development of DVCS. The most obvious is the single point of failure that the centralized repository represents: if it goes down, then during that period collaboration and saving versioned changes are not possible. What if the hard disk of the central database becomes corrupted, and proper backups haven't been kept? You lose absolutely everything.
Distributed Version Control Systems: Distributed version control systems contain multiple
repositories. Each user has their own repository and working copy. Just committing your changes
will not give others access to your changes. This is because commit will reflect those changes in
your local repository and you need to push them in order to make them visible on the central
repository. Similarly, when you update, you do not get others' changes unless you have first pulled
those changes into your repository.
To make your changes visible to others, 4 things are required:
You commit
You push
They pull
They update
The most popular distributed version control systems are Git, and Mercurial. They help us overcome
the problem of single point of failure.
Benefits of CVCS
Easy to learn and manage
Works well with binary files
More control over users and their access.
CVS and SVN are some conventional Central Version Control systems.
Drawbacks of CVCS
It is not available locally, which means we must always connect to the network to perform any operation.
If the central server crashes during an operation, there is a high chance of losing the data.
For every command, CVCS contacts the central server, which impacts the speed of operations.
The Distributed Version Control System is developed to overcome all these issues.
In DVCS, we do not depend on a central server holding the only copy of the data. Instead, we clone the remote repository to our local machine, and that clone includes a full snapshot of the project history.
The user needs to commit for the changes to be reflected in the local repository, and can then push the changes to the central repository. If other users want to see the changes, they pull from the updated central repository into their local repository and then update their local working copy.
Benefits of DVCS
Except for pushing and pulling the code, the user can work offline in DVCS
DVCS is fast compared to CVCS because you don't have to contact the central server for
every command
Merging and branching the changes in DVCS is very easy
Performance of DVCS is better
Even if the main server crashes, code will be stored in the local systems
Git and Mercurial are standard distributed version control systems. If we don't want to host a central repository on our own server, we can use either GitHub or Bitbucket to store it, and we can clone the central repository to our local systems. GitHub and Bitbucket are the most popular companies that provide cloud hosting for software development version control using Git.
Version control is a fundamental concept for most companies, and its crucial role in DevOps cannot be overlooked.
Maintains a history of every version.
Following are the types of VCS −
Centralized version control system (CVCS).
Distributed/Decentralized version control system (DVCS).
In this chapter, we will concentrate only on distributed version control system and especially on Git.
Git falls under distributed version control system.
Distributed Version Control System
Centralized version control system (CVCS) uses a central server to store all files and enables team
collaboration. But the major drawback of CVCS is its single point of failure, i.e., failure of the central
server. Unfortunately, if the central server goes down for an hour, then during that hour, no one can
collaborate at all. And even in a worst case, if the disk of the central server gets corrupted and proper
backup has not been taken, then you will lose the entire history of the project. Here, distributed
version control system (DVCS) comes into picture.
DVCS clients not only check out the latest snapshot of the directory but they also fully mirror the
repository. If the server goes down, then the repository from any client can be copied back to the
server to restore it. Every checkout is a full backup of the repository. Git does not rely on the central
server and that is why you can perform many operations when you are offline. You can commit
changes, create branches, view logs, and perform other operations when you are offline. You require
network connection only to publish your changes and take the latest changes.
Advantages of Git
Free and open source
Git is released under the GPL open source license and is available freely over the internet. You can use Git to manage proprietary projects without paying a single penny. As it is open source, you can download its source code and also modify it according to your requirements.
Fast and small
As most of the operations are performed locally, it gives a huge benefit in terms of speed. Git does not
rely on the central server; that is why, there is no need to interact with the remote server for every
operation. The core part of Git is written in C, which avoids runtime overheads associated with other
high-level languages. Though Git mirrors the entire repository, the size of the data on the client side is small. This illustrates the efficiency of Git at compressing and storing data on the client side.
Implicit backup
The chances of losing data are very rare when there are multiple copies of it. Data present on any
client side mirrors the repository, hence it can be used in the event of a crash or disk corruption.
Security
Git uses a cryptographic hash function called SHA-1 (Secure Hash Algorithm 1) to name and identify objects within its database. Every file and commit is checksummed and retrieved by its checksum at the time of checkout. This means it is practically impossible to change a file, date, commit message, or any other data in the Git database without Git noticing.
No need of powerful hardware
In case of CVCS, the central server needs to be powerful enough to serve requests of the entire team.
For smaller teams, it is not an issue, but as the team size grows, the hardware limitations of the server
can be a performance bottleneck. In case of DVCS, developers don't interact with the server unless
they need to push or pull changes. All the heavy lifting happens on the client side, so the server
hardware can be very simple indeed.
Easier branching
CVCS does not use a cheap-copy mechanism: if we create a new branch, it copies all of the code into the new branch, which is time-consuming and not efficient. Deletion and merging of branches in CVCS is also complicated and time-consuming. Branch management with Git, however, is very simple. It takes only a few seconds to create, delete, and merge branches.
DVCS Terminologies
Local Repository
Every VCS tool provides a private workplace as a working copy. Developers make changes in their
private workplace and after commit, these changes become a part of the repository. Git takes it one
step further by providing them a private copy of the whole repository. Users can perform many
operations with this repository such as add file, remove file, rename file, move file, commit changes,
and many more.
Working Directory and Staging Area or Index
The working directory is the place where files are checked out. In other CVCS, developers generally
make modifications and commit their changes directly to the repository. But Git uses a different
strategy. Git doesn't track each and every modified file. Whenever you do a commit operation, Git
looks for the files present in the staging area. Only those files present in the staging area are
considered for commit and not all the modified files.
Let us see the basic workflow of Git.
Step 1 − You modify a file from the working directory.
Step 2 − You add these files to the staging area.
Step 3 − You perform the commit operation, which moves the files from the staging area into your local repository. After a push operation, the changes are stored permanently in the remote Git repository.
Suppose you modified two files, namely "sort.c" and "search.c", and you want a separate commit for each of them. You can add one file to the staging area and commit it. After the first commit, repeat the same procedure for the other file.
# First commit
[bash]$ git add sort.c
[bash]$ git commit -m "Add sort operation"

# Second commit
[bash]$ git add search.c
[bash]$ git commit -m "Add search operation"
GIT Command Line
Git in the Command Line
What is Git?
Git is a free and open source version control system used for managing projects, for example on GitHub. It offers many more commands and much more flexibility than GitHub's online interface does.
Note: This tutorial assumes some knowledge of the basic commands of Github. To refresh
on those, refer to GitHub Basics.
Installing Git
To install Git, follow the instructions on the official Git download page. For Windows, when installing Git through the installer, it is recommended you select the "Use Git from the Windows Command Prompt" option. This will allow you to use all Git commands through your terminal (CMD, PowerShell, Anaconda) rather than having to use Git's own terminal, Git Bash.
Using the Command Line
If you are already familiar with using the command prompt, feel free to skip this section.
The command prompt/terminal is another way of interfacing with your computer, rather than
the way you typically would use a computer by clicking different buttons. While the terminal
can be confusing at first, and requires some memorization of some commands, it provides a
lot of power for using your computer in different ways. Knowing how to use the terminal
opens a lot of new doors and can ultimately make using your computer much easier and
more accessible.
Note that there are a few terminal options for Windows, such as CMD, Windows PowerShell,
and the Anaconda Prompt. This tutorial will use PowerShell, as it is most similar to the Mac
terminal, and has all the necessary functionality. However, all the other terminal options
should accomplish everything you want, just with slightly different commands.
To open your terminal in Windows, search for "PowerShell" in your programs. On Mac, just search for "Terminal" in your programs. A terminal prompt should open up.
Within this prompt, you will run commands by typing them directly and hitting "Enter". The path listed before where you type is the directory you are currently working in on your computer. In order to run commands on a specific folder (such as your subteam's repository), you will need to navigate to that folder. The two commands to do this are:
ls (list): Lists all files in the current directory you are in
cd (change directory): changes your directory to the directory listed after cd
For example, say I want to move to the "aguaclara_demo" repository to make some Git changes. First I use ls to see which folders I can change into:
I see the CS folder, so I use cd CS to move to that folder. From there I continue to use these
commands until I find the folder I am looking for.
If you ever need to move up one folder, you can use cd .. to accomplish this. If you know the file path of the folder you want, you can also pass that path directly to cd to move to your desired folder, for example moving straight to the aguaclara_demo folder.
One other important note for using the terminal is to always wait for the commands to finish
running. Sometimes when running a more complicated command, the computer will take a
while to run, and the terminal will slowly show commands as they run. In this case, make
sure to wait until all the commands are done running, and you can see the blinking cursor
before you type another command. If you ever want to stop a command while it’s running,
hold down the Control key and hit "c".
To check how your local progress compares to the digital collection on GitHub, just type "git status" while in the folder you want to check.
Pulling
Most of what you will be doing with Git is pulling and pushing changes to and from GitHub. To pull, just use the command git pull.
Pushing
To push your local changes, first stage your changes, then commit them to your branch, and
then push them to the origin.
To stage your changes, use git add -A. The -A flag ensures you add all of the files you have worked on.
To commit your changes, use the command git commit -m "Commit Message" and fill in the
commit message with whatever you want to say about your commit. Note that it is very
important to include the -m and the commit message. If you do not, Git will take you to an
interface using the text editor Vim, which is very challenging to use.
If you happen to accidentally type git commit without the -m and the commit message and
get taken to Vim, you can still write your commit message. Use your arrow keys to scroll
up to the top line where it is blank. Write your commit message, then to exit out of this
editor, press Escape. Your cursor should appear in the bottom left corner. From there
type :x and hit enter to save your commit message.
Finally, to push your changes, use git push. If there are any merge conflicts, the terminal will notify you and you can fix them manually.
Installing Git
Installing on Linux
Installing on Windows
Initial setup
Git Essentials
Creating repository
Creating a Repository
Create a folder on your desktop and name it "My Repo."
Create a New Folder
Next, create a text file containing this line of code in the folder:
print("Hello World!")
Name the file hello_world.py.
Adding Code to the Repo
You now have some code in a folder. Next, you'll turn it into a repository.
In the command line/terminal, go to the "My Repo" folder.
Command Line in Folder Location
Before using Git, you’ll need to initialize, or configure, it by specifying who you are. This will
be used in the version history so it’s clear who is making the contribution to the code. Run
the command:
git config --global user.email "your email here"
Replace "your email here" with your own email address; the quotation marks themselves are not needed as part of the actual command you will run.
Configuring Your User
Now, programmatically convert the folder into a repository, or initialize the repository. This is
simple to do. Simply run:
git init
Congratulations! You have successfully initialized a repo. Now, add your code into it.
Initializing Your Repository Using Git Init
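As a minimal sketch of that next step (the commit message is just an example, and you may also want to set user.name the same way you set user.email):
git add hello_world.py
git commit -m "Add hello world script"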
Sometimes, non-technical people or people who have not yet worked with Git consider these two terms (Git Clone and Git Fork) to be the same. Actually, they are similar, but with some differences. It is better to refresh your understanding of forking before learning the concept of cloning in Git.
Also, since the basics of Git and GitHub have already been covered in this course, from now on we will use both of them to perform the operations and procedures on our code/files. Broadly, this tutorial will make you familiar with:
What is Cloning?
Purpose of Cloning
Importance of Cloning in Git
Cloning a repository and Git Clone command
What is Git Clone or Cloning in Git?
Cloning is a process of creating an identical copy of a Git Remote Repository to the local machine.
Now, you might wonder, that is what we did while forking the repository!!
When we clone a repository, all the files are downloaded to the local machine but the remote git
repository remains unchanged. Making changes and committing them to your local repository (cloned
repository) will not affect the remote repository that you cloned in any way. These changes made on
the local machine can be synced with the remote repository anytime the user wants.
Since cloning is such an important part of the Git and GitHub journey, it is worth seeing in detail how cloning works. It is a very simple and straightforward process, to which the next section is dedicated.
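As a quick sketch, cloning and entering a repository looks like this (the URL is a placeholder for any repository you have access to):
git clone https://github.com/<username>/<repository>.git
cd <repository>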
How does Cloning in Git work?
A lot of people want to set up a shared repository to allow a team of developers to publish their code
on GitHub / GitLab / BitBucket etc. A repository that is uploaded online for collaboration is called
an Upstream Repository or a Central Repository.
A central repository means that all the changes from all the contributors are pushed into this repository only, so it is the most up-to-date instance of the project. Sometimes it is also called the original repository. The cloning process works in these steps:
Clone a Repository: The user starts from the upstream repository on GitHub, usually because they are interested in the project and would like to contribute. The process starts when they clone the repository onto their local machine. They then have an exact copy of the project files on their system in which to make changes.
Make the desired changes: After cloning, contributors make their contribution to the repository by editing the source files, resulting in either a bug fix, added functionality, or perhaps optimized code. The bottom line is that everything happens on their local system.
Pushing the Changes: Once the changes are done, the modifications can be pushed to the upstream repository.
Fetch pull and remote
Git Fetch is the command that tells the local repository that there are changes
available in the remote repository without bringing the changes into the local
repository. Git Pull, on the other hand, brings the remote repository's changes into the local repository. Let us look at Git Fetch and Git Pull separately
with the help of an example.
Git Fetch
Let us create a file called demo.txt with "Hello Geeks" as its content, initialize the directory as a Git repository, and push the changes to a remote repository.
git init
git add <Filename>
git commit -m <Commit Message>
git remote add origin <Link to your remote repository>
git push origin <branch name>
The local and the remote repositories are now in sync and have the same content at
both places. Let’s now update our demo.txt in the remote repository.
Now since we have updated our demo.txt remotely, let’s bring the changes to our
local repository. Our local repository has only 1 commit while the remote repository
now has 2 commits (observe the second commit starting from 4c4fcb8). Let’s use
the git fetch command to see in the local repository whether we have a change in
the remote repository or not. Before that let’s use the git log command to see our
previous commits.
We can see that after using git fetch we get the information that there is some
commit done in the remote repository. (notice the 4c4fcb8 which is the initials of our
2nd commit in a remote repository). To merge these changes into our local
repository, we need to use the git merge origin/<branch name> command.
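Put together as commands, the fetch-then-merge sequence described above looks like this (origin and <branch name> stand in for your remote and branch names):
git fetch origin
git log
git merge origin/<branch name>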
Let us have a look at our commits in the local repository using the git log command.
And we got our remote repository commit in our local repository. This is how git
fetch works. Let us now have a look at the git pull command.
Git Pull
Let’s make more changes to our demo.txt file at the remote repository.
Now, we have 3 commits at our remote repository whereas 2 commits at our local
repository. (Notice the third commit starting with 09d828f). Let us now bring this
change to our local repository using the git pull origin <branch name> command.
We can see that with the help of just git pull command we directly fetched and
merged our remote repository with the local repository.
git pull = git fetch + git merge
Let us see what our demo.txt in the local repository looks like –
And now our remote and local repositories are again in sync with each other. So,
from the above examples, we can conclude that –
Difference Table
Branching
A branch is a version of the repository that diverges from the main working project. It is a
feature available in most modern version control systems. A Git project can have more than
one branch. These branches are pointers to snapshots of your changes. When you want to add a new feature or fix a bug, you spawn a new branch to encapsulate your changes. This makes it harder for unstable code to get merged into the main code base, and it gives you the chance to clean up your history before merging it into the main branch.
You can create a new branch with the help of the git branch command. This command will
be used as:
Syntax:
1. $ git branch <branch name>
Output:
You can List all of the available branches in your repository by using the following command.
Either the git branch --list command or the git branch command can be used to list the available branches in the repository.
Syntax:
1. $ git branch --list
or
1. $ git branch
Output:
Here, both commands list the available branches in the repository. The * symbol indicates the currently active branch.
You can delete a specified branch with the -d option. It is a safe operation: Git prevents you from deleting the branch if it has unmerged changes. Below is the command to do this.
Syntax:
1. $ git branch -d <branch name>
Output:
This command will delete the existing branch B1 from the repository.
The git branch -d command has a counterpart: git branch -D. The git branch -D command force-deletes the specified branch even if it contains unmerged changes.
1. $ git branch -D <branch name>
You can delete a remote branch from Git desktop application. Below command is used to
delete a remote branch:
Syntax:
1. $ git push origin --delete <branch name>
Output:
As you can see in the above output, the remote branch named branch2 from my GitHub
account is deleted.
Git allows you to switch between the branches without making a commit. You can switch
between two branches with the git checkout command. To switch between the branches,
below command is used:
1. $ git checkout <branch name>
Switch from master Branch
You can switch from master to any other branch available on your repository without making
any commit.
Syntax:
1. $ git checkout <branch name>
Output:
As you can see in the output, branches are switched from master to branch4 without
making any commit.
Switch to master branch
You can switch to the master branch from any other branch with the help of the below command.
Syntax:
1. $ git checkout master
Output:
As you can see in the above output, branches are switched from branch1 to master without
making any commit.
We can rename the branch with the help of the git branch command. To rename a branch,
use the below command:
Syntax:
1. $ git branch -m <old branch name> <new branch name>
Output:
Git allows you to merge the other branch with the currently active branch. You can merge
two branches with the help of git merge command. Below command is used to merge the
branches:
Syntax:
1. $ git merge <branch name>
Output:
From the above output, you can see that the master branch was merged with renamedB1. Since I made no commits before merging, the output shows "Already up to date."
Let’s go through a simple example of branching and merging with a workflow that you might
use in the real world. You’ll follow these steps:
1. Do some work on a website.
2. Create a branch for a new user story you’re working on.
3. Do some work in that branch.
At this stage, you’ll receive a call that another issue is critical and you need a hotfix. You’ll do
the following:
1. Switch to your production branch.
2. Create a branch to add the hotfix.
3. After it’s tested, merge the hotfix branch, and push to production.
4. Switch back to your original user story and continue working.
Basic Branching
First, let’s say you’re working on your project and have a couple of commits already on
the master branch.
Figure 19. Creating a new branch pointer
You work on your website and do some commits. Doing so moves the iss53 branch
forward, because you have it checked out (that is, your HEAD is pointing to it):
$ vim index.html
$ git commit -a -m 'Create new footer [issue 53]'
Figure 20. The iss53 branch has moved forward with your work
Now you get the call that there is an issue with the website, and you need to fix it
immediately. With Git, you don’t have to deploy your fix along with the iss53 changes
you’ve made, and you don’t have to put a lot of effort into reverting those changes before you
can work on applying your fix to what is in production. All you have to do is switch back to
your master branch.
However, before you do that, note that if your working directory or staging area has
uncommitted changes that conflict with the branch you’re checking out, Git won’t let you
switch branches. It’s best to have a clean working state when you switch branches. There
are ways to get around this (namely, stashing and commit amending) that we’ll cover later
on, in Stashing and Cleaning. For now, let’s assume you’ve committed all your changes, so
you can switch back to your master branch:
$ git checkout master
Switched to branch 'master'
At this point, your project working directory is exactly the way it was before you started
working on issue #53, and you can concentrate on your hotfix. This is an important point to
remember: when you switch branches, Git resets your working directory to look like it did the
last time you committed on that branch. It adds, removes, and modifies files automatically to
make sure your working copy is what the branch looked like on your last commit to it.
Next, you have a hotfix to make. Let’s create a hotfix branch on which to work until it’s
completed:
$ git checkout -b hotfix
Switched to a new branch 'hotfix'
$ vim index.html
$ git commit -a -m 'Fix broken email address'
[hotfix 1fb7853] Fix broken email address
1 file changed, 2 insertions(+)
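To release the fix, you would then switch back to master and merge in the hotfix branch (a fast-forward merge, as the next figure shows):
$ git checkout master
$ git merge hotfix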
Figure 22. master is fast-forwarded to hotfix
After your super-important fix is deployed, you’re ready to switch back to the work you were
doing before you were interrupted. However, first you’ll delete the hotfix branch, because
you no longer need it — the master branch points at the same place. You can delete it with
the -d option to git branch:
$ git branch -d hotfix
Deleted branch hotfix (3a0874c).
Now you can switch back to your work-in-progress branch on issue #53 and continue
working on it.
$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'Finish the new footer [issue 53]'
[iss53 ad82d7a] Finish the new footer [issue 53]
1 file changed, 1 insertion(+)
Merging the branches
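Suppose you have decided that your issue #53 work is complete and ready to be merged into master. Because master has diverged since the branch was created, Git performs a three-way merge and creates a merge commit; the file summary below is illustrative:
$ git checkout master
Switched to branch 'master'
$ git merge iss53
Merge made by the 'recursive' strategy.
 index.html | 1 +
 1 file changed, 1 insertion(+)
Now that the work is merged in, you no longer need the iss53 branch and can delete it:
$ git branch -d iss53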
CHAPTER 3 : Chef for configuration management
Let us take an example: suppose you are a system engineer in an organization and you want to deploy or update software or an operating system on hundreds of systems in your organization in one day. This can be done manually, but manual work causes multiple errors, some software may crash while updating, and you may not be able to revert to the previous version. To solve such issues, we use configuration management.
Configuration management keeps track of all the software- and hardware-related information of an organization, and it also repairs, deploys, and updates the entire application with its automated procedures. Configuration management does the work of multiple system administrators and developers who manage hundreds of servers and applications. Some tools used for configuration management are Chef, Puppet, Ansible, CFEngine, SaltStack, etc.
Consider another scenario: suppose you have shifted your office into a different environment and you want your system administrator to install, update, and deploy software on hundreds of systems overnight. When the system engineer does this task manually, it may cause human errors, and some software may not function properly. At this stage, we use Chef, a powerful automation tool that transforms infrastructure into code.
Chef automates application configuration, deployment, and management throughout the network, whether it runs on-premises, in the cloud, or in a hybrid environment. We can use Chef to speed up application deployment. Chef is a great tool for accelerating software delivery; here, the speed of software delivery refers to how quickly the software is able to change in response to new requirements or conditions.
Benefits of Chef
Accelerating software delivery: when your infrastructure is automated, all the software requirements such as testing and creating new environments for software deployments become faster.
Increased service resiliency: by making the infrastructure automated, it monitors for bugs and errors before they occur and can also recover from errors more quickly.
Risk management: Chef lowers risk and improves compliance at all stages of deployment. It reduces conflicts between the development and production environments.
Cloud adoption: Chef can be easily adapted to a cloud environment, and the servers and infrastructure can be configured, installed, and managed automatically by Chef.
Managing data centers and cloud environments: as discussed earlier, Chef can run on different platforms; with Chef you can manage all your cloud and on-premise platforms, including servers.
Streamlined IT operations and workflow: Chef provides a pipeline for continuous deployment, starting from building to testing and all the way through delivery, monitoring, and troubleshooting.
Features of Chef
Easily manage hundreds of servers with a handful of employees.
It can manage nodes running operating systems such as Linux, Windows, and FreeBSD.
It maintains a blueprint of the entire infrastructure.
It integrates with all major cloud service providers.
Centralized management, i.e., a single Chef server can be used as the center for deploying the policies.
Pros of Chef
One of the most flexible solutions for OS and middleware management.
Designed for programmers.
Chef offers hybrid and SaaS solutions for Chef Servers
Sequential execution order
Very stable, reliable and mature, especially for large deployments in both public and
private environments.
Cons of Chef
Steep learning curve.
Initial setup is complicated.
Lacks push, so there are no immediate actions on change; the pull process follows a specified schedule.
How Chef Works
Chef basically consists of three components: the Chef server, workstations, and nodes. The Chef server is the central hub of all operations, where changes are stored. The workstation is the place where all the code is created or changed. Nodes are the machines that are managed by Chef.
The user interacts with Chef and the Chef server through the Chef workstation. Knife and the Chef command-line tools are used for interacting with the Chef server. A Chef node is a physical, virtual, or cloud machine managed by Chef, and each node is configured by the Chef-Client installed on it. The Chef server stores every part of the configuration and ensures that all the elements are in the right place and working as expected.
Chef Components
Chef has major components such as the Workstation, Cookbook, Node, Chef-Client, and Chef-Server. Let us look at each major component in detail.
Chef Server
The Chef server contains all configuration data; it stores the cookbooks, recipes, and metadata that describe each node registered with Chef. Configuration details are delivered to a node through the Chef-Client. Any changes made must pass through the Chef server to be deployed. Before pushing the changes, it verifies that the nodes and workstations are paired with the server through the use of authorization keys, and then allows communication between workstations and nodes.
Workstation
The workstation is used to interact with the Chef server and with Chef nodes. It is also used to create cookbooks. The workstation is the place where all the interaction takes place: cookbooks are created, tested, and deployed, and code is tested. The workstation is also used for defining roles and environments based on the development and production environments. Some components of the workstation are:
Development Kit: contains all the packages required for using Chef.
Chef command-line tool: the place where cookbooks are created, tested, and deployed, and through which policies are uploaded to the Chef server.
Knife: used for interacting with Chef nodes.
Test Kitchen: used for validating Chef code.
Chef-Repo: a repository in which cookbooks are created, tested, and maintained through the Chef command-line tool.
Cookbooks
Cookbooks are written in the Ruby language, and a domain-specific language (DSL) is used for specific resources. A cookbook contains recipes, which specify the resources to be used and the order in which they are to be used. The cookbook contains all the details regarding the work, and it changes the configuration of the Chef node.
Attributes are used for overriding default settings on a node.
Files are used for transferring files from a cookbook's files subdirectory to a specific path on the chef-client.
Libraries are written in Ruby and are used for configuring custom resources and recipes.
Metadata contains the information needed for deploying the cookbooks to each node.
Recipes are configuration elements stored in a cookbook. Recipes can also be included in other recipes and are executed based on the run-list. Recipes are written in Ruby.
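As a small illustration (not taken from the original text), a recipe that uses Chef's built-in package, service, and file resources might look like this; the cookbook name and file content are assumptions for demonstration only:
# cookbooks/webserver/recipes/default.rb (illustrative)
package 'httpd' do
  action :install
end

service 'httpd' do
  action [:enable, :start]
end

file '/var/www/html/index.html' do
  content '<h1>Hello from Chef</h1>'
  mode '0644'
end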
Nodes
Nodes are managed by Chef, and each node is configured by installing the Chef-Client on it. Chef nodes can be physical machines, virtual machines, cloud instances, and so on.
Chef-Client is responsible for registering and authenticating the node, building the node object, and configuring the node. Chef-Client runs locally on every node to configure that node.
Ohai is used for determining the system state at the beginning of a Chef run on the Chef-Client. It collects all the system configuration data.
Roles of Chef in DevOps
Chef is used for automating and managing infrastructure. Chef IT automation is done using various Chef products such as the Chef server and the Chef client. Chef accelerates application delivery and DevOps collaboration. Chef helps solve problems by treating infrastructure as code: rather than manually changing anything, the machine setup is described in a Chef recipe.
Conclusion
Chef is a powerful configuration management tool in DevOps, and it has the features to be among the best in the market. Day by day, Chef keeps improving its features and delivering good results to customers. Chef is used by the world's leading IT organizations such as Facebook, AWS, and HP Public Cloud, and job opportunities for Chef automation specialists are increasing steadily.
Workstation Setup: How to configure knife; execute some commands to test the connection between knife and the workstation.
Chef follows the concept of client-server architecture, hence in order to start working with Chef one needs to set up Chef on the workstation and develop the configuration locally. Later it can be uploaded to the Chef server so that it takes effect on the Chef nodes, which need to be configured.
Opscode provides a fully packaged version, which does not have any external prerequisites.
This fully packaged Chef is called the omnibus installer.
On a Windows Machine
Step 1 − Download the ChefDK setup .msi file on the machine.
Step 2 − Follow the installation steps and install it at the target location.
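Once the installation finishes, it can be verified from a command prompt; these are standard ChefDK commands and simply print version information:
chef --version
knife --version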
Knife is Chef’s command-line tool to interact with the Chef server. One uses it for uploading
cookbooks and managing other aspects of Chef. It provides an interface between the chefDK
(Repo) on the local machine and the Chef server. It helps in managing −
Chef nodes
Cookbook
Recipe
Environments
Cloud Resources
Cloud Provisioning
Installation of the Chef client on Chef nodes
Knife provides a set of commands to manage Chef infrastructure.
Bootstrap Commands
knife bootstrap [SSH_USER@]FQDN (options)
Client Commands
knife client bulk delete REGEX (options)
knife client create CLIENTNAME (options)
knife client delete CLIENT (options)
knife client edit CLIENT (options)
knife client key delete CLIENT KEYNAME (options)
knife client key edit CLIENT KEYNAME (options)
knife client key list CLIENT (options)
knife client key show CLIENT KEYNAME (options)
knife client list (options)
knife client reregister CLIENT (options)
knife client show CLIENT (options)
Configure Commands
knife configure (options)
knife configure client DIRECTORY
Cookbook Commands
knife cookbook bulk delete REGEX (options)
knife cookbook create COOKBOOK (options)
knife cookbook delete COOKBOOK VERSION (options)
knife cookbook download COOKBOOK [VERSION] (options)
knife cookbook list (options)
knife cookbook metadata COOKBOOK (options)
knife cookbook metadata from FILE (options)
knife cookbook show COOKBOOK [VERSION] [PART] [FILENAME] (options)
knife cookbook test [COOKBOOKS...] (options)
knife cookbook upload [COOKBOOKS...] (options)
Cookbook Site Commands
knife cookbook site download COOKBOOK [VERSION] (options)
knife cookbook site install COOKBOOK [VERSION] (options)
knife cookbook site list (options)
knife cookbook site search QUERY (options)
knife cookbook site share COOKBOOK [CATEGORY] (options)
knife cookbook site show COOKBOOK [VERSION] (options)
knife cookbook site unshare COOKBOOK
Data Bag Commands
knife data bag create BAG [ITEM] (options)
knife data bag delete BAG [ITEM] (options)
knife data bag edit BAG ITEM (options)
knife data bag from file BAG FILE|FOLDER [FILE|FOLDER..] (options)
knife data bag list (options)
knife data bag show BAG [ITEM] (options)
Environment Commands
knife environment compare [ENVIRONMENT..] (options)
knife environment create ENVIRONMENT (options)
knife environment delete ENVIRONMENT (options)
knife environment edit ENVIRONMENT (options)
knife environment from file FILE [FILE..] (options)
knife environment list (options)
knife environment show ENVIRONMENT (options)
Exec Commands
knife exec [SCRIPT] (options)
Help Commands
knife help [list|TOPIC]
Index Commands
knife index rebuild (options)
Node Commands
knife node bulk delete REGEX (options)
knife node create NODE (options)
knife node delete NODE (options)
knife node edit NODE (options)
knife node environment set NODE ENVIRONMENT
knife node from file FILE (options)
knife node list (options)
knife node run_list add [NODE] [ENTRY[,ENTRY]] (options)
knife node run_list remove [NODE] [ENTRY[,ENTRY]] (options)
knife node run_list set NODE ENTRIES (options)
knife node show NODE (options)
OSC Commands
knife osc_user create USER (options)
knife osc_user delete USER (options)
knife osc_user edit USER (options)
knife osc_user list (options)
knife osc_user reregister USER (options)
knife osc_user show USER (options)
Path-Based Commands
knife delete [PATTERN1 ... PATTERNn]
knife deps PATTERN1 [PATTERNn]
knife diff PATTERNS
knife download PATTERNS
knife edit [PATTERN1 ... PATTERNn]
knife list [-dfR1p] [PATTERN1 ... PATTERNn]
knife show [PATTERN1 ... PATTERNn]
knife upload PATTERNS
knife xargs [COMMAND]
Raw Commands
knife raw REQUEST_PATH
Recipe Commands
knife recipe list [PATTERN]
Role Commands
knife role bulk delete REGEX (options)
knife role create ROLE (options)
knife role delete ROLE (options)
knife role edit ROLE (options)
knife role env_run_list add [ROLE] [ENVIRONMENT] [ENTRY[,ENTRY]] (options)
knife role env_run_list clear [ROLE] [ENVIRONMENT]
knife role env_run_list remove [ROLE] [ENVIRONMENT] [ENTRIES]
knife role env_run_list replace [ROLE] [ENVIRONMENT] [OLD_ENTRY]
[NEW_ENTRY]
knife role env_run_list set [ROLE] [ENVIRONMENT] [ENTRIES]
knife role from file FILE [FILE..] (options)
knife role list (options)
knife role run_list add [ROLE] [ENTRY[,ENTRY]] (options)
knife role run_list clear [ROLE]
knife role run_list remove [ROLE] [ENTRY]
knife role run_list replace [ROLE] [OLD_ENTRY] [NEW_ENTRY]
knife role run_list set [ROLE] [ENTRIES]
knife role show ROLE (options)
Serve Commands
knife serve (options)
SSH Commands
knife ssh QUERY COMMAND (options)
SSL Commands
knife ssl check [URL] (options)
knife ssl fetch [URL] (options)
Status Commands
knife status QUERY (options)
Tag Commands
knife tag create NODE TAG ...
knife tag delete NODE TAG ...
knife tag list NODE
User Commands
knife user create USERNAME DISPLAY_NAME FIRST_NAME LAST_NAME EMAIL
PASSWORD (options)
knife user delete USER (options)
knife user edit USER (options)
knife user key create USER (options)
knife user key delete USER KEYNAME (options)
knife user key edit USER KEYNAME (options)
knife user key list USER (options)
knife user key show USER KEYNAME (options)
knife user list (options)
knife user reregister USER (options)
knife user show USER (options)
Knife Setup
In order to set up knife, one needs to move to the .chef directory and create a knife.rb inside the chef-repo, which tells knife about the configuration details. This file holds a couple of details.
current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name 'node_name'
client_key "#{current_dir}/USER.pem"
validation_client_name 'ORG_NAME-validator'
validation_key "#{current_dir}/ORGANIZATION-validator.pem"
chef_server_url 'https://api.chef.io/organizations/ORG_NAME'
cache_type 'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
cookbook_path ["#{current_dir}/../cookbooks"]
In the above code, we are using the hosted Chef server which uses the following two keys.
validation_client_name 'ORG_NAME-validator'
validation_key "#{current_dir}/ORGANIZATION-validator.pem"
Here, knife.rb tells knife which organization to use and where to find its validation key. The next line tells knife where to find the user's private key.
client_key "#{current_dir}/USER.pem"
The following line of code tells knife we are using the hosted server.
chef_server_url 'https://api.chef.io/organizations/ORG_NAME'
Using the knife.rb file, knife can now connect to your organization's hosted Opscode (Chef) server.
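To test the connection between knife on the workstation and the Chef server, a couple of read-only knife commands can be run from the chef-repo; these are standard knife subcommands, and their output depends on your organization:
knife ssl check
knife client list
knife node list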
The Chef Infra Server uses role-based access control (RBAC) to restrict access to objects—
nodes, environments, roles, data bags, cookbooks, and so on. This ensures that only
authorized user and/or Chef Infra Client requests to the Chef Infra Server are allowed.
Access to objects on the Chef Infra Server is fine-grained, allowing access to be defined by
object type, object, group, user, and organization. The Chef Infra Server uses permissions
to define how a user may interact with an object, after they have been authorized to do so.
The Chef Infra Server uses organizations, groups, and users to define role-based access control:
Organization: An organization is the top-level entity for role-based access control in the Chef Infra Server. Each organization contains the default groups (admins, clients, and users, plus billing_admins for the hosted Chef Infra Server), at least one user, and at least one node (on which the Chef Infra Client is installed). The Chef Infra Server supports multiple organizations. The Chef Infra Server includes a single default organization that is defined during setup. Additional organizations can be created after the initial setup and configuration of the Chef Infra Server.
Group: A group is used to define access to object types and objects in the Chef Infra Server and also to assign permissions that determine what types of tasks are available to members of that group who are authorized to perform them. Groups are configured by organization.
Client: A client is an actor that has permission to access the Chef Infra Server. A client is most often a node (on which the Chef Infra Client runs), but is also a workstation (on which knife runs), or some other machine that is configured to use the Chef Infra Server API. Each request to the Chef Infra Server that is made by a client uses a private key for authentication that must be authorized by the public key on the Chef Infra Server.
When a user makes a request to the Chef Infra Server using the Chef Infra Server API, permission to perform that action is determined by the following process:
1. Check if the user has permission to the object type
2. If no, recursively check if the user is a member of a security group that has permission to that object
3. If yes, allow the user to perform the action
Permissions are managed using the Chef management console add-on in the Chef Infra
Server web user interface.
Organizations
A single instance of the Chef Infra Server can support many organizations. Each
organization has a unique set of groups and users. Each organization manages a unique set
of nodes, on which a Chef Infra Client is installed and configured so that it may interact with
a single organization on the Chef Infra Server.
Using multiple organizations within the Chef Infra Server ensures that the same toolset, coding patterns and practices, physical hardware, and product support effort is being applied across the entire company, even when:
Multiple product groups must be supported—each product group can have its own
security requirements, schedule, and goals
Updates occur on different schedules—the nodes in one organization are managed
completely independently from the nodes in another
Individual teams have competing needs for object and object types —data bags,
environments, roles, and cookbooks are unique to each organization, even if they
share the same name
Permissions
Permissions are used in the Chef Infra Server to define how users and groups can interact
with objects on the server. Permissions are configured for each organization.
Object Permissions
The Chef Infra Server includes the following object permissions:
Delete: Use the Delete permission to define which users and groups may delete an object. This permission is required for any user who uses the knife [object] delete [object_name] argument to interact with objects on the Chef Infra Server.
Grant: Use the Grant permission to define which users and groups may configure permissions on an object. This permission is required for any user who configures permissions using the Administration tab in the Chef management console.
Read: Use the Read permission to define which users and groups may view the details of an object. This permission is required for any user who uses the knife [object] show [object_name] argument to interact with objects on the Chef Infra Server.
Update: Use the Update permission to define which users and groups may edit the details of an object. This permission is required for any user who uses the knife [object] edit [object_name] argument to interact with objects on the Chef Infra Server and for any Chef Infra Client to save node data to the Chef Infra Server at the conclusion of a Chef Infra Client run.
Global Permissions
Create: Use the Create global permission to define which users and groups may create the following server object types: cookbooks, data bags, environments, nodes, roles, and tags. This permission is required for any user who uses the knife [object] create argument to interact with objects on the Chef Infra Server.
List: Use the List global permission to define which users and groups may view the following server object types: cookbooks, data bags, environments, nodes, roles, and tags. This permission is required for any user who uses the knife [object] list argument to interact with objects on the Chef Infra Server.
These permissions set the default permissions for the following Chef Infra Server object types: clients, cookbooks, data bags, environments, groups, nodes, roles, and sandboxes.
Note
This is only necessary after migrating a client from one Chef Infra Server to another. Permissions must be reset for client keys after the migration. The following script grants each client read, update, delete, and grant access on its own node object:
#!/usr/bin/env ruby
require 'chef/knife'

# load the knife configuration (previously knife.rb)
Chef::Config.from_file(File.join(Chef::Knife.chef_config_dir, 'knife.rb'))
rest = Chef::ServerAPI.new(Chef::Config[:chef_server_url])

# for every node, add the node's own client to each permission's actor list
Chef::Node.list.each do |node|
  %w(read update delete grant).each do |perm|
    ace = rest.get("nodes/#{node[0]}/_acl")[perm]
    ace['actors'] << node[0] unless ace['actors'].include?(node[0])
    rest.put("nodes/#{node[0]}/_acl/#{perm}", perm => ace)
    puts "Client \"#{node[0]}\" granted \"#{perm}\" access on node \"#{node[0]}\""
  end
end
Warning
knife-acl and the Chef Manage browser interface are incompatible. After engaging knife-
acl, you will need to discontinue using the Chef Manage browser interface from that point
forward due to possible incompatibilities.
Groups
The Chef Infra Server includes the following default groups:
admins: The admins group defines the list of users who have administrative rights to all objects and object types for a single organization.
billing_admins: The billing_admins group defines the list of users who have permission to manage billing information. This permission exists only for the hosted Chef Infra Server.
clients: The clients group defines the list of nodes on which a Chef Infra Client is installed and under management by Chef. In general, think of this permission as "all of the non-human actors (Chef Infra Client, in almost every case) that get data from, and/or upload data to, the Chef server". Newly-created Chef Infra Client instances are added to this group automatically.
public_key_read_access: The public_key_read_access group defines which users and clients have read permissions to key-related endpoints in the Chef Infra Server API.
users: The users group defines the list of users who use knife and the Chef management console to interact with objects and object types. In general, think of this permission as "all of the non-admin human actors who work with data that is uploaded to and/or downloaded from the Chef server".
Example Default Permissions
The Chef Infra Server assigns default permissions to the admins, billing_admins, clients, and users groups described above; each group receives its own per-object-type permission matrix.
Note
The creator of an object on the Chef Infra Server is assigned create, delete, grant, read, and update permission to that object.
By default, the public_key_read_access group assigns all members of the users and clients groups permission to these endpoints:
GET /clients/CLIENT/keys
GET /clients/CLIENT/keys/KEY
GET /users/USER/keys
GET /users/USER/keys/KEY
The chef-validator is allowed to register new clients at the start of a Chef Infra Client run. After the Chef Infra Client is registered with the Chef Infra Server, that Chef Infra Client is added to the clients group.
Scenario
The following user accounts exist on the Chef Infra Server: pivotal (a superuser account), alice, bob, carol, and dan. Run the following command to view a list of users on the Chef Infra Server:
chef-server-ctl user-list
and it returns the same list of users:
pivotal
alice
bob
carol
dan
From her workstation, Alice runs:
knife user list -c ~/.chef/alice.rb
and gets the following error:
ERROR: You authenticated successfully to <chef_server_url> as alice
but you are not authorized for this action
Response: Missing read permission
Alice is not a superuser and does not have permissions on other users, because user accounts are global to organizations in the Chef Infra Server. Let's add Alice to the server-admins group:
chef-server-ctl grant-server-admin-permissions alice
User alice was added to server-admins.
Alice can now create, read, update, and delete user accounts on the Chef Infra Server, even for organizations to which Alice is not a member. From a workstation, Alice re-runs the following command:
knife user list -c ~/.chef/alice.rb
pivotal
alice
bob
carol
dan
Alice is now a server administrator and can use the following knife subcommands to
manage users on the Chef Infra Server:
knife user-create
knife user-delete
knife user-edit
knife user-list
knife user-show
For example, Alice edits another user's account:
knife user edit carol -c ~/.chef/alice.rb
and the $EDITOR opens in which Alice makes changes, and then saves them.
Superuser Accounts
Superuser accounts may not be managed by users who belong to the server-admins group. For example, if Alice attempts to delete the pivotal superuser account:
knife user delete pivotal -c ~/.chef/alice.rb
the request is rejected:
ERROR: You authenticated successfully to <chef_server_url> as alice
but you are not authorized for this action
Response: Missing read permission
The server-admins group itself is managed with the following chef-server-ctl subcommands:
chef-server-ctl grant-server-admin-permissions
chef-server-ctl list-server-admins
chef-server-ctl remove-server-admin-permissions
Add Members
The grant-server-admin-permissions subcommand is used to add a user to the server-
admins group. Run the command once for each user added.
chef-server-ctl grant-server-admin-permissions USER_NAME
For example:
chef-server-ctl grant-server-admin-permissions bob
returns:
User bob was added to server-admins. This user can now list,
read, and create users (even for orgs they are not members of)
for this Chef Infra Server.
Remove Members
The remove-server-admin-permissions subcommand is used to remove a user from
the server-admins group. Run the command once for each user removed.
chef-server-ctl remove-server-admin-permissions USER_NAME
where USER_NAME is the user to remove from the list of server administrators.
For example:
chef-server-ctl remove-server-admin-permissions bob
returns:
User bob was removed from server-admins. This user can no longer
list, read, and create users for this Chef Infra Server except for where
they have default permissions (such as within an org).
List Membership
The list-server-admins subcommand is used to return a list of users who are members of
the server-admins group.
chef-server-ctl list-server-admins
returns:
pivotal
alice
bob
carol
dan
Manage Organizations
Use the org-create , org-delete , org-list , org-show , org-user-add and org-user-
remove commands to manage organizations.
org-create
The org-create subcommand is used to create an organization. (The validation key for the
organization is returned to STDOUT when creating an organization with this command.)
Syntax
This subcommand has the following syntax:
chef-server-ctl org-create ORG_NAME "ORG_FULL_NAME" (options)
where:
The name must begin with a lower-case letter or digit, may only contain lower-case letters, digits, hyphens, and underscores, and must be between 1 and 255 characters. For example: chef.
The full name must begin with a non-whitespace character and must be between 1 and 1023 characters. For example: "Chef Software, Inc.".
Options
This subcommand has the following options:
org-delete
The org-delete subcommand is used to delete an organization.
Syntax
This subcommand has the following syntax:
chef-server-ctl org-delete ORG_NAME
org-list
The org-list subcommand is used to list all of the organizations currently present on the
Chef Infra Server.
Syntax
This subcommand has the following syntax:
chef-server-ctl org-list (options)
Options
This subcommand has the following options:
-a, --all-orgs: Show all organizations.
-w, --with-uri: Show the corresponding URIs.
org-show
The org-show subcommand is used to show the details for an organization.
Syntax
This subcommand has the following syntax:
chef-server-ctl org-show ORG_NAME
org-user-add
The org-user-add subcommand is used to add a user to an organization.
Syntax
This subcommand has the following syntax:
chef-server-ctl org-user-add ORG_NAME USER_NAME (options)
Options
This subcommand has the following option:
--admin: Add the user to the admins security group of the organization.
org-user-remove
The org-user-remove subcommand is used to remove a user from an organization.
Syntax
This subcommand has the following syntax:
chef-server-ctl org-user-remove ORG_NAME USER_NAME (options)
Test Node Setup: Create a server and add to organization,
check node details using knife.
Chef is a tool used for Configuration Management which
closely competes with Puppet. Chef is an automation tool that
provides a way to define infrastructure as code.
1. Install chef-client
Either use the https://www.chef.io/chef/install.sh script or download and install the correct chef-
client package for your OS.
2. Create /etc/chef/client.rb
Perhaps you can use one of your bootstrapped nodes as a reference. The important bit is that
you have chef_server_url pointing at your Chef server.
Example:
/etc/chef/client.rb
chef_server_url "https://mychefserver.myorg.com/organizations/myorg"
validation_client_name "myorg-validator"
validation_key "/etc/chef/myorg-validator.pem"
log_level :info
3. Fetch the Chef server's SSL certificate into the trusted_certs directory:
mkdir /etc/chef/trusted_certs
knife ssl fetch -c /etc/chef/client.rb
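With client.rb and the trusted certificates in place, the remaining steps are roughly as follows; the validator key path matches the example above, and NODE_NAME stands for whatever name the node registers with:
# on the node: run the client once so it registers with the Chef server
sudo chef-client
# on the workstation: confirm the node registered and inspect its details
knife node list
knife node show NODE_NAME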
Node Objects and Search: How to Add Run list to Node Check
node Details.
A run-list defines all of the information necessary for Chef to configure a node
into the desired state. A run-list is:
An ordered list of roles and/or recipes that are run in the exact order
defined in the run-list; if a recipe appears more than once in the run-list,
Chef Infra Client will not run it twice
Always specific to the node on which it runs; nodes may have a run-list that is identical to the run-list used by other nodes
Stored as part of the node object on the Chef server
Maintained using knife and then uploaded from the workstation to the Chef
Infra Server, or maintained using Chef Automate
Run-list Format
A run-list item is specified in one of the following formats:
"role[NAME]"
or
"recipe[COOKBOOK::RECIPE]"
Use a comma to separate roles and recipes when adding more than one item to the run-list:
"recipe[COOKBOOK::RECIPE],COOKBOOK::RECIPE,role[NAME]"
Empty Run-lists
Use an empty run-list to determine if a failed Chef Infra Client run has anything to do with the recipes that are defined within that run-list. This is a quick way to discover if the underlying cause of a Chef Infra Client run failure is a configuration issue. If a failure persists even when the run-list is empty, check the configuration settings in the client.rb file and the permissions of the user and node on the Chef Infra Server.
Knife Commands
The following knife commands may be used to manage run-lists on the Chef Infra
Server.
Quotes, Windows
When running knife from the Windows command prompt, a string should be surrounded by single quotes (' '). For example:
knife node run_list set test-node 'recipe[iptables]'
When running knife from Windows PowerShell, the string must instead be wrapped in triple single quotes:
knife node run_list set test-node '''recipe[iptables]'''
Import-Module chef
The Chef Infra Client 12.4 release adds an optional feature to the Microsoft
Installer Package (MSI) for Chef. This feature enables the ability to pass quoted
strings from the Windows PowerShell command line without the need for triple
single quotes (''' '''). This feature installs a Windows PowerShell module
(typically in C:\opscode\chef\modules ) that is also appended to
the PSModulePath environment variable. This feature is not enabled by default.
To activate this feature, run the following command from within Windows
PowerShell:
Import-Module chef
or add Import-Module chef to the profile for Windows PowerShell located at:
~\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
This module exports cmdlets that have the same name as the command-line
tools—chef-client, knife—that are built into Chef.
For example:
knife exec -E 'puts ARGV' """&s0meth1ng"""
is now:
knife exec -E 'puts ARGV' '&s0meth1ng'
and:
knife node run_list set test-node '''role[ssssssomething]'''
is now:
knife node run_list set test-node 'role[ssssssomething]'
To remove this feature, run the following command from within Windows
PowerShell:
Remove-Module chef
run_list add
Use the run_list add argument to add run-list items (roles or recipes) to a node, using one of the following formats:
"role[NAME]"
or
"recipe[COOKBOOK::RECIPE]"
Use a comma to separate roles and recipes when adding more than one item to the run-list:
"recipe[COOKBOOK::RECIPE],COOKBOOK::RECIPE,role[NAME]"
Syntax
This argument has the following syntax:
knife node run_list add NODE_NAME RUN_LIST_ITEM (options)
Options
Note
See config.rb for more information about how to add certain knife options as
settings in the config.rb file.
Examples
ADD A ROLE
knife node run_list add NODE_NAME 'role[ROLE_NAME]'
ADD ROLES AND RECIPES
knife node run_list add NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME],recipe[COOKBOOK::RECIPE_NAME],role[ROLE_NAME]'
ADD A RECIPE WITH A FQDN
knife node run_list add NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME]'
ADD A RECIPE WITH A COOKBOOK
To add a recipe to a run-list using the cookbook format, enter:
knife node run_list add NODE_NAME 'COOKBOOK::RECIPE_NAME'
ADD THE DEFAULT RECIPE
knife node run_list add NODE_NAME 'COOKBOOK'
run_list remove
Use the run_list remove argument to remove run-list items (roles or recipes)
from a node. A recipe must be in one of the following formats: fully qualified,
cookbook, or default. Both roles and recipes must be in quotes, for
example: 'role[ROLE_NAME]' or 'recipe[COOKBOOK::RECIPE_NAME]' . Use a
comma to separate roles and recipes when removing more than one, like
this: 'recipe[COOKBOOK::RECIPE_NAME],COOKBOOK::RECIPE_NAME,role[ROLE_NAM
E]'.
Syntax
knife node run_list remove NODE_NAME RUN_LIST_ITEM
Options
Note
See config.rb for more information about how to add certain knife options as
settings in the config.rb file.
Examples
REMOVE A ROLE
knife node run_list remove NODE_NAME 'role[ROLE_NAME]'
REMOVE A RECIPE
To remove a recipe from a run-list using the fully qualified format, enter:
knife node run_list remove NODE_NAME 'recipe[COOKBOOK::RECIPE_NAME]'
run_list set
Use the run_list set argument to set the run-list for a node. A recipe must be
in one of the following formats: fully qualified, cookbook, or default. Both roles
and recipes must be in quotes, for
example: "role[ROLE_NAME]" or "recipe[COOKBOOK::RECIPE_NAME]" . Use a
comma to separate roles and recipes when setting more than one, like
this: "recipe[COOKBOOK::RECIPE_NAME],COOKBOOK::RECIPE_NAME,role[ROLE_NAM
E]".
Syntax
knife node run_list set NODE_NAME RUN_LIST_ITEM
Options
Examples
None.
status
The following examples show how to use the knife status subcommand to
verify the status of run-lists.
knife status --run-list
20 hours ago, dev-vm.chisamore.com, ubuntu 10.04, dev-
vm.chisamore.com, 10.66.44.126, role[lb].
3 hours ago, i-225f954f, ubuntu 10.04, ec2-67-202-63-102.compute-
1.amazonaws.com, 67.202.63.102, role[web].
3 hours ago, i-a45298c9, ubuntu 10.04, ec2-174-129-127-206.compute-
1.amazonaws.com, 174.129.127.206, role[web].
3 hours ago, i-5272a43f, ubuntu 10.04, ec2-184-73-9-250.compute-
1.amazonaws.com, 184.73.9.250, role[web].
3 hours ago, i-226ca64f, ubuntu 10.04, ec2-75-101-240-230.compute-
1.amazonaws.com, 75.101.240.230, role[web].
3 hours ago, i-f65c969b, ubuntu 10.04, ec2-184-73-60-141.compute-
1.amazonaws.com, 184.73.60.141, role[web].
View status using a query
To show the status of a subset of nodes that are returned by a specific query,
enter:
knife status "role:web" --run-list
3 hours ago, i-225f954f, ubuntu 10.04, ec2-67-202-63-102.compute-
1.amazonaws.com, 67.202.63.102, role[web].
3 hours ago, i-a45298c9, ubuntu 10.04, ec2-174-129-127-206.compute-
1.amazonaws.com, 174.129.127.206, role[web].
3 hours ago, i-5272a43f, ubuntu 10.04, ec2-184-73-9-250.compute-
1.amazonaws.com, 184.73.9.250, role[web].
3 hours ago, i-226ca64f, ubuntu 10.04, ec2-75-101-240-230.compute-
1.amazonaws.com, 75.101.240.230, role[web].
3 hours ago, i-f65c969b, ubuntu 10.04, ec2-184-73-60-141.compute-
1.amazonaws.com, 184.73.60.141, role[web].
Run-lists, Applied
A run-list will tell Chef Infra Client what to do when bootstrapping that node for
the first time, and then how to configure that node on every subsequent Chef
Infra Client run.
Bootstrap Operations
The knife bootstrap command is a common way to install Chef Infra Client on a
node. The default for this approach assumes that a node can access the Chef
website so that it may download the Chef Infra Client package from that location.
The Chef Infra Client installer will detect the version of the operating system, and
then install the appropriate Chef Infra Client version using a single command to
install Chef Infra Client and all of its dependencies, including an embedded
version of Ruby, OpenSSL, parsers, libraries, and command line utilities.
The Chef Infra Client installer puts everything into a unique directory
(/opt/chef/ ) so that Chef Infra Client will not interfere with other applications
that may be running on the target machine. Once installed, Chef Infra Client
requires a few more configuration steps before it can perform its first Chef Infra
Client run on a node.
A node is any physical, virtual, or cloud device that is configured and maintained by an instance of Chef Infra Client. Bootstrapping installs Chef Infra Client on a target system so that it can run as a client and sets the node up to communicate with a Chef Infra Server. There are two ways to do this: run the knife bootstrap command from a workstation, or perform an unattended install directly from the node itself.
The stages of the bootstrap operation are described in greater detail below.
During a knife bootstrap bootstrap operation, the following happens:
knife bootstrap: Enter the knife bootstrap subcommand from a workstation. Include the hostname, IP address, or FQDN of the target node as part of this command. Knife will establish an SSH or WinRM connection with the target system and run a bootstrap script.
Get the install script from Chef: The shell script makes a request to the Chef website to get the most recent version of the Chef Infra Client install script (install.sh or install.ps1).
Get the Chef Infra Client package from Chef: The install script then gathers system-specific information, determines the correct package for Chef Infra Client, and downloads the appropriate package from omnitruck-direct.chef.io.
Install Chef Infra Client: Chef Infra Client is installed on the target node using a system native package (.rpm, .msi, etc.).
Start a Chef Infra Client run: On UNIX and Linux-based machines, the second shell script executes the chef-client binary with a set of initial settings stored within first-boot.json on the node; first-boot.json is generated from the workstation as part of the initial knife bootstrap subcommand. On Windows machines, the batch file derived from the windows-chef-client-msi.erb bootstrap template executes the chef-client binary with the same first-boot.json settings.
Complete a Chef Infra Client run: A Chef Infra Client run proceeds, using HTTPS (port 443), and registers the node with the Chef Infra Server. The first Chef Infra Client run, by default, contains an empty run-list. A run-list can be specified as part of the initial bootstrap operation using the --run-list option of the knife bootstrap subcommand.
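A typical invocation from the workstation might look like the following; the host name, SSH user, node name, and run-list are illustrative and depend on your environment:
knife bootstrap node1.example.com -x ubuntu --sudo -N node1 --run-list 'recipe[webserver]'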
The Chef Infra Client Run
A “Chef Infra Client run” is the term used to describe the steps Chef Infra Client takes to
configure a node when the chef-client command is run. The following diagram shows the
various stages that occur during a Chef Infra Client run.
Get configuration data: Chef Infra Client gets process configuration data from the client.rb file on the node, and then gets node configuration data from Ohai. One important piece of configuration data is the name of the node, which is found in the node_name attribute in the client.rb file or is provided by Ohai. If Ohai provides the name of a node, it is typically the FQDN for the node, which is always unique within an organization.
Authenticate to the Chef Infra Server: Chef Infra Client authenticates to the Chef Infra Server using an RSA private key and the Chef Infra Server API. The name of the node is required as part of the authentication process to the Chef Infra Server. If this is the first Chef Infra Client run for a node, the chef-validator will be used to generate the RSA private key.
Get, rebuild the node object: Chef Infra Client pulls down the node object from the Chef Infra Server and then rebuilds it. A node object is made up of the system attributes discovered by Ohai, the attributes set in Policyfiles or Cookbooks, and the run-list of cookbooks. The first time Chef Infra Client runs on a node, it creates a node object from the default run-list. A node that has not yet had a Chef Infra Client run will not have a node object or a Chef Infra Server entry for a node object. On any subsequent Chef Infra Client runs, the rebuilt node object will also contain the run-list from the previous Chef Infra Client run.
Expand the run-list: Chef Infra Client expands the run-list from the rebuilt node object and compiles a complete list of recipes in the exact order that they will be applied to the node.
Synchronize cookbooks: Chef Infra Client requests all the cookbook files (including recipes, templates, resources, providers, attributes, and libraries) that it needs for every action identified in the run-list from the Chef Infra Server. The Chef Infra Server responds to Chef Infra Client with the complete list of files. Chef Infra Client compares the list of files to the files that already exist on the node from previous runs, and then downloads a copy of every new or modified file from the Chef Infra Server.
Reset node attributes: All attributes in the rebuilt node object are reset. All attributes from attribute files, Policyfiles, and Ohai are loaded. Attributes that are defined in attribute files are first loaded according to cookbook order. For each cookbook, attributes in the default.rb file are loaded first, and then additional attribute files (if present) are loaded in lexical sort order. If attribute files are found within any cookbooks that are listed as dependencies in the metadata.rb file, these are loaded as well. All attributes in the rebuilt node object are updated with the attribute data according to attribute precedence. When all the attributes are updated, the rebuilt node object is complete.
Compile the resource collection: Chef Infra Client identifies each resource in the node object and builds the resource collection. Libraries are loaded first to ensure that all language extensions and Ruby classes are available to all resources. Next, attributes are loaded, followed by custom resources. Finally, all recipes are loaded in the order specified by the expanded run-list. This is also referred to as the "compile phase".
Converge the node: Chef Infra Client configures the system based on the information that has been collected. Each resource is executed in the order identified by the run-list, and then by the order in which each resource is listed in each recipe. Each resource defines an action to run, which configures a specific part of the system. This process is also referred to as convergence, or the "execution phase".
Update the node object, process exception and report handlers: When all the actions identified by resources in the resource collection have been done and Chef Infra Client finishes successfully, Chef Infra Client updates the node object on the Chef Infra Server with the node object built during the run. (This node object will be pulled down by Chef Infra Client during the next Chef Infra Client run.) This makes the node object (and the data in the node object) available for search. Chef Infra Client always checks the resource collection for the presence of exception and report handlers; if any are present, each one is processed appropriately.
Get, run Chef InSpec Compliance Profiles: After the Chef Infra Client run finishes, it begins the Compliance Phase, which is a Chef InSpec run within the Chef Infra Client. Chef InSpec retrieves tests from either a legacy audit cookbook or a current InSpec profile.
Send or Save Compliance Report: When all the InSpec tests finish running, Chef InSpec checks the reporting handlers defined in the legacy audit cookbook or in a current InSpec profile and processes them appropriately.
Stop, wait for the next run: When everything is configured and the Chef Infra Client run is complete, Chef Infra Client stops and waits until the next time it is asked to run.
It is always a good idea to have a separate environment for development, testing, and
production. Chef enables grouping nodes into separate environments to support an ordered
development flow.
For instance, one environment may be called "testing" and another may be called
"production". Since you don't want any code that is still in testing on your production
machines, each machine can only be in one environment. You can then have one
configuration for machines in your testing environment, and a completely different
configuration for computers in production.
Additional environments can be created to reflect each organization's patterns and workflow, for example by creating production, staging, testing, and development environments.
Generally, an environment is also associated with one (or more) cookbook versions.
In a Ruby environment file, environment attributes are set with methods such as default_attributes and override_attributes, for example:
override_attributes 'node' => { 'attribute' => [ 'value', 'value', 'etc.' ] }
An environment can be created in any of the following ways:
1. Creating a Ruby file in the environments sub-directory of the chef-repo and then pushing it to the Chef server
2. Creating a JSON file directly in the chef-repo and then pushing it to the Chef server
3. Using knife
4. Using the Chef management console web user interface
5. Using the Chef server REST API
A JSON environment file has the following structure:
{
  "name": "development",
  "description": "",
  "cookbook_versions": {
  },
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "default_attributes": {
  },
  "override_attributes": {
  }
}
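For example, if the JSON above is saved as environments/development.json inside the chef-repo, it can be uploaded and verified with knife; the file path is illustrative:
knife environment from file environments/development.json
knife environment list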
Creating an environment
To create an environment:
1. Select Deployment Automation Environments.
2. Select New + to add an environment. The New Environment dialog appears.
3. Select Create New or From traditional application… based on your requirements. If you
selected From traditional application choose an application from the list.
4. Enter a name into the Name field.
5. Select a project for the new environment and enter a description of the environment.
You can include hyperlinks as part of an object description for any CloudBees CD/RO
object.
6. Select OK.
The Environment Editor opens.
From here, you can create a tier with resources or a cluster with nodes.
Creating an environment tier
Environment tiers are used with traditional applications.
1. Define Tier 1:
a. Select Details from the vertical dots menu.
b. On the Details tab, enter the name and optional description.
c. On the Capabilities tab, optionally define a capability.
d. Select OK.
2. Assign resources to it:
a. Select + in Resources block.
b. In the New dialog box, click Add resources or Add resource pool. The Resources list
or Resource Pools list, respectively, displays.
c. Select one or more enabled resources or resource pools for this environment and then
click OK.
The Environment editor now has an environment tier called mysql with one resource.
3. Optionally, add a utility resource:
a. Select the New Utility Resource button (with wrench and hammer) in the upper right.
The New dialog appears.
b. Select Add resources or Add resource pool. The Resources list or Resource Pools list,
respectively, displays.
c. Select one or more enabled resources or resource pools for this environment and then
select OK.
The new Utility Resource tile now appears in the editor field.
Chef helps in performing environment specific configuration. It is always a good idea to have
a separate environment for development, testing, and production.
Chef enables grouping nodes into separate environments to support an ordered
development flow.
Creating an Environment
An environment can be created on the fly using the knife utility. The following command opens the shell's default editor so that the environment definition can be modified:
vipin@laptop:~/chef-repo $ knife environment create book {
"name": "book",
"description": "",
"cookbook_versions": {
},
"json_class": "Chef::Environment",
"chef_type": "environment",
"default_attributes": {
},
"override_attributes": {
}
}
Created book
_default Environment
Each organization always starts with at least a single environment, called the _default environment, which is always available to the Chef server. The _default environment cannot be modified in any way; changes can only be accommodated in the custom environments that we create.
Environment Attributes
An attribute can be defined in an environment and then used to override the default settings on the node. When the Chef client run takes place, these environment attributes are compared with the default attributes that are already present on the node. If the environment attributes take precedence over the default attributes, the Chef client applies those settings and values during the run on each node.
An environment attribute can only be either a default_attribute or an override_attribute; it cannot be a normal attribute. Use the default_attributes or override_attributes methods to set them.
Attribute Type
Default − A default attribute is always reset at the start of every Chef client run and has the lowest attribute precedence.
Override − An override attribute is always reset at the start of every Chef client run and has a higher attribute precedence than default, force_default, and normal. An override attribute is most often defined in a recipe, but can also be specified in an attribute file, for a role, or for an environment.
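A minimal sketch of a Ruby environment file that uses these methods; the file name, cookbook pin, and attribute values are illustrative:
# environments/production.rb (illustrative)
name 'production'
description 'Production environment'
cookbook 'mysql', '= 8.5.1'
default_attributes 'mysql' => { 'dir' => '/data/mysql' }
override_attributes 'ntp' => { 'ntpdate' => { 'disable' => true } }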
Roles: Create roles, Add Roles to organization.
What is Role?
A role is a way to define certain patterns and processes that exist across nodes in an organization as
belonging to a single job function. Each role consists of zero (or more) attributes and a run-list. Each
node can have zero (or more) roles assigned to it. When a role is run against a node, the configuration
details of that node are compared against the attributes of the role, and then the contents of that role's run-list are applied to the node's configuration details. When a chef-client runs, it merges its own attributes and run-lists with those contained within each assigned role.
Example
name "web_servers"
description "This role contains nodes, which act as web servers"
run_list "recipe[webserver]"
default_attributes 'ntp' => {
'ntpdate' => {
'disable' => true
}
}
Let’s download the role from the Chef server so we have it locally in a Chef repository.
> knife role show client1 -d -Fjson > roles/client1.json
Now, let's bootstrap the node using knife with a role:
> knife bootstrap --run-list "role[webserver]" --sudo hostname
How to edit roles on the Chef server? Method 1: edit the role directly on the Chef server using knife:
> knife role edit client1
Method 2: In local repo under chef-repo folder
> vi webserver.rb
example –
name "web_servers"
description "This role contains nodes, which act as web servers"
run_list "recipe[webserver]"
default_attributes 'ntp' => {
'ntpdate' => {
'disable' => true
}
}
Then upload it to the Chef server using the knife role from file command:
> knife role from file webserver.rb
To find the uptime of all of web servers running Ubuntu on the Amazon EC2 platform, enter:
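A typical knife ssh invocation for this kind of search query looks like the following; the SSH user (-x) and the ec2.public_hostname attribute (-a) are assumptions that depend on your setup:
> knife ssh "role:web" "uptime" -x ubuntu -a ec2.public_hostname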
How it works
You define a role in a Ruby file inside the roles folder of your Chef repository. A role consists of a name attribute and a description attribute. Additionally, a role usually contains a role-specific run list and role-specific attribute settings.
Every node which has a role in its run list will have the role's run list expanded into its own. This means that all the recipes (and roles) which are in the role's run list will be executed on your nodes.
You need to upload your role to your Chef server by using the knife role from file command.
Only then should you add the role to your node's run list.
Running the Chef client on a node having your role in its run list will execute all the recipes listed in the role.
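Putting this workflow together, a minimal command sequence might look like the following; the node name node1 and the SSH user are illustrative:
> knife role from file roles/web_servers.rb
> knife node run_list add node1 'role[web_servers]'
> knife ssh 'name:node1' 'sudo chef-client' -x ubuntu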
Chef Attributes with Roles
When a role is added to a node's run-list, the node object stored on the Chef server (for example, as shown by knife node show NODE_NAME -Fjson) lists the role in its run_list:
"policy_name": null,
"policy_group": null,
"run_list": [
  "role[web-role]"
]
Attributes: Understanding of Attributes, Creating Custom
Attributes, Defining in Cookbooks.
About Attributes
An attribute is a specific detail about a node. Attributes are used by Chef Infra Client to understand:
The current state of the node
What the state of the node was at the end of the previous Chef Infra Client run
What the state of the node should be at the end of the current Chef Infra Client run
During every Chef Infra Client run, Chef Infra Client builds the attribute list using attributes collected by Ohai, attribute files and recipes in cookbooks, attributes defined in roles and environments, and any JSON attributes passed on the command line.
After the node object is rebuilt, all of the attributes are compared, and then the node is updated based on attribute precedence. At the end of every Chef Infra Client run, the node object that defines the current state of the node is uploaded to the Chef Infra Server so that it can be indexed for search.
force_override: Use the force_override attribute to ensure that an attribute defined in a
cookbook (by an attribute file or by a recipe) takes precedence over an override attribute set by
a role or an environment.
automatic: An automatic attribute contains data that is identified by Ohai at the beginning of
every chef-client run. An automatic attribute cannot be modified and always has the highest
attribute precedence.
Attributes
Sometimes you might use hard-coded values (for example, a directory name, filename, username, etc.) at multiple locations inside your recipes. Later, when you want to change such a value, it becomes a tedious process, as you have to browse through all the recipes that contain this value and change them accordingly.
Instead, you can define the hard-coded value as a variable inside an attribute file and use the attribute name inside the recipe. This way, when you want to change the value, you change it in only one place: the attribute file.
These are the different attribute types available: default, force_default, normal, override, force_override, automatic.
Inside your cookbook, for most situations, you'll be using the default attribute type.
The following is a sample attribute file with mysql-related hard-coded values that will be used in multiple recipes. In this example, the attribute file was created under the ~/chef-repo/cookbooks/thegeekstuff/attributes directory.
default['mysql']['dir'] = '/data/mysql'
default['mysql']['username'] = 'dbadmin'
default['mysql']['dbname'] = 'devdb'
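Any recipe in the cookbook can then reference these values through the node object. A minimal sketch (the directory resource shown here is illustrative, not part of the original example):
directory node['mysql']['dir'] do
  owner node['mysql']['username']
  mode '0755'
  recursive true
  action :create
end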
Resources
You'll see the resources directory only in Chef 12.5 and above.
Chef provides several built-in resources for you to use. For example, using Chef's built-in resources you can manage packages, services, files, and directories on your system.
But if you have a complex requirement that is specific to your application or tool, you can create your own custom resource and place it under the resources directory. Once you place your custom resource under this directory, you can use it in your recipes just like any of Chef's built-in resources.
The following is a simple custom resource example. This file was created under the ~/chef-repo/cookbooks/thegeekstuff/resources directory.
There are three parts to this example: 1) declare custom properties at the beginning, 2) load the current property values, 3) create action blocks for the custom resource.
property :myapp_name, String, default: 'Default Name for My App'
load_current_value do
# write code to load the current value for your properties
end
action :create do
file '/home/tomcat/myapp/config.cfg' do
content 'location=west'
end
# Write additional code to define other aspects of your app creation
end
action :delete do
#write code to delete your application
end
Definition
If you are using chef-client 12.5 or above, Chef recommends that you no longer use definitions and use custom resources instead (explained above). Definitions might be deprecated in a future version.
A definition is similar to a compile-time macro that can be used in multiple recipes.
Definitions are processed when the resources are compiled, and they are not the same as resources, since definitions don't support properties like only_if, not_if, etc.
The following is a simple definition example. In this example, app_config is the definition resource
name. This definition file was created under ~/chef-repo/cookbooks/thegeekstuff/definitions directory.
define :app_config do
file '/home/tomcat/myapp/config.cfg' do
content 'location=west'
end
end
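In a recipe, the definition is then called like a resource; a sketch (the name 'myapp' is illustrative):
app_config 'myapp'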
Files
If you want certain files to be copied over to all your remote nodes as part of a Chef deployment, you can place those files in the "files" directory under your cookbook.
A particular file located under the files directory can be copied over to one or more remote nodes using the cookbook_file resource.
In the following example, we want to copy the dblogin.php file to the remote server. In this case, we might create a recipe using the cookbook_file resource as shown below.
First, make sure you copy the file mentioned in the source property (i.e., dblogin.php) in the following recipe to your ~/chef-repo/cookbooks/{your-cookbook-name}/files directory.
cookbook_file '/home/tomcat/myapp/login/dblogin.php' do
source 'dblogin.php'
owner 'tomcat'
group 'tomcat'
mode '0755'
action :create
end
Libraries
All custom library files that you create for a specific cookbook should be placed under the "libraries" directory.
Library files are written in the Ruby language.
You can write library code that either changes the behavior of some existing Chef functionality or creates new functionality that is not currently provided by any of the existing Chef resources.
Basically, you'll use libraries to create a custom Chef resource that solves a specific problem based on your requirements.
You can start a brand new library without extending an existing one by simply starting your library code (for example, MyCustomLibrary) with this line at the beginning: class MyCustomLibrary
If you are extending existing Chef functionality, you should extend the appropriate Chef classes. In the following example, we are extending the Chef database resource. These are just the first few lines of a custom library that show how you start the library file definition.
class Chef
class Resource
class MyDBResource < Chef::Resource::Database
..
Providers
All the providers that you write for your particular custom requirement should go under the providers directory.
You'll use custom providers when you want to tell chef-client how to manage a specific action. You can define multiple actions for your custom provider and tell chef-client how to handle each of them.
You'll typically use a custom provider when you are using LWRPs (lightweight resources and providers): you first define a custom resource with your own set of actions, and then you write a custom provider containing the Ruby code that tells chef-client exactly what needs to be done for those actions.
For example, a custom provider file ( dbcluster.rb ) located under the providers directory might have a few custom actions defined as shown below.
action :check do
..
end
action :setmaster do
..
end
Recipes
The Chef recipe is the heart and soul of Chef functionality. This is where you specify all the configuration and setup that you want executed on your remote servers (nodes).
Recipes are written in the Ruby language and are stored inside a cookbook.
You can have multiple recipes inside one cookbook (which is recommended for complex system configuration).
You can also include an existing recipe in your current recipe. This way you can create a new recipe that depends on another recipe.
In simple terms, recipes are a collection of Chef resources that you call to set up a configuration. For every resource that you include in your recipe, you can specify which actions from that resource you want executed, and you can also set appropriate attribute values.
You can also write your own custom logic in Ruby inside a Chef recipe for a specific resource that you are calling.
Everything that you write inside a recipe is executed sequentially.
When you have multiple recipes inside a cookbook, the order in which these recipes will be executed can be specified using a run-list.
The following is a simple recipe example file ( mysetup.rb ) that can be placed under the recipes directory. It installs the given packages and starts the httpd service on the remote node where the recipe is executed.
package ['httpd', 'gcc', 'gcc-c++', 'nfs-utils'] do
action :install
end
service 'httpd' do
action [:enable, :start]
end
Templates
Templates are similar to files, but the major difference is that with a template we can dynamically generate static text files.
Inside a template we can have Ruby statements, which can then be used to generate some content dynamically.
Chef templates are just ERB (embedded Ruby) templates.
To use a template, first create the template file using ERB and place it under the templates directory. Then, inside the cookbook recipe, use the template resource to call the template.
For example, place the following index.html.erb template file under the ~/chef-repo/cookbooks/{your-cookbook-name}/templates/default/ directory.
<html>
<body>
<h1>Hello world on <%= node[:fqdn] %></h1>
</body>
</html>
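To render this template on a node, the recipe calls the template resource and points it at the ERB file. A minimal sketch (the destination path /var/www/html/index.html is an assumption, not part of the original example):
template '/var/www/html/index.html' do
  source 'index.html.erb'
  mode '0644'
  action :create
end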
Cookbook Doc Files
When you create a cookbook, two documentation files are created at the top level of your cookbook directory: 1) README.md 2) CHANGELOG.md
Maintaining these two files is very important when multiple people are working on your cookbook.
The first file, README.md, is where you document everything about your cookbook. By default, when a cookbook is created, the README.md file already contains an excellent template for you to start your documentation.
Once you've deployed a cookbook to production, as a best practice, any change you make to the cookbook after that should get an updated version number. Inside the CHANGELOG.md file, you document specifically what changed in each version of your cookbook.
Both README.md and CHANGELOG.md use the Markdown format.
Cookbook Metadata File
As the name suggests, the metadata.rb file is used to store certain metadata information about your cookbook. This file is located at the top level of the cookbook directory, i.e., ~/chef-repo/cookbooks/{your-cookbook-name}/metadata.rb
The information inside the metadata file is used by the Chef server to make sure it deploys the correct cookbook versions on the individual remote nodes.
When you upload your cookbook to the Chef server, the metadata file is compiled and stored on the Chef server as JSON.
By default, when you create a cookbook using the knife command, it generates the metadata.rb file.
There are certain metadata parameters that you can use inside this file. For example, you can specify the current version number of your cookbook, which chef-client versions are supported by the cookbook, and so on. The following example shows a partial metadata.rb file.
name 'devdb'
maintainer_email '[email protected]'
description 'Setup the Development DB server'
version '2.5.1'
chef_version ">= 12.9"
Attribute Sources
Chef Infra Client evaluates attributes in the order that they are defined in the run-list, including any attributes that are in the run-list as cookbook dependencies.
Attributes are provided to Chef Infra Client from the following locations:
JSON files passed using the chef-client -j
Nodes (collected by Ohai at the start of each Chef Infra Client run)
Attribute files (in cookbooks)
Recipes (in cookbooks)
Environments
Roles
Policyfiles
Notes:
Many attributes are maintained in the chef-repo for Policyfiles, environments, roles,
and cookbooks (attribute files and recipes)
Many attributes are collected by Ohai on each individual node at the start of every Chef
Infra Client run
The attributes that are maintained in the chef-repo are uploaded to the Chef Infra Server
from the workstation, periodically
Chef Infra Client will pull down the node object from the Chef Infra Server and then reset all the attributes except normal. The node object will contain the attribute data from the previous Chef Infra Client run, including attributes set with JSON files using -j.
Chef Infra Client will update the cookbooks on the node (if required), which updates
the attributes contained in attribute files and recipes
Chef Infra Client will update the role and environment data (if required)
Chef Infra Client will rebuild the attribute list and apply attribute precedence while
configuring the node
Chef Infra Client pushes the node object to the Chef Infra Server at the end of a Chef
Infra Client run; the updated node object on the Chef Infra Server is then indexed for
search and is stored until the next Chef Infra Client run
Automatic Attributes (Ohai)
An automatic attribute is a specific detail about a node, such as an IP address, a host name, a
list of loaded kernel modules, and so on. Automatic attributes are detected by Ohai and are
then used by Chef Infra Client to ensure that they are handled properly during every Chef Infra
Client run. The most commonly accessed automatic attributes are:
node['platform']: The platform on which a node is running. This attribute helps determine which providers will be used.
node['platform_family']: The platform family is a Chef Infra specific grouping of similar platforms where cookbook code can often be shared. For example, 'rhel' includes Red Hat Linux, Oracle Linux, CentOS, and several other platforms that are almost identical to Red Hat Linux.
node['platform_version']: The version of the platform. This attribute helps determine which providers will be used.
node['ipaddress']: The IP address for a node. If the node has a default route, this is the IPv4 address for that interface. If the node does not have a default route, the value for this attribute should be nil. The IP address for the default route is the recommended default value.
node['macaddress']: The MAC address for a node, determined by the same interface that detects node['ipaddress'].
node['fqdn']: The fully qualified domain name for a node. This is used as the name of a node unless otherwise set.
node['hostname']: The host name for the node.
node['domain']: The domain for the node.
node['recipes']: A list of recipes associated with a node (and part of that node's run-list).
node['roles']: A list of roles associated with a node (and part of that node's run-list).
node['ohai_time']: The time at which Ohai was last run. This attribute is not commonly used in recipes, but it is saved to the Chef Infra Server and can be accessed using the knife status subcommand.
Ohai collects a list of automatic attributes at the start of each Chef Infra Client run. This list
will vary from organization to organization, by server type, and by the platform that runs those
servers. All the attributes collected by Ohai are unmodifiable by Chef Infra Client. Run
the ohai command on a system to see which automatic attributes Ohai has collected for a
particular node.
Attribute Files
An attribute file is located in the attributes/ sub-directory for a cookbook. When a cookbook is
run against a node, the attributes contained in all attribute files are evaluated in the context of
the node object. Node methods (when present) are used to set attribute values on a node. For
example, the apache2 cookbook contains an attribute file called default.rb , which contains the
following attributes:
default['apache']['dir'] = '/etc/apache2'
default['apache']['listen_ports'] = [ '80','443' ]
The use of the node object ( node) is implicit in the previous example; the following example
defines the node object itself as part of the attribute:
node.default['apache']['dir'] = '/etc/apache2'
node.default['apache']['listen_ports'] = [ '80','443' ]
Another (much less common) approach is to set a value only if an attribute has no value. This
can be done by using the _unless variants of the attribute priority methods:
default_unless
normal_unless
Use the _unless variants carefully (and only when necessary) because when they are used,
attributes applied to nodes may become out of sync with the values in the cookbooks as these
cookbooks are updated. This approach can create situations where two otherwise identical
nodes end up having slightly different configurations and can also be a challenge to debug.
File Methods
Use the following methods within the attributes file for a cookbook or within a recipe. These
methods correspond to the attribute type of the same name:
override
default
normal
_unless
attribute?
A useful method that is related to attributes is the attribute? method. This method will check
for the existence of an attribute, so that processing can be done in an attributes file or recipe,
but only if a specific attribute exists.
Using attribute?() in an attributes file:
if attribute?('ec2')
# ... set stuff related to EC2
end
Using attribute?() in a recipe:
if node.attribute?('ec2')
# ... do stuff on EC2 nodes
end
Attributes from Recipes
A recipe is the most fundamental configuration element within the organization. A recipe:
Is authored using Ruby, which is a programming language designed to read and behave
in a predictable manner
Is mostly a collection of resources, defined using patterns (resource names, attribute-
value pairs, and actions); helper code is added around this using Ruby, when needed
Must define everything that is required to configure part of a system
Must be stored in a cookbook
May be included in another recipe
May use the results of a search query and read the contents of a data bag (including an encrypted data bag)
May have a dependency on one (or more) recipes
Must be added to a run-list before it can be used by Chef Infra Client
Is always executed in the same order as listed in a run-list
An attribute can be defined in a cookbook (or a recipe) and then used to override the default
settings on a node. When a cookbook is loaded during a Chef Infra Client run, these attributes
are compared to the attributes that are already present on the node. Attributes that are defined
in attribute files are first loaded according to cookbook order. For each cookbook, attributes in
the default.rb file are loaded first, and then additional attribute files (if present) are loaded in
lexical sort order. When the cookbook attributes take precedence over the default attributes,
Chef Infra Client applies those new settings and values during a Chef Infra Client run on the
node.
Attributes from Roles
A role is a way to define certain patterns and processes that exist across nodes in an
organization as belonging to a single job function. Each role consists of zero (or more)
attributes and a run-list. Each node can have zero (or more) roles assigned to it. When a role is
run against a node, the configuration details of that node are compared against the attributes of the role, and then the contents of that role's run-list are applied to the node's configuration details. When a Chef Infra Client runs, it merges its own attributes and run-lists with those contained within each assigned role.
An attribute can be defined in a role and then used to override the default settings on a node. When a role is applied during a Chef Infra Client run, these attributes are compared to the attributes that are already present on the node. When the role attributes take precedence over the default attributes, Chef Infra Client applies those new settings and values during a Chef Infra Client run.
A role attribute can only be set as a default attribute or an override attribute; a role attribute cannot be set as a normal attribute. Use the default_attributes and override_attributes methods in a Ruby (.rb) role file, or the default_attributes and override_attributes hashes in a JSON data file.
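For example, a role file might set both levels; a sketch (the attribute names are illustrative):
name 'web_servers'
description 'This role contains nodes, which act as web servers'
run_list 'recipe[webserver]'
default_attributes 'apache' => { 'listen_ports' => ['80'] }
override_attributes 'apache' => { 'max_clients' => '256' }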
Attributes from Environments
An environment is a way to map an organization's real-life workflow to what can be configured and managed when using Chef Infra. This mapping is accomplished by setting attributes and pinning cookbooks at the environment level. With environments, you can change cookbook configurations depending on the system's designation. For example, by
designating different staging and production environments, you can then define the correct
URL of a database server for each environment. Environments also allow organizations to
move new cookbook releases from staging to production with confidence by stepping releases
through testing environments before entering production.
Attributes can be defined in an environment and then used to override the default attributes in a cookbook. When an environment is applied during a Chef Infra Client run, environment attributes are compared to the attributes that are already present on the node. When the environment attributes take precedence over the default attributes, Chef Infra Client applies those new settings and values during a Chef Infra Client run.
Environment attributes can be set at either the default attribute level or the override attribute level.
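A sketch of an environment file (for example environments/production.rb in the chef-repo; the cookbook pin and attribute names are illustrative):
name 'production'
description 'Production environment'
cookbook 'apache2', '= 2.1.0'
default_attributes 'app' => { 'listen_port' => '80' }
override_attributes 'app' => { 'db_host' => 'db.production.example.com' }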
This section walks you through the process of automating server provisioning using Chef, a powerful configuration management tool that leverages the Ruby programming language to automate infrastructure administration and provisioning. We will focus on the language terminology, syntax, and features necessary for creating a simplified example to fully automate the deployment of an Ubuntu 18.04 web server using Apache.
This is the list of steps we need to automate in order to reach our goal:
1. Update the apt cache
2. Install Apache
3. Create a custom document root directory
4. Place an index.html file in the custom document root
5. Apply a template to set up our custom virtual host
6. Restart Apache
Note: this guide is intended to introduce you to the Chef language and how to write recipes to automate your server provisioning. For a more introductory view of Chef, including the steps necessary to install and get started with this tool, please refer to Chef's official documentation.
Chef Terminology
Chef Server: a central server that stores information and manages provisioning of the
nodes
Chef Node: an individual server that is managed by a Chef Server
Chef Workstation: a controller machine where the provisionings are created and
uploaded to the Chef Server
Recipe: a file that contains a set of instructions (resources) to be executed. A recipe
must be contained inside a Cookbook
Resource: a portion of code that declares an element of the system and what action
should be executed. For instance, to install a package we declare a package resource
with the action install
Cookbook: a collection of recipes and other related files organized in a pre-defined
way to facilitate sharing and reusing parts of a provisioning
Attributes: details about a specific node. Attributes can be automatic (see next
definition) and can also be defined inside recipes
Automatic Attributes: global variables containing information about the system, like
network interfaces and operating system (known as facts in other tools). These
automatic attributes are collected by a tool called Ohai
Services: used to trigger service status changes, like restarting or stopping a service
Recipe Format
Chef recipes are written using Ruby. A recipe is basically a collection of resource definitions that will create a step-by-step set of instructions to be executed by the nodes. These resource definitions can be mixed with Ruby code for more flexibility and modularity.
Below you can find a simple example of a recipe that will run apt-get update and
install vim afterwards:
execute "apt-get update" do
command "apt-get update"
end
apt_package "vim" do
action :install
end
Writing Recipes
Working with Variables
Local variables can be defined inside recipes as regular Ruby local variables. The example below shows how to create a local variable that is later used inside a resource definition:
package = "vim"
apt_package package do
action :install
end
These variables, however, have a limited scope, being valid only inside the file where
they were defined. If you want to create a variable and make it globally available, so
you can use it from any of your cookbooks or recipes, you need to define a custom
attribute.
Using Attributes
Attributes represent details about a node. Chef has automatic attributes, which are the attributes collected by a tool called Ohai and containing information about the system (such as platform, hostname and default IP address), but it also lets you define your own custom attributes.
Attributes have different precedence levels, defined by the type of attribute you
create. default attributes are the most common choice, as they can still be
overwritten by other attribute types when desired.
The following example shows how the previous example would look like with
a default node attribute instead of a local variable:
node.default['main']['package'] = "vim"
apt_package node['main']['package'] do
action :install
end
The attributes’ precedence can be slightly confusing at first, but you will get used to it
after some practice. To illustrate the behavior, consider the following example:
node.normal['main']['package'] = "vim"
node.override['main']['package'] = "git"
node.default['main']['package'] = "curl"
apt_package node['main']['package'] do
action :install
end
Do you know which package will be installed in this case? If you guessed git, you guessed correctly. Regardless of the order in which the attributes were defined, the higher precedence of the override type means node['main']['package'] will be evaluated to git.
Using Loops
Loops are typically used to repeat a task using different input values. For instance, instead of creating 10 tasks for installing 10 different packages, you can create a single task and use a loop to repeat the task with all the different packages you want to install.
Chef supports all Ruby loop structures for creating loops inside recipes. For simple
usage, each is a common choice:
['vim', 'git', 'curl'].each do |package|
apt_package package do
action :install
end
end
Instead of using an inline array, you can also create a variable or attribute for defining
the parameters you want to use inside the loop. This will keep things more organized
and easier to read. Below, the same example now using a local variable to define the
packages that should be installed:
packages = ['vim', 'git', 'curl']
packages.each do |package|
  apt_package package do
    action :install
  end
end
Using Conditionals
Chef supports all Ruby conditionals for creating conditional statements inside recipes. Additionally, all resource types support two special properties that evaluate an expression before deciding whether the task should be executed: only_if and not_if.
The example below will check for the existence of php before trying to install the extension php-pear. It uses the command which to verify whether there is
a php executable currently installed on this system. If the command which php returns
false, this task won’t be executed:
apt_package "php-pear" do
action :install
only_if "which php"
end
If we want to do the opposite, executing a command at all times except when a
condition is evaluated as true, we use not_if instead. This example will
install php5 unless the system is CentOS:
apt_package "php5" do
action :install
not_if { node['platform'] == 'centos' }
end
For performing more complex evaluations, or if you want to execute several tasks under a specific condition, you may use any of the standard Ruby conditionals. The following example will only execute apt-get update when the system is either Debian or Ubuntu:
if node['platform'] == 'debian' || node['platform'] == 'ubuntu'
execute "apt-get update" do
command "apt-get update"
end
end
The attribute node['platform'] is an automatic attribute from Chef. The last example was only meant to demonstrate a more complex conditional construction; it could be replaced by a simple test using the automatic attribute node['platform_family'], which returns "debian" for both Debian and Ubuntu systems, as shown below.
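A sketch of that simpler test, using the platform_family automatic attribute:
if node['platform_family'] == 'debian'
  execute "apt-get update" do
    command "apt-get update"
  end
end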
Working with Templates
Templates are typically used to set up configuration files, allowing for the use of variables and other features intended to make these files more versatile and reusable.
Chef uses Embedded Ruby (ERB) templates, which is the same format used by Puppet. They support conditionals, loops and other Ruby features.
Below is an example of an ERB template for setting up an Apache virtual host, using a variable to define the document root for this host:
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot <%= @doc_root %>
</VirtualHost>
In order to apply the template, we need to create a template resource. This is how you
would apply this template to replace the default Apache virtual host:
template "/etc/apache2/sites-available/000-default.conf" do
source "vhost.erb"
variables({ :doc_root => node['main']['doc_root'] })
action :create
end
Chef makes a few assumptions when dealing with local files, in order to enforce
organization and modularity. In this case, Chef would look for a vhost.erb template file
inside a templates folder that should be in the same cookbook where this recipe is
located.
Unlike some other configuration management tools, Chef has a stricter scope for variables. This means you have to explicitly provide any variables you plan to use inside a template when defining the template resource. In this example, we used the variables method to pass along the doc_root attribute we need in the virtual host template.
Defining and Triggering Services
Service resources are used to make sure services are initialized and enabled. They are also used to trigger service restarts.
In Chef, service resources need to be declared before you try to notify them,
otherwise you will get an error.
Let’s take into consideration our previous template usage example, where we set up an
Apache virtual host. If you want to make sure Apache is restarted after a virtual host
change, you first need to create a service resource for the Apache service. This is how
such resource is defined in Chef:
service "apache2" do
action [ :enable, :start ]
end
Now, when defining the template resource, you need to include a notify option in
order to trigger a restart:
template "/etc/apache2/sites-available/000-default.conf" do
source "vhost.erb"
variables({ :doc_root => node['main']['doc_root'] })
action :create
notifies :restart, resources(:service => "apache2")
end
Example Recipe
Now let's have a look at a recipe that will automate the installation of an Apache web server on an Ubuntu system, as discussed in this guide's introduction.
The complete example, including the template file for setting up Apache and an HTML file to be served by the web server, can be found on GitHub. The folder also contains a Vagrantfile that lets you test the recipe in a simplified setup, using a virtual machine managed by Vagrant.
Recipe Explained
Lines 24-29: A cookbook_file resource is used to copy a local file to a remote server. This resource will copy our index.html file and place it inside the document root we created in a previous task.
Lines 31-36: Finally, this template resource applies our Apache virtual host template and notifies the service apache2 for a restart.
Data Bags
A data bag and a data bag item can be created with knife:
knife data bag create DATA_BAG_NAME (DATA_BAG_ITEM)
knife can also be used to update data bag items using the from file argument:
knife data bag from file BAG_NAME ITEM_NAME.json
As long as a file is in the correct directory structure, knife will be able to find the data bag and data bag item with only the name of the data bag and data bag item. For example, the command
knife data bag from file BAG_NAME ITEM_NAME.json
will load the data bag item from:
data_bags/BAG_NAME/ITEM_NAME.json
Continuing the example above, if you are in the “admins” directory and make changes to the
file charlie.json, then to upload that change to the Chef Infra Server use the following
command:
knife data bag from file admins charlie.json
In some cases, such as when knife is not being run from the root directory for the chef-
repo, the full path to the data bag item may be required. For example:
knife data bag from file BAG_NAME /path/to/file/ITEM_NAME.json
Manually
One or more data bags and data bag items can be created manually under
the data_bags directory in the chef-repo. Any method can be used to create the data bag
folders and data bag item JSON files. For example:
mkdir data_bags/admins
would create a data bag folder named “admins”. The equivalent command for using knife is:
knife data bag create admins
A data bag item can be created manually in the same way as the data bag, but by also
specifying the file name for the data bag item (this example is using vi, a visual editor for
UNIX):
vi data_bags/admins/charlie.json
would create a data bag item named "charlie.json" under the "admins" sub-directory in the data_bags directory of the chef-repo. The equivalent command for using knife is:
knife data bag create admins charlie
When deploying from a private repository using a data bag, use the deploy_key option to
ensure the private key is present:
{
'id': 'my_app',
... (truncated) ...
'deploy_key': 'ssh_private_key'
}
where ssh_private_key is the same SSH private key as used with a private git repository
and the new lines converted to \n .
Directory Structure
All data bags are stored in the data_bags directory of the chef-repo. This directory structure
is understood by knife so that the full path does not need to be entered when working with
data bags from the command line. An example of the data_bags directory structure:
- data_bags
- admins
- charlie.json
- bob.json
- tom.json
- db_users
- charlie.json
- bob.json
- sarah.json
- db_config
- small.json
- medium.json
- large.json
where admins , db_users , and db_config are the names of individual data bags and all of the
files that end with .json are the individual data bag items.
A data bag item is a JSON structure consisting of an id and a set of key-value pairs; comments are supported. For example:
{
/* This is a supported comment style */
// This style is also supported
"id": "ITEM_NAME",
"key": "value"
}
where
key and value are the key:value pair for each additional attribute within the data bag
item
/* ... */ and // ... show two ways to add comments to the data bag item
Note
Because the contents of encrypted data bag items are not visible to the Chef Infra Server,
search queries against data bags with encrypted items will not return any results.
Encryption Versions
The manner by which a data bag item is encrypted depends on the Chef Infra Client version used. See the following:
Version 0 (Chef Infra Client 0.10+): the original encrypted data bag format.
Version 1: uses the JSON serialization format instead of YAML to encrypt data bag items, and adds random initialization vector encryption for each value to protect against cryptanalysis.
Version 2 (Chef Infra Client 11.6+): adds the option to disable versions 0 and 1, and adds Encrypt-then-MAC (EtM) protection.
Version 3 (Chef Infra Client 13.0+).
The following options are used to specify the secret when working with encrypted data bag items:
--secret SECRET: The encryption key that is used for values contained within a data bag item. If a secret is not specified, Chef Infra Client looks for a secret at the path specified by the encrypted_data_bag_secret setting in the client.rb file.
--secret-file FILE: The path to the file that contains the encryption key.
Secret Keys
Encrypting a data bag item requires a secret key. A secret key can be created in any
number of ways. For example, OpenSSL can be used to generate a random number, which
can then be used as the secret key:
openssl rand -base64 512 | tr -d '\r\n' > encrypted_data_bag_secret
where encrypted_data_bag_secret is the name of the file which will contain the secret key.
For example, to create a secret key named “my_secret_key”:
openssl rand -base64 512 | tr -d '\r\n' > my_secret_key
The tr command eliminates any trailing line feeds. Doing so avoids key corruption when
transferring the file between platforms with different line endings.
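The secret file can also be referenced from each node's client.rb so that Chef Infra Client can decrypt items automatically; a sketch (the path shown is the commonly used default, adjust as needed):
encrypted_data_bag_secret '/etc/chef/encrypted_data_bag_secret'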
Encrypt
A data bag item is encrypted using a knife command similar to:
knife data bag create passwords mysql --secret-file /tmp/my_data_bag_key
where “passwords” is the name of the data bag, “mysql” is the name of the data bag item,
and “/tmp/my_data_bag_key” is the path to the location in which the file that contains the
secret-key is located. knife will ask for user credentials before the encrypted data bag item
is saved.
Verify Encryption
When the contents of a data bag item are encrypted, they will not be readable until they are
decrypted. Encryption can be verified with a knife command similar to:
knife data bag show passwords mysql
where “passwords” is the name of the data bag and “mysql” is the name of the data bag
item. This will return something similar to:
id: mysql
pass:
cipher: aes-256-cbc
encrypted_data: JZtwXpuq4Hf5ICcepJ1PGQohIyqjNX6JBc2DGpnL2WApzjAUG9SkSdv75TfKSjX4
iv: VYY2qx9b4r3j0qZ7+RkKHg==
version: 1
user:
cipher: aes-256-cbc
encrypted_data: 10BVoNb/plkvkrzVdybPgFFII5GThZ3Op9LNkwVeKpA=
iv: uIqKHZ9skJlN2gpJoml6rQ==
version: 1
Decrypt
An encrypted data bag item is decrypted with a knife command similar to:
knife data bag show --secret-file /tmp/my_data_bag_key passwords mysql
{
"id": "mysql",
"pass": "thesecret123",
"user": "fred"
}
To edit an item named "charlie" that is contained in a data bag named "admins", enter:
knife data bag edit admins charlie
to open the $EDITOR. Once opened, you can update the data before saving it to the Chef
Infra Server. For example, by changing:
{
"id": "charlie"
}
to:
{
"id": "charlie",
"uid": 1005,
"gid": "ops",
"shell": "/bin/zsh",
"comment": "Crazy Charlie"
}
Search
Data bags store global variables as JSON data. Data bags are indexed for searching and
can be loaded by a cookbook or accessed during a search.
Any search for a data bag (or a data bag item) must specify the name of the data bag and
then provide the search query string that will be used during the search. For example, to
use knife to search within a data bag named “admin_data” across all items, except for the
“admin_users” item, enter the following:
knife search admin_data "(NOT id:admin_users)"
Or, to include the same search query in a recipe, use a code block similar to:
search(:admin_data, 'NOT id:admin_users')
It may not be possible to know which data bag items will be needed. It may be necessary to
load everything in a data bag (but not know what “everything” is). Using a search query is
the ideal way to deal with that ambiguity, yet still ensure that all of the required data is
returned. The following examples show how a recipe can use a series of search queries to
search within a data bag named “admins”. For example, to find every administrator:
search(:admins, '*:*')
Or to find the administrator whose id is "charlie":
search(:admins, 'id:charlie')
Or to find every administrator whose gid is "ops":
search(:admins, 'gid:ops')
Or to search for an administrator whose name begins with the letter “c”:
search(:admins, 'id:c*')
Data bag items that are returned by a search query can be used as if they were a hash. For
example:
charlie = search(:admins, 'id:charlie').first
# => variable 'charlie' is set to the charlie data bag item
charlie['gid']
# => "ops"
charlie['shell']
# => "/bin/zsh"
The following recipe can be used to create a user for each administrator by loading all of
the items from the “admins” data bag, looping through each admin in the data bag, and then
creating a user resource so that each of those admins exist:
admins = data_bag('admins')
admins.each do |login|
admin = data_bag_item('admins', login)
home = "/home/#{login}"
user(login) do
uid admin['uid']
gid admin['gid']
shell admin['shell']
comment admin['comment']
home home
manage_home true
end
end
And then the same recipe, modified to load administrators using a search query (and using
an array to store the results of the search query):
admins = []
search(:admins, '*:*').each do |admin|
  login = admin['id']
  admins << login
  home = "/home/#{login}"
  user(login) do
    uid admin['uid']
    gid admin['gid']
    shell admin['shell']
    comment admin['comment']
    home home
    manage_home true
  end
end
Environments
Values that are stored in a data bag are global to the organization and are available to any
environment. There are two main strategies that can be used to store shared environment
data within a data bag: by using a top-level key that corresponds to the environment or by
using separate items for each environment.
A data bag that is storing a top-level key for an environment might look something like this:
{
"id": "some_data_bag_item",
"production" : {
# Hash with all your data here
},
"testing" : {
# Hash with all your data here
}
}
When using the data bag in a recipe, that data can be accessed from a recipe using code
similar to:
data_bag_item[node.chef_environment]['some_other_key']
The other approach is to use separate items for each environment. Depending on the
amount of data, it may all fit nicely within a single item. If this is the case, then creating
different items for each environment may be a simple approach to providing shared
environment values within a data bag. However, this approach is more time-consuming and
may not scale to large environments or when the data must be stored in many data bag
items.
Recipes
Data bags can be accessed by a recipe in the following ways:
Loaded by name when using the Chef Infra Language. Use this approach when only a single, known data bag item is required.
Accessed through the search indexes. Use this approach when more than one data bag item is required or when the contents of a data bag are looped through. The search indexes will bulk-load all of the data bag items, which results in lower overhead than loading each data bag item by name.
Load with Chef Infra Language
The Chef Infra Language provides access to data bags and data bag items (including encrypted data bag items) with the data_bag and data_bag_item methods.
The data_bag method returns an array with a key for each of the data bag items that are found in the data bag.
For example, to load an encrypted data bag item with a secret read from a file:
data_bag_item('bag', 'item', IO.read('secret_file'))
To load a data bag named admins:
data_bag('admins')
Then to load a single data bag item named justin from that data bag:
data_bag_item('admins', 'justin')
which returns something similar to:
# => {'comment'=>'Justin Currie', 'gid'=>1005, 'id'=>'justin', 'uid'=>1005, 'shell'=>'/bin/zsh'}
If the item is encrypted, data_bag_item will automatically decrypt it using the key specified above, or (if none is specified) by the Chef::Config[:encrypted_data_bag_secret] method, which defaults to /etc/chef/encrypted_data_bag_secret.
If two operations concurrently attempt to update the contents of a data bag, the
last-written attempt will be the operation to update the contents of the data bag.
This situation can lead to data loss, so organizations should take steps to ensure that
only one Chef Infra Client is making updates to a data bag at a time.
Altering data bags from the node when using the open source Chef Infra Server
requires the node’s API client to be granted admin privileges. In most cases, this is
not advisable.
Take steps to ensure that any subsequent actions are done carefully. The following examples show how a recipe can be used to create and edit the contents of a data bag or a data bag item using the Chef::DataBag and Chef::DataBagItem objects.
To create a data bag named "users" from a recipe:
users = Chef::DataBag.new
users.name('users')
users.create
To create a data bag item named "sam" in the "users" data bag:
sam = {
'id' => 'sam',
'Full Name' => 'Sammy',
'shell' => '/bin/zsh',
}
databag_item = Chef::DataBagItem.new
databag_item.data_bag('users')
databag_item.raw_data = sam
databag_item.save
To edit the contents of an existing data bag item and save it back to the Chef Infra Server:
sam = data_bag_item('users', 'sam')
sam['Full Name'] = 'Samantha'
sam.save
Create Users
Chef Infra Client can create users on systems based on the contents of a data bag. For
example, a data bag named “admins” can contain a data bag item for each of the
administrators that will manage the various systems that each Chef Infra Client is
maintaining. A recipe can load the data bag items and then create user accounts on the
target system with code similar to the following:
# Load the keys of the items in the 'admins' data bag
admins = data_bag('admins')
admins.each do |login|
  # This causes a round-trip to the server for each admin in the data bag
  admin = data_bag_item('admins', login)
  homedir = "/home/#{login}"
  # Create a user resource for each admin (see the earlier 'admins' recipe example)
  user(login) do
    home homedir
    manage_home true
  end
end
chef-solo can load data from a data bag as long as the contents of that data bag are accessible from a directory structure that exists on the same machine as chef-solo. The
location of this directory is configurable using the data_bag_path option in the solo.rb file.
The name of each sub-directory corresponds to a data bag and each JSON file within a
sub-directory corresponds to a data bag item. Search is not available in recipes when they
are run with chef-solo; use the data_bag() and data_bag_item() functions to access data
bags and data bag items.
CHAPTER 4 : Build tool- Maven
Maven Installation
Apache Maven is a build-automation tool designed to provide a comprehensive and easy-to-use way of developing Java applications. It uses a POM (Project Object Model) approach to create a standardized development environment for multiple teams.
In this tutorial, we will show you how to install Apache Maven on a system running
Windows.
Prerequisites
A system running Windows with administrator privileges, and a working Java installation (JDK), since Maven requires Java.
How to Install Maven on Windows
Follow the steps outlined below to install Apache Maven on Windows.
1. Open the official Apache Maven download page in your web browser.
2. Click on the appropriate link to download the binary zip archive of the latest version of Maven. As of the time of writing this tutorial, that is version 3.8.4.
3. Since there is no installation process, extract the Maven archive to a directory of your choice once the download is complete. For this tutorial, we are using C:\Program Files\Maven\apache-maven-3.8.4.
Next, add the MAVEN_HOME variable. Open the System Properties window and, under the Advanced tab, click Environment Variables.
4. Click the New button under the System variables section to add a new system environment variable.
5. Enter MAVEN_HOME as the variable name and the path to the Maven directory as the variable value. Click OK to save the new system variable.
Next, edit the Path system variable: select Path under System variables, click Edit, then New, and enter %MAVEN_HOME%\bin in the new field. Click OK to save changes to the Path variable.
Note: Not adding the path to the Maven home directory to the Path variable
causes the 'mvn' is not recognized as an internal or external
command, operable program or batch file error when using the mvn
command.
4. Click OK in the Environment Variables window to save the changes to the system variables.
To verify the installation, open a new command prompt and run:
mvn -version
Conclusion
After reading this tutorial, you should have a copy of Maven installed and ready to use on your
Windows system.
Running a Maven build (for example, mvn clean package on a sample consumerBanking project) produces output similar to the following:
-------------------------------------------------------
TESTS
-------------------------------------------------------
Running com.companyname.bank.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec
Results :
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ consumerBanking ---
[INFO] Building jar: C:\MVN\consumerBanking\target\consumerBanking-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.663 s
[INFO] Finished at: 2021-12-13T17:34:27+05:30
[INFO] ------------------------------------------------------------------------
C:\MVN\consumerBanking>
You've built your project and created the final jar file. The following are the key concepts:
We gave Maven two goals: first to clean the target directory (clean) and then to package the project build output as a jar (package).
The packaged jar is available in the consumerBanking\target folder as consumerBanking-1.0-SNAPSHOT.jar.
Test reports are available in the consumerBanking\target\surefire-reports folder.
Maven compiles the source code file(s) and then tests the source code file(s).
Then Maven runs the test cases.
Finally, Maven creates the package.
Now open the command console, go to the C:\MVN\consumerBanking\target\classes directory and execute the following java command.
>java com.companyname.bank.App
You will see the result as follows −
Hello World!
Adding Java Source Files
Let's see how we can add additional Java files to our project. Open the C:\MVN\consumerBanking\src\main\java\com\companyname\bank folder and create a Util class in it as Util.java.
package com.companyname.bank;
/**
* Hello world!
*
*/
Maven POM Builds (pom.xml)
POM is an acronym for Project Object Model. The pom.xml file contains information about the project and configuration information for Maven to build the project, such as dependencies, build directory, source directory, test source directory, plugins, goals, etc. Maven reads the pom.xml file and then executes the goal.
Before Maven 2, the file was named project.xml. Since Maven 2 (and also in Maven 3), it is named pom.xml.
Maven pom.xml file with additional elements
Here, we are going to add other elements to the pom.xml file, such as:
packaging: defines the packaging type, such as jar, war, etc.
name: defines the name of the Maven project.
url: defines the URL of the project.
dependencies: defines the dependencies for this project.
dependency: defines a dependency. It is used inside dependencies.
scope: defines the scope of the dependency. It can be compile, provided, runtime, test or system.
File: pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
  http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <groupId>com.javatpoint.application1</groupId>
  <artifactId>my-application1</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>Maven Quick Start Archetype</name>
  <url>http://maven.apache.org</url>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

</project>
POM stands for Project Object Model. This file contains the project information and configuration details for Maven to build the project: dependencies, source directory, build directory, plugins, goals, etc. Maven reads the pom.xml file and executes the desired goal. In older versions (Maven 1) this file was named project.xml; since Maven 2 it is named pom.xml.
pom.xml also stores additional information such as the project version, mailing lists, and description. When Maven executes goals and tasks, it searches for pom.xml in the current directory, reads the configuration from the POM file, and executes the desired goal. The POM is the fundamental unit of work in Maven.
The Super POM is Maven's default POM. It contains default values for most projects.
Basic Structure of POM.XML
The basic structure of pom.xml is given below:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.educba.examples</groupId>
<artifactId>example4</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Maven Archetype</name>
<url>http://maven.apache.org</url>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
Basic Key Elements pom.xml Contains
The basic key elements that pom.xml contains are given below:
project: The root element of every pom.xml file and the top-level element.
modelVersion: Sub-element of the project tag. Indicates the version of the object model that the current pom.xml is using. The model version changes very infrequently, but it is mandatory to ensure stability.
groupId: Sub-element of the project tag. Indicates the unique identifier of the group under which the project is created, typically a fully qualified domain name.
artifactId: Indicates the unique base name of the primary artifact generated by a particular project.
version: Specifies the version of the artifact under the given group.
packaging: Indicates the package type to be used by this artifact, i.e. JAR, WAR, EAR.
name: Defines the name of the project, often used in Maven-generated documentation.
url: Defines the URL of the project.
description: Provides a basic description of the project.
dependencies: Used to define dependencies in the project.
dependency: Sub-element of dependencies, used to declare a specific dependency.
scope: Defines the scope of the dependency.
To look at the default configuration of the pom.xml file, open a command console, go to the directory containing pom.xml, and execute the command below.
In our example, pom.xml exists in D:\Maven\Maven_projects.
D:\Maven\Maven_projects>mvn help:effective-pom
The help:effective-pom goal displays the effective POM as XML for the build process, including all default values inherited from the Super POM.
2. Minimal POM
The important requirements for a minimal POM file are given below:
Project root i.e. <project>
Model version i.e. <modelVersion>
Group ID i.e. <groupId>
Artifact Id i.e. <artifactId>
Version i.e. <version>
Consider the following POM file:
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>com.educba.example</groupId>
<artifactId>example4</artifactId>
<version>1</version>
</project>
In a POM file, groupId, artifactId, and version must be configured. These three values form the fully qualified artifact name of the project. If configuration details are not specified, Maven uses the default configuration specified in the Super POM file. For example, if the packaging type is not specified, the default packaging type 'jar' is used.
Note: On Project Inheritance (with respect to POM files).
Maven allows parent POM and child POM files. When inheriting from a parent POM, the child POM inherits (or merges) the following properties, which are contained in the parent POM file:
Plugin configuration.
Dependencies.
Resources.
Plugin lists.
Plugin execution Ids.
3. Parent POM
The Super POM shown above is one example of a parent POM file. Inheritance is achieved through a parent POM file.
For parent and child POMs, Maven checks two things:
A POM file in the project root directory.
A reference from the child POM file that contains the same coordinates stated in the parent POM.
Note: For an example of a Maven parent POM, please refer to the Super POM.
An important reason to use a parent POM file is to have a central place to store information about artifacts, compiler settings, etc., which is then shared across all modules.
4. Child POM
A child POM file refers to the parent POM using the <parent> tag; the groupId, artifactId, and version attributes are compulsory in the child POM. A child POM inherits all dependencies and properties from the parent POM; additionally, it inherits the dependencies of subprojects.
Consider the following POM file.
<project>
  <parent>
    <groupId>com.educba.examples</groupId>
    <artifactId>example4</artifactId>
    <version>1</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.educba.examples</groupId>
  <artifactId>module_1</artifactId>
  <version>1</version>
</project>
Conclusion
One of the most important benefits Maven provides is handling project relationships: inheritance and dependencies. Dependency management used to be one of the most complicated parts of handling complex projects; Maven solves this problem through dependency management based on repositories. The most important feature of the POM file is its dependency list, to which the programmer can add new dependencies easily and quickly. Using the POM hierarchy, duplication can be avoided: POM inheritance saves time and reduces the complexity of declaring the same dependencies multiple times. Quick project set-up is achieved through POM files, with no build.xml as required by other tools. In some cases pom.xml becomes very large for complex projects, and in large projects it is sometimes difficult to maintain the JARs in a repository when several versions of the same JAR files are in use.
Maven Build Life Cycle
What is Build Lifecycle?
A Build Lifecycle is a well-defined sequence of phases, which define the order in which the
goals are to be executed. Here phase represents a stage in life cycle. As an example, a
typical Maven Build Lifecycle consists of the following sequence of phases.
prepare-resources (resource copying)
Resource copying can be customized in this phase.
validate (validating the information)
Validates if the project is correct and if all necessary information is available.
compile (compilation)
Source code compilation is done in this phase.
test (testing)
Tests the compiled source code using a suitable testing framework.
package (packaging)
This phase creates the JAR/WAR package as mentioned in the packaging element of pom.xml.
install (installation)
This phase installs the package into the local/remote Maven repository.
deploy (deploying)
Copies the final package to the remote repository.
There are always pre and post phases to register goals, which must run prior to, or after a
particular phase.
When Maven starts building a project, it steps through a defined sequence of phases and
executes goals, which are registered with each phase.
Maven has the following three standard lifecycles −
clean
default(or build)
site
A goal represents a specific task which contributes to the building and managing of a
project. It may be bound to zero or more build phases. A goal not bound to any build phase
could be executed outside of the build lifecycle by direct invocation.
The order of execution depends on the order in which the goal(s) and the build phase(s) are
invoked. For example, consider the command below. The clean and package arguments
are build phases while the dependency:copy-dependencies is a goal.
mvn clean dependency:copy-dependencies package
Here the clean phase will be executed first, followed by the dependency:copy-
dependencies goal, and finally package phase will be executed.
Clean Lifecycle
When we execute mvn post-clean command, Maven invokes the clean lifecycle consisting of
the following phases.
pre-clean
clean
post-clean
The Maven clean goal (clean:clean) is bound to the clean phase in the clean lifecycle.
The clean:clean goal deletes the output of a build by deleting the build directory. Thus,
when the mvn clean command executes, Maven deletes the build directory.
We can customize this behavior by attaching goals to any of the above phases of the clean lifecycle.
In the following example, we'll attach the maven-antrun-plugin:run goal to the pre-clean, clean, and post-clean phases. This will allow us to echo text messages displaying the phases of the clean lifecycle.
We've created a pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.pre-clean</id>
<phase>pre-clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>pre-clean phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.clean</id>
<phase>clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>clean phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.post-clean</id>
<phase>post-clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>post-clean phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Now open the command console, go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn post-clean
Maven will start processing and displaying all the phases of clean life cycle.
C:\MVN>mvn post-clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.pre-clean) @ project ---
[INFO] Executing tasks
[echo] pre-clean phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ project ---
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.clean) @ project ---
[INFO] Executing tasks
[echo] clean phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.post-clean) @ project ---
[INFO] Executing tasks
[echo] post-clean phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.740 s
[INFO] Finished at: 2021-12-10T20:03:53+05:30
[INFO] ------------------------------------------------------------------------
C:\MVN>
You can try running the mvn clean command, which will display pre-clean and clean. Nothing will
be executed for the post-clean phase.
Default (or Build) Lifecycle
...
15 package
Take the compiled code and package it in its distributable format, such as a JAR,
WAR, or EAR file.
16 pre-integration-test
Perform actions required before integration tests are executed. For example, setting
up the required environment.
17 integration-test
Process and deploy the package if necessary into an environment where
integration tests can be run.
18 post-integration-test
Perform actions required after integration tests have been executed. For example,
cleaning up the environment.
19 verify
Run any check-ups to verify the package is valid and meets quality criteria.
20 install
Install the package into the local repository, which can be used as a dependency in
other projects locally.
21 deploy
Copies the final package to the remote repository for sharing with other developers
and projects.
There are a few important concepts related to Maven lifecycles which are worth mentioning:
When a phase is called via Maven command, for example mvn compile, only phases
up to and including that phase will execute.
Different maven goals will be bound to different phases of Maven lifecycle depending
upon the type of packaging (JAR / WAR / EAR).
In the following example, we will attach the maven-antrun-plugin:run goal to a few of the phases of
the Build lifecycle. This will allow us to echo text messages displaying the phases of the lifecycle.
We've updated pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.validate</id>
<phase>validate</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>validate phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.compile</id>
<phase>compile</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>compile phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.test</id>
<phase>test</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>test phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.package</id>
<phase>package</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>package phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.deploy</id>
<phase>deploy</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>deploy phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Now open the command console, go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn compile
Maven will start processing and display phases of build life cycle up to the compile phase.
C:\MVN>mvn compile
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.validate) @ project ---
[INFO] Executing tasks
[echo] validate phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ project ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform
dependent!
[INFO] skip non existing resourceDirectory C:\MVN\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ project ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.compile) @ project ---
[INFO] Executing tasks
[echo] compile phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.033 s
[INFO] Finished at: 2021-12-10T20:05:46+05:30
[INFO] ------------------------------------------------------------------------
C:\MVN>
Site Lifecycle
The Maven Site plugin is generally used to create fresh documentation, create reports, deploy
the site, etc. It has the following phases:
pre-site
site
post-site
site-deploy
In the following example, we will attach the maven-antrun-plugin:run goal to all the phases of the
Site lifecycle. This will allow us to echo text messages displaying the phases of the lifecycle.
We've updated pom.xml in C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-site-plugin</artifactId>
<version>3.7</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>2.9</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.pre-site</id>
<phase>pre-site</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>pre-site phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.site</id>
<phase>site</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>site phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.post-site</id>
<phase>post-site</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>post-site phase</echo>
</tasks>
</configuration>
</execution>
<execution>
<id>id.site-deploy</id>
<phase>site-deploy</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>site-deploy phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Now open the command console, go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn site
Maven will start processing and displaying the phases of site life cycle up to site phase.
C:\MVN>mvn site
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (id.pre-site) @ project ---
[INFO] Executing tasks
[WARNING] [echo] pre-site phase
[INFO] Executed tasks
[INFO]
[INFO] --- maven-site-plugin:3.7:site (default-site) @ project ---
[WARNING] Input file encoding has not been set, using platform encoding Cp1252, i.e. build is platform
dependent!
[WARNING] No project URL defined - decoration links will not be relativized!
[INFO] Rendering site with org.apache.maven.skins:maven-default-skin:jar:1.2 skin.
[INFO]
[INFO] --- maven-antrun-plugin:3.0.0:run (id.site) @ project ---
[INFO] Executing tasks
[WARNING] [echo] site phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.323 s
[INFO] Finished at: 2021-12-10T20:22:31+05:30
[INFO] ------------------------------------------------------------------------
C:\MVN>
Maven Local Repository (.m2)
Introduction to Repositories
Artifact Repositories
A repository in Maven holds build artifacts and dependencies of varying types.
There are exactly two types of repositories: local and remote:
1. the local repository is a directory on the computer where Maven runs. It caches remote downloads
and contains temporary build artifacts that you have not yet released.
2. remote repositories refer to any other type of repository, accessed by a variety of protocols such
as file:// and https:// . These repositories might be a truly remote repository set up by a
third party to provide their artifacts for downloading (for example, repo.maven.apache.org). Other
"remote" repositories may be internal repositories set up on a file or HTTP server within your
company, used to share private artifacts between development teams and for releases.
Local and remote repositories are structured the same way so that scripts can run on either side, or
they can be synced for offline use. The layout of the repositories is completely transparent to the
Maven user, however.
Using Repositories
In general, you should not need to do anything with the local repository on a regular basis, except
clean it out if you are short on disk space (or erase it completely if you are willing to download
everything again).
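If you only want to refresh the cached artifacts of a single project rather than delete the whole directory by hand, the Maven dependency plugin also offers a purge goal, for example:
mvn dependency:purge-local-repository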
For the remote repositories, they are used for both downloading and uploading (if you have the
permission to do so).
Downloading from a Remote Repository
Downloading in Maven is triggered by a project declaring a dependency that is not present in the
local repository (or for a SNAPSHOT , when the remote repository contains one that is newer). By
default, Maven will download from the central repository.
To override this, you need to specify a mirror as shown in Using Mirrors for Repositories.
You can set this in your settings.xml file to globally use a certain mirror. However, it is
common for a project to customise the repository in its pom.xml and that your setting will take
precedence. If dependencies are not being found, check that you have not overridden the remote
repository.
For more information on dependencies, see Dependency Mechanism.
Using Mirrors for the Central Repository
There are several official Central repositories geographically distributed. You can make changes to
your settings.xml file to use one or more mirrors. Instructions for this can be found in the
guide Using Mirrors for Repositories.
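As a sketch with placeholder values (the mirror id, name, and URL below are not from the text), a mirror of the central repository is declared in settings.xml like this:
<settings>
  <mirrors>
    <mirror>
      <id>central-mirror</id>                         <!-- placeholder id -->
      <name>Mirror of Maven Central</name>
      <url>https://repo.example.com/maven2</url>       <!-- placeholder URL -->
      <mirrorOf>central</mirrorOf>                    <!-- which repository this mirrors -->
    </mirror>
  </mirrors>
</settings>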
Building Offline
If you are temporarily disconnected from the internet and you need to build your projects offline,
you can use the offline switch on the CLI:
mvn -o package
Many plugins honor the offline setting and do not perform any operations that connect to the
internet. Some examples are resolving Javadoc links and link checking the site.
Uploading to a Remote Repository
While this is possible for any type of remote repository, you must have the permission to do so. To
have someone upload to the Central Maven repository, see Repository Center.
Internal Repositories
When using Maven, particularly in a corporate environment, connecting to the internet to download
dependencies is not acceptable for security, speed or bandwidth reasons. For that reason, it is
desirable to set up an internal repository to house a copy of artifacts, and to publish private artifacts
to.
Such an internal repository can be downloaded using HTTP or the file system (with
a file:// URL), and uploaded to using SCP, FTP, or a file copy.
As far as Maven is concerned, there is nothing special about this repository: it is another remote
repository that contains artifacts to download to a user's local cache, and is a publish destination for
artifact releases.
Additionally, you may want to share the repository server with your generated project sites. For
more information on creating and deploying sites, see Creating a Site.
Setting up the Internal Repository
To set up an internal repository just requires that you have a place to put it, and then copy required
artifacts there using the same layout as in a remote repository such as repo.maven.apache.org.
It is not recommended that you scrape or rsync:// a full copy of central as there is a large
amount of data there and doing so will get you banned. You can use a program such as those
described on the Repository Management page to run your internal repository's server, download
from the internet as required, and then hold the artifacts in your internal repository for faster
downloading later.
The other options available are to manually download and vet releases, then copy them to the
internal repository, or to have Maven download them for a user, and manually upload the vetted
artifacts to the internal repository which is used for releases. This step is the only one available for
artifacts where the license forbids their distribution automatically, such as several J2EE JARs
provided by Sun. Refer to the Guide to coping with SUN JARs document for more information.
It should be noted that Maven intends to include enhanced support for such features in the future,
including click through licenses on downloading, and verification of signatures.
To use such an internal repository, add it to the project's <repositories> section (or to a profile in settings.xml), for example:
<project>
  ...
  <repositories>
    <repository>
      <id>my-internal-site</id>
      <url>https://myserver/repo</url>
    </repository>
  </repositories>
  ...
</project>
If your internal repository requires authentication, the id element can be used in your settings file
to specify login information.
Deploying to the Internal Repository
One of the most important reasons to have one or more internal repositories is to be able to publish
your own private releases.
To publish to the repository, you will need to have access via one of SCP, SFTP, FTP, WebDAV, or
the filesystem. Connectivity is accomplished with the various wagons. Some wagons may need to
be added as extension to your build.
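For illustration only (the repository ids and URLs below are placeholders, not values from the text), publishing is usually configured through a distributionManagement section in the project's pom.xml; the id must match a <server> entry with credentials in settings.xml, and mvn deploy then pushes the artifact to that repository:
<project>
  ...
  <distributionManagement>
    <repository>
      <id>internal-releases</id>                      <!-- placeholder id; must match a <server> in settings.xml -->
      <url>https://myserver/repo/releases</url>        <!-- placeholder URL -->
    </repository>
    <snapshotRepository>
      <id>internal-snapshots</id>                     <!-- placeholder id -->
      <url>https://myserver/repo/snapshots</url>       <!-- placeholder URL -->
    </snapshotRepository>
  </distributionManagement>
  ...
</project>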
Maven's local repository is a directory on the local machine that stores all the project
artifacts.
When we execute a Maven build, Maven automatically downloads all the dependency jars
into the local repository. Usually, this directory is named .m2.
Here's where the default local repository is located based on OS:
Windows: C:\Users\<User_Name>\.m2
Linux: /home/<User_Name>/.m2
Mac: /Users/<user_name>/.m2
And for Linux and Mac, we can write it in the short form:
~/.m2
3. Custom Local Repository in settings.xml
If the repo isn't present in this default location, it's likely because of some pre-existing
configuration.
That config file is located in the Maven installation directory in a folder called conf, with the
name settings.xml.
Here's the relevant configuration that determines the location of our missing local repo:
<settings>
  <localRepository>C:/maven_repository</localRepository>
  ...
This is essentially how we can change the location of the local repo. Of course, if we change
that location, we'll no longer find the repo at the default location.
The files stored in the earlier location won't be moved automatically.
Passing Local Repository Location via Command Line
Apart from setting the custom local repository in Maven's settings.xml, the mvn command
supports the maven.repo.local property, which allows us to pass the local repository location
as a command-line parameter:
mvn -Dmaven.repo.local=/my/local/repository/path clean install
In this way, we don't have to change Maven's settings.xml.
By default, the Maven local repository is the ${user.home}/.m2/repository folder:
1. Unix/Mac OS X – ~/.m2/repository
2. Windows – C:\Users\{your-username}\.m2\repository
When we compile a Maven project, Maven downloads all of the project's dependency and plugin JARs into the Maven local repository, which saves time for the next compilation.
1. Find Maven Local Repository
1.1 If you cannot find the default .m2 directory, the default path may have been changed. Issue the
following command to find out where the Maven local repository is located:
Terminal
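One command that prints this location is the help plugin's evaluate goal; it produces output like the log shown below:
mvn help:evaluate -Dexpression=settings.localRepository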
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-help-plugin:3.1.0:evaluate (default-cli) @ standalone-pom ---
[INFO] No artifact parameter specified, using 'org.apache.maven:standalone-pom:pom:1' as project.
[INFO]
C:\opt\maven-repository
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.598 s
[INFO] Finished at: 2018-10-24T16:44:18+08:00
[INFO] ------------------------------------------------------------------------
In the above output, the Maven local repository has been relocated to C:\opt\maven-repository.
2.1 To change the location of the Maven local repository, edit {MAVEN_HOME}\conf\settings.xml and update the localRepository element:
<settings>
<!-- localRepository
| The path to the local repository maven will use to store artifacts.
|
| Default: ~/.m2/repository
<localRepository>/path/to/local/repo</localRepository>
-->
<localRepository>D:/maven_repo</localRepository>
Note
Issue mvn -version to find out where Maven is installed.
2.2 Save the file; the Maven local repository is now changed to D:/maven_repo.
The local repository is a directory on the computer where Maven runs. When you build a
project, it caches remote downloads to reduce network traffic, and also contains
temporary build artifacts that you have not yet released.
1. Maven local repository default locations
By default, on all systems, the Maven local repository is located at .m2/repository under the
user's home directory.
Unix / Linux – /home/{username}/.m2/repository, which you can also access
with ~/.m2/repository
2. Custom Maven local repository path
2.1. If you want to change the Maven local repository location, you first need to find the Maven
installation directory. You can find it from the command line: on Windows run echo
%MAVEN_HOME%, and on Mac/Linux run echo $MAVEN_HOME. Alternatively, you can try the $ mvn -
version command to get the Maven installation location.
In Windows:
$ echo %MAVEN_HOME%
Z:\D\maven\apache-maven-3.6.3
In Linux or Mac:
$ echo $MAVEN_HOME
/home/admin/Documents/apache-maven-3.6.3
2.2. You will find a conf directory under the installation path. Explore it and you will
find settings.xml.
2.3. Open settings.xml, specify a value for the localRepository property as shown below,
and save the file. Your repository location now points to the specified location.
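For example (the path below is only a placeholder):
<settings>
  <localRepository>/path/to/custom/local/repo</localRepository>
  ...
</settings>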
Note: If the Maven local repository is not present in the default location
under {username}/.m2, look at the localRepository value in settings.xml; the
default repository location might have been changed to a custom location.
Maven Global Repository
A Maven repository is a directory where all the packages, JAR files, plugins, and other artifacts
are stored along with their POM files. A repository in Maven holds build artifacts and dependencies of
various types. Maven provides three types of repositories.
Types of Repositories
Consider the following to understand the types and where they are stored.
1. Local Repositories
The Maven local repository is located on the local computer system. It is created by Maven when
the user runs any Maven command. The default location is the %USER_HOME%/.m2 directory.
When a Maven build is executed, Maven automatically downloads all the dependency JARs into
the local repository. For a new version, Maven downloads it automatically; if the version declared
in the dependency tag in pom.xml is already present, Maven simply uses it without downloading. By default,
Maven creates the local repository under the %USER_HOME% directory.
Update and Setting the Maven Local Repository:
To update, find this file {MAVEN_HOME}\conf\settings.xml
And to set, use following code:
Code:
<settings>
<localRepository>/path/to/local/repo/</localRepository>
<interactiveMode>true</interactiveMode>
<offline>false</offline>
</settings>
The default value of the path is ${user.home}/.m2/repository.
interactiveMode is true if you want Maven to interact with the user for input, false if not.
offline is true if the build system should operate in offline mode, false otherwise (the default is false).
Advantages
Reduced version conflicts.
Less manual intervention for the first-time build process.
A single central reference repository for all dependent software libraries rather than
several independent local libraries.
Speeds up the clean build process when using local repositories.
2. Central Repositories
These repositories are located on the web and are created by the Apache Maven community itself. The
central repository contains a large number of commonly used libraries. It is not necessary to configure the
Maven central repository URL, but internet access is required to search and download from it. When
Maven cannot find a dependency JAR file in the local repository, it starts searching the Maven central
repository using the URL http://repo1.maven.org/maven2/.
To override the default location, make changes in the settings.xml file to use one or more mirrors.
No special configuration is required to access the central repository, except when the
system is behind a firewall, in which case you need to change the proxy settings.
To set up a Maven proxy, follow the steps below:
Navigate to the path {M2_HOME}/conf/settings.xml.
Open the XML in edit mode in any text editor.
Open and update the <proxy> element (see the sketch below).
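As a sketch with placeholder values (host, port, and credentials are invented for the example), a typical <proxy> entry in settings.xml looks like this:
<settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>                          <!-- placeholder id -->
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.com</host>                  <!-- placeholder host -->
      <port>8080</port>                               <!-- placeholder port -->
      <username>proxyuser</username>                  <!-- optional -->
      <password>somepassword</password>               <!-- optional -->
      <nonProxyHosts>localhost|*.example.com</nonProxyHosts>
    </proxy>
  </proxies>
</settings>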
3. Remote Repository
A remote repository is stored on the organization's internal network or server. The company maintains a
repository outside the developers' machines; this is called a remote repository.
The following pom.xml declares the remote repository URLs and dependencies.
<project>
  <dependencies>
    <dependency>
      <groupId>com.educba.lib</groupId>
      <artifactId>library</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>edu.lib_1</id>
      <url>http:// (Organization URL)/maven2/lib_1</url>
    </repository>
    <repository>
      <id>edu.lib_2</id>
      <url>http:// (Organization URL)/maven2/lib_2</url>
    </repository>
  </repositories>
</project>
Adding Remote Repository
Not every library is stored in the Maven Central Repository; some libraries are available in the
Java.net or JBoss repositories.
1. Java.net Repository
<repositories>
<repository>
<id>java-net-repo</id>
<url>https://maven.java.net/content/repositories/public/</url>
</repository>
</repositories>
2. JBoss Repository
<repositories>
<repository>
<id>jboss-repo</id>
<url>http://repository.jboss.org/nexus/content/groups/public/</url>
</repository>
</repositories>
3. Spring Repository
<repositories>
<repository>
<id>spring-repo</id>
<url>https://repo.spring.io/release</url>
</repository>
</repositories>
Advantages
Artifact sharing across teams.
Effective separation of artifacts for projects under development and those in the release phase.
Centralized library management provides security: each client speaks to a single global
repository, avoiding the risk of different members of the team using different library versions.
The remote repository allows keeping the nature of third-party libraries used in projects
under control, thus avoiding the introduction of elements not compliant with company
policies.
Repository Manager
A repository manager is a dedicated server application designed to manage repositories. The
usage of a repository manager is considered an essential best practice for any significant
usage of Maven.
A repository manager acts as a proxy server for public Maven repositories.
It also allows repositories to be used as a destination for Maven project outputs.
Advantages
A repository manager reduces the complexity of downloading from remote repositories, so less
time is consumed and build performance increases.
Because external repositories are accessed through a trusted intermediary, a repository manager
increases build stability.
By mediating interaction with remote SNAPSHOT repositories, the repository manager
increases performance.
A repository manager controls the provided and consumed artifacts.
A repository manager acts as central storage and provides access to artifacts and
metadata.
A repository manager acts as a platform for sharing or exchanging binary artifacts, so
building artifacts from scratch is not required.
Available Repository Managers
The followings are the open-source and commercial repository managers who are known to
support the repository format used by Maven.
Apache Archiva
CloudRepo
Cloudsmith Package
JFrog Artifactory Open Source
JFrog Artifactory Pro
MyGet
Sonatype Nexus OSS
Sonatype Nexus Pro
packagecloud.io
Maven Dependency Checking Process
First, Maven scans the local repository for all configured dependencies.
If a dependency is found, Maven continues with further execution.
If a dependency is absent from the local repository, Maven scans the central repository for
that particular dependency.
Any dependencies found are downloaded into the local repository for future executions of
the project.
If dependencies are not found in the local repository or the central repository, Maven
starts scanning the remote repositories.
If a dependency is not available in any of the three repositories (local, central, remote),
Maven throws a "not able to find the dependencies" error and stops processing.
All found dependencies are downloaded into the local repository.
Note: Since repositories are used by default, the primary type of binary component in a repository
is a JAR file containing Java byte code, but there is no limit on the type of content that can be
stored; users can easily deploy any kind of library to a Maven repository. When Maven downloads
a component such as a dependency from a repository, it also downloads that component's POM.
It is rather challenging to obtain a single-word GroupID approved for inclusion in the Maven
Central Repository.
To make things even more modular, the Maven GroupID allows us to create multiple sub-groups.
We can determine the granularity of the GroupID by using the project structure.
Project Configuration in Maven is done using the Project Object Model, represented by the
file pom.xml. The POM describes the dependencies managed by the Project and proves
helpful in the plugin configuration for software building.
The POM file also keeps a record of relationships between multi-module projects.
Maven GroupID Naming
When working with the Maven GroupID, the thing to note about the class files is that we do not
have to pick names for them; each name is chosen automatically through the 1:1 mapping from
the Java file.
Maven only asks us to pick two names, so it is relatively simple. Thus, to define the
Maven GroupID name, the steps below have to be followed:
Step 1: Create a Template for the Project in the Spring Initializer. The figure below shows a
template of the Maven GroupID naming project.
Enter the details as follows:
Group Name: com.Group_ID
Artifact: Maven_Group_ID
Name: Maven_Group_ID
Packaging: JAR
Java Version 8
Step 2: After creating the template, extract the template file and open the same in VS Code
or any other Editor or IDE with Spring Boot Functionalities.
The pom.xml File in the Project should be as follows.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
The project coordinates section of the generated pom.xml is:
<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group_ID</name>
<description>Demo project for Spring Boot</description>
We can also modify the name/Group ID of the Project by changing the contents in the name
tag as:
<name>Maven_Group_ID</name>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.Group_ID</groupId>
<artifactId>Maven_Group_ID</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Maven_Group</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<dependencies>
<dependency>
<groupId> maven_group </groupId>
<artifactId> RestrictionRule </artifactId>
<version> 0.0.1 </version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>
</project>
Restricting the GroupID
Maven can restrict the usage of GroupIDs for a project in order to enforce rules on the source code. Here is
how we can implement a single-group restriction on the GroupID in Maven:
<dependency>
<groupId> Maven_Group_ID </groupId>
<artifactId> RestrictionRule </artifactId>
<version> 0.0.1 </version>
</dependency>
How is GroupID different from ArtifactID?
The below table briefly discusses the differences between the Maven GroupID and the
ArtifactID.
Group ID: Projects are identified uniquely using the GroupID.
Artifact ID: The name of the JAR file without the version is called the Artifact ID.
Frequently Asked Questions
What is GroupID in Maven?
GroupID is a unique identifier for all the projects currently being worked on in
Maven. It can be automated, or the user can define the GroupID themselves.
What is POM?
POM is the Project Object Model, the fundamental unit of work in Maven. It is an XML file
containing information about the configuration details and the Project that Maven uses for
building the projects.
What is the use of Maven GroupID?
GroupID is the unique identifying feature of all the Projects that starts from a reversed
Domain. Maven does not enforce this convention, as there are multiple projects where the
pattern is broken.
What is ArtifactID?
The Artifact ID is the name of the .JAR file without the version. If we are creating the JAR file, we
can name the ArtifactID however we like.
What is Spring Boot?
Spring Boot is a Java Based Open Source Framework that is used to create a microservice.
It is also used to build Standalone Production Ready Spring Applications.
Differences between Group ID, Artifact ID
artifactId is the name of the jar without version. If you created it then you can choose
whatever name you want with lowercase letters and no strange symbols. If it's a third party jar
you have to take the name of the jar as it's distributed. eg. maven, commons-math
groupId will identify your project uniquely across all projects, so we need to enforce a naming
schema. It has to follow the package name rules, which means it has to start with at least a domain
name you control, and you can create as many subgroups as you want. Look at More information
about package names. e.g. org.apache.maven, org.apache.commons
The main difference between groupId and artifactId in Maven is that the groupId specifies the id
of the project group while the artifactId specifies the id of the project.
It is required to use third party libraries when developing a project. The programmer can
download and add these third-party libraries to the project, but it is difficult to update them later.
Maven provides a solution to this issue. It helps to include all the dependencies required for the
project. Moreover, the programmer can specify the required dependencies in the POM.XML file. It
has the configuration information to build the project. Furthermore, this file consists of several
XML elements, two of which are groupId and artifactId. For example: groupId: com.test.java
(similar to a package name); artifactId: javaproject (the project or module name).
What is SNAPSHOT?
SNAPSHOT is a special version that indicates a current development copy. Unlike regular
versions, Maven checks for a new SNAPSHOT version in a remote repository for every build.
For example, the data-service team releases a SNAPSHOT of its updated code to the repository every time,
say data-service:1.0-SNAPSHOT, replacing an older SNAPSHOT jar.
Snapshot vs Version
In the case of a Version, once Maven has downloaded the mentioned version, say data-service:1.0, it
will never try to download a newer 1.0 available in the repository. To download the updated
code, the data-service version has to be upgraded to 1.1.
In the case of a SNAPSHOT, Maven will automatically fetch the latest SNAPSHOT (data-
service:1.0-SNAPSHOT) every time the app-ui team builds their project.
app-ui pom.xml
app-ui project is using 1.0-SNAPSHOT of data-service.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>app-ui</groupId>
<artifactId>app-ui</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>health</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>data-service</groupId>
<artifactId>data-service</artifactId>
<version>1.0-SNAPSHOT</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
data-service pom.xml
data-service project is releasing 1.0-SNAPSHOT for every minor change.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>data-service</groupId>
<artifactId>data-service</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>health</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
</project>
Although, in the case of a SNAPSHOT, Maven automatically fetches the latest SNAPSHOT on a
daily basis, you can force Maven to download the latest snapshot build by adding the -U switch to any
Maven command.
mvn clean package -U
Let's open the command console, go to the C:\ > MVN > app-ui directory and execute the
following mvn command.
C:\MVN\app-ui>mvn clean package -U
Maven will start building the project after downloading the latest SNAPSHOT of data-service.
[INFO] Scanning for projects...
[INFO]--------------------------------------------
[INFO] Building consumerBanking
[INFO] task-segment: [clean, package]
[INFO]--------------------------------------------
[INFO] Downloading data-service:1.0-SNAPSHOT
[INFO] 290K downloaded.
[INFO] [clean:clean {execution: default-clean}]
[INFO] Deleting directory C:\MVN\app-ui\target
[INFO] [resources:resources {execution: default-resources}]
--------------------------------------------------
TESTS
--------------------------------------------------
Running com.companyname.bank.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec
Results :
Transitive Dependencies
Excluded/Optional Dependencies
Dependency Scope
Dependency Management
Importing Dependencies
Bill of Materials (BOM) POMs
System Dependencies
Transitive Dependencies
Maven avoids the need to discover and specify the libraries that your own dependencies require by
including transitive dependencies automatically.
This feature is facilitated by reading the project files of your dependencies from the remote
repositories specified. In general, all dependencies of those projects are used in your project, as are
any that the project inherits from its parents, or from its dependencies, and so on.
There is no limit to the number of levels that dependencies can be gathered from. A problem arises
only if a cyclic dependency is discovered.
With transitive dependencies, the graph of included libraries can quickly grow quite large. For this
reason, there are additional features that limit which dependencies are included:
Dependency mediation - this determines what version of an artifact will be chosen when multiple
versions are encountered as dependencies. Maven picks the "nearest definition". That is, it uses the
version of the closest dependency to your project in the tree of dependencies. You can always
guarantee a version by declaring it explicitly in your project's POM. Note that if two dependency
versions are at the same depth in the dependency tree, the first declaration wins.
"nearest definition" means that the version used will be the closest one to your project in the
tree of dependencies. Consider this tree of dependencies:
A
├── B
│   └── C
│       └── D 2.0
└── E
    └── D 1.0
In text, dependencies for A, B, and C are defined as A -> B -> C -> D 2.0 and A -> E -> D 1.0,
then D 1.0 will be used when building A because the path from A to D through E is shorter.
You could explicitly add a dependency to D 2.0 in A to force the use of D 2.0, as shown here:
A
├── B
│   └── C
│       └── D 2.0
├── E
│   └── D 1.0
└── D 2.0
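Expressed as a POM change, and borrowing the test group id used in the later examples (the original text does not give full coordinates for D), the explicit declaration in A's pom.xml would be along these lines:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>test</groupId>          <!-- assumed coordinates for D -->
      <artifactId>d</artifactId>
      <version>2.0</version>           <!-- forces version 2.0 via the "nearest definition" rule -->
    </dependency>
    ...
  </dependencies>
</project>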
Dependency management - this allows project authors to directly specify the versions of artifacts
to be used when they are encountered in transitive dependencies or in dependencies where no
version has been specified. In the example in the preceding section a dependency was directly
added to A even though it is not directly used by A. Instead, A can include D as a dependency in
its dependencyManagement section and directly control which version of D is used when, or if, it
is ever referenced.
Dependency scope - this allows you to only include dependencies appropriate for the current stage
of the build. This is described in more detail below.
Excluded dependencies - If project X depends on project Y, and project Y depends on project Z,
the owner of project X can explicitly exclude project Z as a dependency, using the "exclusion"
element.
Optional dependencies - If project Y depends on project Z, the owner of project Y can mark
project Z as an optional dependency, using the "optional" element. When project X depends on
project Y, X will depend only on Y and not on Y's optional dependency Z. The owner of project X
may then explicitly add a dependency on Z, at her option. (It may be helpful to think of optional
dependencies as "excluded by default.")
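A minimal sketch of both mechanisms, using invented coordinates (the com.example group and the project-y/project-z artifacts are placeholders):
<!-- In project X's pom.xml: exclude Z that would otherwise arrive through Y -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-y</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.example</groupId>
      <artifactId>project-z</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- In project Y's pom.xml: mark Z as optional, so it is not passed on transitively -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-z</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>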
Although transitive dependencies can implicitly include desired dependencies, it is a good practice
to explicitly specify the dependencies your source code uses directly. This best practice proves its
value especially when the dependencies of your project change their dependencies.
For example, assume that your project A specifies a dependency on another project B, and project B
specifies a dependency on project C. If you are directly using components in project C, and you
don't specify project C in your project A, it may cause build failure when project B suddenly
updates/removes its dependency on project C.
Another reason to directly specify dependencies is that it provides better documentation for your
project: one can learn more information by just reading the POM file in your project, or by
executing mvn dependency:tree.
Maven also provides dependency:analyze plugin goal for analyzing the dependencies: it helps
making this best practice more achievable.
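Both goals can be run directly against a project from the command line, for example:
mvn dependency:tree
mvn dependency:analyze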
Dependency Scope
Dependency scope is used to limit the transitivity of a dependency and to determine when a
dependency is included in a classpath.
There are 6 scopes:
compile
This is the default scope, used if none is specified. Compile dependencies are available in all
classpaths of a project. Furthermore, those dependencies are propagated to dependent projects.
provided
This is much like compile , but indicates you expect the JDK or a container to provide the
dependency at runtime. For example, when building a web application for the Java Enterprise
Edition, you would set the dependency on the Servlet API and related Java EE APIs to
scope provided because the web container provides those classes. A dependency with this scope is
added to the classpath used for compilation and test, but not the runtime classpath. It is not
transitive.
runtime
This scope indicates that the dependency is not required for compilation, but is for execution.
Maven includes a dependency with this scope in the runtime and test classpaths, but not the
compile classpath.
test
This scope indicates that the dependency is not required for normal use of the application, and is
only available for the test compilation and execution phases. This scope is not transitive. Typically
this scope is used for test libraries such as JUnit and Mockito. It is also used for non-test libraries
such as Apache Commons IO if those libraries are used in unit tests (src/test/java) but not in the
model code (src/main/java).
system
This scope is similar to provided except that you have to provide the JAR which contains it
explicitly. The artifact is always available and is not looked up in a repository.
import
This scope is only supported on a dependency of type pom in the <dependencyManagement> section.
It indicates the dependency is to be replaced with the effective list of dependencies in the specified
POM's <dependencyManagement> section. Since they are replaced, dependencies with a scope
of import do not actually participate in limiting the transitivity of a dependency.
Each of the scopes (except for import ) affects transitive dependencies in different ways, as is
demonstrated in the table below. If a dependency is set to the scope in the left column, a transitive
dependency of that dependency with the scope across the top row results in a dependency in the
main project with the scope listed at the intersection. If no scope is listed, it means the dependency
is omitted.
compile provided runtime test
compile compile(*) - runtime -
provided provided - provided -
runtime runtime - runtime -
test test - test -
(*) Note: it is intended that this should be runtime scope instead, so that all compile dependencies
must be explicitly listed. However, if a library you depend on extends a class from another library,
both must be available at compile time. For this reason, compile time dependencies remain as
compile scope even when they are transitive.
Dependency Management
The dependency management section is a mechanism for centralizing dependency information.
When you have a set of projects that inherit from a common parent, it's possible to put all
information about the dependency in the common POM and have simpler references to the artifacts
in the child POMs. The mechanism is best illustrated through some examples. Given these two
POMs which extend the same parent:
Project A:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-a</artifactId>
      <version>1.0</version>
      <exclusions>
        <exclusion>
          <groupId>group-c</groupId>
          <artifactId>excluded-artifact</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-b</artifactId>
      <version>1.0</version>
      <type>bar</type>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>
Project B:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>group-c</groupId>
      <artifactId>artifact-b</artifactId>
      <version>1.0</version>
      <type>war</type>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-b</artifactId>
      <version>1.0</version>
      <type>bar</type>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>
These two example POMs share a common dependency and each has one non-trivial dependency.
This information can be put in the parent POM like this:
<project>
  ...
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>group-a</groupId>
        <artifactId>artifact-a</artifactId>
        <version>1.0</version>
        <exclusions>
          <exclusion>
            <groupId>group-c</groupId>
            <artifactId>excluded-artifact</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      <dependency>
        <groupId>group-c</groupId>
        <artifactId>artifact-b</artifactId>
        <version>1.0</version>
        <type>war</type>
        <scope>runtime</scope>
      </dependency>
      <dependency>
        <groupId>group-a</groupId>
        <artifactId>artifact-b</artifactId>
        <version>1.0</version>
        <type>bar</type>
        <scope>runtime</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
The two child POMs then become much simpler:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-a</artifactId>
    </dependency>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-b</artifactId>
      <!-- This is not a jar dependency, so we must specify type. -->
      <type>bar</type>
    </dependency>
  </dependencies>
</project>
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>group-c</groupId>
      <artifactId>artifact-b</artifactId>
      <!-- This is not a jar dependency, so we must specify type. -->
      <type>war</type>
    </dependency>
    <dependency>
      <groupId>group-a</groupId>
      <artifactId>artifact-b</artifactId>
      <!-- This is not a jar dependency, so we must specify type. -->
      <type>bar</type>
    </dependency>
  </dependencies>
</project>
NOTE: In two of these dependency references, we had to specify the <type/> element. This is
because the minimal set of information for matching a dependency reference against a
dependencyManagement section is actually {groupId, artifactId, type, classifier}. In many cases,
these dependencies will refer to jar artifacts with no classifier. This allows us to shorthand the
identity set to {groupId, artifactId}, since the default for the type field is jar , and the default
classifier is null.
A second, and very important use of the dependency management section is to control the versions
of artifacts used in transitive dependencies. As an example consider these projects:
Project A:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>A</artifactId>
  <packaging>pom</packaging>
  <name>A</name>
  <version>1.0</version>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>test</groupId>
        <artifactId>a</artifactId>
        <version>1.2</version>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>b</artifactId>
        <version>1.0</version>
        <scope>compile</scope>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>c</artifactId>
        <version>1.0</version>
        <scope>compile</scope>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>d</artifactId>
        <version>1.2</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
Project B:
<project>
  <parent>
    <artifactId>A</artifactId>
    <groupId>maven</groupId>
    <version>1.0</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>B</artifactId>
  <packaging>pom</packaging>
  <name>B</name>
  <version>1.0</version>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>test</groupId>
        <artifactId>d</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>test</groupId>
      <artifactId>a</artifactId>
      <version>1.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>test</groupId>
      <artifactId>c</artifactId>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>
When maven is run on project B, version 1.0 of artifacts a, b, c, and d will be used regardless of the
version specified in their POM.
a and c both are declared as dependencies of the project so version 1.0 is used due to dependency
mediation. Both also have runtime scope since it is directly specified.
b is defined in B's parent's dependency management section and since dependency management
takes precedence over dependency mediation for transitive dependencies, version 1.0 will be
selected should it be referenced in a or c's POM. b will also have compile scope.
Finally, since d is specified in B's dependency management section, should d be a dependency (or
transitive dependency) of a or c, version 1.0 will be chosen - again because dependency
management takes precedence over dependency mediation and also because the current POM's
declaration takes precedence over its parent's declaration.
The reference information about the dependency management tags is available from the project
descriptor reference.
Importing Dependencies
The examples in the previous section describe how to specify managed dependencies through
inheritance. However, in larger projects it may be impossible to accomplish this since a project can
only inherit from a single parent. To accommodate this, projects can import managed dependencies
from other projects. This is accomplished by declaring a POM artifact as a dependency with a scope
of "import".
Project B:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>B</artifactId>
  <packaging>pom</packaging>
  <name>B</name>
  <version>1.0</version>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>maven</groupId>
        <artifactId>A</artifactId>
        <version>1.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>d</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>test</groupId>
      <artifactId>a</artifactId>
      <version>1.0</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>test</groupId>
      <artifactId>c</artifactId>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>
Assuming A is the POM defined in the preceding example, the end result would be the same. All of
A's managed dependencies would be incorporated into B except for d since it is defined in this
POM.
Project X:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>X</artifactId>
  <packaging>pom</packaging>
  <name>X</name>
  <version>1.0</version>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>test</groupId>
        <artifactId>a</artifactId>
        <version>1.1</version>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>b</artifactId>
        <version>1.0</version>
        <scope>compile</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
Project Y:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>Y</artifactId>
  <packaging>pom</packaging>
  <name>Y</name>
  <version>1.0</version>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>test</groupId>
        <artifactId>a</artifactId>
        <version>1.2</version>
      </dependency>
      <dependency>
        <groupId>test</groupId>
        <artifactId>c</artifactId>
        <version>1.0</version>
        <scope>compile</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
Project Z:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>maven</groupId>
  <artifactId>Z</artifactId>
  <packaging>pom</packaging>
  <name>Z</name>
  <version>1.0</version>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>maven</groupId>
        <artifactId>X</artifactId>
        <version>1.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <dependency>
        <groupId>maven</groupId>
        <artifactId>Y</artifactId>
        <version>1.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
In the example above Z imports the managed dependencies from both X and Y. However, both X
and Y contain dependency a. Here, version 1.1 of a would be used since X is declared first and a is
not declared in Z's dependencyManagement.
This process is recursive. For example, if X imports another POM, Q, when Z is processed it will
simply appear that all of Q's managed dependencies are defined in X.
The parent subproject has the BOM POM as its parent. It is a normal multiproject pom.
The project that follows shows how the library can now be used in another project without having to
specify the dependent project's versions.
<project>
  ...
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.test</groupId>
      <artifactId>project1</artifactId>
    </dependency>
    <dependency>
      <groupId>com.test</groupId>
      <artifactId>project2</artifactId>
    </dependency>
  </dependencies>
</project>
Finally, when creating projects that import dependencies, beware of the following:
Do not attempt to import a POM that is defined in a submodule of the current POM. Attempting to
do that will result in the build failing since it won't be able to locate the POM.
Never declare the POM importing a POM as the parent (or grandparent, etc) of the target POM.
There is no way to resolve the circularity and an exception will be thrown.
When referring to artifacts whose POMs have transitive dependencies, the project needs to specify
versions of those artifacts as managed dependencies. Not doing so results in a build failure since
the artifact may not have a version specified. (This should be considered a best practice in any
case as it keeps the versions of artifacts from changing from one build to the next).
System Dependencies
Important note: This is deprecated.
Dependencies with the scope system are always available and are not looked up in repository. They
are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus,
system dependencies are especially useful for resolving dependencies on artifacts which are now
provided by the JDK, but were available as separate downloads earlier. Typical examples are the
JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).
A simple example would be:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>javax.sql</groupId>
      <artifactId>jdbc-stdext</artifactId>
      <version>2.0</version>
      <scope>system</scope>
      <systemPath>${java.home}/lib/rt.jar</systemPath>
    </dependency>
  </dependencies>
  ...
</project>
If your artifact is provided by the JDK's tools.jar , the system path would be defined as follows:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>sun.jdk</groupId>
      <artifactId>tools</artifactId>
      <version>1.5.0</version>
      <scope>system</scope>
      <systemPath>${java.home}/../lib/tools.jar</systemPath>
    </dependency>
  </dependencies>
  ...
</project>
Maven Plugins
Maven is essentially a plugin execution framework in which every task is actually performed by a plugin.
Maven plugins are generally used to −
create JAR files
create WAR files
compile code files
unit test code
create project documentation
create project reports
A plugin generally provides a set of goals, which can be executed using the following syntax
−
mvn [plugin-name]:[goal-name]
For example, a Java project can be compiled with the maven-compiler-plugin's compile-goal
by running the following command.
mvn compiler:compile
Plugin Types
Maven provides the following two types of plugins −
Build plugins − They execute during the build process and should be configured in the <build/> element of pom.xml.
Reporting plugins − They execute during the site generation process and should be configured in the <reporting/> element of pom.xml.
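As a rough illustration (the plugin versions here are only examples), a POM might configure one plugin of each type:

<project>
  ...
  <build>
    <plugins>
      <!-- Build plugin: runs during the build lifecycle -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.11.0</version>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <!-- Reporting plugin: runs during site generation (mvn site) -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-project-info-reports-plugin</artifactId>
        <version>3.4.5</version>
      </plugin>
    </plugins>
  </reporting>
  ...
</project>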
Following is a list of a few common plugins −
clean − Cleans up after the build by deleting the target directory.
compiler − Compiles Java source files.
surefire − Runs the JUnit unit tests and creates test reports.
jar − Builds a JAR file from the current project.
war − Builds a WAR file from the current project.
javadoc − Generates Javadoc for the project.
antrun − Runs a set of Ant tasks from any phase of the build.
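Each of these plugins exposes goals that can be invoked directly with the plugin:goal syntax shown earlier. For example (assuming an existing Maven project in the current directory):

mvn clean:clean        # delete the target directory
mvn compiler:compile   # compile the main sources
mvn surefire:test      # run the unit tests
mvn jar:jar            # package the compiled classes as a JAR
mvn javadoc:javadoc    # generate the project Javadoc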
Example
We've used the maven-antrun-plugin extensively in our examples to print data on the console (refer to the
Build Profiles chapter). To understand it better, let us create a pom.xml in the C:\MVN\project folder.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.companyname.projectgroup</groupId>
<artifactId>project</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<id>id.clean</id>
<phase>clean</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<echo>clean phase</echo>
</tasks>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Next, open the command console and go to the folder containing pom.xml and execute the
following mvn command.
C:\MVN\project>mvn clean
Maven will start processing and displaying the clean phase of clean life cycle.
C:\MVN>mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< com.companyname.projectgroup:project >----------------
[INFO] Building project 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ project ---
[INFO] Deleting C:\MVN\target
[INFO]
[INFO] --- maven-antrun-plugin:1.1:run (id.clean) @ project ---
[INFO] Executing tasks
[echo] clean phase
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.266 s
[INFO] Finished at: 2021-12-13T13:58:10+05:30
[INFO] ------------------------------------------------------------------------
C:\MVN>
The above example illustrates the following key concepts −
Plugins are specified in pom.xml using the plugins element.
Each plugin can have multiple goals.
You can define the phase from which the plugin should start its processing using its phase
element. We've used the clean phase.
You can configure tasks to be executed by binding them to goals of the plugin. We've
bound the echo task to the run goal of maven-antrun-plugin.
Maven will then download the plugin if it is not available in the local repository and start
processing it.
CHAPTER 5 : Docker– Containers & Build tool- Maven
Introduction: What is a Docker, Use case of Docker, Platforms for
Docker, Dockers vs. Virtualization
What is a Docker
Docker is an open source platform that enables developers to build, deploy, run, update and
manage containers—standardized, executable components that combine application source code with
the operating system (OS) libraries and dependencies required to run that code in any environment.
Containers simplify development and delivery of distributed applications. They have become
increasingly popular as organizations shift to cloud-native development and
hybrid multicloud environments. It's possible for developers to create containers without Docker, by
working directly with capabilities built into Linux and other operating systems, but Docker makes
containerization faster, easier and safer. At this writing, Docker reported over 13 million developers
using the platform.
Docker also refers to Docker, Inc., the company that sells the commercial version of Docker, and to the
Docker open source project to which Docker, Inc. and many other organizations and individuals
contribute.
Docker was created in 2013 by Solomon Hykes while working for dotCloud, a cloud hosting
company. It was originally built as an internal tool to make it easier to develop and deploy
applications.
Docker containers are based on Linux containers, which have been around since the early 2000s, but
they weren't widely used until Docker created a simple and easy-to-use platform for running
containers that quickly caught on with developers and system administrators alike.
In March 2013, Docker open-sourced its technology and became one of the most popular projects
on GitHub, raising millions from investors soon after.
In an incredibly short amount of time, Docker has become one of the most popular tools for
developing and deploying software, and it has been adopted by pretty much everyone in the DevOps
community!
How Does Docker Work?
Docker works by packaging an application and its dependencies in a virtual container that can run on
any computer. This containerization allows for much better portability and efficiency when compared
to virtual machines.
These containers are isolated from each other and bundle their own tools, libraries, and configuration
files. They can communicate with each other through well-defined channels. All containers are run by
a single operating system kernel, and therefore use few resources.
As mentioned, OS virtualization has been around for a while in the form of Linux
Containers (LXC), Solaris Zones, and FreeBSD jail. However, Docker took this concept further by
providing an easy-to-use platform that automated the deployment of applications in containers.
Here are some of the benefits of Docker containers over traditional virtual machines:
They're portable and can run on any computer that has a Docker runtime environment.
They're isolated from each other and can run different versions of the same software without
affecting each other.
They're extremely lightweight, so they can start up faster and use fewer resources.
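As a quick, hedged illustration of this lightweight isolation (the image and container names below are arbitrary), two containers can be started from the same small image in moments and listed side by side:

docker pull alpine                           # fetch a small base image
docker run -d --name job1 alpine sleep 300   # first isolated container
docker run -d --name job2 alpine sleep 300   # second container, same image, same host kernel
docker ps                                    # both appear, each with its own filesystem and process space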
Docker Components and Tools
Docker consists of three major components:
the Docker Engine, a runtime environment for containers
the Docker command line client, used to interact with the Docker Engine
the Docker Hub, a cloud service that provides registry and repository services for Docker
images
In addition to these core components, there are also a number of other tools that work with Docker,
including:
Swarm, a clustering and scheduling tool for dockerized applications
Docker Desktop, successor of Docker Machine, and the fastest way to containerize
applications
Docker Compose, a tool for defining and running multi-container Docker applications
Docker Registry, an on-premises registry service for storing and managing Docker images
Kubernetes, a container orchestration tool that can be used with Docker
Rancher, a container management platform for delivering Kubernetes-as-a-Service
There are even a number of services supporting the Docker ecosystem:
Amazon Elastic Container Service (Amazon ECS), a managed container orchestration service
from Amazon Web Services
Azure Kubernetes Service (AKS), a managed container orchestration service from Microsoft
Azure
Google Kubernetes Engine (GKE), a fully managed Kubernetes engine that runs in Google
Cloud Platform
Portainer, for deploying, configuring, troubleshooting and securing containers in minutes on
Kubernetes, Docker, Swarm and Nomad in any cloud, data center or device
Understanding Docker Containers
A container is a runnable instance created from an image; you can create, start, stop, move or delete a
container using the Docker API or CLI. A container shares the kernel with other containers and its host
machine, which makes it much more lightweight than a virtual machine.
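A minimal sketch of that lifecycle from the CLI (the image and container names are examples only):

docker create --name demo nginx    # create a container from an image
docker start demo                  # start it
docker stop demo                   # stop it
docker rename demo demo-old        # rename ("move") it
docker rm demo-old                 # delete it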
Automated container creation: Docker can automatically build a container based on
application source code.
Container versioning: Docker can track versions of a container image, roll back to previous
versions, and trace who built a version and how. It can even upload only the deltas between an
existing version and a new one.
Container reuse: Existing containers can be used as base images—essentially like templates
for building new containers.
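A hedged sketch of what such an automated build might look like (the base image, file names and tags are assumptions, not part of the original text): an existing image is reused as the base, the application source is copied in, and a new versioned image is built from it.

# Dockerfile (illustrative)
FROM python:3.11-slim               # existing image reused as the base
WORKDIR /app
COPY . .                            # application source code
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

docker build -t myapp:1.0 .         # build a versioned image from the source
docker build -t myapp:1.1 .         # later builds reuse any unchanged layers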
Docker Hub
Docker Hub is the public repository of Docker images that calls itself the "world's largest library and
community for container images." It holds over 100,000 container images sourced from commercial
software vendors, open-source projects, and individual developers. It includes images produced by
Docker, Inc., certified images belonging to the Docker Trusted Registry, and many thousands of other
images.
All Docker Hub users can share their images at will. They can also download predefined base images
to use as a starting point for any containerization project.
Other image repositories exist, as well, notably GitHub. GitHub is a repository hosting service, well
known for application development tools and as a platform that fosters collaboration and
communication. Users of Docker Hub can create a repository (repo) which can hold many images. The
repository can be public or private, and can be linked to GitHub or BitBucket accounts.
Docker Desktop
Docker Desktop is an application for Mac or Windows that includes Docker Engine, the Docker CLI
client, Docker Compose, Kubernetes, and other tools. It also includes access to Docker Hub.
Docker daemon
Docker daemon is a service that creates and manages Docker images, using the commands from the
client. Essentially Docker daemon serves as the control center of your Docker implementation. The
server on which Docker daemon runs is called the Docker host.
Docker registry
A Docker registry is a scalable open-source storage and distribution system for Docker images. The
registry enables you to track image versions in repositories, using tags for identification.
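For example (the registry host and repository names below are placeholders), tagging and pushing an image version to a registry looks like this:

docker tag myapp:1.0 registry.example.com/team/myapp:1.0   # tag the image for the registry
docker push registry.example.com/team/myapp:1.0            # publish that version
docker pull registry.example.com/team/myapp:1.0            # retrieve it elsewhere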
Docker deployment and orchestration
When running just a few containers, it's fairly simple to manage an application within Docker
Engine, the industry de facto runtime. But for deployments comprising thousands of containers and
hundreds of services, it‘s nearly impossible to manage the workflow without the help of some
purpose-built tools.
Docker plugins
Docker plugins (link resides outside ibm.com) can be used to make Docker even more functional.A
number of Docker plugins are included in the Docker Engine plugin system, and third-party plugins
can be loaded as well.
Docker Compose
Developers can use Docker Compose to manage multi-container applications, where all containers run
on the same Docker host. Docker Compose uses a YAML (.yml) file that specifies which services are
included in the application, and it can deploy and run the containers with a single command. Because
YAML syntax is language-agnostic, the same Compose file can describe applications written in Java,
Python, Ruby and many other languages.
Developers can also use Docker Compose to define persistent volumes for storage, specify base nodes,
and document and configure service dependencies.
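A minimal sketch of such a file (the service names, images and volume are illustrative only):

# docker-compose.yml
version: "3.8"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:

Running docker compose up -d then starts both services with a single command.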
Kubernetes
Monitoring and managing container lifecycles in more complex environments requires a container
orchestration tool. While Docker includes its own orchestration tool (called Docker Swarm), most
developers choose Kubernetes instead.
Kubernetes is an open-source container orchestration platform descended from a project developed for
internal use at Google. Kubernetes schedules and automates tasks integral to the management of
container-based architectures, including container deployment, updates, service discovery, storage
provisioning, load balancing, health monitoring, and more. In addition, the open source ecosystem of
tools for Kubernetes—which includes Istio and Knative—enables organizations to deploy a high-
productivity platform-as-a-service (PaaS) for containerized applications and a faster on-ramp
to serverless computing.
How to Run a Container?
Docker containers are portable and can be run on any host with a Docker engine installed (see How to
Install Docker on Windows 10 Home).
To run a container, you first pull the image from a registry. Then, you can create and start the container.
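For example (nginx is used here purely as a sample image):

docker pull nginx:latest                          # pull the image from a registry
docker run -d --name web -p 8080:80 nginx:latest  # create and start a container from it
docker ps                                         # verify that it is running
docker logs web                                   # inspect its output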
Use case of Docker,
Docker allows you to instantly create and manage containers with ease, which facilitates faster
deployments. The ability to deploy and scale infrastructure using a simple YAML config file makes it
easy to use all while offering a faster time to market. Security is prioritized with each isolated
container.
Docker provides lightweight virtualization with almost zero overhead. You benefit from an extra layer
of abstraction without having to worry about its cost, and many more containers can be run on a single
machine than with hypervisor-based virtualization alone. Containers can be started and stopped within
milliseconds.
In summary, Docker's functionality falls into several categories:
Portable deployment of applications
Support for automatic building of docker images
Built-in version tracking
Registry for sharing images
A growing tools ecosystem from the docker API
Consistency among different environments
Efficient utilisation of resources
The feature that really sets Docker apart is the layered file system and the ability to apply version
control to entire containers. Being able to track, revert and view changes is a highly desirable and
widely used capability in software development, and Docker extends that same idea to a higher
construct: the entire application, with all its dependencies, in a single environment.
DevOps adoption: Docker standardizes the configuration and setup interface and simplifies the
DevOps process. Most of the collaboration between DevOps and Docker happens in CI/CD,
production and testing.
Recovering files: When hardware fails, you normally have to prepare rollback steps in advance.
Docker can easily revert to the last image version and replicate the files to new hardware.
Consolidating servers: Docker can consolidate multiple servers just as a virtual machine consolidates
multiple applications, and it can provide denser consolidation and share unused memory across
instances.
Debugging: Besides container orchestration, Docker is also used to fix apps. Docker has a debug mode
and extensions that give you an overview of where the problem is occurring.
Multi-tenancy: In a multi-tenant architecture, a single instance of an app serves multiple tenants, which
makes managing development operations challenging. Docker creates an isolated environment and
gives developers a chance to run multiple instances of application tiers for each tenant.
On the other hand, Microservices break down the app into multiple independent and
modular services which each possess their own database schema and communicate
with each other via APIs. The microservices architecture suits the DevOps-enabled
infrastructures as it facilitates continuous delivery. By leveraging Docker, organizations
can easily incorporate DevOps best practices into the infrastructure allowing them to
stay ahead of the competition. Moreover, Docker allows developers to easily share
software along with its dependencies with operations teams and ensure that it runs the
same way on both ends. For instance, administrators can use the Docker images
created by the developers using Dockerfiles to stage and update production
environments. As such, the complexity of building and configuring CI/CD pipelines is
reduced allowing for a higher level of control over all changes made to the
infrastructure. Load balancing configuration becomes easier too.
Developers can connect their development environment to CI/CD tooling, create automatic builds from
the source code and move them into the Docker repo. A connected workflow between developers and
CI/CD tools also means faster releases.
Docker comes with a cloud-managed container registry eliminating the need to manage
your own registry, which can get expensive when you scale the underlying
infrastructure. Moreover, the complexity in configuration becomes a thing of the past.
Implementing role-based access allows people across various teams to securely access
Docker images. Also, Slack integration allows teams to seamlessly collaborate and
coordinate throughout the product life cycle.
Docker brings IaC into the development phase of the CI/CD pipeline as developers can
use Docker-compose to build composite apps using multiple services and ensure that it
works consistently across the pipeline. IaC is a typical example of a Docker use case.
For instance, if you use two different versions of a library for two different programs,
you need to install two versions. In addition, custom environment variables should be
specified before you execute these programs. Now, what if you make certain last minute
changes to dependencies in the development phase and forget to make those changes
in the production?
Docker packages all the required resources into a container and ensures that there are
no conflicts between dependencies. Moreover, you can monitor untracked elements
that break your environment. Docker standardizes the environment ensuring that
containers work similarly throughout the CI/CD pipeline.
Telecom industries are leveraging the 5G technology and Docker’s support for software-
defined network technology to build loosely coupled architectures. The new 5G
technology supports network function virtualization allowing telecoms to virtualize
network appliance hardware. As such, they can divide and develop each network
function into a service and package it into a container. These containers can be installed
on commodity hardware which allows telecoms to eliminate the need for expensive
hardware infrastructure thus significantly reducing costs. The fairly recent entrance of
public cloud providers into the telecom market has shrunk the profits of telecom
operators and ISVs. They can now use Docker to build cost-effective public clouds with
the existing infrastructure, thereby turning docker use cases into new revenue streams.
While the tenant data is separated, all of these approaches use the same application
server for all tenants. That said, Docker allows for complete isolation wherein each
tenant app code runs inside its own container for each tenant.
To do this, organizations can simply convert the app code into a Docker image to run
containers and use docker-compose.yaml to define the configuration for multi-container
and multi-tenant apps, thus enabling them to run containers for each tenant. A separate
Postgres database and a separate app server will be used for each tenant running inside
the container. Each tenant will need 2 database servers and 2 app servers. You can
route your requests to the right tenant container by adding an NGINX server container.
To solve this issue, use the --cache-from option to instruct docker build to reuse the cache from an
existing local image. If you don't have an existing local Docker image, you can simply pull one just
before executing the docker build command. It's important to note that this method only uses the cache
of the image you reference; to reuse caching for earlier build stages as well, you should push and pull an
image for each stage.
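A hedged sketch of that workflow (the image name and registry are placeholders):

docker pull registry.example.com/myapp:latest || true   # fetch the previous image, if any
docker build --cache-from registry.example.com/myapp:latest -t registry.example.com/myapp:latest .
docker push registry.example.com/myapp:latest           # publish it so the next build can reuse the cache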
Docker also offers an SSH server for automation and debugging for each isolated
container. Seeing as each service/daemon is isolated, it’s easy to monitor applications
and resources running inside the isolated container and quickly identify errors. This
allows you to run an immutable infrastructure, thereby minimizing any downtimes
resulting from infrastructure changes.
The foregoing is especially useful when developers want to test an application in various
operating systems and analyze the results. Any discrepancies in code will only affect a
single container and therefore won’t crash the entire operating system.
With virtual machines, you need to copy the entire guest operating system. Thankfully,
this is not the case with Docker. Docker allows you to provision fewer resources
enabling you to run more apps and facilitating efficient optimization of resources. For
example, developer teams can consolidate resources onto a single server thus reducing
storage costs. Furthermore, Docker comes with high scalability allowing you to provision
required resources for a precise moment and automatically scale the infrastructure on-
demand. You only pay for the resources you actually use. Moreover, apps running inside
Docker deliver the same level of performance across the CI/CD pipeline, from
development to testing, staging and production. As such, bugs and errors are
minimized. This environment parity enables organizations to manage the infrastructure
with minimal staff and technical resources therefore saving considerably on
maintenance costs. Basically, Docker enhances productivity which means you don’t need
to hire as many developers as you would in a traditional software development
environment. Docker also comes with the highest level of security and, most
importantly, it’s open-source and free.
Docker Use Cases 12: Security Practices
Docker containers are secure by default. When you create a container using Docker, it
will automatically create a set of namespaces and isolate the container. Therefore, a
container cannot access or affect processes running inside another container. Similarly,
each container gets its own network stack which means it cannot gain privileged access
to the network ports, sockets and interfaces of other containers unless certain
permissions are granted. In addition to resource accounting and limiting, control groups
handle the provisioning of memory, compute and disk I/O resources. Distributed-Denial-
of-Service (DDoS) attacks are thus successfully mitigated seeing as a resource-exhausted
container cannot crash the system.
When a container launches, the Docker daemon activates a set of restriction capabilities,
augmenting the binary root with fine-grained access controls. This provides higher
security seeing as a lot of processes that run as root don’t need real root privileges.
Therefore, they can operate with lesser privileges. Another important feature is the
running signed images using the Docker Content Trust Signature Verification feature
defined in the dockerd config file. If you want to add an extra layer of security and
harden the Docker containers, SELinux, Apparmor and GRSEC are notable tools that can
help you do so.
Docker containers can be easily and instantly created or destroyed. When a container
fails, it is automatically replaced by another one seeing as containers are built using the
Docker images and based on dockerfile configurations. Before moving an image to
another environment, you can commit data to existing platforms. You can also restore
data in case of a disaster.
All of this being said, it’s important to understand that the underlying hosts may be
connected to other components. Therefore, your disaster recovery plan should involve
spinning up a replacement host as well. In addition, you should consider things like
stateful servers, network and VPN configurations, etc.
Docker Use Cases 14: Easy Infrastructure Scaling
Docker augments the microservices architecture wherein applications are broken down
into independent services and packaged into containers. Organizations are taking
advantage of microservices and cloud architectures and building distributed
applications. Docker enables you to instantly spin up identical containers for an
application and horizontally scale the infrastructure. As the number of containers
increases, you’ll need to use a container orchestration tool such as Kubernetes or
Docker Swarm. These tools come with smart scaling abilities that allow them to
automatically scale up the infrastructure on-demand. They also help you optimize costs
seeing as they remove the need to run unnecessary containers. It’s important to fine-
grain components in order to make orchestration easier. In addition, stateless and
disposable components will enable you to monitor and manage the lifecycle of the
container with ease.
When trying to reproduce an environment, there are OS, language and package
dependencies that should be taken care of. If you work with Python language, you’ll
need dependency management tools such as virtualenv, venv and pyenv. If the new
environment doesn’t have a tool like git, you’ll need to create a script to install git CLI.
The script keeps changing for different OS and OS versions, therefore every team
member should be aware of these tools, which isn’t always easy.
Be it OS, language or CLI tool dependencies, Docker is the best tool for dependency
management. By simply defining the configuration in the dockerfile along with its
dependencies, you can seamlessly move an app to another machine or environment
without the need to remember the dependencies, worry about package conflicts or keep
track of user preferences and local machine configurations.
Paypal is a leading US-based financial technology company which offers online payment
services across the globe. The company processes around 200 payments per second
across three different systems; Paypal, Venmo and Braintree. As such, moving services
between different clouds and architectures used to delay deployment and maintenance
tasks. Paypal therefore implemented Docker and standardized its apps and operations
across the infrastructure. To this day, the company has migrated 700 apps to Docker
and works with 4000 software employees managing 200,000 containers and 8+ billion
transactions per year while achieving a 50% increase in productivity.
Adobe also uses Docker for containerization tasks. For instance, ColdFusion is an Adobe
web programming language and application server that facilitates communication
between web apps and backend systems. Adobe uses Docker to containerize and
deploy ColdFusion services. It uses Docker Hub and Amazon Elastic Container Registry
to host Docker images. Users can therefore pull these images to the local machine and
run Docker commands.
GE is one of the few companies that was bold enough to embrace the technology at its
embryonic stage and has become a leader over the years. As such, the company
operates multiple legacy apps which delay the deployment cycle. GE turned to Docker
and has since managed to considerably reduce development to deployment time.
Moreover, it is now able to achieve higher application density than VMs, which reduces
operational costs.
What’s Next After Docker?
Once you understand how Docker is impacting different business aspects, the next thing
you want to grasp is how to fully leverage Docker technology. As an organization's operations evolve,
the need for thousands of containers arises. Thankfully, Docker (in swarm mode) is highly scalable, and
you can easily scale services up and down while defining the number of replicas needed using the
docker service scale command.
$ docker service scale frontend=50
You can also scale multiple services at once using the docker service scale command.
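For example (the service names are illustrative and assume swarm mode is active):

$ docker service scale frontend=50 backend=30 worker=10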
Container Management Systems
As business evolves, organizations need to scale operations on-demand. Furthermore,
as container clusters increase, it becomes challenging to orchestrate them. Container
management systems help you manage container tasks right from creation and
deployment all the way to scaling and destruction, allowing you to use automation
wherever applicable. Basically, they simplify container management. In addition to
creating and removing containers, these systems manage other container-related tasks
such as orchestration, security, scheduling, monitoring, storage, log management, load
balancing and network management. According to Datadog, organizations that use
container management systems host 11.5 containers per host on average compared to
6.5 containers per host when managed by non-orchestrated environments.
Popular Container Management Tools
Here are some of the most popular container managers for your business.
Kubernetes: Kubernetes is the most popular container orchestration tool
developed by Google. It wasn’t long before Kubernetes became a de facto
standard for container management and orchestration. Google moved the tool to
Cloud Native Computing Foundation (CNCF), which means the tool is now
supported by industry giants such as IBM, Microsoft, Google and Red Hat. It
enables you to quickly package, test, deploy and manage large clusters of
containers with ease. It’s also open-source, cost-effective and cloud-agnostic.
Amazon EKS: As Kubernetes became a standard for container management cloud
providers started to incorporate it into their platform offerings. Amazon Elastic
Kubernetes Service (EKS) is a managed Kubernetes service for managing
Kubernetes on AWS. With EKS organizations don’t need to install and configure
Kubernetes work nodes or planes seeing as it handles that for you. In a nutshell,
EKS acts as a container service and manages container orchestration for you.
However, EKS only works with AWS cloud.
Amazon ECS: Amazon Elastic Container Service (ECS) is a fully managed container
orchestration tool for AWS environments which helps organizations manage
microservices and batch jobs with ease. ECS looks similar to EKS but differs seeing
as it manages container clusters, unlike EKS which only performs Kubernetes
tasks. ECS is free while EKS charges $0.1 per hour. That said, seeing as it’s open-
source, EKS provides you with more support from the community. ECS, on the
other hand, is more of a proprietary tool. ECS is mostly useful for people who
don’t have extensive DevOps resources or who find Kubernetes to be complex.
Platforms for Docker,
The Docker platform runs natively on Linux (on x86-64, ARM and many
other CPU architectures) and on Windows (x86-64). Docker Inc. builds
products that let you build and run containers on Linux, Windows and
macOS.
Well, look no further! We’ve got the top 10 Docker hosting platforms in 2023 all
lined up for you. Whether you’re looking for advanced features or an easy-to-use
solution, these picks offer guaranteed performance, scalability, and reliability.
So save yourself the hassle of researching each option in detail – this curated
selection has something to suit everyone’s needs. Get ready to welcome optimized
containerization into your infrastructure!
1. Back4app Containers
2. Heroku
3. Google Cloud Run
4. Kamatera
5. Amazon ECS
6. AppFleet
7. A2 Hosting
8. Digital Ocean
9. Linode
10. Conversio
Back4app Containers
Back4app Containers is an innovative cloud-based hosting platform perfect for managing your
Docker containers. With advanced features like automated deployment, self-healing functionality, and
custom scaling options, this platform offers robust scalability and reliability for any size of the project.
What makes Back4app Containers stand out is its ease of use. All it takes is a few clicks to get your
application up and running – no need to worry about complex configurations or software updates. Plus,
the intuitive dashboard makes it simple to check on stats at any time. And with failover redundancy
built in, there‘s no need to monitor your containers 24/7 – Back4app Containers will take care of that
for you.
If you‘re looking for a hassle-free way to manage all your Docker containers, look no further than
Back4app Containers. It‘s the ideal choice for businesses of any size who want reliable performance
from their applications without any coding or IT experience required.
By taking care of the difficult work for you, Back4app Containers lets you focus on what matters most
– running your business the way it should be run! Please read the article Deploying a Docker
Application for a detailed tutorial on this subject.
Heroku
Heroku is a PaaS, or cloud-based platform as a service that enables developers to build, deploy, scale,
and manage applications quickly. Heroku allows developers to focus on coding while its platform
automates the deployment of code and scales applications according to the user‘s needs.
The core features of Heroku include automated application scaling, one-click deployments, and easy
integration with third-party services such as databases and log management. Developers can use their
existing programming language, including Ruby, Java, Node.js, and Python. Heroku also provides
developers with access to an ever-growing range of add-ons for added functionality.
Heroku is highly recommended for managing Docker containers due to its ease of use and scalability
abilities. It eliminates the need for complex configuration and makes it easier to deploy applications
quickly without worrying about setting up environment variables every time you want to update your
application.
Overall, Heroku is the perfect choice for developers who are looking for an efficient way to manage
Docker containers while having the ability to quickly scale their apps when needed.
Google Cloud Run
Google Cloud Run is a serverless computing platform by Google that helps users manage and deploy
their Docker containers on the cloud. It provides an efficient way to run stateless containers that are
invocable via HTTP requests, allowing you to quickly build applications in your favorite language and
deploy them in seconds. With Cloud Run, you can focus on creating code without worrying about
managing the underlying infrastructure.
Cloud Run‘s core features include automatic scaling, which allows your application to scale up or down
based on demand, secure execution of containers with built-in authentication and authorization, and
high availability with no downtime during deployments. Additionally, Cloud Run supports multiple
languages such as Java, Node.js, Go, Python, .NET Core, and Ruby.
Overall, Google Cloud Run is a great choice for managing your Docker containers due to its ease of use
and scalability. It simplifies the process of deploying and managing applications by providing an
efficient way to run stateless containers with minimal effort.
Kamatera
Kamatera is an innovative cloud provider that specializes in managing Docker containers. It provides
an easy-to-use platform for businesses to manage their Docker services, allowing them to take
advantage of scalability and flexibility on demand.
Kamatera offers a wide range of features tailored for managing Docker containers, including port
assignment and mapping, container life cycle management, resource scheduling, and usage tracking.
Additionally, it also provides deep customization through configurable environments, such as using
different operating systems or customizing the memory allocation of the virtual machines within each
container.
For businesses that demand scalability without sacrificing control over their environment, Kamatera‘s
platform offers comprehensive control and real-time metrics with support for multiple clouds. This
gives companies the ability to manage complex architectures without requiring specialized personnel.
In addition to providing an intuitive platform for managing Docker containers, Kamatera also takes
security seriously by offering both physical and environmental protection against unauthorized access
or data breaches. This includes hourly scans of entire environments as well as two-factor authentication
for users connecting from remote locations. All of this helps keep businesses safe from unwelcome
intrusions or attacks while ensuring data is kept secure and accessible when needed.
Amazon ECS
Amazon Elastic Container Service (ECS) is an Amazon-managed container orchestration service that
provides a secure, efficient, and scalable way to run Docker containers. With ECS, customers can
easily configure their desired number of containers running on their clusters without requiring any
additional infrastructure or computing resources.
At the core of Amazon ECS are several benefits that make it a good choice for managing your Docker
containers:
Easy Deployment: Amazon ECS streamlines the process of deploying and managing
applications in production. It automates the steps involved in launching and scaling
containerized applications.
Scalability and Performance: ECS enables users to increase or decrease the number of
available resources depending on their workloads at any given time, ensuring that their
applications always remain up and running efficiently.
Security and Reliability: Amazon ECS uses its own security features designed to ensure that
customer data is stored securely while still allowing access to control the containerized
application‘s environment.
Cost Efficiency: Amazon ECS is highly cost-effective as compared to other similar services
due to its low operating costs, which include storage, computing power, and networking.
AppFleet
AppFleet is an intelligent platform for managing your Docker containers. It provides a powerful set of
features to simplify the process of deploying and maintaining applications in production environments.
With AppFleet, you can easily manage, deploy and scale your applications without worrying about
server maintenance or other external factors.
AppFleet offers advanced monitoring tools that make it easy to track application performance and
metrics over time. It also makes it easy to keep track of costs associated with running your applications.
The platform supports rolling updates, which lets the user make changes to the application code without
taking it offline. This helps reduce downtime and ensure a smoother transition when making changes to
production applications.
In addition, AppFleet‘s orchestration capabilities enable users to automatically scale containers based
on resource needs and allocate resources across multiple nodes when needed. Its cloud automation
capabilities allow users to quickly spin up resources in the cloud, saving them time and money in
managing containerized workloads. Finally, AppFleet‘s intuitive Web-based dashboard makes it simple
for users to review their container environment as well as troubleshoot any issues they may be having.
A2 Hosting
A2 Hosting offers quick and simple Docker hosting services to manage your containers. They provide
robust features for full scalability, including unlimited storage and bandwidth, free SSL certificates,
custom domains, and more. Their cloud hosting platform is fast and secure, providing instant scalability
with no setup costs or hardware investments.
A2 Hosting‘s core features make it an ideal choice for managing your Docker containers. All of their
plans come with a cPanel control panel to help you easily manage your containers and configurations.
They also offer excellent reliability and uptime with 24/7 support, and plenty of storage options
including SSDs, unlimited databases, and FTP accounts. Additionally, they use LXC virtualization
technology to ensure that each container runs smoothly within its own environment.
Digital Ocean
Digital Ocean is a cloud computing platform for developers and businesses. It provides powerful,
reliable infrastructure and services that make it easy to manage workloads and applications.
Digital Ocean offers a wide array of core features, such as on-demand virtual servers and block storage.
Its intuitive command line interface allows users to spin up new instances quickly in just 55 seconds. It
also features high availability, scalability, custom networking options, and comprehensive monitoring
capabilities.
Because of its simplicity and flexibility, Digital Ocean makes it easy to deploy applications by utilizing
Docker containers. With containerization technology, you can easily build, ship, and run your
applications without the need for a traditional server setup. This helps you save time and money when it
comes to managing multiple projects or development environments.
Linode
Linode is an innovative cloud-hosting service that has quickly become one of the leading choices for
managing your Docker containers. It offers reliable, secure, and scalable plans so that you can
customize your hosting experience according to your specific needs.
Linode‘s core features include an intuitive control panel for easy management, 24/7 monitoring and
support, DDoS protection, and fast SSD-based storage on the Akamai Connected Cloud. All these
features combine to make Linode the perfect choice for managing your Docker containers.
With its reliable performance, Linode provides users with complete control over their web hosting
choices, such as operating system installation, server configuration, and partitioning. The intuitive
interface also allows users to quickly set up custom applications on the cloud platform, such as web
servers or databases.
Conversio
Conversio is an intuitive, powerful platform that makes managing Docker containers incredibly easy
and convenient. With Conversio, you can manage your Docker containers with ease, thanks to its
robust feature set that allows you to do virtually anything.
At the core of Conversio are features like container scheduling and orchestration, health checks for
running containers and deployments, resource utilization and monitoring, as well as auto-scaling
capabilities. It even allows users to customize their container environment by setting up custom
configurations and templating resources.
What sets Conversio apart from other solutions on the market is its ease of use. The UI is user-friendly
and provides a streamlined way to keep track of all your applications in one place. Plus, it‘s designed to
work with popular automation tools such as Jenkins and Kubernetes for optimized performance across
multiple cloud providers. This makes it ideal for businesses looking for a comprehensive solution to
manage their Docker containers effectively.
Conclusion
Docker containers are an excellent way to efficiently manage applications and workloads, but it is
important to carefully consider the features each cloud provider offers. The key lies in finding the right
combination of features that fits your needs while delivering cost-effective performance. Each of these
providers offers a unique set of advantages, so be sure to compare all your options before making a
decision.
Dockers vs. Virtualization
What is Docker?
Docker is a popular containerization platform that helps its users develop, deploy, monitor, and run
applications in Docker containers with all their dependencies.
Docker containers include all dependencies (frameworks, libraries, etc.) needed to run an application in
an efficient and bug-free manner.
Docker Containers have the following benefits:
Light-weight
Applications run in isolation
Occupies less space
Easily portable and highly secure
Short boot-up time
Now, let's have a look at the primary differences between Docker and virtual machines.
Operating system
Docker − Docker follows a container-based model in which containers are software packages used for
executing an application on any operating system. Containers share the host OS kernel, so multiple
workloads can run on a single OS.
Virtual machine − It is not a container-based model; each VM uses its own user space along with the
kernel space of an OS and does not share the host kernel. Each workload needs a complete OS or
hypervisor.
Performance
Docker − Containers deliver high performance because they use the same operating system with no
additional software (such as a hypervisor), and they start up quickly with little boot-up time.
Virtual machine − Since a VM uses a separate OS, it consumes more resources, and VMs don't start
quickly, which leads to poorer performance.
Portability
Docker − Users can create an application, store it in a container image and then run it across any host
environment. Container images are smaller than VMs, which makes transferring files onto the host's
filesystem easier.
Virtual machine − VMs have known portability issues; they have no central hub and require more
memory space to store data. While transferring files, a VM must carry a copy of the OS and its
dependencies, which increases the image size and makes sharing data a tedious process.
Speed
Docker − The application in a Docker container starts with no delay since the OS is already up and
running; containers were basically designed to save time in the deployment process of an application.
Virtual machine − It takes much longer for a VM to run applications; to deploy a single application, a
virtual machine needs to start the entire OS, which causes a full boot process.
Key Difference: Docker and Virtual Machine
There are many analogies for Docker and virtual machines. Docker containers and virtual machines
differ in many ways; one analogy compares an apartment with a bungalow:
Apartment (container) − Most amenities (binaries and libraries) are shared with the neighbours (other
applications), and the building can have multiple tenants (applications).
Bungalow (virtual machine) − Amenities (binaries and libraries) cannot be shared with neighbours
(other applications), and it cannot have multiple tenants (applications).
For a more in-depth understanding, here are the key differences between the two:
Docker − Containers stop working when the "stop" command is executed; Docker keeps many
snapshots as it builds images in layers; images can be version-controlled and shared through a registry
such as Docker Hub; many containers can run on a single system, and multiple containers can be
started at a time on the Docker engine.
Virtual machine − Virtual machines are always in the running state; they do not keep many snapshots;
VM images are not version-controlled and there is no central hub; only a limited number of VMs can
run on a system, and only a single VM can be started at a time on a VMX.
Next, let's have a look at a real-life use case of Docker using the BBC news channel.
Architecture: Docker Architecture, Understanding the Docker
Components
The Docker daemon runs on the host operating system. It is responsible for running containers and
managing Docker services. The Docker daemon can communicate with other daemons. It manages various
Docker objects such as images, containers, networks, and storage.
Docker follows a client-server architecture, which includes three main components:
the Docker Client, the Docker Host, and the Docker Registry.
The Docker client uses commands and REST APIs to communicate with the Docker daemon
(server). When a client runs any Docker command on the Docker client terminal, the client
terminal sends these commands to the Docker daemon. The Docker daemon receives
these commands from the Docker client in the form of commands and REST API requests.
Docker Client uses Command Line Interface (CLI) to run the following commands -
docker build
docker pull
docker run
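For illustration, a typical sequence sent from the client to the daemon might look like the following hedged sketch (the image name myapp and its tag are placeholders, not part of the original text):
# Build an image from a Dockerfile in the current directory
docker build -t myapp:1.0 .
# Pull a base image from a registry
docker pull ubuntu:22.04
# Run a container from the newly built image
docker run -d --name myapp-container myapp:1.0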
Docker Host is used to provide an environment to execute and run applications. It contains
the docker daemon, images, containers, networks, and storage.
Docker Objects
There are the following Docker Objects -
Docker images are read-only binary templates used to create Docker containers. An organization
can use a private container registry to share container images within the enterprise, or use a
public container registry to share container images with the whole world. Metadata is also
used by Docker images to describe the container's abilities.
Containers are the structural units of Docker, which are used to hold the entire package that is
needed to run the application. The advantage of containers is that they require very few
resources.
In other words, we can say that the image is a template, and the container is a copy of that
template.
Using Docker networking, isolated containers can communicate with one another. Docker contains the
following network drivers -
o Bridge - Bridge is the default network driver for a container. It is used when multiple
containers communicate on the same Docker host.
o Host - It is used when we don't need network isolation between the container and
the host.
o None - It disables all the networking.
o Overlay - Overlay allows Swarm services to communicate with each other. It enables
containers to run on different Docker hosts.
o Macvlan - Macvlan is used when we want to assign MAC addresses to the containers.
Docker Storage is used to store data on the container. Docker offers the following options for
the Storage -
o Data Volume - Data Volume provides the ability to create persistent storage. It also
allows us to name volumes, list volumes, and see which containers are associated with the volumes.
o Directory Mounts - It is one of the best options for docker storage. It mounts a host's
directory into a container.
o Storage Plugins - It provides an ability to connect to external storage platforms.
Docker Architecture
Below is the simple diagram of a Docker architecture.
The persisting data generated by Docker and used by Docker containers is stored in
volumes. They are completely managed by Docker through the Docker CLI or Docker API.
Volumes work on both Windows and Linux containers. Rather than persisting data in a
container's writable layer, it is always a good option to use volumes for it. A volume's content
exists outside the lifecycle of a container, so using a volume does not increase the size of a
container.
You can use the -v or --mount flag to start a container with a volume. In this sample command,
you are using the geekvolume volume with the geekflare container.
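The sample command itself is not reproduced in this copy; a hedged reconstruction using the volume and container names mentioned above (the nginx image is an assumption chosen for illustration) would be:
# -v form: create/attach the geekvolume volume at /app inside the container
docker run -d --name geekflare -v geekvolume:/app nginx
# Equivalent --mount form
docker run -d --name geekflare --mount source=geekvolume,target=/app nginx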
Docker networking is a passage through which all the isolated containers communicate. There
are mainly five network drivers in Docker:
1. Bridge: It is the default network driver for a container. You use this network when your
application is running in standalone containers, i.e., multiple containers
communicating with the same Docker host.
2. Host: This driver removes the network isolation between Docker containers and the
Docker host. It is used when you don't need any network isolation between host and
container.
3. Overlay: This network enables swarm services to communicate with each other. It is
used when the containers are running on different Docker hosts or when swarm
services are formed by multiple applications.
4. None: This driver disables all the networking.
5. macvlan: This driver assigns mac address to containers to make them look like
physical devices. The traffic is routed between containers through their mac addresses.
This network is used when you want the containers to look like a physical device, for
example, while migrating a VM setup.
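As a brief, hedged illustration of the bridge case described above (the network and container names here are placeholders):
# Create a user-defined bridge network and attach a container to it
docker network create my-bridge
docker run -d --name web --network my-bridge nginx
# List the available networks and the drivers they use
docker network ls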
Usage of Docker
o Fast application delivery
Docker streamlines the development lifecycle by permitting developers to work in
standardized environments with local containers, which provide our services and applications.
Containers are ideal for continuous delivery and continuous integration workflows.
o Responsive scaling and deployment
Docker's container-based platform allows for highly portable workloads. The
containers can run on a developer's local laptop, on cloud providers, on virtual or physical machines in
the data center, or in a combination of these environments.
The lightweight nature and portability of Docker also make it easier to dynamically maintain
workloads and scale up and tear down services and applications as business requirements
dictate.
o Running multiple workloads on the same hardware
Docker is fast and lightweight. It offers a cost-effective and viable replacement for
hypervisor-based virtual machines, so we can use more of our server capacity to achieve our
business objectives. Docker is great for high-density platforms and for medium and small
deployments where we need to do more with fewer resources.
See the attached screenshot below.
4. Add the new GPG key. The following command downloads the key.
Command:
$ sudo apt-key adv \
  --keyserver hkp://ha.pool.sks-keyservers.net:80 \
  --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
The screenshot is given below.
5. Run the following command; it will substitute the entry for your operating system into the
file.
$ echo "<REPO>" | sudo tee /etc/apt/sources.list.d/docker.list
See the attached screenshot below.
6. Open the file /etc/apt/sources.list.d/docker.list and paste the following line into the file.
deb https://apt.dockerproject.org/repo ubuntu-xenial main
Install the latest Docker version.
1. Update your apt package index.
$ sudo apt-get update
See the attached screenshot below.
2. Install docker-engine.
$ sudo apt-get install docker-engine
See the attached screenshot below.
The command shown in the screenshot downloads a test image and runs it in a container. When the container
runs, it prints a message and exits.
Step 2: Once the DockerToolbox.exe file is downloaded, double-click it. When the setup window appears, click Next.
Step 3: Browse to the location where you want to install Docker Toolbox and click Next.
Step 4: Select the components according to your requirements and click Next.
Step 7: Once the installation is complete, the completion wizard appears; click Finish.
Step 8: After the successful installation, three icons will appear on the screen:
Docker Quickstart Terminal, Kitematic (Alpha), and Oracle VM VirtualBox. Double-click
the Docker Quickstart Terminal.
To verify that Docker is successfully installed, check the Docker version by typing the
following command and pressing the Enter key:
docker --version
$ docker kill my_container
9. docker commit
This command is used to create a new image from a container's current state.
Syntax: docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
The docker commit command allows users to take an existing running container
and save its current state as an image.
There are certain steps to be followed before running the command:
First, pull the image from Docker Hub.
Deploy a container using the image ID from the first step.
Modify the container (make any changes, if needed).
Commit the changes.
Example:
$ docker commit c3f279d17e0a dev/testimage:version3
10. docker push
This command is used to push an image or repository to a registry.
Syntax: docker push [OPTIONS] NAME[:TAG]
Use docker image push to share your images to the Docker Hub registry or
to a self-hosted one.
Example:
$ docker image push registry-host:5000/myadmin/rhel-httpd:latest
Apart from the above commands, we have other commands, the details of which
can be found in the official Docker reference documentation.
Docker Use Cases
Let's understand a few of the docker use cases:
Use case 1: Developers write their code locally and can share it using docker
containers.
Use case 2: Fixing the bugs and deploying them into the respective
environments is as simple as pushing the image to the respective
environment.
Use case 3: Using docker one can push their application to the test
environment and execute automated and manual tests
Use case 4: One can make a deployment responsive and scalable by using
Docker, since Docker can handle dynamic workloads feasibly.
Let us take an example of an application.
When a company wants to develop a web application, they need an
environment with a Tomcat server installed. Once the tester sets
up a Tomcat environment and tests the application, it is deployed into a
production environment, and Tomcat has to be set up once again in the
production environment to host the Java web application. There are some
issues with this approach:
Loss of time and effort.
The developer and tester might use different Tomcat versions.
Now, let's see how Docker containers can be used to prevent this loss.
To overcome these issues, the developer uses Docker to create
a Docker image from a base image that already exists on Docker Hub.
Docker Hub has some base images available for free. Now this image can be
used by the developer, the tester, and the system admin to deploy an identical
Tomcat environment. In this way, Docker containers solve the problem.
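As a hedged sketch of this workflow (the official tomcat image is used here, and mywebapp.war is a placeholder for the application archive), the developer, the tester, and the system admin can all run the same image:
# Pull the same Tomcat base image on the dev, test, and production machines
docker pull tomcat:9.0
# Run the application in an identical Tomcat environment everywhere
docker run -d --name mywebapp -p 8080:8080 \
  -v "$(pwd)/mywebapp.war:/usr/local/tomcat/webapps/mywebapp.war" \
  tomcat:9.0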
Docker Architecture
Docker architecture generally consists of a Docker Engine, which is a client-server
application with three major components:
1. Generally, Docker follows a client-server architecture.
2. The client communicates with the daemon, which generally takes up the task of
building, running, and shipping the Docker containers.
3. The client and daemon communicate using REST API calls. These calls act as an interface
between the client and daemon
4. A command-line interface, Docker CLI runs docker commands. Some basic docker commands
with examples are listed in the next section.
5. Registry stores the docker images
Docker Hub is the public repository that hosts a large number of Docker images. Docker
images are pre-built containers that can be easily downloaded and run on any system. Users
can also download the Docker images for offline use. Moreover, they can load the Docker
image onto another computer or keep a backup of the Docker image.
This section will explain the method to download an official Docker image for offline use.
How to Download Docker Images for Offline Use?
To download Docker images for offline use, check out the provided steps:
Navigate to Docker Hub.
Search for the desired image and copy its "pull" command.
Pull the Docker image into the local repository using the "docker pull <image-name>" command.
Save the image to a file via the "docker save -o <output-file-name> <image-name>" command.
Load the image from the saved file using the "docker load -i <output-file-name>" command.
Run the Docker image for verification.
Step 1: Choose an Image and Copy its “pull” Command
First, go to Docker Hub and search for the desired Docker image. For instance, we
have searched for the "hello-world" image. Then, copy the below-highlighted command:
Step 2: Pull Docker Image
Now, run the copied command in the Windows PowerShell to pull the selected Docker image
into the local repository:
docker pull hello-world
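The save and load commands from the overview above, written out as a hedged sketch (the output file name hello-world_image.docker matches the file referenced in the verification step below):
# Step 3: Save the pulled image to a file for offline transfer
docker save -o hello-world_image.docker hello-world
# On the offline machine, load the image back from the saved file
docker load -i hello-world_image.docker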
Step 4: Verification
Follow the provided path on your PC to view the output file:
C:\Users\<user-name>
In the below image, the saved output file can be seen, i.e., "hello-world_image.docker":
Conclusion
To download Docker images for offline use, first, search for the desired image on Docker
Hub and copy its "pull" command. Then, run the "docker pull <image-name>" command to
pull the Docker image into the local repository. After that, save the Docker image to a file via
the "docker save -o <output-file-name> <image-name>" command and load it from the
saved file using the "docker load -i <output-file-name>" command. Lastly, run the Docker
image for verification. This section explained the method to download an official Docker image
for offline use.
Uploading the images in Docker Registry and AWS ECS,
1. Create the AWS ECR repository
In the AWS console go to the AWS ECR page. Click the “Create repository” button.
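The remaining steps appear as screenshots in the original; a minimal sketch of the usual authenticate, tag, and push sequence (the region, account ID, and repository/image names below are placeholders) is:
# Authenticate Docker to the ECR registry
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
# Tag the local image with the repository URI and push it
docker tag myapp:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest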
Now the image is in my repository created in step 1.
Linux containers?
Imagine you’re developing an application. You do your work on a laptop and your
environment has a specific configuration. Other developers may have slightly different
configurations. The application you’re developing relies on that configuration and is
dependent on specific libraries, dependencies, and files. Meanwhile, your business has
development and production environments that are standardized with their own
configurations and their own sets of supporting files. You want to emulate those
environments as much as possible locally, but without all the overhead of recreating the
server environments. So, how do you make your app work across these environments, pass
quality assurance, and get your app deployed without massive headaches, rewriting, and
break-fixing? The answer: containers.
The container that holds your application has the necessary libraries, dependencies, and
files so you can move it through production without nasty side effects. In fact, the contents of
a container image—created using an open-source tool like Buildah—can be thought of as an
installation of a Linux distribution because it comes complete with RPM packages,
configuration files, etc. But, container image distribution is a lot easier than installing new
copies of operating systems. Crisis averted—everyone’s happy.
That's a common example, but Linux containers can be applied to many different problems
where portability, configurability, and isolation are needed. The point of Linux containers is to
develop faster and meet business needs as they arise. In some cases, such as real-time
data streaming with Apache Kafka, containers are essential because they're the only way to
provide the scalability an application needs. No matter the infrastructure—on-premise, in
the cloud, or a hybrid of the two—containers meet the demand. Of course, choosing the right
container platform is just as important as the containers themselves.
Red Hat® OpenShift® includes everything needed for hybrid cloud, enterprise container, and
Kubernetes development and deployments. OpenShift is available as a cloud service with
major cloud providers, or you can manage OpenShift yourself for greater flexibility and
customization.
In the next step we’ll learn how to find the names of Docker containers. This will be
useful if you already have a container you’re targeting, but you’re not sure what its
name is.
The docker ps command lists all of the Docker containers running on the server, and provides
some high-level information about them:
Output
CONTAINER ID   IMAGE    COMMAND                  CREATED          STATUS          PORTS   NAMES
76aded7112d4   alpine   "watch 'date >> /var…"   11 seconds ago   Up 10 seconds           container-name
In this example, the container ID and name are highlighted. You may use either to
tell docker exec which container to use.
If you’d like to rename your container, use the docker rename command:
docker rename container-name new-name
Next, we’ll run through several examples of using docker exec to execute commands in
a running Docker container.
Running an Interactive Shell in a Docker Container
If you need to start an interactive shell inside a Docker Container, perhaps to explore
the filesystem or debug running processes, use docker exec with the -i and -t flags.
The -i flag keeps input open to the container, and the -t flag creates a pseudo-
terminal that the shell can attach to. These flags can be combined like this:
docker exec -it container-name sh
This will run the sh shell in the specified container, giving you a basic shell prompt. To
exit back out of the container, type exit then press ENTER:
exit
If your container image includes a more advanced shell such as bash, you could
replace sh with bash above.
Running a Non-interactive Command in a Docker Container
If you need to run a command inside a running Docker container, but don’t need any
interactivity, use the docker exec command without any flags:
docker exec container-name tail /var/log/date.log
This command will run tail /var/log/date.log on the container-name container, and
output the results. By default the tail command will print out the last ten lines of a file.
If you’re running the demo container we set up in the first section, you will see
something like this:
Output
Mon Jul 26 14:39:33 UTC 2021
Mon Jul 26 14:39:35 UTC 2021
Mon Jul 26 14:39:37 UTC 2021
Mon Jul 26 14:39:39 UTC 2021
Mon Jul 26 14:39:41 UTC 2021
Mon Jul 26 14:39:43 UTC 2021
Mon Jul 26 14:39:45 UTC 2021
Mon Jul 26 14:39:47 UTC 2021
Mon Jul 26 14:39:49 UTC 2021
Mon Jul 26 14:39:51 UTC 2021
This is essentially the same as opening up an interactive shell for the Docker container
(as done in the previous step with docker exec -it container-name sh) and then
running the tail /var/log/date.log command. However, rather than opening up a
shell, running the command, and then closing the shell, this command returns that
same output in a single command and without opening up a pseudo-terminal.
Running Commands in an Alternate Directory in a Docker
Container
To run a command in a certain directory of your container, use the --workdir flag to
specify the directory:
docker exec --workdir /tmp container-name pwd
This example command sets the /tmp directory as the working directory, then runs
the pwd command, which prints out the present working directory:
Output
/tmp
The pwd command has confirmed that the working directory is /tmp.
Running Commands as a Different User in a Docker Container
To run a command as a different user inside your container, add the --user flag:
docker exec --user guest container-name whoami
This will use the guest user to run the whoami command in the container.
The whoami command prints out the current user’s username:
Output
guest
The whoami command confirms that the container’s current user is guest.
Passing Environment Variables into a Docker Container
Sometimes you need to pass environment variables into a container along with the
command to run. The -e flag lets you specify an environment variable:
docker exec -e TEST=sammy container-name env
This command sets the TEST environment variable to equal sammy, then runs
the env command inside the container. The env command then prints out all the
environment variables:
Output
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=76aded7112d4
TEST=sammy
HOME=/root
The TEST variable is set to sammy.
To set multiple variables, repeat the -e flag for each one:
docker exec -e TEST=sammy -e ENVIRONMENT=prod container-name env
If you'd like to pass in a file full of environment variables you can do that with the --env-file flag.
First, make the file with a text editor. We’ll open a new file with nano here, but you can
use any editor you’re comfortable with:
nano .env
We’re using .env as the filename, as that’s a popular standard for using these sorts of
files to manage information outside of version control.
Write your KEY=value variables into the file, one per line, like the following:
.env
TEST=sammy
ENVIRONMENT=prod
Save and close the file. To save the file and exit nano, press CTRL+O, then ENTER to save,
then CTRL+X to exit.
Now run the docker exec command, specifying the correct filename after --env-file:
docker exec --env-file .env container-name env
Output
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=76aded7112d4
TEST=sammy
ENVIRONMENT=prod
HOME=/root
You may specify multiple files by using multiple --env-file flags. If the variables in the
files overlap each other, whichever file was listed last in the command will override the
previous files.
Common Errors
When using the docker exec command, you may encounter a few common errors:
Error: No such container: container-name
The No such container error means the specified container does not exist, and may
indicate a misspelled container name. Use docker ps to list out your running containers
and double-check the name.
Error response from daemon: Container
2a94aae70ea5dc92a12e30b13d0613dd6ca5919174d73e62e29cb0f79db6e4ab is not
running
This not running message means that the container exists, but it is stopped. You can
start the container with docker start container-name
Error response from daemon: Container container-name is paused, unpause the
container before exec
The Container is paused error explains the problem fairly well. You need to unpause
the container with docker unpause container-name before proceeding.
3. Configure environment variables, declare dependencies
Most applications use environment variables for initialization and startup.
And also, after we divide the application into services, they have
dependencies on each other. So we need to identify those things before
we declare the compose file.
4. Configure networking
Docker containers communicate with each other through the internal
network that is created by Compose (e.g., service_name:port). If you want to
connect from your host machine, you will have to expose the service on a
host port.
5. Set up volumes
In most cases, we would not want our database contents to be lost each
time the database service is brought down. A simple way to persist our DB
data is to mount a volume.
6. Build & Run
Now, you are set to go and create the compose file and build the images
for your services and generate containers from those images.
A sample docker-compose.yaml file is shown below with all the
configurations discussed before. Get detailed service configuration
reference from the docker-compose file reference.
These YAML rules, both human-readable and machine-optimized, provide
us an efficient way to snapshot the entire project within a few minutes.
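The sample file referred to above is not reproduced in this copy; the following is a minimal sketch (service names, images, ports, and credentials are illustrative assumptions) that covers the environment variables, dependencies, networking, and volume points from steps 3-5. It is written as a shell heredoc that creates the file:
cat > docker-compose.yaml <<'EOF'
version: "3.8"
services:
  web:
    image: myapp:latest          # application image (placeholder)
    ports:
      - "8080:80"                # expose the web service on a host port
    environment:
      - DB_HOST=db               # services reach each other by service name
      - DB_PASSWORD=secret
    depends_on:
      - db                       # declare the dependency on the database
  db:
    image: mysql:8.0
    environment:
      - MYSQL_ROOT_PASSWORD=secret
    volumes:
      - db-data:/var/lib/mysql   # persist database contents across restarts
volumes:
  db-data:
EOF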
After all those, in the end, we just need to run:
$ docker-compose up [options]
And Compose will start and run your entire app. Along with the above
command, you can use the following options:
-d, --detach       Detached mode: run containers in the background, print new container names.
--no-deps          Don't start linked services.
--no-build         Don't build an image, even if it's missing.
--build            Build images before starting containers.
--no-start         Don't start the services after creating them.
--no-recreate      If containers already exist, don't recreate them.
--force-recreate   Recreate containers even if their configuration and image haven't changed.
Other useful commands with compose
Compose has commands for managing the whole lifecycle of your
application:
$ docker-compose build : build or rebuild services
$ docker-compose config : validate and view the Compose file
$ docker-compose down : stop and remove containers, networks,
images, and volumes
$ docker-compose bundle : generate a Docker bundle from the
compose file
$ docker-compose logs <service_name> : stream the log output of
running services
$ docker-compose exec <service_name> <command> : execute a
command in a running container
$ docker-compose run <service_name> <command> : run a one-off
command
$ docker-compose stop <service_name(s)> : stop running containers
without removing them
$ docker-compose start <service_name(s)> : start existing containers
for a service.
$ docker-compose pull/ push <service_name(s)> : pull/ push service
images
$ docker-compose kill <service_name(s)> : kill containers
$ docker-compose rm <service_name(s)> : remove stopped containers
$ docker-compose ps : list containers
$ docker-compose images : list images
You can create custom images from source disks, images, snapshots, or images stored in Cloud
Storage and use these images to create virtual machine (VM) instances. Custom images are ideal for
situations where you have created and modified a persistent boot disk or specific image to a certain
state and need to save that state for creating VMs.
Alternatively, you can use the virtual disk import tool to import boot disk images to Compute Engine
from your existing systems and add them to your custom images list.
The storage location feature is optional. If you don't select a location, Compute Engine stores your
image in the multi-region closest to the image source. For example, when you create an image from a
source disk that is located in us-central1 and if you don't specify a location for the custom image, then
Compute Engine stores the image in the us multi-region.
If the image is not available in a region where you are creating a VM, Compute Engine caches the
image in that region the first time you create a VM.
To see the location where an image is stored, use the images describe command from gcloud compute:
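For example (IMAGE_NAME and PROJECT_ID are placeholders), the storageLocations field in the command's output shows where the image is stored:
gcloud compute images describe IMAGE_NAME --project=PROJECT_ID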
All of your existing images prior to this feature launch remain where they are, the only change is that
you can view the image location of all your images. If you have an existing image you want to move,
you must recreate it in the desired location.
Stop the VM so that it can shut down and stop writing any data to the persistent disk.
If you can't stop your VM before you create the image, minimize the amount of writes to the disk and
sync your file system. To minimize writing to your persistent disk, follow these steps:
1. Pause apps or operating system processes that write data to that persistent disk.
2. Run an app flush to disk if necessary. For example, MySQL has a FLUSH statement. Other apps might
have similar processes.
3. Stop your apps from writing to your persistent disk.
4. Run sudo sync.
Disable the auto-delete option for the disk
By default, the auto-delete option is enabled on the boot disks. Before creating an image from a disk,
disable auto-delete to ensure that the disk is not automatically deleted when you delete the VM.
You can use the Google Cloud console, the Google Cloud CLI, or the Compute Engine API to disable
auto-delete for the disk.
Console
1. In the Google Cloud console, go to the VM instances page.
Go to the VM instances page
2. Click on the VM that you're using as the source for creating an image. The VM instance details page
displays.
3. Click Edit.
4. In the Boot disk section, for the Deletion rule, ensure that the Keep disk option is selected.
5. Click Save.
You can create a disk image once every 10 minutes. If you want to issue a burst of requests to create a
disk image, you can issue at most 6 requests in 60 minutes. For more information, see Snapshot
frequency limits.
Console
1. In the Google Cloud console, go to the Create an image page.
Go to Create an image
2. Specify the Name of your image.
3. Specify the Source from which you want to create an image. This can be a persistent disk, a snapshot,
another image, or a disk.raw file in Cloud Storage.
4. If you are creating an image from a disk attached to a running VM, check Keep instance running to
confirm that you want to create the image while the VM is running. You can prepare your VM before
creating the image.
5. In the Based on source disk location (default) drop-down list, specify the location to store the image.
For example, specify us to store the image in the us multi-region, or us-central1 to store it in the us-
central1 region. If you don't make a selection, Compute Engine stores the image in the multi-region
closest to your image's source location.
6. Optional: specify the properties for your image.
Family: the image family this new image belongs to.
Description: a description for your custom image.
Label: a label to group together resources.
7. Specify the encryption key. You can choose between a Google-managed key, a Cloud Key
Management Service (Cloud KMS) key, or a customer-supplied encryption key (CSEK). If no
encryption key is specified, images are encrypted using a Google-managed key.
8. Click Create to create the image.
For more information about adding images, see the images reference.
gcloud
Use the gcloud compute images create command with the --guest-os-features flag to create a new custom
image from an existing custom image.
To add multiple values, use commas to separate values. Set to one or more of the following values:
VIRTIO_SCSI_MULTIQUEUE. Use on local SSD devices as an alternative to NVMe. For more
information about images that support SCSI, see Choosing an interface.
For Linux images, you can enable multi-queue SCSI on local SSD devices on images with kernel
versions 3.17 or later. For Windows images, you can enable multi-queue SCSI on local SSD devices
on images with Compute Engine Windows driver version 1.2.
WINDOWS. Tag Windows Server custom boot images as Windows images.
MULTI_IP_SUBNET. Configure interfaces with a netmask other than /32. For more information about
multiple network interfaces and how they work, see Multiple network interfaces overview and
examples.
UEFI_COMPATIBLE. Boot with UEFI firmware and the following Shielded VM features:
Secure Boot: disabled by default
Virtual Trusted Platform Module (vTPM): enabled by default
Integrity monitoring: enabled by default
GVNIC. Support higher network bandwidths of up to 50 Gbps to 100 Gbps speeds. For more
information, see Using Google Virtual NIC.
SEV_CAPABLE. Use if you're creating a Confidential VM on the AMD Secure Encrypted Virtualization
(SEV) CPU platform. For more information, see Create a new Confidential VM instance.
SUSPEND_RESUME_COMPATIBLE. Support suspend and resume on a VM. For more information, see OS
compatibility.
LOCATION: Optional: region or multi-region in which to store the image
For example, specify us to store the image in the us multi-region, or us-central1 to store it in the us-
central1 region. If you don't make a selection, Compute Engine stores the image in the multi-region
closest to your image's source location.
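A hedged example of such a command (the image, source image, project, and location names are placeholders):
gcloud compute images create my-custom-image \
    --source-image=my-existing-image \
    --source-image-project=my-project \
    --guest-os-features=UEFI_COMPATIBLE,MULTI_IP_SUBNET \
    --storage-location=us-central1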
Considerations for Arm images
Google offers the Tau T2A machine series, which runs on the Ampere Altra CPU platform. You can
start a VM with the T2A machine series and then use that source VM to create an Arm image. The
process for creating a custom Arm image is identical to creating an x86 image.
To help your users differentiate between Arm and x86 images, Arm images will have
an architecture field set to ARM64. Possible values for this field are:
ARCHITECTURE_UNSPECIFIED
X86_64
ARM64
Image users can then filter on this field to find x86 or Arm-based images.
Once the installation is complete, confirm once more that the software was installed as
shown in the example below. And exit from the container.
Additional Options for Docker Commit Command
The first command is the pull command. This command will download/pull the complete
operating system within seconds depending on your internet connectivity. The syntax is
like, docker pull image_name. Here I am pulling alex43/ubuntu-with-git:v1.0 which is my
own customized image.
docker pull alex43/ubuntu-with-git:v1.0
The second command is the run command which we will use to run the pulled image. This
command will launch my image and we will get an interactive shell/terminal of that image.
The syntax is like this: -it is for an interactive terminal, --name gives a reference name
for the launched container, and then the image_name.
docker run -it --name myos alex43/ubuntu-with-git:v1.0
The third command and the most important command for creating our own image is
the commit command. By using this command we can simply create our own image with
the packages which we want from the existing image. The syntax is: docker commit
Nameof_RunningImage your_own_name:tag.
docker commit myos ubuntu-basicbundle:v1.0
The fourth command is the tag command. By using this command we need to rename our
image with the syntax username/image-name:tag. Before executing this command you
need to create an account on the Docker hub and you have to give the same username
which you have given in the Docker hub profile.
docker tag ubuntu-basicbundle:v1.0 alex43/ubuntu-basicbundle:v1.0
The fifth command is the login command. By using this command we will log in to the
docker hub account through our terminal and it is required to upload our docker image to
the docker hub profile.
docker login --username alex43 --password your_passwd
The sixth command is the push command. By using this command we can upload our own
created docker image to the docker hub profile and can use it anywhere from our local
system to the cloud by pulling it.
docker push alex43/ubuntu-basicbundle:v1.0
These were the few commands, along with the concepts, that we will be using in this tutorial,
and one fresh image will be uploaded so that you can understand it in a better
way.
When to Commit New Changes to a New Container Image
Committing changes to a new container image is useful in the
containerization process: you can make an image from the changes we have made to
a container. The timing of when to commit a new image depends upon a few factors:
1. Modifications are finished: Be sure that the modifications you’ve made are complete
and function as intended before committing new changes to a container image. You can
end up with an image that doesn’t perform properly or needs additional adjustments if
you commit insufficient changes.
2. Consistency of Changes: It’s crucial to make sure that the changes you’ve made to
the container are stable and won’t result in any problems when they’re deployed. Test
the container rigorously to confirm that it performs as expected before making
modifications to an image.
3. Frequency of Changes: Committing changes to a fresh container image more
regularly may make sense if you frequently modify the container. This can lessen the
chance of needing to roll back modifications if problems develop and ensure that each
new version of the container reflects the most recent changes.
In conclusion, only commit fresh changes to a fresh container image once they have been
fully finished, stable, and properly tested. When to commit new changes to an image
depends on the frequency of changes and your deployment workflow.
Conclusion
In this section, we've discussed the significance of the docker commit command and
provided step-by-step instructions with an example of how to use it. Docker commit is
mainly used to commit an image from a running container in which we have made some
modifications, like installing some software or adding variables in the container.
Step 5: Verify the image is on Docker Hub
That's it! Your image is now shared on Docker Hub. In your browser, go to Docker Hub and verify that you see
the welcome-to-docker repository.
o database trust store password is the password of the database trust store.
For Deployed MDM, the trust store password is WebAS.
The SSL and trust store arguments cannot be left blank. For example, if SSL is not enabled on
the database and there is no trust store, then the command should be similar to the following:
verify.sh db2inst1 db2inst1 mdmadmin mdmadmin false none none
5. To access the various parts of the InfoSphere MDM application on the Docker containers, use
the following information:
MDM application container (mdm_container)
o InfoSphere MDM administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o WebSphere Application Server administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o Database administrator credentials:
1. user ID: db2inst1
2. password: db2inst1
Note: Use db2inst1 only if you are using IBM Db2 (db2_container). If you are
using another database system, use the appropriate user and password.
o Access the WebSphere Application Server Integrated Solutions Console (admin
console) at https://<hostname>:9043/ibm/console/logon.jsp
o Access the InfoSphere MDM Inspector user interface at
https://<hostname>:9443/inspector/application/inspector.html
o Access the InfoSphere MDM Web Reports user interface at
https://<hostname>:9443/webreports/common/login.html
o Access the InfoSphere MDM Enterprise Viewer user interface at
https://<hostname>:9443/accessweb/servlet/dousrlogin
o Access the InfoSphere MDM Business Administration user interface at
https://<hostname>:9443/CustomerBusinessAdminWeb/faces/
o Access the MDM AE/SE user interface at https://<hostname>:9443/mdm-aese/
MDM user interface container (mdmui_container)
o InfoSphere MDM administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o WebSphere Application Server administrator credentials:
1. user ID: mdmadmin
2. password: mdmadmin
o Access the WebSphere Application Server Integrated Solutions Console (admin
console) at https://<hostname>:9043/ibm/console/logon.jsp
o Access the InfoSphere MDM Inspector user interface at
https://<hostname>:39043/inspector/application/inspector.html
o Access the InfoSphere MDM Web Reports user interface at
https://<hostname>:39043/webreports/common/login.html
o Access the InfoSphere MDM Enterprise Viewer user interface at
https://<hostname>:39043/accessweb/servlet/dousrlogin
o Access the InfoSphere MDM Business Administration user interface at
https://<hostname>:39043/CustomerBusinessAdminWeb/faces/
o Access the MDM AE/SE user interface at https://<hostname>:8543/mdm-aese/
MDM Workbench container (wb_container)
The MDM Workbench container includes VNC server (and also a noVNC server) so that you
can access it remotely.
To access the MDM Workbench desktop using VNC in GUI mode, browse to
http://hostname:6080/vnc.html. Each open browser/tab at this URL opens a new session.
The default password for noVNC is temp4now. To change it, edit the Docker Compose
file mdm-wb.yml.
MDM Workbench workspaces are mapped under the Docker volume on the host machine
under workspace. This location contains all of the workspace assets and will survive even if
the wb_container goes down. It is mapped to
the wb_container location /opt/IBM/rationalsdp/workspace.
IBM® Stewardship Center container (mdmisc_container)
o IBM Business Process Manager administrator credentials:
1. user ID: bpmadmin
2. password: bpmadmin
o IBM Stewardship Center credentials:
1. user ID: dsuser1
2. password: password
o Access the BPM Process Center at the following URLs:
1. https://hostname:7001/ibm/console/logon.jsp
2. https://hostname:7026/ProcessCenter/login.jsp
3. https://hostname:7026/ProcessPortal/login.jsp
4. https://hostname:7026/ProcessAdmin/login.jsp
o Access the BPM Process Server at the following URLs:
1. https://hostname:8001/ibm/console/logon.jsp
2. https://hostname:8026/ProcessPortal/login.jsp
3. https://hostname:8026/ProcessAdmin/login.jsp
o Access the IBM Stewardship Center portal at
https://hostname:8026/ProcessPortal/login.jsp
IBM WebSphere MQ container (mdmmq_container)
o IBM WebSphere MQ administrator credentials:
1. user ID: admin
2. password: passw0rd
o IBM WebSphere MQ port number: 1414
o IBM WebSphere MQ queue manager: MDM.QUEUE.MGR
o IBM WebSphere MQ channel: MDM.SVR.CH
linking containers,
Docker is a set of platform as a service (PaaS) products that use operating-system-level
virtualization to deliver software in packages called containers. There
are times during the development of our application when we need two containers to
be able to communicate with each other. It might be possible that the services of
both containers are dependent on each other. This can be done with the help
of Container Linking.
Previously, containers were linked using the "--link" flag, but that flag has now
been deprecated and is considered a legacy feature.
The Default Way
Once we install Docker and create a container, a default bridge network is assigned
to Docker, by the name of docker0. Its IP range is 172.17.0.0/16 (where
172.17.0.1 is assigned to the docker0 interface).
Now the containers that we create will get their IPs in the range of
172.17.0.2/16 onwards.
Step 1: Create two new containers, webcon, and dbcon
$ docker run -it --name webcon -d httpd
$ docker run -it --name dbcon -e MYSQL_ROOT_PASSWORD=1234 -d mysql
You can use any image, we’ll be using MySQL and HTTPD images in our case.
With the help of these IPs, the docker host establishes a connection with the
containers.
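The step that looks up the container IPs appears as a screenshot in the original; one hedged way to retrieve them on the default bridge network is:
docker inspect -f '{{.NetworkSettings.IPAddress}}' webcon
docker inspect -f '{{.NetworkSettings.IPAddress}}' dbcon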
Step 3: Get inside the webcon container and try to ping the dbcon container, if you
get a response back this means that the default connection is established.
$ docker container exec -it webcon /bin/bash
(to get into the webcon container)
$ ping "172.17.0.3"
(ping the dbcon container)
User-Defined Way
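This approach is illustrated with screenshots in the original; a minimal sketch of the user-defined way (the network name mynet is an assumption, and the containers from the default-way example should be removed first or given new names) is:
# Create a user-defined bridge network
docker network create mynet
# Attach both containers to it
docker run -it -d --name webcon --network mynet httpd
docker run -it -d --name dbcon -e MYSQL_ROOT_PASSWORD=1234 --network mynet mysql
# On a user-defined network, containers can resolve each other by name
docker exec -it webcon getent hosts dbcon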
Environment Variables
Suppose the developer referenced some environment variables in the
source code by which we can connect to the database server, for example, a
username and password; then, while creating the container, we set the username and
password as shown in the command below.
docker run -d --name <name> -e USERNAME=<***> -e PASSWORD=<***> --
network <****>
We can set the above-mentioned env variables to the database container by using
the following command.
docker run -d -p <port> --name <name> -e HOSTNAME=<***> -e
USERNAME=<***> -e PASSWORD=<***> --network <***>
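A hedged, concrete version of the two commands above (the images, credentials, and network name are placeholders chosen for illustration; the application image is expected to read these variables):
# Database container with its credentials, attached to a user-defined network
docker run -d --name dbserver --network mynet \
  -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=appdb mysql
# Application container that reads its connection details from the environment
docker run -d -p 8080:80 --name webapp --network mynet \
  -e HOSTNAME=dbserver -e USERNAME=root -e PASSWORD=secret httpd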
Container Linking allows multiple containers to link with each other. It is a better option than
exposing ports. Let’s go step by step and learn how it works.
Then run the env command. You will notice new variables for linking with the source
container.
At times, you may need to set out some networking rules to enable smooth interaction between containers, or to
make your Docker ports accessible to services in the outside world.
Whereas each of these rules may realize mostly similar results, they work differently.
This section will demonstrate how to apply different networking rules when exposing and publishing Docker ports.
While the two approaches may look similar, they differ in how they make ports available.
a) Using EXPOSE
With the EXPOSE rule, you can tell Docker that the container listens on the stated network ports during runtime.
For example, a Dockerfile line such as EXPOSE 8080 instructs Docker that the container's service can be connected to via port 8080.
By default, the EXPOSE keyword specifies that the port listens on the TCP protocol.
As earlier explained, you can also use the --expose flag in a docker run command to add to the exposed ports.
By default, the EXPOSE instruction does not make the container's ports accessible from the host; it only makes the
stated ports available for inter-container interaction.
For example, let's say you have a Node.js application and a Redis server deployed on the same Docker network. To ensure that the Node.js
application communicates with the Redis server, the Redis container should expose a port.
If you check the Dockerfile of the official Redis image, it includes a line that says EXPOSE 6379. This is what lets the two containers
talk with one another.
Therefore, when your Node.js application connects to port 6379 of the Redis container, the EXPOSE directive is what allows the inter-container communication to take place.
Publishing Docker ports via -P or -p
There are two ways of publishing ports in Docker:
Using the -P flag
Using the -p flag
As earlier mentioned, EXPOSE is usually used as a documentation mechanism; that is, a hint about which ports are
providing services.
Docker allows you to add -P at runtime and convert the EXPOSE instructions in the Dockerfile to specific published ports.
Docker identifies all ports exposed using the EXPOSE directive and those exposed using the --expose flag, and each exposed port
is mapped automatically to a random port on the host interface. This automatic mapping also prevents port conflicts.
The -p flag, in contrast, allows you to map a container's port or a range of ports to the host explicitly, instead of exposing a random host port.
So, while it's possible for your Docker containers to connect to the outside world without making any changes, by default it's not possible
for the outside world to connect to your Docker containers.
If you want to override this default behavior, you can use either the -P or the -p flag in your docker run command.
Publishing ports produces a firewall rule that binds a container port to a port on the Docker host, ensuring the port is accessible to any
client that can communicate with the host.
It's what makes a port accessible to Docker containers that are not connected to the container's network, and to services outside of your
Docker environment.
So, while exposed ports can only be accessed internally, published ports can be accessed by external containers and services.
That's the main difference between exposing and publishing ports in Docker.
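To make the distinction concrete, here is a hedged pair of commands (the image and container names are illustrative):
# -P publishes every EXPOSEd port to a random high port on the host
docker run -d -P --name redis-auto redis
# -p maps a specific host port to a specific container port
docker run -d -p 8080:80 --name web nginx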
Container Routing.
Container routing determines how to transport containers from their origins to their
destinations in a liner shipping network. Take Figure 1 as an example, which shows a liner
shipping network consisting of three ship routes. Containers from Singapore to Hong Kong
can be transported on either ship route 1 or ship route 2. If there are many containers to be
transported from Singapore to Jakarta, then containers from Singapore to Hong Kong should
be transported on ship route 2 to reserve the capacity on ship route 1 for containers from
Singapore to Jakarta. In addition to different ship routes on which containers can be
transported from origin to destination, another complicating factor is transshipment. For
instance, containers from Hong Kong to Colombo can be transported on ship route 2, or they
can be transported on ship route 1 to Singapore and transshipped to ship route 2 and then
transported to Colombo. The choice of direct shipment on ship route 2 is preferable because
otherwise it would involve a high transshipment cost at Singapore. However, if there are many
containers to be transported from Hong Kong to Xiamen or from Xiamen to Singapore, then
the choice of transshipment at Singapore from ship route 1 to ship route 2 has to be adopted.
Consequently, it is not an easy task to determine the optimal container routing.
Figure 1
An illustrative liner shipping network [8].
Container routing determines the container handling cost. Table 1 shows the handling costs for two
types of laden containers at three ports: D20 means a dry 20 ft container, and D40 a dry 40 ft container.
In terms of cargo capacity, a D40 is equivalent to two D20s. However, Table 1 clearly indicates in the
three "Ratio" rows that the ratio of the cost of handling a D40 to that of handling a D20 is strictly
less than 2. In fact, all the ratios in Table 1 are less than 1.5, and some ratios are even 1 or very close
to 1. This is because both the handling of a D20 and the handling of a D40 involve one quay crane
move (we note that nowadays some quay cranes can handle one D40 or two D20s in each move).
Therefore, to reduce container handling costs, a shipping line should try to transport more D40s
instead of D20s, as a D40 can hold as much cargo as two D20s.
Table 1
Laden container handling cost (USD/container) at three ports (source: [11]).
As the handling cost of a D40 is much lower than that of two D20s, it might be advantageous to
unpack two D20s and repack them into one D40. In the sequel, we use "TEU" and "D20"
interchangeably and use "forty-foot equivalent unit (FEU)" and "D40" interchangeably. Each port has a load,
transshipment, and discharge cost (USD/container) for a TEU, and a corresponding load, transshipment, and
discharge cost (USD/container) for an FEU. We further define the cost of repacking two TEUs into one FEU
and the cost of unpacking one FEU into two TEUs. (Since multiple rehandlings of containers would
increase the risk of damage and therefore may increase insurance costs, we can include the
extra insurance costs in these rehandling costs. Moreover, repacking requires consent from shippers, and we can include in the
rehandling cost the component of discount for shippers who agree for their cargo to be repacked.)
Figure 2 shows an example of transporting two TEUs from an origin port to a destination port. The two TEUs need to be
transshipped twice. If they are transported as two TEUs, as shown in Figure 2(a), then, at the port of
origin, two TEUs are loaded; at each of the two transshipment ports, two TEUs are transshipped;
and, at the destination port, two TEUs are discharged. Therefore, the total container handling cost is the sum
of these loading, transshipment, and discharge charges.
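Writing, purely for illustration, $c^{L}_{p}$, $c^{T}_{p}$, and $c^{D}_{p}$ for the TEU load, transshipment, and discharge costs at a port $p$ (notation introduced here only as a sketch), the cost of moving the two TEUs from origin $o$ to destination $d$ via transshipment ports $p_1$ and $p_2$ is
$$ 2\,c^{L}_{o} + 2\,c^{T}_{p_1} + 2\,c^{T}_{p_2} + 2\,c^{D}_{d}. $$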
How to Setup Nginx Reverse Proxy for Routing Incoming Traffic to Different Containers
and Certbot for Auto-Renewing SSL Certificates
For small applications or test environments where separate machines for different web
servers are cost prohibitive, one option is to have different servers run on the same
machine in different Docker containers. Docker doesn't support exposing the same port
to multiple containers simultaneously. Still, we can install Nginx on the host
machine and have it conditionally route the requests to the different containers.
Incoming traffic routed to two docker containers with different web servers on the same
machine
The problem
After exposing port 80 to one container, if we try to have that port exposed to another
container, we get the following error:
1. Install Docker
There are plenty of tutorials available for installing Docker. This article
will focus on the routing part.
2. Install Nginx on the host machine
Again, there are plenty of good tutorials on this. (You can also install
Nginx in a separate container.)
3. Set up DNS records
For our example, we would set up A records for flaskapp and nodeapp that point to the IP
of the server in the DNS records of example.com.
4. Run the Docker containers with the web servers you need, but on ports other than 80
and 443
Say we have two Docker images already built or pulled: flaskApp and nodeApp. We can
expose port 8080 for flaskApp and port 8081 for nodeApp:
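The run commands are not shown in this copy; a hedged sketch follows (Docker image names must be lowercase, so the images are written as flaskapp and nodeapp, and the internal ports 5000 and 3000 are assumptions about the two apps):
docker run -d --name flaskapp -p 8080:5000 flaskapp
docker run -d --name nodeapp -p 8081:3000 nodeapp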
To serve both these apps on port 80, we will set up server blocks in the host machine.
server {
    listen 80;
    server_name flaskapp.example.com;

    location / {
        proxy_pass http://localhost:8080;
    }
}
Create the server block for nodeapp.example.com:
server {
    listen 80;
    server_name nodeapp.example.com;

    location / {
        proxy_pass http://localhost:8081;
    }
}
Create symlinks in the sites-enabled directory, test the Nginx configuration, restart Nginx, and then
request certificates with Certbot; a hedged sketch of these commands follows.
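These commands assume the server blocks were saved under /etc/nginx/sites-available with the file names below, and that the host uses systemd:
sudo ln -s /etc/nginx/sites-available/flaskapp.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/nodeapp.example.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
# Request and install certificates with Certbot's Nginx plugin
sudo certbot --nginx -d flaskapp.example.com -d www.flaskapp.example.com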
Of course, “-d www.flaskapp.example.com” is not necessary if you are not using the
www version for the subdomain. After following the prompts, you will see a success
message as below.
Requesting a certificate for flaskapp.example.com
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/flaskapp.example.com/fullchain.pem
Key is saved at:
/etc/letsencrypt/live/flaskapp.example.com/privkey.pem
This certificate expires on 2022-10-25.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.
Deploying certificate
Successfully deployed certificate for flaskapp.example.com to /etc/nginx/sites-enabled/flaskapp.example.com
Congratulations! You have successfully enabled HTTPS on https://flaskapp.example.com
You can list your certificates with:
$ certbot certificates
… and remove the Nginx server block files for a site with:
$ sudo rm /etc/nginx/sites-enabled/flaskapp.example.com
$ sudo rm /etc/nginx/sites-available/flaskapp.example.com