Vini Internship Report
ON
SERVER-SIDE INDUSTRIAL AUTOMATION SYSTEM
By
VINI SINGH
GUIDED BY
MR. ANIL JAIN
Session 2019-2020
ACKNOWLEDGEMENT
For the summer internship program, I was selected to intern at Linux World Informatics Pvt. Ltd., home to one of the leading industrial internship programs in India. The company's main projects relate to server-side operations. In today's ever-disrupting industrial era, where Industry 4.0 is the biggest change currently taking place, Linux World is one of the leading companies helping its interns and employees keep up with the trend.
One of the most influential personalities I met during this internship was my mentor, Mr. Vimal Daga. Mr. Vimal is not only a mentor but also a visionary who helps the interns working under him stay ahead of the competition by educating them in cutting-edge technology.

The environment at Linux World is very good for interns, as there is a feeling of community in which all employees, senior or junior, and the interns help each other achieve their goals. One of the best parts of Linux World is sharing: instead of running a rat race and trying to outdo one another, everyone working here shares their ideas and opinions with other employees and interns. This not only helps people solve problems but also helps them learn new tools and technologies.
I would also like to thank Mrs. Shruti Kalra (Head of Department, Electronics & Communication Engineering), Mr. Anil Jain (Mentor), and the faculty members of the department, who often helped and gave me valuable guidance in preparing my report.
Signature
Vini Singh
B.Tech 4th Year
(Electronics & Communication Engg.)
ABSTRACT
In today's world, any number of tools are available in the market, yet the industry still faces many problems. The core reason for this is a lack of in-depth knowledge of the tools and their backgrounds. Industry experts need to examine every domain and key feature of a tool before implementing it in production. One such domain is security. The tools in this sector may be beyond a company's budget in the usual scenario: the cost of implementing such a technology at their scale can easily exceed what they expect to spend, and even what they earn. With the current popularity of serverless architecture, the scalability of an infrastructure is one of the most crucial concerns.
As a result, companies struggle and look for inexpensive, out-of-the-box solutions in this area. The most recommended tool here is Splunk, but its actual cost is so high that many cannot afford it, from small startups to even big companies. It allows only a very small amount of data to be processed for free, and beyond that the cost per GiB is exceptionally high. The next issue, not directly related, is administrator security and the automation of cloud-based infrastructure.
We worked to develop a highly scalable, voice-automated cloud infrastructure: adding capacity requires only connecting a new PC to the network and restarting the node.

For admin security, the admin was remotely authenticated with facial recognition, and the recognition data was streamed to the main server.
So we ensured that the product we built is affordable for all companies and does not put high-end requirements on customers. Moreover, we used tools like Hadoop instead of Spark, so that by relying on a primitive, core tool that is more widely accepted, companies need not deploy any new infrastructure to make use of the product.

Another key focus was on making many small optimizations in the underlying tools so that the end product is highly optimized. We also focused on building an ideal cloud infrastructure with a good number of services available, which was then targeted in terms of security, and the analysis was done.
CHAPTER INDEX
S. No. Title Page No.
Certificate [ii-iv]
Acknowledgement [v]
Abstract [vi]
1. Introduction 1
2. Related Work 2-7
2.1 Cloud Computing 2-3
2.1.1 Software as a Service 3
2.1.2 Platform as a Service 3
2.1.3 Storage as a Service 3
2.1.4 Container as a Service 3
2.2 Big Data 4
2.3 Computer Vision 4
2.4 DevOps 4
2.5 Docker 4-7
2.5.1 Docker 4
2.5.2 IT Industry Revolution by Docker 5
2.5.3 What is Docker Registry 5
2.5.4 Docker Architecture 5
2.5.5 Deployment of Docker in Linux 6-7
3. Design & Implementation 13
3.1 Overview of Proposal System 13
3.2 Design of System 14
3.2.1 Python CGI 15
3.2.2 Ansible 15
3.3 Implementation of the System 16-25
4 Result & Conclusion 26
References 27
APPENDIX I 28
FIGURE INDEX
S. No. Title Page No.
1. 2.1 Architecture of Cloud 3
2. 2.5.4 Docker Architecture 6
3. 2.5.5 Deployment of Docker 7
4. 3.2.1 Architecture of CGI 11
5. 3.2.2 Ansible Architecture 13
6. 3.3.1 Home page of Server-Side Industrial Automation System 16
7. 3.3.2 Output of what runs in the backend when httpd playbook is executed 18
8. 3.3.3 Inventory host file 18
9. 3.3.4 Simple example of an Ansible ad hoc command 19
10. 3.3.5 Output after running the yum configuration playbook 20
11. 3.3.6 Output to crosscheck that yum has been configured 21
TABLE INDEX
S. No. Title Page No.
1. 1 Programming Components and Frameworks 8
2. 2 Software and Frameworks 9
1. Introduction
In today's world, companies do not have much spare time to configure environments on entire fleets of systems manually; this product can help reduce that work drastically. Industries are currently moving toward DevOps, the agile culture. We have a huge number of tools; the count is large enough to cause confusion about what to use and what not to. This creates problems in industries, where a company may not be using optimal solutions because a deep dive into each of the tools is not possible for them. So the company needs to consider every field when looking for a tool or architecting an infrastructure. There may be any number of tools that satisfy the needs, but finding what suits them and meets their core requirements is essential. Due to such complexity, security has become an even tougher problem for companies, and they also need to adopt separate tools for security.

The need for a system administrator to code should be eliminated, as administrators are not developers. The best solution here is an automated interface that offers high scalability. The main focus of the system administrator should be on administration rather than coding the infrastructure, so the interface needs to be flexible and able to create a scalable infrastructure. The security of the interface cannot be taken lightly; this can be handled by computer vision, with the system admin authenticated remotely through facial recognition.

The best driving statement here is that industry needs an affordable solution which can be tuned to its needs and implemented without any fancy hardware. The other focus is to make the work of an admin more efficient and secure.

To enable persistence in Ansible, we combined Python CGI and Ansible, so that a dynamic infrastructure could be generated from code. Python CGI enabled file handling and hence persistence.
2. RELATED WORK
In this project we worked with various technologies: DevOps, cloud computing, Big Data, and computer vision. These technologies cover very vast fields, so we have worked on a small section of each. The sections of these technologies on which we worked are described in this chapter, along with how they work; the next chapter describes how these technologies are used in our project.
Figure 2.1 : Architecture of Cloud
Storage is rented out on a subscription (monthly, yearly, or per-space) basis. It allows clients to easily store their data (files, images, etc.) on the provider's account. All the stored data can be accessed from anywhere, as long as the person has the login information and internet access. The best examples are AWS S3, EBS, EFS, and Glacier.
2.4 DEVOPS
DevOps is the combination of cultural philosophies, practices, and tools that increases an
organization’s ability to deliver applications and services at high velocity: evolving and improving
products at a faster pace than organizations using traditional software development and
infrastructure management processes. This speed enables organizations to better serve their
customers and compete more effectively in the market. The DevOps tool used here is Ansible.
2.5 DOCKER
Docker container technology was launched in 2013 as the open source Docker Engine.
It leveraged existing computing concepts around containers and specifically in the Linux world,
primitives known as cgroups and namespaces. Docker's technology is unique because it focuses
on the requirements of developers and systems operators to separate application dependencies
from infrastructure.
Success in the Linux world drove a partnership with Microsoft that brought Docker containers
and its functionality to Windows Server (sometimes referred to as Docker Windows containers).
Technology available from Docker and its open source project, Moby has been leveraged by all
major data center vendors and cloud providers. Many of these providers are leveraging Docker
for their container-native IaaS offerings. Additionally, the leading open source serverless
frameworks utilize Docker container technology.
Docker is a tool that is designed to benefit both developers and system administrators, making it
a part of many DevOps (developers + operations) toolchains. For developers, it means that they
can focus on writing code without worrying about the system that it will ultimately be running
on. It also allows them to get a head start by using one of thousands of programs already
designed to run in a Docker container as a part of their application. For operations staff, Docker
gives flexibility and potentially reduces the number of systems needed because of its small
footprint and lower overhead.
In a way, Docker is a bit like a virtual machine. But unlike a virtual machine, rather than creating
a whole virtual operating system, Docker allows applications to use the same Linux kernel as the
system that they're running on and only requires applications be shipped with things not already
running on the host computer. This gives a significant performance boost and reduces the size of
the application.
Storage itself is delegated to drivers. The default storage driver is the local POSIX filesystem, which is suitable for development or small deployments. Additional cloud-based storage drivers like S3, Microsoft Azure, OpenStack Swift, and Aliyun OSS are also supported. People looking into using other storage backends may do so by writing their own driver implementing the Storage API. Since securing access to your hosted images is paramount, the Registry natively supports TLS and basic authentication. The Registry GitHub repository includes additional information about advanced authentication and authorization methods; only very large or public deployments are expected to extend the Registry in this way. Finally, the Registry ships with a robust notification system, calling webhooks in response to activity, and both extensive logging and reporting, mostly useful for large installations that want to collect metrics.
When compared to virtual machines, the Docker platform moves the abstraction of resources up from the hardware level to the operating-system level. This allows for the realization of the various benefits of containers, e.g. application portability, infrastructure separation, and self-contained microservices. In other words, while virtual machines abstract the entire hardware server, containers abstract the operating-system kernel. This is a wholly different approach to virtualization and results in much faster and more lightweight instances.
Figure 2.5.5: Deployment of Docker
The Docker client enables users to interact with Docker. The Docker client can reside on the
same host as the daemon or connect to a daemon on a remote host. A docker client can
communicate with more than one daemon. The Docker client provides a command line interface
(CLI) that allows you to issue build, run, and stop application commands to a Docker daemon.
The main purpose of the Docker Client is to provide a means to direct the pull of images from a
registry and to have it run on a Docker host. Common commands issued by a client are:
[root@terminal]# docker build
[root@terminal]# docker pull
[root@terminal]# docker run
3. DESIGN & IMPLEMENTATION
3.1 OVERVIEW OF THE PROPOSED SYSTEM
The proposed system was implemented on RHEL 7.5 under a master-slave architecture. There were three slaves, which can easily be scaled; the RHEL 7.5 installation of the nodes was standardized using Kickstart, and the setup task was automated. The client services were accessible through the main web server on the master. The remaining deployment was carried out on the clients. No reverse proxy exists, so the client was redirected to the appropriate node IP and port to access the service.
The following are the main components:
The client was provided services like SaaS, PaaS, CaaS, and STaaS through containers, each in an isolated environment. For providing STaaS, simple NFS with LVM was used. CaaS (Container as a Service) was provided through Shellinabox, which presents the shell of the container in the web page itself. SaaS (Software as a Service) can be streamed directly to the webpage, with no overhead for the client of installing software to make use of the services.
Another main component of the project is Big Data. There is a separate internal cluster where the network packet dumps are stored. The dumps are then analyzed through MapReduce, and the results can be used to blacklist a DoS attacker's IP address in the firewall.
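The map-reduce step described above can be sketched in plain Python. The log-line layout, the field order, and the DoS threshold below are illustrative assumptions, not taken from the actual cluster:

```python
# Sketch: count requests per source IP in a packet dump and flag
# candidates for the firewall blacklist (MapReduce style).
from collections import Counter

DOS_THRESHOLD = 3  # assumed requests-per-window cutoff for this sketch

def map_phase(lines):
    """Emit (source_ip, 1) for each log line of the form '<src_ip> <dst_ip> <port>'."""
    for line in lines:
        src_ip = line.split()[0]
        yield src_ip, 1

def reduce_phase(pairs):
    """Sum the counts per IP, as the reducer would after the shuffle."""
    counts = Counter()
    for ip, n in pairs:
        counts[ip] += n
    return counts

def blacklist(counts, threshold=DOS_THRESHOLD):
    """Return the IPs whose request count meets the threshold."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)

# Tiny in-memory stand-in for a packet dump stored on the cluster.
dump = [
    "10.0.0.5 192.168.1.1 80",
    "10.0.0.5 192.168.1.1 80",
    "10.0.0.5 192.168.1.1 443",
    "10.0.0.9 192.168.1.1 80",
]
counts = reduce_phase(map_phase(dump))
print(blacklist(counts))  # ['10.0.0.5']
```

On a real Hadoop cluster the map and reduce functions would run as separate streaming tasks over HDFS blocks; the logic is the same.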
Computer vision paired with speech recognition was also one of the main features. The administrator is required to run a container remotely, which captures his face through the webcam; the face is cropped (using a Haar cascade) and streamed to the main server, where identification is done.
After identification, the admin gets access to the voice-activated menu for setting up new nodes after the Kickstart. The nodes are then added permanently to the cluster. The admin can also
provision any node for any type of service, such as STaaS, other cloud services, or a Hadoop cluster.
3.2.1 PYTHON CGI:
An HTTP server invokes a Python CGI script so it can process user input that a user may submit through an HTML <FORM> or <ISINDEX> element.
Such a script usually lives in the server's special cgi-bin directory. For each request, the server places information about it (the client's hostname, the requested URL, the query string, and other details) in the script's shell environment. It then executes the script and sends its output back to the client.
The script's standard input is connected to the client, and the server sometimes reads form data this way. At other times it passes form data through a query string, a part of the URL that holds data which does not conventionally fit a hierarchical path structure.
Python's cgi module handles these situations and helps in debugging scripts. With its latest additions, it also supports uploading files from a form.
So, what does a Python CGI script output? It emits two sections separated by a blank line. The first section holds headers that tell the client what kind of data follows.
CGI is a set of standards that defines a standard way of passing information or a web user's request to an application program and getting data back to forward to the user. It is the exchange of information between a web server and a custom script. When a user requests a web page, the server sends the requested page. The web server usually passes the information to an application program that processes the data and sends back an acknowledgment; this technique of passing data back and forth between server and application is the Common Gateway Interface.
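The two-section output described above can be sketched with a minimal script. This is an illustration, not the project's actual handler: the form field name "task" is assumed, and the query string is read from the QUERY_STRING environment variable, which the web server sets before invoking the script:

```python
#!/usr/bin/env python3
# Minimal CGI sketch: the server passes the query string via the
# QUERY_STRING environment variable; the script's output is a header
# section, a blank line, and then the body.
import os
from urllib.parse import parse_qs

def build_response(query_string):
    """Return the two CGI output sections separated by a blank line."""
    params = parse_qs(query_string)              # e.g. "task=run_container"
    task = params.get("task", ["none"])[0]       # first value of the field
    header = "Content-Type: text/html"
    body = "<html><body><h1>Requested task: %s</h1></body></html>" % task
    return header + "\r\n\r\n" + body

if __name__ == "__main__":
    print(build_response(os.environ.get("QUERY_STRING", "")))
```

Placed in the server's cgi-bin directory and marked executable, a script like this is run once per matching request.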
3.2.2 ANSIBLE:
Ansible is an open-source software provisioning, configuration management, and application-
deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems
as well as Microsoft Windows. It includes its own declarative language to describe system
configuration.
Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless,
temporarily connecting remotely via SSH or remote PowerShell to do its tasks.
Ansible is an open source automation platform. It is very simple to set up, yet powerful.
Ansible can help you with configuration management, application deployment, and task automation.
It can also do IT orchestration, where you have to run tasks in sequence and create a chain of
events which must happen on several different servers or devices. An example is if you have a
group of web servers behind a load balancer. Ansible can upgrade the web servers one at a time
and while upgrading it can remove the current web server from the load balancer and disable it in
your Nagios monitoring system. So in short you can handle complex tasks with a tool which is
easy to use.
Unlike Puppet or Chef, it doesn't use an agent on the remote host. Instead, Ansible uses SSH, which is assumed to be available on all the systems you want to manage. It is written in Python, which also needs to be installed on the remote host. This means that you don't have to set up a client-server environment before using Ansible; you can just run it from any of your machines, and from the client's point of view there is no knowledge of any Ansible server (you can run Puppet in standalone mode, but Puppet still needs to be installed). There are some other requirements, though: for example, if you want to do something related to git on a remote machine, a git package must first be installed there.
Ansible is available for free and runs on Linux, Mac or BSD. Aside from the free offering,
Ansible also has an enterprise product called Ansible Tower.
Architecture
Unlike most configuration-management software, Ansible does not require a single controlling machine where orchestration begins. Ansible works against multiple systems in your infrastructure by selecting portions of Ansible's inventory, stored as editable, versionable ASCII text files. Not only is this inventory configurable, but you can also use multiple inventory files at the same time and pull inventory from dynamic or cloud sources in different formats (YAML, INI, etc.). Any machine with the Ansible utilities installed can leverage a set of files and directories to orchestrate other nodes; the absence of a central-server requirement greatly simplifies disaster-recovery planning. Nodes are managed by this controlling machine, typically over SSH. The controlling machine learns the location of nodes through its inventory. Sensitive data can be stored in encrypted files using Ansible Vault.
In contrast with other popular configuration-management software such as Chef, Puppet, and CFEngine, Ansible uses an agentless architecture, with the Ansible software not normally running or even installed on the controlled node. Instead, Ansible orchestrates a node by installing and running modules on it temporarily via SSH. For the duration of an orchestration task, a process running the module communicates with the controlling machine over a JSON-based protocol via its standard input and output. When Ansible is not managing a node, it consumes no resources on the node, because no daemons are running and no agent software is installed.
Networking
Ansible Network modules extend the benefits of simple, powerful, agentless automation to
network administrators and teams. Ansible Network modules can configure your network stack,
test and validate existing network state, and discover and correct network configuration drift.
If you’re new to Ansible, or new to using Ansible for network management, start with Getting
Started with Ansible for Network Automation. If you are already familiar with network
automation with Ansible, see Advanced Topics with Ansible for Network Automation.
Tower
Ansible Tower gives you role-based access control, including control over the use of securely
stored credentials for SSH and other services. You can sync your Ansible Tower inventory with
a wide variety of cloud sources, and powerful multi-playbook workflows allow you to model
complex processes.
It logs all of your jobs, integrates well with LDAP, SAML, and other authentication sources, and
has an amazing browsable REST API. Command line tools are available for easy integration
with Jenkins as well.
DESIGN GOALS
The design goals of Ansible include:
MODULES
Modules are mostly standalone and can be written in a standard scripting language (such as
Python, Perl, Ruby, Bash, etc.). One of the guiding properties of modules is idempotency, which
means that even if an operation is repeated multiple times (e.g., upon recovery from an outage),
it will always place the system into the same state.
All modules return JSON format data. This means modules can be written in any programming
language. Modules should be idempotent, and should avoid making any changes if they detect
that the current state matches the desired final state. When used in an Ansible playbook, modules
can trigger ‘change events’ in the form of notifying ‘handlers’ to run additional tasks.
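The module contract described above (JSON output plus idempotent behavior) can be sketched in plain Python. The ensure_line behavior, paths, and field names below are illustrative stand-ins, not a real Ansible module:

```python
# Sketch of an Ansible-style module: inspect the current state, change it
# only if needed, and report JSON with a "changed" flag. Running it twice
# demonstrates idempotency: the second run makes no change.
import json
import os
import tempfile

def ensure_line(path, line):
    """Idempotently ensure `line` is present in the file at `path`."""
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    if line in existing.splitlines():
        # Desired state already holds: report no change, touch nothing.
        return json.dumps({"changed": False, "msg": "already present"})
    with open(path, "a") as f:
        f.write(line + "\n")
    return json.dumps({"changed": True, "msg": "line added"})

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "hosts")
    first = json.loads(ensure_line(target, "192.168.6.1"))
    second = json.loads(ensure_line(target, "192.168.6.1"))
    print(first["changed"], second["changed"])  # True False
```

A real module additionally reads its arguments from Ansible and can signal handlers through the same JSON result.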
INVENTORY CONFIGURATION
The Inventory is a description of the nodes that can be accessed by Ansible. By default, the
Inventory is described by a configuration file, in INI or YAML format, whose default location is
in /etc/ansible/hosts. The configuration file lists either the IP address or hostname of each node
that is accessible by Ansible. In addition, nodes can be assigned to groups.
The inventory file can list individual hosts or user-defined groups of hosts. This enables you to
define groups of devices running Junos OS with similar roles upon which to perform the same
operational and configuration tasks. For example, if you are managing one or more data centers,
you can create Ansible groups for those switches that require the same set of operations, such as
upgrading Junos OS and rebooting the device.
In order to manage devices running Junos OS using Ansible, you must have a Junos OS login
account with appropriate access privileges on each device where Ansible modules are executed.
You must ensure that usernames and passwords or access keys exist for each host in the file.
An example inventory:
192.168.6.1
[webservers]
foo.example.com
bar.example.com
This configuration file specifies three nodes: the first node is specified by an IP address and the
latter two nodes are specified by hostnames. Additionally, the latter two nodes are grouped under
the webservers group.
Ansible can also use a custom Dynamic Inventory script, which can dynamically pull data from a
different system.
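A dynamic inventory script simply prints JSON host groups when Ansible invokes it with --list. The sketch below reuses the host names from the static example above; a real script would query a cloud API or CMDB instead of returning a hard-coded structure:

```python
#!/usr/bin/env python3
# Sketch of a custom dynamic inventory script: Ansible calls it with
# "--list" and reads JSON groups from stdout.
import json
import sys

def build_inventory():
    """Return the group structure Ansible expects from --list."""
    return {
        "webservers": {"hosts": ["foo.example.com", "bar.example.com"]},
        "ungrouped": {"hosts": ["192.168.6.1"]},
        "_meta": {"hostvars": {}},   # per-host variables, empty here
    }

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(build_inventory()))
```

Made executable and passed with -i, such a script replaces the static hosts file while keeping the same group names.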
Playbooks
Playbooks are YAML files that express configurations, deployment, and orchestration in
Ansible, and allow Ansible to perform operations on managed nodes. Each Playbook maps a
group of hosts to a set of roles. Each role is represented by calls to Ansible tasks.
Playbooks are the files where Ansible code is written, in YAML format (YAML is a recursive acronym for "YAML Ain't Markup Language"). Playbooks are one of the core features of Ansible and tell Ansible what to execute; they are like a to-do list for Ansible that contains a list of tasks.
Playbooks contain the steps which the user wants to execute on a particular machine. Playbooks
are run sequentially. Playbooks are the building blocks for all the use cases of Ansible.
3.3 IMPLEMENTATION OF THE SYSTEM
Figure 3.3.1: Home page of Server-Side Industrial Automation System Web Portal
This is the site in which we have encapsulated the project's main idea. It is an HTML front end with Python CGI running in the backend; we have also added some CSS styling to make it a bit more attractive. The site invokes Ansible playbooks, when called, to execute the given task. For example, if we want to run a container, we just need to click the "run a docker" button, and since we have added the speech-recognition feature too, we can even speak "Run a docker" out loud to execute the command. It will then ask certain interactive questions, such as:
1. Do you want it on your local system or on a remote system?
2. If remote, provide the necessary IP address details.
3. Also, choose which Docker image you would like to use.
The backend then picks the Docker Ansible playbook and runs it on the local/remote host as per the given instructions.
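The glue between the web form and the playbooks can be sketched as a small CGI dispatcher. The task names and the individual playbook file names below are hypothetical (the report only states that playbooks live under /var/www/cgi), and the ansible-playbook call is only reached for a recognized task:

```python
#!/usr/bin/env python3
# Sketch of the backend glue: map a requested web-form task to a playbook
# path and shell out to ansible-playbook.
import os
import subprocess
from urllib.parse import parse_qs

# Hypothetical mapping from form tasks to playbooks under /var/www/cgi.
PLAYBOOKS = {
    "run_container": "/var/www/cgi/docker.yml",
    "configure_httpd": "/var/www/cgi/httpd.yml",
}

def playbook_command(query_string):
    """Translate ?task=... into an ansible-playbook command, or None."""
    task = parse_qs(query_string).get("task", [""])[0]
    playbook = PLAYBOOKS.get(task)
    if playbook is None:
        return None
    return ["ansible-playbook", playbook]

if __name__ == "__main__":
    cmd = playbook_command(os.environ.get("QUERY_STRING", ""))
    print("Content-Type: text/plain")
    print()                               # blank line ends the header section
    if cmd is None:
        print("Unknown task")
    else:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)              # stream the playbook output back
```

Keeping the mapping in one dictionary also prevents the form from invoking arbitrary commands on the server.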
Playbook to configure httpd service
- hosts: all
  tasks:
    - package:
        name: "httpd"
        state: present
    - copy:
        dest: "/var/www/html"
        src: "/var/www/html/index.html"
    - service:
        name: "httpd"
        state: started
        enabled: yes
Output:
Figure 3.3.2: Output of what runs in the backend when the httpd playbook is executed
This is the playbook that runs in the backend if "configuration of httpd service" is chosen from among the many offered tasks. This output is normally not shown; it can be seen only if the playbook is run manually, so it is included here just for understanding the backend part.
Note: All the YAML files (i.e. the playbooks) are stored under the /var/www/cgi folder on the given OS (i.e. RHEL).
Basic Ansible Command
An ad hoc command runs a single module against the inventory without writing a playbook; for example, showing the date of localhost:
[root@terminal]# ansible localhost -m command -a date
If we want, we can also use the IP of any node present in the Ansible inventory file instead of localhost.
- hosts: all
  tasks:
    - copy:
        src: /root/rhel7_5_rpm_extras
        dest: /root
    - copy:
        src: /root/rhel7_extra_new_rpm
        dest: /root
    - name: creating repos
      yum_repository:
        name: dvd
        description: dvd yum repo
        file: dvd
        baseurl: file:///run/media/root/RHEL-7.5\ Server.x86_64
        gpgcheck: no
    - name: rpm_extra
      yum_repository:
        name: rpm_extra
        file: rpm_extra
        description: rpm1
        baseurl: file:///root/rhel7_5_rpm_extras
        gpgcheck: no
    - name: rpm_extra_new
      yum_repository:
        name: rpm_new
        description: rpm2
        file: rpm_new
        baseurl: file:///root/rhel7_extra_new_rpm
        gpgcheck: no
Output:
Note: Green indicates that no change has been made to the system state for that task, while yellow indicates a change from the previous settings.
For example, [Gathering facts] makes no change, hence it appears in green, while the [copy] task appears in yellow: the copy was executed successfully and was done for the first time (i.e. no file of the same name had ever been copied to that destination before).
Crosschecking the result
- hosts: all
  tasks:
    - name: Check the hadoop software
      command: "rpm -q hadoop"
      register: hq
# dest: /root/Desktop/
# when: "'not installed' in hq.stdout"
# - get_url:
# dest: /root/jdk.rpm
# url: #Write the url here
# when: "'No package' in x.results[0] "
# ignore_errors: True
- hosts: name
  tasks:
    - name: "Making Directory"
      file:
        path: "/share"
        state: "directory"
    - name: "Copying the HDFS configuration file"
      copy:
        src: "/code2/automate/hdfs-site-name.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: "Copying the Core configuration file"
      copy:
        src: "/code2/automate/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: "Configuring the HDFS file"
      lineinfile:
        path: /etc/hadoop/hdfs-site.xml
        regexp: '^<value>dir</value>'
        line: "<value>/share</value>"
    - name: "Configuring the CORE file"
      lineinfile:
        path: /etc/hadoop/core-site.xml
        regexp: '^<value>ip</value>'
        line: "<value>hdfs://{{ groups['name'][0] }}:9001</value>"
    - name: "Firewall Rule"
      firewalld:
        port: "9001-9002/tcp"
        immediate: true
        permanent: true
        state: "enabled"
- hosts: data
  tasks:
    - name: "Making Directory"
      file:
        path: "/data"
        state: "directory"
    - name: "Copying the HDFS configuration file"
      copy:
        src: "/code2/automate/hdfs-site-data.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: "Copying the Core configuration file"
      copy:
        src: "/code2/automate/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: "Configuring the HDFS file"
      lineinfile:
        path: /etc/hadoop/hdfs-site.xml
        regexp: '^<value>dir</value>'
        line: "<value>/data</value>"
    - name: "Configuring the CORE file"
      lineinfile:
        path: /etc/hadoop/core-site.xml
        regexp: '^<value>ip</value>'
        line: "<value>hdfs://{{ groups['name'][0] }}:9001</value>"
    - name: "Starting the Firewall Services"
      service:
        name: "firewalld"
        state: "restarted"
4. RESULT AND CONCLUSION
As a result, we can say that the most basic problem solved by this project is this: if there are a huge number of systems that need to be configured, you don't need to go to each system manually and configure it. Instead, you can use this Server-Side Industrial Automation System, put the IPs of all the systems to be configured into the Ansible inventory file, and just run the playbook on the company's master system. This will automatically configure all the systems.
This is just one of the problems; the cloud-related problems can also be largely solved by the same approach, and the rental of computing units might become even more frequent than it is right now.
REFERENCES
APPENDIX I