Supraja DevOps Interview Preparation


My Introduction:

I have been working with Applaud for over 3.5 years as a Consultant. To give a brief overview of Applaud, it is a product-based company providing HR-related services to its clients. As a cloud consultant, my responsibilities include maintaining the complete infrastructure that supports our application.

We provisioned the infrastructure on AWS, which is our cloud provider, and we use the main services such as EC2, Elastic Load Balancing, Auto Scaling, Route 53, S3, CloudFront, CloudWatch, and ElastiCache.

We have clients all over the world, and hence our infrastructure is provisioned across multiple regions, set up for high fault tolerance and scalability to handle millions of requests.

We automated our infrastructure provisioning with Terraform, which is our primary IaC tool.

Coming to our application code, it is maintained entirely in GitHub, which is the primary VCS tool for our organization.

I also hold two certifications: AWS Certified Solutions Architect Associate and HashiCorp Certified Terraform Associate.

Apart from this, out of personal interest I also worked on a React project and built my own application using the live Swiggy API, which displays restaurant-related information using key concepts of React.

TeamCity:
TeamCity is used as our CI/CD tool, integrated with GitHub to run various builds and pipelines as part of daily development operations. Below are some of the builds that run in TeamCity.

To describe one such pipeline in TeamCity, we have integrated TeamCity with SonarQube:

SonarQube: (Unit tests)

SonarQube is a third-party tool we use for code coverage and static analysis: it analyzes the code to detect the number of bugs and duplicated code, and to check overall code quality. It is integrated with TeamCity, and whenever a build is triggered it analyzes the code based on the analysis script and pushes the report back to GitHub. It ensures that the code meets specific quality standards before it is integrated into the main branch. The metrics we track are listed below, followed by a sample scanner invocation.

→ Code Coverage Percentage (> 80%): Ensures a high proportion of your code is tested, reducing bugs.
→ Technical Debt Ratio (< 5%): Measures how much code needs refactoring for maintainability.
→ Number of Bugs (Ideally 0): Counts coding errors needing fixes for functional integrity.
→ Security Vulnerabilities (Minimal): Identifies potential security risks needing attention.
→ Code Smells Count (Minimal): Detects 'smelly' code that may need improvement for better readability.
→ Duplications Percentage (< 3%): Highlights repeated code blocks that should be simplified.
→ Security Hotspots Reviewed (100%): Ensures all potential security risks are examined.
→ Complexity Metrics (Cyclomatic Complexity < 10): Evaluates how complicated the code is, aiming for simplicity.
→ Coding Rules Compliance (Close to 100%): Shows adherence to set coding standards for quality.
→ Quality Gate Status (Passed): Indicates the overall health of the codebase, based on set criteria.
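
Since the exact build configuration isn't shown here, the following is only a minimal sketch of what the TeamCity command-line step that triggers the analysis could look like; the project key, report path, and the SONAR_HOST_URL / SONAR_TOKEN parameters are assumptions, not our actual values.

#!/bin/bash
# Hypothetical TeamCity build step: run unit tests with coverage, then send the analysis to SonarQube.
# SONAR_HOST_URL and SONAR_TOKEN are assumed to come from TeamCity build parameters.
set -e
npm ci
npm test -- --coverage
sonar-scanner \
  -Dsonar.projectKey=applaud-app \
  -Dsonar.host.url="$SONAR_HOST_URL" \
  -Dsonar.token="$SONAR_TOKEN" \
  -Dsonar.javascript.lcov.reportPaths=coverage/lcov.info
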
Ghost Inspector: (E2E testing)

As part of end-to-end testing, we have another build in TeamCity that runs the end-to-end tests, for which we use the Ghost Inspector tool. All the test cases are created in Ghost Inspector. As part of this pipeline, a new dev-like environment is set up, all the tests are run against it, and a report is sent back to the developer.

Nightly builds:

By default, the merge button is disabled in GitHub; once all the tests succeed, merging is enabled and the code is merged to the development branch.

To deploy changes to staging, we have a nightly build in TeamCity. It pulls all the latest changes that were recently merged into the development branch, deploys them to the staging environment, and restarts all the services. This way we ensure the latest changes are deployed to the staging environment every day.

Deployment steps:

1) Take a backup of MongoDB
2) Start the standalone server corresponding to the environment and region
3) SSH into the instance, fetch the latest code, and check out the latest release branch
4) Stop the standalone server and create the AMI
5) Update the launch template of the ASG to the latest version and rotate the servers
   (i.e., add new servers with the latest code and, once all the servers are healthy,
   delete the older instances); a rough CLI sketch follows this list
6) Lock the branch to prevent any new commits
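
The following is only a rough sketch of how steps 4 and 5 could be done with the AWS CLI; the instance ID, launch template ID, and ASG name are placeholders, and an instance refresh is used here as one way to rotate the servers.

#!/bin/bash
# Hypothetical sketch: build the release AMI, update the launch template, rotate the ASG
set -e

# Create an AMI from the (stopped) standalone server
AMI_ID=$(aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "release-R-13" \
  --query 'ImageId' --output text)

# Create a new launch template version based on the current latest version, pointing at the new AMI
LATEST=$(aws ec2 describe-launch-templates \
  --launch-template-ids lt-0123456789abcdef0 \
  --query 'LaunchTemplates[0].LatestVersionNumber' --output text)
NEW_VERSION=$(aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version "$LATEST" \
  --launch-template-data "{\"ImageId\":\"$AMI_ID\"}" \
  --query 'LaunchTemplateVersion.VersionNumber' --output text)
aws ec2 modify-launch-template \
  --launch-template-id lt-0123456789abcdef0 \
  --default-version "$NEW_VERSION"

# Rotate the ASG instances onto the new launch template version
aws autoscaling start-instance-refresh --auto-scaling-group-name app-asg
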

About env.json

We have different environments like stage, tryapplaud, and production, and we have a separate env.json file for each environment, which our code base uses to connect to AWS and MongoDB Atlas and to read other global, environment-specific variables.
During deployment, an init script launched from the launch template creates env.json after instance creation and then starts all the services.
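
As a rough illustration only (the real init script and keys are not shown here), the init script could render env.json along these lines before starting the services; every key and service name below is hypothetical.

#!/bin/bash
# Hypothetical init script run from the launch template's user data
set -e
ENVIRONMENT="stage"     # hypothetical: could come from an instance tag or user data
REGION="us-east-1"

cat > /opt/app/env.json <<EOF
{
  "environment": "${ENVIRONMENT}",
  "awsRegion": "${REGION}",
  "mongoUri": "REPLACED_AT_BOOT"
}
EOF

# Start the application services once env.json is in place
systemctl restart nginx
systemctl restart app.service    # hypothetical service name
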
Explain more about TeamCity and how efficiently you are handling the builds?

It's a powerful CI/CD server which automates the building, testing, and deploying of software projects.
We have multiple builds, e.g., one to run E2E tests, one to run unit test cases, and other builds.
Apart from the default agent, we created another agent pool for which we provide a launch template with details like the AMI and number of instances, and an idle threshold of 30 minutes, i.e., if a build agent stays idle for 30 minutes it gets terminated, and when a new build is triggered another instance gets created. When developers raise multiple PRs and builds are triggered, if all the agents in that pool are busy the build waits in the queue until an agent finishes its current execution; if an earlier build finishes quickly, the next request uses that already-running agent instead of going into the queue.

What’s the advantage of maintaining multiple Agent Pools and multiple build agents?
An agent pool with multiple build agents can process multiple requests in parallel instead of sending them to the queue, thus saving time.

A NAT gateway is placed in the public subnet so that servers in the private subnet stay shielded from the internet; this way, instances in the private subnet can download packages from the internet without being exposed to the outside world.

AWS Infrastructure:

Nginx:
It's a high-performance web server, reverse proxy server, and load balancer. It's mainly used for serving static content, handling requests from web clients, and distributing the load across multiple servers.

How Nginx is different from NLBs:


Even though they both distribute traffic, they serve different purposes.

NLB: It operates at layer 4 (Transport layer) and distributes TCP/UDP traffic. It can be a hardware device or a cloud-based service that distributes network traffic at the transport layer.

Nginx: It operates at layer 7 (Application layer) and routes HTTP, HTTPS, and other protocols based on the content (URL, headers). It runs as software on a server and performs various tasks such as caching, reverse proxying, SSL termination, and load balancing.
What is Nginx’s role in enabling Load Balancing, SSL Termination & Reverse Proxying:
1. Load balancing: Nginx can distribute incoming client requests across multiple backend servers, preventing any single server from becoming overloaded. The algorithms used are Round Robin, Least Connections, IP Hash, and Weighted load balancing.
2. SSL/TLS Termination: Nginx can handle SSL/TLS encryption and decryption, freeing up backend servers from this CPU-intensive task. SSL termination also enables Nginx to perform tasks such as HTTPS redirection.
3. Reverse Proxy: Nginx acts as an intermediary for requests from clients seeking resources from backend servers. This allows Nginx to:
   ● Hide the identity of backend servers
   ● Distribute traffic efficiently to multiple servers
   ● Enhance security by preventing direct access to backend servers

What are the benefits of this current architecture?


Nginx in Public Subnet (behind NLB):

● The NLB distributes traffic to Nginx instances, which perform Layer 7 load balancing, SSL
termination, and reverse proxying.
● Nginx handles complex routing and security tasks before passing requests to the
application servers.

Application Servers in Private Subnet (behind ALB):

● Nginx forwards requests to an ALB, which then directs traffic to the appropriate
application servers.
● This setup ensures that the application servers are not exposed directly to the internet,
enhancing security by restricting public access.
● The ALB in the private subnet ensures that traffic is managed internally, with no direct
exposure to the internet.

How does Nginx act as a reverse proxy and route request and which algorithms does it
follow?

Nginx can distribute client requests to backend servers based on various load-balancing
algorithms. These algorithms determine how requests are distributed among the available
backend servers or load balancers:

● Round Robin: Distributes requests sequentially across all available servers. This is the
default method in Nginx.
● Least Connections: Sends requests to the server with the fewest active connections.
This is useful when backend servers have varying workloads.
● IP Hash: Routes requests from a particular client IP address to the same backend server.
This ensures session persistence (sticky sessions) without requiring additional session
management.
● Least Time: Chooses the server with the least average response time and the fewest
active connections. This requires the Nginx Plus module.
● Random with Two Choices: Randomly selects two servers and then picks the one with
fewer active connections.

The algorithm we use most to handle a large number of parallel requests is Least Connections. Here is the schema of the Nginx config file:
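
The actual file isn't reproduced here; the following is a minimal sketch of what such a config typically looks like, with hypothetical upstream addresses and domain name.

upstream backend_servers {
    least_conn;                       # pick the server with the fewest active connections
    server 10.0.1.10:8080;            # hypothetical backend addresses
    server 10.0.1.11:8080;
}

server {
    listen 80;
    server_name example.applaudcloud.com;   # hypothetical domain

    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
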

Why use the Least Connections algorithm?

Dynamic Load Distribution: It dynamically distributes requests to the server with the fewest
active connections at the time the request is received. This ensures that no single server
becomes a bottleneck, especially when the servers have varying capacities or workloads.
Better Resource Utilization: In environments where backend servers have different processing
capabilities or where requests have different processing times, Least Connections ensures that
more powerful servers or those with lighter loads receive more requests, optimizing overall
resource utilization.
Handles Traffic Spikes Efficiently: During periods of high traffic, Least Connections is effective in
balancing the load across servers, reducing the risk of overloading any single server. This is
crucial when handling a large number of parallel requests.
Resiliency to Uneven Traffic: If some requests take longer to process than others (e.g., due to
heavy computation or I/O operations), Least Connections prevents servers with longer-running
requests from being overwhelmed by distributing new requests to servers that are less busy.

Redis cluster:

It’s an in-memory store that enhances performance by caching data, reducing latency, and
handling high throughput. It scales easily by distributing data across multiple nodes, balancing
the load, and ensuring fault tolerance through automatic failover and data replication.

Multi-Region Redis Cluster benefits


Deploying Redis across multiple regions increases availability and reliability, reduces latency for
global users, and supports disaster recovery by ensuring access to cached data even during
regional outages. It also improves data locality, reducing cross-region data transfer delays.

How Redis improves application performance


Redis excels in handling read-heavy workloads, managing user sessions, and implementing rate
limiting and queuing, leading to faster responses and more efficient application performance.
This makes applications more scalable and resilient globally.
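
As a small illustration of the cache-aside pattern this enables (not our actual code), using redis-cli with a hypothetical key and a hypothetical backend lookup script:

#!/bin/bash
# Hypothetical cache-aside lookup: serve from Redis if present, otherwise fetch and cache for 5 minutes
KEY="restaurant:123"                      # hypothetical cache key
VALUE=$(redis-cli GET "$KEY")
if [ -z "$VALUE" ]; then
  VALUE=$(./query_backend.sh "$KEY")      # hypothetical script that queries the database
  redis-cli SETEX "$KEY" 300 "$VALUE"     # cache with a 300-second TTL
fi
echo "$VALUE"
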

Branching strategies:

Currently we have 3 environments - Stage, UAT, Prod.

As mentioned previously, a nightly build runs every night and TeamCity makes sure the latest code is deployed to the stage server, so the development branch stays up to date. We maintain a one-month sprint period for deployments to the UAT & Prod environments.
After the code freeze is done, let's assume the branch the UAT environment is currently running on is R-12. We create a new branch from R-12, name it R-13, cherry-pick all the merge commits made on the dev branch as part of Release 13 onto the R-13 branch, and deploy this branch to UAT. This way the latest code is available in the UAT environment.
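
A hedged sketch of those git steps (branch names as in the example above; the merge-commit hashes are placeholders):

# Create the R-13 release branch from R-12
git checkout R-12
git pull origin R-12
git checkout -b R-13

# Cherry-pick the Release-13 merge commits from the development branch
# (-m 1 keeps the mainline parent when picking a merge commit; hashes are placeholders)
git cherry-pick -m 1 abc1234
git cherry-pick -m 1 def5678

git push origin R-13
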
Blue green deployment:

It’s a strategy used in software development and deployment to minimize downtime and
reduce risks when releasing new versions of an application. It involves maintaining 2 identical
Prod environments: “blue” and “green”

How Blue-Green deployment works:


1. Two environments:
● Blue environment: This is the live environment currently serving all the production traffic
● Green environment: This is an identical environment where the new version of the application is deployed

2. Deployment Process:
Step 1: The new version of the application is deployed to the green environment, while
the blue environment continues to serve production traffic.
Step 2: After deployment, testing is performed on the green environment to ensure that
everything is functioning correctly.
Step 3: Once the green environment is verified, the traffic is switched from the blue
environment to the green environment, making the green environment live.
Step 4: The blue environment remains intact as a backup. If any issues are found in the
green environment after the switch, the traffic can be quickly switched back to the blue
environment.

3. Benefits:
Zero Downtime: Because the traffic can be switched instantly, users experience no
downtime during the deployment process.
Easy Rollback: If something goes wrong with the new deployment, it’s easy to revert to
the previous version by switching traffic back to the blue environment.
Improved Testing: Since the new version is deployed to a separate environment, it
allows thorough testing before it goes live.

Example: We have an ASG that uses launch templates; currently the latest version of the LT is 70, it runs the R-12 code with 2 servers, and this is the blue environment. Now you need to release a new version with additional features. From the base server you create a new AMI which has the R-13 code, update the LT with the latest AMI, set the default version to 71, and add 2 new servers; once these servers are healthy, the old servers are removed and this becomes the green environment. If we find any issues, we can set the LT version back to 70 and add 2 servers, and it runs as usual with the older code.
Example: We have a standalone base server with all the dependencies and software required for our application installed. Whenever there is a new release to be deployed, we check out that branch on the base server and create a new AMI from it.
This way we have a different AMI available for each release version. For a new release, we just update the corresponding AMI in the launch template, create a new launch template version, and rotate the servers (add 2 servers and delete the earlier 2 servers).
In case of any issues with the deployment, we just flip the launch template back to its earlier version (which has the previous release AMI) and rotate the servers again.

Amazon ECS and ECR:

● What it is: ECS is a fully managed container orchestration service by AWS, used to run
and manage containers (like Docker containers) on a cluster of EC2 instances or with
AWS Fargate.
● Purpose: It helps you run, stop, and manage Docker containers in a cluster, handling
container scheduling, scaling, and deployment.
● Key Features:
○ Task Definitions: Defines which Docker containers to run.
○ Clusters: A logical grouping of EC2 instances (or Fargate tasks) where the
containers run.
○ Service: Ensures the desired number of tasks (instances of a task definition) are running.
○ Scaling: Automatically scales the number of tasks or services based on demand.
● Use Case: Ideal for running microservices or distributed applications in a containerized
environment.

● What it is: ECR is a fully managed Docker container registry that allows developers to
store, manage, and deploy Docker container images in a secure and scalable way.
● Purpose: ECR stores Docker images, which can be pulled by ECS (or other container
services) to deploy containers.
● Key Features:
○ Private Repositories: Secure storage for Docker images.
○ Image Versioning: Tracks versions of container images.
○ Integration with ECS: Seamlessly integrates with ECS to pull container images for
deployment.
○ IAM Security: Uses AWS Identity and Access Management (IAM) to control
access to your images.
● Use Case: Useful for storing container images that will be used by ECS, Kubernetes, or
other container platforms.
ECS runs your containers, and ECR stores the container images required by ECS.

For example, you would build a Docker image for your application, push it to ECR, and then
configure ECS to pull that image from ECR to run it as a service or task.
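
As a rough sketch of that flow with the Docker and AWS CLIs (account ID, region, repository, cluster, and service names are placeholders):

#!/bin/bash
# Hypothetical: build an image, push it to ECR, then have ECS pull the updated image
set -e
ACCOUNT_ID=123456789012
REGION=us-east-1
REPO=my-app

aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"

docker build -t "$REPO:latest" .
docker tag "$REPO:latest" "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"

# Trigger a new deployment so ECS pulls the updated image
aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment
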

Route53:

It’s a scalable and highly available DNS web service. It translates domain names like www.example.com into the IP addresses that computers use to connect to each other. It can route traffic to different AWS resources such as S3 buckets and EC2 instances.

What is a Hosted Zone?


It’s a container that holds information about how to route traffic for a specific domain (e.g., example.com). It’s the DNS database for a domain, storing the DNS records that define how requests for your domain are handled.

Types of Records in a Hosted Zone

A DNS record in a hosted zone tells Route 53 how you want to route traffic for a domain or
subdomain. The most common types of records are:

1. A Record (Address Record):


○ Maps a domain or subdomain to an IPv4 address.
○ Example: example.com -> 192.0.2.1
○ Use Case: Use an A record when you need to point a domain directly to a specific
IP address, like a web server.
2. AAAA Record:
○ Maps a domain or subdomain to an IPv6 address.
○ Example: example.com -> 2001:0db8:85a3::8a2e:0370:7334
3. CNAME Record (Canonical Name Record):
○ Maps a domain or subdomain to another domain name.
○ Example: www.example.com -> example.com
○ Use Case: Use a CNAME record when you want to alias one domain name to
another, such as pointing www.example.com to example.com.
4. NS Record (Name Server Record):
○ Specifies the authoritative DNS servers for the hosted zone.
○ Example: example.com -> ns-123.awsdns-45.org
○ Automatically Created: NS records are automatically created when a hosted zone
is set up and point to Route 53’s name servers.
5. SOA Record (Start of Authority Record):
○ Provides information about the domain, such as the primary name server, the
email of the domain administrator, and various timers relating to refreshing the
zone.
○ Automatically Created: SOA records are also automatically created when a
hosted zone is set up.

Difference between Nginx, Load balancer, API Gateway:

Nginx
● Purpose: Acts as an intermediary that forwards client requests to backend servers and then returns server responses to clients
● Key functions: Load balancing, caching, SSL termination, reverse proxy, cross-site scripting protection
● Use cases: Used for providing an additional layer of abstraction and managing various server-side functions

Load balancer
● Purpose: Distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed, improving availability
● Key functions: Traffic distribution, health checks, session persistence
● Use cases: Best for distributing traffic to ensure that no single server is overloaded & enhance fault tolerance

API Gateway
● Purpose: Serves as a single endpoint for managing, routing, and securing API requests
● Key functions: Routing, service discovery, authentication & authorisation, rate limiting & throttling
● Use cases: Ideal for microservices architecture where you need fine-grained control over API interactions & security

Difference between Amazon GuardDuty, Amazon Inspector, Amazon Macie

Amazon GuardDuty
● Purpose: Focuses on threat detection across the AWS environment for any malicious activity and unauthorized behavior
● Key features: Threat detection, continuous monitoring, integrated intelligence
● Use cases: Detecting compromised EC2 instances, unusual activity in AWS accounts, or unauthorized access to S3 buckets; enhancing security visibility across your AWS environment

Amazon Inspector
● Purpose: Conducts automated security assessments for AWS resources, identifying vulnerabilities and ensuring compliance with security standards
● Key features: Automated security assessments, security standards, assessment reports
● Use cases: Regular security assessment of EC2 instances and containers to identify vulnerabilities; ensuring compliance with industry standards and organizational security policies

Amazon Macie
● Purpose: Specializes in discovering and securing sensitive data, particularly in S3, to help manage data privacy and compliance
● Key features: Sensitive data discovery, data privacy, continuous monitoring
● Use cases: Protecting sensitive data in Amazon S3 by identifying and securing PII; maintaining data privacy and compliance with regulations such as GDPR and HIPAA

Difference between AWS CloudTrail and AWS Config:

AWS CloudTrail
● Purpose: Records API calls made on your AWS account. It provides visibility into user activity, such as who made a request, what actions were taken, and when and from where they were made. It focuses on logging & auditing.
● Key features: Event logging, audit trails, security monitoring
● Compliance: CloudTrail helps meet compliance standards (e.g., GDPR, HIPAA, PCI DSS) by providing a reliable source of audit logs.
● Data integrity: CloudTrail logs can be stored in an S3 bucket with encryption, ensuring data integrity and compliance with data protection regulations.

AWS Config
● Purpose: Tracks and records the configuration of AWS resources over time. It enables you to assess, audit, and evaluate the configuration of AWS resources to ensure they comply with desired configurations and standards.
● Key features: Configuration history, resource relationships, compliance monitoring

Automating Security and Compliance monitoring with CloudTrail:

1. Enable CloudTrail Logging: Go to the AWS CloudTrail console and create a new trail. Store the
logs in an S3 bucket with appropriate permissions and encryption.
2. Set Up AWS Config for Compliance Monitoring: Enable AWS Config to track the configuration
changes of your AWS resources and define compliance rules in AWS Config to check whether
your resources are configured according to best practices (e.g., S3 buckets should be
encrypted).
3. Integrate with Amazon CloudWatch for Real-Time Monitoring: Use Amazon CloudWatch
Logs to monitor CloudTrail logs. Set up filters for specific events (e.g., unauthorized API calls,
root account usage). Create CloudWatch Alarms based on these filters. For example, trigger an
alarm if there is an IAM:DeleteUser action.
Ex: We’ll set up a metric filter on “IAM:DeleteUser”; if this pattern is found in the log group, a custom metric is incremented, and if the alarm condition is met we know an anomaly has been detected and an alarm is triggered (see the sketch after this list).
4. Automate Responses with AWS Lambda: Configure AWS Lambda functions to respond
automatically to CloudWatch Alarms. For example, a Lambda function can automatically revoke
permissions if a security violation is detected.
5. Set Up Alerts with Amazon SNS: Integrate CloudWatch Alarms with Amazon SNS to send
immediate alerts (via email, SMS, or other messaging services) when a security or compliance
violation is detected.
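
A rough sketch of the metric filter and alarm from step 3 using the AWS CLI; the log group, metric, alarm names, and SNS topic ARN are placeholders.

#!/bin/bash
# Hypothetical: metric filter for IAM DeleteUser events plus an alarm on that metric
aws logs put-metric-filter \
  --log-group-name CloudTrail/DefaultLogGroup \
  --filter-name iam-delete-user \
  --filter-pattern '{ ($.eventSource = "iam.amazonaws.com") && ($.eventName = "DeleteUser") }' \
  --metric-transformations metricName=IAMDeleteUserCount,metricNamespace=Security,metricValue=1

aws cloudwatch put-metric-alarm \
  --alarm-name iam-delete-user-alarm \
  --metric-name IAMDeleteUserCount \
  --namespace Security \
  --statistic Sum \
  --period 300 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts
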

Web Application Firewall:

It protects your web application or APIs from common web exploits and bots that can affect
availability, compromise security, or consume excessive resources.
Key Features of AWS WAF:

1. Protection Against Common Web Attacks:


○ SQL Injection and Cross-Site Scripting (XSS): AWS WAF helps protect against
these types of attacks by inspecting incoming requests and blocking malicious
traffic.
○ HTTP Floods: It can detect and mitigate HTTP floods and DDoS attacks,
protecting your application from being overwhelmed by traffic.
2. Customizable Rules:
○ You can create custom rules that allow or block requests based on conditions
such as IP addresses, HTTP headers, URI strings, and body content.
○ AWS WAF supports rate-based rules to automatically block IP addresses that
make too many requests in a given time period.
3. Managed Rule Groups:
○ AWS offers managed rule groups that provide pre-configured rules to protect
against known threats. These are maintained and updated by AWS or third-party
vendors to ensure your application is protected from the latest threats.
4. Integration with AWS Services:
○ Amazon CloudFront: AWS WAF can be deployed with Amazon CloudFront
(AWS's CDN service) to protect applications served via the CDN.
○ Application Load Balancer (ALB): AWS WAF can also be used with an Application
Load Balancer to protect web applications hosted on Amazon EC2, ECS, or
Lambda.
○ API Gateway: Protect your APIs hosted on Amazon API Gateway with AWS WAF.
5. Real-Time Visibility and Monitoring:
○ AWS WAF provides real-time metrics and logging to Amazon CloudWatch,
allowing you to monitor your web traffic and see which rules are triggered. You
can also set up alarms based on this data.
○ WAF logs can be sent to Amazon S3 or an analytics service like AWS Kinesis for
further analysis.
6. Bot Control:
○ AWS WAF includes features for managing bot traffic. You can allow good bots
(like search engine crawlers) while blocking bad bots (like scrapers or automated
attacks).
7. Security Automation:
○ You can automate WAF rule creation and updates using AWS services such as
AWS Lambda, Amazon CloudWatch, and AWS Config. This helps in quickly
responding to emerging threats.

How to integrate WAF & Cloudfront and define CORS policies:

CloudFront is a Content Delivery Network (CDN) that delivers your content globally with low
latency and high transfer speed. When integrated with AWS WAF, CloudFront acts as the entry
point for incoming requests, applying both security rules(WAF) and performance optimizations.

How to setup AWS WAF with CloudFront and CORS:


1. Create a WAF Web ACL
2. Integrate WAF with CloudFront
3. Define CORS policies in the S3 bucket Permissions tab (see the sketch after this list)
4. Update CloudFront behavior
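
Step 3 can be done from the bucket's Permissions tab or via the CLI; a hedged sketch (bucket name and allowed origin are placeholders):

#!/bin/bash
# Hypothetical: apply a CORS policy to the S3 bucket serving the content
aws s3api put-bucket-cors --bucket my-content-bucket --cors-configuration '{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://app.example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}'
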

So, when a user requests content stored in the S3 bucket, the request is routed to the Nginx server, which forwards it to the ALB; the ALB determines the appropriate backend service, which could be S3. CloudFront comes into play when the ALB forwards the request to the S3 bucket, assuming CloudFront is configured as the CDN for that bucket. Once CloudFront validates and processes the request, it forwards it to the S3 bucket. If the request passes all validations, CloudFront retrieves the requested content from S3 and caches it if needed. CloudFront then returns the response to the ALB, which passes it back through Nginx to the user.

VPC Flow logs:

It enables you to capture information about the IP traffic going to and from network interfaces
within your Virtual Private Cloud (VPC). These logs capture details such as the source and
destination IP addresses, protocols, ports, and other relevant information about the traffic
flowing through your VPC.

Key Features of VPC Flow Logs:

● Traffic Monitoring: Captures details about both accepted and rejected traffic based on
your security group and network ACL (Access Control List) rules.
● Granularity: Can be created at different levels of granularity—VPC, subnet, or network
interface level.
● Log Storage: The logs are stored in Amazon CloudWatch Logs or sent to an S3 bucket for
analysis and long-term storage.
● Filtering: You can filter the logs based on specific parameters (like specific IP addresses
or ports) to reduce the amount of data captured.
● No Impact on Network Performance: Enabling VPC Flow Logs does not affect the
performance of your network or the latency of your applications.
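
For reference, a minimal sketch of enabling flow logs with the AWS CLI (VPC ID and bucket ARN are placeholders):

#!/bin/bash
# Hypothetical: enable VPC Flow Logs for one VPC, delivering to an S3 bucket
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::my-flow-logs-bucket
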

AWS Security Hub:

It’s a security management service that provides a centralized view of your AWS env’s security
posture. It aggregates, organizes and prioritizes security findings from multiple AWS services
and 3rd party security products. By providing automated security checks, Security Hub helps
you identify and remediate potential security issues across your AWS environment.

How AWS Security Hub Protects Your Infrastructure:

1. Aggregating Security Findings:


○ Security Hub collects security findings from various AWS services like Amazon
GuardDuty (threat detection), Amazon Inspector (vulnerability assessments), and
Amazon Macie (sensitive data discovery). It also integrates with third-party
security tools.
○ This centralized aggregation helps you get a holistic view of potential threats and
vulnerabilities across your entire AWS infrastructure.
2. Continuous Security Monitoring:
○ Security Hub continuously monitors your environment against security best
practices and compliance standards such as CIS AWS Foundations Benchmark
and AWS Foundational Security Best Practices.
○ It provides automated security checks that alert you to any deviations from
these best practices, allowing you to address them promptly.
3. Prioritization of Security Issues:
○ Findings are prioritized based on severity, so you can focus on the most critical
issues first. This ensures that the most pressing security risks are addressed in a
timely manner.
4. Automated Remediation:
○ You can automate the response to specific security findings using AWS Lambda
or other automation tools. For instance, if Security Hub identifies a
misconfigured S3 bucket, you can automatically trigger a Lambda function to
correct the configuration. This reduces the time to respond to and mitigate
security threats.
5. Security Standards and Compliance:
○ Security Hub provides built-in security standards and compliance checks. You can
monitor your environment's compliance with these standards and receive alerts
if any resources are non-compliant.
6. Integration with Other AWS Services:
○ Security Hub integrates with services like AWS Config, AWS CloudTrail, and
Amazon CloudWatch, allowing you to create custom security workflows and
alerts.

Steps to Protect Infrastructure Using AWS Security Hub:

1. Enable Security Hub:


○ Go to the AWS Management Console and enable Security Hub for your AWS
account. You can also enable it across multiple accounts using AWS
Organizations.
2. Integrate with Other Security Services:
○ Integrate Security Hub with services like Amazon GuardDuty, Amazon Inspector,
and Amazon Macie, as well as third-party tools, to centralize security findings.
3. Review and Act on Security Findings:
○ Regularly review the findings in the Security Hub dashboard. Prioritize and
address critical issues, and implement recommended security best practices.
4. Automate Responses:
○ Use AWS Lambda or other automation tools to create automated workflows for
responding to security findings. For example, automatically quarantine
compromised instances or fix misconfigurations.
5. Monitor Compliance:
○ Use the compliance dashboards in Security Hub to monitor adherence to security
standards. Remediate any non-compliant resources to maintain a secure and
compliant environment.
6. Continuous Improvement:
○ Continuously refine and improve your security practices by analyzing the trends
and patterns in the findings reported by Security Hub. Use this data to enhance
your security posture over time.

Amazon EventBridge:

It’s a serverless event bus service that enables you to build event-driven architectures by connecting different AWS services, custom applications, and third-party SaaS applications using events; we can also configure scheduled triggers. It extends the capabilities of CloudWatch Events by offering better integration options.
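
For example, a hedged sketch of a scheduled trigger (rule name, schedule, and target ARN are placeholders):

#!/bin/bash
# Hypothetical: run a Lambda function every day at 2 AM UTC via an EventBridge schedule
aws events put-rule \
  --name nightly-cleanup \
  --schedule-expression "cron(0 2 * * ? *)"

aws events put-targets \
  --rule nightly-cleanup \
  --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:cleanup'
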

Lambda function:

What is event and context in lambda function?


event: It contains info about the event that triggered the Lambda function. If the Lambda function is triggered by an S3 bucket event, the event contains details about the S3 object that was created.
context: It provides info about the runtime environment of the Lambda function. It includes details such as the function name, memory limit, and the time remaining before the function times out.

MongoDB:
MongoDB is a popular NoSQL database designed for scalability, flexibility and performance.
Unlike traditional relational databases that use tables and rows, it stores data in flexible JSON-
like documents.
Documents are grouped into collections. A collection in MongoDB is similar to a table in an RDBMS. Documents within a collection can have various shapes and sizes. MongoDB supports indexing on any field within the documents, which improves query performance.

Why only MongoDB and why not other databases?

Since we use Redis as a caching layer, which pairs naturally with NoSQL databases, and MongoDB is also the preferred database for our dev team, we chose MongoDB.
Why MongoDB is preferred over DocumentDB(even though it supports MongoDB) in AWS

1. Feature completeness: MongoDB Atlas provides full compatibility with the latest MongoDB features, including advanced query operators and transactions, whereas DocumentDB offers only partial compatibility and lacks support for some MongoDB features.
2. Deployment flexibility: MongoDB Atlas allows multi-cloud and multi-region deployments
across AWS, GCP, Azure offering more flexibility with automatic failover whereas
Amazon DocumentDB is limited to AWS and lacks these multi-cloud features.
3. Developer experience and ecosystem: MongoDB Atlas offers a rich developer
experience with support for advanced querying whereas DocumentDB has more limited
querying capabilities.

What is VPC Peering and How peering is setup between AWS application servers and
MongoDB

1. Identify the VPCs to Peer

● AWS VPC: Identify the VPC where your application servers are hosted.
● MongoDB VPC: Identify the VPC where your MongoDB instances are hosted. If using
MongoDB Atlas, Atlas creates a VPC in your chosen cloud provider (AWS in this case).

2. Create a VPC Peering Request in MongoDB

○ Log in to your MongoDB Atlas account.


○ Navigate to the Network Access section.
○ Go to the Peering tab and click Create a Peering Connection.
○ Choose AWS as your cloud provider.
○ Enter the VPC ID, AWS Account ID, and Region of your AWS VPC where your
application servers are hosted.
○ MongoDB Atlas will initiate the peering request

3. Accept the Peering Request in AWS

● Log in to your AWS Management Console.


● Navigate to VPC > Peering Connections.
● You should see a pending peering request. Select the request and click Accept.
● Once accepted, the status of the peering connection will change to Active.

4. Update Route Tables


● To enable traffic between the two VPCs, you need to update the route tables.
● Application VPC Route Table:
○ Go to Route Tables in the VPC dashboard.
○ Select the route table associated with your application servers.
○ Add a new route that points to the MongoDB VPC’s CIDR block via the VPC
peering connection.
● MongoDB VPC Route Table:
○ If necessary, update the route table associated with the MongoDB instances to
allow traffic back to your application VPC’s CIDR block via the VPC peering
connection.

5. Configure Security Groups

● Update the security groups in both VPCs to allow traffic between the application servers
and MongoDB.
● Application Servers:
○ Add inbound rules to allow traffic from the MongoDB VPC’s CIDR block on the
necessary ports (e.g., port 27017 for MongoDB).
● MongoDB Instances:
○ Similarly, add inbound rules to allow traffic from the application VPC’s CIDR
block.

6. Test the Connection

● Verify that your application servers can connect to the MongoDB instances.
● You can do this by trying to connect to MongoDB from an application server or by using
a tool like mongo shell or mongosh from the command line.
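
For step 6, a quick connectivity check could look like this (connection string, credentials, and IP are placeholders; over VPC peering you would use the cluster's private endpoint):

# Hypothetical connectivity test from an application server
mongosh "mongodb+srv://cluster0.abcde.mongodb.net/admin" --username appuser --eval "db.runCommand({ ping: 1 })"

# Or simply check that the MongoDB port is reachable
nc -zv 10.20.0.15 27017
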

Terraform:
How do you perform terraform migration?

Let's assume the entire infra was created manually and from now on it should be managed through Terraform. Below are the steps to migrate it to Terraform:
1) Create a main.tf file:

provider "aws" {
  region = "us-east-1"
}

import {
  to = aws_instance.example
  id = "instance_id"
}

2) $ terraform init
3) $ terraform plan -generate-config-out="generated_resources.tf"
4) A generated_resources.tf file will be generated which contains the infra resources that were created manually
5) Copy the entire file and replace the contents of main.tf with it, and also remove the import block
6) $ terraform plan - it still shows there is 1 resource that needs to be created even though we imported it from the AWS console. This is because the state file isn't created yet
7) To create the state file: $ terraform import aws_instance.example <instance_id>
8) $ terraform plan & $ terraform apply will now work fine, and from now on all resources can be managed through Terraform

Drift detection

There can be scenarios where a person leaving the org messes things up, and we need a way to handle those situations.
Suppose our entire infra is managed through Terraform and someone has manually changed some configuration directly in the AWS console, causing errors in subsequent operations. To handle these situations we can either run terraform refresh on a cron schedule, which updates the state file (not a recommended approach), or create a Lambda function that monitors for such changes and immediately sends alerts whenever manual changes are made.
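
A hedged sketch of the cron-based option; it uses terraform plan -detailed-exitcode (exit code 2 means the configuration/state and the real infrastructure differ), which isn't mentioned above but is one common way to detect drift, and the SNS topic is a placeholder.

#!/bin/bash
# Hypothetical nightly drift check (e.g., scheduled via cron)
cd /opt/terraform/prod || exit 1
terraform plan -detailed-exitcode -input=false > /tmp/tf-plan.log 2>&1
status=$?
if [ "$status" -eq 2 ]; then
  # Exit code 2 = changes detected between config/state and real infrastructure
  aws sns publish --topic-arn arn:aws:sns:us-east-1:123456789012:infra-alerts \
    --message "Terraform drift detected in prod; see plan output on the runner."
fi
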

Drawbacks of the Terraform state file

State files can store sensitive info, and if the code is uploaded to GitHub, anyone with access can view the file. Even if we restrict access, we have another problem: if a person wants to edit the Terraform project, they download the code from GitHub, make the edit, and apply the changes, and the state file then gets updated with the new modifications. So every time, the person has to push the state file to GitHub along with the code changes, because the state file is only updated after "terraform apply". The problem is: what if the person checks out the code from GitHub, updates the logic, does not run "terraform apply", and pushes the code straight back to GitHub? The state file then no longer reflects the updated config. When another person checks out the code, the state file and the main.tf file don't match, and Terraform proposes incorrect changes.

To solve this problem we use the concept of a remote backend, for example an S3 bucket. The Terraform state file is stored in the S3 bucket and the project code is stored in the GitHub repo. Whenever a developer makes changes to the code and runs "terraform apply", the state file gets updated in the S3 bucket.
To implement this, we can create an S3 bucket in the main.tf file, create a backend.tf file, and include the S3 bucket name in that backend.tf file.

Similarly, to implement the locking mechanism, create a DynamoDB table in the main.tf file and then include the DynamoDB table in backend.tf.
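
A minimal sketch of what that backend.tf could contain (bucket and table names are placeholders):

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"   # placeholder bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"        # placeholder table, used for state locking
    encrypt        = true
  }
}
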

Terraform and AWS Codepipeline integration:

Below is the process we follow while making Terraform changes:

We have the Terraform code stored in GitHub and we maintain a global backend (S3 with a DynamoDB table for locking) to store the state file.

The process of running the init, plan, and apply commands for Terraform changes is automated. Whenever a user raises a PR with a code change, we trigger a build in AWS CodePipeline, which sets up Terraform, runs the init and plan commands, and stores the generated plan in the S3 bucket.
The code changes and the plan are then reviewed, and once the change is merged, another build is triggered which runs the apply command using the plan generated in the earlier build.

What are the challenges you faced and how did you overcome them?
In our Nginx server, in the properties.conf file we whitelist our clients' IP addresses so that traffic is allowed into our application only from those IPs. Also, in our NLB we maintain static IP addresses in each availability zone, and we give these static IP addresses to clients so that they allow traffic only from these IPs.
Recently we moved our Prod environment from the old VPC to a new VPC to support a broader IP address range. After this activity was done, we noticed 2 issues:
1. Initially, our application was not accessible at all. On further debugging, the issue was in the NLB: we had assigned new static IPs while the client still had the old static IPs. Because of this mismatch the application wasn't accessible, so we updated the NLB to use the old static IPs.
2. Even after this, the application was accessible inconsistently. On further debugging we noticed that traffic wasn't being distributed equally, so the application was up for some users and down for others. We fixed this by enabling Cross-Zone Load Balancing so that traffic is distributed evenly across all the availability zones.
SSL Termination with Nginx and Reverse Proxy configuration:
SSL Termination: Decrypts the SSL / TLS traffic at the Nginx server before passing the
unencrypted(plain text) traffic to the backend servers, such as application servers or load
balancers.

1. Nginx Configuration File (nginx.conf)

The main Nginx configuration file typically resides in /etc/nginx/nginx.conf. This file includes
global settings and can include the configuration files from sites-enabled.
2. Site Configuration Files in sites-enabled/ Directory
File: /etc/nginx/sites-enabled/wipo.applaudcloud-eu.com

3. Custom Properties Files

The custom properties files contain configurations to use Nginx as a reverse proxy and route
requests to a load balancer based on URL patterns.

File: /etc/nginx/custom-properties/wipo-custom.conf
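
The contents of these files aren't reproduced here; the following is a simplified, hypothetical server block along the lines described in the next section (the certificate paths match the deployment instructions below, while the upstream ALB DNS name is a placeholder).

server {
    listen 443 ssl;
    server_name wipo.applaudcloud-eu.com;

    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;

    location / {
        # Forward the decrypted traffic to the internal load balancer (placeholder DNS name)
        proxy_pass http://internal-alb.example.internal;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
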
4. Explanation

● SSL Configuration:
○ SSL termination is handled by Nginx using the provided .pem files (fullchain.pem
for the certificate and privkey.pem for the private key).
○ SSL protocols and ciphers are defined to ensure secure communication.
● Reverse Proxy:
○ The proxy_pass directive routes the incoming requests to the specified load
balancer based on the URL patterns.
○ The headers like X-Real-IP, X-Forwarded-For, and X-Forwarded-Proto are
forwarded to the backend to preserve the client’s original information.
● Domain-Specific Custom Properties:
○ Separate custom properties files are used for each domain, allowing for domain-
specific routing rules and configurations.

5. Deployment Instructions

1. Install Nginx: Ensure Nginx is installed on your server.


2. Create Directories: Ensure that directories such as /etc/nginx/ssl/ and
/etc/nginx/custom-properties/ exist.
3. Place Certificates: Place your SSL certificates (fullchain.pem and privkey.pem) in the
/etc/nginx/ssl/ directory.
4. Configure Nginx: Add the nginx.conf file and domain-specific configurations in the sites-
enabled directory.
5. Test Configuration: Run nginx -t to test the configuration for syntax errors.
6. Restart Nginx: Restart Nginx using systemctl restart nginx to apply the changes.

This setup will handle SSL termination at Nginx, route traffic to the appropriate backend load
balancers, and manage requests efficiently for all domains.

What Cost Optimisation have you done in your company?

In AWS, we have the Auto Scaling feature to horizontally scale our servers as needed according to the traffic. But in MongoDB we don't have that feature, and the clusters used to run at max capacity all the time even when there wasn't much traffic. This used to increase our costs, so as an alternative approach we wrote JavaScript functions so that an alarm triggers whenever cluster utilization reaches a defined maximum, for example 80%; the function then runs and reduces the cluster size back to the minimum capacity, similar to Auto Scaling's scale-in and scale-out behavior. In this way we keep the cluster capacity in line with demand and eventually decrease the cost.
Scripting interview questions:

How do you write a loop in Bash?

The most common types of loops are: for, while, until


1) How would you modify your file if you need to iterate over the files in a directory?

2) What would happen if the list you're iterating over is empty? How would you handle that?

3) How would you modify the loop to handle an error condition, such as a missing file?

4) How would you extend this script to handle input from a user at runtime?
echo "Enter no. of iterations"
read num_iterations
for (( i=1; i<=num_iterations; i++ ))
do
  echo "Iteration - $i"
done
5) How do you implement error handling if the file you are searching for is not found?

How do you pass arguments to a Bash script?


1) How would you handle an unknown number of arguments in a Bash script?

2) What’s the difference between $* and $@ in a Bash script?

3) How can you check if the correct number of arguments is passed to a script?
$# gives the number of arguments passed to the script, and $0 holds the name of the script itself
4) Can you pass an array to a Bash script, and how would you handle it?
What are some common string manipulation techniques in Bash?
What is the difference between $() and ${}?

How can you read a file line by line and process only lines that contain specific words?

In Python, line.strip() removes any leading & trailing whitespace, including the newline character

2nd way:
How would you modify the code to handle potential errors while reading the file?
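
Since the original snippets aren't included here, a hedged bash sketch covering both questions (file name and keyword are placeholders):

#!/bin/bash
# Hypothetical: read a file line by line, process only lines containing a keyword,
# and fail gracefully if the file is missing
file="app.log"
keyword="ERROR"

if [ ! -f "$file" ]; then
  echo "File $file not found" >&2
  exit 1
fi

while IFS= read -r line; do
  case "$line" in
    *"$keyword"*) echo "Matched: $line" ;;
  esac
done < "$file"
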
Explain how you would schedule a cron job to run a script every day at 2am?
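
For reference, the crontab entry for 2 AM daily looks like this (the script path is a placeholder):

# Edit the crontab with: crontab -e
# minute hour day-of-month month day-of-week command
0 2 * * * /path/to/script.sh
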
How do you handle errors in a bash script?

Manual exit status check:

Using set -e for automatic exit on error:

In this example, if any command fails the script exits automatically without executing
subsequent commands
Using trap to handle errors gracefully:
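
The original snippets for these three approaches aren't shown here; a compact sketch of all three, with placeholder paths:

#!/bin/bash

# 1) Manual exit status check
cp /tmp/source.txt /tmp/dest.txt
if [ $? -ne 0 ]; then
  echo "Copy failed" >&2
  exit 1
fi

# 2) set -e: exit immediately if any subsequent command fails
set -e
mkdir -p /tmp/backup
cp /tmp/source.txt /tmp/backup/

# 3) trap: run a handler whenever a command fails
trap 'echo "Error on line $LINENO" >&2' ERR
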

How do you parse a CSV file line by line?

#!/bin/bash
filename="data.csv"    # hypothetical input file
while IFS=, read -r column1 column2 column3; do
  echo "Column 1: $column1, Column 2: $column2, Column 3: $column3"
done < "$filename"
How do you automate the deployment of applications to multiple servers?

Write a script to monitor disk usage and send an alert if the threshold exceeds 80% (see the sketch at the end of this list)
Write a script to backup logs older than 7 days and delete original files
Write a script to automate DB backup

Write a script to rotate logs on a weekly basis


Write a script to check the status of multiple services and restart if they are not

Write a script to update web application by pulling latest code from Github repo
Write a script to compress and archive old log files

Write a script to automate the cleanup of temporary files older than 10 days
Write a script to install list of packages

Write a script to check health of web application by sending Http request and check response
Write a script to automate the configuration of new server with necessary packages
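
As one representative example for the prompts above, a hedged sketch of the disk-usage alert script (threshold, partition, and mail recipient are placeholders):

#!/bin/bash
# Hypothetical: alert if root partition usage exceeds 80%
THRESHOLD=80
USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  echo "Disk usage on / is ${USAGE}% (threshold ${THRESHOLD}%)" \
    | mail -s "Disk usage alert" ops@example.com    # assumes a configured mail command
fi
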
Linux interview questions:
What is Linux?

What is a Shell in Linux?

What is the difference between Soft link and hard link?

Soft link (Symbolic link)
● What? It's a file that points to another file or directory by storing its path. It works like a shortcut in Windows.
● Key features:
  ○ Points to the file's path: a soft link contains the path to the original file, not its actual data
  ○ Can link to directories: it can point to both files and directories
  ○ Broken link: if the original file is deleted or moved, the soft link becomes a broken link and won't work
  ○ Size: takes little space as it stores only the file path
● Use cases: You want to create a shortcut to a commonly accessed directory located deep inside the filesystem, or you need to link across file systems
● Example: ln -s /path/to/original-file.txt /path/to/soft-link.txt
  Here, /path/to/soft-link.txt is the soft link that points to /path/to/original-file.txt

Hard link
● What? It's a direct reference to the physical data on the disk (inode) of a file. It essentially creates another name for the same file content.
● Key features:
  ○ Points to the file's data (inode): a hard link points to the file's inode, meaning it points directly to the same data as the original file
  ○ Can't link to directories: it can only be created for files, not directories
  ○ No broken link: even if the original file is deleted, the data remains accessible via any of its hard links
  ○ Size: same size as the original file
● Use cases: You want to create a backup of a file without actually duplicating the data, and you want to ensure the file content persists even if the original file is deleted
● Example: ln /path/to/original-file.txt /path/to/hard-link.txt
  Here, /path/to/hard-link.txt is the hard link that points to /path/to/original-file.txt

How do you check System resource usage in Linux?
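
A few commonly used commands (not an exhaustive answer):

top                # live view of CPU, memory, and per-process usage (htop is a friendlier alternative)
free -h            # memory and swap usage
df -h              # disk usage per filesystem
vmstat 5           # CPU, memory, and I/O statistics every 5 seconds
iostat -x 5        # per-device disk I/O statistics (from the sysstat package)
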


What is systemctl, systemd and how it works?

systemd: It’s an init system and service manager for linux responsible for booting the system,
managing system services, handling dependencies and controlling system states.
systemctl: It’s a command-line utility used to interact with systemd to manage and control the
state of the system services and other units managed by systemd.
How systemctl Works at a Higher Level

● Service Management: systemctl allows you to manage services (start, stop, restart,
enable, disable, etc.). These services are typically defined by unit files, usually located in
/etc/systemd/system/ or /lib/systemd/system/. Each unit file describes how the service
should start, what it depends on, and other configurations.
● Fetching Status: When you run sudo systemctl status <service_name>, systemctl
queries the state of the service from systemd's internal data structures. systemd keeps
track of the status of all managed services, including whether they are running, stopped,
or failed, along with logs related to the service. systemctl fetches and displays this
information.

Find out why a server is running slowly?


A user’s home directory is filling up disk space on the root partition. How would you resolve?

You need to create a new user and ensure they have no shell access. How would you do?

> sudo useradd -s /sbin/nologin username

Steps to create a new user in general:
> sudo useradd <username> - create the new user
> sudo passwd <username> - set a password
> sudo mkdir /home/<username> - create the home directory
> sudo usermod -aG sudo <username> - add the user to the sudo group to perform root operations
> grep <username> /etc/passwd - verify the new user

Write a script to clean temporary files at regular intervals?


How do you kill all the processes started by a specific user?

> ps -u <username> | grep -v PID | awk '{print $1}' | xargs kill -9
