
Sample QN

The document provides an overview of MongoDB, a NoSQL database that stores data in flexible JSON-like documents, and contrasts it with traditional SQL databases. It covers various concepts such as document-oriented databases, replica sets, sharding, indexing, and aggregation operations, as well as Docker and Power BI, discussing their functionalities and differences. Additionally, it addresses best practices, security features, and performance optimization techniques for both MongoDB and Docker.

Uploaded by

sarjadhruv1
Copyright © All Rights Reserved

What is MongoDB, and how does it differ from traditional SQL databases?

Answer: MongoDB is a NoSQL database that stores data in flexible, JSON-like documents instead of
tables. It differs from traditional SQL databases by offering a dynamic schema, horizontal scalability
through sharding, and good performance for applications that read and write whole documents at once.

Explain the concept of document-oriented database in MongoDB.

Answer: MongoDB is a document-oriented database, which means it stores data in documents
(similar to JSON objects) instead of rows in tables. Each document can have its own unique structure
and can contain nested fields and arrays.
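
As a small illustration of the answer above, here are two documents that could live in the same collection. The collection name `users` and every field shown are made up for this sketch.

```javascript
// Two hypothetical documents from the same "users" collection.
// They share some fields but are free to differ in shape.
const alice = {
  _id: 1,
  name: "Alice",
  address: { city: "Pune", zip: "411001" }, // nested (embedded) document
  tags: ["admin", "beta"],                   // array field
};

const bob = {
  _id: 2,
  name: "Bob",
  phone: "555-0100", // field that alice does not have
};

// In mongosh, both could be stored with: db.users.insertMany([alice, bob])
// Nested fields are addressed with dot notation in queries,
// e.g. { "address.city": "Pune" }.
console.log(Object.keys(alice), Object.keys(bob));
```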

How does MongoDB ensure high availability and fault tolerance?

Answer: MongoDB ensures high availability and fault tolerance through replication. It maintains
multiple copies of data across a cluster of servers (replica set) and automatically promotes a new
primary if the primary node fails.

What is a replica set in MongoDB?

Answer: A replica set in MongoDB is a group of MongoDB servers that maintain the same data set. It
consists of a primary node, which receives all write operations (and, by default, all reads), and one
or more secondary nodes that replicate the primary's data and can take over on failover.

How would you create an index in MongoDB?

Answer: You can create an index in MongoDB using the createIndex() method. Here's an example:

db.collection.createIndex({ field: 1 })
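
A few common variations, sketched as plain JavaScript objects so they can be inspected outside the shell. The collection name `users` and the field names are assumptions for illustration only.

```javascript
// Index key specifications: 1 = ascending, -1 = descending.
const singleField = { email: 1 };
const compound = { lastName: 1, createdAt: -1 }; // field order matters for compound indexes
const uniqueOptions = { unique: true };           // reject duplicate values

// In mongosh (collection and field names are hypothetical):
//   db.users.createIndex(singleField, uniqueOptions)
//   db.users.createIndex(compound)
//   db.users.getIndexes()   // lists the indexes on the collection
console.log(JSON.stringify({ singleField, compound, uniqueOptions }));
```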

What is sharding in MongoDB, and when would you use it?

Answer: Sharding in MongoDB is the process of horizontally partitioning data across multiple servers
to improve scalability and performance. It is typically used when the data set grows too large to be
stored on a single server or when the workload needs to be distributed across multiple servers.

Explain the difference between a shard key and a primary key in MongoDB.

Answer: A shard key is the field (or fields) used to partition data across multiple servers in a
sharded cluster. MongoDB's closest equivalent to a primary key is the _id field, which is a unique
identifier for documents within a single collection.
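
The following is a deliberately simplified sketch of range-based routing on a shard key. The shard names, the `customerId` field, and the chunk boundaries are all hypothetical; a real cluster keeps this metadata on config servers and routes through mongos.

```javascript
// Hypothetical chunk map for a collection sharded on "customerId".
// Each chunk covers [min, max) of the shard key and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Pick the shard whose chunk range contains the document's shard key value.
function routeByShardKey(doc) {
  const key = doc.customerId;
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(routeByShardKey({ _id: "a1", customerId: 42 }));   // shardA
console.log(routeByShardKey({ _id: "b7", customerId: 1000 })); // shardB
```

Note that the _id values play no part in routing here: only the shard key decides where a document lives.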

How do you perform aggregation operations in MongoDB?


Answer: You can perform aggregation operations in MongoDB using the aggregate() method. This
allows you to group, filter, and perform calculations on documents within a collection.

What is GridFS in MongoDB?

Answer: GridFS is a specification for storing and retrieving files that exceed the 16 MB BSON
document size limit. It splits each file into smaller chunks (255 KB by default), stores every chunk
as a separate document in a chunks collection, and keeps the file's metadata in a files collection.
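
To make the chunking idea concrete, here is a rough sketch that splits a buffer the way GridFS does by default. The `files_id`, `n`, and `data` field names follow the GridFS chunk document layout; the file id `"file-1"` and the file size are placeholders, and real drivers do this work for you.

```javascript
const CHUNK_SIZE = 255 * 1024; // 261120 bytes, the GridFS default chunk size

// Split a file's bytes into chunk documents shaped like those in fs.chunks.
function toChunks(filesId, data) {
  const out = [];
  for (let offset = 0, n = 0; offset < data.length; offset += CHUNK_SIZE, n += 1) {
    out.push({ files_id: filesId, n, data: data.subarray(offset, offset + CHUNK_SIZE) });
  }
  return out;
}

const fileBytes = Buffer.alloc(600000);           // stand-in for a 600,000-byte file
const chunkDocs = toChunks("file-1", fileBytes);  // "file-1" is a placeholder id
console.log(chunkDocs.length);          // 3
console.log(chunkDocs[2].data.length);  // 77760
```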

Explain the concept of capped collections in MongoDB.

Answer: Capped collections in MongoDB are fixed-size collections that automatically overwrite older
documents when the collection reaches its maximum size. They are useful for storing logs and other
time-series data.
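
The overwrite behaviour can be pictured with a toy in-memory model. This sketch caps by document count only, whereas real capped collections are always bounded by a size in bytes with an optional document limit; the class name `CappedLog` is invented for the example.

```javascript
// Toy in-memory model of a capped collection limited by document count.
class CappedLog {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift(); // drop the oldest document to make room
    }
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i += 1) log.insert({ seq: i });
console.log(log.docs.map((d) => d.seq)); // [ 3, 4, 5 ]

// In mongosh, a real capped collection would be created with, for example:
//   db.createCollection("log", { capped: true, size: 1048576, max: 1000 })
```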

How does MongoDB handle transactions?

Answer: MongoDB supports multi-document transactions starting from version 4.0 for replica sets
and version 4.2 for sharded clusters. Transactions allow you to perform multiple operations on
multiple documents atomically.

What is the difference between MongoDB and Cassandra?

Answer: MongoDB is a document-oriented NoSQL database, while Cassandra is a distributed,
wide-column store NoSQL database. MongoDB uses a flexible schema with JSON-like documents, while
Cassandra uses a static schema with wide rows.

How would you back up and restore a MongoDB database?

Answer: You can back up a MongoDB database using the mongodump command, which creates a
binary dump of the database. To restore a backup, you can use the mongorestore command to
import the dump files back into the database.

What are the different types of reads in MongoDB?

Answer: In MongoDB, reads are governed by the read preference, which has five modes: primary (the
default), primaryPreferred, secondary, secondaryPreferred, and nearest. Reads from the primary node
of a replica set reflect the most up-to-date data. Reads from secondary nodes may reflect slightly
stale data due to replication lag.

Explain the concept of write concern in MongoDB.

Answer: Write concern in MongoDB determines the level of acknowledgment a write operation
must receive before it is considered successful. It allows you to control factors such as durability and
consistency.
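
As a rough sketch, these are a few write concern documents together with a toy function that mimics the acknowledgment rule. The `w`, `j`, and `wtimeout` fields are MongoDB's own; the helper `isSatisfied` and the member counts are invented for illustration and ignore details such as non-voting members.

```javascript
// Write concern documents (these field names are MongoDB's own).
const fireAndForget = { w: 0 };             // no acknowledgment requested
const primaryOnly = { w: 1 };               // acknowledged by the primary
const durableMajority = { w: "majority", j: true, wtimeout: 5000 };
// j: true waits for the on-disk journal; wtimeout is in milliseconds.

// Toy check: has a write been acknowledged by enough members?
function isSatisfied(writeConcern, ackedMembers, votingMembers) {
  const needed =
    writeConcern.w === "majority"
      ? Math.floor(votingMembers / 2) + 1
      : writeConcern.w;
  return ackedMembers >= needed;
}

console.log(isSatisfied(durableMajority, 2, 3)); // true
console.log(isSatisfied(durableMajority, 1, 3)); // false

// In mongosh: db.orders.insertOne({ item: "x" }, { writeConcern: durableMajority })
```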

What is the difference between find() and findOne() methods in MongoDB?

Answer: The find() method returns a cursor to all documents that match the query criteria, while
the findOne() method returns the first document that matches the query criteria.

How would you perform a case-insensitive search in MongoDB?

Answer: You can perform a case-insensitive search in MongoDB by using the $regex operator with a
case-insensitive regular expression, written either as a regex literal with the i flag or as a
string pattern combined with $options: "i". For example:

db.collection.find({ field: { $regex: /pattern/i } })
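
When the pattern comes from user input, it usually needs escaping first. The sketch below builds an equivalent filter with the string form of $regex plus $options; the helper names and the `email` field are hypothetical.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter that matches the whole field value, ignoring case.
function caseInsensitiveEquals(field, value) {
  return { [field]: { $regex: `^${escapeRegex(value)}$`, $options: "i" } };
}

const filter = caseInsensitiveEquals("email", "Alice@Example.com");
console.log(JSON.stringify(filter));

// The same pattern checked locally with a JavaScript RegExp:
const re = new RegExp(filter.email.$regex, filter.email.$options);
console.log(re.test("alice@example.COM"));  // true
console.log(re.test("malice@example.com")); // false

// In mongosh: db.users.find(filter)
```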

What is the purpose of TTL (Time-To-Live) indexes in MongoDB?


Answer: TTL indexes in MongoDB allow you to automatically expire documents from a collection
after a specified period of time. They are useful for storing temporary data or managing data
retention policies.
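
Here is a minimal sketch of a TTL index definition and the expiry rule it implies. The `sessions` collection, the `createdAt` field, and the one-hour lifetime are assumptions; in a real deployment a background task removes expired documents periodically, so deletion is not instantaneous.

```javascript
// TTL index definition (field name and lifetime are illustrative).
const ttlKeys = { createdAt: 1 };
const ttlOptions = { expireAfterSeconds: 3600 }; // one hour
// In mongosh: db.sessions.createIndex(ttlKeys, ttlOptions)

// Toy version of the expiry rule: a document is eligible for deletion once
// its indexed date plus expireAfterSeconds is in the past.
function isExpired(doc, now) {
  const expiresAt = doc.createdAt.getTime() + ttlOptions.expireAfterSeconds * 1000;
  return expiresAt <= now.getTime();
}

const session = { _id: 1, createdAt: new Date("2024-01-01T00:00:00Z") };
console.log(isExpired(session, new Date("2024-01-01T00:30:00Z"))); // false
console.log(isExpired(session, new Date("2024-01-01T01:00:01Z"))); // true
```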

Explain the concept of aggregation pipelines in MongoDB.

Answer: Aggregation pipelines in MongoDB allow you to process documents through a series of
stages to perform complex transformations and calculations. Each stage takes input documents,
processes them, and passes the output to the next stage.
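
The stage-by-stage flow can be sketched with a three-stage pipeline and a plain-JavaScript equivalent run on sample data. The `orders` collection and its fields are invented for the example; the loop below only imitates what the server would compute.

```javascript
// Pipeline as it would be passed to db.orders.aggregate(pipeline) in mongosh.
const pipeline = [
  { $match: { status: "paid" } },
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];

// Sample data and a plain-JavaScript equivalent of the three stages.
const orders = [
  { customer: "ann", status: "paid", amount: 30 },
  { customer: "bob", status: "paid", amount: 50 },
  { customer: "ann", status: "paid", amount: 40 },
  { customer: "bob", status: "open", amount: 99 },
];

const totals = new Map();
for (const o of orders.filter((o) => o.status === "paid")) {       // $match
  totals.set(o.customer, (totals.get(o.customer) ?? 0) + o.amount); // $group + $sum
}
const result = [...totals]
  .map(([_id, total]) => ({ _id, total }))
  .sort((a, b) => b.total - a.total);                               // $sort

console.log(result); // [ { _id: 'ann', total: 70 }, { _id: 'bob', total: 50 } ]
```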

How does MongoDB handle concurrency and locking?

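Answer: MongoDB handles concurrency with a combination of locking and multi-version concurrency
control (MVCC). It takes intent locks at the global, database, and collection levels, while the
WiredTiger storage engine provides document-level concurrency control using MVCC. This allows
multiple readers and writers to work on the same collection simultaneously while ensuring
consistency and isolation for write operations.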

What is the role of the WiredTiger storage engine in MongoDB?

Answer: WiredTiger is the default storage engine in MongoDB since version 3.2. It provides support
for compression, document-level concurrency control, and improved write performance compared
to the MMAPv1 storage engine.

Explain the concept of geospatial indexing in MongoDB.

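Answer: Geospatial indexing in MongoDB allows you to efficiently query documents based on their
geographical location. It supports 2dsphere indexes (for GeoJSON data on an Earth-like sphere) and
2d indexes (for legacy coordinate pairs on a flat plane), and provides query operators such as $near,
$geoWithin, and $geoIntersects for performing spatial queries.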
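
A brief sketch of what a geospatial document, index, and proximity query might look like. The `places` collection, the `location` field, and the coordinates are placeholders.

```javascript
// GeoJSON point: coordinates are [longitude, latitude], in that order.
const cafe = {
  name: "Example Cafe", // hypothetical document
  location: { type: "Point", coordinates: [-73.9857, 40.7484] },
};

// Index and query documents for finding places within 1 km of a point.
const indexKeys = { location: "2dsphere" };
const nearQuery = {
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.99, 40.75] },
      $maxDistance: 1000, // meters
    },
  },
};

// In mongosh:
//   db.places.insertOne(cafe)
//   db.places.createIndex(indexKeys)
//   db.places.find(nearQuery)
console.log(JSON.stringify(nearQuery));
```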

What are the security features available in MongoDB?

Answer: MongoDB offers various security features, including authentication, role-based access
control (RBAC), encryption (at rest and in transit), and auditing.

How would you handle schema migrations in MongoDB?

Answer: In MongoDB, schema migrations are typically handled by updating application code to
accommodate changes in document structure. Alternatively, you can use migration scripts to modify
existing documents or collections as needed.
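
One way to picture the application-side approach is a small upgrade function applied whenever a document is read. The `schemaVersion` field and both document shapes are hypothetical conventions, not something MongoDB defines.

```javascript
// Upgrade a document to the latest shape when it is read ("lazy" migration).
// The schemaVersion field and both shapes are hypothetical.
function migrateUser(doc) {
  if ((doc.schemaVersion ?? 1) >= 2) return doc; // already current
  const [firstName, ...rest] = doc.name.split(" ");
  return {
    _id: doc._id,
    firstName,
    lastName: rest.join(" "),
    schemaVersion: 2,
  };
}

console.log(migrateUser({ _id: 1, name: "Ada Lovelace" }));
// { _id: 1, firstName: 'Ada', lastName: 'Lovelace', schemaVersion: 2 }

// A one-off script could instead update documents in bulk. In mongosh, for example,
// this stamps a version on documents that do not have one yet:
//   db.users.updateMany({ schemaVersion: { $exists: false } }, { $set: { schemaVersion: 1 } })
```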

What are some best practices for optimizing performance in MongoDB?

Answer: Some best practices for optimizing performance in MongoDB include designing efficient
schemas, using appropriate indexes, avoiding nested arrays with large numbers of elements, and
sizing hardware appropriately for your workload. Additionally, optimizing query performance and
minimizing network latency can further improve overall performance.

What is Docker, and how does it differ from virtualization?

Answer: Docker is a platform that allows developers to package, distribute, and run applications in
containers. Unlike traditional virtualization, Docker containers share the host OS kernel, which
makes them lightweight and more efficient.

Explain the difference between Docker images and containers.

Answer: Docker images are read-only templates that contain the application code, runtime,
libraries, and dependencies. Docker containers are instances of Docker images that can be executed
and run as isolated processes on a host machine.

How do you build a Docker image from a Dockerfile?

Answer: You can build a Docker image from a Dockerfile using the docker build command. For
example:

docker build -t myimage:latest .

What is the purpose of a Dockerfile?

Answer: A Dockerfile is a text file that contains instructions for building a Docker image. It specifies
the base image, environment variables, dependencies, and commands needed to set up and
configure the application environment.

How do you run a Docker container?

Answer: You can run a Docker container using the docker run command. For example:

docker run -d myimage:latest

Explain the concept of container orchestration.

Answer: Container orchestration is the process of managing and automating the deployment,
scaling, and operation of containerized applications. It involves tools like Docker Swarm and
Kubernetes to coordinate and manage clusters of containers.

What is Docker Swarm, and how does it work?

Answer: Docker Swarm is Docker's native clustering and orchestration tool. It allows you to create a
cluster of Docker hosts (nodes) and deploy and manage services across the cluster using a single API.

What is the difference between a Docker container and an image?

Answer: A Docker image is a template that contains the application code and dependencies, while a
Docker container is a running instance of that image. Containers can be created, started, stopped,
and deleted, while images are static and immutable.

How do you share Docker images with others?


Answer: You can share Docker images with others by pushing them to a Docker registry like Docker
Hub or a private registry. Users can then pull the images from the registry using the docker pull
command.

Explain the concept of Docker volumes.


Answer: Docker volumes are a way to persist data generated by containers or share data between
containers and the host machine. They provide a mechanism for managing storage outside the
container's writable layer.

What is Docker Compose, and how is it used?

Answer: Docker Compose is a tool for defining and running multi-container Docker applications. It
allows you to use a YAML file to define the services, networks, and volumes for your application and
then use a single command to start, stop, and manage the entire application stack.

How do you monitor Docker containers?

Answer: Docker provides built-in commands for monitoring containers, including docker stats and
docker events, and it works with third-party monitoring solutions like Prometheus and Grafana. These
tools allow you to monitor resource usage, container health, and performance metrics.

What is the difference between Docker Swarm and Kubernetes?

Answer: Docker Swarm is Docker's native clustering and orchestration tool, while Kubernetes is an
open-source container orchestration platform originally developed by Google. Kubernetes offers
more advanced features and a larger ecosystem but has a steeper learning curve compared to
Docker Swarm.

How do you scale Docker containers horizontally?

Answer: You can scale Docker containers horizontally by deploying multiple instances of the same
container across multiple Docker hosts (nodes) using Docker Swarm or Kubernetes. This allows you
to handle increased traffic and distribute the workload across the cluster.

Explain the concept of Docker networking.

Answer: Docker networking allows containers to communicate with each other and with external
networks. Docker provides various networking modes, including bridge, host, overlay, and macvlan,
to facilitate communication between containers and external systems.

What is Docker Hub, and how is it used?

Answer: Docker Hub is a cloud-based registry service provided by Docker for storing and sharing
Docker images. It allows users to push, pull, and manage Docker images, as well as collaborate with
others by sharing images publicly or privately.

How do you troubleshoot Docker containers?

Answer: Troubleshooting Docker containers involves identifying and resolving issues related to
networking, resource constraints, configuration errors, and application failures. Docker provides
tools like docker logs, docker inspect, and docker exec to help diagnose and troubleshoot container
problems.

What are the security best practices for Docker containers?


Answer: Security best practices for Docker containers include using minimal and secure base
images, applying container image scanning for vulnerabilities, implementing least privilege access
controls, enabling container runtime security, and regularly updating and patching containers.

Explain the concept of Docker volumes vs. bind mounts.

Answer: Docker volumes and bind mounts are both mechanisms for persisting data outside the
container's writable layer. Volumes are created and managed by Docker and do not depend on the host's
directory layout, which makes them easier to back up and migrate, while bind mounts map a specific
path on the host machine into the container.

How would you automate the deployment of Docker containers?

Answer: You can automate the deployment of Docker containers using Continuous
Integration/Continuous Deployment (CI/CD) pipelines and tools like Jenkins, GitLab CI/CD, or Travis
CI. These tools allow you to build, test, and deploy Docker images automatically based on
predefined workflows and triggers.

What are some common challenges when working with Docker containers?

Answer: Common challenges when working with Docker containers include managing container
lifecycles, orchestrating container deployments at scale, optimizing resource usage, securing
containerized applications, and troubleshooting networking and performance issues.

What is the Docker daemon, and how does it work?

Answer: The Docker daemon (dockerd) is a background process that manages Docker objects like
images, containers, volumes, and networks. It listens for Docker API requests from the Docker CLI or
other Docker clients and performs actions like building, running, and managing containers.

How would you manage secrets and sensitive information in Docker containers?

Answer: Docker provides a built-in mechanism called Docker Secrets, available to Swarm services, for
managing sensitive data like passwords, API keys, and certificates. Secrets are encrypted in transit
and at rest in the swarm and are only accessible to the services that have been granted access to
them.

Explain the concept of Docker health checks.

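Answer: Docker health checks let you monitor whether a container is actually working, not just
running. A check is defined with the HEALTHCHECK instruction in a Dockerfile or with the --health-cmd
option of docker run, and Docker runs it periodically to set the container's health status (starting,
healthy, or unhealthy). Docker itself does not restart an unhealthy standalone container, but
orchestrators such as Docker Swarm replace unhealthy service tasks.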

How does Docker help in achieving environment consistency across different development stages?

Answer: Docker ensures environment consistency across different development stages
(development, testing, staging, production) by encapsulating the application and its dependencies
into a Docker image. This allows developers to build, test

What is Power BI, and how is it used in data engineering?


Answer: Power BI is a business analytics tool by Microsoft used for data visualization and business
intelligence. In data engineering, Power BI is used to create interactive dashboards and reports,
enabling stakeholders to gain insights from data stored in various sources, including databases and
data warehouses.

How do you connect Power BI to a database?

Answer: To connect Power BI to a database, you can use the "Get Data" option and select the
appropriate database connector (e.g., SQL Server, MySQL, PostgreSQL). Then, provide the
connection details such as server address, database name, and authentication credentials.

What is a Power BI dashboard, and how is it different from a report?

Answer: A Power BI dashboard is a single-page canvas that provides an overview of key metrics and
KPIs using visualizations such as charts, graphs, and maps. A report, on the other hand, contains
multiple pages with detailed data analysis and visualizations.

Explain the process of data modeling in Power BI.

Answer: Data modeling in Power BI involves transforming raw data into a structured format suitable
for analysis. This includes tasks such as cleaning data, creating relationships between tables,
defining calculated columns and measures, and optimizing data for performance.

What are the different types of data sources supported by Power BI?

Answer: Power BI supports a wide range of data sources, including relational databases (e.g., SQL
Server, Oracle), cloud services (e.g., Azure SQL Database, Google Analytics), files (e.g., Excel, CSV),
and online services (e.g., Salesforce, Dynamics 365).

How do you create calculated columns and measures in Power BI?

Answer: To create calculated columns and measures in Power BI, you write DAX (Data Analysis
Expressions) formulas in Power BI Desktop. Calculated columns are evaluated row by row and stored in
the table when the data is refreshed, while measures are evaluated at query time in the current filter
context, which makes them the usual choice for aggregations and calculations.

What is DAX, and why is it important in Power BI?

Answer: DAX (Data Analysis Expressions) is a formula language used in Power BI for creating
calculated columns, measures, and calculated tables. It is important in Power BI because it allows
users to perform complex calculations and aggregations on data.

How do you create relationships between tables in Power BI?

Answer: To create relationships between tables in Power BI, you can use the Manage Relationships
dialog or drag a field from one table onto the related field in another table in Model view. When
autodetect is enabled, Power BI can also detect and create relationships based on matching column
names.

Explain the concept of data shaping and transformation in Power BI.


Answer: Data shaping and transformation in Power BI involve cleaning, transforming, and modeling
data to prepare it for analysis. This can include tasks such as removing duplicates, splitting columns,
pivoting data, and aggregating values.

What are Power BI data gateways, and when are they used?

Answer: Power BI data gateways connect on-premises data sources to the Power BI Service. They act as
a bridge between cloud-based Power BI services and on-premises data sources, allowing scheduled
data refreshes and direct querying. Power BI Desktop connects to such sources directly and does not
need a gateway.

How do you schedule data refreshes in Power BI?

Answer: To schedule data refreshes in Power BI, you can configure refresh settings for datasets in
the Power BI Service. You can specify the refresh frequency and times, the data source credentials,
and who is notified when a refresh fails.

What is row-level security (RLS) in Power BI, and how is it implemented?

Answer: Row-level security (RLS) in Power BI is a feature that allows you to restrict access to data at
the row level based on user roles or criteria. It is implemented by defining security roles and
creating DAX filters that control which rows users can access.

How do you create hierarchies in Power BI?

Answer: To create hierarchies in Power BI, you can select multiple fields in a table and group them
into a hierarchy using the "Create Hierarchy" option. Hierarchies allow users to drill down into data
and analyze it at different levels of granularity.

What are Power BI templates, and how are they used?

Answer: Power BI templates (.pbit files) are pre-built report templates that contain the report
layout, visualizations, data model, and query definitions, but not the data itself. They are used to
quickly create new reports by opening the template and connecting it to your data source.

Explain the concept of cross-filtering and cross-highlighting in Power BI.

Answer: Cross-filtering and cross-highlighting in Power BI allow users to interactively filter and
highlight data across multiple visualizations. Cross-filtering applies filters to other visualizations
based on user selections, while cross-highlighting highlights related data points in other
visualizations.

What are slicers in Power BI, and how do they work?

Answer: Slicers in Power BI are visual filters that allow users to interactively filter data in reports.
They can be used to filter data based on specific criteria such as dates, categories, or regions, and
they update other visualizations based on user selections.

How do you create custom visuals in Power BI?


Answer: You can create custom visuals in Power BI using the Power BI visuals tools (the pbiviz
command-line package) or add existing ones by importing them from AppSource or from a file. Custom
visuals are built with web technologies such as TypeScript or JavaScript, HTML, and CSS, and can
extend the capabilities of Power BI beyond the built-in visualizations.

Explain the concept of DirectQuery mode in Power BI.

Answer: DirectQuery mode in Power BI sends queries directly to the data source each time a visual is
loaded, without importing the data into the Power BI model. This is useful when data freshness is
critical or the data is too large to import. The trade-off is that report performance depends on the
speed of the underlying source.

What is the difference between Power BI Desktop and Power BI Service?

Answer: Power BI Desktop is a desktop application used for creating and modeling reports, while
Power BI Service (Power BI online) is a cloud-based service used for publishing, sharing, and
collaborating on reports. Power BI Service also provides additional features such as data refresh,
scheduling, and sharing.

How do you share Power BI reports with others?

Answer: You can share Power BI reports with others by publishing them to the Power BI Service and
sharing them with specific users or groups. Alternatively, you can export reports to PDF or
PowerPoint format and share them via email or other communication channels.

What is the Power BI REST API, and how is it used?

Answer: The Power BI REST API allows developers to programmatically interact with Power BI
Service, including operations such as embedding reports, managing datasets, and refreshing data. It
is used to automate tasks, integrate Power BI with other applications, and extend its functionality.

How do you create a Power BI dashboard from multiple reports?

Answer: To create a Power BI dashboard from multiple reports, you can pin visualizations from
different reports to a new dashboard canvas. You can then arrange and customize the visualizations
