MongoDB, Cloud Fund, Talend, UNIX

Uploaded by

arkadeeproy425
Copyright
© © All Rights Reserved

UNIX and Linux:

==============

1) What is Unix?
UNIX is a portable operating system designed for efficient multitasking and multi-user operation. Because it is
portable, it can run on different hardware platforms. It is written mostly in C.

2) List the distributions of UNIX.


UNIX has many distributions, including Solaris, AIX, HP-UX, and BSD.

3) List some features of UNIX.


UNIX supports multiple users working on the system at the same time.
UNIX supports multitasking.

4) What is a UNIX shell?


The UNIX shell is a program that serves as an interface between the user and the UNIX operating system. It is
not part of the kernel, but it communicates with the kernel on the user's behalf.

cd .. --> Go to the parent directory
cd - --> Go back to the previous working directory
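
The two cd forms above can be tried safely in a scratch directory. This is only a small sketch: the directory names (projects, demo) are made up for illustration, and mktemp is used so nothing real is touched.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/demo"

cd "$workdir/projects/demo"
cd ..                      # move up to the parent directory: .../projects
parent=$(pwd)

cd "$workdir"              # jump somewhere else
cd - > /dev/null           # return to the previous directory (.../projects)
previous=$(pwd)

echo "parent:   $parent"
echo "previous: $previous"
```

Both echo lines should show the same path, because cd - returns to wherever the shell was before the last cd.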

Command     Usage
ls          Lists the contents of a directory
alias       Defines or displays aliases
unalias     Removes alias definitions
pwd         Prints the working directory
cd          Changes directory
cp          Copies files and directories
rm          Removes files and directories
mv          Moves (renames) files and directories
mkdir       Creates directories
man         Displays the manual page of a command
touch       Creates empty files
chmod       Changes file permissions
./          Runs an executable in the current directory
exit        Exits the current shell session
sudo        Executes commands as superuser
shutdown    Shuts down your machine
htop        Displays process and resource information
unzip       Extracts compressed ZIP files
echo        Displays lines of text
cat         Prints file contents
ps          Reports process status
kill        Sends a signal to a process (terminates it by default)
ping        Tests network connectivity
vim         Opens the Vim text editor
history     Shows a list of previous commands
passwd      Changes the user password
which       Returns the full binary path of a program
tail        Displays the last lines of a file
head        Displays the first lines of a file
grep        Prints lines that match patterns
whoami      Outputs the current username
whatis      Shows a one-line description of a command
wc          Counts lines, words, and bytes in files
uname       Displays OS information
neofetch    Displays OS and hardware information
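
A few of the text-processing commands from the table can be combined on one sample file. The sketch below is illustrative only; the file name app.log and its contents are invented for the example.

```shell
# Build a small sample file in a temporary directory.
workdir=$(mktemp -d)
logfile="$workdir/app.log"
printf '%s\n' "INFO start" "ERROR disk full" "INFO retry" "ERROR timeout" "INFO done" > "$logfile"

first_line=$(head -n 1 "$logfile")              # first line of the file
last_two=$(tail -n 2 "$logfile")                # last two lines of the file
error_count=$(grep -c "ERROR" "$logfile")       # number of lines containing ERROR
line_count=$(wc -l < "$logfile" | tr -d ' ')    # total lines (tr strips padding some wc versions add)

echo "first line:  $first_line"
echo "last two:    $last_two"
echo "error lines: $error_count of $line_count"
```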

pwd (Prints the working directory)


cat (Prints file contents)
cp (Copies files and directories)
mv (Moves and renames files and directories)
rm (Remove files and directories)
touch (Creates empty files)
mkdir (Creates directories)
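
The file-handling commands listed above can be walked through in order. This is a minimal sketch in a temporary directory; the names reports, draft.txt, backup.txt, and final.txt are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
pwd                                       # print the working directory

mkdir reports                             # create a directory
touch reports/draft.txt                   # create an empty file
echo "quarterly numbers" > reports/draft.txt
cat reports/draft.txt                     # print the file contents

cp reports/draft.txt reports/backup.txt   # copy the file
mv reports/draft.txt reports/final.txt    # rename (move) the file
rm reports/backup.txt                     # delete the copy

remaining=$(ls reports)
echo "remaining: $remaining"
```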

What is a directory?
A directory is a specialized form of a file that maintains a list of all files in it.

cmp – Compares two files byte by byte and reports the first mismatch.
diff – Displays the changes needed to make the two files identical.
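
The difference between the two commands is easiest to see on a pair of nearly identical files. The following is a sketch with invented file names (old.txt, new.txt). Both commands exit with status 1 when the files differ, which the example captures instead of treating it as an error.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new.txt"

# cmp stops at the first differing byte; exit status 1 means "files differ".
cmp_status=0
cmp "$workdir/old.txt" "$workdir/new.txt" || cmp_status=$?

# diff lists the line-level edits; exit status 1 also means "files differ".
diff_status=0
diff "$workdir/old.txt" "$workdir/new.txt" || diff_status=$?

echo "cmp exit: $cmp_status, diff exit: $diff_status"
```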

chgrp – Change the group of the file.


chown – Change ownership of the file.

r – Reading permission
w – Writing permission
x – Execution permission
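
The r, w, and x flags show up in the first column of ls -l and are changed with chmod. A small sketch, assuming a scratch script named hello.sh (a made-up name):

```shell
workdir=$(mktemp -d)
script="$workdir/hello.sh"
printf '#!/bin/sh\necho "hello"\n' > "$script"

chmod 644 "$script"                       # rw-r--r-- : owner reads and writes, others only read
before=$(ls -l "$script" | cut -c1-10)    # keep only the permission column

chmod u+x "$script"                       # add execute permission for the owner
after=$(ls -l "$script" | cut -c1-10)

echo "before: $before"
echo "after:  $after"
```

The permission string is read in groups of three after the file-type character: owner, group, then others.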

How to display the last line of a file?


This can be performed using either “tail” or “sed” commands.
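
Both approaches can be sketched on a three-line sample file (notes.txt is a hypothetical name):

```shell
workdir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$workdir/notes.txt"

with_tail=$(tail -n 1 "$workdir/notes.txt")    # print the last 1 line
with_sed=$(sed -n '$p' "$workdir/notes.txt")   # $ addresses the last line, p prints it

echo "tail: $with_tail"
echo "sed:  $with_sed"
```

The -n flag stops sed from printing every line, so only the line selected by $p is shown.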

========================================================================================

MongoDB:
=========

What is MongoDB ?
MongoDB is a NoSQL database written in C++. Its server has been source-available under the SSPL since 2018;
earlier releases were open source under the AGPL. It uses JSON-like documents with optional schemas.
It provides easy scalability and is a cross-platform, document-oriented database.
MongoDB works on the concept of Collection and Document.

What is a Document in MongoDB?


A Document in MongoDB is an ordered set of keys with associated values. Most programming languages represent
it with a map, hash, or dictionary. In JavaScript, documents are represented as objects:
{"greeting" : "Hello world!"}

What is a Collection in MongoDB?


A collection in MongoDB is a group of documents. If a document is the MongoDB analog of a row in a relational
database, then a collection can be thought of as the analog to a table.

What are Databases in MongoDB?


MongoDB groups collections into databases. MongoDB can host several databases, each grouping together
collections.
Some reserved database names are as follows:
admin
local
config

What are some features of MongoDB?


Indexing: It supports generic secondary indexes and provides unique, compound, geospatial, and full-text indexing
capabilities as well.
Aggregation: It provides an aggregation framework based on the concept of data processing pipelines.
Special collection and index types: It supports time-to-live (TTL) collections for data that should expire at a certain
time.
File storage: It supports an easy-to-use protocol for storing large files and file metadata.
Sharding: Sharding is the process of splitting data up across machines.

How to add data in MongoDB?


Inserts are the basic method for adding data to MongoDB. To insert a single document, use the collection’s
insertOne method:
> db.books.insertOne({"title" : "Start With Why"})

What are the data types in MongoDB?


MongoDB supports a wide range of data types as values in documents. Documents in MongoDB are similar to
objects in JavaScript. Along with JSON’s essential key/value–pair nature, MongoDB adds support for a number of
additional data types. The common data types in MongoDB are:

Null
{"x" : null}
Boolean
{"x" : true}
Number
{"x" : 4}
String
{"x" : "foobar"}
Date
{"x" : new Date()}
Regular expression
{"x" : /foobar/i}
Array
{"x" : ["a", "b", "c"]}
Embedded document
{"x" : {"foo" : "bar"}}
Object ID
{"x" : ObjectId()}
Binary Data
Binary data is a string of arbitrary bytes.
Code
{"x" : function() { /* ... */ }}

What is a primary key in MongoDB?


In MongoDB, the _id field acts as the document's primary key, ensuring uniqueness within a collection. If not
provided during insertion, MongoDB automatically generates it.

What are indexes in MongoDB?


MongoDB uses indexes to speed up queries by quickly finding documents based on indexed fields. It supports
various index types for this purpose.

How do you insert data into a MongoDB collection?


You can insert data into a MongoDB collection using the `insertOne()` or `insertMany()` method.

What is a replica set in MongoDB?


A replica set is a group of MongoDB servers that maintain the same data set. It provides data redundancy and high
availability.

What are the data types supported by MongoDB?


MongoDB supports various data types, including string, number, boolean, date, array, object, null, regex, and more.

How do you delete data from a MongoDB collection?


You can delete data from a MongoDB collection using methods like `deleteOne()`, `deleteMany()`, or
`findOneAndDelete()`.

What is a cursor in MongoDB, and when is it used?


A cursor in MongoDB is an iterator to retrieve and process documents from query results. Cursors are used when
fetching large result sets, allowing you to retrieve documents in batches.

How is data consistency maintained in MongoDB?


Operations on a single document are atomic, and reads from the primary are strongly consistent by default. Reads
from secondary nodes of a replica set may return stale data, so they are only eventually consistent.

How do you perform a query in MongoDB?


You can perform queries in MongoDB using the `find()` method, where you specify criteria to filter documents.

How do you back up a MongoDB database?


You can back up a MongoDB database using tools like `mongodump` or by configuring regular snapshots at the file
system or cluster level.

What are the main features of MongoDB?


MongoDB's key features include flexible data modeling, horizontal scalability, support for unstructured data, a robust
query language, automatic sharding, high availability via replica sets, and geospatial capabilities.

What is the role of a shard key in MongoDB?


A shard key determines how data is distributed across multiple shards (database partitions) in a sharded cluster.
MongoDB uses the value of the shard key field or fields in each document to decide which shard should store it.

What are the different types of indexes in MongoDB?


MongoDB supports various indexes, including single-field indexes, compound indexes, geospatial indexes, text
indexes, hashed indexes, and wildcard indexes.

Can you explain what a document in MongoDB is?


A document is a JSON-like data structure that stores and represents data. It can contain key-value pairs, arrays, and
nested documents.

What is a collection in MongoDB?


A collection in MongoDB is a grouping of documents. Collections do not enforce a schema by default, meaning
documents in the same collection can have different structures.

How does MongoDB store data?


MongoDB stores data in BSON (Binary JSON) format, a binary-encoded serialization of JSON-like documents.
These documents are stored in collections within databases.

What is the role of collections in MongoDB?


Collections in MongoDB are containers for organizing and storing related documents.

What type of NoSQL database is MongoDB?


MongoDB is a document-oriented database. It stores data as BSON documents, and these documents are stored in
collections.

What is a namespace?


A namespace is the concatenation of the database name and the collection name, in the form database.collection.

Why is MongoDB considered one of the best NoSQL databases?


High Performance
High Availability
Easily Scalable
Rich Query Language
Document Oriented

Why do we use the pretty() method?


We use the pretty() method for displaying the results in a formatted way.

How do we remove a document from the collection?


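By using the deleteOne() or deleteMany() method. The older remove() method also removes documents from a
collection, but it is deprecated in current versions of the shell.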

========================================================================================

Cloud Fundamentals:
==================

1) What is Cloud Technology?


A Cloud is a virtual space on the internet where users can store digital resources like software, applications, and
files. Users can share digital resources across the internet without the restriction of physical location.

2) What are the main features of Cloud Computing?


Agility – You can quickly get a lot of computing power when you need it.
Location Independence – Resources can be accessed from anywhere with an internet connection.
Better Storage – Capacity is not limited by a single physical device, as it is with local storage.
Multi-Tenancy – Many users can share resources in the cloud.
Reliability – Data backup and disaster recovery become easier and less expensive with cloud computing.
Scalability – You can easily increase or decrease your resources as your needs change.

3) What are Cloud Delivery Models?


Infrastructure as a Service (IaaS) – a cloud computing model that provides access to computing resources like
storage, networking, servers, and virtualization on demand.
Platform as a Service (PaaS) – Cloud provides a complete environment for developing and deploying applications,
from simple apps to complex enterprise solutions.
Software as a Service (SaaS) – allows users to connect to and use cloud-based apps over the Internet.
Function as a Service (FaaS) – allows end-users to build and run app functionalities on a serverless architecture.

4) What are the different versions of the Cloud?


Public Cloud – the set of computer resources like hardware, software, servers, storage, etc., owned and operated
by third-party cloud providers for use by businesses or individuals.
Private Cloud – a set of resources owned and operated by an organization for use by its staff, partners, or customers.
Hybrid Cloud – a combination of public and private cloud services.
Multi-Cloud – the use of various public cloud services by a company to support different developers and business
units.

5) Name the main constituents of the Cloud ecosystem.


Cloud Consumers, Direct Customers, Cloud Service Providers

6) Cloud service providers are companies that create and sell cloud services to users.
Direct customers are users who use your services in the cloud, without knowing whether it is public or private.
Cloud consumers are people or groups within a business unit who use different cloud services to complete tasks.
Serverless platforms look after virtual machine and container management.
Serverless components also take care of multithreading and hardware allocation.

7) What are the Cloud Storage Levels?


Files, Blocks, Datasets, Objects

8) What are cloud-enabling technologies?


Broadband Networks, Virtualization, Data Center, Web Technology, Multitenant Technology, Service Technology

9) Microservices is a way of building applications where each piece of code operates independently from others
and from the platform it's built on.

10) What is API Gateway?


An API gateway is a data-plane component that acts as the single point of entry for a system. It receives API calls
from clients and routes them to the target applications and services.

11) In cloud computing, encapsulation (containerization) means packaging software code together with all of its
dependencies so that it runs consistently both in the cloud and on-premises.

12) What are the Cloud Storage Levels?


Files – These are collections of data that are grouped into files that are located in folders.
Blocks – A block is the smallest unit of data that is individually accessible. It is the lowest level of storage and the
closest to the hardware.
Datasets – Data organized into a table-based, delimited, or record format.
Objects – Data and its associated metadata are organized as web-based resources.

13) Serverless components in cloud computing simplify application development by eliminating the need to
manage infrastructure. With serverless, you can write code without provisioning servers. The serverless platform
handles tasks like virtual machine and container management, as well as multithreading and hardware allocation.

14) Cloud computing is the delivery of computing services, such as storage, networking, servers, databases,
software, and analytics, over the internet.

15) What are the benefits of cloud computing?


a) Data backup and data storage. b) Powerful server capabilities. c) Increased productivity.
d) Cost-effective and time-saving. e) Access to software delivered as a service (SaaS).

16) What are the open-source cloud computing platform databases?


MongoDB, CouchDB, LucidDB

17) What do you mean by VPN? What does it contain?


VPN stands for Virtual Private Network. A VPN creates an encrypted tunnel over a public network, which secures
data during communication in the cloud environment. With a VPN, you can use a public network as if it were a
private network.

18) List some Cloud-Enabling technologies.


Broadband Networks, Virtualization, Data Center, Web Technology

=======================================================================================

Talend:
=======

What is Talend?
Talend is an open-source data integration platform that provides solutions for data integration and data
management. Talend Open Studio is an open-source ETL tool used for data integration and Big data.

What is the full name of Talend?


Talend Open Studio

What is Talend Open Studio?


Talend Open Studio is an open-source ETL tool used for data integration and Big data. It is built on the Eclipse
development platform.

What are the components in Talend Open Studio?


A component is a functional unit that is used to perform a single operation in Talend.

Talend Open Studio is written in which computer language?


Java

Why is Talend called a code generator?


Talend is called a code generator because it offers a GUI that lets the user drag and drop components to
create a Job. Talend then translates these Jobs into Java code.

Define tMap.
tMap is an advanced component that integrates itself as a plugin to Talend Studio. It transforms and routes data
from one or more sources to one or more destinations.

What is the function of tJava?


tJava allows the user to enter personalized code to integrate into the Talend program. This code is executed
only once.

What is the function of tDenormalizeSortedRow?


tDenormalizeSortedRow combines all sorted input rows into groups.

What is tJoin?
tJoin joins two tables by doing an exact match on several columns.

Differentiate between ‘Built-in’ and ‘Repository’.


Built-in                                  Repository
===================================================================================
1. Stored locally inside a Job            1. Stored centrally inside the Repository
2. Can be used by the local Job only      2. Can be used globally by any Job within a project
3. Can be updated easily within a Job     3. Data is read-only within a Job

A scheduler is software that picks processes from a queue and loads them into memory for execution. Talend
doesn't come with its own scheduler.

Define a project in Talend?


In Talend Studio, the highest physical structure used for storing several kinds of data integration jobs, routines,
metadata, etc., is known as Project.

Define Context variable in Talend?


Context variables are user-defined parameters whose values are passed to a Job at runtime.

What is the difference between the -Xmx and -Xms parameters?


The -Xmx parameter specifies the maximum heap size of the Java virtual machine, whereas the -Xms parameter
specifies the initial heap size.

What is the function of tJavaFlex?


tJavaFlex allows the user to add personalized code to integrate into the Talend program.

What is the function of tJava?


tJava allows the user to enter personalized code to integrate into the Talend program.

What is the use of tContextLoad?


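tContextLoad is used to load a context from a flow. This component performs two checks: it warns when a
parameter in the incoming flow is not defined in the context, and when a context variable is not initialized by the flow.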

What are the ways to improve the performance of a Job in Talend?


Use of Talend ELT components when it is required
Remove unnecessary records using the tFilterRow component
Use of Select Query to retrieve data from the DB
Split Talend Job into smaller SubJobs
Remove unnecessary fields or columns using the tFilterColumns component
Use of Database bulk components

Which component is used to sort data?


tSortRow, tExternalSortRow

What is the default pattern of a Date column in Talend?


By default, the date pattern for a column of type Date in a schema is “dd-MM-yyyy”.

How can you normalize delimited data in Talend Open Studio?


By using the tNormalize component

======================================================================================
