NGT Paper
Ans.
MUQuestionPapers.com
a. Consistency means that the data remains consistent after any operation is performed that changes
the data, and that all users or clients accessing the application see the same updated data.
b. Availability means that the system is always available to respond to requests.
c. Partition Tolerance means that the system will continue to function even if it is partitioned into
groups of servers that are not able to communicate with one another.
• The CAP theorem states that at any point in time a distributed system can fulfil only two of the
above three guarantees.
e. Write a short note on Non-Relational Approach. (5)
Ans.
− Non-relational databases are databases that do not follow the relational database model.
− They are also known as NoSQL databases and are growing in popularity as a result of the rise of big data and the need to handle the great volume, variety, and velocity of data.
− Traditional RDBMS platforms provide scalability using a scale-up approach, which requires a faster
server to increase performance.
The following issues in RDBMS systems explain why MongoDB and other NoSQL databases were designed the way they are:
• In order to scale out, the RDBMS database needs to link the data available in two or more systems in
order to report back the result. This is difficult to achieve in RDBMS systems since they are designed to
work when all the data is available for computation together. Thus the data has to be available for
processing at a single location.
• In case of multiple Active-Active servers, when both are getting updated from multiple sources there is
a challenge in determining which update is correct.
• When an application tries to read data from the second server, and the information has been updated
on the first server but has yet to be synchronized with the second server, the information returned may
be stale.
− The MongoDB team decided to take a non-relational approach to solving these problems.
− MongoDB stores its data in BSON documents where all the related data is placed together, which
means everything is in one place.
− The queries in MongoDB are based on keys in the document, so the documents can be spread across
multiple servers.
− When a query is run, each server checks its own set of documents and returns its results. This enables linear scalability and improved performance.
− MongoDB has a primary-secondary replication where the primary accepts the write requests.
− If the write performance needs to be improved, then sharding can be used; this splits the data across
multiple machines and enables these multiple machines to update different parts of the datasets.
− Sharding is automatic in MongoDB; as more machines are added, data is distributed automatically.
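The key-based spread of documents described above can be sketched in plain JavaScript (an illustrative model only, not MongoDB's actual routing code; the server count and the modulo "hash" are assumptions for the sketch):

```javascript
// Three "servers", each holding its own set of documents.
const servers = [[], [], []];

// Place a document on a server chosen from its key (illustrative modulo "hash").
function insertDoc(doc) {
  servers[doc.key % servers.length].push(doc);
}

// Each server checks only its own documents; the partial results are merged.
function findDocs(predicate) {
  const results = [];
  for (const docs of servers) {
    for (const doc of docs) {
      if (predicate(doc)) results.push(doc);
    }
  }
  return results;
}

for (let key = 0; key < 9; key++) insertDoc({ key, value: key * 10 });
const evens = findDocs((doc) => doc.key % 2 === 0);
console.log(evens.length); // 5 documents with even keys
```

Because each server answers only for its own documents, adding servers spreads the work, which is the linear scalability the text refers to.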
1. Big Data in Healthcare
a. Big Data and healthcare are an ideal match: the amount of data the healthcare industry has to deal with is enormous, and few fields stand to benefit from it more.
b. Identifying unusual patterns in the use of certain medicines to discover more economical treatment solutions is a common practice these days.
2. Big Data in Education
a. Big Data is the key to shaping the future of the people and has the power to transform the
education system for better.
b. Big Data is providing assistance in evaluating the performances of both the teachers as well as
the students.
c. Some of the top universities are using Big Data as a tool to renovate their academic curriculum.
3. Big Data in E-commerce
a. That some of the biggest e-commerce companies in the world, such as Amazon, Flipkart, and Alibaba, now depend on Big Data and analytics is itself evidence of the popularity Big Data has gained in recent times.
b. Big Data's recommendation engine is one of the most impressive applications the Big Data world has ever witnessed.
c. It furnishes companies with a 360-degree view of their customers.
4. Big Data in Media and Entertainment
a. The media and entertainment industry is all about art, and Big Data has found a natural place in it.
b. Earlier, companies broadcast ads randomly, without any kind of analysis.
c. With the advent of Big Data analytics, companies now know which kinds of ads attract a customer and the most appropriate time to broadcast them for maximum attention.
5. Big Data in Finance
a. The functioning of any financial organization depends heavily on its data and to safeguard that
data is one of the toughest challenges any financial firm faces. Data has been the second most
important commodity for them after money.
b. Big Data now drives key areas of financial firms such as fraud detection, risk analysis, algorithmic trading, and customer satisfaction.
6. Big Data in Travel Industry
a. With Big Data and analytics, travel companies are now able to offer a more customized travelling experience and to understand their customers' requirements in a much better way.
b. From providing them with the best offers to be able to make suggestions in real-time, Big Data
is certainly a perfect guide for any traveller.
c. Discuss the points to be considered while importing data in a sharded environment. (5)
Ans. The points to be considered while importing data in a sharded environment are:
• Pre-Splitting of the Data :
o Instead of leaving the choice of chunk creation to MongoDB, you can tell MongoDB how to do so using the following command:
o db.runCommand( { split : "practicalmongodb.mycollection" , middle : { shardkey : value } } );
o After this you can also let MongoDB know which chunks go to which node. For all this you will need knowledge of the data you will be importing into the database.
• Deciding on the Chunk Size :
o You need to keep the following points in mind when deciding on the chunk size:
o If the size is too small, the data will be distributed evenly, but the chunks will end up migrating more frequently, which is an expensive operation at the mongos layer.
o If the size is large, there will be fewer migrations, reducing the expense at the mongos layer, but you will end up with uneven data distribution.
Ans.
− In MongoDB, the _id field acts as the primary key for the collection, so that each document can be uniquely identified in the collection. The _id field contains a unique ObjectId value.
− By default, when inserting documents into the collection, if you don't include a field named _id, MongoDB automatically adds an ObjectId field.
− In the following example, you will see how to explicitly specify the _id field when inserting the
documents within a collection.
− While explicitly specifying the _id field, you have to keep in mind the uniqueness of the field; otherwise
the insert will fail.
• The following command explicitly specifies the _id field:
>db.users.insert({"_id":10, "Name": "explicit id"})
• The insert operation creates the following document in the users collection:
{ "_id" : 10, "Name" : "explicit id" }
• This can be confirmed by issuing the following command:
>db.users.find()
− When you query the documents in a collection, you can see the ObjectId for each document in the
collection.
− If you want to ensure that MongoDB does not create the _id field automatically and want to specify your own id as the _id of the document, you need to define this explicitly while inserting the document.
− When explicitly creating an id field, it needs to be created with _id as its name.
• Example :
db.Employee.insert({_id:10, "EmployeeName" : "Smith"})
• Secondary Indexes :
o All user-created indexes in MongoDB, i.e. those created using ensureIndex(), are termed secondary indexes.
o These indexes can be created on any field in the document or the sub document.
o These indexes can be created on a field that is holding a sub-document.
o These indexes can either be created on a single field or a set of fields. When created with
set of fields, it’s also termed a compound index .
o If the index is created on a field that holds an array as its value, then a multikey index is used
for indexing each value of the array separately.
o Multikey compound indexes can also be created. However, at any point, only one field of
the compound index can be of the array type.
• Unique Indexes :
o When you need to ensure uniqueness of the values being stored in the indexed field, you can create an index with the unique property set to true.
o The following command can be run to create the unique index:
db.payroll.ensureIndex( { "userid": 1 }, { unique: true } )
• Sparse Indexes :
o A sparse index is an index that holds entries only for the documents within a collection that contain the field on which the index is created.
o The index is said to be sparse because it includes documents that have the indexed field and skips the documents in which the field is missing.
o In a non-sparse index, by contrast, a null value is stored for documents that are missing the field.
• Geospatial Indexes :
o MongoDB provides geospatial indexes . To create a geospatial index, a coordinate pair in the
following forms must exist in the documents:
o Either an array with two elements
o Or an embedded document with two keys (the key names can be anything).
• Geohaystack Indexes :
o Geohaystack indexes are bucket-based geospatial indexes (also called geospatial haystack
indexes ).
o They are useful for queries that need to find out locations in a small area and also need to
be filtered along another dimension, such as finding documents with coordinates within 10
miles and a type field value as restaurant.
1. Range-based partitioning
− In range-based partitioning , the shard key values are divided into ranges.
− The values are considered as a straight line starting from a Min value to Max value where Min is the
starting period (say, 01/01/1970) and Max is the end period (say, 12/31/9999).
− Every document in the collection will have timestamp value within this range only, and it will represent
some point on the line.
− Based on the number of shards available, the line will be divided into ranges, and documents will be
distributed based on them.
− The documents where the values of the shard key are nearby are likely to fall on the same shard. This
can significantly improve the performance of the range queries.
2. Hash-based partitioning:
− In hash-based partitioning , the data is distributed on the basis of the hash value of the shard field.
− If selected, this will lead to a more random distribution compared to range-based partitioning.
− It’s unlikely that documents with close shard key values will be part of the same chunk.
− For example, for ranges based on the hash of the _id field, there will be a straight line of hash values,
which will again be partitioned on basis of the number of shards.
For example, if you create a collection called users, it will be stored in the collection-0--2259994602858926461 files and the associated indexes will be stored in index-1--2259994602858926461, index-2--2259994602858926461, and so on.
− WiredTiger uses the traditional B+ tree structure for storing and managing data but that’s where the
similarity ends.
− Unlike B+ tree, it doesn’t support in-place updates. WiredTiger cache is used for any read/write
operations on the data.
− The trees in cache are optimized for in-memory access.
➢ Advantages of the WiredTiger Storage Engine
• Efficient storage due to multiple compression technologies such as snappy, gzip, and prefix compression.
• It is highly scalable with concurrent reads and writes, which in turn improves throughput and overall database performance.
• It assures data durability with a write-ahead log and the use of checkpoints.
• Optimal memory usage: WiredTiger uses both an internal cache and the file system cache.
• Storage :
a. MongoDB can use SSDs (solid state drives ) or local attached storage.
b. Since MongoDB’s disk access patterns don’t have sequential properties, SSD usage can enable customers to experience substantial performance gains.
c. Another benefit of using SSDs is that if the working set no longer fits in memory, they provide a graceful degradation of performance.
d. Most MongoDB deployments should use RAID-10. When using the WiredTiger storage engine, the use of an XFS file system is highly recommended due to performance issues that EXT4 exhibits with WiredTiger.
• CPU :
a. Since MongoDB with a MMAPv1 storage engine rarely encounters workloads needing a large number
of cores, it’s preferable to use servers with a faster clock speed than the ones with multiple cores but
slower clock speed.
b. The WiredTiger storage engine is CPU bound, so using a server with multiple cores will offer a
significant performance improvement.
{
"filename": "test.txt",
"chunkSize": NumberInt(261120),
"uploadDate": ISODate("2014-04-13T11:32:33.557Z"),
"md5": "7b762939321e146569b07f72c62cca4f",
"length": NumberInt(646)
}
The document specifies the file name, chunk size, uploaded date, and length.
➢ Example of fs.chunks document −
{
"files_id": ObjectId("534a75d19f54bfec8a2fe44b"),
"n": NumberInt(0),
"data": "Mongo Binary Data"
}
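From the two example documents above, the number of fs.chunks documents for a file follows directly from its length and chunkSize (a small sketch; the 1 MB file in the second case is a hypothetical example):

```javascript
// Metadata from the fs.files example: 646 bytes, 255 KB (261120-byte) chunks.
const fileDoc = { length: 646, chunkSize: 261120 };

// GridFS splits the file's data into ceil(length / chunkSize) chunks.
const numChunks = Math.ceil(fileDoc.length / fileDoc.chunkSize);
console.log(numChunks); // 1 -- the whole 646-byte file fits in a single chunk

// A hypothetical 1 MB file would be split across several chunks.
const largeFile = { length: 1000000, chunkSize: 261120 };
console.log(Math.ceil(largeFile.length / largeFile.chunkSize)); // 4
```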
− Now, we will store an mp3 file using GridFS using the put command. For this, we will use
the mongofiles.exe utility present in the bin folder of the MongoDB installation folder.
− Open your command prompt, navigate to the mongofiles.exe in the bin folder of MongoDB installation
folder and type the following code –
>mongofiles.exe -d gridfs put song.mp3
➢ To see the file's document in database, you can use find query −
>db.fs.files.find()
➢ We can also see all the chunks present in fs.chunks collection related to the stored file with the following
code, using the document id returned in the previous query −
>db.fs.chunks.find({files_id:ObjectId('534a811bf8b4aa4d33fdf94d')})
f. Discuss how data is written using Journaling. (5)
Ans.
− In the MongoDB system, mongod is the primary daemon process.
− The disk has the data files and the journal files.
− When mongod is started, the data files are mapped to a shared view, i.e., a virtual address space.
− For example, a data file that is 2,000 bytes might be mapped to the memory address range 1,000,000 to 1,002,000.
− The data is not actually loaded at this point; it is just mapped.
− In this scenario, the journaling is not yet enabled.
− When journaling is enabled, a second mapping is made to a private view by the mongod.
− The data file is not directly connected to the private view, so the changes will not be flushed from the
private view to the disk by the OS.
− When a write operation is initiated, it first writes to the private view.
− The journal keeps appending the change description as and when it gets the change.
− If the mongod fails at this point, the journal can replay all the changes even if the data file is not yet
modified.
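The write path described above can be modelled as a toy sketch in JavaScript (purely illustrative; MongoDB's real journaling works at the level of memory-mapped files, not JavaScript objects):

```javascript
// Toy model of journaled writes: changes go to a private view and are
// appended to a journal; if the process fails before the shared view is
// flushed to the data file, replaying the journal recovers the changes.
let sharedView = { balance: 100 };     // what is mapped from the data file
let privateView = { ...sharedView };   // writable private copy
const journal = [];                    // append-only change descriptions

function write(field, value) {
  privateView[field] = value;          // 1. the write hits the private view
  journal.push({ field, value });      // 2. the change description is journaled
}

function replayJournal(view) {
  // After a crash, the journal is replayed against the on-disk data.
  for (const { field, value } of journal) view[field] = value;
  return view;
}

write("balance", 250);
// Simulate a crash before the data file was modified:
const recovered = replayJournal({ balance: 100 });
console.log(recovered.balance); // 250
```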
4. Attempt any three of the following:
1. Data can be loaded into resilient distributed datasets (RDDs) from external sources, including relational databases, or from a distributed file system such as HDFS.
2. Spark provides high-level methods for operating on RDDs and outputting new RDDs; these operations include joins and aggregations.
3. Spark data may be persisted to disk in a variety of formats.
1. OLTP applications work with the database in the usual manner; data is maintained in disk files but cached in memory.
2. An OLTP application primarily reads and writes from memory, but any committed transactions are written immediately to the transaction log on disk.
3. When required or as configured, row data is loaded into a columnar representation for use by analytic applications.
4. Any transactions that are committed after the data is loaded into columnar format are recorded in a journal, and analytic queries consult the journal to determine whether they need to read updated data from the row store or possibly rebuild the columnar structure.
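The interplay between the columnar copy and the journal described above can be sketched as a toy model (illustrative only; the names and structures are assumptions, not any vendor's implementation):

```javascript
// Row store with two rows at the time the columnar copy is built.
const rowStore = [
  { id: 1, amount: 10 },
  { id: 2, amount: 20 },
];

// Load the current rows into a columnar representation.
const columnar = {
  id: rowStore.map((r) => r.id),
  amount: rowStore.map((r) => r.amount),
};
const journal = [];

// A transaction committed after the columnar load is recorded in the journal.
function commitRow(row) {
  rowStore.push(row);
  journal.push(row);
}

// An analytic query merges the columnar data with the journaled updates.
function totalAmount() {
  const base = columnar.amount.reduce((a, b) => a + b, 0);
  const delta = journal.reduce((a, r) => a + r.amount, 0);
  return base + delta;
}

commitRow({ id: 3, amount: 30 });
console.log(totalAmount()); // 60: 10 + 20 from columnar, 30 from the journal
```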
i. DOM manipulation :
jQuery makes it easy to select DOM elements, traverse them, and modify their content by using the cross-browser open source selector engine called Sizzle.
ii. Event handling :
jQuery offers an elegant way to capture a wide variety of events, such as a user clicking on a link, without the need to clutter the HTML code itself with event handlers.
iii. AJAX Support :
jQuery helps you a lot to develop a responsive and feature-rich site using AJAX technology.
iv. Animations :
jQuery comes with plenty of built-in animation effects which you can use in your websites.
v. Lightweight :
jQuery is a very lightweight library, about 19KB in size (minified and gzipped).
vi. Cross Browser Support :
jQuery has cross-browser support, and works well in IE 6.0+, FF 2.0+, Safari 3.0+, Chrome, and Opera 9.0+.
vii. Latest Technology :
jQuery supports CSS3 selectors and basic XPath syntax.
− While SSD continues to be an increasingly economic solution for small databases or for
performance-critical systems, it is unlikely to become a universal solution for massive databases,
especially for data that is infrequently accessed.
− We are, therefore, likely to see combinations of solid state disk, traditional hard drives, and
memory providing the foundation for next-generation databases.
e. Explain the jQuery DOM Filter Methods. (5)
Ans.
− jQuery is a very powerful tool which provides a variety of DOM traversal methods to select elements in a document randomly or in sequential order.
− Most of the DOM traversal methods do not modify the elements; instead, they filter them out based on the given conditions.
• The filter() method is used to filter out all the elements that do not match the selected criteria; the elements that match are returned.
• Syntax:
$(selector).filter(criteria, function(index))
• Parameters:
criteria : It specifies a selector expression, a jQuery object or one or more elements to be returned
from a group of selected elements.
To specify more than one criterion, use a comma.
function(index) : It specifies a function to run for each element in the set. If the function returns
true, the element is kept. Otherwise, it is removed.
Example:
<html>
<head>
<title>The JQuery Example</title>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script>
$(document).ready(function() {
// Keep only the even-indexed list items and highlight them
$("li").filter(":even").css("background-color", "yellow");
});
</script>
</head>
<body>
<div>
<ul>
<li>list item 1</li>
<li>list item 2</li>
<li>list item 3</li>
<li>list item 4</li>
<li>list item 5</li>
<li>list item 6</li>
</ul>
</div>
</body>
</html>
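The semantics of filter() with the function(index) form can be illustrated in plain JavaScript, using an array standing in for the matched set of list items above (a sketch of the behaviour only, not jQuery's implementation):

```javascript
// A plain array standing in for the set of <li> elements matched by $("li").
const items = [
  { tag: "li", text: "list item 1" },
  { tag: "li", text: "list item 2" },
  { tag: "li", text: "list item 3" },
  { tag: "li", text: "list item 4" },
  { tag: "li", text: "list item 5" },
  { tag: "li", text: "list item 6" },
];

// Like $("li").filter(function(index) { return index % 2 === 0; }):
// the element is kept when the function returns true, removed otherwise.
const kept = items.filter((item, index) => index % 2 === 0);
console.log(kept.map((i) => i.text)); // items 1, 3 and 5
```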
Examples:
• moving a mouse over an element
• selecting a radio button
• clicking on an element
The term "fires/fired" is often used with events. Example: "The keypress event is fired, the moment
you press a key".
• click() : The function is executed when the user clicks on the HTML element.
− Example : $("p").click(function(){
$(this).hide();
});
• dblclick() : The function is executed when the user double-clicks on the HTML element.
− Example : $("p").dblclick(function(){
$(this).hide();
});
• mouseenter() : The function is executed when the mouse pointer enters the HTML element.
− Example : $("#p1").mouseenter(function(){
alert("You entered p1!");
});
• mouseleave() : The function is executed when the mouse pointer leaves the HTML element.
− Example : $("#p1").mouseleave(function(){
alert("Bye! You now leave p1!");
});
• mousedown() : The function is executed, when the left, middle or right mouse button is pressed
down, while the mouse is over the HTML element.
− Example : $("#p1").mousedown(function(){
alert("Mouse down over p1!");
});
• a string
• a number
• an object (JSON object)
• an array
• a boolean
• null
string, number, boolean, and null are simple or primitive datatypes, whereas object and array are referred to as complex or structured datatypes.
JSON values cannot be of the following datatypes:
• a function
• a date
• undefined
➢ JSON Strings :
o Strings in JSON must be written in double quotes.
o Example : { "name":"John" }
➢ JSON Numbers :
o Numbers in JSON must be an integer or a floating point.
o Example : { "age":30 }
➢ JSON Objects :
o Values in JSON can be objects.
o It is set of name or value pairs inserted between { } (curly braces).
o Example : {
"employee":{ "name":"John", "age":30, "city":"New York" }
}
o Objects as values in JSON must follow the same rules as JSON objects.
➢ JSON Arrays :
o It is an ordered collection of values and begins with [(left bracket) and ends with ] (right
bracket).
o The values of array are separated by ,(comma).
o Example : {
"employees":[ "John", "Anna", "Peter" ]
}
➢ JSON Booleans :
o This datatype can be either true/false.
o Example : { "sale":true }
➢ JSON null :
o It just defines a nullable value.
o Example : { "middlename":null }
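The datatypes above can be seen together in one document; JSON.parse in JavaScript maps each JSON type to the corresponding native value (a small self-contained sketch combining the examples above):

```javascript
// One JSON document exercising every supported datatype from the list above.
const text = `{
  "name": "John",
  "age": 30,
  "employee": { "city": "New York" },
  "employees": ["John", "Anna", "Peter"],
  "sale": true,
  "middlename": null
}`;

const doc = JSON.parse(text);
console.log(typeof doc.name);              // "string"
console.log(typeof doc.age);               // "number"
console.log(typeof doc.employee);          // "object"
console.log(Array.isArray(doc.employees)); // true
console.log(doc.sale, doc.middlename);     // true null
```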
There are several validators currently available for different programming languages. Currently the most
complete and compliant JSON Schema validator available is JSV.
"properties": {
"id": {
"description": "The unique identifier for a product",
"type": "integer"
},
"name": {
"description": "Name of the product",
"type": "string"
},
"price": {
"type": "number",
"minimum": 0,
"exclusiveMinimum": true
}
},
• $schema : The $schema keyword states that this schema is written according to the draft v4
specification.
• title : You will use this to give a title to your schema.
• description : A little description of the schema.
• type : The type keyword defines the first constraint on our JSON data: it has to be a JSON Object.
• properties : Defines various keys and their value types, minimum and maximum values to be used in
JSON file.
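As an illustration of how the type, minimum, and exclusiveMinimum keywords above constrain instances, here is a hand-rolled check in JavaScript (a sketch only; it is not the JSV validator mentioned earlier, and the function name is an assumption):

```javascript
// Checks a product document against the schema fragment above:
// id must be an integer, name a string, and price a number with
// minimum 0 and exclusiveMinimum true (i.e. strictly greater than 0).
function validateProduct(doc) {
  const errors = [];
  if (!Number.isInteger(doc.id)) errors.push("id must be an integer");
  if (typeof doc.name !== "string") errors.push("name must be a string");
  if (typeof doc.price !== "number" || doc.price <= 0) {
    errors.push("price must be a number greater than 0");
  }
  return errors;
}

console.log(validateProduct({ id: 1, name: "Widget", price: 9.99 })); // []
console.log(validateProduct({ id: 1.5, name: "Widget", price: 0 }));  // two errors
```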
6. JSON files are very easy to read as compared to XML, whose documents are comparatively difficult to read and interpret.
7. JSON doesn’t use end tags, whereas XML has start and end tags.
d. How do we do encoding and decoding JSON in Python. (5)
Ans.
➢ Encoding JSON in Python (encode):
− The demjson encode() function encodes a Python object into a JSON string representation.
Syntax
demjson.encode(self, obj, nest_level=0)
Example
import demjson
data = { 'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4, 'e' : 5 }
json = demjson.encode(data)
print(json)
➢ Decoding JSON in Python (decode):
− The demjson decode() function converts a JSON-encoded string back into a Python object.
Example
import demjson
json = '{"a":1,"b":2,"c":3,"d":4,"e":5}'
text = demjson.decode(json)
print(text)
Ans. JSON syntax is basically considered as a subset of JavaScript syntax; it includes the following:
• Data is represented in name/value pairs.
• Curly braces hold objects and each name is followed by ':'(colon), the name/value pairs are separated
by , (comma).
• Square brackets hold arrays and values are separated by ,(comma).
{
"book":
[
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "07",
"language": "C++",
"edition": "second",
"author": "E.Balagurusamy"
}
]
}
JSON supports the following two data structures −
• Collection of name/value pairs − This Data Structure is supported by different programming
languages.
• Ordered list of values − It includes array, list, vector or sequence etc.
− The implementations of the two structures are represented in the form of objects and arrays.
− Crockford outlines the two structural representations of JSON through a series of syntax diagrams.
Ans.
− Over the years, many a developer has needed to string together the isolated requests made to a common server, in order to facilitate things such as shopping carts for e-commerce.
− One of the technologies that was forged from this requirement brought forth a technique that we
will leverage in order to achieve the persistence of JSON. That technology is the HTTP cookie.
− The HTTP cookie, or cookie for short, was created as a means to string together the actions taken
by the user per “isolated” request and provide a convenient way to persist the state of one page
into that of another.
− The cookie is simply a chunk of data that the browser has been notified to retain. Furthermore, the
browser will have to supply, per subsequent request, the retained cookie to the server for the
domain that set it, thereby providing state to a stateless protocol.
− The cookie can be utilized on the client side of an application with JavaScript.
− Additionally, it is available to the server, supplied within the header of each request made by the
browser.
− The header can be parsed for any cookies and made available to server-side code. Cookies provide
both front-end and back-end technologies the ability to collaborate and reflect the captured state,
in order to properly handle each page view or request accordingly.
− The ability to continue to progress the state from one page to another allows each action to no
longer be isolated and, instead, occur within the entirety of the user’s interaction with a web site.
Syntax:
− The cookie is simply a string of ASCII encoded characters composed of one or more attribute-value
pairs, separated by a semicolon (;) token.
− Key/value pairs intended to be persisted as a cookie must both consist of valid ASCII characters.
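A cookie string of attribute-value pairs separated by semicolons can be parsed with a few lines of JavaScript (a sketch; real cookie handling must also deal with attributes such as expires and path, and the example cookie values are made up):

```javascript
// Parse "name=value; name=value" pairs into a plain object.
function parseCookie(cookieString) {
  const result = {};
  for (const pair of cookieString.split(";")) {
    const [name, ...rest] = pair.trim().split("=");
    if (name) result[name] = rest.join("="); // value may itself contain "="
  }
  return result;
}

// A hypothetical cookie holding a URL-encoded JSON value and a plain value.
const parsed = parseCookie("cart=%7B%22items%22%3A2%7D; theme=dark");
console.log(parsed.theme);                    // "dark"
console.log(decodeURIComponent(parsed.cart)); // '{"items":2}'
```

Storing JSON this way is exactly the persistence technique the passage describes: the serialized state rides along in the cookie header on every subsequent request.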