Uploaded by Suhani Modi


Question 1: Fill in the Blanks

1. Additional Vs related to Big Data analysis: Variability, Visualization, Viability

2. The object is the first-class citizen in the object-oriented paradigm.

3. In Big Data analysis, SPAS stands for: Scalability, Performance, Availability, Security

4. The two main components the Hadoop ecosystem initially relied on: HDFS, MapReduce

Question 2: True/False

1. True - The reduce phase runs after the map outputs have been sorted by key.

2. True - Pattern extraction and decision-making occur in the Intelligent phase.

3. True - Pig enables complex job creation in Hadoop.

4. True - NoSQL systems are suited for real-time applications and may
support SQL-like languages.

5. False - Network bandwidth depends on node proximity: nodes on the same rack have higher bandwidth between them than nodes on different racks.
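The sort-then-reduce flow in item 1 can be illustrated with a toy word count in plain Python; this is only a sketch of the map → sort → reduce pipeline, not real Hadoop code, and the input lines are invented for the example:

```python
from itertools import groupby
from operator import itemgetter

# Map phase: emit (word, 1) pairs from each input line.
lines = ["big data", "big deal"]
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort: Hadoop sorts map output by key before any reducer runs.
mapped.sort(key=itemgetter(0))

# Reduce phase: sorting has grouped equal keys, so each group is summed once.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}
```

Because the pairs are sorted first, each reducer sees all values for one key contiguously, which is exactly why the reduce step only starts after the sort.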

Question 3: MongoDB Insertion and Command Result


1. Insert command for new record:

db.city.insertOne({ _id: 102, city: "TORONTO", passengers: 800 })

2. Result of db.city.find({ name: { $regex: "(?i)t(?i)oronto" } }):

• It retrieves all documents whose name field matches “toronto” case-insensitively, including Toronto, toronto, and TORONTO.
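The effect of the inline (?i) flag can be sketched with Python's re module standing in for MongoDB's regex engine (in Python the flag must lead the pattern, so the cleaner form "(?i)toronto" is used here):

```python
import re

# "(?i)" at the start of the pattern turns on case-insensitive matching,
# so every capitalization variant of the name matches.
pattern = re.compile(r"(?i)toronto")

variants = ["Toronto", "toronto", "TORONTO", "Ottawa"]
matches = [v for v in variants if pattern.search(v)]
```

All three spellings of Toronto match while "Ottawa" does not, mirroring the behavior described for the find() query.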

Question 4: Difference between NFS and HDFS

• NFS (Network File System): A protocol for sharing files over a local network, typically from a single server; it lacks distributed fault tolerance.

• HDFS (Hadoop Distributed File System): Designed for distributed data storage; it includes redundancy (block replication) for fault tolerance and is optimized for large-scale data processing across clusters.

Question 5: Java Class Tester.java

1. Pig Command to Register the Jar:

REGISTER 'NumTester.jar';

2. Output of Code:

• It outputs True or False for each value of name, indicating whether that value is a prime number.
3. Functionality of Tester.java:

• It checks if an integer is a prime number and returns a boolean.
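The behavior described for Tester.java can be sketched in Python; since the original source is not shown, this is a hypothetical stand-in for the UDF's logic, not the actual class:

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, mirroring the behavior described for the UDF."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:  # trial division up to sqrt(n) is sufficient
        if n % i == 0:
            return False
        i += 1
    return True
```

Registered as a Pig UDF, a function with this logic would emit True/False for each input value, as described above.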

Question 6: Using HIVE

1. Create tables for documents:

CREATE TABLE doc1 (line STRING);
CREATE TABLE doc2 (line STRING);

2. Load documents into tables:

LOAD DATA INPATH 'hdfs_path/doc1.txt' INTO TABLE doc1;
LOAD DATA INPATH 'hdfs_path/doc2.txt' INTO TABLE doc2;

3. Create a table with the common words in doc1 and doc2. The tables store whole lines, not words, so each line must first be split into words (INTERSECT requires Hive 2.3+):

CREATE TABLE common_words AS
SELECT explode(split(line, ' ')) AS word FROM doc1
INTERSECT
SELECT explode(split(line, ' ')) AS word FROM doc2;

Question 7: HDFS Command Explanation

• The command hadoop fs -setrep 2 -R -w /user/hadoop/ sets the replication factor of the files under /user/hadoop/ to 2; -R applies the change recursively, and -w makes the command wait until replication has completed.

Question 8: Improving GenAI Models with Big Data

• Big Data analysis enables enhanced model training, data-diversity management, and scalable data pipelines for continuous model learning and improvement. Training on extensive, diverse datasets helps refine these models, improving context understanding and performance.

Question 9: Apriori Algorithm for Association Rules

1. Steps:

• Pass 1: Identify frequent items meeting the support threshold.

• Pass 2: Combine frequent items from Pass 1 into pairs and test their frequency.

• Subsequent passes: Continue growing the itemset size until no additional frequent itemsets meet the support threshold.

2. Association Rules:

• Rules such as {bread, milk} -> butter with 75% confidence are derived by computing confidence = support({bread, milk, butter}) / support({bread, milk}).
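The confidence calculation can be sketched in Python; the transaction set below is invented purely to reproduce the 75% figure from the example rule:

```python
# Hypothetical transactions chosen so that confidence({bread, milk} -> butter) = 0.75.
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]

def support(itemset, txns):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in txns) / len(txns)

# confidence(X -> Y) = support(X ∪ Y) / support(X)
confidence = (support({"bread", "milk", "butter"}, transactions)
              / support({"bread", "milk"}, transactions))
```

Here support({bread, milk, butter}) = 3/4 and support({bread, milk}) = 4/4, giving a confidence of 0.75.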
