BIG DATA & Hadoop Interview Questions With Answers
H2K Infosys provides online IT training and placement services worldwide. www.H2KINFOSYS.com USA- +1-(770)-777-1269, UK (020) 3371 7615 [email protected] / [email protected]
DISCLAIMER
H2K Infosys, LLC (hereinafter H2K) acknowledges the proprietary rights of the trademarks and products names of other companies mentioned in any of the training material including but not limited to the handouts, written material, videos, power point presentations, etc. All such training materials are provided to H2K students for learning purposes only. H2K students shall not use such materials for their private gain nor can they sell any such materials to a third party. Some of the examples provided in any such training materials may not be owned by H2K and as such H2K does not claim any proprietary rights for the same. H2K does not guarantee nor is it responsible for such products and projects. H2K acknowledges that any such information or product that has been lawfully received from third party source is free from restriction and without any breach or violation of law whatsoever.
Job Oriented Instructor Led Face2Face True Live Online I.T. Training for Everyone Worldwide www.H2KINFOSYS.com || [email protected]
3. Can you give a detailed overview about the Big Data being generated by Facebook?
Ans: As of December 31, 2012, Facebook had 1.06 billion monthly active users and 680 million mobile users. On average, 3.2 billion likes and comments are posted on Facebook every day, and 72% of the web audience is on Facebook. And why not! There is a great deal of activity on Facebook: wall posts, shared images and videos, comments, likes, and so on. Facebook started using Hadoop in mid-2009 and was one of its early users.
8. What is Hadoop?
Ans. Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.
13. Can you give examples of some companies that are using Hadoop?
Ans. Many companies use Hadoop, such as Cloudera, EMC, MapR, Hortonworks, Amazon, Facebook, eBay, Twitter, Google and so on.
14. What is the basic difference between traditional RDBMS and Hadoop?
Ans. A traditional RDBMS is used for transactional systems and to report on and archive data, whereas Hadoop is an approach to storing huge amounts of data in a distributed file system and processing it. An RDBMS is useful when you want to seek a single record from a large data set, whereas Hadoop is useful when you want to read the whole data set in one pass and analyze it later.
17. What is HDFS?
Ans. HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
store a file, it automatically gets replicated at two other locations also. So even if one or two of the systems collapse, the file is still available on the third system.
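28. Explain the input and output data formats of the Hadoop framework.
Ans: FileInputFormat, TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat and SequenceFileAsTextInputFormat are input formats in the Hadoop framework. WholeFileInputFormat is also commonly mentioned, but it is usually written as a custom format rather than taken from the core library.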
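The practical difference between two of these formats is the key/value pair each one hands to the mapper. The sketch below is a plain Python imitation, not Hadoop code: the function names are hypothetical, and it only mirrors the documented behaviour that TextInputFormat uses the line's byte offset as the key, while KeyValueTextInputFormat splits each line at the first tab.

```python
def text_input_format(data: bytes):
    """Imitate TextInputFormat: key = byte offset of the line, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next line starts after this line and its newline


def key_value_text_input_format(data: bytes, separator: str = "\t"):
    """Imitate KeyValueTextInputFormat: split each line at the first separator."""
    for raw in data.splitlines():
        key, _, value = raw.decode("utf-8").partition(separator)
        yield key, value  # value is empty when the line has no separator


sample = b"alice\t30\nbob\t25\n"
print(list(text_input_format(sample)))
print(list(key_value_text_input_format(sample)))
```

The same bytes therefore reach the mapper as (offset, line) pairs under one format and as (name, age) pairs under the other, which is why the choice of input format matters to the mapper's signature.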
37. How many methods does the Writable interface define in Hadoop?
Ans. Two: write(DataOutput) and readFields(DataInput).
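In the Java API, the Writable interface declares two methods, write(DataOutput) and readFields(DataInput): one serializes an object's fields to a stream and the other repopulates the object from a stream. The sketch below is a Python analogue of that contract, not Hadoop's own code. The class IntPairWritable and the method name read_fields are hypothetical.

```python
import io
import struct


class IntPairWritable:
    """Hypothetical Python analogue of a Hadoop Writable holding two ints."""

    def __init__(self, first: int = 0, second: int = 0):
        self.first = first
        self.second = second

    def write(self, out) -> None:
        # Serialize both fields as big-endian 32-bit ints, the byte order
        # that Java's DataOutput.writeInt uses.
        out.write(struct.pack(">ii", self.first, self.second))

    def read_fields(self, inp) -> None:
        # Repopulate this object in place from the stream.
        self.first, self.second = struct.unpack(">ii", inp.read(8))


buffer = io.BytesIO()
IntPairWritable(7, 42).write(buffer)

restored = IntPairWritable()
restored.read_fields(io.BytesIO(buffer.getvalue()))
print(restored.first, restored.second)
```

Reading into an existing object, rather than returning a new one, reflects how Hadoop reuses Writable instances between records to avoid allocating an object per record.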
38. What are sequence files and why are they important in Hadoop?
Ans: Sequence files are binary files of key-value pairs that support compression and are splittable. They are often used in high-performance MapReduce jobs.
39. What are map files and why are they important in Hadoop?
Ans: Map files are sorted sequence files that also have an index. The index allows fast data lookup.
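To see why a sorted file plus an index makes lookups fast, here is a minimal in-memory sketch. It is not Hadoop's MapFile implementation: the class TinyMapFile and its index_interval parameter are hypothetical, and the "file" is just a Python list. The idea it illustrates is that a small index of every Nth key lets a reader jump close to the record and scan only a few entries.

```python
import bisect


class TinyMapFile:
    """Hypothetical in-memory stand-in for a sorted data file with a sparse index."""

    def __init__(self, records, index_interval=2):
        self.records = sorted(records)  # the data part is sorted by key
        self.index_interval = index_interval
        # The index holds every index_interval-th key and its position.
        self.index_pos = list(range(0, len(self.records), index_interval))
        self.index_keys = [self.records[pos][0] for pos in self.index_pos]

    def get(self, key):
        # Binary-search the index for the last indexed key <= the wanted key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None  # key sorts before every stored key
        start = self.index_pos[slot]
        # Scan at most index_interval records from that position.
        for k, v in self.records[start:start + self.index_interval]:
            if k == key:
                return v
        return None


mf = TinyMapFile([("d", 4), ("a", 1), ("c", 3), ("b", 2), ("e", 5)])
print(mf.get("d"), mf.get("zz"))
```

A larger interval keeps the index smaller at the cost of a longer scan per lookup, which is the usual trade-off with a sparse index.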
44. Why would a developer create a MapReduce job without the reduce step in Hadoop?
Ans: A CPU-intensive sort and shuffle step occurs between the map and reduce steps. When a job needs no aggregation, disabling the reduce step skips that work and speeds up data processing.
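The toy runner below illustrates the point in plain Python. It is a simulation, not Hadoop: run_job, tokenize, sum_counts and keep_errors are hypothetical names. With a reducer, the mapper output has to be sorted and grouped by key first. Without one, the mapper output is the final output and that step is skipped entirely. In a real Hadoop job the equivalent switch is setting the number of reduce tasks to zero with job.setNumReduceTasks(0).

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer=None):
    """Toy MapReduce runner; passing no reducer makes it a map-only job."""
    mapped = [pair for record in records for pair in mapper(record)]
    if reducer is None:
        # Map-only: no sort, no grouping, mapper output is written as-is.
        return mapped
    # Stand-in for shuffle and sort: bring equal keys together.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def tokenize(line):
    for word in line.split():
        yield word, 1


def sum_counts(word, counts):
    yield word, sum(counts)


def keep_errors(line):
    # A pure filter needs no aggregation, so it suits a map-only job.
    if "ERROR" in line:
        yield line, None


print(run_job(["b a", "a"], tokenize, sum_counts))
print(run_job(["ERROR disk full", "INFO ok"], keep_errors))
```

Filtering, format conversion and per-record transformation are typical map-only workloads, since no record's result depends on any other record.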
46. How can you override the default input format in Hadoop?
Ans: To override the default input format, a developer has to set the new input format on the job configuration before submitting the job to a cluster.
48. What happens if the mapper output does not match the reducer input in Hadoop?
Ans: A runtime exception will be thrown and the MapReduce job will fail.
49. Can you provide multiple input paths to a MapReduce job in Hadoop?
Ans: Yes, developers can add any number of input paths.
50. Since the data is replicated thrice in HDFS, does it mean that any calculation done on one node will also be replicated on the other two?
Ans: No. Although the data is held on three nodes, a MapReduce computation runs against only one copy of the data. The master node knows exactly which nodes hold that particular data. If the chosen node does not respond, it is assumed to have failed, and only then is the required calculation run on a node holding another replica.
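The behaviour described above can be sketched as a small Python function. This is an illustration under simplifying assumptions, not Hadoop's scheduler: run_once_on_replica and the node names are hypothetical, and a node's health is modelled as simple membership in a set of live nodes.

```python
def run_once_on_replica(replica_nodes, live_nodes, task):
    """Run task on the first responsive node holding a replica, and only there."""
    for node in replica_nodes:
        if node in live_nodes:
            # The calculation happens exactly once, on this node's copy.
            return node, task()
    raise RuntimeError("no live replica available")


replicas = ["node1", "node2", "node3"]  # nodes holding copies of one block

# All nodes healthy: the work runs on the first replica only.
print(run_once_on_replica(replicas, {"node1", "node2", "node3"}, lambda: 21 * 2))

# node1 is not responding: the work moves to the next replica.
print(run_once_on_replica(replicas, {"node2", "node3"}, lambda: 21 * 2))
```

Replication therefore multiplies storage, not computation: the extra copies are only touched by a job when the node first chosen for the work is unavailable.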