
Hadoop Interview Questions

The document covers several topics: checking whether services are up and running in Unix, incremental load options in Hive, importing data from MySQL into HDFS with Sqoop, and a comparison of the Parquet and ORC file formats. It summarizes Parquet as a columnar storage format well suited to analytical querying and reads, while noting that ORC's indexes are used to select stripes and row groups rather than to answer queries directly.

Uploaded by Matthew Reach

1. How do you check that services are up and running in Unix?

Look for the process with ps -ef | grep <service>, or ask the service manager directly with systemctl status <service> (systemd hosts) or service <service> status. Prefix the command with sudo when the check requires elevated privileges.
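The process-listing approach above can be wrapped in a small portable helper. This is a minimal sketch for any POSIX shell; the function name and the service name "sshd" are illustrative, not from the source:

```shell
#!/bin/sh
# check_service NAME: report whether any running process command line
# mentions NAME. "grep -v grep" drops the grep process itself from
# the listing so it does not produce a false positive.
check_service() {
    if ps -ef | grep -v grep | grep -q "$1"; then
        echo "$1 is running"
    else
        echo "$1 is NOT running"
    fi
}

check_service sshd
```

Pattern-matching process names can misfire on substrings; on systemd hosts, `systemctl is-active <service>` is more reliable because it asks the service manager instead of scanning the process table.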
2. What are the incremental load options in Hive, with examples? (reference: dwgeek.com)
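Two common patterns can be sketched in HiveQL. This is an illustrative sketch only: the table names (orders, stg_orders), columns, and the last_load_ts parameter are assumptions, and the MERGE form requires a transactional (ACID) target table.

```sql
-- Option 1: append-only incremental load. Insert only rows newer than
-- the last successful load; last_load_ts is supplied by the job.
INSERT INTO TABLE orders
SELECT id, amount, updated_at
FROM   stg_orders
WHERE  updated_at > '${hiveconf:last_load_ts}';

-- Option 2: upsert via MERGE on a transactional (ACID) table.
MERGE INTO orders t
USING stg_orders s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.amount, s.updated_at);
```

The append pattern is simpler and works on any table; MERGE handles updates to existing rows but needs ACID support enabled.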
3. How do you move data from MySQL to HDFS using Sqoop?
a. Use sqoop import (in Sqoop terms, import is RDBMS to HDFS; sqoop export goes the other way, HDFS to RDBMS).
b. Encrypt the database password, e.g. keep it in a protected --password-file or a Hadoop credential provider alias, rather than passing plain text --password on the command line.
c. Sqoop reads and decrypts the stored credential at job run time.
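A minimal sketch of such an import; this cannot run outside a Hadoop cluster, and the host, database, table, and path names are placeholders, not from the source:

```shell
# Import a MySQL table into HDFS (import = RDBMS -> HDFS in Sqoop terms).
# --password-file keeps the password out of shell history and the process list.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

--num-mappers controls how many parallel map tasks split the import; Sqoop splits the table on its primary key by default.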
4. Which is better parquet or orc?
ORC (Optimized Row Columnar) indexes are used only for the selection of stripes and row
groups and not for answering queries. AVRO is a row-based storage format
whereas PARQUET is a columnar based storage format. PARQUET is much better for
analytical querying i.e. reads and querying are much more efficient than writing
