Interview Questions
1. Explain your current project and your roles and responsibilities in the
project.
5. What is Spark? Explain the Spark architecture and relate it to your project.
9. Explain the difference between coalesce and repartition. Have you used them in
your project, and how?
16. If we create a new column and give it the same name as a column that already
exists in the DataFrame, what happens?
17. Explain User Defined Functions (UDFs) in Spark. Have you used them in your
project? If yes, explain how.
20. Scenario-based question: There are two DataFrames, emp and department. Write
code for a simple join between them.
22. What issues have you faced in your project, and how did you resolve them?
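For question 20, a minimal PySpark sketch of a simple join might look like the following. The column names (emp_id, name, dept_id, dept_name), the sample rows, and the helper names join_emp_department and demo are assumptions made for illustration; the question itself does not specify a schema or a join key.

```python
def join_emp_department(emp, department, key="dept_id", how="inner"):
    """Join the emp DataFrame to department on a shared key column.

    `key` defaults to "dept_id", an assumed column name. Passing the key as
    a string (rather than emp.dept_id == department.dept_id) keeps a single
    copy of the key column in the result.
    """
    return emp.join(department, on=key, how=how)


def demo():
    """Build two tiny sample DataFrames and show their inner join."""
    # Imported here so the helper above can be read without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("emp-department-join")
             .getOrCreate())

    # Hypothetical sample data; the real schemas are not given in the question.
    emp = spark.createDataFrame(
        [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", 30)],
        ["emp_id", "name", "dept_id"],
    )
    department = spark.createDataFrame(
        [(10, "Engineering"), (20, "Finance")],
        ["dept_id", "dept_name"],
    )

    # Inner join: the employee with dept_id 30 has no matching department
    # row, so that employee does not appear in the result.
    join_emp_department(emp, department).show()
    spark.stop()
```

Calling demo() needs pyspark and a local Java runtime. Passing how="left" to join_emp_department would instead keep employees whose dept_id has no match in department.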
===================================================================================
READING A DATASET (SCALA AND PYSPARK)
======================
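// Scala, using the external spark-csv package (typical for Spark 1.x)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // first row holds the column names
  .option("inferSchema", "true")   // let Spark infer the column types
  .load("/usermanteshchougule3333gmail/Spark_Project_Marketing_Analytics/Input_Data/Marketing_Analysis.csv")

# PySpark, using the built-in CSV reader
df = (spark.read
      .option("inferSchema", True)
      .option("header", True)
      .csv("/FileStore/tables/StudentData.csv"))
df.show()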
============================================
Ingestion job flow
>> SecurityManager handles authentication and reports whether security checks are
   disabled
>> Utils starts the Spark driver service on its respective port (e.g. port 26234)
>> SparkEnv registers MapOutputTracker and BlockManagerMaster
>> BlockManagerMasterEndpoint is defined, using
   org.apache.spark.storage.DefaultTopologyMapper for getting topology information
>> BlockManagerMasterEndpoint is now up
>> DiskBlockManager creates a temporary local directory (e.g. "DiskBlockManager:
   Created local directory at /tmp/blockmgr-299f7dcd-03de-485a-ba60-ac80d8703f3c")
>> MemoryStore starts with its allocated memory capacity (e.g. "MemoryStore:
   MemoryStore started with capacity 7.8 GB")
========================================
CI/CD