Azure Data Factory Interview Questions and Answers
Azure Data Factory is a cloud-based data integration service that lets you create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores. It can process and transform the data by using compute services such as HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
Windows Azure Drives (VHDs) are used to mount a page blob; the drive contents can be uploaded and downloaded as blobs.
The integration runtime is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities across different network environments. There are three types of integration runtime:
Azure Integration Runtime: The Azure integration runtime can copy data between cloud data stores, and it can dispatch activities to a variety of compute services, such as Azure HDInsight or SQL Server, where the transformation takes place.
Self-Hosted Integration Runtime: The self-hosted integration runtime is software with essentially the same code as the Azure integration runtime, but you install it on an on-premises machine or on a virtual machine inside a virtual network. A self-hosted IR can run copy activities between a public cloud data store and a data store in a private network, and it can dispatch transformation activities against compute resources in a private network. We use a self-hosted IR because Data Factory cannot directly access on-premises data sources that sit behind a firewall. It is sometimes possible to establish a direct connection between Azure and on-premises data sources by configuring the firewall in a specific way; if we do that, we don't need a self-hosted IR.
Azure-SSIS Integration Runtime: With the Azure-SSIS integration runtime, you can natively execute SSIS packages in a managed environment. So when we lift and shift SSIS packages to Data Factory, we use the Azure-SSIS integration runtime.
There is no hard limit on the number of integration runtime instances you can have in a data factory.
There is, however, a limit on the number of VM cores that the integration runtime can use per
subscription for SSIS package execution.
Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or
binary data. You can use Blob Storage to expose data publicly to the world or to store application data
privately. Common uses of Blob Storage include:
Storing data for backup and restore, disaster recovery, and archiving
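A minimal Python sketch of the backup/restore use case above, using the azure-storage-blob SDK; the connection string, container, and file names are hypothetical placeholders:

```python
# A minimal sketch of writing and reading a backup file in Blob Storage with the
# azure-storage-blob SDK. The connection string, container and blob names are placeholders.
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("backups")

# Upload a local backup file as a blob (overwrite if it already exists).
with open("db-backup.bak", "rb") as data:
    container.upload_blob(name="2024/db-backup.bak", data=data, overwrite=True)

# Download it again, e.g. during a restore.
downloaded = container.download_blob("2024/db-backup.bak").readall()
```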
What are the steps for creating ETL process in Azure Data Factory?
While extracting data from an Azure SQL Server database, any required processing is applied and the result is stored in Azure Data Lake Store. The main steps are shown below, followed by a sketch of how they look in code:
Create a Linked Service for source data store which is SQL Server Database
Create a Linked Service for destination data store which is Azure Data Lake Store
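The sketch below walks through these steps with the azure-mgmt-datafactory Python SDK: it registers the two linked services, defines a source and sink dataset, and creates a pipeline with a Copy activity, then starts a run. All resource names, connection strings, and credentials are placeholders, and exact model names can differ slightly between SDK versions.

```python
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureSqlDatabaseLinkedService, AzureDataLakeStoreLinkedService,
    LinkedServiceReference, DatasetResource, DatasetReference,
    AzureSqlTableDataset, AzureDataLakeStoreDataset,
    PipelineResource, CopyActivity, AzureSqlSource, AzureDataLakeStoreSink,
    SecureString,
)

rg, df = "my-rg", "my-data-factory"                      # placeholder names
cred = ClientSecretCredential(tenant_id="...", client_id="...", client_secret="...")
adf = DataFactoryManagementClient(cred, "<subscription-id>")

# 1. Linked service for the source Azure SQL database.
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(
        value="Server=tcp:myserver.database.windows.net;Database=mydb;...")))
adf.linked_services.create_or_update(rg, df, "SqlSourceLS", sql_ls)

# 2. Linked service for the destination Azure Data Lake Store.
adls_ls = LinkedServiceResource(properties=AzureDataLakeStoreLinkedService(
    data_lake_store_uri="https://mydatalake.azuredatalakestore.net/webhdfs/v1"))
adf.linked_services.create_or_update(rg, df, "DataLakeSinkLS", adls_ls)

# 3. Datasets describing the source table and the destination folder.
src_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="SqlSourceLS"),
    table_name="dbo.Sales"))
adf.datasets.create_or_update(rg, df, "SalesTable", src_ds)

sink_ds = DatasetResource(properties=AzureDataLakeStoreDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="DataLakeSinkLS"),
    folder_path="raw/sales"))
adf.datasets.create_or_update(rg, df, "SalesLake", sink_ds)

# 4. Pipeline with a Copy activity that moves the data.
copy = CopyActivity(
    name="CopySalesToLake",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SalesTable")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SalesLake")],
    source=AzureSqlSource(),
    sink=AzureDataLakeStoreSink())
adf.pipelines.create_or_update(rg, df, "SqlToLakePipeline",
                               PipelineResource(activities=[copy]))

# 5. Trigger a run on demand.
run = adf.pipelines.create_run(rg, df, "SqlToLakePipeline", parameters={})
print(run.run_id)
```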
Activities: Activities represent the processing steps in a pipeline. A pipeline can have one or more activities; an activity can be any process, such as querying a dataset or moving a dataset from one source to another.
Datasets: Sources of data. In simple words, a dataset is a data structure that represents the data an activity reads or writes.
Linked services: These store the connection information needed to connect to an external source.
For example, to connect to a SQL Server database you need a connection string; linked services define these connections for both the source and the destination of your data.
You can use the scheduler trigger or time window trigger to schedule a pipeline.
The trigger uses a wall-clock calendar schedule, which can schedule pipelines periodically or in calendar-
based recurrent patterns (for example, on Mondays at 6:00 PM and Thursdays at 9:00 PM).
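A hedged sketch of a schedule trigger definition, shown as the deployment JSON expressed as a Python dict; the trigger, pipeline, and time-zone values are placeholders, and this example covers just the "Mondays at 6:00 PM" part of the recurrence described above:

```python
# Schedule trigger that runs a pipeline every Monday at 18:00 UTC.
schedule_trigger = {
    "name": "WeeklyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
                # Fire on Mondays at 18:00; a second trigger could cover Thursdays at 21:00.
                "schedule": {"weekDays": ["Monday"], "hours": [18], "minutes": [0]},
            },
        },
        "pipelines": [
            {
                "pipelineReference": {"type": "PipelineReference",
                                      "referenceName": "SqlToLakePipeline"},
                "parameters": {},
            }
        ],
    },
}
```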
You can define parameters at the pipeline level and pass arguments as you execute the pipeline run on
demand or by using a trigger.
You can define default values for the parameters in the pipelines.
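A short, hedged sketch of passing arguments for pipeline parameters when starting a run on demand with the Python SDK; the resource names and parameter names are placeholders:

```python
# Start a pipeline run and pass an argument that overrides the parameter's default value.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
run = adf.pipelines.create_run(
    "my-rg", "my-data-factory", "SqlToLakePipeline",
    parameters={"outputFolder": "raw/sales/2024-01-01"},
)
print(run.run_id)
```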
Can an activity in a pipeline consume arguments that are passed to a pipeline run?
Each activity within the pipeline can consume the parameter value that’s passed to the pipeline and run
with the @parameter construct.
An activity output can be consumed in a subsequent activity with the @activity construct.
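A hedged sketch of how these constructs appear in a pipeline definition, shown as the pipeline JSON expressed as a Python dict; all activity, dataset, and parameter names are hypothetical:

```python
# A pipeline with a parameter (with a default value), a Lookup activity, and a Copy
# activity that consumes the Lookup output via the @activity construct.
pipeline = {
    "name": "ParamDemoPipeline",
    "properties": {
        "parameters": {
            # Default value used when the caller does not pass an argument.
            "outputFolder": {"type": "string", "defaultValue": "raw/sales"},
        },
        "activities": [
            {
                "name": "LookupWatermark",
                "type": "Lookup",
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},
                    "dataset": {"referenceName": "WatermarkTable", "type": "DatasetReference"},
                },
            },
            {
                "name": "CopyIncrement",
                "type": "Copy",
                "dependsOn": [{"activity": "LookupWatermark",
                               "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        # Consume the previous activity's output with the @activity construct.
                        "sqlReaderQuery": "SELECT * FROM dbo.Sales WHERE ModifiedDate > "
                                          "'@{activity('LookupWatermark').output.firstRow.Watermark}'",
                    },
                    "sink": {"type": "AzureDataLakeStoreSink"},
                },
                "inputs": [{"referenceName": "SalesTable", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SalesLake", "type": "DatasetReference"}],
            },
        ],
    },
}
# The sink dataset (not shown) could consume the pipeline parameter in its folder path
# with the @parameter construct, e.g. "@pipeline().parameters.outputFolder".
```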
How do I gracefully handle null values in an activity output?
You can use the @coalesce construct in the expressions to handle the null values gracefully.
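A short, hedged example of such an expression; the activity, property, and parameter names are hypothetical. If the Lookup output is null, the expression falls back to the pipeline parameter:

```python
# ADF expression (stored as a string) that substitutes a default when the output is null.
folder_expression = (
    "@coalesce(activity('LookupConfig').output.firstRow.OutputFolder, "
    "pipeline().parameters.outputFolder)"
)
```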
The two levels of security applicable to ADLS Gen2 were also in effect for ADLS Gen1. Even though this is not new, it is worth calling out the two levels of security because they are fundamental to getting started with the data lake and they confuse many people who are just getting started.
Role-Based Access Control (RBAC). RBAC includes built-in Azure roles such as Reader, Contributor, and Owner, as well as custom roles. Typically, RBAC is assigned for two reasons. One is to specify who can manage
the service itself (i.e., update settings and properties for the storage account). Another reason is to
permit the use of built-in data explorer tools, which require reader permissions.
Access Control Lists (ACLs). Access control lists specify exactly which data objects a user may read,
write, or execute (execute is required to browse the directory structure). ACLs are POSIX-compliant, thus
familiar to those with a Unix or Linux background.
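A hedged sketch of setting a POSIX-style ACL on an ADLS Gen2 directory with the azure-storage-file-datalake SDK; the account, filesystem, directory, and object ID values are placeholders:

```python
# Grant a specific user read and execute (r-x) on a directory; execute is what
# allows the user to browse into the directory structure.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("raw").get_directory_client("sales")

directory.set_access_control(
    acl="user::rwx,group::r-x,other::---,user:<object-id-of-user>:r-x"
)
```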
Windows Azure Table storage is a service that stores large amounts of structured data. Windows Azure tables are ideal for storing structured, non-relational data.
Table: A table is a collection of entities. Tables don't enforce a schema on entities, which means a single table can contain entities with different sets of properties. A storage account can contain many tables.
Properties: A property is a name-value pair. Each entity can include up to 252 properties to store data. Each entity also has three system properties that specify a partition key, a row key, and a timestamp.
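A hedged sketch of creating a table and inserting an entity with the azure-data-tables SDK; the connection string, table name, and property values are placeholders:

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<your-storage-account-connection-string>")
table = service.create_table_if_not_exists("Customers")

# PartitionKey and RowKey are required system properties; the remaining fields are
# user-defined name-value pairs (up to 252 of them per entity).
table.create_entity({
    "PartitionKey": "Europe",
    "RowKey": "customer-001",
    "Name": "Contoso",
    "Segment": "Retail",
})
```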
Azure Functions is a solution for executing small pieces of code, or functions, in the cloud. We can also select the programming language we want to use. We pay only for the time our code executes; that is,
we pay per usage. It supports a variety of programming languages, like C#, F#, Node.js, Python, PHP or
Java. It supports continuous deployment and integration. Azure Functions applications let us develop
serverless applications.
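A hedged sketch of a minimal HTTP-triggered Azure Function in Python (v1 programming model, where this file sits alongside a function.json binding definition); the parameter name is a placeholder:

```python
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read a query-string parameter and return a small response; billing is per execution.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```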
Azure HDInsight is a cloud service that makes it easy, fast, and cost-effective to process massive
amounts of data using open-source frameworks like Hadoop, Spark, Hive, LLAP, Kafka, Storm and R.
HDInsight can enable a broad range of scenarios, including ETL, data warehousing, and Machine
Learning, to name a few.
Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals, and other Microsoft customers to gain insight from large, complex data sets. As with most data lake offerings, the service is composed of two parts: data storage and data analytics.
Azure SQL Database is a way to use cloud services to store your database in the cloud. Microsoft Azure is an ideal way to use PaaS, where you can have multiple databases on the same account.
Microsoft Azure SQL Database offers the same core features as SQL Server, i.e., high availability, scalability, and security.
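A hedged sketch of connecting to an Azure SQL database from Python with pyodbc; the server, database, and credential values are placeholders:

```python
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Uid=myuser;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
with pyodbc.connect(conn_str) as conn:
    # Simple round trip to verify the connection works.
    row = conn.cursor().execute("SELECT @@VERSION").fetchone()
    print(row[0])
```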
If you need to stop the pipeline from executing, you can use Suspend-AzDataFactoryPipeline cmdlet.
Currently, suspending the pipeline does not stop the slice executions that are in progress. Once the in-
progress executions finish, no extra slice is picked up.