
AZURE DATA FACTORY

1. What is Azure Data Factory used for? Azure Data Factory (ADF) is the data orchestration service provided by the Microsoft Azure cloud. ADF is mainly used for the
following use cases:
1. Data migration from one data source to another
2. On-premises to cloud data migration (data migration is the process of moving data from one location to another, one format to another,
or one application to another)
3. ETL (extract, transform, load)
4. Automating data flows
**3. What are the top-level concepts of Azure Data Factory? What components does Data Factory consist of? What kinds of activities from Data Factory
did you use in your project? What are the building blocks of ADF?
 Pipeline – A pipeline is a collection of data movement and transformation activities, grouped together to achieve a higher-level data
integration task.
 Activities – Activities represent the processing steps in a pipeline. A pipeline can have one or multiple activities; an activity can be any processing step, such as
querying a dataset or moving a dataset from one source to another.
 Dataset – A dataset connects to the data source via a linked service. It is created based on the type of data and the data source you want to connect to, and
represents the structure of the data held by that data source.
 Linked Service – A linked service in Azure Data Factory is basically the connection mechanism used to connect to an external source. It works as the
connection string and holds the authentication information (a JSON sketch of a linked service and dataset follows this list).
 Integration Runtime – The integration runtime is the compute infrastructure used by Azure Data Factory. It provides integration capabilities across various
network environments.
 Trigger – A trigger is a unit of processing that determines when a pipeline needs to be run. Triggers can be scheduled or set off (triggered) by a
different event.
 Control Flow – The control flow in a data factory is what orchestrates how the pipeline is going to be sequenced. This includes activities
you'll be performing within those pipelines, such as sequencing, branching and looping.
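To make these building blocks concrete, here is a rough, illustrative sketch of a linked service and a dataset in ADF's JSON representation (the names AzureBlobLinkedService and SalesCsvDataset, and the container and file names, are placeholders, not from any particular project):

    {
      "name": "AzureBlobLinkedService",
      "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
          "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        }
      }
    }

    {
      "name": "SalesCsvDataset",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "AzureBlobLinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
          "location": { "type": "AzureBlobStorageLocation", "container": "sales", "fileName": "sales.csv" },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }

The linked service holds the connection details; the dataset points at the linked service and describes the location and shape of the data; pipeline activities then reference the dataset, and triggers decide when the pipeline runs.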
**5. What is the integration runtime and what are its types?
The integration runtime is the compute infrastructure used by Azure Data Factory. It provides integration capabilities across various network
environments.
A quick look at the types of integration runtimes:
1. Azure Integration Runtime – Can copy data between cloud data stores and dispatch activities to various compute services such as Azure SQL
Database, Azure HDInsight, etc.
2. Self-Hosted Integration Runtime – It is basically software with the same code as the Azure Integration Runtime, but it is installed on your
local system or on a virtual machine inside a virtual network.
3. Azure-SSIS Integration Runtime – The Azure-SSIS integration runtime is a fully managed cluster of virtual machines hosted in Azure and
dedicated to running SSIS packages in the data factory. We can easily scale up the SSIS nodes by configuring the node size, or scale out by
configuring the number of nodes in the virtual machine cluster.
6. What is required to execute an SSIS package in Data Factory? We need to create an Azure-SSIS integration runtime and an SSISDB catalog
hosted in Azure SQL Database or Azure SQL Managed Instance.

7b. What are the storage types in Azure?


 Azure Blob: A scalable object store for text and binary data.
 Azure Files: Managed file shares for cloud or on-premises deployments.
 Azure Queue: A messaging store for reliable messaging between application components.
 Azure Table: A NoSQL store for schemaless storage of structured data.

8. What is the use of the Lookup activity in Azure Data Factory?


The Lookup activity in an ADF pipeline is generally used for configuration lookups. It has a source dataset: the Lookup activity pulls data
from the source dataset and exposes it as the output of the activity. The output of the Lookup activity is typically used further along the pipeline to drive
decisions or configuration.
The Lookup activity can retrieve a dataset from any of the data sources supported by Data Factory and Synapse pipelines. It can
read data stored in a database or file system and pass it to subsequent copy or transformation activities.
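As a rough sketch (the activity, dataset, and table names here are assumptions for illustration), a Lookup activity that reads configuration rows might be defined like this:

    {
      "name": "LookupConfig",
      "type": "Lookup",
      "typeProperties": {
        "source": { "type": "AzureSqlSource", "sqlReaderQuery": "SELECT * FROM dbo.PipelineConfig" },
        "dataset": { "referenceName": "ConfigDataset", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    }

Downstream activities can then reference the result with expressions such as @activity('LookupConfig').output.value (all rows) or, when firstRowOnly is true, @activity('LookupConfig').output.firstRow.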

9. What is the Copy activity in Azure Data Factory? In Azure Data Factory and Synapse pipelines, you can use the Copy activity to copy data among data
stores located on-premises and in the cloud. After you copy the data, you can use other activities to further transform and analyze it. To create
a Copy activity you need to have your source and destination ready; here the destination is called the sink. The Copy activity requires
a linked service and a dataset for both the source and the sink.
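A minimal sketch of a Copy activity, assuming a delimited-text source and an Azure SQL sink (the dataset names are illustrative):

    {
      "name": "CopyCsvToSql",
      "type": "Copy",
      "inputs":  [ { "referenceName": "SourceCsvDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "SinkSqlDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink":   { "type": "AzureSqlSink" }
      }
    }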

9a. What is the Get Metadata activity in Azure Data Factory?


The Get Metadata activity reads metadata information about its source, such as the name of a file or folder.
Folder-level field list: Child Items, Exists, Item Name, Item Type, Last Modified
File-level field list: Column Count, Content MD5 (MD5 of the file; applicable only to files), Exists, Item Name, Item Type, Last Modified,
Size, Structure
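A hedged sketch of a Get Metadata activity requesting folder-level fields (the dataset name is an assumption):

    {
      "name": "Get Metadata1",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems", "exists", "lastModified" ]
      }
    }

The output can then be consumed as @activity('Get Metadata1').output.childItems, for example by a Filter or ForEach activity as shown in the following questions.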

9b. What are Data Flows? A data flow is an activity in a pipeline used to perform transformations over data. To perform transformations, "Mapping
Data Flows" are used; these run under the Data Flow activity. Transformations include Source, Union, Join, Filter, Select, Derived Column, Exists, Sink,
etc.
9c. What is the Filter activity? The Filter activity is used to filter an input array based on a condition. It has two properties:
Items: @activity('Get Metadata1').output.childItems
Condition: @startswith(item().name, 'emp') or @not(startswith(item().name, 'emp'))
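Put together, a Filter activity using the expressions above could be sketched as follows (the Get Metadata activity name carries over from the earlier sketch and is illustrative):

    {
      "name": "FilterEmpFiles",
      "type": "Filter",
      "typeProperties": {
        "items":     { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
        "condition": { "value": "@startswith(item().name, 'emp')", "type": "Expression" }
      }
    }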

9d. What is the ForEach activity?


The ForEach activity defines a repeating control flow. It is used to iterate over a collection and execute the specified activities in a loop.
There are two options:
Sequential – whether the loop should be executed sequentially or in parallel.
Batch count – used to control the number of parallel executions (maximum 50).
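A rough ForEach sketch that iterates over the filtered file list in parallel (names are carried over from the earlier sketches; the inner Wait activity is only a placeholder for real per-file work, where @item() would reference the current element):

    {
      "name": "ForEachEmpFile",
      "type": "ForEach",
      "typeProperties": {
        "isSequential": false,
        "batchCount": 10,
        "items": { "value": "@activity('FilterEmpFiles').output.value", "type": "Expression" },
        "activities": [
          { "name": "ProcessOneFile", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 1 } }
        ]
      }
    }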

9e. What is Conditional Split?

Conditional Split allows you to split the input stream into any number of output streams based on expression conditions. Rows not matching any
condition are routed to the default output.

*9g. In how many ways can a Data Factory pipeline be executed?

You can execute a pipeline either manually (on demand) or by using a trigger.
*9h. What is Key Vault?

Azure Key Vault is a cloud service for securely storing and accessing secrets. A secret is anything that you want to tightly control access to,
such as API keys, passwords, certificates, or cryptographic keys. The Key Vault service supports two types of containers: vaults and managed
hardware security module (HSM) pools.
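In ADF, a common pattern is to let a linked service pull its credentials from Key Vault rather than store them inline; a hedged sketch (the vault linked service and secret names are assumptions):

    {
      "name": "AzureSqlViaKeyVault",
      "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
          "connectionString": {
            "type": "AzureKeyVaultSecret",
            "store": { "referenceName": "AzureKeyVaultLinkedService", "type": "LinkedServiceReference" },
            "secretName": "SqlConnectionString"
          }
        }
      }
    }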
*9i. Why do you use Logic Apps?

Azure Logic Apps is a cloud-based platform for creating and running automated workflows that integrate your apps, data, services, and
systems.

10. What do you mean by variables in Azure Data Factory?

Variables are available inside the pipeline and are set inside the pipeline. Set Variable and Append Variable are the two activities used for
setting or manipulating variable values. There are two types of variables:
System variables: These are fixed variables provided by the pipeline itself, for example the pipeline name, pipeline ID, trigger name, etc.
You mostly need these to get system information that might be required in your use case.
User variables: A user variable is something you declare manually based on the logic of your pipeline.
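For illustration (the variable name runLabel is hypothetical and would need to be declared in the pipeline's variables section), a Set Variable activity can combine system variables into a user variable:

    {
      "name": "SetRunLabel",
      "type": "SetVariable",
      "typeProperties": {
        "variableName": "runLabel",
        "value": "@concat(pipeline().Pipeline, '_', pipeline().RunId)"
      }
    }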

11. What is a breakpoint in an ADF pipeline? The service allows you to debug a pipeline only up to a particular activity on the pipeline
canvas.

12. What is the difference between SSIS and Azure Data Factory?
 ADF is an Extract-Load (EL) tool; SSIS is an Extract-Transform-Load (ETL) tool.
 ADF is a cloud-based service (PaaS); SSIS is a desktop tool (developed with SSDT).
 ADF is pay-as-you-go under an Azure subscription; SSIS is a licensed tool included with SQL Server.
 ADF has limited built-in error handling; SSIS has rich error handling capabilities.
 ADF uses JSON definitions for its orchestration (coding); SSIS uses drag-and-drop design (no coding).

23. What is Azure Databricks?


Azure Databricks is an easy, fast, and collaborative Apache Spark-based analytics platform that is optimized for Azure. It was designed in
partnership with the founders of Apache Spark. Azure Databricks blends the best of Databricks and Azure to let customers accelerate
innovation through a quick setup. Smooth workflows and an interactive workspace facilitate collaboration between data engineers, data scientists,
and business analysts.

24. What is Azure SQL Data Warehouse?


It is a large store of data collected from a broad range of sources within a company and used to guide management decisions. Such
warehouses enable you to accumulate data from diverse databases existing as either remote or distributed systems.
An Azure SQL Data Warehouse can be created by integrating data from multiple sources and can be used for decision making,
analytical reporting, etc. In other words, it is a cloud-based enterprise application that uses massively parallel processing to rapidly
run complex queries over large data volumes. It also works as a solution for big data scenarios.

**26. What is a Slowly Changing Dimension?


A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is
considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
There are three main types of SCD:
SCD Type 1 – The new record replaces the original record.
SCD Type 2 – A new record is added to the existing dimension table, preserving history.
SCD Type 3 – The original record is modified to include the new data (for example, a column holds the previous value).

**28. What are the types of triggers? What is the difference between a schedule trigger and a tumbling window trigger? (A trigger JSON sketch follows this list.)
 The Schedule trigger executes the ADF pipeline on a wall-clock schedule.
 The Tumbling window trigger executes the ADF pipeline on a periodic interval and retains the pipeline state.
 The Event-based trigger responds to a blob-related event, such as adding or deleting a blob in an Azure storage account.
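A hedged sketch of a schedule trigger attached to a pipeline (the trigger name, pipeline name, and recurrence are placeholders); a tumbling window trigger looks similar but uses type TumblingWindowTrigger and exposes window start/end times to the pipeline:

    {
      "name": "DailyTrigger",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": { "frequency": "Day", "interval": 1, "startTime": "2024-01-01T00:00:00Z", "timeZone": "UTC" }
        },
        "pipelines": [
          { "pipelineReference": { "referenceName": "CopySalesPipeline", "type": "PipelineReference" } }
        ]
      }
    }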

**29. Any Data Factory pipeline can be executed using three methods. Mention these methods.
 Under Debug mode
 Manual execution using Trigger Now
 Using an attached schedule, tumbling window, or event trigger
30. What is fault tolerance in Azure? Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of
(or one or more faults within) some of its components.

**31. What is incremental load? Incremental loading is the activity of loading only new or updated records from a source into the destination.
Incremental loads are useful because they run efficiently compared to full loads, particularly for large data sets. In ADF this is
commonly implemented with a watermark column or change tracking, as sketched below.
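A common ADF implementation is watermark-based: a Lookup reads the last loaded watermark, a Copy activity selects only rows changed since then, and a final step updates the watermark. A hedged sketch of just the Copy activity (table, column, dataset, and activity names are assumptions):

    {
      "name": "CopyDeltaRows",
      "type": "Copy",
      "inputs":  [ { "referenceName": "SourceSqlDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT * FROM dbo.Orders WHERE LastModified > '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}'"
        },
        "sink": { "type": "AzureSqlSink" }
      }
    }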

33. What is PolyBase? What are the uses of PolyBase? PolyBase is a technology that accesses external data stored in Azure Blob Storage or Azure Data Lake
Store via the T-SQL language. It is used to query relational and non-relational data (NoSQL). You can use PolyBase to query tables and files in
Hadoop or in Azure Blob Storage. PolyBase acts as an intermediary for communication between Azure data storage and SQL Server.

38. How do I gracefully handle null values in an activity output?


You can use the @coalesce construct in expressions to handle null values gracefully.
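For example (activity, column, and variable names are illustrative), a Set Variable value can fall back to a default when a looked-up column is null:

    {
      "name": "SetRegion",
      "type": "SetVariable",
      "typeProperties": {
        "variableName": "region",
        "value": "@coalesce(activity('LookupConfig').output.firstRow.Region, 'unknown')"
      }
    }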

**39. Which Data Factory version do I use to create data flows? Use the Data Factory V2 version to create data flows.

40. What is the difference between Azure Data Lake and Azure Data Warehouse? What is the purpose of Azure Data Lake?
 A data lake is a capable way of storing data of any type, size, and shape; a data warehouse acts as a repository for already filtered data from a specific source.
 A data lake is mainly used by data scientists and developers to get insight from massive and complex data sets; a data warehouse is more frequently used by business professionals.
 A data lake is highly accessible with quicker updates; making changes in a data warehouse is a pretty rigid and costly task.
 A data lake defines the schema after the data has been stored; a data warehouse defines the schema before storing the data.
 A data lake uses the ELT (Extract, Load and Transform) process; a data warehouse uses the ETL (Extract, Transform and Load) process.
 A data lake is an ideal platform for doing in-depth analysis; a data warehouse is the best platform for operational users.

41. What is Blob Storage in Azure? It helps to store a large amount of unstructured data such as text, images, or binary data. It can be used to
expose data publicly to the world. Blob storage is most commonly used for streaming audio or video, storing data for backup and disaster
recovery, storing data for analysis, etc. You can also create data lakes on top of blob storage to perform analytics.

**42. Difference between Data Lake Storage and Blob Storage.

 Data Lake Storage is an optimized storage solution for big data analytics workloads; Blob Storage is general-purpose storage for a wide variety of scenarios (it can also do big data analytics).
 Data Lake Storage follows a hierarchical file system; Blob Storage follows an object store with a flat namespace.
 In Data Lake Storage, data is stored as files inside folders; in Blob Storage, you create a storage account, and the storage account has containers that store the data.
 Data Lake Storage can be used to store batch, interactive, stream analytics, and machine learning data; Blob Storage can be used to store text files, binary data, media for streaming, and general-purpose data.

44. Explain the two levels of security in ADLS Gen2.


 Role-Based Access Control – It includes built-in Azure roles such as Reader, Contributor, Owner, or custom roles. It is specified for two
reasons: the first is to control who can manage the service itself, and the second is to permit users to use the built-in data explorer tools.
 Access Control Lists – Azure Data Lake Storage specifies precisely which data objects users may read, write, or execute. The Data Lake Storage Gen2
hierarchical namespace accelerates big data analytics workloads and enables file-level access control lists (ACLs).

46. What is the difference between ADF v1 and v2?
 ADF v1 – a service designed for batch data processing of time series data.
 ADF v2 – a very general-purpose hybrid data integration service with very flexible execution patterns.

 47. What is the difference between the mapping data flow and wrangling data flow transformation?
 Mapping Data Flow: It is a visually designed data transformation activity that lets users design a graphical data transformation logic
without needing an expert developer.
 Wrangling Data Flow: This is a code-free data preparation activity that integrates with Power Query Online.

 48. Data Factory supports two types of compute environments to execute the transform activities. Mention them briefly.
 Let’s go through the types:
 On-demand compute environment – It is a fully managed environment offered by ADF. In this compute type, a cluster is created to
execute the transform activity and removed automatically when the activity is completed.
 Bring your own environment – In this environment, you yourself manage the compute environment with the help of ADF.

51. What is a storage key? Storage keys (access keys) are tokens used to authorize access to a storage account. You can manage an account's keys in the Azure portal.

52. What are the security connection types? RBAC, connection strings, access keys, and shared access signatures (SAS).
