Adf Loop PDF
1. What is Azure Data Factory used for?
Azure Data Factory is the data orchestration service provided by the Microsoft Azure cloud. ADF is mainly used for the following use cases:
1. Data migration from one data source to another
2. On-premises to cloud data migration (data migration is the process of moving data from one location to another, one format to another, or one application to another)
3. ETL purposes
4. Automating data flows
**3. What are the top-level concepts of Azure Data Factory? What components does Data Factory consist of? What kinds of activities from Data Factory have you used in your project? What are the building blocks of ADF?
Pipeline – A pipeline is a collection of data movement and transformation activities, grouped together to achieve a higher-level data integration task.
Activities – Activities represent the processing steps in a pipeline. A pipeline can have one or multiple activities. An activity can be any processing step, such as querying a dataset or moving a dataset from one source to another.
Dataset – A dataset connects to the data source via a linked service. It is created based on the type of data and the data source you want to connect to, and it represents the structure of the data held by that data source.
Linked Service – A linked service in Azure Data Factory is the connection mechanism used to connect to an external source. It works like a connection string and holds the authentication information (see the sketch after this list).
Integration Runtime – The integration runtime is the compute infrastructure used by Azure Data Factory. It provides data integration capabilities across different network environments.
Trigger – A trigger is a unit of processing that determines when a pipeline needs to run. Triggers can be scheduled or set off (triggered) by a different event.
Control Flow – The control flow in a data factory orchestrates how the pipeline is sequenced. This includes the activities you perform within those pipelines, such as sequencing, branching, and looping.
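To make the relationship between these building blocks concrete, here is a minimal, hypothetical sketch of a linked service and a dataset that uses it. The names (BlobStorageLS, CsvFiles), the container, and the file are placeholders, not taken from any real factory:

{
  "name": "BlobStorageLS",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}

{
  "name": "CsvFiles",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": { "referenceName": "BlobStorageLS", "type": "LinkedServiceReference" },
    "typeProperties": {
      "location": { "type": "AzureBlobStorageLocation", "container": "input", "fileName": "emp.csv" },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}

The linked service holds the connection and authentication information; the dataset points at a specific file through that linked service; activities in a pipeline then reference the dataset.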
**5. What is the integration runtime and what are its types?
The integration runtime is the compute infrastructure used by Azure Data Factory. It provides data integration capabilities across different network environments.
A quick look at the types of integration runtimes:
1. Azure Integration Runtime – Can copy data between cloud data stores and dispatch activities to various compute services such as Azure SQL Database, Azure HDInsight, etc.
2. Self-Hosted Integration Runtime – Essentially the same software as the Azure Integration Runtime, but installed on your local system or on a virtual machine inside your own network, so it can reach on-premises and private-network data stores.
3. Azure-SSIS Integration Runtime – A fully managed cluster of virtual machines hosted in Azure and dedicated to running SSIS packages in the data factory. You can scale it up by configuring the node size, or scale it out by configuring the number of nodes in the virtual machine cluster.
6. What is required to execute an SSIS package in Data Factory? We need to create an Azure-SSIS Integration Runtime and an SSIS catalog (SSISDB) hosted in Azure SQL Database or Azure SQL Managed Instance.
9. What is the Copy activity in Azure Data Factory? In Azure Data Factory and Synapse pipelines, you can use the Copy activity to copy data among data stores located on-premises and in the cloud. After you copy the data, you can use other activities to further transform and analyze it. To create a Copy activity you need to have your source and destination ready; here the destination is called the sink. The Copy activity requires a linked service and a dataset for both the source and the sink (a minimal sketch is shown below).
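As an illustration, a minimal sketch of a Copy activity inside a pipeline, assuming hypothetical datasets CsvFiles (source) and SqlEmpTable (sink) that already exist:

{
  "name": "CopyBlobToSql",
  "type": "Copy",
  "inputs":  [ { "referenceName": "CsvFiles", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SqlEmpTable", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink" }
  }
}

The inputs array points at the source dataset, the outputs array at the sink dataset, and the source/sink types must match the dataset types.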
9b. What is a Data Flow? A data flow is an activity in a pipeline used to perform transformations over data. To perform the transformations, Mapping Data Flows are used; they run under the Data Flow activity. The transformations include Source, Union, Join, Filter, Select, Derived Column, Exists, Sink, etc.
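For reference, a hedged sketch of how a pipeline can invoke a mapping data flow through the Data Flow activity; the flow name EmployeeMappingDataFlow and the compute sizing are placeholders, and the exact property casing may vary slightly between service versions:

{
  "name": "TransformEmployees",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataflow": { "referenceName": "EmployeeMappingDataFlow", "type": "DataFlowReference" },
    "compute": { "coreCount": 8, "computeType": "General" }
  }
}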
9c. What is the Filter activity? The Filter activity is used to filter an input array based on a condition. It has two settings, for example:
Items: @activity('Get Metadata1').output.childItems
Condition: @startswith(item().name,'emp') or @not(startswith(item().name,'emp'))
Conditional split (a Mapping Data Flow transformation) allows you to split the input stream into any number of output streams based on expression conditions. Rows not matching any condition are routed to the default output. (A hedged Filter activity sketch follows below.)
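Putting the two settings together, a hedged sketch of a Filter activity that keeps only the files returned by a Get Metadata activity whose names start with 'emp'. The activity name Get Metadata1 is an assumption:

{
  "name": "FilterEmpFiles",
  "type": "Filter",
  "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
    "condition": { "value": "@startswith(item().name, 'emp')", "type": "Expression" }
  }
}

The filtered array is then available as @activity('FilterEmpFiles').output.value, typically fed into a ForEach activity.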
*9g. In how many ways can a Data Factory pipeline be executed?
You can execute a pipeline either manually (on demand) or by using a trigger.
*9h. What is Key Vault?
Azure Key Vault is a cloud service for securely storing and accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, certificates, or cryptographic keys. The Key Vault service supports two types of containers: vaults and managed hardware security module (HSM) pools.
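In ADF, Key Vault is commonly used so that credentials never sit in the factory definition: a linked service can reference a secret instead of embedding a password. A hedged sketch, assuming a Key Vault linked service named KeyVaultLS and a secret named sql-password (both hypothetical):

{
  "name": "AzureSqlLS",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net;Database=<db>;User ID=<user>;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "KeyVaultLS", "type": "LinkedServiceReference" },
        "secretName": "sql-password"
      }
    }
  }
}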
*9i. Why do you use Logic Apps?
Azure Logic Apps is a cloud-based platform for creating and running automated workflows that integrate your apps, data, services, and
systems.
*10. What are variables in an ADF pipeline?
Variables are available inside the pipeline and are set within the pipeline. Set Variable and Append Variable are the two activities used for setting or manipulating variable values. There are two types of variables:
System variables: fixed variables provided by the pipeline itself, for example pipeline name, pipeline ID, trigger name, etc. You mostly need these to get system information that might be required in your use case.
User variables: variables that you declare manually based on the logic of your pipeline. (A Set Variable sketch follows below.)
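As a small illustration, a hedged sketch of a Set Variable activity that writes today's date into a user variable named runDate; the variable name is hypothetical and must be declared on the pipeline with "variables": { "runDate": { "type": "String" } }:

{
  "name": "SetRunDate",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "runDate",
    "value": { "value": "@formatDateTime(utcnow(), 'yyyy-MM-dd')", "type": "Expression" }
  }
}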
11. What is a breakpoint in the ADF pipeline? The service allows you to debug a pipeline until you reach a particular activity on the pipeline canvas: setting a breakpoint on an activity makes the debug run execute only up to and including that activity.
12. What is the difference between SSIS and Azure Data Factory?
Azure Data Factory (ADF) is primarily an Extract-Load (ELT) tool, whereas SQL Server Integration Services (SSIS) is an Extract-Transform-Load (ETL) tool.
ADF is a cloud-based service (a PaaS tool); SSIS is a desktop tool (developed with SSDT).
ADF is billed pay-as-you-go under an Azure subscription; SSIS is a licensed tool included with SQL Server.
ADF has only limited built-in error handling; SSIS has rich error-handling capabilities.
ADF uses JSON scripts for its orchestration (coding); SSIS uses drag-and-drop tasks (no coding).
**28. What are the types of triggers? What is the difference between a Schedule trigger and a Tumbling Window trigger?
The Schedule trigger executes the ADF pipeline on a wall-clock schedule.
The Tumbling window trigger executes the ADF pipeline on a periodic interval and retains the pipeline state; each window is processed exactly once and can be retried or backfilled.
The Event-based trigger responds to a blob-related event, such as adding or deleting a blob in an Azure storage account. (A hedged schedule-trigger sketch follows below.)
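For example, a hedged sketch of a schedule trigger that runs a hypothetical pipeline named CopyPipeline once a day; a tumbling window trigger would instead use the type TumblingWindowTrigger, be bound to a single pipeline, and expose the window start/end times to that pipeline:

{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": { "frequency": "Day", "interval": 1, "startTime": "2024-01-01T02:00:00Z", "timeZone": "UTC" }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "CopyPipeline", "type": "PipelineReference" } }
    ]
  }
}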
**29. Any Data Factory pipeline can be executed using three methods. Mention these methods
Under Debug mode
Manual execution using Trigger Now
Using an added schedule, tumbling window, or event-based trigger
30. What is fault tolerance in Azure? Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. In ADF, the Copy activity supports fault tolerance by skipping incompatible rows or redirecting them to a log location instead of failing the whole copy (see the sketch below).
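In the Copy activity, fault tolerance is configured through settings that skip incompatible rows and optionally redirect them. A hedged sketch, assuming a hypothetical linked service BlobStorageLS for the error log:

{
  "name": "CopyWithFaultTolerance",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink" },
    "enableSkipIncompatibleRow": true,
    "redirectIncompatibleRowSettings": {
      "linkedServiceName": { "referenceName": "BlobStorageLS", "type": "LinkedServiceReference" },
      "path": "errors/incompatiblerows"
    }
  }
}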
**31. What is incremental load? Incremental loading is the activity of loading only new or updated records from the source into the destination, typically by tracking a watermark such as a last-modified timestamp or an ever-increasing key. Incremental loads are useful because they run efficiently when compared to full loads, particularly for large data sets.
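A common ADF pattern is a watermark-based incremental copy: store the last loaded timestamp, pass it into the pipeline, and filter the source query on it. A hedged sketch, where the table, columns, datasets, and the lastWatermark parameter are all hypothetical:

{
  "name": "IncrementalCopy",
  "type": "Copy",
  "inputs":  [ { "referenceName": "OrdersTable", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "OrdersCsv", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "SELECT * FROM dbo.Orders WHERE LastModifiedDate > '@{pipeline().parameters.lastWatermark}'",
        "type": "Expression"
      }
    },
    "sink": { "type": "DelimitedTextSink" }
  }
}

After the copy succeeds, the new maximum LastModifiedDate is typically written back (for example via a Lookup plus Stored Procedure activity) so the next run picks up where this one left off.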
33. What is PolyBase? Uses of PolyBase? PolyBase is a technology that accesses external data stored in Azure Blob Storage or Azure Data Lake Store via the T-SQL language. It is used to query relational and non-relational data (NoSQL). You can use PolyBase to query tables and files in Hadoop or in Azure Blob Storage. PolyBase works as an intermediary for communication between Azure data storage and SQL Server.
**39. Which Data Factory version do I use to create data flows? Use the Data Factory V2 version to create data flows.
40. What is the difference between Azure Data Lake and Azure Data Warehouse? What is the purpose of Azure Data Lake?
Azure Data Lake is a capable way of storing data of any type, size, and shape. It uses the ELT (Extract, Load, Transform) process and is an ideal platform for doing in-depth analysis.
A Data Warehouse acts as a repository for already filtered data from a specific source. It uses the ETL (Extract, Transform, Load) process and is the best platform for operational users.
41. What is Blob Storage in Azure? It helps to store a large amount of unstructured data such as text, images, or binary data. It can be used to expose data publicly to the world. Blob storage is most commonly used for streaming audio or video, storing data for backup and disaster recovery, storing data for analysis, etc. You can also create data lakes using blob storage to perform analytics.
Compared with Azure Data Lake Storage: Data Lake Storage follows a hierarchical file system and is used to store batch, interactive, streaming analytics, and machine learning data, whereas Blob Storage follows an object store with a flat namespace and is used to store text files, binary data, media for streaming, and general-purpose data.
51. What is a storage key? Storage keys (access keys) are used to authorize access to a storage account. You can manage an account's keys in the Azure portal.
52. What are the security connection types? RBAC, connection strings, access keys, and shared access signatures (SAS).