Lab 1 - Getting Started With Azure Data Factory


Orchestrating Big Data Solutions with Azure Data Factory


Lab 1 - Getting Started with Azure Data Factory

Overview
In this lab, you will provision an Azure Data Factory, and use the Copy Wizard to copy data from a file in
Azure Blob Storage to a table in Azure SQL Database.

What You’ll Need


To complete the labs, you will need the following:
• A web browser
• A Microsoft account
• A Microsoft Azure subscription
• A Windows, Linux, or Mac OS X computer
• The lab files for this course

Note: To set up the required environment for the lab, follow the instructions in the Setup document for
this course.

Exercise 1: Provisioning Azure Resources


In this exercise, you will create the Azure Storage account, Azure SQL Database instance, and Azure Data
Factory instance.

Note: The Microsoft Azure portal is continually improved in response to customer feedback. The steps in
this exercise reflect the user interface of the Microsoft Azure portal at the time of writing, but may not
match the latest design of the portal exactly.

Create a Storage Account and a Blob Container


The source data for your data pipeline will be stored in an Azure storage account:

1. In the Microsoft Azure portal, in the menu, click New. Then in the Storage menu, click Storage
account.
2. In the Create storage account blade, enter the following settings and click Create:
• Name: Enter a unique name (and make a note of it!)
• Deployment model: Resource manager
• Account kind: General purpose
• Performance: Standard
• Replication: Locally-redundant storage (LRS)
• Storage service encryption: Disabled
• Subscription: Select your Azure subscription
• Resource group: Create a new resource group with a unique name
• Location: Select any available region
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the
storage account to be deployed (this can take a few minutes).
4. After the storage account has been created, browse to its blade in the Azure portal.
5. On the blade for your storage account, click Blobs, and add a container with the following
properties:
• Name: adf-data
• Access type: Private
6. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the
container to be created (this should take a few seconds).
7. After the container has been created, return to the blade for your storage account, and click
Access keys. Note that this blade lists the storage account name and two keys that client
applications can use for authentication when connecting.
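
Client applications typically combine the account name and one of the keys into a connection string. The following is a minimal sketch of that format, assuming the public Azure cloud endpoint suffix (core.windows.net); the function name and the placeholder values are hypothetical and are not part of the lab files.

```python
def storage_connection_string(account_name: str, account_key: str) -> str:
    """Assemble a storage connection string from an account name and key."""
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        # Assumption: public Azure cloud; sovereign clouds use other suffixes.
        "EndpointSuffix": "core.windows.net",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical placeholders; substitute the name and key shown on the Access keys blade.
print(storage_connection_string("mystorageaccount", "<key1>"))
```

Treat the keys like passwords: anyone who holds one has full access to the storage account.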

Create an Azure SQL Database


Your data pipeline will copy the source data to an Azure SQL Database. SQL databases are hosted in
servers, so you will create both a database and a server to host it.

1. In the Microsoft Azure portal, in the menu, click New. Then in the Databases menu, click SQL
Database.
2. In the SQL Database blade, enter the following settings, and then click Create:
• Database name: DataDB
• Subscription: Select your Azure subscription
• Resource Group: Select the resource group you created previously
• Select source: Blank database
• Server: Create a new server with the following settings:
• Server name: Enter a unique name (and make a note of it!)
• Server admin login: Enter a user name of your choice (and make a note of it!)
• Password: Enter and confirm a strong password (and make a note of it!)
• Region: Select the same location as your storage account
• Allow Azure services to access server: Selected
• Elastic pool: Not enabled
• Pricing tier: View all and select Basic
• Collation: SQL_Latin1_General_CP1_CI_AS
• Pin to dashboard: Unselected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the
SQL database to be deployed (this can take a few minutes).
4. After the database has been created, browse to your Azure SQL server (not the database) and
under Settings, click Properties.
5. Note the fully qualified name of your server (which should take the form
server.database.windows.net, where server is the server name you specified earlier) and the server
admin user name (which should be the login you specified earlier).
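
If you later connect to the database from a client tool or script, the fully qualified server name and admin login are the values you will need. The sketch below shows how they might be combined into an ODBC-style connection string. The helper names, the driver name (ODBC Driver 18 for SQL Server), and the placeholder credentials are assumptions for illustration, not values from this lab.

```python
def sql_server_fqdn(server_name: str) -> str:
    """Return the fully qualified name in the form server.database.windows.net."""
    return f"{server_name}.database.windows.net"


def odbc_connection_string(server_name: str, database: str,
                           user: str, password: str) -> str:
    """Build an ODBC-style connection string for an Azure SQL Database."""
    return (
        # Assumption: this driver is installed on the client machine.
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{sql_server_fqdn(server_name)},1433;"
        f"Database={database};Uid={user};Pwd={password};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )


# Hypothetical server name and credentials; DataDB is the database created above.
print(odbc_connection_string("myserver", "DataDB", "myadmin", "<password>"))
```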

Create an Azure Data Factory


Now that you have your data stores in place, you are ready to create an Azure Data Factory.

1. In the Microsoft Azure portal, in the menu, click New. Then in the Data + Analytics menu, click
Data Factory.
2. In the New data factory blade, enter the following settings, and then click Create:
• Name: Enter a unique name (and make a note of it!)
• Subscription: Select your Azure subscription
• Version: 1
• Resource Group: Select the resource group you created previously
• Location: Select the location you specified for your storage account (if it is not available,
select any other location)
• Pin to dashboard: Unselected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the
data factory to be deployed (this can take a few minutes).

Exercise 2: Using the Azure Data Factory to Copy Data


For simple data copy pipelines, Azure Data Factory provides an easy-to-use wizard. In this exercise, you
will use the wizard to copy data from your Azure Blob Storage account to your Azure SQL Database.

Upload a Data File to the Blob Container


The source data is a comma-delimited text file containing details of sales transactions.

1. In the data subfolder of the folder where you extracted the lab files for this course, open the
transactions.txt file in a text editor.
2. Review the data this file contains, which consists of multiple rows of dates and amounts. Then
close the text editor without saving any changes.
3. Start Azure Storage Explorer, and if you are not already signed in, sign in to your Azure
subscription.
4. Expand your storage account and the Blob Containers folder, and then double-click the adf-data
blob container you created in the previous procedure.
5. In the Upload drop-down list, click Folder. Then upload the data folder (which contains the
transactions.txt file) as a block blob to the root of the container.
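
To get a feel for the file format before copying it, you could parse a few rows locally. This is a minimal sketch, assuming a header row named tdate,amount and CRLF line endings (the settings the Copy Wizard reports later in this lab). The three sample rows are the ones shown in the wizard preview, and the function name is hypothetical.

```python
import csv
import io

# Sample rows taken from the wizard preview; the real file contains more rows.
SAMPLE = (
    "tdate,amount\r\n"
    "2016-01-01,129.99\r\n"
    "2016-01-01,125.49\r\n"
    "2016-01-01,99.75\r\n"
)


def read_transactions(text: str) -> list:
    """Parse comma-delimited text whose first row holds the column names."""
    # newline="" leaves line endings untouched so the csv module handles \r\n.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


for row in read_transactions(SAMPLE):
    print(row["tdate"], row["amount"])
```

To read the actual lab file instead, you could pass the contents of transactions.txt to the same function.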

Create a Table in the Database


You will copy the sales transaction data to a table named transactions, which contains id, tdate, and
amount fields.

1. Click All Resources, and then click your Azure SQL Database.
2. On the database blade, view the Data Explorer page. This opens the web-based query
interface for your Azure SQL Database.
3. In the toolbar for the query editor, click Login, and then log in to your database using SQL
Server authentication, entering the login name and password you specified when
provisioning the Azure SQL Database server.
4. In the query editor, enter the following Transact-SQL query to create a table named
transactions in your database:

CREATE TABLE transactions(id int identity, tdate date, amount decimal);

5. Click Run to run the Transact-SQL statement.
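
The id column is declared as identity, so the database generates its values and the source file only needs to supply tdate and amount. The sketch below illustrates that behavior locally with SQLite's AUTOINCREMENT as a stand-in. It is not Transact-SQL and does not connect to Azure SQL Database, and the column types are SQLite approximations.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in for Azure SQL Database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, tdate TEXT, amount REAL)"
)

# Insert only tdate and amount; the id values are generated automatically.
conn.executemany(
    "INSERT INTO transactions (tdate, amount) VALUES (?, ?)",
    [("2016-01-01", 129.99), ("2016-01-01", 125.49)],
)

rows = conn.execute(
    "SELECT id, tdate, amount FROM transactions ORDER BY id"
).fetchall()
print(rows)
```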


Use the Azure Data Factory Copy Wizard to Copy the Data

1. In the Microsoft Azure portal, browse to the blade for your data factory, and click the Copy data
tile. This opens a new tab in your browser.
2. On the Properties page of the Copy Data wizard, enter the following details and then click Next:
• Task name: Wizard Copy
• Task description: Copy transactions
• Task cadence (or) Task schedule: Run once now
• Expiration time: 3:00:00:00
3. On the Source data store page, on the Connect to a Data Store tab, select Azure Blob Storage.
Then click Next.
4. On the Specify the Azure Blob storage account page, enter the following details and then click
Next:
• Connection name: blob-store
• Account selection method: From Azure subscriptions
• Azure subscription: Select your subscription
• Storage account name: Select your storage account
5. On the Choose the input file or folder page, double-click the adf-data blob container you
created previously, and then select the data folder (which contains the transactions.txt file).
Then click Choose, and click Next.
6. On the File format settings page, wait a few seconds for the data to be read, and then verify the
following details, ensuring that the rows of data in the Preview section match the table below,
and click Next:
• File format: text format
• Column delimiter: Comma (,)
• Row delimiter: Carriage return and line feed (\r\n)
• Skip line count: 0
• Column names in first data row: Selected
• Treat empty column value as null: Selected
• Preview:

tdate amount
2016-01-01 129.99
2016-01-01 125.49
2016-01-01 99.75

7. On the Destination data store page, on the Connect to a Data Store tab, select Azure SQL
Database. Then click Next.
8. On the Specify the Azure SQL database page, enter the following details and then click Next:
• Connection name: sql-database
• Server / database selection method: From Azure subscriptions
• Azure subscription: Select your subscription
• Server name: Select your Azure SQL server
• Database name: DataDB
• User name: The server admin login name you specified when creating the database
• Password: The password for your Azure SQL server admin login
9. On the Table mapping page, in the Destination list, select [dbo].[transactions] and click Next.
10. On the Schema mapping page, ensure that the following settings are selected, and click Next:

Blob path: adf-data/data/    [dbo].[transactions]    Include this column
tdate (DateTime)             tdate (DateTime)        ✓
amount (Double)              amount (Decimal)        ✓

Repeatability settings:
• Method: None

11. On the Performance settings page, expand Advanced settings to review the default values.
Then click Next.
12. On the Summary page, click Finish.
13. On the Deploying page, wait for the deployment to complete.
14. Wait a few minutes to allow the pipeline created by the wizard to run.
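
The schema mapping in step 10 pairs each source column with a destination column of a compatible type. As a rough local analogy (not the wizard's actual implementation), the conversion of one parsed text row into typed values could look like the sketch below; the function name is hypothetical.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, Tuple


def map_row(source_row: Dict[str, str]) -> Tuple[date, Decimal]:
    """Convert one parsed text row into values typed for the destination table."""
    # tdate: text such as "2016-01-01" becomes a date value.
    tdate = datetime.strptime(source_row["tdate"], "%Y-%m-%d").date()
    # amount: parsing the text directly as Decimal avoids binary float rounding.
    amount = Decimal(source_row["amount"])
    return tdate, amount


print(map_row({"tdate": "2016-01-01", "amount": "129.99"}))
```

No mapping is needed for the id column because the database generates it.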

Verify that the Data Has Been Copied


The Copy Data wizard should have created a pipeline, and run it to copy the transactions data from your
blob store to your Azure SQL Database.

1. Return to the Query editor for your Azure SQL Database and run the following query:

SELECT * FROM dbo.transactions;

2. Verify that the table now contains 10 rows of transaction data, copied from the text file in your
blob store.
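
One way to cross-check the copy is to count the data rows in the source file and compare that number with the rows returned by the query. The sketch below counts non-empty lines and excludes the header row. The function name is hypothetical, and the sample uses only the three rows shown in the wizard preview, so it yields 3 rather than the 10 rows in the full file.

```python
def count_data_rows(file_text: str, has_header: bool = True) -> int:
    """Count non-empty lines in delimited text, excluding the header row if present."""
    lines = [line for line in file_text.splitlines() if line.strip()]
    if has_header and lines:
        return len(lines) - 1
    return len(lines)


# Three preview rows plus a header; the real transactions.txt has more rows.
sample = "tdate,amount\r\n2016-01-01,129.99\r\n2016-01-01,125.49\r\n2016-01-01,99.75\r\n"
print(count_data_rows(sample))
```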

Note: You will use the resources you created in this lab when performing the next lab, so do not delete
them.
