# Explore Azure Synapse Analytics

Azure Synapse Analytics provides a single, consolidated data analytics platform for end-to-end data analytics. In this exercise, you'll explore various ways to ingest and explore data. This exercise is designed as a high-level overview of the various core capabilities of Azure Synapse Analytics. Other exercises are available to explore specific capabilities in more detail.

This exercise should take approximately **60** minutes to complete.

## Before you start

You'll need an Azure subscription in which you have administrative-level access.

## Provision an Azure Synapse Analytics workspace

An Azure Synapse Analytics workspace provides a central point for managing data and data processing runtimes. You can provision a workspace using the interactive interface in the Azure portal, or you can deploy a workspace and the resources within it by using a script or template. In most production scenarios, it's best to automate provisioning with scripts and templates so that you can incorporate resource deployment into a repeatable development and operations (DevOps) process (a minimal scripted example appears at the end of this section).

In this exercise, you'll use a combination of a PowerShell script and an ARM template to provision Azure Synapse Analytics.

1. In a web browser, sign into the Azure portal at `https://portal.azure.com`.
2. Use the **[\>_]** button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a **PowerShell** environment and creating storage if prompted. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal.

    *[Screenshot: the Azure portal with a Cloud Shell pane open at the bottom.]*

    > **Note**: If you have previously created a cloud shell that uses a *Bash* environment, use the drop-down menu at the top left of the cloud shell pane to change it to **PowerShell**.

3. Note that you can resize the cloud shell by dragging the separator bar at the top of the pane, or by using the **—**, **◻**, and **X** icons at the top right of the pane to minimize, maximize, and close the pane. For more information about using the Azure Cloud Shell, see the Azure Cloud Shell documentation.
4. In the PowerShell pane, enter the following commands to clone this repo:

    ```powershell
    rm -r dp-203 -f
    git clone https://github.com/MicrosoftLearning/dp-203-azure-data-engineer dp-203
    ```

5. After the repo has been cloned, enter the following commands to change to the folder for this exercise and run the **setup.ps1** script it contains:

    ```powershell
    cd dp-203/Allfiles/labs/01
    ./setup.ps1
    ```

6. If prompted, choose which subscription you want to use (this will only happen if you have access to multiple Azure subscriptions).
7. When prompted, enter a suitable password to be set for your Azure Synapse SQL pool.

    > **Note**: Be sure to remember this password!

8. Wait for the script to complete - this typically takes around 20 minutes, but in some cases may take longer. While you are waiting, review the **What is Azure Synapse Analytics?** article in the Azure Synapse Analytics documentation.
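The lab's **setup.ps1** wraps an ARM template deployment, but the same provisioning can also be scripted directly. Below is a minimal sketch using the Az.Synapse PowerShell module (preinstalled in the Cloud Shell); the resource group, storage account, and workspace names are illustrative placeholders, not the names the lab script generates, and it assumes the ADLS Gen2 storage account already exists:

```powershell
# Minimal sketch: provision a Synapse workspace from PowerShell.
# All names are placeholders; the data lake storage account must already
# exist and have hierarchical namespace (ADLS Gen2) enabled.
$cred = Get-Credential -Message "SQL admin login and password"

New-AzResourceGroup -Name "my-synapse-rg" -Location "eastus"

New-AzSynapseWorkspace -ResourceGroupName "my-synapse-rg" `
    -Name "mysynapsews" `
    -Location "eastus" `
    -DefaultDataLakeStorageAccountName "mydatalakeacct" `
    -DefaultDataLakeStorageFilesystem "files" `
    -SqlAdministratorLoginCredential $cred
```

The lab uses an ARM template instead so that the workspace, its pools, and the storage account are all created together in a single repeatable deployment.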
## Explore Synapse Studio

Synapse Studio is a web-based portal in which you can manage and work with the resources in your Azure Synapse Analytics workspace.

1. When the setup script has finished running, in the Azure portal, go to the **dp203-xxxxxxx** resource group that it created, and notice that this resource group contains your Synapse workspace, a Storage account for your data lake, an Apache Spark pool, a Data Explorer pool, and a Dedicated SQL pool.
2. Select your Synapse workspace, and in its **Overview** page, in the **Open Synapse Studio** card, select **Open** to open Synapse Studio in a new browser tab. Synapse Studio is a web-based interface that you can use to work with your Synapse Analytics workspace.
3. On the left side of Synapse Studio, use the **››** icon to expand the menu - this reveals the different pages within Synapse Studio that you'll use to manage resources and perform data analytics tasks.

    *[Screenshot: Synapse Studio with the left-hand menu expanded.]*

4. View the **Data** page, and note that there are two tabs containing data sources:
    - A **Workspace** tab containing databases defined in the workspace (including dedicated SQL databases and Data Explorer databases)
    - A **Linked** tab containing data sources that are linked to the workspace, including Azure Data Lake storage.
5. View the **Develop** page, which is currently empty. This is where you can define scripts and other assets used to develop data processing solutions.
6. View the **Integrate** page, which is also empty. You use this page to manage data ingestion and integration assets, such as pipelines to transfer and transform data between data sources.
7. View the **Monitor** page. This is where you can observe data processing jobs as they run and view their history.
8. View the **Manage** page. This is where you manage the pools, runtimes, and other assets used in your Azure Synapse workspace. View each of the tabs in the **Analytics pools** section and note that your workspace includes the following pools (you can also list them from the Cloud Shell, as sketched after this list):
    - **SQL pools**:
        - **Built-in**: A serverless SQL pool that you can use on-demand to explore or process data in a data lake by using SQL commands.
        - **sqlxxxxxxx**: A dedicated SQL pool that hosts a relational data warehouse database.
    - **Apache Spark pools**:
        - **sparkxxxxxxx**: A Spark pool that you can use on-demand to explore or process data in a data lake by using programming languages like Scala or Python.
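As a quick sketch, the provisioned pools can be listed from the Cloud Shell with the Az.Synapse module (the workspace name below is a placeholder for your own):

```powershell
# Sketch: list the pools in a workspace (replace the placeholder name).
$ws = "synapsexxxxxxx"

Get-AzSynapseSqlPool -WorkspaceName $ws      # dedicated SQL pools
Get-AzSynapseSparkPool -WorkspaceName $ws    # Apache Spark pools
```

Note that the **Built-in** serverless pool won't appear in this output; it isn't a provisioned resource, so only dedicated pools are listed.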
## Ingest data with a pipeline

One of the key tasks you can perform with Azure Synapse Analytics is to define pipelines that transfer (and if necessary, transform) data from a wide range of sources into your workspace for analysis.

### Use the Copy Data task to create a pipeline

1. In Synapse Studio, on the **Home** page, select **Ingest** to open the **Copy Data** tool.
2. In the Copy Data tool, on the **Properties** step, ensure that **Built-in copy task** and **Run once now** are selected, and click **Next >**.
3. On the **Source** step, in the **Dataset** substep, select the following settings:
    - **Source type**: All
    - **Connection**: Create a new connection, and in the **Linked service** pane that appears, on the **Generic protocol** tab, select **HTTP**. Then continue and create a connection to a data file using the following settings:
        - **Name**: Products
        - **Description**: Product list via HTTP
        - **Connect via integration runtime**: AutoResolveIntegrationRuntime
        - **Base URL**: `https://raw.githubusercontent.com/MicrosoftLearning/dp-203-azure-data-engineer/master/Allfiles/labs/01/adventureworks/products.csv`
        - **Server Certificate Validation**: Enable
        - **Authentication type**: Anonymous
4. After creating the connection, on the **Source data store** page, ensure the following settings are selected, and then select **Next >**:
    - **Relative URL**: *Leave blank*
    - **Request method**: GET
    - **Additional headers**: *Leave blank*
    - **Binary copy**: Unselected
    - **Request timeout**: *Leave blank*
    - **Max concurrent connections**: *Leave blank*
5. On the **Source** step, in the **Configuration** substep, select **Preview data** to see a preview of the product data your pipeline will ingest, then close the preview.
6. After previewing the data, on the **File format settings** page, ensure the following settings are selected, and then select **Next >**:
    - **File format**: DelimitedText
    - **Column delimiter**: Comma (,)
    - **Row delimiter**: Line feed (\n)
    - **First row as header**: Selected
    - **Compression type**: None
7. On the **Destination** step, in the **Dataset** substep, select the following settings:
    - **Destination type**: Azure Data Lake Storage Gen 2
    - **Connection**: Select the existing connection to your data lake store (this was created for you when you created the workspace).
8. After selecting the connection, on the **Destination/Dataset** step, ensure the following settings are selected, and then select **Next >**:
    - **Folder path**: files/product_data
    - **File name**: products.csv
    - **Copy behavior**: None
    - **Max concurrent connections**: *Leave blank*
    - **Block size (MB)**: *Leave blank*
9. On the **Destination** step, in the **Configuration** substep, on the **File format settings** page, ensure that the following properties are selected, and then click **Next >**:
    - **File format**: DelimitedText
    - **Column delimiter**: Comma (,)
    - **Row delimiter**: Line feed (\n)
    - **Add header to file**: Selected
    - **Compression type**: None
    - **Max rows per file**: *Leave blank*
    - **File name prefix**: *Leave blank*
10. On the **Settings** step, enter the following settings and then click **Next >**:
    - **Task name**: Copy products
    - **Task description**: Copy products data
    - **Fault tolerance**: *Leave blank*
    - **Enable logging**: Unselected
    - **Enable staging**: Unselected
11. On the **Review and finish** step, on the **Review** substep, read the summary and then click **Next >**.
12. On the **Deployment** step, wait for the pipeline to be deployed and then click **Finish**.
13. In Synapse Studio, select the **Monitor** page, and in the **Pipeline runs** tab, wait for the **Copy products** pipeline to complete with a status of **Succeeded** (you can use the **↻ Refresh** button on the Pipeline runs page to refresh the status).
14. View the **Integrate** page, and verify that it now contains a pipeline named **Copy products** (you can also confirm the copied file from the Cloud Shell, as sketched below).
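Before moving on, here is a minimal Cloud Shell sketch for confirming that the pipeline copied the file into the data lake, using the Az.Storage data lake cmdlets (the storage account name is a placeholder for your own):

```powershell
# Sketch: list the contents of the product_data folder in the data lake.
# The account name is a placeholder; -UseConnectedAccount authenticates
# with your signed-in Azure AD identity.
$ctx = New-AzStorageContext -StorageAccountName "datalakexxxxxxx" -UseConnectedAccount
Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem "files" -Path "product_data"
```

Listing data this way is a data-plane operation, so your identity needs a role such as Storage Blob Data Reader on the storage account.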
### View the ingested data

1. On the **Data** page, select the **Linked** tab and expand the **synapsexxxxxxx (Primary) datalake** container hierarchy until you see the **files** file storage for your Synapse workspace. Then select the file storage to verify that a folder named **product_data** containing a file named **products.csv** has been copied to this location.
2. Right-click the **products.csv** file, point to **New SQL script**, and select **Select TOP 100 rows**.
3. In the **SQL Script 1** pane that opens, review the SQL code that has been generated, which should be similar to this:

    ```sql
    SELECT
        TOP 100 *
    FROM
        OPENROWSET(
            BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/product_data/products.csv',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0'
        ) AS [result]
    ```

4. Use the **▷ Run** button to run the SQL code, and review the results, which should look similar to this:

    | C1 | C2 | C3 | C4 |
    |----|----|----|----|
    | ProductID | ProductName | Category | ListPrice |
    | 771 | Mountain-100 Silver, 38 | Mountain Bikes | 3399.9900 |
    | 772 | Mountain-100 Silver, 42 | Mountain Bikes | 3399.9900 |
    | ... | ... | ... | ... |

5. Note the results consist of four columns named C1, C2, C3, and C4, and that the first row in the results contains the names of the data fields. To fix this problem, add a HEADER_ROW = TRUE parameter to the OPENROWSET function as shown here (replacing *datalakexxxxxxx* with the name of your data lake storage account), and then rerun the query:

    ```sql
    SELECT
        TOP 100 *
    FROM
        OPENROWSET(
            BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/product_data/products.csv',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            HEADER_ROW = TRUE
        ) AS [result]
    ```

    Now the results look like this:

    | ProductID | ProductName | Category | ListPrice |
    |-----------|-------------|----------|-----------|
    | 771 | Mountain-100 Silver, 38 | Mountain Bikes | 3399.9900 |
    | 772 | Mountain-100 Silver, 42 | Mountain Bikes | 3399.9900 |
    | ... | ... | ... | ... |

6. Modify the query as follows (replacing *datalakexxxxxxx* with the name of your data lake storage account):

    ```sql
    SELECT
        Category, COUNT(*) AS ProductCount
    FROM
        OPENROWSET(
            BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/product_data/products.csv',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            HEADER_ROW = TRUE
        ) AS [result]
    GROUP BY Category;
    ```

7. Run the modified query, which should return a resultset that contains the number of products in each category.

    *[Resultset: one row per product category with its ProductCount.]*

8. In the **Properties** pane for **SQL Script 1**, change the **Name** to **Count Products by Category**. Then in the toolbar, select **Publish** to save the script.
9. Close the **Count Products by Category** script pane.
10. In Synapse Studio, select the **Develop** page, and notice that your published **Count Products by Category** SQL script has been saved there.
11. Select the **Count Products by Category** SQL script to reopen it. Then ensure that the script is connected to the **Built-in** SQL pool and run it to retrieve the product counts (the same query can also be run from outside Synapse Studio, as sketched after these steps).
12. In the **Results** pane, select the **Chart** view, and then select the following settings for the chart:
    - **Chart type**: Column
    - **Category column**: Category
    - **Legend (series) columns**: ProductCount
    - **Legend position**: bottom - center
    - **Legend (series) label**: *Leave blank*
    - **Legend (series) minimum value**: *Leave blank*
    - **Legend (series) maximum**: *Leave blank*
    - **Category label**: *Leave blank*

    The resulting chart should resemble this:

    *[Column chart showing ProductCount for each product category.]*
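The Built-in serverless pool is also reachable from outside Synapse Studio through its SQL endpoint (`<workspace>-ondemand.sql.azuresynapse.net`). Below is a minimal sketch using Invoke-Sqlcmd from the SqlServer PowerShell module with an Azure AD access token; the workspace and storage account names are placeholders, and it assumes the SqlServer module is installed:

```powershell
# Sketch: run the category-count query against the serverless SQL endpoint.
# Placeholder workspace/storage names; authenticates with an Azure AD token.
$endpoint = "synapsexxxxxxx-ondemand.sql.azuresynapse.net"
$token = (Get-AzAccessToken -ResourceUrl "https://database.windows.net").Token

$query = @"
SELECT Category, COUNT(*) AS ProductCount
FROM OPENROWSET(
    BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/product_data/products.csv',
    FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS [result]
GROUP BY Category;
"@

Invoke-Sqlcmd -ServerInstance $endpoint -Database "master" -AccessToken $token -Query $query
```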
## Use a Spark pool to analyze data

While SQL is a common language for querying structured datasets, many data analysts find languages like Python useful to explore and prepare data for analysis. In Azure Synapse Analytics, you can run Python (and other) code in a Spark pool, which uses a distributed data processing engine based on Apache Spark.

1. In Synapse Studio, if the files tab you opened earlier containing the **products.csv** file is no longer open, on the **Data** page, browse to the **product_data** folder. Then right-click **products.csv**, point to **New notebook**, and select **Load to DataFrame**.
2. In the **Notebook 1** pane that opens, in the **Attach to** list, select the **sparkxxxxxxx** Spark pool and ensure that the **Language** is set to **PySpark (Python)**.
3. Review the code in the first (and only) cell in the notebook, which should look like this:

    ```python
    %%pyspark
    df = spark.read.load('abfss://files@datalakexxxxxxx.dfs.core.windows.net/product_data/products.csv', format='csv'
    ## If header exists uncomment line below
    ##, header=True
    )
    display(df.limit(10))
    ```

4. Use the **▷** icon to the left of the code cell to run it, and wait for the results. The first time you run a cell in a notebook, the Spark pool is started - so it may take a minute or so to return any results.
5. Eventually, the results should appear below the cell, and they should be similar to this:

    | _c0 | _c1 | _c2 | _c3 |
    |-----|-----|-----|-----|
    | ProductID | ProductName | Category | ListPrice |
    | 771 | Mountain-100 Silver, 38 | Mountain Bikes | 3399.9900 |
    | 772 | Mountain-100 Silver, 42 | Mountain Bikes | 3399.9900 |

6. Uncomment the *,header=True* line (because the products.csv file has the column headers in the first line), so your code looks like this:

    ```python
    %%pyspark
    df = spark.read.load('abfss://files@datalakexxxxxxx.dfs.core.windows.net/product_data/products.csv', format='csv'
    ## If header exists uncomment line below
    , header=True
    )
    display(df.limit(10))
    ```

7. Rerun the cell and verify that the results look like this:

    | ProductID | ProductName | Category | ListPrice |
    |-----------|-------------|----------|-----------|
    | 771 | Mountain-100 Silver, 38 | Mountain Bikes | 3399.9900 |
    | 772 | Mountain-100 Silver, 42 | Mountain Bikes | 3399.9900 |

    Notice that running the cell again takes less time, because the Spark pool is already started.

8. Under the results, use the **+ Code** icon to add a new code cell to the notebook.
9. In the new empty code cell, add the following code:

    ```python
    df_counts = df.groupby(df.Category).count()
    display(df_counts)
    ```

10. Run the new code cell by clicking its **▷** icon, and review the results, which should look similar to this:

    | Category | count |
    |----------|-------|
    | Wheels | 14 |
    | ... | ... |

11. In the results output for the cell, select the **Chart** view. The resulting chart should resemble this:

    *[Column chart showing the count of products in each category.]*

12. If it is not already visible, show the **Properties** page by selecting the **Properties** button (which looks similar to **🗏**) on the right end of the toolbar. Then in the **Properties** pane, change the notebook name to **Explore products** and use the **Publish** button on the toolbar to save it.
13. Close the notebook pane and stop the Spark session when prompted. Then view the **Develop** page to verify that the notebook has been saved.

## Use a dedicated SQL pool to query a data warehouse

So far you've seen some techniques for exploring and processing file-based data in a data lake. In many cases, an enterprise analytics solution uses a data lake to store and prepare unstructured data that can then be loaded into a relational data warehouse to support business intelligence (BI) workloads. In Azure Synapse Analytics, these data warehouses can be implemented in a dedicated SQL pool.

1. In Synapse Studio, on the **Manage** page, in the **SQL pools** section, select the **sqlxxxxxxx** dedicated SQL pool row and then use its **▷** icon to resume it.
2. Wait for the SQL pool to start. This can take a few minutes. Use the **↻ Refresh** button to check its status periodically. The status will show as **Online** when it is ready.
3. When the SQL pool has started, select the **Data** page, and on the **Workspace** tab, expand **SQL databases** and verify that **sqlxxxxxxx** is listed (use the **↻** icon at the top-left of the page to refresh the view if necessary).
4. Expand the **sqlxxxxxxx** database and its **Tables** folder, and then in the **...** menu for the **FactInternetSales** table, point to **New SQL script** and select **Select TOP 100 rows**.
5. Review the results of the query, which show the first 100 sales transactions in the table. This data was loaded into the database by the setup script, and is permanently stored in the database associated with the dedicated SQL pool.
6. Replace the SQL query with the following code:

    ```sql
    SELECT d.CalendarYear, d.MonthNumberOfYear, d.EnglishMonthName,
           p.EnglishProductName AS Product, SUM(o.OrderQuantity) AS UnitsSold
    FROM dbo.FactInternetSales AS o
    JOIN dbo.DimDate AS d ON o.OrderDateKey = d.DateKey
    JOIN dbo.DimProduct AS p ON o.ProductKey = p.ProductKey
    GROUP BY d.CalendarYear, d.MonthNumberOfYear, d.EnglishMonthName, p.EnglishProductName
    ORDER BY d.MonthNumberOfYear
    ```

7. Use the **▷ Run** button to run the modified query, which returns the quantity of each product sold by year and month.
8. If it is not already visible, show the **Properties** page by selecting the **Properties** button (which looks similar to **🗏**) on the right end of the toolbar. Then in the **Properties** pane, change the query name to **Aggregate product sales** and use the **Publish** button on the toolbar to save it.
9. Close the query pane, and then view the **Develop** page to verify that the SQL script has been saved.
10. On the **Manage** page, select the **sqlxxxxxxx** dedicated SQL pool row and use its **⏸** icon to pause it (pausing and resuming can also be scripted, as sketched below).
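Because a dedicated SQL pool bills while it is online, pausing it when idle matters for cost control, and it is often automated. A minimal sketch with the Az.Synapse module (the workspace and pool names are placeholders for your own):

```powershell
# Sketch: resume, check, and pause the dedicated SQL pool (placeholder names).
$ws   = "synapsexxxxxxx"
$pool = "sqlxxxxxxx"

Resume-AzSynapseSqlPool  -WorkspaceName $ws -Name $pool
Get-AzSynapseSqlPool     -WorkspaceName $ws -Name $pool   # Status shows 'Online' when ready
Suspend-AzSynapseSqlPool -WorkspaceName $ws -Name $pool
```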
## Delete Azure resources

Now that you've finished exploring Azure Synapse Analytics, you should delete the resources you've created to avoid unnecessary Azure costs.

1. Close the Synapse Studio browser tab and return to the Azure portal.
2. On the Azure portal, on the **Home** page, select **Resource groups**.
3. Select the **dp203-xxxxxxx** resource group for your Synapse Analytics workspace (not the managed resource group), and verify that it contains the Synapse workspace, storage account, SQL pool, Data Explorer pool, and Spark pool for your workspace.
4. At the top of the **Overview** page for your resource group, select **Delete resource group**.
5. Enter the **dp203-xxxxxxx** resource group name to confirm you want to delete it, and select **Delete**.

After a few minutes, your Azure Synapse workspace resource group and the managed workspace resource group associated with it will be deleted (the same cleanup can be done from the Cloud Shell, as sketched below).
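As a final sketch, the deletion can also be run from the Cloud Shell instead of the portal; the resource group name is a placeholder matching the lab's naming pattern:

```powershell
# Sketch: delete the exercise resource group without the confirmation prompt.
# The managed workspace resource group is removed automatically with it.
Remove-AzResourceGroup -Name "dp203-xxxxxxx" -Force
```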
