Manual Pentaho Data Integration Fundamentals Parte II
Uploaded by
clarisse
Pentaho Data Integration Fundamentals (Course Code DI1000)

Exercise 6 - Input with Parameters & Table Copy Wizard, Continued

Using CSV file input and Insert/Update (continued)

Step | Action
28 | Open C:\pentahotraining\Data ache Files\Output\TextFileoutput.txt in a text editor.
29 | Add the following line as the first row below the headers:
   |     10208;S12_1108;46;176.6;8125;99
   | Then change the quantityordered field of the third entry in the file from 26 to 36:
   |     10208;S12_3148;36;128.4;3330.9;14
30 | Save your changes and close the file.
31 | Click the Run this transformation icon, then click Launch.
32 | In the Step Metrics view, note that the output of the Insert/Update step shows 1 new Output row and 1 Updated row.
33 | Run the transformation again. The output for the Insert/Update step should now show 0 Output and 0 Updated.
34 | Close "CSVFileInput_InsertUpdate.ktr."

Part II: Table input with Parameters

In Part II of this exercise you create a transformation that uses the Table input step to load data based on a parameter value. This procedure is commonly used to load only changed (or delta) data into a data warehouse.
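As a rough illustration of what "loading data based on a parameter value" means, the sketch below binds a date range to '?' placeholders in a query, much as a single parameter row can feed a Table input step. It uses Python's built-in sqlite3 module as a stand-in for the course databases; the table contents and variable names are assumptions made for the example, not part of the course files.

```python
import sqlite3

# Hypothetical stand-in for an "orders" source table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordernumber INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2006-12-31"),
        (2, "2007-01-01"),
        (3, "2007-06-15"),
        (4, "2007-12-31"),
        (5, "2008-01-01"),
    ],
)

# One parameter row: the lower and upper bounds of the date range to load.
param_row = ("2007-01-01", "2007-12-31")

# Each '?' is bound, in order, to one field of the parameter row.
sql = (
    "SELECT ordernumber, orderdate FROM orders "
    "WHERE orderdate >= ? AND orderdate <= ? "
    "ORDER BY ordernumber"
)
delta_rows = conn.execute(sql, param_row).fetchall()
print(delta_rows)
```

Only the rows inside the date range are returned, so changing the parameter row is enough to load a different slice of the data without editing the SQL.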
Copyright © 2015 Pentaho Corporation. All trademarks are the property of their respective owners. Course books may not be reproduced or distributed, in whole or in part, without the prior written permission of Pentaho Training. www.pentaho.com/services/training or email: training@pentaho.com

Using a Table Input step with Parameters

To create a transformation that uses the Table input step with parameters:

Step | Action
1 | Create and save a new transformation named: C:\pentahotraining\My Work\EX6\TableInputParameter.
2 | Drag an Input > Generate Rows and an Input > Table input step onto the canvas.
3 | Drag an Output > Table output step onto the canvas.
4 | Create a hop between Generate Rows and Table input.
5 | Create a hop between Table input and Table output.
6 | Save your work.
7 | Double-click the Generate Rows step.
8 | In the 'Generate Rows' dialog, change the Limit to 1.
9 | In the Fields table, type or choose the following:
  |     Name          | Type | Format     | Value
  |     OrderDateFrom | Date | yyyy-MM-dd | 2007-01-01
  |     OrderDateTo   | Date | yyyy-MM-dd | 2007-12-31
10 | Click [OK] to close the 'Generate Rows' dialog.
11 | Double-click the Table input step.
12 | In the 'Table input' dialog, for Connection, select pentaho_oltp.
13 | Click Get SQL select statement.
14 | In Database Explorer, expand pentaho_oltp > Tables.
15 | Select orders and click [OK].
16 | When prompted, click Yes to include the field names in the SQL statement.
17 | Add the following additional line to the end of the SQL statement:
   |     WHERE orderdate>=? AND orderdate<=?
   | (Each '?' represents a parameterized value.)
   | [Screenshot: the 'Table input' dialog showing the SELECT statement with the WHERE clause appended]
18 | For Insert data from step, choose Generate Rows from the drop-down list.
19 | Click [OK] to close the 'Table input' dialog.
20 | Double-click the Table output step.
21 | In the 'Table output' dialog, type or choose the following:
   |     Field          | Value
   |     Connection     | pentaho_olap
   |     Target schema  | [leave blank]
   |     Target table   | test_orders_2007
   |     Commit size    | 1000
   |     Truncate table | [checked]
22 | Click the [SQL] button.
23 | The following SQL should appear in the Simple SQL editor:
   |     CREATE TABLE test_orders_2007
   |     (
   |       ordernumber INT
   |     , orderdate DATETIME
   |     , requireddate DATETIME
   |     , shippeddate DATETIME
   |     , status VARCHAR(25)
   |     , comments TEXT
   |     , customernumber INT
   |     )
   | Click Execute to run the SQL code.
24 | When you receive the 'Results' dialog, click [OK].
25 | Close the "Simple SQL editor."
26 | Click [OK] to close the "Table output" dialog.
27 | Save your work.
28 | Click the Run this transformation icon, then click Launch.
29 | In the Step Metrics view, note 320 rows Written by Table input and 320 rows Read and Output by the Table output step.
30 | Switch to the View tab.
31 | Expand Transformations > TableInputParameter > Database connections.
32 | Right-click pentaho_olap and choose Explore from the menu.
33 | In Database Explorer, expand pentaho_olap > Tables.
34 | Right-click test_orders_2007 and choose Preview first 100.
35 | Close the preview data window.
36 | Click [OK] to close Database Explorer.
37 | Close "TableInputParameter.ktr."

Part III: Copy Table wizard

In optional Part III of this exercise, you use the Copy Table wizard to copy data from one database to another.
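Conceptually, copying a table between databases comes down to three actions: read the source table's structure, create a matching table in the target, and move the rows across. The sketch below shows that idea with two in-memory SQLite databases standing in for pentaho_oltp and pentaho_olap. It is not how the wizard itself is implemented, and the `copy_table` helper and sample rows are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for pentaho_oltp
target = sqlite3.connect(":memory:")  # stands in for pentaho_olap

# Illustrative source data only.
source.execute("CREATE TABLE customers (customernumber INTEGER, customername TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(103, "Customer A"), (112, "Customer B"), (114, "Customer C")],
)


def copy_table(src, dst, table):
    """Create `table` in dst with the same column names and copy every row."""
    cursor = src.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    # SQLite accepts untyped columns, which keeps the sketch short.
    dst.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    rows = cursor.fetchall()
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dst.commit()
    return len(rows)


copied = copy_table(source, target, "customers")
print(copied, "rows copied")
```

The row count returned by the helper plays the same role as the row totals you check in the Step Metrics view after running the generated transformation.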
Using the Copy Table wizard

To use the Copy Table wizard:

Step | Action
1 | Create a new transformation (you will save it later).
2 | From the menu options choose Tools | Wizard | Copy table (not Copy Tables).
3 | At the 'Enter the source and target database' panel, select pentaho_oltp in the left pane and pentaho_olap in the right pane.
4 | Click Next.
5 | At the 'Select the table to copy' panel, select customers.
6 | Click Finish.
7 | Save the transformation in \My Work\EX6\ using File name: CopyTableWizard.
8 | Double-click the write to [customers] step.
9 | In the 'Table output' dialog, click the SQL button.
10 | The "Simple SQL editor" should display the code necessary to create the customers table in the pentaho_olap database:
   |     CREATE TABLE customers
   |     (
   |       customernumber INT
   |     , customername VARCHAR(50)
   |     , contactlastname VARCHAR(50)
   |     , contactfirstname VARCHAR(50)
   |     , phone VARCHAR(50)
   |     , addressline1 VARCHAR(50)
   |     , addressline2 VARCHAR(50)
   |     , city VARCHAR(50)
   |     , state VARCHAR(50)
   |     , postalcode VARCHAR(15)
   |     , country VARCHAR(50)
   |     , salesrepemployeenumber INT
   |     , creditlimit DOUBLE
   |     )
11 | Click Execute.
12 | Click [OK] to close the 'Results' dialog.
13 | Close the 'Simple SQL editor.'
14 | Click [OK] to close the 'Table output' dialog.
15 | Save your work.
16 | Click the Run this transformation icon, then click Launch. 122 rows should be copied from one table to the other.

Solution Details

The solution to this exercise can be obtained using the details below:

Location: C:\pentahotraining\solutions\Exercises
Completed transformations:
    EX6_CSVFileInput_InsertUpdate.ktr
    EX6_TableInputParameter.ktr
    EX6_CopyTableWizard.ktr
(Use File | Import from an XML file to import)

End of Exercise

Congratulations! You have completed this exercise.

Exercise 6 Advanced - Input with Parameters & Table Copy Wizard

Introduction

This is an advanced version of Exercise 6. It is the same exercise, but without the detailed guidance. You may choose to do this advanced exercise rather than the standard version of Exercise 6.

Prerequisites

You must have the 'pentaho_olap' and 'pentaho_oltp' database connections. You must also have access to the pentahotraining files required (if any).

Instructions & Objectives

Please read each part below to see the requirements and instructions necessary for completing that part of this advanced exercise.
After completing all parts of this exercise, you will be able to:

* Create a transformation that uses the Generate Rows step to supply parameters for loading a table with data. Choose any database table you like.
* Use the Copy Table wizard to copy the contents of the customers table from pentaho_oltp to pentaho_olap.

End of Exercise

Congratulations! You have completed this exercise.

Exercise 7 - Parallel Processing

Introduction

In this exercise, you will learn how to create and then execute more than one instance of a step at the same time.

Prerequisites

None.

Objectives

After completing this exercise, you will be able to:

* Copy and Distribute data.
* Start multiple copies of a step.
* Send data to multiple outputs.

Part I: Understanding Data Flow

In this part of the exercise you will start multiple copies of a step and distribute the data to multiple outputs.
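The difference between distributing and copying rows to several outputs can be sketched in a few lines of Python. This is a simplified model, assuming that Distribute hands rows out in round-robin order and that Copy sends every row to every output; the function names are hypothetical and nothing here calls PDI itself.

```python
from itertools import cycle


def distribute(rows, n_targets):
    """Round-robin: each row is delivered to exactly one target."""
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def copy_to_all(rows, n_targets):
    """Copy: every target receives its own full copy of the rows."""
    return [list(rows) for _ in range(n_targets)]


rows = list(range(10))  # stand-in for the generated rows

distributed = distribute(rows, 5)
copied = copy_to_all(rows, 5)

print([len(t) for t in distributed])  # rows per target when distributing
print([len(t) for t in copied])       # rows per target when copying
```

With distribution the row counts across the targets add up to the number of input rows, whereas with copying each target sees the full input. That is the pattern to look for in the Step Metrics view during this exercise.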
Understanding Data Flow

Step | Action
1 | Open the transformation HelloWorld.
2 | Double-click the Generate Rows step, change the Limit field to 1000000 (one million), and click [OK].
3 | Save the transformation as HelloWorld_ParallelProcessing.
4 | In the toolbar, click the Run this transformation or job button.
5 | In the 'Execute a transformation' dialog, accept the default options and click Launch.
6 | Switch to the Step Metrics view and note the number of rows written equals 1,000,000.
8 | Right-click the Generate Rows step and select Change the number of copies to start from the context menu.
9 | In the 'Nr of copies of step' dialog, change the Number of copies field to 5, and click [OK]. This adds an X5 marker to the step on the canvas.
10 | Save the transformation as HelloWorld_ParallelProcessing_5copies.
11 | Click the Run this transformation or job button.
12 | In the 'Execute a transformation' dialog, click Launch.
13 | Switch to the Step Metrics view. Notice the transformation wrote 5 x 1,000,000 rows by spawning 5 copies of the Generate Rows step. Also notice the Dummy step reads 5,000,000 rows.
14 | Using Edit | Copy and Paste, create a total of 5 Dummy (do nothing) steps on the canvas.
15 | Save your work as: HelloWorld_ParallelProcessing_Distribute.
16 | Select Generate Rows, hold the Shift key, and click and drag onto Dummy (do nothing) 2. This creates a hop between the two steps.
17 | When you receive the Warning dialog, click Distribute.
18 | Using the previous steps (or the context menu), create hops between Generate Rows and the remaining Dummy steps.
19 | Save your work or you will not see the expected results.
20 | Click the Run this transformation or job button.
21 | In the 'Execute a transformation' dialog, click Launch.
22 | Switch to the Step Metrics view. Note that each of the 5 Dummy steps now reads 1,000,000 rows.
   | NOTE: This is the same result as right-clicking the Dummy (do nothing) step and setting "Change the number of copies to start" to 5.
23 | Close the "First steps" transformation.

Part II: Parallelism

In Part II of this exercise, you create and run a transformation that implements parallel processing.

Working with parallel processing

To implement parallel processing:

Step | Action
1 | Choose File | New | Transformation from the menu options.
2 | Choose File | Save.
3 | Save the transformation using File name: EX7_ParalellProcessing_DataFlow.
4 | Expand Input, then click and drag 2 Generate Rows steps onto the canvas.
5 | Expand Transform, then click and drag 2 Add Sequence steps onto the canvas.
6 | Expand Flow, then click and drag a Dummy (do nothing) step onto the canvas.
7 | Hover the mouse pointer over the first Generate Rows step.
8 | Add a hop.
9 | Click the Add sequence step to create a hop between Generate Rows and Add sequence.
10 | Repeat the previous steps to add a hop between Generate Rows 2 and Add sequence 2.
11 | Repeat the previous steps to add a hop between Add sequence and Dummy.
12 | Repeat the previous steps to add a hop between Add sequence 2 and Dummy.
   | [Diagram: Generate Rows > Add sequence and Generate Rows 2 > Add sequence 2, both feeding Dummy (do nothing)]
13 | Edit each Generate Rows step and change the Limit (number of rows) to 100. You do not need to add fields.
14 | Double-click Add sequence to edit it.
15 | In the Counter name (optional) field, type: CounterA.
16 | Edit Add sequence 2 and change the Counter name (optional) to: CounterB.
17 | Save your work.
18 | Select the Dummy step, right-click, and choose Preview from the context menu.
19 | In the 'Transformation debug' dialog, change the Number of rows to retrieve field to 200 and click Quick Launch.
   | NOTE: Be sure the Dummy step is highlighted in the 'Transformation debug' dialog or the results received will be incorrect.
20 | In the 'Examine preview data' dialog, scroll well down the list and find that the entries in the valuename column are out of sequence. This happens because you have configured two separate counters whose values are placed into the same field in the stream: each counter numbers its own 100 rows independently.
   | NOTE: Your results may not look identical to the course screenshot.
21 | Click the valuename column header to sort the values. Note the duplicate values.
22 | Close the 'Examine preview data' window and close the 'Select the preview step' dialog.
23 | Edit the Add sequence and Add sequence 2 steps and change Counter name (optional) for both steps to lab2.
24 | Save your work.
25 | Select the Dummy step, right-click, and choose Preview from the context menu.
26 | In the 'Transformation debug' dialog, change the Number of rows to retrieve field to 2000 and click Quick Launch.
   | NOTE: Be sure the Dummy step is highlighted in the 'Transformation debug' dialog or the results received will be incorrect.
27 | Click the valuename column header to sort the values. Note the values are now consecutive: 1-200.
28 | Close the "Examine preview data" dialog, close the "Select the preview step" dialog, and close the Data flow transformation.

Solution Details

The solution to this exercise can be obtained using the details below:

Location: C:\pentahotraining\solutions\Exercises
Completed transformations:
    EX7_HelloWorld_ParallelProcessing_5copies.ktr
    EX7_HelloWorld_ParallelProcessing_Distribute.ktr
    EX7_ParalellProcessing_DataFlow.ktr
(Use File | Import from an XML file to import)

End of Exercise

Congratulations! You have completed this exercise.

Guided Demo 9 - Choosing Adequate Sample Size for 'Get Fields'

Introduction

In this guided demo, you will see why it matters to enter an adequate number of sample rows when clicking the [Get Fields] button for input steps. You will be guided to deliberately choose a number of sample rows that is too small for PDI to determine the correct column lengths for a CSV file input step's fields. Then, you will attempt to load the data into a table whose column lengths were derived from the CSV file input step's fields. This results in an error during the table load.

NOTE: This guided demo is performed at this stage in the course due to its proximity to the exercises where the correct preview sample size is paramount.

Objectives

After completing this guided demo, you will be able to:

* Properly choose a sample size for a CSV file input step.
  The same technique is used for a Text file input step as well.
* Identify transformation errors whose cause may be an improper sample size chosen in a previous step.

Prerequisites

You must have Pentaho Data Integration (or Pentaho Business Analytics suite) installed and properly configured. You must also have access to course files required (if any).

Create the Transformation

Step | Action
1 | To create the new transformation, press CTRL-N.
2 | To save the transformation, press CTRL-S.
3 | Set the transformation properties for the Transformation tab according to the table below:
  |     Property Name       | Value
  |     Transformation name | GetFieldsSampleSize
  |     Description         | Add a description of your choice.
  |     Directory           | /public/PDI_Trn_Objects
  | NOTE: This directory is in the repository.
4 | To close the 'Transformation properties' dialog, click [OK].
5 | To customize the save comment, at the 'Enter a comment' dialog, enter an optional comment, and then click [OK].

Create & Configure the CSV File Input Step

In this portion of the guided demo, you will add a CSV file input step to the new transformation. Then you will configure it to read the sales_data.csv file that comes with every PDI install as part of its sample data.

Step | Action
1 | To create the first step, from the Input category of the Design tab, drag the CSV file input step onto the Canvas.
2 | To open the step's properties dialog, double-click the step.
3 | To configure the Filename property, set it according to the table shown below:
  |     Property Name | Value
  |     Filename      | C:\Pentaho\design-tools\data-integration\samples\transformations\files\sales_data.csv

Demonstrate Proper Sample Size

In this section of the guided demonstration, you will configure the fields for the CSV file input step by having it read the entire file to determine the proper field lengths.

Step | Action
1 | To have PDI read the entire contents of the file to determine the proper field lengths, and configure the fields:
  | * In the 'CSV Input' dialog, click [Get Fields].
  | * In the 'Sample size' dialog, enter 0 (zero), and then click [OK].
2 | Notice in the 'Scan results' dialog that PDI scanned 2823 lines, and then click [Close].
3 | Examine the field lengths that PDI has automatically configured for you. Specifically, notice the field named PRODUCTLINE. Its length is correctly set to 16.

CAUTION: If you are reading a very large file, it may take a long time to read all of its contents. In that case, it is better to determine the proper field lengths beforehand, or to calculate a sample size that is smaller than the entire file but large enough to include the longest field values.
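The effect of the sample size on the detected field length can be sketched as follows. This is a simplified model of the idea rather than PDI's actual scanning logic: the `max_field_length` helper is hypothetical, the column values are illustrative, and a sample size of 0 is taken to mean "scan everything", mirroring the dialog setting used above.

```python
def max_field_length(values, sample_size=0):
    """Longest value among the first `sample_size` values; 0 scans all values."""
    sample = values if sample_size == 0 else values[:sample_size]
    return max(len(value) for value in sample)


# Illustrative PRODUCTLINE column: the longest value only appears after row 100.
product_lines = (
    ["Motorcycles"] * 60
    + ["Classic Cars"] * 40
    + ["Trucks and Buses"] * 20
)

full_scan_length = max_field_length(product_lines)      # scans every row
sampled_length = max_field_length(product_lines, 100)   # scans the first 100 rows only

print("full scan:", full_scan_length)
print("sample of 100:", sampled_length)
```

When the longest value lies outside the sampled rows, the detected length comes out too short, and any table created from those lengths inherits the problem.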
Demonstrate Using an Improper Sample Size

In this section of the guided demonstration, you will configure the fields for the CSV file input step again, but this time using an inadequate sample size. To demonstrate how a transformation can behave when the field lengths are not correct, you will continue to build the transformation by creating a table to load the data into using a Table output step. The length of the PRODUCTLINE field in the table created will not be large enough to hold the data for every row in the CSV file. You will see an error when the transformation is run and the table load is attempted. This is expected.

Step | Action
1 | To have PDI read only the first 100 lines of the file when determining the field lengths, and configure the fields:
  | * In the 'CSV Input' dialog, click [Get Fields].
  | * In the 'Sample size' dialog, enter 100, and then click [OK].
  |   NOTE: 100 is PDI's default sample size.
  | * Notice in the 'Scan results' dialog that PDI scanned 100 lines, and then click [Close].
2 | Examine the field lengths that PDI has automatically configured for you. Specifically, notice the field named PRODUCTLINE. Its length is set to 12. You now know that 12 is too small: not every PRODUCTLINE value in the CSV file will fit into the field.
  | NOTE: Several fields are problematic with this sample size. For the purpose of this guided demonstration, we are focusing on the PRODUCTLINE field.
3 | To close the 'CSV Input' dialog, click [OK].

Create & Configure the Table Output Step

Step | Action
1 | To create the next step, from the Output category of the Design tab, drag the Table output step onto the Canvas.
2 | Create a hop between the steps as shown in the table below:
  |     Source Step    | Destination Step
  |     CSV file input | Table output
3 | When creating the hop, you will be presented with a context menu. Click Main output of step.
4 | To open the step's properties dialog, double-click the step.
5 | To configure the step, set its properties according to the table shown below:
  |     Property Name | Value
  |     Connection    | pentaho_olap
  |     Target table  | SampleSizeTest
  | NOTE: All other properties on this tab should remain at their default values.
6 | To generate the SQL query that will create the table defined in the Target table property, click the [SQL] button.
7 | Notice that the field lengths it is using to create the new table are coming from the CSV file input step.
  |     CREATE TABLE SampleSizeTest
  |     (
  |       ORDERNUMBER BIGINT
  |     , QUANTITYORDERED BIGINT
  |     , PRICEEACH DOUBLE
  |     , ORDERLINENUMBER BIGINT
  |     , SALES DOUBLE
  |     , ORDERDATE DATETIME
  |     , STATUS VARCHAR(10)
  |     , QTR_ID BIGINT
  |     , MONTH_ID BIGINT
  |     , PRODUCTLINE VARCHAR(12)
  |     , PRODUCTCODE VARCHAR(8)
  |     [remaining columns not shown]
  | * Click [Execute] to create the new table.
  | * Click [OK] to close the 'Results of the SQL statements' dialog.
  | * Click [Close] to close the 'Simple SQL editor' dialog.
8 | To close the step's properties dialog, click [OK].
9 | Save the transformation.
Execute the Transformation

Step | Action
1 | To run the transformation, press F9, and then click [Launch].
2 | Click the Step Metrics tab in the Execution Results pane to see that errors occurred.
    [Screenshot: Step Metrics tab showing errors]
3 | To see the exact errors causing the problem:
    * In the Execution Results pane, click the Logging tab.
    * Click the Show error lines icon.

    The error states that the data is too long for the column. In this case, the STATE column received the error:

    ... KettleDatabaseBatchException: ... BatchUpdateException: Data truncation: Data too long for column STATE at row 1
    ... Errors detected!
4 | To close the 'Error lines' dialog, click [OK].

Solution Details

The solution to this exercise can be obtained using the details below:
Location: C:\pentahotraining\solutions\Guided Demos
Completed transformation: GD9_PreviewSampleSize.ktr
(Use File | Import from an XML file to import)

End of Guided Demonstration

Congratulations! You have completed this guided demonstration.
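To see why an undersized sample leads to the truncation error in this demonstration, consider the following Python sketch. It is a simplified illustration, not PDI's actual detection logic. The function infer_max_lengths and the three rows are hypothetical, and treating a sample size of 0 as "scan every row" is an assumption made for this sketch.

```python
def infer_max_lengths(rows, sample_size):
    """Return the longest value seen per field in the sampled rows.

    A sample_size of 0 is treated here as "scan every row" (an assumption
    made for this sketch).
    """
    sample = rows if sample_size == 0 else rows[:sample_size]
    lengths = {}
    for row in sample:
        for name, value in row.items():
            lengths[name] = max(lengths.get(name, 0), len(value))
    return lengths


# Made-up rows: the longest values only appear after the sampled rows.
rows = [
    {"PRODUCTLINE": "Classic Cars", "STATE": "NY"},
    {"PRODUCTLINE": "Motorcycles", "STATE": "CA"},
    {"PRODUCTLINE": "Trucks and Buses", "STATE": "Queensland"},
]

small = infer_max_lengths(rows, sample_size=2)  # only the first two rows
full = infer_max_lengths(rows, sample_size=0)   # every row

# Rows whose PRODUCTLINE would not fit a column sized from the small sample.
too_long = [r for r in rows if len(r["PRODUCTLINE"]) > small["PRODUCTLINE"]]

print(small["PRODUCTLINE"], full["PRODUCTLINE"], len(too_long))
```

With the small sample, PRODUCTLINE is sized for "Classic Cars" (12 characters), so a later value such as "Trucks and Buses" cannot fit, which is the kind of mismatch the database reports as data that is too long for the column.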
Exercise 8 - Lookups & Data Formatting

Introduction

This exercise introduces the concept of looking up data. The exercise scenario includes a flat file (.csv) of sales data that you will load into a database so that mailing lists can be generated. Several of the customer records are missing postal codes (zip codes) that must be resolved before loading into the database.

Objectives

After completing this exercise, you will be able to:
* Merge data from different streams.
* Use the Select Values step to change the name and format of a field.

Prerequisites

You must have Pentaho Data Integration (or Pentaho Business Analytics suite) installed and properly configured. You must also have access to the required course files (if any).

Model

[Diagram of the transformation: Read Sales Data, Filter Missing Zips, Read Postal Codes, Lookup Missing Zips, Prepare Field Layout]

Using Stream Lookup with a Filter

Step | Action
1 | Open a new transformation.
2 | Save the transformation as Updatezip in a location of your choice.
4 | Add a CSV file input step. Then edit its properties.
5 | In the 'Step name' field, type Read Sales Data.
6 | In the 'Filename' property, click [Browse], and then locate the source file, sales_data.csv, found at:
    C:\Pentaho\design-tools\data-integration\samples\transformations\files
    Make sure that the Separator is set to comma (,) and that Header is enabled, because the file has one header row.
7 | Click [Get Fields] to retrieve the input fields from your source file. A dialog box will pop up asking for a sample size (number of rows). Enter 0 (zero). Click [Preview Rows] to verify that your file is being read correctly. You can change the number of rows to preview. Click [OK] to exit the step properties dialog box.
8 | Add a 'Filter rows' step to your transformation. Under the Design tab, go to Flow > Filter rows. Drag it to the canvas. Create the hop from Read Sales Data to the new step, and make it the main path.

    Double-click the Filter rows step. The 'Filter Rows' properties dialog box appears.
    * In the Step Name field, type Filter Missing Zips.
    * In the Fields: dialog box, select POSTALCODE and click [OK].
    * Click the comparison operator (set to = by default), select the IS NOT NULL function, and click [OK].
    * Click [OK] to exit the 'Filter Rows' properties dialog box.
9 | Click and drag a Table Output step into your transformation. Create a hop between the Filter Missing Zips (Filter Rows) and Table Output steps. In the dialog that appears, select Result is TRUE.
10 | Double-click the Table Output step to open its edit properties dialog box.
11 | Choose the pentaho_olap connection.
12 | Type SALES_DATA in the 'Target table' field.
13 | In the dialog box, enable the Truncate table property.
14 | This table does not exist in the target database. To create it, perform the following:
    * Click the [SQL] button.
    * Click [Execute] to run the SQL.
    * Click [OK] to close the 'Simple SQL editor' dialog box.
    * Click [OK] to close the 'Table output' dialog box.
15 | Save your transformation.
16 | Add a new CSV file input step to your transformation. In this step you will retrieve the records from the Zipssortedbycitystate.csv lookup file.
17 | Rename the step to Read Postal Codes.
18 | Click [Browse] to locate the source file, Zipssortedbycitystate.csv, located at:
    C:\Pentaho\design-tools\data-integration\samples\transformations\files
19 | Click [Get Fields] to retrieve the data from your .csv file. Rename the POSTALCODE field to ZIP_RESOLVED.
20 | Click [Preview Rows] to make sure your entries are correct, and click [OK] to close the step's properties dialog box.
21 | Add a Stream lookup step to your transformation. Under the Design tab, expand the Lookup folder and choose Stream Lookup.
22 | Draw a hop between the Filter Missing Zips and Stream Lookup steps. In the dialog box that appears, select Result is FALSE.
23 | Create a hop from the Read Postal Codes step to the Stream lookup step. The transformation now looks like this:
    [Screenshot: canvas showing Filter Missing Zips, Write to Database, Read Postal Codes, and Lookup Missing Zips]
24 | Double-click the Stream lookup step to open its edit properties dialog box.
25 | Rename Stream Lookup to Lookup Missing Zips.
26 | From the Lookup step drop-down box, select Read Postal Codes as the lookup step.
27 | Define the CITY and STATE fields in the 'The key(s) to look up the value(s)' table. Click the drop-down in the Field column and select CITY. Then click in the LookupField column and select CITY.
    Perform the same actions to define the second key based on the STATE fields coming in on the source and lookup streams:

    Step name: Lookup Missing Zips
    Lookup step: Read Postal Codes

    The key(s) to look up the value(s):
    # | Field | LookupField
    1 | CITY | CITY
    2 | STATE | STATE
28 | Click [Get lookup fields]. ZIP_RESOLVED is the only field you want to retrieve. To delete the CITY and STATE lines, right-click in the line and select Delete Selected Line. Make sure the Type is set to String. Click [OK].

    Specify the fields to retrieve:
    # | Field | New name | Default | Type
    1 | ZIP_RESOLVED | | | String
29 | In the canvas, select the Lookup Missing Zips step, right-click, and select Preview to display the preview/debugger dialog box. Click Quick Launch to preview the data flowing through this step. Notice that the new field, ZIP_RESOLVED, has been added to the stream containing your resolved postal codes.
30 | Add a Select values step to your transformation by expanding the Transform folder and choosing Select Values.
31 | Create a hop between the Lookup Missing Zips and Select Values steps.
32 | Double-click the Select Values step to open its properties dialog box.
33 | Rename the 'Select values' step to Prepare Field Layout.
34 | Click [Get fields to select] to retrieve all fields and begin modifying the stream layout.
35 | Select the old POSTALCODE field in the list (line 20) and delete it.
36 | Select the ZIP_RESOLVED field and press CTRL-Up Arrow until it is at position 20.
37 | The original POSTALCODE field was formatted as a 9-character string.
    You must modify your new field to match that format. Click the Meta-Data tab.
38 | In the first row of the Fields to alter table, click in the Fieldname column and select ZIP_RESOLVED.
39 | Configure this tab using the following steps:
    * Type POSTALCODE in the 'Rename to' column.
    * Select String in the 'Type' column.
    * Type 9 in the 'Length' column.
    * Click [OK] to exit the edit properties dialog box.
40 | Draw a hop from the Prepare Field Layout (Select values) step to the Write to Database (Table output) step, making it the main path. Save your transformation.
41 | In the canvas, select the Lookup Missing Zips step, right-click, and select Preview to display the preview/debugger dialog box. Click Quick Launch to preview the data flowing through this step. Notice that the new field, ZIP_RESOLVED, has been added to the stream containing your resolved postal codes.
42 | The transformation is complete. Run the transformation.
43 | Review the results.
    [Screenshot: Step Metrics tab after the run]

Solution Details

The solution to this exercise can be obtained using the details below:
Location: C:\pentahotraining\Solutions\Exercises
Completed transformation: EX8_UpdateZip.ktr
(Use File | Import from an XML file to import)

End of Exercise

Congratulations!
You have completed this exercise.

Guided Demo 10 - Creating Summary Fields Using Group By

Introduction

A variety of steps can be used for calculations. This guided demo shows how to create summary data using the Group by step and the Sort rows step.

Objectives

After completing this guided demonstration, you will be able to:
* Create a new calculated field using the 'Group by' step.
* Preview the results of the calculation.
* Understand how the Sort step works.

Prerequisites

You must have Pentaho Data Integration (or Pentaho Business Analytics suite) installed and properly configured. You must also have access to the required course files (if any).

Model

[Diagram of the transformation: Get Product Data -> Sort by productline -> Group by productline]
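One way to see why this demo pairs a Sort step with Group by is to model the grouping as an aggregation over runs of consecutive rows that share the same key. The Python sketch below works under that assumption, using itertools.groupby, which behaves the same way. The function group_sum and the sample rows are hypothetical; the field names productline and quantityinstock are the ones used in this demo.

```python
from itertools import groupby
from operator import itemgetter


def group_sum(rows, key, value):
    """Sum `value` over each run of consecutive rows that share `key`."""
    return [
        (k, sum(r[value] for r in run))
        for k, run in groupby(rows, key=itemgetter(key))
    ]


# Made-up rows in which the two "Planes" rows are not adjacent.
rows = [
    {"productline": "Planes", "quantityinstock": 10},
    {"productline": "Ships", "quantityinstock": 5},
    {"productline": "Planes", "quantityinstock": 7},
]

# Without sorting, "Planes" is reported twice with partial totals.
unsorted_result = group_sum(rows, "productline", "quantityinstock")

# Sorting first brings equal keys together, so each group appears once.
sorted_rows = sorted(rows, key=itemgetter("productline"))
sorted_result = group_sum(sorted_rows, "productline", "quantityinstock")

print(unsorted_result)
print(sorted_result)
```

In the unsorted case the same product line shows up in more than one output row, which is why the rows are sorted on the grouping field before they reach the grouping step.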
Creating the Transformation

Step | Action
1 | Open a new transformation.
2 | Save the transformation as SortGroupBy.
3 | From the Input category, drag the Table input step to the canvas.
4 | Open the step by right-clicking it and choosing Edit Step from the menu.
5 | In the Step Name field, type Get Product data.
6 | In the Connection drop-down, choose pentaho_oltp.
7 | The database explorer will appear. Click the expand icon next to Tables to show the list of tables, then double-click the products table.
    [Screenshot: database explorer tree listing tables such as offices, orderdetails, orders, and payments]
8 | A dialog will appear asking if you'd like to include field names in the SQL. Choose Yes.
9 | Delete the productdescription line, including the leading comma.
    [Screenshot: SQL editor with the productdescription column marked "Remove this column from script"]
10 | Click [Preview]. Find the field quantityinstock on the far right. That is the field we will be working with. Click [OK] to close the step.
11 | From the Transform category, drag the Sort rows step to the canvas and place it to the right of the Table input step.
12 | Create the hop.
13 | Open the Sort rows step. In the Step Name field, type Sort by productline.
14 | Open the Sort rows step. Since we want to sort by productline, choose productline from the drop-down list under Fieldname.

    Fieldname | Ascending | Case sensitive compare | Presorted
    productline | Y | N | N
15 | From the Statistics category, drag the Group by step to the canvas.
16 | Open the Group by step. In the Step Name field, type Group by productline.
17 | Fill in the Group by step as seen below.
    [Screenshot: Group by step configuration]
18 | Click [OK] to close the step.
19 | A message will appear warning that if you don't sort the input, the results may not be correct.
20 | The quick launch dialog will appear. Click [Quick Launch].
21 | To see the results of the calculation, right-click the step and choose Preview.
22 | The quick launch dialog will appear. Click [Quick Launch].
23 | When the preview opens, you will see the grouped data:

    # | productline | totaly
    1 | Classic Cars | 219183
    2 | Motorcycles | 69401
    3 | Planes | 62287
    4 | Ships | 26833
    5 | Trains | 16696
    6 | Trucks and Buses | 35851
    7 | Vintage Cars | 124880
24 | Close the preview and reopen the Group by step. The Include all rows option will show you both the individual records and the grouped data. Enable Include all rows.
25 | Close the step and run the preview again.
    [Screenshot: preview showing individual records together with the grouped totals]

Solution Details

The solution to this exercise can be obtained using the details below:
Location: C:\pentahotraining\Solutions\Guided Demos
Completed transformation: GD10_SortGroupBy.ktr
(Use File | Import from an XML file to import)

End of Guided Demonstration

Congratulations! You have completed this guided demonstration.

Exercise 9 - Calculating & Aggregating Order Quantity

Introduction

In this exercise, you will create a transformation that reads the CSV file containing order data. The data is then sorted by country and grouped by country to obtain each country's total quantity ordered. Finally, the transformation calculates each individual order's quantity as a percentage of its country's total order quantity.

NOTE: For those looking for more of a challenge, try the advanced version of this exercise. It is the same exercise, but without the detailed guidance. You will find it in this workbook immediately following this exercise.

Prerequisites

You must have Pentaho Data Integration (or Pentaho Business Analytics suite) installed and properly configured. You must also have access to the required course files (if any).

Objectives

After completing this exercise, you will be able to:
* Sort rows of data on the stream.
* Group rows.
* Use the Calculator step to calculate values based on values of data in the stream.

Create the Transformation

Step | Action
1 | To create the new transformation, in the Spoon menu bar, click File | New | Transformation.
2 | To open the Transformation properties dialog, double-click an empty area of the Canvas.
3 | Set the transformation properties for the Transformation tab according to the table below:

    Property Name | Value
    Transformation name | SortingData_Grouping
    Description | Add a description of your choice.
    Directory | /public/PDI_Tr_Objects (NOTE: This is in the repository.)
4 | To close the 'Transformation properties' dialog, click [OK].
5 | To save the transformation:
    * Press CTRL-S.
    * At the 'Transformation properties' dialog, click [OK].
    * At the 'Enter a comment' dialog, enter an optional comment, and then click [OK].

    NOTE: Now as you work on your transformation, you can easily save your work.
Create & Configure the CSV File Input Step

First, you will add a step to read all fields from a CSV file.

Step | Action
1 | To create the first step, from the Input category of the Design tab, drag the CSV file input step onto the Canvas.
2 | To open the step's properties dialog, double-click the step.
3 | To configure the step's properties, set them according to the table shown below:

    Property Name | Value
    Step Name | Read Order Data From CSV File
    Filename | ${DIR_INPUT}\Order_File.csv
    Delimiter | [not legible]
    Lazy conversion | (unchecked)
4 | To configure the fields for this step:
    * Click the [Get Fields] button.
    * At the 'Sample size' dialog, click [OK].
    * At the 'Scan results' dialog, click [Close].
5 | To close the step's properties dialog, click [OK].
6 | Save the transformation.

    TIP: You can copy and paste a step from a previous transformation you created that has a CSV file input step configured the same way.

Create & Configure the Select Values Step

The first step reads all of the fields and data from the CSV file and adds them to the stream. Only a subset of those fields is needed, so you will add a step that selects only the fields that contain the data you are interested in.

Step | Action
1 | To create the second step, from the Transform category of the Design tab, drag the Select values step onto the Canvas.
2 | Create a hop between the steps as shown in the table below:

    Source Step | Destination Step
    CSV file input | Select values
3 | To open the step's properties dialog, double-click the step.
4 | To configure the step's properties, set them according to the table shown below:

    Property Name | Value
    Step Name | Select the Necessary Values
    Select & Alter tab, Fieldname | ordernumber, quantityordered, priceeach, customernumber, country
5 | Compare the step's configuration with the screenshot below and make any necessary changes.
    [Screenshot: Select & Alter tab listing the selected fields]
6 | To close the step's properties dialog, click [OK].
7 | Save the transformation.

Create & Configure the Sort Rows Step

The next step you will add will sort the data stream by country.

Step | Action
1 | To create the next step, from the Transform category of the Design tab, drag the Sort rows step onto the Canvas.
2 | Create a hop between the steps as shown in the table below:

    Source Step | Destination Step
    Select values | Sort rows
3 | To open the step's properties dialog, double-click the step.
4 | To configure the step's properties, set them according to the table shown below:

    Property Name | Value
    Step Name | Sort Rows by Country
5 | To configure the field to sort by, in the Fields grid, add a row according to the table below:

    Fieldname | Ascending | Case sensitive compare | Presorted
    country | Y | N | N
6 | To close the step's properties dialog, click [OK].
7 | Save the transformation.
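The overall calculation this exercise works toward (sort by country, total the quantity ordered per country, then express each order as a percentage of its country's total) can be sketched in Python as follows. This is only an illustration of the logic, not a replacement for the PDI steps. The function add_country_share, the output names country_total and pct_of_country, and the sample orders are hypothetical; ordernumber, quantityordered, and country are the fields selected earlier in this exercise.

```python
from collections import defaultdict


def add_country_share(orders):
    """Return orders sorted by country with per-country totals and shares."""
    # Sort by country, as the Sort rows step does.
    ordered = sorted(orders, key=lambda o: o["country"])

    # Total quantity ordered per country (the grouping part).
    totals = defaultdict(int)
    for order in ordered:
        totals[order["country"]] += order["quantityordered"]

    # Each order's quantity as a percentage of its country's total.
    return [
        {
            **order,
            "country_total": totals[order["country"]],
            "pct_of_country": 100.0 * order["quantityordered"]
            / totals[order["country"]],
        }
        for order in ordered
    ]


# Made-up orders for illustration.
orders = [
    {"ordernumber": 1, "country": "USA", "quantityordered": 30},
    {"ordernumber": 2, "country": "France", "quantityordered": 20},
    {"ordernumber": 3, "country": "USA", "quantityordered": 10},
]
result = add_country_share(orders)
for row in result:
    print(row)
```

Each output row keeps the original order fields and gains the country total and the percentage, which mirrors the idea of carrying individual records and grouped values on the same stream.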