Ggwplalana

The document contains answers to questions about Pentaho Data Integration (PDI). Key points include: PDI is an ETL tool that uses transformations to extract, transform and load data, and jobs to orchestrate transformations. Transformations contain steps that are executed in parallel, while jobs contain steps executed sequentially. Dimensional modeling uses star and snowflake schemas in data warehousing.

Uploaded by

Renny Pasaribu

1. What is the disadvantage of multi-dimensional modeling?

Answer : Slow Query

2. What makes a data warehouse significantly different from its data sources?

Answer : Data Modelling

3. What does Stream Lookup step do?

Answer : Looks up related data and returns one row or no row
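
The "one row or no row" behaviour can be sketched in Python. This is only an illustration under assumptions: the field names (code, name, qty) and the function stream_lookup are hypothetical, and the sketch does not reproduce how PDI implements the step internally.

```python
from typing import Optional

# Hypothetical lookup stream: reference rows keyed by a product code.
lookup_rows = [
    {"code": "A1", "name": "Widget"},
    {"code": "B2", "name": "Gadget"},
]

# Load the lookup stream into memory once, indexed by the lookup key.
lookup_index = {row["code"]: row for row in lookup_rows}


def stream_lookup(main_row: dict, default_name: Optional[str] = None) -> dict:
    """Return a copy of main_row enriched with the single matching lookup
    value, or with a default when no lookup row matches."""
    match = lookup_index.get(main_row["code"])
    enriched = dict(main_row)
    enriched["name"] = match["name"] if match else default_name
    return enriched


found = stream_lookup({"code": "A1", "qty": 3})
missing = stream_lookup({"code": "Z9", "qty": 1})
print(found)
print(missing)
```

Each incoming row yields exactly one output row; when nothing matches, the looked-up field simply carries the default.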

4. C:/Pentaho/BA-4.5.0-GA/java/bin/java.exe or ${pentaho_java_home}/bin/java.exe. This is a sample value for the parameter:

Answer : wrapper java

5. What is the characteristic of type 1 SCD?

Answer : Updating the row with the same business key
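
A type 1 change can be sketched as an in-place overwrite. The table dim_customer, its columns, and the function apply_type1 are hypothetical names chosen for illustration; this is a minimal sketch, not the logic of any particular PDI step.

```python
# Hypothetical dimension table held in memory as a list of dicts.
dim_customer = [
    {"customer_sk": 1, "customer_id": "C001", "city": "Jakarta"},
    {"customer_sk": 2, "customer_id": "C002", "city": "Medan"},
]


def apply_type1(dimension, business_key, changes):
    """Overwrite attributes on the row that has this business key.
    If the key is not present yet, insert a new row instead."""
    for row in dimension:
        if row["customer_id"] == business_key:
            row.update(changes)  # the old values are lost: no history is kept
            return row
    new_row = {
        "customer_sk": max((r["customer_sk"] for r in dimension), default=0) + 1,
        "customer_id": business_key,
        **changes,
    }
    dimension.append(new_row)
    return new_row


apply_type1(dim_customer, "C001", {"city": "Bandung"})
print(dim_customer)
```

The row count and the surrogate key stay the same; only the attribute value changes.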

6. What command line option do you use to refer to a transformation or a job file?

Answer : --file

7. Given the transformation shown above, what does the dashed red line hop mean?

Answer : error handling of a row

8. What is the “Select Values” step not able to do?

Answer : replacing values in defined columns

9. Which is the correct variable expression in PDI?

Answer : ${VAR}

10. What does “Add Sequence” step do?

Answer : Adds a series of numbers


11. What does Filter Rows do ?

Answer : a and b

12. What step do you use to apply a Slowly Changing Dimension (SCD)?

Answer : Dimension Lookup / Update

13. What is the other alias for surrogate key?

Answer : Technical key

14. Given the job shown above, the ${MYNAME} variable is set to a new value using the “Set Variables” step. Is
it possible to read the new value of ${MYNAME} in the “Display Msgbox Info” step?

Answer : YES

15. What is the benefit of a staging database?

Answer : all of the above

16. What step do you use for calculating numbers?

Answer : all of the above

17. What is the most representative concept of dimension table design?

Answer : Denormalization

18. Where is the best location to implement Change Data Capture mechanism?

Answer : Data source

19. What step do you use to read data from an Oracle Table?

Answer : Table Input

20. Given a transformation stream like the one above, the ${NAMA} variable is given a new value in the “Set
Variables” step. Is it possible to read the new value of ${NAMA} in the “Write to log” step?

Answer : No

21. In what type of multi-dimensional modeling table do you put measure columns?

Answer : Fact table

22. From the architecture shown above, what capabilities does PDI have?

Answer : All of the above

23. What file location or protocol does the Microsoft Excel Input step not support?

Answer : Hadoop File System

24. In the Row Normaliser step, what output field do you specify for a group of header columns?

Answer : Type field

25. What schemas do you use in Multidimensional Modelling?

Answer : star and snowflake

26. From the star schema shown above, what table does not come from the source system?

Answer : dim_waktu

27. If I need to change the data format of an existing data column, which step should I use?

Answer : Select Values

28. You want to schedule a process through a job. Which step do you use?

Answer : Start

29. What file do you create when you decide to use Windows Task Scheduler to execute a transformation?

Answer : .bat

30. If you want the file as an email attachment, what option do you need to specify in Microsoft Excel Input?

Answer : Add filename to result

31. Do we treat surrogate and business key as the same key in data warehouse?

Answer : a or b

32. What makes a staging database differ from a data warehouse?

Answer : Reference and helper table

33. What does Punch Through mean in the Dimension Lookup / Update step?

Answer : Updating all related rows

34. What is the benefit of using type 2 SCD?

Answer : Accurately keep all historical information

35. In what type of multi-dimensional modelling table do you put measure columns?

Answer : Fact Tables

36. What schemas do you use in Multidimensional Modelling?

Answer : star and snowflake

37. What is the other alias for surrogate key?

Answer : Technical key

38. Do we treat surrogate and business key as the same key in data warehouse?

Answer : No, they should be in different columns

39. What is the disadvantage of multi-dimensional modeling?

Answer : Slow Query

40. What is the mechanism to generate surrogate key ?

Answer : Updating all related rows

41. What is the characteristic of type 1 SCD?

Answer : Updating the row with the same business key

42. What is the benefit of using type 1 SCD?

Answer : Easy to maintain


43. What is the benefit of using type 2 SCD?

Answer : Accurately keep all historical information
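
How type 2 keeps history can be sketched as "close the current version, then add a new one under a new surrogate key". The column names (version, date_from, date_to), the table dim_customer, and the function apply_type2 are assumptions made for this illustration, not the behaviour of a specific PDI step.

```python
from datetime import date

# Hypothetical dimension with one current row (date_to is None while current).
dim_customer = [
    {"customer_sk": 1, "customer_id": "C001", "city": "Jakarta",
     "version": 1, "date_from": date(2020, 1, 1), "date_to": None},
]


def apply_type2(dimension, business_key, changes, change_date):
    """Close the current version of the row and append a new version,
    so that every earlier state stays queryable."""
    current = next(
        (r for r in dimension
         if r["customer_id"] == business_key and r["date_to"] is None),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in changes.items()):
            return current  # nothing changed, keep the current version
        current["date_to"] = change_date  # close the old version

    new_row = dict(current) if current else {"customer_id": business_key}
    new_row.update(changes)
    # The new version gets its own surrogate key; the business key is reused.
    new_row["customer_sk"] = max((r["customer_sk"] for r in dimension), default=0) + 1
    new_row["version"] = current["version"] + 1 if current else 1
    new_row["date_from"] = change_date
    new_row["date_to"] = None
    dimension.append(new_row)
    return new_row


apply_type2(dim_customer, "C001", {"city": "Bandung"}, date(2021, 6, 1))
for row in dim_customer:
    print(row)
```

Both rows share the business key C001 but have different surrogate keys, which is also why the two keys are kept in separate columns.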

44. Which is the correct variable expression in PDI ?

Answer : ${VAR}
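
The role of the closing brace can be illustrated with Python's string.Template, which happens to use the same ${NAME} notation. This is an analogy only: the variable value and file name are hypothetical, and PDI's own variable substitution is a separate implementation.

```python
from string import Template

# Hypothetical variable set, similar in spirit to variables defined for a run.
variables = {"VAR": "/data/input"}

# A well-formed expression has both the opening "${" and the closing "}".
resolved = Template("${VAR}/sales.csv").substitute(variables)

# Without the closing brace there is no complete placeholder, so
# safe_substitute leaves the text exactly as it was written.
unresolved = Template("${VAR/sales.csv").safe_substitute(variables)

print(resolved)
print(unresolved)
```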

45. Given a transformation stream like the one above, the ${NAMA} variable is given a new value in the “Set Variables”
step. Is it possible to read the new value of ${NAMA} in the “Write to log” step?

Answer : No

46. Given the job shown above, the ${MYNAME} variable is set to a new value using the “Set Variables” step. Is it
possible to read the new value of ${MYNAME} in the “Display Msgbox Info” step?

Answer : YES

47. Given the transformation shown above, what does the dashed red line hop mean?

Answer : error handling of a row

48. What file do you create when you decide to use Windows Task Scheduler to execute a
transformation?

Answer : .bat

49. What command line option do you use to refer to a transformation or a job file?

Answer : --file

50. You want to schedule a process through a job. Which step do you use?

Answer : Start

51. Pentaho Data Integration is a ... tool.

Answer : ETL

52. Which is not a job’s evaluation condition?

Answer : Follow when result is error

53. If you want to transform column headers into row values, what step do you use?

Answer : Row Normaliser
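
Turning column headers into row values can be sketched as an unpivot. The field names (product, month, sales) and the function normalise are hypothetical; the "type field" here is the output column that receives the former header name.

```python
# Hypothetical wide row: one column per month.
wide_rows = [{"product": "Widget", "jan": 10, "feb": 12, "mar": 9}]
header_columns = ["jan", "feb", "mar"]


def normalise(rows, key_field, columns, type_field="month", value_field="sales"):
    """Emit one output row per header column: the header name goes into
    type_field and the cell value goes into value_field."""
    for row in rows:
        for col in columns:
            yield {key_field: row[key_field], type_field: col, value_field: row[col]}


long_rows = list(normalise(wide_rows, "product", header_columns))
for row in long_rows:
    print(row)
```

One wide input row with three month columns becomes three narrow output rows.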

54. What does ETL stand for ?

Answer : Extract, Transform and Load

55. Which statement is true?

Answer : All the steps in a transformation are executed in parallel

56. Changing data type from string to number is part of

Answer : Transform process

57. The following task is not part of transform process

Answer : Read text file

58. Writing data into Microsoft Excel Spreadsheet is part of

Answer : Load process

59. There are four fields come into Select Value step, they are A,B

Answer : Fill Field to remove column in remove tab with C and D

60. The following task is not part of transform process

Answer : Read text file


61. Reading data from database is part of

Answer : Extract process

62. The task that is not done by transformation

Answer : Flow control

63. One of the columns that comes into the “Filter rows” step is named “TF”, and its content is TRUE or
FALSE. If we configure the filter as TF = Y, which one of the following configurations will give the same
result?

Answer : TF TRUE

64. The keyboard shortcut to open a new transformation

Answer : Ctrl + N

65. What does ETL stand for ?

Answer : Extract, Transform and Load

66. Which statement is true?

Answer : Job can execute transformation

67. Which statement is true?

Answer : Job can execute transformation

68. What is Pan?

Answer : Pan is a command line tool that allows you to execute a transformation

69. Which statement is true?

Answer : All the steps in a transformation are executed in parallel
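
Parallel execution of steps can be pictured as one thread per step, with rows streaming between them through small buffers. The sketch below is a simplified model under that assumption; the step functions and queue sizes are hypothetical and do not describe PDI's actual engine.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def generate(out_q):
    """First step: produce rows and pass them down the hop."""
    for n in range(5):
        out_q.put({"n": n})
    out_q.put(SENTINEL)


def double(in_q, out_q):
    """Middle step: process each row as soon as it arrives."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put({"n": row["n"], "doubled": row["n"] * 2})


def collect(in_q, results):
    """Last step: gather the finished rows."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        results.append(row)


# Hops are modelled as small bounded queues between steps.
hop1, hop2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
results = []
threads = [
    threading.Thread(target=generate, args=(hop1,)),
    threading.Thread(target=double, args=(hop1, hop2)),
    threading.Thread(target=collect, args=(hop2, results)),
]
for t in threads:
    t.start()  # all steps start together; none waits for another to finish
for t in threads:
    t.join()

print(results)
```

Because every step is already running when the first row arrives, no step in this model can rely on another step having completed first, which is the opposite of a job's sequential flow.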

70. What is Spoon ?

Answer : Spoon is a graphical user interface (GUI) that allows you to design, run, and test
transformations and jobs.

71. What is the most representative concept of dimension table design?

Answer : Denormalization

72. What makes data warehouse significantly different from its data sources?

Answer : Data Modelling

73. What command line option do you use to refer to a transformation or a job file?

Answer : --file

78. Which step is not part of a PDI transformation?

Answer : Start

79. What is the file extension for PDI’s Job?

Answer : kjb

80. What is the required step for a job?

Answer : Start

81. What databases can be accessed by PDI?

Answer : all of the above

82. What is the file extension for PDI’s transformation?

Answer : ktr

83. What is the prerequisite application for PDI?

Answer : Java

84. The following concept is not available for a job

Answer : row

85. What file location or protocol does the Microsoft Excel Input step not support?

Answer : Hadoop File System (HDFS)


86. Which one is the wrong statement ?

Answer : Job can executes a transformation

87. What option do we use in the Table Output step to create or alter a table?

Answer : SQL

88. From the star schema shown above, what table does not come from the source system?

Answer : Dim Waktu

89. What is the alias name of the Pentaho Data Integration (PDI) project?

Answer : Kettle

90. How do you include a file as attachment in job’s Mail step?

Answer : Attach general file option

91. Which application is not part of PDI?

Answer : Ant

92. From the architecture shown above, what capabilities does PDI have?

Answer : Consume external data source
