Exp 1: Introduction to Pentaho
Pentaho is a company that offers Pentaho Business Analytics, a suite of open source Business Intelligence (BI) products that provide data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities. Pentaho was founded in 2004 by five co-founders and is headquartered in Orlando, FL, USA.
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for Extract, Transform and Load (ETL) processes. Although ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes, such as migrating data between databases and applications or cleansing data.
PDI is easy to use. Every process is created with a graphical tool in which you specify what to do, without writing code that spells out how to do it; because of this, PDI can be described as metadata oriented.
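To make "metadata oriented" concrete, here is a minimal Python sketch, not PDI's actual engine or file format, in which a transformation is described purely as data (a list of step definitions) and a small generic engine decides how to execute it. The step types `filter_rows`, `add_field` and `select_fields`, and the sample order records, are hypothetical names invented for this illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# The transformation is pure metadata: it says *what* to do, not *how*.
# These step types are hypothetical and exist only for this sketch.
TRANSFORMATION: List[Dict[str, Any]] = [
    {"type": "filter_rows", "field": "country", "equals": "US"},
    {"type": "add_field", "name": "total", "multiply": ["qty", "price"]},
    {"type": "select_fields", "fields": ["order_id", "total"]},
]


def filter_rows(rows: Iterable[Row], step: Dict[str, Any]) -> Iterable[Row]:
    # Keep only rows whose field equals the configured value.
    return (r for r in rows if r[step["field"]] == step["equals"])


def add_field(rows: Iterable[Row], step: Dict[str, Any]) -> Iterable[Row]:
    # Add a new field computed as the product of two existing fields.
    a, b = step["multiply"]
    return ({**r, step["name"]: r[a] * r[b]} for r in rows)


def select_fields(rows: Iterable[Row], step: Dict[str, Any]) -> Iterable[Row]:
    # Project each row onto the listed fields, in the listed order.
    return ({f: r[f] for f in step["fields"]} for r in rows)


# The generic engine only knows how to map a step type to an implementation.
STEP_TYPES: Dict[str, Callable[[Iterable[Row], Dict[str, Any]], Iterable[Row]]] = {
    "filter_rows": filter_rows,
    "add_field": add_field,
    "select_fields": select_fields,
}


def run(transformation: List[Dict[str, Any]], rows: Iterable[Row]) -> List[Row]:
    # Chain the steps lazily so rows stream from one step to the next.
    for step in transformation:
        rows = STEP_TYPES[step["type"]](rows, step)
    return list(rows)


orders = [
    {"order_id": 1, "country": "US", "qty": 2, "price": 5.0},
    {"order_id": 2, "country": "FR", "qty": 1, "price": 9.0},
    {"order_id": 3, "country": "US", "qty": 3, "price": 1.5},
]
print(run(TRANSFORMATION, orders))
# [{'order_id': 1, 'total': 10.0}, {'order_id': 3, 'total': 4.5}]
```

Changing what the job does means editing only the `TRANSFORMATION` list; the engine code stays the same. A graphical tool that saves step definitions rather than program code follows the same separation of "what" from "how".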
PDI can be used as a standalone application or as part of the larger Pentaho Suite. It is one of the most popular open source ETL tools available. PDI supports a vast array of input and output formats, including text files, spreadsheets, and both commercial and free database engines. Moreover, its transformation capabilities allow you to manipulate data with very few limitations.
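The Extract, Transform and Load sequence, reading a text file and writing to a database engine, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory CSV string stands in for a text file, SQLite (from the standard library) stands in for the target database, and the `payments` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a text file on disk; a real job would open() a path instead.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3, carol,7
"""


def extract(text: str) -> list[dict[str, str]]:
    # Extract: parse the delimited text into one dict per record.
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    # Transform: skip rows with no amount, trim and capitalise names,
    # and convert ids and amounts from text to numbers.
    for rec in records:
        if not rec["amount"].strip():
            continue
        yield int(rec["id"]), rec["name"].strip().title(), float(rec["amount"])


def load(rows, conn: sqlite3.Connection) -> None:
    # Load: write the cleaned rows into a (hypothetical) payments table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM payments ORDER BY id").fetchall())
# [(1, 'Alice', 10.5), (3, 'Carol', 7.0)]
```

In PDI the same flow would be assembled graphically from input, transformation and output steps rather than written as code; the sketch only shows what each of the three ETL phases is responsible for.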
The Pentaho BI Platform offers the following functionality:
The Pentaho User Console web interface, which enables easy management of reports and analysis views.
An ad hoc reporting interface, which offers a step-by-step wizard for designing simple reports. Output formats include PDF, RTF, HTML, and XLS.
A real-time analysis view interface that allows you to drill down into properly prepared data.
A sophisticated scheduling subsystem that lets users run reports at given intervals.
The ability to email a published report to other users.
Connectivity with Pentaho Metadata Editor and Pentaho Report Designer, allowing content created with these tools to be published directly to the BI Server.
Common Uses
Pentaho Data Integration is an extremely flexible tool that addresses a broad range of use cases, including:
Data warehouse population with built-in support for slowly changing dimensions and surrogate key creation
Data migration between different databases and applications
Loading huge data sets into databases, taking full advantage of cloud, clustered, and massively parallel processing environments
Data cleansing with steps ranging from very simple to very complex transformations
Data integration, including the ability to leverage real-time ETL as a data source for Pentaho Reporting
Rapid prototyping of ROLAP schemas
Hadoop functions: Hadoop job execution and scheduling, simple Hadoop MapReduce design, and Amazon EMR integration
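Slowly changing dimensions and surrogate keys, from the first use case above, are easier to grasp with a small example. The following Python sketch implements a Type 2 slowly changing dimension in memory: when a tracked attribute of a customer changes, the current version is closed and a new row with a fresh surrogate key is added. The table layout and field names (`customer_sk`, `valid_from`, `valid_to`, `is_current`) are assumptions chosen for illustration; PDI's built-in support is configured graphically, and its exact behavior and options may differ.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Dict, List, Optional


@dataclass
class DimRow:
    customer_sk: int          # surrogate key: generated by the warehouse
    customer_id: str          # natural (business) key from the source system
    city: str                 # attribute whose history is tracked (Type 2)
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    is_current: bool


class CustomerDimension:
    """Hypothetical in-memory customer dimension, for illustration only."""

    def __init__(self) -> None:
        self.rows: List[DimRow] = []
        self._next_sk = count(1)               # surrogate key generator
        self._current: Dict[str, DimRow] = {}  # natural key -> current version

    def upsert(self, customer_id: str, city: str, as_of: date) -> int:
        """Insert or version a customer; return the surrogate key now in effect."""
        cur = self._current.get(customer_id)
        if cur is not None and cur.city == city:
            return cur.customer_sk             # nothing changed: reuse the key
        if cur is not None:
            cur.valid_to = as_of               # close the old version
            cur.is_current = False
        new = DimRow(next(self._next_sk), customer_id, city, as_of, None, True)
        self.rows.append(new)
        self._current[customer_id] = new
        return new.customer_sk


dim = CustomerDimension()
dim.upsert("C42", "Orlando", date(2023, 1, 1))
dim.upsert("C42", "Orlando", date(2023, 6, 1))   # unchanged: no new row
dim.upsert("C42", "Tampa", date(2024, 3, 1))     # changed: new version
for row in dim.rows:
    print(row.customer_sk, row.city, row.valid_from, row.valid_to, row.is_current)
# 1 Orlando 2023-01-01 2024-03-01 False
# 2 Tampa 2024-03-01 None True
```

Fact rows loaded at a given time would store the surrogate key returned by `upsert`, so older facts keep pointing at the city that was valid when they occurred, while the natural key `C42` stays the same across versions.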