
Exp 1 Introduction To Pentaho Pentaho Is A Company That Offers Pentaho Business Analytics, A Suite of Open Source Business Intelligence (BI)

The document provides an introduction to Pentaho, an open source business intelligence suite. It describes Pentaho Data Integration, the ETL component of Pentaho. It also outlines common uses of Pentaho Data Integration such as data warehousing, data migration, data cleansing, and integration with Hadoop. Key features of Pentaho Data Integration are also listed.

Uploaded by

raghav0206
Copyright
© All Rights Reserved

Exp 1 Introduction to Pentaho

Pentaho is a company that offers Pentaho Business Analytics, a suite of open source Business Intelligence (BI)
products that provide data integration, OLAP services, reporting, dashboarding, data mining, and ETL
capabilities. Pentaho was founded in 2004 by five founders and is headquartered in Orlando, FL, USA.
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract,
Transform and Load (ETL) processes. Although ETL tools are most frequently used in data warehouse
environments, PDI can also be used for other purposes:

Migrating data between applications or databases
Exporting data from databases to flat files
Loading data massively into databases
Data cleansing
Integrating applications

PDI is designed to be easy to use. Every process is created with a graphical tool in which you specify what to do
without writing code to indicate how to do it; for this reason, PDI is described as metadata oriented.
PDI can be used as a standalone application or as part of the larger Pentaho Suite. It is one of the most widely
used open source ETL tools. PDI supports a wide range of input and output formats, including text
files, spreadsheets, and commercial and free database engines. Moreover, the transformation capabilities of PDI let
you manipulate data with very few limitations.
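The idea of a metadata-oriented tool can be illustrated with a minimal Python sketch. This is not how PDI is implemented, and none of the names below (`read_csv`, `trim_fields`, `filter_rows`, `run_pipeline`) are PDI step names; they are hypothetical. The point is only that the process is described as data (what to do), while a small generic engine decides how to run it.

```python
import csv
import io

# Hypothetical step implementations (illustrative names, not PDI steps).
def read_csv(rows, text):
    """Ignore incoming rows and produce one dict per CSV record."""
    return list(csv.DictReader(io.StringIO(text)))

def trim_fields(rows, fields):
    """Strip leading and trailing whitespace from the listed fields."""
    return [{**row, **{f: row[f].strip() for f in fields}} for row in rows]

def filter_rows(rows, field, equals):
    """Keep only the rows whose field matches the given value."""
    return [row for row in rows if row[field] == equals]

STEP_TYPES = {
    "read_csv": read_csv,
    "trim_fields": trim_fields,
    "filter_rows": filter_rows,
}

def run_pipeline(steps):
    """Generic engine: look up each step by type and pass it its settings."""
    rows = []
    for step in steps:
        params = {key: value for key, value in step.items() if key != "type"}
        rows = STEP_TYPES[step["type"]](rows, **params)
    return rows

SAMPLE = "name,country\n Ana ,AR\nBob,US\n Carla,AR\n"

# The process itself is only metadata: a list of step descriptions.
PIPELINE = [
    {"type": "read_csv", "text": SAMPLE},
    {"type": "trim_fields", "fields": ["name"]},
    {"type": "filter_rows", "field": "country", "equals": "AR"},
]

result = run_pipeline(PIPELINE)
print(result)
```

Changing the process means editing the `PIPELINE` description rather than the engine, which is roughly the role a graphical designer plays in a metadata-oriented tool.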
The BI Platform offers the following functionality:

The Pentaho User Console Web interface, which enables easy management of reports and
analysis views.
An ad hoc reporting interface, which offers a step-by-step wizard for designing simple reports. Output
formats include PDF, RTF, HTML, and XLS.
A real-time analysis view interface that allows you to drill down into properly prepared data.
A scheduling subsystem that enables users to set reports to execute at given intervals.
The ability to email a published report to other users.
Connectivity with Pentaho Metadata Editor and Pentaho Report Designer, allowing content created with
these tools to be published directly to the BI Server.

Common Uses
Pentaho Data Integration is a flexible tool that addresses a broad range of use cases, including:

Data warehouse population with built-in support for slowly changing dimensions and surrogate key
creation
Data migration between different databases and applications
Loading huge data sets into databases taking full advantage of cloud, clustered, and massively parallel
processing environments
Data Cleansing with steps ranging from very simple to very complex transformations
Data Integration including the ability to leverage real-time ETL as a data source for Pentaho Reporting
Rapid prototyping of ROLAP schemas
Hadoop functions: Hadoop job execution and scheduling, simple Hadoop MapReduce design, Amazon
EMR integration
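The first use case above mentions slowly changing dimensions and surrogate keys. The following Python sketch shows the general idea of a Type 2 slowly changing dimension, under the assumption that the dimension is a simple in-memory list of dicts. The function `apply_scd2` and the field names are hypothetical and only illustrate the concept; they do not reproduce PDI's built-in steps.

```python
from datetime import date

def apply_scd2(dimension, natural_key, attributes, today):
    """Apply one incoming record to a Type 2 dimension (a list of dicts).

    Returns the surrogate key of the version that is current afterwards.
    """
    # Find the currently valid version for this business (natural) key.
    current = next(
        (row for row in dimension
         if row["natural_key"] == natural_key and row["valid_to"] is None),
        None,
    )
    if current is not None and current["attributes"] == attributes:
        return current["surrogate_key"]  # Unchanged: reuse the existing key.
    if current is not None:
        current["valid_to"] = today  # Changed: close the old version.
    # Surrogate keys are meaningless integers generated by the warehouse.
    new_key = max((row["surrogate_key"] for row in dimension), default=0) + 1
    dimension.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        "attributes": attributes,
        "valid_from": today,
        "valid_to": None,
    })
    return new_key

customers = []
k1 = apply_scd2(customers, "C001", {"city": "Orlando"}, date(2020, 1, 1))
k2 = apply_scd2(customers, "C001", {"city": "Orlando"}, date(2020, 6, 1))
k3 = apply_scd2(customers, "C001", {"city": "Miami"}, date(2021, 1, 1))
print(k1, k2, k3, len(customers))
```

Keeping the old row with a closing date, instead of overwriting it, is what lets historical facts keep pointing at the attribute values that were valid when they were recorded.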

Pentaho Data Integration features and benefits include:

Installs in minutes; you can be productive in one afternoon
100% Java with cross platform support for Windows, Linux, and Macintosh
Easy to use graphical designer with over 100 out-of-the-box mapping objects including inputs, transforms,
and outputs
Simple plug-in architecture for adding your own custom extensions
Enterprise Data Integration server providing security integration, scheduling, and robust content
management including full revision history for jobs and transformations
Integrated designer (Spoon) combining ETL with metadata modeling and data visualization, providing the
perfect environment for rapidly developing new Business Intelligence solutions
Streaming engine architecture provides the ability to work with extremely large data volumes
Enterprise-class performance and scalability with a broad range of deployment options including dedicated,
clustered, and/or cloud-based ETL servers
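The streaming idea mentioned in the list above can be sketched with Python generators. This is only an analogy, not PDI's engine, and the step names (`read_rows`, `add_tax`, `only_large`) are hypothetical. Each step hands rows to the next one at a time, so the whole data set never has to be held in memory.

```python
def read_rows(count):
    """Source step: produce rows lazily instead of building a full list."""
    for i in range(count):
        yield {"id": i, "amount": i * 10}

def add_tax(rows, rate_percent):
    """Transform step: add a total using integer arithmetic."""
    for row in rows:
        yield {**row, "total": row["amount"] * (100 + rate_percent) // 100}

def only_large(rows, minimum):
    """Filter step: pass through rows whose total reaches the minimum."""
    for row in rows:
        if row["total"] >= minimum:
            yield row

def count_rows(rows):
    """Sink step: consume the stream and count what arrives."""
    return sum(1 for _ in rows)

# Nothing is computed until the sink starts pulling rows through the chain.
stream = only_large(add_tax(read_rows(100_000), rate_percent=10), minimum=110)
kept = count_rows(stream)
print(kept)
```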
To download and install Pentaho Data Integration 4.0.1 and open the PDI interface:
Step 1: Install the Java Runtime Environment (version 1.4 or higher) on your system.
Step 2: Go to the http://www.pentaho.com site and download Pentaho Data Integration 4.0.1.
Step 3: Unzip the downloaded PDI zip file. Open the data-integration folder and double-click the
spoon.bat file to open the PDI IDE (on Linux or Macintosh, run spoon.sh instead).
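Before launching, the steps above can be checked with a short Python sketch. It assumes the archive was unzipped into a folder named data-integration in the current directory and that the launcher for non-Windows systems is spoon.sh; the helper names `launcher_name` and `check_install` are hypothetical. The script only reports what it finds and does not start PDI.

```python
import platform
import shutil
from pathlib import Path

def launcher_name(system):
    """Pick the launcher script for the operating system name."""
    return "spoon.bat" if system == "Windows" else "spoon.sh"

def check_install(pdi_dir, system=None):
    """Report whether Java is on the PATH and the launcher script exists."""
    system = system or platform.system()
    launcher = Path(pdi_dir) / launcher_name(system)
    return {
        "java_on_path": shutil.which("java") is not None,
        "launcher": str(launcher),
        "launcher_exists": launcher.is_file(),
    }

# Assumed location: the unzipped folder sits in the current directory.
report = check_install("data-integration")
for key, value in report.items():
    print(f"{key}: {value}")
```

If `java_on_path` is false, revisit Step 1; if `launcher_exists` is false, check where the zip file was extracted in Step 3.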
