Module 2_Getting Started With Talend Data Integration

This document provides a comprehensive guide on getting started with Talend Open Studio for Big Data, including installation steps, project creation, and interface navigation. It outlines how to create a simple ETL Job using various components and includes hands-on exercises for practical application. Key features of Talend, such as component functionality, metadata management, and execution modes, are also highlighted.

Uploaded by

rizqi ardiansyah
Copyright: © All Rights Reserved

Module 2: Getting Started with Talend Data Integration

2.1. Installing Talend Open Studio for Big Data (TOS_BD)

Talend Open Studio for Big Data (TOS_BD) is a free, open-source ETL and Big Data tool designed
to simplify data integration across multiple platforms. In this training, we use TOS_BD v8.0.1.

System Requirements

 Operating System: Windows 10 or higher / Linux / macOS

 Java: Java 8 JDK (64-bit) is required

 RAM: Minimum 8 GB (16 GB recommended)

 Disk Space: At least 5 GB of free space
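Some of the requirements above can be checked from a small Java program before installing. The following is only a sketch: the class name PreflightCheck and its methods are hypothetical helpers, not part of Talend, the 64-bit test is a simple heuristic on the os.arch system property, and RAM is not checked.

```java
import java.io.File;

// Hypothetical helper (not part of Talend): compares the local machine
// against the Java and disk space requirements listed above.
class PreflightCheck {
    // 5 GB, the minimum free disk space stated in the requirements.
    static final long MIN_FREE_BYTES = 5L * 1024 * 1024 * 1024;

    // Heuristic: common 64-bit values of os.arch ("amd64", "x86_64", "aarch64")
    // contain "64"; some 64-bit platforms may not follow this pattern.
    static boolean is64Bit(String osArch) {
        return osArch != null && osArch.contains("64");
    }

    // True when the given number of free bytes meets the 5 GB minimum.
    static boolean hasEnoughDisk(long freeBytes) {
        return freeBytes >= MIN_FREE_BYTES;
    }

    public static void main(String[] args) {
        // Optional first argument: the folder where Talend will be extracted.
        File target = new File(args.length > 0 ? args[0] : ".");
        String javaVersion = System.getProperty("java.specification.version");
        String arch = System.getProperty("os.arch");

        System.out.println("Java specification version: " + javaVersion);
        System.out.println("64-bit JVM: " + is64Bit(arch));
        System.out.println("Enough free disk space: " + hasEnoughDisk(target.getUsableSpace()));
    }
}
```

Running it with the planned installation folder as the first argument reports the disk space for that drive rather than for the current directory.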

Installation Steps

1. Download TOS_BD from: https://www.talend.com/products/talend-open-studio/

2. Extract the downloaded zip file to your desired folder.

3. Navigate to the Talend folder and run:

o On Windows: TOS_BD-win-x86_64.exe

o On Linux: TOS_BD-linux-gtk-x86_64

o On macOS: TOS_BD-macosx-cocoa.app

4. On first launch, select a workspace directory (this is where your projects will be stored).

5. Accept the license agreement and proceed.


2.2. Creating a New Project

Projects in Talend help you organize Jobs, metadata, and resources.

Steps to Create a Project:

1. Launch Talend Open Studio.

2. On the login screen, click Create a new project.

3. Enter a meaningful name (e.g., Talend_DI_Training).

4. Click Create, then Finish to open the Studio with this project.

2.3. Navigating the Talend Studio Interface

Talend Studio is composed of several panels that help you design, configure, and monitor your
ETL Jobs:

A. Repository

 Located on the left.

 Stores metadata (databases, files, schemas), Jobs, routines, contexts, etc.

 Helps you reuse definitions across Jobs.

B. Designer (Design Workspace)

 The center panel where you design your Job by dragging and connecting components.

 Supports zoom, grid alignment, and annotations.

C. Palette

 Located on the right.

 A categorized toolbox of components (File, Databases, Processing, Big Data, etc.).

 You drag components from here to the Designer.

D. Configuration Tabs (Bottom Panel)

 Component: Configure properties of selected component.

 Run: Execute and monitor Job execution.

 Code: See generated Java code.

 Outline: View hierarchical structure of your Job.

2.4. Creating Your First Job

A Job in Talend is a visual workflow composed of interconnected components that represent
data operations.

Steps to Create a Simple Job:

1. Right-click on Job Designs in Repository → Create Job.

2. Fill in the required fields:

o Name: firstJob

o Purpose and Description: Optional

3. Click Finish.

4. Drag a tFixedFlowInput and tLogRow from the Palette to the Designer.

5. Connect them using a Row → Main link.

6. Double-click tFixedFlowInput to configure a simple schema and sample data.

7. Double-click tLogRow and set Mode to Table.

8. Open the Run tab and click Run.

✔️ You've created your first Talend Job, which outputs hardcoded data to the console.

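Conceptually, this Job is a source that emits a fixed set of rows and a sink that prints each row. The sketch below illustrates that flow in plain Java. It is not the code Talend generates: the class FirstJobSketch, the Person schema, and the sample values are made up for illustration, and the "|" separator only approximates a plain, non-table log line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-written stand-in for the two-component Job.
class FirstJobSketch {
    // One record travelling along the Row > Main link (hypothetical schema).
    static final class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Plays the role of tFixedFlowInput: emits a fixed, hardcoded set of rows.
    static List<Person> fixedFlowInput() {
        return Arrays.asList(new Person("Alice", 28), new Person("Bob", 35));
    }

    // Plays the role of tLogRow: turns each incoming row into one line of text.
    static List<String> logRow(List<Person> rows) {
        List<String> lines = new ArrayList<>();
        for (Person p : rows) {
            lines.add(p.name + "|" + p.age);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : logRow(fixedFlowInput())) {
            System.out.println(line);
        }
    }
}
```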
2.5. Overview: Components, Metadata, and Execution

A. Components

 Building blocks of Jobs (prefixed with t)

 Each component performs a specific task: read, transform, write, filter, join, etc.

 Examples:

o tFileInputDelimited, tMap, tFilterRow, tPostgresqlOutput, tHiveInput

B. Metadata

 Definitions for databases, files, schemas stored in the Repository

 Benefits:

o Reusability

o Centralized updates

o Faster development

C. Execution

 Jobs are compiled into Java code behind the scenes.

 You can run Jobs directly in Studio or build them into a standalone package (JAR files plus .bat/.sh launch scripts) that runs outside Studio.

 Execution Modes:

o Normal: Run with GUI logs and output

o Debug: Step-by-step execution for troubleshooting
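A Job that has been built for use outside Studio is normally started through a launch script, with context values passed on the command line. The sketch below only assembles such a command. It assumes the usual naming of the launch scripts (jobName_run.sh and jobName_run.bat) and the --context_param key=value option; the class BuiltJobCommand, the folder path, and the parameter name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the command line for a Job built outside Studio.
class BuiltJobCommand {
    // jobDir: folder containing the launch scripts (assumed layout).
    // contextParams: context variable overrides, passed as --context_param key=value.
    static List<String> command(String jobDir, String jobName, boolean windows,
                                Map<String, String> contextParams) {
        List<String> cmd = new ArrayList<>();
        if (windows) {
            // Batch files are run through the Windows command interpreter.
            cmd.add("cmd");
            cmd.add("/c");
            cmd.add(jobDir + "/" + jobName + "_run.bat");
        } else {
            cmd.add("sh");
            cmd.add(jobDir + "/" + jobName + "_run.sh");
        }
        for (Map.Entry<String, String> e : contextParams.entrySet()) {
            cmd.add("--context_param");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }
}
```

Passing the resulting list to new ProcessBuilder(cmd).inheritIO().start() would launch the Job and show its console output, provided the script exists at that location.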


✅ Key Takeaways

 Talend Open Studio is a powerful visual ETL tool.

 A Project contains all your Jobs and metadata.

 Jobs are created visually using components connected in a flow.

 The Talend interface is intuitive and modular.

 Talend compiles Jobs into Java code and can run them in various modes.

🧪 Hands-on Exercise: Your First Talend Job

🔧 Prerequisites

 Talend Open Studio for Big Data (TOS_BD v8.0.1) is already installed

 Java JDK 8 is installed

 You have created a new project in Talend

🎯 Exercise 1: Creating a Simple Job to Display Data in Console

Objective: Create a Talend Job that prints sample data to the console.

✅ Step-by-Step Instructions

Step 1: Create a New Job

1. Open Talend Studio and make sure you're in the correct project.

2. In the Repository pane (left side), right-click on Job Designs → select Create Job.

3. Fill in the Job details:

o Name: firstJob

o Purpose: Introductory Job

o Description: This job displays hardcoded data using tFixedFlowInput

4. Click Finish.

Step 2: Add Components to the Design Workspace

1. Open the Palette panel (right side).

2. Drag the following components into the Designer:

o tFixedFlowInput (from the Misc category)

o tLogRow (from the Logs & Errors category)

3. Connect the components:

o Right-click tFixedFlowInput → Row → Main → click on tLogRow.

Step 3: Configure tFixedFlowInput

1. Double-click tFixedFlowInput to open its Component tab.

2. Click Edit schema:

o Click + to add two columns:

 name as String

 age as Integer

o Click OK

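3. Under Mode, select Use Inline Table, click + to add three rows, and enter the sample data (String values need double quotes):

"Alice", 28

"Bob", 35

"Charlie", 22

4. Alternatively, select Use Inline Content (delimited file) and type one record per line, separating the fields with the configured Field Separator (for example Alice;28 when the separator is ";").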
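The schema gives each column a name and a type, and inline content is plain delimited text that has to be split and converted to match those types. The sketch below shows that idea in plain Java. It is an illustration under assumptions, not Talend's implementation: the class InlineContentSketch and its Row type are hypothetical, and ";" is assumed as the field separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: parses delimited inline content into typed rows.
class InlineContentSketch {
    // Mirrors the two-column schema: name as String, age as Integer.
    static final class Row {
        final String name;
        final Integer age;

        Row(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Splits the content into lines, then each line into fields, and converts
    // the second field to Integer. Blank lines are skipped.
    static List<Row> parse(String content, String fieldSeparator) {
        List<Row> rows = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // Pattern.quote treats the separator literally, not as a regex.
            String[] fields = line.split(Pattern.quote(fieldSeparator), -1);
            rows.add(new Row(fields[0].trim(), Integer.valueOf(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row r : parse("Alice;28\nBob;35\nCharlie;22", ";")) {
            System.out.println(r.name + " is " + r.age);
        }
    }
}
```

A value that does not match the declared type, such as text in the age column, makes Integer.valueOf throw a NumberFormatException in this sketch.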

Step 4: Configure tLogRow


1. Double-click tLogRow.

2. In the Basic settings, set:

o Mode: Table (for better formatted output)

Step 5: Run the Job

1. Click the Run tab at the bottom.

2. Press the Run button.

3. Observe the console output:

+----------+-------+
| name     | age   |
+----------+-------+
| Alice    | 28    |
| Bob      | 35    |
| Charlie  | 22    |
+----------+-------+

🎉 Congratulations! You've created and executed your first Talend Job.
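Table mode amounts to measuring the widest value in each column and padding every cell to that width. The sketch below shows one way to do this in plain Java. It follows the simplified table layout used in this handout rather than the exact characters tLogRow prints, the class TableModeSketch is hypothetical, and column widths fit the longest value, so spacing may differ slightly from the listing above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: renders rows as a bordered text table.
class TableModeSketch {
    static String render(String[] header, List<String[]> rows) {
        // Each column is as wide as its longest cell, header included.
        int[] widths = new int[header.length];
        for (int i = 0; i < header.length; i++) {
            widths[i] = header[i].length();
        }
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }

        // Border line: "+" followed by dashes covering the cell and its padding.
        StringBuilder border = new StringBuilder("+");
        for (int w : widths) {
            for (int k = 0; k < w + 2; k++) {
                border.append('-');
            }
            border.append('+');
        }

        StringBuilder out = new StringBuilder();
        out.append(border).append('\n');
        out.append(line(header, widths)).append('\n');
        out.append(border).append('\n');
        for (String[] row : rows) {
            out.append(line(row, widths)).append('\n');
        }
        out.append(border);
        return out.toString();
    }

    // Formats one row, left-aligning each cell and padding it with spaces.
    static String line(String[] cells, int[] widths) {
        StringBuilder sb = new StringBuilder("|");
        for (int i = 0; i < cells.length; i++) {
            sb.append(' ').append(cells[i]);
            for (int k = cells[i].length(); k < widths[i]; k++) {
                sb.append(' ');
            }
            sb.append(" |");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Alice", "28"},
                new String[] {"Bob", "35"},
                new String[] {"Charlie", "22"});
        System.out.println(render(new String[] {"name", "age"}, rows));
    }
}
```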

🧠 Challenge Exercise: Add Filtering Logic

Now extend your Job to display only people aged 30 or above.

✅ Steps:

1. Drag a tFilterRow component from the Processing category into the Designer.

2. Insert it between tFixedFlowInput and tLogRow:

o Disconnect existing Main link.

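o Connect:
tFixedFlowInput → Row → Main → tFilterRow
tFilterRow → Row → Filter → tLogRow

3. Configure tFilterRow:

o In Basic settings, click + under the Conditions table to add a row.

o Set Input column to age, Operator to >= (Greater or equal to), and Value to 30, so that the condition reads age >= 30.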

4. Run the Job again.


Expected Output:

+---------+-------+
| name    | age   |
+---------+-------+
| Bob     | 35    |
+---------+-------+
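The filter step can be pictured as a predicate applied to every row: rows that satisfy the condition continue along the Filter link, and the rest are set aside (tFilterRow can also expose them on a separate Reject link). The sketch below expresses this in plain Java; the class FilterRowSketch and its nested types are hypothetical and this is not Talend's generated code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: splits rows into "filter" and "reject" by a condition.
class FilterRowSketch {
    static final class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Holds the two outputs of the filtering step.
    static final class Result {
        final List<Person> filter = new ArrayList<>();
        final List<Person> reject = new ArrayList<>();
    }

    // Rows matching the condition go to "filter", all others to "reject".
    static Result filterRow(List<Person> rows, Predicate<Person> condition) {
        Result result = new Result();
        for (Person p : rows) {
            if (condition.test(p)) {
                result.filter.add(p);
            } else {
                result.reject.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
                new Person("Alice", 28), new Person("Bob", 35), new Person("Charlie", 22));
        // Same condition as in the exercise: age >= 30.
        Result result = filterRow(rows, p -> p.age >= 30);
        for (Person p : result.filter) {
            System.out.println(p.name + "|" + p.age);
        }
    }
}
```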

📁 Optional Extension: Add File Output

Replace tLogRow with tFileOutputDelimited to write filtered results to a file.

✅ Steps:

1. Delete tLogRow.

2. Drag tFileOutputDelimited from the File → Output category into the Designer.

3. Connect tFilterRow to tFileOutputDelimited with Row → Filter.

4. Configure tFileOutputDelimited:

o File name: "/path/to/output.csv" (e.g., "C:/Talend/output.csv")

o Field Separator: ";" or "," (entered with the double quotes)

o Include Header: Checked

5. Run the Job and verify the output file.
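The three settings above (file name, field separator, header) map onto a very small amount of logic. The sketch below writes a delimited file in plain Java under stated assumptions: the class DelimitedOutputSketch is hypothetical, the file is written as UTF-8, and values are not quoted or escaped, so a value containing the separator would break the format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: writes rows to a delimited text file.
class DelimitedOutputSketch {
    // Builds the file content: an optional header line, then one line per row.
    static List<String> toLines(String[] header, List<String[]> rows,
                                String fieldSeparator, boolean includeHeader) {
        List<String> lines = new ArrayList<>();
        if (includeHeader) {
            lines.add(String.join(fieldSeparator, header));
        }
        for (String[] row : rows) {
            lines.add(String.join(fieldSeparator, row));
        }
        return lines;
    }

    // Writes the lines to the given file, replacing any existing content.
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Collections.singletonList(new String[] {"Bob", "35"});
        // A temporary file is used here instead of a fixed output path.
        Path file = Files.createTempFile("output", ".csv");
        write(file, toLines(new String[] {"name", "age"}, rows, ";", true));
        System.out.println("Wrote " + file);
    }
}
```

Opening the resulting file in a text editor is a quick way to confirm the header line and the separator.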


📝 Summary of Components Used

Component              Purpose
tFixedFlowInput        Creates inline mock data
tLogRow                Prints data to the console
tFilterRow             Filters rows based on a condition
tFileOutputDelimited   Writes data to a CSV or delimited file
