Module 2: Getting Started with Talend Data Integration
2.1. Installing Talend Open Studio for Big Data (TOS_BD)
Talend Open Studio for Big Data (TOS_BD) is a free, open-source ETL and Big Data tool designed
to simplify data integration across multiple platforms. In this training, we use TOS_BD v8.0.1.
System Requirements
Operating System: Windows 10 or higher / Linux / macOS
Java: Java 8 JDK (64-bit) is required
RAM: Minimum 8 GB (16 GB recommended)
Disk Space: At least 5 GB of free space
Installation Steps
1. Download TOS_BD from: https://www.talend.com/products/talend-open-studio/
2. Extract the downloaded zip file to your desired folder.
3. Navigate to the Talend folder and run:
o On Windows: TOS_BD-win-x86_64.exe
o On Linux: TOS_BD-linux-gtk-x86_64
o On macOS: TOS_BD-macosx-cocoa.app
4. On first launch, select a workspace directory (this is where your projects will be stored).
5. Accept the license agreement and proceed.
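Before launching the Studio, it can help to confirm that the installed Java matches the requirements listed above. The following is a minimal sketch, not part of Talend itself: the class name JavaCheck and its helper methods are hypothetical, and the check simply compares the standard java.version and os.arch system properties against the "Java 8, 64-bit" requirement stated in this module.

```java
// Prints the Java version and JVM architecture so they can be compared
// against the requirements listed in this module (Java 8 JDK, 64-bit).
// JavaCheck is a hypothetical helper, not a Talend utility.
class JavaCheck {
    // Returns true when the version string denotes Java 8 ("1.8.x").
    static boolean isJava8(String version) {
        return version.startsWith("1.8");
    }

    // Returns true when the architecture name denotes a 64-bit JVM
    // (for example "amd64", "x86_64", or "aarch64").
    static boolean is64Bit(String arch) {
        return arch.contains("64");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String arch = System.getProperty("os.arch");
        System.out.println("java.version = " + version + " (Java 8: " + isJava8(version) + ")");
        System.out.println("os.arch      = " + arch + " (64-bit: " + is64Bit(arch) + ")");
    }
}
```

Save it as JavaCheck.java, compile with javac JavaCheck.java, and run with java JavaCheck. If either check prints false, review the Java installation before starting the Studio.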
2.2. Creating a New Project
Projects in Talend help you organize Jobs, metadata, and resources.
Steps to Create a Project:
1. Launch Talend Open Studio.
2. On the login screen, click Create a new project.
3. Enter a meaningful name (e.g., Talend_DI_Training).
4. Click Create, then Finish to open the Studio with this project.
2.3. Navigating the Talend Studio Interface
Talend Studio is composed of several panels that help you design, configure, and monitor your
ETL Jobs:
A. Repository
Located on the left.
Stores metadata (databases, files, schemas), Jobs, routines, contexts, etc.
Helps you reuse definitions across Jobs.
B. Designer (Design Workspace)
The center panel where you design your Job by dragging and connecting components.
Supports zoom, grid alignment, and annotations.
C. Palette
Located on the right.
A categorized toolbox of components (File, Databases, Processing, Logs & Errors, Big Data, etc.).
You drag components from here to the Designer.
D. Configuration Tabs (Bottom Panel)
Component: Configure the properties of the selected component.
Run: Execute the Job and monitor its execution.
Code: View the generated Java code.
Outline: View the hierarchical structure of your Job.
2.4. Creating Your First Job
A Job in Talend is a visual workflow composed of interconnected components that represent
data operations.
Steps to Create a Simple Job:
1. Right-click on Job Designs in Repository → Create Job.
2. Fill in the required fields:
o Name: firstJob
o Purpose and Description: Optional
3. Click Finish.
4. Drag a tFixedFlowInput and tLogRow from the Palette to the Designer.
5. Connect them using a Row → Main link.
6. Double-click tFixedFlowInput to configure a simple schema and sample data.
7. Double-click tLogRow and set Mode to Table.
8. Open the Run tab and click Run.
✔️ You’ve created your first Talend Job that outputs hardcoded data to the console.
2.5. Overview: Components, Metadata, and Execution
A. Components
Building blocks of Jobs (prefixed with t)
Each component performs a specific task: read, transform, write, filter, join, etc.
Examples:
o tFileInputDelimited, tMap, tFilterRow, tPostgresqlOutput, tHiveInput
B. Metadata
Definitions for databases, files, schemas stored in the Repository
Benefits:
o Reusability
o Centralized updates
o Faster development
C. Execution
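Studio generates Java code from each Job design and compiles it behind the scenes.
You can run Jobs directly in Studio or build them into a standalone package (an archive containing the compiled Job and launcher scripts) that runs without Studio.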
Execution Modes:
o Normal: Run with GUI logs and output
o Debug: Step-by-step execution for troubleshooting
✅ Key Takeaways
Talend Open Studio is a visual ETL tool: you design data flows graphically instead of writing the code by hand.
A Project contains all your Jobs and metadata.
Jobs are created visually using components connected in a flow.
The Studio interface is divided into the Repository, the Designer, the Palette, and the configuration tabs.
Talend generates Java code from Jobs and can run them in Normal or Debug mode.
🧪 Hands-on Exercise: Your First Talend Job
🔧 Prerequisites
Talend Open Studio for Big Data (TOS_BD v8.0.1) is already installed
Java JDK 8 is installed
You have created a new project in Talend
🎯 Exercise 1: Creating a Simple Job to Display Data in Console
Objective: Create a Talend Job that prints sample data to the console.
✅ Step-by-Step Instructions
Step 1: Create a New Job
1. Open Talend Studio and make sure you're in the correct project.
2. In the Repository pane (left side), right-click on Job Designs → select Create Job.
3. Fill in the Job details:
o Name: firstJob
o Purpose: Introductory Job
o Description: This job displays hardcoded data using tFixedFlowInput
4. Click Finish.
Step 2: Add Components to the Design Workspace
1. Open the Palette panel (right side).
2. Drag the following components into the Designer:
o tFixedFlowInput (from the Misc category)
o tLogRow (from the Logs & Errors category)
3. Connect the components:
o Right-click tFixedFlowInput → Row → Main → click on tLogRow.
Step 3: Configure tFixedFlowInput
1. Double-click tFixedFlowInput to open its Component tab.
2. Click Edit schema:
o Click + to add two columns:
name as String
age as Integer
o Click OK
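3. Under Mode, select Use Inline Content (delimited file).
4. Keep Row Separator as "\n" and Field Separator as ";", then enter the sample data in the Content box, one row per line (do not quote the strings, because quotes would become part of the value):
Alice;28
Bob;35
Charlie;22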
Step 4: Configure tLogRow
1. Double-click tLogRow.
2. In the Basic settings, set:
o Mode: Table (for better formatted output)
Step 5: Run the Job
1. Click the Run tab at the bottom.
2. Press the Run button.
3. Observe the console output:
+----------+-----+
| name     | age |
+----------+-----+
| Alice    | 28  |
| Bob      | 35  |
| Charlie  | 22  |
+----------+-----+
🎉 Congratulations! You've created and executed your first Talend Job.
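To make the data flow in this exercise concrete, here is a minimal plain-Java sketch of what the two components do conceptually. It is not the code Talend generates: the class FirstJobSketch, the Person row type, and the methods fixedFlowInput and logRowTable are hypothetical names chosen for illustration, and the table layout only approximates the console output shown in this exercise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for the Job tFixedFlowInput -> tLogRow.
// All names here are hypothetical; this is not Talend-generated code.
class FirstJobSketch {
    // One row of the two-column schema: name (String), age (Integer).
    static final class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Plays the role of tFixedFlowInput: emits three hardcoded rows.
    static List<Person> fixedFlowInput() {
        return Arrays.asList(
            new Person("Alice", 28),
            new Person("Bob", 35),
            new Person("Charlie", 22));
    }

    // Plays the role of tLogRow in Table mode: formats rows as a text table.
    static List<String> logRowTable(List<Person> rows) {
        String border = "+----------+-----+";
        List<String> lines = new ArrayList<>();
        lines.add(border);
        lines.add(String.format("| %-8s | %-3s |", "name", "age"));
        lines.add(border);
        for (Person p : rows) {
            lines.add(String.format("| %-8s | %-3d |", p.name, p.age));
        }
        lines.add(border);
        return lines;
    }

    public static void main(String[] args) {
        // The Row > Main link corresponds to passing the rows from one step to the next.
        for (String line : logRowTable(fixedFlowInput())) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is the shape of the flow: one step produces rows that follow a schema, and the next step consumes them. In the Studio, the Row → Main link expresses that hand-off visually.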
🧠 Challenge Exercise: Add Filtering Logic
Now extend your Job to display only people aged 30 or above.
✅ Steps:
1. Drag a tFilterRow component from the Processing category into the Designer.
2. Insert it between tFixedFlowInput and tLogRow:
o Delete the existing Main link between tFixedFlowInput and tLogRow.
o Connect:
tFixedFlowInput → Row → Main → tFilterRow
tFilterRow → Row → Filter → tLogRow
3. Configure tFilterRow:
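o In the Conditions table, click + to add a condition row.
o Set Input column to age, choose the "greater than or equal to" operator, and set Value to 30 (that is, age >= 30).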
4. Run the Job again.
Expected Output:
+----------+-----+
| name     | age |
+----------+-----+
| Bob      | 35  |
+----------+-----+
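The filtering step can be pictured as a predicate applied to every incoming row, with matching rows sent to the Filter output and the rest available on a Reject output. The sketch below assumes the same three sample rows as Exercise 1. FilterRowSketch, Person, sampleRows, matches, and route are hypothetical names used only for illustration and are not Talend APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for tFilterRow with the condition age >= 30.
// All names here are hypothetical; this is not Talend-generated code.
class FilterRowSketch {
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // The same sample rows used in Exercise 1.
    static List<Person> sampleRows() {
        return Arrays.asList(
            new Person("Alice", 28),
            new Person("Bob", 35),
            new Person("Charlie", 22));
    }

    // The filter condition from the exercise: age >= 30.
    static boolean matches(Person p) {
        return p.age >= 30;
    }

    // Returns the names routed to the Filter output (wantMatches == true)
    // or to the Reject output (wantMatches == false).
    static List<String> route(List<Person> rows, boolean wantMatches) {
        List<String> names = new ArrayList<>();
        for (Person p : rows) {
            if (matches(p) == wantMatches) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> rows = sampleRows();
        System.out.println("Filter: " + route(rows, true));   // Filter: [Bob]
        System.out.println("Reject: " + route(rows, false));  // Reject: [Alice, Charlie]
    }
}
```

Only Bob satisfies age >= 30, which matches the expected output above. Alice and Charlie end up in the reject flow, which this exercise leaves unconnected.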
📁 Optional Extension: Add File Output
Replace tLogRow with tFileOutputDelimited to write filtered results to a file.
✅ Steps:
1. Delete tLogRow.
2. Drag tFileOutputDelimited from the File → Output category.
3. Connect tFilterRow to tFileOutputDelimited with Row → Filter (tFilterRow sends matching rows through its Filter output, as in the previous exercise).
4. Configure tFileOutputDelimited:
o File name: "/path/to/output.csv" (e.g., "C:/Talend/output.csv")
o Field Separator: ";" or ","
o Include Header: Checked
5. Run the Job and verify the output file.
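The settings above (file name, field separator, header) map onto a very small amount of logic. The following plain-Java sketch shows the idea under stated assumptions: DelimitedOutputSketch and toLines are hypothetical names, the output goes to a temporary file rather than the path you configure in the Studio, and the single row "Bob", "35" stands in for the filtered flow.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for tFileOutputDelimited.
// All names here are hypothetical; this is not Talend-generated code.
class DelimitedOutputSketch {
    // Builds the file content: an optional header line, then one line per row,
    // with fields joined by the chosen separator.
    static List<String> toLines(String[] header, List<String[]> rows,
                                String separator, boolean includeHeader) {
        List<String> lines = new ArrayList<>();
        if (includeHeader) {
            lines.add(String.join(separator, header));
        }
        for (String[] row : rows) {
            lines.add(String.join(separator, row));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rows arriving from the Filter output.
        List<String[]> filtered = new ArrayList<>();
        filtered.add(new String[] {"Bob", "35"});

        // Assumption: a temporary file is used here instead of a configured path.
        Path out = Files.createTempFile("output", ".csv");
        Files.write(out,
            toLines(new String[] {"name", "age"}, filtered, ";", true),
            StandardCharsets.UTF_8);
        System.out.println("Wrote " + out);
    }
}
```

With the separator set to ";" and the header included, the file would contain the line name;age followed by Bob;35. Switching the separator to "," changes only the joining character.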
📝 Summary of Components Used
Component              Purpose
tFixedFlowInput        Creates inline mock data
tLogRow                Prints data to the console
tFilterRow             Filters rows based on a condition
tFileOutputDelimited   Writes data to a CSV or delimited file