Talend Tutorial 4: Create and Use Metadata
1. Create a metadata definition for a delimited file
a. In the Project Repository, click Metadata, right-click File delimited, and click Create file
delimited.
b. In the Name field of the wizard, type movies and click Next.
c. To specify a sample file, click Browse next to the File field, select the file moviesSorted
from the local disk, and click Open.
In the wizard window that appears, you can define settings such as how the file should be
read, the number of rows, if any, that should be skipped when reading the file, and the
maximum number of rows to process.
e. To indicate that the first row of the file contains column names and should not be read
as data, in the Preview tab, select the Set heading row as column names checkbox.
Note that when you do so, the Header checkbox is automatically checked with the value 1.
f. To refresh the file display to reflect the change made, click the Refresh Preview button
and then click Next.
g. In the Name field, type moviesSchema.
If the first line of the sample file includes column names, they will be displayed. If not, the
columns will appear as Column 0, Column 1, and so on and will have to be renamed manually.
When guessing the schema, Talend reads only the first fifty lines of the sample file and,
based on the data in these rows, defines the column types and lengths. You should validate
the information displayed and correct it if necessary.
h. Update the displayed schema to reflect the structure of the sample file. In this case,
change the length of the title and url fields to 100 and 250 respectively. Also, change the
type of the directorID field to integer. Click Finish.
Under Metadata in the Project Repository, the movies 0.1 entry is displayed with the file
properties. Under the entry movies 0.1, the schema of the metadata file, moviesSchema, is
displayed.
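The schema guessing described in this step can be illustrated with a short sketch. This is not Talend's actual implementation: the function names, the semicolon delimiter, the sample rows, and the two-type rule (Integer or String) are assumptions made for illustration. It skips the header row, samples at most fifty data rows, and derives a type and a maximum length for each column.

```python
import csv
import io

SAMPLE_LIMIT = 50  # the tutorial states that only the first fifty lines are sampled


def guess_type(values):
    """Return 'Integer' if every non-empty value parses as an int, else 'String'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    try:
        for v in non_empty:
            int(v)
    except ValueError:
        return "String"
    return "Integer"


def guess_schema(text, delimiter=";", header_rows=1, sample_limit=SAMPLE_LIMIT):
    """Guess (name, type, length) for each column of a delimited text sample.

    Hypothetical helper: the delimiter and the typing rule are assumptions.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header_rows > 0:
        # Use the last header row as the column names.
        names = rows[header_rows - 1]
    else:
        # No header: fall back to generic names, as the wizard does.
        names = [f"Column {i}" for i in range(len(rows[0]))]
    # Only the first sample_limit data rows influence the guess.
    sample = rows[header_rows:header_rows + sample_limit]
    schema = []
    for i, name in enumerate(names):
        values = [row[i] for row in sample if i < len(row)]
        length = max((len(v) for v in values), default=0)
        schema.append((name, guess_type(values), length))
    return schema


if __name__ == "__main__":
    # Made-up sample data; the real moviesSorted file is not reproduced here.
    sample = (
        "title;url;directorID\n"
        "Alien;http://example.com/alien;12\n"
        "Heat;http://example.com/heat;7\n"
    )
    for column in guess_schema(sample):
        print(column)
```

Because the guess depends only on the sampled rows, a longer title or URL further down the file would not be reflected in the guessed length, which is why the lengths are adjusted manually in step h.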
If you need to modify the property type or the schema, right-click the entry in the
Project Repository and select Edit File Delimited or Edit Schema.
Note that the parameter set of the metadata is displayed. Also note that all the fields are
greyed out, indicating that they belong to the metadata and not to the component.
To change the schema, click [] next to Edit schema and choose an option:
Change to built-in property to edit the schema for this component only.
Update repository connection to edit the metadata schema in the repository.
d. To view the schema, click [] next to Edit schema and select View schema.
Talend allows you to create metadata from several kinds of sources, such as databases, SAP
connections, and several file types.
Note: To illustrate this, the tutorial uses MySQL Workbench 6.3 CE with a test dataset called
talend_dq. You can follow along with a similar configuration or with your own databases.
i. To select all the tables and views, select the checkbox to the left of the database name
and click Next.
All table schemas have been imported as metadata and can be used.
The tables and the views appear under the mysql 0.1 connection in the Project Repository. To
view the fields in a table, click the table.
A tMysqlInput component is created with the repository information. It uses the mysql 0.1
connection, and its schema comes from the repository metadata for the table
tdq_values.
In addition, Talend generates the SQL query that reads from the table tdq_values.
c. To display the table data, add the tLogRow component and link the tdq_values component
to the tLogRow_1 component.
d. To run the Job, in the Run view, click Run.
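The SQL query that the component derives from the repository schema can be approximated as a SELECT over the schema's columns. The sketch below is an illustration only, not the exact text Talend produces: the helper names and the column names id and label are hypothetical, and the backtick quoting follows MySQL's identifier syntax.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_select(table, columns):
    """Build a SELECT statement listing each schema column, qualified by table."""
    if not columns:
        raise ValueError("at least one column is required")
    qualified = [
        f"{quote_identifier(table)}.{quote_identifier(column)}" for column in columns
    ]
    return "SELECT " + ", ".join(qualified) + " FROM " + quote_identifier(table)


if __name__ == "__main__":
    # The column names are placeholders; the real tdq_values columns are not listed here.
    print(build_select("tdq_values", ["id", "label"]))
```

Listing the columns explicitly, rather than using SELECT *, keeps the query aligned with the schema stored in the repository, so the rows arrive in the column order that downstream components such as tLogRow expect.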