
XML With Informatica

XML is a markup language that allows data to be shared across different systems. It defines rules for encoding documents in a format that is both human-readable and machine-readable. XML uses tags to structure and label data. DTDs and XML schemas provide rules to define the structure and elements of an XML document. Valid XML groups in Informatica follow rules like containing unique column names and not having many-to-many relationships between elements.

Uploaded by

Irfan Ali
Copyright
© Attribution Non-Commercial (BY-NC)

XML & XML with Informatica

XML

My best description of XML is this: XML is a cross-platform, software- and hardware-independent tool for transmitting information.

XML Is Used to Exchange Data

With XML, data can be exchanged between incompatible systems. In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.

XML, DTD, and XML Schema

Extensible Markup Language (XML) is a markup language generally regarded as the universal format for structured documents and data on the Web. Like HTML, XML contains element tags and attributes that define data. Unlike HTML, XML element tags and attributes are not based on a predefined, static set of elements and attributes. Every XML file can have a different set of tags and attributes.

Document Type Definition (DTD) files and XML schema files define the elements and attributes that can be used and the structure within which they fit in an XML file. DTD and XML schema files specify the structure and content of XML files in different ways. A DTD file defines the names of elements, the number of times they occur, and how they fit together. The XML schema file provides the same information plus the data types of the elements.

DTD

The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference.

The DTD file contains only metadata. It contains the description of the structure and the definition of the elements and attributes that can be found in the associated XML file. It does not contain any data.
A sample DTD looks like this:

    <!ELEMENT employees (companyname, employee)>
    <!ELEMENT companyname (id, name)>
    <!ELEMENT employee (emp+)>
    <!ELEMENT emp (id, info)>
    <!ELEMENT info (name, age, sex, job, sal)>
    <!ELEMENT created-date (format, timestamp)>
    <!ELEMENT id (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT sex (#PCDATA)>
    <!ELEMENT job (#PCDATA)>
    <!ELEMENT sal (#PCDATA)>
    <!ELEMENT format (#PCDATA)>
    <!ELEMENT timestamp (#PCDATA)>

For example:

    <employees>
      <companyname>
        <id>01</id>
        <name>Wipro Technologies</name>
      </companyname>
      <employee>
        <emp>
          <id>91000</id>
          <info>
            <name>Dileep</name>
            <age>25</age>
            <sex>Male</sex>
            <job>Project Engineer</job>
            <sal>20000</sal>
          </info>
        </emp>
      </employee>
    </employees>

XML Schema

The XML schema file, like the DTD file, contains only metadata. In addition to the definition and structure of elements and attributes, an XML schema contains a description of the type of elements and attributes found in the associated XML file.

A sample XML Schema file looks like this:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="ECR">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="ECR_object"/>
            <xs:element ref="ECN_object" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="ECR_object">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="number" type="xs:string"/>
            <xs:element name="summary" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="ECN_object">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="number" type="xs:string"/>
            <xs:element name="summary" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

For example:

    <ECR>
      <ECR_object>
        <number>00996</number>
        <summary>Testing</summary>
      </ECR_object>
      <ECN_object>
        <number>00896</number>
        <summary>Test</summary>
      </ECN_object>
    </ECR>

Cardinality in XML

Exactly one occurrence of an element (only once):

    DTD:        <!ELEMENT companyname (id, name)>
    XML schema: <xs:element name="number" type="xs:string"/>

At least one occurrence of an element (one or more):

    DTD:        <!ELEMENT employee (emp+)>
    XML schema: <xs:element name="number" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>

To cap the count at a fixed number n, write that number in place of "unbounded", for example maxOccurs="5". The literal value "n" is not a valid maxOccurs value.

Zero or more occurrences of an element:

    DTD:        <!ELEMENT employee (emp*)>
    XML schema: <xs:element name="number" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>

Here too, a fixed number can replace "unbounded" to set an upper limit.

Zero or one occurrence of an element:

    DTD:        <!ELEMENT employee (emp?)>
    XML schema: <xs:element name="number" type="xs:string" minOccurs="0" maxOccurs="1"/>
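The cardinality rules above can be checked by hand with a few lines of Python. This is only a sketch: the standard library does not validate against a DTD or an XML schema, so the helper `occurs_ok` and the `OCCURRENCE` table are hypothetical names introduced here, and the function simply counts child elements and compares the count with the limits that each DTD occurrence indicator implies.

```python
import xml.etree.ElementTree as ET

# DTD occurrence indicator -> (minOccurs, maxOccurs); None stands for "unbounded".
OCCURRENCE = {
    "": (1, 1),      # exactly once
    "+": (1, None),  # one or more
    "*": (0, None),  # zero or more
    "?": (0, 1),     # zero or one
}


def occurs_ok(parent, child_tag, indicator):
    """Return True if the number of direct children named child_tag fits the indicator."""
    low, high = OCCURRENCE[indicator]
    found = len(parent.findall(child_tag))
    return found >= low and (high is None or found <= high)


two_emps = ET.fromstring("<employee><emp/><emp/></employee>")
no_emps = ET.fromstring("<employee/>")

print(occurs_ok(two_emps, "emp", "+"))  # two emp elements satisfy "one or more"
print(occurs_ok(two_emps, "emp", "?"))  # two emp elements exceed "zero or one"
print(occurs_ok(no_emps, "emp", "*"))   # an empty parent satisfies "zero or more"
```

A real project would normally rely on a validating parser instead of counting elements manually.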

XML Entity References

An entity reference is a group of characters used in text as a substitute for a single specific character that is also a markup delimiter in XML. Using the entity reference prevents a literal character from being mistaken for a markup delimiter. For example, if an attribute must contain a left angle bracket (<), you can substitute the entity reference "&lt;". Entity references always begin with an ampersand (&) and end with a semicolon (;). You can also substitute a numeric or hexadecimal reference. The entities predefined in XML are identified in the following table.

    Character   Entity reference   Numeric reference   Hexadecimal reference
    &           &amp;              &#38;               &#x26;
    <           &lt;               &#60;               &#x3C;
    >           &gt;               &#62;               &#x3E;
    "           &quot;             &#34;               &#x22;
    '           &apos;             &#39;               &#x27;
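As an illustration of the table, the following Python sketch escapes a string before placing it in an XML element and reads it back. It uses `escape` and `unescape` from the standard `xml.sax.saxutils` module; the sample strings and the `rule` element name are made up for this example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "if salary < 1000 & grade > 2 then"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)

# Quotes are only replaced when an extra mapping is passed in.
quoted = escape('say "hi"', {'"': "&quot;"})

# The parser turns the entity references back into literal characters.
parsed = ET.fromstring("<rule>" + escaped + "</rule>").text

# Numeric and hexadecimal references resolve to the same character.
numeric = ET.fromstring("<c>&#60; and &#x3C;</c>").text

print(escaped)
print(quoted)
print(parsed == raw, unescape(escaped) == raw, numeric)
```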

Character Data

Character data in XML can be either PCDATA or CDATA.

PCDATA

PCDATA means parsed character data: the text inside the XML tags is parsed by the XML parser. If you place a character such as "<" or "&" inside such an element, the parser raises an error, because it interprets "<" as the start of a new element and "&" as the start of an entity reference. You cannot write something like this:

    if salary < 1000 then

To avoid the error, replace the "<" character with an entity reference:

    if salary &lt; 1000 then

CDATA

CDATA means character data: text marked as CDATA is not parsed by the XML parser.
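The difference can be observed with Python's standard `xml.etree.ElementTree` parser. In this sketch, `try_parse` is a hypothetical helper that returns the element text, or None when the parser rejects the input. The `<![CDATA[ ... ]]>` delimiters are the standard XML syntax for a CDATA section.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the text of the root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# A literal "<" inside parsed character data is rejected.
raw_text = try_parse("<rule>if salary < 1000 then</rule>")

# The entity reference is accepted and converted back to "<".
entity_text = try_parse("<rule>if salary &lt; 1000 then</rule>")

# Inside a CDATA section, "<" and "&" are taken literally.
cdata_text = try_parse("<rule><![CDATA[if salary < 1000 && age > 30 then]]></rule>")

print(raw_text)
print(entity_text)
print(cdata_text)
```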

If the text contains many "<" or "&" characters, as program code often does, the XML element can be defined as a CDATA section.

Only the characters "<" and "&" are strictly illegal in XML. Apostrophes, quotation marks, and greater-than signs are legal, but it is a good habit to replace them.

Metadata from XML, DTD, and XML Schema Files

PowerMart and PowerCenter can create metadata for a source or target definition from XML, DTD, or XML schema files. XML files provide both data and metadata, while DTD and XML schema files provide only metadata.

The Designer requires a lot of memory and resources to parse very large XML files and extract metadata for source or target definitions. To ensure that the Designer creates an XML source or target definition quickly and efficiently, Informatica recommends that you import source or target definitions only from XML files that are no larger than 100 KB, or from DTD or XML schema files.

If you want to import from a very large XML file that has no DTD or XML schema file, decrease the size of the XML file by deleting duplicate data elements. You do not need all of your data to import an XML source or target definition. You need only enough data to accurately show the hierarchy of your XML file and enable the Designer to create a source or target definition.

Target from XML

You can create an XML target definition from an XML, DTD, or XML schema file. You can also create an XML target definition from an XML source definition or from one or more relational source definitions.

Rules for a Valid Group

An XML group is valid when it follows these rules:

- Any element or attribute in an XML file can be included in a group.
- A group cannot contain two elements with a many-to-many relationship.
- Column names in the groups are unique within a source or target definition.
- Group names are unique within a source or target definition.

The Designer validates any group you create or modify. When you try to create a group that does not follow these constraints, the Designer returns an error message and does not create the group. Note: If the target definition consists of only one group, then it does not require a primary key or a foreign key.
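The two uniqueness rules can be pictured with a small Python check. This is not how the Designer is implemented; `find_group_rule_violations` and the list-of-pairs layout are assumptions made for illustration, and the many-to-many rule is left out because it needs knowledge of the XML hierarchy.

```python
from collections import Counter


def find_group_rule_violations(groups):
    """groups: list of (group_name, [column_name, ...]) pairs for one definition.

    Returns a list of messages, one per duplicated group or column name.
    """
    problems = []
    group_counts = Counter(name for name, _ in groups)
    for name, seen in group_counts.items():
        if seen > 1:
            problems.append("duplicate group name: " + name)
    column_counts = Counter(col for _, cols in groups for col in cols)
    for col, seen in column_counts.items():
        if seen > 1:
            problems.append("duplicate column name: " + col)
    return problems


valid = [("companyname", ["company_id", "company_name"]), ("emp", ["emp_id", "emp_name"])]
invalid = [("emp", ["id", "name"]), ("emp", ["id", "sal"])]

print(find_group_rule_violations(valid))
print(find_group_rule_violations(invalid))
```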

Normalized Groups

A normalized group is a valid group that contains only one multiple-occurring element. In most cases, XML sources contain more than one multiple-occurring element and convert to more than one normalized group. The following rules apply to normalized groups:

- A normalized group must be a valid group.
- A normalized group cannot contain more than one multiple-occurring element.

Denormalized Groups

A denormalized group has more than one multiple-occurring element. The multiple-occurring elements can have a one-to-many relationship, but not a many-to-many relationship. All the elements in a denormalized group belong to the same parent chain.

Source definitions can have denormalized groups, but target definitions cannot. Denormalized groups, like denormalized relational tables, generate duplicate data. They can also generate null data. Make sure you filter out any unwanted duplicate or null data before passing data to the target. The following rules apply to denormalized groups:

- A denormalized group must be a valid group.
- A denormalized group can contain more than one multiple-occurring element.
- Multiple-occurring elements in a denormalized group must have a one-to-many relationship.
- Denormalized groups can exist in a source definition, but not in a target definition.
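To make the duplicate and null data concrete, here is a Python sketch that flattens one small document both ways. The `project` element, the second employee, and the row layout are invented for this illustration. The rows only mimic the idea of groups and are not produced by Informatica.

```python
import xml.etree.ElementTree as ET

# emp occurs many times, and each emp can hold many project elements (one-to-many).
DOC = """
<employee>
  <emp><id>91000</id><name>Dileep</name>
    <project>Billing</project><project>Payroll</project></emp>
  <emp><id>91001</id><name>Asha</name></emp>
</employee>
"""

root = ET.fromstring(DOC)

# Normalized: one group per multiple-occurring element.
emp_group = [(e.findtext("id"), e.findtext("name")) for e in root.findall("emp")]
project_group = [
    (e.findtext("id"), p.text) for e in root.findall("emp") for p in e.findall("project")
]

# Denormalized: both multiple-occurring elements in one group.
denormalized = []
for e in root.findall("emp"):
    projects = [p.text for p in e.findall("project")] or [None]
    for project in projects:
        denormalized.append((e.findtext("id"), e.findtext("name"), project))

print(emp_group)
print(project_group)
print(denormalized)
```

In the denormalized rows the employee columns repeat once per project, and the employee without a project carries a null in the project column. That is the kind of data the text recommends filtering before it reaches the target.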

Group Keys and Relationships

The relationship between elements in the XML hierarchy translates into a combination of primary and foreign keys that define the relationship between XML groups. If you define a key in the XML hierarchy, the Designer uses it as a primary key in a group. The Designer handles group keys and relationships differently for sources and targets.

In a source definition, a group does not have to be related to any other group. A denormalized group can be independent of any other group. Therefore, groups in a source definition do not require primary or foreign keys. However, if a group is related to another group based on the XML hierarchy, and you do not designate any column as a key for the group, the Designer creates a column called the Generated Primary Key to hold a key for the group.

In a target definition, each group must be related to one other group. Therefore, each group needs at least one key to establish its relationship with another group. If you do not designate any column as a key for a group, the Designer creates a column called Group Link Key to hold a key for the group.

When you run a session with a mapping that contains an XML source, the Informatica Server generates the values for the generated primary key columns in the source definition. When you run a session with a mapping that contains an XML target, you need to pass the values to the group link columns in the target groups from the data in the pipeline.

Group keys and relationships follow these rules:

- Any element or attribute can be marked as a key.
- A group can have only one primary key.
- A group can be related to only one other group, and therefore can have only one foreign key.
- A column cannot be marked as both a primary key and a foreign key.
- A key column can be a column that points to an element in the hierarchy or a column created by the Designer. A group can have a combination of the two types of key columns.
- A source group does not require a key.
- A target group requires at least one key.
- The target root group requires a primary key. It does not require a foreign key.
- A target leaf group requires a foreign key. It does not require a primary key.
- A foreign key always refers to a primary key in another group. Self-referencing keys are not allowed.
- A foreign key column created by the Designer always refers to a primary key column created by the Designer.
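The idea of a generated key that links a child group to its parent group can be sketched in Python. The document, the column names `PK_order` and `FK_order`, and the use of a simple integer sequence are all assumptions for this example. The values the Informatica Server actually generates may look different.

```python
from itertools import count
import xml.etree.ElementTree as ET

# Hypothetical document with no natural key on order.
DOC = (
    "<orders>"
    "<order><item>pen</item><item>ink</item></order>"
    "<order><item>paper</item></order>"
    "</orders>"
)


def build_groups(xml_text):
    """Return (order_rows, item_rows); each item row points at its parent order."""
    next_key = count(1)
    order_rows, item_rows = [], []
    for order in ET.fromstring(xml_text).findall("order"):
        order_pk = next(next_key)  # generated primary key for the parent group
        order_rows.append({"PK_order": order_pk})
        for item in order.findall("item"):
            # The child group stores the parent's key as its foreign key.
            item_rows.append({"FK_order": order_pk, "item": item.text})
    return order_rows, item_rows


order_rows, item_rows = build_groups(DOC)
print(order_rows)
print(item_rows)
```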

Code Pages

XML files contain an encoding declaration that indicates the code page used in the file. The most commonly used code pages in XML are UTF-8 and UTF-16. All XML parsers support these two code pages. For information on the XML character encoding specification, go to the W3C website at http://www.w3c.org.

PowerCenter and PowerMart support the same set of code pages for XML files that they support for relational databases and other flat files. You can use any code page supported by both Informatica and the XML specification. For a list of code pages that Informatica supports, see Code Pages in the Installation and Configuration Guide. Informatica does not support any user-defined code page.

For XML source definitions, PowerCenter and PowerMart use the repository code page. When you import a source definition from an XML file, the Designer displays the code page declared in the file for verification only. It does not use the code page declared in the XML file.

For XML target definitions, PowerCenter and PowerMart use the code page declared in the XML file. If Informatica does not support the declared code page, the Designer returns an error and you cannot import the target definition.

XML Writer

Verify that the XML environment is set up correctly: the environment variables are set properly, the .dll files are in the correct location on Windows (or the shared libraries on UNIX), and the supporting .dat files are present.

How XML Sources and Targets Look in Informatica

XML Source

Each group in an XML definition is analogous to a relational table, and the Designer treats each group within the XML Source Qualifier as a separate source of data.

In a mapping, the ports of one group in an XML Source Qualifier can be part of more than one data flow. However, the ports of more than one group in the same XML Source Qualifier cannot link to one transformation or be part of the same data flow. This is the biggest drawback with XML sources.

If you need to use data from two different XML source definitions, you can link a group from each source qualifier and join the data in a Joiner transformation. You can also use the same source definition more than once in a mapping. Connect each source definition to a different XML Source Qualifier and join the groups in a Joiner transformation.

The following figure shows how we can join two XML groups in the same mapping using a Joiner transformation.

[Figure: two XML groups joined in one mapping with a Joiner transformation]
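As a rough picture of the join logic only (not of the Designer mapping), the Python sketch below joins the rows of two groups on a shared key. The function `join_groups`, the row dictionaries, and the choice of an inner join are assumptions for illustration. The rows of the master group are cached in memory first, which is why a smaller master group is cheaper.

```python
def join_groups(master_rows, detail_rows, key):
    """Inner-join two lists of row dictionaries on the column named key."""
    cache = {}
    for row in master_rows:  # cache the (ideally smaller) master group
        cache.setdefault(row[key], []).append(row)
    joined = []
    for detail in detail_rows:  # stream the detail group past the cache
        for master in cache.get(detail[key], []):
            joined.append({**master, **detail})
    return joined


company_group = [{"emp_id": "91000", "company": "Wipro Technologies"}]
info_group = [
    {"emp_id": "91000", "name": "Dileep"},
    {"emp_id": "91001", "name": "Unmatched"},
]

print(join_groups(company_group, info_group, "emp_id"))
```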

If we need to load data from several groups to the same target, it is usually better, depending on the granularity, to split the work into two or three mappings and load the data to the target from each of them.

When we create a session to extract data from an XML source, we need to configure source properties, such as the source file location, in the session properties. Define the XML source properties in the Properties settings on the Sources tab.

XML Target: The following figure shows how an XML target looks in Informatica Designer.

When you configure a session to load data to an XML target, you define properties on the Targets tab and the Transformations tab of the session properties. You can configure the following properties for XML targets:

- Output file options. You can configure the directory and file name to which the Informatica Server writes the target file.
- Code page. You can define the code page declared in the XML target file. Use the Set File Properties button to define the code page.
- Duplicate Group Row Handling. You can configure how the Informatica Server handles duplicate rows.
- DTD/Schema Reference. You can specify a DTD or an XML schema file name for the XML target.

Points to take care of when using XML as a source or target:

- The code page used in the XML, DTD, or XML schema file must be valid and supported by Informatica. When creating the file, make sure its actual format matches the declared one. For example, a file that declares UTF-8 must actually be saved as UTF-8, not as ANSI.
- If a DTD or XML schema file is associated with the source or target, the XML data file must match it exactly.
- If the XML source holds a large amount of data, or a large amount of data must be loaded to the XML target, divide it into smaller modules according to the business requirement. Informatica will not be able to read or write very large XML files.
- Whenever the source or target DTD or XML schema file changes, re-import the source or target definition.
- Always make sure that the data type and size of the imported XML metadata are correct and match the requirement. By default, the import assigns only the number and string data types, with a size of 10.
- Whenever you join two groups in a Joiner transformation, select the smaller group as the master group.
- With an XML target, always make sure that the data sent to the target matches the cardinality defined in the target DTD, XML schema, or XML file.
- With an XML source, decide whether the groups should be normalized or denormalized based on the requirement, and make sure that each normalized group contains only one multiple-occurring element.
- An XML target can never be denormalized.
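The first point, about the declared code page matching the real file format, can be checked before a file reaches Informatica. The Python sketch below is one possible approach. `declared_encoding` and `matches_declaration` are hypothetical helpers, the regular expression only handles ASCII-compatible encodings (so not UTF-16), and "ANSI" is assumed here to mean the Windows cp1252 code page.

```python
import re

# Matches the encoding attribute of an XML declaration at the start of the file.
DECLARATION = re.compile(rb"^<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']")


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or UTF-8 if none is declared."""
    match = DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def matches_declaration(data):
    """Return True if the bytes can be decoded with the encoding they declare."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        return False


text = '<?xml version="1.0" encoding="UTF-8"?><name>José</name>'
saved_as_utf8 = text.encode("utf-8")
saved_as_ansi = text.encode("cp1252")  # declares UTF-8 but is stored as cp1252

print(matches_declaration(saved_as_utf8))
print(matches_declaration(saved_as_ansi))
```

A successful decode does not prove the declaration is right, since some byte sequences are valid in several encodings, but a failed decode reliably flags a mismatch.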
