
Unstructured Data - Analysis

This document discusses the process of analyzing unstructured data from documents like PDFs and Word files. It involves loading the unstructured data into a file system, then using Data Transformation Studio to extract useful information and convert it into a structured format like XML or flat file. This structured data can then be loaded into an ETL or reporting tool for analysis. Predictive models may also be used to analyze the structured career data and draw conclusions about recruiting.

Uploaded by

Matthew Davis
Copyright
© All Rights Reserved

UNSTRUCTURED DATA ANALYSIS
Contents
1. Process Flow
2. Data Transformation Studio
3. DT-PowerCenter
4. Flow Chart
5. Analysis

Process Flow
Unstructured data is loaded to HDFS or any other file system.
The data is fetched in its original format for processing.
DT Studio converts the PDF/Word documents into a format easily readable by the ETL/reporting tool.
Useful information is extracted using DT and loaded to an XML or flat file.
Reports are generated on this information to depict the overall career graph of a resource.
Further analysis uses a predictive model of a candidate's utility to the firm.
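
The extraction step above can be sketched in plain Python. This is not Informatica Data Transformation; it is a minimal stand-in, assuming the resume text has already been pulled out of the PDF/Word file. The field labels (`Name`, `Company`, `From`, `To`), the function names, and the XML layout are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical resume text, assumed already extracted from a PDF/Word file.
SAMPLE_TEXT = """\
Name: Jane Doe
Company: Acme Corp
From: 2015
To: 2018
Company: Globex
From: 2018
To: 2021
"""

# Only lines starting with one of these (assumed) labels carry useful data.
LINE_PATTERN = re.compile(r"^(Name|Company|From|To):\s*(.+)$")


def extract_candidate(text):
    """Pull the candidate name and a list of jobs out of labelled lines."""
    candidate = {"name": None, "jobs": []}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # ignore lines that carry no labelled field
        label, value = match.group(1), match.group(2).strip()
        if label == "Name":
            candidate["name"] = value
        elif label == "Company":
            candidate["jobs"].append({"company": value})
        elif candidate["jobs"]:
            # From/To belong to the most recently seen company.
            candidate["jobs"][-1][label.lower()] = value
    return candidate


def to_xml(candidate):
    """Serialize the extracted fields as an XML string."""
    root = ET.Element("candidate", name=candidate["name"] or "")
    for job in candidate["jobs"]:
        ET.SubElement(root, "job", company=job["company"],
                      start=job.get("from", ""), end=job.get("to", ""))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(extract_candidate(SAMPLE_TEXT)))
```

The same dictionary could just as easily be written out as a delimited flat file; XML is used here only because it keeps the one-candidate-to-many-jobs nesting visible.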
Data Transformation Studio
Informatica B2B Data Transformation provides access to complex file and message formats through a comprehensive, enterprise-class solution to transformation challenges.

It features the best technology for extracting data from any file, document, or message, regardless of format, complexity, or size, and transforming it into a usable form.
Data Transformation-PowerCenter
Data Transformation has been set up on the INFA server for processing the output file from DT.
The UDT transformation is used to fetch the files from the folder where the output XML/flat file is placed.
The fetched data acts as an input to the mapping or the report.
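
The folder pickup described for the UDT transformation can be illustrated with a plain-Python analogy. This is not PowerCenter code. It only shows the idea of reading every output file from a folder and flattening it into rows that a mapping or report could consume. The XML layout (`candidate` and `job` elements), the file name, and the function names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_output_folder(folder):
    """Read every *.xml file in `folder` and flatten it into row dicts."""
    rows = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        for job in root.findall("job"):
            rows.append({
                "source_file": path.name,
                "candidate": root.get("name"),
                "company": job.get("company"),
                "start": job.get("start"),
                "end": job.get("end"),
            })
    return rows


def demo():
    """Write one hypothetical output file to a temp folder and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        sample = ('<candidate name="Jane Doe">'
                  '<job company="Acme Corp" start="2015" end="2018"/>'
                  '</candidate>')
        (Path(folder) / "jane_doe.xml").write_text(sample, encoding="utf-8")
        return read_output_folder(folder)


if __name__ == "__main__":
    for row in demo():
        print(row)
```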
Flow Chart
HDFS (Hadoop Distributed File System)
-> Data Transformation Studio
-> XML/Flat File (Readable Format)
-> ETL (Informatica) / Reporting (QlikView)
Analysis
After processing, there are multiple paths:

1. Complete statistics for a candidate are available in a structured form and can be queried as needed.

2. Predictive analysis can be run over these statistics to draw conclusions about demand and recruitment. Languages and tools such as R or SAS can be used for the analysis.
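
Both paths can be sketched briefly. The slides name R or SAS for the predictive step; the example below swaps in Python with only the standard library, purely for illustration. The records, the table layout, the hiring figures, and the function names are all hypothetical, and a straight-line fit stands in for whatever predictive model would actually be chosen.

```python
import sqlite3

# Hypothetical structured records: (candidate, company, start_year, end_year).
JOBS = [
    ("Jane Doe", "Acme Corp", 2015, 2018),
    ("Jane Doe", "Globex", 2018, 2021),
    ("John Roe", "Initech", 2012, 2020),
]


def tenure_by_candidate(jobs):
    """Path 1: query total years worked and number of employers per candidate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE job (candidate TEXT, company TEXT, "
                     "start_year INTEGER, end_year INTEGER)")
        conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?)", jobs)
        cursor = conn.execute(
            "SELECT candidate, SUM(end_year - start_year), COUNT(*) "
            "FROM job GROUP BY candidate ORDER BY candidate")
        return cursor.fetchall()
    finally:
        conn.close()


def fit_line(xs, ys):
    """Path 2: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(tenure_by_candidate(JOBS))
    # Hypothetical history: hires made per year, used to project the next year.
    years = [2018, 2019, 2020, 2021]
    hires = [10, 12, 14, 16]
    slope, intercept = fit_line(years, hires)
    print(round(slope * 2022 + intercept, 2))
```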
