U2 - Hub Spoke
1. Extract: The first stage in the ETL process is to extract data from various sources such as
transactional systems, spreadsheets, and flat files. This step involves reading data from the
source systems and storing it in a staging area.
2. Transform: In this stage, the extracted data is transformed into a format that is suitable for
loading into the data warehouse. This may involve cleaning and validating the data, converting
data types, combining data from multiple sources, and creating new data fields.
3. Load: After the data is transformed, it is loaded into the data warehouse. This step involves
creating the physical data structures and loading the data into the warehouse.
ETL Process in Data Warehouse
Extract
During data extraction, raw data is copied or exported from source locations to a
staging area. Data management teams can extract data from a variety of data
sources, which can be structured or unstructured. Those sources include but are
not limited to:
•SQL or NoSQL servers
•CRM and ERP systems
•Flat files
•Email
•Web pages
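The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the two inline sources (`CRM_CSV`, `ORDERS_JSON`), the helper names, and the use of a plain Python list as the staging area are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical raw sources, inlined so the sketch is self-contained.
CRM_CSV = "customer_id,country\n1,U.S.A\n2,United States\n"
ORDERS_JSON = '[{"order_id": 10, "customer_id": 1, "amount": "19.99"}]'


def extract_csv(text, source):
    """Copy rows from a flat file into staging records, unchanged."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"source": source, "raw": dict(row)} for row in reader]


def extract_json(text, source):
    """Copy records from a JSON export into staging records, unchanged."""
    return [{"source": source, "raw": record} for record in json.loads(text)]


# The staging area is just a list here; in practice it is often a set of
# files or tables kept separate from the warehouse.
staging = extract_csv(CRM_CSV, "crm") + extract_json(ORDERS_JSON, "orders")

for record in staging:
    print(record["source"], record["raw"])
```

Note that nothing is cleaned or converted at this point: the values are copied as-is (the amount stays a string, the country names stay inconsistent) and each record is only tagged with its source.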
How ETL works
Transform
The second step of the ETL process is transformation. In this step, a set of rules
or functions is applied to the extracted data to convert it into a single standard
format. It may involve the following tasks:
•Filtering – loading only certain attributes into the data warehouse.
•Cleaning – replacing NULL values with default values, mapping variants such as
U.S.A, United States, and America to USA, etc.
•Joining – combining multiple attributes into one.
•Splitting – splitting a single attribute into multiple attributes.
•Sorting – sorting tuples by some attribute (generally the key attribute).
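The transformation tasks listed above can be illustrated with a short sketch. The input rows, field names, and the `UNKNOWN` default are hypothetical and chosen only to show each task once; real rules depend on the warehouse schema.

```python
# Hypothetical extracted rows; the field names are assumptions for illustration.
rows = [
    {"id": 3, "first": "Ana", "last": "Lopez", "country": "America",
     "joined": "2021-05-17", "notes": "vip", "city": None},
    {"id": 1, "first": "Bo", "last": "Chen", "country": "U.S.A",
     "joined": "2020-01-02", "notes": "", "city": "Austin"},
    {"id": 2, "first": "Cy", "last": "Diaz", "country": "United States",
     "joined": "2019-11-30", "notes": "", "city": None},
]

# Cleaning rule: map known spelling variants to one standard value.
COUNTRY_ALIASES = {"U.S.A": "USA", "United States": "USA", "America": "USA"}


def transform(row):
    # Splitting: one date attribute becomes separate year and month attributes.
    year, month, _day = row["joined"].split("-")
    return {
        # Filtering: only selected attributes are kept ("notes" is dropped).
        "id": row["id"],
        # Joining: two name attributes are combined into one.
        "full_name": f'{row["first"]} {row["last"]}',
        # Cleaning: standardize country names and fill NULLs with a default.
        "country": COUNTRY_ALIASES.get(row["country"], row["country"]),
        "city": row["city"] if row["city"] is not None else "UNKNOWN",
        "join_year": int(year),
        "join_month": int(month),
    }


# Sorting: order the transformed tuples by the key attribute.
clean_rows = sorted((transform(r) for r in rows), key=lambda r: r["id"])

for row in clean_rows:
    print(row)
```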
Loading
•The third and final step of the ETL process is loading. In this step,
the transformed data is loaded into the data warehouse.
•Some warehouses are updated very frequently, while others are
loaded at longer but regular intervals.
•The rate and period of loading depend solely on the requirements
and vary from system to system.
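As a rough sketch of the loading step, the example below uses an in-memory SQLite database as a stand-in for a warehouse. The `customers` table, its columns, and the `load` helper are assumptions for illustration. It shows an initial load followed by a later periodic load that updates one existing row and adds a new one.

```python
import sqlite3


def load(conn, rows):
    """Create the target table if needed, then insert or replace each row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "id INTEGER PRIMARY KEY, full_name TEXT, country TEXT)"
    )
    # INSERT OR REPLACE overwrites an existing row with the same primary key,
    # so repeated loads do not create duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, full_name, country) "
        "VALUES (:id, :full_name, :country)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# Initial load.
load(conn, [
    {"id": 1, "full_name": "Bo Chen", "country": "USA"},
    {"id": 2, "full_name": "Cy Diaz", "country": "USA"},
])

# A later periodic load: one changed row and one new row.
load(conn, [
    {"id": 2, "full_name": "Cy Diaz-Ruiz", "country": "USA"},
    {"id": 3, "full_name": "Ana Lopez", "country": "USA"},
])

result = conn.execute(
    "SELECT id, full_name FROM customers ORDER BY id"
).fetchall()
print(result)
```

How often `load` runs (every few minutes, nightly, weekly) is a scheduling decision outside this sketch and, as noted above, depends on the requirements of the system.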
ETL Tools
Commonly cited ETL tools include:
1. Hevo
2. Sybase
3. Oracle Warehouse Builder
4. CloverETL
5. MarkLogic
Advantages of the ETL process in data warehousing:
1. Improved data quality: Cleaning and validating data during transformation helps
keep the data in the data warehouse accurate, complete, and up to date.
2. Better data integration: The ETL process helps to integrate data from multiple
sources and systems, making it more accessible and usable.
3. Increased data security: The ETL process can help to improve data security by
controlling access to the data warehouse and ensuring that only authorized users can
access the data.
4. Improved scalability: The ETL process can help to improve scalability by providing a
repeatable way to manage and analyze large amounts of data.
5. Increased automation: ETL tools and technologies can automate and simplify the
ETL process, reducing the time and effort required to load and update data in the
warehouse.
Disadvantages of the ETL process in data warehousing: