# Problem Statement

This document outlines a structured approach to a Super_Store analysis using Informatica, covering data preparation, cleaning, and analysis tasks. It details the steps for setting up Oracle SQL, the Informatica repository, and the workflows for analyses such as a sales summary, customer order analysis, and order processing time analysis. Each task includes specific operations, sample outputs, and workflow-management instructions.



---

## Informatica Hands-On Challenge: Super_Store Analysis

### Introduction:

You are provided with a sample dataset from a retail store, Super_Store. The dataset contains information about orders, customers, products, and sales. Your tasks involve cleaning the data and then analyzing sales, customer orders, customer geography, and order processing time using Informatica PowerCenter.

### Data Preparation:

#### Oracle SQL Setup:

1. **Log in to Oracle SQL Developer using the Admin connection with the following credentials:**

- **Username:** system

- **Password:** Admin

2. **Create a table named `Super_Store` with the following structure (a DDL sketch follows the note below):**

| Column Name   | Data Type    |
|---------------|--------------|
| Row_ID        | INT          |
| Order_Date    | DATE         |
| Ship_Date     | DATE         |
| Ship_Mode     | VARCHAR(50)  |
| Customer_ID   | VARCHAR(50)  |
| Customer_Name | VARCHAR(50)  |
| Segment       | VARCHAR(50)  |
| Country       | VARCHAR(50)  |
| City          | VARCHAR(50)  |
| State         | VARCHAR(50)  |
| Postal_Code   | VARCHAR(50)  |
| Region        | VARCHAR(50)  |
| Product_ID    | VARCHAR(250) |
| Category      | VARCHAR(250) |
| Sub_Category  | VARCHAR(250) |
| Product_Name  | VARCHAR(250) |
| Sales         | INT          |

**NOTE:** While loading data into the table, set the date format for `Order_Date` and `Ship_Date` to `DD/MM/YYYY`.
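
For reference, a minimal DDL sketch of this table in Oracle SQL. The `ALTER SESSION` line is an optional assumption on my part, one way to have manual loads parse `DD/MM/YYYY` dates; it is not a graded step:

```sql
-- Optional (assumption): have the session parse dates as DD/MM/YYYY during loading.
ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';

-- Table structure as specified above (Oracle treats VARCHAR as VARCHAR2).
CREATE TABLE Super_Store (
    Row_ID        INT,
    Order_Date    DATE,
    Ship_Date     DATE,
    Ship_Mode     VARCHAR(50),
    Customer_ID   VARCHAR(50),
    Customer_Name VARCHAR(50),
    Segment       VARCHAR(50),
    Country       VARCHAR(50),
    City          VARCHAR(50),
    State         VARCHAR(50),
    Postal_Code   VARCHAR(50),
    Region        VARCHAR(50),
    Product_ID    VARCHAR(250),
    Category      VARCHAR(250),
    Sub_Category  VARCHAR(250),
    Product_Name  VARCHAR(250),
    Sales         INT
);
```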

3. **Load `superstore_data.csv` into the `Super_Store` table.**

- **Path:** `\Desktop\Project\miniproject-informatica-super_store\`

### Informatica Repository Setup:

1. **Connect to the Informatica repository manager using the following credentials:**

- **Username:** Administrator

- **Password:** Administrator

2. **Create a folder named `Super_Store` in the repository manager.**

#### How to Import Source Table in Source Analyzer:

1. **Go to the "Sources" menu in Source Analyzer:**

- **Step 1:** Click on the "Sources" menu in the menu bar.

- **Step 2:** Select the "Import from Database" option; the ODBC Connection box will open.

2. **Create ODBC connection:**

- **Step 1:** Click the button next to "ODBC Data Source(...)".

- **Step 2:** On the next page, select the "User DSN" tab and click the "Add" button.

- **Step 3:** Select "Oracle Wire Protocol".

- **Step 4:** On the next page, select the "General" tab and enter the database details, then click
"Connect".

- **Data Source name:** oracle

- **Host:** localhost

- **Port:** 1521

- **SID:** xe

#### Create Connections for Workflow Manager:

1. **To Create a Relational Connection:**

- **Step 1:** In Workflow Manager, click on the "Connections" menu and select the "Relational" option.

- **Step 2:** In the pop-up window, select "Oracle" in type and click the "New" button.

- **Step 3:** In the new window of connection object definition:

- **Enter Connection Name:** oracle

- **Enter username:** system

- **Enter password:** Admin

- **Enter connection string:** xe

- **Leave other settings as default and select OK button**

### Data Cleaning:

- **Mapping Name:** Map_Cleaned_Data


- **Workflow Name:** Workflow_Cleaned_Data

- **Session Name:** Session_Cleaned_Data

- **Target Table:** Super_Store_Cleaned_Data

#### Operations:

1. **Remove duplicates from the dataset to ensure data integrity.**

2. **Filter records where Country is 'United States' to focus on domestic orders.**

3. **Extract the numeric part from `Customer_ID` to standardize customer identification (e.g., from CH-1234, extract 1234).**

4. **Concatenate the extracted ID and `Customer_Name` with '-' to create a unique identifier for each customer (e.g., 1234-Charlie, i.e., extracted ID-customer name) and store it in a `Customer_Id_Name` column.**

5. **Drop the `Customer_ID` and `Customer_Name` columns.**

6. **After cleaning, load the data into the `Super_Store_Cleaned_Data` target table (for columns, check the sample output); a SQL sketch of the cleaning logic follows the list.**
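
The mapping itself is built from PowerCenter transformations (e.g., a Filter for the country and an Expression for the ID handling); purely for reference, the equivalent logic can be sketched in Oracle SQL. The column list for the cleaned table is an assumption based on the operations above:

```sql
-- Reference sketch only: the graded solution uses PowerCenter
-- transformations, not SQL. Target column list is an assumption.
INSERT INTO Super_Store_Cleaned_Data
SELECT DISTINCT                                        -- 1. remove duplicates
       Row_ID, Order_Date, Ship_Date, Ship_Mode, Segment,
       Country, City, State, Postal_Code, Region,
       Product_ID, Category, Sub_Category, Product_Name, Sales,
       -- 3 + 4: numeric part of Customer_ID joined to the name with '-'
       SUBSTR(Customer_ID, INSTR(Customer_ID, '-') + 1)
           || '-' || Customer_Name AS Customer_Id_Name
FROM   Super_Store
WHERE  Country = 'United States';                      -- 2. domestic orders only
```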

#### After completing the mapping, in the Workflow Manager:

1. **Connect to the repository "Repo_etlclass" and double-click on the `Super_Store` folder to connect.**

- **Username:** Administrator

- **Password:** Administrator

2. **Create session `Session_Cleaned_Data`:**

- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.

- **Step 2:** In the edit task window:

- **Select mapping tab.**

- **Select connection property.**

- **Assign the connection to the source and target (in the target properties, set the load type to Normal and check the Insert check box).**
- **Select OK button.**

3. **Creating Workflow:**

- **Step 1:** In Workflow Designer:

- **Select workflows menu.**

- **Select create option.**

- **Step 2:** In create workflow window:

- **Enter workflow name:** Workflow_Cleaned_Data

- **Select OK button (leave other options as default).**

A newly created workflow contains no tasks, so you must add the session task to it before anything can execute.

- **To add the session task created in Task Developer to Workflow Designer:**

- **Step 1:** In the navigator tree, expand the tasks folder.

- **Step 2:** Drag and drop the session task into Workflow Designer.

- **After linking the task to the workflow, start the workflow.**

- **Check the Workflow Monitor to verify that the run succeeded.**

#### Sample Output:

| CUSTOMER_ID_NAME         |
|--------------------------|
| 21925-Zuschuss Donatelli |
| 16585-Ken Black          |
| 21520-Tracy Blumstein    |

Note: The `Super_Store_Cleaned_Data` table is the source for the tasks below.

### Analysis Tasks:


#### Task 1: Sales Summary

- **Mapping Name:** Map_Sales_Summary

- **Workflow Name:** Workflow_Sales_Summary

- **Session Name:** Session_Sales_Summary

- **Target Table:** Sales_Summary

**Problem Statement:** Summarize total sales and average sales for each customer. Identify customers
with significant contributions to overall sales.

**Operations:**

1. **Summarize the sales data for each customer (grouped by `Customer_Id_Name`): store the sum of sales in a `Total_Sales` column and the average sales in an `Avg_Sales` column.**

2. **Order the summarized data in descending order based on the total sales (`Total_Sales`).**

3. **Filter customers with total sales greater than 3000 and average sales greater than 300 to focus on
significant contributors.**

4. **Drop the unnecessary columns.**

5. **Load data into the `Sales_Summary` target table. (A SQL sketch of this aggregation follows the list.)**
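
For reference, a hedged Oracle SQL equivalent of this aggregation (in PowerCenter this maps to an Aggregator, a Filter, and a Sorter):

```sql
-- Reference sketch of the Task 1 logic (implemented in PowerCenter, not SQL).
SELECT   Customer_Id_Name,
         SUM(Sales) AS Total_Sales,
         AVG(Sales) AS Avg_Sales
FROM     Super_Store_Cleaned_Data
GROUP BY Customer_Id_Name
HAVING   SUM(Sales) > 3000 AND AVG(Sales) > 300   -- significant contributors
ORDER BY Total_Sales DESC;
```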

**After completing the mapping, in the Workflow Manager:**

1. **Connect to the repository "Repo_etlclass" and double-click on the `Super_Store` folder to connect.**

- **Username:** Administrator

- **Password:** Administrator

2. **Create session `Session_Sales_Summary`:**

- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Step 2:** In the edit task window:

- **Select mapping tab.**

- **Select connection property.**

- **Assign the connection to the source and target (in the target properties, set the load type to Normal and check the Insert check box).**

- **Select OK button.**

3. **Creating Workflow:**

- **Step 1:** In Workflow Designer:

- **Select workflows menu.**

- **Select create option.**

- **Step 2:** In create workflow window:

- **Enter workflow name:** Workflow_Sales_Summary

- **Select OK button (leave other options as default).**

- **To add the session task created in Task Developer to Workflow Designer:**

- **Step 1:** In the navigator tree, expand the tasks folder.

- **Step 2:** Drag and drop the session task into Workflow Designer.

- **After linking the task to the workflow, start the workflow.**

- **Check the Workflow Monitor to verify that the run succeeded.**

#### Sample Output:

| CUSTOMER_ID_NAME     | TOTAL_SALES | AVG_SALES |
|----------------------|-------------|-----------|
| 11140-Becky Martin   | 10540       | 1506      |
| 14635-Grant Thornton | 8167        | 4084      |
| 20290-Sean Braxton   | 5580        | 2790      |


#### Task 2: Customer Order Analysis

- **Mapping Name:** Map_Order_Analysis

- **Workflow Name:** Workflow_Order_Analysis

- **Session Name:** Session_Order_Analysis

- **Target Table:** Order_Analysis

**Problem Statement:** Analyze customer orders to determine the most frequent buyers and their order patterns.

**Operations:**

1. **Filter records where Category is 'Office Supplies' and City is 'San Francisco' to analyze local customer behavior.**

2. **Create a new column `Orders_Count` that holds the count of orders for each customer, to determine their order frequency.**

3. **Sort the results by order count in descending order to identify the most frequent buyers, and keep only the top 10 records.**

4. **Load data into the `Order_Analysis` target table. (A SQL sketch of this logic follows the list.)**
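
For reference, a hedged Oracle SQL equivalent of this mapping; the `ROWNUM` subquery is just one way to express the top-10 cut (in PowerCenter this would be a Filter, an Aggregator, a Sorter, and a rank/row-limit step):

```sql
-- Reference sketch of the Task 2 logic (implemented in PowerCenter, not SQL).
SELECT Customer_Id_Name, Orders_Count
FROM (
    SELECT   Customer_Id_Name,
             COUNT(*) AS Orders_Count
    FROM     Super_Store_Cleaned_Data
    WHERE    Category = 'Office Supplies'
      AND    City = 'San Francisco'
    GROUP BY Customer_Id_Name
    ORDER BY Orders_Count DESC
)
WHERE ROWNUM <= 10;   -- keep only the top 10 buyers
```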

**After completing the mapping, in the Workflow Manager:**

1. **Connect to the repository "Repo_etlclass" and double-click on the `Super_Store` folder to connect.**

- **Username:** Administrator

- **Password:** Administrator

2. **Create session `Session_Order_Analysis`:**

- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Step 2:** In the edit task window:

- **Select mapping tab.**

- **Select connection property.**

- **Assign the connection to the source and target (in the target properties, set the load type to Normal and check the Insert check box).**

- **Select OK button.**

3. **Creating Workflow:**

- **Step 1:** In Workflow Designer:

- **Select workflows menu.**

- **Select create option.**

- **Step 2:** In create workflow window:

- **Enter workflow name:** Workflow_Order_Analysis

- **Select OK button (leave other options as default).**

- **To add the session task created in Task Developer to Workflow Designer:**

- **Step 1:** In the navigator tree, expand the tasks folder.

- **Step 2:** Drag and drop the session task into Workflow Designer.

- **After linking the task to the workflow, start the workflow.**

- **Check the Workflow Monitor to verify that the run succeeded.**

#### Sample Output:

| CUSTOMER_ID_NAME    | ORDERS_COUNT |
|---------------------|--------------|
| 14045-Jeremy Pistek | 4            |
| 16510-Keith Herrera | 4            |

#### Task 3: Customer Geography Analysis


- **Mapping Name:** Map_Geography_Analysis

- **Workflow Name:** Workflow_Geography_Analysis

- **Session Name:** Session_Geography_Analysis

- **Target Table:** Geography_Analysis

**Problem Statement:** Analyze customer distribution across different regions to identify potential
market segments.

**Operations:**

1. **Filter records for customers in the state 'California' to focus on specific geographical areas.**

2. **Group customers by region (North, South, East, West) based on their location data.**

3. **Calculate the count of customers in each region to understand the geographical distribution.**

4. **Drop the unnecessary columns (check the sample output for the required columns) and load the data into the `Geography_Analysis` target table. (A SQL sketch of this logic follows the list.)**
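
A sketch of the equivalent logic in Oracle SQL, assuming (per the sample output) that the per-region count is attached to each customer row; an analytic `COUNT` mirrors an Aggregator joined back to the detail rows:

```sql
-- Reference sketch of the Task 3 logic (implemented in PowerCenter, not SQL).
SELECT Customer_Id_Name,
       Region,
       State,
       COUNT(*) OVER (PARTITION BY Region) AS Region_Orders_Count  -- per-region count
FROM   Super_Store_Cleaned_Data
WHERE  State = 'California';
```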

**After completing the mapping, in the Workflow Manager:**

1. **Connect to repository "Repo_etlclass" and double-click on the folder to connect.**

- **Username:** Administrator

- **Password:** Administrator

2. **Create session `Session_Geography_Analysis`:**

- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.

- **Step 2:** In the edit task window:

- **Select mapping tab.**

- **Select connection property.**

- **Assign the connection to the source and target (in the target properties, set the load type to Normal and check the Insert check box).**
- **Select OK button.**

3. **Creating Workflow:**

- **Step 1:** In Workflow Designer:

- **Select workflows menu.**

- **Select create option.**

- **Step 2:** In create workflow window:

- **Enter workflow name:** Workflow_Geography_Analysis

- **Select OK button (leave other options as default).**

- **To add the session task created in Task Developer to Workflow Designer:**

- **Step 1:** In the navigator tree, expand the tasks folder.

- **Step 2:** Drag and drop the session task into Workflow Designer.

- **After linking the task to the workflow, start the workflow.**

- **Check the Workflow Monitor to verify that the run succeeded.**

#### Sample Output:

| CUSTOMER_ID_NAME     | REGION | STATE      | REGION_ORDERS_COUNT |
|----------------------|--------|------------|---------------------|
| 14095-Eudokia Martin | West   | California | 182                 |

#### Task 4: Order Processing Time Analysis

- **Mapping Name:** Map_Order_Processing

- **Workflow Name:** Workflow_Order_Processing

- **Session Name:** Session_Order_Processing

- **Target Table:** Order_Processing


**Problem Statement:** Evaluate order processing efficiency by analyzing the time taken between order
placement and shipment.

**Operations:**

1. **Calculate the processing days for each order by subtracting the order date from the ship date, and store the result in a new `processing_days` column.**

2. **Categorize the processing days (less than 1 day: Immediate delivery; 1 to less than 3 days: Moderate delivery; 3 or more days: Long Term delivery).**

3. **Count the number of orders falling within each category to analyze the distribution of processing times.**

4. **Drop the unnecessary columns (check the sample output for the required columns).**

5. **Load data into the `Order_Processing` target table (for columns, check the sample output); a SQL sketch of this logic follows the list.**
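
The same logic sketched in Oracle SQL for reference: subtracting two `DATE` values yields a number of days, and the `CASE` ordering resolves the category boundaries exactly as described above:

```sql
-- Reference sketch of the Task 4 logic (implemented in PowerCenter, not SQL).
SELECT Categorise_Processing_Days,
       COUNT(*) AS Order_Count
FROM (
    SELECT CASE
               WHEN Ship_Date - Order_Date < 1 THEN 'Immediate delivery'
               WHEN Ship_Date - Order_Date < 3 THEN 'Moderate delivery'
               ELSE 'Long Term delivery'        -- 3 or more days
           END AS Categorise_Processing_Days
    FROM   Super_Store_Cleaned_Data
)
GROUP BY Categorise_Processing_Days;
```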

**After completing the mapping, in the Workflow Manager:**

1. **Connect to repository "Repo_etlclass" and double-click on the folder to connect.**

- **Username:** Administrator

- **Password:** Administrator

2. **Create session `Session_Order_Processing`:**

- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.

- **Step 2:** In the edit task window:

- **Select mapping tab.**

- **Select connection property.**

- **Assign the connection to the source and target (in the target properties, set the load type to Normal and check the Insert check box).**

- **Select OK button.**

3. **Creating Workflow:**
- **Step 1:** In Workflow Designer:

- **Select workflows menu.**

- **Select create option.**

- **Step 2:** In create workflow window:

- **Enter workflow name:** Workflow_Order_Processing

- **Select OK button (leave other options as default).**

- **To add the session task created in Task Developer to Workflow Designer:**

- **Step 1:** In the navigator tree, expand the tasks folder.

- **Step 2:** Drag and drop the session task into Workflow Designer.

- **After linking the task to the workflow, start the workflow.**

- **Check the Workflow Monitor to verify that the run succeeded.**

#### Sample Output:

| CATEGORISE_PROCESSING_DAYS | ORDER_COUNT |
|----------------------------|-------------|
| Immediate delivery         | 32          |
| Long Term delivery         | 641         |

### Final Step:

1. **Run the provided PowerShell script `sample_test.ps1` to get the sample score.**

2. **Submit your solution to validate your work.**

---
