Problem Statement
### Introduction:
You are provided with a sample dataset from a retail store, Super_Store. This dataset contains
information about orders, customers, products, and sales. Your task involves cleaning the data, analyzing
sales, customer orders, customer geography, and order processing time using Informatica PowerCenter.
- **Username:** system
- **Password:** Admin
| Column Name | Data Type |
|-----------------|---------------|
| Row_ID | INT |
| Order_Date | DATE |
| Ship_Date | DATE |
| Ship_Mode | VARCHAR(50) |
| Customer_ID | VARCHAR(50) |
| Customer_Name | VARCHAR(50) |
| Segment | VARCHAR(50) |
| Country | VARCHAR(50) |
| City | VARCHAR(50) |
| State | VARCHAR(50) |
| Postal_Code | VARCHAR(50) |
| Region | VARCHAR(50) |
| Product_ID | VARCHAR(250) |
| Category | VARCHAR(250) |
| Sub_Category | VARCHAR(250) |
| Product_Name | VARCHAR(250) |
| Sales | INT |
**NOTE:** While loading data into the table, set the date format of `Order_Date` and `Ship_Date` to `DD/MM/YYYY`.
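
To make the expected date layout concrete, here is a minimal Python sketch of reading a `DD/MM/YYYY` value. It is only an illustration outside Informatica; the helper name `parse_store_date` and the sample value are assumptions, not part of the challenge files.

```python
from datetime import date, datetime

# strptime pattern that corresponds to DD/MM/YYYY (day first, then month).
DATE_FORMAT = "%d/%m/%Y"


def parse_store_date(text: str) -> date:
    """Parse a DD/MM/YYYY string (hypothetical helper) into a date object."""
    return datetime.strptime(text.strip(), DATE_FORMAT).date()


if __name__ == "__main__":
    # Illustrative value only: 8 November 2017, not 11 August 2017.
    print(parse_store_date("08/11/2017"))
```

A value written month-first (for example `11/30/2017`) would raise a `ValueError` here, which is a quick way to spot rows loaded with the wrong format.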
- **Path:** `\Desktop\Project\miniproject-informatica-super_store\`
- **Username:** Administrator
- **Password:** Administrator
- **Step 2:** Select the "Import from Database" option; the ODBC Connection box will open.
- **Step 3:** On the next page, select the "User DSN" tab and click the "Add" button.
- **Step 4:** On the next page, select the "General" tab, enter the database details, and click "Connect".
- **Host:** localhost
- **Port:** 1521
- **SID:** xe
- **Step 1:** In Workflow Manager, open the "Connections" menu and select the "Relational" option.
- **Step 2:** In the pop-up window, select "Oracle" as the type and click the "New" button.
#### Operations:
3. **Extract the numeric part of `Customer_ID` to standardize customer identification (e.g., from CH-1234, extract 1234).**
4. **Concatenate the extracted `Customer_ID` and `Customer_Name` with '-' to create a unique identifier for each customer (e.g., 1234-Charlies, in the form extracted ID-customer name) and store it in the `Customer_Id_Name` column.**
6. **After cleaning, load the data into the `Super_Store_Cleaned_Data` target table (for the columns, check the sample output).**
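
The two string operations above can be sketched in plain Python to check the expected result before building the mapping. This is a hedged illustration, not the Informatica expression itself; the function names are hypothetical, and the sketch assumes every `Customer_ID` contains a single run of digits.

```python
import re


def extract_numeric_id(customer_id: str) -> str:
    """Return the first run of digits in an ID such as 'CH-1234'."""
    match = re.search(r"\d+", customer_id)
    if match is None:
        # Assumption: IDs without digits are treated as bad data.
        raise ValueError(f"no numeric part in {customer_id!r}")
    return match.group()


def build_customer_id_name(customer_id: str, customer_name: str) -> str:
    """Join the extracted ID and the customer name with '-'."""
    return f"{extract_numeric_id(customer_id)}-{customer_name.strip()}"


if __name__ == "__main__":
    # Example taken from the operation description above.
    print(build_customer_id_name("CH-1234", "Charlies"))
```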
- **Username:** Administrator
- **Password:** Administrator
- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Assign the connection to the source and target (in the target properties, set the load type to Normal and select the Insert check box).**
- **Select OK button.**
3. **Creating Workflow:**
A newly created workflow does not contain any tasks, so you must add a task to it before anything can be executed.
- **Step 2:** Drag and drop the command task to Workflow Designer.
| CUSTOMER_ID_NAME |
|--------------------------|
| 21925-Zushuss Donatelli |
| 16585-Ken Black |
| 21520-Tracy Blumstein |
**Problem Statement:** Summarize total sales and average sales for each customer. Identify customers
with significant contributions to overall sales.
**Operations:**
1. **Summarize the sales data for each customer, grouped by `Customer_Id_Name`: store the sum of sales in `Total_Sales` and the average of sales in `Avg_Sales`.**
2. **Order the summarized data in descending order based on the total sales (`Total_Sales`).**
3. **Filter customers with total sales greater than 3000 and average sales greater than 300 to focus on
significant contributors.**
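
The aggregate, sort, and filter steps above can be sketched in Python as a reference for the expected output. This is a minimal sketch under stated assumptions: the function name `summarize_sales` and the `(Customer_Id_Name, Sales)` tuple input are hypothetical, and both thresholds are applied as strict "greater than" comparisons.

```python
from collections import defaultdict


def summarize_sales(rows, min_total=3000, min_avg=300):
    """Return (customer, Total_Sales, Avg_Sales) for significant customers.

    rows is an iterable of (Customer_Id_Name, Sales) pairs (assumed shape).
    """
    sales_by_customer = defaultdict(list)
    for customer, sales in rows:
        sales_by_customer[customer].append(sales)

    # Step 1: total and average per customer.
    summary = [
        (customer, sum(values), sum(values) / len(values))
        for customer, values in sales_by_customer.items()
    ]
    # Step 2: order by Total_Sales, highest first.
    summary.sort(key=lambda record: record[1], reverse=True)
    # Step 3: keep only customers above both thresholds.
    return [r for r in summary if r[1] > min_total and r[2] > min_avg]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = [("1-A", 2000), ("1-A", 2000), ("2-B", 100), ("3-C", 5000)]
    for record in summarize_sales(demo):
        print(record)
```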
- **Username:** Administrator
- **Password:** Administrator
- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Step 2:** In the edit task window:
- **Assign the connection to the source and target (in the target properties, set the load type to Normal and select the Insert check box).**
- **Select OK button.**
3. **Creating Workflow:**
- **Step 2:** Drag and drop the command task to Workflow Designer.
|--------------------|-------------|-----------|
**Operations:**
1. **Filter records where the category is 'Office Supplies' and the city is 'San Francisco' to analyze local customer behavior.**
2. **Create a new column `orders_count` holding the count of orders for each customer, to determine their order frequency.**
3. **Sort the results by order count in descending order to identify the most frequent buyers, and keep only the top 10 records.**
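
As a cross-check for the filter, count, and top-10 steps above, here is a small Python sketch. It assumes each input record counts as one order and that rows are dictionaries keyed by `Category`, `City`, and `Customer_Id_Name`; the function name `top_customers` is hypothetical.

```python
from collections import Counter


def top_customers(rows, category="Office Supplies",
                  city="San Francisco", limit=10):
    """Return up to `limit` (Customer_Id_Name, orders_count) pairs."""
    counts = Counter(
        row["Customer_Id_Name"]
        for row in rows
        if row["Category"] == category and row["City"] == city
    )
    # most_common sorts by count, highest first.
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = [
        {"Customer_Id_Name": "1-A", "Category": "Office Supplies", "City": "San Francisco"},
        {"Customer_Id_Name": "1-A", "Category": "Office Supplies", "City": "San Francisco"},
        {"Customer_Id_Name": "2-B", "Category": "Furniture", "City": "San Francisco"},
    ]
    print(top_customers(demo))
```

Customers with equal counts keep the order in which they were first seen, so the tie order in the real target table may differ.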
- **Username:** Administrator
- **Password:** Administrator
- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Step 2:** In the edit task window:
- **Assign the connection to the source and target (in the target properties, set the load type to Normal and select the Insert check box).**
- **Select OK button.**
3. **Creating Workflow:**
- **Step 2:** Drag and drop the command task to Workflow Designer.
| CUSTOMER_ID_NAME | ORDERS_COUNT |
|------------------|--------------|
| 14045-Jeremy Pistek | 4 |
| 16510-Keith Herrera | 4 |
**Problem Statement:** Analyze customer distribution across different regions to identify potential
market segments.
**Operations:**
1. **Filter records for customers in the state 'California' to focus on specific geographical areas.**
2. **Group customers by region (North, South, East, West) based on their location data.**
3. **Calculate the count of customers in each region to understand the geographical distribution.**
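
The filter, group, and count steps above can be sketched in Python as follows. This is only an illustration: it assumes rows are dictionaries keyed by `State`, `Region`, and `Customer_ID`, and it assumes "count of customers" means distinct customers per region. If the expected output counts rows instead, replace the set with a simple counter. The function name `customers_per_region` is hypothetical.

```python
from collections import defaultdict


def customers_per_region(rows, state="California"):
    """Return {Region: number of distinct Customer_ID values} for one state."""
    customers = defaultdict(set)
    for row in rows:
        if row["State"] == state:
            # A set keeps each customer once per region (assumption).
            customers[row["Region"]].add(row["Customer_ID"])
    return {region: len(ids) for region, ids in customers.items()}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = [
        {"Customer_ID": "CH-1", "State": "California", "Region": "West"},
        {"Customer_ID": "CH-1", "State": "California", "Region": "West"},
        {"Customer_ID": "CH-2", "State": "California", "Region": "West"},
        {"Customer_ID": "CH-3", "State": "Texas", "Region": "South"},
    ]
    print(customers_per_region(demo))
```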
- **Username:** Administrator
- **Password:** Administrator
- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Assign the connection to the source and target (in the target properties, set the load type to Normal and select the Insert check box).**
- **Select OK button.**
3. **Creating Workflow:**
- **Step 2:** Drag and drop the command task to Workflow Designer.
|---------------------|--------|------------|---------------------|
**Operations:**
1. **Calculate the processing days for each order by subtracting the order date from the ship date, and store the result in a new column `processing_days`.**
2. **Categorize the processing days (e.g., less than 1 day: immediate delivery; 1 to 3 days: moderate delivery; 3 or more days: long term delivery).**
3. **Count the number of orders falling within each category to analyze the distribution of processing days.**
5. **Load data into the `Order_Processing` target table (for columns check sample output).**
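
The date difference, categorization, and count above can be sketched in Python to sanity-check the target values. Two points are assumptions rather than facts from the challenge: an order with exactly 3 processing days is treated as long term (the stated ranges overlap at 3), and the labels other than "Immediate delivery" are guessed to follow the same wording. All function names are hypothetical.

```python
from collections import Counter
from datetime import date


def processing_days(order_date: date, ship_date: date) -> int:
    """Ship date minus order date, in whole days."""
    return (ship_date - order_date).days


def categorise(days: int) -> str:
    """Map processing days to a delivery category (boundaries assumed)."""
    if days < 1:
        return "Immediate delivery"
    if days < 3:  # Assumption: exactly 3 days falls into the long term bucket.
        return "Moderate delivery"
    return "Long term delivery"


def count_by_category(orders):
    """Count orders per category; orders is (order_date, ship_date) pairs."""
    return Counter(categorise(processing_days(o, s)) for o, s in orders)


if __name__ == "__main__":
    # Made-up dates for illustration only.
    demo = [
        (date(2017, 1, 1), date(2017, 1, 1)),
        (date(2017, 1, 1), date(2017, 1, 3)),
        (date(2017, 1, 1), date(2017, 1, 4)),
    ]
    print(count_by_category(demo))
```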
- **Username:** Administrator
- **Password:** Administrator
- **Step 1:** Double-click on the session object in Workflow Manager to open the task window to
modify the task properties.
- **Assign the connection to the source and target (in the target properties, set the load type to Normal and select the Insert check box).**
- **Select OK button.**
3. **Creating Workflow:**
- **Step 1:** In Workflow Designer:
- **Step 2:** Drag and drop the command task to Workflow Designer.
| CATEGORISE_PROCESSING_DAYS | ORDER_COUNT |
|----------------------------|-------------|
| Immediate delivery | 32 |
1. **Run the provided PowerShell script `sample_test.ps1` to get the sample score.**