Extracting Code

ETL PROCESS extraction code

Uploaded by

Andrei Rindasu
Copyright
© All Rights Reserved

Extracting code

Imports
import os
from azure.storage.blob import BlobServiceClient
import pandas as pd
import json

• os: Provides functions for interacting with the operating system.
• azure.storage.blob: Contains the classes and methods used to interact with Azure Blob Storage.
• pandas as pd: Used for data manipulation and analysis, here for reading CSV files.
• json: Used to parse JSON data.

Configuration
# Configuration - replace with your actual values
connect_str = "YOUR_CONNECTION_STRING"  # Your Azure Storage connection string
container_name = "customer360data"  # Your Azure Blob Storage container name

• connect_str: Your Azure Storage account connection string.
• container_name: The name of the container in Azure Blob Storage where your data is stored.
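
Hard-coding the connection string works for a first run, but the value is a secret and is easy to commit by accident. A minimal sketch of one alternative, assuming the string is stored in an environment variable: the variable name AZURE_STORAGE_CONNECTION_STRING and the helper load_connection_string are choices made for this example, not part of the original script.

```python
import os

# Assumed variable name for this sketch; use whatever your environment defines.
ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def load_connection_string(env_var: str = ENV_VAR) -> str:
    """Return the connection string from the environment, or raise if it is unset."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return value
```

With this helper, the assignment would become connect_str = load_connection_string(), and a missing variable fails immediately with a clear message instead of a connection error later on.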

Initialize BlobServiceClient and ContainerClient


# Initialize BlobServiceClient and ContainerClient
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client(container_name)

• BlobServiceClient: The client for interacting with the Azure Blob Storage service at the account level.
• ContainerClient: The client for interacting with a specific container within Azure Blob Storage.

Extract CSV Function
import io  # wraps the downloaded bytes in a file-like object for pandas


def extract_csv(blob_name):
    """
    Extract CSV data from the specified blob.
    """
    try:
        blob_client = container_client.get_blob_client(blob_name)
        downloader = blob_client.download_blob()
        # readall() returns the blob content as bytes; pd.read_csv needs a
        # path or file-like object, so the bytes are wrapped in io.BytesIO.
        return pd.read_csv(io.BytesIO(downloader.readall()))
    except Exception as e:
        print(f"Error extracting CSV data from {blob_name}: {e}")
        return None

• blob_name: The name of the blob (file) to download and process.
• container_client.get_blob_client(blob_name): Gets a client for a specific blob.
• blob_client.download_blob(): Starts the download and returns a downloader object (StorageStreamDownloader).
• pd.read_csv(io.BytesIO(downloader.readall())): readall() returns the blob content as bytes, io.BytesIO wraps those bytes in a file-like object, and pd.read_csv parses it into a pandas DataFrame.
• Exception handling: Prints an error message if the extraction fails and returns None.
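
pandas.read_csv expects a path or a file-like object rather than raw bytes, so downloaded content is typically wrapped in io.BytesIO before parsing. The sketch below shows that step in isolation, with a bytes literal standing in for a downloaded blob. The sample columns and the helper name csv_bytes_to_dataframe are invented for illustration.

```python
import io

import pandas as pd


def csv_bytes_to_dataframe(raw: bytes) -> pd.DataFrame:
    """Parse CSV content held in memory as bytes into a DataFrame."""
    return pd.read_csv(io.BytesIO(raw))


# Stand-in for the bytes a blob download would return (made-up sample data).
sample = b"customer_id,name\n1,Ada\n2,Grace\n"
df = csv_bytes_to_dataframe(sample)
print(df.head())
```

Keeping the parsing step separate from the download makes it possible to check the CSV handling locally without any Azure access.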

Extract JSON Function


def extract_json(blob_name):
    """
    Extract JSON data from the specified blob.
    """
    try:
        blob_client = container_client.get_blob_client(blob_name)
        downloader = blob_client.download_blob()
        # readall() returns bytes, which json.loads accepts directly.
        return json.loads(downloader.readall())
    except Exception as e:
        print(f"Error extracting JSON data from {blob_name}: {e}")
        return None

• blob_name: The name of the blob (file) to download and process.
• container_client.get_blob_client(blob_name): Gets a client for a specific blob.
• blob_client.download_blob(): Starts the download and returns a downloader object (StorageStreamDownloader).
• json.loads(downloader.readall()): Parses the downloaded bytes as JSON and returns the corresponding Python object, which is a list or a dictionary depending on the top-level structure of the file.
• Exception handling: Prints an error message if the extraction fails and returns None.
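
Because json.loads returns whatever the top level of the file contains, indexing the result with [0] only works when that top level is a list. A small sketch of a defensive accessor follows. The helper first_record and the sample fields are hypothetical, and the bytes literal stands in for downloaded blob content.

```python
import json


def first_record(payload):
    """Return the first record of parsed JSON for a list or dict top level."""
    if isinstance(payload, list):
        return payload[0] if payload else None
    if isinstance(payload, dict):
        return payload
    return None


# Stand-in for bytes read from a blob; the field names are invented.
raw = b'[{"order_id": 1, "total": 19.99}, {"order_id": 2, "total": 5.0}]'
purchases = json.loads(raw)  # json.loads accepts str, bytes, or bytearray
print(first_record(purchases))
```

An empty list or an unexpected top-level type yields None here, which matches the convention of the extraction functions of returning None on failure.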

Main Function
def main():
    # Extract and display customer profiles
    customer_profiles = extract_csv("customer_profiles.csv")
    if customer_profiles is not None:
        print("Customer Profiles:")
        print(customer_profiles.head())
    else:
        print("Failed to extract customer profiles")

    # Extract and display purchase history
    purchase_history = extract_json("purchase_history.json")
    if purchase_history is not None:
        print("\nPurchase History (first item):")
        print(purchase_history[0])
    else:
        print("Failed to extract purchase history")

    # Extract and display social media data
    social_media_data = extract_csv("social_media_data.csv")
    if social_media_data is not None:
        print("\nSocial Media Data:")
        print(social_media_data.head())
    else:
        print("Failed to extract social media data")

    # Extract and display website interactions
    website_interactions = extract_json("website_interactions.json")
    if website_interactions is not None:
        print("\nWebsite Interactions (first item):")
        print(website_interactions[0])
    else:
        print("Failed to extract website interactions")

• Extract and display customer profiles:
  o Calls extract_csv("customer_profiles.csv") to download the CSV blob and read it into a pandas DataFrame.
  o Prints the first few rows of the DataFrame using print(customer_profiles.head()).
  o Prints an error message if extraction fails.
• Extract and display purchase history:
  o Calls extract_json("purchase_history.json") to download the JSON blob and parse it into a Python object.
  o Prints the first item using print(purchase_history[0]), which assumes the top level of the JSON file is a list.
  o Prints an error message if extraction fails.
• Extract and display social media data:
  o Calls extract_csv("social_media_data.csv") to download the CSV blob and read it into a pandas DataFrame.
  o Prints the first few rows of the DataFrame using print(social_media_data.head()).
  o Prints an error message if extraction fails.
• Extract and display website interactions:
  o Calls extract_json("website_interactions.json") to download the JSON blob and parse it into a Python object.
  o Prints the first item using print(website_interactions[0]), which assumes the top level of the JSON file is a list.
  o Prints an error message if extraction fails.

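
The four steps follow one pattern: pick an extractor by file type, call it, and report success or failure. One way to express that pattern is a table-driven loop, sketched below. The names SOURCES and run_extractions are hypothetical, and the extractors are stand-ins so the example runs without Azure access.

```python
import os

# (label, blob name) pairs for the four blobs used in this walkthrough.
SOURCES = [
    ("Customer Profiles", "customer_profiles.csv"),
    ("Purchase History", "purchase_history.json"),
    ("Social Media Data", "social_media_data.csv"),
    ("Website Interactions", "website_interactions.json"),
]


def run_extractions(sources, extractors):
    """Call the extractor matching each blob's extension; collect results and failures."""
    results, failed = {}, []
    for label, blob_name in sources:
        extension = os.path.splitext(blob_name)[1].lower()
        extract = extractors.get(extension)
        data = extract(blob_name) if extract else None
        if data is None:
            failed.append(label)
        else:
            results[label] = data
    return results, failed


# Stand-in extractors for this sketch. In the real script the mapping would be
# {".csv": extract_csv, ".json": extract_json}.
fake_extractors = {
    ".csv": lambda name: f"rows from {name}",
    ".json": lambda name: None,  # simulate a failed extraction
}
results, failed = run_extractions(SOURCES, fake_extractors)
print("Extracted:", sorted(results))
print("Failed:", failed)
```

Adding a fifth data source then means adding one entry to the list rather than another copy of the if/else block.
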
Entry Point
if __name__ == "__main__":
    main()

• This ensures that main() is called only when the script is executed directly, not when it is imported as a module.

Summary

1. Imports the necessary libraries.
2. Configures connection details for Azure Blob Storage.
3. Initializes clients to interact with Azure Blob Storage.
4. Defines functions to extract CSV and JSON data from specified blobs.
5. In the main function, extracts and displays data from four blobs, printing an error message for any extraction that fails.
6. Executes the main function when the script is run directly.

Replace YOUR_CONNECTION_STRING and customer360data with your actual Azure Storage connection string and container name before running the script. Make sure the blob names (customer_profiles.csv, purchase_history.json, social_media_data.csv, website_interactions.json) match the actual file names in your Azure Blob Storage container.
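
Checking the blob names can be automated before any extraction runs. The sketch below compares the expected names against what a container reports through list_blobs(), which on a ContainerClient yields objects with a name attribute. The helper missing_blobs and the FakeContainerClient stand-in are hypothetical and exist only so the example runs without Azure access.

```python
from types import SimpleNamespace

EXPECTED_BLOBS = [
    "customer_profiles.csv",
    "purchase_history.json",
    "social_media_data.csv",
    "website_interactions.json",
]


def missing_blobs(container_client, expected):
    """Return the expected blob names that the container does not list."""
    present = {blob.name for blob in container_client.list_blobs()}
    return [name for name in expected if name not in present]


class FakeContainerClient:
    """Stand-in for ContainerClient; returns objects exposing a name attribute."""

    def __init__(self, names):
        self._names = names

    def list_blobs(self):
        return [SimpleNamespace(name=n) for n in self._names]


fake = FakeContainerClient(["customer_profiles.csv", "purchase_history.json"])
print(missing_blobs(fake, EXPECTED_BLOBS))
```

In the real script, passing the container_client created earlier in place of the stand-in would report any file that is missing or misnamed in the container.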
