Internship Presentation 2

This document outlines a project to create a scalable data ecosystem and user-friendly dashboard using Python, Google Cloud Platform services, and an ETL pipeline. It describes extracting raw data, transforming it by creating dimensions and merging tables, and loading it into BigQuery. This forms the foundation for insightful analytics accessible through a Looker dashboard. The goal is to unlock the full potential of data for informed decision-making. Key tools used include Jupyter Notebook, Google Cloud Platform, Mage AI, and Looker.



STREAMLINING DATA INSIGHTS:
A COMPREHENSIVE DATA ENGINEERING DASHBOARD

TANZIL AHMED (24)
KOUSTAV DUTTA (49)
CONTENTS

INTRODUCTION
GOOGLE CLOUD PLATFORM
EXTRACTION, TRANSFORM, LOAD
DASHBOARD
CONCLUSION
INTRODUCTION

This project integrates Python, Google Cloud Platform services, and a robust ETL pipeline to create a scalable data ecosystem. A well-structured data model, coupled with GCP's capabilities, forms the foundation for insightful analytics and a user-friendly dashboard. The ultimate goal is to unlock the full potential of data for informed decision-making.
WHAT IS ETL?

DATA EXTRACTION
DATA TRANSFORMATION
DATA LOADING
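The three stages can be sketched as small composable functions. This is an illustrative outline only; the function bodies and the sample data are hypothetical, not the project's actual code.

```python
import pandas as pd

def extract() -> pd.DataFrame:
    # Extraction: pull raw records from a source (an in-memory sample here).
    return pd.DataFrame({"trip_id": [1, 2, 2], "fare": [10.0, 7.5, 7.5]})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transformation: clean the data, e.g. drop duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)

def load(df: pd.DataFrame) -> list:
    # Loading: hand cleaned rows to a destination (a list stands in
    # for a real warehouse such as BigQuery).
    return df.to_dict(orient="records")

rows = load(transform(extract()))
print(rows)  # two unique trips remain
```

Chaining the three functions like this mirrors the pipeline shape used throughout the rest of the slides.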
OUR MODEL

RAW DATA → ETL → ANALYTICS → LOOKER

DATA: THE HEARTBEAT OF DECISIONS, CURRENCY OF PROGRESS, AND KEY TO UNDERSTANDING


TOOLS USED

JUPYTER NOTEBOOK
GOOGLE CLOUD PLATFORM
MAGE AI
LOOKER
ENTITY RELATIONSHIP DIAGRAM

FACT TABLE
PRIMARY KEY – VendorID

DIMENSION TABLES
o passenger_count_dim
o rate_code_id
o trip_distance_id
o payment_type_dim
o datetime_dim
o pickup_location_dim
DATA TRAINING

IMPORTING REQUIRED PACKAGES
PANDAS DATA FRAME
SORTING
MERGING
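The pandas operations named above, sorting a DataFrame and merging it with a lookup table, can be sketched as follows. The column names and sample rows are illustrative, not the real dataset's.

```python
import pandas as pd

# Raw trip records and a lookup table for payment types (sample data).
trips = pd.DataFrame({
    "trip_id": [3, 1, 2],
    "payment_type": [2, 1, 1],
    "fare": [7.5, 12.0, 9.0],
})
payment_dim = pd.DataFrame({
    "payment_type": [1, 2],
    "payment_name": ["Credit card", "Cash"],
})

# Sorting: order trips by fare, highest first.
trips = trips.sort_values("fare", ascending=False)

# Merging: attach the human-readable payment name to each trip.
enriched = trips.merge(payment_dim, on="payment_type", how="left")
print(enriched[["trip_id", "payment_name"]])
```

A left merge keeps every trip row and preserves the sorted order, which is the usual choice when enriching a fact table from a dimension.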
GOOGLE CLOUD PLATFORM

VIRTUAL MACHINE
SQL
GCP BUCKET
COMPUTE ENGINE
BIG QUERY
STORAGE
DATA EXTRACTION
IMPORTS THE NECESSARY LIBRARIES: IO AND PANDAS.
CHECKS IF THE DATA LOADER VARIABLE IS ALREADY DEFINED .
DEFINES A FUNCTION CALLED LOAD_DATA_FROM_API().
INSIDE THE LOAD_DATA_FROM_API() FUNCTION,

Uses the requests library to download the PDF file.


Uses the io.StringIO() function to create a string buffer from the PDF file contents.
Uses the pandas.read_csv() function to read the data from the string buffer into a
Pandas Data Frame.
Returns the Pandas DataFrame.
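The loader steps above can be sketched like this. The split into a separate parsing helper and the sample CSV text are additions for illustration; the slide describes a single load_data_from_api() function, and the source URL is not specified here.

```python
import io

import pandas as pd
import requests

def parse_csv_text(text: str) -> pd.DataFrame:
    """Read CSV text into a DataFrame via an in-memory string buffer."""
    return pd.read_csv(io.StringIO(text))

def load_data_from_api(url: str) -> pd.DataFrame:
    """Download a CSV file over HTTP and return it as a pandas DataFrame."""
    response = requests.get(url)  # fetch the raw file
    response.raise_for_status()   # fail loudly on a bad HTTP status
    return parse_csv_text(response.text)

# The parsing step, demonstrated on an inline sample (no network needed):
sample = "trip_id,fare\n1,12.0\n2,9.0\n"
df = parse_csv_text(sample)
print(df.shape)  # (2, 2)
```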
DATA TRANSFORMATION

Importing libraries
Loading data
Creating trip distance dimension
Mapping rate code
Creating datetime dimensions
Dropping duplicates
Renaming columns
Combining dimensions
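A few of the transformation steps listed above can be sketched in pandas. The column names, sample rows, and rate-code labels below are illustrative assumptions, not the project's actual dataset.

```python
import pandas as pd

# Sample raw trips (column names are illustrative).
raw = pd.DataFrame({
    "tpep_pickup_datetime": ["2023-01-01 08:15:00", "2023-01-01 08:15:00",
                             "2023-01-02 17:40:00"],
    "RatecodeID": [1, 1, 2],
    "trip_distance": [2.5, 2.5, 6.1],
})

# Dropping duplicates: keep one row per unique trip.
trips = raw.drop_duplicates().reset_index(drop=True)

# Creating a datetime dimension with a surrogate key.
datetime_dim = trips[["tpep_pickup_datetime"]].copy()
datetime_dim["tpep_pickup_datetime"] = pd.to_datetime(
    datetime_dim["tpep_pickup_datetime"])
datetime_dim["pickup_hour"] = datetime_dim["tpep_pickup_datetime"].dt.hour
datetime_dim["datetime_id"] = datetime_dim.index

# Mapping rate codes to readable names (labels assumed for illustration).
rate_code_names = {1: "Standard rate", 2: "JFK"}
rate_code_dim = trips[["RatecodeID"]].copy()
rate_code_dim["rate_code_name"] = rate_code_dim["RatecodeID"].map(rate_code_names)
rate_code_dim["rate_code_id"] = rate_code_dim.index

# Renaming columns to the model's naming convention.
trips = trips.rename(columns={"RatecodeID": "rate_code"})
print(len(trips), list(rate_code_dim["rate_code_name"]))
```

Each dimension gets a surrogate key from its index so the fact table can later reference it, matching the ER diagram's fact/dimension split.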
DATA LOAD

Importing libraries
os.path
get_repo_path
Config File Loader
Data Frame
Dropping duplicates
Renaming columns
BigQuery
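The load stage can be sketched as mapping each transformed DataFrame to a destination table and pointing the exporter at a credentials config. The dataset name, table names, repo path, and sample frames below are hypothetical, and the actual BigQuery write is only indicated in a comment rather than executed.

```python
import os

import pandas as pd

# Tables produced by the transform step (tiny stand-ins for the real frames).
fact_table = pd.DataFrame({"VendorID": [1, 2], "fare_amount": [12.0, 9.0]})
datetime_dim = pd.DataFrame({"datetime_id": [0, 1], "pickup_hour": [8, 17]})

# Map each DataFrame to its destination table (names are hypothetical).
tables = {
    "taxi_dataset.fact_table": fact_table,
    "taxi_dataset.datetime_dim": datetime_dim,
}

# os.path + get_repo_path: build the path to the I/O config that holds the
# BigQuery credentials; in Mage this is io_config.yaml at the repository
# root (the repo name here is illustrative).
config_path = os.path.join("my_mage_repo", "io_config.yaml")

for table_id, df in tables.items():
    # The real pipeline would export each frame with a BigQuery exporter
    # configured from config_path (e.g. Mage's ConfigFileLoader), roughly:
    # BigQuery.with_config(ConfigFileLoader(config_path)).export(df, table_id)
    print(f"would export {len(df)} rows to {table_id}")
```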
DASHBOARD
CONCLUSION

In conclusion, our exploration into the integration of Python, GCP's Cloud Services, and a robust ETL (Extract, Transform, Load) pipeline has unveiled a comprehensive approach to handling data efficiently. The outlined objectives led us to develop a model supported by a well-designed ER diagram, utilizing Python for key tasks such as indexing, merging, and facilitating seamless interactions with a diverse dataset.
THANK YOU
