
Appendix 4A

UNIVERSITI TEKNOLOGI MALAYSIA


SCHOOL OF COMPUTING
FACULTY OF ENGINEERING

PROFESSIONAL DEVELOPMENT PRACTICE


REPORT
(SCSP4134)
DOWNSTREAM DEMAND, MASTER DATA MANAGEMENT
& EDH SELF-SERVE (DATA+)

By

NINA KHAIRINA BINTI MOHD KHAIRIL

YEAR 4

BACHELOR IN COMPUTER SCIENCE (DATA ENGINEERING)


INDUSTRY INFORMATION:    PETRONAS DIGITAL SDN BHD
                         LEVEL 31, TOWER 2,
                         PETRONAS TWIN TOWERS, KLCC 50088,
                         KUALA LUMPUR CITY CENTER,
                         WP KUALA LUMPUR, MALAYSIA.

INDUSTRY SECTOR:         OIL & GAS

INDUSTRY TYPE:           GLC

PDP PERIOD:              4 OCTOBER 2021 – 8 JULY 2022

COACHES:                 MR CHONG YEE ONN (PETRONAS)
                         TS. DR MUHAMMAD IQBAL TARIQ BIN IDRIS (UTM)
ABSTRACT

The industrial training programme is a requirement for all School of Computing students in UTM to complete their respective degree courses. Students in most programmes undergo their training during the first semester of their final year, whereas Data Engineering students undergo two semesters, equivalent to 40 weeks, of industrial training. The objective of the training is to expose students to working in a real industry and, in particular, to dealing with data management. For Data Engineering students, it is important to make the most of this opportunity as it prepares us for future careers in this field. Apart from technical skills, the training also strengthens soft skills, especially when dealing with people at work. Due to the pandemic, some companies adopted rotational work-from-home schedules, so the dynamics of teamwork shifted from physical presence to virtual meetings and online discussions. In this report, I share my experience undergoing internship training at PETRONAS Digital Sdn Bhd, whose office is located in the Petronas Twin Towers, Kuala Lumpur. I cover the projects assigned to me, the knowledge and experience gained, the software and hardware used to perform data modelling tasks and, most importantly, my reflections on what I achieved throughout this training. I also explain the challenges faced while performing project tasks and the solutions applied.
2
TABLE OF CONTENTS

ABSTRACT............................................................................................................................. 2
LIST OF TABLES .................................................................................................................. 5
LIST OF FIGURES ................................................................................................................ 6
LIST OF ABBREVIATIONS................................................................................................. 7
LIST OF APPENDICES......................................................................................................... 8
Chapter 1 INTRODUCTION.............................................................................................. 9
1.1 Introduction.................................................................................................................. 9
1.2 Organization................................................................................................................. 9
1.3 Organization Structure ............................................................................................... 10
1.4 Tasks Planned ............................................................................................................ 12
Chapter 2 SPECIFIC DETAILS ON PROJECT............................................................ 13
2.1 Introduction................................................................................................................ 13
2.2 Objectives of Tasks.................................................................................................... 14
2.3 Type of Activities Done ............................................................................................. 15
2.4 Tools and Technologies Used .................................................................................... 24
2.5 Period of time given to complete the project ............................................................. 27
2.6 Result of Project......................................................................................................... 28
2.7 Theoretical and Practical Knowledge ........................................................................ 31
2.8 Problem Faced............................................................................................................ 32
2.9 Conclusion ................................................................................................................. 33
Chapter 3 OVERALL INFORMATION OF INDUSTRIAL TRAINING ................... 34
3.1 Introduction................................................................................................................ 34
3.2 Skills Improvement.................................................................................................... 34
3.2.1 Technical Skills........................................................................................... 35
3.2.2 Soft Skills.................................................................................................... 35
3.3 Reference Materials ................................................................................................... 36
3.4 Constructive Comment............................................................................................... 36
3.5 Conclusion ................................................................................................................. 37
Chapter 4 CONCLUSION ................................................................................................ 38
4.1 Introduction................................................................................................................ 38
4.2 Overall Achievement ................................................................................................. 38
4.3 Problem and Execution .............................................................................................. 40
4.4 Opinion and Suggestions............................................................................................ 41
4.5 Conclusion ................................................................................................................. 41
REFERENCES...................................................................................................................... 42
Appendix A ............................................................................................................................ 43
Appendix B ............................................................................................................................ 45
4
LIST OF TABLES

Table No.    Title                                    Page
2.1          Software Used                            24

5
LIST OF FIGURES

Figure No. Title Page

1.1     Petronas Logo                                     10
1.2     Enterprise Data Organization Structure            11
1.3     DAM Organization Structure                        12
2.1     Data Modelling Lifecycle                          15
2.2     View by Release tables                            17
2.3     View by Attribute tables                          17
2.4     View table published in Production Synapse        18
2.5     Data Template from 5 Digital Solutions            19
2.6     List of Entities for Derivation                   20
2.7     Cardinality Activity                              20
2.8     Latest Conceptual data model                      21
2.9     TRB Presentation Deck                             22
2.10    MOM for TRB session                               23
2.11    HP EliteBook                                      24
2.12    Teams Logo                                        24
2.13    Outlook Logo                                      25
2.14    Azure DevOps Logo                                 25
2.15    Rise Logo                                         25
2.16    Microsoft Office                                  25
2.17    Azure Synapse & Amazon Redshift                   26
2.18    ADLS Logo                                         26
2.19    Gantt Chart Activity for 15 Weeks                 28
2.20    Data+ Portal                                      29
2.21    Data Marketplace in Data+                         29
2.22    Overview of data in Upstream                      29
2.23    Current Conceptual data model progress            30

6
LIST OF ABBREVIATIONS

ED - Enterprise Data

DAM - Data Architecture Management

PIVOT - PETRONAS Integrated Vision, Operational Excellence & Technology

POINT - Plant Operation Integrated Tool

EDH - Enterprise Data Hub

ADLS - Azure Data Lake Storage

ETL - Extract Transform Load


MDM - Master Data Management

GPD - Group Project Delivery

API - Application Programming Interface

UAT - User Acceptance Test

TRB - Technical Review Board


IDM - Industry Data Model
DEP - Data Enablement Project

7
LIST OF APPENDICES

APPENDIX TITLE PAGE


Appendix A    Industrial Training Achievement    43
Appendix B    Industrial Training Checklist      45
8
Chapter 1

INTRODUCTION

1.1 Introduction

As part of the Bachelor of Computer Science (Data Engineering) requirements, it is compulsory for students to undergo 40 weeks of industrial training at one of the companies on the approved list. The industrial training carries 12 credit hours for the first semester.

This chapter describes the profile of the company where the training took place. It also covers the organization structure, supervisor details and the main tasks planned for this industrial training.

1.2 Organization

PETRONAS Digital Sdn. Bhd. is a subsidiary of Petroliam Nasional Berhad (PETRONAS), also known as Group Digital, whose specialties include business solution governance, application management, project management and other areas involving data management. There are multiple departments under Group Digital, including Digital Excellence, Cyber Security, Digital Engineering, Enterprise Architecture, Enterprise Data, Data Science and Employee Digital Experience. Each department has its own focus area, especially when dealing with huge data sources. PETRONAS has more than 43,000 employees worldwide; aside from Malaysian employees, it also hires talented and skilled staff from across the world.
9

Figure 1.1: Petronas Logo

The company's address and contact number are stated below.

PETRONAS DIGITAL SDN BHD


LEVEL 31, TOWER 2,
PETRONAS TWIN TOWERS, KLCC
50088, KUALA LUMPUR CITY CENTER,
WP KUALA LUMPUR, MALAYSIA
General Line: +(603) 2051 5000

Web: www.petronas.com

1.3 Organization Structure

Enterprise Data (ED) is one of the departments under Group Digital and is responsible for both the data and knowledge subject areas for groupwide consumption. As the Data Centre of Excellence and Intelligence Hub for the enterprise, ED's mission is to provide a single source of truth for data and knowledge, at pace and in a scalable and sustainable manner, to support business growth. ED institutionalizes enterprise-wide data and knowledge liberalization to exploit the value of information assets, using analytics to render insights and automate decisions so that PETRONAS can achieve its techno-digital and strategic agenda imperatives for growth and value.

There are six functions under ED: Data Strategy & Performance, Data Program Management, Data Policy & Governance, Data Architecture Management, Data Delivery and Knowledge Management. This internship is based in the Data Architecture Management (DAM) group. The unit is responsible for designing and developing common data models, managing metadata, overseeing the overall ecosystem architecture design and handling master data management for all enterprise data in PETRONAS. The team also includes a data quality team that carries out data veracity checks. Each team member is required to understand the specifications used to describe the existing state of the data warehouse, guide data integration, control data access for the data strategy and apply data architecture, including its guiding principles and components, in any digital project delivery. In addition, the team manages and controls master and reference data, maintains data accuracy and provides the single source of truth. The important part is to ensure that data creation and maintenance follow a governed process flow for proper control and compliance throughout the process.

Figure 1.2: Enterprise Data Organization Structure

11
Figure 1.3: DAM Organization Structure

Throughout this internship, I have been assigned to Mr Chong Yee Onn, the head of the data modelling team under Data Architecture Management. Within his team, I have been paired with Nurulizza Abdul Rahman and will be working together with her on several projects.

1.4 Tasks Planned

In Data Architecture Management, data modelling is one of the team's functions: designing data according to business requirements and the technology currently available. The training scope specifically involves learning data modelling, managing and developing conceptual, logical and physical data models, and managing the data model life cycle from requirement gathering through implementation and maintenance. We also need to work closely with the analyst team to understand and translate business needs and requirements into data models, and we answer technical questions raised by the business analytics team, especially where data design is involved. Apart from modelling, data quality is also within the team's scope: data quality is measured against six dimensions (accuracy, completeness, consistency, validity, uniqueness and timeliness) according to DQ standards and rules. The role of a data modeler lies between that of a data analyst and a data engineer.
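As a rough illustration of how two of these dimensions might be checked in practice, the short T-SQL sketch below measures completeness and uniqueness for a single key column. It is a minimal example only; the table and column names (dbo.risk_register, risk_id) are hypothetical placeholders rather than actual PETRONAS objects.

    -- Hypothetical example: dbo.risk_register and risk_id are placeholder names.
    -- Completeness: share of rows where the key column is populated.
    -- Uniqueness: ratio of distinct key values to total rows.
    SELECT
        COUNT(*)                                                   AS total_rows,
        1.0 - SUM(CASE WHEN risk_id IS NULL THEN 1.0 ELSE 0.0 END)
                  / COUNT(*)                                       AS completeness_ratio,
        COUNT(DISTINCT risk_id) * 1.0 / COUNT(*)                   AS uniqueness_ratio
    FROM dbo.risk_register;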

12
Chapter 2
SPECIFIC DETAILS ON PROJECT

2.1 Introduction

This chapter describes the projects in detail during the internship training. All the tasks and projects accomplished are elaborated here, including the objectives, results, hardware and software used and problems faced during the process.

I have currently been assigned to multiple projects, each of which requires me to learn and understand the business requirements before starting modelling tasks based on the data modelling lifecycle. The first project is the Data Enablement for Downstream Manufacturing Data project (Project DEMAND), whose goal is to discover and gather data from Group Downstream to be stored in a single Enterprise Data Hub (EDH). As of 2021, data from two sources, PIVOT DA and POINT, had been discovered and designed to be made available in the data warehouse by the end of that year, and more data sources will be included this year.

The second project is EDH Self-Serve, whose objective is to create a Data+ portal from which PETRONAS users can access and download data. It also serves as a data marketplace where users can discover their desired data and its definition via a search feature. The data are categorized by business domain, and view tables have been created on top of them to enrich the datasets, meet business requirements and present the data as more meaningful information to the business.

The third project is the Master Data Management (MDM) project. Its objective is to create a golden record of master data for digital solutions in GPD, stored in the MDM tool and usable enterprise-wide. So far, the MDM team has identified five digital solutions from which to gather information, focusing on Project Information: PPMS, PCMS, MyIcons, Enterprise Primavera 6 and Smart Planner. These solutions serve different purposes, but the project data information can be the same across applications.

2.2 Objectives of Tasks

PETRONAS is implementing a Data Liberalization initiative in which all data owned by the company is made available across the organization, within the boundaries and limitations of user access to certain data. EDH is a single source of truth for all purposes; it allows data to be captured and curated once and then made available across the integrated value chain. PETRONAS users can therefore utilize data for digital projects or analytical purposes securely without having to approach every department or unit for their data, and can then use the data according to their needs, including decision-making, to achieve business outcomes.

For the Data+ project, the data available in the warehouse currently need to be categorized into eight business domains: Health, Safety, Security and Environment (HSSE), Finance, Internal Audit, Risk Management, Procurement, Gas and New Energy, Upstream and Downstream. More business domains will be included later based on upcoming DEP projects. Each data modeler is given one or more domains, consisting of tables and attributes from the various projects related to that domain. From there, modelers need to create view tables that are meaningful for users. These tables are passed on to the data engineer team, which is responsible for developing the view tables used in the portal. The data is then cleaned and undergoes quality checks by the Data Quality team before it is used in the portal. Users can browse this data marketplace to find suitable datasets to access and download according to their needs.

14
Lastly, the objective of the Master Data Management project is to create a golden record for master data governance and data standardization. The data discovery and analyst teams collect project information from the five source solutions and validate the data to confirm that the project attributes and data are synchronized and up to date. Once the data is confirmed, the assigned modeler can determine the common data attributes and start developing the conceptual, logical and physical data models that store the golden record.
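As a minimal sketch of the golden-record idea, assuming a staging table that stacks candidate project records from the five solutions (all object and column names here are invented placeholders), one surviving record per project could be picked, for example, by preferring the most recently updated row:

    -- Illustrative sketch only; stg.project_master_candidates and its columns are assumptions.
    WITH ranked AS (
        SELECT
            project_code,            -- common business key across the five solutions
            project_name,
            source_system,           -- e.g. PPMS, PCMS, MyIcons, Primavera 6 or Smart Planner
            last_updated,
            ROW_NUMBER() OVER (
                PARTITION BY project_code
                ORDER BY last_updated DESC) AS rn
        FROM stg.project_master_candidates
    )
    SELECT project_code, project_name, source_system, last_updated
    FROM ranked
    WHERE rn = 1;    -- the surviving row per project_code acts as the golden record

In practice the survivorship rule would follow whatever precedence the MDM team defines, not simply the latest timestamp.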

2.3 Type of Activities Done

Based on the projects listed, all the tasks assigned follow the data modelling lifecycle. The projects are ongoing, and some are still in the data discovery phase. My tasks include working with the data discovery team to understand the business requirements so that the modelling process later runs more smoothly, and helping them understand what a data modeler will be doing so they know how the data should be prepared. As a data modeler, my side tasks include supporting any data modelling related work, such as populating data into a file.

Figure 2.1: Data Modelling Lifecycle

15
2.3.1 Data+

For the past month, my time was spent mostly on the Data+ project. Here, the task is to create view tables based on the GPU Project for the Internal Audit and Risk Management group domains. The idea is to serve these views in the portal for enterprise-wide usage, giving users a more user-friendly way to access the data and information available in EDH rather than accessing the database directly.

To begin creating a view table, it is important to determine each entity's type: master, reference or transactional. When designing a view, we use the transactional table as the anchor for the view so that the result is more meaningful data that can be easily understood by users across PETRONAS. Data modelers also need to check with the project's ETL team for any API script that has already been created, to ensure there is no duplication when creating the view tables. The API scripts are usually created from business requirements, so it is more relevant to prioritize those scripts.

The GPU project was divided into two business domains: Risk Management and Internal Audit. I started the view table creation for Risk Management by analyzing the respective model from the GPU Dashboard project and came up with 5 view datasets for this domain. By reviewing and referencing the existing project model, I was able to identify the transactional table as the anchor and its relationships with the surrounding entities; one transactional table, together with its relevant joined tables, becomes one view table. I then did the same for the Internal Audit domain and initially listed 12 views. After discussion with Mr. Yee Onn, we eliminated view tables that were not meaningful and amended view tables that already had an existing API script.
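To illustrate the anchoring idea, the sketch below builds a view around a transactional table and joins the surrounding master and reference tables. All object names are invented placeholders, not the actual GPU project tables, so this is an assumption-laden example rather than the real view script.

    -- Hypothetical sketch: view, table and column names are placeholders.
    CREATE VIEW dbo.vw_risk_event_summary AS
    SELECT
        e.risk_event_id,
        e.event_date,
        r.risk_title,                      -- descriptive context from the master table
        s.status_description,              -- descriptive context from the reference table
        e.impact_value
    FROM dbo.risk_event           AS e     -- transactional table used as the anchor
    JOIN dbo.risk_register        AS r
        ON r.risk_id = e.risk_id
    LEFT JOIN dbo.ref_risk_status AS s     -- LEFT JOIN keeps events with no status code
        ON s.status_code = e.status_code;

Anchoring on the transactional table keeps one business event per row, while the joins only add descriptive context from the master and reference data.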

The figures below show the list of view tables designed by the data modelers and created by the data engineer team in Synapse. Based on the data model diagram, the data modelers populate a file tracker with the table information, which consists of the View Name, View Field, Source Table Name and Source Table Field. The View by Release sheet in this file also helps the Data+ team track the data modelers' progress against each release date and business domain. Once the data engineers have created a view table in Synapse, they update its status in the file tracker as well.

Figure 2.2: View by Release tables

Next is the View by Attribute sheet, where data modelers record their view table designs and join conditions. Here the team can clearly see which attributes and entities are joined into each view table, and it helps the data engineers create the view based on its source tables and fields. Once a view table is created, the SQL script is reviewed and added to the file tracker for reference.

Figure 2.3: View by Attribute tables

Once the views have been created, the data delivery team performs testing to compare the view tables against the source tables for data validity. The managers sign off the test script as approval, and the script is promoted to Production once UAT has been completed and the requirements are met. When the data is in Production, the data quality team performs data veracity activities to check the quality of the data against the standards before it is published to the Data+ portal. On behalf of the data modelling team, we also did our own testing and checking: we verify the table counts and that the tables are joined correctly. For any errors or issues raised, we log the issue in the tracker and raise it in the Teams chat so that the team and the person in charge are aware of the error. This activity applies to all the business domains currently being worked on.
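A simple version of this count check, with placeholder names standing in for the real view and its anchor table, might look like the following:

    -- Hypothetical sanity check; object names are placeholders.
    SELECT
        (SELECT COUNT(*) FROM dbo.vw_risk_event_summary) AS view_row_count,
        (SELECT COUNT(*) FROM dbo.risk_event)            AS anchor_row_count;
    -- A view count noticeably lower than the anchor count usually means an inner join
    -- is dropping rows on unmatched keys, so the join conditions need to be reviewed.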

Figure 2.4: View table published in Production Synapse for the two business domains involved

18
2.3.2 Master Data Management – PMOF

Besides the Data+ project, I have also contributed to MDM, where the task is to understand the business requirements for the five digital solutions under GPD. The process of gathering datasets and information has been going on for the past month, and I have been working with another data modeler to discuss the data template provided and, concurrently, come up with a conceptual model design.
Figure 2.5: Data Template from 5 Digital Solutions

We are also working closely with the data analysts and the GPD representative to get as much information as we need about the data, so that we can create a model that serves the purpose of MDM in the future. For this project, we have a daily scrum call to update the team on our daily tasks and objectives, and we raise any issues or blockers so that they can be resolved within the team. As the data discovery is still in progress, our current task is to understand the data and create a storyline of how the data flows, based on the high-level diagram shared by the MDM team. By creating relationships between entities, we connect common attributes across solutions and create unified entities to be served in MDM. We have also come up with many questions for the business that need clarification, especially when trying to determine the life cycle of data across solutions.

19

Figure 2.6: List of Entities for Derivation


The activities carried out include categorizing the attributes according to their respective entities. From there we can determine the common attributes among the solutions and other related project information. Since this project has just started, most of the data samples received across the solutions were considered immature, meaning the collected data had never been processed or analyzed. The data model to be designed should be prepared for data that will mature in the future.

Figure 2.7: Cardinality Activity

20
The data template is provided by the MDM team once they have confirmed and validated the data with the vendors and business users. Data modelers then use this template, which consists of the data attributes confirmed and categorized as master data to be taken into the MDM record. From there, we go through the sample data across solutions in more detail to understand how the data relates from one solution to another and to establish the cardinality between entities.
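One way to probe cardinality from sample data, sketched here with hypothetical staging table and column names, is to count how many distinct related values each key maps to:

    -- Hypothetical sketch; stg.sample_project_contract and its columns are assumptions.
    SELECT
        project_id,
        COUNT(DISTINCT contract_id) AS distinct_contracts
    FROM stg.sample_project_contract
    GROUP BY project_id
    ORDER BY distinct_contracts DESC;
    -- If every project_id maps to at most one contract_id, a one-to-one or many-to-one
    -- relationship is plausible; higher counts point to one-to-many or many-to-many.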
Figure 2.8: Latest Conceptual data model

The development of the conceptual model is still in progress, as we are still coming up with questions that need clarification from the business. These questions concern data behaviour, multiplicity (to ensure the model serves its purpose for all incoming data) and a few other points. We also need to consider attributes that are present in some solutions but not all. The advantage we have now as data modelers is that we can take more time to understand and study the data and prepare questions for the analysts to raise with the business. Finally, we will discuss with the team and other modelers and present our conceptual model as a high-level diagram, giving them an overview and helping them understand the data flow in MDM.

21
2.3.3 DEMAND

For the DEMAND project, mostly small tasks have been assigned to assist the data modelers in reaching a model that meets the business requirements. One of the tasks was to help populate the presentation slides for the Technical Review Board (TRB) presentation. The slides include the project background and the proposed conceptual, logical and physical data models. During the session, the model is presented and must obtain approval before being passed on to the data engineers for the data to be ingested into the data warehouse. For the DEMAND project there are a few data sources to be ingested, and every data source and its model have to go through a TRB session for approval. The presentation slides I prepared focus on the PIVOT DA module.

Figure 2.9: TRB Presentation Deck

Other tasks include gathering business requirements for another data source, POINT, specifically its COMMON module. The COMMON module contains staff information along with their day-to-day task roster. Early on we had not yet gained access to their database to run queries, so we worked with the data analysts to understand the data and the relationships between entities. From there, the data modelers started developing a conceptual model for this module.

22
2.3.4 Side Tasks

Apart from modelling tasks, other duties have also been assigned, including recording the minutes of meeting for every TRB session held during the week. OneNote is used to record the minutes of each session so they can be referred to in the future. The minutes include the model designed, the panel's comments and the final decision made regarding the model. Several slots are available per week when higher management are free to conduct a TRB meeting; to book a slot, a data modeler reaches out to the TRB secretariat, and slots are allocated based on the managers' availability. The heads of data design and data modelling form the panel that approves or declines the models presented. During a TRB session, the data modelers present the model for the particular project assigned to them; this is the target model that the data engineers will later implement in the data warehouse to store data.

The purpose of a TRB session is to obtain approval for the model that has been designed and will be implemented to store data in the data warehouse. The design needs to be correct and, where applicable, adhere to the industry data model standard. Modelers can refer to the Industry Data Workbench portal as a benchmark when designing data models, as it contains many data models and entities that can be used as references.

Figure 2.10: MOM for TRB Session

23
2.4 Tools and Technologies Used

2.4.1 Hardware

Figure 2.11 HP EliteBook

The company provided a laptop along with HP accessories, including a headset, mouse and charger, to ease the training process. All tasks were done using this computer, which had easy access to the systems.
2.4.2 Software

a) Microsoft Teams (Figure 2.12: Teams Logo)
Microsoft Teams is the communication medium for everyone in the company. Online meetings are held through Microsoft Teams since everyone is working from home due to the pandemic situation.

b) Microsoft Outlook 365 (Figure 2.13: Outlook Logo)
Outlook 365 is the emailing platform and provides calendar invitations for online meetings. Any meeting invitation or important information is sent on this platform.

c) Azure DevOps (Figure 2.14: Azure DevOps Logo)
Azure DevOps is the platform where tasks and activities are assigned to team members. All tasks and work updates need to be recorded on the Azure DevOps board, so team leaders can track each project's progress and the activities of team members. Every Friday, our team leader conducts a weekly meeting to monitor and get updates from team members on their respective projects.

d) RISE Editor (Figure 2.15: Rise Logo)
RISE Editor is a free information modelling tool for information system development. Most of the time, this software is used to create the conceptual, logical and physical views.

e) Microsoft Office (Figure 2.16: Microsoft Office Logo)
Microsoft Office has been one of the key tools used throughout the company. Microsoft Word is used for various purposes, including documentation after a task or project has been completed. Microsoft Excel uses spreadsheets to organize numbers and data with formulas and functions; Excel files are also one of the data sources transformed and loaded into EDH.

f) Azure Synapse, Amazon Redshift (Figure 2.17: Synapse & Redshift Logo)
Microsoft Azure Synapse Analytics is a cloud data warehouse platform where data under PETRONAS is stored as a single source data hub. Some data is also stored in Amazon Redshift, depending on the data engineers' and the business' decision. In Synapse, the data engineers can query data, ingest it or make changes according to the project's requirements. Data available in the warehouse can be utilized for analytics or decision-making purposes.

g) Azure Data Lake Storage (Figure 2.18: ADLS Logo)
ADLS provides unlimited storage and auto-scaling. It also includes the major capabilities required for all types of processing and analytics across platforms.

Table 2.1: List of software used during training

26
2.5 Period of time given to complete the project

Each project has its own timeline. The time given to complete a data model depends on the size of the tables and data. To develop the conceptual model, it is important that we understand the business requirements to ensure the design meets their needs.

The Data+ project started around week 3 of the internship and was expected to launch by the end of January 2022. The tasks allocated to the data modeler took around 4 to 5 weeks, covering the view design and getting the views approved by the project managers. The MDM project started around the time the internship began and is still in the data discovery process; it is estimated to complete in July 2022. For the data modeler, the task has been to understand and study the data from each solution over several weeks as the data template was updated. Once data discovery finalizes all master data across the digital solutions, an estimated 4 to 6 weeks is required to complete the conceptual and logical models, provided that the common data attributes across solutions have been determined.

In the DEMAND project, the tasks assigned have been to assist the data modelers in completing the model: small tasks such as populating presentation slides, understanding business requirements and drafting a simple conceptual model for one of the modules in one data source. This helps the data modelers keep to their timeline and eases the design process.

The timeline for tasks has been recorded in the Gantt Chart attached below.

27

Figure 2.19: Gantt Chart for 20 Weeks

2.6 Result of Project

As most of the projects are still ongoing, there are no finalized results to show; the results presented here reflect current progress.

For Data+, the view tables have been published to the data warehouse in Synapse Production, where the Data Quality team is currently performing quality checks on the data. Once the data is confirmed, it should be ready to serve in the portal. The idea of this project is to provide a platform through which users can retrieve the data available in the warehouse with a proper interface and experience, without having to query the warehouse directly. Users will also be able to get an overview of a dataset's quality before retrieving it.

Below is the Data+ portal, recently launched by the Enterprise Data team. Currently, a total of 52 datasets are available in the marketplace. Datasets are categorized by business domain, and an overview of each dataset helps users understand the data before downloading. The portal acts like a library of data available enterprise-wide for any purpose; however, some data classified as confidential may not be served in the portal or may require prior authorization.

28

Figure 2.20: Data+ Portal


Figure 2.21: Data Marketplace in Data+

Figure 2.22: Overview of data in Upstream

29
The development of the conceptual model for the MDM project is still in progress; however, the expected result should be similar to the current state of the model. During the data discovery phase, we are working on the data verified from the data template, and the design process is dependent on the data discovery activity. Officially, the timeline for the data model has yet to begin, but we have been making good progress. The final model will be confirmed once the business and the team understand and agree on the conceptual diagram.
Figure 2.23: Current Conceptual data model progress

30
2.7 Theoretical and Practical Knowledge

The concept of designing data models was part of our degree courses, and the internship training has enhanced my knowledge further by applying what I learned to real data and project business requirements. A comprehensive and optimized data model aids the creation of a logical, streamlined database that eliminates redundancy, decreases storage needs and allows for quick retrieval. It also provides a single source of truth for all systems, which is critical for efficient operations and verifiable compliance with rules and regulatory requirements. Exposure to different kinds of diagrams, such as ERD, UML and data model diagrams, in the Database class is an advantage when designing a data model during the internship. The challenges lie more in the technical terms used by the oil and gas industry in their data and in the specific business processes involved. I have received a lot of guidance and support from my peers to ensure that the models are designed accurately while meeting the business requirements.

Other knowledge learnt and applied includes the Agile methodology, where the team consists of a scrum master, a product owner and team members. The scrum master organizes a daily scrum to raise any issues or blockers in the day-to-day work, and the stand-up also serves to update team members on the tasks planned for the day and the progress made the day before. This helps avoid bottlenecks in getting tasks done on time and supports a more collaborative way of working, so results are delivered with the best possible outcome. Tasks are completed in sprints, where each project sprint usually lasts up to three weeks. Before a sprint ends, there is a sprint retrospective in which team members reflect on the past sprint and on what could be improved in the future. Generally, the sprint retrospective is held together with the sprint planning session, where the scrum master, product owner and team members focus on the objective of the next sprint and assign tasks to the respective members.

31
2.8 Problem Faced

Different projects face different challenges. Generally, the challenge is to keep tasks on track to meet the project deadlines.

One challenge faced during Data+ was that data were not displayed when joined into the view table; this issue needed confirmation from the GPU project team to help validate it. Another issue concerned rows with multiple IDs combined in a single column, which meant the description and other data related to the entity could not be displayed because of the problematic ID column. To resolve this, I reached out to the project team to clarify which IDs caused missing data when joining the tables. Upon further investigation, I found that this behaviour came directly from the source, so the data source could not be changed. We therefore decided to drop the unwanted columns and prioritize the view scripts already created from the project based on business requirements. The issue was then resolved and the test script was approved by the managers. Once the data was moved to Production in Synapse, the Data Quality team took charge of data veracity and the Metadata team conducted semantic mapping on the data.
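As a rough illustration of how the multi-ID symptom could be detected before joining, the sketch below flags rows whose ID column holds more than one comma-separated value, assuming STRING_SPLIT is available in the target SQL pool. The table and column names are placeholders; this is not the script used in the project.

    -- Hypothetical check; dbo.source_entity and entity_id are placeholder names.
    -- Rows whose ID column contains several comma-separated values will never match
    -- a single-key join, so they are flagged here for follow-up with the source team.
    SELECT e.entity_id,
           COUNT(*) AS id_part_count
    FROM dbo.source_entity AS e
    CROSS APPLY STRING_SPLIT(e.entity_id, ',') AS parts
    GROUP BY e.entity_id
    HAVING COUNT(*) > 1;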

Other challenges involve understanding the business requirements for the MDM project. At times the questions raised about the data cannot be answered by the data analysts, and it takes time to get back to the business. We prefer a complete sample of their data to help us understand better, since we do not have access to the database. As this is a new project, many things required thinking outside the box: the data received from the multiple digital solutions was immature and uncertain. One pain point the team noticed from the webinar is that, according to the business, data were collected manually from forms and entered into a system by staff, which raises the concern of human error when transferring data from one place to another; the accuracy of the data therefore cannot be confirmed. Finally, data modelers also need to deliver a design that is prepared for when the data matures in the future.

32
2.9 Conclusion

In conclusion, this chapter has explained in detail all the tasks and activities undertaken during the past 13 weeks of my training. I have learned a lot during this period, and the internship experience has given me further exposure in applying what I learned to the real world. I truly appreciate the opportunity to be part of a team that helps build an enterprise data hub and treats data as an asset.
33
Chapter 3

OVERALL INFORMATION OF INDUSTRIAL TRAINING

3.1 Introduction

This chapter describes the learning process during the internship training. Working with industry experts has certainly broadened my knowledge in this field. During the first two weeks of training, we were briefed on the company profile and given sufficient documentation to bring us up to speed with the terminology used in PETRONAS.

3.2 Skills Improvement

To complete the tasks assigned within a project, one has to be equipped with knowledge of the data model development cycle, data modelling approaches, ER diagrams, data marts, data warehouses, dimensional models, data modelling tools and more. The theoretical knowledge taught in class and applied through university assignments and projects helps interns with the given tasks and prepares them with the basic technical skills for understanding the data models and references used. Nonetheless, there are new things to learn that will further an intern's development.

Apart from technical skills, interns also improve their soft skills, especially in analysis and problem solving, communication and time management. The training requires teamwork between team members to complete a project; its complexity calls for mutual support and expertise to provide the best solution. Through this process, interns learn and are prepared for their future career paths.
34
3.2.1 Technical Skills

Being assigned to the Enterprise Data group requires a lot of understanding, particularly in dealing with data. Certain data can be immature and incomplete, and our task is to design a model that can be used for a long time, especially once the data matures. The idea is to avoid excessive design changes that would affect the system or data in the future. This requires critical thinking to ensure the data warehouse runs smoothly and the data can be utilized enterprise-wide.

Technical skills include data modelling approaches and all the subjects mentioned in the section above. The data needs to be modelled according to business requirements and aligned with the company's standards. Even though these topics were taught in class, this is the time when the skills and knowledge need to be applied in real situations. Other technical skills needed include SQL, to query data from the database and understand the cardinality and relationships in the data. Additional skills include the use of Microsoft Office, such as Word, Excel and PowerPoint, for documentation and presentation purposes.

3.2.2 Soft Skills

The most important skill to apply is communication. All projects run on the Agile method, with daily scrum meetings and sprint retrospectives to share updates and feedback, raise blockers and suggest improvements where needed. The projects have their respective processes and stages that run on set timelines and sprints, and each team depends on the others. When an issue is raised, team effort is key to solving the problem, as it yields the best solutions: brainstorming and the different perspectives offered by team members save time and cover more of the risks than trying to solve the problem alone. Good communication is vital in this process and leads to more productive and timely project delivery.

35
Other skills include analytical thinking and problem solving. These matter when designing a data model diagram, where we need to ensure the design suits the warehouse and future analytics purposes. Most of the data encountered will be served into the EDH, where users can utilize it for analytical purposes. As modelers, apart from designing a model that meets the business need, we also need to ensure the data is relevant for analytics and other uses. We have encountered data that is immature and incomplete, so these issues must be considered before coming up with a model. All of this requires problem-solving skills to analyze the data before it is ingested into the EDH as a single source of truth. At times, a data mart design is also considered if requested.

3.3 Reference Materials

The main reference needed to complete tasks and models is usually the data template received from the data analysts once the data has been discovered and confirmed. The data template usually includes a business glossary for understanding each attribute, along with a data flow diagram as a reference. Additional information such as data samples is useful for understanding the data better as part of requirement gathering. Another reference we use is the Industry Data Model, which provides standard industry models as a basis for creating our own. Certainly, the support of colleagues and supervisors has been significantly helpful throughout the whole internship journey.

3.4 Constructive Comment

The tasks assigned to me were relevant to the subjects learnt during the past three years, and I learned many new things, mainly related to data architectures and their processes. University provides the data engineering technical skills, especially in data architectures, that serve as a foundation for performing the tasks during the internship, and we were able to apply our knowledge in real situations.
36
Much of the material learnt and experienced in classes on data design, as well as the deeper coverage of Master Data Management, data warehouses and data architecture design, can be applied during this period. Sufficient time is allocated for the intern to complete each task, and further guidance is provided by colleagues and the supervisor to ensure tasks are delivered accurately, so that as interns we can absorb as much knowledge as possible to prepare for a future career in this field. Lastly, it is highly recommended that future interns take up this meaningful experience of working in an industry with rich datasets like PETRONAS.

3.5 Conclusion

In conclusion, this chapter has described the overall information relating to the internship programme, along with the skills that can be improved and acquired to work in this field. Constructive comments and some technical skills have also been elaborated as guidance on what to expect during the internship experience.
37
Chapter 4

CONCLUSION

4.1 Introduction

This chapter discusses the overall conclusion of the industrial training, along with its achievements, the problems faced during execution and how they were solved.

4.2 Overall Achievement

Over the 15 weeks of the industrial training period, the tasks assigned to me provided the opportunity to apply my knowledge in industry. Here in PETRONAS, the data all relate to oil and gas, and new terms and information were introduced every day. Fortunately, the assigned projects had just kicked off and were in data discovery mode, so as data modelers we were able to take the time to understand the business requirements together with the data analyst team and to go through the data documentation for the data model standards and conventions. Other achievements include improvements in soft skills such as communication, analytical thinking and problem solving.

During the first month of training, we had the opportunity to learn in depth about the PETRONAS working culture, going through all sorts of documentation and being introduced to the projects involved and the groups under PETRONAS. It took a while to get access to the various systems, so we went through the documentation and processes in the meantime. For the DEMAND project, the tasks assigned were smaller in scope, such as helping to populate information into the PowerPoint slides for the PIVOT DA TRB presentation, gathering business requirements and drafting a small model for the POINT COMMON module before being pulled onto other tasks. Initially, the task assigned was to develop a model for the COMMON module under the POINT data source; however, based on business requirements, they wanted to combine three modules into one integrated data model. The task was then shifted to a more experienced data modeler, since another crucial project, Data+, came in and needed more hands.

In the Data+ project, the achievement was creating view tables for two business domains, Risk Management and Internal Audit, from EDH. The task assigned was the same as for the other data modelers involved: designing view tables from different business domains that serve meaningful information for the Data+ portal. From these domains, 17 datasets were created as view tables, and 11 datasets were successfully published into Synapse Production and have undergone data quality profiling and semantic mapping to be ready in the portal.

Next, the goal for MDM is to come up with one combined data model covering the five data sources from Group Project Delivery that store the project information. The project is currently in the data discovery phase, with data attribute validation ongoing between the data analysts and the business. A draft of the conceptual model is also in progress, and we aim to complete the design once the data attributes have been confirmed by the discovery team.

Overall, the achievements for this semester include the completion of one project, the Data+ portal, which is expected to launch very soon, and the completion of the data modeler task of creating view tables for the assigned business domains in EDH. Exposure to many project and business processes can also be considered an achievement of this training.
39
4.3 Problem and Execution

It is impossible to go through all the project processes without challenges; however, with strong teamwork these can be addressed accordingly. One challenge I faced in the early days related to system access to the business database in order to gather requirements for the POINT module in the DEMAND project. As a new modeler, various processes were needed to obtain database access, and some took longer than others. Since the project runs on a timeline, a workaround was adopted while waiting for access approval: the issue was resolved when the data analyst team worked alongside the data modeler to access the business database through a Teams call. SQL queries were run with the assistance of the data analyst team to help the modeler understand the attribute relationships, read and get to know the data, and establish the cardinalities and entities required to design a model for the POINT module in the Downstream data warehouse.

Other challenges include requirement gathering and getting to know the project background from the team before building a data model. Usually, an intern is buddied with an experienced data modeler to assist in designing the model for a particular project; the advantage of working with experienced colleagues is their knowledge and experience in doing the work. As an example, requirement gathering involved creating a table that maps each entity with its cardinality. Instead of starting with the model design, constructing a table that pictures each relationship between entities and lists all attributes with the entities they belong to was very helpful in creating the model; an illustration of this table design and cardinality is available in Chapter 2. While studying the requirements, we can easily address any issues or concerns about the data that need clarification from the data discovery or business team, and those issues can be resolved quickly once they understand our concerns. With experience, a data modeler can readily come up with a list of questions for the data discovery team and the business to clarify their data. Interns can experience how this is done, learn to think outside the box and analyze whether similar scenarios could occur in the future, which avoids major changes to the data model design and makes it less disruptive.

40
4.4 Opinion and Suggestions

It has been a wonderful experience to join and learn from PETRONAS as part of the industrial training, since it is a very reputable company in which to apply what has been learned so far. Being part of the company has widened my knowledge of the oil and gas industry, and I gained a better understanding of their business processes and how they manage and treat data as a company asset. Team members were treated equally, including the interns, and assigned relevant tasks that gave me an opportunity to develop. We were guided throughout and given support when needed, and this training provides good exposure to the real world beyond what we learned in class. As a lesson learned, interns should be exposed more to data management subjects, particularly data architectures and solutioning, before commencing their industrial training. In terms of system access, it would be good if this were granted at onboarding so that interns can perform their activities without having to wait. In short, this company is highly recommended for future interns to start their career development in the data field.

4.5 Conclusion

In summary, this report has explained all the tasks and activities carried out during the industrial training, which has been a very eye-opening experience for this degree. The experience encourages interns to adapt to the industry environment as well as to enhance the skills taught in university. The internship is important in preparing students for the new chapter in life after graduating. All in all, this experience has been a great time to learn and pick up new skills as a kick-start for a future career.

41
REFERENCES

Pue Giok Chu (2021). Enterprise Data Hub. myPETRONAS Portal. Retrieved from https://mypetronas.com/group-digital/enterprise-data
42
