MMABA1 - Data Lake Part 3
https://drive.google.com/drive/folders/1XM4gGa7X0YXPjTYM-IeJEu3bg2XS_GHW?usp=sharing
This material belongs to Universitas Prasetiya Mulya
Do not upload and share this material to public domain. For private use only!
Quick Summary
[Figure: Hadoop ecosystem overview. Recoverable labels: SQL query; connecting RDBMS / SQL DB and Hadoop; scripting platform; NoSQL; coordinating in clusters (nodes up, down, etc.); transporting (large) web logs; MapReduce alternative; external data storage; query engines; notebook style.]
Hadoop Usage at Kemenkeu (Indonesian Ministry of Finance)
https://dzone.com/articles/data-lake-governance-best-practices
Governing Data Lakes vs. Traditional Systems
Load
The numbers of data sets, users, and changes are extremely
high.
Frictionless ingestion
Because a data lake stores data for future, yet-to-be-determined
analytics, it usually ingests the data with minimal, if any,
processing.
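Frictionless ingestion can be sketched as follows. This is a minimal illustration, not a reference implementation: the `raw/` zone layout, the sidecar `.meta.json` file, and the function name `ingest_raw` are assumptions introduced for the example. The file is copied unchanged, and only a little metadata is recorded so the data can be found and governed later.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(source: Path, lake_root: Path, dataset: str) -> Path:
    """Copy a file into the raw zone unchanged and record minimal metadata."""
    target_dir = lake_root / "raw" / dataset  # hypothetical zone layout
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # stored as-is: no parsing, no schema

    metadata = {
        "dataset": dataset,
        "file": source.name,
        "size_bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # Sidecar file (an assumption of this sketch) describing the raw file.
    sidecar = target_dir / (source.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return target
```

Any cleaning or schema decisions are deferred until an analysis actually needs the data.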
Encryption
There are often government or internal regulations that
require sensitive or personal information to be protected, yet
that data is needed for analysis.
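One way to keep protected data usable for analysis is to pseudonymize identifiers instead of encrypting whole records. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library; note that this is de-identification, not encryption, and the record, field names, and key are hypothetical. The same input always maps to the same token, so joins and counts still work without exposing the original value.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with the sensitive fields pseudonymized."""
    return {
        name: pseudonymize(str(value), secret_key) if name in sensitive_fields else value
        for name, value in record.items()
    }


# Hypothetical record and key, for illustration only.
KEY = b"demo-key-keep-real-keys-in-a-secret-store"
record = {"national_id": "3174012345", "city": "Jakarta", "amount": 250000}
masked = deidentify(record, {"national_id"}, KEY)
```

The key has to be protected as carefully as the original data, since anyone holding it can recompute the tokens for known identifiers.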
Exploratory nature of work
Data scientists often do not know what’s available in the huge
and diverse data store. If analysts cannot find data that they
don’t have access to, they can’t ask for access to it.
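A common response to this problem is a catalog that lets analysts discover data sets they cannot yet read. The sketch below is a toy in-memory version under stated assumptions: `CatalogEntry`, `search`, and the sample entries are hypothetical names invented for illustration. The search returns metadata for every match and only flags whether the user has read access, so the analyst knows what to request and from whom.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    owner: str
    allowed_users: set = field(default_factory=set)


def search(catalog, keyword, user):
    """List every matching data set, flagging whether the user can read it."""
    keyword = keyword.lower()
    return [
        {"name": e.name, "owner": e.owner, "can_read": user in e.allowed_users}
        for e in catalog
        if keyword in e.name.lower() or keyword in e.description.lower()
    ]


# Hypothetical catalog contents.
catalog = [
    CatalogEntry("tax_returns_2020", "Annual tax return filings", "tax_office", {"alice"}),
    CatalogEntry("web_logs", "Raw web server logs", "it_ops", {"alice", "bob"}),
]
hits = search(catalog, "tax", "bob")  # bob sees the entry but cannot read it
```

Only metadata is exposed here; the data itself stays behind the access check.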
Source: Common Data Governance Challenges – TDAN.com
Data Governance Pillars
Data stewardship. A data steward is accountable for a portion of
an organization's data, with job duties in areas such as data
quality, security, and usage.
Proactive approaches:
Tag-based data access policy
De-identifying sensitive data
Implementing self-service access management
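A tag-based data access policy can be sketched as a mapping from tags to the roles allowed to read data carrying that tag. The tags, roles, and the function name `can_access` below are hypothetical, and the deny-by-default rules are a design choice of this sketch rather than something the source prescribes.

```python
# Hypothetical tags and roles, for illustration only.
POLICY = {
    "pii": {"data_steward"},
    "finance": {"data_steward", "finance_analyst"},
    "public": {"data_steward", "finance_analyst", "data_scientist"},
}


def can_access(role: str, dataset_tags: set, policy: dict = POLICY) -> bool:
    """Allow access only if the role is permitted for every tag on the data set.

    Untagged data sets and unknown tags are denied by default.
    """
    if not dataset_tags:
        return False
    return all(role in policy.get(tag, set()) for tag in dataset_tags)


# A finance analyst may read finance data, but not finance data that also holds PII.
allowed = can_access("finance_analyst", {"finance", "public"})
blocked = can_access("finance_analyst", {"finance", "pii"})
```

Because the policy is attached to tags rather than to individual data sets, newly ingested data is governed as soon as it is tagged.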
The full material is available at: 9. Governing Data Access (ebookreading.net)
Create guidance that helps data custodians share data safely and effectively
using the Five Data Sharing Principles (FDSP).
The FDSP are adopted from principles developed by the UK Office for National Statistics.
The guidance includes an illustration of how the FDSP are applied.
Applying the FDSP
Recap
Data Types
Hadoop Ecosystems
NoSQL and GraphDB
Data Governance