
Internship

The document outlines the details of an internship at Niveus Solutions, focusing on cloud engineering services and technologies such as BigQuery and Dataplex. It covers the internship duration, objectives, and implementation steps for integrating data management solutions. Key takeaways emphasize the importance of data governance, metadata management, and automation in enhancing data ecosystems.

Uploaded by

Alekhya Yamini

Internship Details

Presented by: P Alekhya (4NM21IS100)
Faculty: Ms. Anusha N., Assistant Professor Gd-III

Topics for Discussion

o Internship Certificate and Company Details
o Internship Duration
o Introduction
o Objectives
o Technologies Used
o Problem Definition
o Implementation
o Conclusion
o References
Internship Certificate and Company Details

o Niveus Solutions, founded in 2013, is a cloud engineering services company.
o Specializing in application, infrastructure, data modernization, cloud consulting and security, Niveus became an exclusive partner of Google Cloud India in 2019.
Internship Duration

o Internship Duration: Dec 20, 2024 – Feb 21, 2025
o Work Mode: Work from Office in Mangalore
o Role: Customer Engineer Intern
Introduction

o Data Warehouse: A centralized system for storing and analyzing large volumes of structured data, used for reporting and business intelligence.
o BigQuery: Google Cloud's serverless and highly scalable data warehouse that enables fast SQL queries and built-in machine learning.
o Dataplex: A Google Cloud tool for unified data management and governance across data lakes and warehouses, integrating seamlessly with BigQuery.
Technologies Used

The following technologies are utilized:

o BigQuery - A fully managed data warehouse.
o Dataplex - A data fabric solution for governance and security.
o Docker - Containerization platform for deploying applications.
o DataHub - A metadata platform for data governance.
o Google Cloud Run Functions - Managed service to deploy and run containerized applications.
BigQuery & Schema Overview

BigQuery is a serverless data warehouse that allows users to:

o Store and query large datasets efficiently.
o Define structured schemas with columns and data types.
o Use partitioning and clustering for performance optimization.
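To make the schema, partitioning, and clustering points concrete, here is a minimal sketch that builds a BigQuery `CREATE TABLE` statement as a string. The table name `demo_dataset.orders`, its columns, and the helper `build_create_table_ddl` are hypothetical and not taken from the internship project; the sketch only assembles SQL text and does not connect to BigQuery.

```python
from typing import Dict, List, Optional


def build_create_table_ddl(
    table: str,
    columns: Dict[str, str],
    partition_by: Optional[str] = None,
    cluster_by: Optional[List[str]] = None,
) -> str:
    """Return a BigQuery CREATE TABLE statement as a string."""
    # One "name TYPE" line per column, in insertion order.
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns.items())
    ddl = f"CREATE TABLE `{table}` (\n{column_lines}\n)"
    if partition_by:
        # Partitioning lets queries that filter on this expression scan fewer rows.
        ddl += f"\nPARTITION BY {partition_by}"
    if cluster_by:
        # Clustering co-locates rows that share values in these columns.
        ddl += "\nCLUSTER BY " + ", ".join(cluster_by)
    return ddl


# Hypothetical table and columns, used only for illustration.
ddl = build_create_table_ddl(
    table="demo_dataset.orders",
    columns={
        "order_id": "STRING",
        "customer_id": "STRING",
        "amount": "NUMERIC",
        "created_at": "TIMESTAMP",
    },
    partition_by="DATE(created_at)",
    cluster_by=["customer_id"],
)
print(ddl)
```

The generated statement could be pasted into the BigQuery console or submitted through a client library; queries that filter on the date of `created_at` would then read only the matching partitions.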
Dataplex Overview

o Dataplex is a unified data management solution that helps manage data across multiple storage systems.
o Provides automated governance and security policies.
o Enables data discovery and quality checks across warehouses and lakes.
Data Lineage

Data lineage tracks the origin, transformations, and movement of data.

o Helps in understanding dependencies across datasets.
o Enhances compliance and auditability in data governance.
o Dataplex provides lineage tracking via BigQuery and Data Catalog integration.
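The idea of lineage as a dependency graph can be sketched in a few lines of plain Python. The table names and the `UPSTREAM` mapping below are hypothetical; this is an illustration of the concept, not how Dataplex stores or exposes lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each table maps to the tables it is derived from.
UPSTREAM: Dict[str, List[str]] = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "orders_enriched": ["clean_orders", "raw_customers"],
    "daily_revenue": ["orders_enriched"],
}


def all_upstream(table: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every table that `table` depends on, directly or indirectly."""
    seen: Set[str] = set()
    queue = deque(edges.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen


def impacted_by(table: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every table that would be affected by a change to `table`."""
    return {t for t in edges if table in all_upstream(t, edges)}


print(sorted(all_upstream("daily_revenue", UPSTREAM)))
print(sorted(impacted_by("raw_orders", UPSTREAM)))
```

`all_upstream` answers "where did this data come from?", while `impacted_by` answers "what breaks if this table changes?", which is the impact-analysis side of lineage.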
Data Profiling

Data profiling involves analyzing datasets to extract useful insights:

o Identifies data distributions, missing values, and anomalies.
o Helps in improving data quality and consistency.
o Enables metadata-driven decision making for data governance.
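As a rough illustration of what a profile contains, the sketch below computes a few per-column statistics in plain Python. The function `profile_column` and the sample values are assumptions made for this example; a managed profiling scan would compute comparable statistics over a full table.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute simple profile statistics for one column."""
    non_null = [v for v in values if v is not None]
    null_count = len(values) - len(non_null)
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": null_count,
        "null_ratio": null_count / len(values) if values else 0.0,
        "distinct_count": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical order amounts with one missing value and one outlier.
amounts = [120, 80, None, 80, 4500, 95]
profile = profile_column(amounts)
for key, value in profile.items():
    print(f"{key}: {value}")
```

Here the missing value shows up in `null_count`, and the gap between `max` and the other values hints at an anomaly worth checking.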
Data Quality Scan

A data quality scan ensures clean, reliable, and accurate data.

o Uses rules and thresholds to detect errors in datasets.
o Validates schema consistency, duplicates, and missing values.
o Provides automated monitoring for continuous data integrity.
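The "rules and thresholds" idea can be sketched as follows: each rule checks one row, and the rule passes overall when a minimum fraction of rows satisfy it. The `Rule` type, the `run_scan` function, and the sample rows are hypothetical, and only row-level rules are covered here (no duplicate or schema checks).

```python
from typing import Any, Callable, Dict, List, NamedTuple

Row = Dict[str, Any]


class Rule(NamedTuple):
    name: str
    check: Callable[[Row], bool]  # returns True when a row passes
    threshold: float              # minimum fraction of rows that must pass


def run_scan(rows: List[Row], rules: List[Rule]) -> Dict[str, Dict[str, Any]]:
    """Evaluate every rule against every row and compare to its threshold."""
    results: Dict[str, Dict[str, Any]] = {}
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row))
        ratio = passed / len(rows) if rows else 1.0
        results[rule.name] = {"pass_ratio": ratio, "ok": ratio >= rule.threshold}
    return results


# Hypothetical sample data with one missing and one negative amount.
rows = [
    {"order_id": "A1", "amount": 120},
    {"order_id": "A2", "amount": None},
    {"order_id": "A3", "amount": -5},
    {"order_id": "A4", "amount": 60},
]
rules = [
    Rule("order_id_not_null", lambda r: r["order_id"] is not None, 1.0),
    Rule("amount_not_null", lambda r: r["amount"] is not None, 0.9),
    Rule("amount_non_negative", lambda r: r["amount"] is None or r["amount"] >= 0, 0.75),
]
report = run_scan(rows, rules)
for name, outcome in report.items():
    print(name, outcome)
```

Separating the per-row check from the threshold means the same rule can be strict on one table and lenient on another.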
Scheduled Query

Scheduled queries automate data transformations and reporting:

o Run SQL queries at predefined intervals.
o Optimize ETL workflows without manual intervention.
o Improve data availability by keeping tables updated.


DataHub Overview

DataHub is a modern metadata platform for data governance:

o Provides real-time metadata search and discovery.
o Enables data lineage tracking and impact analysis.
o Supports multi-source data ingestion including BigQuery, MySQL, Kafka, etc.
Features of DataHub

Key features of DataHub include:

o Metadata Search - Locate datasets instantly.
o Data Lineage - Understand data dependencies.
o Access Control - Manage permissions and security.
o Schema Evolution - Track changes over time.
o Data Ingestion - Connect multiple data sources.


Implementation

Steps to integrate BigQuery datasets into DataHub:

1. Configure Dataplex for data governance and profiling.

2. Set up BigQuery datasets and scheduled queries.

3. Deploy DataHub using Docker and ingestion pipelines.

4. Automate data lineage and quality scans.
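For step 3, a DataHub ingestion pipeline is usually described by a recipe with a source and a sink. The sketch below builds such a recipe as a Python dictionary. The project ID, the server URL, and the helper `build_bigquery_recipe` are hypothetical, and the configuration keys reflect DataHub's BigQuery source as commonly documented; they are an assumption here and should be checked against the installed DataHub version.

```python
import json
from typing import Any, Dict, List


def build_bigquery_recipe(project_ids: List[str], datahub_url: str) -> Dict[str, Any]:
    """Build an ingestion recipe: read metadata from BigQuery, send it to DataHub."""
    return {
        "source": {
            "type": "bigquery",
            "config": {
                # Assumed config keys; verify against the DataHub docs in use.
                "project_ids": project_ids,
                "include_table_lineage": True,
                "profiling": {"enabled": True},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_url},
        },
    }


# Hypothetical project ID and a local DataHub instance started with Docker.
recipe = build_bigquery_recipe(["my-demo-project"], "http://localhost:8080")
print(json.dumps(recipe, indent=2))
```

The same structure can be written as a YAML file and run with the `datahub ingest` command; credentials for BigQuery would be supplied separately and are left out of this sketch.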


Google Cloud Run Functions

o Cloud Run hosts containerized services that can query BigQuery using its client libraries.
o It enables on-demand data processing or analytics via HTTP triggers (e.g., from apps or schedulers).
o It is ideal for automated workflows, real-time dashboards, or integrating BigQuery insights into APIs.
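The HTTP-triggered pattern described above can be sketched without any cloud dependencies by passing the query runner in as a parameter. The SQL text, the table `demo_dataset.orders`, the `day` parameter, and the functions `handle_request` and `fake_runner` are all hypothetical. In a real deployment, the runner could wrap the `google-cloud-bigquery` client and a web framework would supply the request parameters.

```python
from typing import Any, Callable, Dict, List, Tuple

# A query runner takes SQL text and named parameters and returns rows as dicts.
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

# Hypothetical query; @day is a named query parameter, not string-formatted input.
SQL = (
    "SELECT customer_id, SUM(amount) AS total "
    "FROM `demo_dataset.orders` "
    "WHERE DATE(created_at) = @day "
    "GROUP BY customer_id"
)


def handle_request(
    params: Dict[str, str], run_query: QueryRunner
) -> Tuple[Dict[str, Any], int]:
    """Validate the HTTP query parameters, run the query, return (body, status)."""
    day = params.get("day")
    if not day:
        return {"error": "missing 'day' query parameter"}, 400
    rows = run_query(SQL, {"day": day})
    return {"day": day, "rows": rows}, 200


def fake_runner(sql: str, query_params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for BigQuery so the handler can be exercised locally."""
    return [{"customer_id": "C1", "total": 200}]


body, status = handle_request({"day": "2025-01-15"}, fake_runner)
print(status, body)
```

Keeping the handler independent of the BigQuery client makes it easy to test locally, and using a named parameter instead of string concatenation avoids SQL injection through the HTTP request.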
Conclusion

Key takeaways:

1. Dataplex enhances data governance and security.
2. DataHub improves metadata management and discovery.
3. Integrating the two leads to a more efficient data ecosystem.
4. Connecting BigQuery datasets to Cloud Run functions speeds up the automation of data workflows.
