MongoDB Usage by French SaaS Companies

Uploaded by

Romain Osanno

DATA ANALYSIS

Session 1 (30/01 Morning)

 Introduction to SQL syntax on Google Cloud Platform (BigQuery)

# Write the result of the query in a new table

CREATE OR REPLACE TABLE z_romain.ratings AS
SELECT *
FROM `sql-class-mines-nancy-2023.movielens.ratings`
LIMIT 1000

# 1) Get the number of ratings by movie id

SELECT movieId, COUNT(*) AS nb_ratings_by_movie
FROM `z_romain.ratings`
GROUP BY 1

# 2) Link it with the title from the movies table

SELECT m.title, COUNT(*) AS nb_ratings_by_movie

FROM `z_romain.ratings` AS r INNER JOIN `movielens.movies` AS m
ON r.movieId = m.movieId
GROUP BY 1

# 3) Keep only the top 10 movies released in 2010

SELECT m.title, COUNT(*) AS nb_ratings_by_movie
FROM `z_romain.ratings` AS r INNER JOIN `movielens.movies` AS m
ON r.movieId = m.movieId
WHERE m.title LIKE '%(2010)%' # or REGEXP_CONTAINS(m.title, r'\(2010\)')
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

/!\ Tips :
- Use aliases ! (<expression> AS name)
- Never use JOIN without specifying the type (INNER/LEFT/RIGHT/…)
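To see why the join type matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for BigQuery. The two tiny tables and their rows are invented for illustration; only the column names follow the movielens tables used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE movies  (movieId INTEGER, title TEXT);
    CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL);
    INSERT INTO movies  VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO ratings VALUES (10, 1, 4.0), (11, 1, 3.5);
""")

# INNER JOIN: only movies that have at least one rating survive.
inner_rows = conn.execute("""
    SELECT m.title, COUNT(r.movieId) AS nb_ratings_by_movie
    FROM movies AS m INNER JOIN ratings AS r ON r.movieId = m.movieId
    GROUP BY m.title ORDER BY m.title
""").fetchall()

# LEFT JOIN: every movie is kept; unrated ones get a count of 0.
left_rows = conn.execute("""
    SELECT m.title, COUNT(r.movieId) AS nb_ratings_by_movie
    FROM movies AS m LEFT JOIN ratings AS r ON r.movieId = m.movieId
    GROUP BY m.title ORDER BY m.title
""").fetchall()

print(inner_rows)
print(left_rows)
```

Writing the join type explicitly makes it visible to the reader whether unrated movies are meant to disappear from the result or not.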

 Datastudio introduction

# creation of a new table to add to the report

CREATE OR REPLACE TABLE `z_romain.datastudtio` AS
SELECT m.title, COUNT(*) as nb_ratings_by_movie
FROM `sql-class-mines-nancy-2023.movielens.ratings` AS r
INNER JOIN `sql-class-mines-nancy-2023.movielens.movies` AS m
ON r.movieId = m.movieId
GROUP BY 1

Figure 1 : Metrics on a table

Figure 2 : Movie distribution of ratings

Figure 3 : Movie distribution of ratings – group

Session 2 (30/01 Afternoon)

 Exercise on BigQuery and Datastudio

To have a look at the MovieLens interface :

https://movielens.org/home
mail : [email protected]
password : mines2023

Question 1)

# First, we identify the Disney tags

SELECT tag, COUNT(*) as nb_tag, COUNT(DISTINCT movieId) as nb_distinct_movieId
FROM `sql-class-mines-nancy-2023.movielens.tags`
WHERE regexp_contains(lower(tag),'disney')
GROUP BY 1
ORDER BY 2 DESC

# Optional (for visualization) : count the tags

SELECT COUNT(*) as nb_tag, COUNT(DISTINCT movieId) as nb_distinct_movieId
FROM `sql-class-mines-nancy-2023.movielens.tags`
WHERE regexp_contains(lower(tag),'disney')

# Then, we map it with the ratings table

WITH disney_movieId AS (SELECT DISTINCT movieId
FROM `sql-class-mines-nancy-2023.movielens.tags`
WHERE regexp_contains(lower(tag),'disney'))
SELECT movies.title, COUNT(*) as nb_ratings
FROM `sql-class-mines-nancy-2023.movielens.ratings` as ratings
INNER JOIN `sql-class-mines-nancy-2023.movielens.movies` as movies
ON ratings.movieID = movies.movieId
WHERE ratings.movieId IN (SELECT * FROM disney_movieId)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

Figure 4 : Top 10 Disney movies with the most ratings

Session 3 (31/01 Morning)

 Exercise on BigQuery and Datastudio (continued)

Question 2)

# Data preparation query

CREATE OR REPLACE TABLE z_romain.ratings_by_users AS
SELECT userId, COUNT(*) as nb_ratings_by_user
FROM `sql-class-mines-nancy-2023.movielens.ratings`
GROUP BY 1

Figure 5 : Movie distribution running sum – group

 Lecture : Big data & Cloud Module

Objective
Get an overview of the key concepts and tools used in data analysis work.

 Hardware basics

Reading and writing information on storage takes time (relative to the
processor speed). This speed depends on the type of storage (SSD vs magnetic).

Storage pricing on Google :

- Standard storage = $46 per month per TB
- SSD = $190 per month

Figure 6 : Historical cost of computer memory and storage

 Rise of distributed computing

The main challenge of Big Data is not how to store the information but how
to process it.

Example : Hadoop (2006), an open-source project inspired by Google's papers
on the Google File System (2003) and MapReduce (2004)

Apache Hadoop is a collection of open-source software utilities that
facilitates using a network of many computers to solve problems involving
massive amounts of data and computation. It provides a software framework for
distributed storage and processing of big data using the MapReduce programming
model.
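The MapReduce programming model can be sketched on a single machine in a few lines of Python. This is only an illustration of the three phases (map, shuffle, reduce) on invented input; it is not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit one (word, 1) pair per word of a document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big compute", "data everywhere"]  # invented input
# On a real cluster each document could be mapped on a different machine.
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

Because each map call only looks at its own document and each reduce call only at its own key, both phases can be spread over many computers, which is the point of the model.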

Figure 7 : Hadoop MapReduce

 Rise of cloud computing

Cloud computing is the on-demand availability of computer system resources,

especially data storage and computing power, without direct active management
by the user. Large clouds often have functions distributed over multiple
locations, each of which is a data centre.

Figure 8 : Market share

 What is BigQuery ? – high level vs low level language

BigQuery is a massively parallel processing (MPP) database, serverless and
fully managed, queried with SQL.

- When you load data, GCP partitions it across multiple servers if the
volume requires it. It also ensures redundancy/replication to avoid
data loss in case of an infrastructure incident.

- When you query data, GCP selects the right number of “computers” to
run your query, based on the volume to be scanned, using a
distributed computing framework.

High-level languages are programming languages that are designed to allow

humans to write computer programs and interact with a computer system without
having to have specific knowledge of the processor or hardware that the
program will run on. They use command words and syntax which reflects everyday
language, which makes them easier to learn and use. They also offer the
programmer development tools such as libraries and built-in functions.

High-level vs low-level classification depends on what you are comparing :
- BigQuery is high-level in comparison to Spark
- Spark is high-level in comparison to Hadoop/MapReduce

 What is SQL ?

SQL stands for Structured Query Language; it was invented in 1974. It is a
declarative language : you write code that describes what you want, but not
how to get it (as opposed to an imperative language). It is a widespread
standard in the data industry.
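The declarative versus imperative distinction can be illustrated by computing the same result twice. In this sketch the rows are invented and SQLite (through Python's built-in sqlite3 module) stands in for any SQL engine.

```python
import sqlite3

ratings = [(1, 4.0), (1, 3.0), (2, 5.0)]  # (movieId, rating), invented rows

# Imperative: we spell out HOW to compute the counts, step by step.
imperative_counts = {}
for movie_id, _rating in ratings:
    imperative_counts[movie_id] = imperative_counts.get(movie_id, 0) + 1

# Declarative: we state WHAT we want; the engine chooses the execution plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movieId INTEGER, rating REAL)")
conn.executemany("INSERT INTO ratings VALUES (?, ?)", ratings)
declarative_counts = dict(conn.execute(
    "SELECT movieId, COUNT(*) FROM ratings GROUP BY movieId"
).fetchall())

print(imperative_counts, declarative_counts)
```

The SQL text stays the same whether the engine runs it on one laptop or distributes it over many servers, which is part of why distributed systems adopted it.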

Initially, it was associated with RDBMS-type databases such as MySQL (created
in 1995) and SQL Server (created in 1989), which operate in a similar way
under the hood. Distributed computing frameworks such as BigQuery or Apache
Spark adopted it later on.

Some databases were labelled “NoSQL” databases in opposition to RDBMS
databases, which were originally the only type of database with SQL support.

 Cloud provider value proposition

Software as a service (SaaS) is a software licensing and delivery model in
which software is licensed on a subscription basis and is centrally hosted.
Most French unicorns have business models characterised as SaaS-based
(Contentsquare, Deezer, Mirakl, Payfit, Qonto, Spendesk).

Figure 9 : Comparison of cloud service models

IAAS : Infrastructure as a service

PAAS : Platform as a service
SAAS : Software as a service

Session 4 (31/01 Afternoon)

 Business case

1) How to start ?

Identify the metrics we want to analyse :

- We want to look at macro statistics
 Which indicators do we want ?
Number of ratings, average rating value and standard deviation
- We want to look at granular values (movie level)
 Look at example movies one by one, applying the previous metrics
- We want to have an idea of the distribution
 Number of users by rating value

Identify the analysis axes that could be discriminating :

- Movie related : genre, release date, number of ratings by movie
- User related : rating value, rating timing*, number of ratings by
user, first rating date

(*in comparison to the release date)

2) BigQuery

# Data preparation query : join ratings and movies

CREATE OR REPLACE TABLE z_romain.ta_ratings_movies AS
SELECT CAST(TIMESTAMP_SECONDS(CAST(timestamp AS INT64)) AS DATE) AS rating_date,
title AS m_title,
genres AS m_genres,
COUNT(*) OVER (PARTITION BY movieId) AS nb_ratings_by_movie,
CAST(REPLACE(REPLACE(SAFE.REGEXP_EXTRACT(title, r'\([0-9]{4}\)'),
'(',''),')','') AS INT64) AS m_released_year,
rating,
userId,
COUNT(*) OVER (PARTITION BY userId) AS nb_ratings_by_user
FROM `sql-class-mines-nancy-2023.movielens.ratings`
INNER JOIN `sql-class-mines-nancy-2023.movielens.movies` AS movie
USING(movieId)

3) Datastudio

Figure 10 : Datastudio report on the movie distribution – average

Figure 11 : Datastudio report on the movie distribution – standard deviation

 Lecture : Data Quality on BigQuery

Objective
Get an overview of what is at stake with data quality in a data team

 Relational database – key takeaway for data quality

Figure 12 : Database tables in normalized form

Figure 12 : Database tables in denormalized form

Relational databases were especially designed for transaction management.

Example : ecommerce website

You want to support multiple concurrent queries (at high volume) :
- List the seats available by category (SELECT)
- Place an “order” (INSERT)
- Update the availability of a seat once it has been sold (UPDATE)
You do not want to sell the same seat twice !
If you have a failure (during a data update), you want to be able to revert
to a previous state.
You need accurate data and a “solid” data model.

A relational database (RDB) is a way of structuring information in tables,
rows, and columns. An RDB has the ability to establish links (or
relationships) between information by joining tables, which makes it easy to
understand and gain insights about the relationship between various data
points.

It allows you to set hard constraints on the database : uniqueness of a value
(or of a set of values) through a primary key.
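As an illustration of such a hard constraint, the sketch below uses SQLite through Python's built-in sqlite3 module. The `orders` table, its columns and the `place_order` helper are hypothetical names chosen for this example, where the item being sold is a seat: the primary key makes the database itself refuse to sell the same one twice, and the failed transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seat_id is the primary key: the database itself refuses duplicates.
conn.execute("CREATE TABLE orders (seat_id INTEGER PRIMARY KEY, buyer TEXT)")

def place_order(seat_id, buyer):
    """Try to sell a seat; return True on success, False if already sold."""
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute("INSERT INTO orders VALUES (?, ?)", (seat_id, buyer))
        return True
    except sqlite3.IntegrityError:
        return False

first = place_order(42, "alice")   # seat 42 is free
second = place_order(42, "bob")    # same seat: rejected by the constraint
sold = conn.execute("SELECT seat_id, buyer FROM orders").fetchall()
print(first, second, sold)
```

The application code does not need to check for duplicates itself; the constraint holds even when several clients write concurrently.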

 BigQuery vs Relational Database

                | MySQL                                                     | BigQuery
Use case        | OLTP-like (Online Transaction Processing)                 | Analytical (OLAP-like)
Type of queries | Big number of small queries                               | Small number of huge queries
Manipulation    | Row level (delete, update)                                | Bulk load, update and delete limited
Constraints     | Primary keys can enforce schema constraints across tables | Schema
SQL support     | Yes                                                       | Yes
Scalability     | Vertical                                                  | Horizontal (i.e. distributed)

Figure 13 : MySQL vs BigQuery

 Data Quality in BigQuery

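Since BigQuery does not enforce such constraints, people started to replicate
the “primary key” constraint by writing tests on a set of columns. The query
below returns the (userId, movieId) pairs that appear more than once :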

SELECT userId, movieId, COUNT(*) as n

FROM `sql-class-mines-nancy-2023.movielens.ratings`
GROUP BY 1, 2
HAVING n > 1

 Data Build Tool (dbt) : a booming tool in data teams

Framework for testing : https://www.getdbt.com/product/data-testing/

Data teams are importing good practices from software engineering to improve
quality management :
- Modular data modelling
- Documentation best practices

There is a whole ecosystem around dbt. It has been valued at $4 billion
and start-ups are built around it
(https://www.castordoc.com/, https://www.siffletdata.com/).

 NoSQL database and document format

NoSQL databases sacrifice data consistency for more horizontal scalability
combined with very good latency. Because NoSQL databases do not enforce
schemas and relations, they become a nightmare in terms of data quality if
their design is not well managed.

Example :
https://medium.com/partoo/partoo-migrates-from-mongodb-to-postgresql-43c60854bebb

The document format is nevertheless very popular and is supported by other
databases :
- PostgreSQL supports documents (without schema)
- BigQuery supports documents :
with schema (arrays and structs) or without schema (JSON functions)

Figure 14 : Example of BigQuery document support with schema

Session 5 (01/02 Morning)

 Lecture : BigQuery optimization and data loading

 Data partitioning in BigQuery and cost optimization

BigQuery shares public datasets, such as the Google Trends dataset, so that
people can practise on BigQuery.

https://console.cloud.google.com/bigquery?hl=fr&project=sql-class-mines-nancy-2023&ws=!1m4!1m3!3m2!1sbigquery-public-data!2swikipedia

# union operator : stack the rows of two queries that return the same columns
SELECT order_date, order_id, revenue
FROM command1
UNION ALL # or UNION DISTINCT to remove duplicate rows
(SELECT CAST(datetime AS DATE) AS order_date,
orderID AS order_id,
revenue
FROM command2)

Summary
- On BigQuery, you are billed on the amount of data your query processes
- When you run a query on a table, BigQuery scans all the rows of the
table but reads only the columns the query needs.
- BigQuery lets you partition your table in order to optimize queries
(saving money and not wasting server resources)
 https://cloud.google.com/bigquery/docs/querying-partitioned-tables
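The effect of column selection and partitioning on the volume processed can be reasoned about with a toy model. The sketch below is not BigQuery's actual billing formula: the function name, the per-column sizes and the 10% partition fraction are all hypothetical values chosen for illustration.

```python
def estimated_bytes_scanned(column_sizes, selected_columns, partition_fraction=1.0):
    """Toy model: a columnar engine reads only the referenced columns,
    and only the fraction of the table kept by partition pruning."""
    total = sum(column_sizes[name] for name in selected_columns)
    return round(total * partition_fraction)

# Hypothetical per-column storage sizes, in bytes, for a ratings table.
column_sizes = {"userId": 200_000_000, "movieId": 200_000_000,
                "rating": 200_000_000, "timestamp": 200_000_000}

full_scan = estimated_bytes_scanned(column_sizes, list(column_sizes))  # SELECT *
two_columns = estimated_bytes_scanned(column_sizes, ["movieId", "rating"])
# Same two columns, but a partition filter keeps 10% of the table.
pruned = estimated_bytes_scanned(column_sizes, ["movieId", "rating"], 0.1)
print(full_scan, two_columns, pruned)
```

Under these assumptions, avoiding SELECT * halves the volume and the partition filter divides it by ten again, which is the kind of saving the summary above refers to.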

 How does indexing work in a relational database ?

An index works like the index or the table of contents of a book. If you
are looking for a concept in a book, instead of reading the full book, you
read the index to get the page number. Indexing takes space (physical storage)
and time (the index is built at creation and maintained at every update).

Indexes are a common way to enhance database performance. An index allows the
database server to find and retrieve specific rows much faster than it could
do without an index. But indexes also add overhead to the database system as
a whole, so they should be used sensibly.
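A relational engine can show whether it plans to read the whole table or to go through an index. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module; the table content, the index name `idx_ratings_movie` and the `query_plan` helper are invented for illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                 [(u, u % 50, 3.0) for u in range(1000)])  # invented rows

def query_plan(sql):
    """Return SQLite's textual description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

lookup = "SELECT * FROM ratings WHERE movieId = 7"
plan_without_index = query_plan(lookup)   # expected to be a full table scan
conn.execute("CREATE INDEX idx_ratings_movie ON ratings (movieId)")
plan_with_index = query_plan(lookup)      # expected to search through the index
print(plan_without_index)
print(plan_with_index)
```

The index has a cost too: it occupies storage and must be updated on every insert, which is the overhead mentioned above.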

 How to load data into BigQuery ?

https://cloud.google.com/bigquery/docs/loading-data#loading_denormalized_nested_and_repeated_data

 Syntax tips

- Date

https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions

Example : creating a moving date filter

WHERE date_session > DATE_ADD(CURRENT_DATE(),INTERVAL -3 DAY)

WITH ga_running_sum AS (
/* STEP 1 : Compute Running Sum */
SELECT {...}
), ga_rank AS (
/* STEP 2 : Compute Rank */
SELECT * {...}
FROM ga_running_sum
WHERE date_session > DATE_ADD(CURRENT_DATE(), INTERVAL -3 DAY)
)
SELECT /* STEP 3 : Compute Evolution */ * {...}
FROM ga_rank
WHERE date_session = (SELECT MAX(date_session)
FROM `sql-class-mines-nancy-2023.a_bq_window_function.stats_ga`)

- Regex

https://www.dataquest.io/blog/regex-cheatsheet/ (syntax)
https://pythex.org/ (to test my regex)

Example : extract the year from the title

We use the BigQuery ‘REGEXP_EXTRACT’ function

1) With pythex, you identify the regex that matches
2) Then, you implement it in BigQuery

CAST(REPLACE(REPLACE(SAFE.REGEXP_EXTRACT(title,
r'\([0-9]{4}\)'),'(',''),')','') AS INT64) AS m_released_year
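The same extraction can be tried locally with Python's re module before porting it to BigQuery. In this sketch the function name and the sample titles are chosen for illustration; a capture group returns the four digits directly, so no parenthesis stripping is needed afterwards.

```python
import re

# A 4-digit number wrapped in literal parentheses; the inner group captures the digits.
YEAR_PATTERN = re.compile(r"\(([0-9]{4})\)")

def extract_release_year(title):
    """Return the release year found in a title such as 'Toy Story (1995)', else None."""
    match = YEAR_PATTERN.search(title)
    return int(match.group(1)) if match else None

print(extract_release_year("Toy Story (1995)"))
print(extract_release_year("Untitled project"))
```

Requiring the parentheses avoids picking up a year that is part of the title itself, as in "2001: A Space Odyssey (1968)".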

Session 6 (01/02 Afternoon)

 SQL – Window function module

How to do a rank ?

# We want to rank each customer's orders by order date

SELECT *, RANK() OVER (PARTITION BY customer_id ORDER BY date ASC) as order_number
FROM orders

For each customer, we number the orders by increasing order date (this
pattern is similar to an aggregation function) :
- GROUP BY is replaced by PARTITION BY
- ORDER BY is required when using a ranking function
- OVER() declares the use of an analytic function

Analytic functions are the only way to use functions that depend on the
order of the rows (and therefore require an ORDER BY clause).

To finish with this example, beware of the difference between RANK() and
ROW_NUMBER() :
- RANK() : if 2 orders have been placed on the same date, they will have
the same rank.
- ROW_NUMBER() : they will have different row numbers (assigned arbitrarily).
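The difference between the two functions shows up as soon as there is a tie. The sketch below runs on SQLite (3.25 or later, for window function support) through Python's sqlite3 module; the three orders are invented, and the date column is called `order_date` here rather than `date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2023-01-10"), (1, "2023-01-10"),  # two orders on the same day (a tie)
    (1, "2023-01-15"),
])

rows = conn.execute("""
    SELECT order_date,
           RANK()       OVER (PARTITION BY customer_id ORDER BY order_date) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
    FROM orders
""").fetchall()

ranks = sorted(row[1] for row in rows)
row_numbers = sorted(row[2] for row in rows)
print(ranks, row_numbers)
```

With RANK() the tied orders share rank 1 and the next order jumps to rank 3, whereas ROW_NUMBER() always produces 1, 2, 3 without saying which tied row gets which number.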

You can also use OVER() with aggregation functions that do not require
ordering (SUM()/AVG()/…) to simplify your query :

CREATE OR REPLACE TABLE z_romain.ta_ratings_movies AS

SELECT title as m_title,
genres as m_genres,
COUNT(*) OVER (PARTITION BY movieId) AS nb_ratings_by_movie,
rating,
userId,
COUNT(*) OVER (PARTITION BY userId) AS nb_ratings_by_user
FROM `sql-class-mines-nancy-2023.movielens.ratings`
INNER JOIN `sql-class-mines-nancy-2023.movielens.movies` as movie
USING(movieId)

# Focus on the running_sum operator

SELECT item, purchases, category, SUM(purchases)
OVER (
PARTITION BY category
ORDER BY purchases
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS total_purchases
FROM Produce

General syntax
- PARTITION BY : breaks up the input rows into separate partitions, over
which the window function is independently evaluated.
- ORDER BY : defines how rows are ordered within a partition. This clause
is optional in most situations but is required in some cases for
navigation functions.
- WINDOW_FRAME_CLAUSE : (for aggregate analytics functions) defines the
window frame within the current partition. The window frame determines
what to include in the window. If this clause is used, ORDER BY is
required except for fully unbounded windows.

We will work with ROWS : it computes the window frame based on physical
offsets from the current row. For example, you could include two rows before
and after the current row.

ROWS BETWEEN A AND A'
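To make the frame bounds concrete, the sketch below computes a running sum and a 2-row moving sum over five invented daily values. It runs on SQLite (3.25 or later) through Python's sqlite3 module; the `stats` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute("CREATE TABLE stats (day INTEGER, sessions INTEGER)")
conn.executemany("INSERT INTO stats VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])  # invented data

rows = conn.execute("""
    SELECT day,
           SUM(sessions) OVER (ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
           SUM(sessions) OVER (ORDER BY day
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_2_days
    FROM stats
    ORDER BY day
""").fetchall()

running_sum = [row[1] for row in rows]
last_2_days = [row[2] for row in rows]
print(running_sum, last_2_days)
```

Note that ROWS counts physical rows, not calendar days: if a day is missing from the table, "1 PRECEDING" is simply the previous row that exists.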

# Let’s discover the data

SELECT * FROM `sql-class-mines-nancy-2023.a_bq_window_function.stats_ga` LIMIT 1000

How to do it step by step ?

Metrics we need to compute :

- Sessions 2 days : sessions from the last two days
- Rank 30 days
- Rank 7 days
- Progression rank 30 days = rank 2 days – rank 30 days
- Progression rank 7 days = rank 2 days – rank 7 days

Query steps :
1. Compute the sessions from the last X days
(as a convention, name them : entrances_1_2, entrances_1_7,…)
2. From there, compute the rank()
3. Compute the difference
4. Filter (here, on the most recent date)

One possible way to code it

WITH ga_running_sum AS
( /* STEP 1 : Compute Running Sum */
SELECT
*
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 1 PRECEDING AND 0 PRECEDING) as entrances_1_2
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 6 PRECEDING AND 0 PRECEDING) as entrances_1_7
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 29 PRECEDING AND 0 PRECEDING) as entrances_1_30
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 3 PRECEDING AND 2 PRECEDING) as entrances_3_4
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 13 PRECEDING AND 7 PRECEDING) as entrances_7_14
,SUM(organic_entrances) OVER (PARTITION BY entity_theme,entity_id ORDER BY date_session ROWS BETWEEN 59 PRECEDING AND 30 PRECEDING) as entrances_30_60
FROM `sql-class-mines-nancy-2023.a_bq_window_function.stats_ga`
),ga_rank AS
( /* STEP 2 : Compute Rank */
SELECT
*
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_1_2 DESC) as rank_entrances_1_2
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_1_7 DESC) as rank_entrances_1_7
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_1_30 DESC) as rank_entrances_1_30
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_3_4 DESC) as rank_entrances_3_4
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_7_14 DESC) as rank_entrances_7_14
,RANK() OVER (PARTITION BY entity_theme,date_session ORDER BY entrances_30_60 DESC) as rank_entrances_30_60
FROM ga_running_sum
WHERE date_session > DATE_ADD(CURRENT_DATE(),INTERVAL -3 DAY)
)
SELECT
/* STEP 3 : Compute Evolution */
*
,ROUND(rank_entrances_3_4 - rank_entrances_1_2) as evol_rank_entrances_3_4_vs_rank_entrances_1_2
,ROUND(rank_entrances_7_14 - rank_entrances_1_7) as evol_rank_entrances_7_14_vs_rank_entrances_1_7
,ROUND(rank_entrances_30_60 - rank_entrances_1_30) as evol_rank_entrances_30_60_vs_rank_entrances_1_30
,ROUND(rank_entrances_1_7 - rank_entrances_1_2) as evol_rank_entrances_1_7_vs_rank_entrances_1_2
,ROUND(rank_entrances_1_30 - rank_entrances_1_2) as evol_rank_entrances_1_30_vs_rank_entrances_1_2
FROM ga_rank
WHERE date_session = (SELECT MAX(date_session) FROM `sql-class-mines-nancy-2023.a_bq_window_function.stats_ga`)

 Lecture : Useful software engineering knowledge for data

 Document format and NoSQL database

The document format you discovered in your MongoDB exercises is a very popular
one. You will find it as well in Python (cf. the dictionary data structure). It
has since been implemented in BigQuery and in relational databases like
PostgreSQL. MongoDB sacrifices data consistency for more horizontal
scalability combined with very good latency.

 REST API and JSON response

The document format is a widely used standard. It is very popular in REST APIs :

- Popular tool to analyse API responses : https://www.postman.com/
- The requests module in Python lets you query an API (and load the data
into a dictionary, for example)

What is an API ?
- It is an abstract term describing a communication protocol with a
machine.
- There are various ways to do it : REST APIs are very popular and can
return data in various formats (XML, CSV files, …), JSON being one of
the most popular.
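The link between a JSON response and a Python dictionary can be shown without any network call. In this sketch the response body is invented (it does not come from a real endpoint) and is decoded with the standard json module.

```python
import json

# Invented JSON body, shaped like what a REST endpoint might return.
response_body = """
{
  "movieId": 1,
  "title": "Toy Story (1995)",
  "genres": ["Adventure", "Animation", "Children"],
  "stats": {"nb_ratings": 3, "avg_rating": 4.0}
}
"""

movie = json.loads(response_body)          # JSON object -> Python dict
first_genre = movie["genres"][0]           # JSON array  -> Python list
nb_ratings = movie["stats"]["nb_ratings"]  # nested document -> nested dict
print(first_genre, nb_ratings)
```

When the third-party requests library is used against a real API, calling `.json()` on the response performs the same decoding and returns the same kind of nested dictionaries and lists.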

 Companies use SaaS connectors to query APIs

Using a SaaS connector to query APIs lets you reuse scripts already
developed by others.

Examples :
- At Webedia : https://rivery.io/
- Another tool very popular among start-ups : https://airbyte.com/

 Versioning

Versioning tracks the successive modifications made to a code base. It is
especially useful :
- When several people work on the same project.
- To analyse the impact of some modifications later on.

Git is a free and open-source distributed version control system designed to
handle everything from small to very large projects with speed and efficiency.
It works by tracking the differences : which lines are added and which lines
are removed.
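The idea of a line-level difference can be illustrated with Python's standard difflib module. This is only a sketch of what a diff looks like, not a description of Git's internals; the two versions of the query and the file names are invented.

```python
import difflib

old_version = ["SELECT movieId, COUNT(*)", "FROM ratings", "GROUP BY 1"]
new_version = ["SELECT movieId, COUNT(*) AS nb_ratings", "FROM ratings", "GROUP BY 1"]

# unified_diff yields header lines, then '-' for removed and '+' for added lines.
diff = list(difflib.unified_diff(old_version, new_version,
                                 fromfile="query_v1.sql", tofile="query_v2.sql",
                                 lineterm=""))
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
print("\n".join(diff))
```

A modified line appears as one removal plus one addition, which is also how `git diff` presents an edit.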

On top of it, private companies offer services to host Git repositories and
interact with them :
- https://github.com/ is one of the most popular (it was bought by
Microsoft for 7.5 billion dollars in 2018).
- Another example : https://about.gitlab.com/

 Lecture : No code / low code and tool selection

 No code objective

The objective of no code is to enable a faster development cycle by using
tools accessible through a high-level interface (Datastudio and BI tools).

How ?
Reusing components/macros already coded by others allows you to spend more
time focusing on your problem and its impact, rather than on “logistics”.

By reducing the number of areas of expertise needed for a project, you can :
- Focus on hiring people that fit your specific issues
- Reduce the number of stakeholders in the project, easing
communication and decision making.

 Gojob example

https://gojob.com/

Gojob is a temporary staffing (“intérim”) company. It has invested a lot in
tech to increase its operational efficiency by optimizing sourcing and
matching between companies and job seekers.

In collaboration with the ops team, they design and test new processes,
building the tools these processes require with no code solutions.

What Gojob looks for in its no-coders :

- Product skills : understand the problem and target the impact.

- Data skills : being able to access company data (understanding how it
is organized, being able to query it).
- Maker : configure no code tools.

 Strapi

Strapi is a no code / low code tool that enables you to create :

- A database
- A user interface to feed the database
- A Rest API to query the database

To do so, you configure a relational model through a no code interface. You
need to define the objects, their schemas, and the relations between them.

To test it in the cloud : https://strapi.io/demo

 Open-source trend

Many recent tools are available in an open-source version and are monetized
through an additional enterprise plan.
Examples of recent tools funded by VC : dbt, Strapi.

 Make or buy trade-off and some criteria to select vendor

Why go for the “buy” ?

- The feature is a commodity, so it is not strategic
- Speed/time to market

Why go for the “make” ?
- Nothing fits your need
- Too expensive / you already have the expertise to develop the project
- Too strategic for the company : it should be a strategic advantage vs
the rest of the market

Checklist to consider for this trade-off at Webedia

1. Customization allows us to tweak the tool to Webedia’s specific extra
needs :
o Usually 80% of our needs are shared with other customers, we need
to be able to adapt the tool to solve the remaining 20%
o Usually customization = tool open through an API (<> open source)
2. Data ownership : the data belongs to Webedia and is easily accessible
3. Pricing considerations
o Costs should not explode if usage grows
o Avoid the “vendor” lock-in effect : be in a position to leave if
the price is too high
To illustrate it with Google Cloud Platform : there are competitors
with similar services, and GCP follows market standards in terms of
product design.
4. Viability of the vendor allows for a long-term collaboration
5. External strategic factors
o Retailers do not want to be Amazon Web Services customers
o Webedia went for Google Cloud Platform because of more native
integrations with Google data products and because Webedia is big
enough to be considered in negotiation (but too small to be
considered as a competitor)
