8-Wide Column Database and Document Database-25!01!2025

The document explains various SQL window functions including NTILE(), CUME_DIST(), ROW_NUMBER(), AVG(), SUM(), COUNT(), MIN(), MAX(), and LEAD(). Each function is described with its purpose, syntax, and examples using a dataset of workers, demonstrating how to partition and analyze data based on different criteria. The document provides SQL queries to illustrate how these functions can be applied to calculate distributions, averages, sums, and differences within specified partitions.

Uploaded by

Rahul


NTILE ()

• The SQL NTILE() function divides the rows of each ordered partition into the number of buckets given by
the expression and assigns a bucket number to each row.
• The buckets are numbered from 1 to the value of the expression, which must evaluate to a
positive integer for each partition. When the rows do not divide evenly, the lower-numbered buckets get one extra row each.
• For example, the following query allocates the rows of each department to three buckets.
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, NTILE(3) OVER (PARTITION BY DEPTNAME ORDER BY
SALARY) AS BUCKETS FROM workers;
ENAME EID DEPTID DEPTNAME SALARY BUCKETS
Niya 38 308 HR 45,000 1
Bobby 17 308 HR 58,000 2
Reyon 16 305 Testing 30,000 1
Jerry 15 305 Testing 35,000 2
Alice 18 305 Testing 45,000 3
John 11 301 Workshop 30,000 1
Tom 24 301 Workshop 50,000 2
Bob 22 301 Workshop 51,000 3
NTILE ()
• If the PARTITION BY clause is omitted from the above query, the whole result set is treated as a
single partition, so the eight rows are split into three buckets of 3, 3, and 2 rows:
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, NTILE(3) OVER (ORDER BY SALARY)
AS BUCKETS FROM workers;

ENAME EID DEPTID DEPTNAME SALARY BUCKETS

John 11 301 Workshop 30,000 1
Reyon 16 305 Testing 30,000 1
Jerry 15 305 Testing 35,000 1
Niya 38 308 HR 45,000 2
Alice 18 305 Testing 45,000 2
Tom 24 301 Workshop 50,000 2
Bob 22 301 Workshop 51,000 3
Bobby 17 308 HR 58,000 3
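Both NTILE() queries can be reproduced with a minimal runnable sketch. It assumes SQLite 3.25 or later (the release that added window functions) accessed through Python's built-in sqlite3 module; the slides do not name a database system. EID is added to ORDER BY as a hypothetical tie-breaker so that equal salaries are always processed in the same order; with this data the bucket numbers match the tables either way.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

# NTILE(3) inside each department (EID only breaks salary ties)
per_dept = dict(con.execute(
    "SELECT ENAME, NTILE(3) OVER (PARTITION BY DEPTNAME ORDER BY SALARY, EID)"
    " FROM workers"))

# NTILE(3) over the whole table: 8 rows are split into buckets of 3, 3 and 2
overall = dict(con.execute(
    "SELECT ENAME, NTILE(3) OVER (ORDER BY SALARY, EID) FROM workers"))

print(per_dept)
print(overall)
```

Each dictionary maps a name to its bucket number; the second one shows the uneven 3, 3, 2 split, with the extra rows going to the lower-numbered buckets.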

CUME_DIST ()
• The SQL window function CUME_DIST() returns the cumulative distribution of a value within a partition
of values.
• The cumulative distribution of a value is the number of rows whose value is less than or equal to
(<=) the current row's value, divided by the total number of rows:
• CUME_DIST = N / total_rows
• where N is the number of rows with a value less than or equal to the current row's value, and total_rows is the number of
rows in the partition (or in the whole result set if there is no PARTITION BY). The function returns a value greater than 0 and less than or equal to 1.
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, CUME_DIST() OVER (PARTITION BY DEPTNAME ORDER
BY SALARY) AS CUME_DIST_VALUE FROM workers;
ENAME EID DEPTID DEPTNAME SALARY CUME_DIST_VALUE
Niya 38 308 HR 45,000 0.5
Bobby 17 308 HR 58,000 1
Reyon 16 305 Testing 30,000 0.3333333333333333
Jerry 15 305 Testing 35,000 0.6666666666666666
Alice 18 305 Testing 45,000 1
John 11 301 Workshop 30,000 0.3333333333333333
Tom 24 301 Workshop 50,000 0.6666666666666666
Bob 22 301 Workshop 51,000 1
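The formula N / total_rows can be checked against the database's own CUME_DIST(). This sketch assumes SQLite 3.25 or later through Python's sqlite3 module and reuses the workers rows from the slides; the manual loop simply applies the formula to each department.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

# CUME_DIST() as computed by the database
sql_cd = dict(con.execute(
    "SELECT ENAME, CUME_DIST() OVER (PARTITION BY DEPTNAME ORDER BY SALARY)"
    " FROM workers"))

# The same value from the formula N / total_rows, evaluated per department
manual_cd = {}
for name, _eid, _deptid, dept, salary in WORKERS:
    partition = [s for (_, _, _, d, s) in WORKERS if d == dept]
    n = sum(1 for s in partition if s <= salary)  # rows with value <= current value
    manual_cd[name] = n / len(partition)

for name in sorted(sql_cd):
    print(f"{name:6} {sql_cd[name]:.4f} {manual_cd[name]:.4f}")
```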
ROW_NUMBER ()
• The SQL window function ROW_NUMBER() assigns a sequential number, starting at 1, to each
row within a partition, in the order given by the ORDER BY clause.
• SELECT ROW_NUMBER() OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS
ROW_NUM, DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

ROW_NUM DEPTNAME DEPTID SALARY ENAME EID

1 HR 308 58,000 Bobby 17
2 HR 308 45,000 Niya 38
1 Testing 305 45,000 Alice 18
2 Testing 305 35,000 Jerry 15
3 Testing 305 30,000 Reyon 16
1 Workshop 301 51,000 Bob 22
2 Workshop 301 50,000 Tom 24
3 Workshop 301 30,000 John 11
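A sketch of the ROW_NUMBER() query and one common follow-up, picking the top earner in each department, assuming SQLite 3.25 or later through Python's sqlite3 module. The filter on ROW_NUM sits in an outer query because window functions are evaluated after WHERE.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

ranked = con.execute(
    "SELECT ROW_NUMBER() OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS ROW_NUM,"
    " DEPTNAME, ENAME, SALARY FROM workers ORDER BY DEPTNAME, ROW_NUM").fetchall()

# Keep only row 1 of each partition, i.e. the highest-paid worker per department.
# The window result is filtered in an outer query because WHERE runs first.
top_paid = con.execute(
    "SELECT DEPTNAME, ENAME, SALARY FROM ("
    " SELECT *, ROW_NUMBER() OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS ROW_NUM"
    " FROM workers) WHERE ROW_NUM = 1 ORDER BY DEPTNAME").fetchall()

for row in ranked:
    print(row)
print(top_paid)
```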

AVG()
• A window function performs a calculation across a set of table rows that are related to the current row.
• Unlike GROUP BY, a window function does not collapse rows into a single output row; the rows keep their separate
identities. The window function can also access rows other than the current row of the query result.
• To calculate the average value of each partition, we can use the window function AVG(). To calculate the average salary in each
department, we can write the query as follows:
• SELECT AVG(SALARY) OVER (PARTITION BY DEPTNAME) AS AVG_SALARY, DEPTNAME, DEPTID, SALARY, ENAME, EID
FROM workers;
AVG_SALARY DEPTNAME DEPTID SALARY ENAME EID
51,500.0000 HR 308 45,000 Niya 38
51,500.0000 HR 308 58,000 Bobby 17
36,666.6667 Testing 305 35,000 Jerry 15
36,666.6667 Testing 305 45,000 Alice 18
36,666.6667 Testing 305 30,000 Reyon 16
43,666.6667 Workshop 301 30,000 John 11
43,666.6667 Workshop 301 50,000 Tom 24
43,666.6667 Workshop 301 51,000 Bob 22
AVG()
• Adding an ORDER BY clause alongside PARTITION BY turns AVG() into a running (cumulative) average: by default,
the window runs from the first row of the partition up to the current row (including any rows tied with it).
• SELECT AVG(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS AVG_SALARY,
DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

AVG_SALARY DEPTNAME DEPTID SALARY ENAME EID

58,000.0000 HR 308 58,000 Bobby 17
51,500.0000 HR 308 45,000 Niya 38
45,000.0000 Testing 305 45,000 Alice 18
40,000.0000 Testing 305 35,000 Jerry 15
36,666.6667 Testing 305 30,000 Reyon 16
51,000.0000 Workshop 301 51,000 Bob 22
50,500.0000 Workshop 301 50,000 Tom 24
43,666.6667 Workshop 301 30,000 John 11
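The two AVG() queries differ only in their window frame. The sketch below computes both and also spells out the frame that ORDER BY implies by default, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. It assumes SQLite 3.25 or later through Python's sqlite3 module; the aliases DEPT_AVG, RUNNING_AVG, and RUNNING_AVG_EXPLICIT are illustrative.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

rows = con.execute(
    "SELECT ENAME,"
    " AVG(SALARY) OVER (PARTITION BY DEPTNAME) AS DEPT_AVG,"
    " AVG(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS RUNNING_AVG,"
    " AVG(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC"
    "   RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RUNNING_AVG_EXPLICIT"
    " FROM workers").fetchall()

dept_avg = {name: a for name, a, _, _ in rows}      # whole-partition average
running_avg = {name: r for name, _, r, _ in rows}   # default frame with ORDER BY
explicit_avg = {name: e for name, _, _, e in rows}  # same frame, written out

for name, a, r, e in rows:
    print(f"{name:6} {a:10.4f} {r:10.4f} {e:10.4f}")
```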

SUM()
• The SUM() window function returns the sum of the input column or expression over the rows in
each partition.
• For example, to calculate the total salary of the workers in each department, we can write the query as
follows:
• SELECT SUM(SALARY) OVER (PARTITION BY DEPTNAME) AS SUM_SALARY, DEPTNAME, DEPTID, SALARY,
ENAME, EID FROM workers;

SUM_SALARY DEPTNAME DEPTID SALARY ENAME EID

103,000 HR 308 45,000 Niya 38
103,000 HR 308 58,000 Bobby 17
110,000 Testing 305 35,000 Jerry 15
110,000 Testing 305 45,000 Alice 18
110,000 Testing 305 30,000 Reyon 16
131,000 Workshop 301 30,000 John 11
131,000 Workshop 301 50,000 Tom 24
131,000 Workshop 301 51,000 Bob 22
SUM()
• To calculate a running (cumulative) total of salaries in each department, we can add an ORDER
BY clause to the above query:
• SELECT SUM(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS
SUM_SALARY, DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

SUM_SALARY DEPTNAME DEPTID SALARY ENAME EID

58,000 HR 308 58,000 Bobby 17
103,000 HR 308 45,000 Niya 38
45,000 Testing 305 45,000 Alice 18
80,000 Testing 305 35,000 Jerry 15
110,000 Testing 305 30,000 Reyon 16
51,000 Workshop 301 51,000 Bob 22
101,000 Workshop 301 50,000 Tom 24
131,000 Workshop 301 30,000 John 11
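The running total can be cross-checked in plain Python, which makes the frame concrete. This sketch assumes SQLite 3.25 or later through Python's sqlite3 module and uses an explicit ROWS frame; because no two salaries tie within a department here, it gives the same result as the default RANGE frame used in the slide's query.

```python
import sqlite3
from itertools import accumulate

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

running = con.execute(
    "SELECT DEPTNAME, ENAME, SALARY,"
    " SUM(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC"
    "   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SUM_SALARY"
    " FROM workers ORDER BY DEPTNAME, SALARY DESC").fetchall()

# Cross-check: a running total is itertools.accumulate over each
# department's salaries sorted in descending order
expected = []
for dept in sorted({w[3] for w in WORKERS}):
    salaries = sorted((w[4] for w in WORKERS if w[3] == dept), reverse=True)
    expected.extend(accumulate(salaries))

print([r[3] for r in running])
print(expected)
```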

COUNT()
• The COUNT() window function counts the non-NULL values of the expression in each
partition (COUNT(*) counts all rows). To count the employees in each department, we can write the query as follows:
• SELECT COUNT(ENAME) OVER (PARTITION BY DEPTNAME) AS COUNT_ENAME,
DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

COUNT_ENAME DEPTNAME DEPTID SALARY ENAME EID

2 HR 308 45,000 Niya 38
2 HR 308 58,000 Bobby 17
3 Testing 305 35,000 Jerry 15
3 Testing 305 45,000 Alice 18
3 Testing 305 30,000 Reyon 16
3 Workshop 301 30,000 John 11
3 Workshop 301 50,000 Tom 24
3 Workshop 301 51,000 Bob 22

MIN() and MAX()
• The aggregate window functions MIN() and MAX() return the minimum and maximum values
of an expression within a specified window.
• The following query will return the maximum and minimum salaries of workers in each
department.
• SELECT DEPTNAME, DEPTID, SALARY, ENAME, EID, MAX(SALARY) OVER (PARTITION BY
DEPTNAME) AS MAX_SAL, MIN(SALARY) OVER (PARTITION BY DEPTNAME) AS MIN_SAL FROM
workers;
DEPTNAME DEPTID SALARY ENAME EID MAX_SAL MIN_SAL
HR 308 45,000 Niya 38 58,000 45,000
HR 308 58,000 Bobby 17 58,000 45,000
Testing 305 35,000 Jerry 15 45,000 30,000
Testing 305 45,000 Alice 18 45,000 30,000
Testing 305 30,000 Reyon 16 45,000 30,000
Workshop 301 30,000 John 11 51,000 30,000
Workshop 301 50,000 Tom 24 51,000 30,000
Workshop 301 51,000 Bob 22 51,000 30,000
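A sketch computing COUNT(), MAX(), and MIN() over the same department partitions, assuming SQLite 3.25 or later through Python's sqlite3 module. It then inserts one hypothetical HR row with a NULL name (EID 99, not part of the slides' data) to show that COUNT(ENAME) skips NULLs while COUNT(*) counts every row.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

rows = con.execute(
    "SELECT ENAME, COUNT(ENAME) OVER (PARTITION BY DEPTNAME) AS COUNT_ENAME,"
    " MAX(SALARY) OVER (PARTITION BY DEPTNAME) AS MAX_SAL,"
    " MIN(SALARY) OVER (PARTITION BY DEPTNAME) AS MIN_SAL"
    " FROM workers").fetchall()
stats = {name: (cnt, mx, mn) for name, cnt, mx, mn in rows}

# Hypothetical extra HR row with a NULL name (not part of the slides' data)
con.execute("INSERT INTO workers VALUES (NULL, 99, 308, 'HR', 40000)")
counts = set(con.execute(
    "SELECT DEPTNAME,"
    " COUNT(ENAME) OVER (PARTITION BY DEPTNAME) AS NON_NULL_NAMES,"
    " COUNT(*) OVER (PARTITION BY DEPTNAME) AS ALL_ROWS"
    " FROM workers WHERE DEPTNAME = 'HR'"))  # set() collapses the repeated rows

print(stats)
print(counts)
```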
LEAD()
• The SQL LEAD() function gives access to a column value in a row at a specified physical
offset after the current row.
• For example, from the current row, LEAD() can read data from the next row, the second row after
the current row, the third row after the current row, and so on.
• The LEAD() function syntax is given below:
• LEAD(return_value [, offset [, default]])
OVER (
PARTITION BY ...
ORDER BY ...
)
• In the above syntax, return_value is the column or expression to read from the row that lies offset rows
after the current row, and offset is the number of rows forward from the current row from which to access
data.
• The offset must be a non-negative integer. If it is not specified, it defaults to 1.
• When the offset goes beyond the end of the partition, the function returns default. If default is
not specified, NULL is returned.
LEAD()
• The LEAD() function is applied to each partition created by the PARTITION BY clause. If the PARTITION
BY clause is omitted, the whole result set is treated as a single partition.
• The ORDER BY clause sorts the rows in each partition, and LEAD() follows that order. The following query
returns the salary of the next person in the department; if there is no next person, it returns NULL.
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, LEAD(SALARY) OVER (PARTITION BY DEPTNAME
ORDER BY SALARY) AS NEXT_PERSON_SALARY FROM workers;

ENAME EID DEPTID DEPTNAME SALARY NEXT_PERSON_SALARY

Niya 38 308 HR 45,000 58,000
Bobby 17 308 HR 58,000 NULL
Reyon 16 305 Testing 30,000 35,000
Jerry 15 305 Testing 35,000 45,000
Alice 18 305 Testing 45,000 NULL
John 11 301 Workshop 30,000 50,000
Tom 24 301 Workshop 50,000 51,000
Bob 22 301 Workshop 51,000 NULL
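The optional offset and default arguments can be tried directly. This sketch assumes SQLite 3.25 or later through Python's sqlite3 module; LEAD(SALARY, 2, 0) reads the salary two rows ahead in the department and returns 0 instead of NULL when that row does not exist. The alias SECOND_NEXT_SAL is illustrative.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

rows = con.execute(
    "SELECT ENAME,"
    " LEAD(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY) AS NEXT_SAL,"
    " LEAD(SALARY, 2, 0) OVER (PARTITION BY DEPTNAME ORDER BY SALARY) AS SECOND_NEXT_SAL"
    " FROM workers").fetchall()

# NULL from SQL arrives as None in Python; the explicit default 0 replaces it
lead = {name: (nxt, second) for name, nxt, second in rows}
for name, pair in lead.items():
    print(name, pair)
```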
LEAD()
• The LEAD() function is also very useful for calculating the difference between the value in the
current row and the value in the following row.
• The following query finds the difference between each person's salary and the next higher salary in the same department.
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, LEAD(SALARY) OVER (PARTITION BY DEPTNAME
ORDER BY SALARY)-SALARY AS SALARY_DIFFERENCE FROM workers;

ENAME EID DEPTID DEPTNAME SALARY SALARY_DIFFERENCE

Niya 38 308 HR 45,000 13,000
Bobby 17 308 HR 58,000 NULL
Reyon 16 305 Testing 30,000 5,000
Jerry 15 305 Testing 35,000 10,000
Alice 18 305 Testing 45,000 NULL
John 11 301 Workshop 30,000 20,000
Tom 24 301 Workshop 50,000 1,000
Bob 22 301 Workshop 51,000 NULL

FIRST_VALUE()
• The SQL window function FIRST_VALUE() returns the first value in an ordered partition of a result set or, more precisely, in the
window frame.
• The following query returns the first salary in each department when salaries are sorted in descending order, i.e., the department's highest salary.
• SELECT FIRST_VALUE(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS FIRST_ROW,
DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

FIRST_ROW DEPTNAME DEPTID SALARY ENAME EID

58,000 HR 308 58,000 Bobby 17
58,000 HR 308 45,000 Niya 38
45,000 Testing 305 45,000 Alice 18
45,000 Testing 305 35,000 Jerry 15
45,000 Testing 305 30,000 Reyon 16
51,000 Workshop 301 51,000 Bob 22
51,000 Workshop 301 50,000 Tom 24
51,000 Workshop 301 30,000 John 11

LAST_VALUE()
• The SQL window function LAST_VALUE() returns the last value in an ordered partition of a result set; more precisely,
it returns the last value in the current window frame.
• The following query returns the last salary value in each department ordered by salary.
• SELECT LAST_VALUE(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS LAST_ROW,
DEPTNAME, DEPTID, SALARY, ENAME, EID FROM workers;

LAST_ROW DEPTNAME DEPTID SALARY ENAME EID

58,000 HR 308 58,000 Bobby 17
45,000 HR 308 45,000 Niya 38
45,000 Testing 305 45,000 Alice 18
35,000 Testing 305 35,000 Jerry 15
30,000 Testing 305 30,000 Reyon 16
51,000 Workshop 301 51,000 Bob 22
50,000 Workshop 301 50,000 Tom 24
30,000 Workshop 301 30,000 John 11

• FIRST_VALUE is the same for every row of a partition: it equals the value in the partition's first row.
• LAST_VALUE, by contrast, changes for each record and equals the current row's value. With ORDER BY and
no explicit frame, the window frame ends at the current row, so the current row is the last row of its frame.
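The difference between the two behaviours comes from the window frame, which a small sketch can show side by side. It assumes SQLite 3.25 or later through Python's sqlite3 module; LAST_FULL (an illustrative alias) widens the frame to the whole partition, so it returns each department's lowest salary instead of the current row's.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

rows = con.execute(
    "SELECT ENAME,"
    " FIRST_VALUE(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS FIRST_ROW,"
    # default frame: first row of the partition up to the current row
    " LAST_VALUE(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC) AS LAST_DEFAULT,"
    # explicit frame covering the whole partition
    " LAST_VALUE(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY DESC"
    "   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_FULL"
    " FROM workers").fetchall()

result = {name: (first, last_d, last_f) for name, first, last_d, last_f in rows}
for name, values in result.items():
    print(name, values)
```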
LAST_VALUE()
• The following is a kind of scoreboard in which each person has their own score. To show where each person stands,
every row is given the lowest and the highest score.
• SELECT IdCol, vcName, iScore,
LAST_VALUE(iScore)
OVER (ORDER BY iScore DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS
LowestiScore,
FIRST_VALUE(iScore)
OVER (ORDER BY iScore DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS
HighestiScore
FROM tblEmpScores;

Employee Scores Table
ID NAME SCORE
1011 Scott 2100
1012 Peter 2220
1013 John 2010
1014 George 2009
1015 Thomas 2500
1016 Veronica 2110
1017 Anthony 2011
LAST_VALUE()
• If we want the last value to remain the same for all rows in the result set, we need to extend the window
frame to the whole partition with the LAST_VALUE function, using ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING (or the equivalent RANGE form used in the query above).
• UNBOUNDED PRECEDING means that the frame starts at the first row in the partition,
and UNBOUNDED FOLLOWING means that the frame ends at the last row in the
partition.
ID NAME SCORE LowestiScore HighestiScore
1011 Scott 2100 2009 2500
1012 Peter 2220 2009 2500
1013 John 2010 2009 2500
1014 George 2009 2009 2500
1015 Thomas 2500 2009 2500
1016 Veronica 2110 2009 2500
1017 Anthony 2011 2009 2500
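A runnable version of the scoreboard query, assuming SQLite 3.25 or later through Python's sqlite3 module and the table and column names tblEmpScores, IdCol, vcName, and iScore used in the query above. It uses the ROWS form of the full-partition frame, which gives the same result here as the RANGE form; ORDER BY IdCol only fixes the output order.

```python
import sqlite3

# Rows of the Employee Scores table from the slides: (IdCol, vcName, iScore)
SCORES = [(1011, "Scott", 2100), (1012, "Peter", 2220), (1013, "John", 2010),
          (1014, "George", 2009), (1015, "Thomas", 2500), (1016, "Veronica", 2110),
          (1017, "Anthony", 2011)]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE tblEmpScores (IdCol INTEGER, vcName TEXT, iScore INTEGER)")
con.executemany("INSERT INTO tblEmpScores VALUES (?, ?, ?)", SCORES)

rows = con.execute(
    "SELECT IdCol, vcName, iScore,"
    " LAST_VALUE(iScore) OVER (ORDER BY iScore DESC"
    "   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LowestiScore,"
    " FIRST_VALUE(iScore) OVER (ORDER BY iScore DESC"
    "   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS HighestiScore"
    " FROM tblEmpScores ORDER BY IdCol").fetchall()

for row in rows:
    print(row)
```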

LAG()
• We can use the SQL window function LAG() to access a previous row's data at a given
offset. It works like the LEAD() function, but in the opposite direction.
• With LEAD(), we access the values of subsequent rows; with LAG(), we access the values
of previous rows. Its syntax mirrors that of LEAD(): LAG(return_value [, offset [, default]]).
• It is useful for comparing the current row's value with the previous row's value.
• SELECT ENAME, EID, DEPTID, DEPTNAME, SALARY, LAG(SALARY) OVER (PARTITION BY
DEPTNAME ORDER BY SALARY) AS PREVIOUS_PERSON_SALARY FROM workers;
• The above query finds the salary of the previous person in each department, with
salaries sorted in ascending order.
• As there is no previous row for the first row in each department, it returns NULL for
that row.

LAG()
ENAME EID DEPTID DEPTNAME SALARY PREVIOUS_PERSON_SALARY
Niya 38 308 HR 45,000 NULL
Bobby 17 308 HR 58,000 45,000
Reyon 16 305 Testing 30,000 NULL
Jerry 15 305 Testing 35,000 30,000
Alice 18 305 Testing 45,000 35,000
John 11 301 Workshop 30,000 NULL
Tom 24 301 Workshop 50,000 30,000
Bob 22 301 Workshop 51,000 50,000
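LAG() is often combined with subtraction to get the change from the previous row, mirroring the LEAD() difference query earlier, which looks at the following row. This sketch assumes SQLite 3.25 or later through Python's sqlite3 module; GAP_FROM_PREVIOUS is an illustrative alias. The first row of each department has no predecessor, so both columns are NULL (None in Python) there.

```python
import sqlite3

# Rows of the workers table from the slides: (ENAME, EID, DEPTID, DEPTNAME, SALARY)
WORKERS = [
    ("Niya", 38, 308, "HR", 45000), ("Bobby", 17, 308, "HR", 58000),
    ("Reyon", 16, 305, "Testing", 30000), ("Jerry", 15, 305, "Testing", 35000),
    ("Alice", 18, 305, "Testing", 45000), ("John", 11, 301, "Workshop", 30000),
    ("Tom", 24, 301, "Workshop", 50000), ("Bob", 22, 301, "Workshop", 51000),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE workers (ENAME TEXT, EID INTEGER, DEPTID INTEGER,"
            " DEPTNAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?, ?)", WORKERS)

rows = con.execute(
    "SELECT ENAME,"
    " LAG(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY) AS PREVIOUS_PERSON_SALARY,"
    # SALARY - NULL is NULL, so the first row of each department stays NULL
    " SALARY - LAG(SALARY) OVER (PARTITION BY DEPTNAME ORDER BY SALARY) AS GAP_FROM_PREVIOUS"
    " FROM workers").fetchall()

lag = {name: (prev, gap) for name, prev, gap in rows}
for name, pair in lag.items():
    print(name, pair)
```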

Preparing Data for Analytics Tools
• One of the primary steps in data science is cleaning the
dataset you are working with.
• Various SQL queries can be used to clean, update, and filter data by eliminating
redundant and unwanted records. This can be done with SQL expressions and
functions such as CASE WHEN, COALESCE, NULLIF, LEAST/GREATEST, casting, and
DISTINCT.
Sales Table
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

CASE WHEN
• The CASE expression goes through the conditions specified in its WHEN clauses and returns a
value when the first condition is met.
• It works like an IF-THEN-ELSE chain. Once a condition is true, it returns the value
specified after THEN. If no condition is true, the value in the ELSE clause is returned.
• It returns NULL when no condition is true and no ELSE part is specified.
• Suppose we fetch all the data of the above sales table and want to add an extra column, summary,
that categorizes each sale as More, Avg, or Less according to its quantity. This can be done with a CASE
expression as follows:
• SELECT *,
CASE
WHEN quantity >= 10 THEN 'More'
WHEN quantity >= 6 THEN 'Avg'
ELSE 'Less'
END AS summary
FROM sales;
CASE WHEN
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

sale_no product_id quantity price customer_name summary

5,001 3 4 21,000 John Less
5,002 11 NULL 17,000 Anna Less
5,003 94 10 105,000 Tom More
5,004 86 8 27,000 Nora Avg
5,005 88 18 8,000 Tom More
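A sketch that runs the CASE query against the sales table, assuming SQLite accessed through Python's sqlite3 module. Sale 5002 is labelled Less only because NULL >= 10 and NULL >= 6 are not true, so it falls through to ELSE; the second query adds a hypothetical 'Unknown' branch that handles NULL explicitly.

```python
import sqlite3

# Rows of the sales table from the slides:
# (sale_no, product_id, quantity, price, customer_name)
SALES = [(5001, 3, 4, 21000, "John"), (5002, 11, None, 17000, "Anna"),
         (5003, 94, 10, 105000, "Tom"), (5004, 86, 8, 27000, "Nora"),
         (5005, 88, 18, 8000, "Tom")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_no INTEGER, product_id INTEGER,"
            " quantity INTEGER, price INTEGER, customer_name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# The slide's CASE expression: the first true WHEN wins, otherwise ELSE
summary = con.execute(
    "SELECT sale_no, quantity,"
    " CASE WHEN quantity >= 10 THEN 'More'"
    "      WHEN quantity >= 6 THEN 'Avg'"
    "      ELSE 'Less' END AS summary"
    " FROM sales ORDER BY sale_no").fetchall()

# Variant with a hypothetical 'Unknown' label for missing quantities
explicit = con.execute(
    "SELECT sale_no,"
    " CASE WHEN quantity IS NULL THEN 'Unknown'"
    "      WHEN quantity >= 10 THEN 'More'"
    "      WHEN quantity >= 6 THEN 'Avg'"
    "      ELSE 'Less' END"
    " FROM sales ORDER BY sale_no").fetchall()

print(summary)
print(explicit)
```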

COALESCE
• Some records in a database may contain NULL values, but when applying
statistics to these datasets, you may need to replace the NULL values with
other data. This can be done effectively with the COALESCE function.
• COALESCE returns the first non-NULL value among its arguments. In the common
two-argument form, the first argument is a column that may contain NULL, and the
second is the value that replaces NULL.
• It therefore replaces every NULL value in the column by the default value given
as the second argument.
• The following example replaces NULL by -1 in the quantity column.
• SELECT
customer_name, product_id,
COALESCE(quantity, -1) AS quantity
FROM sales;
COALESCE
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

SELECT
customer_name, product_id,
COALESCE(quantity, -1) AS quantity
FROM sales;

customer_name product_id quantity

John 3 4
Anna 11 -1
Tom 94 10
Nora 86 8
Tom 88 18
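A sketch of the COALESCE query, assuming SQLite accessed through Python's sqlite3 module. The last query shows that COALESCE is not limited to two arguments: it returns the first non-NULL value in its argument list.

```python
import sqlite3

# Rows of the sales table from the slides:
# (sale_no, product_id, quantity, price, customer_name)
SALES = [(5001, 3, 4, 21000, "John"), (5002, 11, None, 17000, "Anna"),
         (5003, 94, 10, 105000, "Tom"), (5004, 86, 8, 27000, "Nora"),
         (5005, 88, 18, 8000, "Tom")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_no INTEGER, product_id INTEGER,"
            " quantity INTEGER, price INTEGER, customer_name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# Replace NULL quantities by -1, as on the slide
filled = con.execute(
    "SELECT customer_name, product_id, COALESCE(quantity, -1) AS quantity"
    " FROM sales ORDER BY sale_no").fetchall()

# With more than two arguments, the first non-NULL one is returned
first_non_null = con.execute("SELECT COALESCE(NULL, NULL, 7, 8)").fetchone()[0]

print(filled)
print(first_non_null)
```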
NULLIF
• The NULLIF function takes two parameters and returns NULL if the first parameter
equals the second; otherwise, it returns the first parameter.
• As an example, imagine that we want to replace the product_id value 11 by NULL.
This can be done with the following query:
• SELECT sale_no, customer_name,
NULLIF(product_id, 11) AS product_id
FROM sales;

NULLIF
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

SELECT sale_no, customer_name,
NULLIF(product_id, 11) AS product_id
FROM sales;

sale_no customer_name product_id

5,001 John 3
5,002 Anna NULL
5,003 Tom 94
5,004 Nora 86
5,005 Tom 88
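A sketch of the NULLIF query, assuming SQLite accessed through Python's sqlite3 module. Because COUNT(column) ignores NULLs, comparing COUNT(*) with COUNT(NULLIF(...)) reports how many values were blanked out.

```python
import sqlite3

# Rows of the sales table from the slides:
# (sale_no, product_id, quantity, price, customer_name)
SALES = [(5001, 3, 4, 21000, "John"), (5002, 11, None, 17000, "Anna"),
         (5003, 94, 10, 105000, "Tom"), (5004, 86, 8, 27000, "Nora"),
         (5005, 88, 18, 8000, "Tom")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_no INTEGER, product_id INTEGER,"
            " quantity INTEGER, price INTEGER, customer_name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# product_id 11 becomes NULL (None in Python); other values pass through
cleaned = con.execute(
    "SELECT sale_no, customer_name, NULLIF(product_id, 11) AS product_id"
    " FROM sales ORDER BY sale_no").fetchall()

# COUNT(*) counts all rows, COUNT(expr) only non-NULL results
blanked = con.execute(
    "SELECT COUNT(*) - COUNT(NULLIF(product_id, 11)) FROM sales").fetchone()[0]

print(cleaned)
print(blanked)
```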
LEAST/GREATEST
• LEAST and GREATEST are frequently used functions for data cleaning.
• These functions return the smallest and the largest value, respectively, from a given list of
values. This makes them useful for replacing values that are too high or too low.
• For example, suppose the minimum price in the above table must be 10,000. This can be
done with the following query.
• The price 8,000 in the last row is replaced by 10,000: since 8,000 is less than
10,000, GREATEST returns the larger of the two values.
• SELECT
sale_no, product_id, quantity,
GREATEST(10000, price) AS price
FROM sales;
GREATEST
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

SELECT
sale_no, product_id, quantity,
GREATEST(10000, price) AS price
FROM sales;

sale_no product_id quantity price

5,001 3 4 21,000
5,002 11 NULL 17,000
5,003 94 10 105,000
5,004 86 8 27,000
5,005 88 18 10,000
LEAST
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

SELECT
sale_no, product_id, quantity,
LEAST(10000, price) AS price
FROM sales;

sale_no product_id quantity price

5,001 3 4 10,000
5,002 11 NULL 10,000
5,003 94 10 10,000
5,004 86 8 10,000
5,005 88 18 8,000

• LEAST(10000, price) caps every price at 10,000: only the 8,000 price in the last row is below the cap and is kept.
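SQLite, unlike several other systems, has no GREATEST() or LEAST(); its multi-argument scalar MAX() and MIN() play the same role, so this sketch (assuming SQLite through Python's sqlite3 module) uses them instead. The upper bound of 100,000 in the last query is a hypothetical value chosen to show clamping a price into a range. NULL handling differs between systems (for example, PostgreSQL's GREATEST ignores NULL arguments, while SQLite's MAX(x, y) returns NULL if any argument is NULL), so check your database's documentation.

```python
import sqlite3

# Rows of the sales table from the slides:
# (sale_no, product_id, quantity, price, customer_name)
SALES = [(5001, 3, 4, 21000, "John"), (5002, 11, None, 17000, "Anna"),
         (5003, 94, 10, 105000, "Tom"), (5004, 86, 8, 27000, "Nora"),
         (5005, 88, 18, 8000, "Tom")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_no INTEGER, product_id INTEGER,"
            " quantity INTEGER, price INTEGER, customer_name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# Multi-argument MAX/MIN in SQLite are scalar functions, not aggregates
price_floor = con.execute(  # like GREATEST(10000, price)
    "SELECT sale_no, MAX(10000, price) FROM sales ORDER BY sale_no").fetchall()
price_cap = con.execute(    # like LEAST(10000, price)
    "SELECT sale_no, MIN(10000, price) FROM sales ORDER BY sale_no").fetchall()
price_clamped = con.execute(  # keep price between 10,000 and a hypothetical 100,000
    "SELECT sale_no, MIN(MAX(price, 10000), 100000) FROM sales ORDER BY sale_no").fetchall()

print(price_floor)
print(price_cap)
print(price_clamped)
```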
DISTINCT
• The DISTINCT keyword removes duplicate rows, returning only the distinct values of the specified
column(s).
• For example, to extract all the unique customer names in the sales table, you could write
the following query:
• SELECT
DISTINCT customer_name
FROM sales;
• The DISTINCT clause can also be applied to multiple columns to get the distinct
combinations of the specified columns.
• The above query removes the duplicate name (Tom) from the customer_name column.
DISTINCT
sale_no product_id quantity price customer_name
5,001 3 4 21,000 John
5,002 11 NULL 17,000 Anna
5,003 94 10 105,000 Tom
5,004 86 8 27,000 Nora
5,005 88 18 8,000 Tom

SELECT
DISTINCT customer_name
FROM sales;

customer_name
John
Anna
Tom
Nora
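A sketch of DISTINCT on one and on two columns, assuming SQLite accessed through Python's sqlite3 module. On customer_name alone, Tom appears once; on (customer_name, product_id), both of Tom's sales survive because the combinations differ. ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Rows of the sales table from the slides:
# (sale_no, product_id, quantity, price, customer_name)
SALES = [(5001, 3, 4, 21000, "John"), (5002, 11, None, 17000, "Anna"),
         (5003, 94, 10, 105000, "Tom"), (5004, 86, 8, 27000, "Nora"),
         (5005, 88, 18, 8000, "Tom")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_no INTEGER, product_id INTEGER,"
            " quantity INTEGER, price INTEGER, customer_name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# One column: duplicate names collapse
names = [r[0] for r in con.execute(
    "SELECT DISTINCT customer_name FROM sales ORDER BY customer_name")]

# Two columns: only identical (name, product) combinations collapse
pairs = con.execute(
    "SELECT DISTINCT customer_name, product_id FROM sales"
    " ORDER BY customer_name, product_id").fetchall()

print(names)
print(pairs)
```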
Advanced NoSQL for Data Science
• NoSQL, which means “not only SQL”, is an alternative to relational databases, in which data is
stored in tables with a fixed data schema.
• NoSQL is a database design that can accommodate various data models, including key-value,
document, columnar, and graph formats.
• NoSQL databases are very useful for working with large distributed data.
• NoSQL databases emerged in the late 2000s to handle large-scale database clustering in
web and cloud applications.
• Unlike the traditional relational database model, NoSQL has a flexible schema: different records can have
different structures or attributes.
• NoSQL databases are very useful for really big data tasks because they follow
the Basically Available, Soft state, Eventual consistency (BASE) approach instead of the Atomicity,
Consistency, Isolation, and Durability properties commonly known as ACID.
• Two major drawbacks of relational (SQL) databases are rigidity when adding columns and attributes to tables, and
slow performance when many tables need to be joined or when tables store a large amount
of data.
• NoSQL databases try to overcome these two drawbacks of relational databases.
• NoSQL offers a more flexible, schema-free solution that can work with unstructured data.
Why NoSQL?
• NoSQL supports unstructured and semi-structured data.
• In many applications, an attribute needs to be added on the fly to specific rows but
not to every row, and it may have a different type from the other attributes in
those rows.
• Now let us explore some NoSQL features to understand why you might choose
NoSQL databases for data science.
• Features:
• It does not use the relational model to store data.
• It runs well on clusters.
• Most NoSQL databases are open source.
• It can handle large amounts of social media data.
• It is schema-less.
Document Databases for Data Science
• Document-based NoSQL databases store data as documents in JSON (or a JSON-like) format. Each
document is a structure of key-value pairs.
• Document-based NoSQL databases are simple for developers to use because each item maps
directly to a JSON object.
• JSON is a very common data format that web developers readily work with, and it
lets us change the structure of a document whenever required.
• Some examples of document-based NoSQL databases are CouchDB, MongoDB,
OrientDB, and BaseX.
JSON Document Format
{
  "_id": 1,
  "name": { "first": "John", "last": "Backus" },
  "contribs": [ "Fortran", "ALGOL", "Form", "FP" ],
  "awards": [
    {
      "award": "Dowell Award",
      "year": 1988,
      "by": "Computer Society"
    },
    {
      "award": "First Prize",
      "year": 1993,
      "by": "National Academy of Engineering"
    }
  ]
}
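Applications receive documents like this one as nested objects. As a minimal sketch using only Python's standard json module (not a database driver), the document can be parsed and its embedded sub-document and array of sub-documents read by key; the variable names are illustrative.

```python
import json

# The document from the slide, as JSON text
doc_text = """
{
  "_id": 1,
  "name": { "first": "John", "last": "Backus" },
  "contribs": [ "Fortran", "ALGOL", "Form", "FP" ],
  "awards": [
    { "award": "Dowell Award", "year": 1988, "by": "Computer Society" },
    { "award": "First Prize", "year": 1993, "by": "National Academy of Engineering" }
  ]
}
"""

doc = json.loads(doc_text)  # JSON objects become dicts, arrays become lists

full_name = f"{doc['name']['first']} {doc['name']['last']}"  # embedded sub-document
award_years = [a["year"] for a in doc["awards"]]               # array of sub-documents
later_awards = [a["award"] for a in doc["awards"] if a["year"] > 1990]

print(full_name, award_years, later_awards)
print(json.dumps(doc, indent=2))  # serialize back to JSON text
```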
Wide Column Databases for Data Science
• Like a relational database, a wide-column database stores data in records, but each record
can also hold a very large number of dynamic columns.
• It groups the dynamically added columns into column families.
• Instead of having multiple tables as in a relational database, a wide-column database has multiple
column families.
• Examples of wide-column databases are Cassandra and HBase.

Pattern for wide-column database.
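The pattern can be illustrated with a toy in-memory model, assuming a simple nesting of row key, then column family, then column name. It is an illustration only, not the data model or API of Cassandra or HBase; the row keys, families, and month columns are hypothetical. Two rows in the same column family can hold different columns.

```python
# Toy model: row key -> column family -> column name -> value (hypothetical data)
table = {}

def put(row_key, family, column, value):
    """Write one cell, creating the row and column family on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get_family(row_key, family):
    """Read all columns of one family for one row (empty dict if absent)."""
    return dict(table.get(row_key, {}).get(family, {}))

put("emp#11", "profile", "name", "John")
put("emp#11", "profile", "dept", "Workshop")
put("emp#11", "pay", "2024-01", 30000)

put("emp#22", "profile", "name", "Bob")
put("emp#22", "pay", "2024-01", 51000)
put("emp#22", "pay", "2024-02", 51000)  # new column added on the fly for this row only

print(get_family("emp#11", "pay"))  # {'2024-01': 30000}
print(get_family("emp#22", "pay"))  # {'2024-01': 51000, '2024-02': 51000}
```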

Graph Databases for Data Science
• A graph database stores data in the form of nodes and edges.
• Nodes store information about the main entities, such as people, places, and products, and
edges store the relationships between them.
• Graph databases are very useful for finding patterns or relationships in data, as in social
networks and recommendation engines.
• Examples of graph databases are Neo4j and Amazon Neptune.

Simple pattern for graph database.
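A toy illustration of nodes, edges, and a recommendation-style query, written in plain Python rather than a graph query language such as Neo4j's Cypher. The people and their friendship edges are hypothetical; the function suggests friends of friends, ranked by how many mutual friends they share.

```python
from collections import defaultdict

# Hypothetical nodes (people) and undirected friendship edges
edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"),
         ("Carol", "Dave"), ("Dave", "Erin")]

graph = defaultdict(set)  # adjacency list: node -> set of neighbouring nodes
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def recommend(person):
    """Friends of friends who are not yet friends, ranked by mutual-friend count."""
    counts = {}
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                counts[candidate] = counts.get(candidate, 0) + 1
    # Most mutual friends first; ties broken alphabetically
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend("Alice"))  # [('Dave', 2)]
print(recommend("Bob"))    # [('Carol', 2), ('Erin', 1)]
```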
