4 How To Build A Stock Data Management Database

The document discusses the design of an HBase database to store stock record data. It describes HBase as a sparse, multi-dimensional, sorted mapping table with column families to organize data storage. The stock record table is designed with two column families, StockInfo and Statistic, to store basic stock information and statistics respectively. The design of the rowkey is also important for load balancing, with this example using a hash of the year and month combined with the stock code and date to improve data distribution across regions. Multiple pre-partitioned regions were also created when building the table to further aid balancing. The data is then inserted into HBase from a CSV file using an import command.

Uploaded by

Rokon

HBase database model design

Unlike a traditional relational database, HBase is a sparse, multi-dimensional, sorted map.
Each table has a set of column families, and HBase organizes the physical storage of data by
column family. Each row has a row key, which acts as its primary key, and any number of columns;
rows are stored sorted by row key in lexicographic order. Data is stored in cells addressed by
row and column, and every value is an uninterpreted byte array (byte[]). Because HBase is
schemaless, rows in the same table can have completely different columns. Every cell carries a
timestamp that is set when it is written, and each update creates a new version; HBase retains a
configurable number of versions. A client can fetch the version closest to a given point in
time, or get all versions of a cell at once.

Based on these characteristics, the stock record table for this practice project is designed as
shown in Table 1. The table has two column families: StockInfo holds basic stock information,
and Statistic holds stock statistics.
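
The logical model described above can be pictured as a nested, sorted map. The following is a minimal in-memory sketch, not the HBase client API: the class name TinyTable, its methods, and the default of three retained versions are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of HBase's logical layout (not the real client API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)
        self.max_versions = max_versions  # assumed retention limit
        # rowkey -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, rowkey, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[rowkey].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest versions

    def get(self, rowkey, column, at=None):
        """Return the newest value, or the newest value with timestamp <= at."""
        for ts, value in self.rows.get(rowkey, {}).get(column, []):
            if at is None or ts <= at:
                return value
        return None

    def scan(self):
        """Return rowkeys in lexicographic (byte) order."""
        return sorted(self.rows)
```

Columns are created on first write, so two rows may hold entirely different qualifiers under the same family, and a read with at=... returns the version that was current at that timestamp.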

Table 1: Stock Record Sheet

                        Column family (StockInfo)   Column family (Statistic)
Rowkey                  Code        Name            GMV           RANGE
03_000001.SZ_20071228   000001.SZ   PA Bank         2376508379    50
03_000002.SZ_20071228   000002.SZ   WK A            601376594.4   73.3333
……
13_000004.SZ_20071228   000004.SZ   BA              2832630790    11.1484
13_000005.SZ_20071228   000005.SZ   SWY A           2935635546    12.7659

Data in HBase is sorted by rowkey.

Rowkey design
The design of the HBase row key is very important. The rowkey is the unique identifier of a row,
and the region a row is stored in is determined by which pre-partition range its rowkey falls
into. The main purpose of rowkey design is therefore to distribute data evenly across all
regions, which prevents data skew to a certain extent.

Example of a rowkey design scheme: the leading part of the rowkey is a hash field, computed by
hashing the stock's year and month and taking the remainder; the middle part is the stock code;
and the trailing part is the date field. For example:

hash(202004) % 299 + "_" + stock code + "_" + stock date

The hash prefix makes it more likely that data is spread evenly across the RegionServers, which
balances the load. Without the hash field, the rowkey would begin with the time information, and
all new data would accumulate on a single RegionServer, creating a hotspot. Queries would
likewise concentrate on a few RegionServers, reducing query efficiency.
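
The rowkey scheme above can be sketched as follows. The source does not name the hash function, so CRC32 is used here as an assumed stand-in; the function name make_rowkey and the three-digit zero padding are also assumptions (Table 1 shows two-digit prefixes, but a modulus of 299 can produce three digits, and fixed-width prefixes keep the lexicographic order consistent).

```python
import zlib

NUM_BUCKETS = 299  # modulus from the scheme above


def make_rowkey(stock_code: str, trade_date: str) -> str:
    """Build '<bucket>_<code>_<date>', where trade_date is 'YYYYMMDD'."""
    year_month = trade_date[:6]
    # CRC32 is an assumed stand-in for the unspecified hash function.
    bucket = zlib.crc32(year_month.encode("ascii")) % NUM_BUCKETS
    # Zero-pad so that every prefix has the same width when sorted as text.
    return f"{bucket:03d}_{stock_code}_{trade_date}"


print(make_rowkey("000001.SZ", "20200401"))
```

Every stock traded in the same month shares a prefix, so a month's data stays together in one key range while different months are scattered across the key space.
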
The last step is creating the table. By default, HBase creates a table with a single region
whose rowkey range is unbounded, that is, it has no start key and no end key. All writes go to
this default region. As the amount of data grows, the region eventually becomes too large and is
split into two regions. This process causes two problems:

1. All writes land on a single region, which creates a write hotspot.
2. Region splits consume valuable cluster I/O resources.

Therefore, for load balancing, create multiple empty regions when building the table
(pre-partitioning), so that stocks from different years and months are stored on different
RegionServers. Because data stored in the same column family should share the same
characteristics, the stock dataset is divided into two column families, StockInfo and Statistic,
which hold the basic information and the statistical information of each stock.
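
As a rough sketch of pre-partitioning, the helper below spreads split points evenly over the hash prefixes and formats an HBase shell create command that uses the SPLITS option. The function names, the default of ten regions, and the three-digit prefix width are assumptions; the table name HbaseStock is the one used in the import command later in this document.

```python
def split_keys(num_buckets: int = 299, num_regions: int = 10) -> list[str]:
    """Return num_regions - 1 split points over zero-padded bucket prefixes."""
    step = num_buckets / num_regions
    return [f"{round(i * step):03d}_" for i in range(1, num_regions)]


def create_statement(table: str, families: list[str], splits: list[str]) -> str:
    """Format an HBase shell 'create' command with explicit split points."""
    fams = ", ".join(f"'{f}'" for f in families)
    keys = ", ".join(f"'{k}'" for k in splits)
    return f"create '{table}', {fams}, SPLITS => [{keys}]"


print(create_statement("HbaseStock", ["StockInfo", "Statistic"], split_keys()))
```

The printed line can be pasted into the HBase shell. Each split point becomes the start key of a new region, so the table starts with num_regions empty regions instead of one.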

The columns under a column family do not need to be created in advance; they can be specified
as family:qualifier when needed.

Insert data
Because we usually need to work with a large amount of data, we insert it into HBase in batches.
Here I use 1.csv as an example.

I simply design the rowkey as "code-date" and delete the first (header) row, because it would
interfere with the batch import. The modified file is as shown below.
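
The file preparation described above could be scripted roughly as follows. The layout of the raw file is not given in this document, so the sketch assumes its first three columns are stock code, abbreviation, and date, followed by the statistics; the function name prepare_rows is likewise hypothetical.

```python
import csv
import io


def prepare_rows(src, dst):
    """Copy CSV rows from src to dst, dropping the header and keying each row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Assumed column order: code, abbreviation, date, then statistics.
        code, abbreviation, date, *rest = row
        writer.writerow([f"{code}-{date}", abbreviation, date, *rest])


# Small in-memory demonstration with one data row.
raw = io.StringIO("code,name,date,range\n000001.SZ,PA Bank,20071228,50\n")
out = io.StringIO()
prepare_rows(raw, out)
print(out.getvalue(), end="")
```

For real files, pass handles opened with newline="" in place of the StringIO objects, so the csv module controls the line endings.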

First, upload the local CSV file to HDFS with the following commands.

hdfs dfs -mkdir /hadoop
hdfs dfs -mkdir /hadoop/input
hdfs dfs -put /home/whh/Documents/1.csv /hadoop/input/1.csv

Because the table has already been created, the next step is to import the data from the command
line, using the ImportTsv tool from HBase's mapreduce package.

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,StockInfo:abbreviation,StockInfo:date,Statistic:previous-closing-price,Statistic:opening-price,Statistic:max-price,Statistic:min-price,Statistic:closing-price,Statistic:trading-volume,Statistic:transaction,Statistic:ups-and-downs,Statistic:range,Statistic:average,Statistic:turnover-rate,Statistic:total \
  HbaseStock /hadoop/input/1.csv

This will appear after running the command

check:

At this point, the data for the first table has been inserted successfully. Other data can be
imported by following the same process.
