KDB Tutorial
First Derivatives plc
KDB+ Reference Manual 3.0
DRAFT CONFIDENTIAL
All rights reserved. No part of this document may be reproduced, stored in a retrieval system or transmitted
in any form or by any means, without the prior written permission of First Derivatives plc, except in the case
of brief quotations embodied in critical articles or reviews.
First Derivatives plc has made every effort in the preparation of this document to ensure the accuracy of the
information. However, the information contained in this document is provided without warranty, either
express or implied. First Derivatives plc will not be held liable for any damages caused or alleged to be
caused either directly or indirectly by this document.
Contents
INTRODUCTION 10
SAMPLE USES OF KDB+ 14
MARKET DATA CAPTURE AND DISTRIBUTION 14
RESEARCH AND MODELLING 14
EQUITY TRADING 14
FIXED INCOME TRADING 14
COMPLIANCE 15
OTHER SAMPLE FINANCIAL APPLICATIONS 15
HOW TO USE THIS MANUAL 16
ARCHITECTURE DISCUSSIONS 17
DATA CAPTURE AND CLEANSING 19
KDB+/TICK 20
MULTIPLE TICKER-PLANT ENVIRONMENTS 22
ANALYTICS 23
TRADE EXECUTION 23
STRAIGHT THROUGH PROCESSING & INTERFACING 24
AVAILABLE INTERFACES 25
DATABASE DRIVERS 25
WEB SERVER 26
APIS 26
Q 26
EFFICIENT PROGRAMMING 27
SERVER-SIDE QUERIES AND STORED PROCEDURES 27
DEDICATED SERVERS 27
QDBC V JDBC 28
GETTING STARTED 29
INSTALLATION 29
THE DEVELOPMENT ENVIRONMENT 30
COMMANDS 34
DEBUGGING 35
COMMON ERRORS 35
QUERIES 37
SAMPLE QUERIES 37
ROLLUPS 44
TOOLS FOR COMPLEX CALCULATIONS 45
DATATYPES 45
ASSIGNMENT 46
LISTS 46
DICTIONARIES AND ASSOCIATIONS 47
VERBS AND ADVERBS 49
MANIPULATING ATOMS, LISTS, DICTIONARIES AND VERBS 51
FUNCTIONS 62
ORDER OF EVALUATION 76
WORKING WITH THE DATABASE AND DATABASE DESIGN 77
CREATING TABLES 77
FOREIGN KEYS 78
DICTIONARIES AND TABLES 79
INSERT AND UPSERT 82
UPDATES AND UPDATE AGGREGATIONS 83
STORED PROCEDURES 84
TABLE ARITHMETIC 84
JOINS 85
PARAMETERS 87
Q AS AN EXTENSION OF SQL 89
DATABASE ADMINISTRATION 92
DATABASE LAYOUT 92
SMALL DATABASES 92
MEDIUM DATABASES 92
LARGE DATABASES 92
LOGS 93
NESTED DATABASES 93
PARALLEL DATABASES 93
LOADING TABLES 93
SAVING TABLES 94
DEVELOPING ANALYTICS IN Q 95
DEFINED FUNCTIONS 95
EXECUTION CONTROL 97
INTER-PROCESS COMMUNICATION 101
KDB+ DATA CLIENT 101
OPENING AND CLOSING A CONNECTION 101
ASYNCHRONOUS AND SYNCHRONOUS MESSAGES 101
MESSAGE FILTERS 102
EVALUATING MESSAGES WITH THE VALUE PRIMITIVE 103
THE CLOSE HANDLER 103
KDB+ HTTP SERVER 103
WORKING WITH FILES 103
KDB+ DATA FILES 104
TABLES 104
TEXT FILES 105
BINARY FILES 105
SPECIFYING FIELD TYPES WHEN READING FILES 105
INPUT/OUTPUT TO FILES 108
HANDLES 110
FILES 110
SOCKETS 110
INTERFACING WITH OTHER PROGRAMMES 112
GENERAL NOTES 112
DYNAMICALLY LINKED C FUNCTIONS 114
KDB+/C# API 115
KDB+/C# SAMPLE INTERFACE 118
KDB+/JAVA API 120
KDB+/JAVA INTERFACE EXAMPLE 125
KDB+/C++ API 127
TICK, TAQ AND TOW 130
KDB+/TICK ARCHITECTURE 130
COMPONENTS OF KDB+/TICK 132
FEED HANDLER 132
TICKER-PLANT 132
REAL-TIME SUBSCRIBERS 133
REAL-TIME DATABASE 133
CHAINED TICKER-PLANTS 134
HISTORIC DATABASE 134
CUSTOMISING KDB+/TICK 134
IMPLEMENTING KDB+/TICK 135
INSTALLATION 136
A BRIEF DESCRIPTION OF THE SCRIPTS 136
THE TICKER-PLANT SYSTEM 138
STARTING THE TICKER-PLANT 138
CONFIGURATION 140
THE SCHEMA FILE 140
TICKER-PLANT CONFIGURATION 140
FEED HANDLER CONFIGURATION 141
REUTERS FEEDHANDLER CUSTOMISATION 147
THE FEEDHANDLER FUNCTIONS 147
F FUNCTION 147
K FUNCTION 147
ADDING FIDS 148
CUSTOMISING THE FEEDHANDLER 149
DYADIC INITIALISATION 149
FILLING IN THE BLANKS 149
FILLING IN THE BLANKS USING DICTIONARIES 150
OTHER FUNCTIONS AND VARIABLES WITHIN THE FEEDHANDLER 150
DATABASE CUSTOMISATION 152
MESSAGE HANDLERS 152
RTD CUSTOMISATION 152
HDB CUSTOMISATION 153
REAL-TIME SUBSCRIBER AND CHAINED TICKER-PLANT DESIGN 155
WHEN TO USE WHICH 155
WRITING A CHAINED TICKERPLANT 156
PROGRAMMING CONSIDERATIONS 156
A VWAP PUBLISHER 157
THE UPD FUNCTION 157
SUBSCRIPTIONS 158
SUBSCRIBING TO MORE THAN ONE TICKERPLANT 158
PRIMING 159
MODIFYING .U.SUB 159
PUBLISHING A SNAPSHOT 159
UPDATING SUBSCRIPTION LISTS 160
.Z.PC 160
REAL-TIME SUBSCRIBERS CONTAINED IN C.Q 160
FAILURE MANAGEMENT 162
BACKUP AND RECOVERY 162
ACTIVE-ACTIVE BACKUP 162
FAILURE RECOVERY 162
BEST EFFORT RECOVERY STRATEGY 162
TICKER-PLANT FAILURE 163
REAL TIME DATABASE FAILURE 164
HISTORIC DATABASE FAILURE 164
FEED HANDLER FAILURE 165
MACHINE FAILURE 165
NETWORK FAILURE 165
REPLAYING A LOG AFTER DAY END 165
RECOVERING A CORRUPT LOG 166
OTHER CONSIDERATIONS 167
PERFORMANCE 167
USING MULTIPLE TICKER-PLANTS 167
MEMORY USAGE 167
APPENDICES 169
APPENDIX A: TROUBLESHOOTING KDB+/TICK AND KDB+/TAQ 169
MEMORY 169
CPU 169
DISK IO 169
ERRORS 169
MESSAGES 170
KDB+ LICENCE 170
APPENDIX B: TECHNICAL IMPLEMENTATION OF TICKER-PLANT 171
VARIABLES 171
FUNCTIONS CONTAINED IN U.K 171
FUNCTIONS CONTAINED IN TICK.K 171
APPENDIX C: BLOOMBERG TICKER-PLANT 172
APPENDIX D: THE REUTERS FEED HANDLER 174
KDB+/TAQ HISTORICAL DATABASE 174
WHAT IS KDB+/TAQ? 174
HARDWARE REQUIREMENTS 176
INSTALLATION 177
RUNNING THE KDB+ TAQ LOADER 178
QUERIES 180
CORPORATE ACTIONS 181
HANDLING OTHER SOURCES OF HISTORICAL DATA 182
KDB+/TOW REPLAY MODULE 183
IMPLEMENTING THE REPLAY 183
INDEX 185
About First Derivatives
First Derivatives plc (www.firstderivatives.com) is a recognised and respected service provider with a global
client base. FDP specialises in providing services to both financial software vendors and financial
institutions.
The company has drawn its consultants from a range of technical backgrounds; they have industry
experience in equities, derivatives, fixed income, fund management, insurance and financial/mathematical
modeling combined with extensive experience in the development, implementation and support of large-
scale trading and risk management systems.
About Kx Systems
Kx Systems (www.kx.com) provides ultra high performance database technology, enabling innovative
companies in finance, insurance and other industries to meet the challenges of acquiring, managing and
analyzing massive amounts of data in real-time.
Their breakthrough in database technology addresses the widening gap between what ordinary databases
deliver and what today's businesses really need.
Kx Systems offers next-generation products built for speed, scalability, and efficient data management.
Strategic Partnership
First Derivatives has been working with Kx technology since 1998 and is an accredited partner of Kx
Systems worldwide.
First Derivatives offers a complete range of Kx technology services:
Training
Systems Architecture & Design
q development resources
Kdb+/tick implementation and customization
Database Migration
Production Support
Feedhandler developments
First Derivatives Services
First Derivatives' team of Business Analysts, Quantitative Analysts, Financial Engineers, Software Engineers,
Risk Professionals and Project Managers provides a range of general services including:
Financial Engineering
Risk Management
Project Management
Systems Audit and Design
Software Development
Systems Implementation
Systems Integration
Systems Support
Beta Testing
Contact
North American Office (NY): +1 212-792-4230
European Office (UK): +44 28 3025 4870
USA
John Conneely: [email protected]
Toni Kane: [email protected]
Europe
Michael O'Neill: [email protected]
Victoria Shanks: [email protected]
Introduction
This manual draws extensively from documentation (in some cases the content is reproduced verbatim)
available on the Kx Systems website, including:
Kdb+ Database and Language Primer
Kdb+ Database Reference Manual
Abridged kdb+ Database Manual
Abridged q Language Manual
q Language Reference Manual
Entries on the kdb+ listbox
The purpose of this manual is to provide a reference guide which collates and organizes all publicly available
documentation related to kdb+. First Derivatives personnel will update the manual on a regular basis as new
features are added to the product. We have the largest concentrated pool of kdb+ expertise in the world and
we will be including practical examples from our work in the field. Should you wish to make any contributions
we will be happy to include them if they are appropriate. To receive the latest version of the manual e-mail
Victoria Shanks ([email protected]).
The Kx Systems website provides a succinct introduction to kdb+ and it is reproduced below.
What is kdb+?
Kdb+, introduced in 2003, is the new generation of the kdb database. Like kdb, kdb+ is designed to
capture, analyze, compare, and store data -- all at high speeds and on high volumes of data. But
more than that, kdb+ was architected specifically to meet the emerging needs of leading-edge,
realtime business.
How is kdb+ suited for realtime business?
Most data management/data analysis solutions divide the world into realtime/in-memory/front-end
data and historical/on disk/back-end data. The division makes it easier for partial approaches to
claim proficiency at one or the other. Having separate front-end and back-end data management
worked all right until recently. Now enormous growth in the data volumes collected by business,
along with the need for instant analysis of data and realtime comparison of in-memory to
historical data, is becoming critically important to competitive differentiation. The firms that are
first to market with these realtime business applications are the ones who can maintain and
expand their competitive strategies.
With kdb+ there is no architectural split between the front end and the back end data management
and analysis. We provide a single architecture for managing and analyzing data across the entire
data management chain, maintaining exceptional performance throughout. In addition, kdb+ was
designed from the outset to use 64-bit memory, because 64-bit addressability is essential to
holding increasing volumes of streaming data in memory. It was also architected for extremely low
latency, enabling such time-critical applications as auto-trading and realtime risk management.
To assist customers transitioning from 32-bit to 64-bit architectures, we have added a binary-
compatible 32-bit version. But the fundamental design of the software takes full advantage of 64-bit
platforms. Kdb+ gives you unlimited room to grow.
Why is a unified architecture so important?
It enables leading-edge customers to rapidly develop and deploy realtime applications that deliver
high performance for business-critical applications including: operational risk management,
backtesting of trading strategies, business activity monitoring, and other applications that quickly
identify out-of-range patterns so that the business can respond in realtime.
The greater performance lead that kdb+ gives our customers translates to increased capability to
create competitive strategies.
Why did you develop a next-generation database product?
Kx was founded in 1993, and our kdb database has been in use by leading firms since 1998. In
that time, we have seen customer needs evolve. A major business driver for the enterprise today is
the requirement to analyze increasing volumes of data on financial or energy trading
transactions, for telecom usage analysis, for realtime CRM, in regulatory compliance/risk
management, and in other high-volume areas. Firms need immediate results on these analyses,
even when billions of records are involved. That's what realtime business is all about: viewing and
analyzing what is occurring in the business right now and comparing it on the fly to historical
patterns. Developed for high data volume applications, kdb+ expands a firm's ability to capture,
analyze, compare, and store enormous amounts of data -- both streaming and on disk -- with
analysis results in realtime.
Is kdb+ used only as an in-memory database?
No. Kdb+ provides a full relational database management system with time-series analysis that
handles data in memory as well as stored data on disk. For advanced applications such as
backtesting of auto trading strategies or operational risk management, it is essential to be able to
compare streaming data against history. You must be able to understand where the business has
been in order to judge and act upon realtime occurrences. Approaches that handle in-memory data
alone or historical data alone can't meet the needs of today's realtime enterprise, where accurate
comparison on the fly is becoming increasingly important. Approaches that try to combine a
streaming or in-memory product from one vendor with a historical product from another can't
deliver the performance necessary for realtime business, because they have to cope with two
separate architectures. Excess overhead is unavoidable with multiple architectures.
Which platforms does kdb+ run on?
Kdb+ is available today for industry-standard 32- and 64-bit architectures (AMD Opteron, Intel
Xeon, and Sun) running Linux, Windows or Solaris.
I've heard it's not possible to run SQL queries on streaming data. Is that true?
That's untrue. Our customers have been running time-series or SQL queries on streaming data
since 2001 and achieving results in realtime, even on complex queries involving millions of
records.
What features contribute to the performance of kdb+?
We've refined the architecture in a number of ways, based on the company's 10 years of
experience:
We expanded the data types for greater flexibility, particularly in writing time-series
analytics. While other time-series companies supply a limited time-series language, kdb+ was
specifically developed to let leading-edge customers go beyond limits.
We enhanced the speed and efficiency of application development by combining our
general programming, relational, and time-series languages into a single, concise programming
language q. The q language is integrated into the database, contributing to very high query
performance. q uses English-like commands and a simple syntax. C++ or SQL programmers
typically learn q in less than a day. (See the Kdb+ Primer written by Dennis Shasha, Associate
Professor of Computer Science at NYU's Courant Institute.)
We reduced overhead and latency to maintain leadership performance even as data
volumes keep rising. For example, data on many securities exchanges is doubling each year. Our
product strategy has always been to maintain the lead in performance for complex data analysis,
and with kdb+ we have further extended that lead for our customers.
As a relational database vendor, how do you handle streaming data?
Our product kdb+tick is a realtime ticker-plant application layered on kdb+. As data streams in from
a data feed or other source of streaming data, it becomes available for immediate relational
analysis. In addition, the data is logged so that, in case of a system failure, you do not lose the
day's data, as you would with products that support streaming or in-memory data only. Periodically,
the log file is written to the historical database -- a day's worth of realtime data (easily 50 million
records) can be written to the database in a couple of minutes. In fact, kdb+tick is so fast at
managing streaming, in-memory, and stored data that some of our customers have used it to
eliminate the traditional end of day, where the database is taken off-line. Because kdb+ runs at top
efficiency 24x7, it can be used to program advanced applications such as global 24x7 trading.
Is it really necessary to save all that data?
Only if your firm's strategy is to offer highly competitive, leading products. One of the reasons we
developed kdb+tick originally was in response to trading departments asking us: isn't there a way
we can save the streaming data so we can analyze it later? While it's true that small trading
problems can be solved using a streaming data or in-memory database alone, big, strategic
problems require you to be able to save data and to compare streaming or in-memory and
historical data on the fly, without losing speed anywhere along the line.
Aside from kdb+tick, do you have other layered products for kdb+?
To date, we have the following in addition to kdb+tick:
Kdb+tow is an application that enables traders to test sophisticated algorithms by replaying
historical ticks through their models.
Kdb+taq is a fast loader for NYSE TAQ data (distributed via CD/DVD or FTP) that enables
you to create a full 10+ year history of NYSE TAQ data quickly, update it daily, and have it
immediately available for relational, time-series analysis in kdb+.
Kdb+x is a family of eXchange loaders for other sources, for example the LSE Tick and
Best Price Data.
Why should development teams and IT departments invest in new
technologies such as kdb+, when the trend is toward standard technologies?
Doing business in real time demands new technologies and fast ROI. The volumes of data
encountered in business today are like nothing the world has seen before -- and they are growing
rapidly. In addition, firms need to understand how streaming data relates to historical patterns.
Conventional database paradigms are floundering, because the relational databases of the 1980s
are no longer able to keep up with escalating volumes of data. The old model of overnight reporting
is no longer acceptable in realtime business. The business intelligence/OLAP/data warehousing
structures that were built to make relational databases more efficient are also under increasing
pressure to deliver faster analysis -- and they can't. Newer in-memory databases and streaming
data products deliver speed as long as the data is in memory, but they don't meet the needs of
realtime business, because they solve only a small part of the data volume and data analysis
problem.
What if I've already invested considerable resources in developing Java, C,
and .net programs?
Kdb+ provides native C and Java interfaces. In addition, to make up for Java's inability to handle
large arrays, you can use our JDBC driver. To further assist you, the q language data types map
directly to Java and .NET.
Is it complicated to administer a kdb+ database?
Not at all. Kdb+ is remarkably simple to manage, because native operating system routines are
used for much of the file management, including backup and restore.
Do you need a big server to run kdb+?
No. Most of our customers begin with a 2- or 3-CPU system and grow from there. As you build a
historical database, you will need multi-terabyte storage, but kdb+ is flexible -- you can use local
storage, SANs or any combination.
But what if you need a high-availability environment?
No problem -- get as big and redundant as you want to. Many Kx customers have implemented
large, fully-redundant systems, including redundant ticker-plants for kdb+tick. We support failover,
so there is no loss of data or performance. We provide local logging as well as complete replication
between data centers. Through our relationship with Cassatt, we also enable IT organizations to
deploy kdb+ in distributed environments and dynamically allocate system resources to meet
realtime spikes, such as unusual peaks in market data. That way you don't need to over-invest in
big hardware dedicated to kdb+, but you also have extra capacity available instantly, when you
need it. Contact us for a demo.
Do you think the kdb+ database will replace Oracle, DB2, SQL Server and
other relational databases?
As long as another database meets your needs, use it. But for applications where you're waiting
too long for reports, or you don't have the data for implementing a realtime business application,
consider kdb+.
Sample uses of kdb+
Kdb+ is used for a wide variety of purposes in many of the world's largest institutions. Some sample usages
are given below.
Market Data Capture and Distribution
Capture data from worldwide markets seamlessly
Data feed agnostic: Reuters, Bloomberg, internal feeds, OPRA
Create feeds directly to kdb+ ticker-plant
Easy data loading capabilities, e.g. TAQ
Publish data to TIBCO, Triarch, etc.
Calculate and republish real-time stats
Decommission legacy systems such as FAME
Unlock data in legacy systems such as Asset Control
Cleanse and enrich data
Create new internal tickers for back testing/program trading purposes
No fault tolerance, replication or redundancy issues
Model calibration
Store large volumes of analytical data
Research and Modelling
Store large volumes of historical data
Replay strategies quickly
Store large volumes of derived data
Refine strategies
Research and develop cross asset strategies
Monte Carlo simulations
Simulate quote and order scenarios
Integrate with charting applications
Kdb+ facilitates a distributed architecture using thousands of blades
Equity Trading
Capture equities, futures and options from worldwide markets seamlessly
Capture Level 2 order book data
Pre-trade and post-trade analysis
Market impact calculations
Relative performance measures
Develop and create customised indices
Create volatility surfaces on the fly
Arbitrary time interval VWAPs, NBBO, HLCV
Easy interface with Excel to allow on the fly pricer creation
Price complex options in real-time
Real-time model calibration
Integrate OTC options with program trading framework
Fixed Income Trading
Capture bonds, futures and options from worldwide markets seamlessly
Facilitates statistical arbitrage trading
Develop and create customised indices
Create volatility surfaces on the fly
Analytics such as yield curve construction, tree building, lattice methods readily available
Price complex options in real-time
Real-time model calibration e.g. HW calibration with swaptions
Integrate OTC options with program trading framework
Compliance
Surveillance on trade, quote and order flow data
Search for specific potential violation of regulatory rules
Search for fraudulent trading patterns with aggregate views of potential violations
Real-time use to avoid future regulatory penalties
Run on historical data to provide info and ammunition to compliance officers addressing existing
regulatory complaints
Reduce the number of false positives from legacy systems
Examine order books retrospectively
Back test new surveillance algorithms
Facilitate RegNMS compliance
Facilitate pre and post trade analysis for MiFID compliance
Other Sample Financial Applications
FX Correlation trading
Data warehousing
News delivery services
Performance management
Configuration management
Pre-trade Risk Analysis
Real-time PnL
Convertible Bond Trading Systems
Structured Products with Large Data Issues
Mortgage Backed Securities Data Problems
Credit Derivative Analytics
Back Office Processing of High Trade Volumes
Monte Carlo Simulations
How to use this manual
This manual can be used as a reference guide or as a means for learning kdb+.
In general users of kdb+ will be developing an application and the manual has been organized with this in
mind. When developing an application or a process in kdb+ there are a number of important considerations.
The following section entitled Architecture Discussions examines these considerations in the specific context
of building a financial application. Similar considerations apply when building applications in other areas.
The table below gives some ideas in terms of how to tackle building an application from scratch.
Reference
Getting Started Being comfortable with using the console and displaying data in a
web browser is crucial for prototyping. Scripts are also a valuable
development aid.
Queries A good place to start to see the power of kdb+ is to build queries.
The sample queries show the ease of syntax of kdb+ and how
vast amounts of data can be queried in milliseconds.
Tools for complex calculations Kdb+ has vector language properties which facilitate elegantly
expressing and rapidly solving complex algorithms. The various
datatypes can be organized as lists and dictionaries and
manipulated using powerful primitives.
Functions There are a number of standard logical and arithmetic functions.
Working with the database Kdb+ is a dialect of SQL92 but has a number of extensions which
make it vastly more powerful and make complex queries easier
to perform.
Database Administration This section will be useful to DBAs in particular.
Developing analytics in q Execution control and function definition are explored in this
section.
Interprocess communication Kdb+ has a number of features which make IPC tasks such as
messaging much easier than in other systems.
Interfacing with other technologies Kdb+ can work alongside a number of other programmes such as
Java, C# and C++.
TICK, TAQ and TOW This section of the manual gives some details of three financial
applications of kdb+.
Failure Management This section gives some guidance for redundancy, resilience and
failover.
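As a taste of the query syntax referred to in the Queries row above, here is a minimal, hypothetical example in q (the table and its columns are invented purely for illustration, not taken from the product):

```q
/ hypothetical in-memory trade table
trade:([]sym:`IBM`MSFT`IBM;price:81.1 27.4 81.3;size:200 100 300)

/ q-sql: average price and total size per symbol, grouped with "by"
select avg price, sum size by sym from trade
```

The same one-line select form scales unchanged from this three-row table to tables holding hundreds of millions of rows.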
Architecture Discussions
The diagram (see Kdb+/tick) shows a simplified schematic representation of how a theoretical kdb+
implementation fits within an equity trading environment. The same principles apply for other trading
environments such as an FX or fixed income trading environment. In practice the implementation of kdb+ will
vary from institution to institution.
Trading decisions rely on the timely capture of market data from a multitude of sources and the cleansing
and enriching of this data to make it suitable for analysis. Kdb+ gives traders a competitive advantage in
terms of the speed and quality of data which can be accessed (see Data Capture and Cleansing).
The real-time analysis of this scrubbed data facilitates automatic trading based on pre-defined programme
trading and statistical arbitrage trading algorithms. The former generally requires historical trading
information as does the back testing of strategies. Kdb+ has a number of features facilitating the
implementation of real-time trading strategies which would be impossible under other architectures. The
historical database can also be used for internal and external reporting purposes (see Analytics).
The automatic execution of a transaction is vital, otherwise the pre-defined event or profitable opportunity
may disappear in fast moving markets. Theoretically, before a trade is executed certain checks must be made
to ensure, for example, that market and credit risk limits are not breached. In practice these are often
overlooked due to the time taken to undertake the checks. As well as capturing and analysing data kdb+
supports instantaneous transaction execution (see Trade Execution).
Each new transaction has associated consequences for other front, middle and back office functional areas
such as risk management, settlements, compliance, accounting and portfolio management. Kdb+ reduces
the problems associated with Straight Through Processing and interfacing with other internal and external
systems (see Straight Through Processing and Interfacing).
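The processes in such an architecture talk to one another through ordinary kdb+ inter-process communication. As a hedged sketch (the host, port and remote table name here are assumptions for illustration, not part of any specific deployment), a client process would interact with a kdb+ server like this:

```q
h:hopen `:localhost:5001        / open a connection handle to a kdb+ process
h"select count i from trade"    / synchronous query: blocks until the result returns
neg[h]"`lastSeen set .z.p"      / asynchronous message: returns immediately
hclose h                        / close the handle when finished
```

Synchronous calls are used where the caller needs the answer (ad hoc queries); asynchronous calls are preferred on latency-sensitive paths such as ticker-plant publishing, since the sender never waits.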
[Diagram: schematic of a kdb+ deployment. Data Feeds 1-3 pass through Feed Handlers (plus a TAQ Loader
Script) into Kdb_1, which captures intra-day data, publishes to Kdb_2 every second and to Kdb_3 once daily.
Kdb_2, the real-time database, receives intra-day data from Kdb_1 and accepts queries. Kdb_3, the historical
database, receives daily data from Kdb_1 at the end of the trading day and accepts queries. Around this core
sit Analytics (Internal/External Reporting, Programme Trading, Stat. Arb. Trading), Trade Execution,
STP/Integration (Settlements, Risk Management, Compliance), Clients 1-3, and interfacing alternatives to
other databases.]
Data Capture and Cleansing
The volume of market data needed by equity trading desks continues to grow rapidly. As well as stock prices
and quotes the data needed includes futures, options, index data (including index options and futures),
interest rate and foreign exchange data. The data is published by many sources including exchanges,
specialised market data organisations such as Bloomberg and Reuters and data collated internally.
Collecting and cleansing market data poses a number of technical problems which must be resolved if the
institution's trading operation is to rely on the data:
There are a large number of sources of data which may span internal and external sources and
different time zones
The sheer volume of data can lead to data storage issues and impossible demands on existing
system architecture including hardware and databases
There are peaks and troughs in data which may lead to unacceptable delay in capturing data during
peak flow times and may lead to stability and reliability issues
Enriching captured data (e.g. adding a timestamp or calculating an implied volatility) may not be
possible in practice without slowing down the capture process
Data is stored in different formats and may be needed in different forms for different purposes
leading to data mapping issues
Implementing data cleansing procedures such as filtering, checking data integrity and correctness
and correcting data may slow down the overall process
Different data sources may have different ways of treating market depth which will lead to
consistency and comparability issues
Consolidating streaming market data with data from multiple database sources can be extremely
difficult
The captured data should be capable of handling corporate actions such as stock splits and
dividends
The distinction between real-time and historical data can be blurred in organisations which trade
round the clock; it is important that there are no data gaps when transferring real-time data in batch
to historical databases
Procedures must be in place to handle feed failure and server process failures intraday
It must be possible to query data as it is being stored
End users should be capable of generating a wide range of ad hoc queries without recourse to
additional development resources
Where possible the mechanism for displaying the data should be independent of how the data is
stored and the operating system used
Migration of new data sources to the existing infrastructure should be relatively straightforward
Kx Systems (www.kx.com) offer a solution which overcomes all of the above issues in the form of the
Kdb+/tick product. Kdb+/tick is a single integrated product that enables trading firms to capture, store and
allow their traders to query the volume of market data that is required in order to gain a competitive
advantage in today's markets.
Kdb+/tick

The Ticker-plant, Real-Time Database and Historical Database are operational on a 24/7 basis.

The data from the data feed is parsed by the feed handler.
The feed handler publishes the parsed data to the ticker-plant.
Immediately upon receiving the parsed data, the ticker-plant publishes the new data to the log file
and updates its own internal tables.
On a timer loop, the ticker-plant publishes all the data held in its tables to the real-time database
and publishes to each subscriber the data they have requested. The ticker-plant then purges its
tables. So the ticker-plant captures intra-day data but does not store it.
The real-time database holds the intra-day data and accepts queries.
In general, clients who need immediate updates of data (for example custom analytics) will subscribe
directly to the ticker-plant (becoming a real-time subscriber). Clients who don't require immediate
updates, but need a view of the intra-day data, will query the real-time database.
A real-time subscriber can also be a chained ticker-plant. In this case it receives updates from a
ticker-plant (which could itself be a chained ticker-plant) and publishes to its subscribers.
At the end of the day the log file is deleted and a new one is created; the real-time database also
saves all of its data to the historical database and then purges its tables.

The kdb+/tick solution consists of several unique components:

1 - The Ticker-plant
This is the core component, which is responsible for collecting the daily data from the specified market feed.
Currently there are ready-built feed handlers for Reuters Triarch and Bloomberg. However, it is relatively
simple to build custom feed handlers using the C-q interface to link into custom data feeds or even the
customer's order and execution feeds. In practice multiple feed handlers can be used to gather data from a
number of different sources, both internal and external, and collate the data so that the user has access to all
the data that they require simultaneously.
The ticker-plant is a specialized kdb+ process that operates as a link between the client's data feed and a
number of subscribers. It receives data from the data feed, appends a time stamp to it, and saves it to a log
file. On a timer loop it publishes new data to a real-time database and any clients which have subscribed to
it, and purges its tables of data. In this way the ticker-plant uses very little memory, whilst a full record of
intra-day data is maintained in the real-time database. The real-time database can be queried like any other
database.
All the data is logged to a log file as it is received to allow for disaster recovery. The real-time
database saves the data collected to the historical database on a daily basis.

2 - Real-Time Database
At startup, the real-time database sends a message to the ticker-plant and receives a reply containing the
data schema, the location of the log file, and the number of lines to read from the log file. The real-time
database reads the log file to obtain the historic data and subscribes to the ticker-plant to receive the
subsequent updates.
It is possible to have multiple real-time databases, which are dedicated to specific tasks (see Real-Time
Subscribers).
The real-time database can support hundreds of clients simultaneously with no noticeable effect on
performance. Clients can connect to a real-time database using one of the many interfaces available on
kdb+, including C/C++, C#, Java, QDBC and the embedded HTTP server, which can format query results in
HTML, XML, TXT, and CSV. Using the C-q interface or TCP/IP socket programming, custom clients
can be created using virtually any programming language, running on virtually any platform.

3 - Historical Database
As we have already stated, the real-time database can save market data daily from the real-time streaming
application to disk and create a historical database. This enables the user to store and analyze virtually
unlimited volumes of data. The only limitation on the volume of data that can be stored is hard disk size. With
typical volumes of daily trading data for the NYSE alone growing to greater than 60 million records per day
(greater than 2GB of storage) and continuing to grow, the scalability of any solution will become more and
more crucial.
Kdb+/tick analytical performance keeps up with this massive amount of data. For example, kdb+/tick can
analyze one million prices per second per drive, and because the historical database is composed of
independent segments we can make use of additional disk drives and CPUs beyond the standard
configuration. The recommended minimum configuration for a full tick system is 4 CPUs with 16GB RAM
(2 CPUs per machine). Having two disk drives and moving half the segments to the new drive
would double performance on multi-day queries by doubling the throughput.

4 - Real-Time Subscribers
Real-time subscribers are processes that subscribe to the ticker-plant and receive updates of the requested
data, similar to the real-time database. A real-time subscriber can subscribe to all the data or a subset of the
data. Generally subscription is on a table and list of symbols basis, although table, columns and list of
symbols is also possible.
Typical real-time subscribers are kdb+ databases that process the data received from the ticker-plant and/or
store it in local tables. The subscription, data processing, and schema of a real-time subscriber can be
easily customized.
Kdb+/tick includes a set of default real-time subscribers, which are in-memory kdb+ databases that can be
queried in real time, taking full advantage of the powerful analytical capabilities of q and the incredible speed
of kdb+. Each real-time database subscribing to the ticker-plant can support hundreds of clients and still
deliver query results in milliseconds. Clients can connect to the subscribers using the same interfaces
available to real-time databases.
Multiple real-time subscribers to the ticker-plant may be used, for example, to off-load queries that employ
complex, special-purpose analytics. The update data they receive may simply be used to update special-
purpose summary tables. Data-cleansing processes such as filtering and corrections can also be created in
this way.
Real-time subscribers are not necessarily kdb+ databases. Using the C-q interface or TCP/IP socket
programming, custom subscribers can be created using virtually any programming language, running on
virtually any platform.

5 - Chained Ticker-plants
Real-time subscribers can also be chained ticker-plants. This means that they have subscribers themselves,
to which they publish updates on a timer loop. In most cases the chained ticker-plant will be publishing
processed data. The timer should be tuned to minimize the latency through the system whilst still coping
with the potentially large volumes of data.
If a real-time subscriber services many queries every second from the same set of clients, it may be
advisable to make it a chained ticker-plant. This will reduce load on the real-time subscriber by reducing the
number of queries per second, whilst still providing all clients with the up-to-date information that they
require.
However, this may not be a possibility if some of the clients require information as soon as it is available (for
example arbitrage program trading), as the chained ticker-plant will increase the latency from the feed due to
the extra timer loop. If the timer loop cannot be made short enough, a real-time subscriber which publishes
data immediately to clients but is not a ticker-plant may be the better option.
Another reason to use a chained ticker-plant would be if the system infrastructure allowed for ad-hoc ticker-
plant subscriptions from potentially many clients. The processing load on the main ticker-plant should be
clearly defined to cope with volumes at peak times comfortably, to guarantee service to mission-critical
applications. Since ad-hoc subscriptions may substantially increase the load, the subscriptions should be to
a chained ticker-plant, ideally residing on a separate server to avoid processor conflict with the main ticker-
plant.
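To make the subscription mechanism above concrete, the following is a minimal sketch of a custom real-time subscriber written in q. It assumes the conventional tick.q interface (the ticker-plant exposes a `.u.sub` function and calls `upd` on each subscriber when it publishes); the port number 5010 and the trade table schema are illustrative assumptions, not fixed by kdb+/tick itself.

```q
/ Minimal real-time subscriber sketch. Assumes the conventional
/ tick.q interface (.u.sub on the ticker-plant, upd called back
/ on each publish); port 5010 and the trade schema are
/ illustrative assumptions.

/ local table matching the schema the ticker-plant publishes
trade:flip `time`sym`price`size!(`time$();`symbol$();`float$();`int$())

/ connect to the ticker-plant and subscribe to all symbols of trade
h:hopen `::5010
h(".u.sub";`trade;`)        / a lone backtick means all symbols

/ upd is invoked remotely by the ticker-plant on each publish;
/ here it simply inserts, but any custom processing could be done
upd:{[t;x] t insert x}
```

Because the subscriber is itself an in-memory kdb+ database, the local trade table can then be queried like any other table while updates continue to arrive.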
Multiple Ticker-Plant Environments

It may often be the case that data is captured from multiple different feeds. It may not be possible to
consolidate the data at the feed handler level, and if this is the case multiple ticker-plants should be used, one
to capture the data from each feed.
Real-time subscribers can subscribe to multiple different ticker-plants. The real-time subscribers can then be
used to consolidate and/or process the data however is required. An example of this would be a risk
management application, which would require data across multiple asset classes.
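The consolidation described above can be sketched in q as a single subscriber that opens connections to several ticker-plants and funnels their updates into one local table. The sketch assumes two ticker-plants following the tick.q conventions; the ports (5010 and 5011) and the asset-class split are assumptions for illustration.

```q
/ Sketch: one subscriber consolidating two ticker-plants,
/ e.g. equities on port 5010 and fixed income on port 5011
/ (both ports are assumptions).
trade:flip `time`sym`price`size!(`time$();`symbol$();`float$();`int$())

eq:hopen `::5010
fi:hopen `::5011
eq(".u.sub";`trade;`)
fi(".u.sub";`trade;`)

/ updates from either ticker-plant arrive through the same upd,
/ so the local trade table becomes the consolidated view
upd:{[t;x] t insert x}
```

A risk management application could apply its own processing in upd instead of a plain insert, for example keying the table by sym or computing positions per asset class.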
tical application of