KDB+ Explained

Jonathan Schein
Nov 18, 2020

Introduction

KDB+ is a column-oriented database system developed by Kx Systems. Its main purpose is to store, analyze, and compare large volumes of data, typically time-series data. Thanks to its in-memory, column-based design, it can handle high volumes of data at high speed. This blog post gives a brief introduction to the KDB+ database system without going into too much technical detail.

KDB+ was introduced in 2003 as the successor to the original KDB database system, and it is designed to store and analyze large volumes of data. It has two parts. The first is the KDB+ database itself, which holds all the data. The second is q, the programming language used to query the database. Both are built on the k programming language.

Why use KDB+?

If you need to analyze and interpret real-time data, KDB+ is a strong candidate. It places no special requirements on the hardware or storage architecture of the system: the database is stored as ordinary files. Because it is just a set of files, it is easy to navigate.
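To make the "ordinary files" idea concrete, here is a minimal Python sketch of a table kept as a plain directory with one binary file per column. It is only an illustration and does not reproduce the actual KDB+ on-disk format; the helper names `write_columns` and `read_column`, the `trade` directory, and the sample values are all assumptions made for this example.

```python
import struct
import tempfile
from pathlib import Path


def write_columns(table_dir: Path, columns: dict[str, list[float]]) -> None:
    """Write each column to its own binary file inside table_dir."""
    table_dir.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        # Pack the column as little-endian 64-bit floats, one file per column.
        (table_dir / name).write_bytes(struct.pack(f"<{len(values)}d", *values))


def read_column(table_dir: Path, name: str) -> list[float]:
    """Read one column back without touching the other column files."""
    raw = (table_dir / name).read_bytes()
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical table stored in a temporary directory.
root = Path(tempfile.mkdtemp())
trade_dir = root / "trade"
write_columns(trade_dir, {"price": [101.5, 102.0, 99.75], "size": [200.0, 50.0, 125.0]})

# The "database" is just a directory that any file tool can list.
column_files = sorted(p.name for p in trade_dir.iterdir())
prices = read_column(trade_dir, "price")
```

Because every column is a separate file, a query that needs only `price` never has to read `size`, and the table can be inspected or copied with ordinary file tools.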

Where to use KDB+?

Most large financial institutions use KDB+; it is easier to name the ones that don’t use it than the ones that do. The volume of data these institutions produce grows every day, and KDB+ is built to handle and store data at that scale. It both stores and analyzes the data in real time.

Architecture

KDB+ performs very well when handling large volumes of data in real time. It is a 64-bit system with built-in multi-threading and multi-processing. Because it has its own query language, q, analytics run directly on the data. KDB+tick is the architecture that supports processing and querying both real-time and historical data.

Architecture Broken Down

  1. Data Feeds are market or other time-series data. They are the raw input to the feed handler and usually arrive live from an exchange, a news service, or a data provider such as Bloomberg.
  2. Feed Handler converts the incoming data into a format suitable for writing to the KDB+ database. It connects to the data feed, converts each message as it arrives, and sends it to the ticker plant process. It performs the following operations. 1) Captures the data. 2) Translates the data from format A to format B. 3) Catches the most recent values.
  3. Ticker Plant is the most important component of the architecture. It is the point through which the databases and the subscribers access the financial data. Its tables are queried using q just like any other KDB+ database, and only subscribers have access. Once subscriptions are in place, the ticker plant performs the following steps. 1) Receives the data from the feed handler. 2) Appends a copy of the data to a log file so that no data is lost in case of failure. 3) Publishes the data to the clients, which subscribe directly to the ticker plant. 4) At the end of the day, all of the day’s data is stored in the historical database and sent to the subscribers. 5) Resets all the tables, including the log file where today’s data was stored. The ticker plant, real-time database and historical database operate 24/7.
  4. Real-time Database (RDB) stores today’s data and is connected directly to the ticker plant. The data is usually held in memory during the day (when the market is open) and written to the historical database at night (when the market is closed). Because the data is in memory, processing is very fast, and queries that run on the RDB perform very well. The recommended RAM size is 4x the expected size of the day’s data. Since the RDB holds only today’s data, it has no date column.
  5. Historical Database (HDB) holds data from previous days and is updated only at the end of the day. It is used for many purposes, one of which is calculating estimates for a particular company. Large tables in the HDB are either splayed (stored as a directory with one file per column) or partitioned by a temporal value such as the date; some are partitioned further. Splaying and partitioning make it easier and more efficient to search and access the data. The HDB can be used for analytics (e.g. getting the trades for company X on day Y from table Z).
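The flow described above, from feed handler to ticker plant to RDB to HDB, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how KDB+tick is implemented (the real components are separate q processes). The `SYM,price,size` message format, the class and function names, and the JSON files used for the log and the date partition are all hypothetical choices made for this illustration.

```python
import json
import tempfile
from pathlib import Path


def feed_handler(raw: str) -> dict:
    """Translate a raw 'SYM,price,size' message (format A) into a record (format B)."""
    sym, price, size = raw.split(",")
    return {"sym": sym, "price": float(price), "size": int(size)}


class RealTimeDB:
    """Holds today's records in memory; note that there is no date column."""

    def __init__(self) -> None:
        self.trades: list[dict] = []

    def on_update(self, record: dict) -> None:
        self.trades.append(record)


class TickerPlant:
    """Logs every record to disk, then pushes it to each subscriber."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self.subscribers: list = []

    def subscribe(self, callback) -> None:
        self.subscribers.append(callback)

    def publish(self, record: dict) -> None:
        # Append to the log first so the day can be replayed after a failure.
        with self.log_path.open("a") as log:
            log.write(json.dumps(record) + "\n")
        for callback in self.subscribers:
            callback(record)

    def end_of_day(self) -> None:
        # Reset the log once today's data has been saved to the HDB.
        self.log_path.write_text("")


def save_to_hdb(rdb: RealTimeDB, hdb_root: Path, date: str) -> None:
    """Write today's records into a date partition, then clear the RDB."""
    partition = hdb_root / date
    partition.mkdir(parents=True, exist_ok=True)
    (partition / "trade.json").write_text(json.dumps(rdb.trades))
    rdb.trades = []


def hdb_query(hdb_root: Path, date: str, sym: str) -> list[dict]:
    """Get the trades for one symbol on one day by opening only that partition."""
    rows = json.loads((hdb_root / date / "trade.json").read_text())
    return [row for row in rows if row["sym"] == sym]


root = Path(tempfile.mkdtemp())
tp = TickerPlant(root / "tp.log")
rdb = RealTimeDB()
tp.subscribe(rdb.on_update)

# Intraday: raw messages flow through the feed handler into the ticker plant.
for raw in ["AAPL,101.5,200", "MSFT,55.25,100", "AAPL,102.0,50"]:
    tp.publish(feed_handler(raw))

logged_lines = len((root / "tp.log").read_text().splitlines())
intraday_count = len(rdb.trades)

# End of day: persist to the HDB partition, then reset the RDB and the log.
save_to_hdb(rdb, root / "hdb", "2020.11.18")
tp.end_of_day()
aapl_trades = hdb_query(root / "hdb", "2020.11.18", "AAPL")
```

The date lives in the partition directory name rather than in the rows, which mirrors why the RDB needs no date column and why an HDB query for a single day only has to read that day's partition.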

Sources

  1. https://www.tutorialspoint.com/kdbplus/index.htm
  2. https://www.tutorialspoint.com/kdbplus/kdbplus_overview.htm
  3. https://www.tutorialspoint.com/kdbplus/kdbplus_architecture.htm

Written by Jonathan Schein

Data Scientist, Brandeis University Alum and Flatiron School Alum
