
Course Project Report

High Frequency Price Prediction of Index Futures

Submitted By

Abhaysingh Rajput (221AI002)


Rohil Sharma (221AI033)

as part of the requirements of the course

Data Science (IT258) [Dec 2023 - Apr 2024]

in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology in Artificial Intelligence

under the guidance of

Dr. Sowmya Kamath S, Dept of IT, NITK Surathkal

undergone at

DEPARTMENT OF INFORMATION TECHNOLOGY

NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL

DEC 2023 - APR 2024


DEPARTMENT OF INFORMATION TECHNOLOGY
National Institute of Technology Karnataka, Surathkal

CERTIFICATE

This is to certify that the Course Project Work Report entitled “High Frequency Price Prediction
of Index Futures” is submitted by the group mentioned below -

Details of Project Group


Name of the Student Register No. Signature with Date

Abhaysingh Rajput 2210587

Rohil Sharma 2210455

This report is a record of the work carried out by them as part of the course Data Science (IT258)
during the semester Dec 2023 - Apr 2024. It is accepted as the Course Project Report submission in
partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Artificial Intelligence.

(Name and Signature of Course Instructor)


Dr. Sowmya Kamath S
Associate Professor, Dept. of IT, NITK
DECLARATION

We hereby declare that the project report entitled “High Frequency Price Prediction of Index
Futures” submitted by us for the course Data Science (IT258) during the semester Dec 2023 - Apr
2024, as part of the partial course requirements for the award of the degree of Bachelor of Technology
in Artificial Intelligence at NITK Surathkal is our original work. We declare that the project has not
formed the basis for the award of any degree, associateship, fellowship or any other similar titles
elsewhere.

Details of Project Group

Name of the Student Register No. Signature with Date

1. Abhaysingh Rajput 2210587

2. Rohil Sharma 2210455

Place: NITK, Surathkal


Date: Update date of submission
High Frequency Price Prediction of Index Futures
Abhaysingh Rajput, Rohil Sharma

Abstract— Modeling the short-term dynamics of prices in financial markets from high-frequency trading data requires careful data processing. In this paper we predict the direction of 1-second-ahead movements of the order book mid price of an index futures contract. A detailed data preparation procedure is described. To understand data characteristics such as distributions, patterns, and relationships among variables, the data must first be analyzed and visualized. Techniques such as correlation analysis help identify the most important factors, and additional indicator variables are engineered from the available data to improve the accuracy of the price-change forecasts. Missing values are filled in using information from similar data points, so the original distribution of the data is maintained. Before modeling, the data are normalized one last time to put all variables on the same scale. Through these steps, raw order book data is prepared to feed machine learning models such as gradient boosting and random forests.

I. INTRODUCTION

Forecasting future price fluctuations in the financial markets is a highly sought-after goal. The emergence of high-frequency trading and the accessibility of detailed order book information have created new opportunities to investigate the complex dynamics and patterns that influence market behavior. This paper explores the challenging problem of predicting 1-second price fluctuations for a futures contract by utilizing the abundance of data seen in high-frequency order book snapshots.

II. LITERATURE SURVEY

A. Mostafa Goudarzi / 2023

The paper identifies High-Frequency Trading (HFT) activity using selected features and a suitable machine learning algorithm, and performs Automated Market-Making (AMM) classification using fuzzy logic. Once the optimal algorithm is identified, it is tested on data from different trading days and the results are interpreted.

1) Advantages: The analysis of the samples confirmed that, among several supervised learning strategies, the cluster model combined with random undersampling is the most efficient method for identifying HFTs in order book data.

2) Limitation: A comparative study is still needed to identify the optimal algorithm.

B. Rajan Lakshmi A / 2017

This paper explores the impact of low-latency trading on Indian capital markets and discusses the merits and demerits of high-frequency trading.

1) Advantages: Benefits of HFT algorithms include increased liquidity, better efficiency, increased returns for investors, and lower volatility.

2) Limitation: Harmful HFT algorithms exist that can disrupt the market, for example through high intra-day volatility and a high order-to-trade ratio.

C. Spyros Makridakis, Evangelos Spiliotis / March 27, 2018

The methodology involves data preprocessing to enhance forecasting accuracy through techniques like seasonal adjustments, transformations, and trend removal. It also explores multiple-horizon forecasting approaches, including iterative, direct, and multi-neural-network methods, evaluating their accuracy and computational complexity across various forecasting horizons. Challenges such as overfitting and computational complexity are addressed, with future research directions proposed for improvement.

1) Advantages: The methodology improves forecasting accuracy through diverse preprocessing techniques and multiple-horizon forecasting approaches while addressing challenges like overfitting and computational complexity, laying the groundwork for further advancements in forecasting methodologies.

2) Limitation: The methodology faces challenges including computational complexity, potential overfitting, manual preprocessing requirements, and a lack of uncertainty estimation in ML forecasts.

III. OUTCOME OF LITERATURE SURVEY

1) Paper-1: The analysis of the samples confirmed that, among several supervised learning strategies, the cluster model combined with random undersampling is the most efficient method for detecting HFTs in order book data. The model also showed the best classification performance on the other five trading days of the week, performing well on all remaining days.
2) Paper-2: With the rise of algorithmic trading and high-frequency trading (HFT), there have been notable improvements in various aspects of market dynamics. These include reduced transaction costs, lower volatility, and a more balanced buy-sell ratio. Market efficiency has improved, aiding better price discovery. Colocation services have also contributed by minimizing latency and creating a level playing field for HFT market participants.

However, leveraging advanced technology for algorithmic trading and HFT requires substantial technical expertise and resources. The lack of proper controls has introduced systemic risks, with the potential for significant deviations from healthy market prices due to errors or faulty algorithms. This poses a challenge for traditional investors who prioritize fundamental analysis.

HFT firms often utilize specialized services such as co-location facilities and have direct access to raw market data. While HFT can contribute to price fluctuations and short-term volatility, it also brings efficiencies to market operations.

3) Paper-3: The literature survey highlights the comparative forecasting performance of machine learning (ML) and statistical methods. It reveals that while ML methods offer computational advantages, they often fall short in accuracy compared to simpler statistical models. The survey underscores the need for ML methods to improve accuracy, reduce computational complexity, automate preprocessing, and provide uncertainty estimates for practical forecasting applications.

IV. PROBLEM STATEMENT

To identify and apply suitable data preprocessing methodologies; to meticulously analyze, visualize, and refine the data; to perform advanced EDA in order to understand the underlying intuition behind the data and explore ways to improve it, making it suitable for predictive modeling; and to choose a suitable model for prediction.

V. OBJECTIVES

• Perform exploratory data analysis (EDA) to understand the distributions, explore patterns and relationships within the order book data, and visualize key insights using descriptive statistics and data visualization techniques.

• Preprocess the data by handling outliers and missing values and standardizing features, while also exploring feature engineering methods to enhance predictive power and ensure data quality.

• Develop and evaluate machine learning models to predict the probability of future price changes, using appropriate algorithms and hyperparameter tuning, while validating the effectiveness of preprocessing techniques through cross-validation and performance metrics.

VI. EXISTING METHODOLOGY

Lee and Mykland developed the LM estimator, a tool in financial econometrics for estimating the integrated volatility of asset prices. It is particularly useful in noisy market conditions. A simplified overview:

1. Input:
- High-frequency price data (e.g., stock prices).
- Sampling frequency (e.g., tick-by-tick data).

2. Preprocessing:
- Remove outliers.
- Ensure evenly sampled data.

3. Initialization:
- Set parameters such as the observation count and the time-window length.
- Decide how to handle irregularities such as price jumps.

4. Estimation Algorithm:
- Calculate the realized volatility for each observation.
- Smooth the volatility using a kernel function to reduce noise.
- Aggregate the smoothed estimates over a time window.

5. Output:
- Integrated volatility estimates for each time period.

The LM estimator involves more complex mathematics, but these steps give a basic understanding of how it works.
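As an illustration of the estimation steps above, the following Python sketch computes a realized-volatility series with kernel smoothing and rolling aggregation. It is not the original LM implementation; the column name price, the window sizes, and the Gaussian kernel are assumptions.

import numpy as np
import pandas as pd

def integrated_volatility(prices, window=100, bandwidth=10):
    # Sketch of steps 4a-4c: squared log-returns, kernel smoothing, rolling aggregation.
    log_ret = np.log(prices).diff().dropna()            # per-observation log-returns
    local_vol = log_ret ** 2                            # squared returns as a local volatility proxy
    smoothed = local_vol.rolling(bandwidth, min_periods=1,
                                 win_type="gaussian").mean(std=bandwidth / 3)  # kernel smoothing
    return smoothed.rolling(window, min_periods=1).sum()   # aggregate over a time window

# usage, assuming a DataFrame df with a tick-level 'price' column:
# df["integrated_vol"] = integrated_volatility(df["price"])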
VII. PROPOSED ENHANCEMENTS / NOVELTY

VIII. DATASET

A. About The Data

The raw dataset used in this paper can be accessed at: https://www.kaggle.com/competitions/caltech-cs155-2020

The dataset under consideration poses a distinct prediction problem: each row represents a single point in time and the state of the order book in that particular 5 ms interval. The main objective is to build a predictive framework that can precisely predict the direction of the "mid" price, a critical statistic that denotes the midpoint between the best bid price (usually the first bid, bid1) and the best ask price (in most cases simply the first ask, ask1). In particular, each timestep must be categorized as either a potentially increasing trajectory (designated as "1") or a static or falling trajectory (designated as "0").
B. Terminologies in Dataset

• id - The timestep ID of the order book features.

• last_price - The price at which the most recent order fill occurred.

• mid - The "mid" price, which is the average of the best bid1 and the best ask1 prices.

• opened_position_qty - In the past 500ms, how many buy orders were filled?

• closed_position_qty - In the past 500ms, how many sell orders were filled?

• transacted_qty - In the past 500ms, how many buy+sell orders were filled?

• d_open_interest - In the past 500ms, what is (# buy orders filled) - (# sell orders filled)?

• bid1 - What is the 1st bid price (the best/highest one)?

• bid[2, 3, 4, 5] - What is the [2nd, 3rd, 4th, 5th] best/highest bid price?

• ask1 - What is the 1st ask price (the best/lowest/cheapest one)?

• ask[2, 3, 4, 5] - What is the [2nd, 3rd, 4th, 5th] best/lowest/cheapest ask price?

• bid1vol - What is the volume of contracts in the order book at the 1st bid price (the best/highest one)?

• bid[2,3,4,5]vol - What is the quantity of contracts in the order book at the [2,3,4,5]th bid price (the [2,3,4,5]th best/highest one)?

• ask1vol - What is the quantity of contracts in the order book at the 1st ask price (the best/lowest/cheapest one)?

• ask[2,3,4,5]vol - What is the quantity of contracts in the order book at the [2,3,4,5]th ask price (the [2,3,4,5]th best/lowest/cheapest one)?

• y (unique to training data) - What is the change in the mid price from now to 2 timesteps (approx. 1 second) in the future? "1" means this change is strictly positive, and "0" means the change is 0 or negative.
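For concreteness, the snippet below shows one way to recompute the mid price and a y-style label from these columns. It is a sketch under the assumption that the training data is available locally as train.csv with the column names listed above.

import pandas as pd

df = pd.read_csv("train.csv")                       # assumed local file name

df["mid_check"] = (df["bid1"] + df["ask1"]) / 2     # mid = average of best bid and best ask
# y-style label: 1 if the mid price 2 timesteps (~1 second) ahead is strictly higher, else 0
df["y_check"] = (df["mid"].shift(-2) > df["mid"]).astype(int)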
IX. METHODOLOGY

A. Roadmap

To guarantee the quality and usefulness of the data for analysis, a thorough data preprocessing pipeline is necessary before starting the modeling step. This procedure includes feature engineering, data cleansing, addressing missing values, and possible scaling or transformations to improve the features' predictive power. Furthermore, correcting any imbalances in the target variable and investigating resampling strategies would be required to lessen biases and enhance model functionality. The study will use a range of machine learning algorithms to train and assess predictive models once the data has undergone thorough preprocessing. During this iterative process, essential metrics including precision, recall, F1-score, and accuracy are carefully chosen, hyperparameters are adjusted, and model performance is rigorously assessed. The final objective is to create a strong predictive framework that, using the complex patterns seen in the order book data, can reliably predict future price changes.

B. Data Preprocessing

1) Exploratory Data Analysis (EDA): We started by calculating central tendency measures like the mean, median, and mode to understand the core values of our features. To gauge data spread, we computed the variance and standard deviation. Visual aids such as histograms and box plots helped us grasp the data's shape and identify potential outliers. We used hexbin plots, which merge scatter plot and histogram principles, to visualize bivariate distributions in the dataset. Density contour plots offered another view by outlining dense regions with contour lines. Heatmaps displayed feature correlations visually, while violin plots depicted density profiles, emphasizing areas with more data points and the skewness of each feature in the dataset.

Fig. 1: histogram
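A minimal sketch of these visualizations, assuming the DataFrame df from above plus matplotlib and seaborn; the chosen columns are illustrative, not prescriptive.

import matplotlib.pyplot as plt
import seaborn as sns

df[["mid", "last_price", "transacted_qty"]].hist(bins=50)      # univariate distributions
plt.figure()
plt.hexbin(df["bid1vol"], df["ask1vol"], gridsize=40)           # hexbin: bivariate density
plt.figure()
sns.violinplot(data=df[["bid1vol", "ask1vol"]])                 # density profiles and skewness
plt.figure()
sns.heatmap(df.corr(), cmap="coolwarm")                         # feature correlation heatmap
plt.show()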
Fig. 4: Conducting EDA on the features

Fig. 5: Violin Plots

2) Outlier Detection and Handling: From the previous EDA we learned that the data is highly skewed and that there were a lot of outliers and missing values. For detecting the outliers we used the IQR method from the box plot analysis.

Fig. 2: Outlier detection using Box Plot

Fig. 3: IQR calculations

After identifying the outliers, our initial approach was to trim them, but this proved fatal as a significant amount of data was lost in the process. Instead, we adopted a more robust approach, capping outliers by replacing them with the value at Q3 obtained from the IQR analysis, preserving most of the data while minimizing the impact of extreme values.
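A sketch of this capping step; the 1.5×IQR fence is the usual convention and an assumption here, as is the list numeric_cols of feature names.

def cap_outliers_iqr(s):
    # Cap values above the upper IQR fence at Q3 instead of dropping the rows.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return s.mask(s > upper_fence, q3)   # NaNs are left untouched for later imputation

# for col in numeric_cols:
#     df[col] = cap_outliers_iqr(df[col])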
3) Feature Construction: We introduced a new feature, the "Liquidity Indicator", which uses the bid-ask spread, calculated as the difference between the best bid and best ask prices, and incorporates the volume at the best bid and ask prices. By capturing nuanced aspects like liquidity, it improves the model's ability to forecast price changes accurately.
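One possible sketch of such features; the exact formula for the combined indicator is not specified in the text, so the version below (best-level volume divided by the spread) is only an assumed illustration.

df["spread"] = df["ask1"] - df["bid1"]                                   # bid-ask spread
df["best_depth"] = df["bid1vol"] + df["ask1vol"]                         # volume at the best quotes
df["liquidity_indicator"] = df["best_depth"] / (df["spread"] + 1e-9)     # assumed definition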
4) Feature Scaling: Initially, we explored robust scaling methods like RobustScaler, but since we had already handled the outliers, and given the imputation techniques we use in this paper, we ultimately used StandardScaler to standardize all the features before missing value imputation.
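A sketch of this step; feature_cols, the list of predictor columns, is an assumption. Note that scikit-learn's StandardScaler ignores NaNs when fitting and preserves them, which is what allows scaling before imputation.

from sklearn.preprocessing import StandardScaler

feature_cols = [c for c in df.columns if c not in ("id", "y")]   # assumed predictor columns
scaler = StandardScaler()                                        # z = (x - mean) / std per feature
df[feature_cols] = scaler.fit_transform(df[feature_cols])        # NaNs ignored during fit, kept in output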
5) Missing Value Imputation: Initially, techniques such as deletion and disregarding the affected features did not yield satisfactory results and also led to significant data loss; moreover, in the EDA we had studied the correlations between the features and found that the features containing missing values had only a weak positive correlation with the rest. We therefore turned to imputing the missing values instead. The methods considered were mean/median/mode imputation, regression imputation, and K-Nearest Neighbors (KNN) imputation. KNN imputation gave the most satisfactory results while preserving the overall dispersion of the features, so we implemented the K-Nearest Neighbors (KNN) algorithm for missing value imputation. Within KNN we tried both distance-based and uniform averaging of the neighbor values; distance-based averaging should in theory give better results, and this was confirmed in practice.
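A sketch of the imputation step with scikit-learn's KNNImputer; n_neighbors=5 is an assumed value.

from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=5, weights="distance")    # distance-weighted neighbor averaging
df[feature_cols] = imputer.fit_transform(df[feature_cols])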
6) Advanced Exploratory Data Analysis: Before the final model development, advanced Exploratory Data Analysis (EDA) techniques can provide deeper insights into the data, uncover complex relationships, and aid in the selection and engineering of relevant features for modeling. Here are some advanced EDA methods:

Fig. 6: Density Contour Plots

Fig. 7: Skewness Reduction

Fig. 8: Missing value dataset MAR

Fig. 9: Sample of transformed dataset

Correlation and Covariance Analysis:

• Compute correlation matrices to identify linear relationships between pairs of features.

• Visualize correlation matrices using heatmaps or similar techniques for easy interpretation.

• Analyze covariance matrices to understand the spread and direction of feature relationships.

• Identify and potentially remove highly correlated or redundant features to avoid multicollinearity issues.
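As a sketch of how such a correlation screen might look in code; the 0.95 threshold is an arbitrary assumption.

import numpy as np

corr = df[feature_cols].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))   # keep the upper triangle only
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]     # features highly correlated with another
print("candidate redundant features:", to_drop)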
Principal component analysis (PCA) is one of the most popular techniques for finding the principal components of a dataset that capture the maximum variance in the data, and we use it to reduce our high-dimensional dataset. It should be noted that even though we possess enough samples to support the large number of features in this dataset, due to computational constraints we would not be able to use all of the data samples, so we need to reduce the dimensionality of the dataset so that the new principal components capture as much as possible of the variation shared between the different features. The goal of PCA is to transform a high-dimensional dataset into a low-dimensional subspace, preserving as much of the original information, or variance, as possible. Here is a more detailed explanation of PCA and its implementation:

1) Motivation and Rationale:

• High-dimensional datasets often contain redundant or highly correlated features, leading to the curse of dimensionality and computational inefficiencies.

• PCA finds the directions of maximum variance in the data (also known as principal components) and projects the data onto a lower-dimensional subspace spanned by these components.

• This projection captures the most important patterns and structures in the data while reducing noise and redundancy.

2) Mathematical Formulation:

• Let X be an n x p data matrix, where n represents the number of observations and p is the number of features.

• PCA seeks to find a linear transformation that maps the original p-dimensional data onto a k-dimensional subspace (where k < p) while minimizing the reconstruction error.

• The principal components are the orthogonal directions (eigenvectors) of the covariance or correlation matrix of X, corresponding to the largest eigenvalues.

3) Implementation Steps:

a. Data Preprocessing: Center the data by subtracting the mean from each feature, or standardize (scale) the data if features have different units or scales.

b. Compute the Covariance Matrix: Calculate the covariance matrix of the preprocessed data.

c. Perform eigendecomposition on the covariance matrix to obtain the eigenvectors and corresponding eigenvalues.

d. Sort the eigenvectors by their corresponding eigenvalues in descending order. The top k eigenvectors with the largest eigenvalues are chosen as the principal components.

e. Project the original data onto the k-dimensional subspace spanned by the selected principal components by multiplying the data matrix X with the matrix of the k principal component vectors.

4) Determining Principal Components:

• The choice of k (the number of PCs to retain) is crucial and depends on the desired trade-off between dimensionality reduction and information preservation.

• Common approaches include:

– Explained Variance Ratio: Retain enough components to account for a certain percentage (e.g., 90% or 95%) of the total variance in the data.

– Scree Plot: Plot the eigenvalues in descending order and look for an "elbow" or sudden flattening, indicating that subsequent components contribute little to the overall variance.

– Domain Knowledge: Choose k based on prior knowledge or interpretability requirements of the specific problem domain.

Fig. 10: Heat map

Fig. 11: Post PCA on test dataset
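These steps correspond closely to scikit-learn's PCA. The sketch below, which retains enough components for 95% of the variance (an assumed threshold), illustrates steps a-e together with the explained-variance criterion.

from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)                       # keep enough PCs to explain ~95% of the variance
X_reduced = pca.fit_transform(df[feature_cols])    # assumes the features were already standardized
print(pca.explained_variance_ratio_.cumsum())      # cumulative explained variance per component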
C. Model Development and Evaluation:

Model Selection: We evaluated several machine learning algorithms to choose the best approach for forecasting 1-second price changes in futures contracts.

Random Forest: We selected Random Forest due to its capability to handle high-dimensional data and capture complex non-linear relationships with ease. Random Forest is an ensemble of decision trees. It uses bootstrap sampling and random feature subsets to build diverse trees. For prediction, it aggregates the outputs of all trees: a majority vote for classification. The ensemble reduces variance and overfitting. Hyperparameters like max_depth and n_estimators were fine-tuned using techniques like GridSearchCV.

C(x) = majority_vote(C_1(x), C_2(x), ..., C_B(x))

where C(x) is the predicted class, C_b(x) is the prediction of the b-th tree, and B is the total number of trees (for regression, the predicted value f(x) would instead average the per-tree predictions f_b(x)).
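A sketch of this model selection step; the parameter grid and the reuse of X_reduced from the PCA step are assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5, scoring="accuracy")
grid.fit(X_reduced, df["y"])                 # y is the binary mid-price direction label
print(grid.best_params_, grid.best_score_)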
Gradient Boosting: Models like CatBoost were explored for their proficiency in capturing intricate patterns in HFT data. Hyperparameters such as learning_rate and max_depth were optimized through randomized search.

Model Evaluation: Stratified k-fold cross-validation assessed model generalization. We computed accuracy, precision, recall, and F1-score to measure predictive capabilities.

Hyperparameter Tuning: For Random Forest and Logistic Regression, GridSearchCV/RandomizedSearchCV evaluated hyperparameter combinations systematically. For gradient boosting models, randomized search strategies maximized validation metrics.
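A sketch of this evaluation protocol, again assuming X_reduced and the label column y; the 5-fold split is an assumption.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X_reduced, df["y"],
                        cv=cv, scoring=["accuracy", "precision", "recall", "f1"])
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})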
Logistic Regression: As the main objective is to predict the fall or rise of the market mid price, it is feasible to use a logistic regression model on this dataset. Logistic regression is a statistical model that predicts the probability of one of two outcomes using one or more predictor variables. It uses the logistic (sigmoid) function to map a linear combination of the predictors to a probability in [0, 1]. We applied hyperparameter tuning using grid search.

X. RESULTS OBTAINED

The results obtained with all the techniques are discussed below; the performance metric used here is accuracy.

Models                      Accuracy
Logistic Regression         0.663
Random Forest Classifier    0.667
CatBoost                    0.671

REFERENCES

[1] Jonathan Brogaard, Allen Carrion, Thibaut Moyaert, Ryan Riordan, Andriy Shkilko, Konstantin Sokolov. (2018). High frequency trading and extreme price movements.
[2] A. Lakshmi and Sailaja Vedala. (2017). A study on low latency trading in Indian stock markets. International Journal of Civil Engineering and Technology, 8(12), 733-743.
[3] Peter Gomber, Björn Arndt, Marco Lutat, and Tim Uhle. (2011). High-Frequency Trading. SSRN Electronic Journal. doi: 10.2139/ssrn.1858626.
[4] C. Dutta, K. Karpman, S. Basu, et al. (2023). Review of Statistical Approaches for Modeling High-Frequency Trading Data. Sankhya B, 85(Suppl 1), 1–48.
[5] G.P.M. Virgilio. (2019). High-frequency trading: a literature review. Financ Mark Portf Manag, 33, 183–208. doi: 10.1007/s11408-019-00331-6.