Minorproject Report

This document summarizes a project that aims to predict crypto currency prices using machine learning methods. It was submitted by three students as a minor project for their Bachelor's degree in Computer Science and Engineering. The project involves collecting crypto currency price data, preprocessing the data, applying classification algorithms like RNN and LSTM to predict price direction, and evaluating model performance. It discusses the motivation, objectives, methodology, implementation details, and expected outcomes of the project. The document provides an abstract, introduction, literature review, requirements analysis, system design, implementation steps, testing approach, expected outputs, and conclusion.

Uploaded by

Vishal Singh

Crypto Price Predictor & Visualiser

A
Project Work
Submitted as Minor Project in Partial fulfillment for the award of Graduate Degree in
Bachelor of Engineering in Computer Science & Engineering.

Submitted to

RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA, BHOPAL (M.P.)

Submitted By--
Amardeep Singh Rathaur (0105CS191018)
Adarsh Kumar Singh (0105CS191009)
Devraj Singh (0105CS191035)

Under the Guidance of


Prof. Anil Kumar Kushwah
(Department of Computer Science &
Engineering)

Oriental Institute of Science & Technology, Bhopal


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

JULY-DEC 2021

CRYPTO PRICE PREDICTION 1
Oriental Institute of Science & Technology, Bhopal
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the project entitled “Crypto Price Predictor & Visualiser”,
submitted by Amardeep Singh Rathaur (0105CS191018), Adarsh Kumar Singh (0105CS191009)
and Devraj Singh (0105CS191035), students of Vth Semester, B.Tech in Computer Science &
Engineering, as MINOR PROJECT-I in partial fulfillment of the B.Tech degree from RGPV,
Bhopal (M.P.), is a record of bonafide work carried out by the team under our supervision.

Guide: Prof. Anil Kumar Kushwah, Department of Computer Science & Engineering

Head, Department of Computer Science & Engineering



ABSTRACT

The purpose of this study is to find out with what accuracy the direction of the price of
cryptocurrency can be predicted using machine learning methods. This is fundamentally a time
series prediction problem. While much research exists on the use of different machine
learning techniques for time series prediction, research in this area relating specifically
to cryptocurrency is lacking. In addition, cryptocurrency as a currency is in a transient
stage and as a result is considerably more volatile than other currencies such as the USD.
Interestingly, it has been the top-performing currency in four of the last five years. Thus,
its prediction offers great potential, and this provides motivation for research in the area.
As evidenced by an analysis of the existing literature, running machine learning algorithms
on a GPU as opposed to a CPU can offer significant performance improvements. This is explored
by benchmarking the training of the RNN and LSTM networks on both the GPU and the CPU, which
addresses the sub-research topic.

Finally, in analysing the chosen dependent variables, each variable's importance is assessed
using a random forest algorithm. In addition, the ability to predict the direction of the price
of an asset such as cryptocurrency offers the opportunity for profit to be made by trading
the asset.

Keywords: Cryptocurrency prediction, Time complexity, Machine learning, Database architecture, RNN, LSTM.

ACKNOWLEDGEMENT
 
 
We take this opportunity to express our cordial gratitude and deep sense of
indebtedness to our guide for his valuable guidance and inspiration throughout
the project. We are thankful to him for his innovative ideas, which led to the
successful submission of this minor project work. We feel proud and fortunate to
have worked under such an outstanding mentor on the Crypto Price Predictor
& Visualiser. He always welcomed our problems and helped us to clear our
doubts. We will always be grateful to him for providing moral support and
sufficient time.

We owe sincere thanks to the Director, OIST, for providing us with moral support and
the necessary help during our project work in the Department.

At the same time, we would like to thank the HOD, CSE, all other faculty
members and all non-teaching staff of the Department of Computer Science
& Engineering for their valuable co-operation.

We would also like to thank our Institution, faculty members and staff, without whom
this project would have been a distant reality. We also extend our heartfelt thanks
to our families and well-wishers.
 

 
 
Amardeep Singh Rathaur (0105CS191018)
Adarsh Kumar Singh (0105CS191009)
Devraj Singh (0105CS191035)



TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENT

CHAPTERS

LIST OF FIGURES

Chapter 1 Introduction
1.1 Domain Specific Introduction
1.2 Problem Definition
1.3 Project Purpose
1.4 Project Features
1.5 Module Description

Chapter 2 Literature Survey
2.1 Data Mining
2.2 Existing System
2.3 Proposed System
2.4 Software Description

Chapter 3 Requirement Analysis
3.1 Functional Requirements
3.2 Non Functional Requirements
3.3 Hardware Requirements
3.4 Software Requirements

Chapter 4 Design
4.1 Design Goals
4.2 System Architecture
4.3 Data Flow Diagram

Chapter 5 Implementation
5.1 Dataset
5.2 Data Preprocessing
5.3 Classification

Chapter 6 Testing
6.1 Unit Testing
6.2 Integration Testing
6.3 Validation Testing

Chapter 7 Outputs

Chapter 8 Conclusion and Future Scope

References

LIST OF FIGURES

Diagram

2.1 Data Mining
2.2 Stages in Data Mining
2.3 Data Mining Techniques
4.1 System Architecture
4.2 Data Flow Diagram
5.1 Classification
5.2 NNAR
6.1 Testing Process
7.1 Bitcoin Dataset
7.2 LSTM Model
7.3 Prediction Graph
7.4 Predicted Price

CHAPTER 1
INTRODUCTION

1.1 DOMAIN SPECIFIC INTRODUCTION

Time series prediction is not a new phenomenon. Prediction of most financial markets,
such as the stock market, has been researched at large scale. Cryptocurrency presents an
interesting parallel to this, as it is a time series prediction problem in a market still in
its beginning stage. As a result, there is high volatility in the market, and this provides an
opportunity in terms of prediction. In addition, Bitcoin is the leading cryptocurrency in
the world, with adoption growing consistently over time. Due to its open nature,
cryptocurrency also poses another difficulty as opposed to traditional financial markets.
It operates on a decentralised, peer-to-peer and trustless system in which all transactions
are posted to an open ledger called the blockchain. This type of transparency is not seen
in other financial markets. Traditional time series prediction methods such as Holt-Winters
exponential smoothing models rely on linear assumptions and require data that can be
broken down into trend, seasonal and noise components to be effective. This type of
methodology is more suitable for a task such as predicting sales, where seasonal effects
are present. Due to the lack of seasonality in the cryptocurrency market and its high
volatility, these methods are not very effective for this task. Given the complexity of the
task, deep learning makes for an interesting technological solution based on its
performance in similar areas. Tasks such as natural language processing, which are also
sequential in nature, have shown promising results. This type of task uses data of a
sequential nature and as a result is similar to a price prediction task. The recurrent
neural network (RNN) and the long short-term memory (LSTM) flavour of artificial neural
networks are favoured over the traditional multilayer perceptron (MLP) due to the
temporal nature of the more advanced algorithms.
The aim of this research is to ascertain with what accuracy the price of cryptocurrency
can be predicted using machine learning. Section one addresses the project specification,
which includes the research question, sub-research questions, the purpose of the study
and the research variables. A brief overview of cryptocurrency, machine learning and time
series analysis concludes section one. Section two examines related work in the area of
both cryptocurrency and other financial time series prediction. Literature on using
machine learning to predict cryptocurrency price is limited.

Out of approximately 653 papers published on cryptocurrency, only 7 have related to
machine learning for prediction. As a result, literature relating to other financial time
series prediction using deep learning is also assessed, as these tasks can be considered
analogous.
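
To make the discussion of traditional methods more concrete, the following is a minimal sketch of simple exponential smoothing, the basic building block that Holt-Winters models extend with trend and seasonal terms. It is not taken from the project code; the function name, the smoothing factor `alpha` and the price values are assumptions chosen only for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # the first smoothed value is the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Made-up, deliberately jumpy prices (not real market data).
prices = [100.0, 102.0, 101.0, 130.0, 90.0, 95.0]
smoothed = exponential_smoothing(prices, alpha=0.5)
print(smoothed)
```

On a jumpy series like this one, the smoothed value trails behind each sharp move, which is one way to see why such methods struggle with a highly volatile market.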

1.2 PROBLEM DEFINITION

The popularity of cryptocurrencies skyrocketed in 2017 due to several consecutive
months of super-exponential growth of their market capitalization, which peaked at
more than $800 billion in Jan. 2018. Today, there are more than 1,500 actively traded
cryptocurrencies. Between 2.9 and 5.8 million private as well as institutional investors
are in the different transaction networks, according to a recent survey, and access to
the market has become easier over time. Major cryptocurrencies can be bought using
fiat currency in a number of online exchanges and then be used in their turn to buy less
popular cryptocurrencies. The volume of daily exchanges currently exceeds $15
billion. Since 2017, over 170 hedge funds specialised in cryptocurrencies have emerged,
and cryptocurrency futures have been launched to address institutional demand for
trading and hedging cryptocurrency. Machine learning techniques could also be effective
in predicting cryptocurrency prices. However, the application of machine learning
algorithms to the cryptocurrency market has been limited so far to the analysis of
cryptocurrency prices, using random forests, Bayesian neural networks, long short-term
memory neural networks, and other algorithms. The studies were able to anticipate, to
different degrees, the price fluctuations of cryptocurrency, and revealed that the best
results were achieved by neural network based algorithms. Deep reinforcement learning
was shown to beat the uniform buy and hold strategy in predicting the prices of 12
cryptocurrencies over a one-year period.



The value of cryptocurrency varies just like that of any stock. There are many algorithms
used on stock market data for price forecasting. However, the parameters affecting
cryptocurrency are different. Therefore it is necessary to forecast the value of
cryptocurrency so that correct investment decisions can be made. Unlike the stock market,
the price of cryptocurrency does not depend on business events or intervening government
authorities. Thus, we feel it is necessary to leverage machine learning technology to
predict the price of cryptocurrency. The aim of the project is to predict the price of
cryptocurrency and help investors make better investments. This research is concerned
with predicting the price of cryptocurrency using machine learning. The goal is to
ascertain with what accuracy the direction of the cryptocurrency price in USD can be
predicted.

The price data is sourced from the cryptocurrency price index. The task is achieved with
varying degrees of success through the implementation of a Bayesian-optimised
recurrent neural network (RNN) and Long Short-Term Memory (LSTM) network.
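
Since the goal is stated in terms of the accuracy of the predicted direction, it may help to see how such a figure could be computed. The sketch below is an illustration only and is not the project's evaluation code; the function names and the price values are assumptions, and a move is counted as "up" only when the price strictly increases.

```python
def direction(previous_price, next_price):
    """Return 1 for an upward move and 0 otherwise."""
    return 1 if next_price > previous_price else 0


def directional_accuracy(actual, predicted):
    """Fraction of steps where the predicted direction matches the actual one.

    predicted[t] is the forecast for actual[t]; both directions are judged
    relative to the last known price actual[t - 1].
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    hits = 0
    for t in range(1, len(actual)):
        true_dir = direction(actual[t - 1], actual[t])
        pred_dir = direction(actual[t - 1], predicted[t])
        hits += int(true_dir == pred_dir)
    return hits / (len(actual) - 1)


# Made-up prices in USD (not real market data).
actual = [100.0, 105.0, 103.0, 108.0, 107.0]
predicted = [100.0, 104.0, 106.0, 110.0, 105.0]
accuracy = directional_accuracy(actual, predicted)
print(accuracy)
```

Note that a forecast can be numerically close to the true price and still count as a miss if it lands on the wrong side of the previous price, which is why direction accuracy is reported separately from price error.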

1.3 PROJECT PURPOSE

The purpose of this study is to find out with what accuracy the direction of the price of
cryptocurrency can be predicted using machine learning methods. This is fundamentally a
time series prediction problem. While much research exists on the use of different
machine learning techniques for time series prediction, research in this area
relating specifically to cryptocurrency is lacking. In addition, cryptocurrency as a
currency is in a transient stage and as a result is considerably more volatile than other
currencies such as the USD. Interestingly, it has been the top-performing currency in
four of the last five years. Thus, its prediction offers great potential, and this provides
motivation for research in the area. As evidenced by an analysis of the existing
literature, running machine learning algorithms on a GPU as opposed to a CPU can
offer significant performance improvements. This is explored by benchmarking the
training of the RNN and LSTM networks on both the GPU and the CPU, which addresses
the sub-research topic.



Finally, in analysing the chosen dependent variables, each variable's importance is
assessed using a random forest algorithm. In addition, the ability to predict the
direction of the price of an asset such as cryptocurrency offers the opportunity for
profit to be made by trading the asset. Implementing a full trading strategy based on
the results of the models is worthy of a dissertation in itself, and as a result this paper
focuses solely on the accuracy with which the direction of the price can be predicted.
In basic terms, the model would initiate a long position if the price was predicted to go
up and a short position if the price was predicted to go down. Several cryptocurrency
exchanges offer margin trading accounts to facilitate this. The profitability of this
strategy would be based not only on the accuracy of the model, but also on the size of
the positions taken. This is outside the scope of this research but could be addressed in
future work.
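
The basic rule described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a trading system and not part of the project: the function names, the `units` parameter and the prices are hypothetical, and fees, margin requirements and slippage are ignored.

```python
def position_for(predicted_direction):
    """+1 (long) when the price is predicted to rise, -1 (short) otherwise."""
    return 1 if predicted_direction == "up" else -1


def strategy_pnl(prices, predicted_directions, units=1.0):
    """Profit or loss of following the rule for every step.

    predicted_directions[t] is the forecast for the move from prices[t]
    to prices[t + 1]; units is the (assumed constant) position size.
    """
    if len(predicted_directions) != len(prices) - 1:
        raise ValueError("need exactly one prediction per price step")
    pnl = 0.0
    for t, predicted in enumerate(predicted_directions):
        pnl += position_for(predicted) * units * (prices[t + 1] - prices[t])
    return pnl


# Made-up prices; the second prediction is wrong on purpose.
prices = [100.0, 110.0, 105.0, 120.0]
predictions = ["up", "up", "up"]
pnl_small = strategy_pnl(prices, predictions, units=1.0)
pnl_large = strategy_pnl(prices, predictions, units=2.0)
print(pnl_small, pnl_large)
```

Doubling `units` doubles both the gains and the loss from the wrong call, which illustrates why profitability depends on position size as well as on model accuracy.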
While we will try to build a predictive model for the cryptocurrency price,
we are aware in advance that the price may differ greatly because of factors internal
and external to cryptocurrency. By internal factors we mean factors inside the
cryptocurrency's security.

By external factors we mean agents which indirectly influence the price of
cryptocurrency (exchange closures, replacement cryptocurrencies, speculation markets,
the widely held belief that over 80% of cryptocurrency in circulation is concentrated
in a limited number of investors, etc.). In any case, we shall compare our results to
other models built for cryptocurrency prediction. It should not be forgotten that in the
first month of 2018 there were models which predicted that cryptocurrency would surpass
100,000.00 USD per unit by the end of the year, while the price was barely reaching
7,000.00 USD just 2 months before the end of the year.



1.4 PROJECT FEATURES

The main feature of this system is to propose a general and effective approach to
predict the cryptocurrency price using data mining techniques. The main goal of the
proposed system is to analyze and study the hidden patterns and relationships in the
cryptocurrency dataset. A solution to the cryptocurrency analysis problem can provide
extremely useful information that helps investors avoid losing the money they invest
in cryptocurrency. Most of the existing work solves these problems separately with
different models, so dealing with them together becomes more important. Analysis and
prediction play an important role in the problem definition.

The constant increase in cryptocurrency usage has become an extremely serious
problem, with the development of technology and hi-tech tools having a significantly
greater impact on the cryptocurrency price. The large amount of information also
poses a challenge when analyzing such data and identifying similarities or relations
within it. There is also the challenge of inconsistency that can occur due to
incompleteness in the dataset. Therefore, there is an urgent need for proper techniques
to analyze large volumes of data and obtain useful results. So the main aim of
this project is to propose a general and effective approach to predict the cryptocurrency
price using data mining techniques.

The main features of the proposed system are:

• It is more efficient.
• It provides better cryptocurrency price monitoring.
• It reduces the costs of storage, maintenance and personnel.
• It reduces the time complexity of the system.
• It has a simpler architecture that is easier to understand.
• It makes processing large amounts of data easier.



1.5 MODULE DESCRIPTION

1.5.1 DATA GATHERING

The first step in this project, as in any data mining project, is the collection of the
data to be studied or examined to find the hidden relationships between the data members.
The important concern while choosing a dataset is that the data we gather should be
relevant to the problem statement and large enough that the inferences derived from it
reveal useful patterns, which can then be used to predict future events or be studied in
further analysis. The result of gathering and assembling this collection of data is what
we call a dataset. The dataset contains a large volume of data that can be analyzed to
gain knowledge from the databases. This is an important step in the process because
choosing an inappropriate dataset can lead to incorrect results.

1.5.2 DATA PREPROCESSING

The primary data collected from internet resources remains in the raw form of
statements, digits and qualitative terms. The raw data contains errors, omissions and
inconsistencies. It requires correction after careful scrutiny of the completed
questionnaires. The following steps are involved in the processing of primary data. A
huge volume of raw data collected through field surveys needs to be grouped by the
similar details of individual responses.

Data preprocessing is a technique that is used to convert raw data into a clean data
set. In other words, whenever data is gathered from different sources it is collected
in a raw format which is not suitable for analysis.

Therefore, certain steps are executed to convert the data into a small, clean data set.
This is performed before the execution of iterative analysis. The set of steps is
known as data preprocessing.



The process comprises:

• Data Cleaning
• Data Integration
• Data Transformation
• Data Reduction

Data preprocessing is necessary because of the presence of unformatted real-world
data. Real-world data is mostly composed of:
• Inaccurate data (missing data) - There are many reasons for missing data, such as
data not being collected continuously, a mistake in data entry, technical problems
with biometrics and much more.
• Noisy data (erroneous data and outliers) - The reasons for the existence of noisy
data could be a technical problem with the device that gathers the data, a human
mistake during data entry and much more.
• Inconsistent data - Inconsistencies arise from duplication within the data, human
data entry, mistakes in codes or names (i.e., violation of data constraints) and
much more.
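
The three kinds of problem listed above can each be detected with a simple check. The sketch below is one hedged way to do it and is not the project's code: the record layout `(date, price)`, the function name `audit_records` and the outlier rule (distance from the median greater than `outlier_factor` times the median absolute deviation) are all assumptions made for illustration.

```python
from statistics import median


def audit_records(records, outlier_factor=5.0):
    """Return the indices of missing, duplicate and outlying records.

    records: list of (date, price) tuples; price is None when missing.
    """
    # Missing data: no price was recorded for the date.
    missing = [i for i, (_, price) in enumerate(records) if price is None]

    # Inconsistent data: the same date appears more than once.
    seen_dates = set()
    duplicates = []
    for i, (date, _) in enumerate(records):
        if date in seen_dates:
            duplicates.append(i)
        seen_dates.add(date)

    # Noisy data: prices far from the median of the known prices.
    known = [price for _, price in records if price is not None]
    outliers = []
    if known:
        centre = median(known)
        spread = median([abs(price - centre) for price in known])
        for i, (_, price) in enumerate(records):
            if price is None or spread == 0:
                continue
            if abs(price - centre) > outlier_factor * spread:
                outliers.append(i)
    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


# Made-up records containing one problem of each kind.
records = [
    ("2021-01-01", 100.0),
    ("2021-01-02", None),
    ("2021-01-03", 102.0),
    ("2021-01-03", 102.0),
    ("2021-01-04", 9999.0),
    ("2021-01-05", 101.0),
]
report = audit_records(records)
print(report)
```

A median-based rule is used here because a single extreme value would distort a mean-based threshold; for genuinely volatile price data the factor would need tuning so that real price moves are not flagged as errors.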

1.5.3 CLASSIFICATION

This technique is used to divide data into different classes. The process is similar to
clustering in that it segments data records into groups, which here are known as
classes. Unlike clustering, the classes are known in advance. Ex: Outlook email has an
algorithm to categorize an email as legitimate or spam.



CHAPTER 2

LITERATURE SURVEY
2.1 DATA MINING
A literature survey is the most important step in the software development process. Before
developing the tool it is necessary to determine the time factor, economy and company
strength. Once these things are satisfied, the next step is to determine which operating
system and language can be used for developing the tool. Once the programmers begin
building the tool, they need a lot of external support. This support is obtained
from senior programmers, from books or from websites. Before building the system, the
above considerations are taken into account for developing the proposed system.
We first outline the data mining survey:

2.1.1 Data Mining Survey


Data mining is a data analysis technique which allows us to study and identify different
patterns and relationships in the data. In other words, data mining is a technique
which can be employed to extract information from large and extensive datasets and
convert the information into a clear structure so that it can be used further for
drawing inferences and gaining knowledge from the data.

Data mining contains techniques for analysis which involve various domains. For instance,
some of the domains involved in data mining are Statistics, Machine Learning and
Database systems. Data mining is also referred to as “Knowledge discovery in
databases (KDD)”.

The real task of data mining systems is the semi-automatic or automatic analysis of large
volumes of data to extract previously unknown relationships such as groups of data
members (cluster analysis), unusual records (outlier or anomaly detection), and
dependencies. Normally, this includes database techniques like spatial indices.



These discovered relationships can be used as input data or for further in-depth
analysis, for example in machine learning or predictive analysis.

Data mining may identify multiple groups in the data, which can then be put to further
use for accurate predictions by a decision support system.

Fig 2.1: Data Mining

2.1.2 Stages in Data Mining


There are 4 major steps in data mining, which are described as follows:
1. Data Sources: This stage includes gathering the data or making a dataset on which the
analysis or the study has to be performed. The datasets can take many forms; for instance,
they can be newsletters, databases, Excel sheets or various other sources like websites,
blogs and social media. The dataset chosen must be appropriate and well suited to the
problem definition in order to perform an efficient study or analysis.



2. Data Exploration: This step includes preparing the data properly for analysis and study.
It is mainly focused on cleaning the data and removing anomalies from it. As there is a
large amount of data, there is always a good chance that some of the data is missing or
wrong. For efficient analysis, the data must be maintained properly. So this process
includes removing the incorrect data and replacing the missing data with either the mean
or the median of the whole data. This step is also generally known as data pre-processing.
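
Replacing missing values with the mean or median, as described in this step, can be sketched as follows. This is a minimal illustration rather than the project's implementation; the function name `impute_missing`, the use of `None` to mark a missing value and the sample values are assumptions.

```python
from statistics import mean, median


def impute_missing(values, strategy="median"):
    """Replace None entries with the mean or median of the known values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute when every value is missing")
    fill = median(known) if strategy == "median" else mean(known)
    return [fill if v is None else v for v in values]


# Made-up closing prices with two gaps and one unusually large value.
closes = [10.0, None, 12.0, 50.0, None]
filled_median = impute_missing(closes, strategy="median")
filled_mean = impute_missing(closes, strategy="mean")
print(filled_median)
print(filled_mean)
```

The large value pulls the mean well above most of the known prices, while the median stays close to them; this is the usual reason to prefer the median when outliers are present.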

3. Data Modeling: In this step the relationships and patterns that were hidden in the data
are examined and extracted from the datasets. The data can be modeled based on the
technique that is being used. Some of the different techniques that can be used for
modeling data are clustering, classification, association and decision trees.

4. Deploying Models: Once the relationships and patterns present in the data are
discovered, we need to put that knowledge to use. These patterns can be used to predict
future events and for further analysis. The discovered patterns can also be used as
inputs for machine learning and predictive analysis on the datasets.

Fig 2.2: Stages in Data Mining



2.1.3 Techniques in Data Mining:

1. Classification: This technique is used to divide data into different classes. The
process is similar to clustering in that it segments data records into groups, which
here are known as classes. Unlike clustering, the classes are known in advance. Ex:
Outlook email has an algorithm to categorize an email as legitimate or spam.

2. Association: This technique is used to discover hidden patterns in the data and to
identify interesting relations between the variables in a database. Ex: It is used in
the retail industry.

3. Prediction: This technique is used only for particular purposes. It is used to extract
relationships between independent and dependent variables in the dataset. Ex: We use
this technique to predict the future profit obtained from sales.

4. Clustering: A cluster is a group of data objects. Data objects that are similar in
their properties are kept in the same cluster. In other words, clustering is a process
of discovering groups or clusters. Here we do not have prior knowledge of the clusters.
Ex: It can be used in consumer profiling.

5. Sequential Patterns: This is an essential aspect of data mining; its main aim
is to discover similar patterns in the dataset. Ex: The suggestions on e-commerce
websites are based on what we have bought previously.

6. Decision Trees: This technique plays a vital role in data mining because it is easy
for users to understand. The decision tree begins with a root, which is a simple
question. As the question can have multiple answers, each answer gives a node of the
decision tree, and a question in one node might lead to another set of questions. Thus,
nodes keep being added to the decision tree until, at last, a final decision can be made.
Apart from these techniques there are certain other techniques which allow us to
remove noisy data and clean the dataset. This helps us to get accurate analysis and
prediction results.
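
As one illustration of the prediction technique listed above, the relationship between an independent variable (sales) and a dependent variable (profit) can be estimated with a least-squares line. This is a generic sketch, not the method used in the project; the function name `fit_line` and the figures are assumptions, and the sample data is constructed to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: profit is exactly 2 * sales + 1.
sales = [1.0, 2.0, 3.0, 4.0]
profit = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sales, profit)
forecast = slope * 5.0 + intercept  # predicted profit for sales of 5.0
print(slope, intercept, forecast)
```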



Fig 2.3: Data Mining Techniques

2.1.4 Benefits of Data Mining:

Data mining has various uses in various sectors of society:

• In the finance sector, it can be used for accurately modeling risks regarding loans
and other facilities.
• In marketing, it can be used for predicting profits and for creating targeted
advertisements for various customers.
• In the retail sector, it is used for improving the consumer experience and also
increasing profits.
• Tax governing organizations use it to detect fraud in transactions.



2.2 EXISTING SYSTEM
Nowadays investors and researchers are trying to understand the fluctuation in prices of
cryptocurrencies, so it is important to have a system that can help to predict the change
in prices on a daily basis. Like the stock exchange, cryptocurrency prices are quite
volatile, and it can be difficult to get a highly accurate prediction. The value of any
cryptocurrency is not static and can vary every second. The fluctuation is completely
dependent on the amount being paid for the cryptocurrency by buyers. As cryptocurrency is
used as an investment, the same principle applied in stocks, buying cheap and selling at
a high price, is applicable to cryptocurrency. The volatile nature of cryptocurrency
makes it much more challenging and interesting for analysts and investors to predict the
price accurately.
The prediction and approximation of cryptocurrency prices is an area where not much
research has been done. Since investors are keen to know the direction of the
cryptocurrency price, i.e. high or low, it is vital to have an algorithm that gives the
best accuracy in terms of determining the range. A lot of work and research has been done
in trying to predict the direction of stock prices and very little in terms of
cryptocurrency.

2.2.1 Disadvantages:

• Storing large amounts of data containing a lot of information about the Bitcoin
price poses a challenge for researchers and investors.

• Sometimes the data is entered manually and humans can make mistakes, so
there is a chance of incorrect data being entered in the dataset, which can lead
to inaccurate results while analyzing the data.

• In such a large dataset, there is always a chance of some fields containing missing
values. These missing values can make the data noisy, so we must take
appropriate measures to remove inconsistency from the datasets.

• Investors and researchers do not have adequate techniques to analyze
and study the data, draw inferences from it and use these inferences to
efficiently predict the price of the cryptocurrency.



Some research (Omkar S. Deorukhkar, Shrutika H. Lokhande, Vanishree R. Nayak and
Amit A. Chougule) uses a simple feedforward neural network for Bitcoin price prediction,
which is not so efficient and accurate when it comes to time series or problems
which involve dealing with sequential data. This is because, at a fundamental level,
a neuron in a feedforward neural network does not consider learning from past data.
Hence, Recurrent Neural Networks were designed to overcome this problem. RNNs
consider the previous time step's output along with the current input state. This makes
them better than feedforward neural networks at learning sequential data.
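
The difference described above can be shown with a single scalar unit. The sketch below is a toy illustration and not the project's model: the weights `w_x`, `w_h` and `b` are hand-picked assumptions rather than trained values, and real networks use vectors and weight matrices instead of single numbers.

```python
import math


def feedforward_step(x, w_x, b):
    """A feedforward unit sees only the current input."""
    return math.tanh(w_x * x + b)


def rnn_step(x, h_prev, w_x, w_h, b):
    """A recurrent unit also receives its own previous output h_prev."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the recurrent unit, returning every state."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Two sequences that end with the same value but have a different history.
seq_a = [1.0, 0.0]
seq_b = [-1.0, 0.0]

ff_a = feedforward_step(seq_a[-1], w_x=0.5, b=0.0)
ff_b = feedforward_step(seq_b[-1], w_x=0.5, b=0.0)
rnn_a = run_rnn(seq_a)[-1]
rnn_b = run_rnn(seq_b)[-1]
print(ff_a, ff_b)    # identical: the history is invisible to the unit
print(rnn_a, rnn_b)  # different: the history is carried in the state
```

The feedforward unit produces the same output for both sequences because it only sees the final input, whereas the recurrent unit's output still reflects whether the earlier value was positive or negative.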

2.3 PROPOSED SYSTEM

The proposed system implements a machine learning algorithm to build a model that
predicts the price of Bitcoin based on a historical dataset available in an online
database. In the proposed model, this is done using LSTM (Long Short-Term Memory), which
is one type of RNN (Recurrent Neural Network). The tool used for the project is
Anaconda Navigator. The procedure to be followed for the proposed system is as follows:
• First, collect the dataset using the REST API to retrieve the historical Bitcoin
prices from the online database.
• Arrange the data into a data frame according to the problem definition, so that
the analysis is correct and produces results that meet the goals of the system.
• Then remove the rows of the dataset that are outdated for analysis/prediction,
remove the extra columns so that only relevant data is fed to the model, and
store the result in a CSV file.
• Then perform data preprocessing to handle missing values for the attributes; this is
done to reduce the noise and inconsistency in the data.
• Then build the model for the dataset using the LSTM (RNN) algorithm to
predict values of Bitcoin on a daily basis.
• Test the predictions with different layers of the RNN model.
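
Before a sequence model such as an LSTM can be trained, the cleaned price series usually has to be turned into input windows and targets. The sketch below shows one common way to prepare the data and is an assumption, not the project's actual pipeline: the helper names, the window length of 3, the 80/20 chronological split and the choice to compute the scaling range from the training portion only are all illustrative.

```python
def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test data lies strictly in the future."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def min_max_scale(values, lo, hi):
    """Scale values using a fixed range [lo, hi] (maps lo to 0 and hi to 1)."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values]


def make_windows(series, window):
    """Each input is `window` consecutive values; the target is the next one."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Made-up daily closing prices (not real market data).
closes = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 20.0, 19.0, 22.0]

train, test = chronological_split(closes, train_fraction=0.8)
lo, hi = min(train), max(train)  # range taken from the training data only
train_scaled = min_max_scale(train, lo, hi)
x_train, y_train = make_windows(train_scaled, window=3)
print(len(x_train), x_train[0], y_train[0])
```

Taking the scaling range from the training portion alone keeps information about future prices out of the model's inputs; the same `lo` and `hi` would then be reused to scale the test portion.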



2.4 SOFTWARE DESCRIPTION

2.4.1 JUPYTER NOTEBOOK

The Jupyter Notebook App is a server-client application that allows editing and
running notebook documents via a web browser. The Jupyter Notebook App
can be executed on a local desktop requiring no internet access (as described in this
report) or can be installed on a remote server and accessed through the internet.
In addition to displaying/editing/running notebook documents, the Jupyter Notebook App
has a "Dashboard" (Notebook Dashboard), a "control panel" showing local files
and allowing the user to open notebook documents or shut down their kernels.

A notebook kernel is a "computational engine" that executes the code contained in a
Notebook document. The ipython kernel, referenced in this guide, executes Python code.

Kernels for many other languages exist (official kernels). When you open a Notebook
document, the associated kernel is automatically launched. When the notebook is
executed (either cell-by-cell or with menu Cell -> Run All), the kernel performs the
computation and produces the results. Depending on the type of computations, the
kernel may consume significant CPU and RAM.

Note that the RAM is not released until the kernel is shut down. The Notebook
Dashboard is the component which is shown first when you launch the Jupyter Notebook App.
The Notebook Dashboard is mainly used to open notebook documents and to manage
the running kernels (visualize and shut down).

The Notebook Dashboard has other features similar to a file manager, namely
navigating folders and renaming/deleting files.



2.4.2 MATPLOTLIB

Humans are very visual creatures: we understand things better when we see them
visualized. However, the step to presenting analyses, results or insights can be a
bottleneck: you might not know where to start, or you might already have a format in
mind, but then questions like "Is this the right way to visualize the insights that I
want to bring to my audience?" will certainly have crossed your mind.

When working with the Python plotting library Matplotlib, the first step to answering
the above questions is building up knowledge on topics such as:

The anatomy of a Matplotlib plot: what is a subplot? What are the Axes? What exactly is
a figure?

Plot creation, which could raise questions about what module you exactly need to import
(pylab or pyplot?), how you should go about initializing the figure and the Axes of your
plot, how to use Matplotlib in Jupyter notebooks, and so on.

Plotting routines, from simple ways to plot your data to more advanced ways of
visualizing your data. Basic plot customizations, with a focus on plot legends and text,
titles, axes labels and plot layout.

Saving, showing and clearing your plots: show the plot, save one or more figures to, for
example, pdf files, clear the axes, clear the figure or close the plot, and so on.

Lastly, there are two ways in which you can customize Matplotlib: with style sheets and
the rc settings.

Once everything is set up, plotting routines can be used. You will often come across
functions like plot() and scatter(), which either draw points with lines or markers
connecting them, or draw unconnected points, which are scaled or colored. In any case,
you should not forget to pass the data that you want these functions to use.

These functions are only the bare basics; other functions are needed to customize the
appearance of a plot.
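
A minimal sketch of these ideas is given below: one Figure holding one Axes, a line drawn with plot(), unconnected points drawn with scatter(), a few basic customizations, and saving the figure. The price values are made up for illustration, and the Agg backend is selected only so that the sketch also runs without a display; inside a Jupyter notebook that line is normally not needed.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
close = [4.9, 5.2, 5.4, 5.1, 5.6]  # made-up closing prices

fig, ax = plt.subplots(figsize=(6, 3))      # one Figure containing one Axes
ax.plot(days, close, label="Close (line)")  # points connected by a line
ax.scatter(days, close, color="red", label="Close (points)")  # unconnected markers

# Basic customizations: title, axes labels and legend.
ax.set_title("Toy closing prices")
ax.set_xlabel("Day")
ax.set_ylabel("Price (USD)")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # save the figure (here to memory, not to disk)
plt.close(fig)                     # close the figure to release its resources
```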

2.4.3 NUMPY

NumPy is, just like SciPy, Scikit-Learn, Pandas, etc., one of the packages that you
cannot miss when you are learning data science, mainly because this library provides an
array data structure that holds some benefits over Python lists, such as being more
compact, offering faster access in reading and writing items, and being more convenient
and more efficient.

NumPy arrays are a bit like Python lists, but still very much different at the same
time. For those who are new to the topic, let us clarify what exactly an array is and
what it is good for.

As the name gives away, a NumPy array is a central data structure of the numpy library.
The library's name is short for "Numeric Python" or "Numerical Python".

In other words, NumPy is a Python library that is the core library for scientific
computing in Python. It contains a collection of tools and techniques that can be used
to solve, on a computer, mathematical models of problems in Science and Engineering. One
of these tools is a high-performance multidimensional array object that is a powerful
data structure for efficient computation of arrays and matrices.

To work with these arrays, there is a vast number of high-level mathematical functions
that operate on these matrices and arrays.

To make a NumPy array, you can just use the np.array() function. All you need to do is
pass a list to it, and optionally, you can also specify the data type of the data.

There is no need to memorize the NumPy data types if you are a new user, but you do need
to know and care what data you are dealing with. The data types are there when you need
more control over how your data is stored in memory and on disk. Especially in cases
where you are working with extensive data, it is good to know how to control the storage
type.

Remember that, in order to work with the np.array() function, you need to make sure that
the numpy library is present in your environment. The NumPy library follows an import
convention: when you import this library, you should import it as np. By doing this, you
make sure that other Pythonistas understand your code more easily.
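
For example, the short sketch below creates arrays with np.array(), once letting NumPy infer the data type and twice specifying it explicitly. The values are made up for illustration.

```python
import numpy as np  # conventional alias for the numpy library

prices = np.array([4.9, 5.2, 5.4])                       # dtype inferred from the list
volumes = np.array([1200, 1500, 1350], dtype=np.int64)   # dtype given explicitly
table = np.array([[4.6, 4.9], [4.9, 5.2]], dtype=np.float32)  # 2-D array, 4 bytes per item

print(prices.dtype, volumes.dtype)
print(table.shape, table.nbytes)
```

Choosing float32 instead of the default float64 halves the memory used per item, which is the kind of control over storage that the data types provide.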



2.4.4 PANDAS

Pandas is an open-source, BSD-licensed Python library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming language.

Python with Pandas is used in a wide range of fields, including academic and commercial
domains such as finance, economics, statistics, analytics, and so on.

Pandas is particularly useful for people working with data cleansing and analysis. Using
it requires a basic understanding of computer programming terminology; a basic
understanding of any programming language is a plus.

The Pandas library uses most of the functionalities of NumPy, so familiarity with NumPy
is recommended before working with Pandas.



2.4.5 ANACONDA

Anaconda is a package manager. Jupyter is a presentation layer. Anaconda attempts to
solve the dependency hell in Python, where different projects have different dependency
versions, so that project dependencies requiring different versions do not interfere
with each other.
Jupyter attempts to solve the issue of reproducibility in analysis by enabling an
iterative and hands-on approach to explaining and visualizing code, using rich text
documentation combined with visual representations in a single solution.
Anaconda is similar to pyenv, venv and miniconda; it is meant to achieve a Python
environment that is 100% reproducible on another machine, independent of whatever other
versions of a project's dependencies are available. It is a bit similar to Docker, but
restricted to the Python ecosystem.
Jupyter is an excellent presentation tool for analytical work, where you can present
code in "blocks", combined with rich text descriptions between blocks, the formatted
output from the blocks, and graphs generated by another block's code.

Jupyter is very good in analytical work for ensuring reproducibility in someone's
research, so anyone can come back many months later and visually understand what someone
tried to explain, and see exactly which code drove which visualization and conclusion.

Often in analytical work you will end up with many half-finished notebooks explaining
proof-of-concept ideas, most of which will not lead anywhere initially.

Some of these presentations might, months or even years later, present a foundation to
build from for a new problem.



2.4.6 PYTHON

Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it attractive for Rapid Application Development, as well as for
use as a scripting or glue language to connect existing components together. Python's
simple, easy to learn syntax emphasizes readability and therefore reduces the cost of
program maintenance. Python supports modules and packages, which encourages program
modularity and code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and can be
freely distributed.

Often, programmers fall in love with Python because of the increased productivity it
provides. Since there is no compilation step, the edit-test-debug cycle is very fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation
fault. Instead, when the interpreter discovers an error, it raises an exception. When
the program does not catch the exception, the interpreter prints a stack trace. A
source-level debugger allows inspection of local and global variables, evaluation of
arbitrary expressions, setting breakpoints, stepping through the code a line at a time,
and so on. The debugger is written in Python itself, testifying to Python's
introspective power. On the other hand, often the quickest way to debug a program is to
add a few print statements to the source: the fast edit-test-debug cycle makes this
simple approach effective.

Python is widely used for web and application development. It is relatively simple and
easy to learn, since its syntax focuses on readability. Developers can read and
translate Python code much more easily than other languages. In turn, this reduces the
cost of program maintenance and development because it allows teams to work
collaboratively without significant language and experience barriers.

Moreover, because Python supports modules and packages, programs can be designed in a
modular style and code can be reused across a variety of projects. Once you have
developed a module or package you need, it can be scaled for use in other projects, and
it is easy to import or export these modules.
One of the most promising benefits of Python is that both the standard library and the
interpreter are available free of charge, in both binary and source form. There is no
exclusivity either, as Python and all the necessary tools are available on all major
platforms. Therefore, it is an attractive option for developers who do not want to worry
about paying high development costs.



CHAPTER 3

REQUIREMENT ANALYSIS

3.1 FUNCTIONAL REQUIREMENTS
Functional requirements define the functions of a software system and the behavior of
the system when presented with specific inputs or conditions. These may include
calculations, data manipulation and processing, and other specific functionality.

• The system should be able to read the crypto currency data and preprocess it.
• It should be able to analyze the crypto currency data.
• It should be able to group data based on hidden patterns.
• It should be able to assign a label based on its data groups.
• It should be able to split data into a training set and a test set.
• It should be able to train the model using the training set.
• It must validate the trained model using the test set.
• It should be able to classify the crypto currency data.

3.2 NON-FUNCTIONAL REQUIREMENTS

Non-functional requirements describe how a system must behave and establish constraints
on its functionality. This type of requirement is also known as the system's quality
attributes. Attributes such as performance, security, usability and compatibility are
not features of the system; they are required characteristics. They are "emergent"
properties that arise from the whole arrangement, and hence we cannot write a particular
line of code to implement them. Any attributes required by the customer are described by
the specification. We must include only those requirements that are appropriate for our
project.



Some Non-Functional Requirements are as follows:
• Reliability
• Maintainability
• Performance
• Portability
• Scalability
• Flexibility

3.2.1 ACCESSIBILITY:

Accessibility is a general term used to describe the degree to which a product, device,
service, or environment is accessible to as many people as possible.

In our project, people who have registered with the cloud can access the cloud to store
and retrieve their data with the help of a secret key sent to their email ids. The user
interface is simple, efficient and easy to use.

3.2.2 MAINTAINABILITY:

In software engineering, maintainability is the ease with which a software product can
be modified in order to:

• Correct defects

• Meet new requirements

New functionalities can be added to the project based on the user requirements just by
adding the appropriate files to the existing project using the Python programming
language. Since the programming is very simple, it is easier to find and correct the
defects and to make changes in the project.



3.2.3 SCALABILITY:

The system is capable of handling an increase in total throughput under an increased
load when resources (typically hardware) are added.

The system can work normally under situations such as low bandwidth and a large number
of users.

3.2.4 PORTABILITY:

Portability is one of the key concepts of high-level programming. Portability is the
ability of a software code base to reuse the existing code instead of creating new code
when moving software from one environment to another. The project can be executed under
different operating conditions provided it meets its minimum configuration. Only system
files and dependent assemblies would have to be configured in such a case.

3.3 HARDWARE REQUIREMENTS


Processor : Any Processor above 500 MHz
RAM : 4 GB
Hard Disk : 500 GB
System : Pentium IV 2.4 GHz
Any system with above or higher configuration is compatible for this project.

3.4 SOFTWARE REQUIREMENTS

• Operating system : Windows 7/8/10
• Programming language : Python
• IDE : Jupyter Notebook
• Tools : Anaconda



CHAPTER 4

DESIGN
4.1 DESIGN GOALS

The goal of this project is to predict the highest and closing price of crypto currency
on a given day based on the crypto currency data of several preceding quarters. It is
technically challenging to predict the price accurately, mainly due to the lack of
seasonality and the highly volatile nature of the crypto currency market. This is
primarily a time series prediction problem. Artificial neural network (ANN) models of
time series are used to perform the prediction task, mainly due to the ability of ANNs
to deal with non-linearities in the data such as the lack of seasonality. Two neural
network models (TDNN and RNN, see Section 5.1) are trained and tested on crypto currency
data starting from 2012 till the first quarter of 2018. In order to make the
one-day-ahead prediction of the highest and closing price of crypto currency, features
such as open price, high price, low price, close price and volume of currency (USD) are
taken into consideration. To predict the highest and closing price on a day of a
quarter, both neural network models are trained with data over the past eight quarters
and tested over the next quarter.

The report explains the data preparation steps followed by the neural network models and
their functionality. Quantitative measures such as MSE (mean square error) and NMSE
(normalized mean square error) are used to evaluate the models. The predicted high and
closing prices from these two neural networks are presented in tabular format. At the
end, the report discusses possible improvements that can be made to increase the scope
of the experiment.

The constant increase in crypto currency usage has become a significant challenge, with
the development of technology and hi-tech tools having a growing impact on the crypto
currency price. The large amount of information also poses a challenge in analyzing such
data and identifying similarities or relations within it. There is also a challenge of
inconsistency that can occur due to incompleteness in the dataset. Therefore, there is
an urgent need for proper techniques to analyze large volumes of data and get useful
results out of them. The main aim of this project is to propose a general and effective
approach to predict the crypto currency price using data mining techniques.
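
The two error measures named above can be sketched as follows. The report does not give a formula for NMSE, so this sketch assumes one common definition, the MSE divided by the variance of the actual values; the function names and the sample values are illustrative only.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def nmse(y_true, y_pred):
    """MSE normalized by the variance of the actual values (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    return mse(y_true, y_pred) / float(np.var(y_true))

actual = [10.0, 12.0, 14.0]      # made-up prices
predicted = [11.0, 12.0, 13.0]   # made-up model output

print(mse(actual, predicted), nmse(actual, predicted))
```

With this definition an NMSE of 1 corresponds to a model that is no better than always predicting the mean of the actual values, which makes results on different price scales easier to compare.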



The main features of the proposed system are:
• More efficient.
• Better crypto currency price monitoring systems.
• Reduces the costs of storage, maintenance and personnel.
• Reduces the time complexity of the system.
• A simpler architecture that is easy to understand.
• Processing of large amounts of data becomes easier.

4.1.1 INPUT/OUTPUT PRIVACY

No sensitive information is taken from the large data sets. The data taken is of use to
society as it helps in solving important problems.

4.1.2 EFFICIENCY
The local computations done by the programmer help the developed system to be more
efficient than other systems. Efficiency is very important when it comes to large
systems.



4.2 SYSTEM ARCHITECTURE
The architecture of the proposed system has the following components:

• Crypto currency dataset, which consists of the daily crypto currency prices recorded
over several years.
• Training data to train the models.
• Testing data to apply the models.
• Data storage, which stores the data.
• Classification and prediction algorithms.
• Forecast engine.

Fig 4.1: System Architecture (figure not reproduced; the only recoverable label is
"crypto currency database")



4.3 DATA FLOW DIAGRAM

Fig 4.2: Data Flow Diagram



CHAPTER 5

IMPLEMENTATION

5.1 DATASET

Several crypto currency data sets are available online to download for free. Most of
them provide the price of crypto currency on a minute-to-minute basis. However, the main
goal of the project is to make a one-day-ahead prediction of the highest and closing
price of crypto currency. So, we need data such as the highest and closing price of
crypto currency for each day over a period of several years.

The Quandl API provides the crypto currency price data set, ranging from September 2011
to 2018 (present). This API gives access to crypto currency exchanges and daily crypto
currency values. It allows users to customize the query while using the interface to
download the historical crypto currency prices. The data is available in three different
formats, i.e. JSON, XML and CSV. The data is downloaded in the .csv format and its size
is around 200 KB. It has a total of 2381 data records (each record corresponds to a day)
consisting of crypto currency open, high, low and closing price and volume of crypto
currency (USD), starting from September 2011 to 2018 (present). However, due to
inconsistencies in the data from September 2011 to December 2011, this data has been
discarded, and data records from January 2012 to March 2018 are taken into consideration
for this project. So, after the data is cleaned, the final data set has a total of 2271
data records.

The data records are divided into three (3) sets, namely: Y12-13 (2012 and 2013 data),
Y14-15 (2014 and 2015 data) and Y16-17 (2016 and 2017 data). Y12-13 has eight quarters;
the neural networks (TDNN, RNN) are trained on this data and tested on the first quarter
of 2014. Similarly, Y14-15 has eight quarters; the neural networks are trained on this
data and tested on the first quarter of 2016. In the same way, Y16-17 has eight
quarters; the neural networks are trained on this data and tested on the first quarter
of 2018.

To predict the highest and closing price of crypto currency one day ahead, in each of
the sub data sets the columns high and close are shifted up by one (1) unit. In the
three sub data sets, it should be noted that the testing data runs from 1st January to
18th March and the prediction is made for 19th March (of the years 2014, 2016 and 2018
respectively). The data set has limited features, and in the current project almost all
of these features are considered valuable for the prediction task. To be clear, for
predicting the highest and closing price of crypto currency one step ahead, features
such as open, high, low, closing price and volume of crypto currency (USD) are used.
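
Shifting the high and close columns up by one unit can be sketched with pandas as shown below. The column names, the new target column names and the sample values are assumptions for illustration; the idea is that each row keeps today's features and receives tomorrow's high and close as targets.

```python
import pandas as pd

# Made-up daily records with the five features named above.
df = pd.DataFrame(
    {
        "Open": [770.0, 775.0, 802.0, 818.0],
        "High": [780.0, 805.0, 820.0, 830.0],
        "Low": [760.0, 770.0, 795.0, 810.0],
        "Close": [775.0, 802.0, 818.0, 825.0],
        "Volume": [10.0, 12.0, 11.0, 14.0],
    },
    index=pd.date_range("2014-01-01", periods=4, freq="D"),
)

# shift(-1) moves each value up by one row, so row t holds the value of day t+1.
df["High_next"] = df["High"].shift(-1)
df["Close_next"] = df["Close"].shift(-1)

# The last row has no "next day" and is dropped.
supervised = df.dropna()
print(supervised)
```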



5.2 DATA PREPROCESSING
The primary data collected from the web sources remains in the raw form of statements,
digits and qualitative terms. The raw data contains errors, omissions and
inconsistencies. It requires corrections after careful scrutiny of the completed
questionnaires. The following steps are involved in the processing of primary data. A
huge volume of raw data collected through field survey needs to be grouped for similar
details of individual responses.

Data preprocessing is a technique that is used to convert the raw data into a clean data
set. In other words, whenever the data is gathered from different sources it is
collected in a raw format which is not feasible for the analysis.

Therefore, certain steps are executed to convert the data into a small clean data set.
This technique is performed before the execution of iterative analysis. The set of steps
is known as data preprocessing. The process comprises:
• Data Gathering
• Data Cleaning
• Data Normalization

Data Gathering:
Daily data from four channels are considered since 2013. First, the crypto currency
price history is extracted from CoinMarketCap through its open API. Secondly, data from
the Blockchain is gathered; in particular we choose the average block size, the number
of user addresses, the number of transactions, and the miners' revenue. We found it
counterintuitive to have some Blockchain data, given the incessant scaling problem. On
the other hand, the number of accounts is by definition related to the price movements,
since an increase in the number of accounts either means more transactions occurring
(presumably for exchanging with different parties and not just transferring crypto
currency to another address), or is a sign of more users joining the network.



Thirdly, for the sentiment data we obtain the interest over time for the term 'crypto
currency' using the PyTrends library. Lastly, two indices are considered, the S&P 500
and the Dow Jones. Both are retrieved through the Yahoo Finance API.

All in all, these make for 12 features. The Pearson correlation between the attributes
is shown in Figure 2. Clearly, some attributes are not strongly correlated: for example,
the financial indices are correlated with each other, but not with any of the crypto
currency related attributes. We also see how Google Trends are related to crypto
currency transactions.

Data Cleaning:
From the exchange data we consider relevant only the Volume, Close, Open and High prices
and Market capitalization. For all data sets, if NaN values are found, they are replaced
with the mean of the respective attribute. After this, all datasets are merged into one
along the time dimension. Judging from crypto currency price movements during the period
from 2013 until 2014, we considered it best to get rid of data points before 2014; hence
the data passed to the network ranges from 2014 until September 2018.
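
The cleaning steps described here (mean imputation of NaN values, merging along the time dimension, and discarding early data points) can be sketched as follows. The two small frames, their column names and the cut-off date are made up for illustration and stand in for the exchange data and the Google Trends data.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2014-01-01", periods=4, freq="D")

# Hypothetical stand-ins for two of the data sources.
exchange = pd.DataFrame(
    {"Close": [770.0, np.nan, 810.0, 820.0], "Volume": [10.0, 12.0, np.nan, 14.0]},
    index=dates,
)
trends = pd.DataFrame({"Trend": [40.0, 42.0, np.nan, 50.0]}, index=dates)

# Replace NaN values with the mean of the respective attribute.
frames = [frame.fillna(frame.mean()) for frame in (exchange, trends)]

# Merge all data sets into one along the time dimension (shared date index).
merged = pd.concat(frames, axis=1)

# Drop data points before a chosen start date (illustrative cut-off).
merged = merged.loc["2014-01-02":]
print(merged)
```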

Data Normalization:
Deciding on the method for normalizing a time series, especially a financial one, is
never easy. What's more, as a rule of thumb a neural network should not be fed data that
takes relatively large values, or data that is heterogeneous (referring to time series
that have different scales, like exchange price and Google Trends). Doing so can trigger
large gradient updates that will prevent the network from converging. To make learning
easier for the network, data should have the following characteristics:
• Take small values: typically, most values should be in the 0-1 range.
• Be homogeneous: all features should take values in roughly the same range.
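
One way to bring features with very different scales into a small, common range is min-max scaling, sketched below. The report does not state which normalization method was used, so this is only an assumed example; the function name min_max_scale and the sample values are illustrative. The minimum and maximum are taken from the training data only and then reused for the test data.

```python
import numpy as np

def min_max_scale(train, test):
    """Scale each feature to the 0-1 range using statistics of the training data."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (train - lo) / span, (test - lo) / span

# Two features on very different scales (made-up price and volume values).
train = np.array([[100.0, 1.0e6], [200.0, 3.0e6], [150.0, 2.0e6]])
test = np.array([[250.0, 2.5e6]])

train_scaled, test_scaled = min_max_scale(train, test)
print(train_scaled)
print(test_scaled)  # test values may fall outside 0-1 if they exceed the training range
```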

Data preprocessing is important due to the presence of unformatted real-world data.
Real-world data is mostly composed of:

• Inaccurate data (missing data) - There are many reasons for missing data, such as data
not being collected continuously, mistakes in data entry, technical problems with
biometrics and much more.

• Noisy data (erroneous data and outliers) - The reasons for the existence of noisy data
can be a technological problem in the device that gathers the data, a human mistake
during data entry and much more.

• Inconsistent data - Inconsistencies are present due to reasons such as duplication
within data, human data entry, and mistakes in codes or names.

The column Resolution is dropped because it does not provide any assistance and has no
significance in helping to predict the target variable.

5.3 CLASSIFICATION

This technique is used to divide data into different classes. The process is similar to
clustering: it segments data records into various segments, which are known as classes.
Unlike clustering, here the classes are known in advance. Example: Outlook email uses an
algorithm to categorize an email as legitimate or spam.

Fig 5.1: Classification



Some of the classification algorithms are:

• Linear Classifiers: Logistic Regression, Naive Bayes Classifier
• Support Vector Machines
• Decision Trees
• Boosted Trees
• Random Forest
• Neural Networks



5.3.1 : TIME SERIES DATA

Normally a time series is a sequence of numbers along time. LSTM for sequence prediction
acts as a supervised algorithm, unlike its autoencoder version. As such, the overall
dataset should be split into inputs and outputs. Moreover, LSTM compares well with
classic linear statistical models, since it can more easily handle forecasting problems
with multiple inputs. In our approach, the LSTM uses previous data to predict the
closing price 30 days ahead.

First, we need to decide how many previous days one forecast will have access to. We
refer to this number as the window size. We have opted for 35 days in the case of
monthly prediction and 65 days in the case of 2-month prediction. The input data set is
therefore a tensor comprising matrices of dimension 35x12 or 65x12 respectively, i.e. 12
features and 35 (or 65) rows in each window. So the first window consists of rows 0 to
34 (Python is zero-indexed), the second of rows 1 to 35, and so on. Another reason for
choosing this window length is that a small window leaves out patterns which may appear
in a longer sequence.

The output data takes into account not only the window size but also the prediction
range, which in our case is 30 days. The output dataset starts from row 35 and runs
until the end, and is made of chunks of length 30. The prediction range also determines
the output size of the LSTM network.
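
The windowing described above can be sketched in NumPy as follows. The function name make_windows, the choice of column 0 as the target (closing price) and the synthetic data are assumptions for illustration; the window size of 35 and the prediction range of 30 follow the text.

```python
import numpy as np

def make_windows(data, target_col, window=35, horizon=30):
    """Cut a (days, features) array into input windows and output chunks.

    X[k] holds rows k .. k+window-1 (all features); y[k] holds the target
    column for the following `horizon` rows.
    """
    X, y = [], []
    last_start = len(data) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        X.append(data[start:end])
        y.append(data[end:end + horizon, target_col])
    return np.array(X), np.array(y)

# Synthetic data: 100 days, 12 features.
data = np.arange(100 * 12, dtype=float).reshape(100, 12)
X, y = make_windows(data, target_col=0)
print(X.shape, y.shape)
```

In this sketch only windows that are followed by a full 30-day output chunk are kept, so the number of samples is the number of days minus the window size and the prediction range, plus one. The project code may count samples differently.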

Split into training and test data:
This step is one of the most important, especially in the case of crypto currency. We
first wanted to predict the year ahead, but this would mean that data from 1 Jan 2018
until September 2018 would be used for testing. The downside of this is, of course, the
slope in 2017, which the neural network would learn as the pattern from its last inputs,
so the prediction for 2018 would not be very logical. Thus we use training data from
2014-01-01 until 2018-07-05, which leaves approximately 2 months for prediction. When we
predict for two months, the data set is split a bit earlier, at 2018-06-01, to leave
room for the 2 months. Each training set and test set is composed of input and output
features.
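
A chronological split of this kind can be sketched with pandas as shown below. The split date follows the text; the synthetic frame, its single Close column and the end date of 2018-09-01 are assumptions for illustration. The data is not shuffled, so every test day lies after every training day.

```python
import pandas as pd

# Synthetic daily data covering the period described in the text.
dates = pd.date_range("2014-01-01", "2018-09-01", freq="D")
df = pd.DataFrame({"Close": range(len(dates))}, index=dates)

split_date = pd.Timestamp("2018-07-05")

# Everything up to and including the split date is used for training.
train = df.loc[:split_date]
# Everything after the split date is held out for testing.
test = df.loc[split_date + pd.Timedelta(days=1):]

print(len(train), len(test))
```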



Turn data into tensors
LSTM expects the input to be given in the form of a 3-dimensional tensor of float
values. A key feature of a tensor is its shape, which in Python is a tuple of integers
representing its dimensions along the 3 axes. For instance, in our crypto currency data,
the shape of the training inputs is (1611, 35, 12), so we have 1611 samples, a window
size (timestep) of 35 values, and 12 features.

Overall the idea is simple: we separate the data into chunks of 35 and push these small
windows of data into a NumPy array. Each window is a 35x12 matrix, so all windows
together create the tensor. Furthermore, in LSTM the input layer is, by design,
specified by the input shape argument on the first hidden layer. The three dimensions of
the input shape are samples, time steps and features.

5.3.2 : LSTM IMPLEMENTATION

LSTM internals

A chief feature of feed-forward networks is that they do not retain any memory:
each input is processed independently, with no state being saved between inputs.
Given that we are dealing with time series, where information from previous crypto
currency prices is needed, we should maintain some information to predict the future. An
architecture providing this is the recurrent neural network (RNN), which in addition to
producing an output has a loop that feeds its state back into itself. So the window we
provide as input gets processed as a sequence rather than in a single step. However, when
the time step (size of window) is large (which is often the case), the gradient gets too
small or too large, which leads to the phenomenon known as the vanishing or exploding
gradient, respectively [Chollet2017]. This problem occurs while the optimizer
backpropagates: with a vanishing gradient the algorithm keeps running while the weights
barely change at all. RNN variations, namely LSTM and GRU, mitigate the problem.
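
The effect can be illustrated with a toy calculation. The sketch below is a deliberate simplification: it assumes that backpropagation through time multiplies the gradient by one fixed scalar factor per timestep, whereas a real network multiplies by Jacobian matrices. The function name `gradient_scale` and the factors 0.5 and 1.5 are hypothetical.

```python
def gradient_scale(factor, timesteps):
    """Return the product of `timesteps` identical per-step gradient factors."""
    scale = 1.0
    for _ in range(timesteps):
        scale *= factor
    return scale

timesteps = 35  # window size used in this report
vanishing = gradient_scale(0.5, timesteps)  # factor below 1 shrinks toward 0
exploding = gradient_scale(1.5, timesteps)  # factor above 1 keeps growing
print(f"{vanishing:.3e}  {exploding:.3e}")
```

Even with a window of only 35 steps, the two results differ by many orders of magnitude, which is why plain RNNs struggle with long windows.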



The LSTM layer adds cells that carry information across many timesteps.
The cell state is the horizontal line from Ct−1 to Ct, and its importance lies in holding the
long-term or short-term memory. The output of the LSTM is modulated by the state of these
cells, which is important when predicting from historic context rather than only from
the last input. LSTM networks manage to remember inputs by making use of a loop; such
loops are absent in feed-forward networks. On the other hand, the more time passes, the
less likely it becomes that the next output depends on a very old input, so forgetting is
necessary. LSTM achieves this by learning when to remember and when to forget, through
its forget gates. We mention the gates briefly so as not to treat LSTM as a black box
model [Olah2015].

Fig 5.2: LSTM cell

• Forget gate: $f_t = \sigma(W_f S_{t-1} + W_f S_t)$
• Input gate: $i_t = \sigma(W_i S_{t-1} + W_i S_t)$
• Output gate: $o_t = \sigma(W_o S_{t-1} + W_o S_t)$
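
To make the role of the gates concrete, one LSTM time step can be sketched in numpy. This is an illustrative sketch, not the Keras implementation: it follows the common textbook formulation, in which the previous hidden state and the current input are concatenated, and it adds bias terms and a candidate cell update that the gate list above does not show. All names (`lstm_step`, `W`, `b`) and sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenation [h_prev, x_t] to the four stacked pre-activations
    (forget, input, output, candidate); b is the matching bias vector.
    """
    units = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:units])               # forget gate
    i_t = sigmoid(z[units:2 * units])       # input gate
    o_t = sigmoid(z[2 * units:3 * units])   # output gate
    g_t = np.tanh(z[3 * units:4 * units])   # candidate cell values
    c_t = f_t * c_prev + i_t * g_t          # new cell state
    h_t = o_t * np.tanh(c_t)                # new hidden state (the output)
    return h_t, c_t

rng = np.random.default_rng(0)
units, features = 4, 12
W = rng.normal(scale=0.1, size=(4 * units, units + features))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(35, features)):  # one window of 35 timesteps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The forget gate scales the old cell state, the input gate scales the new candidate, and the output gate decides how much of the cell state is exposed as output.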



Architecture of Network:

We used the Sequential API rather than the functional one. The overall architecture is
as follows:
• 1 LSTM Layer: The LSTM layer is the inner one, and all the gates mentioned above are
already implemented by Keras, with a default gate activation of hard-sigmoid
[Keras2015]. The LSTM parameters are the number of neurons and the input shape as
discussed above.
• 1 Dropout Layer: Typically this is used before the Dense layer. In Keras, a dropout
can be added after any hidden layer; in our case it is after the LSTM.
• 1 Dense Layer: This is the regular fully connected layer.
• 1 Activation Layer: Because we are solving a regression problem, the last layer should
give the linear combination of the activations of the previous layer with the weight
vectors. Therefore, this activation is a linear one. Alternatively, it could be passed as a
parameter to the previous Dense layer.
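
As a rough check on the size of such a network, the number of trainable parameters can be worked out by hand. The sketch below uses the standard LSTM parameter formula; the helper names are hypothetical, and the sizes (4 units, 10 input features per timestep, 1 output) are assumptions that mirror the `LSTM(4, input_shape=(1, look_back))` call with `look_back = 10` in the implementation listing. Dropout and Activation layers add no trainable parameters.

```python
def lstm_param_count(units, n_features):
    """Trainable parameters of one LSTM layer.

    Each of the 4 internal transforms (3 gates + candidate) has an input
    kernel (n_features x units), a recurrent kernel (units x units) and a
    bias vector (units).
    """
    return 4 * (units * (n_features + units) + units)

def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

# Assumed sizes: 4 LSTM units, 10 input features per timestep, 1 output value.
lstm_params = lstm_param_count(units=4, n_features=10)
dense_params = dense_param_count(n_inputs=4, n_outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 240 5 245
```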



IMPLEMENTATION

import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Load the raw exchange data and inspect it
data = pd.read_csv("crypto currency.csv")
data.head()
data['rp_key'].value_counts()

# Keep only the BTC/USD pair
df = data.loc[(data['rp_key'] == 'btc_us')]
df.head()
df = df.reset_index(drop=True)

# Parse the timestamps and keep the rows after 2017-06-28
df['datetime'] = pd.to_datetime(df['datetime_id'])
df = df.loc[df['datetime'] > pd.to_datetime('2017-06-28 00:00:00')]
df = df[['datetime', 'last', 'diff_24h', 'diff_per_24h', 'bid', 'ask', 'low', 'high', 'volume']]
df.head()
# data1 is a second DataFrame with a datetime "Timestamp" column
# (its loading step is not included in this listing)
data1["year"] = data1["Timestamp"].dt.year
data1["month"] = data1["Timestamp"].dt.month
data1["day"] = data1["Timestamp"].dt.day
data1.head()
data1["hour"] = data1["Timestamp"].dt.hour
data1["minute"] = data1["Timestamp"].dt.minute
data1["seconds"] = data1["Timestamp"].dt.second
data1.head()

data1 = data1.rename(columns={'Volume_(BTC)': 'VolumeBTC',
                              'Volume_(Currency)': 'VolumeCurrency',
                              'Weighted_Price': 'WeightedPrice'})

data1['Open'].plot()
plt.show()

# Note: despite the column name, this divides by the row count; no logarithm is applied
data1["Log_Normalization"] = data1["Open"] / len(data1["Open"])
data1["Log_Normalization"].head()

# adjusting VolumeBTC threshold: the mean volume is used as the cut-off
threshold = sum(data1.VolumeBTC) / len(data1.VolumeBTC)
print('BTC Volume Threshold is : ', threshold)

data1["VolumeLevel"] = ["high" if i > threshold else "low" for i in data1.VolumeBTC]
data1.loc[:, ["VolumeLevel", "VolumeBTC"]].head()

# Use only the last traded price and scale it to [0, 1]
df = df[['last']]
dataset = df.values
dataset = dataset.astype('float32')
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# Chronological split: first 67% for training, the rest for testing
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
print(len(train), len(test))

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

look_back = 10
trainX, trainY = create_dataset(train, look_back=look_back)
testX, testY = create_dataset(test, look_back=look_back)
trainX
trainY



# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# One LSTM layer with 4 units followed by a single-output Dense layer
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=256, verbose=2)

trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# Undo the scaling so that the errors are reported in price units
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

trainscore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainscore))
testscore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testscore))

# Shift the train predictions so that they line up with the original series
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# Shift the test predictions so that they start after the training part
testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

plt.plot(df['last'], label='Actual')
plt.plot(pd.DataFrame(trainPredictPlot, columns=["close"], index=df.index).close,
         label='Training')
plt.plot(pd.DataFrame(testPredictPlot, columns=["close"], index=df.index).close,
         label='Testing')
plt.legend(loc='best')
plt.show()



CHAPTER 6
TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of test. Each test type addresses a
specific testing requirement.

6.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly, and that program inputs produce valid outputs. All
decision branches and internal code flow should be validated. It is the testing of
individual software units of the application. It is done after the completion of an
individual unit, before integration. This is structural testing that relies on knowledge
of the unit's construction and is invasive. Unit tests perform basic tests at component
level and test a specific business process, application, and/or system configuration.

Unit tests ensure that each unique path of a business process performs accurately to the
documented specifications and contains clearly defined inputs and expected results.
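
As an illustration of a unit test for this project, the windowing function from the implementation listing can be checked in isolation. The sketch below repeats `create_dataset` so that it is self-contained; the test function name and the tiny input series are hypothetical.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Build (input window, next value) pairs from a single-column array."""
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

def test_create_dataset_windows():
    """Each input window must be followed by its correct target value."""
    series = np.arange(6, dtype="float32").reshape(-1, 1)  # values 0..5
    X, y = create_dataset(series, look_back=2)
    assert X.shape == (3, 2)
    assert y.shape == (3,)
    assert X[0].tolist() == [0.0, 1.0] and y[0] == 2.0
    assert X[-1].tolist() == [2.0, 3.0] and y[-1] == 4.0

test_create_dataset_windows()
print("create_dataset unit test passed")
```

With 6 values and a look-back of 2, the function returns 3 windows, because the loop stops one step before the last usable window.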

6.2 INTEGRATION TESTING
Integration tests are designed to test integrated software components to determine
whether they actually run as one program. Testing is event driven and is more concerned
with the basic outcome of screens or fields.

Integration tests demonstrate that, although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is
correct and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.
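
An integration test for this project would check that the preprocessing steps fit together. The sketch below is a hedged example that chains scaling, windowing and reshaping; `min_max_scale` and `make_supervised` are simplified, hypothetical stand-ins written in plain numpy, not the scikit-learn scaler or the project's own functions.

```python
import numpy as np

def min_max_scale(values):
    """Scale a single-column array to the range [0, 1]."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

def make_supervised(dataset, look_back):
    """Turn a scaled single-column array into (windows, targets)."""
    n = len(dataset) - look_back
    X = np.array([dataset[i:i + look_back, 0] for i in range(n)])
    y = dataset[look_back:, 0]
    return X, y

def test_pipeline_integration():
    """Scaling, windowing and reshaping must fit together end to end."""
    prices = np.linspace(100.0, 200.0, 50).reshape(-1, 1)
    scaled = min_max_scale(prices)
    X, y = make_supervised(scaled, look_back=10)
    X = X.reshape(X.shape[0], 1, X.shape[1])  # [samples, time steps, features]
    assert X.shape == (40, 1, 10)
    assert y.shape == (40,)
    assert 0.0 <= X.min() and X.max() <= 1.0
    # The target of a window is the value right after the window's last entry.
    assert np.isclose(y[0], scaled[10, 0])

test_pipeline_integration()
print("integration test passed")
```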



6.3 VALIDATION TESTING

An engineering validation test (EVT) is performed on first engineering prototypes, to
ensure that the basic unit performs to design goals and specifications. Identifying
design problems and solving them as early in the design cycle as possible is the key to
keeping projects on time and within budget. Too often, product design and performance
problems are not detected until late in the product development cycle, when the product
is ready to be shipped. The old adage holds true: it costs a penny to make a change in
engineering, a dime in production and a dollar after a product is in the field.

Verification is a quality control process that is used to evaluate whether a product,
service, or system complies with regulations, specifications, or conditions imposed at
the start of a development phase. Verification can take place in development, scale-up,
or production. This is often an internal process.

Validation is a quality assurance process of establishing evidence that provides a high
degree of assurance that a product, service, or system accomplishes its intended
requirements. This often involves acceptance of fitness for purpose with end users and
other product stakeholders.

The testing process overview is as follows:

Figure 6.1: The testing process



• R-Squared Formula

R-squared (R²) is a statistical measure that represents the proportion of
the variance for a dependent variable that's explained by an
independent variable or variables in a regression model.

• Mean Squared Error

The Mean Squared Error (MSE) is perhaps the simplest and most
common loss function, often taught in introductory Machine Learning
courses. To calculate the MSE, you take the difference between your
model’s predictions and the ground truth, square it, and average it out
across the whole dataset.

• Mean Absolute Error

To calculate the MAE, you take the difference between your model’s
predictions and the ground truth, apply the absolute value to that
difference, and then average it out across the whole dataset.
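
The three measures above can be computed directly from their definitions. The following sketch uses numpy and a small made-up example; the function names and the sample values are illustrative assumptions, not results from this project.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up ground truth and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
# 0.125 0.25 0.9
```

Because MSE squares each difference, it penalizes large errors more heavily than MAE does.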

       LSTM (Long Short    SVM (Support       Random Forest
       Term Memory)        Vector Machine)
MAE    0.038000458         0.0595876          0.0456383
MSE    0.003944659         0.0054783          0.0045624
       0.266917910         0.2357646          0.2634167

CRYPTO PRICE PREDICTION 54


• The graph above shows the training process of the model.

80% of the data is used to train the machine learning model and
the remaining 20% is used for checking the prediction.



Figure 7.3: prediction graph



RESULT ANALYSIS

The results presented on the previous page compare the actual and
LSTM-predicted price of BTC. The graph shows that the
predicted and the actual prices are approximately the same over
the entire interval. This model is considered the best model. The
mean absolute error (MAE) of the LSTM prediction model for BTC is
0.38170458, and the mean squared error (MSE) is
0.003934259.

Statistical analysis of the data indicates that the predicted price
has a mean value of 9,173.258 USD, a maximum value of
12,358.805 USD, and a minimum value of 4,775.013 USD,
whereas the actual price has a mean value of 9,249.388 USD, a
maximum value of 12,380.999 USD, and a minimum value of
4,941.0 USD. The difference between the mean values of
the actual and the predicted prices is 76.13 USD.

RESULT ANALYSIS

The results presented on the previous page compare the actual and
LSTM-predicted price of BTC. The graph shows that the predicted and the
actual prices are approximately the same over the entire interval. This
model is considered the best model. The mean absolute error (MAE) of
the LSTM prediction model for BTC is 0.038000458, and the
mean squared error (MSE) is 0.003944659.

Statistical analysis of the data indicates that the predicted price has
a mean value of 57,173.258 USD, a maximum value of 64,358.805
USD, and a minimum value of 50,775.013 USD, whereas the actual
price has a mean value of 57,249.388 USD, a maximum value of
64,380.999 USD, and a minimum value of 50,941.0 USD. The
difference between the mean values of the actual and the
predicted prices is 76.13 USD.
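
Summary statistics of this kind can be obtained with a few lines of Python. In the sketch below, the helper `summarize` and the short price lists are hypothetical; only the two mean values in the last step are taken from the text above, to show how the 76.13 USD difference follows from them.

```python
def summarize(prices):
    """Return the mean, maximum and minimum of a list of prices."""
    return {"mean": sum(prices) / len(prices), "max": max(prices), "min": min(prices)}

# Hypothetical short price series (USD), for illustration only.
actual = [100.0, 110.0, 120.0]
predicted = [98.0, 111.0, 118.0]
print(summarize(actual), summarize(predicted))

# Difference between the two mean values reported in the text.
reported_actual_mean = 57249.388
reported_predicted_mean = 57173.258
mean_gap = round(reported_actual_mean - reported_predicted_mean, 2)
print(mean_gap)  # 76.13
```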



CHAPTER 8

CONCLUSION

This project was a valuable opportunity to learn many new concepts. Crypto
currency prediction is an important topic, and building a system suitable
for it was a challenging task. The project used different machine learning
models, namely LSTM, SVM and RF, and compared their errors based on their
predictions on the given dataset.

From the comparison table on page 54 we can see that MAE (mean absolute
error) and MSE (mean squared error) are lowest for LSTM, while both SVM and
random forest have higher errors. We also passed different datasets and
compared the error in each case; we found that the error of the LSTM model
was the lowest compared to the other models. The LSTM model was therefore the
most suitable for predicting the price with the least error.



REFERENCES
• http://www.iosrjournals.org/iosr-jce/papers/Vol19-issue3/Version-1/B1903010617.pdf
• https://ieeexplore.ieee.org/document/7395797
• https://www.geeksforgeeks.org/unified-modeling-language-uml-sequence-diagrams/
• https://www.geeksforgeeks.org/designing-use-cases-for-a-project/
• https://dl.acm.org/citation.cfm?id=170072
• https://www.edureka.co/blog/apriori-algorithm/
• https://content.iospress.com/articles/intelligent-data-analysis/ida1-1-02
• https://link.springer.com/book/10.1007%2F978-3-319-10247-4
• https://www.geeksforgeeks.org/apriori-algorithm/
