Internship Report

This document provides an internship report on machine learning. It includes 6 chapters that cover an introduction to machine learning, the basics of financial markets, and details about the internship modules. The introduction defines machine learning, classifies the different types (supervised, unsupervised, reinforcement, semi-supervised learning), and provides examples of machine learning problems and applications. The internship involved 6 modules that provided learning experiences in topics related to machine learning and financial markets.

Uploaded by

Nitesh Bisht
Copyright
© All Rights Reserved

INTERNSHIP REPORT

ON
‘Machine Learning’

Department of Electronics and Communication Engineering


Ambedkar Institute of Advanced Communication Technologies & Research

Submitted by:
Nitesh Bisht
40710102816

ACKNOWLEDGEMENT

The internship opportunity I had with Career Launcher was a great chance for
learning and professional development. Therefore, I consider myself very fortunate
as I was provided with an opportunity to be a part of it. I am also grateful for
having a chance to work with many wonderful people and professionals who led
me through this internship period.

Table of contents

Chapter 1:

Introduction to Machine Learning

1.1 Introduction

1.2 Classification of Machine Learning

1.3 Categorizing on the basis of required output

Chapter 2:

Basics of Financial Market

2.1 Introduction and basic definitions

2.2 Stock Market and its working

2.3 Equities and Commodities

2.4 Market Index

Chapter 3:

About Internship

3.1 Introduction

3.2 Module 1

3.3 Module 2

3.4 Module 3

3.5 Module 4

3.6 Module 5

3.7 Module 6

Chapter 1
Introduction to Machine Learning

1.1 Introduction
The term Machine Learning was coined in 1959 by Arthur Samuel, an American
pioneer in the field of computer gaming and artificial intelligence, who stated that
“it gives computers the ability to learn without being explicitly programmed”.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational
definition: “A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E.”
Machine Learning is the latest buzzword floating around, and deservedly so, as it
is one of the most interesting subfields of Computer Science. So what does Machine
Learning really mean?
Let’s try to understand Machine Learning in layman’s terms. Suppose you are trying
to toss a piece of paper into a dustbin.
After the first attempt, you realize that you put too much force into it. After the
second attempt, you realize you are closer to the target but need to increase your
throw angle. What is happening here is that after every throw we learn
something and improve the end result. We are programmed to learn from our
experience.
Mitchell’s definition is fundamentally operational: it defines machine learning in
terms of the tasks it is concerned with rather than in cognitive
terms. This follows Alan Turing’s proposal in his paper “Computing Machinery
and Intelligence”, in which the question “Can machines think?” is replaced with
the question “Can machines do what we (as thinking entities) can do?”
Within the field of data analytics, machine learning is used to devise complex
models and algorithms that lend themselves to prediction; in commercial use, this
is known as predictive analytics. These analytical models allow researchers, data
scientists, engineers, and analysts to “produce reliable, repeatable decisions and
results” and uncover “hidden insights” by learning from historical
relationships and trends in the input data set.
Suppose that you decide to check out an offer for a vacation. You browse
through the travel agency website and search for a hotel. When you look at a
specific hotel, just below the hotel description there is a section titled “You might
also like these hotels”. This is a common use case of Machine Learning called a
“Recommendation Engine”. Many data points were used to train a model
to predict the best hotels to show you in that section, based
on the information the website already has about you.
So if you want your program to predict, for example, traffic patterns at a busy
intersection (task T), you can run it through a machine learning algorithm with data
about past traffic patterns (experience E) and, if it has successfully “learned”, it
will then do better at predicting future traffic patterns (performance measure P).
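
The roles of T, E and P in the traffic example can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the traffic numbers, the deterministic "wobble" standing in for day-to-day noise, and the helper names `observed_day`, `train` and `mean_abs_error` are all hypothetical. The "model" is nothing more than the average of the days seen so far.

```python
# Hypothetical illustration of Mitchell's definition; all numbers are made up.
TRUE_PATTERN = [20, 80, 50]  # typical cars per minute in three time slots


def observed_day(day):
    """Traffic seen on one day: the true pattern plus a small deterministic wobble."""
    return [base + ((3 * day + slot) % 5) - 2 for slot, base in enumerate(TRUE_PATTERN)]


def train(num_days):
    """Experience E: average the observations from the first `num_days` days."""
    days = [observed_day(d) for d in range(num_days)]
    return [sum(slot_values) / num_days for slot_values in zip(*days)]


def mean_abs_error(prediction, actual):
    """Performance measure P for task T (predicting traffic): mean absolute error."""
    return sum(abs(p - a) for p, a in zip(prediction, actual)) / len(actual)


future_day = observed_day(7)  # a day that was not part of the training data
error_after_1_day = mean_abs_error(train(1), future_day)
error_after_5_days = mean_abs_error(train(5), future_day)
print(error_after_1_day, error_after_5_days)
```

With more experience (5 days instead of 1), the error on the unseen day is smaller, which is exactly what "learning" means in Mitchell's sense.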
The highly complex nature of many real-world problems, though, often means that
inventing specialized algorithms that will solve them perfectly every time is
impractical, if not impossible. Examples of machine learning problems include “Is
this cancer?”, “Which of these people are good friends with each other?” and “Will
this person like this movie?”. Such problems are excellent targets for Machine
Learning, and in fact machine learning has been applied to such problems with great
success.
1.2 Classification of Machine Learning
Machine learning implementations are classified into four major categories,
depending on the nature of the learning “signal” or “response” available to a
learning system:
1. Supervised learning : An algorithm learns from example data
and associated target responses, which can consist of numeric values or
string labels such as classes or tags, in order to later predict the correct
response when posed with new examples. This approach is similar to human learning
under the supervision of a teacher. The teacher provides good examples
for the student to memorize, and the student then derives general rules
from these specific examples.
2. Unsupervised learning : An algorithm learns from plain
examples without any associated response, leaving the algorithm to
determine the data patterns on its own. This type of algorithm tends to
restructure the data into something else, such as new features that may
represent a class or a new series of uncorrelated values. Such algorithms are quite
useful in providing humans with insights into the meaning of data and
new useful inputs to supervised machine learning algorithms.
As a kind of learning, it resembles the methods humans use to figure
out that certain objects or events are from the same class, such as by
observing the degree of similarity between objects. Some
recommendation systems that you find on the web in the form of
marketing automation are based on this type of learning.
3. Reinforcement learning : You present the algorithm with
examples that lack labels, as in unsupervised learning. However, you
accompany each example with positive or negative feedback
according to the solution the algorithm proposes. Reinforcement
learning is connected to applications
for which the algorithm must make decisions (so the product is
prescriptive, not just descriptive, as in unsupervised learning), and the
decisions bear consequences. In the human world, it is just like learning
by trial and error.
Errors help you learn because they have a penalty added (cost, loss of
time, regret, pain, and so on), teaching you that a certain course of
action is less likely to succeed than others. An interesting example of
reinforcement learning occurs when computers learn to play video
games by themselves.
In this case, an application presents the algorithm with examples of
specific situations, such as having the gamer stuck in a maze while
avoiding an enemy. The application lets the algorithm know the
outcome of actions it takes, and learning occurs while trying to avoid
what it discovers to be dangerous and to pursue survival. You can have
a look at how the company Google DeepMind has created a
reinforcement learning program that plays old Atari video games.
When watching the video, notice how the program is initially clumsy
and unskilled but steadily improves with training until it becomes a
champion.
4. Semi-supervised learning : An incomplete training signal is
given: a training set with some (often many) of the target outputs
missing. A special case of this principle is known as transduction,
where the entire set of problem instances is known at learning time,
except that part of the targets are missing.

1.3 Categorizing on the basis of required Output


Another categorization of machine learning tasks arises when one considers the
desired output of a machine-learned system:
1. Classification : When inputs are divided into two or more classes, and
the learner must produce a model that assigns unseen inputs to one or
more (multi-label classification) of these classes. This is typically
tackled in a supervised way. Spam filtering is an example of
classification, where the inputs are email (or other) messages and the
classes are “spam” and “not spam”.
2. Regression : Also a supervised problem, in which the
outputs are continuous rather than discrete.
3. Clustering : When a set of inputs is to be divided into groups. Unlike
in classification, the groups are not known beforehand, making this
typically an unsupervised task.

Machine Learning comes into the picture when problems cannot be solved by
means of typical approaches.

Chapter 2
Basics of Financial Market
2.1 Introduction and basic definitions
Public v/s Private Company - A public company is one which can sell its shares to
the general public on the stock exchange (share market).
Eg- Reliance, ICICI Bank, Yes Bank.
A private company is one whose shares are held by a few big-money investors.
Eg- Paytm, Dell
Shares - As the name suggests, a share is a ‘part’ of a company which you can buy
or sell, if the company is publicly listed, i.e. on a stock exchange.
Eg- Let’s say a company X is worth Rs 1,000 right now and has 100
shares in total. In this case, the price of 1 share will be Rs 1,000 / 100 = Rs 10 per
share. Also, one share would be 1% of the total company. If the company had 200
shares, then one share would be 0.5% of the total company (1/200), and so on.
Conversely, the total number of shares of a company multiplied by the price of a
share at a given time is the Market Capitalization (Market Cap) of the company
at that time.
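
The arithmetic of the company X example can be checked with a short Python sketch. The function names below are our own, chosen only for illustration; the numbers are the ones used in the example.

```python
def price_per_share(company_value, total_shares):
    """Price of one share when the whole company is worth `company_value`."""
    return company_value / total_shares


def ownership_percent(shares_held, total_shares):
    """Percentage of the company represented by `shares_held` shares."""
    return 100 * shares_held / total_shares


def market_cap(total_shares, share_price):
    """Market capitalization = number of shares x price of one share."""
    return total_shares * share_price


price = price_per_share(1000, 100)        # company X: Rs 1,000 and 100 shares
print(price)                              # price of one share in Rs
print(ownership_percent(1, 100))          # one share out of 100
print(ownership_percent(1, 200))          # one share out of 200
print(market_cap(100, price))             # back to the worth of the company
```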
Stocks v/s Shares - The two words are used interchangeably, though there is a slight
difference between them. ‘Stock’ can refer to holdings in any company, but the
word ‘share’ is used when we are referring to a specific company.
Eg - You can buy stocks of 20 different companies, but you can buy the shares of
Reliance (a specific company). The share market and the stock market are one and the
same thing.
NSE (National Stock Exchange) and BSE (Bombay Stock Exchange) are
the ‘markets’ where one can buy and sell shares of a company. Unlike
conventional markets, these markets are electronic and not physical. All
transactions take place electronically.

Portfolio - If I have invested in more than one stock, let’s say 10 different stocks,
then this collection of investments is known as a Portfolio.

2.2 Stock Market and its working


As we discussed, the stock market is an electronic market place. Buyers and sellers
meet and trade their point of view.
For example, consider the situation of Yes Bank in 2018-19. At the time of writing
this, Yes Bank is facing a succession issue, and most of its senior level
management personnel are quitting the company for internal reasons. It seems like
the leadership vacuum is weighing down the company’s reputation heavily. As a
result, the stock price dropped to Rs.110 all the way from Rs. 250. Whenever there
are new reports regarding Yes Bank management change, the stock prices react to
it.
Assume there are two traders – T1 and T2.
T1’s point of view on Yes Bank – The stock price is likely to go down further
because the company will find it challenging to find a new CEO.
If T1 trades as per his point of view, he should be a seller of the Yes Bank stock.
T2, however, views the same situation in a different light and therefore has a
different point of view – According to him, the stock price of Yes Bank has
overreacted to the succession issue and soon the company will find a great leader,
after whose appointment the stock price will move upwards.
If T2 trades as per his point of view, he should be a buyer of the Yes Bank stock.
So, at Rs 110 T1 will be a seller, and T2 will be a buyer in Yes Bank.
Now both T1 and T2 will place orders to sell and buy the stocks. The stock
exchange has to ensure that these two orders are matched, and the trade gets
executed. This is the primary job of the stock market – to create a marketplace for
the buyer and seller. The stock market is a place where market participants can
access any publicly listed company and trade from their point of view, as long as
there are other participants who have an opposing point of view. After all, different
opinions are what make a market.

What moves the stock?
Let us continue with the Yes Bank example to understand how stocks really move.
Imagine you are a market participant tracking Yes Bank.
It is 10:00 AM on 18th June 2019, and the price of Yes Bank is 110. The
management makes a statement to the press that they have managed to find a new
CEO who is expected to steer the company to greater heights. They are confident
in his capabilities and are sure that the new CEO will deliver much more than
what is expected of him. Two questions –
1. How will the stock price of Yes Bank react to this news?
2. If you were to place a trade on Yes Bank, what would it be? Would it be a buy or a
sell?
The answer to the first question is quite simple: the stock price will move up.
Yes Bank had a leadership issue, and the company has fixed it. When positive
announcements are made, market participants tend to buy the stock at any given
price, and this cascades into a stock price rally.

Let me illustrate this further:

S.no  Time   Last traded  Price the      What the     New last
             price        seller wants   buyer does   traded price
1     10:00  110          112            He buys      112
2     10:01  112          116            He buys      116
3     10:03  116          121            He buys      121
4     10:05  121          126            He buys      126

Notice, whatever price the seller wants the buyer is willing to pay for it. This
buyer-seller reaction tends to push the share price higher.

So as you can see, the stock price jumped 16 Rupees in a matter of 5 minutes.
Though this is a fictional situation, it is a very realistic and typical behavior of
stocks. The stock price tends to go up when the news is good or expected to be
good.
In this particular case, the stock moves up because of two reasons. One, the
leadership issue has been fixed, and two, there is also an expectation that the new
CEO will steer the company to greater heights.
The answer to the second question is now quite simple; you buy Yes Bank stocks
considering the fact that there is good news surrounding the stock.
How does the stock get traded?
You have decided to buy 200 shares of Yes Bank at 120, and hold on to it for 1
year. How does it actually work? What is the exact process to buy it? What
happens after you buy it? Luckily there are systems in place which are fairly well
integrated. With your decision to buy Yes Bank, you need to log in to your trading
account (provided by your stock broker) and place an order to buy Yes Bank. Once
you place an order, an order ticket gets generated containing the following details:
1. Details of your trading account through which you intend to buy Yes Bank
shares – therefore your identity is revealed.
2. The price at which you intend to buy Yes Bank
3. The number of shares you intend to buy
Before your broker transmits this order to the exchange he needs to ensure you
have sufficient money to buy these shares. If yes, then this order ticket hits the
stock exchange. Once the order hits the market the stock exchange (through their
order matching algorithm) tries to find a seller who is willing to sell you 200 shares
of Yes Bank at 120.
Now the seller could be 1 person willing to sell the entire 200 shares at 120, or it
could be 10 people selling 20 shares each, or it could be 2 people selling 1 and 199
shares respectively. The permutation and combination does not really matter. From
your perspective, all you need is 200 shares of Yes Bank at 120 and you have
placed an order for the same. The stock exchange ensures the shares are available
to you as long as there are sellers in the market. Once the trade is executed, the
shares will be electronically credited to your DEMAT account. Likewise, the shares
will be electronically debited from the seller’s DEMAT account.
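
The idea of one buy order being filled by several sellers can be sketched as follows. This is a heavily simplified illustration, not the exchange's actual order matching algorithm: the function `match_buy_order`, the seller names and the tuple layout are all hypothetical, and real exchanges handle many more details (order types, time priority, partial fills over time and so on).

```python
def match_buy_order(quantity, limit_price, sell_orders):
    """Fill a buy order from (seller, price, shares) offers at or below the limit price.

    Returns the list of fills and the quantity that could not be filled.
    """
    fills = []
    remaining = quantity
    # Cheapest offers first; sorted() is stable, so equal prices keep arrival order.
    for seller, price, shares in sorted(sell_orders, key=lambda order: order[1]):
        if remaining == 0 or price > limit_price:
            break
        traded = min(remaining, shares)
        fills.append((seller, price, traded))
        remaining -= traded
    return fills, remaining


# Two sellers offering 1 and 199 shares at 120, and a third asking for more.
sell_orders = [("S1", 120, 1), ("S2", 120, 199), ("S3", 121, 50)]
fills, unfilled = match_buy_order(200, 120, sell_orders)
print(fills, unfilled)
```

From the buyer's side the result is the same whichever combination of sellers supplies the 200 shares; if there are not enough sellers at the requested price, part of the order simply stays unfilled.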
What happens after you own a stock?
After you buy the shares, the shares will now reside in your DEMAT account. You
are now a part owner of the company, to the extent of your shareholding. To give
you a perspective, if you own 200 shares of Yes Bank then you own 0.00068% of
Yes Bank. By virtue of owning the shares you are entitled to a few corporate
benefits like dividends, stock splits, bonus, rights issue, voting rights etc. We will
explore all these shareholder privileges at a later stage.

2.3 Equities and Commodities


From an investment point of view, we always want maximum returns on our
investments, with minimum risk.
One of the most common ways is to open a Fixed Deposit account in a bank.
Although the risk is minimal, so are the returns: close to 6% per annum.
This asset class, which consists of Fixed Deposits and other instruments, is known
as Debt or Fixed Income instruments.
A different way to invest would be to invest in other asset classes. We will
discuss only the 2 most common asset classes, which we come across on a
daily basis -
1. Equities (Stocks and Shares)
Shares and their derivatives are collectively known as equity. Investment in stocks
involves buying shares of publicly listed companies. The shares are traded both on
the Bombay Stock Exchange (BSE), and the National Stock Exchange (NSE).
When an investor invests in equity, unlike a fixed deposit, there is no capital
guarantee. However, as a trade-off, the returns from equity investment can be
extremely attractive. Indian Equities have generated returns close to 14% – 15%
CAGR (compound annual growth rate) over the past 15 years. Investing in some of
the best and well-run Indian companies has yielded over 20% CAGR in the
long term. Identifying such investment opportunities requires skill, hard work and
patience, and of late, some really good Data Science skills which we are going to
use in this internship!
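
To get a feel for what these CAGR figures mean, here is a minimal Python sketch. The helper names `cagr` and `future_value` and the starting amount of Rs 1,00,000 are assumptions made for illustration; the 6% and 14% rates are the approximate figures quoted above.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def future_value(start_value, annual_rate, years):
    """Value of an investment growing at a constant annual rate."""
    return start_value * (1 + annual_rate) ** years


investment = 100_000  # hypothetical starting amount in Rs
fixed_deposit = future_value(investment, 0.06, 15)  # about 6% per annum
equity = future_value(investment, 0.14, 15)         # about 14% CAGR

print(round(fixed_deposit), round(equity))
print(cagr(investment, equity, 15))  # recovers the 14% rate
```

Over 15 years the fixed deposit grows to roughly 2.4 times the starting amount, while the same amount growing at 14% CAGR ends up at roughly 7.1 times, which is why a few percentage points of CAGR matter so much over long periods.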
2. Commodities
Investments in gold and silver are considered among the most popular investment
avenues. Gold and silver have appreciated in value over the long term.
Investments in these metals have yielded an annual return of approximately 8%
over the last 20 years! Crude oil is another commodity whose price keeps varying
with time, and if one is able to analyse the trend correctly, one can make a fortune
in this product.
Eg- If an investor had correctly analysed the trend of oil price movement
between October 2018 and December 2018, he/she would have made 36.7% profit
in just 2 months!

2.4 Market Index


It is very likely that you have come across the words Sensex & Nifty at some point
of time. You may even know the levels where they are trading right now:
Sensex around 39,000 and Nifty at 11,800. But what exactly are Nifty and Sensex?

Nifty50 (or Nifty) and Sensex are Market Indices of the NSE and BSE
respectively. Now another question arises - What is a market index?
Consider the following situation -
If I were to ask you how the stock market is moving today, how would you answer
my question? There are approximately 5,000 listed companies on the Bombay Stock
Exchange and about 2,000 listed companies on the National Stock Exchange. It
would be clumsy to check each and every company, figure out if they are up or
down for the day and then give a detailed answer.

Instead you would just check a few important companies across key industrial
sectors. If the majority of these companies are moving up, you would say markets
are up; if the majority is down, you would say markets are down; and if there is a
mixed trend, you would say markets are sideways!

So essentially, you identify a few companies to represent the broader markets. Every
time someone asks you how the markets are doing, you would just check the
general trend of these selected stocks and then give an answer. These companies
that you have identified collectively make up the stock market index!

Now that we have that covered, we can now understand what Nifty and Sensex
really are.
● Nifty is the market index which represents the top 50 companies listed on
the NSE
● Sensex is the market index which represents the top 30 companies listed on
the BSE

The obvious question arises –
How do we assign weights to the stocks that make up the Index?
There are many ways to assign weights, but the Indian stock exchanges follow a
method based on market capitalization. The weights are assigned based on the market
capitalization of the company: the larger the market capitalization, the higher the
weight. Market capitalization is the product of the total number of shares outstanding
in the market and the price of the stock. For example, if company ABC has a total of
100 shares outstanding in the market, and the stock price is 50, then the free float
market cap of ABC is 100*50 = Rs.5,000. So, Nifty is a collection of the top 50
companies on the NSE by market cap, and Sensex is a collection of the top 30
companies on the BSE by market cap.
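
The weighting idea can be illustrated with a small Python sketch. Only company ABC (100 shares at a price of 50) comes from the example above; the companies DEF and GHI, their numbers, and the function name `index_weights` are made up for illustration. Actual index methodologies involve further rules that are not covered here.

```python
def index_weights(companies):
    """Weight of each company = its market cap / total market cap of the basket.

    `companies` maps a name to a (shares outstanding, share price) pair.
    """
    caps = {name: shares * price for name, (shares, price) in companies.items()}
    total = sum(caps.values())
    return {name: cap / total for name, cap in caps.items()}


basket = {
    "ABC": (100, 50),   # market cap 5,000 (from the example in the text)
    "DEF": (200, 50),   # hypothetical, market cap 10,000
    "GHI": (50, 100),   # hypothetical, market cap 5,000
}
weights = index_weights(basket)
print(weights)
```

The company with the largest market cap (DEF) gets the largest weight, so a move in its price affects the index more than the same percentage move in ABC or GHI.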

Chapter 3
About Internship
3.1 Introduction
Investment Bankers. CAs. Hedge Fund / Portfolio Managers. Forex traders.
Commodities Analysts. These have historically been considered to be among the
most coveted professions of all time. Yet, if one fails to keep up with the demands
of the day, one would find one's skills to be obsolete in this era of data analysis.
Data Science has inarguably been the hottest domain of the decade, asserting its
need in every single sphere of corporate life. It was not long ago that we
discovered the massive potential of incorporating ML/AI in the financial world.
Now, the very idea of the two being disjointed sounds strange.
Data Science has been instrumental in providing powerful insights (which people
didn't even know existed) and has helped massively increase efficiency, helping
everyone from a scalp trader to a long-term debt investor. Accurate predictions,
unbiased analysis, and powerful tools that run through millions of rows of data in the
blink of an eye have transformed the industry in ways we could never have imagined.
This internship was designed both to test our knowledge and to give us the feel and
experience of a real-world data science problem in finance.

I was allotted HDFC Bank.

In the coming modules, we will be doing various tasks to analyze and make
predictions on the allotted company's stock. You may need to learn about the
underlying markets as well to complete the internship.

3.2 Module 1
Introduction to the problem

In Module 1, we are going to get familiar with pandas, the python module which is
used to process and analyse data. Processing could include removing unknown
values from the data or replacing unknown values with values which make sense,
maybe 0. Analysing the data could include finding out the trend of a stock price,
e.g. how the stock price changes with respect to the Nifty 50 basket of stocks.
Problem Statements

● 1.1 Import the csv file of the stock you have been allotted using
'pd.read_csv()' function into a dataframe.
Shares of a company can be offered in more than one category. The category
of a stock is indicated in the ‘Series’ column. If the csv file has data on more
than one category, the ‘Date’ column will have repeating values. To avoid
repetitions in the date, remove all the rows where 'Series' column is NOT
'EQ'.
Analyze and understand each column properly.
You'd find the head(), tail() and describe() functions to be immensely useful
for exploration. You're free to carry out any other exploration of your own.
● 1.2 Calculate the maximum, minimum and mean price for the last 90 days.
(price=Closing Price unless stated otherwise)
● 1.3 Analyse the data types for each column of the dataframe. Pandas knows
how to deal with dates in an intelligent manner. But to make use of Pandas
functionality for dates, you need to ensure that the column is of type
'datetime64[ns]'. Change the date column from 'object' type to
'datetime64[ns]' for future convenience. See what happens if you subtract the
minimum value of the date column from the maximum value.
● 1.4 In a separate array, calculate the monthwise VWAP (Volume Weighted
Average Price) of the stock.
( VWAP = sum(price*volume)/sum(volume) )
To know more about VWAP, visit - VWAP definition
{Hint : Create a new dataframe column ‘Month’. The values for this column
can be derived from the ‘Date’ column by using appropriate pandas
functions. Similarly, create a column ‘Year’ and initialize it. Then use the
'groupby()' function by month and year. Finally, calculate the vwap value for
each month (i.e. for each group created).}
● 1.5 Write a function to calculate the average price over the last N days of the
stock price data, where N is a user-defined parameter. Write a second
function to calculate the profit/loss percentage over the last N days.
Calculate the average price AND the profit/loss percentages over the course
of the last -
1 week, 2 weeks, 1 month, 3 months, 6 months and 1 year.
{Note : Profit/Loss percentage between N days is the percentage change
between the closing prices of the 2 days }
● 1.6 Add a column 'Day_Perc_Change' where the values are the daily change
in percentages, i.e. the percentage change between 2 consecutive days'
closing prices. Instead of using the basic mathematical formula for
computing the same, use the 'pct_change()' function provided by Pandas for
dataframes. You will note that the first entry of the column will have a ‘NaN’
value. Why does this happen? Either remove the first row, or set the entry to
0 before proceeding.
● 1.7 Add another column 'Trend' whose values are:
○ 'Slight or No change' for 'Day_Perc_Change' in between -0.5 and
0.5
○ 'Slight positive' for 'Day_Perc_Change' in between 0.5 and 1
○ 'Slight negative' for 'Day_Perc_Change' in between -0.5 and -1
○ 'Positive' for 'Day_Perc_Change' in between 1 and 3
○ 'Negative' for 'Day_Perc_Change' in between -1 and -3
○ 'Among top gainers' for 'Day_Perc_Change' in between 3 and 7
○ 'Among top losers' for 'Day_Perc_Change' in between -3 and -7
○ 'Bull run' for 'Day_Perc_Change' >7
○ 'Bear drop' for 'Day_Perc_Change' <-7

● 1.8 Find the average and median values of the column 'Total Traded
Quantity' for each of the types of 'Trend'.
{Hint : use 'groupby()' on the 'Trend' column and then calculate the average
and median values of the column 'Total Traded Quantity'}
● 1.9 SAVE the dataframe with the additional columns computed as a csv file
week2.csv.

In Module 2, you are going to get familiar with matplotlib, the
python module which is used to visualize data.
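
Some of the Module 1 steps (1.1, 1.3, 1.4, 1.6 and 1.7) can be sketched with pandas as follows. This is a minimal sketch and not the exact solution submitted: instead of reading the allotted csv with pd.read_csv(), it builds a tiny made-up dataframe so that it runs on its own. The column name 'Close Price', the date format and the handling of the exact boundary values in the trend() helper are assumptions; adjust them to the actual file.

```python
import pandas as pd

PRICE_COL = "Close Price"            # assumed column name; adjust to your csv
VOLUME_COL = "Total Traded Quantity"

# Tiny made-up data set standing in for pd.read_csv("<allotted stock>.csv")
df = pd.DataFrame({
    "Date": ["01-Jan-2019", "01-Jan-2019", "02-Jan-2019",
             "03-Jan-2019", "01-Feb-2019", "04-Feb-2019"],
    "Series": ["EQ", "BL", "EQ", "EQ", "EQ", "EQ"],
    PRICE_COL: [100.0, 99.0, 102.0, 101.0, 110.0, 99.0],
    VOLUME_COL: [1000, 10, 3000, 2000, 1000, 4000],
})

# 1.1 Keep only the 'EQ' series so that every date appears once.
df = df[df["Series"] == "EQ"].reset_index(drop=True)

# 1.3 Convert 'Date' from object to datetime64[ns]; max - min gives a Timedelta.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y")
span = df["Date"].max() - df["Date"].min()

# 1.4 Month-wise VWAP = sum(price * volume) / sum(volume) per (year, month).
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Turnover"] = df[PRICE_COL] * df[VOLUME_COL]
grouped = df.groupby(["Year", "Month"])
vwap = grouped["Turnover"].sum() / grouped[VOLUME_COL].sum()

# 1.6 Daily percentage change; the first row has no previous day, so it is NaN.
df["Day_Perc_Change"] = (df[PRICE_COL].pct_change() * 100).fillna(0)


# 1.7 Label each day; how exact boundary values are assigned is our own choice.
def trend(change):
    if -0.5 <= change <= 0.5:
        return "Slight or No change"
    if 0.5 < change <= 1:
        return "Slight positive"
    if -1 <= change < -0.5:
        return "Slight negative"
    if 1 < change <= 3:
        return "Positive"
    if -3 <= change < -1:
        return "Negative"
    if 3 < change <= 7:
        return "Among top gainers"
    if -7 <= change < -3:
        return "Among top losers"
    return "Bull run" if change > 7 else "Bear drop"


df["Trend"] = df["Day_Perc_Change"].apply(trend)
print(span, vwap, df[["Date", "Day_Perc_Change", "Trend"]], sep="\n")
```

Subtracting the minimum date from the maximum date gives the time span covered by the data, which only works once the column is a proper datetime type. The 'NaN' in the first row of 'Day_Perc_Change' appears because the first day has no previous closing price to compare with.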

3.3 Module 2

Data visualization and Technical Analysis

'A picture speaks a thousand words' has never been truer than in financial markets.
Absolutely no one goes through millions of rows of numbers; we always prefer
the data in a plotted form to draw better inferences. This module covers
plotting, basic technical indicators and our own customisation, and making our
own trade calls!
Problem Statements

● 2.1 Load the week2.csv file into a dataframe. What is the type of the Date
column? Make sure it is of type datetime64. Convert the Date column to the
index of the dataframe.
Plot the closing price of each of the days for the entire time frame to get an
idea of what the general outlook of the stock is.
○ Look out for drastic changes in this stock. You have the exact date
when these took place, so try to fetch the news about this stock for
that day.
○ This would be helpful if we are to train our model to take NLP
inputs.


● 2.2 A stem plot is a discrete series plot, ideal for plotting daywise data. It
can be plotted using the plt.stem() function.

Display a stem plot of the daily change of the stock price in percentage.
This column was calculated in module 1 and should already be available in
week2.csv. Observe whenever there's a large change.
● 2.3 Plot the daily volumes as well and compare the percentage stem plot to
it. Document your analysis of the relationship between volume and daily
percentage change.

● 2.4 We had created a Trend column in module 1. We want to see how often
each Trend type occurs. This can be seen as a pie chart, with each sector
representing the percentage of days each trend occurs. Plot a pie chart for all
the 'Trend' values to know the relative frequency of each trend. You can use the
groupby function with the trend column to group all days with the same
trend into a single group before plotting the pie chart. From the grouped
data, create a BAR plot of average & median values of the 'Total Traded
Quantity' by Trend type.

● 2.5 Plot the daily return (percentage) distribution as a histogram.
Histogram analysis is one of the most fundamental methods of exploratory
data analysis. In this case, it'd return a frequency plot of various values of
percentage changes.
● 2.6 We next want to analyse how the behaviour of different stocks is
correlated. The correlation is performed on the percentage change of the
stock price instead of the stock price.

Load any 5 stocks of your choice into 5 dataframes. Retain only rows for
which ‘Series’ column has value ‘EQ’. Create a single dataframe which
contains the ‘Closing Price’ of each stock. This dataframe should hence have
five columns. Rename each column to the name of the stock that is

22
contained in the column. Create a new dataframe which is a percentage
change of the values in the previous dataframe. Drop Nan’s from this
dataframe.
Using seaborn, analyse the correlation between the percentage changes in
the five stocks. This is extremely useful for a fund manager to design a
diversified portfolio. To know more, check out these resources on
correlation and diversification.
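
A minimal sketch of the percentage-change and correlation steps in 2.6 is given below. The tickers and closing prices are invented for illustration; in the assignment they would come from the five filtered dataframes.

```python
import pandas as pd

# Hypothetical closing prices for five stocks; tickers and values are
# illustrative and not taken from the report's data.
close = pd.DataFrame({
    "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
    "BBB": [50.0, 51.0, 50.4, 52.5, 53.4],
    "CCC": [200.0, 198.0, 199.0, 195.0, 193.0],
    "DDD": [10.0, 10.1, 10.0, 10.3, 10.2],
    "EEE": [75.0, 74.0, 76.0, 75.0, 77.0],
})

# Daily percentage change; the first row is NaN and is dropped.
returns = (close.pct_change() * 100).dropna()

# Pairwise Pearson correlation of the daily percentage changes.
corr = returns.corr()
print(corr.round(2))

# With seaborn installed, the matrix can be drawn with
#   seaborn.heatmap(corr, annot=True)
```

Stocks whose percentage changes are weakly or negatively correlated are the natural candidates for a diversified portfolio.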

● 2.7 Volatility is a measure of how much the returns of a stock vary over a
specific period of time. Do give the following documentation on volatility a
read.

You have already calculated the percentage changes in several stock prices.
Calculate the 7 day rolling average of the percentage change of any of the
stock prices, then compute the standard deviation (which is the square root
of the variance) and plot the values.
Note: pandas provides a rolling() function for dataframes and a std()
function also which you can use.
● 2.8 Calculate the volatility for the Nifty index and compare the two. This
leads us to a useful indicator known as 'Beta' (we'll be covering this at
length in Module 3).
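
The rolling volatility of 2.7 and its comparison with the index in 2.8 can be sketched as below. This is one reading of the task: the sketch takes the 7 day rolling standard deviation of the daily percentage change directly. The price series are synthetic, and the helper name rolling_volatility is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices; in the assignment these would come from
# the stock's CSV and the Nifty index CSV.
rng = np.random.default_rng(0)
days = pd.date_range("2021-01-01", periods=60, freq="B")
stock = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.02, 60)), index=days)
index = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 60)), index=days)

def rolling_volatility(close, window=7):
    """Rolling standard deviation of the daily percentage change."""
    pct = close.pct_change() * 100
    return pct.rolling(window).std()

# Side-by-side volatility of the stock and the index, warm-up rows dropped.
vol = pd.DataFrame({
    "stock": rolling_volatility(stock),
    "index": rolling_volatility(index),
}).dropna()

print(vol.tail())
```

Plotting the two columns of vol on the same axes shows at a glance whether the stock is consistently more volatile than the index.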

● 2.9 Trade Calls - Using Simple Moving Averages. Study about moving
averages here.

Plot the 21 day and 34 day moving averages along with the average price and
decide on a call.
The call should be buy whenever the shorter moving average (21) crosses over
the longer moving average (34), and the call should be sell whenever the
shorter moving average crosses under the longer moving average.
This crossover is one of the most widely used technical indicators.
● 2.10 Trade Calls - Using Bollinger Bands
Plot the Bollinger bands for this stock, using a duration of 14 days and 2
standard deviations away from the average.
The Bollinger bands comprise the following data points:

○ The 14 day rolling mean of the closing price (we call it the average)
○ Upper band, which is the rolling mean + 2 standard deviations away
from the average.
○ Lower band, which is the rolling mean - 2 standard deviations away
from the average.
○ Average daily stock price.
● Bollinger bands are widely regarded as reliable: if prices were normally
distributed, about 95% of them would lie within 2 standard deviations of the
average. The bands are especially useful in a sideways-moving market.
Observe the bands yourself, and analyse the accuracy of all the trade signals
provided by the Bollinger bands.
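
The two trade-call rules of 2.9 and 2.10 can be sketched together, since both are built from pandas rolling windows. This is a minimal sketch under stated assumptions: the function names sma_crossover_calls and bollinger_bands are hypothetical, and the V-shaped price path is synthetic, chosen so that exactly one crossover occurs.

```python
import numpy as np
import pandas as pd

def sma_crossover_calls(price, short=21, long=34):
    """Return 'Buy'/'Sell' only on the days the short SMA crosses the long SMA."""
    fast = price.rolling(short).mean()
    slow = price.rolling(long).mean()
    # 1 while the short SMA is above the long SMA, 0 otherwise, NaN during warm-up.
    state = (fast > slow).astype(float).where(slow.notna())
    cross = state.diff()  # +1 = crossed over, -1 = crossed under
    return cross.map({1.0: "Buy", -1.0: "Sell"}).dropna()

def bollinger_bands(close, window=14, n_std=2.0):
    """Rolling mean with bands n_std rolling standard deviations away."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std()
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + n_std * std,
        "lower": middle - n_std * std,
        "close": close,
    })

# Synthetic price path: falls for 60 days, then rises for 60 days.
price = pd.Series(np.concatenate([np.linspace(200, 141, 60),
                                  np.linspace(142, 201, 60)]))

calls = sma_crossover_calls(price)
bands = bollinger_bands(price)
print(calls)
print(bands.dropna().head())
```

On real data the calls series will contain several alternating entries, and the bands dataframe can be plotted directly to inspect where the price touches or leaves the bands.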

3.4 Module 3

Fundamental analysis using Regression

This module introduces the regression-related inferences that can be drawn
from the data.

Regression is basically a statistical approach to find the relationship between
variables. In machine learning, this is used to predict the outcome of an event
based on the relationship between variables obtained from the data-set. More often
than not, we utilize linear regression to come up with an ideal inference. We'd be
using the regression model to solve the following problems:

Problem Statements

● 3.1 Import the file 'gold.csv' (you will find this in the intro section to
download or in '/Data/gold.csv' if you are using the jupyter notebook), which
contains the last 2 years of price action of the Indian (MCX) gold
standard. Explore the dataframe. You'd see 2 unique columns - 'Pred' and
'new'. One of the 2 columns is a linear combination of the OHLC prices with
varying coefficients while the other is a polynomial function of the same
inputs. Also, one of the 2 columns is partially filled.
○ Using linear regression, find the coefficients of the inputs and, using
the same trained model, complete the entire column.
○ Also, try to fit the other column using a new linear
regression model. Check if the predictions are accurate. Mention
which column is a linear function and which is polynomial.
(Hint: Plotting a histogram & distplot helps in recognizing the
discrepancies in prediction, if any.)
● CAPM Analysis and Beta Calculation using regression -
CAPM (Capital Asset Pricing Model) attempts to price securities by
examining the relationship that exists between expected returns and risk.
Read more about CAPM. (Investopedia CAPM reference)
The Beta of an asset is a measure of the sensitivity of its returns relative to a
market benchmark (usually a market index). How sensitive or insensitive are the
returns of an asset to the overall market returns (usually a market index like
the S&P 500 index)? When the market jumps, do the returns of the asset jump
accordingly, or do they move in some other way?
Read more about Beta. (Investopedia Beta reference)
● 3.2 Import the stock of your choosing AND the Nifty index.
Using linear regression (OLS), calculate -
○ The daily Beta value for the past 3 months. (Daily = Daily returns)
○ The monthly Beta value. (Monthly = Monthly returns)
● Refrain from using the (covariance(x,y)/variance(x)) formula.
Attempt the question using regression. (Regression Reference)
Were the Beta values more or less than 1? What if it was negative?
Discuss. Include a brief writeup at the bottom of your jupyter notebook with
your inferences from the Beta values and regression results.
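
Both problems in this module reduce to an ordinary least squares fit, which can be sketched with NumPy alone. This is a hedged illustration rather than the report's solution: the helper ols_fit is hypothetical, the OHLC values, coefficients and returns are synthetic, and the actual assignment would read them from gold.csv and the stock and Nifty files.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit of y = intercept + X @ coef; returns (intercept, coef)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    params, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return params[0], params[1:]

rng = np.random.default_rng(42)

# Use 1 (problem 3.1): complete a partially filled column.
ohlc = rng.uniform(90, 110, size=(50, 4))    # synthetic Open, High, Low, Close
true_coef = np.array([2.0, 0.5, -1.0, 3.0])  # made-up coefficients
target = 1.0 + ohlc @ true_coef
target[40:] = np.nan                         # the last 10 rows are "missing"

known = ~np.isnan(target)
intercept, coef = ols_fit(ohlc[known], target[known])
target[~known] = intercept + ohlc[~known] @ coef  # fill the gaps

# Use 2 (problem 3.2): Beta is the slope of stock returns on index returns.
market = rng.normal(0.0, 0.01, 60)
stock = 0.0005 + 1.3 * market + rng.normal(0.0, 0.001, 60)
alpha, (beta,) = ols_fit(market, stock)

print(coef.round(3), round(beta, 2))
```

The same ols_fit call gives the monthly Beta when it is fed monthly instead of daily returns. A column that is a polynomial function of the inputs would leave visibly structured residuals under this linear fit.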

3.5 Module 4

Trade Call Prediction using Classification

In this module, we'd be covering the concept of classification and utilizing
our skills to solve the following queries (Stock Price = Close Price):

Problem Statements

● 4.1 Import the csv file of the stock which contained the Bollinger columns as
well.
○ Create a new column 'Call', whose entries are -
'Buy' if the stock price is below the lower Bollinger band
'Hold Buy/ Liquidate Short' if the stock price is between the lower
and middle Bollinger band
'Hold Short/ Liquidate Buy' if the stock price is between the middle
and upper Bollinger band
'Short' if the stock price is above the upper Bollinger band
○ Now train a classification model with the 3 Bollinger columns and
the stock price as inputs and 'Call' as output. Check the accuracy
on a test set. (There are many classifier models to choose from; try
each one out and compare the accuracy for each.)
○ Import another stock's data and create the Bollinger columns. Using
the already defined model, predict the daily calls for this new stock.
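
The labelling rule for the 'Call' column in 4.1 can be sketched as follows. The column names 'Close', 'Lower', 'Middle' and 'Upper' and the helper bollinger_call are assumptions of this sketch, as is the handling of a price that sits exactly on a band (the source does not specify it). Training the classifier itself is not shown here.

```python
import numpy as np
import pandas as pd

def bollinger_call(df):
    """Label each row from the position of the close relative to the bands."""
    conditions = [
        df["Close"] < df["Lower"],
        df["Close"] < df["Middle"],
        df["Close"] < df["Upper"],
    ]
    labels = ["Buy", "Hold Buy/ Liquidate Short", "Hold Short/ Liquidate Buy"]
    # np.select picks the label of the first condition that holds;
    # anything at or above the upper band falls through to 'Short'.
    calls = pd.Series(np.select(conditions, labels, default="Short"), index=df.index)
    # Rows without bands (rolling warm-up) get no call at all.
    has_bands = df[["Lower", "Middle", "Upper"]].notna().all(axis=1)
    return calls.where(has_bands)

# Hypothetical rows, one per band region.
df = pd.DataFrame({
    "Close": [95.0, 99.0, 101.0, 106.0],
    "Lower": [96.0, 96.0, 96.0, 96.0],
    "Middle": [100.0, 100.0, 100.0, 100.0],
    "Upper": [104.0, 104.0, 104.0, 104.0],
})
df["Call"] = bollinger_call(df)
print(df)
```

The four input columns and the resulting 'Call' column are then ready to be split into training and test sets for whichever classifier is being compared.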

● 4.2 Now, we'll again utilize classification to make a trade call, and measure
the efficiency of our trading algorithm over the past two years. For this
assignment, we will use the RandomForest classifier.
○ Import the stock data file of your choice
○ Define 4 new columns, whose values are:
% change between Open and Close price for the day
% change between Low and High price for the day
5 day rolling mean of the day to day % change in Close Price
5 day rolling std of the day to day % change in Close Price
○ Create a new column 'Action' whose values are:
1 if next day's price (Close) is greater than present day's.
(-1) if next day's price (Close) is less than present day's.
i.e. Action[i] = 1 if Close[i+1] > Close[i]
i.e. Action[i] = (-1) if Close[i+1] < Close[i]
○ Construct a classification model with the 4 new inputs and 'Action'
as target
○ Check the accuracy of this model. Also, plot the net cumulative
returns (in %) if we were to follow this algorithmic model
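
The feature and target construction of 4.2, together with the cumulative return calculation, can be sketched as below. The helper names, the feature column names and the OHLC values are assumptions of this sketch. The classifier itself is left out, so the example feeds the true 'Action' back in as the signal purely to exercise the return calculation. In the assignment the signal would be the model's prediction on unseen data.

```python
import numpy as np
import pandas as pd

def build_features(df):
    """Four input columns plus the 'Action' target described in 4.2."""
    out = pd.DataFrame(index=df.index)
    out["oc_pct"] = (df["Close"] - df["Open"]) / df["Open"] * 100
    out["lh_pct"] = (df["High"] - df["Low"]) / df["Low"] * 100
    daily = df["Close"].pct_change() * 100
    out["roll_mean_5"] = daily.rolling(5).mean()
    out["roll_std_5"] = daily.rolling(5).std()
    # +1 if tomorrow's close is higher, -1 if lower (0 if unchanged,
    # a case the problem statement leaves undefined).
    out["Action"] = np.sign(df["Close"].shift(-1) - df["Close"])
    return out.dropna()  # drops the warm-up rows and the last day

def cumulative_return_pct(close, signal):
    """Net cumulative % return from holding `signal` (+1/-1) over each next day."""
    next_ret = close.pct_change().shift(-1)
    strategy = (signal * next_ret).dropna()
    return ((1 + strategy).cumprod() - 1) * 100

# Hypothetical OHLC data.
close = pd.Series([100.0, 102, 101, 103, 104, 102, 105, 107, 106, 108, 110, 109])
df = pd.DataFrame({"Open": close - 1, "High": close + 2,
                   "Low": close - 2, "Close": close})

features = build_features(df)
curve = cumulative_return_pct(df["Close"], features["Action"])
print(features)
print(curve.round(2))
```

Because 'Action' looks one day ahead, the last row never has a target, and any train/test split should respect time order so that the model is not evaluated on days it has effectively seen.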

3.6 Module 5

Modern Portfolio Theory

In this module, we’ll be looking at investment portfolio optimization with Python,
the fundamental concept of diversification and the creation of an efficient frontier
that can be used by investors to choose specific mixes of assets based on
investment goals; that is, the trade off between their desired level of portfolio
return vs their desired level of portfolio risk.

Modern Portfolio Theory suggests that it is possible to construct an "efficient
frontier" of optimal portfolios, offering the maximum possible expected return for
a given level of risk. It suggests that it is not enough to look at the expected risk
and return of one particular stock. By investing in more than one stock, an investor
can reap the benefits of diversification, particularly a reduction in the riskiness of
the portfolio. MPT quantifies the benefits of diversification, also known as not
putting all of your eggs in one basket.

Problem Statements

● 5.1 For your chosen stock, calculate the mean daily return and daily standard
deviation of returns, and then annualise them to get the mean expected
annual return and volatility of that single stock (annual mean = daily mean
* 252, annual stdev = daily stdev * sqrt(252)).
● 5.2 Now, we need to diversify our portfolio. Build your own portfolio by
choosing any 5 stocks, preferably of different sectors and different caps.
Assume that all 5 have the same weightage, i.e. 20%. Now calculate the
annual returns and volatility of the entire portfolio (Hint: Don't forget to
use the covariance).
● 5.3 Prepare a scatter plot for differing weights of the individual stocks in the
portfolio, the axes being the returns and volatility. Colour the data points
based on the Sharpe Ratio (Returns/Volatility) of that particular portfolio.

● 5.4 Mark the 2 portfolios where -
Portfolio 1 - The Sharpe ratio is the highest
Portfolio 2 - The volatility is the lowest.
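
The calculations behind 5.1 to 5.4 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the daily returns are synthetic, the helper portfolio_stats is hypothetical, the random weightings are drawn from a Dirichlet distribution (one convenient way to get weights that sum to 1), and the Sharpe ratio follows the definition used above (Returns/Volatility).

```python
import numpy as np

TRADING_DAYS = 252

def portfolio_stats(weights, mean_daily, cov_daily):
    """Annualised return and volatility of a weighted portfolio."""
    ann_return = weights @ mean_daily * TRADING_DAYS
    # The covariance matrix captures how the stocks move together.
    ann_vol = np.sqrt(weights @ cov_daily @ weights * TRADING_DAYS)
    return ann_return, ann_vol

rng = np.random.default_rng(1)
# Synthetic daily returns for 5 stocks over one year (illustrative only).
daily = rng.normal(0.0005, 0.01, size=(252, 5))
mean_daily = daily.mean(axis=0)
cov_daily = np.cov(daily, rowvar=False)

# 5.2: equal weightage of 20% each.
equal = np.full(5, 0.2)
eq_return, eq_vol = portfolio_stats(equal, mean_daily, cov_daily)

# 5.3: many random weightings; each row sums to 1.
weights = rng.dirichlet(np.ones(5), size=2000)
stats = np.array([portfolio_stats(w, mean_daily, cov_daily) for w in weights])
returns, vols = stats[:, 0], stats[:, 1]
sharpe = returns / vols

# 5.4: the two portfolios to mark on the scatter plot.
best_sharpe_weights = weights[sharpe.argmax()]
min_vol_weights = weights[vols.argmin()]

print(round(eq_return, 4), round(eq_vol, 4))
print(best_sharpe_weights.round(2), min_vol_weights.round(2))
```

Passing a weight vector with a single 1 reproduces the single-stock formulas of 5.1. The scatter plot is then vols against returns, coloured by sharpe.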

3.7 Module 6

Clustering for Diversification analysis

Clustering is a method of unsupervised learning and is a common technique for
statistical data analysis used in many fields.

Clustering is a Machine Learning technique that involves the grouping of data
points. Given a set of data points, we can use a clustering algorithm to classify
each data point into a specific group. In theory, data points that are in the same
group should have similar properties and/or features, while data points in different
groups should have highly dissimilar properties and/or features.

In financial markets, cluster analysis is a technique used to group sets of objects
that share similar characteristics. It is common in statistics, but investors will use
the approach to build a diversified portfolio. Stocks that exhibit high correlations in
returns fall into one basket, those slightly less correlated in another, and so on,
until each stock is placed into a category.

Problem Statements

● 6.1 Create a table/data frame with the closing prices of 30 different stocks,
with 10 from each of the caps
● 6.2 Calculate average annual percentage return and volatility of all 30 stocks
over a theoretical one year period
● 6.3 Cluster the 30 stocks according to their mean annual Volatilities and
Returns using K-means clustering. Identify the optimum number of clusters
using the Elbow curve method
● 6.4 Prepare a separate Data frame to show which stocks belong to the same
cluster
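
The clustering and elbow steps of 6.3 and 6.4 can be sketched as below. To stay self-contained, this sketch swaps in a small NumPy implementation of Lloyd's algorithm with a deterministic farthest-point initialisation in place of a library K-means (scikit-learn's KMeans, for instance, defaults to k-means++ initialisation). The 30 (return, volatility) points and the stock names are synthetic.

```python
import numpy as np
import pandas as pd

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated

    # Final assignment and within-cluster sum of squares (inertia).
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    inertia = float((dist.min(axis=1) ** 2).sum())
    return labels, centroids, inertia

# Synthetic (annual return %, annual volatility %) for 30 stocks in 3 groups.
rng = np.random.default_rng(7)
centres = np.array([[5.0, 10.0], [15.0, 20.0], [30.0, 40.0]])
points = np.vstack([c + rng.normal(0, 0.5, size=(10, 2)) for c in centres])
names = [f"STOCK{i:02d}" for i in range(30)]

# Elbow curve: inertia for k = 1..6; the bend marks the optimum k.
inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}

labels, _, _ = kmeans(points, 3)
clusters = pd.DataFrame({"stock": names, "cluster": labels})
print(inertias)
print(clusters.groupby("cluster")["stock"].apply(list))
```

Plotting inertias against k gives the elbow curve. Here the drop flattens after k = 3 because the synthetic data was built from three groups.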
