Internship Report
ON
‘Machine Learning’
Submitted by:
Nitesh Bisht
40710102816
ACKNOWLEDGEMENT
The internship opportunity I had with Career Launcher was a great chance for
learning and professional development. Therefore, I consider myself very fortunate
as I was provided with an opportunity to be a part of it. I am also grateful for
having a chance to work with many wonderful people and professionals who led
me through this internship period.
Table of contents
Chapter 1: Introduction to Machine Learning
Chapter 2: Basics of Financial Market
Chapter 3: About Internship
3.1 Introduction
Chapter 1
Introduction to Machine Learning
1.1 Introduction
The term Machine Learning was coined in 1959 by Arthur Samuel, an American
pioneer in the field of computer gaming and artificial intelligence, who stated that
“it gives computers the ability to learn without being explicitly programmed”.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational
definition: “A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E.”
Machine Learning is the latest buzzword floating around. It deserves the
attention, as it is one of the most interesting subfields of Computer Science.
So what does Machine Learning really mean?
Let’s try to understand Machine Learning in layman’s terms. Suppose you are trying
to toss a piece of paper into a dustbin.
After the first attempt, you realize that you put too much force into it. After the
second attempt, you realize you are closer to the target but need to increase your
throw angle. After every throw we learn something and improve the end result.
We are programmed to learn from our experience.
Mitchell’s definition describes the tasks with which machine learning is concerned
in fundamentally operational terms, rather than defining the field in cognitive
terms. This follows Alan Turing’s proposal in his paper “Computing Machinery
and Intelligence”, in which the question “Can machines think?” is replaced with
the question “Can machines do what we (as thinking entities) can do?”
Within the field of data analytics, machine learning is used to devise complex
models and algorithms that lend themselves to prediction; in commercial use, this
is known as predictive analytics. These analytical models allow researchers, data
scientists, engineers, and analysts to “produce reliable, repeatable decisions and
results” and uncover “hidden insights” through learning from historical
relationships and trends in the data set (input).
Suppose that you decide to check out an offer for a vacation. You browse
through the travel agency website and search for a hotel. When you look at a
specific hotel, just below the hotel description there is a section titled “You might
also like these hotels”. This is a common use case of Machine Learning called a
“Recommendation Engine”. Many data points were used to train a model to
predict which hotels are best to show you under that section, based
on the information the website already has about you.
So if you want your program to predict, for example, traffic patterns at a busy
intersection (task T), you can run it through a machine learning algorithm with data
about past traffic patterns (experience E) and, if it has successfully “learned”, it
will then do better at predicting future traffic patterns (performance measure P).
The highly complex nature of many real-world problems, though, often means that
inventing specialized algorithms that will solve them perfectly every time is
impractical, if not impossible. Examples of machine learning problems include “Is
this cancer?”, “Which of these people are good friends with each other?” and “Will
this person like this movie?”. Such problems are excellent targets for Machine
Learning, and in fact machine learning has been applied to such problems with great
success.
1.2 Classification of Machine Learning
Machine learning implementations are classified into four major categories,
depending on the nature of the learning “signal” or “response” available to a
learning system:
1. Supervised learning: An algorithm learns from example data
and associated target responses, which can consist of numeric values or
string labels such as classes or tags, in order to later predict the correct
response when posed with new examples. This approach is similar to
human learning under the supervision of a teacher. The teacher provides
good examples for the student to memorize, and the student then derives
general rules from these specific examples.
2. Unsupervised learning: An algorithm learns from plain
examples without any associated response, leaving it to the algorithm to
determine the data patterns on its own. This type of algorithm tends to
restructure the data into something else, such as new features that may
represent a class or a new series of uncorrelated values. Such algorithms
are quite useful in providing humans with insights into the meaning of
data and new useful inputs to supervised machine learning algorithms.
As a kind of learning, it resembles the methods humans use to figure
out that certain objects or events are from the same class, such as by
observing the degree of similarity between objects. Some
recommendation systems that you find on the web in the form of
marketing automation are based on this type of learning.
3. Reinforcement learning: You present the algorithm with
examples that lack labels, as in unsupervised learning. However, you
accompany each example with positive or negative feedback
according to the solution the algorithm proposes. Reinforcement
learning is connected to applications for which the algorithm must make
decisions (so the product is prescriptive, not just descriptive, as in
unsupervised learning), and the decisions bear consequences. In the
human world, it is just like learning by trial and error.
Errors help you learn because they have a penalty added (cost, loss of
time, regret, pain, and so on), teaching you that a certain course of
action is less likely to succeed than others. An interesting example of
reinforcement learning occurs when computers learn to play video
games by themselves.
In this case, an application presents the algorithm with examples of
specific situations, such as having the gamer stuck in a maze while
avoiding an enemy. The application lets the algorithm know the
outcome of the actions it takes, and learning occurs while it tries to avoid
what it discovers to be dangerous and to pursue survival. You can have
a look at how the company Google DeepMind has created a
reinforcement learning program that plays old Atari video games.
When watching the video, notice how the program is initially clumsy
and unskilled but steadily improves with training until it becomes a
champion.
4. Semi-supervised learning: An incomplete training signal is
given, that is, a training set with some (often many) of the target outputs
missing. A special case of this principle is known as transduction,
where the entire set of problem instances is known at learning time,
except that part of the targets are missing.
Machine Learning comes into the picture when problems cannot be solved by
means of typical approaches.
Chapter 2
Basics of Financial Market
2.1 Introduction and basic definitions
Public v/s Private Company - A public company is one which can offer its shares to
the general public on the stock exchange (share market).
Eg- Reliance, ICICI Bank, Yes Bank
A private company, by contrast, is one whose shares are held by a few big-money
investors.
Eg- Paytm, Dell
Shares - As the name suggests, a share is a ‘part’ of a company which you can buy
or sell, if the company is publicly listed, i.e. on a stock exchange.
Eg- Let’s say a company X is worth Rs 1,000 right now and has 100
shares in total. In this case, the price of 1 share will be Rs 1,000 / 100 = Rs 10 per
share. Also, one share would be 1% of the total company. If the company had 200
shares, then one share would be 0.5% of the total company (1/200), and so on.
Conversely, the total number of shares of a company multiplied by the price of a
share at a given time is the Market Capitalization (Market Cap) of the company
at that time.
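The arithmetic of the company X example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are our own and the figures are the ones used in the example above.

```python
# Figures taken from the company X example.
company_value = 1_000   # total worth of company X in Rs
total_shares = 100      # number of shares in total

price_per_share = company_value / total_shares   # Rs 10 per share
stake_per_share = 100 / total_shares             # % of the company held by one share

# Market capitalisation is the same calculation in reverse.
market_cap = total_shares * price_per_share

# With 200 shares, one share represents half as much of the company.
stake_with_200_shares = 100 / 200

print(price_per_share, stake_per_share, market_cap, stake_with_200_shares)
```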
Stocks v/s Shares - The 2 words are used interchangeably, though there is a slight
difference between the two. A stock can refer to any arbitrary company, but the
word ‘share’ is used when we are referring to a specific company.
Eg - You can buy stocks of 20 different companies, but you can buy the shares of
Reliance (a specific company). The share market and the stock market are one and
the same thing.
NSE (National Stock Exchange) and BSE (Bombay Stock Exchange) are
the ‘markets’ where one can buy and sell shares of a company. Unlike
conventional markets, these markets are electronic and not physical. All
transactions take place electronically.
Portfolio - If I have invested in more than 1 stock, let’s say 10 different stocks,
then this collection of investments is known as a Portfolio.
What moves the stock?
Let us continue with the Yes Bank example to understand how stocks really move.
Imagine you are a market participant tracking Yes Bank.
It is 10:00 AM on 18th June 2019, and the price of Yes Bank is 110. The
management makes a statement to the press that they have managed to find a new
CEO who is expected to steer the company to greater heights. They are confident
in his capabilities and they are sure that the new CEO will deliver much more than
what is expected of him. Two questions –
1. How will the stock price of Yes Bank react to this news?
2. If you were to place a trade on Yes Bank, what would it be? Would it be a buy or a
sell?
The answer to the first question is quite simple: the stock price will move up.
Yes Bank had a leadership issue, and the company has fixed it. When positive
announcements are made, market participants tend to buy the stock at any given
price, and this cascades into a stock price rally.
S.no | Time | Last traded price | What price the seller wants? | What does the buyer do? | New last trade price
Notice, whatever price the seller wants the buyer is willing to pay for it. This
buyer-seller reaction tends to push the share price higher.
So as you can see, the stock price jumped 16 Rupees in a matter of 5 minutes.
Though this is a fictional situation, it is a very realistic and typical behavior of
stocks. The stock price tends to go up when the news is good or expected to be
good.
In this particular case, the stock moves up because of two reasons. One, the
leadership issue has been fixed, and two, there is also an expectation that the new
CEO will steer the company to greater heights.
The answer to the second question is now quite simple; you buy Yes Bank stocks
considering the fact that there is good news surrounding the stock.
How does the stock get traded?
You have decided to buy 200 shares of Yes Bank at 120, and hold on to it for 1
year. How does it actually work? What is the exact process to buy it? What
happens after you buy it? Luckily there are systems in place which are fairly well
integrated. With your decision to buy Yes Bank, you need to login to your trading
account (provided by your stock broker) and place an order to buy Yes Bank. Once
you place an order, an order ticket gets generated containing the following details:
1. Details of your trading account through which you intend to buy Yes Bank
shares – therefore your identity is revealed.
2. The price at which you intend to buy Yes Bank
3. The number of shares you intend to buy
Before your broker transmits this order to the exchange he needs to ensure you
have sufficient money to buy these shares. If yes, then this order ticket hits the
stock exchange. Once the order hits the market the stock exchange (through their
order matching algorithm) tries to find a seller who is willing to sell you 200 shares
of Yes Bank at 120.
Now the seller could be 1 person willing to sell the entire 200 shares at 120, or it
could be 10 people selling 20 shares each, or it could be 2 people selling 1 and 199
shares respectively. The permutations and combinations do not really matter. From
your perspective, all you need is 200 shares of Yes Bank at 120 and you have
placed an order for the same. The stock exchange ensures the shares are available
to you as long as there are sellers in the market. Once the trade is executed, the
shares will be electronically credited to your DEMAT account. Likewise, the shares
will be electronically debited from the seller's DEMAT account.
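The matching step described above can be illustrated with a deliberately simplified sketch. The function `match_buy_order` and the `(seller, price, shares)` tuple layout are hypothetical names of our own; a real exchange's order matching algorithm follows many more rules than this.

```python
def match_buy_order(quantity, limit_price, sell_orders):
    """Fill a buy order from a list of (seller, price, shares) sell orders.

    Returns the list of (seller, shares) fills and the quantity left unfilled.
    """
    fills = []
    remaining = quantity
    # Cheapest sellers first; sorted() is stable, so earlier orders win ties.
    for seller, price, shares in sorted(sell_orders, key=lambda order: order[1]):
        if remaining == 0 or price > limit_price:
            break
        traded = min(shares, remaining)
        fills.append((seller, traded))
        remaining -= traded
    return fills, remaining


# Two sellers offering 1 and 199 shares at 120, plus one who asks for too much.
book = [("A", 120, 1), ("B", 120, 199), ("C", 121, 50)]
fills, unfilled = match_buy_order(200, 120, book)
print(fills, unfilled)
```

Whether the 200 shares come from one seller, ten sellers or two, the buyer only sees the total quantity that was filled at the requested price.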
What happens after you own a stock?
After you buy the shares, the shares will now reside in your DEMAT account. You
are now a part owner of the company, to the extent of your shareholding. To give
you a perspective, if you own 200 shares of Yes Bank then you own 0.00068% of
Yes Bank. By virtue of owning the shares you are entitled to a few corporate
benefits like dividends, stock splits, bonus, rights issue, voting rights etc. We will
explore all these shareholder privileges at a later stage.
patience, and of late, some really good Data Science skills which we are going to
use in this internship!
2. Commodities
Investments in gold and silver are considered one of the most popular investment
avenues. Gold and silver have appreciated in value over the long term.
Investments in these metals have yielded an annual return of approximately 8%
over the last 20 years! Crude oil is another commodity whose price keeps varying
with time, and if one is able to analyse the trend correctly, one can make a fortune
in this product.
Eg- An investor who had correctly analysed the trend of oil price movement
between October 2018 and December 2018 would have made a 36.7% profit
in just 2 months!
Nifty50 (or Nifty) and Sensex are the market indices of the NSE and BSE
respectively. Now another question arises: what is a market index?
Consider the following situation -
If I were to ask you how the stock market is moving today, how would you answer
my question? There are approximately 5,000 listed companies on the Bombay Stock
Exchange and about 2,000 listed companies on the National Stock Exchange. It
would be clumsy to check each and every company, figure out if it is up or
down for the day and then give a detailed answer.
Instead, you would just check a few important companies across key industrial
sectors. If the majority of these companies are moving up, you would say markets
are up; if the majority are down, you would say markets are down; and if there is a
mixed trend, you would say markets are sideways!
Now that we have that covered, we can now understand what Nifty and Sensex
really are.
● Nifty is the market index which represents the top 50 companies listed on
the NSE
● Sensex is the market index which represents the top 30 companies listed on
the BSE
Chapter 3
About Internship
3.1 Introduction
Investment Bankers. CAs. Hedge Fund / Portfolio Managers. Forex Traders.
Commodities Analysts. These have historically been considered to be among the
most coveted professions of all time. Yet, if one fails to keep up with the demands
of the day, one would find one's skills to be obsolete in this era of data analysis.
Data Science has inarguably been the hottest domain of the decade, asserting its
need in every single sphere of corporate life. It was not long ago that we
discovered the massive potential of incorporating ML/AI in the financial world.
Now, the very idea of the two being separate sounds strange.
Data Science has been instrumental in providing powerful insights (which people
didn't even know existed) and has helped massively increase efficiency, helping
everyone from a scalp trader to a long-term debt investor. Accurate predictions,
unbiased analysis and powerful tools that run through millions of rows of data in the
blink of an eye have transformed the industry in ways we could never have imagined.
This internship was designed both to test our knowledge and to give us the feel and
experience of a real-world data science problem from the financial world.
In the coming modules, we will be doing various tasks to analyze and make
predictions on the stock of the company allotted to us. You may need to learn about
the underlying markets as well to complete the internship.
3.2 Module 1
Introduction to the problem
In Module 1, we are going to get familiar with pandas, the Python module which is
used to process and analyse data. Processing could include removing unknown
values from the data or replacing unknown values with values which make sense,
maybe 0. Analysing the data could include finding out the trend of a stock price,
e.g. how the stock price changes with respect to the Nifty 50 basket of stocks.
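As a minimal sketch of the processing step described above, the following shows three common ways pandas can handle unknown values. The tiny table is made up, and the column names 'Close Price' and 'Total Traded Quantity' are assumptions about how the stock csv is laid out.

```python
import numpy as np
import pandas as pd

# Tiny made-up price table with two unknown (NaN) entries.
df = pd.DataFrame({
    "Close Price": [100.0, np.nan, 102.0, 101.0],
    "Total Traded Quantity": [500.0, 650.0, np.nan, 700.0],
})

dropped = df.dropna()    # remove every row that has an unknown value
filled = df.fillna(0)    # replace unknown values with 0
carried = df.ffill()     # or carry the last known value forward

print(dropped)
print(filled)
print(carried)
```

Which option makes sense depends on the column: a price of 0 would distort later averages, so carrying the previous close forward or dropping the row is often the safer choice for prices.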
Problem Statements
● 1.1 Import the csv file of the stock you have been allotted using
'pd.read_csv()' function into a dataframe.
Shares of a company can be offered in more than one category. The category
of a stock is indicated in the ‘Series’ column. If the csv file has data on more
than one category, the ‘Date’ column will have repeating values. To avoid
repetitions in the date, remove all the rows where 'Series' column is NOT
'EQ'.
Analyze and understand each column properly.
You'd find the head(), tail() and describe() functions to be immensely useful
for exploration. You're free to carry out any other exploration of your own.
● 1.2 Calculate the maximum, minimum and mean price for the last 90 days.
(price=Closing Price unless stated otherwise)
● 1.3 Analyse the data types for each column of the dataframe. Pandas knows
how to deal with dates in an intelligent manner. But to make use of Pandas
functionality for dates, you need to ensure that the column is of type
'datetime64[ns]'. Change the date column from 'object' type to
'datetime64[ns]' for future convenience. See what happens if you subtract the
minimum value of the date column from the maximum value.
● 1.4 In a separate array, calculate the monthwise VWAP (Volume Weighted
Average Price) of the stock.
( VWAP = sum(price*volume)/sum(volume) )
To know more about VWAP, visit - VWAP definition
{Hint : Create a new dataframe column ‘Month’. The values for this column
can be derived from the ‘Date’ column by using appropriate pandas
functions. Similarly, create a column ‘Year’ and initialize it. Then use the
'groupby()' function by month and year. Finally, calculate the vwap value for
each month (i.e. for each group created).}
● 1.5 Write a function to calculate the average price over the last N days of the
stock price data, where N is a user-defined parameter. Write a second
function to calculate the profit/loss percentage over the last N days.
Calculate the average price AND the profit/loss percentages over the course
of the last -
1 week, 2 weeks, 1 month, 3 months, 6 months and 1 year.
{Note : Profit/Loss percentage between N days is the percentage change
between the closing prices of the 2 days }
● 1.6 Add a column 'Day_Perc_Change' where the values are the daily change
in percentages, i.e. the percentage change between 2 consecutive days'
closing prices. Instead of using the basic mathematical formula for
computing the same, use the 'pct_change()' function provided by Pandas for
dataframes. You will note that the first entry of the column will have a ‘NaN’
value. Why does this happen? Either remove the first row, or set the entry to
0 before proceeding.
● 1.7 Add another column 'Trend' whose values are:
○ 'Slight or No change' for 'Day_Perc_Change' in between -0.5 and
0.5
○ 'Slight positive' for 'Day_Perc_Change' in between 0.5 and 1
○ 'Slight negative' for 'Day_Perc_Change' in between -0.5 and -1
○ 'Positive' for 'Day_Perc_Change' in between 1 and 3
○ 'Negative' for 'Day_Perc_Change' in between -1 and -3
○ 'Among top gainers' for 'Day_Perc_Change' in between 3 and 7
○ 'Among top losers' for 'Day_Perc_Change' in between -3 and -7
○ 'Bull run' for 'Day_Perc_Change' >7
○ 'Bear drop' for 'Day_Perc_Change' <-7
● 1.8 Find the average and median values of the column 'Total Traded
Quantity' for each of the types of 'Trend'.
{Hint : use 'groupby()' on the 'Trend' column and then calculate the average
and median values of the column 'Total Traded Quantity'}
● 1.9 SAVE the dataframe with the additional columns computed as a csv file
week2.csv.
In Module 2, you are going to get familiar with matplotlib, the
Python module which is used to visualize data.
3.3 Module 2
'A picture speaks a thousand words' has never been truer than in financial markets.
Absolutely no one goes through millions of rows of numbers; we always prefer
the data in a plotted form to draw better inferences. This module covers
plotting, basic technical indicators and our own customisation, and making our
own trade calls!
Problem Statements
● 2.1 Load the week2.csv file into a dataframe. What is the type of the Date
column? Make sure it is of type datetime64. Convert the Date column to the
index of the dataframe.
Plot the closing price of each of the days for the entire time frame to get an
idea of what the general outlook of the stock is.
○ Look out for drastic changes in this stock, you have the exact date
when these took place, try to fetch the news for this day of this
stock
○ This would be helpful if we are to train our model to take NLP
inputs.
● 2.2 A stem plot is a discrete series plot, ideal for plotting daywise data. It
can be plotted using the plt.stem() function.
Display a stem plot of the daily change in the stock price in percentage.
This column was calculated in module 1 and should already be available in
week2.csv. Observe whenever there's a large change.
● 2.3 Plot the daily volumes as well and compare the percentage stem plot to
it. Document your analysis of the relationship between volume and daily
percentage change.
● 2.4 We had created a Trend column in module 1. We want to see how often
each Trend type occurs. This can be seen as a pie chart, with each sector
representing the percentage of days each trend occurs. Plot a pie chart for all
the 'Trend' to know about relative frequency of each trend. You can use the
groupby function with the trend column to group all days with the same
trend into a single group before plotting the pie chart. From the grouped
data, create a BAR plot of average & median values of the 'Total Traded
Quantity' by Trend type.
● 2.5 Plot the daily return (percentage) distribution as a histogram.
Histogram analysis is one of the most fundamental methods of exploratory
data analysis. In this case, it'd return a frequency plot of various values of
percentage changes .
● 2.6 We next want to analyse how the behaviour of different stocks is
correlated. The correlation is performed on the percentage change of the
stock price instead of the stock price.
Load any 5 stocks of your choice into 5 dataframes. Retain only rows for
which the ‘Series’ column has value ‘EQ’. Create a single dataframe which
contains the ‘Closing Price’ of each stock. This dataframe should hence have
five columns. Rename each column to the name of the stock that is
contained in the column. Create a new dataframe which is a percentage
change of the values in the previous dataframe. Drop NaNs from this
dataframe.
Using seaborn, analyse the correlation between the percentage changes in
the five stocks. This is extremely useful for a fund manager to design a
diversified portfolio. To know more, check out these resources on
correlation and diversification.
● 2.7 You have already calculated the percentage changes in several stock prices.
Calculate the 7 day rolling average of the percentage change of any of the
stock prices, then compute the standard deviation (which is the square root
of the variance) and plot the values.
Note: pandas provides a rolling() function for dataframes and a std()
function also which you can use.
● 2.8 Calculate the volatility for the Nifty index and compare the two. This leads
us to a useful indicator known as 'Beta' (we'll be covering this at length in
Module 3).
● 2.9 Trade Calls - Using Simple Moving Averages. Study about moving
averages here.
Plot the 21 day and 34 day moving averages with the average price and
decide a Call!
The call should be buy whenever the smaller moving average (21) crosses over
the longer moving average (34), AND the call should be sell whenever the smaller
moving average crosses under the longer moving average.
This is one of the most widely used technical indicators.
● 2.10 Trade Calls - Using Bollinger Bands
Plot the Bollinger bands for this stock, using a duration of 14 days and 2
standard deviations away from the average.
The Bollinger bands comprise the following data points-
○ The 14 day rolling mean of the closing price (we call it the average)
○ Upper band, which is the rolling mean + 2 standard deviations away
from the average.
○ Lower band, which is the rolling mean - 2 standard deviations away
from the average.
○ Average Daily stock price.
● Bollinger bands are widely used: if prices were normally distributed, about
95% of them would fall within 2 standard deviations of the average. The
bands are especially useful in a sideways moving market.
Observe the bands yourself, and analyse the accuracy of all the trade signals
provided by the Bollinger bands.
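The two trade-call indicators from 2.9 and 2.10 can be computed with pandas rolling windows, as in the sketch below. The price series is a made-up, deterministic wave rather than real stock data, and the 'Buy' / 'Sell' / 'Hold' labels and variable names are our own choices.

```python
import numpy as np
import pandas as pd

# Deterministic made-up closing prices: a slow wave around 100.
days = pd.date_range("2019-01-01", periods=120, freq="D")
close = pd.Series(100 + 10 * np.sin(np.arange(120) / 10), index=days)

# 2.9 Simple moving averages and crossover calls.
sma_21 = close.rolling(21).mean()
sma_34 = close.rolling(34).mean()
side = np.sign(sma_21 - sma_34)       # +1 when the 21 day average is on top
cross = side.diff().to_numpy()        # non-zero only on the day of a crossover
calls = pd.Series(
    np.select([cross > 0, cross < 0], ["Buy", "Sell"], default="Hold"),
    index=close.index,
)

# 2.10 Bollinger bands: 14 day rolling mean +/- 2 rolling standard deviations.
middle = close.rolling(14).mean()
width = 2 * close.rolling(14).std()
upper, lower = middle + width, middle - width

# Share of days (with a full 14 day window) on which the price stays inside the bands.
valid = middle.notna()
inside_share = ((close >= lower) & (close <= upper))[valid].mean()

print(calls[calls != "Hold"])
print(round(inside_share, 2))
```

Each of these series can then be drawn on one chart with matplotlib. Comparing `inside_share` with the 95% figure is a quick way to check how well the bands describe a particular stock.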
3.4 Module 3
than not, we utilize linear regression to come up with an ideal inference. We'd be
using the regression model to solve the following problems:
Problem Statements
● 3.1 Import the file 'gold.csv' (you will find this in the intro section to
download or in '/Data/gold.csv' if you are using the jupyter notebook), which
contains the data of the last 2 years price action of Indian (MCX) gold
standard. Explore the dataframe. You'd see 2 unique columns - 'Pred' and
'new'. One of the 2 columns is a linear combination of the OHLC prices with
varying coefficients while the other is a polynomial function of the same
inputs. Also, one of the 2 columns is partially filled.
○ Using linear regression, find the coefficients of the inputs and using
the same trained model, complete the entire column.
○ Also, try to fit the other column as well using a new linear
regression model. Check if the predictions are accurate. Mention
which column is a linear function and which is polynomial.
(Hint: Plotting a histogram & distplot helps in recognizing the
discrepancies in prediction, if any.)
● CAPM Analysis and Beta Calculation using regression -
CAPM (Capital Asset Pricing Model) attempts to price securities by
examining the relationship that exists between expected returns and risk.
Read more about CAPM. (Investopedia CAPM reference)
The Beta of an asset is a measure of the sensitivity of its returns relative to a
market benchmark (usually a market index such as the S&P 500). How
sensitive or insensitive are the returns of an asset to the overall market
returns? When the market jumps, do the returns of the asset jump
accordingly, or do they move in some other way?
Read more about Beta (Investopedia Beta reference)
● 3.2 Import the stock of your choosing AND the Nifty index.
Using linear regression (OLS), calculate -
○ The daily Beta value for the past 3 months. (Daily= Daily returns)
○ The monthly Beta value. (Monthly= Monthly returns)
● Refrain from using the (covariance(x,y)/variance(x)) formula.
Attempt the question using regression. (Regression Reference)
Were the Beta values more or less than 1? What if a Beta value was negative?
Discuss. Include a brief writeup at the bottom of your jupyter notebook with
your inferences from the Beta values and regression results.
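Beta is the slope of the regression line of the stock's returns on the market's returns. The sketch below fits that line by ordinary least squares using NumPy's `lstsq`, which is only one of several ways to run an OLS regression. The index levels are invented, and the stock returns are constructed so that the true Beta is 1.5, which lets us check the result.

```python
import numpy as np
import pandas as pd

# Hypothetical index closing levels; in the assignment these come from the Nifty csv.
nifty_close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.5])
market_ret = nifty_close.pct_change().dropna()

# Made-up stock returns, built so that alpha = 0.001 and Beta = 1.5 exactly.
stock_ret = 0.001 + 1.5 * market_ret

# OLS: stock_ret = alpha + beta * market_ret, solved by least squares.
X = np.column_stack([np.ones(len(market_ret)), market_ret.to_numpy()])
coef, *_ = np.linalg.lstsq(X, stock_ret.to_numpy(), rcond=None)
alpha, beta = coef

print(round(alpha, 6), round(beta, 6))
```

For the monthly Beta, the same regression is run on monthly instead of daily returns. A Beta above 1 means the stock tends to move more than the market, below 1 less, and a negative Beta means it tends to move against the market.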
3.5 Module 4
In this module, we'd be covering the concept of classification and utilizing our skills
to solve the following queries – (Stock Price = Close Price)
Problem Statements
● 4.1 Import the csv file of the stock which contained the Bollinger columns as
well.
○ Create a new column 'Call' , whose entries are -
'Buy' if the stock price is below the lower Bollinger band
'Hold Buy/ Liquidate Short' if the stock price is between the lower
and middle Bollinger band
'Hold Short/ Liquidate Buy' if the stock price is between the middle
and upper Bollinger band
'Short' if the stock price is above the upper Bollinger band
○ Now train a classification model with the 3 Bollinger columns and
the stock price as inputs and 'Call' as output. Check the accuracy
on a test set. (There are many classifier models to choose from; try
each one out and compare the accuracy of each.)
○ Import another stock data and create the bollinger columns. Using
the already defined model, predict the daily calls for this new stock.
● 4.2 Now, we'll again utilize classification to make a trade call, and measure
the efficiency of our trading algorithm over the past two years. For this
assignment , we will use RandomForest classifier.
○ Import the stock data file of your choice
○ Define 4 new columns , whose values are:
% change between Open and Close price for the day
% change between Low and High price for the day
5 day rolling mean of the day to day % change in Close Price
5 day rolling std of the day to day % change in Close Price
○ Create a new column 'Action' whose values are:
1 if next day's price(Close) is greater than present day's.
(-1) if next day's price(Close) is less than present day's.
i.e. Action [ i ] = 1 if Close[ i+1 ] > Close[ i ]
i.e. Action [ i ] = (-1) if Close[ i+1 ] < Close[ i ]
○ Construct a classification model with the 4 new inputs and 'Action'
as target
○ Check the accuracy of this model , also , plot the net cumulative
returns (in %) if we were to follow this algorithmic model
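Problem 4.2 can be sketched end to end as follows, using scikit-learn's `RandomForestClassifier`. The price data is a seeded random walk standing in for a real stock file, so the accuracy it prints says nothing about real markets. The column names, the 80/20 chronological split and the treatment of unchanged prices as -1 are assumptions of ours.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up OHLC data from a seeded random walk (real data would come from the csv).
rng = np.random.default_rng(0)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))
open_ = close + rng.normal(0, 0.5, n)
df = pd.DataFrame({
    "Open Price": open_,
    "High Price": np.maximum(open_, close) + 0.5,
    "Low Price": np.minimum(open_, close) - 0.5,
    "Close Price": close,
})

# The four input columns from 4.2.
daily = df["Close Price"].pct_change() * 100
df["Open_Close"] = (df["Close Price"] - df["Open Price"]) / df["Open Price"] * 100
df["Low_High"] = (df["High Price"] - df["Low Price"]) / df["Low Price"] * 100
df["Roll_Mean_5"] = daily.rolling(5).mean()
df["Roll_Std_5"] = daily.rolling(5).std()

# Target: 1 if tomorrow's close is higher, otherwise -1 (ties fall into -1 here).
df["Next_Ret"] = daily.shift(-1)
df["Action"] = np.where(df["Next_Ret"] > 0, 1, -1)
df = df.dropna()   # drops the warm-up rows and the last row, which has no "tomorrow"

# Chronological split: train on the first 80% of days, test on the rest.
features = ["Open_Close", "Low_High", "Roll_Mean_5", "Roll_Std_5"]
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["Action"])
pred = model.predict(test[features])
accuracy = (pred == test["Action"].to_numpy()).mean()

# Follow the calls: long on +1, short on -1, compounding the daily returns.
strategy_ret = pred * test["Next_Ret"] / 100
cumulative_pct = ((1 + strategy_ret).cumprod() - 1) * 100

print(round(accuracy, 3))
print(cumulative_pct.tail())
```

The split is chronological rather than random so that the model is never tested on days that come before its training data, which would overstate the accuracy.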
3.6 Module 5
Problem Statements
● 5.1 For your chosen stock, calculate the mean daily return and daily standard
deviation of returns, and then just annualise them to get mean expected
annual return and volatility of that single stock. ( annual mean = daily mean
* 252 , annual stdev = daily stdev * sqrt(252) )
● 5.2 Now, we need to diversify our portfolio. Build your own portfolio by
choosing any 5 stocks, preferably of different sectors and different caps.
Assume that all 5 have the same weightage, i.e. 20% . Now calculate the
annual returns and volatility of the entire portfolio ( Hint : Don't forget to
use the covariance )
● 5.3 Prepare a scatter plot for differing weights of the individual stocks in the
portfolio , the axes being the returns and volatility. Colour the data points
based on the Sharpe Ratio ( Returns/Volatility) of that particular portfolio.
● 5.4 Mark the 2 portfolios where -
Portfolio 1 - The Sharpe ratio is the highest
Portfolio 2 - The volatility is the lowest.
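The portfolio calculations in 5.1 to 5.4 can be sketched as below. The daily returns are random numbers from a seeded generator, not real stock data, and the stock names A to E are placeholders. The annualisation factors (252 and sqrt(252)) and the Sharpe ratio without a risk-free rate follow the definitions given in the problem statements.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for five hypothetical stocks over one trading year.
rng = np.random.default_rng(1)
names = ["A", "B", "C", "D", "E"]
daily = pd.DataFrame(rng.normal(0.0005, 0.01, size=(252, 5)), columns=names)

# 5.1 Annualise the mean and the covariance of the daily returns.
mean_annual = daily.mean().to_numpy() * 252
cov_annual = daily.cov().to_numpy() * 252

# 5.2 Equal-weight portfolio: the volatility uses the full covariance matrix.
w_equal = np.full(5, 0.2)
port_return = float(w_equal @ mean_annual)
port_vol = float(np.sqrt(w_equal @ cov_annual @ w_equal))

# 5.3 Many random weightings, each row summing to 1.
weights = rng.random((2000, 5))
weights /= weights.sum(axis=1, keepdims=True)
returns = weights @ mean_annual
vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_annual, weights))
sharpe = returns / vols          # Sharpe ratio as defined in 5.3

# 5.4 The two portfolios to mark on the scatter plot.
best_sharpe_weights = weights[sharpe.argmax()]
least_vol_weights = weights[vols.argmin()]

print(round(port_return, 4), round(port_vol, 4))
print(best_sharpe_weights.round(3), least_vol_weights.round(3))
```

Plotting `vols` against `returns` and colouring the points by `sharpe` gives the scatter plot asked for in 5.3. The equal-weight volatility is never larger than the average of the individual volatilities, which is the benefit of diversification that the covariance hint points to.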
3.7 Module 6
returns fall into one basket, those slightly less correlated in another, and so on,
until each stock is placed into a category.
Problem Statements
● 6.1 Create a table/data frame with the closing prices of 30 different stocks,
with 10 from each of the caps
● 6.2 Calculate average annual percentage return and volatility of all 30 stocks
over a theoretical one year period
● 6.3 Cluster the 30 stocks according to their mean annual Volatilities and
Returns using K-means clustering. Identify the optimum number of clusters
using the Elbow curve method
● 6.4 Prepare a separate Data frame to show which stocks belong to the same
cluster
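A minimal sketch of 6.3 and 6.4 with scikit-learn's `KMeans` is given below. The 30 return and volatility pairs are generated around three invented centres instead of being computed from real closing prices, and the stock names are placeholders. With real data, the two columns may need scaling if they are on very different ranges.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up (annual return %, annual volatility %) for 30 stocks, 10 around each centre.
rng = np.random.default_rng(2)
centres = np.array([[8.0, 15.0], [15.0, 25.0], [25.0, 40.0]])
points = np.vstack([c + rng.normal(0, 1.5, size=(10, 2)) for c in centres])
stats = pd.DataFrame(points, columns=["Return", "Volatility"],
                     index=["STOCK_%02d" % i for i in range(30)])

# 6.3 Elbow curve: within-cluster sum of squares (inertia) for each k.
inertia = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(stats)
    inertia[k] = km.inertia_

# Fit the chosen number of clusters (3 here, where the curve flattens out).
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(stats)
stats["Cluster"] = model.labels_

# 6.4 Which stocks belong to the same cluster.
members = {cluster: list(group.index) for cluster, group in stats.groupby("Cluster")}

print(inertia)
print(members)
```

The optimum k is read off the elbow curve as the point after which adding another cluster reduces the inertia only slightly.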