Stock Prediction
ADITYA DEGREE COLLEGE
(sai prasanna)
ACKNOWLEDGEMENT
No endeavour is completed without the valuable support of others.
I would like to take this opportunity to extend my sincere gratitude
to all those who have contributed to the successful completion of this
Short-Term Internship Project Report.
(sai prasanna)
CONTENTS
Introduction
Learning outcome of Short-Term Internship
Introduction to AI and ML
ML and types of ML
Applications of ML
Deep Learning
ANN, NLP, CC
AI tools we used in our daily life
Back propagation
Difference between neural & deep neural networks
Difference between ChatGPT and Google
POS Tagging
Object detection
CNN algorithm
Deep fake, Deep dream
GAN model and architecture
Data augmentation
Parameter sharing and typing
Ensemble methods
Bayes theorem
LSTM- long short-term memory
Restricted Boltzmann Machine
RNN- Recurrent Neural Network
Auto encoders and types
VGG Net and architecture
Google Net and architecture
Data types in Python
Arithmetic operations in python
Declaration of comments and variables
Reserved words in python
Control statements in python
Programs
Problem statement & Explanation
Source and Outputs
Conclusion
INTRODUCTION
Stock market prediction is the act of trying to determine the future
value of a company stock or other financial instrument traded on
an exchange. The successful prediction of a stock's future price could
yield significant profit. The efficient market hypothesis suggests that
stock prices reflect all currently available information, and that any price
changes not based on newly revealed information are therefore inherently
unpredictable. Others disagree, and those with this viewpoint possess myriad
methods and technologies which purportedly allow them to gain future price
information.
The efficient market hypothesis posits that stock prices are a function
of information and rational expectations, and that newly revealed
information about a company's prospects is almost immediately
reflected in the current stock price. This would imply that all publicly
known information about a company, which obviously includes its
price history, would already be reflected in the current price of the
stock. Accordingly, changes in the stock price reflect release of new
information, changes in the market generally, or random movements
around the value that reflects the existing information set.
Burton Malkiel, in his influential 1973 work A Random Walk Down
Wall Street, claimed that stock prices could therefore not be
accurately predicted by looking at price history. As a result, Malkiel
argued, stock prices are best described by a statistical process called
a "random walk" meaning each day's deviations from the central
value are random and unpredictable. This led Malkiel to conclude that
paying financial services persons to predict the market actually hurt,
rather than helped, net portfolio return. A number of empirical tests
support the notion that the theory applies generally, as most portfolios
managed by professional stock predictors do not outperform the
market average return after accounting for the managers' fees.
LEARNING OUTCOME OF
SHORT-TERM INTERNSHIP
Introduction to AI & ML: -
Types of ML: -
There are three types of machine learning:
1. Supervised learning – it uses labelled or structured data.
2. Unsupervised learning – it uses unlabelled or unstructured data.
3. Reinforcement learning – it uses both structured and unstructured data.
Supervised learning: -
• Supervised learning involves training a machine from labelled data.
• Labelled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs (fruit images) and
outputs (fruit labels).
• The trained machine can then make predictions on new, unlabelled data.
Supervised learning is classified into two categories of algorithms:
• Regression: A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.
• Classification: A classification problem is when the output variable is a
category, such as “Red” or “blue”, “disease” or “no disease”
Regression: -
Regression is a type of supervised learning that is used to predict continuous values,
such as house prices, stock prices, or temperatures. Regression algorithms learn a
function that maps from the input features to the continuous output value; a short
scikit-learn sketch follows the list below.
Some common regression algorithms include:
• Linear Regression
• Polynomial Regression
• Support Vector Machine Regression
• Decision Tree Regression
• Random Forest Regression
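A minimal, hedged sketch of a regression model using scikit-learn (the data values here are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # single input feature
y = np.array([2.1, 4.0, 6.2, 7.9])           # continuous target values

model = LinearRegression()
model.fit(X, y)                              # learn the mapping from X to y
print(model.predict([[5.0]]))                # predicted value for a new input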
Classification: -
Classification is a type of supervised learning that is used to predict categorical values,
such as whether a customer will churn or not, whether an email is spam or not, or
whether a medical image shows a tumour or not. Classification algorithms learn a
function that maps from the input features to a probability distribution over the
output classes.
Some common classification algorithms include:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• Naive Bayes
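A minimal, hedged sketch of a classifier using scikit-learn's logistic regression (the toy data is made up for illustration):

from sklearn.linear_model import LogisticRegression

X = [[0.2], [0.4], [1.8], [2.5]]   # input feature values
y = [0, 0, 1, 1]                   # class labels

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[1.0]]))        # predicted class
print(clf.predict_proba([[1.0]]))  # probability distribution over the classes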
Unsupervised Learning: -
• Unsupervised learning allows the model to discover patterns and relationships
in unlabelled data.
• Clustering algorithms group similar data points together based on their
inherent characteristics.
• Feature extraction captures essential information from the data, enabling the
model to make meaningful distinctions.
• Label association assigns categories to the clusters based on the extracted
patterns and characteristics.
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behaviour.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people that buy
X also tend to buy Y.
Clustering: -
Clustering is a type of unsupervised learning that is used to group similar data points
together. Clustering algorithms work by iteratively moving data points closer to their
cluster centres and further away from data points in other clusters. Some types of
clustering are:
• Hierarchical clustering
• K-means clustering
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis
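A minimal, hedged sketch of clustering with scikit-learn's k-means (the 2-D points are made up for illustration):

from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # coordinates of the two cluster centres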
Association: -
Association rule learning is a type of unsupervised learning that is used to identify
patterns in data. Association rule learning algorithms work by finding relationships
between different items in a dataset; a small sketch using the mlxtend library follows
the list of algorithms below.
Some common association rule learning algorithms include:
• Apriori Algorithm
• Eclat Algorithm
• FP-Growth Algorithm
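As a hedged sketch (assuming the third-party mlxtend library is installed), Apriori can be run on a small, made-up basket dataset like this:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["milk", "bread"], ["milk", "butter"],
                ["bread", "butter"], ["milk", "bread", "butter"]]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(df, min_support=0.5, use_colnames=True)                   # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)  # derived rules
print(rules[["antecedents", "consequents", "support", "confidence"]])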
Reinforcement Learning: -
Reinforcement Learning (RL) is the science of decision making. It is about learning the
optimal behaviour in an environment to obtain maximum reward. In RL, the data is
accumulated from machine learning systems that use a trial-and-error method; the
data is not part of the input in the way it is in supervised or unsupervised machine
learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which
action to take next.
Reinforcement learning elements are as follows:
• Policy
• Reward function
• Value function
• Model of the environment
Applications of ML: -
Today, companies are using Machine Learning to improve business decisions,
increase productivity, detect disease, forecast weather, and do many more things.
Some of the most common examples are:
• Image Recognition
• Speech Recognition
• Recommender Systems
• Fraud Detection
• Self-driving Cars
• Medical Diagnosis
• Stock Market Trading
• Virtual Try On
Image Recognition: -
Image recognition boomed with Deep Learning. The task, which started with
classification between cat and dog images, has now evolved up to the level of face
recognition and real-world use cases based on it, such as employee attendance
tracking.
Speech Recognition: -
Most of us have come across speech-recognition-based smart systems like Alexa and
Siri and used them to communicate. In the backend, these systems are based on
speech recognition and are designed so that they can convert voice instructions into
text.
Recommender Systems: -
Almost every online service now tries to provide customized services to its users. This
is possible because of recommender systems, which can analyse a user's preferences
and search history and, based on that, recommend content or services to them.
Fraud detection: -
Thanks to ML applications, whenever the system detects red flags in a user's activity,
a suitable notification is provided to the administrator so that these cases can be
monitored properly for any spam or fraud activities.
Medical diagnosis: -
ML is used not only for disease diagnosis in human beings; it also works well for
plant-disease-related tasks, whether that is predicting the type of disease or detecting
whether some disease is going to occur in the future.
Deep Learning: -
In the fast-evolving era of artificial intelligence, Deep Learning stands as a cornerstone
technology, revolutionizing how machines understand, learn, and interact with
complex data. At its essence, Deep Learning AI mimics the intricate neural networks
of the human brain, enabling computers to autonomously discover patterns and
make decisions from vast amounts of unstructured data. This transformative field has
propelled breakthroughs across various domains, from computer vision and natural
language processing to healthcare diagnostics and autonomous driving. In a fully
connected Deep neural network, there is an input layer and one or more hidden
layers connected one after the other.
The input layer receives data from the external world which the neural network needs
to analyse or learn about. This data then passes through one or multiple hidden layers
that transform the input into data that is valuable for the output layer. Finally, the
output layer provides an output in the form of a response of the Artificial Neural
Network to the input data provided.
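A minimal, hedged sketch of such a fully connected network in Keras (assuming TensorFlow is installed; the layer sizes are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(4,)),   # first hidden layer, 4 input features
    layers.Dense(16, activation="relu"),                     # second hidden layer
    layers.Dense(3, activation="softmax"),                   # output layer: the network's response
])
model.summary()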
Congestion Control: -
Congestion Control is a mechanism that controls the entry of data packets into the
network, enabling a better use of a shared network infrastructure and avoiding
congestive collapse. Congestive-Avoidance Algorithms (CAA) are implemented at the
TCP layer as the mechanism to avoid congestive collapse in a network. There are two
congestion control algorithms, which are as follows:
• Leaky Bucket Algorithm: -
The leaky bucket algorithm discovers its use in the context of network traffic
shaping or rate-limiting. This algorithm is used to control the rate at which traffic is
sent to the network and shape bursty traffic into a steady traffic stream. A drawback is
that a large share of network resources, such as bandwidth, may not be used effectively.
• Token bucket Algorithm: -
In some applications, when large bursts arrive, the output is allowed to speed up.
This calls for a more flexible algorithm, preferably one that never loses
information. Therefore, a token bucket algorithm finds its uses in network traffic
shaping or rate-limiting. It is a control algorithm that indicates when traffic should
be sent. This decision is made based on the presence of tokens in the bucket.
Back Propagation: -
Back propagation is the reverse process of the feed-forward pass in a neural network.
It propagates the prediction error backwards from the output layer through the hidden
layers and corrects the weights along the way. The flow is as follows: -
Prediction-value errors (output layer) -> backward direction, output layer to hidden
layers -> error correction -> output layer -> new prediction value
Calculation of weights: -
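As a hedged, hypothetical illustration of how a weight is recalculated from the error (a single sigmoid neuron with a gradient-descent update; the numbers are made up):

import numpy as np

x = np.array([0.5, 0.8])        # inputs to the neuron
w = np.array([0.2, -0.1])       # current weights
target = 1.0
learning_rate = 0.1

output = 1 / (1 + np.exp(-np.dot(w, x)))       # forward pass (sigmoid activation)
error = output - target                        # prediction error
gradient = error * output * (1 - output) * x   # chain rule: derivative of the error w.r.t. each weight
w = w - learning_rate * gradient               # backward pass: weight correction
print(w)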
Difference between Neural and Deep Neural Networks: -
The differences between neural networks and deep learning neural networks are
tabulated as follows: -
1. Definition: A neural network is a model of neurons inspired by the human brain,
made up of many neurons that are inter-connected with each other. Deep learning
neural networks are distinguished from neural networks on the basis of their depth,
i.e. the number of hidden layers.
2. Architecture: Neural networks include Feed Forward Neural Networks, Recurrent
Neural Networks and Symmetrically Connected Neural Networks. Deep learning
networks include Recursive Neural Networks, Unsupervised Pre-trained Networks and
Convolutional Neural Networks.
3. Structure: A neural network is described by its neurons, connections and weights,
propagation function and learning rate. Deep learning networks additionally depend
on hardware such as motherboards, PSUs, RAM and processors.
4. Performance: Neural networks give low performance compared to deep learning
networks; deep learning networks give high performance.
5. Task interpretation: A task is interpreted poorly by a plain neural network, whereas
a deep learning network interprets it more effectively.
Difference between ChatGPT and Google: -
1. ChatGPT gives the answer based on the information it was trained on, whereas
Google gives the answer based on searches and reviews.
2. ChatGPT is focused on generating human-like text, whereas Google can be used for
a variety of tasks like voice and image recognition.
3. ChatGPT gives information from its own data source, whereas Google gives
information that is already on the internet.
4. ChatGPT is developed by OpenAI, whereas Google is developed by Google Inc.
POS Tagging: -
Parts of Speech tagging is a linguistic activity in Natural Language Processing
(NLP) wherein each word in a document is given a particular part of speech (adverb,
adjective, verb, etc.) or grammatical category. Through the addition of a layer of
syntactic and semantic information to the words, this procedure makes it easier to
comprehend the sentence’s structure and meaning.
In NLP applications, POS tagging is useful for machine translation, named entity
recognition, and information extraction, among other things. It also works well for
clearing out ambiguity in terms with numerous meanings and revealing a sentence’s
grammatical structure.
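A minimal, hedged sketch of POS tagging using the NLTK library (assuming it is installed; the tagger model must be downloaded first, and the resource name can vary slightly between NLTK versions):

import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)   # tagger model

tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
print(nltk.pos_tag(tokens))   # e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ...]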
Object Detection: -
Object detection is a technique that uses neural networks to localize and classify
objects in images. This computer vision task has a wide range of applications, from
medical imaging to self-driving cars.
Object detection is a computer vision task that aims to locate objects in digital
images. Object detection overlaps with other computer vision techniques, but
developers nevertheless treat it as a discrete endeavour.
Image classification (or image recognition) aims to classify images according to
defined categories. A rudimentary example of this is CAPTCHA image tests, in which a
group of images may be organized as images with stop signs and images without.
Image classification assigns one label to a whole image.
Object detection, by comparison, delineates individual objects in an image according
to specified categories. While image classification divides images among those that
have stop signs and those that do not, object detection locates and categorizes all of
the road signs in an image, as well as other objects such as cars and people.
Image segmentation (or semantic segmentation) is similar to object detection, albeit
more precise. Like object detection, segmentation delineates objects in an image
according to semantic categories. But rather than mark objects using boxes,
segmentation demarcates objects at the pixel level.
CNN Algorithm: -
A Convolutional Neural Network (CNN), also known as ConvNet, is a specialized type
of deep learning algorithm mainly designed for tasks that necessitate object
recognition, including image classification, detection, and segmentation. CNNs are
employed in a variety of practical scenarios, such as autonomous vehicles, security
camera systems, and others.
The convolutional neural network is made of four main parts (a minimal Keras sketch
follows the list below). They help CNNs mimic how the human brain operates to
recognize patterns and features in images:
• Convolutional layers
• Rectified Linear Unit (ReLU for short)
• Pooling layers
• Fully connected layers
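A minimal, hedged sketch of these four building blocks in Keras (assuming TensorFlow is installed; the layer sizes and the 64x64 input shape are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                                    # fully connected layer
    layers.Dense(10, activation="softmax"),                                 # class scores
])
model.summary()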
GAN Model and Architecture: -
In the GAN architecture, samples of real-world images and samples produced by the
generator are fed into a discriminator, which tries to distinguish the real images from
the fake ones and produces a loss function by analysing them. The error is then
back-propagated to update the discriminator (and generator) weights. The flow is as
follows: -
Loss function -> output mistake -> back propagation -> discriminator -> error
identification -> rectify errors -> output
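A minimal, hedged sketch of the two GAN networks in Keras (assuming TensorFlow is installed; the layer sizes and the 28x28 image shape are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100

# Generator: maps random noise to a fake image.
generator = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])

# Discriminator: classifies images as real (1) or fake (0).
discriminator = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")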
Data Augmentation: -
Data augmentation is a set of techniques used to increase the amount of data by
adding slightly modified copies of already existing data or newly created synthetic
data. It includes making minor changes to the dataset or using deep learning to
generate new data points. Data augmentation is useful to prevent models from
overfitting and it improves model accuracy. It is also useful for reducing the
operational cost of labelling and cleaning the raw dataset. Data augmentation is used
in X-rays, self-driving cars and automatic speech recognition, and it improves the
performance and outcomes of machine learning models by forming new and different
examples to train on.
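A minimal, hedged sketch of image augmentation with Keras' ImageDataGenerator (assuming TensorFlow is installed; the transformation values and the random batch of images are illustrative):

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts
    horizontal_flip=True,     # random left-right flips
)

images = np.random.rand(8, 64, 64, 3)                 # a hypothetical batch of images
augmented = next(datagen.flow(images, batch_size=8))
print(augmented.shape)                                # same shape, randomly transformed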
Ensemble Methods: -
Ensemble methods are techniques that aim to improve model results by combining
multiple models instead of using a single model. These methods help to increase the
accuracy of the results. Ensemble methods are ideal for regression and classification,
where they reduce bias and variance and improve the accuracy of models. The most
popular ensemble methods are as follows: -
Bagging: -
Bagging is the short form for bootstrap aggregating. It is mainly applied in
classification and regression.
It increases the accuracy of models through decision trees, which reduces variance
to a large extent.
It consists of two steps, i.e. bootstrapping and aggregation.
Bootstrapping is a sampling technique where samples are derived from the whole
set using the replacement procedure.
Aggregation in bagging is done to incorporate all possible outcomes of the
prediction and randomize the outcome.
Without aggregation, predictions will not be accurate because all outcomes are not
put into consideration.
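A minimal, hedged sketch of bagging with scikit-learn's BaggingClassifier (whose default base learner is a decision tree; the toy data is made up):

from sklearn.ensemble import BaggingClassifier

X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [3, 3]]
y = [0, 1, 1, 0, 1, 1]

bagging = BaggingClassifier(n_estimators=10, random_state=0)   # 10 trees on bootstrapped samples
bagging.fit(X, y)
print(bagging.predict([[1, 2]]))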
Boosting: -
Boosting is an ensemble method that learns from previous predictor mistakes to
make better predictions in the future.
This technique combines several weak base learners to form one strong learner,
thus improving the predictability of models.
Boosting takes many forms, including gradient boosting, Adaptive boosting, and XG
Boost.
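A minimal, hedged sketch of gradient boosting with scikit-learn (the toy data is made up):

from sklearn.ensemble import GradientBoostingClassifier

X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [3, 3]]
y = [0, 1, 1, 0, 1, 1]

gb = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1, random_state=0)
gb.fit(X, y)                       # each new tree corrects the previous trees' mistakes
print(gb.predict([[1, 2]]))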
Stacking: -
Stacking is often referred to as stacked generalization.
This method works by allowing a training algorithm to ensemble several other
similar learning algorithm predictions.
It can also be used to measure the error rate involved during bagging.
Combination of boosting and stacking is called Boostacking.
Bayes Theorem: -
Bayes’ theorem is a fundamental concept in probability theory that plays a crucial role
in various machine learning algorithms, especially in the fields of Bayesian statistics
and probabilistic modelling. It provides a way to update probabilities based on new
evidence or information. In the context of machine learning, Bayes’ theorem is often
used in Bayesian inference and probabilistic models. The theorem can be
mathematically expressed as:
P(A/B) = P(B/A) · P(A) / P(B)
Where: -
P(A/B) is the posterior probability of event A given event B.
P(B/A) is the likelihood of event B given event A.
P(A) is the prior probability of event A.
P(B) is the total probability of event B.
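A small worked example with hypothetical numbers (say P(A) = 0.01, P(B/A) = 0.95 and P(B) = 0.06):

p_a = 0.01          # prior P(A)
p_b_given_a = 0.95  # likelihood P(B/A)
p_b = 0.06          # total probability P(B)

p_a_given_b = p_b_given_a * p_a / p_b   # posterior P(A/B)
print(round(p_a_given_b, 4))            # 0.1583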
LSTM Architecture: -
The LSTM architectures involve the memory cell which is controlled by three gates:
the input gate, the forget gate, and the output gate. These gates decide what
information to add to, remove from, and output from the memory cell.
• The input gate controls what information is added to the memory cell.
• The forget gate controls what information is removed from the memory cell.
• The output gate controls what information is output from the memory cell.
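A minimal, hedged sketch of an LSTM layer in Keras (assuming TensorFlow is installed; the sequence length and feature count are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.LSTM(32, input_shape=(10, 1)),   # 10 time steps, 1 feature; the three gates are handled internally
    layers.Dense(1),                        # e.g. predict the next value in the sequence
])
model.compile(optimizer="adam", loss="mse")
model.summary()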
RNN- Recurrent Neural Networks: -
In a traditional neural network the input and output layers are separate and
independent, which makes it inefficient at dealing with sequential data. A new kind of
neural network, the RNN, was therefore introduced to store previous outputs in an
internal memory; these results are then fed back into the neural network as input.
This allows it to be used in applications like pattern detection, speech recognition,
NLP and time-series prediction. An RNN has hidden layers that act as memory
locations, storing the output of a layer in a loop. There are four types of recurrent
neural network:
1. one to one
2. one to many
3. many to one
4. many to many

One to one: -
A one-to-one RNN allows a single input and a single output. It has fixed input and
output sizes and acts as a traditional neural network. Application: image classification.
input --> [ ] --> output

One to many: -
One to many is a type of RNN that gives multiple outputs for a single input. It takes a
fixed-size input and gives a sequence of outputs; its main applications are found in
music generation and image captioning.
input --> [ ] --> output
          [ ] --> output
          [ ] --> output

Many to one: -
Many-to-one is used when a single output is required from multiple inputs in a
sequence. It takes a sequence of inputs and produces a fixed output.
input 1 -->
input 2 --> [ ] --> output
input 3 -->

Many to many: -
It is used to generate a sequence of output data from a sequence of input data. It is
divided into two sub-categories:
1. equal unit size
2. unequal unit size

Equal unit size: -
The number of inputs and outputs is the same. It can be found in batch normalization
and named-entity recognition.
input1 --> [ ] --> output1
input2 --> [ ] --> output2
input3 --> [ ] --> output3

Unequal unit size: -
The input and output have different numbers of units; its applications can be found in
machine translation.
input1 --> [ ] --> output1
input2 --> [ ] -->
input3 --> [ ] --> output2
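A minimal, hedged sketch of a many-to-one RNN in Keras (assuming TensorFlow is installed; the sequence length and feature count are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.SimpleRNN(16, input_shape=(20, 1)),   # 20 time steps, 1 feature per step
    layers.Dense(1, activation="sigmoid"),       # single output for the whole sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()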
Data Types in Python: -
Python: -
Python is an interpreted, object-oriented, high-level programming language. It was
created by Guido van Rossum in 1991. Python supports modules and packages, which
encourages program modularity and code reusability. Python works on different
platforms such as Windows, macOS, Linux, etc. Python has a syntax that allows
developers to write programs with fewer lines compared to other programming
languages. Its main built-in data types are as follows (a short example using type() is
shown after the list): -
• INT: - Integer values, e.g. 2, 3, ...
• FLOAT: - Floating-point values, e.g. 0.78, 0.66, ...
• COMPLEX: - Numbers with real and imaginary parts, e.g. 2+3j
• STR: - String values, i.e. groups of characters, e.g. 'Python' (a single character such as 'A' is simply a string of length 1; Python has no separate CHAR type)
• BYTES: - Sequences of raw bytes
• BOOL: - Boolean values, i.e. True and False
• SET: - Data items in curly braces, e.g. {1, 5}
• TUPLE: - Data items in parentheses, e.g. (78, 98)
• DICT: - Key-value pairs in curly braces, e.g. {1: 6, 9: 8}
• FROZENSET: - Immutable sets supporting set operations such as union and intersection
• RANGE: - Ranges of values, e.g. for i in range(5)
• LIST: - Data items in square brackets, e.g. [3, 6, 8]
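A quick illustration of some of these built-in types using Python's type() function:

print(type(3))            # <class 'int'>
print(type(0.78))         # <class 'float'>
print(type(2 + 3j))       # <class 'complex'>
print(type("Python"))     # <class 'str'>
print(type(True))         # <class 'bool'>
print(type({1, 5}))       # <class 'set'>
print(type((78, 98)))     # <class 'tuple'>
print(type({1: 6, 9: 8})) # <class 'dict'>
print(type([3, 6, 8]))    # <class 'list'>
print(type(range(5)))     # <class 'range'>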
Arithmetic operations in Python: -
Operator | Operation      | Example | Result
+        | Addition       | x + y   | 9 + 3 = 12
*        | Multiplication | x * y   | 9 * 3 = 27
//       | Floor division | x // y  | 9 // 3 = 3
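The table above is only a fragment; the snippet below evaluates these operators (together with the other standard Python arithmetic operators) for x = 9 and y = 3:

x, y = 9, 3
print(x + y)    # 12   addition
print(x - y)    # 6    subtraction
print(x * y)    # 27   multiplication
print(x / y)    # 3.0  division (always returns a float)
print(x // y)   # 3    floor division
print(x % y)    # 0    modulus
print(x ** y)   # 729  exponentiation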
Comments in python: -
Comments are used for the description of the code.
There are two types of comment lines. They are: -
1. Single line comment 2. Multiple line comment
Single-line comments are declared with the symbol (#) in front of them.
Multi-line comments are declared with triple quotation marks (''').
Examples are as follows: -
# Python is a high-level language.
'''
Python is an interpreted language.
It executes the code line by line.
'''
if…else statement: -
The `if…else` statement allows you to execute one block of code when a condition is
true, and another block when the condition is false.
SYNTAX: -
if condition:
    statement
else:
    statement
if…elif…else statement: -
The `if…elif…else` statement allows you to check multiple conditions and
execute different blocks of code based on which condition is true.
SYNTAX: -
if condition1:
    statement
elif condition2:
    statement
else:
    statement
Nested if statement: -
A nested `if` statement is an `if` statement that is placed inside another `if`
statement.
SYNTAX: -
if condition1:
    if condition2:
        statement
Nested if…else statement: -
A nested `if…else` statement is similar to a nested `if` statement, but it
includes an `else` block for each `if` condition.
SYNTAX: -
if condition1:
    if condition2:
        statement
    else:
        statement
else:
    statement
• Jumping statements: -
Break statement: -
The break statement terminates the loop it is currently in, regardless of
whether the loop condition is true or false.
EXAMPLE: -
for i in range(10):
    if i == 5:
        break
    print(i)
Continue statement: -
The continue statement skips the rest of the code inside the loop for the
current iteration and proceeds to the next iteration of the loop.
EXAMPLE: -
for i in range(10):
    if i == 5:
        continue
    print(i)
Pass statement: -
The pass statement is a null operation; nothing happens when it is executed.
EXAMPLE: -
for i in range(5):
    if i == 3:
        pass
    else:
        print(i)
• For loop: -
It can be used to iterate over a range and iterators.
SYNTAX: -
for iterator_var in sequence:
    statement
• Nested loop: -
Python programming language allows to use one loop inside another loop
which is called nested loop.
SYNTAX 1: -
for iterator_var in sequence:
    for iterator_var in sequence:
        statements(s)
    statements(s)
SYNTAX 2: -
while expression:
    while expression:
        statement(s)
    statement(s)
Programs: -
Given number is positive or negative or zero: -
n = float(input())
if n > 0:
    print("positive")
elif n < 0:
    print("negative")
else:
    print("zero")

Given number is odd or even: -
n = int(input())
if n % 2 == 0:
    print("even")
else:
    print("Odd")

Given number is Armstrong or not: -
n = int(input())
temp = n
s = 0
while n != 0:
    r = n % 10
    n = n // 10
    s = s + r * r * r
if s == temp:
    print('Armstrong')
else:
    print('Not Armstrong')

Eliminate Duplicate Values: -
l = list(map(int, input().split()))
u = []
d = []
for i in l:
    if i not in u:
        u.append(i)
    else:
        d.append(i)
for i in u:
    print(i, end=" ")
To work on a supermarket sales prediction project, you
would typically use datasets that contain historical records of
supermarket transactions. These datasets can vary in
complexity and granularity but generally include information
such as date of transaction, store ID, item ID, sales amount,
promotions, and possibly additional features like weather
conditions or customer demographics.
Example Supermarket Sales Dataset Features
Date: The date of the transaction, which helps capture
seasonal and daily trends.
Store ID: Identifies the specific store where the transaction
took place. This can be useful for analyzing sales
performance across different locations.
Item ID: Identifies the product that was sold during the
transaction.
Sales: The amount of sales generated from the transaction,
which serves as the target variable for prediction.
Promotion: Indicates whether the item was sold at a
promotional price or during a special offer period.
Price: The price at which the item was sold during the
transaction.
Customer Demographics: If available, data on customer
characteristics such as age, gender, loyalty status, etc. This
can help in understanding buying patterns based on different
customer segments.
Store Location: Information about the store's geographical
location, which might include urban/rural classification,
region, etc.
External Factors (optional): Additional data like weather
conditions (temperature, precipitation), economic indicators
(GDP, inflation), or competitor pricing could be included if
deemed relevant for predicting sales.
Libraries Used for Analysis and Modeling
To work with such a dataset and build predictive models, you
would typically use Python along with various libraries:
Pandas: For data manipulation and preprocessing tasks
such as loading data, cleaning, filtering, and transforming
datasets.
NumPy: For numerical operations and array manipulations,
often used in conjunction with Pandas.
Matplotlib and Seaborn: For data visualization, creating plots
and graphs to explore distributions, trends, and relationships
within the dataset.
scikit-learn: A comprehensive library for machine learning
tasks, providing tools for preprocessing data, building
models (regression, classification, clustering), model
evaluation, and more.
XGBoost or LightGBM: Gradient boosting libraries that are
commonly used for building ensemble models, which can
often yield better predictive performance compared to
traditional models like linear regression.
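A hedged sketch of this workflow, assuming a hypothetical CSV file named "supermarket_sales.csv" with the columns listed above (the file name, the column names, and the choice of a random-forest model are illustrative assumptions, and the ID columns are assumed to be numeric already):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("supermarket_sales.csv", parse_dates=["Date"])
df["month"] = df["Date"].dt.month                     # simple seasonal feature
features = ["Store ID", "Item ID", "Price", "Promotion", "month"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Sales"], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))   # error on held-out data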
SOURCE CODE:
CONCLUSION