
Stock NLP ML CS412.pptx-1

1) The document discusses using natural language processing techniques to predict stock market trends based on news headlines. 2) The dataset combines world news headlines and stock price shifts from 2008-2016, with labels indicating whether the stock price increased or decreased. 3) Various machine learning models like logistic regression, random forests, SVM, and Naive Bayes were tested on n-gram tokenized versions of the news text, with random forests on bigrams achieving the highest accuracy of 85.97% for stock trend prediction.


Stock Market Prediction Using Natural Language Processing

-Arnav Dahal
-Hassan Pasha
-Manoj Kumar Gunasekaran
-Niharika Balachandra
-Sathvik Raju
-Yi-Huan Chen
INTRODUCTION:

▶ Natural Language Processing: the use of computers to discover patterns in, and manipulate, human language.
▶ Stock market prediction is one of the most widely researched areas that use machine learning; it predicts the rise and fall of a stock based on past data.
DATASET

▶ The dataset combines world news headlines with stock price shifts and is available on Kaggle.
▶ The data frame has 25 columns of top news headlines for each day.
▶ The data ranges from 2008 to 2016; the data from 2000 to 2008 was scraped from Yahoo Finance.
▶ Labels are based on the Dow Jones Industrial Average stock index.
▶ Class 1→ the stock price increased.
▶ Class 0→ the stock price stayed the same or decreased.
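The labelling rule above can be sketched as follows. This is a minimal illustration, assuming each day's label compares that day's closing value of the index with the previous day's close (the slides do not say which price field is used). `make_labels` is a hypothetical helper and the prices are made up.

```python
def make_labels(closes):
    """Label each day after the first: 1 if the close rose versus the
    previous day, 0 if it stayed the same or fell (rule from the slide)."""
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        labels.append(1 if curr > prev else 0)
    return labels


# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.5, 99.0, 99.0, 98.2]
print(make_labels(closes))  # [1, 0, 0, 0]
```

Note that an unchanged close (99.0 to 99.0) falls into Class 0, matching the slide.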
Data Wrangling

▶ The data contains many stopwords (words like "a", "the", "you"), which don't help in predicting a stock, so they are removed.
▶ Convert all the words to lowercase.
▶ Remove punctuation marks and numbers.
▶ Combine all top 25 news headlines into a single list of words per day.
The data has been processed!
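The wrangling steps can be sketched in plain Python. This is a minimal sketch, not the team's code: the stopword set is a small hypothetical list (a real pipeline would use a fuller one, such as NLTK's or scikit-learn's English stopwords), and `clean_headline` and `combine_day` are illustrative names.

```python
import re

# Small, hypothetical stopword list used only for illustration.
STOPWORDS = {"a", "an", "the", "you", "of", "to", "in", "and", "is", "on", "for"}


def clean_headline(text):
    """Lowercase, replace punctuation and digits with spaces, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only ASCII letters and whitespace
    return [w for w in text.split() if w not in STOPWORDS]


def combine_day(headlines):
    """Merge one day's top headlines (25 in this dataset) into one word list."""
    words = []
    for h in headlines:
        words.extend(clean_headline(h))
    return words


day = ["Oil prices fall 5% in a week", "The Fed raises rates, again!"]
print(combine_day(day))
# ['oil', 'prices', 'fall', 'week', 'fed', 'raises', 'rates', 'again']
```

Lowercasing is done before stopword filtering so that "The" and "the" are treated the same.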
Words → Vectors

▶ CountVectorizer tokenizes the text and counts the frequency of each word (or n-gram).

▶ fit_transform is then applied to this object to learn the vocabulary and obtain a sparse matrix of word counts.
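To show what the vectorizer produces for the n-gram models, here is a plain-Python stand-in for scikit-learn's `CountVectorizer(ngram_range=(n, n)).fit_transform(...)`. It is a sketch under assumptions: it takes already-tokenized word lists (CountVectorizer tokenizes raw strings itself), it assumes the "bi-gram model" means bigrams only (the slides don't say whether unigrams were also kept), and `ngrams`/`count_vectorize` are hypothetical helpers. Like CountVectorizer, it assigns column indices in sorted vocabulary order and stores each row sparsely.

```python
from collections import Counter


def ngrams(words, n):
    """Join each run of n consecutive words into one space-separated token."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_vectorize(docs, n=1):
    """Return (vocab, rows): vocab maps each n-gram to a column index, and
    each row is a sparse {column_index: count} dict for one document."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    vocab = {tok: j for j, tok in enumerate(sorted(set().union(*counts)))}
    rows = [{vocab[tok]: c for tok, c in cnt.items()} for cnt in counts]
    return vocab, rows


docs = [["oil", "prices", "fall"], ["oil", "prices", "rise"]]
vocab, rows = count_vectorize(docs, n=2)
print(vocab)  # {'oil prices': 0, 'prices fall': 1, 'prices rise': 2}
print(rows)   # [{0: 1, 1: 1}, {0: 1, 2: 1}]
```

Storing only non-zero counts is the point of the sparse matrix: a day's headlines use a tiny fraction of the full vocabulary, and the bigram and trigram vocabularies are much larger than the unigram one.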
MODEL: Logistic Regression

1 gram model
Accuracy 82.275%

Bi-gram model
Accuracy 85.714%

Tri-gram model
Accuracy 85.185%
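For reference, logistic regression on such sparse count rows can be sketched as batch gradient descent on the mean log-loss. This is an illustrative stand-in, not the slides' setup: the deck does not state the library or solver (scikit-learn's `LogisticRegression`, for example, defaults to the lbfgs solver with L2 regularization), and `train_logreg`, `predict`, and the hyperparameters shown are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(rows, labels, n_features, lr=0.1, epochs=200, l2=0.0):
    """Batch gradient descent on mean log-loss (+ optional L2 penalty).
    rows are sparse {feature_index: count} dicts; labels are 0/1."""
    w = [0.0] * n_features
    b = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of (l2/2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(w[j] * v for j, v in x.items())
            err = sigmoid(z) - y  # d(log-loss)/dz for one example
            for j, v in x.items():
                grad_w[j] += err * v / m
            grad_b += err / m
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class 1 if the predicted probability exceeds 0.5, else class 0."""
    return 1 if b + sum(w[j] * v for j, v in x.items()) > 0 else 0


# Toy data: feature 0 appears on "up" days, feature 1 on "down" days.
rows = [{0: 1}, {0: 2}, {1: 1}, {1: 2}]
labels = [1, 1, 0, 0]
w, b = train_logreg(rows, labels, n_features=2)
print([predict(w, b, x) for x in rows])  # [1, 1, 0, 0]
```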
MODEL: Random Forests

1 gram model
Accuracy 84.465%

Bi-gram model
Accuracy 85.978%

Tri-gram model
Accuracy 85.185%
MODEL: LINEAR SVM

1 gram model
Accuracy 82.275%

Bi-gram model
Accuracy 84.656%

Tri-gram model
Accuracy 84.656%
MODEL: SVM(GAUSSIAN KERNEL)

1 gram model
Accuracy 85.185%

Bi-gram model
Accuracy 85.185%

Tri-gram model
Accuracy 82.539%
MODEL: NAÏVE BAYES

1 gram model
Accuracy 82.0105%

The bi-gram model could not be run on our computer.
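Naive Bayes on word counts is usually the multinomial variant. A minimal sketch with additive (Laplace) smoothing is below; it assumes the multinomial variant and smoothing parameter alpha = 1 (the slides name neither), and `train_nb`/`predict_nb` are hypothetical helpers working on the same sparse {feature_index: count} rows as above.

```python
import math
from collections import defaultdict


def train_nb(rows, labels, n_features, alpha=1.0):
    """Multinomial Naive Bayes: log class priors plus smoothed log
    probabilities of each feature given the class."""
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        docs = [x for x, y in zip(rows, labels) if y == c]
        log_prior[c] = math.log(len(docs) / len(rows))
        counts = defaultdict(float)
        for x in docs:
            for j, v in x.items():
                counts[j] += v
        total = sum(counts.values()) + alpha * n_features
        log_lik[c] = [math.log((counts[j] + alpha) / total) for j in range(n_features)]
    return classes, log_prior, log_lik


def predict_nb(model, x):
    """Pick the class with the highest log posterior (up to a constant)."""
    classes, log_prior, log_lik = model

    def score(c):
        return log_prior[c] + sum(v * log_lik[c][j] for j, v in x.items())

    return max(classes, key=score)


rows = [{0: 2, 1: 1}, {0: 3}, {2: 2}, {1: 1, 2: 3}]
labels = [1, 1, 0, 0]
model = train_nb(rows, labels, n_features=3)
print(predict_nb(model, {0: 1}), predict_nb(model, {2: 1}))  # 1 0
```

Smoothing keeps a feature that never occurs with a class from zeroing out that class's probability; working in log space avoids underflow when a day contains many words.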


Conclusion
▶ Random forests on the bi-gram model had the highest accuracy, as shown in the chart: 85.97%.

▶ Using Natural Language Processing techniques, we predicted the direction of the stock market trend (the Dow Jones label) correctly about 85% of the time.
Questions?
