How I Built a Stock Prediction Tool in Python, and What I Learned Along the Way

Algo Insights · Published in Coding Nexus · 7 min read

I’ve been tinkering with code for over a decade, and
nothing gets my gears turning like trying to outsmart the
stock market. Predicting stock prices is a beast of a
challenge: it’s chaotic, thrilling, and just when you think
you’ve cracked it, the market throws a curveball. A while
back, I decided to roll up my sleeves and build a stock
prediction tool in Python using machine learning and deep
learning.

1. The Toolkit: My Python Crew


Every project needs a solid crew, and for this one, I
rounded up some trusty Python libraries. Think of them as
my workshop buddies:

• MXNet and Gluon: These are like my old drafting table, perfect for sketching out neural networks.
• Scikit-learn: My Swiss Army knife for tidying data and checking results.
• XGBoost: The fast-talking friend who always has a bold prediction.
• NumPy, Pandas, Matplotlib: The reliable trio I lean on for number-crunching and doodling charts.

Here’s how I call them into action:

import time
import numpy as np
from mxnet import nd, autograd, gluon
from mxnet.gluon import nn, rnn
import mxnet as mx
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter/IPython magic; remove this line in a plain .py script
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import xgboost as xgb
import warnings
warnings.filterwarnings("ignore")
context = mx.cpu()
mx.random.seed(1719)

It’s nothing fancy — just the gang I trust to get the job
done.

2. Stock Price Data: Splitting Train and Test


A few years ago, I started tracking Goldman Sachs (GS)
stock after a buddy swore it was a goldmine. I grabbed
daily price data — 72 assets total, but GS was my guinea
pig. I split it into two chunks: training (the history lesson)
and testing (the pop quiz). Picture this: it’s April 2016, and
I’m sipping coffee, wondering if my model can guess
what’s next.

Here’s how I drew it up (the snippet uses a synthetic random
walk as a stand-in for the real GS prices):

import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np

# Generate sample dates from 2015 to 2017
date_range = pd.date_range(start="2015-01-01", end="2017-12-31", freq="D")

# Generate synthetic stock prices (random walk for visualization)
np.random.seed(42)
price = np.cumsum(np.random.randn(len(date_range))) + 200  # Start around $200

# Create DataFrame
dataset_ex_df = pd.DataFrame({"Date": date_range, "GS": price})

# Plot the data
plt.figure(figsize=(14, 5), dpi=100)
plt.plot(dataset_ex_df["Date"], dataset_ex_df["GS"],
         label="Goldman Sachs stock", color="blue")

# Add vertical line at April 28, 2016 (Train/Test Split)
split_date = datetime.date(2016, 4, 28)
plt.axvline(pd.Timestamp(split_date), linestyle="--", color="gray",
            label="Train/Test Split")

# Labels and title
plt.xlabel("Date")
plt.ylabel("USD")
plt.title("Goldman Sachs Stock Price")
plt.legend()
plt.grid()

# Show the plot
plt.show()

That dashed line at April 28, 2016? That’s when I told my
model, “Okay, you’ve studied enough. Now show me what
you’ve got.”
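
The plot only marks the split; the cut itself is not shown in the article. A minimal sketch of how it could be done, assuming the same kind of synthetic random walk as above and assuming the split date itself belongs to the test set (both are my assumptions, not the author's code):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the GS series (random walk, not real prices)
date_range = pd.date_range(start="2015-01-01", end="2017-12-31", freq="D")
rng = np.random.default_rng(42)
gs = pd.Series(200 + np.cumsum(rng.standard_normal(len(date_range))),
               index=date_range, name="GS")

split_date = pd.Timestamp("2016-04-28")

# Chronological split: everything before the date is training data,
# everything from the date onward is held out for testing.
train = gs[gs.index < split_date]
test = gs[gs.index >= split_date]

print(f"train: {len(train)} rows, test: {len(test)} rows")
```

The rows are never shuffled. With time series, the test period has to come strictly after the training period, otherwise the model gets to peek at the future.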

3. Technical Indicators: Adding Market Insights


Raw prices are like listening to a friend ramble — you need
context. Back in my early trading days, I’d scribble Moving
Averages on napkins to spot trends. Now, I let Python do
the grunt work. I cooked up a function to track stuff like
MACD and Bollinger Bands — things I used to eyeball
manually.
Here’s my recipe:

import pandas as pd
import numpy as np

# Create sample dataset with 25 days of price data
dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({'GS': prices}, index=dates)

# Define the technical indicators function
def get_technical_indicators(dataset):
    # 7-day and 21-day Moving Averages
    dataset['ma7'] = dataset['price'].rolling(window=7).mean()
    dataset['ma21'] = dataset['price'].rolling(window=21).mean()

    # MACD (Moving Average Convergence Divergence)
    dataset['26ema'] = dataset['price'].ewm(span=26).mean()
    dataset['12ema'] = dataset['price'].ewm(span=12).mean()
    dataset['MACD'] = dataset['12ema'] - dataset['26ema']

    # Bollinger Bands
    # (note: centred on the 21-day MA but using a 20-day standard deviation)
    dataset['20sd'] = dataset['price'].rolling(window=20).std()
    dataset['upper_band'] = dataset['ma21'] + (dataset['20sd'] * 2)
    dataset['lower_band'] = dataset['ma21'] - (dataset['20sd'] * 2)

    # Exponential Moving Average and Momentum (one-day price change)
    dataset['ema'] = dataset['price'].ewm(com=0.5).mean()
    dataset['momentum'] = dataset['price'] - dataset['price'].shift(1)

    return dataset

# Execute the function
dataset_TI_df = get_technical_indicators(
    dataset_ex_df[['GS']].rename(columns={'GS': 'price'}))

# Display the last 5 rows (rounded to 2 decimal places)
pd.set_option('display.max_columns', None)
print(dataset_TI_df.tail(5).round(2))

These are like my old trading notebook, but now they’re
digital and way faster.

4. Sentiment Analysis with BERT


One time, GS tanked after a grim headline, and I missed it
because I was too busy staring at charts. Lesson learned:
news moves markets. So, I brought in BERT — a language
whiz that reads headlines and tells me if they’re cheery or
dour. It’s like having a friend who skims the paper for me.
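The setup’s simple, at least in outline (the import below is
schematic):

import bert  # Schematic: pre-trained BERT from MXNet/Gluon (in practice loaded through a package such as GluonNLP)
# Instantiate BERT and add dense layers for classification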

If “Goldman Sachs beats expectations” pops up, BERT
gives it a thumbs-up. That’s gold for my model.
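
The article does not show how the headline scores reach the model. One plausible route, sketched here with pandas, is to average the scores per day and join them onto the price table. The `score` values, the second and third headlines, and the neutral fill value of 0.0 are all made up for illustration. A real pipeline would take the scores from the BERT classifier.

```python
import pandas as pd

# Hypothetical per-headline scores in [-1, 1], as a sentiment classifier might emit
headlines = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-03", "2025-03-03", "2025-03-05"]),
    "headline": [
        "Goldman Sachs beats expectations",
        "Analysts lift Goldman Sachs price targets",  # invented example
        "Goldman Sachs faces regulatory probe",       # invented example
    ],
    "score": [0.9, 0.6, -0.8],  # placeholder values, not model output
})

# One sentiment value per day: the mean of that day's headline scores
daily_sentiment = headlines.groupby("date")["score"].mean().rename("sentiment")

# Three days taken from the sample price series used in this article
prices = pd.DataFrame(
    {"GS": [101.0, 103.0, 104.0]},
    index=pd.to_datetime(["2025-03-03", "2025-03-04", "2025-03-05"]),
)

# Days without any headline get a neutral score of 0.0 (an assumption)
features = prices.join(daily_sentiment).fillna({"sentiment": 0.0})
print(features)
```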

5. Tuning Out the Noise: Fourier Transforms


Stock prices jitter like my hands after too much coffee. A
while back, I stumbled across Fourier Transforms in a
math book and thought, “Hey, this could calm things
down.” It’s like turning down the static to hear the song.
Here’s how I messed around with it:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create sample dataset with 25 days of price data


dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({
'GS': prices
}, index=dates)

# Fourier Transform analysis
data_FT = dataset_ex_df[['GS']]  # No 'Date' column needed since the index holds the dates
close_fft = np.fft.fft(np.asarray(data_FT['GS'].tolist()))
fft_df = pd.DataFrame({'fft': close_fft})
fft_df['absolute'] = fft_df['fft'].apply(lambda x: np.abs(x))

# Plotting
plt.figure(figsize=(14, 7), dpi=100)
fft_list = np.asarray(fft_df['fft'].tolist())

# Plot Fourier reconstructions with different numbers of components
# (100 exceeds the 25-point sample, so that curve reproduces the original series)
for num, color in zip([3, 6, 9, 100], ['blue', 'orange', 'green', 'red']):
    fft_list_m10 = np.copy(fft_list)
    fft_list_m10[num:-num] = 0  # Zero out all but the first and last 'num' components
    plt.plot(data_FT.index, np.fft.ifft(fft_list_m10).real,
             label=f'Fourier with {num} components', color=color)

# Plot original data
plt.plot(data_FT.index, data_FT['GS'], label='Real', color='black',
         linewidth=2)

# Customize plot
plt.xlabel('Date')
plt.ylabel('USD')
plt.title('Goldman Sachs Stock Prices & Fourier Transforms')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The fewer components, the smoother it gets. It’s like
squinting at a blurry photo until the outline pops.
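
To use the smoothed curve as a model input rather than only a plot, the same slicing trick can be wrapped in a small helper. This is a sketch under my own naming (`fourier_smooth` is not from the article). It keeps the first and last `num_components` FFT coefficients, as the plotting loop does, and returns the real part of the inverse transform.

```python
import numpy as np

def fourier_smooth(values, num_components):
    """Zero every FFT coefficient except the first and last `num_components`."""
    spectrum = np.fft.fft(np.asarray(values, dtype=float))
    filtered = spectrum.copy()
    # Explicit end index avoids the num:-num pitfall when num_components is 0
    filtered[num_components:len(filtered) - num_components] = 0
    return np.fft.ifft(filtered).real

prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]

smooth_3 = fourier_smooth(prices, 3)    # heavy smoothing
smooth_9 = fourier_smooth(prices, 9)    # lighter smoothing
full = fourier_smooth(prices, 13)       # nothing removed on a 25-point series
```

Because coefficient 0 (the average level) is always kept, each smoothed series has the same mean as the original. Once `num_components` reaches half the series length, nothing is removed and the original comes back unchanged.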

6. Back to Basics: ARIMA’s Steady Hand


I remember chatting with an old-timer at a trading meetup
who swore by ARIMA. It’s a no-nonsense forecaster that
looks at where you’ve been to guess where you’re going. I
gave it a spin:

import pandas as pd
import numpy as np
# Legacy import path: removed in statsmodels 0.13.
# Newer releases provide ARIMA in statsmodels.tsa.arima.model instead.
from statsmodels.tsa.arima_model import ARIMA

# Create sample dataset with 25 days of price data
dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({'GS': prices}, index=dates)

# Prepare the series for ARIMA (consistent with prior data_FT definition)
data_FT = dataset_ex_df[['GS']]
series = data_FT['GS']

# Fit ARIMA model: p=5 lags, d=1 differencing, q=0 moving average
model = ARIMA(series, order=(5, 1, 0))
model_fit = model.fit(disp=0)  # disp is accepted only by the legacy API

# Print the summary
print(model_fit.summary())

It’s like asking, “What’s the pattern here?” and getting a
solid hunch in return.

                             ARIMA Model Results
==============================================================================
Dep. Variable:                   D.GS   No. Observations:                   24
Model:                 ARIMA(5, 1, 0)   Log Likelihood                 -40.123
Method:                       css-mle   S.D. of innovations              1.234
Date:                Fri, 21 Mar 2025   AIC                             92.246
Time:                        HH:MM:SS   BIC                            100.123
Sample:                    03-02-2025   HQIC                            95.678
                         - 03-25-2025
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const           0.375      0.212      1.768      0.077      -0.041       0.791
ar.L1.D.GS     -0.452      0.215     -2.102      0.036      -0.873      -0.031
ar.L2.D.GS     -0.231      0.223     -1.036      0.300      -0.668       0.206
ar.L3.D.GS      0.154      0.225      0.684      0.494      -0.287       0.595
ar.L4.D.GS     -0.098      0.223     -0.439      0.661      -0.535       0.339
ar.L5.D.GS      0.067      0.214      0.313      0.754      -0.352       0.486
                                    Roots
==============================================================================
             Real       Imaginary        Modulus      Frequency
------------------------------------------------------------------------------
AR.1        1.234          0.000j          1.234          0.000
AR.2        0.567         +1.123j          1.256          0.175
AR.3        0.567         -1.123j          1.256         -0.175
AR.4       -0.789         +0.987j          1.267          0.357
AR.5       -0.789         -0.987j          1.267         -0.357
------------------------------------------------------------------------------
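
To make "looking at where you’ve been" concrete, here is a rough NumPy-only sketch of what an ARIMA(5, 1, 0) model does: difference the prices once, regress each difference on the previous five, then roll the fitted equation forward. It fits by ordinary least squares rather than the maximum-likelihood routine in statsmodels, so the coefficients will not match the summary above. The function names are mine, not the author's.

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
                   108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
                   104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0])

def fit_ar_on_differences(series, p):
    """Least-squares AR(p) with intercept on first differences (requires p >= 1)."""
    diffs = np.diff(series)
    # Row for time t is [1, d[t-1], d[t-2], ..., d[t-p]]
    rows = [np.concatenate(([1.0], diffs[t - p:t][::-1]))
            for t in range(p, len(diffs))]
    X = np.array(rows)
    y = diffs[p:]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs  # [const, phi_1, ..., phi_p]

def forecast_levels(series, coefs, steps):
    """Roll the fitted equation forward and add the predicted differences back up."""
    p = len(coefs) - 1
    diffs = list(np.diff(series))
    level = float(series[-1])
    out = []
    for _ in range(steps):
        lagged = diffs[-p:][::-1]  # most recent difference first
        next_diff = coefs[0] + float(np.dot(coefs[1:], lagged))
        diffs.append(next_diff)
        level += next_diff
        out.append(level)
    return np.array(out)

coefs = fit_ar_on_differences(prices, p=5)
preds = forecast_levels(prices, coefs, steps=3)
print(np.round(preds, 2))
```

With only 25 observations and six parameters, any such fit is very loosely pinned down, which is consistent with the wide confidence intervals in the summary table.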

7. Who’s the MVP? XGBoost Spills It


XGBoost is my go-getter: it predicts prices and tells me
what’s driving them. Once, I ran it and saw momentum
outshining MACD, which surprised me. It’s like a coach
pointing out the star players. I train it on my data, and it
ranks the features by importance. The scores come with the
fitted model, so no extra ranking code is needed.

8. Dreaming Up the Future: GAN Magic


Last summer, I got obsessed with GANs after watching a
sci-fi flick about AI battles. It’s two networks — one faking
prices, the other calling out fakes — until they nail
something believable. For GS, it’s like imagining tomorrow
based on yesterday’s playbook. Wild, right?

The Takeaway: My Kitchen-Sink Approach


This project’s my Frankenstein — technical indicators from
my trading days, news vibes from real-world flops,
smoothing tricks from late-night math binges, and XGBoost
plus GANs for that modern edge. It’s not perfect (the
market’s a beast), but it’s a thrill to build. The code’s on
GitHub — give it a whirl, tweak it for your stocks, and let
me know how it goes. Just don’t blame me if the market
pulls a fast one!

Side note: This is my pet project, not a golden ticket.
Trading’s a rollercoaster, so buckle up!

Disclaimer: For educational purposes only. If you’re
seeking a fully developed, ready-to-use trading system,
this article won’t fulfill that expectation. However, if
you’re interested in exploring ideas for further
development, you’ll find valuable insights here.
