Bda Report

This project report presents an end-to-end pipeline for analyzing and forecasting U.S. stock prices using the PSPARK tool, incorporating technical indicators and a Long Short-Term Memory (LSTM) network. The system allows users to input stock tickers, automatically processes data, and generates visualizations and predictions, achieving a test-set R² of around 0.85–0.90. The modular design of the pipeline facilitates future enhancements, making it a practical tool for financial analysis and time-series modeling.

Uploaded by

ravikirantsrk

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belagavi- 590014

A Project Report
On
“REVIEW ANALYSIS BY USING PSPARK TOOL”
Submitted in partial fulfilment of the requirement for the award of
BACHELOR OF ENGINEERING
In
COMPUTER SCIENCE & ENGINEERING (DATA SCIENCE)
By

RAVIKIRAN T S (USN: 1AM22CD078)


S PRINCESTON (USN: 1AM22CD082)
VARSHA M B (USN: 1AM22CD110)
VISHAL H K (USN: 1AM22CD114)

Under the Guidance of


Prof. Vinod Kulkarni

Assistant Professor, Dept. of CSE-DS

2024 – 2025

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING (DATA SCIENCE)


AMC ENGINEERING COLLEGE
18th K.M, Bannerghatta Road, Bengaluru- 560083
TABLE OF CONTENTS

1. INTRODUCTION

2. DESCRIPTION OF TOOLS USED

3. OUTPUT SCREENSHOTS

4. CONCLUSION
INTRODUCTION

Financial markets generate vast quantities of historical price and volume
data, and making sense of these time series to inform trading or
investment decisions requires robust analytical methods. In this project,
we develop an end-to-end pipeline that ingests daily stock data (open,
high, low, close, volume), computes a suite of technical indicators
(moving averages, RSI, MACD), and leverages a PyTorch-based Long
Short-Term Memory (LSTM) network to forecast next-day closing
prices. Beyond forecasting, we produce a set of visualizations—price
trends, volume patterns, moving-average overlays, RSI distributions, and
MACD crossovers—to provide a comprehensive historical analysis.
Users can request analysis for any ticker (e.g., “AAPL,” “MSFT,”
“TSLA”), and the system will automatically download the corresponding
CSV, engineer features, display key graphs, train an LSTM on past
windows, and overlay predicted versus actual prices. This approach
addresses two main challenges: (1) enriching raw data with domain-
specific indicators that capture momentum and trend, and (2) modeling
sequential dependencies via LSTM to produce short-term price forecasts.
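
As a minimal sketch of the feature-engineering step described above, the indicators could be computed with pandas roughly as follows. The 14-day RSI period, the 12/26/9 MACD spans, the simple-rolling-mean form of RSI, and the helper name add_indicators are assumptions made for illustration; the report does not state them. The demo runs on a synthetic random-walk series, not on real price data.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with MA10, MA50, EMA10, RSI, MACD and Signal Line."""
    out = df.sort_values("Date").reset_index(drop=True).copy()
    close = out["Close"]

    # Simple and exponential moving averages of the closing price.
    out["MA10"] = close.rolling(window=10).mean()
    out["MA50"] = close.rolling(window=50).mean()
    out["EMA10"] = close.ewm(span=10, adjust=False).mean()

    # RSI over an assumed 14-day period, from rolling mean gains and losses.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window=14).mean()
    loss = (-delta.clip(upper=0)).rolling(window=14).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)

    # MACD with the conventional 12/26 spans and a 9-day signal line (assumed).
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    out["MACD"] = ema12 - ema26
    out["Signal Line"] = out["MACD"].ewm(span=9, adjust=False).mean()

    # Drop the warm-up rows where the rolling windows are not yet full.
    return out.dropna().reset_index(drop=True)


# Synthetic random-walk prices standing in for a real ticker file.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=120, freq="D"),
    "Close": 100 + np.cumsum(rng.normal(0, 1, size=120)),
})
features = add_indicators(demo)
print(features[["Close", "MA10", "MA50", "RSI", "MACD"]].tail())
```

The longest window (MA50) determines how many leading rows are discarded, so very short price histories leave little data for training.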

DESCRIPTION OF TOOLS USED

• Python 3.11 (Google Colab)
The entire pipeline is implemented in a Colab notebook (.ipynb),
enabling both interactive visualization and rapid iteration. Colab’s free
GPU accelerator is optionally used to speed up LSTM training.
• kagglehub
A lightweight Python client that automates direct downloading of Kaggle
datasets. We pull “Price and Volume Data for All US Stocks & ETFs”
without manual credential setup. Once downloaded, the dataset is
organized into Stocks/ and ETFs/ subfolders, each containing one
CSV-formatted text file per symbol (e.g., aapl.us.txt).
• pandas & NumPy
Primary libraries for data loading, manipulation, and numerical
operations. After reading a ticker’s CSV into a DataFrame, we convert
“Date” to a datetime object, sort chronologically, and compute
rolling/ewm indicators. NumPy is used internally for array
transformations and as the backend for scikit-learn scalers.
• scikit-learn
o MinMaxScaler: Normalizes all selected features (Open, High,
Low, Close, Volume, MA10, MA50, EMA10, RSI, MACD, Signal
Line) to the [0, 1] range, which helps the neural network
converge quickly and stably.
o train_test_split (where relevant): Splits sequences into 80 %
training and 20 % test sets. The split is always chronological
(shuffle=False) to avoid lookahead leakage in time-series data.
• matplotlib & seaborn
Used for static, publication-quality plots during exploratory data analysis.
Examples include:
o Line plot of historical closing prices.
o Bar/line plot of daily trading volume.
o Six-panel subplot (Closing Price, Volume, MA10 vs. MA50,
EMA10, RSI, MACD & Signal Line).
o Correlation heatmap among all features to inspect
multicollinearity.
• plotly.express & plotly.graph_objects
Enables interactive, web-friendly visualizations that can be zoomed,
panned, and hovered. We produce:
o Multi-line plots overlaying Close, MA10, and MA50.
o An area chart of Volume over time.
o A dual-subplot of RSI and MACD (with Signal Line).
o A candlestick chart for the most recent 60 days, useful for short-
term technical analysis.
• PyTorch (torch, torch.nn, torch.optim)
The core deep-learning framework used for building, training, and
evaluating the LSTM. Key components:
o torch.tensor: Converts NumPy arrays into GPU-ready tensors.
o nn.LSTM: Stacked LSTM with two layers and 64 hidden units,
capturing temporal dependencies across a 20-day window.
o nn.Linear: A fully connected layer to map the last LSTM hidden
state to a single output (next-day closing price).
o optim.Adam: Adaptive optimizer with learning rate 1 × 10⁻³.
o MSELoss: Regression loss function measuring squared error
between predicted and actual scaled prices.
• ipywidgets (optional)
For interactive text-box and button widgets that allow users to type a
ticker symbol and click “Analyze,” automatically running the entire
pipeline—feature computation, EDA visualization, LSTM training, and
forecasting—inline within the notebook.
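
The sequence preparation and model components listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the names make_windows and PriceLSTM, the full-batch training loop, the five epochs, and the random stand-in data are all hypothetical. The two layers, 64 hidden units, 20-day window, Adam with learning rate 1e-3, MSE loss, and the chronological 80/20 split follow the description in this section.

```python
import numpy as np
import torch
import torch.nn as nn


def make_windows(features: np.ndarray, target_col: int, window: int = 20):
    """Slice a (T, F) array into (N, window, F) inputs and (N,) next-day targets."""
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])
        y.append(features[start + window, target_col])
    return np.asarray(X, dtype=np.float32), np.asarray(y, dtype=np.float32)


class PriceLSTM(nn.Module):
    """Two-layer LSTM followed by a linear head that outputs one value."""

    def __init__(self, n_features: int, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)                        # (batch, window, hidden)
        return self.head(out[:, -1, :]).squeeze(-1)  # last time step -> (batch,)


torch.manual_seed(0)
rng = np.random.default_rng(0)
# Random stand-in for 11 features already scaled to [0, 1]; column 3 plays "Close".
scaled = rng.random((200, 11)).astype(np.float32)

X, y = make_windows(scaled, target_col=3, window=20)

# Chronological 80/20 split: the test windows come strictly after the training ones.
split = int(0.8 * len(X))
X_train, y_train = torch.from_numpy(X[:split]), torch.from_numpy(y[:split])
X_test, y_test = torch.from_numpy(X[split:]), torch.from_numpy(y[split:])

model = PriceLSTM(n_features=X.shape[2])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):  # epoch count is an arbitrary choice for the sketch
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    test_pred = model(X_test)  # predictions in scaled units
print(X.shape, test_pred.shape)
```

Slicing by index rather than sampling at random keeps every test window later in time than every training window, which is the property the chronological split is meant to guarantee.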

OUTPUT SCREENSHOTS

CONCLUSION
This project delivers a unified framework for both retrospective analysis and
short-term forecasting of U.S. stock prices. By combining domain
knowledge (moving averages, RSI, MACD) with a deep sequential
model (LSTM), we achieve the following:
1. Historical Insight:
o Clear visualizations reveal long-run trends (e.g., Apple’s multi-
decade appreciation), volume surges around major events, and
oscillatory behavior in RSI and MACD indicating overbought or
oversold regimes.
o A correlation heatmap highlights relationships among features,
guiding feature selection and pointing out potential
multicollinearity.
2. Predictive Performance:
o The LSTM, trained on 20-day sliding windows of normalized
features, learns temporal dependencies and yields a test-set R²
of around 0.85–0.90 for large-cap tickers (e.g., AAPL), with an
average prediction error of $1–$3.
o Although it cannot capture abrupt shocks perfectly, it reliably
tracks general uptrends and downtrends in daily closing prices.
3. User Flexibility:
o A single function, run_analysis(symbol), ties together data
ingestion, feature engineering, EDA plots, model training, and
forecast plotting. Users simply pass a ticker (e.g., “MSFT”), and
the notebook produces a comprehensive analysis within minutes.
o The optional interactive widget allows nonprogrammers to trigger
analysis by typing a symbol and clicking a button.
4. Modularity and Extensibility:
o The pipeline’s design (clearly separated blocks for loading data,
engineering features, preparing sequences, training the LSTM, and
visualizing results) makes it straightforward to replace or augment
any component.
o Future work could incorporate additional exogenous inputs (news
sentiment, macroeconomic indicators), experiment with alternative
architectures (GRU, Transformer), or extend to multi-day
forecasts.
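
For reference, metrics of the kind quoted under Predictive Performance could be computed along the following lines. This is a sketch under assumptions: it supposes that predictions are first mapped from the [0, 1] scale back to dollars using the minimum and maximum closing price seen by the scaler, and that the "average prediction error" is a mean absolute error. The helper names and the toy numbers are hypothetical and are not results from the project.

```python
import numpy as np


def inverse_minmax(scaled: np.ndarray, data_min: float, data_max: float) -> np.ndarray:
    """Undo min-max scaling for a single feature."""
    return scaled * (data_max - data_min) + data_min


def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_abs_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average absolute difference, in the same units as the inputs."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy scaled values (illustrative only), with an assumed Close range of $100-$200.
true_scaled = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
pred_scaled = np.array([0.05, 0.20, 0.55, 0.70, 1.0])

true_usd = inverse_minmax(true_scaled, 100.0, 200.0)
pred_usd = inverse_minmax(pred_scaled, 100.0, 200.0)

r2 = r2_score(true_usd, pred_usd)
mae = mean_abs_error(true_usd, pred_usd)
print(f"R^2 = {r2:.3f}, MAE = ${mae:.2f}")
```

Because R² is unchanged by a linear rescaling applied to both series, it can be computed on scaled or dollar values alike, whereas a dollar error is only meaningful after the inverse transform.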
In summary, this project demonstrates that a well-engineered combination
of technical indicators and an LSTM model can both illuminate historical
market behavior and produce actionable next-day price forecasts. By
packaging everything into a user-friendly function, it provides a practical
tool for analysts and students to explore time-series modeling in finance.
