Common Python Packages for FinML
Pandas
Pandas is one of the most popular Python packages for data analysis and manipulation. The
name is derived from "panel data", an econometrics term for multidimensional structured data
sets. It provides fast, flexible, and expressive data structures designed to make working with
relational or labeled data easy and intuitive. Pandas supports data loading, cleaning,
preprocessing, merging, transformation, aggregation, and analysis. Its two key data structures
are Series (one-dimensional) and DataFrame (two-dimensional); the older Panel structure was
removed in pandas 0.25. Below is sample code that downloads stock data from Yahoo Finance
with the yfinance package and analyzes it using Pandas:
import yfinance as yf
msft = yf.Ticker("MSFT")
df = msft.history(period="max") # history() already returns a pandas DataFrame indexed by date
print(df.head()) # view first 5 rows
print(df['Close'].describe()) # summary statistics of closing prices
df['Close'].plot() # plot closing prices (uses Matplotlib)
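The cleaning, merging, and aggregation steps listed above can be sketched on a small synthetic data set, so the example runs without a network connection. The tickers, prices, and the `sectors` lookup table are made up for illustration and are not taken from any real source.

```python
import pandas as pd

# Hypothetical daily closing prices for two made-up tickers (one value missing).
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"] * 2),
    "ticker": ["AAA"] * 3 + ["BBB"] * 3,
    "close": [100.0, None, 104.0, 50.0, 51.0, 49.0],
})

# Hypothetical lookup table used to demonstrate a merge.
sectors = pd.DataFrame({"ticker": ["AAA", "BBB"], "sector": ["Tech", "Energy"]})

# Cleaning: forward-fill the missing close within each ticker.
prices = prices.sort_values(["ticker", "date"])
prices["close"] = prices.groupby("ticker")["close"].ffill()

# Transformation: daily percentage return per ticker.
prices["ret"] = prices.groupby("ticker")["close"].pct_change()

# Merge: attach the sector label to every price row.
merged = prices.merge(sectors, on="ticker", how="left")

# Aggregation: mean close and mean return per sector.
summary = merged.groupby("sector").agg(
    mean_close=("close", "mean"),
    mean_ret=("ret", "mean"),
)
print(summary)
```

Forward-filling is only one way to treat missing prices; dropping the rows or interpolating may suit other data better.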
NumPy
NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It
provides support for large, multi-dimensional arrays and matrices, along with high-level
mathematical functions that operate on these arrays. NumPy is fast and efficient because its
core is implemented in C and its operations are vectorized over whole arrays. It is commonly
used for storing and manipulating numerical data.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
c = a + b # Array addition
print(c) # [3 5 7]
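As a finance-flavored illustration of the same vectorized style, the sketch below computes log returns and an annualized volatility from a price array. The prices are hypothetical, and the factor of 252 trading days per year is a common convention assumed here, not something prescribed by NumPy.

```python
import numpy as np

# Hypothetical closing prices, for illustration only.
prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0])

# Log returns: r_t = ln(P_t / P_{t-1}), computed for the whole array at once.
log_returns = np.diff(np.log(prices))

# Sample standard deviation of the daily returns (ddof=1).
daily_vol = log_returns.std(ddof=1)

# Annualize, assuming 252 trading days per year.
annual_vol = daily_vol * np.sqrt(252)

print(log_returns)
print(annual_vol)
```

No explicit loop is needed: `np.log` and `np.diff` each operate on the entire array in one call.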
Matplotlib
Matplotlib is a comprehensive Python 2D plotting library that can generate line charts,
scatter plots, bar charts, histograms, and many other figures. It is commonly used for data
visualization and for presenting analysis results.
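A minimal sketch of a line chart and a histogram follows. The price path is a synthetic random walk generated for illustration, and the output file name `price_and_returns.png` is a placeholder. The non-interactive `Agg` backend is selected so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic price path (random walk), made up for illustration.
rng = np.random.default_rng(seed=0)
returns = rng.normal(loc=0.0005, scale=0.01, size=250)
prices = 100 * np.exp(np.cumsum(returns))

fig, (ax_price, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: line chart of the price path.
ax_price.plot(prices)
ax_price.set_title("Synthetic price")
ax_price.set_xlabel("Day")
ax_price.set_ylabel("Price")

# Right panel: histogram of the daily returns.
ax_hist.hist(returns, bins=30)
ax_hist.set_title("Daily returns")
ax_hist.set_xlabel("Return")
ax_hist.set_ylabel("Frequency")

fig.tight_layout()
fig.savefig("price_and_returns.png")  # placeholder output file name
```

In an interactive session, `plt.show()` can replace the `savefig` call once the `matplotlib.use("Agg")` line is removed.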
scikit-learn
scikit-learn is one of the most popular Python machine learning libraries. It provides tools for
data mining, data analysis, and model evaluation, along with many classical ML algorithms such
as linear regression, random forests, and support vector machines (SVMs).
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3]]
y = [1, 2, 3]
model = LinearRegression()
model.fit(X, y)
print(model.predict([[4]])) # array with one value, approximately [4.]
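The model evaluation tools mentioned above can be sketched as follows. The features and target are synthetic, and the choice of a random forest with an unshuffled split is an assumption made for illustration; keeping the row order is a common precaution with time-ordered financial data, so that the model is tested on later observations than it was trained on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: the target is a noisy linear function of two made-up features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# shuffle=False keeps the original order, so the test set is the last 25% of rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on rows the model has not seen during fitting.
mse = mean_squared_error(y_test, model.predict(X_test))
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)
print(mse, r2)
```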
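For data access, note that Yahoo Finance does not offer an officially supported public API;
packages such as yfinance and pandas_datareader retrieve historical data from its public web
endpoints, which can change without notice. The Google Finance API has been discontinued, and
recent pandas_datareader releases no longer support it.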
Statsmodels
Statsmodels is a Python module for statistical modeling and econometrics. It provides classes
and functions for regression, time series analysis, statistical tests, and more.
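import statsmodels.api as sm
X = sm.add_constant([[1], [2], [3]]) # add an intercept column; OLS does not add one by default
y = [1, 2, 3]
model = sm.OLS(y, X).fit() # Ordinary Least Squares
print(model.summary()) # toy data: expect warnings about the very small sample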
PyMC3
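PyMC3 is a probabilistic programming framework for Bayesian modeling and probabilistic
machine learning. It provides tools for Bayesian inference, including Markov chain Monte Carlo
(MCMC) sampling and variational inference. Releases from version 4 onward are published under
the name PyMC.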
import pymc3 as pm
basic_model = pm.Model()
with basic_model:
    pm.Normal('x', mu=0, sigma=1) # standard normal random variable
    trace = pm.sample(1000) # draw 1000 samples per chain
    print(pm.summary(trace))
TensorFlow Probability
TensorFlow Probability is a Python library built on TensorFlow for probabilistic reasoning and
statistical analysis. It provides tools for Bayesian deep learning, modeling, and inference.
import tensorflow as tf
import tensorflow_probability as tfp
dist = tfp.distributions.Normal(loc=0., scale=1.) # standard normal distribution
samples = dist.sample(100)
print(tf.reduce_mean(samples)) # sample mean, close to 0
Prophet
Prophet is an open-source forecasting tool released by Facebook (now Meta). It fits an additive
model with trend, seasonality, and holiday components, and exposes a simple fit/predict API for
time series forecasting.
TA-Lib
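TA-Lib provides technical analysis indicators commonly used in financial analysis, such as
moving averages, Bollinger Bands, and RSI. Its Python wrapper requires the underlying TA-Lib C
library to be installed, and its functions accept NumPy arrays as well as Pandas Series.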
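import talib
import pandas as pd
# TA-Lib expects float64 input, and each indicator returns NaN until it has
# seen enough values, so the time periods must be shorter than the series.
close = pd.Series([10, 11, 9, 11, 8, 12], dtype=float, name='Close')
sma3 = talib.SMA(close, timeperiod=3) # 3-period simple moving average
rsi3 = talib.RSI(close, timeperiod=3) # 3-period relative strength index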
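Where installing TA-Lib's underlying C library is not an option, simple indicators can be approximated with Pandas alone. The sketch below computes a simple moving average and Bollinger Bands using rolling windows. The prices, the window length of 3, and the use of the population standard deviation (`ddof=0`) are assumptions for illustration, so results may differ slightly from other tools.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only.
close = pd.Series([10.0, 11.0, 9.0, 11.0, 8.0, 12.0, 13.0, 12.0], name="Close")

window = 3   # look-back length; kept short so the toy series yields values
num_std = 2  # band width in standard deviations (a common default)

# Simple moving average; the first window-1 entries are NaN.
sma = close.rolling(window).mean()

# Bollinger Bands: SMA plus/minus num_std rolling standard deviations.
std = close.rolling(window).std(ddof=0)
upper = sma + num_std * std
lower = sma - num_std * std

bands = pd.DataFrame({"close": close, "sma": sma, "upper": upper, "lower": lower})
print(bands)
```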