
Common Python Packages for FinML

The document provides an overview of various Python libraries used for data analysis and manipulation, including Pandas, NumPy, SciPy, Matplotlib, scikit-learn, Statsmodels, PyMC3, TensorFlow Probability, Prophet, and TA-Lib. Each library is briefly described along with sample code demonstrating its functionality. These libraries facilitate tasks such as data loading, statistical modeling, machine learning, and data visualization.

Uploaded by

phyo wai
Copyright © All Rights Reserved

Pandas (Python Data Analysis Library)

Pandas is one of the most popular Python packages for data analysis and manipulation. The
name is derived from "panel data", an econometrics term for multidimensional structured
datasets. It provides fast, flexible, and expressive data structures designed to make working
with relational or labeled data easy and intuitive. Pandas supports data loading, cleaning,
preprocessing, merging, transformation, aggregation, and analysis. Its key data structures are
Series and DataFrame; the older Panel structure was deprecated and later removed. Below is
sample code that loads stock data from Yahoo Finance with the yfinance package and analyzes it
using Pandas:

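import pandas as pd
import yfinance as yf

msft = yf.Ticker("MSFT")
msft_data = msft.history(period="max") # full price history (needs network access)
df = pd.DataFrame(msft_data) # history() already returns a DataFrame, so this is a copy
print(df.head()) # view first 5 rows
df['Close'].plot() # plot closing prices (requires matplotlib)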
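
The cleaning, merging, and aggregation steps mentioned above can be sketched without a network connection. The example below is only an illustration: the tickers, prices, and sector labels are made up, and the column names are assumptions chosen for this sketch.

```python
import pandas as pd

# Hypothetical prices for illustration only; these are not real market data.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"] * 2),
    "ticker": ["AAA"] * 3 + ["BBB"] * 3,
    "close": [100.0, None, 110.0, 50.0, 55.0, 44.0],
})
sectors = pd.DataFrame({"ticker": ["AAA", "BBB"], "sector": ["Tech", "Energy"]})

# Cleaning: forward-fill the missing close within each ticker.
prices["close"] = prices.groupby("ticker")["close"].ffill()

# Transformation: simple daily return within each ticker.
prices["ret"] = prices.groupby("ticker")["close"].pct_change()

# Merging: attach the sector label to every price row.
merged = prices.merge(sectors, on="ticker", how="left")

# Aggregation: mean close and mean daily return per sector.
summary = merged.groupby("sector").agg(
    mean_close=("close", "mean"),
    mean_ret=("ret", "mean"),
)
print(summary)
```

Grouping by ticker before filling and differencing keeps one instrument's values from leaking into another's, which matters as soon as several tickers share a table.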

NumPy
NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It
provides large, multi-dimensional arrays and matrices, along with high-level mathematical
functions that operate on them. NumPy is fast and efficient because its core routines are
implemented in compiled C code. It is commonly used to store and manipulate numerical data.

import numpy as np
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

c = a + b # Array addition
print(c) # [3 5 7]
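
The multi-dimensional side of NumPy can be illustrated with a small two-dimensional array. This is a minimal sketch with made-up prices; the portfolio weights are likewise an assumption for illustration.

```python
import numpy as np

# Hypothetical closing prices: rows are days, columns are two assets.
prices = np.array([[100.0, 50.0],
                   [110.0, 55.0],
                   [121.0, 49.5]])

# Simple returns for every asset at once, with no explicit Python loop.
returns = prices[1:] / prices[:-1] - 1.0   # shape (2, 2)

# Matrix-vector product: daily portfolio return for fixed weights.
weights = np.array([0.6, 0.4])
portfolio = returns @ weights              # shape (2,)

print(returns)
print(portfolio)
```

Slicing the array against a shifted copy of itself computes all returns in one vectorized expression, which is usually both shorter and faster than looping over rows.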

SciPy (Scientific Python)


SciPy is an open-source Python library for mathematics, science, and engineering. It builds on
NumPy and provides modules for signal processing, optimization, statistics, linear algebra,
and more. It is commonly used for numerical data processing and statistical modeling.

import numpy as np
from scipy import stats

x = [1, 4, 7, 3, 2]
mean = np.mean(x)
std_dev = np.std(x) # population standard deviation (ddof=0)

# Normal probability density evaluated at each point of x
print(stats.norm.pdf(x, mean, std_dev))
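
SciPy's optimization module can be sketched in the same spirit. The example below finds the weight that minimizes the variance of a two-asset portfolio; the volatilities and correlation are hypothetical values chosen for illustration, and portfolio_variance is a helper defined only for this sketch.

```python
from scipy.optimize import minimize_scalar

# Hypothetical annualized volatilities and correlation for two assets.
sigma_a, sigma_b, rho = 0.20, 0.10, 0.0

def portfolio_variance(w):
    """Variance of a portfolio holding weight w in asset A and 1 - w in asset B."""
    return (w ** 2 * sigma_a ** 2
            + (1 - w) ** 2 * sigma_b ** 2
            + 2 * w * (1 - w) * rho * sigma_a * sigma_b)

# Bounded scalar minimization keeps the weight between 0 and 1 (no short selling).
result = minimize_scalar(portfolio_variance, bounds=(0.0, 1.0), method="bounded")
print(result.x, result.fun)
```

With zero correlation the closed-form answer is sigma_b^2 / (sigma_a^2 + sigma_b^2) = 0.2, so the numerical result should land close to that weight.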

Matplotlib
Matplotlib is a comprehensive Python 2D plotting library that can generate line charts,
scatter plots, histograms, and many other figures. It is commonly used for data visualization
and for presenting analysis results.

import matplotlib.pyplot as plt


x = [1, 2, 3, 4]
y = [10, 11, 12, 13]
plt.plot(x, y)
plt.title("Sample Chart")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

scikit-learn
scikit-learn is one of the most popular Python machine learning libraries. It provides tools for
data mining, data analysis, and model evaluation, as well as many classical ML algorithms such
as linear regression, random forests, and support vector machines (SVMs).

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3]] # one feature per sample
y = [1, 2, 3]
model = LinearRegression()
model.fit(X, y)
print(model.predict([[4]])) # approximately [4.]
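
The model evaluation tools mentioned above can be sketched as follows. The data are synthetic (a noisy line generated with a fixed seed), so the numbers only illustrate the workflow of holding out a test set and scoring on it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=200)

# Hold out 25% of the samples so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(model.coef_[0], model.intercept_, r2)
```

Scoring on the held-out split rather than on the training data is what makes the R² value a usable estimate of out-of-sample performance.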

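pandas-datareader
pandas-datareader downloads financial data from remote sources into Pandas DataFrames. Yahoo
Finance is one such source, although its reader can break when Yahoo changes its site. Google
Finance discontinued its public API, and the Google reader is no longer available in recent
pandas-datareader releases.

import pandas_datareader as pdr

msft = pdr.get_data_yahoo('MSFT')
# pdr.get_data_google('GOOG') is not available in recent pandas-datareader releases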

Statsmodels
Statsmodels is a Python module for statistical modeling and econometrics. It provides classes
and functions for regression, time series analysis, statistical tests, and more.

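import statsmodels.api as sm

# Reuses X and y from the scikit-learn example above
X_const = sm.add_constant(X) # OLS fits no intercept unless a constant column is added
model = sm.OLS(y, X_const).fit() # Ordinary Least Squares
print(model.summary())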

PyMC3
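PyMC3 is a probabilistic programming framework for Bayesian modeling and probabilistic
machine learning. It provides tools for Bayesian inference, including Markov chain Monte Carlo
(MCMC) sampling and variational inference. Newer releases of the project are published under
the name PyMC.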

import pymc3 as pm

basic_model = pm.Model()
with basic_model:
    pm.Normal('x', mu=0, sigma=1) # standard normal random variable x
    trace = pm.sample(1000) # draw 1000 samples per chain
    print(pm.summary(trace))

TensorFlow Probability
TensorFlow Probability is a Python library built on TensorFlow for probabilistic reasoning and
statistical analysis. It provides tools for Bayesian deep learning, probabilistic modeling, and
inference.

import tensorflow as tf
import tensorflow_probability as tfp

model = tfp.distributions.Normal(loc=0., scale=1.)
samples = model.sample(100)
print(tf.reduce_mean(samples)) # sample mean; tfp.stats has no mean() function

Prophet
Prophet is an open-source forecasting tool released by Facebook. It provides an intuitive API
for fitting time series models with trend, seasonality, and holiday effects and for producing
forecasts from them.

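from prophet import Prophet

# data must be a DataFrame with a 'ds' (datestamp) column and a 'y' (value) column
model = Prophet()
model.fit(data)
future = model.make_future_dataframe(periods=365) # extend 365 days past the history
forecast = model.predict(future) # includes yhat, yhat_lower, yhat_upper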

TA-Lib
TA-Lib provides technical analysis indicators commonly used in financial analysis, such as
moving averages, Bollinger Bands, and the Relative Strength Index (RSI). It integrates well
with Pandas.

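import talib
import pandas as pd

# TA-Lib expects float64 input; the first (timeperiod - 1) SMA values are NaN
close = pd.Series([10.0, 11.0, 9.0, 11.0, 8.0, 12.0], name='Close')
sma3 = talib.SMA(close, timeperiod=3) # a 20-period SMA would be all NaN on 6 points
rsi = talib.RSI(close, timeperiod=3) # the default timeperiod is 14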
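
For readers who do not have TA-Lib installed, the moving average and Bollinger Bands mentioned above can be approximated with Pandas alone. This is a sketch, not TA-Lib's implementation: the 3-period window, the band width of two standard deviations, and the use of the population standard deviation (ddof=0) are assumptions of this example, and TA-Lib's conventions may differ.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 9.0, 11.0, 8.0, 12.0], name="Close")
window = 3  # assumed window length for this sketch

# Simple moving average: mean of the last `window` closes.
sma = close.rolling(window).mean()

# Rolling population standard deviation over the same window.
std = close.rolling(window).std(ddof=0)

# Bollinger Bands: moving average plus/minus two standard deviations.
upper = sma + 2 * std
lower = sma - 2 * std

bands = pd.DataFrame({"close": close, "sma": sma, "upper": upper, "lower": lower})
print(bands)
```

The first window - 1 rows are NaN because a full window of prices is not yet available, which is the same warm-up behavior indicator libraries typically show.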
