Introduction To Data Science in Finance
Data Science: algorithms, processes, and systems
Key components of Data Science
Data Collection
Data Preparation
Data Analysis
Data Visualization
Machine Learning
Data Science: A Multifaceted Field
[Diagram: Data Science, drawing on Computer Science, Statistics, and Domain Expertise]
Statistics
A statistician might develop a new method for analyzing time series data, but a computer scientist would implement it in efficient software.
A domain expert in healthcare might identify a need to predict patient outcomes, while a data scientist would use machine learning to build a predictive model.
Tools and Technologies
Key Roles in Finance Using Data Science
• Data Engineer
• Financial Data Scientist
• Risk Analyst
• Algorithmic Trader
• Financial Modeler
Internal Financial Data Sources
• Accounting records
• Financial statements
• Management reports
External Financial Data Sources
• Financial databases
• Government agencies
• Industry associations
• Research firms
Importance of Clean Data in Finance
Data corruption
Privacy concerns
Data integration
Techniques for Handling Missing Data
1. Removal
• Listwise deletion: Remove every observation that has any missing value. This works reasonably well when few observations are affected and the values are missing at random; otherwise it discards useful data and can bias results.
• Pairwise deletion: Exclude an observation only from the specific analyses or calculations that need its missing variable. This retains more data than listwise deletion but may introduce bias if the values are not missing at random, and different calculations end up using different subsets of observations.
2. Imputation
• Mean/median/mode imputation: Replace missing values with the mean, median, or mode of the respective variable. This method is simple, but it understates the variable's variance, and the mean in particular is a poor fill value when the data is skewed or contains outliers.
• Hot deck imputation: Replace missing values with values from a randomly selected donor observation with similar characteristics.
• Cold deck imputation: Replace missing values with values from a predetermined donor observation.
• Regression imputation: Use regression analysis to predict missing values based on other variables in the dataset.
• Multiple imputation: Create multiple complete datasets by imputing the missing values several times, each with random variation, then analyze each dataset and combine the results.
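The removal and simple imputation techniques above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the records, the column names (`revenue`, `costs`), and the helper functions are hypothetical and chosen for this example. In practice a data-frame library would usually handle these steps.

```python
from statistics import mean, median

# Hypothetical records; None marks a missing value.
rows = [
    {"revenue": 100.0, "costs": 60.0},
    {"revenue": 130.0, "costs": None},
    {"revenue": None, "costs": 70.0},
    {"revenue": 140.0, "costs": 80.0},
]
columns = ["revenue", "costs"]


def listwise_delete(rows, columns):
    """Listwise deletion: keep only rows where every column is present."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def available_values(rows, column):
    """Pairwise-style selection: use every row that has a value for this column."""
    return [r[column] for r in rows if r[column] is not None]


def impute(rows, column, statistic=mean):
    """Return copies of the rows with gaps in `column` filled by a summary statistic."""
    fill = statistic(available_values(rows, column))
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


complete = listwise_delete(rows, columns)

# Listwise deletion averages revenue over the 2 fully observed rows.
listwise_mean_revenue = mean(r["revenue"] for r in complete)

# Pairwise deletion averages revenue over the 3 rows that report revenue.
pairwise_mean_revenue = mean(available_values(rows, "revenue"))

# Median imputation fills the missing revenue with the median of observed values.
imputed = impute(rows, "revenue", statistic=median)
```

The two averages differ because pairwise deletion keeps the row that reports revenue but not costs, which listwise deletion drops. Passing `statistic=mean` instead of `median` gives mean imputation.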
Common Data Quality Issues: Outliers in Financial Data
• Currency conversion
• Date format standardization
• Decimal separator standardization
• Min-max scaling
• Z-score normalization
• Robust scaling
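The three scaling methods named above can be compared on a small series. This is a sketch using only the Python standard library; the `prices` values are hypothetical and include one deliberate outlier, and the robust scaler is assumed here to center on the median and divide by the interquartile range, which is the common definition. All three functions assume the input is not constant, since that would make the divisor zero.

```python
from statistics import mean, median, pstdev, quantiles


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def robust_scale(values):
    """Robust scaling: subtract the median, divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


# Hypothetical prices; the last value is an outlier.
prices = [10.0, 12.0, 14.0, 16.0, 100.0]

scaled_min_max = min_max_scale(prices)
scaled_z = z_score(prices)
scaled_robust = robust_scale(prices)
```

With an outlier present, min-max scaling squeezes the four ordinary prices into a narrow band near 0, while robust scaling keeps them evenly spread around 0 and leaves the outlier far from the rest.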
Descriptive Statistics in Finance: Central Tendency
• Positive skewness (right-skewed)
• Negative skewness (left-skewed)
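Central tendency and the sign of skewness can be checked together on a short return series. This is a sketch using only the Python standard library; the `returns` values are hypothetical, and `skewness` is a helper written for this example that computes the population skewness as the average cubed z-score.

```python
from statistics import mean, median, mode, pstdev


def skewness(values):
    """Population skewness: the average of the cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical periodic returns with one unusually large gain.
returns = [-0.02, 0.01, 0.01, 0.02, 0.15]

avg_return = mean(returns)       # pulled upward by the large gain
mid_return = median(returns)     # middle value, unaffected by the size of the gain
common_return = mode(returns)    # most frequent value

right_skew = skewness(returns)                # positive: long right tail
left_skew = skewness([-r for r in returns])   # negative: long left tail
```

In this example the mean sits above the median, which is typical of right-skewed data; flipping the sign of every return reverses the skew.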
Introduction to Financial Data Visualization
Line Charts
• A line chart connects data points with lines, creating a visual representation of how a variable changes over time. In finance, this is often used to track ...
• Secular trends
• Market cycles
• Fundamental analysis