DATA SCIENCE AND BUSINESS ANALYTICS
CURRICULUM GUIDE
In collaboration with Great Learning
ABOUT THE PROGRAM
The Post Graduate Program in Data Science and Business Analytics (PGP-DSBA) is tailored for
mid-senior and senior professionals. The program’s curriculum is designed for those interested in
extracting insights from data to craft compelling stories and influence business decisions. Through the
program, learners will familiarize themselves with the tools and techniques required to solve business
problems.
Learners will discover how to analyze and visualize data using Python to extract valuable insights and
offer practical business recommendations. They'll also learn how to conduct statistical analysis to test
business hypotheses and create machine learning models for predicting future occurrences based on
data relationships.
This program is built around the fundamental learning principle of ‘learning by doing’. It focuses on
building a practical skill set through hands-on case studies and projects, culminating in a portfolio of
Data Science and analytics work. PGP-DSBA is designed to help you transition into or advance within
one of the fastest-growing careers of the modern world.
PROGRAM HIGHLIGHTS
7 Months Online
LEARNING OUTCOMES
CURRICULUM
The curriculum, designed by the faculty of UT Austin, Great Learning, and leading industry
practitioners, is taught by best-in-class professors and practicing industry experts. The objective
of the program is to familiarize learners with the concepts of data science and business analytics
necessary to establish a career in, or transition into, the field of data science.
This course provides an understanding of the evolution of Data Science over time, its
applications across industries, the mathematics and statistics behind it, and an overview of the life
cycle of building data-driven solutions.
Pre-work: Python
This course provides a fundamental understanding of Python programming and builds a strong
coding foundation for developing Data Science applications.
Module 1: Python Foundations
Master data storytelling with Python. Learn to read, manipulate, and visualize data, driving insights
for impactful business solutions through exploratory data analysis. Transform raw information into
compelling narratives.
Topic 1- Python Basics
Concepts Used:
• Variables and Datatypes
• Data Structures
• Conditional and Looping Statements
• Functions
Learning Outcomes: Learn about the fundamentals of Python programming (variables, data
structures, conditional and looping statements, functions).
Topic 2- NumPy and Pandas
NumPy is a Python package for mathematical and scientific computing, centered on working
with arrays and matrices. Pandas is a fast, powerful, flexible, and easy-to-use open-source
library in Python to manipulate and analyze data. This module will cover these important
libraries and provide a deep understanding of how to use them to explore data.
Concepts Used:
• NumPy Arrays and Functions
• Accessing and Modifying NumPy Arrays
• Saving and Loading NumPy Arrays
• Pandas Series (Creating, Accessing, and Modifying Series)
• Pandas DataFrames (Creating, Accessing, Modifying, and Combining DataFrames)
• Pandas Functions
• Saving and Loading Datasets using Pandas
Learning Outcomes: Learn about two of the libraries most commonly used in Data Science
for reading and manipulating data: NumPy and Pandas.
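A minimal sketch of the kind of NumPy and Pandas operations covered here; the sales.csv file and its columns are hypothetical stand-ins for any tabular dataset.

```python
# A minimal NumPy/Pandas sketch; "sales.csv" and its columns are hypothetical.
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])        # create a 2x3 NumPy array
print(arr.mean(axis=0))                        # column-wise means

df = pd.read_csv("sales.csv")                  # load a dataset into a DataFrame
df["revenue"] = df["units"] * df["price"]      # derive a new column
print(df.groupby("region")["revenue"].sum())   # aggregate revenue by group
df.to_csv("sales_enriched.csv", index=False)   # save the enriched dataset
```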
Topic 3- Exploratory Data Analysis (Deep Dive)
Exploratory Data Analysis, or EDA, is the process of examining and visualizing data to uncover
patterns, extract meaningful insights, and facilitate storytelling. This module provides deep
insight into how to conduct EDA using Python and how to use the extracted insights to drive
business decisions.
Concepts Used:
• Data Overview
• Univariate Analysis
• Bivariate/Multivariate Analysis
• Missing Value Treatment
• Outlier Detection and Treatment
Learning Outcomes: Learn how to perform Exploratory Data Analysis (EDA) to extract
insights from data.
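As a taste of the EDA workflow described above, here is a minimal sketch; the customers.csv file and its income and spend columns are hypothetical.

```python
# A minimal EDA sketch; the dataset and column names are hypothetical.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")
print(df.info())       # data overview: column types and missing-value counts
print(df.describe())   # univariate summary statistics

# Missing value treatment: impute with the median
df["income"] = df["income"].fillna(df["income"].median())

# Outlier detection with the IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers")

sns.histplot(df["income"])                         # univariate plot
sns.scatterplot(data=df, x="income", y="spend")    # bivariate plot
plt.show()
```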
Module 2: Business Statistics
Utilize Python for statistical analysis. Validate business estimates with confidence intervals to
ensure reliability, and test assumptions with hypothesis testing, guiding informed resource
allocation and strategic decision-making based on data distribution analysis.
Topic 1- Probability and Probability Distributions
Concepts Used:
• Experiments, Events, and Definition of Probability
• Introduction to Inferential Statistics
• Introduction to Probability Distributions (Random Variable, Discrete and Continuous
Random Variables, Probability Distributions)
• Binomial Distribution
• Normal Distribution
• Z-Score
Learning Outcomes: Learn about the fundamentals of probability distributions and the
foundations of Inferential Statistics.
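A short illustration of these distributions using SciPy; the numbers are arbitrary examples.

```python
# A short sketch of the distributions covered, using SciPy; values are illustrative.
from scipy import stats

# Binomial: P(exactly 3 successes in 10 trials with success probability 0.2)
print(stats.binom.pmf(k=3, n=10, p=0.2))

# Normal: probability of observing a value below 1.5 for mean 0, std dev 1
print(stats.norm.cdf(1.5, loc=0, scale=1))

# Z-score: how many standard deviations an observation lies from the mean
x, mu, sigma = 75, 60, 10
print((x - mu) / sigma)  # 1.5
```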
Topic 2- Estimation and Hypothesis Testing
Estimation involves determining likely values for population parameters from sample data,
while hypothesis testing provides a framework for drawing conclusions from sample data to
the broader population. This module covers the important concepts of central limit theorem
and estimation theory that are vital for statistical analysis, and the framework for conducting
hypothesis tests.
Concepts Used:
• Sampling
• Central Limit Theorem
• Estimation
• Introduction to Hypothesis Testing (Null and Alternative Hypothesis, Type-I and Type-II
Errors, Alpha, Critical Region, P-Value)
• Hypothesis Formulation and Performing a Hypothesis Test
• One-Tailed and Two-Tailed Tests
• Confidence Intervals and Hypothesis Testing
Learning Outcomes: Learn about the Central Limit Theorem, estimation, and the key
concepts of Hypothesis Testing.
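A minimal sketch of a confidence interval and a one-sample hypothesis test with SciPy; the sample values and the hypothesized mean of 12 are illustrative.

```python
# A minimal estimation and hypothesis-testing sketch; data are illustrative.
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9])

# 95% confidence interval for the population mean (t-based)
ci = stats.t.interval(0.95, len(sample) - 1,
                      loc=sample.mean(), scale=stats.sem(sample))
print(ci)

# H0: population mean = 12 vs. Ha: population mean != 12 (two-tailed test)
t_stat, p_value = stats.ttest_1samp(sample, popmean=12)
print(t_stat, p_value)   # reject H0 if p_value < alpha (e.g., 0.05)
```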
Topic 3- Common Statistical Tests
Concepts Used:
• Common Statistical Tests
• Test for One Mean
• Test for Equality of Means (Known Standard Deviation)
• Test for Equality of Means (Equal and Unknown Std Dev)
• Test for Equality of Means (Unequal and Unknown Std Dev)
• Test of Independence
• One-Way ANOVA
Learning Outcomes: Learn about various commonly used statistical tests and their
implementation in Python with business examples.
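A sketch of three of these tests with SciPy; the group values and contingency table are illustrative.

```python
# A sketch of common statistical tests with SciPy; the data are illustrative.
import numpy as np
from scipy import stats

group_a = np.array([22.1, 23.4, 21.8, 24.0, 22.7])
group_b = np.array([25.3, 24.8, 26.1, 25.0, 24.6])
group_c = np.array([23.0, 22.5, 23.8, 22.9, 23.3])

# Two-sample t-test for equality of means (unequal, unknown std dev)
print(stats.ttest_ind(group_a, group_b, equal_var=False))

# One-way ANOVA across three groups
print(stats.f_oneway(group_a, group_b, group_c))

# Chi-square test of independence on a 2x2 contingency table
table = np.array([[30, 10], [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
```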
Module 3: Supervised Learning - Foundations
Delve into linear models for uncovering relationships between variables and continuous outcomes.
Validate models for statistical soundness, drawing inferences to extract crucial business insights
into decision-making factors.
Topic 1- Introduction to Machine Learning and Linear Regression
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on developing
algorithms capable of learning patterns in data and making predictions without being explicitly
programmed to do so. Linear Regression is one of the most popular supervised ML algorithms
and identifies the degree of linear relationship in data. This module introduces participants to
ML and explores how linear regression can be used for predictive analysis.
Concepts Used:
• Introduction to Learning from Data
• Simple and Multiple Linear Regression
• Evaluating a Regression Model
• Pros and Cons of Linear Regression
Learning Outcomes: Understand the concept of learning from data, how the linear regression
algorithm works, and how to build and assess the performance of a regression model in Python.
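A minimal sketch of building and evaluating a regression model with scikit-learn; housing.csv and its price column are hypothetical placeholders for any numeric regression dataset.

```python
# A minimal linear regression sketch; the dataset is a hypothetical placeholder.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("housing.csv")                       # hypothetical numeric dataset
X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print(r2_score(y_test, pred))                    # proportion of variance explained
print(mean_squared_error(y_test, pred) ** 0.5)   # RMSE, in target units
```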
Topic 2- Linear Regression Assumptions and Inferences
The linear regression algorithm has a set of assumptions that must be satisfied for the model
to be statistically valid and for inferences to be drawn from it. This module walks participants
through these assumptions, how to check them, what to do when they are violated, and the
statistical inferences that can be drawn from the model's output.
Concepts Used:
• Statistician vs ML Practitioner
• Linear Regression Assumptions
• Statistical Inferences from a Linear Regression Model
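A sketch of how some of these assumption checks might look with statsmodels; the dataset and column names are hypothetical.

```python
# A sketch of assumption checks for linear regression; data are hypothetical.
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housing.csv")
X = sm.add_constant(df[["area", "age"]])   # predictors plus an intercept term
model = sm.OLS(df["price"], X).fit()
print(model.summary())                     # coefficients, p-values, R-squared

# Multicollinearity check: variance inflation factor per predictor
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))

# Normality of residuals: Shapiro-Wilk test (small p suggests non-normality)
print(stats.shapiro(model.resid))
```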
Module 4: Supervised Learning - Classification
Unlock the power of classification models to discern relationships between variables and
categorical outcomes. Extract business insights by identifying pivotal factors shaping
decision-making processes.
Topic 1- Logistic Regression
Logistic regression is a statistical modeling technique primarily used for modeling the
probability of binary outcomes. It finds applications in fields such as medicine, finance,
and manufacturing. This module covers the theory behind the logistic regression model, how to
assess its performance, and how to draw statistical inferences from it.
Concepts Used:
• Introduction to Logistic Regression
• Interpretation from a Logistic Regression Model
• Changing the Threshold of a Logistic Regression Model
• Evaluation of a Classification Model
• Pros and Cons
Learning Outcomes: Understand the foundations of the Logistic Regression Model, how to make
interpretations from it, how to evaluate the performance of classification models, and how
changing the threshold of a Logistic Regression Model can help in improving predictions.
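A minimal sketch of logistic regression with a custom decision threshold in scikit-learn; churn.csv (assumed to hold numeric features and a binary 0/1 target) and the 0.3 threshold are illustrative assumptions.

```python
# A logistic regression sketch with threshold tuning; dataset is hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")                       # numeric features, 0/1 target
X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Default threshold of 0.5
print(classification_report(y_test, clf.predict(X_test)))

# Lower the threshold to 0.3 to flag more positives (higher recall)
proba = clf.predict_proba(X_test)[:, 1]
print(classification_report(y_test, (proba >= 0.3).astype(int)))
```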
Topic 2- Decision Trees
Decision Trees are supervised ML algorithms that use a hierarchical structure for
decision-making and can be applied to both classification and regression problems.
This module dives into how a decision tree can model complex, non-linear
data and how to improve its performance using pruning techniques.
Concepts Used:
• Introduction to Decision Tree
• How a Decision Tree is Built
• Methods of Pruning a Decision Tree
• Different Impurity Measures
• Regression Trees
• Pros and Cons
Learning Outcomes: Understand the Decision Tree algorithm, how it’s built, the different
pruning techniques that can be used to improve performance, and learn about the different
impurity measures used to make decisions.
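A sketch contrasting pre-pruning and post-pruning in scikit-learn on synthetic data; the hyperparameter values shown are illustrative choices, not recommendations.

```python
# A decision tree pruning sketch on synthetic data; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)   # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-pruning: cap depth and minimum samples per leaf ("gini" = impurity measure)
tree = DecisionTreeClassifier(criterion="gini", max_depth=4,
                              min_samples_leaf=20, random_state=42)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))

# Post-pruning: cost-complexity pruning via ccp_alpha
pruned = DecisionTreeClassifier(ccp_alpha=0.01,
                                random_state=42).fit(X_train, y_train)
print(pruned.score(X_test, y_test))
```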
Module 5: Ensemble Techniques and Model Tuning
Combine the decisions from multiple models using ensemble techniques to arrive at
more robust models that can make better predictions.
Topic 1- Bagging and Random Forest
Random forest is a popular ensemble learning technique comprising several decision trees,
each trained on a subset of the data to learn patterns. The outputs of the individual trees are
then aggregated to produce the final prediction. This module will explore how to train a random
forest model to solve complex business problems.
Concepts Used:
• Introduction to Ensemble Techniques
• Introduction to Bagging
• Sampling with Replacement
• Introduction to Random Forest
Learning Outcomes: Understand how ensemble techniques work, learn about sampling with
replacement and the concept of bagging, and build Random Forest models to make better
predictions.
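A minimal random forest sketch on synthetic data; the hyperparameters are illustrative.

```python
# A random forest sketch on synthetic data; hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 200 trees is trained on a bootstrap sample (sampling with
# replacement) and a random subset of features; votes are then aggregated.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=42).fit(X_train, y_train)
print(rf.score(X_test, y_test))
```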
Topic 2- Boosting
Boosting models are robust ensemble models comprising several sub-models, each of which
is developed sequentially to improve upon the errors made by the previous one.
This module covers essential boosting algorithms like AdaBoost and XGBoost that are
widely used in the industry for accurate and robust predictions.
Concepts Used:
• Introduction to Boosting
• Boosting Algorithms like AdaBoost, Gradient Boosting, and XGBoost
• Stacking
Learning Outcomes: Understand the concept of boosting, the difference between bagging
and boosting, learn various boosting algorithms, and understand the concept of stacking.
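A short sketch of two boosting algorithms available in scikit-learn; XGBoost itself lives in the separate xgboost package. The synthetic data and settings are illustrative.

```python
# A boosting comparison sketch on synthetic data; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: sequentially reweights misclassified samples
ada = AdaBoostClassifier(n_estimators=100,
                         random_state=42).fit(X_train, y_train)

# Gradient boosting: each tree fits the errors (gradients) of the previous ones
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=42).fit(X_train, y_train)

print(ada.score(X_test, y_test), gb.score(X_test, y_test))
```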
Topic 3- Feature Engineering and Cross Validation
Feature engineering involves creating new input features or modifying existing ones to improve
a machine learning model's performance, while cross-validation provides a more reliable
assessment of a model's performance. This module covers these two concepts, along with
regularization, to tune ML models and correctly assess their performance.
Concepts Used:
• Feature Engineering
• Cross-Validation
• Oversampling and Undersampling
• Regularization
Learning Outcomes: Learn how to handle imbalanced data, how to use the cross-validation
technique to get a better picture of model performance, and understand the concept of
regularization.
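A minimal sketch of cross-validation with a regularized model in scikit-learn; the 5-fold choice and the C value are illustrative. (Oversampling techniques such as SMOTE live in the separate imbalanced-learn package.)

```python
# A cross-validation sketch with regularization; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=42)

# L2 regularization strength is controlled by C (smaller C = stronger penalty)
model = LogisticRegression(C=0.1, max_iter=1000)

# 5-fold cross-validation yields a distribution of scores, not a single number
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```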
Topic 4- ML Pipelines and Hyperparameter Tuning
Concepts Used:
• Machine Learning Pipeline
• Model Tuning and Performance
• Hyperparameter Tuning
• Grid Search
• Random Search
Learning Outcomes: Learn how to optimize model performance using hyperparameter tuning
and how to automate standard workflows in a machine learning process using pipelines.
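A sketch of a pipeline tuned with grid search in scikit-learn; the parameter grid is an illustrative assumption.

```python
# A pipeline + grid search sketch; the parameter grid is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=42)

# The pipeline chains preprocessing and modeling into one estimator
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", RandomForestClassifier(random_state=42))])

# Grid search tries every combination with 5-fold cross-validation
grid = GridSearchCV(pipe,
                    param_grid={"model__n_estimators": [100, 200],
                                "model__max_depth": [4, 8, None]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```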
Module 6: Unsupervised Learning
Unlock the power of clustering algorithms to group data based on similarity, unveiling hidden
patterns and intrinsic structures. Explore dimensionality reduction techniques to grasp the
significance of streamlined data analysis.
Topic 1- K-Means Clustering
K-means clustering is a popular unsupervised ML algorithm used for identifying patterns
in unlabeled data and grouping it. This module dives into how the algorithm works and the
important points to keep in mind when implementing it in practical scenarios.
Concepts Used:
• Introduction to Clustering
• Types of Clustering
• K-Means Clustering
• Importance of Scaling
• Silhouette Score
• Visual Analysis of Clustering
Learning Outcomes: Learn about the different types of clustering algorithms, how K-means
clustering works, how to determine the optimal number of clusters by comparing different
metrics, and the importance of scaling data.
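A sketch of choosing the number of clusters by silhouette score; the synthetic data and the range of k are illustrative.

```python
# A k-means sketch: compare silhouette scores across candidate values of k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)   # scaling matters for distance-based methods

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, silhouette_score(X, labels))  # closer to 1 = better-separated clusters
```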
Topic 2- Hierarchical Clustering and Dimensionality Reduction
Hierarchical clustering organizes data into a tree-like structure of nested clusters, while
dimensionality reduction techniques transform data into a lower-dimensional space while
retaining the most important information. This module covers the business applications of
hierarchical clustering and how to reduce the dimensionality of data using PCA to aid
visualization and feature selection in multivariate datasets.
Concepts Used:
• Hierarchical Clustering
• Cophenetic Correlation
• Introduction to Dimensionality Reduction
• Principal Component Analysis
Learning Outcomes: Learn how to apply the hierarchical clustering technique to group similar
data points together and discover underlying patterns, understand the need for reducing the
dimensions of the data, and understand how PCA works and how to use it to transform data
into fewer dimensions.
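A sketch of hierarchical clustering diagnostics and PCA; the synthetic data and the choice of two components (for visualization) are illustrative.

```python
# A hierarchical clustering + PCA sketch on synthetic data; settings illustrative.
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=200, n_features=6, centers=3, random_state=42)

# Hierarchical clustering: build the linkage tree, check cophenetic correlation
Z = linkage(X, method="ward")
coph_corr, _ = cophenet(Z, pdist(X))
print(coph_corr)   # closer to 1 = the tree preserves pairwise distances well

# PCA: project 6 features onto 2 principal components
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance retained by each component
```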
Module 7: Introduction to Generative AI
In this course, you will get an overview of Generative AI and understand the difference between
generative and discriminative AI. You will design, implement, and evaluate prompts tailored to
specific tasks to achieve desired outcomes, and integrate open-source models with prompt
engineering to solve business problems using generative AI.
Topic 1- Overview of Generative AI
Generative AI is a subset of AI that leverages ML models to learn the underlying patterns and
structures in large volumes of training data and uses that understanding to create new data such
as images, text, videos, and more. This module provides a comprehensive overview of what
generative AI models are, how they evolved, and how to apply them effectively to various
business challenges.
Concepts Used:
• Supervised vs Unsupervised Machine Learning
• Generative AI vs Discriminative AI
• Brief timeline of Generative AI
• Overview of Generative Models
• Generative AI Business Applications
Topic 2- Prompt Engineering
Prompt engineering refers to the process of designing and refining prompts, the instructions
provided to generative AI models, to guide the models in generating specific, accurate, and
relevant outputs. This module provides an overview of prompts and covers common practices
for effectively devising prompts to solve problems using generative AI models.
Concepts Used:
• Introduction to Prompts
• The Need for Prompt Engineering
• Different Types of Prompts (Conditional, Few-shot, Chain-of-thought, Returning
Structured Output)
• Limitations of Prompt Engineering
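A minimal sketch of a few-shot prompt for a review-tagging task like the one in the sample case studies; call_model is a hypothetical stand-in for whichever generative AI API or open-source model is used.

```python
# A few-shot prompt sketch; `call_model` is a hypothetical model interface.
FEW_SHOT_PROMPT = """Classify the sentiment of each restaurant review as
Positive, Negative, or Neutral. Return only the label.

Review: "The pasta was superb and service was quick."
Sentiment: Positive

Review: "Waited 40 minutes and the food arrived cold."
Sentiment: Negative

Review: "{review}"
Sentiment:"""


def build_prompt(review: str) -> str:
    """Insert a new review into the few-shot template."""
    return FEW_SHOT_PROMPT.format(review=review)


# response = call_model(build_prompt("Decent portions, nothing special."))
```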
Module 8: Introduction to SQL
This course will help you gain an understanding of the core concepts of databases and SQL. You
will gain practical experience writing simple SQL queries to filter, manipulate, and retrieve data
from relational databases, and use complex queries with joins, window functions, and subqueries
to solve real-world data problems and extract actionable business insights.
Topic 1- Introduction to Databases and SQL
SQL is a widely used querying language for efficiently managing and manipulating relational
databases. This module provides an essential foundation for understanding and working with
relational databases. Participants will explore the principles of database management and
Structured Query Language (SQL), and learn how to fetch, filter, and aggregate data using SQL
queries, enabling them to extract valuable insights from large datasets efficiently.
Concepts Used:
• Introduction to Databases and SQL
• Fetching Data
• Filtering Data
• Aggregating Data
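A minimal sketch of fetching, filtering, and aggregating with SQL, run here through Python's built-in sqlite3 module; the orders table is hypothetical.

```python
# A SQL basics sketch via sqlite3; the `orders` table is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 200.0)])

# Filter rows with WHERE, then aggregate per group with GROUP BY
query = """
SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
WHERE amount > 50
GROUP BY region
ORDER BY total DESC;
"""
print(conn.execute(query).fetchall())
```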
Topic 2- SQL Functions, Joins, and Window Functions
SQL offers a wide range of numeric, string, and date functions, and proficiency with them
enables advanced calculations, string manipulations, and date operations. SQL joins combine
data from multiple tables effectively, while window functions enable complex analytical tasks
such as ranking, partitioning, and aggregating data within specified windows. This module
provides a comprehensive exploration of the functions and joins available in SQL for data
manipulation and analysis, enabling participants to summarize and analyze large datasets
effectively.
Concepts Used:
• In-built functions (Numeric, Datetime, Strings)
• Joins
• Window functions
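A sketch combining a join with window functions, again via sqlite3 (window functions require SQLite 3.25+, which ships with recent Python builds); the tables are hypothetical.

```python
# A join + window function sketch via sqlite3; tables are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ava'), (2, 'Ben');
INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 200.0);
""")

# RANK() orders rows globally; SUM() OVER aggregates within each customer
query = """
SELECT c.name,
       o.amount,
       RANK() OVER (ORDER BY o.amount DESC) AS amount_rank,
       SUM(o.amount) OVER (PARTITION BY c.id) AS customer_total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id;
"""
for row in conn.execute(query):
    print(row)
```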
Topic 3- Subqueries
Subqueries allow one to nest queries within other queries, enabling more complex and flexible
data manipulation. This module equips participants with advanced techniques for filtering
data based on conditional expressions and calculating derived values to extract and manipulate
data dynamically.
Concepts Used:
• Subqueries
• Order of query execution
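A sketch of a subquery that filters rows against a derived value; the orders table is again hypothetical.

```python
# A subquery sketch via sqlite3; the `orders` table is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 120.0), (2, 80.0), (3, 200.0)])

# The inner query computes the average; the outer query keeps above-average rows
query = """
SELECT id, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);
"""
print(conn.execute(query).fetchall())
```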
ENHANCE KNOWLEDGE WITH
SELF-PACED MODULES
The self-paced modules cover skills complementary to those learned in the guided modules.
Since not all learners need or want to learn them, they are offered in a self-paced format.
These modules feature the same high-quality recorded video lectures by UT Austin faculty,
global academicians, and industry experts, but do not include mentorship sessions. You can learn
them at your own pace and schedule, based on your interests and the current and future demands of
your role.
Pre-Work
Gain a fundamental understanding of the basics of Python programming and build a strong
foundation of coding to build Data Science applications.
Model Deployment
Learn the role of model deployment in realizing the value of an ML model and how to build
and deploy an application using Python.
BUILD INDUSTRY-RELEVANT SKILLS WITH
HANDS-ON PROJECTS
7 hands-on projects that will help you with:
• Practical Learning
• Skill Development
• Portfolio Enhancement
SAMPLE CASE STUDIES
Build a Data Science solution for a chain of hotels that will help them predict the likelihood of
a booking getting canceled so that they can take measures to fill in potential vacancies and
reduce revenue loss.
Tools and Concepts: Exploratory Data Analysis, Decision Trees, Random Forest,
Scikit-Learn, Pandas
Analyze customer reviews for different restaurants on behalf of a leading global food
aggregator and use generative AI models to tag them, enhancing the company's ability to
understand customer sentiment at scale, enabling data-driven decision-making, and improving
overall customer satisfaction.
Tools and Concepts: Generative AI, Large Language Models, Prompt Engineering,
Hugging Face
Analyze the data of an auto component manufacturing company and develop a predictive
model to detect potential machine failures, determine the most influencing factors on machine
health, and provide recommendations for cost optimization to the management.
Tools and Concepts: Exploratory Data Analysis, Data Visualization, Decision Trees, Pruning,
Scikit-Learn
Analyze the customer data of a bike-sharing company and build a model to predict the count
of bikes shared so that the company can plan ahead for surge hours.
Tools and Concepts: Exploratory Data Analysis, Data Visualization, Decision Trees,
AdaBoost, XGBoost, Scikit-Learn
CredPay
Analyze the data provided by a consultation firm that partners with banks, answer the key
questions provided, draw actionable insights, and help the company improve its business by
identifying the attributes of customers eligible for a credit card.
Tools and Concepts: Exploratory Data Analysis, Data Visualization, Pandas, Seaborn
Diabetes Risk Prediction
Analyze the historical patient data provided and build a predictive model to help identify
whether a person is at risk of diabetes or not.
Tools and Concepts: Exploratory Data Analysis, Data Visualization, Bagging, Random
Forests, Scikit-Learn
Analyze the data from the database of a music-based startup that recently started selling
music records, answer questions for a performance review to identify customer preferences
by demographics, and generate recommendations to help grow the business.
Tools and Concepts: SQL, Data Filtering, SQL Functions, Data Aggregation, Joins
Analyze data comprising economic, social, and environmental & infrastructure indicators,
and group countries based on them to help a tourism management organization identify key
locations to invest in to promote its tourism services.
Tools and Concepts: Exploratory Data Analysis, Data Visualization, K-means Clustering,
Hierarchical Clustering, Principal Component Analysis, Scikit-Learn
Analyze the data provided by a health company regarding a market test experiment to check
the effectiveness of various diet plans for weight loss, and conduct hypothesis tests to find
evidence of whether the different diet plans differ significantly.
Tools and Concepts: Exploratory Data Analysis, Confidence Intervals, Hypothesis Testing,
ANOVA, Statsmodels
Analyze the platform engagement data of a massive open online course provider and create
an analytical report for a given academic year that enables informed decision-making about
actions for the next academic year.
READY TO ADVANCE YOUR CAREER?
APPLY NOW
CONTACT US
+1 512 793 9938
https://onlineexeced.mccombs.utexas.edu/online-data-science-business-analytics-course