04 Synthetic Data Generation
Fall Semester
Academic Year 2024/2025
Table of Contents
1. Objectives
6. References
• Synthetic data is data that has been generated from real data and
that has similar statistical properties.
• The degree to which a synthetic dataset is an accurate proxy for
real data is a measure of utility – can we really use it?
• Synthesis is the process of generating synthetic data.
• Synthetic data can be of different forms – structured,
semi-structured, or unstructured.
• There are three types of synthetic data: generated from real
data, generated without real data, and a hybrid of the two.
The second type of synthetic data is not generated from real data,
but rather from existing models or domain experience.
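A minimal sketch of this second type, using only the Python standard library. No real dataset is read at any point. The column names, the distributions, and every numeric parameter below are hypothetical stand-ins for "domain experience", chosen only to illustrate the idea.

```python
import random


def synthesize_from_assumptions(n, seed=0):
    """Generate n records purely from assumed distributions (no real data)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    records = []
    for _ in range(n):
        # Assumption: age is roughly normal (mean 45, sd 15), clipped to 18..90.
        age = min(90, max(18, round(rng.gauss(45, 15))))
        # Assumption: 20% of the population smokes.
        smoker = rng.random() < 0.2
        # Assumption: blood pressure rises with age and is higher for smokers.
        systolic_bp = round(rng.gauss(110 + 0.4 * age + (5 if smoker else 0), 10))
        records.append({"age": age, "smoker": smoker, "systolic_bp": systolic_bp})
    return records


synthetic_records = synthesize_from_assumptions(1000)
```

Because nothing here is fitted to real data, the result is only as realistic as the assumptions written into the generator.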
For some use cases, having high utility will matter quite a bit – in
other cases, medium or even low utility may be acceptable.
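The first type (synthesis from real data) and a very simple utility check can be sketched as follows. This is an illustration under stated assumptions, not a recommended method: the "real" column is itself simulated here as a stand-in, a single Gaussian is assumed to describe it, and the helper names fit_gaussian, synthesize and utility_gap are hypothetical. Utility is approximated only by how closely the mean and standard deviation match.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a Gaussian fitted to the input column."""
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def utility_gap(real, synthetic):
    """Absolute gaps in mean and stdev; smaller gaps suggest higher utility."""
    return {
        "mean_gap": abs(statistics.mean(real) - statistics.mean(synthetic)),
        "stdev_gap": abs(statistics.stdev(real) - statistics.stdev(synthetic)),
    }


# Stand-in for a real column (simulated here; not actual data).
source_rng = random.Random(42)
real = [source_rng.gauss(170, 8) for _ in range(500)]

synthetic = synthesize(real, 500)
gaps = utility_gap(real, synthetic)
```

How small the gaps need to be depends on the use case: a threshold that is acceptable for software testing may be far too loose for statistical analysis.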