L1 FA23 BST AB Spring 2025

The document outlines the fundamentals of data science, covering key terminologies such as data types, data collection methods, and the concept of synthetic data. It introduces artificial intelligence (AI), its definitions, types, and the distinction between artificial and human intelligence. Additionally, it discusses the DIKW framework and suggests future class initiatives and topics for further exploration in data science.

DATA SCIENCE FUNDAMENTALS

FA23-BST-
SPRING2025
Lecture 1
Dr. Asma Arshad
Associate Prof. PHED
TODAY’S AGENDA

Introduction – Basic Terminologies

• Data Collection Ways
  o Generated
  o Collected
  o Retrieved
• Synthetic Data
• Artificial Intelligence
  o Why artificial?
• DIKW
BASIC TERMINOLOGIES

DATA TYPES
• Numerical
• Non-Numerical
• Hybrid
BASIC TERMINOLOGIES FOR DATA SCIENCE
NAVIGATING THE FUTURE AI & ML
DATA CAN BE

• Generated: Simulation
• Collected: Primary, Secondary
• Retrieved: Data Structures, Algorithms, Similarity Measures
1. GENERATED DATA

Generated data is artificially created rather than collected from real-world sources. It is often used in simulations, testing, or predictive modeling.

🔹 How it's generated: Through simulations, synthetic data generation, or computational models.

🔹 Examples:
• Monte Carlo Simulation: Used in risk analysis and financial forecasting.
• Synthetic Data in Machine Learning: AI-generated images (e.g., using GANs), synthetic customer transactions for fraud detection.
• Game Development: Simulated player behavior for AI testing.
• Physics & Weather Forecasting: Climate models simulate future temperature trends.
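
The Monte Carlo idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real risk model: the 5% mean return, the 10% volatility, the normal-distribution assumption, and the function name `simulate_loss_probability` are all assumptions chosen for the example.

```python
import random


def simulate_loss_probability(mean_return, volatility, n_trials, seed=0):
    """Estimate the chance of a negative yearly return by repeated random sampling."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    losses = 0
    for _ in range(n_trials):
        # Assumption: yearly returns follow a normal distribution.
        yearly_return = rng.gauss(mean_return, volatility)
        if yearly_return < 0:
            losses += 1
    return losses / n_trials


# Hypothetical inputs: 5% expected return, 10% volatility.
p_loss = simulate_loss_probability(0.05, 0.10, n_trials=100_000)
print(f"Estimated probability of a losing year: {p_loss:.3f}")
```

Every value here is generated by the program, which is what makes it "generated data": no real market observation is involved.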
2. COLLECTED DATA

Collected data comes from real-world observations, experiments, or existing databases. It is divided into Primary and Secondary data.

🔸 Primary Data (Collected firsthand)
• Data that is collected directly for a specific research purpose.
• Examples:
  o Surveys: A company conducts a survey to understand customer satisfaction.
  o Experiments: A scientist records lab results from a chemical reaction.
  o Sensor Data: IoT devices collecting temperature data.
  o Field Research: Biologists tracking animal migration patterns.


🔸 Secondary Data (Collected from existing sources)
• Data that has already been collected by someone else and is reused.
• Examples:
  o Census Data: Governments use past census data for policy planning.
  o Stock Market Data: Investors analyze past market trends from financial reports.
  o Medical Records: A researcher uses past patient records for disease prediction.
  o Wikipedia & Public Datasets: AI models trained on pre-existing datasets.
3. RETRIEVED DATA

Retrieved data involves extracting useful information using data structures, algorithms, and similarity measures. This category is essential for fields like big data, information retrieval, and AI applications.

🔸 Data Structures
• How data is stored efficiently for quick retrieval.
• Examples:
  o Databases (SQL, NoSQL): Storing customer records in MySQL.
  o Hash Tables: Fast lookup for a dictionary app.
  o Graphs: Social networks (Facebook friend connections).
  o Trees: Search engines use tree structures for indexing web pages.
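
Two of the structures listed above can be tried directly in Python. The sketch below is only illustrative: the dictionary entries, the names in the friend graph, and the helper `mutual_friends` are made up for the example.

```python
# Hash table: Python's dict gives fast (average constant-time) lookup by key.
dictionary_app = {
    "data": "facts or observations",
    "model": "a simplified representation of a process",
}
definition = dictionary_app.get("data", "word not found")

# Graph: an adjacency list maps each person to the set of their friends.
friends = {
    "Ali": {"Sara", "Omar"},
    "Sara": {"Ali"},
    "Omar": {"Ali", "Zain"},
    "Zain": {"Omar"},
}


def mutual_friends(graph, a, b):
    """Return the friends that two people have in common."""
    return graph[a] & graph[b]


print(definition)
print(mutual_friends(friends, "Sara", "Omar"))
```

The dict answers "what does this word mean?" without scanning every entry, and the adjacency list makes "who do these two people both know?" a single set intersection.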
🔸 Algorithms
• How data is retrieved, sorted, and analyzed.
• Examples:
  o Search and Ranking Algorithms: Google's PageRank ranks web pages using the links that point to them.
  o Sorting Algorithms: E-commerce sites sort products by price.
  o Machine Learning Models: Netflix recommends shows based on user data.
  o Pattern Recognition: AI detects spam emails based on past spam patterns.
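
Sorting and searching, the two simplest items in the list above, can be sketched as follows. The product list and the helper `binary_search` are hypothetical; the standard-library `bisect_left` function does the actual searching.

```python
from bisect import bisect_left

# Hypothetical product catalogue for illustration.
products = [
    {"name": "keyboard", "price": 45.0},
    {"name": "mouse", "price": 20.0},
    {"name": "monitor", "price": 180.0},
]

# Sorting: order products from cheapest to most expensive.
by_price = sorted(products, key=lambda item: item["price"])
sorted_names = [item["name"] for item in by_price]


def binary_search(sorted_values, target):
    """Return the index of target in a sorted list, or -1 if it is absent."""
    index = bisect_left(sorted_values, target)
    if index < len(sorted_values) and sorted_values[index] == target:
        return index
    return -1


prices = [item["price"] for item in by_price]
print(sorted_names)
print(binary_search(prices, 45.0))
```

Binary search only works because the prices were sorted first, which is one reason sorting is often a preparatory step for retrieval.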


🔸 Similarity Measures
• How we compare and retrieve similar data points.
• Examples:
  o Euclidean Distance: Measuring similarity in face recognition systems (a smaller distance means more similar).
  o Cosine Similarity: Finding similar documents in text analysis.
  o Jaccard Similarity: Detecting plagiarism between two texts.
  o KNN (k-Nearest Neighbors): Classifying an email as spam or not based on the most similar past messages.
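
The measures above are short enough to write out by hand. The sketch below is a minimal version under simplifying assumptions: vectors are non-zero and of equal length, texts are compared as sets of lowercase words, and the email features (counts of links and exclamation marks) are invented for illustration.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(text_a, text_b):
    """Shared words divided by all distinct words across two non-empty texts."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


def knn_classify(query, labeled_points, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    nearest = sorted(
        labeled_points, key=lambda pair: euclidean_distance(query, pair[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of links, number of exclamation marks).
emails = [
    ((0, 1), "ham"),
    ((1, 0), "ham"),
    ((5, 6), "spam"),
    ((6, 5), "spam"),
    ((7, 7), "spam"),
]

print(euclidean_distance((0, 0), (3, 4)))
print(jaccard_similarity("the cat sat", "the cat ran"))
print(knn_classify((5, 5), emails))
```

Note that KNN is a classifier built on top of a distance measure rather than a similarity measure itself; swapping `euclidean_distance` for another measure changes which neighbors count as "nearest".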
SYNTHETIC DATA

In Data Science & AI (Synthetic Data)

  o Artificially generated data that mimics real-world data but does not come from actual observations.
  o Used when real data is unavailable, expensive, or sensitive (e.g., medical data).
  o Example: AI-generated customer transactions to train fraud detection models.
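
As a rough sketch of the fraud-detection example, the code below produces labeled transactions entirely from random numbers. The function name `make_synthetic_transactions`, the 5% fraud rate, and the amount ranges are assumptions made for illustration; real synthetic-data tools model the original data far more carefully.

```python
import random


def make_synthetic_transactions(n, fraud_rate=0.05, seed=42):
    """Create n made-up transactions; none come from real customers."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    transactions = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption for illustration: fraudulent amounts tend to be larger.
        amount = rng.uniform(500, 5000) if is_fraud else rng.uniform(5, 300)
        transactions.append(
            {"id": i, "amount": round(amount, 2), "is_fraud": is_fraud}
        )
    return transactions


transactions = make_synthetic_transactions(1000)
fraud_count = sum(t["is_fraud"] for t in transactions)
print(f"{fraud_count} of {len(transactions)} synthetic transactions are labeled fraud")
```

Because no real customer appears in the output, a dataset like this can be shared freely, which is the main appeal when the real data is sensitive.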
| Category | Definition | Examples |
|---|---|---|
| Generated | Simulated or artificially created data | Monte Carlo simulations, AI-generated images, climate models |
| Collected (Primary) | Data collected firsthand | Surveys, experiments, IoT sensor data |
| Collected (Secondary) | Pre-existing data used for analysis | Census data, stock market history, public datasets |
| Retrieved (Data Structures) | Organizing data efficiently | Databases, graphs, trees, hash tables |
| Retrieved (Algorithms) | Extracting and processing data | Search engines, ML recommendations, sorting |
| Retrieved (Similarity Measures) | Comparing and finding related data | Face recognition, plagiarism detection, KNN classification |
AI

• What is AI (Artificial Intelligence)?

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that can perform tasks typically requiring human thinking, such as learning, reasoning, problem-solving, perception, and decision-making.

AI enables machines to:
✅ Learn from data (Machine Learning)
✅ Recognize patterns (Face recognition, speech processing)
✅ Make decisions (Self-driving cars, chatbots)
✅ Understand language (ChatGPT, Google Assistant)
WHY IS IT CALLED "ARTIFICIAL"?

• The term "artificial" means man-made, not natural.
• We call it Artificial Intelligence because:
  🔹 It is not real human intelligence but simulated using algorithms and computers.
  🔹 Machines do not think like humans but process information based on predefined rules and learning models.
• 💡 Example:
  o A human can learn from experience and make decisions naturally.
  o AI can learn from data and make predictions but only within the limits of the algorithms.
TYPES OF AI

1️⃣ Narrow AI (Weak AI) – Designed for specific tasks
🔸 Example: Google Search, Siri, Spam filters

2️⃣ General AI (Strong AI) – Hypothetical AI that can think like humans
🔸 Example: AI that understands emotions, reasons, and makes decisions like a person (not yet achieved).

3️⃣ Super AI – AI that surpasses human intelligence (theoretical)
🔸 Example: AI that can innovate and improve itself without human help.
BASIC TERMS FOR AI

| Term | Meaning |
|---|---|
| AI | Machines that simulate human intelligence |
| Artificial | Man-made, not naturally occurring |
| Narrow AI | AI for specific tasks (e.g., Google Search, chatbots) |
| General AI | AI with human-like intelligence (not yet achieved) |
| Super AI | AI surpassing human intelligence (theoretical) |
DIKW EXPANSION
UNLOCKING NEW HORIZONS

DIKW
Data → (Processed) → Information → (Validation) → Knowledge → (Thinking) → Wisdom

DIKW: INNOVATIVE SOLUTIONS
From the MovieLens dataset documentation, this file has 5 columns:

| Column Name | Description |
|---|---|
| user_id | Unique identifier for each user |
| age | Age of the user |
| gender | Gender (M = Male, F = Female) |
| occupation | User's occupation |
| zip_code | User's zip code |
What do we want?

📊 Suggested Data Analysis

Now that we understand the dataset, let's perform some key analyses.

Basic Summary Statistics

🔹 Insights:
• The average user age, min/max age, gender distribution, and most common occupations.
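
The summary statistics listed above can be computed with the Python standard library alone. The sketch below assumes the file is pipe-separated with the five columns in the order shown earlier (as in the `u.user` file of MovieLens 100K, which is an assumption here); the `SAMPLE` rows, the `COLUMNS` list, and the function `summarize_users` are illustrative stand-ins, not the full dataset.

```python
import csv
import io
from collections import Counter

# Illustrative sample rows in the assumed pipe-separated layout (not the full file).
SAMPLE = """1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
"""

COLUMNS = ["user_id", "age", "gender", "occupation", "zip_code"]


def summarize_users(file_obj):
    """Return basic demographic statistics for a pipe-separated user file."""
    reader = csv.DictReader(file_obj, fieldnames=COLUMNS, delimiter="|")
    users = list(reader)
    ages = [int(user["age"]) for user in users]
    return {
        "mean_age": sum(ages) / len(ages),
        "min_age": min(ages),
        "max_age": max(ages),
        "gender_counts": Counter(user["gender"] for user in users),
        "top_occupations": Counter(
            user["occupation"] for user in users
        ).most_common(2),
    }


summary = summarize_users(io.StringIO(SAMPLE))
for name, value in summary.items():
    print(name, value)
```

To run the same function on a downloaded file, pass an open file handle instead of the in-memory sample, for example `summarize_users(open("u.user", newline=""))`, where the path is whatever location the file was saved to.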
💡 Conclusion
By performing these analyses, you can get demographic insights into the MovieLens dataset.
Would you like more advanced analysis, such as correlating age with occupation trends? 🚀
NEXT LECTURE INITIATIVES

1. Bring laptops… to get started with Python.
2. Technology integration: explore sharing files via WhatsApp.
3. Collaborative working: foster learning by working with real datasets.
NEXT CLASS AGENDA

Introduction – Basic Terminologies

• What is Data Science?
  o Big Data and Data Science Hype
  o Getting Past the Hype
• Why Now?
• Datafication
• Data Science Jobs
• What is a Data Scientist?
  o In academia
  o In industry
THANK YOU
ANY QUESTIONS?

[email protected]
