Data Science

Data Science involves analyzing data through techniques like statistics and machine learning to derive insights. It encompasses various applications, stages of AI projects, and data types, including structured and unstructured data. Key concepts include data collection, preprocessing, visualization, and algorithms like K-Nearest Neighbors (KNN), which classify data based on proximity.

Uploaded by srishithsrinand

1. What is Data Science?
Data Science is the study of data using statistics, data analysis,
and machine learning to extract insights and support decisions.
2. Relation between Data Science and Machine Learning:
Machine Learning is a core technique within Data Science: its
algorithms learn patterns from data to make predictions and
automate tasks.
3. Applications of Data Science:
● Fraud detection
● Healthcare predictions
● Recommendation systems (e.g., Netflix)
● Marketing analytics
4. Stages of AI Project Cycle:
● Problem identification
● Data collection
● Data preparation
● Model building
● Evaluation
● Deployment
5. 4Ws Canvas for Scoping a Problem:
● What: Define the problem.
● Why: Understand the need.
● Where: Identify the context.
● Who: Determine stakeholders.
6. Steps in Data Collection:
● Define objectives.
● Identify data sources.
● Collect data (manual/automated).
● Validate and clean the data.
7. Difference Between Numerical and Categorical Data:
Numerical data consists of measurable quantities (e.g., age), while
categorical data consists of labels or groups (e.g., colors).
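
The difference shows up in which operations make sense on each type. A minimal sketch in plain Python (the records and field names are made up for illustration):

```python
from collections import Counter
from statistics import mean

# Hypothetical records: "age" is numerical, "color" is categorical.
records = [
    {"age": 25, "color": "red"},
    {"age": 31, "color": "blue"},
    {"age": 40, "color": "red"},
]

ages = [r["age"] for r in records]
colors = [r["color"] for r in records]

# Arithmetic such as averaging is meaningful for numerical data.
average_age = mean(ages)

# Categorical data is summarized by counting how often each label occurs.
color_counts = Counter(colors)

print(average_age)
print(color_counts)
```

Averaging the colors would be meaningless, which is why categorical columns are usually counted or encoded rather than summed.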

8. Define Data Visualization:
Data visualization represents data graphically, using charts and
graphs, so that it is easier to understand and analyze.
9. Compare Structured, Semi-structured, and Unstructured Data:
● Structured: Organized in tables with fixed rows and columns (e.g., databases).
● Semi-structured: Partially organized with tags or keys (e.g., XML, JSON).
● Unstructured: No predefined format (e.g., videos, images, free text).
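
A small sketch of how the three kinds of data look to a program, using only Python's standard library. The sample strings are invented for illustration, and CSV stands in here for a database table:

```python
import csv
import io
import json

# Structured: every row has the same columns, like a database table.
csv_text = "name,age\nAsha,30\nRavi,25\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys label the values, but records may differ in shape.
json_text = '[{"name": "Asha", "tags": ["a", "b"]}, {"name": "Ravi"}]'
docs = json.loads(json_text)

# Unstructured: raw text (or image/video bytes) with no fields to look up.
note = "Asha called on Monday about the delayed order."

print(rows[0]["age"])        # value found by column name (read as a string)
print(docs[1].get("tags"))   # this record has no "tags" key
print(len(note.split()))     # only generic operations, such as counting words
```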
10. Steps in Data Preprocessing:
● Data cleaning
● Transformation
● Integration
● Reduction
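
The four steps above can be sketched on a tiny example. This is an illustrative sketch in plain Python, not a prescribed pipeline: the records, the choice to drop rows with missing values, and min-max scaling are all assumptions made for the example.

```python
# Hypothetical raw records from two sources.
customers = [
    {"id": 1, "age": 20, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},  # missing age
    {"id": 3, "age": 40, "city": "Pune"},
]
orders = {1: 250.0, 3: 100.0}  # customer id -> total spend

# 1. Cleaning: drop records with a missing age (copying so the input is untouched).
clean = [dict(c) for c in customers if c["age"] is not None]

# 2. Transformation: min-max scale age to the range [0, 1].
ages = [c["age"] for c in clean]
lo, hi = min(ages), max(ages)
for c in clean:
    c["age_scaled"] = (c["age"] - lo) / (hi - lo)

# 3. Integration: merge spend from the second source by id.
for c in clean:
    c["spend"] = orders.get(c["id"], 0.0)

# 4. Reduction: keep only the columns needed for modelling.
reduced = [{"age_scaled": c["age_scaled"], "spend": c["spend"]} for c in clean]
print(reduced)
```

In practice, libraries such as Pandas provide these operations directly; the plain-Python version is used here only to keep each step visible.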
11. Role of Libraries in Data Science:
Libraries like NumPy, Pandas, Matplotlib, and Seaborn simplify
data analysis, visualization, and manipulation in Python.
12. Statistical Learning in Data Science:
Statistical learning involves algorithms that use statistical models
to find patterns in data and make predictions.
13. Difference Between Supervised and Unsupervised Learning:
Supervised learning trains on labeled data (inputs paired with known
outputs), while unsupervised learning finds patterns, such as
clusters, in unlabeled data.
14. Define K-Nearest Neighbors (KNN):
KNN is a supervised machine learning algorithm that classifies a
data point by the majority class among its K nearest neighbors
in the training data.
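
A minimal sketch of KNN classification in plain Python, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data are made up for illustration:

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two clusters in 2-D.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (2.0, 2.0), k=3))  # A
print(knn_predict(train, (6.0, 7.0), k=3))  # B
```

Note that there is no training step: every prediction sorts the whole training set by distance.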
15. Impact of K Value in KNN:
The value of K affects classification accuracy. A small K is
sensitive to noise and may cause overfitting, while a larger K
smooths the decision boundary but may underfit if it is too large.
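
A toy illustration of how the prediction for one point can change with K. The 1-D data set, with one deliberately mislabeled point, and the helper `knn_predict_1d` are assumptions made for this sketch:

```python
from collections import Counter


def knn_predict_1d(train, query, k):
    """Majority vote among the k training values closest to `query`."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Class A sits near 1-3, class B near 8-12; the point at 2.6 is label noise.
train = [
    (1.0, "A"), (2.0, "A"), (3.0, "A"), (2.6, "B"),
    (8.0, "B"), (9.0, "B"), (10.0, "B"), (11.0, "B"), (12.0, "B"),
]

query = 2.7
predictions = {k: knn_predict_1d(train, query, k) for k in (1, 3, 9)}
print(predictions)  # {1: 'B', 3: 'A', 9: 'B'}
```

With K=1 the noisy neighbor decides the answer, K=3 outvotes it, and K=9 (the whole data set) always returns the overall majority class regardless of where the query lies.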
16. Advantages/Disadvantages of KNN:
Advantages: Simple, intuitive, no separate training phase.
Disadvantages: Slow with large data, sensitive to irrelevant
features and to feature scale.
17. Role of Statistical Measures:
Mean, median, and mode summarize the center of a distribution,
while standard deviation describes its variability.
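
These measures can be computed with Python's built-in `statistics` module. The sample values are made up, and `pstdev` (population standard deviation) is chosen here as an assumption; `stdev` would give the sample version instead.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(scores)       # 5
median = statistics.median(scores)   # 4.5
mode = statistics.mode(scores)       # 4
std_dev = statistics.pstdev(scores)  # 2.0 (population standard deviation)

print(mean, median, mode, std_dev)
```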
18. Importance of Data Visualization:
Visualization helps identify patterns, detect outliers, and
communicate insights effectively.
19. Explain Box Plot:
A box plot summarizes a distribution with a box spanning the first
to third quartiles, a line at the median, whiskers showing the
spread, and outliers plotted as individual points.
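
The numbers behind a box plot can be computed without drawing anything. This sketch assumes the common 1.5 × IQR rule for whiskers and outliers (plotting tools may use other conventions) and uses invented sample data:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical sample with one extreme value

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme values that are not outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3)             # 3.0 5.0 7.0
print(whisker_low, whisker_high)  # 1 8
print(outliers)                   # [30]
```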
