
UNIT I Complete Notes

UNIT I: Introduction to Data Science - Exam Preparation Notes

Definition of Data Science

Data Science combines programming, statistics, and domain expertise to analyze and interpret data. It focuses on discovering patterns and insights through techniques like data cleaning, exploration, and visualization. Applications include fraud detection, customer segmentation, and healthcare analytics.

Key Aspects:

1. Interdisciplinary Field: Merges computer science, statistics, and domain knowledge.

2. Real-Life Example: Netflix uses Data Science to recommend content based on user behavior.

3. Tools: Python, R, Hadoop, and Spark are common tools used in Data Science.
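
The cleaning, exploration, and visualization steps named in the definition can be sketched in a few lines of base R. This is a minimal illustration, not a full workflow: the `sales` data frame and its values are invented for the example.

```r
# Hypothetical toy data: the regions and amounts are invented for illustration.
sales <- data.frame(
  region = c("North", "South", "North", "East", "South", "East"),
  amount = c(120, NA, 95, 210, 150, NA)
)

# Cleaning: drop rows whose amount is missing.
clean <- sales[!is.na(sales$amount), ]

# Exploration: summary statistics and the mean amount per region.
print(summary(clean$amount))
region_means <- tapply(clean$amount, clean$region, mean)
print(region_means)

# Visualization: bar chart of the per-region means (base R graphics).
barplot(region_means, main = "Mean amount by region", ylab = "Amount")
```

Dropping incomplete rows is only one possible cleaning choice; imputing the missing values would be another.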

Big Data and Data Science Hype

Big Data refers to datasets so large, fast-changing, or varied that traditional single-machine tools cannot store or process them efficiently.

Characteristics of Big Data (5Vs):

1. Volume: The amount of data (e.g., Facebook reported in 2012 that it took in more than 500 terabytes of new data per day).

2. Velocity: Speed at which data is generated (e.g., stock market data updates).

3. Variety: Diverse data types, including text, images, and videos.

4. Veracity: The quality and trustworthiness of the data (e.g., missing, noisy, or biased records lower veracity).

5. Value: The usefulness of the insights that can be extracted from the data.

Moving Beyond the Hype:

While Data Science is celebrated for its power, practical challenges remain: data quality, model reliability, and ethical considerations such as privacy and bias.

Datafication

Datafication transforms human behavior, business processes, and systems into data.

For example, fitness trackers convert physical activity into digital metrics, enabling health insights.

Examples:

1. E-commerce: Amazon tracks user behavior to recommend products.

2. Education: E-learning platforms track progress and suggest personalized learning paths.
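
As a rough sketch of what datafication looks like in practice, the fitness-tracker example can be modeled as raw activity events that are aggregated into per-user daily metrics. The `events` table, its column names, and its values are all invented for illustration.

```r
# Hypothetical raw tracker events (invented values): one row per recorded burst of activity.
events <- data.frame(
  user  = c("u1", "u1", "u2", "u1", "u2"),
  day   = c("2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"),
  steps = c(1200, 3400, 5000, 800, 2600)
)

# Turn the raw events into one metric per user per day.
daily <- aggregate(steps ~ user + day, data = events, FUN = sum)
print(daily)
```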

Current Landscape of Perspectives

1. Statistical Inference:

- Using sample data to draw conclusions about a population, such as estimates and their uncertainty.

- Example: Election polling uses a sample of voters to predict outcomes.

2. Populations and Samples:

- Population: The full set of units of interest (e.g., all citizens in a country).

- Sample: A subset for analysis (e.g., 1,000 people surveyed).
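
The relationship between a population, a sample, and an inference can be sketched with a small simulation. The population below is simulated rather than real, and the 1.96 multiplier assumes the usual normal approximation for a 95% confidence interval.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Simulated population: 100,000 hypothetical ages (invented, not real data).
population <- rnorm(100000, mean = 40, sd = 12)

# Draw a simple random sample of 1,000 without replacement.
smp <- sample(population, size = 1000)

# Point estimate and an approximate 95% confidence interval for the mean.
est <- mean(smp)
se  <- sd(smp) / sqrt(length(smp))
ci  <- c(lower = est - 1.96 * se, upper = est + 1.96 * se)

cat("Population mean:", mean(population), "\n")
cat("Sample estimate:", est, "\n")
cat("Approx. 95% CI: ", ci, "\n")
```

In a real study the population mean is unknown; it is printed here only so the estimate can be compared with the truth.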

3. Statistical Modeling:

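- A model is a simplified mathematical description of relationships in data (e.g., house price as a function of size and location).

- Common tasks include regression (predicting a number), classification (predicting a category), and clustering (grouping unlabeled data).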

4. Probability Distributions:

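- Normal Distribution: Symmetric, bell-shaped curve described by its mean and standard deviation (e.g., adult heights are approximately normal).

- Poisson Distribution: Number of independent events in a fixed interval at a constant average rate (e.g., website visits per minute).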
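
Both distributions are built into base R, so a few probabilities can be computed directly. The parameters below (a mean height of 170 cm with a standard deviation of 10 cm, and an average of 3 visits per minute) are assumptions chosen only for illustration.

```r
# Normal: assumed mean and standard deviation of heights, in cm.
mu    <- 170
sigma <- 10

# Probability that a value falls within one standard deviation of the mean.
p_within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
                pnorm(mu - sigma, mean = mu, sd = sigma)
print(p_within_1sd)   # about 0.68

# Poisson: assumed average of 3 website visits per minute.
p_no_visits <- dpois(0, lambda = 3)   # P(exactly 0 visits) = exp(-3)
p_at_most_5 <- ppois(5, lambda = 3)   # P(5 or fewer visits), about 0.92
print(p_no_visits)
print(p_at_most_5)
```

In R's naming scheme, the `d` prefix gives a density or point probability and the `p` prefix gives a cumulative probability.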

5. Overfitting:

- A model performs well on training data but poorly on unseen data.

- Solution: Detect it with cross-validation or a held-out test set; reduce it by simplifying the model, applying regularization, or collecting more data.
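
Overfitting can be made visible by comparing training error with error on held-out data. The sketch below uses simulated data with a truly linear trend, so the degree-10 polynomial is deliberately more flexible than needed; the helper `rmse` is a name introduced for this example.

```r
set.seed(1)

# Simulated data: a linear trend plus noise (invented for illustration).
n <- 60
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
d <- data.frame(x = x, y = y)

# Hold out a third of the rows as unseen data.
test_idx <- sample(n, 20)
train <- d[-test_idx, ]
test  <- d[test_idx, ]

# Root mean squared error of a fitted model on a given data set.
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

simple  <- lm(y ~ x, data = train)            # matches the true structure
complex <- lm(y ~ poly(x, 10), data = train)  # far more flexible than needed

cat("simple : train", rmse(simple, train),  " test", rmse(simple, test),  "\n")
cat("complex: train", rmse(complex, train), " test", rmse(complex, test), "\n")
```

The flexible model always fits the training rows at least as well as the simple one. Its error on the held-out rows is typically worse, although the exact numbers depend on the seed and the split.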



Basics of R Programming

R is a programming language widely used for statistical computing and data analysis.

1. Setting Up R Environment:

- Install R from CRAN, then RStudio (an IDE for R programming).

2. R Syntax:

- Variables: `x <- 10` (`<-` is the assignment operator)

- Functions: `sum(c(1, 2, 3))` (calls `sum` on a vector and returns 6)
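
A slightly longer sketch of the same syntax, including a user-defined function. The name `square` is chosen for this example only.

```r
x <- 10                    # assign a value to a variable
y <- x * 2 + 1             # arithmetic on variables
total <- sum(c(1, 2, 3))   # call a built-in function on a vector

# Define your own function; the last evaluated expression is the return value.
square <- function(n) {
  n^2
}

print(y)          # 21
print(total)      # 6
print(square(4))  # 16
```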

3. Data Structures in R:

- Vectors: `v <- c(1, 2, 3)`

- Data Frames: `df <- data.frame(name=c("A", "B"), age=c(25, 30))`
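
Once created, vectors and data frames are mostly used through indexing. The lines below are a minimal sketch reusing `v` and `df` from the bullets above; the column `age_next` is added for illustration.

```r
v <- c(1, 2, 3)
print(v[2])        # second element (R indexes from 1)
print(v * 2)       # vectorised arithmetic: 2 4 6
print(length(v))   # number of elements: 3

df <- data.frame(name = c("A", "B"), age = c(25, 30))
print(df$age)                # one column, returned as a vector
older <- df[df$age > 26, ]   # keep rows where age exceeds 26
print(older)
df$age_next <- df$age + 1    # add a derived column
str(df)                      # compact overview of the structure
```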

4. Common Libraries:

- ggplot2: Data visualization.

- dplyr: Data manipulation.
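
A short sketch of how the two libraries are typically used together, assuming both are already installed (the commented `install.packages` line is needed once per machine). The data frame and the column `age_months` are invented for illustration.

```r
# install.packages(c("dplyr", "ggplot2"))  # run once; requires internet access

library(dplyr)
library(ggplot2)

df <- data.frame(name = c("A", "B", "C"), age = c(25, 30, 41))

# dplyr: keep some rows, then add a derived column.
older <- df %>%
  filter(age > 26) %>%
  mutate(age_months = age * 12)
print(older)

# ggplot2: bar chart of age by name.
p <- ggplot(older, aes(x = name, y = age)) +
  geom_col()
print(p)
```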
