DS
DS
• Data Science helps extract meaningful insights from massive amounts of structured and
unstructured data.
Cannot perform arithmetic operations. Can rank values but not quantify differences.
• Imputation:
• Use advanced techniques like machine learning models to estimate missing values.
Does not go beyond the data analyzed. Uses sample data to draw conclusions about a larger group.
• Logistic regression assumes a linear relationship between independent variables and the log
odds, which may not hold true in complex datasets.
• It cannot capture intricate patterns or non-linear relationships as well as machine learning
models like Random Forests or Neural Networks.
Bayes' Theorem provides a way to calculate the probability of an event based on prior knowledge:
Where:
• The maximum marginal hyperplane (MMH) is the decision boundary that maximizes the
margin (distance) between classes.
• SVM optimizes the MMH by solving a convex optimization problem using techniques like
Lagrange multipliers.
• Kernel functions (e.g., linear, polynomial, RBF) can be used to handle non-linear data.
Single tree structure used for decision-making. Collection of multiple decision trees (ensemble).
9. Mention any four different types of plots used for data visualization.
Outlier analysis involves identifying data points that deviate significantly from the majority of the
dataset.
Data wrangling is the process of cleaning, transforming, and preparing raw data into a usable format.
Steps include:
• Removing duplicates.
4o
You said:
ChatGPT said:
ChatGPT
Here are the answers to the questions from the second image:
Organized and stored in tabular formats (e.g., databases). Does not have a pre-defined format.
Example: Customer information table with columns (Name, Example: Images, videos, audio files,
Age, Email). emails.
2. List the relationship between computer science, data science, and traditional sciences.
• Computer Science: Provides computational tools, algorithms, and infrastructure for data
analysis (e.g., programming, databases).
• Data Science: Combines computer science and statistics to extract insights from data.
• Traditional Sciences: Use data science for experimentation, modeling, and simulations (e.g.,
biology for genomics, physics for particle analysis).
Statistical inference involves drawing conclusions about a population based on a sample of data.
Significance:
• Each cell indicates the strength and direction of the relationship between two variables.
Uses a single tree to make decisions. Combines multiple decision trees for better accuracy.
Definition: Data visualization is the graphical representation of information and data using charts,
graphs, and maps.
Importance:
1. Line charts.
2. Bar charts.
3. Pie charts.
4. Scatter plots.
5. Heatmaps.
6. Histograms.
7. Box plots.
8. Tree maps.
• Object detection: Detecting and classifying objects in images (e.g., self-driving cars).
Definition: Clustering is an unsupervised learning technique that groups similar data points into
clusters.
Types:
Goal: Predict output or classify data. Goal: Discover hidden patterns or groupings.