Ass 2
1. Big Data and Its Applications
Big Data refers to datasets so large, fast-growing, or varied that they cannot be stored or
processed efficiently with traditional data processing tools. It involves massive, complex
volumes of data generated from sources such as social media, sensors, and digital transactions.
Applications:
• Healthcare: Big Data supports predictive analytics on patient health, treatment
outcomes, and disease patterns.
• Retail: Companies like Amazon use Big Data to analyze purchasing behavior and
offer personalized recommendations.
• Finance: Fraud detection and algorithmic trading are powered by Big Data analytics.
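Tools that handle Big Data usually avoid loading everything into memory at once. Instead, they split the data into chunks, process each chunk independently (a "map" step), and then merge the partial results (a "reduce" step). The following is a minimal single-process sketch of that MapReduce pattern applied to word counting. Frameworks such as Apache Hadoop and Apache Spark run the same idea in parallel across a cluster. The function names and sample log lines here are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator, List


def read_in_chunks(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield records in fixed-size chunks so the full dataset never sits in memory."""
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def map_chunk(chunk: List[str]) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(records: Iterable[str], chunk_size: int = 1000) -> Counter:
    """Reduce step: merge the partial counts from every chunk into one total."""
    partial_counts = (map_chunk(chunk) for chunk in read_in_chunks(records, chunk_size))
    return reduce(lambda total, part: total + part, partial_counts, Counter())


# Hypothetical log lines; in practice this could be a generator over a huge file
logs = ["Big data is big", "data is everywhere", "big insights from data"]
print(word_count(logs, chunk_size=2))
```

Each map step only sees its own chunk, so the chunks could be handled on different machines. Only the small partial counts need to be combined.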
2. Comparison of Data Science and Data Analytics
Data Science and Data Analytics are closely related fields, but they serve different purposes
in analyzing data:
• Data Science is a multidisciplinary field that uses scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It
involves big-picture thinking and encompasses data preparation, model building,
and interpretation of results.
o Key Roles: Data scientists build predictive models using machine learning,
statistical analysis, and programming.
o Example: Developing a machine learning model to predict customer churn.
• Data Analytics focuses on processing and analyzing data to extract specific,
actionable insights. It typically answers well-defined business questions or supports
routine reporting.
o Key Roles: Data analysts clean data, perform statistical analysis, and create
dashboards and reports.
o Example: Analyzing sales data to identify trends and help guide business
decisions.
Key Differences:
• Scope: Data Science is broader and includes advanced techniques like machine
learning, whereas Data Analytics focuses on interpreting existing data to answer
narrower, more specific questions.
• Outcome: Data Science builds models for prediction and future insights, while Data
Analytics provides retrospective or descriptive insights.
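The difference in outcome can be illustrated on one small dataset. This sketch assumes hypothetical monthly sales figures. The descriptive step summarizes what already happened (totals, averages, month-over-month change). The predictive step fits an ordinary least-squares trend line and extrapolates one month ahead. A real Data Science model would use richer features and proper validation. The linear trend is chosen here only because it is the simplest predictive model.

```python
from statistics import mean
from typing import Dict, List, Tuple


def describe(sales: List[float]) -> Dict[str, float]:
    """Descriptive (analytics-style) summary of past sales."""
    changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
    return {
        "total": sum(sales),
        "average": mean(sales),
        "avg_monthly_change": mean(changes),
    }


def fit_linear_trend(sales: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of sales = slope * month_index + intercept."""
    months = list(range(len(sales)))
    m_mean, s_mean = mean(months), mean(sales)
    slope = sum((m - m_mean) * (s - s_mean) for m, s in zip(months, sales)) / sum(
        (m - m_mean) ** 2 for m in months
    )
    return slope, s_mean - slope * m_mean


def forecast(sales: List[float], steps_ahead: int = 1) -> float:
    """Predictive step: extrapolate the fitted trend beyond the last month."""
    slope, intercept = fit_linear_trend(sales)
    return slope * (len(sales) - 1 + steps_ahead) + intercept


monthly_sales = [120.0, 135.0, 150.0, 160.0, 178.0, 190.0]  # hypothetical data
print(describe(monthly_sales))   # looks backward
print(forecast(monthly_sales))   # looks forward
```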
How They Complement Each Other: Data Analytics is often part of the Data Science
process, where analysts explore data trends and relationships before scientists build advanced
models. Together, they transform raw data into insights and decisions.
3. Importance of Data Visualization in Data Science
Data Visualization is crucial in Data Science because it transforms complex data into visual
formats like graphs, charts, and maps, making it easier to understand, communicate, and
interpret insights.
Examples:
• Sales Dashboards: A retail company uses bar charts and pie charts to monitor daily
sales performance, leading to adjustments in marketing strategies.
• Healthcare: Heat maps are used to visualize patient data across regions, helping
public health officials allocate resources more effectively.
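Dashboards like these are normally built with a plotting library (for example, `matplotlib.pyplot.bar` in Python). As a dependency-free sketch of the same idea, the hypothetical helper below renders daily sales as a text bar chart. It scales every bar relative to the largest value so that differences stand out at a glance. The sales figures are made up for illustration.

```python
from typing import Dict


def text_bar_chart(data: Dict[str, float], width: int = 40) -> str:
    """Render a horizontal bar chart as text; the largest value gets `width` characters."""
    if not data:
        return ""
    max_value = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the maximum so bars are comparable
        bar_length = round(value / max_value * width) if max_value > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value}")
    return "\n".join(lines)


daily_sales = {"Mon": 1200, "Tue": 950, "Wed": 1430, "Thu": 800, "Fri": 1600}  # hypothetical
print(text_bar_chart(daily_sales, width=20))
```

A weak day such as "Thu" shows up immediately as the shortest bar. Spotting that in a column of raw numbers is much harder.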
4. Comparison of Machine Learning and Deep Learning
Machine Learning (ML) is a subset of Artificial Intelligence (AI), and Deep Learning (DL) is
in turn a subset of ML. The two differ in their methodologies, applications, and challenges:
• Machine Learning involves algorithms that allow computers to learn patterns from
data and make predictions. It typically requires feature engineering, where human
experts identify the features that will help the model.
Challenges: ML models depend heavily on the quality of hand-crafted features and can be
less effective on very large, complex, or unstructured datasets such as raw images or audio.
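• Deep Learning is a subset of ML that uses neural networks with many layers (hence
"deep") to automatically learn features from raw data. It does not require manual
feature engineering and excels at processing unstructured data like images, audio, and
text.
Challenges: DL models typically need large amounts of training data and substantial
computing power (often GPUs), and their decisions are harder to interpret.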
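To make the feature-engineering point concrete, here is a minimal sketch. It uses a classic perceptron as the "traditional ML" model and the XOR problem as toy data; both are illustrative choices, not taken from the text. On the raw inputs the perceptron cannot learn XOR, because no straight line separates the classes. Once a human adds the engineered feature x1 * x2, the perceptron learns the task. A deep network with a hidden layer would instead learn such an intermediate feature on its own. The helpers `predict` and `train_perceptron` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def predict(w: Vector, b: float, x: Vector) -> int:
    """Perceptron decision rule: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_perceptron(
    X: List[Vector], y: List[int], lr: float = 1.0, max_epochs: int = 1000
) -> Tuple[List[float], float, bool]:
    """Classic perceptron learning rule; returns (weights, bias, converged)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            update = lr * (yi - predict(w, b, xi))
            if update != 0:  # misclassified: nudge weights toward the target
                errors += 1
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
        if errors == 0:  # every training example classified correctly
            return w, b, True
    return w, b, False


# XOR: output is 1 only when exactly one input is 1 (not linearly separable)
X_raw = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]

# Manual feature engineering: append the product x1 * x2 as a third feature
X_engineered = [[x1, x2, x1 * x2] for x1, x2 in X_raw]

_, _, raw_ok = train_perceptron(X_raw, y_xor)
w, b, eng_ok = train_perceptron(X_engineered, y_xor)
print("raw features converged:", raw_ok)         # False
print("engineered features converged:", eng_ok)  # True
```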
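5. Types of Machine Learning
Machine Learning approaches are commonly grouped by the kind of feedback the model learns from:
• Supervised Learning: The model learns from labeled data, where input-output
pairs are provided, and learns to map inputs to known outputs (targets).
Example: Predicting house prices (inputs: features such as location and size; output: price).
• Unsupervised Learning: The model learns from unlabeled data, trying to find
hidden patterns or structure in the data.
Example: Grouping customers into segments with similar purchasing behavior (clustering).
• Reinforcement Learning: An agent learns by interacting with an environment and
receiving rewards or penalties for its actions.
Example: Training a robot to navigate through a maze, where the robot is rewarded
for reaching the end of the maze and penalized for hitting obstacles.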
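The supervised and unsupervised cases can each be sketched in a few lines of Python, under some assumptions. The house and customer data are made up (size in m², price in thousands; spend in $1000s, visits per month). The supervised model is a one-feature least-squares line rather than a realistic multi-feature model. The unsupervised example uses k-means clustering with a naive initialization (the first k points). Libraries such as scikit-learn provide production versions (`LinearRegression`, `KMeans`).

```python
from math import dist
from statistics import mean
from typing import List, Sequence, Tuple


# Supervised: learn price = slope * size + intercept from labeled (size, price) pairs
def fit_simple_regression(sizes: List[float], prices: List[float]) -> Tuple[float, float]:
    x_mean, y_mean = mean(sizes), mean(prices)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(sizes, prices)) / sum(
        (x - x_mean) ** 2 for x in sizes
    )
    return slope, y_mean - slope * x_mean


# Unsupervised: group unlabeled points with plain k-means
def k_means(points: List[Sequence[float]], k: int, iterations: int = 10):
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):  # update step: move centroid to cluster mean
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [mean(coords) for coords in zip(*members)]
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


sizes = [50, 80, 100, 120]        # hypothetical sizes in m²
prices = [150, 240, 300, 360]     # hypothetical prices in thousands
slope, intercept = fit_simple_regression(sizes, prices)
print("estimated price for 90 m²:", slope * 90 + intercept)

customers = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]  # no labels given
centroids, labels = k_means(customers, k=2)
print("segment of each customer:", labels)
```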
6. Basic Steps in Building a Machine Learning Model
The process of building a Machine Learning model involves several key steps:
Importance of Preprocessing:
• Preprocessing ensures that the data is in a suitable format for the model to learn
effectively.
• Dimensionality Reduction reduces noise and computational complexity, improving
model efficiency.
• Normalization rescales features to comparable ranges so that no feature dominates
because of its scale, which helps gradient-based models converge faster.
• Feature Extraction improves model accuracy by providing relevant and high-quality
inputs.
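Two of the preprocessing steps above can be sketched with the standard library alone. Min-max normalization and z-score standardization are two common normalization choices. Dropping zero-variance columns serves here as the simplest form of dimensionality reduction: it is a feature-selection technique, whereas methods like PCA build new, fewer features and are not shown. The helper names and the sample rows (income, age, a constant flag) are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column: Sequence[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def drop_constant_features(rows: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Remove columns with zero variance; they carry no information for the model."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pstdev(col) > 0]
    return [[row[i] for i in keep] for row in rows], keep


rows = [[50000, 25, 1], [64000, 32, 1], [120000, 45, 1]]  # income, age, constant flag
reduced, kept = drop_constant_features(rows)
income, age = (list(col) for col in zip(*reduced))
# Without rescaling, income (tens of thousands) would swamp age (tens) in any distance
print(min_max_normalize(income), min_max_normalize(age))
```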