1. Definition and Characteristics of Big Data

Big Data refers to extremely large datasets that cannot be handled efficiently with traditional
data processing tools. It involves complex and massive volumes of data generated from
various sources such as social media, sensors, digital transactions, and more.

Key Characteristics of Big Data (often referred to as the 5 Vs):

1. Volume: The size of the data is huge (terabytes to zettabytes).
o Example: Data generated by social media platforms like Facebook and Twitter.
2. Velocity: The speed at which new data is generated and processed.
o Example: Data from stock market transactions or real-time GPS data.
3. Variety: Big Data comes in different formats – structured, unstructured, and semi-structured.
o Example: Text, images, audio, video, log files, etc.
4. Veracity: The uncertainty or trustworthiness of data, which requires data cleaning and
validation.
o Example: Inconsistent data from social media posts or customer reviews.
5. Value: The meaningful insights or benefits that can be derived from analyzing Big
Data.
o Example: Data analytics providing personalized recommendations for
customers in e-commerce.

Real-World Applications of Big Data:

• Healthcare: Big Data is used for predictive analytics in patient health, treatment
outcomes, and disease patterns.
• Retail: Companies like Amazon use Big Data to analyze purchasing behavior and
offer personalized recommendations.
• Finance: Fraud detection and algorithmic trading are powered by Big Data analytics.
2. Comparison of Data Science and Data Analytics

Data Science and Data Analytics are closely related fields, but they serve different purposes
in analyzing data:

• Data Science is a multidisciplinary field that uses scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It
involves big-picture thinking and encompasses data preparation, model building,
and interpretation of results.
o Key Roles: Data scientists build predictive models using machine learning,
statistical analysis, and programming.
o Example: Developing a machine learning model to predict customer churn.
• Data Analytics focuses on processing and analyzing data to extract specific,
actionable insights. It often deals with answering specific business questions or
reporting.
o Key Roles: Data analysts clean data, perform statistical analysis, and create
dashboards and reports.
o Example: Analyzing sales data to identify trends and help guide business
decisions.

Key Differences:

• Scope: Data Science is broader and includes advanced techniques like machine
learning, whereas Data Analytics focuses on more specific, often real-time data
interpretation.
• Outcome: Data Science builds models for prediction and future insights, while Data
Analytics provides retrospective or descriptive insights.

How They Complement Each Other: Data Analytics is often part of the Data Science
process, where analysts explore data trends and relationships before scientists build advanced
models. Together, they transform raw data into insights and decisions.
3. Importance of Data Visualization in Data Science

Data Visualization is crucial in Data Science because it transforms complex data into visual
formats like graphs, charts, and maps, making it easier to understand, communicate, and
interpret insights.

Importance:

• Improves Understanding: Visual representations make it easier for people to
understand patterns, outliers, and trends in data.
• Enhances Decision-Making: Decision-makers can quickly grasp insights, enabling
faster and more informed decisions.
• Communicates Results: Visualizations are essential for presenting findings in a clear
and concise manner to non-technical stakeholders.

Common Tools and Techniques:

• Tools: Tableau, Power BI, Matplotlib, Plotly, D3.js.
• Techniques: Bar charts, line graphs, scatter plots, heat maps, pie charts, dashboards.

Examples of Visualization Influencing Decision-Making:

• Sales Dashboards: A retail company uses bar charts and pie charts to monitor daily
sales performance, leading to adjustments in marketing strategies (a minimal chart sketch follows this list).
• Healthcare: Heat maps are used to visualize patient data across regions, helping
public health officials allocate resources more effectively.
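
A minimal sketch of how such a daily sales bar chart could be produced with Matplotlib (one of the tools listed above); the days and sales figures below are made-up illustrative values, not real data:

```python
import matplotlib.pyplot as plt

# Hypothetical daily sales figures for one week (illustrative values only)
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sales = [1200, 950, 1100, 1300, 1700, 2100, 1800]

plt.figure(figsize=(8, 4))
plt.bar(days, sales, color="steelblue")
plt.title("Daily Sales Performance")
plt.xlabel("Day of Week")
plt.ylabel("Sales (units)")
plt.tight_layout()
plt.show()
```

A dashboard tool such as Tableau or Power BI would present the same information interactively; the chart itself is the decision-support artifact either way.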
4. Comparison of Machine Learning and Deep Learning

Machine Learning (ML) and Deep Learning (DL) are both subsets of AI, but they differ in
their methodologies, applications, and challenges:

• Machine Learning involves algorithms that allow computers to learn patterns from
data and make predictions. It typically requires feature engineering, where human
experts identify the features that will help the model.

Applications: Fraud detection, customer segmentation, recommendation systems.

Challenges: ML models can be less effective with very large and complex datasets.

• Deep Learning is a subset of ML that uses neural networks with many layers (hence
"deep") to automatically learn features from raw data. It does not require manual
feature engineering and excels at processing unstructured data like images, audio, and
text.

Applications: Image recognition (e.g., facial recognition), language translation, self-driving cars.

Challenges: Deep learning requires massive amounts of data and computational power, making it resource-intensive.

Examples:

• Machine Learning: A spam filter in email systems uses supervised learning to
classify emails as spam or not based on certain features like keywords (a minimal sketch follows this list).
• Deep Learning: In self-driving cars, deep learning processes video data to identify
objects such as pedestrians, road signs, and vehicles.
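
A minimal sketch of the spam-filter example using scikit-learn (an assumed library choice, not named in the text above), with keyword counts as features and a naive Bayes classifier; the tiny training set is made up purely for illustration, and a real filter would need far more data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: 1 = spam, 0 = not spam
emails = [
    "Win a free prize now",
    "Meeting rescheduled to Monday",
    "Claim your free lottery reward",
    "Project report attached for review",
]
labels = [1, 0, 1, 0]

# Bag-of-words keyword counts feed a naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Free reward waiting for you"]))  # likely [1], i.e. spam
```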
5. Types of Learning Problems in Machine Learning

There are three main types of learning problems in Machine Learning:

• Supervised Learning: The model learns from labeled data, where the input-output
pairs are provided. The model maps inputs to known outputs (targets).

Example: Predicting house prices (input: features like location, size; output: price).

• Unsupervised Learning: The model learns from unlabeled data, trying to find
hidden patterns or structure in the data.

Example: Customer segmentation in marketing (clustering customers based on their
purchasing behavior without predefined labels); a minimal clustering sketch follows this list.

• Reinforcement Learning: The model learns by interacting with an environment and
receiving feedback in the form of rewards or penalties based on its actions.

Example: Training a robot to navigate through a maze, where the robot is rewarded
for reaching the end of the maze and penalized for hitting obstacles.
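
As referenced in the unsupervised learning example above, here is a minimal clustering sketch using scikit-learn's KMeans (an assumed library and algorithm choice); the customer features (annual spend, number of purchases) are made-up values for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer data: [annual spend, number of purchases]
customers = np.array([
    [200,  5], [250,  8], [220,  6],      # low spenders
    [1500, 40], [1600, 45], [1450, 38],   # high spenders
])

# Group customers into 2 segments without any predefined labels
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
segments = kmeans.fit_predict(customers)

print(segments)  # e.g. [0 0 0 1 1 1] -- two discovered customer segments
```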
6. Basic Steps in Building a Machine Learning Model

The process of building a Machine Learning model involves several key steps (an end-to-end sketch follows the list below):

1. Data Collection: Gathering relevant data from various sources.
2. Data Preprocessing: This includes:
o Cleaning: Handling missing or inconsistent data.
o Feature Selection: Choosing the most relevant features for the model.
o Normalization/Standardization: Scaling features to a common range for
consistent model behavior.
o Dimensionality Reduction: Techniques like Principal Component Analysis
(PCA) reduce the number of features while preserving the important
information.
o Feature Extraction: Transforming raw data into features that better represent
the problem.
3. Model Selection: Choosing the appropriate machine learning algorithm (e.g., decision
tree, neural network, support vector machine).
4. Training the Model: Feeding the model the training data so it can learn the
relationship between inputs and outputs.
5. Evaluation: Testing the model on unseen data to assess its accuracy, precision, recall,
etc.
6. Tuning: Adjusting model parameters (hyperparameters) to improve performance.
7. Deployment: Implementing the model in a real-world system for making predictions.
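
A minimal end-to-end sketch of these steps using scikit-learn (an assumed library choice) and its built-in Iris sample dataset, which stands in for real data collection; the decision tree and the hyperparameter grid are illustrative choices only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data collection (a sample dataset stands in for real data gathering)
X, y = load_iris(return_X_y=True)

# 2. Split the data so evaluation uses unseen examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3-4. Model selection and training
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Evaluation on unseen data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Tuning: search over hyperparameters such as tree depth
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid={"max_depth": [2, 3, 4, 5]}, cv=5)
search.fit(X_train, y_train)
print("Best depth:", search.best_params_)

# 7. Deployment would then expose the fitted model to a real system.
```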

Importance of Preprocessing (a minimal pipeline sketch follows these points):

• Preprocessing ensures that the data is in a suitable format for the model to learn
effectively.
• Dimensionality Reduction reduces noise and computational complexity, improving
model efficiency.
• Normalization ensures that no feature dominates due to its scale, helping models
converge faster.
• Feature Extraction improves model accuracy by providing relevant and high-quality
inputs.
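
A minimal sketch of the preprocessing ideas above, chaining standardization and PCA-based dimensionality reduction into a scikit-learn Pipeline (an assumed library choice); the Iris dataset is again used only as a stand-in for real project data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize features so no feature dominates due to its scale,
# then reduce the 4 original features to 2 principal components
# before fitting the model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("model", LogisticRegression(max_iter=200)),
])

pipeline.fit(X, y)
print("Training accuracy:", pipeline.score(X, y))
```

Keeping the preprocessing steps inside the pipeline ensures the same scaling and reduction are applied consistently at training time and at prediction time.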
