Fundamentals of Data Science and Analytics - AD3491 - Important Questions With Answer - Unit 1 - Introduction To Data Science
www.BrainKart.com
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS UNIT 1
PART A
1. What is Big Data?
Big data refers to data of such huge volume, high velocity and wide variety
that it cannot be processed by traditional data-processing systems.
It is characterized by the 7 Vs: velocity, variety, volume, variability,
visualization, value and veracity.
https://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes
PART B
1. Describe data science and its applications. Also discuss the benefits
and uses of data science and big data.
Contents
Big Data
Data Science
Benefits and Uses:
1. Commercial Companies
2. Human Resource Professionals
3. Financial Institutions
4. Government Organizations
5. Non-governmental organizations (NGOs)
6. Universities
Data Science Tools
Real Time Applications of Data Science
Data
Data is a collection of discrete values that convey information,
describing quantity, quality, facts and statistics.
Big data
Big data refers to data of such huge volume, high velocity and wide
variety that it cannot be processed by traditional data-processing systems.
It is characterized by the 7 Vs: velocity, variety, volume,
variability, visualization, value and veracity.
Data science
Data science is the field of study of data that uses modern scientific
techniques, statistical methods and algorithms to derive insights
from huge volumes of data and to create business and IT strategies.
It deals with where the data comes from, what it represents, and
the ways in which it can be transformed into valuable inputs and
resources.
2. List and explain the facets of data (the different types or categories
of data).
Contents
1. Structured
2. Unstructured
3. Natural Language
4. Machine-generated
5. Graph-based
6. Audio, video, and images
7. Streaming
Categories of data:
1. Structured data
Structured data is data that depends on a data model and resides in a
fixed field within a record.
It’s easy to store structured data in tables within databases or Excel files.
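A minimal sketch of structured data, assuming a hypothetical `employees` table: with Python's built-in `sqlite3` module, every record must fit the same fixed fields defined by the schema (the data model), which is what makes filtering by field straightforward.

```python
import sqlite3

# In-memory database; the schema is the "data model" every record must fit.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Hypothetical sample records: each value lands in a fixed field.
rows = [
    (1, "Asha", "Sales", 42000.0),
    (2, "Ravi", "IT", 55000.0),
    (3, "Meena", "IT", 61000.0),
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# Because the structure is known in advance, querying by field is simple.
it_staff = conn.execute(
    "SELECT name FROM employees WHERE dept = ? ORDER BY id", ("IT",)
).fetchall()
print(it_staff)  # [('Ravi',), ('Meena',)]
```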
2. Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because
its content is context-specific or varying.
Example - a regular email (Figure 1.2).
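To make the contrast concrete, here is a small sketch using Python's standard `email` module on a made-up message (the addresses and text are hypothetical). The header fields can be extracted reliably, but the body is free text whose meaning depends on context and does not map onto fixed columns.

```python
from email import message_from_string

# A hypothetical raw email: a few header fields plus free-form text.
raw = """From: priya@example.com
To: team@example.com
Subject: Meeting moved

Hi all, tomorrow's review is pushed to 3 pm because
the client call overran. Bring the Q2 numbers if you can!
"""

msg = message_from_string(raw)

# The header fields are easy to pull out...
sender = msg["From"]
subject = msg["Subject"]

# ...but the body is free text; its meaning depends on context,
# so it cannot simply be stored as fixed table fields.
body = msg.get_payload()
print(sender, "|", subject)
print(body)
```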
3. Natural language
Natural language is a special type of unstructured data; it’s
challenging to process because it requires knowledge of specific data
science techniques and linguistics.
The natural language processing community has had success in entity
recognition, topic recognition, summarization, text completion, and
sentiment analysis, but models trained in one domain don’t generalize
well to other domains.
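The domain problem can be illustrated with a deliberately simple, lexicon-based sentiment scorer. This is a toy stand-in for a real NLP model, and the word weights are invented for illustration: the word "unpredictable" is praise in a movie review but a complaint in a car review, so a lexicon built from movie reviews misreads the car review.

```python
# A toy sentiment lexicon "learned" from movie reviews.
# The word weights are hypothetical, made up for illustration.
MOVIE_LEXICON = {
    "gripping": 1, "unpredictable": 1, "boring": -1, "predictable": -1,
}

def sentiment(text: str, lexicon: dict[str, int]) -> int:
    """Sum the weights of known words; >0 means positive, <0 negative."""
    words = text.lower().replace(".", "").split()
    return sum(lexicon.get(w, 0) for w in words)

# In the movie domain, "unpredictable" is praise.
movie_score = sentiment("An unpredictable and gripping plot.", MOVIE_LEXICON)

# In a car review the same word is a complaint, yet the movie
# lexicon still scores it as positive: the model does not transfer.
car_score = sentiment("The steering is unpredictable.", MOVIE_LEXICON)
print(movie_score, car_score)  # 2 1
```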
4. Machine-generated data
Machine-generated data is information that’s automatically created
by a computer, process, application, or other machine without human
intervention.
The analysis of machine data relies on highly scalable tools, due to its
high volume and speed.
Examples - web server logs, call detail records, network event logs, and
telemetry (Figure 1.3).
The machine data in figure 1.3 would fit nicely in a classic table-
structured database.
This isn’t the best approach for highly interconnected or “networked”
data, where the relationships between entities have a valuable role to
play.
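As a sketch of how machine-generated log lines map onto table rows, the snippet below parses web server log lines with a regular expression. It assumes the Apache-style Common Log Format; the sample lines are hypothetical, and real servers can be configured to log different fields.

```python
import re

# Pattern for Common Log Format lines (an assumption about the format;
# real servers may log other fields or use a different layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample lines, as a web server would write them.
lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "POST /login HTTP/1.1" 401 -',
]

# Each line becomes one row with fixed fields, i.e. a table record.
rows = []
for line in lines:
    match = LOG_PATTERN.match(line)
    if match:
        row = match.groupdict()
        row["status"] = int(row["status"])
        row["size"] = 0 if row["size"] == "-" else int(row["size"])
        rows.append(row)

for row in rows:
    print(row["host"], row["method"], row["path"], row["status"], row["size"])
```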
5. Graph-based or network data
“Graph” points to mathematical graph theory.
In graph theory, a graph is a mathematical structure to model pair-
wise relationships between objects.
Graph or network data is data that focuses on the relationships or
adjacency of objects.
Graph structures use nodes, edges, and properties to represent and
store graph data.
Graph databases are used to store graph-based data and are queried with
specialized query languages such as SPARQL.
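A real graph database and a language such as SPARQL are beyond a short example, but the underlying idea of nodes, edges and edge properties can be sketched in plain Python with an adjacency list. The people, edges and the `since` property are hypothetical; the query finds "friends of friends", a question that is natural on a graph but would need a self-join in a relational table.

```python
from collections import defaultdict

# Hypothetical social network: nodes are people, edges are "knows"
# relationships, and each edge carries a property (the year it started).
edges = [
    ("Anu", "Bala", 2019),
    ("Anu", "Chitra", 2021),
    ("Bala", "Dev", 2020),
    ("Chitra", "Dev", 2022),
    ("Dev", "Esha", 2018),
]

# Undirected adjacency list: node -> {neighbour: edge properties}
graph = defaultdict(dict)
for a, b, since in edges:
    graph[a][b] = {"since": since}
    graph[b][a] = {"since": since}

def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = set(graph[person])
    two_hop = {fof for friend in direct for fof in graph[friend]}
    return sorted(two_hop - direct - {person})

print(friends_of_friends("Anu"))  # ['Dev']
```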
6. Audio, image, and video
Audio, image, and video are data types that pose specific challenges to
a data scientist.
Tasks that are trivial for humans, such as recognizing objects in pictures,
turn out to be challenging for computers.
High-speed cameras at stadiums will capture ball and athlete movements
to calculate in real time, for example, the path taken by a defender
relative to two baselines.
A company called DeepMind succeeded in creating an algorithm
that is capable of learning how to play video games.
This algorithm takes the video screen as input and learns to interpret
everything via a complex process of deep learning.
This prompted Google to buy the company for its own Artificial
Intelligence (AI) development plans.
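Why is recognizing objects in pictures hard for a computer? A rough illustration, using a made-up 5x5 grayscale image: to the machine an image is only a grid of pixel values, so a picture a human sees as "the same thing" can look very different numerically. This only illustrates the difficulty; it is not how recognition systems actually work.

```python
# A tiny hypothetical 5x5 grayscale "image": 0 = black, 255 = white.
# A human sees a vertical bar; the computer sees only a grid of numbers.
image = [
    [0, 255, 0, 0, 0],
    [0, 255, 0, 0, 0],
    [0, 255, 0, 0, 0],
    [0, 255, 0, 0, 0],
    [0, 255, 0, 0, 0],
]

# The same bar moved one pixel to the right (each row rotated by one).
shifted = [row[-1:] + row[:-1] for row in image]

def pixel_difference(a, b):
    """Count pixels whose values differ between two same-sized images."""
    return sum(
        1 for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b) if pa != pb
    )

# Both pictures show "a bar" to a human, yet 10 of 25 pixels changed,
# so a naive pixel-by-pixel comparison calls them quite different.
print(pixel_difference(image, shifted))  # 10
```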
7. Streaming data
Streaming data flows into the system continuously, as events happen,
instead of being loaded into a data store in a batch.
Examples - “What’s trending” on Twitter, live sporting or music events,
and the stock market.
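A minimal sketch of stream processing, assuming a hypothetical stream of hashtags: each event is handled as it arrives, and only a sliding window of recent events is kept in memory, loosely in the spirit of a "What's trending" feature. The function name `trending` and the window size are illustrative choices, not a specific platform's method.

```python
from collections import Counter, deque

def trending(events, window_size=5, top_n=1):
    """Process events one at a time as they arrive (no batch load),
    keeping counts only for the most recent `window_size` events."""
    window = deque()
    counts = Counter()
    for tag in events:
        window.append(tag)
        counts[tag] += 1
        if len(window) > window_size:
            # Forget the oldest event so memory stays bounded.
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        # Emit the current top hashtag(s) after every event.
        yield counts.most_common(top_n)

# A hypothetical stream; in practice this would be an unbounded
# source such as a network socket or message queue.
stream = iter(["#ipl", "#budget", "#ipl", "#rain", "#rain", "#rain", "#ipl"])

for snapshot in trending(stream, window_size=3):
    print(snapshot)
```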
The second step is to collect data: find suitable data and get
access to it from the data owner.
Start with data stored within the company
o The data can be stored in official data repositories such as
databases, data marts, data warehouses, and data lakes
maintained by a team of IT professionals.
o The primary goal of a database is data storage, while a data
warehouse is designed for reading and analyzing that data.
o A data mart is a subset of the data warehouse and geared toward
serving a specific business unit.
o While data warehouses and data marts are home to preprocessed
data, data lakes contain data in its natural or raw format, which
probably needs polishing and transformation before it becomes
usable.
Don’t be afraid to shop around
o Many companies specialize in collecting valuable information.
o Data can also be delivered by third-party companies and can take
many forms, ranging from Excel spreadsheets to different types of
databases. Refer to Table 1.2.
3 Data preparation
Common Errors
Table 1.3 – Common Errors
Example:
2. Appending or stacking:
Appending or stacking tables is effectively adding observations
from one table to another table.
The equivalent operation in set theory is the union, and UNION is
also the corresponding command in SQL, the common language of
relational databases.
Other set operators are also used in data science, such as set
difference and intersection.
Example:
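The appending and set operations described above can be sketched with Python's built-in `sqlite3` module on two hypothetical monthly sales tables. Note that in SQL, `UNION ALL` stacks every row from both tables, while `UNION` also removes duplicate rows, matching the set-theory union; `INTERSECT` and `EXCEPT` give intersection and set difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical monthly sales tables with the same columns.
conn.execute("CREATE TABLE sales_jan (client TEXT, amount INTEGER)")
conn.execute("CREATE TABLE sales_feb (client TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales_jan VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.executemany("INSERT INTO sales_feb VALUES (?, ?)", [("B", 200), ("C", 150)])

def query(sql):
    return conn.execute(sql).fetchall()

# Appending/stacking: UNION ALL keeps every observation from both tables.
stacked = query("SELECT * FROM sales_jan UNION ALL SELECT * FROM sales_feb")

# Set union: UNION additionally removes duplicate rows.
union = query("SELECT * FROM sales_jan UNION SELECT * FROM sales_feb ORDER BY client")

# Other set operators: intersection and set difference.
both = query("SELECT * FROM sales_jan INTERSECT SELECT * FROM sales_feb")
jan_only = query("SELECT * FROM sales_jan EXCEPT SELECT * FROM sales_feb")

print(len(stacked), union, both, jan_only)
```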
3. View
Views are a kind of virtual table.
A view can be created by selecting fields from one or more tables
present in the database.
A view can contain either all the rows of a table or only specific rows
that satisfy a certain condition.
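A minimal sketch of a view in SQLite, using a hypothetical `students` table: the view exposes selected fields and only the rows meeting a condition. Because it stores no data of its own, rows added to the base table later appear in the view automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table.
conn.execute("CREATE TABLE students (roll INTEGER, name TEXT, dept TEXT, cgpa REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Kavya", "AIDS", 8.9), (2, "Arun", "CSE", 7.4), (3, "Nila", "AIDS", 9.2)],
)

# The view selects specific fields and only rows meeting a condition.
# No data is copied: the view's query runs against the base table.
conn.execute(
    "CREATE VIEW aids_students AS "
    "SELECT roll, name, cgpa FROM students WHERE dept = 'AIDS'"
)
print(conn.execute("SELECT name FROM aids_students ORDER BY roll").fetchall())

# Because the view is virtual, new base-table rows show up automatically.
conn.execute("INSERT INTO students VALUES (4, 'Ravi', 'AIDS', 8.1)")
count = conn.execute("SELECT COUNT(*) FROM aids_students").fetchone()[0]
print(count)  # 3
```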
4 Data transformation
Certain models require their data to be in a certain shape.
Data transformation ensures that the data is in a suitable format for
use in data models.
Relationships between an input variable and an output variable aren’t
always linear.
For example, when the output varies with the logarithm of an input,
taking the log of the independent variable makes the relationship
linear and simplifies the estimation problem dramatically.
Example – Refer to Figure 1.9
Histogram
In a histogram, a variable is cut into discrete categories (bins), and
the number of occurrences in each category is summed up and shown in
the graph.
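A minimal sketch of the binning step behind a histogram, using hypothetical exam marks and equal-width bins of 10. Each mark is mapped to the lower edge of its bin, the occurrences per bin are counted, and a text bar stands in for the plotted graph.

```python
from collections import Counter

WIDTH = 10  # bin width (an illustrative choice)

def histogram(values, bin_width):
    """Cut a numeric variable into equal-width bins and count how many
    observations fall into each bin (keyed by the bin's lower edge)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical exam marks for a small class.
marks = [35, 42, 47, 51, 55, 58, 61, 64, 66, 68, 72, 75, 81, 93]

bins = histogram(marks, WIDTH)
for start, count in bins.items():
    # Text bars stand in for the plotted histogram.
    print(f"{start:>3}-{start + WIDTH - 1:<3} {'#' * count}")
```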