IAT 2 Part A - DS

The document discusses various concepts related to data science including data exploration, binning, tasks of data science, comparing data science and big data, data science lifecycle, features of data science, applications of data science, data sampling, outliers, variance and covariance, conditional probability, eigen values and eigen vectors, descriptive analysis, features and data types of R programming, operators in R, advantages and disadvantages of R, and history of R. It also provides examples of data science applications.

Uploaded by

21CSE015 Keerthana

Part-A_DS

1.Define Data Exploration.

Data exploration, also known as exploratory data analysis, involves computing descriptive statistics and
visualizing data to gain a comprehensive understanding of the dataset. It aims to reveal the structure,
distribution, presence of outliers, and inter-relationships within the data. Descriptive statistics such as mean,
median, mode, standard deviation, and range summarize key characteristics of data distribution. Visual plots,
on the other hand, offer an instant overview of all data points in a single chart, aiding in pattern recognition
and insights generation.
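The descriptive statistics named above can be sketched in a few lines of Python. This is a minimal illustration using the standard library only; the `scores` list is hypothetical data, not taken from the notes, and the standard deviation shown is the population form.

```python
import statistics

# Hypothetical sample of exam scores (illustrative data only).
scores = [52, 61, 61, 70, 74, 78, 85, 93]

# Collect the summary statistics mentioned in the definition.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "std_dev": statistics.pstdev(scores),  # population standard deviation
    "range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A histogram or box plot of the same list would give the visual counterpart of this summary.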
2.Define Binning.
Binning converts numeric values to a categorical data type by specifying a range of values for each
category. For example, a score between 400 and 500 can be encoded as “low”, with other ranges mapped to
other labels in the same way.
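As a rough sketch of binning in Python: only the 400 to 500 "low" range comes from the notes. The other cut points, the labels "medium" and "high", and the function name `bin_score` are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper edge of each bin except the last. Only "up to 500 is low" follows the
# notes; the 650 cut point and the other labels are assumed for this example.
EDGES = [500, 650]
LABELS = ["low", "medium", "high"]

def bin_score(score):
    """Return the category label for a numeric score.

    bisect_left counts how many edges are strictly below the score, so a
    score equal to an edge stays in the lower bin. Scores below 400 are
    also labeled "low" in this simplified sketch.
    """
    return LABELS[bisect_left(EDGES, score)]

for s in (450, 500, 580, 700):
    print(s, "->", bin_score(s))
```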
3.Tasks of data science.
Classification
Association analysis
Clustering
Regression
4.Compare data science and big data.
Below is a table of differences between Big Data and Data Science:

Data Science | Big Data
Data Science is an area. | Big Data is a technique to collect, maintain and process huge information.
It is about the collection, processing, analyzing, and utilizing of data in various operations. It is more conceptual. | It is about extracting vital and valuable information from a huge amount of data.
It is a field of study just like Computer Science, Applied Statistics, or Applied Mathematics. | It is a technique for tracking and discovering trends in complex data sets.
The goal is to build data-dominant products for a venture. | The goal is to make data more vital and usable, i.e. by extracting only important information from the huge data within existing traditional aspects.
Tools mainly used in Data Science include SAS, R, Python, etc. | Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.
It is a superset of Big Data, as data science consists of data scraping, cleaning, visualization, statistics, and many more techniques. | It is a subset of Data Science, as its mining activities form part of the data science pipeline.
It is mainly used for scientific purposes. | It is mainly used for business purposes and customer satisfaction.
It broadly focuses on the science of the data. | It is more involved with the processes of handling voluminous data.

5.Summarize DS lifecycle

6.Feature of data science.

7. Application of data science.
DATA SCIENCE APPLICATIONS AND EXAMPLES
• Healthcare: Data science can identify and predict disease, and personalize healthcare
recommendations.
• Transportation: Data science can optimize shipping routes in real-time.
• Sports: Data science can accurately evaluate athletes’ performance.
• Government: Data science can prevent tax evasion and predict incarceration rates.
• E-commerce: Data science can automate digital ad placement.
• Gaming: Data science can improve online gaming experiences.
• Social media: Data science can create algorithms to pinpoint compatible partners.
• Fintech: Data science can help create credit reports and financial profiles, run accelerated underwriting
and create predictive models based on historical payroll data.
8.Data sampling.

9.Outliers.
Outliers are anomalies within a dataset, arising from correct or erroneous data capture, such as
extremely high incomes or measurement errors. Understanding and addressing outliers is crucial as they can
distort the representativeness of models derived from the data. Detecting outliers is essential in applications
like fraud or intrusion detection, where anomalies may indicate significant events or issues.
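One common way to flag outliers is the interquartile range (IQR) rule. The notes do not name a specific method, so the rule, the 1.5 multiplier, and the income figures below are assumptions chosen for illustration.

```python
import statistics

# Hypothetical annual incomes in thousands; the last value is extreme.
incomes = [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]

# Quartiles split the sorted data into four equal parts (Python 3.8+).
q1, _, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1

# Conventional fences: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in incomes if x < lower_fence or x > upper_fence]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Whether a flagged value is an error or a genuine extreme (such as a very high income) still has to be judged from the context of the data.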
10.Compare variance and co-variance.
Variance:
The variance is the sum of the squared deviations of all data points from the mean, divided by the number
of data points. For a dataset with N observations x_1, ..., x_N and mean \mu, the variance is given by the
following equation:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Covariance:
The covariance explains how two variables vary with respect to their corresponding mean values: if
both variables tend to stay on the same side of their respective means, the covariance is positive; if not,
it is negative. In statistics, covariance is also used in the calculation of the correlation coefficient.
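The two definitions can be compared side by side in a short Python sketch. It divides by the number of data points, as the variance definition in these notes does; the `hours` and `marks` lists and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Average squared deviation from the mean (divides by N)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def covariance(xs, ys):
    """Average product of paired deviations from each variable's mean."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: marks rise with hours studied.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 6, 8, 10]

print("variance of hours:", variance(hours))
print("covariance (same direction):", covariance(hours, marks))
print("covariance (opposite direction):", covariance(hours, marks[::-1]))
```

Variance describes the spread of one variable, while covariance describes how two variables move together; the sign flips when one variable falls as the other rises.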
11.Define conditional probability.

12.Eigen values and Eigen vectors.

13.Descriptive analysis.
Descriptive analytics is the statistical interpretation of historical data to identify patterns and
relationships. It seeks to describe an event, phenomenon, or outcome. It helps businesses understand
what has happened in the past and gives them a base for tracking trends.
14.Feature of R programming.
15.Data types of R.

16.Operators of R.
1. Arithmetic Operators
2. Assignment Operators
3. Relational Operators
4. Logical Operators
5. Miscellaneous Operators

17.Advantage of R.
1. Extensive Statistical Analysis Capabilities
2. Rich Data Visualization Tools
3. Large and Active Community Support
4. Free and Open Source
5. Wide Range of Packages and Extensions
6. Integration with Other Languages and Tools
7. Cross-Platform Compatibility
8. Reproducible Research Environment
18.Disadvantage of R.
1. Steep Learning Curve
2. Memory Management
3. Single-threaded
4. Data Size Limitations
5. Limited Support for Object-Oriented Programming
6. Package Dependency Management
19.History of R.
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach
introductory statistics at the University of Auckland. The language was inspired by the S programming
language, with most S programs able to run unaltered in R.
20.Problems.

Hi Soldiers! By.Premkumar,Ramkishan,Subbiah
