IAT 2 Part A - DS

The document discusses various concepts related to data science including data exploration, binning, tasks of data science, comparing data science and big data, data science lifecycle, features of data science, applications of data science, data sampling, outliers, variance and covariance, conditional probability, eigen values and eigen vectors, descriptive analysis, features and data types of R programming, operators in R, advantages and disadvantages of R, and history of R. It also provides examples of data science applications.

Part-A_DS

1.Define Data Exploration.


Data exploration, also known as exploratory data analysis, involves computing descriptive statistics and
visualizing data to gain a comprehensive understanding of the dataset. It aims to reveal the structure,
distribution, presence of outliers, and inter-relationships within the data. Descriptive statistics such as mean,
median, mode, standard deviation, and range summarize key characteristics of data distribution. Visual plots,
on the other hand, offer an instant overview of all data points in a single chart, aiding in pattern recognition
and insights generation.
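As a rough illustration of the descriptive statistics named above, the following Python sketch summarizes a small list of scores with the standard-library statistics module. The data values and variable names are made up for illustration and are not from the source; plotting is left out.

```python
import statistics

# Hypothetical sample of scores (illustrative data, not from the source).
scores = [410, 455, 455, 520, 610, 640, 700]

# Descriptive statistics that summarize the distribution.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "std_dev": statistics.pstdev(scores),  # population standard deviation
    "range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```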
2.Define Binning.
Numeric values can be converted to categorical data types by a technique called binning, where a range
of values is specified for each category. For example, a score between 400 and 500 can be encoded as “low”,
with further ranges mapped to other categories in the same way.
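A minimal Python sketch of binning is shown below. Only the 400 to 500 range labelled "low" comes from the text; the other ranges, their labels, the half-open boundary handling, and the names BINS and to_category are assumptions made for illustration.

```python
# Each bin is (lower bound, upper bound, label); lower is inclusive, upper is exclusive.
BINS = [
    (400, 500, "low"),     # range taken from the text
    (500, 600, "medium"),  # assumed for illustration
    (600, 700, "high"),    # assumed for illustration
]


def to_category(score):
    """Return the label of the bin that contains the numeric score."""
    for lower, upper, label in BINS:
        if lower <= score < upper:
            return label
    return "out of range"  # score falls outside every defined bin


scores = [410, 499, 500, 650, 720]
labels = [to_category(s) for s in scores]
print(labels)
```

With these assumed cut points, a score of exactly 500 falls into "medium" because each upper bound is exclusive; a different convention would work equally well as long as it is applied consistently.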
3.Tasks of data science.
Classification
Association analysis
Clustering and regression
4.Compare data science and big data.
Below is a table of differences between Big Data and Data Science:

| Data Science | Big Data |
| --- | --- |
| Data Science is an area. | Big Data is a technique to collect, maintain and process huge information. |
| It is about the collection, processing, analyzing, and utilizing of data in various operations. It is more conceptual. | It is about extracting vital and valuable information from a huge amount of data. |
| It is a field of study just like Computer Science, Applied Statistics, or Applied Mathematics. | It is a technique for tracking and discovering trends in complex data sets. |
| The goal is to build data-dominant products for a venture. | The goal is to make data more vital and usable, i.e. by extracting only important information from the huge data within existing traditional aspects. |
| Tools mainly used in Data Science include SAS, R, Python, etc. | Tools mostly used in Big Data include Hadoop, Spark, Flink, etc. |
| It is a superset of Big Data, as data science consists of data scraping, cleaning, visualization, statistics, and many more techniques. | It is a subset of Data Science, as its mining activities sit in the Data Science pipeline. |
| It is mainly used for scientific purposes. | It is mainly used for business purposes and customer satisfaction. |
| It broadly focuses on the science of the data. | It is more involved with the processes of handling voluminous data. |


5.Summarize DS lifecycle

6.Features of data science.


7.Applications of data science.
DATA SCIENCE APPLICATIONS AND EXAMPLES
• Healthcare: Data science can identify and predict disease, and personalize healthcare
recommendations.
• Transportation: Data science can optimize shipping routes in real-time.
• Sports: Data science can accurately evaluate athletes’ performance.
• Government: Data science can prevent tax evasion and predict incarceration rates.
• E-commerce: Data science can automate digital ad placement.
• Gaming: Data science can improve online gaming experiences.
• Social media: Data science can create algorithms to pinpoint compatible partners.
• Fintech: Data science can help create credit reports and financial profiles, run accelerated underwriting
and create predictive models based on historical payroll data.
8.Data sampling.

9.Outliers.
Outliers are anomalies within a dataset, arising from correct or erroneous data capture, such as
extremely high incomes or measurement errors. Understanding and addressing outliers is crucial as they can
distort the representativeness of models derived from the data. Detecting outliers is essential in applications
like fraud or intrusion detection, where anomalies may indicate significant events or issues.
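The text does not name a detection method. One common rule of thumb, used here purely as an assumption, flags values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The income figures below are made up for illustration.

```python
import statistics

# Hypothetical incomes in thousands; the last value is an extreme entry.
incomes = [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in incomes if x < lower_fence or x > upper_fence]
inliers = [x for x in incomes if lower_fence <= x <= upper_fence]

print("outliers:", outliers)
print("mean with outliers:", statistics.mean(incomes))
print("mean without outliers:", statistics.mean(inliers))
```

Comparing the two means shows how a single extreme value can pull a summary statistic away from the bulk of the data, which is the distortion described above.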
10.Compare variance and co-variance.
Variance:
The variance is the sum of the squared deviations of all data points from the mean, divided by the
number of data points. For a dataset with N observations x_1, ..., x_N and mean \mu, the variance is given by
the following equation:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Covariance:
The covariance explains how two variables vary with respect to their corresponding mean values: if
both variables tend to stay on the same side of their respective means, the covariance is positive; if they
tend to fall on opposite sides, it is negative. In statistics, covariance is also used in the calculation of the
correlation coefficient.
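The two definitions can be sketched in plain Python as follows. Both functions divide by N (the population form), which matches the variance description in the text but is an assumption for covariance, since no formula is given for it. The sample data and function names are illustrative only.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def covariance(xs, ys):
    """Average product of paired deviations from each variable's mean."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical paired observations (not from the source).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 68, 74]

print("variance of hours:", variance(hours))
print("covariance, rising marks:", covariance(hours, marks))
print("covariance, falling marks:", covariance(hours, marks[::-1]))
```

When marks rise with hours, both variables sit on the same side of their means together and the covariance is positive; reversing the marks flips the sign.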
11.Define conditional probability.

12.Eigen values and Eigen vectors.

13.Descriptive analysis.
Descriptive analytics is the statistical interpretation of historical data to identify patterns and
relationships. It seeks to describe an event, phenomenon, or outcome. It helps businesses understand
what has happened in the past and gives them a baseline for tracking trends.
14.Features of R programming.
15.Data types of R.

16.Operators of R.
1. Arithmetic Operators
2. Assignment Operators
3. Relational Operators
4. Logical Operators
5. Miscellaneous Operators

17.Advantages of R.
1. Extensive Statistical Analysis Capabilities
2. Rich Data Visualization Tools
3. Large and Active Community Support
4. Free and Open Source
5. Wide Range of Packages and Extensions
6. Integration with Other Languages and Tools
7. Cross-Platform Compatibility
8. Reproducible Research Environment
18.Disadvantages of R.
1. Steep Learning Curve
2. Memory Management
3. Single-threaded
4. Data Size Limitations
5. Limited Support for Object-Oriented Programming
6. Package Dependency Management
19.History of R.
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach
introductory statistics at the University of Auckland. The language was inspired by the S programming
language, with most S programs able to run unaltered in R.
20.Problems.

Hi Soldiers! By Premkumar, Ramkishan, Subbiah
