IAT 2 Part A - DS
Data Science | Big Data
It is a field of study, just like Computer Science, Applied Statistics, or Applied Mathematics. | It is a technique for tracking and discovering trends in complex data sets.
Tools mainly used in Data Science include SAS, R, Python, etc. | Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.
5. Summarize the DS lifecycle.
9. Outliers.
Outliers are data points that deviate markedly from the rest of a dataset. They can arise from correctly captured but unusual values (such as extremely high incomes) or from erroneous data capture (such as measurement errors). Understanding and addressing outliers is crucial because they can distort the models derived from the data, making those models less representative. Detecting outliers is essential in applications such as fraud or intrusion detection, where anomalies may indicate significant events or problems.
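One common way to flag outliers is the interquartile range (IQR) rule, also called Tukey's fences. The notes do not name a specific method, so this is a minimal Python sketch under that assumption; the function name `iqr_outliers`, the factor `k = 1.5`, and the income figures are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical incomes: one extremely high value among ordinary ones
incomes = [32_000, 35_000, 38_000, 40_000, 41_000, 43_000, 45_000, 1_200_000]
print(iqr_outliers(incomes))  # [1200000]
```

The IQR rule is used here because quartiles are barely affected by the extreme value itself, whereas a mean-and-standard-deviation rule can be pulled toward the outlier it is trying to detect.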
10. Compare variance and covariance.
Variance:
The variance is the sum of the squared deviations of all data points from their mean, divided by the number of data points. For a dataset with N observations x_1, ..., x_N and mean μ, the variance is given by the following equation:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Covariance:
The covariance describes how two variables vary with respect to their corresponding mean values. If both variables tend to be on the same side of their respective means at the same time, the covariance is positive; if they tend to be on opposite sides, it is negative. (In statistics, covariance is also used in the calculation of the correlation coefficient.)
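To make the comparison concrete, here is a minimal Python sketch of population variance (dividing by N, as in the definition above), population covariance, and the Pearson correlation coefficient built from them. The function names and the sample data (`hours`, `score`, `errors`) are hypothetical.

```python
import math

def variance(xs):
    """Population variance: average squared deviation from the mean (divides by N)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance: average product of paired deviations from each mean."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

hours = [1, 2, 3, 4, 5]
score = [50, 54, 60, 62, 74]   # tends to rise with hours
errors = [10, 8, 6, 4, 2]      # falls as hours rise

print(variance(hours))              # 2.0
print(covariance(hours, score))     # 11.2 (positive: same side of the means)
print(covariance(hours, errors))    # -4.0 (negative: opposite sides)
print(correlation(hours, errors))   # -1.0
```

Variance describes the spread of one variable, while covariance describes how two variables move together; note that `variance(xs)` equals `covariance(xs, xs)`. Dividing covariance by both standard deviations removes the units, which is why correlation always falls between -1 and 1.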
11. Define conditional probability.
13. Descriptive analysis.
Descriptive analytics is a statistical approach used to analyze historical data to identify patterns and relationships. It seeks to describe an event, phenomenon, or outcome. It helps explain what has happened in the past and gives businesses a solid base for tracking trends.
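As an illustration, descriptive analysis often starts by summarizing historical values with a few statistics and looking at period-to-period changes. This is a minimal Python sketch assuming a hypothetical list of monthly sales figures; the function name `describe` and the chosen statistics are illustrative, not a fixed standard.

```python
import statistics

def describe(values):
    """Summarize historical values with common descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        # change from each period to the next, useful for spotting trends
        "changes": [b - a for a, b in zip(values, values[1:])],
    }

monthly_sales = [100, 110, 130, 120, 150, 170]  # hypothetical data
summary = describe(monthly_sales)
print(summary["mean"], summary["median"])  # 130 125.0
print(summary["changes"])                  # [10, 20, -10, 30, 20]
```

Such a summary only describes what happened; explaining why it happened or predicting what comes next belongs to diagnostic and predictive analytics.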
14. Features of R programming.
15. Data types of R.
16. Operators in R.
1. Arithmetic Operators
2. Assignment Operators
3. Relational Operators
4. Logical Operators
5. Miscellaneous Operators
17. Advantages of R.
1. Extensive Statistical Analysis Capabilities
2. Rich Data Visualization Tools
3. Large and Active Community Support
4. Free and Open Source
5. Wide Range of Packages and Extensions
6. Integration with Other Languages and Tools
7. Cross-Platform Compatibility
8. Reproducible Research Environment
18. Disadvantages of R.
1. Steep Learning Curve
2. Memory Management
3. Single-threaded
4. Data Size Limitations
5. Limited Support for Object-Oriented Programming
6. Package Dependency Management
19. History of R.
R was started by professors Ross Ihaka and Robert Gentleman at the University of Auckland as a programming language for teaching introductory statistics. The language was inspired by the S programming language, and most S programs can run unaltered in R.
20. Problems.
Hi Soldiers! By Premkumar, Ramkishan, Subbiah