Data Analysis
Jhansi Lakshmi K P
Assistant Professor
Department of Mechanical Engineering
GEC, Haveri
According to Forbes, the data analytics profession is exploding. The United States Bureau of Labor Statistics
forecasts robust growth for data science jobs and skills, and predicts that the data science field will
grow about 28 percent through 2026. Amstat.org backs up these predictions, reporting that, by the end of 2021,
almost 70 percent of business leaders surveyed will look for prospective job candidates who have data skills.
Starting off as a Data Analyst, you can quickly move into Senior Analyst, then Analytics Manager, Director of
Analytics, or even Chief Data Officer (CDO).
Data is defined as "information, especially facts or numbers, collected to be examined and considered and used
to help decision making, or information in an electronic form that can be stored and used."
Data analysis is the process of cleaning, changing, and processing raw data, and extracting actionable, relevant
information that helps businesses make informed decisions. The procedure helps reduce the risks inherent in
decision-making by providing useful insights and statistics, often presented in charts, images, tables, and
graphs.
The descriptive analysis tool generates a report of univariate statistics for the data in the input range, providing
information about the central tendency and variability of the data.
The different measures that can be found in descriptive analysis are:
Standard error: The standard error of the mean, or simply standard error, indicates how different the
population mean is likely to be from a sample mean. It tells you how much the sample mean would
vary if you were to repeat a study using new samples from within a single population.
Standard deviation: The standard deviation is the average amount of variability in your dataset. It
tells you, on average, how far each value lies from the mean. A high standard deviation means that
values are generally far from the mean, while a low standard deviation indicates that values are clustered
close to the mean.
Sample variance: The sample variance is the sum of the squared differences between each data point and
the sample mean, divided by one less than the number of data points. It is an absolute measure of
dispersion and is used to check the deviation of data points with respect to the data's average.
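The three measures above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the output of any particular descriptive analysis tool. The function name describe and the sample values are hypothetical.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample variance: sum of squared deviations from the mean, divided by n - 1.
    variance = statistics.variance(values)
    # Standard deviation: square root of the sample variance.
    std_dev = math.sqrt(variance)
    # Standard error of the mean: standard deviation divided by sqrt(n).
    std_err = std_dev / math.sqrt(n)
    return {
        "count": n,
        "mean": mean,
        "sample_variance": variance,
        "standard_deviation": std_dev,
        "standard_error": std_err,
    }


# Hypothetical sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Note that the standard error shrinks as the sample grows, because the standard deviation is divided by the square root of the sample size.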
Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal
distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
In finance, kurtosis is used as a measure of financial risk. A large kurtosis is associated with a high risk for an
investment because it indicates high probabilities of extremely large and extremely small returns. On the other hand, a
small kurtosis signals a moderate level of risk because the probabilities of extreme returns are relatively low.
A skewed distribution occurs when one tail is longer than the other. Skewness defines the asymmetry of a
distribution. Unlike the familiar normal distribution with its bell-shaped curve, these distributions are
asymmetric. The two halves of the distribution are not mirror images because the data are not distributed
equally on both sides of the distribution’s peak.
A general guideline for skewness is that if the number is greater than +1 or lower than –1, this is an
indication of a substantially skewed distribution. For kurtosis (measured here as excess kurtosis, so that a
normal distribution scores 0), the general guideline is that if the number is greater than +1, the distribution
is too peaked. Likewise, a kurtosis of less than –1 indicates a distribution that is too flat. Distributions
exhibiting skewness and/or kurtosis that exceed these guidelines are considered non-normal.
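As a rough illustration of the guideline, the sketch below computes skewness and kurtosis from central moments and checks them against the ±1 limits. It assumes the simple moment-based formulas and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). Spreadsheet and statistics packages often apply small-sample corrections, so their figures may differ slightly. All function names and data values are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def exceeds_guideline(value, limit=1.0):
    """True if the statistic falls outside the -limit..+limit range."""
    return abs(value) > limit


# Hypothetical data sets, used only for illustration.
symmetric = [1, 2, 3, 4, 5]      # evenly spread, no tail on either side
right_skewed = [1, 1, 1, 1, 10]  # one large value creates a long right tail

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    s = skewness(data)
    k = excess_kurtosis(data)
    print(name, "skewness:", round(s, 3), "substantially skewed:", exceeds_guideline(s))
    print(name, "excess kurtosis:", round(k, 3), "outside guideline:", exceeds_guideline(k))
```

With samples this small the statistics are unstable, so the check is best read as a screening rule rather than a formal test of normality.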
DIFFERENCE BETWEEN CORRELATION AND REGRESSION
Correlation Regression