\section{Descriptive statistics}
\label{sec:stats_descriptives}
The first step in any statistical analysis should be to describe,
characterize and, importantly, visualize your data. The normal
distribution (aka Gaussian or bell curve) lies at the heart of much of
formal statistical analysis, and normal distributions have the tidy
property that they are completely characterized by their mean and
variance. As you may have observed in your interactions with family
and friends, most of the world is not normal, and many statistical
analyses are flawed because they summarize data with just the mean and
standard deviation (square root of variance) and associated
significance tests (e.g., the $t$-test) as if the data were normally
distributed.
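
To make the point concrete, here is a minimal sketch (not part of the
workbook's exercise code) that draws two samples with approximately the
same mean and standard deviation, one normal and one exponential, and
prints a few summary numbers for each. The helper \texttt{skewness},
the seed and the sample size are choices made for this illustration
only.
\begin{lstlisting}
import numpy as npy

rng = npy.random.default_rng(0)   # fixed seed so the run is repeatable
n = 100000                        # illustrative sample size

# Both distributions have population mean 1 and standard deviation 1
normal = rng.normal(loc=1.0, scale=1.0, size=n)
skewed = rng.exponential(scale=1.0, size=n)

def skewness(x):
    """Sample skewness: the mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

for name, x in [('normal', normal), ('exponential', skewed)]:
    print('%-12s mean=%.2f std=%.2f median=%.2f skew=%.2f' % (
        name, x.mean(), x.std(), npy.median(x), skewness(x)))
\end{lstlisting}
The mean and standard deviation alone cannot tell the two samples
apart, while the median and the skewness can; a histogram makes the
difference obvious at a glance.
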

In the exercise below, we write a class that provides descriptive
statistics of a data set passed into the constructor, with methods to
pretty-print the results and to create a battery of standard plots,
which may show structure missed in a casual analysis.

Many new programmers, or even experienced programmers used to a
procedural environment, are uncomfortable with the idea of classes,
having heard their geekier programmer friends talk about them but not
really sure what to do with them. There are many interesting things
one can do with classes (aka object-oriented programming), but at their
heart they are a way of bundling data with methods that operate on that
data. The \texttt{self} variable is special in Python and is how the
class refers to its own data and methods. Here is a toy example:
\begin{lstlisting}
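In [114]: import numpy as npy
In [115]: class MyData:
   .....:     def __init__(self, x):
   .....:         self.x = x      # store the data on the instance
   .....:     def sumsquare(self):
   .....:         return (self.x**2).sum()
   .....:
   .....:
In [116]: nse = npy.random.rand(100)
In [117]: mydata = MyData(nse)    # bundle the data with its methods
In [118]: mydata.sumsquare()
Out[118]: 29.6851135284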
\end{lstlisting}
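The same pattern scales up naturally. The following is a minimal
sketch, assuming a one-dimensional numeric data set, of a class that
stores the data once and exposes several statistics through methods.
The class name \texttt{Summary} and its methods \texttt{stats} and
\texttt{report} are hypothetical names chosen for this illustration;
this is not the workbook's solution.
\begin{lstlisting}
import numpy as npy

class Summary:
    """Hypothetical helper: bundle a 1-D data set with a few statistics."""

    def __init__(self, x):
        # Store the data as a float array so the methods below can share it
        self.x = npy.asarray(x, dtype=float)

    def stats(self):
        """Return the summary statistics as a dictionary."""
        x = self.x
        return {
            'n': x.size,
            'mean': x.mean(),
            'median': npy.median(x),
            'std': x.std(ddof=1),   # sample standard deviation
            'min': x.min(),
            'max': x.max(),
        }

    def report(self):
        """Return the statistics as a string, one 'name = value' per line."""
        return '\n'.join('%-6s = %g' % (k, v)
                         for k, v in self.stats().items())

s = Summary([1.0, 2.0, 3.0, 4.0, 10.0])
print(s.report())
\end{lstlisting}
Because every method reads \texttt{self.x}, the caller never has to
pass the data around again after constructing the object.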
\lstinputlisting[label=code:stats_descriptives,caption={IGNORED}]{problems/stats_descriptives.py}
\begin{figure}
\begin{centering}\includegraphics[width=4in]{fig/stats_descriptives}\par\end{centering}
\caption{\label{fig:stats_descriptives}}
\end{figure}