HCI - Notes-Ch3

This document discusses data analysis techniques for clinical data, including descriptive statistics, inferential statistics, and machine learning methods. It covers summarizing categorical data using contingency tables and frequencies, summarizing numerical data using measures of central tendency and distribution shape. Key steps in a data science project like goal setting, data extraction, cleaning, feature engineering, model creation, and impact analysis are also outlined. Statistical analysis tools like Excel and Python are presented.

Uploaded by

Júlia Estorach Segarra


3. Clinical Data Analysis

• A Data Science Project
• Statistical Analysis of Health Care Data
– Descriptive Statistics
– Inferential Statistics
– Regression
• Artificial Intelligence Analysis of Health Care Data
– Unsupervised Machine Learning
– Supervised Machine Learning

1
A Data Science Project
• Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms and systems to extract
knowledge and insights from structured and
unstructured data. It unifies statistical data analysis, machine
learning-based data analysis, domain knowledge and their
related methods in order to understand and analyze actual
phenomena and to test hypotheses with data.

• A DS project follows these steps:

1. Goals and Objectives Setting
2. Data Extraction
3. Data Cleaning
4. Feature Engineering
5. Model Creation
6. Impact Analysis

2
A Data Science Project

A DS project follows these steps:

Goals and Objectives Setting:

1. Define the specific objectives of the project, such as reducing the death rate by a
certain percentage within a defined time frame.
2. Set the goal of building a predictive model to estimate patient survival
and implement proactive retention strategies.
Data Extraction:
1. Gather relevant data from various sources, such as patients' electronic health records (EHR).
2. Extract data related to patients' demographics, diseases, and clinical patterns.
Data Cleaning:
1. Identify and handle missing values, outliers, and inconsistencies in the dataset.
2. Standardize data formats, resolve discrepancies, and ensure data integrity.

3
A Data Science Project
A DS project follows these steps:

Feature Engineering:
1. Create new features from the existing dataset that can potentially enhance the
predictive power of the model, such as calculating the average usage over a specific
time period, frequency of interactions, or customer tenure.
2. Transform and preprocess data to make it suitable for the model, such as one-hot
encoding categorical variables and scaling numerical features.
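
The two preprocessing steps named above, one-hot encoding and scaling, can be sketched in plain Python as follows. The variable names (smoking_status, heart_rate) and their values are hypothetical, and z-score standardization is only one of several possible scaling choices.

```python
import statistics

def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical patient features (illustrative values only)
smoking_status = ["never", "former", "current", "never"]
heart_rate = [60, 70, 80, 90]

categories, encoded = one_hot(smoking_status)
scaled = standardize(heart_rate)

print(categories)   # column order of the encoded rows
print(encoded)
print(scaled)
```

Each categorical value becomes a row with a single 1 in the column of its category, and the scaled numerical feature has mean 0 and standard deviation 1.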
Model Creation:
1. Build a predictive model, such as a machine learning algorithm (e.g., logistic
regression, random forest, or neural network), to predict the likelihood of customer
churn based on the engineered features.
2. Train the model using historical data and evaluate its performance using appropriate
metrics, such as accuracy, precision, recall, and F1-score.
Impact Analysis:
1. Analyze the model's predictions and assess its effectiveness in identifying potential
churners.
2. Calculate the projected impact of implementing retention strategies based on the
model's predictions, such as estimated patient survival rate.
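
The evaluation metrics mentioned under Model Creation (accuracy, precision, recall and F1-score) can be computed from the counts of correct and incorrect predictions. The sketch below assumes a binary outcome coded as 1 (event) and 0 (no event); the labels in y_true and y_pred are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical true outcomes and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```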

4
Data Cleaning
• Data cleaning (or data cleansing) is the process of detecting
and correcting (or removing) corrupt or inaccurate records
from a record set, table, or database. (Source: Wikipedia)
• The main issues in data cleaning are:
– Missing Values: some data can be absent. For example, the blood pressure of some
patients can be unknown.
– Outliers: some data can be highly atypical. For example, some patients may be older
than 100.
– Errors: some data can be corrupt. For example, the heart rate can be recorded as null because
the monitoring machine disconnects while the patient is being moved.
– Duplicated Data: some data can be redundant. For example, we could have birth date,
age, and admission date (birth_date = admission_date – age).
– Pre-Calculation: some required data can be calculated from the available data. For
example, body mass index can be calculated from the patient’s height and weight
(BMI = weight (in kg) / height (in m)^2). These are dependent variables.
– Useless Features: some features can be irrelevant to the current DS project. For
example, some studies may not need the patient’s gender.
– Useless Cases: some cases can be irrelevant to the current DS project. For example,
pediatric studies may remove patients older than 18.
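
Several of the cleaning issues listed above can be sketched in a few lines of Python. The records and field names below are hypothetical, and dropping incomplete rows is only one possible way to handle missing values (imputation is another).

```python
# Hypothetical raw records; field names are illustrative, not from the notes.
records = [
    {"id": 1, "age": 45, "height_m": 1.70, "weight_kg": 72.0, "systolic_bp": 130},
    {"id": 2, "age": 12, "height_m": 1.50, "weight_kg": 40.0, "systolic_bp": 110},
    {"id": 2, "age": 12, "height_m": 1.50, "weight_kg": 40.0, "systolic_bp": 110},
    {"id": 3, "age": 9, "height_m": 1.30, "weight_kg": 28.0, "systolic_bp": None},
]

def clean(records, max_age=18):
    """Drop duplicates, non-pediatric cases and incomplete rows; add BMI."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:        # duplicated data
            continue
        seen_ids.add(rec["id"])
        if rec["age"] > max_age:         # useless case for a pediatric study
            continue
        if rec["systolic_bp"] is None:   # missing value
            continue
        row = dict(rec)
        # Pre-calculation: BMI = weight (kg) / height (m)^2
        row["bmi"] = row["weight_kg"] / row["height_m"] ** 2
        cleaned.append(row)
    return cleaned

cleaned = clean(records)
print(cleaned)
```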
5
Feature Engineering and Model Creation
• Def (Feature engineering): the process of using domain knowledge to
extract features from raw data via data mining techniques. (Source: Wikipedia)

• Def (Scientific modelling): the process of making a particular part or
feature of the world easier to understand, define, quantify, visualize, or
simulate by referencing it to existing and usually commonly accepted
knowledge. It identifies relevant aspects of a situation in the real world
and then uses different types of models for different aims, such as
conceptual models to better understand, operational models to
operationalize, mathematical models to quantify, and graphical models to
visualize the subject.

• In this course, Model Creation refers to the use of artificial intelligence
(AI) algorithms to process available data and produce mathematical
models (or black-box models) that can be used to infer facts about new
data. (To be considered later.)

6
Statistical Analysis of Health Care Data
• Sample Tools
– MS Excel
– Python

• Statistical Data Analysis: Descriptive Statistics

• Descriptive Statistics with Python
• Case Study 1: Data Description with Python

• Statistical Data Analysis: Inferential Statistics

• Inferential Statistics with Python
• Case Study 2: Data Analysis with Python

7
Sample Tool: MS Excel

• Rationale:
– Accessibility: “everybody” has MS Excel on their laptop.
– Applicability: health care professionals are used to working with
MS Excel to store and process their databases.
– Simplicity: all the statistics involved in the course are easily
performed with MS Excel.

8
Sample Tool: Python

• Rationale:
– Accessibility: it is open source and free to use.
– Programmable: analyses can be embedded within
computer programs.
– Simple: with a little guidance, statistics with Python is
easy.
– Powerful: statistical functions are fast.
– Complete: Python provides a great variety of statistical
functions implemented and ready to be used.

9
Types of variables
• The types of variables used in a study influence both the
descriptive and inferential statistics during the analysis phase.

• However, you need to pay attention to this in the planning phase.

1. What type of data (i.e., variable) is it?

• Categorical: nominal or ordinal
• Numerical: continuous or discrete

2. How do you summarize and present it?

• Numerical summary statistics
• Graphs, charts

10
Categorical Variables (two or more groups or “categories” being
measured)

• Nominal variables – i.e., “names” → descriptive only, no
natural order
– Examples: Gender, Race/ethnicity
• Ordinal variables
– The sequence of categories is meaningful – it is “ordered”
– We can assign numbers to the categories
– But the intervals between those numbers are not
meaningful (they are not equally spaced)
– Examples: Likert scales (1-5: strongly disagree, disagree,
neutral, agree, strongly agree)

11
Numerical Variables
Numerical – the measurement can be quantified as a number
• Continuous – uninterrupted; any number is possible (e.g.
1.25)
– Examples:
• Age
• Temperature
• Discrete – integers; only some numbers are possible
– Examples:
• Number of children
• Number of strokes a patient has had

12
How do you summarize your data?
• Descriptive statistics: used to describe, organize, and summarize data
– Categorical variables: number (N), frequency (%)
– Numerical variables:
• mean + standard deviation
• median + range
• mode
• Graphs, charts:
– Categorical: contingency tables, bar charts
• Note: Not pie charts—relative areas of the pie are
difficult to distinguish!
– Numerical: shape of the distribution, box plots, histograms

13
Summarizing Categorical Data

• Contingency tables: number, frequency

Table 1. Barcelona residency match results, 2010

Residency Site    N. patients    %
Barcelona         23             24.7%
Tarragona         15             16.1%
Other             55             59.1%
Total             93             100.0%
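
The counts and percentages in Table 1 can be reproduced with a short Python sketch. The list of site labels is rebuilt here from the counts in the table, since the underlying per-patient data is not part of these notes.

```python
from collections import Counter

# One label per patient, rebuilt from the counts in Table 1
sites = ["Barcelona"] * 23 + ["Tarragona"] * 15 + ["Other"] * 55

counts = Counter(sites)
total = sum(counts.values())

# Map each site to (number, percentage rounded to one decimal)
table = {site: (n, round(100 * n / total, 1)) for site, n in counts.items()}

for site, (n, pct) in table.items():
    print(f"{site:<10} {n:>3} {pct:>5.1f}%")
print(f"{'Total':<10} {total:>3}")
```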

14
Summarizing Numerical Data
Key questions to ask:
• How are the data distributed?
– Where is the center?
– What is the range?
– What’s the shape of the distribution? (e.g.,
Gaussian (normal), right- or left-skewed)

• Are there “outliers”?

• Are there data points that don’t make sense?

15
“Where is the center”?
Measures of central tendency
• Mean
• Median
• Mode

16
“Where is the center?”
Measures of central tendency: Mean
• Mean – the average; the balancing point
– Calculation: the sum of values divided by the sample size
– In math shorthand:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

• Example Mean calculation:
– Age of participants: 17 19 21 22 23 23 23 38

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{17 + 19 + 21 + 22 + 23 + 23 + 23 + 38}{8} = 23.25 \]

17
“Where is the center?”
Measures of central tendency: Mean

• Should not be used with ordinal data

• The mean is affected by extreme values (outliers)
Scenario One (values 1, 2, 3, 4, 5 on a 0-10 scale): Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Scenario Two (values 1, 2, 3, 4, 10 on a 0-10 scale): Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4

Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

18
“Where is the center?”
Measures of central tendency: Median
• Median – the exact middle value
Calculation:
– If there are an odd number of observations, find the middle value.
– If there are an even number of observations, find the middle two
values and average them.
• Example:
– Age of participants: 17 19 21 22 23 23 23 38

Median = (22+23)/2 = 22.5

19
“Where is the center?”
Measures of central tendency: Median

• The median is NOT affected by extreme values (outliers).

Scenario One (values 1, 2, 3, 4, 5 on a 0-10 scale): Median = 3
Scenario Two (values 1, 2, 3, 4, 10 on a 0-10 scale): Median = 3

Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
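
The effect of an extreme value on the mean and on the median can be checked with Python's standard statistics module. The two lists are the values used in the mean calculations of the two scenarios.

```python
import statistics

scenarios = {
    "Scenario One": [1, 2, 3, 4, 5],
    "Scenario Two": [1, 2, 3, 4, 10],  # same data, largest value replaced by 10
}

# Map each scenario to (mean, median)
summary = {
    name: (statistics.mean(values), statistics.median(values))
    for name, values in scenarios.items()
}

for name, (mean, median) in summary.items():
    print(f"{name}: mean = {mean}, median = {median}")
```

Replacing 5 by 10 moves the mean from 3 to 4, while the median stays at 3.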

20
“Where is the center?”
Measures of central tendency: Mode

• Mode – the value that occurs most frequently

• Example:
– Age of participants: 17 19 21 22 23 23 23 38

Mode = 23
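
The three measures of central tendency for the age example can be obtained with Python's standard statistics module, as in this short sketch.

```python
import statistics

ages = [17, 19, 21, 22, 23, 23, 23, 38]

mean_age = statistics.mean(ages)      # sum of values divided by sample size
median_age = statistics.median(ages)  # average of the two middle values (n is even)
mode_age = statistics.mode(ages)      # most frequent value

print(f"mean = {mean_age}, median = {median_age}, mode = {mode_age}")
```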

21
Deciding which measure of central
tendency to use

• Need to see the shape of the distribution
– Describes how data are distributed
– Helps us decide whether to use the mean or the
median to describe numerical data

22
Which measures of central tendency to
use? Shape of the distribution
Shape describes how numerical data are distributed:
1. Symmetric: same shape on both sides of the mean.
– Mean = Median
2. Skewed: outlying observations occur in only one direction
– Left skewed: outlying values are small (left of center)
• Mean < Median
– Right skewed: outlying values are large (right of center)
• Mean > Median
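
The relation between mean and median can be turned into a rough check of the skew direction. This is only a rule of thumb under the assumptions of this slide: the function name skew_direction and its tolerance parameter are hypothetical, and equal mean and median do not by themselves prove that a distribution is symmetric.

```python
import statistics

def skew_direction(values, tolerance=0.0):
    """Guess the skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "right-skewed"   # large outlying values pull the mean up
    if median - mean > tolerance:
        return "left-skewed"    # small outlying values pull the mean down
    return "symmetric"

ages = [17, 19, 21, 22, 23, 23, 23, 38]

print(skew_direction([1, 2, 3, 4, 5]))
print(skew_direction([-4, 2, 3, 4, 5]))
print(skew_direction(ages))
```

For the age data used in these notes the mean (23.25) is larger than the median (22.5), which points to a right-skewed distribution caused by the value 38.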

23
Shape of the distribution

• Symmetric: Mean = Median
• Left-Skewed: Mean < Median. Outliers to the LEFT of center pull the
mean LEFT, so the mean is LESS than the median.
• Right-Skewed: Mean > Median. Outliers to the RIGHT of center pull the
mean to the RIGHT, so the mean is MORE than the median.

The median is the CENTER (and it doesn’t change due to outliers), so think of
whether outliers pull the mean to the left or right.

24
Which measure of central tendency to
use? General guidelines

1. Mean: numerical data and symmetric distribution

2. Median: ordinal data, or numerical data if the distribution is skewed

3. Mode: bimodal distributions

25
“What is the range?”
Measures of Variation/Spread
Measures of variation give information on the spread or variability
of the data values.
– Range
– Percentiles/quartiles
– Interquartile range (IQR)
– Standard deviation/Variance
[Figure: two distributions with the same center but different variation]

26
Quartiles

25% 25% 25% 25%

Q1 Q2 Q3
• The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger.
• Q2 is the same as the median
• 50% are smaller, 50% are larger
• Only 25% of the observations are greater than the third
quartile Q3

27
Interquartile Range
• Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
• IQR contains the central 50% of the observations.
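
Quartiles and the interquartile range can be computed with statistics.quantiles from the Python standard library (Python 3.8 or later). The sketch reuses the age data from the earlier examples. Several conventions exist for interpolating quartiles, so other tools may report slightly different values; method="inclusive" is the choice assumed here.

```python
import statistics

ages = [17, 19, 21, 22, 23, 23, 23, 38]

# n=4 splits the data into four groups and returns the three cut points
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr}")
```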

28
Variance
• Average (roughly) of squared deviations of values from the mean

\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \quad \text{(Sample)} \qquad \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} \quad \text{(Population)} \]

where \bar{X} is the mean.

• Why square the deviations?

– The deviations themselves would sum to 0 (squaring eliminates the negatives)
– Values further from the mean contribute increasingly more to the variance

29
Standard Deviation
• Most commonly used measure of variation

• Shows variation about the mean: approximately the average
distance of each observation from the mean

• Is the square root of the variance; has same units as original data

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}} \quad \text{(Sample)} \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}} \quad \text{(Population)} \]

30
Standard Deviation vs Variance

31
Calculation Example:
Sample Standard Deviation

Age data (N=8): 17 19 21 22 23 23 23 38

N = 8, Mean = \bar{X} = 23.25

\[ S = \sqrt{\frac{(17 - 23.25)^2 + (19 - 23.25)^2 + \cdots + (38 - 23.25)^2}{8 - 1}} = \sqrt{\frac{281.5}{7}} \approx 6.3 \]
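
The calculation can be checked step by step in Python. The sketch follows the sample and population formulas directly and compares the result with the functions of the standard statistics module.

```python
import math
import statistics

ages = [17, 19, 21, 22, 23, 23, 23, 38]
n = len(ages)
mean = sum(ages) / n

# Sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in ages)

sample_variance = sum_sq / (n - 1)      # divides by n - 1
population_variance = sum_sq / n        # divides by n
sample_sd = math.sqrt(sample_variance)

print(f"sum of squared deviations = {sum_sq}")
print(f"sample variance = {sample_variance:.4f}")
print(f"sample standard deviation = {sample_sd:.4f}")
print(f"population variance = {population_variance:.4f}")
```

The standard deviation is in years, the same unit as the data, while the variance is in squared years.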

32
Comparing Standard Deviations

[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, SD = 3.338
Data B: Mean = 15.5, SD = 0.926
Data C: Mean = 15.5, SD = 4.570

Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

33
The Normal Distribution

• Most common data distribution: bell curve,
normal curve, Gaussian distribution.
• Centered on the “mean” value (also the
most frequently occurring value)
• Symmetric on either side of mean
• Can be narrow or wide. Width described by
the standard deviation (how values deviate
from the mean)
• Changing σ (SD) increases or decreases the spread.

34
The beauty of the normal curve
Normal distribution: 68-95-99.7 Rule

For any normal distribution with mean μ and
standard deviation σ:
• 68% of the observations fall within one standard
deviation of the mean (in the interval [μ − σ, μ + σ])
• 95% of the observations fall within two standard
deviations of the mean (in the interval [μ − 2σ, μ + 2σ])
• 99.7% of the observations fall within three
standard deviations of the mean (in the interval
[μ − 3σ, μ + 3σ]).
• Almost all values fall within 3 standard
deviations.

[Figure: normal curve with the x-axis marked at -3 SD, -2 SD, -1 SD, mean, 1 SD, 2 SD, 3 SD]

Confidence intervals: e.g., 95% CI = [mean ± 1.96*st.dev/√n]
99% CI = [mean ± 2.58*st.dev/√n]
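
The percentages of the 68-95-99.7 rule and the factors 1.96 and 2.58 can be recovered from the standard normal distribution. The sketch uses statistics.NormalDist from the Python standard library; the function names and the values mean = 100, st.dev = 15 and n = 36 are hypothetical and chosen only to illustrate the confidence interval formula.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Probability of falling within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverages = {k: coverage(k) for k in (1, 2, 3)}

def confidence_interval(mean, sd, n, level=0.95):
    """Normal-based confidence interval: mean ± z * sd / sqrt(n)."""
    z = std_normal.inv_cdf(0.5 + level / 2)
    error = z * sd / math.sqrt(n)
    return mean - error, mean + error

z95 = std_normal.inv_cdf(0.975)
z99 = std_normal.inv_cdf(0.995)
low, high = confidence_interval(100, 15, 36)

print({k: round(p, 4) for k, p in coverages.items()})
print(round(z95, 2), round(z99, 2))
print(round(low, 2), round(high, 2))
```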
35
Which measures of variability to use?
General guidelines
• Standard deviation
• When the mean is used (numerical data and symmetric
distribution)
• Percentiles and IQR:
• When median is used (i.e., ordinal data or skewed numerical data)
• When mean is used but objective is to compare individual
observations with a set of norms
• IQR
• To describe the central 50% of a distribution, regardless of shape
• Range
• Used with numerical data to emphasize extreme values

36
Graphical displays of numerical data

• To show the distribution (shape, center, range,
variation) of continuous variables.

1. The Box Plot
2. The Histogram

37
Displaying numerical data
Box plot (aka box-and-whisker plot)
• Graphically shows the quartiles, mean, median,
maximum, and minimum of the data.
– The ends of the box are drawn at the first and third quartiles
(cut-offs for the lowest and highest 25%)
– A line is drawn inside the box at the median (exact middle value)
– Lines, called whiskers, are extended from the ends of the box
out to the minimum and maximum
– Often, the mean (the average) is indicated with an asterisk,
cross, or dotted line

38
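Box Plot for a symmetric distribution

[Figure: box plot. Y-axis: Value; x-axis: variable name. From top to bottom: upper whisker
end at the maximum or Q3 + (1.5*IQR); top of the box at the 75th percentile (Q3); line
inside the box at the median (Q2), with the mean marked by *; bottom of the box at the
25th percentile (Q1); lower whisker end at the minimum or Q1 - (1.5*IQR). The height of
the box is the interquartile range.]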
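
The whisker limits Q1 - 1.5*IQR and Q3 + 1.5*IQR are often used to flag outliers. The sketch below applies them to the age data of the earlier examples. The function name whisker_fences is hypothetical, and the quartiles depend on the interpolation convention (method="inclusive" is assumed here).

```python
import statistics

def whisker_fences(values):
    """Return (lower, upper) limits at Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ages = [17, 19, 21, 22, 23, 23, 23, 38]

lower, upper = whisker_fences(ages)
outliers = [x for x in ages if x < lower or x > upper]

print(f"fences = [{lower}, {upper}], outliers = {outliers}")
```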
Displaying numerical data:
Histogram
• Gives the percentage (proportion) of the study population in
ranges of the continuous variable or in categories
• X-axis: measure of interest
• Y-axis: number or percentage of observations

[Figure: histogram of Age. X-axis: Age (ticks at 20.0, 23.3, 26.7, 30.0); y-axis: Count (ticks at 0.0, 8.3, 16.7, 25.0)]
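
The counts behind a histogram come from splitting the range of the variable into intervals of equal width and counting the observations in each one. A minimal sketch follows, with hypothetical ages and three bins between 20 and 30; the function name histogram_counts is an assumption of this example.

```python
def histogram_counts(values, low, high, bins):
    """Count the values falling in each of `bins` equal-width intervals of [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # The maximum value is placed in the last bin
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

# Hypothetical ages (illustrative values only)
ages = [20, 21, 22, 24, 25, 25, 26, 27, 29, 30]

counts = histogram_counts(ages, 20.0, 30.0, 3)
percentages = [100 * c / len(ages) for c in counts]

print(counts)
print(percentages)
```

The y-axis of the histogram can show either the counts or the percentages.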
Distribution shape and box plot

[Figure: three box plots, each marking Q1, Q2 and Q3, with the mean marked by *]
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Mean > Median

Statistical Data Analysis: Descriptive Statistics
Descriptive statistics is a branch of statistics that aims to quantitatively
describe or summarize features of a collection of data.
• Qualitative variables: proportion or percentage of occurrence of
each variable value (e.g., percentage of patients taking one drug).
• Quantitative variables:
– Measures of central tendency
• Mean: arithmetic average of the values: \( \text{mean} = \frac{\sum_{i=1}^{n} x_i}{n} \)
• Median: middle value of the set of values.
• Mode: most commonly observed value of the set of values.
– Measures of dispersion or variability
• Variance and standard deviation: st.dev = square-root(variance): \( s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \)
– ~68% of the cases are in the interval [mean ± st.dev]
– ~95% of the cases are in the interval [mean ± 2*st.dev]

• Confidence intervals: e.g., 95% CI = [mean ± 1.96*st.dev/√n]
99% CI = [mean ± 2.58*st.dev/√n]

• Interquartile range: obtain the first and third quartiles Q1 and Q3; then [Q1, Q3] is
the interquartile range, containing the central 50% of the data.

42
Statistical Description of Data
Population Description

N = 451

Variable               Type        Mean        St. Dev.    95% CI
Age                    Numeric     46.4080     16.4298     [44.8916, 47.9243]
Sex                    Categoric   male 44.79%, female 55.21%
Height                 Numeric     166.1353    37.1946     [162.7025, 169.5681]
Weight                 Numeric     68.1441     16.5998     [66.6121, 69.6762]
QRS duration           Numeric     88.9224     15.3814     [87.5028, 90.3420]
P-R interval           Numeric     155.0953    44.8755     [150.9537, 159.2370]
Q-T interval           Numeric     367.2239    33.4208     [364.1395, 370.3084]
T interval             Numeric     169.9335    35.6711     [166.6413, 173.2257]
P interval             Numeric     89.9756     25.8480     [87.5900, 92.3612]
Heart Rate             Numeric     74.4634     13.8707     [73.1833, 75.7436]
Ragged R wave          Categoric   exists 0.22%, does not exist 99.78%
Diphasic der. R wave   Categoric   exists 1.11%, does not exist 98.89%

Comparative Population Description

N: Male 202 (44.79%), Female 249 (55.21%), All 451 (100%)

Variable               Group     Mean        St. Dev.    95% CI
Age                    Male      47.4109     16.4466     [45.1428, 49.6790]
                       Female    45.5944     16.4042     [43.5568, 47.6319]
                       All       46.4080     16.4298     [44.8916, 47.9243]
Height                 Male      171.2228    72.6881     [147.6103, 194.8353]
                       Female    162.0080    39.8710     [157.0557, 166.9604]
                       All       166.1353    37.1946     [162.7025, 169.5681]
Weight                 Male      72.6881     171.2228    [59.6308, 85.7454]
                       Female    64.4578     14.7585     [62.6247, 66.2910]
                       All       68.1441     16.5998     [66.6121, 69.6762]
QRS duration           Male      94.6832     94.6832     [72.9829, 116.3834]
                       Female    84.2490     14.4695     [82.4517, 86.0462]
                       All       88.9224     15.3814     [87.5028, 90.3420]
P-R interval           Male      157.3564    157.3564    [107.0805, 207.6324]
                       Female    153.2610    45.9316     [147.5559, 158.9662]
                       All       155.0953    44.8755     [150.9537, 159.2370]
Q-T interval           Male      364.5693    364.5693    [340.1280, 389.0106]
                       Female    369.3775    33.4351     [365.2245, 373.5305]
                       All       367.2239    33.4208     [364.1395, 370.3084]
T interval             Male      177.2327    177.2327    [164.5085, 189.9568]
                       Female    164.0120    35.4926     [159.6035, 168.4206]
                       All       169.9335    35.6711     [166.6413, 173.2257]
P interval             Male      92.2673     92.2673     [82.1306, 102.4040]
                       Female    88.1165     25.5852     [84.9385, 91.2944]
                       All       89.9756     25.8480     [87.5900, 92.3612]
Heart Rate             Male      73.5050     73.5050     [63.3682, 83.6417]
                       Female    75.2410     12.8728     [73.6420, 76.8399]
                       All       74.4634     13.8707     [73.1833, 75.7436]
Ragged R wave          Male      exists 0.00%, does not exist 100.00%
                       Female    exists 0.40%, does not exist 99.60%
                       All       exists 0.22%, does not exist 99.78%
Diphasic der. R wave   Male      exists 0.99%, does not exist 99.01%
                       Female    exists 1.20%, does not exist 98.80%
                       All       exists 1.11%, does not exist 98.89%
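
As a quick check of how the entries of such a table are obtained, the following sketch recomputes the 95% CI of Age for the whole population and the percentage of each sex from the reported N, mean and standard deviation. The function name ci95 is hypothetical, and the factor 1.96 is the normal approximation used in these notes.

```python
import math

def ci95(mean, sd, n, z=1.96):
    """95% confidence interval of the mean: mean ± z * sd / sqrt(n)."""
    error = z * sd / math.sqrt(n)
    return mean - error, mean + error

# Age, whole population (N = 451)
age_low, age_high = ci95(46.4080, 16.4298, 451)

# Percentage of each sex from the group sizes
n_male, n_female = 202, 249
n_all = n_male + n_female
pct_male = 100 * n_male / n_all
pct_female = 100 * n_female / n_all

print(f"Age 95% CI = [{age_low:.4f}, {age_high:.4f}]")
print(f"male = {pct_male:.2f}%, female = {pct_female:.2f}%")
```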

43
Descriptive Statistics with Python
• Context
NumPy: Python library adding support for large, multi-dimensional arrays and matrices, along
with a large collection of high-level mathematical functions to operate on these arrays.
SciPy: Python library built on NumPy that provides additional functions for optimization, linear
algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE
solvers and other tasks common in science and engineering.

import collections
import math
import statistics
import numpy as np
from scipy import stats

• Qualitative Variables
data = ['a','a','a','e','e','e','e','e','e','i','o','o','o','o','o','u','u','u','u','u']
Frequency all categories: collections.Counter(data)
Frequency of one category: collections.Counter(data)['a']
Percentage: collections.Counter(data)['a'] / len(data) * 100
• Quantitative Variables
data = [1,2,3,3,3,4,4,4,4,5,5,5,6,6,7,7,7,7,7,8,10,10,10]
N: n = len(data)
Mean: mean = statistics.mean(data)
Median: statistics.median(data)
Mode: statistics.mode(data)
Variance: statistics.variance(data)
Std. Deviation: stdev = statistics.stdev(data)
95% CI (normal): cv = stats.norm.ppf(0.975)
error = cv * stdev / math.sqrt(n)
CI = (mean - error, mean + error)
Quartiles: q1 = np.quantile(data, 0.25)
q3 = np.quantile(data, 0.75)

Results:
n = 23
mean = 5.565217391304348
median = 5
mode = 7
variance = 6.3478260869565215
standard deviation = 2.5194892512087685
CI (95%) = (4.5355506551396925, 6.594884127469003)
Quartiles = 4.0 5.0 7.0

See the presentation on data description and visualization in Python.

44
