This document outlines the course structure and objectives for a biostatistics lecture series being taught in the fall 2022 semester. The 16-week course will cover topics in probability, statistical inference, and common biostatistical methods. Students will learn to apply statistical analysis to evaluate public health and biomedical research studies.
Module 1   2:00~3:00   Statistics Theory   Required
Module 2   3:00~4:00   R programming       Required
Module 3   4:00~5:00   R workshop

• Textbook: Principles of Biostatistics by Marcello Pagano and Kimberlee Gauvreau
• Software: R
• Slides and R workshop files will be provided
• GHPF Evaluation: Attendance (20%), Mid-term (40%), Final Exam (40%)
• GCID Evaluation: Attendance (20%), Mid-term (30%), Final Exam (30%), Assignments (20%)

Week   Month   Day   Topics                                      Professor
1      9       7     Introduction                                Min Jin Ha
2      9       14    Data Presentation/Descriptive Statistics    Min Jin Ha
3      9       21    Probability                                 Min Jin Ha
4      9       28    Theoretical Probability Distributions       Min Jin Ha
5      10      5     Sampling Distribution of the Mean           Min Jin Ha
6      10      12    Estimation/Confidence Intervals             Min Jin Ha
7      10      19    Mid-term Examination                        Min Jin Ha
8      10      26    Hypothesis Testing                          Min Jin Ha
9      11      2     Comparison of Two Means                     Min Jin Ha
10     11      9     Analysis of Variance                        Min Jin Ha
11     11      16    Analysis of Variance                        Min Jin Ha
12     11      23    Nonparametric Methods                       Min Jin Ha
13     11      30    Correlation                                 Min Jin Ha
14     12      7     Simple Linear Regression                    Min Jin Ha
15     12      14    Multiple Regression                         Min Jin Ha
16     12      21    Final Exam                                  Min Jin Ha

Readings
• Pagano and Gauvreau, Chapters 1 and 2.1

What is Biostatistics?
• Statistics is the science of obtaining, analyzing and interpreting data
• When the focus is on the biological and health sciences, we use the term Biostatistics
• Biostatisticians forge advances in science that benefit human health through innovations in biostatistical methodology and theory as well as the thoughtful implementation of biostatistical methods in practice

Welcome to Biostatistics
Biostatistics lectures
• are an introductory course in probability and statistical inference
• provide a tour of basic statistical methods commonly encountered in public health and biomedical research
• place emphasis on understanding basic statistical methods, using the methods to evaluate evidence from studies, and communicating statistical results to statisticians and non-statisticians

Learning Objectives
By December, students successfully completing the course will
• have a basic working knowledge of important statistical topics including descriptive statistics and probability, inference on means, regression methods, and nonparametrics
• understand how to evaluate which methods are appropriate in answering a research question for a given study design
• be able to evaluate straightforward statistical usage in public health and biomedicine
• have the tools to interact knowledgeably with biostatisticians in planning, conducting, analyzing, and reporting public health and medical research

Keep up with the readings for success (they were assigned for a reason!)

The Big Picture
• Statistics is the process by which we convert data into useful information. As part of this process, we
  • Collect data
  • Summarize data
  • Interpret the results
Graphic from the CMU Open Learning Initiative
Population
• The process starts when we identify the group we want to study or learn something about. We call this group the population.
• We might be interested in all babies born in Seoul, all breast cancer patients diagnosed in South Korea, or all adults (>18 years) in South Korea
• The population, then, is the entire group that is the target of our interest
Sampling from the Population
• In most cases, the population is so large that there is no way we can study all of it (e.g., all adults in South Korea)
• Usually we have to compromise by taking a sample of objects from the population
• This involves choosing a sample that is representative of the population and collecting data from it
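A minimal sketch of the sampling step. The course software is R; Python is used here only for illustration. The population is simulated, so the heights, the population size, and the sample size of 200 are all made-up assumptions rather than real survey values.

```python
import random
import statistics

# Hypothetical population: simulated heights (cm) of 100,000 adults.
# These values are generated for illustration; they are not real data.
rng = random.Random(2022)  # fixed seed so the sketch is reproducible
population = [rng.gauss(165.0, 9.0) for _ in range(100_000)]

# Simple random sample: every individual has the same chance of selection.
sample = rng.sample(population, k=200)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population size: {len(population)}, sample size: {len(sample)}")
print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
```

In practice the population mean is unknown; it can be printed here only because the population was simulated. The sample mean is what we would actually observe.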
Explain the data
• Once the data have been collected, we need to summarize them in a meaningful way. This step is called exploratory data analysis.
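As a rough illustration of summarizing data, the sketch below computes a few common summary statistics. It uses Python for illustration only (the course software is R), and the ten birth weights are hypothetical values invented for this example.

```python
import statistics

# Hypothetical birth weights in grams for a sample of 10 newborns
# (made-up values for illustration only).
birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069]

summary = {
    "n": len(birth_weights),
    "min": min(birth_weights),
    "median": statistics.median(birth_weights),
    "mean": statistics.fmean(birth_weights),
    "max": max(birth_weights),
    "sd": statistics.stdev(birth_weights),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

Comparing the median with the mean, and the minimum with the rest of the values, is one simple way to notice an unusually small or large observation before any formal analysis.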
Not finished yet
• Remember that our goal is to study the population!
• We want to be able to draw conclusions about the population based on the sample results
• We use probability to examine the difference between the population and the sample
• In essence, probability is the "machinery" that allows us to draw conclusions about the population based on the data collected about the sample
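One way to see why probability is needed: different samples from the same population give different results. The sketch below is in Python for illustration only (the course software is R). It draws many samples from a simulated population and measures how much the sample mean varies. The cholesterol values, the sample sizes (25 and 400), and the helper `sample_means` are assumptions made up for this example.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: 50,000 simulated serum cholesterol values (mg/dL).
population = [rng.gauss(200.0, 40.0) for _ in range(50_000)]
true_mean = statistics.fmean(population)


def sample_means(pop, n, repeats, rng):
    """Draw `repeats` simple random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(repeats)]


means_small = sample_means(population, 25, 500, rng)   # 500 samples of size 25
means_large = sample_means(population, 400, 500, rng)  # 500 samples of size 400

# The standard deviation of the sample means measures sampling variability.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"population mean:             {true_mean:.1f}")
print(f"spread of means (n = 25):    {spread_small:.2f}")
print(f"spread of means (n = 400):   {spread_large:.2f}")
```

Larger samples give means that cluster more tightly around the population mean, which is the idea behind the sampling distribution of the mean listed for Week 5.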
Statistical Inference
We can use what we've discovered about our sample to draw conclusions about our target population. This final step in the process is called inference.
What do we really mean by data?
• Data are pieces of information about individuals, organized into variables
• By an individual, we mean a particular person or object
• By a variable, we mean a particular characteristic of the individual
• A dataset is a set of data collected under particular circumstances, typically displayed as a table/matrix
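A small sketch of how a dataset is laid out: one row per individual, one column per variable. It uses Python for illustration only (the course software is R). The four patients, the variable names, and the helper `column` are hypothetical.

```python
# Hypothetical dataset: each row (dict) is one individual, each key is a variable.
dataset = [
    {"patient_id": 1, "blood_type": "A", "age_years": 34, "bmi": 22.4},
    {"patient_id": 2, "blood_type": "O", "age_years": 51, "bmi": 27.1},
    {"patient_id": 3, "blood_type": "B", "age_years": 29, "bmi": 19.8},
    {"patient_id": 4, "blood_type": "O", "age_years": 62, "bmi": 30.3},
]

# The variables are the column names shared by every individual.
variables = list(dataset[0].keys())


def column(data, variable):
    """Return the values of one variable across all individuals."""
    return [row[variable] for row in data]


print("individuals:", len(dataset))
print("variables:  ", variables)
print("ages:       ", column(dataset, "age_years"))
```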
Types of Data: Discrete
• Nominal Data (categorical data, Qualitative)
  • Classification into named categories without numeric meaning
  • e.g., gender, race, blood type, whether or not you have a disease
• Ordinal Data (Quantitative)
  • Categories are ordered, but differences between levels are not easily measured; only relative comparisons are made between levels
  • e.g., clinical/pathological stages of cancer (I, IIA, IIB, IIC, IIIA, ...) and the Likert scale (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree)
• Count data (Quantitative)
  • Counted observations, e.g., new confirmed Covid-19 cases in Texas

Types of Data: Continuous
• Data representing measurable quantities
• The difference between any two possible data values can be arbitrarily small
• e.g., birth weight, BMI, serum cholesterol level
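The data types above call for different kinds of summaries. The following sketch, in Python for illustration only (the course software is R), pairs each type with one summary that suits it. All values are hypothetical, and the pairing is one reasonable choice rather than a fixed rule.

```python
from collections import Counter
import statistics

# Nominal: named categories with no order, so we count how often each occurs.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
blood_type_counts = Counter(blood_types)

# Ordinal: ordered levels (Likert scale), so a middle level (median) is meaningful.
LIKERT = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}
responses = [4, 2, 5, 3, 4, 4, 1]
median_response = statistics.median(responses)

# Count: non-negative whole numbers, so totals are meaningful.
daily_new_cases = [12, 0, 7, 3]  # hypothetical daily counts
total_cases = sum(daily_new_cases)

# Continuous: measurements that can take any value in a range.
bmi = [22.4, 27.1, 19.8]
mean_bmi = statistics.fmean(bmi)

print("most common blood type:", blood_type_counts.most_common(1)[0])
print("median Likert response:", LIKERT[median_response])
print("total new cases:", total_cases)
print(f"mean BMI: {mean_bmi:.1f}")
```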
Graphic from the Centers for Disease Control and Prevention
Quiz
The table shows part of a dataset for a random sample from the 2000 U.S. Census
• Who are the individuals described by these data?
• What type of variable is zipcode?
• What type of variable is Family_Size?
• What type of variable is Annual_income?

Reading for Next Time
• Pagano and Gauvreau, remainder of Chapter 2