Lab Note

Uploaded by

Aaron Winston

CLASS NOTE

A statistical hypothesis is a statement or assumption about a population or
a process that we want to test using data. It's like making an educated
guess about something and then collecting data to see whether the guess
holds up. There are two main types: the null hypothesis, which represents
the status quo or no effect, and the alternative hypothesis, which suggests
that there is an effect or difference. We use statistical tests to evaluate
the evidence against the null hypothesis and draw conclusions from the
data.

**Hypothesis**:
We want to test whether studying with music improves test scores. Our
hypotheses would be:

- Null Hypothesis (H0): Studying with music does not improve test scores.
- Alternative Hypothesis (H1): Studying with music improves test scores.

**Experiment**:
We gather a group of students and randomly assign them to two groups:
one group studies with music, and the other studies in silence. After
studying, we give them a math test.

**Results**:
We calculate the average test scores for each group and find that the group
studying with music has a slightly higher average score.

**P-value**:
The p-value measures the strength of evidence against the null hypothesis
in a statistical hypothesis test. It is the probability of observing results
at least as extreme as the ones we got, if the null hypothesis were true.
In simpler terms, it tells us how surprising the observed results would be
if only random chance were at work. It is not the probability that the null
hypothesis itself is true.

A small p-value (typically less than 0.05) indicates that the observed
results are unlikely to have occurred under the assumption of the null
hypothesis, suggesting strong evidence against the null hypothesis. On the
other hand, a large p-value suggests that the observed results are more
likely to be consistent with the null hypothesis, indicating weak evidence
against it.

We conduct a statistical test, and let's say we get a p-value of 0.03. This
means that if the null hypothesis were true (studying with music does not
improve test scores), there is only a 3% chance of observing the difference
in test scores that we did, or even more extreme differences, just by
random chance.
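
As a rough illustration, a test like this could be run in R with t.test().
This is only a sketch: the scores below are made-up numbers, not data from
the note, and the choice of a one-sided Welch two-sample t-test is an
assumption, since the note does not say which test was used. The p-value
it produces will not be the 0.03 used in the example.

```r
# Made-up test scores for illustration only (not real data)
music   <- c(78, 85, 82, 90, 74, 88, 81, 79)
silence <- c(75, 80, 77, 84, 72, 83, 78, 76)

# One-sided Welch two-sample t-test.
# H0: mean(music) - mean(silence) <= 0
# H1: mean(music) - mean(silence) >  0
result <- t.test(music, silence, alternative = "greater")

# Extract the p-value and compare it with the significance level
p_value <- result$p.value
alpha <- 0.05
reject_null <- p_value < alpha

p_value
reject_null
```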

Since the p-value (0.03) is less than the common significance level of 0.05,
we reject the null hypothesis and conclude that there is evidence to support
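the claim that studying with music improves test scores.

WHAT IS ~

The tilde ~ is used here inside the summarise_all function from the dplyr
package in R, which is part of the tidyverse suite of data manipulation
tools. In that position, ~ is shorthand for an anonymous function: the
expression after the tilde is applied to each column, and . (or .x) stands
for the column currently being processed.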
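
A minimal sketch of the tilde shorthand, assuming dplyr is installed. The
scores data frame and its values are hypothetical and exist only for this
example. Both calls compute the same column means; the second one spells
out the function that the tilde form abbreviates.

```r
library(dplyr)

# Hypothetical data, made up for this example
scores <- tibble(a = c(1, 2, NA), b = c(10, NA, 30))

# Tilde shorthand: "." stands for the current column
means_tilde <- scores %>% summarise_all(~ mean(., na.rm = TRUE))

# The same thing written as a full anonymous function
means_full <- scores %>% summarise_all(function(x) mean(x, na.rm = TRUE))

means_tilde
```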

WHAT IS %>%

The %>% operator in R is known as the pipe operator. It comes from the
magrittr package and is also made available when you load dplyr. It passes
the result of one expression as the first argument to the next expression,
so several functions can be chained together in a readable way. Code
written with %>% presents a sequence of operations in a logical and linear
order, much as one might describe the process verbally:
"Take the numbers 1 to 10, then calculate the square roots, and then sum
them."
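
The verbal description above can be sketched directly in R. This assumes
the magrittr package is installed; the variable names piped and nested are
chosen here for illustration.

```r
library(magrittr)  # provides %>%; loading dplyr makes it available too

# Piped version: reads left to right, like the verbal description
piped <- 1:10 %>% sqrt() %>% sum()

# Equivalent nested version: has to be read from the inside out
nested <- sum(sqrt(1:10))

all.equal(piped, nested)
```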

REPLACE NA NUMERIC

1. mutate_if() checks each column in the penguins data frame to see if
it's numeric (is.numeric).

2. For numeric columns, it applies the transformation: if a value is NA
(is.na(.)), it replaces it with the mean of the column (mean(., na.rm =
TRUE)); otherwise, it leaves the value unchanged (.).

3. na.rm = TRUE is used within mean() to ensure that NA values are
ignored when calculating the mean.

4. penguins_clean_num is the new data frame with the NA values in
numeric columns replaced by the column mean.
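
The code these steps describe is not included in the note. A plausible
reconstruction, assuming dplyr is installed, might look like the sketch
below. The small penguins data frame here is a toy stand-in with made-up
values so the example runs on its own; it is not the real dataset.

```r
library(dplyr)

# Toy stand-in for the penguins data (made-up values, not the real dataset)
penguins <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.0, NA, 47.0, 49.0),
  body_mass_g    = c(3700, 3800, NA, 5200)
)

# For every numeric column, replace NA with that column's mean
penguins_clean_num <- penguins %>%
  mutate_if(is.numeric, ~ ifelse(is.na(.), mean(., na.rm = TRUE), .))

penguins_clean_num
```

In current dplyr releases mutate_if() is superseded by across(), so the
same step can also be written as
mutate(across(where(is.numeric), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .))).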

REPLACE NA CATEGORICAL

Finding the Most Common "Sex" per Species: This is done by grouping
by species, filtering out NA values in the sex column, and then summarising
each group to find the most common value of sex. The result is stored in
most_common_sex.

Joining and Replacing NA Values: We then join this information back to
the original penguins dataset. For rows with NA in the sex column, we
replace the NA with the most common sex for that species. Inside ifelse(),
the values are explicitly converted to character so that the result has one
consistent type. This matters when sex is stored as a factor, because
ifelse() would otherwise return the factor's underlying integer codes.
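
The note does not show the code for these two steps. One way they might be
written, assuming dplyr is installed, is sketched below. The penguins data
frame is a toy stand-in with made-up values, and the names most_common and
penguins_clean_cat are hypothetical, introduced only for this example.

```r
library(dplyr)

# Toy stand-in for the penguins data (made-up values, not the real dataset)
penguins <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  sex     = factor(c("male", "male", NA, "female", NA, "female"))
)

# Step 1: most common non-missing sex within each species
most_common_sex <- penguins %>%
  group_by(species) %>%
  filter(!is.na(sex)) %>%
  summarise(most_common = names(which.max(table(sex))))

# Step 2: join back and fill NA values with the species-level mode.
# as.character() keeps ifelse() from returning factor integer codes.
penguins_clean_cat <- penguins %>%
  left_join(most_common_sex, by = "species") %>%
  mutate(sex = ifelse(is.na(sex), most_common, as.character(sex))) %>%
  select(-most_common)

penguins_clean_cat
```

If two values are tied within a species, which.max() picks the first one
in factor-level order, so a real analysis may want an explicit tie rule.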
