
Probability Calculator and Hypothesis Tester: A Comprehensive Statistical Toolkit

IE509 (Computation and Programming Lab)

Prof. Urban Larsson & Balamurugan Palaniappan

Sarvesh Maurya (24N0465)    Goutam Agarwal (24N0463)
Motivation and Background

The idea for this project stems from our undergraduate studies, when we had to manually look up Z-tables and t-tables to find probabilities and critical values during hypothesis testing. We thought, "Why not use Python to simplify this process?"
Introduction
This project is divided into two main parts:

• Part 1: Probability Calculations with Visualizations
• Provides a range of probability distributions (e.g., normal, binomial, Poisson), allowing users to select the distribution type and its parameters.
• Visualizes each distribution with shaded areas, highlighting the requested probability region for a clearer understanding.
• Part 2: Hypothesis Testing and Data Analysis
• Accepts data entered manually or loaded from a CSV file.
• Offers three test options: one-sample t-test, two-sample t-test, and z-test, with assumption checks.
• Includes outlier detection, descriptive statistics (mean, variance, etc.), and data visualization.
• Outputs the test results, compares the p-value with the significance level, and displays a confidence interval for the results.
Probability Calculations with Visualizations
• Probabilities can be calculated for both discrete and continuous distributions (Binomial, Poisson, Geometric, Exponential, Normal, Uniform, Gamma, Chi-square).
• After the user selects a distribution, the program prompts for the parameters appropriate to that distribution.
• It then asks which type of probability to compute: P(X < x), P(X > x), or P(a < X < b).
• It returns the corresponding probability together with a plot of the distribution in which the region of interest is shaded, as sketched below.
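A minimal sketch of what such a calculation might look like for the normal case, using SciPy and Matplotlib as the project does; the parameter names and values here are illustrative, not taken from the project's code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu, sigma = 0, 1          # example parameters a user might enter
x_cut = 1.5               # compute P(X < 1.5)

prob = stats.norm.cdf(x_cut, loc=mu, scale=sigma)
print(f"P(X < {x_cut}) = {prob:.4f}")

# Plot the density and shade the region X < x_cut
xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
pdf = stats.norm.pdf(xs, loc=mu, scale=sigma)
plt.plot(xs, pdf)
plt.fill_between(xs, pdf, where=(xs < x_cut), alpha=0.4)
plt.title(f"Normal({mu}, {sigma}): P(X < {x_cut}) = {prob:.4f}")
plt.show()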
Usability and Key Features – Probability Calculations
The first part of this project brings all essential probability distributions together in one place:
1. Unified Distribution Options: All supported distributions (normal, binomial, Poisson, etc.)
are available on a single platform.
2. Simplified Code: No need to write separate code for each distribution; all calculations
and visualizations are integrated, saving time and reducing complexity.
3. Automatic Visualizations: Plots with shaded areas make it easy to interpret
probabilities without consulting separate tables (e.g., normal or chi-square tables).
4. Academic Applications: Ideal for academic purposes, providing a ready-to-use tool for
understanding and exploring various probability distributions.
Hypothesis Testing and Data Analysis
The second half of this project streamlines hypothesis testing by
automating key steps in the process:
1. Data Input
User provides data either manually or by uploading a CSV file.
2. Column Selection
Specific columns are selected for hypothesis testing, allowing for targeted analysis.
3. Null Value Removal
Automatically detects and removes any null (missing) values to ensure clean, usable data.
4. Assumption Checks
Verifies essential assumptions (e.g., normality, homogeneity of variance) for the chosen test, ensuring validity.
5. Hypothesis Test Selection
User selects the appropriate test: one-sample t-test, two-sample t-test, or z-test, based on the research question.
6. Hypothesis Testing
The test is conducted, and the p-value is calculated.
7. Compare p-value with Significance Level
The p-value is compared to the significance level (α) to decide whether to reject or fail to reject the null hypothesis.
8. Confidence Interval Visualization
Displays a confidence interval around the sample mean or difference, providing a visual summary of the test results.
9. Results Interpretation
Presents a summary of findings, including confidence intervals and the final statistical decision.
Internal Working of the Program
Data Input and Validation
• The program first allows the user to enter data manually or upload a CSV file.
• After the data is loaded, specific columns are selected based on the user's analysis needs.
• The program validates the data to ensure it is in the correct format, handling any errors in data types or structure.
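A minimal sketch of the loading and validation step, assuming pandas is used; the file name and column name are placeholders, not the project's actual values:

import pandas as pd

df = pd.read_csv("data.csv")     # placeholder file name
column = "measurement"           # placeholder column chosen by the user

if column not in df.columns:
    raise ValueError(f"Column '{column}' not found in the file")

# Coerce the column to numeric; anything unparseable becomes NaN
values = pd.to_numeric(df[column], errors="coerce")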

Null Value Detection and Removal
• Checks for any missing (null) values within the dataset.
• Automatically removes null values to ensure accurate calculations and reliable results in subsequent steps.
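A brief sketch of how this step might look with pandas, continuing the hypothetical `values` series from the previous sketch:

# Report and drop missing values before any statistics are computed.
n_missing = values.isna().sum()
if n_missing:
    print(f"Removing {n_missing} missing value(s)")
values = values.dropna()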

Outlier Detection
Automatically detects and removes outliers from the dataset using the box-plot (interquartile range) rule, ensuring a clean and accurate analysis.
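A sketch of a box-plot-based filter, assuming the conventional 1.5×IQR whisker rule and continuing with the same `values` series:

# Remove points outside the box-plot whiskers (1.5 * IQR beyond the quartiles).
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
values = values[(values >= lower) & (values <= upper)]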
Assumption Checks
• Based on the selected test, the program runs the necessary assumption checks:
  • Normality Check: Uses statistical tests such as the Shapiro-Wilk test to check whether the data follow a normal distribution.
  • Variance Equality (for two-sample tests): Uses tests such as Levene's test to check that the group variances are comparable.
• If assumptions are violated, the program may suggest alternatives or warn the user, helping ensure valid and appropriate test results.
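A sketch of these checks using SciPy; the significance level of 0.05 and the `group_a`/`group_b` names are assumptions for illustration:

from scipy import stats

alpha = 0.05

# Normality: Shapiro-Wilk test on the sample
stat, p_norm = stats.shapiro(values)
if p_norm < alpha:
    print(f"Warning: data may not be normally distributed (Shapiro-Wilk p = {p_norm:.4f})")

# Equal variances (two-sample case): Levene's test on two hypothetical groups
# stat, p_var = stats.levene(group_a, group_b)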

Hypothesis Test Execution
• Executes the selected test (one-sample t-test, two-sample t-test, or z-test).
• Calculates the test statistic and p-value, which determine the outcome of the hypothesis test.
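A sketch of the one-sample case and the p-value decision rule, with an assumed null-hypothesis mean; the two-sample case would use scipy.stats.ttest_ind instead, and a z-test can be computed from the normal CDF:

from scipy import stats

alpha = 0.05   # same assumed significance level as above
mu_0 = 50.0    # hypothetical null-hypothesis mean

t_stat, p_value = stats.ttest_1samp(values, popmean=mu_0)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")

if p_value < alpha:
    print(f"Reject H0 at alpha = {alpha}")
else:
    print(f"Fail to reject H0 at alpha = {alpha}")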

Confidence Interval Visualization
Provides a visual representation of the confidence interval around the mean, offering insight into the range of values likely to contain the true population parameter.
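A sketch of a t-based confidence interval with a simple error-bar visual, again continuing the hypothetical `values` series; the 95% level is an assumed default:

import matplotlib.pyplot as plt
from scipy import stats

conf_level = 0.95
n = len(values)
mean = values.mean()
sem = stats.sem(values)   # standard error of the mean
ci_low, ci_high = stats.t.interval(conf_level, n - 1, loc=mean, scale=sem)

# Plot the mean with asymmetric error bars spanning the confidence interval
plt.errorbar([0], [mean], yerr=[[mean - ci_low], [ci_high - mean]],
             fmt="o", capsize=8)
plt.title(f"{int(conf_level * 100)}% confidence interval for the mean")
plt.xticks([])
plt.show()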
Usability and Key Features – Part 2: Hypothesis Testing
• Flexible Data Input: Users can input data manually or via CSV files,
and can select the column on which to perform the test.
• Automated Assumption Checking: Assumptions for each test (e.g.,
normality, variance equality) are checked automatically, avoiding the
need for extra code.
• Saves Time: By streamlining assumption checks and offering multiple
test options, this project helps users avoid redundant code.
• Versatile and Practical: Simplifies hypothesis testing workflows,
making it practical for academic and research applications where
quick and reliable statistical insights are needed.
Methodology and Concepts Used in This Project
Throughout this project, we have made use of key concepts learned in the lab, including:

• Object-Oriented Programming (OOP): We designed the program in a modular way, using classes and objects to handle the different distributions and statistical functions (a small sketch follows the list below).
• Data Structures: Lists, dictionaries, and arrays are used to store user inputs and manage the flow of data within the program.

• Python Libraries: The project leverages several powerful Python libraries, such as:
• NumPy: For mathematical computations
• Pandas: For data manipulation and analysis
• Matplotlib and Seaborn: For generating visualizations
• SciPy: For performing statistical tests and calculating probabilities
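A minimal illustration of the kind of class-plus-dictionary design described in the OOP point above; the names and the set of supported distributions are illustrative, not the project's actual classes:

from scipy import stats

# Map user-facing distribution names to frozen SciPy distribution objects.
DISTRIBUTIONS = {
    "normal":      lambda mu, sigma: stats.norm(loc=mu, scale=sigma),
    "binomial":    lambda n, p: stats.binom(n=n, p=p),
    "poisson":     lambda lam: stats.poisson(mu=lam),
    "exponential": lambda rate: stats.expon(scale=1.0 / rate),
}

class ProbabilityCalculator:
    """Wraps a chosen distribution and answers P(X < x), P(X > x), P(a < X < b)."""

    def __init__(self, name, *params):
        self.dist = DISTRIBUTIONS[name](*params)

    def less_than(self, x):
        return self.dist.cdf(x)

    def greater_than(self, x):
        return 1.0 - self.dist.cdf(x)

    def between(self, a, b):
        return self.dist.cdf(b) - self.dist.cdf(a)

# Example usage
calc = ProbabilityCalculator("normal", 0, 1)
print(calc.between(-1.96, 1.96))   # approximately 0.95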
Q&A

Thank you! Any questions?
