Statistical Analysis Presentation
• The code follows this flow: after the user selects a distribution, the program prompts for the parameters appropriate to that distribution.
• It then asks which type of probability to calculate: P(X < x), P(X > x), or P(a < X < b).
• Finally, it returns the computed probability along with a plot in which the corresponding region is shaded (a minimal sketch of this flow is shown below).
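For a continuous distribution, this flow can be sketched as follows using NumPy, Matplotlib, and SciPy (the libraries the project already relies on); the `shaded_probability` helper and the standard-normal example are illustrative assumptions rather than the project's actual code.

```python
# Minimal sketch of the probability-and-shaded-region flow (illustrative, not the project's exact code).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def shaded_probability(dist, a=None, b=None):
    """Compute and plot P(a < X < b) for a frozen continuous SciPy distribution.

    Pass only b for P(X < x), only a for P(X > x), or both for P(a < X < b).
    """
    lo = a if a is not None else -np.inf
    hi = b if b is not None else np.inf
    prob = dist.cdf(hi) - dist.cdf(lo)

    # Draw the density curve and shade the requested region.
    x = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 500)
    plt.plot(x, dist.pdf(x))
    mask = (x >= lo) & (x <= hi)
    plt.fill_between(x[mask], dist.pdf(x[mask]), alpha=0.4)
    plt.title(f"Shaded probability = {prob:.4f}")
    plt.show()
    return prob

# Example: P(X < 1.5) for a standard normal distribution.
shaded_probability(stats.norm(loc=0, scale=1), b=1.5)
```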
Usability and Key Features – Probability Calculations
This project’s first part combines all essential probability distributions in one place:
1. Unified Distribution Options: All types of distributions (normal, binomial, Poisson, etc.)
are available in a single platform.
2. Simplifies Code: No need to write separate code for each distribution; all calculations
and visualizations are integrated, saving time and reducing complexity (a sketch of this idea follows the list).
3. Automatic Visualizations: Visualizations with shaded areas make it easy to interpret
probabilities without consulting separate tables (e.g., normal or chi-square tables).
4. Academic Applications: Ideal for academic purposes, providing a ready-to-use tool for
understanding and exploring various probability distributions.
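For illustration, one way such a unified platform can map the user's choice to a distribution is a simple registry of SciPy distribution objects; the names and parameters below are assumptions, not the project's actual code.

```python
# Illustrative sketch of a unified distribution registry (assumed names and parameters).
from scipy import stats

DISTRIBUTIONS = {
    "normal":   lambda mu, sigma: stats.norm(loc=mu, scale=sigma),
    "binomial": lambda n, p: stats.binom(n=n, p=p),
    "poisson":  lambda lam: stats.poisson(mu=lam),
    "chi2":     lambda k: stats.chi2(df=k),
}

# One code path serves every distribution, e.g. P(X <= 3) under the chosen model:
dist = DISTRIBUTIONS["poisson"](2.5)
print(dist.cdf(3))
```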
Hypothesis Testing and Data Analysis
The second half of this project streamlines hypothesis testing by
automating the key steps in the process (a sketch of the full flow follows the list):
1. Data Input
User provides data either manually or by uploading a CSV file.
2. Column Selection
Specific columns are selected for hypothesis testing, allowing for targeted analysis.
3. Null Value Removal
Automatically detects and removes any null (missing) values to ensure clean, usable data.
4. Assumption Checks
Verifies essential assumptions (e.g., normality, homogeneity of variance) for the chosen test, ensuring validity.
5. Hypothesis Test Selection
User selects the appropriate test: one-sample t-test, two-sample t-test, or z-test, based on the research question.
6. Hypothesis Testing
The test is conducted, and the p-value is calculated.
7. Compare p-value with Significance Level
The p-value is compared to the significance level (α) to decide whether to reject or fail to reject the null hypothesis.
8. Confidence Interval Visualization
Displays a confidence interval around the sample mean or difference, providing a visual summary of the test results.
9. Results Interpretation
Presents a summary of the findings, including confidence intervals and the final statistical decision.
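Putting the nine steps together, a minimal sketch of the one-sample t-test path might look like the following; the file name, column name, hypothesised mean, and significance level are placeholder assumptions, not the project's actual code.

```python
# Hedged sketch of the hypothesis-testing pipeline (steps 1-9 above); names are placeholders.
import pandas as pd
from scipy import stats

alpha = 0.05                                        # significance level (assumed)

# 1-3: load the data, select a column, and drop missing values
data = pd.read_csv("data.csv")                      # hypothetical file
sample = data["measurement"].dropna()               # hypothetical column

# 4: assumption check (normality via Shapiro-Wilk)
_, normality_p = stats.shapiro(sample)

# 5-6: one-sample t-test against a hypothesised mean
mu0 = 50                                            # hypothetical null value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# 7: compare the p-value with the significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# 8: 95% confidence interval around the sample mean
ci = stats.t.interval(0.95, len(sample) - 1, loc=sample.mean(), scale=stats.sem(sample))

# 9: summary of the results
print(f"Shapiro-Wilk p = {normality_p:.4f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```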
Internal Working of the Program
Data Input and Validation
• The program first allows the user to enter data manually or upload a CSV file.
• After data is loaded, specific columns are selected based on the user’s analysis needs.
• The program validates the data to ensure it’s in the correct format, handling any errors in data types or
structure.
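As an illustration, the load-and-validate step could look like the sketch below; the file path, column name, and `load_column` helper are assumptions made for the example.

```python
# Illustrative sketch of the data-input and validation step (assumed file/column names).
import pandas as pd

def load_column(path: str, column: str) -> pd.Series:
    """Load a CSV file, select one column, and coerce it to a numeric type."""
    data = pd.read_csv(path)
    if column not in data.columns:
        raise ValueError(f"Column '{column}' not found in {path}")
    # Entries with the wrong data type become NaN and are removed in the null-removal step.
    return pd.to_numeric(data[column], errors="coerce")

values = load_column("data.csv", "measurement")
```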
Outlier Detection
Automatically detects and removes outliers from the dataset using the box-plot rule, ensuring a clean and accurate
analysis.
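A common way to apply the box-plot rule is the 1.5 × IQR criterion; the sketch below assumes that approach and is illustrative rather than the project's exact code.

```python
# Box-plot (1.5 * IQR) outlier removal - an assumed implementation for illustration.
import pandas as pd

def remove_outliers_iqr(values: pd.Series) -> pd.Series:
    """Keep only the values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values >= lower) & (values <= upper)]
```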
Assumption Checks
• Based on the selected test, the program runs the necessary assumption checks:
• Normality Check: Uses statistical tests like the Shapiro-Wilk test to check if data follows a normal
distribution.
• Variance Equality (for two-sample tests): For two-sample tests, it uses tests like Levene’s to ensure the
variances of groups are comparable.
• If assumptions are violated, the program may suggest alternatives or warn the user, ensuring valid and
appropriate test results.
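These checks can be sketched with SciPy's `shapiro` and `levene` tests; the `check_assumptions` helper and the 0.05 threshold below are assumptions for illustration.

```python
# Hedged sketch of the assumption checks (normality and, for two samples, equal variances).
from scipy import stats

def check_assumptions(group1, group2=None, alpha=0.05):
    report = {}
    _, p_norm = stats.shapiro(group1)                # Shapiro-Wilk normality test
    report["group1 normal"] = p_norm > alpha
    if group2 is not None:
        _, p_norm2 = stats.shapiro(group2)
        report["group2 normal"] = p_norm2 > alpha
        _, p_var = stats.levene(group1, group2)      # Levene's test for equal variances
        report["equal variances"] = p_var > alpha
    return report
```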
• Python Libraries: The project leverages several powerful Python libraries, such as:
• NumPy: For mathematical computations
• Pandas: For data manipulation and analysis
• Matplotlib and Seaborn: For generating visualizations
• SciPy: For performing statistical tests and calculating probabilities
Q&A