Assignment Python
1. Seaborn:
Statistical
Statistics is a branch of math focused on collecting, organizing, and understanding numerical data. It
involves analyzing and interpreting data to solve real-life problems, using various quantitative
models. Some view statistics as a separate scientific discipline rather than just a branch of math. It
simplifies complex tasks and offers clear insights into regular activities. Statistics finds applications in
diverse fields like weather forecasting, stock market analysis, insurance, betting, and data science.
The statistics.mean() function in Python's standard library calculates the mean (average) of a given
data set. Tip: Mean = add up all the given values, then divide by how many values there are.
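As a minimal sketch of the tip above, the mean can be computed by hand and compared with statistics.mean(). The list of values is made up for illustration only.

```python
import statistics

# Illustrative values (an assumption, not data from the assignment)
values = [4, 8, 15, 16, 23, 42]

# Mean by hand: add up all the values, then divide by how many there are
manual_mean = sum(values) / len(values)

# The same calculation using the standard library
library_mean = statistics.mean(values)

print(manual_mean, library_mean)
```

Both approaches should agree; statistics.mean() raises StatisticsError if the data set is empty.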
Data Visualization
Data visualization is the discipline of trying to understand data by placing it in a visual context so that
patterns, trends, and correlations that might not otherwise be detected can be exposed. Data
visualization is the graphical representation of information and data. By using visual elements like
charts, graphs, and maps, data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
Structured data
Structured data is data that has a standardized format for efficient access by software and humans
alike. It is typically tabular with rows and columns that clearly define data attributes. Computers can
effectively process structured data for insights due to its quantitative nature.
Here are examples of structured data systems:
* Excel files
* SQL databases
* Point-of-sale data
* Web form results
* Search engine optimization (SEO) tags
* Product directories
* Inventory control
* Reservation system
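To illustrate why tabular data is easy for software to process, here is a small sketch that reads comma-delimited rows with Python's csv module. The column names and records are hypothetical point-of-sale data invented for this example; a blank field stands for a missing value.

```python
import csv
import io

# Hypothetical point-of-sale records (invented for illustration).
# The blank price for "bread" represents a missing value.
raw = "item,quantity,price\napple,3,0.50\nbread,1,\nmilk,2,1.20\n"

# DictReader uses the header row to name each column
rows = list(csv.DictReader(io.StringIO(raw)))

# Because every row has the same attributes, aggregation is straightforward
total_quantity = sum(int(row["quantity"]) for row in rows)

# Missing values show up as empty strings between the delimiters
missing_price = [row["item"] for row in rows if row["price"] == ""]

print(total_quantity, missing_price)
```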
Unstructured data
Structured text files usually have a delimiter and fields of either fixed or variable width, with missing
values represented as blanks between the delimiters. But sometimes we get data where the lines are not
fixed width, or the files are just HTML, image, or PDF files. Such data is known as unstructured data.
Unstructured data has an internal structure but does not contain a predetermined data model or
schema. It can be textual or non-textual. It can be human-generated or machine-generated. One of
the most common types of unstructured data is text.
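Since text is the most common kind of unstructured data, one way to work with it is to pull out the parts that follow a recognizable pattern. The sketch below uses regular expressions for that. The sample note, the date format, and the "#" order-number convention are all assumptions made up for this example.

```python
import re

# A made-up free-text note; there are no rows, columns, or schema
text = (
    "Customer called on 2024-03-15 about order #1042. "
    "Follow-up promised by 2024-03-20."
)

# Find dates written as YYYY-MM-DD (assumed format)
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# Find order numbers written as '#' followed by digits (assumed convention)
order_ids = re.findall(r"#(\d+)", text)

print(dates, order_ids)
```

The extracted values can then be stored in a structured form, such as rows in a table.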
2. Data visualization with Tableau
What is Power BI used for in data visualization?
Power BI is a business analytics service by Microsoft used for data visualization, business intelligence,
and data analysis. It helps users:
1. Connect to various data sources
2. Create interactive visualizations (reports and dashboards)
3. Explore and analyze data
4. Share insights with others
Power BI is used for:
1. Data visualization: Create interactive charts, tables, maps, and more to represent data.
2. Business intelligence: Analyze data to inform business decisions.
3. Data mining: Discover patterns and trends in data.
4. Reporting: Create and share reports with others.
5. Dashboarding: Create custom dashboards for real-time monitoring.
6. Data storytelling: Present data insights in a clear and compelling way.
Power BI offers various features, including:
1. Data connectors (e.g., Excel, SQL, Azure)
2. Data modeling and transformation
3. Visualizations (e.g., charts, tables, maps)
4. Interactivity (e.g., filters, drill-downs)
5. Collaboration and sharing
6. Artificial intelligence (AI) and machine learning (ML) capabilities
Power BI is used across various industries and departments, such as:
1. Sales and marketing
2. Finance and accounting
3. Operations and supply chain
4. Human resources
5. Healthcare and life sciences
By using Power BI, organizations can gain insights, make data-driven decisions, and drive business
success.
2. Load Data: Connect to your data source (e.g., Excel file, database) or use a built-in sample dataset.
3. Create a New Visual: In the "Visualizations" pane, select the bar chart icon to add a bar chart to
the report canvas.
4. Drag Fields: Drag the field you want to display on the x-axis (categories) to the "Axis" area.
5. Drag Values: Drag the field you want to display on the y-axis (values) to the "Values" area.
6. Customize: Adjust the chart's appearance by using the "Format" options (e.g., colors, font sizes,
titles).
7. Analyze: Explore your data by interacting with the bar chart (e.g., hover, click, filter).
What are filters in Power BI, and how do they help in data analysis?
In Power BI, filters are a way to narrow down data to a specific subset, allowing users to focus on
relevant information and gain insights. Filters help in data analysis by:
1. Reducing data volume: Filters exclude irrelevant data, making it easier to analyze and visualize.
2. Focusing on specific segments: Filters enable analysis of specific groups, such as regions, products,
or time periods.
3. Identifying trends and patterns: By applying filters, users can discover trends and patterns within
specific data segments.
4. Drilling down into details: Filters allow users to drill down into detailed data, enabling deeper
analysis.
5. Creating targeted visualizations: Filters help create visualizations that show specific data, making it
easier to communicate insights.
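The idea behind filtering can be sketched in plain Python. This is not how Power BI implements filters; it is only an analogy showing how narrowing rows to one segment changes what gets summarized. The sales records and field names are hypothetical.

```python
# Hypothetical sales records (invented for illustration)
sales = [
    {"region": "North", "product": "A", "amount": 120},
    {"region": "South", "product": "A", "amount": 80},
    {"region": "North", "product": "B", "amount": 200},
    {"region": "East", "product": "B", "amount": 50},
]

# "Filter" the data down to a single segment: the North region
north_sales = [row for row in sales if row["region"] == "North"]

# Any summary is now computed only over the filtered subset
north_total = sum(row["amount"] for row in north_sales)

print(len(north_sales), north_total)
```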
Python integration in Power BI covers two main uses: sourcing data and creating custom visualizations.
When Python is used as a data source, Power BI executes the Python script and loads the resulting data
as a DataFrame.
You can then use Power BI's data transformation tools to clean and shape the data as needed.
Visualize Data:
Use Power BI's visualization tools to create reports and dashboards based on the data processed
by your Python script.
Refresh Data:
Ensure that your Python environment and scripts are properly configured to handle data refreshes if
you are using scheduled refreshes in the Power BI Service.
This process integrates Python scripts into Power BI for advanced data processing and analysis.
Data Summarization
Inference
Hypothesis Testing
Modelling Relationships
Risk Assessment
Optimizing Processes
Prediction and Forecasting
How do you calculate the mean and standard deviation of a data set?
To calculate the mean and standard deviation of a dataset, follow these steps:
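Step 1: Calculate the mean by adding up all the values and dividing by the number of data points n.
Step 2: Subtract the mean from each data point and square the result to get the squared deviations.
Step 3: Sum all the squared deviations.
Step 4: Divide the sum of squared deviations by the number of data points n to get the variance.
Step 5: Take the square root of the variance to get the standard deviation.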
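The calculation can be sketched in Python as follows. The function name mean_and_std and the sample values are illustrative assumptions. Because the variance is divided by n, this gives the population standard deviation; the sample standard deviation would divide by n - 1 instead.

```python
import math
import statistics

def mean_and_std(data):
    """Return the mean and population standard deviation of data."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    # Mean: add up all values and divide by the number of data points
    mean = sum(data) / n
    # Squared deviation of each data point from the mean
    squared_deviations = [(x - mean) ** 2 for x in data]
    # Sum the squared deviations
    total = sum(squared_deviations)
    # Variance: divide the sum by n (population variance)
    variance = total / n
    # Standard deviation: square root of the variance
    std_dev = math.sqrt(variance)
    return mean, std_dev

# Illustrative values (an assumption, not data from the assignment)
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std_dev = mean_and_std(sample)
print(mean, std_dev)
```

The result can be cross-checked against statistics.pstdev(sample), which also computes the population standard deviation.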
What is the difference between correlation and causation in the statistical analysis?
• Definition
  - Correlation: Measures the strength and direction of a relationship between two variables.
  - Causation: Indicates that one event directly causes another.
• Implication
  - Correlation: Shows association but does not imply that one variable causes the other.
  - Causation: Demonstrates that changes in one variable result from changes in another.
• Example
  - Correlation: Ice cream sales and drowning incidents are correlated.
  - Causation: Smoking causes lung cancer.
• Interpretation
  - Correlation: Useful for identifying potential relationships.
  - Causation: Critical for establishing cause-effect relationships.
• Limitation
  - Correlation: Can be misleading if misinterpreted as causation.
  - Causation: Requires rigorous testing to confirm.
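As a rough sketch of how the strength and direction of a relationship is measured, the Pearson correlation coefficient can be computed by hand. The helper name pearson_r and the numbers are invented purely for illustration; they are not real sales or incident figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up monthly figures (assumption for illustration only)
ice_cream_sales = [120, 150, 185, 210, 250, 275]
drownings = [1, 2, 2, 4, 5, 6]

r = pearson_r(ice_cream_sales, drownings)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive association, but the calculation says nothing about whether one variable causes the other.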
5. LOGISTIC REGRESSION:
Linear Regression:
1. Continuous Outcome: Predicts a continuous outcome variable (y) based on one or more predictor
variables (x).
2. Linear Relationship: Assumes a linear relationship between the predictors and the outcome.
3. Equation: y = β0 + β1x + ε (where β0 is the intercept, β1 is the slope, and ε is the error term)
4. Assumptions: Linearity, independence, homoscedasticity, normality, and no multicollinearity.
5. Example: Predicting house prices based on features like size, location, and number of bedrooms.
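A minimal sketch of fitting y = β0 + β1x with ordinary least squares for a single predictor is shown below. The function name fit_simple_linear and the house sizes and prices are hypothetical, chosen to lie exactly on a line so the result is easy to check.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical data: house size and price (made up, exactly linear)
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 250, 300, 350]

beta0, beta1 = fit_simple_linear(sizes, prices)

# Predict the price of a house of size 100
predicted = beta0 + beta1 * 100
print(beta0, beta1, predicted)
```

Real data would not fall exactly on a line; the error term ε accounts for the difference between the fitted line and the observed values.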
Logistic Regression:
1. Binary Outcome: Predicts a binary outcome variable (y) based on one or more predictor variables
(x).
2. Non-Linear Relationship: Models the probability of the outcome as a non-linear (S-shaped) function
of the predictors, using the logistic function.
3. Equation: p(y=1) = 1 / (1 + e^(-z)) (where z = β0 + β1x and p(y=1) is the probability of the positive
outcome)
4. Assumptions: Independence, no multicollinearity, and linearity in the logit (not the original
variables).
5. Example: Predicting whether a customer will churn (yes/no) based on features like usage,
demographics, and satisfaction.
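The logistic regression equation can be sketched in Python as follows. The coefficients and the input value are hypothetical, picked only to show how z is turned into a probability and then into a yes/no prediction; they are not fitted from any data.

```python
import math

def logistic(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, beta0, beta1):
    """p(y=1) for a single predictor x, with z = beta0 + beta1 * x."""
    z = beta0 + beta1 * x
    return logistic(z)

# Hypothetical coefficients (assumptions, not estimated from data)
beta0, beta1 = -3.0, 0.5

# Hypothetical predictor value, e.g. a usage score
p = predict_probability(6.0, beta0, beta1)

# Turn the probability into a binary outcome with a 0.5 threshold
label = "yes" if p >= 0.5 else "no"
print(p, label)
```

The 0.5 threshold is a common default rather than a requirement; a different cut-off can be chosen depending on the cost of each kind of error.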