Assignment Python

The document provides an overview of key concepts in data analysis, including statistics, data visualization, structured and unstructured data, and the use of tools like Power BI and Python for data processing. It discusses the importance of statistical methods, logistic regression, and the differences between correlation and causation. Additionally, it outlines how to create visualizations and analyze data effectively using various techniques and tools.

Uploaded by

philomath Math

University name - Calcutta University

Course name - Business analyst


Date - 14.08.2024
Name – ISHA BISWAS
Registration no. - CMO23

1. Seaborn:

Statistics
Statistics is a branch of math focused on collecting, organizing, and understanding numerical data. It
involves analyzing and interpreting data to solve real-life problems, using various quantitative
models. Some view statistics as a separate scientific discipline rather than just a branch of math. It
simplifies complex tasks and offers clear insights into regular activities. Statistics finds applications in
diverse fields like weather forecasting, stock market analysis, insurance, betting, and data science.
In Python, the statistics.mean() function calculates the mean (average) of a given data set. Tip: Mean =
add up all the given values, then divide by how many values there are.
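
As a small illustration of the tip above, the sketch below computes the mean once by hand and once with statistics.mean() from Python's standard library. The list of values is hypothetical and chosen only for the example.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [4, 8, 15, 16, 23, 42]

# Mean by hand: add up all the values, then divide by how many there are.
manual_mean = sum(values) / len(values)

# The same calculation using the standard library.
library_mean = statistics.mean(values)

print(manual_mean, library_mean)
```

Both approaches should agree; the library call is simply the more convenient form once the idea is understood.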

Data Visualization
Data visualization is the discipline of trying to understand data by placing it in a visual context so that
patterns, trends, and correlations that might not otherwise be detected can be exposed. Data
visualization is the graphical representation of information and data. By using visual elements like
charts, graphs, and maps, data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.

Structured data
Structured data is data that has a standardized format for efficient access by software and humans
alike. It is typically tabular with rows and columns that clearly define data attributes. Computers can
effectively process structured data for insights due to its quantitative nature.
Here are examples of structured data systems:
* Excel files
* SQL databases
* Point-of-sale data
* Web form results
* Search engine optimization (SEO) tags
* Product directories
* Inventory control
* Reservation system

Unstructured data
Structured text files usually have a delimiter and either fixed or variable width, with missing values
represented as blanks between the delimiters. But sometimes we get data where the lines are not
fixed width, or the files are simply HTML, image, or PDF files. Such data is known as unstructured data.
Unstructured data has an internal structure but does not contain a predetermined data model or
schema. It can be textual or non-textual. It can be human-generated or machine-generated. One of
the most common types of unstructured data is text.
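
The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the delimited text, the names in it, and the free-text sentence are all made up, and the regular expression is a simple pattern for e-mail-like tokens rather than a complete validator.

```python
import csv
import io
import re

# Structured: delimited text with a header row. A blank between delimiters
# (the empty city for Ravi) marks a missing value.
structured_text = "id,name,city\n1,Asha,Kolkata\n2,Ravi,\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
missing_city = [row["name"] for row in rows if row["city"] == ""]

# Unstructured: free text with no schema. A regular expression is one way
# to pull specific pieces (here, e-mail-like tokens) out of it.
unstructured_text = "Please contact asha@example.com or call Ravi; backup: ravi@example.org."
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", unstructured_text)

print(missing_city)
print(emails)
```

The structured half needs no pattern matching because the columns already define what each value means; the unstructured half has to impose a pattern before anything can be extracted.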

2. Data visualization with Tableau
What is Power BI used for in data visualization?
Power BI is a business analytics service by Microsoft used for data visualization, business intelligence,
and data analysis. It helps users:
1. Connect to various data sources
2. Create interactive visualizations (reports and dashboards)
3. Explore and analyze data
4. Share insights with others

Power BI is used for:
1. Data visualization: Create interactive charts, tables, maps, and more to represent data.
2. Business intelligence: Analyze data to inform business decisions.
3. Data mining: Discover patterns and trends in data.
4. Reporting: Create and share reports with others.
5. Dashboarding: Create custom dashboards for real-time monitoring.
6. Data storytelling: Present data insights in a clear and compelling way.
Power BI offers various features, including:
1. Data connectors (e.g., Excel, SQL, Azure)
2. Data modeling and transformation
3. Visualizations (e.g., charts, tables, maps)
4. Interactivity (e.g., filters, drill-downs)
5. Collaboration and sharing
6. Artificial intelligence (AI) and machine learning (ML) capabilities

Power BI is used across various industries and departments, such as:
1. Sales and marketing
2. Finance and accounting
3. Operations and supply chain
4. Human resources
5. Healthcare and life sciences
By using Power BI, organizations can gain insights, make data-driven decisions, and drive business
success.

How do you create a basic bar chart in Power BI?


1. Open Power BI: Launch the Power BI application on your computer or access it online.

2. Load Data: Connect to your data source (e.g., Excel file, database) or use a built-in sample dataset.

3. Create a New Visual: In the "Visualizations" pane (on the right side of the window by default),
select a bar chart icon, for example "Clustered bar chart".

4. Drag Fields: Drag the field that holds your categories to the "Axis" area (shown as "X-axis" or
"Y-axis" in recent versions, depending on the chart type).

5. Drag Values: Drag the numeric field you want to measure to the "Values" area (the other axis).

6. Customize: Adjust the chart's appearance by using the "Format" options (e.g., colors, font sizes,
titles).
7. Analyze: Explore your data by interacting with the bar chart (e.g., hover, click, filter).
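
What the bar chart does with the two fields can be sketched outside Power BI. The sketch below assumes the values are summed per category and uses made-up (category, value) pairs; the text rendering is only a stand-in for the real visual.

```python
from collections import defaultdict

# Hypothetical rows: (category, value) pairs standing in for the two fields
# dragged onto the chart.
rows = [("Tea", 120), ("Coffee", 200), ("Tea", 80), ("Juice", 50)]

# One bar per category: values that share a category are summed.
totals = defaultdict(int)
for category, value in rows:
    totals[category] += value

# Crude text rendering of the bars: one '#' per 20 units.
for category, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{category:<8}{'#' * (total // 20)} {total}")
```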
What are filters in Power BI, and how do they help in data analysis?
In Power BI, filters are a way to narrow down data to a specific subset, allowing users to focus on
relevant information and gain insights. Filters help in data analysis by:

1. Reducing data volume: Filters exclude irrelevant data, making it easier to analyze and visualize.
2. Focusing on specific segments: Filters enable analysis of specific groups, such as regions, products,
or time periods.
3. Identifying trends and patterns: By applying filters, users can discover trends and patterns within
specific data segments.
4. Drilling down into details: Filters allow users to drill down into detailed data, enabling deeper
analysis.
5. Creating targeted visualizations: Filters help create visualizations that show specific data, making it
easier to communicate insights.

By applying filters, users can:


1. Analyze specific business scenarios
2. Identify areas for improvement
3. Track key performance indicators (KPIs)
4. Create targeted reports and dashboards
5. Gain deeper insights into their data
Filters are a powerful feature in Power BI, enabling users to extract valuable insights from their data
and make informed decisions.
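
The idea of a filter (keep only the rows that match a condition, then analyze that subset) can be sketched in plain Python. The records, field names, and the apply_filter helper below are hypothetical and are not part of Power BI; they only mirror the behaviour described above.

```python
# Hypothetical sales records; field names and numbers are made up for illustration.
sales = [
    {"region": "East", "product": "Tea", "amount": 120},
    {"region": "West", "product": "Tea", "amount": 80},
    {"region": "East", "product": "Coffee", "amount": 200},
    {"region": "North", "product": "Tea", "amount": 50},
]


def apply_filter(records, **conditions):
    """Keep only the records whose fields match every given condition."""
    return [
        record
        for record in records
        if all(record[field] == value for field, value in conditions.items())
    ]


# Focus on one segment, then compute a KPI for it.
east_sales = apply_filter(sales, region="East")
east_total = sum(record["amount"] for record in east_sales)

# Two filters combined narrow the data further.
east_tea = apply_filter(sales, region="East", product="Tea")

print(east_total, east_tea)
```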

3. INTEGRATING PYTHON WITH TABLEAU:

How can Python be used to enhance data analysis in Power BI?


Python is a widely used programming language for data analysis, data science, and machine
learning. With Python, you can import, transform, analyze, and visualize data from various sources
in different formats. It also offers many libraries with advanced functions and algorithms for data
processing.
Microsoft Power BI is an interactive data analysis and visualization tool used for BI (business
intelligence). With Power BI, you can quickly and easily connect to, model, explore, and share data,
as well as create personalized, interactive visual reports that offer valuable insights about your
business.

Python integration with Power BI is limited to a few main functions: Python can be used in Power BI
for sourcing data, transforming it, and creating custom visualizations.

In this article, we will show you how to:

* Install and configure the Python and Power BI environment.
* Use Python to import and transform data in Power BI.
* Create custom visualizations using Seaborn and Matplotlib in Power BI.
* Use Pandas to handle datasets in Power BI.
* Reuse your existing Python source code in Power BI.
* Understand the limitations of using Python in Power BI.
* Use Kaggle, an open databank.
What is the process for connecting Python scripts in Power BI?
To connect Python scripts to Power BI, follow these steps:

1. Install Python and Required Packages: Ensure Python is installed on your system along with the
necessary packages (e.g., pandas, numpy). You can install packages using pip:

pip install pandas numpy

2. Enable Python Support in Power BI:

Open Power BI Desktop.
Go to File > Options and settings > Options.
Under Global > Python scripting, specify the path to your Python executable.

3. Load Data Using Python Script:

In Power BI Desktop, go to Home > Get Data > More.
Select Other > Python script and click Connect.
Enter your Python script in the dialog box that appears. This script should include code to
import libraries, read data, and prepare it for Power BI.

4. Run and Transform Data:

Power BI will execute the Python script and load the data as a DataFrame.
You can then use Power BI's data transformation tools to clean and shape the data as needed.

5. Visualize Data:

Use Power BI's visualization tools to create reports and dashboards based on the data processed
by your Python script.

6. Refresh Data:

Ensure that your Python environment and scripts are properly configured to handle data refreshes if
you are using scheduled refreshes in the Power BI Service.

This process integrates Python scripts into Power BI for advanced data processing and analysis.
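
A script entered in the Python script dialog might look like the minimal sketch below. It assumes pandas is installed in the Python environment that Power BI points to. The inline CSV text, the column names, and the variable name orders are hypothetical; in practice the data would normally come from your own file or database.

```python
import io

import pandas as pd

# Hypothetical inline data so the sketch runs anywhere. In practice this would
# usually be a call such as pd.read_csv(...) on your own file.
raw_csv = io.StringIO(
    "order_id,region,amount\n"
    "1,East,120\n"
    "2,West,\n"
    "3,East,200\n"
)

# Read the data into a pandas DataFrame.
orders = pd.read_csv(raw_csv)

# Light preparation before Power BI sees the data: replace the missing amount with 0.
orders["amount"] = orders["amount"].fillna(0)

# A DataFrame left in the script (here: orders) is what Power BI offers as a
# table to load after the script has run.
print(orders)
```

Keeping the preparation inside the script means the same cleaning is repeated whenever the data is refreshed, rather than being redone by hand.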

How do you execute Python code in a Power BI dashboard?


To run your Python script:
In the Home group of the Power BI Desktop ribbon, select Get data. In the Get Data dialog box,
select Other > Python script, and then select Connect.
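
Python code can also be run inside a report page through a Python visual. The sketch below assumes matplotlib and pandas are installed. Inside a Python visual, Power BI supplies the selected fields as a DataFrame named dataset; it is built by hand here, with made-up values, only so the example is self-contained.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs outside Power BI; not needed inside it
import matplotlib.pyplot as plt
import pandas as pd

# Inside a Power BI Python visual, `dataset` is created automatically from the
# fields placed on the visual. Hypothetical values are used here instead.
dataset = pd.DataFrame(
    {"region": ["East", "West", "North"], "amount": [320, 80, 50]}
)

# Draw a simple bar chart of amount by region.
fig, ax = plt.subplots()
ax.bar(dataset["region"], dataset["amount"])
ax.set_xlabel("Region")
ax.set_ylabel("Amount")
ax.set_title("Amount by region")

# The script should end by showing the figure; Power BI renders that output.
plt.show()
```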

4. ANALYTICS FOUNDATION USING STATISTICAL METHODS:

What is the purpose of using statistical methods in analytics?
Statistical methods in analytics are used to collect, analyse, interpret, and present data. They help
identify patterns, relationships, and trends, enabling informed decision-making and accurate
predictions in various fields.

Key purposes include:
* Data Summarization
* Inference
* Hypothesis Testing
* Modelling Relationships
* Risk Assessment
* Optimizing Processes
* Prediction and Forecasting

How do you calculate the mean and standard deviation of a data set?

To calculate the mean and standard deviation of a dataset, follow these steps:

1. Calculate the Mean (Average):

Step 1: Sum all the values in the dataset.
Step 2: Divide the sum by the number of data points n.

2. Calculate the Standard Deviation:

Step 1: Calculate each data point's deviation from the mean by subtracting the mean from each
value.
Step 2: Square each of these deviations.
Step 3: Sum all the squared deviations.
Step 4: Divide the sum of squared deviations by the number of data points n to get the variance.
Step 5: Take the square root of the variance to get the standard deviation.

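
The calculation can be followed step by step in Python. The data set below is hypothetical. Dividing by n, as described above, gives the population standard deviation (statistics.pstdev in the standard library); for a sample, the usual convention is to divide by n - 1 instead (statistics.stdev).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

n = len(data)
mean = sum(data) / n                      # mean: sum the values, divide by n

deviations = [x - mean for x in data]     # step 1: deviation of each value from the mean
squared = [d ** 2 for d in deviations]    # step 2: square each deviation
sum_squared = sum(squared)                # step 3: sum the squared deviations
variance = sum_squared / n                # step 4: divide by n (population variance)
std_dev = math.sqrt(variance)             # step 5: square root of the variance

print(mean, variance, std_dev)
print(statistics.pstdev(data))            # library equivalent of the steps above
```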
What is the difference between correlation and causation in the statistical analysis?

A concise comparison between correlation and causation in a table format:


| Aspect         | Correlation                                                                  | Causation                                                                  |
|----------------|------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| Definition     | Measures the strength and direction of a relationship between two variables. | Indicates that one event directly causes another.                          |
| Implication    | Shows association but does not imply one variable causes the other.          | Demonstrates that changes in one variable result from changes in another.  |
| Example        | Ice cream sales and drowning incidents are correlated.                       | Smoking causes lung cancer.                                                |
| Interpretation | Useful for identifying potential relationships.                              | Critical for establishing cause-effect relationships.                      |
| Limitation     | Can be misleading if misinterpreted as causation.                            | Requires rigorous testing to confirm.                                      |
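
The ice cream example can be made concrete with a small sketch. All numbers below are made up: temperature drives both series, and neither series appears in the other's formula, yet the two still come out strongly correlated. The pearson helper is written out by hand for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up monthly figures. Temperature is the common driver (a confounder).
temperature = [18, 22, 25, 29, 33, 36]
ice_cream_sales = [10 * t + noise for t, noise in zip(temperature, [3, -2, 4, -1, 2, -3])]
drownings = [0.5 * t + noise for t, noise in zip(temperature, [0.4, -0.3, 0.1, 0.5, -0.2, 0.3])]

# High correlation, although ice cream sales do not cause drownings.
r = pearson(ice_cream_sales, drownings)
print(round(r, 3))
```

A correlation coefficient alone cannot reveal the hidden common cause; that takes domain knowledge or a controlled study.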

5.LOGISTIC REGRESSION:

What is logistic regression used for in data analysis?


Logistic regression is a data analysis technique that uses mathematics to find the relationship
between an outcome and one or more predictor variables. It then uses this relationship to predict the
outcome from the predictors. The prediction usually has a finite number of outcomes, like yes or no.

How do you interpret the output of a logistic regression model?


The standard interpretation of a logistic regression coefficient is that for a one-unit increase in the
predictor, the log-odds of the outcome are expected to change by the respective regression coefficient
while the other variables in the model are held constant. Exponentiating a coefficient gives the odds
ratio. In an ordered logit model, the same reading applies on the ordered log-odds scale.

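
For the common binary case, the coefficient reading can be checked numerically. The intercept, the coefficient, and the "support calls" predictor below are hypothetical and do not come from any fitted model; the sketch only shows that one extra unit of the predictor multiplies the odds by e raised to the coefficient.

```python
import math

# Hypothetical coefficients for a churn model (not fitted to real data).
intercept = -2.0
coef_support_calls = 0.7  # change in log-odds per additional support call


def churn_probability(support_calls):
    """Predicted probability of churn for a given number of support calls."""
    log_odds = intercept + coef_support_calls * support_calls
    return 1 / (1 + math.exp(-log_odds))


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1 - probability)


# Exponentiating the coefficient gives the odds ratio for a one-unit increase.
odds_ratio = math.exp(coef_support_calls)

# The ratio of odds between 3 and 2 support calls matches the odds ratio.
observed_ratio = odds(churn_probability(3)) / odds(churn_probability(2))
print(odds_ratio, observed_ratio)
```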
What is the difference between logistic regression and linear regression?
Linear Regression and Logistic Regression are both statistical models used for prediction, but they
differ in their approach and application:

Linear Regression:

1. Continuous Outcome: Predicts a continuous outcome variable (y) based on one or more predictor
variables (x).
2. Linear Relationship: Assumes a linear relationship between the predictors and the outcome.
3. Equation: y = β0 + β1x + ε (where β0 is the intercept, β1 is the slope, and ε is the error term)
4. Assumptions: Linearity, independence, homoscedasticity, normality, and no multicollinearity.
5. Example: Predicting house prices based on features like size, location, and number of bedrooms.

Logistic Regression:

1. Binary Outcome: Predicts a binary outcome variable (y) based on one or more predictor variables
(x).
2. Non-Linear Relationship: Assumes a non-linear relationship between the predictors and the
outcome (using the logistic function).
3. Equation: p(y=1) = 1 / (1 + e^(-z)) (where z = β0 + β1x and p(y=1) is the probability of the positive
outcome)
4. Assumptions: Independence, no multicollinearity, and linearity in the logit (not the original
variables).
5. Example: Predicting whether a customer will churn (yes/no) based on features like usage,
demographics, and satisfaction.
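
The two equations can be compared side by side in Python. The coefficient values are hypothetical, and the error term of the linear model is left out because only the predicted value is computed.

```python
import math


def linear_prediction(x, b0, b1):
    """Linear regression prediction: y = b0 + b1 * x (unbounded)."""
    return b0 + b1 * x


def logistic_prediction(x, b0, b1):
    """Logistic regression prediction: p(y=1) = 1 / (1 + e^(-z)), z = b0 + b1 * x."""
    z = b0 + b1 * x
    return 1 / (1 + math.exp(-z))


b0, b1 = -1.0, 0.5  # hypothetical coefficients

for x in (-10, 0, 2, 10):
    # The linear output can be any number; the logistic output stays between 0 and 1.
    print(x, linear_prediction(x, b0, b1), round(logistic_prediction(x, b0, b1), 4))
```

The same linear combination z feeds both models; the logistic function is what turns it into a probability.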
