
February/March 2024 Data Analytics

SECTION-A
I. Answer any four questions. Each question carries two marks. (4x2=8)
1. Define the term Data Analytics.
Data analytics refers to the process of examining, cleaning, transforming, and modeling data to uncover
useful information, draw conclusions, and support decision-making. It involves using statistical,
computational, and machine learning techniques to analyze large datasets and extract valuable insights.
2. Name any four data visualization tools used.
Tableau, Power BI, QlikView, Google Charts, and D3.js.
3. Explain the term Normal Distribution.
Normal distribution is a symmetric, bell-shaped probability distribution that represents the distribution of
a continuous random variable. The majority of the values cluster around the mean, and the frequency
of values decreases as they move away from the mean. It is characterized by its mean (µ) and standard
deviation (σ), and many statistical methods assume data follows a normal distribution.
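For reference, the probability density function of a normal distribution with mean µ and standard deviation σ is:

f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²))

About 68% of the values lie within one standard deviation of the mean and about 95% within two standard deviations.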
4. Define the following events:
a. Mutually exclusive: Two events are mutually exclusive if they cannot occur at the same time; if one
event happens, the other cannot. For example, rolling a 3 or a 5 on a single throw of a die.
b. Equally likely: Two events are equally likely if they have the same probability of occurring. For
example, flipping a fair coin, where heads and tails have equal chances of occurring.
5. What is Power Query?
Power Query is a data connection technology in Microsoft Power BI, Excel, and other Microsoft tools. It
allows users to discover, connect, combine, and refine data from various sources. With Power Query,
users can automate the process of extracting and transforming data before using it for analysis or
reporting.
6. What are Filters in Power BI?
Filters in Power BI are used to restrict the data that appears in visualizations, reports, or dashboards. They
allow users to narrow down data based on certain conditions, such as filtering by a specific time
period, region, or product category. Filters can be applied at different levels: visual level, page level,
and report level.
SECTION B
II. Answer any four questions. Each question carries five marks. (4x5=20)
7. Write a note on Data Analytics Life Cycle.
• Data Collection: The first step in data analytics is gathering data from various sources, including
internal sources like databases and spreadsheets, and external sources such as social media and market
research. The data should be relevant to the business problem and of high quality.
• Data Cleaning and Preprocessing: After data collection, it must be cleaned and preprocessed to remove
errors and inconsistencies. This includes removing duplicates, filling in missing values, and correcting
errors. Preprocessing may also involve transforming the data into a suitable format for analysis, such
as converting categorical variables into numerical ones.
• Data Transformation: Data transformation involves converting data into a format suitable for analysis,
which may include scaling, normalizing, and applying mathematical functions.
• Data Analysis: Data is analyzed using statistical and computational techniques, including descriptive
statistics (e.g., mean, standard deviation) and inferential statistics (e.g., hypothesis testing, regression
analysis).
• Interpretation and Reporting: Interpret the results to derive actionable insights and make informed
decisions. This involves understanding how the analysis impacts the business problem or question
being addressed.
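As an illustrative sketch of the cleaning and transformation steps in Python (hypothetical data and column names; assumes the pandas library is installed):

import pandas as pd

# Hypothetical raw data with a duplicate row, missing values, and a categorical column
df = pd.DataFrame({
    "region": ["East", "West", "East", "East", None],
    "sales": [250.0, None, 310.0, 310.0, 190.0],
})

df = df.drop_duplicates()                             # remove duplicate rows
df["sales"] = df["sales"].fillna(df["sales"].mean())  # fill missing numeric values with the mean
df["region"] = df["region"].fillna("Unknown")         # fill missing categories
df = pd.get_dummies(df, columns=["region"])           # convert the categorical variable into numerical columns
print(df.describe())                                  # descriptive statistics (count, mean, std, ...)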
8. Define Hypothesis. Explain the purpose of ANOVA in Hypothesis testing.
A hypothesis is a claim or belief; hypothesis testing is the statistical process of either rejecting or retaining a
claim, belief, or association related to a business context, product, service, process, etc.

There are two main types of hypotheses in hypothesis testing:

1. Null Hypothesis (H₀): Assumes there is no effect, difference, or relationship between variables.
2. Alternative Hypothesis (H₁): Suggests that there is an effect, difference, or relationship between
variables.

Purpose of ANOVA in Hypothesis Testing

ANOVA (Analysis of Variance) is a statistical technique used to compare the means of three or more
groups to determine if there are significant differences between them. It evaluates whether the
observed variations among group means are due to random chance or a true effect.

Key Objectives of ANOVA:

1. Testing for Differences: ANOVA tests the null hypothesis that all group means are equal (H₀: µ₁ = µ₂
= µ₃ ... = µₖ). If rejected, it suggests at least one group mean differs significantly.
2. Identifying Variance Sources: It partitions the total variance in the data into two components:
o Within-group variance: Variability due to differences within each group.
o Between-group variance: Variability due to differences between the group means.
3. Reducing Error: ANOVA helps detect significant group differences while controlling for Type I
errors (false positives) that can arise when conducting multiple t-tests.
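For illustration, a minimal one-way ANOVA in Python (hypothetical group scores; assumes the SciPy library is installed):

from scipy import stats

# Hypothetical exam scores for three different teaching methods
method_a = [85, 90, 88, 92, 87]
method_b = [78, 82, 80, 85, 79]
method_c = [90, 95, 93, 96, 91]

# One-way ANOVA: H0 is that all three group means are equal
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# If p < 0.05, reject H0: at least one group mean differs significantly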

9. What are the various steps involved in any Analytics Project?


1. Problem Definition

• Objective: Clearly define the problem or business question to address.


• Key Tasks:
o Understand the goals of the project.
o Identify stakeholders and their requirements.
o Establish the scope, constraints, and success metrics.

2. Data Collection

• Objective: Gather relevant data needed for analysis.


• Key Tasks:
o Identify data sources (databases, APIs, surveys, etc.).
o Collect data in structured or unstructured formats.
o Ensure the data's relevance, completeness, and accuracy.

3. Data Cleaning and Preparation

• Objective: Prepare the data for analysis by addressing quality issues.


• Key Tasks:
o Remove duplicates, handle missing values, and correct errors.
o Normalize or standardize data as needed.
o Create derived features or variables for better analysis.
4. Exploratory Data Analysis (EDA)

• Objective: Understand the dataset and uncover initial insights.


• Key Tasks:
o Analyze distributions, patterns, and trends using statistical summaries and visualizations.
o Detect outliers or anomalies.
o Identify correlations or relationships between variables.

5. Model Development and Testing

• Objective: Build predictive or prescriptive models to address the problem.


• Key Tasks:
o Choose appropriate analytical techniques (e.g., regression, clustering, machine learning).
o Split data into training and testing sets.
o Train the model using the training dataset.
o Evaluate model performance using metrics such as accuracy, precision, recall, or RMSE (a short code sketch illustrating this step appears after this list).

6. Insights Generation and Interpretation

• Objective: Translate model outputs into actionable insights.


• Key Tasks:
o Interpret the results and validate them against business objectives.
o Summarize key findings in a meaningful way.
o Assess whether the insights align with the problem definition.

7. Visualization and Communication

• Objective: Present insights in an easily understandable format.


• Key Tasks:
o Create visualizations such as dashboards, charts, or graphs.
o Use storytelling to communicate the findings to stakeholders.
o Highlight actionable recommendations based on the analysis.

8. Deployment

• Objective: Integrate the solution into business processes or systems.


• Key Tasks:
o Automate workflows or dashboards for continuous use.
o Deploy predictive models or decision-making tools.

9. Monitoring and Maintenance

• Objective: Ensure the solution remains effective over time.


• Key Tasks:
o Continuously monitor the solution's performance.
o Update models or insights as new data becomes available.
o Collect feedback from users and refine the solution.
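A minimal code sketch of step 5 (model development and testing), reusing the car age and selling price data from question 11 below purely for illustration and assuming the scikit-learn library is installed:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Car age (years) as the feature, selling price ($000) as the target
X = [[9], [7], [11], [12], [8], [7], [8], [11], [10], [12]]
y = [8.1, 6.0, 3.6, 4.0, 5.0, 10.0, 7.8, 8.6, 8.0, 6.0]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model on the training data
model = LinearRegression().fit(X_train, y_train)

# Evaluate model performance on unseen data using RMSE
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Test RMSE: {rmse:.2f}")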

10. State and prove Bayes' Theorem.

Bayes' Theorem is a fundamental concept in probability theory that describes the relationship between
conditional probabilities. It allows us to update the probability of a hypothesis based on new evidence.
Statement of Bayes' Theorem

For two events A and B with P(B)>0, Bayes' Theorem is stated as:

P(A∣B)=P(B∣A)⋅P(A)/P(B)

Where:

• P(A∣B): The probability of event A occurring given that B has occurred (posterior probability).
• P(B∣A): The probability of event B occurring given that A has occurred (likelihood).
• P(A): The probability of event A occurring (prior probability).
• P(B): The probability of event B occurring (marginal probability).

Proof of Bayes' Theorem

Step 1: Start with the definition of conditional probability

The conditional probability of A given B is defined as

P(A∣B) = P(A∩B) / P(B)    (1)

Similarly, the conditional probability of B given A is:

P(B∣A) = P(A∩B) / P(A)    (2)

Step 2: Rearrange equation (2)

From equation (2):

P(A∩B) = P(B∣A)⋅P(A)    (3)
Step 3: Substitute equation (3) into equation (1)

Substitute P(A∩B) from equation (3) into equation (1):

P(A∣B) = P(B∣A)⋅P(A) / P(B)

This is the formula for Bayes' Theorem.
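As a quick numerical illustration (the figures are hypothetical): suppose 1% of a population has a disease, so P(A) = 0.01; a test detects the disease in 95% of affected people, so P(B∣A) = 0.95; and it gives a false positive in 5% of healthy people, so P(B∣A′) = 0.05. Then:

P(B) = P(B∣A)⋅P(A) + P(B∣A′)⋅P(A′) = 0.95 × 0.01 + 0.05 × 0.99 = 0.059
P(A∣B) = P(B∣A)⋅P(A) / P(B) = 0.0095 / 0.059 ≈ 0.16

Even after a positive test, the probability of actually having the disease is only about 16%, because the disease is rare (the prior P(A) is small).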

11. The owner of Maumee Ford-Volvo wants to study the relationship between the age of a car and its
selling price. Listed below is a random sample of 10 used cars sold at the dealership during the last year.

Age (years)          9    7    11   12   8    7    8    11   10   12
Selling Price ($000) 8.1  6.0  3.6  4.0  5.0  10.0 7.8  8.6  8.0  6.0

Calculate the correlation coefficient between a car's age and its selling price.
Step 1: Calculate the Means

x̄ = Σx / n = 95 / 10 = 9.5 and ȳ = Σy / n = 67.1 / 10 = 6.71

Step 2: Calculate the Deviations

For each xi and yi, calculate (xi − x̄) and (yi − ȳ).
Age (x)   Selling price (y)   x − x̄   y − ȳ    (x − x̄)(y − ȳ)   (x − x̄)²   (y − ȳ)²
9         8.1                 -0.5     1.39     -0.695            0.25        1.9321
7         6.0                 -2.5     -0.71    1.775             6.25        0.5041
11        3.6                 1.5      -3.11    -4.665            2.25        9.6721
12        4.0                 2.5      -2.71    -6.775            6.25        7.3441
8         5.0                 -1.5     -1.71    2.565             2.25        2.9241
7         10.0                -2.5     3.29     -8.225            6.25        10.8241
8         7.8                 -1.5     1.09     -1.635            2.25        1.1881
11        8.6                 1.5      1.89     2.835             2.25        3.5721
10        8.0                 0.5      1.29     0.645             0.25        1.6641
12        6.0                 2.5      -0.71    -1.775            6.25        0.5041
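Step 3: Compute the Correlation Coefficient

Summing the columns of the table above gives (a worked completion of the answer; values rounded):

Σ(x − x̄)(y − ȳ) = -15.95, Σ(x − x̄)² = 34.5, Σ(y − ȳ)² = 40.129

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²] = -15.95 / √(34.5 × 40.129) = -15.95 / 37.21 ≈ -0.43

The correlation coefficient is approximately -0.43, indicating a moderate negative relationship: older cars tend to have lower selling prices.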
12. What are the advantages of Power BI?

Advantages:

1. User-Friendly Interface: Power BI has a visually appealing and intuitive interface that makes it
accessible for users with varying levels of technical expertise.
2. Integration with Microsoft Products: It seamlessly integrates with other Microsoft tools like Excel,
Azure, and SharePoint, making it easier for organizations already using Microsoft services.
3. Data Connectivity: Power BI supports a wide range of data sources, including databases, cloud
services, and spreadsheets, allowing users to consolidate data from multiple sources.
4. Real-Time Data Access: Users can access real-time data, enabling timely decision-making and
insights.
5. Customizable Dashboards and Reports: Users can create personalized dashboards and reports that
can be easily shared across the organization.
6. Advanced Analytics: It offers powerful analytics capabilities, including DAX (Data Analysis
Expressions) for complex calculations and machine learning integrations.
7. Collaboration Features: Power BI facilitates collaboration through sharing options and integration
with Microsoft Teams, enhancing teamwork and communication.
8. Cost-Effective: For small to medium-sized businesses, Power BI can be a cost-effective solution
compared to other analytics tools.
SECTION-C
III. Answer any four questions. Each question carries eight marks. (4x8=32)
13. With an example, explain the different types of analytics.

1. Descriptive Analytics

Purpose:

• To summarize and interpret historical data to understand what has happened.


• Focuses on key metrics such as averages, totals, and percentages.

Example:

Scenario: A retail company analyzing last year's sales.

• Insights:
o Total sales for the year: $5 million.
o Top-performing product: Sneakers (20% of total sales).
o Sales distribution by region: East Coast accounted for 40%.

Tools: Dashboards, reports, and data visualization tools like Tableau or Power BI.

2. Diagnostic Analytics

Purpose:

• To investigate and identify the reasons behind past outcomes or patterns.


• Answers the question: Why did it happen?

Example:

Scenario: A company notices a 15% drop in sales during Q3.

• Analysis:
o Found that a competitor launched a similar product at a lower price.
o Marketing campaigns had lower engagement due to inadequate targeting.

Tools: Data mining, root cause analysis, and correlation analysis.

3. Predictive Analytics

Purpose:

• To forecast future outcomes based on historical data and statistical models.


• Answers the question: What is likely to happen?

Example:

Scenario: An e-commerce company predicting future sales.

• Insights:
o Sales for Q4 are expected to grow by 10% due to holiday promotions.
o Customers who purchased electronics are 70% likely to buy accessories.

Tools: Machine learning, regression models, and forecasting tools.


4. Prescriptive Analytics

Purpose:

• To recommend actions or strategies based on data insights.


• Answers the question: What should we do?

Example:

Scenario: A logistics company optimizing delivery routes.

• Recommendation:
o Implement Route A to reduce fuel costs by 15%.
o Schedule deliveries during non-peak hours to save time.

Tools: Optimization algorithms, decision trees, and AI-based models.

14. With a case study, explain how analytics has helped the food industry to improve its business.

Case Study: Domino’s Pizza – Leveraging Analytics to Improve Business

Background:

Domino’s Pizza, one of the world’s largest pizza delivery chains, faced challenges in delivery efficiency,
customer satisfaction, and predicting demand. By leveraging analytics, Domino’s transformed its
operations and marketing strategies to achieve significant growth.

Implementation of Analytics:

1. Descriptive Analytics: Understanding Customer Trends

• What they did:


Analyzed historical sales data to determine peak ordering times, top-selling items, and regional preferences.
• Impact:
Enabled Domino’s to optimize store operations during busy hours, ensuring consistent service quality.

2. Predictive Analytics: Forecasting Demand

• What they did:


Used machine learning models to predict demand during events like sports matches and holidays.
• Impact:
Improved inventory planning, reducing food waste and ensuring adequate stock of popular items.
3. Prescriptive Analytics: Optimizing Delivery Routes

• What they did:


Developed real-time route optimization algorithms for delivery drivers.
• Impact:
Reduced delivery times by 25%, enhancing customer satisfaction and loyalty.

4. Customer-Centric Insights

• What they did:


Implemented an AI-powered app to track customer preferences, allowing personalized offers and
recommendations.
• Impact:
Boosted online orders and customer retention through targeted marketing campaigns.

Results:

1. Increased Sales: Revenue grew by over 20% in key markets.


2. Improved Efficiency: Reduced delivery times led to a 15% increase in positive customer feedback.
3. Customer Retention: Personalized marketing increased repeat orders by 30%.

Conclusion:

Domino’s Pizza effectively used analytics to address critical business challenges, resulting in improved
operational efficiency, customer satisfaction, and profitability. The case highlights the transformative
power of analytics in the food industry.

15. Define regression. Find the two regression equations for the data of 10 students in two subjects
given below
English 75 80 93 65 87 71 98 68 89 77
Economics 82 78 86 72 91 80 95 72 89 74
Regression is a statistical method used to model and analyze the relationship between two or more variables. It
helps determine the equation that best describes how a dependent variable (e.g., Economics scores) changes
in response to changes in an independent variable (e.g., English scores).
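A worked sketch of the calculation (taking English as X and Economics as Y; figures rounded):

x̄ = ΣX / n = 803 / 10 = 80.3 and ȳ = ΣY / n = 819 / 10 = 81.9

Using deviations from the means (x = X − x̄, y = Y − ȳ), the sums work out to
Σxy = 724.3, Σx² = 1106.1, Σy² = 598.9

Regression coefficient of Y on X: b_yx = Σxy / Σx² = 724.3 / 1106.1 ≈ 0.655
Regression coefficient of X on Y: b_xy = Σxy / Σy² = 724.3 / 598.9 ≈ 1.209

Regression equation of Y on X (Economics on English):
Y − ȳ = b_yx(X − x̄)  ⇒  Y = 81.9 + 0.655(X − 80.3)  ⇒  Y ≈ 0.655X + 29.3

Regression equation of X on Y (English on Economics):
X − x̄ = b_xy(Y − ȳ)  ⇒  X = 80.3 + 1.209(Y − 81.9)  ⇒  X ≈ 1.209Y − 18.7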
16.
a) What are the various types of refresh options provided in Power BI? (3+5)

Power BI offers multiple data refresh options to ensure reports and dashboards display up-to-date
information. The main types are:

1. Manual Refresh:
o Triggered by the user in the Power BI service or desktop.
o Suitable for ad-hoc data updates.
2. Scheduled Refresh:
o Automatically refreshes data at predefined intervals (e.g., daily, hourly).
o Configured in the Power BI service and requires a Power BI Gateway for on-premises data.
3. Real-Time (Automatic) Refresh:
o Allows data to update in real-time by connecting to streaming datasets.
o Suitable for scenarios like live dashboards showing stock prices or IoT data.
4. On-Demand Refresh:
o Triggered using APIs for specific use cases, such as programmatically refreshing datasets when certain
events occur.
5. DirectQuery or Live Connection:
o Data is queried directly from the source in real time, so no scheduled refresh is required.
o Useful for large datasets that cannot be imported into Power BI.

b) What are the building blocks of Microsoft Power BI ? Explain

Power BI is composed of several key components that work together to create a complete data analytics
solution:

1. Datasets:
o Collections of data imported or connected to Power BI from sources like SQL, Excel, or APIs.
o Example: Sales data for a year.
2. Reports:
o A collection of visualizations, such as charts, tables, and graphs, displayed on multiple pages.
o Example: A report showing monthly sales trends, top products, and customer demographics.
3. Dashboards:
o A single-page, real-time view of key metrics and insights, pulling data from multiple reports.
o Example: An executive dashboard summarizing company performance.
4. Visualizations:
o Graphical representations of data like bar charts, line charts, and pie charts.
o Example: A bar chart showing product-wise revenue distribution.
5. Tiles:
o A single visualization in a dashboard, pinned from a report or dataset.
o Example: A KPI tile showing total sales.
6. Power BI Service:
o A cloud-based platform where users can share, collaborate, and publish reports and dashboards.
o Example: Sharing a sales performance report with a team.
7. Power BI Desktop:
o A Windows application for creating reports and models.
o Example: Building a sales forecasting model.
8. Dataflows:
o Used for data preparation and transformation within the Power BI service.
o Example: Creating reusable datasets for customer analytics.

17.
a) What is the purpose of COUNT, COUNTA, COUNTBLANK, and COUNTIF in Excel? (4+4)

• COUNT:

• Purpose: Counts the number of numeric entries in a range.


• Example: =COUNT(A1:A10) counts cells with numbers in the range A1:A10.

• COUNTA:

• Purpose: Counts the number of non-blank cells, regardless of data type (numbers, text, or formulas).
• Example: =COUNTA(A1:A10) counts cells with any data (text, numbers, or logical values).

• COUNTBLANK:

• Purpose: Counts the number of blank cells in a specified range.


• Example: =COUNTBLANK(A1:A10) counts empty cells in the range A1:A10.

• COUNTIF:

• Purpose: Counts the number of cells that meet a specified condition.


• Example: =COUNTIF(A1:A10,">10") counts cells in A1:A10 with values greater than 10.

b) List the differences between Logistic Regression and Linear Regression.

1. Purpose: Linear regression predicts a continuous numeric outcome (e.g., price, sales); logistic regression predicts the probability of a categorical outcome, usually binary (e.g., yes/no).
2. Output range: Linear regression output is unbounded; logistic regression output is a probability between 0 and 1.
3. Functional form: Linear regression fits a straight line (Y = a + bX); logistic regression fits an S-shaped sigmoid curve.
4. Estimation method: Linear regression is typically estimated using least squares; logistic regression uses maximum likelihood estimation.
5. Evaluation: Linear regression is evaluated with metrics such as R² and RMSE; logistic regression with accuracy, precision, recall, or a confusion matrix.

18.
a) Differentiate between Dashboards and Reports. (4+4)

1. Pages: A report is a collection of visualizations spread across one or more pages; a dashboard is a single-page view of key metrics.
2. Data sources: A report is built from a single dataset; a dashboard can pin tiles from multiple reports and datasets.
3. Creation: Reports can be created in Power BI Desktop and the Power BI service; dashboards exist only in the Power BI service.
4. Purpose: Reports support detailed, interactive analysis with slicers and filters; dashboards provide at-a-glance monitoring and support features such as data alerts on tiles.
b) Explain the different visualisation techniques used for spatial data.
Spatial Data:

Map Visualization

Description: A simple geographical map visualization where data points are plotted using latitude and
longitude coordinates or by geographic regions (like country, state, or city).

Use Cases: Displaying precise locations of points, such as customer addresses, store locations, or
event locations.

Filled Map Visualization

Description: A choropleth map that fills geographical areas (e.g., countries, states, or districts) with
colors based on the values in your dataset.

Use Cases: Visualizing data intensity or value distribution across predefined geographical regions.

Synoptic Panel Visualization

Description: A custom visual in Power BI that allows you to create custom shapes or regions and
assign data to them, similar to a filled map but not restricted to geographical locations.

Use Cases: Floor plans (e.g., visualizing sales or occupancy in different parts of a retail store or
building).

Bubble Chart Visualization

Description: A map visualization where bubbles (circles) are placed on specific geographic locations,
with the size of the bubble representing the magnitude of a particular value (e.g., population, sales, or
revenue).
Use Cases: Displaying distribution or intensity of a value across different locations.

Scatter Plot Visualization

Description: A scatter plot can visualize geographical data when paired with latitude and longitude on
a Cartesian plane, but more commonly, it is used to represent data distribution between two numeric
values.

Use Cases: Great for identifying correlations or relationships between two data points, such as
comparing sales vs. customer satisfaction in different regions.

Tree Maps

Description: A tree map visualizes hierarchical data as a set of nested rectangles. Each rectangle's
size is proportional to a value, making it easy to compare parts of a whole.

Use Cases: While not inherently geographical, tree maps can be used to represent hierarchies of
regions (e.g., country > state > city) and compare metrics within those regions.

Custom Visuals

Description: Power BI allows you to integrate a variety of custom visuals designed for more advanced
or specialized spatial data visualization. Some custom visuals cater specifically to geographical data.

Use Cases: For example, the Mapbox custom visual provides advanced maps, including 3D maps, heat
maps, contour maps, and satellite imagery.
