Artificial Intelligence & BA - Practicals Assignments
Problem 1:
A small company recorded the monthly sales figures (in thousands of dollars) for the past 10 months. The sales
figures are as follows:
22, 30, 25, 30, 28, 32, 29, 30, 26, 27
Tasks:
2. Determine the Median: Identify the middle value of the monthly sales data when arranged in ascending
order.
3. Find the Mode: Determine the most frequently occurring sales figure.
4. Interpret the Results: Based on your calculations, which measure of central tendency provides the best
insight into the company's typical monthly sales?
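As a starting point for the tasks above, here is a minimal sketch that computes the three measures of central tendency for the ten sales figures with Python's standard `statistics` module. The variable names are illustrative, not part of the assignment.

```python
import statistics

# Monthly sales in thousands of dollars, as listed in the problem
sales = [22, 30, 25, 30, 28, 32, 29, 30, 26, 27]

mean_sales = statistics.mean(sales)      # arithmetic average
median_sales = statistics.median(sales)  # average of the two middle values when n is even
mode_sales = statistics.mode(sales)      # most frequently occurring value

print(f"Mean:   {mean_sales}")
print(f"Median: {median_sales}")
print(f"Mode:   {mode_sales}")
```

Comparing the three values (and how far the smallest figure sits from the rest) is one way to argue which measure best describes a typical month.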
Problem 2:
A researcher collected data on the number of hours studied and the corresponding test scores for a sample of 8
students. The data is as follows:
2 55
3 60
5 70
4 65
6 75
7 80
8 85
9 90
Tasks:
1. Calculate Pearson’s Correlation Coefficient: Determine the strength and direction of the linear relationship
between hours studied and test scores.
o Find the equation of the best-fit line in the form Y = a + bX, where Y is the test
score and X is the number of hours studied.
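One possible way to check a hand calculation for both tasks is sketched below using NumPy. It assumes the first column of the data is hours studied and the second is the test score, as the problem statement describes.

```python
import numpy as np

# Hours studied (X) and test scores (Y) for the 8 students
hours = np.array([2, 3, 5, 4, 6, 7, 8, 9], dtype=float)
scores = np.array([55, 60, 70, 65, 75, 80, 85, 90], dtype=float)

# Pearson's r is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(hours, scores)[0, 1]

# A degree-1 least-squares fit returns the slope first, then the intercept
b, a = np.polyfit(hours, scores, deg=1)

print(f"Pearson r = {r:.4f}")
print(f"Best-fit line: Y = {a:.2f} + {b:.2f} X")
```

For this particular data set every point lies on a single straight line, so the fitted line passes through all of the observations.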
Problem 3:
A data analyst is working with a dataset that contains information about the number of hours studied and the
corresponding exam scores of students. The dataset includes the following columns:
Hours Studied, Exam Score, Passed (1 = pass, 0 = fail)
3 60 0
5 70 1
4 65 1
6 75 1
7 80 1
8 85 1
9 90 1
Tasks:
1. Linear Regression: Fit a linear regression model to predict Exam Score based on Hours Studied. Provide the
regression equation and interpret the coefficients.
2. Logistic Regression: Fit a logistic regression model to predict the probability of passing the exam based on
Hours Studied. Provide the logistic regression equation and interpret the coefficients.
3. Ridge Regression: Apply ridge regression to the same linear regression problem. Explain how ridge
regression modifies the standard linear regression model.
4. Lasso Regression: Apply lasso regression to the same linear regression problem. Discuss how
lasso regression impacts feature selection compared to ridge regression.
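The four tasks above could be approached along the following lines with scikit-learn. This is a sketch under assumptions: it uses only the seven rows visible in the problem, it treats the third column as a pass/fail flag, and the penalty strength `alpha=1.0` is an arbitrary illustrative choice rather than a value given in the assignment.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge

# Rows as they appear in the problem: hours studied, exam score, pass flag (assumed meaning)
hours = np.array([3, 5, 4, 6, 7, 8, 9], dtype=float).reshape(-1, 1)
score = np.array([60, 70, 65, 75, 80, 85, 90], dtype=float)
passed = np.array([0, 1, 1, 1, 1, 1, 1])

# Ordinary least squares: score = intercept + slope * hours
linear = LinearRegression().fit(hours, score)

# Ridge adds an L2 penalty on the slope; lasso adds an L1 penalty
ridge = Ridge(alpha=1.0).fit(hours, score)
lasso = Lasso(alpha=1.0).fit(hours, score)

# Logistic regression models log-odds(pass) = intercept + slope * hours
# (scikit-learn applies L2 regularisation by default)
logistic = LogisticRegression().fit(hours, passed)

for name, model in [("Linear", linear), ("Ridge", ridge), ("Lasso", lasso)]:
    print(f"{name:6s}: score = {model.intercept_:.3f} + {model.coef_[0]:.3f} * hours")
print(f"Logistic: log-odds = {logistic.intercept_[0]:.3f} + {logistic.coef_[0][0]:.3f} * hours")
```

With a single feature, both penalties simply shrink the slope toward zero. The feature-selection behaviour of lasso (driving some coefficients exactly to zero) only becomes visible when there are several predictors.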
Problem 4 :
**One-Sample t-Test:**
- A company claims that their light bulbs last an average of 1000 hours. A sample of 25 light bulbs has a mean
lifespan of 980 hours with a standard deviation of 50 hours. Test the claim at a 0.05 significance level. What
are the null and alternative hypotheses, and what is the conclusion?
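A minimal sketch of the calculation, assuming a two-sided test (H0: mean = 1000, H1: mean ≠ 1000) and using SciPy only for the t distribution. The variable names are illustrative.

```python
import math
from scipy import stats

mu0, xbar, s, n, alpha = 1000, 980, 50, 25, 0.05

# Test statistic: distance of the sample mean from the claim, in standard errors
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

# Two-sided p-value and critical value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)
t_crit = stats.t.ppf(1 - alpha / 2, df)

print(f"t = {t_stat:.3f}, df = {df}, critical value = ±{t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```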
Problem 5:
**Two-Sample t-Test:**
- Two different teaching methods are tested for their effectiveness on student performance. Group A (n=30) has
a mean test score of 75 with a standard deviation of 8, and Group B (n=30) has a mean test score of 70 with a
standard deviation of 10. Test if there is a significant difference between the two groups' test scores at a 0.01
significance level. What are the null and alternative hypotheses, and what is the conclusion?
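Because only summary statistics are given, one option is SciPy's `ttest_ind_from_stats`. The sketch below assumes a two-sided test and does not assume equal variances (Welch's version); with equal group sizes the test statistic is the same either way.

```python
from scipy import stats

alpha = 0.01

# Summary statistics for the two teaching methods
result = stats.ttest_ind_from_stats(
    mean1=75, std1=8, nobs1=30,   # Group A
    mean2=70, std2=10, nobs2=30,  # Group B
    equal_var=False,              # Welch's t-test (assumption, see lead-in)
)

t_stat, p_value = result.statistic, result.pvalue
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```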
Problem 6:
- A survey is conducted to examine the relationship between gender (male, female) and preference for a new
product (like, dislike). The following results are obtained:
Test whether gender and product preference are independent at a 0.05 significance level. What are the null and
alternative hypotheses, and what is the conclusion?
Problem 7:
- A researcher wants to compare the effectiveness of three different diets on weight loss. The weight loss (in
pounds) after 6 weeks for each diet group is as follows:
- Diet A: [3, 4, 2, 5, 4]
- Diet B: [6, 7, 5, 6, 7]
Perform an ANOVA test at a 0.05 significance level to determine if there are significant differences in weight
loss between the three diet groups. What are the null and alternative hypotheses, and what is the conclusion?
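The sketch below shows how a one-way ANOVA could be run with SciPy. Note that only Diet A and Diet B appear in the text above; the third group is missing, so the code runs on the two listed groups and the missing group would need to be added to the `groups` dictionary (a name chosen here for illustration) before drawing the assignment's conclusion.

```python
from scipy import stats

alpha = 0.05

# Weight loss in pounds; the third diet group is not listed in the text above
groups = {
    "Diet A": [3, 4, 2, 5, 4],
    "Diet B": [6, 7, 5, 6, 7],
}

# f_oneway accepts any number of groups as separate arguments
result = stats.f_oneway(*groups.values())
f_stat, p_value = result.statistic, result.pvalue

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```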
Problem 8:
- A health study measures blood pressure before and after a treatment on 12 subjects. The blood pressure
readings before treatment are:
[120, 115, 130, 125, 140, 135, 125, 130, 120, 110, 140, 125]
And the readings after treatment are:
[115, 110, 125, 120, 130, 125, 120, 125, 115, 105, 130, 120]
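The task for this problem is not stated in the text, but before/after readings on the same 12 subjects are usually analysed with a paired t-test. Under that assumption, and assuming the second list holds the after-treatment readings, a sketch with SciPy could look like this:

```python
from scipy import stats

before = [120, 115, 130, 125, 140, 135, 125, 130, 120, 110, 140, 125]
after = [115, 110, 125, 120, 130, 125, 120, 125, 115, 105, 130, 120]

# Paired (dependent-samples) t-test on the per-subject differences
result = stats.ttest_rel(before, after)
t_stat, p_value = result.statistic, result.pvalue

mean_drop = sum(b - a for b, a in zip(before, after)) / len(before)
print(f"Mean reduction = {mean_drop:.2f}, t = {t_stat:.3f}, p = {p_value:.6f}")
```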
Problem 9:
- In a survey, 60 out of 200 respondents reported they prefer online shopping over in-store shopping. The
company claims that 30% of the population prefers online shopping. Test the company's claim at a 0.05
significance level. What are the null and alternative hypotheses, and what is the conclusion?
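A one-proportion z-test can be sketched with the standard library alone. The code assumes a two-sided alternative (H0: p = 0.30, H1: p ≠ 0.30) and uses the hypothesised proportion in the standard error, which is the usual convention for this test.

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 60, 200, 0.30, 0.05

p_hat = x / n                          # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
z = (p_hat - p0) / se                  # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here the sample proportion equals the claimed proportion, so the data give no evidence against the claim.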
Problem 10:
- In a simple linear regression analysis, the estimated regression equation is Y = 2 + 3X. The standard error of
the slope coefficient (3) is 0.5. Test the significance of the slope at a 0.05 significance level. What are the null
and alternative hypotheses, and what is the conclusion?
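The test statistic is the slope divided by its standard error. The problem does not give the sample size, so the degrees of freedom (n - 2) are unknown. The sketch below therefore compares against the standard normal critical value as a large-sample approximation, which is an assumption; for a small sample the t critical value with n - 2 degrees of freedom should be used instead.

```python
from statistics import NormalDist

slope, se_slope, alpha = 3.0, 0.5, 0.05

# H0: slope = 0, H1: slope != 0
t_stat = slope / se_slope

# Large-sample (normal) critical value; replace with a t critical value if n is known
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(t_stat) > z_crit

print(f"t = {t_stat:.2f}, approximate critical value = ±{z_crit:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```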
Problem 11:
Normal Distribution
1. Question:
o The heights of adult women in a certain city are normally distributed with a mean of 65 inches and
a standard deviation of 3 inches. What is the probability that a randomly selected woman from this
city is taller than 68 inches?
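A short sketch of the calculation using `statistics.NormalDist` from the standard library. The upper-tail probability is one minus the cumulative probability at 68 inches.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3)

# P(X > 68) = 1 - P(X <= 68); 68 is one standard deviation above the mean
p_taller = 1 - heights.cdf(68)
print(f"P(height > 68) = {p_taller:.4f}")
```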
Problem 12:
Poisson Distribution
A call center receives an average of 4 calls per hour. What is the probability that exactly 3 calls are received in a given
hour?
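The Poisson probability mass function can be evaluated directly. This sketch uses only the `math` module, with `lam` as the average rate and `k` as the number of calls.

```python
import math

lam, k = 4, 3

# P(X = k) = e^(-lam) * lam^k / k!
p_three_calls = math.exp(-lam) * lam**k / math.factorial(k)
print(f"P(X = 3) = {p_three_calls:.4f}")
```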
Problem 13:
Exponential Distribution
The lifespan of a certain type of battery is exponentially distributed with a mean lifespan of 500 hours. What is the
probability that a battery lasts more than 600 hours?
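For an exponential distribution with mean m, the survival probability is P(X > t) = e^(-t/m). A minimal sketch:

```python
import math

mean_life, t = 500, 600

# Survival function of the exponential distribution
p_survive = math.exp(-t / mean_life)
print(f"P(lifespan > 600) = {p_survive:.4f}")
```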
Problem 14:
Bernoulli Distribution
A factory produces light bulbs, and each light bulb has a 90% chance of passing the quality control test. What is the
probability that a single randomly selected light bulb passes the test?
Problem 15:
Binomial Distribution
In a factory, 5% of items are defective. If a quality inspector randomly selects 10 items, what is the probability that
exactly 2 of them are defective?
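The binomial probability can be computed from its formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k). A sketch using `math.comb`:

```python
import math

n, k, p = 10, 2, 0.05

# Number of ways to choose the defective items times the probability of each arrangement
p_two_defective = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"P(X = 2) = {p_two_defective:.4f}")
```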
Problem 16:
Uniform Distribution
An employee’s work shift starts at a random time uniformly distributed between 9 AM and 5 PM. What is the
probability that the shift starts after 2 PM?
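For a continuous uniform distribution, the probability of an interval is its length divided by the total length. The sketch below expresses the times on a 24-hour clock, which is a representation chosen here for convenience.

```python
start, end = 9, 17   # 9 AM to 5 PM on a 24-hour clock
threshold = 14       # 2 PM

# P(X > threshold) = (end - threshold) / (end - start)
p_after_2pm = (end - threshold) / (end - start)
print(f"P(start after 2 PM) = {p_after_2pm}")
```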
2. **Using Libraries:**
Import the `numpy` library in a Jupyter Notebook and create a NumPy array with the values `[1, 2, 3, 4, 5]`. How do
you find the standard deviation of this array using `numpy`?
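A possible notebook cell for this question is sketched below. Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation, so it is worth stating which one is intended.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_sd = np.std(arr)        # divides by n (NumPy's default)
sample_sd = np.std(arr, ddof=1)    # divides by n - 1

print(f"Population SD: {population_sd:.4f}")
print(f"Sample SD:     {sample_sd:.4f}")
```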
Plot the points `(1, 2)`, `(2, 4)`, and `(3, 6)`.
[4, 5, 6]})
R Programming
How do you calculate the mean of a vector `c(10, 20, 30, 40, 50)` in R? Write the code to perform this calculation.
2. **Using Libraries:**
Install and load the `ggplot2` library in R. Write the code to create a simple scatter plot of `x = c(1, 2, 3)` and `y = c(2,
4, 6)` using `ggplot2`.
Create a bar plot in R for the vector `c(5, 10, 15)` with the labels `c("A", "B", "C")`.
1. **Bar Plot:**
Create a bar plot in Jupyter Notebook using Matplotlib for the following
Label the x-axis as "Categories" and the y-axis as "Values". What is the code to generate this bar plot?
2. **Histogram:**
Use 5 bins and label the x-axis as "Value" and the y-axis as "Frequency". What is the code?
3. **Pie Chart:**
Create a pie chart in Jupyter Notebook using Matplotlib for the following data:
Add a title "Distribution of Categories". What is the code to generate this pie chart?
For Private Circulation Only
4. **Box Plot:**
Using the following data, create a box plot in Jupyter Notebook with Matplotlib:
Label the y-axis as "Values". What is the code to generate this box plot?
1. Question:
o You are working on a new software project and need to gather requirements from various
stakeholders. What is one common method you could use to collect detailed business
requirements, and how does it help ensure that all stakeholder needs are considered?
2. Question:
o During a project kickoff meeting, you are using interviews to gather business requirements. What
are two key questions you might ask a stakeholder to understand their needs and expectations
from the project?
3. Question:
o You decide to use surveys to collect business requirements from a large group of stakeholders.
What is one advantage of using surveys over interviews for gathering requirements, and what is one
potential limitation?
1. Question:
o After gathering business requirements, you need to map these requirements to your delivery team's
capabilities. What is one approach you can use to assess whether your team has the necessary skills
to meet the requirements?
2. Question:
o You have identified a requirement that requires advanced data analytics capabilities, but your
delivery team is currently lacking expertise in this area. What is one approach you could take
to address this skills gap before proceeding with the project?
o When mapping requirements to team capabilities, you find that some requirements might be
beyond your team's current technical skills. What is one strategy you could use to handle
these requirements while keeping the project on track?
2. A dataset is stored in an Excel file with multiple sheets, and another dataset is available in a cloud-based SQL
database. Describe how you would import these datasets into a Pandas DataFrame in Python and an R data
frame. Include any relevant libraries or functions.
3. You have data stored in a public CSV file available online and another dataset in a private SQL database.
Demonstrate the steps to import these datasets into a data frame in Python using Pandas and R using
readr. What code or functions would you use?
4. You are preparing data for analysis and need to organize and map metadata to understand the context
and structure of your data better. Explain the process of mapping metadata for a dataset that includes
columns like "Date", "Sales", and "Region". How would you document the metadata to support data
analysis?
5. You are performing data profiling on a dataset to assess its quality. The dataset contains columns such as
"Customer ID", "Purchase Amount", and "Transaction Date". Describe the steps you would take to
evaluate the quality of this data and identify any potential issues, such as missing values or inconsistencies.
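For the data-profiling question, the kind of checks involved can be sketched with Pandas as follows. The small DataFrame is invented purely for illustration (it is not assignment data), and the specific rules, such as treating negative purchase amounts as suspicious, are assumptions.

```python
import pandas as pd

# Hypothetical sample with deliberately planted problems
df = pd.DataFrame({
    "Customer ID": [101, 102, 102, 104, None],
    "Purchase Amount": [250.0, -40.0, 99.5, None, 310.0],
    "Transaction Date": ["2024-01-05", "2024-02-30", "2024-03-12",
                         "2024-04-01", "2024-05-20"],
})

# 1. Completeness: missing values per column
missing_per_column = df.isna().sum()

# 2. Uniqueness: repeated customer IDs (ignoring missing ones)
duplicate_ids = int(df["Customer ID"].dropna().duplicated().sum())

# 3. Validity: amounts that break a business rule (assumed: must be non-negative)
negative_amounts = int((df["Purchase Amount"] < 0).sum())

# 4. Consistency: dates that do not parse in the expected format become NaT
parsed_dates = pd.to_datetime(df["Transaction Date"], format="%Y-%m-%d", errors="coerce")
invalid_dates = int(parsed_dates.isna().sum())

print(missing_per_column)
print(f"Duplicate IDs: {duplicate_ids}, negative amounts: {negative_amounts}, "
      f"invalid dates: {invalid_dates}")
```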
You have a DataFrame df with columns Name, Age, and Salary. The Age column should be numeric, but some entries
are text, and the Salary column contains missing values. Write a Python script using Pandas to identify:
Given the DataFrame df with columns Age (which has some non-numeric values) and Salary (which has missing
values), apply the following cleaning steps:
o Write a Python script to perform these data cleaning tasks using Pandas. What code would you use?
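The exact lists of items to identify and cleaning steps to apply are not shown in the text above, so the sketch below makes assumptions: it flags non-numeric Age entries and missing Salary values, then converts Age to numeric and fills the gaps in both columns with the column median. The sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dia"],
    "Age": [25, "thirty", 41, "N/A"],
    "Salary": [50000.0, np.nan, 62000.0, np.nan],
})

# Identification: entries that cannot be converted become NaN under errors="coerce"
age_numeric = pd.to_numeric(df["Age"], errors="coerce")
bad_age_rows = df[age_numeric.isna() & df["Age"].notna()]
missing_salary_rows = df[df["Salary"].isna()]

# Cleaning (assumed strategy): numeric Age, median imputation for both columns
clean = df.copy()
clean["Age"] = age_numeric.fillna(age_numeric.median())
clean["Salary"] = clean["Salary"].fillna(clean["Salary"].median())

print(bad_age_rows)
print(missing_salary_rows)
print(clean)
```

Median imputation is only one choice; dropping the affected rows or using the mean may be more appropriate depending on what the cleaning steps in the assignment specify.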
You have a DataFrame df with columns Height (in cm) and Weight (in kg). To prepare the data for analysis, you
need to normalize these columns to a range between 0 and 1 using Min-Max scaling. After normalization, perform a
basic validation to ensure that the normalized values fall within the expected range.
o Write a Python script using Scikit-learn and Pandas to normalize the data and validate the
results. What code would you use?
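A minimal sketch of Min-Max scaling with scikit-learn's `MinMaxScaler`, followed by a range check. The sample values are hypothetical, and the validation allows a tiny tolerance because floating-point rounding can leave a scaled maximum a hair above 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Height": [150.0, 160.0, 172.5, 185.0],
    "Weight": [48.0, 61.0, 70.5, 92.0],
})

columns = ["Height", "Weight"]
scaler = MinMaxScaler()  # default feature_range is (0, 1)
scaled = pd.DataFrame(scaler.fit_transform(df[columns]), columns=columns, index=df.index)

# Validation: every value within [0, 1] (with tolerance), and each column spans the full range
tol = 1e-9
in_range = bool(((scaled >= -tol) & (scaled <= 1 + tol)).all().all())
spans_full_range = bool(np.allclose(scaled.min(), 0.0) and np.allclose(scaled.max(), 1.0))

print(scaled)
print(f"All values in [0, 1]: {in_range}; min is 0 and max is 1: {spans_full_range}")
```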
1. Question:
o You have a dataset df with multiple features. Apply Principal Component Analysis
(PCA) to reduce the dimensionality of the dataset to 2 principal components. Write a Python
script using Scikit-learn to perform PCA and display the variance explained by each principal
component. What code would you use?
2. Question:
o Using the same dataset df, apply Linear Discriminant Analysis (LDA) to reduce the dimensionality
of the dataset to 2 components. Assume that df has a target variable target for classification. Write
a Python script using Scikit-learn to perform LDA and project the data onto the 2 components.
What code would you use?
3. Question:
4. Question:
o After applying dimension reduction techniques to your dataset, you want to evaluate the
correlations between different data points. Create a scatter plot of the first two principal
components obtained from PCA. Write a Python script using Matplotlib to create the scatter plot
and label the axes as "Principal Component 1" and "Principal Component 2". What code would
you use?
5. Question:
o Using the reduced dataset from PCA or LDA, perform clustering using K-means and visualize the
clusters in a scatter plot. Write a Python script using Scikit-learn and Matplotlib to apply K-means
clustering with 3 clusters and plot the results. What code would you use to create this scatter plot
with cluster centroids and labels?
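The questions in this set can be worked through in one short script. The sketch below is written under assumptions: since `df` is not provided, scikit-learn's bundled iris data stands in for it (four features plus a `target` column), the features are standardised before reduction, and K-means is run on the PCA projection. LDA can produce at most (number of classes - 1) components, so two components requires a target with at least three classes, which iris has.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Stand-in for the assignment's df (assumption): features plus a "target" column
df = load_iris(as_frame=True).frame
X = df.drop(columns="target")
y = df["target"]

X_scaled = StandardScaler().fit_transform(X)

# PCA: unsupervised projection onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# LDA: supervised projection that maximises class separation
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_scaled, y)

# K-means with 3 clusters on the PCA-reduced data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_pca)
centroids = kmeans.cluster_centers_

# Scatter plot of the two principal components, coloured by cluster
fig, ax = plt.subplots()
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap="viridis", s=20)
ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=150, label="Centroids")
ax.set_xlabel("Principal Component 1")
ax.set_ylabel("Principal Component 2")
ax.set_title("K-means clusters on PCA-reduced data")
ax.legend()
```

In a Jupyter Notebook the figure renders at the end of the cell; in a plain script, add `plt.show()` as the last line.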
1. Question:
o Create a visualization to represent the results of a data analysis using Python. The dataset includes
columns for Category, Sales, and Profit. Use Matplotlib or Seaborn to create a bar chart that shows
the total sales and profit for each category. What code would you use to generate this
visualization?
2. Question:
o You need to present a sales performance dashboard using Tableau. Describe the steps you
would take to:
o What are the key actions and settings you would use to build this dashboard? Performing Version
3. Question:
o You are managing multiple versions of a report and need to use version control. Explain how
you would use Git to:
4. Question:
o You want to maintain your data analysis reports in a knowledge base for easy access and
collaboration. Describe the process of setting up a knowledge base using a platform like Confluence
or SharePoint. Include steps for:
Ensuring that team members can access and contribute to the reports.
o What key features would you use to manage and share the reports effectively?
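For the first question in this set (the bar chart of total sales and profit per category), one possible approach with Pandas and Matplotlib is sketched below. The sample rows are hypothetical; only the column names Category, Sales, and Profit come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "C"],
    "Sales": [100, 150, 200, 80, 50, 120],
    "Profit": [20, 30, 50, 10, 5, 25],
})

# Total sales and profit per category
totals = df.groupby("Category")[["Sales", "Profit"]].sum()

# Grouped bar chart: one pair of bars (Sales, Profit) per category
ax = totals.plot(kind="bar", rot=0)
ax.set_xlabel("Category")
ax.set_ylabel("Amount")
ax.set_title("Total Sales and Profit by Category")
plt.tight_layout()
```

In a Jupyter Notebook the chart renders at the end of the cell; in a plain script, add `plt.show()` as the last line.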
o You need to present a project update to your team using oral, written, and nonverbal
communication skills. Describe how you would:
o How would you ensure that your communication is clear and effectively conveys your thoughts and
ideas?
o As a team leader, you are responsible for demonstrating professional behavior and providing
effective mentorship to your team. Explain how you would:
Perform rule-based analysis to extract meaningful insights from the data. Format the data into the required
types/forms for analysis. Identify any anomalies in the data, such as missing or inconsistent entries. Evaluate
the information and knowledge management systems used to store and retrieve the data. Apply information
confidentiality guidelines to ensure the data is handled securely.
• Use the CRM database to record new information and extract existing customer information for your analysis.
• What steps and tools would you use to complete this task efficiently and securely?
o Describe two methods you would use to gather requirements from the client. Explain why you chose
these methods and how they would help ensure a thorough understanding of the client's needs.
o Outline two approaches you would take to manage client expectations, including how you would
prioritize their requirements and set performance expectations. Provide examples of how these
approaches would be implemented in the project.
o Demonstrate how you would maintain effective communication and build a good working
relationship with the client throughout the project. Provide two specific strategies or techniques you
would use and explain why they are important.
You are leading a team tasked with implementing a new project management software in your organization. Some
team members are resistant to change and skeptical about the new software's benefits.
Present an argument in favor of the new software and provide at least two pieces of evidence to support your
argument. Explain why this evidence is compelling and how it addresses the team's concerns.
Describe how you would frame the goal of implementing the new software in a way that finds common ground with
your skeptical team members. What shared objectives or benefits would you highlight to gain their support?
Demonstrate how you would use both visual and verbal communication techniques to influence your team's
perspectives and encourage them to embrace the new software. Provide specific examples of the techniques you
would use and explain their effectiveness.
1. Segregation of Waste:
o Describe the steps you would take to practice the segregation of recyclable, non-recyclable, and
hazardous waste in your office. Provide specific examples of how you would ensure compliance
among employees.
o Demonstrate two different methods you would implement to optimize and conserve energy
resources in the workplace. Explain how these methods contribute to overall energy efficiency and
sustainability.
3. Inclusive Communication:
o Demonstrate how you would communicate essential information in a manner that is inclusive of all
genders and sensitive to persons with disabilities (PwD). Provide specific examples of communication
techniques or tools you would use to ensure everyone is informed and included.
************************