Kabir ML Assignment

The document outlines an ML assignment consisting of theoretical questions and practical coding tasks. It covers topics such as neural network limitations, performance metrics for classifiers, frequent itemset mining using the Apriori algorithm, decision trees, K-Means clustering, and a GitHub API community clustering project. The assignment requires analysis, calculations, and the development of a Python script for data collection and clustering visualization.

Uploaded by

shahharshil686

ML Assignment

Theory and Numericals

1. Analyze the limitations of traditional neural networks.


2. A classifier achieves the following confusion matrix on a test dataset:

                  Predicted Positive   Predicted Negative
Actual Positive           40                   10
Actual Negative           20                   30

Calculate precision, recall, F1 score, and accuracy.

Derive the mathematical formula for the F1 score and explain its relationship with
precision and recall.
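As a way to check hand calculations, the four metrics can be computed directly from the confusion matrix. The sketch below assumes the usual reading of the table (rows are actual classes, columns are predicted classes), so TP = 40, FN = 10, FP = 20, TN = 30. The variable names are illustrative only. It also uses the fact that F1 is the harmonic mean of precision P and recall R, F1 = 2 / (1/P + 1/R) = 2PR / (P + R), which reduces to 2TP / (2TP + FP + FN) once P = TP / (TP + FP) and R = TP / (TP + FN) are substituted.

```python
# Confusion-matrix cells, read from the table above
# (assumption: rows = actual class, columns = predicted class).
tp, fn = 40, 10   # actual positives: predicted positive / predicted negative
fp, tn = 20, 30   # actual negatives: predicted positive / predicted negative

precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found
accuracy = (tp + tn) / (tp + fn + fp + tn)  # share of all predictions that are correct

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form written only in terms of counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two arguments, F1 is high only when both precision and recall are high.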

3. A dataset contains the following transactions:


o T1: {A, B, C}
o T2: {A, C}
o T3: {A, B}
o T4: {A, B, C, D}
o T5: {B, C, D}

Find all frequent itemsets using the Apriori algorithm with a minimum support of 0.6.
Extend the above example to generate association rules with a minimum confidence of 0.8.
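The level-wise search can be sketched in a few lines of Python, which is useful for checking a hand-worked answer. This is a minimal illustration, not a tuned implementation: the function names (support, apriori, association_rules) and the small tolerance EPS used for floating-point comparisons are assumptions made for this example.

```python
from itertools import combinations

# Transactions T1..T5 from the question.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "B"},
    {"A", "B", "C", "D"},
    {"B", "C", "D"},
]
EPS = 1e-9  # tolerance so that a support of exactly 0.6 passes a 0.6 threshold


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built level by level."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support - EPS:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = s / frequent[a]  # supp(A u B) / supp(A)
                if confidence >= min_confidence - EPS:
                    rules.append((a, itemset - a, confidence))
    return rules


frequent = apriori(transactions, 0.6)
rules = association_rules(frequent, 0.8)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("rules:", rules)
```

On this data the frequent itemsets are {A}, {B}, {C} with support 0.8 and {A, B}, {A, C}, {B, C} with support 0.6. Every candidate rule has confidence 0.6 / 0.8 = 0.75, so no rule reaches the 0.8 threshold.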
4. Decision Trees
5. Open-ended questions

• Why is it not always ideal to achieve zero training error in a machine learning model?
Explain with examples.
• If adding more data does not improve the performance of a machine learning model, what
could be the reasons? Propose solutions.
• Can a classifier with 100% accuracy always be considered the best? Discuss scenarios
where this may not hold true.
• How does feature scaling impact the performance of algorithms like K-Nearest
Neighbors and Support Vector Machines? Provide insights.
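For the feature-scaling question, a small numeric experiment can make the effect concrete. The sketch below uses made-up (age, income) records, and the helper names are hypothetical. It shows that with raw values the Euclidean distance used by K-Nearest Neighbors is dominated by the feature with the larger numeric range, and that min-max scaling changes which record is nearest. Distance-based or kernel-based SVMs (for example with an RBF kernel) are sensitive to feature ranges for the same reason.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale every column to [0, 1] using the column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_neighbor(query, candidates):
    """Name of the candidate closest to query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))


# Hypothetical (age in years, income in currency units) records.
query = (25, 50_000)
people = {"A": (60, 50_500), "B": (26, 52_000), "C": (40, 90_000)}

# Raw features: income differences swamp age differences.
raw_nearest = nearest_neighbor(query, people)

# Scaled features: both columns contribute on the same [0, 1] range.
scaled = min_max_scale([query, *people.values()])
scaled_query = scaled[0]
scaled_people = dict(zip(people, scaled[1:]))
scaled_nearest = nearest_neighbor(scaled_query, scaled_people)

print("nearest on raw features:   ", raw_nearest)
print("nearest on scaled features:", scaled_nearest)
```

With the raw values the nearest record is A, whose income is close to the query even though the age differs by 35 years. After scaling, the nearest record is B, which is close on both features.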

6. K-Means Clustering Process:

Using the given data [2, 4, 10, 12, 3, 20, 30, 11, 25], perform K-Means clustering with K = 2.

Tasks:

1. Perform two iterations of the K-Means algorithm and report the cluster
assignments after each iteration.
2. Calculate the final centroids of the clusters.
3. Explain why the clusters remain stable or change during each iteration.
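The assignment and update steps can be traced with a short script. The question does not specify the initial centroids, so the sketch below assumes 2 and 30 (the smallest and largest values) purely for illustration. A different starting choice, such as the first two data points, leads to different intermediate clusters. The function name kmeans_1d is also an assumption of this example.

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of K-Means iterations on 1-D data.

    Returns a list with one (clusters, centroids) pair per iteration.
    """
    history = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (ties go to the lower-indexed centroid).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
        history.append((clusters, centroids))
    return history


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
initial_centroids = [2, 30]  # assumed starting point, not given in the question

history = kmeans_1d(data, initial_centroids, iterations=2)
for step, (clusters, centroids) in enumerate(history, start=1):
    print(f"iteration {step}: clusters={clusters} centroids={centroids}")
```

With this starting point, iteration 1 yields the clusters [2, 4, 10, 12, 3, 11] and [20, 30, 25] with centroids 7.0 and 25.0. Iteration 2 reassigns no points, so the centroids do not move and the algorithm has converged.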
Coding Assignment

GitHub API Community Clustering

Write a Python script to:

1. Data Collection:
o Use the GitHub API to fetch user profile data for a list of users.
o Collect information about each user's repositories, programming languages, and
followers.
2. Data Processing:
o Create a dataset where each user is represented by the programming languages
they most frequently use.
o Encode the programming languages as features.
3. Clustering:
o Apply K-Means clustering to group users based on their programming language
preferences.
o Visualize the clusters using a 2D scatter plot (if dimensionality reduction is
needed, use PCA).
4. Community Insights:
o Identify the main programming languages in each cluster.
o Provide a brief analysis of how the clusters represent communities of users who
code in similar languages.
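One possible starting point for steps 1 and 2 is sketched below, using only the Python standard library. It relies on the public GitHub REST endpoints /users/{username} (which includes a followers count) and /users/{username}/repos (where each repository carries a primary language field that may be null). Everything else is an assumption made for this example: the function names, the User-Agent string, the optional token parameter, taking only the first 100 repositories, and encoding each user as the share of their repositories per language.

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"


def fetch_json(url, token=None):
    """GET a GitHub API URL and decode the JSON body."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "community-clustering-sketch",  # hypothetical client name
    }
    if token:  # optional personal access token to raise the rate limit
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def fetch_user(username, token=None):
    """Collect follower count and repositories (first page only) for one user."""
    profile = fetch_json(f"{API_ROOT}/users/{username}", token)
    repos = fetch_json(f"{API_ROOT}/users/{username}/repos?per_page=100", token)
    return {"login": username,
            "followers": profile.get("followers", 0),
            "repos": repos}


def language_counts(repos):
    """Count repositories per primary language, skipping repos without one."""
    return Counter(r["language"] for r in repos if r.get("language"))


def encode_users(users):
    """Return (vocabulary, {login: vector}) with per-language repository shares."""
    counts = {u["login"]: language_counts(u["repos"]) for u in users}
    vocabulary = sorted({lang for c in counts.values() for lang in c})
    vectors = {}
    for login, c in counts.items():
        total = sum(c.values())
        vectors[login] = [c[lang] / total if total else 0.0 for lang in vocabulary]
    return vocabulary, vectors


if __name__ == "__main__":
    usernames = ["octocat"]  # replace with the list of users to analyse
    users = [fetch_user(name) for name in usernames]
    vocabulary, vectors = encode_users(users)
    print(vocabulary)
    print(vectors)
```

The resulting vectors can be passed to any K-Means implementation for step 3. One common choice, assuming scikit-learn is installed, is its KMeans estimator followed by PCA to reduce the vectors to two dimensions for the scatter plot.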

Deliverables:

• Python script (.py file) with clear comments and modular code.
• A README file explaining how to run the script and interpret the results.
• A visualization of the clusters and summary insights.
