Kabir ML Assignment
                   Predicted Positive   Predicted Negative
Actual Positive            40                   10
Actual Negative            20                   30
Calculate precision, recall, F1 score, and
accuracy.
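The calculation can be sketched as follows. This is a minimal sketch, assuming the usual layout in which rows are actual classes and columns are predicted classes, so that TP = 40, FN = 10, FP = 20 and TN = 30. The variable names are illustrative only.

```python
# Confusion-matrix counts, assuming rows = actual and columns = predicted.
tp, fn = 40, 10  # actual positive: predicted positive / predicted negative
fp, tn = 20, 30  # actual negative: predicted positive / predicted negative

precision = tp / (tp + fp)                           # share of predicted positives that are correct
recall = tp / (tp + fn)                              # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + fn + fp + tn)           # share of all predictions that are correct

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```

Under that reading of the table, precision is 40/60, recall is 40/50, and accuracy is 70/100.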
Derive the mathematical formula for the F1 score and explain its relationship with
precision and recall.
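One common route to the formula is to treat F1 as the harmonic mean of precision and recall. The outline below is a sketch; the symbols P, R, TP, FP and FN are notation assumed here, not given in the assignment.

```latex
\begin{align*}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\
F_1 &= \frac{2}{\dfrac{1}{P} + \dfrac{1}{R}} = \frac{2PR}{P + R} \\
\frac{1}{P} + \frac{1}{R} &= \frac{TP + FP}{TP} + \frac{TP + FN}{TP} = \frac{2\,TP + FP + FN}{TP} \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```

Because the harmonic mean is dominated by the smaller of its two arguments, F1 is high only when precision and recall are both high, and it equals P and R when the two coincide.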
• Why is it not always ideal to achieve zero training error in a machine learning model?
Explain with examples.
• If adding more data does not improve the performance of a machine learning model, what
could be the reasons? Propose solutions.
• Can a classifier with 100% accuracy always be considered the best? Discuss scenarios
where this may not hold true.
• How does feature scaling impact the performance of algorithms like K-Nearest
Neighbors and Support Vector Machines? Provide insights.
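For the feature-scaling question, a small numeric illustration may help. The sketch below uses hypothetical (age, income) samples invented for this example and shows how the nearest neighbour of the first sample changes once each feature is standardized. It uses only the Python standard library and is not tied to any particular KNN or SVM implementation.

```python
import math
import statistics

# Hypothetical samples: (age in years, income in currency units).
samples = [(25, 50_000), (60, 50_500), (26, 52_000), (45, 80_000)]

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest_to_first(rows):
    """Index of the sample closest to rows[0], excluding rows[0] itself."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

raw_neighbor = nearest_to_first(samples)                  # income dominates the distance
scaled_neighbor = nearest_to_first(standardize(samples))  # both features contribute

print("nearest neighbour on raw features:   ", samples[raw_neighbor])
print("nearest neighbour on scaled features:", samples[scaled_neighbor])
```

On the raw data the 60-year-old is "closest" to the 25-year-old because the income gap of 500 swamps the age gap of 35. After standardization the 26-year-old becomes the nearest neighbour. Distance-based and margin-based methods are sensitive to feature units in this way.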
Using the given data: [2, 4, 10, 12, 3, 20, 30, 11, 25]
Tasks:
1. Perform two iterations of the K-Means algorithm and report the cluster
assignments after each iteration.
2. Calculate the final centroids of the clusters.
3. Explain why the clusters remain stable or change during each iteration.
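The tasks above can be checked with a short script. The assignment does not state the number of clusters or the starting centroids, so the sketch below assumes k = 2 with initial centroids 4 and 12. Both values are assumptions chosen for illustration, and a different initialization can give different assignments. The function name `kmeans_1d` is likewise hypothetical.

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of K-Means (Lloyd) iterations on 1-D data.

    Returns a list with one (assignments, centroids) tuple per iteration.
    """
    history = []
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        centroids = new_centroids
        history.append((assignments, list(centroids)))
    return history

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
history = kmeans_1d(data, centroids=[4.0, 12.0], iterations=2)  # assumed k and start

for step, (assignments, centroids) in enumerate(history, start=1):
    print(f"iteration {step}: assignments={assignments} centroids={centroids}")
```

With this assumed initialization, the point 10 moves from the second cluster to the first in iteration 2, because the centroid update after iteration 1 shifts the boundary between the two clusters. Assignments stop changing only when an update leaves every point nearest to the same centroid as before.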
Coding Assignment
1. Data Collection:
o Use the GitHub API to fetch user profile data for a list of users.
o Collect information about each user's repositories, programming languages, and
followers.
2. Data Processing:
o Create a dataset where each user is represented by the programming languages
they most frequently use.
o Encode the programming languages as features.
3. Clustering:
o Apply K-Means clustering to group users based on their programming language
preferences.
o Visualize the clusters using a 2D scatter plot (if dimensionality reduction is
needed, use PCA).
4. Community Insights:
o Identify the main programming languages in each cluster.
o Provide a brief analysis of how the clusters represent communities of users who
code in similar languages.
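The collection and processing steps above can be sketched as follows. This is one possible starting point, not a reference solution. It assumes the public GitHub REST endpoint `GET /users/{username}/repos`, reads only the `language` field of each repository record, and runs on hand-written sample records in place of live API responses. The function names and sample users are hypothetical. Unauthenticated requests are rate-limited, and pagination is not handled here.

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"

def fetch_repos(username):
    """Fetch the first page of a user's public repositories (unauthenticated)."""
    request = urllib.request.Request(
        f"{API_ROOT}/users/{username}/repos",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def language_counts(repos):
    """Count repositories per primary language, skipping repos with no language."""
    return Counter(r["language"] for r in repos if r.get("language"))

def encode_users(user_repos):
    """Turn {username: repos} into (languages, {username: fraction vector})."""
    counts = {user: language_counts(repos) for user, repos in user_repos.items()}
    languages = sorted({lang for c in counts.values() for lang in c})
    features = {}
    for user, c in counts.items():
        total = sum(c.values())
        # Fraction of the user's repositories written in each language.
        features[user] = [c[lang] / total if total else 0.0 for lang in languages]
    return languages, features

# Hypothetical records standing in for API responses; a real run would use
# {name: fetch_repos(name) for name in usernames} instead.
sample = {
    "alice": [{"language": "Python"}, {"language": "Python"},
              {"language": "Go"}, {"language": None}],
    "bob": [{"language": "Go"}, {"language": "Rust"}],
}
languages, features = encode_users(sample)

print(languages)
for user, vector in features.items():
    print(user, vector)
```

The resulting fraction vectors are one possible encoding. They could then be passed to a K-Means implementation and, if there are more than two languages, reduced to two dimensions with PCA for plotting.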
Deliverables:
• Python script (.py file) with clear comments and modular code.
• A README file explaining how to run the script and interpret the results.
• A visualization of the clusters and summary insights.