Student Assessment Submission and Declaration
When submitting evidence for assessment, each student must sign a declaration confirming that
the work is their own.
In case of resubmission
Student Declaration
I certify that the assignment submission is entirely my own work and I fully understand the
consequences of plagiarism. I understand that making a false declaration is a form of
malpractice.
2.
In sum, the correlation analysis provides insights about the strength, direction, and significance of the
3)
Types of models:
Descriptive Analytics:
Pros:
Cons:
Predictive Analytics:
Pros:
Cons:
Prescriptive Analytics:
Pros:
Cons:
Milestone 2
1)
2. Car Ownership:
- Income levels: Analyzing the scatter plot can reveal if there are
income patterns among car owners and non-car owners. You can
observe whether car owners generally have higher or lower
incomes compared to non-car owners.
4. Decision-making:
- Risk assessment: The scatter plot can aid in assessing the risk
associated with lending to individuals based on their credit scores,
incomes, and car ownership. It can help identify high-income
individuals with low credit scores who own cars or individuals with
high credit scores and incomes who do not own cars.
1. Occupation Distribution:
- Risk assessment: The pie chart can assist in assessing the risk
associated with entities based on their finance status. It helps
identify the proportion of entities in different financial conditions,
which can guide decision-making processes related to lending,
investments, or partnerships.
3)
Data Discretization:
Data Reduction:
4)
1. Handling of Mixed Data Types: Certain columns such as 'Monthly
Income', 'Number of Children', and 'Years of Employment', which were
expected to contain numeric values, actually contained mixed data
types (both strings and numbers). This required substantial data
cleaning, such as removing unnecessary characters and converting
data types.
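As a rough illustration of this kind of cleaning, the sketch below converts a mixed list of strings and numbers into floats using only the Python standard library. The helper name to_number and the sample values in raw_income are hypothetical and are not taken from the actual dataset; the real cleaning steps may have differed.

```python
import re

def to_number(value):
    """Convert a messy cell (e.g. '$4,500', ' 5100 ', 3200) to float.

    Returns None when no usable number can be recovered.
    Hypothetical helper for illustration only.
    """
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # Drop everything except digits, minus sign and decimal point
    # (currency symbols, thousands separators, spaces, unit labels).
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        # Nothing numeric left (e.g. 'N/A'), so treat it as missing.
        return None

# Assumed sample values standing in for a 'Monthly Income' column.
raw_income = ["$4,500", 3200, " 5100 ", "N/A", "2,750.50"]
clean_income = [to_number(v) for v in raw_income]
print(clean_income)
```

Values that cannot be converted come back as None, so they can be counted and handled as missing data in a later step instead of silently becoming zero.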
5)
Data preparation is a crucial step in the data analysis process, for
the following main reasons:
Model Implementation
1)
2)
1. Confusion Matrix: This visualization is beneficial in a binary
classification problem such as ours ("will buy a car" or "will not buy
a car"). A confusion matrix provides a clear picture of how well the
model is performing by showing true positives, true negatives, false
positives, and false negatives. It provides insight into the instances
the model got correct, as well as the ones it got wrong.
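To make the four cells concrete, here is a minimal sketch that counts them by hand for a binary problem. The labels in y_true and y_pred are made-up example values (1 = "will buy a car", 0 = "will not buy a car"), not results from the model discussed here, and the function name confusion_matrix is only illustrative.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted "will buy", and did buy
            else:
                fp += 1  # predicted "will buy", but did not
        else:
            if actual == positive:
                fn += 1  # predicted "will not buy", but did buy
            else:
                tn += 1  # predicted "will not buy", and did not
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Made-up labels for illustration: 1 = will buy a car, 0 = will not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
print(cm, accuracy)
```

With these example labels the model is right in 6 of 8 cases, and the matrix shows that the two errors are one false positive and one false negative, which accuracy alone would not reveal.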
2)
This one is better than the second but not better than the original
result.
3)
1. Different algorithms: While Random Forests performed
reasonably well, it could be beneficial to explore different
algorithms. For example, Gradient Boosting algorithms, such as
XGBoost or LightGBM, are also powerful tools for classification
tasks. Even exploring deep learning models might be beneficial,
given sufficient data.
4)
1. Iterative development and validation: One important practice
in model development is to follow an iterative approach where
you continuously develop, test, validate, and refine your model.
This includes splitting the data into a training set, a validation
set, and a test set. The training set is used to build the model,
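The three-way split mentioned above could be sketched as follows, using only the standard library. The function name train_val_test_split, the 70/15/15 proportions, and the seed are assumptions chosen for illustration; the source does not state which proportions were used.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation and test lists.

    The fractions and seed are illustrative defaults, not values
    taken from the original analysis.
    """
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Stand-in data: 100 row identifiers.
rows = list(range(100))
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

Keeping the test set aside until the end means each round of refinement is judged on the validation set only, so the final test score is not influenced by the tuning decisions.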
5)
In reflecting on our work thus far, I'd like to highlight the
transformative impact that both descriptive and predictive analytics
have had on our operations.