
Assignment JTW115E 2023-2024 v5

Uploaded by

Sha Finna

Group Assignment: Data Analysis Project Using Kaggle Dataset

(version 4 - 11 May 2024)

Objective:

This assignment is designed to simulate a real-world data analysis project, from beginning to
end. It will help students develop a solid foundation in data handling and analysis techniques,
critical thinking about data, and effective communication of data-driven insights.

Group Formation:

Team Members: Each group should consist of 6 (min) to 10 (max) students.


Team Dynamics: Assign specific roles within the team, such as project manager, data
analyst, data cleaner, and presenter. This ensures a diverse skill set and equitable workload
distribution.

Dataset Selection:

Source: Kaggle.com.
Criteria for Dataset Selection: Choose a dataset with a variety of data types (numerical,
categorical) for a comprehensive analysis experience. The dataset must have at least 100
observations (rows) and be relevant to a real-world problem.

Report Structure [1]:

This structure aims to provide a clear and comprehensive framework for reporting on data
analysis projects, facilitating a logical flow of information from the introduction of the
dataset to the presentation of findings and conclusions.

1. Cover Page
1.1. Project Title: Clearly state the focus of your analysis.
1.2. Group Members: List the names, student numbers, and roles within the group.
2. Table of Contents
2.1. List all section titles along with their corresponding page numbers for easy
navigation through the document.
3. Introduction
3.1. Dataset Overview: Provide a brief overview of the chosen dataset, including its
source, size, and the types of variables it contains.
3.2. Dataset Significance: Explain the relevance of the dataset to societal, economic, or
environmental issues, highlighting the importance of your analysis.
3.3. Purpose of the Analysis: Define the objectives of your project. What questions are
you aiming to answer or what hypotheses are you testing?
4. Methodology
4.1. Software and Tools: Primary analysis should be conducted in Microsoft Excel. The
use of Python, R or other software is encouraged for advanced analysis. Document
all processes clearly.
4.2. Data Acquisition: Describe the process of obtaining the dataset from Kaggle,
including any challenges encountered.

[1] You may include additional relevant sections as needed.

4.3. Data Cleaning and Management: Detail the steps taken to prepare the data for
analysis. This includes handling missing values, detecting and removing duplicates,
and any transformations applied to the data.
4.4. Data Analysis Techniques: Explain the analytical methods and tools used, including
any statistical or machine learning techniques. Justify the choice of these methods in
the context of your research questions or hypotheses.
5. Results
5.1. Descriptive Statistics: Present basic statistics that summarize the dataset’s features.
5.2. Visualizations: Include charts, graphs, or other visual aids that highlight key findings
from your analysis.
5.3. Modeling Results: Detail the outcomes of any statistical tests or predictive models
used, including metrics of model performance.
5.4. Critical Evaluation of Results: Critically evaluate how effectively your analysis
methods answered the research questions, including limitations or biases.
6. Discussion
6.1. Findings Interpretation: Discuss the results in the context of the research questions or
hypotheses. What does the data reveal?
6.2. Significance and Implications: How do your findings contribute to existing
knowledge or practice in the relevant field?
6.3. Real-world Applications: Discuss how your findings could be applied in real-world
scenarios, suggesting practical applications or further research areas.
6.4. Ethical Considerations: Address any ethical considerations related to your findings or
the analysis process.
7. Conclusion
7.1. Summary of Key Findings: Concisely recap the main insights derived from your
analysis.
7.2. Limitations: Acknowledge any limitations of your study, including dataset constraints
or methodological challenges.
7.3. Future Research: Suggest directions for future studies or further analysis that could
build on your work.
7.4. Personal Reflection: Reflect on your personal contribution and the skills or
knowledge gained through the project.
8. References
8.1. Cite all sources used in your research and analysis, including data sources, software
tools, and any academic literature or online resources.
9. Appendices (if applicable)
9.1. Supplementary Material: Include any additional material that supports your analysis, such
as detailed tables, extended data visualizations, or code snippets.
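
The cleaning steps named in item 4.3 (handling missing values, removing duplicates, applying transformations) could look roughly like the Python sketch below for groups that go beyond Excel. It is only an illustration under stated assumptions: the sample data, the column names (`id`, `city`, `price`) and the helper `clean_rows` are hypothetical, and median imputation is one possible choice, not a required method.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file downloaded from Kaggle.
RAW_CSV = """id,city,price
1,Penang,250
2,Kuala Lumpur,
2,Kuala Lumpur,
3,penang ,310
4,Johor,275
"""


def clean_rows(rows, numeric_field):
    """Tidy text, drop exact duplicate rows, and fill missing numbers with the median."""
    seen = set()
    cleaned = []
    for row in rows:
        # Transformation: trim whitespace and normalise capitalisation,
        # so that "penang " and "Penang" are treated as the same value.
        tidy = {key: (value or "").strip().title() for key, value in row.items()}
        fingerprint = tuple(tidy.items())
        if fingerprint in seen:
            continue  # exact duplicate of an earlier row
        seen.add(fingerprint)
        cleaned.append(tidy)

    # Missing values: impute with the median of the observed values
    # (assumes at least one observed value exists in the column).
    observed = sorted(float(r[numeric_field]) for r in cleaned if r[numeric_field])
    middle = len(observed) // 2
    if len(observed) % 2:
        median = observed[middle]
    else:
        median = (observed[middle - 1] + observed[middle]) / 2
    for r in cleaned:
        r[numeric_field] = float(r[numeric_field]) if r[numeric_field] else median
    return cleaned


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows, "price")
for row in cleaned:
    print(row)
```

Whichever approach a group takes, the report should state how many rows were removed or imputed and why, so that the cleaning can be reproduced.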

Software and Tools:

1. Utilize Microsoft Excel for data analysis. However, you are allowed to adopt other
software or statistical techniques (for example Microsoft PowerBI, R, Python, Google
Sheets, or Google Looker Studio).
2. Documentation of Analysis Process: Thoroughly document the analysis process,
including screenshots, code snippets, and reasoning, to ensure clarity and reproducibility.

Submission Requirements:

For a comprehensive evaluation of your work, each group is required to submit:

1. All code files, if any (e.g., Python scripts, R Markdown)
2. A well-organized, clear, and professionally formatted PDF report of the project.
3. Dataset in Microsoft Excel format.

Evaluation Criteria:

1. Innovation and Creativity: Projects will be evaluated for innovation in analysis approach,
dataset choice, and presentation of findings.
2. Clarity and completeness of the data report.
3. Effectiveness of data cleaning and management.
4. Depth and relevance of exploratory data analysis.
5. Appropriateness and implementation of statistical techniques.
6. Quality of data presentation and visualizations.
7. Reflection on the process.

Deadline:

12 May 2024, 11:59 PM (extended from 5 May 2024). Submissions are accepted through the
course portal only. Hard-copy (courier) or soft-copy (email) submissions will not be accepted!

Important Notes:

Ensure all data used is cited appropriately.


Originality is crucial; plagiarism will result in severe penalties.
Regularly consult with your instructor for guidance and clarification. Please submit your
questions through:

https://bit.ly/AskJTW115

and refer to the answer at:

https://bit.ly/FAQJTW115

Suggested Steps/Tasks [2]:

1. Introduction to Kaggle and Dataset Selection


a. Register on Kaggle.com if you haven't already.
b. Explore datasets available on Kaggle, focusing on a topic that interests you or
aligns with your future career goals.
c. Download your selected dataset and read the dataset documentation thoroughly to
understand its context, variables, and any potential issues or limitations.
2. Data Acquisition and Management
a. Load the dataset into a suitable environment (e.g., Microsoft Excel, Google
Sheets, Python, R).
b. Perform initial data exploration to understand the structure, format, and quality of
your data.

[2] Please use these suggestions as a guide; each dataset is different, so there is no
one-size-fits-all solution.

c. Clean the data by handling missing values, removing duplicates, and correcting
errors as necessary.
3. Exploratory Data Analysis (EDA)
a. Use descriptive statistics to summarize the data.
b. Visualize the data using various charts and graphs to uncover patterns, trends, and
anomalies.
c. Document your initial findings and hypotheses about the data.
4. In-depth Analysis
a. Based on your EDA, formulate specific
i. research questions, AND/OR
ii. research hypotheses.
b. Apply statistical tests or machine learning models as appropriate to test your
hypotheses.
c. Interpret the results, considering the context and relevance to real-world scenarios.
5. Data Presentation and Reporting
a. Summarize your findings in a clear, concise report.
b. Include an introduction to your dataset, methodology, results, and conclusions.
c. Create visualizations to effectively communicate your insights.
d. Reflect on the limitations of your analysis and suggest areas for future research.
6. Reflection
a. Reflect on what you learned during the project and how you can apply these skills
in future data analysis tasks.
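
As an optional complement to Excel, step 3a (summarising the data with descriptive statistics) can be sketched in Python using only the standard library. The values, the variable names `prices` and `cities`, and the helpers `describe` and `frequency_table` are hypothetical and stand in for columns of your own cleaned dataset.

```python
from collections import Counter
from statistics import mean, median, multimode, stdev

# Hypothetical values; in practice these come from columns of the cleaned dataset.
prices = [250.0, 275.0, 310.0, 275.0, 290.0, 400.0]
cities = ["Penang", "Johor", "Penang", "Kuala Lumpur", "Penang", "Johor"]


def describe(values):
    """Return basic descriptive statistics for one numerical variable."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # list, because a variable can have several modes
        "stdev": stdev(values),     # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def frequency_table(labels):
    """Return (category, count, share) tuples for one categorical variable."""
    counts = Counter(labels)
    total = len(labels)
    return [(label, n, n / total) for label, n in counts.most_common()]


summary = describe(prices)
freq = frequency_table(cities)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
for label, n, share in freq:
    print(f"{label:<13} {n:>3} {share:6.1%}")
```

A mean that sits noticeably above the median, as it does in this made-up sample, usually points to a right-skewed variable and is worth commenting on in the report.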

In general, you are required to transform raw data into insights as depicted in Figure 1.

Figure 1: An overview of the data analysis process in Microsoft Excel [3]

Expected Analysis [4]:

1. Queries using logical functions.


2. VLOOKUP/MATCH/INDEX/IF functions
3. Design a PivotTable and use it to show/extract TWO different pieces of information, then
visualize this information using a PivotChart.
4. Create at least FOUR different charts and at least ONE combination chart.

[3] Boateng, B. O. (2023). Data Modeling with Microsoft Excel: Model and analyze data using
Power Pivot, DAX, and Cube functions. Packt Publishing.
[4] Non-exhaustive list. You may perform additional analysis as needed.

5. Create ONE sparkline.
6. Comment on the frequency distribution and histogram of a categorical variable.
7. Comment on the frequency distribution and histogram of a numerical variable.
8. Comment on the mean, median and mode for at least one of the variables.
9. Using the Excel Descriptive Statistics tool, compare the numerical statistical measures for
at least two of the variables.
10. Investigate the relationship between two variables and interpret the coefficient.
11. Build a regression model and use this model for a prediction.
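
Items 10 and 11 can also be cross-checked outside Excel. The sketch below computes a Pearson correlation coefficient and fits a simple least-squares line by hand, then uses the line for one prediction. The paired values and the names `area`, `price`, `pearson_r` and `fit_line` are hypothetical, and the example assumes a simple linear relationship between two numerical variables.

```python
from math import sqrt

# Hypothetical paired observations, e.g. floor area (x) and price (y).
area = [50.0, 60.0, 70.0, 80.0, 90.0]
price = [150.0, 180.0, 200.0, 240.0, 260.0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for the model y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(area, price)
slope, intercept = fit_line(area, price)
predicted = intercept + slope * 100.0  # prediction for a new x value of 100

print(f"r = {r:.3f}")
print(f"price = {intercept:.2f} + {slope:.2f} * area")
print(f"predicted price at area 100: {predicted:.2f}")
```

When interpreting the coefficient, remember that a value of r close to 1 or -1 indicates a strong linear association, not causation, and that predictions far outside the observed range of x are less reliable.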

Additional Notes & Guides:

Ensure all data used is appropriately cited. Originality is crucial; plagiarism will result in
severe penalties. For guidance and clarification, regularly consult the provided links.

1. https://youtu.be/U2ACDG5qVWc?si=XCML4Gog6aZerFeE
2. https://youtu.be/CCCjYVJuwU4?si=Jtok6xaPl1yyeaC3
3. https://youtu.be/YvVEvmzUw2Q?si=SfBfcikcX6QpRDBE
4. https://www.datacamp.com/tracks/data-storytelling

All the Best!
