Project - Stat - Fall 2023
Important dates:
⮚ Team registration and research topic proposal: EOD December 31, 2023, via a link to
be released on Blackboard.
⮚ Deadline for report submissions: 8:15 AM, January 12, 2024. Files are submitted
on Blackboard.
Instructions: You are encouraged to work in a group of 2-4 students so that you can propose
ideas and motivate each other. A group may have at most four students. You can work
alone if you wish. Submit the project report as a file (or a link to a Google Colab
notebook) on Blackboard. If you have any questions or need advice, feel free to ask the
instructor or the TA.
The project topic is free. Each team can propose a project topic that it would like to study!
Here are some example suggestions and inspirations:
1. Go to https://www.kaggle.com/
Search the site, study previous projects for inspiration, and start your own project.
For example, study linear regression with the Boston housing data set and build your own
project on linear regression:
https://www.kaggle.com/code/henriqueyamahata/boston-housing-with-linear-regression/data
https://www.kaggle.com/code/henriqueyamahata/boston-housing-with-linear-regression/notebook
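As a starting point for a regression project like the one in item 1, here is a minimal sketch of simple linear regression by least squares, using only the Python standard library. The function names and the toy numbers (rooms vs. price) are hypothetical and made up for illustration; they are not taken from the Boston housing data set.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (made up): average number of rooms vs. median price.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 17.5, 22.0, 28.5, 33.0]

intercept, slope = fit_simple_linear_regression(rooms, price)
print("intercept:", round(intercept, 3))
print("slope:", round(slope, 3))
print("R^2:", round(r_squared(rooms, price, intercept, slope), 4))
```

In a real project you would replace the toy lists with columns from your chosen data set and would probably use a statistics library to obtain standard errors and p-values for the coefficients as well.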
2. Present the descriptive statistics and design appropriate hypothesis tests to determine
whether average temperatures have increased over the years at, for example, the 10% level
of significance and the 2% level of significance, respectively. You can vary the level of
significance and observe how the results change. If possible, explore or infer some
predictions for the future.
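One possible design for the temperature test described above is to compare the mean temperature of an early period with that of a late period, using a one-sided large-sample z-test. The sketch below is only an illustration under stated assumptions: the two samples are treated as independent (a simplification for time-series data), the sample sizes are large enough for the normal approximation, and the temperatures are synthetic values generated in the code, not real measurements. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def one_sided_z_test(early, late):
    """Test H0: mean(late) <= mean(early) against H1: mean(late) > mean(early).

    Large-sample z-test for two independent samples; returns (z, p_value).
    """
    se = sqrt(variance(early) / len(early) + variance(late) / len(late))
    z = (mean(late) - mean(early)) / se
    p_value = 1.0 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value


def reject_null(p_value, alpha):
    """Reject H0 when the p-value is below the significance level alpha."""
    return p_value < alpha


# Synthetic yearly average temperatures in degrees Celsius (assumption, not real data).
random.seed(0)
early_years = [random.gauss(24.0, 0.5) for _ in range(40)]
late_years = [random.gauss(24.3, 0.5) for _ in range(40)]

z, p = one_sided_z_test(early_years, late_years)
print("z =", round(z, 3), "p-value =", round(p, 4))
for alpha in (0.10, 0.02):
    print("alpha =", alpha, "reject H0:", reject_null(p, alpha))
```

Because the 2% level is stricter than the 10% level, the same p-value can lead to rejecting the null hypothesis at 10% but not at 2%. A test on the slope of a regression of temperature on year is another reasonable design.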
3. Analyse poverty and equity in Vietnam, and/or analyse population (urban, rural,
largest cities) and make population predictions for Vietnam. Other possible topics: income
vs. education, GDP, finance, loans, travel services, Internet users, labour force, employment,
education (pupil-teacher ratio, school enrolment), electricity consumption, electricity
production, tourism, air transport, exports, life expectancy, CO2 emissions, renewable
energy, etc.
Data can be taken from the World Bank:
https://data.worldbank.org/country/vietnam?view=chart
4. Study a theoretical model and apply it to a specific application:
For example, one can study a regression model such as multiple linear regression or logistic
regression (strongly recommended! This is an important model.) and apply it, for example,
to personal loan acceptance or to predicting the bankruptcy of a company. The probability p
of bankruptcy is between 0 and 1 and can be predicted by the logistic regression model.
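To illustrate how logistic regression produces a probability p between 0 and 1, here is a minimal sketch with a single predictor, fitted by plain gradient ascent on the mean log-likelihood. Everything specific in it is an assumption made for illustration: the predictor (a debt ratio), the made-up 0/1 bankruptcy labels, the function names, and the learning rate and iteration count. A real project would normally rely on a statistics library instead.

```python
from math import exp


def sigmoid(t):
    """Logistic function; maps any real number into the interval (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, learning_rate=0.5, n_iter=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1


def predict_probability(x_new, b0, b1):
    """Predicted probability that the outcome is 1 for predictor value x_new."""
    return sigmoid(b0 + b1 * x_new)


# Hypothetical data (made up): debt ratio of a company and whether it went bankrupt (1) or not (0).
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
bankrupt = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(debt_ratio, bankrupt)
for ratio in (0.1, 0.5, 0.9):
    print("debt ratio", ratio, "-> p =", round(predict_probability(ratio, b0, b1), 3))
```

Whatever the fitted coefficients are, the sigmoid keeps every prediction strictly between 0 and 1, which is why the output can be read as a probability.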
The project report should include: Title; Abstract; Introduction (motivation, the importance
of the study, and the questions or problems the project studies); Methodology, covering
descriptive statistics (data collection, data summary) and inferential statistics (data analysis
by confidence intervals, hypothesis tests, regression, ANOVA, etc.); Results and discussion;
References.
The important aspects of any statistical data analysis are stating questions, collecting data,
visualizing data with descriptive statistics tools, and analyzing data with inferential statistics
methods to reach conclusions or predictions. The techniques can be some of the following:
data summaries, confidence intervals, hypothesis testing, regression models, ANOVA, or a
technique that we will not cover in this class, such as data classification or data clustering.
Note that if you use models or techniques that we did not cover in class, you should explain
them. All explanations and reasoning must be in your team's own words. Turnitin will be
used on submissions. Any serious similarity will be considered plagiarism.
The project proposal, report, and presentation are worth 75 points in total (5 + 50 + 20 points).
1. Proposal (5 points): Form a team and propose the research topic, including a title (2
points). Write a short plan (~1 page) that introduces the topic, summarizes the proposed
work, and outlines a plan (3 points). Due: EOD Dec 31, 2023.
Structure (4 points)
● Is there a clear introduction? Is the purpose/motivation/problem stated clearly?
● Is there a finish? (Or is it just a sudden stop?)
Persuasiveness (6 points)
● Is it realistic? Is it convincing?
Presentation values (10 points)
● Were the transitions and flow easy to follow?
● Were the slides error-free and logically presented?
BONUS: Creativity
Additional bonus for the project: A bonus of 5 points will be given if you submit the
certificate for “Data Science for Business” on DataCamp.
---The end ---