
Lecture 2 - The Dataset Presentation

The document outlines a lecture on applied data science projects. It introduces potential data sources, both company datasets and open datasets, that could be used. It also discusses forming groups, delivering a report and presentation, and the grading criteria for the project, which focus on defining the problem, analyzing data, developing recommendations, and communicating findings effectively.

Uploaded by Giorgio Aduso
Copyright © All Rights Reserved

TDT4259 – Applied Data Science

Lecture 2: Bringing value from datasets


Nisha Dalal
Adj. Associate Professor

[email protected]

Agenda
• Introduction

• 10:20 – 10:40 Dataset presentation from Aneo

• 10:40 – 11:00 Dataset presentation from Equinor

• 11:00 – 11:15 Break

• 11:15 – 11:30 Dataset presentation from IDI, NTNU

• 11:30 – 11:45 Open-source dataset presentations

• 11:45 – 12:00 Report Template and Grading criteria



Project (Group)
You will work in groups of 5-6 students.

Fill in the form to register your group for the assignment. There is a separate link for individuals to sign up for the group
assignment. Both links are also available in Blackboard (Course work -> Group project).

IMPORTANT! Only one entry is needed per team.


IMPORTANT! The registration deadline is 20th September!

The final deliverables are a report and a presentation. Examples from the previous year are available in
Blackboard.

Project (Group)

Data sources

1) Company datasets

2) Online open data sources

3) Your own dataset/company data



Aneo Dataset presentation



Equinor Dataset presentation



NTNU (IDI) Dataset presentation



Open-source datasets

AirBnB

Per-city Airbnb data-sets, compiled by the independent Inside Airbnb project, are openly available here:
http://insideairbnb.com/get-the-data.html

Inside AirBnB: Dallas



NYC Open Data


The city of New York has released data-sets
on different topics that are openly available
here: https://opendata.cityofnewyork.us

Electronic Products and Pricing Data


https://data.world/datafiniti/electronic-products-and-pricing-data

Fashion products on Amazon


https://data.world/promptcloud/fashion-products-on-amazon-com

Yelp Open Dataset


https://www.yelp.com/dataset
https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset

E-Commerce Orders
https://www.kaggle.com/jainaashish/orders-merged

Family food dataset


https://www.gov.uk/government/statistical-data-sets/family-food-datasets

Violence Against Women & Girls


https://data.world/makeovermonday/2020w10

Goodreads
https://www.kaggle.com/datasets/jealousleopard/goodreadsbooks?datasetId=231310

Maternal health data


https://data.unicef.org/resources/dataset/maternal-health-data/

Google Public Data Explorer


https://www.google.com/publicdata/directory

London’s Open Data Portal for Transport


https://tfl.gov.uk/info-for/open-data-users/our-open-data

Kaggle
https://www.kaggle.com/datasets
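Many of the portals above offer data-sets as CSV downloads. As a minimal sketch of the first steps the grading criteria ask for (describing how data were pre-processed and cleansed, and presenting descriptive statistics), the snippet below uses only the Python standard library. The column names `neighbourhood` and `price`, the `"$1,050.00"` price format and the sample rows are hypothetical stand-ins, not the schema of any dataset listed here; check the documentation of the data-set you choose.

```python
import csv
import io
import statistics

# Hypothetical sample rows standing in for a downloaded CSV file.
# The column names "neighbourhood" and "price" are assumptions, not a documented schema.
SAMPLE_CSV = """id,neighbourhood,price
1,Downtown,"$120.00"
2,Downtown,"$1,050.00"
3,Uptown,
4,Uptown,"$80.00"
5,Uptown,"$95.00"
"""


def parse_price(raw):
    """Convert a price string such as "$1,050.00" to a float, or None if it is empty."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def load_prices(text):
    """Read CSV text and group cleaned prices by neighbourhood, dropping missing values."""
    prices = {}
    for row in csv.DictReader(io.StringIO(text)):
        price = parse_price(row["price"])
        if price is None:
            continue  # cleansing step: skip rows without a usable price
        prices.setdefault(row["neighbourhood"], []).append(price)
    return prices


def describe(values):
    """Return simple descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for area, values in load_prices(SAMPLE_CSV).items():
        print(area, describe(values))
```

To try it on a real download, read the file with `open(path, newline="", encoding="utf-8")` and pass its contents to `load_prices`, adjusting the column names to the actual data-set. For larger data-sets, a library such as pandas is usually more convenient; the standard-library version is shown here only to keep the example self-contained.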

Grading Criteria for Group Assignment

Introduction and problem definition (10%)
• Describes the context clearly and completely
• Has a good understanding of the domain and how the intended problem is suited to the context in which the organization operates
• Clearly articulates the problem(s) that need to be answered with the help of data analytics
• Describes the team, roles and responsibilities

Background (15%)
• Defines clear objectives and describes how they can be resolved with the use of data sources
• Uses relevant literature to support arguments on selected approaches of data science project management
• Describes how you will design your data-strategy and how it complements other methods (design thinking etc.) to achieve your set objectives
• Presents examples, if relevant, that have had an impact through data-driven decision-making

Method (15%)
• Clearly describes the data-set, with its attributes and features (descriptive)
• Introduces the methods and tools used to analyze the data and explains why they are appropriate
• Presents demographics or descriptive statistics of the data-set, if relevant
• Describes how the data were pre-processed and cleansed

Analysis (20%)
• Presents results in a clear and concise manner and explains what they indicate
• Uses appropriate visualization methods to present results, based on best practices
• Organizes the analysis in an order that makes logical sense and presents different types of analyses to examine the challenge from different angles
• Explains buzzwords and technical details so that non-experts can understand them

Interpretation and recommendations (20%)
• Develops an implementation plan based on derived insight, with specific actions and a time-plan
• Recommends actions towards different stakeholders or actors that are relevant to the context (What does the analysis tell us about the challenge, and what should we do?)
• Outlines limitations of the method and/or data-set and ways in which they can be improved in future analysis
• Suggests ways in which future analysis can complement existing results

Appearance of report (5%)
• Includes a cover page with case, authors etc., as well as a table of contents, list of tables/figures etc.
• Is complete in contributor information (name, student ID, email) for the examination
• Has references that are formatted appropriately and completely, and has the appendixes needed to understand the analyses
• Has page numbers and is formatted in a uniform way

Presentation (15%)
• Has a clear introduction, problem statement and objective
• Makes a convincing argument based on the data and the analysis, and uses appropriate visualization and editing to make for an engaging presentation
• Provides clear and concise recommendations
• Is within the time limit of 5 minutes
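The seven criterion weights add up to 100%, and Analysis together with Interpretation and recommendations carry 40% of the grade. The slides do not say how criterion scores are combined into a final grade. The sketch below assumes a plain weighted average of per-criterion scores on a hypothetical 0-100 scale, which mainly helps to see where marks are concentrated when planning the report.

```python
# Criterion weights (in percent) as listed in the grading criteria.
WEIGHTS = {
    "Introduction and problem definition": 10,
    "Background": 15,
    "Method": 15,
    "Analysis": 20,
    "Interpretation and recommendations": 20,
    "Appearance of report": 5,
    "Presentation": 15,
}

# Sanity check: the weights should cover the whole grade.
assert sum(WEIGHTS.values()) == 100


def weighted_score(scores):
    """Combine per-criterion scores into one weighted total.

    Assumption (not stated in the course material): each score is on a
    hypothetical 0-100 scale and the criteria are combined as a weighted average.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("a score is needed for every criterion")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {name: 80 for name in WEIGHTS}  # hypothetical scores
    print(weighted_score(example))  # prints 80.0
```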

Project (Group)
Suggested template

1. Introduction and problem definition – Describe the context and the problem you wish to address (max. 3 pages).
2. Background – Present the specific objectives you want to achieve and describe how you approach the problem, how you will
design your data-strategy and what goals it is intended to achieve, etc. (approx. 3-6 pages).
3. Method – Describe in detail the data-set you have selected and the methods you are applying to analyze it (2-4 pages).
4. Analysis – Describe the data analysis you conducted and present the results. It is important that the results are described in
detail and visualized appropriately (3-10 pages).
5. Interpretation and recommendations – Describe an implementation plan based on the insights you extracted. You can set
specific actions that need to be implemented, a time-plan for deployment, and ideas for future data collection and for improving
the analysis and results (3-5 pages).

Sample reports and presentations have been added in Blackboard!



FAQs – Project (Group)


What is the ideal group size? 5-6 people.

Does it matter which company/dataset we use? No, but you should ask interesting and actionable questions.

Do we need permission from, or contact with, someone in the company? No, use publicly available datasets.

Do we need to be physically present for the report submission and presentation? No, but plan it with your team.

When will the groups be assigned? One week after the registration deadline (20th September).

How long should the video presentation be? 5 minutes. Samples are available in Blackboard.


Other important things and Q&A

Reference groups and feedback

We are looking for 5-8 students to comprise the reference group. The purpose of the reference group is to provide
constructive feedback about the course through an ongoing open dialogue with other students throughout the semester.
You can read more about the tasks of the reference group at this link.

If you want to sign up to be a member of the reference group, use this link.

During the last week, a survey will be sent out to all students to evaluate the course.

Next to come
The next lecture (6th September), with Kshitij Sharma, will teach you some machine learning basics.

In the lecture after that (13th September), I will be giving an overview of some low or no code data science tools.

The choice of tool/technique is open, and you can select any software/method/tool you think is best suited.

Slack group: If you have not been added yet, please contact Manos ([email protected]).



Questions & Discussion

Nisha Dalal
[email protected]
