
Just Give Me The Codes Lecture 2: Data Importation: Goals: Import Data Into Jupyterlab View The Dataset

The document provides instructions for importing data into JupyterLab and viewing a dataset. It outlines 11 steps to import the data, view the first and last entries, check for missing values and data types, re-import the data with missing values addressed, take a random sample of the data with a set seed for repeatability, and suggests further reading on random seeds and where to find additional datasets.

Uploaded by

DragosCavescu

Just Give me the Codes

Lecture 2: Data Importation

Goals:
• Import data into JupyterLab
• View the dataset
Step 1: Extract data from the internet

• Open the .ipynb file in the ‘ipynb for Entry Level Python’ folder
• Go to Step 1 and run the script (press Shift and Enter together)

• Data imported! Well done!
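For readers without the notebook to hand, the import in Step 1 might look roughly like the sketch below. It assumes the notebook uses pandas and `read_csv`; the column names and values are hypothetical stand-ins, and an in-memory text block replaces the lecture's online file so the example runs offline.

```python
import io

import pandas as pd

# Hypothetical stand-in for the online file used in the notebook.
# With a real dataset you would pass its URL or path to pd.read_csv.
csv_text = """id,age,height,weight,score
1,34,170,65.2,7.1
2,29,182,80.0,6.4
3,41,165,.,.
"""

# read_csv accepts a URL, a local path, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # number of rows and columns that were imported
```

To import from the internet instead, replace `io.StringIO(csv_text)` with the dataset's URL as a string.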


Steps 2-3: View first 5 and last 5 entries

• Follow Steps 2 & 3 to view the first 5 and last 5 entries of the dataset
• The Step 3 output shows a period (full stop) instead of a numerical value for variables 4 & 5
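A minimal sketch of Steps 2 and 3, assuming the notebook relies on the pandas `head` and `tail` methods. The data are made up for illustration; the periods in the fourth and fifth columns mimic the placeholders described above.

```python
import io

import pandas as pd

# Hypothetical data: the last rows use "." where a number is missing.
csv_text = """id,age,height,weight,score
1,34,170,65.2,7.1
2,29,182,80.0,6.4
3,41,165,58.9,8.0
4,25,175,72.3,5.5
5,38,168,61.0,6.9
6,52,180,85.4,7.7
7,47,172,.,.
8,31,160,54.8,.
"""
df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()  # first 5 rows by default
last_five = df.tail()   # last 5 rows by default

print(first_five)
print(last_five)
```

Both methods take an optional row count, so `df.head(10)` would show the first 10 entries.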
Steps 4-5: Look for missing values & determine data type

• Follow Steps 4 & 5 to return the sum of NaNs in each column and to determine the data type of each variable
• Note: the functions will be explained in subsequent lectures; for now we are only correcting the data types so that the descriptive analysis in the next lecture is accurate
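Steps 4 and 5 can be sketched as follows, assuming the notebook uses `isna().sum()` and `dtypes` (the exact calls in the notebook may differ). The data are hypothetical. In this toy example the period placeholder is read as ordinary text, so pandas reports no NaNs for those columns and does not treat them as numeric.

```python
import io

import pandas as pd

# Hypothetical data with "." standing in for missing numbers.
csv_text = """id,age,height,weight,score
1,34,170,65.2,7.1
2,29,182,80.0,6.4
3,41,165,.,.
4,25,175,72.3,.
"""
df = pd.read_csv(io.StringIO(csv_text))

nan_counts = df.isna().sum()  # number of NaNs in each column
column_types = df.dtypes      # data type pandas inferred for each column

print(nan_counts)
print(column_types)
```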
Steps 6-7: Re-import & re-examine dataset

• Follow Steps 6 & 7 to create a missing values list and to re-examine the last 5 entries
• Take note: one needs to be creative when creating a missing values list, because a placeholder that the person entering the data finds comprehensible as a missing value may not be obvious to the data analyst (e.g., consider adding a comma to the list).
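One way to build such a list is the `na_values` argument of `pd.read_csv`, sketched below. This assumes the notebook re-imports with pandas; the entries in `missing_values` and the data are hypothetical, chosen only to illustrate the idea.

```python
import io

import pandas as pd

# Hypothetical data with "." standing in for missing numbers.
csv_text = """id,age,height,weight,score
1,34,170,65.2,7.1
2,29,182,80.0,6.4
3,41,165,58.9,8.0
4,25,175,72.3,5.5
5,38,168,61.0,6.9
6,52,180,85.4,7.7
7,47,172,.,.
8,31,160,54.8,.
"""

# Illustrative list of placeholders to treat as missing; extend as needed.
missing_values = [".", ",", "--", "?"]

# Re-import, telling pandas which strings mean "missing".
df = pd.read_csv(io.StringIO(csv_text), na_values=missing_values)

print(df.tail())  # the periods now appear as NaN
```

Strings in `na_values` are added to the placeholders pandas already recognises by default, such as empty fields and `NA`.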
Steps 8-9: Check data types and missing values

• Follow Steps 8 & 9 to re-examine the data types and the sum of NaNs in each column, respectively
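To see what the re-import changed, the two checks can be placed side by side. This is a sketch on hypothetical data, not the notebook's own code; the `report` table is an illustrative helper.

```python
import io

import pandas as pd

# Hypothetical data with "." standing in for missing numbers.
csv_text = """id,age,height,weight,score
1,34,170,65.2,7.1
2,29,182,80.0,6.4
3,41,165,.,.
4,25,175,72.3,.
"""

raw = pd.read_csv(io.StringIO(csv_text))                     # first import
clean = pd.read_csv(io.StringIO(csv_text), na_values=["."])  # re-import

# Compare data types and NaN counts before and after the re-import.
report = pd.DataFrame({
    "dtype_before": raw.dtypes.astype(str),
    "dtype_after": clean.dtypes.astype(str),
    "nans_before": raw.isna().sum(),
    "nans_after": clean.isna().sum(),
})

print(report)
```

After the re-import, the two affected columns are numeric and their missing entries are counted as NaNs.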
Step 10: Random selection of dataset

• Follow Step 10
• Note: your random sample will differ on each run unless you set a seed
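A random selection like the one in Step 10 could be drawn with the pandas `sample` method, as sketched below. The data frame is a hypothetical stand-in for the lecture dataset, and the notebook may draw its sample differently.

```python
import pandas as pd

# Hypothetical data frame with 100 rows.
df = pd.DataFrame({
    "id": range(1, 101),
    "value": [i * 0.5 for i in range(1, 101)],
})

# Without a seed, each call draws a fresh random sample of n rows.
sample_a = df.sample(n=5)
sample_b = df.sample(n=5)

print(sample_a)
print(sample_b)
```

Running the cell again will usually print different rows each time.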
Step 11: Set a seed for repeatable results

• By setting a seed (36), you ensure the results can be repeated later on (i.e., a seed of 36 will always give you the same results unless you change the value of n).
• Change the value of n to suit your needs (e.g., n=20)
• A seed of 21 will give you different results from a seed of 36
• Play around with the seed values! ☺
• Refer to the end-of-lecture links for more information on random seeds
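The effect of a seed can be sketched with the `random_state` argument of `sample`. This assumes the notebook seeds the draw that way; it may instead set a global seed. The data frame is hypothetical.

```python
import pandas as pd

# Hypothetical data frame with 100 rows.
df = pd.DataFrame({
    "id": range(1, 101),
    "value": [i * 0.5 for i in range(1, 101)],
})

# The same seed and the same n reproduce the same sample.
first = df.sample(n=5, random_state=36)
second = df.sample(n=5, random_state=36)
print(first.equals(second))

# A different seed starts a different random sequence.
other_seed = df.sample(n=5, random_state=21)
print(other_seed)

# Changing n changes the sample even when the seed stays at 36.
larger = df.sample(n=20, random_state=36)
print(len(larger))
```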
End of Lecture 2

• Well done! You have officially gained skills in data importation!

• Where to go from here? Lecture 3, of course! But some things to consider:
• Read up on random.seed
• Commence Assignment 1
• A great place to start:
• Information on random.seed
• https://pynative.com/python-random-seed/

• Repositories where you can find data in CSV format
• https://archive.ics.uci.edu/ml/index.php
• https://databank.worldbank.org/home.aspx
