
Exercise: Explore Your Data

Uploaded by nhungnhung101200
Copyright © All Rights Reserved

{
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "kaggle": {
   "accelerator": "none",
   "dataSources": [
    {"sourceId": 38454, "sourceType": "datasetVersion", "datasetId": 2709},
    {"sourceId": 260251, "sourceType": "datasetVersion", "datasetId": 108980}
   ],
   "isInternetEnabled": false,
   "language": "python",
   "sourceType": "notebook",
   "isGpuEnabled": false
  }
 },
 "nbformat_minor": 4,
 "nbformat": 4,
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "**[Machine Learning Course Home Page](https://www.kaggle.com/learn/machine-learning)**\n",
    "\n",
    "---\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "This exercise will test your ability to read a data file and understand statistics about the data.\n",
    "\n",
    "In later exercises, you will apply techniques to filter the data, build a machine learning model, and iteratively improve your model.\n",
    "\n",
    "The course examples use data from Melbourne. To ensure you can apply these techniques on your own, you will have to apply them to a new dataset (with house prices from Iowa).\n",
    "\n",
    "The exercises use a \"notebook\" coding environment. If you are unfamiliar with notebooks, we have a [90-second intro video](https://www.youtube.com/watch?v=4C2qMnaIKL4).\n",
    "\n",
    "# Exercises\n",
    "\n",
    "Run the following cell to set up code-checking, which will verify your work as you go."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "source": [
    "# Set up code checking\n",
    "from learntools.core import binder\n",
    "binder.bind(globals())\n",
    "from learntools.machine_learning.ex2 import *\n",
    "print(\"Setup Complete\")"
   ],
   "metadata": {"collapsed": true, "jupyter": {"outputs_hidden": true}},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Step 1: Loading Data\n",
    "Read the Iowa data file into a Pandas DataFrame called `home_data`."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Path of the file to read\n",
    "iowa_file_path = '../input/home-data-for-ml-course/train.csv'\n",
    "\n",
    "# Fill in the line below to read the file into a variable home_data\n",
    "home_data = pd.read_csv(iowa_file_path)\n",
    "\n",
    "# Call line below with no argument to check that you've loaded the data correctly\n",
    "step_1.check()"
   ],
   "metadata": {"collapsed": true, "jupyter": {"outputs_hidden": true}},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "# Lines below will give you a hint or solution code\n",
    "#step_1.hint()\n",
    "#step_1.solution()"
   ],
   "metadata": {"collapsed": true, "jupyter": {"outputs_hidden": true}},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Step 2: Review The Data\n",
    "Use the command you learned to view summary statistics of the data. Then fill in variables to answer the following questions."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "source": [
    "# Print summary statistics in next line\n",
    "home_data.describe()"
   ],
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "from datetime import datetime\n",
    "\n",
    "# What is the average lot size (rounded to nearest integer)?\n",
    "avg_lot_size = home_data[\"LotArea\"].mean().round(0)\n",
    "\n",
    "# As of today, how old is the newest home (current year - the year in which it was built)?\n",
    "newest_home_age = datetime.now().year - home_data.YearBuilt.max()\n",
    "\n",
    "# Checks your answers\n",
    "step_2.check()"
   ],
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "#step_2.hint()\n",
    "#step_2.solution()"
   ],
   "metadata": {"collapsed": true, "jupyter": {"outputs_hidden": true}},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Think About Your Data\n",
    "\n",
    "The newest house in your data isn't that new. Two possible explanations for this:\n",
    "1. They haven't built new houses where this data was collected.\n",
    "1. The data was collected a long time ago. Houses built after the data publication wouldn't show up.\n",
    "\n",
    "If the reason is explanation #1 above, does that affect your trust in the model you build with this data? What about if it is reason #2?\n",
    "\n",
    "How could you dig into the data to see which explanation is more plausible?\n",
    "\n",
    "Check out this **[discussion thread](https://www.kaggle.com/learn-forum/60581)** to see what others think or to add your ideas.\n",
    "\n",
    "# Keep Going\n",
    "\n",
    "You are ready for **[Your First Machine Learning Model](https://www.kaggle.com/dansbecker/your-first-machine-learning-model).**\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "---\n",
    "**[Machine Learning Course Home Page](https://www.kaggle.com/learn/machine-learning)**\n",
    "\n"
   ],
   "metadata": {}
  }
 ]
}
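The Step 1 and Step 2 cells above depend on Kaggle's `learntools` checker and the `../input/...` path, so here is a minimal standalone sketch of the same two steps. It assumes a tiny made-up CSV (`SAMPLE_CSV`, three hypothetical rows using the `LotArea` and `YearBuilt` column names) in place of the real Iowa `train.csv`, and `summarize_homes` is a hypothetical helper, not part of the course. The current year is passed in as a parameter so the age calculation can be checked against a fixed year.

```python
import io
from datetime import datetime
from typing import Tuple

import pandas as pd

# Hypothetical stand-in for '../input/home-data-for-ml-course/train.csv':
# three made-up rows using two of the real column names (LotArea, YearBuilt).
SAMPLE_CSV = """Id,LotArea,YearBuilt
1,8450,2003
2,9600,1976
3,11250,2001
"""


def summarize_homes(home_data: pd.DataFrame, current_year: int) -> Tuple[int, int]:
    """Return (average lot size rounded to an int, age of the newest home)."""
    # describe() reports count, mean, std, min, quartiles and max per numeric column.
    stats = home_data.describe()
    avg_lot_size = int(round(stats.loc["mean", "LotArea"]))
    # The newest home is the one with the largest YearBuilt.
    newest_home_age = current_year - int(stats.loc["max", "YearBuilt"])
    return avg_lot_size, newest_home_age


# Step 1: read the CSV into a DataFrame (a file path works the same way).
home_data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 2: answer both questions, once with a pinned year and once "as of today".
print(summarize_homes(home_data, current_year=2024))
print(summarize_homes(home_data, current_year=datetime.now().year))
```

Reading both answers off the `describe()` table ties them to the summary printed in Step 2; `home_data["LotArea"].mean()` and `home_data["YearBuilt"].max()` return the same values directly. With the three sample rows, the call pinned to 2024 prints `(9767, 21)`.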

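For the question in "Think About Your Data", one way to weigh the two explanations is to compare the newest construction year with the latest sale year. This sketch assumes the data has a `YrSold` column (the Kaggle house-price `train.csv` includes one, but confirm it in your copy) and again uses made-up rows; `build_vs_sale_gap` is a hypothetical helper. If homes were still being built right up to the last recorded sale, the gap is small and explanation #2 (old data) is the better fit; a long gap would point toward explanation #1 (no recent construction).

```python
import io

import pandas as pd

# Made-up rows for illustration; assumes a YrSold column exists alongside YearBuilt.
SAMPLE_CSV = """Id,YearBuilt,YrSold
1,2003,2008
2,1976,2007
3,2008,2008
4,2009,2010
5,1995,2010
"""


def build_vs_sale_gap(home_data: pd.DataFrame) -> int:
    """Years between the most recent sale and the newest construction year."""
    return int(home_data["YrSold"].max() - home_data["YearBuilt"].max())


home_data = pd.read_csv(io.StringIO(SAMPLE_CSV))

gap = build_vs_sale_gap(home_data)
print("Latest sale year:", home_data["YrSold"].max())
print("Newest build year:", home_data["YearBuilt"].max())
print("Gap in years:", gap)

# Count homes per build year and keep the three most recent build years.
recent_builds = home_data["YearBuilt"].value_counts().sort_index().tail(3)
print(recent_builds)
```

What counts as a "long" gap is a judgment call; the per-year counts help show whether construction tapered off gradually or simply stops where the data ends.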