lecture-week1

The document outlines the SIT112 Data Science Concepts course, detailing its structure, topics, and assessment methods. It introduces the unit chair, Dr. Davoud Mougouei, and provides a comprehensive weekly schedule of lectures and workshops, along with guidelines for tasks and submissions. Additionally, it emphasizes the importance of academic integrity and offers resources for support in programming and mathematics.

Uploaded by

trminhselflearn

Introduction to Data Science
SIT112 | Data Science Concepts
Lecture Week 1
Contents
• What is Data Science?
• What You Need to Do Before Starting an Analysis
• The 5 Phases of Data Analysis and Visualization
• Getting Started with Python
• Using JupyterLab
The Unit Outline
Unit Chair
Dr. Davoud Mougouei

Ph.D. Software Engineering


M.Sc. Computer Science
B.Eng. Computer Engineering

Academic Roles:
• Senior Lecturer @ School of IT, Deakin University
• Lecturer @ School of Computing and IT, University of Wollongong
• Lecturer @ School of Mathematics, Physics, and Computing, UniSQ
• Postdoctoral Research Fellow @ Faculty of IT, Monash University
Unit Chair (Cont.)

Research: Affective Computing, Software Engineering, Data Science

• https://www.researchgate.net/profile/Davoud-Mougouei
• http://globalaffects.org/
• https://ieeexplore.ieee.org/document/9709267?source=authoralert
• https://dl.acm.org/doi/abs/10.1145/3236024.3264843
• https://onlinelibrary.wiley.com/doi/abs/10.1002/hbe2.304
Teaching Team

Name Role Email


Dr. Davoud Mougouei Unit Chair, Lecturer, Tutor [email protected]
Dr. Gulisong Nasierding Tutor [email protected]
Pratheek kumar polepalli Tutor [email protected]
Dr. Russul Alanni Tutor [email protected]
Saikumar Chimakurthi Tutor [email protected]
Shafiuddin Mohammad Tutor [email protected]
Dr. Mohamed Reda Bouadjenek Campus Coordinator, Tutor [email protected]
Mukesh Malani Tutor [email protected]
Overview
Week | Topic | Workshop | Task Start | Task Due (11 AM on Monday)
Week 1 (6 Mar) | Introduction to Data Science | Install Anaconda + JupyterLab + Intro to Python - Part 1 | - | -
Week 2 (13 Mar) | Essentials for Data Analysis | Task P1 + Intro to Python - Part 2 | P1 |
Week 3 (20 Mar) | Data Visualization | Recap + Tasks | P2, C1 | P1
Week 4 (27 Mar) | Getting Data | Recap + Tasks | P3 | P2
Week 5 (3 Apr) | Cleaning Data | Recap + Tasks | P4, C2 | P3, C1
Break (10 Apr) | | | | P4
Week 6 (17 Apr) | Preparing Data | Recap + Tasks | P5, D1 | C2
Week 7 (24 Apr) | Data Analysis – part 1 | Recap + Tasks (public holiday: Tuesday 25 April, Uni closed) | P6 | P5
Week 8 (1 May) | Data Analysis – part 2 | Recap + Tasks | P7 | P6
Week 9 (8 May) | Making Predictions | Recap + Tasks | HD1 | P7, D1
Week 10 (15 May) | Case Study | Recap + Tasks | - | -
Week 11 (22 May) | Revision | HD Interviews + Assistance with Learning Portfolios | - | HD1
Week 12 (29 May) | - | - | Submit portfolio (all except P8) by the Deadline (June 2) | -
Exam period | Submit P8: EoUA (via CloudDeakin)


Lectures

• Zoom Link for the lectures is provided on CloudDeakin.


• Lecture Notes will be made available on CloudDeakin before each lecture.
• The lectures will be recorded; can be accessed from CloudDeakin.
Workshops

• Zoom links for the (Online) workshops are provided on CloudDeakin.


• Workshops are the primary venue for you to get help on the tasks.
• The tutors will walk you through the tasks and help you complete them, depending
on the level of support those tasks need.
• Pass tasks: you will be guided to complete the tasks.
• Credit tasks: limited guidance will be provided to complete the tasks.
• Distinction and High Distinction tasks: are to be completed independently; the tutors can
still give you hints and tell you if you are on the right track.
Workshops: Structure

1. Introduce: The tutor will introduce the students to the workshop and recap on the contents of the previous week.
2. Work on: The students will work on the activities/tasks while the tutor oversees them and answers their questions.
3. Explain: The tutor will provide more guidance or walk the students through the solutions when appropriate.
4. Continue: The students will continue to work on the activities/tasks while the tutor oversees them and answers their questions.
Workshops: One-To-One Online Sessions
• Don’t be shy to ask for help!
• It is okay to not know, but it is not
okay to not ask ☺
• One-to-One Online Sessions
(breakouts) can be arranged
during the Online workshops; ask
your tutor!
• Use this responsibly as other
students might need help too ☺
Workshops: Dos and Don’ts!
• Please avoid emailing code or the screenshots of your code to the teaching team
outside workshop hours; instead, demonstrate your solutions (code/report) to
the tutors during the workshops.
• The tutors can help you fix your code in a one-to-one discussion. They can also
show you how to use external resources (e.g., ChatGPT) to fix your code while
improving your programming/problem solving skills.
• Please bring your own Windows/Mac/Linux laptop to the workshop (do not
use tablets or Chromebooks for completing the tasks).
Assessment: Tasks
Task Definition
Pass Tasks P1-P8 To achieve the minimum acceptable standard for this unit. To complete the Pass tasks, students
should be able to comprehend and execute Data Science solutions implemented in Python and write
code for basic problems with guidance. End of the Unit Assessment (20%) is one of the Pass tasks.

Credit Tasks C1-C2 Students will apply what they have learnt in the pass tasks with less guidance. To complete the
Credit tasks, the students should be able to comprehend and execute Data Science solutions
implemented in Python and write code for basic problems with limited guidance.

Distinction Task D1 Students will apply their advanced knowledge to design and build solutions to a real-world scenario.
To complete the Distinction tasks, students should be able to comprehend and execute Data Science
solutions implemented in Python and write code for moderately complex problems, independently.

High Distinction HD1 Students will extend their understanding to demonstrate greater technical ability in developing
more complex solutions to a real-world scenario. To complete the High Distinction tasks, students
should be able to comprehend and execute Data Science solutions implemented in Python and write
code for complex problems, independently.
Assessment: Set a Target Grade
Target Grade Minimum Requirements
Pass All the Pass tasks are completed.
Failure to complete any of the Pass tasks will result in a Fail grade.
Credit Minimum requirements for a Pass grade are met AND all Credit tasks are completed.
Distinction Minimum requirements for a Credit grade are met AND the Distinction task is completed.
In addition, the student must create a video recording, presenting their completed task; they may
be required to answer questions or make changes to their code.
High Distinction Minimum requirements for a Distinction grade are met AND all High Distinction tasks are
completed.
Interviews are required. The students might be asked questions about their submissions, and they
may be required to complete small tasks during the interviews.
Assessment: Tasks - OnTrack
Assessment: Complete and Submit the Tasks

• Knowledge and skills in this unit continuously build on those learnt the weeks before. Therefore, if you fall
behind it becomes difficult to understand the subsequent contents; try to submit your tasks by 11 am of
the due dates. If you miss a due date, you can still submit your task by the end of Week 12 (The Deadline).
However, only submissions by 11 am of the due date will receive feedback (via OnTrack).
• Having said that, you can still ask for help on overdue tasks during the workshops, although the priority goes
to the tasks that are current (released and not due yet).
• Before completing any task, please read the instructions in the task description and task completion form;
submit via OnTrack: https://ontrack.deakin.edu.au/. Please note that Task P8 (End of Unit Assessment)
will be submitted via CloudDeakin; you don’t need to submit P8 via OnTrack. More information about End
of Unit Assessment will be provided later.
Assessment: Submission Items
Task Submission Items
Pass Tasks P1-P8 • P1: Sign and submit the assessment guideline via OnTrack.
• P2-P7: Submit the task completion report (PDF file) via OnTrack.
• P8: Submit the End of the Unit Assessment (EoUA) via Cloud Deakin
Credit Tasks C1-C2 • Submit the task completion report (PDF file) via OnTrack.
• Submit the Jupyter Notebook (ipynb file) via OnTrack.
Distinction Task D1 • Submit the task completion report (PDF file) via OnTrack; a link to the video recording
must be included in the task completion report.
• Submit the Jupyter Notebook (ipynb file) via OnTrack.
High Distinction Task HD1 • Submit the task completion report (PDF file) via OnTrack; a link to the video recording
must be included in the task completion report.
• Submit the Jupyter Notebook (ipynb file) via OnTrack.
Task Completion Report
Assessment: Feedback
Feedback | Meaning | Required Action
Complete | The submission meets the essential requirements of the task and is ready for inclusion in the portfolio. | No further action is required.
Discuss | The tutor would like to discuss the submission with the student. | Respond to the tutor’s questions via OnTrack.
Demonstrate | The tutor would like the student to demonstrate the submission. | Meet with the tutor (online/on-campus) to demonstrate your submission.
Fix and Resubmit | The submission needs to be improved or fixed. | Fix your submission and resubmit. A maximum of 2 resubmissions is allowed per task, but only the first resubmission will receive further feedback, and only if it is received within 7 days of the initial feedback. The 2nd resubmission can be made anytime by the end of Week 12 (the Deadline) with no feedback.
Fail | The submission has failed to meet the essential requirements of the task. | No action is required.

You are not allowed to make more than 3 submissions per task (original submission plus a maximum of 2 resubmissions).
If you mistakenly exceeded this limit, please contact your tutor; they will help you fix the issue.
Assessment: Portfolio submission
• By the end of Week 12, you will need to submit your final portfolio including all your completed tasks.
• Please note that Task P8 (End of Unit Assessment) will be submitted via Cloud Deakin; you don’t need to
submit P8 via OnTrack. More information about End of Unit Assessment will be provided later.
Assessment: End of Unit Assessment

• Evaluates your basic understanding of the unit.


• It is a Pass requirement; it is treated as a Pass task, and it is as challenging as
any other pass task.
• Task P8 (End of Unit Assessment) will be submitted via CloudDeakin; you don’t
need to submit P8 via OnTrack.
• More information about End of Unit Assessment will be provided later.
Assessment: Plagiarism
∙ SIT112 has zero tolerance for plagiarism; plagiarized submissions will be
flagged as Fail and reported to the academic integrity committee for further
investigation.
ChatGPT
∙ In this unit, we encourage you to use ChatGPT when
appropriate.
∙ The tutors will help you navigate your way through
this new world of AI in education to have an enjoyable
learning experience!
∙ Use ChatGPT for Data Science on the Student Forum
Additional Help: Help-Hub Sessions
∙ If you need assistance with your programming skills, please use the HelpHub sessions as listed
on CloudDeakin.
∙ You can ask programming questions and get limited support on the programming side of the
tasks.
∙ Please do not use HelpHub as a replacement for the Workshops.
Additional Help: Math Help
If you need help with math, please use the Maths Mentors Drop-in sessions: the Maths Mentors are
available Monday to Friday, 10 am – 2 pm through the Zoom Maths Mentor Online Drop-in or email
[email protected] anytime and the mentors will respond when they are next working.
Questions
• Technical Questions:
• Any technical questions about the lectures or tasks should be asked during the Workshops.
• If you are attending the online workshops, you can ask your tutors to help you in a one-to-one session.

• Non-Technical Questions:
• Directly email the examiner: [email protected]
• Your subject line must contain: UnitCode-StudentID-Subject
• You can expect an answer within 2 working days.
Discussions
• During the Workshops: we may allocate some
time to discussion.
• Outside the Workshops: you can continue to
discuss on the student forum.
• The student forum is to encourage discussion
among the students; it is not frequently
monitored by the teaching team.
Announcements

• We use CloudDeakin announcements to communicate any updates, news,
and changes about the lectures, workshops, tasks, etc.
• You are expected to read the
announcements carefully.
Best Practices
• Work on your tasks as early as possible; do not procrastinate!
• Demonstrate your work to the tutors and ask for in-class feedback. That is especially useful for Credit, Distinction, and
High Distinction tasks. Your tutors can tell you how far your task is from completion.
• Try not to miss the due dates; you can still resubmit if the tutor did not flag your initial submission as complete ☺
• Learn how to use ChatGPT as your personal tutor!
• Set a realistic target grade that matches your skills, the grade requirements, and your career/study plans. If you are not
sure, talk to the teaching team; they will be able to guide you.
• Feel free to communicate any obstacle that prevents you from achieving your target grade; we can always help you
overcome it - as long as the target grade is set realistically.
• Be respectful to the tutors; they are there to help you achieve your goals!
• Stay positive and don’t be stressed; teaching and learning must be joyful ☺
Meet the Data Wizard
Inside Look at Data Science – Q&A Session
To be Announced
Online Resources
https://www.kaggle.com/
https://towardsdatascience.com/
https://www.datasciencecentral.com/
https://www.kdnuggets.com/
What is Data Science?
What is Data Science?
There are Different Definitions

Data science is an emerging discipline. It remains a science where new
knowledge and tools are still being invented.

There is not yet a clear definition agreed by all for the term ‘data science’.
Different definitions exist from different perspectives (government, business,
research, etc.)
NIST, Big Data
Interoperability
Framework
Latest standard on data science and big data
analytics: October 21, 2019.

URL: https://doi.org/10.6028/NIST.SP.1500-1r2
What is Data Science?
NIST’s definition

NIST’s definition: “Data science is the methodology for the synthesis of useful knowledge
directly from data through a process of discovery or of hypothesis formulation and hypothesis
testing.”
Who is a Data Scientist?
NIST Definition

NIST definition: “A data scientist is a practitioner who has sufficient knowledge in the overlapping
regimes of business needs, domain knowledge, analytical skills, and software and systems engineering
to manage the end-to-end data processes in the analytics life cycle.”
Who is a Data Scientist?
A Bad Joke from Joel Grus …

“There’s a joke that says a data scientist is someone who
knows more statistics than a computer scientist and more
computer science than a statistician. (I didn’t say it was a
good joke.) In fact, some data scientists are—for all
practical purposes—statisticians, while others are fairly
indistinguishable from software engineers. Some are
machine learning experts, while others couldn’t
machine-learn their way out of kindergarten.”
Data Science is Interdisciplinary

https://doi.org/10.6028/NIST.SP.1500-1r2
Data Science is Interdisciplinary
● Domain data and processes - Sets of values that share a common meaning or purpose. For example, a customer
database holds each customer's name, address, phone number, and email address.
● Algorithms - An algorithm is an exact list of instructions that carries out specified actions step by step.
● Software and Systems Engineering - Covers the creation of software, from concept and design through coding,
and its maintenance throughout its life cycle.
● Analytical Systems - These are IT systems that process the information outputs produced by middleware. Analytic
systems may comprise databases, data processing software, and Web services.
● Statistics - Statistics is a branch of applied mathematics. It is used to collect and summarize data.
● Machine Learning - ML is the science of developing algorithms and statistical models that computer systems use
to perform complex tasks without explicit instructions.
● Data Mining - Data mining is the process of analyzing a large batch of information to discern trends and patterns.
Video and other resources
● What is statistics

● Descriptive vs Inferential Statistics

● Learn more about Statistics

● Machine Learning

What is Statistics?

What is statistics?
Descriptive & Inferential Statistics

Descriptive statistics vs Inferential statistics
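As a small illustration of descriptive statistics, the sketch below summarizes a sample with Python's built-in statistics module. The marks are made-up values chosen for this example, not data from the unit.

```python
import statistics

# Hypothetical sample: exam marks for eight students (made-up values).
marks = [52, 67, 71, 58, 90, 67, 75, 80]

mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted data
mode_mark = statistics.mode(marks)      # most frequent value
spread = statistics.stdev(marks)        # sample standard deviation

print(mean_mark, median_mark, mode_mark, round(spread, 2))
```

Descriptive statistics like these only summarize the sample at hand; inferential statistics would go further and use the sample to draw conclusions about a larger population.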


Learn More about
Statistics
https://www.mathsisfun.com/data/
Machine Learning

What is Machine Learning?


Machine Learning: Supervised Learning
Supervised learning: Models that can predict labels based on labeled training data.
• Classification: Models that predict labels as two or more discrete categories.
• Email spam classification: Given an email, the task is to classify it as either spam or not spam (ham).
• Image classification: Given an image, the task is to classify it into one of several categories, such as
dogs, cats, birds, etc.
• Regression: Models that predict continuous labels.
• House price prediction: Given a set of features such as the number of bedrooms, bathrooms, square
footage, etc., the task is to predict the selling price of a house.
• Stock price prediction: Given historical stock prices, the task is to predict the future stock prices.
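To make the idea of regression concrete, here is a minimal sketch that fits a straight line to labeled training data with ordinary least squares and then predicts a continuous label. The house sizes and prices are made up for illustration, and a real project would more likely use a library such as scikit-learn rather than this hand-written formula.

```python
# Made-up training data: house size in square metres -> selling price in $1000s.
sizes = [50, 80, 100, 120, 150]
prices = [200, 290, 350, 410, 500]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Ordinary least squares for a single feature:
# slope = covariance(x, y) / variance(x)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
variance = sum((x - mean_x) ** 2 for x in sizes)
slope = covariance / variance
intercept = mean_y - slope * mean_x


def predict_price(size):
    """Predict a continuous label (price) for an unseen house size."""
    return intercept + slope * size


print(predict_price(110))
```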
Machine Learning: Unsupervised Learning
Unsupervised learning: Models that identify structure in unlabeled data
• Clustering: Models that detect and identify distinct groups in the data
• Customer segmentation: To segment customers into groups based on similarities in their behavior,
preferences, demographics, or purchase history.
• Image segmentation: To identify different objects in an image, such as cars, people, or animals. This can be
useful for tasks like object recognition, image search, or autonomous driving.
• Dimensionality reduction: Models that detect and identify lower-dimensional structure in
higher-dimensional data
• Face recognition: To extract the most important features from images in order to recognize faces. For example,
Principal Component Analysis (PCA) can be used to reduce the dimensionality of facial images.
• Sentiment analysis: To extract the most important features from text in order to perform sentiment analysis.
For example, Singular Value Decomposition (SVD) can be used to reduce the dimensionality of a
document-term matrix.
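Clustering can be sketched in a few lines. The example below runs a simplified one-dimensional k-means with two clusters on made-up customer spending figures; the data, the choice of two clusters, and the starting centres are all assumptions made for this illustration.

```python
# Made-up data: yearly spend (in $) of eight customers. No labels are given.
spend = [120, 150, 130, 900, 950, 1000, 140, 980]

# Start with two guessed cluster centres (smallest and largest value).
centres = [min(spend), max(spend)]

for _ in range(10):  # a few iterations are enough for this tiny example
    # Assignment step: put each customer in the cluster with the nearest centre.
    clusters = [[], []]
    for value in spend:
        nearest = min(range(2), key=lambda i: abs(value - centres[i]))
        clusters[nearest].append(value)
    # Update step: move each centre to the mean of its cluster.
    centres = [sum(c) / len(c) for c in clusters]

print(clusters)
print(centres)
```

The algorithm is never told which customers are low or high spenders; the two groups emerge from the structure of the data alone.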
Our Focus
● Data science is a new set of skills that you can apply.
● Whether you are reporting election results, optimizing online ad clicks, identifying microorganisms in microscope
photos, or working with data in any other field, the goal of this unit is to give you the ability to ask and answer new
questions about your chosen subject area.
● For example:
○ In Healthcare - Can we predict patient readmission rates based on certain physiological and behavioural factors?
○ In Finance - Are there patterns in customer spending behaviour that can help tailor personalized financial
products or services?
○ In Education - Are there early indicators that can predict student dropout rates or academic success?
● All the areas we saw earlier require a solid set of essential skills for data analysis and data visualization. So
that’s where we start, and those are the skills that you’ll learn from this unit.
● In this unit, you’ll learn how to use Python for data analysis and data visualization, and you’ll be introduced to the
What You Need to Do Before Starting an Analysis
Set Your Goals
The goals of analysis can be well-defined, like trying to answer specific questions, or more general, like
trying to extract useful information from large volumes of data. Setting goals helps you define the
purpose and scope of your analysis and guide your decision-making throughout the process.
• Focus: Setting goals helps you focus on what's important: you can prioritize the data sources, variables, and
analysis techniques that are most relevant to your goals, and avoid getting sidetracked by irrelevant
data or analysis.
• Measurement: Setting goals helps you establish benchmarks and metrics for measuring success; you
can track your progress over time, evaluate the impact of your analysis, and make informed decisions
about next steps.
• Collaboration: Setting goals helps you align stakeholders and collaborators around a shared vision
and avoid confusion or misalignment.
• Accountability: Setting goals helps you hold yourself and others accountable for results.
Define Your Target Audience
If you’re presenting your findings to other people like managers or clients, you also need to define your target audience
before you start your analysis. This helps you tailor your analysis and presentation to their specific needs and
interests.

• Managers may be more interested in high-level insights and strategic recommendations than in detailed technical
explanations.

• Clients may be more interested in how your analysis can help them solve specific problems or achieve specific goals.
They may also be more interested in visualizations and interactive tools that allow them to explore the data
themselves.

• By defining your target audience, you can also anticipate potential objections or questions they may have and
address them proactively in your analysis and presentation. This can help build credibility and trust with your
audience.
The 5 Phases of Data Analysis and Visualization
The 5 Phases of Data Analysis and Visualization: 1. Get the Data
• Process of collecting and acquiring data for analysis.
• Involves identifying and gathering relevant data from various sources, such
as databases, APIs, web scraping, surveys, or experiments.
• It also involves documenting the data sources.
The 5 Phases of Data Analysis and Visualization: 2. Clean the Data
Focuses on identifying and correcting errors and inconsistencies in the data:
• Remove unnecessary rows and columns: This involves eliminating any rows or columns in the
dataset that are not relevant or useful for the analysis, which can improve processing speed and reduce
noise in the data.
• Handle invalid or missing values: This involves identifying and addressing missing or invalid data
values in the dataset, which can impact the accuracy and reliability of the analysis. Common approaches
to handling missing or invalid values include deletion, substitution, or imputation (estimating the
values by preserving the statistical relationships).
• Change object data types to datetime or numeric data types: Converting data stored in object (text)
format into datetime or numeric formats, such as integers, floats, or timestamps, is necessary for many
analysis techniques that require specific data formats.
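The three cleaning steps above can be sketched with pandas as follows. The DataFrame is made up for this example, and replacing missing prices with the column mean is only one of the substitution strategies mentioned; the right choice depends on the data.

```python
import pandas as pd

# Made-up raw data: everything arrives as text (object dtype), with one
# missing price, one invalid price, and a column we do not need.
raw = pd.DataFrame({
    'date':  ['2023-03-06', '2023-03-07', '2023-03-08', '2023-03-09'],
    'price': ['10.5', None, 'n/a', '12.0'],
    'notes': ['', '', '', ''],
})

# 1. Remove an unnecessary column.
clean = raw.drop(columns=['notes'])

# 2. Change object (text) columns to datetime and numeric types.
#    errors='coerce' turns values that cannot be parsed into NaN.
clean['date'] = pd.to_datetime(clean['date'])
clean['price'] = pd.to_numeric(clean['price'], errors='coerce')

# 3. Handle missing values by substituting the mean of the valid prices.
clean['price'] = clean['price'].fillna(clean['price'].mean())

print(clean.dtypes)
print(clean)
```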
The 5 Phases of Data Analysis and Visualization: 3. Prepare the Data
• Add columns that are derived from other columns: This involves creating new columns in a dataset
based on the values of other columns, using mathematical calculations or data transformations to
extract more meaningful information.
• Shape the data into the forms that are needed for your analysis: This involves structuring the
data in a way that makes it easier to analyze, including filtering, sorting, and grouping the data to
focus on relevant information.
• Make preliminary visualizations to better understand the data: This involves creating graphs,
charts, and other visual representations of the data to explore its patterns and relationships, and
to identify any outliers or anomalies that may need further investigation.
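A minimal sketch of the first two preparation steps, using a made-up sales table (the column names and values are assumptions for illustration). The plotting step is left out so the example runs without a display.

```python
import pandas as pd

# Made-up, already cleaned sales data.
sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South'],
    'units': [10, 3, 7, 12],
    'unit_price': [2.5, 4.0, 2.5, 4.0],
})

# Add a column that is derived from other columns.
sales['revenue'] = sales['units'] * sales['unit_price']

# Shape the data: keep the larger sales and sort them by revenue.
large_sales = sales.query('units >= 7').sort_values('revenue', ascending=False)

print(large_sales)
```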
The 5 Phases of Data Analysis and Visualization: 4. Analyze the Data
• Get new views of the data by grouping and aggregating the data: This involves
summarizing the data by creating subsets based on specific variables and then applying
aggregation functions such as sum, average, or count to calculate summary statistics for each
group.
• Make visualizations that provide insights and show relationships: This involves creating
charts, graphs, and other visual representations of the data to identify patterns, trends, and
relationships that may not be apparent in raw data.
• Model the data as part of predictive analysis: This involves building statistical models, such
as regression or decision trees, to predict future outcomes based on historical data and to
identify the key factors that drive those outcomes.
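Grouping and aggregating, the first step above, might look like this in pandas. The poll figures are made up, and the variable name sample_polls is chosen only for this sketch.

```python
import pandas as pd

# Made-up poll results.
sample_polls = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Iowa', 'Iowa', 'Iowa'],
    'pct': [44.0, 46.0, 41.0, 43.0, 45.0],
})

# Group the rows by state, then apply aggregation functions to each group.
summary = sample_polls.groupby('state')['pct'].agg(['mean', 'count'])

print(summary)
```

Each row of the result describes one group, which gives a new view of the data that is not visible in the individual poll rows.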
The 5 Phases of Data Analysis and Visualization: 5. Visualize the Data
• Enhancing visualizations to make them appropriate for the target audience.
• Involves tailoring the presentation of data to effectively communicate insights and key
messages to a specific group of people. This can include modifying the design, layout, and level
of detail of the visualization to match the audience's level of expertise, interests, and
preferences.
• For example, a simple and intuitive visualization may be more suitable for a non-technical audience,
while a more complex and detailed visualization may be more appropriate for a group of data experts.
What’s the difference between data cleaning and data preparation?

Answer: Data cleaning focuses on identifying and
correcting errors and inconsistencies in the data,
while data preparation focuses on transforming and
formatting the data to make it suitable for analysis.
Both steps are important in the data analysis process, as
they help to ensure that the results of the analysis are
accurate and meaningful.
Getting Started with Python
INSTALLING ANACONDA

https://docs.anaconda.com/anaconda/install/windows
https://docs.anaconda.com/anaconda/install/mac-os
https://docs.anaconda.com/anaconda/install/linux
LAUNCHING JUPYTERLAB FROM ANACONDA NAVIGATOR
TUTORIALS
● JupyterLab Tutorial - 1

● JupyterLab Tutorial - 2
Modules Included with Anaconda
Module | Abbreviation | Provides methods for
pandas | pd | Data analysis and visualization
numpy | np | Numerical computing
seaborn | sns | Data visualization
datetime | dt | Working with datetime objects
urllib | | Getting files from the web
zipfile | | Working with zip files
sqlite3 | | Working with a SQLite database
json | | Working with JSON data
sklearn (scikit-learn) | | Regression analysis
Two Ways to Install a Module

Install Pandas from the default channel.

conda install pandas --yes

Install pyreadstat from the conda-forge channel.

conda install --channel conda-forge pyreadstat --yes

Conda-forge is a community-driven channel for Conda packages that provides many additional packages
beyond those available in the default Conda channels.
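Before or after running an install command, it can be useful to check from Python whether a module is available. The helper below is a hypothetical convenience function written for this sketch (it is not part of conda); it relies on the standard library's importlib.util.find_spec.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


for name in ['json', 'pandas', 'pyreadstat']:
    print(name, 'installed' if is_installed(name) else 'missing')
```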
How to Import Modules

Import one module into the namespace specified by the ‘as’ clause

import pandas as pd

Import one submodule from a module

from urllib import request
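Both import forms can be tried with standard-library modules alone. This sketch uses urllib's parse submodule instead of request so that it runs without a network connection; the year in the date is an assumption.

```python
import datetime as dt      # module imported under a short alias
from urllib import parse   # one submodule imported from a package

# Use the alias as a prefix when calling into the module.
week1 = dt.date(2023, 3, 6)   # year assumed for illustration
print(week1.strftime('%d %b %Y'))

# Use the submodule name directly, without the package prefix.
parts = parse.urlparse('https://docs.anaconda.com/anaconda/install/windows')
print(parts.netloc)
```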


How to Call Methods
How to call a method in a module
import pandas as pd
polls_url = \
    'http://projects.fivethirtyeight.com/.../president_general_polls_2016.csv'
polls = pd.read_csv(polls_url)
How to call a method from a DataFrame object
polls.sort_values('startdate')
How to Chain Methods
How to chain the sort_values() and head() methods

# Sort the rows of the DataFrame polls in ascending order based on the
# values in the 'startdate' column and return the first five rows of the
# resulting DataFrame.
polls.sort_values('startdate').head()

How to chain the query() and plot() methods


polls.query('state != "U.S."') \
.plot(x='startdate', y=['Clinton_pct','Trump_pct'])
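Method chaining can be tried without downloading the polls file. The sketch below builds a small made-up stand-in DataFrame (the column names follow the slides, the values are invented) and chains query(), sort_values(), and head(); wrapping the chain in parentheses avoids the need for backslashes.

```python
import pandas as pd

# Made-up stand-in for the polls data.
polls = pd.DataFrame({
    'state': ['U.S.', 'Ohio', 'Iowa', 'Ohio'],
    'startdate': ['2016-10-03', '2016-09-12', '2016-10-20', '2016-08-01'],
    'Clinton_pct': [45.0, 43.5, 41.0, 44.0],
})

# Each method returns a DataFrame, so the next method can be called on it:
# filter out the national polls, sort by start date, keep the first two rows.
earliest_state_polls = (
    polls.query('state != "U.S."')
         .sort_values('startdate')
         .head(2)
)

print(earliest_state_polls)
```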
How to Call a Method with Positional and Keyword Parameters
The signature for the sort_values() method
sort_values(by, axis=0, ascending=True, inplace=False,
            kind='quicksort', na_position='last')
The sort_values() method with positional and keyword parameters
polls.sort_values('startdate', ascending=False, inplace=True)
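The same calling rules apply to any Python function. The function below is a hypothetical helper written for this sketch (it is not a pandas method): the first parameter is passed by position, and the parameters with default values are passed by keyword, in any order.

```python
def sort_scores(scores, ascending=True, top=None):
    """Hypothetical helper: `scores` is positional; the rest have defaults."""
    ordered = sorted(scores, reverse=not ascending)
    return ordered if top is None else ordered[:top]


marks = [67, 90, 52, 75]

print(sort_scores(marks))                          # positional only, defaults used
print(sort_scores(marks, ascending=False))         # one keyword argument
print(sort_scores(marks, top=2, ascending=False))  # keywords in any order
```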
The Syntax for Coding Lists, Slices, Tuples, and Dictionary objects
A list is a sequence of items within brackets
[item1,item2,...]
A tuple is coded like a list but in parentheses
(item1,item2,...)
A dictionary is a sequence of key/value pairs within braces
{key1:value1, key2:value2, ...}
A slice sets the start and stop values and an optional step value
start:stop:step
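A short sketch of these four constructs with plain Python objects; the values are made up for illustration.

```python
weeks = ['Week 1', 'Week 2', 'Week 3', 'Week 4', 'Week 5']  # list
date_range = ('2016-06', '2016-11')                         # tuple
tasks_due = {'Week 3': 'P1', 'Week 4': 'P2'}                # dictionary

middle_weeks = weeks[1:4]      # slice start:stop -> items at index 1, 2, 3
every_second = weeks[::2]      # slice with a step -> every second item

print(middle_weeks)
print(every_second)
print(date_range[0])           # index into the tuple
print(tasks_due['Week 4'])     # look up a value by its key
```

Note that an ordinary Python slice excludes the stop index, whereas a label-based slice in the pandas loc[] accessor includes the stop label.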
How to use Lists, Slices, Tuples, and Dictionary Objects
A list used as a keyword parameter
polls.drop(columns=['cycle','branch','matchup','forecastdate'],
inplace=True)
A tuple used as a keyword parameter
polls.plot.line(xlim=('2016-06','2016-11'))
A dictionary used as a keyword parameter
polls.rename(columns={'adjpoll_clinton':'Clinton',
'adjpoll_trump':'Trump'})
Two slices used in a loc[ ] accessor
polls.loc[0:100:10,'state':'grade']
How to Code a List Comprehension
The syntax
[expression for member in iterable]
A list comprehension used to provide the list for a keyword parameter
xticks = [x for x in range(1900,1920,2)]
The resulting list
[1900, 1902, 1904, 1906, 1908, 1910, 1912, 1914, 1916, 1918]
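A list comprehension can also transform each member and filter members with an optional condition. The sketch below extends the xticks example; the variable names are chosen for illustration.

```python
years = [x for x in range(1900, 1920, 2)]

# The expression can transform each member...
labels = [str(year) for year in years]

# ...and an optional `if` clause keeps only the members that pass a test.
divisible_by_4 = [year for year in years if year % 4 == 0]

print(labels[:3])
print(divisible_by_4)
```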
The Type Function

How to use the Python type() function to check the data type of a variable
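A minimal sketch of type() applied to a few made-up values:

```python
values = [42, 3.14, 'SIT112', [1, 2], (1, 2), {'P1': 'done'}, None]

for value in values:
    # type() returns the class of the object; __name__ is its short name.
    print(repr(value), '->', type(value).__name__)

type_names = [type(value).__name__ for value in values]

# isinstance() is the usual way to test for a specific type in code.
print(isinstance(3.14, float))
```

Checking types this way is often the quickest route to understanding an unexpected error, for example when a number has been read in as text.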
Two ways to Continue a Statement
With implicit continuation
polls.sort_values(
['state','startdate'],
ascending=False,
inplace=True)
With explicit continuation
polls.sort_values(['state','startdate'], \
ascending=False, \
inplace=True)
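The two kinds of continuation can be tried without pandas. In this sketch the arithmetic is arbitrary; the point is only where the line breaks are allowed.

```python
# Implicit continuation: an open bracket lets the expression span lines.
total_implicit = (1 +
                  2 +
                  3)

# Explicit continuation: a backslash must be the last character on the line.
total_explicit = 1 + \
                 2 + \
                 3

print(total_implicit, total_explicit)
```

Implicit continuation is generally preferred, because a stray space after a backslash causes a syntax error.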
Using JupyterLab
Working with the Cells
How to select one or more cells
∙ To select one cell, position the pointer in the left margin of the cell so it becomes a crosshair,
and then click so a blue line is displayed.
∙ To select more than one cell, select the first cell, hold down the Shift key, and select the last
cell.

How to copy, delete, merge, or move the selected cells


∙ Use the buttons in the toolbar or the items in the Edit or shortcut menu.

How to add a cell after the current cell


∙ Use the + button in the toolbar.
Working with the cells (Cont.)
How to run the code in one cell
∙ Press Shift+Enter or click the Run button in the toolbar.

How to run the code in selected cells or all cells


∙ Use the Run button in the toolbar or the items in the Run menu.

How to interrupt, restart, or shut down the kernel


∙ Use the items in the Kernel menu.
Markdown Language
A Notebook with headings

The Markdown language for the headings


Tab Completion
The Tab completion feature is activated when you press the Tab key
The tooltip feature
The tooltip feature is activated when you press Shift+Tab
The start of the tooltip for the sort_values() method
Syntax Error
• A syntax error occurs when the interpreter cannot parse the code due to invalid syntax. This
means that the code violates the language grammar rules, which results in an error message.
• Syntax errors can occur for a variety of reasons, such as a missing or misplaced punctuation
mark, an incorrect keyword or function name, or a mismatched set of parentheses or brackets.

A syntax error in a Notebook
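A syntax error can be observed safely by asking Python to parse a faulty piece of source held in a string, as in this minimal sketch. The exact wording of the message varies between Python versions, so it is printed rather than assumed.

```python
# The faulty source is held in a string so that this example itself stays valid.
bad_source = "print('hello'"   # the closing parenthesis is missing

caught = None
try:
    compile(bad_source, '<example>', 'exec')   # parse the code without running it
except SyntaxError as error:
    caught = error
    print('SyntaxError:', error.msg)
```

Because the code cannot be parsed, none of it runs; this is what distinguishes a syntax error from an error that occurs part-way through execution.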


Runtime Error
A runtime error in a Notebook
A runtime error occurs when the code is syntactically correct, but an error
occurs during the execution of the code. This can happen for a variety of
reasons, such as trying to perform an illegal operation, attempting to
divide by zero, or accessing an undefined variable.

Runtime errors are also known as exceptions. When one occurs, Python raises
an exception and, unless the exception is handled, stops the program's
execution. The exception message provides information about what went wrong
and where the error occurred, allowing you to identify and fix the issue.
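Two of the runtime errors mentioned above, dividing by zero and using an undefined variable, can be reproduced and handled with try/except. The function and variable names in this sketch are made up for illustration.

```python
def safe_divide(a, b):
    """Divide a by b, returning None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError as error:
        # The exception message says what went wrong.
        print('Runtime error:', error)
        return None


print(safe_divide(10, 4))
print(safe_divide(10, 0))

try:
    print(undefined_total)   # this name was never assigned
except NameError as error:
    print('Runtime error:', error)
```

Without the try/except blocks, the first exception would stop execution and the notebook would display a traceback pointing at the failing line.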
References
• Data Science from Scratch: First Principles with Python, Joel Grus, O'Reilly Media, 2019.
• Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Wes McKinney, O'Reilly
Media, 3rd edition, 2022.
• Python Data Science Handbook: Essential Tools for Working with Data, Jake VanderPlas, O'Reilly Media,
2022.
• Murach’s Python for Data Analysis, Scott McCoy, Mike Murach & Associates, Incorporated, 2021.
• Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The
Cloud, Paul Deitel, Pearson Education Limited, 2021.
• ChatGPT
End of lecture …
