CE880_Lecture_1_slides

Uploaded by

Anand A J

Lecture 1: What is Data Science?

An Approachable Introduction to Data Science

Haider Raza
Tuesday, 17 Jan 2023

About Myself

- Name: Haider Raza
- Position: Senior Lecturer in Artificial Intelligence
- Research interest: AI, Machine Learning, Data Science
- Contact: [email protected]
- Academic Support Hours: 1-2 PM on Friday via Zoom. The Zoom link is available on Moodle.
- Website: www.sagihaider.com

Assessment Information

The aim of the module is to develop quantitative skills in the area of AI and Data
Science to enable professionals working in areas in which these topics are now being
embedded. The module will enable those future professionals to take a knowledgeable
approach to their use of AI and data science.
The deadlines are as follows:

- Coursework: Weekly Lab, 14/04/2023 (before 11:59:59) (60%)
- Coursework: Case Study, 14/04/2023 (before 11:59:59) (40%)

Important points to note:

- Students are required to complete their weekly lab work and submit all the lab work notebooks to Faser before 14/04/2023 (11:59:59).
- If for any reason you are unable to submit the coursework (labs or case study) on time, please consult https://www.essex.ac.uk/student/exams-and-coursework/extenuating-circumstances

What is Data?

Data is a collection of facts, such as numbers, words, measurements, observations, or just descriptions of things.

Types of Data

What is Data Science?

Data science enables businesses to process huge amounts of structured and unstructured big data to detect patterns. This in turn allows companies to increase efficiencies, manage costs, identify new market opportunities, and boost their market advantage.

Data Science Venn Diagram

Advantages and Disadvantages of Data Science

Advantages
- Multiple job options
- Business benefits
- Highly paid jobs and career opportunities
- Data science makes data better
- Data science is versatile and can be applied to any business
- No more boring tasks
- Learning something new every day

Disadvantages
- Data science is a blurry term
- Mastering data science is near to impossible
- Good domain knowledge is required
- Arbitrary data may yield unexpected results
- Problem of data privacy

5 Reasons to Study Data Science

- Learning about data science provides an opportunity for you to reinvent yourself.
- We live in a digital world where everything is data-driven.
- Data science is also a very promising field with lots of high-paying job opportunities.
- Basic data science skills are important for personal use.
- You can use your knowledge of data science to generate side income.

History of Data Science

Figure 1: History of data science
Source: https://towardsdatascience.com/the-history-of-data-science-dfe789499d50

Data Science Workflow

Source: Microsoft

Data Science: Healthcare

Source: Microsoft

Data Science: Finance

Source: Microsoft

Data Science: Journalism

Source: BBC

Data Science: Sports

Source: Orreco

Data Science: Crime Prevention

Source: IBM

Data Science: How the UK Government Is Using It

Source: blog.gov.uk

Data Science: Tools and Techniques

Source: https://becomingadatascientist.wordpress.com/2013/07/26/choosing-a-data-science-technology-stack-w-survey/

What Tools and Packages Are We Going to Use?

- Coding: Google Colab for Python, https://colab.research.google.com/
- Version control: GitHub, https://github.com/
- Data sources: .csv, .xlsx, .tsv
- Python packages: NumPy, Pandas, Matplotlib, pip, Scikit-Learn, SciPy, Seaborn, etc.

Introduction to NumPy

What is NumPy?
NumPy is a Python library used for working with arrays.

- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
- NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
- NumPy stands for Numerical Python.
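
To make the points above concrete, here is a minimal sketch of array arithmetic, a linear algebra solve, and a Fourier transform in NumPy. The matrix, vector, and signal values are made up purely for illustration and do not come from the lecture.

```python
import numpy as np

# A small matrix and vector (illustrative values only)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Arithmetic applies to every element of the array at once
doubled = A * 2

# Linear algebra: solve the system A @ x = b for x
x = np.linalg.solve(A, b)

# Fourier transform of a short, hand-written signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print(doubled)
print(x)
print(spectrum)
```

Operating on whole arrays like this, rather than looping over elements in plain Python, is the usual way NumPy code is written.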

Introduction to NumPy

Source: Nature, "Array programming with NumPy"

Introduction to Pandas

What is Pandas?
Pandas is a Python library used for working with data sets.

- It has functions for analyzing, cleaning, exploring, and manipulating data.
- The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
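
As a rough illustration of exploring, cleaning, and analysing data with Pandas, the sketch below loads a tiny CSV table into a DataFrame. The table contents and column names (`name`, `age`, `city`) are hypothetical and inlined as a string so the example runs without any external file. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example is self-contained
csv_text = """name,age,city
Alice,30,Colchester
Bob,,London
Carol,20,Colchester
Dave,40,London
"""

# Load the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Exploring: look at the first rows and summary statistics
print(df.head())
print(df.describe())

# Cleaning: replace the missing age with the mean of the known ages
df["age"] = df["age"].fillna(df["age"].mean())

# Analysing: average age per city
mean_age_by_city = df.groupby("city")["age"].mean()
print(mean_age_by_city)
```

Filling a missing value with the column mean is only one possible cleaning strategy, chosen here because it is easy to follow.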

What Does a Data Frame Look Like?

Source: W3resource

Introduction to Matplotlib

What is Matplotlib?
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

- Matplotlib was created by John D. Hunter.
- Matplotlib is open source and we can use it freely.
- Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for platform compatibility.
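
A minimal sketch of a Matplotlib line plot is shown below. The plotted function, the labels, and the output filename `sine.png` are illustrative choices, not part of the lecture. The `Agg` backend is selected so the script also runs where no display is available. In a notebook such as Google Colab that line can be dropped and the figure appears inline.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; optional in a notebook

import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points between 0 and 2*pi, and their sine values
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Save the figure to an image file (hypothetical filename)
fig.savefig("sine.png")
```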

What Can Matplotlib Do?

Source: https://towardsdatascience.com/python-data-visualization-with-matplotlib-part-2-66f1307d42fb

Introduction to Scikit-Learn

What is Scikit-Learn?
Scikit-learn is a Python library that provides many unsupervised and supervised learning algorithms. It is built on NumPy, SciPy, and Matplotlib, and it works well alongside pandas. The functionality that scikit-learn provides includes:

- Regression: including Linear Regression
- Classification: including K-Nearest Neighbors and Logistic Regression (which, despite its name, is a classification method)
- Clustering: including K-Means and K-Means++
- Model selection
- Pre-processing: including Min-Max Normalization
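
The sketch below combines three of the items above: Min-Max pre-processing, a K-Nearest Neighbors classifier, and a simple train/test split from the model selection tools. The choice of the built-in Iris dataset, `n_neighbors=5`, the 25% test split, and `random_state=0` are assumptions made for illustration, not settings taken from the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data to evaluate the model on unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale each feature to [0, 1], then classify with K-Nearest Neighbors
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Fraction of test samples that were classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Putting the scaler and the classifier in one pipeline means the scaling is learned from the training data only and then reused on the test data.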

Introduction to scikit-learn

Source: https://scikit-learn.org/

Introduction to GitHub

What is GitHub?
GitHub is a code hosting platform for collaboration and version control. GitHub lets you (and others) work together on projects.

What Can a GitHub Repository Do?

- A GitHub repository can be used to store a development project.
- It can contain folders and any type of file (HTML, CSS, JavaScript, documents, data, images).
- A GitHub repository should also include a licence file and a README file about the project.
- A GitHub repository can also be used to store ideas, or any resources that you want to share.

What Can GitHub Do?

Source: https://www.coursereport.com/

Thank you!
