02 Data Mining Process

Data Mining Process
Course: Artificial Intelligence Fundamentals

Instructor: Marco Bonzanini


Section Outline

• KDD process

• CRISP-DM process

• Discussion: relationship with software engineering

• Environment set-up (demo)


Knowledge Discovery in Databases

• a.k.a. the KDD process

• Stages:
— Selection
— Pre-processing
— Transformation
— Data Mining
— Interpretation / Evaluation
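
The five stages above can be read as a chain of functions, each consuming the output of the previous one. What follows is a minimal sketch using only the standard library; the records, field names, age bands and function names are all invented for illustration and are not part of the course material.

```python
from statistics import mean

# Hypothetical raw records, invented purely for illustration.
raw_records = [
    {"customer": "a", "age": 34, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 51, "spend": 300.0},
    {"customer": "d", "age": 29, "spend": 40.0},
]


def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [{"age": r["age"], "spend": r["spend"]} for r in records]


def preprocess(records):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: derive a coarser feature (an age band)."""
    return [
        {"band": "under_40" if r["age"] < 40 else "40_plus", "spend": r["spend"]}
        for r in records
    ]


def mine(records):
    """Data mining: here, simply the average spend per age band."""
    groups = {}
    for r in records:
        groups.setdefault(r["band"], []).append(r["spend"])
    return {band: mean(values) for band, values in groups.items()}


def interpret(patterns):
    """Interpretation / evaluation: turn the pattern into a statement."""
    top = max(patterns, key=patterns.get)
    return f"Highest average spend: {top}"


patterns = mine(transform(preprocess(select(raw_records))))
summary = interpret(patterns)
print(patterns)
print(summary)
```

In practice the stages are rarely a single straight pass: a poor result at the evaluation stage usually sends the analyst back to an earlier stage.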

Cross-Industry Standard Process for Data Mining
• a.k.a. the CRISP-DM process

• Stages:
— Business Understanding
— Data Understanding
— Data Preparation
— Modelling
— Evaluation
— Deployment
CRISP-DM

Process diagram: https://en.wikipedia.org/wiki/File:CRISP-DM_Process_Diagram.png
Environment Set-up with Python
Python Overview
• General purpose language: it works well in a
range of applications, from web applications to
scientific computing; multiple programming
paradigms (procedural, object-oriented, functional)

• Elegant syntax: easy to read, easy to learn, quick
to write (no boilerplate)

• Expressive language: fewer lines of code
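
To make the "multiple programming paradigms" point concrete, the sketch below computes the same quantity (the sum of the squares of the even numbers in a list) in procedural, object-oriented and functional style. The task and the class name `EvenSquareSummer` are hypothetical, chosen only for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Procedural style: an explicit loop that updates an accumulator.
total_procedural = 0
for n in numbers:
    if n % 2 == 0:
        total_procedural += n * n


# Object-oriented style: data and behaviour bundled in a class.
class EvenSquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values if v % 2 == 0)


total_oo = EvenSquareSummer(numbers).total()

# Functional style: filter, map and reduce composed without mutation.
total_functional = reduce(
    lambda acc, n: acc + n,
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
    0,
)

print(total_procedural, total_oo, total_functional)
```

All three variants give the same result; which one reads best depends on the surrounding code.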


Python Overview (2)
• Dynamic typing: data types are inferred by the
interpreter, no need to define types of variables,
arguments or return types [read small print]

• Interpreted: no need to compile, it runs anywhere
a runtime interpreter is available [read small print]

• Automatic memory management: no need to
allocate memory explicitly, nor to clean it up
(garbage collection)
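
A short sketch of two of the points above: a name can be rebound to values of different types, and an object is reclaimed once nothing refers to it. The `Dataset` class is a hypothetical placeholder. The sketch assumes the CPython interpreter, where an unreferenced object is normally freed straight away; other interpreters may reclaim it later.

```python
import gc
import weakref

# Dynamic typing: the same name can refer to values of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__


class Dataset:
    """Hypothetical placeholder for some large in-memory object."""


# Automatic memory management: no explicit allocation or free.
data = Dataset()
probe = weakref.ref(data)          # a weak reference does not keep the object alive
alive_before = probe() is not None

del data                           # remove the only strong reference
gc.collect()                       # ask the collector to run, to be safe
alive_after = probe() is not None

print(first_type, second_type, alive_before, alive_after)
```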
Python Overview: small print

• Dynamic typing: “if it quacks, it’s a duck” (duck
typing)

• Dynamic typing: there’s no compiler to catch your
bugs, but you can still add type annotations and use
a static analysis tool (flake8, pylint, …)

• Interpreted: is Python slow?
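
The duck typing and annotation points can be illustrated with a small sketch. The classes `Duck` and `Robot`, the `Quacker` protocol and the function `make_it_quack` are hypothetical names invented for this example. The annotations are not enforced at run time; they only help a static type checker (mypy is one example, and is not named in the slides) flag the bad call before the program runs.

```python
from typing import Protocol


class Quacker(Protocol):
    """Anything with a quack() method counts as a Quacker."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    # Unrelated to Duck, but it has the method the caller needs.
    def quack(self) -> str:
        return "Beep-quack"


def make_it_quack(thing: Quacker) -> str:
    # No isinstance check: only the presence of quack() matters.
    return thing.quack()


sounds = [make_it_quack(Duck()), make_it_quack(Robot())]

# Without a compiler, a wrong argument only fails when the line runs.
error_name = None
try:
    make_it_quack(42)
except AttributeError as exc:
    error_name = type(exc).__name__

print(sounds, error_name)
```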


Python for Data Science

• Rapid prototyping: minimal development time

• End-to-end: mature language and tools, same
language and frameworks for R&D and production

• Scientific Computing Ecosystem: large
ecosystem of well-established tools for scientific
computing (numpy, scipy, pandas, scikit-learn, …)
Python for Data Science:
Ecosystem
• NumPy: computation on multi-dimensional arrays

• SciPy: modules for interpolation, optimisation, linear algebra,
integration, signal processing, etc.

• pandas: high-level library for data manipulation and analytics

• matplotlib: plotting library

• scikit-learn: Machine Learning library

• Keras: high-level, user-friendly Neural Network library

• Jupyter notebook: interactive web-based environment


Demo