02 Data Mining Process

Uploaded by

hrhee1atl
Copyright
© All Rights Reserved

Data Mining Process

Course: Artificial Intelligence Fundamentals

Instructor: Marco Bonzanini


Section Outline

• KDD process

• CRISP-DM process

• Discussion: relationship with software engineering

• Environment set-up (demo)


Knowledge Discovery in Databases

• a.k.a. the KDD process

• Stages:
— Selection
— Pre-processing
— Transformation
— Data Mining
— Interpretation / Evaluation
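
The five stages above can be sketched as a chain of small functions. This is only an illustrative toy, not part of the original slides: the records, the function names and the 0.5 support threshold are all hypothetical, and the "data mining" step is a simple co-occurrence count rather than any particular algorithm.

```python
from collections import Counter

# Hypothetical raw "database": shopping baskets, some of them messy.
raw_records = [
    {"customer": 1, "basket": "bread, milk"},
    {"customer": 2, "basket": "bread, butter, milk"},
    {"customer": 3, "basket": None},            # missing value
    {"customer": 4, "basket": "Milk, bread "},  # inconsistent case/spacing
    {"customer": 5, "basket": "butter"},
]


def select(records):
    """Selection: keep only the field relevant to the question."""
    return [record["basket"] for record in records]


def preprocess(baskets):
    """Pre-processing: drop missing values, normalise case and whitespace."""
    return [b.lower().strip() for b in baskets if b is not None]


def transform(baskets):
    """Transformation: turn each string into a set of items."""
    return [frozenset(item.strip() for item in b.split(",")) for b in baskets]


def mine(itemsets):
    """Data mining: count how often each pair of items occurs together."""
    pair_counts = Counter()
    for items in itemsets:
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pair_counts[(first, second)] += 1
    return pair_counts


def interpret(pair_counts, n_baskets, min_support=0.5):
    """Interpretation / evaluation: keep pairs above a support threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


itemsets = transform(preprocess(select(raw_records)))
frequent_pairs = interpret(mine(itemsets), n_baskets=len(itemsets))
print(frequent_pairs)
```

Each function takes the output of the previous stage, which mirrors the linear reading of the KDD process; in practice the stages are usually revisited several times.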
Cross-Industry Standard Process for Data Mining
• a.k.a. the CRISP-DM process

• Stages:
— Business Understanding
— Data Understanding
— Data Preparation
— Modelling
— Evaluation
— Deployment
CRISP-DM

https://en.wikipedia.org/wiki/File:CRISP-DM_Process_Diagram.png
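
Unlike a strictly linear pipeline, the CRISP-DM diagram allows moving back to earlier stages. A minimal sketch of that idea, assuming the transitions shown in the commonly reproduced diagram (the `Stage`, `TRANSITIONS` and `is_valid_path` names are hypothetical, introduced here only for illustration):

```python
from enum import Enum


class Stage(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELLING = "Modelling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Allowed moves between stages, as an adjacency mapping.
# Assumption: these follow the arrows of the usual CRISP-DM diagram.
TRANSITIONS = {
    Stage.BUSINESS_UNDERSTANDING: {Stage.DATA_UNDERSTANDING},
    Stage.DATA_UNDERSTANDING: {Stage.BUSINESS_UNDERSTANDING,
                               Stage.DATA_PREPARATION},
    Stage.DATA_PREPARATION: {Stage.MODELLING},
    Stage.MODELLING: {Stage.DATA_PREPARATION, Stage.EVALUATION},
    Stage.EVALUATION: {Stage.BUSINESS_UNDERSTANDING, Stage.DEPLOYMENT},
    Stage.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of stages is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A project that goes back to data preparation once after modelling.
project = [
    Stage.BUSINESS_UNDERSTANDING,
    Stage.DATA_UNDERSTANDING,
    Stage.DATA_PREPARATION,
    Stage.MODELLING,
    Stage.DATA_PREPARATION,
    Stage.MODELLING,
    Stage.EVALUATION,
    Stage.DEPLOYMENT,
]
print(is_valid_path(project))
```
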
Environment Set-up with Python
Python Overview
• General purpose language: it works well in a
range of applications, from web applications to
scientific computing; multiple programming
paradigms (procedural, object-oriented, functional)

• Elegant syntax: easy to read, easy to learn, quick
to write (no boilerplate)

• Expressive language: fewer lines of code
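
As a small illustration of the "multiple programming paradigms" point, the same task (summing the squares of the even numbers) can be written in three styles. The function and class names are hypothetical, chosen only for this sketch:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]


# Procedural: explicit loop and a running total.
def sum_even_squares_procedural(values):
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


# Object-oriented: state lives in an object, behaviour in its methods.
class EvenSquareAccumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        if value % 2 == 0:
            self.total += value * value
        return self  # returning self allows chained calls


# Functional: compose filter and reduce, no mutable state.
def sum_even_squares_functional(values):
    evens = filter(lambda v: v % 2 == 0, values)
    return reduce(lambda acc, v: acc + v * v, evens, 0)


accumulator = EvenSquareAccumulator()
for n in numbers:
    accumulator.add(n)

print(sum_even_squares_procedural(numbers))
print(accumulator.total)
print(sum_even_squares_functional(numbers))
```
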


Python Overview (2)
• Dynamic typing: data types are inferred by the
interpreter, no need to define types of variables,
arguments or return types [read small print]

• Interpreted: no need to compile, it runs anywhere
a runtime interpreter is available [read small print]

• Automatic memory management: no need to
allocate memory explicitly, nor to clean it up
(garbage collection)
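
A short sketch of dynamic typing and automatic memory management in practice. The `double` function and `Dataset` class are hypothetical, and the use of `weakref` to observe object reclamation reflects how the standard CPython interpreter behaves; other interpreters may reclaim objects at a different time.

```python
import gc
import weakref

# Dynamic typing: the same name can be bound to values of different types.
value = 42
kinds = [type(value).__name__]
value = "forty-two"
kinds.append(type(value).__name__)
value = [4, 2]
kinds.append(type(value).__name__)
print(kinds)


# No declared argument or return types: anything supporting `*` works.
def double(x):
    return x * 2


print(double(21), double("ab"), double([1]))

# Type errors still exist; they surface at run time, not at compile time.
try:
    double(None)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)


# Automatic memory management: no explicit allocation or free.
class Dataset:
    pass


data = Dataset()
ref = weakref.ref(data)   # a weak reference does not keep the object alive
del data                  # drop the last strong reference
gc.collect()              # force a collection pass to be explicit
collected = ref() is None
print(collected)
```
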
Python Overview: small print

• Dynamic typing: “if it quacks, it’s a duck” (duck
typing)

• Dynamic typing: there’s no compiler to catch your
bugs; you can still annotate, and use a static
analysis tool (flake8, pylint, …)

• Interpreted: is Python slow?
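
The duck typing and annotation points can be sketched together as follows. The class and function names are hypothetical. The example also shows that annotations are not enforced by the interpreter: catching the mismatch in the last call would need a separate static type checker (mypy is one such tool, mentioned here as an assumption since the slides do not name it).

```python
from typing import Iterable, List, Protocol


class Quacker(Protocol):
    """Anything with a quack() method; no inheritance required."""

    def quack(self) -> str:
        ...


class Duck:
    def quack(self) -> str:
        return "Quack!"


class RobotDuck:
    # Unrelated to Duck, but it quacks, so it is treated as a duck.
    def quack(self) -> str:
        return "Beep-quack!"


def chorus(animals: Iterable[Quacker]) -> List[str]:
    """Call quack() on each object, whatever its concrete class."""
    return [animal.quack() for animal in animals]


def add(a: int, b: int) -> int:
    """Annotated for integers, but the annotations are only hints."""
    return a + b


sounds = chorus([Duck(), RobotDuck()])
print(sounds)

# The interpreter does not check annotations: this call runs without error,
# although a static type checker would be expected to report it.
joined = add("data", "mining")
print(joined)
```
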


Python for Data Science

• Rapid prototyping: minimal development time

• End-to-end: mature language and tools, same
language and frameworks for R&D and production

• Scientific Computing Ecosystem: large
ecosystem of well-established tools for scientific
computing (numpy, scipy, pandas, scikit-learn, …)
Python for Data Science: Ecosystem
• NumPy: computation on multi-dimensional arrays

• SciPy: modules for interpolation, optimisation, linear algebra,
integration, signal processing, etc.

• pandas: high-level library for data manipulation and analytics

• matplotlib: plotting library

• scikit-learn: Machine Learning library

• Keras: high-level, user-friendly Neural Network library

• Jupyter notebook: interactive web-based environment
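
As part of setting up the environment, it can be useful to check which of the libraries above are installed. The sketch below uses only the standard library and does not import the packages themselves. The `PACKAGES` mapping and `check_environment` function are hypothetical names for this example; the import names on the right (for instance `sklearn` for scikit-learn and `notebook` for the Jupyter notebook) are the usual ones but may differ between versions.

```python
from importlib.util import find_spec

# Display name -> top-level import name (assumed usual import names).
PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "Keras": "keras",
    "Jupyter notebook": "notebook",
}


def check_environment(packages):
    """Map each display name to True if its module can be located."""
    return {
        name: find_spec(module) is not None
        for name, module in packages.items()
    }


report = check_environment(PACKAGES)
for name, installed in report.items():
    status = "ok" if installed else "MISSING"
    print(f"{name:18s} {status}")
```

Anything reported as missing would then need to be installed with whatever package manager the environment uses.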


Demo
