
#NOTES ON DATA SCIENCE PART 2

#Chapter 1

'''
Use of data mining
Data mining
Data science, engineering, and data-driven decision making
Target example
Automated decision making
Big data
From Big Data 1.0 to Big Data 2.0
Data as an asset
Credit cards
Success stories
Data-analytic thinking
Data mining and data science
Data science vs. data scientist
'''

#Chapter 2
#A set of canonical data mining tasks; The data mining process; Supervised vs Unsupervised data mining
'''
Classification and class probability estimation
Regression
Similarity matching
Clustering
Co-occurrence grouping (frequent itemset mining, association rule discovery, market-basket analysis)
Profiling
Data reduction
Causal modeling
Supervised vs. unsupervised methods
The data mining process
CRISP-DM
Business understanding
Data preparation
Leakage
Evaluating the results of data mining
Statistics
Database querying
Data warehousing
Regression analysis
Machine learning and data mining
'''

# Chapter 3: Intro to Predictive Modeling


# Identifying informative attributes,
# Segmenting data by progressive attribute selection
# Finding correlations, Attribute/Variable selection, Tree induction
'''
Models, induction, and prediction
Data induction
Information gain, entropy
Laplace correction
Logistic regression
SVM
Sigmoid curve formula, calculating probability
'''
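# The Laplace correction smooths class-probability estimates at a tree leaf so that
# a leaf with few examples does not report an extreme 0 or 1. For two classes:
# p = (n + 1) / (n + m + 2), where n counts the class of interest and m the other.
# A minimal sketch (the function name is my own):

def laplace_correction(n, m):
    # a leaf with n = 2 positives and m = 0 negatives gives 3/4, not a raw 2/2 = 1.0
    return (n + 1) / (n + m + 2)

print(laplace_correction(2, 0))  # 0.75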

# Chapter 4: Fitting a Model to Data


# Logistic Regression, MSE
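# A minimal sketch of mean squared error with numpy (y_true and y_pred are
# hypothetical arrays of observed and predicted values):

import numpy as np

def mse(y_true, y_pred):
    # MSE = (1/n) * sum((y_true - y_pred)^2), the mean of the squared residuals
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0])))  # ~0.417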

#NUMPY

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(type(arr))
OUTPUT: <class 'numpy.ndarray'>

np.arange(num) # create an array of integers from 0 up to (but not including) num

np.reshape(array, shape) # give an array a new shape without changing its data
np.sqrt # element-wise square root
np.ceil # element-wise round up to the nearest integer
np.floor # element-wise round down to the nearest integer
np.random.randn(val) # create an array of val samples from the standard normal distribution (mean 0, std 1)
np.add(a,b) OR a+b
np.multiply(a,b) OR a*b
np.maximum(c,d) # element-wise maximum of two arrays
np.minimum(c,d) # element-wise minimum of two arrays
np.modf(arr) # returns two arrays: the fractional parts and the integer parts
np.exp(num) # returns e^x
np.array() # make an array
np.dot() # dot product (matrix multiplication for 2-D arrays)
np.ones() # array filled with ones
np.zeros() # array filled with zeros
np.fill_diagonal() # fill the main diagonal of an array in place
np.mean() # arithmetic mean, optionally along an axis
np.amax() # maximum value, optionally along an axis
np.newaxis # not a function: an alias for None, used in indexing to add an axis
np.vstack() # stack arrays vertically (row-wise)
np.eye() # identity matrix
np.ndarray.strides # bytes to step in each dimension when traversing the array
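# A few of the above in action (a quick sketch):

import numpy as np
a = np.arange(6).reshape(2, 3)                    # [[0 1 2], [3 4 5]]
b = np.vstack((a, np.zeros((1, 3), dtype=int)))   # append a row of zeros -> 3x3
print(np.dot(b, np.eye(3, dtype=int)))            # multiplying by the identity returns b
print(np.amax(a, axis=0))                         # column maxima: [3 4 5]
print(a[np.newaxis, :].shape)                     # (1, 2, 3): np.newaxis adds an axis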

.ndim # number of dimensions

.shape # shape of the array (a tuple of dimension sizes)
.dtype # data type of the elements in the array
.transpose() OR .T # swap the array's axes
np.expand_dims(arr, axis) # a function, not a method: insert a new axis at the given position
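# Quick check of these attributes (a sketch):

import numpy as np
m = np.arange(6).reshape(2, 3)
print(m.ndim, m.shape, m.dtype)         # 2 (2, 3) int64 (dtype is platform-dependent)
print(m.T.shape)                        # transpose swaps the axes: (3, 2)
print(np.expand_dims(m, axis=0).shape)  # (1, 2, 3)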

# Write a function that takes f_x as input and returns a numpy array
# containing probabilities of f(x) using logistic regression

def calc_prob(f_x):
    # sigmoid: p = 1 / (1 + e^(-f(x))) maps any real-valued f(x) to a probability in (0, 1)
    return 1 / (1 + np.exp(-f_x))
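# Usage sketch (the scores are hypothetical model outputs f(x)):

scores = np.array([-2.0, 0.0, 2.0])
print(calc_prob(scores))  # approximately [0.119 0.5 0.881]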

# Slicing, fancy indexing


# Question 9: Create a 6x6 array of zeros. Use fancy indexing to add 9 to the last 3 rows.

arr2 = np.zeros((6,6), dtype = int)
arr2[[3, 4, 5]] += 9  # fancy indexing selects rows 3, 4, 5
arr2
# Question 5: Change all the values of the first row in the first layer of arr3d to 900 using slicing

arr3d[:1, :1] = 900  # layer 0, row 0, all columns
arr3d

# Assignment questions and answers


import numpy as np
mylist = [10, 20, 30]
myarray = np.array(mylist)
myarray

#######################

arr3d = np.arange(30, 109, 3).reshape(3,3,3)
arr3d

#######################

print("Number of dimensions:", arr3d.ndim)


print("Shape:", arr3d.shape)
print("Data type:", arr3d.dtype)

#######################

arr3d[:1] = 600
arr3d

#######################

arr3d[:1,:1] = 900
arr3d

#######################

A = np.array([5, 5, 5])
B = np.array([6, 6, 6])
dot_product = np.dot(A, B)
print(dot_product)  # 5*6 + 5*6 + 5*6 = 90

#######################

A = np.array([3.14, 1.41, 2.25])
x, y = np.modf(A)
print(x)  # fractional parts (approximately): [0.14 0.41 0.25]
print(y)  # integer parts: [3. 1. 2.]

#######################

arr1 = np.ones((5,5), dtype = int)
np.fill_diagonal(arr1, 9)
arr1

#######################

arr2 = np.zeros((6,6), dtype = int)
arr2[[3, 4, 5]] += 9  # fancy indexing: add 9 to the last 3 rows
arr2
#######################

arr3 = np.arange(0, 121, 5).reshape(5,5)
maxval = np.amax(arr3, axis = 0)
print(arr3)
print("mean of each row:", np.mean(arr3, axis = 1))
print("max value in each column:", maxval)

#########################
## Entropy Calculation ##
#########################

# calculate the entropy for a dice roll using base 6

from scipy.stats import entropy
# discrete probabilities for a fair six-sided die
p = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
# calculate entropy
e = entropy(p, base=6)
# print the result (base-6 units, not bits; a uniform distribution gives 1.000)
print('entropy: %.3f' % e)
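# What scipy computes, written out by hand: H = -sum(p_i * log_b(p_i)).
# A sketch with numpy, using the change of base log_6(x) = ln(x) / ln(6):

import numpy as np
p = np.array([1/6] * 6)
H = -np.sum(p * np.log(p) / np.log(6))
print(H)  # 1.0: a uniform distribution has maximal entropy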

# calculate the entropy for a fair coin flip

from scipy.stats import entropy
# discrete probabilities
p = [1/2, 1/2]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

# calculate the entropy for 3 classes using base 3

from scipy.stats import entropy
# discrete probabilities
p = [1/3, 1/3, 1/3]
# calculate entropy
e = entropy(p, base=3)
# print the result (base-3 units, not bits)
print('entropy: %.3f' % e)

#######################

# calculate the entropy for an unfair die where even numbers show up twice as often as odd numbers
# if each odd face has probability x and each even face 2x, then 3x + 3(2x) = 9x = 1, so x = 1/9
from scipy.stats import entropy
# discrete probabilities
p = [1/9, 2/9, 1/9, 2/9, 1/9, 2/9]
# calculate entropy
e = entropy(p, base=6)
# print the result (base-6 units, not bits)
print('entropy: %.3f' % e)

#######################

# calculate the entropy for a coin flip where heads comes up 75% of the time and tails 25%
from scipy.stats import entropy
# discrete probabilities
p = [3/4, 1/4]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

# compute entropy for 2 classes having probabilities of 2/3 and 1/3

from scipy.stats import entropy
# discrete probabilities
p = [2/3, 1/3]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

from scipy.stats import entropy

# discrete probabilities (note: these sum to 1.03; scipy normalizes p to sum to 1)
p = [0.66, 0.37]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.5
p_negative = 0.5
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.2
p_negative = 0.8
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.7
p_negative = 0.3
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################
## MASKING ##
#######################

data = np.arange(15).reshape(5,3)
data
OUT: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11],
            [12, 13, 14]])

alpha = np.array(['A','B','C','A','C'])
alpha == 'A'

OUT: array([ True, False, False, True, False])

data[alpha=='A']
OUT: array([[ 0, 1, 2],
[ 9, 10, 11]])

data[alpha == 'A'] = 100
data
OUT: array([[100, 100, 100],
            [  3,   4,   5],
            [  6,   7,   8],
            [100, 100, 100],
            [ 12,  13,  14]])
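# Masks can also come from conditions on the data itself (a quick sketch):

import numpy as np
d = np.arange(15).reshape(5, 3)
print(d[d % 2 == 0])  # 1-D array of the even elements
d[d > 10] = 0         # boolean mask from an element-wise comparison
print(d)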

# https://numpy.org/doc/stable/reference/ufuncs.html
