
#NOTES ON DATA SCIENCE PART 2

#Chapter 1

'''
Use of data mining
Data mining
Data science, engineering, and data-driven decision making
Target example
Automated decision making
Big data
From Big Data 1.0 to Big Data 2.0
Data as an asset
Credit cards
Success stories
Data-analytic thinking
Data mining and data science
Data science vs. data scientist
'''

#Chapter 2
#A set of canonical data mining tasks; The data mining process; Supervised vs Unsupervised data mining
'''
Classification and class probability estimation
Regression
Similarity matching
Clustering
Co-occurrence grouping (frequent itemset mining, association rule discovery, market-basket analysis)
Profiling
Data reduction
Causal modeling
Supervised vs. unsupervised methods
The data mining process
CRISP-DM
Business understanding
Data preparation
Leakage
Evaluating the results of data mining
Statistics
Database querying
Data warehousing
Regression analysis
Machine learning and data mining
'''

# Chapter 3: Intro to Predictive Modeling


# Identifying informative attributes,
# Segmenting data by progressive attribute selection
# Finding correlations, Attribute/Variable selection, Tree induction
'''
Models, induction, and prediction
Data induction
Information gain, entropy
Laplace correction
Logistic regression
SVM
Sigmoid curve formula, calculating probability
'''
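# The Laplace correction smooths class-probability estimates at a tree leaf so that
# a leaf with few examples does not report an extreme 0 or 1. For two classes:
# p = (n + 1) / (n + m + 2), where n counts the class of interest and m the other.
# A minimal sketch (the function name is my own):

def laplace_correction(n, m):
    # a leaf with n = 2 positives and m = 0 negatives gives 3/4, not a raw 2/2 = 1.0
    return (n + 1) / (n + m + 2)

print(laplace_correction(2, 0))  # 0.75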

# Chapter 4: Fitting a Model to Data


# Logistic Regression, MSE
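# A minimal sketch of mean squared error with numpy (y_true and y_pred are
# hypothetical arrays of observed and predicted values):

import numpy as np

def mse(y_true, y_pred):
    # MSE = (1/n) * sum((y_true - y_pred)^2), the mean of the squared residuals
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0])))  # ~0.417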

#NUMPY

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(type(arr))
OUTPUT: <class 'numpy.ndarray'>

np.arange(num) # create an array of integers from 0 up to (but not including) num

np.reshape(array, shape) # give an array a new shape without changing its data
np.sqrt # element-wise square root
np.ceil # element-wise round up to the nearest integer
np.floor # element-wise round down to the nearest integer
np.random.randn(val) # create an array of val samples from the standard normal distribution (mean 0, std 1)
np.add(a,b) OR a+b
np.multiply(a,b) OR a*b
np.maximum(c,d) # element-wise maximum of two arrays
np.minimum(c,d) # element-wise minimum of two arrays
np.modf(arr) # returns two arrays: the fractional parts and the integer parts
np.exp(num) # returns e^x
np.array() # make an array
np.dot() # dot product (matrix multiplication for 2-D arrays)
np.ones() # array filled with ones
np.zeros() # array filled with zeros
np.fill_diagonal() # fill the main diagonal of an array in place
np.mean() # arithmetic mean, optionally along an axis
np.amax() # maximum value, optionally along an axis
np.newaxis # not a function: an alias for None, used in indexing to add an axis
np.vstack() # stack arrays vertically (row-wise)
np.eye() # identity matrix
np.ndarray.strides # bytes to step in each dimension when traversing the array
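# A few of the above in action (a quick sketch):

import numpy as np
a = np.arange(6).reshape(2, 3)                    # [[0 1 2], [3 4 5]]
b = np.vstack((a, np.zeros((1, 3), dtype=int)))   # append a row of zeros -> 3x3
print(np.dot(b, np.eye(3, dtype=int)))            # multiplying by the identity returns b
print(np.amax(a, axis=0))                         # column maxima: [3 4 5]
print(a[np.newaxis, :].shape)                     # (1, 2, 3): np.newaxis adds an axis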

.ndim # number of dimensions

.shape # shape of the array (a tuple of dimension sizes)
.dtype # data type of the elements in the array
.transpose() OR .T # swap the array's axes
np.expand_dims(arr, axis) # a function, not a method: insert a new axis at the given position
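# Quick check of these attributes (a sketch):

import numpy as np
m = np.arange(6).reshape(2, 3)
print(m.ndim, m.shape, m.dtype)         # 2 (2, 3) int64 (dtype is platform-dependent)
print(m.T.shape)                        # transpose swaps the axes: (3, 2)
print(np.expand_dims(m, axis=0).shape)  # (1, 2, 3)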

# Write a function that takes f_x as input and returns a numpy array
# containing probabilities of f(x) using logistic regression

def calc_prob(f_x):
    # sigmoid: p = 1 / (1 + e^(-f(x))) maps any real-valued f(x) to a probability in (0, 1)
    return 1 / (1 + np.exp(-f_x))
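# Usage sketch (the scores are hypothetical model outputs f(x)):

scores = np.array([-2.0, 0.0, 2.0])
print(calc_prob(scores))  # approximately [0.119 0.5 0.881]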

# Slicing, fancy indexing


# Question 9: Create a 6x6 array of zeros. Use fancy indexing to add 9 to the last 3 rows.

arr2 = np.zeros((6,6), dtype = int)
arr2[[3, 4, 5]] += 9  # fancy indexing selects rows 3, 4, 5
arr2
# Question 5: Change all the values of the first row in the first layer of arr3d to 900 using slicing

arr3d[:1, :1] = 900  # layer 0, row 0, all columns
arr3d

# Assignment questions and answers


import numpy as np
mylist = [10, 20, 30]
myarray = np.array(mylist)
myarray

#######################

arr3d = np.arange(30, 109, 3).reshape(3,3,3)
arr3d

#######################

print("Number of dimensions:", arr3d.ndim)


print("Shape:", arr3d.shape)
print("Data type:", arr3d.dtype)

#######################

arr3d[:1] = 600
arr3d

#######################

arr3d[:1,:1] = 900
arr3d

#######################

A = np.array([5, 5, 5])
B = np.array([6, 6, 6])
dot_product = np.dot(A, B)
print(dot_product)  # 5*6 + 5*6 + 5*6 = 90

#######################

A = np.array([3.14, 1.41, 2.25])
x, y = np.modf(A)
print(x)  # fractional parts (approximately): [0.14 0.41 0.25]
print(y)  # integer parts: [3. 1. 2.]

#######################

arr1 = np.ones((5,5), dtype = int)
np.fill_diagonal(arr1, 9)
arr1

#######################

arr2 = np.zeros((6,6), dtype = int)
arr2[[3, 4, 5]] += 9  # fancy indexing: add 9 to the last 3 rows
arr2
#######################

arr3 = np.arange(0, 121, 5).reshape(5,5)
maxval = np.amax(arr3, axis = 0)
print(arr3)
print("mean of each row:", np.mean(arr3, axis = 1))
print("max value in each column:", maxval)

#########################
## Entropy Calculation ##
#########################

# calculate the entropy for a dice roll using base 6

from scipy.stats import entropy
# discrete probabilities for a fair six-sided die
p = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
# calculate entropy
e = entropy(p, base=6)
# print the result (base-6 units, not bits; a uniform distribution gives 1.000)
print('entropy: %.3f' % e)
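# What scipy computes, written out by hand: H = -sum(p_i * log_b(p_i)).
# A sketch with numpy, using the change of base log_6(x) = ln(x) / ln(6):

import numpy as np
p = np.array([1/6] * 6)
H = -np.sum(p * np.log(p) / np.log(6))
print(H)  # 1.0: a uniform distribution has maximal entropy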

# calculate the entropy for a fair coin flip

from scipy.stats import entropy
# discrete probabilities
p = [1/2, 1/2]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

# calculate the entropy for 3 classes using base 3

from scipy.stats import entropy
# discrete probabilities
p = [1/3, 1/3, 1/3]
# calculate entropy
e = entropy(p, base=3)
# print the result (base-3 units, not bits)
print('entropy: %.3f' % e)

#######################

# calculate the entropy for an unfair die where even numbers show up twice as often as odd numbers
# if each odd face has probability x and each even face 2x, then 3x + 3(2x) = 9x = 1, so x = 1/9
from scipy.stats import entropy
# discrete probabilities
p = [1/9, 2/9, 1/9, 2/9, 1/9, 2/9]
# calculate entropy
e = entropy(p, base=6)
# print the result (base-6 units, not bits)
print('entropy: %.3f' % e)

#######################

# calculate the entropy for a coin flip where heads comes up 75% of the time and tails 25%
from scipy.stats import entropy
# discrete probabilities
p = [3/4, 1/4]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

# compute entropy for 2 classes having probabilities of 2/3 and 1/3

from scipy.stats import entropy
# discrete probabilities
p = [2/3, 1/3]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

from scipy.stats import entropy

# discrete probabilities (note: these sum to 1.03; scipy normalizes p to sum to 1)
p = [0.66, 0.37]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy: %.3f bits' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.5
p_negative = 0.5
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.2
p_negative = 0.8
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################

from scipy.stats import entropy

# discrete probabilities
p_pos = 0.7
p_negative = 0.3
p = [p_pos, p_negative]
# calculate entropy
e = entropy(p, base=2)
# print the result
print('entropy = %.3f' % e)

#######################
## MASKING ##
#######################

data = np.arange(15).reshape(5,3)
data
OUT: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11],
            [12, 13, 14]])

alpha = np.array(['A','B','C','A','C'])
alpha == 'A'

OUT: array([ True, False, False, True, False])

data[alpha=='A']
OUT: array([[ 0, 1, 2],
[ 9, 10, 11]])

data[alpha == 'A'] = 100
data
OUT: array([[100, 100, 100],
            [  3,   4,   5],
            [  6,   7,   8],
            [100, 100, 100],
            [ 12,  13,  14]])
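# Masks can also come from conditions on the data itself (a quick sketch):

import numpy as np
d = np.arange(15).reshape(5, 3)
print(d[d % 2 == 0])  # 1-D array of the even elements
d[d > 10] = 0         # boolean mask from an element-wise comparison
print(d)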

# https://numpy.org/doc/stable/reference/ufuncs.html
