0% found this document useful (0 votes)
16 views

Introduction To Data Science

Uploaded by

Sanam Durgarani
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views

Introduction To Data Science

Uploaded by

Sanam Durgarani
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Welcome to Course on Data Science

By Swathi Boosala(Data Scientist)


Agenda
1. Introduction to Datascience and its scope
2. Skill set used by data scientist
3. Introduction to python - algorithms
4. Operators
5. Control statements
6. Data structures in python – Lists
7. Tuples
8. Sets
9. Dictionaries
10. Numpy arrays
11. Pandas
12. Exception handling
13. Introduction sql –
14. Usage of window functions in sql
15. Probaility and hypothesis testing
16. Matplot library
17. Linear regression
18. Naïve Bayes theorem
19. Logistic regression
20. Assignments
What is Python?
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level
built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language to connect existing components together.
Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form without charge for all major
platforms, and can be freely distributed.

What can Python do?


•Python can be used on a server to create web applications.
•Python can be used alongside software to create workflows.
•Python can connect to database systems. It can also read and modify files.
•Python can be used to handle big data and perform complex mathematics.
•Python can be used for rapid prototyping, or for production-ready software development.
Features in Python
1. Free and Open Source
2. Easy to code
3. Easy to Read
4. Object-Oriented Language
5. GUI Programming Support
6. High-Level Language
7. Large Community Support
8. Easy to Debug
9. Python is a Portable language
10.Python is an Integrated language
11.Interpreted Language:
12.Large Standard Library
13.Dynamically Typed Language
14.Frontend and backend development
15.Allocating Memory Dynamically
The word Algorithm means ” A set of finite rules or instructions to be followed in calculations or other problem-
solving operations ”
Types of Algorithms:
•Sorting algorithms: Bubble Sort, insertion sort, and
many more. These algorithms are used to sort the data in
a particular format.
•Searching algorithms: Linear search, binary search, etc.
These algorithms are used in finding a value or record
that the user demands.
•Graph Algorithms: It is used to find solutions to
problems like finding the shortest path between cities,
and real-life problems like traveling salesman problems.
Example: algorithm to multiply 2 numbers and print the
result:

Step1:Start
Step 2: Get the knowledge of input. Here we need 3
variables; a and b will be the user input and c will hold
the result.
Step 3: Declare a, b, c variables.
Step 4: Take input for a and b variable from the user.
Step 5: Know the problem and find the solution using
operators, data structures and logic
We need to multiply a and b variables so we use *
operator and assign the result to c.
That is c <- a * b
Step 6: Check how to give output, Here we need to print
the output. So write print c
Step 7: End
Example 1: Write an algorithm to find the maximum of all the elements present in the array.
Follow the algorithm approach as below:

Step 1: Start the Program


Step 2: Declare a variable max with the value of the first element of the array.
Step 3: Compare max with other elements using loop.
Step 4: If max < array element value, change max to new max.
Step 5: If no element is left, return or print max otherwise goto step 3.
Step 6: End of Solution
Example 2: Write an algorithm to find the average of 3
subjects.
Follow the algorithm approach as below:

Step 1: Start the Program


Step 2: Declare and Read 3 Subject, let’s say S1, S2, S3
Step 3: Calculate the sum of all the 3 Subject values and store
result in Sum variable (Sum = S1+S2+S3)
Step 4: Divide Sum by 3 and assign it to Average
variable. (Average = Sum/3)
Step 5: Print the value of Average of 3 Subjects
Step 6: End of Solution
Flowchart
https://www.lucidchart.com/pages/what-is-a-flowchart-tutorial
Python divides the operators in the following groups:
Arithmetic operators
Assignment operators
Comparison operators
Logical operators
Identity operators
Membership operators
Bitwise operators
Python Assignment Operators
Python Comparison Operators
Python Logical Operators

Python Identity Operators


Python Membership Operators

Python Bitwise Operators


Operator Precedence
Lists are used to store multiple items in a single variable.
thislist = ["apple", "banana", "cherry", "apple", "cherry"]
print(len(thislist))
List items can be of any data type:
list1 = ["apple", "banana", "cherry"]
list2 = [1, 5, 7, 9, 3]
list3 = [True, False, False]
A list can contain different data types:
list1 = ["abc", 34, True, 40, "male"]
print(type(mylist))

thislist = list(("apple", "banana", "cherry")) # note the double round-


brackets
print(thislist)
print(thislist[1])
print(thislist[-1])
Return the third, fourth, and fifth item:
print(thislist[2:5])
print(thislist[:4])
print(thislist[2:])
print(thislist[-4:-1])
thislist = ["apple", "banana", "cherry"]
if "apple" in thislist:
print("Yes, 'apple' is in the fruits list")
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
print(thislist)
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]
thislist[1:3] = ["blackcurrant", "watermelon"]
print(thislist)
thislist = ["apple", "banana", "cherry"]
thislist.insert(2, "watermelon")
print(thislist)
thislist = ["apple", "banana", "cherry"]
thislist.append("orange")
print(thislist)
thislist = ["apple", "banana", "cherry"]
tropical = ["mango", "pineapple", "papaya"]
thislist.extend(tropical)
print(thislist)
thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")
print(thislist)
thislist = ["apple", "banana", "cherry"]
thislist.pop(1)
print(thislist)
thislist = ["apple", "banana", "cherry"]
thislist.pop()
print(thislist)
thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)
thislist = ["apple", "banana", "cherry"]
del thislist
thislist = ["apple", "banana", "cherry"]
thislist.clear()
print(thislist)
thislist = ["apple", "banana", "cherry"]
for x in thislist:
print(x)
thislist = ["apple", "banana", "cherry"]
for i in range(len(thislist)):
print(thislist[i])
thislist = ["apple", "banana", "cherry"]
i=0
while i < len(thislist):
print(thislist[i])
i=i+1
thislist = ["apple", "banana", "cherry"]
[print(x) for x in thislist]

List Comprehension
newlist = [expression for item in iterable if condition == True]
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []

for x in fruits:
if "a" in x:
newlist.append(x)

print(newlist)

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

newlist = [x for x in fruits if "a" in x]

print(newlist)
thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]
thislist.sort()
print(thislist)

thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]


thislist.sort(reverse = True)
print(thislist)
Python Tuples
thistuple = ("apple", "banana", "cherry")
thistuple = ("apple", "banana", "cherry", "apple", "cherry")
print(len(thistuple))
tuple1 = ("abc", 34, True, 40, "male")
mytuple = ("apple", "banana", "cherry")
print(type(mytuple))
thistuple = tuple(("apple", "banana", "cherry")) # note the double round-brackets
print(thistuple)
print(thistuple[1])
print(thistuple[-1])
thistuple = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[2:5])
thistuple = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[:4])
thistuple = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[-4:-1])
thistuple = ("apple", "banana", "cherry")
if "apple" in thistuple:
print("Yes, 'apple' is in the fruits tuple")
Tuples are unchangeable, meaning that you cannot change, add, or remove items once the tuple
is created.
thistuple = ("apple", "banana", "cherry")
for x in thistuple:
print(x)
tuple1 = ("a", "b" , "c")
tuple2 = (1, 2, 3)
tuple3 = tuple1 + tuple2
print(tuple3)
fruits = ("apple", "banana", "cherry")
mytuple = fruits * 2
print(mytuple)
Sets
myset = {"apple", "banana", "cherry"}
A set is a collection which is unordered, unchangeable*, and unindexed.
set1 = {"abc", 34, True, 40, "male"}
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")
print(thisset)
thisset = {"apple", "banana", "cherry"}
tropical = {"pineapple", "mango", "papaya"}
thisset.update(tropical)
print(thisset)
thisset = {"apple", "banana", "cherry"}
thisset.remove("banana")
print(thisset)
thisset = {"apple", "banana", "cherry"}
x = thisset.pop()
print(x)
print(thisset)
set1 = {"a", "b" , "c"}
set2 = {1, 2, 3}

set3 = set1.union(set2)
print(set3)
set1 = {"a", "b" , "c"}
set2 = {1, 2, 3}

set1.update(set2)
print(set1)
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

x.intersection_update(y)

print(x)
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

z = x.intersection(y)

print(z)
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

x.symmetric_difference_update(y)

print(x)
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

z = x.symmetric_difference(y)

print(z)
Python Dictionaries
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}

print(len(thisdict))
String, int, boolean, and list data types:
thisdict = {
"brand": "Ford",
"electric": False,
"year": 1964,
"colors": ["red", "white", "blue"]
}
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(type(thisdict))
Accessing Items
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
x = thisdict["model"]

x = thisdict.keys()
x = thisdict.values()
x = thisdict.items()
Check if "model" is present in the dictionary:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
if "model" in thisdict:
print("Yes, 'model' is one of the keys in the thisdict dictionary")
Change the "year" to 2018:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["year"] = 2018
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
Adding items
thisdict.update({"year": 2020})
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["color"] = "red"
print(thisdict)
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.update({"color": "red"})
The pop() method removes the item with the specified key name:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.pop("model")
print(thisdict)
The popitem() method removes the last inserted item (in versions before 3.7, a random item is
removed instead):
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.popitem()
print(thisdict)
The del keyword removes the item with the specified key name:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict["model"]
print(thisdict)
The del keyword can also delete the dictionary completely:
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict
print(thisdict) #this will cause an error because "thisdict" no longer exists.
Print all key names in the dictionary, one by one:
for x in thisdict:
print(x)
for x in thisdict.values():
print(x)
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = thisdict.copy()
print(mydict)
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = dict(thisdict)
print(mydict)
Nested Dictionaries
Create a dictionary that contain three dictionaries:
myfamily = {
"child1" : {
"name" : "Emil",
"year" : 2004
},
"child2" : {
"name" : "Tobias",
"year" : 2007
},
"child3" : {
"name" : "Linus",
"year" : 2011
}
}
child1 = {
"name" : "Emil",
"year" : 2004
}
child2 = {
"name" : "Tobias",
"year" : 2007
}
child3 = {
"name" : "Linus",
"year" : 2011
}

myfamily = {
"child1" : child1,
"child2" : child2,
"child3" : child3
}
A lambda function is a small anonymous function.
A lambda function can take any number of arguments, but can only have one expression.

Add 10 to argument a, and return the result:


x = lambda a : a + 10
print(x(5))
x = lambda a, b : a * b
print(x(5, 6))
Summarize argument a, b, and c and return the result:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

The power of lambda is better shown when you use them as an anonymous function inside
another function.
def myfunc(n):
return lambda a : a * n

mydoubler = myfunc(2)

print(mydoubler(11))
Python Arrays
cars = ["Ford", "Volvo", "BMW"]
x = cars[0]
x = len(cars)
for x in cars:
print(x)
cars.append("Honda")
Delete the second element of the cars array:
cars.pop(1)
Delete the element that has the value "Volvo":
cars.remove("Volvo")
NumPy Creating Arrays

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
print(np.__version__)
print(type(arr))
Create a 0-D array with value 42
arr = np.array(42)
Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
print(arr[1])
print(arr[2] + arr[3])
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('5th element on 2nd row: ', arr[1, 4])
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element from 2nd dim: ', arr[1, -1])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])
From the second element, slice elements from index 1 to index 4 (not included):
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
From both elements, return index 2:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])
From both elements, slice index 1 to index 4 (not included), this will return a 2-D array
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
print(pd.__version__)
A Pandas Series is like a column in a table.
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
print(df.loc[0])
Pandas use loc attribute to return one or more rows
# importing pandas package
import pandas as pd
# making data frame from csv file
df = pd.read_csv("nba.csv")
# printing the first 10 rows of the dataframe
df[:10]
# Applying aggregation across all the columns
# sum and min will be found for each
# numeric type column in df dataframe
df.aggregate(['sum', 'min'])
# importing pandas package
import pandas as pd
# making data frame from csv file
df = pd.read_csv("nba.csv")
# We are going to find aggregation for these columns
df.aggregate({"Number":['sum', 'min'],
"Age":['max', 'min'],
"Weight":['min', 'sum'],
"Salary":['sum']})
# importing pandas as pd
import pandas as pd
# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio'])
# Create the Index
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']
# set the index
sr.index = index_
# Print the series
print(sr)
# change 'Rio' to 'Montreal'
# we have used a lambda function
result = sr.apply(lambda x : 'Montreal' if x =='Rio' else x )
# Print the result
print(result)

You might also like