Unit 3 data science
#Introduction to Python:
*Python is a popular programming language. It was created by Guido van Rossum and released in 1991 at CWI (Centrum Wiskunde & Informatica) in the Netherlands.
*Python is a general-purpose, high-level programming language.
*Python is dynamically typed: a variable's type is decided at run time by the value assigned to it, so types are not declared in advance.
#Why Python:
1. Simple & easy to learn
2. Platform independent
3. Free & open source
4. Interpreted (source code is compiled to byte code, which the interpreter then runs)
5. Embeddable & extensible
6. Portable & robust
7. Rich library support
Uses: web frameworks and applications, GUI-based desktop apps, machine learning (ML), data science.
"Hello world" in Python:
print("hello world")
Output:
hello world
#Input from user:
1. name = input("Enter your name: ")
print("Hello, " + name)
print(type(name))
Output:
Enter your name: Aman
Hello, Aman
<class 'str'>
2. num = input("Enter a number: ")
print(num)
print(type(num))
num = num + 2
print(num)
Output:
Enter a number: 10
10
<class 'str'>
TypeError: can only concatenate str (not "int") to str
input() always returns a string, so adding the integer 2 to it fails.
#Type casting:
Convert the input string to an integer with int() before doing arithmetic.
num = int(input("Enter a number: "))
print(num)
print(type(num))
num = num + 2
print(num)
Output:
Enter a number: 10
10
<class 'int'>
12
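A minimal sketch of the other common casts, assuming fixed strings in place of input() so it runs without typing anything (the variable names are illustrative). int() cannot parse a decimal string such as "10.5" directly and raises ValueError, so the example goes through float() first.

```python
# Fixed strings stand in for values returned by input()
text_int = "10"
text_float = "10.5"

num = int(text_int)          # "10"   -> 10
price = float(text_float)    # "10.5" -> 10.5
label = str(num + 2)         # 12     -> "12"

print(num + 2, type(num))
print(price * 2, type(price))
print(label, type(label))

# int() rejects a decimal string, so convert through float() first
try:
    int(text_float)
except ValueError as err:
    print("ValueError:", err)

whole = int(float(text_float))   # int() truncates toward zero -> 10
print(whole)
```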
#Variables in Python:
Variables are used to store and manage data.
A variable name must start with a letter or the underscore character; it cannot start with a number.
A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _).
Variable names are case-sensitive (age, Age and AGE are three different variables).
A variable name cannot be a Python keyword.
Ex: var1 = 5
var_1 = 5
_var1 = 5
1var, var 1 → invalid
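The naming rules above can be checked with the standard library: str.isidentifier() tests the letter/digit/underscore rules and keyword.iskeyword() catches reserved words. The helper is_valid_variable_name is a hypothetical name for this sketch. Note that isidentifier() also accepts non-ASCII letters such as é, which Python 3 allows in names.

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Hypothetical helper: True if name follows Python's variable naming rules."""
    # isidentifier() checks the start-character and allowed-character rules;
    # keywords such as 'for' pass that check but are reserved.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["var1", "Var_1", "_var1", "1var", "var 1", "for"]:
    print(candidate, "->", is_valid_variable_name(candidate))
```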
#Data types in Python:
1. Numeric types: int, float, complex
2. Text type: str
3. Boolean type: bool
4. Sequence types: list, tuple, range
5. Mapping type: dict
6. Set type: set
7. Binary types: bytes, bytearray
8. None type: None
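A quick way to see these categories is to ask type() about one sample value from each. The sample values below are arbitrary choices for illustration.

```python
# One example value for each built-in category listed above
examples = [
    10, 3.14, 2 + 3j,           # numeric
    "hello",                    # text
    True,                       # boolean
    [1, 2], (1, 2), range(3),   # sequence
    {"a": 1},                   # mapping
    {1, 2},                     # set
    b"hi", bytearray(b"hi"),    # binary
    None,                       # none
]

for value in examples:
    print(repr(value), "->", type(value).__name__)
```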
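#Mutable and immutable data types in Python:
1. Mutable data types: list, set, dict
Can be read and modified after creation.
2. Immutable data types: numbers, strings, tuples
Can only be read; they cannot be changed after creation (a "change" creates a new object).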
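A small sketch of the difference: a list can be changed in place, while assigning into a tuple or a string raises TypeError. The variable names are illustrative.

```python
# Mutable: a list can be changed in place
marks = [10, 20, 30]
marks[0] = 15
marks.append(40)
print(marks)              # [15, 20, 30, 40]

# Immutable: assigning into a tuple or a string raises TypeError
point = (1, 2)
try:
    point[0] = 5
except TypeError as err:
    print("tuple:", err)

name = "aman"
try:
    name[0] = "A"
except TypeError as err:
    print("str:", err)

# "Changing" a string builds a new object instead
new_name = name.capitalize()
print(name, new_name)     # aman Aman
```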
#NumPy (Numerical Python) is a powerful Python library widely used for scientific computing, data analysis, and numerical computing tasks.
.Install it with: pip install numpy
.In NumPy, the array (ndarray) is the fundamental object.
.Arrays store homogeneous data elements (all of the same type) in a contiguous block of memory.
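Because every element of an array shares one type, NumPy converts mixed inputs to a common dtype. A small sketch; the exact integer dtype (int32 or int64) depends on the platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])      # one float forces every element to float
print(ints.dtype, mixed.dtype)     # e.g. int64 float64
print(mixed)

# A Python list can hold mixed types; an ndarray converts them to one type
texts = np.array([1, "two", 3.0])  # every element becomes a string
print(texts.dtype.kind)            # 'U' means Unicode string
```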
#How to create a NumPy array
1. import numpy as np
list1 = [10, 20, 30, 40, 50]
array1 = np.array(list1)
print(array1)
print(type(array1))
Output:
[10 20 30 40 50]
<class 'numpy.ndarray'>
2. import numpy as np
list1 = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
array1 = np.array(list1)
print(array1)
Output:
[[10 20 30]
 [40 50 60]
 [70 80 90]]
3. import numpy as np
array1 = np.arange(1, 8)   # from 1 up to, but not including, 8
print(array1)
Output:
[1 2 3 4 5 6 7]
4. import numpy as np
array1 = np.arange(11, 17).reshape((3, 2))   # 6 values arranged as 3 rows x 2 columns
print(array1)
Output:
[[11 12]
 [13 14]
 [15 16]]
5. import numpy as np
array1 = np.zeros(4)
print(array1)
Output: [0. 0. 0. 0.]
6. import numpy as np
array1 = np.ones(4)
print(array1)
Output: [1. 1. 1. 1.]
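np.zeros() and np.ones() also accept a shape tuple for multi-dimensional arrays, and a dtype argument when integers are wanted instead of the default floats. The names grid and ones_int are illustrative.

```python
import numpy as np

grid = np.zeros((2, 3))                 # 2 rows x 3 columns of 0.0
ones_int = np.ones((2, 2), dtype=int)   # integers instead of floats
print(grid)
print(ones_int)
print(grid.shape, grid.dtype, ones_int.dtype)
```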
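#Attributes of a NumPy array
1. ndim: number of dimensions (axes)
2. shape: size of each dimension, as a tuple
3. size: total number of elements
4. dtype: data type of the elements
5. itemsize: size of one element in bytes
Ex: import numpy as np
list1 = [10, 20, 30, 40, 50]
array1 = np.array(list1)
print(array1.ndim)
print(array1.shape)
print(array1.size)
print(array1.dtype)
print(array1.itemsize)
Output:
1
(5,)
5
int32
4
(The default integer dtype depends on the platform and NumPy version: int32 with itemsize 4 as shown here, or int64 with itemsize 8 on most Linux and macOS systems.)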
2. import numpy as np
list1 = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
array1 = np.array(list1)
print(array1.ndim)
print(array1.shape)
print(array1.size)
print(array1.dtype)
print(array1.itemsize)
Output:
2
(3, 3)
9
int32
4
3. import numpy as np
array3 = np.array([[[1, 2, 3], [4, 5, 6]],
                   [[7, 8, 9], [10, 11, 12]]])
print(array3)
print(array3.ndim)
print(array3.shape)
print(array3.size)
print(array3.dtype)
print(array3.itemsize)
Output:
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
3
(2, 2, 3)
12
int32
4
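The attributes are related: size is the product of the entries in shape, and the total memory used by the elements (the nbytes attribute) equals size * itemsize. A short check on the 3-D array above:

```python
import numpy as np

array3 = np.array([[[1, 2, 3], [4, 5, 6]],
                   [[7, 8, 9], [10, 11, 12]]])

# size is the product of the shape entries: 2 * 2 * 3
expected_size = 1
for dim in array3.shape:
    expected_size *= dim

print(array3.ndim, array3.shape, array3.size, expected_size)
# nbytes: total bytes used by the elements
print(array3.nbytes, array3.size * array3.itemsize)
```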
#Indexing in a NumPy array
1. import numpy as np
array1 = np.array([10, 20, 30, 40, 50])
print(array1[0])    # first element
print(array1[-1])   # last element
Output:
10
50
2. array1 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(array1[1, 2])   # row 1, column 2
print(array1[0, :])   # all of row 0
print(array1[:, 1])   # all of column 1
Output:
60
[10 20 30]
[20 50 80]
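A few more indexing cases on the same 2-D array: negative indices count from the end, chained indexing [r][c] reaches the same element as [r, c], and an index past the end raises IndexError.

```python
import numpy as np

array1 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

print(array1[-1, -1])   # last row, last column -> 90
print(array1[2][0])     # row 2, then element 0 -> 70
print(array1[-1])       # a single index selects a whole row -> [70 80 90]

# Indexing past the end raises IndexError
try:
    array1[3, 0]
except IndexError as err:
    print("IndexError:", err)
```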
#Slicing in a NumPy array
Slicing extracts a subset of data from a NumPy array using [start:stop:step]; start is included and stop is excluded.
import numpy as np
array1 = np.array([10, 20, 30, 40, 50, 60, 70])
print(array1[1:3])
print(array1[1:6:2])
print(array1[-1:-3:-1])   # [start:stop:step]
print(array1[::2])
print(array1[::-1])
Output:
[20 30]
[20 40 60]
[70 60]
[10 30 50 70]
[70 60 50 40 30 20 10]
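import numpy as np
array1 = np.array([[15, 16, 17], [25, 26, 27], [35, 36, 37], [45, 46, 47]])
print(array1[1, :])      # row 1
print(array1[:, 1])      # column 1
print(array1[1:3, 1:3])  # rows 1-2, columns 1-2
print(array1[1:3, :])    # rows 1-2, all columns
print(array1[:, 1:3])    # all rows, columns 1-2
print(array1[1:3, 1])    # rows 1-2 of column 1
print(array1[1:3, :1])   # rows 1-2, column 0 only
print(array1[1:3, 1:])   # rows 1-2, columns 1 to the end
Visualization (row and column indices):
      0   1   2
0    15  16  17
1    25  26  27
2    35  36  37
3    45  46  47
Output:
[25 26 27]
[16 26 36 46]
[[26 27]
 [36 37]]
[[25 26 27]
 [35 36 37]]
[[16 17]
 [26 27]
 [36 37]
 [46 47]]
[26 36]
[[25]
 [35]]
[[26 27]
 [36 37]]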
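One behaviour worth knowing: a basic slice of a NumPy array is a view that shares memory with the original (unlike slicing a Python list), so writing into the slice changes the original. Call .copy() when an independent array is needed. The names block and safe are illustrative.

```python
import numpy as np

array1 = np.array([[15, 16, 17], [25, 26, 27], [35, 36, 37], [45, 46, 47]])

block = array1[1:3, 1:3]        # view of rows 1-2, columns 1-2
block[0, 0] = 0                 # also changes array1[1, 1]
print(array1[1, 1])             # 0

safe = array1[1:3, 1:3].copy()  # independent copy
safe[0, 0] = 99
print(array1[1, 1])             # still 0
```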
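#Arithmetic operations
Operators: addition (+), subtraction (-), multiplication (*), division (/), floor division (//), power (**), remainder (%). They work element-wise; @ performs matrix multiplication.
import numpy as np
x = np.array([[1, 2],
              [3, 4]])
y = np.array([[11, 12],
              [13, 14]])
print(x - y)          # element-wise subtraction
print(x % y)          # remainder
print(x // y)         # floor division (fractional part removed)
print(x @ y)          # matrix multiplication
print(x.transpose())
Output:
[[-10 -10]
 [-10 -10]]
[[1 2]
 [3 4]]
[[0 0]
 [0 0]]
[[37 40]
 [85 92]]
[[1 3]
 [2 4]]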
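The most common mix-up is * versus @: * multiplies matching positions, while @ takes row-by-column dot products. A sketch with the same x and y, including the top-left entry of x @ y worked by hand.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[11, 12],
              [13, 14]])

elementwise = x * y    # multiplies matching positions
matrix = x @ y         # row-by-column dot products

print(elementwise)
print(matrix)

# Top-left entry of x @ y by hand: 1*11 + 2*13
print(x[0, 0] * y[0, 0] + x[0, 1] * y[1, 0])   # 37
```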
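#Sorting
1. np.sort(): returns a sorted copy of an array.
2. np.argsort(): returns the indices that would sort an array.
3. ndarray.sort(): sorts the array in place (called on the array itself).
import numpy as np
x = np.array([[12, 11, 15],
              [21, 25, 20],
              [18, 27, 16]])
y = np.sort(x, axis=0)      # column-wise sort
print(y)
y = np.argsort(x, axis=1)   # row-wise: indices that would sort each row
print(y)
In place:
x.sort()                    # default axis=-1 sorts each row
print(x)
Output:
[[12 11 15]
 [18 25 16]
 [21 27 20]]
[[1 0 2]
 [2 0 1]
 [2 0 1]]
[[11 12 15]
 [20 21 25]
 [16 18 27]]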
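Two related cases: axis=None flattens the array before sorting, and since np.sort() has no descending option, a common approach is to sort and then reverse with slicing.

```python
import numpy as np

x = np.array([[12, 11, 15],
              [21, 25, 20],
              [18, 27, 16]])

flat_sorted = np.sort(x, axis=None)        # flatten, then sort all 9 values
descending = np.sort(x, axis=1)[:, ::-1]   # sort each row, then reverse its columns

print(flat_sorted)
print(descending)
```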
#Sorting a 1-D array
import numpy as np
x = np.array([7, 2, 3, 9, 6])
1. y = np.sort(x)
print(y)
2. y = np.argsort(x)
print(y)
3. x.sort()
print(x)
4. y = np.mean(x)
print(y)
Output:
[2 3 6 7 9]
[1 2 4 0 3]
[2 3 6 7 9]
5.4
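The indices from np.argsort() can be used to index the array itself: x[order] gives the same values as np.sort(x), and reversing order gives largest-first. Unlike x.sort(), neither call modifies x.

```python
import numpy as np

x = np.array([7, 2, 3, 9, 6])

order = np.argsort(x)          # [1 2 4 0 3]
print(x[order])                # [2 3 6 7 9], same as np.sort(x)
print(x)                       # [7 2 3 9 6], x is unchanged
print(x[order[::-1]])          # [9 7 6 3 2], largest first
```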
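#Statistical operations
1. max() 2. min() 3. sum() 4. mean() 5. median() 6. prod()
7. var() 8. std()
All of these are NumPy functions (np.max(x), np.mean(x), ...). All except median() can also be called as array methods (x.max(), x.mean()); the median is computed with np.median(x).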
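A sketch of these operations on the 1-D array from the sorting example, plus the axis argument on a small 2-D array (m is an illustrative name). By default var() and std() compute the population variance and standard deviation (ddof=0).

```python
import numpy as np

x = np.array([7, 2, 3, 9, 6])

print(x.max(), x.min(), x.sum())   # 9 2 27
print(x.mean(), np.median(x))      # 5.4 6.0
print(x.prod())                    # 2268
print(x.var(), x.std())            # population variance and standard deviation

# On a 2-D array, axis=0 works down columns and axis=1 across rows
m = np.array([[1, 2], [3, 4]])
print(m.sum(axis=0), m.sum(axis=1))   # [4 6] [3 7]
```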
#Pandas stores tabular data using a DataFrame.
A DataFrame is a two-dimensional labelled data structure, like a table in a database.
Every DataFrame contains rows and columns, and therefore has both a row index and a column index.
Each column can hold a different type of values.
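A DataFrame can also be built from a dict that maps column names to lists, and df.dtypes shows that each column keeps its own type. The height_m column is hypothetical, added only to show a float column; string columns appear as object (or str in newer pandas versions).

```python
import pandas as pd

# Each key is a column name, each list holds that column's values
df = pd.DataFrame({
    "name": ["varun", "ravi", "Preeti"],
    "age": [30, 31, 29],
    "height_m": [1.75, 1.80, 1.62],   # hypothetical column for illustration
})
print(df)
print(df.dtypes)   # one dtype per column
```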
1. import pandas as pd
st_data = [(1, "varun", 30, "male", "chandigarh"),
           (2, "ravi", 31, "male", "delhi"),
           (3, "Preeti", 29, "female", "Jaipur"),
           (4, "amrit", 32, "male", "Mumbai"),
           (5, "pinki", 28, "female", "banglore")]
df = pd.DataFrame(st_data, columns=['stu_id', 'name', 'age', 'gender', 'address'])
df   # in Jupyter a bare df displays the table; use print(df) in a script
2. import pandas as pd
df = pd.read_csv("student.csv")   # student.csv must be in the working directory
df
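pd.read_csv() accepts any file-like object as well as a path, so the sketch below uses io.StringIO with hypothetical file contents to stand in for student.csv. In practice pd.read_csv("student.csv") reads the file from disk.

```python
import io
import pandas as pd

# Hypothetical contents of student.csv
csv_text = """stu_id,name,age,gender,address
1,varun,30,male,chandigarh
2,ravi,31,male,delhi
3,Preeti,29,female,Jaipur
"""

df = pd.read_csv(io.StringIO(csv_text))   # StringIO behaves like an open file
print(df)
print(df.shape)   # (3, 5)
```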
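#df.head()      # first five rows
#df.tail()      # last five rows
#df.shape       # (number of rows, number of columns)
#df.size        # total number of cells (rows x columns)
#df.columns     # names of the columns
#df[['age', 'address']]   # only the listed columns
#df.dtypes      # data type of each column
#df.values      # the data as a NumPy array
#df.index       # row labels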
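A sketch of these attributes on the student DataFrame built earlier. head() and tail() also take a row count; with the default index, the row labels are 0 to 4.

```python
import pandas as pd

st_data = [(1, "varun", 30, "male", "chandigarh"),
           (2, "ravi", 31, "male", "delhi"),
           (3, "Preeti", 29, "female", "Jaipur"),
           (4, "amrit", 32, "male", "Mumbai"),
           (5, "pinki", 28, "female", "banglore")]
df = pd.DataFrame(st_data, columns=["stu_id", "name", "age", "gender", "address"])

print(df.head(2))          # first 2 rows (default is 5)
print(df.tail(1))          # last row
print(df.shape)            # (5, 5)
print(df.size)             # 25 = 5 rows x 5 columns
print(list(df.columns))
print(df.index)
```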
#Selecting rows and columns
#Selecting a single column
df['age']
#Selecting multiple columns
df[['name', 'address']]
#Selecting a single row by index label
df.loc[0]
#Selecting multiple rows by index label
df.loc[[0, 2, 4]]
#Selecting a single row by integer position
df.iloc[0]
#Selecting multiple rows by integer position
df.iloc[[0, 2]]
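With the default index 0, 1, 2, ... the label and the position of a row coincide, so loc and iloc look interchangeable. The sketch below uses hypothetical roll-number labels to show the difference: loc looks up labels, iloc counts positions from 0.

```python
import pandas as pd

# Hypothetical labels 101-103 instead of the default 0, 1, 2
df = pd.DataFrame({"name": ["varun", "ravi", "Preeti"], "age": [30, 31, 29]},
                  index=[101, 102, 103])

print(df.loc[102])                  # row labelled 102 -> ravi
print(df.iloc[0])                   # first row by position -> varun
print(df.loc[[101, 103], "name"])   # labels, column by name
print(df.iloc[[0, 2], 0])           # positions, column by position

try:
    df.loc[0]                       # no row is labelled 0
except KeyError:
    print("KeyError: label 0 not found")
```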
#Filtering rows (students older than 29)
df[df['age'] > 29]
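Conditions can be combined with & (and) and | (or); each condition needs its own parentheses, and Python's and/or keywords do not work on whole columns. A sketch on a trimmed copy of the student data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["varun", "ravi", "Preeti", "amrit", "pinki"],
    "age": [30, 31, 29, 32, 28],
    "gender": ["male", "male", "female", "male", "female"],
})

# & keeps rows where both conditions hold; | keeps rows where either holds
older_males = df[(df["age"] > 29) & (df["gender"] == "male")]
young_or_female = df[(df["age"] < 29) | (df["gender"] == "female")]

print(older_males)
print(young_or_female)
```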
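#Adding a new column to a DataFrame
Option 1 (adds the column at the end):
df['phone_no'] = [10, 20, 30, 40, 50]
Option 2 (inserts the column at position 3; use only one option, because insert() raises a ValueError if the column already exists):
df.insert(3, 'phone_no', [10, 20, 30, 40, 50])
print(df)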
#Deleting a column from a DataFrame
df = df.drop(columns=['phone_no'])
df
#Renaming a column: rename "old_name" to "new_name"
#df = df.rename(columns={'old_name': 'new_name'})
df = df.rename(columns={'age': 'student_age'})
df
#Deleting a column with del (an alternative to drop(); raises KeyError if the column no longer exists)
del df['phone_no']
#Deleting a row from a DataFrame (by index label)
df = df.drop(4)
df
#Adding a new row to an existing DataFrame (assigning to a new label with loc appends a row)
df.loc[4] = [5, 'pinki', 28, 'female', 'banglore']
#Updating a value
df.loc[2, 'student_age'] = 71
df
#Updating multiple values
df.loc[[0, 2], 'address'] = ['andaman', 'nicobar']
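The row operations above can be run end to end as one sketch, starting from the student data with the age column renamed. The row removed by drop(4) is added back under the same label, so the frame ends with five rows.

```python
import pandas as pd

st_data = [(1, "varun", 30, "male", "chandigarh"),
           (2, "ravi", 31, "male", "delhi"),
           (3, "Preeti", 29, "female", "Jaipur"),
           (4, "amrit", 32, "male", "Mumbai"),
           (5, "pinki", 28, "female", "banglore")]
df = pd.DataFrame(st_data, columns=["stu_id", "name", "age", "gender", "address"])
df = df.rename(columns={"age": "student_age"})

df = df.drop(4)                                      # remove the row labelled 4
df.loc[4] = [5, "pinki", 28, "female", "banglore"]   # add it back with label 4
df.loc[2, "student_age"] = 71                        # update one cell
df.loc[[0, 2], "address"] = ["andaman", "nicobar"]   # update two cells in one column

print(df)
```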