Study Material IP 2022

Uploaded by

palkisudha274
Copyright
© All Rights Reserved

Unit 0: Revision of Python Basics

Intro:
● Python is a high level (close to English) and interpreted (read and executed line by line)
programming language developed by Guido Van Rossum in the 90s.
● It can be operated via shell (interactive) or script mode.
● Identifier: The name of a variable or function. It can be any combination of letters, digits and
underscore characters, but the first character cannot be a digit and the name cannot be a reserved
keyword. Identifiers in Python are case sensitive.
Valid: abc (all chars), _val1 (underscore, chars and digits), first_name (underscore as a connector)
Invalid: 1abc (starts with a digit), for (it is a reserved keyword), first&name (use of a special character)
● Keywords: Reserved for special use. They can't be used as variable names. Ex. if, and, in, while, else
etc.
● Operators: Python has operators just as regular mathematics does, and most of them are
borrowed from math.
a, b=15,4
✔ Arithmetic: +, -,*,/,//,%,** Ex. print(a+b,a%b,a//b,a*b) O/P 19 3 3 60
✔ Comparison: <,==,>=; Ex. print(a>b, a<b, a==15) O/P True False True
✔ Logical: and, or, not; Ex. print(a>b and b<a-b) O/P True
✔ Membership: in, not in Ex. print (a in [3,41,50]) O/P False

● Data Types:
✔ Number (Immutable)
✔ Integer- 52, -9
✔ Float- 23.7, -0.0003
✔ Boolean- True, False, 2>3, 5%2==1
✔ Collection
❖ String- Ordered and immutable collection of characters, digits and special symbols. Methods:
count(), find(), isupper(), isdigit(), lower(), etc. Ex. s1,s2 = 'अजगर', 'Sita sings the
blues'
❖ List - Ordered, heterogeneous and mutable collection. Methods: count(), insert(), append(),
remove(), pop(), sort() etc. Ex. l1,l2= [1,2,3], [1109,'R Rajkumar','XII','89.25%']
❖ Tuple - Ordered, heterogeneous and immutable collection. Methods: count(), index() etc.
Ex. t1,t2= (1,2,3), (1109,'R Rajkumar','XII','89.25%')
● Common Operations:
○ The * and + operators behave the same way on all three.
Ex. print(s1*3) # O/P: अजगरअजगरअजगर
Ex. print(l1+l1) # O/P: [1, 2, 3, 1, 2, 3]

○ Iteration works exactly the same.

Ex. for i in t1:
        print(i,end=' ') # O/P: 1 2 3

○ Slicing and element access are also the same.

Ex. print(s2[0:10]) # O/P: Sita sings

● Uncommon: t1[2]=4 or s2[3]='e' will result in error(as they are immutable).
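
The difference between mutable and immutable collections can be checked directly. The following is a minimal sketch that reuses the names l1, t1 and s2 from the examples above; the errors list is only an illustrative helper.

```python
# Lists are mutable: item assignment changes the list in place.
l1 = [1, 2, 3]
l1[2] = 4

# Tuples and strings are immutable: item assignment raises TypeError.
t1 = (1, 2, 3)
s2 = 'Sita sings the blues'
errors = []  # illustrative helper that records the error type raised
for obj, pos, value in [(t1, 2, 4), (s2, 3, 'e')]:
    try:
        obj[pos] = value
    except TypeError as exc:
        errors.append(type(exc).__name__)

print(l1)      # [1, 2, 4]
print(errors)  # ['TypeError', 'TypeError']
```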


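✔ Mapping: Dictionary- Heterogeneous collection of key: value pairs, where custom names called keys
are used in place of a numeric index. Since Python 3.7 a dictionary keeps its keys in insertion order.
Methods: get(), keys(), items(), update() etc.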
d1= {'rno':1109,'name':'R Rajkumar','class':'XII','marks':'89.25%'}
FLOW OF CONTROL:

i=0                      # initialisation
while(i<=5):             # repeat the statements inside as long as i<=5
    i=i+1
    if(i==2):            # skip the rest of the body and go back to the loop condition
        continue
    elif(i==4):
        break            # come out of the loop; the loop's else block is skipped
    else:
        print(i,end=',')
else:
    print('came out successfully')   # runs only if the loop ends without break
print('break was applied')           # runs after the loop in every case
output: 1,3,break was applied

for i in range(1,6):     # i will have values from 1 to 5
    if(i==2):
        continue
    elif(i==4):
        break
    else:
        print(i,end=',')
else:
    print('came out successfully')
print('break was applied')
output: 1,3,break was applied
# increment was done with the help of the range function
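
The else block of a loop runs only when the loop finishes without hitting break. The sketch below illustrates this with a hypothetical helper, first_even(), which is not part of the original material.

```python
def first_even(numbers):
    """Return the first even number, or None if the loop ends without break."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break            # leaving via break skips the else block
    else:
        found = None         # runs only when no break happened
    return found

print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3, 5]))     # None
```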

Some Common Functions/properties: use with example


- Inbuilt: len(), type(),id(), print(), input(), int(), float(),eval()
- math: log(),sqrt(),pow() etc
NumPy Library: Provides the array, an ordered, homogeneous collection of numbers (generally).
Import via=> import numpy as np

Comparison Question from list, NumPy, and pandas series (3 Marks)


List vs numpy array vs series
- Size - NumPy arrays take up less space than lists and Series
- Performance - NumPy arrays are faster than lists and Series
- Indexing - Both list and NumPy array have a numeric index (0,1,2…) whereas Series supports a custom
index.
- Vectorised operations - A list does not support them, whereas a NumPy array and a Series do.
Ex. print(np.array([2,4])*2) ⇒ [4 8]

import numpy as np
import pandas as pd
lst=[2,1,1,2]
arr=np.array(lst)
sr=pd.Series(lst)
print(lst+lst)
print(arr+arr)
print(lst*3)
print(sr*3)
O/P:
[2, 1, 1, 2, 2, 1, 1, 2]
[4 2 2 4]
[2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2]
0 6
1 3
2 3
3 6
dtype: int64
Note: Even though lst (list object), arr (array object) and sr (series object) have the same data,
i.e. 2, 1, 1 and 2, the list behaves differently from both the NumPy array and the Series when the + or *
operators are applied. lst*3 repeats the list elements three times, whereas sr*3 multiplies each of the
individual elements 2, 1, 1 and 2 by 3.
Unit 1: Data Handling using Pandas and Data Visualization

1.1 Introduction to Python libraries- Pandas, Matplotlib


(Pandas, Matplotlib. Data structures in Pandas - Series and Data Frames)
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation
module, built on top of the Python programming language. It is most widely used for data
science/data analysis and machine learning tasks. It is built on top of another package named
Numpy, which provides support for multi-dimensional arrays.
Pandas makes it simple to do many of the time consuming, repetitive tasks associated with working
with data, including:
a. Data cleansing
b. Data fill
c. Merges and joins
d. Data visualization
e. Statistical analysis
f. Data inspection
Two important data structures of pandas are Series and DataFrame
● Series: It is a one-dimensional array-like structure with homogeneous data, which by default has
numeric data labels starting from zero.

● DataFrame: It is a two-dimensional table like structure with heterogeneous data having both
rows and columns. Each column can have a different type of value such as numeric, string,
boolean, etc., as in tables of a database.

Basic Features
S. N. | Series | DataFrame
1 | 1-dimensional structure | 2-dimensional structure
2 | Homogeneous data | Heterogeneous data
3 | Data is mutable | Data is mutable
4 | Size is immutable | Size is mutable
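
Some of these features can be observed in code. The following is a minimal sketch with made-up sample data; the column names name, marks, passed and grade are only illustrative.

```python
import pandas as pd

# A Series is 1-dimensional and holds values of a single dtype.
s = pd.Series([10, 20, 30])

# A DataFrame is 2-dimensional and each column can have its own dtype.
df = pd.DataFrame({'name': ['A', 'B'],
                   'marks': [89.5, 76.0],
                   'passed': [True, True]})

s.loc[0] = 99                 # data is mutable in a Series
df['grade'] = ['A1', 'B1']    # a new column changes the size of the DataFrame

print(s.ndim, df.ndim)        # 1 2
print(df.shape)               # (2, 4)
print(df.dtypes)
```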


1.2 Series
(Creation of Series from – ndarray, dictionary, scalar value; mathematical operations; Head and
Tail functions; Selection, Indexing and Slicing)
Series Creation: There are different ways in which a series can be created in Pandas. To create or use
series, we first need to import the Pandas library.
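● Empty Series:
import pandas as pd
s = pd.Series()
print(s)
Output: Series([], dtype: float64)
Note: The dtype shown depends on the pandas version; pandas 2.0 and later report dtype: object for
an empty Series.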
● Using Scalar Value: To create a series using scalar value, indices must be provided. The scalar
value will be repeated to match the length of the index.
import pandas as pd
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Output:
0 5
1 5
2 5
3 5
dtype: int64
● Using Dictionary: Series can be created from a dictionary where each key represents index label
and value represents data.
import pandas as pd
data_dict = {'a' : 0.0, 'b' : 1.3, 'c' : -2.7}
s = pd.Series(data_dict)
print(s)
Output:
a 0.0
b 1.3
c -2.7
dtype: float64
● Using ndarray (NumPy Array): A Series can be created from a NumPy array (ndarray), including the
arrays returned by functions defined in the numpy library/module, i.e., arange(), linspace() etc.
import pandas as pd
import numpy as np
s = pd.Series(np.array([6, -1, 3]))
print(s)
Output:
0 6
1 -1
2 3
dtype: int32
import pandas as pd
import numpy as np
s = pd.Series(np.arange(15, 5, -3))
print(s)
Output:
0 15
1 12
2 9
3 6
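dtype: int32
Note: In the two outputs above, the integer dtype depends on the platform and the NumPy version; on
many systems it is shown as int64.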
import pandas as pd
import numpy as np
s = pd.Series(np.linspace(-2.0, -3.0, num=5))
print(s)
Output:
0 -2.00
1 -2.25
2 -2.50
3 -2.75
4 -3.00
dtype: float64

● Mathematical Operations: A mathematical operation on two Series is performed on each
corresponding pair of elements. The elements are paired by index label, and any label that is
missing from either Series gets NaN in the result by default.
import pandas as pd
seriesA = pd.Series([1, 2, 3, 4, 5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index = ['z', 'y', 'a', 'c', 'e'])
print(seriesA + seriesB)
Output:
a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64
NOTE: NaN is stored as float64, so the result has dtype float64. If data is missing for a particular
index label, a default value can be used in its place by calling the add(), sub(), mul() or div() method
with the fill_value parameter, e.g. seriesA.add(seriesB, fill_value=0).
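
The effect of fill_value can be seen by repeating the addition with the same two Series. This is a minimal sketch; the variable name total is only illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# A label missing from one Series is treated as 0 instead of producing NaN.
total = seriesA.add(seriesB, fill_value=0)
print(total)
```

With fill_value=0, the labels b, d, y and z keep the value from the one Series that has them (2, 4, 20 and 10) instead of becoming NaN.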

Head & Tail: These functions are used to retrieve small amounts of data from the front or rear end
and can be used to peek at the type of data stored in the Series.
● Head: head(n) function will return the first n items of the series. If the value for n is not passed,
then by default n takes 5 and the first five items will be displayed.
import pandas as pd
mySeries = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(mySeries.head(2))
print(mySeries.head())
Output:
a 1
b 2
dtype: int64
a 1
b 2
c 3
d 4
e 5
dtype: int64
● Tail: tail(n) function will return the last n items of the series. If the value for n is not passed, then
by default n takes 5 and the last five items will be displayed.
import pandas as pd
mySeries = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(mySeries.tail(2))
print(mySeries.tail())
Output:
i 9
j 10
dtype: int64
f 6
g 7
h 8
i 9
j 10
dtype: int64

Indexing: It is used to access elements in a series. Indexes are of two types i.e., Positional Index and
Labeled Index.
● Positional Index: It takes an integer value that corresponds to its position in the series starting
from 0 to n-1 (where n is the number of items in the series)
import pandas as pd
s = pd.Series([10, 20, 30], index = ['a', 'b', 'c'])
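print(s[1])
Output: 20
Note: Recent pandas versions deprecate s[1] as a positional lookup when the index labels are not
integers; s.iloc[1] is the explicit positional form and gives the same result.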
● Labeled Index: It takes any user-defined label as index
import pandas as pd
s = pd.Series([10, 20, 30], index = ['a', 'b', 'c'])
print(s[['a', 'c']])
Output:
a 10
c 30
dtype: int64

Slicing: It is used to retrieve a subset of the series and will be done by specifying the start, end and
step parameters [start:end:step] with the series name.
● When positional indices are used for slicing, the value at the end index position is excluded
whereas in case of slicing with labeled Indices, end label position is included.
● Default values of start and step are 0 and 1 respectively and are optional. Default value of start
will change to -1 if the value of step is negative.
● Positional Index:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(s[:-9:-2])
Output:
j 100
h 80
f 60
d 40
dtype: int64
● Labeled Index:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(s['b':'h':2])
Output:
b 20
d 40
f 60
h 80
dtype: int64
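
The difference between the excluded end position and the included end label can be compared side by side. The following is a minimal sketch on a shorter, made-up Series, using iloc and loc to make the kind of index explicit.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]    # positions 1 and 2; position 3 is excluded
by_label = s.loc['b':'d']    # labels 'b', 'c' and 'd'; 'd' is included

print(list(by_position.index))  # ['b', 'c']
print(list(by_label.index))     # ['b', 'c', 'd']
```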

Selection: Two indexers are available for selecting data from a Series, i.e., loc and iloc.
They filter the data using labeled and positional indices respectively, and also according to some
conditions.
● iloc: It is a position-based selecting method which requires an integer position to select a specific
item. Even when the Series has char/string based labels, positions (0 to length-1) can be used
with this method.
● loc: It is a label-based selecting method which requires a labeled index to select a specific
item. Labeled indices can be of type integer or char/string.

S. N. 1 - Case: A value
loc: A single label or integer, e.g. .loc['A'] or .loc[1]
iloc: A single integer, e.g. .iloc[1]

S. N. 2 - Case: A list
loc: A list of labels, e.g. .loc[['A', 'G', 'B']] or .loc[[1, 4, 0]]
iloc: A list of integers, e.g. .iloc[[-2, 0, 4]]

S. N. 3 - Case: Slicing
loc: e.g. .loc['A':'E'], both 'A' & 'E' are included
iloc: e.g. .iloc[-5:-1:], -1 will be excluded

S. N. 4 - Case: Condition
loc: A list of boolean values of length equal to the length of the Series,
e.g. .loc[[True, False, True]]
Note: Here, the length of the Series is 3.
It can also take logical conditions such as my_series.loc[my_series > 50], which will retrieve all
the values which are greater than 50.
iloc: A list of boolean values of length equal to the length of the Series,
e.g. .iloc[[True, False, True]]
Note: Here, the length of the Series is 3.
It cannot take logical conditions directly in the way loc does, as it only accepts a list of Boolean
values whereas logical conditions on a Series return a boolean Series. This limitation can be bypassed
by converting this filtered Series into a list using the inbuilt list() function,
e.g. my_series.iloc[list(my_series > 50)]
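
The condition case can be tried on a small Series. This is a minimal sketch with made-up values; passing a NumPy boolean array to iloc is shown as an alternative to list() and is not part of the original notes.

```python
import pandas as pd

my_series = pd.Series([40, 60, 80], index=['A', 'B', 'C'])
mask = my_series > 50            # boolean Series: False, True, True

by_loc = my_series.loc[mask]                      # loc accepts the boolean Series
by_iloc = my_series.iloc[list(mask)]              # iloc needs a plain list of booleans
by_iloc_array = my_series.iloc[mask.to_numpy()]   # a NumPy boolean array also works

print(by_loc)
print(by_loc.equals(by_iloc))    # True
```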

1.3 Dataframe
(creation - from dictionary of Series, list of dictionaries, Text/CSV files;
display; iteration; Operations on rows and columns: add, select, delete, rename; Head and Tail
functions; Indexing using Labels, Boolean Indexing)
DataFrame Creation: There are different ways in which a DataFrame can be created in Pandas. To
create or use DataFrame, we first need to import the Pandas library.
● Empty DataFrame:
import pandas as pd
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
● Using Dictionary of Series: DataFrame can be created from a dictionary of Series where each key
represents column index label and value (Series) represents data of that particular column. While
creation, index matching is implemented and all the missing values are filled in with NaN by
default.
import pandas as pd
data = {'col1':pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
print(df)
Output:
col1 col2
a 1.0 4
b 2.0 5
c 3.0 6
d NaN 7
● Using List of Dictionaries: It can be created using a list of dictionaries where each item
(dictionary) represents a row. Keys of each dictionary act as the column label index for each row.
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 'KVS', 'c': 5},
{'a': 6, 'b': 7, 'c': 8}]
df = pd.DataFrame(data)
print(df)
Output:
a b c
0 1 2 NaN
1 3 KVS 5.0
2 6 7 8.0
● Using Text/CSV Files: It can be created from a Text or CSV (type of text file) file with the built-in
read_csv() or read_table() functions.

import pandas as pd
df = pd.read_table('data.txt', header=None, sep=r'\s+')
print(df)
Output:
0 1 2
0 1 2 4
1 3 KVS 5
2 6 7 8
Note: read_table() will throw an error if all the rows do not have the same no. of data items.
sep=r'\s+' splits on any run of whitespace; older code uses delim_whitespace=True for the same
purpose, which recent pandas versions deprecate. Parameter sep=" " can be used when the items are
separated by exactly one space. read_table() can also be used to read a CSV file by setting the
parameter sep=",".
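
The same calls can be tried without creating a file on disk. The sketch below assumes the file contents shown in the output above and feeds them to read_table() through io.StringIO, which stands in for a real file such as data.txt.

```python
import io
import pandas as pd

# Whitespace-separated text, standing in for the contents of data.txt.
text = "1 2 4\n3 KVS 5\n6 7 8\n"
df = pd.read_table(io.StringIO(text), header=None, sep=r'\s+')

# The same data in CSV form, read by changing only the separator.
csv_text = "1,2,4\n3,KVS,5\n6,7,8\n"
df_csv = pd.read_table(io.StringIO(csv_text), header=None, sep=',')

print(df)
print(df.equals(df_csv))
```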

Iteration on DataFrame:
● Row-wise: .iterrows() function can be used to iterate over the DataFrame row-wise. It is a
generator which yields both the Row Label Index and Row Data (as a Series)
import pandas as pd
data = {'col1':pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
# access rows using iteration
for index, row in df.iterrows():
print(index, row['col1'], row['col2'])
Output:
a 0 4
b 1 5
c 2 6
d 3 7
● Column-wise: .items() function can be used to iterate over the DataFrame column-wise. It is a
generator which yields both the Column Label Index and Column Data (as a Series). Older code uses
.iteritems() for the same purpose; it was removed in pandas 2.0.
import pandas as pd
data = {'col1':pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
# access columns using iteration
for col_index, col_data in df.items():
    print(col_index)
    print(col_data)
Output:
col1
a 0
b 1
c 2
d 3
Name: col1, dtype: int64
col2
a 4
b 5
c 6
d 7
Name: col2, dtype: int64

Operations on rows and columns:


● Rows
■ Select: Row(s) can be selected using loc and iloc. Both of these work similar to that of Series.
○ Using iloc:
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df.iloc[2])
Output:
col1 13
col2 23
Name: c, dtype: int64
○ Using loc:
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df.loc['c'])
Output:
col1 13
col2 23
Name: c, dtype: int64
■ Add: A new row can be added using the concat() function or loc.
○ Using concat(): It takes a list of DataFrames to be merged and returns the merged
DataFrame
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
df_temp1 = pd.DataFrame({'col1': 16, 'col2': 26}, index=['f'])
df_temp2 = pd.DataFrame({'col1': 17, 'col2': 27}, index=['g'])
df = pd.concat([df, df_temp1, df_temp2])
print(df)
Output:
col1 col2
a 11 21
b 12 22
c 13 23
d 14 24
e 15 25
f 16 26
g 17 27
○ Using loc
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
df.loc['f'] = [16, 26]
print(df)
Output:
col1 col2
a 11 21
b 12 22
c 13 23
d 14 24
e 15 25
f 16 26
■ Rename: Row Label Index can be changed using the rename() function.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df)
temp = {'a': 'ABC',
'b': 'DEF',
'c': 'GHI',
'd': 'JKL',
'e': 'MNO'}
df = df.rename(index = temp)
print(df)
Output:
col1 col2
a 11 21
b 12 22
c 13 23
d 14 24
e 15 25
col1 col2
ABC 11 21
DEF 12 22
GHI 13 23
JKL 14 24
MNO 15 25
■ Delete: Row(s) can be deleted using the drop() function.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
df = df.drop(['a', 'c'])
print(df)
Output:
col1 col2
b 12 22
d 14 24
e 15 25
● Columns
■ Select:
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df[['col1', 'col3']])
Output:
col1 col3
a 11 31
b 12 32
c 13 33
d 14 34
e 15 35
■ Add:
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
df['col3'] = [31, 32, 33, 34, 35]
print(df)
Output:
col1 col2 col3
a 11 21 31
b 12 22 32
c 13 23 33
d 14 24 34
e 15 25 35
■ Rename: Column Label Index can be changed using the rename() function.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df)
temp = {'col1' : 'Column1',
'col2' : 'Column2'}
df = df.rename(columns = temp)
print(df)
Output:
col1 col2
a 11 21
b 12 22
c 13 23
d 14 24
e 15 25
Column1 Column2
a 11 21
b 12 22
c 13 23
d 14 24
e 15 25
■ Delete: Column(s) can be deleted using the del keyword, or the drop() or pop() functions.
○ Using drop(): This function can delete single or multiple columns (passed as a list) from a
DataFrame. By default it returns a modified copy and leaves the original DataFrame
unchanged. Setting the inplace parameter to True makes it modify the original DataFrame instead.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df)
df.drop(['col1', 'col2'], axis=1, inplace=True)
print(df)
Output:
col1 col2 col3
a 11 21 31
b 12 22 32
c 13 23 33
d 14 24 34
e 15 25 35
col3
a 31
b 32
c 33
d 34
e 35
○ Using del: This keyword can delete only one column at a time.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df)
del df["col2"]
print(df)
Output:
col1 col2 col3
a 11 21 31
b 12 22 32
c 13 23 33
d 14 24 34
e 15 25 35
col1 col3
a 11 31
b 12 32
c 13 33
d 14 34
e 15 35
○ Using pop() function: This function can delete only one column at a time and returns the
deleted column as a Series object.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
print(df)
df.pop('col1')
print(df)
Output:
col1 col2 col3
a 11 21 31
b 12 22 32
c 13 23 33
d 14 24 34
e 15 25 35
col2 col3
a 21 31
b 22 32
c 23 33
d 24 34
e 25 35

Head & Tail: These functions are similar for DataFrame as that of Series i.e. used to retrieve small
amounts of data from the front or rear end and can be used to peek at the type of data stored in the
DataFrame.
● Head: head(n) function will return the first n rows of the DataFrame. If the value for n is not
passed, then by default n takes 5 and the first five rows will be displayed.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15, 16, 17, 18, 19]),
'col2':pd.Series([21, 22, 23, 24, 25, 26, 27, 28, 29])}
df = pd.DataFrame(data)
print(df.head(2))
print(df.head())
Output:
col1 col2
0 11 21
1 12 22
col1 col2
0 11 21
1 12 22
2 13 23
3 14 24
4 15 25
● Tail: tail(n) function will return the last n rows of the DataFrame. If the value for n is not passed,
then by default n takes 5 and the last five rows will be displayed.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15, 16, 17, 18, 19]),
'col2':pd.Series([21, 22, 23, 24, 25, 26, 27, 28, 29])}
df = pd.DataFrame(data)
print(df.tail(2))
print(df.tail())
Output:
col1 col2
7 18 28
8 19 29
col1 col2
4 15 25
5 16 26
6 17 27
7 18 28
8 19 29

Indexing using Labels: Two indexers are available on a DataFrame, i.e., loc (label based) and iloc
(position based). They filter the data using labeled and positional indices respectively, and also
according to some conditions, and work similarly to those of the Series.

S. N. 1 - Case: A value
loc: Pair of labels and/or integers for row and column, e.g. .loc['c', 'col1'] or .loc[1, 'col1']
Note: Here, integer indices are labels, not positions, if the index has been set manually to
something else.
iloc: Pair of integers for row and column, e.g. .iloc[1, 2]

S. N. 2 - Case: Multiple rows/cols
loc: List(s) of labels for rows and/or columns, e.g. .loc[['a', 'd'], 'col1']
iloc: List(s) of integers for rows and/or columns, e.g. .iloc[2, [1, 2]]

S. N. 3 - Case: Slicing
loc: e.g. .loc['r1':'r3', 'col1':'col4'], both 'r3' & 'col4' are included
iloc: e.g. .iloc[7:3:-2, -5:-1:], 3 and -1 will be excluded

S. N. 4 - Case: Condition
loc: Pair of lists of boolean values of length equal to the number of rows and columns of the
DataFrame, e.g. .loc[[True, False, True, True, False], [True, False]]
Note: Here, the dimension of the DataFrame is 5, 2.
It can also take logical conditions such as df.loc[df.col2 > 23, :], which will retrieve all the
columns of the rows where the value in col2 is greater than 23.
iloc: Pair of lists of boolean values of length equal to the number of rows and columns of the
DataFrame, e.g. .iloc[[True, False, True, True, False], [True, False]]
Note: Here, the dimension of the DataFrame is 5, 2.
It cannot take logical conditions directly in the way loc does, as it only accepts a list of Boolean
values whereas logical conditions on a Series return a boolean Series. This limitation can be bypassed
by converting this filtered Series into a list using the inbuilt list() function,
e.g. df.iloc[list(df.col2 > 23), :]
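
The four cases can be tried on a small DataFrame. The following is a minimal sketch using the same col1/col2 sample data as the earlier examples; the result variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [11, 12, 13, 14, 15],
                   'col2': [21, 22, 23, 24, 25]},
                  index=['a', 'b', 'c', 'd', 'e'])

single = df.loc['c', 'col1']                    # label pair
same_single = df.iloc[2, 0]                     # position pair for the same cell
label_slice = df.loc['b':'d', 'col1':'col2']    # end labels are included
pos_slice = df.iloc[1:3, 0:2]                   # end positions are excluded
filtered = df.loc[df.col2 > 23, :]              # logical condition with loc
filtered_iloc = df.iloc[list(df.col2 > 23), :]  # same condition as a list for iloc

print(single, same_single)          # 13 13
print(list(label_slice.index))      # ['b', 'c', 'd']
print(list(pos_slice.index))        # ['b', 'c']
print(list(filtered.index))         # ['d', 'e']
```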

Boolean Indexing: To use this feature as shown here, the index labels (column and/or row) of the
DataFrame need to be Boolean (True or False) values only.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
tempCols = {'col1' : True,
'col2' : False,
'col3' : True}
tempRows = {'a' : True,
'b' : False,
'c' : True,
'd' : True,
'e' : False}
df.rename(columns=tempCols, index=tempRows, inplace=True)
print(df)
print()
print(df.loc[False, True])
Output:
True False True
True 11 21 31
False 12 22 32
True 13 23 33
True 14 24 34
False 15 25 35

True True
False 12 32
False 15 35
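
Boolean indexing is also commonly understood as selecting rows with a Boolean mask built from a condition on the data, without renaming the index. The following is a minimal sketch of that reading, assuming the same sample data as above; the name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [11, 12, 13, 14, 15],
                   'col2': [21, 22, 23, 24, 25],
                   'col3': [31, 32, 33, 34, 35]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Each comparison gives a boolean Series; & combines them element-wise.
mask = (df['col1'] > 11) & (df['col3'] < 35)

selected = df[mask]              # keeps only the rows where mask is True
print(selected)
```

Here the rows labeled b, c and d satisfy both conditions.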

1.4 Importing/Exporting data between CSV and Dataframe


Python’s Pandas library provides inbuilt methods to import and export the data from and to a CSV file
using read_csv() and to_csv() functions respectively.
● Import Data:
import pandas as pd
df = pd.read_csv('data.csv', header=None)
print(df)
Output:
0 1 2
0 1 2 NaN
1 3 4 5.0
Note: if header is not set as None, the first row of the CSV file will be treated as column label
index.
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output:
1 2 Unnamed: 2
0 3 4 5
● Export Data:
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 'KVS', 'c': 5},
{'a': 6, 'b': 7, 'c': 8}]
df = pd.DataFrame(data)
df.to_csv('data.csv', header=False, index=False)

Note: header and index parameters take the Boolean values i.e., True or False to determine
whether to store column label index and row label index respectively along with the data in CSV
file or not.
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 'KVS', 'c': 5},
{'a': 6, 'b': 7, 'c': 8}]
df = pd.DataFrame(data)
df.to_csv('data.csv', header=True, index=True)
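
An export followed by an import should give back the same data. The sketch below checks this with made-up numeric data and a temporary directory, so that no existing data.csv is overwritten; the path and the variable name restored are only illustrative.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 6], 'b': [2, 4, 7]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.csv')   # illustrative file location
    df.to_csv(path, index=False)               # store column labels, skip row labels
    restored = pd.read_csv(path)               # first row becomes the column labels

print(restored)
print(restored.equals(df))
```

Writing with index=False avoids an extra unnamed column appearing when the file is read back.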
1.5 Data Visualisation
(Purpose of plotting; drawing and saving following types of plots using Matplotlib – line plot,
bar graph, histogram. Customising plots: adding label, title, and legend in plots)
1.5.1 Introduction
- What do we mean by Data Visualisation?
Data visualisation is the representation of data through use of common graphics, such as charts, plots,
infographics, and even animations.

- What is the purpose of Data Visualisation?


These visual displays of information communicate complex data relationships and data-driven
insights in a way that is easy to understand. Visualisation also helps to effectively communicate
information to intended users.

- Examples of Visualisation?
Everyday objects around us give a lot of information using visual cues. Take for
example traffic signals: Red, Orange and Green. Everyone who knows basic traffic
rules knows what these colours represent. Similarly, ultrasound reports, an atlas
(book of maps), the speedometer of a vehicle etc. all convey information very easily,
taking advantage of the human ability to understand pictures better than words.
There’s also an old saying, “A picture is worth a thousand words.”

- Which fields are benefited with the help of Data Visualisation?


Visualisation of data is effectively used in fields like economy, health, finance, science, mathematics,
engineering, etc.

The image above showcases the foreign tourist arrival trend in India. There is a sharp decline in
2020. Any guesses? 😷 covid-19 😷

The above infographic is taken from data.gov.in, one of many such websites that provide data for
researchers, students and enthusiasts.
1.5.2 Plotting using Matplotlib
- Installing the matplotlib library
pip install matplotlib

- Importing the library


import matplotlib.pyplot as plt
#plt is an alias or an alternative name for matplotlib.pyplot. We can use any other alias also

- What is pyplot?
pyplot is a module of matplotlib, which contains a collection of functions that can be used to work
on a plot.

- Anatomy of a figure:
The plot() function of the pyplot module is used to create a figure. A figure is the overall window
where the outputs of pyplot functions are plotted. A figure contains a plotting area, legend, axis labels,
ticks, title, etc.

List of Pyplot functions to plot different charts

Function Description

plot(x, y, 'fmt', ...) Plot x versus y as lines and/or markers

bar(x, height, width, align…..) Make a bar plot

hist(x, bins, weights,...) Plot a histogram


List of Pyplot functions to customise plots

Function Description

title(label[, fontdict, loc, pad]) Set a title for the axes

xlabel(xlabel[, fontdict, labelpad]) Set the label for the x-axis

xticks([ticks, labels]) Get or set the current tick locations and labels of the x-axis

ylabel(ylabel[, fontdict, labelpad]) Set the label for the y-axis.

yticks([ticks, labels]) Get or set the current tick locations and labels of the y-axis

legend(*args, **kwargs) Place a legend on the axes

savefig(*args, **kwargs) Save the current figure

show(*args, **kw) Display all figures.

1.5.4 Line Plot: A line plot shows how data changes over time or space. The x-axis shows time or
distance. Ex. A line plot could be used to show the changes in a country's employment structure over
time. (It is best used to showcase gradual changes.)

- a simple line plot example with formatting: plot() method is used to draw the line plot.

import matplotlib.pyplot as plt
x=[1970,1980,1990,2000,2010]
y1=[350,480,270,620,300]
y2=[400,550,600,50,150]
plt.plot(x,y1,'o-k',label='angola')
plt.plot(x,y2,'s:b',label='zimbabwe')
plt.xlabel('decade')
plt.ylabel('gdp in millions')
plt.title('gdp of Angola and zimbabwe')
plt.legend()
plt.show()

- customisation: use the fmt(string) parameter to change the marker,line and colour of the line plot.

fmt = '[marker][line][color]'
- Note: line plot is used to indicate the change of an entity over the period of time.

Marker

Marker Symbol Description

. Point

o Circle

v, ^, <, > Triangle (down, up, left, right)

s Square

+ Plus

x, X x, x (filled)

Line

Name or symbol Output

solid or -

dotted or :

dashed or --

dashdot or -.

Colour

Character code Colour

'b' blue
'g' green
'r' red
'c' cyan
'm' magenta
'y' yellow
'k' black
'w' white
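
The three tables above combine into a single fmt string. The following is a minimal sketch with made-up data; the Agg backend is an assumption made so that the code runs without a display, and it can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use('Agg')   # assumed non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

x = [1970, 1980, 1990, 2000, 2010]
y = [350, 480, 270, 620, 300]

# 'o--r' = circle marker, dashed line, red colour
lines = plt.plot(x, y, 'o--r', label='sample')
line = lines[0]          # plot() returns a list of Line2D objects
plt.legend()

print(line.get_marker(), line.get_linestyle(), line.get_color())
```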

1.5.5 Bar Plot: Bar plot shows grouped data as rectangular bars, e.g. the number of tourists visiting a
resort each month. Lines are unable to efficiently depict comparison among multiple entities. In order
to show comparisons, we prefer Bar charts.

Source: https://www.bbc.co.uk/bitesize/guides/z2qpg82/revision/1
-method(s):
bar() method is used to draw the vertical bar plot.
barh() method is used to draw the horizontal bar plot.

- a simple bar plot example with formatting:

import numpy as np
import matplotlib.pyplot as plt
x=np.array([1970,1980,1990,2000,2010])
A=[350,480,270,620,300]
Z=[400,550,600,50,150]
plt.bar(x-1,A,width=2,label='Angola')
plt.bar(x+1,Z,width=2,label='Zimbabwe')
plt.xlabel('decade')
plt.ylabel('gdp in millions')
plt.title('gdp of Angola and zimbabwe')
plt.legend()
plt.show()

- customisation:
● color: The colors of the bar faces. {'red', 'green', 'blue' etc.}
● edgecolor: The colors of the bar edges. {'red', 'green', 'blue' etc.}
● linewidth: Width of the bar edge(s). {numeric values}
● linestyle: Changes the edge line style. {'-', '--', '-.', ':'}
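
The customisation parameters can be passed together in one bar() call. This is a minimal sketch with made-up monthly data; the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumed non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar']      # made-up sample data
tourists = [120, 90, 150]

bars = plt.bar(months, tourists,
               color='green',       # bar faces
               edgecolor='black',   # bar edges
               linewidth=2,         # edge width
               linestyle='--')      # edge line style
plt.xlabel('month')
plt.ylabel('no. of tourists')

heights = [bar.get_height() for bar in bars]
print(heights)
```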

1.5.5 Histogram: Histograms are similar to bar charts, but they show frequencies rather than groups
of data. A histogram could be used to show frequencies of earthquakes of each magnitude on the
Richter scale. Unlike a bar chart, where we use discrete values for comparison, a histogram can be used
to show continuous values (e.g. the height of people).

Source: https://www.bbc.co.uk/bitesize/guides/z2qpg82/revision/1

- a simple histogram example with formatting: hist() method is used to draw the histogram.

import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv('https://bit.ly/3EP8BAI')
plt.hist(x=df['Height'], bins=8,
         histtype = 'bar',
         orientation = 'vertical')
plt.xlabel('Height in cm')
plt.ylabel('No. of students')
plt.title('distribution of height of students from class XI')
plt.show()

- parameters:
● histtype: Changing the representation of histogram. {'bar', 'step', 'stepfilled'}
● orientation: {'horizontal', 'vertical'}
● x: Input values, this takes either a single array or a sequence of arrays.
● bins: int or sequence or str
- If bins is an integer, it defines the number of equal-width bins in the range.
- If bins is a sequence, it defines the bin edges, including the left edge of the first bin
and the right edge of the last bin.
● weights: An array of weights, of the same shape as x. Each value in x only contributes its
associated weight towards the bin count.
● cumulative: True/False. When True, each bin shows its own count plus the counts of all the
bins before it (a cumulative histogram).

(Figure: the same histogram drawn with cumulative=False (default) and with cumulative=True)
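To make the bins and cumulative parameters concrete, here is a minimal pure-Python sketch of how equal-width bins can be counted. It is only an illustration, not the library's actual implementation: the helper names equal_width_bins and cumulative and the height values are made up, and the values are assumed not to be all equal.

```python
def equal_width_bins(values, bins):
    """Return (edges, counts) for `bins` equal-width bins spanning min..max of values."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The last bin includes its right edge; every other bin is [left, right).
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return edges, counts


def cumulative(counts):
    """Running total of the bin counts, as drawn when cumulative=True."""
    totals, running = [], 0
    for c in counts:
        running += c
        totals.append(running)
    return totals


heights = [150, 152, 155, 158, 160, 161, 163, 165, 168, 170]  # made-up heights in cm
edges, counts = equal_width_bins(heights, bins=4)
print(edges)               # [150.0, 155.0, 160.0, 165.0, 170.0]
print(counts)              # [2, 2, 3, 3]
print(cumulative(counts))  # [2, 4, 7, 10]
```

The last cumulative value always equals the total number of observations, which is a quick check that no value was dropped.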

1.5.6 Save Plot:


#To save any plot we have to use savefig() function E.g. plt.savefig ("<filename.filetype>")
plt.savefig ('Student_Data.pdf')
plt.savefig ('Student_Data.svg')
plt.savefig ('Student_Data.png')

1.5.6 The Dos and Don’ts of Data Visualisation:

When done right, data visualisation is a great way to display large amounts of information simply and
intuitively. However, in order to ensure that visualisations are effective, it’s important to follow a few
important standards and avoid a few all-too-common mistakes.

- Do’s:

● Keep it simple!

● Do use the full axis and maintain consistency.


● Use proper legend and labels.
● Pay attention to how colour is used. Reduce non critical colours and other attention grabbers.
● Use the right chart/graph for the data
● Let your data tell a story

- Don'ts:

● Don’t intentionally misrepresent data

Visualization by Daily Record.

● Don’t try to present too much information

Via WT Visualizations

● Don't use more than (about) six colors


● Line Charts:

● Bar Charts:

● Histogram
Functions in SQL
Function :
A function is a predefined set of commands that performs some operation and returns a single
value. A function can take one argument, several arguments, or no argument at all.

SQL Functions: SQL supports two kinds of functions:

1. Single Row Functions : These functions work on a single value at a time and produce a result
for each value they operate on. They may obtain the data as an argument or from the value of a
column of a table specified as an argument.

2. Multirow Functions/Aggregate Functions : These functions work on a group of values at
a time but produce a single result for each group they work on.

Single Row Functions : Single row functions can be further classified into various types. A few of
them are :

a. Text Functions
b. Math Functions
c. Date and Time Functions

We will be using the following table to demonstrate various functions in the coming text.

Table : emp
Empid Ename Job Sal DeptNo Date_of_joining
1425 Jack Manager 1500 10 1978-12-22
1422 Jill Manager 1600 20 1988-10-22
1421 Aryaman Analyst 1550 30 1988-06-15
1427 Vikram Salesman 1200 10 1982-06-23
1429 George Salesman 1200 20 1983-06-17
1477 Sandeep NULL 1500 NULL 1999-12-12

a. Text Functions : Text functions generally perform an operation on a string input value and
return a string or numeric value. Various text functions are described below:

i. Ucase()/Upper() : Converts given string to Uppercase. Examples :


Select upper('Apple Is reD');
Output:
APPLE IS RED

Select Upper('f/kvschool/2001-12/LIB/22');
Output
F/KVSCHOOL/2001-12/LIB/22

ii. Lcase()/Lower(): Converts given string to lowercase. Examples :


Select lower('Apple Is reD');
Output:
apple is red

Select lcase('f/kvschool/2001-12/LIB/22');
Output:
f/kvschool/2001-12/lib/22

iii. Length() : It counts the number of characters in a given string. It includes all upper and
lower case alphabets, digits, spaces and other special characters.

Examples:
Select length('Hockey is our national game');
output:
27

Select length('f/kvschool/2001-12/LIB/22');
Output:
25

iv. Left() : It extracts N characters from the left side of a given String.
Syntax: left(String, No of Characters to be extracted)
Examples :
Select Left('Orange', 3);
Output :
Ora

Select Left('Orange', 10);
Output :
Orange

Note: If the number of characters to be extracted is more than the length of the string, the
left function returns the whole string without adding any leading or trailing spaces.

Select left(Ename,4) from emp where deptno=10;

left(Ename,4)

Jack

Vikr

v. Right() : It extracts N characters from the right side of a given String.

Syntax: right(String, No of Characters to be extracted)

Examples :
Select Right('Orange', 3);
Output :
nge

Select Right('Orange', 10);
Output :
Orange

Note: If the number of characters to be extracted is more than the length of the string, the
right function returns the whole string without adding any leading or trailing spaces.
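For comparison, the behaviour of Left() and Right() described above can be sketched with Python string slicing. The function names left and right are illustrative helpers, not built-in Python functions.

```python
def left(text, n):
    """First n characters of text; the whole string if n exceeds its length."""
    return text[:n] if n > 0 else ""


def right(text, n):
    """Last n characters of text; the whole string if n exceeds its length."""
    # text[-0:] would return the whole string, so n <= 0 is handled separately.
    return text[-n:] if n > 0 else ""


print(left("Orange", 3))    # Ora
print(right("Orange", 3))   # nge
print(left("Orange", 10))   # Orange
print(right("Orange", 10))  # Orange
```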

vi. MID()/Substr()/Substring() : Extracts a given number of characters from a string, starting at a
given position. Positions are counted from 1.

Syntax: SUBSTRING(string, start, No. of characters)
Examples:

Select Substring('Incredible Rajasthan', 3, 5);
Output:
credi

Select Substr('Incredible Rajasthan', 5, 3);
Output:
edi

Select MID('Incredible Rajasthan', 12);
Output:
Rajasthan

Select MID('Incredible Rajasthan', -4, 2);
Output:
th

Note :
● Providing the length/no. of characters to be extracted is optional. In case it is not provided,
the function extracts all characters from the given position till the end of the string.
● If the second argument (start) is negative, the position is counted from the right side of the
string.
● No. of characters cannot be negative; if a negative value is supplied, an empty string is returned.

Select mid(date_of_joining, 6,2) from emp where deptno=20;

mid(date_of_joining, 6,2)

10
06
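The three rules in the note above can be sketched as a small Python function. This is only an illustration of the described behaviour: the name mid is a made-up helper, and returning an empty string for position 0 or a position beyond the string is an assumption of this sketch.

```python
def mid(text, start, length=None):
    """Sketch of MID/SUBSTR/SUBSTRING: 1-based start, negative start counts from the right."""
    if start == 0 or abs(start) > len(text):
        return ""                       # assumed: no such position, so nothing to extract
    begin = start - 1 if start > 0 else len(text) + start
    if length is None:
        return text[begin:]             # no length given: take everything till the end
    if length < 1:
        return ""                       # zero or negative length gives an empty string
    return text[begin:begin + length]


print(mid("Incredible Rajasthan", 3, 5))   # credi
print(mid("Incredible Rajasthan", 5, 3))   # edi
print(mid("Incredible Rajasthan", 12))     # Rajasthan
print(mid("Incredible Rajasthan", -4, 2))  # th
print(mid("1988-10-22", 6, 2))             # 10
```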

vii. Instr() : returns the position of the first occurrence of a string in another string.
Example:

SELECT INSTR('CBSE Exam', 'E');

INSTR('CBSE Exam', 'E')

4

SELECT INSTR('CBSE Exam', 'KV');

INSTR('CBSE Exam', 'KV')

0

Note: if the substring is not found in the main string, Instr() returns 0.

viii. Ltrim() : Removes spaces on the left side of a given string.

Select Ltrim('###India Shining##');

Ltrim('###India Shining##')

India Shining##

Note: Here # represents a space. Assume there are three spaces before and two spaces after
India Shining.

ix. Rtrim() : Removes spaces on the right side of a given string.

Select Rtrim('###India Shining##');

Rtrim('###India Shining##')

###India Shining

x. Trim() : Removes both leading (left) and trailing (right) spaces from a given string.

Select Trim('###India Shining##');

Trim('###India Shining##')

India Shining

Select length(Trim('###India Shining##'));

Length(Trim('###India Shining##'))

13

b. Math Functions:
i. power(x,y)/pow(x,y): It returns x raised to the power of y (x^y).

select power(2,3);
power(2,3)
8

select pow(-1,5);

pow(-1,5)

-1
select pow(-1,4);
pow(-1,4)
1

select pow(10,-2);

pow(10,-2)

0.01
Note: A negative power gives the reciprocal: 10^-2 = 1/10^2 = 0.01.

select pow(144, 1/2);


pow(144, 1/2)
12
Note: Raising a number to the power 1/2 is an alternate way of finding its square root.

ii. Round(N,D) : Rounds the number N to D decimal places (by default D=0, if not
specified). A negative D rounds to the left of the decimal point (tens, hundreds, ...).

select round(4534.9767);
round(4534.9767)

4535

select round(4534.9767,0);

round(4534.9767,0)

4535

select round(4534.97378778,2);

round(4534.97378778,2)

4534.97

select round(4534.97578778,2);

round(4534.97578778,2)

4534.98

select round(4534.997,2);

round(4534.997,2)

4535.00

select round(4534.997,4);
round(4534.997,4)
4534.9970

Select round(4534.997,-1);

round(4534.997,-1)
4530

select round(4584.997,-2);

round(4584.997,-2)
4600
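The rounding results above, including negative values of D, can be reproduced in Python with the decimal module. This is a sketch under assumptions: sql_round is a hypothetical helper name, the number is passed as a string so that it is stored exactly, and ties are rounded away from zero (ROUND_HALF_UP).

```python
from decimal import Decimal, ROUND_HALF_UP


def sql_round(number, digits=0):
    """Round `number` (given as a string) to `digits` decimal places.

    A negative `digits` rounds to the left of the decimal point.
    """
    step = Decimal(1).scaleb(-digits)  # 10 ** (-digits): 0.01 for 2, 10 for -1
    return Decimal(number).quantize(step, rounding=ROUND_HALF_UP)


print(sql_round("4534.9767"))          # 4535
print(sql_round("4534.97578778", 2))   # 4534.98
print(sql_round("4534.997", 2))        # 4535.00
print(sql_round("4534.997", 4))        # 4534.9970
print(int(sql_round("4534.997", -1)))  # 4530
print(int(sql_round("4584.997", -2)))  # 4600
```

Python's built-in round() is avoided here because it works on binary floating-point values and rounds exact ties to the nearest even digit, so its results can differ from those shown above.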

iii. MOD(X,Y) : Returns the remainder of X divided by Y. The result takes the sign of X (the dividend).

select mod(13,5);

mod(13,5)
3

select mod(6,10);
+-----------+
| mod(6,10) |
+-----------+
| 6 |
+-----------+

select mod(-17, 5) rem;


+------+
| rem |
+------+
| -2 |
+------+

select mod(-17, -5) as rem;


+------+
| rem |
+------+
| -2 |
+------+

select mod(17, -5) rem;


+------+
| rem |
+------+
| 2 |
+------+
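The sign rule seen in the MOD() outputs above differs from Python's % operator, whose result takes the sign of the divisor. A small sketch, assuming integer arguments and a non-zero divisor (sql_mod is a hypothetical helper name):

```python
import math


def sql_mod(x, y):
    """Remainder of x / y carrying the sign of the dividend x, as in the outputs above."""
    return int(math.fmod(x, y))


print(sql_mod(13, 5))    # 3
print(sql_mod(6, 10))    # 6
print(sql_mod(-17, 5))   # -2
print(sql_mod(-17, -5))  # -2
print(sql_mod(17, -5))   # 2

# Python's own % operator follows the sign of the divisor instead:
print(-17 % 5)           # 3
print(17 % -5)           # -3
```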

c. Date Functions:
i. Now() : returns the current date and time, as "YYYY-MM-DD HH:MM:SS" (string)

Select Now();
+---------------------+
| now() |
+---------------------+
| 2022-10-19 18:31:32 |
+---------------------+
Assuming that the current date in the system is 19-Oct-2022 and time is 6:31pm

ii. Date() : returns the date part(yyyy-mm-dd) of date time value supplied as argument.

select date('1978-02-23 00:00:10');


+-----------------------------+
| date('1978-02-23 00:00:10') |
+-----------------------------+
| 1978-02-23 |
+-----------------------------+

iii. Day() : returns the day part of the date/date-time value supplied as argument.

select day('1978-02-23');
+-------------------+
| day('1978-02-23') |
+-------------------+
| 23 |
+-------------------+

select day('1978-02-09 18:31:32');


+----------------------------+
| day('1978-02-09 18:31:32') |
+----------------------------+
|                          9 |
+----------------------------+
iv. Month() : returns the month part for a given date/date-time (a number from 1 to 12).

select month('1978-02-23');
+---------------------+
| month('1978-02-23') |
+---------------------+
| 2 |
+---------------------+
v. Year() : returns the year part for a given date/date-time.

select year('1978-02-23');
+--------------------+
| year('1978-02-23') |
+--------------------+
| 1978 |
+--------------------+
vi. MonthName() : returns the name of the month for a given date/date-time.

select monthname('2017-09-14');
+-------------------------+
| monthname('2017-09-14') |
+-------------------------+
| September |
+-------------------------+
vii. DayName() : returns the Day Name corresponding to date/date-time value supplied as
argument.
select dayname('2017-09-14');
+-----------------------+
| dayname('2017-09-14') |
+-----------------------+
| Thursday |
+-----------------------+
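Since this material also covers Python, it may help to see the same date parts extracted with Python's datetime module. This is a parallel illustration rather than SQL; the month and day names come out in English under Python's default locale.

```python
from datetime import date, datetime

d = date.fromisoformat("2017-09-14")
print(d.day, d.month, d.year)  # 14 9 2017   (like Day(), Month(), Year())
print(d.strftime("%B"))        # September   (like MonthName())
print(d.strftime("%A"))        # Thursday    (like DayName())

stamp = datetime.fromisoformat("1978-02-09 18:31:32")
print(stamp.date())            # 1978-02-09  (like Date())
print(stamp.day)               # 9

# Like Now(): the current date and time as 'YYYY-MM-DD HH:MM:SS'
print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
```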

Group by Functions/Multi-Row functions/Aggregate Functions :


i. SUM() : it returns the sum of values of a numeric column.
● It gives the arithmetic sum of all the values present in a particular column.
● It can take only one argument.
● NULL values are not included in the calculations.

Select Sum(Sal) from emp;

Sum(sal)

8550

ii. COUNT(): the COUNT() function returns the number of rows that match a specified
criterion.
● Takes only one argument, which can be a column name or *.
● Count(column) doesn't count NULL values in that column.
● Count(*) counts the number of rows in the table. A row is counted even if all the values in
the row are null values.

Select Count(*) from emp;

Count(*)

6

Select Count(job) from emp;

Count(job)

5

Note: The job column has 6 entries, but Count(job) returns 5 because one of the values is NULL.

iii. AVG() : the average value of a numeric column.


● It gives the arithmetic average of all the values present in a particular column.
● It can take only one argument.
● NULL values are not included in the calculations. If in a column there are 7 values out of
which 2 are NULL, to calculate the average, MySQL will divide the sum of 5 NOT NULL
values by 5.

Select avg(sal) from emp;

avg(sal)

1425.0000

iv. MAX() : The Maximum value of a column


● It gives the maximum of all the values present in a particular column.
● Value can be integer, float, decimal, varchar, char or date
● It can take only one argument.
● NULL values are not included in the calculations.
● If all the values in the column are NULL, the function returns Null Value.

Select Max(sal) from emp;

Max(sal)

1600

Select Max(ename) from emp;

Max(ename)

Vikram

Select Max(Date_of_joining) from emp;

Max(Date_of_joining)

1999-12-12

v. MIN() : The Minimum value of a column.


● It gives the minimum of all the values present in a particular column.
● Value can be integer, float, decimal, varchar, char or date
● It can take only one argument.
● NULL values are not included in the calculations.
● If all the values in the column are NULL, the function returns Null Value.

select Min(sal) from emp;

Min(sal)

1200

select Min(ename) from emp;


Min(ename)

Aryaman

select Min(Date_of_joining) from emp;

Min(Date_of_joining)

1978-12-22
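The aggregate functions can be tried out without a MySQL server by using Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite is used in place of MySQL, so output formatting differs slightly (for example avg() prints 1425.0 rather than 1425.0000), and the column types are chosen only for this illustration.

```python
import sqlite3

rows = [
    (1425, "Jack", "Manager", 1500, 10, "1978-12-22"),
    (1422, "Jill", "Manager", 1600, 20, "1988-10-22"),
    (1421, "Aryaman", "Analyst", 1550, 30, "1988-06-15"),
    (1427, "Vikram", "Salesman", 1200, 10, "1982-06-23"),
    (1429, "George", "Salesman", 1200, 20, "1983-06-17"),
    (1477, "Sandeep", None, 1500, None, "1999-12-12"),  # None is stored as NULL
]

con = sqlite3.connect(":memory:")  # temporary in-memory database
con.execute(
    "create table emp (empid int, ename text, job text, "
    "sal int, deptno int, date_of_joining text)"
)
con.executemany("insert into emp values (?, ?, ?, ?, ?, ?)", rows)

query = (
    "select sum(sal), count(*), count(job), avg(sal), "
    "max(sal), min(sal), max(date_of_joining) from emp"
)
result = con.execute(query).fetchone()
print(result)  # (8550, 6, 5, 1425.0, 1600, 1200, '1999-12-12')
```

count(*) gives 6 while count(job) gives 5, because the NULL job is skipped.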

Group by :
The GROUP BY clause is used with the SELECT statement to arrange rows that have the same value
in one or more columns into groups. An aggregate function then returns one result for each group.

select deptno, sum(sal) from emp group by deptno;

deptno sum(sal)
NULL 1500
10 2700
20 2800
30 1550

select job, max(sal), min(sal) from emp group by job;

job max(sal) min(sal)
NULL 1500 1500
Analyst 1550 1550
Manager 1600 1500
Salesman 1200 1200

Having clause : The HAVING clause was added to SQL because the WHERE keyword cannot be used
with aggregate functions.

select deptno, sum(sal) from emp group by deptno having sum(sal)>2000;

deptno sum(sal)
10 2700
20 2800

Difference between where and having clause

Where Clause | Having Clause
WHERE clause is used to filter the records from the table, or while joining more than one table. Only those records that satisfy the condition specified in the WHERE clause are extracted. | HAVING clause is used to filter the groups based on the condition given in the HAVING clause. Only those groups that satisfy the given condition appear in the final result.
WHERE clause is used before GROUP BY. | HAVING clause is used after GROUP BY.
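The difference can be seen by running both kinds of filter on the emp data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server) and keeps only the columns needed.

```python
import sqlite3

rows = [
    ("Jack", "Manager", 1500, 10),
    ("Jill", "Manager", 1600, 20),
    ("Aryaman", "Analyst", 1550, 30),
    ("Vikram", "Salesman", 1200, 10),
    ("George", "Salesman", 1200, 20),
    ("Sandeep", None, 1500, None),
]
con = sqlite3.connect(":memory:")
con.execute("create table emp (ename text, job text, sal int, deptno int)")
con.executemany("insert into emp values (?, ?, ?, ?)", rows)

# HAVING only: all rows are grouped, then groups with a total above 2000 are kept.
having_only = con.execute(
    "select deptno, sum(sal) from emp "
    "group by deptno having sum(sal) > 2000 order by deptno"
).fetchall()
print(having_only)  # [(10, 2700), (20, 2800)]

# WHERE removes rows with sal <= 1200 first; HAVING then filters the remaining groups.
where_and_having = con.execute(
    "select deptno, sum(sal) from emp where sal > 1200 "
    "group by deptno having sum(sal) > 1500 order by deptno"
).fetchall()
print(where_and_having)  # [(20, 1600), (30, 1550)]
```

In the second query department 10 drops out because, once Vikram's row is removed by WHERE, its total is only 1500.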

Order by :
The ORDER BY clause is used to sort the query result-set in ascending or descending order. It sorts the
records in ascending order by default. To sort the records in descending order, use the DESC keyword.

Select empid, Ename, deptno from emp order by empid;

empid Ename deptno
1421 Aryaman 30
1422 Jill 20
1425 Jack 10
1427 Vikram 10
1429 George 20
1477 Sandeep NULL

Select Ename, deptno, sal from emp order by sal desc;

Ename deptno sal
Jill 20 1600
Aryaman 30 1550
Jack 10 1500
Sandeep NULL 1500
Vikram 10 1200
George 20 1200

Select Ename, deptno, sal from emp order by deptno, sal desc;

Ename deptno sal
Sandeep NULL 1500
Jack 10 1500
Vikram 10 1200
Jill 20 1600
George 20 1200
Aryaman 30 1550
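Sorting on more than one column works the same way in Python, which may make the idea easier to remember: the first key decides the order, and the second key is used only to break ties. In this sketch the employee with a NULL deptno is left out, because Python cannot compare None with integers.

```python
# (Ename, deptno, sal) for the employees that have a department number
emp = [
    ("Jack", 10, 1500),
    ("Jill", 20, 1600),
    ("Aryaman", 30, 1550),
    ("Vikram", 10, 1200),
    ("George", 20, 1200),
]

# order by sal desc
by_sal_desc = sorted(emp, key=lambda row: row[2], reverse=True)

# order by deptno, sal desc: deptno ascending, then salary descending within a department
by_dept_then_sal = sorted(emp, key=lambda row: (row[1], -row[2]))

print([row[0] for row in by_sal_desc])
# ['Jill', 'Aryaman', 'Jack', 'Vikram', 'George']
print([row[0] for row in by_dept_then_sal])
# ['Jack', 'Vikram', 'Jill', 'George', 'Aryaman']
```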
Unit 3: Introduction to Computer Networks
Introduction to Networks:
In general terms, a network is a group of two or more similar things or people interconnected with
each other.

● Some examples of network are:


o Social network
o Mobile network
o Network of computers
o Airlines, railway, banks, hospitals networks
● A computer network is an interconnection among two or more computers or devices which
allow computers to share data and resources (Hardware and Software) among each other.
Types of Network
Computer networks are broadly categorized as:
• PAN (Personal Area Network)
• LAN (Local Area Network)
• MAN (Metropolitan Area Network)
• WAN (Wide Area Network)

PAN (Personal Area Network): A PAN is a network of personal devices (i.e., Mobiles,
Laptops, Printers and other IoT Devices). It can be set up using guided media (USB cable) or
unguided media (Bluetooth, Infrared, WiFi, RFID, NFC, Hotspots etc.).

Local Area Network (LAN):


▪ The geographical area covered by a LAN can range from a single room, a floor, or an office having
one or more buildings in the same premises, to a laboratory, a school, college, or university campus.
▪ Devices are connected with wires, Ethernet cables, fibre optics or Wi-Fi.
▪ LANs provide short-range communication with high-speed data transfer rates.
▪ Can be extended up to 1 km.
▪ Data transfer rate ranges from 10 Mbps to 1000 Mbps (Mbps: Megabits per second).
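A quick calculation shows what these rates mean in practice. Note that Mbps counts megabits, while file sizes are usually given in megabytes (1 byte = 8 bits). The 100 MB file size below is an assumed example, and protocol overhead is ignored.

```python
def transfer_seconds(size_megabytes, rate_mbps):
    """Ideal time to send a file: convert megabytes to megabits, divide by the rate."""
    megabits = size_megabytes * 8
    return megabits / rate_mbps


print(transfer_seconds(100, 10))    # 80.0 seconds at 10 Mbps
print(transfer_seconds(100, 1000))  # 0.8 seconds at 1000 Mbps
```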
Metropolitan Area Network (MAN)
▪ Metropolitan Area Network (MAN) is an extended form
of LAN which covers a larger geographical area like a
city or a town.

▪ Data transfer rate is less than LAN.


▪ E.g.: Cable TV network, cable-based broadband Internet.
▪ Can be extended up to 30-40 km.
▪ Many LANs can be connected together to form MAN.
Wide Area Network (WAN)
▪ It connects computers and other LANs and MANs, which are spread across different
geographical locations of a country or in different countries or continents.

▪ The Internet is the largest WAN that connects billions of computers, smartphones and millions
of LANs from different continents.

3.2 Network Devices


To communicate data through different transmission media and to configure networks with different
functionality, we require different devices like Modem, Hub, Switch, Repeater, Router, Gateway, etc.

Modem:
▪ Stands for MOdulator (conversion from digital data to analog signal) DEModulator (conversion
from analog signal to digital data).
▪ Modems are connected to both the source and destination nodes
▪ The modem at the sender’s end acts as a modulator that converts the digital data into analog
signals. The modem at the receiver’s end acts as a demodulator that converts the analog signals
into digital data for the destination node.

Ethernet Card/NIC/NIU/LAN Card:


▪ It is a network adaptor used to set up a wired network.
▪ It acts as an interface between the computer and the outside
network.
▪ Ethernet cable connects the computer to the network through
NIC.
▪ Data transfer rate varies between 10 Mbps and 1 Gbps.
▪ Each NIC has a unique MAC (Media Access Control) Address/Physical Address, which helps in
uniquely identifying the computer on the network.
▪ Example of MAC Address:- 00:01:5c:10:43:ad

Repeater
▪ Data is carried in the form of signals over the cable.
▪ Signals lose their strength beyond a certain limit and become weak. The weakened signal
appearing on the cable is regenerated and put back on the cable by a repeater.
▪ Signal limit for various wired media is :
o 100 metres (Ethernet Cable),
o 500 metres (Coaxial Cable),
o Over 100 kms (Optical Fibre)
Hub

▪ An Ethernet hub is a network device used to connect different devices through wires.
▪ Data arriving on any of the lines are sent out on all the others.
▪ The limitation of hub is that if data from two devices come at the same time, they will collide

Types of Hub-

Passive Hub: This type does not amplify or boost the signal. It does not manipulate or
view the traffic that crosses it.
Active Hub: It amplifies the incoming signal before passing it to the other ports.

Switch (Intelligent Hub)

▪ Like a hub, a network switch is used to connect multiple computers or communicating devices.
▪ When data arrives, the switch extracts the destination address from the data packet and looks
it up in a table to see where to send the packet. Thus it sends signals to only selected devices
instead of sending to all.
▪ It can forward multiple packets at the same time.
Difference between Hub and Switch: A hub replicates whatever it receives on one port to all the
other ports. A switch keeps a record of the MAC addresses of the devices attached to it and
forwards each data packet only to the port of the device it is addressed to. That is why a
switch is also called an Intelligent Hub.
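The difference can be sketched in a few lines of Python. This is a simplified model, not real networking code: the class and function names are made up, and the rule that a switch floods a packet when it has not yet seen the destination address is an assumption about typical switch behaviour that the text above does not spell out.

```python
def hub_forward(in_port, ports):
    """A hub repeats incoming data on every port except the one it arrived on."""
    return [p for p in ports if p != in_port]


class Switch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port on which that device was seen

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the packet is sent out on."""
        self.mac_table[src_mac] = in_port       # remember where the sender is
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]    # known destination: one port only
        # Unknown destination: behave like a hub for this packet (assumed behaviour).
        return [p for p in self.ports if p != in_port]


ports = [1, 2, 3, 4]
print(hub_forward(1, ports))                # [2, 3, 4]

sw = Switch(ports)
first = sw.forward(1, "00:01:5c:10:43:ad", "00:01:5c:10:43:ae")
reply = sw.forward(2, "00:01:5c:10:43:ae", "00:01:5c:10:43:ad")
print(first)  # [2, 3, 4]  destination not in the table yet
print(reply)  # [1]        sender of the first packet was recorded on port 1
```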

Router

▪ A network device that can receive the data, analyse it and transmit it to other networks.
▪ Compared to a hub or a switch, a router has advanced capabilities as it can analyze the data
being carried over a network, decide or alter how it is packaged, and send it to another
network of different types.
▪ A router can be wired or wireless.
▪ A wireless router can provide Wi-Fi access to smartphones and other devices.
▪ Wi-Fi routers may perform the dual task of a router and a modem/switch

Gateway

▪ A gateway is a device that connects dissimilar networks (networks with different software and
hardware configurations and with different transmission protocols).
▪ Gateway serves as the entry and exit point of a network, as all data coming in or going out of
a network must first pass through the gateway.
▪ It can be implemented as software, hardware, or a combination of both. A network gateway is
placed at the edge of a network, and the firewall is usually integrated with it.
3.3 Network Topologies
The arrangement of computers and other networking devices in a network is called its topology. Some
common topologies are as follows:
Star Topology: Each communicating device is connected to a central node, which is a networking
device like a hub or a switch.

Advantages:

● Easy to troubleshoot
● Very effective and fast.
● Fault detection and removal of faulty parts is easier.
● In case a workstation fails, the network is not affected.
Disadvantages:-
● Difficult to expand.
● More cable is required.
● The cost of hub and cables makes it expensive over others.
● In case the hub fails, the entire network stops working.
Bus Topology

▪ Each communicating device connects to a central transmission medium, known as bus.


▪ Data transmitted in both directions.
▪ Data can be received by any of the
nodes of the network.
▪ A terminator is required at the end
of the bus.
Advantages:
▪ Single backbone wire /bus used to
connect computers hence it is cheaper.
▪ It is also easy to maintain.
Disadvantages:-
▪ Doesn’t support a very large network.
▪ Problem identification is difficult.
▪ If the main cable suffers failure or damage, the whole network fails or partially breaks down.
▪ Slower data transmission speed.

Tree/Hybrid Topology:

▪ It is a hierarchical topology, in which there are multiple branches and each branch can have one
or more basic topologies like star, ring and bus.
Features of Tree Topology
● Ideal if workstations are located in groups.
● Used in Wide Area Network.

Advantages

▪ Extension of bus and star topologies.


▪ Expansion of nodes is possible and easy.
▪ Easily managed and maintained.

Disadvantages

▪ Higher maintenance cost.


▪ Difficult to configure.

Mesh Topology

▪ Generally, each communicating device is connected with every other device in the network
Advantages:

▪ Can handle large amounts of traffic since multiple nodes can transmit data simultaneously
▪ If any node goes down, the other nodes are not affected.
▪ More secure than other topologies, as each cable carries different data.

Disadvantages:

▪ Wiring is complex and cabling cost is high in creating such networks.
▪ There are many redundant or unutilised connections.
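The cabling cost of a mesh can be estimated with a short calculation. In a full mesh every pair of devices needs its own link, which gives n(n-1)/2 links for n devices, whereas a star needs only one link per device to the central hub or switch. The function names below are illustrative.

```python
def star_links(devices):
    """One cable from each device to the central hub or switch."""
    return devices


def mesh_links(devices):
    """One cable for every pair of devices: n * (n - 1) / 2."""
    return devices * (devices - 1) // 2


for n in (4, 6, 10):
    print(n, star_links(n), mesh_links(n))
# 4 4 6
# 6 6 15
# 10 10 45
```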

3.4 Introduction to Internet, URL, WWW, and its applications


The Internet

▪ It is the global network of computing devices including desktops, laptops, servers, tablets,
mobile phones, other handheld devices as well as peripheral devices such as printers,
scanners, etc.
Applications of Internet :
Following are some of the broad areas or services provided through Internet:
1. The World Wide Web (WWW)
2. Electronic mail (Email)
3. Chat
4. Voice Over Internet Protocol (VoIP)
The World Wide Web (WWW)

● It is an ocean of information, stored in the form of trillions of interlinked web pages and web
resources.
● A British computer scientist named Tim Berners-Lee invented the revolutionary World Wide
Web in 1990 by defining three fundamental technologies that led to the creation of the web:
● HTML — HyperText Markup Language
▪ language which is used to design standardized Web Pages so that the Web contents can
be read and understood from any computer across the globe.
● URL — Uniform Resource Locator
▪ A URL is the address of a given unique resource on the Web or address of a website. The
URL is an address that matches users to a specific resource online, such as a web page or
a media.

▪ Example: http://www.cbse.nic.in

● HTTP — The HyperText Transfer Protocol


▪ Set of rules which are used to retrieve linked web pages across the web
▪ A more secure and advanced version is HTTPS.
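A URL is made up of parts, and Python's standard urllib.parse module can split one into those parts. The first URL is the example given above; the second is a made-up address used only to show a path and a query string.

```python
from urllib.parse import urlparse

simple = urlparse("http://www.cbse.nic.in")
print(simple.scheme)   # http              (the protocol used to fetch the resource)
print(simple.netloc)   # www.cbse.nic.in   (the web server's name)

# Hypothetical URL, for illustration only
detailed = urlparse("https://www.example.org/results/index.html?year=2022")
print(detailed.scheme)  # https
print(detailed.netloc)  # www.example.org
print(detailed.path)    # /results/index.html
print(detailed.query)   # year=2022
```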

Electronic Mail (Email)

● It is one of the ways of sending and receiving message(s) using the Internet.
● can be sent anytime to any number of recipients anywhere.
● To use email service, one needs to register with an email service provider by creating a mail
account. These services may be free or paid.
● Some of the popular email service providers are Google (gmail), Yahoo (yahoo mail), Microsoft
(outlook), etc.

Chat

● Chatting or Instant Messaging (IM) over the Internet means communicating to people at
different geographic locations in real time through text message(s).
● With ever-increasing internet speed, it is now possible to send images, documents, audio and
video as well through instant messengers.
● Applications such as WhatsApp, Slack, Skype, Yahoo Messenger, Google Talk, Facebook
Messenger, Google Hangout, etc., are examples of instant messengers.

VoIP

● Voice over Internet Protocol - allows us to have voice call (telephone service) over the Internet.
● VoIP works on the simple principle of converting the analog voice signals into digital and then
transmitting them over the broadband line. These services are either free or very economical.
● VoIP call(s) can be received and made using IP phones from any place having Internet access.

● Whatsapp Call, Google Meet, Microsoft Teams, Zoom etc are examples of VoIP.

Advantage of VoIP:
● Save a lot of money.
● More than two people can communicate or speak.
● Supports high quality audio transfer.
● Can transfer text, image, video along with voice.

Disadvantages of VoIP:
● Does not work in the absence of an active Internet connection.
● Slow Internet connection will lead to poor quality of calls.

3.5 Website
▪ A website is a collection of multiple related web pages which are connected through
hyperlinks.
▪ A Website can be created for a particular purpose, theme or to provide a service.
▪ A website is stored on a web server.
Purpose of a Website

▪ Portfolio: A website of an organization or an individual to display information, like
kvsangathan.nic.in.
▪ E-Commerce: Selling products and delivering services, like Croma, Flipkart etc.
▪ E- Governance: Government portals like e-passport, mygov, UIDAI etc.
▪ Communication: Communicating with each other with help of Social Media like
instagram, facebook etc.
▪ Search Engines and Wikis: Posting and finding information on the internet like google,
reddit, wikipedia etc.
▪ Streaming Services: Disseminating contents online like netflix, disney+ etc.
▪ Other web based activities: Online gaming, Cloud Services etc.
Web Page
▪ A web page is a document on the WWW that is viewed in a web browser.
▪ Structure of a web page is created using HTML (HyperText Markup Language), and its
presentation is styled using CSS (Cascading Style Sheets).
▪ Contain information in different forms, such as: text in the form of paragraphs, lists,
tables, images, audio, video, software application, other interactive content
▪ The first page of the website is called a home page
Static vs Dynamic Web Pages
Static Webpage | Dynamic Webpage
A static web page displays the same content each time someone visits it. | In a dynamic web page, the content changes according to the user.
It takes less time to load over the internet. | It takes more time to load.
No database is used. | A database is used at the server end.
Changes rarely. | Changes frequently.
Example: ncert.nic.in | Example: twitter.com
Difference between Website and Webpage :-
Website | Webpage
1. A collection of web pages which are grouped together and usually connected together in various ways. Often called a "web site" or simply a "site". | A document which can be displayed in a web browser such as Firefox, Google Chrome, Opera, Microsoft Internet Explorer etc.
2. Has content about various entities. | Has content about a single entity.
3. More development time is required. | Less development time is required.
4. Website address does not depend on the Webpage address. | Webpage address depends on the Website address.

Web Server
▪ Used to store and deliver the contents of a website to clients such as a browser that
requests it. A web server can be software or hardware.
▪ The server needs to be connected to the Internet so that its contents can be made
accessible to others.
▪ The web browser from the client computer sends a request (HTTP request) for a page
containing the desired data or service. The web server then accepts, interprets, searches
and responds (HTTP response) to the request made by the web browser.
▪ If the server is not able to locate the page, it sends the error message (Error 404 – page
not found) to the client’s browser.
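The request and response cycle described above can be sketched as a simple lookup. This is only a toy model of what a web server does: the page names and contents are made up, and a real server works with full HTTP messages rather than a Python dictionary.

```python
# Hypothetical pages stored on the server: path -> page content
PAGES = {
    "/": "<h1>Home page</h1>",
    "/about.html": "<h1>About us</h1>",
}


def handle_request(path):
    """Return (status code, body) for the requested path."""
    if path in PAGES:
        return 200, PAGES[path]               # 200 means the page was found
    return 404, "Error 404 - page not found"  # the page could not be located


print(handle_request("/about.html"))  # (200, '<h1>About us</h1>')
print(handle_request("/fees.html"))   # (404, 'Error 404 - page not found')
```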
Web Hosting :-
▪ Online service that enables users to publish websites or web applications on the
internet. When a user signs up for a hosting service, they basically rent some space on a
server on which the user can store all the files and data necessary for the website to
work properly.
▪ A server is a physical computer that runs without any interruption so that website is
available all the time for anyone who wants to see it.

3.6 Web Browsers


(Introduction, commonly used browsers, browser settings, add-ons and plug-ins, cookies)
Browser:
● Software application that helps us to view the web page(s).
● Helps to view different contents retrieved from different web servers on the internet
● Mosaic, developed by the National Center for Supercomputing Applications (NCSA), was one
of the first widely used graphical web browsers.
● Mozilla Firefox is an open source web browser which
is available free of cost and can be easily downloaded
from the Internet.
Browser Setting
● Every web browser has got certain settings that define the manner in which the browser will
behave. These settings may be with respect to privacy, search engine preferences, download
options, auto signature, autofill and autocomplete feature, theme and much more.

Add-Ons and Plug-ins

▪ Add-ons and plug-ins are the tools that help to extend and modify the functionality of the
browser.
▪ Both the tools boost the performance of the browser, but are different from each other.
▪ A plug-in is a complete program or may be a third-party software. For example, Flash and Java
are plug-ins. A Flash player is required to play a video in the browser. A plug-in is a software
that is installed on the host computer and can be used by the browser for multiple
functionalities and can even be used by other applications as well.
▪ An add-on is not a complete program and so is used to add only a particular functionality to the
browser. It is also referred to as extension in some browsers
Cookies
▪ A cookie is a text file, containing a string of information, which is transferred by the website to
the browser when we browse it.
▪ This string of information gets stored in the form of a text file in the browser.
▪ The information stored is retransmitted to the server to recognise the user, by identifying
pages that were visited, choices that were made while browsing various menu(s) on a
particular website.
▪ It helps in customising the information that will be displayed, for example the choice of
language for browsing, allowing the user to auto login, remembering the shopping preference,
displaying advertisements of one’s interest, etc. Cookies are usually harmless and they can’t
access information from the hard disk of a user or transmit virus or malware.
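The round trip of a cookie can be sketched with Python's standard http.cookies module. The cookie name language and its value are made-up examples; the point is only that the website sends a small piece of text, the browser stores it, and the same text comes back with later requests.

```python
from http.cookies import SimpleCookie

# 1. The website creates a cookie and sends it to the browser in a response header.
server_cookie = SimpleCookie()
server_cookie["language"] = "hindi"
header = server_cookie.output(header="Set-Cookie:")
print(header)  # Set-Cookie: language=hindi

# 2. The browser stores the text and sends it back with its next request.
sent_back = "language=hindi"

# 3. The website reads the cookie to recognise the user's earlier choice.
received = SimpleCookie(sent_back)
print(received["language"].value)  # hindi
```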
Unit 4: Societal Impacts
Section A (5 questions of 1 mark each)
Section B (1 question of 2 mark)
Section C (1 question of 3 mark)

Digital Footprint :
· Whenever we surf the Internet using smartphones, tablets, computers, etc., we leave a trail of
data reflecting the activities performed by us online, which is our digital footprint.
· It is the traces we leave on the internet. Our digital footprint can be created and used with or
without our knowledge.

Digital Footprint can be of two categories:


Active Digital Footprints: Created when a user deliberately shares information about
oneself by means of websites, blogs or social media.
Passive Digital Footprints: The digital data trail we leave online unintentionally is called
passive digital footprints. This includes the data generated when we visit a website, use a
mobile App, browse Internet, etc.

How to reduce the footprint?


- Logout after you’re done surfing a website
- Keep comments/likes to a minimum
- Think before posting on a public platform
- Don’t post too much personal info online
- Prefer using incognito/private mode in the browser
- Avoid/minimize the use of public networks

Net and Communication etiquettes:

Net Etiquettes(Netiquettes): We need to exhibit proper manners and etiquettes while being online
during our social interactions
Be Ethical: No copyright violation; share the expertise.
Be Respectful: Respect privacy; respect diversity.
Be Responsible: Avoid cyber bullying; don't feed the troll.

Communication Etiquettes: Good communication over email, chat rooms and other such forums
requires a digital citizen to abide by the communication etiquettes:
Be Precise, Be Polite, Be Credible, Acknowledge others

Social Media Etiquettes: There are certain etiquettes we need to follow during our presence on social
media:
Be Secure: Choose a strong password, Know who you befriend, Beware of fake info
Be Reliable: Think before you upload, Do not fake yourself
Data Protection:
·Data protection refers to the practices, safeguards, and binding rules put in place to protect
your personal information and ensure that you remain in control of it. In this digital age, data
or information protection is mainly about the privacy of data stored digitally.

· Privacy of such sensitive data can be implemented by encryption, authentication, and other
secure methods to ensure that such data is accessible only to the authorized user and is for a
legitimate purpose.
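
As an illustration of the authentication idea above, here is a minimal Python sketch that stores a salted hash of a password instead of the password itself and checks a login attempt against it. The function names hash_password and verify_password, the iteration count and the sample passwords are assumptions made for this example; only the standard hashlib, hmac and os modules are used.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# Hypothetical sample data: only the salt and digest would be stored.
salt, stored = hash_password("S3cure!pass")
print(verify_password("S3cure!pass", salt, stored))   # True
print(verify_password("wrong-guess", salt, stored))   # False
```

Because only the salt and the digest are kept, someone who reads the stored data does not directly learn the password, while the authorized user can still be recognised at login.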

Intellectual property rights (IPR):


· Intellectual Property Rights provide legal ownership of one’s inventions, literary and artistic
expressions, designs and symbols, names and logos.
· This enables the creator or copyright owner to earn recognition or financial benefit by using
their creation or invention.
· Intellectual Property is legally protected through copyrights, patents, trademarks, etc.

Plagiarism:
·Plagiarism is the act of using or stealing someone else’s intellectual work, ideas etc. and
passing it off as one’s own work. In other words, plagiarism is a failure to give credit to the
source (creator).

·Plagiarism is a fraud and a violation of Intellectual Property Rights. Since IPR is protected by
law, violating the owner’s rights is a legally punishable offense.

·Several ways to avoid plagiarism: Be original, Cite/acknowledge the source, give credits to the
owners of the contents/website.

·Use tools like TurnItIn, Grammarly etc to check for plagiarism.

Copyright:
· Copyright grants legal rights to creators for their original works like writing, photographs,
audio recordings, videos, sculptures, architectural works, computer software, and other creative
works like literary and artistic work.
·Copyright law gives the copyright holder a set of rights that they alone can avail legally. It
prevents others from copying, using or selling the work. For example, writer Rudyard Kipling
held the copyright to his novel, ‘The Jungle Book’, which tells the story of Mowgli, the jungle
boy.
· To use others’ copyrighted material, one needs to obtain a license from them.

Trademark:
Trademark includes any visual symbol, word, name, design, slogan, label, etc., that
distinguishes the brand or commercial enterprise, from other brands or commercial
enterprises. For example, no company other than Nike can use the Nike brand to sell shoes or
clothes.

Patent:
A patent is usually granted for inventions. Unlike copyright, the inventor needs to apply (file)
for patenting the invention. When a patent is granted, the owner gets an exclusive right to
prevent others from using, selling, or distributing the protected invention. A patent gives full
control to the patentee to decide whether or how the invention can be used by others.
License:
· Licensing and copyrights are two sides of the same coin.
· A license is a type of contract or permission agreement in which the creator of an original
work permits someone to use their work, generally for some price; whereas copyright is the
legal right of the creator to the protection of original work of different types.
· Licensing is the legal term used to describe the terms under which people are allowed to use
the copyrighted material.
· A software license is an agreement that provides legally binding guidelines pertaining to the
authorized use of digital material.

Free and Open Source Software (FOSS):


· It allows users to not only access the available software but also to modify (or improve) them.
· FOSS has a large community of users and developers who are contributing continuously
towards adding new features or improving the existing features.
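· Ubuntu, Fedora, LibreOffice, OpenOffice and Mozilla Firefox are some examples.
· Types of Open Source Licenses:
GPL (GNU General Public License: a copyleft license, so modified versions that are distributed
must carry the same license; in this sense it is more restrictive than permissive licenses)
CC (Creative Commons: a family of licenses for creative works; some variants restrict commercial
use or modification, so not necessarily open; common for design projects)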

Hacking:
· Hacking is the act of unauthorized access to a computer, computer network or any digital
system. Hackers usually have technical expertise in hardware and software. They look for bugs
to exploit and break into the system.
· The primary focus of hacking is on security cracking and data stealing, identity theft,
monetary gain and leak of sensitive data; hence it is a punishable offense under the IT Act.
· To avoid hacking: install antivirus/firewall, regularly update the OS, do not download from
untrusted websites, use strong passwords, secure the wireless network, use secure websites.
· Two kinds: Ethical hacking or White Hat hacker (freelancer/hired by organization or Govt.)
Unethical hacking or Black Hat hacker (freelance individual or group)
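
The advice to use a strong password can be sketched as a small Python check. The rules used here (at least 8 characters, with an uppercase letter, a lowercase letter, a digit and a symbol) and the function name is_strong are assumptions for illustration, not an official standard.

```python
import string


def is_strong(password, min_length=8):
    """Return True if the password meets the assumed strength rules."""
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    has_symbol = any(c in string.punctuation for c in password)
    long_enough = len(password) >= min_length
    return long_enough and has_upper and has_lower and has_digit and has_symbol


print(is_strong("password"))      # False: no uppercase letter, digit or symbol
print(is_strong("Tr1cky#Pass"))   # True: meets every assumed rule
```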

Phishing:
· Phishing is an activity where fake websites or emails that look original or authentic are
presented to the user to fraudulently collect sensitive and personal details, particularly
usernames, passwords, banking and credit card details; therefore it is an unlawful act.
· Do not open links received from untrusted email/website/SMS and do not reveal sensitive
information (username, password, OTP etc.) on phone calls or on social media platforms.
· Generally, a phishing link uses a URL that resembles the name of a famous website, for example
jio2021.com, along with very lucrative offers like free internet for a year. When clicked, a fake
website opens and steals the data or infects the user’s device with viruses. This may lead to
identity theft.
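
The look-alike URL trick described above can be illustrated with a short Python sketch that accepts a link only if its host name is a trusted domain or one of its subdomains. The TRUSTED_DOMAINS set, the function name is_trusted and the sample links are hypothetical; a real check would need a carefully maintained list.

```python
from urllib.parse import urlparse

# Hypothetical list of domains the user actually trusts.
TRUSTED_DOMAINS = {"jio.com", "example.org"}


def is_trusted(url):
    """Return True only if the URL's host is a trusted domain or its subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


print(is_trusted("https://www.jio.com/offers"))              # True
print(is_trusted("http://jio2021.com/free-internet"))        # False: look-alike name
print(is_trusted("http://jio.com.free-gift.example/login"))  # False: trusted name used as a prefix
```

The comparison is made on the whole host name rather than on a substring, since a fake address can contain the famous name anywhere inside it.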

Identity theft:
· Identity theft occurs when someone uses our personal information, such as our name, license, or
Unique ID number, without our permission to commit a crime or fraud.
· Common ways identity can be stolen: Data Breaches, Internet Hacking, Malware, Credit
Card Theft, Mail Theft, Phishing and Spam Attacks, Wi-Fi Hacking, Mobile Phone Theft, ATM
Skimmers.
· How to protect identity online: use up-to-date security software, try to spot spam/scams, use
strong passwords, monitor credit scores, only use reputable websites when making purchases.

Cyber Bullying:
· Any insulting, degrading or intimidating online behavior like repeated posting of rumors,
giving threats online, posting the victim’s personal information, sexual harassment or
comments aimed to publicly ridicule a victim is termed as cyber bullying.
· Technology is used to harass, threaten or humiliate a target. Examples of cyber bullying are
sending mean texts, posting false information about a person online, or sharing embarrassing
photos or videos.
· Different types of cyber bullying: Doxing, Harassment, Impersonation, Cyberstalking.
·We may prevent cyber bullying by limiting the information we share online, not feeding the
troll, thinking before sharing credentials with others on an online platform, keeping personal
information safe, and avoiding unnecessary comments and posts.

Cyber Crime:
·It is defined as a crime in which a computer is the medium of crime (hacking, phishing,
spamming), or the computer is used as a tool to commit crimes (extortion, data breaches,
theft).
·In such crimes, either the computer itself is the target or the computer is used as a tool to
commit a crime.
·Cyber-crimes are carried out against either an individual, or a group, or an organization or
even against a country, with the intent to directly or indirectly cause physical harm, financial
loss or mental harassment.

Crimes Against Individual: Cyber harassment and stalking, distribution of child pornography, various
types of spoofing, credit card fraud, human trafficking, identity theft etc.
Crimes Against Group/Organization: DoS and DDoS attacks, hacking, virus
transmission, computer vandalism, copyright infringement, and IPR violations.
Crimes Against Country: Hacking, accessing confidential information, cyber warfare, cyber
terrorism, and piracy (loss of revenue).

Cyber law:
Cyber Law: “law governing cyberspace”. It includes freedom of expression, access to and usage of the
internet, and online privacy. The issues addressed by cyber law include cybercrime, e-commerce, IPR,
Data Protection.

Indian IT Act:
The Government of India’s Information Technology Act, 2000 (also known as the IT Act), amended in
2008, provides guidelines to the user on the processing, storage and transmission of sensitive
information.
The Indian IT Act, 2000, with its amendment in 2008, is the cyber law of India. It covers:
· Guidelines on the processing, storage and transmission of sensitive information
· Cyber cells in police stations where one can report any cybercrime
· Penalties Compensation and Adjudication via cyber tribunals

E-waste: Hazards and Management


·E-waste or Electronic waste includes electric or electronic gadgets and devices that are no
longer in use. Hence, discarded computers, laptops, mobile phones, televisions, tablets, music
systems, speakers, printers, scanners etc. constitute e-waste when they are near or at the end of
their useful life.
· E-waste is becoming one of the fastest growing environmental hazards in the world today.

Impact of e-waste on environment


E-waste is responsible for the degradation of our environment through:
Emission of gases and fumes into the atmosphere,
Discharge of liquid waste into drains or disposal of solid e-waste materials.
When e-waste is carelessly thrown or dumped in landfills or dumping grounds, certain elements or
metals used in the production of electronic products cause air, water and soil pollution.

Impact of e-waste on human


The electrical or electronic devices are manufactured using certain metals and elements like
lead, beryllium, cadmium, mercury, plastics, etc.
Most of these materials are difficult to recycle and are considered to be toxic and carcinogenic.
They are harmful whenever they enter the human body through contaminated food, water, air or soil.
Lead affects the kidneys, brain and central nervous system. Beryllium causes skin diseases, allergies
and an increased risk of lung cancer.
Mercury causes respiratory disorders, cadmium can damage kidneys, liver and bones, and plastic
causes various psychological problems like stress and anxiety.

Management of e-waste
E-waste management is the efficient disposal of e-waste. Although we cannot completely
destroy e-waste, still certain steps and measures have to be taken to reduce harm to the
humans and environment. Some of the feasible methods of e-waste management are reduce,
reuse and recycle.
• Reduce: We should try to reduce the generation of e-waste by
purchasing the electronic or electrical devices only according to our
need.
• Reuse: It is the process of re-using the electronic or electric waste
after slight modification.
• Recycle: Recycling is the process of conversion of electronic devices
into something that can be used again in one way or
another.

Benefits of e-waste management


E-waste management:
Saves the environment and natural resources
Allows for recovery of precious metals
Protects public health and water quality
Saves landfill space

Awareness about health concerns related to the usage of technology:


Health concerns related to the usage of technology:
·As digital technologies have penetrated into different fields, we are spending more time in
front of screens, be it mobile, laptop, desktop, television, gaming console, music or sound
device.
· Improper posture can be bad for us, both physically and mentally. Spending too much time
on the Internet can be addictive and can have a negative impact on our physical and
psychological well-being.
·Stress, physical fatigue and obesity are the other related impacts the body may face if one
spends too much time using digital devices. Eye strain is a symptom commonly reported by
users of digital devices who continuously look at the screen for watching, typing, chatting or
playing games. Apart from this, a user may face emotional issues and remain isolated.

Addressing these issues:


Such health concerns can be addressed to some extent by taking care of the way we position
such devices and our posture while using them.
Ergonomics helps us in reducing the strain on our bodies, including the fatigue and injuries
due to prolonged use. It is better to periodically focus on distant objects, and take a break for
outdoor activities to avoid dry, watering, or itchy eyes.
Technology also has positive aspects that help us to remain fit through:
· Health apps and gadgets to monitor and alert.
· Virtual Doctor
· VR games to improve fitness in a fun manner
· Online medical records.
