Study Material IP 2022
Intro:
● Python is a high level (close to English) and interpreted (read and executed line by line)
programming language developed by Guido Van Rossum in the 90s.
● It can be operated via shell (interactive) or script mode.
● Identifier: A variable/name of the function can be any combination of letters, digits and
underscore characters. The first character cannot be a digit. Variables in Python are
case sensitive.
Valid: abc (all chars), _val1 (underscore, char and digits), first_name (underscore as a connector)
Invalid: 1abc (starts with a digit), for (it is a reserved keyword), first&name (use of special character)
● Keywords: Reserved for special use. Can't be used as variable names. Ex. if, in, while, else
etc.
● Operators: Just as regular mathematics has operators, so does Python; most are
borrowed from math.
a, b=15,4
✔ Arithmetic: +, -,*,/,//,%,** Ex. print(a+b,a%b,a//b,a*b) O/P 19 3 3 60
✔ Comparison: <,==,>=; Ex. print(a>b, a<b, a==15) O/P True False True
✔ Logical: and, or, not; Ex. print(a>b and b<a-b) O/P True
✔ Membership: in, not in Ex. print (a in [3,41,50]) O/P False
● Data Types:
✔ Number (Immutable)
✔ Integer- 52, -9
✔ Float- 23.7, -0.0003
✔ Boolean- True, False, 2>3, 5%2==1
✔ Collection
❖ String- Ordered and immutable collection of characters, digits and special symbols. Methods:
count(), find(), isupper(), isdigit(), lower(), etc. Ex. s1,s2 = 'अजगर', "Sita sings the blues"
❖ List - Ordered, Heterogeneous and mutable collection. Methods: count(), insert(), append(),
remove(), pop(), sort() etc. Ex. l1,l2= [1,2,3], [1109,'R Rajkumar','XII','89.25%']
❖ Tuple - Ordered, Heterogeneous and immutable collection. Methods: count(), index() etc.
Ex. t1,t2= (1,2,3), (1109,'R Rajkumar','XII','89.25%')
● Common Operations:
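○ * and + operators behave the same way on all three: * repeats the collection and + joins two
collections of the same type.
Ex. print(s1*3) # O/P: अजगरअजगरअजगर
Ex. print(l1+l2) # O/P: [1, 2, 3, 1109, 'R Rajkumar', 'XII', '89.25%']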
import numpy as np
import pandas as pd
lst=[2,1,1,2]
arr=np.array(lst)
sr=pd.Series(lst)
print(lst+lst)
print(arr+arr)
print(lst*3)
print(sr*3)
O/P
[2, 1, 1, 2, 2, 1, 1, 2]
[4 2 2 4]
[2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2]
0 6
1 3
2 3
3 6
dtype: int64
Note: Even though lst (list object), arr (array object) and sr (series object) have the same data,
i.e. 2, 1, 1 and 2, when + or * operators are applied to them the list behaves differently from both
the NumPy array and the Series. lst*3 prints the list elements three times, whereas sr*3 multiplies
each of the individual elements 2, 1, 1 and 2 by 3.
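The identifier and keyword rules from the introduction can be checked in Python itself. The sketch below is only an illustration: the helper name is_valid_identifier is made up for this example, while str.isidentifier() and keyword.iskeyword() are standard Python features.

```python
import keyword

def is_valid_identifier(name):
    """Return True if name can be used as a variable or function name."""
    # isidentifier() applies the letters/digits/underscore rule and rejects a leading digit;
    # iskeyword() rejects reserved words such as 'for'.
    return name.isidentifier() and not keyword.iskeyword(name)

# The six names used in the valid/invalid examples of the introduction
for name in ['abc', '_val1', 'first_name', '1abc', 'for', 'first&name']:
    print(name, is_valid_identifier(name))
```

The first three names print True and the last three print False.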
Unit 1: Data Handling using Pandas and
Data Visualization
● DataFrame: It is a two-dimensional table like structure with heterogeneous data having both
rows and columns. Each column can have a different type of value such as numeric, string,
boolean, etc., as in tables of a database.
Basic Features
S. N. Series Dataframe
● Using Mathematical Operations: The mathematical operation can be performed on two series
and is done on each corresponding pair of elements. While performing operations, index
matching is implemented and all the missing values are filled in with NaN by default.
import pandas as pd
seriesA = pd.Series([1, 2, 3, 4, 5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index = ['z', 'y', 'a', 'c', 'e'])
print(seriesA + seriesB)
Output:
a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64
NOTE: NaN is considered as float64. If data is missing for a particular index during a calculation,
a default value can be supplied through the fill_value parameter of the add(), sub(), mul() and
div() methods, e.g. seriesA.add(seriesB, fill_value=0).
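As a small sketch of the fill_value behaviour, the same two Series can be added again with missing entries treated as 0. The variable name result is chosen only for this example.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# A label present in only one Series is paired with 0 instead of producing NaN
result = seriesA.add(seriesB, fill_value=0)
print(result)
# a     -9.0
# b      2.0
# c    -47.0
# d      4.0
# e    105.0
# y     20.0
# z     10.0
# dtype: float64
```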
Head & Tail: These functions are used to retrieve small amounts of data from the front or rear end
and can be used to peek at the type of data stored in the Series.
● Head: head(n) function will return the first n items of the series. If the value for n is not passed,
then by default n takes 5 and the first five items will be displayed.
import pandas as pd
mySeries = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(mySeries.head(2))
print(mySeries.head())
Output:
a 1
b 2
dtype: int64
a 1
b 2
c 3
d 4
e 5
dtype: int64
● Tail: tail(n) function will return the last n items of the series. If the value for n is not passed, then
by default n takes 5 and the last five items will be displayed.
import pandas as pd
mySeries = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(mySeries.tail(2))
print(mySeries.tail())
Output:
i 9
j 10
dtype: int64
f 6
g 7
h 8
i 9
j 10
dtype: int64
Indexing: It is used to access elements in a series. Indexes are of two types i.e., Positional Index and
Labeled Index.
● Positional Index: It takes an integer value that corresponds to its position in the series starting
from 0 to n-1 (where n is the number of items in the series)
import pandas as pd
s = pd.Series([10, 20, 30], index = ['a', 'b', 'c'])
print(s.iloc[1])  # s[1] gives the same result in older pandas, but is deprecated when the index has labels
Output: 20
● Labeled Index: It takes any user-defined label as index
import pandas as pd
s = pd.Series([10, 20, 30], index = ['a', 'b', 'c'])
print(s[['a', 'c']])
Output:
a 10
c 30
dtype: int64
Slicing: It is used to retrieve a subset of the series and will be done by specifying the start, end and
step parameters [start:end:step] with the series name.
● When positional indices are used for slicing, the value at the end index position is excluded
whereas in case of slicing with labeled Indices, end label position is included.
● Default values of start and step are 0 and 1 respectively and are optional. Default value of start
will change to -1 if the value of step is negative.
● Positional Index:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(s[:-9:-2])
Output:
j 100
h 80
f 60
d 40
dtype: int64
● Labeled Index:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(s['b':'h':2])
Output:
b 20
d 40
f 60
h 80
dtype: int64
Selection: There are two methods available for selection of the data from a Series i.e., loc and iloc.
These are used in filtering the data using positional and labeled indices and also according to some
conditions.
● iloc: It is an indexed-based selecting method which requires an integer index to select a specific
item. If char/string based indices are used, positional based indices (0 to length-1) can be used
with this method.
● loc: It is also an indexed-based selecting method which requires a labeled index to select a specific
item. Labeled indices can be of type integer or char/string.
Logical conditions with loc: It can also take logical conditions, such as
my_series.loc[my_series > 50], which will retrieve all the values which are greater than 50.
Logical conditions with iloc: It can not take logical conditions directly similar to loc as it only
accepts a list of Boolean values whereas logical conditions on Series return a boolean Series.
This limitation can be bypassed by converting this filtered Series into a list using the inbuilt
list() function.
e.g. my_series.iloc[list(my_series > 50)]
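The two ways of filtering a Series described above can be compared side by side. This is a minimal sketch; the data values and the names by_label and by_position are chosen only for illustration.

```python
import pandas as pd

my_series = pd.Series([40, 60, 80, 20], index=['a', 'b', 'c', 'd'])

# loc accepts the boolean Series produced by the condition directly
by_label = my_series.loc[my_series > 50]

# iloc is given a plain list of True/False values instead
by_position = my_series.iloc[list(my_series > 50)]

print(by_label)
print(by_position)
```

Both statements select the items labelled 'b' and 'c'.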
1.3 Dataframe
(creation - from dictionary of Series, list of dictionaries, Text/CSV files;
display; iteration; Operations on rows and columns: add, select, delete, rename; Head and Tail
functions; Indexing using Labels, Boolean Indexing)
DataFrame Creation: There are different ways in which a DataFrame can be created in Pandas. To
create or use DataFrame, we first need to import the Pandas library.
● Empty DataFrame:
import pandas as pd
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
● Using Dictionary of Series: DataFrame can be created from a dictionary of Series where each key
represents column index label and value (Series) represents data of that particular column. While
creation, index matching is implemented and all the missing values are filled in with NaN by
default.
import pandas as pd
data = {'col1':pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
print(df)
Output:
col1 col2
a 1.0 4
b 2.0 5
c 3.0 6
d NaN 7
● Using List of Dictionaries: It can be created using a list of dictionaries where each item
(dictionary) represents a row. Keys of each dictionary act as the column label index for each row.
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 'KVS', 'c': 5},
{'a': 6, 'b': 7, 'c': 8}]
df = pd.DataFrame(data)
print(df)
Output:
a b c
0 1 2 NaN
1 3 KVS 5.0
2 6 7 8.0
● Using Text/CSV Files: It can be created from a text or CSV file (a CSV file is a type of text file)
with the built-in read_csv() or read_table() functions.
import pandas as pd
df = pd.read_table('data.txt', header=None, delim_whitespace=True)
print(df)
Output:
0 1 2
0 1 2 4
1 3 KVS 5
2 6 7 8
Note: read_table() will throw an error if all the rows do not have the same no. of data items.
Parameter sep=" " can also be used in place of delim_whitespace=True (recent pandas versions
deprecate delim_whitespace in favour of sep=r"\s+"). read_table() can also be used to
read a CSV file by setting the parameter sep=",".
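To try the sep="," option without creating a file, the comma-separated text can be wrapped in io.StringIO, which pandas accepts in place of a file name. The contents of csv_text are assumed sample data mirroring the table shown above.

```python
import io
import pandas as pd

# Assumed sample data: three rows of comma-separated values, no header line
csv_text = "1,2,4\n3,KVS,5\n6,7,8"

# sep=',' makes read_table() split each line on commas, as read_csv() does by default
df = pd.read_table(io.StringIO(csv_text), header=None, sep=',')
print(df)
```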
Iteration on DataFrame:
● Row-wise: .iterrows() function can be used to iterate over the DataFrame row-wise. It is a
generator which yields both the Row Label Index and Row Data (as a Series)
import pandas as pd
data = {'col1':pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
# access rows using iteration
for index, row in df.iterrows():
print(index, row['col1'], row['col2'])
Output:
a 0 4
b 1 5
c 2 6
d 3 7
● Column-wise: .items() function can be used to iterate over the DataFrame column-wise (older
pandas versions also offered .iteritems(), which has been removed in pandas 2.0). It is a
generator which yields both the Column Label Index and Column Data (as a Series).
import pandas as pd
data = {'col1':pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd']),
'col2':pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
# access columns using iteration
for col_index, col_data in df.items():
print(col_index)
print(col_data)
Output:
col1
a 0
b 1
c 2
d 3
Name: col1, dtype: int64
col2
a 4
b 5
c 6
d 7
Name: col2, dtype: int64
Head & Tail: These functions are similar for DataFrame as that of Series i.e. used to retrieve small
amounts of data from the front or rear end and can be used to peek at the type of data stored in the
DataFrame.
● Head: head(n) function will return the first n rows of the DataFrame. If the value for n is not
passed, then by default n takes 5 and the first five rows will be displayed.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15, 16, 17, 18, 19]),
'col2':pd.Series([21, 22, 23, 24, 25, 26, 27, 28, 29])}
df = pd.DataFrame(data)
print(df.head(2))
print(df.head())
Output:
col1 col2
0 11 21
1 12 22
col1 col2
0 11 21
1 12 22
2 13 23
3 14 24
4 15 25
● Tail: tail(n) function will return the last n rows of the DataFrame. If the value for n is not passed,
then by default n takes 5 and the last five rows will be displayed.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15, 16, 17, 18, 19]),
'col2':pd.Series([21, 22, 23, 24, 25, 26, 27, 28, 29])}
df = pd.DataFrame(data)
print(df.tail(2))
print(df.tail())
Output:
col1 col2
7 18 28
8 19 29
col1 col2
4 15 25
5 16 26
6 17 27
7 18 28
8 19 29
Indexing using Labels: There are two methods available for Indexing using Labels on DataFrame i.e.,
loc and iloc. These are used in filtering the data using positional and labeled indices and also
according to some conditions and works similar to that of the Series.
S. N. 1, Case: A value
loc: Pair of labels and/or integers for row and column
e.g. .loc['c', 'col1'] or .loc[1, 'col1']
Note: Here, integer indices are not positional indices if set manually to something else
iloc: Pair of integers for row and column
e.g. .iloc[1, 2]
S. N. 2, Case: Multiple rows/cols
loc: list(s) of labels for rows and/or columns
e.g. .loc[['a', 'd'], 'col1']
iloc: list(s) of integers for rows and/or columns
e.g. .iloc[2, [1, 2]]
Logical conditions with loc: It can also take logical conditions, such as df.loc[df.col2 > 23, :],
which will retrieve all the columns of the rows where the value in col2 is greater than 23.
Logical conditions with iloc: It can not take logical conditions directly similar to loc as it only
accepts a list of Boolean values whereas logical conditions on Series return a boolean Series.
This limitation can be bypassed by converting this filtered Series into a list using the inbuilt
list() function.
e.g. df.iloc[list(df.col2 > 23), :]
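The cases listed above can be tried on a small DataFrame. This is a sketch on assumed sample data; the variable names on the left-hand side are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [11, 12, 13, 14],
                   'col2': [21, 22, 23, 24],
                   'col3': [31, 32, 33, 34]},
                  index=['a', 'b', 'c', 'd'])

# Case 1: a single value
one_by_label = df.loc['c', 'col1']        # row 'c', column 'col1'
one_by_position = df.iloc[1, 2]           # second row, third column

# Case 2: multiple rows/columns
rows_by_label = df.loc[['a', 'd'], 'col1']
cols_by_position = df.iloc[2, [1, 2]]

# Logical condition: loc takes the boolean Series, iloc takes a list of booleans
filtered_loc = df.loc[df.col2 > 22, :]
filtered_iloc = df.iloc[list(df.col2 > 22), :]

print(one_by_label, one_by_position)
print(filtered_loc)
```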
Boolean Indexing: To use this feature, the indices (column and/or row labels) of the DataFrame
must consist of Boolean values (True or False) only.
import pandas as pd
data = {'col1':pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e']),
'col2':pd.Series([21, 22, 23, 24, 25], index=['a', 'b', 'c', 'd', 'e']),
'col3':pd.Series([31, 32, 33, 34, 35], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(data)
tempCols = {'col1' : True,
'col2' : False,
'col3' : True}
tempRows = {'a' : True,
'b' : False,
'c' : True,
'd' : True,
'e' : False}
df.rename(columns=tempCols, index=tempRows, inplace=True)
print(df)
print()
print(df.loc[False, True])
Output:
True False True
True 11 21 31
False 12 22 32
True 13 23 33
True 14 24 34
False 15 25 35
True True
False 12 32
False 15 35
Note: In to_csv(), the header and index parameters take Boolean values, i.e. True or False, to
determine whether the column label index and row label index respectively are stored along with
the data in the CSV file or not.
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 'KVS', 'c': 5},
{'a': 6, 'b': 7, 'c': 8}]
df = pd.DataFrame(data)
df.to_csv('data.csv', header=True, index=True)
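The effect of header and index is easiest to see when to_csv() is called without a file name, in which case it returns the CSV text as a string instead of writing a file. The sample data below is assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2},
                   {'a': 3, 'b': 4}])

# Column labels and row labels are both written
with_labels = df.to_csv(header=True, index=True)

# Only the data values are written
without_labels = df.to_csv(header=False, index=False)

print(with_labels)
print(without_labels)
```

In the first string, the header line starts with an empty field because the row label column has no name.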
1.5 Data Visualisation
(Purpose of plotting; drawing and saving following types of plots using Matplotlib – line plot,
bar graph, histogram. Customising plots: adding label, title, and legend in plots)
1.5.1 Introduction
- What do we mean by Data Visualisation?
Data visualisation is the representation of data through use of common graphics, such as charts, plots,
infographics, and even animations.
- Examples of Visualisation?
Everyday objects around us give a lot of information using visual cues. Take for
example traffic signals: Red, Orange and Green. Everyone who knows basic traffic
rules knows what these colours represent. Similarly, ultrasound reports, an atlas
(book of maps), the speedometer of a vehicle etc. all convey information very easily
by taking advantage of the human ability to understand pictures better than words.
There's also an old saying, "A picture is worth a thousand words."
The above infographic is taken from data.gov.in, one of many such websites that provide data for
researchers, students and enthusiasts.
1.5.2 Plotting using Matplotlib
- Installing the matplotlib library
pip install matplotlib
- What is pyplot?
pyplot is a module of matplotlib, which contains a collection of functions that can be used to work
on a plot.
- Anatomy of a figure:
The plot() function of the pyplot module is used to create a figure. A figure is the overall window
where the outputs of pyplot functions are plotted. A figure contains a plotting area, legend, axis labels,
ticks, title, etc.
Function Description
xticks([ticks, labels]) Get or set the current tick locations and labels of the x-axis
yticks([ticks, labels]) Get or set the current tick locations and labels of the y-axis
1.5.4 Line Plot: Line plot shows how data changes over time or space. The x-axis shows time or
distance. Ex. A line plot could be used to show the changes in a country's employment structure over
time. (It is best used to showcase gradual changes.)
- a simple line plot example with formatting: plot() method is used to draw the line plot.
import matplotlib.pyplot as plt
x=[1970,1980,1990,2000,2010]
y1=[350,480,270,620,300]
y2=[400,550,600,50,150]
plt.plot(x,y1,'o-k',label='angola')
plt.plot(x,y2,'s:b',label='zimbabwe')
plt.xlabel('decade')
plt.ylabel('gdp in millions')
plt.title('gdp of Angola and zimbabwe')
plt.legend()
plt.show()
- customisation: use the fmt(string) parameter to change the marker,line and colour of the line plot.
fmt = '[marker][line][color]'
- Note: line plot is used to indicate the change of an entity over the period of time.
Marker
. Point
o Circle
s Square
+ Plus
x Cross
X Cross (filled)
Line
solid or -
dotted or :
dashed or --
dashdot or -.
Colour
‘b’ blue
‘g’ green
‘r’ red
‘c’ cyan
‘m’ magenta
‘y’ yellow
‘k’ black
‘w’ white
1.5.5 Bar Plot: Bar plot shows grouped data as rectangular bars, e.g. the number of tourists visiting a
resort each month. Lines are unable to efficiently depict comparison among multiple entities. In order
to show comparisons, we prefer Bar charts.
Source: https://www.bbc.co.uk/bitesize/guides/z2qpg82/revision/1
-method(s):
bar() method is used to draw the vertical bar plot.
barh() method is used to draw the horizontal bar plot.
import numpy as np
import matplotlib.pyplot as plt
x=np.array([1970,1980,1990,2000,2010])
A=[350,480,270,620,300]
Z=[400,550,600,50,150]
plt.bar(x-1,A,width=2,label='Angola')
plt.bar(x+1,Z,width=2,label='Zimbabwe')
plt.xlabel('decade')
plt.ylabel('gdp in millions')
plt.title('gdp of Angola and zimbabwe')
plt.legend()
plt.show()
- customisation:
● color: The colors of the bar faces. {Red,Green, Blue etc}
● edgecolor: The colors of the bar edges. {Red,Green, Blue etc}
● linewidth: Width of the bar edge(s).{numeric values}
● linestyle: Changing the edge line style. {'-', '--', '-.', ':',}
1.5.6 Histogram: Histograms are similar to bar charts, but they show frequencies rather than groups
of data. A histogram could be used to show frequencies of earthquakes of each magnitude on the
Richter scale. Unlike a bar chart, where we use discrete values for comparison, a histogram can be
used to show continuous values (for example, the height of people).
Source: https://www.bbc.co.uk/bitesize/guides/z2qpg82/revision/1
- a simple histogram example with formatting: hist() method is used to draw the histogram.
import pandas as pd
import matplotlib.pyplot as plt
# the data set is expected to contain a 'Height' column
df=pd.read_csv('https://bit.ly/3EP8BAI')
plt.hist(x=df['Height'], bins=8,
histtype = 'bar',
orientation = 'vertical')
plt.xlabel('Height in cm')
plt.ylabel('No. of students')
plt.title('distribution of height of students from class XI')
plt.show()
- parameters:
● histtype: Changing the representation of histogram. {'bar', 'step', 'stepfilled'}
● orientation: {'horizontal', 'vertical'}
● x: Input values, this takes either a single array or a sequence of arrays.
● bins: int or sequence or str
- If bins is an integer, it defines the number of equal-width bins in the range.
- If bins is a sequence, it defines the bin edges, including the left edge of the first bin
and the right edge of the last bin.
● weights: An array of weights, of the same shape as x. Each value in x only contributes its
associated weight towards the bin count.
● cumulative: True/False. When True, it will plot the cumulative counts, i.e. each bin also
includes the counts of all bins for smaller values.
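The bins and cumulative parameters can be checked through the values that hist() returns (the counts, the bin edges and the drawn patches). This is a sketch on assumed sample heights; the 'Agg' backend is selected only so that the example runs without opening a window.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Assumed sample data: heights of ten students in cm
heights = [150, 152, 155, 158, 160, 161, 163, 165, 170, 172]
bin_edges = [150, 160, 170, 180]  # a sequence defines the bin edges

counts, edges, _ = plt.hist(heights, bins=bin_edges)
plt.figure()  # start a new figure for the cumulative version
cumulative_counts, _, _ = plt.hist(heights, bins=bin_edges, cumulative=True)

print(counts)             # [4. 4. 2.]
print(cumulative_counts)  # [ 4.  8. 10.]
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both. That is why 160 is counted in the second bin and 170 in the third.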
When done right, data visualisation is a great way to display large amounts of information simply and
intuitively. However, in order to ensure that visualisations are effective, it’s important to follow a few
important standards and avoid a few all-too-common mistakes.
- Do’s:
● Keep it simple!
- Don'ts:
Via WT Visualizations
● Bar Charts:
● Histogram
Functions in SQL
Function :
A function is a predefined set of commands that performs some operation and returns a single
value. A function can have single, multiple or no arguments at all.
1. Single Row Functions : These functions work on a single value at a time and produce a result
for each value they operate on. They may obtain the data as an argument or from the value of a
column of a table specified as an argument.
Single row functions can be further classified into various types. A few of them are :
a. Text Function:
b. Math Functions
c. Date and Time functions
We will be using the following table to demonstrate various functions in the coming text.
Table : emp
Empid Ename Job Sal DeptNo Date_of_joining
1425 Jack Manager 1500 10 1978-12-22
1422 Jill Manager 1600 20 1988-10-22
1421 Aryaman Analyst 1550 30 1988-06-15
1427 Vikram Salesman 1200 10 1982-06-23
1429 George Salesman 1200 20 1983-06-17
1477 Sandeep NULL 1500 NULL 1999-12-12
a. Text Functions : Text functions generally perform an operation on a string input value and
return a string or numeric value. Various text functions are described below:
i. Upper() : It converts the given string to upper case.
Select Upper('f/kvschool/2001-12/LIB/22');
Output:
F/KVSCHOOL/2001-12/LIB/22
ii. Lcase() : It converts the given string to lower case.
Select lcase('f/kvschool/2001-12/LIB/22');
Output:
f/kvschool/2001-12/lib/22
iii. Length() : It counts the number of characters in a given string. It includes all upper and
lower case alphabets, digits, spaces and other special characters.
Examples:
Select length('Hockey is our national game');
output:
27
Select length('f/kvschool/2001-12/LIB/22');
Output:
25
iv. Left() : It extracts N characters from the left side of a given String.
Syntax: left(String, No. of characters to be extracted)
Examples :
Select Left('Orange', 3);
Output :
Ora
Note: If the number of characters to be extracted is more than the length of the string, the
left function returns the same string without any leading or trailing spaces.
left(Ename,4)
Jack
Vikr
v. Right() : It extracts N characters from the right side of a given string.
Examples :
Select Right('Orange', 3);
Output :
nge
Select Right('Orange', 10);
Output :
Orange
Note: If the number of characters to be extracted is more than the length of the string, the
right function returns the same string without any leading or trailing spaces.
vi. Mid() : It extracts a substring from a given string, starting at a given position.
Note :
● Providing the length/no. of characters to be extracted is optional. In case it is not provided,
the function extracts all characters from the given position till the end of the string.
● If the second argument (start) is negative, it will count from the right side of the
string.
● No. of characters can not be negative; an empty string is returned if a negative value is supplied.
mid(date_of_joining, 6,2)
10
06
vii. Instr() : returns the position of the first occurrence of a string in another string.
Example:
Note: if the substring is not found in the main string, Instr() returns 0.
India Shining##
Note: assume there are three spaces before and two spaces after India Shining
respectively. Spaces have been represented by #
###India Shining
x. Trim() : Removes both leading (left) and Trailing (right ) Spaces from a given string.
Select Trim('###India Shining##');
India Shining
13
b. Math Functions:
i. power(x,y)/pow(x,y): It returns x raised to the power of y (x^y).
select power(2,3);
power(2,3)
8
select pow(-1,5);
pow(-1,5)
-1
select pow(-1,4);
pow(-1,4)
1
select pow(10,-2);
pow(10,-2)
0.01
Note: Here the concept of negative power will be applied.
ii. Round(N,D) : Rounds the number N to D decimal places (by default D=0, if not
specified). A negative value of D rounds to the left of the decimal point.
select round(4534.9767);
round(4534.9767)
4535
select round(4534.9767,0);
round(4534.9767,0)
4535
select round(4534.97378778,2);
round(4534.97378778,2)
4534.97
select round(4534.97578778,2);
round(4534.97578778,2)
4534.98
select round(4534.997,2);
round(4534.997,2)
4535.00
select round(4534.997,4);
round(4534.997,4)
4534.9970
Select round(4534.997,-1);
round(4534.997,-1)
4530
select round(4584.997,-2);
round(4584.997,-2)
4600
iii. Mod(x,y) : It returns the remainder of x divided by y.
select mod(13,5);
mod(13,5)
3
select mod(6,10);
+-----------+
| mod(6,10) |
+-----------+
| 6 |
+-----------+
c. Date and Time Functions:
i. Now() : returns the current date and time, as "YYYY-MM-DD HH:MM:SS" (string)
Select Now();
+---------------------+
| now() |
+---------------------+
| 2022-10-19 18:31:32 |
+---------------------+
Assuming that the current date in the system is 19-Oct-2022 and time is 6:31pm
ii. Date() : returns the date part(yyyy-mm-dd) of date time value supplied as argument.
iii. Day() : returns the day part of the date/date-time value supplied as argument.
select day('1978-02-23');
+-------------------+
| day('1978-02-23') |
+-------------------+
| 23 |
+-------------------+
iv. Month() : returns the month part for a given date/date-time.
select month('1978-02-23');
+---------------------+
| month('1978-02-23') |
+---------------------+
| 2 |
+---------------------+
v. Year() : returns the year part for a given date/date-time.
select year('1978-02-23');
+--------------------+
| year('1978-02-23') |
+--------------------+
| 1978 |
+--------------------+
vi. MonthName() : returns the name of the month for a given date/date-time.
select monthname('2017-09-14');
+-------------------------+
| monthname('2017-09-14') |
+-------------------------+
| September |
+-------------------------+
vii. DayName() : returns the Day Name corresponding to date/date-time value supplied as
argument.
select dayname('2017-09-14');
+-----------------------+
| dayname('2017-09-14') |
+-----------------------+
| Thursday |
+-----------------------+
Sum(sal)
7050
ii. COUNT(): the COUNT() function returns the number of rows that matches a specified
criterion.
● Takes only one argument, which can be a column name or *.
● Count doesn’t count Null Values.
● Count(*) counts the number of rows in the table. A row is counted even if all the values in
the row are null values.
Count(*)
Count(job)
Note: The job column has 6 values, but Count(job) returns 5 as one of the values is null.
1410.0000
Max(sal)
1600
Max(ename)
Vikram
Max(Date_of_joining)
1988-10-22
Min(sal)
1200
Aryaman
Min(Date_of_joining)
1978-12-22
Group by :
The GROUP BY clause is used in SQL with the SELECT statement to organize rows that have the same
values into groups. Aggregate functions then combine the records of each group into a single result row.
deptno sum(sal)
10 2700
20 2800
30 1550
Having clause : The HAVING clause was added to SQL because the WHERE keyword cannot be used
with aggregate functions.
deptno sum(sal)
10 2700
20 2800
WHERE clause: It is used to filter the records from the table, or while joining more than one
table. Only those records that satisfy the condition specified in the WHERE clause are extracted.
The WHERE clause is used before GROUP BY.
HAVING clause: It is used to filter the records from the groups based on the condition given in
the HAVING clause. Only those groups that satisfy the given condition appear in the final result.
The HAVING clause is used after GROUP BY.
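The difference between WHERE and HAVING can be tried from Python with the built-in sqlite3 module. This is only a sketch: SQLite is used here in place of MySQL because it needs no server, the emp rows are copied from the table above, and the two queries (including the threshold of 2000) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # temporary in-memory database
conn.execute('CREATE TABLE emp (Empid INTEGER, Ename TEXT, Job TEXT, '
             'Sal INTEGER, DeptNo INTEGER, Date_of_joining TEXT)')
rows = [(1425, 'Jack', 'Manager', 1500, 10, '1978-12-22'),
        (1422, 'Jill', 'Manager', 1600, 20, '1988-10-22'),
        (1421, 'Aryaman', 'Analyst', 1550, 30, '1988-06-15'),
        (1427, 'Vikram', 'Salesman', 1200, 10, '1982-06-23'),
        (1429, 'George', 'Salesman', 1200, 20, '1983-06-17'),
        (1477, 'Sandeep', None, 1500, None, '1999-12-12')]
conn.executemany('INSERT INTO emp VALUES (?, ?, ?, ?, ?, ?)', rows)

# WHERE removes individual rows before the groups are formed
where_result = conn.execute(
    "SELECT DeptNo, SUM(Sal) FROM emp WHERE Job = 'Salesman' "
    "GROUP BY DeptNo ORDER BY DeptNo").fetchall()

# HAVING removes whole groups after the aggregate has been computed
having_result = conn.execute(
    "SELECT DeptNo, SUM(Sal) FROM emp "
    "GROUP BY DeptNo HAVING SUM(Sal) > 2000 ORDER BY DeptNo").fetchall()

print(where_result)   # [(10, 1200), (20, 1200)]
print(having_result)  # [(10, 2700), (20, 2800)]
conn.close()
```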
Order by :
The ORDER BY clause is used to sort the query result-set in ascending or descending order. It sorts the
records in ascending order by default. To sort the records in descending order, use the DESC keyword.
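A short sketch of ORDER BY, again using Python's built-in sqlite3 module in place of MySQL. Only the Ename and Sal columns of the emp table are recreated here, and the second sort key (Ename) is an assumption added so that rows with equal salaries come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # temporary in-memory database
conn.execute('CREATE TABLE emp (Ename TEXT, Sal INTEGER)')
conn.executemany('INSERT INTO emp VALUES (?, ?)',
                 [('Jack', 1500), ('Jill', 1600), ('Aryaman', 1550),
                  ('Vikram', 1200), ('George', 1200), ('Sandeep', 1500)])

# Ascending order is the default
ascending = [name for (name,) in
             conn.execute('SELECT Ename FROM emp ORDER BY Sal, Ename')]

# DESC reverses the order for the column it follows
descending = [name for (name,) in
              conn.execute('SELECT Ename FROM emp ORDER BY Sal DESC, Ename')]

print(ascending)   # ['George', 'Vikram', 'Jack', 'Sandeep', 'Aryaman', 'Jill']
print(descending)  # ['Jill', 'Aryaman', 'Jack', 'Sandeep', 'George', 'Vikram']
conn.close()
```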
PAN (Personal Area Network): A PAN is a network of personal devices (i.e., Mobiles,
Laptops, Printers and other IoT Devices). It can be set up using guided media (USB cable) or
unguided media (Bluetooth, Infrared, WiFi, RFID, NFC, Hotspots etc.).
▪ The Internet is the largest WAN that connects billions of computers, smartphones and millions
of LANs from different continents.
Modem:
▪ Stands for 'MOdulator (Conversion from Digital Data to Analog Signal) DEModulator (from
Analog Signal to Digital Data)'.
▪ Modems are connected to both the source and destination nodes
▪ The modem at the sender’s end acts as a modulator that converts the digital data into analog
signals. The modem at the receiver’s end acts as a demodulator that converts the analog signals
into digital data for the destination node.
Repeater
▪ Data is carried in the form of signals over the cable.
▪ Signals lose their strength beyond a certain limit and become weak. The weakened signal
appearing on the cable is regenerated and put back on the cable by a repeater.
▪ Signal limit for various wired media is :
o 100 metres (Ethernet Cable),
o 500 metres (Coaxial Cable),
o Over 100 kms (Optical Fibre)
Hub
▪ An Ethernet hub is a network device used to connect different devices through wires.
▪ Data arriving on any of the lines are sent out on all the others.
▪ The limitation of hub is that if data from two devices come at the same time, they will collide
Types of Hub-
Passive Hub: This type does not amplify or boost the signal. It does not manipulate or
view the traffic that crosses it.
Active Hub: It amplifies the incoming signal before passing it to the other ports.
Switch
▪ Like a hub, a network switch is used to connect multiple computers or communicating devices.
▪ When data arrives, the switch extracts the destination address from the data packet and looks
it up in a table to see where to send the packet. Thus it sends signals to only selected devices
instead of sending to all.
▪ It can forward multiple packets at the same time.
Difference between Hub and Switch: The main difference between a hub and a switch is that a hub
replicates what it receives on one port to all the other ports, while a switch keeps a record of the
MAC addresses of the devices attached to it and forwards each data packet only to the port of the
device it is addressed to. That is why a switch is also called an intelligent hub.
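The forwarding behaviour of the two devices can be modelled in a few lines of Python. This is a simplified sketch and not how real hardware is programmed: the function and class names are made up for illustration, and the assumption that a switch sends a packet to all other ports when it has not yet seen the destination address is added for this example.

```python
def hub_forward(in_port, ports):
    """A hub repeats the data on every port except the one it arrived on."""
    return [p for p in ports if p != in_port]

class Switch:
    """Toy model of a switch that remembers which MAC address is on which port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port number

    def forward(self, in_port, src_mac, dst_mac):
        # Record the port on which the sender was seen
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            # Known destination: send to that single port only
            return [self.mac_table[dst_mac]]
        # Unknown destination (assumption): behave like a hub for this packet
        return [p for p in self.ports if p != in_port]

ports = [1, 2, 3, 4]
print(hub_forward(1, ports))          # [2, 3, 4]

sw = Switch(ports)
print(sw.forward(1, 'AA', 'BB'))      # [2, 3, 4]  'BB' not seen yet
print(sw.forward(2, 'BB', 'AA'))      # [1]        'AA' was seen on port 1
print(sw.forward(1, 'AA', 'BB'))      # [2]        'BB' was seen on port 2
```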
Router
▪ A network device that can receive the data, analyse it and transmit it to other networks.
▪ Compared to a hub or a switch, a router has advanced capabilities as it can analyze the data
being carried over a network, decide or alter how it is packaged, and send it to another
network of different types.
▪ A router can be wired or wireless.
▪ A wireless router can provide Wi-Fi access to smartphones and other devices.
▪ Wi-Fi routers may perform the dual task of a router and a modem/switch
Gateway
Advantages:
● Easy to troubleshoot
● Very effective and fast.
● Fault detection and removal of faulty parts is easier.
● In case a workstation fails, the network is not affected.
Disadvantages:-
● Difficult to expand.
● More cable is required.
● The cost of hub and cables makes it expensive over others.
● In case the hub fails, the entire network stops working.
Bus Topology
Tree/Hybrid Topology:
▪ It is a hierarchical topology, in which there are multiple branches and each branch can have one
or more basic topologies like star, ring and bus.
Features of Tree Topology
● Ideal if workstations are located in groups.
● Used in Wide Area Network.
Advantages
Disadvantages
Mesh Topology
▪ Generally, each communicating device is connected with every other device in the network
Advantages:
▪ Can handle large amounts of traffic since multiple nodes can transmit data simultaneously
▪ If any node goes down, it does not affect the other nodes.
▪ More secure than other topologies as each cable carries
different data.
Disadvantages:
▪ It is the global network of computing devices including desktops, laptops, servers, tablets,
mobile phones, other handheld devices as well as peripheral devices such as printers,
scanners, etc.
Applications of Internet :
Following are some of the broad areas or services provided through Internet:
1. The World Wide Web (WWW)
2. Electronic mail (Email)
3. Chat
4. Voice Over Internet Protocol (VoIP)
The World Wide Web (WWW)
● It is an ocean of information, stored in the form of trillions of interlinked web pages and web
resources.
● A British computer scientist named Tim Berners-Lee invented the revolutionary World Wide
Web in 1990 by defining three fundamental technologies that led to the creation of the web:
● HTML — HyperText Markup Language
▪ language which is used to design standardized Web Pages so that the Web contents can
be read and understood from any computer across the globe.
● URL — Uniform Resource Locator
▪ A URL is the address of a given unique resource on the Web or address of a website. The
URL is an address that matches users to a specific resource online, such as a web page or
a media.
▪ Example: http://www.cbse.nic.in
Email
● It is one of the ways of sending and receiving message(s) using the Internet.
● An email can be sent anytime to any number of recipients anywhere.
● To use email service, one needs to register with an email service provider by creating a mail
account. These services may be free or paid.
● Some of the popular email service providers are Google (gmail), Yahoo (yahoo mail), Microsoft
(outlook), etc.
Chat
● Chatting or Instant Messaging (IM) over the Internet means communicating to people at
different geographic locations in real time through text message(s).
● With ever increasing internet speed, it is now possible to send images, documents, audio, video
as well through instant messengers.
● Applications such as WhatsApp, Slack, Skype, Yahoo Messenger, Google Talk, Facebook
Messenger, Google Hangout, etc., are examples of instant messengers.
VoIP
● Voice over Internet Protocol (VoIP) allows us to make voice calls (telephone service) over the Internet.
● VoIP works on the simple principle of converting analog voice signals into digital form and then
transmitting them over a broadband line. These services are either free or very economical.
● VoIP call(s) can be made and received using IP phones from any place having Internet access.
● WhatsApp Call, Google Meet, Microsoft Teams, Zoom, etc. are examples of VoIP.
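The analog-to-digital step can be sketched in a few lines of Python. This is a simplified illustration, not how any particular VoIP application works: a sine wave stands in for the voice signal, and the sample rate (8000 per second), tone frequency and 8-bit sample size are all assumed values.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed value)
TONE_HZ = 440        # frequency of the stand-in "voice" tone (assumed value)

def digitize(duration_ms):
    """Sample a sine wave and quantize each sample to one byte (0-255)."""
    samples = []
    for n in range(SAMPLE_RATE * duration_ms // 1000):
        t = n / SAMPLE_RATE                               # time of the n-th sample
        amplitude = math.sin(2 * math.pi * TONE_HZ * t)   # analog value in [-1, 1]
        samples.append(round((amplitude + 1) / 2 * 255))  # map to an integer 0..255
    return bytes(samples)

packet = digitize(20)   # 20 ms of sound
print(len(packet))      # 160 samples, one byte each
```

Each such block of bytes is what would then be sent over the Internet as data.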
Advantages of VoIP:
● Saves a lot of money.
● More than two people can communicate or speak.
● Supports high quality audio transfer.
● Can transfer text, image, video along with voice.
Disadvantages of VoIP:
● Does not work in the absence of an active Internet connection.
● Slow Internet connection will lead to poor quality of calls.
3.5
Website
▪ A website is a collection of multiple related web pages which are connected through
hyperlinks.
▪ A Website can be created for a particular purpose, theme or to provide a service.
▪ A website is stored on a web server.
Purpose of a Website
2. Website: Has content about various entities. Webpage: Has content about a single entity.
4. Website: Its address does not depend on the webpage address. Webpage: Its address depends on the
website address.
Web Server
▪ Used to store and deliver the contents of a website to clients such as a browser that
requests it. A web server can be software or hardware.
▪ The server needs to be connected to the Internet so that its contents can be made
accessible to others.
▪ The web browser from the client computer sends a request (HTTP request) for a page
containing the desired data or service. The web server then accepts, interprets, searches
and responds (HTTP response) to the request made by the web browser.
▪ If the server is not able to locate the page, it sends the error message (Error 404 – page
not found) to the client’s browser.
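The request and response cycle described above can be imitated with a small Python function. This is a toy model under stated assumptions, not a real web server: the `PAGES` dictionary and the function name `handle_request` are hypothetical, and only the status codes come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus

# Hypothetical pages stored on the server (path -> content).
PAGES = {
    "/index.html": "<h1>Welcome</h1>",
    "/about.html": "<h1>About us</h1>",
}

def handle_request(path):
    """Return (status code, body), the way a web server answers a browser."""
    if path in PAGES:
        return HTTPStatus.OK.value, PAGES[path]
    # The page could not be located on the server.
    return HTTPStatus.NOT_FOUND.value, "Error 404 - page not found"

print(handle_request("/index.html"))     # (200, '<h1>Welcome</h1>')
print(handle_request("/missing.html"))   # (404, 'Error 404 - page not found')
```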
Web Hosting :-
▪ An online service that enables users to publish websites or web applications on the
Internet. When users sign up for a hosting service, they basically rent some space on a
server, on which they can store all the files and data necessary for the website to
work properly.
▪ A server is a physical computer that runs without interruption so that the website is
available all the time for anyone who wants to see it.
▪ Add-ons and plug-ins are the tools that help to extend and modify the functionality of the
browser.
▪ Both the tools boost the performance of the browser, but are different from each other.
▪ A plug-in is a complete program and may be third-party software. For example, Flash and Java
are plug-ins; a Flash player is required to play Flash video in the browser. A plug-in is
installed on the host computer, can be used by the browser for multiple
functionalities, and can even be used by other applications as well.
▪ An add-on is not a complete program and so is used to add only a particular functionality to the
browser. It is also referred to as an extension in some browsers.
Cookies
▪ A cookie is a text file, containing a string of information, which is transferred by the website to
the browser when we browse it.
▪ This string of information gets stored in the form of a text file in the browser.
▪ The information stored is retransmitted to the server to recognise the user, by identifying
pages that were visited, choices that were made while browsing various menu(s) on a
particular website.
▪ It helps in customising the information that will be displayed, for example the choice of
language for browsing, allowing the user to auto login, remembering the shopping preference,
displaying advertisements of one’s interest, etc. Cookies are usually harmless and they can’t
access information from the hard disk of a user or transmit virus or malware.
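How a cookie travels between website and browser can be sketched with Python's standard `http.cookies` module. The cookie names `language` and `cart_items` and their values are hypothetical, chosen only to illustrate the idea.

```python
from http.cookies import SimpleCookie

# Server side: build the cookie that is sent to the browser.
outgoing = SimpleCookie()
outgoing["language"] = "hi"
print(outgoing.output())              # Set-Cookie: language=hi

# Browser side: on a later visit, the stored string is sent back to the server.
returned_header = "language=hi; cart_items=3"

# Server side again: parse the returned string to recognise the user's choices.
incoming = SimpleCookie()
incoming.load(returned_header)
print(incoming["language"].value)     # hi
print(incoming["cart_items"].value)   # 3
```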
Unit 4: Societal Impacts
Section A (5 questions of 1 mark each)
Section B (1 question of 2 marks)
Section C (1 question of 3 marks)
Digital Footprint :
· Whenever we surf the Internet using smartphones, tablets, computers, etc., we leave a trail of
data reflecting the activities performed by us online, which is our digital footprint.
· It is the traces we leave on the internet. Our digital footprint can be created and used with or
without our knowledge.
Net Etiquettes (Netiquettes): We need to exhibit proper manners and etiquette while being online
during our social interactions.
Be Ethical: No copyright violation; share the expertise
Be Respectful: Respect privacy; respect diversity
Be Responsible: Avoid cyber bullying; don’t feed the troll
Communication Etiquettes: Good communication over email, chat rooms and other such forums
requires a digital citizen to abide by the communication etiquettes.
Be Precise; Be Polite; Be Credible; Acknowledge others
Social Media Etiquettes: There are certain etiquettes we need to follow during our presence on social
media.
Be Secure: Choose a strong password; know who you befriend; beware of fake information
Be Reliable: Think before you upload; do not fake your identity
Data Protection:
· Data protection refers to the practices, safeguards, and binding rules put in place to protect
your personal information and ensure that you remain in control of it. In this digital age, data
or information protection is mainly about the privacy of data stored digitally.
· Privacy of such sensitive data can be implemented by encryption, authentication, and other
secure methods to ensure that such data is accessible only to the authorized user and only for a
legitimate purpose.
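As one illustration of authentication, a system can store a salted hash of a password instead of the password itself, and compare hashes at login. The sketch below uses Python's standard `hashlib` and `hmac` modules; the function names, the sample passwords and the iteration count are assumptions made for this example only.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) for a password using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)   # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Check a login attempt against the stored salt and digest."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("S3cure!Pass")           # sample password, illustration only
print(verify_password("S3cure!Pass", salt, stored))   # True
print(verify_password("guess123", salt, stored))      # False
```

Only the salt and digest need to be stored, so the original password is not kept anywhere in readable form.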
Plagiarism:
· Plagiarism is the act of using or stealing someone else’s intellectual work, ideas, etc. and
passing it off as one’s own work. In other words, plagiarism is a failure to give credit to the
source (creator).
· Plagiarism is a fraud and a violation of Intellectual Property Rights (IPR). Since IPR has legal
standing, violating the owner’s rights is a legally punishable offense.
· Several ways to avoid plagiarism: be original, cite/acknowledge the source, and give credit to the
owners of the content/website.
Copyright:
· Copyright grants legal rights to creators for their original works such as writing, photographs,
audio recordings, videos, sculptures, architectural works, computer software, and other creative
works like literary and artistic work.
· Copyright law gives the copyright holder a set of rights that they alone can avail legally. It
prevents others from copying, using or selling the work. For example, writer Rudyard Kipling
held the copyright to his book, ‘The Jungle Book’, which tells the story of Mowgli, the jungle
boy.
· To use others’ copyrighted material, one needs to obtain a license from them.
Trademark:
Trademark includes any visual symbol, word, name, design, slogan, label, etc., that
distinguishes the brand or commercial enterprise, from other brands or commercial
enterprises. For example, no company other than Nike can use the Nike brand to sell shoes or
clothes.
Patent:
A patent is usually granted for inventions. Unlike copyright, the inventor needs to apply (file)
for patenting the invention. When a patent is granted, the owner gets an exclusive right to
prevent others from using, selling, or distributing the protected invention. A patent gives full
control to the patentee to decide whether and how others may use the invention.
License:
· Licensing and copyrights are two sides of the same coin.
· A license is a type of contract or permission agreement between the creator of an original
work and a user, permitting the user to use the work, generally for some price; whereas copyright is the
legal right of the creator to protect original work of different types.
· Licensing is the legal term used to describe the terms under which people are allowed to use
the copyrighted material.
· A software license is an agreement that provides legally binding guidelines pertaining to the
authorized use of digital material.
Hacking:
· Hacking is the act of unauthorized access to a computer, computer network or any digital
system. Hackers usually have technical expertise in hardware and software. They look for bugs
to exploit and break into the system.
· The primary focus of unethical hacking is on security cracking and data stealing, identity theft,
monetary gain and leaking of sensitive data; hence it is a punishable offense under the IT Act.
· To avoid hacking: install antivirus/firewall software, regularly update the OS, do not download from
untrusted websites, use strong passwords, secure the wireless network and use secure websites.
· Two kinds: ethical hacking by a White Hat hacker (freelancer or hired by an organization or Govt.) and
unethical hacking by a Black Hat hacker (freelance individual or group).
Phishing:
· Phishing is an activity where fake websites or emails that look original or authentic are
presented to the user to fraudulently collect sensitive and personal details, particularly
usernames, passwords, banking and credit card details. It is therefore an unlawful act.
· Do not open links received from untrusted emails/websites/SMS, and do not reveal sensitive
information (username, password, OTP, etc.) on phone calls or on social media platforms.
· Generally, a phishing message carries a URL that resembles the name of a famous website (for example,
jio2021.com) along with very lucrative offers such as free Internet for a year. When the link is clicked,
a fake website opens and steals the user’s data or infects the user’s device with viruses. This may
lead to identity theft.
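A look-alike address can be told apart from a genuine one by comparing the host name in the URL with a list of trusted domains. The sketch below is a simplified illustration: the trusted list, with `jio.com` as its only entry, and the function name `is_trusted` are assumptions made for this example, and real phishing detection needs far more than this check.

```python
from urllib.parse import urlparse

# Assumed list of genuine domains; "jio.com" is used only as an illustration.
TRUSTED_DOMAINS = {"jio.com"}

def is_trusted(url):
    """Return True only if the URL's host is a trusted domain or its subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

print(is_trusted("https://www.jio.com/offers"))         # True
print(is_trusted("http://jio2021.com/free-internet"))   # False
print(is_trusted("http://jio.com.free-gift.example"))   # False
```

The last two addresses contain the familiar name but belong to different domains, which is exactly the trick phishing links rely on.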
Identity theft:
· Identity theft occurs when someone uses our personal information, such as our name, license, or Unique ID
number, without our permission to commit a crime or fraud.
· Common ways in which identity can be stolen: Data Breaches, Internet Hacking, Malware, Credit
Card Theft, Mail Theft, Phishing and Spam Attacks, Wi-Fi Hacking, Mobile Phone Theft, ATM
Skimmers.
· How to protect identity online: use up-to-date security software, try to spot spam/scams, use
strong passwords, monitor credit scores, and use only reputable websites when making purchases.
Cyber Bullying:
· Any insulting, degrading or intimidating online behavior, such as repeated posting of rumors,
giving threats online, posting the victim’s personal information, sexual harassment or
comments aimed at publicly ridiculing a victim, is termed cyber bullying.
· Technology is used to harass, threaten or humiliate a target. Examples of cyber bullying are
sending mean texts, posting false information about a person online, or sharing embarrassing
photos or videos. Different types of cyber bullying: Doxing, Harassment, Impersonation,
Cyberstalking.
· We may prevent cyber bullying by limiting the information we share online, not feeding the
troll, thinking before sharing credentials with others on an online platform, keeping personal
information safe, and avoiding unnecessary comments and posts.
Cyber Crime:
· Cyber crime is defined as a crime in which a computer is the medium of crime (hacking, phishing,
spamming), or the computer is used as a tool to commit crimes (extortion, data breaches,
theft).
· In such crimes, either the computer itself is the target or the computer is used as a tool to
commit a crime.
· Cyber crimes are carried out against an individual, a group, an organization or
even a country, with the intent to directly or indirectly cause physical harm, financial
loss or mental harassment.
Crimes Against Individual: Cyber harassment and stalking, distribution of child pornography, various
types of spoofing, credit card fraud, human trafficking, identity theft, etc.
Crimes Against Group/Organization: These crimes include DoS and DDoS attacks, hacking, virus
transmission, computer vandalism, copyright infringement, and IPR violations.
Crimes Against Country: These include hacking, accessing confidential information, cyber warfare, cyber
terrorism, and piracy (loss of revenue).
Cyber law:
Cyber law is the “law governing cyberspace”. It covers freedom of expression, access to and usage of the
Internet, and online privacy. The issues addressed by cyber law include cybercrime, e-commerce, IPR and
data protection.
Indian IT Act:
The Government of India’s Information Technology Act, 2000 (also known as the IT Act), amended in
2008, provides guidelines to the user on the processing, storage and transmission of sensitive
information.
The Indian IT Act, 2000, with its amendment in 2008, is the cyber law of India. It covers:
· Guidelines on the processing, storage and transmission of sensitive information
· Cyber cells in police stations where one can report any cybercrime
· Penalties, compensation and adjudication via cyber tribunals
Management of e-waste
E-waste management is the efficient disposal of e-waste. Although we cannot completely
destroy e-waste, certain steps and measures have to be taken to reduce harm to
humans and the environment. Some of the feasible methods of e-waste management are reduce,
reuse and recycle.
• Reduce: We should try to reduce the generation of e-waste by
purchasing the electronic or electrical devices only according to our
need.
• Reuse: It is the process of re-using the electronic or electric waste
after slight modification.
• Recycle: Recycling is the process of converting electronic devices
into something that can be used again and again in some or the other
manner.