sql class - Copy

A database table is a collection of related data entries organized in columns and rows, where columns represent specific information about records and rows represent individual entries. Relational Database Management Systems (RDBMS) utilize tables to define relationships between data, allowing multiple users to access the database simultaneously. Key concepts include primary keys for unique identification of records, foreign keys for maintaining relationships between tables, and various SQL aggregate functions for data manipulation.

What is a Database Table?

A table is a collection of related data entries, and it consists of columns and rows.

A column holds specific information about every record in the table.

A record (or row) is each individual entry that exists in a table.

RDBMS : A relational database defines database relationships in the form of tables. The tables are
related to each other - based on data common to each.

Sharing a database allows multiple users and programs to access the database
simultaneously.

The goal of a DBMS is to provide an environment that is both convenient and efficient to use for:

retrieving information from the database.

storing information in the database.

Databases are usually designed to manage large bodies of information. This involves:

definition of structures for information storage (data modelling).

provision of mechanisms for the manipulation of information (file and systems structure, query processing).

providing for the safety of information in the database (crash recovery and security).

concurrency control if the system is shared by users.

Components of Database Systems

(i) Data
(ii) Software
(iii) Hardware
(iv) Users

Data Abstraction

The main purpose of a database system is to provide users with an abstract view of the system. The system hides certain details of how data is stored, created and maintained, so this complexity is hidden from database users.

MySQL PRIMARY KEY Constraint


The PRIMARY KEY constraint uniquely identifies each record in a table.

Primary keys must contain UNIQUE values, and cannot contain NULL values.

A table can have only ONE primary key; and in the table, this primary key can consist of single or
multiple columns (fields).

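-- PRIMARY KEY declared as a separate constraint at the end of the column list:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);

-- Alternative form for the same table: PRIMARY KEY declared on the column itself.
-- Run only one of the two statements; a second CREATE TABLE Persons would fail.
CREATE TABLE Persons (
    ID int NOT NULL PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int
);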

MySQL FOREIGN KEY Constraint


The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.

A FOREIGN KEY is a field (or collection of fields) in one table, that refers to the PRIMARY KEY in another
table.

The table with the foreign key is called the child table, and the table with the primary key is called the
referenced or parent table.
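
The parent/child relationship can be tried out without a MySQL server. The sketch below rests on some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, it reuses the Persons table from above, and it adds a hypothetical child table called Orders whose name and columns are not part of these notes. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Parent table: Persons, as defined above.
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255),
        Age int,
        PRIMARY KEY (ID)
    )
""")

# Child table: Orders is a hypothetical table used only for this illustration.
conn.execute("""
    CREATE TABLE Orders (
        OrderID int NOT NULL,
        PersonID int,
        PRIMARY KEY (OrderID),
        FOREIGN KEY (PersonID) REFERENCES Persons(ID)
    )
""")

conn.execute("INSERT INTO Persons (ID, LastName) VALUES (1, 'Hansen')")
conn.execute("INSERT INTO Orders (OrderID, PersonID) VALUES (100, 1)")  # parent row exists

try:
    # No person with ID 99 exists, so this row would point at nothing.
    conn.execute("INSERT INTO Orders (OrderID, PersonID) VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)  # True 1
```

The second insert is refused because it would create a link to a parent row that does not exist, which is the kind of broken link the FOREIGN KEY constraint is meant to prevent.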

SQL Aggregate Functions


An aggregate function is a function that performs a calculation on a set of values,
and returns a single value.

• MIN() - returns the smallest value within the selected column
• MAX() - returns the largest value within the selected column
• COUNT() - returns the number of rows in a set
• SUM() - returns the total sum of a numerical column
• AVG() - returns the average value of a numerical column

Aggregate functions ignore NULL values. The exception is COUNT(*), which counts every row; COUNT(column) counts only the rows where that column is not NULL.
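
How NULL values affect each function is easiest to see on a tiny table. The following is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL and a cut-down JNVemployee table with three made-up rows, one of which has no salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JNVemployee (empno int, empname varchar(15), empsalr float)")
conn.executemany(
    "INSERT INTO JNVemployee (empno, empname, empsalr) VALUES (?, ?, ?)",
    [(1001, "a", 100.0), (1002, "b", 300.0), (1003, "c", None)],  # None is stored as NULL
)

row = conn.execute(
    "SELECT MIN(empsalr), MAX(empsalr), SUM(empsalr), AVG(empsalr), "
    "COUNT(empsalr), COUNT(*) FROM JNVemployee"
).fetchone()
print(row)  # (100.0, 300.0, 400.0, 200.0, 2, 3)
```

AVG() divides by 2, not 3, because the NULL salary is skipped; only COUNT(*) sees all three rows.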

-- Create the table first, then insert into it and query it.
-- empmail is widened to varchar(50) so that a full e-mail address fits.
create table JNVemployee(empno int(20), empname varchar(15), empadr varchar(20),
empmail varchar(50), empsalr float(15), empmbl int(10));

insert into JNVemployee (empno,empname,empadr,empmail,empsalr,empmbl) values
(1004, "vamshi","manchirial","[email protected]",310000,901423461);

select * from JNVemployee;

SELECT max(empsalr) from JNVemployee;

SELECT MIN(empsalr) from JNVemployee;

-- class is widened to varchar(20): "POSTGRADUATION" has 14 characters.
create table music( sno int(10), name varchar(15), class varchar(20));

insert into music (sno, name, class) values ( 101, "srinu","POSTGRADUATION");


sub1 = int(input("Enter marks of the first subject: "))
sub2 = int(input("Enter marks of the second subject: "))
sub3 = int(input("Enter marks of the third subject: "))
sub4 = int(input("Enter marks of the fourth subject: "))
sub5 = int(input("Enter marks of the fifth subject: "))
avg = (sub1 + sub2 + sub3 + sub4 + sub5) / 5  # each subject is counted once
if avg >= 90:
    print("Grade: A")
elif avg >= 80 and avg < 90:  # use "and", not "&", to combine conditions
    print("Grade: B")
elif avg >= 70 and avg < 80:
    print("Grade: C")
elif avg >= 60 and avg < 70:
    print("Grade: D")
else:
    print("Grade: F")

• List is a collection which is ordered and changeable. Allows duplicate members.
• Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
• Set is a collection which is unordered and unindexed. Its items are unchangeable, but items can be added and removed. No duplicate members.
• Dictionary is a collection which is ordered (as of Python 3.7) and changeable. No duplicate keys.

List
Lists are used to store multiple items in a single variable.

Lists are one of 4 built-in data types in Python used to store collections of data; the
other 3 are Tuple, Set, and Dictionary, all with different qualities and usage.

Lists are created using square brackets:

# The variable is called mylist rather than list, so the built-in list() stays usable.
mylist = ["apple", "banana", "cherry"]
print(mylist)

List items are ordered, changeable, and allow duplicate values.

List items are indexed: the first item has index [0], the second item has
index [1], etc.

mylist = ["apple", "banana", "cherry", "apple", "cherry"]
print(mylist)

len() function:

To determine how many items a list has, use the len() function:

mylist = ["apple", "banana", "cherry"]
print(len(mylist))

o/p= 3

List items can be of any data type:


list1 = ["apple", "banana", "cherry"]
list2 = [1, 5, 7, 9, 3]
list3 = [True, False, False]

list4 = ["abc", 34, True, 40, "male"]

mylist = ["apple", "banana", "cherry"]
print(type(mylist))

o/p= <class 'list'>

Negative indexing means start from the end: -1 refers to the last item, -2 to the second last item, and so on.

Print the last item of the list:

list3 = ["apple", "banana", "cherry"]
print(list3[-1])

o/p= cherry

Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the
range.

When specifying a range, the return value will be a new list with the specified items.

list4 = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(list4[2:5])

o/p= ['cherry', 'orange', 'kiwi']

Note: The search will start at index 2 (included) and end at index 5 (not included).

Check if apple is present in the list:

list5 = ["apple", "banana", "cherry"]
if "apple" in list5:
    print("Yes, 'apple' is in the fruits list")

Insert Items
To insert a new list item, without replacing any of the existing values, we can use
the insert() method.

The insert() method inserts an item at the specified index:

Example
Insert "watermelon" as the third item:

thislist = ["apple", "banana", "cherry"]


thislist.insert(2, "watermelon")
print(thislist)

Append Items
To add an item to the end of the list, use the append() method:

Example
Using the append() method to append an item:

thislist = ["apple", "banana", "cherry"]


thislist.append("orange")
print(thislist)

Extend List
To append elements from another list to the current list, use the extend() method.

Example
Add the elements of tropical to thislist:

thislist = ["apple", "banana", "cherry"]


tropical = ["mango", "pineapple", "papaya"]
thislist.extend(tropical)
print(thislist)

Remove Specified Item


The remove() method removes the specified item.
Example
Remove "banana":

thislist = ["apple", "banana", "cherry"]


thislist.remove("banana")
print(thislist)

If there is more than one item with the specified value, the remove() method removes
only the first occurrence:

Example
Remove the first occurrence of "banana":

thislist = ["apple", "banana", "cherry", "banana", "kiwi"]


thislist.remove("banana")
print(thislist)

Remove Specified Index


The pop() method removes the specified index.

Example
Remove the second item:

thislist = ["apple", "banana", "cherry"]


thislist.pop(1)
print(thislist)

If you do not specify the index, the pop() method removes the last item.

Example
Remove the last item:

thislist = ["apple", "banana", "cherry"]


thislist.pop()
print(thislist)
The del keyword also removes the specified index:

Example
Remove the first item:

thislist = ["apple", "banana", "cherry"]


del thislist[0]
print(thislist)

The del keyword can also delete the list completely.

Example
Delete the entire list:

thislist = ["apple", "banana", "cherry"]


del thislist

Clear the List


The clear() method empties the list.

The list still remains, but it has no content.

thislist = ["apple", "banana", "cherry"]


thislist.clear()
print(thislist)

Tuple
Tuples are used to store multiple items in a single variable.

Tuple is one of 4 built-in data types in Python used to store collections of data, the other 3 are List, Set,
and Dictionary, all with different qualities and usage.

A tuple is a collection which is ordered and unchangeable.


Tuples are written with round brackets.

tuple1 = ("apple", "banana", "cherry")
print(tuple1)

o/p= ('apple', 'banana', 'cherry')

Tuple items are ordered, unchangeable, and allow duplicate values.

Tuple items are indexed: the first item has index [0], the second item has index [1], etc.

Tuples allow duplicate values:

tuple2 = ("apple", "banana", "cherry", "apple", "cherry")
print(tuple2)

o/p= ('apple', 'banana', 'cherry', 'apple', 'cherry')

Tuple Length
To determine how many items a tuple has, use the len() function:

tuple2 = ("apple", "banana", "cherry", "apple", "cherry")
print(len(tuple2))

o/p= 5

type()

mytuple = ("apple", "banana", "cherry")
print(type(mytuple))

o/p= <class 'tuple'>

Change Tuple Values


Once a tuple is created, you cannot change its values. Tuples are unchangeable, or immutable as it
is also called.

But there is a workaround. You can convert the tuple into a list, change the list, and convert the list back
into a tuple.

Convert the tuple into a list to be able to change it:

x = ("apple", "banana", "cherry")


y = list(x)
y[1] = "kiwi"
x = tuple(y)

print(x)

After converting the tuple into a list, you can also perform the other list operations before converting back:

append()

extend()

insert()

remove()

pop()
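
As an illustration of the workaround with each of these methods, here is a minimal sketch. The fruit names added in it are made up for the example.

```python
x = ("apple", "banana", "cherry")

y = list(x)                  # tuples have no append/insert/remove, so work on a list copy
y.append("orange")           # add one item at the end
y.extend(["kiwi", "mango"])  # add every item of another iterable
y.insert(1, "melon")         # insert at index 1
y.remove("banana")           # remove the first "banana"
last = y.pop()               # remove and return the last item
x = tuple(y)                 # convert back to a tuple

print(x)     # ('apple', 'melon', 'cherry', 'orange', 'kiwi')
print(last)  # mango
```

The original tuple object is never modified; the name x is simply bound to a new tuple at the end.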

Dictionary
Dictionaries are used to store data values in key:value pairs.

A dictionary is a collection which is ordered (as of Python 3.7), changeable, and does not allow duplicate keys.

dict1 = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(dict1)

o/p= {'brand': 'Ford', 'model': 'Mustang', 'year': 1964}

Dictionary items are ordered, changeable, and do not allow duplicate keys.

Dictionary items are presented in key:value pairs, and can be referred to by using the key name.

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict["brand"])

o/p= Ford

When we say that dictionaries are ordered, it means that the items have a defined order, and that order
will not change.
Unordered means that the items do not have a defined order, you cannot refer to an item by using an
index.

Dictionaries are changeable, meaning that we can change, add or remove items after the dictionary has
been created.

Duplicates Not Allowed


Dictionaries cannot have two items with the same key. If a key is repeated, the later value overwrites the earlier one, so "year" ends up as 2020 here:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964,
"year": 2020
}
print(thisdict)

Accessing Items
You can access the items of a dictionary by referring to its key name, inside square brackets:

Get the value of the "PROC" key:

thisdict = {
"brand": "HP",
"PROC": "INTELCORE",
"year": 1972
}
x = thisdict["PROC"]

There is also a method called get() that will give you the same result:

x = thisdict.get("PROC")

The keys() method will return a view of all the keys in the dictionary.

Get the keys:

x = thisdict.keys()

The values() method will return a view of all the values in the dictionary.

Get the values:

x = thisdict.values()
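
Two details of these access methods are worth trying out: what happens when a key is missing, and the fact that keys() gives a live view instead of a fixed list. The sketch below reuses the dictionary above; the "model" and "color" keys are made up for the illustration.

```python
thisdict = {"brand": "HP", "PROC": "INTELCORE", "year": 1972}

# get() returns None (or a default you supply) for a missing key; [] raises KeyError.
missing = thisdict.get("model")
fallback = thisdict.get("model", "unknown")
print(missing, fallback)     # None unknown

keys = thisdict.keys()       # a view object, not a list
thisdict["color"] = "black"  # add an item after the view was taken
print(list(keys))            # ['brand', 'PROC', 'year', 'color']
```

Because the view follows the dictionary, wrap it in list() when a snapshot is needed.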
Change Values
You can change the value of a specific item by referring to its key name:

Change the "year" to 2018:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["year"] = 2018

Update Dictionary
The update() method will update the dictionary with the items from the given argument.

The argument must be a dictionary, or an iterable object with key:value pairs.

Update the "year" of the car by using the update() method:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.update({"year": 2020})

Adding Items
Adding an item to the dictionary is done by using a new index key and assigning a value to it:

Example


thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["color"] = "red"
print(thisdict)

Removing Items
There are several methods to remove items from a dictionary:
The pop() method removes the item with the specified key name:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.pop("model")
print(thisdict)

The popitem() method removes the last inserted item (in versions before 3.7, a random item is
removed instead):

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.popitem()
print(thisdict)

del

The del keyword removes the item with the specified key name:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict["model"]
print(thisdict)

CLEAR()

The clear() method empties the dictionary:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.clear()
print(thisdict)

Set
Sets are used to store multiple items in a single variable.
Set is one of 4 built-in data types in Python used to store collections of data, the other 3 are List, Tuple,
and Dictionary, all with different qualities and usage.

A set is a collection which is unordered and unindexed. Its items are unchangeable, but you can remove items and add new items.

Sets are written with curly brackets.


thisset = {"apple", "banana", "cherry"}
print(thisset)
Sets are unordered, so you cannot be sure in which order the items will appear.
Set items are unordered, unchangeable, and do not allow duplicate values.

Unordered
Unordered means that the items in a set do not have a defined order.

Set items can appear in a different order every time you use them, and cannot be referred to by index or
key.

Unchangeable
Set items are unchangeable, meaning that we cannot change the items after the set has been created. We can still remove items and add new ones.

Duplicates Not Allowed


Sets cannot have two items with the same value.

Duplicate values will be ignored:

thisset = {"apple", "banana", "cherry", "apple"}

print(thisset)

o/p= {'banana', 'cherry', 'apple'} (the order of the items may differ)

The values True and 1 are considered the same value in sets, and are treated as duplicates.

True and 1 are considered the same value:

thisset = {"apple", "banana", "cherry", True, 1, 2}

print(thisset)

o/p= {True, 2, 'banana', 'cherry', 'apple'} (the order of the items may differ)

The values False and 0 are considered the same value in sets, and are treated as duplicates.

False and 0 are considered the same value:

thisset = {"apple", "banana", "cherry", False, True, 0}

print(thisset)

o/p= {False, True, 'banana', 'cherry', 'apple'} (the order of the items may differ)
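
Since the printed order of a set can vary, counting the items is a more reliable way to check the duplicate rules. A minimal sketch, using a made-up set called mixed:

```python
thisset = {"apple", "banana", "cherry", "apple"}
print(len(thisset))            # 3, the second "apple" is dropped

mixed = {"apple", True, 1, 2, False, 0}
print(len(mixed))              # 4: "apple", True (covers 1), 2, False (covers 0)
print(1 in mixed, 0 in mixed)  # True True
```

The membership test still succeeds for 1 and 0 because they compare equal to True and False.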

len() function.

To determine how many items a set has, use the len() function.

thisset = {"apple", "banana", "cherry"}

print(len(thisset))

Set Items - Data Types


Set items can be of any data type:

set1 = {"apple", "banana", "cherry"}


set2 = {1, 5, 7, 9, 3}
set3 = {True, False, False}

A set can contain different data types:

set1 = {"abc", 34, True, 40, "male"}

type()
From Python's perspective, sets are defined as objects with the data type 'set':

What is the data type of a set?

myset = {"apple", "banana", "cherry"}


print(type(myset))

o/p= <class 'set'>

Access Items
You cannot access items in a set by referring to an index or a key.
But you can loop through the set items using a for loop, or ask if a specified value is present in a set, by
using the in keyword.

Loop through the set, and print the values:

thisset = {"apple", "banana", "cherry"}

for x in thisset:
    print(x)

o/p= apple

banana

cherry

(the order of the items may differ)

Check if "banana" is present in the set:

thisset = {"apple", "banana", "cherry"}

print("banana" in thisset)

o/p= True

Check if "banana" is NOT present in the set:

thisset = {"apple", "banana", "cherry"}

print("banana" not in thisset)

o/p= False

Add Items
Once a set is created, you cannot change its items, but you can add new items.

To add one item to a set use the add() method.

thisset = {"apple", "banana", "cherry"}

thisset.add("orange")

print(thisset)

Add Sets
To add items from another set into the current set, use the update() method.

thisset = {"apple", "banana", "cherry"}


tropical = {"pineapple", "mango", "papaya"}

thisset.update(tropical)

print(thisset)

Add Any Iterable


The object in the update() method does not have to be a set, it can be any iterable object (tuples, lists,
dictionaries etc.).

thisset = {"apple", "banana", "cherry"}


mylist = ["kiwi", "orange"]

thisset.update(mylist)

print(thisset)

Remove Item
To remove an item in a set, use the remove(), or the discard() method.

thisset = {"apple", "banana", "cherry"}

thisset.remove("banana")

print(thisset)
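
The difference between remove() and discard() only shows up when the item is not in the set. A short sketch, with "mango" as a made-up missing item:

```python
thisset = {"apple", "banana", "cherry"}

thisset.discard("mango")      # "mango" is not in the set: nothing happens
try:
    thisset.remove("mango")   # remove() raises KeyError for a missing item
    raised = False
except KeyError:
    raised = True

thisset.discard("banana")     # present items are removed by both methods
print(raised, sorted(thisset))  # True ['apple', 'cherry']
```

Use discard() when a missing item is acceptable, and remove() when it should be treated as an error.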

You can also use the pop() method to remove an item, but this method will remove a random item, so
you cannot be sure what item that gets removed.

The return value of the pop() method is the removed item.

thisset = {"apple", "banana", "cherry"}

x = thisset.pop()

print(x)

print(thisset)
The clear() method empties the set:

thisset = {"apple", "banana", "cherry"}

thisset.clear()

print(thisset)

The del keyword will delete the set completely:

thisset = {"apple", "banana", "cherry"}

del thisset

print(thisset)  # raises NameError, because thisset no longer exists

You can loop through the set items by using a for loop:

thisset = {"apple", "banana", "cherry"}

for x in thisset:
    print(x)

Dictionary Methods
Python has a set of built-in methods that you can use on dictionaries.

Method         Description
clear()        Removes all the elements from the dictionary
copy()         Returns a copy of the dictionary
fromkeys()     Returns a dictionary with the specified keys and value
get()          Returns the value of the specified key
items()        Returns a view containing a tuple for each key-value pair
keys()         Returns a view of the dictionary's keys
pop()          Removes the element with the specified key
popitem()      Removes the last inserted key-value pair
setdefault()   Returns the value of the specified key. If the key does not exist: insert the key, with the specified value
update()       Updates the dictionary with the specified key-value pairs
values()       Returns a view of all the values in the dictionary
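
Several of these methods (fromkeys(), setdefault(), copy(), items()) are not demonstrated elsewhere in these notes. The sketch below is one way to try them; the variable names and the "color" key are made up for the example.

```python
keys = ("brand", "model", "year")
car = dict.fromkeys(keys, None)           # every key starts with the value None
car.update({"brand": "Ford", "model": "Mustang", "year": 1964})

color = car.setdefault("color", "white")  # key missing: inserted with the given value
brand = car.setdefault("brand", "Volvo")  # key present: the existing value is returned

backup = car.copy()                       # shallow copy, unaffected by later changes
car.popitem()                             # removes the last inserted pair ("color")

print(color, brand)       # white Ford
print(list(car.items()))  # [('brand', 'Ford'), ('model', 'Mustang'), ('year', 1964)]
print("color" in backup)  # True
```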

What is Pandas?
Pandas is a Python library used for working with data sets.

It is a high-performance data analysis tool:

• Works with large data
• Supports large files in different formats
• More flexible
• Represents data in a tabular way (rows and columns)
• Works with missing data

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data" and "Python Data Analysis".

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Pandas can answer questions about the data, such as:

• Is there a correlation between two or more columns?
• What is the average value?
• Max value?
• Min value?

Pandas is also able to delete rows that are not relevant, or that contain wrong values, like empty or
NULL values. This is called cleaning the data.

Import it in your applications by adding the import keyword. Pandas is usually imported under the pd alias.

import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)  # use the pd alias, since pandas was imported as pd

print(myvar)

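In PANDAS
3 TYPES OF DATA STRUCTURES
A) SERIES
B) DATA FRAME
C) PANEL (removed from pandas in version 0.25; current versions provide only Series and DataFrame)

A) SERIES : A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Syntax : pd.Series(data, index)

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

If nothing else is specified, the values are labeled with their index number (0, 1, 2, ...).

Access a value by its label : print(myvar[2])

index: with the index argument, you can name your own labels.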

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

o/p : x 1

y 7

z 2
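
Once the labels are named, values are looked up by label. A minimal sketch using the same Series:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

print(myvar["y"])                  # 7
print(myvar[["x", "z"]].tolist())  # [1, 2]
```

Passing a list of labels returns a smaller Series containing only those entries.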

What is a DataFrame?
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows
and columns.

Syntax : pd.DataFrame(data)

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:


df = pd.DataFrame(data)

print(df)

o/p :
   calories  duration
0       420        50
1       380        40
2       390        45

Pandas uses the loc attribute to return one or more specified row(s).

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:


df = pd.DataFrame(data)

print(df)
print(df.loc[0])
o/p= calories 420
duration 50

print(df.loc[[0, 1]]) #use a list of indexes:

o/p=
   calories  duration
0       420        50
1       380        40

With the index argument, you can name your own indexes.

Add a list of names to give each row a name:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

o/p= Calories duration


day1(0) 420 50

day2(1) 380 40

day3(2) 390 45

Use the named index in the loc attribute to return the specified row(s).

print(df.loc["day2"]) #refer to the named index:

o/p = calories 380

duration 40

Name: day2, dtype: int64

Load Files Into a DataFrame


If your data sets are stored in a file, Pandas can load them into a DataFrame.

Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv') # 'data.csv' is the name of the file to load

print(df)

Read CSV Files


A simple way to store big data sets is to use CSV files (comma separated files).

CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

Load the CSV into a DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

o/p = total csv file (rows and columns)


(use to_string() to print the entire DataFrame.).

If you have a large DataFrame with many rows, Pandas will only return the first 5 rows, and the last 5
rows:

Print the DataFrame without the to_string() method:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

o/p= return the first 5 rows, and the last 5

max_rows
The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows statement.

import pandas as pd

print(pd.options.display.max_rows)

In my system the number is 60, which means that if the DataFrame contains more than 60 rows,
the print(df) statement will return only the headers and the first and last 5 rows.

You can change the maximum rows number with the same statement.

import pandas as pd

pd.options.display.max_rows = 9999

df = pd.read_csv('data.csv')

print(df)

Pandas - Analyzing DataFrames


Viewing the Data
One of the most used methods for getting a quick overview of the DataFrame is the head() method.

The head() method returns the headers and a specified number of rows, starting from the top.

Get a quick overview by printing the first 10 rows of the DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

# If the number of rows is not specified, the head() method will return the top 5 rows.
print(df.head(10))

o/p= first 10 rows of the DataFrame

Print the first 5 rows of the DataFrame:


import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())

o/p= first 5 rows

There is also a tail() method for viewing the last rows of the
DataFrame.

The tail() method returns the headers and a specified number of rows, starting from the bottom.

Print the last 5 rows of the DataFrame:

print(df.tail())

o/p= last 5 rows

Info About the Data


The DataFrame object has a method called info(), that gives you more information about the data set.

The info() method also tells us how many Non-Null values there are present in each column, and in our
data set.

Empty values, or Null values, can be bad when analyzing data, and you should consider removing rows
with empty values. This is a step towards what is called cleaning data
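
The notes do not show info() being called, so here is a minimal sketch. It assumes a small hand-made DataFrame instead of data.csv, with one empty cell in a "Calories" column; the values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Calories": [409.1, None, 340.0],   # one empty cell
})

df.info()                    # prints column names, non-null counts and data types

non_null = df.count()        # the same non-null counts, as a Series
print(non_null["Duration"])  # 3
print(non_null["Calories"])  # 2
```

A column whose non-null count is lower than the number of rows contains empty cells.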

Data Cleaning
Data cleaning means fixing bad data in your data set.

Bad data could be:

• Empty cells (The data set contains some empty cells)
• Data in wrong format (The data set contains wrong format)
• Wrong data (The data set contains wrong data)
• Duplicates (The data set contains duplicates)

Empty Cells
Empty cells can potentially give you a wrong result when you analyze data.

Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.

This is usually OK, since data sets can be very big, and removing a few rows will not have a big
impact on the result.

Return a new Data Frame with no empty cells:


import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())

# Notice in the result that some rows have been removed (rows).

#These rows had cells with empty values.


# By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument:

Remove all rows with NULL values:

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

Now, the dropna(inplace = True) will NOT return a new DataFrame, but it will remove all rows
containing NULL values from the original DataFrame.
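
The difference between the two calls can be checked without data.csv. The sketch below assumes a small hand-made DataFrame with made-up values and one empty cell.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

new_df = df.dropna()         # returns a new DataFrame; df itself is unchanged
print(len(df), len(new_df))  # 3 2

df.dropna(inplace=True)      # changes df itself and returns None
print(len(df))               # 2
```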
Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.

This way you do not have to delete entire rows just because of some empty cells.

The fillna() method allows us to replace empty cells with a value:

Replace NULL values with the number 130:

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

Replace Only For Specified Columns


The example above replaces all empty cells in the whole Data Frame.

To only replace empty values for one column, specify the column name for the DataFrame:

Replace NULL values in the "Calories" columns with the number 130:

import pandas as pd

df = pd.read_csv('data.csv')

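# Assign the result back to the column. Calling fillna(..., inplace = True) on
# df["Calories"] may not update df in recent pandas versions.
df["Calories"] = df["Calories"].fillna(130)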

Replace Using Mean, Median, or Mode


A common way to replace empty cells, is to calculate the mean, median or mode value of the column.

Pandas uses the mean() median() and mode() methods to calculate the respective values for a
specified column:

Calculate the MEAN, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)

#As you can see in row 18 and 28, the empty values from "Calories" were replaced with the mean: 304.68

Mean = the average value (the sum of all values divided by number of values).

Calculate the MEDIAN, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

df["Calories"] = df["Calories"].fillna(x)

Median = the value in the middle, after you have sorted all values ascending.

Calculate the MODE, and replace any empty values with it:

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

df["Calories"] = df["Calories"].fillna(x)

Mode = the value that appears most frequently.
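
To compare the three values side by side, the sketch below works on a small hand-made Series instead of data.csv. The numbers are made up so that the results are easy to check by hand.

```python
import pandas as pd

calories = pd.Series([100.0, 200.0, 200.0, 500.0, None])

mean = calories.mean()      # (100 + 200 + 200 + 500) / 4, the empty cell is skipped
median = calories.median()  # middle of 100, 200, 200, 500
mode = calories.mode()[0]   # mode() returns a Series, [0] picks the first mode

filled = calories.fillna(mean)
print(mean, median, mode)   # 250.0 200.0 200.0
print(filled.tolist())      # [100.0, 200.0, 200.0, 500.0, 250.0]
```

The single large value (500) pulls the mean above the median and mode, which is one reason to consider the median when a column has outliers.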

Pandas - Cleaning Data of Wrong Format


Data of Wrong Format

Cells with data of wrong format can make it difficult, or even impossible, to analyze data.

To fix it, you have two options: remove the rows, or convert all cells in the columns into the same format.

In our Data Frame, we have two cells with the wrong format in the 'Date' column: one is empty (NaN) and one is written as 20241205. Every cell in the 'Date' column should be a string that represents a date.

Let's try to convert all cells in the 'Date' column into dates.

Pandas has a to_datetime() method for this:

Convert to date:
import pandas as pd

df = pd.read_csv('data.csv')

df['Date'] = pd.to_datetime(df['Date'])

print(df.to_string())

Removing Rows
The result from the converting in the example above gave us a NaT value, which can be handled as a
NULL value, and we can remove the row by using the dropna() method.

Remove rows with a NULL value in the "Date" column:

df.dropna(subset=['Date'], inplace = True)
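
How to_datetime() treats a cell it cannot parse depends on the pandas version, and newer versions may raise an error instead of returning NaT. The sketch below is one way to get the "convert, then drop" behaviour described above: it assumes a hand-made 'Date' column with made-up values and passes errors="coerce", which turns unparseable cells into NaT.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", "not a date", None]})

# errors="coerce": cells that cannot be converted become NaT instead of raising.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
bad_cells = int(df["Date"].isna().sum())
print(bad_cells)   # 2

df = df.dropna(subset=["Date"])
print(len(df))     # 2
```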

Pandas - Fixing Wrong Data


Wrong Data

"Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong, like if someone
registered "199" instead of "1.99".

Sometimes you can spot wrong data by looking at the data set, because you have an expectation of
what it should be.

If you take a look at our data set, you can see that in row 3, the duration is 420, but for all the other rows
the duration is between 60 and 70.

It doesn't have to be wrong, but taking into consideration that this is the data set of someone's workout
sessions, we can conclude that this person did not work out for 420 minutes.

   Duration        Date  Pulse  Maxpulse  Calories
0        60  2020/02/11    101       130     409.2
1        60  2021/01/13    107       140     412.5
2        70  2023/03/12    110       120     420.4
3       420  2024/01/02    114       134     425.5

How can we fix wrong values, like the one for "Duration" in row 3?

Replacing Values
One way to fix wrong values is to replace them with something else.
In our example, it is most likely a typo, and the value should be "42" instead of "420", and we could just
insert "42" in row 3:

Example


Set "Duration" = 42 in row 3:

df.loc[3, 'Duration'] = 42

For small data sets you might be able to replace the wrong data one by one, but not for big data sets.

To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries for legal
values, and replace any values that are outside of the boundaries.

Loop through all values in the "Duration" column.

If the value is higher than 120, set it to 120:

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120

Removing Rows
Another way of handling wrong data is to remove the rows that contains wrong data.

This way you do not have to find out what to replace them with, and there is a good chance you do not
need them to do your analyses.

Example
Delete rows where "Duration" is higher than 120:

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
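
The two loops above visit the rows one at a time. As a sketch of an alternative that is not taken from these notes, the same rules can be written with a Boolean mask, which pandas applies to the whole column at once. The small DataFrame is typed in by hand from the workout table above so that the example runs without data.csv.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 70, 420],
    "Calories": [409.2, 412.5, 420.4, 425.5],
})

# Rule 1: replace. Every Duration above 120 is set to 120.
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120
print(capped["Duration"].tolist())  # [60, 60, 70, 120]

# Rule 2: remove. Keep only the rows whose Duration is at most 120.
kept = df[df["Duration"] <= 120]
print(kept.index.tolist())          # [0, 1, 2]
```

On large data sets the mask form is usually shorter to write and faster than looping over df.index.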

Pandas - Removing Duplicates
Duplicate rows are rows that have been registered more than one time.

   Duration        Date  Pulse  Maxpulse  Calories
0        60  2020/02/11    101       130     409.2
1        45  2021/01/13    107       140     412.5
2        70  2023/03/12    110       120     420.4
3        45  2021/01/13    107       140     412.5

By taking a look at our test data set, we can assume that row 1 and 3 are duplicates.

To discover duplicates, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row:

Returns True for every row that is a duplicate, otherwise False:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.duplicated())

Removing Duplicates
To remove duplicates, use the drop_duplicates() method.

Remove all duplicates:

df.drop_duplicates(inplace = True)

Remember: The (inplace = True) will make sure that the method does NOT return a new DataFrame,
but it will remove all duplicates from the original DataFrame.
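
Both methods can be tried on the four rows of the test table. In the sketch below the DataFrame is typed in by hand, so data.csv is not needed.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 70, 45],
    "Date": ["2020/02/11", "2021/01/13", "2023/03/12", "2021/01/13"],
    "Pulse": [101, 107, 110, 107],
    "Maxpulse": [130, 140, 120, 140],
    "Calories": [409.2, 412.5, 420.4, 412.5],
})

flags = df.duplicated().tolist()
print(flags)               # [False, False, False, True]

df.drop_duplicates(inplace=True)
print(df.index.tolist())   # [0, 1, 2]
```

Only row 3 is flagged: by default the first occurrence (row 1) counts as the original and is kept.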

Pandas - Data Correlations


Finding Relationships
A great aspect of the Pandas module is the corr() method.

The corr() method calculates the relationship (correlation) between each pair of columns in your
data set.

The examples on this page use a CSV file called 'data.csv'.

Example
Show the relationship between the columns:

df.corr()

Result

          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.155408  0.009403  0.922721
Pulse    -0.155408  1.000000  0.786535  0.025120
Maxpulse  0.009403  0.786535  1.000000  0.203814
Calories  0.922721  0.025120  0.203814  1.000000

Result Explained
The result of the corr() method is a table of numbers that shows how strongly each pair of
columns is related.

The number varies from -1 to 1.

1 means that there is a 1 to 1 relationship (a perfect correlation): for this data set, each time a
value went up in the first column, the other one went up as well.

0.9 is also a good relationship: if you increase one value, the other will probably increase as well.

-0.9 would be just as good a relationship as 0.9, but if you increase one value, the other will
probably go down.

0.2 means NOT a good relationship: if one value goes up, that does not mean that the other will.

What is a good correlation? It depends on the use, but I think it is safe to say you need at
least 0.6 (or -0.6) to call it a good correlation.

Perfect Correlation:
We can see that "Duration" and "Duration" got the number 1.000000, which makes sense, each column
always has a perfect relationship with itself.
Good Correlation:
"Duration" and "Calories" got a 0.922721 correlation, which is a very good correlation, and we can
predict that the longer you work out, the more calories you burn, and the other way around: if you
burned a lot of calories, you probably had a long work out.

Bad Correlation:
"Duration" and "Maxpulse" got a 0.009403 correlation, which is a very bad correlation, meaning that we
can not predict the max pulse by just looking at the duration of the work out, and vice versa.
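
The 0.6 rule of thumb can be applied in code by scanning the correlation table for pairs whose absolute value reaches the threshold. This is a rough sketch: the workout numbers are made up for illustration (they are not the contents of data.csv), and threshold and good_pairs are hypothetical names.

```python
import pandas as pd

# Hypothetical workout records; every column is numeric.
df = pd.DataFrame({
    "Duration": [30, 45, 45, 60, 60, 90],
    "Pulse": [110, 100, 117, 98, 109, 103],
    "Calories": [200.0, 280.5, 300.1, 400.3, 410.0, 620.7],
})

corr = df.corr()

# Rule of thumb from the text: |correlation| of at least 0.6 counts as "good".
threshold = 0.6

# Visit each pair of different columns once (upper triangle of the table).
good_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]

print(corr)
print(good_pairs)
```

Using the absolute value means that a strong negative correlation such as -0.9 is reported as well as a strong positive one.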

Pandas - Plotting
Plotting
Pandas uses the plot() method to create diagrams.

We can use Pyplot, a submodule of the Matplotlib library to visualize the diagram on the screen.

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

Import pyplot from Matplotlib and visualize our DataFrame:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot()

plt.show()

Scatter Plot
Specify that you want a scatter plot with the kind argument:

kind = 'scatter'
A scatter plot needs an x- and a y-axis.

In the example below we will use "Duration" for the x-axis and "Calories" for the y-axis.

Include the x and y arguments like this:

x = 'Duration', y = 'Calories'

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')

plt.show()

A scatter plot where there is no relationship between the columns:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')

plt.show()
Histogram
Use the kind argument to specify that you want a histogram:

kind = 'hist'

A histogram needs only one column.

A histogram shows us the frequency of each interval, e.g. how many workouts lasted between 50 and 60
minutes?

In the example below we will use the "Duration" column to create the histogram:

Example
df["Duration"].plot(kind = 'hist')
What is Matplotlib?
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for
platform compatibility.

Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:

import matplotlib.pyplot as plt

Now the Pyplot package can be referred to as plt.

Draw a line in a diagram from position (0, 0) to position (6, 250):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
Plotting x and y points
The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

The function takes parameters for specifying points in the diagram.

Parameter 1 is an array containing the points on the x-axis.

Parameter 2 is an array containing the points on the y-axis.

If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays, [1, 8] and [3, 10], to the
plot function.

Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()

The x-axis is the horizontal axis.

The y-axis is the vertical axis.

Plotting Without Line


To plot only the markers, you can use shortcut string notation parameter 'o', which means 'rings'.

Draw two points in the diagram, one at position (1, 3) and one at position (8, 10):

import matplotlib.pyplot as plt


import numpy as np

xpoints = np.array([1, 8])


ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')


plt.show()
Multiple Points
You can plot as many points as you like, just make sure you have the same number of points on both
axes.

Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10):

import matplotlib.pyplot as plt


import numpy as np

xpoints = np.array([1, 2, 6, 8])


ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)
plt.show()

Default X-Points
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3 etc., depending on
the length of the y-points.

So, if we plot six y-points and leave out the x-points, the x-points default to [0, 1, 2, 3, 4, 5]:

Plotting without x-points:

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10, 5, 7])

plt.plot(ypoints)
plt.show()
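
To check the default x-values rather than read them off the diagram, the line object returned by plot() can be inspected. This is a small sketch; selecting the "Agg" backend is an assumption made only so that the check runs without opening a window, and default_x and explicit_x are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering, no window needed
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10, 5, 7])

# plot() returns a list of line objects; one data set gives one line.
(line,) = plt.plot(ypoints)

# The x-values Matplotlib filled in by itself.
default_x = line.get_xdata()

# The explicit equivalent: one x-value per y-point, starting at 0.
explicit_x = np.arange(len(ypoints))

print(default_x)
print(np.array_equal(default_x, explicit_x))
```

Writing plt.plot(explicit_x, ypoints) would therefore draw the same line as plt.plot(ypoints).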

Markers
You can use the keyword argument marker to emphasize each point with a specified marker:

Mark each point with a circle:

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')


plt.show()

Mark each point with a star:

plt.plot(ypoints, marker = '*')


Marker    Description
'o'       Circle
'*'       Star
'.'       Point
','       Pixel
'x'       X
'X'       X (filled)
'+'       Plus
'P'       Plus (filled)
's'       Square

Linestyle
You can use the keyword argument linestyle to change the style of the plotted line:

linestyle = 'dotted'
linestyle = 'dashed'

The line style can be written in a shorter syntax:

linestyle can be written as ls.

dotted can be written as ':'.

dashed can be written as '--'.

plt.plot(ypoints, ls = ':')

Matplotlib Labels and Title

Create Labels for a Plot


With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.

Example
Add labels to the x- and y-axis:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)

plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

Create a Title for a Plot


With Pyplot, you can use the title() function to set a title for the plot.

Add a plot title and labels for the x- and y-axis:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

Set Font Properties for Title and Labels


You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font properties for
the title and labels.

Set font properties for the title and labels:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}

plt.title("Sports Watch Data", fontdict = font1)


plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)

plt.plot(x, y)
plt.show()

Add Grid Lines to a Plot


With Pyplot, you can use the grid() function to add grid lines to the plot.

Add grid lines to the plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)
plt.grid()

plt.show()

Set Line Properties for the Grid


You can also set the line properties of the grid, like this: grid(color = 'color', linestyle = 'linestyle',
linewidth = number).

Example
Set the line properties of the grid:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)

plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)

plt.show()

Display Multiple Plots


With the subplot() function you can draw multiple plots in one figure:

Draw 2 plots:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()
The subplot() Function
The subplot() function takes three arguments that describe the layout of the figure.

The layout is organized in rows and columns, which are represented by the first and second arguments.

The third argument represents the index of the current plot.

plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.

plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.

So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be displayed on top of
each other instead of side-by-side), we can write the syntax like this:
Example
Draw 2 plots on top of each other:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()
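
The third argument counts the plots row by row, starting at 1 in the upper-left corner. The helper below is a hypothetical function (it is not part of Matplotlib) that sketches this numbering by turning an index into a zero-based row and column.

```python
def subplot_position(nrows, ncols, index):
    """Return the zero-based (row, column) of a 1-based subplot index."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Indexes run left to right along a row, then continue on the next row.
    return divmod(index - 1, ncols)


print(subplot_position(1, 2, 2))  # side-by-side layout, second plot
print(subplot_position(2, 1, 2))  # stacked layout, second plot
print(subplot_position(2, 3, 4))  # 2 x 3 grid, fourth plot
```

In a 2 x 3 grid, for instance, index 4 is the first plot of the second row, which is the position plt.subplot(2, 3, 4) would select.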
Title
You can add a title to each plot with the title() function:

Example
2 plots, with titles:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.show()

Super Title
You can add a title to the entire figure with the suptitle() function:

Example
Add a title for the entire figure:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.suptitle("MY SHOP")
plt.show()

Matplotlib Scatter

Creating Scatter Plots


With Pyplot, you can use the scatter() function to draw a scatter plot.

The scatter() function plots one dot for each observation. It needs two arrays of the same length, one
for the values of the x-axis, and one for values on the y-axis:

A simple scatter plot:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()

The observation in the example above is the result of 13 cars passing by.

The X-axis shows how old the car is.

The Y-axis shows the speed of the car when it passes.

Are there any relationships between the observations?

It seems that the newer the car, the faster it drives, but that could be a coincidence, after all we only
registered 13 cars.
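
One way to put a number on what the scatter plot suggests is the correlation coefficient of the two arrays. This is a minimal sketch using NumPy's corrcoef() on the same 13 observations; the variable name r is only illustrative.

```python
import numpy as np

# Age and speed of the 13 cars from the example above.
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# corrcoef() returns a 2 x 2 table; the off-diagonal entry is the
# correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

print(round(r, 2))
```

A negative value means that speed tends to go down as age goes up, which matches the impression that newer cars drive faster. With only 13 cars it is still weak evidence.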

Compare Plots
In the example above, there seems to be a relationship between speed and age, but what if we plot the
observations from another day as well? Will the scatter plot tell us something else?

Draw two plots on the same figure:

import matplotlib.pyplot as plt


import numpy as np

#day one, the age and speed of 13 cars:


x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:


x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)

plt.show()

Colors
You can set your own color for each scatter plot with the color or the c argument:

Example
Set your own color of the markers:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color = 'hotpink')

x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color = '#88c999')

plt.show()

Color Each Dot


You can even set a specific color for each dot by using an array of colors as value for the c argument:

Note: You cannot use the color argument for this, only the c argument.

Example
Set your own color of the markers:

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown",
                   "gray","cyan","magenta"])

plt.scatter(x, y, c=colors)
plt.show()


Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:

Draw 4 bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()
The bar() function takes arguments that describe the layout of the bars.

The categories and their values are represented by the first and second arguments, as arrays.

Example
x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)

Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use the barh() function:

Example
Draw 4 horizontal bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])
plt.barh(x, y)
plt.show()


Bar Color
The bar() and barh() functions take the keyword argument color to set the color of the bars:

Draw 4 red bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x, y, color = "red")


plt.show()

Draw 4 "hot pink" bars:

plt.bar(x, y, color = "hotpink")
plt.show()

Bar Width
The bar() function takes the keyword argument width to set the width of the bars:

Example
Draw 4 very thin bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])
plt.bar(x, y, width = 0.1)
plt.show()

Bar Height
The barh() function takes the keyword argument height to set the height of the bars:

Draw 4 very thin bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.barh(x, y, height = 0.1)


plt.show()
Histogram
A histogram is a graph showing frequency distributions.

It is a graph showing the number of observations within each given interval.

Create Histogram
In Matplotlib, we use the hist() function to create histograms.

The hist() function will use an array of numbers to create a histogram, the array is sent into the
function as an argument.

A Normal Data Distribution by NumPy:


import numpy as np

x = np.random.normal(170, 10, 250)

print(x)

The hist() function will read the array and produce a histogram:

A simple histogram:

import matplotlib.pyplot as plt


import numpy as np

x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()
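
To see the numbers behind a histogram, the observations can be counted per interval without drawing anything. The sketch below uses NumPy's histogram() function on a handful of made-up workout durations; the interval edges are an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical workout durations in minutes.
durations = np.array([45, 52, 55, 58, 60, 60, 61, 75])

# Interval edges: 40-50, 50-60, 60-70 and 70-80 minutes.
bin_edges = [40, 50, 60, 70, 80]

# counts[i] is the number of values from edges[i] up to (not including)
# edges[i + 1]; only the last interval also includes its right edge.
counts, edges = np.histogram(durations, bins=bin_edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} minutes: {count} workouts")
```

Each count corresponds to the height of one bar, so a value of exactly 60 lands in the 60-70 interval, not in 50-60.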

Creating Pie Charts


With Pyplot, you can use the pie() function to draw pie charts:

Example
A simple pie chart:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


plt.pie(y)
plt.show()

Labels
Add labels to the pie chart with the labels parameter.

The labels parameter must be an array with one label for each wedge:

A pie chart with labels:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)


plt.show()
Start Angle
By default, the first wedge starts at the x-axis and the wedges are drawn counterclockwise, but you
can change the start angle by specifying a startangle parameter.

The startangle parameter is defined with an angle in degrees; the default angle is 0:

Start the first wedge at 90 degrees:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels, startangle = 90)


plt.show()
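
The arithmetic behind the wedges can be worked out by hand. The sketch below assumes Matplotlib's default behaviour, where each wedge takes a share of the full circle equal to its value divided by the sum of all values and the wedges follow each other counterclockwise from the start angle; the variable names are illustrative.

```python
values = [35, 25, 25, 15]
start_angle = 90  # same start angle as in the example above

total = sum(values)

# (start, end) angle in degrees for each wedge.
wedges = []
angle = start_angle
for value in values:
    sweep = 360 * value / total  # this wedge's share of the full circle
    wedges.append((angle, angle + sweep))
    angle += sweep

print(wedges)
```

The last wedge ends at the start angle plus 360 degrees, which closes the circle.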

Explode
Maybe you want one of the wedges to stand out? The explode parameter allows you to do that.

The explode parameter, if specified, and not None, must be an array with one value for each wedge.

Each value represents how far from the center each wedge is displayed:
Example
Pull the "Apples" wedge 0.2 from the center of the pie:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode)


plt.show()

Shadow
Add a shadow to the pie chart by setting the shadow parameter to True:

Example
Add a shadow:

import matplotlib.pyplot as plt


import numpy as np
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)


plt.show()


Colors
You can set the color of each wedge with the colors parameter.

The colors parameter, if specified, must be an array with one value for each wedge:

Example
Specify a new color for each wedge:
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
mycolors = ["black", "hotpink", "b", "#4CAF50"]

plt.pie(y, labels = mylabels, colors = mycolors)


plt.show()


The Internet and the Web


The internet is a global network of interconnected computers and servers that allows people to
communicate, share information, and access resources from anywhere in the world. It was
created in the 1960s by the US Department of Defense as a way to connect computers and
share information between researchers and scientists.
The World Wide Web, or simply the web, is a system of interconnected documents and
resources, linked together by hyperlinks and URLs. It was created by Tim Berners-Lee in 1989
as a way for scientists to share information more easily. The web quickly grew to become the
most popular way to access information on the internet.
1. The Internet: In simplest terms, the Internet is a global network comprised of smaller
networks that are interconnected using standardized communication protocols. The Internet
standards describe a framework known as the Internet protocol suite. This model divides
methods into a layered system of protocols, including:
   1. Application layer (highest) – concerned with the data (URL, type, etc.). This is where HTTP,
   HTTPS, etc., come in.
   2. Transport layer – responsible for end-to-end communication over a network.
   3. Network layer – routes the data between networks.
2. The World Wide Web: The Web is a major means of accessing information on the Internet. It is a
system of Internet servers that support specially formatted documents. The documents are
formatted in a markup language called HTML, or “HyperText Markup Language”, which supports
a number of features including links and multimedia. These documents are interlinked using
hypertext links and are accessible via the Internet.
To link hypertext to the Internet, we need:
1. The markup language, i.e., HTML.
2. The transfer protocol, e.g., HTTP.
3. Uniform Resource Locator (URL), the address of the resource.

Difference between the Internet and the Web:

Internet: The Internet is the network of networks, and the network allows the exchange of data
between two or more computers.
Web: The Web is a way to access information through the Internet.

Internet: It is also known as the Network of Networks.
Web: The Web is a model for sharing information using the Internet.

Internet: The Internet is a way of transporting information between devices.
Web: The protocol used by the Web is HTTP.

Internet: Accessible in a variety of ways.
Web: The Web is accessed by the web browser.

Internet: Network protocols are used to transport data.
Web: Accesses documents and online sites through browsers.

Internet vs. Web by aspect:

Definition
Internet: Global network of networks.
Web: Collection of interconnected websites.

Access
Internet: Can be accessed using various devices.
Web: Accessed through a web browser.

Connectivity
Internet: Network of networks that allows devices to communicate and exchange data.
Web: Allows users to access and view web pages, multimedia content, and other resources over
the Internet.

Protocols
Internet: TCP/IP, FTP, SMTP, POP3, etc.
Web: HTTP, HTTPS, FTP, SMTP, etc.

Infrastructure
Internet: Consists of routers, switches, servers, and other networking hardware.
Web: Consists of web servers, web browsers, and other software and hardware.

Use
Internet: Used for communication, sharing of resources, and accessing information from around
the world.
Web: Used for publishing and accessing web pages, multimedia content, and other resources on
the Internet.

Creator
Internet: No single creator.
Web: Tim Berners-Lee.

Role
Internet: Provides the underlying infrastructure for the Web, email, and other online services.
Web: Provides a platform for publishing and accessing information and resources on the Internet.

URI: URI stands for ‘Uniform Resource Identifier’. A URI can be a name, locator, or both for an
online resource whereas a URL is just the locator. URLs are a subset of URIs. A URL is a
human-readable text that was designed to replace the numbers (IP addresses) that computers
use to communicate with servers.
A URL consists of a protocol, a domain name, and a path (which includes the specific subfolder
structure where a page is located), like this:
protocol://WebSiteName.topLevelDomain/path
1. Protocol – HTTP or HTTPS.
2. WebSiteName – geeksforgeeks, google, etc.
3. topLevelDomain – .com, .edu, .in, etc.
4. path – specific folders and/or subfolders that are on a given website.
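
The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. This is a minimal sketch: the address is a made-up example, and taking the text after the last dot as the top-level domain is a simplification that does not handle multi-part suffixes such as .co.in.

```python
from urllib.parse import urlsplit

# Made-up example address.
url = "https://www.example.com/courses/python/index.html"

parts = urlsplit(url)

protocol = parts.scheme   # the part before "://"
host = parts.hostname     # website name plus top-level domain
path = parts.path         # folders and file on the website

# Simplification: treat the text after the last dot as the top-level domain.
top_level_domain = host.rsplit(".", 1)[-1]

print(protocol, host, top_level_domain, path)
```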
Uses of Internet and the Web :
1. Communication: The internet and web have made communication faster and easier than
ever before. We can now send emails, chat online, make video calls, and use social media
platforms to connect with people all over the world.
2. Information sharing: The web has made it possible to access vast amounts of information on
any topic from anywhere in the world. We can read news articles, watch videos, listen to
podcasts, and access online libraries and databases.
3. Online shopping: The internet and web have revolutionized the way we shop. We can now
browse and purchase products online, from clothes and groceries to electronics and furniture.
4. Entertainment: The internet and web provide a wealth of entertainment options, from
streaming movies and TV shows to playing online games and listening to music.
5. Education: The web has made it possible to access educational resources from anywhere in
the world. We can take online courses, access e-books and digital libraries, and connect with
educators and other learners through online communities.
6. Business: The internet and web have transformed the way businesses operate. Companies
can now use e-commerce platforms to sell products and services, collaborate with remote
workers, and access global markets.
7. Research: The internet and web have made it easier for researchers to access and share
information. We can now access scientific journals and databases, collaborate with other
researchers online, and conduct surveys and experiments through online platforms.
Issues in Internet and the Web :
1. Privacy and security: The internet and web are vulnerable to various security threats, such
as hacking, identity theft, and phishing attacks. These threats can compromise our personal
information, such as login credentials, financial information, and personal data.
2. Cyberbullying: The anonymity of the internet and web can lead to cyberbullying, where
individuals are harassed or threatened online. Cyberbullying can have severe consequences,
including depression, anxiety, and suicide.
3. Online addiction: The internet and web can be addictive, and individuals can spend hours
browsing social media or playing online games, leading to neglect of other important aspects
of their lives.
4. Environmental impact: The internet and web consume a significant amount of energy,
contributing to carbon emissions and climate change.

WEB Servers: To view and browse pages on the Web, all you need is a web browser. To publish
pages on the Web, you need a web server. A web server is the program that runs on a computer and is
responsible for replying to web browser requests for files. When you use a browser to request a page
on a website, that browser makes a web connection to a server using the HTTP protocol. The server
accepts the connection, sends the contents of the requested files, and then closes the connection. The
browser then formats the information it got from the server.
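
The request and reply cycle described above can be tried on a single machine with Python's standard library. The sketch below is a toy setup, not a production web server: it starts a tiny server on a free local port, plays the part of the browser with urllib, and then shuts the server down. The page content and the names PageHandler and fetch_once are made up for the illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The "file" the toy server sends back for every request.
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to a GET request: status line, headers, then the content.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_once():
    # Port 0 lets the operating system pick any free port.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Play the browser: connect, request a page, read the reply.
        # The empty ProxyHandler keeps the request on this machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/index.html") as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status)
print(body.decode())
```

A real browser would go one step further and format the returned HTML for display instead of printing it as text.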

WEB Browsers: A web browser is the program you use to view pages and navigate the World
Wide Web. A wide array of web browsers is available for just about every platform you can imagine.
Microsoft Internet Explorer, for example, is included with Windows and Safari is included with Mac OS
X. Mozilla Firefox, Netscape Navigator, and Opera are all available for free.

What the Browser Does: The core purpose of a web browser is to connect to web servers, request
documents, and then properly format and display those documents. Web browsers can also display files
on your local computer and download files that are not meant to be displayed. Each web page is a file
written in a language called the Hypertext Markup Language (HTML).

Protocols: In computing, a protocol is a set of rules which is used by computers to communicate with
each other across a network. A protocol is a convention or standard that controls or enables the
connection, communication, and data transfer between computing endpoints.

Internet Protocol Suite: The Internet Protocol Suite is the set of communications protocols used
for the Internet and other similar networks. It is also commonly known as TCP/IP, named after two of
the most important protocols in it: the Transmission Control Protocol (TCP) and the Internet Protocol
(IP), which were the first two networking protocols defined in this standard.

What is a Website ?
A website is a collection of many web pages, and web pages are digital files that are written
using HTML (HyperText Markup Language). To make your website available to every person in
the world, it must be stored or hosted on a computer connected to the Internet around the clock.
Such computers are known as web servers.

Components of a Website: We know that a website is a collection of webpages hosted on a
web server. These are the components for making a website:

• Webhost
• Address
• Homepage
• Design
• Content
• The Navigation Structure

• Webhost: Hosting is the location where the website is physically located. A group of
linked webpages can be called a website only when it is hosted on a web server. A website
is a set of files that the web server transmits to users' computers when they specify the
website's address.
• Address: The address of a website is also known as the URL of the website. When a user wants
to open a website, they need to put the address or URL of the website into the web
browser, and the requested website is delivered by the web server.
• Homepage: The home page is a very common and important part of a website. It is the first
webpage that appears when a visitor visits the website. The home page of a website is
very important as it sets the look and feel of the website and directs viewers to the rest of
the pages on the website.
• Design: It is the final, overall look and feel of the website, which results from the proper
use and integration of elements like navigation menus, graphics, layout, etc.
• Content: All the web pages contained on the website together make up the content of the
website. Good content on the webpages makes the website more effective and attractive.
• The Navigation Structure: The navigation structure of a website is the order of the
pages, the collection of what links to what. Usually, it is held together by at least one
navigation menu.

Types of Website:
• Static Website
• Dynamic Website
• Static Website: In static websites, the web pages returned by the server are prebuilt
source code files written in simple languages such as HTML, CSS, or JavaScript.
There is no processing of content on the server (according to the user) in static websites.
Web pages are returned by the server with no change; therefore, static websites are fast.
There is no interaction with databases. Also, they are less costly, as the host does not
need to support server-side processing with different languages.
• Dynamic Website: In dynamic websites, the web pages returned by the server are
processed at runtime. They are not prebuilt web pages; they are built at
runtime according to the user’s demand with the help of server-side scripting languages
such as PHP, Node.js, ASP.NET and many more supported by the server.
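
The difference can be sketched with two small Python functions standing in for the server side. This is only an illustration of the idea, not how any particular web server or framework works; the function names and the page content are made up.

```python
import html
from datetime import datetime

# A prebuilt page: the same text is sent to every visitor.
STATIC_PAGE = "<html><body><h1>Welcome</h1></body></html>"


def serve_static():
    # Static website: return the stored page without any processing.
    return STATIC_PAGE


def serve_dynamic(user_name, now=None):
    # Dynamic website: build the page at request time for this user.
    now = now or datetime.now()
    safe_name = html.escape(user_name)  # never put raw user input into HTML
    return (
        "<html><body>"
        f"<h1>Welcome, {safe_name}</h1>"
        f"<p>Generated at {now:%Y-%m-%d %H:%M}</p>"
        "</body></html>"
    )


print(serve_static())
print(serve_dynamic("Asha"))
```

Every call to serve_static() returns identical text, while serve_dynamic() can return a different page for each user and each moment, which is why it needs processing on the server.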

There are many different types of websites on the Internet. We have chosen some of the most common
categories to give you a brief idea:
• Blogs: These websites are managed by an individual or a small group of people, and
they can cover any topic: fashion tips, music tips, travel tips, fitness tips, and so on.
Nowadays professional blogging has become an extremely popular way of earning money
online.
• E-commerce: These websites are well known as online shops. They allow us to
purchase products and make online payments for products and services. Stores can be
handled as standalone websites.
• Portfolio: These websites act as an extension of a freelancer's resume. They provide a
convenient way for potential clients to view your work while also allowing you to expand on
your skills or services.
• Brochure: These websites are mainly used by small businesses. They
act as a digital business card and are used to display contact information and to
advertise services, with just a few pages.
• News and Magazines: These websites need little explanation. The main purpose of news
websites is to keep their readers up to date on current affairs, whereas magazines
focus on entertainment.
• Social Media: We all know about some famous social media websites like Facebook, Twitter,
Reddit, and many more. These websites are usually created to let people share their
thoughts, images, videos, and other useful content.
• Educational: Educational websites are quite simple to understand, as the name itself
explains. These websites are designed to present information via audio, videos, or images.
• Portal: These websites are used for internal purposes within a school, institute, or
business. They often contain a login process that allows students to access their
credential information or allows employees to access their emails and alerts.
