SS2 Note Data Processing

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

Saturday, 11 January 2020

SS2 SECOND TERM DATA PROCESSING NOTE

WEEK

TOPIC

Writing of scheme of work/Revision

2-3

ENTITY-RELATIONSHIP MODEL

· Definition of terms: entity, entity instance, entity set, attribute

· Types of attributes

· Relationship

· Degree and Cardinality of a relationship

3-6

RELATIONAL MODEL

· Definition of terms: relation, attribute, tuple, domain, degree, cardinality, relational scheme.

· Types of relational keys

· Creating and modifying relation using SQL

FILE ORGANIZATION

· Definition of file organization

· Types of file organization method

· File access method

· Types of file access method


8

INTERNET

· Definition of Internet

· Benefits of the internet

· Abuse of the internet

· Internet security

TOPIC 1: ENTITY - RELATIONSHIP MODEL

An entity-relationship model describes data in terms of the following:

1. Entities

2. Relationship between entities

3. Attributes of entities

ERM is graphically represented using an entity-relationship diagram (E-R Diagram).

Entity relationship diagram

Entity: An entity is an object that exists and which is distinguishable from other objects. An
entity can be a person, a place, an object, an event, or a concept which an organization wishes
to maintain data. The following are some examples of entities:

Person: STUDENT, EMPLOYEE, CLIENT

Object: COUCH, AIRPLANE, MACHINE

Place: CITY, ROOM, WAREHOUSE

Concept: PROJECT, ACCOUNT, COURSE

An entity type defines a collection of entities that have same attributes. An entity instance is a
single item in this collection. An entity set is a set of entity instances. For example, STUDENT is
an entity type; a student with ID number ISI070 is an entity instance; and a collection of all
students is an entity set.

In E-R diagram, an entity name should be a concise singular noun that captures the unique
characteristics of the entity type. An E-R diagram depicts an entity type using a rectangle with
the name of the entity inside, as shown below.

Attribute: An attribute is a characteristic of an entity type. Some attributes of common entity


types include the following:

STUDENT = {Student ID, Name, Address, Phone, Email, DOB}

ACCOUNT = {Account Number, Account Type, Date Opened, Balance}

E-R diagrams depict an attribute inside an ellipse and connect the ellipse with a line to the
associated entity type. The diagram below indicates some of the possible attributes in an E-R
diagram for the entity STUDENT.

Types of attributes

1. Simple or an atomic attribute, such as City or State, cannot be further divided into smaller
components.

2. Composite attribute can be divided into smaller subparts in which each subpart represents an
independent attribute. Name and Address are the only composite attributes in the diagram
above.

3. Single-valued attribute: most attributes have a single value. For example, a student has only
one date of birth and one identification number.

4. Multi-valued attribute: have more than one value for an entity instance. For example, student
attribute language above, which stores the name of languages a student can speak, is a multi-
valued attribute.

5. Stored attribute is an attribute whose value cannot be derived from the values of other
attributes.
6. Derived attribute: the value of a derived attribute can be determined by analyzing other
attributes. For example in the figure above, Age is a derived attribute because its value can be
derived from the current date and the attribute DateofBirth.

7. A key attribute (or identifier) is a single attribute or a combination of attributes that uniquely
identify an individual instance of an entity type. No two instances within an entity set can have
the same key attribute value. StudentID is the key attribute in the picture above.

Relationship

Entities in an organization do not exist in isolation but are related to each other. Students take
courses and each STUDENT entity is related to the COURSE entity. Faculty members teach
courses and each FACULTY entity is also related to the COURSE entity. Consequently, the
STUDENT entity is related to the FACULTY entity through the COURSE entity. In an E-R diagram,
we represent relationship types with diamond-shaped boxes connected by straight lines to the
rectangles that represent participating entity types. A relationship type is a given name that is
displayed in this diamond-shaped box and typically takes the form of a present tense verb or
verb phrase that describes the relationship. A relationship is an association among several
entities. The figure below is an E-R diagram that shows a relationship between entities
ACCOUNT and CUSTOMER.

Degree of a relationship

The number of entity sets that participate in a relationship is called the degree of relationship.
The 3 most common degrees of relationship are unary (degree 1), binary (degree 2) and ternary
(degree 3).

Unary relationship: A unary relationship exists when there is an association between two
instances of the same entity type. For example, in an organization, an employee will manage
one or more employee.

Binary relationship: A binary relationship exists when there is an association between two
instances of two different entity types. For example, in a secondary school, a binary relationship
exists between a Subject (SUBJECT entity) and a Teacher (TEACHER entity); a teacher teaches
(a) subject.
Ternary Relationship: A ternary relationship exists when there is an association between three
instances of three different entity types. For example, consider a student using certain
equipment for a project. In this case, the STUDENT, PROJECT, and EQUIPMENT entity types
relate to each other with ternary relationships: a student checks out equipment for a project.

Cardinality of a relationship

The cardinality is the number of occurrences in one entity which are associated (or linked) to
the number of occurrences in another entity. For example, if the entity types A and B are
connected by a relationship, then the maximum cardinality represents the maximum number of
instances of entity B that can be associated with any instance of entity A. There are three types
of relationship:

1. One-to-One Relationship: In a one-to-one relationship, at most one instance of entity B can be


associated with a given instance of entity A and vice versa. For example an employee is
assigned a parking space or an employee manages the department.

2. One-to-Many (or many-to-one) Relationship: In a one-to-many relationship, many instances of


entity B can be associated with a given instance of entity A. However, only one instance of entity
A can be associated with a given instance of entity B. For example, many employee works in a
department or a school has many employees.

3. Many-to-Many Relationship: In a many-to-many relationship, many instances of entity A can


be associated with a given instance of entity B, and, likewise, many instances of entity B can be
associated with a given instance of entity A. For example, many employees work on many
projects.

TOPIC 2: RELATIONAL MODEL

A relational model gives a simple way of representing data in form of a two-dimensional table
called relation. Relational model was invented by Edger Codd. The columns or fields in the table
identify the attribute such as surname, lastname, age, gender, Admission number etc. A tuple or
row contains all the data of a single instance of the table such as a person named James Bond.
In the relational model, every tuple must have a unique identification or key based on the data. In
the table below, the Admission number is the key that uniquely identifies each tuple in the
relation. Keys are often used to join data from two or more relations based on matching
identification. Another concept used in a relational model is the foreign keys, which are primary
keys in one relation that are kept in another relation to allow for the joining of data.

Definition of terms

1. Field: is a named column in the relation (table). There are five fields in the student relation
above – Admission number, Surname, Last name, Age, Gender.

2. Tuple: is a record (i.e. a row of data in a relation). All tuples in a relation must be distinct.

3. Domain: is defined as the set of all unique values permitted for an attribute. For example,
the domain of the field Gender as seen in the table above is M and F. Also the domain of the
attribute Surname consist the combination of all possible letters in the alphabet. A domain is
referred to in a relation schema by the domain name and has a set of values.

4. The Degree (also called arity) of a relation is the number of fields in it. The relation above
is of degree 5.

5. The cardinality of a relation is the number of tuples in it. For example, the relation above
contains 6 tuples and hence the cardinality is 6.

6. A relational database is a collection of relations with distinct names that can be linked
together.

7. Relation schema describes the column heads of the table. The schema specifies the
relation’s name, the name of the field (or attribute or column) and the domain of each field. For
example the schema for the Student relation above is given below:

Student (Admission number: integer, Surname: string, last name: string, Age: integer, Gender:
string).

Relational model keys

1. Candidate key: Candidate keys are defined as the minimal set of fields which can uniquely
identify each record in the table. It is an attribute or a set of attributes that can act as a primary
key for a table to uniquely identify each record in that table.
2. Primary key: is any candidate key that is most appropriate to be the main reference key for
the table. It is used to establish relationship with other tables. It must never be null and must be
unique.

3. Foreign key: is generally a primary key of one table that appears as a field in another
where the first table has a relationship with the second. In other words, if we have a table A with
a primary key X that is linked to a table B where X is a field in B, then X is a foreign key in B.

4. Secondary key or alternative key: a table may have one or more choices for the primary
key. Collectively these are known as candidate keys as shown above. One is selected as the
primary key. Those not selected are known as secondary keys or alternative keys.

5. Composite key is combination of two or more columns in a table that can be used to
uniquely identify each row in a table

CREATING AND MODIFYING RELATION USING SQL

MS-Access, a relational DBMS typically provides a Graphical User Interface (GUI) that allows the
creation of a database and tables that store information about different subject. This can be
achieved using a query language called SQL (Structured Query Language). SQL can do more
than query a database; it can define the structure of the database, modify data in the database
and specify security constraints. A subset of the SQL that is used to define the structure of the
database and the table is called data definition language (DDL). In SQL, a relation is a table.

Creating a database using SQL

The syntax is as follows:

CREATE DATABASE databasename;

The “CREATE DATABASE” is the keyword and “databasename” is the name of the database.

For example assuming we want to create a database for students information, then the create
statement would be as follows:

CREATE DATABASE StudentDB;

Note: Use the Commands:

SHOW DATABASES; - to see the list of databases in the DBMS

DROP DATABASE database_name; - to delete a database from a DBMS

Creating relation (Table) using SQL


The syntax is as follows:

CREATE TABLE table_name (

Column1 datatype,

Column 2 datatype,

Column3 datatype,

.....

columnN datatype,

PRIMARY KEY (one or more columns)

);

The “CREATE TABLE” is the keyword that tells the D system what to do table_name is the name
of the table, “PRIMARY KEY” is a keyword that indicate which column listed above is to be used
as the primary key, column1…ColumnN indicates the fields (or column as in a table).

For example, if you want to create a table that stores student information (i.e. student table), the
statements below will achieve that.

CREATE TABLE StudentTbl (

AdmissionNumber CHAR (10),

Surname CHAR (20),

Lastname CHAR(20),

Age INT (2),

Gender CHAR (1),

PRIMARY KEY (AdmissionNumber)

);

Note: To delete a table from a database, use the command:

DROP TABLE table_name; (i.e. DROP TABLE StudentTbl ).

MODIFYING RELATIONS USING SQL

Inserting tuple into a relation

In order to insert tuples (records) into a table, the INSERT command is used. Here is the syntax:
INSERT INTO Table_name (column1, column2, ….., columnN)

VALUES (value1, value2,…….., valueN);

OR

INSERT INTO table_name (value1, value2, ….., valueN);

For example to insert the first tuple as indicated in the table above, use the following
statements:

INSERT INTO StudentTbl (AdmissionNumber, Surname, Lastname, Age, Gender)

VALUES (1201, ‘James’, ‘Bond’, 15, ‘M’);

OR

INSERT INTO students (1201, ‘James’, ‘Bond’, 15, ‘M’);

Deleting tuple from a relation

To delete a tuple (record) from the table, use the DELETE Command. Here is the syntax:

DELETE FROM tablename

WHERE [condition];

For example, to delete a tuple with surname = James from the relation above, use the following
statement:

DELETE FROM StudentTbl

WHERE surname = ‘James’ ;

The database system deletes the first record with AdmissionNumber = 1201.

Note: The command ‘DELETE FROM students;’ - will delete all tuples (records)in the table.

Updating the content of a relation

The UPDATE query is used to modify the existing records in a table. The syntax is as follows:

UPDATE table_name

SET column1 = value1, column2 = value2, ……, column = valueN

WHERE [condition];

For example, if we want to change the gender of James Bond in the table above (i.e. from M to F)
, use the following statement:
UPDATE StudentTbl

SET Gender = ‘F’

WHERE surname = ‘James’;

The DELETE, INSERT and UPDATE statements above are part of the SQL subset called the Data
Manipulation Language (DML).

Querying Database

You query the database when you are requesting for certain information from the database. You
can include condition that such items you want should meet. The query helps us retrieve data
from the database.

In its simplest form, a query consists of two parts:

· A SELECT list, where the columns to be retrieved are specified

· A FROM clause, where the table or tables to be accessed are specified

Example

The following SQL statement selects the entire rows of data from the StudentTbl:

SELECT * FROM StudentTbl

It is also possible to specify certain columns to be retrieved from the StudentTbl.

Example

SELECT FirsName, LastName, Age FROM StudentTbl

We can structure the query to limit the number of columns of data to be retrieved from the
StudentTable.

Example

SELECT FirstName, LastName, Date

FROM StudentTbl

WHERE Age > 20

Integrity Constraint over Relations


Constraints

Constraints are the rules enforced on data columns in table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the
database. Constraints could be column level or table level. Column level constraints are applied
only to one column, whereas table level constraints are applied to the whole table. The following
are commonly used constraints in SQL:

1. Entity Constraints

Entity constraints can be enforced when you create a table and assign certain column of the
table to be the primary key or and when you assign certain column of that table not to accept
NULL values. The PRIMARY key constraint tells the database engine that none of the rows
should have a duplicate value. The NOT NULL constraint ensures that a column cannot have a
NULL value

Examples

CREATE TABLE StudentTbl (

StudentId INT Primary Key,

FirstName Varchar(15) NOT NULL,

LasttName Varchar(15) NOT NULL,

Gender Char(2),

DOB DateTime

);

You can also use the UNIQUES constraint to ensure that all values in a column are different; that
is, no duplicate entries should exist in the table.

CREATE Table StudentData (

StudentId INT Primary Key,

FullName Varchar(50) unique NOT NULL,

Age INT,

RegDate DateTime Default GETDATE()

);

The GetDate function inserts the current date into the table.
2. Domain Integrity

This constraint can be enforced by specifying that certain columns in the table should be of a
particular data type or that it should meet certain conditions. Example below employs two types
of constraints: CHECK and UNIQUE constraints. The CHECK constraint ensures that all values in
a column satisfy certain conditions (i.e. any record to be inserted must have age value greater
than or equal to 10).

Example

CREATE Table StudentData (

StudentId INT Primary Key,

FullName Varchar(50) unique NOT NULL,

Age INT Check(Age > = 10),

RegDate DateTime Default GETDATE()

);

The DEFAULT constraint provides a default value for the date column when no value is specified.

3. Referential Integrity

Referential Integrity constraint can be enforced using two or more tables. The first table is
called the parent table while any other table or tables will be regarded as the child table(s). The
child table(s) will be used to reference the parent table.

Example

Create the first table called the parent table:

CREATE TABLE Student(

StudentId INT Primary Key,

FirstName Varchar(15) NOT NULL,

LasttName Varchar(15) NOT NULL,

Gender Char(6),

DOB DateTime

);

Create the second table called the child table:


CREATE TABLE Subject(

SubjectId INT Primary Key,

StudentId INT Foreign Key References Student_tbl(StudentId),

SubjectName varchar(15) Unique,

TeacherName varchar(50) Not Null,

Duration DateTime

);

StudentId in the Subject table is a foreign key that “references” StudentId in the Student table
called the parent table. The idea behind this is that a subject cannot exist for a student when
that student is not in the student (parent) table. For example, in order for a student to have a
subject in the Subject table, such student must exist in the Student table.

TOPIC 3: FILE ORGNIZATION

File organization refers to the logical relationships among various records that constitute the
file, particularly with respect to the means of identification and access to any specific record.

File organization is used to describe the way in which the records are stored in terms of blocks,
and the blocks are placed on the storage medium. In short, it refers to the way in which data
records are stored or arranged.

Types of file organization

1. Serial (heap) files organization: records are placed in the order in which they are inserted. A
new record is inserted in the last page of the file; if there is insufficient space in the last page, a
new page is added to the file. This makes insertion very efficient but since heap file has no
particular ordering with respect to field values, a linear search must be performed to access
record as thus making it slower as compared to sorted files.

2. Sequential file organization: records are stored in some sort of order (ascending, descending
or alphabetical) based on the unique key field or search key. This method is fast & efficient for
huge amount of data. Sorted file is inefficient as it takes time & space for sorting records. A
sequential file could be stored on a sequential storage device such as magnetic tape and
magnetic disks. To access these records, the computer must read the file in sequence from the
beginning. The first record is read and processed first, then the second record in the file
sequence, and so on.

3. Index Sequential Access Method (ISAM): Records are organized in sequence and an index
table is used to speed up access to the records without requiring a search of an entire file. The
records of the files can be stored in a random sequence but the index table is in a sorted
sequence on the key value. This type of file organization is best suited for situations where both
batch and online processing are to be supported.

4. Direct or Random file organization: In this file organization, records are organized in such a
way that it is possible for the computer to directly locate the key of the desired record without
having to search through a sequence of the record. Direct file organization allows direct access
to individual records stored in any part of the file. These types of files are used in direct access
storage device (DASD) like floppy disk, hard disk and optical disk.

File access

When a file is used, information is read and accessed into computer memory and there are
several ways to access this information of the file. The file accessing methods are:

1. Direct access: the information/data stored on a device can be accessed randomly and
immediately irrespective to the order it was stored. The data with this access method is quicker
than sequential access. This is also known as random access method. For example hard disk,
flash drive etc.

2. Sequential access: the information/data stored on a device is accessed in the exact order in
which it was stored i.e. one record after another e.g. magnetic tape.

TOPIC 4: INTERNET

The Internet is the global system of interconnected computer networks that uses the Internet
protocol suite (TCP/IP) to link billions of devices worldwide. No one owns the internet, although
several organizations collaborate in its functioning and development.

BROWSER

A browser or a web browser is application software with a graphical user interface for
displaying HTML files, used to navigate the World Wide Web. Technically, a Web browser is a
client program that uses HTTP (Hypertext Transfer Protocol) to make requests of web servers
throughout the Internet on behalf of the browser user. Examples include Mozilla Firefox, Apple
safari, Opera, lynx (a non-graphical browser), Flock, Microsoft Internet explorer, Netscape
Navigator, UC browser, Google chrome, etc.

Benefits/advantages of the internet

· Allow access to vast range of information

· Allow the storing of electronic information in the cloud

· Job opportunity through work from home jobs which cut across freelance writing, graphic
designing, blogging etc.

· It’s an avenue for advertisement of goods and services

· E-commerce: buying online has been made possible through the internet

· Searching: the internet provides lightning fast search engine that gives thousands of link
based on specific search keywords.

· Researching: researchers now use vast availability of resources from digital library online
to do research

· Worldwide media accessibility: internet radio, internet TV, etc.

· News, of all kinds is available almost instantaneously.

· Online training and seminar called Webinars are made possible

· Online degrees: the internet provide us the opportunity to earn degree by taking online
classes

· Online banking

· Mode of communication has moved from face-to-face to Skyping, Instant messaging and
Texting.

Abuse of the Internet

· Pornography – watching X-rated films on the internet

· Fraud: Fraudulent activity involves any attempt to unlawfully obtain money, information or
resources from unsuspecting users within a network.

· Plagiarism – somebody’s electronic content can be plagiarized without due reference or


permission
· Lack of control – the level of information available on the internet is often not controlled.

· Virus - internet users are prone to downloading virus and other malicious program when
downloading software from illegitimate website on the internet.

· Hacking: Having unauthorized access to peoples computer on the internet

· Cyber-bullying: use if the internet to bully and intimidate

· Spamming: sending unsolicited or unwanted advertisement or email

Internet security

Internet security objective is to establish rules and measures to use against attacks over the
Internet. Generally, Internet security encompasses browser security, the security of data entered
through a Web form, and overall authentication and protection of data sent via Internet Protocol.
The following are various security threats:

Malicious software (malware): A computer user can be tricked or forced into downloading
software onto a computer that is of malicious intent. Such software comes in many forms, such
as viruses, Trojan horses, spyware, and worms.

Denial of service attack: A denial-of-service attack (DoS attack) or distributed denial-of-service


attack (DDoS attack) is an attempt to make a network resource unavailable to its intended users.

Phishing: Phishing occurs when the attacker pretends to be a trustworthy entity, either via email
or web page. Victims are directed to fake web pages, which are dressed to look legitimate, via
spoof emails, instant messenger/social media or other avenues.

Rootkits: this is a collection of software tools that enables remote control and administration-
level access over a computer or computer networks. Once remote access is obtained, the
rootkit can perform a number of malicious actions; they come equipped with keyloggers,
password stealers and antivirus disablers

· SQL injection attack: are designed to target data-driven applications by exploiting security
vulnerabilities in the application’s software.

The following tips can guard internet users against security threats:

Use complex and strong passwords always on the internet: a complex password involves a
combination of alphabets (both capital and small letters), numbers and special characters e.g.
P@$$w0rd.

Use computer firewall to monitor and control incoming and outgoing network traffic based on
predetermined security rules.

Antivirus should be installed and must be periodically updated to guard against virus and other
malicious software threats

Ensure that the choice of browser is that which has less vulnerability. Vulnerability in browser
can be reduced by using updated browser app.

Avoid clicking links or attachments from unknown sources

Encryption: data being sent over the internet can be encrypted in a form that only the recipient
can decrypt using a known key.

· Ensure visiting websites with secure protocol i.e. https

You might also like