SS2 Note Data Processing
SS2 Note Data Processing
SS2 Note Data Processing
WEEK
TOPIC
2-3
ENTITY-RELATIONSHIP MODEL
· Types of attributes
· Relationship
3-6
RELATIONAL MODEL
· Definition of terms: relation, attribute, tuple, domain, degree, cardinality, relational scheme.
FILE ORGANIZATION
INTERNET
· Definition of Internet
· Internet security
1. Entities
3. Attributes of entities
Entity: An entity is an object that exists and which is distinguishable from other objects. An
entity can be a person, a place, an object, an event, or a concept which an organization wishes
to maintain data. The following are some examples of entities:
An entity type defines a collection of entities that have same attributes. An entity instance is a
single item in this collection. An entity set is a set of entity instances. For example, STUDENT is
an entity type; a student with ID number ISI070 is an entity instance; and a collection of all
students is an entity set.
In E-R diagram, an entity name should be a concise singular noun that captures the unique
characteristics of the entity type. An E-R diagram depicts an entity type using a rectangle with
the name of the entity inside, as shown below.
E-R diagrams depict an attribute inside an ellipse and connect the ellipse with a line to the
associated entity type. The diagram below indicates some of the possible attributes in an E-R
diagram for the entity STUDENT.
Types of attributes
1. Simple or an atomic attribute, such as City or State, cannot be further divided into smaller
components.
2. Composite attribute can be divided into smaller subparts in which each subpart represents an
independent attribute. Name and Address are the only composite attributes in the diagram
above.
3. Single-valued attribute: most attributes have a single value. For example, a student has only
one date of birth and one identification number.
4. Multi-valued attribute: have more than one value for an entity instance. For example, student
attribute language above, which stores the name of languages a student can speak, is a multi-
valued attribute.
5. Stored attribute is an attribute whose value cannot be derived from the values of other
attributes.
6. Derived attribute: the value of a derived attribute can be determined by analyzing other
attributes. For example in the figure above, Age is a derived attribute because its value can be
derived from the current date and the attribute DateofBirth.
7. A key attribute (or identifier) is a single attribute or a combination of attributes that uniquely
identify an individual instance of an entity type. No two instances within an entity set can have
the same key attribute value. StudentID is the key attribute in the picture above.
Relationship
Entities in an organization do not exist in isolation but are related to each other. Students take
courses and each STUDENT entity is related to the COURSE entity. Faculty members teach
courses and each FACULTY entity is also related to the COURSE entity. Consequently, the
STUDENT entity is related to the FACULTY entity through the COURSE entity. In an E-R diagram,
we represent relationship types with diamond-shaped boxes connected by straight lines to the
rectangles that represent participating entity types. A relationship type is a given name that is
displayed in this diamond-shaped box and typically takes the form of a present tense verb or
verb phrase that describes the relationship. A relationship is an association among several
entities. The figure below is an E-R diagram that shows a relationship between entities
ACCOUNT and CUSTOMER.
Degree of a relationship
The number of entity sets that participate in a relationship is called the degree of relationship.
The 3 most common degrees of relationship are unary (degree 1), binary (degree 2) and ternary
(degree 3).
Unary relationship: A unary relationship exists when there is an association between two
instances of the same entity type. For example, in an organization, an employee will manage
one or more employee.
Binary relationship: A binary relationship exists when there is an association between two
instances of two different entity types. For example, in a secondary school, a binary relationship
exists between a Subject (SUBJECT entity) and a Teacher (TEACHER entity); a teacher teaches
(a) subject.
Ternary Relationship: A ternary relationship exists when there is an association between three
instances of three different entity types. For example, consider a student using certain
equipment for a project. In this case, the STUDENT, PROJECT, and EQUIPMENT entity types
relate to each other with ternary relationships: a student checks out equipment for a project.
Cardinality of a relationship
The cardinality is the number of occurrences in one entity which are associated (or linked) to
the number of occurrences in another entity. For example, if the entity types A and B are
connected by a relationship, then the maximum cardinality represents the maximum number of
instances of entity B that can be associated with any instance of entity A. There are three types
of relationship:
A relational model gives a simple way of representing data in form of a two-dimensional table
called relation. Relational model was invented by Edger Codd. The columns or fields in the table
identify the attribute such as surname, lastname, age, gender, Admission number etc. A tuple or
row contains all the data of a single instance of the table such as a person named James Bond.
In the relational model, every tuple must have a unique identification or key based on the data. In
the table below, the Admission number is the key that uniquely identifies each tuple in the
relation. Keys are often used to join data from two or more relations based on matching
identification. Another concept used in a relational model is the foreign keys, which are primary
keys in one relation that are kept in another relation to allow for the joining of data.
Definition of terms
1. Field: is a named column in the relation (table). There are five fields in the student relation
above – Admission number, Surname, Last name, Age, Gender.
2. Tuple: is a record (i.e. a row of data in a relation). All tuples in a relation must be distinct.
3. Domain: is defined as the set of all unique values permitted for an attribute. For example,
the domain of the field Gender as seen in the table above is M and F. Also the domain of the
attribute Surname consist the combination of all possible letters in the alphabet. A domain is
referred to in a relation schema by the domain name and has a set of values.
4. The Degree (also called arity) of a relation is the number of fields in it. The relation above
is of degree 5.
5. The cardinality of a relation is the number of tuples in it. For example, the relation above
contains 6 tuples and hence the cardinality is 6.
6. A relational database is a collection of relations with distinct names that can be linked
together.
7. Relation schema describes the column heads of the table. The schema specifies the
relation’s name, the name of the field (or attribute or column) and the domain of each field. For
example the schema for the Student relation above is given below:
Student (Admission number: integer, Surname: string, last name: string, Age: integer, Gender:
string).
1. Candidate key: Candidate keys are defined as the minimal set of fields which can uniquely
identify each record in the table. It is an attribute or a set of attributes that can act as a primary
key for a table to uniquely identify each record in that table.
2. Primary key: is any candidate key that is most appropriate to be the main reference key for
the table. It is used to establish relationship with other tables. It must never be null and must be
unique.
3. Foreign key: is generally a primary key of one table that appears as a field in another
where the first table has a relationship with the second. In other words, if we have a table A with
a primary key X that is linked to a table B where X is a field in B, then X is a foreign key in B.
4. Secondary key or alternative key: a table may have one or more choices for the primary
key. Collectively these are known as candidate keys as shown above. One is selected as the
primary key. Those not selected are known as secondary keys or alternative keys.
5. Composite key is combination of two or more columns in a table that can be used to
uniquely identify each row in a table
MS-Access, a relational DBMS typically provides a Graphical User Interface (GUI) that allows the
creation of a database and tables that store information about different subject. This can be
achieved using a query language called SQL (Structured Query Language). SQL can do more
than query a database; it can define the structure of the database, modify data in the database
and specify security constraints. A subset of the SQL that is used to define the structure of the
database and the table is called data definition language (DDL). In SQL, a relation is a table.
The “CREATE DATABASE” is the keyword and “databasename” is the name of the database.
For example assuming we want to create a database for students information, then the create
statement would be as follows:
Column1 datatype,
Column 2 datatype,
Column3 datatype,
.....
columnN datatype,
);
The “CREATE TABLE” is the keyword that tells the D system what to do table_name is the name
of the table, “PRIMARY KEY” is a keyword that indicate which column listed above is to be used
as the primary key, column1…ColumnN indicates the fields (or column as in a table).
For example, if you want to create a table that stores student information (i.e. student table), the
statements below will achieve that.
Lastname CHAR(20),
);
In order to insert tuples (records) into a table, the INSERT command is used. Here is the syntax:
INSERT INTO Table_name (column1, column2, ….., columnN)
OR
For example to insert the first tuple as indicated in the table above, use the following
statements:
OR
To delete a tuple (record) from the table, use the DELETE Command. Here is the syntax:
WHERE [condition];
For example, to delete a tuple with surname = James from the relation above, use the following
statement:
The database system deletes the first record with AdmissionNumber = 1201.
Note: The command ‘DELETE FROM students;’ - will delete all tuples (records)in the table.
The UPDATE query is used to modify the existing records in a table. The syntax is as follows:
UPDATE table_name
WHERE [condition];
For example, if we want to change the gender of James Bond in the table above (i.e. from M to F)
, use the following statement:
UPDATE StudentTbl
The DELETE, INSERT and UPDATE statements above are part of the SQL subset called the Data
Manipulation Language (DML).
Querying Database
You query the database when you are requesting for certain information from the database. You
can include condition that such items you want should meet. The query helps us retrieve data
from the database.
Example
The following SQL statement selects the entire rows of data from the StudentTbl:
Example
We can structure the query to limit the number of columns of data to be retrieved from the
StudentTable.
Example
FROM StudentTbl
Constraints are the rules enforced on data columns in table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the
database. Constraints could be column level or table level. Column level constraints are applied
only to one column, whereas table level constraints are applied to the whole table. The following
are commonly used constraints in SQL:
1. Entity Constraints
Entity constraints can be enforced when you create a table and assign certain column of the
table to be the primary key or and when you assign certain column of that table not to accept
NULL values. The PRIMARY key constraint tells the database engine that none of the rows
should have a duplicate value. The NOT NULL constraint ensures that a column cannot have a
NULL value
Examples
Gender Char(2),
DOB DateTime
);
You can also use the UNIQUES constraint to ensure that all values in a column are different; that
is, no duplicate entries should exist in the table.
Age INT,
);
The GetDate function inserts the current date into the table.
2. Domain Integrity
This constraint can be enforced by specifying that certain columns in the table should be of a
particular data type or that it should meet certain conditions. Example below employs two types
of constraints: CHECK and UNIQUE constraints. The CHECK constraint ensures that all values in
a column satisfy certain conditions (i.e. any record to be inserted must have age value greater
than or equal to 10).
Example
);
The DEFAULT constraint provides a default value for the date column when no value is specified.
3. Referential Integrity
Referential Integrity constraint can be enforced using two or more tables. The first table is
called the parent table while any other table or tables will be regarded as the child table(s). The
child table(s) will be used to reference the parent table.
Example
Gender Char(6),
DOB DateTime
);
Duration DateTime
);
StudentId in the Subject table is a foreign key that “references” StudentId in the Student table
called the parent table. The idea behind this is that a subject cannot exist for a student when
that student is not in the student (parent) table. For example, in order for a student to have a
subject in the Subject table, such student must exist in the Student table.
File organization refers to the logical relationships among various records that constitute the
file, particularly with respect to the means of identification and access to any specific record.
File organization is used to describe the way in which the records are stored in terms of blocks,
and the blocks are placed on the storage medium. In short, it refers to the way in which data
records are stored or arranged.
1. Serial (heap) files organization: records are placed in the order in which they are inserted. A
new record is inserted in the last page of the file; if there is insufficient space in the last page, a
new page is added to the file. This makes insertion very efficient but since heap file has no
particular ordering with respect to field values, a linear search must be performed to access
record as thus making it slower as compared to sorted files.
2. Sequential file organization: records are stored in some sort of order (ascending, descending
or alphabetical) based on the unique key field or search key. This method is fast & efficient for
huge amount of data. Sorted file is inefficient as it takes time & space for sorting records. A
sequential file could be stored on a sequential storage device such as magnetic tape and
magnetic disks. To access these records, the computer must read the file in sequence from the
beginning. The first record is read and processed first, then the second record in the file
sequence, and so on.
3. Index Sequential Access Method (ISAM): Records are organized in sequence and an index
table is used to speed up access to the records without requiring a search of an entire file. The
records of the files can be stored in a random sequence but the index table is in a sorted
sequence on the key value. This type of file organization is best suited for situations where both
batch and online processing are to be supported.
4. Direct or Random file organization: In this file organization, records are organized in such a
way that it is possible for the computer to directly locate the key of the desired record without
having to search through a sequence of the record. Direct file organization allows direct access
to individual records stored in any part of the file. These types of files are used in direct access
storage device (DASD) like floppy disk, hard disk and optical disk.
File access
When a file is used, information is read and accessed into computer memory and there are
several ways to access this information of the file. The file accessing methods are:
1. Direct access: the information/data stored on a device can be accessed randomly and
immediately irrespective to the order it was stored. The data with this access method is quicker
than sequential access. This is also known as random access method. For example hard disk,
flash drive etc.
2. Sequential access: the information/data stored on a device is accessed in the exact order in
which it was stored i.e. one record after another e.g. magnetic tape.
TOPIC 4: INTERNET
The Internet is the global system of interconnected computer networks that uses the Internet
protocol suite (TCP/IP) to link billions of devices worldwide. No one owns the internet, although
several organizations collaborate in its functioning and development.
BROWSER
A browser or a web browser is application software with a graphical user interface for
displaying HTML files, used to navigate the World Wide Web. Technically, a Web browser is a
client program that uses HTTP (Hypertext Transfer Protocol) to make requests of web servers
throughout the Internet on behalf of the browser user. Examples include Mozilla Firefox, Apple
safari, Opera, lynx (a non-graphical browser), Flock, Microsoft Internet explorer, Netscape
Navigator, UC browser, Google chrome, etc.
· Job opportunity through work from home jobs which cut across freelance writing, graphic
designing, blogging etc.
· E-commerce: buying online has been made possible through the internet
· Searching: the internet provides lightning fast search engine that gives thousands of link
based on specific search keywords.
· Researching: researchers now use vast availability of resources from digital library online
to do research
· Online degrees: the internet provide us the opportunity to earn degree by taking online
classes
· Online banking
· Mode of communication has moved from face-to-face to Skyping, Instant messaging and
Texting.
· Fraud: Fraudulent activity involves any attempt to unlawfully obtain money, information or
resources from unsuspecting users within a network.
· Virus - internet users are prone to downloading virus and other malicious program when
downloading software from illegitimate website on the internet.
Internet security
Internet security objective is to establish rules and measures to use against attacks over the
Internet. Generally, Internet security encompasses browser security, the security of data entered
through a Web form, and overall authentication and protection of data sent via Internet Protocol.
The following are various security threats:
Malicious software (malware): A computer user can be tricked or forced into downloading
software onto a computer that is of malicious intent. Such software comes in many forms, such
as viruses, Trojan horses, spyware, and worms.
Phishing: Phishing occurs when the attacker pretends to be a trustworthy entity, either via email
or web page. Victims are directed to fake web pages, which are dressed to look legitimate, via
spoof emails, instant messenger/social media or other avenues.
Rootkits: this is a collection of software tools that enables remote control and administration-
level access over a computer or computer networks. Once remote access is obtained, the
rootkit can perform a number of malicious actions; they come equipped with keyloggers,
password stealers and antivirus disablers
· SQL injection attack: are designed to target data-driven applications by exploiting security
vulnerabilities in the application’s software.
The following tips can guard internet users against security threats:
Use complex and strong passwords always on the internet: a complex password involves a
combination of alphabets (both capital and small letters), numbers and special characters e.g.
P@$$w0rd.
Use computer firewall to monitor and control incoming and outgoing network traffic based on
predetermined security rules.
Antivirus should be installed and must be periodically updated to guard against virus and other
malicious software threats
Ensure that the choice of browser is that which has less vulnerability. Vulnerability in browser
can be reduced by using updated browser app.
Encryption: data being sent over the internet can be encrypted in a form that only the recipient
can decrypt using a known key.