Databases_Phoenix

A database is an organized collection of information that allows for easy access, management, and updating, typically structured in three levels: external, conceptual, and internal. Unlike file systems, databases provide automated methods for data management, ensuring data integrity, consistency, and security, while supporting various applications across different domains. The document also discusses different data models, including hierarchical, network, and relational models, highlighting the advantages of relational databases in managing complex data relationships.

Uploaded by

stephenmaremaina

What is a Database?

A database is a collection of information that is organized so that it can easily be accessed, managed and updated. A database may contain different levels of abstraction in its architecture. Typically, three levels make up the database architecture: external, conceptual and internal. The external level defines how the users view the data; a single database can have multiple views. The internal level defines how the data is physically stored. The conceptual level is the communication medium between the internal and external levels. It provides a unique view of the database regardless of how it is stored or viewed. There are several types of databases, such as analytical databases, data warehouses and distributed databases.

Databases (more correctly, relational databases) are made up of tables, which contain rows and columns, much like spreadsheets in Excel. Each column corresponds to an attribute, while each row represents a single record. For example, in a database which stores employee information of a company, the columns could contain employee name, employee id and salary, while a single row represents a single employee. Most databases come with a Database Management System (DBMS) that makes it very easy to create, manage and organize data.

Database System

Database and File System are two methods used to store, retrieve, manage and manipulate data. Both systems can be used to allow the user to work with data in a similar way. A File System is a collection of raw data files stored on the hard drive, whereas a database is intended for easily organizing, storing and retrieving large amounts of data. In other words, a database holds a bundle of organized data (typically in digital form) for one or more users. Databases, often abbreviated DB, are classified according to their content, such as document-text, bibliographic and statistical. It should be noted that, even in a database, data are eventually (physically) stored in some sort of files.
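As a rough illustration of the employee example in this section, the sketch below builds such a table with Python's built-in sqlite3 module. The table name employee, the column names and the two sample rows are assumptions made for illustration only; they do not come from the notes.

```python
import sqlite3

# In-memory database, so nothing is written to disk for this sketch.
conn = sqlite3.connect(":memory:")

# Each column is one attribute of an employee (names are illustrative).
conn.execute(
    """
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT    NOT NULL,
        salary        REAL    NOT NULL
    )
    """
)

# Each row is one record, i.e. one employee (sample data is made up).
conn.executemany(
    "INSERT INTO employee (employee_id, employee_name, salary) VALUES (?, ?, ?)",
    [(1, "Alice", 52000.0), (2, "Bob", 48000.0)],
)

# PRAGMA table_info describes the columns; index 1 of each result is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(employee)")]
rows = conn.execute("SELECT * FROM employee ORDER BY employee_id").fetchall()

print(columns)
for row in rows:
    print(row)
```

The DBMS, not the application, keeps track of which attributes exist and where each record lives; the program only asks for columns and rows.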
What is the difference between File System and Database?

In summary, in a File System files are used to store data, while a database is a collection of organized data. Although File Systems and databases are two ways of managing data, databases clearly have many advantages over File Systems. Typically, when using a File System, most tasks such as storage, retrieval and search are done manually (even though most operating systems provide graphical interfaces to make these tasks easier), which is quite tedious, whereas when using a database the inbuilt DBMS provides automated methods to complete these tasks. For this reason, using a File System leads to problems with data integrity, data inconsistency and data security, but these problems can be avoided by using a database. Unlike a File System, databases are efficient because reading line by line is not required, and certain control mechanisms are in place.

What is a File System?

As mentioned above, in a typical File System electronic data are stored directly in a set of files. If only one table is stored in a file, it is called a flat file. Flat files contain values in each row separated by a special delimiter such as a comma. In order to query some random data, it is first required to parse each row and load it into an array at run time, and for this the file has to be read sequentially (because there is no control mechanism in files); therefore it is quite inefficient and time consuming. The burden of locating the necessary file, going through the records (line by line), checking for the existence of certain data and remembering which files/records to edit is on the user. The user either has to perform each task manually or has to write a script that does them automatically with the help of the file management capabilities of the operating system.

Dr. Mbii Kavindu - Email: Honkavindu@amailcom - Phone: 0722294481 / 074841 5500
Because of these reasons, File Systems are easily vulnerable to serious issues like inconsistency, inability to maintain concurrency, data isolation, threats to integrity and lack of security.

Advantages of Database Systems
* Centralized storage of data for all applications in the organization, which can then be pooled.
* Independent of application program: many different applications can use data from common shared database(s).
* Data consistency: when an attribute in a table is updated, its up-to-date value is available to all users of the RDBMS, in whatever report they use and in exactly the same form.
* Data redundancy: because there is only one copy of each attribute kept, duplication should be eliminated altogether in a well-designed DBS.
* Flexibility: easy to set up new relationships and new entities. New tables and reports can be set up as and when required.
* Security: all access to data is via a centralized system, so a uniform system of security monitoring can be implemented.

Applications of database systems
(Shifts in application domains help illustrate the evolution of DBMSs.)
* Reservation systems, banking systems
* Record/book keeping (corporate, university, medical), statistics
* Bioinformatics, e.g. gene databases
* Criminal justice
  o Fingerprint matching
  o How do you encode 'looks like'?
* Multimedia systems
  o Require terabytes (10^12 bytes) of storage
  o Tertiary storage devices, e.g. CDs, DVDs
  o Image/audio/video retrieval
  o Streaming, interactivity
* Satellite imaging; can require petabytes (10^15 bytes) of storage
* The web
  o Client-server and multi-tier architectures
  o Almost all data-intensive websites are database-driven; IMDB.com is an exception
* Information integration
  o Over the web
  o Legacy systems; must deal with issues of
    - Synonymy: different words having the same meaning, e.g. coffee shop vs. café
    - Polysemy: same word (homonym) having different meanings, e.g. shot
  o Data warehouses
  o Data mining (KDD, Knowledge Discovery in Databases), e.g. association rules: 'diapers' and beer; we pass these on to the marketing folks
* In sum, databases are everywhere!

Data models

1. Hierarchical model

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is held in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types. This is done by using trees, which, like the set theory used in the relational model, are 'borrowed' from maths.

For example, an organization might store information about an employee, such as name, employee number, department and salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment.
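The employee and children example can be sketched in a few lines of Python. This is only an in-memory illustration of the one-parent rule, not how IMS or any real hierarchical DBMS stores data; the class name Segment, its attributes and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Segment:
    """One segment (node) in a hierarchical tree; hypothetical helper class."""
    record_type: str
    values: Dict[str, str]
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: List["Segment"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Segment") -> "Segment":
        # The 1:N rule: a child segment may belong to one parent only.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


# One employee (parent segment) with three children (child segments).
employee = Segment("EMPLOYEE", {"name": "J. Doe", "employee_number": "E100"})
for child_name in ("Ann", "Ben", "Cat"):
    employee.add_child(Segment("CHILD", {"name": child_name}))

# Attaching an existing child to a second parent is rejected.
other_employee = Segment("EMPLOYEE", {"name": "K. Roe", "employee_number": "E200"})
try:
    other_employee.add_child(employee.children[0])
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False

print(len(employee.children), second_parent_allowed)  # prints: 3 False
```

The rejected second parent is exactly the restriction that the network model later relaxes.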
Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.

For example, the following is the hierarchical schema of a company database:

DEPARTMENT (DNAME, DNUMBER, MGRNAME, MGRSTARTDATE)
  EMPLOYEE (NAME, SSN, BDATE, ADDRESS)
  PROJECT (PNAME, PNUMBER, PLOCATION)

The tree representation of the above hierarchical schema is shown below:

DEPARTMENT
  EMPLOYEE
  PROJECT

The Hierarchical Data Model structures data in a tree of records, with each record having one parent record and many children. It can be represented as follows:

Figure 1 - The Hierarchical Data Model

A hierarchical database consists of the following:
1. It contains nodes connected by branches.
2. The top node is called the root.
3. If multiple nodes appear at the top level, the nodes are called root segments.
4. The parent of node nx is a node directly above nx and connected to nx by a branch.
5. Each node (with the exception of the root) has exactly one parent.
6. The child of node nx is the node directly below nx and connected to nx by a branch.
7. One parent may have many children.

Network model

The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child. So, the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set.
The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set some (one) record type is owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted.

The Network Data Model uses a lattice structure in which a record can have many parents as well as many children. It can be represented as follows:

Figure 2 - The Network Data Model

Like the Hierarchical Data Model, the Network Data Model also consists of nodes and branches, but a child may have multiple parents within the network structure instead of being restricted to just one.

Both hierarchical and network databases suffered from the following deficiencies (when compared with relational databases):
* Access to the database was not via SQL query strings, but by a specific set of APIs, typically for FIND, CREATE, READ, UPDATE and DELETE.
* Each API would only access a single table (dataset), so it was not possible to implement a JOIN, which would return data from several tables.
* It was not possible to provide a variable WHERE clause. The only selection mechanisms available were:
  a. Read all entries (a full table scan).
  b. Read a single entry using a specific primary key.
  c. Read all entries on a child table which were associated with a selected entry on a parent table.
  d. Any further filtering had to be done within the application code.
* It was not possible to provide an ORDER BY clause. Data was presented in the order in which it existed in the database.
  This mechanism could be tuned by specifying sort criteria to be used when each record was inserted, but this had several disadvantages:
  o Only a single sort sequence could be defined for each path (link to a parent), so all records retrieved on that path would be provided in that sequence.
  o It could make inserts rather slow when attempting to insert into the middle of a large collection, or where a table had multiple paths each with its own set of sort criteria.

The Relational Data Model

Relational model (RDBMS - relational database management system): a database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and the relations between them are organised in tables. A table is a collection of records and each record in a table contains the same fields.

Properties of Relational Tables:
* Values Are Atomic
* Each Row is Unique
* Column Values Are of the Same Kind
* The Sequence of Columns is Insignificant
* The Sequence of Rows is Insignificant
* Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables. For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields.
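The customer bill example above can be sketched with Python's built-in sqlite3 module. The column spellings customer_id and product_code, the helper customer_bill and all sample rows are assumptions made for illustration; only the shape of the two tables comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        price        REAL NOT NULL
    );
    CREATE TABLE orders (
        customer_id  INTEGER NOT NULL,
        product_code TEXT    NOT NULL REFERENCES products (product_code)
    );
    -- Sample data, invented for this sketch.
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 25.5);
    INSERT INTO orders   VALUES (1, 'P1'), (1, 'P2'), (2, 'P1');
    """
)


def customer_bill(connection: sqlite3.Connection, customer_id: int) -> float:
    """Sum the prices of everything one customer ordered (hypothetical helper)."""
    row = connection.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(customer_bill(conn, 1))  # prints: 35.5
print(customer_bill(conn, 2))  # prints: 10.0
```

The relationship between the two tables is stated only in the query, at retrieval time, which is the property the notes go on to call dynamic.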
Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on the relational algebra.

The Relational Data Model has the relation at its heart, but with a whole series of rules governing it, for example keys, relationships, joins, functional dependencies, transitive dependencies, multi-valued dependencies, and modification anomalies.

The Relation is the basic element in a relational data model.

Figure 3 - Relations in the Relational Data Model (attributes are the columns of the relation; tuples are its rows)

A relation is subject to the following rules:
1. Relation (file, table) is a two-dimensional table.
2. Attribute (i.e. field or data item) is a column in the table.
3. Each column in the table has a unique name within that table.
4. Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name, employee-number, etc.).
5. Each column has a domain, the set of possible values that can appear in that column.
6. A Tuple (i.e. record) is a row in the table.
7. The order of the rows and columns is not important.
8. Values of a row all relate to some thing or portion of a thing.
9. Repeating groups (collections of logically related attributes that occur multiple times within one record occurrence) are not allowed.
10. Duplicate rows are not allowed (candidate keys are designed to prevent this).
11. Cells must be single-valued (but can be variable length). Single-valued means the following:
    o Cannot contain multiple values such as 'A1,B2,C3'.
    o Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ' another.

A relation may be expressed using the notation R(A,B,C, ...) where:
* R = the name of the relation.
* (A,B,C, ...) = the attributes within the relation.
* A = the attribute(s) which form the primary key.

Keys
1. A simple key contains a single attribute.
2. A composite key is a key that contains more than one attribute.
3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row. A candidate key must possess the following properties:
   o Unique identification - For every row the value of the key must uniquely identify that row.
   o Non redundancy - No attribute in the key can be discarded without destroying the property of unique identification.
4. A primary key is the candidate key which is selected as the principal unique identifier. Every relation must contain a primary key. The primary key is usually the key selected to identify a row when the database is physically implemented. For example, a part number is selected instead of a part description.
5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs from a candidate key in that it does not require the non redundancy property.
6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non key attribute in one relation and as a primary key attribute in another relation. I say usually because it is possible for a foreign key to also be the whole or part of a primary key:
   o A many-to-many relationship can only be implemented by introducing an intersection or link table which then becomes the child in two one-to-many relationships. The intersection table therefore has a foreign key for each of its parents, and its primary key is a composite of both foreign keys.
   o A one-to-one relationship requires that the child table has no more than one occurrence for each parent, which can only be enforced by letting the foreign key also serve as the primary key.
7. A semantic or natural key is a key for which the possible values have an obvious meaning to the user or the data.
   For example, a semantic primary key for a COUNTRY entity might contain the value 'USA' for the occurrence describing the United States of America. The value 'USA' has meaning to the user.
8. A technical or surrogate or artificial key is a key for which the possible values have no obvious meaning to the user or the data. These are used instead of semantic keys for any of the following reasons:
   o When the value in a semantic key is likely to be changed by the user, or can have duplicates. For example, on a PERSON table it is unwise to use PERSON_NAME as the key as it is possible to have more than one person with the same name, or the name may change, such as through marriage.
   o When none of the existing attributes can be used to guarantee uniqueness. In this case adding an attribute whose value is generated by the system, e.g. from a sequence of numbers, is the only way to provide a unique value. Typical examples would be ORDER_ID and INVOICE_ID. The value '12345' has no meaning to the user as it conveys nothing about the entity to which it relates.
9. A key functionally determines the other attributes in the row, thus it is always a determinant.
10. Note that the term 'key' in most DBMS engines is implemented as an index which does not allow duplicate entries.

Data Relationships

One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.
1. A relationship is between two tables in what is known as a one-to-many or parent-child or master-detail relationship, where an occurrence on the 'one' or 'parent' or 'master' table may have any number of associated occurrences on the 'many' or 'child' or 'detail' table. To achieve this the child table must contain fields which link back to the primary key on the parent table.
   These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
2. It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
3. A child record without a corresponding parent record is known as an orphan.
4. It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points back to the primary key. Note that these two keys cannot be comprised of exactly the same fields, otherwise the record could only ever point to itself.
5. A table may be the subject of any number of relationships, and it may be the parent in some and the child in others.
6. Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
7. Some database engines allow relationships to be managed by rules known as referential integrity or foreign key constraints. These will prevent entries on child tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.

Database Names
1. Database names should be short and meaningful, such as products, purchasing and sales.
   o Short, but not too short, as in prod or purch.
   o Meaningful but not verbose, as in 'the database used to store product details'.
2. Do not waste time using a prefix such as db to identify database names. The SQL syntax analyser has the intelligence to work that out for itself.
3. If your DBMS allows a mixture of upper and lowercase names, and it is case sensitive, it is better to stick to a standard naming convention such as:
   o All uppercase.
   o All lowercase (my preference - see The choice between upper and lower case).
   o Leading uppercase, remainder lowercase.
   Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can lead to disasters.
4. If a database name contains more than one word, such as in sales orders and purchase orders, decide how to deal with it:
   o Separate the words with a single space, as in sales orders (note that some DBMSs do not allow embedded spaces, while most languages will require such names to be enclosed in quotes).
   o Separate the words with an underscore, as in sales_orders (my preference - see The choice between upper and lower case).
   o Separate the words with a hyphen, as in sales-orders.
   o Use camel caps, as in SalesOrders.
   Again, be consistent.
5. Rather than putting all the tables into a single database it may be better to create separate databases for each logically related set of tables. This may help with security, archiving, replication, etc.

Table Names
1. Table names should be short and meaningful, such as part, customer and invoice.
   o Short, but not too short.
   o Meaningful, but not verbose.
2. Do not waste time using a prefix such as tbl to identify table names. The SQL syntax analyser has the intelligence to work that out for itself - so should you.
3. Table names should be in the singular (e.g. customer not customers). The fact that a table may contain multiple entries is irrelevant - any multiplicity can be derived from the existence of one-to-many relationships.
4. If your DBMS allows a mixture of upper and lowercase names, and it is case sensitive, it is better to stick to a standard naming convention such as:
   o All uppercase.
   o All lowercase (my preference - see The choice between upper and lower case).
   o Leading uppercase, remainder lowercase.
   Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can lead to disasters.
5. If a table name contains more than one word, such as in sales order and purchase order, decide how to deal with it:
   o Separate the words with a single space, as in sales order (note that some DBMSs do not allow embedded spaces, while most languages will require such names to be enclosed in quotes).
   o Separate the words with an underscore, as in sales_order (my preference - see The choice between upper and lower case).
   o Separate the words with a hyphen, as in sales-order.
   o Use camel caps, as in SalesOrder.
   Again, be consistent.
6. Be careful if the same table name is used in more than one database - it may lead to confusion.

Field Names
1. Field names should be short and meaningful, such as part_name and customer_name.
   o Short, but not too short, such as in ptnam.
   o Meaningful, but not verbose, such as the name of the part.
2. Do not waste time using a prefix such as col or fld to identify column/field names. The SQL syntax analyser has the intelligence to work that out for itself - so should you.
3. If your DBMS allows a mixture of upper and lowercase names, and it is case sensitive, it is better to stick to a standard naming convention such as:
   o All uppercase.
   o All lowercase (my preference - see The choice between upper and lower case).
   o Leading uppercase, remainder lowercase.
   Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can lead to disasters.
4. If a field name contains more than one word, such as in part name and customer name, decide how to deal with it:
   o Separate the words with a single space, as in part name (note that some DBMSs do not allow embedded spaces, while most languages will require such names to be enclosed in quotes).
   o Separate the words with an underscore, as in part_name (my preference - see The choice between upper and lower case).
   o Separate the words with a hyphen, as in part-name.
   o Use camel caps, as in PartName.
   Again, be consistent.
5. Common words in field names may be abbreviated, but be consistent.
   o Do not allow a mixture of abbreviations, such as 'no', 'num' and 'nbr' for 'number'.
   o Publish a list of standard abbreviations and enforce it.
6. Although field names must be unique within a table, it is possible to use the same name on multiple tables even if they are unrelated, or they do not share the same set of possible values. It is recommended that this practice should be avoided, for reasons described in Field names should identify their content and The naming of Foreign Keys.

Primary Keys
1. It is recommended that the primary key of an entity should be constructed from the table name with a suffix of _ID. This makes it easy to identify the primary key in a long list of field names.
2. Do not waste time using a prefix such as pk to identify primary key fields. This has absolutely no meaning to any database engine or any application.
3. Avoid using generic names for all primary keys. It may seem a clever idea to use the name ID for every primary key field, but this causes problems:
   o It causes the same name to appear on multiple tables with totally different contexts. The string ID='ABC123' is extremely vague as it gives no idea of the entity being referenced. Is it an invoice id, customer id, or what?
   o It also causes a problem with foreign keys.
4. There is no rule that says a primary key must consist of a single attribute - both simple and composite keys are allowed - so don't waste time creating artificial keys.
5. Avoid the unnecessary use of technical keys. If a table already contains a satisfactory unique identifier, whether composite or simple, there is no need to create another one. Although the use of a technical key can be justified in certain circumstances, it takes intelligence to know when those circumstances are right. The indiscriminate use of technical keys shows a distinct lack of intelligence.
   For further views on this subject please refer to Technical Keys - Their Uses and Abuses.

Foreign Keys
1. It is recommended that where a foreign key is required you use the same name as that of the associated primary key on the foreign table. A natural join can only combine two relations when they share at least one common attribute, and this should be taken to mean the attribute name(s) as well as the value(s). Thus where the customer and invoice tables are joined in a parent-child relationship the following will result:
   o The primary key of customer will be customer_id.
   o The primary key of invoice will be invoice_id.
   o The foreign key which joins invoice to customer will be customer_id.
2. For MySQL users this means that the shortened version of the join condition may be used:
   o Short: A LEFT JOIN B USING (a,b,c)
   o Long: A LEFT JOIN B ON (A.a=B.a AND A.b=B.b AND A.c=B.c)
3. The only exception to this naming recommendation should be where a table contains more than one foreign key to the same parent table, in which case the names must be changed to avoid duplicates. In this situation I would simply add a meaningful suffix to each name to identify the usage, such as:
   o To signify movement I would use location_id_from and location_id_to.
   o To signify positions in a hierarchy I would use node_id_snr and node_id_jnr.
   o To signify replacement I would use part_id_old and part_id_new.
   I prefer to use a suffix rather than a prefix as it makes the leading characters match (as in PART_ID_old and PART_ID_new) instead of having the trailing characters match (as in old_PART_ID and new_PART_ID).
4. Do not waste time using a prefix such as fk to identify foreign key fields.

... will recreate an instance of a relation. Some sequences are more desirable since they result in the creation of less invalid data during the join operation.
Suppose that a relation is decomposed using functional dependencies and multi-valued dependencies. Then at least one sequence of joins on the resulting relations exists that recreates the original instance with no invalid data created during any of the join operations.

For example, suppose that a list of grades by room number is desired. This question, which was probably not anticipated during database design, can be answered without creating invalid data by either of two join sequences.

[The two join sequences are not recoverable from the source.]

Database normalization

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Objectives of Normalisation

Free the database of modification anomalies

A simple example of normalizing data might consist of a table showing:

Customer   Item purchased   Purchase price
Thomas     Shirt            $40
Mary       Shoes            $35
Carole     Shirt            $40
William    Trousers         $25

If this table is used for the purpose of keeping track of the price of items and you want to delete one of the customers, you will also delete a price. Normalizing the data would mean understanding this and solving the problem by dividing this table into two tables, one with information about each customer and the product they bought and the second about each product and its price. Making additions or deletions to either table would not affect the other.
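The effect of splitting the customer table can be checked with a short sketch using Python's built-in sqlite3 module. The table names product and purchase and the column names are assumptions chosen for illustration; the four rows are the ones from the example table.

```python
import sqlite3

# The single unnormalized table from the example: (customer, item, price).
flat = [
    ("Thomas", "Shirt", 40),
    ("Mary", "Shoes", 35),
    ("Carole", "Shirt", 40),
    ("William", "Trousers", 25),
]

# Deleting William from the flat table also removes the only Trousers price.
flat_without_william = [row for row in flat if row[0] != "William"]
flat_prices = {item: price for _, item, price in flat_without_william}
trousers_price_lost = "Trousers" not in flat_prices

# Normalized design: one table for prices, one for who bought what.
# (SQLite records the REFERENCES clause but only enforces it if
# PRAGMA foreign_keys is switched on; that is not needed for this sketch.)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        item  TEXT PRIMARY KEY,
        price INTEGER NOT NULL
    );
    CREATE TABLE purchase (
        customer TEXT NOT NULL,
        item     TEXT NOT NULL REFERENCES product (item)
    );
    """
)
conn.executemany(
    "INSERT INTO product (item, price) VALUES (?, ?)",
    sorted({(item, price) for _, item, price in flat}),
)
conn.executemany(
    "INSERT INTO purchase (customer, item) VALUES (?, ?)",
    [(customer, item) for customer, item, _ in flat],
)

# Deleting the customer now leaves the price untouched.
conn.execute("DELETE FROM purchase WHERE customer = ?", ("William",))
trousers_price = conn.execute(
    "SELECT price FROM product WHERE item = ?", ("Trousers",)
).fetchone()[0]

print(trousers_price_lost, trousers_price)  # prints: True 25
```

In the flat layout the price of trousers disappears with William; in the two-table layout it survives because it is stored once, in its own table.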
When an attemptis made to modify (update, insert into, or delete from) a table, undesired side-ef fects may follow: Notall tables can suffer fromthese side-effects; rather, the side-effects can o nly arise in tables that have not been sufficiently normalized, An insufficiently normalized table mi ght have one or more of the following characteristics: © The same information can be expressed on multiple rows; therefore update § to the table may result in logical inconsistencies. For example, each recor d in an "Employees’ Skills" table might contain an Employee ID, Employee A ddress, and Skill; thus a change of address for a particular employee will p otentially need to be applied to multiple records (one for each of his skills) If the update is not carried through successf ully-if, that is, the employee's address is updated on some records but not others—then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomen on is known as an update anomaly. * There are circumstances in which certain facts cannot be recorded at all. F or example, each record in a ‘Faculty and Their Courses" table might contai n a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code-thus we can record the details of any faculty memberwho teaches at least one cou rse, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses except by setting the Cours e Code to null. This phenomenon is known as aninsertion anomaly. «There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different fa cts. 
  The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.

Minimize redesign when extending the database structure

When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.

Make the data model more informative to users

Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.

Normalization rules

First Normal Form

* Eliminate repeating groups in individual tables.
* Create a separate table for each set of related data.
* Identify each set of related data with a primary key.

Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.

What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.

Normalizing an Example Table

These steps demonstrate the process of normalizing a fictitious student table.

1. Unnormalized table:

Student#    Advisor    Adv-Room    Class1    Class2    Class3
1022        John       412         101-07    143-01    159-02
4123        Simon      216         201-01    211-02    214-01

2. First Normal Form: No Repeating Groups

Tables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design trouble.

Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is as a one-to-many relationship: do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:

Student#    Advisor    Adv-Room    Class#
1022        John       412         101-07
1022        John       412         143-01
1022        John       412         159-02
4123        Simon      216         201-01
4123        Simon      216         211-02
4123        Simon      216         214-01

Second Normal Form

* Create separate tables for sets of values that apply to multiple records.
* Relate these tables with a foreign key.

Records should not depend on anything other than a table's primary key (a compound key, if necessary). For example, consider a customer's address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer's address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.

The following two tables demonstrate second normal form:

Students:

Student#    Advisor    Adv-Room
1022        John       412
4123        Simon      216

Registration:

Student#    Class#
1022        101-07
1022        143-01
1022        159-02
4123        201-01
4123        211-02
4123        214-01

Third Normal Form

* Eliminate fields that do not depend on the key.
Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing. However, many small tables may degrade performance or exceed open file and memory capacities.

It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.

4. Third Normal Form: Eliminate Data Not Dependent On Key

In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:

Students:

Student#    Advisor
1022        John
4123        Simon

Faculty:

Name     Room    Dept
John     412     42
Simon    216     42

Other Normalization Forms

Boyce-Codd Normal Form (BCNF, a stricter variant of third normal form), fourth normal form, and fifth normal form do exist, but are rarely considered in practical design.
Disregarding these rules may result in less than perfect database design, but should not affect functionality.
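As a rough illustration of where the student example ends up, the sketch below builds the third-normal-form tables with Python's built-in sqlite3 module and joins them back together. It is an assumption-laden sketch rather than part of the original notes: the column names (student_no, class_no, and so on) are invented for the example, the student and class numbers are as read from the example tables, and room 500 is a made-up value used only to show a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical column names; one table per kind of fact.
conn.executescript("""
    CREATE TABLE faculty (
        name TEXT PRIMARY KEY,
        room INTEGER,
        dept INTEGER
    );
    CREATE TABLE students (
        student_no INTEGER PRIMARY KEY,
        advisor    TEXT REFERENCES faculty(name)
    );
    CREATE TABLE registration (
        student_no INTEGER REFERENCES students(student_no),
        class_no   TEXT,
        PRIMARY KEY (student_no, class_no)   -- compound key
    );
""")

conn.executemany("INSERT INTO faculty VALUES (?, ?, ?)",
                 [("John", 412, 42), ("Simon", 216, 42)])
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1022, "John"), (4123, "Simon")])
conn.executemany("INSERT INTO registration VALUES (?, ?)",
                 [(1022, "101-07"), (1022, "143-01"), (1022, "159-02"),
                  (4123, "201-01"), (4123, "211-02"), (4123, "214-01")])

# Joining the three tables recreates the first-normal-form rows.
rows = conn.execute("""
    SELECT s.student_no, s.advisor, f.room, r.class_no
    FROM students AS s
    JOIN faculty AS f ON f.name = s.advisor
    JOIN registration AS r ON r.student_no = s.student_no
    ORDER BY s.student_no, r.class_no
""").fetchall()

# The advisor's room is stored once, so a single UPDATE changes it
# for every class row of every student that advisor looks after.
conn.execute("UPDATE faculty SET room = 500 WHERE name = 'John'")
rooms_seen = conn.execute("""
    SELECT DISTINCT f.room
    FROM students AS s
    JOIN faculty AS f ON f.name = s.advisor
    JOIN registration AS r ON r.student_no = s.student_no
    WHERE s.student_no = 1022
""").fetchall()

print(rows[0], len(rows), rooms_seen)
```

In the first-normal-form table the same room change would have to be applied to three separate rows for student 1022, which is exactly the update anomaly that normalization is meant to avoid.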