
Chapter 4 Summary

Computer storage media form a storage hierarchy that includes two main categories:

Primary storage
 o Can be operated on directly by the computer's central processing unit (CPU).
 o Provides fast access to data.
 o Limited storage capacity.
 o Includes the computer's main memory and the smaller but faster cache memories.

Secondary storage
 o Has a larger capacity, costs less, and provides slower access to data than primary storage devices.
 o Data in secondary storage cannot be processed directly by the CPU.
 o Includes magnetic disks, optical disks, and tapes.

MEMORY HIERARCHIES AND STORAGE DEVICES
1) Primary Storage Level
i) Cache memory
 o A static RAM (Random Access Memory).
 o Used by the CPU to speed up execution of programs.

ii) DRAM (Dynamic RAM)
 o Description: main work area for the CPU (keeps programs and data); AKA main memory.
 o Cost: low.
 o Drawback: volatility.
 o Speed: lower than cache.

2) Secondary Storage Level
High storage capacity and low cost.
i) Magnetic disks
 o Hard disk; data stored as magnetized areas.
ii) CD-ROM
 o Mass storage; longer access times.
iii) DVD-ROM
 o Mass storage; longer access times.
iv) Magnetic tapes
 o Archiving and backup storage of data.
 o Sequential access devices.
 o Least expensive, but longer access times.
v) USB
vi) Floppy discs

3) Tertiary Storage Level
i) NASA's EOS (Earth Observation Satellite) system
 o Description: archiving of rarely accessed information; useful for extraordinarily large data stores; data is often copied to secondary storage before use.
 o Speed: slow; accessed without human operators.

CHARACTERISTICS OF STORAGE

Volatility
 1) Non-volatile memory
 2) Volatile memory
  i) Dynamic random-access memory
  ii) Static random-access memory
Mutability
 o Read/write (mutable) storage
 o Read-only storage
 o Slow-write, fast-read storage
Accessibility
 o Random access
 o Sequential access
Capacity
 o Raw capacity
 o Memory storage density
Performance
 o Latency
 o Throughput
Addressability
 o Location-addressable
 o File-addressable
 o Content-addressable

STORAGE OF DATABASES
Most databases are stored permanently (persistently) on magnetic disk secondary storage, for the following reasons:
 o Databases are too large to fit entirely in main memory.
 o Permanent loss of stored data arises less frequently for disk secondary storage than for primary storage; secondary storage devices are non-volatile, whereas main memory is often called volatile storage.
 o The cost of storage per unit of data is an order of magnitude less for disk than for primary storage.
Data stored on disk is organized as files of records. Each record is a collection of data values. Records are stored so that they can be located efficiently whenever they are needed.

FILE ORGANIZATION TECHNIQUES
Three techniques:
1) Serial or Heap (unordered)
 Used for temporary files such as transaction files and dump files.
 Characteristics:
 o Insertion - Fast: new records are added at the end of the file.
 o Retrieval - Slow: a sequential search is required.
 o Update/Delete - Slow: sequential search to find the page, make the update or mark for deletion, then rewrite the page.
2) Sorted Sequential (SAM)
 Records are stored in key sequence but have no index.
 Used for master files in normal batch processing.
 Characteristics:
 o Older media (cards, tapes).
 o Records physically ordered by primary key.
 o Use when direct access to individual records is not required.
 o Accessing records: sequential search until the record is found.
 o Binary search can speed up access; must know the file size and how to determine the midpoint.
 Insertion:
 o Slow: sequential search to find where the record goes.
 o If there is sufficient space in that page, rewrite it; if not, move some records to the next page, and if no space there, keep bumping down until space is found.
 o May use an overflow file to decrease insertion time.
 Deletions and Updates:
 o Deletion - Slow: find the record, either mark it for deletion or free up the space, then rewrite.
 o Update - Slow: find the record, make the change, then rewrite.
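The trade-off described for heap files above (fast append, slow sequential search, deletion by marking) can be sketched in a few lines. This is a hedged in-memory illustration, not a real disk-based implementation; all names are invented.

```python
# Minimal sketch of a heap (unordered) file: records append at the end,
# retrieval requires a linear scan, and deletion marks the record.

class HeapFile:
    def __init__(self):
        self.records = []                 # stands in for the file's blocks

    def insert(self, record):
        self.records.append(record)       # fast: always at the end

    def find(self, field, value):
        # slow: sequential search over every record
        for i, rec in enumerate(self.records):
            if rec.get(field) == value and not rec.get("_deleted"):
                return i, rec
        return None

    def delete(self, field, value):
        hit = self.find(field, value)
        if hit:
            i, rec = hit
            self.records[i] = {**rec, "_deleted": True}  # reclaim later

f = HeapFile()
f.insert({"id": 1, "name": "a"})
f.insert({"id": 2, "name": "b"})
```

The linear `find` is what makes retrieval expensive: every lookup touches, on average, half the file.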

Indexed Sequential (ISAM)
 Usually on disk; access is fast.
 Records physically ordered by primary key; the index gives the physical location of each record.
 Records can be accessed sequentially or directly via the index.
 The index is stored in a file and read into memory when the file is opened; indexes must be maintained.
 Access - given a value for the key:
 o Search the index for the record address.
 o Issue a read instruction for that address.
 o Fast: possibly just one disk access.
 Inserting:
 o Not very efficient; indexes must be updated.
 o Must locate where the record should go.
 o If there is space, insert the new record and rewrite.
 o If no space, use an overflow area.
 o Periodically merge overflow records into the file.
 Deletion and Updates:
 o Fairly efficient: find the record, make the change or mark for deletion, rewrite.
 o Periodically remove records marked for deletion.
 Use ISAM files when both sequential and direct access are needed.
 o Say we have a retail application like Foleys.
 o Customer balances are updated daily.
 o Usually sequential access is more efficient for batch updates.
 o But we may need direct access to answer customer questions about balances.
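The ISAM access path above (search the index, then issue one read for the returned address) can be sketched as follows. The record layout and the index-as-a-dict are illustrative assumptions; a real ISAM index is itself an ordered disk structure.

```python
# Sketch of indexed sequential access: records are kept ordered by primary
# key, and a separate index maps each key to its "address" (list position).

records = sorted([(30, "z"), (10, "x"), (20, "y")])          # ordered data file
index = {key: pos for pos, (key, _) in enumerate(records)}   # key -> address

def isam_read(key):
    pos = index.get(key)      # step 1: search the index for the address
    if pos is None:
        return None
    return records[pos]       # step 2: one direct "disk access"
```

Sequential processing just walks `records` in order, while `isam_read` answers a direct lookup with a single access, which is why ISAM suits workloads needing both.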

3) Hashed or Direct or Random
 A randomly organized file contains records stored without regard to the sequence of their control fields.
 Records are stored in some convenient order; a hash function establishes a direct link between the key of a record and the physical address of that record.

FILE ORGANIZATION AND STORAGE STRUCTURES
Primary storage (main memory)
 o Fast
 o Volatile
 o Expensive
Secondary storage (files on disks or tapes)
 o Non-volatile

PRIMARY FILE ORGANIZATIONS

1) Heap file (unordered file) Simplest and most basic type of organization Places the records on disk in no particular order by appending new records at the end of the file. 2) Sorted file (or sequential file) Keeps the records ordered by the value of a particular field (called the sort key) 3) Hashed file Uses a hash function applied to a particular field (called the hash key) to determine a record's placement on disk

4) B-trees
Use tree structures.

OTHER PRIMARY FILE ORGANIZATIONS
Files of mixed records: physical clustering of object types is used in object DBMSs to store related objects together in a mixed file.

B-TREES AND OTHER DATA STRUCTURES AS PRIMARY ORGANIZATION
 Some DBMSs offer this option when both the record size and the number of records in a file are small.
 B-trees ensure that the tree is always balanced and that the space wasted by deletion, if any, never becomes excessive.
 Key values are unique.
 The tree starts with a single root node.

B+-TREES
 Most dynamic multilevel indexes use a variation of the B-tree data structure called a B+-tree.
 Data pointers are stored only at the leaf nodes of the tree.
 Leaf nodes of the B+-tree are usually linked together.
 Internal nodes are similar to the other levels of a multilevel index.

DIFFERENCE BETWEEN B-TREE AND B+-TREE
 In a B-tree, pointers to data records exist at all levels of the tree.
 In a B+-tree, all pointers to data records exist at the leaf-level nodes.

SECONDARY ORGANIZATION
 AKA auxiliary access structure.
 Allows efficient access to the records of a file based on alternate fields.
 Most of these exist as indexes.
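The point that B+-tree data pointers live only in linked leaves can be illustrated with just the leaf level. This sketch omits the internal routing nodes and all balancing; it only shows why the leaf links make range scans cheap.

```python
# Hedged sketch of a B+-tree leaf level: data pointers appear only in the
# leaves, and the leaves are chained so a range scan never revisits
# internal nodes. Record ids like "r1" are invented placeholders.

class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys            # search-key values in this leaf
        self.pointers = pointers    # data pointers (record ids here)
        self.next = None            # link to the next leaf

leaves = [Leaf([1, 2], ["r1", "r2"]),
          Leaf([3, 4], ["r3", "r4"]),
          Leaf([5, 6], ["r5", "r6"])]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]

def range_scan(start_leaf, lo, hi):
    """Follow the leaf links, collecting pointers for keys in [lo, hi]."""
    out, leaf = [], start_leaf
    while leaf:
        for k, p in zip(leaf.keys, leaf.pointers):
            if lo <= k <= hi:
                out.append(p)
        leaf = leaf.next
    return out
```

In a plain B-tree, some of these data pointers would sit in internal nodes, so the same range scan would have to move up and down the tree instead of just following the chain.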

On a read command, the block is copied from disk into the buffer; on a write command, the contents of the buffer are copied into the disk block.

PLACING FILE RECORDS ON DISK
Data is usually stored in the form of records. Each record consists of a collection of related data values (items); each value consists of one or more bytes and corresponds to a particular field of the record.

ALLOCATING FILE BLOCKS ON DISK 1) Contiguous allocation The file blocks are allocated to consecutive disk blocks.

Contiguous allocation makes reading the whole file very fast using double buffering, but it makes expanding the file difficult.

2) Linked allocation
Each file block contains a pointer to the next file block. This makes it easy to expand the file, but slow to read the whole file.

High-level programs (DBMS software) access the records by using record-at-a-time operations; each operation applies to a single record:
 o Open
 o Reset
 o Find
 o Read (or Get)
 o FindNext
 o Delete
 o Modify
 o Insert
 o Close

OPERATION ON FILES

OPEN:
 o Readies the file for access, and associates a pointer that will refer to a current file record at each point in time.
FIND:
 o Searches for the first file record that satisfies a certain condition, and makes it the current file record.
FINDNEXT:
 o Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record.
READ:
 o Reads the current file record into a program variable.
INSERT:
 o Inserts a new record into the file, and makes it the current file record.
DELETE:
 o Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid.
MODIFY:
 o Changes the values of some fields of the current file record.
CLOSE:
 o Terminates access to the file.
REORGANIZE:
 o Reorganizes the file records. For example, the records marked deleted are physically removed from the file, or a new organization of the file records is created.
READ_ORDERED:
 o Reads the file blocks in order of a specific field of the file.
The Find, FindNext, and Read operations may be combined into a single operation, Scan, whose description is as follows:
SCAN:
 o If the file has just been opened or reset, Scan returns the first record; otherwise it returns the next record. If a condition is specified with the operation, the returned record is the first or next record satisfying the condition.
In database systems, additional set-at-a-time higher-level operations may also be applied to a file.
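The record-at-a-time operations above can be sketched as a cursor over an in-memory "file". Method names mirror the operations; everything else (the dict records, the condition as a Python callable) is an illustrative assumption.

```python
# Sketch of a record-at-a-time cursor: Find sets the current record,
# FindNext continues from it, Read returns it.

class FileCursor:
    def __init__(self, records):
        self.records = records
        self.pos = -1                    # no current record yet

    def reset(self):
        self.pos = -1

    def find(self, cond):
        """Make the FIRST record satisfying cond the current record."""
        self.reset()
        return self.find_next(cond)

    def find_next(self, cond):
        """Make the NEXT record (after the current one) satisfying cond current."""
        for i in range(self.pos + 1, len(self.records)):
            if cond(self.records[i]):
                self.pos = i
                return True
        return False

    def read(self):
        """Return the current record, if any."""
        return self.records[self.pos] if self.pos >= 0 else None

cur = FileCursor([{"id": 1}, {"id": 2}, {"id": 3}])
```

A Scan operation would simply be `find` on a freshly opened cursor and `find_next` thereafter.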

DIFFERENCE BETWEEN THE TERMS FILE ORGANIZATION AND ACCESS METHOD
1) File organization
 o The organization of the data of a file into records, blocks, and access structures.
 o Includes the way records and blocks are placed on the storage medium and interlinked.
2) Access method
 o Provides a group of operations that can be applied to a file.
 o Several access methods can be applied to one file organization.

HEAP FILES (FILES OF UNORDERED RECORDS)
 Simplest and most basic type of organization.
 New records are inserted at the end of the file.
 Inserting: a new record is very efficient.
 Searching: uses linear search through the file, block by block; an expensive procedure.
 Deleting: the record is marked as deleted; space is reclaimed during periodic reorganization.

SORTED FILES (FILES OF ORDERED RECORDS)
 The records of a file can be physically ordered on disk by the values of one of their fields, called the ordering field.
 Ordering key: when the ordering field is also a key field of the file; a field guaranteed to have a unique value in each record.
 AKA sequential files.
 Insertion: expensive (records must be inserted in the correct order).
 Search: uses binary search.
 Reading the records in order of the ordering field is quite efficient.

Advantages
 Reading the records in order of the ordering key values is extremely efficient (no sorting is required).
 Finding the next record from the current one in order of the ordering key usually requires no additional block accesses (the next record is in the same block as the current one, unless the current record is the last one in the block).
 Using a search condition based on the value of an ordering key field results in faster access when the binary search technique is used, which is an improvement over linear search, although binary search is not often used for disk files.
 A binary search for disk files can be done on the blocks rather than on the records.
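Binary search on blocks, as described above, probes the middle block and compares the key against that block's first and last records. A minimal sketch, with an invented block layout:

```python
# Binary search over an ordered file done at block granularity: each probe
# costs one block access; the in-block search happens in memory.

blocks = [[2, 5, 7], [11, 13, 17], [19, 23, 29], [31, 37, 41]]  # ordered file

def block_binary_search(blocks, key):
    lo, hi, accesses = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        accesses += 1                 # one block access per probe
        blk = blocks[mid]
        if key < blk[0]:              # key before this block's anchor
            hi = mid - 1
        elif key > blk[-1]:           # key after this block's last record
            lo = mid + 1
        else:                         # key falls inside this block's range
            return key in blk, accesses
    return False, accesses
```

With b blocks, this needs about log2(b) block accesses instead of the b/2 a linear scan averages.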

DIRECT OR HASHED ACCESS

A portion of disk space is reserved, and a hashing algorithm computes the record address.

HASHED ACCESS CHARACTERISTICS

 o No indexes to search or maintain.
 o Very fast direct access.
 o Inefficient sequential access.
 o Use when direct access is needed but sequential access is not.
 o Data cannot be sorted easily.

HASHING TECHNIQUES

 AKA a hash file: provides very fast access to records on certain search conditions; an algorithm computes the record address.
 One of the file fields is designated to be the hash key of the file.
 Collisions occur when a new record hashes to a bucket that is already full; an overflow file is kept for storing such records.

COLLISION RESOLUTION

1. Open addressing:
 o Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
2. Chaining:
 o A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
3. Multiple hashing:
 o The program applies a second hash function if the first results in a collision.
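Two of these collision-resolution schemes can be shown on a tiny table. The table size M and the key-mod-M hash function are arbitrary choices for illustration.

```python
# Collision resolution sketch: open addressing (linear probing) vs. chaining.

M = 7
def h(key):
    return key % M            # simple hash function

# open addressing: probe subsequent positions until an empty slot is found
def insert_open(table, key):
    pos = h(key)
    while table[pos] is not None:
        pos = (pos + 1) % M   # check the next position, wrapping around
    table[pos] = key
    return pos

open_table = [None] * M
insert_open(open_table, 10)          # 10 % 7 = 3: goes to slot 3
spot = insert_open(open_table, 17)   # also hashes to 3: probes on to slot 4

# chaining: each slot heads a list of all records that hashed to it
chain_table = [[] for _ in range(M)]
for k in (10, 17):
    chain_table[h(k)].append(k)
```

Multiple hashing would replace the `(pos + 1) % M` probe with a second, different hash function applied to the colliding key.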

 There are two types of hashing: internal hashing and external hashing.
 Hash field: the search condition must be an equality condition on a single field.
 Hash key: when the hash field is also a key field of the file.
 Search is very efficient on the hash key.

DISADVANTAGES OF STATIC HASHING
A fixed number of buckets, M, is a problem if the number of records in the file grows or shrinks.

HASHED FILES LIMITATION
Inappropriate for some retrievals:
 o Those involving ranges of values.
 o Those based on a field other than the hash field.

LINEAR HASHING
 A dynamic hash table algorithm that allows the hash table to expand one slot at a time.
 Well suited for interactive applications.
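The one-bucket-at-a-time growth of linear hashing can be sketched as follows. The bucket capacity and the split-on-any-overflow trigger are simplifying assumptions; real implementations tune the trigger (e.g., on load factor) and use overflow chains.

```python
# Minimal linear-hashing sketch: the table grows by splitting the bucket at
# the split pointer, using a finer hash (key mod 2*current range) for
# buckets already split in the current round.

class LinearHash:
    def __init__(self, m=2, cap=2):
        self.m, self.cap = m, cap         # initial buckets, bucket capacity
        self.level, self.split = 0, 0     # round number and split pointer
        self.buckets = [[] for _ in range(m)]

    def _addr(self, key):
        a = key % (self.m * 2 ** self.level)
        if a < self.split:                # bucket already split this round:
            a = key % (self.m * 2 ** (self.level + 1))  # use the finer hash
        return a

    def insert(self, key):
        a = self._addr(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.cap:
            self._split()

    def _split(self):
        self.buckets.append([])           # expand by exactly one bucket
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.split += 1
        if self.split == self.m * 2 ** self.level:
            self.split, self.level = 0, self.level + 1  # round complete
        for k in old:                     # redistribute with the finer hash
            self.buckets[self._addr(k)].append(k)

lh = LinearHash(m=2, cap=2)
for k in (0, 1, 2, 4):                    # the insert of 4 overflows bucket 0
    lh.insert(k)
```

Note that the bucket that is split is always the one at the split pointer, not necessarily the one that overflowed; that is what keeps growth linear and predictable.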

PARALLELIZING DISK ACCESS USING RAID TECHNOLOGY
 A major advance in secondary storage technology is represented by the development of RAID, which originally stood for Redundant Arrays of Inexpensive Disks.
 Reliability is improved by mirroring (shadowing) and by error-correcting schemes such as Hamming codes.

INDEXING STRUCTURES FOR FILES
 Index: a data structure that allows particular records in a file to be located more quickly.
 Can be sparse or dense:
 o Sparse: an entry for only some of the search key values.
 o Dense: an entry for every search key value.
 An index is usually specified on one field of the file; it is also called an access path on the field.

TYPES OF SINGLE-LEVEL ORDERED INDEXES
There are several types of ordered indexes:
1) Primary index
 A non-dense (sparse) index: it includes an entry for each disk block of the data file, keyed on the block's anchor record.
 An ordered file whose records are fixed length with two fields:
 o The first field is of the same data type as the ordering key field, called the primary key, of the data file.
 o The second field is a pointer to a disk block (a block address).
 Needs fewer blocks than the data file does:
 o There are fewer index entries.
 o Each index entry is smaller in size.
 Notice that a file can have at most one physical ordering field, so it can have at most one primary index or one clustering index, but not both.
2) Clustering index
 Defined on an ordered data file; the index is an ordered file with two fields:
 o The first field is of the same type as the clustering field of the data file.
 o The second field is a block pointer.
 The data file is ordered on a non-key field that does not have a distinct value for each record.
 Includes one index entry for each distinct value of the field, so it is a non-dense index.
 A different type of index, the secondary index, can also be created.


3) Secondary index
 Provides a secondary means of accessing a file for which some primary access already exists.
 Can be defined on a field which is a candidate key and has a unique value in every record, or on a non-key field with duplicate values.
 The index is an ordered file with two fields:
 o 1st field: same data type as some non-ordering field of the data file.
 o 2nd field: either a block pointer or a record pointer.
 There can be many secondary indexes (and hence indexing fields) for the same file.
 Includes one entry for each record in the data file, so it is a dense index.
 Because a single-level index is an ordered file, we can create an index to the index:
 o The original index file is called the first-level index, and the index to the index is called the second-level index.
 o The process can be repeated, creating a third, fourth, ..., top level, until all entries of the top level fit in one disk block.
 A multilevel index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.
 Such a multilevel index is a form of search tree.
 Insertion and deletion of new index entries is a problem (every level of the index is an ordered file).
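The "repeat until the top level fits in one block" process above lends itself to a small worked calculation. The fan-out and first-level size below are assumed example numbers, not figures from the text.

```python
# Worked arithmetic for a multilevel index: with fan-out fo (index entries
# per block), each level needs roughly 1/fo as many blocks as the level
# below it; we keep adding levels until one block remains.

import math

fo = 68     # assumed index entries per block (fan-out)
b1 = 442    # assumed number of blocks in the first-level index

levels, blocks = 1, b1
while blocks > 1:
    blocks = math.ceil(blocks / fo)   # next level indexes the one below
    levels += 1

# a lookup costs one block access per index level, plus one for the data block
block_accesses = levels + 1
```

Here the second level needs ceil(442/68) = 7 blocks and the third level 1 block, so a lookup touches 3 index blocks plus 1 data block instead of binary-searching 442 blocks.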

MULTILEVEL INDEXES: DYNAMIC MULTILEVEL INDEXES USING B+-TREES
 Most multilevel indexes use the B+-tree data structure, because it handles the insertion and deletion problem:
 o Space is left in each tree node (disk block) to allow for new index entries.
 The data structure is a variation of search trees that allows efficient insertion and deletion of new search values.
 Each node corresponds to a disk block.
 Each node is kept between half full and completely full.
 An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes. Splitting may propagate to other tree levels.
 A deletion is quite efficient if a node does not become less than half full; if it does, it must be merged with neighboring nodes.

DATABASE TUNING
To tune or adjust all aspects of a database design for better performance.

DATABASE WORKLOADS

The starting point for physical design is arriving at an accurate description of the expected workload. A workload description includes:
 o A list of queries (with their frequency, as a ratio of all queries/updates).
 o A list of updates and their frequencies.
 o Performance goals for each type of query and update.

For each query in the workload, we identify:
 o Which relations are accessed.
 o Which attributes are retrieved (in the SELECT clause).

UNDERSTANDING THE WORKLOAD
For each query in the workload:
 o Which relations does it access?
 o Which attributes are retrieved?
 o Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
For each update in the workload:
 o Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
 o The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.

DECISIONS TO MAKE
What indexes should we create?
 o Which relations should have indexes?
 o What field(s) should be the search key?
 o Should we build several indexes?
For each index, what kind of index should it be?
 o Primary? Clustered?
 o Hash/tree? Dynamic/static? Dense/sparse?
Should we make changes to the conceptual schema?
 o Consider alternative normalized schemas (remember, there are many choices in decomposing into BCNF, etc.).
 o Should we undo some decomposition steps and settle for a lower normal form (denormalization)?
 o Horizontal partitioning, replication, views.
One approach: consider the most important queries. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
Before creating an index, we must also consider the impact on updates in the workload.
 o Trade-off: indexes can make queries go faster but updates slower. They require disk space, too.

ISSUES TO CONSIDER IN INDEX SELECTION
 Attributes mentioned in a WHERE clause are candidates for index search keys.
 An exact-match condition suggests a hash index.
 A range query suggests a tree index; clustering is useful for range queries.
 Try to choose indexes that benefit as many queries as possible.

 Only one index can be clustered per relation; choose it based on the important queries that would benefit the most from clustering.
 Multi-attribute search keys should be considered when a WHERE clause contains several conditions.
 o If range selections are involved, the order of attributes should be carefully chosen to match the range ordering.
 o Such indexes can sometimes enable index-only strategies for important queries.

INDEX-ONLY PLAN
 A query evaluation plan that requires access only to the indexes, not to the data records themselves.
 Faster than regular plans.
 If a query that accesses only one field is executed repeatedly, it is an advantage to create a search key on that field so an index-only plan can be used.
 A number of queries can be answered without retrieving any tuples from one or more of the relations involved, if a suitable index is available.
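The index-only idea can be demonstrated with a dense index that answers an aggregate query without touching any record. The data and field names are invented for illustration.

```python
# Sketch of an index-only plan: a dense index on one field holds enough
# information to answer queries over that field by itself.

records = [{"id": i, "salary": 1000 * (i % 3 + 1)} for i in range(6)]

# dense index on salary: every search-key value -> list of record ids
salary_index = {}
for rec in records:
    salary_index.setdefault(rec["salary"], []).append(rec["id"])

# "SELECT COUNT(*) ... WHERE salary = 2000" answered from the index alone:
# no record is read, only the index entry.
count_2000 = len(salary_index.get(2000, []))
```

A regular plan would scan or probe the data file for the matching records; the index-only plan skips that step entirely, which is where the speedup comes from.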

Database design consists of several tasks:
 o Requirements analysis.
 o Conceptual design.
 o Schema refinement.
 o Physical design and tuning.
Indexes must be chosen to speed up important queries:
 o Watch the index maintenance overhead on updates to key fields.
 o Choose indexes that can help many queries, if possible.
 o Build indexes to support index-only strategies.
 o Clustering is an important decision; only one index on a given relation can be clustered!
 o The order of fields in a composite index key can be important.
Static indexes may have to be periodically rebuilt.

DATABASE TUNING

The process of continually revising/adjusting the physical database design by monitoring resource utilization as well as internal DBMS processing, to reveal bottlenecks such as contention for the same data or devices.
Goals:
 o To make applications run faster.
 o To lower the response time of queries/transactions.
 o To improve the overall throughput of transactions.

TUNING INDEXES
Reasons to tune indexes:
 o Certain queries may take too long to run for lack of an index.
 o Certain indexes may not get utilized at all.
 o Certain indexes may cause excessive overhead because the index is on an attribute that undergoes frequent changes.
Options for tuning indexes:
 o Drop and/or build new indexes.
 o Change a non-clustered index to a clustered index (and vice versa).
 o Rebuild the index.
(2010, University of Colombo School of Computing)

TUNING QUERIES
 In some situations involving the use of correlated queries, temporaries are useful.
 The order of tables in the FROM clause may affect join processing.
 Some query optimizers perform worse on nested queries than on their equivalent un-nested counterparts.

 A query with multiple selection conditions connected by OR may not prompt the query optimizer to use any index. Such a query may be split up and expressed as a union of queries, each with a condition on an attribute that causes an index to be used.
 Apply the following transformations:
 o A NOT condition may be transformed into a positive expression.
 o Embedded SELECT blocks may be replaced by joins.
 o WHERE conditions may be rewritten to utilize indexes on multiple columns.

TUNING THE CONCEPTUAL SCHEMA
The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
 o We may settle for a 3NF schema rather than BCNF.
 o The workload may influence the choice we make in decomposing a relation into 3NF or BCNF.
 o We may further decompose a BCNF schema!
 o We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation.
 o We might consider horizontal decompositions.
If such changes are made after a database is in use, this is called schema evolution; we might want to mask some of these changes from applications by defining views.

DEFINITIONS
Disk controller
 o Embedded in the disk drive.
 o Controls the disk drive and interfaces it to the computer system.
 o A standard interface used today for disk drives on PCs and workstations is SCSI (Small Computer System Interface).
Disk blocks
 o Where records are stored.
Cluster
 o Several contiguous blocks; places records into groups.
 o Units of disk space as defined by a file system such as FAT32 or NTFS.
 o Records in a group are similar to each other and dissimilar to records in other groups; when accessed, they all go together.
Blocking
 o Storing a number of records in one block on the disk, OR
 o The average number of file records stored in a disk block.
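The blocking definition above has a standard worked calculation behind it: the blocking factor is how many whole records fit in one block, and the file then needs enough blocks to hold all records. The block and record sizes below are assumed example values.

```python
# Worked arithmetic for blocking (unspanned organization: records do not
# cross block boundaries).

import math

B = 512      # assumed block size in bytes
R = 100      # assumed fixed record size in bytes
r = 30000    # assumed number of records in the file

bfr = B // R                # blocking factor: whole records per block
b = math.ceil(r / bfr)      # blocks needed for the whole file
wasted = B - bfr * R        # unused bytes per block with unspanned records
```

With these numbers, 5 records fit per block, the file occupies 6000 blocks, and each block wastes 12 bytes; a spanned organization would recover that waste at the cost of records straddling blocks.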

File
 o A collection of data records with similar characteristics, OR
 o A sequence of records, where each record is a collection of data values (or data items).

File descriptor (File header)

Information that describes the file (field names and their data types, and the addresses of the file blocks on disk)

Physical design
 o Provides good performance: fast response time, minimum disk accesses.
Physical view
 o The DBMS must know the exact physical location and precise physical structure.
Cylinder
 o Set of corresponding tracks.

 o Example: 6 disks (platters) give 12 surfaces; the 2 outer surfaces are protected, leaving 10 inner data surfaces (coated with a magnetic substance to record data).
Data block
 o The smallest unit of data defined within the database; the block size may be defined by the DB_BLOCK_SIZE parameter.

RAID (Redundant Arrays of Inexpensive (or Independent) Disks)
 o A data storage system architecture.
 o Used commonly in large organizations for better reliability and performance.
Seek time
 o Average time to move the read-write head to the correct cylinder.
Rotational delay (latency)
 o Average time for the sector to move under the read-write head; depends on the rpm of the disk.
Transfer time
 o Time to read a sector and transfer the data to memory.
Logical record
 o The data about an entity (a row in a table).
Data file
 o A file containing the logical records.
Index file
 o A file containing the index records.
Indexing field
 o The field used to order the index records in the index file.
Physical record
 o A sector, page, or block on the storage medium.
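The three timing terms just defined combine into the average access time for one block: seek time + rotational delay + transfer time. The drive figures below are assumed, typical values, not numbers from the text.

```python
# Worked example of average disk access time for one sector.

rpm = 7200                                     # assumed disk rotation speed
seek_ms = 9.0                                  # assumed average seek time

rotation_ms = 60_000 / rpm                     # one full revolution in ms
latency_ms = rotation_ms / 2                   # average rotational delay:
                                               # half a revolution on average
sectors_per_track = 500                        # assumed geometry
transfer_ms = rotation_ms / sectors_per_track  # time for one sector to pass

access_ms = seek_ms + latency_ms + transfer_ms
```

The calculation makes the point behind blocking and clustering: seek plus latency dominates (about 13 ms here versus microseconds of transfer), so grouping related records into the same or adjacent blocks pays off.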
