Parallel Databases

Syllabus : Architectures for parallel databases, Parallel query evaluation, Parallelizing individual operations, Parallel query optimization.

3.1 Parallel Databases System

1. Introduction
a. A parallel database improves data processing speed by using multiple resources (CPUs and disks) in parallel.
b. Parallel operations are becoming increasingly common as a way to improve the speed of operations, and therefore the study of parallel databases is becoming more important.
c. Operations such as data loading and query evaluation can be performed in parallel.
d. We can use thousands of small processors for making a parallel machine.

2. Goals of parallel databases
a. Improved performance.
b. Increased availability : if a site containing a relation (a table in the database) is not available, the relation continues to be available from another site which has a copy of that data.
c. Distributed access to data : an organization may need to access data which belongs to different sites, for example when a company has multiple branches.

Fig. 3.1.1 : Parallel database system (multiple CPUs and disks connected by an inter-communication network)

3.2 System Parameters / Measures of Performance

Q. Explain various system parameters of parallel databases.
Q. What is the importance of the term 'Speed up' while applying parallelism in case of a parallel database? What are the factors that diminish both 'Speed up' and 'Scale up'?

1. Throughput (output efficiency)
a. Throughput is the number of tasks that can be completed in a given time interval.

2. Response time
a. The amount of time taken to complete a single task from the time it is submitted is called the response time.
b. We can improve response time by processing a large transaction as a number of small transactions and by parallelizing their execution, which also improves throughput.

3. Speed up
a. Speed up is defined as running a task in less time by increasing the degree of parallelism.
b. The time required for processing a task is inversely proportional to the number of resources (processors and disks) available; with a larger number of resources, the time required for execution of the task will be less.
c. Formula :
   Speed up = Ts / Tl
   where Ts = time required to execute a task on the smaller system, and Tl = time required to execute the same task on the larger system that has N times more resources than the smaller system.
d. Special cases :
   Linear speed up (ideal) : speed up = N.
   Sub-linear speed up : speed up < N.

Fig. 3.2.1 : Speed up (number of transactions per second vs. number of CPUs, showing linear (ideal) and sublinear speed-up)

4. Scale up
a. Scale up is defined as handling a larger task in the same amount of time by increasing the degree of parallelism.
b. Scale up relates to the ability to process a larger task in the same amount of time by providing more resources.
c. Formula : Let Q be a task and Qn be a task N times bigger than Q.
   Ts : execution time of task Q on the smaller machine Ms.
   Tl : execution time of task Qn on the larger machine Ml (which has N times more resources).
   Scale up = Ts / Tl
d. Special cases :
   Linear scale up : Ts = Tl (scale up = 1).
   Sub-linear scale up : Ts < Tl (scale up < 1).

Fig. 3.2.2 : Scale up (number of transactions per second vs. number of CPUs or database size, showing linear and sublinear scale-up)

Note : The scale up of a task depends on the database size and the rate of submission of transactions for processing.
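The speed-up and scale-up ratios defined above can be computed directly from measured execution times. The short Python sketch below is illustrative only; the timing numbers are invented for the example and are not part of the original text.

```python
def speed_up(t_small: float, t_large: float) -> float:
    """Speed up = time on the smaller system / time on the larger system, for the same task."""
    return t_small / t_large

def scale_up(t_small_task_q: float, t_large_task_nq: float) -> float:
    """Scale up = time for task Q on machine Ms / time for task N*Q on machine Ml."""
    return t_small_task_q / t_large_task_nq

# Example: the same query takes 100 s on 1 CPU and 13 s on 8 CPUs.
s = speed_up(100.0, 13.0)      # about 7.7, sub-linear because it is below 8
print(f"speed up = {s:.1f} (linear would be 8)")

# Example: task Q takes 100 s on Ms; task 8*Q takes 110 s on Ml with 8x the resources.
c = scale_up(100.0, 110.0)     # about 0.91, sub-linear because it is below 1
print(f"scale up = {c:.2f} (linear would be 1)")
```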
3.3 Architecture of Parallel Databases

Q. What are the main architectures used for building parallel databases? Give advantages and disadvantages.

3.3.1 Introduction
Parallelism in databases represents one of the most successful instances of parallel computing systems.

3.3.2 Types
1. Shared Memory System
2. Shared Disk System
3. Shared Nothing Disk System

3.3.2(A) Shared Memory System
a. Architecture details
• Multiple CPUs are attached to a common global shared memory via an interconnection network or communication bus.
• Shared memory architectures usually have large memory caches at each processor, so that referencing of the shared memory is avoided whenever possible.
• Moreover, the caches need to be coherent : if a processor performs a write to a memory location, the data in that memory location in the other caches should be either updated or removed.
Fig. 3.3.1 : Shared memory system architecture
b. Advantages
• Efficient communication between processors.
• Data can be accessed by any processor without being moved from one place to another.
• A processor can send messages to other processors much faster using memory writes.
c. Disadvantages
• Bandwidth problem.
• Not scalable beyond 32 or 64 processors, since the bus or interconnection network turns into a bottleneck.
• Adding more processors only increases the waiting time of the processors.

3.3.2(B) Shared Disk System
a. Architecture details
• Multiple processors can access all disks directly via an inter-communication network, but every processor has its own local memory.
• Shared disk has two advantages over shared memory : each processor has its own memory, so the memory bus is not a bottleneck; and the system offers a simple way to provide a degree of fault tolerance.
• The systems built around this architecture are called clusters.
b. Advantages
• Each CPU or processor has its own local memory, so the memory bus does not become a bottleneck.
• A high degree of fault tolerance is achieved.
• Fault tolerance : if a processor (or its memory) fails, the other processors can take over its tasks, since the database is present on disks that are accessible from all processors.
c. Disadvantages
• Some memory load is added to each processor.
• Limited scalability : the system is not scalable beyond a certain point. The shared-disk architecture faces this problem because large amounts of data are shipped through the interconnection network, so the interconnection to the disk subsystem becomes a bottleneck.
• The basic problem with the shared-memory and shared-disk architectures is interference : as more CPUs are added, the existing CPUs are slowed down because of the increased contention for memory accesses and network bandwidth.
d. Applications
• Digital Equipment Corporation (DEC) : DEC clusters running relational databases were among the early commercial users of the shared disk database architecture. This product line is now owned by Oracle.
Note : This observation has motivated the development of the shared nothing architecture, which is widely considered to be the best architecture for large parallel database systems.

3.3.2(C) Shared Nothing Disk System
a. Architecture details
• Each processor has its own local memory and local disk.
• A processor at one node may communicate with another processor using a high speed communication network.
• Any terminal can act as a node, which functions as a server for the data that is stored on its local disk.
• Moreover, the interconnection networks for shared nothing systems are usually designed to be scalable, so that transmission capacity can be increased as more nodes are added to the network.
Fig. 3.3.3 : Shared nothing architecture (each node has its own CPU, local memory and local disk, connected by an interconnection network)
b. Advantages
• This architecture overcomes the disadvantage of requiring all I/O to go through a single interconnection network; only queries which access non-local disks pass through the network.
• A high degree of parallelism is achieved, i.e., as many CPUs and disks as desired can be connected.
• Shared nothing systems are more scalable and can easily support a large number of processors.
c. Disadvantages
• The cost of communication and of non-local disk access is higher than in the other two architectures, since sending data involves software interaction at both ends.
• Requires rigid data partitioning.
d. Applications
• The Teradata database machine uses the shared nothing database architecture.
• The Grace and Gamma research prototypes also use this architecture.

3.3.2(D) Hierarchical System
a. Architecture details
• The hierarchical architecture combines characteristics of the shared memory, shared disk and shared nothing architectures.
• At the top level, the system consists of nodes connected by an interconnection network; the nodes do not share disks or memory with one another.
• This architecture attempts to reduce the complexity of programming such systems. It yields distributed virtual-memory architectures, where logically there is a single shared memory : the memory mapping hardware, coupled with system software, allows each processor to view the disjoint memories as a single virtual memory.
• The hierarchical architecture is also referred to as Non-Uniform Memory Architecture (NUMA).

3.4 Parallel Query Evaluation

Q. Describe the query evaluation process in parallel databases.

3.4.1 Introduction
• In case of parallel databases, a query can be evaluated in the following two ways.

3.4.2 Types
1. Inter Query Parallelism
2. Intra Query Parallelism

3.4.2(A) Inter Query Parallelism
a. In this case multiple queries run simultaneously on multiple processors to reduce the time taken for query evaluation.
b. If the output of one query is consumed by a second query, the system is called pipelined parallelism.
c. If there are 10 queries in which each query takes 2 seconds, it would require 20 seconds to execute them one after another. With inter query parallelism these queries take only 2 seconds to execute, thereby saving time.
d. We can improve throughput for a large number of transactions by inter query parallelism.
e. It is difficult to achieve inter query parallelism because it is difficult to identify in advance which queries should run concurrently.
Fig. 3.4.1 : Inter query parallelism (e.g., "Select * From Employee" on processor 1, ..., "Select * From Sales" on processor n)

3.4.2(B) Intra Query Parallelism
a. In this case a query is divided into sub-queries which run simultaneously on multiple processors so as to reduce the time taken for query execution.
b. This approach is also called partitioned parallel evaluation.
c. If a query contains 10 transactions in which each transaction takes 2 seconds, it would take 20 seconds to execute all transactions sequentially. With intra query parallelism the query takes only 2 seconds to execute in parallel, thereby saving time.
d. We can improve response time for large transactions by intra query parallelism.
Fig. 3.4.2 : Intra query parallelism
e. When to use intra query parallelism :
(i) For executing different operations present in a query evaluation plan in parallel.
(ii) For executing each individual operation with the help of parallel processing.
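To make the distinction between the two forms concrete, here is a minimal Python sketch using a process pool. The functions run_query and scan_partition are hypothetical stand-ins for real DBMS calls and are not from the original text; only the shape of the two approaches is the point.

```python
from concurrent.futures import ProcessPoolExecutor

def run_query(sql: str) -> list:
    """Hypothetical: execute one complete query and return its rows."""
    return [f"result of: {sql}"]

def scan_partition(partition_id: int, predicate: str) -> list:
    """Hypothetical: scan one horizontal partition and apply the predicate."""
    return [f"rows from partition {partition_id} where {predicate}"]

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Inter query parallelism: independent queries run side by side on different workers.
        queries = ["SELECT * FROM Employee", "SELECT * FROM Sales", "SELECT * FROM Dept"]
        inter_results = list(pool.map(run_query, queries))

        # Intra query parallelism: one query is split into per-partition sub-queries,
        # evaluated in parallel, and the partial results are merged.
        partials = pool.map(scan_partition, range(4), ["salary > 50000"] * 4)
        intra_result = [row for part in partials for row in part]

    print(inter_results)
    print(intra_result)
```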
3. Implementation
a. A shared nothing system can be used successfully to implement a parallel query system.
b. The main goal is to minimize data shipping by partitioning the data.

3.5 Parallel Query Optimization
1) Parallel query optimization is essential for many DBMSs : the best query execution plan, out of multiple alternative query execution plans, has to be identified so that a query can be solved efficiently in a parallel database environment.
2) Query optimization is nothing but selecting the most efficient query execution plan among the available query plans for solving a given query.
3) The query plan is constructed based on multiple cost factors.
4) Parallel query optimization tries to select a query plan with a lower processing cost (but not always the lowest-cost plan, as other factors are also included in query evaluation).
5) Query optimization comes into the picture when we want to develop a good system that minimizes the cost of query evaluation.
6) There is a trade-off between the amount of time spent finding the best plan and the amount of time required for running the plan. Different DBMSs balance these two factors in different ways.

3.5.1 Goals of Query Optimization
a) Eliminate all unwanted data : query optimization tries to eliminate unwanted tuples, or rows, from the result.
b) Speed up queries : query optimization tries to find the plan which gives the result very fast.
c) Increase query performance : break up a single complex query into several simple queries, which improves the performance of query execution.
d) Select the best query plan out of the alternative query plans : for a single query we will have multiple plans; selecting the best plan out of all of them is the main goal of query optimization.

3.5.2 Approaches of Query Optimization
• Use an index
  o Using an index is a strategy that can be used to speed up query evaluation.
  o This strategy is very important for query optimization.
• Aggregate tables
  o Aggregate tables store only higher level (summarized) data, so a smaller amount of data needs to be parsed.
  o So, query evaluation becomes faster.
• Vertical partitioning
  o Slicing the table vertically by columns.
  o Vertical partitioning decreases the amount of data required for query processing.
• Horizontal partitioning
  o Partitioning the table by data value, i.e., row wise.
  o This method decreases the amount of data a query needs to process.
• De-normalization
  o The process of combining multiple tables into a single table is called de-normalization. This speeds up query performance because fewer joins are required.
• Server tuning
  o Each server has its own parameters; tuning the server parameters so that the server can fully take advantage of the hardware resources can significantly speed up query performance.
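Before looking at how traditional optimizers are organized, here is a toy sketch of the core selection step described in Section 3.5 : pick the plan with the least estimated cost among the alternatives. The plan names and cost numbers below are invented for illustration; a real optimizer derives such estimates from the system catalog, as described in the next section.

```python
# Each alternative plan for the same query carries an estimated cost
# (e.g., derived from I/O and CPU estimates in the system catalog).
candidate_plans = {
    "full table scan + filter": 1200.0,
    "index scan on salary":      150.0,
    "hash join then filter":     480.0,
}

# The optimizer picks the plan with the least estimated cost.
best_plan = min(candidate_plans, key=candidate_plans.get)
print(f"chosen plan: {best_plan} (estimated cost {candidate_plans[best_plan]})")
```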
3.5.3 Traditional Query Optimizers
Fig. 3.5.1 : Query parsing, optimization and execution (query, query optimizer, evaluation plan, query plan evaluator)
1) Query optimization is one of the most important tasks of a relational DBMS.
2) Given a parsed query, the optimizer generates alternative plans and chooses the plan with the least estimated cost.
3) The optimizer is responsible for identifying the best execution plan for evaluating the query.
4) The optimizer generates multiple query plans and selects the plan with the least estimated cost.
5) To estimate the cost of every plan, the optimizer uses the system catalog.
6) The system catalog contains the information needed by the optimizer to choose between alternative plans for a given query.
7) The query evaluation plan can have a remarkable impact on query execution time.

3.5.4 Parallel Query Optimizers
a. Two phase parallel query optimizer
• This is a simplified optimization technique whose architecture is well suited for the shared memory architecture.
• It works in two phases : join ordering, followed by parallel resource allocation (allocation of the processors and disks).
Step 1 : Join ordering
• This stage is a uni-processor optimizer stage; it optimizes the join ordering. A query tree is formulated which gives the technique used to solve the query.
Step 2 : Parallel resource allocation
• The query tree is split into tasks which can be executed with the help of parallel resources, so that the tasks are shared by the multiple available resources.
• Processors and disks should operate as close to full utilization as possible.
• Dynamic parallel allocation algorithms are used for distributing new tasks.

3.6 Virtualization in Multicore Processors
1. Introduction
• The processing power of a computer can be improved by adding more CPUs; if we increase the number of CPUs in a host machine which is running multiple parallel (virtual) machines, its performance improves.
• Multicore processors solve many complicated processing problems involved in handling multiple virtual machines.
2. Working
• Today, multi-core processors and hyper-threaded architectures are being manufactured.
• One processor should always be kept free by configuring the CPU allocation settings properly.
• These processors are designed for heavy load processing.

3.7 Data Partitioning - Parallel Query Optimization

3.7.1 Introduction
• Partitioning a large dataset horizontally across several disks enables us to increase the bandwidth by reading and writing on the disks in parallel.
• The common partitioning techniques are :
  a) Round robin partitioning
  b) Hash partitioning
  c) Range partitioning

3.7.2(A) Round Robin Partitioning
a) Working
• If there are n disks, then the i-th tuple is assigned to disk D, where
  D = i mod n
  D = disk number (0 to n-1), i = record number.
b) Example
• Number of disks = 5, number of tuples in the relation = 15 (records 0 to 14).
  D = 0 mod 5 = 0 (record 0 goes to disk 0)
  D = 1 mod 5 = 1 (record 1 goes to disk 1)
  ...
  D = 13 mod 5 = 3 (record 13 goes to disk 3)
  D = 14 mod 5 = 4 (record 14 goes to disk 4)
Fig. 3.7.1 : Round robin partitioning (records 0 to 14 spread evenly over disks 0 to 4)
c) Efficiency level
• This partitioning is suitable for evaluating queries that access the entire relation sequentially.
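The round-robin assignment is just the modulo rule above. The minimal Python sketch below reproduces the 15-record, 5-disk example; the record labels are placeholders.

```python
def round_robin_disk(record_number: int, number_of_disks: int) -> int:
    """Round robin rule: D = i mod n."""
    return record_number % number_of_disks

n_disks = 5
disks = [[] for _ in range(n_disks)]
for i in range(15):                      # records 0..14
    disks[round_robin_disk(i, n_disks)].append(f"Record-{i}")

for d, contents in enumerate(disks):
    print(f"Disk {d}: {contents}")       # each disk receives 3 of the 15 records
```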
3.7.2(B) Hash Partitioning
a) In case of hash partitioning, a hash function is applied to selected fields of a tuple to determine the disk number on which it will be placed.
b) If the hash function returns i, then the tuple is placed on disk Di.
c) This partitioning is suitable for partitioning a relation on selected fields based on some hash function.
d) Example
• Hash function for partition 1 : values of the form 2N + 1 (odd key values).
• Hash function for partition 2 : values of the form 2N (even key values), where N = 0, 1, 2, 3, ...
Fig. 3.7.2 : Hash partitioning (partition 1 on disk 0, partition 2 on disk 1)

3.7.2(C) Range Partitioning
a) Tuples are stored logically as per sort key values, so that each range contains roughly an equal number of values.
b) Tuples in range i are assigned to processor (disk) i.
c) Example : with disks numbered 0, 1, 2, 3 and 4, employee tuples (EId, EName, Salary) may be assigned by salary range, with the lower salary ranges on disks 0 to 2, salaries between 40 and 60 on disk 3, and salaries above 60 on disk 4.
Fig. 3.7.3 : Range partitioning

3.7.3 Comparison
a) Hash partitioning and range partitioning are better than round-robin partitioning because they enable us to access only those disks that contain matching tuples.
b) If range selections such as 18 < age < 65 are specified, range partitioning is superior to hash partitioning because the qualifying tuples are likely to be clustered together on a few processors.
c) Sometimes range partitioning leads to data skew, when there are partitions with widely varying numbers of tuples across partitions or disks.
d) Data skew causes the processors that must deal with the large partitions to become performance bottlenecks.
e) Hash partitioning has the additional property that it keeps data evenly distributed even if the data grows or shrinks over time.
f) To reduce skew in range partitioning, an effective approach is to take samples from each processor, collect and sort the samples, and divide the sorted set of samples into equally sized subsets.

3.8 Parallelizing Individual Operations
• A shared nothing architecture gives us complete parallelization of individual operations. We assume that each relation is horizontally partitioned across multiple disks, although this partitioning may or may not be appropriate for a given operation.

3.8.1 Bulk Scanning
1) Scanning is nothing but the reading process, in which the records in a relation (rows in a table) can be read in parallel, each processor reading the partition on its local disk.
2) Tuples which meet the selection criteria (the WHERE condition) are merged together to make a single result relation (table).
3) The same idea can be applied when retrieving all tuples.
4) If hashing or range partitioning is used, selection queries can be answered by going to just those processors that contain the relevant tuples.
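A parallel selection scan of the kind sketched in 3.8.1 can be mimicked with a worker pool : every partition is filtered locally and the qualifying tuples are merged. The partition data and the predicate below are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal partitions of an Employee relation (one per disk).
partitions = [
    [{"eid": 1, "salary": 30000}, {"eid": 2, "salary": 75000}],
    [{"eid": 3, "salary": 52000}, {"eid": 4, "salary": 18000}],
    [{"eid": 5, "salary": 91000}],
]

def scan(partition):
    # Local scan: keep only tuples that satisfy the WHERE condition (salary > 50000).
    return [t for t in partition if t["salary"] > 50000]

with ThreadPoolExecutor() as pool:
    # Each partition is scanned in parallel; the partial results are merged into one result relation.
    result = [t for part in pool.map(scan, partitions) for t in part]

print(result)   # tuples 2, 3 and 5 qualify
```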
3.8.2 Bulk Loading
1) Bulk loading operates in a similar way to bulk scanning.
2) Bulk loading comes into the picture when a user wants to insert (load) a bulk of data into the database in one go.
3) The ORACLE SQL*Loader is a bulk loader facility that allows you to populate database tables from files.
4) If a relation has associated indexes, the data entries required for building the indexes on the newly loaded data must also be created while loading.
5) Bulk loading can also be done in parallel on multiple relations.
Fig. 3.8.2 : Bulk loading

3.8.3 Sorting
1) Sorting can be done by allowing all computers to sort the part of the table that is on their local disk and to then merge these sorted sets of tuples.
2) The degree of parallelism is likely to be limited up to a certain point in the merging phase.
3) For better results, use range partitioning to redistribute all tuples in the relation; each processor then sorts the tuples assigned to it using some sorting algorithm.
4) Example : if we want to sort a collection of employee tuples by salary, where salaries range from 1000 to 21000, and we have 20 processors, the first processor sorts the salary values in the range 1000 to 2000, the next processor the range 2000 to 3000, and so on.
5) Example : a processor can collect tuples until its memory is full, sort these tuples and write them out, continuing until all incoming tuples have been sorted on the local disk.
6) Approach :
  o We can obtain the range partitioning from a sample of the entire relation, by taking samples at each processor that initially contains part of the relation.
  o The (relatively small) sample is sorted and used to determine the range values. This set of range values is also called the splitting vector; it is then used to partition the relation.
7) To increase the speed of the process, the data entries must be sorted.

3.8.4 Joins
1) The main aim behind discussing the parallelization of joins is to illustrate the use of the merge and split operators.
2) Parallel hash join is the most widely used type of parallel join; it also gives an idea of how sort-merge join can be parallelized.
3) Other join algorithms can be parallelized as well, although not as effectively as these two algorithms.
4) Example
  o Suppose that we want to join two relations, say R1 and R2, on the salary attribute.
  o We assume that they are initially distributed across several disks in some way that is not useful for the join operation, that is, the initial partitioning is not based on the join attribute.
  o The basic idea for joining R1 and R2 in parallel is to decompose the join into a collection of k smaller joins.
  o We can decompose the join by partitioning both R1 and R2 into a collection of k logical partitions. By using the same partitioning function for both R1 and R2, we ensure that the union of the k smaller joins computes the join of R1 and R2.
5) If range partitioning is used, the algorithm leads to a parallel version of sort-merge join, with the advantage that the output is available in sorted order.
6) If hash partitioning is used, we obtain a parallel version of hash join.
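The decomposition idea in 3.8.4 can be sketched in a few lines : both relations are split with the same partitioning function on the join attribute, each pair of matching partitions is joined independently, and the union of the partial joins is the full join. The relations below are invented toy data, and the per-partition join is a simple nested loop used only for illustration.

```python
K = 3  # number of logical partitions (ideally one per processor)

def partition(relation, key, k=K):
    """Split a relation into k partitions using the same hash function on the join attribute."""
    parts = [[] for _ in range(k)]
    for t in relation:
        parts[hash(t[key]) % k].append(t)
    return parts

def local_join(part1, part2, key):
    """Join one pair of matching partitions (nested-loop join for illustration)."""
    return [{**a, **b} for a in part1 for b in part2 if a[key] == b[key]]

r1 = [{"eid": 1, "salary": 30000}, {"eid": 2, "salary": 52000}]
r2 = [{"salary": 52000, "grade": "B"}, {"salary": 30000, "grade": "C"}]

p1, p2 = partition(r1, "salary"), partition(r2, "salary")
# Each of the k smaller joins could run on a different processor; their union is the full join.
result = [row for i in range(K) for row in local_join(p1[i], p2[i], "salary")]
print(result)
```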
Review Questions
Q.1 What are the main architectures used for building parallel databases? Give their advantages and disadvantages.
Q.2 Explain various system parameters of parallel databases.
Q.3 What is the importance of the term 'Speed up' while applying parallelism in case of a parallel database?
Q.4 Describe the query evaluation process in parallel databases.
Q.5 Explain the following operations with examples : i) Bulk loading ii) Bulk scanning iii) Sorting.
Q.6 Describe the different architectures for a parallel database.
Q.7 Describe the steps used to perform joins in a parallel database.
Q.8 Explain inter query and intra query parallelism in parallel databases.

Distributed Databases

Syllabus : Distributed DBMS architectures, Storing data in a distributed DBMS, Distributed catalog management, Distributed query processing, Updating distributed data, Distributed transactions, Distributed concurrency control and recovery.

4.1 Distributed Databases System Concepts
1. Data in a distributed database system is stored across various sites, and every site is managed by a DBMS that can run independently at that site.
2. A distributed database is a collection of many logically related databases distributed over a computer network.
3. A distributed database management system (DDBMS) manages a distributed database.
4. It supports organizational decentralization, which is required when an organization has multiple branches, and it also offers economical processing at greater speed.
5. The main aim of a distributed database system is to bring all the advantages of distributed computing to the DBMS.
6. In a shared nothing architecture, a database server operates at each single site.
Fig. 4.1.1 : Shared nothing architecture of a distributed database
7. In a centralized architecture, a local network file system (LNFS) at each local site (e.g., LNFS1 at Mumbai, LNFS2 at Pune, and so on) stores all data of the local network, and a centralized network file system (CNFS) connects these local sites.
Fig. 4.1.2 : Centralized database architecture of a distributed database
8. A true distributed system has its own database at each site, and the sites communicate with each other through a network.
Fig. 4.1.3 : Distributed database system

4.1.1 Features of Distributed Computing System
1. Complex problems can be solved efficiently by partitioning them into smaller, simpler fragments and then solving these independently at different sites.
2. As multiple computers are used to solve a complex problem, more computing power is obtained at comparatively lower cost.
3. Individual processing elements are autonomous and can be managed independently.
4. As a result of distributed technology, the scalability of the system is increased.
5. The user of such a system gets the feeling that he is working on a single centralized database system.
The main goal of distributed databases is to bring all the advantages of distributed computing into the database system.

4.1.2 Advantages of Distributed Database System
Distributed database management has been introduced for various reasons, ranging from organizational decentralization to data processing at lower cost.
1. Management of distributed data with different levels of transparency
• A DBMS should hide the details of where each data item (such as a table or relation) is physically stored within the system.
a) Distribution / network transparency
  o All internal network operations are hidden from the user. Network transparency may be divided into location transparency and naming transparency.
  o Location transparency means that a task performed by the user is independent of the location of the data and the location of the system.
  o Naming transparency means that once a name is specified, the named objects can be accessed unambiguously.
  Fig. 4.1.4 : Network transparency (the user is unaware of which server, e.g., server 3 at Mahim, the data comes from)
b) Fragmentation transparency
c) Replication transparency
2. Increased reliability
• Reliability is defined as the probability that a system is running (working) at a certain point in time.
• In case of distributed database systems, if one system fails or is down for some time, another system can take over all its functions, so the overall system is not affected.
Fig. 4.1.7 : Reliability and availability in a distributed database
3. Increased availability
• Availability is defined as the probability that the system is continuously available (accessible) during a certain time interval for any database operation.
• In case of distributed database systems, if one system is not available for some time, another system can take over all its functions, so the system as a whole remains available.
• A system which keeps on performing operations even in case of a server failure is called reliable, while a system which is accessible to all its clients on the network is called available.
4. Improved performance
• A distributed DBMS fragments the database in order to keep data closer to the site where it is needed most; this reduces the time required for operations.
• Data localization reduces the contention for CPU and I/O services and simultaneously reduces the access delays involved in wide area networks.
• As a large database is distributed over multiple sites, smaller databases exist at each site, which are simpler to handle and maintain.
• Therefore, local queries and transactions accessing data at a single site have better performance because of the smaller size of the local databases.
• Inter-query and intra-query parallelism can be achieved efficiently by executing multiple queries at different sites, or by breaking up a query into a number of small sub-queries that execute in parallel, giving faster results.
Fig. 4.1.8 : Data localization (increases the performance of the system)
5. Ease of expansion
• Expansion of the system, in terms of adding more data, increasing database sizes, or adding more processors, is easy.

4.1.3 Parallel v/s Distributed System
1. Introduction
• Tightly coupled architecture : multiple processors share secondary (disk) storage and also share primary memory.
• Loosely coupled architecture : many processors share secondary (disk) storage, but each has its own individual primary memory.
2. The shared nothing architecture described above resembles a distributed database computing environment, but the main differences lie in the operations performed :
• In shared nothing multiprocessor systems (parallel databases), homogeneity exists across the various nodes.
• In case of the distributed database environment, heterogeneity of hardware and operating systems may be present at each node.

4.2 Types of Distributed Databases
1. Homogeneous distributed database system
• If all the data servers to which data is distributed run the same DBMS software, the system is called a homogeneous distributed database system.
• These systems are easy to handle and give good performance and good data access speed.
Fig. 4.2.1 : Homogeneous distributed database
2. Heterogeneous distributed database system
• If different servers are running under the control of different types of DBMS systems and are connected so as to enable access to data from different sites, the system is called a heterogeneous distributed database system, also referred to as a multidatabase system.
• For constructing heterogeneous systems we have to have well accepted standards for gateway protocols.
• A gateway protocol is an API that is used to expose DBMS functionality to external applications; for example, connections using ODBC and JDBC access database servers through gateway protocols.
• Such a system comes at an economic cost in terms of performance, software complexity, and administration difficulty.
Fig. 4.2.2 : Heterogeneous distributed database

4.3 Distributed DBMS Architectures
• There are three alternative approaches to separating functionality across different DBMS-related processes :
1. Client-Server Systems
2. Collaborating Server Systems
3. Middleware Systems

1. Client-Server Systems
a) A client-server system has a number of clients and some servers; a client process can send a query to any one server process, and that server manages to solve the query and replies with the result.
b) Client's responsibility : user interface issues.
c) Server's responsibility : servers manage data and execute transactions.
d) Advantages
• It is relatively simple to implement due to the centralized server system.
• Expensive server machines are utilized effectively because dull user interactions are off-loaded to inexpensive client machines.
Fig. 4.3.1 : Typical client-server system (clients send requests to the server, which provides data management)
e) While writing client-server applications it is important to remember the boundary between the client and the server and to keep the communication between them as simple as possible.

2. Collaborating Server Systems
a) A collaborating server system has a collection of database servers, each capable of running transactions against local data, which cooperatively execute transactions that span multiple servers.
b) When a server receives a query that requires access to data at other servers, it generates appropriate sub-queries to be executed by the other servers, puts the results together to compute the answer to the original query, and returns the result to the client.
c) Queries are decomposed into sub-queries by taking communication costs as well as local processing costs into account.
Fig. 4.3.2 : Collaborating server systems

3. Middleware Systems
a) The middleware system is designed to allow a single query to execute on multiple servers without every database server having to support such multi-site execution; it is especially useful for integrating legacy systems, whose basic capabilities cannot be extended.
b) It gives a simple way to integrate several servers.
Fig. 4.3.3 : Middleware system (one database server coordinates multiple servers, server 1 ... server n)
c) We need just one database server that is capable of managing queries and transactions spanning multiple servers; all the other servers only need to handle local queries and transactions.
d) The software which helps the execution of queries and transactions across one or more independent database servers is often called middleware.
e) The middleware layer is capable of executing joins and other relational operations on data obtained from the other servers, but it generally does not maintain any data of its own.

4.4 Data Fragmentation - Data Storage in Distributed Databases

4.4.1 Introduction
a) The process of decomposing the database into smaller multiple units, called fragments, which may be stored at various sites, is called data fragmentation.
b) Completeness constraint : the most important condition of the data fragmentation process is that it must be complete, i.e., once a database is fragmented, it must always be possible to reconstruct the original database from the fragments.
Fig. 4.4.1 : Horizontal and vertical fragmentation (an Employee table with Employee_Id, Employee_salary and Department_id split row-wise into horizontal fragments and column-wise into employee-detail and department-detail vertical fragments)

4.4.2 Fragmentation Schema
• A fragmentation schema is a definition of a set of fragments that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of database operations.

4.4.3 Types of Data Fragmentation
a) Horizontal data fragmentation
b) Vertical data fragmentation
c) Mixed data fragmentation

4.4.3(A) Horizontal Fragmentation
1) Introduction
• Horizontal fragmentation divides a relation horizontally into groups of rows (tuples), creating subsets of tuples, where each subset is specified by a condition on one or more attributes of the relation.
2) Overview
a) A horizontal fragment is a group of rows of a relation.
b) Horizontal fragments are specified by the SELECT operation of the relational algebra on single or multiple attributes.
c) Example : select all students of the computer branch :
   σ branch = "comp" (Students)
3) Types
a) Primary horizontal fragmentation
• Primary horizontal fragmentation is the fragmentation of a primary relation.
• A relation on which other relations depend through a foreign key is called a primary relation.
• Example (a small sketch of these two partitions appears at the end of this section) :
   Partition 1 : all employees belonging to department number 10 : R1 ← σ Dept = 10 (EMP)
   Partition 2 : all employees belonging to department number 20 : R2 ← σ Dept = 20 (EMP)
b) Derived horizontal fragmentation
• Horizontal fragmentation of a primary relation induces the fragmentation of other, secondary relations that are dependent on the primary relation.
• The fragmentation of a relation that is based on the fragmentation of some other relation is called derived horizontal fragmentation.
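The primary horizontal fragmentation example above, together with the completeness constraint from 4.4.1, can be sketched as follows. The EMP tuples are invented, and the selection conditions mirror the σ Dept = 10 and σ Dept = 20 partitions; a real DDBMS would place the fragments at different sites rather than in in-memory lists.

```python
# Toy EMP relation.
EMP = [
    {"eid": "A001", "salary": 40000, "dept": 10},
    {"eid": "A002", "salary": 55000, "dept": 20},
    {"eid": "AB321", "salary": 62000, "dept": 10},
]

# Primary horizontal fragmentation: each fragment is a selection on the Dept attribute.
R1 = [t for t in EMP if t["dept"] == 10]   # stored, say, at site 1
R2 = [t for t in EMP if t["dept"] == 20]   # stored, say, at site 2

# Completeness constraint: the original relation can be reconstructed from the fragments
# (here by a simple union of the horizontal fragments).
reconstructed = R1 + R2
assert sorted(t["eid"] for t in reconstructed) == sorted(t["eid"] for t in EMP)
```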
