Merging Files Using Data Structure

This document discusses different techniques for merging files in revision control systems. A two-way merge only considers differences between two files, while a three-way merge also considers their common parent file, making it more accurate. External sorting is used when data does not fit in memory, using a sort-merge strategy to combine sorted subfiles. K-way merging generalizes this to merging multiple files by repeatedly selecting the next smallest element from any list.

Uploaded by

Saddam Hussain
Copyright
© Attribution Non-Commercial (BY-NC)

MERGING FILES

INTRODUCTION

Merging, in revision control, is a fundamental operation that reconciles multiple changes made to a revision-controlled collection of files.

Most often, it is necessary when a file is modified by two people on two different computers at the same time.

When two branches are merged, the result is a single collection of files that contains both sets of changes.

Merging also arises in external sorting, where sorted sub-files are combined into a single file.

EXTERNAL SORTING

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data.

External sorting is required when the data being sorted do not fit into the main memory of a computing device and must instead reside in slower external memory.

External sorting typically uses a sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to temporary sub-files. In the merge phase, the sorted sub-files are combined into a single larger file.
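The sort-merge strategy can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name `external_sort`, the parameter `max_lines_in_memory`, and the input format (a text file with one integer per line) are all assumptions made for the example.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(input_path, output_path, max_lines_in_memory=1000):
    """Sort a text file of integers (one per line) with a sort-merge strategy.

    Hypothetical helper: the name, parameters, and file format are
    assumptions made for this illustration.
    """
    run_paths = []

    # Sort phase: read a chunk that fits in memory, sort it, and write it
    # out as a sorted sub-file (a "run").
    with open(input_path) as src:
        while True:
            lines = itertools.islice(src, max_lines_in_memory)
            chunk = [int(line) for line in lines if line.strip()]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)

    # Merge phase: combine the sorted sub-files into one larger file.
    # heapq.merge reads its inputs lazily, so only one item per run is
    # held in memory at a time.
    runs = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in run) for run in runs]
        with open(output_path, "w") as out:
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)
```

A small `max_lines_in_memory` forces many runs, which makes the merge phase easy to observe on a small test file.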

MERGE TECHNIQUES

Two Way Merge
Three Way Merge
K-Way Merge

TWO WAY MERGE


A two-way merge performs an automated difference analysis between a file 'A' and a file 'B'.

This method considers only the differences between the two files to conduct the merge and makes a "best-guess" analysis to generate the resulting merge.

TWO WAY MERGE ALGORITHM
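The algorithm shown on the original slide is not recoverable from the text, so the following is only an illustrative sketch of a line-based two-way merge. It assumes Python's standard `difflib.SequenceMatcher` for the difference analysis; the function name `two_way_merge` and the conflict-marker format are hypothetical.

```python
import difflib


def two_way_merge(a_lines, b_lines):
    """Best-guess merge of two versions with no knowledge of a common parent.

    Hypothetical helper for illustration. Returns (merged_lines, needs_review).
    """
    merged = []
    needs_review = False
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            # Both files agree on these lines.
            merged.extend(a_lines[a_start:a_end])
        elif tag == "insert":
            # Lines present only in B. Guess: they were added in B.
            # (They could equally have been deleted in A.)
            merged.extend(b_lines[b_start:b_end])
        elif tag == "delete":
            # Lines present only in A. Guess: they were added in A.
            merged.extend(a_lines[a_start:a_end])
        else:
            # tag == "replace": the files disagree and, without a parent,
            # there is no basis for choosing a side.
            needs_review = True
            merged.append("<<<<<<< A")
            merged.extend(a_lines[a_start:a_end])
            merged.append("=======")
            merged.extend(b_lines[b_start:b_end])
            merged.append(">>>>>>> B")
    return merged, needs_review


merged, needs_review = two_way_merge(["x", "y", "z"], ["x", "y", "z", "w"])
print(merged, needs_review)
```

Even the branches that do not set `needs_review` rest on a guess, which is why the result of a two-way merge generally has to be checked by a person.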

DISADVANTAGE OF TWO WAY MERGE


This type of merge is usually the most error-prone. It requires user intervention to verify, and sometimes correct, the result of the merge.

THREE WAY MERGE


A three-way merge is performed after an automated difference analysis between a file 'A' and a file 'B' while also considering the origin, or parent, of both files.

This type of merge is more likely to be usable in revision control systems, which can guarantee that such a parent exists and is known.

The merge tool examines the differences and patterns appearing in the changes between both files as well as the parent.
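The way the parent resolves disagreements can be sketched with a deliberately simplified example. It assumes the three versions have the same number of lines, so that lines can be compared position by position; real merge tools first align the files with a difference analysis. The function name `three_way_merge` and the conflict-marker format are hypothetical.

```python
def three_way_merge(parent, a, b):
    """Line-by-line three-way merge.

    Simplifying assumption for illustration: all three versions have the
    same number of lines (no insertions or deletions).
    Returns (merged_lines, conflict_line_indexes).
    """
    if not (len(parent) == len(a) == len(b)):
        raise ValueError("this sketch only handles versions of equal length")

    merged = []
    conflicts = []
    for index, (p_line, a_line, b_line) in enumerate(zip(parent, a, b)):
        if a_line == b_line:
            # Both sides agree (unchanged, or changed identically).
            merged.append(a_line)
        elif a_line == p_line:
            # Only B changed this line, so take B's version.
            merged.append(b_line)
        elif b_line == p_line:
            # Only A changed this line, so take A's version.
            merged.append(a_line)
        else:
            # Both sides changed the line differently: a true conflict.
            conflicts.append(index)
            merged.extend(["<<<<<<< A", a_line, "=======", b_line, ">>>>>>> B"])
    return merged, conflicts


parent = ["red", "green", "blue"]
version_a = ["RED", "green", "blue"]
version_b = ["red", "green", "BLUE"]
print(three_way_merge(parent, version_a, version_b))
```

In this example each side changed a different line, so the parent lets both changes be combined with no conflict. A two-way comparison of `version_a` and `version_b` alone could not tell which side of each difference was the edit.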

ADVANTAGES OF THREE WAY MERGE


This merge is the most reliable and has performed well in practice.

It has also required the least amount of user intervention. In many cases it requires no intervention at all, which makes the process suitable for task automation.

K-WAY MERGE ALGORITHM


Let there be two arrays: an array of k lists, and an array of k index values corresponding to the current element in each of the k lists, respectively.

Main loop of the K-Way Merge algorithm:

1. Find the index of the minimum current item, minItem.
2. Process minItem (output it to the output list).
3. For i = 0 until i = k-1 (in increments of 1): if the current item of list i is equal to minItem, then advance list i.
4. Go back to the first step.

The loop ends when every list has been exhausted.
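The main loop described above can be sketched directly in Python. The function name `k_way_merge` and the variable names are assumptions made for the example; the inputs are assumed to be lists that are already sorted in ascending order.

```python
def k_way_merge(lists):
    """Merge k sorted lists by repeatedly selecting the minimum current item.

    Hypothetical helper for illustration; follows the steps as described,
    so an item that is current in several lists at once is output only once.
    """
    k = len(lists)
    positions = [0] * k  # index of the current element in each list
    output = []

    while True:
        # Step 1: find the list whose current item is the smallest.
        min_index = None
        for i in range(k):
            if positions[i] >= len(lists[i]):
                continue  # list i is exhausted
            if (min_index is None
                    or lists[i][positions[i]] < lists[min_index][positions[min_index]]):
                min_index = i
        if min_index is None:
            break  # every list is exhausted

        # Step 2: process minItem by appending it to the output list.
        min_item = lists[min_index][positions[min_index]]
        output.append(min_item)

        # Step 3: advance every list whose current item equals minItem.
        for i in range(k):
            if positions[i] < len(lists[i]) and lists[i][positions[i]] == min_item:
                positions[i] += 1
        # Step 4: go back to the first step (next loop iteration).

    return output


print(k_way_merge([[1, 3, 5], [1, 2, 5], [2, 4]]))  # prints [1, 2, 3, 4, 5]
```

Finding the minimum by scanning all k lists costs O(k) comparisons per output item, which is acceptable for small k.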

K-WAY MERGE

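A tournament tree, as used in tournament sort, can be used to select the minimum current item among the k lists efficiently.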
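The benefit of a tree-based selection can be illustrated with a short sketch. Note the substitution: instead of a tournament tree, the code below uses the binary heap from Python's standard `heapq` module, which serves the same purpose here (finding the next smallest item in O(log k) rather than O(k)). The function name `heap_k_way_merge` is hypothetical.

```python
import heapq


def heap_k_way_merge(lists):
    """Merge k sorted lists using a binary heap to select the minimum.

    Stand-in for a tournament tree (an assumption of this sketch).
    Unlike a merge that skips equal items, this version keeps duplicates.
    """
    # Each heap entry is (current item, list index, position in that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    output = []
    while heap:
        item, list_index, position = heapq.heappop(heap)
        output.append(item)
        next_position = position + 1
        if next_position < len(lists[list_index]):
            # Replace the removed item with the next one from the same list.
            next_item = lists[list_index][next_position]
            heapq.heappush(heap, (next_item, list_index, next_position))
    return output


print(heap_k_way_merge([[1, 3, 5], [1, 2, 5], [2, 4]]))  # prints [1, 1, 2, 2, 3, 4, 5, 5]
```

The heap never holds more than k entries, so memory use depends on the number of lists rather than on their total length.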

PERFORMANCE FACTORS

The number of records to be sorted.
The size of the records.
The number of storage devices used.
The distribution of those devices on the available I/O channels.
The distribution of key values in the input files.

THANK YOU
