Merging Files Using Data Structure
INTRODUCTION
Merging, in revision control, is a fundamental operation that reconciles multiple changes made to a revision-controlled collection of files.
Most often, it is necessary when a file is modified by two people on two different computers at the same time.
When two branches are merged, the result is a single collection of files that contains both sets of changes.
EXTERNAL SORTING
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data.
External sorting is required when the data being sorted does not fit into the main memory of a computing device and must instead reside in slower external memory.
External sorting typically uses a sort-merge strategy. In the sort phase, chunks of data small enough to fit in main memory are read, sorted, and written out to temporary subfiles. In the merge phase, the sorted subfiles are combined into a single larger file.
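The sort-merge strategy can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `external_sort` and the `max_lines` parameter are hypothetical, records are assumed to be text lines compared as strings, and `max_lines` stands in for "what fits in main memory".

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, max_lines=1000):
    """Sort a text file line by line, holding at most max_lines in memory."""
    run_paths = []

    # Sort phase: read a chunk, sort it in memory, write it out as a sorted run.
    with open(input_path) as src:
        while True:
            chunk = list(islice(src, max_lines))
            if not chunk:
                break
            # Make sure every record ends with a newline so runs stay line-aligned.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)

    # Merge phase: combine all sorted runs into a single larger file.
    runs = [open(p) for p in run_paths]
    try:
        with open(output_path, "w") as out:
            # heapq.merge reads the runs lazily, one line per run at a time.
            out.writelines(heapq.merge(*runs))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Calling `external_sort("big.txt", "sorted.txt", max_lines=100_000)` (file names are placeholders) would write the sorted result without ever loading the whole input at once.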
MERGE TECHNIQUES
A two-way merge performs an automated difference analysis between a file 'A' and a file 'B'.
This method considers only the differences between the two files to conduct the merge and makes a "best guess" to generate the result.
ALGORITHM
A two-way merge is usually the most error-prone type of merge. It requires user intervention to verify, and sometimes correct, the result.
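As a rough illustration of why a two-way merge has to guess, the sketch below compares two versions with Python's standard `difflib` module. The function name `two_way_merge` and the policy it applies (keep lines found in only one file, keep both sides where they disagree and flag the spot) are assumptions made for this example, not a description of any particular merge tool.

```python
import difflib


def two_way_merge(a_lines, b_lines):
    """Best-guess merge of two line lists with no common parent available.

    Returns (merged_lines, needs_review), where needs_review holds the
    positions in merged_lines at which the two files disagreed.
    """
    merged = []
    needs_review = []
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a_lines[i1:i2])
        elif tag == "delete":
            # Lines only in A: guess that A added them and keep them.
            merged.extend(a_lines[i1:i2])
        elif tag == "insert":
            # Lines only in B: guess that B added them and keep them.
            merged.extend(b_lines[j1:j2])
        else:
            # "replace": without a parent we cannot tell which side changed,
            # so keep both versions and flag the position for a human.
            needs_review.append(len(merged))
            merged.extend(a_lines[i1:i2])
            merged.extend(b_lines[j1:j2])
    return merged, needs_review
```

Note that a line present only in A might equally be a line that B deleted. The two files alone cannot settle that, which is where the user has to step in.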
A three-way merge is performed after an automated difference analysis between a file 'A' and a file 'B' while also considering the origin, or parent, of both files.
This type of merge is more likely to be usable in revision control systems, which can guarantee that such a parent exists and is known.
The merge tool examines the differences and patterns appearing in the changes between both files as well as the parent.
This merge is the most reliable and has performed well in practice.
It also requires the least user intervention. In many cases it requires none at all, which makes the process suitable for automation.
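The value of the parent can be shown with a deliberately simplified sketch. It assumes that all three versions have the same number of lines, so only in-place edits are handled; real merge tools also align inserted and deleted lines. The name `three_way_merge` is hypothetical.

```python
def three_way_merge(base, a, b):
    """Line-by-line three-way merge of equal-length line lists.

    base is the common parent; a and b are the two modified versions.
    Returns (merged_lines, conflicts), where conflicts lists the line
    indexes changed differently on both sides.
    """
    if not (len(base) == len(a) == len(b)):
        raise ValueError("this sketch only handles versions of equal length")

    merged = []
    conflicts = []
    for i, (parent, left, right) in enumerate(zip(base, a, b)):
        if left == right:
            merged.append(left)        # both sides agree
        elif left == parent:
            merged.append(right)       # only B changed this line
        elif right == parent:
            merged.append(left)        # only A changed this line
        else:
            conflicts.append(i)        # both changed it, differently
            merged.append(parent)      # keep the parent line as a placeholder
    return merged, conflicts
```

With `base = ["a", "b", "c"]`, `a = ["a", "B", "c"]`, and `b = ["a", "b", "C"]`, the changes do not overlap, so the merge yields `["a", "B", "C"]` with no conflicts and no user intervention.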
K-WAY MERGE
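A k-way merge combines k sorted inputs into one sorted output in a single pass. One common way to do this, sketched below, keeps the current smallest element of each input in a min-heap. The function name `k_way_merge` is hypothetical, and the inputs are assumed to be already sorted in ascending order.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted input


def k_way_merge(sorted_inputs):
    """Yield all items from k sorted iterables in ascending order."""
    iterators = [iter(seq) for seq in sorted_inputs]

    # Seed the heap with the first item of every non-empty input.
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx))
    heapq.heapify(heap)

    while heap:
        # The heap root is the smallest item not yet written out.
        value, idx = heapq.heappop(heap)
        yield value
        # Refill from the same input that the item came from.
        nxt = next(iterators[idx], _DONE)
        if nxt is not _DONE:
            heapq.heappush(heap, (nxt, idx))
```

The heap never holds more than k entries, so each output item costs O(log k) work, and only one item per input needs to be in memory at a time.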
PERFORMANCE FACTORS
The performance of an external sort depends on the number of records to be sorted, the size of the records, the number of storage devices used, and the distribution of those devices across the available I/O channels.
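To see how the first two factors interact, a back-of-the-envelope estimate can help. The sketch below is only an assumption-laden model: `memory_bytes` (memory available for sorting) and `fan_in` (how many runs are merged at once) are parameters introduced here for illustration, and device and I/O channel effects are not modelled at all.

```python
import math


def estimate_merge_passes(num_records, record_size, memory_bytes, fan_in):
    """Estimate the initial sorted runs and merge passes of an external sort."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    # Each initial run holds as many records as fit in memory.
    records_per_run = max(1, memory_bytes // record_size)
    runs = math.ceil(num_records / records_per_run)

    # Every merge pass reduces the number of runs by a factor of fan_in.
    passes = 0
    remaining = runs
    while remaining > 1:
        remaining = math.ceil(remaining / fan_in)
        passes += 1
    return runs, passes
```

For example, 1,000,000 records of 100 bytes with 1,000,000 bytes of memory give 100 initial runs. Merging 10 runs at a time then needs 2 passes over the data.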