The University of Western Australia School of Computer Science & Software Engineering
In this lab, you will write a program that creates an index for a text document. The index will list
every word that occurs in the document, together with the number of every line on which the word
appears. The words in the index will be sorted alphabetically, the line numbers listed for each
word will be in ascending order, and there will be no duplicated entries.
This lab uses some features that we have not seen in lectures yet – polymorphism and type
abbreviations. The code that uses them is already in the starting point, however, so you do not
need to know how to write such code; you only need to make use of the provided definitions, which
should be easy.
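As a rough illustration of the two features, the sketch below declares a couple of type abbreviations and one polymorphic function. The abbreviations here are assumptions made for this example only (the real ones are in Lab2index.fs and may differ), and swap is a hypothetical function that is not part of the lab.

```fsharp
// Assumed abbreviations, for illustration only; use the ones in Lab2index.fs.
type lineNumber = int
type line = string

// An abbreviation is interchangeable with the type it names:
// this value is just a (string * int) pair.
let firstLine : line * lineNumber = ("hello world", 1)

// A polymorphic function: 'a and 'b stand for any two types.
let swap (x : 'a, y : 'b) : 'b * 'a = (y, x)

let swapped = swap firstLine         // (1, "hello world")
let swappedAgain = swap (true, 'c')  // ('c', true)
```

The same swap works on both pairs because nothing in its body depends on what 'a and 'b actually are.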
Make sure that you have the following files, which are available via the unit web page:
Lab2index.fs contains type declarations and some trivial functions
inp contains an example text file
inp.index contains the index produced for inp
Compare inp and inp.index to make sure that you understand the problem specification – put inp in
an appropriate folder somewhere to use for testing once your program is complete.
Create a new project, add Lab2index.fs to it, then add the functions below. The types for
each function include type abbreviations – these are in the starting point, and you should use these
types to guide you in writing your program.
1. Define a function
numberLines : fileContents -> (line * lineNumber) list
that takes the contents of a file and returns a list containing the lines from the file, each paired
with its line number. E.g.,
numberLines "a zz\r\nb yy\r\ncx d\r\n\r\nd zz d"
// yields: [ ("a zz",1); ("b yy",2); ("cx d",3); ("",4); ("d zz d",5) ]
(Hint: use lines and zip.)
[Note: “\r\n” is an end of line under Windows. However, beyond test cases like this it is
generally better to use Environment.NewLine, so that code will work with other operating
systems.]
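The hint can be pictured with the sketch below, which splits a string on a separator and zips the pieces with the numbers 1, 2, 3, and so on. It is one possible shape rather than the expected answer: the type abbreviations are assumed, and splitLines is a hypothetical stand-in for the provided lines helper, whose actual definition may differ.

```fsharp
open System

// Assumed abbreviations; the provided file may define them differently.
type fileContents = string
type line = string
type lineNumber = int

// Hypothetical stand-in for the provided `lines` helper:
// split the text on the given separator, keeping empty lines.
let splitLines (sep : string) (text : fileContents) : line list =
    text.Split([| sep |], StringSplitOptions.None) |> List.ofArray

// Pair each line with its 1-based line number using List.zip.
let pairWithNumbers (ls : line list) : (line * lineNumber) list =
    List.zip ls [1 .. List.length ls]

let example =
    pairWithNumbers (splitLines "\r\n" "a zz\r\nb yy\r\ncx d\r\n\r\nd zz d")
// [("a zz", 1); ("b yy", 2); ("cx d", 3); ("", 4); ("d zz d", 5)]
```

Passing Environment.NewLine instead of "\r\n" as the separator gives the portable behaviour described in the note above.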
2. Define a function numberWords
that takes the contents of a file and returns a list containing the words of the file, each paired
with its line number. E.g.,
numberWords "a zz\r\nb yy\r\ncx d\r\n\r\nd zz d\r\n"
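One way to think about this step is to turn each numbered line into a list of numbered words and then concatenate the results. The sketch below assumes that words are separated by spaces only, which may not match the treatment of punctuation or case in inp.index; wordsOfLine is a hypothetical helper name.

```fsharp
open System

// Assumption: words are separated by spaces; the lab may define words differently.
let wordsOfLine (text : string, n : int) : (string * int) list =
    text.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> List.ofArray
    |> List.map (fun w -> (w, n))

let numbered = [ ("a zz", 1); ("", 2); ("d zz d", 3) ]

// List.collect applies wordsOfLine to every pair and joins the resulting lists.
let pairs = List.collect wordsOfLine numbered
// [("a", 1); ("zz", 1); ("d", 3); ("zz", 3); ("d", 3)]
```

Note that an empty line contributes no words, so a trailing end of line in the input adds nothing to the result.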
3. Define a function
that takes a list of pairs xys and returns the list of distinct first fields in xys. xys is assumed
to be sorted. E.g.,
Here the 'a is a type variable, and means that the function can be used for every type obtained
by replacing 'a by some type. This function will remove duplicates from a list, as long as the
duplicates are grouped together – which they will be if the list is sorted.
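The remark about grouped duplicates can be seen in the small sketch below. dropAdjacentRepeats is a hypothetical helper that works on plain lists rather than lists of pairs, so it illustrates the idea without being the function asked for here.

```fsharp
// Hypothetical helper: drop adjacent repeats from a list of any type
// whose values can be compared for equality.
let rec dropAdjacentRepeats (xs : 'a list) : 'a list =
    match xs with
    | x :: (y :: _ as rest) when x = y -> dropAdjacentRepeats rest
    | x :: rest -> x :: dropAdjacentRepeats rest
    | [] -> []

let fromSorted = dropAdjacentRepeats ["a"; "b"; "d"; "d"; "d"; "zz"; "zz"]
// ["a"; "b"; "d"; "zz"]

let fromUnsorted = dropAdjacentRepeats [1; 2; 1; 1]
// [1; 2; 1]  (the 1s that are not adjacent both survive)
```

The same definition is used at type string list and at type int list, which is what the type variable 'a allows.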
4. Define a function snds
that takes a value x and a list of pairs xys and returns the list of distinct second fields
associated with x in xys. x is assumed to occur in xys. E.g.,
snds "d" [("a",1); ("b",2); ("cx",3); ("d",3); ("d",5);
("d",5); ("yy",2); ("zz",1); ("zz",5)]
// yields: [3; 5]
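For comparison, the same result can be obtained from library functions alone. This is only a sketch for checking your answer, not necessarily the approach intended for the lab; the names pairs and linesForD are made up for the example.

```fsharp
let pairs =
    [ ("a", 1); ("b", 2); ("cx", 3); ("d", 3); ("d", 5)
      ("d", 5); ("yy", 2); ("zz", 1); ("zz", 5) ]

// Keep the pairs whose first field is "d", take their second fields,
// then drop repeated line numbers.
let linesForD =
    pairs
    |> List.filter (fun (w, _) -> w = "d")
    |> List.map snd
    |> List.distinct
// [3; 5]
```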
5. Define a function combineWords
that takes a list of words paired with individual line numbers wis and returns an index, i.e. a
list of words each paired with a list of line numbers. wis is assumed to be sorted. E.g.,
combineWords [("a",1); ("b",2); ("cx",3); ("d",3); ("d",5); ("yy",2); ("zz",1); ("zz",5)]
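To make the shape of an index concrete, the sketch below builds one for the list above using List.groupBy. It is an alternative formulation based on library functions, shown only so that you can see what the result should look like. The lab presumably expects you to build on the functions from the earlier steps instead.

```fsharp
let wis =
    [ ("a", 1); ("b", 2); ("cx", 3); ("d", 3); ("d", 5)
      ("yy", 2); ("zz", 1); ("zz", 5) ]

// Group the pairs by word (groups keep the order in which words first appear),
// then keep only the distinct line numbers within each group.
let index =
    wis
    |> List.groupBy fst
    |> List.map (fun (w, group) -> (w, group |> List.map snd |> List.distinct))
// [("a", [1]); ("b", [2]); ("cx", [3]); ("d", [3; 5]); ("yy", [2]); ("zz", [1; 5])]
```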
6. Define a function
that takes the contents of a file and returns an index for the file. See inp and inp.index for
examples.
7. Test main by giving it the full path to a copy of inp, and check the index that it writes to
inp.index.
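If you want to see how an index could be written to disk, the sketch below uses System.IO.File. The output layout (a word followed by its line numbers, separated by spaces) and the names formatEntry and writeIndex are assumptions made for this example. The provided main and the real inp.index may format things differently.

```fsharp
open System.IO

// Assumed layout: the word, then its line numbers, separated by spaces.
let formatEntry (w : string, ns : int list) : string =
    w + " " + String.concat " " (List.map string ns)

// Write one formatted entry per line to "<path>.index".
let writeIndex (path : string) (index : (string * int list) list) : unit =
    File.WriteAllLines(path + ".index", index |> List.map formatEntry |> List.toArray)

// The file contents themselves can be read with File.ReadAllText:
let readContents (path : string) : string = File.ReadAllText path
```

For instance, writeIndex "inp" [("a", [1]); ("d", [3; 5])] would create a file inp.index in the current directory containing the two lines "a 1" and "d 3 5".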