CIT 383 / Lab 4: Description
Description
For the program at the end, make sure you name it lab4_lastname.rb and turn it in on Blackboard. We will try something different this time: no log file is required, just the Ruby file. A few files like cat.rb will be created along the way; you do not need to submit these files.
If we run the program without redirecting standard input, it will read from the keyboard and print whatever we enter until we hit control-d, the UNIX end of file character.

    $ ./cat.rb
    hello
    hello     <-- echoed by your program
    world
    world     <-- echoed
    type ctrl-D now

The filter can perform any type of processing desired within the loop. In the example below, the program searches for a fixed string in a simpler but similar manner to grep by using the index method of the String class, which returns the position of the match or nil if there is none.

    searchfor = '.gif'
    while line = gets
      puts line if line.index(searchfor)
    end

The next program works the same as the version that assumes that we're using the standard input and output objects. There is no change in behavior if we explicitly specify these objects, as long as no file names are given on the command line: plain gets reads from files named on the command line if there are any and from standard input otherwise, while $stdin.gets always reads standard input.

    while line = $stdin.gets
      $stdout.puts line
    end
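As a rough sketch of the same filter idea, the loop can be wrapped in a method that takes the input and output objects as arguments, which makes it easy to try out without a keyboard. The method name grep_filter and the sample file names are made up for this illustration, and include? is used here as an alternative to index.

```ruby
require 'stringio'

# Copy every line containing `pattern` from `input` to `output`.
# grep_filter is a hypothetical helper name, not part of the lab.
def grep_filter(pattern, input = $stdin, output = $stdout)
  while line = input.gets
    output.puts line if line.include?(pattern)
  end
end

# Demo with in-memory streams standing in for the keyboard and screen.
sample = StringIO.new("logo.gif\nindex.html\nphoto.gif\n")
result = StringIO.new
grep_filter('.gif', sample, result)
print result.string
```

Calling grep_filter('.gif') with no other arguments would fall back to $stdin and $stdout, so the same method also works as an ordinary command-line filter.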
Files
Files are accessed using objects of the File class. They can be created using the class constructor, File.new, in the same way that String and Array objects can. This method accepts a pathname and a mode argument. A mode of "r" indicates that you want to open the file for reading, while "w" indicates that you want to write to the file. You can also specify "a" if you want to append to the file instead of overwriting it, and supply + after any of these to add the other mode to the default (e.g. r+ adds writing capability to r).

Once the File object has been created and opened in a readable mode, you can use gets to read the file a line at a time or getc to read it a character (string of size one) at a time. The read method will read the whole file (or the remaining part if you've already read part of it).

    irb(main):001:0> pwfile = File.new('/etc/passwd', 'r')
    => #<File:/etc/passwd>
    irb(main):002:0> pwfile.gets
    => "root:x:0:0:root:/root:/bin/bash\n"
    irb(main):003:0> pwfile.gets
    => "daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n"    <-- your output will probably differ on a lot of commands going forward in this section
    irb(main):004:0> pwfile.methods.sort
    => [snip. See anything familiar? Useful?]
    irb(main):005:0> pwfile.getc
    => "b"
    irb(main):006:0> pwfile.getc
    => "i"
    irb(main):007:0> pwfile.getc
    => "n"
    irb(main):008:0> pwfile.getc
    => ":"
    irb(main):009:0> pwfile.getc
    => "x"
    irb(main):010:0> pwfile.read
    => ":2:2:bin:/bin:/bin/sh\n..."

The seek method will change the position you're reading from in the file. seek's argument is the position in bytes. The tell method will identify the program's current position in the file.

    irb(main):001:0> pwfile.close
    => nil
    irb(main):001:0> pwfile = File.new('/etc/passwd', 'r')
    => #<File:/etc/passwd>
    irb(main):013:0> pwfile.tell
    => 0
    irb(main):014:0> pwfile.seek(20)
    => 0
    irb(main):015:0> pwfile.tell
    => 20
    irb(main):016:0> pwfile.getc
    => "t"
    irb(main):017:0> pwfile.gets
    => ":/bin/bash\n"
    irb(main):018:0> pwfile.gets
    => "daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n"
    irb(main):019:0> pwfile.tell
    => 70

The seek and tell methods are most useful in files that are organized in fixed size records, as opposed to line-oriented files like /etc/passwd, where each line has a different number of bytes.

Let's write a simple program to read and print the contents of /etc/passwd. We use the each_line iterator to iterate through the file line by line.

    file = File.new("/etc/passwd", "r")
    file.each_line do |line|
      puts line
    end
    file.close

The easiest way to process a file is to create a File object using the open method rather than new. open takes a block as an argument in addition to the pathname and file mode. The block argument (the variable named between the vertical bars) is a File object. open will handle closing the file when the block exits. It will even close the file if an error happens during the block, so we don't have to write any error handling (yay us). Let's write the same program as above using the open method.

    File.open("/etc/passwd", "r") do |file|
      file.each_line do |line|
        puts line
      end
    end

While we typically want to process a file a line at a time, Ruby provides methods that allow us to read the entire file into a string or into an array of strings. The main disadvantage of this technique is that it requires enough memory to contain the entire file, which makes it unsuitable for very large files.

The IO.read call reads the entire file into a single string. We have to split the string on whitespace to extract individual account records, and then split each record on colons to extract its fields. Note that split with no arguments splits on any whitespace, so a record whose comment field contains a space would be broken apart; pw.split("\n") avoids this.

    irb(main):001:0> pw = IO.read("/etc/passwd")
    => "root:x:0:0:root:/root:/bin/bash\n ... "
    irb(main):002:0> pw.class
    => String
    irb(main):002:0> pw.length
    => 1351
    irb(main):003:0> pw[0,4]
    => "root"
    irb(main):006:0> pw.split[0]
    => "root:x:0:0:root:/root:/bin/bash"
    irb(main):007:0> pw.split[0].split(/:/)
    => ["root", "x", "0", "0", "root", "/root", "/bin/bash"]
    irb(main):008:0> pw.split[0].split(/:/)[0]
    => "root"

The IO.readlines method reads the whole file at once as an array of strings, one per line. This saves us from having to split on whitespace to extract the individual accounts.

    irb(main):009:0> pwlines = IO.readlines("/etc/passwd")
    => ["root:x:0:0:root:/root:/bin/bash\n", ... ]
    irb(main):010:0> pwlines[0]
    => "root:x:0:0:root:/root:/bin/bash\n"
    irb(main):011:0> pwlines[0].split(/:/)[0]
    => "root"

We can put the password data into a hash with named fields so that we don't have to remember which field number has which data in it. (Does this look familiar?)

    irb(main):081:0> pwlines[0].strip!
    => "root:x:0:0:root:/root:/bin/bash"
    irb(main):082:0> pwlines[0].split(':')
    => ["root", "x", "0", "0", "root", "/root", "/bin/bash"]
    irb(main):071:0> pwitems = pwlines[0].split(':')
    irb(main):071:0> pwhash=Hash.new
    => {}
    irb(main):072:0> pwitems
    => ["root", "x", "0", "0", "root", "/root", "/bin/bash"]
    irb(main):072:0> pwfields = [:username,:password,:uid,:gid,:gcos,:homedir,:shell]
    => [:username, :password, :uid, :gid, :gcos, :homedir, :shell]
    irb(main):073:0> i=0
    => 0
    irb(main):074:0> while i<pwitems.size
    irb(main):075:1> pwhash[pwfields[i]] = pwitems[i]
    irb(main):076:1> i = i + 1
    irb(main):077:1> end
    => nil
    irb(main):078:0> pwhash
    => {:username=>"root", :password=>"x", :gcos=>"root", :uid=>"0", :homedir=>"/root", :gid=>"0", :shell=>"/bin/bash"}
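The index-counting loop used to fill the hash can also be written more compactly. The sketch below is one possible alternative, not the method the lab requires: it pairs the field names with the values using Array#zip and converts the pairs with to_h. The names PW_FIELDS and parse_passwd_line are invented for this example, and the input is a hard-coded sample line rather than the real /etc/passwd.

```ruby
# Field names for one /etc/passwd record, in file order.
PW_FIELDS = [:username, :password, :uid, :gid, :gcos, :homedir, :shell]

# Turn one passwd-style line into a hash keyed by field name.
# parse_passwd_line is a hypothetical helper name.
def parse_passwd_line(line)
  # The -1 limit keeps trailing empty fields instead of dropping them.
  values = line.chomp.split(':', -1)
  PW_FIELDS.zip(values).to_h
end

entry = parse_passwd_line("root:x:0:0:root:/root:/bin/bash\n")
puts entry[:username]   # prints root
puts entry[:shell]      # prints /bin/bash
```

zip pairs each field name with the value at the same position, so no counter variable is needed.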
Writing Files
All of our previous examples have shown reading from files. We can also write to files, as in the following example which reads /etc/passwd and creates an output file, userlist, containing just the usernames. Tasks like this are very common in system administration. While you could do this manually if you only had a small number of users, once you have a few hundred users, automation becomes a necessity.

    out = File.new("userlist", "w")
    File.open("/etc/passwd", "r") do |file|
      file.each_line do |line|
        out.puts line.split(/:/)[0]
      end
    end
    out.close
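One possible variation on the userlist program, sketched here under a few assumptions, opens the output file with the block form of File.open as well, so both files are closed automatically even if an error occurs. The method name write_userlist and the sample data are made up for this illustration, and the demo works in a temporary directory instead of touching /etc/passwd.

```ruby
require 'tmpdir'

# Write the first colon-separated field of each line in `source` to `dest`.
# write_userlist is a hypothetical helper name.
def write_userlist(source, dest)
  File.open(dest, 'w') do |out|
    File.open(source, 'r') do |file|
      file.each_line do |line|
        out.puts line.split(':')[0]
      end
    end
  end
end

# Demo on sample data in a temporary directory that is removed afterwards.
Dir.mktmpdir do |dir|
  source = File.join(dir, 'passwd.sample')
  dest = File.join(dir, 'userlist')
  File.write(source, "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n")
  write_userlist(source, dest)
  print File.read(dest)
end
```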
Parsing Files
You often want to discard certain lines from a file as you read it. Lines to discard include blank lines, comment lines, or any lines that don't match a pattern for which you're searching. Let's look at the /etc/adduser.conf file, which sets options for the adduser program that creates new users on the system. This file uses # at the beginning of the line to indicate comments, as many configuration files do (and Ruby, for that matter). We can skip the comment lines using the next statement, which skips ahead to the next iteration without executing any further statements in the block. This program prints out every non-comment line of the configuration file.

    File.open("/etc/adduser.conf", "r") do |file|
      file.each_line do |line|
        next if line[0] == "#"
        puts line
      end
    end

Next let's remove blank lines as well as comments from our output.

    File.open("/etc/adduser.conf", "r") do |file|
      file.each_line do |line|
        next if line[0] == "#" or line.strip.empty?
        puts line
      end
    end

The output of the block above is a list of uppercase variable names, each followed by an equal sign and a value, which may or may not be in double quotes. There is no whitespace. It should look something like this:

    DSHELL=/bin/bash
    DHOME=/home
    GROUPHOMES=no
    LETTERHOMES=no
    SKEL=/etc/skel
    FIRST_SYSTEM_UID=100
    LAST_SYSTEM_UID=999
    FIRST_SYSTEM_GID=100
    LAST_SYSTEM_GID=999
    FIRST_UID=1000
    LAST_UID=29999
    FIRST_GID=1000
    LAST_GID=29999
    USERGROUPS=yes
    USERS_GID=100
    DIR_MODE=0755
    SETGID_HOME=no
    QUOTAUSER=""
    SKEL_IGNORE_REGEX="dpkg-(old|new|dist)"

We know a data structure that contains a collection of names with associated values: a hash! Let's create one to store the data from this configuration file. We'll use strip to eliminate extra whitespace and split on the equal sign to break apart the name and value. Note that our script from earlier has changed a little, since we need to do more than just output a line.

    irb(main):027:0> adduser = Hash.new
    => {}
    irb(main):028:0> File.open("/etc/adduser.conf", "r") do |file|
    irb(main):029:1* file.each_line do |line|
    irb(main):030:2* line.strip!
    irb(main):031:2> next if line[0] == "#" or line.empty?
    irb(main):032:2> key, value = line.split('=')
    irb(main):033:2> adduser[key] = value
    irb(main):034:2> end
    irb(main):035:1> end
    => #<File:/etc/adduser.conf (closed)>
    irb(main):036:0> adduser
    => {"USERS_GID"=>"100", "LAST_UID"=>"29999", "SETGID_HOME"=>"no", "FIRST_GID"=>"1000", "DSHELL"=>"/bin/bash", "LAST_SYSTEM_UID"=>"999", "QUOTAUSER"=>"\"\"", "DIR_MODE"=>"0755", "LETTERHOMES"=>"no", "GROUPHOMES"=>"no", "DHOME"=>"/home", "FIRST_UID"=>"1000", "FIRST_SYSTEM_GID"=>"100", "LAST_GID"=>"29999", "SKEL_IGNORE_REGEX"=>"\"dpkg-(old|new|dist)\"", "FIRST_SYSTEM_UID"=>"100", "USERGROUPS"=>"yes", "LAST_SYSTEM_GID"=>"999", "SKEL"=>"/etc/skel"}
    irb(main):037:0> adduser['DSHELL']
    => "/bin/bash"
    irb(main):038:0> adduser['DHOME']
    => "/home"

Now that we have all the data from the file stored in a hash, we can easily make changes and write the modified data back out to a file. Let's say we need to change the default shell and home directory of system users.

    irb(main):039:0> adduser['DHOME'] = '/home/a'
    => "/home/a"
    irb(main):040:0> adduser['DSHELL'] = '/bin/bash'
    => "/bin/bash"
    irb(main):042:0> newconf = File.new('adduser.conf', 'w')
    => #<File:/home/a/kuhla/adduser.conf>
    irb(main):043:0> adduser.keys.sort.each do |key|
    irb(main):044:1* newconf.puts "#{key}=#{adduser[key]}"
    irb(main):045:1> end
    => ["DHOME", "DIR_MODE", "DSHELL", "FIRST_GID", "FIRST_SYSTEM_GID", "FIRST_SYSTEM_UID", "FIRST_UID", "GROUPHOMES", "LAST_GID", "LAST_SYSTEM_GID", "LAST_SYSTEM_UID", "LAST_UID", "LETTERHOMES", "QUOTAUSER", "SETGID_HOME", "SKEL", "SKEL_IGNORE_REGEX", "USERGROUPS", "USERS_GID"]
    irb(main):046:0> newconf.close
    => nil

Let's examine the new file to verify that the contents are what we expected.

    irb(main):047:0> puts IO.read('adduser.conf')
    DHOME=/home/a
    DIR_MODE=0755
    DSHELL=/bin/bash
    FIRST_GID=1000
    FIRST_SYSTEM_GID=100
    FIRST_SYSTEM_UID=100
    FIRST_UID=1000
    GROUPHOMES=no
    LAST_GID=29999
    LAST_SYSTEM_GID=999
    LAST_SYSTEM_UID=999
    LAST_UID=29999
    LETTERHOMES=no
    QUOTAUSER=""
    SETGID_HOME=no
    SKEL=/etc/skel
    SKEL_IGNORE_REGEX="dpkg-(old|new|dist)"
    USERGROUPS=yes
    USERS_GID=100
    => nil

As we can see, our modifications have been saved to the new configuration file.
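The parse, modify, and write-back steps can be sketched as a pair of small methods. This is an illustration under stated assumptions rather than the lab's required approach: the names parse_conf and dump_conf are invented, the input is a short made-up sample in the style of adduser.conf, and split is given a limit of 2 so that a value which itself contains an equal sign stays in one piece.

```ruby
# Parse NAME=value text into a hash, skipping comments and blank lines.
# parse_conf is a hypothetical helper name.
def parse_conf(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    # Limit of 2: split only on the first equal sign.
    key, value = line.split('=', 2)
    settings[key] = value
  end
  settings
end

# Render a settings hash back into sorted NAME=value lines.
# dump_conf is a hypothetical helper name.
def dump_conf(settings)
  settings.keys.sort.map { |key| "#{key}=#{settings[key]}\n" }.join
end

sample = <<~CONF
  # sample data in the style of adduser.conf
  DSHELL=/bin/bash

  DHOME=/home
  QUOTAUSER=""
CONF

conf = parse_conf(sample)
conf['DHOME'] = '/home/a'
print dump_conf(conf)
```

Working on a string instead of a file keeps the sketch self-contained; the text could just as well come from IO.read, and the result of dump_conf could be written out with File.open in "w" mode.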
More Parsing
Now that we understand the basics of parsing, let's look at how to parse /etc/mime.types, a file that describes how file extensions map to MIME types. MIME types are how your web browser and email client know how to open documents of different types. Web and email servers include header information that identifies the document's MIME type. Sometimes, however, a program needs to identify a document without such header information, so it has to fall back on file extensions and use /etc/mime.types to translate file extensions into MIME types. If you want to find more information about file extensions than is available in /etc/mime.types, www.fileinfo.net has a huge database of file extensions and their meanings.

We will start by printing the file without comments or blank lines to see what the format is. Call your program parsemime.rb.

    File.open("/etc/mime.types", "r") do |file|
      file.each_line do |line|
        line.strip!
        next if line[0] == "#" or line.empty?
        puts line
      end
    end

From the output, it's clear that there is whitespace separating the MIME type from a list of file extensions. What does the whitespace consist of: tabs, spaces, form feeds? The cat program can show us the whitespace using the -vet series of options. Let's pipe our output through cat -vet.

    $ ruby parsemime.rb | cat -vet | less

The output shows the whitespace as one or more ^I characters. Control-I is the tab character (that makes sense, right?), which is represented as "\t" inside a Ruby program. The list of file extensions on the right is separated by spaces with no tabs, so now we know how to parse the file.
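The same whitespace check can also be done from inside Ruby. The sketch below assumes a made-up sample line in the style of mime.types (it is not read from the real file) and uses String#inspect, which shows tabs as \t, along with String#count to tally each kind of whitespace.

```ruby
# A made-up line in the style of mime.types:
# tabs after the type, single spaces between the extensions.
sample = "image/jpeg\t\t\tjpeg jpg jpe\n"

# inspect shows escape sequences, so tabs appear as \t and the newline as \n.
puts sample.inspect

# Count each kind of whitespace character in the line.
puts "tabs: #{sample.count("\t")}, spaces: #{sample.count(' ')}"
```

For this sample the second line of output reads "tabs: 3, spaces: 2", which matches what cat -vet would show as three ^I characters.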
Programs
PROGRAM #1

Complete the MIME types parsing program. The program should parse a "mimetypes file" (see the attached mime.types file for an example of the data formatting) that is passed to the program as a command-line argument. You can assume that a file is passed to the script correctly; you do not need to test whether one was passed, but you do need to verify that the file exists. If you are unfamiliar with mimetypes, they describe the data contained in files and are of the form category/identifier, for example text/plain. One mimetype may have multiple possible file extensions. Your script will build a hash whose keys are the MIME types and whose values are arrays of each type's file extensions. One thing to consider: how do you handle nonexistent mimetypes versus those that have no extensions?
After the hash is created, you will need to read user input from the console (hint: $stdin.gets). The user will enter a mimetype and you will output its extensions if it has any, "No extensions found" if it has none, and "Mime does not exist" if the mimetype does not exist in the file. If the mimetype has multiple extensions, they should be "pretty printed" as seen in the screenshot, not dumped as arrays. You will continue to read in mimetypes until the user enters a q or Q to quit. For bonus points, also allow the user to enter a file extension (without the period) to search for its mimetype. The screenshot shows the bonus functionality at the bottom.