
CS 370 Operating Systems

Assignment 1 - Shell

Operating Systems - Fall ’24

Lead TAs: Taha Cheema, Ayesha Shafique, Raahem Nabeel

1 Introduction

This assignment offers you the opportunity to design and implement a minimal command-line interpreter,
or shell, that replicates the core functionalities of a UNIX shell, such as BASH or ZSH. The primary goal
is to provide you with practical experience in systems programming while deepening your understanding of
key operating system concepts. Although you won’t be creating a commercial-grade shell, you will gain a
solid grasp of the fundamental functionalities.

Tackling Your First OS Assignment

It’s completely normal to feel overwhelmed at first—this assignment may feel like a big leap from
previous ones, especially since there’s no starter code provided. The key to completing OS assignments,
in general, is to take things step by step. Start by thoroughly reading the ENTIRE manual. I
cannot stress this enough. Most function calls, their details, and example code are at the end of the
manual. After that, take a day or two to reflect on how you might structure your solution, what
functions you’ll need, and how you’ll approach the problem. Spend a good amount of time
thinking about an effective parser for the shell that will cater to both the basic and
advanced functionalities. Feel free to make as many files as you want for your code, as long as main.c
is the starting point of your shell. If you’re still unsure, don’t worry—feel free to visit the TA office
hours for help. Additionally, we’ve included some helpful resources on structuring shell code and
understanding important function calls here:
• Write a Shell in C.
• Pipes, Forks & Dups: Understanding Command Execution and I/O Data Flow.
Also, note that you don’t need to write your own code for reading input as is done on the
linked resource — you’re encouraged to use the much simpler GNU Readline library discussed in
detail here: E.

2 Objectives

This project serves as both practice and a learning exercise, catering to various levels of experience. By the
end of this assignment, you should feel confident in your ability to:

• Comprehend the essential concepts and workings of a UNIX shell.


• Develop programs in the C Programming Language that are clear, readable, well-documented, and
well-designed.
• Locate and interpret relevant man pages for application-level system programming.
• Grasp the fundamental concepts of process creation and management in a UNIX-like operating
system.


• Understand and implement basic input/output redirection, command chaining, and piping.

3 Overview
3.1 Plan

This assignment is divided into four parts:

Part          Brief description                                    Grade weightage   Recommended deadline
Basic         Builtin shell commands and executable commands.      [35%]             16/09
Intermediate  Chaining simple commands as implemented in Basic.    [10%]             19/09
Medium        Input/output redirection and pipes.                  [35%]             26/09
Advanced      Chaining multiple pipelines.                         [20%]             29/09

None of the parts are optional, but a major chunk of the assignment comprises the Basic, Intermediate and
Medium sections.

3.2 Advice

IMPORTANT: Although the assignment is structured to progress from Basic to Intermediate to Advanced
levels, it can be highly beneficial to consider the later parts while working on the Basic section. For example,
implementing command chaining while keeping I/O redirection and pipes in mind will be useful for getting the
final 20% of the grade. Moreover, if you work on the Basic part in isolation, you will get 35% of the grade but
may end up having to change your code substantially to implement the Intermediate and
Advanced parts.

3.3 Restrictions

The use of the system() function is strictly prohibited for this assignment. Any implementation using
system() will result in a score of 0. Please ensure your shell executes external commands without
resorting to system().

3.4 Plagiarism Policy

The assignment is to be attempted individually. Plagiarism is strictly prohibited. Students are not allowed to
discuss their solutions with others or copy code directly from the internet or LLMs like Claude or ChatGPT.
However, they may seek help from the course staff. We will be running plagiarism checkers with current and
previous years’ code.

4 What is a shell?
Like the shells you use daily (or rarely), your shell should present a prompt when running interactively,
allowing it to read and execute commands entered by the user. Additionally, it should be capable of reading
and executing commands sequentially from a file. But what exactly is a command? In its simplest form,
a shell command consists of a command name followed by optional arguments. The general syntax for
executing a command is:
command [args]*

The [args]* syntax, adapted from UNIX man pages with slight modifications, indicates that an arbitrary
number (zero or more) of arguments can be provided. More details on command syntax used in this manual
can be found here: A.
In this assignment, you will be working with two types of commands that your shell should support:


• Executable Commands: These are commands that refer to a program or script that can be
executed. The name of the command could be just the name of the executable (like ls or echo), or
it could be a path to the executable file, which can be either a relative path (like ./myprogram) or
an absolute path (like /usr/bin/ls). We provide details on how to run these later.
• Shell Built-in Commands: These are commands that are built into the shell itself. They don’t
run an external program but instead perform operations that directly affect the shell’s behavior or
state. For example, a built-in command might change the current working directory or display a list
of recently executed commands.

In simpler terms, you will be creating a basic version of a shell — the kind of program that lets you type
commands to interact with your computer. Your shell will be able to run programs based on the commands
you type, and it will also include some built-in features that allow you to do things like change directories
or manage files directly from within the shell. This assignment will guide you through the process of
understanding how these shells work and how to build one yourself. Note: Your shell implementation will
be running like a regular C program within your terminal.
Here are some example¹ commands. To list the contents of a directory, you would use the ls command:
$ ls
Documents  Downloads

To copy a file, you would use the cp command followed by the source and destination file names:
# copies the file source_file to destination_file
$ cp source_file destination_file

5 Part 1 : The Basics (35% Grade)


5.1 Specification

For this section, the shell should support the following features:

5.1.1 Interactive mode and Script mode


• In interactive mode (where you’re typing commands directly into the shell), once the executed
command finishes, the shell should display a new prompt, ready for your next input.
• In script mode (where the shell is reading commands from a file), after each command completes,
the shell should automatically move on to read and execute the next command in the script.

Your shell must support a script mode, where the shell reads commands from a file instead of the terminal.
The script mode is activated by passing the name of the script file as an argument to the shell executable. For
example, if the shell executable is named shell, then the script mode can be activated by running ./shell
script.txt. In this case, the shell should read the commands from the file script.txt and execute them.
If the shell is run without any arguments, it should run in the interactive mode, where it reads commands
from the terminal and executes them.
For the script mode, you can assume that commands will be separated by newlines.

Disclaimer
Implementing the Script mode is very important, as the testing depends on the script mode. If your
shell doesn’t support the script mode, you won’t be able to run the tests.

¹ For this example, as well as all subsequent examples, the $ sign represents the shell prompt. The # sign represents the
beginning of a comment. The \ sign represents the continuation of a command over multiple lines, and is not part of the
command unless escaped. Any line with no symbol at the start is output.


Side note on implementation

A single function like getInput() can be used that, depending on a flag, yields a command string,
either from the user’s input (in the case of the interactive mode) or from the script file provided as an
argument. Have a look at the int argc and the char** argv arguments of a standard C main function.

Disclaimer

Please note that any text enclosed in quotes (single or double) is treated as a single
argument string, with the quotes themselves being discarded.

For example, consider the command mkdir "My Directory". Here, My Directory is a single
argument being passed to the mkdir command, not two. If you are using strtok and parsing at
spaces, this might be useful to remember. You do not need to cater to escape sequences, or nested
quotations like "i’m okay", etc. within the commands or their args.

5.1.2 Built-in Commands

Shell built-in commands are special commands that are part of the shell itself. Unlike external commands
that require the shell to start a separate process to execute a program, built-in commands execute directly
within the shell. They perform tasks that directly impact the shell’s environment, such as changing the
current working directory using cd. Built-in commands don’t require the shell to fork a new process because
they are handled internally. Forking is the process of creating a child process to run an external program.
Since built-in commands are designed to interact directly with the shell’s internal state—modifying variables,
changing directories, or altering the shell’s behavior—there’s no need to create a separate process.

NOTE: You may want to handle errors related to the number of arguments to make debugging
easier. Although this will not be tested explicitly, it is still good practice.

1. exit : The exit command allows you to exit the shell gracefully and return to the parent terminal.
In other words, it should terminate your running shell. It should not take any arguments.

Syntax: exit

2. pwd: This command is used to display the current directory in which the shell is operating. This
command is helpful when you need to know the exact path of the directory you are currently working
in. The usage of the pwd command is straightforward. Simply enter pwd without arguments and the
output is the path to the current directory.

Syntax: pwd

3. cd: stands for change directory, and that’s what it does. You specify the directory in the args, and
the present working directory (or the current directory) of the shell is changed to that. If cd gets
no args, it simply changes the working directory to the home directory. Think about why this can’t
be implemented as an external process.

Hint: If built-in commands were implemented as external processes, the shell would have to fork
a new child process to execute them. However, any modifications made by the child process, such
as changing the working directory or setting a variable, would only affect the child process, not the
parent shell. Once the child process terminates, its environment changes would be lost, and the
parent shell’s state would remain unchanged. This would make commands like cd ineffective, as
the shell’s working directory wouldn’t actually change after the command completes. This is why
built-in commands must be executed within the parent shell process.
Syntax: cd [directory]


4. pushd: is the same as cd with one additional functionality. It first pushes the current working
directory to a directory stack before changing the directory to the one provided as an argument (or
home if no args were provided). You will have to implement the directory stack yourself.

Syntax: pushd [directory]

5. popd: this command pops a directory from the directory stack and changes the current working
directory to it. It takes no arguments. If the directory stack was already empty, the shell should
not crash and instead, it should output this statement: popd: directory stack empty

Syntax: popd

6. dirs: prints all the directories currently pushed to the directory stack. dirs takes no arguments. It
should print the last pushed directory first. Each directory should be printed on a new line. There
will be no output if the directory stack is empty.

Syntax: dirs

7. alias: The alias command allows you to create aliases for frequently used commands. An alias
is a user-defined shortcut or alternative name for a command. When you create an alias, the shell
replaces the alias with the actual command whenever you use it. Use the following syntax to create
an alias:
alias alias_name "input_string_to_alias"

For example, if you frequently use ls -al to list all contents of a directory, you can create an alias
called ll by the command: alias ll "ls -al". After creating this alias, whenever you enter ll
in the shell, it will be expanded to ls -al and executed. Please note that the input string to be
aliased can be any shell input, whether valid or invalid. The shell will simply replace the alias with
the input string and try to execute it.

Syntax: alias [alias_name [alias_value]]

If no arg is provided, alias lists all the aliases currently defined in the format "%s=’%s’\n",
alias_name, alias_value (printf format specifiers syntax). If only the alias_name is provided,
list the expansion of that alias in the format : "%s=’%s’\n", alias_name, alias_value. Please
note that following this syntax is important, as this will be followed in the testing. Consider the
following example:
$ alias ls "ls --color=tty"
$ alias grep "grep --color=tty"
$ alias
ls=’ls --color=tty’
grep=’grep --color=tty’
$ alias ls
ls=’ls --color=tty’
$ alias grep
grep=’grep --color=tty’

8. unalias: The unalias command allows you to remove aliases that have been defined. This
command is useful if you no longer need an alias or if you want to redefine an alias with a different
command. The syntax for removing an alias is straightforward: unalias alias_name. For example,
if you want to remove the ll alias created earlier, you can use the command: unalias ll


Syntax: unalias alias_name


9. echo: The echo command is used to display text or variables on the screen. It is commonly used for
printing messages or variables. For this assignment you will only need to print strings. You can also
pass multiple strings to the echo command, and they will be printed on the same line, separated by
a single space. The following examples demonstrate the usage:
$ echo hello
hello
$ echo hi hello
hi hello
$ echo "hello world for real"
hello world for real
$ echo "hello1" "hello2"
hello1 hello2
$ echo "hi\n" "hello"
hi
hello
$ echo ls
ls
$ echo cp file1 file2
cp file1 file2

Syntax: echo [string]*

Disclaimer

Please note that any text enclosed in quotes (single or double) is treated as a single
argument, with the quotes themselves being discarded. If you are using strtok and
parsing at spaces, this might be useful to remember. For this assignment, you do not need
to handle variable expansion or commands passed as arguments to echo; everything will be
treated as plain strings, as shown in the examples above.

5.1.3 Executable Commands

When you enter a command in the shell that isn’t a built-in function, the shell initiates a new process to run
the corresponding program from your computer’s filesystem. This process runs in the foreground, meaning
the shell waits for it to complete before accepting new input. Examples of executable commands include
cat, ls, grep, wc etc.
Let’s break down how executable commands work:

• Command Identification: The first word you type on the command line is interpreted as the
name of the program to execute. For example, in the command ls -l /home, "ls" is the program
name.
• Argument Passing: The entire command line, including the program name itself, is passed as
arguments to the program being executed. In our ls -l /home example, the program ls would
receive three arguments: ls, -l, and /home.
• Locating the Executable: The shell doesn’t just look in the current directory for the program.
Instead, it searches through a list of directories specified in the PATH environment variable. This
allows you to run programs from anywhere on your system without specifying their full path.


• Using exec Functions: To actually run the program, the shell uses one of the exec family of
functions. These functions replace the current process with a new one. The manual page for
these functions (accessible by typing man 3 exec in your terminal) provides details on the different
variants available. For our purposes, you should choose a variant that automatically searches the
PATH for the executable given its name, saving you from having to implement this search yourself.
(HINT: execvp is what you are looking for)

Please view this D section and the resource shared in the Introduction 1 for a guide on how to
implement executable commands.

6 Part 2 : Basic Chaining (10% Grade)

It’s time to extend the functionality of the shell even further by adding another core feature called "chaining".
NIX² shells allow multiple commands to be chained together via different operators like &&, ||,
;. Suppose you want to execute a series of commands such that each next command depends on the success
of the previous one. One way is to sequentially execute them one-by-one, i.e. enter one command/pipeline,
wait for its completion, and then at the next prompt, enter the next command/pipeline and so on, until you
encounter an error and it’s time to panic.
However, you can be more productive by chaining all the commands via a logical chaining operator. Similarly,
many cases arise where you want a particular command to be executed only in case an error occurs
in the previous command. This can also be achieved by the use of chaining. Chaining has many
more use cases than just these, which you’ll hopefully discover once you start using these operators in your
day-to-day shell usage.

6.1 Specification

A detailed specification of the features you are required to implement in this part is provided below.

Disclaimer
Please note that for Part 4, you will need to extend the simplified implementation below from
handling simple command chains to pipeline chains once you are familiar with pipelines in Part 3.
We recommend that you review Part 3 before designing your parser, for easier extendability in Part
4 which forms the last 20% of your grade.

The use of the phrase "pipeline" in the below section can be interpreted as a "command" for
the intent of implementing Part 2 only.

6.1.1 Chaining

We need to support three main chaining operators: &&, ||, ;. By convention, an exit status of zero
indicates success, while any non-zero status (e.g. 1) indicates failure. You should track the status code
of the last run command in order to implement chaining properly.

1. Logical Chaining Operators


• operator &&: The && operator allows for chaining of pipelines in a logical AND fashion. If the
last return status is zero (last executed command succeeded), only then the immediate right
hand side of the && is executed, otherwise the right hand side pipeline is skipped, and the next
link in the chain is tested w.r.t. its chaining operator.
² NIX is an informal shorthand for UNIX and UNIX-like systems


• operator ||: The || operator allows for chaining of pipelines in a logical OR fashion. If the
last return status is non-zero (last executed command failed), only then the immediate right
hand side of the || is executed, otherwise the right hand side pipeline is skipped, and the next
link in the chain is tested w.r.t. its chaining operator.
Consider the following example usages of Logical chaining operators.
# the first command succeeds, last run status code = 0,
# the echo command to the right of && runs
$ true && echo "This should be printed"
This should be printed

# false executes first, last run status code = 1. the first echo doesn’t
# execute because of &&. the second echo executes because of ||
$ false && echo "This should not be printed" || echo "This will print though"
This will print though

# the first command succeeds, none of the other two execute because
# the last run command’s status was success
$ true || echo "This should not be printed" || echo "Neither this"

# will only print and execute a.out if compilation was successful
$ gcc file.c && echo "Compilation successful" && ./a.out

# failure of the first command results in only the second one being executed;
# when the second one succeeds, the third one doesn’t run
$ false || echo "This should be printed" || echo "This should not be printed"
This should be printed

2. Sequential Chaining Operator ; : The ; operator allows for sequential chaining of pipelines.
Irrespective of the return status of the previous pipeline, the next pipeline is always executed. Chaining
multiple commands/pipelines with ; is equivalent to entering them one-by-one at the prompt,
irrespective of the return status of the previous command/pipeline.
Example usage:
$ echo "Hello" ; false ; echo "World" ; true ; echo "great"
Hello
World
great

All of the above chaining operators have the same priority and are left associative, i.e. the
chaining is done from left to right. These operators can be used to chain together multiple pipelines,
and all of them can appear in a single command input string, in different combinations.
Consider the following examples:
$ echo "Hello" && echo "World" || echo "Bye" && echo "World"
Hello
World
World
$ false || echo "prints1" && echo "prints2"
prints1
prints2

The following example demonstrates a more complicated/confusing combination of chaining operators.


ls && false || echo "doit" && false && echo "won’t work" || echo "works" ; echo "done"
(segments: 1 = ls, 2 = false, 3 = echo "doit", 4 = false, 5 = echo "won’t work", 6 = echo "works", 7 = echo "done")

1 -> returns 0
2 -> last status 0, gets executed (because &&), returns 1 (false always returns 1)
3 -> last status 1, gets executed (because ||), returns 0
4 -> last status 0, gets executed (because &&), returns 1
5 -> last status 1, doesn’t get executed (because &&)
6 -> last status 1, gets executed (because ||), returns 0
7 -> gets executed regardless of the last status, returns 0

# The output of the above command string would be
<output of ls>
doit
works
done

7 Part 3 : Pipelines and I/O Redirection (35% Grade)

NIX shells also support a core feature called IO redirection (input output redirection), which refers to the
ability to redirect the input or output of commands. Generally, commands take their input from stdin
and write their output to stdout. For our purposes, we can consider stdin to be the terminal’s input
coming from the keyboard, and stdout to be the terminal’s display. This works well enough for standalone
processes, that simply take input from the user, and show the output to the user as well. However, enabling
basic IPC (inter process communication), where processes can communicate with each other, becomes a
necessity in a modern multi-process operating system e.g. one process can communicate its findings (output)
to another specialized process (as input) for further processing, or multiple processes can write their output
directly to a file, which can then be used by another process as input etc. IO redirection allows this without
changing the IO structure (in the code) of the program itself.
Before proceeding, I must present a subtle technicality here. "Redirection" specifically refers to the ability
of passing output of a command to, or taking input of a process from, a file or a stream. If the IO of a
command needs to be passed to another command, that can be done by using pipes, and is generally called
"piping commands" which will be described in detail below. Note that this definition is still a bit simplified,
but it should suffice for our purposes.

7.1 Specification

Although modern shells provide a lot of redirection facilities, we will mainly be concerned with the operators
>, <, >>, |. The following specification pertains to this custom shell only, and is not necessarily true for
all (at least POSIX-compliant) shells.

1. Output redirection
• command > file: Redirects output of the command to file, replacing the file if it exists. If the
file doesn’t exist, it is created.
• command >> file: Redirects output of the command to file, appending to the file if it exists. If
it doesn’t exist, it is created.
Consider the example usage of output redirection:
$ echo "hello world"
hello world
$ echo "hello world" > file.txt
$ cat file.txt
hello world


$ echo "hello world 2" >> file.txt
$ cat file.txt
hello world
hello world 2

2. Input redirection
• command < file: Redirects input of the command to file. If the input file doesn’t exist, the
command should not execute, and report the error.
Consider the example usage of input redirection:
# assuming the file created earlier. this specifies that cat should read
# input from file.txt, instead of the keyboard
$ cat < file.txt
hello world
hello world 2

3. Pipes command1 | command2: Pipes the output of command1 to the input of command2. The output
of command1 should not be displayed on the terminal, and should be passed as input to command2.
The output of command2, if any, should be displayed on the terminal (unless it is redirected to a file
or piped with another command). The general syntax of piped commands (more commonly referred
to as a pipeline) is:
command [args]* [ | command [args]* ]*

The [x] means that x is optional, and the * means that the preceding element can be repeated any
number of times. So, the above syntax means that a pipeline can have a command, followed by any
number of args, followed by any number of commands (each followed by any number of args), each
separated by a ’|’ symbol.
Consider the following example pipelines:
# list the contents of a directory and then filter
# the results using the grep command
$ ls
build include Makefile src test
$ ls | grep include
include

# Find the pid of the zsh instance with the lowest pid and kill it
# Note that this is not the standard way to find the desired thing at all
# it’s just a long pipeline I crafted to show how we can use pipes
$ ps -opid,comm | grep -v PID | grep ’[z]sh’ | cut -d ’ ’ -f 3 | sort | head -1 | \
    xargs kill

Shells also support a combination of pipes and IO redirects. For example, a long pipeline can dump its
results (stdout output) to a file instead of the terminal screen. Similarly, the start of a pipeline can read
input from a file instead of stdin, and then propagate the output to the next command in the pipeline.
Consider the example usages:
$ echo "hello line1 hello line2" | wc
      1       4      24

$ alias ls "ls --color=tty"
$ alias grep "grep --color=tty"
$ alias | grep color > greplog.log
$ cat greplog.log
ls=’ls --color=tty’
grep=’grep --color=tty’

# from a file with a bunch of names in the format ‘Fname Lname‘, extract all the
# firstnames that are not ’Abdullah’ and write them to a file in a sorted order
$ cut -d ’ ’ -f 1 < names.txt | grep -v Abdullah | sort > extracted.txt

For our simple shell, we can assume that the file IO redirection operator, and pipes, come after the name of
the command, and the operator impacts the command that precedes it. To elaborate, you will not be tested
on inputs of the type
$ cut -d ’ ’ -f 1 < names.txt > new_names.txt

Note that for each process, the input can come from only one file, and the output can only go to one file as
well (file in this case refers to anything which has an associated file descriptor, it can either be stdin/stdout,
a filesystem file, or a pipe). So, if a process is in the middle of a pipeline, it can only receive input from
the previous process’/command’s pipe, and can only send its output to the next command’s/process’ pipe.
It cannot have any filesystem file IO redirection. Similarly, if a command has an associated output file
redirection, it cannot be piped to another command, it must be the last command in a pipeline (since the
output is already redirected to a file, it cannot be sent to another command). And similarly, if a command
has an associated input redirection, it cannot have a previous piped command, it must be the first command
in the pipeline (since the input is already redirected from a file, it cannot be received from another command).
Consider the following examples:
# invalid, the first command has two output destinations
$ echo "hello world" > file.txt | grep hello
# valid, since there’s only one input source, and one output destination
$ grep hello < file.txt | grep world
# invalid, since the second command has two input sources
$ grep hello < file.txt | grep world < file2.txt
# valid, since there’s only one input source, and one output destination for each command
$ grep hello < file.txt | grep world > file2.txt
# invalid, the middle command has two input sources
$ grep hello < hello.txt | wc < hello.txt | grep 10

Based on the above information, for this part, your shell should be able to correctly handle and execute a
command input of the following form:

command [args]* [ < file] [ | command [args]*]* [(> OR >>) file]

Only the first command can have an optional input redirect, and only the last command (which
can very well be the first command as well, since the piped commands are optional) can have
an optional output redirect (either of the two, not both). Test cases won’t include incorrect
pipelines that do not satisfy the above format, but you should ideally place checks for these requirements
and handle errors, as they’ll help you debug where your parsing and execution logic is failing.
Notice that this generalized syntax also includes commands from part 1, where we looked at basic commands
without any IO redirection and pipes. From now on, we will be referring to this specification as a
pipeline. So having a correct pipeline execution automatically implies a working part 1 implementation.
Note the careful use of whitespace in the above specification. You can assume that all commands, args
and operators are separated by at least one space character. There can be more than one whitespace
character between tokens, though; extra whitespace should be ignored by the shell.


Disclaimer
The above specification holds true for all commands, both built-ins as well as the external commands
from the first part. For example, in the usage of the echo command in the above examples, echo is a built-in.
The shell grammar makes no distinction between an external and an internal command, and thus a
pipeline can have a mix of both built-ins and external commands.

Please see Appendix D and the resource shared in the Introduction (Section 1) for a guide on how to
implement pipes and I/O redirection in C.

8 Part 4 : Advanced (20% Grade)

In Part 2, you implemented chaining operators with basic commands. For the final 20% of the grade, your
chaining operators should work with the pipelines seen in Part 3 (which include multiple piped commands with
possible I/O redirection).
Here is an example of chaining of multiple pipelines:
1 # If either of the strings "Forgis" and "Jeep" is found
2 # in the input string, then this is it
3 $ echo "I just put new Forgiato wheels on my vehicle from the company Jeep" > file.txt ;
cat < file.txt | grep Forgis || cat file.txt | grep Jeep && echo "Woah.. that’s..
what??" || echo "clearly not" ; echo "done"
4 # Guess what the output would be (grep returns a non-zero status if it can't find the word)
5 # pls dont @ me, the manual was written when this was funny

Please note that the previous specifications should remain compatible with the specification described above,
i.e. the specifications from the earlier parts should be a subset of this one.


9 Getting started

The handout contains the following files:

Shell/
|-- Makefile
|-- dockerfile
|-- src/
| |-- main.c
|-- include/
| |-- utils.h
| |-- log.h
|-- build/
|-- test/
| |-- Tests/
| |-- test.py
| |-- config.json
|-- Manual/
|-- manual.pdf

The src directory should contain all of your source files (all the .c files), while all the header (.h) files go
in the include directory. The build files (object files and the executable) are placed inside the build directory.
The test directory contains the test files. The dockerfile is used to build the docker image, and the
Makefile is used to build the project. The README.md file contains the instructions to build and run the
project.

9.1 Setting up Docker

Although the assignment can be built on any Linux system, we recommend using Docker to build and run
the project. The project, as well as the test files, have been extensively tested on a docker image, and we
can guarantee that it will work on the docker image irrespective of your environment. Your submission will
also be tested on the provided docker container. If you are not familiar with Docker, you can read about it
here. You can install Docker on your system by following the instructions here.
Once you have installed Docker, ensure the Docker application is running in the background and
then start the docker container by running the following command in the root directory of the project (where
the docker-compose.yml file is).
1 docker compose up -d

To run the container in interactive mode, run the following command


1 docker exec -it os-fall-2024-pa1 /bin/bash

Navigate to the mounted directory by typing the following in the interactive terminal.
1 cd /home/os-fall-2024-pa1

The container has the following packages installed:


build-essential, valgrind, binutils, libreadline-dev and git. You can install more packages if
you need to (e.g. apt-get install vim).

To exit the container, type exit in the terminal.


1 exit

Disclaimer
The container runs in the background and hence to stop it, you need to run the following command:
1 docker compose down

9.2 Building system and running project

For building the project, you can use the Makefile provided. The entry point of the program is main.c,
though you are free to create other files to keep your code organized. To build the project, run the following
command in the project directory (the one containing the Makefile):
1 $ make

This would create the executable Shell (or whatever is defined to be the target’s name in the makefile)
inside the build directory, in debug mode by default. Debug mode is helpful for debugging - you will see
output using the provided LOG_DEBUG, LOG_ERROR, LOG_PRINT functions. You can also build the project in
release mode, which enables compiler optimizations, and disables debugging macros. To build the project
in release mode, run the following command:
1 $ make BUILD_DEFAULT=release

Or, you can change the value of BUILD_DEFAULT variable in the Makefile to release, and then run make
without any arguments. Always make sure to run make clean before switching between the build modes,
as the object files are not compatible between the two modes.
To run the shell/your program, just use make run in the project directory.
1 $ make run

To clean the build and force a rebuild from scratch, use make clean, or make clean; make.
Please note that the provided Makefile assumes the provided directory structure, so the above instructions
are only valid if you stick to that structure.

9.3 Automated Testing

The test directory contains the test files. The test.py script is used to test the project. The testing script
assumes that the executable to be tested is located inside the build folder with the name "Shell". So,
make sure you have built the project in release mode before running the tests. To run the tests, run
the following command in the project directory:
1 $ make test

This runs all the tests (for all 4 sections), and reports the score. There are four categories of the tests: easy,
intermediate, medium, advanced. So, to run a specific test, pass the name as an argument. For example,
to run the easy tests, run the following command:
1 # Runs the easy tests
2 $ make test ARGS="easy"
3 # Runs only the medium and advanced tests
4 $ make test ARGS="medium advanced"


If, for some reason, you are not happy with the test results, you can inspect the intermediate results yourself
by looking at the files in the directory test/Tests/test_output, which stores the output of the
commands run by your shell (ext. msh.out) and by the reference shell (ext. dash.out).

Disclaimer
All OS assignments will include hidden test cases, so your implementation should be flexible
enough to meet the requirements outlined in the manual. Don't worry; we won't test anything that
isn't covered in the manual or that significantly differs from the public test cases. The hidden test
cases are mainly there to ensure that no one has hard-coded their outputs or failed to properly
follow the specifications in the manual.

10 Submission

The submission should be done through LMS on the assignment tab. The submission should be a zip file
containing the following:

1. The src directory containing all the source files.


2. The include directory containing all the header files.

Please don't include any other files or directories, as they'd just make our lives harder and would
be removed nonetheless. Also, please make sure to follow the specified directory structure and not just
dump all the files directly in the archive. The zip file should be named <roll_number>.zip, where
<roll_number> is your roll number. For example, if your roll number is 24100173, then the zip file should
be named 24100173.zip. The zip file should not contain a top-level wrapper directory, i.e. it should not
contain a directory named 24100173 which in turn contains the src and include directories. The zip file
should be submitted on LMS before the deadline. Late submissions will not be accepted.
Good Luck, and Happy Coding!


Reference and Guide


A Syntax Specification reference

As mentioned earlier, the notation used to specify syntax in this manual is adapted from man page
conventions, with a few changes. Following is a brief description of the different operators used.

1. []: The [] operator specifies that the enclosed token or expression is optional. It is used to specify
optional args, or optional operators. The operator can be nested inside another [] operator as well.
For example, [arg1 [arg2]] specifies an optional arg1, which can be followed by an optional arg2
(but arg2 can only be present if there's an arg1 in the first place). So the empty string, arg1, and arg1
arg2 are all valid examples, but arg2 or arg1 arg2 arg2 etc. are invalid.

2. (): The () operator specifies that the enclosed token/expression is mandatory. However, it is not
always used, and if a token is found without any operator, that is also considered mandatory.
So, for example, the syntax specifications (command) [arg] and command [arg] are equivalent,
both specifying that valid syntax comprises a mandatory command followed by an optional
argument. The main use of the () operator is to specify an expression that reduces to a single
token. The scope of both [] and () is local.

3. OR, AND: These logical operators are self-explanatory. They are used to form an expression. The
OR operator specifies that only one of the operands can be present, and the AND operator specifies that
all of the operands need to be present. For example, the syntax specification [(> OR >>) file]
specifies optional output redirection. Note the nesting, and the use of an expression in the syntax
specification.

4. *: The operator * represents repetition of the preceding expression. Note that it doesn't mean
copies of the same token are duplicated; rather, the preceding expression can be repeated an
arbitrary number of times. For example, command [args]* specifies that a command can take an
arbitrary number of optional arguments, i.e. command arg1 arg2 ... argn, where n is 0 or a
positive integer.

5. :=: This operator represents equivalence. It is mainly used to define shorthands that break down a
complicated syntax into smaller parts.

B String Manipulation

At this stage, I assume you have recognized that you will need to parse user input to extract individual
tokens from a command to make your shell work. There are many useful string functions in C: strtok,
strdup, strcpy, strcmp, strcat.

C System Calls

System calls, or more commonly "syscalls", are specialized functions. System calls are different from a regular
procedure call in that the callee is executed in a privileged state, i.e, that the callee is within the operating
system. Because, for security and sanity, calls into the operating system must be carefully controlled, there
is a well-defined and limited set of system calls. This restriction is enforced by the hardware through trap
vectors: only those OS addresses entered, at boot time, into the trap (interrupt) vector are valid destinations
of a system call. Thus, a system call is a call that trespasses a protection boundary in a controlled manner.
The process abstraction, as well as some operations related to processes, are managed by the operating
system; therefore, in order to work with processes and control them, system calls are essential. The
following UNIX syscalls are probably going to be useful for this project.


System Call Brief Description

chdir Changes the current working directory of the process.


close Closes a file specified by the file descriptor.
dup The dup() system call creates a copy of the file descriptor provided, using the
lowest-numbered unused file descriptor for the new descriptor.
dup2 The dup2() system call performs the same task as dup(), but instead of using the
lowest-numbered unused file descriptor, it uses the file descriptor number specified
in newfd.
exec exec is a family of syscalls, which are responsible for replacing the current process
with a new process. The different functions differ in terms of the params they
receive.
exit Terminates the current process, with the specified exit status code.
fork Creates a new child process, identical to the parent process.
getcwd Returns the current working directory of the process.
getenv Returns the value of an environment variable, e.g. getenv("HOME") can be used to
retrieve the value of the home directory.
getpid Returns the process ID of the current process.
getppid Returns the process ID of the parent of the current process.
open Opens a file. On success, returns a file descriptor.
pipe Creates a pipe. On success, returns two file descriptors, one for reading and one
for writing.
read Reads data from a file specified by a file descriptor.
setenv Sets the value of an environment variable.
wait wait family of functions wait for a child process to terminate.
write Writes data to a file specified by a file descriptor.

Please note that this is a very brief description of some useful syscalls that you may or may not use in the
implementation. For more details, you can always visit the man pages. The man pages also mention the
header files that need to be included in order to use the syscalls, as well as the return values, parameters
and other details.

D Helpful Tips for Understanding pipe(), dup(), fork(), and exec() Operations

D.1 fork()

fork is like making a photocopy of your running program. When you call fork(), it creates an exact
duplicate of the current process, called a child process. The child process is identical to the parent, except
for a few key details:

• The child gets a new process ID (PID).

• fork() returns 0 in the child process and the child's PID in the parent process.

• The child process gets a copy of all variables from the parent process.


• Although the child starts as an exact copy, its memory space is separate from the parent’s. Changes
in one process do not affect the other after fork().
• The child and parent share file descriptors, meaning they can affect each other in terms of file
operations, but not through variables.

D.2 execvp

After forking, you often want the child process to run a different program. That’s where execvp comes in.
It replaces the current process image with a new process image specified by the file name you provide. The
’v’ in execvp means it takes an array of strings as arguments, and the ’p’ means it will search the PATH
environment variable to find the executable. The array of strings MUST be NULL terminated, i.e., the last
element of the array of strings must be a NULL pointer.
When execvp() executes successfully, it doesn’t return because the current process image is replaced by the
new program specified in the function. As a result, any code following the execvp() call won’t be executed
after a successful run.
However, if execvp() encounters an error, it returns -1, allowing for error handling in your program.
Example usage:
char *args[] = {"ls", "-l", NULL};
execvp(args[0], args);

This would replace the current process with the ls command, passing -l as an argument.
You typically use fork() to create a new process before calling execvp() because if you call execvp() directly
in your current process, the process will be entirely replaced by the new program, and any remaining code in
the original process will not run. Using fork() first allows the original process to continue running alongside
the new one. This method is known as the "fork-exec" model.

D.3 dup & dup2

To understand how the dup() call works, let's first understand what a file descriptor is. A file descriptor is
a unique integer that identifies an open file or other I/O resource, such as a pipe or network socket, within
the operating system. It allows the system to manage and perform operations on these resources.
When you open a file, create a pipe, or establish a network connection, the operating system assigns a file
descriptor to that resource.
Each process in the operating system has a table that tracks its open files and resources. File descriptors
act as indices in this table. By default, every process starts with three standard file descriptors:
0: Standard input (stdin)
1: Standard output (stdout)
2: Standard error (stderr)

These file descriptors can be used to interact with the standard input, output, and error streams.
In your shell assignment, you will implement features like pipes and I/O redirection. To do this, you will
need to duplicate file descriptors, which is where dup and dup2 come into play.
dup(): The dup function duplicates an existing file descriptor and returns a new file descriptor that refers
to the same open file. The new file descriptor will be the lowest-numbered unused file descriptor.
/*
Syntax: int dup(int oldfd);

oldfd  : This is the file descriptor you want to duplicate.
Returns: A new file descriptor on success, or -1 on error.
*/

// Example usage
int newfd = dup(1); // Duplicate stdout (file descriptor 1)

In this example, newfd is a new file descriptor that now points to the same output stream as stdout. If you
write to newfd, it will display on the terminal just like stdout.
dup2(): The dup2 function is similar to dup, but it allows you to specify the new file descriptor number. If
the specified file descriptor is already open, dup2 closes it before duplicating oldfd into it.
/*
Syntax: int dup2(int oldfd, int newfd);

oldfd  : The file descriptor you want to duplicate.
newfd  : The file descriptor you want to overwrite (or reuse).
Returns: newfd on success, or -1 on error.
*/

dup2(fd, 1); // Redirect stdout to the file descriptor `fd`

In this example, dup2 duplicates fd and overwrites stdout (file descriptor 1). Now, anything written to
stdout will be sent to the file or pipe associated with fd.

D.4 pipe()

The pipe() system call creates a pair of file descriptors: one for reading and one for writing.

• Write End: The file descriptor at the write end of the pipe allows a process to send data into the
pipe. When a process writes to this end, the data is temporarily stored in the pipe until it is read
by another process.
• Read End: The file descriptor at the read end of the pipe allows a process to receive data that has
been written into the pipe. The process at this end reads the data, consuming it from the pipe.

These file descriptors can be used to pass data uni-directionally from one process to another. When we say
a pipe is unidirectional, we mean that data flows in only one direction through the pipe: from the write end
to the read end. This means that one process writes data into the pipe, and another process reads that data
from the pipe. The process that writes cannot read from the pipe, and the process that reads
cannot write to it.
int pipefd[2];
pipe(pipefd);

/*
pipefd[2]: This is an array of two integers. After calling pipe(), the array
will hold two file descriptors:

pipefd[0]: The read end of the pipe. This is where the data is read from the
           pipe. A process reading from this end typically uses the output
           generated and written to the write end of the pipe by another process.
pipefd[1]: The write end of the pipe. This is where the data is written into
           the pipe. A process writing to this end is typically generating
           output that another process will use.

Returns:
 0 on success.
-1 on failure.
*/

For a command like ls | grep "txt", you can imagine the flow to be like

Process A (ls) --> Write End (pipefd[1]) --> [PIPE] --> Read End (pipefd[0]) --> Process B (grep "txt")


D.5 Bringing it all together

To summarize and illustrate the entire process, here is a comprehensive example demonstrating how a
program manipulates file descriptors:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

int main() {
    int pipefd[2];
    int saved_stdin;
    pid_t pid;

    // Save the original stdin
    saved_stdin = dup(STDIN_FILENO);

    // Create the pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    // Fork a child process
    pid = fork();
    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        // Child process: Execute "cat file.txt"

        // Redirect stdout to the write end of the pipe
        dup2(pipefd[1], STDOUT_FILENO);

        // It's important to close unused file descriptors to prevent the
        // unnecessary consumption of system resources and avoid potential
        // issues with resource leakage.
        close(pipefd[0]); // Close the unused read end of the pipe
        close(pipefd[1]); // Close the original write end of the pipe

        char *args[] = {"cat", "file.txt", NULL};
        execvp(args[0], args);
        perror("execvp");
        exit(EXIT_FAILURE);
    } else {
        // Parent process: Handle "cd"

        // Redirect stdin to the read end of the pipe
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[1]); // Close the unused write end of the pipe
        close(pipefd[0]); // Close the original read end of the pipe

        // Here, instead of executing "cd" as a child process,
        // we simulate a built-in "cd" command within the parent shell.
        char buffer[256];
        while (fgets(buffer, sizeof(buffer), stdin) != NULL) {
            buffer[strcspn(buffer, "\n")] = '\0'; // Remove newline character
            if (chdir(buffer) == -1) {
                perror("chdir");
            }
        }

        // Restore the original stdin
        dup2(saved_stdin, STDIN_FILENO);
        close(saved_stdin); // Close the duplicated file descriptor
    }

    return 0;
}

E GNU Readline

GNU Readline is a library that provides line-editing and history capabilities for interactive programs with a
command-line interface, such as the Unix shell and the programming languages Python, Ruby and Haskell.
It is free software, distributed under the terms of the GNU GPL, version 3 or later.
As far as I know, Bash and many other shell programs use Readline to provide the standard
command line interface. The Readline library includes additional functions to maintain a list of
previously-entered command lines, to recall and perhaps re-edit those lines, and to perform csh-like
history expansion on previous commands. It also provides UP/DOWN scrolling through history by
default. The significance of this library will become apparent the instant you use scanf (a highly
discouraged way to read input from stdin) and find that you can't edit the entered input in any way.
You are free to implement these features on your own if you want. That itself can be a very interesting
learning opportunity, but you are also allowed to use GNU readline. The following C program demonstrates
the usage of GNU readline. The provided makefile has been configured to link the program with the
readline library. Readline is a very extensive library, full of useful (and useless) features. Make sure to
check out the official docs.
#include <stdlib.h>
#include <stdio.h>
#include <readline/readline.h>

int main()
{
    // Configure readline to auto-complete paths when the tab key is hit.
    rl_bind_key('\t', rl_complete);

    while (1) {
        // Display prompt and read input
        char *input = readline("prompt> ");

        // Check for EOF.
        if (!input)
            break;

        // Do stuff ...

        // Free buffer that was allocated by readline
        free(input);
    }
    return 0;
}


F Debugging
"Debugging is like being the crime detective in a crime movie where you’re also the murderer."

Debugging is the process of finding and resolving bugs (defects or problems that prevent correct operation)
within computer programs, software, or systems. Since this will be the first time writing a non-trivial C
(read C, not C++) program for many of you, you will make a lot of mistakes. And that's okay.
That's how you learn. Often you will find yourself staring at a Segmentation fault message that shows
up out of nowhere. Or maybe your program will crash without any error message. Or maybe it will
just hang and not do anything. Or maybe it will simply not do what you want it to do. These are all
very common problems that you will face while writing your shell. And the only way to solve them is
to debug your program.

F.1 Print-and-hunt

One of the most common (and probably the most primitive) methods of debugging is print-and-hunt. This
is probably the method you are most used to. A proper way to do printf debugging is to use logs instead.
With logs, you can separate debug statements from the actual error/non-error print statements. In a debug
build the debug statements are printed, and in a release build they are excluded, simply by changing the
value of a control variable instead of removing all the print statements manually. For this purpose,
we have provided two macros, LOG_DEBUG and LOG_ERROR, which are wrappers around printf and can be
used just like printf. These macros are defined in include/log.h. You can also use assert
statements to check for certain conditions; if a condition is false, the program will terminate and print
the line number and file name where the assertion failed (only in debug mode).

F.2 GDB

GDB is an essential tool that allows you to interactively debug a program while it is running or after it
crashes. It gives you insight into what is happening inside your program, helping you pinpoint bugs and
incorrect behavior.

• Pause execution at breakpoints or specified conditions to inspect the state of the program.
• Examine the state of the program at any point, including variable values, function call stacks, and
memory contents.
• Modify program state on the fly to test solutions for potential bugs without recompiling the program.

F.2.1 Steps for Using GDB


1. Compile with Debug Symbols: Ensure that your program is compiled with debug symbols to
make debugging easier. This can be done by setting the BUILD_DEFAULT to debug in the Makefile,
then running:
1 $ make clean
2 $ make

2. Start GDB: Use the make gdb command to start GDB with your program:
1 $ make gdb

3. Setting Breakpoints: Breakpoints allow you to pause execution at a specific point in your program.
Set a breakpoint at the start of the main function using:
1 (gdb) break main


4. Running the Program: Once inside GDB, run your program with:
1 (gdb) run

You can find a detailed guide on how to use GDB here (including stepping line-by-line through code, printing
variable values, and much more).

F.3 Valgrind

Valgrind is a programming tool for memory debugging, memory leak detection, and profiling. While Valgrind
comes with many tools, we are primarily concerned with memcheck, the default tool.
Valgrind can be used to detect memory leaks and identify their sources within a program. It is also
particularly useful for finding the source of segmentation faults. If your program encounters a segfault, you
can rerun it with Valgrind to identify exactly where the segfault occurred in your code, provided the same
initial conditions are met.
Running Valgrind: To run Valgrind on your program, use the following command:
1 $ make valgrind

Valgrind will output something like this when a segmentation fault occurs.
==113== Invalid write of size 4
==113==    at 0x10B674: main (main.c:168)
==113==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==113==
==113==
==113== Process terminating with default action of signal 11 (SIGSEGV)
The code I used to generate this segfault was:

int *ptr = NULL;
*ptr = 5;

Disclaimer
You must build the project in debug mode in order to view the line number where segfaults occur.
