Overview of OS interface

We are going to build something that goes part of the way to Web 2.0.

You will be building an OS layer: a kind of file system running on top of a UNIX operating system. You will write your code in C++ using threads. The overall project will act like a Web server, with a few extensions. Web pages are referenced by a URL. A typical URL is http://www.ndsu.nodak.edu/instruct/juell/cs475s00/home.html. We will refer to the parts of a URL as <access-method>, <ip-name>, <path> and <file>. Here "http" is the <access-method>, www.ndsu.nodak.edu is the <ip-name>, /instruct/juell/cs475s00 is the <path> and home.html is the <file>. That is, a URL has the form <access-method>://<ip-name><path>/<file>. In the normal operation of an http <access-method>, the <ip-name> specifies a computer's name and the <path> specifies the directory structure from a known point to the file. The normal known point is ~/public_html.
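The URL form above can be split mechanically. The following is a minimal sketch, not a required design: the names UrlParts and split_url are hypothetical, and it assumes that everything between the host name and the last slash counts as the <path>.

```cpp
#include <string>

// Hypothetical holder for the four URL parts named in the text.
struct UrlParts {
    std::string access_method;  // e.g. "http", "httpr", "httpw", "httpe"
    std::string ip_name;        // e.g. "www.ndsu.nodak.edu"
    std::string path;           // text between the host and the last slash
    std::string file;           // text after the last slash; empty for a directory URL
};

// Split <access-method>://<ip-name><path>/<file> into its parts.
// Returns false (and leaves `out` untouched) when "://" or the slash that
// ends the host name is missing.
bool split_url(const std::string& url, UrlParts& out) {
    const std::string sep = "://";
    const std::string::size_type scheme_end = url.find(sep);
    if (scheme_end == std::string::npos || scheme_end == 0) return false;

    const std::string::size_type host_begin = scheme_end + sep.size();
    const std::string::size_type path_begin = url.find('/', host_begin);
    if (path_begin == std::string::npos || path_begin == host_begin) return false;

    // The last slash is at or after path_begin, so the subtraction is safe.
    const std::string::size_type last_slash = url.rfind('/');

    out.access_method = url.substr(0, scheme_end);
    out.ip_name = url.substr(host_begin, path_begin - host_begin);
    out.path = url.substr(path_begin, last_slash - path_begin);
    out.file = url.substr(last_slash + 1);
    return true;
}
```

For the sample URL this yields "http", "www.ndsu.nodak.edu", "/instruct/juell/cs475s00" and "home.html". A URL that ends in a slash comes back with an empty file part.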

We will provide our own <access-method>s. Two of these are httpr and httpw. To the user, httpr will appear identical to http; the name change just highlights that we do the handling rather than the standard Web server. httpw is new in that it allows writing a page. URLs for both httpr and httpw will look identical to normal http URLs. However, the operation and directory structure supporting them will be very different from traditional systems.

The Broker

Our system will have a central program called a broker. The broker will receive the external requests (Web page reads and writes). The broker will then communicate with a distributed collection of processes and machines. The httpr requests are read only. When the broker receives a URL request, it will broadcast the <path><file> to the distributed processes. If one or more processes know the <path><file>, they will respond to the broker. The responding processes and the broker will then hold an auction. The winner will respond to the requesting external entity.
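The text does not say what the processes bid or how the winner is decided, so the following is only a sketch under invented assumptions: each responding process sends a numeric cost (for example its current load), the lowest cost wins, and ties go to the lowest process id. The names Bid and pick_winner are hypothetical.

```cpp
#include <vector>

// Hypothetical bid sent by a process that knows the requested <path><file>.
struct Bid {
    int process_id;  // assumed to be non-negative
    double cost;     // assumed meaning: lower is better (e.g. current load)
};

// Returns the process id of the winning bid, or -1 when nobody responded.
// Assumed rule: lowest cost wins; equal costs go to the lowest process id.
int pick_winner(const std::vector<Bid>& bids) {
    int winner = -1;
    double best_cost = 0.0;
    for (const Bid& b : bids) {
        const bool first = (winner == -1);
        const bool cheaper = !first && b.cost < best_cost;
        const bool tie_break = !first && b.cost == best_cost && b.process_id < winner;
        if (first || cheaper || tie_break) {
            winner = b.process_id;
            best_cost = b.cost;
        }
    }
    return winner;
}
```

A result of -1 corresponds to the case where no process knows the name, which the broker would report as "file not found" for a read.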

The URL/commands

The httpw requests are writes. They write a file. If the file does not exist, it is created. The broker will proceed as for an httpr URL. If a file is found, the winner will read the file from the external entity, write it, and replace the previous version of the file. If no process knows the <path><file>, a process is chosen and it builds an entry for the file. Each "directory" process has a space in which it writes. It will write the file and its descriptors in this space. You will need to use some standard UNIX file naming for this, but that will not be seen by the user accessing through the URL mechanism. The path does not create a file path, but is used as part of a descriptor in an auxiliary table. There is no real directory structure. All of the files for a single directory process are in the same directory and at the same naming level. This is basically a content-addressable system. Rather than a single process and a hierarchical directory structure, we will use multiple processes and an almost flat directory structure. The processes will be responsible for "names". These names will represent what would be a path and file name in a normal system.
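One way to picture the auxiliary table is as a map from the URL name to a flat UNIX file name. This is a minimal sketch with hypothetical names (NameTable, find_or_create) and an invented naming scheme ("f0", "f1", ...); it keeps the table in memory only and is not thread safe as written.

```cpp
#include <map>
#include <string>

// Hypothetical auxiliary table for one "directory" process.
// Key: the name seen through the URL, i.e. <path>/<file>.
// Value: a flat on-disk name; all files sit at the same naming level.
class NameTable {
public:
    // True if this process already holds an entry for the name.
    bool knows(const std::string& name) const {
        return table_.count(name) != 0;
    }

    // Returns the flat file name for `name`, building a new entry if none exists.
    std::string find_or_create(const std::string& name) {
        const std::map<std::string, std::string>::const_iterator it = table_.find(name);
        if (it != table_.end()) return it->second;
        const std::string flat = "f" + std::to_string(next_id_++);
        table_[name] = flat;
        return flat;
    }

private:
    std::map<std::string, std::string> table_;
    unsigned long next_id_ = 0;  // invented scheme: f0, f1, f2, ...
};
```

Two names that differ only in their path, such as /a/b/x.html and /a/c/x.html, get different flat names, so no UNIX directories are needed to keep them apart.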

There is one other special type of URL. This is httpe, or execute. These URLs are used to point to and execute programs. A program execution has two special files, [standard-input] and [standard-output]. A program will read from [standard-input], do whatever processing it does and produce a file of [standard-output]. Where these files are obtained from and where they are delivered is specified by keywords naming URLs within the system's file space, or standard input and output. The file executed is to be in the system file space, but is executed as a UNIX shell file. The execution reads from a file or standard input and passes this to the UNIX shell as standard input. The output from the UNIX script is then returned and either displayed on standard output (the screen) or written to the specified output file.

The initial system

We will simulate the broker reading from ports by having the broker read from standard input. The input file will be structured as:

<file>    ::= <request>*
<request> ::= <httpr-URL><EOF-marker>
<request> ::= <httpw-URL><line>* <EOF-marker>
<request> ::= <httpe-URL>
			[in=<httpr-URL>] [out=<httpw-URL>]
			<line>* <EOF-marker>
<EOF-marker> ::= $			// a line of exactly one character
<line>    ::= <char>*			// any text except a <EOF-marker>
<request> ::= batch <batch-number> <Unix-file-name>
<batch-number> ::= <digit>
<request> ::= showbatch <batch-number>
<request> ::= setflag [batchtrace=(t,f)] [trace=(t,f)] [showservers=(t,f)] ...

An <httpr-URL> will return the file or "file not found". An <httpw-URL> will find or create a file. The lines following the URL will be the contents of the new file. It is important that the file write operation be atomic. That is, either all or none of the file is written. You should never read part of a file or parts of multiple versions of the file. This means that you will need to lock all access to a file during a write. You must also be very careful that you do not keep pointers to files that may no longer exist.
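
One common way to get all-or-nothing writes on a POSIX system is to write the new version to a temporary file and then rename it over the old one, holding a lock while doing so. The sketch below is one possible approach, not the required one. The names atomic_write, read_whole and g_file_mutex are hypothetical, the ".tmp" suffix is invented, and a single process-wide mutex stands in for per-file locking.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <mutex>
#include <string>

// Assumption: one coarse lock for every file in this process. A real design
// would probably keep one lock per name.
std::mutex g_file_mutex;

// Replaces `target` with `contents` as a single step. The new version is
// written to a temporary file first; rename() then swaps it in, so the
// target always holds either the old version or the new one in full.
bool atomic_write(const std::string& target, const std::string& contents) {
    std::lock_guard<std::mutex> guard(g_file_mutex);
    const std::string tmp = target + ".tmp";  // invented temporary name

    std::ofstream out(tmp.c_str(), std::ios::binary | std::ios::trunc);
    out << contents;
    out.close();
    if (out.fail()) {  // open, write or close failed
        std::remove(tmp.c_str());
        return false;
    }
    if (std::rename(tmp.c_str(), target.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}

// Reads the whole file under the same lock, so a reader in this process
// never overlaps a write.
bool read_whole(const std::string& target, std::string& contents) {
    std::lock_guard<std::mutex> guard(g_file_mutex);
    std::ifstream in(target.c_str(), std::ios::binary);
    if (!in) return false;
    contents.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    return true;
}
```

The lock orders readers and writers inside one process; the rename step is what protects against a half-written file if the writer fails partway through.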

There are two versions of URL you need to handle, one pointing to a file and one pointing to a "directory". We will interpret a URL ending in a slash as pointing to a directory and one not ending in a slash as pointing to a file. If an httpr request is made for a directory, the names of all of the files with a matching path will be returned, one per line.
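
Because directories are only prefixes of names, a directory read is a prefix search over the names a process holds. The following sketch uses hypothetical function names and makes one assumption the text leaves open: a file "matches" only when its path equals the directory exactly, so files in deeper "subdirectories" are not listed.

```cpp
#include <set>
#include <string>
#include <vector>

// A URL name ending in a slash points at a "directory".
bool is_directory_name(const std::string& name) {
    return !name.empty() && name.back() == '/';
}

// Lists the file parts of every stored name whose path is exactly `dir`.
// `dir` must end in a slash, e.g. "/a/b/". Assumption: names nested more
// deeply (e.g. "/a/b/c/z.html") are skipped.
std::vector<std::string> list_directory(const std::set<std::string>& names,
                                        const std::string& dir) {
    std::vector<std::string> files;
    // std::set is sorted, so all names starting with `dir` are adjacent,
    // beginning at lower_bound(dir).
    for (std::set<std::string>::const_iterator it = names.lower_bound(dir);
         it != names.end(); ++it) {
        if (it->compare(0, dir.size(), dir) != 0) break;  // past the prefix
        const std::string rest = it->substr(dir.size());
        if (!rest.empty() && rest.find('/') == std::string::npos) {
            files.push_back(rest);
        }
    }
    return files;
}
```

The broker would then send the returned names to the requester one per line.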

URL read and write
the broker task
the distributed process structure
per directory read/write process advertise process
debugging
  unit test on object
  group test on group of objects
  test harness
  test generator
the distributed version
  all systems interacting


Detailed Semantics

access method   URL ending   action
httpr           letters      Read the named file.
httpr           /            List the files in the named "directory".
httpw           letters      Write the input to the named file.
httpw           /            Not valid.
httpe           letters      Execute the program specified by the httpe. Read the standard input or "in" file and pass it to the UNIX shell script as standard input. The broker is to pass a job number. Hash the job number and pass it back to the broker. If the broker does not confirm it has authorized that job on this server, cancel all jobs on the server. Execute the file specified after the httpe. The shell output is to be returned either to standard output (the screen) after the job runs or, if out is specified, written to the specified system file.
httpw           letters      Create a unique file name starting with the path/name specified. Return the name to the broker like an httpr request. (Unique to the system, not just the DS.) If the file has a keyword notice, write the full URL of the new file at the URL specified by notice.

Some Design Choices

If the access-method names are a problem, simply use http and add accessmethod=[r,w,e] at the end. This will allow your requests to be actual Web requests. The execution option, httpe, can be handled with the exec (or execl) call. If you pass unchecked input to exec, it is very important to have some sort of security to keep others from passing in commands to execute. The alternative is to have a list of commands you support, but then you have to write a command interpreter.

Questions:

Should directories "exist" or only be derived implicitly? How could we specify a unique name, and pass it to another program? Is there an easy and clear way to have temp files for execution of programs?
This page updated 1/9/04 PLJ