The Design and Implementation of A Media On Demand System For WWW
Abstract
Current WWW clients do not directly support synchronized playback of continuous media. It is
however possible to support continuous media by using external programs that are controlled
by the WWW client. This paper describes the design and implementation of a Media on
Demand server which uses WWW as its user interface. Offered media include audio and
video recordings of transmissions on the Internet Multicast backbone (MBONE), as well as
arbitrary prerecorded audio files.
1.0 Introduction
A hypertext link in WWW may refer to an arbitrary object. Such objects might be text files,
images, sound files or anything else. Most commonly, hypertext links point to documents
written in HTML, the Hypertext Markup Language [HTML], which may contain other
hypertext links. An HTML document is a representation of textual information. Apart from
specifying hypertext links, the principal use of HTML is to format text and display characters
using different fonts. The multimedia capabilities of HTML are quite limited. It is possible to
include graphics in the flow of text, such as GIF images. This is called inlining. It is, however,
not possible to inline arbitrary multimedia objects, or even other textual documents.
An extension could be made to the inlining mechanism to make it possible to include audio
files and MPEG movies in the flow of text. The playback of the movie or the audio file would
begin when the text surrounding the inlined objects is displayed.
If an inline operation specifies multiple objects, or perhaps another HTML document, that in
its turn inlines multiple objects, those objects should be played in parallel. This would make it
possible to present a slideshow at the same time as an audio file is being played.
It is still possible to use WWW to play parallel continuous media streams, such as audio and
video. But without some sort of extensions to HTML, such as those described above, the actual
playback will have to be done by programs external to the WWW client.
This paper describes the architecture of one such system that uses audio and video tools that
are being used for experimentation on the MBONE, the Internet Multicast Backbone.
The routing of multicast packets on the MBONE is implemented so that multicast packets are
only duplicated when the paths to group members branch at a router. When the transmission
media of a link supports multicast, such as Ethernet, the IP multicast datagrams are transmitted
using link level multicast.
The efficient routing of multicast packets on the MBONE has made it tractable to use it for
experiments in multimedia conferencing. Such experiments have been conducted for about
three years [Casner] using various tools for transmitting audio, video and whiteboard data.
Conferences or seminars are regularly being sent on the MBONE from Europe, USA and
sometimes also Australia. Although remote conferencing makes it possible to participate in a
conference from the convenience of one's own office, the time differences between Europe and
the USA often make remote attendance impractical.
A reasonable solution to the problem would be to record the transmissions on a disk and play
them back at a later date.
We implemented a series of tools to record and replay captured conferencing traffic and
eventually realized that these recording tools, together with the normal conferencing software,
were good candidates for use in a WWW based video on demand server. Since the
conferencing tools are not limited to video, it is more correct to talk about a media on demand
server, as opposed to a server limited to video only.
There are two popular programs for transmitting video on the MBONE, ivs by Thierry
Turletti at INRIA, and nv by Ron Frederick at Xerox PARC. Normally, these programs are
used with a separate program for transmitting audio, vat by Van Jacobson and Steve
McCanne at LBL. A typical video transmission using nv requires approximately 128 kbit/s
and the audio adds an extra 64 kbit/s. This is roughly an order of magnitude less than what is
used by MPEG-1. The image is only a quarter of the size of the original video frame and the
frame rate is usually less than 5 frames per second. This quality is not acceptable for
transmitting feature length movies. But this is often sufficient for the purpose of transmitting
images from a conference. As long as the sound is adequate, it might not be important to have
full motion video of the speaker.
If the camera is pointed at the overhead display at the conference site, the low frame rate
should be even less of a problem. Contrast and resolution may still be a problem, however, so
it is recommended to use a distributed whiteboard program (several such programs are
available) to disseminate the slides.
In an effort to avoid large packetization delays, which might be irritating during two-way
communication, the vat and nv programs send fairly small packets. This also helps to
reduce the perceptible impact of lost packets. The drawback is, of course, that a higher packet
rate is imposed on the network. Nv generates about 40 packets per second, and vat
generates 25 packets per second if transmitting 40 ms worth of audio in each packet. Using the
same figures as above, a Sparcstation based router will receive one packet roughly every 15 ms
(65 packets per second) for each transmission. If it makes 4 copies of each packet, it will be
able to sustain 3-4 concurrent transmissions.
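The packet figures above can be checked with a short calculation. The sketch below uses only the rates quoted in the text; the function names are illustrative, and it makes no assumption about the forwarding capacity of the router, which the text does not state.

```python
# Figures quoted in the text; everything else is derived from them.
NV_PACKETS_PER_SECOND = 40     # video packets generated by nv
VAT_PACKET_DURATION_MS = 40    # milliseconds of audio carried in each vat packet


def vat_packets_per_second(packet_duration_ms: float) -> float:
    """Audio packet rate when each packet carries packet_duration_ms of audio."""
    return 1000 / packet_duration_ms


def stream_packet_rate() -> float:
    """Combined packet rate of one audio plus video transmission."""
    return NV_PACKETS_PER_SECOND + vat_packets_per_second(VAT_PACKET_DURATION_MS)


def mean_interarrival_ms(packets_per_second: float) -> float:
    """Average time between packets at the given rate."""
    return 1000 / packets_per_second


def forwarded_packets_per_second(streams: int, copies: int) -> float:
    """Outgoing packet rate when every packet of every stream is copied `copies` times."""
    return streams * copies * stream_packet_rate()


if __name__ == "__main__":
    rate = stream_packet_rate()
    print(f"packets per second per transmission: {rate:.0f}")
    print(f"mean inter-arrival time: {mean_interarrival_ms(rate):.1f} ms")
    for streams in (3, 4):
        out = forwarded_packets_per_second(streams, copies=4)
        print(f"{streams} transmissions, 4 copies each: {out:.0f} packets/s out")
```

With 4 copies per packet, 3 to 4 concurrent transmissions correspond to roughly 780 to 1040 outgoing packets per second.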
There are two distinct methods of accessing the recordings. The one that was implemented
first uses the fill out form facility of HTML+ [HTML+], a proposed extension to HTML, that
was first implemented in NCSA Mosaic 2.0.
Figure 1.
The server script then proceeds to generate a custom HTML document to control the
recording. This document includes references to how to start the vat and/or nv programs with
the right parameters to be able to receive the transmission. The document also includes a link
to a script that returns a document with the MIME type application/x-csh. This makes
it possible for clients to have the multimedia programs started automatically, if their WWW
client is configured to recognize such documents.
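As an illustration of the mechanism described above, the following is a minimal sketch of a server script that returns a csh script with the MIME type application/x-csh. The function name is hypothetical, and the command lines in the example are placeholders: the text does not list the parameters passed to vat and nv, so none are assumed here.

```python
def build_csh_response(command_lines):
    """Return a CGI-style response: a header block, a blank line, then a csh script.

    command_lines is a list of shell command strings supplied by the caller.
    The real commands would start vat and/or nv; their arguments are not
    given in the text and are deliberately not guessed here.
    """
    header = "Content-Type: application/x-csh\n\n"
    script = "#!/bin/csh -f\n" + "".join(line + "\n" for line in command_lines)
    return header + script


if __name__ == "__main__":
    # Placeholder commands standing in for the actual tool invocations.
    print(build_csh_response(["echo start audio tool", "echo start video tool"]), end="")
```

A WWW client that maps application/x-csh to a shell would execute the body on receipt, which is why the client must be explicitly configured to recognize such documents.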
Figure 2.
At the bottom of the control page there is a set of inlined bitmaps representing the controls of a
tape deck, i.e. start, stop, pause, forward and rewind buttons. See Figure 2.
Pressing the play button will execute a script at the server that starts the vat_play (for audio
only) or av_play (for audio and video) programs if they are not already running. The script
will execute an rsh command to the host that has the disk with the recordings, rather than
executing playback programs directly on the WWW server host.
When the other buttons are pressed, or the play button is pressed after the playback has been
started, the script will execute a remote control program. This program, playremote, will
identify the correct instances of the playback programs and pass the remote control command
on to them.
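The button handling described above can be sketched as a small dispatch function. This is an assumption-laden illustration: the program names vat_play, av_play and playremote come from the text, but their command-line arguments are not given, so the playremote argument shown here is hypothetical and the playback programs are started without arguments.

```python
BUTTONS = {"play", "stop", "pause", "forward", "rewind"}


def choose_command(button, audio_only, player_running, disk_host):
    """Return the argument vector the server script would execute for a button press.

    Pressing play when no player is running starts the playback program on the
    host holding the recordings via rsh. Any other case forwards the button to
    the remote control program.
    """
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    if button == "play" and not player_running:
        player = "vat_play" if audio_only else "av_play"
        # Real invocations would carry further arguments not given in the text.
        return ["rsh", disk_host, player]
    # Hypothetical calling convention: the button name as the only argument.
    return ["playremote", button]


if __name__ == "__main__":
    print(choose_command("play", audio_only=True, player_running=False, disk_host="diskhost"))
    print(choose_command("pause", audio_only=True, player_running=True, disk_host="diskhost"))
```

The resulting list could be handed to `subprocess.run` by the calling script; it is returned rather than executed here so that the decision logic can be inspected on its own.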
The server also has a mechanism for logging requests and it keeps track of how many
recordings are currently being played. In our current configuration it will only accept 5
concurrent retransmissions in an attempt to limit the load on our local area network.
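A limit of this kind can be sketched with a simple counter. The class and method names below are hypothetical; only the limit of 5 comes from the text, and the sketch does not show how the actual server stores its count between requests.

```python
class PlaybackLimiter:
    """Track running retransmissions and refuse new ones above a fixed limit."""

    def __init__(self, max_concurrent=5):
        self.max_concurrent = max_concurrent
        self.active = 0

    def try_start(self):
        """Reserve a slot; return False if the limit has been reached."""
        if self.active >= self.max_concurrent:
            return False
        self.active += 1
        return True

    def finish(self):
        """Release a slot when a retransmission ends."""
        if self.active > 0:
            self.active -= 1


if __name__ == "__main__":
    limiter = PlaybackLimiter()
    results = [limiter.try_start() for _ in range(6)]
    print(results)  # the sixth request is refused
```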
This method has an advantage that was not originally available with form based access,
namely that new recordings can be added just by modifying an HTML document. The server
script need no longer be modified.
It is still possible that the user may want to use other settings than the default values. The
control page provides the user with a link to a fill out form which makes it possible to change
the chosen parameters. This form is dynamically generated, and it does not allow the user to
choose between different recordings. In the example above, the form will look like the original
fill out form, but the only recording offered will be John's speech.
The recording is offered not as a scrollable list, but as a radio button. The reason for this is that
radio buttons take a VALUE attribute, which makes it possible to supply the directory path
(conf1/talk1) as a value instead of the descriptive string. When the form is submitted, the
server script will need to get the directory path from somewhere. The only place it can get this
information from is from the values provided by the form.
It might look somewhat awkward to have a radio button with only one selection, but it used to
be that all values supplied by a form had to be visible to the user. A possible workaround
would be to supply the directory path information as an extension to the pathname that
specifies the script to be executed when the form is submitted. But this should not be necessary
as NCSA Mosaic 2.2 now supports invisible INPUT key/value pairs if they are marked as
having the type HIDDEN.
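The hidden field approach can be illustrated with a sketch of how a server script might generate such a form. The field name `recording`, the script URL and the function name are assumptions made for the example; only the directory path conf1/talk1 and the HIDDEN input type come from the text.

```python
from html import escape


def render_settings_form(action_url, recording_path, description):
    """Return an HTML form that carries the directory path in a hidden INPUT."""
    return (
        f'<FORM METHOD="POST" ACTION="{escape(action_url, quote=True)}">\n'
        # The path travels with the form but is not shown to the user.
        f'<INPUT TYPE="hidden" NAME="recording" '
        f'VALUE="{escape(recording_path, quote=True)}">\n'
        # The descriptive string is displayed as ordinary text.
        f'{escape(description, quote=False)}\n'
        '<INPUT TYPE="submit" VALUE="Submit">\n'
        '</FORM>\n'
    )


if __name__ == "__main__":
    print(render_settings_form("/cgi-bin/settings", "conf1/talk1", "John's speech"), end="")
```

When the form is submitted, the script receives the directory path as the value of the hidden field, so the single-choice radio button is no longer needed.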
for the user to press the forward button several times. This is especially true if the user wants
to skip forward significantly in the recording. But Mosaic does not allow one to queue multiple
requests, so pressing buttons while the server is in the midst of serving a request has no effect.
It is not certain that adding the ability to recognize multiple requests or the ability to submit
aggregate requests to HTTP is a good idea. One must draw the limit somewhere, otherwise
there is a risk that the clients and/or HTTP become overly complicated. An alternative
approach would be to design a different user interface that makes better use of available
functionality.
Figure 3.
An example of such a user interface is shown in Figure 3. There are two horizontal arrows
with long shafts pointing in opposite directions. When the user points and clicks at an arrow,
the amount of time to skip forward or rewind would be calculated from the offset of the
pointer to the base of the shaft. This can be done by using the image mapping techniques that
are already built into NCSA Mosaic and many HTTP servers.
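A minimal sketch of the offset calculation follows. Image map clicks are delivered to the server as a query string of the form `x,y`; the linear mapping from pixel offset to skip time, the pixel geometry and the maximum skip used in the example are assumptions, since the text does not specify them.

```python
def parse_click(query):
    """Parse an image map query string of the form 'x,y' into integer coordinates."""
    x_text, y_text = query.split(",")
    return int(x_text), int(y_text)


def skip_seconds(click_x, base_x, shaft_length_px, max_skip_seconds):
    """Map the distance from the base of the arrow shaft to a skip duration.

    A click at the base skips nothing; a click at the tip skips
    max_skip_seconds. The mapping in between is assumed to be linear.
    """
    offset = min(abs(click_x - base_x), shaft_length_px)
    return max_skip_seconds * offset / shaft_length_px


if __name__ == "__main__":
    x, _y = parse_click("120,14")
    # Hypothetical geometry: shaft starts at x=20, is 200 pixels long,
    # and a click at the tip skips 600 seconds.
    print(skip_seconds(x, base_x=20, shaft_length_px=200, max_skip_seconds=600))
```

For the rewind arrow, which points in the opposite direction, the same function applies with the base at the right-hand end of the shaft.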
The only way to end a transmission is for the user who started it to use the buttons in the
WWW control page.
The problem of malicious or accidental transmissions of recordings that nobody wants to
receive is further complicated in the multicast case. Several solutions have been suggested,
all of which make use of special ID messages sent by the conferencing programs. Vat sends these
ID messages to the multicast group, while nv sends its ID messages using unicast IP to the
sender of the video images. The idea is that the server should terminate the transmission if it
detects the absence of these messages.
One possible solution is for the server to listen for ID messages from any group member. It
aborts the transmission if it does not receive any ID messages after a given period of time.
Unfortunately, usage patterns on the MBONE seem to indicate that users have a tendency to
tune in on multicast groups without actually being interested in the traffic. Often users join a
multicast group, and then leave their offices while still remaining members of the group. Thus,
this scheme would lead the server to incorrectly believe that there still are users who are
interested in receiving the transmission, when there in fact might not be.
Another possible solution is for the server to listen for only those ID messages that originate
from the user that originally requested the transmission. This method has a problem opposite
to the scheme described earlier. If there are other users who are in fact interested in receiving
the transmissions, these users will be cut off if the first user leaves the group. This is particularly
prone to happen when retransmission is requested from an office workstation, but the
recording is to be viewed at a different location such as a conference room.
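The two schemes differ only in which ID messages are allowed to keep the transmission alive, as the following sketch shows. The class name, the timeout value and the host names are hypothetical, and receiving and decoding the ID messages themselves is outside the scope of the sketch.

```python
class IdWatchdog:
    """Decide when to abort a transmission based on received ID messages.

    With requester=None, an ID message from any group member resets the
    timer (first scheme). With a requester set, only messages from that
    user count (second scheme).
    """

    def __init__(self, timeout_s, start, requester=None):
        self.timeout_s = timeout_s
        self.requester = requester
        self.last_seen = start  # time of the most recent accepted ID message

    def on_id_message(self, sender, now):
        """Record an ID message if its sender is one the scheme listens to."""
        if self.requester is None or sender == self.requester:
            self.last_seen = now

    def should_abort(self, now):
        """True when no accepted ID message has arrived within the timeout."""
        return now - self.last_seen > self.timeout_s


if __name__ == "__main__":
    any_member = IdWatchdog(timeout_s=30, start=0)
    requester_only = IdWatchdog(timeout_s=30, start=0, requester="office-ws")
    for watchdog in (any_member, requester_only):
        watchdog.on_id_message("conference-room", now=20)
    print(any_member.should_abort(now=40), requester_only.should_abort(now=40))
```

In the example, a viewer in the conference room keeps the first scheme alive but is cut off under the second, which is the failure case described above.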
Both methods also suffer from the complication that the ID messages generated by vat are
being sent using IP multicast. The MBONE sometimes experiences outages due to
malfunctioning multicast routers. These outages can last several minutes and may erroneously
cause the server to abort its transmissions. Since the underlying unicast network, the Internet,
is still functioning, this could have been avoided had the ID messages been sent directly, with
unicast, to the source.
Even when disregarding these problems, which will not occur as long as the users are well
behaved, multicast communication poses other problems. The original access metaphor is
altered to something that resembles cable TV Pay Per View systems. The question now is
who should have the remote control of the playback. If the unicast communication model is
simply extended, it will be the user who first requested the transmission, but control can also
be shared by all members of the multicast group. This latter approach is not likely to be
adequate unless the group is a small group of collaborating individuals.
If the transmission begins at the request of the first user, this may not be satisfactory to the other
users. When they join the group they may already have missed a significant part of the
transmission. A different approach would be to assign a multicast group at the request of the
first user, but schedule the transmission for a later time. This increases the possibility of
multiple users being able to join the multicast group in time to receive the transmission from
the beginning. From the perspective of saving bandwidth, it is desirable to schedule the
transmission far in the future, so as to maximize the number of users that can follow the
transmission. But this may not be satisfactory to the first user since the latency between the
request and the beginning of the transmission may be very large. Applying this scheme makes
it even more similar to cable TV Pay Per View systems. Much of the interactivity associated
with video on demand systems is lost.
6.0 Summary
We have described the design and implementation of a server that allows users to request
retransmissions of audio and/or video recordings. These recordings typically originate from
the MBONE, but it is also possible to request audio files. The media will be delivered from the
server to one or multiple recipients using conferencing tools used for experimentation on the
MBONE. We have identified some problems that occur when one tries to apply the tape
deck, video on demand and pay per view metaphors to a system like ours.
7.0 References
[Casner] S. Casner, S. Deering, First IETF Internet Audiocast, ConneXions, No.6:10-17, June
1992
[Deering] S. Deering, Multicast Routing in a Datagram Internetwork, Ph.D. Thesis, Stanford
University, 1991.
[Federighi] C. Federighi, L.A. Rowe, A Distributed Hierarchical Storage Manager for a
Video-on-Demand System, Storage and Retrieval for Image and Video Databases II,
IS&T/SPIE, Symp. on Elec. Imaging Sci. & Tech., San Jose, CA, February 1994.