Web Design & Development Guide
Hypervideo, or hyperlinked video, is a displayed video stream that contains embedded,
user-clickable anchors, allowing navigation between video and other
hypermedia elements. Hypervideo is thus analogous to hypertext, which allows a reader to click on a word in one document
and retrieve information from another document, or from another place in
the same document. That is, hypervideo combines video with a non-linear
information structure, allowing a user to make choices based on the
content of the video and the user's interests.
A crucial difference between hypervideo and hypertext is the element of time.
Text is normally static, while a video is necessarily dynamic; the content of
the video changes with time. Consequently, hypervideo has different technical,
aesthetic, and rhetorical requirements than a static hypertext page. For
example, hypervideo might involve the creation of a link from an object in a
video that is visible for only a certain duration. It is therefore necessary to
segment the video appropriately and add the metadata required to link from frames - or even objects - in a video to the
pertinent information in other media forms.
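The idea of a link that exists only while an object is on screen can be sketched as a small data structure. This is a minimal illustration, not the format of any system named in this article: the Anchor class, its field names, and the link targets are all hypothetical, and the clickable region is simplified to a fixed rectangle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Anchor:
    """Hypothetical clickable region that is live only during [start_s, end_s)."""
    start_s: float   # time the anchor becomes active, in seconds
    end_s: float     # time the anchor stops being active, in seconds
    x: float         # left edge of the region, in pixels
    y: float         # top edge of the region, in pixels
    width: float
    height: float
    target: str      # illustrative link target (a URL or node identifier)

    def contains(self, t: float, px: float, py: float) -> bool:
        """True if a click at (px, py) at time t falls on this anchor."""
        in_time = self.start_s <= t < self.end_s
        in_space = (self.x <= px < self.x + self.width
                    and self.y <= py < self.y + self.height)
        return in_time and in_space


def resolve_click(anchors: List[Anchor], t: float,
                  px: float, py: float) -> Optional[str]:
    """Return the target of the first anchor hit at time t, or None."""
    for anchor in anchors:
        if anchor.contains(t, px, py):
            return anchor.target
    return None


# Two made-up anchors with overlapping lifetimes.
anchors = [
    Anchor(2.0, 5.5, 100, 80, 60, 40, "notes/coffee-cup"),
    Anchor(4.0, 9.0, 300, 120, 80, 80, "notes/speaker"),
]

print(resolve_click(anchors, 3.0, 120, 100))  # inside the first anchor
print(resolve_click(anchors, 6.0, 120, 100))  # same spot, anchor has expired
```

Unlike a hypertext link, the same screen position resolves to different targets (or to nothing) depending on the playback time, which is the element of time described above.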
History of Hypervideo
Illustrating the natural progression from hypertext to hypervideo, Storyspace,
a hypertext writing environment, employs a spatial metaphor for displaying
links. Storyspace utilizes 'writing spaces', generic containers for
content, which link to other writing spaces. HyperCafe,
a popular experimental prototype of hypervideo, made use of this tool to create
"narrative video spaces". HyperCafe was developed as an early model of a
hypervideo system, placing users in a virtual cafe where the user dynamically
interacts with the video to follow different conversations.
Video-to-video linking was demonstrated by the Interactive Cinema Group at
the MIT Media Lab. Elastic Charles was a hypermedia journal developed between
1988 and 1989, in which "micons" were placed inside a video, indicating links to
other content. When implementing the Interactive Kon-Tiki Museum, Liestol used
micons in order to represent video footnotes. Video footnotes were a deliberate
extension of the literary footnote applied to annotating video, thereby
providing continuity between traditional text and early hypervideo. In 1993, Hirata et al.
considered media-based navigation for hypermedia systems, where the same type of
media is used as a query as for the media to be retrieved. For example, a part
of an image (defined by shape, or color, for example) could link to a related
image. In this approach, the content of the video becomes the basis of forming
the links to other related content.
HotVideo was an implementation of this kind of hypervideo, developed
at IBM's China Research Laboratory in 1996. Navigation to associated
resources was accomplished by clicking on a dynamic object in a video. In 1997,
a student project at the MIT Media Lab called HyperSoap further developed
this concept. HyperSoap was a short soap opera program in which a viewer could
click with an enhanced remote control on objects in the video to find
information on how they could be purchased. The company Watchpoint Media was
formed in order to commercialize the technology involved, resulting in a product
called Storyteller, oriented towards interactive television. Watchpoint
Media was acquired by Goldpocket in 2003, which was in turn acquired by Tandberg
Television in late 2005.
In 1997, the Israeli software firm Ephyx Technologies released v-active,
the first commercial object-based authoring system for hypervideo. This
technology was not a success, however; Ephyx changed its name to Veon in 1999,
at which time it shifted focus away from hypervideo to the provision of
development tools for web and broadband content.
VideoClix, a hypervideo authoring tool able to dynamically track and
link objects, was released in 2001 by eline Technologies, founded in 1999 as a
provider of hypervideo solutions. With the advantage that its videos can
play back in popular video players such as QuickTime and Flash, this product has
proven to be a commercial success. In 2006, eline
Technologies was acquired by VideoClix Inc.
Concepts and Technical Challenges
Hypervideo is challenging, compared to hyperlinked text, due to the unique
difficulty video presents in node segmentation; that is, separating a video into
algorithmically identifiable, linkable content.
Video, at its most basic, is a time sequence of images, which are in turn
simply two dimensional arrays of color information. In order to segment a video
into meaningful pieces (objects in images, or scenes within videos), it is
necessary to provide a context, both in space and time, to extract meaningful
elements from this image sequence. Humans are naturally able to perform this
task; however, developing a method to achieve this automatically (or by
algorithm) is a complex problem.
Automation is also desirable in practice. At an NTSC frame rate of 30 frames per second,
even a short video of 30 seconds comprises 900 frames. The identification of
distinct video elements would be a tedious task if human intervention were
required for every frame. Clearly, even for moderate amounts of video material,
manual segmentation is unrealistic.
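The frame count above is simple arithmetic, sketched here for a few clip lengths. The frame_count helper is a name introduced for illustration. The text uses the nominal NTSC rate of 30 frames per second; colour NTSC actually runs at 30000/1001 (about 29.97) frames per second, so the exact count for the same clip is slightly lower.

```python
from fractions import Fraction
from math import floor


def frame_count(duration_s, fps):
    """Number of complete frames in a clip of the given duration."""
    return floor(Fraction(fps) * Fraction(duration_s))


NOMINAL_NTSC = 30                    # rate used in the text
EXACT_NTSC = Fraction(30000, 1001)   # colour NTSC, about 29.97 fps

print(frame_count(30, NOMINAL_NTSC))    # 900, the figure quoted in the text
print(frame_count(30, EXACT_NTSC))      # 899
print(frame_count(600, NOMINAL_NTSC))   # 18000 frames in a ten-minute clip
```

Either way, the number of frames grows quickly enough that per-frame manual annotation does not scale.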
From the standpoint of time, the smallest unit of a video is the frame (the
finest time granularity).
Node segmentation could be performed at the frame level - a straightforward task
as a frame is easily identifiable. However, a single frame cannot contain video
information, since videos are necessarily dynamic. Analogously, a single word
separated from a text does not convey meaning. Thus it is necessary to consider
the scene, which is the next level of temporal organization. A scene can be
defined as the minimum sequential set of frames that conveys meaning. This is an
important concept for hypervideo, as one might wish a hypervideo link to be
active throughout one scene, though not in the next. Scene granularity is
therefore natural in the creation of hypervideo. Consequently, hypervideo
requires algorithms capable of detecting scene transitions.
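One common family of approaches to detecting scene transitions compares intensity histograms of consecutive frames and reports a boundary when they differ sharply. The following is a minimal sketch under simplifying assumptions, not the method of any system described here: frames are small grayscale grids given as nested lists, the bin count and threshold are arbitrary illustrative values, and only abrupt cuts (not fades or dissolves) are detected.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows of 0..255 intensities


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalised intensity histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return every index i at which frame i appears to start a new scene.

    The distance between consecutive frames is half the L1 difference of
    their histograms, which always lies between 0 and 1.
    """
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


def flat(value: int, size: int = 4) -> List[List[int]]:
    """Synthetic test frame filled with a single intensity."""
    return [[value] * size for _ in range(size)]


# Three dark frames, two bright frames, then one mid-dark frame.
frames = [flat(20), flat(22), flat(25), flat(200), flat(205), flat(60)]
print(detect_cuts(frames))  # [3, 5]
```

The returned indices mark where one group of similar frames ends and the next begins, so a link could be attached to frames 0 to 2 without remaining active in the frames that follow.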
Of course, one can imagine coarser levels of temporal organization. Scenes
can be grouped together to form narrative sequences, which are in turn grouped
to form a video; from the point of view of node segmentation, these concepts are
not as critical. Issues of time in hypervideo were considered extensively in the
creation of the HyperCafe.
Even if the frame is the smallest time unit, one can still spatially segment
a video at a sub-frame level, separating the frame image into its constituent
objects; this is necessary when performing node segmentation at the object
level. Time introduces complexity in this case also, for even after an object is
differentiated in one frame, it is usually necessary to follow the same object
through a sequence of frames. This process, known as object tracking, is
essential to the creation of links from objects in videos. Spatial segmentation
of objects can be achieved, for example, through the use of intensity gradients
to detect edges, color histograms to match regions,
or a combination of these and other methods.
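As a rough illustration of the intensity-gradient approach, the sketch below marks pixels where the brightness changes sharply and then takes the bounding box of those pixels as a crude object region. It assumes a single bright object on a uniform background in a tiny grayscale image; the function names and the threshold are illustrative, and a practical system would combine this with colour and motion cues and repeat the step in each frame to track the object.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image: rows of 0..255 intensities


def edge_mask(image: Image, threshold: int = 50) -> List[List[bool]]:
    """Mark interior pixels whose approximate gradient magnitude is large.

    The gradient uses central differences, and |gx| + |gy| stands in for
    the true magnitude. Border pixels are left unmarked.
    """
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if abs(gx) + abs(gy) > threshold:
                mask[y][x] = True
    return mask


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive (left, top, right, bottom) box around all marked pixels."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, flag in enumerate(row) if flag]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# 8x8 dark image with a bright 2x2 object at columns 3-4, rows 3-4.
image = [[10] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (3, 4):
        image[y][x] = 200

# Central differences respond on both sides of a boundary, so the box
# extends one pixel beyond the object on each side.
print(bounding_box(edge_mask(image)))  # (2, 2, 5, 5)
```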
Once the required nodes have been segmented and combined with the associated
linking information, this metadata must be incorporated with the original video
for playback. The metadata is placed conceptually in layers, or tracks, on top
of the video; this layered structure is then presented to the user for viewing
and interaction. Thus the display technology, the hypervideo player, should not
be neglected when creating hypervideo content. For example, efficiency can be
gained by storing the geometry of areas associated with tracked objects only in
certain keyframes, and allowing the player to interpolate between these
keyframes, as developed for HotVideo by IBM.
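The keyframe idea can be sketched as follows: the author stores an object's region at a few frames only, and the player computes the region for every frame in between. This is a minimal sketch assuming rectangular regions and linear interpolation; it is not IBM's actual HotVideo format, and box_at and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels
Keyframe = Tuple[int, Box]               # (frame number, box at that frame)


def box_at(keyframes: List[Keyframe], frame: int) -> Optional[Box]:
    """Linearly interpolate the object's box at the given frame.

    keyframes must be sorted by frame number. Returns None when the frame
    lies outside the keyframed range, meaning the object is not linkable.
    """
    if not keyframes or frame < keyframes[0][0] or frame > keyframes[-1][0]:
        return None
    numbers = [number for number, _ in keyframes]
    i = bisect_right(numbers, frame) - 1   # last keyframe at or before frame
    f0, b0 = keyframes[i]
    if f0 == frame:
        return b0
    f1, b1 = keyframes[i + 1]
    t = (frame - f0) / (f1 - f0)           # fraction of the way from f0 to f1
    return tuple(a + t * (b - a) for a, b in zip(b0, b1))


# An object that moves right, then moves down while growing wider.
keyframes = [
    (0,  (100.0, 50.0, 40.0, 40.0)),
    (30, (160.0, 50.0, 40.0, 40.0)),
    (60, (160.0, 110.0, 80.0, 40.0)),
]

print(box_at(keyframes, 15))  # (130.0, 50.0, 40.0, 40.0)
print(box_at(keyframes, 45))  # (160.0, 80.0, 60.0, 40.0)
print(box_at(keyframes, 61))  # None
```

Here three stored boxes describe 61 frames of motion, which is the kind of saving in metadata size that keyframe storage is meant to provide.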
Furthermore, the creators of VideoClix emphasize the fact that its
content plays back on standard players, such as QuickTime and Flash. When one
considers that the Flash player alone is installed on over 98% of internet-enabled
desktops in mature markets,
this is perhaps a reason for the success of this product in the current arena.
Hypervideo authoring tools
The process of creating hypervideo content is known as authoring. Many early
attempts at creating widely distributed authoring tools were not successful, for
a variety of reasons. However, this field is currently enjoying a resurgence of
interest, perhaps due to the greater availability of broadband internet.
Probably the most successful product in this category is VideoClix,
described on its website as the premier and only commercially available
technology for creating clickable videos. It is prominent in the rapidly growing
domain of internet video. Tandberg Television, specializing in interactive
television solutions, has a hypervideo system called AdPoint for
video-on-demand. They also sell Storyteller, a product derived from the
MIT project HyperSoap.
Adivi (Add Digital Information to VIdeo) is a project of the Darmstadt
University of Technology, Germany. They are studying the potential of hypervideo
to support collaborative documentation. Siemens, an
engineering firm, will use this technology for enhanced on-line training.
Adobe Flash, a popular multimedia authoring program widely used to create
animated web content, can also be used to create hypervideo content. As Flash
was not designed as a hypervideo authoring tool, creating such content can be
difficult using Flash alone. Such added functionality has been provided through
outside software in the past - for example, MoVideo and Digital Lava.
However, these products are no longer sold.
In the past, there have been a number of attempts to market hypervideo
authoring software that is no longer available. MediaLoom,
a product based on a Master of Science project at the
Georgia Institute of Technology, was an early hypervideo authoring tool. It
used the Storyspace hypertext authoring environment to generate script
files for the hypervideo engine of the HyperCafe. This product reached
prototype stage, but was not commercially successful. Ephyx Technologies created
v-active, the first authoring software using dynamically tracked objects
in video. The company moved away from hypervideo, however, when it became Veon.
Hypervideo can also be created using services provided by firms with
proprietary methods, such as those provided by Vimation. However, this company
does not license its authoring software.
The rise of hypervideo
As the first steps in hypervideo were taken in the late 1980s, it would
appear that hypervideo is taking unexpectedly long to realize its potential.
Many interesting experiments (HyperCafe, HyperSoap) have not been
extensively followed up on, and authoring tools are at the moment available from
only a small number of providers.
However, perhaps with the wider availability of broadband internet, this
situation is rapidly changing. Interest in hypervideo is increasing, as
reflected in popular blogs on the subject,
as well as the extraordinary rise of the internet phenomenon
YouTube. Furthermore, by 2010, some estimates have internet downloads claiming
over one third of the market for on-demand video.
As the amount of video content increases and becomes available on the
internet, the possibilities for linking video increase even faster. Digital
libraries are constantly growing, of which video is an important part. News
outlets have amassed vast video archives, which could be useful in education and
Direct searching of pictures or videos, a much harder task than indexing and
searching text, could be greatly facilitated by hypervideo methods.
Perhaps the most significant consequence of hypervideo will result from
commercial advertising. Devising a business model to monetize video has
proven notoriously difficult. The application of traditional advertising methods
- for example introducing ads into video - is likely to be rejected by the
online community, while revenue from selling advertising on video sharing sites
has so far not been promising.
Hypervideo offers an alternate way to monetize video, allowing for the
possibility of creating video clips where objects link to advertising or
e-commerce sites, or provide more information about particular products. This
new model of advertising is less intrusive, only displaying advertising
information when the user makes the choice by clicking on an object in a video.
And since it is the user who has requested the product information, this type of
advertising is better targeted and likely to be more effective.
References
Smith, Jason and Stotts, David, An Extensible
Object Tracking Architecture for Hyperlinking in Real-time and Stored Video
Streams, Dept. of Computer Science, Univ. of North Carolina at Chapel Hill
Sawhney, Nitin Nick, Balcom, David, and Smith, Ian, HyperCafe: Narrative and Aesthetic Properties
of Hypervideo, UK Conference on
Brøndmo, H. and Davenport, G., 1989,
Elastic Charles: A Hyper-Media Journal, MIT Interactive Cinema group.
Liestol, Gunnar, Aesthetic and Rhetorical Aspects of linking Video in
Francisco-Revilla, Luis, 1998,
A Picture of Hypervideo Today, CPSC 610 Hypertext and Hypermedia,
Center for the Study of Digital Libraries, Texas A&M University.
Hirata, K., Hara, Y., Shibata, N., Hirabayashi, F., 1993, Media-based
navigation for hypermedia systems, in Hypertext '93 Proceedings.
Khan, Sohaib and Shah, Mubarak, Object Based Segmentation of Video Using
Color, Motion and Spatial Information, Computer Vision Laboratory,
University of Central Florida
U.S. Patent 6912726
The Economist, Feb. 8 2007, What's on next
The Economist, Aug 31st 2006, The trouble with YouTube.