ESnet ADVANCED APPLICATIONS

A White Paper by

ESSC ADVANCED APPLICATIONS REQUIREMENTS
WORKING GROUP

Martin Greenwald, Sandy Merola, Larry Price, Bill Wing

for the ESnet Steering Committee

February 14, 1998

SUMMARY

In this paper the Energy Sciences Network (ESnet) Steering Committee (ESSC) makes an initial identification of advanced networking technologies that are expected to underlie the future success of DOE science. The ESSC provides this information in an attempt to influence future networking services as well as associated research and development. We identify seven application areas that, with appropriate development, promise to serve as a foundation for expected expansion and extension of DOE programmatic research capabilities. We then identify and discuss five cross-cutting network technologies that will enable and/or accelerate further scientific application advancements. We close this paper with a short discussion of network research and development and, finally, a brief discourse on the benefits of a collaborative approach.

We note here that the success of ESnet is ultimately determined by the advancements in DOE science that the network facilitates. We commend the ESnet providers on their success to date within budget constraints and given agreed prioritization.

Past experience has demonstrated that networking advances are both application driven and technology driven. ESnet must continue to be responsive to both. The DOE community has applications that the next generation of network services must support. Synergistically, network-based technological developments will benefit the applications community. Local-area networking, shared whiteboards embedded into workstation videoconferencing tools, and the World Wide Web are all examples of network-based technological advances that have benefited the network user community. Thus, ESnet must be responsive to the needs of the programmatic applications community as well as the network research community.

BACKGROUND

The ESnet Steering Committee formed an Applications Requirements Working Group to help ensure that future network requirements of the ESnet community are identified. In particular, this working group focused on applications whose demands, in the five-year time frame, are expected to exceed current network services. We acknowledge and appreciate the input from members of the ESnet Coordinating Committee, MICS-funded network research principal investigators, and principal investigators throughout the DOE community, including participants in the recent DOE Large Scale Networking Workshop. As we work together to realize the benefit of network advancements, we expect that the ESSC, the ESnet implementation team, and the MICS program office will supplement this document with experience acquired from DOE2000, the Network Challenged Applications program, and other contacts within the DOE research community.

THE PRESENT ROLE OF ESnet IN DOE RESEARCH

ESnet was established in 1986 to provide commonly needed network services to DOE's Energy Research programs, following a decade during which computer networking became established as an essential tool for research and each of the programs had found its own way to provide networking. A steering committee was established to provide input on networking requirements and to provide information back to the programs on developments in networking. This mission of ESnet and the composition of the ESSC were later expanded to include other significant partners within the DOE community.

The rather unexpected result of the feedback arrangement among the ESSC, the MICS program office, and the ESnet implementation team was a notable degree of coevolution of the research programs and the network. In many cases, the network has been tailored to provide for specific program requirements, and in turn the programs have been able to quickly and efficiently utilize new capabilities of the network. Network research has helped advance the network services of this community and the entire Internet. After eleven years of ESnet, there is a significant and growing degree of integration between the network and the programs. While this paper is primarily about future needs, we provide some brief examples of current advanced uses of the network.

Large collaborations of scientists are typical of the ESnet community. In the high-energy and nuclear physics and fusion communities, for example, collaborations typically encompass hundreds of scientists who rely heavily on effective interactive, network-based communications. Collaborative authoring of research papers, computer codes, and other documents by extended groups is an early and continuing use of the network. The ability to exchange documents within minutes has transformed the effectiveness of distant collaboration by shortening the time between modification iterations by orders of magnitude. While use of email for this purpose had already provided an enormous benefit, the current embellishments of multimedia files, annotation, and real-time distributed shared workspace dramatically improve efficiency.

Subsequent improvements in bandwidth and networking tools facilitated the collaborative analysis of large data sets from multiple remote locations. This development, combined with collaborative writing, was crucial in permitting the formation of larger collaborations of scientists and enabling them to address larger scientific problems. A recent extension in this series of network enhancements, still maturing, is the ability of a scientist to participate remotely in data-taking sessions, with visual, graphical, and other data returned from the experimental site.

Remote conferencing is also important to current research work. Conferencing encompasses communications that mimic the interactions of a group of people around a conference table. Modern conferencing can provide real-time video of participants and presentations, with shared copies of documents for editing and annotation and high-quality audio communications for discussion. The requisite software must be easily installable on popular platforms, the environments created must follow consistent rules for ease of use with a reduced learning curve, and conferences must be easy to establish. Scientific collaboration, from experiment design to analysis, generally requires detailed interaction between the instrument scientists and the application scientists. This has just become possible through the use of network collaboration tools, resulting in a growing appreciation of collaborative possibilities given the right mix of tools.

These networking-based services and others have augmented the abilities and enhanced the effectiveness of the DOE scientific community. Services continue to expand and improve in response to developing needs of researchers.

EVOLUTION OF NETWORK REQUIREMENTS

While developments in the programmatic research of the ESnet community cannot be predicted with complete clarity, it is clear that the scientific community will depend increasingly on the integration of network-based services directly into the scientific environment. The ESnet Program Plan, to be available at http://www.es.net/pub/esnet-doc/esnet-program-plan/1998/index.html, provides a detailed look at future directions of these programs.

In this document, we only report on the implications of current planning and make a modest extrapolation for what comes later. No attempt has been made to identify explicit site connectivity or associated bandwidth requirements. Rather, we have focused on applications requiring advanced networking services. The future of network service requirements is driven not only by the forecast of future network-based applications, but also by research that will advance the performance and services of ESnet and the Internet as a whole. The following sections summarize the new networking capabilities that will be needed by developments in applications and networking.

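We have sorted application-driven requirements into the principal areas of remote experimental operations, distributed parallel computing, remote/shared code development, remote and distributed data access, collaborative engineering, visualization, and teleconferencing and videoconferencing.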

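We have sorted opportunities and requirements that are driven by anticipated service-offering advances into the network cross-cutting areas of quality of service capabilities, non-backbone connectivity, network status and diagnostic tools for users, scalable communications, and network and application security.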


APPLICATION AREAS

Remote Experimental Operations

Remote control of instruments and facilities is a very new and experimental capability. It has already found important uses in a few cases and several more trials are underway. Beginning with turbulence measurements on the Princeton Plasma Physics Laboratory's TFTR tokamak, plasma diagnostics have been run remotely from controlled fusion experiments for almost a decade. Recently this work has been extended to include proof-of-principle demonstrations of full remote control of tokamak operations, first on Alcator C-Mod at MIT and later on DIII-D at General Atomics. Developments in remote experimental operations require close cooperation between network managers and developers on the one hand and instrument builders and users on the other. Vigorous development will bring enormous benefits to research programs, including more efficient utilization of costly or unique instruments, as well as more efficient use of researchers' time. Further, the ability to correlate results from multiple instruments will greatly improve the depth of scientific understanding resulting from such measurements.

The remote control of experiments requires secure and guaranteed network transactions to ensure accurate and safeguarded control. Thus network security, authorization, and guaranteed bandwidth on demand are critical to success.

Large amounts of data are typically captured at the experiment site and are made available to the local researcher. The remote researchers must be provided with shared and distributed access to this experimental data and to shared applications. This might include access to large relational and object-oriented databases; the creation, access, and use of electronic notebooks; and visualization tools for remote data and applications performance analysis. All of these services will need to operate in real time, and the remote experimenter may not be located directly on an ESnet-backbone site.

Scientists involved in remote experimentation would greatly benefit from the existence of an environment (albeit virtual) similar to that at the experimental site proper. This places a demand on the network to support shared virtual environments including teleimmersion. Teleimmersion allows users to see, hear, and touch each other and a representation of their data in a simulation of the actual environment. Teleimmersion relies on virtual reality display environments at each location and high-performance network connections between locations. Resulting data flows range from streaming audio and video to virtual reality tracking data and interactive simulation updates. Additionally, diagnostics, remote access to machine status, and other related subsystems must be supported by the network with appropriate prioritization.

Authentication and security services are required at some level for all networked applications but are particularly important for experimental collaborations where expensive (and possibly hazardous) equipment may be involved.

Distributed Parallel Computing

A new computing paradigm, generating increased networking requirements as success stories spread, is distributed parallel computing. This approach to problem solving has been called by various names, including computational grids, computational nets, and simply distributed computing.

Two endpoints of distributed parallel computing are worth noting here. At one extreme, the environment would offer the sharing of a small number of distributed and unique high-performance computers. At the other extreme, the environment might consist of sharing very large numbers of underutilized desktop machines. For example, a cluster of 44 networked workstations was used to process video images containing complex three-dimensional information from the DIII-D tokamak and map the resulting data onto magnetic flux surfaces. The calculations would have taken more than a year on a single machine. Between these two extremes, a new class of distributed computing is growing, based on coupling resources on a user's desk with both local and geographically remote computing resources. As more and more sites assemble Beowulf-class computers and are willing to broker time among them, an expanded supercomputing community is forming, and associated networking requirements are expanding.

At the present time, the class of problems that lend themselves to this sort of distributed computational effort is relatively small. Algorithms that can successfully hide latency measured in tens of milliseconds instead of microseconds are still emerging from developers. However, the rewards of being able to distribute problems across a meta-computer, whose resources are "free" within the brokering scheme, are so great that they are being pursued by several groups. The next step in realizing such an approach is a formal, object-based computational model that allows the problem to gracefully use available resources without intrusively dominating them. Conceptually, this realizes the idea that the network serves as the actual backplane of a meta-computer composed of all the resources available on the network.

Remote/Shared Code Development

Most ESnet programs are increasingly dependent on collaborative work by researchers at distant locations. Thus, ESnet must support the collaborative development of large codes for simulation, data analysis, and other purposes. High-energy and nuclear physics collaborations typically require a centralized mechanism to control the distributed code development. As distributed computing and remote visualization become more prevalent, increased demands will be made on ESnet in this area.

Distributed code development teams can make use of desktop videoconferencing, a common code-version control system, an electronic notebook to document coding changes, common output display demonstrations, and other available communication and collaboration technologies. A common on-line code-sharing library would improve code and data access, code interconnection, and code invocation. Finally, large-scale projects, including the Spallation Neutron Source to be constructed at Oak Ridge National Laboratory, will require code development and optimization by over 100 users, well beyond the ability of present network-based tools to work effectively.

Issues such as the efficient use of distributed compute cycles, reliable asynchronous intertask communications, multicasting of data, the remote display and downloading of results, distributed task queuing, and session management will all need to be addressed if progress is to be made in this area.

Remote and Distributed Data Access

The network should enable transparent data access from remote locations for either storage or retrieval. A variety of technologies exist for this purpose (for example, distributed file systems, caching, distributed objects, and remote procedure calls), each appropriate for different applications. In general, these require a software infrastructure including an intuitive and consistent user interface, coordinated management, and other services.

Many programs using ESnet will be mounting experiments or simulation efforts that will generate enormous quantities of data. For example, the LHC (Large Hadron Collider) in Geneva, Switzerland, and RHIC (Relativistic Heavy Ion Collider) in Upton, New York, will each generate about a petabyte (million gigabytes) per year of raw data. Data rates from the Jefferson Lab accelerator in Newport News, Virginia, and the Tevatron Collider near Chicago, Illinois, are only slightly smaller. These data rates are so high that data analysis and reduction by factors of hundreds or thousands are necessary before it is feasible to move the data over the Internet. Data sets from Atmospheric Research, Basic Energy Sciences, and Fusion Energy Sciences are on the order of terabytes (thousand gigabytes). Across the ESnet community, it can be expected that in another decade at least 20 petabytes of raw data will be generated per year.
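
As a rough illustration of the scale involved, the sketch below converts an annual data volume into a sustained average network rate. It assumes decimal units (one petabyte taken as 10^15 bytes) and a 365-day year; the function name and the particular reduction factors tried are illustrative choices, not figures reported by any experiment.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

PETABYTE = 10**15  # assumption: decimal petabyte (one million gigabytes)


def sustained_rate_mbps(bytes_per_year, reduction_factor=1):
    """Average rate in megabits per second needed to move data as it is produced."""
    bits_per_year = bytes_per_year * 8 / reduction_factor
    return bits_per_year / SECONDS_PER_YEAR / 1e6


# Illustrative reduction factors: none, hundreds, thousands.
for factor in (1, 100, 1000):
    rate = sustained_rate_mbps(PETABYTE, factor)
    print(f"reduction x{factor}: {rate:.2f} Mb/s sustained")
```

Under these assumptions, one petabyte per year corresponds to roughly 254 Mb/s averaged over the entire year, before any allowance for peak rates, protocol overhead, or repeated access by multiple sites.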

Large and rapidly growing databases of biological structure and sequence information are only tenuously connected to locally executed programs. Typically life scientists download entire databases or subsets, while others execute programs made available over the Internet which perform queries at the remote database site. This community would benefit from the existence of middleware APIs allowing programs to connect across the Internet directly to the databases, returning the appropriate files or query results to the program for immediate use.

The communities needing to use the data are distributed and will need to access the data multiple times from many different locations. In many cases, the data sets themselves will also be distributed across the network. Tools to manage and integrate views of distributed data will need to be developed. Work has begun on these issues at Berkeley Lab, Stanford Linear Accelerator Center, Brookhaven, and CERN, focused on object-oriented databases and hierarchical storage systems in support of BABAR, RHIC, and LHC.

Given such large databases, additional strategies are needed to reduce the impact of moving these data over the wide-area network. Local data reduction and data compression are two such strategies. Methods for providing users with estimates of the "costs" for accessing particular data (query estimation) may also be useful. Little has been done so far, however, to optimize network use through caching strategies. Database and network researchers will need to collaborate to solve this problem. Because generation of data is clearly increasing faster than bandwidth available on the Internet, it may be necessary to assess computational models in terms of their impact on the network.

Collaborative Engineering

Engineering has also become a collaborative effort, with design and analysis carried out by teams spread across the country and around the world. In this environment, excellent interactive communication is essential. Shared design efforts require effective tools for sharing files, displaying and annotating electronic drawings, and remote conferencing. An environment that must be virtually replicated over the network is that of a team of engineers sitting around a table piled with large, high-resolution drawings. Shared three-dimensional environments would greatly aid in visualizing complex structures and systems. Large-scale engineering codes are used in many areas of analysis. Three-dimensional thermal, structural, neutronic, and electromagnetic problems (often coupled) can frequently be solved only with the use of powerful supercomputers. Engineers remote from the supercomputer centers need distributed computing tools to share the workload between local and remote systems and advanced visualization tools for analyzing the results.

Visualization

DOE scientists routinely perform computer simulations, computer modeling, and the analysis and synthesis of large amounts of experimental data, converting them into pictures or animation using sophisticated but data-intensive visualization techniques. Requirements for visualization techniques and associated data management permeate the ESnet user community, including plasma physics, climate analysis, materials, chemistry, computational fluid dynamics, combustion, DNA analysis, particle analysis, and astrophysics.

Applications of such scale can be executed only in environments with large memory and high processor speed. An emerging trend to address this problem is massively parallel processor (MPP)-based visualization tools, requiring connectivity with high bandwidth and low latency among the researcher's visualization environment, the MPP, and the data storage site. Given the scarcity of MPP systems, these environments are typically geographically dispersed. To achieve interactive rates, images must be delivered to the desktop at 5 to 30 frames per second, challenging the network's bandwidth and responsiveness. While interactions from the user to the MPP (typically generated by the movement of a mouse or the pressing of a switch) may not require large amounts of bandwidth, the needed response requires minimal network latency.
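
To give a sense of what 5 to 30 frames per second implies for the network, the following sketch computes the bandwidth of an uncompressed image stream. The display size (1024 x 768 pixels) and color depth (24 bits per pixel) are assumptions chosen for illustration and do not come from this paper; compression would lower the figures.

```python
def stream_mbps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed image-stream bandwidth in megabits per second."""
    return width * height * bits_per_pixel * frames_per_second / 1e6


# Hypothetical desktop display: 1024 x 768 pixels, 24-bit color, no compression.
for fps in (5, 30):
    print(f"{fps} frames/s: {stream_mbps(1024, 768, 24, fps):.1f} Mb/s")
```

With these assumed parameters, the stream needs about 94 Mb/s at 5 frames per second and about 566 Mb/s at 30 frames per second.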

Similar requirements are typical of shared work spaces, immersive visualization environments including latency-intolerant haptic devices, and remote experimentation.

Teleconferencing and Videoconferencing

The ESnet community has embraced and benefited greatly from the present availability of conference-room-based videoconferencing. Useful as it is, teleconferencing is in its infancy and needs major improvements. Advances are needed initially in ease of use, in the incorporation of multimedia interactions, in the shared creation and editing of documents, and in the integration with data collection and analysis environments. The expansion of usage has increased the need for service directories, and the multiplicity of participants in any single session has created the need for floor control capabilities. Both are needed to ensure the future viability of teleconferencing. Additionally, as tools become easier to use and more integrated into the networked environment, general planning and coordination services will be needed.

Demands from the increased utilization of workstation-based videoconferencing could be enormous. The high-energy and nuclear physics communities suggest that their videoconference use in the near future might be as demanding on the network as their current data requirements, perhaps within two years as the B factories become operational in the U.S. and Japan and the Tevatron Run 2 begins at Fermilab. In the five years after that, high-energy physics will need an extensive evolution of the present conferencing system to permit effective work with gigantic data sets by extremely distributed groups of collaborators. The new capabilities will almost certainly require evolution of relevant network protocols as well as the software at the end nodes. The ESnet community will need to have the expanded conferencing ability integrated closely with Remote Experimental Operations, placing even more demands on the detailed operation of the network.

Conference rooms throughout the ESnet community are already fully booked with apparent unsatisfied demand. The lack of both universal interoperability and ease of use continues to pose a barrier to increased usage of this service. Commercial providers do not seem motivated to resolve this hurdle. On a positive note, standards have been recently developed in this area, and low-cost commercial implementations have created the potential for significant increased use. ESnet will need to ensure timely and supported advancements in this area.

ESnet collaborations can be expected to make heavy use of workstation-based videoconferencing. However, the existing service model is a barrier to widespread use as it only adequately supports small numbers of participants. In addition, there is a need for a complete, readily accessible directory of institutions and individuals who are accessible via this medium.

CROSS-CUTTING AREAS

Quality of Service Capabilities

The success of computer networking to date has encouraged the creation of increasingly demanding network-based applications and rising expectations about network performance. These new applications will require stringent limits on parameters such as latency, jitter, packet loss, and throughput.

Four factors will contribute to the growing demands on network performance: (1) Real-time applications are expected to permeate the network as collaboratory and remote experimentation benefits are realized. (2) Local-area infrastructures, whose bandwidths are currently lower than that of ESnet and which therefore act as a bottleneck protecting ESnet, will be upgraded, allowing greater demand on ESnet services. (3) Lack of interbackbone connectivity will especially affect those users of DOE facilities who are either at university sites or not domestic. (4) Financial constraints will limit the total bandwidth available so that (1), (2), and (3) cannot be countered simply by additional bandwidth. A smarter mechanism must be found to ensure needed performance.

There currently exists no management mechanism to allocate the available resource in a manner consistent with programmatic priorities. All other DOE resources (especially user facilities such as the Advanced Light Source, the Advanced Photon Source, the Tevatron, and the National Energy Research Scientific Computing Center) have management mechanisms and implementation schemes that allow for resource allocation and protection.

Both performance and management issues require a mechanism to provide needed network resources at appropriate times. The mechanism to accomplish this is Quality of Service (QOS). Since different resources can be required, different QOS guarantees may be required separately or in combination. For example:

In some cases, QOS may be a static condition that might apply to all remote facility operation sessions. In other cases, QOS might require integration with a scheduling mechanism, so that appropriate network performance can be scheduled to coincide with other reserved resources. One can imagine that QOS would allow end-to-end prioritization of the network to coincide with scheduling time at an experiment site or the reservation of other similarly critical resources. The process of booking and then delivering a specified QOS resource is presently unsolved and is a major need of the ESnet program.
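
The scheduling idea described above can be made concrete with a toy admission test for advance bandwidth reservations on a single link. This is only a sketch under stated assumptions: the class name, the 155 Mb/s capacity, and the use of plain numbers as time units are all hypothetical, and it does not address the end-to-end booking and delivery problem, which remains open.

```python
class LinkSchedule:
    """Toy advance-reservation table for one link (hypothetical, illustrative only)."""

    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.reservations = []  # list of (start, end, mbps); end is exclusive

    def load_at(self, t):
        """Total bandwidth already reserved at time t."""
        return sum(m for s, e, m in self.reservations if s <= t < e)

    def book(self, start, end, mbps):
        """Admit the request only if capacity holds over the whole interval."""
        if end <= start or mbps <= 0:
            return False
        # Reserved load can only rise at the start of an existing reservation,
        # so it suffices to check the request start and those start times.
        check_points = [start] + [
            s for s, _, _ in self.reservations if start < s < end
        ]
        if any(self.load_at(t) + mbps > self.capacity_mbps for t in check_points):
            return False
        self.reservations.append((start, end, mbps))
        return True


# Assumed capacity of 155 Mb/s; times are arbitrary units (e.g. hours of a day).
link = LinkSchedule(capacity_mbps=155)
print(link.book(9, 12, 100))   # True: the link is empty
print(link.book(11, 14, 100))  # False: would exceed capacity between 11 and 12
print(link.book(12, 14, 100))  # True: the first booking has ended by 12
```

A real service would have to apply such a test on every link of the path, authenticate the requester, and enforce the reservation in the routers; the sketch shows only the bookkeeping that ties a bandwidth guarantee to a scheduled time window.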

Non-Backbone Connectivity

The DOE community has benefited greatly from the architecture and authorized use policies (AUPs) of ESnet. The ESnet backbone is very responsive and generally provides more than sufficient bandwidth between a scientist and the facility being used in those cases where both are located on the backbone. However, such interconnectivity and responsiveness are not typical of the U.S. and worldwide Internet. DOE researchers who are not located on the backbone, but rather at a U.S. university or an international facility, are typically limited by the bandwidth and response time of the Internet as a whole. The communities associated with research in high-energy and nuclear physics, global climate change, the Accelerated Strategic Computing Initiative (ASCI), environmental restoration, and post-genome processing all have users and/or facilities that are not directly connected to the ESnet backbone. These require sufficient network bandwidth for dealing with very large amounts of data in experimental, computational, or analysis environments that require low network latency as well. Current Internet bandwidth and latency impair DOE science in these areas. Thus, advances are needed in end-to-end network performance and tools with an effect beyond the current borders of ESnet.

Network Status and Diagnostic Tools for Users

Information about status, configuration, and performance of the network is increasingly needed as applications rely on adequate levels of bandwidth and latency. As guaranteed performance and associated booking systems become available, booking status and associated network usage information will also be needed.

Scalable Communications

Many of the new applications described above will make use of shared data, whether the data consists of experimental data for analysis, video, audio, immersive environments, or data caching. Any one of these data transfers could require sending large quantities of data to tens of different destinations. Efficient use of network resources demands that distribution of data streams like this not be duplicated any more times than absolutely necessary. Use of a single transmission over shared portions of a route, followed by individual distribution to final destinations, is presently implemented for some applications, notably video, by multicasting. Broadcast technologies (not using the Internet) may be appropriate for simultaneous distribution of very large data sets. Continued development of these technologies will be needed, along with their integration into applications identified above.
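
The saving from sending a single copy over shared portions of a route can be illustrated by counting per-link transmissions on a small distribution tree. The topology below (one source, one backbone node, two regional hubs, three sites per hub) and all node names are hypothetical; the sketch only compares the two counting rules and does not model any particular multicast protocol.

```python
def path_links(parent, node):
    """Links (child, parent) on the route from node up to the source (tree root)."""
    links = []
    while parent[node] is not None:
        links.append((node, parent[node]))
        node = parent[node]
    return links


def link_transmissions(parent, destinations):
    """Return (unicast, multicast) counts of per-link copies of one data unit."""
    paths = [path_links(parent, d) for d in destinations]
    # Unicast: every destination gets its own copy on every link of its route.
    unicast = sum(len(p) for p in paths)
    # Multicast: each link that appears on any route carries exactly one copy.
    multicast = len({link for p in paths for link in p})
    return unicast, multicast


# Hypothetical tree: source -> backbone -> hubA/hubB -> three sites each.
parent = {"source": None, "backbone": "source",
          "hubA": "backbone", "hubB": "backbone"}
sites = []
for i in range(3):
    for hub, prefix in (("hubA", "a"), ("hubB", "b")):
        parent[f"{prefix}{i}"] = hub
        sites.append(f"{prefix}{i}")

print(link_transmissions(parent, sites))  # (18, 9)
```

In this small example, six destinations cost 18 link transmissions when each receives a separate stream and 9 when copies are made only where routes diverge; the gap widens as more destinations share the upstream links.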

Network and Application Security

The vision espoused in this white paper, of new network-based paradigms such as distributed computing, virtual research groups, and telescience, depends implicitly on absolute security. This security includes both the end-to-end security of applications and, of course, security of the underlying network itself. It has come to be something of an aphorism that computers, not networks, are subject to security problems. However, the bridges, switches, routers, and domain name servers that make up the Internet are computers. They authenticate system (network) managers with username-password pairs, accept Telnet connections, and are vulnerable to exactly the same attacks that the media has reported in the computing world.

From a user's point of view, what counts is not the security of the underlying network, but the end-to-end security of applications. In this context, an application can include the remote control of an instrument or a complete research facility. Such security includes not only the protection of the bit stream, but also the security of the instrument, the facility, or even people. Providing end-to-end security at the application level involves not only securing the underlying network but also authentication and certification of users or operators, and certification of application results.

Some of these issues have been solved in other contexts (e.g., training certification), and in many cases tools are beginning to appear that will allow straightforward development of solutions (such as public key encryption). But in general, complete solutions do not exist now and cannot be purchased as turnkey packages from commercial vendors. The commercial marketplace is investing in the development of secure electronic commerce (so people can send credit card numbers over the Web), but not to guarantee the results of a physics calculation. Thus, there exists a gap (recognized by DOE2000) between simple encryption of credit card numbers at one extreme, and the protection of multi-user, multi-system, real-time sessions at the other. This gap includes requirements for certification and authentication of users across administrative domains, protection and certification of shared information or results, and very high-speed encryption of real-time data streams at OC-12 speeds or higher.

Much of the work needed is summarized in the paper "Research & Development Priorities for Communications and Information Infrastructure Assurance" by Huntman, Jacobsen, Johnson, Mansur, and Baily, available at http://www-itg.lbl.gov/security/Publications/C+I_Report.html. The paper provides rough estimates of the levels of research needed in 13 topical areas: (1) characterization and notification of threats; (2) detection, analysis, and prevention; (3) definition of security architectures; (4) response, recovery, and reconstitution; (5) advanced concepts and theory; (6) management of information protection; (7) characterization of infrastructure required for minimum essential services; (8) valuation of information; (9) indication and warning; (10) cost-benefit analysis; (11) modeling and simulation; (12) risk management; and (13) encryption technologies.

NETWORK RESEARCH AND DEVELOPMENT

The MICS-funded network research and development efforts are intended to create and/or enable new high performance networking applications. Examples of their efforts to ensure scalable networking include the improvement of network protocols and router algorithms; the creation of tools that model, measure, and analyze network traffic; technology that can guarantee bandwidth (e.g., quality of service) and associated management tools; multicasting of data; security; and innovative techniques for the efficient handling of World Wide Web-related traffic. ESnet is motivated to support these efforts because such R&D underpins the future success of networking, and the R&D program itself is a member of the user community that ESnet is mandated to serve.

This situation presents a unique set of challenges to the ESnet provider. On the one hand, users of the production aspects of ESnet do not wish to have their network infrastructure disturbed by such R&D activities. On the other hand, network researchers must have access (at some point) to a large network with real users as the ultimate testbed for their efforts. Such an environment is critical to the future of ESnet and the entire community that it serves. A full discussion of issues related to supporting production and research traffic (and one possible solution) can be found in "MORPHnet" by Aiken, Carlson, Foster, Kuhfuss, Stevens, and Winkler at http://www.anl.gov/ECT/Public/research/morphnet.html. We believe such a testbed is an important requirement.

In the end, research and development must result in products that work in the scientific environment of the ESnet user community. Such products must be reasonably easy to use, perhaps appearing almost shrink-wrapped. Such tools must be integrated into the full scientific research environment and be easily maintainable.

A COLLABORATIVE APPROACH

This paper has discussed the need for a variety of complex and advanced networking services. In order to ensure their timely introduction and ease of use by the scientific community, we believe that a collaborative approach to development and implementation is needed. Most of the solutions will require close collaboration among the ESnet staff, the scientists using the network, and the network research and development staff. Of the applications and infrastructure enhancements discussed above, this collaborative approach to development will be particularly needed for remote experimental operations, remote/shared code development, visualization, teleconferencing/videoconferencing, quality of service, and network status and diagnostic tools for users.

Without a collaborative and tightly coupled approach, application and network infrastructure developments will not be coordinated, resulting in less than synergistic efforts and lost time. ESnet, as a network run for DOE research programs in close consultation with the user programs, is ideally positioned to ensure the needed collaborative developments and to work iteratively with the programmatic users to ensure that the users get the maximum value from the network and the improved functionality they need as soon as possible.

We find that close cooperation between network researchers, network service providers, and the ESnet user community will continue to be central to the success of the DOE networking program and the community that it benefits.