Portable Data / Portable Code:
XML & Java™ Technologies

Prepared for Sun Microsystems, Inc. by: JP Morgenthal
Director of Research, NC.Focus
(516) 792-0997
FAX: (516) 792-0996
www.ncfocus.com

Executive Summary
Origins of the XML Standard
Using XML
Synergy of XML & Java Technologies
Portable Data and Code For the Enterprise
       Electronic Data Exchange and E-Commerce
       Electronic Data Interchange (EDI)
       Enterprise Application Integration (EAI)
       Publishing
       Software Development
Sun, XML Technology, and the Java Platform
       Java Platform Standard Extension for XML Technology
       XML Technology Makes Sense for the Java Platform
Conclusions
Appendix A: Resources
Appendix B: About NC.Focus

Executive Summary

Prior to 1998, the exchange of data and documents was limited to proprietary or loosely defined document formats. But the advent of Hypertext Markup Language (HTML)--the presentation markup language for displaying interactive data in a Web browser--offered the enterprise a standard format for exchange with a focus on interactive visual content. However, HTML is rigidly defined and cannot support all enterprise data types, and those shortcomings provided the impetus to create the Extensible Markup Language (XML). The XML standard allows the enterprise to define its own markup languages with emphasis on specific tasks, such as electronic commerce, supply-chain integration, data management, and publishing.

For those reasons, XML is rapidly becoming the strategic instrument for defining corporate data across a number of application domains. The properties of XML markup make it suitable for representing data, concepts, and contexts in an open, platform-, vendor-, and language-neutral manner. It uses tags--identifiers that signal the start and end of a related block of data--to create a hierarchy of related data components called elements. In turn, this hierarchy of elements provides context--implied meaning based on location--and encapsulation. As a result there is a greater opportunity to reuse this data outside of the application and data sources from which it was derived.

XML technology has already been successfully used to furnish solutions for mission-critical data exchange, publishing, and software development. Additionally, XML has become the incentive for groups of companies within a specific industry to work together to define industry-specific markup languages (sometimes referred to as vocabularies). These initiatives create a foundation for information sharing and exchange across an entire domain rather than on a one-to-one basis.

Sun Microsystems and other major vendors, such as IBM, Novell, Oracle, and even Microsoft, are strong supporters of the XML standard. Indeed, Sun Microsystems coordinated and underwrote the World Wide Web Consortium (W3C) working group that delivered the XML specification. Sun is also the creator of the Java™ platform--a family of specifications that form a ubiquitous application development and runtime environment. It is now Sun Microsystems' intention to ensure that XML technology and the Java platform join in a way that is complementary to both.

XML and Java technologies have many complementary features, and when used in combination they enable a powerful platform for sharing and processing of data and documents. While XML can clearly define data and documents in an open and neutral manner, there is still a need to develop applications that can process it. The Java platform offers a homogeneous computing environment with portable code that can be downloaded over a network to any Java virtual machine. Together, XML and Java technologies allow enterprises to apply Write Once, Run Anywhere™ fundamentals to the processing of data and documents generated by both Java technology and non-Java technology sources. By extending the Java platform standards to include XML technology, companies will obtain a long-term secure solution for including support for XML technologies in their applications written in the Java programming language.

Introduction

The purpose of this paper is two-fold: to introduce the Extensible Markup Language (XML) and explain how it benefits the enterprise, and to explain the cooperative environment formed by integrating XML and Java technologies into a solution. Readers familiar with XML may want to concentrate on the sections that deal specifically with using XML with Java technology.

The Extensible Markup Language (XML) is a syntax for developing specialized markup languages, which add identifiers, or tags, to certain characters, words, or phrases within a document so that they may be recognized and acted upon during future processing. "Marking up" a document or data results in the formation of a hierarchical container that is platform-, language-, and vendor-independent and separates the content from any environment that may process it.

Because XML is a recommendation of the W3C (World Wide Web Consortium), the group responsible for creating and maintaining all core Web technical specifications, it reflects a true industry accord that provides the first real opportunity to liberate the business intelligence that is trapped within disparate data sources found in the enterprise. XML does this by providing a format that can represent structured and unstructured data, along with rich descriptive delimiters, in a single atomic unit. In other words, XML can represent data found in common data sources, such as databases and applications, but also in non-traditional data sources, such as word processing documents and spreadsheets. Previously, non-traditional data sources were constrained by proprietary data formats and hardware and operating system platform differences.

The W3C has released and maintains the Extensible Markup Language 1.0 as the official specification that defines the rationale behind the development of XML and the rules for processing XML-formatted data and documents (see Appendix A: Resources for associated Web links).

Origins of the XML Standard

The process of making SGML simpler and Internet-aware gave rise to the XML specification. This section explains the process that led to the development and adoption of XML technology as a W3C Recommendation.

The first standardized markup language, SGML (Standard Generalized Markup Language), is still a heavily used international standard maintained by the ISO (International Organization for Standardization). SGML gave the publishing industry a machine- and process-independent method of separating content from presentation. In publishing, the presentation is usually some form of printed medium, produced by the machines that support it. SGML simply lets authors define the characteristics of the print version without requiring them to include machine-specific codes.

To date, HTML (Hypertext Markup Language) is the most popular application of SGML. It acts as the presentation syntax that Web browsers use to render documents visually. Clearly, the Web is one of today's most powerful communication vehicles, illustrating the importance of authoring in a markup language. However, HTML is too specific to represent information generically, and SGML is too overbearing to use in tandem with the Web; therefore, the XML language emerged as a simpler, generalized markup for the Web.

The distinction between SGML and HTML spurred the development of the XML specification. Jon Bosak, an engineer at Sun Microsystems and generally regarded as the "father of XML", realized the limitations of HTML early on. Bosak had used SGML extensively for managing technical documentation on behalf of large vendors--first Novell and then Sun Microsystems. This experience led Bosak to raise expectations for publishing on the Web and to demand nothing less powerful than SGML as the delivery tool.

Bosak's persistence prompted the W3C to recognize SGML and its associated style sheet language, DSSSL. He was also offered the opportunity to lead the Web SGML Activity (later renamed XML). Part of Bosak's responsibility was to obtain funding for the W3C Activity and build the team to design the specification. Bosak did both; Sun Microsystems underwrote the effort and a number of SGML experts participated in the development of the specification.

Using XML

XML technology enables companies to develop application-specific languages that better describe their business data. This section provides a brief overview of what it means to use XML data and what an XML document looks like.

By applying XML technology, one is essentially creating a new markup language. For example, an application of the XML language would produce the likes of an Invoice Markup Language or a Book Layout Markup Language. Each markup language should be specific to its creator's individual needs and goals.

Part of creating a markup language includes defining the elements, attributes, and rules for their use. In the XML language, this information is stored inside of a document type definition (DTD). A DTD may be included within an XML document or it may be external to the document. If the DTD is stored externally, then the XML document must provide a reference to it. If a document does provide a DTD and the document adheres to the rules specified in the DTD, then it is considered valid.

The following is an example of a Document Type Definition that defines an element named BILLING_PARTY along with both its required and optional sub-elements.

[Sample DTD not reproduced in this copy of the document.]

The example states that the element BILLING_PARTY must have one sub-element named ACCOUNT_NUMBER directly following it and optionally may be followed by any of the contact information fields.

Of note, it is not a requirement that a DTD be provided. Documents without DTDs that follow the rules of the XML specification are designated as well formed, but not valid. A validating XML parser can determine both whether a document is well formed and whether it is valid.
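The distinction between a valid document and one that is merely well formed can be sketched in Java. The following is a minimal illustration, not the paper's original sample: it assumes the javax.xml.parsers API found in current JDKs (which postdates this paper), and the DTD, the contact field names NAME, ADDRESS, and PHONE, and the class name DtdValidationSketch are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical internal DTD: ACCOUNT_NUMBER is required and comes first;
    // the contact fields (names invented for this sketch) are optional.
    static final String DTD =
          "<!DOCTYPE BILLING_PARTY [\n"
        + "  <!ELEMENT BILLING_PARTY (ACCOUNT_NUMBER, (NAME | ADDRESS | PHONE)*)>\n"
        + "  <!ELEMENT ACCOUNT_NUMBER (#PCDATA)>\n"
        + "  <!ELEMENT NAME (#PCDATA)>\n"
        + "  <!ELEMENT ADDRESS (#PCDATA)>\n"
        + "  <!ELEMENT PHONE (#PCDATA)>\n"
        + "]>\n";

    /** Returns the validity errors reported for a well-formed document. */
    static List<String> validityErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // A fatal error means the text is not even well formed.
            throw new IllegalArgumentException("Document is not well formed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DTD + "<BILLING_PARTY><ACCOUNT_NUMBER>12345</ACCOUNT_NUMBER>"
                           + "<PHONE>555-0100</PHONE></BILLING_PARTY>";
        String invalid = DTD + "<BILLING_PARTY><PHONE>555-0100</PHONE></BILLING_PARTY>";
        System.out.println("valid document errors:   " + validityErrors(valid));
        System.out.println("invalid document errors: " + validityErrors(invalid));
    }
}
```

Both sample documents are well formed; only the first satisfies the DTD, so only the second should produce validity errors.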

The following is an example of a well-formed XML document:

[Sample XML document not reproduced in this copy of the document.]

This example illustrates how XML can provide developers with the ability to define application-specific tags, such as <INVOICE> and <BILLING_PARTY>. But it is the resulting markup language that gives the enterprise the power to leverage and reuse this invoice description across many applications.

For example, the invoice document could be rendered into HTML and displayed to the user in a Web shopping scenario, it could be delivered to a Point-of-Sale (POS) terminal in a store to be rendered into a receipt, and it could be sent to the back-office where it would be used to update the accounting and inventory systems. Additionally, this XML document could be generated by an existing sales application, illustrating how the output of one system can be used as input to another and thereby providing a simple means by which application integration can occur.

This example also illustrates how XML technology supports semi-structured data. First, it provides encapsulation, which tells us where data starts and stops with regard to a single element. Second, it provides context; the position of a price element inside an item element tells us that the price relates to that single item. Finally, the XML language provides meta-information, such as the currency on a price. By using attributes to represent currency, this same format can be used across the globe for multiple currencies. Best of all, this format is extensible, which means that any one company could extend it to support data that is specific to its needs.
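Encapsulation, context, and attribute meta-information can be seen when a program walks such a document. The sketch below is an illustration under assumptions: the invoice markup is a hypothetical stand-in (the element names ITEM, DESCRIPTION, and PRICE and the currency attribute are invented here), and it uses the DOM API from current JDKs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InvoiceDomSketch {
    // Hypothetical invoice; the tag names are assumptions made for this sketch.
    static final String INVOICE =
          "<INVOICE>"
        + "<BILLING_PARTY><ACCOUNT_NUMBER>12345</ACCOUNT_NUMBER></BILLING_PARTY>"
        + "<ITEM><DESCRIPTION>Widget</DESCRIPTION><PRICE currency=\"USD\">19.95</PRICE></ITEM>"
        + "<ITEM><DESCRIPTION>Gadget</DESCRIPTION><PRICE currency=\"EUR\">7.50</PRICE></ITEM>"
        + "</INVOICE>";

    /** Returns one line per ITEM, pairing each price with the item that encloses it. */
    static List<String> describeItems(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("ITEM");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Context: searching inside this ITEM yields only its own PRICE.
                Element price = (Element) item.getElementsByTagName("PRICE").item(0);
                String description = item.getElementsByTagName("DESCRIPTION").item(0).getTextContent();
                // Meta-information: the currency travels with the price as an attribute.
                lines.add(description + ": " + price.getTextContent() + " " + price.getAttribute("currency"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read invoice", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeItems(INVOICE)) {
            System.out.println(line);
        }
    }
}
```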

Synergy of XML & Java Technologies

The Java platform's portable code capability has been invaluable in fostering a collaborative environment. For example, the XML-Dev mailing list used Java technology as the basis for a collaborative project called SAX (Simple API for XML). SAX is a Java technology interface that allows applications to integrate with any XML parser to receive notification of parsing events. Every major Java technology-based parser available now supports this interface. It was developed by a group of individuals participating in the mailing list who leveraged the Java platform's portability to speed development and share ideas.

Without Java technology, the SAX developers would have had a much more difficult time building this interface. First, they would have been required to share portable C or C++ code, which is very difficult to create. Second, all of the SAX creators would have needed a C or C++ compiler for their platforms and would have had to build and debug their own versions of the SAX implementation, a time-consuming task at best. Instead, the participants needed only to download a widely available version of the Java Development Kit (JDK™) and a Java technology-based parser that supported the SAX interface.
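The event-notification style of SAX can be sketched as follows. This is an illustration only: it uses the SAX2 DefaultHandler and SAXParserFactory classes bundled with current JDKs rather than the SAX 1.0 interfaces available when this paper was written, and the class name SaxEventsSketch and the sample markup are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {
    /** Parses the XML and returns a log of the parsing events the handler was notified of. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<INVOICE><ITEM><PRICE currency=\"USD\">19.95</PRICE></ITEM></INVOICE>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

The application never sees a document tree; it only reacts to events as the parser encounters them, which is why the same handler can be plugged into any parser that implements the interface.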

Here are some other key synergies that the Java platform shares with the XML standard:

  • The Java platform intrinsically supports the Unicode standard, making child's play of processing an international XML document. For platforms without native Unicode support, the application must implement its own handling of Unicode characters, which adds complexity to the overall solution.
  • The Java technology binding to the W3C Document Object Model (DOM) provides developers with a highly productive environment for processing and querying XML documents. The Java platform can become a ubiquitous runtime environment for processing XML documents.
  • The Java platform's intrinsic support for object-oriented programming means that developers can build applications by creating hierarchies of Java objects. Similarly, the XML specification offers a hierarchical representation of data. Because the Java platform and XML content share this common underlying feature, they are extremely compatible for representing each other's structures.
  • Applications written in the Java programming language that process XML can be reused on any tier in a multi-tiered client/server environment, offering an added level of reuse for XML documents. The same cannot be said of scripting environments or platform-specific binary executables.

Portable Data and Code For The Enterprise

Using XML and Java technologies together improves interoperability with other applications both inside and outside of the enterprise. This section provides some examples of business imperatives that can leverage XML and Java technologies simultaneously.

For those just beginning to explore XML, it is not uncommon to feel that the language is being pitched for every IT ailment. The reason for this is clear: XML delivers interoperability of data across applications and hardware. In today's mostly heterogeneous computing environments, interoperability is still the biggest problem. By virtue of the support it has garnered from the largest vendors in the world, such as IBM, Oracle, Sun Microsystems, Microsoft, and SAP, XML delivers like no other computing initiative since ASCII. The result of vendor support is immediate data interoperability with perhaps one small requirement of adherence to a selected vocabulary.

The following is a brief categorical breakdown of the types of tasks that have become less complex thanks to the XML standard, and, where applicable, a description of how XML and Java technologies simplify these tasks.

Electronic Data Exchange and E-Commerce

Processing data from other departments and/or enterprises should be a simple task given the industry's vast knowledge of communications, networking, and data processing, but unfortunately, that's not the case. Validating data format and ensuring content correctness are still major hurdles to achieving simple, automated exchanges of data. Using XML technology as the format for data exchange may quickly remedy most of this problem for the following reasons:

  • Electronic data exchange of non-standard data formats requires developers to build proprietary parsers for each data format. XML technology eliminates this requirement by using a standard XML parser.
  • An XML parser can immediately provide some content validation by ensuring that all the required fields are provided and are in the right order. This function, however, requires the availability of a DTD. Additional content validation is possible by developing applications using the W3C Document Object Model--an application programming interface that facilitates exploration of XML documents--that apply field validation rules to content by element.

Additionally, content and format validation can be completed outside of the processing application and perhaps even on a different machine. The effect of this approach is two-fold. First, it reduces the resources used on the processing machine and speeds up the processing application's overall throughput, since the application does not need to validate the data first. Second, the approach offers companies the opportunity to accept or deny the data at time of receipt instead of requiring them to handle exceptions during processing.
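Field-level content validation through the Document Object Model, of the kind a receiving party might run before accepting a document, could look roughly like this. The rules shown (a PRICE must be a non-negative number and carry a three-letter currency attribute) and all element and class names are hypothetical; the code assumes the DOM API from current JDKs.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FieldValidationSketch {
    /** Applies hypothetical business rules to every PRICE element and returns the problems found. */
    static List<String> problems(String xml) {
        List<String> found = new ArrayList<>();
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Document is not well formed", e);
        }
        NodeList prices = doc.getElementsByTagName("PRICE");
        for (int i = 0; i < prices.getLength(); i++) {
            Element price = (Element) prices.item(i);
            String amount = price.getTextContent().trim();
            try {
                if (new BigDecimal(amount).signum() < 0) {
                    found.add("PRICE must not be negative: " + amount);
                }
            } catch (NumberFormatException e) {
                found.add("PRICE is not a number: " + amount);
            }
            // getAttribute returns an empty string when the attribute is absent.
            if (!price.getAttribute("currency").matches("[A-Z]{3}")) {
                found.add("PRICE needs a three-letter currency code");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String good = "<INVOICE><ITEM><PRICE currency=\"USD\">19.95</PRICE></ITEM></INVOICE>";
        String bad = "<INVOICE><ITEM><PRICE>abc</PRICE></ITEM></INVOICE>";
        System.out.println("good: " + problems(good));
        System.out.println("bad:  " + problems(bad));
    }
}
```

Because the check needs only the document itself, it can run at the point of receipt, on a different machine from the application that later processes the data.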

When XML markup is combined with Java technology it becomes significantly easier to build electronic data exchange applications for a couple of reasons. First, the Java platform is Internet-enabled, which immediately facilitates connectivity over TCP/IP between the exchanging parties. As a result, these parties can use the Internet as an exchange transport. Moreover, technically sophisticated enterprises can provide the tools and technologies to help less sophisticated ones participate in electronic data exchange.

In addition, both XML and the Java platform intrinsically support Unicode character sets so both environments enable enterprises to support development of internationalized applications. Using the Unicode standard, applications can represent characters in multiple national languages. With XML markup as the format for data exchange and an internationalized application written in the Java language for processing, XML documents can be exchanged globally.
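As a small illustration of this shared Unicode support, the sketch below encodes the same multilingual text as UTF-8 and as UTF-16 and lets the parser decode both back into an ordinary Java string. The GREETING element and the class name are hypothetical, and the code assumes the parser bundled with current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class UnicodeXmlSketch {
    /** Wraps plain text (no markup characters) in an XML document encoded with the given charset. */
    static byte[] encode(String text, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                   + "<GREETING>" + text + "</GREETING>";
        return xml.getBytes(charset);
    }

    /** Parses the bytes; the parser works out the encoding from the document itself. */
    static String decode(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // German and Japanese text written with Unicode escapes to keep this source file ASCII.
        String greeting = "Gr\u00fc\u00dfe \u3053\u3093\u306b\u3061\u306f";
        System.out.println(greeting.equals(decode(encode(greeting, StandardCharsets.UTF_8))));
        System.out.println(greeting.equals(decode(encode(greeting, StandardCharsets.UTF_16))));
    }
}
```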

Electronic Data Interchange (EDI)

EDI is a special category of data exchange that nearly always uses a VAN (Value-Added Network) as the transmission medium. It relies on either the X12 or EDIFACT standards to describe the documents that are being exchanged. Currently, EDI is a very expensive environment to install and possibly requires customization depending upon the terms established by the exchanging parties. For this reason, there are a number of enterprises and independent groups examining the XML language as a possible format for X12 and EDIFACT documents, although no decisions have been reached.

However, one area where XML can provide immediate value is in establishing a vocabulary and format for the definition of EDI documents. This is especially useful when one trading partner has extended the base X12 and EDIFACT documents for its own internal use. Using XML data, trading partners could communicate the schema of their EDI documents. Longer term, this information may potentially become an automated part of the exchange process, thus simplifying and reducing implementation costs.

Enterprise Application Integration (EAI)

Enterprise Application Integration (EAI) is best described as making two or more disparate applications act as one single application. This is a complex task that requires data to be replicated and distributed to the right systems at the right time. For example, when integrating accounting and sales systems, it may be necessary for the sales system to send sales orders to the accounting system to generate invoices. Furthermore, the accounting system must send invoice data into the sales system to update the sales representatives. If done correctly, a single sales transaction will generate the sales order and the invoice automatically, thus eliminating the potentially erroneous manual re-entry of data.

An enterprise can accomplish EAI by using many methods, some of which will be made easier by using XML markup. For example, when integrating applications using messaging, the communicating applications must agree on the message formats. Since there is little chance that two disparate applications will share similar data structures, an interim format capable of handling semi-structured data is needed. XML makes it possible for EAI solutions to represent semi-structured data easily.

Another form of integration uses shared data mediums, such as a database or memory. Business data is aggregated from multiple data sources, such as legacy applications and databases, and presented as a semi-structured document to other applications. In the case of the accounting and sales systems integration, the aggregated data set would contain all the data necessary to represent a complete sales transaction to any other system. This document would then be stored in the shared data medium and accessed as needed by the sales or accounting system.

Because the Java platform supports connectivity to a diverse set of middleware services, such as databases, transaction processing monitors, asynchronous messaging systems, and object request brokers, it makes an excellent tool for developing EAI applications. The Java enterprise application programming interfaces (APIs), which include Enterprise JavaBeans™ architecture, Java Interface Definition Language (Java IDL), Java Database Connectivity (JDBC™), Java Message Service (JMS™), Java Naming and Directory Interface™ (JNDI), and Java Transaction Service (JTS) APIs, let developers access many of the tools used for integration with non-Java technology environments. XML lets developers represent Java object data as it travels in and out of the Java virtual machine and across non-Java technology-based middleware.

Publishing

The XML language retains many of SGML's capabilities and is as useful to publishing as its parent is. Indeed, many of the initiatives surrounding the XML specification within the W3C focus on publishing objectives. XML can be used to provide print publishing with ways of organizing content for maximum reusability. For example, it can be used to represent the common and car-specific sections of an automotive user manual. This provides maximum reusability of content and streamlines production practices.

In addition to simplifying content organization, XML technology also simplifies generation to multiple output mediums. For example, the Extensible Stylesheet Language (XSL)--an application of XML that is used to describe how to transform a document--can be used to generate output from a single XML document to a myriad of devices, such as printers, plotters, and print presses.

Besides supporting traditional printing objectives, XML technology can also be used for newer electronic publishing tasks. That is, XML can be used to "mark up" images, video streams, audio streams, and other assorted binary data objects. This provides a way to index, search, and manipulate streams within applications.

However, there is more to publishing than just organizing content and reproduction. Workflow and storage are two key aspects of a robust publishing environment that require specific business logic to implement. This is where Java technology comes into play. The Java platform's connectivity to the network and storage environments makes it a perfect platform on which to implement publishing systems. With Java technology, it is possible to build a robust shared publishing environment that can handle authoring, processing, distribution, workflow, and storage.
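Generating more than one output from a single XML document can be sketched with two small stylesheets. This is an illustration under assumptions: it uses XSLT 1.0 and the javax.xml.transform API from current JDKs, both of which were finalized after this paper was written, and the invoice markup, stylesheets, and class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslTransformSketch {
    // Hypothetical source document.
    static final String INVOICE =
          "<INVOICE>"
        + "<ITEM><DESCRIPTION>Widget</DESCRIPTION><PRICE>19.95</PRICE></ITEM>"
        + "<ITEM><DESCRIPTION>Gadget</DESCRIPTION><PRICE>7.50</PRICE></ITEM>"
        + "</INVOICE>";

    // Plain-text receipt: one line per item.
    static final String RECEIPT_XSL = stylesheet("text",
          "<xsl:for-each select=\"ITEM\">"
        + "<xsl:value-of select=\"DESCRIPTION\"/><xsl:text> </xsl:text>"
        + "<xsl:value-of select=\"PRICE\"/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>");

    // HTML fragment: one list entry per item.
    static final String HTML_XSL = stylesheet("html",
          "<ul><xsl:for-each select=\"ITEM\">"
        + "<li><xsl:value-of select=\"DESCRIPTION\"/></li>"
        + "</xsl:for-each></ul>");

    /** Wraps a template body for the INVOICE element in a complete stylesheet. */
    static String stylesheet(String method, String body) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"" + method + "\"/>"
             + "<xsl:template match=\"/INVOICE\">" + body + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    /** Applies the stylesheet to the document and returns the generated output. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INVOICE, RECEIPT_XSL));
        System.out.println(transform(INVOICE, HTML_XSL));
    }
}
```

The source document is never edited; each output medium gets its own stylesheet.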

Software Development

Three key areas of software development that XML has impacted are the sharing of application architectures, the building of declarative environments, and scripting facilities. Each of these will be discussed further in this section.

In February 1999, the OMG (Object Management Group) publicly stated its intention to adopt the XMI (XML Metadata Interchange) specification. XMI is an XML-based vocabulary that describes application architectures that are designed using the Unified Modeling Language (UML). UML is a standard set of rules that describe system elements and the relationships between them. With the adoption of XMI, it becomes possible to share a single UML model across a large-scale development team that is using a diverse set of application development tools. This level of communication over a single design makes large-scale development teams much more productive. Also, because the model is represented in XML, it easily can be centralized in a repository, which makes it easier to maintain and change the model as well as provide overall version control.

XMI illustrates how XML simplifies the software development process, but it also can simplify design of overall systems. Since XML content is embodied in a document that must be parsed to provide value, it is a given that an XML technology-based application will be a declarative application. A declarative application decides for itself what a document means. In contrast, an imperative application will make assumptions about the document it is processing based on predefined logic. The Java compiler is imperative because it expects any file it is asked to compile to be a Java source file. A declarative environment would first parse the file, examine it, and make a decision about the type of document it is. Then, based on this information, the declarative application would take a course of action.
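The parse, examine, then decide sequence described above can be sketched as a small dispatcher that inspects the root element before choosing an action. The document types and routing decisions here are hypothetical, and the code assumes the DOM parser from current JDKs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeclarativeDispatchSketch {
    /** Parses the document first, then chooses a course of action from what it finds. */
    static String handle(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Document is not well formed", e);
        }
        // The root element name tells the application what kind of document it received.
        String type = doc.getDocumentElement().getTagName();
        switch (type) {
            case "INVOICE":
                return "route to accounting";
            case "SALES_ORDER":
                return "route to sales";
            default:
                return "reject unknown document type " + type;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("<INVOICE/>"));
        System.out.println(handle("<SALES_ORDER/>"));
        System.out.println(handle("<MEMO/>"));
    }
}
```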

The concept of declarative environments is extremely popular right now, especially when it comes to business rules processing. These applications allow developers to declare a set of rules that then get submitted to a rules engine, which will match behavior (actions) to rules for each piece of data it examines. XML technology can also provide developers the ability to develop and process their own action (scripting) languages. The XML language is a meta-language so it can be used to create any other language, including a scripting language. This is a powerful use of XML technology that the industry is just starting to explore.

Sun, XML Technology, and the Java Platform

Sun Microsystems has had a long association with the XML specification. At this time Sun will extend its support to ensure that the enterprise has access to XML technology from within the Java platform. This section outlines Sun's vision of XML technology as it relates to the Java platform. It also illustrates how XML and Java technologies will work together to provide a portable code/portable data environment.

Since early 1998, early adopters of the XML specification have been using Java technology to parse XML and build XML applications for a variety of reasons. Java technology's portability provides developers with an open and accessible market for sharing their work, and XML data portability provides the means to build declarative, reusable application components.

Development efforts within the XML community clearly illustrate this benefit. In contrast to many other technology communities, those building on XML technology always have been driven by the need to remain open and facilitate sharing. Java technology has enabled these communities to share markup languages as well as code to process markup languages across most major hardware and operating system platforms.

Sun Microsystems' vision for XML and Java technologies is to provide a platform that embodies portable data and portable maintainable code to produce platform-independent standards-based applications. Sun Microsystems clearly recognizes that the enterprise IT community has accepted XML technology because of its ability to simply represent semi-structured data. And, the availability of Java virtual machines on every major hardware platform and operating system means that users will have the ability to process that semi-structured data anywhere in the enterprise.

Java Platform Standard Extension for XML Technology

To further help users leverage the power of XML and the Java platform, Sun Microsystems is working through the Java Community Process to develop a Java platform standard extension for the XML specification. The Java Community Process is a formal process for developing Java specifications, including standard extensions to the Java platform. It is an inclusive, consensus-building process intended to rapidly produce a high-quality specification together with a reference implementation and an associated suite of compatibility tests.

The Java Platform Standard Extension for XML technology proposes to provide basic XML functionality to read, manipulate, and generate text. This functionality will conform to the XML 1.0 specification and will leverage existing efforts around Java technology APIs for XML technology, including the W3C Document Object Model (DOM) Level 1 Core Recommendation and the SAX (Simple API for XML) programming interface version 1.0.
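The read, manipulate, and generate cycle can be sketched with the XML APIs that ship in current JDKs (javax.xml.parsers and javax.xml.transform). Those APIs postdate this paper and may differ from what was proposed at the time; the STATUS element and the class name below are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ReadModifyWriteSketch {
    /** Reads the XML text, appends a STATUS element to the root, and writes the result back out. */
    static String addStatus(String xml, String status) {
        try {
            // Read: turn the text into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Manipulate: add a new child element.
            Element statusElement = doc.createElement("STATUS");
            statusElement.setTextContent(status);
            doc.getDocumentElement().appendChild(statusElement);

            // Generate: serialize the tree with an identity transform.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addStatus("<INVOICE><TOTAL>27.45</TOTAL></INVOICE>", "PAID"));
    }
}
```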

The intent of supporting an XML technology standard extension is to:

  • Ensure that it is easy for Java developers to use XML and for XML developers to use Java technologies
  • Provide a base from which to add XML features in the future
  • Provide a standard for the Java platform to ensure compatible and consistent implementations
  • Ensure a high-quality integration with the Java platform

The Java Community Process gives Java technology users the opportunity to participate in the active growth of the Java platform. These extensions eventually will become supported standards within the Java platform, thus providing consistency for applications written in the Java programming language going forward. The Java Platform Standard Extension for XML technology will offer companies a standard way to create and process XML documents within the Java platform.

XML Technology Makes Sense For the Java Platform

To reiterate, as a result of formatting data using XML technology, data is interoperable across heterogeneous systems. To meet the objectives of the enterprise, the Java platform must also interoperate with existing applications and systems. The XML language provides a data-centric method of moving data between Java technology and non-Java technology platforms. While CORBA represents the method of obtaining interoperability in a process-centric manner, it is not always possible to use CORBA connectivity. In these cases the XML language does an excellent job of representing the state of Java objects as they leave and re-enter the Java virtual machine.
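Representing the state of a Java object as it leaves and re-enters the Java virtual machine might look like the following round trip. This is only a sketch: the Account class, the ACCOUNT and HOLDER markup, and the method names are hypothetical, and the code uses the DOM and transform APIs from current JDKs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ObjectStateSketch {
    /** A plain Java object whose state needs to cross a non-Java boundary. */
    static final class Account {
        final String number;
        final String holder;

        Account(String number, String holder) {
            this.number = number;
            this.holder = holder;
        }
    }

    /** Writes the object's state as XML; the DOM serializer escapes special characters. */
    static String toXml(Account account) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("ACCOUNT");
            root.setAttribute("number", account.number);
            Element holder = doc.createElement("HOLDER");
            holder.setTextContent(account.holder);
            root.appendChild(holder);
            doc.appendChild(root);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize account", e);
        }
    }

    /** Rebuilds an equivalent object from the XML representation. */
    static Account fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            String holder = root.getElementsByTagName("HOLDER").item(0).getTextContent();
            return new Account(root.getAttribute("number"), holder);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read account", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new Account("12345", "Smith & Sons"));
        System.out.println(xml);
        System.out.println(fromXml(xml).holder);
    }
}
```

Any system that can parse XML, whether or not it is written in the Java programming language, could read or produce the intermediate text.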

XML will also be used to define deployment descriptors for Enterprise JavaBeans™ (EJB) architecture. Deployment descriptors describe for EJB implementations the rules for packaging and deploying an Enterprise JavaBeans component. According to Sun Microsystems, the next release of EJB will use XML technology for deployment descriptors, providing, once again, data interoperability for the Java platform. It will also be used as a standard for transmission of mission-critical business data inside of the Java 2 platform.

Conclusions

XML technology holds much promise for the future. It is an industry-wide recognized language for building representations of semi-structured data that can be shared within and between enterprises. However, XML lets companies describe only the data and its structure. Additional processing logic must be applied to ensure document validity, to transport the documents to interested parties, and to transform the data into a form more useful to everyday business systems.

The Java platform offers the enterprise a method for building libraries once to handle the creation, maintenance, distribution, and processing of XML documents and to leverage that process across a multitude of hardware and operating system platforms. Sun Microsystems' initiative to extend the Java platform to incorporate standard processing of XML technology will give companies the security they need to build these libraries without fearing the rapidly changing programming interfaces and functionality.

XML and Java technologies are clearly the two most important developments for Internet computing since the advent of the original ARPAnet project, the basis for today's Internet. Whether used together or separately, these two standards empower the enterprise to forge new electronic partnerships that leverage the availability and ubiquity of the Internet to exchange and share data.

Appendix A: Resources

http://www.w3.org/TR/1998/REC-xml-19980210 - The Extensible Markup Language 1.0

http://java.sun.com/xml - Provides news, information, and links to other resources.

http://java.sun.com/jdc - Provides preview of technologies and is home to the Java Community Process.

http://www.jxml.com - JXML Inc.'s home page and home of XML and Java technologies initiatives. Also home of the Java-XML Mailing List.

Appendix B: About NC.Focus

NC.Focus was founded in 1996 and has become a leading technology research and advisory firm specializing in enterprise application integration (EAI) techniques, tools, and technologies. We provide the unique service of identifying the impact and benefits of enterprise application integration trends and technologies on business, incorporated with an in-depth explanation of how the tools and technologies operate. As a result, we empower our clients to better identify and choose technologies based upon specific organizational needs and requirements.

For additional information, or to subscribe to the NC.Focus services, please contact JP Morgenthal, president and Director of Research for NC.Focus, at (516) 792-0997 or jp@ncfocus.com.


 

This page was updated: 29-Mar-99


Copyright © 1995-99 Sun Microsystems, Inc.
All Rights Reserved. Legal Terms. Privacy Policy.