Prepared for Sun Microsystems, Inc. by:
Director of Research, NC.Focus
FAX: (516) 792-0996
Origins of the XML Standard
Synergy of XML & Java Technologies
Portable Data and Code For the Enterprise
Electronic Data Exchange and E-Commerce
Electronic Data Interchange (EDI)
Enterprise Application Integration (EAI)
Sun, XML Technology, and the Java Platform
Java Platform Standard Extension for XML Technology
XML Technology Makes Sense for the Java Platform
Appendix A: Resources
Appendix B: About NC.Focus
Prior to 1998, the exchange of data and documents was limited to
proprietary or loosely defined document formats. But the advent of
Hypertext Markup Language (HTML)--the presentation markup language for
displaying interactive data in a Web browser--offered the enterprise a
standard format for exchange with a focus on interactive visual
content. However, HTML is rigidly defined and cannot support all
enterprise data types, and those shortcomings provided the impetus to
create the Extensible Markup Language (XML). The XML standard allows the
enterprise to define its own markup languages with emphasis on
specific tasks, such as electronic commerce, supply-chain integration,
data management, and publishing.
For those reasons, XML is rapidly becoming the strategic instrument
for defining corporate data across a number of application domains.
The properties of XML markup make it suitable for representing data,
concepts, and contexts in an open, platform-, vendor-, and
language-neutral manner. It uses tags--identifiers that signal the
start and end of a related block of data--to create a hierarchy of
related data components called elements. In turn, this hierarchy of
elements provides context--implied meaning based on location--and
encapsulation. As a result, there is a greater opportunity to reuse
this data outside of the application and data sources from which it
originated.
XML technology has already been successfully used to furnish solutions for
mission-critical data exchange, publishing, and software development.
Additionally, XML has become the incentive for groups of companies
within a specific industry to work together to define
industry-specific markup languages (sometimes referred to as
vocabularies). These initiatives create a foundation for information
sharing and exchange across an entire domain rather than on a
Sun Microsystems, along with other major vendors such as IBM, Novell,
Oracle, and even Microsoft, is a strong supporter of the XML standard.
Indeed, Sun Microsystems coordinated and underwrote the World Wide Web
Consortium (W3C) working group that delivered the XML specification.
Sun is also the creator of the Java(TM) platform--a family of specifications that
form a ubiquitous application development and runtime environment. It
is now Sun Microsystems' intention to ensure that XML technology and
the Java platform join in a way that is complementary to both.
XML and Java technologies have many complementary features, and when
used in combination they enable a powerful platform for sharing and
processing of data and documents. While XML can clearly define data
and documents in an open and neutral manner, there is still a need to
develop applications that can process it. The Java platform offers a
homogeneous computing environment with portable code that can be
downloaded over a network to any Java virtual machine. Together, XML
and Java technologies allow enterprises to apply Write Once, Run
Anywhere(TM) fundamentals to the processing of data and documents
generated by both Java technology and
non-Java technology sources. By extending the Java platform standards
to include XML technology, companies will obtain a long-term secure
solution for including support for XML technologies in their
applications written in the Java programming language.
The purpose of this paper is two-fold: To introduce the Extensible
Markup Language (XML), as well as how it benefits the enterprise, and
to explain the cooperative environment formed by integrating XML and
Java technologies into a solution. Readers familiar with XML may want
to concentrate on the sections that specifically deal with the
discussion of using XML with Java technology.
The Extensible Markup Language (XML) is a syntax for developing
specialized markup languages, which add identifiers, or tags, to
certain characters, words, or phrases within a document so that they
may be recognized and acted upon during future processing. "Marking
up" a document or data results in the formation of a hierarchical
container that is platform-, language-, and vendor-independent and
separates the content from any environment that may process it.
Because XML is a recommendation of the W3C (World Wide Web
Consortium), the group responsible for creating and maintaining all
core Web technical specifications, it reflects a true industry accord
that provides the first real opportunity to liberate the business
intelligence that is trapped within disparate data sources found in
the enterprise. XML does this by providing a format that can represent
structured and unstructured data, along with rich descriptive
delimiters, in a single atomic unit. In other words, XML can represent
data found in common data sources, such as databases and applications,
but also in non-traditional data sources, such as word processing
documents and spreadsheets. Previously, non-traditional data sources
were constrained by proprietary data formats and hardware and
operating system platform differences.
W3C has released and maintains the Extensible Markup Language 1.0 as
the official specification that defines the rationale behind the
development of XML and the rules for processing XML-formatted data and
documents (see Appendix A: Resources for associated Web links).
Origins of the XML Standard
The process of making SGML simpler and Internet-aware gave rise to
the XML specification. This section explains the process that led to
the development and adoption of XML technology as a W3C
Recommendation.
The first standardized markup language, SGML (Standard Generalized
Markup Language), is still a heavily used international standard
maintained by the ISO (International Organization for
Standardization). SGML
gave the publishing industry a machine- and process-independent method
of separating content from presentation. In publishing, the
presentation is usually a form of printed medium and the machines that
support those objectives. SGML simply lets authors define the
characteristics of the print version without requiring them to include
To date, HTML (Hypertext Markup Language) is the most popular
application of SGML. It acts as the presentation syntax that Web
browsers use to render documents visually. Clearly, the Web is one of
today's most powerful communication vehicles, illustrating the
importance of authoring in a markup language. However, HTML is too
specific to represent information generically, and SGML is too
cumbersome to use in tandem with the Web. The XML language therefore
emerged as a simpler, generalized markup for the Web.
The distinction between SGML and HTML spurred the development of the
XML specification. Jon Bosak, an engineer at Sun Microsystems and
generally regarded as the "father of XML", realized the limitations of
HTML early on. Bosak had used SGML extensively for managing technical
documentation on behalf of large vendors, first Novell and then Sun
Microsystems. This experience led Bosak to raise expectations for
publishing on the Web and to demand nothing less powerful than SGML
as the delivery tool.
Bosak's persistence prompted the W3C to recognize SGML and its
associated style sheet language, DSSSL. He was also offered the
opportunity to lead the Web SGML Activity (later renamed XML). Part
of Bosak's responsibility was to obtain funding for the W3C Activity
and build the team to design the specification. Bosak did both; Sun
Microsystems underwrote the effort and a number of SGML experts
participated in the development of the specification.
XML technology enables companies to develop application-specific
languages that better describe their business data. This section
provides a brief overview of what it means to use XML data and what an
XML document looks like.
By applying XML technology, one is essentially creating a new markup language.
For example, an application of the XML language would produce the likes of an
Invoice Markup Language or a Book Layout Markup Language. Each markup
language should be specific to its creator's individual needs and
Part of creating a markup language includes defining the elements,
attributes, and rules for their use. In the XML language, this information is
stored inside of a document type definition (DTD). A DTD may be
included within an XML document or stored externally. If the DTD is
stored externally, the XML document must provide a reference to it.
If a document does provide a DTD and adheres to the rules specified
in that DTD, then it is valid.
The following is an example of a Document Type Definition that defines
an element named BILLING_PARTY along with both its required and
optional sub-elements:
The example states that the element BILLING_PARTY must have one
sub-element named ACCOUNT_NUMBER directly following it and optionally
may be followed by any of the contact information fields.
Of note, it is not a requirement that a DTD be provided. Documents
without DTDs that follow the rules of the XML specification are
designated as well formed, but not valid. An XML parser can identify
whether a document is well formed and valid.
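The distinction between well-formed and valid documents can be sketched in Java. This is a minimal sketch rather than a definitive implementation: it assumes the javax.xml.parsers API found in current JDKs, which this paper does not name, and a hypothetical DTD for BILLING_PARTY in which the contact field names (NAME, ADDRESS, PHONE) are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    enum Result { NOT_WELL_FORMED, WELL_FORMED_ONLY, VALID }

    // Hypothetical internal DTD; the contact field names are assumptions.
    static final String DTD =
        "<!DOCTYPE BILLING_PARTY [\n"
      + "  <!ELEMENT BILLING_PARTY (ACCOUNT_NUMBER, (NAME | ADDRESS | PHONE)*)>\n"
      + "  <!ELEMENT ACCOUNT_NUMBER (#PCDATA)>\n"
      + "  <!ELEMENT NAME (#PCDATA)>\n"
      + "  <!ELEMENT ADDRESS (#PCDATA)>\n"
      + "  <!ELEMENT PHONE (#PCDATA)>\n"
      + "]>\n";

    static Result check(String xml) {
        final boolean[] invalid = { false };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to apply the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations are reported as recoverable errors.
                public void error(SAXParseException e) { invalid[0] = true; }
                // Well-formedness violations are fatal.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return Result.NOT_WELL_FORMED;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return invalid[0] ? Result.WELL_FORMED_ONLY : Result.VALID;
    }

    public static void main(String[] args) {
        String body = "<BILLING_PARTY><ACCOUNT_NUMBER>42</ACCOUNT_NUMBER></BILLING_PARTY>";
        System.out.println(check(DTD + body)); // document with a DTD it obeys
        System.out.println(check(body));       // same document without a DTD
    }
}
```

A document that omits the DTD is reported as well formed only, which matches the description above: without a DTD there is nothing to be valid against.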
The following is an example of a well-formed XML document:
This example illustrates how XML can provide developers with the
ability to define application-specific tags, such as <INVOICE> and
<BILLING_PARTY>. But, it is the resulting markup language that gives
the enterprise the power to leverage and reuse this invoice
description across many applications.
For example, the invoice document could be rendered into HTML and
displayed to the user in a Web shopping scenario, it could be
delivered to a Point-of-Sale (POS) terminal in a store to be rendered
into a receipt, and it could be sent to the back-office where it would
be used to update the accounting and inventory systems. Additionally,
this XML document could be generated by an existing sales application,
illustrating how the output of one system can be used as input to
another and thereby providing a simple means by which application
integration can occur.
This example also illustrates how XML technology supports
semi-structured data. First, it provides encapsulation, which tells
us where data starts and stops with regard to a single element.
Second, it provides context: a price element nested inside of an
item element tells us that the price relates to that single item.
Finally, the XML language
provides meta-information, such as currency on price. By using the
power of attributes to represent currency, this same format can be
leveraged across the globe for multiple monetary concerns. Best of
all, this format is extensible, which means that any one company could
extend it to support data that is specific to its needs.
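The three properties just described (encapsulation, context, and meta-information) can be seen by reading such a document through the W3C DOM from Java. The sketch below is illustrative only: the invoice content and the element names ITEM, NAME, and PRICE are assumptions, as is the currency attribute spelling, and the javax.xml.parsers API is assumed to be available as in current JDKs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InvoiceReader {
    // Hypothetical invoice; element and attribute names are assumptions.
    static final String INVOICE =
        "<INVOICE>"
      + "<BILLING_PARTY><ACCOUNT_NUMBER>1001</ACCOUNT_NUMBER></BILLING_PARTY>"
      + "<ITEM><NAME>Widget</NAME><PRICE currency=\"USD\">9.95</PRICE></ITEM>"
      + "<ITEM><NAME>Gadget</NAME><PRICE currency=\"EUR\">20.00</PRICE></ITEM>"
      + "</INVOICE>";

    static String describeItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            NodeList items = doc.getElementsByTagName("ITEM");
            for (int i = 0; i < items.getLength(); i++) {
                // Encapsulation: each ITEM element bounds one item's data.
                Element item = (Element) items.item(i);
                // Context: this PRICE belongs to the ITEM that contains it.
                Element price = (Element) item.getElementsByTagName("PRICE").item(0);
                String name = item.getElementsByTagName("NAME").item(0).getTextContent();
                // Meta-information: the currency travels as an attribute.
                out.append(name).append(": ").append(price.getTextContent())
                   .append(' ').append(price.getAttribute("currency")).append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describeItems(INVOICE));
    }
}
```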
Synergy of XML & Java Technologies
The Java platform's portable code capability has been invaluable in
fostering a collaborative environment. For example, the XML-Dev
mailing list used Java technology as the basis for a collaborative
project called SAX (Simple API for XML). SAX is a Java technology
interface that allows applications to integrate with any XML parser to
receive notification of parsing events. Every major Java
technology-based parser available now supports this interface. It was
developed by a group of individuals participating in the mailing list
who leveraged Java platform's portability to speed development and
Without Java technology, the SAX developers would have had a much more
difficult time building this interface. First, they would have been
required to share portable C or C++ code, which is very difficult to
create. Second, all of the SAX creators would have needed a C or C++
compiler for their platforms and would have had to build and debug
their own versions of the SAX implementation, a time-consuming task
at best. Instead, the participants needed only to download a widely
available version of the Java Development Kit (JDK(TM)) and a Java
technology-based parser that supported the SAX interface.
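To make "notification of parsing events" concrete, here is a minimal sketch of an application receiving SAX callbacks. It assumes the org.xml.sax and javax.xml.parsers packages as shipped in current JDKs (a later form of the interface than the one available when SAX was first developed), and the class name SaxEvents is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEvents {
    // Returns the element start/end events in the order the parser reports them.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            // Any SAX-capable parser can sit behind the factory; the
            // application only sees the callbacks.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<INVOICE><ITEM/></INVOICE>"));
    }
}
```

The application never builds a tree in memory; it simply reacts to each event, which is why the same handler can sit on top of any parser that supports the interface.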
Here are some other key synergies that the Java platform shares with the XML
language:
- The Java platform intrinsically supports the Unicode standard,
making child's play of processing an international XML document. For
platforms without native Unicode support, the application must
implement its own handling of Unicode characters, which adds
complexity to the overall solution.
- The Java technology binding to the W3C Document Object Model (DOM)
provides developers with a highly productive environment for
processing and querying XML documents. The Java platform can become a
ubiquitous runtime environment for processing XML documents.
- The Java platform's intrinsic support of object-oriented
programming means that developers can build applications by creating
hierarchies of Java objects. Similarly, the XML specification offers a
hierarchical representation of data. Because the Java platform and
XML content share this common underlying feature, they are extremely
compatible for representing each other's structures.
- Applications written in the Java programming language that process
XML can be reused on any tier in a multi-tiered client/server
environment, offering an added level of reuse for XML documents. The
same cannot be said of scripting environments or platform-specific
Portable Data and Code For The Enterprise
When XML and Java technologies are used together, applications
interoperate more readily with other applications both inside and
outside of the enterprise. This section provides some examples of
business imperatives that can leverage XML and Java technologies
For those just beginning to explore XML, it is not uncommon to feel
that the language is being pitched for every IT ailment. The reason
for this is clear: XML delivers interoperability of data across
applications and hardware. In today's mostly heterogeneous computing
environments, interoperability is still the biggest problem. By
virtue of the support it has garnered from the largest vendors in the
world, such as IBM, Oracle, Sun Microsystems, Microsoft, and SAP, XML
delivers like no other computing initiative since ASCII. The result
of vendor support is immediate data interoperability with perhaps one
small requirement of adherence to a selected vocabulary.
The following is a brief categorical breakdown of the types of tasks
that have become less complex thanks to the XML standard, and, where
applicable, a description of how XML and Java technologies simplify
Electronic Data Exchange and E-Commerce
Processing data from other departments and/or enterprises should be a
simple task given the industry's vast knowledge of communications,
networking, and data processing, but unfortunately, that's not the
case. Validating data format and ensuring content correctness are
still major hurdles to achieving simple, automated exchanges of data.
Using XML technology as the format for data exchange may quickly
remedy most of this problem for the following reasons:
- Electronic data exchange of non-standard data formats requires
developers to build proprietary parsers for each data format. XML
technology eliminates this requirement by using a standard XML parser.
- An XML parser can immediately provide some content validation by
ensuring that all the required fields are provided and are in the
right order. This function, however, requires the availability of a
DTD. Additional content validation is possible by developing
applications using the W3C Document Object Model--an application
programming interface that facilitates exploration of XML
documents--that apply field validation rules to content by
Additionally, content and format validation can be completed outside
of the processing application and perhaps even on a different machine.
The effect of this approach is two-fold. First, it reduces the
resources used on the processing machine and speeds up the processing
application's overall throughput, since it does not need to validate
the data first. Second, the approach offers companies the opportunity to
accept or deny the data at time of receipt instead of requiring them
to handle exceptions during processing.
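The accept-or-deny step described above can be sketched as a small gate that runs at time of receipt, before the processing application sees the document. The field rules shown (a numeric ACCOUNT_NUMBER, a currency attribute on every PRICE) and the class name ReceiptGate are illustrative assumptions, not rules taken from any standard vocabulary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ReceiptGate {
    // Returns a list of problems; an empty list means the document is accepted.
    static List<String> problems(String xml) {
        List<String> problems = new ArrayList<>();
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("not well formed"); // format check fails first
            return problems;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // Hypothetical field rule: exactly one numeric account number.
        NodeList accounts = doc.getElementsByTagName("ACCOUNT_NUMBER");
        if (accounts.getLength() != 1) {
            problems.add("expected exactly one ACCOUNT_NUMBER");
        } else if (!accounts.item(0).getTextContent().trim().matches("\\d+")) {
            problems.add("ACCOUNT_NUMBER must be numeric");
        }
        // Hypothetical field rule: every price states its currency.
        NodeList prices = doc.getElementsByTagName("PRICE");
        for (int i = 0; i < prices.getLength(); i++) {
            Element price = (Element) prices.item(i);
            if (price.getAttribute("currency").isEmpty()) {
                problems.add("PRICE " + (i + 1) + " has no currency");
            }
        }
        return problems;
    }

    static boolean accept(String xml) {
        return problems(xml).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(problems(
            "<INVOICE><ACCOUNT_NUMBER>12A</ACCOUNT_NUMBER><PRICE>5</PRICE></INVOICE>"));
    }
}
```

Because the gate depends only on the document, it can run on a different machine from the processing application, as the text suggests.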
When XML markup is combined with Java technology it becomes significantly
easier to build electronic data exchange applications for a couple of
reasons. First, the Java platform is Internet-enabled, which
immediately facilitates connectivity over TCP/IP between the
exchanging parties. As a result, these parties can use the Internet
as an exchange transport. Moreover, technically sophisticated
enterprises can provide the tools and technologies to help less
sophisticated ones participate in electronic data exchange.
In addition, both XML and the Java platform intrinsically support
Unicode character sets so both environments enable enterprises to
support development of internationalized applications. Using the
Unicode standard, applications can represent characters in multiple
national languages. With XML markup as the format for data exchange and an
internationalized application written in the Java language for
processing, XML documents can be exchanged globally.
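As a small illustration of the Unicode point, the sketch below parses the same hypothetical CITY document from two different byte encodings and recovers the same Java string. The class and element names are assumptions, and the parser API is the javax.xml.parsers package of current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class UnicodeText {
    // Parses raw bytes and returns the text of the root element.
    // The parser works out the encoding from the byte order mark or the
    // XML declaration, so the caller never decodes bytes by hand.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tokyo = "\u6771\u4eac"; // "Tokyo" written in Japanese
        byte[] utf8 = ("<CITY>" + tokyo + "</CITY>").getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = ("<?xml version=\"1.0\" encoding=\"UTF-16\"?><CITY>" + tokyo + "</CITY>")
            .getBytes(StandardCharsets.UTF_16);
        System.out.println(rootText(utf8).equals(tokyo));
        System.out.println(rootText(utf16).equals(tokyo));
    }
}
```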
Electronic Data Interchange (EDI)
EDI is a special category of data exchange that nearly always uses a
VAN (Value-Added Network) as the transmission medium. It relies on
either the X12 or EDIFACT standards to describe the documents that are
being exchanged. Currently, EDI is a very expensive environment to
install and possibly requires customization depending upon the terms
established by the exchanging parties. For this reason, there are a
number of enterprises and independent groups examining the XML language as a
possible format for X12 and EDIFACT documents, although no decisions
have been reached.
However, one area where XML can provide immediate value is in
establishing a vocabulary and format for the definition of EDI
documents. This is especially useful when one trading partner has
extended the base X12 or EDIFACT documents for its own internal use.
Using XML data, trading partners could communicate the schema of their
EDI documents. Longer term, this information may potentially become
an automated part of the exchange process, thus simplifying and
reducing implementation costs.
Enterprise Application Integration (EAI)
Enterprise Application Integration (EAI) is best described as making
two or more disparate applications act as one single application. This
is a complex task that requires data to be replicated and
distributed to the right systems at the right time. For example, when
integrating accounting and sales systems, it may be necessary for the
sales system to send sales orders to the accounting system to generate
invoices. Furthermore, the accounting system must send invoice data
into the sales system to update the sales representatives. If done
correctly, a single sales transaction will generate the sales order
and the invoice automatically, thus eliminating the potentially
erroneous manual re-entry of data.
An enterprise can accomplish EAI by using many methods, some of which
will be made easier by using XML markup. For example, when integrating
applications using messaging, the communicating applications must
agree on the message formats. Since there is little chance that two
disparate applications share similar data structures, an interim
format capable of handling semi-structured data is needed. XML
provides such a format because it can easily represent
semi-structured data.
Another form of integration uses shared data mediums, such as a
database or memory. Business data is aggregated from multiple data
sources, such as legacy applications and databases, and presented as a
semi-structured document to other applications. In the case of the
accounting and sales systems integration, the aggregated data set
would contain all the data necessary to represent a complete sales
transaction to any other system. This document would then be stored
in the shared data medium and accessed as needed by the sales or
accounting system.
Because the Java platform supports connectivity to a diverse set of
middleware services, such as databases, transaction processing
monitors, asynchronous messaging systems, and object request brokers,
it makes an excellent tool for developing EAI applications. The Java
enterprise application programming interfaces (APIs), which include
the Enterprise JavaBeans(TM) architecture, Java Interface Definition
Language, Java Database Connectivity (JDBC(TM)), Java Message
Service (JMS(TM)), Java Naming and Directory Interface(TM) (JNDI), and
Java Transaction Service (JTS) APIs, let developers access many of the
tools used for integration with non-Java technology environments. XML lets
developers represent Java object data as it travels in and out of the
Java virtual machine and across non-Java technology-based middleware.
The XML language retains much of SGML's capabilities and is as useful
to publishing as is its parent. Indeed, many of the initiatives
surrounding the XML specification within the W3C focus on publishing
objectives. XML can be used to provide print publishing with ways of
organizing content for maximum reusability. For example, it can be
used to represent the similar and car-specific sections of an
automotive user manual. This provides maximum reusability of content
and streamlines production practices.
In addition to simplifying content organization, XML technology also simplifies
generation to multiple output mediums. For example, the Extensible
Stylesheet Language (XSL)--an application of XML that is used to
describe how to transform a document--can be used to generate
output from a single XML document to a myriad of devices, such as
printers, plotters, and print presses.
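A stylesheet-driven transformation of this kind can be sketched from Java as follows. The sketch assumes the javax.xml.transform API of current JDKs and the XSLT 1.0 syntax as later standardized by the W3C; the stylesheet, the invoice element names, and the class name ReceiptRenderer are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReceiptRenderer {
    // Hypothetical stylesheet: one "name price currency; " entry per ITEM.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/INVOICE\">"
      + "<xsl:for-each select=\"ITEM\">"
      + "<xsl:value-of select=\"NAME\"/><xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"PRICE\"/><xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"PRICE/@currency\"/><xsl:text>; </xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an invoice document and returns plain text.
    static String render(String invoiceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(invoiceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(
            "<INVOICE><ITEM><NAME>Widget</NAME>"
          + "<PRICE currency=\"USD\">9.95</PRICE></ITEM></INVOICE>"));
    }
}
```

Swapping in a different stylesheet yields a different output format from the same source document, without any change to the Java code.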
Besides supporting traditional printing objectives, XML technology can also be
used for newer electronic publishing tasks. That is, XML can be used
to "mark up" images, video streams, audio streams, and other assorted
binary data objects. This provides a way to index, search, and
manipulate streams within applications.
However, there is more to publishing than just organizing content and
reproduction. Workflow and storage are two key aspects of a robust
publishing environment that require specific business logic to
implement. And, this is where Java technology comes into play. The
Java platform's connectivity to the network and storage environments
makes it a perfect platform on which to implement publishing systems.
With Java technology, it is possible to build a robust shared
publishing environment that can handle authoring, processing,
distribution, workflow, and storage.
Three key areas of software development that XML has impacted are the
sharing of application architectures, the building of declarative
environments, and scripting facilities. Each of these will be
discussed further in this section.
In February 1999, the OMG (Object Management Group) publicly stated
its intention to adopt the XMI (XML Metadata Interchange)
specification. XMI is an XML-based vocabulary that describes application
architectures that are designed using the Unified Modeling Language
(UML). UML is a standard set of rules that describe system elements
and the relationships between them. With the adoption of XMI, it
becomes possible to share a single UML model across a large-scale
development team that is using a diverse set of application
development tools. This level of communication over a single design
makes large-scale development teams much more productive. Also,
because the model is represented in XML, it easily can be centralized
in a repository, which makes it easier to maintain and change the
model as well as provide overall version control.
XMI illustrates how XML simplifies the software development process,
but it also can simplify design of overall systems. Since XML content
is embodied in a document that must be parsed to provide value, it is
a given that an XML technology-based application will be a
declarative application. A declarative application decides what a
document means for itself. In contrast, an imperative application
will make assumptions about the document it is processing based on
predefined logic. The Java compiler is imperative because it expects
any file it reads to be a Java source file. A declarative environment
would first parse the file, examine it, and make a decision about the
type of document it is. Then, based on this information the
declarative application would take a course of action.
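The parse, examine, and decide sequence can be sketched in a few lines of Java. The document types (INVOICE, SALES_ORDER), the destination names, and the class name DocumentRouter are hypothetical; the point is only that the course of action is chosen after the document has been inspected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DocumentRouter {
    // Parses the document, examines its root element, then decides
    // which (hypothetical) system should handle it.
    static String route(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return "rejected"; // not well formed, so no decision is possible
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        String root = doc.getDocumentElement().getTagName();
        switch (root) {
            case "INVOICE":     return "accounting";
            case "SALES_ORDER": return "sales";
            default:            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("<INVOICE/>"));
        System.out.println(route("<SALES_ORDER/>"));
        System.out.println(route("<MEMO/>"));
    }
}
```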
The concept of declarative environments is extremely popular right
now, especially when it comes to business rules processing. These
applications allow developers to declare a set of rules that then get
submitted to a rules engine, which will match behavior (actions) to
rules for each piece of data it examines. XML technology can also
provide developers the ability to develop and process their own action
(scripting) languages. The XML language is a meta-language so it can
be used to create any other language, including a scripting language.
This is a powerful use of XML technology that the industry is just
starting to explore.
Sun, XML Technology, and the Java Platform
Sun Microsystems has had a long association with the XML
specification. At this time Sun will extend its support to ensure that
the enterprise has access to XML technology from within the Java
platform. This section outlines Sun's vision of XML technology as it
relates to the Java platform. It also illustrates how XML
and Java technologies will work together to provide a portable
code/portable data environment.
Since early 1998, early adopters of the XML specification have been
using Java technology to parse XML and build XML applications for a
variety of reasons. Java technology's portability provides developers
with an open and accessible market for sharing their work and XML data
portability provides the means to build declarative, reusable
Development efforts within the XML community clearly illustrate this
benefit. In contrast to many other technology communities, those
building on XML technology always have been driven by the need to
remain open and facilitate sharing. Java technology has enabled these
communities to share markup languages as well as code to process
markup languages across most major hardware and operating system
platforms.
Sun Microsystems' vision for XML and Java technologies is to provide a
platform that embodies portable data and portable maintainable code to
produce platform-independent standards-based applications. Sun
Microsystems clearly recognizes that the enterprise IT community has
accepted XML technology because of its ability to simply represent
semi-structured data. And, the availability of Java virtual machines
on every major hardware platform and operating system means that users
will have the ability to process that semi-structured data anywhere in
Java Platform Standard Extension for XML Technology
To further help users leverage the power of XML and the Java platform,
Sun Microsystems is working through the Java Community Process to
develop a Java platform standard extension for the XML specification.
The Java Community Process is a formal process for developing Java
specifications and producing standard extensions to the Java
platform. It is an inclusive, consensus-building process intended to
rapidly produce a high-quality specification, and it delivers not only
the specification but also the reference implementation and its
associated suite of compatibility tests.
The Java Platform Standard Extension for XML technology proposes to
provide basic XML functionality to read, manipulate, and generate
text. This functionality will conform to the XML 1.0 specification
and will leverage existing efforts around Java technology APIs for XML
technology, including the W3C Document Object Model (DOM) Level 1 Core
Recommendation and the SAX (Simple API for XML) programming interface
The intent of supporting an XML technology standard extension is to:
- Ensure that it is easy for developers to use XML and for XML
developers to use Java technologies
- Provide a base from which to add XML features in the future
- Provide a standard for the Java platform to ensure compatible and
- Ensure a high-quality integration with the Java platform
The Java Community Process gives Java technology users the opportunity
to participate in the active growth of the Java platform. These
extensions eventually will become supported standards within the Java
platform, thus providing consistency for applications written in the
Java programming language going forward. The Java Platform Standard
Extension for XML technology will offer companies a standard way to
create and process XML documents within the Java platform.
XML Technology Makes Sense For the Java Platform
To reiterate, as a result of formatting data using XML technology,
data is interoperable across heterogeneous systems. To meet the
objectives of the enterprise, the Java platform must also interoperate
with existing applications and systems. The XML language provides a
data-centric method of moving data between Java technology and
non-Java technology platforms. While CORBA represents the method of
obtaining interoperability in a process-centric manner, it is not
always possible to use CORBA connectivity. In these cases the XML
language does an excellent job of representing the state of Java
objects as they leave and re-enter the Java virtual machine.
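One way to picture object state leaving and re-entering the Java virtual machine is a simple round trip through XML. The sketch below is an assumption-laden illustration: the Customer class, the CUSTOMER and NAME element names, and the account attribute are all hypothetical, and the mapping is written by hand with the DOM and javax.xml.transform APIs of current JDKs rather than with any standard binding.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CustomerXml {
    // Hypothetical object whose state is carried as XML.
    static final class Customer {
        final String name;
        final int accountNumber;
        Customer(String name, int accountNumber) {
            this.name = name;
            this.accountNumber = accountNumber;
        }
    }

    // Object state leaves the virtual machine as an XML string.
    static String toXml(Customer c) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("CUSTOMER");
            root.setAttribute("account", Integer.toString(c.accountNumber));
            Element name = doc.createElement("NAME");
            name.setTextContent(c.name); // the serializer escapes special characters
            root.appendChild(name);
            doc.appendChild(root);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Object state re-enters the virtual machine from the XML string.
    static Customer fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            String name = root.getElementsByTagName("NAME").item(0).getTextContent();
            int account = Integer.parseInt(root.getAttribute("account"));
            return new Customer(name, account);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new Customer("Ada & Co.", 1001));
        System.out.println(xml);
        System.out.println(fromXml(xml).name);
    }
}
```

Any non-Java system that can parse XML can read the intermediate string, which is the data-centric interoperability the paragraph above describes.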
XML will also be used to define deployment descriptors for the
Enterprise JavaBeans (EJB) architecture. Deployment descriptors
describe for EJB implementations the rules for packaging and deploying
an Enterprise JavaBeans component. According to Sun Microsystems, the
next release of EJB will use XML technology for deployment
descriptors, providing, once again, data interoperability for the Java
platform. It will also be used
as a standard for transmission of mission-critical business data
inside of the Java 2 platform.
XML technology holds much promise for the future. It is an
industry-wide recognized language for building representations of
semi-structured data that could be shared intra- and inter-enterprise.
However, XML lets companies describe only the data and its structure.
Additional processing logic must be applied to ensure document
validity, transportation of the documents to interested parties, and
for transforming the data into a form more useful to everyday business
The Java platform offers the enterprise a method for building
libraries once to handle the creation, maintenance, distribution, and
processing of XML documents and to leverage that process across a
multitude of hardware and operating system platforms. Sun
Microsystems' initiative to extend the Java platform to incorporate
standard processing of XML technology will give companies the security
they need to build these libraries without fearing the rapidly
changing programming interfaces and functionality.
XML and Java technologies are clearly the two most important
developments for Internet computing since the advent of the original
ARPAnet project, the basis for today's Internet. Whether used
together or separately, these two standards empower the enterprise to
forge new electronic partnerships that leverage the availability and
ubiquity of the Internet to exchange and share data.
Appendix A: Resources
http://www.w3.org/TR/1998/REC-xml-19980210 - The Extensible Markup Language 1.0
http://java.sun.com/xml - Provides news, information, and links to other resources.
http://java.sun.com/jdc - Provides preview of technologies and is home to the Java Community Process.
http://www.jxml.com - JXML Inc.'s home page and home of XML and Java technologies initiatives. Also home of the Java-XML Mailing List.
Appendix B: About NC.Focus
NC.Focus was founded in 1996 and has become a leading technology
research and advisory firm specializing in enterprise application
integration (EAI) techniques, tools, and technologies. We provide the
unique service of identifying the impact and benefits of enterprise
application integration trends and technologies on business,
incorporated with an in-depth explanation of how the tools and
technologies operate. As a result, we empower our clients to better
identify and choose technologies based upon specific organizational
needs and requirements.
For additional information, or to subscribe to the NC.Focus services,
please contact JP Morgenthal, president and Director of Research for
NC.Focus, at (516) 792-0997 or firstname.lastname@example.org.