The Artima Developer Community

Cool Tools and Other Stuff
JavaOne 2007, #3: Mashup a Web App with Little or No Code
by Eric Armstrong
May 14, 2007
Summary
When the information you need already exists, but it's scattered here and there around the web, you have an option. You can create a small, super-lightweight web app to put it together--a mashup. It's not quite as easy as falling off a log, but it's gotten to the point that end users can create their own applications.

What's a Mashup?

Mashups are super-lightweight web apps that are created in minimum time, with minimum code, from information and services that already exist on the web. You want your email, RSS feeds, and calendar all in one page? Smash the pieces into a web page that has the interface you want. Have a web-accessible GPS locator in your company's delivery trucks? Combine that feed with a map service like Google Maps to display the truck's location in real time.

The possibilities are cool, and the prospect of doing them with very little work is even cooler, so it was great to get an overview of the technologies that make it all work.

Andreas Krohn of Kapow Technologies gave a great survey of the technology landscape. This post summarizes his talk, mashing it together with a talk given by Sean Brydon, Greg Murray, and Mark Basier of Sun Microsystems, as well as one given by Dave Johnson (also from Sun). (The latter talks went deeper into the technologies and provided useful insights into security considerations. But mostly they introduced the most important buzzwords to know.)

The Long Tail

Software development tends to be an expensive, time-consuming process. So only the most critical projects get implemented. There are a limited number of them, but since they are used by many people, they justify the investment in a system that enhances reliability and scalability, like the Service-Oriented Architecture (SOA)--even if it is more complex and harder to use.

On the other hand, many people have a need for small, single-purpose applications that may not do much more than put information together for their purposes. Those small apps rarely get developed, because coding resources are scarce. Each of those apps has a very small number of users, but there are a large number of such applications. When plotted on a curve, the number of possible applications continues toward infinity as the number of users diminishes to one:

      | *
  ^   |  *
  |   |   *
Users |    *
      |     *
      |       *
      |         *
      |            *
      |                *
      |                    *
      |                         *
      |                              *
      |                                   *   *   *
      +--------------------------------------------
                   Applications -->

Andreas pointed out that mashups help to address that "long tail" at the end of the graph. All development enhancements do that to some degree, of course. But the goal of mashup technology is to get to the point that users can do it for themselves.

Structure of a Mashup

Let's say you want to put your hotel reservations, a map of their locations, and a calendar with your travel dates, all on one page. The goal is to see the information you need, all in one place, gathering it from wherever it happens to exist.

To create a mashup, you need GUI components, a communication mechanism, a data format, one or more data sources, and (optionally) a data repository.

The GUI components display data and give you a way to make selections. The communication mechanism goes out to the web, delivering a package of information in a given data format. The data sources deliver one-way information, while the optional repository gives you a way to store information you want to save.

In a moment, we'll look at the most common technologies used in each area. First, let's take a look at some all-in-one mashup builders that let you create a mashup without writing a line of code.

Mashup Builders

Google Gadgets
These are a fast way to see what a mashup could be. Google Gadgets are mostly display-only widgets, some of which let you specify filtering criteria or add information (like the calendar). They're pre-connected to a data service, and they provide bits of JavaScript you can drop into a web page. (So far, I haven't figured out how to keep them from overwriting each other, but I'm sure I'll figure it out, eventually.)
 
Yahoo Pipes
With Yahoo Pipes, you drag and drop components onto a page, identify the RSS feeds you want to use for your data source, and then specify sorting and selection criteria (which you can attach to fields and other GUI components) to control the information you see.

You can also configure your mashup to publish the information it gathers, delivering information in other formats such as Atom and JSON.

The Pipes system is limited to RSS feeds, so if you want to access additional data sources, or perhaps add special functionality (like dragging items to the calendar), then you'll need to add some code.

Teqlo
This award-winning system uses Java technology. It only works with Firefox 2.0, but it provides multiple kinds of widgets--including RSS readers, todo lists, a calendar, and others. Perhaps even more important than the GUI-building tool is the fact that the widgets can talk to each other, so you can drag and drop information from one widget to another. (You could drag an entry into the calendar, for example, or drag a calendar entry into the map to see where it's located.)
 
QEDWiki
I stumbled across this item while researching the others. IBM's QEDWiki is a PHP-based system that combines Wiki building and mashup construction in a single system that both coders and end users can customize to get the behaviors they want.

If you want to start playing with these technologies and see what kind of mashups you can construct, head over to the Resources section now. To find out how to turn normal web pages into data sources you can use in your mashups, read the next section. Following that, you'll find more information on the underlying technologies.

Mashup Enablers

When the information you need is on the web, but it isn't in a form a mashup can use, there is a solution: Use a Mashup Enabler to convert the information into usable form.

Microformats
Embedding microformat tags in an HTML page makes it possible to extract its contents in XML form. This kind of enabling requires cooperation from the information producer, but it's relatively easy to do, and the data tagging will remain valid even when the page layout changes.
 
Kapow and OpenKapow
Web scrapers like Kapow and OpenKapow pull data from an HTML page and turn it into a data feed a mashup can use. (OpenKapow is the free version.) Andreas Krohn of Kapow Technologies demonstrated the process:
  • Download and run OpenKapow
  • Tell it to build a new service
  • Specify the URL the data comes from
  • Tell it what to search for in the page
  • Tell it which items to include in the output
  • Do an initial search
  • Specify a loop to output multiple items
  • Use menu items to extract pieces

The risk with web scraping, of course, is that the data format you're scraping could change. But the reward is that you get the app you want.

The risk/reward ratio depends on the time required to create such an app. With the all-in-one mashup build systems that serve "the long tail", the ratio becomes favorable to the point that it's worth setting up a web scraper to access critical bits of data, even if it has to change once in a while.

Note: Other services in the web scraping category include Dapper, Google Data, and the Java Mozilla HTML parser.

Mashup Technologies

What follows is a whirlwind tour of the technology buzzwords mentioned in the talks. My notes are sketchy at points, but should serve as a decent guide to the process.

Web page GUI components and GUI Builders

In the old days, you did a lot of programming to create a GUI. But in the web era, you assemble pre-built components, wiring them to data feeds. The code you write--if any--is minimal.

The components themselves are built using AJAX, of course. That means JavaScript. But the tricky bit is the differences in the Document Object Model (DOM) that different browsers create for their web pages.

AJAX component libraries attempt to account for those differences. The degree to which they're successful determines how robust and reliable they are.

Libraries built of AJAX components include:

But even when you have the best of libraries, it takes a fair amount of work to wire them up and lay them out. That's where GUI builders come in.

For a serious enterprise app that will have many users on different browsers, one of the commercial mashup builders may make sense, for the sake of increased reliability and timely support:

On the other hand, when you're creating something for yourself, one of the open source builders may work well enough to do what you need on the browser you use regularly:

Data Formats

Microformats
An XML microformat embedded in an HTML page is one kind of data format, of course, as are the more widely known RSS and Atom formats. This section compares those formats, along with JavaScript Object Notation (JSON).
 
RSS
The standard syndication formats, intended for one feed and many listeners, are RSS and Atom. Of the two, RSS is the more widely known, but it has many difficulties:
  • RSS 1.0 and RSS 2.0, despite having similar names, are backed by two entirely different groups, and are fundamentally incompatible.
  • Several variants of RSS 1.0's predecessors are still in common use, as well, from 0.92 through 0.94.
  • RSS is a very loose standard, so successfully parsing one variant doesn't guarantee success with a different one. (For example, it doesn't specify which fields could contain HTML escaped into text form, and there is no support for summaries.)
  • RSS only covers the transmission of text. It doesn't handle multimedia, binaries, or HTML.
     
Atom
The Atom specification rectifies those problems:
  • Atom 1.0 is a single, well-specified IETF standard.
  • It handles binaries and HTML, as well as text.
  • There is a publishing protocol that can be used to create and update feeds, as well as a client protocol to consume them.
     
JSON
Individual XML formats and standard variants like RSS and Atom are great for data exchange, but they add a fair amount of overhead for a simple message. JavaScript Object Notation is a much simpler mechanism for sending small amounts of data. It's a lot easier to parse--all it takes is one line of JavaScript, which makes it ideal for an AJAX client.

A client that expects JSON format data uses the following URL convention to express that preference:

http://someURL?format=json

The structure that comes back consists of:

  • Name/value pairs, separated by colons, where strings are surrounded by quotes:
    "id": 5
    "name": "Bob"

  • Multiple pairs in an object, separated by commas, where the object is defined by braces:
    {"id": 5, "name": "Bob"}

  • Multiple objects in an array, separated by commas, where the array is defined by square brackets:
    [ {...},
      {...}
    ]
    

When delivering JSON, the server sets the MIME type with a call like this one:

response.setContentType("application/json;charset=UTF-8");

Within strings, quotes, backslashes, and control characters need to be escaped with a backslash. Slash characters may be escaped as well, but that is optional.
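To make the structure and escaping rules above concrete, here is a minimal Java sketch that builds the sample object by hand. The class and method names (JsonSketch, quote, person) are invented for this illustration and did not come from the talks. A real service would normally hand this job to a JSON library.

```java
// Hypothetical helper, not from the talks: hand-builds small JSON fragments.
final class JsonSketch {

    // Returns s as a quoted JSON string, escaping quotes, backslashes,
    // and control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Any other control character becomes a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Builds an object shaped like the sample above: an id and a name.
    static String person(int id, String name) {
        return "{\"id\": " + id + ", \"name\": " + quote(name) + "}";
    }

    public static void main(String[] args) {
        // A single object, then an array of two objects.
        System.out.println(person(5, "Bob"));
        System.out.println("[" + person(5, "Bob") + ", "
                + person(6, "Al \"Ace\" Jones") + "]");
    }
}
```

The second object in main contains embedded quotes, which is the case where hand-built JSON most often goes wrong.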

JSONP

In an interesting augmentation of basic JSON functionality, the client can append a callback name as a parameter:

http://...?jsonp=someFunction

When the server sees that parameter, it surrounds the JSON data with a call to the function, returning:

someFunction(...JSON data...)

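So instead of doing an XMLHttpRequest, which browsers restrict to the page's own domain, the client can load the JSONP URL with a script tag and receive the data in its callback function.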
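On the server side, the wrapping step might look something like the following Java sketch. The class name JsonpSketch, the wrap method, and the rule for acceptable callback names are all assumptions made for this example. The talks did not specify them. Checking the callback name is a precaution, since whatever the client sends ends up being executed as script in the page.

```java
import java.util.regex.Pattern;

// Hypothetical server-side helper: wraps a JSON payload in a JSONP callback.
final class JsonpSketch {

    // Accept only plain JavaScript identifiers (optionally dotted) as callback
    // names, so a request cannot inject arbitrary script into the response.
    private static final Pattern CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    // Returns callback(json) when a valid callback name is given,
    // and the plain JSON when no callback was requested.
    static String wrap(String callback, String json) {
        if (callback == null || callback.isEmpty()) {
            return json;
        }
        if (!CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("Invalid JSONP callback name: " + callback);
        }
        return callback + "(" + json + ")";
    }
}
```

A request ending in ?jsonp=someFunction would then be answered with the result of wrap("someFunction", json).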

Note on innerHTML:
Once you've obtained objects from your data source, the easiest way to insert them into the web page so they can be viewed is by setting the innerHTML property of the element you want to modify:

Communication Mechanisms

The standard communication mechanisms are REST, the Atom Publishing Protocol (APP) to publish Atom, RSS, JSON, and microformats, as well as straight HTTP requests to get such data.

REST
REST stands for "REpresentational State Transfer". (Huh? What does that mean?) The standard definition goes on to state, "representations embody state". (Sorry, still no clue.) But at bottom, REST is really simple, and almost obvious. So it's worth knowing about. To help, here's an acronym that should be easier to remember:
Really Easy System for Transmitting (web data)

All REST is, in the end, is standard HTTP requests. The HTTP requests have been around for a long time, in fact. But someone finally decided to use them.

With REST, you always specify a resource with a URI. You then implement the standard CRUD functions (create, read, update, delete) with the equivalent HTTP requests:

  • Create: HTTP POST
  • Read: HTTP GET
  • Update: HTTP PUT
  • Delete: HTTP DELETE
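As a rough sketch of that mapping, the following Java class builds one request per CRUD operation using the java.net.http API (Java 11 and later, so well after these talks). The class name RestCrudSketch, the method names, and the use of JSON bodies are assumptions for the example, not anything prescribed by REST.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Hypothetical sketch: one factory method per CRUD operation.
final class RestCrudSketch {

    // Create: POST a new item to the collection's URI.
    static HttpRequest create(URI collection, String json) {
        return HttpRequest.newBuilder(collection)
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    // Read: GET the resource.
    static HttpRequest read(URI resource) {
        return HttpRequest.newBuilder(resource).GET().build();
    }

    // Update: PUT a replacement representation of the resource.
    static HttpRequest update(URI resource, String json) {
        return HttpRequest.newBuilder(resource)
                .header("Content-Type", "application/json")
                .PUT(BodyPublishers.ofString(json))
                .build();
    }

    // Delete: DELETE the resource.
    static HttpRequest delete(URI resource) {
        return HttpRequest.newBuilder(resource).DELETE().build();
    }
}
```

Each request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The URI identifies the resource in every case, and only the HTTP method changes.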

When you combine REST with JavaScript callback functions and JSON data, you get a lot of power at very low cost, even if the server is just pushing data to your client.

But when the server is ready to listen to POST, PUT, and DELETE requests, as well as GETs, you get even more power. And that is the basis of the Atom Publishing Protocol.

Atom Publishing Protocol (APP)

The APP is a REST-based publishing mechanism for Atom format data. In addition to providing CRUD functions with POST, GET, PUT, and DELETE, APP provides for user authentication, which is critical in any such scenario.

Atom is a generic data format, of course. It's not just for blogs. But in addition to allowing any kind of data in an entry, Atom also allows for CRUD operations on collections of entries. In fact, it even includes next/previous links in its collections.

To publish an entry, you post to a collection URI. To read one, you use GET. Then you edit and use PUT to replace it.
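Here is a minimal Java sketch of the publishing step, again using the java.net.http API from Java 11. The class AtomPublishSketch and its methods are invented for illustration. The entry it builds is deliberately bare-bones: a complete Atom entry also needs elements such as id and updated, and authentication is left out entirely.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of publishing an entry to an APP collection.
final class AtomPublishSketch {

    // Escapes the characters that are special in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a bare-bones Atom entry with only a title and text content.
    static String entry(String title, String content) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<entry xmlns=\"http://www.w3.org/2005/Atom\">\n"
             + "  <title>" + escapeXml(title) + "</title>\n"
             + "  <content type=\"text\">" + escapeXml(content) + "</content>\n"
             + "</entry>\n";
    }

    // Publishing is a POST of the entry document to the collection URI.
    static HttpRequest publish(URI collection, String entryXml) {
        return HttpRequest.newBuilder(collection)
                .header("Content-Type", "application/atom+xml;type=entry")
                .POST(HttpRequest.BodyPublishers.ofString(entryXml))
                .build();
    }
}
```

Editing follows the same pattern: GET the entry from its own URI, change it, and send it back with PUT instead of POST.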

Libraries that support APP include:

  • ROME: A strong Java library that handles RSS and Atom (discussed below)
  • Abdera: StAX-based Atom-only parser in Java
  • Google Data API: Google's library.

Note: Other entries in this category include the Universal Feed Parser (Python) and the Windows-only Windows RSS Platform built into IE7.

Data Sources and Repositories

Now that you've seen the choices for data formats and communication mechanisms, the possibilities for a data source are fairly easy to understand (any of which could be generated by a mashup enabler):

The choices for a data repository are equally simple:

Coding Libraries

The libraries that make the most sense for coders on the Java platform include JSR 311, ROME, and ROME Propono.

JSR 311
This JSR is a Java API for RESTful web services. It's intended to make coding simpler and encourage good RESTful style.
 
ROME
This toolkit is a DOM-based parser/generator in Java. Arguably the most capable toolkit for RSS and Atom on the Java platform, it parses and generates all forms of RSS and Atom. It's based on JDOM, and is both pluggable and extensible.

To consume an arbitrary feed with ROME, create a SyndFeed object, read the data, and then convert it to Atom (or RSS, if you must). The code to create a new SyndFeed looks something like this:

SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new InputStreamReader(inputStream));

When polling, take the following steps to maximize performance and minimize network traffic:

  • Use HTTP conditional GET or ETags
  • Don't poll too often
  • Use ROME's Fetcher, which includes a caching feed-store (as do other such utilities)
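To show what the conditional GET in the first step involves, here is a small Java sketch built on the java.net.http API (Java 11 and later). It is not ROME's Fetcher. The class ConditionalGetSketch and its method names are made up for this example, and storing the saved header values between polls is left to the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of polling a feed with a conditional GET.
final class ConditionalGetSketch {

    // Builds a GET that lets the server skip the body if the feed is unchanged.
    // etag and lastModified are the ETag and Last-Modified header values saved
    // from the previous response; either may be null on the first poll.
    static HttpRequest poll(URI feed, String etag, String lastModified) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(feed).GET();
        if (etag != null) {
            builder.header("If-None-Match", etag);
        }
        if (lastModified != null) {
            builder.header("If-Modified-Since", lastModified);
        }
        return builder.build();
    }

    // A 304 (Not Modified) status means the cached copy is still current,
    // so there is nothing new to download or parse.
    static boolean shouldReparse(int statusCode) {
        return statusCode != 304;
    }
}
```

When the feed has not changed, the server answers with a short 304 response instead of the whole feed, which is where the savings in network traffic come from.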

To publish with ROME:

  • Create a SyndFeed
  • Add entries to it
  • Set the content type and deliver it:
    application/rss+xml
    application/atom+xml
    
ROME Propono
ROME Propono is a client/server library that makes it easy to build an APP client, and makes it easy to add RESTful services to an existing web app.

Service-Building Strategies

In general, you want to think syndication, not coordination. In other words, provide a RESTful feed, and let others mash it up the way they want to. (Instead of having long meetings where you make sure the provider and consumer interfaces match up.)

You want to provide one or more of:

But to make sure that clients can make use of your service, you also want to provide:

You also want to set up a server-side proxy-style service, where:

(The client can also use cross-domain script requests, as with JSONP, to pull together data from multiple sources.)

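Ideally, the server should also be prepared to deliver data in multiple formats, including JSON, JSONP, and XML:

// format, jsonpValue, and content are assumed to be set from the request
if ("jsonp".equals(format)) {
   // JSONP: wrap the JSON content in a call to the client's callback
   response.setContentType("text/javascript");
   response.getWriter().write(jsonpValue + "(" + content + ")");
} else if ("json".equals(format)) {
   response.setContentType("application/json");
   response.getWriter().write(content);
} else {
   response.setContentType("text/xml");
   // write ...
}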

For performance, you'll also want to do as much caching as you can:

And you'll also want to set things up so that applications and browsers like Firefox, Safari, and IE can auto-discover your feeds, using HTML that looks roughly like this:

<link rel="alternate" type="application/rss+xml" title="My Feed (RSS)"
      href="feeds/myfeed?format=RSS">

You'll also want to ensure that you're generating a valid feed, with properly escaped HTML and well-formed XML:

Other service-building strategies include:

In summary,

Security Considerations

The information in this section came from the Java Blueprints folks--Sean Brydon, Greg Murray, and Mark Basier--who turn out to have a fair amount of expertise on the subject. Any inaccuracies here resulted from my limitations, not theirs.

To start with, use the data format that is appropriate for your level of exposure:

Securing JSON:

When using JSON, any JavaScript could be coming your way, so you need a certain level of trust in your data source, or added levels of security protection:

  • Use a namespace for your JavaScript commands
  • Use CSS for customization
  • Don't add to the prototype of common objects

Securing your services:

Other security notes:

Resources

All-in-one mashup builders:

Mashup Enablers

GUI-building technologies:

Communications and data formats:

Feed Validator:

About the Blogger

Eric Armstrong has been programming and writing professionally since before there were personal computers. His production experience includes artificial intelligence (AI) programs, system libraries, real-time programs, and business applications in a variety of languages. He works as a writer and software consultant in the San Francisco Bay Area. He wrote The JBuilder2 Bible and authored the Java/XML programming tutorial available at http://java.sun.com. Eric is also involved in efforts to design knowledge-based collaboration systems.

This weblog entry is Copyright © 2007 Eric Armstrong. All rights reserved.
