Summary
When the information you need already exists, but it's scattered here and there around the web, you have an option. You can create a small, super-lightweight web app to put it together--a mashup. It's not quite as easy as falling off a log, but it's gotten to the point that end users can create their own applications.
Contents
- What's a Mashup?
- The Long Tail
- Structure of a Mashup
- Mashup Builders: Google Gadgets, Yahoo Pipes, Teqlo, QEDWiki
- Mashup Enablers: Microformats & Web Scrapers (OpenKapow, Kapow, Dapper)
- Mashup Technologies
- Web page GUI components and GUI Builders: GWT, Xap, NetBeans
- Data Formats: Microformats, RSS, Atom, JSON, JSONP
- Communication Mechanisms: REST, APP
- Data Sources and Repositories
- Coding Libraries: JSR 311, ROME, ROME Propono
- Service-Building Strategies
- Security Considerations
- Resources
Mashups are super-lightweight web apps that are created in minimum time, with minimum code, from information and services that already exist on the web. You want your email, RSS feeds, and calendar all in one page? Smash the pieces into a web page that has the interface you want. Have a web-accessible GPS locator in your company's delivery trucks? Combine that feed with a map service like Google Maps to display each truck's location in real time.
The possibilities are cool, and the prospect of realizing them with very little work is even cooler, so it was great to get an overview of the technologies that make it all work.
Andreas Krohn of Kapow Technologies gave a great survey of the technology landscape. This post summarizes his talk, mashing it together with a talk given by Sean Brydon, Greg Murray, and Mark Basier of Sun Microsystems, as well as one given by Dave Johnson (also from Sun). (The latter talks went deeper into the technologies and provided useful insights into security considerations. But mostly they introduced the most important buzzwords to know.)
Software development tends to be an expensive, time-consuming process. So only the most critical projects get implemented. There are a limited number of them, but since they are used by many people, they justify the investment in a system that enhances reliability and scalability, like the Service-Oriented Architecture (SOA)--even if it is more complex and harder to use.
On the other hand, many people have a need for small, single-purpose applications that may not do much more than put information together for their purposes. Those small apps rarely get developed, because coding resources are scarce. There are a very small number of users for those apps, but there are a large number of applications. When plotted on a curve, the number of possible applications continues to infinity as the number of users diminishes to one:
          ^
          | *
          | *
          |  *
    Users |   *
          |    *
          |     *
          |      *
          |       *
          |        *
          |         *
          |          *
          |            *  *  *
          +--------------------------------------------  Applications -->
Andreas pointed out that mashups help to address that "long tail" at the end of the graph. All development enhancements do that to some degree, of course. But the goal of mashup technology is to get to the point that users can do it for themselves.
Let's say you want to put your hotel reservations, a map of their locations, and a calendar with your travel dates, all on one page. The goal is to see the information you need, all in one place, gathering it from wherever it happens to exist.
To create a mashup, you need:
- Web page GUI components and a GUI Builder
- A communication mechanism and a data format
- One or more data sources you can access
- Optionally, a data repository you can interact with
The GUI components display data and give you a way to make selections. The communication mechanism goes out to the web, delivering a package of information in a given data format. The data sources deliver one-way information, while the optional repository gives you a way to store information you want to save.
In a moment, we'll look at the most common technologies used in each area. First, let's take a look at some all-in-one mashup builders that let you create a mashup without writing a line of code.
You can also configure your mashup to publish the information it gathers, delivering information in other formats such as Atom and JSON.
The Pipes system is limited to RSS feeds, so if you want to access additional data sources, or perhaps add special functionality (like dragging items to the calendar), then you'll need to add some code.
If you want to start playing with these technologies and see what kind of mashups you can construct, head over to the Resources section now. To find out how to turn normal web pages into data sources you can use in your mashups, read the next section. Following that, you'll find more information on the underlying technologies.
When the information you need is on the web, but it isn't in a form a mashup can use, there is a solution: use a Mashup Enabler to convert the information into usable form.
- Download and run openkapow
- Tell it to build a new service
- Specify the URL the data comes from
- Tell it what to search for in the page
- Tell it which items to include in the output
- Do an initial search
- Specify a loop to output multiple items
- Use menu items to extract pieces
The risk with web scraping, of course, is that the data format you're scraping could change. But the reward is that you get the app you want.
The risk/reward ratio depends on the time required to create such an app. With the all-in-one mashup builders that serve "the long tail", the ratio becomes favorable to the point that it's worth setting up a web scraper to access critical bits of data, even if the scraper has to be adjusted once in a while.
Note: Other services in the web scraping category include Dapper, Google Data, and the Java Mozilla HTML parser.
What follows is a whirlwind tour of the technology buzzwords mentioned in the talks. My notes are sketchy at points, but should serve as a decent guide to the process.
In the old days, you did a lot of programming to create a GUI. But in the web era, you assemble pre-built components, wiring them to data feeds. The code you write--if any--is minimal.
The components themselves are built using AJAX, of course. That means JavaScript. But the tricky bit is the differences in the Document Object Model (DOM) structures that different browsers create for their web pages.
AJAX component libraries attempt to account for those differences. The degree to which they're successful determines how robust and reliable they are.
Libraries built of AJAX components include:
- prototype
- script.aculo.us
- Dojo
- jMaki, where the "j" stands for JavaScript, and "Maki" is Japanese for "wrapper", or "container".
But even when you have the best of libraries, it takes a fair amount of work to wire them up and lay them out. That's where GUI builders come in.
For a serious enterprise app that will have many users on different browsers, one of the commercial mashup builders may make sense, for the sake of increased reliability and timely support:
- BackBase
- NexaWeb
On the other hand, when you're creating something for yourself, one of the open source builders may work well enough to do what you need on the browser you use regularly:
- Google Web Toolkit (GWT): Java classes that generate JavaScript, so you can write cleaner code and use the compiler to help detect errors. With GWT you can browse the libraries, take advantage of code completion, refactor with the IDE, do unit testing, and rapidly cycle between coding and testing. Perhaps more importantly, the GWT libraries were designed to optimize the end user experience first, and only then, where possible, optimize the developer experience.
- XAP: The open source version of NexaWeb
- NetBeans: for jMaki
A client that expects JSON format data uses the following URL convention to express that preference:
http://someURL?format=json
The structure that comes back consists of name/value pairs:
"id": 5, "name": "Bob"
which are grouped into objects:
{ "id": 5, "name": "Bob" }
which can in turn be collected into arrays:
[ { ... }, { ... } ]
When delivering JSON, the server sets the MIME type with a call like this one:
response.setContentType("application/json;charset=UTF-8");
Within strings, special characters like quotes, backslashes, slash characters, and control chars need to be escaped.
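That escaping rule can be sketched as a small helper. (This is a hypothetical utility for illustration; a real mashup would normally let a JSON library do this.)

```java
// Minimal sketch of JSON string escaping: quotes, backslashes, slashes,
// and control characters are replaced with their escaped forms.
public class JsonEscaper {
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '/':  sb.append("\\/");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // remaining control chars become \u00XX escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```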
In an interesting augmentation of basic JSON functionality, the client can append a callback name as parameter:
http://...?jsonp=someFunction
When the server sees that command, it surrounds the JSON data with a call to the function, returning:
someFunction(...JSON data...)
So instead of doing an XMLHttpRequest, the client can use JSONP.
Note that any markup a page renders without escaping can carry executable script. An innocent-looking element like this one, for example, pops up the user's cookies on mouseover:
<span onmouseover='alert(document.cookie);'>Data</span>
The standard communication mechanisms are REST, the Atom Publishing Protocol (APP) to publish Atom, RSS, JSON, and microformats, as well as straight HTTP requests to get such data.
REST nominally stands for "Representational State Transfer," but you can think of it as a Really Easy System for Transmitting web data.
REST is, in the end, just standard HTTP requests. The HTTP requests have been around for a long time, in fact. But someone finally decided to use them systematically.
With REST, you always specify a resource with a URI. You then implement standard CRUD functions (create, read, update, delete) with the equivalent HTTP requests:
- Http POST
- Http GET
- Http PUT
- Http DELETE
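The mapping above can be sketched as a tiny helper. (Hypothetical names; in real code you'd pass the result to HttpURLConnection.setRequestMethod() before issuing the request against the resource URI.)

```java
// Sketch of the REST convention: each CRUD operation corresponds
// directly to one of the standard HTTP methods.
public class RestMapping {
    public static String httpMethodFor(String crudOperation) {
        switch (crudOperation) {
            case "create": return "POST";
            case "read":   return "GET";
            case "update": return "PUT";
            case "delete": return "DELETE";
            default:
                throw new IllegalArgumentException("Unknown operation: " + crudOperation);
        }
    }
}
```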
When you combine REST with Javascript callback functions and JSON data, you get a lot of power at very low cost, even if the server is just pushing data to your client.
But when the server is ready to listen to post, put, and delete requests, as well as GETs, you get even more power. And that is the basis of the Atom Publishing Protocol.
Atom Publishing Protocol (APP)
The APP is a REST-based publishing mechanism for Atom format data. In addition to providing CRUD functions with post, get, put, and delete, APP provides for user authentication, which is critical in any such scenario.
Atom is a generic data format, of course. It's not just for blogs. But in addition to allowing any kind of data in an entry, Atom also allows for CRUD operations on collections of entries. In fact, it even includes next/previous links in its collections.
To publish an entry, you post to a collection URI. To read one, you use GET. Then you edit and use PUT to replace it.
Libraries that support APP include:
- ROME: A strong Java library that handles RSS and Atom (discussed below)
- Abdera: StAX-based Atom-only parser in Java
- Google Data API: Google's library.
Note: Other entries in this category include the Universal Feed Parser (Python) and the Windows-only Windows RSS Platform built into IE7.
Now that you've seen the choices for data formats and communication mechanisms, the possibilities for a data source are fairly easy to understand (any of which could be generated by a mashup enabler):
- RESTful service
- RSS or Atom feed (RSSBus, Grazr, Java ROME)
The choices for a data repository are equally simple:
- RESTful service (especially one that supports APP)
- Local storage (for a rich client implementation)
The libraries that make the most sense for coders on the Java platform include JSR 311, ROME, and ROME Propono.
To consume an arbitrary feed with ROME, create a SyndFeed object, read the data, and then convert it to Atom (or RSS, if you must). The commands to create a new SyndFeed look something like this:
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new InputStreamReader(inputStream));
When polling, take steps to maximize performance and minimize network traffic--in particular, use HTTP Conditional GET so that unchanged feeds aren't re-downloaded.
To publish with ROME, serve the feed with the appropriate MIME type: application/rss+xml for RSS, or application/atom+xml for Atom.
In general, you want to think syndication, not coordination. In other words, provide a RESTful feed, and let others mash it up the way they want to. (Instead of having long meetings where you make sure the provider and consumer interfaces match up.)
You want to provide one or more of:
- APIs
- A RESTful web service
- Atom or RSS feeds
But to make sure that clients can make use of your service, you also want to provide:
- Client-side JavaScript libraries
- A client-side widget
- Client-side CSS
- Documentation for the API
- An example showing how it all fits together
You also want to set up a server-side proxy-style service, where:
- The browser gets a web page from the host server
- The web page gets a JavaScript file and CSS from the mashup server
- The JavaScript gets data from the mashup database server
(The client can also use cross-site scripting to pull together data from multiple sources.)
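Assuming a JDK with the built-in com.sun.net.httpserver package, the proxy arrangement above can be sketched roughly as follows. The /data and /proxy endpoints, the sample payload, and the helper names are all hypothetical; a real mashup server would add error handling and caching.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class ProxySketch {
    // Stands in for the mashup database server: serves JSON data.
    public static HttpServer startDataServer() {
        try {
            HttpServer data = HttpServer.create(new InetSocketAddress(0), 0);
            data.createContext("/data", exchange -> {
                byte[] body = "{\"id\": 5, \"name\": \"Bob\"}".getBytes("UTF-8");
                exchange.getResponseHeaders().set("Content-Type", "application/json;charset=UTF-8");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            data.start();
            return data;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // The host server's proxy endpoint: fetches from the data server and
    // relays the result, so the browser only ever talks to one origin.
    public static HttpServer startProxy(int dataPort) {
        try {
            HttpServer proxy = HttpServer.create(new InetSocketAddress(0), 0);
            proxy.createContext("/proxy", exchange -> {
                byte[] body = fetch("http://localhost:" + dataPort + "/data").getBytes("UTF-8");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            proxy.start();
            return proxy;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Simple HTTP GET, returning the response body as a string.
    public static String fetch(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                return new String(out.toByteArray(), "UTF-8");
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // End-to-end demo: browser-style client -> proxy -> data server.
    public static String demo() {
        HttpServer data = startDataServer();
        HttpServer proxy = startProxy(data.getAddress().getPort());
        String result = fetch("http://localhost:" + proxy.getAddress().getPort() + "/proxy");
        data.stop(0);
        proxy.stop(0);
        return result;
    }
}
```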
Ideally, the server should also be prepared to deliver data in multiple formats, including JSON, JSONP, and XML:
if ("jsonp".equals(format)) {
    response.getWriter().write(jsonp_value + "(" + content + ")");
} else if ("json".equals(format)) {
    response.setContentType("application/json;charset=UTF-8");
    response.getWriter().write(content);
} else {
    response.setContentType("text/xml");
    // write the XML version of the data ...
}
For performance, you'll also want to do as much caching as you can:
- Client-side cache via HTTP Conditional GET
- Proxy server cache via HTTP headers
- Server side cache using your favorite cache technology
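The first of those caches hinges on the server answering Conditional GETs correctly. A minimal sketch of that decision, using ETag validators (an assumption; the same idea works with Last-Modified dates):

```java
// Sketch of the client-side cache step: the server compares the ETag the
// client sent in its If-None-Match header against the current ETag for
// the feed. If they match, it answers 304 and skips the response body.
public class ConditionalGet {
    public static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch != null && ifNoneMatch.equals(currentEtag)) {
            return 304; // Not Modified: the client's cached copy is still good
        }
        return 200; // OK: send the full body (along with the new ETag)
    }
}
```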
And you'll also want to set things up so that applications and browsers like Firefox, Safari, and IE can auto-discover your feeds, using HTML that looks roughly like this:
<link rel="alternate" type="application/rss+xml" title="My Feed (RSS)" href="feeds/myfeed?format=RSS">
You'll also want to ensure that you're generating a valid feed, with properly escaped HTML and well-formed XML:
- feedvalidator.org (works on all formats)
Other service-building strategies include:
- DWR: A combination of servlets on the server and JavaScript on the client--good for a tightly integrated application
- JavaServer Faces to wrap Ajax components
In summary,
- Consider JSON for data interchanges
- Strive for RESTful design
- Access the server using a server-side proxy strategy
- Build a RESTful service so others can mash up your site
The information in this section came from the Java Blueprints folks--Sean Brydon, Greg Murray, and Mark Basier--who turn out to have a fair amount of expertise on the subject. Any inaccuracies here resulted from my limitations, not theirs.
To start with, use the data format that is appropriate for your level of exposure:
- JSON: When you're in the same domain as the data provider. (You could get arbitrary JavaScript, and that JavaScript will execute, so you want to be inside your firewall.)
- JSONP: When you are inside or outside of your domain. (You're specifying the JavaScript to execute with this format, so it's safer. That makes this the easiest and most portable option.)
- XML: When you're getting data from outside your domain, or when the client isn't using JavaScript (which is what makes JSON data easy to parse).
When using JSON, any JavaScript could be coming your way, so you need a certain level of trust in your data source, or added levels of security protection:
- Use a namespace for your JavaScript commands
- Use CSS for customization
- Don't add to the prototype of common objects
Securing your services:
- Create a token:
- Create a file with the API key (It's secure because it can't be edited with JavaScript.)
- Use a session-based hash to make sure that a session has been established (so the request isn't coming in at random):
- Create an API key generated from the URL, using a one-way hash
- The user registers to get the API key (a very long string)
- The key is attached to the GET request
- The host name is mapped against the hash for access.
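Assuming the one-way hash is something like SHA-256 (the talks didn't name an algorithm), the key scheme might be sketched as follows. The helper names and the host:secret input layout are illustrative only.

```java
import java.security.MessageDigest;

// Sketch of the API-key scheme described above: a one-way hash of the
// registered host name plus a server-side secret yields a very long
// string. The server recomputes the hash for the requesting host and
// compares it to the key attached to the GET request.
public class ApiKeys {
    public static String keyFor(String hostName, String serverSecret) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((hostName + ":" + serverSecret).getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString(); // 64 hex chars: long and hard to guess
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static boolean isValid(String hostName, String serverSecret, String presentedKey) {
        return keyFor(hostName, serverSecret).equals(presentedKey);
    }
}
```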
Other security notes:
- Don't change state with HTTP GET
- Add a security token to your forms
- Don't just use cookies to validate the user
- When rendering query string data, verify it
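The "security token in your forms" advice can be sketched like this (hypothetical helper names; many web frameworks provide this mechanism for you):

```java
import java.security.SecureRandom;

// Sketch of a per-session form token: generate a random token, store it
// in the session, embed it in the form as a hidden field, and reject any
// submission whose token doesn't match the session's copy.
public class FormToken {
    private static final SecureRandom RANDOM = new SecureRandom();

    public static String newToken() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static boolean isValid(String sessionToken, String submittedToken) {
        return sessionToken != null && sessionToken.equals(submittedToken);
    }
}
```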
All-in-one mashup builders:
- BackBase: http://www.backbase.com/ (Unfortunately, this page has a redirect that defeats the back button.)
- NexaWeb: http://www.nexaweb.com/
- QEDWiki (IBM): http://services.alphaworks.ibm.com/qedwiki/
- Pipes (Yahoo): http://pipes.yahoo.com/pipes/
- Teqlo: http://www.teqlo.com/
Mashup Enablers
- Dapper: http://www.dapper.net/
- Google Gadgets: http://desktop.google.com/plugins/
- Kapow Technologies: http://www.kapowtech.com/
- Microformats: http://microformats.org/
- OpenKapow: http://www.openkapow.com/
GUI-building technologies:
- Dojo: http://dojotoolkit.org/
- DWR: http://getahead.org/dwr/
- Google Web Toolkit (GWT): http://code.google.com/webtoolkit/
- Java Server Faces: http://java.sun.com/javaee/javaserverfaces/
- jMaki: https://ajax.dev.java.net/
- XAP: http://incubator.apache.org/xap/
Communications and data formats:
- Abdera (Apache): http://incubator.apache.org/abdera/
- Atom: - Spec: http://www.ietf.org/rfc/rfc4287.txt - Overview: http://www-128.ibm.com/developerworks/xml/library/x-atom10.html
- Google Data: http://code.google.com/apis/gdata/index.html
- JSON / JSONP: http://www.json.org/
- REST: http://rest.blueoxen.net/cgi-bin/wiki.pl
- ROME / Propono: - http://wiki.java.net/bin/view/Javawsxml/Rome - http://wiki.java.net/bin/view/Javawsxml/RomePropono
- RSS 1.0: http://web.resource.org/rss/1.0/
- RSS 2.0: http://feedvalidator.org/docs/rss2.html
Feed Validator:
- feedvalidator.org: http://feedvalidator.org/
Eric Armstrong has been programming and writing professionally since before there were personal computers. His production experience includes artificial intelligence (AI) programs, system libraries, real-time programs, and business applications in a variety of languages. He works as a writer and software consultant in the San Francisco Bay Area. He wrote The JBuilder2 Bible and authored the Java/XML programming tutorial available at http://java.sun.com. Eric is also involved in efforts to design knowledge-based collaboration systems.