This post originated from an RSS feed registered with .NET Buzz
by Peter van Ooijen.
Original Post: Including classic htm pages in an asp.net site (Look mom, no frames)
Feed Title: Peter's Gekko
Feed URL: /error.htm?aspxerrorpath=/blogs/peter.van.ooijen/rss.aspx
Feed Description: My weblog contains tips, tricks and opinions on ASP.NET, tablet PCs and tech in general.
At the moment I'm migrating my website. The old version is a bunch of static htm pages scribbled with FrontPage; the new version is an asp.net 2.0 site. As the old site is well read I don't want to break any links: a link to http://www.gekko-software.nl/DotNet/Art01.htm should keep serving the same Delphi vs. C# article. The easiest way would be to just copy the files and use IIS as a dumb "htm-file-server". But I want control over the pages, so I can display them in a nice .net (master) page and add functionality as desired. An option could be to use frames, one frame for the aspx and one for the htm. But for many reasons I don't find frames very nice to work with. So here I'll present a pure .net solution.
The first step is to make asp.net the handler of any htm request to the site (like http://www.gekko-software.nl/DotNet/Art01.htm). You do this in the configuration of the virtual directory in IIS: add the htm extension to the application configuration list and set the executable to aspnet_isapi.dll.
Now every incoming htm request for the virtual directory will be handled by asp.net.
To intercept the request and redirect it to my viewer I will install a so-called HttpModule. An HttpModule is a way to be the first and the last in the handling of any request coming to the site. An HttpModule is installed in the web.config.
The module has a name and a type. This type is a class which should implement the IHttpModule interface. The class is in the app_code folder of the site.
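As a rough sketch of what such a registration could look like for the classic ASP.NET 2.0 pipeline: the module name and the class name HtmRewriteModule are placeholders of mine, not the names used on the actual site. For a class in the app_code folder the type can be given as the bare class name.

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- name: any label; type: the class implementing IHttpModule (hypothetical name) -->
      <add name="HtmRewriteModule" type="HtmRewriteModule" />
    </httpModules>
  </system.web>
</configuration>
```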
IHttpModule is a nice lean interface. Its Init method is passed the HttpApplication object, which gives access to the full context of your web application. What I do is add a handler to the BeginRequest event, which gives my code a first look at every incoming request and the possibility to change it. The handler filters out any request for an htm and, using the Context.RewritePath method, rewrites the request url to that of my aspx page with the viewer, passing the desired htm file in the querystring. Now all requests for an htm will be served by my asp.net 2.0 code.
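A minimal sketch of such a module, under some assumptions: the class name HtmRewriteModule, the viewer page ~/Viewer.aspx and the querystring key "file" are all made up for illustration, and the real module may filter requests differently. The url decision sits in a separate static method so it can be checked without a running site.

```csharp
using System;
using System.Web;

// Hypothetical module: class name, viewer page and querystring key are assumptions.
public class HtmRewriteModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // Get a first look at every request coming in.
        application.BeginRequest += OnBeginRequest;
    }

    public void Dispose()
    {
        // Nothing to clean up.
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        string target = GetRewriteTarget(context.Request.Path);
        if (target != null)
        {
            // The browser keeps the classic url; only the server-side path changes.
            context.RewritePath(target);
        }
    }

    // Returns the viewer url for an htm request, or null for anything else.
    public static string GetRewriteTarget(string requestPath)
    {
        if (!requestPath.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }
        return "~/Viewer.aspx?file=" + HttpUtility.UrlEncode(requestPath);
    }
}
```

Because the viewer page itself ends in .aspx, the rewritten request is not picked up by the filter a second time.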
(You can do a lot more with HttpModules. There are many events you can hook into. The module is the first and the last one to handle, bend, modify or analyze all requests served by your app. There are loads and loads of good samples to be found all over the web.)
Now the viewer has to display the htm. How will it do that? The easy part is that you can assign any html to the text of a label; the rendered page will display it in its full glory. But I want to be a neat citizen on the web and not render any garbage. The original htm of my pages has a lot of bla-bla FrontPage headers. What my code will do is extract the real content from the htm file and assign that to the label. Any html response looks (or should look) like this:
<html>
  <head>
    <title>This page is about software</title>
    .......
  </head>
  <body>
    .................
  </body>
</html>
The real content is between the body tags.
The code takes this approach:
Read the htm filename from the querystring
Read the htm file into the rawHtml string
Extract the page title using a regular expression
Assign the title to the viewer page's title
Extract the page body by searching for the body tags
For the code to build you have to include System.Text.RegularExpressions in your using list. A regular expression is a nice way to get the title, even when the tags are spelled poorly, like <tiTle >. The Groups[1].Value member will return the title enclosed by the tags. It would be tempting to also use a regular expression to get the body. But due to the many nested <'s and >'s inside the body that would be a pretty complicated expression. And when you manage to figure out a working one there's quite a chance it will take ages to evaluate. I know there is (at most) one pair of body tags, so a linear search will be fast and good enough.
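The two extraction steps could be sketched roughly as follows. This is an illustration rather than the code running on the site: the class name HtmContent, the method names and the exact regular expression are my own assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: class and method names are assumptions for illustration.
public static class HtmContent
{
    // Tolerates odd casing and stray whitespace in the tags, like <tiTle >.
    private static readonly Regex TitlePattern = new Regex(
        @"<\s*title[^>]*>(.*?)<\s*/\s*title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the text between the title tags, or an empty string if there is no title.
    public static string ExtractTitle(string rawHtml)
    {
        Match match = TitlePattern.Match(rawHtml);
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }

    // Returns what sits between the body tags, found with a plain linear search.
    public static string ExtractBody(string rawHtml)
    {
        int openTag = rawHtml.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (openTag < 0)
        {
            return rawHtml; // no body tag: treat the whole file as content
        }
        int contentStart = rawHtml.IndexOf('>', openTag);
        if (contentStart < 0)
        {
            return rawHtml; // unterminated body tag: give up and return everything
        }
        contentStart++; // skip past the '>' that ends the opening tag
        int closeTag = rawHtml.LastIndexOf("</body", StringComparison.OrdinalIgnoreCase);
        if (closeTag < contentStart)
        {
            return rawHtml.Substring(contentStart); // closing tag missing
        }
        return rawHtml.Substring(contentStart, closeTag - contentStart);
    }
}
```

In the viewer's Page_Load the remaining steps would be along the lines of reading Request.QueryString, loading the file with File.ReadAllText(Server.MapPath(...)), and assigning the results to Page.Title and the label's Text. Since the filename arrives in the querystring, it is worth checking that the mapped path stays inside the site and ends in .htm before reading it.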
The result is that all my classic pages are a full part of my asp.net 2.0 site and are still accessible by the classic url. The user won't even notice.