The Artima Developer Community

.NET Buzz Forum
Including classic htm pages in an asp.net site (Look mom, no frames)

Peter van Ooijen


Peter van Ooijen is a .NET developer/architect for Gekko Software
Posted: Apr 12, 2006 4:54 AM

This post originated from an RSS feed registered with .NET Buzz by Peter van Ooijen.
Original Post: Including classic htm pages in an asp.net site (Look mom, no frames)
Feed Title: Peter's Gekko
Feed Description: My weblog contains tips, tricks and opinions on ASP.NET, Tablet PCs and tech in general.
At the moment I'm migrating my website. The old version is a bunch of static htm pages scribbled with FrontPage; the new version is an ASP.NET 2.0 site. As the old site is well read I don't want to break any links: a link to http://www.gekko-software.nl/DotNet/Art01.htm should keep serving the same Delphi vs. C# article. The easiest way would be to just copy the files and use IIS as a dumb "htm-file-server". But I want to have control over the pages and display them in a nice .NET (master) page, so I can add functionality as desired. An option could be to use frames, one frame for the aspx and one for the htm, but for many reasons I don't find frames very nice to work with. So here I'll present a pure .NET solution.

In the first step the handler of an htm request to the site (like http://www.gekko-software.nl/DotNet/Art01.htm) has to be set to ASP.NET. You do this in the configuration of the virtual directory in IIS: add the .htm extension to the list of application mappings and set the executable to aspnet_isapi.dll.

Now every incoming htm request for the virtual directory will be handled by asp.net.

To intercept the request and redirect it to my viewer I will install a so-called HttpModule. An HttpModule gives your code the first and the last say in the handling of any request coming to the site. A module is installed in the web.config:

<system.web>
  <httpModules>
    <add name="UrlRewriter" type="Gekko.WebSite.URLrewriter"/>
  </httpModules>
</system.web>

The module has a name and a type. The type is a class which has to implement the IHttpModule interface. The class lives in the App_Code folder of the site.

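using System;
using System.Web;

namespace Gekko.WebSite
{
    public class URLrewriter : IHttpModule
    {
        #region IHttpModule Members

        public void Dispose()
        {
            // Nothing to clean up.
        }

        public void Init(HttpApplication context)
        {
            // Get a first look at every request coming in.
            context.BeginRequest += new EventHandler(context_BeginRequest);
        }

        #endregion

        void context_BeginRequest(object sender, EventArgs e)
        {
            HttpApplication httpApp = (HttpApplication)sender;
            // App-relative path of the request, e.g. "~/DotNet/Art01.htm"
            string pageName = httpApp.Request.AppRelativeCurrentExecutionFilePath;

            // Match the extension regardless of case (Art01.HTM is an htm page too)
            if (pageName.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
            {
                // Substring(2) drops the leading "~/"; the rest goes to the viewer
                httpApp.Context.RewritePath(string.Format(@"~/ArticleViewer.aspx?article={0}", pageName.Substring(2)), false);
            }
        }
    }
}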

IHttpModule is a nice lean interface. Its Init method is passed the HttpApplication object of your web application. What I do is add a handler to the BeginRequest event, which gives my code a first look at every incoming request and the possibility to change it. The handler filters out any request for an htm file and, using the Context.RewritePath method, rewrites the request URL to that of my aspx page with the viewer, passing the desired htm file in the querystring. Now all requests for an htm file will be served by my ASP.NET 2.0 code.
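To make the rewrite rule concrete, here is a minimal sketch of the same mapping as a stand-alone function, so it can be tried in a console app without IIS. The names ArticleUrlMapper and ToViewerUrl are hypothetical and not part of the site's code; only the ArticleViewer.aspx URL and the "article" parameter come from the module.

```csharp
using System;

// Hypothetical helper, for illustration only: mirrors the rewrite rule of the module.
public static class ArticleUrlMapper
{
    // Returns the viewer URL for an app-relative .htm path,
    // or null when the request should be left alone.
    public static string ToViewerUrl(string appRelativePath)
    {
        if (appRelativePath == null ||
            !appRelativePath.StartsWith("~/", StringComparison.Ordinal) ||
            !appRelativePath.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Drop the leading "~/" and hand the rest to the viewer page.
        return "~/ArticleViewer.aspx?article=" + appRelativePath.Substring(2);
    }

    public static void Main()
    {
        // prints "~/ArticleViewer.aspx?article=DotNet/Art01.htm"
        Console.WriteLine(ToViewerUrl("~/DotNet/Art01.htm"));
        // prints "(not rewritten)"
        Console.WriteLine(ToViewerUrl("~/Default.aspx") ?? "(not rewritten)");
    }
}
```

The browser keeps showing the original htm URL; only the server-side path changes, which is why old links keep working.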

(You can do a lot more with HttpModules. There are many events you can hook into, so a module can be the first and the last to handle, bend, modify or analyze any request served by your app. There are loads and loads of good samples to be found all over the web.)

Now the viewer has to display the htm. How will it do that? The easy part is that you can assign any html to the Text of a Label; the rendered page will display it in its full glory. But I want to be a neat citizen on the web and not render any garbage. The original htm of my pages has a lot of bla-bla FrontPage headers. What my code will do is extract the real content from the htm file and assign that to the label. Any html response looks (or should look) like this:

<html>
   <head>
      <title>This page is about software</title>
      .......
   </head>
   <body>
      .................
   </body>
</html>

The real content is between the body tags.

The code takes this approach:

  • Read the htm filename from the querystring
  • Read the htm file into the rawHtml string
  • Extract the page title using a regular expression
  • Assign the title to the viewer page's title
  • Extract the page body by searching for the body tags
  • Assign the body to the text of a label

// Needs: using System; using System.IO; using System.Text.RegularExpressions;
private void displayArticle()
{
    string pageName = Request.Params["article"];

    // The parameter arrives in the URL, so only accept htm files
    if (pageName != null && pageName.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
    {
        StreamReader sr = null;

        try
        {
            // read in the htm file; "~/" anchors the path at the site root
            string fullFileName = Server.MapPath("~/" + pageName);
            sr = new StreamReader(fullFileName);
            string rawHtml = sr.ReadToEnd();

            // Use regex to extract title. The options are combined with |
            // (with & they cancel out to None); Singleline lets the title span lines.
            Regex reTitle = new Regex(@"<title\b[^>]*>(.*?)</title", RegexOptions.IgnoreCase | RegexOptions.Singleline);
            Match title = reTitle.Match(rawHtml);
            if (title.Success)
                this.Title = title.Groups[1].Value;

            // Plain search to extract body, ignoring the case of the tags
            int bodyStart = rawHtml.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
            if (bodyStart >= 0)
            {
                bodyStart = rawHtml.IndexOf(">", bodyStart);
                int bodyEnd = rawHtml.IndexOf("</body", StringComparison.OrdinalIgnoreCase);
                if (bodyEnd < 0)
                    bodyEnd = rawHtml.Length;
                LabelArticle.Text = rawHtml.Substring(bodyStart + 1, bodyEnd - bodyStart - 1);
            }
        }
        catch (Exception)
        {
            LabelArticle.Text = "Article not available";
        }
        finally
        {
            if (sr != null)
                sr.Close();
        }
    }
}

For the code to build you have to include System.IO and System.Text.RegularExpressions in your using list. A regular expression is a nice way to get the title, also when the tags are spelled sloppily, like <tiTle  >, as long as the match ignores case. The Groups[1].Value member returns the title enclosed by the tags. It would be tempting to also use a regular expression to get the body. But due to the many nested <'s and >'s inside the body that would be a pretty complicated expression, and when you manage to figure out a working one there's quite a chance it will take ages to evaluate. As I know there is (at most) one pair of body tags, a linear search is fast and good enough.
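The two extraction steps can be tried in isolation with a small console sketch like the one below. The class HtmlParts and its methods are hypothetical names chosen for this illustration; the pattern and the body search follow the approach described above. One detail to watch: RegexOptions values have to be combined with |, because & applied to two different flags yields RegexOptions.None and silently makes the match case sensitive.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class HtmlParts
{
    // Returns the text between the title tags, or null when there is no title.
    public static string ExtractTitle(string rawHtml)
    {
        // IgnoreCase tolerates <TITLE> or <tiTle  >; Singleline lets '.' span line breaks.
        Match m = Regex.Match(rawHtml, @"<title\b[^>]*>(.*?)</title",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns what lies between <body ...> and </body>, or null when there is no body tag.
    public static string ExtractBody(string rawHtml)
    {
        int start = rawHtml.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start = rawHtml.IndexOf('>', start);           // end of the opening tag
        if (start < 0) return null;
        int end = rawHtml.IndexOf("</body", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) end = rawHtml.Length;             // tolerate a missing closing tag
        return rawHtml.Substring(start + 1, end - start - 1);
    }

    public static void Main()
    {
        string html = "<HTML><HEAD><tiTle  >Delphi vs. C#</TITLE></HEAD>" +
                      "<BODY bgcolor=\"#FFFFFF\"><p>Hello</p></BODY></HTML>";

        Console.WriteLine(ExtractTitle(html));   // prints "Delphi vs. C#"
        Console.WriteLine(ExtractBody(html));    // prints "<p>Hello</p>"

        // '&' of two distinct flags is None, so always combine options with '|'.
        Console.WriteLine(RegexOptions.IgnoreCase & RegexOptions.Multiline); // prints "None"
    }
}
```

The body search stays a plain linear scan, in line with the reasoning above that a regular expression for the body would be both complicated and slow.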

The result is that all my classic pages are a full part of my ASP.NET 2.0 site and are still accessible by the classic URL. The user won't even notice.

