The Artima Developer Community
Sponsored Link

.NET Buzz Forum
HTML Agility Pack

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Duncan Mackenzie

Posts: 689
Nickname: duncanma
Registered: Aug, 2003

Duncan Mackenzie is the Visual Basic Content Strategist at msdn.microsoft.com
HTML Agility Pack Posted: Feb 10, 2004 10:12 AM
Reply to this message Reply

This post originated from an RSS feed registered with .NET Buzz by Duncan Mackenzie.
Original Post: HTML Agility Pack
Feed Title: Code/Tea/Etc...
Feed URL: /msdnerror.htm?aspxerrorpath=/duncanma/rss.aspx
Feed Description: Duncan is the Visual Basic Content Strategist at MSDN, the editor of the Visual Basic Developer Center (http://msdn.microsoft.com/vbasic), and the author of the "Coding 4 Fun" column on MSDN (http://msdn.microsoft.com/vbasic/using/columns/code4fun/default.aspx). While typically Visual Basic focused, his blogs sometimes wanders off of the technical path and into various musing of his troubled mind.
Latest .NET Buzz Posts
Latest .NET Buzz Posts by Duncan Mackenzie
Latest Posts From Code/Tea/Etc...

Advertisement
I've seen this around before, and this post was from June 2003, but it is worth mentioning again!
.NET Html Agility Pack: How to use malformed HTML just like it was well-formed XML... Here is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is an assembly that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what [is provided by] System.Xml, but for HTML documents (or streams).

Sample applications:
  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, you name it.
  • Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.
  • Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.

  • There is no dependency on anything else than .Net's XPATH implementation. There is no dependency on Internet Explorer's dll or tidy or anything like that. There is also no adherence to XHTML or XML, although you can actually produce XML using the tool.

    Read: HTML Agility Pack

    Topic: Dread and Panic Previous Topic   Next Topic Topic: Yet Another Amazing Article from MSDN België & Luxemburg

    Sponsored Links



    Google
      Web Artima.com   

    Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use