The Artima Developer Community

Agile Buzz Forum
Smalltalk XML Parsing

James Robertson

David Buck, Smalltalker at large
Smalltalk XML Parsing Posted: Dec 12, 2005 1:40 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Smalltalk XML Parsing
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.

I ran across this page in my aggregator before lunch, and flagged it for follow-up. That time has come - it must be testing day on this blog. Here's what jumped out at me:

I give up on libxml for the time being, and think instead of Chris Petrilli’s comment that ruby (and python) performance is “not quite in the league of Smalltalk (or Lisp, likely), which have extremely mature VMs with on-the-fly compilation and optimization”. Is Smalltalk then much faster than python or ruby, or comparable with C, for the task of parsing moderately large XML files?

No. Time to load and parse my iTunes library file, an 11mb Apple plist, on a 1 GHz G4 Powerbook with VisualWorks Non-Commercial 7.3.1: about three minutes.

That didn't seem right - I use the XML code in VW extensively, so I'm pretty familiar with it. I grabbed my iTunes file (only 2.7 MB) and parsed that - it took 5.5 seconds. The two caveats are that it's a smaller file, and my hardware isn't his hardware. With that in mind, I went ahead and created a large XML file. I grabbed the default feed file for BottomFeeder, and saved it as an XML feed list instead of as a binary dump, like this:


file := Tools.XMLConfigFileSupport.XMLConfigFile 
                     filename: 'g:\vw74\image\feeds.xml'.
file saveObject: RSSFeedManager default subscribedFeedsFolder.
file saveConfiguration

That just dumps the 80 sample feeds into a (pretty verbose) XML format - I ended up with a 13 MB file. That seemed large enough, so I tried the parse on that:


"Read the whole file into memory first, so that only the parse is timed"
content := 'feeds.xml' asFilename contentsOfEntireFile.
parser := XMLParser new.
parser validate: false.
Time millisecondsToRun: [parser parse: content readStream]

That last line times the execution - it ran in 17.9 seconds. Not a couple of seconds, but not 3 minutes, either. There was some GC going on during that, so I'm sure that things could be improved by simply configuring VW with a larger bite of old space up front. In dealing with large amounts of data, a fair bit of time is going to be chewed up either in allocating more memory, or in GC'ing if we hit the current limits (as per the memory policy in place).
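One way to get a feel for how much of that time is memory management rather than parsing is to repeat the run a few times, forcing a full collection before each one so every parse starts from a similar heap. This is only a sketch: the run count of 3 is arbitrary, it assumes feeds.xml sits in the image's working directory as above, and it does not change the memory policy at all.

```smalltalk
"Sketch: time the same parse three times, collecting garbage before each run.
 The run count and variable names are illustrative, not from the original test."
| content times |
content := 'feeds.xml' asFilename contentsOfEntireFile.
times := (1 to: 3) collect: [:run |
    | parser |
    "Start each run from a freshly collected heap"
    ObjectMemory globalGarbageCollect.
    parser := XMLParser new.
    parser validate: false.
    Time millisecondsToRun: [parser parse: content readStream]].
times
```

Evaluating this in a workspace answers the three timings in milliseconds. If the later runs come in noticeably faster than the first, that would suggest the first run was paying for memory growth rather than for the parse itself.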

For this kind of parse to take 3 minutes, either the hardware would have to be very slow, or memory limits would have to be set badly for dealing with larger files. I'm not entirely sure what was going on.
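To put rough numbers on that, the figures quoted above can be turned into throughput. This is back-of-the-envelope arithmetic only: the three runs used different files and different hardware, and "about three minutes" is taken as 180 seconds.

```smalltalk
"Rough throughput in MB per second, from the sizes and times quoted above.
 Different files and machines, so treat the comparison as indicative only."
| feedDump itunesSmall itunesLarge |
feedDump := 13 / 17.9.       "13 MB feed list in 17.9 seconds"
itunesSmall := 2.7 / 5.5.    "2.7 MB iTunes file in 5.5 seconds"
itunesLarge := 11 / 180.0.   "11 MB iTunes file in about three minutes"
feedDump / itunesLarge       "ratio between the two large-file runs"
```

That works out to roughly 0.73 MB/s and 0.49 MB/s for my two runs against about 0.06 MB/s for the three-minute parse, a gap of around 12 times between the two large files.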

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use