The Artima Developer Community

Agile Buzz Forum
From Streams to Xtreams

James Robertson

Posts: 29924
Nickname: jarober61
Registered: Jun, 2003

David Buck, Smalltalker at large
From Streams to Xtreams Posted: Apr 16, 2010 3:40 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: From Streams to Xtreams
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life


Last time I discussed Xtreams, I was lamenting some performance issues we were having with substreams. Martin and I have just finished rewriting the broken parts and all the tests pass again. So here is a post I delayed once before that I can now finish.

Let's read the contents of a file off disk that is encoded in UTF-8. My changes file is 6,613,356 bytes big, give or take. How many characters does the file contain, though? And how fast can we find out?

| stream |
stream := ('changes.cha' asFilename withEncoding: #utf8) readStream.
[stream upToEnd size] ensure: [stream close].

This yields a result of 6,583,057 characters and took 2.845 seconds to run. So how do we achieve the same result using Xtreams?

| stream |
stream := 'changes.cha' asFilename reading encoding: #utf8.
[stream rest size] ensure: [stream close].

This yields the same character count and ran in 3.427 seconds. Now imagine that the file on disk is stored using the default platform string encoding. In this case, the code for streams becomes:

'changes.cha' asFilename contentsOfEntireFile size

This ran in 1.035 seconds. The Xtreams version can be smarter, since it can use the primitives that already exist to read in the platform encoding:

| stream |
stream := 'changes.cha' asFilename reading contentsSpecies: String.
stream rest size

This ran in 0.551 seconds. Let's say we want to read each line and count every line that starts with a < character (which has a codePoint of 60).

| stream count line |
stream := 'changes.cha' asFilename readStream binary.
count := 0.
[stream atEnd] whileFalse:
  [line := stream upTo: 13.
  (line notEmpty and: [line first = 60]) ifTrue: [count := count + 1]].

This ran in 0.61 seconds and returned 46,388. The Xtreams version looks like:

| stream count |
stream := 'changes.cha' asFilename reading.
count := 0.
(stream ending: 13 inclusive: true) do: [:substream | substream get = 60 ifTrue: [count := count + 1]]

This version runs in 0.402 seconds. At this point we can start to diverge in simplicity from Streams. For example, what if you want every line in a file?

(('somefile.txt' asFilename reading encoding: #utf8) ending: Character cr) collect: #rest

You can use this technique to iterate over sections that are split by newlines and write out a transformed stream. You can take it a step further and look for specific content as well:

| stream |
stream := 'changes.cha' asFilename reading.
((stream encoding: #utf8) ending: 'class') collect: [:substream | stream position]

The above gave me every position in the file where the word 'class' occurs. It ran in 3.188 seconds for me.

Read: From Streams to Xtreams


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use