Ruby gurus, be gentle with me and please correct any misunderstandings below and in the rest of this series!
I want to simply build a list of strings with the empty strings removed.
The following examples are taken directly from the Python and irb interactive interpreters. The >>> are standard Python interpreter prompts, showing what I typed, with responses lacking any prefix. The Ruby irb interpreter is similar with >> prefixing my entries and => prefixing the interpreter responses.
The list comprehension approach in Python is very compact:
>>> aList = ["fred", " ", "harry"]
>>> [x for x in aList if len(x.strip()) > 0]
['fred', 'harry']
So in Ruby I first tried an iterator block with trailing conditional logic. It felt natural to write; pity it didn't work:
>> aList = ["fred", " ", "harry"]
=> ["fred", " ", "harry"]
>> aList.each { |x| x unless x.rstrip.empty? }
=> ["fred", " ", "harry"]
It appears that each ignores the value of the block and just returns the original object.
Although it didn't work, this attempt provides a useful jumping-off point to discuss what I think is the single most powerful idiom in Ruby and how it is implemented - yield and the optional code block.
Just one quick note on container types before we move on - Ruby arrays are equivalent to Python lists, hashes to dictionaries, and there's no direct equivalent of the immutable Python tuples.
Any method in Ruby can have a code block attached after its arguments, and some methods, like each, are built around such a block. The role of yield inside a method looks, at first glance, like that in Python generators. The difference is in the recipient of the yield and the resulting programming pattern. One way of looking at it would be to say Ruby yield is inside-out compared to Python yield.
Python yield returns control, along with one or more values, to the calling context. The caller is then in charge and decides whether to resume the generator. There is no obligation on the calling code to keep calling the generator, even if there are values in a container just waiting for their turn to be iterated.
Ruby yield is used to pass control and one or more values to a code block that has been sent to a method. Your yielding method is in charge and when the code block exits, control returns automatically to the yielder.
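A minimal sketch of that flow of control, assuming a made-up method called each_nonblank (it is not part of Ruby, just an illustration): the method decides when the block runs, and control comes back to it after every call.

```ruby
# Hypothetical method for illustration; each_nonblank is not a standard Ruby method.
def each_nonblank(strings)
  strings.each do |s|
    # yield passes control and one value to the block attached to the call;
    # when the block exits, execution resumes on the next line here.
    yield s unless s.strip.empty?
  end
end

kept = []
each_nonblank(["fred", " ", "harry"]) { |s| kept << s }
p kept   # prints ["fred", "harry"]
```

The caller supplies only the "what to do with each value" part; the yielding method keeps charge of the loop.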
Yield is the Strategy Pattern provided in the core of the language.
For example, the open method can take a block, in which case it returns the result of the block. You can write code that mirrors a typical Python idiom:
>>> to_clean = open("test.txt").readlines()
identically in Ruby:
>> to_clean = open("test.txt").readlines()
or in Ruby with a block:
>> to_clean = open("test.txt") { |f| f.readlines() }
The block-based Ruby version is more predictable and safer: the file is closed as soon as the block exits, even if the block fails with an exception, whereas the one-liners above rely on the garbage collector to close the file eventually. Blocks in Ruby are also closures - they can read and modify variables from the surrounding scope, while variables first assigned inside a block stay local to it. Blocks in Smalltalk are typically used to pass logic into other method calls. With the yield style of using blocks in Ruby, they seem to have a more local purpose.
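Here is a small sketch of both claims, the guaranteed close and the closure behaviour. It uses File.open rather than the bare open from the transcript, writes its own scratch file in a temporary directory, and closed_after_failure? is a hypothetical helper invented for this example.

```ruby
require "tmpdir"

# Hypothetical helper for illustration: reports whether the file handle
# was closed after the block raised an error.
def closed_after_failure?(path)
  handle = nil
  begin
    File.open(path) do |f|
      handle = f                 # the block is a closure, so it can assign to the outer variable
      raise "simulated failure"  # leave the block abnormally
    end
  rescue RuntimeError
    # the error still propagates out of File.open, but the file has been closed first
  end
  handle.closed?
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")
  File.write(path, "fred\n \nharry\n")

  lines = File.open(path) { |f| f.readlines }  # File.open returns the value of the block
  p lines
  p closed_after_failure?(path)
end
```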
Declaring a Ruby block is straightforward and very similar to declaring a Smalltalk block. Blocks are bounded by the words do and end or by a pair of braces. The Ruby convention that has evolved is to use braces for single-line blocks, as I've done here, and do/end for multiple-line blocks.
The vertical bar or pipe character is used to bound the parameters to the block. These parameters are populated by whatever is yielding to the block so their purpose and count depends on the method you're sending the block to. The comma-separated list of parameter names is just names, no typing information of course!
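To make the parameter rules concrete, here is a sketch using each_with_index, which yields two values per element, and Hash#each, which yields a key and a value. The helper label_nonblank and the sample data are made up for illustration.

```ruby
# Hypothetical helper: builds "index: name" labels for the non-blank strings.
def label_nonblank(strings)
  labels = []
  # each_with_index yields two values, so the block declares two parameters
  strings.each_with_index do |name, index|
    next if name.strip.empty?      # next ends this run of the block early
    labels << "#{index}: #{name}"
  end
  labels
end

p label_nonblank(["fred", " ", "harry"])  # prints ["0: fred", "2: harry"]

# Single-line brace form; Hash#each yields a key and a value to the block
ages = { "fred" => 30, "harry" => 25 }
ages.each { |name, age| puts "#{name} is #{age}" }
```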
Recipe 7.2 in Ruby Cookbook has a nice detailed explanation of yielding to blocks and subsequent recipes show more examples iterating over data structures.
The each method returns the container, as all Ruby expressions return a value. If you do something like printing the parameters to the block, you will see them printed first:
>> aList = ["fred", " ", "harry"]
=> ["fred", " ", "harry"]
>> aList.each { |x| print x }
fred harry=> ["fred", " ", "harry"]
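A quick way to confirm that each hands back the very same array, and not a copy or the block's results, is to compare object identity. The helper name below is hypothetical.

```ruby
# Hypothetical helper: checks whether each returned the receiver itself.
def each_returns_receiver?(list)
  result = list.each { |x| x.upcase }  # the values computed by the block are discarded
  result.equal?(list)                  # equal? compares object identity, not contents
end

p each_returns_receiver?(["fred", " ", "harry"])  # prints true
```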
Looking through the definition of the Enumerable mixin (more on what mixin means later), I spotted the collect method which does create a new array from the results of the block:
>> aList = ["fred", " ", "harry"]
=> ["fred", " ", "harry"]
>> aList.collect { |x| x.length unless x.rstrip.empty? }
=> [4, nil, 5]
Oops: it is very literal-minded - when the unless condition suppresses the expression, the block evaluates to nil, and that nil is put in the corresponding slot of the array being created.
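The nil slots can be dropped afterwards with Array#compact, as in this minimal sketch (nonblank_lengths is a hypothetical helper name). Note that it still produces lengths rather than the strings themselves, so it is not yet the filter I wanted.

```ruby
# Hypothetical helper for illustration.
def nonblank_lengths(strings)
  # collect leaves nil where the block produced nothing; compact removes the nils
  strings.collect { |x| x.length unless x.rstrip.empty? }.compact
end

p nonblank_lengths(["fred", " ", "harry"])  # prints [4, 5]
```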
Enough elegance - rather than trying to write an expression which returns the data I want, I gave up and decided to have the code block append to an array declared outside the block:
>> destList = []
=> []
>> aList = ["fred", " ", "harry"]
=> ["fred", " ", "harry"]
>> aList.each { |x| destList << x unless x.rstrip.empty? }
=> ["fred", " ", "harry"]
>> destList
=> ["fred", "harry"]
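For comparison, a closer analogue of the Python list comprehension is probably Array#reject, or its mirror image select, both of which build a new array from a true/false block. This is only a sketch; remove_blank is a hypothetical helper name.

```ruby
# Hypothetical helper name; reject and select are standard Array methods.
def remove_blank(strings)
  # reject returns a new array without the elements for which the block is true
  strings.reject { |x| x.strip.empty? }
end

a_list = ["fred", " ", "harry"]
p remove_blank(a_list)                    # prints ["fred", "harry"]
p a_list.select { |x| !x.strip.empty? }   # select keeps the elements for which the block is true
p a_list                                  # the original array is left unchanged
```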
More on blocks in a future episode. This one was posted from Hokitika on the West Coast of New Zealand's South Island, where we are enjoying a holiday.