Summary
I'm not talking about the early adopters writing obscure code here -- that can probably be solved with a suitable style guide. I just debugged my way through an example that should have been trivial but I only figured out because:
I have experience struggling through these kinds of things and
I know enough about the subject that I can understand why they did it that way.
But my concern is that this should be an example that a beginner could understand, and they can't. There's too much depth exposed.
Here's the example, which is written as a script:
import scala.io.Source._
case class Registrant(line: String) {
  val data = line.split(",")
  val first = data(0)
  val last = data(1)
  val email = data(2)
  val payment = data(3).toDouble
}
val data = """Bob,Dobbs,bob@dobbs.com,25.00
Rocket J.,Squirrel,rocky@frostbite.com,0.00
Bullwinkle,Moose,bull@frostbite.com,0.25
Vim,Wibner,vim32@goomail.com,25.00"""
val lines = fromString(data).getLines
//val lines = fromString(data).getLines.toIndexedSeq
val registrants = lines.map(Registrant)
registrants.foreach(println)
registrants.foreach(println)
The class Registrant takes a String as its constructor argument, and splits it up to produce the various data items stored within that object. Thus you can open a CSV (comma-separated value file, as is produced by most spreadsheets) and parse it into a collection of Registrant objects. You would ordinarily do this by reading in a file using fromFile instead of fromString, which is how I started before seeing weird behavior.
The "strange" behavior is this: as written, the program will only list the registrants once, instead of twice as requested. Indeed, you can't do anything else to your supposed collection of Registrant objects once they've been printed the first time.
Give up? The answer is that registrants is not a collection of any kind. Because getLines returns an iterator (which is the logical thing to do), any functional operation you perform on that iterator also produces an iterator, and you can only use an iterator to pass through your data once. This also makes sense ... after you understand the depth of what's going on, and realize that having "iterators all the way down" is good computer science.
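A minimal sketch of that single-pass behavior, using a small list instead of the registrant data. The object and method names here are made up for illustration. Strictly speaking, the Scala documentation advises against reusing an iterator after calling a method such as foreach on it; in practice the second traversal simply finds nothing left.

```scala
object SinglePass {
  // Returns how many elements were seen on the first and second traversal.
  def traverseTwice(): (Int, Int) = {
    // map on an Iterator yields another Iterator, not a collection.
    val it: Iterator[Int] = List(1, 2, 3).iterator.map(_ * 2)
    var first = 0
    it.foreach(_ => first += 1)   // consumes the iterator
    var second = 0
    it.foreach(_ => second += 1)  // nothing left to visit
    (first, second)
  }

  def main(args: Array[String]): Unit =
    println(traverseTwice())  // prints (3,0)
}
```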
But no posts I looked at that discussed reading files mentioned this, because I suspect the posters (A) didn't know and (B) didn't expect it to work that way, so assumed (logically) that things would work without doing anything else.
Here's the trick I discovered, although there certainly could be other, better ways to solve it. You have to know that you're getting back an iterator, and explicitly convert it to a regular sequence by calling toIndexedSeq as seen in the commented-out line.
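As a rough sketch of that conversion, the following reads two of the sample lines, extracts the payment column, and materializes the result with toIndexedSeq so it can be traversed more than once. The object name and the payments helper are assumptions for illustration, not part of the original script.

```scala
import scala.io.Source

object MaterializeOnce {
  val data = "Bob,Dobbs,bob@dobbs.com,25.00\nVim,Wibner,vim32@goomail.com,25.00"

  // toIndexedSeq copies the iterator's elements into a strict collection,
  // so the result can be traversed any number of times.
  def payments(): IndexedSeq[Double] =
    Source.fromString(data).getLines().map(_.split(",")(3).toDouble).toIndexedSeq

  def main(args: Array[String]): Unit = {
    val p = payments()
    println(p.sum)  // first traversal
    println(p.sum)  // second traversal sees the same elements
  }
}
```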
This means that, to do something simple and useful that a beginner might find motivating -- like manipulating spreadsheet data from a file -- you'll probably have to explain to your beginner the difference between an iterator and a collection and why an iterator only passes through once, and that you must convert to something called an IndexedSeq. You can choose to wave your hands over the issue but I find that if you throw things at people without explaining them it tends to be confusing.
You can certainly argue that this is nice and consistent from a computer science standpoint and that the whole language maintains this consistency from top to bottom which makes it quite powerful. And that's great, but it means that before you can start doing useful things you need the kind of breadth and depth of knowledge that only a computer scientist has, and that makes Scala a harder sell as a first programming language (even though many aspects of Scala are significantly easier to grasp than Java or C++).
For a much more in-depth analysis of Scala complexity by someone with greater knowledge of Scala, see this well-written article. Please note that, just like the author of that article, I'm not saying that Scala is "bad" or "wrong" or things along those lines. I like Scala and think it's a powerful language that will allow us to do things that other languages won't. But in a previous article I suggested that Scala might be a good beginner's language, and "sharp edges" like this that are exposed in what would otherwise be beginning concepts make me wonder if that's true, or if it should actually be relegated to a second or even third language, after the learner has gone through the curve with one or two other languages. So the question is not whether I can figure out this puzzle, or whether it's obvious to you -- since you are probably an experienced programmer -- but rather how much more difficult it might be to teach Scala to an inexperienced programmer.
I don't think it's fair to blame Scala for the confusion between an iterator and a collection; many other languages also have iterators. For the record, as soon as I saw the code posted, I immediately understood what was going on, because I have confused myself in Python in the same way when it comes to iterators and collections. Last year I accidentally "optimized" some code in one of my Python programs by globally replacing list comprehensions with generator expressions; the resulting code broke because something really needed to be a list in order to be iterated over twice from different places in the program. I don't blame Python for my error.
In Scala, there is even less of a reason to get the two confused, because it is statically typed: for clarity, one can forgo some type inference and explicitly write out the type, to alert the code reader to what is going on:
val result : Iterator[String] = fromString(data).getLines
Bruce: I have seen quite a few examples of "Scala is confusing to non-experts", and it may well be the case, but I don't feel that this example is a strong argument in its favor. I suppose I just don't find this behavior to be all that surprising. Would I make a mistake like this? Absolutely -- I have done similar things many times in the past! If the second use were far removed from the first use then perhaps I would get confused and take some time figuring out what was wrong. But as you presented it, the problem becomes immediately obvious... you can iterate the list once but not twice. It will be less obvious WHY that is... especially if I don't know much about iterators, but "stick it in a list" is the obvious solution, and just how to do that isn't too difficult to figure out.
I don't think this is about iterators vs. collections. It is about mutability. Iteration constructs don't have to be mutable (and should not be, by default). E.g. in Clojure:
Stuart, I agree with you that stateless iteration is better, but that is a whole topic in itself. The Haskell world is very active in that general area.
Scala has gone the Java way with its iterators (that in turn were inspired by those in C++ STL). Hence, the very interface is fundamentally designed for a mutable external iterator that is really a "cursor": http://www.scala-lang.org/api/current/scala/collection/Iterator.html
I am still a novice at Scala, but I presume there are more functional libraries for Scala that bypass the Java legacy interface.
As others have noted, this example behaves exactly the same in Python, which is generally perceived to be a good beginner's language.
Small note on code style: Instead of toIndexedSeq, it's shorter to use just toSeq. You do not need random access of your sequence, so specifying toIndexedSeq is overkill.
Interesting, the equivalent program in F#, and probably also C# will iterate through the registrants twice, as expected.
open System
type Registrant = { Data : string; First : string; Last : string; Email : string; Payment : decimal }
let loadRegistrant (data : string) =
    let lines = data.Split([|','|])
    { Data = data
      First = lines.[0]
      Last = lines.[1]
      Email = lines.[2]
      Payment = Decimal.Parse(lines.[3]) }
let data = @"Bob,Dobbs,bob@dobbs.com,25.00
Rocket J.,Squirrel,rocky@frostbite.com,0.00
Bullwinkle,Moose,bull@frostbite.com,0.25
Vim,Wibner,vim32@goomail.com,25.00"
The main difference between the F# and Scala versions is that Seq.map returns an IEnumerable, which is roughly equivalent to the Iterable trait in Scala. Both Iterable and IEnumerable produce a cursor-like object that iterates once through the collection of objects.
It is interesting that F# implements higher-order functions like map and fold only on IEnumerable (Scala's Iterable). Scala implements these functions on both Iterable and Iterator, which is what causes the confusion in this particular program.
Using higher-order functions on Iterable (F# IEnumerable) prevents you from attempting to use iterators that are already consumed. However, the second time you iterate over the sequence, an entirely new collection of objects is generated. If this is not your intention, you will need to cache the first sequence using something like Seq.cache or Seq.toList.
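For Scala readers, a loose analogue of this re-evaluation is a view: it can be traversed repeatedly, but the mapping function runs again on every pass unless the result is copied into a strict collection. The sketch below is only an illustration of that idea and does not claim to mirror F#'s Seq exactly; the object and method names are hypothetical.

```scala
object ReEvaluation {
  // Counts how often the mapping function runs across two traversals of a view.
  def callsForView(): Int = {
    var calls = 0
    val mapped = List(1, 2, 3).view.map { x => calls += 1; x * 2 }
    mapped.foreach(_ => ())  // runs the function for every element
    mapped.foreach(_ => ())  // runs it again for every element
    calls
  }

  // Same pipeline, but forced into a List once before being traversed.
  def callsForCached(): Int = {
    var calls = 0
    val cached = List(1, 2, 3).view.map { x => calls += 1; x * 2 }.toList
    cached.foreach(_ => ())
    cached.foreach(_ => ())
    calls
  }

  def main(args: Array[String]): Unit = {
    println(callsForView())    // prints 6
    println(callsForCached())  // prints 3
  }
}
```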
There aren't all that many Scala methods that return an iterator; the ones that do do so for a good reason. For example, getLines doesn't know if the source (usually a file) is ten million lines long. Or look at combinations, permutations, grouped, or sliding in Seq.
Alternatively, they could have returned lazy collections. That might have been a little nicer.
Anyway, this is a mistake that everyone makes once, and I don't think it is generally hard to understand, particularly if your source was a file, not a string. Soon afterwards your fingers learn to type getLines.toArray (or, if you are a computer scientist, toSeq :-)).
BTW, why use a Source in the first place? What's wrong with
I am a bit late to this, but it is worth noting that the usual way in Scala to get a lazy immutable collection (a Stream) is to call toStream on the Iterator. Iterators and Streams are kind of the mutable/immutable equivalents and the conversion between them is trivial.
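A small sketch of that conversion, assuming a string source in place of a file. Note that on Scala 2.13 and later, Stream is deprecated in favor of LazyList, so this compiles there with a deprecation warning. The object and method names are illustrative only.

```scala
import scala.io.Source

object StreamFromIterator {
  // Converts the one-shot iterator into a lazy, memoizing Stream
  // and counts the elements seen on two separate traversals.
  def traverseTwice(): (Int, Int) = {
    val lines = Source.fromString("a\nb\nc").getLines().toStream
    var first = 0
    lines.foreach(_ => first += 1)
    var second = 0
    lines.foreach(_ => second += 1)  // elements were cached by the first pass
    (first, second)
  }

  def main(args: Array[String]): Unit =
    println(traverseTwice())  // prints (3,3)
}
```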
There aren't many foolproof programming languages, and not everybody can or should write programs. But if you want to and you get stuck, you can ask and possibly learn.
Report from a Scala newbie
Talk about a mistake that everyone makes once. I've just made it - and then found this page after a few searches. I was trying the following snippet, never dreaming for a moment that getLines does NOT return a list.
object ExpListLines {
  def main(args: Array[String]) = {
    val lines = io.Source.fromFile("some txt file").getLines
    // println(lines.length)
    for(l <- lines) println("[" + l + "]")
  }
}
I was rather surprised to learn that the lines didn't show up in the output - until I remembered Eckel's post I read the day before and commented out the call to length.
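The effect of calling length first can be reproduced without a file. This is a minimal sketch using a string source; the object and method names are invented for the example.

```scala
import scala.io.Source

object LengthConsumes {
  // Returns the line count and the number of lines still available afterwards.
  def remainingAfterLength(): (Int, Int) = {
    val lines = Source.fromString("one\ntwo\nthree").getLines()
    val n = lines.length          // walks the whole iterator in order to count
    val left = lines.toList.size  // nothing remains to iterate over
    (n, left)
  }

  def main(args: Array[String]): Unit =
    println(remainingAfterLength())  // prints (3,0)
}
```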
It may be true that this mistake is made only once, but I also feel that everyone makes it.