Summary
Dare Obasanjo, author of the open-source RSS Bandit reader, and a program manager at Microsoft, compares new features of C# 3.0 with Python and Ruby from the perspective of dynamically typed code and functional programming.
After having spent several years working primarily with C#, Microsoft program manager Dare Obasanjo decided to explore the world of dynamic languages by learning Python. He then revisited the latest features of C# 3.0 with that dynamic programming experience, realizing that C# 3.0 provides many of the language features that make some developers prefer dynamically-typed languages. Obasanjo summarizes his experience in a recent blog post, Does C# 3.0 Beat Dynamic Languages at their Own Game?:
Shortly after I started using Python regularly as part of the prototyping process for developing new features for RSS Bandit, I started trying out C# 3.0. I quickly learned that a lot of the features I'd considered as language bloat a couple of months ago actually made a lot of sense if you're familiar with the advantages of dynamic and functional programming approaches to the tasks of software development. In addition, C# 3.0 actually fixed one of the problems I'd encountered in my previous experience with a dynamic programming language while in college.
As I started investigating C# 3.0, I discovered that almost all the features I'd fallen in love with in Python which made my life as a developer easier had been integrated into C#. In addition, there was also a feature which is considered to be a killer feature of the Ruby programming language which also made it into C# 3.0...
C# has added features that make it close to being on par with the expressiveness of functional and dynamic programming languages. The only thing missing is dynamic typing (not duck typing), which I’ve come to realize ... has a lot more going for it than lots of folks in the strongly and statically typed world would care to admit...
In his review blog post, Obasanjo notes his own journey to dynamic typing via a stint on Microsoft's XML team:
The first half of my career at Microsoft was spent working on the XML team which was responsible for the core XML processing APIs that are utilized by the majority of Microsoft's product line. One of the things that was so cool about XML was that it enabled data formats to be strongly structured or semi-structured depending on the needs of the application...
However, one problem we repeatedly bumped against is that data formats in which unknown data types can show up at runtime bump up against the notion of static typing that is a key aspect of languages like C#...
I came to the realization that some degree of dynamism is desirable especially when dealing with the loosely coupled world of distributed programming on the Web. I eventually decided to ignore my earlier misgivings and start exploring dynamic programming languages. I chose IronPython because I could focus on learning the language while relying on the familiar .NET Framework class library when I wanted to deal with necessary tasks like file I/O or Web requests.
One C# 3.0 feature that Obasanjo especially likes is the language's support for lambda expressions, something both Python and Ruby feature as well:
Creating a shorthand syntax where anonymous blocks of code can be treated as function objects is now commonly known as "lambda expressions". Although C# has had functions as first-class objects since version 1.0 with delegates and introduced anonymous delegates in C# 2.0, it is in C# 3.0 where the shorthand syntax of lambda expressions has found its way into the language.
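To make the progression concrete, here is a minimal sketch (our own example, not code from Obasanjo's post) contrasting a C# 2.0 anonymous delegate with the C# 3.0 lambda shorthand; the sample list and the isShort predicates are assumptions made for illustration:

using System;
using System.Collections.Generic;

class LambdaDemo
{
    static void Main()
    {
        var names = new List<string> { "Ada", "Bob", "Grace" };

        // C# 2.0: an anonymous delegate; the delegate keyword and the
        // parameter type are spelled out explicitly.
        Predicate<string> isShortOld = delegate(string s) { return s.Length <= 3; };

        // C# 3.0: the same function object as a lambda expression; the type
        // of s is inferred from Predicate<string>.
        Predicate<string> isShortNew = s => s.Length <= 3;

        Console.WriteLine(names.FindAll(isShortNew).Count); // prints 2 ("Ada", "Bob")
    }
}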
Next, Obasanjo compares Python's list comprehensions with C# 3.0's language-integrated query:
A common programming task is to iterate over a list of objects and either filter or transform the objects in the list thus creating a new list. Python has list comprehensions as a way of simplifying this common programming task...
Certain recurring programming patterns become more obvious as a programming language evolves; these patterns first become encapsulated by APIs and eventually become part of the programming language's syntax. This is what happened in the case of Python's map() and filter() functions, which eventually gave way to list comprehensions...
In C# 3.0, the language designers made the observation that performing SQL-like projection and selection is really the common operation and not just filtering/mapping of lists. This led to Language Integrated Query (LINQ)...
These are two fundamentally different approaches to tackling the same problem. Where LINQ really shines is when it is combined with custom data sources that have their own query languages such as with LINQ to SQL and LINQ to XML which map the query operations to SQL and XPath queries respectively.
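As a rough illustration (our own sketch, not code from the post), the Python comprehension [p.name for p in people if p.age > 30] might be written with LINQ to Objects as follows; the Person class and the sample data here are assumptions:

using System;
using System.Collections.Generic;
using System.Linq;

class Person { public string Name; public int Age; }

class LinqDemo
{
    static void Main()
    {
        var people = new List<Person> {
            new Person { Name = "Ada", Age = 36 },
            new Person { Name = "Linus", Age = 28 }
        };

        // Query syntax: SQL-like selection and projection over the list
        var names = from p in people
                    where p.Age > 30
                    select p.Name;

        // Equivalent method syntax, closer in spirit to Python's filter()/map()
        var names2 = people.Where(p => p.Age > 30).Select(p => p.Name);

        foreach (var n in names) Console.WriteLine(n); // prints "Ada"
    }
}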
The rest of Obasanjo's post compares tuples with anonymous types, and dynamic typing versus type inferencing. He also focuses on extension methods in Ruby and C# 3.0.
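For readers unfamiliar with those two features, here is a small sketch (again our own example, not Obasanjo's code) of an anonymous type standing in for a lightweight tuple and an extension method added to string, which plays a role loosely analogous to Ruby's open classes; the Shout method is purely illustrative:

using System;

static class StringExtensions
{
    // An extension method: callable as if it were an instance method of
    // string, though it cannot override members or touch private state.
    public static string Shout(this string s) { return s.ToUpper() + "!"; }
}

class FeatureDemo
{
    static void Main()
    {
        Console.WriteLine("hello".Shout()); // HELLO!

        // An anonymous type: a compiler-generated type that plays a role
        // similar to a lightweight tuple or record.
        var point = new { X = 3, Y = 4 };
        Console.WriteLine(point.X + point.Y); // 7
    }
}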
Do you agree with Obasanjo that C# 3.0 provides many of the benefits of dynamic languages in terms of productivity?
C# provides the best of both worlds. For instance, the primary benefit of Ruby and Python is, ostensibly, the dynamic type system; it cuts through a lot of the otherwise verbose syntax of statically typed languages such as Java. Alas, the hidden cost of dynamically typed languages is of course lack of tools; without static types there's no deterministic or practical means to provide rich compile-time feedback and code completion, i.e., IDEs for dynamic languages suck. (For reference, I think of IntelliJ IDEA as a solid IDE.)
Now some have argued that with dynamic languages, esp. Ruby, you don't need code completion and the like because you don't write the code in the first place. It's a tired argument that has been pulverized more times than I can count, yet somehow it keeps surfacing... probably because Java sucks, but I won't go there now.
Anyway, the argument is that since you don't have static types in Ruby you don't code them, thus you don't need code completion. Voila! Indeed, that's the keystone of Ruby itself... well, if you consider blocks as first-class types, which they are. What's frequently missing in some of the rebuttals to this argument, however, is the concept of type inference -- the ability to infer a type based on its context of use. For example, an intensely frustrating experience in coding Java is initializing a local variable with a parameterized instance of a generic class:
ArrayList<Person> l = new ArrayList<Person>();
Lord, that is awful. Dynamic languages sure look attractive when you're faced with the tedium of typing junk like that all day. With C#, though, a good portion of it disappears. Witness the var keyword:
var l = new List<Person>();
That's refreshing and it feels like dynamic typing, save the parameterization. And it's the simplest form of type inferencing there could be. Let's consider a more involved example using iteration in Java:
for( Person p : l )
{
p.setAge( 42 );
}
This is also frustrating. Why do we have to explicitly declare the type of p as Person? Ruby people roll eyes. Java people who don't know better (most of them) don't question it. On-the-edge C# people rejoice! They can do this:
foreach( var p in l )
{
p.Age = 42;
}
The type of the loop variable p is inferred from the parameterized list l. Nice. Then I can type "p." and get code completion of p in terms of Person. That's what you can't get from Ruby or any other dynamically typed language, and never reliably will. That's just a small slice of why I claim C# harnesses the best of both worlds. There are many other much more sophisticated examples involving lambdas/closures/blocks. For example, the for-statement could be replaced with an each() method on List taking a block as the parameter. There p's type is also inferred, thus avoiding the unnecessary coding, yet still providing all the power of the IDE that you can't get with Ruby.
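To make that last point concrete, here is a minimal sketch (my own example) using List<T>.ForEach from the .NET base class library in place of the hypothetical each() method; the Person class here is an assumption:

using System;
using System.Collections.Generic;

class Person { public int Age { get; set; } }

class EachDemo
{
    static void Main()
    {
        var l = new List<Person> { new Person(), new Person() };

        // p's type is inferred from List<Person>, so typing "p." still
        // completes in terms of Person, with no explicit declaration.
        l.ForEach(p => p.Age = 42);

        Console.WriteLine(l[0].Age); // 42
    }
}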
All the other good wholesome benefits of statically typed languages are built in as well. Compile-time feedback is a godsend once you've struggled with a large codebase in a dynamic language. That's where Ruby and Python break down quickly. Large projects without good IDEs and tools are quite a thing to reckon with, especially if you're a new guy on the project. As much as I hate Java, a large project on it is much more approachable with a solid IDE such as IDEA than is a similar project on Ruby or Python. For now I can only dream of using a statically typed language with type inferencing, closures, etc., such as C# 3.0 -- this is the future of programming languages.
> As much as I hate Java, a large project on it is much more approachable with a solid IDE such as IDEA than is a similar project on Ruby or Python.
People like Steve Yegge would claim that the idea of bringing up and maintaining "big projects" is exactly what makes them suck - not to speak of their supporters.
Nevertheless, I agree that programming C# 3.0 with VS 2008 is a nice experience. We have the choice between ice cream and chocolate these days. The world - at least for programmers - has become a better place.
>Then I can type "p." and get code completion of p in terms of Person.
++
Tool support (code completion, code navigation, refactoring tools) accounts for 80-90% of the practical benefits of static typing. Discoverability isn't just a UI concept: it applies to programming as well. In IntelliJ, hitting Ctrl-Q to get the Javadoc for a method, or Ctrl-B to jump to a definition, find overrides, and so on: these are all killer features for understanding a code base, and they are very difficult to implement well for a dynamic language.
The two features that massively reduce code bloat in a statically typed programming language are some level of type inference and some form of closures/blocks. The first eliminates redundant type specification and the second allows operations over data structures (map, findAll, etc.) to be one-liners rather than explicitly coded for-loops (often with accumulator variables).
With these two things done well, statically typed languages are almost as easy on the eyes as dynamic languages like Ruby.
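A rough before/after in C# (my own sketch; the Person class and the data are assumptions) of the explicit loop with an accumulator variable versus the inference-plus-closure one-liner:

using System;
using System.Collections.Generic;
using System.Linq;

class Person { public string Name; public int Salary; }

class TersenessDemo
{
    static void Main()
    {
        var employees = new List<Person> {
            new Person { Name = "Ada", Salary = 90000 },
            new Person { Name = "Bob", Salary = 40000 }
        };

        // Without closures: an explicit loop and an accumulator variable
        var lowPaid = new List<Person>();
        foreach (var e in employees)
        {
            if (e.Salary < 50000) lowPaid.Add(e);
        }

        // With type inference and a closure: the same operation as a one-liner
        var lowPaid2 = employees.Where(e => e.Salary < 50000).ToList();

        Console.WriteLine(lowPaid2[0].Name); // Bob
    }
}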
IMO C# 3.0 does a pretty good job of this. I think our internal programming language at Guidewire, GScript, does a better one <smile/>.
As an example, consider this Ruby:
def emps_with_salary_less_than( amt )
  @employees.find_all {|emp| emp.salary < amt }
end
And this GScript:
function empsWithSalaryLessThan( amt : int ) : List<Person> {
  return _employees.findAll( \ emp -> emp.Salary < amt )
}
There are three extra bits of annotation in the strongly typed GScript: the argument type, the function return type, and the inclusion of an explicit return statement. That seems like a small price to pay for code completion when you hit '.' after 'emp'.
Dynamic languages do let you do some interesting meta-programming tricks as well, but there are other ways to address that style of programming in statically typed languages. GScript does it in a unique way. I'll find out if I can discuss it publicly.
Wow! GScript looks like an awesome language. I see a lot going on in your example (OOP, closures, generics!). Why is it kept internal? In other words, in this day and age, why would a general-purpose language be considered proprietary? Imho, your company would do itself a (potentially huge) favor by exposing GScript publicly, if not as open source.
Questions:
- What platform[s] does GScript support?
- Assuming it is compiled, is it an embedded language, i.e., can you easily compile in-memory, say in a web server?
- Can you elaborate on GScript's feature set? Is it fully OOPified, generics implementation, Java interoperability, app server support, debugger support, IDE support, etc.?
- GScript appears to have a lot in common with Scala, but with a more conventional syntax (which I prefer). Could you provide more code examples, just for fun?
- Looking up Guidewire, it appears to be an insurance software company (?) How/why in the world does an insurance software company build a language like GScript?? (asked in a positive tone)
A reasonably good article but Dare's critique of type inference in C# 3.0 misses the mark. He is not using C# 3.0 idioms and that's why he's running into problems. I explain and provide an alternate implementation of his sample code on my blog at http://themechanicalbride.blogspot.com/2008/01/misunderestimating-c-30.html
> def emps_with_salary_less_than( amt )
>   @employees.find_all {|emp| emp.salary < amt }
> end
>
> And this GScript:
>
> function empsWithSalaryLessThan( amt : int ) : List<Person> {
>   return _employees.findAll( \ emp -> emp.Salary < amt )
> }
>
> There are three extra bits of annotation in the strongly typed GScript: the argument type, the function return type, and the inclusion of an explicit return statement. That seems like a small price to pay for code completion when you hit '.' after 'emp'.
It's 7 tokens in a 27 token program. As written, it's about 25% cost in code space.
Anyway, I suspect most of this could go away.
At the simplest level, the return statement could be inferred - last expression in the block could give the return if no return statement is included. I'm actually surprised you don't do that already, as you have enough information.
At a less trivial level, if we assume that the _employees variable is typed, we could infer a type of "Must be comparable to int" for amt, and "List<Person>" for the find_all, assuming a usable declaration of the type of find_all. Of course, if you want to do the latter inference, you might want to re-introduce the return just to show that you actually DO want to return a value.
WRT IDE jump functionality being a killer feature for understanding a codebase: Yes, though that ease also translates into people not being careful about how they structure their codebase, and the expense of a larger codebase to enable the functionality.
I've worked on Java and Perl codebases implementing roughly the same thing (very similar applications in two different companies where the apps were supposed to be merged, a little more functionality in the Perl version). The senior programmer on the Java project, having worked with it for several years, regularly spent 20 minutes to find out where stuff was and how it was tied together. I, being a senior programmer on the Perl codebase (DISCLAIMER: I did not choose what language that was written in), "never" spend more than 1 minute to find something. It doesn't happen. I could spend up to 5 minutes when the codebase was new to me; now I know it.
It has to do with how things are optimized, of course. In the Perl codebase, we optimize for ease of searching over the entire codebase using find/grep, and we rearrange directories etc to ensure that we have a sensible source code layout. To be able to live with the size of this codebase in Perl, this is a necessity.
For the Java codebase, even though it is twice as large as the Perl codebase, they don't optimize for this, because they have tools for jumping around the code, and source layout and grepping is unimportant.
The net result is that the codebase in Perl, the language that should be nasty in this area, is much easier to deal with.
Of course, the ideal would be to have something that is good in all these dimensions - compact, full tool support, freedom of expression from dynamic typing, checks from static typing, and magically make people carefully organize their code anyway. Alas, I think some of those are contradictory...
> It's 7 tokens in a 27 token program. As written, it's about 25% cost in code space.
True. But when compared with java (I count about 57 tokens in my implementation) the Ruby solution is 64% smaller and the GScript version is 52% smaller. So you got a lot of bang for your buck in GScript.
It's only because they are both so terse that the distinction between them is large. In wall time and human-reading time, they are nearly equivalent, at least for me.
> Anyway, I suspect most of this could go away.
>
> At the simplest level, the return statement could be inferred - last expression in the block could give the return if no return statement is included. I'm actually surprised you don't do that already, as you have enough information.
It's a design tradeoff. In GScript you have to annotate all in-types and out-types of a method. That's what allows us to lazily compile classes: we don't have to compile a class fully to know what types it expects, so we can incrementally compile classes as needed. It lets us have a much more immediate Ruby-like feedback cycle on class changes.
There is also an argument that by annotating your methods in and out types, you are providing some documentation about the method and possibly hiding some implementation details about the method (e.g. returning a List rather than an ArrayList.) I'm not sure I totally agree with that argument, but I'm pretty sure I don't totally disagree with it.
Scala, which compiles fully to an intermediate form, does infer the return type usually (there are cases when it can't.) I admire it for doing so, but that ain't how GScript is gonna work.
> At a less trivial level, if we assume that the _employees variable is typed, we could infer a type of "Must be comparable to int" for amt, and "List<Person>" for the find_all, assuming a usable declaration of the type of find_all. Of course, if you want to do the latter inference, you might want to re-introduce the return just to show that you actually DO want to return a value.
If we were willing to sacrifice lazy compilation, we could go the whole nine yards and do Milner-style inference, but we thought the benefits weren't worth the costs.
> WRT IDE jump functionality being a killer feature for understanding a codebase: Yes, though that ease also translates into people not being careful about how they structure their codebase, and the expense of a larger codebase to enable the functionality.
Totally agreed. I definitely don't think as hard about package design after coding in IntelliJ for a few years. I always say to myself "I'll clean this up later" and then don't. OTOH, IntelliJ makes it easy to not do so. I'd hate to use emacs on our code base.
Good? Bad? _shrug_
> The net result is that the codebase in Perl, the language that should be nasty in this area, is much easier to deal with.
Sounds plausible. I've never worked on a significantly sized Perl project, so I'll defer to you on it.
> Of course, the ideal would be to have something that is good in all these dimensions - compact, full tool support, freedom of expression from dynamic typing, checks from static typing, and magically make people carefully organize their code anyway. Alas, I think some of those are contradictory...
They definitely are. GScript tries to strike a reasonable balance on the statically typed, tools-oriented side of the fence. I'm biased, but I think it does a pretty good job of it.
"A couple of weeks ago I decided to take the plunge and start learning Python after spending the past few years doing the majority of my software development in C#."
A couple of weeks? As in on or around December 18th, 2007?
"Shortly after I started using Python regularly as part of the prototyping process for developing new features for RSS Bandit, I started trying out C# 3.0."
So within two weeks of starting to learn Python, he's onto C# 3.0. I don't want to underestimate Dare Obasanjo, but within two weeks of starting to learn Python (non-exclusively, by his own admission) I would consider him unqualified to make such broad claims. Even just looking at the article, the section on lambda expressions shows a C# version that, in my mind, isn't equivalent to the Python version. That's followed by a version that would appear to be equivalent to the Python version but "doesn't work".
I could be confused about things, but I find it hard to take this article seriously. If I wrote a blog about how I started learning C# two weeks ago, and between then and now tried out a new version of Java and concluded that Java was equivalent to, if not more powerful than, C#, would anyone take that seriously? Oh, and assume for argument's sake I work for Sun.
My own journey has been bass-ackwards from Dare's in that for most of my programming career I worked mostly with dynamic languages (dBASE, Clipper, etc., and later Python and JavaScript), but for the last three years I've been working mostly with C#, and have come to greatly appreciate static typing (and IntelliSense). I have also used IronPython extensively, and love the fact that it integrates so elegantly with my C#. Along the way I have also studied, and lusted after, Dylan, Haskell, and Oz/Mozart, but unfortunately none of these is available on .NET (or even the JVM), and they are unlikely ever to be.
However, I think Dare has missed the boat completely. I believe the real action in the .NET space over the next few years will not be with IronPython or the "DLR" / Silverlight stuff. Rather, it is going to be with F# - a beautiful .NET implementation of OCaml that uses type inferencing to offer a "best of both worlds" solution to the eternal dynamic/static typing dichotomy.
F# also elegantly integrates many of the best Python features such as list (and generator!) comprehensions, tuples and tuple unpacking. Indeed one can easily extend such ideas much further via pattern matching and "active patterns". Astoundingly, it even manages to offer a seemingly useful and _understandable_ implementation of Haskell Monads (but christens them "Workflows" rather than Monads).
Dare needs to get with the (f#) program (imho of course ;)
> Really? To me it looks pretty much equivalent,
> C#:
>
> IEnumerable<Person> empsWithSalaryLessThan(int amt) {
>   return _employees.Where(emp => emp.Salary < amt);
> }

This one is the standard approach.
IEnumerable<Person> empsWithSalaryLessThan(int amt) {
  return from e in _employees
         where e.Salary < amt
         select e;
}