Summary
Recently, at Devoxx, it was announced that Java would get lambda functions (a.k.a. anonymous functions or closures). There are many ways of implementing these on the JVM. This post proposes that reifying lambdas is a good choice.

Reified Lambda Functions

Lambda functions are anonymous functions that can be defined using short syntax and in many cases offer a more concise alternative to inner classes. It was recently decided to add lambdas to Java (http://blogs.sun.com/mr/entry/closures_qa) and a 'strawman' proposal was later given (http://cr.openjdk.java.net/~mr/lambda/straw-man/) for discussion on a discussion group (http://mail.openjdk.java.net/pipermail/lambda-dev/2009-December/thread.html#start).

There are many ways to implement Lambdas on the JVM, one possibility is to erase their actual types and use signatures in class files (much like generics at present). Neal Gafter has given an example (http://mail.openjdk.java.net/pipermail/lambda-dev/2009-December/000192.html) that is difficult to do without erasing the types:

<T> #T() constant(T t) { return #() (t); }

The syntax used is from the referenced strawman:

#T(): Means a lambda that returns a T when called and has no arguments.
#() (t): Means create a new lambda that returns t, the type T is inferred and the expression could be written #T() (t).

The above 'constant' example will be used as a running example below.

Like generics the erasure option has problems, e.g.:

1. Can't, with some erasures proposals, have two methods of the same name only distinguished by lambda type, e.g. filter( #boolean(int) ) and filter( #boolean(float) ) illegal if both in same class

2. Can't have arrays of lambdas that are type safe, e.g. new #int()[n]; // Illegal

3. Can't have instanceof tests or class references, e.g. #int().class; // Illegal

This post proposes an alternative technique that doesn't erase type information except when used in conjunction with generics.

Reification

You could overcome the limitations of erasure given above in many cases if you had reified lambdas (i.e. lambdas that didn't erase the type). For example supose that:

#String() constant(String t) { return #() (t); }

Was translated to:

Callable$String constant(String t) {
  return new Callable$String() {
    @Override public String call() { return t; }
  };
}

Where:

public interface Callable$0<R> { R call(); }

public interface Callable$String extends Callable$0<String> {}

The types Callable$... are created as needed by the class loader (they are not emitted by the compiler, but for typing purposes they are known internally to the compiler).

The case of primitives is more complicated because you want to avoid boxing, e.g:

#int() constant(int t) { return #() (t); }

Becomes:

Callable$int constant(int t) {
  return new Callable$int() {
    @Override public int call_int() { return t; }
  };
}

Where:

public interface Callable$Integer extends Callable$0<Integer> {}

public abstract class Callable$int implements Callable$Integer {
  @Override public final Integer call() { return call_int(); }
  public abstract int call_int();
}

And in the generic's case (the original example) you would erase the type (because it is already erased):

<T> #T() constant(T t) { return #() (t); }

Is:

<T> Callable$0<T> constant(T t) {
  return new Callable$0<T>() {
    @Override public T call() { return t; }
  };
}

You would use the most specific type when a function type is given, e.g.:

class StringLambda extends #String() { ... }
#String() stringLambda = ...;

Become:

class StringLambda extends Callable$String { ... }
Callable$String stringLambda = ...;

Except when generics are used, in which case wild-cards and a Callable$*n* interface used:

#T(S) tSLambda = ...;

Becomes:

Callable$1<? extends T, ? super S> tSLambda = ...;

Where:

public interface Callable$1<R, A1> { R call(A1 a); }

Difficult Cases!

The cases:

#Object(String) oS;
#String(Object) sO = #String(Object o) (o.toString());
oS = sO;
#Object(String)[] oSs = new #Object(String)[] {oS, sO};
#O(S) gOS = oS; // S = String, O = Object
boolean test = gOS instanceof #Object(String)
List<? extends #Object(String)> lOS;
List<? extends #String(Object)> lSO = new ArrayList<#String(Object)>();
lOS = lSO;
List<#int()> lI = new ArrayList<#int()>();
for (int ii = 0; ii < 3; ii++) { lI.add(constant(ii)); }
#int() i = lI.get(0);

Are translate to:

Callable$Object$String oS;
Callable$String$Object sO = new Callable$String$Object() {
  @Override public String call(final Object o) { return o.toString(); }
};
oS = From$1$String$Object$$To$Object$String.cast(sO);
Callable$Object$String[] oSs =
    new Callable$Object$String[] {oS, From$1$String$Object$$To$Object$String.cast(sO)};
Callable$1<Object,String> gOS = oS;
boolean test = gOS instanceof Callable$Object$String;
List<? extends Callable$1<? extends Object, ? super String>> lOS;
List<? extends Callable$1<? extends String, ? super Object>> lSO =
    new ArrayList<Callable$1<? extends String, ? super Object>>();
lOS = lSO;
List<Callable$0<? extends Integer>> lI = new ArrayList<Callable$0<? extends Integer>>();
for (int ii = 0; ii < 3; ii++) { lI.add(constant(ii)); }
Callable$int i = From$0$Integer$$To$int.cast(lI.get(0));

Where:

public interface Callable$Object$String extends Callable$1<Object, String> {}

public interface Wrapper { Object getWrapee(); }

public static final class Wrappers {
  private Wrappers() {}
  public static Object unWrap(final Wrapper from) {
    Object wrapee = from.getWrapee();
    while (wrapee instanceof Wrapper) { wrapee = ((Wrapper)wrapee).getWrapee(); }
    return wrapee;
  }
  public static boolean equals(final Wrapper self, final Object other) {
    if (self == other) { return true; }
    if (other == null) { return false; }
    final Object o = (other instanceof Wrapper) ? unWrap((Wrapper)other) : other;
    final Object s = unWrap(self);
    return o == s;
  }
}

public interface Wrapper$Object$String extends Callable$Object$String, Wrapper {}

public interface Callable$String$Object extends Callable$1<String, Object> {}

public final static class From$1$String$Object$$To$Object$String {
  private From$1$String$Object$$To$Object$String() {}
  public static Callable$Object$String cast(final Callable$1<? extends String, ? super Object> from) {
    return new Wrapper$Object$String() {
      @Override public Object call(final String s) { return from.call(s); }
      @Override public Object getWrapee() { return from; }
      @Override public boolean equals(final Object other) { return Wrappers.equals(this, other); }
      @Override public int hashCode() { return Wrappers.unWrap(this).hashCode(); }
    };
  }
}

public static abstract class Wrapper$int extends Callable$int implements Wrapper {}

public final static class From$0$Integer$$To$int {
  private From$0$Integer$$To$int() {}
  public static Callable$int cast(final Callable$0<? extends Integer> from) {
    if (from instanceof Callable$int) { return (Callable$int)from; }
    if (from instanceof Wrapper) {
      final Object wrapee = Wrappers.unWrap((Wrapper)from);
      if (wrapee instanceof Callable$int) { return (Callable$int)wrapee; }
    }
    return new Wrapper$int() {
      @Override public int call_int() { return from.call(); }
      @Override public Object getWrapee() { return from; }
      @Override public boolean equals(final Object other) { return Wrappers.equals(this, other); }
      @Override public int hashCode() { return Wrappers.unWrap(this).hashCode(); }
    };
  }
}

The cast methods create a new object; which is wasteful, but on the other hand they aren't needed often (see next section). The erasure schemes don't require 'cast' objects and therefore avoid this overhead, but on the other hand the erasure versions have similar limitations to erased generics. For the generic cases the use of wild-cards avoid the need for the cast methods (since the generics have already erased the types). The From$...$$To$... classes, like the Callable$... classes, are generated by the class loader on demand and not written by the compiler (but for typing purposes are known to the compiler).

Contravariance

The cast methods above are used 'mainly' for contravariant overriding. Most OO languages do not support contravariant overriding (except for generics), but for a moment suppose that Java did allow ontravariant overriding:

abstract class Base { abstract Object m(String s); }

class Derived extends Base {
  @Override String m(Object o) { return o.toString(); } // Illegal arg should be String
}

The above is type safe because the arguments to m vary contravariantly and the return types vary covariantly. The argument of m varies contravariantly in the derived class, Derived extends Base, but m's argument is Object and Object is the super type of String (which is m's argument type in Base). As m's enclosing classes go down the hierarchy, Base to Derived, its arguments go in the contra direction (up), from String to Object.

The above example shows that this can be useful, and that is the main purpose of providing 'cast' methods. However the above ability is not really missed in Java and is not provided by most of the common OO languages; therefore I infer that contravariance is not that common a requirement and hence the method suggested above is the best solution. In particular, since contravariance is not common, it makes sense to make that the expensive operation and make more common operations fast.

Conclusions

Reifying lambdas makes them easier to use them since they behave like normal Java objects, but on the other hand the implementation is harder. The implementation is harder because you essentially have an erased and a reified version.

How valuable is reifying lambdas?

(Updated 2 Jan 10 in response to comments from Neal Gafter and Zdenek Tronicek and updated 3 Jan 10 in response to comments from Zdenek Tronicek - Thanks.)