Ralph Johnson: For some years, I've taught a software engineering course that used both XP and RUP. Students had group projects, and half were XP and half were RUP. The projects are not run in as controlled a fashion as those in the paper, so perhaps I am missing something. However, in general I have not seen a big difference in results between the two. The main indicator of success is previous experience. Groups that have a couple of experienced people do better than those without.
However, one group of people consistently did better with XP than with RUP. This is a group with little programming experience. RUP did not work for them because they had nobody who could act like an architect. They would produce UML diagrams, but they were always wrong, so it was a waste of time. However, when they followed XP, they produced (usually poor) working code that had regression tests. Eventually they started to refactor it. Because they always kept their tests working, they always made progress, even if it was slow. XP worked much better for these groups than it did for average groups.