|
Re: Optimizing ArrayList.removeAll()
|
Posted: Mar 8, 2007 11:48 AM
|
|
> > In that case, I agree your solution scores over Sun's > as > > Sun still has inner code of O(n) 'cos of remove(). > Yours > > become O(n) while Sun's remains O(n^2). Agreed. > > Okay, cool. That was the main point of the article. > > > > > What we were first arguing over was that your solution > is > > worst case O(n) while Sun's O(n^2). > > Yes, I see the source of confusion now. I was talking > worst case of removeAll in isolation.
yes.
> > > > So now let's talk the real case where contains() has a > > base complexity of O(n) as James also observed. > > Okay, though I don't see why you consider the O(n) case > the "real" case. > > > > > *Assume* that argument Collection in removeAll() is the > > same 'ArrayList' as the actual one. This is the worst > case > > with both having exactly the same elements. > > > > In this case 'contains()' iterates over the entire list > > thereby having O(n). So if your solution also invokes > > contains() (as it does) in the loop you have O(n^2) as > > well. > > Okay, agreed, but lets make that more thorough: > > n: elements in the collection, C, on which you are doing > removeAll > s: elements in the list, L, you pass to removeAll > r: the elements in s that are in n (that is the number of > elements that will be removed). > > So for suns solution we have, roughly: > r*n: the cost of doing a remove r times > + n*s: the cost of doing an L.contains() call on each > element in C, assuming O(n) performance for contains. > > for my solution, we have, roughly: > n : the cost to remove r items > + n*s: the cost of doing an L.contains() call on each > element in C, assuming O(n) performance of contains. > > There you have the difference between the two. Now, if: > > a) r=s=n, we get n^2 performance for both, though mine > ne would still perform better (removal is constant vs. > n^2). > b) s=0, and r is much, much larger than n (r=1000000, > 0, n=1), then the performance of the list dominates, and > both algorithms perform poorly. > > Side note: I think it was said on this list before, that > passing a collection with poor contains performance is a > *bad thing*, unless the list is really small. >
Missed it completely if it was discussed that strongly.
> > > > In terms of time complexity in big-Oh notation, > there's > > no improvement. > > > > But as i stated previously, your solution has got rid > of > > remove() only and not contains(). Sun's time is O(n * > > (2n)) i.e O(n^2). That 2n is contains() + remove(). > > > > Worst case time complexity in Big-Oh is same for both > if > > contains() is O(n). Your solution has subtracted 'n' > > cycles from Sun's but not divided it by 'n' cycles. > > > > > > contains() = O(1) : Amin's solution's wonderful. > > contains() = O(n) : No significance difference. > > > > As i get your solution doesn't change contains(). > > > > I hope this time I'm clear to you. > > Okay, I think we are roughly on the same page. By the way, > the concrete timings are available here: > http://www.ahmadsoft.org/articles/removeall/index.html. > You can see that the solutions I provided outperform Sun's > by a factor of 3 when removing 150 items from a list of > size 2500 (passing in a hashset of course). There is no > case where Sun's solution is seen outperforming the one I > have provided (obvious, based on the analysis above). > > amin
Agree with you.
|
|