Matt Gerrans
Posts: 1153
Nickname: matt
Registered: Feb, 2002
Re: Processing large files suggestions
Posted: Jun 6, 2004 4:58 PM
> If (b) then it's a whole different kettle of fish called Distributed Computing.
Also called "grid computing" (I'm not sure what the distinction between the two is).
> Google for your specific problem.
This looks like a promising list: http://math.nist.gov/javanumerics/
Back on the main question, you may also want to do a little profiling, or quasi-profiling, to determine whether most of your time goes to calculation or to I/O. At the very least, watch a CPU meter and an I/O meter while you run what you have.
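Here is a rough sketch of what I mean by quasi-profiling, assuming Java 8 or later. The class name IoVsCpuTimer and the use of CRC32 as the "calculation" are placeholders of mine, not anything from the original poster's program; substitute your own per-chunk work. It simply accumulates wall-clock time spent inside read() separately from time spent computing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

final class IoVsCpuTimer {
    /** Accumulated wall-clock time per phase, plus the checksum that was computed. */
    static final class Timing {
        long ioNanos;
        long cpuNanos;
        long crc;
    }

    /** Reads the whole stream in bufferSize chunks (bufferSize must be positive). */
    static Timing measure(InputStream in, int bufferSize) throws IOException {
        Timing t = new Timing();
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[bufferSize];
        while (true) {
            long t0 = System.nanoTime();
            int n = in.read(buffer);            // I/O phase
            long t1 = System.nanoTime();
            t.ioNanos += t1 - t0;
            if (n == -1) {
                break;                          // end of stream
            }
            crc.update(buffer, 0, n);           // calculation phase (placeholder work)
            t.cpuNanos += System.nanoTime() - t1;
        }
        t.crc = crc.getValue();
        return t;
    }
}
```

If ioNanos dominates, tuning the I/O side is where the payoff is; if cpuNanos dominates, look at the calculation instead. Treat the numbers as a rough guide only, since the operating system's file cache can make a second run look much faster than the first.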
Buffer size is important. I wrote a program for calculating CRCs and tuned it for very large files. It turns out that when it comes to buffers, bigger is not necessarily better. A relatively small buffer of about 100 KB was close to the sweet spot (on current high-end hardware). When you think about it, it makes sense: if your buffer is too big, the CPU sits idle for long periods while the buffer is being filled. (By the way, this lesson also applies to CPU cache: advertising a huge L2 cache sounds impressive, but it may not affect performance much.)
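To find the sweet spot on your own hardware, something along these lines would do. This is a minimal sketch, not my original CRC program: the class name BufferedCrc and the list of sizes to try are made up for illustration, and it assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

final class BufferedCrc {
    /** Reads the stream in chunks of at most bufferSize bytes and returns its CRC-32. */
    static long crc32(InputStream in, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            crc.update(buffer, 0, n);           // only the bytes actually read
        }
        return crc.getValue();
    }

    /** Times the same file with several buffer sizes (the sizes are arbitrary picks). */
    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        int[] sizes = {4 * 1024, 32 * 1024, 100 * 1024, 1024 * 1024, 16 * 1024 * 1024};
        for (int size : sizes) {
            long start = System.nanoTime();
            long value;
            try (InputStream in = Files.newInputStream(file)) {
                value = crc32(in, size);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("buffer=%d bytes crc=%08x time=%d ms%n", size, value, ms);
        }
    }
}
```

Run it on a file much larger than RAM, or repeat each size a few times and discard the first pass, because otherwise the operating system's file cache will flatter whichever size happens to run later.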
In fact, you could get improved performance by separating the calculation threads from the I/O threads, but I think the improvement on a single-CPU machine would be pretty small (a few percent) in comparison with the amount of additional work and complexity. Getting the buffer size right is easier, and the payoff can be quite a bit more substantial (orders of magnitude).
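For anyone who wants to try the threaded approach anyway, here is one possible shape for it, assuming Java 8 or later: the calling thread does the reads and hands chunks to a single calculation thread through a bounded queue. The class name PipelinedCrc, the queue depth of 4, and CRC32 as the calculation are all illustrative choices of mine.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

final class PipelinedCrc {
    // Sentinel compared by identity; it tells the calculation thread to stop.
    private static final byte[] END = new byte[0];

    static long crc32(InputStream in, int bufferSize) throws Exception {
        // A bounded queue keeps the reader from running far ahead of the calculator.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Calculation thread: consume chunks until the sentinel arrives.
        Future<Long> result = pool.submit(() -> {
            CRC32 crc = new CRC32();
            for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                crc.update(chunk, 0, chunk.length);
            }
            return crc.getValue();
        });

        // I/O on the calling thread: read a chunk, copy it, hand it over.
        try {
            byte[] buffer = new byte[bufferSize];
            int n;
            while ((n = in.read(buffer)) != -1) {
                if (n > 0) {
                    queue.put(Arrays.copyOf(buffer, n));
                }
            }
        } finally {
            queue.put(END);     // always release the calculation thread
            pool.shutdown();
        }
        return result.get();
    }
}
```

Note how much more machinery this is than a plain read loop, and that each chunk is copied once so the reader can reuse its buffer. That is the extra work and complexity I was talking about, so measure before you commit to it.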
The jrMan ray-tracing project also processes huge files, and the author told me that he got amazing performance (even better than RenderMan, the C++ program upon which it is based) by using memory-mapped I/O. I've only used memory-mapped I/O on Windows, through the Windows API, so I don't know what Java interface he is using, but if you want to pursue that, you can have a look at the project link: http://sourceforge.net/projects/jrman/ (Also, I don't know whether that technique or something similar is available on non-Windows platforms.)
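For what it's worth, the standard library does offer memory-mapped files through java.nio (FileChannel.map, available since Java 1.4 and not specific to Windows). Whether that is what jrMan uses is an assumption on my part; I have not checked its source. A minimal sketch, assuming Java 8 or later, with a made-up class name and an arbitrary 256 MB window size:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

final class MappedCrc {
    // One mapping cannot exceed Integer.MAX_VALUE bytes, so very large files
    // are mapped one window at a time. The window size is an arbitrary choice.
    private static final long WINDOW = 256L * 1024 * 1024;

    /** Computes the CRC-32 of a file by mapping it into memory window by window. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                long length = Math.min(WINDOW, size - pos);
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                crc.update(window);     // consumes the buffer's remaining bytes
            }
        }
        return crc.getValue();
    }
}
```

One caveat: a mapping stays alive until the buffer is garbage collected, so on Windows you may find that you cannot delete or truncate the file immediately afterwards.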