Original Post: 12 things I hate about Hadoop
Feed Title: JavaWorld
Feed URL: http://www.javaworld.com/index.rss
I love the elephant. The elephant loves me. Nothing is perfect, however, and sometimes friends fight.
Here are the things I fight with Hadoop about.
1. Pig vs. Hive
You cannot use Hive UDFs in Pig. You have to use HCatalog to access Hive tables in Pig. You cannot use Pig UDFs in Hive. Whether I’m in Hive and need one small extra piece of functionality but don’t feel like writing a full-on Pig script, or I’m writing Pig scripts and thinking, “Gee, I could easily do this if I were just in Hive,” I frequently think, “Tear down this wall!”
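To make the duplication concrete: Pig expects a UDF to extend `org.apache.pig.EvalFunc`, while a simple Hive UDF extends `org.apache.hadoop.hive.ql.exec.UDF`, so the same function ends up wrapped twice. The sketch below shows one possible way to live with the wall, not anything the tools provide: keep the logic in a plain Java class and let each wrapper delegate to it. `TextNormalizer` and `normalize` are hypothetical names invented for this illustration.

```java
import java.util.Locale;

// Hypothetical helper: plain Java with no Pig or Hive dependency, so the same
// compiled class can sit behind a thin wrapper for each engine.
final class TextNormalizer {
    private TextNormalizer() {}

    // Trims, lower-cases, and collapses runs of whitespace to a single space.
    // Returns null for null input so a wrapper can pass missing values through.
    static String normalize(String input) {
        if (input == null) {
            return null;
        }
        return input.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}

// A Pig wrapper would extend org.apache.pig.EvalFunc<String> and call
// TextNormalizer.normalize(...) from exec(Tuple).
// A Hive wrapper would extend org.apache.hadoop.hive.ql.exec.UDF and call
// the same method from evaluate(...).
```

The wall still stands: you still write and register two wrappers, but the logic itself lives in one place.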
2. Being forced to store all my shared libraries in HDFS
This is a recurring theme in Hadoop. If you store your Pig script on HDFS, Pig automatically assumes any JAR files it needs will be there as well (I’m working on fixing that myself). The same pattern repeats in Oozie and other tools. It's usually sensible, but at times, being forced into a single organization-wide version of a shared library is painful. Besides, more than half the time, these are the same JAR files you already installed everywhere you installed the client, so why store them twice? This is being fixed in Pig. How about everything else?