The Artima Developer Community

Java Buzz Forum
16 for '16: What you must know about Hadoop and Spark right now

0 replies on 1 page.

News Manager

Posts: 47623
Nickname: newsman
Registered: Apr, 2003

News Manager is the force behind the news at Artima.com.
16 for '16: What you must know about Hadoop and Spark right now Posted: Jan 12, 2016 3:33 PM

This post originated from an RSS feed registered with Java Buzz by News Manager.
Original Post: 16 for '16: What you must know about Hadoop and Spark right now
Feed Title: JavaWorld
Feed URL: http://www.javaworld.com/index.rss
Feed Description: JavaWorld.com: Fueling Innovation


The biggest thing you need to know about Hadoop is that it isn’t Hadoop anymore.

Between Cloudera sometimes swapping out HDFS for Kudu while declaring Spark the center of its universe (thus replacing MapReduce everywhere it is found) and Hortonworks joining the Spark party, the only item you can be sure of in a “Hadoop” cluster is YARN. Oh, but Databricks, aka the Spark people, prefer Mesos over YARN -- and by the way, Spark doesn’t require HDFS.
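As a minimal sketch of that last point -- Spark running with neither YARN nor HDFS -- the following Java snippet uses a local master and a plain file:// path. The class name and the /tmp/events.log input are hypothetical; the article itself shows no code.

// Minimal sketch (assumptions: Spark 1.x Java API on the classpath and a
// hypothetical local file at /tmp/events.log -- neither comes from the article).
// It illustrates that Spark needs neither YARN nor HDFS: the master is a local
// thread pool and the input is an ordinary filesystem path.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkWithoutHadoop {
    public static void main(String[] args) {
        // Local master: no YARN, no Mesos -- the workers run as threads in this JVM
        SparkConf conf = new SparkConf()
                .setAppName("spark-without-hadoop")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A file:// path (or s3a://...) instead of hdfs:// -- no Hadoop filesystem required
        JavaRDD<String> lines = sc.textFile("file:///tmp/events.log");
        System.out.println("line count: " + lines.count());

        sc.stop();
    }
}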


Yet distributed filesystems are still useful. Business intelligence is a great use case for Cloudera's Impala, and Kudu, a distributed columnar store, is optimized for that workload. Spark is great for many tasks, but sometimes you need an MPP (massively parallel processing) solution like Impala to do the trick -- and Hive remains a useful file-to-table management system. Even when you're not using Hadoop because you're focused on in-memory, real-time analytics with Spark, you may still end up using pieces of Hadoop here and there.
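That "file-to-table" role is easy to picture in code. Below is a minimal sketch, assuming Spark 1.6 with Hive support on the classpath and a hypothetical directory of tab-delimited files at /data/clicks (neither comes from the article): the Hive DDL only maps a schema onto files that already exist, and Spark -- or Impala, or Hive itself -- can then query the result.

// Minimal sketch (assumptions: Spark 1.6 with spark-hive on the classpath and a
// hypothetical directory of tab-delimited files at /data/clicks).
// It illustrates "Hive as a file-to-table management system": the external table
// only records a schema and a location; the files stay where they are.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveFileToTable {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hive-file-to-table").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(sc.sc());

        // Map a schema onto the existing files; no data is moved or copied
        hive.sql("CREATE EXTERNAL TABLE IF NOT EXISTS clicks (ts STRING, url STRING, user_id STRING) "
               + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
               + "LOCATION 'file:///data/clicks'");

        // The same table is now queryable from Spark SQL (or from Impala/Hive)
        hive.sql("SELECT url, COUNT(*) AS hits FROM clicks GROUP BY url ORDER BY hits DESC LIMIT 10")
            .show();

        sc.stop();
    }
}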


Read: 16 for '16: What you must know about Hadoop and Spark right now

