When I first started reading about Spark, one question kept coming to mind: “If we already have MapReduce, and it provides similar functionality, then why was Apache Spark introduced?” That question pushed me to compare the two. In this post we will focus on the differences between Spark and MapReduce.
The most obvious difference is speed: Spark can be much faster than MapReduce.
Apache Spark processes data in memory, while Hadoop MapReduce persists intermediate results back to disk after each map or reduce step, so Spark should outperform Hadoop MapReduce on most workloads.
Nonetheless, Spark needs a lot of memory. Much like a conventional database, it loads data into memory and keeps it there until told otherwise, for the sake of caching. If Spark runs on Hadoop YARN alongside other resource-hungry services, or if the data is too big to fit entirely in memory, its performance can degrade significantly.
MapReduce, however, kills its processes as soon as a job is done, so it can easily run alongside other services with only a minor performance impact.
Spark has the upper hand as long as we’re talking about iterative computations that need to pass over the same data many times. But when it comes to one-pass, ETL-like jobs, such as data transformation or data integration, MapReduce is the better fit; that is exactly what it was designed for.