Sunday, 19 April 2020

Deployment Modes in Spark


[Figure: Spark high-level architecture]

Hello folks!! If you are learning Apache Spark, you will eventually need to deploy it on one or more machines. Before deploying Apache Spark, you should figure out what purpose it will serve, because that determines the right deployment mode. As we know, a Spark application consists of several components, and each component has a specific role in executing a Spark program.


As we can see in the high-level architecture above, one of Spark's components is the Cluster Manager; it relies on a resource manager to manage the Spark cluster. There are several common deployment modes for Spark:
1) Local Mode
2) Standalone Mode
3) YARN Mode
4) Mesos Mode

Let us try to understand each mode in detail…
1) Local Mode:
Local mode runs all Spark processes on a single machine, optionally using any number of cores on the local system.
Local mode is often a quick way to test a new Spark installation, and it lets you quickly test Spark routines against small datasets.

How to submit Spark job to Local Mode?

$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local \
$SPARK_HOME/examples/jars/spark-examples*.jar 10

Note: You can specify the number of cores to use in local mode.
Ex: For 2 cores: local[2]
    To use all cores on the system: local[*]
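Putting the two together, here is a sketch of the same Pi job pinned to two local cores, plus an interactive shell using every core. This assumes the example jar shipped with your Spark distribution and a valid $SPARK_HOME:

```shell
# Run the bundled SparkPi example on 2 local cores
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master 'local[2]' \
  $SPARK_HOME/examples/jars/spark-examples*.jar 10

# Open an interactive shell that uses all cores on the machine
$SPARK_HOME/bin/spark-shell --master 'local[*]'
```

The quotes around local[*] keep the shell from treating the brackets as a glob pattern.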
2) Spark Standalone:
The Spark distribution comes with its own resource manager. When your program uses Spark's built-in resource manager, the execution mode is called standalone.
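Before submitting, the standalone cluster itself has to be running. A minimal sketch, assuming Spark is installed at $SPARK_HOME on every node and the master host name (mysparkcluster) is a placeholder:

```shell
# On the master node: start the standalone master
# (it listens on port 7077 by default)
$SPARK_HOME/sbin/start-master.sh

# On each worker node: register the worker with the master
$SPARK_HOME/sbin/start-worker.sh spark://mysparkcluster:7077
```

Older Spark releases (before 3.x) name the worker script start-slave.sh instead of start-worker.sh.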

How to submit Spark job to Standalone Mode?
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://mysparkcluster:7077 \
$SPARK_HOME/examples/jars/spark-examples*.jar 10

3) Spark on YARN:
YARN is a generic resource management framework for distributed workloads; in other words, a cluster-level operating system.
When running a Spark application on YARN, we have two deploy modes:
a) Cluster Mode:
The Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
b) Client Mode:
The driver runs in the client process, and the application master is only used for requesting resources from YARN.
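One practical consequence of cluster mode: the driver's output is not printed on your terminal, so you retrieve it from YARN after the fact. A sketch using the standard YARN CLI (the application ID below is a placeholder):

```shell
# List finished applications to find the application ID
yarn application -list -appStates FINISHED

# Fetch the aggregated logs, including the driver's output
yarn logs -applicationId application_1234567890123_0001
```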

How to submit Spark job to YARN Mode?
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
$SPARK_HOME/examples/jars/spark-examples*.jar 10

$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
$SPARK_HOME/examples/jars/spark-examples*.jar 10

4) Spark on Mesos:
Mesos is an open-source resource manager introduced at UC Berkeley in 2008. Mesos is similar to the YARN resource manager, and it also has two deploy modes: cluster mode and client mode.
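Cluster-mode submissions on Mesos go through Spark's MesosClusterDispatcher rather than straight to the Mesos master. A sketch, assuming the Mesos master runs at mesoscluster:5050 (the Mesos default port; the host name is a placeholder):

```shell
# Start the dispatcher that accepts cluster-mode submissions;
# it listens on port 7077 by default, which is what --master
# points at in the cluster-mode spark-submit below
$SPARK_HOME/sbin/start-mesos-dispatcher.sh --master mesos://mesoscluster:5050
```

In client mode the dispatcher is not needed, and --master points directly at the Mesos master.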

How to submit Spark job to Mesos Mode?
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://mesoscluster:7077 \
--deploy-mode cluster \
$SPARK_HOME/examples/jars/spark-examples*.jar 10

$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://mesoscluster:5050 \
--deploy-mode client \
$SPARK_HOME/examples/jars/spark-examples*.jar 10














