Monday, 13 May 2019

Introduction to Apache Spark


Hello folks, today we will explore some basic and important things about Apache Spark. In this post we will focus only on the introduction to Apache Spark. I am very excited to include this post in the blog, and I hope you get some good information out of it. For each topic, we will follow the strategy below to understand it:

Q1. What is it?
Q2. Why do we need it?

 So let’s start…..

Q. What is Apache Spark?
Apache Spark is a fast and general-purpose data processing engine. Spark is mainly used to run computation-intensive algorithms over a cluster. It can run on top of the Apache Hadoop platform and is one of the most popular parts of the Hadoop ecosystem. Spark can be up to 100 times faster than Hadoop MapReduce when the data fits in memory, and up to 10 times faster when the data is read from disk.
Spark provides APIs in several programming languages, such as Java, Python, Scala and R, for data processing.
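
To make this concrete, here is a minimal PySpark sketch of the classic word count example. The application name, the local master setting and the input path data.txt are placeholders for illustration only.

from pyspark.sql import SparkSession

# Start a local Spark session (app name and master are placeholders)
spark = SparkSession.builder \
    .appName("WordCountExample") \
    .master("local[*]") \
    .getOrCreate()

# Read a text file into an RDD of lines (data.txt is a placeholder path)
lines = spark.sparkContext.textFile("data.txt")

# Split each line into words, pair each word with 1, then sum per word
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Show the first few word counts
for word, count in counts.take(10):
    print(word, count)

spark.stop()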

Q. Why do we need it?


As we know, we cannot do data analysis over a huge amount of data on a single machine. If we have a huge amount of data and we want to run some computation on it, then we need to use the concept of cluster computing. Before going forward, let us discuss cluster computing a little bit.

Cluster: A cluster is nothing but a network of machines/commodity hardware.

Cluster Computing: A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.





Apache Spark is a powerful open-source engine that provides real-time stream processing, interactive processing, graph processing and in-memory processing as well as batch processing, with very high speed, ease of use and a standard interface.
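
Among the features listed above, in-memory processing is one of the biggest reasons for Spark's speed. The sketch below (the input file events.csv and its value column are made-up placeholders) shows how caching keeps a DataFrame in memory so that repeated queries do not re-read it from disk.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("CacheExample") \
    .master("local[*]") \
    .getOrCreate()

# events.csv is a placeholder; any CSV file with a header row would do
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# cache() keeps the DataFrame in memory after the first action,
# so later queries are served from memory instead of re-reading the file
df.cache()

print(df.count())                             # first action: reads the file and fills the cache
print(df.filter(df["value"] > 100).count())   # second action: uses the in-memory copy

spark.stop()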

Who Uses Spark?
There are two kinds of people who use Spark:
1)  Data Engineers

2)  Data Scientists

Data Scientists analyse the data on top of Big Data and extract value out of it by using machine learning algorithms.

Data Engineers process the application data for specific requirements, for example by building pipelines that clean, transform and aggregate it; a small sketch of such a transformation is shown below.
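
The following is a minimal, hedged sketch of that kind of data-engineering transformation using the DataFrame API. The input file sales.json, its region and amount columns, and the output path are made-up placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("PipelineExample") \
    .master("local[*]") \
    .getOrCreate()

# sales.json is a placeholder input; assume it has columns: region, amount
sales = spark.read.json("sales.json")

# A typical data-engineering step: drop incomplete rows, then aggregate per region
report = (sales.dropna(subset=["region", "amount"])
               .groupBy("region")
               .agg(F.sum("amount").alias("total_amount")))

# Write the result for downstream consumers (output path is also a placeholder)
report.write.mode("overwrite").parquet("sales_report.parquet")

spark.stop()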


