Spark core concepts explained
Apache Spark's architecture is based on two main abstractions, RDD and DAG; let's dive into what these concepts are
Apache Spark is considered a powerful complement to Hadoop, big data's original technology. Spark is a more accessible, powerful, and capable tool for tackling a variety of big data challenges, and it has become mainstream as the most in-demand big data framework across all major industries. Spark has been part of the Hadoop ecosystem since Hadoop 2.0 and is one of the most useful technologies for Python big data engineers.
This series of posts is a single-stop resource that gives an overview of the Spark architecture, and it is a good starting point for anyone looking to learn Spark.
Whole series:
- Things you need to know about Hadoop and YARN being a Spark developer
- Spark core concepts explained
- Spark. Anatomy of Spark application
Apache Spark architecture is based on two main abstractions:
- Resilient Distributed Dataset (RDD)
- Directed Acyclic Graph (DAG)
Let's dive into these concepts.
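To make the two abstractions concrete before the deep dive, here is a minimal PySpark sketch (the local master URL, app name, and lambdas are illustrative assumptions, not part of the original post). Transformations on an RDD are lazy and only build up the DAG; calling an action makes Spark turn that DAG into stages and tasks and execute them.

```python
# A minimal sketch, assuming a local Spark installation and the pyspark package.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-dag-sketch")

# Each transformation below adds a node to the DAG; nothing runs yet.
numbers = sc.parallelize(range(10))           # RDD built from a local collection
squares = numbers.map(lambda x: x * x)        # transformation (lazy)
evens = squares.filter(lambda x: x % 2 == 0)  # transformation (lazy)

# The action triggers Spark to schedule the DAG as stages and tasks.
print(evens.collect())  # [0, 4, 16, 36, 64]

sc.stop()
```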