Spark core concepts explained

Apache Spark's architecture is based on two main abstractions, RDD and DAG. Let's dive into what these concepts are.

Kirill Bobrov

Apache Spark is considered a powerful complement to Hadoop, big data's original technology. Spark is a more accessible, powerful, and capable tool for tackling a variety of big data challenges. It has become mainstream and is among the most in-demand big data frameworks across all major industries. Spark has been part of the Hadoop ecosystem since Hadoop 2.0, and it is one of the most useful technologies for Python big data engineers.

This series of posts is a single-stop resource that gives an overview of the Spark architecture; it is a good starting point for people looking to learn Spark.

Apache Spark architecture is based on two main abstractions:

  • Resilient Distributed Dataset (RDD)
  • Directed Acyclic Graph (DAG)
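
Before going deeper, here is a minimal PySpark sketch (a toy word count on hard-coded data, assuming a local Spark installation) that shows both abstractions at work: transformations lazily build up an RDD lineage, and Spark only turns that DAG into actual computation when an action such as `collect()` is called.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-dag-preview").getOrCreate()
sc = spark.sparkContext

# An RDD: an immutable, partitioned collection distributed across the cluster.
lines = sc.parallelize(["spark makes big data simple",
                        "rdd and dag",
                        "spark on hadoop"])

# Transformations are lazy: each call only extends the DAG of operations,
# no data is processed yet.
words = lines.flatMap(lambda line: line.split())
pairs = words.map(lambda word: (word, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

# An action triggers Spark to compile the DAG into stages and execute them.
print(counts.collect())

spark.stop()
```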

Let's dive into these concepts.

RDD: the basic Spark concept
