Spark core concepts explained

Apache Spark's architecture is based on two main abstractions, RDD and DAG. Let's dive into what these concepts are.

Kirill Bobrov

Apache Spark is considered a powerful complement to Hadoop, big data's original technology. Spark is a more accessible, powerful, and capable tool for tackling a variety of big data challenges. It has become mainstream and is among the most in-demand big data frameworks across all major industries. Spark has been part of the Hadoop ecosystem since Hadoop 2.0, and it is one of the most useful technologies for Python big data engineers.

This series of posts is a single-stop resource that gives an overview of the Spark architecture; it is a good starting point for people looking to learn Spark.

Apache Spark architecture is based on two main abstractions:

  • Resilient Distributed Dataset (RDD)
  • Directed Acyclic Graph (DAG)
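
Before going deeper, here is a minimal PySpark sketch (a toy word count on hard-coded data, assuming a local Spark installation) that shows both abstractions at work: transformations lazily build up an RDD lineage, and Spark only turns that DAG into actual computation when an action such as `collect()` is called.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-dag-preview").getOrCreate()
sc = spark.sparkContext

# An RDD: an immutable, partitioned collection distributed across the cluster.
lines = sc.parallelize(["spark makes big data simple",
                        "rdd and dag",
                        "spark on hadoop"])

# Transformations are lazy: each call only extends the DAG of operations,
# no data is processed yet.
words = lines.flatMap(lambda line: line.split())
pairs = words.map(lambda word: (word, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

# An action triggers Spark to compile the DAG into stages and execute them.
print(counts.collect())

spark.stop()
```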

Let's dive into these concepts.

RDD: the basic Spark concept
