Published in Data Engineer Things | Two Archetypes of Data Engineers | Discover different archetypes of data engineers and how their collaboration drives data-driven success | 7h ago
Published in Data Engineer Things | How to Speed Up Spark Jobs on Small Test Datasets | Dealing with small datasets (under a million records) can be a peculiar challenge when you’ve chosen Apache Spark as your go-to tool… | Dec 6, 2024
Why Use `pip install --user`? | When working with Python, you’re likely familiar with the process of installing packages using the popular package manager, pip. It's a… | Dec 3, 2024
Comparing Dgraph and Neo4j Graph Databases: Key Differences and Use Cases | In modern data engineering, graph databases have gained prominence for their ability to efficiently store, query, and traverse… | Nov 19, 2024
Exploring the Power of Graph Databases | “Everything is connected to everything else.” - Leonardo da Vinci | Nov 5, 2024
Table Selection in Software Engineering | In the world of poker, there is a strategy that goes beyond just playing the game well: it’s about choosing the right table. The idea here… | Oct 15, 2024
Senior Engineer Fatigue | “I can’t go back to yesterday because I was a different person then” - Alice, Lewis Carroll | Oct 1, 2024
Published in CodeX | Why Apache Spark RDD is immutable? | Every now and then, when I find myself on the interviewing side of the table, I like to toss in a question about Apache Spark’s RDD and its… | Sep 23, 2024
Future of Search Engines | Search engines, as we know them (as well as blogs, heh), are gradually losing their relevance because the content they index is increasingly… | Sep 18, 2024
Published in Data Engineer Things | Locking Mechanisms in High-Load Systems | In the world of concurrent systems, especially when it comes to highly loaded distributed environments, finding a balance between data… | Aug 23, 2024