Many engineers have never worked in statistics or data science. But when they build data science pipelines, or rewrite code produced by data scientists into adequate, easily maintained code, many nuances and misunderstandings arise on the engineering side. I am writing this series of posts for those Data/ML engineers and for novice data scientists. I’ll try to explain some basic approaches in plain English and, building on them, explain some basic Data Science concepts.


In the course of their work, developers often have to update their services and deploy them. When the team and the number of services are small, this is not a problem because releases and deployments are rare. Tests, release management, artifact publishing, and deployments can be run manually. But over time the number of services and tasks grows, the cognitive load grows even faster, and the release cycle starts to fail if you release rarely and get bogged down in running each step by hand.

Let’s look at the typical process of feature implementation/bug fixing for the majority of projects:

  • Create a…


My friend asked me an interesting question about what skills are worth learning for Data Management specialists and how to build a growth roadmap.

In fact, the question made me think, because I didn’t have a clear picture in my head. These are just my thoughts on the topic, and for the most part I’m just speculating about the current state and the future of Data Management.

Prerequisites

In the beginning, as in any other area, there are basic things that any Software Engineer should know.

In short, I assume that a person who comes to Big Data already knows some…


The load on web applications is unpredictable: sometimes they serve a huge number of requests, and sometimes they sit idle. Hosting applications on virtual machines in the cloud forces us to pay for idle time too. To solve this problem we have to deal with load balancing, DNS lookup, and automatic scaling. All of this is difficult to manage, and on pet projects it makes zero sense.

Serverless technologies are several years old and their popularity is increasing every year. For highly loaded systems it is a simple way of infinite scaling…


Apache Spark architecture is based on two main abstractions, RDD and DAG. Let’s dive into what those concepts are.

Apache Spark is considered a powerful complement to Hadoop, big data’s original technology. Spark is a more accessible, powerful and capable tool for tackling various big data challenges. It has become mainstream and is one of the most in-demand big data frameworks across major industries. Since Hadoop 2.0, Spark can run on Hadoop clusters through YARN. It is also one of the most useful technologies for Python Big Data Engineers.

This series of posts is a one-stop resource that gives an overview of Spark architecture, and it’s good for people looking to learn Spark.


So I wrote a book.

Nobody I know has ever written a book. I decided to do it myself as an experiment, and I want to tell you a little bit about it. Maybe one day you can learn from my example.

Writing a book is fucking hard. It is hard work, especially when you are not Stephen King. It is even harder when the publisher has a hard deadline. Fortunately, I did not have such conditions: I published the book myself and did the whole process from beginning to end. But I spent some time digging through…


In this post, I will talk about my experience with the AWS Solutions Architect Associate certification and how I prepared for it.

AWS certification allows developers to confirm their qualifications and skills in working with AWS services. The preparation process itself provides additional experience with those services. Besides all of this, you also learn architectural patterns that can be applied anywhere else, how solutions are built in the cloud, and what their limitations and problems are.

Documentation and videos on the AWS services pages are certainly not a bad start, but to prepare for the exam…


Since childhood, we have known that when we come in from the street, we have to wash our hands. However, we do not really think about what to do after surfing online.

Everyone should decide for themselves which level of security is acceptable to them personally. You have to understand how much losing the protected information would cost you. If you have important information and are afraid of losing it, and the mere thought that it might reach your enemies scares you, then you should think about information security.

Anyone who reads me or knows me personally understands that…


In Python, it is common practice to write all the application dependencies that are installed via pip into a separate text file called requirements.txt.

It’s good practice to fully specify package versions in your requirements file. In that case, everything will be there: both the direct dependencies of our application and the dependencies of those dependencies, and so on.

But sometimes, especially on a long-lived project, it’s hard to tell which dependencies were the original ones. You need to update them on time and not depend on packages that are outdated or no longer needed.

For example, which of the following dependencies are…
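
The question of which dependencies are direct can be approached in several ways. The sketch below is not taken from the post itself; it is a minimal illustration that assumes a package counts as "top-level" when no other installed package lists it as a requirement. The helper names (normalize, requirement_name, top_level, installed_dependency_map) are hypothetical, and only the standard-library importlib.metadata module is used.

```python
import re
from importlib import metadata
from typing import Dict, List


def normalize(name: str) -> str:
    """Normalize a project name (lowercase, runs of -, _ and . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the bare project name from a requirement string.

    For example 'urllib3<3,>=1.21.1' gives 'urllib3'.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def top_level(dependency_map: Dict[str, List[str]]) -> List[str]:
    """Return packages that no other package in the map requires.

    Assumption: optional (extra) requirements are treated like regular
    ones, so a package that is only an optional extra of another
    package is still reported as a transitive dependency.
    """
    required = {
        requirement_name(req)
        for reqs in dependency_map.values()
        for req in reqs
    }
    return sorted(
        normalize(name)
        for name in dependency_map
        if normalize(name) not in required
    )


def installed_dependency_map() -> Dict[str, List[str]]:
    """Build {package name: requirement strings} for the current environment."""
    result: Dict[str, List[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = list(dist.requires or [])
    return result


if __name__ == "__main__":
    # Print the candidates for "original" dependencies, one per line.
    for package in top_level(installed_dependency_map()):
        print(package)
```

Run inside the project's virtual environment, this prints the packages that nothing else depends on. Those are candidates for direct dependencies, to be checked by hand against what the application actually imports.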

Kirill Bobrov

Helping robots conquer the earth and trying not to increase entropy using Python, Big Data, ML. LinkedIn: @luminousmen. Check out my blog: luminousmen.com
