Famous in-memory data format

Kirill Bobrov
Nov 4, 2020

Apache Arrow is something of a holy grail for analytics, and a fairly recent invention. It is a standardized format for storing columnar data in memory. Because every tool that speaks Arrow lays data out the same way, data can be moved from one process to another very quickly, often without any serialization step: from pandas to PyTorch, from pandas to TensorFlow, from CUDA to PyTorch, from one node to another, and so on. This makes it the workhorse of a large number of frameworks for both analytics and big data.
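To make "columnar format" a bit more concrete, here is a toy sketch in standard-library Python. It is not pyarrow's API. It imitates two buffer layouts the Arrow specification describes: a fixed-width column, which is a validity bitmap plus one contiguous values buffer, and a variable-width string column, which is an offsets buffer pointing into one contiguous byte buffer. The helper names (`to_columns`, `int_column`, `is_valid`, `string_column`, `get_string`) and the sample rows are hypothetical. Real Arrow also specifies details such as buffer alignment and padding, which are left out here.

```python
from array import array


def to_columns(rows):
    """Pivot row-oriented records (a list of dicts) into one list per field."""
    return {field: [row[field] for row in rows] for field in rows[0]}


def int_column(values):
    """Fixed-width column: validity bitmap + one contiguous int64 buffer."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot, rounded up to bytes
    data = array("q")  # 'q' = signed 64-bit integer
    for i, v in enumerate(values):
        if v is None:
            data.append(0)  # the slot is still reserved; the bitmap marks it as null
        else:
            validity[i // 8] |= 1 << (i % 8)  # least-significant-bit order, as in Arrow
            data.append(v)
    return validity, data


def is_valid(validity, i):
    """Return True if slot i is non-null according to the bitmap."""
    return bool(validity[i // 8] & (1 << (i % 8)))


def string_column(values):
    """Variable-width column: offsets into one contiguous UTF-8 byte buffer."""
    offsets = array("i", [0])  # 'i' is a 32-bit int on common platforms
    data = bytearray()
    for s in values:
        data += s.encode("utf-8")
        offsets.append(len(data))  # end of string k == start of string k + 1
    return offsets, bytes(data)


def get_string(offsets, data, i):
    """Decode the i-th string by slicing between two consecutive offsets."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


rows = [
    {"id": 1, "name": "pandas"},
    {"id": None, "name": "PyTorch"},
    {"id": 3, "name": "TensorFlow"},
]
cols = to_columns(rows)
validity, ids = int_column(cols["id"])
offsets, names = string_column(cols["name"])

print(list(ids))                      # [1, 0, 3]
print(list(offsets))                  # [0, 6, 13, 23]
print(get_string(offsets, names, 2))  # TensorFlow

# A consumer can read the values buffer through a memoryview: no copy, no parsing.
view = memoryview(ids)
print(view[1:].tolist())              # [0, 3]
```

Each column ends up as a few flat buffers rather than a list of Python objects. A consumer can therefore read a column in place, as the `memoryview` above does, instead of decoding it object by object. Arrow relies on the same property so that different libraries and processes can share one in-memory representation.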
