ML models are not static: as soon as the data changes, so do the models and their predictions, so ML pipelines need constant monitoring, retraining, optimization and so on. All of these are “time series” problems that engineers and data scientists have to solve, and they are not trivial from many points of view. Solutions can have huge time horizons, but the worst part is that they need to be maintained afterwards. Eww. As engineers, we love to create things, but we don’t want to maintain them. AutoML was invented to automate data preprocessing, feature engineering, model selection and configuration, and the evaluation of results. It can provide a baseline result, deliver high quality on certain problems, and show where to continue research.
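To make the idea concrete, here is a minimal sketch of the model-selection part of that loop: try a few candidate models, score each one with cross-validation, and keep the best. It uses only the Python standard library, and everything in it (the two toy models, the names `CANDIDATES`, `cv_mse` and `auto_select`, the synthetic data) is an assumption made up for illustration. Real AutoML frameworks search far larger spaces and also handle preprocessing and tuning.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Second candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


# Hypothetical search space: model name -> training function.
CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_linear}


def cv_mse(fit, xs, ys, k=5):
    """Mean squared error averaged over k interleaved folds."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        errors.append(mean((model(xs[i]) - ys[i]) ** 2 for i in test_idx))
    return mean(errors)


def auto_select(xs, ys, candidates=CANDIDATES):
    """Score every candidate and return the best name plus all scores."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [float(i) for i in range(50)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]  # synthetic linear data
    best, scores = auto_select(xs, ys)
    print("selected:", best)
    print("cv scores:", scores)
```

The human still decides what goes into `xs` and what the target means; the loop only does the repetitive scoring and comparison.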
It sounds great, of course, but how effective is it? The answer depends on how you use it. It’s about understanding what people are good at and what machines are good at. People are good at connecting existing data to the real world: they understand the business area, and they understand what specific data means. Machines are good at calculating statistics, storing and updating state, and running repetitive processes. Tasks like exploratory data analysis, data preprocessing, hyperparameter tuning, model selection and putting models into production can be automated to some extent with automated machine learning frameworks, but good feature engineering and drawing actionable insights still need a human data scientist who understands what they are doing. By separating these activities, we can easily benefit from AutoML now, and I think that in the future AutoML will replace most of the work of a data scientist.
Many data scientists say that human data scientists will still be necessary after AutoML, but I doubt it. I am not talking about research or about tasks where the goal is maximum model accuracy; I am talking about real business problems. And here I think it is obvious that AutoML will win. Not many projects in the real world go from POC to production, and automation will help to build quick prototypes and eventually increase ROI for the company.
What’s more, the industry is clearly undergoing a strong evolution of ML platform solutions (e.g. Amazon SageMaker, Microsoft Azure ML, Google Cloud ML, etc.). As ML adoption grows, many enterprises are quickly moving to ready-to-use DS&ML platforms to accelerate time to market, reduce operating costs and improve success rates (the number of ML models deployed and put into operation).