I previously wrote about the evolution of data engineering, in which data engineers (formerly business intelligence engineers) had to become more technical to keep pace with innovation and support the massive growth of data and data usage, as well as about the broad spectrum of data scientist roles. I am now convinced that data science as we know it is set to die and that, like the role of the business intelligence engineer in its day, it will evolve. Contrary to data engineering, however, it will evolve towards a less technical nature.
This evolution will be driven by three trends: the automation of individual workflows typically performed by data scientists, the creation of data products that take away certain repetitive parts of the data scientist's job, and finally a move towards higher value-added work.
If you thought you could keep on doing your typical machine learning for the rest of your career, it's time for a reality check.
In a lot of small startups, the data scientist has been the jack of all trades in the data domain: setting up the infrastructure on which every data job will run, ingesting and processing the different data sources, and finally arriving at a predictive output. Each of these steps is now becoming easier and easier.
Infrastructure is becoming easier to manage with more and more turnkey solutions. With the S3/Athena combo offering an easy-to-set-up data lake on Amazon, Lambda functions powering model development, and the ready-to-deploy Cloud Composer on Google Cloud offering Airflow as a service, setting up a proper, scalable infrastructure, which would once have taken a data scientist or a DevOps person ages, has largely been outsourced to these cloud providers.
At the same time, the ingestion of data sources is also becoming easier. Google Analytics events are directly integrated into BigQuery for Google Analytics 360 users, and other sources can be integrated through third-party tools such as Funnel or Stitch Data.
As the more foundational work gets automated, the preprocessing and processing steps are also becoming easier and more readily automated. Tools such as Alteryx provide a drag-and-drop way to handle the different workflows. More advanced data science use cases are becoming easier too: a library as difficult to manipulate as TensorFlow gets an easy wrapping interface in Keras, and machine learning moves to the cloud with Cloud AutoML and AWS SageMaker.
These trends towards automating and simplifying data scientists' workflows both allow people with less domain knowledge to perform these tasks and free up time for data scientists.
Building on the work done to automate data science workflows, a different trend is also eating data scientists' lunch: the spread of data products that leverage machine learning models and standard insights.
CRM platforms, for instance, have moved heavily in this direction, offering propensity models, recommendation engines, automated A/B testing and more. Segmentation and decisioning capabilities are typically built in, and different players have focused on different data capabilities: Emarsys with automated RFM reporting and segmentation, and Salesforce with its Einstein artificial intelligence.
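To give a sense of how standardized this kind of segmentation has become, here is a minimal sketch of RFM (recency, frequency, monetary) scoring in plain Python. It is not how Emarsys or any other vendor implements it. The order data, the rank-based scoring into three bins, the segment names and the cut-offs are all hypothetical choices made for illustration.

```python
from datetime import date

# Hypothetical order history: (customer_id, order_date, amount).
ORDERS = [
    ("alice", date(2019, 1, 5), 120.0),
    ("alice", date(2019, 3, 20), 80.0),
    ("alice", date(2019, 3, 28), 200.0),
    ("bob", date(2018, 11, 2), 35.0),
    ("carol", date(2019, 2, 14), 60.0),
    ("carol", date(2019, 3, 1), 40.0),
    ("dave", date(2018, 9, 9), 15.0),
]


def rfm_table(orders, today):
    """Aggregate orders into recency (days), frequency (count), monetary (sum)."""
    table = {}
    for customer, day, amount in orders:
        row = table.setdefault(
            customer, {"recency": None, "frequency": 0, "monetary": 0.0}
        )
        days_ago = (today - day).days
        # Recency is the number of days since the most recent order.
        if row["recency"] is None or days_ago < row["recency"]:
            row["recency"] = days_ago
        row["frequency"] += 1
        row["monetary"] += amount
    return table


def score(values, higher_is_better=True, bins=3):
    """Rank-based score from 1 (worst) to `bins` (best) for each customer."""
    # Order customers from worst to best, then map positions onto equal bins.
    ranked = sorted(values, key=values.get, reverse=not higher_is_better)
    return {c: 1 + (i * bins) // len(ranked) for i, c in enumerate(ranked)}


def segment(orders, today):
    """Label each customer from the sum of their R, F and M scores."""
    table = rfm_table(orders, today)
    # A small recency (a recent purchase) is good, so lower is better here.
    r = score({c: row["recency"] for c, row in table.items()}, higher_is_better=False)
    f = score({c: row["frequency"] for c, row in table.items()})
    m = score({c: row["monetary"] for c, row in table.items()})
    segments = {}
    for customer in table:
        total = r[customer] + f[customer] + m[customer]
        # Arbitrary illustrative cut-offs on a total that ranges from 3 to 9.
        if total >= 8:
            segments[customer] = "champion"
        elif total >= 5:
            segments[customer] = "regular"
        else:
            segments[customer] = "at risk"
    return segments


if __name__ == "__main__":
    for customer, label in segment(ORDERS, date(2019, 4, 1)).items():
        print(customer, label)
```

The whole thing fits in a few dozen lines and needs no modelling decisions beyond picking cut-offs, which is precisely why platforms can ship it as a built-in feature.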
DMPs and the like have also moved heavily into this domain, implementing technologies to help with probabilistic user matching and merging and with lookalike segmentation models. Personalization, previously the sole domain of the data scientist, is now built into CMSs as well as some more specialized platforms.
Data products that build on standardized data structures and data capture have already started to take away some of the lower value-added data science work.
Data scientists will move towards higher value-added work, shifting more towards the fields of strategy and product development, with a focus on empowering users of the datasets. Some of the more traditional data science work will move towards higher-impact projects, either through customization or R&D.
Analysis and interpretation skills will become more important as the role shifts towards a more strategic one. Data science and analytics roles in Silicon Valley companies have increasingly shifted towards setting strategic decisions based on data. Be it data scientists in product analytics at Facebook or those in strategic planning and analytics at Airbnb or Uber, the main role of these analytics professionals is to shape strategy rather than to provide predictions.
The ability to manage projects and products will become increasingly important. New-age data scientists will need both to manage data projects from beginning to end and to lead the overall product outcome based on data. This is already the case in some companies that hire product analytics professionals or have highly quantitative product managers. There is a high need for technical professionals who truly understand data to help shape and build products, be it data products or simply guiding regular products through their development. Current data scientists, with their mix of business, product, quantitative and technical skills, are well suited to fill it.
Democratization of data will become increasingly significant for organizations trying to become data-driven and data-centric. Teaching others how to interpret and leverage the data available will become an increasingly important part of the data scientist's work.
Some of the more typical data science work will end up revolving around customization or "tailoring" of the models in existing data platforms (e.g. CloudML) and delivering proofs of concept prior to full-scale development. Some data scientists will, on the other hand, focus increasingly on R&D, which will be prevalent in certain large companies driven by data products.
Different trends are shaping the future of data science, and the job is certainly going to go through drastic change in the coming years. Driven by automation and productization, the work done by data scientists will move towards higher-value work, mostly less technical than it is today.
The data scientists of tomorrow will be much more focused on product, strategy and how to empower their organizations to leverage the data they have. The more highly technical work is likely to be left to an ever more technical data engineer, save for specific R&D roles.