What a Data Scientist Really Needs to Learn: Nothing

Raise your hand if you are a data scientist and you have used ChatGPT to help write your code, or maybe to generate a photo like the one above. Exactly what I thought. Many have discovered that ChatGPT is useful not only for writing thank-you notes but also for writing code. And this is just the beginning.

Recently, at its Build 2023 conference, Microsoft announced that it is developing AI add-in layers across all of its products. If you are a serious developer, you might have already experienced GitHub Copilot, the Microsoft and OpenAI collaboration that began in 2021 and helps GitHub users write code. For the rest of us, a quick conversation with ChatGPT usually solves most of our coding errors. But what about non-coders?

For a data scientist, writing code is the key to unlocking the magic in data. The answers were hidden away, and only the most skilled could find the numerical patterns that improved decisions, developed new medicines, or optimized supply chains. What if that magic was no longer limited to those talented enough to type syntax-error-free code and curate massive amounts of data? What if we all could do it? This might be the future-of-work impact of generative AI, which can respond to simple text prompts with complex computer code.

Low code, no code — why bother?

Only a few years ago, it seems, there was serious debate about how low-code / no-code tools compared with traditional data analysis tools. Could a low-code solution that merely pointed and clicked at the data really be as useful as code arduously crafted by those who truly understood the underlying algorithms? Perhaps. Is it important to know how those algorithms work, or is the real skill of the future knowing when it is appropriate to use them? And what will an increasingly data-driven world look like when almost everyone can use data, untethered from obscure languages and coding nuances that previously took years to comprehend? What is the power of data when it is truly democratized?

Democratization of data analytics is a good thing. It can be empowering and improve our data literacy. People using tools like Copilot can seek answers to questions in a conversational way rather than being hindered by code. And the more people who seek answers, the better. If you are a data scientist and your greatest talent is your coding efficiency, garnered from years of experience and memorization of all the popular algorithms and code libraries, then you likely already see the future: a future where knowing a particular way to arrange code is less important. Soon those competing for your job will need no coding experience at all.

Data Age of Enlightenment

The data analysts and business decision consultants of the (near) future will be those who can see beyond the technical limitations of unlocking value in data. It will be the equivalent of the Age of Enlightenment of the 17th and 18th centuries, when knowledge stopped being exclusive to those with privilege or the ability to read dead languages and became more widely available. The important skills will be framing the question, sustaining curiosity, and understanding how data and knowledge can be applied, not just extracted.

ChatGPT is only, yes only, about six months old. We have barely had enough time for the hype cycle to run its course, for the media to describe the “rise of the machines”, and for everyone from accountants to airplane pilots to ponder the end of their careers. We should be more optimistic than this, though. The future of data science work influenced by AI may provide more opportunities than disruptions. It will allow more curious people, with or without coding skills, to find opportunities born from data. We will debate the merits of particular coding languages (Python vs. R vs. C++ etc.) less, since it will not really matter which programming tools you use. What a data scientist really needs to know is not how to find those answers but, more importantly, why and how to apply them to decisions. Maybe improving access to data and its insights could make the world just a little bit better.
