A couple of posts on this topic have caught my eye recently (here is another). In various ways they ask the same question - “outside of the AI labs, what is the role of a data scientist in a gen AI project?”
A comforting view (which both of these posts endorse) is that data scientists possess a valuable skillset and are every bit as essential in the process of stitching together systems around off-the-shelf foundation models as they were for more “traditional” modeling endeavors. For reasons of self-interest I’d like to believe this story is true, but something about it strikes me as suspiciously tidy. I can’t shake the feeling that the argument behind this story begins with the assumption that all these data scientist roles can’t simply disappear, so they must be going to stick around and morph into an integral part of the AI projects of the future. There’s no such guarantee.
The core of the argument seems to be that because certain concepts with which data scientists are familiar (like train/validate/test splits) can be kinda-sorta mapped to AI workflows if you try hard enough (few-shot examples/optimize/evaluate), we can put eval/optimization work into the same bucket as model fitting and therefore claim it for the data scientists. This is pretty obviously a stretch - I could just as easily argue that there’s a great deal of overlap between traditional product owner responsibilities and AI product evals, so actually LLM optimization is properly the domain of the product owners.
A secondary argument is that a data scientist possesses experience and perspective that will better position them to think about LLM evaluations (“A data scientist would reduce complexity, make each metric actionable, and tie it to a business outcome.”). Perhaps. But the most one can conclude from this is that data scientists are probably pretty well positioned to make the transition into whatever specialist role ultimately crystallizes here, not that the data scientist qua data scientist is essential. And it’s not a foregone conclusion that such specialties will emerge at all, whether in addition to or in place of a full-stack AI engineer archetype.
If you’re a data scientist who’s anxious about what happens to all the jobs built around the model development process when a sizable portion of custom-built models are replaced with off-the-shelf ones, thinking about how some of your skills map to that new world is a useful exercise. But be careful that you don’t turn into a Pollyanna in the process and convince yourself that your role will transfer over wholesale, sparing you the need to invest substantial effort into the transition. Believing that vastly overstates your relative advantage and will leave you poorly positioned against the competition.