Alumnus Profile: Stefano Soatto (MS '93, PhD '96)
Stefano Soatto's (MS '93, PhD '96) journey to boundary-breaking work in Artificial Intelligence (AI) may seem nonlinear, but his commitment to curiosity has been a constant theme. Currently a Vice President of Applied Science and Distinguished Scientist at Amazon Web Services (AWS), and a Professor of Computer Science (on leave) at UCLA, Soatto has built a career spanning both academia and industry. Driven by challenging customer problems, his explorations touch fundamental questions of AI: understanding how AI models work, how they can be improved, how we can trust them, and how they might reshape the world for the better.
ENGenuity spoke with Soatto to learn how he has become a key player in the burgeoning AI landscape, and how his time at Caltech shaped what he thought was possible and laid the foundation for his work in industry.
ENGenuity: What are you currently working on and how would you describe your professional contributions?
Soatto: I am a Vice President of Applied Science at AWS; I led the teams that developed AI services now available through AWS. These span the areas of vision, speech, language, verticals (forecasting, personalization, industrial, medical, etc.), and foundation models (Amazon Bedrock, Amazon Titan, Amazon Q). Currently, my team and I are exploring risks and opportunities arising from large-scale AI models beyond the current generation. I am also a professor at UCLA, where I continue to advise students.

My journey has been seemingly nonlinear. I started with classics (Latin, Greek, history, and philosophy) in Italy. Then I went into engineering because I was interested in understanding systems and solving problems. At the time, I was fascinated by how biological systems work. During my studies at the University of Padova, I spent a year at UC Berkeley, where I interacted with Hans Bremermann, an MD and mathematician who had worked with John von Neumann. He studied mathematical models of the interaction of HIV with the immune system, which at that time was not yet established as the cause of AIDS. It seemed to me that, at the time, what mattered in practice and what was intellectually interesting were largely disjoint. I believe it is quite different now, but remember this was in the early stages of the Human Genome Project, and I did not see myself spending long days in a wet lab. So, I started looking for interesting problems. While listening to a random seminar, I learned about the thesis work of Pietro Perona [Allen E. Puckett Professor of Electrical Engineering], then a postdoc at MIT working in vision. I went to Cambridge to meet with him, and the following year I was among his first cohort of students at Caltech.

During the past few years, large-scale generative models and large language models have challenged many of the premises we all held dear, shaking the foundations of computer science. So, I have been busy learning and trying to understand.
ENGenuity: How do you see the future of AI and how it will scale over time?
Soatto: Bright. People are developing large-scale and large language models—and when I say language, I don't mean the natural language with which these models are partly trained, but rather the inner language that emerges in models trained on sequence data with latent logical structure, including audio, video, biological data, and of course natural language. As these models develop their inner languages, we need to understand how they represent and manipulate abstract concepts. There are so many fascinating questions to be answered—questions that have been around for hundreds if not thousands of years. But now, because we built and designed these models, we can measure everything about them. We can revisit old questions with new tools, and tackle hard problems like trust, factuality, and uncertainty. Users of these models want to be able to determine to what extent they can trust them or rely on their answers, so this work is both interesting and impactful.
ENGenuity: Are there any synergies between your work as a professor and role at AWS?
Soatto: Once an academic, always an academic, they say. Both roles involve exploring the frontier, but in academia you are driven by your curiosity and much of the effort is in framing the problem. You don't expect instant impact. Curiosity-driven exploration is very necessary: out of tens if not hundreds of thousands of people who pursue their own curiosity, occasionally somebody hits something that ends up having a massive impact. But impact is not the main driver. Posthumous recognition is OK.
In industry, you are constantly exposed to hard problems in need of creative solutions. Exploration is not curiosity-driven, but problem-driven—or, as we say at AWS, "customer-obsessed." What is quite refreshing about AWS is that you don't need to sit around your office thinking about problems that don't exist. There are actual problems that AWS customers encounter that most people do not even know exist yet. Customers solve the easy problems on their own, and for the hard ones they come to us. It's a very privileged perch and a treasure trove for an academic, as you get exposed to new problems before they become widely known.
For example, a few years ago we launched a service called Amazon Textract, which is now one of the leading document analysis and intelligence services. When we launched the second version of Textract, which reduced the error rate of the previous version by 30%, we started receiving complaints from customers who wanted the old model back. We were puzzled, to say the least. But what the customers were telling us was that we were solving the wrong problem. The fact that nearly every academic paper measures the performance of machine learning systems by average error rate does not mean it is the right measure to optimize. Each customer cares about performance on their specific domain or cohort, which raises the question: why should average error rate be the training loss, just because every academic paper uses it? What we discovered is that two models trained on identical data with an identical optimization procedure can make the same number of mistakes on average, yet most of the mistakes they make are different. So, we could independently optimize the average error rate and what we now call "positive congruence," the compatibility of old and new models on the cohorts of interest. Positive-congruent training would never have emerged from looking out the window of an academic office; it came from listening to customer feedback.
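As a rough illustration of the point above (a sketch, not AWS's actual training method), one can measure per-example regressions between an old and a new model: the "negative flip rate," the fraction of examples the old model classified correctly that the new model now gets wrong. The toy labels and predictions below are invented; they show that a model can be more accurate on average while still regressing on examples a customer's workflow depended on.

```python
# Toy illustration of "positive congruence": a new model can improve average
# accuracy yet still regress on specific examples the old model handled.
# All data below is fabricated for the sketch.

def accuracy(y_true, pred):
    """Fraction of examples predicted correctly."""
    return sum(y == p for y, p in zip(y_true, pred)) / len(y_true)

def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of examples the old model got right but the new model gets wrong."""
    flips = sum(1 for y, o, n in zip(y_true, old_pred, new_pred)
                if o == y and n != y)
    return flips / len(y_true)

y_true   = [0, 1, 1, 0, 1, 0, 1, 0]
old_pred = [0, 1, 0, 0, 1, 1, 1, 0]   # 6/8 correct
new_pred = [0, 1, 1, 0, 0, 0, 1, 0]   # 7/8 correct, but regresses on example 4

print(accuracy(y_true, old_pred))                       # 0.75
print(accuracy(y_true, new_pred))                       # 0.875
print(negative_flip_rate(y_true, old_pred, new_pred))   # 0.125
```

The new model is more accurate on average, yet one example that used to work now fails—exactly the kind of regression that prompted the customer complaints. Positive-congruent training adds a penalty on such negative flips alongside the usual accuracy objective.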
ENGenuity: Is there a quality that you have found to be useful as someone who leads teams in AI?
Soatto: Curiosity. Curiosity is expressed in different ways. In academia, it drives your goals. You are obsessed about a question just because. In industry, curiosity drives the solution. You are trying to solve a problem, but you need to understand its nuances, its different aspects, and the different ways you can look at it. It's still curiosity, but it is leveraged differently.
ENGenuity: Throughout your career, how has your Caltech education influenced you?
Soatto: Caltech is small, and that has some advantages and some disadvantages. The advantages for me were manifest at the beginning. I came to Caltech from Italy, where you grow up with the sense that anybody who did anything that mattered has been dead and buried for at least a few centuries. On my way to school in Padova, I biked past the houses of Galileo Galilei, Andrea Palladio, Gabriele Falloppio, and Alvise Cornaro. What could a random teen possibly do to top them? Caltech was small and influenced by larger-than-life personalities—Feynman was gone by then, but his imprint was there. This gave me the sense that if the person panting next to me on the treadmill at the gym could win a Nobel Prize, then why couldn't I? Of course, it's a silly thought, but proximity does give you a sense of possibility—that there are people who have done and can do great things, and they are walking the same grounds as you. They are humans, not statues like Leonardo da Vinci or Aristotle or whomever. It's a much more tangible setting. And that is made possible by the small size and accessibility.
I think things have changed now. Doing research now, at least in AI, is quite different from what it was like even 10 years ago; the social aspect is more pronounced and the role of any one individual figure less prominent. We went from a world where, if you wanted your ideas exposed to the world, you needed to convince Euler or a handful of other key gatekeepers, to one with no obstacle to dissemination. Anybody can put out papers on arXiv, and whether some have an impact depends on a variety of factors, including less substantive ones like social media influence. I am not a fan of this model, maybe because of my Caltech education, but I don't see us going back to a more traditional academic model.
ENGenuity: What advice would you give to alumni, and more specifically, recent Caltech graduates, on how to bridge their Caltech education with life after Caltech?
Soatto: My answer is heavily conditioned by the present time. This is a time when things are happening that most people have not yet realized, especially those who have not witnessed firsthand what these models can do beyond a chatbot window. These models are challenging some of the very foundations of engineering and science that we use to educate students. What I would tell students is: don't assume that what you learn today will still be valid even a few years out, even if it seems solid, and don't assume that it will be the lifelong basis of your intellectual framework. Things are changing very rapidly. I am concerned that there is a transitional generation of students who are being educated as this happens and may find their education rapidly becoming obsolete. People can still learn methods and ways of thinking that expand and broaden their range, but, especially if you grew up in a small place, keep your eyes wide open.
ENGenuity: Is there a project in your career that you are most proud of?
Soatto: The most recent crop of work that we've done with my team, my students, and my collaborators is aimed at understanding—as in modeling mathematically and analytically, and implementing with tangible computational infrastructure—the way in which these large-scale models represent and manipulate abstract concepts and meanings. This is normally something that engineers don't think much about. There is a fundamental misconception that we have all been raised on as engineers and scientists: that there is one true world out there, and when we use data to build models of this world, all these models should converge to the one and only true model that we all share. Epistemologists call this "naive objectivism" or "naive realism." In reality, each of us (bots included) builds and maintains a different representation, and I cannot know what is inside your head and you cannot know what is inside mine. The only way we can reconcile these representations is through a medium, through communication, and this is no different for models. Realizing this is important because the question becomes not about identifiability, uniqueness, or truth, but about alignment, explainability, representability, and learnability.
I think many of the questions that people in the foundations of mathematics, in epistemology, and in philosophy at the time of Russell, Frege, and Hilbert were asking about concepts of infinity and limits are becoming tangible today, because these models do capture these concepts. Even concepts with infinite manifest complexity, like "pi" or a visual scene, can be represented with finite data, finite compute, and finite memory in finite time. Models can do it too. But we cannot know for sure if and when that has happened.
It's not easy to explain, so this is not the kind of work that will get a lot of 'likes,' but it drives better understanding of how these models can be controlled, and it tackles fundamental questions about uncertainty—whether you can trust these models and how you can measure not just the uncertainty of the model but the uncertainty of a specific outcome to a specific query that the model produces. All these questions are now addressable with tangible means that are under our control. Even though they are complex, I can see a path to studying them with tools that were not available at the turn of the 1900s.
ENGenuity: What gets you up in the morning?
Soatto: It's never a boring day when I get to think about some of these questions, which are truly a once-in-a-generation opportunity, and discuss them with the brilliant minds I collaborate with. We are in a phase where things are happening fast, and we don't quite understand everything. Even the language to understand them and develop the theory has not been formed yet. Many of the tools we have long used to formalize, rationalize, and theorize about phenomena we observe need revising. So it is a very exciting time indeed.