EAS AI Bootcamp Progress Report: One Year In
Last year, Caltech EAS set out to empower its graduate students and postdocs with basic artificial intelligence (AI) training. This effort resulted in the creation of a series of AI bootcamps taught through the Division of Engineering and Applied Science. These courses are designed not only to introduce participants to AI technologies but also to foster interdisciplinary collaboration and practical, hands-on learning experiences. Each bootcamp offers participants the opportunity to dive into cutting-edge AI topics and directly apply AI techniques to the unique challenges they face in their respective fields across the engineering and science spectrum.
To learn more about the bootcamps' impact and how they are shaping the future of AI education at Caltech, ENGenuity spoke with Reza Sadri, Director of the AI Bootcamp Program in Engineering and Applied Science. Sadri shared insights into the program's initial success, challenges, and what lies ahead.
ENGenuity: How have you seen AI and ML being used by students in the AI bootcamp program?
Reza Sadri: When I joined Caltech, I interviewed a lot of people to get a sense of who is using AI, and who could use AI but is not using it yet. Some people have very advanced use cases built on the most up-to-date AI technologies. It's a very quickly evolving field. At the same time, you see a lot of people who still don't know exactly what AI is, how they should use it, or where it is applicable and where it is not the right tool. That was my first impression from talking to different people.
The general population has a very superficial understanding of AI. They hear about large language models (LLMs), or they hear about generative AI, but they don't know where it is applicable and where it is not. Our goal is to replace that superficial understanding with something deeper and more guided. We do that not only by teaching people the basics of AI and how it works, but also by asking them to bring problems from their own fields, from chemistry, biology, physics, engineering, and astronomy, and then discussing those problems in class. By interacting and providing context, they get a lot of feedback. Our students are applying these techniques in different domains, and when they interact with each other, the AI bootcamp not only deepens their understanding of AI but also makes it much clearer to them which AI techniques to apply where.
ENGenuity: Why do some people come in knowing a lot about AI while some people come in knowing a little? Why is there that discrepancy?
Sadri: A lot of it depends on their background. Some disciplines are closer to computer science and algorithms. For example, people who come from CMS [Computing and Mathematical Sciences] obviously already have a lot of understanding. Some people have a background in AI and then chose a specific application discipline; they may have a computer science background but decided to work on aeronautics. Those are the ones who have a better understanding of the foundations and fundamentals. But some people are very domain specific. They are good at geology, for instance, but they haven't had a lot of exposure to algorithms.
ENGenuity: Do students come into the AI bootcamps with a similar baseline understanding of certain mathematical concepts?
Sadri: Yes and no, and that was a bit of a surprise. I think there has been a shift in the math that is used in engineering. A lot of engineering disciplines have relied heavily on calculus and multivariable calculus over the last few decades. AI uses calculus, but it relies heavily on linear algebra. Some of the students don't have the required linear algebra background, so this is an area for improvement. You need to know three things: probability theory, calculus, and linear algebra.
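To make the linear algebra point concrete: a single neural-network layer is essentially a matrix-vector product plus a bias, followed by an elementwise nonlinearity. A minimal NumPy sketch, with sizes and values that are purely illustrative rather than drawn from the bootcamp curriculum:

```python
import numpy as np

# A single neural-network layer is a matrix-vector product plus a bias,
# followed by an elementwise nonlinearity: y = f(Wx + b).
rng = np.random.default_rng(0)

x = rng.normal(size=4)          # input features (4-dimensional, illustrative)
W = rng.normal(size=(3, 4))     # weight matrix: maps 4 inputs to 3 outputs
b = rng.normal(size=3)          # bias vector

y = np.maximum(0.0, W @ x + b)  # ReLU nonlinearity applied elementwise
print(y)                        # layer output: a 3-dimensional vector
```

Stacking layers like this one is, at its core, repeated linear algebra, which is why the subject matters more here than advanced calculus.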
ENGenuity: How have you seen people's relationship to AI change during the AI bootcamps?
Sadri: They start asking questions that are surprising. Not only do students take an AI bootcamp, but some of them stay in touch with me, show me the work they've done, and sometimes come back to give talks about that work in later AI bootcamps. In fact, in all my AI bootcamps, I have people from previous AI bootcamps come and talk about what they have done.
ENGenuity: What kind of questions do students ask in an AI bootcamp?
Sadri: Questions that fill in the gaps in their knowledge. When you teach a concept from the beginning, you go over how things like neural networks work. It doesn't always click at first, but at some point it clicks, and then they bring a lot more questions. They start relating it to what they do. For example, when we had a bootcamp on transformers, after a few days participants started asking how they could use transformers in their specific applications. Then some discussed how they could tweak their specific problems to fit this tool.
ENGenuity: Has anything surprised you from the AI bootcamps so far?
Sadri: One thing that surprises me is that sometimes three or four people from very different domains ask a question, or try to use a certain algorithm, and you have to give them the exact same answer. For example, we had one participant with a problem from astronomy and another working with medical imaging and the outputs of other imaging devices. One was looking at data from the most distant exoplanets, and the other was asking questions about images of the brain. The problems they wanted to solve and the underlying technology were very similar. That's interesting. It's not unexpected, but when you see it in action, it's amazing.
ENGenuity: What have students found most interesting?
Sadri: Some of them understand the concept of ML, but they want to have a more hands-on approach; they want to see it in context.
ENGenuity: Are the bootcamps going to be repeated? Will there be new bootcamps devoted to specific topics?
Sadri: We have covered core ML, which includes linear models and neural networks; physics-informed neural networks; reinforcement learning; and transformers. We've had six courses so far. There are some that we haven't had the chance to do yet. There is one that I am going to offer on foundation models, like LLMs and generative models. Then there will be another on studying graphs using ML. We will repeat topics and introduce new ones, but these are the core topics. We will do the intro to ML bootcamp at least twice a year. We will also keep offering transformers for the time being because transformers are the basis for most of the new models. A lot of these are going to be repeated, but we are going to interject new areas like ethics. That is becoming an important topic.
ENGenuity: What are the components of ethics in AI that you would be addressing in that bootcamp?
Sadri: You can build a model, but if people trust it too much, it can cause a lot of issues. The model may not give you the right information, or it may steer you toward the wrong action. That can have ethical consequences. Then there is the problem of bias in the data that is fed to the model. How do you handle that? How do you take it into account? Privacy is also a big issue. You may use a model without knowing whether it was trained on private data. Does that impact your research? You may collect a lot of data and put it into a model, and that data gets encoded into the model. If you release that model, other people may get access to that data, and you have to make sure that's OK.
The existing machine learning models are black boxes to a very large extent. It looks like you have tons of data, trillions of bytes, and then you throw them inside this box and there is this mechanism that processes them. You have very little visibility into how it happens. What part of the model connects to what part of the data or the answer? There is the input and the model. There are multiple layers in the model, but you don't know which one of the layers had what kind of impact. That is dangerous because if there are some issues with some of the input data, you don't know where those issues show themselves.
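That opacity is easy to demonstrate. In a framework like PyTorch, you can attach hooks that record every layer's intermediate output, yet the recorded numbers by themselves still don't tell you which layer was responsible for a given answer. A minimal sketch with a toy model (the model, layer sizes, and names here are illustrative):

```python
import torch
import torch.nn as nn

# A toy two-layer network standing in for a much larger "black box."
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Forward hooks record each linear layer's intermediate output.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

x = torch.randn(1, 8)  # a single 8-dimensional input
y = model(x)

# We can inspect every layer's numbers, but the numbers alone don't
# explain which layer drove the final answer.
for name, act in activations.items():
    print(name, act.shape)
```

Interpreting those intermediate activations, rather than merely capturing them, is the open research problem Sadri describes.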
ENGenuity: In the AI bootcamps, do you go under the hood and help students explain what is going on technically with AI models?
Sadri: How to figure out exactly what's going on is an open problem. Nobody really knows. For our bootcamps, there are two goals. One goal is to let people know how to use this tool, because at the end of the day, it is a tool. A laptop is a tool, and you use it to make progress in your research. ML is also a tool. When you have a tool, some basic understanding of how it works helps you use it more efficiently and safely. We are teaching how to use the tool and providing a basic understanding of how it works so you have a feeling for where to use it, where not to use it, and which contexts are the right use cases.
ENGenuity: Going forward, what are the things you are looking to improve upon?
Sadri: One area is logistics: How do we run the bootcamps? How do we advertise them? The other is the content and process of the bootcamps. What kind of content do we offer? Do we do more lectures? Do we do more hands-on work? How much should participants be expected to do projects? We now have a good understanding of the limitations and some ways to improve. For example, the students and postdocs who come to these bootcamps are very busy. They try to carve out a week to dedicate to each bootcamp, but usually they still have things to do on the side, so you cannot expect them to do much extra work.
So far, it has been more of a transfer of knowledge, but we want to make it more engaging, both in the lectures and in the hands-on activities. We have been trying to make the examples more relevant to people at Caltech. A lot of AI is built by companies for narrow business applications, mostly in advertising and e-commerce, so a lot of the real-world examples come from those two areas. Most people teaching ML use examples from commerce: How can we find the movie that someone will like and watch? How can we increase the click rate of an ad? But that is not useful for a Caltech student. We have to come up with test cases like processing some type of physical phenomenon. These are usually complicated and require a lot of domain knowledge, so we need to simplify some of them to bring into our bootcamps. We are also going to offer some bootcamps that are domain specific. Right now, we cover generic areas of AI like reinforcement learning and transformers, but we may move into verticals. For example, we could create an AI bootcamp specifically for biology.
ENGenuity: How should Caltech evolve to accommodate a world where AI is going to be used more and more?
Sadri: The number one thing is that we need to build our compute infrastructure. If you want to be good at anything, you have to have the right infrastructure; if you want a lot of good drivers, you need a lot of good roads. We have fairly good infrastructure, but we still need a lot more, and there are many layers to this. We need better, much faster GPUs, and we need to figure out whether we want to host that on campus. Currently, we have our own cloud. The hardware infrastructure includes servers and memory, but the most important part is the GPUs. In addition, you need a lot of new processes. The way ML works is that you take your data, fit it through some type of training process, and that gives you a model, which is basically an application. That process has to become automated and easy to trace, track, monitor, and manage. These processes are as important as the hardware infrastructure.
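That data-to-model pipeline can be seen in miniature in a library like scikit-learn: data goes in, a fitting procedure runs, and what comes out is a model you can call like an application. A minimal sketch on synthetic data (the dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Data in: a synthetic classification dataset standing in for real measurements.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit: the training process turns data into a model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model out: the fitted object is effectively an application you can call.
print("held-out accuracy:", model.score(X_test, y_test))
```

At research scale, each of these steps needs to be automated, versioned, and monitored, which is the process infrastructure Sadri argues matters as much as the GPUs.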
When you use a car, you get in, press a button, and the car starts moving. Underneath, a lot of things are happening: there is an engine, there are wheels, but you don't have to worry about how those things work. All you need to do is press the accelerator and steer. Machine learning should be the same: you shouldn't need to know how the data moves from one place to another. That process has to become streamlined and easy so that everyone can use ML. That is how we reduce the barrier to entry.
ENGenuity: What are your hopes and fears for what AI and ML can do for science and for society?
Sadri: I don't have a lot of fears; I have more hopes. I am on the optimistic side. AI is going to have a lot of impact because it will automate many tasks that are boring or where a human doesn't add much value. If you are a chemist, you want to understand the structure of a material or of an interaction; you don't want to sit down and write a lot of code. Your real interest is not writing code. These tools are going to bridge that gap so you can focus on the science. They will also create a language for scientists to talk to each other. If you look at science a few hundred years ago, you had philosophers, polymaths who knew everything. They knew math, physics, and astronomy, and sometimes they were also poets. But as science became deeper, you had to become more specialized. Now you have scientists focused on very narrow areas. You don't have those generalists anymore; everyone has to be super specialized to be good at something. ML and AI are going to help those specialists talk to people from different disciplines. For example, if I'm reading a paper and I don't understand it, I can ask ChatGPT to explain it to me in simple terms. That makes collaboration a lot easier.
Now, AI-powered systems can analyze data across different fields, identifying patterns and connections that human researchers might miss because of the limits of specialization. This helps scientists tackle complex problems, such as climate change or personalized medicine, and brings back the idea of a "universal thinker," making it easier for people to gain insights from multiple disciplines.