Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip

52 min

• May 21, 2026

Summary

Cerebras CEO Andrew Feldman discusses the company's massive wafer-scale chip architecture, which solves a 75-year-old problem in computing by building a single chip the size of a dinner plate. The episode covers Cerebras's $5.5B IPO, major deals with OpenAI and AWS, and how the company's speed advantages in AI inference are reshaping the economics of token generation and data center deployment.

Insights

Wafer-scale architecture eliminates memory bottlenecks by using fast memory instead of slow HBM, delivering 15x faster inference than GPUs and avoiding supply chain constraints that limit competitors
Speed in AI inference commands premium pricing (Anthropic's 2x faster service sold out at 6x cost), suggesting users value productivity gains over raw cost per token
Open-source models cost significantly less per unit of intelligence than closed-source alternatives, creating a durable market for both rather than winner-take-all dynamics
Data center capacity and real estate, not chip manufacturing, are now the binding constraint on AI infrastructure growth for the next 15-18 months
CUDA's dominance has eroded dramatically—two of three leading frontier models (Gemini, Claude) now use zero CUDA, down from 100% market share three years ago

Trends

Inference workloads are becoming the primary driver of AI compute demand, outpacing training in commercial deployment
Speed-optimized inference is emerging as a differentiator and premium service tier, not a commodity feature
Open-source models are capturing significant market share despite being 3-5% less capable than closed-source alternatives, driven by cost economics
Data center real estate and power delivery infrastructure are becoming the primary bottleneck in AI scaling, not chip design or manufacturing
Disaggregated inference solutions (mixing multiple chip types) are becoming standard practice rather than single-vendor deployments
Export controls on advanced semiconductors are becoming operationally significant for chip vendors and their international customers
Financial instruments for hedging compute capacity (debt backed by GPU/chip deployments) are normalizing and creating derivatives markets
Enterprise customers are increasingly seeking inference providers separate from model makers to protect proprietary data and avoid training competitors
Wafer-scale and alternative chip architectures are fragmenting the GPU-dominated inference market, reducing NVIDIA's architectural moat
Regulatory and CFIUS approval processes are becoming material business risks for semiconductor companies with international customers

Topics

Wafer-scale chip architecture and manufacturing
AI inference economics and token pricing
GPU vs. alternative chip architectures for AI
Memory bandwidth bottlenecks in AI inference
Open-source vs. closed-source AI model economics
Data center capacity constraints and real estate
CUDA software ecosystem and competitive moats
TSMC manufacturing capacity and supply chain
Export controls and national security in semiconductors
Financial instruments for compute capacity hedging
Disaggregated inference and multi-vendor deployments
Enterprise data privacy in AI inference
US semiconductor manufacturing policy and CHIPS Act
IPO valuation metrics for pre-profitable AI hardware companies
Speed as a premium feature in AI inference services

Companies

Cerebras

Subject of episode; built world's largest chip for AI inference, IPO'd at $64B valuation with $5.5B raise

OpenAI

Signed $20B+ deal with Cerebras in December 2025; uses GPUs for training, not CUDA-dependent for inference

Amazon Web Services

Signed deal to deploy Cerebras systems in AWS data centers; will offer Cerebras inference via Bedrock service

NVIDIA

Dominant GPU vendor; Cerebras positions as faster alternative; CUDA software moat eroding as models move to TPUs/othe...

Anthropic

Frontier model maker; offers premium 2x faster inference service; uses Trainium and TPUs, not CUDA; trains on Cerebras...

Google

Builds Gemini model on TPUs; represents loss of CUDA market share in frontier model training

TSMC

Primary manufacturing partner for Cerebras; uses 5nm process; avoids supply constraints hitting NVIDIA (3nm, CoWoS)

G42

UAE national AI champion; major Cerebras customer and minority investor; operates cloud across UAE ecosystem and US d...

CoreWeave

GPU cloud provider; pioneered debt instruments backed by GPU deployments to fund data center expansion

ARM

Alternative processor architecture mentioned as example of market fragmentation vs. single dominant player

AMD

x86 processor competitor to Intel; example of durable multi-player market structure

Intel

x86 processor vendor; example of market with multiple viable competitors

Broadcom

CEO Hock Tan cited as one of great CEOs of era alongside NVIDIA's Jensen Huang

Samsung

HBM memory supplier; investing billions in Texas fab; alternative to TSMC for advanced manufacturing

Micron

HBM memory supplier; one of three companies making high-bandwidth memory under supply pressure

SK Hynix

HBM memory supplier; one of three companies making high-bandwidth memory under supply pressure

Mayo Clinic

Cerebras customer using systems for genomic and medical data analysis with proprietary datasets

GlaxoSmithKline

Pharma customer using Cerebras for drug design with unique proprietary datasets

Cursor

AI coding tool company using open-source models; competes with Anthropic and OpenAI coding products

Cognition

AI company using open-source models; powered by Cerebras infrastructure

People

Andrew Feldman

Discusses wafer-scale chip architecture, IPO, major customer deals, and competitive positioning vs. NVIDIA

Joe Weisenthal

Co-hosts episode; asks questions about chip architecture, economics, and competitive dynamics

Tracy Alloway

Co-hosts episode; discusses AI psychosis and inference economics

Jensen Huang

Cited as one of greatest CEOs of era; NVIDIA's GPU dominance and CUDA moat discussed as competitive context

Gene Amdahl

Failed wafer-scale attempt in 1980s; Feldman cites as example of 75-year history of failed attempts

Ben Thompson

Wrote piece distinguishing answer vs. agentic inference; Feldman responds to his framework

Ray Wang

Referenced for prior episode on memory as bottleneck in chip design

Quotes

"Larger chips process more information in less time. That produces faster results. Everybody had gone to bigger chips."

Andrew Feldman • Early in interview

"We're 15 times faster than the fastest GPU. On some problems we're 50, 100, even a thousand times faster than graphics processing units."

Andrew Feldman • Discussing performance advantages

"How big is the market for slow search? Zero. How big is the market for dial up internet? Zero. Why is that? Because nobody wants to wait."

Andrew Feldman • On speed importance in inference

"CUDA was really important in the creating of the AI landscape, but it's not important now. Two of three leading frontier models today use no CUDA."

Andrew Feldman • On NVIDIA's eroding moat

"We made more than 800 millionaires. And that's something I'm proud of every minute of every day."

Andrew Feldman • On IPO impact

Full Transcript

Odd Lots is brought to you by VanEck.
Bloomberg Audio Studios. Podcasts. Radio. News.
Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.
And I'm Tracy Alloway.
Tracy, I have to say, unfortunately, I don't have AI psychosis. I'm certain of that.
Debatable.
I'm pretty sure I don't have AI psychosis.
I do have to say, unfortunately, it feels like AI-related questions, and there are many of them, are sort of swallowing up the other thoughts that I have in my head, whether it's questions about which model is best and why, and what are the economics of inference, and how much training is pre-training versus post-training for each model. It's just sort of like this blob that's growing, that's taking up more and more of my thoughts.
What is your definition of AI psychosis? Because one would argue that maybe thinking about AI literally all the time would be a form of psychosis.
Well, let's just say I'm not the type who thinks that the AI is a friend, for one thing. I'm not in love with the AI models. I don't think that in collaboration with ChatGPT I'm stumbling on a unified theory of physics and things like that. So...
But you do spend a lot of time inputting instructions, pressing the button and seeing what comes out.
Let's see what comes out. I'm just saying, I think I'm aware that I'm talking to a machine and that we're not establishing any great breakthroughs of which we are collaborators and partners and friends.
Recognizing you have a problem is the first step towards healing, Joe. Seriously though, there's a good reason to think about AI more and more, which is that a huge chunk of not just the market but the real economy is now revolving around AI, right?
Totally. So anyway, again, within the AI conversation, there are a lot of subcategories. One of the subcategories happens to be another Odd Lots favorite topic, which is chips. Of course, chips are used in multiple different ways. There are chips used in different parts of the AI supply chain, different types of chips with different roles.
And so we have to learn more.
We have to learn more.
And I have to say, I'm particularly interested in the company we're about to speak to, partly because the two things I know about them are, number one, they just had a huge IPO, raising something like $5.5 billion at a kind of insane multiple. I can't even do a price-to-earnings multiple because they're not profitable yet. But I think just on a sales basis, it was like 67 times forward earnings, which is pretty juicy, pretty hot. And the second thing I know about the company is they make giant wafers, which is just a fun image to have in your head.
That's right. So if you were thinking, okay, there is a hot entrant in this space, what is their differentiator? Well, one fact about them is their chips are just enormous, about the size of a dinner plate. One might think you're reading an Onion article, but in fact, it's real. And apparently, it actually has some real technical advantages.
And it's different to what everyone else is doing. So everyone else is, I guess, doing this sort of modular networking thing where you get together a bunch of chips, you connect them together, and that's how you get more compute, more memory, more power, basically. But this company has done something different in the form of the giant wafer.
The giant wafer, and if you figure that to get maximum performance, you sort of want to lessen the distance between things, then put it all on one wafer. Anyway, we're going to learn a lot more, I'm very excited to say, about giant wafers and more. I'm very excited to say we do have the founder and CEO of Cerebras on the podcast, Andrew Feldman, truly the perfect guest. So Andrew, thank you so much for coming on the podcast on the week of your IPO.
Well, thank you so much for having me. What a pleasure.
Absolutely. Why don't you just start us off with the big giant chip? They're apparently real. They're as big as a dinner plate.
What is the technical reason why this actually makes sense as a superior form of architecture for at least some aspect of AI?
I think larger chips process more information in less time. Okay. That produces faster results. Everybody had gone to bigger chips. Nvidia had moved from 400 square millimeters to 800 square millimeters over the course of five or six years for this exact reason. In the compute industry, wafer scale, which is building a chip this big...
By the way, for those who are just listening, Andrew is now holding up the chip. And yes, it actually looks bigger than a dinner plate, to be honest. But that is a big chip.
That's a big chip.
That's a big chip. It's beautiful.
It's 58 times larger than any other chip that had ever been. Wow. And what it did was it allowed us to use a different type of memory. Okay. There are two types of memory. There's memory that can store a lot, but it's really slow. Okay. And there's memory that can't store very much per square millimeter, but it's blisteringly fast. Okay. And historically, all graphics processing units use this memory that could store a lot, but was really slow. And that's the reason they do inference so slowly. So if you're using Claude right now, or you're using anything but ChatGPT, what you'll frequently feel is you'll enter your prompt and you'll wait for an answer. Right. And that's because the memory is slow and they have to move a ton of information from memory to compute. Now by going to wafer scale, we could use this fast memory. Now we couldn't make that memory store more information per square millimeter, but we could add square millimeters. And so by building this big chip, we were able to stuff it to the gills with this fast memory. And that's why we're 15 times faster than the fastest GPU. That's why on some problems we're 50, 100, even a thousand times faster than graphics processing units.
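Editor's sketch: the memory argument above can be turned into a rough back-of-envelope estimate. If generating each token requires streaming all model weights from memory once, the single-stream decode rate is bounded by memory bandwidth divided by model size. The function name, the model size, and both bandwidth figures below are hypothetical values chosen only for illustration; they are not Cerebras or GPU specifications, and the 15x bandwidth ratio is picked to mirror the figure quoted in the interview rather than derived from any hardware data.

```python
def decode_tokens_per_second(model_bytes: float, memory_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode rate, assuming every generated
    token requires reading all model weights from memory exactly once."""
    if model_bytes <= 0 or memory_bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return memory_bandwidth_bytes_per_s / model_bytes


# Hypothetical numbers for illustration only (assumptions, not vendor specs).
MODEL_BYTES = 70e9 * 2      # a 70B-parameter model stored at 2 bytes per weight
SLOW_MEMORY_BW = 3e12       # "stores a lot, but slow" memory, in bytes per second
FAST_MEMORY_BW = 45e12      # "stores little, but fast" memory, assumed 15x faster

slow_rate = decode_tokens_per_second(MODEL_BYTES, SLOW_MEMORY_BW)
fast_rate = decode_tokens_per_second(MODEL_BYTES, FAST_MEMORY_BW)
speedup = fast_rate / slow_rate

print(f"slow memory: {slow_rate:.1f} tokens/s")
print(f"fast memory: {fast_rate:.1f} tokens/s")
print(f"speedup:     {speedup:.1f}x")
```

Under this simplified model the speedup equals the bandwidth ratio, which is why the choice of memory type matters so much for inference. Real systems also depend on batching, compute throughput, and interconnect, none of which this sketch models.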
Wait, can you explain how you actually managed to do this? Because I know there have been previous attempts to do wafer scale. And I seem to remember there was even an early attempt in the 1980s or something to do it. How were you able to pull this off?
Yeah, it was an ambitious undertaking. That's for sure. Every previous effort in the 75-year history of our industry had failed, including Gene Amdahl, who's sort of on the Mount Rushmore of compute in our industry. He failed sort of spectacularly in the mid-80s at a company called Trilogy. Not only that, but after we succeeded, people who had visited us, who'd been in our labs, tried to copy us and they also failed. And so what we were able to do is solve a set of really fundamental problems. And those problems cut across a wide swath of technology. They cut across lithography. So we had to collaborate closely with TSMC and they turned out to be a great partner. We had to make inventions in material and packaging. That's how you put a processor, how you put a piece of silicon on a motherboard, and deliver power and I/O to it. We had to make inventions in power delivery. When you build a giant chip, you're going to deliver way more power to it than if you do a chip the size of a postage stamp. We had to invent ways to cool it. We had to write new types of software that ran on it. All of these had never been done before. And it was a decade-long process. It took us five years and about $500 million to deliver the first one. And it's been an extraordinary run since. In December, we signed a deal with OpenAI north of $20 billion, one of the largest contracts ever signed in Silicon Valley. And then in March, we signed a deal with AWS, where they would deploy our systems in their AWS data centers. And so it's just been an extraordinary run, but it took a long time. It took extraordinary engineering. And there were certainly long periods of time when it wasn't clear we were going to make this work.
Obviously, you've hit this remarkable milestone. You have, in fact, IPO'd and so forth. And right now, the market's valuing your company at $64 billion, early days of the IPO. Just for the listener to understand, the chips are solely for inference as opposed to training? When we think about AI, I think about, okay, there's training, training the model, and then answer giving. That's the inference. Are the chips just for inference?
So a couple of things. I think you framed it exactly right. Training is how we make AI. And inference is how we use AI. And so what happened was that in sort of 2025, in the first part of 2025, the models we made were smart enough to be useful. And there was an explosion of use. And we use AI by doing inference. So there was this sort of tidal wave of demand on inference. And that has continued in 2026. And we think it will continue for years and years to come. And so that's what had happened. In 2015, when we began thinking about the company, we knew that AI was on the horizon and that it would eat a huge amount of compute. And we made sort of two fundamental bets. We bet that it would need dedicated silicon. Right, graphics had needed dedicated silicon. That's how you got the graphics processing unit. Mobile compute had needed dedicated compute. That's where you got ARM processors. We made that bet. And we made a bet that modifying the GPU architecture wouldn't be right. You needed to start with a clean sheet of paper. And so what we started with was a new vision. And that vision could do training. And it could do inference. And it was orders of magnitude faster at both. But right now, what we're seeing is such an explosion in demand for inference that a lot of the business is inference, even though we're just as much faster than GPUs on training.
That's interesting. Maybe we'll get more to the theoretical training market a little later.
Just real quick, on inference, Ben Thompson, who writes a newsletter about tech, wrote a piece in which he distinguishes between answer inference and agentic inference. So answer inference is like, you know, format my resume or whatever, or write me an essay on X or Y, or answer some questions. And then agentic inference is like, okay, here's this thing that's going to go around and do services for you, not producing visual answers. Do you distinguish between those two? Is that a real divide in your view? And can your chips do both?
Our chips can do both. I think it is a divide. Okay. I think speed matters equally in both. Okay. I think if you are engaged with the AI, if you're writing code, which is agentic, or you're doing work, nobody wants to wait. I mean, we could just turn the question around and say, well, how big is the market for slow search? Zero. How big is the market for dial-up internet? Zero. Why is that? Because nobody wants to wait. Right? So if you're engaged with the AI, speed is of the essence. But if the AI is doing agentic work, and your competitor gets three times, five times, 10 times as much work done in 20 minutes as you do, you're going to get smoked. And so this notion that's somehow been proposed, that speed isn't very important in agentic flows, is dead wrong. Speed is important in all aspects of productive work. And your ability to get more done in less time is a fundamental advantage that accrues over time. Right? If while your competitor is doing one unit of work, you can do three, and in the next time they do one unit of work, you do six. Right? This adds up over time and you beat them in any line of work. And so speed, which is sort of our specialty, is important across the board.
What do giant wafers and speed in general actually mean for, I guess, the economics of tokens?
Because one way I think about it, I have this sort of vision in my head, like, okay, if I'm out shopping for toothpaste, I know I need toothpaste every once in a while and I go into like a CVS store, I get one thing of toothpaste and then maybe a week later I get some more toothpaste, or I could go to Costco and buy a giant thing of toothpaste and take it home, probably at a cheaper cost. And that's sort of how I think of the giant wafers. Maybe it's a bad analogy. But what does speed actually mean for the cost of tokens?
Well, I think there are a couple of observations. I think people have chosen so far to price speed a little higher. For example, Anthropic offered a premium service in which they offered tokens twice as fast and charged six times as much. And they sold it out and they couldn't meet the demand. Now, just to give you an idea, we're 15 times faster; they're twice as fast. And so people value speed because it allows them to do more work and they value their time. And when you can do more work in less time, you are making people more productive. That's why people have chosen to price them at a premium. They don't cost more to make. In fact, the GPU architecture is an extremely good architecture and extremely efficient at building very slow tokens. And if you don't mind slow, the cost per token on a GPU is extremely low. But the GPU has a characteristic that as you try and go faster, the cost and the power used per token increase. Sort of like as you go faster in your car, your miles per gallon decrease. So what happens is as you try and get fast enough to be useful, fast enough to be interesting, fast enough to keep users' intelligence focused on this product, they become extremely expensive and extremely power hungry. And so the question is not just what people are paying for a token, what people are choosing to price them at, but what they actually cost to make. And GPUs make very slow tokens very cheaply.
And they're unbelievably expensive at fast tokens. We make fast tokens vastly less expensive than the GPUs and we use a tiny fraction of the power.
Let's say we stipulate that this is all true and everyone wants the fastest and everyone's like, you know what, this is the solution. That the Cerebras technology, one big chip, this is really where it's at. How much of your market share for the inference market, when you look out next year, the year after, et cetera, how much is your market share going to be dictated by your ability to get capacity at TSMC fabs? How much is that a gating mechanism for growth?
TSMC is a huge part of the supply chain. But we have some real advantages. There are three areas right now that are limiting vendors in building AI compute. Number one is HBM memory.
It's this memory we described earlier that can store a lot, but it's really slow. That's made by three companies approximately, Samsung, SK Hynix and Micron. And it's under unbelievable supply pressure. It's extremely difficult to get. There are very long lead times. It's unbelievably expensive right now. We don't use it. The second part that's limiting is a process inside of TSMC called CoWoS. And this is the process that NVIDIA and other GPU vendors use. We don't use it. The third thing is that at TSMC, the factory that is under most pressure is their three nanometer factory. We don't use it. We use five nanometer. So we have managed to avoid some of the most binding supply constraints. Now, TSMC still has to give us a meaningful allocation. And they've been an extraordinary partner from the get-go. And they are the greatest manufacturing company on earth by far. A fab is sort of a modern pyramid. It's an unbelievable thing. And I highly recommend you or any of your listeners, if you get a chance to go to Taipei, go and see them. They are just extraordinary.
Can you do fab tours?
You can. You can go and they have a museum of innovation. And it is an extraordinary thing. They are sort of the national champion of Taiwan. But I think today, TSMC has given us as many wafers as we've needed. Business today is constrained by data centers. And that's the grand irony. You invent technology that has been unbuildable, never been invented for 75 years in the history of compute. You write software that is extraordinary. You build a product that is vastly faster than the incumbent. And what are we all constrained by? Buildings. Data centers right now are everybody's constraint in the entire industry. Powered buildings. So real estate. It is an amazing thing right now. And that is true sort of across the board. And that will not change for the next 15 or 18 months for sure.
I mean, since we're talking physical constraints, I guess I should ask you, we did an episode about helium recently, the helium shortage given the situation in the Strait of Hormuz. And one of the things that helium is used for is lithography on semiconductor chips. Has that affected you at all? Is that something that you're monitoring?
We monitor, but there's not a lot we can do. And there's plenty of stuff to worry about that we can't affect. We obviously are in communication every day with TSMC. We're in communication with our entire supply chain every single day. And we stay abreast of the various issues. But it has had no impact on us. And we put that in the bucket of things that our manufacturing partners worry about also and that we can't help.
So in addition to manufacturing these chips, you actually, I didn't realize this, you have your own cloud.
We do.
And you have your own cloud services.
We do.
I have a bunch of questions about that. But you have your own cloud services through which a user can actually get access to various open-source models and so forth. Visually, it looks a lot like the OpenRouter interface, roughly the same environment, except it's all open source. Something I'm curious about, and maybe you could speak to this: in traditional software, one nice thing about open source is you don't have to pay for it. So it's free. It's a little bit different here, because even if the model is free, you still have to pay for the depreciation of the chips and you have to pay for the electricity to run them. So there's no real such thing as free open-source AI software.
But what I am curious about, in your experience as a cloud vendor: are the open-source models cheaper on a per-unit-of-intelligence basis? If we had some way of saying levelized cost of intelligence, which I don't know if the industry has yet, are open-source models cheaper per IQ point, however we want to measure intelligence?
Yes, by a lot.
Really?
Yeah, I think in the closed-source world, you're paying a lot for that extra little bit of intelligence, right? There are no open-source models that are as good as the closed-source models. Okay. Think of it as three, four, five percent different. Okay. Something in that range, and it could be a little more, it could be a little less. But look at the cost to you of using them. You can jump up right now and run Kimi K2. It's a one-trillion-parameter model. It's an open-source model on Cerebras, where we're 10 or 15 times faster than others. And what you're paying for is the cost of our power and some cost of the compute that it took to calculate it. What you're not paying for is the cost to train it. Right. And that's a battle that is underway in the market. You have OpenAI with their coding software. You have Anthropic with their coding software. And you've got companies like Cursor and Cognition that are using open source. We power OpenAI and we power Cognition. You have a battle underway between closed source and open source. And I think that the winner of that battle is yet to be determined. What is clear is that the closed source is strictly better by a little bit, by how much varies, and it's more expensive.
Yeah. I think we've talked about this before, but I've heard of a lot of big companies in the U.S. who have been very quietly shifting from some of the closed-source models to the open-source models, like the Chinese ones, like Kimi, is that what it's called?
Kimi and Qwen.
I'm sorry to press you on this point, but if you had to make a bet, in 20 years is the dominant AI model going to be a cheap open-source thing or a more expensive, incrementally better closed-source model?
I don't think there's going to be one. Right. There's not one SaaS software. Right. There's some big dogs. Right. There's Salesforce. There's some other sort of giant players and there are lots of other specialists. I can't think of many markets where we sort of settled onto one player. Right. If you look at the semiconductor market, you've got x86, where you've got two major players in AMD and Intel. And then you've got a whole adjacent market owned by ARM and the companies that build ARM parts. And then you've got custom silicon around that. I think that's the way you're going to have this. OpenAI is going to continue to do extraordinary things. There will be competitors to them and there will be open source. I don't think any of those go away.
Since we're on the topic of software, one of the things you often hear when talking about new chip entrants going up against Nvidia is this idea that, well, Nvidia chips, they're great and all, but the real moat of Nvidia's business is CUDA. Right. It's the software stack that goes with it. What's your take on that? Is that a realistic concern for someone who's trying to go up against a company as big and, I guess, as embedded in the software system as Nvidia currently is?
Nvidia is probably the greatest company in the first part of this century. Jensen's one of the great CEOs of our era, along with Hock Tan at Broadcom and maybe Lisa at AMD, just extraordinary. And CUDA was really important in the creating of the AI landscape, but it's not important now. And it has no role whatsoever in inference. If you want to move from running a model on GPUs today to running it on us, we can move it in 10 keystrokes. Just point to our API. So that's the first part.
The second part is that a year ago, every major frontier lab model had been built on a CUDA foundation. And today, two of three haven't. So they lost 70% market share. There are three leading frontier models, Gemini, Claude, and GPT. Gemini, built by Google on TPUs, trained on TPUs, served on TPUs, no CUDA. Anthropic's models, trained on Trainium, no CUDA, served on TPUs, on Trainium, and on GPUs. And OpenAI's GPT, trained on GPUs in the CUDA environment. So two of the three leading models today use no CUDA. That's a hemorrhaging of share. And so I think what was true three or five years ago, in which CUDA had a dominant position and was central, has shrunk significantly. It's not important at all in inference and shrinking in its role in training.
Since we're talking about the economics of inference and all this stuff, I would love to get your take. Literally in the last couple of weeks, there's been this flurry of announcements of these attempts to financialize the market for compute. And so it's like, oh, you're going to buy some capacity, the H100 benchmark, et cetera, and people are maybe theoretically hedging it. I'm not entirely convinced. It still seems to me, like, maybe, but on the other hand, an inference provider can lock in a very long-term relationship bilaterally with a data center and so forth, with no need for these spot hedging markets. Do you think the market is going to evolve in such a way that there will be significant demand for financial instruments that allow inference providers to hedge their price exposure?
I don't know. I'm not a financial engineer. That's the first thing. But we can look a little bit at history. The guys at CoreWeave were enormously innovative in how to fund some of their massive deployments. They were some of the first to use a debt instrument that had a backstop with the GPU. And this enabled them to really leap out and have first-mover advantage in the neocloud space.
And that was an innovation in financial engineering and extremely creative. Others followed, and now there's a big and active debt market in funding the building and the fit-out of data centers. When you have a market that is that big and that active, you have people who want to make bets on either side. And I think over time, those bets normalize and regularize, and you can wrap them up and you can make it easy to make the bet. When sort of Co2 was one of the first to loan money against GPUs for CoreWeave, this was really innovative. And not only does CoreWeave get credit for the creating of the instrument, but so does the other side of the deal for doing it and making a successful, innovative bet. And as sort of more and more people jumped in, these could be regularized, they could be more easily priced, and then once it's regularized and you have a market, then derivatives of that market are easy to make, historically. And that's sort of the way I see this unfolding: that as this market for data centers and compute matures, there'll be people making bets on either side and financial instruments will be created to do it. Whether it's a good idea or not, I have no opinion at this point.
Since we brought up finance, I was looking through the IPO filing and looking at some of the actual numbers in there. And I know you have the OpenAI deal now, but a huge chunk of your revenue comes from this company called G42 in Abu Dhabi. And I think they're both, like, your biggest customer and also a major investor. What does G42 actually do with all these chips?
Sure. Last year, they were a really important chunk of our business, a lot of it. They're a minor, a minority investor. They are the national champion, the national AI champion of the UAE. And they built a cloud that is used across the UAE's ecosystem. So it's used by leading universities there. It's used by leading companies there, companies like ADNOC. They're a leading oil company.
It's used by G42's nine operating companies. The deployments to date have been in the US. We have massive data centers that run equipment for G42 here in Santa Clara, but also in Minneapolis, in Dallas, Texas, soon in Toronto. And so they're doing training and they're doing inference. The training they're doing, they have pioneered some of the leading English-Arabic models. They've done genomic work. They are doing serving of models and they're operating as a cloud, particularly for the UAE ecosystem, but also for global companies.
Do you think that over time, corporate users and perhaps individual users, but corporate users, will want inference served from a company that's separate from the model maker, such that they can be certain that they are not revealing and thus training the company that might replace them? I mean, look, Anthropic every couple of days announces some new thing. Oh, we have a new markdown file that could do this for tax or that could do this for whatever, and then a bunch of companies fall.
Like, are companies that use AI increasingly going to want to use data centers and inference providers that aren't the model makers themselves?
Well, first, I think there is a type of professional, a type of job that is most directly under threat from AI. Okay. And they're almost always white collar and they require you to have expertise over a body of knowledge. That's what an accountant is. You have expertise over a body of knowledge of rulings, of previous examples, of tax case law, et cetera. That's exactly what AI is good at right now. Exactly. So lawyers, accountants, they're sort of these professionals who have stood between sort of the ordinary person who doesn't know anything about IRS tax rules and the tax rules. That is under threat. And that is something that it will be very easy for companies like OpenAI and Anthropic to chew through. There are other areas like, say, drug design, genetics, genomics, where companies like GlaxoSmithKline have remarkable and unique datasets. This is true for one of our large customers, Mayo Clinic. It's true for GlaxoSmithKline and other of our pharma customers. They have unique data and they will be able to find insight in that data and they will be able to get value from that data. And they will certainly not want to share that data with the foundation model makers unless they are guaranteed that it will not sort of make the general model smarter. And these are companies that have spent 20 or 30 years spending tens of billions of dollars a year gathering data, patient care records or test results for drug design. They're going to mine the insight in this work and they're going to find extraordinary things. And those are much more protected because the insight is in the data and they have the data.
You know, you were talking about fabs in Taiwan earlier, and I'm now regretting not going on a fab tour when I was in Taipei, but it just didn't cross my mind at that time. Next time. Next time. Yeah, hopefully.
And there have been various efforts under the CHIPS Act and some other industrial policies to try to build more chip-making capacity in the US. In your view, what's the big, I guess, impediment to actually doing it? Yeah, A, is it happening, and then B, why does it seem so difficult to actually make happen?
Right, the first thing is it's difficult because it's a difficult problem. They're hard. They cost 30 or 40 billion dollars and take five or six years to build. So that amount of money in that amount of time cuts across administrations. Right, and that's a problem with the politics in the US: it's hard to make policy that's durable across administrations and across time. It's the first thing. The second thing is these are remarkably complicated buildings, and we have a sort of a hodgepodge, a sort of strange latticework of local and regional building codes that a fab maker has to negotiate. Third is we're trying. TSMC has dedicated tens of billions of dollars to their fabs in Arizona and has committed hundreds of billions more. Samsung has dedicated tens of billions of dollars and committed hundreds of billions more to their fabs in Texas, but they take a long time, and we have to remain committed to building not just the fab, but the surrounding ecosystem, not just for three or five years, but for 20 years or 25 years, because you want not just one fab, but you want a whole trajectory of fabs. You want them working at today's cutting edge, but tomorrow's and next year's and in 10 years' cutting edge as well. And those are things that have proven really challenging in the US. And I think we need it. They're strategic assets, and I think we need to find ways to collaborate with those that have the expertise and to find ways to build policy that is durable over a length of time, that can build a vibrant ecosystem in the fab and the associated elements.
The other big political economy theme, I guess, when it comes to semiconductors, is this idea that they are in fact a strategically important technology and so the US should place some limitations on their use abroad. And so we've seen things like export controls, export restrictions. You're an actual chip company, and so I'm very curious, at an operating level, what your experience of these kinds of export controls has actually been. Like, how much time does that take up for you? And then also, given that one of your biggest customers is an international firm in Abu Dhabi, how important is the trajectory of those export controls to your future business?
I think three or four years ago I would have said not important at all. I think today they're really important. In the last administration I got to know the leadership in the Department of Commerce and in the BIS division of Commerce, which oversees the licensing. I think this is an extraordinarily difficult job, and we saw really hardworking, smart people doing a job that is very, very difficult. I got to know the people in this administration and I found the same. Every single one of them is earning a tiny fraction of what they could earn in the private sector and is doing this because they believe that this is an important mission. The problem is that there are differing views about the right way to do this, and there are differing views on the right way to achieve the goal, which is to not give your most precious technology to your industrial enemy. And I think we can agree that in today's environment China is an industrial enemy. Good, well-meaning people can disagree on whether the right strategy is to limit them from gaining access. Others argue, as those at Nvidia have argued, that the right strategy is to give them access and to keep them working on our product, on US-made, on US sort of designed product. I come down on the other side of that argument.
I understand there are good arguments in both directions. I think limiting the distribution, the diffusion of our most precious technologies makes sense. And I think we have to do it thoughtfully, and we have to recognize that means some markets will be foreclosed to us, and I'm okay with that.
Just quickly on the sort of current business stuff, you mentioned the deal with AWS. How does that work? Can customers of AWS right now pay them to have inference served specifically on one of your chips?
Not yet, but soon. Okay. It will be served in Bedrock, which is their AI-as-a-service offering. And they will, yes, be able to go down the click-down menu and get superfast inference, which will be delivered via what's called a disaggregated solution, which is using some Trainium for some of the inference work and using the Cerebras technology in our systems, called the CS-3, for other parts of the work.
And presumably someone who scrolls down and selects that, they would pay some premium for that ultra-fast inference.
I think they will pay a premium. We will see. This is entirely as Amazon wishes to price it; it's their product.
So you IPOed this week, it's May 2026. This is not the first time that you've tried to or looked towards going to the IPO market. There were headlines going back to 2024 about wanting to try for the IPO market. And then there were headlines last year, especially because of the relationship with G42, about CFIUS and some of the national security concerns. And maybe that was an issue with the IPO. And then, but also last September, you got, it looks like, your G round, and one of the participants in the G round was 1789 Capital, which is, of course, the firm that's associated with Donald Trump Jr., which is a lot of things. And then the IPO happened. I'm a cynic.
So I wonder if the participation, if Donald Trump Jr.'s investment in your company, made it easier to get the green light on these national security concerns to do an IPO.
I wish it were that easy. No, it had no role at all. We resolved all CFIUS issues in March of 2025. I believe that was before we took money from 1789. Okay. Moreover, I wouldn't ask for that. That's not who I am. And that's not the way we roll. So we took money because they are a thoughtful venture firm. And we don't believe that there's only one political point of view. There are lots of political views. They all have some merit. They all have some weaknesses. And so we have some right-leaning investors, we have left-leaning. The fact that this firm had some right-leaning, sort of, investors, we were looking only at their ability to help us build an extraordinary company. And we have never asked, nor will we ever ask, for political access or anything of the kind.
What's it like to become a billionaire in a single day? This is something I assume will never happen to me. So I might as well ask you.
No, I think the honest truth is it was a big nothing for me. I had some wealth before and have some wealth after. I think this is a very difficult way to make money, right? Being a tech CEO. I think what you have to do is you have to love the work. You have to love the people. And you have to think every day about how to make your team rich. And far more important than sort of some change in my wealth was we made more than 800 millionaires. And that's something I'm proud of every minute of every day. And at my last company, we made 100 millionaires. And at this company, through our IPO, we made more than 800. And that's something that makes you wake up feeling good about yourself every single day.
That was going to be my last question.
But actually, you just reminded me in that answer, you know, the idea that getting here, I said you became a billionaire in a day, but obviously, this was the outcome of years and years and years of work. And if we think about technological hardware, one of the things most people associate it with is really long lead times and really big research and development budgets. Now that you're a public company, how do you sort of balance that quarter-to-quarter financial performance pressure with the idea that you still need to be investing in capex, in new ways of designing chips, new improvements to the existing ones?
First, we think the opportunity for innovation based on our wafer-scale engine, the best work is still ahead of us. Number one, we see an opportunity for extraordinary innovation in the years ahead to make leaps every bit as big and often bigger than what we made by building the largest chip on earth. When you love building hardware, the fact that it takes time is part of the deal, right? What we do can't be done in a week or a month or a year. And that's what you sign up for. And that's true in every profession. You sign up for the good and the challenging. And you have to sort of make peace with that. If you're a person that wants to dive in and sort of begin iterating right away and fail quickly and code up something and look at it and throw it out in the market and see if it wins, Godspeed, that's great. And that's not for me. In our business, we measure twice before we cut once. And you have to put that in your soul and you have to like it. You have to like that mistakes in our business are really expensive. And you have to like the fact that you breathe life into a chunk of silicon. And you get it to do things that nobody else has ever been able to make a chunk of silicon do. And if that's for you, then this process that takes time and money, you love that too. And so I think I would love it less if you could do it in a week.
And I think the people that I love to work with, they feel the same way. And they like being engineers, not because it's a path to money. They like being engineers because they like building things. And they like building hard things. And I like working with them for exactly that reason.
Yeah, you mentioned breathing life into a chunk of silicon. My dad, who's a physicist, always likes to point out how carbon and silicon are right next to each other on the periodic table. They are. And they're sort of like, here are the two things that we have closest to life and they're literally touching each other. Maybe there's something deep in that.
I think that's a really thoughtful thing your father said. Thank you. And I think that's really cool. And nobody pointed that out to me, though we've stared at periodic tables for a long time. But I think to the extent we can make artificial life, we need silicon. Yeah, and they're right next to each other. Right. Carbon is the heart of all other life. And artificial life will be founded, at least the intelligent part will be founded, on silicon. Right below silicon is germanium. Maybe the next. I don't know. What does that mean, Joe? Ask your dad. Yeah, let's keep an eye on germanium next.
Andrew, thank you so much for coming on Odd Lots. Fascinating conversation, right in the sweet spot of what we're interested in. Really appreciate you taking your time.
Hey, thank you guys for having me. And I really appreciate it. Look forward to seeing you again.
That was really fun. I'm super interested in this topic. And it does feel to me like the economics of inference in particular and the market for inference, inference capacity, speed, like, it's still day one. You know what I'm saying? Yeah. I just like looking at the giant. It's so cool. It really does seem like an Onion thing, doesn't it? It's like, company solves inference with a giant chip. By building the biggest chip in the world. But it is interesting.
We did that episode, of course, with Ray Wang from SemiAnalysis, talking about the role of memory as being this really important part of these sort of cutting-edge chipsets. And it's interesting to think, it's like, okay, well, here is a bottleneck that they don't run into. And the idea that, at least as he described it, they're not fighting to get the smallest-nanometer chips. And so maybe that gives them a little bit of breathing room on capacity there too.
Yeah. I mean, I do imagine there are some downsides to having giant chips, just as there are upsides that Andrew laid out. The other thing I was wondering, I know he made the case for the reason speed is very important, but I can also imagine a world where maybe it's not that important. I think at some point, like, the incremental speed factor just starts to become less important when weighed against the incremental cost of generating that extra speed.
Yeah. I think this is one of those things where it probably really depends on what you're using it for. So it's like, if you're like, you know what, I'm really curious why pterodactyls aren't actually dinosaurs. Can you explain it to me? Then it's like, I don't care about that. Like, that fraction of a second is not that important. I would wait five minutes for the chatbot to tell you you're wrong, Joe. You just don't really care that much. But if you're doing some sort of, like, agentic coding thing, or whatever, etc., then, like, yeah, that definitely adds up. And I will say, as you use it more, it's just like everything else, the treadmill of expectations. Here's some task that you can do in 30 seconds, which maybe several years ago would have taken you 30 minutes. And you get impatient in that 30 seconds, and you want it in 10 seconds. And that's just like that competition to shave down seconds. I think it's always going to be there. So no one ever gets satisfied is my point.
It always eventually becomes like, it feels like waiting.
But to me, this feels like this is the crux of the AI valuation argument, which is like, how much of a premium are we going to place on a model, maybe a closed-source model, that is maybe slightly better than an open-source model? How much premium are we going to place on compute that is slightly faster than this other type of compute, like other use of compute? Like that, to me, is an unanswered question. And Andrew was pretty upfront about closed versus open source. But I think on the speed question too, like, we're going to find out.
We're going to find out. And, you know, I think one of the things that is going to happen, and there have been all these stories about sort of like token shock, like how much companies are spending on tokens. My guess is one of the things that will happen at some point is there's going to be a lot more discussion about why are we using this ultra-premium model when we could have done this? Like, there is a lot of just, like, throw it at the AI, rack up those bills, etc. And at some point there's going to be this, like, okay, what really needs to be served fast? What really needs to be served on the most premium closed-source models? And companies are probably going to get a lot more skilled at allocating across, you know, different forms of inference, depending on the need.
Yeah, I think that's exactly it. And at that point, like, we could well see some of the dynamics in the market start to change in terms of valuation. Shall we leave it there?
Let's leave it there.
This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me at @tracyalloway.
And I'm Joe Weisenthal. You can follow me at @TheStalwart. Follow our producers, Carmen Rodriguez at @CarmenArmen, Dashiell Bennett at @Dashbot, Kail Brooks at @KailBrooks, and Kevin Lozano at Kevin Lloyd Lozano.
And for more Odd Lots content, go to Bloomberg.com/oddlots, where we have a daily newsletter and all of our episodes. And you can chat about all of these topics 24/7 in our Discord, discord.gg/oddlots.
And if you enjoy Odd Lots, if you like it when we talk about giant wafers, then please leave us a positive review on your favorite podcast platform. And remember, if you're a Bloomberg subscriber, you can listen to all of our episodes absolutely ad-free. All you need to do is find the Bloomberg channel on Apple Podcasts and follow the instructions there. Thanks for listening.