Ex-Citadel Quant and AI Researcher On Breaking In, Tech vs Finance Careers
58 min
• Jan 26, 2026
Summary
Nimit Sohani, a Stanford PhD and former Citadel quant, discusses his transition from quantitative finance to AI research at Cartesia. He contrasts career paths in finance versus tech, explores the critical skill of problem selection in research, and explains cutting-edge work in voice AI and state space models versus transformers.
Insights
- Problem selection is the most valuable research skill, requiring deep literature knowledge and pattern recognition—90% of research success depends on finding the right problems, not just executing them
- Quantitative finance offers better work-life balance than AI research due to trading hour constraints, despite its reputation for intensity, while AI's competitiveness drives longer hours
- PhD value is modality-dependent: essential for fundamental research and industry research roles, but unnecessary for engineering-heavy AI work; it primarily opens doors and develops research taste
- AI startups with research arms can outmaneuver big labs by challenging orthodoxy and being more exploratory, while maintaining product-market fit to avoid becoming obsolete wrapper companies
- Hybrid architectures combining state space models and transformers represent the cutting edge for text, while SSMs alone excel for audio due to data redundancy and fixed-size state advantages
Trends
- Voice AI emerging as fastest-growing AI segment with applications in call centers, entertainment, and personal assistants
- Shift from pure transformer dominance toward hybrid architectures that combine SSMs and transformers for efficiency and quality
- Finance firms increasingly establishing AI research arms and deep learning capabilities, moving beyond traditional quant strategies
- Startup-versus-big-lab dynamics: startups gaining advantage through research-product integration and willingness to challenge accepted ideas
- Compression and efficiency becoming critical for inference at scale, particularly for long-context and multi-turn applications
- Career convergence: technical professionals increasingly pivoting from SWE to AI research roles, especially at startups with flexible structures
- Secrecy culture in finance contrasting sharply with open-source and publication culture in AI, creating different incentive structures
- End-to-end model training replacing modular pipelines (speech-to-text + LLM + text-to-speech) for improved latency and naturalness
Topics
- Quantitative Finance Careers and Compensation
- AI Research Problem Selection and Research Taste
- PhD Value in Tech and Finance Careers
- State Space Models vs Transformers Architecture
- Voice AI and Text-to-Speech Technology
- Work-Life Balance: Finance vs AI Research
- Career Transitions from SWE to AI Research
- Hybrid Neural Network Architectures
- Long Context and Sequence Modeling
- AI Startup vs Big Lab Research Strategy
- Non-Compete Agreements and Garden Leave
- Quant Firm Culture and Secrecy
- Inference Optimization and Latency
- Multimodality in Deep Learning
- Tokenization and Byte-Level Modeling
Companies
Citadel
Guest's former employer as quantitative researcher; discussed quant finance culture, compensation, and work-life balance
Cartesia
Guest's current employer; voice AI company building text-to-speech, speech-to-text, and voice agents with research focus
Eleven Labs
Primary competitor to Cartesia in voice AI space; had an 18-month head start, but Cartesia differentiates on latency, cost, and end-to-end modeling
Jane Street
Quant firm mentioned as example of different organizational structure with fewer quants and more technical traders
Renaissance Technologies
Described as gold standard quant firm with insane historical returns over 20-30 year period
Jump Trading
Top-tier quant firm mentioned as well-regarded with great technical talent and returns
Hudson River Trading
Elite quant firm mentioned among top firms with excellent returns and technical talent
XTX Markets
Newer elite quant firm with very strong returns, similar to Renaissance Technologies model
Radix Trading
Newer elite quant firm mentioned as having excellent returns despite being less well-known
TGS
Elite smaller quant firm in Southern California with excellent returns and secretive operations
OpenAI
Mentioned as example of big AI lab where guest could have worked instead of joining startup
DeepMind
Mentioned as example of big AI lab where guest could have worked instead of joining startup
Anthropic
Mentioned as example of big AI lab where guest could have worked instead of joining startup
Stanford University
Guest's PhD institution; source of co-founders and research network for Cartesia
Citadel Securities
Specific division where guest interned and worked as quantitative researcher
NVIDIA
Mentioned as having published hybrid SSM-transformer models representing cutting edge
People
Nimit Sohani
Stanford PhD, former Citadel quant, current AI researcher at Cartesia; main guest discussing career transitions
Albert Gu
Cartesia co-founder, Stanford PhD lab mate, creator of Mamba state space models; challenged transformer orthodoxy
Chris Ré
Stanford professor whose lab Nimit and Cartesia co-founders were part of during PhD
Quotes
"90% of the battle in research is actually finding the right problems."
Nimit Sohani•Early in episode
"Unlike in tech, one thing I was surprised by, culturally is how tight-lipped people are in finance, even within a firm."
Nimit Sohani•Mid-episode
"I think quant actually probably has a better work-life balance than AI, particularly because the level of competition in AI right now."
Nimit Sohani•Mid-episode
"The main challenge of Transformers is that the memory that they use at inference time grows linearly with the sequence length."
Nimit Sohani•Technical discussion section
"I think being in intersection is actually quite valuable because having a product that customers use is something that can drive the research."
Nimit Sohani•Late episode
Full Transcript
90% of the battle in research is actually finding the right problems. This is Nimit Sohani, Stanford PhD, AI researcher, and previously a quant at Citadel, and I asked him about both AI research and quant careers. Do you have any advice for someone who wants to move into AI research? If you want to sort of switch from a SWE track to AI track, there's got to be something behind it, right? We compared and contrasted these roles, which had some surprising insights. Yeah, I think, you know, quant actually probably has a better work-life balance than AI. Unlike in tech, you know, one thing I was surprised by, you know, culturally is how tight-lipped people are in finance, even within a firm. We also went deep into what he's actually working on now. The main challenge of Transformers is that... Here's the full episode. when you think about the opportunities that are not available to you without a phd what comes to mind yeah so i think you know there's not really too many opportunities that are you know actually unavailable to to people without a phd but some of them just get a lot easier with phds i mean so academia is an obvious one that does require a PhD. That was never something I was super interested in for a lot of reasons. But I think some roles that are definitely a PhD opens a lot of doors to are kind of the two that I've had experience with. One is like, you know, sort of doing industry research in AI like I'm doing now, or, you know, back in the day, there were, you know a few different uh like industry research and computer science or mathematics was a little bit more diverse but more and more people are converging towards ai so i'll just say like ai research is is one of them where having a phd helps or um not that you you know a lot of people do ai research without a phd but you know the type and shape of the role can look kind of different and another one is quantitative finance so again a lot of people go into quant um you know out of undergrad, but certainly having a PhD opens you up to some sort of different opportunities, and it can be a lot easier to get your foot in the door there. So if we think concretely, let's say I was going for an AI researcher role or something like that, are you saying the PhD helps you in that first step of filtering, or does it help somewhere else in the process in getting one of those roles? Yeah, so I think it's both. So certainly it's a lot easier to get an interview if you've differentiated yourself from the pack in some way. Just applying for AI research role at a top firm can be difficult if you don't have whatever the right schools, quote unquote, on your resume, the right internships, the right connections, whatever. But it's certainly doable. And then I guess like I think like more less transactionally having a or like doing the PhD can like develop like a key critical skill set that can help you as as along your path towards becoming a great AI researcher. But of course, you know, there is an argument for being thrown into the fire as well and just like kind of learning on the job. And that's certainly an option that works for many people. I think, you know, there are some things that are harder to do in industry than in academia, like kind of the more exploratory first principles, like fundamental research without necessarily like a direct application. You know, in an industry, definitely skews a lot heavier towards the applied side of things. 
But I think like having that fundamental background can be very valuable depending on what kind of research you're targeting. You mentioned the type and the shape of the role could be different if you had a PhD versus not. Could you give an example of what you mean? If you're working on more like engineering heavy stuff in AI, so, you know, building, you know, training or evaluation infrastructure, you know, working on like, you know, data processing, things like that. Those are not things like a PhD is really necessary for at all. I'd say if you want to do more like uh sort of pie in the sky type stuff like uh you know architecture design things like that um a phd can be you know can be helpful there because um you know you have more time to kind of explore directions that may not pay off in the short term um but you know again there are examples of people being successful in um you know without a phd with or without a phd in both domains. So I think like if your only goal is to be an AI researcher and you're not super, you know, tied to the, you know, the particular type of work you do, you just really want to get into the field. A PhD is definitely not necessary. But I think like if you're still in like the sort of exploration phase of your career and you want to find like a, you know, problem that really, draws your interest, then a PhD can be a good way to do that. You mentioned the PhD skill set or something that you kind of develop when you get a PhD. What is that skill set? I would say like 90% of the battle in research is actually finding the right problems. So you have to find a problem that is interesting, it's meaningful, that people are actually going to care if you solve it. You have to sometimes convince people it's interesting because they might not have thought about it the same way. And then you have to execute. And you need to make sure it's a problem that's appropriately scoped that is actually tractable for you to make progress on. So I think all of those things were not skills that I had developed. It was more just like execution was my strength. And so definitely that was a big learning process for me during the PhD is like that sort of like research taste and problem selection. And this is something that, you know, just like being immersed in the field, you know, really helps with, you know, once you've read enough papers, you know, talk to enough people, you kind of get a sense of the patterns and trends that are going on in the field. You mentioned the research taste and finding the right problems. If you could kind of condense what you learned in your PhD, is there maybe some top tips that kind of lead to you finding the right problems? I would say the main things that I find useful are just, you know, keeping abreast of the current literature, just reading, you know, as many papers as you can. It doesn't have to be reading them like end to end, just like skimming abstracts, you know, seeing what's going on, what are people thinking about. And, yeah, I think the other thing is just working your way up. So, you know, initially earlier in your career, you want to attack like, you know, very small sub problems that are, you know, you're reasonably likely to make progress on. Right. So one example, you know, example of this can be like you take a, you know, method and you try to extend it to some like, you know, special case or some something like slightly different from the original application. 
And then as you go on and as you mature as a researcher, you can start tackling like bigger and bigger problems. So, you know, not just like kind of extending previous work, but maybe coming up with like totally new ideas, things like that. So I think there is, you know, a gradual stage of maturation as a researcher. And I think, you know, some people do try to skip those steps. And I think that's generally, you know, inadvisable, I think. You mentioned keeping abreast of the literature. What's the go-to spot for you to kind of get your feed of hot research papers to read? Honestly, Twitter is probably the main way that I keep up with papers. I think if you follow enough good people on X, I guess your feed becomes like pretty curated to that, so that's usually the first way I find out about stuff. Obviously, you know, just like talking to people, you know, co-workers and so on, but yeah, I think X is my go-to. So I try to curate my feed in such a way that it's mostly machine learning papers and pictures of cute animals. Do you have a good starting point for someone who just wants to plug in? I pretty much started with following people I knew from Stanford and elsewhere, professors whose papers I'd read, prominent people at big labs and so on. And then anytime they tweet a paper, they like a paper or something like that, if it's interesting to me, I just click on that and I follow all the people tagged in or associated with that work. And that's, yeah, so I sort of grew my follow list organically via that. I understand after your PhD, you became a quantitative researcher at Citadel. Where were you in your career and why'd you decide to become a quant? Yeah, I joined Citadel Securities after graduating from my PhD. And so there, yeah, so basically, I had actually interned there the summer right before graduating. And I liked it a lot. The reason I decided to intern, just kind of, I wanted to see what else was out there. I had a few friends who had interned at Citadel Securities or other quant firms and, you know, enjoyed it, and I was just kind of curious. You know, by that point I'd been working in AI research for like four to five years, and yeah, like I said, when I entered my PhD I was generally interested in careers in which I could apply my interests in mathematics and computation. And so I think the three major careers of that form at the time were, you know, machine learning research, which I already had experience with, you know, quantitative finance, and then the last one, maybe quantum computing, but, you know, it was a much smaller sort of domain and one I had no experience with, although that one's also kind of blowing up these days. So yeah, quant finance, I was just kind of curious and, you know, I'd heard good things. And so I decided to intern and I ended up liking it a lot. I think it was, you know, refreshing in some ways. You know, like I said, the PhD is a grind, you know, you can burn out at various points, and it was kind of a fresh set of problems, a totally different environment. Well, it's funny that you say that about the grind of the PhD, that you kind of took a break by becoming a quant at Citadel, because I've heard that the work culture is pretty intense at these finance companies. Is that the case?
yeah so that is the reputation but uh i think that definitely varies a lot based on the team you're on the firm you're at um yeah i personally i had a pretty great work-life balance uh as funny as that might sound um as a quant um you know i think one reason is that um you know traders typically uh will work you know trading hours or whatever locale they're in you know of course there are markets all over the world, you know, APAC, you know, Europe and so on. But in the U.S., U.S. traders are typically working, you know, around U.S. trading hours and, you know, of course, a little bit before and after just to prepare and stuff. And so I think that generally has a sort of trickle down effect on the culture where, you know, most people are just kind of, you know, really clustered working around trading hours and then don't take their work home all too much. So, you know, even though the work can be done at any time, I think it just that is sort of how how the office culture operates. Yeah. When it comes to quantitative finance or quant work, how would you describe the work? It really, really depends a lot on both the team you're on, like the sector you focus on, you know, whether you're at a hedge fund or a market maker, whether you're like front office or back office, quant. and, you know, of course, the company. And so, you know, some quants will spend all their time just like on alpha generation. So, you know, generating new, you know, trading strategies and, you know, backtesting them and so on and putting them into practice and monetizing them. You know, I mean, well, some people will focus just on the alpha. Some people will focus on the monetization. Or you can be, you know, like a risk quant. So you're basically not necessarily generating strategies at all, but just like, you know, trying to come up with metrics to capture risk and, you know, avoid that, reduce risk without reducing, you know, cutting into profits. You might be, you might be doing like data analysis. So you might have like a ton of like historical, like trade data and stuff and analyzing them in various ways and so on. So yeah, I think, yeah. Yeah. Like I said, like, you know, hedge fund versus market making, they're actually very different problems. So I think the thing that unifies all of them is really, you know, having a strong math background. Like in the day-to-day, let's say, you know, your project that you're currently working on, like what would that look like? What would the shape of that problem look like? And how does math concretely play a role? You know, depending on what sector you're in, you know, there's a lot of different math that will come into play. I mean, you know, the sort of the backbone of finance is stochastic calculus. So I think that comes up almost everywhere. But then there are other things like, you know, numerics, numerical optimization, numerical interpolation, things like that. You know, machine learning, of course, is, you know, now more and more firms are like getting into getting really deep into the deep learning space, even like establishing their own research research arms that do like LLM type research and stuff like that. Yeah, numerical linear algebra, So there's a lot of different math and it's actually very diverse in terms of what people are always trying to come up with ways to apply different fields of math to quant. 
I think some of it is just for fun, kind of, because quants are such a mathy, intellectual bunch, but there is actually a lot that, you know, underpins the entire field. So yeah, I mean, I think, yeah, stochastic calculus is probably the most unifying part. Like that's kind of the, you know, finance 101 type math. So you mentioned coding a lot as a quant, and I have had some friends who were SWEs at Citadel and these various companies, and I understand the roles are quite different. How do quants and SWEs typically collaborate at these companies? Really depends a lot on the company. Some companies like Jane Street, for instance, the number of people who are called quants is actually very small and traders themselves are quite technical and implement a lot of stuff. And then, of course, they have software engineers as well. Whereas Citadel, I think, is a more quant-forward firm, so I think, you know, quants might be, if not the largest percentage of employees, like it might be about equal in terms of the technical staff. And so, yeah, I think, you know, there can be a lot of overlap in what a quant, a quantitative researcher, and what a, you know, software engineer does, and also, you know, between a quant and a trader. So, yeah, it kind of just depends. Like at some firms I think it's more divorced, where quants are really doing, you know, the strategy work, and then it's kind of handed off to software engineers to implement. But at other firms I think you might do a bit of both, because, you know, of course, if they have the implementational skills, the person best positioned to actually implement something is the person who understands all the, you know, reasons and, you know, edge cases and things like that. And so, yeah, like I said, you know, I did a ton of coding, mostly in C++, also some Python. If you were to compare and contrast finance and tech generally across these roles, what comes to mind? So I think a lot of the skill set, first of all, is actually quite similar. Yeah, like I said, you know, math and computer science were my main interests, and I wanted a job that would leverage both of them. And I think that's been the case in quant, and that's also been the case in, you know, in the AI research that I've done. And so, you know, I knew nothing about finance before I joined Citadel Securities. But, you know, I read a few textbooks that were recommended by people. And, you know, that was really all I needed. And, yeah, from there, I just, you know, drew upon my sort of technical skills. And I think like AI research is a lot of the same way. I think if you have really strong fundamentals, you can pick up, you know, pick up the rest. So, yeah, in terms of technical skills, I don't think it was, you know, really a rough transition either going, you know, going either way. I think, you know, obviously the, you know, culture is different, you know, SF versus New York, those kind of things. Yeah, work hours, I would say, yeah, I think, you know, quant actually probably has a better work-life balance than AI, you know, particularly because the level of competition in AI right now. It's just a very competitive space. And so one of the ways you can gain a comparative advantage is just by outworking your competition. And that kind of is what happens in practice a lot of places. I know a lot of people who are just working around the clock. I've heard insane stories about the comp structures at quantitative finance firms.
Is that all true? Like, is it heavily bonus weighted? And I've also heard stuff about garden leave. So, yeah, in terms of comp. Yeah, I think one thing is like, you know, there's not really standardized levels like there are in tech. You know, you can't just sit, you know, it's not like someone is just like IC5 and you kind of know like what, you know, kind of pay bands there, what they're making. uh it's it's uh yeah i think comp is really driven by a few things um you know how the company does that year how your team does that year um if you are really on like the alpha side of things like you know how you how your particular strategies did that year um and then um of course there's like uh other things that play into it like seniority both in terms of you know hierarchy if you're at one of the firms that you does have a kind of explicit hierarchy or in terms of like you know just like tenure at the firm or years of experience things like that and so um yeah i think you know quant firms i think are more secretive and you know partly because of the you know relative lack of standardization so um it is kind of opaque in terms of like how those factors actually combine uh for your final comp but um yeah i think it you know it can be very bonus driven if you're really on the alpha side of things and that attracts some people to that kind of thing where they really would just want like as they want to be as exposed as possible to I guess like the fruits of their labor but it is you know the downside is it can be much riskier business as well um so yeah it's just more variable but like if you're you know more back office type thing I think the comp is you know probably a little bit more deterministic if you're not you know directly tied to alpha generation yeah it's interesting I mean because we were talking about ai research versus quants and obviously being a quant is uh famous for earning a lot if you have generated a lot of alpha i hear compensation like easily in the millions for for a lot of these people um but at the same time ai research also popped off too you know if you're the top one percent of either of these firms you're going to do very well yeah i mean yeah they're kind of crazy um yeah these i mean these these things really do exist where people are making like nba player salaries and stuff i think you know for the for the median case uh yeah it's still it's still very good but i think uh yeah it's it's uh not not exactly that outlandish um yeah sorry uh you you also mentioned stuff about ndas and stuff and and garden leave uh so or sorry non-competes i guess so yeah i think um you know so finance firms are very serious about this sort of thing. Unlike in tech, one thing I was surprised by culturally is how tight-lipped people are in finance, even within a firm. There's things that you can and cannot share across teams. Or people might just want to be more secretive because they're protective of their alphas. 
And so if you know what they're doing, you can sort of re-implement a similar thing and like capture take over some of their alpha right because uh it what what makes it alpha is that it's you know secret if if more the more people know who know about it like the less profitable it's going to be for any individual and so um yeah i think it's you know quite secretive uh you know even even the firms that are you know have a reputation for being more open are actually quite secretive versus in tech you know people talk about things all the time and so it was a bit jarring for me returning it to tech and like hearing like people you know talk about what they're doing in like a you know very open way i was like wow like you're just going to tell me that for free so yeah a non-compete is like yeah probably the you know most notorious part of this is yes a lot of firms will um have a clause in your you know in the contract you sign at the beginning stating that you cannot work for a competitor for a period of time after you um after you leave the firm uh and this period of time is typically decided by the company when you when you leave but it can be um anywhere from well it can be zero uh up to like two years uh i think i've heard like even up to three years for for some places but i think that's rare i think the norm is i would say the norm is like you know six months to two years um and so yeah during this period you're basically just paid to not work um yeah it's called garden leave because i guess you know you you sit at home and garden or whatever um and it's actually like i mean it's actually a quite interesting thing you know it creates interesting incentives for some people because you are typically compensated quite well during this garden leave period so um it's not necessarily a you know it's not necessarily a downside for some people um and yeah basically idea is like you know um you won't you know leak ideas to your competitors and you know by the time your garden leave is over you know if you have some special alphas or trading strategies um you know, two years down the line, they're probably not even relevant anymore. So it doesn't even matter. You mentioned the secrecy within in quantitative finance. And I see a natural incentive here to kind of be hostile or I guess competing within the firm because, yeah, my alpha is my alpha. I'm not going to help you. Did you ever feel that or see stories of that? Yeah, no, It's definitely a thing. You know, people are, yeah, I think a lot of people are, you know, reluctant or even forbidden to talk about any details, basically, of what they do. You know, some people, you know, won't even, don't even, like, say what sector, you know, they work on, you know, at least across companies and stuff like that. So, yeah, I think that's definitely a thing. You know, some firms are set up where it's, like, basically pods. So, you know, one pod is just responsible for basically all of their P&L and then the firm takes a cut. And so, you know, different pods might be working on, you know, very similar things unknowingly, right, but they're not sharing any of the information. 
and there is some logic behind this because the idea is you want to have uncorrelated you know uncorrelated returns so if all the pods are like you know talking to each other sharing ideas you know chances are they're going to start doing very similar things and then you know that exposes you to risk where you know what if the thing you're doing is actually wrong and you know you can wipe out not just one pod but entire team of them whereas if people are working independently then you know that's that's not that's less of a risk. Earlier you mentioned the top one percent of AI researchers and quants are going to do extraordinarily well. I'm curious what sets the top one percent of AI researchers and quants apart from the rest? There are a lot of things. I think there are different ways to get to that point as well. It can be raw technical skill like some people are just really really good at what they do able to you know the prototypical like 10x engineer that kind of thing and they just have you know a better a higher level of intuition and or execution speed um stuff like that um you know of course there's politics involved like uh you know people who are better at playing the political game can um you know rise up in the ranks i think in quant you know one thing is that it is a bit you know it's harder to game the system because there are kind of hard metrics that it's easier to evaluate how someone does especially you know if you were alpha quant you know it's quite clear right like if you implement a strategy um and you make the firm a ton of money like that's obviously going to be recognized um i think you know in in ai you know it can be a little bit harder but of course i mean the analog might be like you publish like a seminal paper like you make a true breakthrough in the field um you know you you make you know the models much better than um yeah that sort of thing so um yeah i think it's yeah i guess it's probably similar to you know other domains it's just a combination of skill and like um you know playing playing the game and i think being in the right place at the right time has a lot to do with it you know both in quant in terms of uh seeing something before other people do and then like making taking advantage capitalizing on market trends and and turning that into a profit um or an ai you know like having the right idea at the right time When it comes to quant firms, I'm kind of curious. There's all these tier lists out there. What are the top firms and why? Rentech, like we talked about, is one of the sort of mythical firms in this space. You know, you can't really argue with their returns. The historical returns over like a 20-year, 30-year period, it's pretty insane. So, you know, Rentech is maybe, you know, the gold standard, you know, depending on who you ask. um then there are other firms like uh you know some of the slightly bigger ones like you know jane street citadel uh jump trading hudson river i think those are generally very well regarded firms um and you know having that kind of thing on your resume can definitely be you know an asset to to future quant rules and things like that so um yeah very good firms i think you know great technical talent uh great returns obviously um and then there are there are some like elite smaller ones, you know, similar to Rentech, like, you know, smaller, more secretive, less well-known, but still very, very excellent returns. So yeah, TGS is one, it's in Southern California. Yeah, XTX is another one, one of the newer firms. 
I think Radix is another newer firm that's, yeah, in that boat. So yeah. Are there any stories from your time working in the space that you think might be interesting? You know, finance firms do not mess around, so you hear stories about, like, you know, people just doing dumb things, like traders or quants having an internal WhatsApp group where they, you know, talk about strategies. So as a quant you have trading restrictions, you have to get all trades pre-approved, and so of course if you're someone who works in equities or something, you know, you probably are not going to be able to trade those tickers at all. But, you know, people try to get around it with their little WhatsApp groups or whatever, like telling their friends to, you know, buy these stocks or something, split the profits or whatever. If that happens and you get found out, you know, they're going to go after you. You'll get fired obviously, there'll be lawsuits, you can even go to jail. Yeah, there are a few stories about this, because it is against the law. And so, yeah, heard more stories about this. That's one of the things they tell you about in training actually, is like, yeah, do not do this. Similar with like non-competes, you know, people going to competitors or starting their own thing or something and like getting accused of taking strategies and stuff like that. Yeah, all these firms have like elite legal teams, and yeah, just not something you want to mess with. I heard that in quantitative finance it's kind of intense sometimes, or rather people may get fired very often. Did you ever have, like, you just were working with someone that kind of disappeared? Yeah. Yeah. I mean, yes, that does happen. Yeah, I think it's interesting in quant because, like I said, yeah, comp is a function of many things, among which is seniority. And so I think your job security can actually be kind of U-shaped, because senior quants, like, you know, even if they're very good, they just get very expensive after a while, because that's sort of what the market rate is for senior quants. And so, you know, even a good quant, you know, can stop being worth it after a while. Whereas like early-career quants, you know, they might be very good and also not command, you know, as high of a salary. So, yeah, the job security graph is not U-shaped, it's kind of like an inverse parabola almost. And yeah, I mean, people certainly get fired. Quant in general, I think, has a culture of, you know, well, one is like up or out, and two is like, you know, just trimming the low performers. And again, I think this can become, you know, especially easy if you're more on the alpha side, like if you're just not making money it can be pretty clear, but in general, even for, like, you know, engineers, yeah, I think there is this kind of culture. Yeah, traders, of course, since they're, you know, making trades and stuff, again, it's like very easy to monitor. So I think that can be even more, more brutal. Why did you leave Citadel to join Cartesia? When I joined Citadel, it was partly because I was just interested in learning about a new problem domain and like, you know, learning some new stuff. You know, learning about finance in general, I think, was also kind of interesting to me.
And yeah, I think, you know, I became more financially literate as a result and stuff like that. Like it was a great learning experience for me, and I was kind of optimizing for growth potential partially as well. But yeah, I mean, by that point, you know, I'd been at Citadel for, you know, a couple years, and, yeah, I think, you know, it's sort of like your growth, you know, at most places will kind of accelerate for a bit and then sort of taper off, but I think there was still a lot more to be learned had I decided to continue on that path. But I saw what was going on in the field of AI. When I graduated, actually, it was right before ChatGPT came out. And so I think a lot had changed even since I joined Citadel. And I heard that the founders of Cartesia were starting this company. And for context, I knew all of them from my PhD at Stanford. They were actually all in Chris Ré's lab with me. I knew Albert pretty well. And so, yeah, I had tons of respect for them. They're great researchers. I worked pretty closely with some of them. Albert was a good friend of mine, knew the other guys. And so it just seemed like a great opportunity and a great time to get back into the field of AI when things were sort of taking off. And I thought it would be great in terms of personal and technical growth. Also, the opportunity to join a small startup was definitely something that interested me, and to kind of shape the company and the culture as one of the earlier employees. And so, yeah, it was really, yeah, I think, yeah, it was all about, all about growth, getting back into AI. And I think like, you know, there is definitely a different risk profile. I think when I graduated my PhD, I was kind of more risk averse. You know, quant was a, you know, stable, you know, lucrative opportunity that, you know, was the right choice for me at that time. Now that I had sort of established myself a little bit and gotten some of that stability, I thought it was, you know, an opportune time to take a risk. Cartesia, I guess, if you could give us some context on kind of the primary problem the company is solving and just, like, what the company's about. Yeah, we are a voice AI company. You know, our current mission is to build sort of the next generation of voice AI and a platform for that. So what that means is, you know, our flagship product is text-to-speech. We also have products around speech-to-text, voice agents, and stuff like that. And yeah, I think we believe voice AI is the future. It's actually one of the fastest growing areas of AI. People are using voice AI in many applications, call centers being one of the predominant ones, but also a bunch of applications in entertainment, a bunch of companions, a bunch of different things. And so that's kind of the product set we're building. And yeah, in terms of why do we choose voice? So I think voice is actually a very interesting test bed for a lot of research ideas that we're exploring. So we also have a sort of research arm of the company that focuses on kind of longer-term research around, you know, around long context, around multimodality, things like continual learning and memory, you know, test-time compute, and in general, like, you know, sort of overall, even higher-level goals to build real-time, you know, systems that are truly intelligent, and that you can, like, interact with and that can learn from experience.
And so I think, you know, building these, you know, voice agents, you know, speech to speech models and so on is, you know, it requires you to kind of, you know, solve some of these problems for the, you know, sort of eventual idea of like a kind of like always on assistant, personal assistant. When it comes to this voice AI space, who are the top competitors? So our main competitor is a company called Eleven Labs. They're, you know, another voice AI company, basically. And yeah so they I think had about a 18th month head start on us yeah I actually used to play with 11 labs you know long before Cartesia was ever a thing just like kind of make like you know fun videos and and whatnot and so yeah it's you know very very similar company I think you know where Cartesia stands out I think is you know we have sort of a focus on you know things like latency So, you know, low latency is really important for a lot of voice applications, you know, for naturalness, you know, like the conversation we're having now, you know, you can't afford to have, you know, you know, a second pause in between like each, each, you know, each turn of the conversation, you know, that just really breaks the sort of illusion and, and immersion. and so you know latency is really important for uh you know a lot of our customers um yeah you know we're continuing to try to push the boundary of sequence modeling and stuff to get you know better and better quality um without compromising on latency um and then um you know going into like more end-to-end systems as well so right now the way voice agents are um typically implemented is uh you have a speech-to-text system that transcribes some text uh then you feed this into a language modeling backbone. And then you have a text-to-speech system that will take the text that is output by the language model and speak the result. But this has a lot of problems in terms of latency, again, in terms of naturalness, because it's kind of not an end-to-end system. So there's a lot of loss in between each of these components and so on. And so, yeah, that's one thing that we're trying to build towards. But even right now, I would say, even if you just look at our text-to-speech products, I think, you know, we're definitely right up there as, you know, one of the leaders in the space. I think, yeah, you know, 11 wins on some languages. We win on some, you know, I would say we have like better voice cloning, things like that. So, yeah, we're trying to become number one in everything. But, yeah, I think, you know, like I said, voice AI is a very fast growing space. And so a lot of people are jumping into the space. But I think the pie is very large. What does it look like if Cartesia completely destroys 11 Labs? I think we already win in terms of things like latency in terms of cost. I think if we can conclusively win in terms of quality, not just for a subset of tasks, and not just for a subset of things, but there are many things that people care about for text-to-speech quality. 
There is just adhering to the transcript, so actually reading what is put in front of the model, which, you know, can be surprisingly hard, especially if you have different languages, especially if you have special characters, you know, repetitions, whatever. You know, all models struggle with this. But there's also naturalness, like does it really sound like a person saying this or does it sound robotic? You know, of course, a lot of applications actually care about naturalness even more than just transcript fidelity. And then, you know, of course there are all the, you know, speed things and so on. And then there are features like, you know, voice cloning, accent localization, so, you know, taking my voice and making it have a different accent, things like that, you know, controllability, you know, speed, emotion, things like that. And so, yeah, like I said, I think, you know, we have better quality in some areas, maybe worse in some others. You know, we'd like to get to number one in, you know, as many categories as possible, right? And so I think that's sort of the thing, right? Like, you know, switching costs exist even in AI. I think, you know, depending on the size of the customer, like some customers are reluctant to switch over from, you know, one thing to the other. You know, obviously startups can be more nimble, but, you know, when you're talking about enterprise scale, you know, this matters. But like if you, you know, conclusively show that you're better in every way, then I guess like at some point it becomes hard to argue for not switching. I imagine you could have worked at a big lab, OpenAI, Anthropic, et cetera. What's the main difference in working in an AI startup versus one of these big AI labs? Big labs have obviously amazing resources. You know, they have all the compute in the world, you know, tons of researchers and so on. I think one thing is that like the flip side of that is that I think big labs can sometimes be more averse to sort of out-of-the-box ideas and a little bit more susceptible to groupthink or like sort of overarching trends in the field, and like less willing to take a risk on something different. And then, you know, that makes sense, right? Because, you know, with sort of these, you know, great resources, you know, there's a lot of cost to investigating new ideas that don't turn out well. Whereas as a startup, I think you're a bit more nimble. You're able to, you know, you're able to be a little bit more exploratory if you do it strategically and sort of challenge the orthodoxy in that way. And so that was one of the things, like I mentioned, that drew me to Cartesia. Albert has a lot of interesting ideas that I think don't necessarily go with the accepted grain. Around the time Mamba came out, a lot of people were of the opinion that sequence modeling was kind of a solved problem and all you need is scale.
Like you just take the transformer recipe and you just scale it further and further. Yeah, I mean, Albert showed that, you know, that's not necessarily the case, right, with Mamba, that you can actually get real advantages in terms of things like, you know, efficiency, computational efficiency, but also even in terms of just raw quality. You know, state space models can be advantageous for a lot of classes of problems, or things like hybrid models where you take some state space model layers, some transformer layers, things like that. You know, another more recent work that we put out at Cartesia was this idea of H-Nets, where, so yeah, for context, the way that text modeling is usually done is you take, you know, raw text, you know, a sequence of characters or, you know, UTF-8 bytes or whatever. And then you compress it or, you know, you represent it as these things called tokens, which are basically like little pieces of words or sub-words. And then you run modeling over that. So it's like a two-stage pipeline. You know, we showed that if you actually just go from the raw characters, and you kind of learn this tokenization, you learn how to draw these boundaries in between groups of letters instead, you can actually get better performance. And so, yeah, that's the kind of thing, you know, I think challenging accepted ideas, that's the kind of thing that appealed to me. For context, you mentioned state space models versus transformers. Could you just give a quick primer, I guess? Without going into too much, I guess, technical detail, you know, basically the main challenge of transformers is that the, you know, the memory that they use at inference time grows linearly with the sequence length, because what they do is like, you know, they will take each token and store, you know, a representation of it in what's called the KV cache, you know, the key-value cache. And so as your sequence grows longer and longer, you're still storing all of this information in context in your memory, and so for very long sequences, you know, this can get prohibitive both in terms of, you know, computational cost and in terms of memory. SSMs are different because instead of storing everything in this uncompressed way, they take that information and they compress it, so the size of the state is fixed. And so as a result, the cost of doing a certain step doesn't change with the length of the sequence. And the amount of information you have to keep in memory does not grow with the sequence length. And so kind of an intuition, our co-founder Albert Gu has a great blog on this, is that SSMs are kind of like a brain. The human brain also does not store an unbounded amount of context. It takes in information and it processes it and it keeps it in this fixed-size state, which is our brain. Of course, you can simulate having an unbounded state via use of external tools, like writing stuff down and so on. But the core primitive remains fixed. Whereas transformers are more like a database where you can kind of recall anything in the context. And so I think both of these approaches are complementary, right? And yeah, so, you know, we're currently exploring kind of, you know, extensions of that analogy. But yeah, I would say that's kind of the, you know, high-level thing. Is the sequence just the input? So the longer the prompt, the longer the sequence, and therefore more memory consumption at inference.
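To make the memory contrast described above concrete, here is a minimal back-of-the-envelope sketch in Python. The layer count, head dimension, and state size are invented for illustration and are not Cartesia's or any particular model's numbers; the only point is that attention's KV cache grows linearly with sequence length while an SSM-style state stays constant.

```python
# Toy illustration (assumed, not any real model's configuration): how
# inference-time memory scales for attention's KV cache versus a fixed-size
# recurrent/SSM state.

def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128, bytes_per_elem=2):
    """Attention: keys and values are stored for every past token,
    so memory grows linearly with sequence length."""
    per_token = n_layers * n_heads * head_dim * 2 * bytes_per_elem  # keys + values
    return seq_len * per_token

def ssm_state_bytes(n_layers=32, d_model=4096, state_dim=16, bytes_per_elem=2):
    """SSM: each layer keeps a fixed-size state, independent of how many
    tokens have been processed so far."""
    return n_layers * d_model * state_dim * bytes_per_elem

for seq_len in (1_000, 10_000, 100_000):
    kv = kv_cache_bytes(seq_len) / 1e9
    ssm = ssm_state_bytes() / 1e9
    print(f"{seq_len:>7} tokens: KV cache ~{kv:6.2f} GB, SSM state ~{ssm:.3f} GB (constant)")
```

With these toy numbers the KV cache reaches tens of gigabytes around 100k tokens while the recurrent state stays in the megabytes, which is the trade-off being described here.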
That's right. So the sequence is the prompt plus the response. So, you know, as the model is generating the response, you know, the context includes what has been generated so far, right? So you can refer back to what you yourself have said and you'll figure out what is, you know, what is the next appropriate token to say. And so this can get obviously especially large for multi-turn conversations, where now the context includes like everything that has been said in the entire conversation up to that point. And so, you know, beyond a point, you know, as I'm sure we've all had experience with, you know, if you're chatting with these language models, it sort of ceases to be, you know, that useful maybe after, you know, tens of turns or something. And, you know, it can be best to start a new conversation. But the challenge with that, and, you know, of course, companies are doing things to try and sort of address or band-aid this, you know, for instance, like ChatGPT now saves some like global context in between conversations and stuff like that. But it doesn't really truly learn from, you know, from your personal proclivities and preferences and like the things you've asked in the past. Like there is some semblance of this, of course, but I wouldn't say that it's like, you know, truly personal yet. And in terms of like an actual agent that is like kind of learning and growing every day. Yeah. You know, I've been using Claude Code a bunch and I noticed occasionally it does this thing. It says it's compacting or something like that. I imagine it's taking the multi-turn conversation. I don't know what it's doing, just maybe summarizing it and restoring it. Yep. Yeah, there are all sorts of different ways to kind of compress the KV cache, either sort of mechanistically or kind of doing like, you know, textual summaries or things like that. Yeah, this is a pretty active area of research as well. You mentioned that the state space models, they have a compressed representation of the KV cache, sure. And so I'm curious, does that have a trade-off in terms of the quality of inference? Is it lossy? Yeah, so there are certainly trade-offs. I think depending on the task, so for very recall-heavy or fact-based tasks, pure SSM models can lag transformers, because the ability of transformers to do this kind of exact in-context recall turns out to be very helpful for this kind of task. Whereas for, you know, other tasks that don't require this type of thing, you know, SSMs can scale just as well as or better than transformers, even for like a fixed parameter budget, you know, let alone the inference budget. You can kind of get the best of both worlds, you know, a lot of people have shown this, by doing a hybrid model. So you just basically interleave state space model and transformer layers with, you know, with some ratios. And so, yeah, NVIDIA has put out stuff like this. Even the Qwen, you know, the latest Qwen models follow the strategy as well. So, yeah, I think, you know, the cutting edge, I would say, for text is probably in these hybrid models, at least in terms of like what's out there for open source. But the interesting thing is that for other modalities like audio, it actually makes a lot of sense to have this compression as an explicit inductive bias. So using SSMs for audio has proven very useful for us. We found that it actually improves performance. It's kind of almost a free lunch.
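As a rough sketch of the interleaving idea just mentioned, here is a toy hybrid stack in Python/PyTorch. The block internals and the 3-to-1 SSM-to-attention ratio are placeholders chosen for illustration, not the recipe used by Cartesia, NVIDIA, or Qwen; the only point is the structural pattern of mixing fixed-state layers with a few attention layers.

```python
# Toy sketch of a hybrid stack: interleave SSM-style layers and attention
# layers at a fixed ratio (here 3 SSM : 1 attention). The block internals are
# deliberately simplified stand-ins, not a real Mamba or transformer layer.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for an SSM/Mamba-style layer: a simple gated causal recurrence
    whose per-step state has a fixed size, independent of sequence length."""
    def __init__(self, d_model):
        super().__init__()
        self.inp = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        state = torch.zeros(x.shape[0], x.shape[2], device=x.device)
        outs = []
        for t in range(x.shape[1]):             # fixed-size state carried step to step
            a = torch.sigmoid(self.gate(x[:, t]))
            state = a * state + (1 - a) * self.inp(x[:, t])
            outs.append(self.out(state))
        return x + torch.stack(outs, dim=1)     # residual connection

class ToyAttentionBlock(nn.Module):
    """Stand-in for a transformer layer: full self-attention over the sequence."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(d_model=64, n_layers=8, attn_every=4):
    """Mostly SSM layers, with an attention layer every `attn_every` layers."""
    layers = []
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            layers.append(ToyAttentionBlock(d_model))
        else:
            layers.append(ToySSMBlock(d_model))
    return nn.Sequential(*layers)

model = build_hybrid_stack()
x = torch.randn(2, 16, 64)                      # (batch, seq, d_model)
print(model(x).shape)                           # torch.Size([2, 16, 64])
```

In practice the ratio and the placement of the attention layers are tuned empirically; the sketch only shows where each kind of layer would sit in the stack.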
You get improved quality and improved performance at inference time. And the reason is that, sort of, if you think about what these models are doing, you know, audio is, depending on how you represent it, a modality where there's very little information contained in any one, like, you know, time step or token, if you will, of audio. So it's like a frame of, depending on what you're doing, like 10 milliseconds, 200 milliseconds. And so one frame to the next doesn't really vary that much. And so compressing these into a sort of fixed-size state can actually make a lot of sense, as opposed to text, which is a much more densely informational modality. One word to the next, there is actually a ton of information contained in each of those tokens. And so compression is less, it's kind of already pre-compressed if you're using a token-level representation. But yeah, even so, I think hybrid models, I would say hybrid models, I think, are the future in that regard. I see. Okay. So it's because the modality itself has, I guess, redundancy in the data that means that this lossiness is actually an asset rather than a problem. Exactly. Yeah. So, yeah, I think, you know, there's a lot of interplay between modality and architecture. You definitely cannot design your architecture independently of your data. And so, yeah, kind of this, you know, co-design and thinking about, you know, modality, multi-modality from a fundamental level. You know, this is one of, you know, the research problems that I mentioned that kind of drives a lot of the work we do here. When you think about companies that focus on product versus research, what pattern do you think is most effective? So I think, you know, personally, and this is also one of the reasons I decided to join Cartesia, I think it is very important to have both. I think, like, so there are, you know, several startups popping up recently that are really, you know, focused on core research and don't even necessarily have an idea how to productionize it or turn that into, you know, a product or revenue stream. I think, like, you know, personally, I'm fairly skeptical of this approach. I think, you know, for a few reasons. I think, first of all, you know, big labs, you know, have tons of resources and also have, you know, large teams focused on this sort of thing. I think, yeah, I think like, you know, ultimately the goal of a company is to make money. Right. And so I think, you know, eventually, you know, if you are a company of this form, like you need to eventually deliver, like, you know, massively outsized returns, you know, at some point. And so I think you're taking a big risk where it can kind of be an all-or-nothing type thing. I think the flip side, like a sort of product-only company that's built on AI models that are built by other people, I think that is risky in the sense that you don't have as much of a moat. So, you know, like we saw this with, you know, the initial ChatGPT or, you know, going from GPT-3 to GPT-4, right? A lot of these wrapper companies kind of just got made obsolete by the fact that the base models improved so much that they could often just do what the wrapper was trying to do by themselves without very much scaffolding. And so it became the kind of thing you can just build in-house rather than needing another company to post-process the output of these models.
I think being in intersection is actually quite valuable for um you know for many reasons i think having a product a real product that customers use is something that can drive uh the research um so you see firsthand the issues um and you can use that to drive um you know your next iteration of modeling um you know try and fix these issues not as a band-aid um but like you know from the ground up right like from um at the model level itself uh and so i think having control over the models is like very important uh when you're uh when you're building an ai product um which is not to say that like you know there's no room for any non-research company i think it just like um it has to be in the you know right uh right kind of right kind of space um and so yeah i think um cartes has a great blend of research and product uh you know we're very i would say we're first and foremost a product company. But, you know, we want to build the best products we can. And we believe that that requires us to actually solve some of these fundamental research problems in order to do that. I think there's a lot of people who want to get into AI research. I mean, I was just talking to a friend today who's a SWE and he's saying, I don't think software engineering is going to be around in N years or something like that. So he's been investigating. And I'm curious, Do you have any advice for someone who is technical and wants to move into AI research? My philosophy has always been to try and build up my technical skills as much as possible. I think if your fundamentals are good enough, at some point, the opportunities will just come to you rather than the other way around. And so I would say just focus on getting as good as you can at coding, at AI, read tons of papers. Yeah, I think math skills and math intuition are really important. And so that's what I've kind of been optimizing for, you know, ever since undergrad, when I realized what I wanted to do was at least, you know, some combination of math and computer science. And so I've always more focused on like kind of building up those fundamentals. And I think that is the way to get your foot in the door. I think like bigger companies, it can be a bit harder to pivot, you know, teams or what you work on. And so for for someone like that, I think switching like teams or companies can can be like, you know, sort of the only path forward. I think you can kind of get siloed in a little bit if you're at a bigger company sometimes. Although I do think, you know, some companies are better about it. And, you know, I have seen people transition from SWE's to research and stuff like that. um so i think uh you know this is one of the areas where getting a sort of qualification on your resume can be useful like getting a you know master's in ai at least or something like that can help uh when when you're looking to make a sort of lateral career change like that you're you're saying there's kind of two common paths one would be get more education and use that qualification to kind of pivot directly into AI research or go to a startup where you can kind of like mold yourself into an AI research role? That's kind of right. But I think even if you want to go to a startup, right, and you want to, but you want to sort of switch from a SWE track to AI track, like there's got to be some, there's got to be something behind it, right? 
Like you have to have some evidence of a skill set, whether it's like sort of, you know, organically grown or from, you know from from schooling but I think it can probably be a lot easier to get your foot in the door if you have some evidence of it on your resume so like let's say you were you hired at Cartesia and then that person comes to you and he's like hey I want to do more AI research in that case is that something where it's like just flip the switch and next project is AI research project this has actually happened you know in cartesia itself you know we have had people transition uh roles like that so i think uh it is definitely easier at a startup which is uh you know it can be a bit more flexible just because you know everyone kind of knows everyone and so um you can get a sense of you know whether this might be an appropriate career change just by like kind of knowing the person for a while and so yeah i mean we've actually done uh you know people have done this in Cartesia with, you know, a lot of success. Do you have a biggest regret when you look back on your whole career? I mean, I think I often overthink things and I think I have spent a lot of time regretting, you know, past decisions that turned out not to matter in the end. And I kind of regret the amount of time I spent regretting other things. So, you know, I try to learn from that now. You know, I think like, you know, don't sweat the small stuff. Like, you know, minor setbacks happen and And they happen. But I think, you know, there's a risk of, you know, putting too much stress on yourself and you're like beating yourself up and stuff like that. And those are just like not productive ways to spend your time and they don't make anyone feel good. And so I think, yeah, I try and, you know, I try not to regret stuff because I think it's just not not a super good use of time. If you had to go back in time and you could give yourself some advice when you're just entering the industry, what would you say? Focus on building the deep technical skills. Don't waste time with trifling stuff or spreading yourself too thin. Focus on what you want to focus on, I guess. Basically, the skills that you want to leverage in your day job, just do those and get good at those. That's where you should spend all your time at work. You make it sound so simple. maybe it is it is it is a simple it's kind of a simple recipe that's very hard to follow right like it's very hard to maintain that discipline uh it's kind of like you know um you know what's secret to being healthier it's you know exercising eating right and those are things that are just very much easier said than done um but yeah i think that it is it is that simple awesome cool well yeah thanks so much for your time nimit thanks for listening to the podcast i don't sell anything or do sponsorships but if you want to help out with the podcast you can support by engaging with the content on youtube or on spotify if you want to drop a review that'll be super helpful and if there's any guests that you want to bring on to please let me know i feel like sourcing very senior ic's there's no well studied list out there on google that i can just search this up so if there's someone in your org or at your company who you really look up to and you want to hear their career story, let me know and I'll reach out to them.