How I AI

How this visually impaired engineer uses Claude Code to make his life more accessible | Joe McCormick

49 min
Feb 16, 2026
Summary

Joe McCormick, a visually impaired principal software engineer at BabyList, demonstrates how he uses Claude Code and AI tools to build custom Chrome extensions that enhance accessibility and productivity. The episode showcases practical examples of personal software development, including image description tools, spell checkers, and a link summarization extension built live during the podcast.

Insights
  • The payback period for personal software has collapsed due to AI efficiency—tasks that previously required days to build now take 30 minutes, making automation of recurring friction points economically viable
  • Accessibility features built with AI assistance often exceed baseline standards when explicitly requested, as foundational models have well-documented accessibility standards (ARIA roles, live announcements) to follow
  • Keyboard shortcuts and consistent terminal UI patterns (e.g., 1=yes, 2=maybe, 3=no) dramatically improve both accessibility and general user efficiency in developer tools
  • MCP (Model Context Protocol) servers enable users to bypass problematic SaaS UIs entirely, converting complex interfaces into accessible markdown or structured formats for easier navigation
  • Multimodal AI capabilities are democratizing previously inaccessible experiences—reading books to children, understanding images without assistance, and processing visual content independently
Trends
  • Personal software and micro-automation becoming primary use case for AI coding assistants
  • Accessibility-first design driving better UX outcomes for all users, not just disabled populations
  • Terminal-based AI development tools gaining accessibility improvements through consistent patterns and audio feedback
  • MCP servers emerging as critical interface layer for bypassing enterprise SaaS accessibility limitations
  • Multimodal AI (vision, audio, text) enabling equitable access to previously gatekept experiences
  • Chrome extensions as lightweight alternative to desktop app development for power users
  • Claude Skills as reusable patterns for accelerating iterative personal software development
  • Screen reader compatibility becoming table stakes for developer tools and AI interfaces
  • Markdown-based workflows as accessibility-friendly alternative to complex visual interfaces
  • Payback period for automation collapsing, shifting incentive structure toward building vs. doing
Companies
BabyList
Joe McCormick's employer where he works as principal software engineer leading AI enablement
OpenAI
Provides API used in multiple Chrome extensions for summarization, spell-checking, and content analysis
Google
Mentioned for Gemini app's multimodal capabilities for reading books and Google Docs accessibility
Anthropic
Creator of Claude and Claude Code, primary AI tools used throughout the episode for development
Meta
Meta Glasses mentioned as accessibility tool used by Joe for personal life enhancement
Notion
SaaS tool discussed in context of MCPs and accessibility challenges requiring markdown conversion
Slack
Primary platform where Joe built multiple Chrome extensions for accessibility and productivity
Harvard
University where Joe studied computer science after losing his vision
People
Joe McCormick
Principal software engineer at BabyList with visual impairment, demonstrates AI-powered accessibility solutions
Claire Vo
Host of How I AI podcast, product leader and AI enthusiast conducting the interview
Quotes
"The gap between a software engineer for a sighted person and a visually impaired person is closing day by day."
Joe McCormick
"Before you'd have an idea like that, you'd be like, this is going to save me like three minutes a day, but it's going to take me three days to build. Now it's like, it saved me three minutes a day and it takes me 30 minutes to build."
Joe McCormick
"I would love to do everything in one place and not have to switch tools. Having MCPs has been great for that."
Joe McCormick
"Your son being like, I want to read this book and you having to be like, sorry, I can't. And now that sorry, I can't become sorry, I can with the assistance of so many different tools now."
Joe McCormick
"When the final output is just a user of one, the code quality is a lot less important. What matters is, does it work?"
Joe McCormick
Full Transcript
Right before I started college, I ended up losing most of my central vision due to a rare genetic disorder called Leber's hereditary optic neuropathy. I was talking with someone who was losing their sight recently from the same disease, and they were asking about different things. And I was like, oh, you can just do all of that now. With Gemini or ChatGPT, the world is a whole lot easier. So you're going to show us some of the things that you've built for yourself. So when someone sends me an image, I use this tool to be able to get the gist of an image without needing to ask somebody to explain it to me. If I hit Control Shift D on any message, it's going to pop up and go off and describe that image for me. And the cool thing is I can go ask some follow-ups. What age child is this for? And we'll head off to ChatGPT and get the response for this as well. I'm curious for you, what are you most excited about in the multimodal world of AI? One thing that I was always afraid of, can I read stories? I can memorize stories, I can tell stories, but your son being like, I want to read this book and you having to be like, sorry, I can't. And now that sorry, I can't become sorry, I can with the assistance of so many different tools now. Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive here on a mission to help you build better with these new tools. Today, we have Joe McCormick, principal software engineer at BabyList, who has a vision impairment, and he's going to show us how he uses AI to build micro Chrome apps to make his everyday life and work a lot more accessible. You're going to learn how to use Claude Code to write Chrome apps, and you're going to be inspired at the little things you can do to make your own Slack a little bit more efficient. Let's get to it. This episode is brought to you by Tines, the intelligent workflow platform powering the world's most important work. Business moves faster than the systems meant to support it.
Teams are stuck with repetitive tasks, scattered tools, and hard-to-reach data. AI has huge promise, but struggles when everything underneath is fragmented. Tines fixes that. It unifies your tools, data, and processes in one secure, flexible platform, blending agentic AI, automation, and human-led intervention. Teams get their time back, workflows run smarter, and AI actually delivers real value. Customers now automate over 1.5 billion actions every week. Tines is trusted by companies like Canva, Coinbase, Databricks, GitLab, Mars, and Reddit. Try Tines at tines.com slash howiai. Joe, thanks for joining How I AI. And I want you to spend a little bit of time introducing yourself and your story and how AI has impacted your ability to do work and build interesting things and engage in lots of awesome projects. And what's different about your life now with AI versus before? So, yeah, my name is Joe McCormick. I'm a principal software engineer at BabyList. And I think I took maybe a little more interesting journey than most into the computer science world. So right before I started college, I ended up losing most of my central vision due to a rare genetic disorder called Leber's Hereditary Optic Neuropathy. And so before starting at Harvard, I was more interested in the mechanical world and kind of robotics and everything in that space, and then found that that was a lot harder, and doing things with my hands was becoming a lot harder month after month. And so I took the intro to computer science course at Harvard and immediately fell in love and found that I got the same feeling of creativity and being able to come up with the idea and make it happen. But now I was on maybe not a full equal plane to my competitors at that time, or my other students. But then obviously as AI took off, it became even more equivalent, and the gap between, I think, a software engineer for a sighted person and a visually impaired person is closing day by day.
And also in my personal life, I think it's even been extra impactful. I was talking with someone who was losing their sight recently from the same disease and they were asking about different things. And I was like, oh, you can just do all of that now with sharing your stream with Gemini or ChatGPT. Whereas when I was first losing my sight, it was using different magnification tools or even like glasses and things. And it's like, no, the world is a whole lot easier. I'm an avid Meta glasses user and different things make my personal life a lot easier as well. But yeah, I do lots of AI product engineering now. And I, at BabyList, lead the AI enablement in trying to make sure all of our software engineers can build with AI as productively as possible at all different parts of the software development lifecycle. So you figured out a way, one, to adjust your interest in engineering to something that's a little bit more accessible for you. And then to lean into how these AI tools can really increase the accessibility and user experience of supportive technology that you've maybe used in the past, but that you've been able to make better yourself. And what I love about this personal software moment that we're in right now is, unfortunately, accessibility software and custom software that meets the needs of a lot of people is simply in some instances not an economically viable business to build. And so in the kind of broader economic world, there's not a lot of incentive to build a full set of robust tools that can meet the needs of everybody who deserves to have their needs met. And what I love about what we're able to do now with AI is not only are more interesting sort of accessibility tools and platforms able to be built, but people can build these solutions for themselves and they can be very customized to your experience, your needs, your strengths. And I think that's a really underappreciated benefit of AI.
And so you're going to show us some of the things that you've built for yourself. And you're actually going to walk us through your coding flow, which I think is really awesome, on how to build one of these tools, so we can follow step by step. For sure. So yeah, we can jump right in. I'll show off two that I built myself ahead of time, and we're going to do one on the fly. And I do think personal software is going to be huge. One reason why I like building some of these, so I'm going to show off a couple of Chrome extensions that I've worked on. And one thing I like about building some of these, as compared to maybe some of the offerings we have today from the AI-native browsers, is AI-native browsers are great. Like I do use Comet, but it's using the Swiss army knife and it does everything. But in order to do that, some of its processes are definitely slower, which there are certain things that's totally fine for, but there's other steps where you want it to be really quick, and you want to use the drill instead of the little tiny screwdriver that came with the Swiss army knife. And so I'll show off a couple that I've already built and then we'll jump into one that's not necessarily even accessibility-only that I think everyone could benefit from. So I'm going to hop into Slack now. BabyList, where I work, is a big Slack company, so lots of my stuff is very Slack-based because it's where I spend a lot of my non-coding time. I'm going to hop in here and I'll share and show a couple ones I've built so far. There's a little temporary Slack channel that we have for this with some example messages that were actual messages sent by my colleagues. I just had them resend it into here. So the first one I'm going to demo is an image description tool. So when someone sends me an image, I use a screen magnifier. So I typically am looking at my screen at about 10x zoom, but it's not the easiest to do.
And I prefer to not have to always be paying attention to that if possible. So I use this tool I'm about to show off to be able to get the gist of an image without needing to ask somebody to explain it to me, or me have to initially use my eyes to do it. So I have a shortcut in Slack. If I hit, I'm on Windows, so Control Shift D on any message, it's going to pop up and go off and describe that image for me and tell me a little description of it. So I can see, hey, it shows a modern infant baby stroller with a car seat. It's got a canopy. It's got all the details about this. And the cool thing is I can go ask some follow-ups. So I can say, what age child is this for? And it will head off to ChatGPT and get the response for this as well. And so we can get our answers there. And it's just a nice way for me to not have to push back and work with people to get some answers to my questions as I go on this too. And I think this is something that folks don't really appreciate, which is, for you, you know, you have the ability to zoom in and look at this, but just from a time perspective, it's probably a lot more tedious for you to do. And so folks have thought a lot about image-to-description in terms of generating metadata for e-commerce sites, which I'm sure you all think about a lot. Or we had an episode with a documentary producer that highlighted using image-to-text descriptions plus metadata to organize archival footage a lot easier. But this is a great example of image-to-text being just a much more efficient information transfer method for someone like you who might need to parse this information differently. And then what I love about this is you could have just done the image description, right? Just what is this image, and tell me. But you actually were able to go that next step and say, great, if I need to query more information about this to understand more context, you make that really, really easy. So I love, I just, I love this example.
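For readers who want to build something similar, the kind of request such a shortcut might fire off could look roughly like this. This is a sketch, not Joe's actual extension code: the function name, model name, and prompt wording are all illustrative assumptions.

```javascript
// Sketch: build an OpenAI-style Chat Completions payload that describes an
// image, optionally answering a follow-up question instead of the default
// description prompt. Model name and prompt text are illustrative.
function buildImageDescriptionRequest(imageUrl, followUpQuestion) {
  return {
    model: "gpt-4o-mini",
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: imageUrl } },
          {
            type: "text",
            text:
              followUpQuestion ??
              "Describe this image in two or three sentences for a screen-reader user.",
          },
        ],
      },
    ],
  };
}

// A content script would grab the image URL from the focused Slack message on
// Ctrl+Shift+D, POST this payload to the API, and present the reply.
```

Note that follow-ups reuse the same image URL with a different text part, which is what makes the "what age child is this for?" interaction cheap to support.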
And you, you built this all yourself. Yeah, this was probably 25 minutes of a Claude Code session. It was pretty straightforward. Awesome. A flavor of this one I'm working on right now is a version that works in Figma directly as well, that given any Figma node will explain it to me with a much different prompt, right? In the Figma case, I want to hear about the colors of the CTAs. I want to hear about all this stuff because I am a full-stack engineer. And so you can get all that out of Figma, but it's lots of clicks and lots of different steps. And being able to just hit one keyboard shortcut and find out what this design is really accomplishing is going to be a nice, easy win for me. And that one is just about done as well. It's got a little bit of a bug, so it's not ready to fully demo, but that's one that I'm excited about as well. So before we go into building one of these, are there a couple other extensions that you've built, just as inspiration for folks watching or listening, that you think are really interesting to show? Yeah, another one that's not necessarily accessibility-focused, but it's at least a cool one. So I'm not the best typer in the world. I don't even think that's my vision's fault. I'd like to think I'm a touch typer, but I think my brain goes faster than my fingers sometimes. So I have one that I built that is just like a really easy spell checker. There's lots of tools that do this, Grammarly and all this, but similarly, they're not all screen reader accessible. They're multiple clicks away sometimes. And so I built one out that works in any input field on the web. I'll demo it here. I'm going to say test testing typos in the message. And then if I hit Control Shift S here on this one, this is going to go off, send that off to OpenAI, and come back with that. And while it was doing that, for me on my screen reader, it said like, processing spell check, spell check complete.
And so I know when I'm writing a message, I don't need to necessarily worry about all the polish on it. I can just do that. I hear spell check complete. I'm ready. I go off, hit that, and send it off to people. And I have a prompt there that's basically like, do not change any of the words, just fix typos, like really hyper-focused to make sure that it's the content that I wrote, but just with the typos corrected. So I'm leaning in and smiling a lot. One, because if you've been watching How I AI, you've heard about my fancy nails. I'm a fancy nail girl. And with these fancy nails, I cannot type anything. It is all typos. And so this is such a great little workflow that you built for yourself that I'm going to steal. The two things I want to call out for people who are watching, or not watching the details, is you're actually running Slack right now in Chrome. So at first I was like, wait, how are all these apps interacting with the Chrome desktop app? But you're running Slack in Chrome, which means that these are all extensions that are available to you to interact with content in Slack and make modifications via a Chrome extension. So I think that's a really interesting hack for folks that are like, okay, I can't hack my way into the desktop app, but I can load Slack in the browser. And then on top of that, add a browser extension that can do these interesting things for me. The second thing I love is how you're using so many keyboard shortcuts to trigger these micro apps. And again, this is about efficiency. I always say in these AI products that latency is the killer feature. And so anything you can do from a UX perspective or from a performance perspective to make these little apps more efficient, the better they're going to feel. And so I love that you, you know, type a couple keys and you get a fully corrected sentence here right in your browser. It's a great idea.
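A minimal sketch of the two pieces described here, the tightly scoped typo-only prompt plus the screen-reader announcements, assuming an OpenAI-style Chat Completions payload. The function names, prompt wording, and element id are assumptions for illustration, not Joe's actual code.

```javascript
// Typo-only prompt: the system message constrains the model so it corrects
// spelling without rewriting the user's words.
const SPELLCHECK_SYSTEM_PROMPT =
  "Fix spelling and obvious typos only. Do not rephrase, reorder, or change " +
  "any word choices. Return the corrected text and nothing else.";

function buildSpellcheckRequest(text) {
  return {
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: SPELLCHECK_SYSTEM_PROMPT },
      { role: "user", content: text },
    ],
  };
}

// Announce status ("Processing spell check", "Spell check complete") to screen
// readers by updating a visually hidden ARIA live region.
function announce(message, doc = globalThis.document) {
  if (!doc) return; // e.g. when unit-testing outside a browser
  let region = doc.getElementById("sr-announcer");
  if (!region) {
    region = doc.createElement("div");
    region.id = "sr-announcer";
    region.setAttribute("aria-live", "polite");
    region.style.position = "absolute";
    region.style.left = "-9999px"; // keep it off-screen but readable
    doc.body.appendChild(region);
  }
  region.textContent = message;
}
```

The `aria-live="polite"` region is what produces the "spell check complete" speech without moving keyboard focus away from the input field.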
Yeah, the first version I had of this was just using the like ChatGPT Alt-Space shortcut that would open up that little mini ChatGPT window. And I had like a saved custom GPT to do it. And then I was like, well, why am I jumping out of where I'm actually working? I can save two steps and like three clicks here. And so I think it's almost like piloting it first without doing this and then realizing like, oh yeah, there's a better way. And I think one thing about personal software is the return on investment became so much faster. Like before, you'd have an idea like that, and you'd be like, this is going to save me like three minutes a day, but it's going to take me three days to build. So the payback period is just not totally there. And now it's like, it saved me three minutes a day and it takes me 30 minutes to build. Like the payback period has just become insane for a lot of this tooling. I love it. So let's build one. I want to see what your flow is for actually building one of these things. Yeah. So before we build it, I'm going to talk about what I want to build. So one thing that comes up a lot in the BabyList Slack world, and probably in many other companies' Slacks, is people send links all the time. And for me, I often just hit like the save-for-later button, and then maybe at the end of the week I decide if I want to read them. But I've realized that like, I mean, it's the do the thing that takes you one minute in the moment instead of keep deferring it. I think it would be great if there was an easy shortcut where I could have an AI go off and fetch this article, give me the key takeaways, and then I'll decide, do I want to actually do a full read and save it for later? Or do I just skip it in the moment? And have that all work in under five to ten seconds. I think it's much more powerful than deferring and having this big to-do list at the end of the week when I want to catch up on all these messages.
So you're going to show us how you built this. And what I have to say is, all of us are so overwhelmed with so much context and links and docs. And yeah, I would see something like this, whether it's a partner or a competitor or just something somebody found that was interesting, and you want to go, oh yeah, I should definitely read that. But should you? Should you definitely read it? So this quick summarization is a great idea. And so you're going to walk us through how to actually code this up using Claude Code, I believe. Yeah. And I'll jump in along the way with a couple like Claude Code tweaks that I've made, or at least lean into, to try and make it a little more screen reader accessible. But again, I similarly think that lots of things that are good for screen readers probably are going to be good tools for everybody across the board. Great. So let's jump right in. So I'm going to switch over to my terminal for just a second so I can initialize our project. So I'm going to run a mkdir command to get our repo set up. So we have our Slack summary extension. I'm making that directory, and I just open this up quick in VS Code. So VS Code is opening and initializing here. I'm going to open it up as big as I can, and we are going to jump in as this finishes loading. So I'm going to start here by making the PRD, like every good How I AI podcast goes. That's exactly right. And I'm going to do this with audio. So sometimes I'll use Wispr Flow for what I'm doing, but in this case, I actually find the VS Code Copilot audio to be pretty good. And so if I'm just doing something kind of quick like this, I'll just end up using the Copilot integration.
So you'll see I can do Ctrl-I, and then when I hit Ctrl-I again, it's going to dictate. So I'm going to pause my talking, switch into that mode, and just dictate out for us a little bit of this PRD, and then see what it comes up with for us. We want to build a simple PRD for a locally run Chrome extension whose job it is to exist in Slack alone, and when focused on a Slack message, you can hit the keyboard shortcut Control Shift 1, and it will search that message to find any external links. If there are external links found, it should open them up in hidden tabs, extract their content, and send it off to OpenAI to summarize. And we'll see here, I just finished that. It's going to go off and quickly generate a small PRD for that. This doesn't take very long. And I'm going to accept all the changes, because I find reading in that diff view to be particularly painful with a screen reader. It's not like terrible, but it's much easier just to read it in the document. And since there's a new document being made, I'll now look at it here and we'll see how this looks. So we have our goals. We want some privacy and security. Makes sense. Those are the user stories here. There's some functional requirements. It's got to parse. It's all making sense so far. There's some non-functional requirements and some out-of-scope items. There were images. Yep, I already demoed my image processing. And so, open questions. All right, where and how should we summarize? Makes sense. We don't need success metrics for this. It's internal. So let's answer some of these questions. So let's just select it all and add it. I'm going to dictate again, so I can talk for a while, and I'll start. We want to build a very simple PRD here for a locally run Chrome extension, where the job is, when in Slack in Chrome and hovering over a message that has focus, you can run the keyboard shortcut Control Shift 1, and that will look for any external links in the message.
If any are found, open them up in hidden tabs and extract that content and send it over to OpenAI. When in OpenAI, we should summarize them and extract three to five key takeaways from the article and return those to the user in a fully screen reader accessible modal which includes the article's title and a link out in a new tab to view the article. Okay. We now have this generating the PRD. So we can see it's got our keyboard shortcut. It's got our goals. Minimize user effort. Extreme accessibility is key for this. Some basic user stories. Some functional requirements. Cool. This looks good. And one thing I have to call out here is, you're an engineer, and that was a pretty good product description. And that resulted in a pretty good PRD. So one of the things I like about AI is, as I say, there are no lanes. If you are an engineer and you have an idea, you can write a very good PRD using a little bit of AI assistance. If you need a tip, I know one or two tools that can help you with it. Yeah. And this is no custom PRD prompts or anything like that. This is just mostly the foundational models' work here. Yep. Cool. So now we're going to hop into a new tab and spin up Claude Code in here. And ahead of this, as I mentioned, I built a couple of these Chrome extensions. So because I've done a few of them, I ended up building out a Claude skill to help me build more Chrome extensions. So after I built the first two, I had Claude look at both and figure out what patterns were common across them and work on a skill, so I could just build the third, the fourth, the fifth in a much simpler fashion and extract out that common piece. I have found Claude Skills to be a mixed bag in terms of actually being picked up automatically. So I am going to explicitly say, like, use the skill. Technically you're not supposed to do that, but I've definitely found that it's varied in its approach. So I'm going to make a prompt here that's going to more directly request the skill.
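The first step of the dictated PRD, finding external links in a message, is easy to prototype. Here is a sketch with an illustrative function name and a deliberately simple URL regex; a real extension would read Slack's rendered DOM rather than raw text, so treat this as an assumption-laden starting point.

```javascript
// Sketch: pull external links out of a Slack message's text, skipping
// Slack-internal URLs, as the first step of the summarizer extension.
function extractExternalLinks(text) {
  // Naive URL matcher; good enough for a prototype, not a full RFC parser.
  const urls = text.match(/https?:\/\/[^\s<>"')]+/g) ?? [];
  const internal = /(^|\.)slack\.com$/;
  return [...new Set(urls)].filter((url) => {
    try {
      return !internal.test(new URL(url).hostname);
    } catch {
      return false; // drop anything that isn't a parseable URL
    }
  });
}

// A background script would then open each link in an inactive tab, extract
// the article text, and send it to the summarization API.
```

Deduplicating with a `Set` before filtering means a link pasted twice in one message is only fetched and summarized once.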
And for folks that are wondering how to set up their own skills, we do have a How I AI episode. It is Introduction to Claude Skills, where I explain that Claude Skills are files in a folder. Sometimes they're zipped files in a folder. So if they seem mysterious to you, go check out that mini episode from, I think it was October or November, and learn how to make your own Claude Skills. They're pretty useful. Yeah, so in here, the prompt's just going to be, so at first, I mentioned the PRD, so we have that. And we're going to say, use the Claude skill for creating Chrome extensions to build out this PRD. And one thing, Claude Code has added this feature where you can edit a prompt in a code file instead of just in the terminal. And so in Claude Code, if you hit Control plus G, it will open that prompt in a text editor. So especially for me, where navigating that terminal is not super screen reader friendly, I now am navigating it in the same place that I write code on a day-to-day basis, which is very screen reader accessible. And so, again, other people may find this to be useful. You can craft deeper prompts, you can like Control-F in here, you can do whatever. It's just a file. And so I think it's a really useful tool they added a few versions ago to make it a little bit easier to work with. And I'm just going to put a note here: use my OpenAI key from my shared Chrome extension config. I don't want to keep pasting my OpenAI keys and stuff like that, so I ended up pulling out some shared config to share across all my Chrome extensions. That way I don't need to rinse and repeat that step over and over again. So I save this file now. And now whenever I close this, it's going to replace my prompt in Claude Code with that completed prompt. Oh, interesting. Yeah. So it's super effective, especially as you want to do a deeper prompt, to use this and not have to worry about the whole terminal side of things. Cool. Now we're going to kick this off.
And so I do have this Claude Code session right now in planning mode. You'll see it's requesting: use the skill, which is great. We want to use our skill. I'm going to shrink this terminal so we can see more of Claude. I don't really need this as much. Okay. It's going to run some commands that actually pull in that Chrome extension config that I talked about. Another thing I have to make Claude Code a little more accessible is, again, I'm not necessarily seeing everything that pops in as it's going. And so I set up a Claude hook that whenever Claude needs user input, it will basically like ding a bell on my computer, so I can hear a sound that is like, oh, Joe, you need to do something right now to work on this. I actually don't want Claude Code to read this file, so I'm going to say no, because this is some secret configuration. So I'm going to say no, don't read that file, that has an API key and you don't need it. If you need it, then use jq to extract the keys. So again, luckily I am an engineer, so it's not like fully vibe coding this from scratch. I know that there's a utility that will extract just the keys out of a JSON object, and Claude won't be able to see the values, which are the actual secrets there. So now, here we are, it's just going to make a symlink to this extension for us, so we don't need to worry about the configuration. And if I ever do change it, it will automatically update in all my extensions. So instead of having each one have their own API key, and if my key becomes invalidated for some reason, having to update all of them, I use this concept called a symbolic link. So I just link the same config file into all my extensions. So one change fixes everywhere. Yeah.
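The bell hook Joe mentions can be set up in Claude Code's settings file. The hooks schema has changed across versions, so treat this as an approximate sketch rather than Joe's actual configuration; it uses a macOS sound command, and a Windows setup like Joe's would need a different command for the chime.

```json
{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "afplay /System/Library/Sounds/Glass.aiff"
          }
        ]
      }
    ]
  }
}
```

The idea is that the notification event fires when Claude Code is waiting on the user, which is exactly the "you need to do something right now" moment Joe wants surfaced as audio instead of a visual highlight.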
And this is one of those things that's just easy to do when you're running these things locally, or just building stuff for yourself: you just make the maintenance and deploy of these really easy for yourself and make it as simple as possible for you to repeat building things and using the same, for example, API keys. And, you know, when you want to share all these publicly and publish them to the Chrome extension marketplace, we can do a little cleanup. Exactly. So here's our plan. And just like before, Control G also works to edit plans in the editor. So similarly, this is going to be a pain to read in this terminal for a screen reader. But also, if I want to make a tiny tweak to one thing here, I don't need to worry about telling Claude to update it or write it to a file. I just hit Control G, and it opens it up in this file. And now we have our full plan here, and you can just tweak different parts of it if you want and modify it. So it's another great usage of the Control G shortcut. Yeah, I want to call this out for people, because so many folks would get something like this, and then if it was wrong, kind of say, no, this is wrong, please update XYZ or ABC. And, you know, you're calling out not only is that a pretty inefficient way for you to interact with this file in terms of accessibility and your need for a screen reader, but it's also just not the fastest way to give it feedback. And so your ability to just take this, move it into code, use this Control G, I believe, edit it, close it out, run it, is just, again, a lot more efficient. Yeah. And so we'll do a quick run through this plan. I'm using, again, some more keyboard shortcuts just to break it down and fold the markdown headings. Because from my perspective, it's hard for me to visually scan, or I don't visually scan, the page.
So reading through a big file, I typically, in code or markdown, rely on folding, so I can collapse different sections and read them and then expand only the sections I care about. So I don't care about certain aspects here. But like, maybe I want to get deep into the error handling piece, so I'll expand that section and just read this part. So we've got some just logging, some key patterns. But this plan generally looks good to me. So now I will just save this and, again, close the file. And that is what is over here in the prompt. Because I didn't modify it, it didn't take any time. It was just ready. But if I modified it, it would take a split second, load that new plan in, and then it moves forward. Yeah. And I just have to call this out again for people who are maybe listening to the podcast or, again, are not paying attention to what it means to use a screen reader here, which is, you've got your little headphone plugged in right now. And I am so impressed that you're using the screen reader while walking us through this demo. And what I think is so fascinating about watching your workflow here is it's super efficient and very fast, even if you don't take into account you're using a screen reader. So the fact that you've been able to build these shortcuts, these tools, use Claude Code in a more effective way, and then you add on this layer of, and it makes using this kind of screen reader a lot more accessible to you, is just very impressive. And I don't want people to miss that there's this invisible layer that we don't get to see or hear right now that you're also putting in between this, which adds a little bit of micro friction.
Yeah. And I think one thing that's great about Claude Code: right now, visually, one of the options is selected in blue. I don't necessarily know which one it is, and when I use the arrow keys, the screen reader does not tell me what's selected. But Claude Code has done a great job of standardizing where one means yes, two means usually yes but with a variation, and three is no, or type something extra. So instead of using the arrow keys and enter, I can just be like, yeah, I want to move forward here, and hit the number one. They've done a good job of using lots of different inputs and lots of different ways to make this a little more accessible as it goes.

Yep. And that consistency, and maybe this is for folks that are building AI products, especially folks building these terminal UIs, which I think are really lovely and interesting to build: people love the terminal because it's so fast, and you want it to be both performance fast and UI/UX fast. If your user always knows one is X, two is Y, three is Z, then they can consistently use these keyboard patterns of one key or two keys to efficiently get through your UI. Taking that mental friction, that cognitive friction, off a user by driving consistency and patterns that people can either explicitly or implicitly learn is a really useful tool when you're working in a UI that is more constrained. Because, again, a terminal UI is naturally constrained to basically text.

Yeah. And Claude Code has been working on and has released a VS Code extension that is more of a GUI. I've just found that it's a little bit lagging behind some of the latest and greatest features, and I'm like, I want everything immediately. So I'm a bit spoiled.
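The consistent one/two/three pattern described here can be sketched as a tiny handler. All names below are hypothetical illustrations, not Claude Code's actual implementation:

```javascript
// Sketch of the numbered-option pattern: every confirmation prompt binds the
// same digits to the same meanings, so a keyboard or screen-reader user can
// answer without knowing which row is visually highlighted.
const OPTIONS = [
  { key: "1", label: "Yes" },
  { key: "2", label: "Yes, with a variation" },
  { key: "3", label: "No, or type something else" },
];

function handleKey(key) {
  const match = OPTIONS.find((o) => o.key === key);
  // Returning the label lets the caller both act on the choice and announce
  // it as plain text, which a screen reader can read back to the user.
  return match ? match.label : null;
}
```

Because the digit-to-meaning mapping never changes between prompts, the user can press 1 from muscle memory instead of arrowing through a visually highlighted list.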
But I think that's going to catch up as we go, too, and maybe be a potentially more screen-reader-accessible option for some of these things.

Yeah. And again, we love a beep-boop at the end of an agent completion. I love the Cursor sound, and I love that you're using one here for Claude Code, because I'm presuming, with your screen reader, you're not going to read this whole stream of output.

No, that's too much. And I think there's a big difference, too, between vibe coding things and production-quality code, right? The final output here is just for me; it's going to run in my Chrome, and I don't really care what the code looks like at all. Versus when I am building software for my full-time day job, actually building stuff that's going to be in the hands of millions of users and many developers, I do things a lot differently. The plan, I'm going to read in detail to see what it's actually going to do. The code, I'm going to be reading; I'm going to be doing smaller commits and reviewing it chunk by chunk and getting much more detailed. In this case, when the final output is just for a user of one, the code quality is a lot less important.

Yeah, what matters is: does it work?

Yes, exactly. Does it work, and, a little bit, did you leak your API key? Those are the two things we want to worry about. But other than that, we are on our way. And the other thing that I think is really fun here is this idea that you pick a platform or a framework for a set of your personal software, establish best practices through a skill, and then just rinse and repeat for other use cases. That's a really good way to get super fluent with some of these AI tools. I've seen a lot of people say: all I do is build markdown-based repos for my documentation and everything else; everything I build is just a markdown-based repo; and I've gotten really good at using Cursor for this.
And then you have this example of: not everything, I'm sure, but a lot of what you build is going to be Chrome extensions. So: make this framework, get it going, and then you can get really good at Claude Code because you're not relearning a technology on top of relearning a tool. I do think you get compounding effects by staying in the same technical space when you're trying to learn these vibe coding tools, because you're not trying to learn on two fronts; you're learning on just the tooling front, and some of the technical pieces have already been established.

For sure. And as we talked about before, the return on investment gets better and better, because building this one, as my third, takes half the time of the first one, and when I build the fifth, I think it's going to take a fifth of the time of the first one. I am getting better. And the Claude skill: each time I build one, I'm just going to feed it back in and be like, what was the skill missing? Make it better. So I think the payback period on the return on investment turns into, like, two days or something crazy. It's cool to spread out and try many different things, but it also just feels great to be like: I have an idea, and it is in my hands in under 30 minutes. It's just very cool.

Yeah. And again, this is one of those things where I tell people to work on their anti-to-do list. When you have a recurring task or a recurring point of friction, where you're constantly opening links from Slack in a new tab and then trying to come back to them later and read them, or you're constantly doing this "let's zoom in on this image and figure out what it is, and is it something I need to worry about," when you have those recurring tasks,
it's 100% worth it, instead of spending the time on the task itself, to spend the time never having to do that task again. And I like this idea of the payback period of personal software basically collapsing to zero, because it really illustrates where we are in terms of the efficiency and value out of AI: it is much more important to learn to build some of these tools than to do a task right now. The payoff is so much higher to learn how to automate the task versus doing the task. And if you can change your muscle memory so that every time you do the task, you pause yourself and say, actually, I'm going to learn how to automate the task, you can really create a lot of leverage in your day-to-day life, even in your personal life.

Yeah, for sure. And we are just about done here. I was checking back in on our little to-do list, which just finished. So it's doing some final steps here, but we're just about ready to actually load this in. Right now it's just analyzing to see... Yeah, perfect. So it's running this one last step, which is going to basically tell us: hey, go load this in. Once we are done with this, we need to actually load it into Chrome. Chrome has a mode for extensions called developer mode; you'll see I have that toggled on at the top. It basically means that you can install extensions not from the Chrome Web Store: you can install extensions from your local computer.
So you don't generally want to have that on, because somebody could have side-loaded in some credit card skimmer that you've imported or something. But if you know what you're doing, and you know you just built a thing, you can go in here and turn this on. And this looks a little bit different once you have it on, compared to those of you who don't have it on, in your Chrome extensions world, because we have these options on the side here to load an unpacked extension, meaning an extension that's not fully deployed in the app store. So let's hit this, and this is going to open up our little file browser here. Let's just pop back; we had called this the Slack summary extension. Okay, so that is now loaded in. The moment of truth: the thing with this software is that you can only test it, or at least easily test it, in Chrome. So we're going to actually try it out. Whenever you download a new extension, you do need to refresh your tab so it picks up that extension. If I tried to use it right now, it would still be working with the extensions I had at the time I loaded the tab. So I'm going to refresh this Slack tab so it picks up our new extension here. And our moment of truth is going to be: we're going to focus on this message, I'm going to hit my shortcut of Control-Shift-1, and we'll see, did we nail it? Oh, look at this. Kind of interesting. So it's processing our link, and we'll see, did it work? So it kind of worked, but we have raw JSON here, right? It's not perfect. So let's work on one level of refinement here. I'm going to take a quick snippet shortcut of this: we're going to take a screenshot and send it back to Claude Code and say, almost. Again, it would be cool if we one-shotted it and had a really cool demo, but it's not a perfect one-shot. So we'll make one slight tweak here.
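For readers who haven't built one: an unpacked extension is just a local folder whose manifest.json tells Chrome what to wire up. A minimal sketch of what a manifest like this extension's might contain (all names and values here are assumptions, not the actual file from the demo):

```json
{
  "manifest_version": 3,
  "name": "Slack Summary Extension (sketch)",
  "version": "0.1.0",
  "description": "Summarize the focused Slack link with a keyboard shortcut.",
  "permissions": ["activeTab", "storage"],
  "commands": {
    "summarize-link": {
      "suggested_key": { "default": "Ctrl+Shift+1" },
      "description": "Summarize the currently focused link"
    }
  },
  "background": { "service_worker": "background.js" }
}
```

With developer mode on, chrome://extensions offers a Load unpacked button that points at this folder. The commands block is what lets a shortcut like Ctrl+Shift+1 reach the extension without any mouse interaction, which matters for screen-reader workflows.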
And I'm actually using a custom slash command I wrote to deal with screenshots. Because I'm developing on Windows, but I actually run Claude Code in this thing called the Windows Subsystem for Linux, it doesn't have access to my Windows clipboard. So I can't do what everyone else can do, which is just hit Control-V here and paste it. So I added a slash command here called paste-image that uses some PowerShell shortcuts to pull images out of my clipboard and share them with Claude. And I can take that snippet and share it. Similarly, I used to copy a file, save it in Windows, move it to Linux, and then import it with an @-mention, and I was like, there's got to be a better way. I use this slash command all the time now for building stuff out.

This is extreme software engineer stuff, where you're like: okay, I run on this OS, but I run my terminal on this, and now I can't access my clipboard, but I still like it that way, so I'm going to write a little script to give myself a two-word shortcut to make this happen.

And so it pulled this in, and it just drops screenshots into my tmp directory. So it pops up, and, yep, please read it. So again, it's saving those two minutes every day. Adds up fast. And so it's going to fix this little JSON piece here. And again, it was kind of close: it got the right content, it just didn't display it, right? So now it's going to go off and work on this for another second here, and we should hopefully have a quick update. And the nice thing is, with Chrome extensions in developer mode, there's an easy one-click button that will update and grab the latest copy of all of your extensions. So if you're working on multiple at a time or whatever, you can just hit that button and it updates for us. So we'll see, it's finishing up here.
And one of the things that it's doing, and I'm calling this out for people who are writing queries to OpenAI, is moving the JSON out of the prompt. The prompt was saying "please return JSON," which just returns JSON as a string of text; it moved that to instead change the response object to be JSON, so it can actually be read by the Chrome extension in a more structured way. Control-Shift-1. See if it works. Drum roll... Beautiful. We got it. Nailed it. So again, we've got these takeaways. We can now action on this, and I can decide: do I actually care that Iterable added MCP? I do, spoiler alert. But yeah, we did it, I think, in under 25 minutes here.

And is it working in your screen reader? Did the accessibility piece work?

Yeah, fully accessible. Modals in general can sometimes be problematic, because a screen reader will sometimes read behind the modal. But surprisingly, although a bunch of the web is not accessible, if you tell some of the foundation models "please make this accessible," the accessibility standards are actually incredibly well documented, so they do a great job with this. They use the right, it's called ARIA, ARIA roles, and make this modal have the right focus, not letting you read behind it. So out of the box they're not going to make everything extremely accessible, but if you say, hey, go do this, it'll gladly go back and make it accessible.

So, a meta question. And maybe before I get into the meta question, let's just recap for folks what we saw. You built a Chrome extension that's scoped to the web version of Slack. That Chrome extension, which is running locally because you've toggled on developer mode in your Chrome settings, will take a focused link that's shared by a colleague or somebody in Slack. It will go out, parse that link, and pull out some key takeaways.
The way you built this is: you bopped into VS Code, you dictated a short PRD, you let AI build that out, you made minor tweaks to it, but basically shipped it. You used Claude Code, including some custom slash commands and a Claude skill specifically around building Chrome extensions, to then scaffold out that Chrome extension. You showed us Control-G in Claude Code, where you can just modify prompts and inputs as code, which is much more efficient both from an accessibility perspective and from a general user experience perspective. And you showed us a custom screenshot slash command, so your very special, as I say, unique-snowflake software engineer environment can operate as you want, even if there are some technical hurdles. And now we have this great little extension that I want running myself. So this is great, Joe. I love this.

I want to hop into some lightning-round questions, and this has given me an idea for one I really want to ask you, which is about MCPs. One of the things I think is so interesting about MCPs is that they allow you to bypass all UI and just get to the bones of what a SaaS product does. And I can imagine that while there are lots of okay accessible enterprise software products, not all of them are built for maximum accessibility, either in their design or in the way they're implemented. Have you found that MCPs, and just that interface into some of these SaaS tools, have improved accessibility for you? Or not? What are your thoughts there?

I think the ultimate goal of mine is that I would love to do everything in one place and not have to switch tools, whether it's the context-switch cost or just the switch cost, and having MCPs has been great for that. Luckily, I actually think a lot of enterprise software, surprisingly, is being built pretty accessibly. I want to give strong kudos to Google Docs. Google Docs, for what it does, is so crazy accessible.
And the work that people don't know goes into making it that way, where every single thing being done is communicated to the screen reader, basically letter by letter, via this secret system people don't know about called ARIA live announcements, is kind of crazy. But I do find: hey, I need to get something from three sites, and that's kind of painful. Being able to just use the Notion MCP and the Coolbox MCP, or Glean, in there is great. Notion in particular is one that is a little bit harder. I think they do their best from an accessibility standpoint in some ways, but there's a lot going on in a Notion post, a Notion article. So that's another example where it's like: yeah, I can just pull this down and work in the markdown version, and that's going to be a lot easier. And again, it's way easier to navigate with these keyboard shortcuts and the folding features. So I will pull down some Notion posts and just be like, dump this into a markdown file for me, and then use my little code shortcuts to help navigate some of those pieces.

Yeah, I love that. And so my second question, again, is around personal software and the ability to translate. As you said, you can take a pretty complex Notion page and turn it into markdown that allows you to read and parse it in a much more efficient way. I think this ability to translate files or formats is a really exciting part of AI, and we've hinted at a couple of things in this episode: a little bit of image-to-text, a little bit of voice-to-text. But I'm curious: what are you most excited about in the multimodal world of AI? What has recently come out that's caught your eye and made you excited, or what are you hoping to see in the next couple of months or years that could really open things up, either for you personally or just as a product builder?

Yeah, I'll talk in the personal space.
So I have two kids, a five-year-old and a three-year-old, and reading books to them is a challenge. I don't know braille; I've been trying to learn, but it's a hard skill to pick up at 33. I've memorized a handful of books, and I'll read those, but it's not really reading, it's fake reading. But big shout-out to the Gemini app and its live share features: I can now read any book. It's not the same as me reading it, but me and my three-year-old, Cole, will sit on the couch, he'll bring a book over, and I'll be like: hey, I can read this one, or, no, Gemini can read this one for us. And we'll turn the pages and say, Gemini, next page, and Gemini will read that page, and then we'll turn to the next one and it'll read the content of that. So I think just equitable access to everything is great. And that piece is one thing I was always afraid of. I was like, can I read stories? I can memorize stories, I can tell stories, but there is something to your son saying, I want to read this book, and you having to say, sorry, I can't. And now that "sorry, I can't" becomes "I can," with the assistance of so many different tools. I think the Gemini one is particularly useful, and I've found it to be the strongest for just easy sharing: you just say next page, it knows all the context, and it immediately starts reading. I have Meta glasses, I have ChatGPT Pro, I've got all these things, but I think Gemini is doing the best job of it right now. And this has me trying Gemini 3 Pro, which just came out, to see if any of that makes it even better.

Well, that is a very, very sweet story. And yes, I was just thinking I wish the Meta glasses, which I love and use every day, would also do a better job there.
But it's awesome to hear that Gemini can add to that special time with kids, which, as you and I were talking about before the show as fellow parents of boys, is just such a special time. So my last question is: when AI is not listening, and I'm curious if you type or if you speak this, what are your prompting techniques? Have you ever Wispr Flow-yelled at your AI? Do you have any tricks for us for when AI, or Claude, gets really, really stuck?

It's kind of a nerdy answer, which makes sense for me, but my typical mode is basically: clear the context and start fresh as much as possible. I think a lot of people will try to keep massaging it, like, if I just send this one extra prompt in this conversation, it'll figure it out. No. You just have to start from scratch and take the learnings you have from the last time. Sometimes I'll be like: this hasn't been going great; what did you learn about this? Take that and feed it into the next prompt. But mostly it's: let's start from scratch, something clearly got poisoned in this context. And when we start from scratch, everything just feels smoother.

I love it. Well, Joe, thank you so much for showing this. It's just one of those workflows we haven't seen before, and everybody can find a use case. I am thinking of all sorts of little micro frictions in my own life where a keyboard shortcut or two could really make things a little bit better for me. So where can we find you, and how can we be helpful to you?

Yeah, as I mentioned, I'm at BabyList, and BabyList is very actively hiring. If you are somebody who likes using AI in your day-to-day building of software and you're a software engineer, we are a Ruby on Rails and React shop, hiring across the board at all different levels. So check us out at BabyList.com. And personally, I'm on LinkedIn as well.
So feel free, especially if you have any accessibility questions or any questions on the Chrome extension piece, to reach out; I'm always happy to answer on LinkedIn.

Well, thanks for joining How I AI. Thank you.

Thanks so much for watching. If you enjoyed this show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.