I have trouble paying attention when I read.
My eyes move across the page, but nothing sticks. Sometimes I'll get through an entire paragraph and realize I absorbed none of it. I scroll back. Try again. Still nothing. It's not that I don't care about what I'm reading—it's that my brain doesn't register it unless I hear it, too.
I've known this about myself for years: I retain information best when it comes in through both eyes and ears. Reading with text-to-speech turned on is like flipping a switch in my brain. Suddenly, the words actually make sense.
So when I discovered that the mobile Claude app had built-in TTS—but the web version didn't—it felt like someone handed me a tool and then quietly took it away again. And this isn't about convenience. It's about cognition. One version lets me engage meaningfully; the other makes me feel broken.
The worst part? I've been here before. This is a familiar loop:
- Discover a tool that almost helps
- Hit a wall due to platform gaps or missing features
- Wait for someone to fix it—or struggle without
I've hit this pattern dozens of times. A notification system that's almost gentle enough. An IDE theme that's close to the right contrast. A commit workflow that would work perfectly if it just had one more hook. Each time, the calculation is the same: is this painful enough to justify days of work? Usually, the answer is no. So I adapt. I cope. I wait.
And maybe most people just... move on. But this time, I didn't want to wait. I needed it now. Not a week from now. Not when it made the product roadmap. I needed to fix it today, for me.
So I did.
The 45-Minute Solution
I opened Claude Code and typed out exactly what I needed:
> I have an idea for a utility to help me with my autism. I'm big on TTS because reading has high cognitive load for me. And there's lots of really good small TTS models out there, but they're hard to set up. I want to build a tool for Windows in C# that hooks into an easily setup local TTS model like kokoro.js or something and allow me to highlight ANY text ANYWHERE and have it read by a good TTS voice. Wanna help me build that?
What came back wasn't a prototype or a proof-of-concept. It was a working architecture: a .NET 8 Windows service with global hotkey registration, clipboard monitoring, Docker container management for the TTS engine, and audio playback through NAudio.
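The global hotkey piece is exactly the kind of Win32 plumbing I'd have dreaded writing by hand. To give you a feel for it, here's a minimal sketch of that pattern in C#; this is my reconstruction of the standard approach, not the actual LocalTTS source, and the class and event names are made up:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

// Hidden window that owns the system-wide hotkey registration.
// HotkeyWindow and HotkeyPressed are illustrative names, not LocalTTS internals.
internal sealed class HotkeyWindow : NativeWindow, IDisposable
{
    private const int WM_HOTKEY = 0x0312;   // posted by Windows when the hotkey fires
    private const uint MOD_CONTROL = 0x0002;
    private const uint MOD_SHIFT = 0x0004;
    private const int HOTKEY_ID = 1;

    [DllImport("user32.dll", SetLastError = true)]
    private static extern bool RegisterHotKey(IntPtr hWnd, int id, uint fsModifiers, uint vk);

    [DllImport("user32.dll", SetLastError = true)]
    private static extern bool UnregisterHotKey(IntPtr hWnd, int id);

    public event Action? HotkeyPressed;

    public HotkeyWindow()
    {
        CreateHandle(new CreateParams()); // invisible window; exists only to receive messages
        // Ctrl+Shift+R: 0x52 is the virtual-key code for 'R'
        if (!RegisterHotKey(Handle, HOTKEY_ID, MOD_CONTROL | MOD_SHIFT, 0x52))
            throw new InvalidOperationException("Hotkey is already in use by another app.");
    }

    protected override void WndProc(ref Message m)
    {
        if (m.Msg == WM_HOTKEY && (int)m.WParam == HOTKEY_ID)
            HotkeyPressed?.Invoke();
        base.WndProc(ref m);
    }

    public void Dispose()
    {
        UnregisterHotKey(Handle, HOTKEY_ID);
        DestroyHandle();
    }
}
```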
I made a few adjustments: tweaked the Docker container lifecycle management, added logging, simplified the settings UI. But the core implementation? That was handled. The pain I would've expected—wrestling with Win32 API documentation, debugging COM threading issues, figuring out how to pipe audio correctly—just... didn't happen.
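That lifecycle management mostly amounts to checking whether the Kokoro container is up and starting it if not. Roughly, the idea looks like this, shelling out to the docker CLI; the container name, port, and image here are assumptions for illustration, not the exact values LocalTTS uses:

```csharp
using System.Diagnostics;

// Sketch: ensure the TTS container is running before we try to speak.
// "localtts-kokoro" and the image/port are assumed values, not LocalTTS's actual config.
internal static class TtsContainer
{
    public static void EnsureRunning()
    {
        // `docker ps -q` prints container IDs; empty output means nothing matched
        if (Run("docker", "ps -q -f name=localtts-kokoro").Trim().Length > 0)
            return;

        Run("docker", "run -d --rm --name localtts-kokoro -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu");
    }

    private static string Run(string file, string args)
    {
        var psi = new ProcessStartInfo(file, args)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using var process = Process.Start(psi)!;
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }
}
```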
Forty-five minutes later, I had LocalTTS: a system tray app that lets me highlight any text anywhere, hit Ctrl+Shift+R, and hear it read back immediately using Kokoro TTS running locally in Docker.
It's not just a clever hack—it's real code. Documented. Running right now as I write this. Published on GitHub. Not as a polished product, but as something good enough to use, built specifically for my brain.
Imperfect But Real
It's not perfect.
Let me be clear about that up front. LocalTTS requires .NET 8 build tools and Docker. It's Windows-only. If you don't already have developer muscle memory, the setup process isn't exactly friendly. There are bugs—sometimes it stumbles on weird characters or long selections. The voices are configurable but limited to what Kokoro supports. There's no installer. The UI is just a system tray icon with a right-click menu and a settings form. If this were a product, it wouldn't ship.
But that's the point. It's not a product. It's a tool.
And it works.
And here's the wild part: this isn't trivial software. It's using platform-level APIs via DllImport, managing Docker container lifecycle, handling global hotkey registration, piping raw text into a FastAPI service running in a container, and streaming audio back through NAudio. That's the kind of thing that used to take me a weekend—days of reading docs, debugging obscure Win32 behavior, tracking down why the audio buffer wasn't flushing correctly.
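For the curious, the last two steps of that pipeline, sending text to the FastAPI service and playing the reply through NAudio, boil down to something like the sketch below. The endpoint path, payload shape, and voice ID are my assumptions about the container's API rather than verified LocalTTS internals:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using NAudio.Wave;

// Sketch: POST selected text to the local TTS container and play the WAV reply.
// Port, route, and JSON fields are assumed; adjust to whatever your container exposes.
internal static class SpeechClient
{
    private static readonly HttpClient Http = new() { BaseAddress = new Uri("http://localhost:8880") };

    public static async Task SpeakAsync(string text)
    {
        var response = await Http.PostAsJsonAsync("/v1/audio/speech", new
        {
            model = "kokoro",        // assumed model identifier
            input = text,
            voice = "af_heart",      // hypothetical default voice
            response_format = "wav"
        });
        response.EnsureSuccessStatusCode();

        // Buffer the audio so NAudio can read the WAV header from a seekable stream
        byte[] wav = await response.Content.ReadAsByteArrayAsync();
        using var reader = new WaveFileReader(new MemoryStream(wav));
        using var output = new WaveOutEvent();
        output.Init(reader);
        output.Play();
        while (output.PlaybackState == PlaybackState.Playing)
            await Task.Delay(100);   // keep reader/output alive until playback ends
    }
}
```

Wire the hotkey event to a clipboard grab and this method, and that's essentially the whole loop.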
In a world where the alternative was nothing, that's more than enough. I didn't need a marketplace-ready solution. I needed relief. I needed something that could close the loop between seeing and understanding. And now I have it.
It's buggy. It's rough. It's also the reason I can now remember what I read far more easily.
What This Actually Means
This isn't a productivity story. It's a story about agency.
Because here's what changed: I went from "I wish the Claude web interface had TTS" to "I have TTS for the Claude web interface and any other selectable text on my system" in less time than it takes to file a feature request and get a response. The waiting became obsolete. The platform gap—mobile has this feature, web doesn't—stopped mattering, because I could bridge it myself.
Not everyone can do this. This still requires developer context—understanding what Docker is, knowing how to navigate build tools, having the mental model of how services talk to each other. But here's the thing: I could always do this. In theory. What changed is that the theory-to-practice gap collapsed. The thing that kept me from solving my own problems wasn't capability. It was cost. Time cost, friction cost, the cost of maintaining motivation through tedious implementation details I already understood conceptually.
AI didn't solve my problem. I did.
AI just removed the friction. It gave me back my agency.
What If AI, But For Good?
Maybe you need notification sounds that ramp up gradually instead of jolting you out of hyperfocus. Maybe you need your IDE to dim the screen during compilation so the sudden UI shift doesn't break your flow.
No one's building that for you. The business case doesn't exist. The market is too small, the need too specific, the ROI too uncertain.
But if you're a developer who lives at the intersection of technical capability and cognitive difference—if you know what you need but have been blocked by the sheer implementation cost of building it—something fundamental just shifted.
You don't need to wait anymore. You don't need to file feature requests or hope someone sees your use case as viable. You don't need to choose between spending a week on a personal tool or just living with the friction.
You can just build it. Today. For yourself.
And maybe that's the future worth building toward: not AI that works for everyone, but AI that lets each of us build exactly what we need.