• jaypeters.dev
  • Posts
  • Speech-to-Text, Scaling, and A Personal Brand System

Speech-to-Text, Scaling, and A Personal Brand System

The Weekly Variable

The Weekly Variable

Created a podcast, recorded a rough tutorial editing that podcast, became a speech-to-text expert, and listened to some podcasts.

Topics for this week:

How to Speech-to-Text

What I learned from trying to turn video into a transcription.

The easiest and cheapest, but manual option is to upload the video to YouTube and wait.

YouTube will automatically generate a transcript for you, but it may take a few hours after you upload for the captions to show up.

Tempting, but once you try to use YouTube’s API to automate this, you quickly run into their API quotas which I think you have to request to increase by presenting your case to YouTube why you need more uploads and transcription access.

Rather than deal with that, I could just use OpenAI’s Whisper.

OpenAI offers a hosted solution through an API so I could send them a video or audio file and they’ll send back the transcript for $.006 per minute which is pretty cheap! $.72 for a full 2 hour transcription seems like a good price.

The only problem there is they limit their file size to 25 MB. The HD stream I was testing with is 2.71 GB. It would take 109 video files, sent individually to get the whole thing transcribed.

Technically I could rip the audio from the video and that would reduce the file size considerably, but I’d still be looking at probably 20+ files to queue and manage.

I debated it. It would be kinda fun to build a system that would handle that process.

Instead I decided to run Whisper myself since OpenAI published the code.

I had Claude write up a Docker image that would run Whisper in a container and then I could host that image wherever I wanted.

With a little trial and error, I had a Whisper service running and transcribing a handful of files directly on my computer.

I haven’t hit it with the full 2 hour stream yet because it’ll take some time. It seems to take about 30% of the file time to transcribe depending on the density of the words so the the 2 hour stream should take a little less than an hour to transcribe. I’m sure I’ll set it and forget for an hour sometime this weekend.

After that, OpenAI recommended using a smarter model like GPT-4o to re-read the transcription and check for any errors or acronym issues because Whisper is really only trained on common spoken language and isn’t as complex as GPT-4o. Whisper might struggle transcribing audio that’s full of technical jargon, but I could send the transcript to GPT-4o with the context that this is a software stream and it should be able to correct any errors from Whisper.

This recursive approach to AI its true hidden power, but we’ll come back to that topic.

For now, I know how to use Whisper for Speech-to-Text and AI pipeline expands it’s toolkit.

Image-generation may be next on the list…

 

Scaling and Kubernetes

After a solid month of coordinating schedules, episode 9 of The Dev Sync finally went live on Monday!

Since I’ve been on a Kubernetes (k8s) kick lately, Eric, Will and I talked about scalability and how broad of a term that could be.

They both immediately jumped into clarifying questions about what scalability meant and what kinds of scalability we were talking about, which I expected nothing less from them.

I think I made the same joke in the podcast, but like any good senior engineer, it can all be summed up with the universal answer: “it depends”.

Every situation is different. It depends on what your scaling, how you want to scale it, how was it built and what tools are available, what kind of budget is available, what’s the timeline and ultimately what’s the final goal or outcome?

Scalability is a fun one. It’s one of those concepts that keeps the internet and apps running for millions of users every day, but is completely taken for granted.

If you want to jump into a whirlwind conversation about what to do when your app hits a million users overnight, check out the episode below:

Editing The Dev Sync

The Dev Sync episodes have been about once per month right now so every time I go to edit an episode, it takes a few tries to remember exactly what I did the last time.

This time I got smarter and recorded practically the entire process.

Most of it is just me scanning through the recording, looking for clips to take out of context - which I need to write some code to find those clips for me.

But I really like the idea of recording processes for documentation purposes like that. Makes it easier for me to remember what I did last time and also makes it possible to pass the process off to someone else if needed.

Thankfully, AutoPod saves quite a bit of editing time. Once I have a system for automatically pulling out clips for the intro, this editing process could be 30 minutes instead of 3 hours.

Until then, here’s how I find clips and put them at the front of the episode:

A Personal Brand Content System

We’re on a roll with the content theme so let’s not break the streak.

Matt Gray offered his system for building a personal brand based on your own content in this recent YouTube video and I keep toying with applying that system to this newsletter.

Nearly 50 newsletters plus 8 published blog articles (with a ton more half-written or almost finished blog posts sitting in Notion), I have a backlog of content to recycle and post on all the social media platforms.

Like everything, I’m blocked by overthinking even though I shouldn’t be. A personal brand is easy enough because you are the only you. No one else can have your personal brand so it’s hard to go wrong. Trying to narrow down the personal brand to focus on a few specific topics is the real struggle for me…

I will be looking through my content to use with Matt’s system and get the ball rolling on LinkedIn, X, and YouTube again.

The one post I have on LinkedIn still trickles in about 40 impressions a week, which is kind of crazy. That could quickly compound with additional quality posts so I will certainly entertain that idea.

I’m always up for a coaching or chat session so that would be the outcome of the personal brand but we’ll get started and see what evolves out of it.

I’ve made a few attempts in the past, but with a proper plan and a proper AI pipeline to aid in the creation and posting process, I’m sure I can make a personal brand content system stick.

Consistency is key!

The Tour Begins

I called it!

Alex Hormozi appeared on Chris Williamson’s podcast this week, and confirmed that his book $100M Money Models will be releasing later this year.

After Hormozi showed up on the My First Million podcast, I had a feeling Chris’s had to be in the future and it turns out it was!

Explaining the book to Chris, Alex said that he actually wrote $100M Money Models years ago, but then realized he needed to write $100M Offers and $100M Leads before releasing Money Models for people to have context on making money first.

After writing the other 2 prequels, in true Alex fashion, he re-wrote models 9 times last year but now says it’s ready to go this year.

He’s revealed his process for marketing something like this in $100M Offers so he’ll start “quietly” showing up on podcasts, then as the release day gets closer, he’ll “shout” about the book on all of his socials for the last few weeks leading up to the release. Pretty cool to see the process in action; a master at work.

And the conversation between Chris and Alex is another solid addition to the series, with harsh advice and lots of long dramatic pauses. I thoroughly enjoyed it, even at a solid 3 hours:

And that’s it for this week!

Those are the links that stuck with me throughout the week and a glimpse into what I personally worked on.

What did you do this week? Listen to any good podcasts? Let me know!

And if you want to start a newsletter like this on beehiiv and support me in the process, here’s my referral link: https://www.beehiiv.com/?via=jay-peters. Otherwise, let me know what you think at @jaypetersdotdev or email [email protected], I’d love to hear your feedback. Thanks for reading!