Using OpenAI’s Speech-to-Text with Siri Shortcuts for Automatic (and Cheap) Audio Transcriptions on macOS
In today’s fast-paced digital world, having access to efficient and accurate transcription services is essential. OpenAI’s Speech-to-Text API offers an excellent solution for developers looking to incorporate automated transcription into their applications. Plus, it costs only about US$0.006 per minute of audio (as of April 2023). This blog post will guide you through creating a Siri Shortcut on your Mac that uses OpenAI’s Speech-to-Text API to automatically generate transcriptions from video and audio files.

Understanding OpenAI’s Speech-to-Text API and Siri Shortcuts

OpenAI’s Speech-to-Text API is designed to convert spoken language into written text, providing accurate, fast transcriptions. It can be used in a variety of applications, such as voice assistants and transcription services. Siri Shortcuts, on the other hand, are customizable, multi-step actions that streamline everyday tasks on your Mac.

Prerequisites for Creating the Siri Shortcut

To create the Siri Shortcut, you will need:

The Quickest Way: Download the Pre-built Siri Shortcut

If you prefer not to create the Siri Shortcut manually, you can download the pre-built shortcut using the following link:

https://www.icloud.com/shortcuts/dcaa8c4c0c584b7083e0ff552ce016c0

The shortcut handles video files as well as audio files, but it won’t work with every format. If your video or audio file is very long (typically around an hour or more), try splitting it into shorter segments before passing it through the Siri Shortcut, because the final file still has to be under 25MB to stay within OpenAI’s upload limit.
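For readers who want to see what a request to the Speech-to-Text API looks like outside of Shortcuts, here is a minimal Python sketch. It is not taken from the pre-built shortcut; it only illustrates the kind of upload such a shortcut would make. It assumes the `whisper-1` model and the `/v1/audio/transcriptions` endpoint, reads "25MB" as 25 × 1024 × 1024 bytes, and uses illustrative helper names (`fits_upload_limit`, `build_multipart`, `transcribe`) that do not come from any library.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"
MAX_BYTES = 25 * 1024 * 1024  # assumption: the "25MB" limit read as 25 MiB


def fits_upload_limit(path, limit=MAX_BYTES):
    """Return True if the file is small enough to send in one request."""
    return os.path.getsize(path) < limit


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {mime}\r\n\r\n'
    ).encode("utf-8")
    parts.append(header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, response_format="json"):
    """Upload one audio file and return the transcription.

    With response_format="json" the text is extracted from the JSON reply;
    for any other format (for example "srt") the raw reply is returned.
    """
    if not fits_upload_limit(path):
        raise ValueError("File is too large; split it into shorter segments.")
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = build_multipart(
        {"model": "whisper-1", "response_format": response_format},
        "file",
        os.path.basename(path),
        audio,
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = response.read().decode("utf-8")
    if response_format == "json":
        return json.loads(payload)["text"]
    return payload
```

A call such as `transcribe("interview.m4a", os.environ["OPENAI_API_KEY"])` would return the plain transcription, where the file name and the environment variable are placeholders. Passing `response_format="srt"` asks the API for a subtitle file instead.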
Step-by-Step Guide to Creating the Siri Shortcut

Step 1: Open Siri Shortcuts and create a new shortcut

Step 2: Configure the shortcut actions

Step 3: Save and test the shortcut

By following these steps, you can create cheap, easy-to-use speech-to-text transcription on your Mac, or even on your iPhone or iPad, with Siri Shortcuts. By referring to OpenAI’s API documentation, you could easily adapt the shortcut to produce a subtitle file (.srt) or even to translate audio from other supported languages into an English transcription.

In our experience, this speech-to-text functionality is more accurate than the built-in (Apple or Siri-powered) transcription, and even more accurate than the subtitle track that Google generates automatically once a video has been processed by YouTube.

The primary downside of this (and many other) solutions is that you’re submitting your audio file directly to OpenAI, which could raise some privacy concerns. But this is a great solution if you’re looking for an easy way to transcribe audio or video files.

Free Transcriptions Using Siri Shortcuts

If you have a powerful computer, such as a Mac with an M1 chip or better, you can run OpenAI’s Whisper model locally. This also addresses any privacy concerns you may have. This option does require some terminal commands, and is not recommended for anyone who isn’t comfortable using the Terminal.

Using this shortcut, the files will be transcribed locally, and it will automatically output a .txt file of the transcription in the same directory. This requires Python (and Homebrew) to be installed on your Mac. This is an easy-to-follow guide on how to install these requirements.
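For a rough idea of what local transcription involves, here is a short Python sketch. It is not the shortcut’s own script. It assumes the open-source `openai-whisper` package (installed with `pip install -U openai-whisper`) and `ffmpeg` (for example via Homebrew) are available, and the function names `transcript_path` and `transcribe_locally` are purely illustrative.

```python
from pathlib import Path


def transcript_path(media_path):
    """Return the .txt path next to the media file (same folder, same name)."""
    return Path(media_path).with_suffix(".txt")


def transcribe_locally(media_path, model_name="base"):
    """Transcribe a file on this machine and write the text beside it."""
    # Imported lazily so transcript_path() works without the package installed.
    import whisper  # assumption: provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(str(media_path))
    output = transcript_path(media_path)
    output.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return output
```

Calling `transcribe_locally("lecture.mp4")` (a placeholder file name) would write `lecture.txt` to the same directory. Larger models such as `"small"` or `"medium"` are generally more accurate but slower, so the `"base"` default here is only a starting point.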