Efficient, accurate transcription is useful in all sorts of work, and OpenAI’s Speech-to-Text API is an excellent option for developers who want to add automated transcription to their applications. It is also cheap: about US$0.006 per minute of audio (as of April 2023). This blog post will guide you through creating a Siri Shortcut on your Mac that uses OpenAI’s Speech-to-Text API to automatically generate transcriptions from video and audio files.
Understanding OpenAI’s Speech-to-Text API and Siri Shortcuts
OpenAI’s Speech-to-Text API converts spoken language into written text. You upload an audio file and get an accurate transcription back, usually after a short wait. It can be used for various applications, such as voice assistants, transcription services, and more. Siri Shortcuts, on the other hand, are customizable, multi-step actions that streamline everyday tasks on your Mac.
Prerequisites for Creating the Siri Shortcut
To create the Siri Shortcut, you will need:
- A Mac running a version of macOS that includes Siri Shortcuts (Monterey, Ventura, or later). This will also work with the latest versions of iOS and iPadOS that have Siri Shortcuts.
- An OpenAI API key, which can be obtained by signing up for the OpenAI API platform.
The Quickest Way: Download the Pre-built Siri Shortcut
If you prefer not to create the Siri Shortcut manually, you can download the pre-built shortcut using the following link: https://www.icloud.com/shortcuts/dcaa8c4c0c584b7083e0ff552ce016c0
The pre-built shortcut handles video files as well as audio files, but it won’t work for all formats. If your video or audio file is very long (typically around 1 hour or longer), try splitting it into shorter segments before passing it through the Siri Shortcut, because the final file that gets uploaded still has to be under OpenAI’s 25MB limit.
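If you’d like to check a file against that limit before running the shortcut, a rough sketch in Python might look like the following. The function names are made up for illustration, and treating “25MB” as 25 × 1024 × 1024 bytes is an assumption. Note that the limit applies to the audio that is actually uploaded, not to the original video.

```python
import math
import os

# Assumption: "25MB" is interpreted as 25 MiB. If in doubt, aim lower.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def segments_needed(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Estimate how many pieces a file of this size must be split into."""
    # Always at least one segment, even for an empty file.
    return max(1, math.ceil(size_bytes / limit))


def check_file(path):
    """Return (fits_in_one_upload, estimated_segment_count) for a file."""
    size = os.path.getsize(path)
    return size <= MAX_UPLOAD_BYTES, segments_needed(size)
```

This is only an estimate based on file size. Splitting is normally done by duration, so you may want an extra segment or two as a safety margin.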
Step-by-Step Guide to Creating the Siri Shortcut
Step 1: Open Siri Shortcuts and create a new shortcut
- Launch the Siri Shortcuts app on your Mac.
- Click the “+” button to create a new shortcut.
Step 2: Configure the shortcut actions
- Add a “Choose File” action to allow selecting the audio file to be transcribed.
- Insert a “Convert Media” action and set the output format to “Audio Only” to extract the audio from the selected file.
- Add a “Get Contents of URL” action to send a POST request to OpenAI’s Speech-to-Text API using your API key.
- Set the URL to https://api.openai.com/v1/audio/transcriptions
- Configure the request headers with the following key-value pairs:
- “Authorization”: “Bearer YOUR_API_KEY” (replace “YOUR_API_KEY” with your actual OpenAI API key)
- “Content-Type”: “multipart/form-data”
- Configure the request body with the necessary Form parameters, such as the extracted audio data and any desired API options. This should look something like this:
- file="Encoded_Media_Filepath" [required]
- model="whisper-1" [required]
- response_format="text"
- language="en" (or another supported language, given as an ISO-639-1 code)
- prompt="Hello, welcome to my lecture." (It can be useful to add a prompt here if you’d like the transcription to include punctuation, or to clarify certain words, etc.)
- In the case of a “text” output, all you need to add is “Show Contents_of_URL in Quick Look”. This will display the generated transcription with TextEdit (or your default text editor application), which you can then save from within TextEdit.
- Within the inspector on the right-hand side, under Details, select “Use As Quick Action” as well as “Finder”. This will allow you even easier access to the speech-to-text functionality, as you’ll be able to right-click a file from within Finder and immediately submit it to the Siri Shortcut.
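For readers who want to see what the “Get Contents of URL” step amounts to outside of Siri Shortcuts, here is a minimal Python sketch of the same POST request, using only the standard library. The helper names (build_multipart, build_request, transcribe) are hypothetical, and this is an illustration of the request described above rather than what the shortcut runs internally.

```python
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(audio_path, api_key, language="en", prompt=None):
    """Build (but do not send) the POST request with the form fields above."""
    fields = {"model": "whisper-1", "response_format": "text", "language": language}
    if prompt:
        fields["prompt"] = prompt
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        fields, "file", os.path.basename(audio_path), audio
    )
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def transcribe(audio_path, api_key):
    """Send the request and return the plain-text transcription."""
    with urllib.request.urlopen(build_request(audio_path, api_key)) as resp:
        return resp.read().decode("utf-8")
```

Calling transcribe("lecture.m4a", "YOUR_API_KEY") would upload the file and return the text, assuming the key is valid and the file is within the size limit. One detail worth noticing is that the Content-Type header here carries a boundary string, which HTTP clients usually generate for you.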
Step 3: Save and test the shortcut
- Give your shortcut a descriptive name, such as “Transcribe Video/Audio.”
- Save the shortcut and test it by selecting a video or audio file to transcribe.
By following these steps, you can create a cheap, easy-to-use speech-to-text tool for your Mac, or even your iPhone or iPad, with Siri Shortcuts. With the help of OpenAI’s API documentation, you could easily adapt it to produce a subtitle file (.srt) or even to translate audio from other supported languages into an English transcription.
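As a rough guide to those adaptations, the sketch below lists the request settings that would change for each case. It reflects our reading of OpenAI’s API documentation at the time of writing (an "srt" response format, and a separate translations endpoint), so please check the current documentation before relying on it. The function name and mode labels are invented for this example.

```python
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"
TRANSLATIONS_URL = "https://api.openai.com/v1/audio/translations"


def request_settings(mode):
    """Return (url, form_fields) for a hypothetical output mode.

    Modes: "text" (plain transcription), "subtitles" (.srt output),
    "translate" (English text from audio in another language).
    """
    if mode == "text":
        return TRANSCRIPTIONS_URL, {"model": "whisper-1", "response_format": "text"}
    if mode == "subtitles":
        return TRANSCRIPTIONS_URL, {"model": "whisper-1", "response_format": "srt"}
    if mode == "translate":
        return TRANSLATIONS_URL, {"model": "whisper-1", "response_format": "text"}
    raise ValueError(f"unknown mode: {mode!r}")
```

In the shortcut itself, this corresponds to changing the URL or the response_format field in the “Get Contents of URL” action. The file field is still required in every case.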
In our experience, this speech-to-text functionality is more accurate than the built-in (Apple or Siri-powered) transcription abilities, and even more accurate than the subtitle tracks that YouTube automatically generates once a video has been processed. The primary downside of this solution (and of many others) is that you’re submitting your audio file directly to OpenAI, which could raise some privacy concerns. But this is a great solution if you’re looking for an easy way to transcribe audio or video files.
Free Transcriptions Using Siri Shortcuts
If you have a powerful computer, such as a Mac with an M1 chip or better, you can run OpenAI’s Whisper model locally. This also addresses any privacy concerns you may have, since the audio never leaves your machine. This option does require some terminal commands, and is not recommended for those who aren’t comfortable with the Terminal.
Using this shortcut, the files will be transcribed locally, and it will automatically output a .txt file of the transcription in the same directory. This requires Python (and Homebrew) to be installed on your Mac. This is an easy-to-follow guide on how to install these requirements.
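To give a sense of what local transcription involves, here is a minimal Python sketch. It assumes the open-source openai-whisper package and ffmpeg are installed, and it is not necessarily how the shortcut itself does it. The function names and the "base" model choice are assumptions for the example.

```python
from pathlib import Path


def transcript_path(media_path):
    """Return the .txt path next to the media file (same directory and name)."""
    return Path(media_path).with_suffix(".txt")


def transcribe_locally(media_path, model_name="base"):
    """Transcribe a file on this machine and write the text beside it."""
    # Imported here so the rest of the module loads without the package.
    # Assumption: installed via `pip install openai-whisper`, with ffmpeg available.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(media_path))
    out = transcript_path(media_path)
    out.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out
```

Larger models tend to be more accurate but slower, so it’s worth starting with a small one to see how your Mac copes.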