speech-to-text/extension.yaml (110 lines of code) (raw):
name: speech-to-text
version: 0.1.6
specVersion: v1beta
displayName: Transcribe Speech to Text
description:
Transcribes audio files in Cloud Storage to .txt files using Cloud Speech To
Text.
icon: icon.png
tags: [ai, speech-to-text, speech-transcription, google-ai]
license: Apache-2.0 # The license you want for the extension
author:
authorName: Google Cloud
url: https://cloud.google.com/
sourceUrl: https://github.com/GoogleCloudPlatform/firebase-extensions/tree/main/speech-to-text
releaseNotesUrl: https://github.com/GoogleCloudPlatform/firebase-extensions/tree/main/speech-to-text/CHANGELOG.md
apis:
- apiName: speech.googleapis.com
reason: Used for transcribing the audio of sound files.
resources:
- name: transcribeAudio
type: firebaseextensions.v1beta.function
description: >-
Listens for new audio files uploaded to a specified Cloud Storage bucket,
transcribes the speech in those files, then stores the transcription in
storage, or in firestore, or in both.
properties:
location: ${param:LOCATION}
runtime: nodejs20
availableMemoryMb: 1024
eventTrigger:
eventType: google.storage.object.finalize
resource: projects/_/buckets/${param:EXTENSION_BUCKET}
params:
- param: LOCATION
label: Cloud Functions location
description: >-
Where do you want to deploy the functions created for this extension? You
usually want a location close to your database. Realtime Database
instances are located in `us-central1`. For help selecting a location,
refer to the [location selection
guide](https://firebase.google.com/docs/functions/locations).
type: select
options:
- label: Iowa (us-central1)
value: us-central1
- label: South Carolina (us-east1)
value: us-east1
- label: Northern Virginia (us-east4)
value: us-east4
- label: Belgium (europe-west1)
value: europe-west1
- label: London (europe-west2)
value: europe-west2
- label: Frankfurt (europe-west3)
value: europe-west3
- label: Hong Kong (asia-east2)
value: asia-east2
- label: Tokyo (asia-northeast1)
value: asia-northeast1
default: us-central1
required: true
immutable: true
- param: EXTENSION_BUCKET
label: Cloud Storage bucket for input and output
description: >
The Cloud Storage bucket that the extension should be listening to. Files
uploaded to this bucket will be transcribed by the extension. If cloud
storage output is enabled, transcriptions will be written to this bucket.
type: selectResource
resourceType: storage.googleapis.com/Bucket
example: my-project-12345.appspot.com
validationRegex: ^([0-9a-z_.-]*)$
validationErrorMessage: Invalid storage bucket
default: ${STORAGE_BUCKET}
required: true
- param: OUTPUT_STORAGE_PATH
label: Storage path for transcriptions
description: >
The storage path in which to output transcriptions. If this is not set,
the extension will output to the root of the bucket.
example: transcriptions
required: false
- param: COLLECTION_PATH
label: Firestore collection for storing transcribed audio
description: >
The firestore collection in which to output transcriptions. If this is not
set, the extension will not output the data to Firestore.
validationRegex: '^[^/]+(/[^/]+/[^/]+)*$'
validationErrorMessage: Must be a valid Cloud Firestore Collection
example: transcriptions
required: false
- param: LANGUAGE_CODE
label: BCP-47 code of the transcription language
description: >
The BCP-47 code of the transcription language, as shown in the [Language
support
documentation](https://cloud.google.com/speech-to-text/docs/languages)
example: e.g. en-US
# TODO(reao): Here I assume that language codes need to end in two
# uppercase characters -- is that correct? It is a pattern in the language
# support documentation.
validationRegex: '^([a-zA-Z-])*[A-Z][A-Z]$'
validationErrorMessage:
Must be a valid code from
https://cloud.google.com/speech-to-text/docs/languages
required: true
- param: MODEL
label: Language model used for transcription
description: >
Which kind of use-case should the speech-to-text transcription algorithm
be honed for? For details, see [the model field in the
documentation](https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig)
If you're not sure, just use the default.
example: default
default: default
required: true
- param: ENABLE_AUTOMATIC_PUNCTUATION
label: Enable automatic punctuation
description: >
Should the transcription algorithm attempt to add punctuation to the
transcription? For details, see [the
documentation](https://cloud.google.com/speech-to-text/docs/automatic-punctuation)
type: select
options:
- label: Enabled
value: true
- label: Disabled
value: false
default: true
roles:
- role: storage.objectAdmin
reason: Allows the extension to write to your Cloud Storage.
- role: datastore.user
reason: Allows the extension to write to your Firestore Database instance.
events:
- type: firebase.extensions.storage-transcribe-audio.v1.complete
description:
Occurs when audio transcription completes. The event will contain the full
transcription.
- type: firebase.extensions.storage-transcribe-audio.v1.fail
description:
Occurs when audio transcription fails. The event will contain information
about the failure.