Module: Google::Cloud::Speech

Defined in:
lib/google/cloud/speech.rb,
lib/google/cloud/speech/audio.rb,
lib/google/cloud/speech/result.rb,
lib/google/cloud/speech/stream.rb,
lib/google/cloud/speech/project.rb,
lib/google/cloud/speech/service.rb,
lib/google/cloud/speech/version.rb,
lib/google/cloud/speech/operation.rb,
lib/google/cloud/speech/credentials.rb,
lib/google/cloud/speech/v1/speech_client.rb,
lib/google/cloud/speech/v1/cloud_speech_pb.rb,
lib/google/cloud/speech/v1/cloud_speech_services_pb.rb,
lib/google/cloud/speech/v1/doc/google/cloud/speech/v1/cloud_speech.rb

Overview

Google Cloud Speech

Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes over 80 languages and variants to support your global user base. You can transcribe the speech of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. You can recognize audio uploaded in the request, or integrate with your audio storage on Google Cloud Storage, using the same technology Google uses to power its own products.

For more information about Google Cloud Speech API, read the Google Cloud Speech API Documentation.

The goal of google-cloud is to provide an API that is comfortable to Rubyists. Authentication is handled by #speech. You can provide the project and credential information to connect to the Cloud Speech service, or if you are running on Google Compute Engine this configuration is taken care of for you. You can read more about the options for connecting in the Authentication Guide.

Creating audio sources

You can create an audio object that holds a reference to any one of several types of audio data source, along with metadata such as the audio encoding type.

Use Project#audio to create audio sources for the Cloud Speech API. You can provide a file path:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000

Or, you can initialize the audio instance with a Google Cloud Storage URI:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "gs://bucket-name/path/to/audio.raw",
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000

Or, with a Google Cloud Storage File object:

require "google/cloud/storage"

storage = Google::Cloud::Storage.new

bucket = storage.bucket "bucket-name"
file = bucket.file "path/to/audio.raw"

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio file,
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000
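
The Cloud Storage URI form and the Storage File form above point at the same kind of object: a bucket name plus a file path inside that bucket. The following is a minimal sketch of that mapping. The `parse_gcs_uri` helper and its pattern are hypothetical and not part of google-cloud-speech; they only illustrate how the two forms relate.

```ruby
# Hypothetical helper (not part of google-cloud-speech): split a
# Cloud Storage URI into the bucket name and the file path.
GCS_URI_PATTERN = %r{\Ags://([^/]+)/(.+)\z}

def parse_gcs_uri uri
  match = GCS_URI_PATTERN.match uri.to_s
  # Anything that is not a gs:// URI (e.g. a local path) yields nil.
  return nil unless match
  { bucket: match[1], path: match[2] }
end

parts = parse_gcs_uri "gs://bucket-name/path/to/audio.raw"
parts[:bucket] #=> "bucket-name"
parts[:path]   #=> "path/to/audio.raw"

parse_gcs_uri "path/to/audio.raw" #=> nil
```

The returned values correspond to the arguments passed to `storage.bucket` and `bucket.file` in the Storage example.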

Recognizing speech

The instance methods on Audio can be used to invoke both synchronous and asynchronous versions of the Cloud Speech API speech recognition operation.

Use Audio#recognize for synchronous speech recognition that returns Result objects only after all audio has been processed. This method is limited to audio data of 1 minute or less in duration, and will take roughly the same amount of time to process as the duration of the supplied audio data.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000

results = audio.recognize
result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
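
Because synchronous recognition is limited to about one minute of audio, it can help to estimate the duration of a raw file before choosing a method. The sketch below assumes headerless 16-bit linear PCM (2 bytes per sample per channel), which is an assumption about the audio and not something the library checks for you. The helper names are hypothetical.

```ruby
# Assumption (not from the library): headerless 16-bit linear PCM,
# so each sample takes 2 bytes per channel.
BYTES_PER_SAMPLE = 2
SYNC_LIMIT_SECONDS = 60 # synchronous recognition: 1 minute or less

# Estimate the duration of raw audio from its size in bytes.
def raw_audio_seconds byte_size, sample_rate:, channels: 1
  byte_size.to_f / (sample_rate * channels * BYTES_PER_SAMPLE)
end

# Suggest which Audio method fits: :recognize (synchronous) or
# :process (asynchronous).
def recognition_method byte_size, sample_rate:
  seconds = raw_audio_seconds byte_size, sample_rate: sample_rate
  seconds <= SYNC_LIMIT_SECONDS ? :recognize : :process
end

raw_audio_seconds 960_000, sample_rate: 16000    #=> 30.0
recognition_method 960_000, sample_rate: 16000   #=> :recognize
recognition_method 3_200_000, sample_rate: 16000 #=> :process
```

For a local file, `File.size("path/to/audio.raw")` would supply the byte size.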

Use Audio#process for asynchronous speech recognition, in which an Operation is returned immediately after the audio data has been sent. The operation can be refreshed to retrieve Result objects once the audio data has been processed.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000

op = audio.process
op.done? #=> false
op.wait_until_done!
op.done? #=> true
results = op.results

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
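
If blocking indefinitely is not acceptable, the operation can be polled with a deadline instead. This is a minimal sketch, assuming the operation object responds to `done?` and to a refresh method named `reload!`; check the Operation class for the exact name in your version. `wait_with_timeout` and `FakeOperation` are hypothetical, and the fake exists only so the sketch runs without network access.

```ruby
# Poll an operation until it is done or the deadline passes.
# Returns true if the operation finished, false on timeout.
# Assumes `op` responds to #done? and #reload!.
def wait_with_timeout op, timeout:, interval: 1.0
  clock = Process::CLOCK_MONOTONIC
  deadline = Process.clock_gettime(clock) + timeout
  until op.done?
    return false if Process.clock_gettime(clock) >= deadline
    sleep interval
    op.reload!
  end
  true
end

# Stand-in for an Operation: reports done after N refreshes.
class FakeOperation
  def initialize reloads_needed
    @remaining = reloads_needed
  end

  def done?
    @remaining <= 0
  end

  def reload!
    @remaining -= 1
    self
  end
end

wait_with_timeout FakeOperation.new(2), timeout: 5, interval: 0.01
#=> true
wait_with_timeout FakeOperation.new(1000), timeout: 0.05, interval: 0.01
#=> false
```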

Use Project#stream for streaming audio data for speech recognition, in which a Stream is returned. The stream object can receive results while sending audio by performing bidirectional streaming speech-recognition.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

stream = speech.stream encoding: :raw,
                       language: "en-US",
                       sample_rate: 16000

# Stream 5 seconds of audio from the microphone
# Actual implementation of microphone input varies by platform
5.times do
  stream.send MicrophoneInput.read(32000)
end

stream.stop
stream.wait_until_complete!

results = stream.results
result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163

Obtaining audio data from input sources such as a microphone is outside the scope of this document.
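
The 32000-byte reads in the streaming example correspond to one second of audio if the data is 16-bit mono PCM at 16000 Hz (16000 samples × 2 bytes). The sketch below shows the same chunking applied to any IO, such as a file, under that assumption. `send_in_chunks` and `RecordingStream` are hypothetical; the latter is a stand-in for a Stream so the code runs offline.

```ruby
require "stringio"

# Assumption: 16-bit mono PCM at 16000 Hz, so one second of audio
# is 16000 samples * 2 bytes = 32000 bytes.
CHUNK_BYTES = 16000 * 2

# Read an IO in fixed-size chunks and hand each one to the stream.
# `stream` only needs to respond to #send with a chunk of bytes.
# Returns the number of chunks sent.
def send_in_chunks io, stream, chunk_bytes: CHUNK_BYTES
  sent = 0
  while (chunk = io.read(chunk_bytes))
    stream.send chunk
    sent += 1
  end
  sent
end

# Stand-in for a Stream: records what it is sent.
class RecordingStream
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def send bytes
    @chunks << bytes
  end
end

fake_audio = StringIO.new("\x00".b * 80_000) # 2.5 seconds of silence
stream = RecordingStream.new

send_in_chunks fake_audio, stream #=> 3
stream.chunks.map(&:bytesize)     #=> [32000, 32000, 16000]
```

With a real stream, `File.open("path/to/audio.raw", "rb")` could replace the StringIO.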

Defined Under Namespace

Modules: V1

Classes: Audio, InterimResult, Operation, Project, Result, Stream

Constant Summary

VERSION = "0.24.0"

Class Method Details

.new(project: nil, keyfile: nil, scope: nil, timeout: nil, client_config: nil) ⇒ Google::Cloud::Speech::Project

Creates a new object for connecting to the Speech service. Each call creates a new connection.

For more information on connecting to Google Cloud see the Authentication Guide.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw,
                     language: "en-US",
                     sample_rate: 16000

Parameters:

  • project (String)

    Project identifier for the Speech service you are connecting to.

  • keyfile (String, Hash)

    Keyfile downloaded from Google Cloud. If a file path is given, the file must be readable.

  • scope (String, Array<String>)

    The OAuth 2.0 scopes controlling the set of resources and operations that the connection can access. See Using OAuth 2.0 to Access Google APIs.

    The default scope is:

    • https://www.googleapis.com/auth/speech

  • timeout (Integer)

    Default timeout to use in requests. Optional.

  • client_config (Hash)

    A hash of values to override the default behavior of the API client. Optional.

Returns:

  • (Google::Cloud::Speech::Project)

# File 'lib/google/cloud/speech.rb', line 222

def self.new project: nil, keyfile: nil, scope: nil, timeout: nil,
             client_config: nil
  project ||= Google::Cloud::Speech::Project.default_project
  if keyfile.nil?
    credentials = Google::Cloud::Speech::Credentials.default scope: scope
  else
    credentials = Google::Cloud::Speech::Credentials.new(
      keyfile, scope: scope)
  end
  Google::Cloud::Speech::Project.new(
    Google::Cloud::Speech::Service.new(
      project, credentials, timeout: timeout,
                            client_config: client_config))
end