Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Attributes collapse

Lifecycle collapse

Table collapse

Data collapse

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:

See Also:



255
256
257
258
259
260
261
262
263
264
265
266
# File 'lib/google/cloud/bigquery/dataset.rb', line 255

def access
  ensure_full_data!
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_urlObject

A URL that can be used to access the dataset using the REST API.



124
125
126
127
# File 'lib/google/cloud/bigquery/dataset.rb', line 124

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the Rest API , you can pass the table's schema as a hash (see example.)

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"
                             name: "My Table",
                             description: "A description of table."

The table's schema fields can be passed as an argument.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

schema_fields = [
  Google::Cloud::Bigquery::Schema::Field.new(
    "first_name", :string, mode: :required),
  Google::Cloud::Bigquery::Schema::Field.new(
    "cities_lived", :record, mode: :repeated
    fields: [
      Google::Cloud::Bigquery::Schema::Field.new(
        "place", :string, mode: :required),
      Google::Cloud::Bigquery::Schema::Field.new(
        "number_of_years", :integer, mode: :required),
      ])
]
table = dataset.create_table "my_table", fields: schema_fields

Or the table's schema can be configured with the block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |t|
  t.name = "My Table",
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Parameters:

  • table_id (String)

    The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

  • fields (Array<Schema::Field>)

    An array of Schema::Field objects specifying the schema's data types for the table. The schema may also be configured when passing a block.

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Table)

    the table object to be updated

Returns:



382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
# File 'lib/google/cloud/bigquery/dataset.rb', line 382

def create_table table_id, name: nil, description: nil, fields: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
    tb.schema.fields = fields unless fields.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil) ⇒ Google::Cloud::Bigquery::View

Creates a new view table from the given query.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

Parameters:

  • table_id (String)

    The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • query (String)

    The query that BigQuery executes when the view is referenced.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

Returns:



432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
# File 'lib/google/cloud/bigquery/dataset.rb', line 432

def create_view table_id, query, name: nil, description: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_atObject

The time when this dataset was created.



179
180
181
182
183
184
185
186
# File 'lib/google/cloud/bigquery/dataset.rb', line 179

def created_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#dataset_idObject

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



67
68
69
# File 'lib/google/cloud/bigquery/dataset.rb', line 67

def dataset_id
  @gapi.dataset_reference.dataset_id
end

#default_expirationObject

The default lifetime of all tables in the dataset, in milliseconds.



154
155
156
157
158
159
160
161
# File 'lib/google/cloud/bigquery/dataset.rb', line 154

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.



169
170
171
172
# File 'lib/google/cloud/bigquery/dataset.rb', line 169

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
dataset.delete

Parameters:

  • force (Boolean)

    If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.

Returns:

  • (Boolean)

    Returns true if the dataset was deleted.



288
289
290
291
292
# File 'lib/google/cloud/bigquery/dataset.rb', line 288

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#descriptionObject

A user-friendly description of the dataset.



134
135
136
137
# File 'lib/google/cloud/bigquery/dataset.rb', line 134

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.



144
145
146
147
# File 'lib/google/cloud/bigquery/dataset.rb', line 144

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etagObject

A string hash of the dataset.



114
115
116
117
# File 'lib/google/cloud/bigquery/dataset.rb', line 114

def etag
  ensure_full_data!
  @gapi.etag
end

#locationObject

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.



208
209
210
211
# File 'lib/google/cloud/bigquery/dataset.rb', line 208

def location
  ensure_full_data!
  @gapi.location
end

#modified_atObject

The date when this dataset or any of its tables was last modified.



193
194
195
196
197
198
199
200
# File 'lib/google/cloud/bigquery/dataset.rb', line 193

def modified_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#nameObject

A descriptive name for the dataset.



95
96
97
# File 'lib/google/cloud/bigquery/dataset.rb', line 95

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.



104
105
106
107
# File 'lib/google/cloud/bigquery/dataset.rb', line 104

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_idObject

The ID of the project containing this dataset.



76
77
78
# File 'lib/google/cloud/bigquery/dataset.rb', line 76

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • timeout (Integer)

    How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).

  • dryrun (Boolean)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

Returns:



638
639
640
641
642
643
644
645
# File 'lib/google/cloud/bigquery/dataset.rb', line 638

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • large_results (Boolean)

    If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

Returns:



580
581
582
583
584
585
586
587
588
589
# File 'lib/google/cloud/bigquery/dataset.rb', line 580

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

Parameters:

  • table_id (String)

    The ID of a table.

Returns:



468
469
470
471
472
473
474
# File 'lib/google/cloud/bigquery/dataset.rb', line 468

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.all do |table|
  puts table.name
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of tables to return.

Returns:



509
510
511
512
513
514
# File 'lib/google/cloud/bigquery/dataset.rb', line 509

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end