location/keyRings/keyring/cryptoKeys/key"
This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.
For more details, see Protecting data with Cloud KMS keys.
friendly_name
STRING
Example: friendly_name="my_table"
This property is equivalent to the friendlyName table resource property.
description
STRING
Example: description="a table that expires in 2025"
This property is equivalent to the description table resource property.
labels
ARRAY<STRUCT<STRING, STRING>>
Example: labels=[("org_unit", "development")]
This property is equivalent to the labels table resource property.
default_rounding_mode
STRING
Example: default_rounding_mode = "ROUND_HALF_EVEN"
This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. The following values are supported:
- "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.
- "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0 and -2.5 is rounded to -2.0.
This property is equivalent to the defaultRoundingMode table resource property.
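As a hedged illustration (the table and column names are hypothetical), the option can be set at creation time; values later written to the NUMERIC column then round half-to-even:

CREATE TABLE mydataset.prices (
  item STRING,
  price NUMERIC(15, 2)  -- with ROUND_HALF_EVEN, 12.345 is written as 12.34
)
OPTIONS (default_rounding_mode = "ROUND_HALF_EVEN");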
enable_change_history
BOOL
In preview.
Example: enable_change_history=TRUE
Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information, see Pricing and costs. The default is FALSE.
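A hedged sketch of the workflow (the table name is hypothetical; the CHANGES table-valued function takes start and end timestamp bounds, and the documented behavior requires the end bound to lag the current time):

CREATE TABLE mydataset.tracked_table (id INT64, status STRING)
OPTIONS (enable_change_history = TRUE);

-- Later, read the change history; a NULL start means from the earliest retained change.
SELECT *
FROM CHANGES(TABLE mydataset.tracked_table, NULL,
             TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE));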
max_staleness
INTERVAL
Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND
The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value. max_staleness is disabled by default.
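A minimal sketch (hypothetical table name) that permits reads to lag by up to four hours, matching the example interval above:

CREATE TABLE mydataset.cdc_target (id INT64, payload STRING)
OPTIONS (max_staleness = INTERVAL "4:0:0" HOUR TO SECOND);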
enable_fine_grained_mutations
BOOL
In preview.
Example: enable_fine_grained_mutations=TRUE
Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE.
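A minimal sketch (hypothetical table name) enabling the option at creation time:

CREATE TABLE mydataset.mutable_events (event_id INT64, status STRING)
OPTIONS (enable_fine_grained_mutations = TRUE);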
storage_uri
STRING
In preview.
Example: storage_uri=gs://BUCKET_DIRECTORY/TABLE_DIRECTORY/
A fully qualified location prefix for the external folder where data is stored. Supports gs:// buckets. Required for managed tables.
file_format
STRING
In preview.
Example: file_format=PARQUET
The open-source file format in which the table data is stored. Only PARQUET is supported. Required for managed tables. The default is PARQUET.
table_format
STRING
In preview.
Example: table_format=ICEBERG
The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported. Required for managed tables. The default is ICEBERG.
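Taken together, a hedged sketch of a managed table definition that uses these three options (the bucket path, table name, and the myproject.us.myconnection Cloud resource connection are assumptions):

CREATE TABLE mydataset.managed_iceberg_table (
  id INT64,
  name STRING
)
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
  storage_uri = "gs://mybucket/tables/managed_iceberg_table/",
  file_format = "PARQUET",
  table_format = "ICEBERG"
);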
VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- The following scalar functions: ARRAY_TO_STRING, REPLACE, REGEXP_REPLACE, RAND, FORMAT, LPAD, RPAD, REPEAT, SESSION_USER, GENERATE_ARRAY, GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE TABLE statement is ignored.
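For example, in this hedged sketch (hypothetical table name), the description option evaluates to NULL and is simply ignored, so the table is created without a description:

CREATE TABLE mydataset.no_description_table (x INT64)
OPTIONS (description = NULL);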
column_option_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME | VALUE | Details |
---|---|---|
description | STRING | Example: description="a unique id". This property is equivalent to the schema.fields[].description table resource property. |
rounding_mode | STRING | Example: rounding_mode = "ROUND_HALF_EVEN". This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. This property is equivalent to the schema.fields[].roundingMode table resource property. |
VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- The following scalar functions: ARRAY_TO_STRING, REPLACE, REGEXP_REPLACE, RAND, FORMAT, LPAD, RPAD, REPEAT, SESSION_USER, GENERATE_ARRAY, GENERATE_DATE_ARRAY
Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.
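A hedged sketch of the clearing behavior (hypothetical table and column names), using ALTER TABLE ... ALTER COLUMN SET OPTIONS, where the same NAME=VALUE rules apply:

ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET OPTIONS (description = NULL);  -- clears the column's description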
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the table. |
In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
The following example creates a partitioned table named newtable
in mydataset
:
CREATE TABLE mydataset.newtable (
  x INT64 OPTIONS (description = "An optional INTEGER field"),
  y STRUCT <
    a ARRAY <STRING> OPTIONS (description = "A repeated STRING field"),
    b BOOL
  >
)
PARTITION BY _PARTITIONDATE
OPTIONS (
  expiration_timestamp = TIMESTAMP "2025-01-01 00:00:00 UTC",
  partition_expiration_days = 1,
  description = "a table that expires in 2025, with each partition living for 24 hours",
  labels = [("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.newtable
, your table qualifier might be `myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table uses the following partition_expression to partition the table: PARTITION BY _PARTITIONDATE. This expression partitions the table using the date in the _PARTITIONDATE pseudocolumn.
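As a hedged illustration, queries against the table can filter on the pseudocolumn to prune partitions (the date literal is hypothetical):

SELECT *
FROM mydataset.newtable
WHERE _PARTITIONDATE = DATE "2025-06-01";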
The table schema contains two columns:
- x: An integer, with the description "An optional INTEGER field"
- y: A STRUCT containing two columns: a (a repeated STRING field with its own description) and b (a boolean)

The table option list specifies the:
- Expiration time: January 1, 2025 at 00:00:00 UTC
- Partition expiration time: one day
- Description: A table that expires in 2025, with each partition living for 24 hours
- Label: org_unit = development
The following example creates a table named top_words
in mydataset
from a query:
CREATE TABLE mydataset.top_words
OPTIONS (
  description = "Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data`.samples.shakespeare
GROUP BY corpus;
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.top_words
, your table qualifier might be `myproject.mydataset.top_words`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 2 columns:
- corpus: The name of a Shakespeare corpus (a STRING)
- top_words: An ARRAY of STRUCTs containing 2 fields: word (a STRING) and word_count (an INT64 with the word count)

The table option list specifies the:
- Description: Top ten words per Shakespeare corpus
The following example creates a table named newtable
in mydataset
only if no table named newtable
exists in mydataset
. If the table name exists in the dataset, no error is returned, and no action is taken.
CREATE TABLE IF NOT EXISTS mydataset.newtable (
  x INT64,
  y STRUCT <
    a ARRAY <STRING>,
    b BOOL
  >
)
OPTIONS (
  expiration_timestamp = TIMESTAMP "2025-01-01 00:00:00 UTC",
  description = "a table that expires in 2025",
  labels = [("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.newtable
, your table qualifier might be `myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- x: An integer
- y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:
- Expiration time: January 1, 2025 at 00:00:00 UTC
- Description: A table that expires in 2025
- Label: org_unit = development
The following example creates a table named newtable
in mydataset
, and if newtable
exists in mydataset
, it is overwritten with an empty table.
CREATE OR REPLACE TABLE mydataset.newtable (
  x INT64,
  y STRUCT <
    a ARRAY <STRING>,
    b BOOL
  >
)
OPTIONS (
  expiration_timestamp = TIMESTAMP "2025-01-01 00:00:00 UTC",
  description = "a table that expires in 2025",
  labels = [("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.newtable
, your table qualifier might be `myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- x: An integer
- y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:
- Expiration time: January 1, 2025 at 00:00:00 UTC
- Description: A table that expires in 2025
- Label: org_unit = development
The following example creates a table named newtable in mydataset with REQUIRED columns. The NOT NULL modifier in the column definition list of a CREATE TABLE statement specifies that a column or field is created in REQUIRED mode.
CREATE TABLE mydataset.newtable (
  x INT64 NOT NULL,
  y STRUCT <
    a ARRAY <STRING>,
    b BOOL NOT NULL,
    c FLOAT64
  > NOT NULL,
  z STRING
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.newtable
, your table qualifier might be `myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
- x: A REQUIRED integer
- y: A REQUIRED STRUCT containing a (an array of strings), b (a REQUIRED boolean), and c (a NULLABLE float)
- z: A NULLABLE string
The following examples create a table named newtable
in mydataset
with columns a
, b
, c
, and a struct with fields x
and y
.
All STRING
column schemas in this table are collated with 'und:ci'
:
CREATE TABLE mydataset.newtable (
  a STRING,
  b STRING,
  c STRUCT <
    x FLOAT64,
    y ARRAY <STRING>
  >
)
DEFAULT COLLATE 'und:ci';
Only b
and y
are collated with 'und:ci'
:
CREATE TABLE mydataset.newtable (
  a STRING,
  b STRING COLLATE 'und:ci',
  c STRUCT <
    x FLOAT64,
    y ARRAY <STRING COLLATE 'und:ci'>
  >
);
The following example creates a table named newtable
in mydataset
. The parameters in parentheses specify that the column contains a parameterized data type. See Parameterized Data Types for more information about parameterized types.
CREATE TABLE mydataset.newtable (
  x STRING(10),
  y STRUCT <
    a ARRAY <BYTES(5)>,
    b NUMERIC(15, 2) OPTIONS (rounding_mode = 'ROUND_HALF_EVEN'),
    c FLOAT64
  >,
  z BIGNUMERIC(35)
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. Instead of mydataset.newtable
, your table qualifier should be `myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
- x: A parameterized STRING with a maximum length of 10
- y: A STRUCT containing a (an array of parameterized BYTES with a maximum length of 5), b (a parameterized NUMERIC with a maximum precision of 15, a maximum scale of 2, and a rounding mode option), and c (a FLOAT64)
- z: A parameterized BIGNUMERIC with a maximum precision of 35
The following example creates a partitioned table named newtable
in mydataset
using a DATE
column:
CREATE TABLE mydataset.newtable (
  transaction_id INT64,
  transaction_date DATE
)
PARTITION BY transaction_date
OPTIONS (
  partition_expiration_days = 3,
  description = "a table partitioned by transaction_date"
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.newtable
, your table qualifier might be `myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- transaction_id: An integer
- transaction_date: A date

The table option list specifies the:
- Partition expiration time: three days
- Description: A table partitioned by transaction_date
The following example creates a partitioned table named days_with_rain
in mydataset
using a DATE
column:
CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS (
  partition_expiration_days = 365,
  description = "weather stations with precipitation, partitioned by day"
) AS
SELECT
  DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
  (SELECT ANY_VALUE(name)
   FROM `bigquery-public-data.noaa_gsod.stations` AS stations
   WHERE stations.usaf = stn) AS station_name,  -- Stations can have multiple names
  prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9  -- Filter unknown values
  AND prcp > 0      -- Filter stations/days with no precipitation
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.days_with_rain
, your table qualifier might be `myproject.mydataset.days_with_rain`
.
The table schema contains 3 columns:
- date: The DATE of data collection
- station_name: The name of the weather station, as a STRING
- prcp: The amount of precipitation, as a FLOAT64

The table option list specifies the:
- Partition expiration time: 365 days
- Description: Weather stations with precipitation, partitioned by day
The following example creates a clustered table named myclusteredtable
in mydataset
. The table is a partitioned table, partitioned by a truncated TIMESTAMP
column and clustered by a STRING
column named customer_id
.
CREATE TABLE mydataset.myclusteredtable (
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY TIMESTAMP_TRUNC(input_timestamp, HOUR)
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days = 3,
  description = "a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.myclusteredtable
, your table qualifier might be `myproject.mydataset.myclusteredtable`
.
The table schema contains 3 columns:
- input_timestamp: A TIMESTAMP
- customer_id: A STRING
- transaction_amount: A NUMERIC

The table option list specifies the:
- Partition expiration time: three days
- Description: A table clustered by customer_id
The following example creates a clustered table named myclusteredtable
in mydataset
. The table is an ingestion-time partitioned table.
CREATE TABLE mydataset.myclusteredtable (
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days = 3,
  description = "a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.myclusteredtable
, your table qualifier might be `myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: A STRING
- transaction_amount: A NUMERIC

The table option list specifies the:
- Partition expiration time: three days
- Description: A table clustered by customer_id
The following example creates a clustered table named myclusteredtable
in mydataset
. The table is not partitioned.
CREATE TABLE mydataset.myclusteredtable (
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS (
  description = "a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.myclusteredtable
, your table qualifier might be `myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: A STRING
- transaction_amount: A NUMERIC

The table option list specifies the:
- Description: A table clustered by customer_id
The following example creates a partitioned and clustered table named myclusteredtable
in mydataset
using the result of a query.
CREATE TABLE mydataset.myclusteredtable (
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(input_timestamp)
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days = 3,
  description = "a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.myclusteredtable
, your table qualifier might be `myproject.mydataset.myclusteredtable`
.
The table schema contains 3 columns:
- input_timestamp: A TIMESTAMP
- customer_id: A STRING
- transaction_amount: A NUMERIC

The table option list specifies the:
- Partition expiration time: three days
- Description: A table clustered by customer_id
The following example creates a clustered table named myclusteredtable
in mydataset
using the result of a query. The table is not partitioned.
CREATE TABLE mydataset.myclusteredtable (
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS (
  description = "a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id
contains special characters: `project_id.dataset.table`
. So, instead of mydataset.myclusteredtable
, your table qualifier might be `myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: A STRING
- transaction_amount: A NUMERIC

The table option list specifies the:
- Description: A table clustered by customer_id
The following example creates a temporary table named Example
and inserts values into it.
CREATE TEMP TABLE Example (
  x INT64,
  y STRING
);

INSERT INTO Example VALUES (5, 'foo');
INSERT INTO Example VALUES (6, 'bar');

SELECT * FROM Example;
This script returns the following output:
+-----+---+-----+
| Row | x |  y  |
+-----+---+-----+
| 1   | 5 | foo |
| 2   | 6 | bar |
+-----+---+-----+
Suppose you have a BigLake table named myawsdataset.orders
that references data from Amazon S3. You want to transfer data from that table to a BigQuery table myotherdataset.shipments
in the US multi-region.
First, display information about the myawsdataset.orders
table:
bq show myawsdataset.orders;
The output is similar to the following:
  Last modified           Schema                    Type       Total URIs   Expiration
 ----------------- -------------------------- ---------- ------------ -----------
  31 Oct 17:40:28   |- l_orderkey: integer     EXTERNAL   1
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_linenumber: integer
                    |- l_returnflag: string
                    |- l_linestatus: string
                    |- l_commitdate: date
Next, display information about the myotherdataset.shipments
table:
bq show myotherdataset.shipments
The output is similar to the following. Some columns are omitted to simplify the output.
  Last modified           Schema                     Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical
 ----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ ---------------
  31 Oct 17:34:31   |- l_orderkey: integer      3086653      210767042                                                          210767042
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_commitdate: date
                    |- l_shipdate: date
                    |- l_receiptdate: date
                    |- l_shipinstruct: string
                    |- l_shipmode: string
Now, using the CREATE TABLE AS SELECT
statement you can selectively load data to the myotherdataset.orders
table in the US multi-region:
CREATE OR REPLACE TABLE myotherdataset.orders
PARTITION BY DATE_TRUNC(l_commitdate, YEAR) AS
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1992;
You can then perform a join operation with the newly created table:
SELECT
  orders.l_orderkey,
  orders.l_partkey,
  orders.l_suppkey,
  orders.l_commitdate,
  orders.l_returnflag,
  shipments.l_shipmode,
  shipments.l_shipinstruct
FROM myotherdataset.shipments
JOIN `myotherdataset.orders` AS orders
  ON orders.l_orderkey = shipments.l_orderkey
  AND orders.l_partkey = shipments.l_partkey
  AND orders.l_suppkey = shipments.l_suppkey
WHERE orders.l_returnflag = 'R';  -- 'R' means refunded.
When new data is available, append the data for the year 1993 to the destination table using the INSERT INTO SELECT statement:
INSERT INTO myotherdataset.orders
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1993;
The following example inserts data into an ingestion-time partitioned table:
CREATE TABLE mydataset.orders (
  id STRING,
  numeric_id INT64
)
PARTITION BY _PARTITIONDATE;
After creating a partitioned table, you can insert data into the ingestion-time partitioned table:
INSERT INTO mydataset.orders (
  _PARTITIONTIME,
  id,
  numeric_id
)
SELECT
  TIMESTAMP("2023-01-01"),
  id,
  numeric_id
FROM mydataset.ordersof23
WHERE numeric_id > 4000000;
CREATE TABLE LIKE statement

Creates a new table with all of the same metadata as another table.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
LIKE [[project_name.]dataset_name.]source_table_name
...
[OPTIONS(table_option_list)]
This statement is a variant of the CREATE TABLE
statement and has the same limitations. Other than the use of the LIKE
clause in place of a column list, the syntax is identical to the CREATE TABLE
syntax.
The CREATE TABLE LIKE statement copies only the metadata of the source table. You can use the AS query_statement clause to include data in the new table.

The new table has no relationship to the source table after creation; modifications to the source table do not propagate to the new table.
By default, the new table inherits partitioning, clustering, and options metadata from the source table. You can customize metadata in the new table by using the optional clauses in the SQL statement. For example, if you want to specify a different set of options for the new table, then include the OPTIONS clause with a list of options and values. This behavior matches that of ALTER TABLE SET OPTIONS.
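A hedged sketch of that customization (the table names follow the examples below; the description value is illustrative):

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
OPTIONS (description = "Same schema as sourcetable, different options");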
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the table. |
bigquery.tables.get | The source table. |
In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
The following example creates a new table named newtable
in mydataset
with the same metadata as sourcetable
:
CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
The following example creates a new table named newtable
in mydataset
with the same metadata as sourcetable
and the data from the SELECT
statement:
CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
AS SELECT * FROM mydataset.myothertable
CREATE TABLE COPY statement

Creates a table that has the same metadata and data as another table. The source table can be a table, a table clone, or a table snapshot.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]
This statement is a variant of the CREATE TABLE
statement and has the same limitations. Other than the use of the COPY
clause in place of a column list, the syntax is identical to the CREATE TABLE
syntax.
The CREATE TABLE COPY
statement copies both the metadata and data from the source table.
The new table inherits partitioning and clustering from the source table. By default, the table options metadata from the source table is also inherited, but you can override table options by using the OPTIONS clause. The behavior is equivalent to running ALTER TABLE SET OPTIONS after the table is copied.
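A hedged sketch of overriding one option during the copy (table names hypothetical):

CREATE TABLE mydataset.copytable
COPY mydataset.sourcetable
OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY));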
The new table has no relationship to the source table after creation; modifications to the source table are not propagated to the new table.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the table copy. |
bigquery.tables.get | The source table. |
bigquery.tables.getData | The source table. |
In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
CREATE SNAPSHOT TABLE statement

Creates a table snapshot based on a source table. The source table can be a table, a table clone, or a table snapshot.
CREATE SNAPSHOT TABLE [ IF NOT EXISTS ] table_snapshot_name
CLONE source_table_name
[FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]
IF NOT EXISTS
: If a table snapshot or other table resource exists with the same name, the CREATE
statement has no effect.
table_snapshot_name
: The name of the table snapshot that you want to create. The table snapshot name must be unique per dataset. See Table path syntax.
source_table_name
: The name of the table that you want to snapshot or the table snapshot that you want to copy. See Table path syntax.
If the source table is a standard table, then BigQuery creates a table snapshot of the source table. If the source table is a table snapshot, then BigQuery creates a copy of the table snapshot.
FOR SYSTEM_TIME AS OF
: Lets you select the version of the table that was current at the time specified by time_expression. It can only be used when creating a snapshot of a table; it can't be used when making a copy of a table snapshot. A sketch of this clause follows this parameter list.
snapshot_option_list
: Additional table snapshot creation options such as a label and an expiration time.
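As referenced in the FOR SYSTEM_TIME AS OF entry above, a hedged sketch of snapshotting a table as it existed one hour ago (names hypothetical):

CREATE SNAPSHOT TABLE `myproject.mydataset.mytable_snapshot`
CLONE `myproject.mydataset.mytable`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);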
CREATE SNAPSHOT TABLE statements must comply with the following rules:
- Only one CREATE statement is allowed.
- The FOR SYSTEM_TIME AS OF clause can only be used when creating a snapshot of a table or table clone; it can't be used when making a copy of a table snapshot.

snapshot_option_list

The option list lets you set table snapshot options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table snapshot option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
expiration_timestamp | TIMESTAMP | Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property. |
friendly_name | STRING | Example: friendly_name="my_table_snapshot". This property is equivalent to the friendlyName table resource property. |
description | STRING | Example: description="A table snapshot that expires in 2 days". This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property. |
VALUE is a constant expression that contains only literals, query parameters, and scalar functions.

The constant expression cannot contain:
- Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
- The following scalar functions: ARRAY_TO_STRING, REPLACE, REGEXP_REPLACE, RAND, FORMAT, LPAD, RPAD, REPEAT, SESSION_USER, GENERATE_ARRAY, GENERATE_DATE_ARRAY
If VALUE evaluates to NULL, the corresponding option NAME in the CREATE SNAPSHOT TABLE statement is ignored.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the table snapshot. |
bigquery.tables.createSnapshot | The source table. |
bigquery.tables.get | The source table. |
bigquery.tables.getData | The source table. |
The following example creates a table snapshot of the table myproject.mydataset.mytable
. The table snapshot is created in the dataset mydataset
and is named mytablesnapshot
:
CREATE SNAPSHOT TABLE `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "my_table_snapshot",
  description = "A table snapshot that expires in 2 days",
  labels = [("org_unit", "development")]
)
If the table snapshot name already exists in the dataset, then the following error is returned:
Already Exists: myproject.mydataset.mytablesnapshot
The table snapshot option list specifies the following:
- Expiration time: 48 hours after the time the table snapshot is created
- Friendly name: my_table_snapshot
- Description: A table snapshot that expires in 2 days
- Label: org_unit = development
The following example creates a table snapshot of the table myproject.mydataset.mytable
. The table snapshot is created in the dataset mydataset
and is named mytablesnapshot
:
CREATE SNAPSHOT TABLE IF NOT EXISTS `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "my_table_snapshot",
  description = "A table snapshot that expires in 2 days",
  labels = [("org_unit", "development")]
)
The table snapshot option list specifies the following:
- Expiration time: 48 hours after the time the table snapshot is created
- Friendly name: my_table_snapshot
- Description: A table snapshot that expires in 2 days
- Label: org_unit = development
If the table snapshot name already exists in the dataset, then no action is taken, and no error is returned.
For information about restoring table snapshots, see CREATE TABLE CLONE
.
For information about removing table snapshots, see DROP SNAPSHOT TABLE
.
CREATE TABLE CLONE statement

Creates a table clone based on a source table. The source table can be a table, a table clone, or a table snapshot.
CREATE TABLE [ IF NOT EXISTS ] destination_table_name
CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression]
...
[OPTIONS(table_option_list)]
Other than the use of the CLONE
clause in place of a column list, the syntax is identical to the CREATE TABLE
syntax.
IF NOT EXISTS
: If the specified destination table name already exists, the CREATE
statement has no effect.
destination_table_name
: The name of the table that you want to create. The table name must be unique per dataset. The table name can contain the following:
- Up to 1,024 characters
- Letters (uppercase or lowercase), numbers, and underscores
OPTIONS(table_option_list)
: Lets you specify additional table creation options such as a label and an expiration time.
source_table_name
: The name of the source table.
CREATE TABLE CLONE statements must comply with the following rules:
- Only one CREATE statement is allowed.

OPTIONS

CREATE TABLE CLONE options are the same as CREATE TABLE options.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the table clone. |
bigquery.tables.get | The source table. |
bigquery.tables.getData | The source table. |
bigquery.tables.restoreSnapshot | The source table (required only if the source table is a table snapshot). |
If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
The following example creates the table myproject.mydataset.mytable
from the table snapshot myproject.mydataset.mytablesnapshot
:
CREATE TABLE `myproject.mydataset.mytable`
CLONE `myproject.mydataset.mytablesnapshot`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name = "my_table",
  description = "A table that expires in 1 year",
  labels = [("org_unit", "development")]
)
If the table name exists in the dataset, then the following error is returned:
Already Exists: myproject.mydataset.mytable.
The table option list specifies the following:
- Expiration time: 365 days after the time the table is created
- Friendly name: my_table
- Description: A table that expires in 1 year
- Label: org_unit = development
The following example creates the table clone myproject.mydataset.mytableclone
based on the table myproject.mydataset.mytable
:
CREATE TABLE IF NOT EXISTS `myproject.mydataset.mytableclone`
CLONE `myproject.mydataset.mytable`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name = "my_table",
  description = "A table that expires in 1 year",
  labels = [("org_unit", "development")]
)
The table option list specifies the following:
- Expiration time: 365 days after the time the table clone is created
- Friendly name: my_table
- Description: A table that expires in 1 year
- Label: org_unit = development
If the table name exists in the dataset, then no action is taken, and no error is returned.
For information about creating a copy of a table, see CREATE TABLE COPY
.
For information about creating a snapshot of a table, see CREATE SNAPSHOT TABLE
.
CREATE VIEW statement

Creates a new view.
CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] view_name
[(view_column_name_list)]
[OPTIONS(view_option_list)]
AS query_expression

view_column_name_list :=
  view_column[, ...]

view_column :=
  column_name [OPTIONS(view_column_option_list)]
OR REPLACE
: Replaces any view with the same name if it exists. Cannot appear with IF NOT EXISTS
.
IF NOT EXISTS
: If a view or other table resource exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
view_name
: The name of the view you're creating. See Table path syntax.
view_column_name_list
: Lets you explicitly specify the column names of the view, which may be aliases to the column names in the underlying SQL query.
view_option_list
: Additional view creation options such as a label and an expiration time.
query_expression
: The GoogleSQL query expression used to define the view.
CREATE VIEW statements must comply with the following rules:
- Only one CREATE statement is allowed.

view_column_name_list

The view's column name list is optional. The names must be unique but do not have to be the same as the column names of the underlying SQL query. For example, if your view is created with the following statement:
CREATE VIEW mydataset.age_groups(age, count) AS
SELECT age, COUNT(*)
FROM mydataset.people
GROUP BY age;
Then you can query it with:
SELECT age, count FROM mydataset.age_groups;
The number of columns in the column name list must match the number of columns in the underlying SQL query. If columns in the table of the underlying SQL query are added or dropped, the view becomes invalid and must be recreated. For example, if the age column is dropped from the mydataset.people table, then the view created in the previous example becomes invalid.
view_column_option_list

The view_column_option_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:

NAME | VALUE | Details |
---|---|---|
description | STRING | Example: description="a unique id" |
view_option_list
The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
expiration_timestamp | TIMESTAMP | Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property. |
friendly_name | STRING | Example: friendly_name="my_view". This property is equivalent to the friendlyName table resource property. |
description | STRING | Example: description="a view that expires in 2025". This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property. |
privacy_policy | JSON-formatted STRING | The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy section. |
VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- The following scalar functions: ARRAY_TO_STRING, REPLACE, REGEXP_REPLACE, RAND, FORMAT, LPAD, RPAD, REPEAT, SESSION_USER, GENERATE_ARRAY, GENERATE_DATE_ARRAY
If VALUE evaluates to NULL, the corresponding option NAME in the CREATE VIEW statement is ignored.
privacy_policy
The following policies are available in the privacy_policy
view option to create analysis rules. A policy represents a condition that needs to be met before a query can be run.
Policy | Details |
---|---|
aggregation_threshold_policy | The aggregation threshold policy to enforce when a view is queried. Syntax: '{ "aggregation_threshold_policy": { "threshold": value, "privacy_unit_columns": value } }' Example: privacy_policy='{"aggregation_threshold_policy": {"threshold" : 50, "privacy_unit_columns": "ID"}}' |
differential_privacy_policy | A differential privacy policy for the view. When this parameter is included, only differentially private queries can be run on the view. Syntax: '{ "differential_privacy_policy": { "privacy_unit_column": value, "max_epsilon_per_query": value, "epsilon_budget": value, "delta_per_query": value, "delta_budget": value, "max_groups_contributed": value } }' Example: privacy_policy='{"differential_privacy_policy": { "privacy_unit_column": "contributor_id", "max_epsilon_per_query": 0.01, "epsilon_budget": 25.6, "delta_per_query": 0.005, "delta_budget": 9.6, "max_groups_contributed": 2}}' |
join_restriction_policy | A join restriction policy for the view. When this parameter is included, only the specified joins can be run on the specified columns in the view. This policy can be used alone or with other policies, such as the aggregation threshold or differential privacy policy. Syntax: '{ "join_restriction_policy": { "join_condition": value, "join_allowed_columns": value } }' Example: privacy_policy='{"join_restriction_policy": { "join_condition": "JOIN_ANY", "join_allowed_columns": ["col1", "col2"]}}' |
If the view is created in the same project used to run the CREATE VIEW
statement, the view body query_expression
can reference entities without specifying the project; the default project is the project which owns the view. Consider the sample query below.
CREATE VIEW myProject.myDataset.myView AS
SELECT * FROM anotherDataset.myTable;
After running the above CREATE VIEW
query in the project myProject
, you can run the query SELECT * FROM myProject.myDataset.myView
. Regardless of the project you choose to run this SELECT
query, the referenced table anotherDataset.myTable
is always resolved against project myProject
.
If the view is not created in the same project used to run the CREATE VIEW
statement, then all references in the view body query_expression
must be qualified with project IDs. For instance, the preceding sample CREATE VIEW
query is invalid if it runs in a project different from myProject
.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the view. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes an expiration time, then the bigquery.tables.delete
permission is also required.
The following example creates a view named newview
in mydataset
:
CREATE VIEW `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
If the view name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
- Expiration time: 48 hours after the time the view is created
- Friendly name: newview
- Description: A view that expires in 2 days
- Label: org_unit = development
The following example creates a view named newview
in mydataset
only if no view named newview
exists in mydataset
. If the view name exists in the dataset, no error is returned, and no action is taken.
CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
- Expiration time: 48 hours after the time the view is created
- Friendly name: newview
- Description: A view that expires in 2 days
- Label: org_unit = development
The following example creates a view named newview
in mydataset
, and if newview
exists in mydataset
, it is overwritten using the specified query expression.
CREATE OR REPLACE VIEW `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
- Expiration time: 48 hours after the time the view is created
- Friendly name: newview
- Description: A view that expires in 2 days
- Label: org_unit = development
The following example creates a view named newview
in mydataset
. This view definition provides the column description for each column in mytable
. You can rename columns from the original query.
CREATE VIEW `myproject.mydataset.newview` (
  column_1_new_name OPTIONS (DESCRIPTION = 'Description of the column 1 contents'),
  column_2_new_name OPTIONS (DESCRIPTION = 'Description of the column 2 contents'),
  column_3_new_name OPTIONS (DESCRIPTION = 'Description of the column 3 contents')
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
CREATE MATERIALIZED VIEW statement

Creates a new materialized view.
CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] materialized_view_name
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(materialized_view_option_list)]
AS query_expression
OR REPLACE
: Replaces a materialized view with the same name if it exists. Cannot appear with IF NOT EXISTS
.
IF NOT EXISTS
: If a materialized view or other table resource exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
materialized_view_name
: The name of the materialized view you're creating. See Table path syntax.
If the project_name
is omitted from the materialized view name, or it is the same as the project that runs this DDL query, then the latter is also used as the default project for references to tables, functions, and other resources in query_expression
. The default project of the references is fixed and does not depend on the future queries that invoke the new materialized view. Otherwise, all references in query_expression
must be qualified with project names.
The materialized view name must be unique per dataset.
partition_expression
: An expression that determines how to partition the table. A materialized view can only be partitioned in the same way as the table in query_expression (the base table) is partitioned.
clustering_column_list
: A comma-separated list of column references that determine how to cluster the materialized view.
materialized_view_option_list
: Allows you to specify additional materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time.
query_expression
: The GoogleSQL query expression used to define the materialized view.
CREATE MATERIALIZED VIEW statements must comply with the following rules:
- Only one CREATE statement is allowed.

If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW statement, the materialized view body query_expression can reference entities without specifying the project; the default project is the project which owns the materialized view. Consider the sample query below.
CREATE MATERIALIZED VIEW myProject.myDataset.myView AS
SELECT * FROM anotherDataset.myTable;
After running the above CREATE MATERIALIZED VIEW
query in the project myProject
, you can run the query SELECT * FROM myProject.myDataset.myView
. Regardless of the project you choose to run this SELECT
query, the referenced table anotherDataset.myTable
is always resolved against project myProject
.
If the materialized view is not created in the same project used to run the CREATE MATERIALIZED VIEW statement, then all references in the materialized view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW query is invalid if it runs in a project different from myProject.
materialized_view_option_list

The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.
Specify a materialized view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
enable_refresh | BOOLEAN | Example: enable_refresh=true |
refresh_interval_minutes | FLOAT64 | Example: refresh_interval_minutes=20 |
expiration_timestamp | TIMESTAMP | Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property. |
max_staleness | INTERVAL | Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND |
allow_non_incremental_definition | BOOLEAN | Example: allow_non_incremental_definition=true |
kms_key_name | STRING | Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key". This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. For more details, see Protecting data with Cloud KMS keys. |
friendly_name | STRING | Example: friendly_name="my_mv". This property is equivalent to the friendlyName table resource property. |
description | STRING | Example: description="a materialized view that expires in 2025". This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the materialized view. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes any expiration options, then the bigquery.tables.delete
permission is also required.
The following example creates a materialized view named new_mv
in mydataset
:
CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a materialized view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = true,
  refresh_interval_minutes = 20
) AS
SELECT
  column_1,
  SUM(column_2) AS sum_2,
  AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable`
GROUP BY column_1
If the materialized view name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.materialized_view
When you use a DDL statement to create a materialized view, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view`
(including the backticks if project_id
contains special characters); for example, `myproject.mydataset.new_mv`
.
The materialized view is defined using the following GoogleSQL query:

SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3 FROM `myproject.mydataset.mytable` GROUP BY column_1
The materialized view option list specifies the:
- Expiration time: 48 hours after the time the materialized view is created
- Friendly name: new_mv
- Description: A materialized view that expires in 2 days
- Label: org_unit = development
- Refresh: enabled, with a refresh interval of 20 minutes
The following example creates a materialized view named new_mv
in mydataset
only if no materialized view named new_mv
exists in mydataset
. If the materialized view name exists in the dataset, no error is returned, and no action is taken.
CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = false
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view option list specifies the:
- Expiration time: 48 hours after the time the materialized view is created
- Friendly name: new_mv
- Description: A view that expires in 2 days
- Label: org_unit = development
- Refresh: disabled
The following example creates a materialized view named new_mv
in mydataset
, partitioned by the col_datetime
column and clustered by the col_int
column:
CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
PARTITION BY DATE(col_datetime)
CLUSTER BY col_int
AS
SELECT col_int, col_datetime, COUNT(1) AS cnt
FROM `myproject.mydataset.mv_base_table`
GROUP BY col_int, col_datetime
The base table, mv_base_table
, must also be partitioned by the col_datetime
column. For more information, see Working with partitioned and clustered tables.
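For context, a hedged sketch of a compatible base table; the DATE(col_datetime) partitioning is what allows the materialized view above to be partitioned the same way (the column types are assumptions):

CREATE TABLE `myproject.mydataset.mv_base_table` (
  col_int INT64,
  col_datetime DATETIME
)
PARTITION BY DATE(col_datetime);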
CREATE MATERIALIZED VIEW AS REPLICA OF statement

Creates a replica of a materialized view. The source materialized view must be over an Amazon Simple Storage Service (Amazon S3) BigLake table. You can use the materialized view replica to make Amazon S3 data available locally for joins.
For more information, see Create materialized view replicas.
CREATE MATERIALIZED VIEW replica_name
[OPTIONS(materialized_view_replica_option_list)]
AS REPLICA OF source_materialized_view_name
replica_name
: The name of the materialized view replica you're creating, in table path syntax. If the project name is omitted from the materialized view replica name, the current project is used as the default.
The materialized view replica name must be unique for each dataset.
materialized_view_replica_option_list
: Allows you to specify options such as the replication interval.
source_materialized_view_name
: The name of the materialized view you are replicating, in table path syntax. The source materialized view must be over an Amazon S3 BigLake table, and must be authorized on the dataset that contains that table.
materialized_view_replica_option_list
The option list lets you set materialized view replica options.
Specify a materialized view replica option list in the following format:
NAME=VALUE, ...
NAME | VALUE | Details |
---|---|---|
replication_interval_seconds | INT64 | Specifies how often to replicate the data from the source materialized view to the replica. Must be a value between 60 and 3,600, inclusive. The default is 300 (5 minutes). Example: replication_interval_seconds=600 |
This statement requires the following IAM permissions:
bigquery.tables.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.replicateData
bigquery.jobs.create
The following example creates a materialized view replica named mv_replica
in bq_dataset
:
CREATE MATERIALIZED VIEW `myproject.bq_dataset.mv_replica`
OPTIONS (
  replication_interval_seconds = 600
)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`
CREATE EXTERNAL SCHEMA statement

Creates a new federated dataset.
A federated dataset is a connection between BigQuery and an external data source at the dataset level. For more information about creating federated datasets, see the following:
CREATE EXTERNAL SCHEMA [ IF NOT EXISTS ] dataset_name
[WITH CONNECTION connection_name]
[OPTIONS(external_schema_option_list)]
IF NOT EXISTS
: If any dataset exists with the same name, the CREATE
statement has no effect.
dataset_name
: The name of the dataset to create.
connection_name
: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`
).
external_schema_option_list
: A list of options for creating the federated dataset.
The dataset is created in the location that you specify in the query settings. For more information, see Specify locations. The location must support the kind of federated dataset that you are creating. For example, you can only create AWS Glue federated datasets in AWS locations.
For more information about creating a dataset, see Create datasets. For information about quotas, see dataset limits.
external_schema_option_list
The option list specifies options for the federated dataset. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
description | STRING | The description of the dataset. |
friendly_name | STRING | A descriptive name for the dataset. |
labels | <ARRAY<STRUCT<STRING, STRING>>> | An array of labels for the dataset, expressed as key-value pairs. |
location | STRING | The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. The location must support the kind of federated dataset that you are creating. For example, you can only create AWS Glue federated datasets in AWS locations. |
external_source | STRING | The source of the external dataset. For AWS Glue federated datasets this must be an Amazon Resource Name (ARN), with a prefix identifying the source, such as aws-glue:// . For Spanner federated datasets, this must be a specific Spanner database with a google-cloudspanner:/ prefix. For example: google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database . |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.create | The project where you create the federated dataset. |
bigquery.connections.use | The project where you create the federated dataset. |
bigquery.connections.delegate | The project where you create the federated dataset. |
The following example creates an AWS Glue federated dataset:
CREATE EXTERNAL SCHEMA mydataset
WITH CONNECTION myproject.`aws-us-east-1`.myconnection
OPTIONS (
  external_source = 'aws-glue://arn:aws:glue:us-east-1:123456789:database/test_database',
  location = 'aws-us-east-1'
);
CREATE EXTERNAL TABLE statement

Creates a new external table.
External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see Introduction to external data sources.
CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ] table_name
[(column_name column_schema, ...)]
[WITH CONNECTION {connection_name | DEFAULT}]
[WITH PARTITION COLUMNS [(partition_column_name partition_column_type, ...)]]
OPTIONS (external_table_option_list, ...);
OR REPLACE
: Replaces any external table with the same name if it exists. Cannot appear with IF NOT EXISTS
.
IF NOT EXISTS
: If an external table or other table resource exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
table_name
: The name of the external table. See Table path syntax.
column_name
: The name of a column in the table.
column_schema
: Specifies the schema of the column. It uses the same syntax as the column_schema
definition in the CREATE TABLE
statement. If you don't include this clause, BigQuery detects the schema automatically.
connection_name
: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`
). To use a default connection, specify DEFAULT
instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID.
partition_column_name
: The name of a partition column. Include this field if your external data uses a hive-partitioned layout. For more information, see Supported data layouts.
partition_column_type
: The partition column type.
external_table_option_list
: A list of options for creating the external table.
The CREATE EXTERNAL TABLE
statement does not support creating temporary external tables.
To create an externally partitioned table, use the WITH PARTITION COLUMNS
clause to specify the partition schema details. BigQuery validates the column definitions against the external data location. The schema declaration must strictly follow the ordering of the fields in the external path. For more information about external partitioning, see Querying externally partitioned data.
external_table_option_list
The option list specifies options for creating the external table. The format
and uris
options are required. Specify the option list in the following format: NAME=VALUE, ...
Option | Details |
---|---|
allow_jagged_rows |
If Applies to CSV data. |
allow_quoted_newlines |
If Applies to CSV data. |
bigtable_options |
Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see |
compression |
The compression type of the data source. Supported values include: Applies to CSV and JSON data. |
decimal_target_types |
Determines how to convert a Example: |
description |
A description of this table. |
enable_list_inference |
If Applies to Parquet data. |
enable_logical_types |
If Applies to Avro data. |
encoding |
The character encoding of the data. Supported values include: Applies to CSV data. |
enum_as_string |
If Applies to Parquet data. |
expiration_timestamp |
The time when this table expires. If not specified, the table does not expire. Example: |
field_delimiter |
The separator for fields in a CSV file. Applies to CSV data. |
format |
The format of the external data. Supported values for Supported values for The value |
hive_partition_uri_prefix |
A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: |
file_set_spec_type |
Specifies how to interpret source URIs for load jobs and external tables. Supported values include:
For example, if you have a source URI of |
ignore_unknown_values |
If Applies to CSV and JSON data. |
json_extension |
For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include: |
max_bad_records |
The maximum number of bad records to ignore when reading the data. Applies to: CSV, JSON, and Google Sheets data. |
max_staleness |
Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify |
null_marker |
The string that represents Applies to CSV data. |
object_metadata |
Only required when creating an object table. Set the value of this option to |
preserve_ascii_control_characters |
If Applies to CSV data. |
projection_fields |
A list of entity properties to load. Applies to Datastore data. |
quote |
The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the Applies to CSV data. |
reference_file_schema_uri |
User provided reference file with the table schema. Applies to Parquet/ORC/AVRO data. Example: |
require_hive_partition_filter |
If Applies to Avro, CSV, JSON, Parquet, and ORC data. |
sheet_range |
Range of a Google Sheets spreadsheet to query from. Applies to Google Sheets data. Example: |
skip_leading_rows |
The number of rows at the top of a file to skip when reading the data. Applies to CSV and Google Sheets data. |
uris | For external tables, including object tables, that aren't Bigtable tables:
An array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character, which must come after the bucket name. Example: uris=['gs://bucket/path/*']
For Bigtable tables:
The URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. Example: For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create | The dataset where you create the external table. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes an expiration time, then the bigquery.tables.delete
permission is also required.
The following example creates a BigLake table and explicitly specifies the schema. It also specifies refreshing the metadata cache automatically at a system-defined interval.
CREATE OR REPLACE EXTERNAL TABLE mydataset.newtable (x INT64, y STRING, z BOOL)
WITH CONNECTION myconnection
OPTIONS (
  format = "PARQUET",
  max_staleness = STALENESS_INTERVAL,
  metadata_cache_mode = 'AUTOMATIC');
The following example creates an external table from multiple URIs. The data format is CSV. This example uses schema auto-detection.
CREATE EXTERNAL TABLE dataset.CsvTable
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv', 'gs://bucket/path2.csv']);
The following example creates an external table from a CSV file and explicitly specifies the schema. It also specifies the field delimiter ('|'
) and sets the maximum number of bad records allowed.
CREATE OR REPLACE EXTERNAL TABLE dataset.CsvTable (x INT64, y STRING)
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv'],
  field_delimiter = '|',
  max_bad_records = 5);
The following example creates an externally partitioned table. It uses schema auto-detection to detect both the file schema and the hive partitioning layout. If the external path is gs://bucket/path/field_1=first/field_2=1/data.parquet
, the partition columns are detected as field_1
(STRING
) and field_2
(INT64
).
CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);
The following example creates an externally partitioned table by explicitly specifying the partition columns. This example assumes that the external file path has the pattern gs://bucket/path/field_1=first/field_2=1/data.parquet
.
CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING,  -- column order must match the external path
  field_2 INT64)
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);
CREATE FUNCTION
statementCreates a new user-defined function (UDF). BigQuery supports UDFs written in SQL, JavaScript, or Python.
To create a SQL UDF, use the following syntax:
CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
  [[project_name.]dataset_name.]function_name ([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)
  [OPTIONS (function_option_list)]

named_parameter: param_name param_type
To create a JavaScript UDF, use the following syntax:
CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
  [[project_name.]dataset_name.]function_name ([named_parameter[, ...]])
  RETURNS data_type
  [determinism_specifier]
  LANGUAGE js
  [OPTIONS (function_option_list)]
  AS javascript_code

named_parameter: param_name param_type

determinism_specifier:
  { DETERMINISTIC | NOT DETERMINISTIC }
To create a Python UDF, use the following syntax:
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
  [project_name.]dataset_name.function_name ([named_parameter[, ...]])
  RETURNS data_type
  LANGUAGE python
  [WITH CONNECTION connection_path]
  OPTIONS (function_option_list)
  AS python_code

named_parameter: param_name param_type
To create a remote function, use the following syntax:
CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
  [[project_name.]dataset_name.]function_name ([named_parameter[, ...]])
  RETURNS data_type
  REMOTE WITH CONNECTION connection_path
  [OPTIONS (function_option_list)]

named_parameter: param_name param_type
Routine names must contain only letters, numbers, and underscores, and be at most 256 characters long.
OR REPLACE
: Replaces any function with the same name if it exists. Cannot appear with IF NOT EXISTS
.
IF NOT EXISTS
: If any function exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
TEMP
or TEMPORARY
: Creates a temporary function. If the clause is not present, the statement creates a persistent UDF. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query, script, session, or procedure.
project_name
: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Do not include the project name for temporary functions.
dataset_name
: For persistent functions, the name of the dataset where you are creating the function. Defaults to the defaultDataset
in the request. Do not include the dataset name for temporary functions.
function_name
: The name of the function.
named_parameter
: A comma-separated param_name
and param_type
pair. The value of param_type
is a BigQuery data type. For a SQL UDF, the value of param_type
can also be ANY TYPE
.
determinism_specifier
: Applies only to JavaScript UDFs. Provides a hint to BigQuery as to whether the query result can be cached. Can be one of the following values:
DETERMINISTIC
: The function always returns the same result when passed the same arguments. The query result is potentially cacheable. For example, if the function add_one(i)
always returns i + 1
, the function is deterministic.
NOT DETERMINISTIC
: The function does not always return the same result when passed the same arguments, and therefore is not cacheable. For example, if the function add_random(i)
returns i + rand()
, the function is not deterministic and BigQuery does not use cached results.
If all of the invoked functions are DETERMINISTIC
, BigQuery tries to cache the result, unless the results can't be cached for other reasons. For more information, see Using cached query results.
data_type
: The data type that the function returns.
If the function is defined in SQL, then the RETURNS
clause is optional. If the RETURNS
clause is omitted, then BigQuery infers the result type of the function from the SQL function body when a query calls the function.
If the function is defined in JavaScript, then the RETURNS
clause is required. For more information about allowed values for data_type
, see Supported JavaScript UDF data types.
sql_expression
: The SQL expression that defines the function.
function_option_list
: A list of options for creating the function.
javascript_code
: The definition of a JavaScript function. The value is a string literal. If the code includes quotes and backslashes, it must be either escaped or represented as a raw string. For example, the code return "\n";
can be represented as one of the following:
"return \"\\n\";"
. Both quotes and backslashes need to be escaped."""return "\\n";"""
. Backslashes need to be escaped, quotes don't.r"""return "\n";"""
. No escaping is needed.python_code
: The definition of a Python function. The value is a string literal. If the code includes quotes and backslashes, it must be escaped or represented as a raw string. For example, the code return "\n";
can be represented as one of the following:
"return \"\\n\";"
. Both quotes and backslashes need to be escaped."""return "\\n";"""
. Backslashes need to be escaped, quotes don't.r"""return "\n";"""
. No escaping is needed.connection_name
: Specifies a connection resource that has credentials for accessing the remote endpoint or for running Python code. Specify the connection name in the form project_name.location.connection_id
: If the project name or location contains a dash, enclose the connection name in backticks (`
).
function_option_list
The option list specifies options for creating a UDF. The following options are supported:
NAME | VALUE | Details |
---|---|---|
description |
| A description of the UDF. This option isn't supported when creating a temporary function. |
library |
| An array of JavaScript libraries to include in the function definition. Applies only to JavaScript UDFs. For more information, see Including JavaScript libraries. Example: |
endpoint |
| An HTTP endpoint of Cloud Functions. Applies only to remote functions. Example: For more information, see Create a remote function. |
user_defined_context |
| A list of key-value pairs that will be sent with every HTTP request when the function is invoked. Applies only to remote functions. Example: |
max_batching_rows |
| The maximum number of rows in each HTTP request. If not specified, BigQuery decides how many rows are included in an HTTP request. Applies only to remote functions and Python UDFs. |
runtime_version |
| The runtime version used to run the provided Python code. Applies only to Python UDFs. Example: |
entry_point |
| The name of the function defined in Python code as the entry point when the Python UDF is invoked. Applies only to Python UDFs. |
packages |
| An array of Python packages to install in the function definition. Applies only to Python UDFs. For more information, see Use third party packages. Example: |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.create | The dataset where you create the function. |
In addition, the OR REPLACE
clause requires bigquery.routines.update
permission.
To create a remote function, additional IAM permissions are needed:
Permission | Resource |
---|---|
bigquery.connections.delegate | The connection which you use to create the remote function. |
The following example creates a persistent SQL UDF named multiplyInputs
in a dataset named mydataset
.
CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
AS (x * y);
The following example creates a temporary JavaScript UDF named multiplyInputs
and calls it from inside a SELECT
statement.
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS r"""
  return x*y;
""";

SELECT multiplyInputs(a, b) FROM (SELECT 3 AS a, 2 AS b);
The following example creates a temporary remote function named tempRemoteMultiplyInputs using a connection called myconnection in the US location.
CREATE TEMP FUNCTION tempRemoteMultiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
REMOTE WITH CONNECTION us.myconnection
OPTIONS (endpoint = "https://us-central1-myproject.cloudfunctions.net/multiply");
The following example creates a persistent remote function named remoteMultiplyInputs
in a dataset named mydataset
using a connection called myconnection
. The location and project of the dataset and the connection must match.
CREATE FUNCTION mydataset.remoteMultiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
REMOTE WITH CONNECTION us.myconnection
OPTIONS (endpoint = "https://us-central1-myproject.cloudfunctions.net/multiply");
The following example creates a Python UDF named multiplyInputs
.
CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS (
  entry_point = 'multiply',
  runtime_version = 'python-3.11',
  packages = ['pandas==2.2'])
AS r"""
import pandas as pd

def multiply(df: pd.DataFrame):
  return df['x'] * df['y']
""";
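Once created, a Python UDF is called like any other function; a hypothetical call:

SELECT mydataset.multiplyInputs(3.0, 5.0) AS product;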
CREATE AGGREGATE FUNCTION
statement (SQL)Creates a new SQL user-defined aggregate function (UDAF).
To create a SQL UDAF, use the following syntax:
CREATE [OR REPLACE] [{TEMPORARY | TEMP}] AGGREGATE FUNCTION [IF NOT EXISTS]
  function_path ([function_parameter[, ...]])
  [RETURNS return_data_type]
  AS (sql_function_body)
  [OPTIONS (function_option_list)]

function_path:
  [[project_name.]dataset_name.]function_name

function_parameter:
  parameter_name parameter_data_type [NOT AGGREGATE]
OR REPLACE: Replaces any function with the same name if it exists. OR REPLACE can't appear with IF NOT EXISTS.
IF NOT EXISTS: If any function exists with the same name, the CREATE statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
TEMP or TEMPORARY: The function is temporary; that is, it exists for the lifetime of a single query, script, session, or procedure. A temporary function can't have the same name as a built-in function. If the names match, an error is produced. If TEMP or TEMPORARY is not included, a persistent function is created. You can reuse persistent functions across multiple queries.
function_path: The path where the function must be created, and the name of the function.
project_name: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Don't include the project name for temporary functions.
dataset_name: For persistent functions, the name of the dataset where you are creating the function. Defaults to defaultDataset in the request. Don't include the dataset name for temporary functions.
function_name: The name of the function. Function names must contain only letters, numbers, and underscores, and be at most 256 characters long.
function_parameter: A parameter for the function.
parameter_name: The name of the function parameter.
parameter_data_type: The GoogleSQL data type for the function parameter.
NOT AGGREGATE: The function parameter is not an aggregate. A non-aggregate function parameter can appear anywhere in the function definition.
return_data_type: The GoogleSQL data type that the function should return. GoogleSQL infers the result data type of the function from the function body when the RETURNS clause is omitted.
sql_function_body: The SQL expression that defines the function body.
function_option_list: A list of options for creating the function. For more information, see function_option_list.
function_option_list
The option list specifies options for creating a SQL UDAF. The following options are supported:
NAME | VALUE | Details |
---|---|---|
description |
| A description of the UDAF. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.create | The dataset where you create the function. |
In addition, the OR REPLACE
clause requires the bigquery.routines.update
permission.
The following example shows a persistent SQL UDAF that includes a non-aggregate function parameter. Inside the function definition, the aggregate SUM function takes the aggregate function parameter dividend, while the non-aggregate division operator (/) takes the non-aggregate function parameter divisor.
CREATE AGGREGATE FUNCTION myProject.myDataset.ScaledSum(
  dividend FLOAT64,
  divisor FLOAT64 NOT AGGREGATE)
RETURNS FLOAT64
AS (
  SUM(dividend) / divisor
);

-- Call the SQL UDAF.
SELECT ScaledSum(col1, 2) AS scaled_sum
FROM (
  SELECT 1 AS col1 UNION ALL
  SELECT 3 AS col1 UNION ALL
  SELECT 5 AS col1);

/*------------*
 | scaled_sum |
 +------------+
 | 4.5        |
 *------------*/
CREATE AGGREGATE FUNCTION
statement (JavaScript)Creates a new JavaScript user-defined aggregate function (UDAF).
To create a JavaScript UDAF, use the following syntax:
CREATE [OR REPLACE] [{TEMPORARY | TEMP}] AGGREGATE FUNCTION [IF NOT EXISTS]
  function_path ([function_parameter[, ...]])
  RETURNS return_data_type
  LANGUAGE js
  [OPTIONS (function_option_list)]
  AS function_body

function_path:
  [[project_name.]dataset_name.]function_name

function_parameter:
  parameter_name parameter_data_type [NOT AGGREGATE]
OR REPLACE: Replaces any function with the same name if it exists. OR REPLACE can't appear with IF NOT EXISTS.
IF NOT EXISTS: If any function exists with the same name, the CREATE statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
TEMP or TEMPORARY: The function is temporary; that is, it exists for the lifetime of a single query, script, session, or procedure. A temporary function can't have the same name as a built-in function. If the names match, an error is produced. If TEMP or TEMPORARY is not included, a persistent function is created. You can reuse persistent functions across multiple queries.
function_path: The path where the function must be created, and the name of the function.
project_name: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Don't include the project name for temporary functions.
dataset_name: For persistent functions, the name of the dataset where you are creating the function. Defaults to defaultDataset in the request. Don't include the dataset name for temporary functions.
function_name: The name of the function. Function names must contain only letters, numbers, and underscores, and be at most 256 characters long.
function_parameter: A parameter for the function.
parameter_name: The name of the function parameter.
parameter_data_type: The GoogleSQL data type for the function parameter.
NOT AGGREGATE: The function parameter is not an aggregate. Only one non-aggregate function parameter is allowed per JavaScript UDAF, and it must be the last parameter in the list.
return_data_type: The GoogleSQL data type that the function should return.
function_body: The JavaScript expression that defines the function body. For more information, see function_body.
function_option_list: A list of options for creating the function. For more information, see function_option_list.
function_body
The body of the JavaScript function must be a quoted string literal that represents the JavaScript code. To learn more about the different types of quoted string literals you can use, see Formats for quoted literals.
Only certain type encodings are allowed. To learn more, see SQL type encodings in a JavaScript UDAF.
The JavaScript function body must include four JavaScript functions that initialize, aggregate, merge, and finalize the results for the JavaScript UDAF. To learn more about the initialState
, aggregate
, merge
, and finalize
JavaScript functions, see Required aggregate functions in a JavaScript UDAF.
Only serialized data can be passed into the JavaScript aggregate functions. If you need to serialize data such as functions or symbols to pass them into the aggregate functions, use the JavaScript serialization functions. For more information, see Serialization functions for a JavaScript UDAF.
function_option_list
The option list specifies options for creating a JavaScript UDAF. The following options are supported:
NAME | VALUE | Details |
---|---|---|
description |
| A description of the UDAF. |
library |
| An array of JavaScript libraries to include in the JavaScript UDAF function body. Example: |
In JavaScript UDAFs, GoogleSQL data types are represented as JavaScript data types in the following manner:
GoogleSQL data type | JavaScript data type | Notes |
---|---|---|
ARRAY | Array | An array of arrays is not supported. To get around this limitation, use the Array<Object<Array>> (JavaScript) and ARRAY<STRUCT<ARRAY>> (GoogleSQL) data types. |
BIGNUMERIC | Number or String | Same as NUMERIC . |
BOOL | Boolean | |
BYTES | Uint8Array | |
DATE | Date | |
FLOAT64 | Number | |
INT64 | BigInt | |
JSON | Various types | The GoogleSQL JSON data type can be converted into a JavaScript Object , Array , or other GoogleSQL-supported JavaScript data type. |
NUMERIC | Number or String | If a NUMERIC value can be represented exactly as an IEEE 754 floating-point value (range [-2^53, 2^53]) and has no fractional part, it is encoded as a Number data type; otherwise it is encoded as a String data type. |
STRING | String | |
STRUCT | Object | Each STRUCT field is a named property in the Object data type. An unnamed STRUCT field is not supported. |
TIMESTAMP | Date | Date contains a microsecond field with the microsecond fraction of TIMESTAMP . |
The JavaScript function body must include the following exportable JavaScript functions:
initialState
function: Sets up the initial aggregation state of the UDAF and then returns the initial aggregation state.
Syntax:
export function initialState([nonAggregateParam]) { ... }
Parameters:
nonAggregateParam
: Replace this parameter with a NOT AGGREGATE
function parameter name.
Examples:
export function initialState() { ... }
export function initialState(initialSum) { ... }
aggregate
function: Aggregates one row of data, updating state to store the result of the aggregation. Doesn't return a value.
Syntax:
export function aggregate(state, aggregateParam[, ...][, nonAggregateParam]) { ... }
Parameters:
state
: The aggregation state, which is the value returned by initialState on the first invocation, and the state as updated by previous calls to aggregate thereafter.
aggregateParam
: The name of an aggregation parameter in the JavaScript UDAF. The argument for this parameter will be aggregated.
nonAggregateParam
: Replace with a NOT AGGREGATE
function parameter name.
Example:
export function aggregate(currentState, aggX, aggWeight, initialSum)
merge
function: Combines two aggregation states from a prior call to the aggregate
, merge
, or initialState
function. This function does not return a value.
Syntax:
export function merge(state, partialState[, nonAggregateParam]) { ... }
Parameters:
state
: The state into which partialState
is merged.
partialState
: The second aggregation state to merge.
nonAggregateParam
: Replace with a NOT AGGREGATE
function parameter name.
Details:
Depending on the size and organization of the underlying data being queried, the merge
function might or might not be called. For example, if a particular set of data is small, or the data is partitioned in a way that results in small sets of data, the merge
function won't be called.
Example:
export function merge(currentState, partialState, initialSum)
finalize
function: Computes the final aggregation result and then returns this result for the UDAF.
Syntax:
export function finalize(state[, nonAggregateParam]) { ... }
Parameters:
state
: The final aggregation state.
nonAggregateParam
: Replace with a NOT AGGREGATE
function parameter name.
The final aggregation state is the state produced by the merge function (or by the aggregate function if merge is never invoked). If the input is empty after NULL filtering, the final aggregation state is the state returned by initialState.
Example:
export function finalize(finalState, initialSum)
If you want to work with non-serializable aggregation states, the JavaScript UDAF must provide the serialize
and deserialize
functions:
serialize
function: Converts an aggregation state into a BigQuery-serializable object. An object in JavaScript is BigQuery-serializable if all fields are a JavaScript primitive data type (for example, String
, Number
, null
, undefined
), another BigQuery-serializable object, or a JavaScript Array
, where all elements are either primitives or BigQuery-serializable objects.
Syntax:
export function serialize(state[, nonAggregateParam]) { ... }
Arguments:
state
: The aggregation state to serialize.
nonAggregateParam
: Replace with a NOT AGGREGATE
function parameter name.
Example:
export function serialize(stateToSerialize, initialSum)
deserialize
function: Converts a serialized state into an aggregation state. An aggregated state can be passed into the serialize
, aggregate
, merge
, and finalize
functions.
Syntax:
export function deserialize(serializedState[, nonAggregateParam]) { ... }
Arguments:
serializedState
: The serialized state to convert into the aggregation state.
nonAggregateParam
: Replace with a NOT AGGREGATE
function parameter name.
Example:
export function deserialize(stateToDeserialize, initialSum)
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.create | The dataset where you create the function. |
In addition, the OR REPLACE
clause requires the bigquery.routines.update
permission.
A JavaScript UDAF is similar to a JavaScript UDF, but defines an aggregate function instead of a scalar function. In the following example, a temporary JavaScript UDAF calculates the sum of all rows that have a positive value. The JavaScript UDAF body is quoted within a raw string:
CREATE TEMP AGGREGATE FUNCTION SumPositive(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS r'''
  export function initialState() {
    return {sum: 0}
  }
  export function aggregate(state, x) {
    if (x > 0) {
      state.sum += x;
    }
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
  }
  export function finalize(state) {
    return state.sum;
  }
''';

-- Call the JavaScript UDAF.
WITH numbers AS (
  SELECT * FROM UNNEST([1.0, -1.0, 3.0, -3.0, 5.0, -5.0]) AS x)
SELECT SumPositive(x) AS sum FROM numbers;

/*-----*
 | sum |
 +-----+
 | 9.0 |
 *-----*/
A JavaScript UDAF can have aggregate and non-aggregate parameters. In the following example, the JavaScript UDAF calculates the weighted average for x
after starting with an initial sum (initialSum
). x
and weight
are aggregate parameters, and initialSum
is a non-aggregate parameter:
CREATE OR REPLACE AGGREGATE FUNCTION my_project.my_dataset.WeightedAverage(
  x INT64,
  weight FLOAT64,
  initialSum FLOAT64 NOT AGGREGATE)
RETURNS INT64
LANGUAGE js
AS '''
  export function initialState(initialSum) {
    return {count: 0, sum: initialSum}
  }
  export function aggregate(state, x, weight) {
    state.count += 1;
    state.sum += Number(x) * weight;
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
    state.count += partialState.count;
  }
  export function finalize(state) {
    return state.sum / state.count;
  }
''';

SELECT my_project.my_dataset.WeightedAverage(item, weight, 2) AS weighted_average
FROM (
  SELECT 1 AS item, 2.45 AS weight UNION ALL
  SELECT 3 AS item, 0.11 AS weight UNION ALL
  SELECT 5 AS item, 7.02 AS weight);

/*------------------*
 | weighted_average |
 +------------------+
 | 13               |
 *------------------*/
CREATE TABLE FUNCTION
statementCreates a new table function, also called a table-valued function (TVF).
CREATE [OR REPLACE] TABLE FUNCTION [IF NOT EXISTS]
  [[project_name.]dataset_name.]function_name ([function_parameter[, ...]])
  [RETURNS TABLE<column_declaration[, ...]>]
  [OPTIONS (table_function_options_list)]
  AS sql_query

function_parameter:
  parameter_name {data_type | ANY TYPE}

column_declaration:
  column_name data_type
OR REPLACE: Replaces any table function with the same name if it exists. Cannot appear with IF NOT EXISTS.
IF NOT EXISTS: If any table function exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.
project_name: The name of the project where you are creating the function. Defaults to the project that runs this DDL statement.
dataset_name: The name of the dataset where you are creating the function.
function_name: The name of the function to create.
function_parameter: A parameter for the function, specified as a parameter name and a data type. The value of data_type is a scalar BigQuery data type or ANY TYPE.
RETURNS TABLE: The schema of the table that the function returns, specified as a comma-separated list of column name and data type pairs. If RETURNS TABLE is absent, BigQuery infers the output schema from the query statement in the function body. If RETURNS TABLE is included, the names in the returned table type must match column names from the SQL query.
sql_query: Specifies the SQL query to run. The SQL query must include names for all columns.
The table_function_options_list
lets you specify table function options. Table function options have the same syntax and requirements as table options but with a different list of NAME
s and VALUE
s:
NAME | VALUE | Details |
---|---|---|
description |
| The description of the table function. |
BigQuery coerces argument types when possible. For example, if the parameter type is FLOAT64
and you pass an INT64
value, then BigQuery coerces it to a FLOAT64
.
If a parameter type is ANY TYPE
, the function accepts an input of any type for this argument. The type that you pass to the function must be compatible with the function definition. If you pass an argument with an incompatible type, the query returns an error. If more than one parameter has type ANY TYPE
, BigQuery does not enforce any type relationship between them.
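As a minimal sketch, assuming a table mydataset.products with a price column (both hypothetical), an ANY TYPE parameter lets BigQuery resolve the argument type at call time:

CREATE OR REPLACE TABLE FUNCTION mydataset.products_cheaper_than(max_price ANY TYPE)
AS
  SELECT *
  FROM mydataset.products
  WHERE price < max_price;  -- max_price can be INT64, FLOAT64, NUMERIC, and so on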
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.create | The dataset where you create the table function. |
In addition, the OR REPLACE
clause requires bigquery.routines.update
permission.
The following table function takes an INT64
parameter that is used to filter the results of a query:
CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
AS
  SELECT year, name, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_current`
  WHERE year = y
  GROUP BY year, name
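Once created, the table function is queried like a table; for example:

SELECT *
FROM mydataset.names_by_year(1950)
ORDER BY total DESC
LIMIT 5;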
The following example specifies the return TABLE
type in the RETURNS
clause:
CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
RETURNS TABLE<name STRING, year INT64, total INT64>
AS
  SELECT year, name, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_current`
  WHERE year = y
  GROUP BY year, name
CREATE PROCEDURE
statementCreates a new procedure, which is a block of statements that can be called from other queries. Procedures can call themselves recursively.
To create a GoogleSQL stored procedure, use the following syntax:
CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
  [[project_name.]dataset_name.]procedure_name (procedure_argument[, ...])
  [OPTIONS (procedure_option_list)]
BEGIN
  multi_statement_query
END;

procedure_argument:
  [procedure_argument_mode] argument_name argument_type

procedure_argument_mode:
  IN | OUT | INOUT
To create a stored procedure for Apache Spark, use the following syntax:
CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
  [[project_name.]dataset_name.]procedure_name (procedure_argument[, ...])
  [EXTERNAL SECURITY external_security]
  WITH CONNECTION connection_project_id.connection_region.connection_id
  [OPTIONS (procedure_option_list)]
  LANGUAGE language
  [AS pyspark_code]

procedure_argument:
  [procedure_argument_mode] argument_name argument_type

procedure_argument_mode:
  IN | OUT | INOUT

external_security:
  INVOKER
OR REPLACE
: Replaces any procedure with the same name if it exists. Cannot appear with IF NOT EXISTS
.
IF NOT EXISTS
: If any procedure exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
project_name
: The name of the project where you are creating the procedure. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks `
(example: `google.com:my_project`
).
dataset_name
: The name of the dataset where you are creating the procedure. Defaults to the defaultDataset
in the request.
procedure_name
: The name of the procedure to create.
external_security
: When set to INVOKER, the procedure is executed with the privileges of the user that calls it.
connection_project_id
: the project that contains the connection to run Spark procedures—for example, myproject
.
connection_region
: the region that contains the connection to run Spark procedures—for example, us
.
connection_id
: the connection ID—for example, myconnection
.
When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID—for example projects/myproject/locations/connection_location/connections/myconnection
.
For more information, see Create a stored procedure for Apache Spark.
multi_statement_query
: The multi-statement query to run.
language
: The language in which the stored procedure for Apache Spark is written. BigQuery supports stored procedures for Apache Spark that are written in Python, Java, or Scala.
pyspark_code
: The PySpark code for the stored procedure for Apache Spark if you want to pass the body of the procedure inline. Cannot appear with main_file_uri
in procedure_option_list
.
argument_type
: Any valid BigQuery type.
procedure_argument_mode
: Specifies whether an argument is an input, an output, or both.
procedure_option_list
The procedure_option_list
lets you specify procedure options. Procedure options have the same syntax and requirements as table options but with a different list of NAME
s and VALUE
s:
NAME | VALUE | Details |
---|---|---|
strict_mode | BOOL | If strict_mode is TRUE, the procedure body undergoes additional checks for errors, such as non-existent tables or columns, and the CREATE PROCEDURE statement fails if the body fails any of these checks. While strict mode is useful for catching many common types of errors, the checks are not exhaustive, and successful creation of a procedure with strict_mode doesn't guarantee that the procedure will successfully execute at runtime. If strict_mode is FALSE, the procedure body is checked only for syntax. The default value is TRUE. Example: strict_mode=FALSE |
description |
| A description of the procedure. Example: description="A procedure that runs a query." |
engine | STRING | The engine type for processing stored procedures for Apache Spark. Must be specified for stored procedures for Spark. Valid value:engine="SPARK" |
runtime_version | STRING | The runtime version of stored procedures for Spark. If not specified, the system default runtime version is used. Stored procedures for Spark support the same list of runtime versions as Dataproc Serverless. However, we recommend specifying a runtime version. For more information, see Dataproc Serverless Spark runtime releases. Example: runtime_version="1.1" |
container_image | STRING | Custom container image for the runtime environment of the stored procedure for Spark. If not specified, the system default container image that includes the default Spark, Java, and Python packages associated with a runtime version is used. You can provide a custom container Docker image that includes your own built Java or Python dependencies. As Spark is mounted into your custom container at runtime, you must omit Spark in your custom container image. For optimized performance, we recommend hosting your image in Artifact Registry. For more information, see Use custom containers with Dataproc Serverless for Spark. Example: |
properties | ARRAY<STRUCT<STRING, STRING>> | A key-value pair to include properties for stored procedures for Spark. Stored procedures for Spark support most of the Spark properties and a list of custom Dataproc Serverless properties. If you specify unsupported Spark properties, such as YARN-related Spark properties, BigQuery fails to create the stored procedure. You can add Spark properties using the following format:

bq query --nouse_legacy_sql --dry_run 'CREATE PROCEDURE my_bq_project.my_dataset.spark_proc()
WITH CONNECTION `my-project-id.us.my-connection`
OPTIONS(
  engine="SPARK",
  main_file_uri="gs://my-bucket/my-pyspark-main.py",
  properties=[("spark.executor.instances", "3"),
              ("spark.yarn.am.memory", "3g")])
LANGUAGE PYTHON'

# Error in query string: Invalid value: Invalid properties:
# Attempted to set unsupported properties: [spark:spark.yarn.am.memory] at [1:1] |
main_file_uri | STRING | The Cloud Storage URI of the main Python, Scala, or Java JAR file of the Spark application. Applies only to stored procedures for Spark. Alternatively, if you want to add the body of the stored procedure that's written in Python in the main_file_uri="gs://my-bucket/my-pyspark-main.py" For Scala and Java languages, this field contains a path to only one JAR file. You can set only one value for main_file_uri="gs://my-bucket/my-scala-main.jar" |
main_class | STRING | Applies only to stored procedures for Spark written in Java and Scala. Specify a fully qualified class name in a JAR file set with the jar_uris option. Example: main_class="com.example.wordcount" |
py_file_uris | ARRAY<STRING> | Optional. Cloud Storage URIs of Python files to pass to the PySpark framework. Applies only to stored procedures for Apache Spark. Supported file formats include .py, .egg, and .zip. Example: py_file_uris=["gs://my-bucket/my-pyspark-file1.py", "gs://my-bucket/my-pyspark-file2.py"] |
jar_uris | ARRAY<STRING> | Path to the JAR files to include on the driver and executor classpaths. Applies only to stored procedures for Apache Spark. Optional. Cloud Storage URIs of JAR files to add to the classpath of the Spark driver and tasks. Example:jar_uris=["gs://my-bucket/my-lib1.jar", "gs://my-bucket/my-lib2.jar"] |
file_uris | ARRAY<STRING> | Files to be placed in the working directory of each executor. Applies only to stored procedures for Apache Spark. Optional. Cloud Storage URIs of files to be placed in the working directory of each executor. Example:file_uris=["gs://my-bucket/my-file1", "gs://my-bucket/my-file2"] |
archive_uris | ARRAY<STRING> | Optional. Archive files to be extracted into the working directory of each executor. Applies only to stored procedures for Apache Spark. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file formats include .jar, .tar, .tar.gz, .tgz, and .zip. Example: archive_uris=["gs://my-bucket/my-archive1.zip", "gs://my-bucket/my-archive2.zip"] |
IN
indicates that the argument is only an input to the procedure. You can specify either a variable or a value expression for IN
arguments.
OUT
indicates that the argument is an output of the procedure. An OUT
argument is initialized to NULL
when the procedure starts. You must specify a variable for OUT
arguments.
INOUT
indicates that the argument is both an input to and an output from the procedure. You must specify a variable for INOUT
arguments. An INOUT
argument can be referenced in the body of a procedure as a variable and assigned new values.
If neither IN
, OUT
, nor INOUT
is specified, the argument is treated as an IN
argument.
If a variable is declared outside a procedure, passed as an INOUT or OUT argument to a procedure, and the procedure assigns a new value to that variable, that new value is visible outside of the procedure.
Variables declared in a procedure are not visible outside of the procedure, and vice versa.
An OUT or INOUT argument can be assigned a value using SET, in which case the modified value is visible outside of the procedure. If the procedure exits successfully, then the value of the OUT or INOUT argument is the final value assigned to that argument.
Temporary tables exist for the duration of the script, so if a procedure creates a temporary table, the caller of the procedure will be able to reference the temporary table as well.
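As a minimal sketch (the procedure and table names are hypothetical):

CREATE PROCEDURE mydataset.MakeTempTable()
BEGIN
  CREATE TEMP TABLE numbers AS
    SELECT 1 AS n UNION ALL SELECT 2 AS n;
END;

CALL mydataset.MakeTempTable();

-- The caller can reference the temporary table created by the procedure.
SELECT * FROM numbers;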
Procedure bodies can reference entities without specifying the project; the default project is the project which owns the procedure, not necessarily the project used to run the CREATE PROCEDURE
statement. Consider the sample query below.
CREATE PROCEDURE myProject.myDataset.QueryTable()
BEGIN
  SELECT * FROM anotherDataset.myTable;
END;
After creating the above procedure, you can run the query CALL myProject.myDataset.QueryTable()
. Regardless of the project you choose to run this CALL
query, the referenced table anotherDataset.myTable
is always resolved against project myProject
.
This statement requires the following IAM permission:
Permission | Resource |
---|---|
bigquery.routines.create | The dataset where you create the procedure. |
To create a stored procedure for Apache Spark, additional IAM permissions are needed:
Permission | Resource |
---|---|
bigquery.connections.delegate | The connection which you use to create the stored procedure for Apache Spark. |
In addition, the OR REPLACE
clause requires bigquery.routines.update
permission.
You can also see examples of stored procedures for Apache Spark.
The following example creates a SQL procedure that takes x as an input argument and returns x as an output; because no argument mode is present for the argument delta, it is an input argument. The procedure consists of a block containing a single statement, which assigns the sum of the two input arguments to x.
CREATE PROCEDURE mydataset.AddDelta(INOUT x INT64, delta INT64)
BEGIN
  SET x = x + delta;
END;
The following example calls the AddDelta
procedure from the example above, passing it the variable accumulator
both times; because the changes to x
within AddDelta
are visible outside of AddDelta
, these procedure calls increment accumulator
by a total of 8.
DECLARE accumulator INT64 DEFAULT 0;
CALL mydataset.AddDelta(accumulator, 5);
CALL mydataset.AddDelta(accumulator, 3);
SELECT accumulator;
This returns the following:
+-------------+
| accumulator |
+-------------+
| 8           |
+-------------+
The following example creates the procedure SelectFromTablesAndAppend
, which takes target_date
as an input argument and returns rows_added
as an output. The procedure creates a temporary table DataForTargetDate
from a query; then, it calculates the number of rows in DataForTargetDate
and assigns the result to rows_added
. Next, it returns the rows of DataForTargetDate, including the value of target_date as one of the output columns. Finally, it drops the table DataForTargetDate
and returns rows_added
.
CREATE PROCEDURE mydataset.SelectFromTablesAndAppend(
  target_date DATE, OUT rows_added INT64)
BEGIN
  CREATE TEMP TABLE DataForTargetDate AS
    SELECT t1.id, t1.x, t2.y
    FROM dataset.partitioned_table1 AS t1
    JOIN dataset.partitioned_table2 AS t2
      ON t1.id = t2.id
    WHERE t1.date = target_date
      AND t2.date = target_date;

  SET rows_added = (SELECT COUNT(*) FROM DataForTargetDate);

  SELECT id, x, y, target_date  -- note that target_date is a parameter
  FROM DataForTargetDate;

  DROP TABLE DataForTargetDate;
END;
The following example declares a variable rows_added
, then passes it as an argument to the SelectFromTablesAndAppend
procedure from the previous example, along with the value of CURRENT_DATE
; then it returns a message stating how many rows were added.
DECLARE rows_added INT64;
CALL mydataset.SelectFromTablesAndAppend(CURRENT_DATE(), rows_added);
SELECT FORMAT('Added %d rows', rows_added);
CREATE ROW ACCESS POLICY
statementCreates or replaces a row-level access policy. Row-level access policies on a table must have unique names.
CREATE [OR REPLACE] ROW ACCESS POLICY [IF NOT EXISTS] row_access_policy_name
ON table_name
[GRANT TO (grantee_list)]
FILTER USING (filter_expression);
IF NOT EXISTS
: If any row-level access policy exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE
.
row_access_policy_name
: The name of the row-level access policy that you are creating. The row-level access policy name must be unique for each table. The row-level access policy name can contain the following:
table_name
: The name of the table that you want to create a row-level access policy for. The table must already exist.
GRANT TO grantee_list
: An optional clause that specifies the initial members that the row-level access policy should be created with.
grantee_list
is a list of iam_member
users or groups. Strings must be valid IAM principals, or members, following the format of an IAM Policy Binding member, and must be quoted. The following types are supported:
grantee_list types | |
---|---|
user:{emailid} | An email address that represents a specific Google account. |
serviceAccount:{emailid} | An email address that represents a service account. Example: |
group:{emailid} | An email address that represents a Google group. Example: |
domain:{domain} | The Google Workspace domain (primary) that represents all the users of that domain. Example: |
allAuthenticatedUsers | A special identifier that represents all service accounts and all users on the internet who have authenticated with a Google Account. This identifier includes accounts that aren't connected to a Google Workspace or Cloud Identity domain, such as personal Gmail accounts. Users who aren't authenticated, such as anonymous visitors, aren't included. |
allUsers | A special identifier that represents anyone who is on the internet, including authenticated and unauthenticated users. Because BigQuery requires authentication before a user can access the service, allUsers includes only authenticated users. |
You can combine a series of iam_member
values, if they are comma-separated and quoted separately. For example: "user:alice@example.com","group:admins@example.com","user:sales@example.com"
filter_expression
: Defines the subset of table rows to show only to the members of the grantee_list
. The filter_expression
is similar to the WHERE
clause in a SELECT
query.
The following are valid filter expressions:
SESSION_USER()
, to restrict access only to rows that belong to the user running the query. If none of the row-level access policies are applicable to the querying user, then the user has no access to the data in the table.TRUE
. Grants the principals in the grantee_list
field access to all rows of the table.The filter expression cannot contain the following:
Subqueries or SQL statements, such as SELECT, CREATE, or UPDATE.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.rowAccessPolicies.create | The target table. |
bigquery.rowAccessPolicies.setIamPolicy | The target table. |
bigquery.tables.getData | The target table. |
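As an illustrative sketch (the table, column, and principal are hypothetical), the following statement lets members of sales@example.com see only rows where region is "US":

CREATE ROW ACCESS POLICY us_filter
ON mydataset.sales
GRANT TO ("group:sales@example.com")
FILTER USING (region = "US");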
CREATE CAPACITY
statementPurchases slots by creating a new capacity commitment.
CREATE CAPACITY `project_id.location_id.commitment_id`
OPTIONS (capacity_commitment_option_list);
project_id: The project ID of the administration project that will maintain ownership of this commitment.
location_id: The location of the commitment.
commitment_id: The ID of the commitment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
capacity_commitment_option_list: The options you can set to describe the capacity commitment.
capacity_commitment_option_list
The option list specifies options for the capacity commitment. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | TYPE | Details |
---|---|---|
plan | String | The commitment plan to purchase. Supported values include: ANNUAL , THREE_YEAR , and TRIAL . For more information, see Commitment plans. |
renewal_plan | String | The commitment renewal plan. Applies only when plan is ANNUAL , THREE_YEAR , or TRIAL . For more information, see Renewing commitments. |
slot_count | Integer | The number of slots in the commitment. |
edition | String | The edition associated with this reservation. For more information about editions, see Introduction to BigQuery editions. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.capacityCommitments.create | The administration project that maintains ownership of the commitments. |
The following example creates a capacity commitment of 100 annual slots that are located in the region-us
region and managed by a project admin_project
:
CREATE CAPACITY `admin_project.region-us.my-commitment`
OPTIONS (
  slot_count = 100,
  plan = 'ANNUAL');
CREATE RESERVATION
statementCreates a reservation. For more information, see Introduction to Reservations.
CREATE RESERVATION `project_id.location_id.reservation_id`
OPTIONS (reservation_option_list);
project_id: The project ID of the administration project where the capacity commitment was created.
location_id: The location of the reservation.
reservation_id: The reservation ID.
reservation_option_list: The options you can set to describe the reservation.
reservation_option_list
The option list specifies options for the reservation. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | TYPE | Details |
---|---|---|
ignore_idle_slots | BOOLEAN | If the value is true , then the reservation uses only the slots that are provisioned to it. The default value is false . For more information, see Idle slots. |
slot_capacity | INTEGER | The number of slots to allocate to the reservation. If this reservation was created with an edition, this is equivalent to the amount of baseline slots. |
target_job_concurrency | INTEGER | A soft upper bound on the number of jobs that can run concurrently in this reservation. |
edition | STRING | The edition associated with this reservation. For more information about editions, see Introduction to BigQuery editions. |
autoscale_max_slots | INTEGER | The maximum number of slots that could be added to the reservation by autoscaling. |
secondary_location | STRING | The secondary location to use in the case of disaster recovery. |
labels | <ARRAY<STRUCT<STRING, STRING>>> | An array of labels for the reservation, expressed as key-value pairs. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.reservations.create | The administration project that maintains ownership of the commitments. |
The following example creates a reservation of 100 slots in the project admin_project
:
CREATE RESERVATION `admin_project.region-us.prod`
OPTIONS (slot_capacity = 100);
CREATE ASSIGNMENT
statementAssigns a project, folder, or organization to a reservation.
CREATE ASSIGNMENT `project_id.location_id.reservation_id.assignment_id`
OPTIONS (assignment_option_list)
project_id: The project ID of the administration project where the reservation was created.
location_id: The location of the reservation.
reservation_id: The reservation ID.
assignment_id: The ID of the assignment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
assignment_option_list: The options you can set to describe the assignment.
To remove a project from any reservations and use on-demand billing instead, set reservation_id to none.
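For example, assuming the administration project admin_project, the following sketch removes my_project from reservation-based billing by assigning it to none:

CREATE ASSIGNMENT `admin_project.region-us.none.my_assignment`
OPTIONS (
  assignee = 'projects/my_project',
  job_type = 'QUERY');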
assignment_option_list
The option list specifies options for the assignment. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | TYPE | Details |
---|---|---|
assignee | String | The ID of the project, folder, or organization to assign to the reservation. |
job_type | String | The type of job to assign to this reservation. Supported values include QUERY , PIPELINE , ML_EXTERNAL , CONTINUOUS , and BACKGROUND . For more information, see Assignments. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.reservationAssignments.create | The administration project and the assignee. |
The following example assigns the project my_project
to the prod
reservation for query jobs:
CREATE ASSIGNMENT `admin_project.region-us.prod.my_assignment`
OPTIONS (
  assignee = 'projects/my_project',
  job_type = 'QUERY');
The following example assigns an organization to the prod
reservation for pipeline jobs, such as load and export jobs:
CREATE ASSIGNMENT `admin_project.region-us.prod.my_assignment`
OPTIONS (
  assignee = 'organizations/1234',
  job_type = 'PIPELINE');
CREATE SEARCH INDEX
statementCreates a new search index on one or more columns of a table.
A search index enables efficient queries using the SEARCH
function.
CREATE SEARCH INDEX [IF NOT EXISTS] index_name
ON table_name({ALL COLUMNS [WITH COLUMN OPTIONS (column[, ...])] | column[, ...]})
[OPTIONS (index_option_list)]

column:
  column_name [OPTIONS (index_column_option_list)]
IF NOT EXISTS
: If there is already a search index by that name on the table, do nothing. If the table has a search index by a different name, then return an error.
index_name
: The name of the search index you're creating. Since the search index is always created in the same project and dataset as the base table, there is no need to specify these in the name.
table_name
: The name of the table. See Table path syntax.
ALL COLUMNS
: If data types are not specified, creates a search index on every column in the table which contains a STRING
field. If data types are specified, creates a search index on every column in the table that matches any of the specified data types.
WITH COLUMN OPTIONS
: Can only be used with ALL COLUMNS
to set options on specific indexed columns.
column_name
: The name of a top-level column in the table which is one of the following supported data types or contains a field with one of the supported data types:
Supported data types | Notes |
---|---|
STRING | Primitive data type. |
INT64 | Primitive data type. |
TIMESTAMP | Primitive data type. |
ARRAY<PRIMITIVE_DATA_TYPE> | Must contain a primitive data type in this list. |
STRUCT or ARRAY<STRUCT> | Must contain at least one nested field that is a primitive data type in this list or ARRAY<PRIMITIVE_DATA_TYPE> . |
JSON | Must contain at least one nested field of a type that matches any data types in this list. |
index_column_option_list
: The list of options to set on indexed columns.
index_option_list
: The list of options to set on the search index.
You can create only one search index per base table. You cannot create a search index on a view or materialized view. To modify which columns are indexed, DROP
the current index and create a new one.
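For example, a sketch of replacing an index (the index, table, and column names are hypothetical):

DROP SEARCH INDEX my_index ON dataset.my_table;

CREATE SEARCH INDEX my_index
ON dataset.my_table(a, b);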
BigQuery returns an error if any column_name
is not a STRING
or does not contain a STRING
field, or if you call CREATE SEARCH INDEX
on ALL COLUMNS
of a table which contains no STRING
fields.
Creating a search index fails on a table which has column ACLs or row filters; however, these may all be added to the table after creation of the index.
index_option_list
The option list specifies options for the search index. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
analyzer | STRING | The text analyzer to use to generate tokens for the search index. The supported values are 'LOG_ANALYZER' (default), 'NO_OP_ANALYZER', and 'PATTERN_ANALYZER'. Example: analyzer='NO_OP_ANALYZER' |
analyzer_options | JSON-formatted STRING | The text analyzer configurations to set when creating a search index. Supported when analyzer is equal to 'LOG_ANALYZER' or 'PATTERN_ANALYZER' . For examples of JSON-formatted strings with different text analyzers, see Work with text analyzers. |
data_types | ARRAY<STRING> | Example: An array of data types to set when creating a search index. Supported data types are |
default_index_column_granularity | STRING | In Preview. The default granularity of information to store for each indexed column. The supported values are 'COLUMN' and 'GLOBAL'. Example: default_index_column_granularity='COLUMN' |
index_column_option_list
NAME | VALUE | Details |
---|---|---|
index_granularity | STRING | In Preview. The granularity of information to store for the indexed column. This setting overrides the default granularity specified in the default_index_column_granularity option of the index_option_list. The supported values are 'COLUMN' and 'GLOBAL'. Example: index_granularity='GLOBAL' |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.createIndex | The base table where you create the index. |
The following example creates a search index called my_index
on all string columns of my_table
. In this case, the index is only created on column a
.
CREATE TABLE dataset.my_table(a STRING, b INT64);

CREATE SEARCH INDEX my_index
ON dataset.my_table(ALL COLUMNS);
The following example creates a search index on columns a
, my_struct.string_field
, and b
that uses the NO_OP_ANALYZER
text analyzer. It sets the default index column granularity to COLUMN
and overrides the setting for column a
to GLOBAL
.
CREATE TABLE dataset.complex_table(
  a STRING,
  my_struct STRUCT<string_field STRING, int_field INT64>,
  b ARRAY<STRING>);

CREATE SEARCH INDEX my_index
ON dataset.complex_table(
  a OPTIONS (index_granularity = 'GLOBAL'),
  my_struct,
  b)
OPTIONS (
  analyzer = 'NO_OP_ANALYZER',
  default_index_column_granularity = 'COLUMN');
CREATE VECTOR INDEX
statementCreates a new vector index on a column of a table.
A vector index lets you perform a vector search more quickly, with the trade-off of reducing recall and so returning more approximate results.
CREATE [OR REPLACE] VECTOR INDEX [IF NOT EXISTS] index_name
ON table_name(column_name)
[STORING(stored_column_name[, ...])]
OPTIONS (index_option_list);
OR REPLACE
: Replaces any vector index with the same name if it exists. Can't appear with IF NOT EXISTS
.
IF NOT EXISTS
: If there is already a vector index by that name on the table, do nothing. If the table has a vector index by a different name, then return an error.
index_name
: The name of the vector index you're creating. Since the index is always created in the same project and dataset as the base table, there is no need to specify these in the name.
table_name
: The name of the table. See Table path syntax.
column_name
: The name of a column with a type of ARRAY<FLOAT64>
. The column can't have any child fields. All elements in the array must be non-NULL
, and all values in the column must have the same array dimensions.
stored_column_name
: The name of a top-level column in the table to store in the vector index. The column type can't be RANGE
. Stored columns are not used if the table has a row-level access policy or the column has a policy tag. To learn more, see Store columns and pre-filter.
index_option_list
: The list of options to set on the vector index.
You can only create vector indexes on standard tables.
You can create only one vector index per table. You can't create a vector index on a table that already has a search index with the same index name.
To modify which column is indexed, DROP
the current index and create a new one.
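For example, a sketch of re-creating the index on a different column (the names are hypothetical):

DROP VECTOR INDEX my_index ON my_dataset.my_table;

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(other_embedding)
OPTIONS (index_type = 'IVF');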
index_option_list
The option list specifies options for the vector index. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
index_type | STRING | Required. The algorithm to use to build the vector index. The supported values are IVF and TREE_AH .
|
distance_type | STRING | Specifies the default distance type to use when performing a vector search using this index. The supported values are EUCLIDEAN , COSINE , and DOT_PRODUCT . EUCLIDEAN is the default. The index creation itself always uses If you specify a value for the |
ivf_options | JSON-formatted STRING | The options to use with the IVF algorithm. Defaults to '{}' to denote that all underlying options use their corresponding default values. The only supported option is The IVF algorithm divides the whole data space into a number of lists equal to the You can use If you don't specify a value for The statement fails if |
tree_ah_options (Preview) | JSON-formatted STRING | The options to use with the TREE_AH algorithm. Defaults to '{}' to denote that all underlying options use their corresponding default values. Two options are supported: leaf_node_embedding_count and normalization_type. For example, tree_ah_options='{"leaf_node_embedding_count": 1000, "normalization_type": "L2"}'. The statement fails if the specified tree_ah_options value isn't valid. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.createIndex | The table where you create the vector index. |
If you choose to use the OR REPLACE
clause, you must also have the bigquery.tables.updateIndex
permission.
The following example creates a vector index on the embedding
column of my_table
:
CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS(index_type='IVF');
The following example creates a vector index on the embedding
column of my_table
, and specifies the distance type to use and the IVF options:
CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS(
  index_type='IVF',
  distance_type='COSINE',
  ivf_options='{"num_lists":2500}');
The following example creates a vector index on the embedding
column of my_table
, and specifies the distance type to use and the TREE_AH options:
CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS(
  index_type='TREE_AH',
  distance_type='EUCLIDEAN',
  tree_ah_options='{"normalization_type": "L2"}');
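If you use the STORING clause, the index can also serve stored column values directly. The following is a minimal sketch, not taken from the reference above, assuming a hypothetical products table whose name and price columns are stored alongside the indexed embedding column:

-- Hypothetical table and column names, for illustration only.
CREATE TABLE my_dataset.products(
  id INT64,
  name STRING,
  price FLOAT64,
  embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX stored_column_index
ON my_dataset.products(embedding)
STORING(name, price)
OPTIONS(index_type='IVF');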
ALTER SCHEMA SET DEFAULT COLLATE statement
Sets collation specifications on a dataset.
ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
SET DEFAULT COLLATE collate_specification
IF EXISTS
: If no dataset exists with that name, the statement has no effect.
DEFAULT COLLATE collate_specification
: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a column.
The updated collation specification only applies to tables created afterwards.
project_name
: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.
dataset_name
: The name of the dataset.
collate_specification
: Specifies the collation specifications to set.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.get | The dataset to alter. |
bigquery.datasets.update | The dataset to alter. |
Assume you have an existing table, mytable_a
, in a dataset called mydataset
. For example:
CREATE SCHEMA mydataset
CREATE TABLE mydataset.mytable_a(number INT64, word STRING)
+----------------------+ | mydataset.mytable_a | | number INT64 | | word STRING | +----------------------+
At a later time, you decide to add a collation specification to your dataset. For example:
ALTER SCHEMA mydataset SET DEFAULT COLLATE 'und:ci'
If you create a new table for your dataset, it inherits COLLATE 'und:ci'
for all STRING
columns. For example, collation is added to characters
when you create the mytable_b
table in the mydataset
dataset:
CREATE TABLE mydataset.mytable_b(amount INT64, characters STRING)
+--------------------------------------+ | mydataset.mytable_b | | amount INT64 | | characters STRING COLLATE 'und:ci' | +--------------------------------------+
However, although you have updated the collation specification for the dataset, your existing table, mytable_a
, continues to use the previous collation specification. For example:
+---------------------+ | mydataset.mytable_a | | number INT64 | | word STRING | +---------------------+
ALTER SCHEMA SET OPTIONS statement
Sets options on a dataset.
The statement runs in the location of the dataset if the dataset exists, unless you specify the location in the query settings. For more information, see Specifying your location.
ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
SET OPTIONS(schema_set_options_list)
IF EXISTS
: If no dataset exists with that name, the statement has no effect.
project_name
: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.
dataset_name
: The name of the dataset.
schema_set_options_list
: The list of options to set.
schema_set_options_list
NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
default_kms_key_name | STRING | Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table. |
default_partition_expiration_days | FLOAT64 | Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table. |
default_rounding_mode | STRING | Specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in tables created in this dataset. You can override this value when you create a table. |
default_table_expiration_days | FLOAT64 | Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table. |
description | STRING | The description of the dataset. |
failover_reservation | STRING | Associates the dataset to a reservation in the case of a failover scenario. |
friendly_name | STRING | A descriptive name for the dataset. |
is_case_insensitive | BOOL | TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive. |
is_primary | BOOLEAN | Declares if the dataset is the primary replica. |
labels | <ARRAY<STRUCT<STRING, STRING>>> | An array of labels for the dataset, expressed as key-value pairs. |
max_time_travel_hours | SMALLINT | Specifies the duration in hours of the time travel window for the dataset. The max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified. |
primary_replica | STRING | The replica name to set as the primary replica. |
storage_billing_model | STRING | Alters the storage billing model for the dataset. Set the storage_billing_model value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default. When you change a dataset's billing model, it takes 24 hours for the change to take effect. Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.get | The dataset to alter. |
bigquery.datasets.update | The dataset to alter. |
The following example sets the default table expiration.
ALTER SCHEMA mydataset
SET OPTIONS(default_table_expiration_days=3.75)
The following example turns on case insensitivity for the name of a dataset and the table names within that dataset.
ALTER SCHEMA mydataset
SET OPTIONS(is_case_insensitive=TRUE)
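As a further sketch of the options above, the following statement shrinks the dataset's time travel window to the 48-hour minimum (the value must be a multiple of 24 between 48 and 168):

ALTER SCHEMA mydataset
SET OPTIONS(max_time_travel_hours=48)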
ALTER SCHEMA ADD REPLICA statement
Adds a replica to a schema (preview).
ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
ADD REPLICA replica_name
[OPTIONS(add_replica_options_list)]
IF EXISTS: If no dataset exists with that name, the statement has no effect.
dataset_name: The name of the dataset to alter.
replica_name: The name of the new replica. Conventionally, this is the same as the location you are creating the replica in.
add_replica_options_list: The list of options to set.
add_replica_options_list
The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
location | STRING | The location in which to create the replica. |
replica_kms_key | STRING | The Cloud Key Management Service key set in the destination region. replica_kms_key is used as a substitute encryption key in the destination region for any keys used in the source region. Any table in the source region that's encrypted with a Cloud KMS key is encrypted with the replica_kms_key . This value must be a Cloud KMS key created in the replica dataset's region, not the source dataset's region. For more information about setting up a Cloud KMS key, see Grant encryption and decryption permission. |
To get the permissions that you need to manage replicas, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor
) IAM role on your schema. For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
The following example adds a secondary replica that is named EU
in the EU
multi-region to a schema that is named cross_region_dataset
:
ALTER SCHEMA cross_region_dataset
ADD REPLICA `EU`
OPTIONS(location=`eu`);
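A sketch combining both options from the options table (the dataset, region, and key names are hypothetical):

-- Hypothetical names; the replica key must live in the replica's region.
ALTER SCHEMA cross_region_dataset
ADD REPLICA `us-east4`
OPTIONS(
  location='us-east4',
  replica_kms_key='projects/my-project/locations/us-east4/keyRings/my-keyring/cryptoKeys/my-key');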
ALTER SCHEMA DROP REPLICA statement
Drops a replica from a schema (preview).
ALTER SCHEMA [IF EXISTS] dataset_name
DROP REPLICA replica_name
IF EXISTS: If no dataset exists with that name, the statement has no effect.
dataset_name: The name of the dataset to alter.
replica_name: The name of the replica to drop.
To get the permissions that you need to manage replicas, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role on your schema. For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
The following example removes a replica that is located in the us-east4
region from the cross_region_dataset
dataset:
ALTER SCHEMA IF EXISTS cross_region_dataset
DROP REPLICA `us-east4`
ALTER TABLE SET OPTIONS statement
Sets the options on a table.
ALTER TABLE [IF EXISTS] table_name
SET OPTIONS(table_set_options_list)
IF EXISTS
: If no table exists with that name, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
table_set_options_list
: The list of options to set.
This statement is not supported for external tables.
table_set_options_list
The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
expiration_timestamp | TIMESTAMP | This property is equivalent to the expirationTime table resource property. |
partition_expiration_days | FLOAT64 | Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned. |
require_partition_filter | BOOL | Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false. This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned. |
kms_key_name | STRING | This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys. |
friendly_name | STRING | This property is equivalent to the friendlyName table resource property. |
description | STRING | This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | This property is equivalent to the labels table resource property. |
default_rounding_mode | STRING | Specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. This property is equivalent to the defaultRoundingMode table resource property. |
enable_change_history | BOOL | In preview. Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function. The default is FALSE. |
max_staleness | INTERVAL | The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value. max_staleness is disabled by default. |
enable_fine_grained_mutations | BOOL | In preview. Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE. |
storage_uri | STRING | In preview. A fully qualified location prefix for the external folder where data is stored. Supports gs: buckets. Required for managed tables. |
file_format | STRING | In preview. The open-source file format in which the table data is stored. Only PARQUET is supported. Required for managed tables. The default is PARQUET. |
table_format | STRING | In preview. The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported. Required for managed tables. The default is ICEBERG. |
VALUE
is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
SELECT
, CREATE
, or UPDATE
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the value replaces the existing value of that option for the table, if there was one. Setting the value to NULL
clears the table's value for that option.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example sets the expiration timestamp on a table to seven days from the execution time of the ALTER TABLE
statement, and sets the description as well:
ALTER TABLE mydataset.mytable
SET OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description="Table that expires seven days from now")
The following example sets the timePartitioning.requirePartitionFilter
attribute on a partitioned table:
ALTER TABLE mydataset.mypartitionedtable
SET OPTIONS(require_partition_filter=true)
Queries that reference this table must use a filter on the partitioning column, or else BigQuery returns an error. Setting this option to true
can help prevent mistakes in querying more data than intended.
The following example clears the expiration timestamp on a table so that it will not expire:
ALTER TABLE mydataset.mytable
SET OPTIONS(expiration_timestamp=NULL)
ALTER TABLE ADD COLUMN statement
Adds one or more new columns to an existing table schema.
ALTER TABLE table_name
ADD COLUMN [IF NOT EXISTS] column [, ...]
table_name
: The name of the table. See Table path syntax.
IF NOT EXISTS
: If the column name already exists, the statement has no effect.
column
: The column to add. This includes the name of the column and schema to add. The column name and schema use the same syntax used in the CREATE TABLE
statement.
You cannot use this statement to create nested columns inside existing RECORD fields. You cannot add a REQUIRED column to an existing table schema. However, you can create a nested REQUIRED column as part of a new RECORD field.
This statement is not supported for external tables.
Without the IF NOT EXISTS
clause, if the table already contains a column with that name, the statement returns an error. If the IF NOT EXISTS
clause is included and the column name already exists, no error is returned, and no action is taken.
The value of the new column for existing rows is set to one of the following:
NULL if the new column was added with NULLABLE mode. This is the default mode.
An empty ARRAY if the new column was added with REPEATED mode.
For more information about schema modifications in BigQuery, see Modifying table schemas.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example adds the following columns to an existing table named mytable
:
A of type STRING.
B of type GEOGRAPHY.
C of type NUMERIC with REPEATED mode.
D of type DATE with a description.

ALTER TABLE mydataset.mytable
ADD COLUMN A STRING,
ADD COLUMN IF NOT EXISTS B GEOGRAPHY,
ADD COLUMN C ARRAY<NUMERIC>,
ADD COLUMN D DATE OPTIONS(description="my description")
If any of the columns named A
, C
, or D
already exist, the statement fails. If column B
already exists, the statement succeeds because of the IF NOT EXISTS
clause.
RECORD column
The following example adds a column named A of type STRUCT that contains the following nested columns:
B of type GEOGRAPHY.
C of type INT64 with REPEATED mode.
D of type INT64 with REQUIRED mode.
E of type TIMESTAMP with a description.

ALTER TABLE mydataset.mytable
ADD COLUMN A STRUCT<
  B GEOGRAPHY,
  C ARRAY<INT64>,
  D INT64 NOT NULL,
  E TIMESTAMP OPTIONS(description="creation time")
>
The query fails if the table already has a column named A
, even if that column does not contain any of the nested columns that are specified.
The new STRUCT
named A
is nullable, but the nested column D
within A
is required for any STRUCT
values of A
.
When you create a new column for your table, you can specifically assign a new collation specification to that column.
ALTER TABLE mydataset.mytable
ADD COLUMN word STRING COLLATE 'und:ci'
ALTER TABLE ADD FOREIGN KEY statement
Adds a foreign key constraint to an existing table. You can add multiple foreign key constraints by using additional ADD FOREIGN KEY statements.
ALTER TABLE [[project_name.]dataset_name.]fk_table_name
ADD [CONSTRAINT [IF NOT EXISTS] constraint_name] FOREIGN KEY(fk_column_name[, ...])
REFERENCES pk_table_name(pk_column_name[, ...]) NOT ENFORCED
[ADD ...];
project_name: The name of the project containing the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
dataset_name: The name of the dataset that contains the table with a primary key.
fk_table_name: The name of the existing table to add a foreign key to.
IF NOT EXISTS: If a constraint of the same name already exists in the defined table, the statement has no effect.
constraint_name: The name of the constraint to add.
fk_column_name: In the foreign key table, the name of the foreign key column. Only top-level columns can be used as foreign key columns.
pk_table_name: The name of the table that contains the primary key.
pk_column_name: In the primary key table, the name of the primary key column. Only top-level columns can be used as primary key columns.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example adds the my_fk_name
foreign key constraint to the fk_table
table. This example depends on an existing table, pk_table
.
Add a primary key to the pk_table
table:
ALTER TABLE pk_table ADD PRIMARY KEY(x, y) NOT ENFORCED;
Create a table named fk_table
for the foreign key.
CREATE TABLE fk_table(x INT64, y INT64, i INT64, j INT64, u INT64, v INT64);
Add the my_fk_name
foreign key constraint to the fk_table
.
ALTER TABLE fk_table
ADD CONSTRAINT my_fk_name FOREIGN KEY(u, v)
REFERENCES pk_table(x, y) NOT ENFORCED
The following example adds the fk
and fk2
foreign key constraints to the fk_table
table in a single statement. This example depends on an existing table, pk_table
.
Add a primary key to the pk_table
table:
ALTER TABLE pk_table ADD PRIMARY KEY(x, y) NOT ENFORCED;
Create a table named fk_table
for multiple foreign key constraints.
CREATE TABLE fk_table(x INT64, y INT64, i INT64, j INT64, u INT64, v INT64);
Add the fk
and fk2
constraints to fk_table
in one statement.
ALTER TABLE fk_table
ADD PRIMARY KEY(x, y) NOT ENFORCED,
ADD CONSTRAINT fk FOREIGN KEY(u, v) REFERENCES pk_table(x, y) NOT ENFORCED,
ADD CONSTRAINT fk2 FOREIGN KEY(i, j) REFERENCES pk_table(x, y) NOT ENFORCED;
ALTER TABLE ADD PRIMARY KEY statement
Adds a primary key to an existing table.
ALTER TABLE [[project_name.]dataset_name.]table_name
ADD PRIMARY KEY(column_list) NOT ENFORCED;
project_name: The name of the project containing the table. Defaults to the project that runs this DDL statement if undefined.
dataset_name: The name of the dataset that contains the table.
table_name: The name of the existing table to add the primary key to.
column_list: The list of columns to be added as primary keys.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example adds the primary key constraint of x
and y
to the pk_table
table.
ALTER TABLE pk_table ADD PRIMARY KEY(x, y) NOT ENFORCED;
ALTER TABLE RENAME TO statement
Renames a clone, snapshot, or table.
ALTER TABLE [IF EXISTS] table_name
RENAME TO new_table_name
IF EXISTS
: If no table exists with that name, the statement has no effect.
table_name
: The name of the table to rename. See Table path syntax.
new_table_name
: The new name of the table. The new name cannot be an existing table name.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example renames the table mydataset.mytable
to mydataset.mynewtable
:
ALTER TABLE mydataset.mytable RENAME TO mynewtable
ALTER TABLE RENAME COLUMN statement
Renames one or more columns in an existing table schema.
ALTER TABLE [IF EXISTS] table_name
RENAME COLUMN [IF EXISTS] column_to_column [, ...]

column_to_column := column_name TO new_column_name
(ALTER TABLE) IF EXISTS
: If the specified table does not exist, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top-level column you're altering.
new_column_name
: The new name of the column. The new name cannot be an existing column name.
This statement is not supported for external tables.
If the table to be modified has active row-level access policies, the statement returns an error.
Without the IF EXISTS
clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS
clause is included and the column name does not exist, then no error is returned, and no action is taken.
This statement only renames the column from the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.
You cannot use this statement to rename subfields, such as nested columns in a STRUCT.
After one or more columns in a table are renamed, certain operations on the table are restricted until you rename the columns back to their original names, which removes these restrictions.
Multiple RENAME COLUMN statements in one ALTER TABLE statement are supported. The sequence of renames is interpreted and validated in order. Each column_name must refer to a column name that exists after all preceding renames have been applied. RENAME COLUMN cannot be used with other ALTER TABLE actions in one statement.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example renames columns from an existing table named mytable
:
A
-> columnA
B
-> columnB
ALTER TABLE mydataset.mytable
RENAME COLUMN A TO columnA,
RENAME COLUMN IF EXISTS B TO columnB
If column A
does not exist, then the statement fails. If column B
does not exist, then the statement still succeeds because of the IF EXISTS
clause.
The following example swaps the names of columnA
and columnB
:
ALTER TABLE mydataset.mytable
RENAME COLUMN columnA TO temp_col,
RENAME COLUMN columnB TO columnA,
RENAME COLUMN temp_col TO columnB
ALTER TABLE DROP COLUMN statement
Drops one or more columns from an existing table schema.
ALTER TABLE table_name
DROP COLUMN [IF EXISTS] column_name [, ...]
table_name
: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.
IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the column to drop.
Dropping a column is a metadata-only operation and does not immediately free up the storage that is associated with the dropped column. The storage is freed up the next time the table is written to, typically when you perform a DML operation on it or when a background optimization job happens. Since DROP COLUMN is not a data cleanup operation, there is no guaranteed time window within which the data will be deleted.
To immediately reclaim storage, you can recreate the table with a SELECT * EXCEPT query, as shown in the sketch below.
You can restore a dropped column in a table using time travel.
You cannot use this statement to drop nested columns inside existing RECORD fields.
This statement is not supported for external tables.
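A minimal sketch of that reclaim approach, assuming a hypothetical column A in mydataset.mytable; rewriting the table excludes the column's data and immediately frees its storage:

-- Hypothetical names; recreates the table without column A.
CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT * EXCEPT (A)
FROM mydataset.mytable;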
Without the IF EXISTS
clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS
clause is included and the column name does not exist, then no error is returned, and no action is taken.
This statement only removes the column from the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.
For more information about schema modifications in BigQuery, see Modifying table schemas.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example drops the following columns from an existing table named mytable
:
A
B
ALTER TABLE mydataset.mytable
DROP COLUMN A,
DROP COLUMN IF EXISTS B
If the column named A
does not exist, then the statement fails. If column B
does not exist, then the statement still succeeds because of the IF EXISTS
clause.
After one or more columns in a table are dropped, you cannot copy the table by using the bq cp command. Recreating the table using CREATE TABLE ... AS SELECT ... removes this restriction.
ALTER TABLE DROP CONSTRAINT statement
Drops a constraint from an existing table. You can use this statement to drop foreign key constraints from a table.
ALTER TABLE [[project_name.]dataset_name.]table_name
DROP CONSTRAINT [IF EXISTS] constraint_name;
project_name: The name of the project containing the table. Defaults to the project that runs this DDL statement if undefined.
dataset_name: The name of the dataset that contains the table.
table_name: The name of the existing table that contains the constraint.
IF EXISTS: If no constraint with that name exists in the defined table, the statement has no effect.
constraint_name: The name of the constraint to drop.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example drops the constraint myConstraint from the existing table mytable.
ALTER TABLE mytable DROP CONSTRAINT myConstraint;
ALTER TABLE DROP PRIMARY KEY statement
Drops a primary key from an existing table.
ALTER TABLE [[project_name.]dataset_name.]table_name
DROP PRIMARY KEY [IF EXISTS];
project_name: The name of the project containing the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
dataset_name: The name of the dataset that contains the table with a primary key.
table_name: The name of the existing table with a primary key.
IF EXISTS: If no primary key exists in the defined table, the statement has no effect.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example drops all primary keys from the existing table myTable
.
ALTER TABLE myTable DROP PRIMARY KEY;
ALTER TABLE SET DEFAULT COLLATE statement
Sets collation specifications on a table.
ALTER TABLE table_name
SET DEFAULT COLLATE collate_specification
table_name
: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.
SET DEFAULT COLLATE collate_specification
: When a new column is created in the schema, and if the column does not have an explicit collation specification, the column inherits this collation specification for STRING
types. The updated collation specification only applies to columns added afterwards.
If you want to add a collation specification on a new column in an existing table, you can do this when you add the column. If you add a collation specification directly on a column, the collation specification for the column has precedence over a table's default collation specification. You cannot update an existing collation specification on a column.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
Assume you have an existing table, mytable
, in a schema called mydataset
.
CREATE TABLE mydataset.mytable(number INT64, word STRING)
DEFAULT COLLATE 'und:ci'
When you create mytable
, all STRING
columns inherit COLLATE 'und:ci'
. The resulting table has this structure:
+--------------------------------+ | mydataset.mytable | | number INT64 | | word STRING COLLATE 'und:ci' | +--------------------------------+
At a later time, you decide to change the collation specification for your table.
ALTER TABLE mydataset.mytable
SET DEFAULT COLLATE ''
Although you have updated the collation specification, your existing column, word
, continues to use the previous collation specification.
+--------------------------------+ | mydataset.mytable | | number INT64 | | word STRING COLLATE 'und:ci' | +--------------------------------+
However, if you create a new column for your table, the new column includes the new collation specification. In the following example a column called name
is added. Because the new collation specification is empty, the default collation specification is used.
ALTER TABLE mydataset.mytable ADD COLUMN name STRING
+--------------------------------+ | mydataset.mytable | | number INT64 | | word STRING COLLATE 'und:ci' | | name STRING COLLATE | +--------------------------------+
ALTER COLUMN SET OPTIONS statement
Sets options, such as the column description, on a column in a table or view in BigQuery.
ALTER { TABLE | VIEW } [IF EXISTS] name
ALTER COLUMN [IF EXISTS] column_name
SET OPTIONS({column_set_options_list | view_column_set_options_list})
(ALTER { TABLE | VIEW }) IF EXISTS
: If no table or view exists with that name, then the statement has no effect.
name
: The name of the table or view to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top-level column you're altering. Modifying subfields, such as nested columns in a STRUCT
, is not supported.
column_set_options_list
: The list of options to set on the column of the table. This option must be used with TABLE
.
view_column_set_options_list
: The list of options to set on the column of the view. This option must be used with VIEW
.
This statement is not supported for external tables.
column_set_options_list
Specify a column option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
description | STRING | This property is equivalent to the schema.fields[].description table resource property. |
rounding_mode | STRING | Specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. This property is equivalent to the schema.fields[].roundingMode table resource property. |
VALUE
is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
SELECT
, CREATE
, or UPDATE
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the VALUE
replaces the existing value of that option for the column, if there was one. Setting the VALUE
to NULL
clears the column's value for that option.
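For example, the following sketch (hypothetical table and column names) clears a previously set column description by assigning NULL:

ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET OPTIONS(description=NULL);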
view_column_set_options_list
The view_column_option_list
lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME
and VALUE
fields:
NAME | VALUE | Details |
---|---|---|
description | STRING | The description of the view column. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example sets a new description on a table column called price
:
ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET OPTIONS(description='Price per unit');
The following example sets a new description on a view column called total
:
ALTER VIEW mydataset.myview
ALTER COLUMN total
SET OPTIONS(description='Total sales of the product');
ALTER COLUMN DROP NOT NULL statement
Removes a NOT NULL constraint from a column in a table in BigQuery.
ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
DROP NOT NULL
(ALTER TABLE) IF EXISTS
: If no table exists with that name, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top level column you're altering. Modifying subfields is not supported.
If a column does not have a NOT NULL constraint, the query returns an error.
This statement is not supported for external tables.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example removes the NOT NULL
constraint from a column called mycolumn
:
ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
DROP NOT NULL
ALTER COLUMN SET DATA TYPE statement
Changes the data type of a column in a table in BigQuery to a less restrictive data type. For example, a NUMERIC data type can be changed to a BIGNUMERIC type, but not the reverse.
ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
SET DATA TYPE column_schema
(ALTER TABLE) IF EXISTS
: If no table exists with that name, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top level column you're altering. Modifying subfields is not supported.
column_schema
: The schema that you're converting the column to. This schema uses the same syntax used in the CREATE TABLE
statement.
The following data type conversions are supported:
INT64
to NUMERIC
, BIGNUMERIC
, FLOAT64
NUMERIC
to BIGNUMERIC
, FLOAT64
You can also convert data types from more restrictive to less restrictive parameterized data types. For example, you can increase the maximum length of a string type or increase the precision or scale of a numeric type.
The following are examples of valid parameterized data type conversions:
NUMERIC(10, 6)
to NUMERIC(12, 8)
NUMERIC
to BIGNUMERIC(40, 20)
STRING(5)
to STRING(7)
This statement is not supported for external tables.
Without the IF EXISTS
clause, if the table does not contain a column with that name, the statement returns an error. If the IF EXISTS
clause is included and the column name does not exist, no error is returned, and no action is taken.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example changes the data type of column c1
from an INT64
to NUMERIC
:
CREATE TABLE dataset.my_table(c1 INT64);

ALTER TABLE dataset.my_table
ALTER COLUMN c1
SET DATA TYPE NUMERIC;
The following example changes the data type of one of the fields in the s1
column:
CREATE TABLE dataset.my_table(s1 STRUCT<a INT64, b STRING>);

ALTER TABLE dataset.my_table
ALTER COLUMN s1
SET DATA TYPE STRUCT<a NUMERIC, b STRING>;
The following example changes the precision of a parameterized data type column:
CREATE TABLE dataset.my_table(pt NUMERIC(7, 2));

ALTER TABLE dataset.my_table
ALTER COLUMN pt
SET DATA TYPE NUMERIC(8, 2);
ALTER COLUMN SET DEFAULT statement
Sets the default value of a column.
ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
SET DEFAULT default_expression;
(ALTER TABLE) IF EXISTS
: If the specified table does not exist, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top-level column to add a default value to.
default_expression
: The default value assigned to the column. The expression must be a literal or one of the following functions: CURRENT_DATE, CURRENT_DATETIME, CURRENT_TIME, CURRENT_TIMESTAMP, GENERATE_UUID, RAND, SESSION_USER, ST_GEOGPOINT.
Setting the default value for a column only affects future inserts to the table. It does not change any existing table data.
The type of the default value must match the type of the column. A STRUCT
type can only have a default value set for the entire STRUCT
field. You cannot set the default value for a subset of the fields. You cannot set the default value of an array to NULL
or set an element within an array to NULL
.
If the default value is a function, it is evaluated at the time that the value is written to the table, not the time the table is created.
You can't set default values on columns that are primary keys.
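As a sketch of the write-time evaluation described above (hypothetical table and column names), each insert that omits the column records the insert time, not the time the default was defined:

ALTER TABLE mydataset.mytable
ALTER COLUMN created_at
SET DEFAULT CURRENT_TIMESTAMP();

-- The default is computed now, at write time:
INSERT INTO mydataset.mytable(id) VALUES (1);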
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example sets the default value of the column mycolumn
to the current time:
ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
SET DEFAULT CURRENT_TIME();
ALTER COLUMN DROP DEFAULT statement
Removes the default value assigned to a column. This is the same as setting the default value to NULL.
ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
DROP DEFAULT;
(ALTER TABLE) IF EXISTS
: If the specified table does not exist, the statement has no effect.
table_name
: The name of the table to alter. See Table path syntax.
(ALTER COLUMN) IF EXISTS
: If the specified column does not exist, the statement has no effect.
column_name
: The name of the top-level column to remove the default value from. If you drop the default value from a column that does not have a default set, an error is returned.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The table to alter. |
bigquery.tables.update | The table to alter. |
The following example removes the default value from the column mycolumn
:
ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
DROP DEFAULT;
ALTER VIEW SET OPTIONS statement
Sets the options on a view.
ALTER VIEW [IF EXISTS] view_name
SET OPTIONS(view_set_options_list)
IF EXISTS
: If no view exists with that name, the statement has no effect.
view_name
: The name of the view to alter. See Table path syntax.
view_set_options_list
: The list of options to set.
view_set_options_list
The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
expiration_timestamp | TIMESTAMP | This property is equivalent to the expirationTime table resource property. |
friendly_name | STRING | This property is equivalent to the friendlyName table resource property. |
description | STRING | This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | This property is equivalent to the labels table resource property. |
privacy_policy | JSON-formatted STRING | The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy option of the CREATE VIEW statement. |
VALUE
is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
SELECT
, CREATE
, or UPDATE
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the value replaces the existing value of that option for the view, if there was one. Setting the value to NULL
clears the view's value for that option.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The view to alter. |
bigquery.tables.update | The view to alter. |
The following example sets the expiration timestamp on a view to seven days from the execution time of the ALTER VIEW
statement, and sets the description as well:
ALTER VIEW mydataset.myview
SET OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description="View that expires seven days from now")
ALTER MATERIALIZED VIEW SET OPTIONS statement
Sets the options on a materialized view.
ALTER MATERIALIZED VIEW [IF EXISTS] materialized_view_name
SET OPTIONS(materialized_view_set_options_list)
IF EXISTS
: If no materialized view exists with that name, the statement has no effect.
materialized_view_name
: The name of the materialized view to alter. See Table path syntax.
materialized_view_set_options_list
: The list of options to set.
materialized_view_set_options_list
The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.
Specify a materialized view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details |
---|---|---|
enable_refresh | BOOLEAN | Enables or disables automatic refresh of the materialized view. |
refresh_interval_minutes | FLOAT64 | The automatic refresh interval, in minutes. |
expiration_timestamp | TIMESTAMP | This property is equivalent to the expirationTime table resource property. |
max_staleness | INTERVAL | The max_staleness option provides consistently high performance with controlled costs when processing large, frequently changing datasets. |
allow_non_incremental_definition | BOOLEAN | The allow_non_incremental_definition option supports an expanded range of SQL queries to create materialized views. |
kms_key_name | STRING | This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys. |
friendly_name | STRING | This property is equivalent to the friendlyName table resource property. |
description | STRING | This property is equivalent to the description table resource property. |
labels | ARRAY<STRUCT<STRING, STRING>> | This property is equivalent to the labels table resource property. |
Setting the value replaces the existing value of that option for the materialized view, if there was one. Setting the value to NULL
clears the materialized view's value for that option.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.get | The materialized view to alter. |
bigquery.tables.update | The materialized view to alter. |
The following example enables refresh and sets the refresh interval to 20 minutes on a materialized view:
ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS(enable_refresh=true, refresh_interval_minutes=20)
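A further sketch that sets a four-hour staleness bound on the same materialized view, using the INTERVAL syntax from the options table:

ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS(max_staleness=INTERVAL "4:0:0" HOUR TO SECOND)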
ALTER ORGANIZATION SET OPTIONS statement
Sets the options on an organization.
ALTER ORGANIZATION
SET OPTIONS(organization_set_options_list);
organization_set_options_list: The list of options to set.
organization_set_options_list
The option list specifies options for the organization. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
default_kms_key_name | STRING | The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys. Example: This property is equivalent to the |
default_time_zone | STRING | The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones. Example: |
default_query_job_timeout_ms | INT64 | The default time after which a query job times out. The timeout period must be between 10 minutes and 6 hours. Example: |
default_interactive_query_queue_timeout_ms | INT64 | The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1. Example: |
default_batch_query_queue_timeout_ms | INT64 | The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1. Example: |
default_query_optimizer_options | STRING | The history-based query optimizations. This option can be one of the following:
Example: |
default_sql_dialect_option | STRING | The default sql query dialect for executing query jobs using the bq command-line tool or BigQuery API. Changing this setting doesn't affect the default dialect in the console. This option can be one of the following:
Example: |
Setting the value replaces the existing value of that option for the organization, if there is one. Setting the value to NULL
clears the organization's value for that option.
The ALTER ORGANIZATION SET OPTIONS
statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.config.update | The organization to alter. |
The following example sets the default time zone to America/Chicago and the default query job timeout to one hour for an organization in the US region:
ALTER ORGANIZATION
SET OPTIONS(
  `region-us.default_time_zone`="America/Chicago",
  `region-us.default_query_job_timeout_ms`=3600000);
The following example sets the default time zone, default query job timeout, default interactive and batch queue timeouts, default Cloud KMS key, and default SQL dialect to NULL, clearing the organization-level default settings:
ALTER ORGANIZATION
SET OPTIONS(
  `region-us.default_time_zone`=NULL,
  `region-us.default_kms_key_name`=NULL,
  `region-us.default_query_job_timeout_ms`=NULL,
  `region-us.default_interactive_query_queue_timeout_ms`=NULL,
  `region-us.default_batch_query_queue_timeout_ms`=NULL,
  `region-us.default_sql_dialect_option`=NULL);
ALTER PROJECT SET OPTIONS statement
Sets the options on a project.
ALTER PROJECT project_id
SET OPTIONS(project_set_options_list);
project_id: The name of the project you're altering. This argument is optional, and defaults to the project that runs this DDL query.
project_set_options_list: The list of options to set.
project_set_options_list
The option list specifies options for the project. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
default_kms_key_name | STRING | The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys. Example: This property is equivalent to the |
default_time_zone | STRING | The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones. Example: |
default_query_job_timeout_ms | INT64 | The default time after which a query job times out. The timeout period must be between 10 minutes and 6 hours. Example: |
default_interactive_query_queue_timeout_ms | INT64 | The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1. Example: |
default_batch_query_queue_timeout_ms | INT64 | The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1. Example: |
default_query_optimizer_options | STRING | The history-based query optimizations. This option can be one of the following:
Example: |
default_cloud_resource_connection_id | STRING | The default connection to use when creating tables and models. Only specify the connection's ID, and exclude the attached project ID and region prefixes. Using default connections can cause the permissions granted to the connection's service account to be updated, depending on the type of table or model you create. For more information, see the Default connection overview. Example: |
default_sql_dialect_option | STRING | The default sql query dialect for executing query jobs using the bq command-line tool or BigQuery API. Changing this setting doesn't affect the default dialect in the console. This option can be one of the following:
Example: |
Setting the value replaces the existing value of that option for the project, if there was one. Setting the value to NULL
clears the project's value for that option.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.config.update | The project to alter. |
The following example sets the default time zone to America/New_York
and the default query job timeout to 30 minutes for a project in the us
region.
ALTER PROJECT project_id
SET OPTIONS(
  `region-us.default_time_zone`="America/New_York",
  `region-us.default_query_job_timeout_ms`=1800000);
The following example sets the default time zone, default query job timeout, default Cloud KMS key, default interactive and batch queue timeouts, and default SQL dialect to NULL, clearing the project-level default settings:
ALTER PROJECT project_id
SET OPTIONS(
  `region-us.default_time_zone`=NULL,
  `region-us.default_kms_key_name`=NULL,
  `region-us.default_query_job_timeout_ms`=NULL,
  `region-us.default_interactive_query_queue_timeout_ms`=NULL,
  `region-us.default_batch_query_queue_timeout_ms`=NULL,
  `region-us.default_sql_dialect_option`=NULL);
ALTER BI_CAPACITY SET OPTIONS statement
Sets the options on BigQuery BI Engine capacity.
ALTER BI_CAPACITY `project_id.location_id.default`
SET OPTIONS(bi_capacity_options_list)
project_id
: Optional project ID of the project that will benefit from BI Engine acceleration. If omitted, the query project ID is used.
location_id
: The location where data needs to be cached, prefixed with region-
. Examples: region-us
, region-us-central1
.
bi_capacity_options_list
: The list of options to set.
bi_capacity_options_list
The option list specifies a set of options for BigQuery BI Engine capacity.
Specify the options in the following format:
NAME=VALUE, ...
The following options are supported:
NAME | VALUE | Details |
---|---|---|
size_gb | INT64 | Specifies the size of the reservation in gigabytes. |
preferred_tables | <ARRAY<STRING>> | List of tables that acceleration should be applied to. Format: project.dataset.table or dataset.table . If project is omitted, query project is used. |
Setting VALUE
replaces the existing value of that option for the BI Engine capacity, if there is one. Setting VALUE
to NULL
clears the value for that option.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.bireservations.update | BI Engine reservation |
The following example sets BI Engine capacity to 250 GB:
ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS(size_gb=250)

The following example removes the capacity by setting its size to zero:
ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS(size_gb=0)

The following example clears the list of preferred tables:
ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS(preferred_tables=NULL)

The following example sets the size and restricts acceleration to a list of preferred tables:
ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS(
  size_gb=250,
  preferred_tables=["data_project1.dataset1.table1", "data_project2.dataset2.table2"])

The following example omits the project ID, so the project that runs the query is used:
ALTER BI_CAPACITY `region-us.default`
SET OPTIONS(preferred_tables=["dataset1.table1", "data_project2.dataset2.table2"])
ALTER CAPACITY SET OPTIONS statement
Alters an existing capacity commitment.
ALTER CAPACITY `project_id.location_id.commitment_id`
SET OPTIONS(alter_capacity_commitment_option_list);
project_id: The project ID of the administration project that maintains ownership of this commitment.
location_id: The location of the commitment.
commitment_id: The ID of the commitment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
alter_capacity_commitment_option_list: The options you can set to alter the capacity commitment.
alter_capacity_commitment_option_list
The option list specifies options for the capacity commitment. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | TYPE | Details |
---|---|---|
plan | String | The commitment plan to purchase. Supported values include: ANNUAL , THREE_YEAR , and TRIAL . For more information, see Commitment plans. |
renewal_plan | String | The plan this capacity commitment is converted to after commitment_end_time passes. Once the plan is changed, the committed period is extended according to the commitment plan. Applicable for ANNUAL, THREE_YEAR, and TRIAL commitments. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.capacityCommitments.update | The administration project that maintains ownership of the commitments. |
The following example changes a capacity commitment to a three-year plan that is located in the region-us
region and managed by a project admin_project
:
ALTER CAPACITY `admin_project.region-us.my-commitment`
SET OPTIONS(plan='THREE_YEAR');
ALTER RESERVATION SET OPTIONS statement
Alters an existing reservation.
ALTER RESERVATION `project_id.location_id.reservation_id`
SET OPTIONS(alter_reservation_option_list);
project_id: The project ID of the administration project that maintains ownership of this reservation.
location_id: The location of the reservation.
reservation_id: The ID of the reservation. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
alter_reservation_option_list: The options you can set to alter the reservation.
alter_reservation_option_list
The option list specifies options for the reservation. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME | TYPE | Details |
---|---|---|
ignore_idle_slots | BOOLEAN | If the value is true , then the reservation uses only the slots that are provisioned to it. The default value is false . For more information, see Idle slots. |
slot_capacity | INTEGER | The number of slots to allocate to the reservation. If this reservation was created with an edition, this is equivalent to the amount of baseline slots. |
target_job_concurrency | INTEGER | A soft upper bound on the number of jobs that can run concurrently in this reservation. |
autoscale_max_slots | INTEGER | The maximum number of slots that can be added to the reservation by autoscaling. |
secondary_location | STRING | The secondary location to use in the case of disaster recovery. |
is_primary | BOOLEAN | If the value is true , the reservation is set to be the primary reservation. |
labels | <ARRAY<STRUCT<STRING, STRING>>> | An array of labels for the reservation, expressed as key-value pairs. |
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.reservations.update | The administration project that maintains ownership of the commitments. |
The following example changes an autoscaling reservation to 300 baseline slots and 400 autoscaling slots for a max reservation size of 700. These slots are located in the region-us
region and managed by a project admin_project
:
ALTER RESERVATION `admin_project.region-us.my-reservation`
SET OPTIONS(slot_capacity=300, autoscale_max_slots=400);
DROP SCHEMA statement
Deletes a dataset.
DROP [EXTERNAL] SCHEMA [IF EXISTS] [project_name.]dataset_name [CASCADE | RESTRICT]
EXTERNAL
: Specifies if that dataset is a federated dataset. The DROP EXTERNAL
statement only removes the external definition from BigQuery. The data stored in the external location is not affected.
IF EXISTS
: If no dataset exists with that name, the statement has no effect.
project_name
: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.
dataset_name
: The name of the dataset to delete.
CASCADE
: Deletes the dataset and all resources within the dataset, such as tables, views, and functions. You must have permission to delete the resources, or else the statement returns an error. For a list of BigQuery permissions, see Predefined roles and permissions.
RESTRICT
: Deletes the dataset only if it's empty. Otherwise, returns an error. If you don't specify either CASCADE
or RESTRICT
, then the default behavior is RESTRICT
.
The statement runs in the location of the dataset if it exists, unless you specify the location in the query settings. For more information, see Specifying your location.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.delete | The dataset to delete. |
bigquery.tables.delete | The dataset to delete. If the dataset is empty, then this permission is not required. |
The following example deletes the dataset named mydataset. If the dataset does not exist or is not empty, then the statement returns an error.
DROP SCHEMA mydataset
The following example drops the dataset named mydataset and any resources in that dataset. If the dataset does not exist, then no error is returned.
DROP SCHEMA IF EXISTS mydataset CASCADE
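The EXTERNAL keyword composes with IF EXISTS in the same way. As a sketch, the following removes the definition of a hypothetical federated dataset named myfederation from BigQuery while leaving the externally stored data untouched:
-- myfederation is a hypothetical federated dataset name.
DROP EXTERNAL SCHEMA IF EXISTS myfederation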
UNDROP SCHEMA statement
Undeletes a dataset within your time travel window.
UNDROP SCHEMA [IF NOT EXISTS] [project_name.]dataset_name
IF NOT EXISTS
: If a dataset already exists with that name, the statement has no effect.
project_name
: The name of the project that contained the deleted dataset. Defaults to the project that runs this DDL statement.
dataset_name
: The name of the dataset to undelete.
When you run this statement, you must specify the location where the dataset was deleted. If you don't, the US multi-region is used.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.create | The project where you are undeleting the dataset. |
bigquery.datasets.get | The dataset that you are undeleting. |
The following example undeletes the dataset named mydataset. If the dataset already exists or has passed the time travel window, then the statement returns an error.
UNDROP SCHEMA mydataset;
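To make the undelete a no-op instead of an error when a dataset with the same name already exists, add IF NOT EXISTS; a minimal variant of the same example:
UNDROP SCHEMA IF NOT EXISTS mydataset;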
DROP TABLE statement
Deletes a table or table clone.
DROP TABLE [IF EXISTS] table_name
IF EXISTS
: If no table exists with that name, the statement has no effect.
table_name
: The name of the table to delete. See Table path syntax.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.delete | The table to delete. |
bigquery.tables.get | The table to delete. |
The following example deletes a table named mytable in the dataset mydataset:
DROP TABLE mydataset.mytable
If the table name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.mytable
The following example deletes a table named mytable in mydataset only if the table exists. If the table name does not exist in the dataset, no error is returned, and no action is taken.
DROP TABLE IF EXISTS mydataset.mytable
DROP SNAPSHOT TABLE statement
Deletes a table snapshot.
DROP SNAPSHOT TABLE [IF EXISTS] table_snapshot_name
IF EXISTS
: If no table snapshot exists with that name, then the statement has no effect.
table_snapshot_name
: The name of the table snapshot to delete. See Table path syntax.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.deleteSnapshot | The table snapshot to delete. |
The following example deletes the table snapshot named mytablesnapshot in the mydataset dataset:
DROP SNAPSHOT TABLE mydataset.mytablesnapshot
If the table snapshot does not exist in the dataset, then the following error is returned:
Error: Not found: Table snapshot myproject:mydataset.mytablesnapshot
The following example deletes the table snapshot named mytablesnapshot in the mydataset dataset.
DROP SNAPSHOT TABLE IF EXISTS mydataset.mytablesnapshot
If the table snapshot doesn't exist in the dataset, then no action is taken, and no error is returned.
For information about creating table snapshots, see CREATE SNAPSHOT TABLE.
For information about restoring table snapshots, see CREATE TABLE CLONE.
DROP EXTERNAL TABLE statement
Deletes an external table.
DROP EXTERNAL TABLE [IF EXISTS] table_name
IF EXISTS
: If no external table exists with that name, then the statement has no effect.
table_name
: The name of the external table to delete. See Table path syntax.
If table_name exists but is not an external table, the statement returns the following error:
Cannot drop table_name which has type TYPE. An external table was expected.
The DROP EXTERNAL TABLE statement only removes the external table definition from BigQuery. The data stored in the external location is not affected.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.delete | The external table to delete. |
bigquery.tables.get | The external table to delete. |
The following example drops the external table named external_table from the dataset mydataset. It returns an error if the external table does not exist.
DROP EXTERNAL TABLE mydataset.external_table
The following example drops the external table named external_table from the dataset mydataset. If the external table does not exist, no error is returned.
DROP EXTERNAL TABLE IF EXISTS mydataset.external_table
DROP VIEW statement
Deletes a view.
DROP VIEW [IF EXISTS] view_name
IF EXISTS
: If no view exists with that name, the statement has no effect.
view_name
: The name of the view to delete. See Table path syntax.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.delete | The view to delete. |
bigquery.tables.get | The view to delete. |
The following example deletes a view named myview in mydataset:
DROP VIEW mydataset.myview
If the view name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.myview
The following example deletes a view named myview in mydataset only if the view exists. If the view name does not exist in the dataset, no error is returned, and no action is taken.
DROP VIEW IF EXISTS mydataset.myview
DROP MATERIALIZED VIEW statement
Deletes a materialized view.
DROP MATERIALIZED VIEW [IF EXISTS] mv_name
IF EXISTS
: If no materialized view exists with that name, the statement has no effect.
mv_name
: The name of the materialized view to delete. See Table path syntax.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.delete | The materialized view to delete. |
bigquery.tables.get | The materialized view to delete. |
The following example deletes a materialized view named my_mv in mydataset:
DROP MATERIALIZED VIEW mydataset.my_mv
If the materialized view name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.my_mv
If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.
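As a sketch of that format, assuming a materialized view my_mv in the dataset mydataset of another project myproject:
-- myproject is an illustrative project name.
DROP MATERIALIZED VIEW `myproject.mydataset.my_mv`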
The following example deletes a materialized view named my_mv in mydataset only if the materialized view exists. If the materialized view name does not exist in the dataset, no error is returned, and no action is taken.
DROP MATERIALIZED VIEW IF EXISTS mydataset.my_mv
DROP FUNCTION statement
Deletes a persistent user-defined function (UDF) or user-defined aggregate function (UDAF).
DROP FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name
IF EXISTS
: If no function exists with that name, the statement has no effect.
project_name
: The name of the project containing the function to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).
dataset_name
: The name of the dataset containing the function to delete. Defaults to the defaultDataset
in the request.
function_name
: The name of the function you're deleting.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.delete | The function to delete. |
The following example statement deletes the function parseJsonAsStruct contained in the dataset mydataset.
DROP FUNCTION mydataset.parseJsonAsStruct;
The following example statement deletes the function parseJsonAsStruct from the dataset sample_dataset in the project other_project.
DROP FUNCTION `other_project`.sample_dataset.parseJsonAsStruct;
DROP TABLE FUNCTION statement
Deletes a table function.
DROP TABLE FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name
IF EXISTS
: If no table function exists with this name, the statement has no effect.
project_name
: The name of the project containing the table function to delete. Defaults to the project that runs this DDL query.
dataset_name
: The name of the dataset containing the table function to delete.
function_name
: The name of the table function to delete.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.delete | The table function to delete. |
The following example deletes a table function named my_table_function:
DROP TABLE FUNCTION mydataset.my_table_function;
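To avoid an error when the table function is absent, add IF EXISTS; a minimal variant of the same example:
DROP TABLE FUNCTION IF EXISTS mydataset.my_table_function;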
DROP PROCEDURE statement
Deletes a stored procedure.
DROP PROCEDURE [IF EXISTS] [[project_name.]dataset_name.]procedure_name
IF EXISTS
: If no procedure exists with that name, the statement has no effect.
project_name
: The name of the project containing the procedure to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).
dataset_name
: The name of the dataset containing the procedure to delete. Defaults to the defaultDataset
in the request.
procedure_name
: The name of the procedure you're deleting.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.routines.delete | The procedure to delete. |
The following example statement deletes the procedure myprocedure contained in the dataset mydataset.
DROP PROCEDURE mydataset.myprocedure;
The following example statement deletes the procedure myprocedure from the dataset sample_dataset in the project other_project.
DROP PROCEDURE `other_project`.sample_dataset.myprocedure;
DROP ROW ACCESS POLICY statement
Deletes a row-level access policy.
DROP ROW ACCESS POLICY [IF EXISTS] row_access_policy_name ON table_name;
DROP ALL ROW ACCESS POLICIES ON table_name;
IF EXISTS
: If no row-level access policy exists with that name, the statement has no effect.
row_access_policy_name
: The name of the row-level access policy that you are deleting. Each row-level access policy on a table has a unique name.
table_name
: The name of the table with the row-level access policy or policies that you want to delete.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.rowAccessPolicies.delete | The row-level access policy to delete. |
bigquery.rowAccessPolicies.setIamPolicy | The row-level access policy to delete. |
bigquery.rowAccessPolicies.list | The table to delete all row-level access policies on. Only required for DROP ALL statements. |
Delete a row-level access policy from a table:
DROP ROW ACCESS POLICY my_row_filter ON project.dataset.my_table;
Delete all the row-level access policies from a table:
DROP ALL ROW ACCESS POLICIES ON project.dataset.my_table;
DROP CAPACITY statement
Deletes a capacity commitment.
DROP CAPACITY [IF EXISTS] project_id.location.capacity-commitment-id
IF EXISTS
: If no capacity commitment exists with that ID, the statement has no effect.
project_id
: The project ID of the administration project where the reservation was created.
location
: The location of the commitment.
capacity-commitment-id
: The capacity commitment ID.
To find the capacity commitment ID, query the INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT table.
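For example, a lookup along these lines lists the commitments in the admin project; treat the column list as a sketch, since the view's exact schema may differ from what is assumed here:
-- Column names are assumptions; verify them against the view's schema.
SELECT capacity_commitment_id, commitment_plan, state, slot_count
FROM `admin_project.region-us`.INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT;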
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.capacityCommitments.delete | The administration project that maintains ownership of the commitments. |
The following example deletes the capacity commitment with ID 1234:
DROP CAPACITY `admin_project.region-us.1234`
DROP RESERVATION statement
Deletes a reservation.
DROP RESERVATION [IF EXISTS] project_id.location.reservation_id
IF EXISTS
: If no reservation exists with that ID, the statement has no effect.
project_id
: The project ID of the administration project where the reservation was created.
location
: The location of the reservation.
reservation_id
: The reservation ID.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.reservations.delete | The administration project that maintains ownership of the commitments. |
The following example deletes the reservation prod:
DROP RESERVATION `admin_project.region-us.prod`
DROP ASSIGNMENT statement
Deletes a reservation assignment.
DROP ASSIGNMENT [IF EXISTS] project_id.location.reservation_id.assignment_id
IF EXISTS
: If no assignment exists with that ID, the statement has no effect.
project_id
: The project ID of the administration project where the reservation was created.
location
: The location of the reservation.
reservation_id
: The reservation ID.
assignment_id
: The assignment ID.
To find the assignment ID, query the INFORMATION_SCHEMA.ASSIGNMENTS view.
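A lookup might resemble the following; the column names are assumptions based on the view's documented purpose, so verify them against your INFORMATION_SCHEMA reference:
-- Column names are assumptions; verify them against the view's schema.
SELECT assignment_id, reservation_name, job_type, assignee_id
FROM `admin_project.region-us`.INFORMATION_SCHEMA.ASSIGNMENTS;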
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.reservationAssignments.delete | The administration project and the assignee. |
The following example deletes an assignment from the reservation named prod:
DROP ASSIGNMENT `admin_project.region-us.prod.1234`
DROP SEARCH INDEX statement
Deletes a search index on a table.
DROP SEARCH INDEX [IF EXISTS] index_name ON table_name
IF EXISTS
: If no search index exists with that name on the table, the statement has no effect.
index_name
: The name of the search index to be deleted.
table_name
: The name of the table with the index.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.deleteIndex | The table with the search index to delete. |
The following example deletes a search index my_index from my_table:
DROP SEARCH INDEX my_index ON dataset.my_table;
DROP VECTOR INDEX statement
Deletes a vector index on a table.
DROP VECTOR INDEX [IF EXISTS] index_name ON table_name
IF EXISTS
: If no vector index exists with that name on the table, the statement has no effect.
index_name
: The name of the vector index to be deleted.
table_name
: The name of the table with the vector index.
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.deleteIndex | The table with the vector index to delete. |
The following example deletes a vector index my_index from my_table:
DROP VECTOR INDEX my_index ON dataset.my_table;
Table path syntax
Use the following syntax when specifying the path of a table resource, including standard tables, views, materialized views, external tables, and table snapshots.
table_path := [[project_name.]dataset_name.]table_name
project_name
: The name of the project that contains the table resource. Defaults to the project that runs the DDL query. If the project name contains special characters such as colons, quote the name in backticks ` (example: `google.com:my_project`).
dataset_name
: The name of the dataset that contains the table resource. Defaults to the defaultDataset
in the request.
table_name
: The name of the table resource.
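Putting the parts together, a fully qualified path with a backticked project name might look like the following; the names are illustrative:
-- google.com:my_project, mydataset, and mytable are illustrative names.
DROP TABLE `google.com:my_project`.mydataset.mytable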
When you create a table in BigQuery, the table name must be unique per dataset. The table name can:
Contain up to 1,024 UTF-8 bytes.
Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), and Zs (space).
The following are all examples of valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.
Caveats:
mytable and MyTable can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off.
If you include multiple dot operators (.) in a sequence, the duplicate operators are implicitly stripped. For example, this: project_name....dataset_name..table_name becomes this: project_name.dataset_name.table_name