Supported rating data types: NUMERIC, BIGNUMERIC, FLOAT64
An important part of creating a good matrix factorization model for recommendations is making sure that your data is trained with the algorithm that is best suited to it. For matrix factorization models, there are two different ways to get a rating for a user-item pair.
Ratings provided directly by the user are considered to be explicit feedback. A low explicit rating tends to imply that the user felt negatively about an item, while a high explicit rating tends to imply that the user liked the item. Movie streaming sites where users give ratings are examples of explicitly labeled datasets. For explicit feedback problems, BigQuery ML uses the alternating least squares (ALS) algorithm. ALS seeks to minimize the following loss function:

$$\min_{X,Y} \sum_{(u,i)\,\in\,\text{observed}} \left(r_{ui} - x_u^\top y_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)$$

Where:
r_ui is the observed rating from user u for item i.
x_u is the latent factor vector for user u, with dimension NUM_FACTORS.
y_i is the latent factor vector for item i, with dimension NUM_FACTORS.
λ is the regularization coefficient, set by the L2_REG option.
However, most of the time data isn't labeled by users. Often, the only signals you have about whether a user liked an item or movie are proxies such as click rate or engagement time. You can use such a signal as a proxy rating, but it is not necessarily a definitive indication of whether a user likes or dislikes something. The data in these datasets is considered to be implicit feedback. For implicit feedback problems, BigQuery ML uses a variant of the ALS algorithm called weighted-alternating least squares (WALS), which is described in http://yifanhu.net/PUB/cf.pdf. This approach uses these proxy ratings and treats them as an indicator of the interest that a user has in an item. WALS seeks to minimize the following loss function:

$$\min_{X,Y} \sum_{u,i} c_{ui}\left(p_{ui} - x_u^\top y_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)$$

Where, in addition to the variables defined above, the function also introduces the following variables:
p_ui is the binarized preference: 1 if r_ui > 0, and 0 otherwise.
c_ui = 1 + α·r_ui is the confidence in the observed preference, where α is set by the WALS_ALPHA option.
For explicit matrix factorization, the input is typically integers within a known fixed range. For implicit matrix factorization, the input ratings can be doubles or integers that span a wider range. We recommend that you make sure there aren't any outliers in the input ratings, and that you scale the input ratings if the model is performing poorly.
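As a hedged sketch of the scaling recommendation above, the following query min-max scales a raw implicit signal into the [0, 1] range at training time. The table and column names (`mydataset.mytable`, `user`, `item`, `rating`) follow the examples later on this page; the window-function scaling approach is an illustration, not the only valid one.

```sql
-- Sketch: scale a raw engagement value (e.g., minutes watched) into [0, 1]
-- before training an implicit matrix factorization model.
CREATE MODEL `project_id.mydataset.my_scaled_model`
OPTIONS (
  MODEL_TYPE = 'MATRIX_FACTORIZATION',
  FEEDBACK_TYPE = 'IMPLICIT'
) AS
SELECT
  user,
  item,
  -- Min-max scale the raw signal across the whole table; SAFE_DIVIDE
  -- returns NULL instead of erroring if all ratings are identical.
  SAFE_DIVIDE(
    rating - MIN(rating) OVER (),
    MAX(rating) OVER () - MIN(rating) OVER ()
  ) AS rating
FROM
  `mydataset.mytable`
```

Scaling this way also makes it easier to spot outliers, since every training rating ends up in a known, bounded range.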
Matrix factorization models support hyperparameter tuning, which you can use to improve model performance for your data. To use hyperparameter tuning, set the NUM_TRIALS option to the number of trials that you want to run. BigQuery ML then trains the model the number of times that you specify, using different hyperparameter values, and returns the model that performs the best.
Hyperparameter tuning defaults to improving the key performance metric for the given model type. You can use the HPARAM_TUNING_OBJECTIVES
option to tune for a different metric if you need to.
For more information about the training objectives and hyperparameters supported for explicit matrix factorization models, see MATRIX_FACTORIZATION
(explicit). For more information about the training objectives and hyperparameters supported for implicit matrix factorization models, see MATRIX_FACTORIZATION
(implicit). To try a tutorial that walks you through hyperparameter tuning, see Improve model performance with hyperparameter tuning.
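The tuning options described above can be combined in a single statement. The following is a sketch for an explicit feedback model; the specific ranges and the tuning objective are illustrative choices, not recommendations.

```sql
-- Sketch: run 10 trials, searching over NUM_FACTORS and L2_REG, and
-- select the best model by mean squared error.
CREATE MODEL `project_id.mydataset.my_tuned_model`
OPTIONS (
  MODEL_TYPE = 'MATRIX_FACTORIZATION',
  NUM_TRIALS = 10,
  NUM_FACTORS = HPARAM_RANGE(8, 64),
  L2_REG = HPARAM_RANGE(0.1, 10),
  HPARAM_TUNING_OBJECTIVES = ['MEAN_SQUARED_ERROR']
) AS
SELECT
  user,
  item,
  rating
FROM
  `mydataset.mytable`
```

If you omit HPARAM_TUNING_OBJECTIVES, tuning defaults to the key performance metric for the model type, as noted above.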
If you get the "Model is too large (>100 MB)" error, check the input data. This error is caused by having too many ratings for a single user or a single item. Hashing the user or item columns into an INT64
value or reducing the data size can help. You can use the following formula to determine whether this error might occur:
max(num_rated_user, num_rated_item) < 100 million

Where num_rated_user is the maximum number of ratings that a single user has entered and num_rated_item is the maximum number of ratings that a single item has received.
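You can compute both sides of this check directly from your training data. The following sketch assumes the `mydataset.mytable` schema used in the examples on this page; the FARM_FINGERPRINT line shows one way to hash a key into an INT64 value, as suggested above.

```sql
-- Sketch: find the largest ratings count per user and per item to see
-- whether the "Model is too large" error is likely.
SELECT
  GREATEST(
    (SELECT MAX(cnt)
     FROM (SELECT COUNT(*) AS cnt FROM `mydataset.mytable` GROUP BY user)),
    (SELECT MAX(cnt)
     FROM (SELECT COUNT(*) AS cnt FROM `mydataset.mytable` GROUP BY item))
  ) AS max_ratings;

-- If the counts are too large, hashing keys into INT64 can help, e.g.:
-- SELECT FARM_FINGERPRINT(CAST(user AS STRING)) AS user, item, rating
-- FROM `mydataset.mytable`
```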
To create a matrix factorization model you must create a reservation that uses the BigQuery Enterprise or Enterprise Plus edition, and then create a reservation assignment that uses the QUERY
job type.
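As a sketch of that setup using the reservation DDL statements, the project, location, reservation name, and slot capacity below are all placeholders; adjust them for your own administration project and region.

```sql
-- Sketch: create an Enterprise edition reservation, then assign the
-- QUERY job type for a project to it.
CREATE RESERVATION `admin_project.region-us.model_training`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 100
);

CREATE ASSIGNMENT `admin_project.region-us.model_training.my_assignment`
OPTIONS (
  assignee = 'projects/my_project',
  job_type = 'QUERY'
);
```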
The following examples create models named mymodel in the dataset mydataset in your default project.
This example creates an explicit feedback matrix factorization model.
CREATE MODEL `project_id.mydataset.mymodel`
OPTIONS (
  MODEL_TYPE = 'MATRIX_FACTORIZATION'
) AS
SELECT
  user,
  item,
  rating
FROM
  `mydataset.mytable`
This example creates an implicit feedback matrix factorization model.
CREATE MODEL `project_id.mydataset.mymodel`
OPTIONS (
  MODEL_TYPE = 'MATRIX_FACTORIZATION',
  FEEDBACK_TYPE = 'IMPLICIT'
) AS
SELECT
  user,
  item,
  rating
FROM
  `mydataset.mytable`
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-04-17 UTC.