Module decomposition (2.12.0)

Matrix decomposition models. This module is styled after scikit-learn's decomposition module: https://scikit-learn.org/stable/modules/decomposition.html.

Classes

MatrixFactorization

MatrixFactorization(
    *,
    feedback_type: typing.Literal["explicit", "implicit"] = "explicit",
    num_factors: int,
    user_col: str,
    item_col: str,
    rating_col: str = "rating",
    l2_reg: float = 1.0
)

Matrix Factorization (MF).

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import MatrixFactorization
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({
...     "row": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
...     "column": [0, 1] * 7,
...     "value": [1, 1, 2, 1, 3, 1.2, 4, 1, 5, 0.8, 6, 1, 2, 3],
... })
>>> model = MatrixFactorization(feedback_type='explicit', num_factors=6, user_col='row', item_col='column', rating_col='value', l2_reg=2.06)
>>> W = model.fit(X)
Parameters
Name Description
feedback_type "explicit" or "implicit", default "explicit"

Specifies the feedback type for the model. The feedback type determines the algorithm that is used during training.

num_factors int

Specifies the number of latent factors to use. The signature above declares no default for this argument, so it must be passed explicitly.

user_col str

The user column name.

item_col str

The item column name.

rating_col str, default "rating"

The rating column name.

l2_reg float, default 1.0

The L2 regularization strength, given as a floating-point value.
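
To make num_factors and l2_reg more concrete, here is a minimal conceptual sketch in plain Python. It is not the algorithm BigQuery ML runs and it does not use the bigframes API: it fits user and item vectors of length num_factors with simple stochastic gradient descent, and l2_reg shrinks those vectors on every update. The helper names (dot, rmse, init_factors, train), the learning rate, the epoch count, and the l2_reg value of 0.01 are all assumptions chosen for illustration; the regularization scale here is not comparable to the l2_reg accepted by MatrixFactorization.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(ratings, P, Q):
    # Root-mean-square error of predicted vs. observed ratings.
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

def init_factors(ratings, num_factors, seed=0):
    # One latent vector of length num_factors per user and per item.
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.0, 0.5) for _ in range(num_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.5) for _ in range(num_factors)] for i in items}
    return P, Q

def train(ratings, P, Q, l2_reg, epochs=200, lr=0.01):
    # Hypothetical SGD loop: each observed rating nudges its user and item
    # vectors toward a better prediction, while l2_reg pulls them toward zero.
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - dot(pu, qi)
            for k in range(len(pu)):
                pu_k, qi_k = pu[k], qi[k]
                pu[k] += lr * (err * qi_k - l2_reg * pu_k)
                qi[k] += lr * (err * pu_k - l2_reg * qi_k)

# Same toy data as the example above: (user, item, rating) triples.
rows = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
columns = [0, 1] * 7
values = [1, 1, 2, 1, 3, 1.2, 4, 1, 5, 0.8, 6, 1, 2, 3]
ratings = list(zip(rows, columns, values))

P, Q = init_factors(ratings, num_factors=6)
rmse_before = rmse(ratings, P, Q)
train(ratings, P, Q, l2_reg=0.01)
rmse_after = rmse(ratings, P, Q)
print(f"training RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")
```

A predicted rating is the dot product of a user vector and an item vector, so a larger num_factors gives the model more capacity, and a larger l2_reg trades training fit for smaller factor values.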

PCA

PCA(
    n_components: typing.Optional[typing.Union[int, float]] = None,
    *,
    svd_solver: typing.Literal["full", "randomized", "auto"] = "auto"
)

Principal component analysis (PCA).

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X) # doctest:+SKIP
    principal_component_1  principal_component_2
0              -0.755243               0.157628
1               -1.05405              -0.141179
2              -1.809292               0.016449
3               0.755243              -0.157628
4                1.05405               0.141179
5               1.809292              -0.016449
<BLANKLINE>
[6 rows x 2 columns]
>>> pca.explained_variance_ratio_ # doctest:+SKIP
    principal_component_id  explained_variance_ratio
0                       1                   0.00901
1                       0                   0.99099
<BLANKLINE>
[2 rows x 2 columns]
Parameters
Name Description
n_components int, float or None, default None

Number of components to keep. If n_components is None, all components are kept, that is, n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is chosen so that the fraction of variance explained is greater than the fraction given by n_components.

svd_solver "full", "randomized" or "auto", default "auto"

The solver to use to calculate the principal components. Details: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver.
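
The three forms of n_components described above can be illustrated with a small plain-Python sketch. It is only a reading of the rule as stated in the parameter description, not BigQuery ML's implementation; the helper name components_to_keep is hypothetical, and the ratios are the explained_variance_ratio_ values from the example above.

```python
from itertools import accumulate

def components_to_keep(explained_variance_ratio, n_components=None):
    """Return how many leading components the stated rule would keep.

    Hypothetical helper: explained_variance_ratio holds one ratio per
    available component, so its length stands in for
    min(n_samples, n_features).
    """
    ratios = sorted(explained_variance_ratio, reverse=True)
    if n_components is None:
        # Not set: keep every component.
        return len(ratios)
    if isinstance(n_components, int):
        # Integer: keep that many components, capped at what is available.
        return min(n_components, len(ratios))
    if not 0 < n_components < 1:
        raise ValueError("a float n_components must satisfy 0 < n_components < 1")
    # Fraction: keep components until the cumulative explained variance
    # is greater than the requested fraction.
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative > n_components:
            return k
    return len(ratios)

ratios = [0.99099, 0.00901]
print(components_to_keep(ratios))         # 2
print(components_to_keep(ratios, 1))      # 1
print(components_to_keep(ratios, 0.95))   # 1
print(components_to_keep(ratios, 0.995))  # 2
```

With these ratios the first component alone explains about 99.1% of the variance, so a threshold of 0.95 keeps one component while 0.995 requires both.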