PCA(
    n_components: typing.Optional[typing.Union[int, float]] = None,
    *,
    svd_solver: typing.Literal["full", "randomized", "auto"] = "auto"
)
Principal component analysis (PCA).
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X) # doctest:+SKIP
    principal_component_1  principal_component_2
0              -0.755243               0.157628
1               -1.05405              -0.141179
2              -1.809292               0.016449
3               0.755243              -0.157628
4                1.05405               0.141179
5               1.809292              -0.016449
<BLANKLINE>
[6 rows x 2 columns]
>>> pca.explained_variance_ratio_ # doctest:+SKIP
    principal_component_id  explained_variance_ratio
0                       1                   0.00901
1                       0                   0.99099
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| n_components | int, float or None, default None. Number of components to keep. If n_components is not set, all components are kept: n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is selected so that the explained variance exceeds the fraction specified by n_components. |
| svd_solver | "full", "randomized" or "auto", default "auto". The solver to use to calculate the principal components. Details: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver. |
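The three ways of specifying n_components can be sketched in plain Python. This is an illustration only, not the bigframes or BigQuery ML implementation: `resolve_n_components` is a hypothetical helper, the number of available components is taken from the length of the variance list, and the strict "greater than" comparison follows the wording of the parameter description above.

```python
from typing import Optional, Sequence, Union


def resolve_n_components(
    n_components: Optional[Union[int, float]],
    explained_variance: Sequence[float],
) -> int:
    """Return how many components to keep (hypothetical helper, not bigframes API)."""
    n_available = len(explained_variance)
    if n_components is None:
        # Not set: keep every available component.
        return n_available
    if isinstance(n_components, float) and 0 < n_components < 1:
        total = sum(explained_variance)
        cumulative = 0.0
        # Add components from largest to smallest variance until the
        # cumulative fraction of variance exceeds the requested fraction.
        ordered = sorted(explained_variance, reverse=True)
        for kept, variance in enumerate(ordered, start=1):
            cumulative += variance
            if cumulative / total > n_components:
                return kept
        return n_available
    # Integer: keep that many components, capped at what is available.
    return min(int(n_components), n_available)


variances = [3.0, 1.0, 0.5]
print(resolve_n_components(None, variances))  # 3
print(resolve_n_components(0.8, variances))   # 2
```

With variances 3.0, 1.0 and 0.5, the first component alone explains about 67% of the total, so a request for 0.8 needs two components.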
Properties
components_
Principal axes in feature space, representing the directions of maximum variance in the data.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame of principal components, containing the following columns: principal_component_id: An integer that identifies the principal component. feature: The column name that contains the feature. numerical_value: If feature is numeric, the value of feature for the principal component that principal_component_id identifies. If feature isn't numeric, the value is NULL. categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields: categorical_value.category: The name of each category. categorical_value.value: The value of categorical_value.category for the principal component that principal_component_id identifies. The output contains one row per feature per component. |
explained_variance_
The amount of variance explained by each of the selected components.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance: The factor by which the eigenvector is scaled. In PCA, the eigenvalue and the explained variance are the same concept. |
explained_variance_ratio_
Fraction of the total variance explained by each of the selected components.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance_ratio: The ratio between the variance (eigenvalue) of that principal component and the total variance, where the total variance is the sum of the variances (eigenvalues) of all individual principal components. |
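The ratio definition can be checked by hand on the two-feature data from the Examples section. The sketch below uses only the standard library and assumes the features are standardized (zero mean, unit sample standard deviation) before decomposition. That assumption is not stated on this page, but under it the computed ratios agree with the example output. For two standardized features the covariance matrix is the correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r.

```python
from statistics import mean, stdev

feat0 = [-1, -2, -3, 1, 2, 3]
feat1 = [-1, -1, -2, 1, 1, 2]


def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


z0, z1 = standardize(feat0), standardize(feat1)
n = len(z0)

# Sample correlation between the two standardized features.
r = sum(a * b for a, b in zip(z0, z1)) / (n - 1)

# Eigenvalues of [[1, r], [r, 1]], largest first. These play the role of
# explained_variance in this sketch.
explained_variance = sorted([1 + r, 1 - r], reverse=True)

# Each ratio is one eigenvalue divided by the sum of all eigenvalues.
total_variance = sum(explained_variance)
explained_variance_ratio = [ev / total_variance for ev in explained_variance]

print([round(x, 5) for x in explained_variance_ratio])  # [0.99099, 0.00901]
```

The ratios always sum to 1 when all components are kept.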
Methods
__repr__
__repr__()
Print the estimator's constructor with all non-default parameter values.
detect_anomalies
detect_anomalies(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame
Detect the anomaly data points of the input.
| Parameters | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series. Series or DataFrame in which to detect anomalies. |
| contamination | float, default 0.1. Proportion of anomalies in the training dataset used to create the model. The value must be in the range [0, 0.5]. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | detected DataFrame. | 
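One way to read contamination is as the fraction of rows that end up flagged once a per-row anomaly score is available. The sketch below illustrates only that idea. `flag_anomalies` and the scores are hypothetical, and this page does not say how the underlying model computes scores or derives its cutoff, so none of this should be taken as the actual implementation.

```python
import math
from typing import List, Sequence


def flag_anomalies(scores: Sequence[float], contamination: float = 0.1) -> List[bool]:
    """Flag roughly the top `contamination` fraction of scores (hypothetical helper)."""
    if not 0 <= contamination <= 0.5:
        raise ValueError("contamination must be in the range [0, 0.5]")
    n_flagged = math.floor(contamination * len(scores))
    if n_flagged == 0:
        return [False] * len(scores)
    # The cutoff is the n_flagged-th largest score; ties at the cutoff
    # are all flagged, so slightly more rows may be marked.
    cutoff = sorted(scores, reverse=True)[n_flagged - 1]
    return [score >= cutoff for score in scores]


scores = [0.10, 0.20, 0.15, 5.00, 0.12, 0.30, 0.11, 0.14, 0.13, 0.16]
print(flag_anomalies(scores, contamination=0.1))
```

With ten rows and contamination=0.1, only the row with score 5.00 is flagged. Raising contamination to 0.2 also flags the row with score 0.30.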
fit
fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T
Fit the model according to the given training data.
| Parameters | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series. Series or DataFrame of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features. |
| y | default None. Ignored. |
| Returns | |
|---|---|
| Type | Description | 
| PCA | Fitted estimator. | 
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
Get parameters for this estimator.
| Parameter | |
|---|---|
| Name | Description | 
| deep | bool, default True. |
| Returns | |
|---|---|
| Type | Description | 
| Dictionary | A dictionary of parameter names mapped to their values. | 
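The shape of the returned dictionary can be illustrated with a small stand-in class. `TinyEstimator` is hypothetical and only mirrors the PCA constructor signature shown at the top of this page. The way it collects parameters (reading the constructor signature) is one common approach and is not claimed to be what bigframes does internally.

```python
import inspect
from typing import Any, Dict


class TinyEstimator:
    """Hypothetical estimator used only to illustrate get_params()."""

    def __init__(self, n_components=None, *, svd_solver="auto"):
        self.n_components = n_components
        self.svd_solver = svd_solver

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # `deep` is accepted for signature compatibility; this sketch has
        # no nested estimators, so it has no effect here.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        # Map each constructor parameter name to its current value.
        return {name: getattr(self, name) for name in names}


params = TinyEstimator(n_components=2).get_params()
print(params)  # {'n_components': 2, 'svd_solver': 'auto'}
```

Parameters left at their defaults are still included in the dictionary.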
predict
predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame
Project each sample in X onto the fitted principal components.
| Parameter | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series. Series or a DataFrame to predict. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Predicted DataFrames. | 
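To make the predict output in the Examples section concrete, the sketch below projects the same six rows by hand using only the standard library. It assumes the features are standardized before projection, which this page does not state but which yields values that agree with the example output. For two positively correlated standardized features, the principal axes are (1, 1)/sqrt(2) and (1, -1)/sqrt(2). The sign of each axis is arbitrary, so another implementation could return columns with flipped signs.

```python
import math
from statistics import mean, stdev

feat0 = [-1, -2, -3, 1, 2, 3]
feat1 = [-1, -1, -2, 1, 1, 2]


def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


z0, z1 = standardize(feat0), standardize(feat1)

# Unit-length principal axes of the 2x2 correlation matrix [[1, r], [r, 1]].
axis_1 = (1 / math.sqrt(2), 1 / math.sqrt(2))
axis_2 = (1 / math.sqrt(2), -1 / math.sqrt(2))

# Each projected coordinate is the dot product of a standardized row
# with one principal axis.
projected = [
    (a * axis_1[0] + b * axis_1[1], a * axis_2[0] + b * axis_2[1])
    for a, b in zip(z0, z1)
]

for pc1, pc2 in projected:
    print(f"{pc1:.6f}  {pc2:.6f}")
```

The first printed row is approximately -0.755243 and 0.157628, in line with row 0 of the example output.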
register
register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T
Register the model to Vertex AI.
After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.
| Parameter | |
|---|---|
| Name | Description | 
| vertex_ai_model_id | Optional[str], default None. Optional string ID to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI length limit. |
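The defaulting and truncation rule can be sketched as a small function. `resolve_vertex_ai_model_id` is a hypothetical helper, not part of bigframes, and it assumes truncation keeps the leading 63 characters, which the description above does not specify.

```python
from typing import Optional

# Length limit quoted in the parameter description above.
VERTEX_MODEL_ID_MAX_LENGTH = 63


def resolve_vertex_ai_model_id(
    bq_model_id: str, vertex_ai_model_id: Optional[str] = None
) -> str:
    """Apply the default and the length limit (hypothetical helper)."""
    if vertex_ai_model_id is None:
        # No explicit ID: fall back to the documented default pattern.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: keep the first 63 characters when the ID is too long.
    return vertex_ai_model_id[:VERTEX_MODEL_ID_MAX_LENGTH]


print(resolve_vertex_ai_model_id("my_pca_model"))  # bigframes_my_pca_model
```

Because long IDs are shortened, two BigQuery model IDs that share a long prefix could map to the same Vertex AI ID, so passing an explicit short ID may be the safer choice in that case.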
score
score(X=None, y=None) -> bigframes.dataframe.DataFrame
Calculate evaluation metrics of the model.
| Parameters | |
|---|---|
| Name | Description | 
| X | default None. Ignored. |
| y | default None. Ignored. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame that represents model metrics. | 
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.decomposition.PCA
Save the model to BigQuery.
| Parameters | |
|---|---|
| Name | Description | 
| model_name | str. The name of the model. |
| replace | bool, default False. Whether to replace the model if it already exists. |
| Returns | |
|---|---|
| Type | Description | 
| PCA | Saved model. |