- 2.27.0 (latest)
- 2.26.0
- 2.25.0
- 2.24.0
- 2.23.0
- 2.22.0
- 2.21.0
- 2.20.0
- 2.19.0
- 2.18.0
- 2.17.0
- 2.16.0
- 2.15.0
- 2.14.0
- 2.13.0
- 2.12.0
- 2.11.0
- 2.10.0
- 2.9.0
- 2.8.0
- 2.7.0
- 2.6.0
- 2.5.0
- 2.4.0
- 2.3.0
- 2.2.0
- 1.36.0
- 1.35.0
- 1.34.0
- 1.33.0
- 1.32.0
- 1.31.0
- 1.30.0
- 1.29.0
- 1.28.0
- 1.27.0
- 1.26.0
- 1.25.0
- 1.24.0
- 1.22.0
- 1.21.0
- 1.20.0
- 1.19.0
- 1.18.0
- 1.17.0
- 1.16.0
- 1.15.0
- 1.14.0
- 1.13.0
- 1.12.0
- 1.11.1
- 1.10.0
- 1.9.0
- 1.8.0
- 1.7.0
- 1.6.0
- 1.5.0
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
- 0.26.0
- 0.25.0
- 0.24.0
- 0.23.0
- 0.22.0
- 0.21.0
- 0.20.1
- 0.19.2
- 0.18.0
- 0.17.0
- 0.16.0
- 0.15.0
- 0.14.1
- 0.13.0
- 0.12.0
- 0.11.0
- 0.10.0
- 0.9.0
- 0.8.0
- 0.7.0
- 0.6.0
- 0.5.0
- 0.4.0
- 0.3.0
- 0.2.0
API documentation for bigquery package.
Packages Functions
array_agg
array_agg(
    obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy,
) -> series.Series | dataframe.DataFrameGroup data and create arrays from selected columns, omitting NULLs to avoid BigQuery errors (NULLs not allowed in arrays).
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
For a SeriesGroupBy object:
>>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a    [1. 2.]
b    [3. 4.]
dtype: list<item: double>[pyarrow]
For a DataFrameGroupBy object:
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
         a      c
b
1.0    [2]    [3]
2.0  [1 1]  [3 2]
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| obj | A GroupBy object to be applied the function. | 
array_length
array_length(series: series.Series) -> series.SeriesCompute the length of each array element in the Series.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0    4
1    0
2    2
dtype: Int64
You can also apply this function directly to Series.
>>> s.apply(bbq.array_length, by_row=False)
0    4
1    0
2    2
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| series | A Series with array columns. | 
array_to_string
array_to_string(series: series.Series, delimiter: str) -> series.SeriesConverts array elements within a Series into delimited strings.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
    0         H, i, !
    1    Hello, World
    2
    3
    4              Hi
    dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | A Series containing arrays. | 
| delimiter | The string used to separate array elements. | 
json_extract
json_extract(series: series.Series, json_path: str) -> series.SeriesExtracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON
value. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_extract(s, json_path="$.class")
0    {"students":[{"id":5},{"id":12}]}
dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path | The JSON path identifying the data that you want to obtain from the input. | 
json_extract_array
json_extract_array(series: series.Series, json_path: str = "$") -> series.SeriesExtracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON
values. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path | The JSON path identifying the data that you want to obtain from the input. | 
json_set
json_set(
    series: series.Series,
    json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]],
) -> series.SeriesProduces a new JSON value within a Series by inserting or replacing values at specified paths.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
    0    {"a":100,"b":"hi"}
    Name: data, dtype: string
| Parameters | |
|---|---|
| Name | Description | 
| series | The Series containing JSON data (as native JSON objects or JSON-formatted strings). | 
| json_path_value_pairs | Pairs of JSON path and the new value to insert/replace. | 
struct
struct(value: dataframe.DataFrame) -> series.SeriesTakes a DataFrame and converts it into a Series of structs with each struct entry corresponding to a DataFrame row and each struct field corresponding to a DataFrame column
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.series as series
>>> bpd.options.display.progress_bar = None
>>> srs = series.Series([{"version": 1, "project": "pandas"}, {"version": 2, "project": "numpy"},])
>>> df = srs.struct.explode()
>>> bbq.struct(df)
0    {'project': 'pandas', 'version': 1}
1     {'project': 'numpy', 'version': 2}
dtype: struct<project: string, version: int64>[pyarrow]
Args:
    value (bigframes.dataframe.DataFrame):
        The DataFrame to be converted to a Series of structs
Returns:
    bigframes.series.Series: A new Series with struct entries representing rows of the original DataFrame
vector_search
vector_search(
    base_table: str,
    column_to_search: str,
    query: Union[dataframe.DataFrame, series.Series],
    *,
    query_column_to_search: Optional[str] = None,
    top_k: Optional[int] = 10,
    distance_type: Literal["euclidean", "cosine"] = "euclidean",
    fraction_lists_to_search: Optional[float] = None,
    use_brute_force: bool = False
) -> dataframe.DataFrameConduct vector search which searches embeddings to find semantically similar entities.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column
is used as the search query:
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2)
  query_id  embedding  id my_embedding  distance
1      cat  [3.  5.2]   5    [5.  5.4]  2.009975
0      dog    [1. 2.]   1      [1. 2.]       0.0
0      dog    [1. 2.]   4    [1.  3.2]       1.2
1      cat  [3.  5.2]   2      [2. 4.]   1.56205
<BLANKLINE>
[4 rows x 5 columns]
Series embeddings for which to find nearest neighbors:
>>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
...                            index=["dog", "cat"],
...                            name="embedding")
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2)
     embedding  id my_embedding  distance
dog    [1. 2.]   1      [1. 2.]       0.0
cat  [3.  5.2]   5    [5.  5.4]  2.009975
dog    [1. 2.]   4    [1.  3.2]       1.2
cat  [3.  5.2]   2      [2. 4.]   1.56205
<BLANKLINE>
[4 rows x 4 columns]
You can specify the name of the column in the query DataFrame embeddings and distance type. If you specify query_column_to_search_value, it will use the provided column which contains the embeddings for which to find nearest neighbors. Otherwiese, it uses the column_to_search value.
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]],
...                               "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             distance_type="cosine",
...             query_column_to_search="another_embedding",
...             top_k=2)
  query_id  embedding another_embedding  id my_embedding  distance
1      cat  [3.  5.2]         [3.3 5.2]   2      [2. 4.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   4    [1.  3.2]  0.000013
1      cat  [3.  5.2]         [3.3 5.2]   1      [1. 2.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   3    [1.5 7. ]  0.004697
<BLANKLINE>
[4 rows x 6 columns]
| Parameters | |
|---|---|
| Name | Description | 
| base_table | The table to search for nearest neighbor embeddings. | 
| column_to_search | The name of the base table column to search for nearest neighbor embeddings. The column must have a type of  | 
| query | A Series or DataFrame that provides the embeddings for which to find nearest neighbors. | 
| query_column_to_search | Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of  | 
| top_k | Sepecifies the number of nearest neighbors to return. Default to 10. | 
| distance_type | Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean" and "cosine". Default to "euclidean". | 
| fraction_lists_to_search | Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify  | 
| use_brute_force | Determines whether to use brute force search by skipping the vector index if one is available. Default to False. |