BlobAccessor(*args, **kwargs)
Blob functions for Series and Index.
Properties
session
API documentation for the session property.
Methods
audio_transcribe
audio_transcribe(
*,
engine: typing.Literal["bigquery"] = "bigquery",
connection: typing.Optional[str] = None,
model_name: typing.Optional[
typing.Literal["gemini-2.0-flash-001", "gemini-2.0-flash-lite-001"]
] = None,
verbose: bool = False
) -> bigframes.series.Series
Transcribe audio content using a Gemini multimodal model.
Parameters | |
---|---|
Name | Description |
engine |
'bigquery'
The engine (bigquery or third party library) used for the function. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
model_name |
str or None, default None
The model for natural language tasks. Accepted values are "gemini-2.0-flash-lite-001" and "gemini-2.0-flash-001". See "https://ai.google.dev/gemini-api/docs/models" for model choices. |
verbose |
bool, default False
Controls the verbosity of the output. When set to True, both error messages and the transcribed content are displayed. When set to False, only the transcribed content is presented and error messages are suppressed. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
str or struct[str, str], depending on the "verbose" parameter. Contains the transcribed text from the audio file. Includes error messages if verbosity is enabled. |
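As a rough usage sketch, the helper below requests a transcription with an explicit model and verbose output. It assumes the accessor is reached as .blob on a bigframes Series of audio blobs created elsewhere (this page does not state the attribute name), and the helper name transcribe_audio is hypothetical. Only keyword arguments documented above are used.

```python
def transcribe_audio(audio_series, verbose=True):
    """Transcribe a Series of audio blobs through the blob accessor.

    audio_series is assumed to be a bigframes Series of audio blobs
    obtained elsewhere; this helper name is illustrative only.
    """
    return audio_series.blob.audio_transcribe(
        engine="bigquery",                  # the only documented engine
        model_name="gemini-2.0-flash-001",  # one of the two accepted models
        verbose=verbose,                    # True also includes error messages
    )
```

With verbose=True the result is documented as a struct rather than a plain string, so downstream code should handle both shapes.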
authorizer
authorizer() -> bigframes.series.Series
Authorizers of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
Authorizers (connections) as strings. |
content_type
content_type() -> bigframes.series.Series
Retrieve the content type of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
string of the content type. |
display
display(
n: int = 3,
*,
content_type: str = "",
width: typing.Optional[int] = None,
height: typing.Optional[int] = None
)
Display the blob content in the IPython Notebook environment. Currently only works for image types.
Parameters | |
---|---|
Name | Description |
n |
int, default 3
number of sample blob objects to display. |
content_type |
str, default ""
content type of the blob. If unset, use the blob metadata of the storage. Possible values are "image", "audio" and "video". |
width |
int or None, default None
width in pixels that the image/video are constrained to. If unset, use the global setting in bigframes.options.display.blob_display_width, otherwise image/video's original size or ratio is used. No-op for other content types. |
height |
int or None, default None
height in pixels that the image/video are constrained to. If unset, use the global setting in bigframes.options.display.blob_display_height, otherwise image/video's original size or ratio is used. No-op for other content types. |
exif
exif(
*,
engine: typing.Literal[None, "pillow"] = None,
connection: typing.Optional[str] = None,
max_batching_rows: int = 8192,
container_cpu: typing.Union[float, int] = 0.33,
container_memory: str = "512Mi"
) -> bigframes.series.Series
Extract EXIF data. Currently only image types are supported.
Parameters | |
---|---|
Name | Description |
engine |
'pillow' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
max_batching_rows |
int, default 8,192
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 0.33
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "512Mi"
container memory size. String of the format
|
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
JSON series of key-value pairs. |
get_runtime_json_str
get_runtime_json_str(
mode: str = "R", *, with_metadata: bool = False
) -> bigframes.series.Series
Get the runtime (contains a signed URL to access GCS data) and apply the ToJSONString transformation.
Parameters | |
---|---|
Name | Description |
mode |
str, default "R"
the mode for accessing the runtime. Possible values are "R" (read-only) and "RW" (read-write). |
with_metadata |
bool, default False
whether to include metadata in the JSON string. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
the runtime object as a JSON string. |
image_blur
image_blur(
ksize: tuple[int, int],
*,
engine: typing.Literal[None, "opencv"] = None,
dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
connection: typing.Optional[str] = None,
max_batching_rows: int = 8192,
container_cpu: typing.Union[float, int] = 0.33,
container_memory: str = "512Mi"
) -> bigframes.series.Series
Blurs images.
Parameters | |
---|---|
Name | Description |
ksize |
tuple(int, int)
Kernel size. |
engine |
'opencv' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
dst |
str or bigframes.series.Series or None, default None
Output destination. Can be one of: (1) str: a GCS folder; the output filenames are the same as the input filenames. (2) blob Series: the output file paths are determined by the URIs of the blob Series. (3) None: output to BQ as bytes. Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
max_batching_rows |
int, default 8,192
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 0.33
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "512Mi"
container memory size. String of the format
|
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
blob Series if destination is GCS. Or bytes Series if destination is BQ. |
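The dst rules described above can be sketched in plain Python. This illustrates the documented behavior and is not the library's implementation: the function name resolve_output and the sample URIs are hypothetical, and the blob Series case is represented by a single destination URI string.

```python
import posixpath


def resolve_output(input_uri, dst=None, dst_uri=None):
    """Return (output_uri, encoding_extension) for one input blob.

    dst:     GCS folder string, or None to return bytes to BQ.
    dst_uri: stands in for the matching row of a destination blob Series.
    """
    if dst_uri is not None:
        output_uri = dst_uri                      # path comes from the blob Series
    elif dst is not None:
        filename = posixpath.basename(input_uri)  # keep the input filename
        output_uri = dst.rstrip("/") + "/" + filename
    else:
        output_uri = None                         # bytes are returned to BQ

    # Encoding follows the output filename, else the input filename.
    name = output_uri if output_uri is not None else input_uri
    ext = posixpath.splitext(name)[1]
    return output_uri, ext or ".jpeg"             # default when no extension
```

For example, resolve_output("gs://bucket/in/cat.png", dst="gs://bucket/out/") keeps the filename cat.png under the destination folder and selects ".png" as the encoding.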
image_normalize
image_normalize(
*,
engine: typing.Literal[None, "opencv"] = None,
alpha: float = 1.0,
beta: float = 0.0,
norm_type: str = "l2",
dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
connection: typing.Optional[str] = None,
max_batching_rows: int = 8192,
container_cpu: typing.Union[float, int] = 0.33,
container_memory: str = "512Mi"
) -> bigframes.series.Series
Normalize images.
Parameters | |
---|---|
Name | Description |
engine |
'opencv' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
alpha |
float, default 1.0
Norm value to normalize to or the lower range boundary in case of the range normalization. |
beta |
float, default 0.0
Upper range boundary in case of the range normalization; it is not used for the norm normalization. |
norm_type |
str, default "l2"
Normalization type. Accepted values are "inf", "l1", "l2" and "minmax". |
dst |
str or bigframes.series.Series or None, default None
Output destination. Can be one of: (1) str: a GCS folder; the output filenames are the same as the input filenames. (2) blob Series: the output file paths are determined by the URIs of the blob Series. (3) None: output to BQ as bytes. Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
max_batching_rows |
int, default 8,192
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 0.33
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "512Mi"
container memory size. String of the format
|
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
blob Series if destination is GCS. Or bytes Series if destination is BQ. |
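To make the roles of alpha, beta and norm_type concrete, here is a plain-Python sketch on a flat list of pixel values. It assumes semantics similar to OpenCV's normalize function, which is an assumption because this page only names the engine; the function name normalize_values is hypothetical and this is not the library's code.

```python
import math


def normalize_values(values, alpha=1.0, beta=0.0, norm_type="l2"):
    """Normalize a flat list of pixel values (illustration only)."""
    if norm_type == "minmax":
        lo, hi = min(values), max(values)
        if hi == lo:
            return [float(alpha)] * len(values)   # constant input: nothing to stretch
        scale = (beta - alpha) / (hi - lo)
        # Map the smallest value to alpha and the largest to beta.
        return [alpha + (v - lo) * scale for v in values]

    if norm_type == "inf":
        norm = max(abs(v) for v in values)
    elif norm_type == "l1":
        norm = sum(abs(v) for v in values)
    elif norm_type == "l2":
        norm = math.sqrt(sum(v * v for v in values))
    else:
        raise ValueError(f"unsupported norm_type: {norm_type!r}")

    if norm == 0:
        return [0.0] * len(values)                # all-zero input stays zero
    # Scale so that the chosen norm of the result equals alpha; beta is unused.
    return [v * alpha / norm for v in values]
```

In this sketch, range normalization with alpha=0 and beta=255 stretches the values to span 0 to 255, while the norm-based types only rescale them.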
image_resize
image_resize(
dsize: tuple[int, int] = (0, 0),
*,
engine: typing.Literal[None, "opencv"] = None,
fx: float = 0.0,
fy: float = 0.0,
dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
connection: typing.Optional[str] = None,
max_batching_rows: int = 8192,
container_cpu: typing.Union[float, int] = 0.33,
container_memory: str = "512Mi"
)
Resize images.
Parameters | |
---|---|
Name | Description |
dsize |
tuple(int, int), default (0, 0)
Destination size. If set to 0, fx and fy parameters determine the size. |
engine |
'opencv' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
fx |
float, default 0.0
scale factor along the horizontal axis. If set to 0.0, dsize parameter determines the output size. |
fy |
float, default 0.0
scale factor along the vertical axis. If set to 0.0, dsize parameter determines the output size. |
dst |
str or bigframes.series.Series or None, default None
Output destination. Can be one of: (1) str: a GCS folder; the output filenames are the same as the input filenames. (2) blob Series: the output file paths are determined by the URIs of the blob Series. (3) None: output to BQ as bytes. Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
max_batching_rows |
int, default 8,192
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 0.33
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "512Mi"
container memory size. String of the format
|
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
blob Series if destination is GCS. Or bytes Series if destination is BQ. |
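The interplay between dsize, fx and fy can be sketched as a small size calculation. The (width, height) ordering and the rounding rule are assumptions borrowed from OpenCV's resize function, and the function name output_size is hypothetical.

```python
def output_size(src_size, dsize=(0, 0), fx=0.0, fy=0.0):
    """Return the (width, height) a resize would produce.

    src_size is the input (width, height). An explicit dsize wins;
    otherwise the scale factors fx and fy determine the size.
    """
    if tuple(dsize) != (0, 0):
        return tuple(dsize)                # explicit destination size
    if fx <= 0 or fy <= 0:
        raise ValueError("set dsize, or both fx and fy")
    width, height = src_size
    return (round(width * fx), round(height * fy))
```

Under these assumptions, halving a 640x480 image with fx=0.5 and fy=0.5 gives 320x240, while passing dsize=(100, 50) ignores the scale factors.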
md5_hash
md5_hash() -> bigframes.series.Series
Retrieve the md5 hash of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
string of the md5 hash. |
metadata
metadata() -> bigframes.series.Series
Retrieve the metadata of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
JSON metadata of the Blob. Contains fields: content_type, md5_hash, size and updated(time). |
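A single metadata value can be read with the standard json module. The sample string below is hypothetical: the field names come from the description above, but the values and the exact formatting of each field (for example, how updated is encoded) are assumptions.

```python
import json

# Hypothetical metadata string for one blob; values are made up.
SAMPLE_METADATA = (
    '{"content_type": "image/jpeg", '
    '"md5_hash": "d41d8cd98f00b204e9800998ecf8427e", '
    '"size": 338390, "updated": 1700000000000000}'
)


def parse_metadata(metadata_json):
    """Parse one blob metadata JSON string into a dict."""
    record = json.loads(metadata_json)
    # Keep only the fields documented for metadata().
    fields = ("content_type", "md5_hash", "size", "updated")
    return {name: record.get(name) for name in fields}


record = parse_metadata(SAMPLE_METADATA)
print(record["content_type"])  # image/jpeg
```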
pdf_chunk
pdf_chunk(
*,
engine: typing.Literal[None, "pypdf"] = None,
connection: typing.Optional[str] = None,
chunk_size: int = 2000,
overlap_size: int = 200,
max_batching_rows: int = 1,
container_cpu: typing.Union[float, int] = 2,
container_memory: str = "1Gi",
verbose: bool = False
) -> bigframes.series.Series
Extracts and chunks text from PDF URLs and saves the text as arrays of strings.
Parameters | |
---|---|
Name | Description |
engine |
'pypdf' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
chunk_size |
int, default 2000
the desired size of each text chunk (number of characters). |
overlap_size |
int, default 200
the number of overlapping characters between consecutive chunks. This helps ensure context is preserved across chunk boundaries. |
max_batching_rows |
int, default 1
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 2
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "1Gi"
container memory size. String of the format
|
verbose |
bool, default False
Controls the verbosity of the output. When set to True, both error messages and the extracted content are displayed. When set to False, only the extracted content is presented and error messages are suppressed. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
array[str] or struct[str, array[str]], depending on the "verbose" parameter, where each string is a chunk of text extracted from the PDF. Includes error messages if verbosity is enabled. |
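How chunk_size and overlap_size interact can be shown with a simple sliding window over a string. This is a minimal sketch of overlapping character chunks, not the library's actual chunking logic (which may, for instance, treat page boundaries differently); the function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=2000, overlap_size=200):
    """Split text into chunks of up to chunk_size characters.

    Each chunk starts overlap_size characters before the previous
    chunk ended, so neighbouring chunks share that many characters.
    """
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                          # this chunk reached the end
    return chunks
```

With the defaults, a 2,500-character text yields two chunks in this sketch: characters 0 to 1,999 and characters 1,800 to 2,499.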
pdf_extract
pdf_extract(
*,
engine: typing.Literal[None, "pypdf"] = None,
connection: typing.Optional[str] = None,
max_batching_rows: int = 1,
container_cpu: typing.Union[float, int] = 2,
container_memory: str = "1Gi",
verbose: bool = False
) -> bigframes.series.Series
Extracts text from PDF URLs and saves the text as a string.
Parameters | |
---|---|
Name | Description |
engine |
'pypdf' or None, default None
The engine (bigquery or third party library) used for the function. The value must be specified. |
connection |
str or None, default None
BQ connection used for function internet transactions, and the output blob if "dst" is str. If None, uses default connection of the session. |
max_batching_rows |
int, default 1
Max number of rows per batch sent to Cloud Run to execute the function. |
container_cpu |
int or float, default 2
number of container CPUs. Possible values are [0.33, 8]. Floats larger than 1 are cast to integers. |
container_memory |
str, default "1Gi"
container memory size. String of the format
|
verbose |
bool, default False
Controls the verbosity of the output. When set to True, both error messages and the extracted content are displayed. When set to False, only the extracted content is presented and error messages are suppressed. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
str or struct[str, str], depending on the "verbose" parameter. Contains the extracted text from the PDF file. Includes error messages if verbosity is enabled. |
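Because engine defaults to None but must be specified, a call needs to name it explicitly. The sketch below assumes the accessor is reached as .blob on a bigframes Series of PDF blobs created elsewhere (this page does not state the attribute name); the helper name extract_pdf_text is hypothetical.

```python
def extract_pdf_text(pdf_series, verbose=False):
    """Extract text from a Series of PDF blobs through the blob accessor.

    pdf_series is assumed to be a bigframes Series of PDF blobs
    obtained elsewhere; this helper name is illustrative only.
    """
    return pdf_series.blob.pdf_extract(
        engine="pypdf",    # must be given explicitly; the default is None
        verbose=verbose,   # True also includes error messages
    )
```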
read_url
read_url() -> bigframes.series.Series
Retrieve the read URL of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
Read only URLs. |
size
size() -> bigframes.series.Series
Retrieve the file size of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
file size in bytes. |
updated
updated() -> bigframes.series.Series
Retrieve the updated time of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
updated time as UTC datetime. |
uri
uri() -> bigframes.series.Series
URIs of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
URIs as string. |
version
version() -> bigframes.series.Series
Versions of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
Version as string. |
write_url
write_url() -> bigframes.series.Series
Retrieve the write URL of the Blob.
Returns | |
---|---|
Type | Description |
bigframes.series.Series |
Writable URLs. |