pyarrow.dataset.FileSystemDatasetFactory
- class pyarrow.dataset.FileSystemDatasetFactory(FileSystem filesystem, paths_or_selector, FileFormat format, FileSystemFactoryOptions options=None)
Bases: DatasetFactory
Create a DatasetFactory from a list of paths with schema inspection.
- Parameters:
- filesystem : pyarrow.fs.FileSystem
  Filesystem to discover.
- paths_or_selector : pyarrow.fs.FileSelector or list of path-likes
  Either a Selector object or a list of path-like objects.
- format : FileFormat
  Currently only ParquetFileFormat and IpcFileFormat are supported.
- options : FileSystemFactoryOptions, optional
  Various flags influencing the discovery of filesystem paths.
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- finish(self, Schema schema=None): Create a Dataset using the inspected schema or an explicit schema (if given).
- inspect(self, *[, promote_options, fragments]): Inspect data fragments and return a common Schema.
- inspect_schemas(self)
Attributes
- root_partition
- finish(self, Schema schema=None)
Create a Dataset using the inspected schema or an explicit schema (if given).
- inspect(self, *, promote_options='default', fragments=None)
Inspect data fragments and return a common Schema.
- Parameters:
- promote_options : str, default "default"
  Control how to unify types. Accepts strings "default" and "permissive". Default: types must match exactly, except nulls can be merged with other types. Permissive: types are promoted when possible.
- fragments : int, default None
  How many fragments should be inspected to infer the unified schema. Use None to inspect all fragments.
- Returns:
  Schema
- inspect_schemas(self)
- root_partition