Package org.apache.spark.sql.util
Class SchemaUtils
Object
org.apache.spark.sql.util.SchemaUtils
Utils for handling schemas.
TODO: Merge this file with org.apache.spark.ml.util.SchemaUtils.
-
Nested Class Summary
Nested Classes
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method, Description
static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)
Checks if input column names have duplicate identifiers.
-
static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String, String, Object> resolver)
Checks if input column names have duplicate identifiers.
-
static void checkIndeterminateCollationInSchema
Throws an error if the given schema has indeterminate collation.
-
static void checkNoCollationsInMapKeys(DataType schema)
-
static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
Checks if an input schema has duplicate column names.
-
static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver)
Checks if an input schema has duplicate column names.
-
static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
Checks if the partitioning transforms are being duplicated or not.
-
static String escapeMetaCharacters
-
static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema)
Returns all column names in this schema as a flat list.
-
static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath> findColumnPaths(DataType dt, scala.Function1<DataType, Object> f)
For the given dataType dt, find all column paths that satisfy the given predicate f.
-
static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver)
Returns the given column's ordinal within the given schema.
-
static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)
Gets the name of the column in the given position.
-
static boolean hasIndeterminateCollation
Checks if a given data type has indeterminate collation.
-
static boolean hasNonUTF8BinaryCollation
Checks if a given data type has a non-UTF8-binary (implicit) collation type.
-
static DataType replaceCollatedStringWithString
Replaces any collated string type with non-collated StringType recursively in the given data type.
-
static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
-
Constructor Details
-
SchemaUtils
public SchemaUtils()
-
-
Method Details
-
findColumnPaths
public static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath> findColumnPaths(DataType dt, scala.Function1<DataType, Object> f)
For the given dataType dt, find all column paths that satisfy the given predicate f.
Parameters:
dt - (undocumented)
f - (undocumented)
Returns:
(undocumented)
-
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
Parameters:
schema - schema to check
caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
-
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver)
Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
Parameters:
schema - schema to check
resolver - resolver used to determine if two identifiers are equal
-
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String, String, Object> resolver)
Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
Parameters:
columnNames - column names to check
resolver - resolver used to determine if two identifiers are equal
-
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)
Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
Parameters:
columnNames - column names to check
caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
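The effect of caseSensitiveAnalysis can be illustrated with a small plain-Java sketch that has no Spark dependency. The class ColumnNameDuplicationSketch, the helper hasDuplicates, the lower-casing with Locale.ROOT, and the use of IllegalArgumentException are all assumptions made for illustration. They are not taken from Spark's implementation, which raises its own exception type.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is not Spark's SchemaUtils.
final class ColumnNameDuplicationSketch {
    private ColumnNameDuplicationSketch() {}

    // Throws if two names collide under the chosen case sensitivity.
    static void checkColumnNameDuplication(List<String> columnNames,
                                           boolean caseSensitiveAnalysis) {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            // Assumption: case-insensitive comparison is modeled by lower-casing.
            String key = caseSensitiveAnalysis ? name : name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                // Assumption: a generic exception stands in for Spark's own error.
                throw new IllegalArgumentException("Duplicate column name: " + key);
            }
        }
    }

    // Convenience wrapper that reports the outcome instead of throwing.
    static boolean hasDuplicates(List<String> columnNames, boolean caseSensitiveAnalysis) {
        try {
            checkColumnNameDuplication(columnNames, caseSensitiveAnalysis);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("id", "ID");
        System.out.println(hasDuplicates(names, true));
        System.out.println(hasDuplicates(names, false));
    }
}
```

Under these assumptions, the names "id" and "ID" pass when caseSensitiveAnalysis is true and are rejected when it is false.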
-
explodeNestedFieldNames
public static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema)
Returns all column names in this schema as a flat list. For example, a schema like:
| - a
| | - 1
| | - 2
| - b
| - c
| | - nest
|   | - 3
will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
Parameters:
schema - (undocumented)
Returns:
(undocumented)
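The flattening described for explodeNestedFieldNames amounts to a depth-first walk that emits each field before its children. The following plain-Java sketch reproduces the documented example without Spark. The class ExplodeNestedFieldNamesSketch, the use of nested LinkedHashMap instances in place of StructType, and the "leaf" placeholder value are assumptions for illustration. The sketch covers nested structs only and makes no claim about how Spark treats other nested types or special characters in names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not Spark's SchemaUtils.
final class ExplodeNestedFieldNamesSketch {
    private ExplodeNestedFieldNamesSketch() {}

    // A struct is modeled as an ordered map; a nested struct is a map-valued entry.
    static List<String> explodeNestedFieldNames(Map<String, Object> struct) {
        List<String> names = new ArrayList<>();
        collect("", struct, names);
        return names;
    }

    // Depth-first walk: add the field's dotted path, then descend into it.
    private static void collect(String prefix, Map<String, Object> struct,
                                List<String> names) {
        for (Map.Entry<String, Object> field : struct.entrySet()) {
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            names.add(path);
            if (field.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) field.getValue();
                collect(path, nested, names);
            }
        }
    }

    // Builds the schema from the documented example.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("1", "leaf");
        a.put("2", "leaf");
        Map<String, Object> nest = new LinkedHashMap<>();
        nest.put("3", "leaf");
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("nest", nest);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", a);
        root.put("b", "leaf");
        root.put("c", c);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(explodeNestedFieldNames(exampleSchema()));
    }
}
```

For the example schema, the sketch yields "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3", matching the list given in the description.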
-
checkTransformDuplication
public static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
Checks if the partitioning transforms are being duplicated or not. Throws an exception if a duplicate exists.
Parameters:
transforms - the partitioning transforms to check for duplicates
checkType - contextual information around the check, used in an exception message
isCaseSensitive - whether to be case sensitive when comparing column names
-
findColumnPosition
public static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver)
Returns the given column's ordinal within the given schema. The returned position has one entry per level of nesting of the column.
Parameters:
column - The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
schema - The current struct we are looking at.
resolver - The resolver to find the column.
Returns:
(undocumented)
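The relationship between a multi-part column name and the returned position can be sketched in plain Java as follows. The class FindColumnPositionSketch, its Field type, the BiPredicate used as the resolver, and the IllegalArgumentException for a missing column are assumptions for illustration, not Spark's API. The sketch descends through nested structs only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in for illustration only; this is not Spark's SchemaUtils.
final class FindColumnPositionSketch {
    private FindColumnPositionSketch() {}

    // Minimal model of a struct field: a name plus child fields (empty if not a struct).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Returns one ordinal per name part, descending one struct level per part.
    static List<Integer> findColumnPosition(List<String> column, List<Field> schema,
                                            BiPredicate<String, String> resolver) {
        List<Integer> position = new ArrayList<>();
        List<Field> current = schema;
        for (String part : column) {
            int ordinal = -1;
            for (int i = 0; i < current.size(); i++) {
                if (resolver.test(current.get(i).name, part)) {
                    ordinal = i;
                    break;
                }
            }
            if (ordinal < 0) {
                // Assumption: a generic exception stands in for Spark's own error.
                throw new IllegalArgumentException(
                    "Column not found: " + String.join(".", column));
            }
            position.add(ordinal);
            current = current.get(ordinal).children;
        }
        return position;
    }

    // Example schema: id, address { city, zip }
    static List<Field> exampleSchema() {
        return Arrays.asList(
            new Field("id"),
            new Field("address", new Field("city"), new Field("zip")));
    }

    public static void main(String[] args) {
        System.out.println(
            findColumnPosition(List.of("address", "zip"), exampleSchema(), String::equals));
    }
}
```

In this model, the two-part column address.zip resolves to a position of length two (ordinal 1 at the top level, then ordinal 1 inside address), while a case-insensitive resolver such as String::equalsIgnoreCase lets "ID" resolve to the top-level field id.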
-
getColumnName
public static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)
Gets the name of the column in the given position.
Parameters:
position - (undocumented)
schema - (undocumented)
Returns:
(undocumented)
-
restoreOriginalOutputNames
public static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
-
escapeMetaCharacters
Parameters:
str - The string to be escaped.
Returns:
The escaped string.
-
hasNonUTF8BinaryCollation
Checks if a given data type has a non-UTF8-binary (implicit) collation type.
Parameters:
dt - (undocumented)
Returns:
(undocumented)
-
hasIndeterminateCollation
Checks if a given data type has indeterminate collation.
-
checkIndeterminateCollationInSchema
Throws an error if the given schema has indeterminate collation.
Parameters:
schema - (undocumented)
-
checkNoCollationsInMapKeys
-
replaceCollatedStringWithString
Replaces any collated string type with non-collated StringType recursively in the given data type.
Parameters:
dt - (undocumented)
Returns:
(undocumented)
-