Object

org.apache.spark.sql.util.SchemaUtils

public class SchemaUtils extends Object

Utils for handling schemas.

TODO: Merge this file with org.apache.spark.ml.util.SchemaUtils.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

SchemaUtils.ColumnPath$
Constructor Summary

Constructors

Constructor

Description

SchemaUtils()
Method Summary

Modifier and Type

Method

Description

static void

checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)

Checks if input column names have duplicate identifiers.

static void

checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String,String,Object> resolver)

Checks if input column names have duplicate identifiers.

static void

checkIndeterminateCollationInSchema(StructType schema)

Throws an error if the given schema has indeterminate collation.

static void

checkNoCollationsInMapKeys(DataType schema)

static void

checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)

Checks if an input schema has duplicate column names.

static void

checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String,String,Object> resolver)

Checks if an input schema has duplicate column names.

static void

checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)

Checks whether any partitioning transform is duplicated.

static String

escapeMetaCharacters(String str)

static scala.collection.immutable.Seq<String>

explodeNestedFieldNames(StructType schema)

Returns all column names in this schema as a flat list.

static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath>

findColumnPaths(DataType dt, scala.Function1<DataType,Object> f)

For the given data type dt, finds all column paths that satisfy the given predicate f.

static scala.collection.immutable.Seq<Object>

findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String,String,Object> resolver)

Returns the given column's ordinal within the given schema.

static scala.collection.immutable.Seq<String>

getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)

Gets the name of the column in the given position.

static boolean

hasIndeterminateCollation(DataType dt)

Checks if a given data type has indeterminate collation.

static boolean

hasNonUTF8BinaryCollation(DataType dt)

Checks if a given data type has a collation other than UTF8_BINARY (the implicit collation).

static DataType

replaceCollatedStringWithString(DataType dt)

Recursively replaces any collated string type in the given data type with the non-collated StringType.

static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression>

restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- SchemaUtils
  
  public SchemaUtils()
Method Details
- findColumnPaths
  
  public static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath> findColumnPaths(DataType dt, scala.Function1<DataType,Object> f)
  
  For the given data type dt, finds all column paths that satisfy the given predicate f.
  
  Parameters:
  
  dt - the data type to search
  
  f - the predicate that a column must satisfy
  
  Returns:
  
  the column paths that satisfy f
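The search described above can be pictured as a depth-first walk that tracks the current path of field names. The following is a minimal sketch in plain Java, not Spark's implementation: the `Type` class, `findPaths`, and the use of a list of name parts in place of `ColumnPath` are all hypothetical stand-ins chosen so the example runs without Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ColumnPathSearch {
    /** Hypothetical type model: a label plus, for struct-like types, named child fields. */
    static final class Type {
        final String label;
        final List<String> fieldNames = new ArrayList<>();
        final List<Type> fieldTypes = new ArrayList<>();

        Type(String label) {
            this.label = label;
        }

        /** Adds a named child field and returns this type to allow chaining. */
        Type field(String name, Type type) {
            fieldNames.add(name);
            fieldTypes.add(type);
            return this;
        }
    }

    /** Returns the path (as name parts) of every type under dt that satisfies f. */
    static List<List<String>> findPaths(Type dt, Predicate<Type> f) {
        List<List<String>> out = new ArrayList<>();
        walk(dt, new ArrayList<>(), f, out);
        return out;
    }

    private static void walk(Type t, List<String> path, Predicate<Type> f, List<List<String>> out) {
        if (f.test(t)) {
            out.add(List.copyOf(path)); // snapshot, since path is mutated below
        }
        for (int i = 0; i < t.fieldTypes.size(); i++) {
            path.add(t.fieldNames.get(i));
            walk(t.fieldTypes.get(i), path, f, out);
            path.remove(path.size() - 1); // backtrack before visiting the next sibling
        }
    }
}
```

In this sketch a match on the root type itself yields an empty path; whether the real method behaves the same way is not stated in this page.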
- checkSchemaColumnNameDuplication
  
  public static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
  
  Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
  
  Parameters:
  
  schema - schema to check
  
  caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
- checkSchemaColumnNameDuplication
  
  public static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String,String,Object> resolver)
  
  Checks if an input schema has duplicate column names. Throws an exception if a duplicate exists.
  
  Parameters:
  
  schema - schema to check
  
  resolver - resolver used to determine if two identifiers are equal
- checkColumnNameDuplication
  
  public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String,String,Object> resolver)
  
  Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
  
  Parameters:
  
  columnNames - column names to check
  
  resolver - resolver used to determine if two identifiers are equal
- checkColumnNameDuplication
  
  public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)
  
  Checks if input column names have duplicate identifiers. Throws an exception if a duplicate exists.
  
  Parameters:
  
  columnNames - column names to check
  
  caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
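To make the contract of the boolean overload concrete, here is a minimal sketch in plain Java. It is not Spark's implementation: `DuplicateNameCheck`, `findDuplicates`, and `check` are hypothetical names, the exception type is an assumption, and case-insensitive comparison is approximated by lower-casing each name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the documented check; names and exception type are assumptions.
final class DuplicateNameCheck {
    /** Returns the (normalized) names that occur more than once, in first-seen order. */
    static List<String> findDuplicates(List<String> columnNames, boolean caseSensitive) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : columnNames) {
            // Case-insensitive analysis compares names after lower-casing them.
            String key = caseSensitive ? name : name.toLowerCase(Locale.ROOT);
            counts.merge(key, 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    /** Mirrors the documented behavior: throw when any duplicate exists. */
    static void check(List<String> columnNames, boolean caseSensitive) {
        List<String> duplicates = findDuplicates(columnNames, caseSensitive);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Duplicate column name(s): " + duplicates);
        }
    }
}
```

With the names `id`, `ID`, and `name`, this sketch reports a duplicate only when the check is case insensitive.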
- explodeNestedFieldNames
  
  public static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema)
  
  Returns all column names in this schema as a flat list. For example, a schema like:
  
    | - a
    | | - 1
    | | - 2
    | - b
    | - c
    | | - nest
    |   | - 3
  
  will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
  
  Parameters:
  
  schema - the schema whose column names are flattened
  
  Returns:
  
  the flattened column names, with nested names joined by dots
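The flattening in the example above amounts to a pre-order traversal that joins each field name to its parent's path with a dot. The following is a minimal sketch in plain Java under that assumption: `Field` and `explode` are hypothetical stand-ins for `StructType` and the real method, and any handling the real method may apply to arrays, maps, or names that themselves contain dots is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenFieldNames {
    /** Hypothetical minimal field model: a name plus optional nested fields. */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Returns every field path in pre-order, parents before their children. */
    static List<String> explode(List<Field> fields) {
        List<String> out = new ArrayList<>();
        collect("", fields, out);
        return out;
    }

    private static void collect(String prefix, List<Field> fields, List<String> out) {
        for (Field f : fields) {
            String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
            out.add(path);                 // the field itself
            collect(path, f.children, out); // then everything nested under it
        }
    }
}
```

Applied to the schema from the example (a with children 1 and 2, b, and c with child nest containing 3), the sketch yields the same seven names in the same order.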
- checkTransformDuplication
  
  public static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
  
  Checks whether any partitioning transform is duplicated. Throws an exception if a duplicate exists.
  
  Parameters:
  
  transforms - the partitioning transforms to check for duplicates
  
  checkType - contextual information around the check, used in an exception message
  
  isCaseSensitive - Whether to be case sensitive when comparing column names
- findColumnPosition
  
  public static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String,String,Object> resolver)
  
  Returns the given column's ordinal within the given schema. The returned position contains one ordinal per nesting level of the column.
  
  Parameters:
  
  column - The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
  
  schema - The current struct we are looking at.
  
  resolver - The resolver to find the column.
  
  Returns:
  
  the ordinal of the column at each nesting level
- getColumnName
  
  public static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)
  
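  Gets the name of the column in the given position.
  
  Parameters:
  
  position - the ordinals that locate the column, one per nesting level
  
  schema - the schema in which to look up the column
  
  Returns:
  
  the name parts of the column, one per nesting level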
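findColumnPosition and getColumnName read as inverse lookups: one maps name parts to ordinals, the other maps ordinals back to name parts. The sketch below illustrates that relationship in plain Java. It is not Spark's implementation: `Field`, `findPosition`, and `columnName` are hypothetical, the resolver is modeled as a `BiPredicate`, the exception type is an assumption, and only nested structs are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class ColumnPositions {
    /** Hypothetical minimal field model: a name plus optional nested fields. */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Consumes one name part per nesting level and records the ordinal matched at each level. */
    static List<Integer> findPosition(List<String> column, List<Field> schema,
                                      BiPredicate<String, String> resolver) {
        List<Integer> position = new ArrayList<>();
        List<Field> level = schema;
        for (String part : column) {
            int ordinal = -1;
            for (int i = 0; i < level.size(); i++) {
                if (resolver.test(level.get(i).name, part)) {
                    ordinal = i;
                    break;
                }
            }
            if (ordinal < 0) {
                throw new IllegalArgumentException("Cannot find column part: " + part);
            }
            position.add(ordinal);
            level = level.get(ordinal).children; // descend into the matched field
        }
        return position;
    }

    /** Follows the ordinals back down the schema and collects the field names. */
    static List<String> columnName(List<Integer> position, List<Field> schema) {
        List<String> name = new ArrayList<>();
        List<Field> level = schema;
        for (int ordinal : position) {
            Field f = level.get(ordinal);
            name.add(f.name);
            level = f.children;
        }
        return name;
    }
}
```

Passing `String::equalsIgnoreCase` as the resolver gives a case-insensitive lookup, and the names returned by `columnName` use the spelling stored in the schema rather than the spelling that was searched for.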
- restoreOriginalOutputNames
  
  public static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
- escapeMetaCharacters
  
  public static String escapeMetaCharacters(String str)
  
  Parameters:
  
  str - The string to be escaped.
  
  Returns:
  
  The escaped string.
- hasNonUTF8BinaryCollation
  
  public static boolean hasNonUTF8BinaryCollation(DataType dt)
  
  Checks if a given data type has a collation other than UTF8_BINARY (the implicit collation).
  
  Parameters:
  
  dt - (undocumented)
  
  Returns:
  
  (undocumented)
- hasIndeterminateCollation
  
  public static boolean hasIndeterminateCollation(DataType dt)
  
  Checks if a given data type has indeterminate collation.
- checkIndeterminateCollationInSchema
  
  public static void checkIndeterminateCollationInSchema(StructType schema)
  
  Throws an error if the given schema has indeterminate collation.
  
  Parameters:
  
  schema - (undocumented)
- checkNoCollationsInMapKeys
  
  public static void checkNoCollationsInMapKeys(DataType schema)
- replaceCollatedStringWithString
  
  public static DataType replaceCollatedStringWithString(DataType dt)
  
  Recursively replaces any collated string type in the given data type with the non-collated StringType.
  
  Parameters:
  
  dt - (undocumented)
  
  Returns:
  
  (undocumented)
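The collation helpers above share one shape: recurse through a data type and either test or rewrite each string type found. The following plain-Java sketch illustrates that recursion and is not Spark's implementation. The `Type` model, `hasNonDefaultCollation`, and `stripCollations` are hypothetical, and only struct-like nesting is modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class CollationStripping {
    static final String DEFAULT_COLLATION = "UTF8_BINARY";

    /** Hypothetical type model: string types carry a collation, struct-like types carry children. */
    static final class Type {
        final String collation;    // non-null only for string types
        final List<Type> children; // non-empty only for struct-like types

        private Type(String collation, List<Type> children) {
            this.collation = collation;
            this.children = children;
        }

        static Type string(String collation) {
            return new Type(collation, List.of());
        }

        static Type struct(Type... children) {
            return new Type(null, List.of(children));
        }
    }

    /** True if any string type under t uses a collation other than the default. */
    static boolean hasNonDefaultCollation(Type t) {
        if (t.collation != null) {
            return !t.collation.equals(DEFAULT_COLLATION);
        }
        for (Type child : t.children) {
            if (hasNonDefaultCollation(child)) {
                return true;
            }
        }
        return false;
    }

    /** Rebuilds t with every string type replaced by a default-collation string type. */
    static Type stripCollations(Type t) {
        if (t.collation != null) {
            return Type.string(DEFAULT_COLLATION);
        }
        List<Type> rebuilt = new ArrayList<>();
        for (Type child : t.children) {
            rebuilt.add(stripCollations(child));
        }
        return new Type(null, List.copyOf(rebuilt));
    }
}
```

The rewrite returns a new structure instead of mutating its input, so the original type can still be inspected after its collations have been stripped from the copy.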
