flytekitplugins.spark.schema
Classes
flytekitplugins.spark.schema.SparkDataFrameSchemaReader
Implements how a Spark DataFrame should be read using the open method of FlyteSchema.
class SparkDataFrameSchemaReader(
    from_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter | Type
--------- | ----
from_path | str
cols | typing.Optional[typing.Dict[str, type]]
fmt | SchemaFormat
Methods
all()
def all(
    **kwargs,
) -> pyspark.sql.dataframe.DataFrame
Parameter | Type
--------- | ----
kwargs | **kwargs
iter()
def iter(
    **kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
Parameter | Type
--------- | ----
kwargs | **kwargs
Properties
Property | Type | Description
-------- | ---- | -----------
column_names | |
from_path | |
flytekitplugins.spark.schema.SparkDataFrameSchemaWriter
Implements how a Spark DataFrame should be written using the open method of FlyteSchema.
class SparkDataFrameSchemaWriter(
    to_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter | Type
--------- | ----
to_path | str
cols | typing.Optional[typing.Dict[str, type]]
fmt | SchemaFormat
Methods
write()
def write(
    dfs: pyspark.sql.dataframe.DataFrame,
    **kwargs,
)
Parameter | Type
--------- | ----
dfs | pyspark.sql.dataframe.DataFrame
kwargs | **kwargs
Properties
Property | Type | Description
-------- | ---- | -----------
column_names | |
to_path | |
flytekitplugins.spark.schema.SparkDataFrameTransformer
Transforms Spark DataFrames to and from a Schema (typed/untyped).
class SparkDataFrameTransformer()
Methods
assert_type()
def assert_type(
    t: Type[T],
    v: T,
)
Parameter | Type
--------- | ----
t | Type[T]
v | T
from_binary_idl()
def from_binary_idl(
    binary_idl_object: Binary,
    expected_python_type: Type[T],
) -> Optional[T]
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
For untyped dict, dataclass, and Pydantic BaseModel:
Life cycle (untyped dict as an example):
python val -> msgpack bytes -> binary literal scalar -> msgpack bytes -> python val
(to_literal) (from_binary_idl)
For attribute access:
Life cycle:
python val -> msgpack bytes -> binary literal scalar -> resolved golang value -> binary literal scalar -> msgpack bytes -> python val
(to_literal) (propeller attribute access) (from_binary_idl)
Parameter | Type
--------- | ----
binary_idl_object | Binary
expected_python_type | Type[T]
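The life cycle above can be mimicked with the standard library: a Python value is serialized to bytes, wrapped in a binary scalar that records its serialization format, and deserialized back on the other side. This sketch uses json in place of msgpack purely because it is in the stdlib, and the BinaryScalar class is a hypothetical stand-in for the Binary IDL object, not flytekit's type:

```python
import json
from dataclasses import dataclass


@dataclass
class BinaryScalar:
    """Toy stand-in for the Binary IDL object: raw bytes plus a format tag."""
    value: bytes
    tag: str


def to_literal(python_val) -> BinaryScalar:
    # python val -> bytes (msgpack in flytekit; json here) -> binary literal scalar
    return BinaryScalar(value=json.dumps(python_val).encode(), tag="json")


def from_binary_idl(scalar: BinaryScalar):
    # binary literal scalar -> bytes -> python val
    if scalar.tag != "json":
        raise ValueError(f"unsupported serialization format: {scalar.tag}")
    return json.loads(scalar.value)
```

The tag check mirrors why the real implementation dispatches on the Binary object's serialization format before deserializing.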
from_generic_idl()
def from_generic_idl(
    generic: Struct,
    expected_python_type: Type[T],
) -> Optional[T]
TODO: Support all Flyte Types.
This is for dataclass attribute access on input created from the Flyte Console.
Note: this can be removed in the future, once the Flyte Console supports generating Binary IDL Scalars as input.
Parameter | Type
--------- | ----
generic | Struct
expected_python_type | Type[T]
get_literal_type()
def get_literal_type(
    t: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
Converts the python type to a Flyte LiteralType
Parameter | Type
--------- | ----
t | typing.Type[pyspark.sql.dataframe.DataFrame]
guess_python_type()
def guess_python_type(
    literal_type: LiteralType,
) -> Type[T]
Converts the Flyte LiteralType to a python object type.
Parameter | Type
--------- | ----
literal_type | LiteralType
isinstance_generic()
def isinstance_generic(
    obj,
    generic_alias,
)
Parameter | Type
--------- | ----
obj |
generic_alias |
to_html()
def to_html(
    ctx: FlyteContext,
    python_val: T,
    expected_python_type: Type[T],
) -> str
Converts any Python value (dataframe, int, float) to an HTML string, which is wrapped in an HTML div.
Parameter | Type
--------- | ----
ctx | FlyteContext
python_val | T
expected_python_type | Type[T]
to_literal()
def to_literal(
    ctx: flytekit.core.context_manager.FlyteContext,
    python_val: pyspark.sql.dataframe.DataFrame,
    python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
    expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type. Implementers should refrain from using type(python_val) and instead rely on the passed-in python_type. If these do not match (or are not allowed), the Transformer implementer should raise an AssertionError, clearly stating the mismatch.
Parameter | Type
--------- | ----
ctx | flytekit.core.context_manager.FlyteContext
python_val | pyspark.sql.dataframe.DataFrame
python_type | typing.Type[pyspark.sql.dataframe.DataFrame]
expected | flytekit.models.types.LiteralType
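A Spark DataFrame is too large to embed in a literal, so to_literal writes the frame out to storage and returns a schema literal that references it, and to_python_value dereferences that literal to rebuild the value. A stdlib sketch of that offload-and-reference pattern follows; the class, its methods' exact shapes, and the path-as-literal simplification are all illustrative, not flytekit's actual API:

```python
import csv
import os
import tempfile
import typing


class ToyFrameTransformer:
    """Illustrative offload-and-reference transformer: the 'literal' here is
    just a path to where the data was written, not the data itself."""

    def __init__(self, base_dir: typing.Optional[str] = None):
        self.base_dir = base_dir or tempfile.mkdtemp()
        self._counter = 0

    def to_literal(self, python_val: typing.List[dict]) -> str:
        # Validate against the declared type, not type(python_val) heuristics,
        # raising AssertionError on a mismatch as the docstring requires.
        if not isinstance(python_val, list):
            raise AssertionError(
                f"expected a list of rows, got {type(python_val).__name__}")
        path = os.path.join(self.base_dir, f"frame-{self._counter}.csv")
        self._counter += 1
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=list(python_val[0]))
            w.writeheader()
            w.writerows(python_val)
        # Persist the frame, hand back only a reference to it.
        return path

    def to_python_value(self, literal: str) -> typing.List[dict]:
        # Dereference the literal and rebuild the Python value.
        with open(literal, newline="") as f:
            return list(csv.DictReader(f))
```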
to_python_value()
def to_python_value(
    ctx: flytekit.core.context_manager.FlyteContext,
    lv: flytekit.models.literals.Literal,
    expected_python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> ~T
Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.
Parameter | Type
--------- | ----
ctx | flytekit.core.context_manager.FlyteContext
lv | flytekit.models.literals.Literal
expected_python_type | typing.Type[pyspark.sql.dataframe.DataFrame]
Properties
Property | Type | Description
-------- | ---- | -----------
is_async | |
name | |
python_type | | Returns the python type.
type_assertions_enabled | | Indicates whether the transformer wants type assertions enabled at the core type-engine layer.