Column
-
pyspark_util.column.null_ratio(col_name)
Return the null ratio of the given column, i.e. the fraction of rows whose value is NULL.
- Parameters
col_name (str) – column name
- Returns
Null ratio.
- Return type
column
Examples
>>> df = spark.createDataFrame([
...     (1,),
...     (2,),
...     (None,),
...     (None,),
... ], ['x'])
>>> df.select(psu.null_ratio('x')).show()
+---+
|  x|
+---+
|0.5|
+---+
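To make the arithmetic explicit without a Spark session, here is a minimal plain-Python sketch of what the example above computes. It assumes a column is modeled as a Python list with None standing in for NULL. The name `null_ratio_py` is hypothetical, this is not pyspark_util's implementation, and returning None for an empty input is likewise an assumption.

```python
from typing import Any, Optional, Sequence


def null_ratio_py(values: Sequence[Optional[Any]]) -> Optional[float]:
    """Hypothetical helper: fraction of entries that are None (SQL NULL)."""
    if len(values) == 0:
        # Assumption: the ratio is undefined for an empty column.
        return None
    null_count = sum(1 for v in values if v is None)
    return null_count / len(values)


# Same data as the doctest above: 2 NULLs out of 4 rows.
print(null_ratio_py([1, 2, None, None]))  # 0.5
```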
-
pyspark_util.column.blank_ratio(col_name, include_null=False)
Return the blank ratio of the given column, i.e. the fraction of values that are blank.
- Parameters
col_name (str) – column name
include_null (bool, default False) – If True, NULL rows are included in the denominator, so the ratio is computed over all rows. If False, it is computed over non-null rows only.
- Returns
Blank ratio.
- Return type
column
Examples
By default, NULL rows are ignored, so the ratio below is 2 blank values out of 4 non-null rows.
>>> df = spark.createDataFrame([
...     ('a',),
...     ('b',),
...     ('',),
...     ('',),
...     (None,),
... ], ['x'])
>>> df.select(psu.blank_ratio('x')).show()
+---+
|  x|
+---+
|0.5|
+---+
With include_null=True, NULL rows are included in the calculation, so the ratio below is 2 blank values out of 5 rows.
>>> df = spark.createDataFrame([
...     ('a',),
...     ('b',),
...     ('',),
...     ('',),
...     (None,),
... ], ['x'])
>>> df.select(psu.blank_ratio('x', include_null=True)).show()
+---+
|  x|
+---+
|0.4|
+---+
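The two denominators in the examples above (non-null rows by default, all rows with include_null=True) can be made explicit in a plain-Python sketch. It assumes a column is a list of strings with None for NULL, and that "blank" means exactly the empty string ''; whether pyspark_util also counts whitespace-only strings as blank is not stated here. `blank_ratio_py` is a hypothetical name, not the library's code.

```python
from typing import Optional, Sequence


def blank_ratio_py(
    values: Sequence[Optional[str]], include_null: bool = False
) -> Optional[float]:
    """Hypothetical helper mirroring the documented blank_ratio examples."""
    # Assumption: only the empty string counts as blank.
    blank_count = sum(1 for v in values if v == "")
    if include_null:
        # NULL rows count toward the total but never as blank.
        total = len(values)
    else:
        # Default: NULL rows are dropped from the denominator.
        total = sum(1 for v in values if v is not None)
    if total == 0:
        # Assumption: undefined ratio when there is nothing to divide by.
        return None
    return blank_count / total


data = ["a", "b", "", "", None]
print(blank_ratio_py(data))                     # 0.5  (2 / 4)
print(blank_ratio_py(data, include_null=True))  # 0.4  (2 / 5)
```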
-
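pyspark_util.column.is_unique(col_name)
Return True if the given column contains no duplicate values. As the examples show, NULL is compared like any other value: a single NULL does not break uniqueness, but two or more NULLs do.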
- Parameters
col_name (str) – column name
- Returns
Whether the column is unique.
- Return type
column
Examples
>>> df = spark.createDataFrame([(1,), (2,), (3,)], ['x'])
>>> df.select(psu.is_unique('x')).show()
+----+
|   x|
+----+
|true|
+----+
>>> df = spark.createDataFrame([(1,), (2,), (2,)], ['x'])
>>> df.select(psu.is_unique('x')).show()
+-----+
|    x|
+-----+
|false|
+-----+
>>> df = spark.createDataFrame([(1,), (2,), (3,), (None,)], ['x'])
>>> df.select(psu.is_unique('x')).show()
+----+
|   x|
+----+
|true|
+----+
>>> df = spark.createDataFrame([(1,), (2,), (3,), (None,), (None,)], ['x'])
>>> df.select(psu.is_unique('x')).show()
+-----+
|    x|
+-----+
|false|
+-----+
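The four examples above are consistent with a simple rule: the column is unique when its number of rows equals its number of distinct values, with NULL counted as one distinct value. The plain-Python sketch below encodes that rule, assuming a list with None for NULL and hashable values. `is_unique_py` is hypothetical and may differ from how pyspark_util computes the result in Spark.

```python
from typing import Hashable, Optional, Sequence


def is_unique_py(values: Sequence[Optional[Hashable]]) -> bool:
    """Hypothetical helper: True if no value (including None) repeats."""
    # set() treats None as an ordinary element, so two Nones collapse into one
    # and the lengths differ, which matches the last example above.
    return len(values) == len(set(values))


print(is_unique_py([1, 2, 3]))              # True
print(is_unique_py([1, 2, 2]))              # False
print(is_unique_py([1, 2, 3, None]))        # True
print(is_unique_py([1, 2, 3, None, None]))  # False
```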
-
pyspark_util.column.contains(col_name, pat)
Test whether a pattern or regular expression is contained within each string of the given column.
- Parameters
col_name (str) – column name
pat (str) – character sequence or regular expression.
- Returns
Column of boolean values indicating whether the given pattern is contained within each element; NULL elements yield NULL.
- Return type
column
Examples
>>> df = spark.createDataFrame([('abc',), ('123',), (None,)], ['x'])
>>> df.select(psu.contains('x', 'abc')).show()
+-----+
|    x|
+-----+
| true|
|false|
| null|
+-----+
>>> df = spark.createDataFrame([('abc',), ('123',), (None,)], ['x'])
>>> df.select(psu.contains('x', r'[a-z]+')).show()
+-----+
|    x|
+-----+
| true|
|false|
| null|
+-----+
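Both examples above behave like a regular-expression search applied to each non-null string, with NULL passed through as NULL. The sketch below mimics that with Python's `re` module. This is an assumption for illustration: if pyspark_util delegates to Spark's regex functions, matching follows Java's regex engine, whose syntax differs from Python's in some corner cases. `contains_py` is a hypothetical name.

```python
import re
from typing import List, Optional, Sequence


def contains_py(values: Sequence[Optional[str]], pat: str) -> List[Optional[bool]]:
    """Hypothetical helper: per-element regex search, NULL in -> NULL out."""
    regex = re.compile(pat)  # a plain string like 'abc' is also a valid regex
    return [None if v is None else regex.search(v) is not None for v in values]


data = ["abc", "123", None]
print(contains_py(data, "abc"))      # [True, False, None]
print(contains_py(data, r"[a-z]+"))  # [True, False, None]
```

Because this sketch treats every pattern as a regular expression, a literal such as 'a.c' would also match 'abc'; passing `re.escape(pat)` instead would force a literal match.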