Describe the bug
When using Pandera with nullable fields, there's a difference in behavior between Polars and Pandas validation. The Polars validation appears to drop rows with null values even when fields are explicitly marked as nullable, while Pandas validation correctly preserves these rows.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the main branch of pandera.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
import polars as pl
import pandera as pa
import pandera.polars as papl
from functools import partial

# Create a simple dataframe with null values and an invalid value
df = pl.DataFrame({
    "col1": ['1', '2', None, 'x'],
    "col2": ['valid', None, None, 'valid']
})

# Define a simple schema with nullable fields and invalid values check
invalids = ['x']
schema_field = partial(
    pa.Field,
    nullable=True,
    notin=invalids
)

class PolarsSchema(papl.DataFrameModel):
    col1: str = schema_field()
    col2: str = schema_field()

    class Config:
        drop_invalid_rows = True

class PandasSchema(pa.DataFrameModel):
    col1: str = schema_field()
    col2: str = schema_field()

    class Config:
        drop_invalid_rows = True

# Test Polars validation
print("Original DataFrame:")
print(df)
print("\nUsing Polars validation:")
print(df.pipe(PolarsSchema.validate, lazy=True))
print("\nUsing Pandas validation:")
print(
    df.to_pandas()
    .pipe(PandasSchema.validate, lazy=True)
    .pipe(pl.from_pandas)
)
Expected behavior
Both Polars and Pandas validation should handle null values the same way. Since the fields are marked as nullable=True, rows containing null values should be preserved. Only the row containing the invalid value 'x' should be dropped.
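To make the expected outcome concrete, here is a minimal pure-Python sketch of the row filtering I would expect from both backends. It does not use pandera at all; the helper `row_is_valid` and the `rows` list are hypothetical names introduced only for illustration, and the rule it encodes (a null passes because the field is nullable, any other value must not be in the invalid list) is my reading of `nullable=True` combined with `notin`, not a statement about pandera's internals.

```python
from typing import Dict, List, Optional

# Same invalid values as in the code sample above.
INVALIDS = ["x"]


def row_is_valid(row: Dict[str, Optional[str]]) -> bool:
    """Hypothetical helper: a value passes if it is null (nullable=True)
    or if it is not one of the invalid values (notin check)."""
    return all(value is None or value not in INVALIDS for value in row.values())


# The same data as the DataFrame in the code sample, one dict per row.
rows: List[Dict[str, Optional[str]]] = [
    {"col1": "1", "col2": "valid"},
    {"col1": "2", "col2": None},
    {"col1": None, "col2": None},
    {"col1": "x", "col2": "valid"},
]

# Rows I would expect to survive validation with drop_invalid_rows=True.
expected_rows = [row for row in rows if row_is_valid(row)]

for row in expected_rows:
    print(row)
```

Under this reading, the three rows without 'x' are kept, including the row where both columns are null, and only the last row is dropped.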
Desktop (please complete the following information):
Screenshots
Console Outputs:
Additional context
The behavior is consistent: Polars validation always drops the rows containing null values, while Pandas validation preserves them.