Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Should DataFrame.drop() accept a set for the columns , index and labels argument? #59890

Open
1 task done
Dr-Irv opened this issue Sep 25, 2024 · 4 comments
Open
1 task done
Labels
Docs Needs Discussion Requires discussion from core team before further action

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Sep 25, 2024

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

Documentation problem

The arguments labels, index and columns are documented as "single label or list-like". But a set is accepted:

import pandas as pd

df = pd.DataFrame({1: [2], 3: [4]})   # Fix is here
df = df.drop(columns={1})

The pandas source declaration in the typing declarations does not allow a set to be passed.

Suggested fix for documentation

Unclear.

Either we update the docs to say a set is allowed (and update the internal types), OR we add a check to see if a set is passed and raise an exception.

First raised as a pandas-stubs issue in pandas-dev/pandas-stubs#1008

@Dr-Irv Dr-Irv added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 25, 2024
@rhshadrach
Copy link
Member

In general, as long as the order is not important for the result, I'm positive on allowing sets. But not strongly so.

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 30, 2024
@AnuraagaNath
Copy link

The original method given by drop via pandas package mentions of the following internal type formats.

(method)
def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: Literal[True],
    errors: IgnoreRaise = ...
) -> None: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: Literal[False] = ...,
    errors: IgnoreRaise = ...
) -> DataFrame: ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any] = ...,
    columns: Hashable | Sequence[Hashable] | Index[Any],
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

def drop(
    labels: None = ...,
    *,
    axis: Axis = ...,
    index: Hashable | Sequence[Hashable] | Index[Any],
    columns: Hashable | Sequence[Hashable] | Index[Any] = ...,
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

def drop(
    labels: Hashable | Sequence[Hashable] | Index[Any],
    *,
    axis: Axis = ...,
    index: None = ...,
    columns: None = ...,
    level: Level | None = ...,
    inplace: _bool = ...,
    errors: IgnoreRaise = ...
) -> (DataFrame | None): ...

drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Oct 1, 2024

drop function supports this many combination of types during operations. All the functions are same. Documentation is less descriptive about this thing.

yes and no. The type declarations for any of index, columns and labels are all allowing any of these:

  • A single Hashable
  • A sequence of Hashable
  • An Index

The docs currently say "single label or list-like" for those 3. A set is not "list-like", nor would a set pass those type declarations.

But at runtime, a set is supported. So the question is whether we change the type declarations and the docs, OR do we disallow a set at runtime (since the type declarations already disallow it).

@cmp0xff
Copy link

cmp0xff commented Nov 15, 2024

In the specific case of labels in drop, I think using a set or, in general, collections.abc.Collection is fine, since the result should not depend on the order of the labels. Maybe even collections.abc.Iterable should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants