TST(string dtype): Resolve replace xfails #60659
Changes from 2 commits
@@ -334,7 +334,6 @@ def test_regex_replace_str_to_numeric(self, mix_abc):
         return_value = res3.replace(regex=r"\s*\.\s*", value=0, inplace=True)
         assert return_value is None
         expec = DataFrame({"a": mix_abc["a"], "b": ["a", "b", 0, 0], "c": mix_abc["c"]})
-        # TODO(infer_string)
         expec["c"] = expec["c"].astype(object)
         tm.assert_frame_equal(res, expec)
         tm.assert_frame_equal(res2, expec)

@@ -1469,20 +1468,23 @@ def test_regex_replace_scalar(
         tm.assert_frame_equal(result, expected)

     @pytest.mark.parametrize("regex", [False, True])
-    def test_replace_regex_dtype_frame(self, regex):
+    @pytest.mark.parametrize("value", [1, "1"])
+    def test_replace_regex_dtype_frame(self, regex, value):
         # GH-48644
         df1 = DataFrame({"A": ["0"], "B": ["0"]})
-        expected_df1 = DataFrame({"A": [1], "B": [1]}, dtype=object)
-        result_df1 = df1.replace(to_replace="0", value=1, regex=regex)
+        # When value is an integer, coerce result to object.
+        # When value is a string, infer the correct string dtype.
+        dtype = object if value == 1 else None
+
+        expected_df1 = DataFrame({"A": [value], "B": [value]}, dtype=dtype)
+        result_df1 = df1.replace(to_replace="0", value=value, regex=regex)
         tm.assert_frame_equal(result_df1, expected_df1)

         df2 = DataFrame({"A": ["0"], "B": ["1"]})
         if regex:
-            # TODO(infer_string): both string columns get cast to object,
-            # while only needed for column A
-            expected_df2 = DataFrame({"A": [1], "B": ["1"]}, dtype=object)

Comment on lines -1481 to -1483:
I think this behavior was correct - we get object dtype here because we are trying to replace string values with integer values. If we were to make the result a string dtype, then that would be introducing value-specific behavior.

+            expected_df2 = DataFrame({"A": [1], "B": ["1"]}, dtype=dtype)
         else:
-            expected_df2 = DataFrame({"A": Series([1], dtype=object), "B": ["1"]})

Comment:
This behavior looks incorrect to me,

+            expected_df2 = DataFrame({"A": Series([1], dtype=dtype), "B": ["1"]})
         result_df2 = df2.replace(to_replace="0", value=1, regex=regex)
         tm.assert_frame_equal(result_df2, expected_df2)

Are we sure we want to coerce to string instead of raising? The object case makes sense; I'm just not as sure on the string side whether we should be implicitly casting like that.
Not sure I understand. When `infer_string=True`, the input DataFrame is `str` dtype. Then when we go to replace `"0"` with `value="1"`, certainly we want the result to still be `str` dtype, no?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think that makes sense, but I'm not as sure when the target value is a non-string, i.e.
replace(..., value=1)
When the target value is a non-string, we coerce to object dtype in order to hold both integers and strings. What are you not sure about?
Sorry I was just misreading the comment - I think this is good
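
For readers following the thread, the two cases discussed above (a string replacement value versus an integer one) can be reproduced with a small standalone sketch. It is not part of the PR; the variable names are illustrative. The dtypes it prints are an assumption that depends on the pandas version and on whether the `future.infer_string` option is enabled, so no expected output is shown.

```python
import pandas as pd

# Same input as df1 in test_replace_regex_dtype_frame.
df = pd.DataFrame({"A": ["0"], "B": ["0"]})

# Case 1: string replaced by a string. Every value is still a string,
# so the columns can keep the dtype they started with.
str_result = df.replace(to_replace="0", value="1")

# Case 2: string replaced by an integer. The columns now hold a
# non-string value; the thread above describes coercing to object dtype.
int_result = df.replace(to_replace="0", value=1)

# Print the dtypes rather than hard-coding them, since they vary by
# pandas version and by the string-inference setting.
print("input:         ", df.dtypes.to_dict())
print("value='1':     ", str_result.dtypes.to_dict())
print("value=1:       ", int_result.dtypes.to_dict())
```

Running this with and without `pd.set_option("future.infer_string", True)` is one way to see the difference the reviewers are discussing.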