
Default column name behavior when there is no .alias() call #339

Answered by jaychia
wenleix asked this question in Q&A

Hi @wenleix, thanks for bringing this up!

Indeed, for naming we currently default to the left-hand expression's name when no alias is specified. For example, the result of col("A") + col("B") keeps "A" as its name. We chose this default because a column often keeps its semantic meaning after a correction is applied to it. For example:
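A minimal sketch of that default rule, assuming a hypothetical toy `Expr` class that tracks nothing but the output column name (this is an illustration, not Daft's actual implementation). Any binary operator returns a result named after its left operand, whether the right-hand side is another column or a literal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Expr:
    """Hypothetical toy expression: records only the resulting column name."""

    name: str

    def _binary(self, other: Union[Expr, int, float]) -> Expr:
        # Default rule: the result inherits the LEFT operand's name;
        # the right operand (column or literal) does not affect it.
        return Expr(self.name)

    __add__ = _binary
    __sub__ = _binary
    __mul__ = _binary


def col(name: str) -> Expr:
    """Hypothetical stand-in for a column reference."""
    return Expr(name)


print((col("A") + col("B")).name)  # A
print((col("B") + col("A")).name)  # B
print((col("year") + 1).name)      # year
```

Note that operand order matters under this rule: swapping the operands swaps the inherited name.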

```python
df.select(df["year"] + 1)  # perform some corrections
```

In the above example, the "year" column still semantically means a year, and can thus continue to be referred to as "year" in downstream operations.

When this default is not what you want, you can name the result explicitly with .alias(), assigning …
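A minimal sketch of how an explicit alias takes precedence over the inherited name. The `resolve_name` helper is hypothetical and only models the rule described in this thread; in Daft itself the rename is written with .alias() on the expression.

```python
from typing import Optional


def resolve_name(left_name: str, alias: Optional[str] = None) -> str:
    """Hypothetical model of the output-name rule for a binary expression.

    Without an alias, the result keeps the left operand's name;
    an explicit alias always wins.
    """
    return alias if alias is not None else left_name


# col("A") + col("B")                -> keeps "A"
print(resolve_name("A"))               # A
# (col("A") + col("B")).alias("sum") -> renamed to "sum"
print(resolve_name("A", alias="sum"))  # sum
```

The alias is checked first, so an explicit rename is never silently replaced by the inherited default.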

This discussion was converted from issue #338 on November 28, 2022 17:55.