Rename repo and project #10

Open
MaxGhenis opened this issue Nov 21, 2018 · 3 comments
Comments

@MaxGhenis
Collaborator

For our email to SOI, we decided to change the name to Synthetic Household File (SHF?) to better communicate that this project extends beyond the PUF (incorporating nonfilers, imputing other features, potentially different record count, etc.). This may also preemptively avoid a naming conflict with the TPC project, which seems like it might be called "synthetic PUF."

Should we adopt this name generally?

Also, should we consider a term other than "household," since we're looking at tax units? For example, "Synthetic Microdata File"? Other ideas welcome.

@donboyd5
Owner

I prefer to keep it general: Synthetic Household File or Synthetic Household Policy-Analysis File. I am not sure they will always be tax units; they certainly won't always be (and aren't always now) tax-filing units. If we do state-level analysis we may be very concerned about sales taxes or benefit issues, which won't always be driven by income tax filing status.

That said, it is just a preference. I don't feel strongly.

@MaxGhenis
Collaborator Author

I agree on generality, which led me to the term "microdata," but I also don't feel strongly.

#11 got me thinking more generally about this though: SHF is really an umbrella project that would include synthesizing the PUF as well as other enhancement logic which currently lives in the taxdata and C-TAM repos. If we're just working on the PUF synthesis, maybe synpuf is actually an appropriate name, even if it's unpublicized and just lives as a piece of the SHF brand.

This raises questions around how these other enhancements play with the synthetic PUF (ideally as well as the real one, if Approach A in #11 is adopted), and what the real-PUF version of SHF would be called.

One potential end state would be to create two libraries:

  • synpuf, which has one key function, synthesize(), taking the raw PUF file and producing the synthetic one.
  • taxdata, which has one key function, enhance(), taking either the raw or synthetic PUF and producing the file used for Tax-Calculator. The file built from the synthetic PUF could be called SHF; the name for the one built from the real PUF is TBD.
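
A rough sketch of how that split might look, purely for illustration. Neither function exists with these signatures in synpuf or taxdata today; the record type, the column names (`wages`, `interest`), the `seed` parameter, and the placeholder logic inside each function are all assumptions, not a proposed method.

```python
# Hypothetical sketch only: these signatures and bodies are assumptions,
# not the actual synpuf or taxdata APIs.
import random
from typing import Dict, List

Record = Dict[str, float]


def synthesize(raw_puf: List[Record], seed: int = 0) -> List[Record]:
    """Stand-in for synpuf.synthesize(): raw PUF in, synthetic file out.

    Placeholder logic: resample each column independently, with
    replacement. A real method would aim to preserve joint distributions.
    """
    rng = random.Random(seed)
    columns = list(raw_puf[0].keys())
    return [
        {col: rng.choice(raw_puf)[col] for col in columns}
        for _ in range(len(raw_puf))
    ]


def enhance(puf: List[Record]) -> List[Record]:
    """Stand-in for taxdata.enhance(): accepts the raw or synthetic PUF.

    Placeholder logic: adds a hypothetical 'enhanced' flag in place of
    the real enhancement steps (nonfilers, imputations, and so on).
    """
    return [{**rec, "enhanced": 1.0} for rec in puf]


# Toy records with made-up column names and values.
raw = [
    {"wages": 50000.0, "interest": 120.0},
    {"wages": 82000.0, "interest": 0.0},
]

shf = enhance(synthesize(raw))  # synthetic path (the SHF)
real = enhance(raw)             # real-PUF path (name TBD)
```

The point of the sketch is only that enhance() does not need to know which kind of PUF it received, so both paths share one code path downstream of synthesis.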

All of which is to say, I'm not sure we should rush to change this right now.

@MattHJensen @andersonfrailey

@donboyd5
Owner

The two-function approach @MaxGhenis describes seems very clean to me. Even if, in practice, there is human intervention each time we run either of the steps (examining results, etc.), it is a nice separation that keeps the projects distinct and working well with each other. Curious for others' thoughts on this and on issue #11.


2 participants