add read support for dta formats 120 and 121. closes #85 #86

Merged
merged 4 commits into from
Aug 9, 2024

Conversation

JanMarvin
Collaborator

@JanMarvin JanMarvin commented May 8, 2023

Untested changes to support reading from dta formats 120 and 121.

The alias format is ignored when reading, and a warning is returned. Write support seems possible, but is not useful at the moment.

@JanMarvin JanMarvin force-pushed the stata18 branch 3 times, most recently from 5ee3e44 to 8f15616 Compare August 4, 2024 17:13
@JanMarvin JanMarvin changed the base branch from main to testing August 4, 2024 17:14
@JanMarvin
Collaborator Author

This adds support for dta format 120 and adds a .dtas file created with Stata 18. Handling for the .dtas file type is not implemented yet, but the test uses a workaround. A .dtas file is a zip archive that contains multiple dta files (so-called frames), which can be linked together. In the example, the column income is merged from the second data file.
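Since a .dtas file is described above as a zip archive of dta files, one possible workaround can be sketched as follows. This is only an illustration under that assumption, not necessarily the workaround used in the test: `read_dtas_sketch` is a hypothetical helper that is not part of readstata13, and it does not attempt to resolve the links between frames.

```r
# Hypothetical helper, not part of readstata13: treat a .dtas file as a zip
# archive and read every .dta member into a named list of data frames.
read_dtas_sketch <- function(path, reader = readstata13::read.dta13) {
  exdir <- tempfile("dtas_")
  dir.create(exdir)
  # remove the extracted files again once all members have been read
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  # unzip() returns the paths of the extracted files
  members <- utils::unzip(path, exdir = exdir)
  dta_files <- members[grepl("\\.dta$", members, ignore.case = TRUE)]

  frames <- lapply(dta_files, reader)
  names(frames) <- tools::file_path_sans_ext(basename(dta_files))
  frames
}
```

Linked columns, such as income in the example, would still have to be merged by hand from the returned list, for instance with `merge()`.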

* enable experimental support for writing Stata 120|121 files
* add tests for 120|121 (untested with Stata 18)
@JanMarvin
Collaborator Author

Created a huge data file with 40,000 labeled numeric variables. It worked smoothly.

@JanMarvin
Collaborator Author

Support for 120|121 is currently experimental and untested, but it could be of interest when a 120 file is loaded, modified, and stored again as a 120 file.
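The load, modify, and store cycle mentioned here might look roughly like the sketch below. It assumes a readstata13 build that includes this change, and the small data frame and temporary path are made up for illustration; in practice the starting point would be an existing 120 file.

```r
library(readstata13)

# Made-up example data standing in for an existing 120 file.
df <- data.frame(id = 1:3, income = c(10.5, 20.25, 30))
path <- tempfile(fileext = ".dta")
save.dta13(df, file = path, version = 120)

# Load the 120 file, modify a column, and store it again as 120.
dd <- read.dta13(path)
dd$income <- dd$income * 2
save.dta13(dd, file = path, version = 120)
```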

@JanMarvin
Collaborator Author

Tested the following for 120 and 121. This obviously isn't sufficient, but we have quite a few other tests.

library(readstata13)

n <- 100   # observations
k <- 4000  # variables; use 40000 for 121, since the maximum for 120 is 32767
mm <- matrix(rnorm(n * k), ncol = k, nrow = n)

df <- as.data.frame(mm)
# one variable label per column
attr(df, "var.labels") <- paste0("var", seq_len(k))

save.dta13(data = df, file = "/tmp/huge_df.dta", version = 120)
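The variable limit noted in the comment on `k` could be turned into a small version check before writing. The helper below is hypothetical and not part of readstata13; it only encodes the 32767 limit for format 120 stated in this thread.

```r
# Hypothetical helper, not part of readstata13: choose the dta format from
# the number of variables, using the 32767 limit for format 120 noted above.
pick_dta_version <- function(data, max_vars_120 = 32767L) {
  if (ncol(data) <= max_vars_120) 120L else 121L
}

small <- as.data.frame(matrix(0, nrow = 1, ncol = 4000))
large <- as.data.frame(matrix(0, nrow = 1, ncol = 40000))

pick_dta_version(small)  # 120
pick_dta_version(large)  # 121
```

The result could then be passed on as `version = pick_dta_version(df)` in the `save.dta13()` call.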

@JanMarvin JanMarvin merged commit 5347bbc into testing Aug 9, 2024
5 checks passed
@JanMarvin JanMarvin deleted the stata18 branch August 9, 2024 18:16