Add auxiliary datasets for the BHSSW and SW databases of elliptic curves #5041

Closed
AndrewVSutherland opened this issue Feb 17, 2022 · 32 comments

@AndrewVSutherland
Member

AndrewVSutherland commented Feb 17, 2022

The authors of Databases of elliptic curves ordered by height and distributions of Selmer groups and ranks have kindly given us permission to make their database of 238,764,310 elliptic curves over Q of naive height up to 2.7*10^10 available as an auxiliary dataset that can be downloaded from the LMFDB, similar to what we do for the database of class groups of imaginary quadratic fields.

EDIT: The authors of A database of elliptic curves -- A first report have also given us permission to host their database of approximately 150 million elliptic curves with absolute discriminant at most 10^12 and conductor at most 10^8 or prime conductor at most 10^12. Given the reasonably modest sizes of these files (gigabytes not terabytes), we can just put them in the /data directory rather than in separate storage buckets.

@AndrewVSutherland AndrewVSutherland added the ECQ Elliptic curves over Q label Feb 17, 2022
@AndrewVSutherland AndrewVSutherland added this to the v1.3 milestone Feb 17, 2022
@roed314
Contributor

roed314 commented Feb 18, 2022

We also have an auxiliary database of Riemann zeros. I wonder if we should have some index of these somewhere? I was actually trying to find the Riemann zero table recently (to show someone), and it took a while even knowing that it was there. You need to go to one of the L-function pages, click on "ζ zeros" in the properties box, ask for a bunch of zeros and then you'll get a link to the actual download files.

@AndrewVSutherland
Member Author

This data has been added to the subdirectory /bhkssw_ecdb on grace, as has the directory /stein_watkins_ecdb. Both directories need to be copied over to prodweb1 and prodweb2; can you take care of that, @edgarcosta?

@AndrewVSutherland AndrewVSutherland changed the title Add auxiliary dataset of BHSSW elliptic curves Add auxiliary datasets for the BHSSW and SW databases of elliptic curves Feb 18, 2022
@edgarcosta
Member

edgarcosta commented Feb 25, 2022 via email

@AndrewVSutherland
Member Author

Thanks (and no rush, there still isn't a UI for it and that will need to be tested on beta first).

@jvoight
Member

jvoight commented Nov 29, 2022

For placement, maybe we want something in the sidebar? And maybe also at the top of relevant webpages (like EC data should be displayed at the top of the EC page)?

@roed314
Contributor

roed314 commented Nov 8, 2024

I like making an index page linked to from the sidebar, perhaps with the name "Datasets." Here are two possibilities for where to put it:

  • At the bottom, so the last two "header" rows would be "Database" and "Datasets."
  • Remove Universe from the sidebar and add it as a link from the text of the Overview page, and replace it with "Datasets."

@jvoight
Member

jvoight commented Nov 9, 2024

Both are good options, I think.

I guess I kinda like the Universe and don't think of Datasets as Introduction, so would opt for the first?

@edgarcosta
Member

We already have a directory for these kinds of datasets.
https://beta.lmfdb.org/data/

@roed314
Contributor

roed314 commented Nov 9, 2024

Interesting; do we just want to use the apache file browser, or make a custom index page? Some things we could do with a custom index page:

  • Describe the criteria for adding new datasets of this type
  • Give more information about the contents of each (bhkswwdb/ is kind of mysterious)
  • Summarize the total size of each dataset.

Anything I'm missing?

@edgarcosta
Member

Yeah, we need to add more information to it, but it would be nice to have these organized instead of being a "dropbox".

@JohnCremona
Member

Just noting that the page https://beta.lmfdb.org/EllipticCurve/Q/CongruentNumbers is linked to on all ECQ pages under "Learn more", and what is being proposed here for this dataset can/should be similar, even though the files are larger.

@roed314
Contributor

roed314 commented Nov 13, 2024

I've put up a rough draft of an index page at https://purple.lmfdb.xyz/datasets; comments welcome. Some questions:

  • Currently bread on pages like Congruent numbers goes back through elliptic curves. Should we change this to this index page to make other datasets more discoverable? We would still have a properties-bar link to Congruent numbers when in the elliptic curves section (and probably also to the other elliptic curve datasets once they're added).
  • Where is the SW and BHKSSW data currently? The folders in /scratch/lmfdb-buckets on grace are empty.
  • Is the ECQ.txt file there separate, or is it one of SW or BHKSSW?
  • How should we split up SW and BHKSSW to make them usable? A 102GB text file is not the greatest UI...

@JohnCremona
Member

Thanks, @roed314 -- I like the index page for these datasets. Yes, the bread for each of the sub-index pages we have should go back here; let's do that in the same PR that adds this page. We also need to think about where there will be links to this index page -- somewhere in the sidebar?

I don't see the file ECQ.txt. There is ec-data-S6.rank.txt in /scratch, belonging to @edgarcosta (and I don't know what that is either).

The SW data can be obtained as optional spkgs for Sage. See https://doc.sagemath.org/html/en/reference/spkg/database_stein_watkins_mini.html#spkg-database-stein-watkins-mini and https://doc.sagemath.org/html/en/reference/spkg/database_stein_watkins.html#spkg-database-stein-watkins, but do not just use the data there: those packages date back to 2007 (the second one had an update in 2011). I have versions of these which would be better to use, so I will try to get them onto legendre and work out what is there, as it is some time since I last did.

I'll let someone else find and do similar for BHKSSW. I'm sure it is much larger and may make SW obsolete.

@roed314
Contributor

roed314 commented Nov 13, 2024

Currently there's a link at the bottom of the sidebar, based on my discussion with @jvoight earlier on this issue. I'll update the bread on the sub-index pages.

I wasn't sure what the ECQ.txt file was (it's currently visible here), so I haven't added it to the index yet. Maybe it is the BHKSSW database? I looked at the end of the file and the last conductor size didn't seem right for that, but maybe they're not ordered that way.

@edgarcosta
Member

ECQ.txt was put there by me (it was from the ML workshop) and has been removed

@edgarcosta
Member

I should also point out there is other data that all the servers have access to:

/scratch/lmfdb-data$ ls -l
total 31
drwxrwxr-x 2 lmfdb lmfdb   32 Feb 18  2022 bhkssw_ecdb
lrwxrwxrwx 1 lmfdb lmfdb   63 Jul  8  2018 class_numbers -> /scratch/lmfdb-buckets/class-groups-quadratic-imaginary-fields/
drwxrwxr-x 2 lmfdb lmfdb   35 Feb 15  2022 congruent_number_curves
drwxr-xr-x 2 lmfdb lmfdb    9 Sep 10  2013 lfunction_plots
-rw-rw-r-- 1 lmfdb lmfdb  237 Sep 17  2013 LICENSE.md
drwx------ 2 lmfdb lmfdb    2 Apr  8  2016 lost+found
drwxr-xr-x 2 lmfdb lmfdb    4 Jul 18  2020 maass_forms
drwxrwxr-x 2 lmfdb lmfdb    7 Sep 21  2013 riemann
drwxr-xr-x 7 lmfdb lmfdb    7 Mar 29  2012 Siegel-Modular-Forms
drwxrwxr-x 2 lmfdb lmfdb 1103 Feb 18  2022 stein_watkins_ecdb
drwxrwxr-x 3 lmfdb lmfdb    3 Aug 10  2013 weight_three_halves
drwxr-xr-x 7 lmfdb lmfdb    9 Sep  9  2014 zeros

@roed314
Contributor

roed314 commented Nov 13, 2024

Thanks, I just found that folder too. In particular, it includes bhkssw and stein_watkins, so that answers my question from earlier.

@roed314
Contributor

roed314 commented Nov 13, 2024

Do we want to make any of the other data in that folder visible on the datasets index page?

  • riemann/keiperli100k110kb.txt.tar.bz2 (I don't know what this is and it's not documented)
  • riemann/stieltjes100k125kb.txt (the only documentation is in a file that
  • weight_three_halves/... (@tornaria is listed as an author, and it's described as under construction)
  • zeros/... (there's more data in there besides zeta zeros)

I'm inclined to not add any of these for now, but if someone feels inspired feel free to work on one of them!

@roed314
Contributor

roed314 commented Nov 13, 2024

I've pushed an initial index page for the BHKSSW dataset, but

  • The column description is incomplete (see below)
  • The files are large enough that gunicorn will kill the download if you try to download them.

@AndrewVSutherland The README.txt in /home/lmfdb/data/bhkssw_ecdb describes the data as being sqlite files, but the actual data are flat text files. So it's not clear how the descriptions of the columns match the order in the file (e.g. the first entries of 19003704300|1|-1|0|-102|-389|-1635|-26530|5940675|1|1|1|2|1||2|2|0.072|||2|0 are clear—up through the conductor 5940675—but the last ones are not).
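For reference, the sample record above can be inspected with a few lines of Python. This is only a sketch: none of the field names come from the README, and the one check it makes (that the first field equals max(4|A|^3, 27B^2) when the seventh and eighth fields are read as short Weierstrass coefficients A and B, which is my understanding of the naive height used for this database) is an observation about this single line, not a documented schema.

```python
# Sample record quoted above from the BHKSSW flat text files.
LINE = "19003704300|1|-1|0|-102|-389|-1635|-26530|5940675|1|1|1|2|1||2|2|0.072|||2|0"

fields = LINE.split("|")

# Positions (0-based) of the fields that are empty in this record.
empty_positions = [i for i, f in enumerate(fields) if f == ""]

# Assumption: field 0 is the naive height and fields 6, 7 are the
# coefficients (A, B) of a short Weierstrass model y^2 = x^3 + A*x + B.
height = int(fields[0])
A, B = int(fields[6]), int(fields[7])
height_matches = height == max(4 * abs(A) ** 3, 27 * B ** 2)

print(len(fields), empty_positions, height_matches)
```

On this line the script reports 22 fields, three of them empty (positions 14, 18 and 19), and the height check passes; the meaning of the fields after the conductor is still open.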

Because of the gunicorn timeout, to get this actually working we'll need to either address #6221, find a workaround for static files, or split the BHKSSW files into smaller pieces (which will probably require reworking the download table at the bottom of the page).

@AndrewVSutherland
Member Author

I think we should split the files (I originally received them that way, and I converted them from sqlite to plain text).
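One possible way to do the split, sketched under assumptions: the height is taken to be the first "|"-separated field, the records are bucketed by floor(height / width), and the output names heights_<k>.txt are hypothetical. With width 10^7 and heights up to 2.7*10^10 this would give at most 2700 files.

```python
import os
import tempfile

def split_by_height(lines, out_dir, width=10**7):
    """Write each record to a file chosen by floor(height / width).

    Assumes the height is the first '|'-separated field. Returns a dict
    mapping bucket index to the number of records written to it.
    """
    counts = {}
    current, out = None, None
    try:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            bucket = int(line.split("|", 1)[0]) // width
            if bucket != current:
                # Only one file is open at a time, which is efficient when
                # the input is sorted by height.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"heights_{bucket}.txt")
                out = open(path, "a")  # append: still correct if unsorted
                current = bucket
            out.write(line + "\n")
            counts[bucket] = counts.get(bucket, 0) + 1
    finally:
        if out is not None:
            out.close()
    return counts

# Small demonstration with made-up records (only the first field matters).
with tempfile.TemporaryDirectory() as tmp:
    demo = ["5|x", "9999999|y", "10000000|z", "19003704300|w"]
    counts = split_by_height(demo, tmp)
    names = sorted(os.listdir(tmp))
```

For the real files one would pass an open file object as `lines`, so the input is streamed rather than loaded into memory.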

@AndrewVSutherland
Member Author

> I'll let someone else find and do similar for BHKSSW. I'm sure it is much larger and may make SW obsolete.

BHKSSW won't make SW obsolete: it is organized by height while SW is organized by conductor, and the datasets are largely disjoint. There is a future dataset that may make SW obsolete (I'm working on constructing a database of elliptic curves of bounded discriminant that goes well past the SW bounds; the file ECQ.txt was a preliminary version), but it is not ready to be published yet, and I think there is value in having SW available for historical reasons regardless.

@tornaria
Contributor

> * `weight_three_halves/...` (@tornaria is listed as an author, and it's described as under construction)

This is very old so let me try to remember:

  • In 2003/2004 I made a table of modular forms of weight 3/2 (https://www.cmat.edu.uy/cnt/twist/current/). Each modular form is given as an explicit linear combination of theta series of ternary quadratic forms; this is all available at the link above and doesn't take much space. The programs available there show how to compute the Fourier coefficients (up to X in time X^(3/2) with the naive theta series algorithm).
  • Mike Rubinstein computed the actual Fourier coefficients (by computing the theta series), up to 10^9 or 10^10 coefficients. This data used to be available for download from his webpage, but it's probably not there anymore. Relevant paper: https://arxiv.org/abs/math/0412083.
  • Someone (not me) put this data (Fourier coefficients) into the LMFDB a long time ago with the idea of making modular forms of weight 3/2 available (circa 2011, MSRI semester). I think all the Fourier coefficients were in the MongoDB database back then.
  • Eventually this was taken out of the database because of the huge amount of space (I think MongoDB wasn't particularly efficient in storing this!). I'm guessing this is what is available in this directory, but I don't know how to access this folder.
  • Storing 10^9 Fourier coefficients of modular forms is probably not reasonable (definitely not in the database, maybe as auxiliary files). Storing the information necessary to compute them seems good. The actual eigenvectors contain more interesting information (e.g. congruences). A general way to incorporate orthogonal modular forms into the LMFDB might be useful.
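To illustrate what "naive theta series algorithm" means in the first bullet, here is a minimal sketch for a diagonal form a*x^2 + b*y^2 + c*z^2. The diagonal form is chosen only for simplicity and is not one of the forms in the table; the forms there are general ternary forms, and the weight 3/2 form is a linear combination of several such theta series. The point is the cost: the loop visits every lattice point with value at most X, roughly X^(3/2) of them.

```python
from math import isqrt

def theta_coefficients(a, b, c, X):
    """Return r[0..X], where r[n] counts (x, y, z) in Z^3 with
    a*x^2 + b*y^2 + c*z^2 = n, for positive integers a, b, c."""
    r = [0] * (X + 1)
    bx = isqrt(X // a)                      # largest |x| with a*x^2 <= X
    for x in range(-bx, bx + 1):
        rem_x = X - a * x * x
        by = isqrt(rem_x // b)              # largest |y| that still fits
        for y in range(-by, by + 1):
            rem_y = rem_x - b * y * y
            bz = isqrt(rem_y // c)          # largest |z| that still fits
            for z in range(-bz, bz + 1):
                r[a * x * x + b * y * y + c * z * z] += 1
    return r

# Sums of three squares, as a sanity check against the classical values.
r3 = theta_coefficients(1, 1, 1, 9)
print(r3)  # [1, 6, 12, 8, 6, 24, 24, 0, 12, 30]
```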

@roed314
Contributor

roed314 commented Nov 14, 2024

@tornaria I don't have an easy way to give you access to the folder (it's 34GB), but the directory structure is

COEFFICIENTS/ (33GB, 2397 .gz files like 1001B_i.gz)
DELAUNAY/ (606MB, 2399 files like 1001B_i, mostly the same but also includes 11A_i and 11A_r)
DISCRETE_DIST/ (35MB, 4794 files like all_twists_1001B_i and prime_twists_9967A_i)
LOG/ (11MB, 2770 files like 1001B_i)
LOGARITHMIC_DISTR/ (8.9MB, 4796 files like all_twists_1001B_i and prime_twists_9967A_i)
Readme.txt
tw_i.prime.1-20000
tw_i.squarefree.1-5000
tw_r.red

Readme.txt says

Under construction.

These files contain data related to values of L_E(1,chi_d),
i.e. quadratic twists by fundamental discriminants
of elliptic curve L-functions evaluated at the critical point.

For each E, these values are given in terms of the coefficients
of certain modular forms of weight 3/2. These modular forms are
expressed as a linear combination of ternary quadratic forms.
The ternary forms and relevant combination came from a table computed
by Fernando Rodriguez-Villegas and Gonzalo Tornaría.

-------------------------------------------------------------------------

The COEFFICIENTS directory contains the coefficients of the weight 3/2
modular form, c(|d|), with |d|< 10^8, d a fundamental discriminant.
More curves are available for negative d than positive d. 
Not all fundamental discriminants are represented in the tables.
For each curve E, one can determine which d's are represented according
to the following criterion:

   state the criterion

This imposes congruence conditions on d. For each curve,
one can find a list of the relevant residue classes 
in the LOG directory. 

For negative d, one should look at the files called, for example, 11A_i.
For positive d, one should look at the files called, for example, 11A_r.
This applies for all the other directories as well.

-------------------------------------------------------------------------

The files in the DELAUNAY directory 
contain data that confirms Delaunay's heuristics in the rank 0 
case for Tate-Shafarevich groups. For each curve examined, I
computed, for a given prime q, the percent of the c(|d|)'s that
are divisible by q. Other than q=2, this should agree with
Delaunay's prediction (for q=2, one needs to first normalize the
c(|d|)'s by certain fudge factors, something I haven't done here). 

For the sake of experimentation, I computed this percentage
in four different cases as specified by the descriptions of
columns 3-6

column 1: X  (as in |d| < X).
column 2: q
column 3: |d| prime, even rank, (percent of c(|d|) divisible by q) divided by (Delaunay's prediction)
column 4: |d| prime, rank 0 (i.e. c(|d|) nonzero), (percent of c(|d|) divisible by q) divided by (Delaunay's prediction)
column 5: |d| fund. discriminant, even rank, percent divisible by q divided by Delaunay's prediction
column 6: |d| fund. discriminant, rank 0, percent divisible by q divided by Delaunay's prediction

Columns 3 and 5 seem to give a much better fit to Delaunay's predictions than 4 and 6. 
In the long run, columns 3 and 4 should agree, and so should columns 5 and 6. However, it
seems that by throwing in the (presumably) zero density set of curves of higher
rank, one gets better numerics. 

For example, the file DELAUNAY/11A_i has, towards the tail end of the
file, a few lines that look like this:

100000000 2 0.5741680277 0.5602304218 1.328679459 1.317788659
100000000 3 0.9920022633 0.9704193524 0.9889849587 0.9397194422
100000000 5 1.152373299 1.107627652 1.146669084 1.044547163
100000000 7 0.9945694576 0.923165625 1.003382163 0.8409144332
100000000 11 0.9940413991 0.8735921844 1.027715329 0.7541730859
100000000 13 0.9995350319 0.8547347132 1.042260592 0.7134786346
100000000 17 0.9977600175 0.8041957152 1.079808408 0.6409931771
100000000 19 0.9922611414 0.7742871066 1.09975361 0.6060184744
100000000 23 1.000921019 0.7343993131 1.148909932 0.545666906
100000000 29 1.01590385 0.6766348354 1.236701142 0.469699039
100000000 31 1.014171729 0.6505803287 1.266218234 0.4446558347

make a few comments... including exceptional primes and the
fact that this effect should dissolve in the long run.
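For anyone who wants to work with the DELAUNAY files, a row can be read as follows. This is a sketch: the field names are ad hoc labels for the six columns described in Readme.txt, not names used anywhere in the data.

```python
from collections import namedtuple

# Ad hoc labels for columns 1-6 as described in Readme.txt.
DelaunayRow = namedtuple(
    "DelaunayRow",
    ["X", "q", "prime_even_rank", "prime_rank0", "fund_even_rank", "fund_rank0"],
)

def parse_row(line):
    """Parse one whitespace-separated row of a DELAUNAY file."""
    parts = line.split()
    return DelaunayRow(int(parts[0]), int(parts[1]), *map(float, parts[2:6]))

# The q = 3 row quoted above from DELAUNAY/11A_i.
row = parse_row("100000000 3 0.9920022633 0.9704193524 0.9889849587 0.9397194422")

# Distance of each ratio from 1, i.e. from exact agreement with the prediction.
deviation = {name: abs(getattr(row, name) - 1.0) for name in row._fields[2:]}
```

For this row the even-rank columns (3 and 5) are closer to 1 than the rank 0 columns (4 and 6), which matches the remark in the README.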

@roed314
Contributor

roed314 commented Nov 14, 2024

Alright, the new index page is in decent shape, and there are subpages for Stein-Watkins and BHKSSW. I need to wait a couple days for @AndrewVSutherland to get access to the column order for Stein-Watkins (and will probably need to break up the BHKSSW files into smaller pieces, perhaps 2700 files of size 10^7), but in the meantime let me know if you have feedback. Some additional questions:

  • Do we want to specify a license for these datasets? This is related to Copyright for pages? #5088. If we do want to include a license we'd need to run it by the people who contributed the data.
  • How should we sort the datasets? Currently they're sorted by total size, largest to smallest.

@jvoight
Member

jvoight commented Nov 15, 2024

I'd assume that we should use the same license as the rest of the LMFDB as in #5088, unless the data asks for a stronger license?

With so few datasets, it doesn't seem to matter too much how they are sorted. I might sort by date submitted, but wait, do we have that information?

@roed314
Contributor

roed314 commented Nov 15, 2024

I think using the same license makes a lot of sense. I asked about it because the page includes a section soliciting datasets; we may want to be explicit there that if someone suggests adding a dataset to the LMFDB then they should be okay licensing it as CC-BY-SA (for example).

I don't know that we have information on date submitted, though with only 5 it's probably possible to find that information in emails.

@JohnCremona
Member

The Rathbun data on congruent numbers was first given to me in 2015. Is that the date we want? It was rather later that I got around to putting it into LMFDB with an index page etc.

@JohnCremona
Member

Can we have some sort of checklist on this Issue so that we know when we can close it?

@alozanoroble

@roed314 nice job on the pages for databases, and particularly happy to see the BHKSSW database easily accessible (so much easier to deal with .txt files than sqlite files!) On that note though, is there a way to download a range of txt files? Thanks! Also, perhaps we can convert each txt file (or combine them) to a Magma readable format? I did that once back in the day when I played with the database for a paper, so I could try doing that again if that would be useful.

@roed314
Contributor

roed314 commented Nov 20, 2024

I split up the files into smaller pieces because of our limitations on file size. So the way to download a range of files is to use a loop on the client side. If we can figure out how to remove our file size limitation (which is difficult), we could do various things to make accessing this data more convenient.
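A client-side loop could look roughly like this. The URL template below is a placeholder, not the real link pattern (take that from the download table on the dataset page), and the file naming is likewise hypothetical.

```python
import urllib.request

def build_urls(template, first, last):
    """Expand a URL template containing '{i}' for i = first..last inclusive."""
    return [template.format(i=i) for i in range(first, last + 1)]

def download_range(template, first, last, fetch=urllib.request.urlretrieve):
    """Download each file in turn, saving it under its last path component."""
    saved = []
    for url in build_urls(template, first, last):
        filename = url.rsplit("/", 1)[-1]
        fetch(url, filename)
        saved.append(filename)
    return saved

# Hypothetical template; substitute the real link pattern before use.
TEMPLATE = "https://example.org/data/part_{i}.txt"
urls = build_urls(TEMPLATE, 3, 5)
```

Calling `download_range(TEMPLATE, 3, 5)` with a real template would then fetch the three files one after another into the current directory.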

As for magma downloads, it should be feasible to construct those on the fly from the text data (rather than storing duplicate data files on the server).
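A rough sketch of what the on-the-fly conversion could look like, written in Python since that is what the server side runs. It assumes that fields 2 through 6 of each record are the Weierstrass coefficients [a1, a2, a3, a4, a6]. That ordering is not confirmed by the README, but it is consistent with the sample line quoted earlier in this thread: the curve [1,-1,0,-102,-389] has c4 = 4905 and c6 = 358155, and -c4/3 = -1635, -2*c6/27 = -26530 are exactly the next two fields.

```python
def magma_curve(line):
    """Format one record as a Magma EllipticCurve constructor call.

    Assumption: fields 1-5 (0-based) are [a1, a2, a3, a4, a6].
    """
    fields = line.split("|")
    ainvs = [int(f) for f in fields[1:6]]
    return "EllipticCurve([" + ",".join(str(a) for a in ainvs) + "])"

def magma_list(lines, name="curves"):
    """Emit a Magma assignment of a sequence of curves."""
    body = ",\n".join("  " + magma_curve(l) for l in lines)
    return f"{name} := [\n{body}\n];"

# Sample record quoted earlier in the thread.
SAMPLE = "19003704300|1|-1|0|-102|-389|-1635|-26530|5940675|1|1|1|2|1||2|2|0.072|||2|0"
text = magma_list([SAMPLE])
print(text)
```

The variable name `curves` and the decision to emit only the curve (and none of the other columns) are arbitrary choices for the illustration.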

@roed314
Contributor

roed314 commented Nov 20, 2024

As for @JohnCremona's suggestion of a checklist, I'm not sure what remains to be done on this issue. Maybe we should close it and @alozanoroble can open a new one with suggested improvements.

@JohnCremona
Member

> As for @JohnCremona's suggestion of a checklist, I'm not sure what remains to be done on this issue. Maybe we should close it and @alozanoroble can open a new one with suggested improvements.

OK, let's close it.


7 participants