-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
5000+ runs have neither master, pr_base or pr_head labels #1175
Comments
This was how old runs were retroactively tagged: https://github.com/web-platform-tests/data-migration/blob/master/tagger/master.go We used the commit log of WPT as the ground truth. I think the logic is sound, but there might be some gaps because of timing (e.g. tagged the old runs before the prod release, out-of-date local WPT checkout). However, the linked example is a very old run, so there might be bugs in the migration code or it wasn't run until completion. |
@Hexcles, in #1195 (comment) I wrote a script to find some mislabeled Safari runs. Those would come up in an analysis like this. I feel like all runs should have one of the 3 known labels, and if anything useful remains perhaps we should come up with a label for those too so that it's easier to be sure that nothing's gone wrong like this. |
This is a problem in practice now. In trying to dust off my scripts for measuring results and interop over time, I see that some of the aligned SHAs I used that time around aren't found now because some of the runs don't have the master label. https://wpt.fyi/results/?sha=a516e7edca&label=stable is the earliest such run, where the Firefox run from Taskcluster is missing the master label. |
Simply not requiring the master label would probably work, but there might still be one-off runs with stable browsers that someone has done that then influence the results. |
Backfilling labels using web-platform-tests/data-migration@ea07520 |
Note that the backfiller is being tested on staging at the moment; prod isn't affected yet. |
The heuristics aren't strong enough: the script is incorrectly labelling some supposedly |
The new heuristics seem to work. I don't see any false positives, and the possibility of false negatives (true master runs taking <10min) is low. ~3000 runs are retroactively labelled as master on prod. |
How much is left with none of the three labels, and is the origin of all of those runs clear? |
We still have a couple of thousand, most of which are from Taskcluster and Azure, and I'm fairly certain they are I don't think backfilling the |
Nah, adding those labels itself doesn't seem valuable, I was more curious about what would remain. I'd expect at most 100 or so runs triggered by folks trying stuff out by tweaking CI config, but if it's more I'd wonder if we've missed some master runs still. |
@foolip This seems to be the issue you mentioned regarding old runs not marked as master. @Hexcles When digging up foolip's https://github.com/foolip/ad-hoc-wpt-results-analysis scripts, I found that we didn't seem to have any runs before 2018-06 coming in as 'master'; which included many in the https://wpt.fyi/api/runs?sha=b0c75cd8857f9f9596a744fac65c3be5e526aede is an example; I notice that the time_end is totally mangled ( I'm not sure if we can find a different way to backdate into 2017, but re-opening to discuss. |
Note that this probably causes #1537 |
Stephen, you are correct. This is a problem caused by the additional heuristic "only consider runs >10min". There was no false positive, but a lot of false negative because back in 2017 we did not record timestamps in wptreport JSON. The I could try a more robust heuristic (only apply the >10min rule for runs created after we set up |
Given that we have manifests now for all master commits since the beginning, can we just check the results against the manifests and label ones that are complete enough? Or just pick a constant number of results to expect? |
Yes, but either way would require us to download the full JSON and deserialize it. It'd be very slow and I probably need to write some goroutine magics so that it won't take weeks (currently the labeller runs singly). |
How about if I just ingest the runs into my wpt-results repo and from there just sort them by number of results? It'd still be slow, but almost all of the code exists and would just have to churn away for a few days. |
This should include 5000+ additional runs: web-platform-tests/wpt.fyi#1175
There are a lot less than 5000 of these runs now. After Ecosystem-Infra/wpt-results-analysis@b41047e and some local edits I see this output counting the runs in question:
|
This has bit me for some analysis work I've been doing (before 2018-06, I basically have to drop the 'master' tag to get runs in). I think we should fix it, but I'm happy to own doing so since I am the one mainly impacted. (And @Hexcles has enough on their plate!) |
I've begun looking at this. Thanks to ad-hoc-wpt-results-analysis, I have a promising approach that revolves around checking the results diff for the run. For example:
What is surprising me, however, is how new a lot of these runs are. Both of the above runs are from this month! There's other weird stuff too; run 321670007 has a For now, however, I'm going to look at the 2017 and 2018 runs (since we know there should be missing master runs from then). Plan is to generate run ids via https://github.com/foolip/ad-hoc-wpt-results-analysis/pull/14/files, then use a datastore fixup to add the master label. |
Created at is the time the run is created on wpt.fyi, which should be after
time end, which is when the run finishes.
…On Sun., Nov. 24, 2019, 16:51 Stephen McGruer, ***@***.***> wrote:
I've begun looking at this. Thanks to ad-hoc-wpt-results-analysis, I have
a promising approach that revolves around checking the results diff for the
run. For example:
run/321670007/results has 36208 insertions
...
run/324880027/results has 62 insertions
What is surprising me, however, is how *new* a lot of these runs are.
Both of the above runs are from this month! There's other weird stuff too;
run 321670007 has a created_at that appears to be after its time_end...
(Perhaps I just don't know what created_at means.)
For now, however, I'm going to look at the 2017 and 2018 runs (since we
know there should be missing master runs from then). Plan is to generate
run ids via
https://github.com/foolip/ad-hoc-wpt-results-analysis/pull/14/files, then
use a datastore fixup to add the master label.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1175?email_source=notifications&email_token=AAK6ZDAKFCR2CRNSEPYFCLLQVLZNXA5CNFSM4GZZ2XFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFAVZII#issuecomment-557931681>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAK6ZDGQPN3BRL4B3S3YNPDQVLZNXANCNFSM4GZZ2XFA>
.
|
Ah, that would make sense then. We should figure out how we're getting recent results with no pr_base/pr_head/master label though, if possible. |
2017 runs were just tagged in Ecosystem-Infra/wpt-results-analysis#14; there were 251 of them. |
Does that resolve this issue? |
Sorry, no, should have been clearer. I am now working on 2018 runs and there are a lot more of them :). |
Jan-June 2018 have been tagged in Ecosystem-Infra/wpt-results-analysis#17 (comment); there were 1162 of them. Remaining work is just to check what is leftover. The expectation is that we will be done to <100 runs that we don't care about. |
Well, I was wrong.
Any idea what happened in January @Hexcles ? |
Are those SHAs actually on the master branch? |
Have not checked them all for being in the repo in general, but they mostly do not appear to be in the list produced by |
They're all junk. One has 30k tests, the rest have <50 (many have 1-10). |
I didn't find any major fire during that time. And I'd say since they're not on master we can safely ignore them, in which case, this issue is officially fixed (again)! Thanks so much, @stephenmcgruer ! |
Yay, thanks @stephenmcgruer! Now, I wonder if this resulted in many more aligned runs for historical analysis? |
Hrm. I expected it to have a substantial impact, but I'm only seeing some more data for stable (2017-08 to 2017-10 has suddenly appeared, and 2018-03 to 2018-06) and no new data for experimental (still starts 2019-01). I'm going to do some random sampling of the runs I labelled master and try to figure out what is happening. |
I remembered I am a programmer.
I guess we just didn't have STP in 2018; https://wpt.fyi/api/runs?label=experimental&product=safari&from=2018-01-01&to=2018-12-31 (not even including the master label here). |
web-platform-tests/results-collection#112 was the original tracking issue for that and I think there should be some STP runs in 2018. |
Yes, web-platform-tests/results-collection#112 (comment) implies there should be runs from mid-August 2018 onwards... |
Flipping through https://wpt.fyi/api/runs?labels=safari&from=2018-08-15&to=2018-09-15 from month to month, however, I'm seeing no traces of them. There are runs from Safari 11 and 12, but both are marked stable. Interestingly, in that thread you actually linked a sha and said there was a TP run for it, and there isn't any longer... https://wpt.fyi/results/?sha=eea0b54014&label=experimental I wonder if we purged runs at some point for some reason? @Hexcles any ideas? |
https://wpt.fyi/results/?sha=eea0b54014&max-count=10 is suspicious. It shows Safari 11.1 and 12.1 both as stable. Before |
I concur. According to https://en.wikipedia.org/wiki/Safari_version_history#Safari_12 , Safari 12.1 wasn't released until March 2019, so 12.1 back in 2018 was definitely a technology preview. Most of the history can be found in web-platform-tests/wpt#14465. Tl;dr: we went back and forth on how (STP has, or at least had, both a future Safari version and a STP version, e.g. Safari 12.1 Technology Preview 70) and where (we had various parsing and overriding in I think we might have accidentally labelled those runs as stable. |
web-platform-tests/results-collection#617 has a bunch of history too. It looks like we fixed the version problem before Safari 12.1 was released, so just listing all 12.1 runs, checking that there's a gap and relabeling the first group of runs should be correct. |
Ack, will look at doing that. Re-opening to track (though the issue is somewhat misleadingly named now, there's enough history here and this shouldn't be open long). Thanks for all the history folks! |
Right, so listing all Safari stable runs whose browser version starts with
My plan is to change every run whose browser_version starts with |
Script has been run on staging + prod (logs on the PR). https://wpt.fyi/api/runs?label=experimental&product=safari&from=2018-01-01&to=2018-12-31 now shows a bunch of runs, let's see what this has done to the available data for experimental |
We now have experimental data from 2018-08 to 2019-01. Probably as good as we're going to get :) |
Running the audit.js script again, there are now 955 runs remaining, for 558 unique commits. Looks like only 77 of those commits are merge_pr_* commits. That seems like a plausible number of runs that might have been triggered for various verification purposes, and puts a low ceiling to how many real and useful master runs could be among these. Thanks @stephenmcgruer! |
Great job, Stephen! |
When the master label was introduced some old runs were labeled, but it seems not enough.
In https://github.com/foolip/ad-hoc-wpt-results-analysis/blob/master/audit.js I listed runs with neither master, pr_base or pr_head labels, and there were more than 5000. Some probably aren't master runs, but some definitely are:
https://wpt.fyi/results/?run_id=5632908932939776
Some are probably from commits on master but incomplete, so fixing this might take some work.
The text was updated successfully, but these errors were encountered: