
Better mask_test_edges function #55

Open
stefanosantaris wants to merge 1 commit into master

Conversation

stefanosantaris commented

@tkipf This is my fix for a slightly more performant mask_test_edges function. I managed to reproduce the results of the paper, and it works much better even with large graphs.
The only time-consuming parts of this function are the assertions at the end.
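For reference, a minimal sketch of the kind of check these assertions perform (illustrative names, not the exact code from this PR): verifying that no sampled negative edge collides with a true edge. Hashing the true edges into a set keeps each membership test O(1) on average:

import numpy as np
import scipy.sparse as sp

def assert_no_overlap(false_edges, adj):
    # false_edges: iterable of (i, j) pairs; adj: sparse adjacency matrix.
    # Hash all true edges once, so each lookup below is O(1) on average
    # instead of a scan over the full edge list per sampled pair.
    true_edges = set(zip(*adj.nonzero()))
    for i, j in false_edges:
        # The graph is undirected, so check both orientations.
        assert (i, j) not in true_edges and (j, i) not in true_edges

adj = sp.csr_matrix(np.array([[0, 1], [1, 0]]))
assert_no_overlap([(0, 0), (1, 1)], adj)  # passes: these pairs are non-edges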

tkipf referenced this pull request Jan 3, 2020
tkipf (Owner) commented Jan 3, 2020

Thanks -- I'll leave this open for now in case someone is interested in a (working) example for a more efficient implementation. I rolled the master branch back to the original version before @philipjackson's PR to keep it in line with the paper.

GuillaumeSalhaGalvan commented Jan 7, 2020

Dear all,

Contrary to previous comments (here + #54), I was able to reproduce all results from @tkipf's original paper using @philipjackson's implementation (see #25) of the mask_test_edges function.

I suspect that the previous issues simply come from different train/validation/test splits. Indeed, @philipjackson set the default parameter values to test_percent=30. and val_percent=20., whereas @tkipf used test_percent=10. and val_percent=5. in his experiments. As a result, @philipjackson's version masks more edges from the training graph than the original experiments did, which leads to lower performance. With the corrected parameters, I obtain results consistent with the paper.
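Concretely, assuming the keyword names from #25 and that it keeps the return values of the original function (adj being the scipy.sparse adjacency matrix of your graph), the corrected call would look like:

from gae.preprocessing import mask_test_edges  # module path as in tkipf/gae

# Splits matching the paper: 10% test edges, 5% validation edges,
# instead of the 30%/20% defaults introduced in #25.
adj_train, train_edges, val_edges, val_edges_false, \
    test_edges, test_edges_false = mask_test_edges(
        adj, test_percent=10., val_percent=5.)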

Moreover, on the PubMed dataset, i.e. the largest one, @stefanosantaris's implementation runs in 3+ minutes on my laptop. @philipjackson's implementation runs in 0.03 seconds, and in a few seconds for a graph with 1 million nodes (I removed all assert lines from both functions).
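If you want to sanity-check such timings yourself, a rough sketch (continuing from the call above; exact numbers are of course machine-dependent):

import time

start = time.time()
mask_test_edges(adj, test_percent=10., val_percent=5.)
print("mask_test_edges took %.3f s" % (time.time() - start))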

As a consequence, I would recommend using #25 with updated default parameters. :)

haorannlp commented


Hi @GuillaumeSalha! Can you reproduce the results in the paper with test_percent=10. and val_percent=5. using the original implementation or the updated one? I still cannot reproduce them with the original. Sad...

GuillaumeSalhaGalvan commented Jan 9, 2020

Hi @haorannlp!
After changing test_percent=10. and val_percent=5. in preprocessing.py, you need to re-run:

cd .. 
python setup.py install

Then, indeed, you should be able to reproduce results from the paper.

philipjackson (Contributor) commented

Hi everyone,

I think what happened here is that I wrote this code along with @sbonner0 for use in a paper of our own, in which we used different-sized val and test splits, and only submitted it as a pull request here as an afterthought. That's why my default val_percent and test_percent don't match @tkipf's originals; I didn't think to revert them when I made the pull request. Apologies for the inconvenience caused, and thanks to @GuillaumeSalha for spotting the issue!

haorannlp commented

Thank you buddy, @GuillaumeSalha! I can reproduce the results now.
