How do we rank links? #9

Open
Sandr0x00 opened this issue Mar 2, 2017 · 11 comments

@Sandr0x00

Sandr0x00 commented Mar 2, 2017

What are trustworthy links? Why do we trust some links more than other ones? Who says that musicbrainz.org is more trusted than mymusicblog.wordpress.com? Do we say that? (For thousands of links?)

@simonzachau simonzachau self-assigned this Mar 2, 2017
@sacdallago
Member

@pfent

pfent commented Mar 3, 2017

One of the most commonly taught algorithms is HITS, which assigns each node an "authority" value. It is not strictly a "trustworthiness" value, but it might be an indicator for it.
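For readers who have not seen HITS, here is a minimal sketch of the hub/authority iteration in plain Python. It is only an illustration, not the implementation used in this project: the function name `hits`, the fixed iteration count, and the example hosts `blog-a.example` and `blog-b.example` are assumptions made up for the example.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of directed (src, dst) edges."""
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of everything linking to a node.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum the fresh authority scores of everything a node links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Normalise both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(value * value for value in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical link data: two blogs link to musicbrainz.org, one links to the other blog.
edges = [
    ("blog-a.example", "musicbrainz.org"),
    ("blog-b.example", "musicbrainz.org"),
    ("blog-a.example", "mymusicblog.wordpress.com"),
]
hub, auth = hits(edges)
```

In this toy graph `musicbrainz.org` ends up with a higher authority score than `mymusicblog.wordpress.com` simply because more hubs point at it, which is why the score is an indicator rather than a direct measure of trust.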

@simonzachau
Contributor

Maybe "an indicator for it" is the best we can get. How big would our focused subgraph for the HITS algorithm be, then?
E.g., suppose we want to score only the following relationship: (Beethoven - inspired by - Mozart). The sources that Project A stored for 'Beethoven' and 'Mozart' could then be the limited root set. However, when we judge a source as a whole, is every source in the database part of our root set? And would we then need to scrape one level deeper to get the base set?
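To make the root set / base set question concrete, here is a small sketch of a one-level expansion. The URLs and the `outgoing_links` mapping are hypothetical stand-ins for what a scraper would return. Note that Kleinberg's original formulation also adds pages that link *to* the root set; this sketch only follows outgoing links, matching the "scrape one level deeper" idea above.

```python
def build_base_set(root_set, outgoing_links):
    """Expand the root set by one level of outgoing links.

    root_set: iterable of source URLs already stored in the database.
    outgoing_links: dict mapping a URL to the URLs it links to (hypothetical scraper output).
    """
    base_set = set(root_set)
    for url in root_set:
        base_set.update(outgoing_links.get(url, ()))
    return base_set


def edges_within(base_set, outgoing_links):
    """Keep only the links whose source and target both lie in the base set."""
    return sorted(
        (src, dst)
        for src, targets in outgoing_links.items()
        if src in base_set
        for dst in targets
        if dst in base_set
    )


# Hypothetical sources stored for 'Beethoven' and 'Mozart'.
root_set = {"https://a.example/beethoven", "https://b.example/mozart"}
outgoing_links = {
    "https://a.example/beethoven": ["https://c.example/classical"],
    "https://b.example/mozart": ["https://c.example/classical", "https://a.example/beethoven"],
    "https://z.example/unrelated": ["https://c.example/classical"],
}
base_set = build_base_set(root_set, outgoing_links)
base_edges = edges_within(base_set, outgoing_links)
```

With a per-relationship root set the base set stays tiny; using every source in the database as the root set means one scrape per stored source, so the size of the base set grows with the average number of outgoing links.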

@pfent

pfent commented Mar 5, 2017

If we're using an optimized library (NetworkX is pretty good, but that's Python…), with all that power iteration eigenvector calculation magic, we can probably go pretty large with the focused subgraph, maybe even just use all relevant sources. But it's hard to tell without any measurements.

@sacdallago maybe you know a Javascript library similar to NetworkX?

@sacdallago
Member

@pfent I unfortunately don't :( But some NPM digging might make pretty things surface. One week ago I found two groups attempting to write CNNs in JS, so I'm fairly sure there's a package for everything :D :D

@simonzachau
Contributor

Our current idea:

  1. get all links in database (root set)
  2. scrape outgoing links (for base set)
  3. generate network of all database links
  4. HITS: our plan is to try to connect NetworkX to Nodejs
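Step 3 could be sketched roughly as follows. Since the original question compares whole sites (musicbrainz.org vs. mymusicblog.wordpress.com), this sketch collapses page-level links into host-level edges; that aggregation is an assumption for illustration, not something decided in this thread, and `page_links` is made-up example data.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if it has none."""
    return urlsplit(url).hostname or ""


def host_graph(page_links):
    """Collapse page-level (src_url, dst_url) links into unique host-level edges."""
    edges = set()
    for src_url, dst_url in page_links:
        src, dst = host_of(src_url), host_of(dst_url)
        # Skip unparsable URLs and links that stay on the same host.
        if src and dst and src != dst:
            edges.add((src, dst))
    return sorted(edges)


# Hypothetical scraped links: (page in the database, page it links to).
page_links = [
    ("https://mymusicblog.wordpress.com/post-1", "https://musicbrainz.org/artist/x"),
    ("https://mymusicblog.wordpress.com/post-2", "https://musicbrainz.org/artist/y"),
    ("https://musicbrainz.org/artist/x", "https://musicbrainz.org/artist/y"),
]
host_edges = host_graph(page_links)
```

The resulting edge list is the kind of input a HITS implementation expects, whichever library ends up doing step 4.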

@vviro

vviro commented Mar 6, 2017

Regarding the ranking see also my comment at MusicConnectionMachine/RelationshipsG3#5 (comment)
This would be a cool thing to try out, but it seems to me that the ideal time to approach it is once we see that we really need this refinement, and we have to get to that point first.

@sacdallago
Member

sacdallago commented Mar 6, 2017

@simonzachau Spawning child processes and assigning them jobs in other languages is always a bit of overkill! Avoid that as much as possible, and only do it if there is really no other way.

@FelsyWaschbaer
Contributor

FelsyWaschbaer commented Mar 6, 2017

@sacdallago
Member

https://www.npmjs.com/package/graphology-hits was last published 2 weeks ago. What that tells me is that someone else is trying to do something similar and hasn't found an existing solution either, and that the package is being maintained (as opposed to the year-old one).

@simonzachau
Contributor

@sacdallago thank you for reviewing our findings! That's why we also opted to try graphology-hits rather than the unmaintained ngraph.hits.
