Pipeline Architecture #10
Comments
@MusicConnectionMachine/group-3 if you agree with this approach, I would persist it in the wiki and create separate issues for each step.
@krishenk regarding group1, see MusicConnectionMachine/UnstructuredData#40. Regarding page rank: I completely agree with @vviro's comment #5 (comment).
@Henni is it already clear what the page rank and reputability will be based on? Is the idea to extract the URLs from the HTML and use them as links? Is code for going from a set of HTML documents to their page rank already available or easily implementable, and is it clear how to run it on this dataset? (Maybe this is the wrong issue for this question and there is a better place...) I just wonder whether the relationship extraction step will need more attention than it can get if reputability is also to be addressed. A word of caution here...
@vviro Let me come back to this tomorrow. Our team will meet tomorrow morning and this is a topic I will bring up.
About that page rank: I'm just gonna leave these links here for you to further scout out
Mining PageRank at a larger scale is against Google's ToS.
It seems Google does not provide their PageRank API anymore. Depending on the number of pages, we might have to implement it ourselves.
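For reference, a rough idea of what "implementing it ourselves" could look like: a minimal Python sketch, assuming we have the crawled pages as a mapping from URL to HTML string and only count links that point to other pages in our own dataset. The names `LinkExtractor`, `build_link_graph` and `pagerank` are made up for illustration, and the damping factor of 0.85 and the fixed iteration count are just common defaults, not something we have agreed on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags in one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.add(urljoin(self.base_url, value))


def build_link_graph(pages):
    """pages: dict of URL -> HTML string. Returns URL -> set of outlinks."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        # Assumption: keep only links into our own dataset, drop self-links.
        graph[url] = {t for t in parser.links if t in pages and t != url}
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power iteration over the link graph (hypothetical helper)."""
    n = len(graph)
    if n == 0:
        return {}
    ranks = {url: 1.0 / n for url in graph}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(ranks[url] for url, out in graph.items() if not out)
        new_ranks = {
            url: (1.0 - damping) / n + damping * dangling / n for url in graph
        }
        for url, out in graph.items():
            for target in out:
                new_ranks[target] += damping * ranks[url] / len(out)
        ranks = new_ranks
    return ranks
```

Usage would be something like `pagerank(build_link_graph(pages))`, which returns a score per URL that sums to roughly 1. This ignores everything outside our crawl, so it is a score relative to our dataset, not Google's PageRank.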
In my opinion, page rank (in whatever form) should be a topic we handle in the future.
SEOstats (that ugly PHP script, @sacdallago right?) offers other APIs in addition to the PageRank API. That's why it's in there ;)
@Henni progress? done? needs work?
Let's count this one as done. |
Idea:
Build our application as a pipeline.
This would look as follows:
Notes: