Pipeline Architecture #10

Henni · 2017-03-07T14:12:59Z

Idea:
Build our application as a pipeline.
This would look as follows:

   get sentences from database
-> run relationship extraction
-> [classify relationships]
-> calculate page rank and reputability
-> store result in database

Notes:

Sentences have to be stored in the database. Otherwise we have to add an additional step to extract them from the given URLs.
Relationship classification could already be done by the relationship extraction algorithm.
Page rank and reputability idea: factor in how much we trust a page. For example, Facebook will probably return worse results than Wikipedia.
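
The steps above could be sketched roughly as follows. This is only an illustration of the pipeline shape, not a proposed implementation: the `Relationship` type, the function names, the in-memory `db` dictionary standing in for the database, the toy three-word extractor, and the trust values in `TRUST` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Relationship:
    subject: str
    predicate: str
    obj: str
    source_url: str
    label: str = "unclassified"
    reputability: float = 0.0


# Hypothetical trust weights per host; the numbers are placeholders.
TRUST = {"en.wikipedia.org": 0.9, "www.facebook.com": 0.3}
DEFAULT_TRUST = 0.5


def extract_relationships(sentences):
    """Toy stand-in for relationship extraction: only handles 'A verb B'."""
    relationships = []
    for text, url in sentences:
        words = text.rstrip(".").split()
        if len(words) == 3:
            relationships.append(Relationship(words[0], words[1], words[2], url))
    return relationships


def classify_relationships(relationships):
    """Optional step: here the predicate itself is used as the class label."""
    for rel in relationships:
        rel.label = rel.predicate.lower()
    return relationships


def score_reputability(relationships):
    """Attach a trust score based on the host the sentence came from."""
    for rel in relationships:
        host = urlparse(rel.source_url).netloc
        rel.reputability = TRUST.get(host, DEFAULT_TRUST)
    return relationships


def run_pipeline(db):
    """Run every stage in order; `db` is a plain dict standing in for the database."""
    data = db["sentences"]  # get sentences from database
    for stage in (extract_relationships, classify_relationships, score_reputability):
        data = stage(data)
    db["results"] = data  # store result in database
    return data


if __name__ == "__main__":
    db = {"sentences": [("Mozart composed Figaro.", "https://en.wikipedia.org/wiki/Mozart")]}
    for rel in run_pipeline(db):
        print(rel)
```

Because every stage takes a list and returns a list, the optional classification step can be dropped from the tuple in `run_pipeline` without touching the other stages.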


Henni · 2017-03-07T14:24:46Z

@MusicConnectionMachine/group-3 if you agree with this approach, I would persist it in the wiki and create separate issues for each step.

krishenk · 2017-03-07T14:29:16Z

@Henni looks good to me. One question: are we going to extract the sentences ourselves, or will that be done by @MusicConnectionMachine/group-1? Also, regarding page rank, @vviro mentioned something about that in issue #5. Please have a look.

Henni · 2017-03-07T14:40:02Z

@krishenk regarding group1 see MusicConnectionMachine/UnstructuredData#40

Regarding page rank: I completely agree with @vviro's comment in #5.
This is also why I added the term reputability (see also the comment in MusicConnectionMachine/RelationshipsG4#9).
We should clear up the terms page rank and reputability at the meeting tomorrow.

vviro · 2017-03-07T14:48:46Z

@Henni is it already clear what the page rank and reputability will be based on? Is the idea here to extract the URLs from the HTML and use them as links? Is the code for doing this (going from a set of HTML documents to their page rank) already available or easy to implement, and is it clear how to run it on this dataset? (Maybe this is the wrong issue for this question and there is a better place...) I just wonder whether the relationship extraction step will require more attention than would be possible if reputability is also to be addressed. A word of caution here...
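
To make the question about extracting URLs from the HTML concrete, here is a minimal sketch using only Python's standard library. The names `LinkCollector` and `extract_links` are invented for this example, and nothing here reflects code that already exists in the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of all <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all outgoing links of one HTML document as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<p><a href="/wiki/Bach">Bach</a> and <a href="https://example.org/x">x</a></p>'
    print(extract_links(page, "https://example.com/page"))
```

Running this over every stored document would give a mapping from each page to its outgoing links, which is the input a link-based ranking needs.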

Henni · 2017-03-07T14:58:34Z

@vviro Let me come back to this tomorrow. Our team will meet tomorrow morning and this is a topic I will bring up.

kordianbruck · 2017-03-08T01:01:32Z

About that page rank: I'm just gonna leave these links here for you to further scout out

Mining PageRank at a larger scale is against Google's ToS.

RBirkeland · 2017-03-08T11:54:25Z

It seems Google does not provide its PageRank API anymore. Depending on the number of pages, we might have to implement it ourselves.
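
For a sense of what implementing it ourselves would involve, below is a small sketch of the textbook PageRank power iteration over a link graph. The function name, the dict-of-lists graph format, the damping factor of 0.85, and the fixed iteration count are assumptions for illustration; this is not Google's production algorithm and it is not tuned for a large dataset.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Compute PageRank for a graph given as {page: [pages it links to]}."""
    # Include pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page without outgoing links: spread its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, value in sorted(pagerank(links).items()):
        print(page, round(value, 4))
```

The ranks always sum to 1, so they can be read as relative importance within the crawled set of pages only, not as a global score.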

Henni · 2017-03-08T21:11:05Z

In my opinion, page rank (in whatever form) should be a topic we handle in the future.
Our next step should be to get the relationship extraction going. This should already give some kind of quality indication, which might already suffice.

kordianbruck · 2017-03-09T21:33:17Z

SEOstats (that ugly PHP script, @sacdallago right?) offers other APIs in addition to the PageRank API. That's why it's in there ;)

kordianbruck · 2017-03-27T20:32:05Z

@Henni progress? done? needs work?

Henni · 2017-03-28T13:16:23Z

Let's count this one as done.
The architecture itself is an ongoing process, but the decisions described in here seem to be fine with everyone.

Sandr0x00 mentioned this issue Mar 15, 2017

Interface to Group3/4 MusicConnectionMachine/StructuredData#30

Closed

kordianbruck assigned Henni Mar 27, 2017

kordianbruck added the High Priority label Mar 27, 2017

Henni closed this as completed Mar 28, 2017
