Implement a resolver with a second backend for collections #57

Open
PonteIneptique opened this issue Jun 1, 2017 · 4 comments

@PonteIneptique
Member

Currently, the only resolver we have reads its data directly from XML or from the cache.

This new resolver should:

  • Inherit from the wonderful current resolver, to allow switching from one to the other and to keep improving both side by side.
  • Provide a connection to a backend for the RDFLib store, most probably through an ORM that allows multiple solutions, such as SQLAlchemy, or through a more graph-oriented database (though less common among developers) like Mongo (I could not find better examples for now).
    1. Retrieving metadata about a text is already fully dependent on RDFLib, but...
    2. The graph traversal of the collection will need to be rewritten to work with RDFLib, i.e. rewrite __getitem__(), .parent, .ancestors and .descendants to access this information through RDFLib. This mostly means that the new resolver should have its own Collection/Textgroup/etc. system. But again, the modification would be light...
  • Implement caching of answers for this metadata (because the cache would then really be used mostly as a cache).
  • (Optional) Consider reusing the same backend to store the list of references for each text. That could speed up some other parts of the code (?)
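
A minimal sketch of the traversal rewrite described in point 2, assuming the tree is stored as triples and navigated by pattern matching instead of through in-memory dictionaries and lists. Everything here is hypothetical and stdlib-only: TinyGraph is a stand-in for an RDF graph (it is not RDFLib), the hasParent predicate and the GraphCollection class are invented for illustration, and the URNs are just sample data.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
PARENT = "hasParent"  # hypothetical predicate name, not a real vocabulary term


class TinyGraph:
    """Hypothetical stand-in for an RDF graph: a set of (s, p, o) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def triples(self, pattern: Tuple[Optional[str], ...]) -> Iterator[Triple]:
        # None acts as a wildcard in any of the three positions.
        for triple in sorted(self._triples):
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple


class GraphCollection:
    """Collection whose navigation is answered by graph queries only."""

    def __init__(self, identifier: str, graph: TinyGraph) -> None:
        self.id = identifier
        self.graph = graph

    @property
    def parent(self) -> Optional["GraphCollection"]:
        for _, _, obj in self.graph.triples((self.id, PARENT, None)):
            return GraphCollection(obj, self.graph)
        return None

    @property
    def ancestors(self) -> List["GraphCollection"]:
        # Walk up the parent links, nearest ancestor first.
        found, current = [], self.parent
        while current is not None:
            found.append(current)
            current = current.parent
        return found

    @property
    def children(self) -> List["GraphCollection"]:
        matches = self.graph.triples((None, PARENT, self.id))
        return [GraphCollection(subj, self.graph) for subj, _, _ in matches]

    @property
    def descendants(self) -> List["GraphCollection"]:
        # Depth-first walk down the child links.
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants)
        return found

    def __getitem__(self, key: str) -> "GraphCollection":
        if key == self.id:
            return self
        for descendant in self.descendants:
            if descendant.id == key:
                return descendant
        raise KeyError(key)


# Sample data: textgroup -> work -> edition.
graph = TinyGraph()
graph.add(("urn:cts:latinLit:phi1294.phi002", PARENT, "urn:cts:latinLit:phi1294"))
graph.add(("urn:cts:latinLit:phi1294.phi002.perseus-lat2", PARENT,
           "urn:cts:latinLit:phi1294.phi002"))

textgroup = GraphCollection("urn:cts:latinLit:phi1294", graph)
edition = textgroup["urn:cts:latinLit:phi1294.phi002.perseus-lat2"]
print([ancestor.id for ancestor in edition.ancestors])
```

Because each object only holds an identifier and a reference to the graph, swapping the in-memory stand-in for a persistent store would not change the navigation code, which is the point of moving traversal into the graph.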
@sonofmun
Contributor

sonofmun commented Jul 21, 2017

Some Notes:

  • The new resolver (the Extended Resolver) would manage the store that already exists as a global in MyCapytain (as it stores all the metadata of all files). It should definitely reuse an RDFLib Store Adapter, most likely a SQLAlchemy one, because it provides another layer of adaptation (https://github.com/RDFLib/rdflib-sqlalchemy).
  • MyCapytain will be responsible for setting up the Graph, and Nautilus for adapting the store for the Extended Resolver. Nautilus would also need to provide subclasses of MyCapytain's collections to deal with the tree navigation that currently happens in dictionaries and lists (.descendants, .children, .readableDescendants, etc.).
  • The collection metadata will be removed from the cache in favor of the RDFLib store.
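
To illustrate why a SQL-backed store helps here, the following is a rough sketch of a persistent triple table built on the standard-library sqlite3 module. It is an assumption-laden stand-in, not the rdflib-sqlalchemy API: the SqliteTripleStore class, its table layout and the hasParent predicate are all hypothetical. It only shows that metadata written by one process can be read back by another without reparsing any XML.

```python
import os
import sqlite3
import tempfile
from typing import List, Optional, Tuple


class SqliteTripleStore:
    """Hypothetical persistent triple store (not the rdflib-sqlalchemy API)."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS triples "
                "(s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))"
            )

    def add(self, s: str, p: str, o: str) -> None:
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o)
            )

    def triples(self, s: Optional[str] = None, p: Optional[str] = None,
                o: Optional[str] = None) -> List[Tuple[str, str, str]]:
        # None means "match anything" for that position.
        clauses, params = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(column + " = ?")
                params.append(value)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        query = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
        return self.conn.execute(query, params).fetchall()

    def close(self) -> None:
        self.conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.sqlite")

    # First "process": write metadata once, e.g. when the corpora change.
    writer = SqliteTripleStore(path)
    writer.add("urn:cts:latinLit:phi1294.phi002", "hasParent", "urn:cts:latinLit:phi1294")
    writer.close()

    # Second "process": read the same metadata back without any parsing.
    reader = SqliteTripleStore(path)
    reloaded = reader.triples(p="hasParent")
    reader.close()

print(reloaded)
```

A real adapter would plug into the RDFLib Graph rather than expose its own query method; the sketch only makes the persistence argument concrete.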

@sonofmun
Contributor

More information on the current process:

The setup:

  1. The resolver and the resources that need parsing are declared in https://github.com/OpenGreekAndLatin/leipzig_cts/blob/master/modules/capitains/templates/app.py.erb#L75-L79
  2. The inventory is actually built from this information in https://github.com/OpenGreekAndLatin/leipzig_cts/blob/master/modules/capitains/templates/update_capitains_repos.rb.erb#L68-L71: every time our corpora change, we rebuild part of the cache by running a parse to put the inventory in the cache.
  3. parse is called by the manager and goes into every XML file (text or metadata) to build some of the information needed: https://github.com/Capitains/Nautilus/blob/master/capitains_nautilus/cts/resolver.py#L161-L258
  4. The app just calls the resolver in most of its queries. It's basically the core of the app.

Workflow when running:

  1. Any time we need to access metadata (name of a text, citation scheme, the text itself), we hit the inventory.
  2. The inventory is defined here: https://github.com/Capitains/Nautilus/blob/master/capitains_nautilus/cts/resolver.py#L87-L91
  3. If the inventory has dropped from the cache, it asks to reparse the whole thing. This is most likely the reason for the 502 errors, because a reparse can take a really long time for a normal process (i.e. it should not happen there).
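
The failure mode in the running workflow can be sketched as follows. This is a simplified illustration, not the Nautilus code: DictCache, SketchResolver, the "inventory" cache key and the parse_count counter are all hypothetical names, and the parse step is reduced to building a small dictionary.

```python
from typing import Dict, List, Optional


class DictCache:
    """Hypothetical cache that can lose entries at any time."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value

    def evict(self, key: str) -> None:
        self._data.pop(key, None)


class SketchResolver:
    """Resolver whose inventory lives only in the cache."""

    CACHE_KEY = "inventory"  # hypothetical key name

    def __init__(self, cache: DictCache, resources: List[str]) -> None:
        self.cache = cache
        self.resources = resources
        self.parse_count = 0  # counts how often the expensive parse runs

    def parse(self) -> dict:
        # Stand-in for walking every XML file; expensive in the real setup.
        self.parse_count += 1
        inventory = {name: {"title": name.upper()} for name in self.resources}
        self.cache.set(self.CACHE_KEY, inventory)
        return inventory

    @property
    def inventory(self) -> dict:
        cached = self.cache.get(self.CACHE_KEY)
        if cached is None:
            # Cache miss: the whole corpus is reparsed inside the request.
            return self.parse()
        return cached


cache = DictCache()
resolver = SketchResolver(cache, ["corpus_a", "corpus_b"])

resolver.parse()                       # done once at deployment time
_ = resolver.inventory                 # served from the cache
cache.evict(SketchResolver.CACHE_KEY)  # the cache drops the entry
_ = resolver.inventory                 # forces a full reparse
print(resolver.parse_count)
```

With the metadata held in a persistent store instead, an evicted cache entry would cost one store lookup rather than a full reparse during a request.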

@sonofmun
Contributor

sonofmun commented Aug 7, 2017

All these new functions should be unit tested.

@PonteIneptique
Member Author

Partially implemented in #68


2 participants