Fix issue #11189 part 00 refactor citation relation tab logic #11845
Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
The tool reviewdog already placed comments on GitHub to indicate the places. See the tab "Files" in your PR.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.
You can check reviewdog's comments at the tab "Files changed" of your pull request.
9fe8522
to
cbe9e96
Compare
cbe9e96
to
8231340
Compare
Your code currently does not meet JabRef's code guidelines.
We use OpenRewrite to ensure "modern" Java coding practices.
The issues found can be automatically fixed.
Please execute the Gradle task rewriteRun, check the results, commit, and push.
You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".
8231340
to
33967c2
Compare
33967c2
to
592d4d7
Compare
d94f4d3
to
3155242
Compare
Add code explanations to the PR
import org.eclipse.jgit.util.LRUMap;

public class BibEntryRelationsCache {
Renamed and tested. Also, we were not merging the relations but overwriting them when cacheOrMerge... was called (see code and fix).
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BibEntryRelationsRepository {
Renamed and tested.
@@ -71,7 +74,7 @@ public class CitationRelationsTab extends EntryEditorTab {
     private final GuiPreferences preferences;
     private final LibraryTab libraryTab;
     private final TaskExecutor taskExecutor;
-    private final BibEntryRelationsRepository bibEntryRelationsRepository;
+    private final SearchCitationsRelationsService searchCitationsRelationsService;
Introduces a service layer that segregates the fetching and repository logic definitions.
.onSuccess(fetchedList -> onSearchForRelationsSucceed(entry, listView, abortButton, refreshButton,
        searchType, importButton, progress, fetchedList, observableList))
this.createBackGroundTask(entry, searchType, shouldRefresh)
        .consumeOnRunning(task -> prepareToSearchForRelations(
Could probably be renamed applyOnRunning(Consumer<Task> consumer).
) {
    return switch (searchType) {
        case CitationFetcher.SearchType.CITES -> {
            citingTask = BackgroundTask.wrap(
I do not really like this solution. The method should return a Callable instead of a BackgroundTask, and it should not be possible to restart a search if one is already running for the same tab. I propose to refactor this in a follow-up PR, but let's focus on the cache refactoring first. For now, the logic is the same as before.
OK for me.
You can add a TODO comment if you want.
private static final Map<DOI, Set<BibEntry>> REFERENCES_MAP = new LRUMap<>(MAX_CACHED_ENTRIES, MAX_CACHED_ENTRIES);

public List<BibEntry> getCitations(BibEntry entry) {
    return entry
Returns a copy now.
            .toList();
}

public void cacheOrMergeCitations(BibEntry entry, List<BibEntry> citations) {
The method used to overwrite the cached data; it now merges the inputs, as its name suggests.
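For readers following along, here is a minimal sketch of the difference between overwriting and merging. It is not the code from this PR: RelationsCacheSketch and its methods are hypothetical names, and plain strings stand in for JabRef's DOI and BibEntry types.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the relations cache (not the PR's class).
// Keys are DOI strings and values are entry titles, to stay independent of JabRef types.
class RelationsCacheSketch {
    private final Map<String, Set<String>> citations = new HashMap<>();

    // Overwriting variant: relations cached earlier for the same DOI are lost.
    void cacheOverwriting(String doi, List<String> fetched) {
        citations.put(doi, new LinkedHashSet<>(fetched));
    }

    // Merging variant: newly fetched relations are added to what is already cached.
    void cacheOrMerge(String doi, List<String> fetched) {
        citations.computeIfAbsent(doi, key -> new LinkedHashSet<>()).addAll(fetched);
    }

    // Returns an immutable copy so callers cannot mutate the cached set.
    List<String> getCitations(String doi) {
        return List.copyOf(citations.getOrDefault(doi, Set.of()));
    }
}
```

With the merging variant, caching two overlapping lists for the same DOI keeps the union of both, and duplicates are dropped because the values are sets.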
import org.jabref.model.entry.BibEntry;

public class LRUBibEntryRelationsRepository implements BibEntryRelationsRepository {
Removed the fetcher logic from previous implementation.
Why are LRUBibEntryRelationsCache and LRUBibEntryRelationsRepository separated? Can't they be in one class, LRUBibEntryRelationsRepository?
I am not entirely sure of the final implementation yet. I will go with the emerging model because I would like to be able to chain caches (something like one on disk, one in memory...).
However, the cache was already dissociated from the repository in the previous implementation. For now, the cache uses static fields, and I clearly prefer to avoid referencing them from the repository. The cache would preferably be a singleton, while the repository should be dedicated to each tab instance.
Also, I guess that the repository here serves as an adapter between the domain code and the low-level logic (if any).
Finally, that way I am sure that we can reuse the repository in tests without having to instantiate the cache itself.
@@ -1,4 +1,4 @@
-package org.jabref.gui.entryeditor.citationrelationtab.semanticscholar;
+package org.jabref.logic.importer.fetcher;
Moved this to the logic package. Fetching is more of a back-end process and should belong to an adapter layer.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SearchCitationsRelationsService {
Fetching and Repository logic can now be injected into the orchestration logic, which should not vary in the next PR.
This should also make it possible to configure a citation search service based on the execution context. This approach can also enable new features, such as letting the user choose between multiple fetchers targeting other online search engines.
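As a rough illustration of that layering, here is a minimal sketch assuming a service that receives a fetcher and a repository through its constructor. All type and method names are hypothetical and strings stand in for JabRef's types; the actual interfaces in this PR may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fetcher abstraction: one implementation per online search engine.
interface RelationFetcherSketch {
    List<String> fetchCitations(String doi);
}

// Hypothetical repository abstraction: hides where the relations are stored.
interface RelationRepositorySketch {
    boolean containsCitations(String doi);

    List<String> readCitations(String doi);

    void storeCitations(String doi, List<String> citations);
}

// Simple in-memory repository, only here to make the sketch runnable.
class InMemoryRepositorySketch implements RelationRepositorySketch {
    private final Map<String, List<String>> store = new HashMap<>();

    public boolean containsCitations(String doi) {
        return store.containsKey(doi);
    }

    public List<String> readCitations(String doi) {
        return store.getOrDefault(doi, List.of());
    }

    public void storeCitations(String doi, List<String> citations) {
        store.put(doi, List.copyOf(citations));
    }
}

// Orchestration only: decides when to fetch, and delegates fetching and storage.
class SearchServiceSketch {
    private final RelationFetcherSketch fetcher;
    private final RelationRepositorySketch repository;

    SearchServiceSketch(RelationFetcherSketch fetcher, RelationRepositorySketch repository) {
        this.fetcher = fetcher;
        this.repository = repository;
    }

    List<String> searchCitations(String doi, boolean forceRefresh) {
        if (forceRefresh || !repository.containsCitations(doi)) {
            repository.storeCitations(doi, fetcher.fetchCitations(doi));
        }
        return repository.readCitations(doi);
    }
}
```

Swapping the fetcher or the repository then only requires passing a different implementation to the constructor; the orchestration code stays untouched.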
Please do not force-push unless needed. All commits will be squashed when merged.
* Move repository, cache, and fetcher to logic package * Move citations model to model/citations/semanticscholar package
* Introduce service layer * Rename LRU cache implementation * Add tests helpers for repository
* Move logic from repository to service * Refactor repositories * Update tab configuration
3155242
to
18db75e
Compare
Sorry, I just rebased onto the main branch locally.
…lation-tab-logic # Conflicts: # src/main/java/org/jabref/gui/entryeditor/citationrelationtab/CitationRelationsTab.java
In general looks good. Some minor comments.
Sorry for delay. Please go ahead with everything.
@@ -0,0 +1,60 @@
package org.jabref.logic.citation.service;
No need to create a package for a single class. The class can reside in the package org.jabref.logic.citation.
new Label(Localization.lang(
        "Error while fetching citing entries: %0", exception.getMessage())
)
Please do not reformat. Our tooling cannot deal with that - see https://devdocs.jabref.org/code-howtos/localization.html for some hints.
        }
        return List.of();
    },
    null
Not sure about the "null" here. But I think it is OK for now.
We want to get rid of nulls in JabRef. If we have one, we annotate it with jspecify. But in tests, it's OK.
It is just a deliberate mock; it should not be used from the application code.
I agree with you, null is an open door to bad experiences... as mocking frameworks sometimes (or often) are.
Other small comments: IntelliJ proposed to extract a method in the tests. Maybe you can also include that.
@alexandre-cremieux Please pull before you continue working on it - I merged |
var errMsg = "Error while fetching references for entry %s".formatted(
        referencer.getTitle()
);
LOGGER.error(errMsg);
Errors should be handled a little differently.
Like this:
LOGGER.error("Error while fetching references for entry %0", references.getTitle(), e)
(Hope I haven't missed the syntax and parameters)
So you see:
- The error should be the last argument, so that we have the full information.
- And LOGGER could be parametrized
In Logger, the syntax for placeholders is {}. %0 is used in the localization.
var errMsg = "Error while fetching citations for entry %s".formatted(
        cited.getTitle()
);
LOGGER.error(errMsg);
Same here.
Besides, could there be some kind of special error type, @koppor, or can we leave it as Exception?
Oh, didn't see that. A new exception inheriting from org.jabref.logic.JabRefException should be introduced.
Thanks for the review and the merge. I will resume the work on this branch and apply the changes. |
@alexandre-cremieux Sorry for the merge conflicts - can you handle them? I was always happy with IntelliJ's "resolve merge conflicts" dialog. Hope, it works in this case, too. |
Hello @koppor . Seems that we have new conflicts to resolve to be able to merge main. But I will do that when the feature will be fully developed. Was quite busy last month, I resumed the work this week. PR comments were addressed. |
Good to hear. - I think, huge changes won't be done in |
Hello @koppor . Thanks for your answer. Please do not merge main, there is a discussion we should probably have before going further. Working on the MVStore implementation, it became more clear that an ad-hoc serialization was needed to be able to store the You can find the related tests I wrote : commit I will expose a deeper analysis and a proposal on the issue page for us to be able to discuss the design, here: #11189 |
Why do you need the full BibEntry on disk? In the AI functionality, we just used the citation key - see https://github.com/JabRef/jabref/blob/main/docs/decisions/0034-use-citation-key-for-grouping-chat-messages.md. I think you need to persist information across sessions.
Ah, I think, you want to store the BibEntries NOT contained in the current library. Since the result of the server is a BibEntry (isn't it?), it is the right data type?
If the design is close to the code, discuss here. The issue is more user-facing. |
Hello. I do not need to save the full BibEntry to disk. That is the main concern: since we do not need the full set of fields of a BibEntry to represent a citation relation, why do we use the BibEntry structure to represent it?
In fact, my proposal was rather to store citation relations in a dedicated data structure instead of BibEntry (the data structure should of course belong to JabRef itself). The main logic will not change.
A JabRef has all the logic to insert a BibEntry into a library, check duplicates, ... all based on BibEntry. It can also render a BibEntry based on the selected preview style (which the current csl lib maybe does not do, but it should for consistency reasons). I know that |
Hello @koppor Thanks for the reply. I thought also about the comparator as you suggested on the issue page, this solution should satisfy our need (even if I would have prefer to separate the two contexts). I will resume the work using |
* Implement MVStore for relations as DAO * Implement LRUCache for relations as DAO
* Solve task 1 * Implementation of a DAO chain: memory cache and MVStore * Persist citations as relations to disk after a fetch * Avoid fetching data if relations are available from MVStore * Avoid reading data from MVStore if available in memory * Consume less from network, minimize disk usage
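The lookup order described in this commit message (memory first, then disk, then network) can be sketched as a chain of DAOs. This is only an illustration under simplifying assumptions: the names are hypothetical, both levels are backed by plain maps rather than an LRU cache and MVStore, and strings stand in for JabRef's types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical DAO abstraction shared by the memory and disk levels.
interface RelationDaoSketch {
    Optional<List<String>> read(String doi);

    void write(String doi, List<String> relations);
}

// Map-backed DAO standing in for both the in-memory cache and the on-disk store.
class MapDaoSketch implements RelationDaoSketch {
    private final Map<String, List<String>> store = new HashMap<>();
    int readCount = 0; // exposed only to observe how often this level is hit

    public Optional<List<String>> read(String doi) {
        readCount++;
        return Optional.ofNullable(store.get(doi));
    }

    public void write(String doi, List<String> relations) {
        store.put(doi, List.copyOf(relations));
    }
}

// Chain: try memory first, fall back to disk, and promote disk hits to memory.
class ChainedDaoSketch implements RelationDaoSketch {
    private final RelationDaoSketch memory;
    private final RelationDaoSketch disk;

    ChainedDaoSketch(RelationDaoSketch memory, RelationDaoSketch disk) {
        this.memory = memory;
        this.disk = disk;
    }

    public Optional<List<String>> read(String doi) {
        Optional<List<String>> cached = memory.read(doi);
        if (cached.isPresent()) {
            return cached;
        }
        Optional<List<String>> persisted = disk.read(doi);
        persisted.ifPresent(relations -> memory.write(doi, relations));
        return persisted;
    }

    public void write(String doi, List<String> relations) {
        disk.write(doi, relations);
        memory.write(doi, relations);
    }
}
```

An empty result from the chain would be the caller's cue to go to the network and then write the fetched relations back through the chain.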
Task 1 is solved: minimize network access and disk usage at the same time. To do for task 2 - manage force update:
|
Please do not merge master for now. |
…): * Solve part of task 2: make impossible to force a search on a BibEntry over a week since last insertion * The MVStoreDAO search lock is based on a timestamp map (doi -> lastInsertionDate) * All time computation are based on UTC * The LRU cache will always return true -> the computer could stay up during a week, leaving cache in memory
Task 2 is partially solved -> needs a decision To do next to fulfill the task:
Hello @koppor. The development is almost finished. However, there is a detail that needs your input. For now, the lock mechanism that avoids re-fetching the relations works when the store contains them for a DOI and the time elapsed between the insertion and a forced update is over a week. However, if a search returns no relations for a DOI, the user can still force a new fetch for this DOI, because the MVStore does not contain any relations for it. Would you like to prevent the user from forcing a new fetch when nothing was returned by a first successful fetch? (See the last point above.) Personally, I would prefer to avoid the fetch in that case as well.
Add a new getter to
I think, you can "borrow" code from
Depends on the error. At least, at the next launch of JabRef, there should be a retry be made.
How does the user "force" the update? Why does it depend on the number of entries? I would assume that a user has a refresh button triggering a fetch. I think, you need an additional map: from DOI to record(lastFetchDate, Optional). Then, you can check if information was fetched. |
Nothing has changed from the UI perspective: the
Error case is solved, no problems with that:
This is done also. I took the option to create a new map in the same store. Much easier, it is serializable as it (everything is UTC to be sure we are comparing apples to apples). Okay, so I guess we are on the same line. Also thanks for the Directories and example code. I will clean up everything. Should be available for review within a week. |
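A minimal sketch of such a timestamp map follows. It assumes the rule is that a refresh is allowed only when no fetch has been recorded for the DOI or when the recorded fetch is older than a configurable time-to-live. The class and method names are hypothetical, and a plain map stands in for the MVStore map.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers when relations were last fetched for a DOI,
// independently of whether the fetch returned any relations.
class FetchTimestampsSketch {
    private final Map<String, Instant> lastFetchByDoi = new HashMap<>();
    private final Duration timeToLive;

    FetchTimestampsSketch(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    // Called after every successful fetch, even when the result list is empty.
    void recordFetch(String doi, Instant now) {
        lastFetchByDoi.put(doi, now);
    }

    // Instant is an absolute point on the UTC time line, so no time-zone handling is needed.
    boolean isUpdatable(String doi, Instant now) {
        Instant lastFetch = lastFetchByDoi.get(doi);
        return lastFetch == null
                || Duration.between(lastFetch, now).compareTo(timeToLive) >= 0;
    }
}
```

Because the timestamp is recorded separately from the relations, a DOI whose first fetch returned nothing is locked for the same period as any other DOI.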
…ed an empty list * Solve completely Task 2: make impossible to force a search on a BibEntry over a week since last insertion
Task 1 and 2 are solved Note: see previous comment to know what still needs to be done. |
But why? This is not what I would expect as user. I could live with following thing:
Nice! |
Great! I've only left some comments on some code style and idioms.
It's awesome that you also wrote tests; not every part of JabRef is tested.
Oh, and I haven't reviewed your discussion with Oliver, so probably some of my questions were already answered 😅
citationsRelationsTabViewModel = new CitationsRelationsTabViewModel(databaseContext, preferences, undoManager, stateManager, dialogService, fileUpdateMonitor, taskExecutor);

try {
    var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
Have you discussed this with Oliver?
It's your local directory, but I think you probably understand that you should change it.
try {
    var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
    var citationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "citations");
    var relationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "references");
What about Path.resolve()? I think it's more idiomatic. What if toAbsolutePath() returns a trailing slash? That is why there is a dedicated resolve().
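A minimal sketch of the resolve() variant, also using named constants for the two store names. The class, method, and constant names are hypothetical; only Path.toAbsolutePath() and Path.resolve(String) come from the JDK.

```java
import java.nio.file.Path;

// Hypothetical helper: builds the store locations with Path.resolve
// instead of concatenating strings with File.separator.
class RelationPathsSketch {
    static final String CITATIONS_STORE = "citations";
    static final String REFERENCES_STORE = "references";

    static Path citationsPath(Path baseDirectory) {
        return baseDirectory.toAbsolutePath().resolve(CITATIONS_STORE);
    }

    static Path referencesPath(Path baseDirectory) {
        return baseDirectory.toAbsolutePath().resolve(REFERENCES_STORE);
    }
}
```

resolve() inserts the platform separator itself, so the result does not depend on how the base path was written.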
try {
    var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
    var citationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "citations");
    var relationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "references");
Can you also make constants for "citations" and "references"?
@@ -399,47 +425,61 @@ private void searchForRelations(BibEntry entry, CheckListView<CitationRelationIt

listView.setItems(observableList);

// TODO: It should not be possible to cancel a search task that is already running for same tab
if (citingTask != null && !citingTask.isCancelled() && searchType == CitationFetcher.SearchType.CITES) {
For nullable things we typically use Optional (even though IDEA will warn that "oh no, Optional is used as a field").
But still, Optional is better.
citedByTask != null && !citedByTask.isCancelled()
Could be rewritten like citedByTask.map(BackgroundTask::isCancelled).orElse(false) (could you please double-check if any parameters are needed in BackgroundTask::).
@koppor, using Optionals seems to be a little bit more verbose. Should we still use Optional there instead of null?
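To compare the two styles, here is a minimal sketch that uses java.util.concurrent.Future as a stand-in for BackgroundTask (an assumption; only isCancelled() is needed). Note that the null check contains a negation: mapping to isCancelled and defaulting to false gives the opposite answer for a running task, so the sketch filters on "not cancelled" instead.

```java
import java.util.Optional;
import java.util.concurrent.Future;

// Hypothetical helper comparing the null-based and Optional-based checks.
class TaskStateSketch {
    // Null-based check, same shape as the condition in the hunk above.
    static boolean isActiveNullable(Future<?> task) {
        return task != null && !task.isCancelled();
    }

    // Optional-based equivalent: present and not cancelled.
    static boolean isActiveOptional(Optional<? extends Future<?>> task) {
        return task.filter(t -> !t.isCancelled()).isPresent();
    }
}
```

Both methods return true only for a task that exists and has not been cancelled.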
        progress,
        fetchedList,
        observableList
))
.onFailure(exception -> {
    LOGGER.error("Error while fetching citing Articles", exception);
citing Articles -> citing articles
Just a typo 😃
MVStoreBibEntryRelationDAO(Path path, String mapName) {
    this.mapName = mapName;
    this.insertionTimeStampMapName = mapName + "-insertion-timestamp";
    this.storeConfiguration = new MVStore.Builder().autoCommitDisabled().fileName(path.toAbsolutePath().toString());
It would be nice if you place such chained methods in several lines, like this:
this.storeConfiguration = new MVStore.Builder()
        .autoCommitDisabled()
        .fileName(path.toAbsolutePath().toString());
That is what we often use.
            .orElse(true);
}

private static class BibEntrySerializer extends BasicDataType<BibEntry> {
Why is a new serialization technique needed? What about the canonicalized bib entry (I forgot the actual name, but using those words you can find it in the code)?
private static String toString(BibEntry entry) {
    return String.join(
            FIELD_SEPARATOR,
            entry.getTitle().orElse("null"),
Hmm... What if the title of a paper is null? Or the author?
In practice, of course not, but there are still such cases.
private static BibEntry fromString(String serializedString) {
    var fields = serializedString.split(FIELD_SEPARATOR);
    BibEntry entry = new BibEntry();
    extractFieldValue(fields[0]).ifPresent(title -> entry.setField(StandardField.TITLE, title));
Why are only these fields stored?
/**
 * Memory size is the sum of all aggregated bibEntries memory size plus 4 bytes.
 * Those 4 bytes are used to store the length of the collection itself.
It's better to move this explanation inside the getMemory() function, as it contains only implementation details.
Hello @InAnYan Thanks for the review and interesting feedback :) Will integrate your comments in the final version. I am using the TDD technique for development, the test is a valuable output of this. But most important is the design itself that comes out of this (DRY, patterns, etc), it is like agile at the code line level ;) It is not finished yet: clean up is missing. |
Hello @koppor Thanks for your reply.
As the issue definition covered only the storage, I didn't touch the current user experience or add anything to the view itself. I agree with you that the UX could be enhanced. I propose that we first close this case and that you assign me another issue/feature covering the user experience for this. This could be the opportunity to better handle the multithreading in this part. That way, we will clearly separate the concerns between the feature/issue tasks in the git history. Also, I am very sorry, but I got an unexpected overload of work recently. Do not expect a version for review before the end of next week.
OK. I filed #12247. I assume you are working on "Task 1" of the issue. Thus, I need to file "Task 2" as a separate issue?
OK! Please comment on #12247 so that I can assign you 😅
Sure. - Looking forward to welcome you back! |
Task 2 is included here. Just a question: do you want JabRef to automatically fetch citation relations once 7 days have passed since the last search, even if citations are already referenced in the store? (I guess you are asking that because of the cited-by list.) Otherwise, as said before, the UI already includes a refresh button, so the user can refresh the data manually. This seems okay to me. I might be wrong, but I guess that a cited-by list does not change that much. Also, limiting access to the network results in less energy consumption, less CO2, etc. 😅 The automation is up to you; both are possible with this implementation => just another check to add to the store.
Think, we need this configurable. With a higher default value. Maybe 30 days? - And also disable the feature by default.
Sure. Manual is always nice.
My papers got 200+ more citations the last months - thus I am interested in the new cites. Maybe, just a refresh is not enough. Maybe, I need an "automatic group" that automatically adds new citations. This is future work. |
Okay, I am taking a look to make the value configurable. I guess it should be defined in a new preference pane in Also, what do you mean exactly by disable the feature by default ? Disable the automatic fetch mechanism ?
So I was wrong :) The "automatic group" will be feasible with this design in the future: we have one and only one store for all the citations and one and only one for references. Also, I didn't take a look at Semantic Scholar's graph API, but maybe they offer a way to fetch citations for multiple DOIs at the same time, which could help JabRef developers to implement this. Just to know: what papers did you write? They seem to cover a hot topic, if I understand well.
… exhausted * Remove the isForceUpdate boolean * User is still able to trigger the fetch if an error occurs
Code update:
|
* Instantiate service in JabRefGui * Inject service in EntryEditor
Hello @koppor. Now, only preference for the MV store TTL time is missing. I thought about adding it under
Default value would be 7. It would be added under a new category: |
Refs #11189
This contribution aims to simplify the citations/references fetching and caching logic by introducing two layers:
This should help to make this feature more extensible without modifying the orchestration logic, following the open/closed principle.
Also, this PR will allow to introduce a new caching logic in coming PR.
Missing requirements for merging will come after draft review.
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)