Fix issue #11189 part 00 refactor citation relation tab logic #11845

Open
wants to merge 12 commits into
base: main

Contributor

@alexandre-cremieux alexandre-cremieux commented Sep 28, 2024

Refs #11189

This contribution aims to simplify the citations/references fetching and caching logic by introducing two layers:

  • service
  • repository

This should make the feature easier to extend without modifying the orchestration logic, following the open/closed principle.

This PR also prepares the ground for new caching logic in an upcoming PR.

The remaining requirements for merging will follow after the draft review.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@alexandre-cremieux alexandre-cremieux marked this pull request as draft September 28, 2024 20:38
Contributor

@github-actions github-actions bot left a comment

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
The tool reviewdog already placed comments on GitHub to indicate the places. See the tab "Files" in your PR.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

You can check reviewdog's comments at the tab "Files changed" of your pull request.

@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch from 9fe8522 to cbe9e96 Compare September 28, 2024 21:15
@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch from cbe9e96 to 8231340 Compare September 28, 2024 21:51
Contributor

@github-actions github-actions bot left a comment

Your code currently does not meet JabRef's code guidelines.
We use OpenRewrite to ensure "modern" Java coding practices.
The issues found can be automatically fixed.
Please execute the gradle task rewriteRun, check the results, commit, and push.

You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".

@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch from 8231340 to 33967c2 Compare September 28, 2024 22:20
@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch from 33967c2 to 592d4d7 Compare September 28, 2024 22:47
@koppor koppor changed the title Fix issue 11189 part 00 refactor citation relation tab logic Fix issue #11189 part 00 refactor citation relation tab logic Sep 29, 2024
@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch 2 times, most recently from d94f4d3 to 3155242 Compare September 29, 2024 13:24
Contributor Author

@alexandre-cremieux alexandre-cremieux left a comment

Add code explanations to the PR


import org.eclipse.jgit.util.LRUMap;

public class BibEntryRelationsCache {
Contributor Author

Renamed and tested. Also, we were not merging the relations but overwriting them when cacheOrMerge... was called (see code and fix).

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BibEntryRelationsRepository {
Contributor Author

Renamed and tested.

@@ -71,7 +74,7 @@ public class CitationRelationsTab extends EntryEditorTab {
private final GuiPreferences preferences;
private final LibraryTab libraryTab;
private final TaskExecutor taskExecutor;
private final BibEntryRelationsRepository bibEntryRelationsRepository;
private final SearchCitationsRelationsService searchCitationsRelationsService;
Contributor Author

Introduces a service layer that segregates the fetching and repository logic definitions.

.onSuccess(fetchedList -> onSearchForRelationsSucceed(entry, listView, abortButton, refreshButton,
searchType, importButton, progress, fetchedList, observableList))
this.createBackGroundTask(entry, searchType, shouldRefresh)
.consumeOnRunning(task -> prepareToSearchForRelations(
Contributor Author

Could probably be renamed applyOnRunning(Consumer<Task> consumer).

) {
return switch (searchType) {
case CitationFetcher.SearchType.CITES -> {
citingTask = BackgroundTask.wrap(
Contributor Author

I am not really happy with this solution. The method should return a Callable instead of a BackgroundTask, and it should not be possible to restart a search if one is already running for the same tab. I propose to refactor this in a follow-up PR, but let's focus on the cache refactoring first. For now, the logic is the same as before.

Member

OK for me.

You can add a TODO comment if you want.

private static final Map<DOI, Set<BibEntry>> REFERENCES_MAP = new LRUMap<>(MAX_CACHED_ENTRIES, MAX_CACHED_ENTRIES);

public List<BibEntry> getCitations(BibEntry entry) {
return entry
Contributor Author

Returns a copy now.

.toList();
}

public void cacheOrMergeCitations(BibEntry entry, List<BibEntry> citations) {
Contributor Author

The method used to overwrite the data; now it merges the inputs, as the method name says.
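
To illustrate the difference between overwriting and merging, here is a minimal sketch. It is not the PR's code: it assumes DOIs and entries are plain strings, and it swaps the JGit LRUMap for an access-ordered LinkedHashMap so that it only needs the standard library. The class name RelationsCacheSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the relations cache: DOIs and entries are plain strings here.
class RelationsCacheSketch {
    private static final int MAX_CACHED_ENTRIES = 100;

    // Access-ordered LinkedHashMap that evicts the least recently used key once full.
    private final Map<String, Set<String>> citations =
            new LinkedHashMap<>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Set<String>> eldest) {
                    return size() > MAX_CACHED_ENTRIES;
                }
            };

    // Merge: add to whatever is already cached for this DOI instead of replacing it.
    void cacheOrMergeCitations(String doi, List<String> newCitations) {
        citations.computeIfAbsent(doi, key -> new LinkedHashSet<>()).addAll(newCitations);
    }

    // Return an immutable copy so callers cannot mutate the cached set.
    List<String> getCitations(String doi) {
        return List.copyOf(citations.getOrDefault(doi, Set.of()));
    }
}
```

With a plain put(doi, newSet), the second call for the same DOI would drop what the first call stored; computeIfAbsent followed by addAll keeps both.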


import org.jabref.model.entry.BibEntry;

public class LRUBibEntryRelationsRepository implements BibEntryRelationsRepository {
Contributor Author

Removed the fetcher logic from previous implementation.

Collaborator

Why are LRUBibEntryRelationsCache and LRUBibEntryRelationsRepository separate? Can't they be in one class, LRUBibEntryRelationsRepository?

Contributor Author

I am not entirely sure about the final implementation yet. I will go with the emerging model because I would like to be able to chain caches (something like one on disk, one in memory...).

However, the cache was already dissociated from the repository in the previous implementation. For now the cache makes use of static fields, and I clearly prefer to avoid referencing them from the repository. The cache would preferably be a singleton, while the repository should be dedicated to each tab instance.

Also, I guess that the repository here serves as an adapter between the domain code and the low-level logic (if any).

Finally, that way I am sure that we can reuse the repository in tests without having to instantiate the cache itself.

@@ -1,4 +1,4 @@
package org.jabref.gui.entryeditor.citationrelationtab.semanticscholar;
package org.jabref.logic.importer.fetcher;
Contributor Author

Moved this to the logic package. Fetching is more of a back-end process and should belong to an adapter layer.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SearchCitationsRelationsService {
Contributor Author

Fetching and repository logic can now be injected into the orchestration logic, which should not change in the next PRs.

This should also make it possible to configure a citation search service based on the execution context. This approach can also enable new features like offering the user the possibility to choose between multiple fetchers targeting another online search engine.
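
As a rough sketch of what this injection could look like (all type and method names below are hypothetical, and entries are reduced to DOI strings; the actual signatures in the PR may differ):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fetcher contract: any online search engine could sit behind it.
interface CitationFetcherSketch {
    List<String> fetchCitations(String doi);
}

// Hypothetical repository contract: hides where the relations are stored.
interface RelationsRepositorySketch {
    boolean contains(String doi);
    List<String> read(String doi);
    void write(String doi, List<String> citations);
}

// Orchestration only: the service does not know which fetcher or store it talks to.
class SearchCitationsServiceSketch {
    private final CitationFetcherSketch fetcher;
    private final RelationsRepositorySketch repository;

    SearchCitationsServiceSketch(CitationFetcherSketch fetcher, RelationsRepositorySketch repository) {
        this.fetcher = fetcher;
        this.repository = repository;
    }

    List<String> searchCitations(String doi, boolean forceRefresh) {
        // Fetch only when asked to, or when nothing is stored yet.
        if (forceRefresh || !repository.contains(doi)) {
            repository.write(doi, fetcher.fetchCitations(doi));
        }
        return repository.read(doi);
    }
}

// Simple in-memory repository, for example for tests.
class InMemoryRepositorySketch implements RelationsRepositorySketch {
    private final Map<String, List<String>> store = new HashMap<>();

    public boolean contains(String doi) {
        return store.containsKey(doi);
    }

    public List<String> read(String doi) {
        return new ArrayList<>(store.getOrDefault(doi, List.of()));
    }

    public void write(String doi, List<String> citations) {
        store.put(doi, List.copyOf(citations));
    }
}
```

Swapping the fetcher (another search engine) or the repository (memory, disk) then only touches the constructor call, not searchCitations.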

@Siedlerchr
Member

Please no force push if not needed. All commits will be squashed when merged

* Move repository, cache, and fetcher to logic package
* Move citations model to model/citations/semanticscholar package
* Introduce service layer
* Rename LRU cache implementation
* Add tests helpers for repository
* Move logic from repository to service
* Refactor repositories
* Update tab configuration
@alexandre-cremieux alexandre-cremieux force-pushed the fix-issue-11189-part-00-refactor-citation-relation-tab-logic branch from 3155242 to 18db75e Compare September 29, 2024 15:01
@alexandre-cremieux
Contributor Author

Please no force push if not needed. All commits will be squashed when merged

Sorry, just re-based main branch locally.

…lation-tab-logic

# Conflicts:
#	src/main/java/org/jabref/gui/entryeditor/citationrelationtab/CitationRelationsTab.java
Member

@koppor koppor left a comment

In general looks good. Some minor comments.

Sorry for delay. Please go ahead with everything.

@@ -0,0 +1,60 @@
package org.jabref.logic.citation.service;
Member

No need to create a package for a single class. The class can reside in the package org.jabref.logic.citation.

Comment on lines 443 to 445
new Label(Localization.lang(
"Error while fetching citing entries: %0", exception.getMessage())
)
Member

Please do not reformat. Our tooling cannot deal with that - see https://devdocs.jabref.org/code-howtos/localization.html for some hints.

}
return List.of();
},
null
Member

Not sure about the "null" here. But I think, it is OK for now.

We want to get rid of nulls in JabRef. Where we do have them, we annotate with JSpecify. But in tests, it's OK.

Contributor Author

It is just a deliberate mock; it should not be used from the application code.

I agree with you, null is an open door to bad experiences... like mocking frameworks sometimes (or often).

@koppor koppor marked this pull request as ready for review October 10, 2024 20:29
@koppor
Member

koppor commented Oct 10, 2024

Small other comments - IntelliJ proposed to extract a method

private static SearchCitationsRelationsService getSearchCitationsRelationsService(BibEntry cited, List<BibEntry> citationsToReturn, BibEntryRelationsRepository citationsToReturn1) {

in the tests - maybe you can also include that.

@koppor
Member

koppor commented Oct 10, 2024

@alexandre-cremieux Please pull before you continue working on it - I merged main for you (and resolved conflicts).


var errMsg = "Error while fetching references for entry %s".formatted(
referencer.getTitle()
);
LOGGER.error(errMsg);
Collaborator

Errors should be handled a bit differently.

Like this:

LOGGER.error("Error while fetching references for entry %0", references.getTitle(), e)

(Hope I haven't missed the syntax and parameters)

So you see:

  • The exception should be the last argument, so that we have the full information.
  • And the LOGGER message can be parameterized

Member

In Logger, the syntax for placeholders is {}. %0 is used in the localization.

var errMsg = "Error while fetching citations for entry %s".formatted(
cited.getTitle()
);
LOGGER.error(errMsg);
Collaborator

@InAnYan InAnYan Oct 11, 2024

Same here.

Besides, could there be some kind of special error type, @koppor, or can we leave it as Exception?

Member

Oh, didn't see that. A new exception inheriting from org.jabref.logic.JabRefException should be introduced.

@alexandre-cremieux
Contributor Author

@alexandre-cremieux Please pull before you continue working on it - I merged main for you (and resolved conflicts).

Thanks for the review and the merge. I will resume the work on this branch and apply the changes.

@koppor
Member

koppor commented Nov 8, 2024

@alexandre-cremieux Sorry for the merge conflicts - can you handle them? I was always happy with IntelliJ's "resolve merge conflicts" dialog. Hope, it works in this case, too.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 8, 2024

@alexandre-cremieux Sorry for the merge conflicts - can you handle them? I was always happy with IntelliJ's "resolve merge conflicts" dialog. Hope, it works in this case, too.

Hello @koppor . Seems that we have new conflicts to resolve to be able to merge main. But I will do that when the feature will be fully developed. Was quite busy last month, I resumed the work this week. PR comments were addressed.

@koppor
Member

koppor commented Nov 11, 2024

@alexandre-cremieux Sorry for the merge conflicts - can you handle them? I was always happy with IntelliJ's "resolve merge conflicts" dialog. Hope, it works in this case, too.
Hello @koppor . Seems that we have new conflicts to resolve to be able to merge main. But I will do that when the feature will be fully developed. Was quite busy last month, I resumed the work this week. PR comments were addressed.

Good to hear. - I think, huge changes won't be done in main the next weeks. Thus, it would be a good idea to merge main. This would enable the CI to run. I could try to merge if I have time.

@alexandre-cremieux
Contributor Author

Hello @koppor . Thanks for your answer.

Please do not merge main, there is a discussion we should probably have before going further.

Working on the MVStore implementation, it became more clear that an ad-hoc serialization was needed to be able to store the BibEntry to disk and the way it should be done make me understand that the BibEntry type is maybe not the most suitable one to address the CitationTab use case.

You can find the related tests I wrote : commit

I will expose a deeper analysis and a proposal on the issue page for us to be able to discuss the design, here: #11189

@koppor
Member

koppor commented Nov 11, 2024

Working on the MVStore implementation, it became more clear that an ad-hoc serialization was needed to be able to store the BibEntry to disk

Why do you need the full BibEntry on disk?

In the AI functionality, we just used the citation key - see https://github.com/JabRef/jabref/blob/main/docs/decisions/0034-use-citation-key-for-grouping-chat-messages.md. I think you need to persist information across sessions.

and the way it should be done make me understand that the BibEntry type is maybe not the most suitable one to address the CitationTab use case.

Ah, I think, you want to store the BibEntries NOT contained in the current library. Since the result of the server is a BibEntry (isn't it?), it is the right data type?

I will expose a deeper analysis and a proposal on the issue page for us to be able to discuss the design, here: #11189

If the design is close to the code, discuss here. The issue is more user-facing.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 12, 2024

Why do you need the full BibEntry on disk?

Hello. I do not need to save the full BibEntry to disk. It is the main concern: as we do not need the full set of fields of a BibEntry to represent a citation relation then why do we use BibEntry structure to represent the citation relation ?

Ah, I think, you want to store the BibEntries NOT contained in the current library. Since the result of the server is a BibEntry (isn't it?), it is the right data type?

In fact, my proposal was rather to store citation relations in a dedicated data structure instead of BibEntry (the data structure should of course belong to JabRef itself). The main logic will not change.

@koppor
Member

koppor commented Nov 13, 2024

Hello. I do not need to save the full BibEntry to disk. It is the main concern: as we do not need the full set of fields of a BibEntry to represent a citation relation then why do we use BibEntry structure to represent the citation relation ?

A BibEntry does not enforce to store all fields.

JabRef has all the logic to insert a BibEntry into a library, check duplicates, ... all based on BibEntry. It can also render a BibEntry based on the selected preview style (which the current csl lib maybe does not do, but it should for consistency reasons).

I know that entry.getField(StandardField.TITLE) reads strange. It is OK for me to introduce methods such as org.jabref.model.entry.BibEntry#getTitle for readability.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 13, 2024

Hello. I do not need to save the full BibEntry to disk. It is the main concern: as we do not need the full set of fields of a BibEntry to represent a citation relation then why do we use BibEntry structure to represent the citation relation ?

A BibEntry does not enforce to store all fields.

JabRef has all the logic to insert a BibEntry into a library, check duplicates, ... all based on BibEntry. It can also render a BibEntry based on the selected preview style (which the current csl lib maybe does not do, but it should for consistency reasons).

I know that entry.getField(StandardField.TITLE) reads strange. It is OK for me to introduce methods such as org.jabref.model.entry.BibEntry#getTitle for readability.

Hello @koppor

Thanks for the reply. I also thought about the comparator you suggested on the issue page; this solution should satisfy our need (even if I would have preferred to separate the two contexts). As agreed, I will resume the work using BibEntry to store the citations in the MVStore, and rely on JabRef's logic for duplicate detection.

* Implement MVStore for relations as DAO
* Implement LRUCache for relations as DAO
* Solve task 1
* Implementation of a DAO chain: memory cache and MVStore
* Persist citations as relations to disk after a fetch
* Avoid fetching data if relations are available from MVStore
* Avoid reading data from MVStore if available in memory
* Consume less from network, minimize disk usage
@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 17, 2024

Task 1 is solved: minimize network access and disk usage at the same time.
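
The lookup order behind this (memory first, then disk, network only as a last resort) can be sketched as a chain of DAOs. This is only an illustration under assumptions: the interface and class names are hypothetical, a map-backed DAO stands in for both the LRU cache and the MVStore, and relations are plain strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical DAO contract shared by every link of the chain.
interface RelationDaoSketch {
    Optional<List<String>> find(String doi);
    void store(String doi, List<String> relations);
}

// Map-backed DAO used here as a stand-in for both the memory cache and the disk store.
class MapDaoSketch implements RelationDaoSketch {
    final Map<String, List<String>> data = new HashMap<>();

    public Optional<List<String>> find(String doi) {
        return Optional.ofNullable(data.get(doi));
    }

    public void store(String doi, List<String> relations) {
        data.put(doi, List.copyOf(relations));
    }
}

// Chain: ask the fast DAO first, fall back to the slow one and copy hits forward.
class ChainedDaoSketch implements RelationDaoSketch {
    private final RelationDaoSketch memory;
    private final RelationDaoSketch disk;

    ChainedDaoSketch(RelationDaoSketch memory, RelationDaoSketch disk) {
        this.memory = memory;
        this.disk = disk;
    }

    public Optional<List<String>> find(String doi) {
        Optional<List<String>> cached = memory.find(doi);
        if (cached.isPresent()) {
            return cached; // no disk read needed
        }
        Optional<List<String>> persisted = disk.find(doi);
        persisted.ifPresent(relations -> memory.store(doi, relations));
        return persisted; // empty means the caller has to fetch from the network
    }

    public void store(String doi, List<String> relations) {
        disk.store(doi, relations);
        memory.store(doi, relations);
    }
}
```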

To do for task 2 - manage force update:

  • Add a method to the DAO API: boolean isUpdatable(BibEntry entry). Implementation should return true if the store contains the key or if the last update <= 7 day back.
  • Update the MVStoreDAO implementation in order to manage another map exposing the last insertion date.

@alexandre-cremieux
Contributor Author

Please do not merge master for now.

…):

* Solve part of task 2: make it impossible to force a search on a BibEntry less than a week after the last insertion
* The MVStoreDAO search lock is based on a timestamp map (doi -> lastInsertionDate)
* All time computations are based on UTC
* The LRU cache always returns true, since the computer could stay up for a week, leaving the cache in memory
@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 24, 2024

Task 2 is partially solved -> needs a decision

To do next to fulfill the task:

  • Define the right place to configure the stores path (files for the MVStore)
  • Clean code
  • Add documentation
  • Inject the repository using the IOC framework to be sure it is a singleton using same DAOs for all tabs
  • Solve last issue with the lock: should it be impossible to search for relations regarding a DOI in case a successful fetch does not return data ? In that case the simplest solution would be to insert an empty collection of relations into the MVStore for this DOI.

Hello @koppor.

The development is almost finished. However, there is a detail that needs your input. For now, the lock mechanism that avoids re-fetching the relations works when the store contains them for a DOI and less than a week has elapsed between the insertion and the forced update.

However, if a search is done and returns no relations for a DOI then the user can still force update the fetch for this DOI cause the MVStore does not contain those relations.

Would you like to prevent the user from forcing an update when a first successful fetch returned nothing? (See the last point above.) Personally, I would prefer to avoid the fetch in that case too.

@koppor
Member

koppor commented Nov 25, 2024

* Define the right place to configure the stores path (files for the MVStore)

Add a new getter to org.jabref.logic.util.Directories. You will see the pattern.

* Inject the repository using the IOC framework to be sure it is a singleton using same DAOs for all tabs

I think, you can "borrow" code from org.jabref.gui.maintable.MainTableDataModel.SearchIndexListener or org.jabref.logic.search.LuceneIndexer.

* Solve last issue with the lock: should it be impossible to search for relations regarding a DOI in case a successful fetch does not return data ? In that case the simplest solution would be to insert an empty collection of relations into the MVStore for this DOI.

Depends on the error. At least, at the next launch of JabRef, there should be a retry be made.

However, if a search is done and returns no relations for a DOI then the user can still force update the fetch for this DOI cause the MVStore does not contain those relations.

How does the user "force" the update? Why does it depend on the number of entries? I would assume that a user has a refresh button triggering a fetch.

I think, you need an additional map: from DOI to record(lastFetchDate, Optional). Then, you can check if information was fetched.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 27, 2024

How does the user "force" the update? Why does it depend on the number of entries? I would assume that a user has a refresh button triggering a fetch.

Nothing has changed from the UI perspective: the Restart search button is still available. If the user clicks on it, a search is triggered. BUT JabRef won't execute the search and stick to the store if the new search is executed within a week after the last one occurred (insertion date 2 - insertion date 1 <= 7 days).
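
A minimal sketch of that one-week rule, assuming a timestamp map from DOI to last insertion time. The class name is hypothetical and a HashMap stands in for the MVStore map; Instant values are UTC-based, so no time zone handling is needed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical guard around a timestamp map (DOI -> last insertion time).
class RefetchGuardSketch {
    private static final Duration MIN_AGE = Duration.ofDays(7);

    private final Map<String, Instant> lastInsertion = new HashMap<>();

    // Called after a successful fetch has been written to the store.
    void recordInsertion(String doi, Instant now) {
        lastInsertion.put(doi, now);
    }

    // True when nothing was stored yet, or when the stored data is more than 7 days old.
    boolean isUpdatable(String doi, Instant now) {
        return Optional.ofNullable(lastInsertion.get(doi))
                       .map(inserted -> Duration.between(inserted, now).compareTo(MIN_AGE) > 0)
                       .orElse(true);
    }
}
```

Passing the current time in as a parameter (Instant.now() in production) keeps the rule easy to test without waiting a week.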

Depends on the error. At least, at the next launch of JabRef, there should be a retry be made.

Error case is solved, no problems with that:

  • No insertion will be done in case the fetcher throws an exception
  • If no insertion occurs then the time stamp will not be inserted into the store

I think, you need an additional map: from DOI to record(lastFetchDate, Optional). Then, you can check if information was fetched.

This is done as well. I chose to create a new map in the same store. Much easier: it is serializable as is (everything is UTC, to be sure we are comparing apples to apples).

Okay, so I guess we are on the same line. Also thanks for the Directories and example code. I will clean up everything. Should be available for review within a week.

…ed an empty list

* Completely solve Task 2: make it impossible to force a search on a BibEntry less than a week after the last insertion
@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 27, 2024

Task 1 and 2 are solved

Note: see previous comment to know what still needs to be done.

@koppor
Member

koppor commented Nov 28, 2024

How does the user "force" the update? Why does it depend on the number of entries? I would assume that a user has a refresh button triggering a fetch.
Nothing has changed from the UI perspective: the Restart search button is still available. If the user clicks on it, a search is triggered. BUT JabRef won't execute the search and stick to the store if the new search is executed within a week after the last one occurred (insertion date 2 - insertion date 1 <= 7 days).

But why? This is not what I would expect as user.

I could live with the following:

  1. If there was an error: retry
  2. If below threshold time: ask in a dialog whether to really refetch
  3. Refetch
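
The three cases above could be expressed as a small decision function. This is only a sketch: the enum, the class, and the parameter names are hypothetical, and the threshold is passed in rather than fixed.

```java
import java.time.Duration;

// Hypothetical outcome of a click on the refresh button.
enum RefetchDecision { RETRY, ASK_USER, REFETCH }

class RefetchPolicySketch {
    static RefetchDecision decide(boolean lastFetchFailed, Duration sinceLastFetch, Duration threshold) {
        if (lastFetchFailed) {
            return RefetchDecision.RETRY;      // 1. there was an error: retry
        }
        if (sinceLastFetch.compareTo(threshold) < 0) {
            return RefetchDecision.ASK_USER;   // 2. below threshold: confirm in a dialog
        }
        return RefetchDecision.REFETCH;        // 3. otherwise just refetch
    }
}
```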

Okay, so I guess we are on the same line. Also thanks for the Directories and example code. I will clean up everything. Should be available for review within a week.

Nice!

Collaborator

@InAnYan InAnYan left a comment

Great! I've only left some comments on some code style and idioms.

It's awesome that you have also written tests; not every part of JabRef is tested.

Oh, and I haven't reviewed your discussion with Oliver, so probably some of my questions were already answered 😅

citationsRelationsTabViewModel = new CitationsRelationsTabViewModel(databaseContext, preferences, undoManager, stateManager, dialogService, fileUpdateMonitor, taskExecutor);

try {
var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
Collaborator

Have you discussed this with Oliver?

It's your local directory, but I think you probably understand that you should change it.

try {
var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
var citationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "citations");
var relationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "references");
Collaborator

What about Path.resolve()? I think it's more idiomatic. What if toAbsolutePath() returns a path with a trailing slash? That is why there is a dedicated resolve().

try {
var jabRefPath = Paths.get("/home/sacha/Documents/projects/JabRef");
var citationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "citations");
var relationsPath = Path.of(jabRefPath.toAbsolutePath() + File.separator + "references");
Collaborator

Can you also make constants for "citations" and "references"?
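
Taken together, the resolve() and constants suggestions could look roughly like this. The class, constant, and method names are hypothetical; only Path.resolve() and toAbsolutePath() come from the JDK.

```java
import java.nio.file.Path;

// Hypothetical helper that builds the two store locations below a base directory.
class RelationStorePathsSketch {
    // Named constants instead of string literals scattered through the code.
    static final String CITATIONS_STORE = "citations";
    static final String REFERENCES_STORE = "references";

    static Path citationsPath(Path baseDirectory) {
        // resolve() joins the segments itself, so no manual File.separator handling.
        return baseDirectory.toAbsolutePath().resolve(CITATIONS_STORE);
    }

    static Path referencesPath(Path baseDirectory) {
        return baseDirectory.toAbsolutePath().resolve(REFERENCES_STORE);
    }
}
```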

@@ -399,47 +425,61 @@ private void searchForRelations(BibEntry entry, CheckListView<CitationRelationIt

listView.setItems(observableList);

// TODO: It should not be possible to cancel a search task that is already running for same tab
if (citingTask != null && !citingTask.isCancelled() && searchType == CitationFetcher.SearchType.CITES) {
Collaborator

For nullable things we typically use Optional (even though IDEA will warn that an Optional is used as a field).

But still, Optional is better.

Collaborator

citedByTask != null && !citedByTask.isCancelled()

Could be rewritten like !citedByTask.map(BackgroundTask::isCancelled).orElse(true) (could you please double-check if any parameters are needed in BackgroundTask::).

@koppor, using Optionals seems to be a little bit more verbose. Should we still use Optional there instead of null?
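To make the trade-off concrete, here is a minimal sketch of the Optional variant. `Task` is a hypothetical stand-in, not JabRef's `BackgroundTask`. Note that `map(Task::isCancelled).orElse(false)` answers "is it cancelled?", so the negation from the original null check has to be kept to preserve the behavior.

```java
import java.util.Optional;

final class OptionalTaskSketch {
    // Hypothetical stand-in for a cancellable background task.
    static final class Task {
        private boolean cancelled;

        void cancel() {
            cancelled = true;
        }

        boolean isCancelled() {
            return cancelled;
        }
    }

    // Equivalent of: task != null && !task.isCancelled()
    static boolean isActive(Optional<Task> task) {
        return task.filter(t -> !t.isCancelled()).isPresent();
    }

    public static void main(String[] args) {
        Task running = new Task();
        Task stopped = new Task();
        stopped.cancel();

        System.out.println(isActive(Optional.empty()));
        System.out.println(isActive(Optional.of(running)));
        System.out.println(isActive(Optional.of(stopped)));
    }
}
```

An empty Optional and a cancelled task are both treated as "not active", which matches the null check in the diff.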

progress,
fetchedList,
observableList
))
.onFailure(exception -> {
LOGGER.error("Error while fetching citing Articles", exception);
Collaborator

citing Articles -> citing articles

just a typo 😃

MVStoreBibEntryRelationDAO(Path path, String mapName) {
this.mapName = mapName;
this.insertionTimeStampMapName = mapName + "-insertion-timestamp";
this.storeConfiguration = new MVStore.Builder().autoCommitDisabled().fileName(path.toAbsolutePath().toString());
Collaborator

It would be nice if you placed such chained method calls on several lines, like this:

this.storeConfiguration = new MVStore.Builder()
	.autoCommitDisabled()
	.fileName(path.toAbsolutePath().toString());

That is what we often use.

.orElse(true);
}

private static class BibEntrySerializer extends BasicDataType<BibEntry> {
Collaborator

Why is a new serialization technique needed? What about the canonicalized bib entry (I forgot the actual name, but you can find it in the code using those words)?

private static String toString(BibEntry entry) {
return String.join(
FIELD_SEPARATOR,
entry.getTitle().orElse("null"),
Collaborator

Hmm... What if the title of a paper is null? Or the author?

In practice, of course not, but there are still such cases.
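One reading of this concern is a collision between real values and the "null" sentinel. The sketch below assumes, without having checked the PR, that the decoding side maps the literal string "null" back to an empty value. All names are hypothetical. It also shows one possible alternative, a one-character presence flag.

```java
import java.util.Optional;

final class SentinelSketch {
    // Sentinel string, as used in the snippet under review.
    static final String ABSENT = "null";

    static String encode(Optional<String> value) {
        return value.orElse(ABSENT);
    }

    // Assumed decoding: the sentinel is mapped back to "no value".
    static Optional<String> decode(String stored) {
        return ABSENT.equals(stored) ? Optional.empty() : Optional.of(stored);
    }

    // Alternative: prefix "1" for a present value, store "0" for an absent one.
    static String encodeFlagged(Optional<String> value) {
        return value.map(v -> "1" + v).orElse("0");
    }

    static Optional<String> decodeFlagged(String stored) {
        return stored.startsWith("1")
                ? Optional.of(stored.substring(1))
                : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> oddTitle = Optional.of("null");
        // With the sentinel, a title that is literally "null" is lost.
        System.out.println(decode(encode(oddTitle)));
        // With the flag, the same title survives the round trip.
        System.out.println(decodeFlagged(encodeFlagged(oddTitle)));
    }
}
```

The same kind of collision can occur if a field value contains the field separator, so an escaping scheme or an existing serializer may be the more robust choice.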

private static BibEntry fromString(String serializedString) {
var fields = serializedString.split(FIELD_SEPARATOR);
BibEntry entry = new BibEntry();
extractFieldValue(fields[0]).ifPresent(title -> entry.setField(StandardField.TITLE, title));
Collaborator

Why are only these fields stored?


/**
* Memory size is the sum of all aggregated bibEntries memory size plus 4 bytes.
* Those 4 bytes are used to store the length of the collection itself.
Collaborator

It's better to move this explanation inside the getMemory() function, as it contains only implementation details.

@alexandre-cremieux
Contributor Author

Great! I've only left some comments on some code style and idioms.

It's awesome that you've also written tests; not every part of JabRef is tested.

Oh, and I haven't reviewed your discussion with Oliver, so probably some of my questions were already answered 😅

Hello @InAnYan

Thanks for the review and interesting feedback :) Will integrate your comments in the final version.

I am using TDD for development, and the tests are a valuable output of this. But most important is the design that comes out of it (DRY, patterns, etc.); it is like agile at the level of individual lines of code ;)

It is not finished yet: cleanup is still missing.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 30, 2024

Hello @koppor

Thanks for your reply.

But why? This is not what I would expect as user.

I could live with following thing:

1. If there was an error: retry

2. If below threshold time: ask in a dialog whether to really refetch

3. Refetch

As the issue definition covered only the storage, I didn't touch the current user experience or add anything to the view itself.

I agree with you that the UX could be enhanced. I propose that we first close this case and that you assign me another issue/feature covering the user experience for this. This could be the opportunity to better handle the multithreading in this part.

That way, we will clearly separate the concerns between the features/issues task in the git history.

Also, I am very sorry, but I got an unexpected overload of work recently. Do not expect a version for review before end of next week.

@koppor
Member

koppor commented Nov 30, 2024

As the issue definition covered only the storage, I didn't touch the current user experience or add anything to the view itself.

OK. I filed #12247.

I assume, you are working on "Task 1" of the issue. Thus, I need to file "Task 2" as separate issue?

I agree with you that the UX could be enhanced. I propose that we first close this case and that you assign me another issue/feature covering the user experience for this. This could be the opportunity to better handle the multithreading in this part.

OK! Please comment on #12247 so that I can assign you 😅

Also, I am very sorry, but I got an unexpected overload of work recently. Do not expect a version for review before end of next week.

Sure. Looking forward to welcoming you back!

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Nov 30, 2024

I assume, you are working on "Task 1" of the issue. Thus, I need to file "Task 2" as separate issue?

Task 2 is included here. Just a question: do you want JabRef to automatically fetch citation relations 7 days after the last search, even if citations are already in the store? (I guess you are asking that because of the cited by list.)

Otherwise, as said before, the UI already includes a refresh button, so the user can refresh the data themselves. This seems okay to me. I might be wrong, but I guess that a cited by list does not change so much. Also, limiting network access results in less energy consumption, less CO2, etc. 😅

Up to you for this automation; both are possible with this implementation. It is just another check to add to the store.
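For illustration, such a check could look roughly like the sketch below. It is not the code from this PR: the method name and the `lastFetch` parameter are hypothetical, and it assumes the store keeps one timestamp per entry for the last successful fetch. Passing a `Clock` keeps the check testable.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Optional;

final class TtlSketch {
    // True when nothing was fetched yet, or when the last fetch is older than ttl.
    static boolean shouldRefetch(Optional<Instant> lastFetch, Duration ttl, Clock clock) {
        return lastFetch
                .map(time -> time.plus(ttl).isBefore(clock.instant()))
                .orElse(true);
    }

    public static void main(String[] args) {
        // Fixed clock so the example is deterministic.
        Clock clock = Clock.fixed(Instant.parse("2024-12-08T00:00:00Z"), ZoneOffset.UTC);
        Duration ttl = Duration.ofDays(7);

        Instant eightDaysAgo = Instant.parse("2024-11-30T00:00:00Z");
        Instant threeDaysAgo = Instant.parse("2024-12-05T00:00:00Z");

        System.out.println(shouldRefetch(Optional.empty(), ttl, clock));
        System.out.println(shouldRefetch(Optional.of(eightDaysAgo), ttl, clock));
        System.out.println(shouldRefetch(Optional.of(threeDaysAgo), ttl, clock));
    }
}
```

Because the time-to-live is a parameter, the same check works whether the value is hard-coded or read from a preference.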

@koppor
Member

koppor commented Dec 3, 2024

I assume, you are working on "Task 1" of the issue. Thus, I need to file "Task 2" as separate issue?
Task 2 is included here. Just a question: do you want JabRef to automatically fetch citation relations 7 days after the last search, even if citations are already in the store? (I guess you are asking that because of the cited by list.)

I think we need this to be configurable, with a higher default value. Maybe 30 days? And the feature should also be disabled by default.

Otherwise, as said before, the UI already includes a refresh button, so the user can refresh the data themselves. This seems okay to me.

Sure. Manual is always nice.

I might be wrong, but I guess that a cited by list does not change so much.

My papers got 200+ more citations in the last months, so I am interested in the new cites. Maybe just a refresh is not enough. Maybe I need an "automatic group" that automatically adds new citations. This is future work.

@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Dec 7, 2024

I assume, you are working on "Task 1" of the issue. Thus, I need to file "Task 2" as separate issue?
Task 2 is included here. Just a question: do you want JabRef to automatically fetch citation relations 7 days after the last search, even if citations are already in the store? (I guess you are asking that because of the cited by list.)

I think we need this to be configurable, with a higher default value. Maybe 30 days? And the feature should also be disabled by default.

Okay, I will look into making the value configurable. I guess it should be defined in a new preference pane in JabRefGuiPreferences. What should we call it? Maybe simply "Citation relations preferences"?

Also, what exactly do you mean by disabling the feature by default? Disabling the automatic fetch mechanism?

I might be wrong, but I guess that a cited by list does not change so much.

My papers got 200+ more citations in the last months, so I am interested in the new cites. Maybe just a refresh is not enough. Maybe I need an "automatic group" that automatically adds new citations. This is future work.

So I was wrong :)

The "automatic group" will be feasible with this design in the future: we have exactly one store for all the citations and exactly one for all the references. Also, I haven't looked at Semantic Scholar's Graph API yet, but maybe it offers a way to fetch citations for multiple DOIs at once, which could help JabRef developers implement this.

Just curious: what papers did you write? They seem to cover a hot topic, if I understand correctly.

… exhausted

* Remove the isForceUpdate boolean
* User is still able to trigger the fetch if an error occurs
@alexandre-cremieux
Contributor Author

Code update:

  • The update is done automatically once the guard delay of the repository has expired
  • The user is still able to re-trigger the fetch if an error occurs (the UX improvement will be done later: Improve "Refresh button" of CSL Preview #12247)

* Instantiate service in JabRefGui
* Inject service in EntryEditor
@alexandre-cremieux
Contributor Author

alexandre-cremieux commented Jan 11, 2025

Hello @koppor.

Now only the preference for the MVStore TTL is missing. I thought about adding it under the Web search preferences like this:

Citation relations search cache time-to-live (in days): [value-here]

Default value would be 7.

It would be added under a new category: Citation relations web search
