Cannot open PDF files with commas in the name #371

Closed
MassimoLauria opened this issue Apr 14, 2021 · 10 comments

@MassimoLauria

MassimoLauria commented Apr 14, 2021

Until some time ago I never had this issue, but now I cannot open PDF files attached to a bib entry when the filename contains a comma. I use the file field to attach PDFs.

I can open files normally when their ASCII filenames contain no : (colon), ; (semicolon), or , (comma). My understanding, however, was that only colons and semicolons were reserved in the file field and that commas were allowed. Indeed, I used commas for years with no issues.

Here's an example:

(setq bibtex-completion-library-path "/home/massimo/cloud/Papers/"
      bibtex-completion-pdf-field "file")

(setq
 examplegood '(("=key=" . "Bell2020AutomatingRegular")
               ("=type=" . "article")
               ("author" . "Zoe Bell")
               ("title" . "Automating Regular or Ordered Resolution is NP-Hard")
               ("pages" . "")
               ("journal" . "Electronic Colloquium on Computational Complexity {(ECCC)}")
               ("year" . "2020")
               ("volume" . "105")
               ("file" . ":Bell (2020) - Automating Regular or Ordered Resolution is NP-Hard.pdf:PDF")
               ("url" . "https://eccc.weizmann.ac.il/report/2020/105"))

 examplebad '(("=key=" . "GurevichShelah1996FiniteRigid")
              ("=type=" . "article")
              ("author" . "Yuri Gurevich and Saharon Shelah")
              ("title" . "On Finite Rigid Structures")
              ("file" . ":Gurevich, Shelah (1996) - On Finite Rigid Structures.pdf:PDF")
              ("journal" . "J. Symb. Log.")
              ("year" . "1996")
              ("volume" . "61")
              ("number" . "2")
              ("pages" . "549\\nobreakdash--562")
              ("url" . "https://doi.org/10.2307/2275675")
              ("doi" . "10.2307/2275675")))

(bibtex-completion-get-value "file" examplebad)    ;;finds the path
(bibtex-completion-get-value "file" examplegood)   ;;finds the path

(bibtex-completion-find-pdf-in-field examplegood)  ;;finds the path
(bibtex-completion-find-pdf-in-field examplebad)   ;; returns nil

The likely culprit is the expression

(replace-regexp-in-string "\\([^\\]\\)[;,]" "\\1\^^" value)

in the function bibtex-completion-find-pdf-in-field. It kills the commas because it considers them separators in the file field.
I don't know what the formal spec for the file field is, but I am pretty sure commas were not a problem until recently.
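
To make the effect concrete, here is a minimal sketch that applies the quoted substitution to the failing file value and then splits on the inserted control character. The helper name my/demo-split-file-field is hypothetical and only mimics this one step; it is not the actual body of bibtex-completion-find-pdf-in-field.

```elisp
;; Hypothetical helper: reproduces only the separator substitution quoted above.
(defun my/demo-split-file-field (value)
  "Mark unescaped `;' and `,' in VALUE as separators, then split on them."
  (split-string
   ;; "\^^" is a control character used as a temporary record separator.
   (replace-regexp-in-string "\\([^\\]\\)[;,]" "\\1\^^" value)
   "\^^"))

(my/demo-split-file-field
 ":Gurevich, Shelah (1996) - On Finite Rigid Structures.pdf:PDF")
;; The comma after "Gurevich" is treated as a separator, so the value is cut
;; into ":Gurevich" and " Shelah (1996) - On Finite Rigid Structures.pdf:PDF".
```

Neither fragment names an existing file, which would be consistent with the lookup returning nil.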

@tmalsburg
Owner

Calibre uses commas to separate multiple PDFs. See #360. The trouble is that there's no standard syntax for the file field and every bibliography manager uses their own variant. That's the reason why I personally don't use the file field. In my setup, the filenames of PDFs follow the scheme [BibTeX-key].pdf. This also speeds up parsing of the bibliography a bit.
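
As a rough illustration of the [BibTeX-key].pdf scheme, the lookup reduces to building one path from the key and checking that it exists, with no field parsing involved. The function name my/pdf-path-for-key is hypothetical; this is a sketch of the idea, not the code bibtex-completion actually runs.

```elisp
;; Hypothetical helper illustrating the key-based naming scheme.
(defun my/pdf-path-for-key (key directory)
  "Return the path DIRECTORY/KEY.pdf if that file exists, else nil."
  (let ((path (expand-file-name (concat key ".pdf") directory)))
    (and (file-exists-p path) path)))

;; Example call, using the library path and a key from the report above:
;; (my/pdf-path-for-key "Bell2020AutomatingRegular" "/home/massimo/cloud/Papers/")
```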

@tmalsburg
Owner

tmalsburg commented Apr 15, 2021

I'd probably just mass-replace commas with underscores (or similar). Easy to do with dired (C-x C-q).

Edit: Closing because I'm afraid there is nothing we can do to resolve this conflict. Feel free to reopen if you have an idea.

@MassimoLauria
Author

Thank you for the quick answer.

I'd rather not touch the paper file names for various reasons:

  1. there may be other references to it, since a filename is a "public API"
  2. they are more readable when I look for them outside emacs (tablet readers, ...)
  3. machines should adapt to human formats, not vice versa ;)
  4. I want to decouple file names and BibTeX keys, because in a long-lived BibTeX database local fixes are sometimes needed on both sides

I will likely add a configuration variable: if we can specify the method of attachment (i.e. the file field), why not also specify the convention that the given field uses (with backward-compatible defaults)? I'll open a pull request eventually, and then you can decide what to do with it.
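
A minimal sketch of what such a variable could look like, assuming the convention can be reduced to a set of separator characters. Both my/file-field-separators and my/split-file-field are hypothetical names that do not exist in bibtex-completion, and the sketch assumes the separator string contains no characters that are special inside a regexp character class (such as ] or ^).

```elisp
;; Hypothetical option: not part of bibtex-completion.
(defvar my/file-field-separators ";,"
  "Characters that separate records in the file field.
The default mirrors the current behaviour (semicolon and comma).")

(defun my/split-file-field (value)
  "Split VALUE on unescaped characters from `my/file-field-separators'."
  (let ((regexp (concat "\\([^\\]\\)[" my/file-field-separators "]")))
    (split-string (replace-regexp-in-string regexp "\\1\^^" value) "\^^")))

;; A user whose file names contain commas would opt out of comma splitting:
;; (setq my/file-field-separators ";")
```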

Question: where do you find the Zotero, Calibre, ... file field conventions? I did not even know that Calibre could export a bib file with attached documents.

@tmalsburg
Owner

tmalsburg commented Apr 15, 2021

Good reasons to stick with your current names. But I don't think a config option is the right way to go. The assumption of the current code is that there is a standard format that works for all users. But that assumption simply doesn't hold. What we need is a solution that recognizes the reality that every bibliography manager has its own dialect. Some kind of plug-in system. For instance, a function bibtex-completion-find-pdf-calibre and so on. Then users can select the right plug-in or even specify multiple plug-ins in case their bibliographies are messy (combined from multiple sources). Users would also be able to easily supply functions for other dialects that are not covered (yet).
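
One possible shape for such a plug-in system, sketched under the assumption that each dialect is a function from a field value to a list of records. Every name below (the my/ prefix, the parser functions, and the my/file-field-parsers variable) is hypothetical; the two regular expressions follow the Zotero/Mendeley/JabRef and Calibre conventions discussed in this thread.

```elisp
;; Hypothetical dialect parsers: each maps a field value to a list of records.
(defun my/file-field-parse-jabref (value)
  "Split VALUE on unescaped semicolons only."
  (split-string
   (replace-regexp-in-string "\\([^\\]\\);" "\\1\^^" value)
   "\^^" t))

(defun my/file-field-parse-calibre (value)
  "Split VALUE on commas that directly follow a \".ext:TYPE\" suffix."
  (split-string
   (replace-regexp-in-string
    "\\(\\.[A-Za-z0-9]+:[A-Za-z0-9]+\\)," "\\1\^^" value)
   "\^^" t))

(defvar my/file-field-parsers '(my/file-field-parse-jabref)
  "Hypothetical user option: dialect parsers to apply, in order.")

(defun my/file-field-records (value)
  "Apply every parser in `my/file-field-parsers' to VALUE in turn."
  (let ((records (list value)))
    (dolist (parser my/file-field-parsers records)
      ;; Each parser may split any record produced so far into more records.
      (setq records (mapcan parser records)))))
```

A user with a mixed bibliography could list both parsers, while a user with commas in file names and no Calibre export would keep only the first one.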

> Question: where do you find the Zotero, Calibre, ... file field conventions? I did not even know that Calibre could export a bib file with attached documents.

There is no written convention that I'm aware of. It's all reverse-engineered. :)

@tmalsburg
Owner

The approach that I describe above (plug-ins) shouldn't be too difficult to implement, actually, and it would address many related issues that have accumulated over time. I just haven't found the time to implement it yet. If you feel inspired, let me know and I will be happy to provide input.

@MassimoLauria
Author

Well, let's start by opening the issue again. I'll try to give it a shot eventually.

@tmalsburg tmalsburg reopened this Apr 15, 2021
yuchen-lea added a commit to yuchen-lea/helm-bibtex that referenced this issue Aug 1, 2021
@yuchen-lea
Contributor

Sorry for introducing the bug... I switched to a stricter regular expression so that it works both for comma-separated records and for commas inside a file name.

Hope this fixes the issue. #385

@tmalsburg
Owner

tmalsburg commented Aug 1, 2021

Thanks for the PR, @yuchen-lea. The diff is not terribly helpful (as is often the case for lisp code). Could you briefly summarize how you solved the problem? Thank you!

@yuchen-lea
Contributor

I changed the original

(replace-regexp-in-string "\\([^\\]\\)[;,]" "\\1\^^" value)

to a function:

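(defun bibtex-completion-get-file-record (pdf-field-value)
  "Return the list of records obtained by splitting PDF-FIELD-VALUE."
  ;; Zotero/Mendeley/JabRef format: records are separated by unescaped semicolons.
  (setq pdf-field-value (replace-regexp-in-string "\\([^\\]\\);" "\\1\^^" pdf-field-value))
  ;; Calibre format: a comma separates records only after a ".ext:TYPE" suffix.
  ;; (The dot is escaped as "\\." here so that it matches a literal ".".)
  (setq pdf-field-value (replace-regexp-in-string "\\(\\.[A-Za-z0-9]+:[A-Za-z0-9]+\\)," "\\1\^^" pdf-field-value))
  ;; `s-split' comes from the s.el library.
  (s-split "\^^" pdf-field-value))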

This way, it only replaces commas that separate multiple records, while keeping commas inside file names.

yuchen-lea added a commit to yuchen-lea/helm-bibtex that referenced this issue Aug 6, 2021
@tmalsburg
Owner

@yuchen-lea, sorry for the slow response. The code for finding PDFs is so incredibly messy (entirely my fault) that I hesitate to make further changes to it. In this particular case, I worry that we may fix things for some users and break things for others. It's become so hard to predict.

We really need a flexible plug-in approach which allows users to tailor finding PDFs to their own needs and bibliographies. The whole idea that there is a single approach that suits everyone was a mistake. I completely underestimated how many different formats there are. The good news is that it shouldn't be difficult to come up with a clean system, and it's probably also going to speed up loading the library because we only need to consider the relevant cases, not all possible cases.
