[REVIEW]: Handprint: a program to explore and compare major cloud-based services for handwritten text recognition #4328

Closed
editorialbot opened this issue Apr 20, 2022 · 55 comments

@editorialbot
Collaborator

editorialbot commented Apr 20, 2022

Submitting author: @mhucka (Michael Hucka)
Repository: https://github.com/caltechlibrary/handprint
Branch with paper.md (empty if default branch): joss-paper
Version: 1.5.6
Editor: @danielskatz
Reviewers: @step21, @rlskoeser
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8"><img src="https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8/status.svg)](https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@step21 & @rlskoeser, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks at the very latest.

Checklists

📝 Checklist for @step21

📝 Checklist for @rlskoeser

@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.88  T=0.05 s (289.4 files/s, 24346.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                         6            217              0            396
SVG                              2              1              1            182
make                             1             35             15            103
TeX                              1             10              0             75
YAML                             2             12             28             59
JSON                             1              0              0             28
HTML                             1              5              0             11
-------------------------------------------------------------------------------
SUM:                            14            280             44            854
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot
Collaborator Author

Wordcount for paper.md is 958

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/1457720.1457765 is OK
- 10.1108/JD-07-2018-0114 is OK

MISSING DOIs

- 10.1007/springerreference_18289 may be a valid DOI for title: Magnetic Ink Character Recognition

INVALID DOIs

- None

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@danielskatz

@step21 and @rlskoeser - Thanks for agreeing to review this submission.
This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4328 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you requires some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

@step21

step21 commented Apr 20, 2022

Review checklist for @step21

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/caltechlibrary/handprint?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@mhucka) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to (1) contribute to the software, (2) report issues or problems with the software, and (3) seek support?

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@rlskoeser

rlskoeser commented Apr 21, 2022

Review checklist for @rlskoeser

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/caltechlibrary/handprint?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@mhucka) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to (1) contribute to the software, (2) report issues or problems with the software, and (3) seek support?

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@danielskatz

👋 @step21 & @rlskoeser - A couple of weeks in, I just wanted to check on how the reviews are coming.

@rlskoeser

@danielskatz thanks for the nudge. Last week was busy, but I should be able to spend some time on it this week.

@rlskoeser

@danielskatz, a question about the license: the readme describes it as "a BSD/MIT type license", but I don't think it's actually any of the licenses on the OSI list.

Here's the full description in the readme:

Software produced by the Caltech Library is Copyright © 2021–2022 California Institute of Technology. This software is freely distributed under a BSD/MIT type license.

Please advise.

@danielskatz

It's almost https://opensource.org/licenses/BSD-3-Clause, but not quite, and "not quite" isn't good enough: it means this is not an OSI-approved license.

@mhucka, can you remove line 2 ("All rights not granted herein are expressly reserved by Caltech.") from the license files?

And change the readme to say that this uses the 3-Clause BSD License, rather than "a BSD/MIT type license"?

@mhucka

mhucka commented May 5, 2022

Well, hmmm, this was handed down to us by the institute's IP division, but nevertheless, I'm asking whether this policy can be revisited. I'll apprise people here of the answer once I know something.

@danielskatz

Ok - @mhucka, just so that it's clear, we can't publish this work without an OSI-approved license.

@mhucka

mhucka commented May 6, 2022

Yup, understood.

@danielskatz

@mhucka - any news on this license issue?

@mhucka

mhucka commented May 23, 2022

Not yet. I just pinged them again ...

@danielskatz

Hi @mhucka - I'm just checking on this again...

@mhucka

mhucka commented Jun 22, 2022

@danielskatz At precisely 5:01 pm today, I finally received an email in answer to the license question. The answer is, yes, for this project (and only this project :-)) we can change the license to be straight BSD. I will update the license file and any associated files and issue a new release of the software. It will not take long – hopefully tomorrow or Thursday at the latest.

@danielskatz

danielskatz commented Jun 22, 2022

👋 @step21 & @rlskoeser, sorry for the delay - we might have caught the license issue earlier.

But in any case, we should now proceed with (continue) the review of the software. I look forward to seeing any more issues you find, or seeing the lack of them. 🙂

@danielskatz

👋 @step21 & @rlskoeser - please let me know if there's anything blocking you from continuing your reviews

@step21

step21 commented Jun 23, 2022

Hey,

there is probably nothing blocking me, but I have been sick all week. This definitely does not increase my review capacity, as I am now behind on everything. I hope to be able to work again next week, but in all likelihood, at least for a thorough review, I can only start the week after.

@danielskatz

@step21 - thanks for the update

@mhucka

mhucka commented Jun 25, 2022

I've released version 1.6.0 of the software, with a proper unmodified BSD 3-clause license.

@mhucka

mhucka commented Jul 24, 2022

@rlskoeser Thanks for spotting that. I guess the html version is not really needed anyway, so I'll delete it right now. Thanks again!

mhucka added a commit to caltechlibrary/handprint that referenced this issue Jul 24, 2022
As [pointed out by
@rlskoeser](openjournals/joss-reviews#4328 (comment)),
the LICENSE.html file confuses GitHub's license navigator. I don't
think I use it for anything anymore, so let's just delete it.
@rlskoeser

@mhucka great. Checking off the license now on my checklist. 🙂

@step21

step21 commented Jul 26, 2022

I had some trouble getting testing access, at least with Google. Azure worked better, but I have not tested with it yet, and I haven't tried Amazon yet. If you have any test credentials, that could help, as I think I might have to sign up for a whole new Google account to get testing credit again.

@step21

step21 commented Jul 26, 2022

Issue regarding the paper: there is no comparison to the state of the field, and maybe too much usage description in the paper. caltechlibrary/handprint#38

@step21

step21 commented Jul 26, 2022

Also, I would prefer that a direct usage example be present in the wiki as well, not only on the documentation site. caltechlibrary/handprint#39

@rlskoeser

I agree with @step21 about the paper. Added some comments on caltechlibrary/handprint#38 to elaborate on what could be condensed/simplified and what I'm interested to learn more about that isn't currently included.

@step21

step21 commented Jul 29, 2022

@rlskoeser did you do any functionality testing yet? My main problem with that is that I have not yet managed to get a Google account with free credit - only for Azure, and I haven't tried Amazon yet. Mostly I would probably test with images from the tests in the software repository, though these are limited, as they are, of course, just a few examples.

@rlskoeser

@step21 I tested against Google because I already had a project set up that I was able to create credentials for. I don't have projects and accounts with the others and was holding off on setting them up until I have a block of time in which I'm confident I'll be able to get everything set up and tested.

I tested with an image from a project I'm currently working on, the Princeton Geniza Project, because I'm genuinely interested to know how well these services can do with this content (handwritten medieval content in Hebrew and Arabic script and a variety of languages).

@rlskoeser

Code documentation doesn't include a statement of need; the statement of need in the JOSS paper should clarify the target audience.

caltechlibrary/handprint#40

@step21

step21 commented Jul 29, 2022

@rlskoeser ok, thanks for the info, that is helpful.

@rlskoeser

rlskoeser commented Jul 29, 2022

Not sure whether or not to check off the automated tests — there are some python tests, but it's a bit hard to tell which parts of the code they cover. There is no CI set up as far as I can tell, and no documentation on how to run the tests, although I was able to run them fairly easily with pytest. (It would probably be straightforward to set up GitHub Actions to install the app and run pytest; also easy to document running them, maybe in the contributing doc.)

I looked at the reviewer guidelines and it doesn't seem like this quite fits in any of those categories — not automated CI, not documented; don't see any instructions on how to test expected functionality manually with sample input. Also not sure how to judge if the tests cover the "core functionality" of the application.

@danielskatz what do you advise?

@rlskoeser

There is thorough documentation for installing, configuring, and using handprint from the command line. However, I'm not finding any code-level documentation.

@mhucka in the JOSS paper under the statement of need, you say that handprint could be used in "scripts as part of automated workflows." Is it your expectation that anyone using it that way would call handprint from the command line? If I'm working in python, could I use functionality from handprint as a code library without resorting to a command line call? Does handprint expose any kind of reliable API that would be safe/appropriate/reliable to use this way, or is the code (or part of it) internal and shouldn't be used that way?

My sense of your options is that you can either:

  • clarify the target audience and usage to make it clear that handprint only exposes a CLI and is not intended to be used as a python library
  • add python api documentation and example usage for the methods that someone would need if they wanted to use handprint functionality in other python code.

Maybe @step21 or @danielskatz will have thoughts on other options here.

I'm probably thinking about it this way because as a python developer, I'd want to call it from my code for anything automated / bulk operation; but maybe I'm not your target audience! :-) If you don't want to expose a python API, it would be nice to add links to other existing tools/libraries to use instead for that kind of work. I'm trying to understand what happens after someone uses handprint to evaluate options: once they determine the best solution, can they keep using handprint or do they need to / should they switch to a different tool?

@step21

step21 commented Aug 8, 2022

@rlskoeser I also think it looks more like it is meant to be automated via shell script (or Python calling a shell script).

I tested it with Azure for now, on the IAM Handwriting Database (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database). The functionality works as far as I can tell (based on sporadic checks of the results).

It is not especially fast, but as far as I can see, there are also no claims to that end, and speed might depend on the service and account. This could probably be sped up with parallel processing, if the service allows it.
In addition, I found some caveats mentioned in the docs which might be useful to put in a more prominent place:

  • If the input has multiple pages, only the first page/image is used; the rest (if any) are ignored.
  • The Amazon Rekognition API will return at most 50 words in an image (https://docs.aws.amazon.com/rekognition/latest/dg/limits.html).
  • The Microsoft Azure API will only detect a maximum of 300 lines of text per page (https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text).
  • Some services have different file size restrictions depending on the format of the file, but Handprint always uses the same limit for all files for a given service. This is a code simplification.

Especially the Amazon limitation seems kind of low to me.

@danielskatz

👋 @mhucka - any response on the 3 comments above from @rlskoeser and @step21?

@danielskatz

@rlskoeser - it looks like a bit more is needed from @mhucka. As you say, JOSS does not require CI and automated tests, but the manual tests do need to be clear enough that readers can understand what they do and what is being tested.

@danielskatz

👋 @mhucka - we need your input/actions here!!

@mhucka

mhucka commented Aug 22, 2022

Regarding @step21's comment about "Especially the Amazon limitation seems kind of low to me": that's from Amazon's service – there is nothing I can do about that. I tried to make it clear that it's the API that returns those results.

Regarding @step21's comment about "This could probably be sped with parallel processing": actually, Handprint already uses parallel processing. It sends the input images to separate services in parallel. There is not much that can be done about parallelizing the processing of the output of a given service for a given page, unfortunately.
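As a rough illustration of the pattern described above (one input image fanned out to several services at once), here is a minimal Python sketch using the standard `concurrent.futures` module. It is not Handprint's actual implementation: `recognize` is a hypothetical placeholder for a real cloud API call, `recognize_with_all` is a hypothetical helper name, and the service names are only labels.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize(service: str, image_path: str) -> str:
    """Hypothetical stand-in for sending one image to one HTR service."""
    # A real implementation would call the service's API here and return
    # the recognized text; this placeholder just echoes its inputs.
    return f"{service} result for {image_path}"


def recognize_with_all(image_path: str, services: list[str]) -> dict[str, str]:
    """Send the same image to every service concurrently and collect results."""
    # One worker per service so all requests can be in flight at once
    # (at least one worker, since ThreadPoolExecutor rejects max_workers=0).
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(recognize, name, image_path) for name in services}
        # result() blocks until that particular service call has finished.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    results = recognize_with_all("page-001.png", ["google", "microsoft", "amazon"])
    for name, text in results.items():
        print(name, "->", text)
```

Threads fit this case because each call spends most of its time waiting on the network; the handling of a single service's output for a single page still happens sequentially.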

@mhucka

mhucka commented Aug 22, 2022

I appreciate people's efforts and comments.

Regarding comparison to the field: doing a proper exploration of the state of the field today would take hours of work, which unfortunately I don't have. The last time I looked, there were no comparable tools, which is why there is no comparison in the paper currently.

If a review of the state of the field is necessary, then I'm afraid I'm going to have to shelve this, because I regret I simply don't have the time.

@danielskatz

@mhucka - one of the JOSS review criteria is "State of the field: Do the authors describe how this software compares to other commonly-used packages?" If you think there are no other commonly-used packages that are comparable, you might just say so in the paper.

@danielskatz

@mhucka - we do need to keep making progress on this

@mhucka

mhucka commented Sep 6, 2022

I am sorry, but I lack the time to do more right now. Realistically, the options for me right now are: (1) pause for a couple of months or (2) withdraw the submission.

@danielskatz

👋 @mhucka - I think we should mark this as withdrawn, which I will do. If you decide to resubmit, please mention this issue (#4328) when you do so, and we can see if the same reviewers are available.

👋 @step21 & @rlskoeser, I'm sorry that your effort will not lead to a JOSS publication at this point, but thanks very much for your work in any case.

@danielskatz

@editorialbot withdraw

@editorialbot
Collaborator Author

Paper withdrawn.

@rlskoeser
Copy link

Thanks @danielskatz for your work on it too. @mhucka sorry it didn't work out at this point.

@mhucka

mhucka commented Sep 9, 2022

@rlskoeser @step21 Thank you very much for your time and efforts on the review.

@step21

step21 commented Sep 15, 2022

No worries, it was a pleasure. A good rest of the week to everyone!
