[REVIEW]: Handprint: a program to explore and compare major cloud-based services for handwritten text recognition #4328

editorialbot · 2022-04-20T12:47:51Z

Submitting author: @mhucka (Michael Hucka)
Repository: https://github.com/caltechlibrary/handprint
Branch with paper.md (empty if default branch): joss-paper
Version: 1.5.6
Editor: @danielskatz
Reviewers: @step21, @rlskoeser
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8"><img src="https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8/status.svg)](https://joss.theoj.org/papers/3467be8d3425eb6955c77af7c47cf2e8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@step21 & @rlskoeser, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @step21

📝 Checklist for @rlskoeser


editorialbot · 2022-04-20T12:47:52Z

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot · 2022-04-20T12:48:06Z

Software report:

github.com/AlDanial/cloc v 1.88  T=0.05 s (289.4 files/s, 24346.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                         6            217              0            396
SVG                              2              1              1            182
make                             1             35             15            103
TeX                              1             10              0             75
YAML                             2             12             28             59
JSON                             1              0              0             28
HTML                             1              5              0             11
-------------------------------------------------------------------------------
SUM:                            14            280             44            854
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

editorialbot · 2022-04-20T12:48:08Z

Wordcount for paper.md is 958

editorialbot · 2022-04-20T12:48:36Z

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/1457720.1457765 is OK
- 10.1108/JD-07-2018-0114 is OK

MISSING DOIs

- 10.1007/springerreference_18289 may be a valid DOI for title: Magnetic Ink Character Recognition

INVALID DOIs

- None

editorialbot · 2022-04-20T12:49:07Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

danielskatz · 2022-04-20T12:51:04Z

@step21 and @rlskoeser - Thanks for agreeing to review this submission.
This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4328 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

step21 · 2022-04-20T23:32:44Z

Review checklist for @step21

Conflict of interest

I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

I confirm that I read and will adhere to the JOSS code of conduct.

General checks

Repository: Is the source code for this software available at the https://github.com/caltechlibrary/handprint?
License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Contribution and authorship: Has the submitting author (@mhucka) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

Installation: Does installation proceed as outlined in the documentation?
Functionality: Have the functional claims of the software been confirmed?
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
State of the field: Do the authors describe how this software compares to other commonly-used packages?
Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

rlskoeser · 2022-04-21T22:22:37Z

Review checklist for @rlskoeser

Conflict of interest

I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

I confirm that I read and will adhere to the JOSS code of conduct.

General checks

Repository: Is the source code for this software available at the https://github.com/caltechlibrary/handprint?
License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Contribution and authorship: Has the submitting author (@mhucka) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

Installation: Does installation proceed as outlined in the documentation?
Functionality: Have the functional claims of the software been confirmed?
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
State of the field: Do the authors describe how this software compares to other commonly-used packages?
Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

danielskatz · 2022-05-02T17:37:36Z

👋 @step21 & @rlskoeser - A couple of weeks in, I just wanted to check on how the reviews are coming.

rlskoeser · 2022-05-03T12:59:12Z

@danielskatz thanks for the nudge. Last week was busy, but I should be able to spend some time on it this week.

rlskoeser · 2022-05-04T19:42:02Z

@danielskatz question about the license: it's described in the readme as "a BSD/MIT type license" but I don't think it's actually any of the licenses on the OSI list.

Here's the full description in the readme:

Software produced by the Caltech Library is Copyright © 2021–2022 California Institute of Technology. This software is freely distributed under a BSD/MIT type license.

Please advise.

danielskatz · 2022-05-04T19:59:29Z

It's almost https://opensource.org/licenses/BSD-3-Clause but not quite, and not quite isn't good enough, as it's not an OSI-approved license then.

@mhucka, can you remove line 2 ("All rights not granted herein are expressly reserved by Caltech.") from the license files?

And change the readme to say that this uses the 3-Clause BSD License, rather than "a BSD/MIT type license"?

mhucka · 2022-05-05T22:05:10Z

Well, hmmm, this was handed down to us by the institute's IP division, but nevertheless, I'm asking if this policy can be revisited. I'll apprise people here of the answer once I know something.

danielskatz · 2022-05-06T00:30:08Z

Ok - @mhucka, just so that it's clear, we can't publish this work without an OSI-approved license

mhucka · 2022-05-06T00:31:57Z

Yup, understood.

danielskatz · 2022-05-23T12:51:18Z

@mhucka - any news on this license issue?

mhucka · 2022-05-23T16:41:04Z

Not yet. I just pinged them again ...

danielskatz · 2022-06-08T12:32:23Z

Hi @mhucka - I'm just checking on this again...

mhucka · 2022-06-22T00:14:39Z

@danielskatz At precisely 5:01 pm today, I finally received an email in answer to the license question. The answer is, yes, for this project (and only this project :-)) we can change the license to be straight BSD. I will update the license file and any associated files and issue a new release of the software. It will not take long – hopefully tomorrow or Thursday at the latest.

danielskatz · 2022-06-22T00:31:02Z

👋 @step21 & @rlskoeser, sorry for the delay - we might have caught the license issue earlier.

But in any case, we should now proceed with (continue) the review of the software. I look forward to seeing any more issues you find, or seeing the lack of them. 🙂

danielskatz · 2022-06-23T14:42:51Z

👋 @step21 & @rlskoeser - please let me know if there's anything blocking you from continuing your reviews

step21 · 2022-06-23T14:58:22Z

Hey,

there is probably nothing blocking me, but I have been sick all week. This definitely does not increase my review capability as I am now behind on everything. I hope to be able to work again next week but in all likelihood, at least for a thorough review I can only start the week after.

danielskatz · 2022-06-23T15:10:13Z

@step21 - thanks for the update

mhucka · 2022-06-25T02:27:26Z

I've released version 1.6.0 of the software, with a proper unmodified BSD 3-clause license.

mhucka · 2022-07-24T19:00:05Z

@rlskoeser Thanks for spotting that. I guess the html version is not really needed anyway, so I'll delete it right now. Thanks again!

@rlskoeser

As [pointed out by @rlskoeser](openjournals/joss-reviews#4328 (comment)), the LICENSE.html file confuses GitHub's license navigator. I don't think I use it for anything anymore, so let's just delete it.

rlskoeser · 2022-07-25T13:45:21Z

@mhucka great. Checking off the license now on my checklist. 🙂

step21 · 2022-07-26T19:47:51Z

I had some trouble getting testing access, at least with Google. Azure worked better, but I have not tested with it yet. I haven't tried Amazon yet. If you have any test credentials, that could help, as I think I might have to sign up for a whole new Google account to get testing credit again.

step21 · 2022-07-26T20:12:00Z

Issue re paper: there is no comparison to the field, and maybe too much usage description in the paper. caltechlibrary/handprint#38

step21 · 2022-07-26T20:12:34Z

Also, I would prefer that a direct usage example also be present in the wiki, not only on the documentation site. caltechlibrary/handprint#39

rlskoeser · 2022-07-29T20:27:59Z

I agree with @step21 about the paper. Added some comments on caltechlibrary/handprint#38 to elaborate on what could be condensed/simplified and what I'm interested to learn more about that isn't currently included.

step21 · 2022-07-29T20:38:44Z

@rlskoeser did you do any functionality testing yet? My main problem with that is that I did not yet manage to get a Google account with free credit, only one for Azure, and I haven't tried Amazon yet. Mostly I would probably test with images from the tests in the software repository, though these are limited, as they are just a few examples of course.

rlskoeser · 2022-07-29T20:42:30Z

@step21 I tested against Google because I already had a project set up that I was able to create credentials for. I don't have projects and accounts with the others and was holding off setting them up until I have a block of time that I'm confident I'll be able to get everything set up and tested.

I tested with an image from a project I'm currently working on, the Princeton Geniza Project, because I'm genuinely interested to know how well these services can do with this content (handwritten medieval content in Hebrew and Arabic script and a variety of languages).

rlskoeser · 2022-07-29T20:48:36Z

Code documentation doesn't include a statement of need; the statement of need in the JOSS paper should clarify the target audience.

caltechlibrary/handprint#40

step21 · 2022-07-29T20:54:58Z

@rlskoeser ok, thanks for the info, that is helpful.

rlskoeser · 2022-07-29T21:03:46Z

Not sure whether or not to check off the automated tests — there are some python tests, but it's a bit hard to tell which parts of the code they cover. There is no CI set up as far as I can tell, and no documentation on how to run the tests, although I was able to run them fairly easily with pytest. (It would probably be straightforward to set up GitHub Actions to install the app and run pytest; also easy to document running them, maybe in the contributing doc.)

I looked at the reviewer guidelines and it doesn't seem like this quite fits in any of those categories — not automated CI, not documented; don't see any instructions on how to test expected functionality manually with sample input. Also not sure how to judge if the tests cover the "core functionality" of the application.

@danielskatz what do you advise?

rlskoeser · 2022-07-29T21:35:33Z

There is thorough documentation for installing, configuring, and using handprint from the command line. However, I'm not finding any code-level documentation.

@mhucka in the JOSS paper under the statement of need, you say that handprint could be used in "scripts as part of automated workflows." Is it your expectation that anyone using it that way would call handprint from the command line? If I'm working in python, could I use functionality from handprint as a code library without resorting to a command line call? Does handprint expose any kind of reliable API that would be safe/appropriate/reliable to use this way, or is the code (or part of it) internal and shouldn't be used that way?

My sense of your options is that you can either:

- clarify the target audience and usage to make it clear that handprint only exposes a CLI and is not intended to be used as a python library, or
- add python API documentation and example usage for the methods that someone would need if they wanted to use handprint functionality in other python code.

Maybe @step21 or @danielskatz will have thoughts on other options here.

I'm probably thinking about it this way because as a python developer, I'd want to call it from my code for anything automated / bulk operation; but maybe I'm not your target audience! :-) If you don't want to expose a python API, it would be nice to add links to other existing tools/libraries to use instead for that kind of work. I'm trying to understand what happens after someone uses handprint to evaluate options: once they determine the best solution, can they keep using handprint or do they need to / should they switch to a different tool?

step21 · 2022-08-08T13:59:09Z

@rlskoeser I also think it looks more like it is meant to be automated via a shell script (or Python calling a shell script).
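The "Python calling the command line" pattern mentioned here can be sketched as follows. This is only an illustration and not part of Handprint: `run_cli` is a hypothetical helper, and the stand-in command below is used so the sketch runs without Handprint installed. Any real `handprint` arguments would need to be taken from the Handprint documentation.

```python
import subprocess
import sys


def run_cli(command, timeout=300):
    """Run a command-line program and return (exit_code, stdout, stderr).

    `command` is a list of strings (program name followed by its arguments).
    The arguments are passed through unchanged, so nothing here is specific
    to Handprint; this helper is a hypothetical wrapper for illustration.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # avoid hanging forever on a slow network call
        check=False,          # report failures via the exit code, do not raise
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command: the current Python interpreter printing a word.
    code, out, err = run_cli([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

With Handprint installed, the stand-in command would be replaced by the `handprint` invocation described in its documentation, and the exit code and captured output could then be checked by the calling script.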

I tested it with Azure for now on the IAM Handwriting Database (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database). The functionality works as far as I can tell (with sporadic checks of results).

It is not especially fast, but as far as I can see there are also no claims to that end, and it might depend on the service and account. This could probably be sped with parallel processing, if the service allows it.
In addition, I found some caveats mentioned in the docs which might be useful to put in a more prominent place:

- If the input has multiple pages, only the first page/image is used; the rest (if any) are ignored.
- The Amazon Rekognition API will return [at most 50 words in an image](https://docs.aws.amazon.com/rekognition/latest/dg/limits.html).
- The Microsoft Azure API will only detect a maximum of [300 lines of text per page](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text).
- Some services have different file size restrictions depending on the format of the file, but Handprint always uses the same limit for all files for a given service. This is a code simplification.

Especially the Amazon limitation seems kind of low to me.
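One way a user could guard against these caps is to flag results that reach them, since such results may be incomplete. The sketch below is an assumption-laden illustration, not Handprint code: the dictionary keys and the `may_be_truncated` helper are hypothetical names, and only the two numeric caps come from the caveats quoted above.

```python
# Caps quoted in the caveats above. The keys are illustrative labels,
# not identifiers taken from Handprint or from the services' APIs.
SERVICE_CAPS = {
    "amazon-rekognition": ("words", 50),   # at most 50 words per image
    "microsoft-azure": ("lines", 300),     # at most 300 lines per page
}


def may_be_truncated(service, item_count):
    """Return True if a result reached the documented cap for `service`.

    Reaching the cap does not prove that text was dropped, only that it
    may have been, so the page deserves a manual look.
    """
    if service not in SERVICE_CAPS:
        return False  # no known cap recorded for this service
    _unit, cap = SERVICE_CAPS[service]
    return item_count >= cap


if __name__ == "__main__":
    print(may_be_truncated("amazon-rekognition", 50))
    print(may_be_truncated("microsoft-azure", 120))
```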

danielskatz · 2022-08-08T20:54:32Z

👋 @mhucka - any response on the 3 comments above from @rlskoeser and @step21?

danielskatz · 2022-08-08T20:56:04Z

> Not sure whether or not to check off the automated tests — there are some python tests, but it's a bit hard to tell which parts of the code they cover. There is no CI set up as far as I can tell, and no documentation on how to run the tests, although I was able to run them fairly easily with pytest. (It would probably be straightforward to set up GitHub Actions to install the app and run pytest; also easy to document running them, maybe in the contributing doc.)
>
> I looked at the reviewer guidelines and it doesn't seem like this quite fits in any of those categories — not automated CI, not documented; don't see any instructions on how to test expected functionality manually with sample input. Also not sure how to judge if the tests cover the "core functionality" of the application.
>
> @danielskatz what do you advise?

It looks like a bit more is needed from @mhucka. As you say, JOSS does not require CI and automated tests, but the manual tests do need to be clear to understand what they do and what is being tested.

danielskatz · 2022-08-18T21:42:30Z

👋 @mhucka - we need your input/actions here!!

mhucka · 2022-08-22T17:22:56Z

Regarding @step21's comment about "Especially the Amazon limitation seems kind of low to me": that's from Amazon's service – there is nothing I can do about that. I tried to make it clear that it's the API that returns those results.

Regarding @step21's comment about "This could probably be sped with parallel processing": actually, Handprint already uses parallel processing. It sends the input images to separate services in parallel. There is not much that can be done about parallelizing the processing of the output of a given service for a given page, unfortunately.
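The idea of sending one image to several services at the same time can be sketched as below. This is not Handprint's actual implementation: the two `fake_service_*` functions are hypothetical stand-ins for real network calls, and a thread pool is just one common way to run such calls concurrently, since each call mostly waits on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_service_a(image_path):
    """Hypothetical stand-in for one cloud service's recognition call."""
    return f"A:{image_path}"


def fake_service_b(image_path):
    """Hypothetical stand-in for a second cloud service."""
    return f"B:{image_path}"


def recognize_with_all(image_path, services):
    """Send one image to every service concurrently; return {name: result}."""
    # max(1, ...) keeps the pool valid even if `services` is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {
            name: pool.submit(func, image_path)
            for name, func in services.items()
        }
        # result() blocks until that service has answered, and re-raises
        # any exception the call produced.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    services = {"a": fake_service_a, "b": fake_service_b}
    print(recognize_with_all("page1.png", services))
```

The total wait in this arrangement is governed by the slowest service rather than the sum of all of them, which matches the point that the per-service processing of a single page is the part that remains serial.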

mhucka · 2022-08-22T17:34:11Z

I appreciate people's efforts and comments.

Regarding comparison to the field: doing a proper exploration of the state of the field today would take hours of work, which unfortunately I don't have. The last time I looked, there were no comparable tools, which is why there is no comparison in the paper currently.

If a review of the state of the field is necessary, then I'm afraid I'm going to have to shelve this, because I regret I simply don't have the time.

danielskatz · 2022-08-22T17:44:57Z

@mhucka - one of the JOSS review criteria is "State of the field: Do the authors describe how this software compares to other commonly-used packages?" If you think there are no other commonly-used packages that are comparable, you might just say so in the paper.

danielskatz · 2022-08-31T20:31:11Z

@mhucka - we do need to keep making progress on this

mhucka · 2022-09-06T17:52:04Z

I am sorry, but I lack the time to do more right now. Realistically, the options for me right now are: (1) pause for a couple of months or (2) withdraw the submission.

danielskatz · 2022-09-07T07:42:45Z

👋 @mhucka - I think we should mark this as withdrawn, which I will do. If you decide to resubmit, please mention this issue (#4328) when you do so, and we can see if the same reviewers are available.

👋 @step21 & @rlskoeser, I'm sorry that your effort will not lead to a JOSS publication at this point, but thanks very much for your work in any case.

danielskatz · 2022-09-07T07:42:53Z

@editorialbot withdraw

editorialbot · 2022-09-07T07:42:54Z

Paper withdrawn.

rlskoeser · 2022-09-07T13:08:51Z

Thanks @danielskatz for your work on it too. @mhucka sorry it didn't work out at this point.

mhucka · 2022-09-09T21:39:28Z

@rlskoeser @step21 Thank you, very much, for your time and efforts on the review.

step21 · 2022-09-15T14:48:34Z

> 👋 @mhucka - I think we should mark this as withdrawn, which I will do. If you decide to resubmit, please mention this issue (#4328) when you do so, and we can see if the same reviewers are available.
>
> 👋 @step21 & @rlskoeser, I'm sorry that your effort will not lead to a JOSS publication at this point, but thanks very much for your work in any case.

No worries, it was a pleasure. A good rest of the week to everyone!

editorialbot added Makefile Python review TeX labels Apr 20, 2022

editorialbot assigned danielskatz Apr 20, 2022

editorialbot mentioned this issue Apr 20, 2022

[PRE REVIEW]: Handprint: a program to explore and compare major cloud-based services for handwritten text recognition #4292

Closed

editorialbot added the withdrawn label Sep 7, 2022

editorialbot closed this as completed Sep 7, 2022
