automatically align jdsw with cleaned source texts #11
Comments
Following up on this: I will soon approach 2.d) from #10 and look into what's going on with the keys that couldn't be assigned. Ideally, I would identify an underlying pattern behind that issue.
last night I ran the algorithm from #10 on everything in out/jdsw (except the laozi, which we don't have an sbck edition of). if you take a peek at those files, the third column should note whether each jdsw annotation matched the source text (in the sbck edition), matched the commentary (in the sbck edition), or wasn't found. a close look at whether the algorithm seems to be correct (and why some things aren't found) would be super helpful at this point. after that, depending on what you find, I'll implement the logic in this issue (which should be pretty similar to #10) in another script. when that script runs, it'll copy any of the annotations from out/jdsw that have the "source" note in the third column and paste them into the
Sounds perfect. I'll take the time this weekend and/or early next week to take a deep dive into this, and will keep you posted on how well that algorithm does. Just following up on this: "except the laozi, which we don't have an sbck edition of" -- you might have overlooked this SBCK edition thereof?
oh, indeed I did! the script that converts it was looking for a file named something like
That must have been my mistake -- sorry about the misnomer there!
note to self: it's worth trying the Needleman-Wunsch global alignment algorithm here, just to see how it performs vs. our homegrown one.
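For reference, a minimal sketch of Needleman-Wunsch on plain character strings. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`), the `-` gap character, and the function name are assumptions chosen for illustration, not anything taken from the existing scripts; they would likely need tuning for variant characters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1, gap_char="-"):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

The table is O(n·m) in time and memory, so this would probably need to run per chapter (or per passage) rather than over a whole text at once.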
@GDRom has already done some work to do this manually; we want to see if we can automate it.
#10 is a prerequisite for getting the JDSW in shape to align.
#9 is a prerequisite for getting 正文 versions to align to.
this uses a modified version of the algorithm from #10:
a. Look through the source text (same chapter) and find the first instance of the key (unbroken) that occurs after the previous annotation (annotations must be sequential)
b. If that key is found, take the JDSW annotation and insert it into the source text at that point
c. If that key isn't found, just skip it, since we'll already know about it from #10 (clean annotations of commentaries from jdsw)
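Steps a through c can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the function name, the `(key, note)` input shape, the bracket markers, the choice to insert the annotation immediately after the matched key, and the `"not found"` status label are all hypothetical.

```python
def align_annotations(source, annotations, open_mark="[", close_mark="]"):
    """Insert annotations into one chapter of source text, in order.

    annotations is a list of (key, note) pairs in their original order.
    Returns (merged_text, statuses), where statuses records for each key
    whether it was matched in the source text.
    """
    pieces = []
    statuses = []
    cursor = 0  # index just past the previously matched key
    for key, note in annotations:
        # Step a: first unbroken occurrence of the key after the previous match.
        pos = source.find(key, cursor) if key else -1
        if pos == -1:
            # Step c: skip keys that can't be found; just record the miss.
            statuses.append((key, "not found"))
            continue
        # Step b: insert the annotation right after the key (an assumption).
        end = pos + len(key)
        pieces.append(source[cursor:end])
        pieces.append(open_mark + note + close_mark)
        cursor = end
        statuses.append((key, "source"))
    pieces.append(source[cursor:])
    return "".join(pieces), statuses


merged, statuses = align_annotations(
    "abcdefabc", [("abc", "N1"), ("zzz", "N2"), ("abc", "N3")]
)
print(merged)    # abc[N1]defabc[N3]
print(statuses)
```

Because the cursor only moves forward, a key that is matched too late in the chapter can cause later keys to be reported as not found, which may be worth keeping in mind when reviewing the unassigned keys.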