Write a tool to convert Bamini text to Unicode text, keeping the English words as English #235
Comments
Now trying to find out, using Python, whether the word "Indigendus" is English or not.
Result: the word "Indigendus" is a Latin word. We have to work out how to find meaningful English or Latin words in the given input string, using Python.
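One possible way to handle words that a plain dictionary lookup misses is fuzzy matching against a word list. The sketch below is not from this thread: the function name `is_known_or_near`, the `cutoff` value and the minimum length of 5 are all assumptions, and the vocabulary is supplied by the caller. It only helps when the unknown word is close to a listed word, and it can still misfire on Bamini tokens that happen to resemble English.

```python
import difflib
from typing import Iterable


def is_known_or_near(word: str, vocabulary: Iterable[str], cutoff: float = 0.85) -> bool:
    """Return True if `word` is in `vocabulary` or is a close spelling variant.

    Hypothetical helper: the cutoff and minimum length are guesses, not tuned values.
    """
    candidate = word.lower()
    vocab = sorted({v.lower() for v in vocabulary})
    if candidate in vocab:
        return True
    # Fuzzy matching on short or non-alphabetic tokens gives too many false hits,
    # and Bamini-encoded Tamil words usually contain characters such as ';'.
    if not candidate.isalpha() or len(candidate) < 5:
        return False
    return bool(difflib.get_close_matches(candidate, vocab, n=1, cutoff=cutoff))
```

With a vocabulary that contains "indigenous", this accepts "Indigendus" as a near match, while a token such as "kUj;Jtf;" is rejected because it is not purely alphabetic.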
A non-traditional solution will be to use a large language model (LLM) or a
I checked with GPT and this prompt works:
The following string is a non-unicode representation using a custom font of
english_and_latin_words = ["word1", "word2"...]
The string : "nfhOk;gpYs;s MAu; Ntj kUj;Jtf; fy;Y}upapNy fw;W Diploma in
Your Answer:
english_and_latin_words = ["Diploma", "in", "Indigendus", "Medicine",
You could use a small language model for lower cost by running it yourself.
BTW we make it super easy to implement the above and deploy it as an API
Received the above solution from ILUGC mailing list.
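If a model is asked to answer in the `english_and_latin_words = [...]` form shown in that prompt, the reply still has to be turned back into a Python list. A minimal sketch of that step follows. It assumes the reply contains one complete, well-formed list literal, and `parse_word_list` is a hypothetical name. It does not call any model.

```python
import ast
import re
from typing import List

# Capture the first bracketed list that follows the variable name.
_LIST_RE = re.compile(r"english_and_latin_words\s*=\s*(\[.*?\])", re.DOTALL)


def parse_word_list(reply: str) -> List[str]:
    """Extract the word list from a model reply; return [] if none can be parsed."""
    match = _LIST_RE.search(reply)
    if match is None:
        return []
    try:
        # literal_eval only accepts Python literals, so no code in the reply is run.
        value = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return []
    if not isinstance(value, list):
        return []
    return [w for w in value if isinstance(w, str)]
```

Model output is not guaranteed to follow the requested format, so an empty result is best treated as "unknown" rather than "no English words".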
Thanks, Arun, for the idea. On my CPU-only desktop (4 vCPUs), Ollama with Phi3 took 100% of
Bamini is a non-Unicode Tamil typing system.
sample:
nfhOk;gpYs;s MAu; Ntj kUj;Jtf; fy;Y}upapNy fw;W Diploma in Indigendus Medicine and Surgery gl;lk; ngw;W itj;jpa fyhepjpahshu;. ,q;F fy;tp fw;Fk; fhyj;jpy; tpupTiuahsu;fs; Nguhrpupau;fs; rf khztu;fs; KjypNahupd; ed;kjpg;ig ngw;wpUe;jhu;. Nguhrpupau; gzpf;fu; lhf;lu; ey;yehjd; lhf;lu; rghgjpg;gps;is KjypNahupd; ed;kjpg;Gg; ngw;w cj;jkhd khztdhf tpsq;fpdhu;.
We can convert it to Unicode using open-tamil.
Result:
கொழும்பிலுள்ள ஆயுர் வேத மருத்துவக் கல்லூரியிலே கற்று னுipடழஅய in ஐனெபைநனெரள ஆநனiஉiநெ யனெ ளுரசபநசல பட்டம் பெற்று வைத்திய கலாநிதியாளார். இங்கு கல்வி கற்கும் காலத்தில் விரிவுரையாளர்கள் பேராசிரியர்கள் சக மாணவர்கள் முதலியோரின் நன்மதிப்பை பெற்றிருந்தார். பேராசிரியர் பணிக்கர் டாக்டர் நல்லநாதன் டாக்டர் சபாபதிப்பிள்ளை முதலியோரின் நன்மதிப்புப் பெற்ற உத்தமான மாணவனாக விளங்கினார்.
Bamini source text consists entirely of English (Latin) characters. We can convert all of them to Tamil with the bamini2unicode method in the open-tamil Python library.
But when the source contains regular English words, those are converted to Tamil as well, which produces junk characters.
Example -
source = nfhOk;gpYs;s MAu; Ntj kUj;Jtf; fy;Y}upapNy
result = கொழும்பிலுள்ள ஆயுர் வேத மருத்துவக் கல்லூரியிலே
source = Diploma in Indigendus Medicine and Surgery
result = னுipடழஅய in ஐனெபைநனெரள ஆநனiஉiநெ யனெ ளுரசபநசல
We have to write code that checks each source word and skips conversion if it is a real English word.
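A minimal sketch of that word-by-word check might look like the following. It is one possible shape, not a finished tool: `convert_mixed_text` is a hypothetical name, the English word list is supplied by the caller, and the converter is passed in as a plain function so the sketch does not depend on any particular library call.

```python
import re
from typing import Callable, Iterable

# Split into alternating runs of whitespace and non-whitespace so spacing survives.
_TOKEN_RE = re.compile(r"\s+|\S+")
# Edge punctuation ignored when testing whether a token is an English word.
_EDGE_PUNCT = ".,!?()\"'"


def convert_mixed_text(
    text: str,
    convert: Callable[[str], str],
    english_words: Iterable[str],
) -> str:
    """Convert Bamini tokens with `convert`, leaving known English words untouched."""
    known = {w.lower() for w in english_words}
    out = []
    for token in _TOKEN_RE.findall(text):
        core = token.strip(_EDGE_PUNCT)
        if token.isspace() or (core.isalpha() and core.lower() in known):
            out.append(token)  # whitespace or a recognised English word: keep as-is
        else:
            out.append(convert(token))  # everything else is treated as Bamini
    return "".join(out)
```

In practice the `convert` argument would be the library's bamini2unicode function. The weak point is the word list: a short English word can also be a valid Bamini sequence, and words missing from the list (such as "Indigendus") will still be converted unless the list is extended.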