Skip to content

kyopark2014/korean-chatbot-using-amazon-bedrock

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RAG๋ฅผ ํ™œ์šฉํ•˜์—ฌ ํ–ฅ์ƒ๋œ Korean Chatbot ๋งŒ๋“ค๊ธฐ

License

RAG(Retrieval-Augmented Generation)๋ฅผ ํ™œ์šฉํ•˜๋ฉด, LLM(Large Language Model)์˜ ๊ธฐ๋Šฅ์„ ๊ฐ•ํ™”ํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๊ฐœ๋ฐœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ๋Š” RAG์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•๋“ค์— ๋Œ€ํ•ด ์„ค๋ช…ํ•˜๊ณ  ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ธฐ์—… ๋˜๋Š” ๊ฐœ์ธ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‰ฝ๊ฒŒ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํ•œ๊ตญ์–ด Chatbot์„ ๋งŒ๋“ค๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

  • Multimodal: ํ…์ŠคํŠธ๋ฟ ์•„๋‹ˆ๋ผ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ๋ถ„์„์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Multi-RAG: ๋‹ค์–‘ํ•œ ์ง€์‹ ์ €์žฅ์†Œ(Knowledge Store)ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • Multi-Region LLM: ์—ฌ๋Ÿฌ ๋ฆฌ์ „์— ์žˆ๋Š” LLM์„ ๋™์‹œ์— ํ™œ์šฉํ•จ์œผ๋กœ์จ ์งˆ๋ฌธํ›„ ๋‹ต๋ณ€๊นŒ์ง€์˜ ๋™์ž‘์‹œ๊ฐ„์„ ๋‹จ์ถ•ํ•˜๊ณ , On-Demand ๋ฐฉ์‹์˜ ๋™์‹œ ์‹คํ–‰ ์ˆ˜์˜ ์ œํ•œ์„ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Agent: ์™ธ๋ถ€ API๋ฅผ ํ†ตํ•ด ์–ป์–ด์ง„ ๊ฒฐ๊ณผ๋ฅผ ๋Œ€ํ™”์‹œ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰: RAG์˜ ์ง€์‹์ €์žฅ์†Œ์— ๊ด€๋ จ๋œ ๋ฌธ์„œ๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ์— ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ํ†ตํ•ด ํ™œ์šฉ๋„๋ฅผ ๋†’์ž…๋‹ˆ๋‹ค.
  • ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰: RAG์— ํ•œ๊ตญ์–ด์™€ ์˜์–ด ๋ฌธ์„œ๋“ค์ด ํ˜ผ์žฌํ•  ๊ฒฝ์šฐ์— ํ•œ๊ตญ์–ด๋กœ ์˜์–ด ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ํ•œ๊ตญ์–ด๋กœ ํ•œ๊ตญ์–ด, ์˜์–ด ๋ฌธ์„œ๋ฅผ ๋ชจ๋‘ ๊ฒ€์ƒ‰ํ•˜์—ฌ RAG์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Prioroty Search: ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ๊ด€๋ จ๋„์— ๋”ฐ๋ผ ์ •๋ ฌํ•˜๋ฉด LLM์˜ ๊ฒฐ๊ณผ๊ฐ€ ํ–ฅ์ƒ๋ฉ๋‹ˆ๋‹ค.
  • Kendra ์„ฑ๋Šฅ ํ–ฅ์ƒ: LangChain์—์„œ Kendra์˜ FAQ๋ฅผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • Vector/Keyword ๊ฒ€์ƒ‰: Vector ๊ฒ€์ƒ‰(Sementaic) ๋ฟ ์•„๋‹ˆ๋ผ, Lexical ๊ฒ€์ƒ‰(Keyword)์„ ํ™œ์šฉํ•˜์—ฌ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์ฐพ์„ ํ™•์œจ์„ ๋†’์ž…๋‹ˆ๋‹ค.
  • Code Generation: ๊ธฐ์กด ์ฝ”๋“œ๋ฅผ ์ด์šฉํ•˜์—ฌ Python/Node.js ์ฝ”๋“œ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์—ฌ๊ธฐ์„œ ๊ตฌํ˜„ํ•œ ์ฝ”๋“œ๋“ค์€ LangChain์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ์•„๋ž˜์™€ ๊ฐ™์€ Prompt Engineing ์˜ˆ์ œ๋ฅผ ์‚ฌ์šฉํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๋ฒˆ์—ญ (translation): ์ž…๋ ฅ๋œ ๋ฌธ์žฅ์„ ๋ฒˆ์—ญํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฌธ๋ฒ• ์˜ค๋ฅ˜ ์ถ”์ถœ (Grammatical Error Correction): ์˜์–ด์— ๋Œ€ํ•œ ๋ฌธ์žฅ ์—๋Ÿฌ๋ฅผ ์„ค๋ช…ํ•˜๊ณ , ์ˆ˜์ •๋œ ๋ฌธ์žฅ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
  • ๋ฆฌ๋ทฐ ๋ถ„์„ (Extracted Topic and Sentiment): ์ž…๋ ฅ๋œ ๋ฆฌ๋ทฐ์˜ ์ฃผ์ œ์™€ ๊ฐ์ •(Sentiment)์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ์ •๋ณด ์ถ”์ถœ (Information Extraction): ์ž…๋ ฅ๋œ ๋ฌธ์žฅ์—์„œ email๊ณผ ๊ฐ™์€ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ๊ฐœ์ธ ์ •๋ณด ์‚ญ์ œ (Removing PII): ์ž…๋ ฅ๋œ ๋ฌธ์žฅ์—์„œ ๊ฐœ์ธ์ •๋ณด(PII)๋ฅผ ์‚ญ์ œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋ณต์žกํ•œ ์งˆ๋ฌธ (Complex Question): step-by-step์œผ๋กœ ๋ณต์žกํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.
  • ์–ด๋ฆฐ์ด์™€ ๋Œ€ํ™” (Child Conversation): ๋Œ€ํ™”์ƒ๋Œ€์— ๋งž๊ฒŒ ์ ์ ˆํ•œ ์–ดํœ˜๋‚˜ ๋‹ต๋ณ€์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์‹œ๊ฐ„์ •๋ณด ์ถ”์ถœ (Timestamp Extraction): ์ž…๋ ฅ๋œ ์ •๋ณด์—์„œ ์‹œ๊ฐ„์ •๋ณด(timestemp)๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ์ž์œ ๋กœ์šด ๋Œ€ํ™” (Free Conversation): ์นœ๊ตฌ์ฒ˜๋Ÿผ ๋ฐ˜๋ง๋กœ ๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค.

์•„ํ‚คํ…์ฒ˜ ๊ฐœ์š”

์ „์ฒด์ ์ธ ์•„ํ‚คํ…์ฒ˜๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์€ WebSocket์„ ์ด์šฉํ•˜์—ฌ AWS Lambda์—์„œ RAG์™€ LLM์„ ์ด์šฉํ•˜์—ฌ ๋‹ต๋ณ€ํ•ฉ๋‹ˆ๋‹ค. ๋Œ€ํ™” ์ด๋ ฅ(chat history)๋ฅผ ์ด์šฉํ•˜์—ฌ ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ(Question)์„ ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(Revised question)์œผ๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ƒˆ๋กœ์šด ์งˆ๋ฌธ์œผ๋กœ ์ง€์‹ ์ €์žฅ์†Œ(Knowledge Store)์ธ Kendra์™€ OpenSearch์— ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋‘๊ฐœ์˜ ์ง€์‹์ €์žฅ์†Œ์—๋Š” ์šฉ๋„์— ๋งž๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์ž…๋ ฅ๋˜์–ด ์žˆ๋Š”๋ฐ, ๋งŒ์•ฝ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋”๋ผ๋„, ๋‘๊ฐœ์˜ ์ง€์‹์ €์žฅ์†Œ์˜ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์˜ ์ฐจ์ด๋กœ ์ธํ•ด, ์„œ๋กœ ๋ณด์™„์ ์ธ ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ง€์‹์ €์žฅ์†Œ์— ํ•œ๊ตญ์–ด/ํ•œ๊ตญ์–ด๋กœ ๋œ ๋ฌธ์„œ๋“ค์ด ์žˆ๋‹ค๋ฉด, ํ•œ๊ตญ์–ด ์งˆ๋ฌธ์€ ์˜์–ด๋กœ ๋œ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์งˆ๋ฌธ์ด ํ•œ๊ตญ์–ด๋ผ๋ฉด ํ•œ๊ตญ์–ด๋กœ ํ•œ๊ตญ์–ด ๋ฌธ์„œ๋ฅผ ๋จผ์ € ๊ฒ€์ƒ‰ํ•œ ํ›„์—, ์˜์–ด๋กœ ๋ฒˆ์—ญํ•˜์—ฌ ๋‹ค์‹œ ํ•œ๋ฒˆ ์˜์–ด ๋ฌธ์„œ๋“ค์„ ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•จ์œผ๋กœ์จ ํ•œ๊ตญ์–ด๋กœ ์งˆ๋ฌธ์„ ํ•˜๋”๋ผ๋„ ์˜์–ด ๋ฌธ์„œ๊นŒ์ง€ ๊ฒ€์ƒ‰ํ•˜์—ฌ ๋” ๋‚˜์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ ๋‘ ์ง€์‹์ €์žฅ์†Œ๊ฐ€ ๊ด€๋ จ๋œ ๋ฌธ์„œ(Relevant documents)๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์ง€ ์•Š๋‹ค๋ฉด, Google Search API๋ฅผ ์ด์šฉํ•˜์—ฌ ์ธํ„ฐ๋„ท์— ๊ด€๋ จ๋œ ์›นํŽ˜์ด์ง€๋“ค์ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ณ , ์ด๋•Œ ์–ป์–ด์ง„ ๊ฒฐ๊ณผ๋ฅผ RAG์ฒ˜๋Ÿผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

์ƒ์„ธํ•˜๊ฒŒ ๋‹จ๊ณ„๋ณ„๋กœ ์„ค๋ช…ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 1: ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ(question)์€ API Gateway๋ฅผ ํ†ตํ•ด Lambda์— Web Socket ๋ฐฉ์‹์œผ๋กœ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค. Lambda๋Š” JSON body์—์„œ ์งˆ๋ฌธ์„ ์ฝ์–ด์˜ต๋‹ˆ๋‹ค. ์ด๋•Œ ์‚ฌ์šฉ์ž์˜ ์ด์ „ ๋Œ€ํ™”์ด๋ ฅ์ด ํ•„์š”ํ•˜๋ฏ€๋กœ Amazon DynamoDB์—์„œ ์ฝ์–ด์˜ต๋‹ˆ๋‹ค. DynamoDB์—์„œ ๋Œ€ํ™”์ด๋ ฅ์„ ๋กœ๋”ฉํ•˜๋Š” ์ž‘์—…์€ ์ฒ˜์Œ 1ํšŒ๋งŒ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 2: ์‚ฌ์šฉ์ž์˜ ๋Œ€ํ™”์ด๋ ฅ์„ ๋ฐ˜์˜ํ•˜์—ฌ ์‚ฌ์šฉ์ž์™€ Chatbot์ด interactiveํ•œ ๋Œ€ํ™”๋ฅผ ํ•  ์ˆ˜ ์žˆ๋„๋ก, ๋Œ€ํ™”์ด๋ ฅ๊ณผ ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์œผ๋กœ ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(Revised Question)์„ ์ƒ์„ฑํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. LLM์— ๋Œ€ํ™”์ด๋ ฅ(chat history)๋ฅผ Context๋กœ ์ œ๊ณตํ•˜๊ณ  ์ ์ ˆํ•œ Prompt๋ฅผ ์ด์šฉํ•˜๋ฉด ์ƒˆ๋กœ์šด ์งˆ๋ฌธ์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 3: ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(Revised question)์œผ๋กœ OpenSearch์— ์งˆ๋ฌธ์„ ํ•˜์—ฌ ๊ด€๋ จ๋œ ๋ฌธ์„œ(Relevant Documents)๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 4: ์งˆ๋ฌธ์ด ํ•œ๊ตญ์–ด์ธ ๊ฒฝ์šฐ์— ์˜์–ด ๋ฌธ์„œ๋„ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(Revised question)์„ ์˜์–ด๋กœ ๋ฒˆ์—ญํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 5: ๋ฒˆ์—ญ๋œ ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(translated revised question)์„ ์ด์šฉํ•˜์—ฌ Kendra์™€ OpenSearch์— ์งˆ๋ฌธํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 6: ๋ฒˆ์—ญ๋œ ์งˆ๋ฌธ์œผ๋กœ ์–ป์€ ๊ด€๋ จ๋œ ๋ฌธ์„œ๊ฐ€ ์˜์–ด ๋ฌธ์„œ์ผ ๊ฒฝ์šฐ์—, LLM์„ ํ†ตํ•ด ๋ฒˆ์—ญ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๊ด€๋ จ๋œ ๋ฌธ์„œ๊ฐ€ ์—ฌ๋Ÿฌ๊ฐœ์ด๋ฏ€๋กœ Multi-Region์˜ LLM๋“ค์„ ํ™œ์šฉํ•˜์—ฌ ์ง€์—ฐ์‹œ๊ฐ„์„ ์ตœ์†Œํ™” ํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 7: ํ•œ๊ตญ์–ด ์งˆ๋ฌธ์œผ๋กœ ์–ป์€ N๊ฐœ์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ์™€, ์˜์–ด๋กœ ๋œ N๊ฐœ์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ์˜ ํ•ฉ์€ ์ตœ๋Œ€ 2xN๊ฐœ์ž…๋‹ˆ๋‹ค. ์ด ๋ฌธ์„œ๋ฅผ ๊ฐ€์ง€๊ณ  Context Window ํฌ๊ธฐ์— ๋งž๋„๋ก ๋ฌธ์„œ๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ ๊ด€๋ จ๋˜๊ฐ€ ๋†’์€ ๋ฌธ์„œ๊ฐ€ Context์˜ ์ƒ๋‹จ์— ๊ฐ€๋„๋ก ๋ฐฐ์น˜ํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 8: ๊ด€๋ จ๋„๊ฐ€ ์ผ์ • ์ดํ•˜์ธ ๋ฌธ์„œ๋Š” ๋ฒ„๋ฆฌ๋ฏ€๋กœ, ํ•œ๊ฐœ์˜ RAG์˜ ๋ฌธ์„œ๋„ ์„ ํƒ๋˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋•Œ์—๋Š” Google Seach API๋ฅผ ํ†ตํ•ด ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ์ด๋•Œ ์–ป์–ด์ง„ ๋ฌธ์„œ๋“ค์„ Priority Search๋ฅผ ํ•˜์—ฌ ๊ด€๋ จ๋„๊ฐ€ ์ผ์ • ์ด์ƒ์˜ ๊ฒฐ๊ณผ๋ฅผ RAG์—์„œ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋‹จ๊ณ„ 9: ์„ ํƒ๋œ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋“ค(Selected relevant documents)๋กœ Context๋ฅผ ์ƒ์„ฑํ•œ ํ›„์— ์ƒˆ๋กœ์šด ์งˆ๋ฌธ(Revised question)๊ณผ ํ•จ๊ป˜ LLM์— ์ „๋‹ฌํ•˜์—ฌ ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

์ด๋•Œ์˜ Sequence diagram์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ RAG์—์„œ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์ฐพ์ง€๋ชปํ•  ๊ฒฝ์šฐ์—๋Š” Google Search API๋ฅผ ํ†ตํ•ด Query๋ฅผ ์ˆ˜ํ–‰ํ•˜์—ฌ RAG์ฒ˜๋Ÿผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋Œ€ํ™”์ด๋ ฅ์„ ๊ฐ€์ ธ์˜ค๊ธฐ ์œ„ํ•œ DynamoDB๋Š” ์ฒซ๋ฒˆ์งธ ์งˆ๋ฌธ์—๋งŒ ํ•ด๋‹น๋ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” "us-east-1"๊ณผ "us-west-2"์˜ Bedrock์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ, ์•„๋ž˜์™€ ๊ฐ™์ด ์งˆ๋ฌธ๋งˆ๋‹ค ๋‹ค๋ฅธ Region์˜ Bedrock Claude LLM์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

image

๋Œ€๋Ÿ‰์œผ๋กœ ํŒŒ์ผ ์—…๋กœ๋“œ ๋˜๋Š” ์‚ญ์ œ์‹œ๋Š” ์•„๋ž˜์™€ ๊ฐ™์€ Event driven๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด S3๋กœ ๋Œ€๊ทœ๋ชจ๋กœ ๋ฌธ์„œ ๋˜๋Š” ์ฝ”๋“œ๋ฅผ ๋„ฃ์„๋•Œ์— ์ •๋ณด์˜ ์œ ์ถœ์—†์ด RAG์˜ ์ง€์‹์ €์žฅ์†Œ๋ฅผ ๋ฐ์ดํ„ฐ๋ฅผ ์ฃผ์ž…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

image

RAG ๊ตฌํ˜„

Multi-RAG

์—ฌ๋Ÿฌ๊ฐœ์˜ RAG๋ฅผ ํ™œ์šฉํ•  ๊ฒฝ์šฐ์— ์š”์ฒญํ›„ ์‘๋‹ต๊นŒ์ง€์˜ ์ง€์—ฐ์‹œ๊ฐ„์ด ์ฆ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณ‘๋ ฌ ํ”„๋กœ์„ธ์‹ฑ์„ ์ด์šฉํ•˜์—ฌ ๋™์‹œ์— ์ง€์‹ ์ €์žฅ์†Œ์— ๋Œ€ํ•œ ์งˆ๋ฌธ์„ ์ˆ˜ํ–‰ํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ๊ด€๋ จ๋œ Blog์ธ Multi-RAG์™€ Multi-Region LLM๋กœ ํ•œ๊ตญ์–ด Chatbot ๋งŒ๋“ค๊ธฐ๋ฅผ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

from multiprocessing import Process, Pipe

processes = []
parent_connections = []
for rag in capabilities:
    parent_conn, child_conn = Pipe()
    parent_connections.append(parent_conn)

    process = Process(target = retrieve_process_from_RAG, args = (child_conn, revised_question, top_k, rag))
    processes.append(process)

for process in processes:
    process.start()

for parent_conn in parent_connections:
    rel_docs = parent_conn.recv()

    if (len(rel_docs) >= 1):
        for doc in rel_docs:
            relevant_docs.append(doc)

for process in processes:
    process.join()

def retrieve_process_from_RAG(conn, query, top_k, rag_type):
    relevant_docs = []
    if rag_type == 'kendra':
        rel_docs = retrieve_from_kendra(query=query, top_k=top_k)      
    else:
        rel_docs = retrieve_from_vectorstore(query=query, top_k=top_k, rag_type=rag_type)
    
    if(len(rel_docs)>=1):
        for doc in rel_docs:
            relevant_docs.append(doc)    
    
    conn.send(relevant_docs)
    conn.close()

Multi-Region LLM

์—ฌ๋Ÿฌ ๋ฆฌ์ „์˜ LLM์— ๋Œ€ํ•œ profile์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ cdk-korean-chatbot-stack.ts์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

const claude3_sonnet = [
  {
    "bedrock_region": "us-west-2", // Oregon
    "model_type": "claude3",
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",   
    "maxOutputTokens": "4096"
  },
  {
    "bedrock_region": "us-east-1", // N.Virginia
    "model_type": "claude3",
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
    "maxOutputTokens": "4096"
  }
];

const profile_of_LLMs = claude3_sonnet;

Bedrock์—์„œ client๋ฅผ ์ง€์ •ํ• ๋•Œ bedrock_region์„ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜์™€ ๊ฐ™์ด LLM์„ ์„ ํƒํ•˜๋ฉด Lambda์— event๊ฐ€ ์˜ฌ๋•Œ๋งˆ๋‹ค ๋‹ค๋ฅธ ๋ฆฌ์ „์˜ LLM์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

from langchain_aws import ChatBedrock

profile_of_LLMs = json.loads(os.environ.get('profile_of_LLMs'))
selected_LLM = 0

def get_chat(profile_of_LLMs, selected_LLM):
    profile = profile_of_LLMs[selected_LLM]
    bedrock_region =  profile['bedrock_region']
    modelId = profile['model_id']
    print(f'LLM: {selected_LLM}, bedrock_region: {bedrock_region}, modelId: {modelId}')
    maxOutputTokens = int(profile['maxOutputTokens'])
                          
    # bedrock   
    boto3_bedrock = boto3.client(
        service_name='bedrock-runtime',
        region_name=bedrock_region,
        config=Config(
            retries = {
                'max_attempts': 30
            }            
        )
    )
    parameters = {
        "max_tokens":maxOutputTokens,     
        "temperature":0.1,
        "top_k":250,
        "top_p":0.9,
        "stop_sequences": [HUMAN_PROMPT]
    }
    # print('parameters: ', parameters)

    chat = ChatBedrock(   # new chat model
        model_id=modelId,
        client=boto3_bedrock, 
        model_kwargs=parameters,
    )       
    
    return chat

lambda(chat)์™€ ๊ฐ™์ด ๋ฌธ์„œ๋ฅผ ๋ฒˆ์—ญํ•  ๋•Œ์—์„œ ๋ณ‘๋ ฌ๋กœ ์กฐํšŒํ•˜๊ธฐ ์œ„ํ•˜์—ฌ, Lambda์˜ Multi thread๋ฅผ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ, ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์—ฐ๋™ ํ•  ๋•Œ์—๋Š” Pipe()์„ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

def translate_relevant_documents_using_parallel_processing(docs):
    selected_LLM = 0
    relevant_docs = []    
    processes = []
    parent_connections = []
    for doc in docs:
        parent_conn, child_conn = Pipe()
        parent_connections.append(parent_conn)
            
        chat = get_chat(profile_of_LLMs, selected_LLM)
        bedrock_region = profile_of_LLMs[selected_LLM]['bedrock_region']

        process = Process(target=translate_process_from_relevent_doc, args=(child_conn, chat, doc, bedrock_region))
        processes.append(process)

        selected_LLM = selected_LLM + 1
        if selected_LLM == len(profile_of_LLMs):
            selected_LLM = 0

    for process in processes:
        process.start()
            
    for parent_conn in parent_connections:
        doc = parent_conn.recv()
        relevant_docs.append(doc)    

    for process in processes:
        process.join()
    
    #print('relevant_docs: ', relevant_docs)
    return relevant_docs

Embedding

BedrockEmbeddings์„ ์ด์šฉํ•˜์—ฌ Embedding์„ ํ•ฉ๋‹ˆ๋‹ค. 'amazon.titan-embed-text-v1'์€ Titan Embeddings Generation 1 (G1)์„ ์˜๋ฏธํ•˜๋ฉฐ 8k token์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. Titan Embedding v2๋Š” "amazon.titan-embed-text-v2:0"์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

bedrock_embeddings = BedrockEmbeddings(
    client=boto3_bedrock,
    region_name = bedrock_region,
    model_id = 'amazon.titan-embed-text-v1' 
)

๋Œ€ํ™” ์ €์žฅ ๋ฐ ๊ด€๋ฆฌ

lambda-chat-ws๋Š” ์ธ์ž…๋œ ๋ฉ”์‹œ์ง€์˜ userId๋ฅผ ์ด์šฉํ•˜์—ฌ map_chain์— ์ €์žฅ๋œ ๋Œ€ํ™” ์ด๋ ฅ(memory_chain)๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์ฑ„ํŒ… ์ด๋ ฅ์ด ์—†๋‹ค๋ฉด ์•„๋ž˜์™€ ๊ฐ™์ด ConversationBufferWindowMemory๋กœ memory_chain์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ,

map_chain = dict() 

if userId in map_chain:
    print('memory exist. reuse it!')        
    memory_chain = map_chain[userId]
        
else: 
    memory_chain = ConversationBufferWindowMemory(memory_key="chat_history", output_key='answer', return_messages=True, k=10)
    map_chain[userId] = memory_chain
        
    allowTime = getAllowTime()
    load_chat_history(userId, allowTime)

msg = general_conversation(connectionId, requestId, chat, text)

def general_conversation(connectionId, requestId, chat, query):
    if isKorean(query)==True :
        system = (
            "๋‹ค์Œ์˜ Human๊ณผ Assistant์˜ ์นœ๊ทผํ•œ ์ด์ „ ๋Œ€ํ™”์ž…๋‹ˆ๋‹ค. Assistant์€ ์ƒํ™ฉ์— ๋งž๋Š” ๊ตฌ์ฒด์ ์ธ ์„ธ๋ถ€ ์ •๋ณด๋ฅผ ์ถฉ๋ถ„ํžˆ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Assistant์˜ ์ด๋ฆ„์€ ์„œ์—ฐ์ด๊ณ , ๋ชจ๋ฅด๋Š” ์งˆ๋ฌธ์„ ๋ฐ›์œผ๋ฉด ์†”์งํžˆ ๋ชจ๋ฅธ๋‹ค๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค."
        )
    else: 
        system = (
            "Using the following conversation, answer friendly for the newest question. If you don't know the answer, just say that you don't know, don't try to make up an answer. You will be acting as a thoughtful advisor."
        )
    
    human = "{input}"
    
    prompt = ChatPromptTemplate.from_messages([("system", system), MessagesPlaceholder(variable_name="history"), ("human", human)])
    
    history = memory_chain.load_memory_variables({})["chat_history"]
                
    chain = prompt | chat    
    try: 
        isTyping(connectionId, requestId)  
        stream = chain.invoke(
            {
                "history": history,
                "input": query,
            }
        )
        msg = readStreamMsg(connectionId, requestId, stream.content)    
                            
        msg = stream.content
        print('msg: ', msg)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
            
        sendErrorMessage(connectionId, requestId, err_msg)    
        raise Exception ("Not able to request to LLM")

    return msg

์ƒˆ๋กœ์šด Diaglog๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด chat_memory์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

memory_chain.chat_memory.add_user_message(text) 
memory_chain.chat_memory.add_ai_message(msg)

Stream ์ฒ˜๋ฆฌ

์—ฌ๊ธฐ์„œ stream์€ ์•„๋ž˜์™€ ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ WebSocket์„ ์‚ฌ์šฉํ•˜๋Š” client์— ๋ฉ”์‹œ์ง€๋ฅผ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ๊ด€๋ จ๋œ Blog์ธ Amazon Bedrock์„ ์ด์šฉํ•˜์—ฌ Stream ๋ฐฉ์‹์˜ ํ•œ๊ตญ์–ด Chatbot ๊ตฌํ˜„ํ•˜๊ธฐ์„ ์ฐธ๊ณ ํ•ฉ๋‹ˆ๋‹ค.

def readStreamMsg(connectionId, requestId, stream):
    msg = ""
    if stream:
        for event in stream:
            msg = msg + event

            result = {
                'request_id': requestId,
                'msg': msg
            }
            sendMessage(connectionId, result)
    print('msg: ', msg)
    return msg

์—ฌ๊ธฐ์„œ client๋กœ ๋ฉ”์‹œ์ง€๋ฅผ ๋ณด๋‚ด๋Š” sendMessage()๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” boto3์˜ post_to_connection๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฉ”์‹œ์ง€๋ฅผ WebSocket์˜ endpoint์ธ API Gateway๋กœ ์ „์†กํ•ฉ๋‹ˆ๋‹ค.

def sendMessage(id, body):
    try:
        client.post_to_connection(
            ConnectionId=id, 
            Data=json.dumps(body)
        )
    except: 
        raise Exception ("Not able to send a message")

Priority Search (๊ด€๋ จ๋„ ๊ธฐ์ค€ ๋ฌธ์„œ ์„ ํƒ)

Multi-RAG, ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰, ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰๋“ฑ์„ ํ™œ์šฉํ•˜์—ฌ ๋‹ค์ˆ˜์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ๊ฐ€ ๋‚˜์˜ค๋ฉด, ๊ด€๋ จ๋„๊ฐ€ ๋†’์€ ์ˆœ์„œ๋Œ€๋กœ ์ผ๋ถ€ ๋ฌธ์„œ๋งŒ์„ RAG์—์„œ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด Faiss์˜ similarity search๋ฅผ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ์ •๋Ÿ‰๋œ ๊ฐ’์˜ ๊ด€๋ จ๋„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์–ด์„œ, ๊ด€๋ จ๋˜์ง€ ์•Š์€ ๋ฌธ์„œ๋ฅผ Context๋กœ ํ™œ์šฉํ•˜์ง€ ์•Š๋„๋ก ํ•ด์ค๋‹ˆ๋‹ค.

selected_relevant_docs = []
if len(relevant_docs)>=1:
    selected_relevant_docs = priority_search(revised_question, relevant_docs, bedrock_embeddings)

def priority_search(query, relevant_docs, bedrock_embeddings):
    excerpts = []
    for i, doc in enumerate(relevant_docs):
        if doc['metadata']['translated_excerpt']:
            content = doc['metadata']['translated_excerpt']
        else:
            content = doc['metadata']['excerpt']
        
        excerpts.append(
            Document(
                page_content=content,
                metadata={
                    'name': doc['metadata']['title'],
                    'order':i,
                }
            )
        )  

    embeddings = bedrock_embeddings
    vectorstore_confidence = FAISS.from_documents(
        excerpts,  # documents
        embeddings  # embeddings
    )            
    rel_documents = vectorstore_confidence.similarity_search_with_score(
        query=query,
        k=top_k
    )

    docs = []
    for i, document in enumerate(rel_documents):

        order = document[0].metadata['order']
        name = document[0].metadata['name']
        assessed_score = document[1]

        relevant_docs[order]['assessed_score'] = int(assessed_score)

        if assessed_score < 200:
            docs.append(relevant_docs[order])    

    return docs

LLM์œผ๋กœ RAG Grading ํ™œ์šฉํ•˜๊ธฐ

LLM์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ์˜ ์ˆซ์ž์™€ ๊ธธ์ด๊ฐ€ ์ ๋‹ค๋ฉด ๋ฌธ์„œ์˜ ์ˆœ์„œ๊ฐ€ ํฌ๊ฒŒ ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด๋Ÿด๋•Œ์—๋Š” LLM์œผ๋กœ ๊ฐ„๋‹จํžˆ gradingํ•จ์œผ๋กœ์จ RAG์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. LLM์œผ๋กœ RAG Grading ํ™œ์šฉํ•˜๊ธฐ์—์„œ๋Š” prompt์™€ structured output์„ ์ด์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰

ํ•œ์˜ ๊ฒ€์ƒ‰์„ ์œ„ํ•ด ๋จผ์ € ํ•œ๊ตญ์–ด๋กœ RAG๋ฅผ ์กฐํšŒํ•˜๊ณ , ์˜์–ด๋กœ ๋ฒˆ์—ญํ•œ ํ›„์— ๊ฐ๊ฐ์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋“ค(Relevant Documents)๋ฅผ ๋ฒˆ์—ญํ•ฉ๋‹ˆ๋‹ค. ๊ด€๋ จ๋œ ๋ฌธ์„œ๋“ค์— ๋Œ€ํ•ด ์งˆ๋ฌธ์— ๋”ฐ๋ผ ๊ด€๋ จ์„ฑ์„ ๋น„๊ตํ•˜์—ฌ ๊ด€๋ จ๋„๊ฐ€ ๋†’์€ ๋ฌธ์„œ์ˆœ์„œ๋กœ Context๋ฅผ ๋งŒ๋“ค์–ด์„œ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ๊ด€๋ จ๋œ Blog์ธ ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰ ๋ฐ ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ํ™œ์šฉํ•˜์—ฌ RAG๋ฅผ ํŽธ๋ฆฌํ•˜๊ฒŒ ํ™œ์šฉํ•˜๊ธฐ์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

translated_revised_question = traslation_to_english(llm=llm, msg=revised_question)

relevant_docs_using_translated_question = retrieve_from_vectorstore(query=translated_revised_question, top_k=4, rag_type=rag_type)
            
docs_translation_required = []
if len(relevant_docs_using_translated_question)>=1:
    for i, doc in enumerate(relevant_docs_using_translated_question):
        if isKorean(doc)==False:
            docs_translation_required.append(doc)
        else:
            relevant_docs.append(doc)
                                   
    translated_docs = translate_relevant_documents_using_parallel_processing(docs_translation_required)
    for i, doc in enumerate(translated_docs):
        relevant_docs.append(doc)

์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰

Multi-RAG๋ฅผ ์ด์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ๊ฐœ์˜ ์ง€์‹ ์ €์žฅ์†Œ์— ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์กฐํšŒํ•˜์˜€์Œ์—๋„ ๋ฌธ์„œ๊ฐ€ ์—†๋‹ค๋ฉด, ๊ตฌ๊ธ€ ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ํ†ตํ•ด ์–ป์–ด์ง„ ๊ฒฐ๊ณผ๋ฅผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ, assessed_score๋Š” priority search์‹œ FAISS์˜ Score๋กœ ์—…๋ฐ์ดํŠธ ๋ฉ๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ Google Search API ๊ด€๋ จ๋œ Blog์ธ ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰ ๋ฐ ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ํ™œ์šฉํ•˜์—ฌ RAG๋ฅผ ํŽธ๋ฆฌํ•˜๊ฒŒ ํ™œ์šฉํ•˜๊ธฐ์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

from googleapiclient.discovery import build

google_api_key = os.environ.get('google_api_key')
google_cse_id = os.environ.get('google_cse_id')

api_key = google_api_key
cse_id = google_cse_id

relevant_docs = []
try:
    service = build("customsearch", "v1", developerKey = api_key)
    result = service.cse().list(q = revised_question, cx = cse_id).execute()
    print('google search result: ', result)

    if "items" in result:
        for item in result['items']:
            api_type = "google api"
            excerpt = item['snippet']
            uri = item['link']
            title = item['title']
            confidence = ""
            assessed_score = ""

            doc_info = {
                "rag_type": 'search',
                "api_type": api_type,
                "confidence": confidence,
                "metadata": {
                    "source": uri,
                    "title": title,
                    "excerpt": excerpt,                                
                },
                "assessed_score": assessed_score,
            }
        relevant_docs.append(doc_info)

Code Generation

RAG์— ์ €์žฅ๋œ ๊ธฐ์กด ์ฝ”๋“œ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ฝ”๋“œ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. rag-code-generation๋Š” Code๋ฅผ ํ•œ๊ตญ์–ด๋กœ ์š”์•ฝํ•˜์—ฌ RAG์— ์ €์žฅํ•˜๊ณ  ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ–ˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ๋Š” ์ผ๋ฐ˜ ๋ฌธ์„œ์™€ Code reference๋ฅผ ํ•˜๋‚˜์˜ RAG์— ์ €์žฅํ•˜๊ณ  ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

Query Transformation

query-transformation.md์—์„œ๋Š” ์งˆ๋ฌธ์„ ํ–ฅ์ƒ์‹œ์ผœ์„œ RAG์˜ ์„ฑ๋Šฅ์„ ๊ฐ•ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋ฌธ์„œ์˜ ์ •๋ณด ์ถ”์ถœ

S3 Event

S3์— ๋ฌธ์„œ๋ฅผ ์—…๋กœ๋“œํ• ๋•Œ ๋ฐœ์ƒํ•˜๋Š” Event๋ฅผ ์ด์šฉํ•˜์—ฌ ์ž๋™์œผ๋กœ RAG ๋“ฑ๋ก์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋•Œ ํ•„์š”ํ•œ event์— ๋Œ€ํ•ด RAG-s3-event.md์—์„œ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Chunking Strategy

Chunking Strategy์—์„œ๋Š” ๋ฌธ์„œ๋ฅผ ๋ถ„ํ• ํ•˜์—ฌ chunk๋ฅผ ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Parent Document Retrieval

RAG์˜ ๊ฒ€์ƒ‰์ •ํ™•๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ค‘์— Parent/Child Chunking์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Parent Document Retrieval์—์„œ๋Š” parent/child๋กœ chunking ์ „๋žต์„ ๋‹ฌ๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋ฌธ์„œ์˜ ์ด๋ฏธ์ง€ ํ™œ์šฉ

๋ฌธ์„œ๋ฅผ ํŽ˜์ด์ง€ ๋‹จ์œ„๋กœ ์ด๋ฏธ์ง€ ์ถ”์ถœํ•˜๊ธฐ

page-image-extraction.md์—์„œ๋Š” ๋ฌธ์„œ์˜ ํŽ˜์ด์ง€ ๋‹จ์œ„๋กœ ์ €์žฅํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€๋ฅผ ๋‹จ๋… ์ €์žฅํ• ๋•Œ๋ณด๋‹ค ์‚ฝ์ž…๋œ ์ด๋ฏธ์ง€ ๊ทผ์ฒ˜์˜ ํ…์ŠคํŠธ๋ฅผ ๊ฐ™์ด ์ถ”์ถœํ•˜๋ฉด ๋” ๋งŽ์€ ์„ค๋ช…์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฌธ์„œ์— ํฌํ•จ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ถ”์ถœํ•˜๊ธฐ

image-extraction.md์—์„œ๋Š” pdf, docx, pptx์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ถ”์ถœํ•˜์—ฌ S3์— ์ €์žฅํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋˜ํ•œ, ์ด๋ฏธ์ง€ ์ถ”์ถœ์„ enable ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” cdk-korean-chatbot-stack.ts๋ฅผ ์ฐธ์กฐํ•˜์—ฌ ์•„๋ž˜์˜ enableImageExtraction์„ 'true'๋กœ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๋‹ค. ์ดํ›„ deployment.md๋ฅผ ์ฐธ์กฐํ•˜์—ฌ, ์žฌ๋ฐฐํฌํ•ฉ๋‹ˆ๋‹ค.

const enableImageExtraction = 'false';

PDF์—์„œ ์ •๋ณด ์ถ”์ถœํ•˜๊ธฐ

PDF์—์„œ ์ •๋ณด ์ถ”์ถœํ•˜๊ธฐ์—์„œ๋Š” pdf์—์„œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

PDF์—์„œ ํ…์ŠคํŠธ, ์ด๋ฏธ์ง€, ํ…Œ์ด๋ธ” ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๊ธฐ์—์„œ๋Š” S3์˜ PDF ๋ฌธ์„œ์—์„œ ํ…์ŠคํŠธ, ์ด๋ฏธ์ง€, ํ…Œ์ด๋ธ”์„ ์ถ”์ถœํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๊ฒฐ๊ณผ ์ฝ์–ด์ฃผ๊ธฐ

Amazon Polly๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฒฐ๊ณผ๋ฅผ ํ•œ๊ตญ์–ด๋กœ ์ฝ์–ด์ค๋‹ˆ๋‹ค. start_speech_synthesis_task์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

def get_text_speech(path, speech_prefix, bucket, msg):
    ext = "mp3"
    polly = boto3.client('polly')
    try:
        response = polly.start_speech_synthesis_task(
            Engine='neural',
            LanguageCode='ko-KR',
            OutputFormat=ext,
            OutputS3BucketName=bucket,
            OutputS3KeyPrefix=speech_prefix,
            Text=msg,
            TextType='text',
            VoiceId='Seoyeon'        
        )
        print('response: ', response)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
        raise Exception ("Not able to create voice")
    
    object = '.'+response['SynthesisTask']['TaskId']+'.'+ext
    print('object: ', object)

    return path+speech_prefix+parse.quote(object)

Kendra

Kendra์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ

Kendra ๋ฅผ ์ด์šฉํ•œ RAG์˜ ๊ตฌํ˜„์— ๋”ฐ๋ผ Kendra์˜ RAG ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Kendra์˜ FAQ์™€ ๊ฐ™์ด ์ •๋ฆฌ๋œ ๋ฌธ์„œ๋ฅผ ํ™œ์šฉํ•˜๊ณ , ๊ด€๋ จ๋„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ด€๋ จ ๋ฌธ์„œ๋ฅผ ์„ ํƒํ•˜์—ฌ Context๋กœ ํ™•์ธ ํ•ฉ๋‹ˆ๋‹ค. Kendra์—์„œ ๋ฌธ์„œ ๋“ฑ๋ก์— ํ•„์š”ํ•œ ๋‚ด์šฉ์€ kendra-document.md์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ๊ด€๋ จ๋œ Blog์ธ Amazon Bedrock์˜ Claude์™€ Amazon Kendra๋กœ ํ–ฅ์ƒ๋œ RAG ์‚ฌ์šฉํ•˜๊ธฐ์„ ์ฐธ๊ณ ํ•ฉ๋‹ˆ๋‹ค.

S3๋ฅผ ๋ฐ์ดํ„ฐ ์†Œ์Šค๋กœ ํ•˜๊ธฐ ์œ„ํ•œ ํผ๋ฏธ์…˜ (Kendra)

Log์— ๋Œ€ํ•œ ํผ๋ฏธ์…˜์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:GenerateQuery",
                "logs:*"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

๊ฐœ๋ฐœ ๋ฐ ํ…Œ์ŠคํŠธ๋ฅผ ์œ„ํ•ด Kendra์—์„œ ์ถ”๊ฐ€๋กœ S3๋ฅผ ๋“ฑ๋กํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ชจ๋“  S3์— ๋Œ€ํ•œ ์ฝ๊ธฐ ํผ๋ฏธ์…˜์„ ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Action": [
				"s3:Describe*",
				"s3:Get*",
				"s3:List*"
			],
			"Resource": "*",
			"Effect": "Allow"
		}
	]
}

์ด๋ฅผ CDK๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

const kendraLogPolicy = new iam.PolicyStatement({
    resources: ['*'],
    actions: ["logs:*", "cloudwatch:GenerateQuery"],
});
roleKendra.attachInlinePolicy( // add kendra policy
    new iam.Policy(this, `kendra-log-policy-for-${projectName}`, {
        statements: [kendraLogPolicy],
    }),
);
const kendraS3ReadPolicy = new iam.PolicyStatement({
    resources: ['*'],
    actions: ["s3:Get*", "s3:List*", "s3:Describe*"],
});
roleKendra.attachInlinePolicy( // add kendra policy
    new iam.Policy(this, `kendra-s3-read-policy-for-${projectName}`, {
        statements: [kendraS3ReadPolicy],
    }),
);    

Kendra ํŒŒ์ผ ํฌ๊ธฐ Quota

Quota Console - File size์™€ ๊ฐ™์ด Kendra์— ์˜ฌ๋ฆด์ˆ˜ ์žˆ๋Š” ํŒŒ์ผํฌ๊ธฐ๋Š” 50MB๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค. ์ด๋Š” Quota ์กฐ์ • ์š”์ฒญ์„ ์œ„ํ•ด ์ ์ ˆํ•œ ๊ฐ’์œผ๋กœ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค๋งŒ ์ด ๊ฒฝ์šฐ์—๋„ ํŒŒ์ผ ํ•œ๊ฐœ์—์„œ ์–ป์–ด๋‚ผ์ˆ˜ ์žˆ๋Š” Text์˜ ํฌ๊ธฐ๋Š” 5MB๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค. msg๋ฅผ ํ•œ๊ตญ์–ด Speech๋กœ ๋ณ€ํ™˜ํ•œ ํ›„์— CloudFront URL์„ ์ด์šฉํ•˜์—ฌ S3์— ์ €์žฅ๋œ Speech๋ฅผ URI๋กœ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์†Œ์Šค ์ถ”๊ฐ€

S3๋ฅผ ๋ฐ์ดํ„ฐ ์†Œ์Šค๋ฅด ์ถ”๊ฐ€ํ• ๋•Œ ์•„๋ž˜์™€ ๊ฐ™์ด ์ˆ˜ํ–‰ํ•˜๋ฉด ๋˜๋‚˜, languageCode๊ฐ€ ๋ฏธ์ง€์›๋˜์–ด์„œ CLI๋กœ ๋Œ€์ฒดํ•ฉ๋‹ˆ๋‹ค.

const cfnDataSource = new kendra.CfnDataSource(this, `s3-data-source-${projectName}`, {
    description: 'S3 source',
    indexId: kendraIndex,
    name: 'data-source-for-upload-file',
    type: 'S3',
    // languageCode: 'ko',
    roleArn: roleKendra.roleArn,
    // schedule: 'schedule',

    dataSourceConfiguration: {
        s3Configuration: {
            bucketName: s3Bucket.bucketName,
            documentsMetadataConfiguration: {
                s3Prefix: 'metadata/',
            },
            inclusionPrefixes: ['documents/'],
        },
    },
});

CLI ๋ช…๋ น์–ด ์˜ˆ์ œ์ž…๋‹ˆ๋‹ค.

aws kendra create-data-source
--index-id azfbd936-4929-45c5-83eb-bb9d458e8348
--name data-source-for-upload-file
--type S3
--role-arn arn:aws:iam::123456789012:role/role-lambda-chat-ws-for-korean-chatbot-us-west-2
--configuration '{"S3Configuration":{"BucketName":"storage-for-korean-chatbot-us-west-2", "DocumentsMetadataConfiguration": {"S3Prefix":"metadata/"},"InclusionPrefixes": ["documents/"]}}'
--language-code ko
--region us-west-2

OpenSearch

OpenSearch ์ค€๋น„

Python client์— ๋”ฐ๋ผ OpenSearch๋ฅผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

opensearch-py๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค.

pip install opensearch-py

Index naming restrictions์— ๋”ฐ๋ž index๋Š” low case์—ฌ์•ผํ•˜๊ณ , ๊ณต๋ฐฑ์ด๋‚˜ ','์„ ๊ฐ€์งˆ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

OpenSearch์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ ๋ฐฉ๋ฒ•

Vector ๊ฒ€์ƒ‰(Sementaic) ๋ฟ ์•„๋‹ˆ๋ผ, Lexical ๊ฒ€์ƒ‰(Keyword)์„ ํ™œ์šฉํ•˜์—ฌ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์ฐพ์„ ํ™•์œจ์„ ๋†’์ž…๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ OpenSearch์—์„œ Lexical ๊ฒ€์ƒ‰์— ์žˆ์Šต๋‹ˆ๋‹ค.

OpenSearch์˜ ๋ฌธ์„œ ์—…๋ฐ์ดํŠธ

๋ฌธ์„œ ์ƒ์„ฑ์‹œ ์—…๋ฐ์ดํŠธ๊นŒ์ง€ ๊ณ ๋ คํ•˜์—ฌ index๋ฅผ ์ฒดํฌํ•˜์—ฌ ์ง€์šฐ๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜์˜€์œผ๋‚˜ shard๊ฐ€ ๊ณผ๋„ํ•˜๊ฒŒ ์ฆ๊ฐ€ํ•˜์—ฌ, metadata์— ids๋ฅผ ์ €์žฅํ›„ ์ง€์šฐ๋Š” ๋ฐฉ์‹์œผ๋กœ ๋ณ€๊ฒฝํ•˜์˜€์Šต๋‹ˆ๋‹ค. lambda-document-manager์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค. ๋™์ž‘์€ ํŒŒ์ผ ์—…๋ฐ์ดํŠธ์‹œ meta์—์„œ ์ด์ „ document๋“ค์„ ์ฐพ์•„์„œ ์ง€์šฐ๊ณ  ์ƒˆ๋กœ์šด ๋ฌธ์„œ๋ฅผ ์‚ฝ์ž…๋‹ˆ๋‹ค.

def store_document_for_opensearch(docs, key):    
    objectName = (key[key.find(s3_prefix)+len(s3_prefix)+1:len(key)])
    metadata_key = meta_prefix+objectName+'.metadata.json'
    delete_document_if_exist(metadata_key)
    
    try:        
        response = vectorstore.add_documents(docs, bulk_size = 2000)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)                
        #raise Exception ("Not able to request to LLM")

    print('uploaded into opensearch')
    
    return response

def delete_document_if_exist(metadata_key):
    try: 
        s3r = boto3.resource("s3")
        bucket = s3r.Bucket(s3_bucket)
        objs = list(bucket.objects.filter(Prefix=metadata_key))
        print('objs: ', objs)
        
        if(len(objs)>0):
            doc = s3r.Object(s3_bucket, metadata_key)
            meta = doc.get()['Body'].read().decode('utf-8')
            print('meta: ', meta)
            
            ids = json.loads(meta)['ids']
            print('ids: ', ids)
            
            result = vectorstore.delete(ids)
            print('result: ', result)        
        else:
            print('no meta file: ', metadata_key)
            
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
        raise Exception ("Not able to create meta file")

OpenSearch Embedding์‹œ bulk_size

์•„๋ž˜๋Š” OpenSearch์—์„œ Embedding์„ ํ• ๋•Œ bulk_size ๊ธฐ๋ณธ๊ฐ’์ธ 500์„ ์‚ฌ์šฉํ• ๋•Œ์˜ ์—๋Ÿฌ์ž…๋‹ˆ๋‹ค. ๋ฌธ์„œ๋ฅผ embeddingํ•˜๊ธฐ ์œ„ํ•ด 1840๋ฒˆ embedding์„ ํ•ด์•ผํ•˜๋Š”๋ฐ, bulk_size๊ฐ€ 500์ด๋ฏ€๋กœ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•˜์˜€์Šต๋‹ˆ๋‹ค.

RuntimeError: The embeddings count, 1840 is more than the [bulk_size], 500. Increase the value of [bulk_size].

bulk_size๋ฅผ 10000์œผ๋กœ ๋ณ€๊ฒฝํ•˜์—ฌ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.

new_vectorstore = OpenSearchVectorSearch(
    index_name=index_name,  
    is_aoss = False,
    #engine="faiss",  # default: nmslib
    embedding_function = bedrock_embeddings,
    opensearch_url = opensearch_url,
    http_auth=(opensearch_account, opensearch_passwd),
)
response = new_vectorstore.add_documents(docs, bulk_size = 10000)

ํ–ฅ์ƒ๋œ RAG๋ฅผ ๊ตฌ์„ฑํ•˜๊ธฐ

Agent ์ •์˜ ๋ฐ ํ™œ์šฉ

LLM Agent์™€ ๊ฐ™์ด, ๋‹ค์–‘ํ•œ API๋ฅผ ์ด์šฉํ•˜๊ธฐ ์œ„ํ•˜์—ฌ Agent๋ฅผ ์ด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฉ”๋‰ด์—์„œ ReAct๋‚˜ ReAct chat์„ ์ด์šฉํ•ด ๊ธฐ๋Šฅ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Prompt Flow๋ฅผ ์ด์šฉํ•˜์—ฌ No code๋กœ Chatbot ๊ตฌํ˜„ํ•˜๊ธฐ

Prompt Flow๋ฅผ ์ด์šฉํ•˜๋ฉด prompt flow builder๋ฅผ ์ด์šฉํ•˜์—ฌ ์†์‰ฝ๊ฒŒ chatbot์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. prompt-flow.md์—์„œ๋Š” Anthropic์˜ Claude Sonnet๋ฅผ ์ด์šฉํ•˜์—ฌ, "AWS"๋ผ๋Š” ์ด๋ฆ„์„ ๊ฐ€์ง„ chatbot์„ prompt flow๋กœ ์ƒ์„ฑํ•œ ํ›„์—, ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Bedrock Agent๋กœ Chatbot ๊ตฌํ˜„ํ•˜๊ธฐ

Bedrock Agent๋กœ Chatbot ๊ตฌํ˜„ํ•˜๊ธฐ์—์„œ๋Š” Bedrock Agent๋ฅผ ์ด์šฉํ•˜์—ฌ HR Agent๋ฅผ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Knowledge Base๋กœ RAG ๊ตฌ์„ฑํ•˜๊ธฐ

Amazon Bedrock์˜ Knowledge Base๋Š” ์™„์ „๊ด€๋ฆฌํ˜• RAG ์„œ๋น„์Šค๋กœ knowledge-base.md์™€ ๊ฐ™์ด ์†์‰ฝ๊ฒŒ RAG๋ฅผ ์œ„ํ•œ knowledge store๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ๋Š” rag-knowledge-base.md์™€ ๊ฐ™์ด Knowledge Base๋ฅผ ์ด์šฉํ•ด ์งˆ๋ฌธ๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ํ•˜์—ฌ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

Advanced RAG

Advanced RAG์—์„œ๋Š” RAG๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๊ธฐ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

AWS CDK๋กœ ์ธํ”„๋ผ ๊ตฌํ˜„ํ•˜๊ธฐ

CDK ๊ตฌํ˜„ ์ฝ”๋“œ์—์„œ๋Š” Typescript๋กœ ์ธํ”„๋ผ๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์ƒ์„ธํžˆ ์„ค๋ช…ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

์ง์ ‘ ์‹ค์Šต ํ•ด๋ณด๊ธฐ

์‚ฌ์ „ ์ค€๋น„ ์‚ฌํ•ญ

์ด ์†”๋ฃจ์…˜์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์‚ฌ์ „์— ์•„๋ž˜์™€ ๊ฐ™์€ ์ค€๋น„๊ฐ€ ๋˜์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

CDK๋ฅผ ์ด์šฉํ•œ ์ธํ”„๋ผ ์„ค์น˜

์ธํ”„๋ผ ์„ค์น˜์— ๋”ฐ๋ผ CDK๋กœ ์ธํ”„๋ผ ์„ค์น˜๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

์‹คํ–‰๊ฒฐ๊ณผ

Multi modal ๋ฐ RAG

"Conversation Type"์œผ๋กœ [General Conversation]์„ ์„ ํƒํ•˜๊ณ , dice.png ํŒŒ์ผ์„ ๋‹ค์šด๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

์ดํ›„์— ์ฑ„ํŒ…์ฐฝ ์•„๋ž˜์˜ ํŒŒ์ผ ๋ฒ„ํŠผ์„ ์„ ํƒํ•˜์—ฌ ์—…๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

image

fsi_faq_ko.csv์„ ๋‹ค์šด๋กœ๋“œํ•œ ํ›„์— ํŒŒ์ผ ์•„์ด์ฝ˜์„ ์„ ํƒํ•˜์—ฌ ์—…๋กœ๋“œํ•œํ›„, ์ฑ„ํŒ…์ฐฝ์— "๊ฐ„ํŽธ์กฐํšŒ ์„œ๋น„์Šค๋ฅผ ์˜๋ฌธ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‚˜์š”?โ€ ๋ผ๊ณ  ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ๏ผ‚์•„๋‹ˆ์˜คโ€์ž…๋‹ˆ๋‹ค. ์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

image

์ฑ„ํŒ…์ฐฝ์— "๊ฐ„ํŽธ์กฐํšŒ ์„œ๋น„์Šค๋ฅผ ์˜๋ฌธ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‚˜์š”?โ€ ๋ผ๊ณ  ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค. "์˜๋ฌธ๋ฑ…ํ‚น์—์„œ๋Š” ๊ฐ„ํŽธ์กฐํšŒ์„œ๋น„์Šค ์ด์šฉ๋ถˆ๊ฐ€"ํ•˜๋ฏ€๋กœ ์ข€๋” ์ž์„ธํ•œ ์„ค๋ช…์„ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค.

image

์ฑ„ํŒ…์ฐฝ์— "๊ณต๋™์ธ์ฆ์„œ ์ฐฝ๊ตฌ๋ฐœ๊ธ‰ ์„œ๋น„์Šค๋Š” ๋ฌด์—‡์ธ๊ฐ€์š”?"๋ผ๊ณ  ์ž…๋ ฅํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

Agent ์‚ฌ์šฉํ•˜๊ธฐ

์ฑ„ํŒ…์ฐฝ์—์„œ ๋’ค๋กœ๊ฐ€๊ธฐ ํ•œ ํ›„์— "2-1 Agent Executor (LangGraph)"๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜์™€ ๊ฐ™์ด "์—ฌํ–‰ ๊ด€๋ จ ๋„์„œ ์ถ”์ฒœํ•ด์ค˜."์™€ ๊ฐ™์ด ์ž…๋ ฅํ•˜๋ฉด ๊ต๋ณด๋ฌธ๊ณ ์˜ API๋ฅผ ์ด์šฉํ•˜์—ฌ "์—ฌํ–‰"๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์กฐํšŒํ•œ ํ›„ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

image

"์„œ์šธ์˜ ์˜ค๋Š˜ ๋‚ ์”จ ์•Œ๋ ค์ค˜"๋ผ๊ณ  ์ž…๋ ฅํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์ด ๋‚ ์”จ ์ •๋ณด๋ฅผ ์กฐํšŒํ•˜์—ฌ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

image

LLM์— ์‹œ๊ฐ„์„ ๋ฌผ์–ด๋ณด๋ฉด ๋งˆ์ง€๋ง‰ Training ์‹œ๊ฐ„์ด๋‚˜ ์ „ํ˜€ ๊ด€๋ จ์—†๋Š” Hallucination ๊ฐ’์„ ์ค๋‹ˆ๋‹ค. Agent๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ์— ์•„๋ž˜์™€ ๊ฐ™์ด ํ˜„์žฌ ์‹œ๊ฐ„์„ ์กฐํšŒํ•˜์—ฌ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. "์˜ค๋Š˜ ๋‚ ์งœ ์•Œ๋ ค์ค˜."์™€ "ํ˜„์žฌ ์‹œ๊ฐ„์€?"์„ ์ด์šฉํ•˜์—ฌ ๋™์ž‘์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

"์—”์”จ์˜ Lex ์„œ๋น„์Šค๋Š” ๋ฌด์—‡์ธ์ง€ ์„ค๋ช…ํ•ด์ค˜."์™€ ๊ฐ™์ด ์ž˜๋ชป๋œ ๋‹จ์–ด๋ฅผ ์กฐํ•ฉํ•˜์—ฌ ์งˆ๋ฌธํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์›น๊ฒ€์ƒ‰์„ ํ†ตํ•ด ํ•„์š”ํ•œ ์ •๋ณด๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.

image

ํ•œ์˜ ๋™์‹œ๊ฒ€์ƒ‰

"3-4 RAG - Dual Search (Korean/English)๋ฅผ ์„ ํƒํ•˜์—ฌ, "Amazon์˜ Athena ์„œ๋น„์Šค์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด์ฃผ์„ธ์š”."๋กœ ๊ฒ€์ƒ‰ํ• ๋•Œ ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰์„ ํ•˜๋ฉด ์˜์–ด ๋ฌธ์„œ์—์„œ ๋‹ต๋ณ€์— ํ•„์š”ํ•œ ๊ด€๋ จ๋ฌธ์„œ๋ฅผ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

image

Prompt Engineering ๊ฒฐ๊ณผ ์˜ˆ์ œ

Translation

"์•„๋งˆ์กด ๋ฒ ๋“œ๋ฝ์„ ์ด์šฉํ•˜์—ฌ ์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ํŽธ์•ˆํ•œ ๋Œ€ํ™”๋ฅผ ์ฆ๊ธฐ์‹ค์ˆ˜ ์žˆ์œผ๋ฉฐ, ํŒŒ์ผ์„ ์—…๋กœ๋“œํ•˜๋ฉด ์š”์•ฝ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.โ€๋กœ ์ž…๋ ฅํ•˜๊ณ  ๋ฒˆ์—ญ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

Extracted Topic and sentiment

โ€œ์‹์‚ฌ ๊ฐ€์„ฑ๋น„ ์ข‹์Šต๋‹ˆ๋‹ค. ์œ„์น˜๊ฐ€ ์ข‹๊ณ  ์Šค์นด์ด๋ผ์šด์ง€ ๋ฐ”๋ฒ ํ / ์•ผ๊ฒฝ ์ตœ๊ณฑ๋‹ˆ๋‹ค. ์•„์‰ฌ์› ๋˜ ์  ยท ์ง€ํ•˜์ฃผ์ฐจ์žฅ์ด ๋น„์ข์Šต๋‹ˆ๋‹ค.. ํ˜ธํ…”์•ž ๊ตํ†ต์ด ๋„ˆ๋ฌด ๋ณต์žกํ•ด์„œ ์ฃผ๋ณ€์‹œ์„ค์„ ์ด์šฉํ•˜๊ธฐ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. / ํ•œ๊ฐ•๋‚˜๊ฐ€๋Š” ๊ธธ / ์ฃผ๋ณ€์‹œ์„ค์— ๋‚˜๊ฐ€๋Š” ๋ฐฉ๋ฒ•๋“ฑ.. ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.โ€๋ฅผ ์ž…๋ ฅํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

Information extraction

โ€œJohn Park. Solutions Architect | WWCS Amazon Web Services Email: [email protected] Mobile: +82-10-1234-5555โ€œ๋กœ ์ž…๋ ฅํ›„์— ์ด๋ฉ”์ผ์ด ์ถ”์ถœ๋˜๋Š”์ง€ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

PII(personally identifiable information) ์‚ญ์ œํ•˜๊ธฐ

PII(Personal Identification Information)์˜ ์‚ญ์ œ์˜ ์˜ˆ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. "John Park, Ph.D. Solutions Architect | WWCS Amazon Web Services Email: [email protected] Mobile: +82-10-1234-4567"์™€ ๊ฐ™์ด ์ž…๋ ฅํ•˜์—ฌ name, phone number, address๋ฅผ ์‚ญ์ œํ•œ ํ…์ŠคํŠธ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค. ํ”„๋กฌํ”„ํŠธ๋Š” PII๋ฅผ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

image

๋ฌธ์žฅ ์˜ค๋ฅ˜ ๊ณ ์น˜๊ธฐ

"To have a smoth conversation with a chatbot, it is better for usabilities to show responsesess in a stream-like, conversational maner rather than waiting until the complete answer."๋กœ ์˜ค๋ฅ˜๊ฐ€ ์žˆ๋Š” ๋ฌธ์žฅ์„ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.

image

"Chatbot๊ณผ ์›ํ• ํ•œ ๋ฐํ™”๋ฅผ ์œ„ํ•ด์„œ๋Š” ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์—ฅ ๋Œ€ํ•œ ๋‹ต๋ณ€์„ ์™„์ „ํžˆ ์–ป์„ ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ฆฌ๊ธฐ ๋ณด๋‹ค๋Š” Stream ํ˜•ํƒœ๋กœ ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค."๋กœ ์ž…๋ ฅํ›„์— ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

๋ณต์žกํ•œ ์งˆ๋ฌธ (step-by-step)

"I have two pet cats. One of them is missing a leg. The other one has a normal number of legs for a cat to have. In total, how many legs do my cats have?"๋ฅผ ์ž…๋ ฅํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

"๋‚ด ๊ณ ์–‘์ด ๋‘ ๋งˆ๋ฆฌ๊ฐ€ ์žˆ๋‹ค. ๊ทธ์ค‘ ํ•œ ๋งˆ๋ฆฌ๋Š” ๋‹ค๋ฆฌ๊ฐ€ ํ•˜๋‚˜ ์—†๋‹ค. ๋‹ค๋ฅธ ํ•œ ๋งˆ๋ฆฌ๋Š” ๊ณ ์–‘์ด๊ฐ€ ์ •์ƒ์ ์œผ๋กœ ๊ฐ€์ ธ์•ผ ํ•  ๋‹ค๋ฆฌ ์ˆ˜๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ์ „์ฒด์ ์œผ๋กœ ๋ณด์•˜์„ ๋•Œ, ๋‚ด ๊ณ ์–‘์ด๋“ค์€ ๋‹ค๋ฆฌ๊ฐ€ ๋ช‡ ๊ฐœ๋‚˜ ์žˆ์„๊นŒ?"๋กœ ์งˆ๋ฌธ์„ ์ž…๋ ฅํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

image

๋‚ ์งœ/์‹œ๊ฐ„ ์ถ”์ถœํ•˜๊ธฐ

๋ฉ”๋‰ด์—์„œ "Timestamp Extraction"์„ ์„ ํƒํ•˜๊ณ , "์ง€๊ธˆ์€ 2023๋…„ 12์›” 5์ผ 18์‹œ 26๋ถ„์ด์•ผ"๋ผ๊ณ  ์ž…๋ ฅํ•˜๋ฉด prompt๋ฅผ ์ด์šฉํ•ด ์•„๋ž˜์ฒ˜๋Ÿผ ์‹œ๊ฐ„์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.

image

์‹ค์ œ ๊ฒฐ๊ณผ ๋ฉ”์‹œ์ง€๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

<result>
<year>2023</year>
<month>12</month>
<day>05</day>
<hour>18</hour>
<minute>26</minute>
</result>

์–ด๋ฆฐ์ด์™€ ๋Œ€ํ™” (Few shot example)

๋Œ€ํ™”์˜ ์ƒ๋Œ€์— ๋งž์ถ”์–ด์„œ ์งˆ๋ฌธ์— ๋‹ต๋ณ€์„ํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผํ…Œ๋ฉด [General Conversation]์—์„œ "์‚ฐํƒ€๊ฐ€ ํฌ๋ฆฌ์Šค๋งˆ์Šค์— ์„ ๋ฌผ์„ ๊ฐ€์ ธ๋‹ค ์ค„๊นŒ?"๋กœ ์งˆ๋ฌธ์„ ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์ด ๋‹ต๋ณ€ํ•ฉ๋‹ˆ๋‹ค.

image

[9. Child Conversation (few shot)]์œผ๋กœ ์ „ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๋™์ผํ•œ ์งˆ๋ฌธ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ƒ๋Œ€์— ๋งž์ถ”์–ด์„œ ์ ์ ˆํ•œ ๋‹ต๋ณ€์„ ํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. (๋™์ž‘ ํ™•์ธ ํ•„์š”)

๋ฆฌ์†Œ์Šค ์ •๋ฆฌํ•˜๊ธฐ

๋”์ด์ƒ ์ธํ”„๋ผ๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ์— ์•„๋ž˜์ฒ˜๋Ÿผ ๋ชจ๋“  ๋ฆฌ์†Œ์Šค๋ฅผ ์‚ญ์ œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. API Gateway Console๋กœ ์ ‘์†ํ•˜์—ฌ "rest-api-for-stream-chatbot", "ws-api-for-stream-chatbot"์„ ์‚ญ์ œํ•ฉ๋‹ˆ๋‹ค.

  2. Cloud9 console์— ์ ‘์†ํ•˜์—ฌ ์•„๋ž˜์˜ ๋ช…๋ น์–ด๋กœ ์ „์ฒด ์‚ญ์ œ๋ฅผ ํ•ฉ๋‹ˆ๋‹ค.

cdk destroy --all

๊ฒฐ๋ก 

LLM์„ ์‚ฌ์šฉํ•œ Enterprise์šฉ application์„ ๊ฐœ๋ฐœํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๊ธฐ์—…์ด ๊ฐ€์ง„ ๋‹ค์–‘ํ•œ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด Fine-tuning์ด๋‚˜ RAG๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Fine-tuning์€ ์ผ๋ฐ˜์ ์œผ๋กœ RAG๋ณด๋‹ค ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ์œผ๋‚˜, ๋‹ค์–‘ํ•œ application์—์„œ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋งŽ์€ ๋น„์šฉ๊ณผ ์‹œํ–‰์ฐฉ์˜ค๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. RAG๋Š” ๋ฐ์ดํ„ฐ์˜ ๋น ๋ฅธ ์—…๋ฐ์ดํŠธ ๋ฐ ๋น„์šฉ๋ฉด์—์„œ ํ™œ์šฉ๋„๊ฐ€ ๋†’์•„์„œ, Fine-tuning๊ณผ RAG๋ฅผ ๋ณ‘ํ–‰ํ•˜์—ฌ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ๋Š” RAG์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋ฆฌ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ๊ธฐ์ˆ ์„ ํ†ตํ•ฉํ•˜๊ณ , ์ด๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋Š” Korean Chatbot์„ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋‹ค์–‘ํ•œ RAG ๊ธฐ์ˆ ๋“ค์„ ํ…Œ์ŠคํŠธํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ์šฉ๋„์— ๋งž๊ฒŒ RAG ๊ธฐ์ˆ ์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Reference

Chunking Strategies for LLM Applications

Advanced RAG Techniques: An Overview