11-3. Self-RAG and CRAG | LLM Core Theory

01

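Self-RAG: Self-Reflective RAG

RAG that evaluates and corrects itself

Self-RAG (Self-Reflective Retrieval-Augmented Generation) is a method in which the LLM evaluates and revises its own work during retrieval and generation. It emits special "reflection tokens" to judge for itself whether retrieval is needed, whether each retrieved document is relevant, and how good the response is.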

1

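Retrieval decision (Retrieve)

Decide whether answering the question requires retrieving additional information.

[Retrieve] = Yes: run retrieval
[Retrieve] = No: generate directly

2

Document relevance (IsRel)

Evaluate whether each retrieved document is relevant to the question.

[IsRel] = Relevant
[IsRel] = Irrelevant: filtered out

3

Response support (IsSup)

Evaluate whether the generated response is backed by the retrieved documents.

[IsSup] = Fully Supported
[IsSup] = Partially Supported
[IsSup] = No Support: regenerate

4

Response usefulness (IsUse)

Evaluate whether the final response is a useful answer to the question.

[IsUse] = 5 (very useful)
[IsUse] = 3 (average)
[IsUse] = 1 (not useful)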
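
The four judgments above can be read as a control flow. The following is a minimal sketch of that flow, not the Self-RAG reference implementation: every callable passed in (`needs_retrieval`, `retrieve`, `is_relevant`, `generate`, `judge_support`, `judge_usefulness`) is a hypothetical stand-in for a model call, and falling back to a retrieval-free answer when nothing is supported is a simplification of "regenerate".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Ordering used to compare [IsSup] labels; higher is better.
SUPPORT_RANK = {"No Support": 0, "Partially Supported": 1, "Fully Supported": 2}


@dataclass
class Candidate:
    text: str
    is_sup: str  # one of the SUPPORT_RANK keys
    is_use: int  # 1 (not useful) .. 5 (very useful)


def self_rag_answer(
    query: str,
    needs_retrieval: Callable[[str], bool],        # plays the role of [Retrieve]
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],       # plays the role of [IsRel]
    generate: Callable[[str, Optional[str]], str],
    judge_support: Callable[[str, str], str],      # plays the role of [IsSup]
    judge_usefulness: Callable[[str, str], int],   # plays the role of [IsUse]
) -> str:
    # Step 1: decide whether retrieval is needed at all.
    if not needs_retrieval(query):
        return generate(query, None)

    # Step 2: keep only the documents judged relevant.
    docs = [d for d in retrieve(query) if is_relevant(query, d)]

    # Steps 3 and 4: draft one candidate per document and critique it.
    candidates = []
    for doc in docs:
        text = generate(query, doc)
        candidates.append(
            Candidate(text, judge_support(text, doc), judge_usefulness(query, text))
        )

    # Discard unsupported drafts; if none survive, answer without context
    # (a simplification of the "regenerate" step).
    supported = [c for c in candidates if c.is_sup != "No Support"]
    if not supported:
        return generate(query, None)

    # Prefer stronger support first, then higher usefulness.
    best = max(supported, key=lambda c: (SUPPORT_RANK[c.is_sup], c.is_use))
    return best.text
```

Passing the judges in as plain callables keeps the sketch independent of any particular model: a fine-tuned Self-RAG model would supply all of them from its own reflection tokens, while a prompt-based imitation would back each one with a separate LLM call.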

Self-RAG reflection tokens

Token	Possible values
[Retrieve]	Yes, No, Continue
[IsRel]	Relevant, Irrelevant
[IsSup]	Fully Supported, Partially Supported, No Support
[IsUse]	1, 2, 3, 4, 5

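Key characteristic of Self-RAG

A Self-RAG model is fine-tuned to generate these reflection tokens. Unlike prompting a general-purpose LLM to grade its own output, the judgments are produced at the token level as part of ordinary decoding. Compared with prompt-based self-evaluation, this avoids separate evaluation calls and tends to give more consistent quality judgments.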

02

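CRAG: Corrective RAG

An adaptive strategy driven by retrieval quality

CRAG (Corrective Retrieval-Augmented Generation) is an adaptive form of RAG that evaluates the quality of the retrieval results and applies a different strategy depending on that quality. When retrieval falls short, it supplements the results with web search, and it filters out irrelevant documents.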

Query
  |
Document retrieval
  |
Relevance Evaluator
  |
  +-- Correct   -> Knowledge Refinement
  +-- Incorrect -> Web Search
  +-- Ambiguous -> Hybrid (Both)
  |
Generate Response

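CRAG's three-way verdict

Correct

The retrieved documents are relevant to the question and reliable. They are refined (Knowledge Refinement) before use.

Incorrect

The retrieved documents are unrelated to the question. Information is gathered through web search instead.

Ambiguous

Only part of the retrieved material is relevant, or the evaluator is not confident. The existing documents are refined and a web search is run as well.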
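
One way to turn evaluator output into these three verdicts is a pair of thresholds. The sketch below assumes the evaluator returns one relevance score per document in the range 0 to 1; the function name `crag_verdict` and the threshold values `upper=0.6` and `lower=0.2` are illustrative assumptions, not values taken from the text above.

```python
from typing import List, Literal

Verdict = Literal["correct", "incorrect", "ambiguous"]


def crag_verdict(scores: List[float], upper: float = 0.6, lower: float = 0.2) -> Verdict:
    """Map per-document relevance scores to a three-way verdict."""
    if not scores:
        # Nothing was retrieved, so there is nothing to refine.
        return "incorrect"
    best = max(scores)
    if best >= upper:
        # At least one document is confidently relevant.
        return "correct"
    if best < lower:
        # Every document is confidently irrelevant.
        return "incorrect"
    # Scores fall between the two thresholds: not confident either way.
    return "ambiguous"


print(crag_verdict([0.1, 0.9]))   # correct
print(crag_verdict([0.05, 0.1]))  # incorrect
print(crag_verdict([0.3, 0.1]))   # ambiguous
```

Widening the gap between the two thresholds sends more queries down the hybrid path, which costs an extra web search but reduces the chance of trusting a weak document set.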

Knowledge Refinement

One of CRAG's core techniques is Knowledge Refinement, which strips noise from the retrieved documents and keeps only the key information.

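Python (Pseudocode)

# decompose_into_strips and calculate_relevance are placeholders
# for your own splitting and scoring logic.
THRESHOLD = 0.5  # illustrative cutoff; tune it for your domain

def knowledge_refinement(documents, query):
    """Split documents into strips and filter them by relevance score."""
    refined_knowledge = []

    for doc in documents:
        # 1. Decompose the document into smaller units (strips)
        strips = decompose_into_strips(doc)

        for strip in strips:
            # 2. Score each strip's relevance to the query
            score = calculate_relevance(strip, query)

            # 3. Keep only strips that score above the threshold
            if score > THRESHOLD:
                refined_knowledge.append(strip)

    # 4. Recombine the surviving strips in their original order
    return "\n".join(refined_knowledge)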
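
To make the pseudocode above concrete, here is a small runnable toy. It is only a sketch under simplifying assumptions: strips are sentences split with a regular expression, and relevance is the fraction of query words found in the strip. A real system would more likely score strips with a trained evaluator or an embedding model; the name `refine` and the default threshold are illustrative.

```python
import re
from typing import List, Set


def decompose_into_strips(doc: str) -> List[str]:
    """Split a document into sentence-level strips."""
    parts = re.split(r"(?<=[.!?])\s+", doc.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def calculate_relevance(strip: str, query: str) -> float:
    """Toy scorer: fraction of query tokens that also appear in the strip."""
    query_tokens = _tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & _tokens(strip)) / len(query_tokens)


def refine(documents: List[str], query: str, threshold: float = 0.5) -> str:
    """Keep strips scoring above the threshold and join them in order."""
    kept = [
        strip
        for doc in documents
        for strip in decompose_into_strips(doc)
        if calculate_relevance(strip, query) > threshold
    ]
    return "\n".join(kept)


docs = [
    "CRAG grades retrieved documents. The weather was nice that day.",
    "Low-quality documents trigger a web search in CRAG.",
]
refined = refine(docs, "CRAG documents")
print(refined)
```

With this input, the off-topic sentence about the weather shares no words with the query and is dropped, while the two sentences that mention both query words are kept.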
                    

03

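Self-RAG vs. CRAG

How the two methods differ

Aspect	Self-RAG	CRAG
Evaluation method	Reflection tokens (built into the model)	Separate evaluator (external classifier)
Retrieval correction	Re-retrieve or skip retrieval	Supplement with web search
Implementation effort	Requires fine-tuning	Pipeline composition
Evaluation granularity	4 reflection tokens	3-way verdict
Inference speed	Fast (token generation)	Moderate (separate evaluation step)
External dependencies	Low	High (web search API)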

04

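Implementing CRAG with LangGraph

Code example

Python

from typing import Literal, TypedDict

from langgraph.graph import StateGraph, END

# vector_store, relevance_evaluator, tavily_search, knowledge_refinement and
# generate_response are assumed to be defined elsewhere in your application.

class CRAGState(TypedDict, total=False):
    query: str
    documents: list
    grade: Literal["correct", "incorrect", "ambiguous"]
    refined_knowledge: str
    web_results: list
    final_response: str

def retrieve(state: CRAGState) -> CRAGState:
    """Retrieve documents from the vector DB."""
    docs = vector_store.similarity_search(state["query"])
    return {**state, "documents": docs}

def grade_documents(state: CRAGState) -> CRAGState:
    """Grade the relevance of the retrieved documents."""
    grade = relevance_evaluator(state["query"], state["documents"])
    return {**state, "grade": grade}

def route_by_grade(state: CRAGState) -> str:
    """Pick the next node based on the grade."""
    if state["grade"] == "correct":
        return "refine"
    elif state["grade"] == "incorrect":
        return "web_search"
    else:  # ambiguous
        return "hybrid"

def refine(state: CRAGState) -> CRAGState:
    """Adapt knowledge_refinement(documents, query) to the node signature."""
    refined = knowledge_refinement(state["documents"], state["query"])
    return {**state, "refined_knowledge": refined}

def route_after_refine(state: CRAGState) -> str:
    """Ambiguous results still need a web search after refinement."""
    return "web_search" if state["grade"] == "ambiguous" else "generate"

def web_search(state: CRAGState) -> CRAGState:
    """Run a web search."""
    results = tavily_search(state["query"])
    return {**state, "web_results": results}

# Build the graph
workflow = StateGraph(CRAGState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade", grade_documents)
workflow.add_node("refine", refine)
workflow.add_node("web_search", web_search)
workflow.add_node("generate", generate_response)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade")
workflow.add_conditional_edges("grade", route_by_grade, {
    "refine": "refine",
    "web_search": "web_search",
    "hybrid": "refine"  # hybrid refines first, then continues to web_search
})
# correct -> generate, ambiguous -> web_search -> generate
workflow.add_conditional_edges("refine", route_after_refine, {
    "web_search": "web_search",
    "generate": "generate"
})
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()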
                    

05

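Practical tips

How to use these methods effectively

When using Self-RAG

You can use a fine-tuned Self-RAG model (for example, selfrag/selfrag_llama2_7b) or reproduce similar behavior with prompts. The prompt-based approach is typically less accurate but simpler to implement.

When using CRAG

The evaluator can be a small classification model or an LLM. To keep costs down, grade with an inexpensive model and reserve the high-performance model for generation.

Hybrid approach

Combining Self-RAG's reflection idea with CRAG's correction strategy can yield a more robust system. Example: Self-RAG-style evaluation plus CRAG-style web search as a fallback.

Tuning the evaluation threshold

Adjust the relevance threshold to your domain and requirements. Set it strictly when precision matters most, and loosen it when recall matters more.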
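
The effect of the threshold can be checked on a small labeled sample before committing to a value. The sketch below uses made-up scores and relevance labels purely for illustration; `precision_recall` is a hypothetical helper, and a document is treated as kept when its score is at or above the threshold.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall of the documents kept at a given threshold."""
    kept = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(kept)
    total_relevant = sum(labels)
    # Conventions for empty cases: nothing kept -> precision 1.0,
    # nothing relevant -> recall 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


# Illustrative evaluator scores and ground-truth relevance labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

for threshold in (0.2, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.2 precision=0.60 recall=1.00
# threshold=0.5 precision=0.67 recall=0.67
# threshold=0.7 precision=1.00 recall=0.67
```

On this sample the strict threshold keeps only relevant documents but misses one, while the loose threshold finds every relevant document at the cost of two irrelevant ones.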

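Cost and latency trade-off

Both Self-RAG and CRAG add evaluation and retrieval steps, which increases cost and latency. Rather than applying them to every query, it is more efficient to apply them selectively to complex or important questions.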
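
Selective application can be as simple as a gate in front of the two pipelines. The sketch below is a hypothetical heuristic, not a recommended policy: the function names, the word-count cutoff `min_words`, and the multi-part check are all assumptions chosen for illustration, and `cheap_pipeline` and `corrective_pipeline` stand in for whatever plain RAG and Self-RAG/CRAG implementations you have.

```python
from typing import Callable


def should_use_corrective_rag(
    query: str, is_high_stakes: bool = False, min_words: int = 12
) -> bool:
    """Heuristic gate: send long, multi-part, or high-stakes queries
    to the more expensive pipeline."""
    multi_part = query.count("?") > 1 or " and " in query.lower()
    is_long = len(query.split()) >= min_words
    return is_high_stakes or multi_part or is_long


def answer(
    query: str,
    cheap_pipeline: Callable[[str], str],
    corrective_pipeline: Callable[[str], str],
    is_high_stakes: bool = False,
) -> str:
    """Route the query to one of the two pipelines."""
    if should_use_corrective_rag(query, is_high_stakes):
        return corrective_pipeline(query)
    return cheap_pipeline(query)


print(should_use_corrective_rag("What is RAG?"))                        # False
print(should_use_corrective_rag("Compare Self-RAG and CRAG on cost?"))  # True
```

In practice the gate could also be a small classifier or a confidence score from the cheap pipeline; the point is only that the decision is made before the expensive steps run.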

SUMMARY

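Key takeaways

Self-RAG uses reflection tokens to self-assess retrieval need, relevance, support, and usefulness
CRAG classifies retrieval results as Correct/Incorrect/Ambiguous and responds adaptively
Self-RAG requires fine-tuning; CRAG is implemented by composing a pipeline
CRAG supplements insufficient retrieval results with web search
For both methods, weigh the cost-quality trade-off and apply them selectively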