{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building Graph Practice\n",
"\n",
"When building a multi-agent system in LangGraph, it is crucial to first conceptualize the overall graph structure abstractly rather than focusing on immediate implementation. Starting with an abstract graph model allows for easier modifications and adaptations, enabling a more agile approach to refining individual agent behaviors.\n",
"\n",
"Think of the graph as a blueprint—once the relationships and information flow are well-defined, implementing specific agents and their behaviors becomes a more structured and efficient process. By prioritizing conceptual design, you enable the flexibility to experiment with different agent roles and interactions without committing prematurely to rigid implementations. \n",
"\n",
"This notebook provides a step-by-step guide to building a graph without focusing on implementation details. The goal is to help you understand the key components of a graph and how they interact with each other.\n",
"\n",
"Before you start, make sure you have a clear understanding of the following concepts:\n",
"- [LangGraph QuickStart - finish until Part 2](https://langchain-ai.github.io/langgraph/tutorials/introduction/)\n",
"- [LangGraph Basic - How to define and update graph state](https://langchain-ai.github.io/langgraph/how-tos/state-reducers/)\n",
"- [LangGraph Basic - How to create a sequence of steps](https://langchain-ai.github.io/langgraph/how-tos/sequence/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"\n",
"## 1. State Definitions\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import TypedDict, Annotated, List\n",
"from langchain_core.documents import Document\n",
"import operator\n",
"\n",
"\n",
"class GraphState(TypedDict):\n",
" context: Annotated[List[Document], operator.add]\n",
" answer: Annotated[str, \"answer from the LLM\"]\n",
" question: Annotated[str, \"user question\"]\n",
" sql_query: Annotated[str, \"sql query\"]\n",
" binary_score: Annotated[str, \"binary score yes or no\"]\n",
"\n",
"\n",
"class PDFState(TypedDict):\n",
" filepath: str # path\n",
" filetype: str # pdf\n",
" page_numbers: list[int] # page numbers\n",
" batch_size: int # batch size\n",
" split_filepaths: list[str] # split files\n",
" analyzed_files: list[str] # analyzed files\n",
" page_elements: dict[int, dict[str, list[dict]]] # page elements\n",
" page_metadata: dict[int, dict] # page metadata\n",
" page_summary: dict[int, str] # page summary\n",
" images: list[str] # image paths\n",
" image_summary: list[str] # image summary\n",
" tables: list[str] # table\n",
" table_summary: dict[int, str] # table summary\n",
" table_markdown: dict[int, str] # table markdown\n",
" texts: list[str] # text\n",
" text_summary: dict[int, str] # text summary\n",
" table_summary_data_batches: list[dict] # table summary data batches"
]
},
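{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note on reducers: fields annotated with `operator.add` (such as `context`) are accumulated across node updates rather than overwritten, while plain annotated fields such as `question` are simply replaced. A minimal sketch of the merge behavior, using plain strings in place of `Document` objects:\n",
"\n",
"```python\n",
"import operator\n",
"\n",
"existing = ['doc-1']  # value already stored in the state\n",
"update = ['doc-2']    # value returned by a node\n",
"merged = operator.add(existing, update)  # ['doc-1', 'doc-2']\n",
"```"
]
},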
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"\n",
"## 2. Node Definition\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Node Definition for PDF retrieval"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Stub nodes: each returns an empty placeholder matching the PDFState field type.\n",
"def split_pdf(state: PDFState) -> PDFState:\n",
"    split_files: list[str] = []\n",
"    return PDFState(split_filepaths=split_files)\n",
"\n",
"\n",
"def layout_analysis(state: PDFState) -> PDFState:\n",
"    page_elements: dict[int, dict[str, list[dict]]] = {}\n",
"    return PDFState(page_elements=page_elements)\n",
"\n",
"\n",
"def page_extract(state: PDFState) -> PDFState:\n",
"    page_texts: list[str] = []\n",
"    return PDFState(texts=page_texts)\n",
"\n",
"\n",
"def image_extract(state: PDFState) -> PDFState:\n",
"    images: list[str] = []\n",
"    return PDFState(images=images)\n",
"\n",
"\n",
"def table_extract(state: PDFState) -> PDFState:\n",
"    tables: list[str] = []\n",
"    return PDFState(tables=tables)\n",
"\n",
"\n",
"def image_summarize(state: PDFState) -> PDFState:\n",
"    image_summary: list[str] = []\n",
"    return PDFState(image_summary=image_summary)\n",
"\n",
"\n",
"def table_summarize(state: PDFState) -> PDFState:\n",
"    table_summary: dict[int, str] = {}\n",
"    return PDFState(table_summary=table_summary)\n",
"\n",
"\n",
"def extract_page_text(state: PDFState) -> PDFState:\n",
"    page_texts: list[str] = []\n",
"    return PDFState(texts=page_texts)\n",
"\n",
"\n",
"def page_summarize(state: PDFState) -> PDFState:\n",
"    text_summary: dict[int, str] = {}\n",
"    return PDFState(text_summary=text_summary)\n",
"\n",
"\n",
"def markdownify_table(state: PDFState) -> PDFState:\n",
"    table_markdown: dict[int, str] = {}\n",
"    return PDFState(table_markdown=table_markdown)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make a subgraph and Visualize it"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langgraph.graph import END, StateGraph\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from azure_genai_utils.graphs import visualize_langgraph\n",
"\n",
"workflow = StateGraph(PDFState)\n",
"\n",
"# Node definition\n",
"workflow.add_node(\"split_pdf_node\", split_pdf)\n",
"workflow.add_node(\"layout_analysis\", layout_analysis)\n",
"workflow.add_node(\"page_extract\", page_extract)\n",
"workflow.add_node(\"image_extract\", image_extract)\n",
"workflow.add_node(\"table_extract\", table_extract)\n",
"workflow.add_node(\"extract_page_text\", extract_page_text)\n",
"workflow.add_node(\"page_summarize\", page_summarize)\n",
"workflow.add_node(\"image_summarize\", image_summarize)\n",
"workflow.add_node(\"table_summarize\", table_summarize)\n",
"workflow.add_node(\"markdownify_table\", markdownify_table)\n",
"\n",
"# Edge connection\n",
"workflow.add_edge(\"split_pdf_node\", \"layout_analysis\")\n",
"workflow.add_edge(\"layout_analysis\", \"page_extract\")\n",
"workflow.add_edge(\"page_extract\", \"image_extract\")\n",
"workflow.add_edge(\"page_extract\", \"table_extract\")\n",
"workflow.add_edge(\"page_extract\", \"extract_page_text\")\n",
"workflow.add_edge(\"image_extract\", \"page_summarize\")\n",
"workflow.add_edge(\"table_extract\", \"page_summarize\")\n",
"workflow.add_edge(\"extract_page_text\", \"page_summarize\")\n",
"workflow.add_edge(\"page_summarize\", \"image_summarize\")\n",
"workflow.add_edge(\"page_summarize\", \"table_summarize\")\n",
"workflow.add_edge(\"image_summarize\", END)\n",
"workflow.add_edge(\"table_summarize\", \"markdownify_table\")\n",
"workflow.add_edge(\"markdownify_table\", END)\n",
"\n",
"workflow.set_entry_point(\"split_pdf_node\")\n",
"\n",
"memory = MemorySaver()\n",
"subgraph = workflow.compile(checkpointer=memory)\n",
"visualize_langgraph(subgraph)"
]
},
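{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the subgraph is compiled with a `MemorySaver` checkpointer, invoking it requires a `thread_id` in the config so the checkpointer knows which run's state to load and persist. A minimal sketch (the input values are placeholders, not a real file):\n",
"\n",
"```python\n",
"config = {'configurable': {'thread_id': 'pdf-1'}}\n",
"result = subgraph.invoke({'filepath': 'sample.pdf', 'batch_size': 10}, config=config)\n",
"```"
]
},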
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Node Definition for the whole workflow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def retrieve(state: GraphState) -> GraphState:\n",
"    documents: List[Document] = []  # placeholder: retrieved documents\n",
"    return GraphState(context=documents)\n",
"\n",
"\n",
"def grade_documents(state: GraphState) -> GraphState:\n",
" binary_score = \"yes\"\n",
" return {\"binary_score\": binary_score}\n",
"\n",
"\n",
"def rewrite_query(state: GraphState) -> GraphState:\n",
"    rewritten_question = \"\"  # placeholder: reformulated user question\n",
"    return GraphState(question=rewritten_question)\n",
"\n",
"\n",
"def generate(state: GraphState) -> GraphState:\n",
" answer = \"answer from LLM\"\n",
" return GraphState(answer=answer)\n",
"\n",
"\n",
"def relevance_check(state: GraphState) -> GraphState:\n",
"    # not wired into the graph below; check_relevance is used instead\n",
"    binary_score = \"yes\"\n",
"    return GraphState(binary_score=binary_score)\n",
"\n",
"\n",
"def web_search(state: GraphState) -> GraphState:\n",
"    # placeholder: results from a web search tool; the operator.add reducer\n",
"    # appends them to the existing context rather than overwriting it\n",
"    searched_documents: List[Document] = []\n",
"    return GraphState(context=searched_documents)\n",
"\n",
"\n",
"def handle_error(state: GraphState) -> GraphState:\n",
"    # placeholder error handler; not wired into the graph below\n",
"    error_documents: List[Document] = []\n",
"    return GraphState(context=error_documents)\n",
"\n",
"\n",
"def decide_to_generate(state: GraphState) -> str:\n",
"    # router: the returned branch name must match a key in add_conditional_edges\n",
"    if state[\"binary_score\"] == \"yes\":\n",
"        return \"generate\"\n",
"    else:\n",
"        return \"query_rewrite\"\n",
"\n",
"\n",
"def check_relevance(state: GraphState) -> GraphState:\n",
" binary_score = \"yes\"\n",
" return GraphState(binary_score=binary_score)"
]
},
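{
"cell_type": "markdown",
"metadata": {},
"source": [
"A router such as `decide_to_generate` is not a node: it only inspects the state and returns a branch name, which `add_conditional_edges` looks up in its mapping to pick the next node. A minimal sketch of that lookup with a hypothetical router and mapping:\n",
"\n",
"```python\n",
"branch_map = {'relevant': 'generate', 'not_relevant': 'query_rewrite'}\n",
"\n",
"def route(state):\n",
"    return 'relevant' if state['binary_score'] == 'yes' else 'not_relevant'\n",
"\n",
"next_node = branch_map[route({'binary_score': 'yes'})]  # 'generate'\n",
"```"
]
},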
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make a graph and Visualize it\n",
"\n",
"The cell below wires the whole workflow into a single graph. <br>\n",
"If you want to replace the plain `retrieve` node with the PDF-retrieval subgraph built above, add the compiled subgraph as a node instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langgraph.graph import END, StateGraph, START\n",
"\n",
"REPLACE_PDF_RETRIEVAL_SUBGRAPH = True\n",
"\n",
"workflow = StateGraph(GraphState)\n",
"\n",
"# Node definition\n",
"if REPLACE_PDF_RETRIEVAL_SUBGRAPH:\n",
" workflow.add_node(\"retrieve\", subgraph)\n",
"else:\n",
" workflow.add_node(\"retrieve\", retrieve)\n",
"\n",
"workflow.add_node(\"grade_documents\", grade_documents)\n",
"workflow.add_node(\"generate\", generate)\n",
"workflow.add_node(\"query_rewrite\", rewrite_query)\n",
"workflow.add_node(\"web_search_node\", web_search)\n",
"workflow.add_node(\"check_relevance\", check_relevance)\n",
"\n",
"# Edge connection\n",
"workflow.add_edge(START, \"retrieve\")\n",
"workflow.add_edge(\"retrieve\", \"grade_documents\")\n",
"workflow.add_conditional_edges(\n",
" \"grade_documents\",\n",
" decide_to_generate,\n",
" {\n",
" \"query_rewrite\": \"query_rewrite\",\n",
" \"generate\": \"generate\",\n",
" },\n",
")\n",
"workflow.add_edge(\"query_rewrite\", \"web_search_node\")\n",
"workflow.add_edge(\"web_search_node\", \"generate\")\n",
"workflow.add_edge(\"generate\", \"check_relevance\")\n",
"workflow.add_conditional_edges(\n",
"    \"check_relevance\",\n",
"    # route to END when the answer is relevant, otherwise regenerate\n",
"    lambda state: END if state[\"binary_score\"] == \"yes\" else \"generate\",\n",
"    {\n",
"        \"generate\": \"generate\",\n",
"        END: END,\n",
"    },\n",
")\n",
"\n",
"app = workflow.compile()\n",
"visualize_langgraph(app, xray=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv_agent",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}