Date Approved

1-30-2026

Graduate Degree Type

Thesis

Degree Name

Applied Computer Science (M.S.)

Degree Program

School of Computing and Information Systems

First Advisor

Yong Zhuang

Second Advisor

Jonathan Leidig

Third Advisor

Rahat Rafiq

Academic Year

2025/2026

Abstract

Engaging with philosophical works is a rewarding but demanding task that challenges both human readers and computational systems designed to extract arguments from dense philosophical reasoning. Although large language models (LLMs) have made substantial progress in argument extraction, the most advanced models are often costly to run. As a result, there is growing interest in determining whether multi-agent pipelines that divide a task into smaller stages can reduce cost while maintaining or improving performance.

This study investigates a modular multi-agent approach for extracting arguments from philosophical texts using LLMs, and compares its performance, cost, and runtime to both single-agent and hybrid-agent pipelines. We first construct a JSON schema representing argument structure through three components: propositions, arguments, and attacks, inspired by existing computational argumentation frameworks. These components define a three-stage multi-agent system implemented through an application that communicates with commercial LLM APIs. With this system in place, we then evaluate its behavior across a range of LLMs with different capability levels and cost profiles. We evaluate four models across different price points: Claude Haiku (approximately $0.80 input / $4 output per million tokens), GPT-4.1 mini ($0.40 / $1.60), Claude Sonnet ($3 / $15), and GPT-4.1 ($2 / $8). Testing suggests that less expensive models benefit more noticeably from multi-agent architectures.
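The proposition/argument/attack structure described above might be sketched as a nested JSON object along the following lines; this is an illustrative example only, and the field names (`id`, `premises`, `conclusion`, `from`, `to`) are hypothetical, not necessarily those of the thesis's actual schema.

```python
import json

# Illustrative sketch of a three-component argument graph: propositions are
# atomic claims, arguments link premise propositions to a conclusion, and
# attacks are directed relations between arguments. Field names are
# hypothetical, not the thesis's actual schema.
argument_graph = {
    "propositions": [
        {"id": "p1", "text": "The unexamined life is not worth living."},
        {"id": "p2", "text": "Self-examination is necessary for virtue."},
        {"id": "p3", "text": "Virtue is not required for a worthwhile life."},
    ],
    "arguments": [
        # Each argument cites premise propositions and a conclusion proposition.
        {"id": "a1", "premises": ["p2"], "conclusion": "p1"},
        {"id": "a2", "premises": ["p3"], "conclusion": "p1"},
    ],
    "attacks": [
        # a2 challenges a1's reasoning, forming an edge in the argument graph.
        {"from": "a2", "to": "a1"},
    ],
}

# Serialize to JSON, as a multi-agent pipeline might pass between stages.
print(json.dumps(argument_graph, indent=2))
```

In a staged pipeline, one agent could populate `propositions`, a second could fill in `arguments` over those proposition ids, and a third could add `attacks`, which is what makes per-stage model assignment possible.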

Based on our experiment results, we also observe a clear pattern in how model capability influences the structure of the extracted argument graphs. More capable models tend to generate a larger number of propositions and a larger number of arguments, but the number of attack relations does not increase proportionally. This leads to graphs that are larger in scale but comparatively sparse in their relational structure. A hybrid multi-agent approach that uses Claude Sonnet for proposition extraction and Haiku for argument and attack identification demonstrates how task-specific model assignment can be effective, achieving competitive proposition coverage and F1 scores at moderate per-run cost. Together, these results suggest that multi-agent architectures may be most valuable for budget-constrained philosophical text analysis, and that strategic assignment of different models to different subtasks represents a viable approach for balancing cost and capability.
