TECHNICAL REFERENCE · DEPT. 04
Research Papers
INDEXED
PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing
Optimizing Life Sciences Agents in Real-Time using Reinforcement Learning
Perceptions of Agentic AI in Organizations: Implications for Responsible AI and ROI
Redefining Human Resource Practices With AI Agents and Agentic AI: Automated Compliance and Enhanced Productivity
The role of agentic AI in shaping a smart future: A systematic review
Generative AI agents in life sciences face a critical challenge: determining the optimal approach for diverse queries ranging from simple factoid questions to complex mechanistic reasoning. Traditional methods rely on fixed rules or expensive labeled training data, neither of which adapts to changing conditions or user preferences. We present a novel framework that combines AWS Strands Agents with Thompson Sampling contextual bandits to enable AI agents to learn optimal decision-making strategies from user feedback alone. Our system optimizes three key dimensions: generation strategy selection (direct vs. chain-of-thought), tool selection (literature search, drug databases, etc.), and domain routing (pharmacology, molecular biology, clinical specialists). Through empirical evaluation on life science queries, we demonstrate 15-30% improvement in user satisfaction compared to random baselines, with clear learning patterns emerging after 20-30 queries. Our approach requires no ground truth labels, adapts continuously to user preferences, and provides a principled solution to the exploration-exploitation dilemma in agentic AI systems.
Highlights
Reinforcement Learning from Human Feedback (RLHF) has been successfully applied to align LLMs with human preferences [14,15].
Traditional methods rely on fixed rules or expensive labeled training data, neither of which adapts to changing conditions or user preferences.
determining the optimal approach for diverse queries ranging from simple factoid questions to complex mechanistic reasoning.
Thompson Sampling contextual bandits
AWS Strands Agents
Our key insight is that user satisfaction signals (thumbs up/down) provide sufficient information for an agent to learn which strategies work best for different query types, without requiring ground truth labels or fixed rules.
We propose a third approach: learning from user feedback through contextual bandits.
Traditional approaches to this problem fall into two categories: 1. Rule-based systems: Use fixed heuristics (e.g., keyword matching) to route queries. These lack adaptability and fail to capture nuanced patterns. 2. Supervised learning: Train classifiers on labeled data to predict optimal strategies. This requires expensive expert annotations and cannot adapt to changing conditions.
Our work demonstrates that contextual bandits provide a principled, practical solution for adaptive agent optimization in high-stakes domains.
Through empirical evaluation on life science queries, we demonstrate 15-30% improvement in user satisfaction compared to random baselines, with clear learning patterns emerging after 20-30 queries.
domain routing (pharmacology, molecular biology, clinical specialists).
tool selection (literature search, drug databases, etc.)
generation strategy selection (direct vs. chain-of-thought)
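The feedback-driven selection described in these highlights can be illustrated with a minimal Beta-Bernoulli Thompson Sampling sketch. It is not the paper's implementation and does not use AWS Strands Agents. The class name `ThompsonBandit`, the context labels, and the satisfaction rates in `simulate` are all assumptions chosen for illustration, and the "context" is reduced to a single query-type label.

```python
import random
from collections import defaultdict


class ThompsonBandit:
    """Beta-Bernoulli Thompson Sampling, one posterior per (context, arm)."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [alpha, beta]:
        # alpha counts thumbs up, beta counts thumbs down.
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    def select(self, context):
        """Draw a plausible success rate per arm and pick the largest draw."""
        def draw(arm):
            alpha, beta = self.posterior[(context, arm)]
            return self.rng.betavariate(alpha, beta)
        return max(self.arms, key=draw)

    def update(self, context, arm, thumbs_up):
        """Fold one thumbs-up/down signal into the chosen arm's posterior."""
        self.posterior[(context, arm)][0 if thumbs_up else 1] += 1.0

    def mean(self, context, arm):
        """Posterior mean satisfaction rate for an arm in a context."""
        alpha, beta = self.posterior[(context, arm)]
        return alpha / (alpha + beta)


def simulate(rounds=2000, seed=0):
    """Train a bandit on synthetic feedback (rates below are hypothetical)."""
    true_rate = {
        ("factoid", "direct"): 0.9,
        ("factoid", "chain_of_thought"): 0.5,
        ("mechanistic", "direct"): 0.3,
        ("mechanistic", "chain_of_thought"): 0.8,
    }
    bandit = ThompsonBandit(["direct", "chain_of_thought"], seed=seed)
    user = random.Random(seed + 1)
    for _ in range(rounds):
        context = user.choice(["factoid", "mechanistic"])
        arm = bandit.select(context)
        liked = user.random() < true_rate[(context, arm)]
        bandit.update(context, arm, liked)
    return bandit
```

Calling `simulate()` and comparing `mean(context, arm)` across arms shows how the posterior shifts toward the strategy that earns more positive feedback for each query type. No ground truth labels are involved, only the binary feedback signal. The same structure could in principle be repeated for tool selection and domain routing, each with its own set of arms.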
Synthesizing unstructured research materials into manuscripts is an essential yet under-explored challenge in AI-driven scientific discovery. Existing autonomous writers are rigidly coupled to specific experimental pipelines, and produce superficial literature reviews. We introduce PaperOrchestra, a multi-agent framework for automated AI research paper writing. It flexibly transforms unconstrained pre-writing materials into submission-ready LaTeX manuscripts, including comprehensive literature synthesis and generated visuals, such as plots and conceptual diagrams. To evaluate performance, we present PaperWritingBench, the first standardized benchmark of reverse-engineered raw materials from 200 top-tier AI conference papers, alongside a comprehensive suite of automated evaluators. In side-by-side human evaluations, PaperOrchestra significantly outperforms autonomous baselines, achieving an absolute win rate margin of 50%-68% in literature review quality, and 14%-38% in overall manuscript quality.
Highlights
The efficiency of PaperOrchestra is achieved by strategically decoupling the paper discovery and verification pipeline.
Although the Single Agent executes rapidly, it fails to produce the rigorous citations and data-grounded visuals generated by our system. Despite requiring more LLM calls (∼60–70) than AI Scientist-v2 (∼40–45), PaperOrchestra maintains a highly competitive mean processing time of 39.6 minutes (compared to 35.1 minutes for AI Scientist-v2).
although our current refinement agent effectively improves paper quality using structured, LLM-generated feedback, transitioning this framework toward an interactive, human-in-the-loop (HITL) system would enable researchers to iteratively steer drafts via natural language critiques.
relying on external frameworks like PaperBanana (Zhu et al., 2026) for visual generation limits our direct control over figure hallucinations.
While PaperOrchestra incorporates robust programmatic safeguards (such as API-grounded citation validation) to minimize hallucinations and ensure academic rigor, users are responsible for verifying the outputs to prevent the propagation of LLM-derived biases or misinformation.
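As a rough illustration of what API-grounded citation validation could look like, the sketch below keeps only citations that an external lookup can resolve to a matching title. The excerpt does not describe PaperOrchestra's actual mechanism, so every name here (`normalize`, `validate_citations`, the `lookup` callable standing in for a bibliographic API) is a hypothetical assumption.

```python
def normalize(title):
    """Lowercase and drop non-alphanumerics so formatting differences are ignored."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def validate_citations(citations, lookup):
    """Split citations into (verified, rejected).

    `lookup(title)` is a hypothetical wrapper around a bibliographic API.
    It is assumed to return the title of the best-matching record, or None.
    """
    verified, rejected = [], []
    for cite in citations:
        found = lookup(cite["title"])
        if found is not None and normalize(found) == normalize(cite["title"]):
            verified.append(cite)
        else:
            rejected.append(cite)
    return verified, rejected


# Usage with an in-memory stand-in for the external API (placeholder titles).
_index = {normalize("Example Paper A"): "Example Paper A"}


def fake_lookup(title):
    return _index.get(normalize(title))


drafted = [{"title": "example paper a"}, {"title": "Example Paper B"}]
verified, rejected = validate_citations(drafted, fake_lookup)
```

Passing the lookup in as a callable keeps the check independent of any particular provider. As the highlight itself stresses, a check of this kind reduces but does not remove the need for users to verify the output.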
We position our system as an advanced assistive tool designed to accelerate the drafting process of AI research papers, rather than an independent entity capable of claiming authorship.
In this work, we introduced PaperOrchestra and PaperWritingBench to transform unstructured, preliminary AI research materials into submission-ready manuscripts. Experiments demonstrate that our multi-agent framework can generate high-quality research papers with competitive runtime (App. B) while synthesizing deep, context-aware literature reviews.
our content refinement loop is critical for elevating raw drafts into rigorous, submission-ready manuscripts.
PaperOrchestra synthesizes coherent visuals from scratch, effectively augmenting the manuscript and reinforcing the scientific narrative.
human experts prioritize dense, pragmatic factuality and a nuanced, narrative-driven flow, which the autorater often mistakenly penalizes for lacking explicit formatting cues.
LLMs tend to act as structural graders, rewarding rigid formatting such as explicit “Problem-Gap-Solution” paragraphs or bulleted thematic groups.
Literature review correlation is lower due to inherent LLM self-bias.
To validate automated SxS metrics and assess human-perceived manuscript quality, we recruited 11 AI researchers to evaluate 40 randomly sampled papers (20 per venue) from PaperWritingBench.
Dominating in Citation Practices and Critical Analysis, our framework synthesizes analytical, well-grounded narratives rather than generic LLM summaries.
This proves our method actively explores the broader academic landscape rather than relying on shallow keyword matching.
PaperOrchestra significantly increases P1 Recall by 12.59%–13.75% over the strongest baselines.
we evaluate holistic manuscript quality using the AI Scientist-v2 and ScholarPeer frameworks.
It significantly outperforms existing AI pipelines, delivering absolute acceptance gains of 13% (CVPR) and 9% (ICLR) over the strongest autonomous baseline.
In Literature Review Quality, our framework dominates the autonomous baselines, achieving absolute win margins of 88%–99%. For Overall Paper Quality, although Human (GT) remains the upper bound, PaperOrchestra substantially surpasses all tested AI competitors. It strongly outperforms AI Scientist-v2 and the Single Agent by margins of 39%–86% and 52%–88%, respectively, across all settings, confirming that our multi-agent architecture significantly enhances overall manuscript quality.
To mitigate LLM positional bias, we evaluate each pair of manuscripts in both orderings. The final aggregated outcome is recorded as a win (two wins, or one win and one tie), a tie (one win and one loss, or two ties), or a loss.
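The aggregation rule above can be written as a small scoring function. This is a sketch under stated assumptions: the function names are illustrative, and both verdicts are assumed to be already expressed from the same candidate's perspective (that is, mapped back from position A/B to the system under test).

```python
from collections import Counter

# Score each per-ordering verdict, then aggregate by the sign of the sum.
_SCORE = {"win": 1, "tie": 0, "loss": -1}


def aggregate_pair(first_order, second_order):
    """Combine the verdicts from the two presentation orderings.

    win  : two wins, or one win and one tie
    tie  : one win and one loss, or two ties
    loss : every remaining combination
    """
    total = _SCORE[first_order] + _SCORE[second_order]
    if total > 0:
        return "win"
    if total < 0:
        return "loss"
    return "tie"


def tally(verdict_pairs):
    """Count aggregated outcomes over many manuscript pairs."""
    counts = Counter(aggregate_pair(a, b) for a, b in verdict_pairs)
    return {outcome: counts.get(outcome, 0) for outcome in ("win", "tie", "loss")}
```

Summing signed scores reproduces each case in the rule: a win and a loss cancel to a tie, while a single tie never overrides a win or a loss in the other ordering.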
(1) SxS Literature Review Quality extracts the Introduction and Related Work to assess problem framing, prior work coverage, organization and synthesis, contribution positioning, and readability. (2) SxS Paper Quality holistically compares the full manuscript (including visual layout) across six axes: scientific depth, technical execution, logical flow, writing clarity, presentation of evidence, and academic style.
Overall Quality. To evaluate the holistic technical quality of the generated papers, we employ two AI-based peer review frameworks simulating expert assessment: (1) the AI Scientist-v2 Reviewer (Yamada et al., 2025), an automated module for structured manuscript evaluation; and (2) ScholarPeer (Goyal et al., 2026), a search-enabled multi-agent system mimicking expert workflows via iterative retrieval and evidence checking. Both systems yield multi-axis scores, an overall rating, and a simulated acceptance decision.
As artificial intelligence (AI) systems rapidly gain autonomy, the need for robust responsible AI frameworks becomes paramount. This paper investigates how organizations perceive and adapt such frameworks amidst the emerging landscape of increasingly sophisticated agentic AI. Employing an interpretive qualitative approach, the study explores the lived experiences of AI professionals. Findings highlight that the inherent complexity of agentic AI systems and their responsible implementation, rooted in the intricate interconnectedness of responsible AI dimensions and the thematic framework (an analytical structure developed from the data), combined with the novelty of agentic AI, contribute to significant challenges in organizational adaptation, characterized by knowledge gaps, a limited emphasis on stakeholder engagement, and a strong focus on control. These factors, by hindering effective adaptation and implementation, ultimately compromise the potential for responsible AI and the realization of ROI.
Highlights
Moreover, as agentic AI evolves, it’s essential to critically examine its emerging characteristics (autonomy, imperfections, creativity) and consider its role as an “actant” (a participant in a network of relationships), with complex interactions with our human and increasingly digital world (Kolt, 2025; Li & Zhu, 2024). Seeing AI as more than just a “tool” is an important adjustment for advancing its potential as part of the digital workforce, and the potential of the entire workforce.
It takes courage to imagine how future AI will ethically challenge our conception of humanity and the world. And it takes courage to admit that established ethical practices, beliefs, and theories are limited, and therefore need not only be questioned, but also developed…
Neglecting responsible AI, particularly in areas like bias and data security, can lead to significant financial risks, legal liabilities, and reputational damage (Bengio et al., 2025; Bevilacqua et al., 2023).
given leadership skill gaps, can organizations effectively navigate this landscape, prioritize investments across initiatives and time horizons, and ultimately realize agentic AI’s ROI? How can leaders prioritize investments when they lack a fundamental understanding of these initiatives?
for many organizations – calculating ROI for agentic AI projects is a challenge
Bias mitigation strategies and stakeholder engagement are crucial for ethical guidelines and control mechanisms.
However, beyond accountability, proactive measures are needed to maintain trust. This includes securing data, ensuring AI access doesn’t reveal or misuse sensitive information, and implementing practices, audits, training, and standard operating procedures.
One respondent shared: “Organizations must navigate regulatory uncertainty, ensure transparency, and develop fail-safes to maintain control over autonomous systems. How do you maintain safe, sustainable, scale?” This tension is echoed in concerns about “control”, “rules”, “guidelines”, “guardrails”, “keeping humans in control”, “kill switches”, “red-teaming”, “fail-safes”, “robust oversight”, “ethical alignment”, and “morality code integration.”
Autonomy, control, and ethical alignment reveal a central tension in the development and deployment of agentic AI: the desire to harness its power while ensuring alignment with human values and oversight.
It is important to distinguish between “complicated” and “complex” systems. A complicated system, like a car engine, may have many parts, but its behavior is predictable and can be understood by analyzing its individual components. A complex system, like a rainforest or a multi-agent AI system, is characterized by interconnectedness, emergence, and unpredictability. In complex systems, the interactions between components are crucial, and the system’s behavior cannot be easily predicted or controlled by examining individual parts. The Cynefin framework (Snowden & Boone, 2007) provides a useful model for understanding these differences and the appropriate approaches for managing them.
Responsibility is paramount in ethical discussions. As Havel (1990) stated, “… the only genuine backbone of all our actions – if they are to be moral – is responsibility. Responsibility is something higher than my family, my country, my firm, my success.”
Furthermore, examining Agentic AI’s emerging characteristics (autonomy, imperfections, motivations, creativity) and considering its role as an “actant” (a participant in a network of relationships) in complex interactions with the human and digital world (Kolt, 2025; Li & Zhu, 2024) will raise questions about workforce composition and the experience of human and digital workers (Biilmann, 2025).
practitioners find ethical ideals abstract, open to interpretation, and difficult to apply to AI, given AI’s agency and the lack of understanding of its inner workings (Buijsman, Klenk, & van den Hoven, 2025).
The adoption of generative AI has outpaced past technology launches like the personal computer and the internet (Bick, Blandin, & Deming, 2024).
Agentic AI, a new class of highly autonomous and adaptable AI agents, leverages large language models (LLMs) and multimodal AI capabilities to exhibit: emergent behavior, generating novel solutions and adapting to unforeseen challenges; multimodal reasoning, enabling them to process information from various sources like text, images, and audio; proactive planning, giving them the ability to autonomously plan and execute complex tasks; and continuous learning, which allows them to adapt based on new information.
This paper explores how organizations are navigating the complexities of agentic AI. Responsible AI frameworks, which guide the ethical development and deployment of AI, include ethical guidelines, transparency measures, accountability mechanisms, bias mitigation strategies, privacy and data protection protocols, safety and security standards, and stakeholder engagement processes. These frameworks are informed by ethical principles (e.g., OECD AI Principles; Organisation for Economic Co-operation and Development (OECD), 2024) and risk management guidelines (e.g., NIST AI Risk Management Framework; National Institute of Standards and Technology, 2023), which will continue to evolve in response to the broader socio-technical environment (Dignum, 2019; Floridi, 2023; MacKenzie & Wajcman, 1999).
This article analyzes how agentic artificial intelligence is revolutionizing human resource management through automated workflows, enhanced decision making, and improved employee experiences while addressing implementation challenges like security risks, regulatory compliance, and workforce adoption.
Highlights
Human oversight remains essential to ensure AI aligns with ethical standards and business objectives.
Agentic AI’s success hinges on high-quality, unbiased data; inaccuracies can lead to biased outcomes and erode trust. Ensuring data integrity is crucial to prevent reinforcing existing biases and to support effective AI adoption in HRM.
Lattice’s 2024 attempt to incorporate virtual workers into org charts sparked immediate backlash, forcing the company to abandon the initiative within days.
Employee resistance to AI adoption persists, particularly in HR and administrative roles, where fears of job displacement remain prevalent. Some employees remain hesitant to adopt AI agents, concerned about potential role displacement.
In HRM, agentic LLMs handling sensitive employee and candidate data present major security risks. Their extensive access to personal information and system controls makes them attractive targets for cyberattacks.
The EU AI Act mandates strict data protection for agentic AI, requiring data minimization and user control over personal data. High-risk applications must undergo bias assessments, rigorous testing, and external review before deployment.
the EU AI Act regulates all AI applications, including agentic AI, by risk level to ensure ethical, transparent, and accountable deployment.
In the construction industry, for instance, agentic AI serves as a valuable tool for risk management by autonomously analyzing site hazards, optimizing workflows, and facilitating human-like decision making. This technology enhances safety protocols and operational efficiency, leading to more secure and productive construction environments.
AI agents use the Internet of Things (IoT) and cameras to automate incident reporting and enforce protocols, ensuring a safer, compliant worksite.
In construction, AI agents analyze video feeds to detect unsafe behaviors (for example, missing personal protective equipment) and historical data to forecast dangers,
AI-powered scheduling tools can boost efficiency, with some leaders predicting they will enable widespread four-day workweeks.
AI agents with generative capabilities enhance HR modules, improving decision making and operational efficiency.
Chipotle cut hiring time by 75% using Paradox’s AI agent, enabling fully automated hiring; some new hires even mistake the bot (“Amelia”) for a human.
AI-powered chatbots have transformed HR by streamlining employee queries and reducing routine tasks. Now, AI agents promise even greater autonomy and operational efficiency.
Agentic AI tools also automate compliance processes by monitoring policy adherence, flagging violations, and providing real-time guidance, dramatically reducing manual oversight while improving accuracy. For instance, ZBrain’s compliance agent autonomously audits financial transactions against corporate policies, detecting noncompliant activity with precision to mitigate risks and operational inefficiencies.
Accenture deploys AI-powered scheduling assistants (Agentforce) to optimize employee productivity through automated work planning, priority identification, and office time utilization.
AI agents enhance employee experiences by personalizing training and career development recommendations, assigning role-specific modules to new hires to equip them with necessary skills. For instance, ZBrain’s Training Module Assignment Agent analyzes job roles and employee data to deliver customized and efficient learning experiences.
AI agents such as Galileo, Microsoft Copilot, Workday Assistant, and Eightfold’s AI agent streamline HR operations by automating tasks, improving employee experiences, and reducing integration efforts for HR teams. Integrating AI agents into HR systems will eliminate the need for employees to navigate multiple platforms, significantly enhancing the HR tech experience.
In general, AI-driven systems enable employees to swiftly access information via simple queries, eliminating the need to navigate multiple platforms or submit service requests.
Traditional HR systems, primarily databases for payroll and compliance, are thus evolving into integrated talent intelligence systems through the incorporation of intelligent agents into core human capital management platforms.
ServiceNow has launched the AI Agent Orchestrator to coordinate specialized AI agents across tasks, systems, and departments, along with thousands of prebuilt agents for HR, plus the AI Agent Studio for creating custom agents. The recruitment sector is at the forefront of leveraging advanced AI agents. Workday offers AI-powered agents to improve HR processes, including recruiting and succession planning, and has introduced the Agent System of Record to manage these agents effectively. Oracle has developed AI agents within its Cloud HCM to streamline HR processes, assisting employees with career development, time-off requests, and workforce analytics, while providing HR teams with centralized data insights.
agentic AI can streamline onboarding by guiding new hires through necessary documentation and training modules, ensuring a smooth integration into the company.
In talent acquisition, AI agents can autonomously screen resumes, evaluate candidate qualifications, and generate shortlists of top applicants, thereby expediting the recruitment process and minimizing manual effort.
Agentic AI solutions benefit not only large enterprises but also small and medium-sized enterprises (SMEs), enhancing operational efficiency and competitiveness across the board.
Agentic AI allows users to define a task, which the AI autonomously breaks into steps, designs workflows, and selects appropriate AI or IT services for execution.
Artificial intelligence (AI), particularly Agentic AI, is increasingly critical for addressing the demand for speed, efficiency, and customer focus in modern organizations. However, the rapid evolution of Agentic AI, including Generative AI (GenAI) agents, has outpaced a cohesive understanding of its applications, challenges, and strategic implications. This narrative review explores the role of Agentic AI in shaping an intelligent future, focusing on its key attributes (autonomy, reactivity, proactivity, and learning ability) and its potential to transform organizational performance. We identify a research gap in synthesizing the diverse capabilities of Agentic AI (e.g., multimodal processing, hierarchical architectures, and machine learning outsourcing) and providing actionable strategies for adoption. The paper examines how Agentic AI enables autonomous decision-making, automates processes, and enhances efficiency through tools like LangChain, CrewAI, AutoGen, and AutoGPT. It highlights the transition from assisted (“Copilot”) to autonomous (“Autopilot”) models and the importance of hierarchical agent structures for system coordination. Key contributions include a framework for organizations to formulate GenAI strategies, addressing business needs, tool selection, human resource training, and risk management. Findings reveal that Agentic AI significantly improves productivity, reduces costs, and drives innovation, though challenges such as privacy, security, and ethical concerns remain. Future research should focus on industry-specific case studies to deepen understanding, explore the ethical and social impacts (e.g., privacy, data security, labor market effects), and investigate the integration of Agentic AI with emerging technologies like quantum computing. This review provides a foundation for researchers and practitioners to leverage Agentic AI effectively while addressing its limitations and opportunities.
Highlights
Timeline of key technological breakthroughs
1. Early Rule-Based Systems (Pre-2000s)
   - AI systems primarily relied on hand-crafted rules and expert systems for problem-solving. These systems lacked adaptability and required extensive manual programming.
2. Integration of Machine Learning (2000s)
   - AI models began leveraging statistical learning techniques to improve decision-making. The introduction of large-scale datasets and early neural networks enhanced pattern recognition and data-driven insights.
   - Key milestone: Rise of Natural Language Processing (NLP) and basic machine learning algorithms.
3. Deep Learning Revolution (2010s)
   - The advent of deep learning significantly advanced AI capabilities, and early results demonstrated the power of convolutional neural networks (CNNs) in image recognition [17].
   - The development of the Transformer architecture revolutionized NLP, enabling context-aware language models and multimodal processing [18].
   - Key milestone: Transformer models paved the way for the rise of powerful AI systems capable of processing multimodal inputs.
4. Generative AI and Multimodality (Late 2010s–2020s)
   - AI models became capable of generating high-quality content across different modalities. The introduction of GPT models demonstrated the power of generative models in text generation.
   - DALL·E expanded AI capabilities to image-text alignment and multimodal reasoning.
   - Key milestone: The fusion of NLP, computer vision, and audio processing in large-scale AI systems.
5. Advanced Autonomy and Real-Time Interactions (2020s onwards)
   - AI agents are now capable of self-supervised learning, autonomous decision-making, and real-time multimodal interactions.
Multi-agent systems: This highlights Agentic AI’s ability to facilitate communication and collaboration among multiple agents, enabling the creation of complex workflows and integration with other systems or tools (e.g., email, code execution, search engines) to perform diverse tasks.
Workflow optimization: This aspect emphasizes how Agentic AI enhances business processes by integrating language understanding, reasoning, planning, and decision-making, leading to improved resource allocation, communication, collaboration, and automation opportunities.
Learning capability: This refers to Agentic AI’s ability to improve its performance over time through machine learning or reinforcement learning, leveraging past experiences to refine decision-making and achieve better results.
Environmental interaction: This capability allows Agentic AI to perceive and adapt to changes in its surroundings, enabling it to function effectively in dynamic and complex real-world scenarios, such as adjusting logistics in response to traffic conditions.
Goal-oriented behavior: This indicates that Agentic AI is designed to pursue specific objectives and optimize its actions to achieve desired outcomes, such as minimizing costs in transportation or maximizing efficiency in energy systems.
Autonomy: This aspect highlights Agentic AI’s ability to operate independently, making decisions, and taking actions without direct human intervention. It reflects the system’s capacity to use planning, learning, and environmental data to perform complex tasks autonomously.
Agentic AI refers to AI systems that exhibit autonomous decision-making, goal-oriented behavior, and continuous learning while interacting with dynamic environments. Unlike traditional AI, which often relies on human intervention or pre-programmed instructions, Agentic AI adapts based on real-time data and evolving objectives. These intelligent agents leverage machine learning, reinforcement learning, and multi-agent coordination to perform tasks efficiently.
A Smart Future refers to an era where AI-driven automation, intelligence augmentation, and autonomous decision-making systems contribute to optimized operations in various sectors, such as healthcare, transportation, finance, and energy.
Agentic AI represents a significant advancement over traditional AI agents by incorporating features such as self-learning, real-time adaptability, and multi-agent collaboration.
Artificial Intelligence (AI) has evolved from being a mere computational tool to a transformative force that is reshaping industries, economies, and societies. AI is no longer limited to executing predefined tasks; rather, it now exhibits autonomous decision-making capabilities, adaptability, and goal-directed behavior. The integration of AI into various domains has led to increased efficiency, speed, and automation, driving the rapid adoption of AI-powered solutions by organizations seeking competitive advantages.