Tuesday, April 29, 2025

Art - Seal of Solomon


 

Project 2025

Original: https://static.project2025.org/2025_MandateForLeadership_FULL.pdf


1. Centralization of Power and Disregard for Bureaucracy

  • Detail: The document advocates for a strong, unified executive branch to aggressively pursue its agenda. It proposes dismantling the "Administrative State" by curbing the power of federal agencies and increasing presidential control.

  • Sociological Analysis: This reflects an authoritarian tendency to concentrate power in the executive, diminishing the role of checks and balances provided by the bureaucracy. Sociologically, this can lead to:

    • Erosion of Institutional Trust: Undermining the bureaucracy can erode public trust in established institutions, as they are portrayed as obstacles to the popular will.
    • Increased Political Instability: Centralizing power in the executive can destabilize the political system, making it more susceptible to the whims of a single leader.
    • Weakening of Civil Service: Attacking the civil service can demoralize public employees and politicize government functions, reducing efficiency and expertise.

2. Imposition of Social and Cultural Values

  • Detail: The document promotes the use of federal power to enforce specific social and cultural values, particularly concerning issues like abortion and gender identity.

  • Sociological Analysis: This demonstrates an authoritarian approach to social issues, seeking to impose a single set of values on a diverse society. Sociologically, this can result in:

    • Cultural Conflict: Efforts to enforce particular values can lead to intense cultural conflict and polarization, dividing society along ideological lines.
    • Marginalization of Minority Groups: The imposition of majority values can marginalize minority groups and suppress their rights and identities.
    • Erosion of Social Tolerance: Authoritarian approaches to culture can erode social tolerance and pluralism, creating a less open and inclusive society.

3. Economic Nationalism and Protectionism

  • Detail: The document advocates for reducing reliance on foreign supply chains and taking a confrontational stance towards China's economic influence.

  • Sociological Analysis: This reflects economic nationalism, which can align with fascist ideologies that prioritize national interests above global cooperation. Sociologically, this can lead to:

    • Increased Social Cohesion (in the short term): Nationalism can create a sense of unity and shared purpose, particularly in response to perceived external threats.
    • International Conflict: Protectionist trade policies and aggressive economic nationalism can strain international relations and increase the risk of conflict.
    • Xenophobia and Social Exclusion: Nationalist sentiment can foster xenophobia and social exclusion, targeting immigrants and foreign populations.

4. The Language of Crisis and Urgency

  • Detail: The document frequently uses language that emphasizes a sense of crisis and the need for immediate, drastic action to "save" the country.

  • Sociological Analysis: This is a common tactic in authoritarian movements, creating a sense of urgency to justify the rapid implementation of policies without extensive deliberation or democratic process. Sociologically, this can result in:

    • Suppression of Dissent: In a crisis atmosphere, dissent is often portrayed as unpatriotic or even treasonous, limiting free debate and opposition.
    • Rapid Social Change: A sense of urgency can be used to push through radical changes to social and political structures, often with unintended consequences.
    • Increased Anxiety and Insecurity: The constant emphasis on crisis can increase social anxiety and insecurity, making people more willing to accept authoritarian measures.

It is important to note that while these elements have sociological parallels with authoritarian and fascist tendencies, whether the document is authoritarian or fascist is a matter of interpretation and debate.



The document includes a section called "Authors" that provides a short biography of each author. Generally, the authors appear to have been referenced because of their subject matter expertise and/or experience in the area they wrote about. For example:

  • Daren Bakst is Deputy Director, Center for Energy and Environment, and Senior Fellow at the Competitive Enterprise Institute, and wrote the section on the Department of Agriculture.

  • Jonathan Berry is managing partner at Boyden Gray & Associates PLLC, and served as acting Assistant Secretary for Policy at the U.S. Department of Labor, and wrote the section on the Department of Labor.

  • Lindsey M. Burke is Director of the Center for Education Policy at The Heritage Foundation, and wrote the section on the Department of Education.

  • David R. Burton is Senior Fellow in Economic Policy in the Thomas A. Roe Institute for Economic Policy Studies at The Heritage Foundation, and wrote sections on the Department of Commerce and Financial Regulatory Agencies.

  • Adam Candeub is a professor of law at Michigan State University, and wrote the section on the Federal Trade Commission.

  • Dustin J. Carmack is Research Fellow for Cybersecurity, Intelligence, and Emerging Technologies in the Border Security and Immigration Center at The Heritage Foundation, and wrote the section on the Intelligence Community.

  • Brendan Carr has nearly 20 years of private-sector and public-sector experience in communications and tech policy and currently serves as the senior Republican on the Federal Communications Commission, and wrote the section on the Federal Communications Commission.

  • Benjamin S. Carson, Sr., MD, is Founder and Chairman of the American Cornerstone Institute and previously served as the 17th Secretary of the U.S. Department of Housing and Urban Development, and wrote the section on the Department of Housing and Urban Development.

  • Ken Cuccinelli served as Acting Director of U.S. Citizenship and Immigration Services in 2019 and then, from November 2019 through the end of the Trump Administration, as Acting Deputy Secretary for the U.S. Department of Homeland Security, and wrote the section on the Department of Homeland Security.

  • Rick Dearborn served as Deputy Chief of Staff for President Donald Trump and was responsible for the day-to-day operations of five separate departments of the Executive Office of the President, and wrote the section on the White House Office.

  • Veronique de Rugy is the George Gibbs Chair in Political Economy and Senior Research Fellow at the Mercatus Center at George Mason University, and wrote a section on why the Export-Import Bank should be abolished.

  • Donald Devine is Senior Scholar at The Fund for American Studies in Washington, DC, and was President Ronald Reagan’s first-term Office of Personnel Management Director, and wrote a section on Central Personnel Agencies.

  • Diana Furchtgott-Roth, an Oxford-educated economist, directs the Center for Energy, Climate, and Environment at The Heritage Foundation and is adjunct professor of economics at George Washington University, and wrote the section on the Department of Transportation.

  • Thomas F. Gilman served as Assistant Secretary of Commerce for Administration and Chief Financial Officer of the U.S. Department of Commerce in the Trump Administration, and wrote a section on the Department of Commerce.

  • Mandy M. Gunasekara is a principal at Section VII Strategies, a Senior Policy Analyst at the Independent Women’s Forum, and Visiting Fellow in the Center for Energy, Climate, and Environment at The Heritage Foundation, and wrote the section on the Environmental Protection Agency.

  • Gene Hamilton is Vice-President and General Counsel of America First Legal Foundation, and wrote the section on the Department of Justice.

  • Jennifer Hazelton has worked as a senior strategic consultant for the Department of Defense in Industrial Base Policy and has held senior positions at USAID, the Export–Import Bank of the United States, and the State Department, and wrote the section on the case for the Export-Import Bank.

  • Karen Kerrigan is President and CEO of the Small Business & Entrepreneurship Council, and wrote the section on the Small Business Administration.

  • Dennis Dean Kirk is Associate Director for Personnel Policy with the 2025 Presidential Transition Project at The Heritage Foundation, and wrote a section on Central Personnel Agencies.

  • Kent Lassman is President and CEO of the Competitive Enterprise Institute, and wrote the section on the case for Free Trade.

  • Bernard L. McNamee is an energy and regulatory attorney with a major law firm and was formerly a member of the Federal Energy Regulatory Commission, and wrote the section on the Department of Energy and Related Commissions.

  • Christopher Miller served in several positions during the Trump Administration, including as Acting U.S. Secretary of Defense, and wrote the section on the Department of Defense.

  • Stephen Moore is a conservative economist and author, and wrote a section on the Department of the Treasury.

  • Mora Namdar is an attorney and Senior Fellow at the American Foreign Policy Council, and wrote the section on the U.S. Agency for Global Media and the Corporation for Public Broadcasting.

  • Peter Navarro holds a PhD in economics from Harvard and was one of only three senior White House officials to serve with Donald Trump from the 2016 campaign to the end of the President’s first term, and wrote the section on the case for Fair Trade.

  • William Perry Pendley was an attorney on Capitol Hill, a senior official for President Ronald Reagan, and leader of the Bureau of Land Management for President Donald Trump, and wrote the section on the Department of the Interior.

  • Max Primorac is Director of the Douglas and Sarah Allison Center for Foreign Policy Studies at The Heritage Foundation, and wrote the section on the Agency for International Development.

  • Roger Severino is Vice President of Domestic Policy at The Heritage Foundation, and wrote the section on the Department of Health and Human Services.

  • Kiron K. Skinner is President and CEO of the Foundation for America and the World, Taube Professor of International Relations and Politics at Pepperdine University, and wrote the section on the Department of State.

Citing these authors and their credentials appears to lend credibility to the ideas presented in the book.




Kwok Pui-lan's work often highlights the importance of acknowledging the full humanity of biblical figures





This passage, focusing on the humanity and persistent prayer of Elijah, resonates deeply with the themes Kwok Pui-lan brings to feminist and postcolonial theologies. Reading it through her lens invites us to consider the text beyond a simplistic individualistic piety and to explore its broader implications for marginalized communities and our understanding of divine engagement.

Firstly, Kwok Pui-lan's work often highlights the importance of acknowledging the full humanity of biblical figures, especially those who might be idealized or presented in ways that obscure their struggles. The passage begins by emphasizing that "Elijah was human, subject to like passions as we are." This resonates with Kwok's critique of theological frameworks that create a distant, unattainable God and saints who are unrelatable. Instead, we see Elijah in his vulnerability: murmuring, complaining, even experiencing unbelief. This mirrors the lived experiences of many, particularly those facing systemic oppression, who may grapple with doubt and despair in the face of suffering. Kwok's emphasis on the embodied and contextual nature of faith allows us to see Elijah's humanity not as a flaw, but as a point of connection and solidarity.

Secondly, Kwok Pui-lan's engagement with postcolonial perspectives encourages us to consider the power dynamics inherent in interpreting scripture. Often, readings focus solely on the individual's relationship with God, potentially overlooking the socio-political contexts in which these figures lived and prayed. While the passage emphasizes Elijah's personal prayer, Kwok's lens might prompt us to ask: What were the pressures Elijah faced? Were they solely internal, or were they connected to larger systems of injustice and oppression in his time? Recognizing these external pressures allows us to see Elijah's "murmuring and complaining" not just as personal failings, but as potential responses to systemic issues, mirroring the laments of marginalized communities throughout history.

Thirdly, Kwok Pui-lan's feminist theological insights invite us to consider the nature of prayer itself. The passage highlights the original Greek phrase "proseuchē prosēuxato," emphasizing a persistent, ongoing dialogue with God. This moves beyond a transactional view of prayer as simply asking for favors. Instead, it suggests a sustained relationship, a "meaningful, uninterrupted dialogue." This resonates with feminist theologies that often emphasize relationality and a God who is in deep connection with humanity. For those who have been historically silenced or whose voices have been marginalized, this image of persistent dialogue can be particularly empowering. It suggests that even in moments of doubt and complaint, the connection with the divine can endure.

Finally, the concluding lesson, "As we may gather, we must keep at it," takes on a richer meaning through Kwok Pui-lan's lens. It's not just about individual perseverance in prayer, but perhaps also about the collective persistence of marginalized communities in their struggles for justice and liberation. Just as Elijah kept at his dialogue with God through his human struggles, so too must communities facing oppression maintain their voices, their resistance, and their hope. This "keeping at it" becomes a form of spiritual resilience, a refusal to be silenced or defeated in the face of adversity.

In essence, reading this passage through Kwok Pui-lan's style encourages us to:

  • Embrace the full humanity of biblical figures and ourselves, including our doubts and struggles.
  • Consider the socio-political contexts that shape both individual and communal experiences of faith and prayer.
  • Value prayer as a sustained, relational dialogue with the divine, particularly empowering for those whose voices have been marginalized.
  • See persistence not just as individual piety, but also as a form of communal resilience in the face of injustice.


Sunday, April 27, 2025

Agentic AI Design


 

The Core Components of an AI Agent


 The diagram depicts a simplified model of how an AI agent interacts with its environment. It highlights the flow of information and actions between the user, the environment, and the AI agent itself.

Here's a breakdown of each component:

  • User Input: This represents the initial stimulus or instructions provided to the environment, often through a human-computer interface like a keyboard, mouse, voice command, or a touch screen. In a technical sense, this input is translated into a digital signal that the environment can process.

  • Environment: This is the external world with which the AI agent interacts. The diagram distinguishes between:

    • Digital Infrastructure: This encompasses the virtual or computational aspects of the environment, such as software systems, databases, networks, and the internet. User input might directly manipulate this digital space.
    • Physical Infrastructure: This refers to the tangible aspects of the environment, including robots, machines, physical spaces, and sensors embedded within them. User input might indirectly affect this through digital commands. The environment generates percepts based on its current state, influenced by user input and the AI agent's actions.
  • AI Agent: This is the intelligent entity that perceives its environment and acts upon it to achieve its goals. It comprises three key components:

    • Sensors: These are the agent's perception mechanisms. Technically, sensors are devices or software modules that convert raw data from the environment into a format that the AI agent can understand. Examples include cameras (visual data), microphones (audio data), tactile sensors (pressure), GPS (location), or software interfaces that provide data from digital systems. The sensors' outputs are the percepts, which form the agent's instantaneous view of the environment.
    • Control Centre: This is the "brain" of the AI agent, where the processing and decision-making occur. Technically, this involves algorithms, models (like machine learning models), and logic that analyze the percepts, reason about the current situation, and decide on the next action. This component handles tasks like data processing, pattern recognition, knowledge representation, planning, and learning.
    • Effectors: These are the means by which the AI agent acts upon the environment. Technically, effectors are devices or software modules that translate the agent's decisions (the "action") into physical movements or digital operations. Examples include motors in a robot arm, actuators that control a valve, software commands that modify a database, or signals sent to a display screen.
  • Flow of Information and Action: The arrows illustrate the interaction loop (a minimal code sketch follows this list):

    1. User Input influences the Environment.
    2. The Environment generates Percepts based on its current state.
    3. The AI Agent's Sensors receive these Percepts.
    4. The Control Centre processes the Percepts and decides on an Action.
    5. The AI Agent's Effectors execute the Action on the Environment, potentially changing its state and leading to new percepts in the next cycle.
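
To make the loop concrete, here is a minimal sketch in Python of the perceive-decide-act cycle described above. Every name in it (Environment, Sensor, ControlCentre, Effector, and the temperature rule) is illustrative rather than taken from any particular agent framework.

# Minimal perception-decision-action loop for a software agent.
# All names are illustrative placeholders, not a real framework's API.

class Environment:
    def __init__(self):
        self.state = {"temperature": 18.0}

    def percept(self):
        # The environment exposes its current state as a percept.
        return dict(self.state)

    def apply(self, action):
        # Effectors change the environment's state.
        if action == "heat":
            self.state["temperature"] += 1.0
        elif action == "cool":
            self.state["temperature"] -= 1.0

class Sensor:
    def read(self, environment):
        # Convert raw environment data into a percept the agent can use.
        return environment.percept()

class ControlCentre:
    def decide(self, percept):
        # Simple rule-based logic standing in for models, planning, learning.
        temp = percept["temperature"]
        if temp < 20.0:
            return "heat"
        if temp > 24.0:
            return "cool"
        return "idle"

class Effector:
    def act(self, environment, action):
        # Translate the decision into an operation on the environment.
        if action != "idle":
            environment.apply(action)

env = Environment()
sensor, brain, effector = Sensor(), ControlCentre(), Effector()
for step in range(3):
    percept = sensor.read(env)      # steps 2-3: environment -> percepts -> sensors
    action = brain.decide(percept)  # step 4: control centre chooses an action
    effector.act(env, action)       # step 5: effectors act on the environment
    print(step, percept, action)

In a real agent the rule-based decide() would be replaced by learned models or planning, but the information flow between sensors, control centre, and effectors stays the same.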

Are LLMs Actually Reliable for Cyber Threat Intelligence?

 

Introduction: The Promise and Perils of LLMs in Cyber Threat Intelligence

The digital landscape is characterized by an ever-increasing volume and sophistication of cyber threats, necessitating robust and efficient methods for Cyber Threat Intelligence (CTI) analysis. Understanding the tactics, techniques, and procedures (TTPs) of threat actors, along with identifying vulnerabilities and potential attack vectors, is crucial for proactive defense and effective incident response. Large Language Models (LLMs), with their remarkable capabilities in natural language processing and understanding, have emerged as a promising technology in various domains, including cybersecurity. The potential of LLMs to automate the processing and analysis of vast amounts of unstructured threat data has generated considerable enthusiasm within the cybersecurity community.1 By enhancing the ability to extract relevant information, identify patterns, and potentially predict future threats, LLMs offer a compelling solution to the challenge of managing the growing deluge of cyber threat information.

However, alongside this optimism, significant concerns persist regarding the actual reliability and accuracy of LLMs for critical CTI operations.1 The unique requirements of cyber threat intelligence, which demands precision, consistency, and a high degree of trustworthiness, raise questions about the suitability of current LLM technology. Recent research has begun to rigorously evaluate the performance of LLMs on real-world CTI tasks, revealing potential limitations and risks associated with their deployment in this sensitive domain. This report aims to provide a comprehensive analysis of the reliability of LLMs for CTI, drawing upon recent studies and expert opinions to offer an evidence-based perspective on their current capabilities and limitations. By examining various facets of LLM performance, including accuracy on different types of CTI reports, consistency of responses, the phenomenon of overconfidence in incorrect answers, and the effectiveness of reliability-enhancing techniques, this report seeks to inform cybersecurity professionals, researchers, and decision-makers about the practical implications of integrating LLMs into their CTI workflows.

Evaluating the Performance of LLMs on Real-World CTI Reports

Recent research has focused on developing methodologies to evaluate the effectiveness of LLMs in handling real-world Cyber Threat Intelligence reports. One such evaluation methodology, presented in several works, allows for testing LLMs on CTI tasks using zero-shot learning, few-shot learning, and fine-tuning approaches.1 This methodology also enables the quantification of the models' consistency and their confidence levels in the generated outputs. Experiments conducted using three state-of-the-art LLMs on a dataset comprising 350 threat intelligence reports have revealed potential security risks associated with relying on LLMs for CTI.1 The findings indicate that these models often struggle to guarantee sufficient performance when processing real-size reports, exhibiting both inconsistency and overconfidence in their analyses.

To address the limitations of general-purpose benchmarks in evaluating LLMs for the specific demands of CTI, researchers have developed specialized benchmarks such as CTIBench.3 CTIBench is designed as a novel suite of benchmark tasks and datasets aimed at assessing LLMs' performance in practical CTI applications. This benchmark includes multiple datasets that focus on evaluating the knowledge acquired by LLMs within the cyber-threat landscape. CTIBench incorporates a variety of tasks designed to test different cognitive abilities crucial for CTI analysis, including CTI Multiple Choice Questions (CTI-MCQ), CTI Root Cause Mapping (CTI-RCM), CTI Vulnerability Severity Prediction (CTI-VSP), and CTI Threat Actor Attribution (CTI-TAA).3 These tasks collectively assess an LLM's understanding of CTI standards, reasoning about vulnerabilities, and ability to attribute threats to specific actors.

Evaluations of several prominent LLMs, including ChatGPT 3.5, ChatGPT 4, Gemini 1.5, Llama 3-70B, and Llama 3-8B, on the CTIBench tasks have provided valuable insights into their strengths and weaknesses in CTI contexts.5 For instance, ChatGPT 4 has demonstrated strong performance across most CTIBench tasks, indicating a good understanding of CTI knowledge and reasoning capabilities.6 However, the performance of different models varies across the specific tasks within the benchmark. For example, while ChatGPT 4 excels in CTI-MCQ, CTI-RCM, and CTI-TAA, Gemini 1.5 has shown superior results in CTI-VSP, which involves predicting vulnerability severity.6 Notably, the open-source model Llama 3-70B has also exhibited competitive performance, comparable to or even outperforming Gemini 1.5 on certain tasks like CTI-MCQ and CTI-TAA, although it faces challenges in vulnerability severity prediction.6 The development and application of CTI-specific benchmarks like CTIBench are crucial for obtaining a more accurate and nuanced understanding of LLM capabilities in this specialized domain, moving beyond the broader evaluations offered by general language understanding benchmarks.3

While CTIBench focuses specifically on CTI, other benchmarks like SECURE, NetEval, and DebugBench offer insights into LLM performance in different cybersecurity domains.7 SECURE is tailored for evaluating LLMs in the context of Industrial Control Systems (ICS) cybersecurity, while NetEval assesses capabilities in network operations, and DebugBench focuses on debugging capabilities.5 The existence of these diverse benchmarks underscores the broad interest in leveraging LLMs across various cybersecurity tasks, yet it also highlights that the unique challenges of CTI necessitate dedicated evaluation tools like CTIBench to accurately gauge the reliability of LLMs in this field.

The Impact of CTI Report Length and Complexity on LLM Accuracy

The length and complexity of Cyber Threat Intelligence reports appear to significantly influence the accuracy of Large Language Models in extracting and processing information. Research suggests that LLM performance on CTI tasks can be affected by the amount of text they need to analyze. For instance, one study indicated that when LLMs were tasked with extracting information from threat reports, their performance worsened when processing complete, longer reports compared to individual paragraphs from the same reports.1 This degradation in performance was characterized by an increase in both false positives and false negatives, suggesting that the models struggled to maintain accuracy and relevance as the input size grew.

The concept of context length, which refers to the maximum number of tokens an LLM can process in a single input, plays a crucial role in this phenomenon.8 A longer context length generally allows an LLM to understand more detailed commands and maintain context over longer interactions, potentially leading to higher quality outputs for complex tasks.8 However, processing longer contexts also demands greater computational resources and can sometimes lead to confusion or loss of focus on the most critical information within the report.1

In contrast, studies focusing on shorter text samples have reported higher accuracy rates. For example, research evaluating an LLM system on summaries and coding of variables from cybercrime forum conversations, which tend to be shorter and more focused, achieved an average accuracy of 98%.9 This stark difference in performance between shorter forum posts and full-length threat reports implies that the ability of LLMs to handle extensive context and maintain accuracy is a significant challenge in the domain of CTI. The increased length and complexity of real-world CTI reports might overwhelm the models' processing capabilities or introduce ambiguities that negatively impact their reliability in extracting key intelligence. Therefore, understanding the interplay between report length, complexity, and LLM accuracy is essential for determining the appropriate use cases and limitations of these models in CTI workflows.
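
One practical implication of these findings is to feed LLMs paragraph-sized chunks rather than whole reports. The sketch below shows one way that chunking could look; it is plain Python with a hypothetical analyze_chunk placeholder standing in for whatever model call is actually used.

# Split a long CTI report into paragraph-sized chunks so each model call
# stays well inside the context window. analyze_chunk is a hypothetical
# stand-in for the actual LLM call.

def split_report(text, max_chars=2000):
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def analyze_chunk(chunk):
    # Placeholder: send the chunk to an LLM and return extracted items.
    return {"entities": [], "vulnerabilities": []}

def analyze_report(text):
    merged = {"entities": [], "vulnerabilities": []}
    for chunk in split_report(text):
        result = analyze_chunk(chunk)
        merged["entities"].extend(result["entities"])
        merged["vulnerabilities"].extend(result["vulnerabilities"])
    # Deduplication and cross-chunk reconciliation would be needed in practice.
    return merged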

Identifying the Limitations: Instances of Missed Campaign Information and Inconsistency

One of the critical aspects in evaluating the reliability of LLMs for Cyber Threat Intelligence is their ability to consistently and comprehensively extract essential information from CTI reports. Recent research has revealed significant limitations in this regard, particularly concerning the frequency with which LLMs miss crucial campaign information and overlook vulnerabilities. Studies have indicated that even state-of-the-art LLMs can fail to extract critical information with sufficient reliability, overlooking a notable percentage of key campaign entities and vulnerabilities present in real-world CTI reports.10 For instance, findings suggest that LLMs might miss up to 20% of campaign entities and 10% of vulnerabilities, which are fundamental elements for understanding the scope and impact of cyber threats.10 Further supporting this, research on entity extraction from threat reports showed recall rates as low as 0.72 for campaign entities when using certain LLM models, implying that as much as 28% of this vital information could be overlooked.1 The failure to accurately identify and extract such critical data can have serious implications for threat detection, incident response, and overall cybersecurity situational awareness.

Another significant concern regarding the reliability of LLMs in CTI analysis is the consistency of their responses when presented with the same information multiple times. Evaluations have demonstrated that LLMs can exhibit inconsistency in their responses to identical queries about a CTI report.1 This lack of consistent output, even when the input remains the same, introduces uncertainty into the analysis process, particularly for critical security decisions such as prioritizing patching or attributing attacks.10 The probabilistic nature of LLMs, where responses are generated based on probability distributions over tokens, contributes to this variability.11 Factors such as temperature settings and the stochastic process of token sampling can lead to different outputs across multiple runs, even for the same prompt. A study analyzing LLM response variability in the context of ranking intrusion detection systems further illustrates this issue, finding significant divergence in the recommendations provided by different LLMs for the same query.15 This inherent inconsistency in LLM responses raises concerns about their suitability for tasks requiring repeatable and dependable analytical outcomes, which are paramount in the field of cyber threat intelligence.
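
A simple way to surface this variability is to send the same prompt repeatedly and measure how often the answers agree. In the sketch below, query_llm is a hypothetical placeholder for any chat-completion call run with non-zero temperature; the function returns the majority answer and the fraction of runs that agree with it.

from collections import Counter

def query_llm(prompt):
    # Hypothetical stand-in for a chat-completion call with temperature > 0.
    raise NotImplementedError

def consistency(prompt, runs=10):
    # Fraction of runs that agree with the most common answer.
    answers = [query_llm(prompt).strip().lower() for _ in range(runs)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / runs

# Example (once query_llm is wired to a real model):
# answer, agreement = consistency("Which threat actor does this report describe? ...")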

The Challenge of Overconfidence: LLMs Expressing High Confidence in Incorrect Answers

A particularly concerning aspect of relying on Large Language Models for Cyber Threat Intelligence is the phenomenon of these models expressing high confidence in answers that are, in fact, incorrect.1 This issue of overconfidence, often occurring despite poor calibration of the model's certainty with the actual correctness of its output,10 poses a significant risk to cybersecurity professionals who might be misled into trusting inaccurate information. The tendency of LLMs to generate fluent and plausible-sounding text, even when the content is factually wrong, can create a false sense of security and potentially lead to flawed decision-making in critical CTI scenarios.

Research from cognitive and computer scientists has corroborated this tendency, revealing that people generally overestimate the accuracy of LLM outputs.17 Studies have identified a "calibration gap," representing the difference between what LLMs know and what users perceive they know, as well as a "discrimination gap," which measures how well humans and models can distinguish between correct and incorrect answers.17 These findings highlight a fundamental challenge in the human-AI interaction when it comes to interpreting the reliability of LLM-generated information. LLMs often do not inherently communicate their level of uncertainty in their responses, leading users to potentially trust confidently stated but ultimately erroneous information.17 To address this issue, researchers have explored methods for calibrating LLMs, such as the "Thermometer" technique, which aims to align a model's confidence level with its actual accuracy, thereby providing users with a clearer signal of when a model's response should be trusted.18 Overcoming the problem of LLM overconfidence in incorrect answers is crucial for their safe and effective integration into cyber threat intelligence workflows, where accuracy and reliability are paramount.
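
One standard way to quantify such a calibration gap is expected calibration error (ECE): group answers by the model's stated confidence and compare average confidence with empirical accuracy in each group. A minimal sketch, assuming you have already collected one confidence score and one correct/incorrect label per answer:

# Expected Calibration Error: how far stated confidence drifts from accuracy.
# Inputs are assumed to be gathered beforehand: a confidence in [0, 1] and a
# correct/incorrect flag for each model answer.

def expected_calibration_error(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece, total = 0.0, len(confidences)
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# An overconfident model: high stated confidence, mixed correctness.
print(expected_calibration_error([0.9, 0.95, 0.9, 0.85], [True, False, False, True]))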

Enhancing Reliability: The Role of Few-Shot Learning and Fine-Tuning

In an effort to improve the reliability of Large Language Models for Cyber Threat Intelligence tasks, researchers have explored techniques such as few-shot learning and fine-tuning. Few-shot learning involves providing the LLM with a limited number of examples within the prompt to guide its performance on a specific task, particularly useful when extensive labeled data is unavailable.19 Fine-tuning, on the other hand, involves further training a pre-trained LLM on a smaller, task-specific dataset to adapt its parameters for improved performance in that domain.21
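
As an illustration of the few-shot pattern in a CTI setting, the sketch below assembles a prompt with two worked extraction examples ahead of a new report excerpt. The examples, field names, and prompt wording are invented for illustration, and the model call itself is left out.

# Build a few-shot prompt for extracting threat actors and CVEs from a report
# excerpt. The examples are illustrative; the model call is not shown.

FEW_SHOT_EXAMPLES = [
    {
        "report": "APT28 exploited CVE-2023-23397 to access mail servers.",
        "output": '{"actors": ["APT28"], "cves": ["CVE-2023-23397"]}',
    },
    {
        "report": "The campaign delivered Qakbot via phishing; no CVE was used.",
        "output": '{"actors": [], "cves": []}',
    },
]

def build_prompt(new_report):
    parts = ["Extract threat actors and CVEs as JSON.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Report: {ex['report']}\nAnswer: {ex['output']}\n")
    parts.append(f"Report: {new_report}\nAnswer:")
    return "\n".join(parts)

print(build_prompt("Lazarus Group used CVE-2021-44228 against logging servers."))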

However, studies evaluating the impact of these techniques on LLM performance in CTI have yielded mixed results. Research has suggested that few-shot learning and fine-tuning may only partially improve the reliability of LLMs for CTI tasks like entity extraction, and in some instances, these techniques have even been observed to worsen performance.1 The limited effectiveness could be attributed to several factors, including the inherent complexity of CTI tasks, the scarcity of high-quality labeled datasets in the cybersecurity domain for effective fine-tuning,1 and the potential for overfitting the model to the limited examples provided in few-shot learning scenarios.16

Despite these challenges, some research has shown promise in leveraging these techniques for specific CTI sub-tasks. For example, one study proposed a method that combines data augmentation using ChatGPT with instruction supervised fine-tuning of open large language models for Tactics, Techniques, and Procedures (TTPs) classification in few-shot learning scenarios, reporting encouraging results.23 Additionally, ongoing projects aim to fine-tune LLMs using CTI-specific data and evaluate their performance on benchmarks like CTIBench, with the goal of enhancing their capabilities in areas such as identifying threat actors and mapping attack techniques.21 While current evidence suggests that few-shot learning and fine-tuning alone may not be sufficient to guarantee high reliability for all CTI tasks, targeted and innovative approaches in applying these techniques continue to be explored as potential avenues for improvement.

Expert Perspectives on the Risks and Benefits of LLMs in CTI Workflows

Experts in the field of cybersecurity and artificial intelligence hold diverse perspectives on the potential risks and benefits of integrating Large Language Models into Cyber Threat Intelligence workflows. On the one hand, LLMs are recognized for their significant potential to enhance various aspects of CTI analysis.24 Their ability to process and understand vast quantities of human language allows for improved threat detection and prediction by analyzing historical attack data to identify patterns and trends.24 LLMs can also accelerate incident response by rapidly analyzing incident data to determine the scope and root cause of an attack and recommend remediation strategies.24 Furthermore, they can streamline threat intelligence analysis by automating time-consuming tasks like data collection, aggregation, and correlation, freeing up human analysts to focus on higher-order tasks.24 The potential for cost reduction through automation and enhanced collaboration among security analysts are also noted as significant benefits.24

However, experts also emphasize the considerable risks associated with relying on LLMs for critical CTI tasks.1 A primary concern is the potential for LLMs to generate inaccurate information or "hallucinations," which could lead to flawed security strategies and increased risk.6 The inherent biases present in the training data of LLMs can also impact their ability to detect certain types of threats or lead to discriminatory outcomes.26 Security vulnerabilities, such as prompt injection attacks, data leakage, and model theft, pose additional risks that need careful consideration.27 Experts caution against overreliance on LLM outputs without proper validation, as this could lead to security breaches and misinformation.27 Moreover, ethical concerns surrounding data privacy, potential misuse of the technology, and the need for transparency and accountability are also highlighted.27 The OWASP Top 10 security risks for LLM applications further underscore the importance of understanding and mitigating these potential vulnerabilities.27 It is also important to note that threat actors are also exploring the use of LLMs to enhance their offensive capabilities, creating a dual-use scenario that necessitates a proactive and cautious approach to the adoption of this technology in cybersecurity.11 Overall, expert opinions suggest that while LLMs offer promising avenues for enhancing CTI workflows, a thorough understanding of their limitations and potential risks is crucial for their safe and effective implementation.

Hybrid Approaches and Alternative Methods for Reliable CTI Analysis

Given the current limitations of Large Language Models when applied to Cyber Threat Intelligence, alternative and hybrid approaches that combine the strengths of LLMs with human expertise and other AI/ML techniques are being actively explored. One prominent strategy is the "human-in-the-loop" approach, where LLMs are used to assist with initial analysis and information extraction, but human analysts retain the crucial role of reviewing, validating, and interpreting the findings.32 This collaborative model allows for leveraging the efficiency of LLMs in processing large volumes of data while ensuring the accuracy and context-awareness that human expertise provides. LLMs can act as "copilots" for human analysts, augmenting their capabilities and freeing them from more routine tasks to focus on complex investigations and strategic decision-making.34

Another promising avenue is the integration of LLMs with knowledge graphs.36 Knowledge graphs can provide a structured representation of cyber threat intelligence, enhancing the context and accuracy of LLM analysis by grounding it in a network of entities and relationships. Techniques like Retrieval Augmented Generation (RAG) are also gaining traction.3 RAG involves providing the LLM with relevant context retrieved from external knowledge bases, which can improve the quality and reliability of the generated responses by reducing hallucinations and increasing factual accuracy.
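
A minimal sketch of the RAG pattern over a small CTI knowledge base follows; embed and ask_llm are hypothetical placeholders for a real embedding model and chat-completion call, and retrieval here is a brute-force cosine similarity rather than a vector database.

# Retrieval Augmented Generation over a small in-memory CTI knowledge base.
# embed and ask_llm are hypothetical placeholders for real embedding/LLM calls.

import math

def embed(text):
    raise NotImplementedError  # e.g. a sentence-embedding model

def ask_llm(prompt):
    raise NotImplementedError  # e.g. a chat-completion call

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rag_answer(question, documents, top_k=3):
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer using only the context below. Say 'unknown' if it is not covered.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

Grounding the answer in retrieved documents, and instructing the model to say "unknown" when the context does not cover the question, is what reduces hallucination relative to asking the model directly.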

Beyond LLMs, traditional machine learning (ML) techniques and Natural Language Processing (NLP) methods continue to play a vital role in CTI analysis.42 ML algorithms are effective for pattern recognition and anomaly detection in network traffic and security events, while NLP techniques can be used for extracting key insights from threat intelligence reports and other textual sources. Hybrid methods that combine LLMs with these more established AI/ML techniques can leverage the unique strengths of each approach for a more robust and comprehensive CTI analysis pipeline. For example, LLMs could be used for initial processing and summarization of unstructured data, followed by ML models for pattern analysis and anomaly detection, with human analysts providing oversight and validation throughout the process. These hybrid strategies and alternative AI methods offer pathways to enhance the reliability of cyber threat intelligence analysis by mitigating some of the inherent limitations of relying solely on LLMs.
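
The pipeline suggested at the end of the paragraph might be wired together roughly as in the sketch below: an LLM summarization step, a simple statistical anomaly score standing in for a trained ML model, and an explicit human-review gate before anything is acted on. All function names are hypothetical.

# Hybrid CTI pipeline sketch: LLM summarization, classical anomaly scoring,
# and a mandatory human-review gate. All names are hypothetical.

def llm_summarize(raw_report):
    raise NotImplementedError  # LLM call: condense unstructured report text

def anomaly_score(indicator_rates):
    # Stand-in for a trained ML model: flag indicators far from a baseline rate.
    baseline = 0.1
    return {name: abs(rate - baseline) for name, rate in indicator_rates.items()}

def human_review(summary, scores):
    # Analysts validate before results reach detection or response systems.
    print("REVIEW REQUIRED:\n", summary, "\n", scores)
    return input("Approve? (y/n) ").strip().lower() == "y"

def process_report(raw_report, indicator_rates):
    summary = llm_summarize(raw_report)
    scores = anomaly_score(indicator_rates)
    if human_review(summary, scores):
        return {"summary": summary, "scores": scores}
    return None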

Conclusion and Recommendations

The exploration of Large Language Models for Cyber Threat Intelligence reveals a landscape characterized by both significant potential and considerable challenges. While LLMs offer compelling advantages in terms of processing large volumes of unstructured data and extracting information, concerns regarding their accuracy, consistency, and tendency towards overconfidence, especially when dealing with complex, real-world CTI reports, cannot be overlooked. Recent research, including evaluations using CTI-specific benchmarks like CTIBench, underscores the fact that current LLM technology is not yet a panacea for autonomous CTI analysis.

Given these findings, a cautious and pragmatic approach is recommended for organizations considering the integration of LLMs into their CTI workflows. A phased adoption strategy that prioritizes hybrid methods, combining the strengths of LLMs with human expertise, appears to be the most viable path forward. LLMs can be effectively leveraged as powerful tools to augment the capabilities of human analysts, particularly in tasks such as initial triage of threat information, summarization of reports, and entity extraction, provided that the outputs are carefully reviewed and validated by cybersecurity professionals.

Organizations should also invest in training their security teams on how to effectively utilize LLMs and critically evaluate their outputs. Understanding the limitations and potential biases of these models is crucial for preventing over-reliance and ensuring the accuracy of the intelligence derived. Furthermore, the ongoing development and adoption of CTI-specific benchmarks will be essential for objectively assessing the performance of new LLM models and techniques in this domain. Continued research into enhancing LLM reliability for CTI, including advancements in fine-tuning methodologies, context management, and the ability to quantify uncertainty in their responses, is also vital.

In conclusion, while Large Language Models are not yet a fully reliable solution for autonomous cyber threat intelligence analysis, they hold significant promise as tools to enhance the efficiency and scope of human analysts. By adopting a balanced and informed approach that emphasizes human oversight and continuous evaluation, organizations can responsibly integrate LLMs into their security practices, ultimately contributing to a more proactive and resilient cybersecurity posture.

Works cited

1.     Large Language Models are unreliable for Cyber Threat Intelligence - arXiv, accessed April 27, 2025, https://arxiv.org/html/2503.23175v1

2.     Large Language Models are Unreliable for Cyber Threat Intelligence, accessed April 27, 2025, https://www.arxiv.org/abs/2503.23175

3.     proceedings.neurips.cc, accessed April 27, 2025, https://proceedings.neurips.cc/paper_files/paper/2024/file/5acd3c628aa1819fbf07c39ef73e7285-Paper-Datasets_and_Benchmarks_Track.pdf

4.     CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat ..., accessed April 27, 2025, https://openreview.net/forum?id=iJAOpsXo2I

5.     Top Eight Large Language Models Benchmarks for Cybersecurity ..., accessed April 27, 2025, https://www.infosecurityeurope.com/en-gb/blog/future-thinking/top-8-llm-benchmarks-for-cybersecurity-practices.html

6.     Academics Develop Testing Benchmark for LLMs in CTI ..., accessed April 27, 2025, https://www.infosecurity-magazine.com/news/testing-benchmark-llm-cyber-threat/

7.     Generative AI and LLMs for Critical Infrastructure Protection ... - MDPI, accessed April 27, 2025, https://www.mdpi.com/1424-8220/25/6/1666

8.     The Crucial Role of Context Length in Large Language Models for ..., accessed April 27, 2025, https://groq.com/the-crucial-role-of-context-length-in-large-language-models-for-business-applications/

9.     The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums - Flare, accessed April 27, 2025, https://flare.io/wp-content/uploads/WhitePaper_LLM_UdeM_Flare.pdf

10.  Brief #101: OAuth Exploits Target Microsoft 365, Verizon DBIR Third ..., accessed April 27, 2025, https://mandos.io/newsletter/brief-101-oauth-exploits-target-microsoft-365-verizon-dbir-third-party-risk-llms-fail-at-cti/

11.  Testing your LLMs differently: Security updates from our latest Cyber Snapshot Report, accessed April 27, 2025, https://cloud.google.com/blog/products/identity-security/testing-your-llms-differently-security-updates-from-our-latest-cyber-snapshot-report

12.  Why do LLMs give different responses to the same prompt? : r/artificial - Reddit, accessed April 27, 2025, https://www.reddit.com/r/artificial/comments/1bh38a0/why_do_llms_give_different_responses_to_the_same/

13.  [D] Why do LLM's produce different answers with same input? : r/MachineLearning - Reddit, accessed April 27, 2025, https://www.reddit.com/r/MachineLearning/comments/1j3erqf/d_why_do_llms_produce_different_answers_with_same/

14.  Why does the answer vary for the same question asked multiple times - Community, accessed April 27, 2025, https://community.openai.com/t/why-does-the-answer-vary-for-the-same-question-asked-multiple-times/770718

15.  Why do Different LLMs Give Different Answers to the Same Question ..., accessed April 27, 2025, https://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1123&context=covacci-undergraduateresearch

16.  Large Language Models are Unreliable for Cyber Threat ..., accessed April 27, 2025, https://www.researchgate.net/publication/390354860_Large_Language_Models_are_Unreliable_for_Cyber_Threat_Intelligence

17.  UC Irvine study finds mismatch between human perception and ..., accessed April 27, 2025, https://news.uci.edu/2025/01/22/uc-irvine-study-finds-mismatch-between-human-perception-and-reliability-of-ai-assisted-language-tools/

18.  Method prevents an AI model from being overconfident about wrong answers | MIT News, accessed April 27, 2025, https://news.mit.edu/2024/thermometer-prevents-ai-model-overconfidence-about-wrong-answers-0731

19.  What is few shot prompting? - IBM, accessed April 27, 2025, https://www.ibm.com/think/topics/few-shot-prompting

20.  Few-shot Prompting: The Essential Guide | Nightfall AI Security 101, accessed April 27, 2025, https://www.nightfall.ai/ai-security-101/few-shot-prompting

21.  FaroukDaboussi0/Fine-Tuning-LLMs-for-Cyber-Threat ... - GitHub, accessed April 27, 2025, https://github.com/FaroukDaboussi0/Fine-Tuning-LLMs-for-Cyber-Threat-Intelligence

22.  My experience on starting with fine tuning LLMs with custom data : r/LocalLLaMA - Reddit, accessed April 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/14vnfh2/my_experience_on_starting_with_fine_tuning_llms/

23.  (PDF) Few-Shot Learning of TTPs Classification Using Large ..., accessed April 27, 2025, https://www.researchgate.net/publication/377225255_Few-Shot_Learning_of_TTPs_Classification_Using_Large_Language_Models

24.  Decoding the Threat Matrix: How LLMs Amplify Cyber Threat Intelligence - CyberDB, accessed April 27, 2025, https://www.cyberdb.co/decoding-the-threat-matrix-how-llms-amplify-cyber-threat-intelligence/

25.  How Large Language Models Are Changing Threat Intelligence ..., accessed April 27, 2025, https://www.rsaconference.com/library/blog/how-large-language-models-are-changing-threat-intelligence-report-analysis

26.  Large Language Models for Cybersecurity: The Role of LLMs in Threat Hunting - Bolster AI, accessed April 27, 2025, https://bolster.ai/blog/large-language-models-cybersecurity

27.  OWASP Top 10 for LLMs in 2025: Risks & Mitigations Strategies - Strobes Security, accessed April 27, 2025, https://strobes.co/blog/owasp-top-10-risk-mitigations-for-llms-and-gen-ai-apps-2025/

28.  The Benefits and Risks of Using Large Language Models (LLM) in AI for Privacy Compliance | TrustArc, accessed April 27, 2025, https://trustarc.com/resource/benefits-risks-large-language-models-llm-ai-privacy-compliance/

29.  LLM Security: Top 10 Risks and 7 Security Best Practices - Exabeam, accessed April 27, 2025, https://www.exabeam.com/explainers/ai-cyber-security/llm-security-top-10-risks-and-7-security-best-practices/

30.  Large Language Models and Intelligence Analysis | Centre for ..., accessed April 27, 2025, https://cetas.turing.ac.uk/publications/large-language-models-and-intelligence-analysis

31.  Staying ahead of threat actors in the age of AI | Microsoft Security Blog, accessed April 27, 2025, https://www.microsoft.com/en-us/security/blog/2024/02/14/staying-ahead-of-threat-actors-in-the-age-of-ai/

32.  What Does an LLM-Powered Threat Intelligence Program Look Like? - Black Hat, accessed April 27, 2025, https://i.blackhat.com/BH-US-23/Presentations/US-23-Grof-Miller-LLM-Powered-TI-Program.pdf

33.  Leveraging LLMs for Non-Security Experts in Threat Hunting ... - MDPI, accessed April 27, 2025, https://www.mdpi.com/2504-4990/7/2/31

34.  Gen AI in Security – Improving SOC, CTI, and Red Team Tasks, accessed April 27, 2025, https://www.tidalcyber.com/blog/gen-ai-in-security-improving-soc-cti-and-red-team-tasks

35.  Matching AI Strengths to Blue Team Needs | Splunk, accessed April 27, 2025, https://www.splunk.com/en_us/blog/security/leveraging-ai-llms-for-cybersecurity-blue-team.html

36.  tmylla/Awesome-LLM4Cybersecurity: An overview of LLMs for cybersecurity. - GitHub, accessed April 27, 2025, https://github.com/tmylla/Awesome-LLM4Cybersecurity

37.  (PDF) Design of an Autonomous Cyber Defence Agent using Hybrid ..., accessed April 27, 2025, https://www.researchgate.net/publication/381196238_Design_of_an_Autonomous_Cyber_Defence_Agent_using_Hybrid_AI_models

38.  CTIKG: LLM-Powered Knowledge Graph Construction from Cyber ..., accessed April 27, 2025, https://openreview.net/forum?id=DOMP5AgwQz

39.  Ideas for Combining AI and Cyber Threat Intelligence? : r/cybersecurity - Reddit, accessed April 27, 2025, https://www.reddit.com/r/cybersecurity/comments/1gr5q0v/ideas_for_combining_ai_and_cyber_threat/

40.  www.first.org, accessed April 27, 2025, https://www.first.org/resources/papers/conf2024/1115-Neurocti-Kaplan-Dulaunoy-Brandl.pdf

41.  Hybrid Security with AI: Key Concepts and Benefits - Adevait, accessed April 27, 2025, https://adevait.com/artificial-intelligence/hybrid-security-ai

42.  www.first.org, accessed April 27, 2025, https://www.first.org/resources/papers/firstcti24/Sergeev-Processing-Threat-Reports-at-Scale-Using-AI-and-ML.pdf

43.  Tactical intelligence: leveraging AI to identify cyber threats - Telefónica Tech, accessed April 27, 2025, https://telefonicatech.com/en/blog/tactical-intelligence-leveraging-ai-to-identify-cyber-threats

44.  AI for Predictive Cyber Threat Intelligence - International Journal of Sustainable Development in Computing Science, accessed April 27, 2025, https://ijsdcs.com/index.php/IJMESD/article/download/590/228

45.  4 use cases for AI in cyber security - Red Hat, accessed April 27, 2025, https://www.redhat.com/en/blog/4-use-cases-ai-cyber-security

Joke about Cybercriminal hiring