Readings, translations, and links
Monday, May 5, 2025
Thursday, May 1, 2025
Tuesday, April 29, 2025
Project 2025
original
https://static.project2025.org/2025_MandateForLeadership_FULL.pdf
1. Centralization of Power and Disregard for Bureaucracy
- Detail: The document advocates for a strong, unified executive branch to aggressively pursue its agenda. It proposes dismantling the "Administrative State" by curbing the power of federal agencies and increasing presidential control.
- Sociological Analysis: This reflects an authoritarian tendency to concentrate power in the executive, diminishing the role of checks and balances provided by the bureaucracy. Sociologically, this can lead to:
- Erosion of Institutional Trust: Undermining the bureaucracy can erode public trust in established institutions, as they are portrayed as obstacles to the popular will.
- Increased Political Instability: Centralizing power in the executive can destabilize the political system, making it more susceptible to the whims of a single leader.
- Weakening of Civil Service: Attacking the civil service can demoralize public employees and politicize government functions, reducing efficiency and expertise.
2. Imposition of Social and Cultural Values
- Detail: The document promotes the use of federal power to enforce specific social and cultural values, particularly concerning issues like abortion and gender identity.
- Sociological Analysis: This demonstrates an authoritarian approach to social issues, seeking to impose a single set of values on a diverse society. Sociologically, this can result in:
- Cultural Conflict: Efforts to enforce particular values can lead to intense cultural conflict and polarization, dividing society along ideological lines.
- Marginalization of Minority Groups: The imposition of majority values can marginalize minority groups and suppress their rights and identities.
- Erosion of Social Tolerance: Authoritarian approaches to culture can erode social tolerance and pluralism, creating a less open and inclusive society.
3. Economic Nationalism and Protectionism
- Detail: The document advocates for reducing reliance on foreign supply chains and taking a confrontational stance towards China's economic influence.
- Sociological Analysis: This reflects economic nationalism, which can align with fascist ideologies that prioritize national interests above global cooperation. Sociologically, this can lead to:
- Increased Social Cohesion (in the short term): Nationalism can create a sense of unity and shared purpose, particularly in response to perceived external threats.
- International Conflict: Protectionist trade policies and aggressive economic nationalism can strain international relations and increase the risk of conflict.
- Xenophobia and Social Exclusion: Nationalist sentiment can foster xenophobia and social exclusion, targeting immigrants and foreign populations.
4. The Language of Crisis and Urgency
- Detail: The document frequently uses language that emphasizes a sense of crisis and the need for immediate, drastic action to "save" the country.
- Sociological Analysis: This is a common tactic in authoritarian movements, creating a sense of urgency to justify the rapid implementation of policies without extensive deliberation or democratic process. Sociologically, this can result in:
- Suppression of Dissent: In a crisis atmosphere, dissent is often portrayed as unpatriotic or even treasonous, limiting free debate and opposition.
- Rapid Social Change: A sense of urgency can be used to push through radical changes to social and political structures, often with unintended consequences.
- Increased Anxiety and Insecurity: The constant emphasis on crisis can increase social anxiety and insecurity, making people more willing to accept authoritarian measures.
It is important to note that while these elements have sociological parallels with authoritarian and fascist tendencies, whether the document is authoritarian or fascist is a matter of interpretation and debate.
The document includes a section called "Authors" that provides a short biography of each author. Generally, the authors appear to have been selected for their subject-matter expertise and/or experience in the area they wrote about. For example:
- Daren Bakst is Deputy Director, Center for Energy and Environment, and Senior Fellow at the Competitive Enterprise Institute, and wrote the section on the Department of Agriculture.
- Jonathan Berry is managing partner at Boyden Gray & Associates PLLC, and served as acting Assistant Secretary for Policy at the U.S. Department of Labor, and wrote the section on the Department of Labor.
- Lindsey M. Burke is Director of the Center for Education Policy at The Heritage Foundation, and wrote the section on the Department of Education.
- David R. Burton is Senior Fellow in Economic Policy in the Thomas A. Roe Institute for Economic Policy Studies at The Heritage Foundation, and wrote sections on the Department of Commerce and Financial Regulatory Agencies.
- Adam Candeub is a professor of law at Michigan State University, and wrote the section on the Federal Trade Commission.
- Dustin J. Carmack is Research Fellow for Cybersecurity, Intelligence, and Emerging Technologies in the Border Security and Immigration Center at The Heritage Foundation, and wrote the section on the Intelligence Community.
- Brendan Carr has nearly 20 years of private-sector and public-sector experience in communications and tech policy and currently serves as the senior Republican on the Federal Communications Commission, and wrote the section on the Federal Communications Commission.
- Benjamin S. Carson, Sr., MD, is Founder and Chairman of the American Cornerstone Institute and previously served as the 17th Secretary of the U.S. Department of Housing and Urban Development, and wrote the section on the Department of Housing and Urban Development.
- Ken Cuccinelli served as Acting Director of U.S. Citizenship and Immigration Services in 2019 and then, from November 2019 through the end of the Trump Administration, as Acting Deputy Secretary for the U.S. Department of Homeland Security, and wrote the section on the Department of Homeland Security.
- Rick Dearborn served as Deputy Chief of Staff for President Donald Trump and was responsible for the day-to-day operations of five separate departments of the Executive Office of the President, and wrote the section on the White House Office.
- Veronique de Rugy is the George Gibbs Chair in Political Economy and Senior Research Fellow at the Mercatus Center at George Mason University, and wrote a section on why the Export-Import Bank should be abolished.
- Donald Devine is Senior Scholar at The Fund for American Studies in Washington, DC, and was President Ronald Reagan's first-term Office of Personnel Management Director, and wrote a section on Central Personnel Agencies.
- Diana Furchtgott-Roth, an Oxford-educated economist, directs the Center for Energy, Climate, and Environment at The Heritage Foundation and is adjunct professor of economics at George Washington University, and wrote the section on the Department of Transportation.
- Thomas F. Gilman served as Assistant Secretary of Commerce for Administration and Chief Financial Officer of the U.S. Department of Commerce in the Trump Administration, and wrote a section on the Department of Commerce.
- Mandy M. Gunasekara is a principal at Section VII Strategies, a Senior Policy Analyst at the Independent Women's Forum, and Visiting Fellow in the Center for Energy, Climate, and Environment at The Heritage Foundation, and wrote the section on the Environmental Protection Agency.
- Gene Hamilton is Vice President and General Counsel of America First Legal Foundation, and wrote the section on the Department of Justice.
- Jennifer Hazelton has worked as a senior strategic consultant for the Department of Defense in Industrial Base Policy and has held senior positions at USAID, the Export–Import Bank of the United States, and the State Department, and wrote the section on the case for the Export-Import Bank.
- Karen Kerrigan is President and CEO of the Small Business & Entrepreneurship Council, and wrote the section on the Small Business Administration.
- Dennis Dean Kirk is Associate Director for Personnel Policy with the 2025 Presidential Transition Project at The Heritage Foundation, and wrote a section on Central Personnel Agencies.
- Kent Lassman is President and CEO of the Competitive Enterprise Institute, and wrote the section on the case for Free Trade.
- Bernard L. McNamee is an energy and regulatory attorney with a major law firm and was formerly a member of the Federal Energy Regulatory Commission, and wrote the section on the Department of Energy and Related Commissions.
- Christopher Miller served in several positions during the Trump Administration, including as Acting U.S. Secretary of Defense, and wrote the section on the Department of Defense.
- Stephen Moore is a conservative economist and author, and wrote a section on the Department of the Treasury.
- Mora Namdar is an attorney and Senior Fellow at the American Foreign Policy Council, and wrote the section on the U.S. Agency for Global Media and the Corporation for Public Broadcasting.
- Peter Navarro holds a PhD in economics from Harvard and was one of only three senior White House officials to serve with Donald Trump from the 2016 campaign to the end of the President's first term, and wrote the section on the case for Fair Trade.
- William Perry Pendley was an attorney on Capitol Hill, a senior official for President Ronald Reagan, and leader of the Bureau of Land Management for President Donald Trump, and wrote the section on the Department of the Interior.
- Max Primorac is Director of the Douglas and Sarah Allison Center for Foreign Policy Studies at The Heritage Foundation, and wrote the section on the Agency for International Development.
- Roger Severino is Vice President of Domestic Policy at The Heritage Foundation, and wrote the section on the Department of Health and Human Services.
- Kiron K. Skinner is President and CEO of the Foundation for America and the World, Taube Professor of International Relations and Politics at Pepperdine University, and wrote the section on the Department of State.
Citing these authors and their credentials appears to lend credibility to the ideas presented in the book.
Kwok Pui-lan's work often highlights the importance of acknowledging the full humanity of biblical figures
This passage, focusing on the humanity and persistent prayer of Elijah, resonates deeply with the themes Kwok Pui-lan brings to feminist and postcolonial theologies. Reading it through her lens invites us to consider the text beyond a simplistic individualistic piety and to explore its broader implications for marginalized communities and our understanding of divine engagement.
Firstly, Kwok Pui-lan's work often highlights the importance of acknowledging the full humanity of biblical figures, especially those who might be idealized or presented in ways that obscure their struggles. The passage begins by emphasizing that "Elijah was human subject to like passions as we are." This resonates with Kwok's critique of theological frameworks that create a distant, unattainable God and saints who are unrelatable. Instead, we see Elijah in his vulnerability – murmuring, complaining, even experiencing unbelief. This resonates with the lived experiences of many, particularly those facing systemic oppression, who may grapple with doubt and despair in the face of suffering. Kwok's emphasis on the embodied and contextual nature of faith allows us to see Elijah's humanity not as a flaw, but as a point of connection and solidarity.
Secondly, Kwok Pui-lan's engagement with postcolonial perspectives encourages us to consider the power dynamics inherent in interpreting scripture. Often, readings focus solely on the individual's relationship with God, potentially overlooking the socio-political contexts in which these figures lived and prayed. While the passage emphasizes Elijah's personal prayer, Kwok's lens might prompt us to ask: What were the pressures Elijah faced? Were they solely internal, or were they connected to larger systems of injustice and oppression in his time? Recognizing these external pressures allows us to see Elijah's "murmuring and complaining" not just as personal failings, but as potential responses to systemic issues, mirroring the laments of marginalized communities throughout history.
Thirdly, Kwok Pui-lan's feminist theological insights invite us to consider the nature of prayer itself. The passage highlights the original Greek phrase "proseuchē prosēuxato," emphasizing a persistent, ongoing dialogue with God. This moves beyond a transactional view of prayer as simply asking for favors. Instead, it suggests a sustained relationship, a "meaningful, uninterrupted dialogue." This resonates with feminist theologies that often emphasize relationality and a God who is in deep connection with humanity. For those who have been historically silenced or whose voices have been marginalized, this image of persistent dialogue can be particularly empowering. It suggests that even in moments of doubt and complaint, the connection with the divine can endure.
Finally, the concluding lesson, "As we may gather, we must keep at it," takes on a richer meaning through Kwok Pui-lan's lens. It's not just about individual perseverance in prayer, but perhaps also about the collective persistence of marginalized communities in their struggles for justice and liberation. Just as Elijah kept at his dialogue with God through his human struggles, so too must communities facing oppression maintain their voices, their resistance, and their hope. This "keeping at it" becomes a form of spiritual resilience, a refusal to be silenced or defeated in the face of adversity.
In essence, reading this passage through Kwok Pui-lan's style encourages us to:
- Embrace the full humanity of biblical figures and ourselves, including our doubts and struggles.
- Consider the socio-political contexts that shape both individual and communal experiences of faith and prayer.
- Value prayer as a sustained, relational dialogue with the divine, particularly empowering for those whose voices have been marginalized.
- See persistence not just as individual piety, but also as a form of communal resilience in the face of injustice.
Sunday, April 27, 2025
The Core Components of an AI Agent
The diagram depicts a simplified model of how an AI agent interacts with its environment. It highlights the flow of information and actions between the user, the environment, and the AI agent itself.
Here's a breakdown of each component:
- User Input: This represents the initial stimulus or instructions provided to the environment, often through a human-computer interface like a keyboard, mouse, voice command, or a touch screen. In a technical sense, this input is translated into a digital signal that the environment can process.
- Environment: This is the external world with which the AI agent interacts. The diagram distinguishes between:
- Digital Infrastructure: This encompasses the virtual or computational aspects of the environment, such as software systems, databases, networks, and the internet. User input might directly manipulate this digital space.
- Physical Infrastructure: This refers to the tangible aspects of the environment, including robots, machines, physical spaces, and sensors embedded within them. User input might indirectly affect this through digital commands. The environment generates percepts based on its current state, influenced by user input and the AI agent's actions.
- AI Agent: This is the intelligent entity that perceives its environment and acts upon it to achieve its goals. It comprises three key components:
- Sensors: These are the agent's perception mechanisms. Technically, sensors are devices or software modules that convert raw data from the environment into a format that the AI agent can understand. Examples include cameras (visual data), microphones (audio data), tactile sensors (pressure), GPS (location), or software interfaces that provide data from digital systems. The outputs of the sensors are the percepts, which form the agent's instantaneous view of the environment.
- Control Centre: This is the "brain" of the AI agent, where the processing and decision-making occur. Technically, this involves algorithms, models (like machine learning models), and logic that analyze the percepts, reason about the current situation, and decide on the next action. This component handles tasks like data processing, pattern recognition, knowledge representation, planning, and learning.
- Effectors: These are the means by which the AI agent acts upon the environment. Technically, effectors are devices or software modules that translate the agent's decisions (the "action") into physical movements or digital operations. Examples include motors in a robot arm, actuators that control a valve, software commands that modify a database, or signals sent to a display screen.
- Flow of Information and Action: The arrows illustrate the interaction loop:
- User Input influences the Environment.
- The Environment generates Percepts based on its current state.
- The AI Agent's Sensors receive these Percepts.
- The Control Centre processes the Percepts and decides on an Action.
- The AI Agent's Effectors execute the Action on the Environment, potentially changing its state and leading to new percepts in the next cycle.
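To make the loop concrete, here is a minimal sketch in Python of the perceive-decide-act cycle described above. The class and method names (Environment, Agent, sense, decide, act) are illustrative only and not taken from any particular agent framework; the "control centre" here is just a trivial numeric rule standing in for a real decision model.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    """A toy environment holding a single numeric state."""
    state: float = 0.0

    def percept(self) -> float:
        # The environment exposes its current state as a percept.
        return self.state

    def apply(self, action: float) -> None:
        # An action from the agent's effectors changes the environment's state.
        self.state += action


class Agent:
    """Sensors read percepts, the control centre decides, effectors act."""

    def __init__(self, target: float):
        self.target = target  # the goal the control centre works towards

    def sense(self, env: Environment) -> float:
        return env.percept()                  # sensor: environment state -> percept

    def decide(self, percept: float) -> float:
        return 0.5 * (self.target - percept)  # control centre: a trivial decision rule

    def act(self, env: Environment, action: float) -> None:
        env.apply(action)                     # effector: action -> environment


if __name__ == "__main__":
    env = Environment(state=0.0)   # user input could set this initial state
    agent = Agent(target=10.0)
    for step in range(5):
        p = agent.sense(env)
        a = agent.decide(p)
        agent.act(env, a)
        print(f"step={step} percept={p:.2f} action={a:.2f} new_state={env.state:.2f}")
```

Each pass through the loop prints how percepts, actions, and the environment's state evolve, mirroring the arrows in the diagram.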
Are LLMs Actually Reliable for Cyber Threat Intelligence?
Introduction: The Promise and Perils of LLMs in Cyber Threat Intelligence
The digital landscape is characterized by an ever-increasing
volume and sophistication of cyber threats, necessitating robust and efficient
methods for Cyber Threat Intelligence (CTI) analysis. Understanding the
tactics, techniques, and procedures (TTPs) of threat actors, along with
identifying vulnerabilities and potential attack vectors, is crucial for
proactive defense and effective incident response. Large Language Models
(LLMs), with their remarkable capabilities in natural language processing and understanding,
have emerged as a promising technology in various domains, including
cybersecurity. The potential of LLMs to automate the processing and analysis of
vast amounts of unstructured threat data has generated considerable enthusiasm
within the cybersecurity community.1 By enhancing the ability to extract relevant information,
identify patterns, and potentially predict future threats, LLMs offer a
compelling solution to the challenge of managing the growing deluge of cyber
threat information.
However, alongside this optimism, significant concerns persist
regarding the actual reliability and accuracy of LLMs for critical CTI
operations.1 The unique requirements of cyber threat intelligence, which
demands precision, consistency, and a high degree of trustworthiness, raise
questions about the suitability of current LLM technology. Recent research has
begun to rigorously evaluate the performance of LLMs on real-world CTI tasks,
revealing potential limitations and risks associated with their deployment in
this sensitive domain. This report aims to provide a comprehensive analysis of
the reliability of LLMs for CTI, drawing upon recent studies and expert
opinions to offer an evidence-based perspective on their current capabilities
and limitations. By examining various facets of LLM performance, including
accuracy on different types of CTI reports, consistency of responses, the
phenomenon of overconfidence in incorrect answers, and the effectiveness of
reliability-enhancing techniques, this report seeks to inform cybersecurity
professionals, researchers, and decision-makers about the practical
implications of integrating LLMs into their CTI workflows.
Evaluating the Performance of LLMs on Real-World CTI Reports
Recent research has focused on developing methodologies to
evaluate the effectiveness of LLMs in handling real-world Cyber Threat
Intelligence reports. One such evaluation methodology, presented in several
works, allows for testing LLMs on CTI tasks using zero-shot learning, few-shot
learning, and fine-tuning approaches.1 This methodology also enables the quantification of the models'
consistency and their confidence levels in the generated outputs. Experiments
conducted using three state-of-the-art LLMs on a dataset comprising 350 threat
intelligence reports have revealed potential security risks associated with
relying on LLMs for CTI.1 The findings indicate that these models often struggle to
guarantee sufficient performance when processing real-size reports, exhibiting
both inconsistency and overconfidence in their analyses.
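As a rough illustration of how such an extraction evaluation might be scored (this is a sketch, not the harness used in the cited study, and the entity names below are invented), one can compare the entities an LLM returns for a report against a labelled ground truth and compute precision, recall, and the share of missed items:

```python
def score_extraction(predicted: set[str], ground_truth: set[str]) -> dict[str, float]:
    """Compare extracted CTI entities against a labelled ground truth."""
    tp = len(predicted & ground_truth)   # correctly extracted entities
    fp = len(predicted - ground_truth)   # spurious (hallucinated) entities
    fn = len(ground_truth - predicted)   # missed entities
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall, "missed": float(fn)}

# Invented example: a report mentioning a threat actor, a CVE, and a campaign name.
truth = {"APT29", "CVE-2023-23397", "Operation X"}
pred = {"APT29", "CVE-2023-23397"}        # the model missed the campaign name
print(score_extraction(pred, truth))       # recall ~0.67, i.e. a third of the entities missed
```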
To address the limitations of general-purpose benchmarks in
evaluating LLMs for the specific demands of CTI, researchers have developed
specialized benchmarks such as CTIBench.3 CTIBench is designed as a novel suite of benchmark tasks and
datasets aimed at assessing LLMs' performance in practical CTI applications.
This benchmark includes multiple datasets that focus on evaluating the
knowledge acquired by LLMs within the cyber-threat landscape. CTIBench
incorporates a variety of tasks designed to test different cognitive abilities
crucial for CTI analysis, including CTI Multiple Choice Questions (CTI-MCQ),
CTI Root Cause Mapping (CTI-RCM), CTI Vulnerability Severity Prediction
(CTI-VSP), and CTI Threat Actor Attribution (CTI-TAA).3 These tasks
collectively assess an LLM's understanding of CTI standards, reasoning about
vulnerabilities, and ability to attribute threats to specific actors.
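As a toy illustration of how a multiple-choice CTI benchmark task might be scored, the sketch below asks a model each question and measures accuracy; the sample item is invented in the spirit of CTI-MCQ, not drawn from CTIBench itself, and query_llm is a placeholder for whichever model API is used:

```python
def score_mcq(items: list[dict], query_llm) -> float:
    """Ask the model each multiple-choice question and report overall accuracy."""
    correct = 0
    for item in items:
        options = "\n".join(f"{k}) {v}" for k, v in item["options"].items())
        prompt = f"{item['question']}\n{options}\nAnswer with a single letter."
        answer = query_llm(prompt).strip().upper()[:1]
        correct += answer == item["answer"]
    return correct / len(items)

# Invented benchmark item in the spirit of CTI-MCQ (not from CTIBench).
sample = [{
    "question": "Which identifier scheme catalogues publicly known software vulnerabilities?",
    "options": {"A": "CVE", "B": "SPF", "C": "BGP", "D": "DKIM"},
    "answer": "A",
}]
```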
Evaluations of several prominent LLMs, including ChatGPT 3.5,
ChatGPT 4, Gemini 1.5, Llama 3-70B, and Llama 3-8B, on the CTIBench tasks have
provided valuable insights into their strengths and weaknesses in CTI contexts.5 For instance, ChatGPT 4
has demonstrated strong performance across most CTIBench tasks, indicating a
good understanding of CTI knowledge and reasoning capabilities.6 However, the
performance of different models varies across the specific tasks within the
benchmark. For example, while ChatGPT 4 excels in CTI-MCQ, CTI-RCM, and
CTI-TAA, Gemini 1.5 has shown superior results in CTI-VSP, which involves
predicting vulnerability severity.6 Notably, the open-source model LLAMA3-70B has also exhibited
competitive performance, comparable to or even outperforming Gemini 1.5 on
certain tasks like CTI-MCQ and CTI-TAA, although it faces challenges in
vulnerability severity prediction.6 The development and application of CTI-specific benchmarks like
CTIBench are crucial for obtaining a more accurate and nuanced understanding of
LLM capabilities in this specialized domain, moving beyond the broader
evaluations offered by general language understanding benchmarks.3
While CTIBench focuses specifically on CTI, other benchmarks
like SECURE, NetEval, and DebugBench offer insights into LLM performance in
different cybersecurity domains.7 SECURE is tailored for evaluating LLMs in the context of
Industrial Control Systems (ICS) cybersecurity, while NetEval assesses
capabilities in network operations, and DebugBench focuses on debugging
capabilities.5 The existence of these diverse benchmarks underscores the broad
interest in leveraging LLMs across various cybersecurity tasks, yet it also
highlights that the unique challenges of CTI necessitate dedicated evaluation
tools like CTIBench to accurately gauge the reliability of LLMs in this field.
The Impact of CTI Report Length and Complexity on LLM Accuracy
The length and complexity of Cyber Threat Intelligence reports
appear to significantly influence the accuracy of Large Language Models in
extracting and processing information. Research suggests that LLM performance
on CTI tasks can be affected by the amount of text they need to analyze. For
instance, one study indicated that when LLMs were tasked with extracting
information from threat reports, their performance worsened when processing
complete, longer reports compared to individual paragraphs from the same
reports.1 This degradation in
performance was characterized by an increase in both false positives and false
negatives, suggesting that the models struggled to maintain accuracy and
relevance as the input size grew.
The concept of context length, which refers to the maximum
number of tokens an LLM can process in a single input, plays a crucial role in
this phenomenon.8 A longer context length generally allows an LLM to understand
more detailed commands and maintain context over longer interactions,
potentially leading to higher quality outputs for complex tasks.8 However, processing
longer contexts also demands greater computational resources and can sometimes
lead to confusion or loss of focus on the most critical information within the
report.1
In contrast, studies focusing on shorter text samples have
reported higher accuracy rates. For example, research evaluating an LLM system
on summaries and coding of variables from cybercrime forum conversations, which
tend to be shorter and more focused, achieved an average accuracy of 98%.9 This stark difference
in performance between shorter forum posts and full-length threat reports
implies that the ability of LLMs to handle extensive context and maintain
accuracy is a significant challenge in the domain of CTI. The increased length
and complexity of real-world CTI reports might overwhelm the models' processing
capabilities or introduce ambiguities that negatively impact their reliability
in extracting key intelligence. Therefore, understanding the interplay between
report length, complexity, and LLM accuracy is essential for determining the
appropriate use cases and limitations of these models in CTI workflows.
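One pragmatic mitigation suggested by these findings is to split long reports into paragraph-sized chunks before querying the model and then merge the per-chunk results. A minimal sketch, assuming a query_llm callable that returns a list of extracted entities:

```python
def chunk_report(text: str, max_chars: int = 2000) -> list[str]:
    """Split a long CTI report into paragraph-based chunks under a size budget."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def extract_entities(report: str, query_llm) -> set[str]:
    """Run extraction per chunk and merge results, instead of one pass over the full report."""
    entities: set[str] = set()
    for chunk in chunk_report(report):
        prompt = f"List the threat actors, campaigns and CVEs mentioned in this text:\n{chunk}"
        entities |= set(query_llm(prompt))   # query_llm is assumed to return a list of strings
    return entities
```

Chunking trades a single long-context pass for several focused ones, which the paragraph-level results above suggest may be more reliable, at the cost of losing some cross-paragraph context.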
Identifying the Limitations: Instances of Missed Campaign Information and Inconsistency
One of the critical aspects in evaluating the reliability of
LLMs for Cyber Threat Intelligence is their ability to consistently and
comprehensively extract essential information from CTI reports. Recent research
has revealed significant limitations in this regard, particularly concerning
the frequency with which LLMs miss crucial campaign information and overlook
vulnerabilities. Studies have indicated that even state-of-the-art LLMs can
fail to extract critical information with sufficient reliability, overlooking a
notable percentage of key campaign entities and vulnerabilities present in
real-world CTI reports.10 For instance, findings suggest that LLMs might miss up to 20%
of campaign entities and 10% of vulnerabilities, which are fundamental elements
for understanding the scope and impact of cyber threats.10 Further supporting
this, research on entity extraction from threat reports showed recall rates as
low as 0.72 for campaign entities when using certain LLM models, implying that
as much as 28% of this vital information could be overlooked.1 The failure to
accurately identify and extract such critical data can have serious
implications for threat detection, incident response, and overall cybersecurity
situational awareness.
Another significant concern regarding the reliability of LLMs in
CTI analysis is the consistency of their responses when presented with the same
information multiple times. Evaluations have demonstrated that LLMs can exhibit
inconsistency in their responses to identical queries about a CTI report.1 This lack of consistent
output, even when the input remains the same, introduces uncertainty into the
analysis process, particularly for critical security decisions such as
prioritizing patching or attributing attacks.10 The probabilistic
nature of LLMs, where responses are generated based on probability
distributions over tokens, contributes to this variability.11 Factors such as
temperature settings and the stochastic process of token sampling can lead to
different outputs across multiple runs, even for the same prompt. A study
analyzing LLM response variability in the context of ranking intrusion
detection systems further illustrates this issue, finding significant
divergence in the recommendations provided by different LLMs for the same
query.15 This inherent
inconsistency in LLM responses raises concerns about their suitability for
tasks requiring repeatable and dependable analytical outcomes, which are
paramount in the field of cyber threat intelligence.
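A simple way to surface this variability is to re-ask the same question several times and only accept the answer when a clear majority emerges. The sketch below assumes query_llm returns a short categorical answer (for example, a severity label or a yes/no verdict):

```python
from collections import Counter

def consistent_answer(prompt: str, query_llm, runs: int = 5, threshold: float = 0.8):
    """Re-ask the same prompt several times; return the majority answer and an agreement score."""
    answers = [query_llm(prompt).strip() for _ in range(runs)]
    (top, count), = Counter(answers).most_common(1)
    agreement = count / runs
    if agreement < threshold:
        return None, agreement   # too inconsistent to trust; escalate to a human analyst
    return top, agreement
```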
The Challenge of Overconfidence: LLMs Expressing High Confidence in Incorrect Answers
A particularly concerning aspect of relying on Large Language
Models for Cyber Threat Intelligence is the phenomenon of these models
expressing high confidence in answers that are, in fact, incorrect.1 This issue of
overconfidence, reflecting poor calibration between the model's stated
certainty and the actual correctness of its output 10, poses a significant
risk to cybersecurity professionals who might be misled into trusting
inaccurate information. The tendency of LLMs to generate fluent and
plausible-sounding text, even when the content is factually wrong, can create a
false sense of security and potentially lead to flawed decision-making in
critical CTI scenarios.
Research from cognitive and computer scientists has corroborated
this tendency, revealing that people generally overestimate the accuracy of LLM
outputs.17 Studies have identified
a "calibration gap," representing the difference between what LLMs
know and what users perceive they know, as well as a "discrimination
gap," which measures how well humans and models can distinguish between correct
and incorrect answers.17 These findings highlight a fundamental challenge in the
human-AI interaction when it comes to interpreting the reliability of
LLM-generated information. LLMs often do not inherently communicate their level
of uncertainty in their responses, leading users to potentially trust
confidently stated but ultimately erroneous information.17 To address this issue,
researchers have explored methods for calibrating LLMs, such as the
"Thermometer" technique, which aims to align a model's confidence
level with its actual accuracy, thereby providing users with a clearer signal
of when a model's response should be trusted.18 Overcoming the problem
of LLM overconfidence in incorrect answers is crucial for their safe and
effective integration into cyber threat intelligence workflows, where accuracy
and reliability are paramount.
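The calibration gap can be made concrete with a small check of how stated confidence tracks actual correctness. The sketch below assumes each evaluated item carries the model's self-reported confidence and a correctness label; it is not the Thermometer method itself, only a rough diagnostic:

```python
def calibration_gap(items: list[tuple[float, bool]]) -> float:
    """Mean stated confidence minus observed accuracy; positive values indicate overconfidence.

    Each item is (model_confidence in [0, 1], answer_was_correct).
    """
    if not items:
        return 0.0
    mean_conf = sum(conf for conf, _ in items) / len(items)
    accuracy = sum(1 for _, ok in items if ok) / len(items)
    return mean_conf - accuracy

# Invented evaluation results: high stated confidence, mediocre accuracy.
results = [(0.95, True), (0.90, False), (0.92, False), (0.88, True), (0.97, False)]
print(f"overconfidence: {calibration_gap(results):+.2f}")   # roughly +0.52
```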
Enhancing Reliability: The Role of Few-Shot Learning and Fine-Tuning
In an effort to improve the reliability of Large Language Models
for Cyber Threat Intelligence tasks, researchers have explored techniques such
as few-shot learning and fine-tuning. Few-shot learning involves providing the
LLM with a limited number of examples within the prompt to guide its
performance on a specific task, particularly useful when extensive labeled data
is unavailable.19 Fine-tuning, on the other hand, involves further training a
pre-trained LLM on a smaller, task-specific dataset to adapt its parameters for
improved performance in that domain.21
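For context, a few-shot prompt for a CTI sub-task such as TTP classification might look like the sketch below; the in-prompt examples and the query_llm placeholder are invented for illustration:

```python
FEW_SHOT_PROMPT = """Classify the MITRE ATT&CK tactic described in each sentence.

Sentence: The implant uploads the staged archive to an attacker-controlled cloud account.
Tactic: Exfiltration

Sentence: The actor sent a spear-phishing email with a malicious attachment.
Tactic: Initial Access

Sentence: {sentence}
Tactic:"""

def classify_tactic(sentence: str, query_llm) -> str:
    """Few-shot classification: the in-prompt examples guide the model's output format and labels."""
    return query_llm(FEW_SHOT_PROMPT.format(sentence=sentence)).strip()
```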
However, studies evaluating the impact of these techniques on
LLM performance in CTI have yielded mixed results. Research has suggested that
few-shot learning and fine-tuning may only partially improve the reliability of
LLMs for CTI tasks like entity extraction, and in some instances, these
techniques have even been observed to worsen performance.1 The limited
effectiveness could be attributed to several factors, including the inherent
complexity of CTI tasks, the scarcity of high-quality labeled datasets in the
cybersecurity domain for effective fine-tuning 1, and the potential for
overfitting the model to the limited examples provided in few-shot learning
scenarios.16
Despite these challenges, some research has shown promise in
leveraging these techniques for specific CTI sub-tasks. For example, one study
proposed a method that combines data augmentation using ChatGPT with
instruction supervised fine-tuning of open large language models for Tactics,
Techniques, and Procedures (TTPs) classification in few-shot learning
scenarios, reporting encouraging results.23 Additionally, ongoing projects aim to fine-tune LLMs using
CTI-specific data and evaluate their performance on benchmarks like CTIBench,
with the goal of enhancing their capabilities in areas such as identifying
threat actors and mapping attack techniques.21 While current evidence suggests that few-shot learning and
fine-tuning alone may not be sufficient to guarantee high reliability for all
CTI tasks, targeted and innovative approaches in applying these techniques
continue to be explored as potential avenues for improvement.
Expert Perspectives on the Risks and Benefits of LLMs in CTI Workflows
Experts in the field of cybersecurity and artificial
intelligence hold diverse perspectives on the potential risks and benefits of
integrating Large Language Models into Cyber Threat Intelligence workflows. On
the one hand, LLMs are recognized for their significant potential to enhance
various aspects of CTI analysis.24 Their ability to process and understand vast quantities of
human language allows for improved threat detection and prediction by analyzing
historical attack data to identify patterns and trends.24 LLMs can also
accelerate incident response by rapidly analyzing incident data to determine
the scope and root cause of an attack and recommend remediation strategies.24 Furthermore, they can
streamline threat intelligence analysis by automating time-consuming tasks like
data collection, aggregation, and correlation, freeing up human analysts to
focus on higher-order tasks.24 The potential for cost reduction through automation and
enhanced collaboration among security analysts are also noted as significant
benefits.24
However, experts also emphasize the considerable risks
associated with relying on LLMs for critical CTI tasks.1 A primary concern is
the potential for LLMs to generate inaccurate information or
"hallucinations," which could lead to flawed security strategies and
increased risk.6 The inherent biases present in the training data of LLMs can
also impact their ability to detect certain types of threats or lead to
discriminatory outcomes.26 Security vulnerabilities, such as prompt injection attacks,
data leakage, and model theft, pose additional risks that need careful
consideration.27 Experts caution against overreliance on LLM outputs without
proper validation, as this could lead to security breaches and misinformation.27 Moreover, ethical
concerns surrounding data privacy, potential misuse of the technology, and the
need for transparency and accountability are also highlighted.27 The OWASP Top 10
security risks for LLM applications further underscore the importance of
understanding and mitigating these potential vulnerabilities.27 It is also important to
note that threat actors are also exploring the use of LLMs to enhance their
offensive capabilities, creating a dual-use scenario that necessitates a
proactive and cautious approach to the adoption of this technology in
cybersecurity.11 Overall, expert opinions suggest that while LLMs offer
promising avenues for enhancing CTI workflows, a thorough understanding of
their limitations and potential risks is crucial for their safe and effective
implementation.
Hybrid Approaches and Alternative Methods for Reliable CTI Analysis
Given the current limitations of Large Language Models when
applied to Cyber Threat Intelligence, alternative and hybrid approaches that
combine the strengths of LLMs with human expertise and other AI/ML techniques
are being actively explored. One prominent strategy is the
"human-in-the-loop" approach, where LLMs are used to assist with
initial analysis and information extraction, but human analysts retain the
crucial role of reviewing, validating, and interpreting the findings.32 This collaborative
model allows for leveraging the efficiency of LLMs in processing large volumes
of data while ensuring the accuracy and context-awareness that human expertise
provides. LLMs can act as "copilots" for human analysts, augmenting
their capabilities and freeing them from more routine tasks to focus on complex
investigations and strategic decision-making.34
Another promising avenue is the integration of LLMs with
knowledge graphs.36 Knowledge graphs can provide a structured representation of
cyber threat intelligence, enhancing the context and accuracy of LLM analysis
by grounding it in a network of entities and relationships. Techniques like
Retrieval Augmented Generation (RAG) are also gaining traction.3 RAG involves providing
the LLM with relevant context retrieved from external knowledge bases, which
can improve the quality and reliability of the generated responses by reducing
hallucinations and increasing factual accuracy.
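A minimal retrieval-augmented flow might look like the following sketch, where search_kb stands in for whatever vector store or knowledge-graph lookup an organization uses and query_llm for the model call:

```python
def rag_answer(question: str, search_kb, query_llm, k: int = 3) -> str:
    """Retrieve the k most relevant CTI snippets, then ground the model's answer in them."""
    snippets = search_kb(question, top_k=k)   # e.g. vector search over indexed threat reports
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return query_llm(prompt)
```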
Beyond LLMs, traditional machine learning (ML) techniques and
Natural Language Processing (NLP) methods continue to play a vital role in CTI
analysis.42 ML algorithms are
effective for pattern recognition and anomaly detection in network traffic and
security events, while NLP techniques can be used for extracting key insights
from threat intelligence reports and other textual sources. Hybrid methods that
combine LLMs with these more established AI/ML techniques can leverage the
unique strengths of each approach for a more robust and comprehensive CTI
analysis pipeline. For example, LLMs could be used for initial processing and
summarization of unstructured data, followed by ML models for pattern analysis
and anomaly detection, with human analysts providing oversight and validation
throughout the process. These hybrid strategies and alternative AI methods
offer pathways to enhance the reliability of cyber threat intelligence analysis
by mitigating some of the inherent limitations of relying solely on LLMs.
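Putting these pieces together, a hybrid pipeline might let the LLM do the first pass but route low-confidence or inconsistent output to a human analyst rather than acting on it automatically. The sketch below is speculative, reuses the consistent_answer helper sketched earlier, and all names are assumptions rather than any published design:

```python
def triage_report(report: str, query_llm, analyst_queue: list) -> dict:
    """The LLM does a first pass; anything uncertain is escalated to a human analyst."""
    summary = query_llm(f"Summarize the key TTPs and indicators in:\n{report[:4000]}")
    verdict, agreement = consistent_answer(
        f"Does this report describe an active campaign? Answer Yes or No.\n{report[:4000]}",
        query_llm,
    )
    if verdict is None:   # the model was inconsistent across repeated runs
        analyst_queue.append({"report": report, "summary": summary})
        return {"status": "escalated to analyst", "agreement": agreement}
    return {"status": "auto-triaged", "verdict": verdict, "summary": summary, "agreement": agreement}
```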
Conclusion and Recommendations
The exploration of Large Language Models for Cyber Threat
Intelligence reveals a landscape characterized by both significant potential
and considerable challenges. While LLMs offer compelling advantages in terms of
processing large volumes of unstructured data and extracting information,
concerns regarding their accuracy, consistency, and tendency towards
overconfidence, especially when dealing with complex, real-world CTI reports,
cannot be overlooked. Recent research, including evaluations using CTI-specific
benchmarks like CTIBench, underscores the fact that current LLM technology is
not yet a panacea for autonomous CTI analysis.
Given these findings, a cautious and pragmatic approach is
recommended for organizations considering the integration of LLMs into their
CTI workflows. A phased adoption strategy that prioritizes hybrid methods,
combining the strengths of LLMs with human expertise, appears to be the most
viable path forward. LLMs can be effectively leveraged as powerful tools to
augment the capabilities of human analysts, particularly in tasks such as
initial triage of threat information, summarization of reports, and entity
extraction, provided that the outputs are carefully reviewed and validated by
cybersecurity professionals.
Organizations should also invest in training their security
teams on how to effectively utilize LLMs and critically evaluate their outputs.
Understanding the limitations and potential biases of these models is crucial
for preventing over-reliance and ensuring the accuracy of the intelligence
derived. Furthermore, the ongoing development and adoption of CTI-specific
benchmarks will be essential for objectively assessing the performance of new
LLM models and techniques in this domain. Continued research into enhancing LLM
reliability for CTI, including advancements in fine-tuning methodologies,
context management, and the ability to quantify uncertainty in their responses,
is also vital.
In conclusion, while Large Language Models are not yet a fully
reliable solution for autonomous cyber threat intelligence analysis, they hold
significant promise as tools to enhance the efficiency and scope of human
analysts. By adopting a balanced and informed approach that emphasizes human
oversight and continuous evaluation, organizations can responsibly integrate
LLMs into their security practices, ultimately contributing to a more proactive
and resilient cybersecurity posture.
Works cited
1. Large Language Models are unreliable for Cyber Threat Intelligence - arXiv, accessed April 27, 2025, https://arxiv.org/html/2503.23175v1
2. Large Language Models are Unreliable for Cyber Threat Intelligence, accessed April 27, 2025, https://www.arxiv.org/abs/2503.23175
3. proceedings.neurips.cc, accessed April 27, 2025, https://proceedings.neurips.cc/paper_files/paper/2024/file/5acd3c628aa1819fbf07c39ef73e7285-Paper-Datasets_and_Benchmarks_Track.pdf
4. CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat ..., accessed April 27, 2025, https://openreview.net/forum?id=iJAOpsXo2I
5. Top Eight Large Language Models Benchmarks for Cybersecurity ..., accessed April 27, 2025, https://www.infosecurityeurope.com/en-gb/blog/future-thinking/top-8-llm-benchmarks-for-cybersecurity-practices.html
6. Academics Develop Testing Benchmark for LLMs in CTI ..., accessed April 27, 2025, https://www.infosecurity-magazine.com/news/testing-benchmark-llm-cyber-threat/
7. Generative AI and LLMs for Critical Infrastructure Protection ... - MDPI, accessed April 27, 2025, https://www.mdpi.com/1424-8220/25/6/1666
8. The Crucial Role of Context Length in Large Language Models for ..., accessed April 27, 2025, https://groq.com/the-crucial-role-of-context-length-in-large-language-models-for-business-applications/
9. The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums - Flare, accessed April 27, 2025, https://flare.io/wp-content/uploads/WhitePaper_LLM_UdeM_Flare.pdf
10. Brief #101: OAuth Exploits Target Microsoft 365, Verizon DBIR Third ..., accessed April 27, 2025, https://mandos.io/newsletter/brief-101-oauth-exploits-target-microsoft-365-verizon-dbir-third-party-risk-llms-fail-at-cti/
11. Testing your LLMs differently: Security updates from our latest Cyber Snapshot Report, accessed April 27, 2025, https://cloud.google.com/blog/products/identity-security/testing-your-llms-differently-security-updates-from-our-latest-cyber-snapshot-report
12. Why do LLMs give different responses to the same prompt? : r/artificial - Reddit, accessed April 27, 2025, https://www.reddit.com/r/artificial/comments/1bh38a0/why_do_llms_give_different_responses_to_the_same/
13. [D] Why do LLM's produce different answers with same input? : r/MachineLearning - Reddit, accessed April 27, 2025, https://www.reddit.com/r/MachineLearning/comments/1j3erqf/d_why_do_llms_produce_different_answers_with_same/
14. Why does the answer vary for the same question asked multiple times - Community, accessed April 27, 2025, https://community.openai.com/t/why-does-the-answer-vary-for-the-same-question-asked-multiple-times/770718
15. Why do Different LLMs Give Different Answers to the Same Question ..., accessed April 27, 2025, https://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1123&context=covacci-undergraduateresearch
16. Large Language Models are Unreliable for Cyber Threat ..., accessed April 27, 2025, https://www.researchgate.net/publication/390354860_Large_Language_Models_are_Unreliable_for_Cyber_Threat_Intelligence
17. UC Irvine study finds mismatch between human perception and ..., accessed April 27, 2025, https://news.uci.edu/2025/01/22/uc-irvine-study-finds-mismatch-between-human-perception-and-reliability-of-ai-assisted-language-tools/
18. Method prevents an AI model from being overconfident about wrong answers | MIT News, accessed April 27, 2025, https://news.mit.edu/2024/thermometer-prevents-ai-model-overconfidence-about-wrong-answers-0731
19. What is few shot prompting? - IBM, accessed April 27, 2025, https://www.ibm.com/think/topics/few-shot-prompting
20. Few-shot Prompting: The Essential Guide | Nightfall AI Security 101, accessed April 27, 2025, https://www.nightfall.ai/ai-security-101/few-shot-prompting
21. FaroukDaboussi0/Fine-Tuning-LLMs-for-Cyber-Threat ... - GitHub, accessed April 27, 2025, https://github.com/FaroukDaboussi0/Fine-Tuning-LLMs-for-Cyber-Threat-Intelligence
22. My experience on starting with fine tuning LLMs with custom data : r/LocalLLaMA - Reddit, accessed April 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/14vnfh2/my_experience_on_starting_with_fine_tuning_llms/
23. (PDF) Few-Shot Learning of TTPs Classification Using Large ..., accessed April 27, 2025, https://www.researchgate.net/publication/377225255_Few-Shot_Learning_of_TTPs_Classification_Using_Large_Language_Models
24. Decoding the Threat Matrix: How LLMs Amplify Cyber Threat Intelligence - CyberDB, accessed April 27, 2025, https://www.cyberdb.co/decoding-the-threat-matrix-how-llms-amplify-cyber-threat-intelligence/
25. How Large Language Models Are Changing Threat Intelligence ..., accessed April 27, 2025, https://www.rsaconference.com/library/blog/how-large-language-models-are-changing-threat-intelligence-report-analysis
26. Large Language Models for Cybersecurity: The Role of LLMs in Threat Hunting - Bolster AI, accessed April 27, 2025, https://bolster.ai/blog/large-language-models-cybersecurity
27. OWASP Top 10 for LLMs in 2025: Risks & Mitigations Strategies - Strobes Security, accessed April 27, 2025, https://strobes.co/blog/owasp-top-10-risk-mitigations-for-llms-and-gen-ai-apps-2025/
28. The Benefits and Risks of Using Large Language Models (LLM) in AI for Privacy Compliance | TrustArc, accessed April 27, 2025, https://trustarc.com/resource/benefits-risks-large-language-models-llm-ai-privacy-compliance/
29. LLM Security: Top 10 Risks and 7 Security Best Practices - Exabeam, accessed April 27, 2025, https://www.exabeam.com/explainers/ai-cyber-security/llm-security-top-10-risks-and-7-security-best-practices/
30. Large Language Models and Intelligence Analysis | Centre for ..., accessed April 27, 2025, https://cetas.turing.ac.uk/publications/large-language-models-and-intelligence-analysis
31. Staying ahead of threat actors in the age of AI | Microsoft Security Blog, accessed April 27, 2025, https://www.microsoft.com/en-us/security/blog/2024/02/14/staying-ahead-of-threat-actors-in-the-age-of-ai/
32. What Does an LLM-Powered Threat Intelligence Program Look Like? - Black Hat, accessed April 27, 2025, https://i.blackhat.com/BH-US-23/Presentations/US-23-Grof-Miller-LLM-Powered-TI-Program.pdf
33. Leveraging LLMs for Non-Security Experts in Threat Hunting ... - MDPI, accessed April 27, 2025, https://www.mdpi.com/2504-4990/7/2/31
34. Gen AI in Security – Improving SOC, CTI, and Red Team Tasks, accessed April 27, 2025, https://www.tidalcyber.com/blog/gen-ai-in-security-improving-soc-cti-and-red-team-tasks
35. Matching AI Strengths to Blue Team Needs | Splunk, accessed April 27, 2025, https://www.splunk.com/en_us/blog/security/leveraging-ai-llms-for-cybersecurity-blue-team.html
36. tmylla/Awesome-LLM4Cybersecurity: An overview of LLMs for cybersecurity. - GitHub, accessed April 27, 2025, https://github.com/tmylla/Awesome-LLM4Cybersecurity
37. (PDF) Design of an Autonomous Cyber Defence Agent using Hybrid ..., accessed April 27, 2025, https://www.researchgate.net/publication/381196238_Design_of_an_Autonomous_Cyber_Defence_Agent_using_Hybrid_AI_models
38. CTIKG: LLM-Powered Knowledge Graph Construction from Cyber ..., accessed April 27, 2025, https://openreview.net/forum?id=DOMP5AgwQz
39. Ideas for Combining AI and Cyber Threat Intelligence? : r/cybersecurity - Reddit, accessed April 27, 2025, https://www.reddit.com/r/cybersecurity/comments/1gr5q0v/ideas_for_combining_ai_and_cyber_threat/
40. www.first.org, accessed April 27, 2025, https://www.first.org/resources/papers/conf2024/1115-Neurocti-Kaplan-Dulaunoy-Brandl.pdf
41. Hybrid Security with AI: Key Concepts and Benefits - Adevait, accessed April 27, 2025, https://adevait.com/artificial-intelligence/hybrid-security-ai
42. www.first.org, accessed April 27, 2025, https://www.first.org/resources/papers/firstcti24/Sergeev-Processing-Threat-Reports-at-Scale-Using-AI-and-ML.pdf
43. Tactical intelligence: leveraging AI to identify cyber threats - Telefónica Tech, accessed April 27, 2025, https://telefonicatech.com/en/blog/tactical-intelligence-leveraging-ai-to-identify-cyber-threats
44. AI for Predictive Cyber Threat Intelligence - International Journal of Sustainable Development in Computing Science, accessed April 27, 2025, https://ijsdcs.com/index.php/IJMESD/article/download/590/228
45. 4 use cases for AI in cyber security - Red Hat, accessed April 27, 2025, https://www.redhat.com/en/blog/4-use-cases-ai-cyber-security
- Summarization of hundreds of comments on Reddit. Ineffective Service: The user explicitly states, "Confirmed that it doesn't w...