Visualização de leitura
Which Came First: The System Prompt, or the RCE?
During a recent penetration test, we came across an AI-powered desktop application that acted as a bridge between Claude (Opus 4.5) and a third-party asset management platform. The idea is simple: instead of clicking through dashboards and making API calls, users just ask the agent to do it for them. “How many open tickets do […]
The post Which Came First: The System Prompt, or the RCE? appeared first on Praetorian.
The post Which Came First: The System Prompt, or the RCE? appeared first on Security Boulevard.
Insights into Claude Code Security: A New Pattern of Intelligent Attack and Defense
On February 20, 2026, AI company Anthropic released a new code security tool called Claude Code Security. This release coincided with the highly sensitive period of global capital markets to AI technology subverting the traditional software industry, which quickly triggered violent fluctuations in the capital market and caused the fall of stock prices of major […]
The post Insights into Claude Code Security: A New Pattern of Intelligent Attack and Defense appeared first on NSFOCUS, Inc., a global network and cyber security leader, protects enterprises and carriers from advanced cyber attacks..
The post Insights into Claude Code Security: A New Pattern of Intelligent Attack and Defense appeared first on Security Boulevard.
How red teaming helps safeguard the infrastructure behind AI models
Artificial intelligence (AI) is now squarely on the frontlines of information security. However, as is often the case when the pace of technological innovation is very rapid, security often ends up being a secondary consideration. This is increasingly evident from the ad-hoc nature of many implementations, where organizations lack a clear strategy for responsible AI use.
Attack surfaces aren’t just expanding due to risks and vulnerabilities in AI models themselves but also in the underlying infrastructure that supports them. Many foundation models, as well as the data sets used to train them, are open-source and readily available to developers and adversaries alike.
Unique risks to AI models
According to Ruben Boonen, CNE Capability Development Lead at IBM: “One problem is that you have these models hosted on giant open-source data stores. You don’t know who created them or how they were modified, and there are a number of issues that can occur here. For example, let’s say you use PyTorch to load a model hosted on one of these data stores, but it has been changed in a way that’s undesirable. It can be very hard to tell because the model might behave normally in 99% of cases.”
Recently, researchers discovered thousands of malicious files hosted on Hugging Face, one of the largest repositories for open-source generative AI models and training data sets. These included around a hundred malicious models capable of injecting malicious code onto users’ machines. In one case, hackers set up a fake profile masquerading as genetic testing startup 23AndMe to deceive users into downloading a compromised model capable of stealing AWS passwords. It was downloaded thousands of times before finally being reported and removed.
In another recent case, red team researchers discovered vulnerabilities in ChatGPT’s API, in which a single HTTP request elicited two responses indicating an unusual code path that could theoretically be exploited if not addressed. This, in turn, could lead to data leakage, denial of service attacks and even escalation of privileges. The team also discovered vulnerabilities in plugins for ChatGPT, potentially resulting in account takeover.
While open-source licensing and cloud computing are key drivers of innovation in the AI space, they’re also a source of risk. On top of these AI-specific risk areas, general infrastructure security concerns also apply, such as vulnerabilities in cloud configurations or poor monitoring and logging processes.
AI models are the new frontier of intellectual property theft
Imagine pouring huge amounts of financial and human resources into building a proprietary AI model, only to have it stolen or reverse-engineered. Unfortunately, model theft is a growing problem, not least because AI models often contain sensitive information and can potentially reveal an organization’s secrets should they end up in the wrong hands.
One of the most common mechanisms for model theft is model extraction, whereby attackers access and exploit models through API vulnerabilities. This can potentially grant them access to black-box models — like ChatGPT — at which point they can strategically query the model to collect enough data to reverse engineer it.
In most cases, AI systems run on cloud architecture rather than local machines. After all, the cloud provides the scalable data storage and processing power required to run AI models easily and accessibly. However, that accessibility also increases the attack surface, allowing adversaries to exploit vulnerabilities like misconfigurations in access permissions.
“When companies provide these models, there are usually client-facing applications delivering services to end users, such as an AI chatbot. If there’s an API that tells it which model to use, attackers could attempt to exploit it to access an unreleased model,” says Boonen.
Red teams keep AI models secure
Protecting against model theft and reverse engineering requires a multifaceted approach that combines conventional security measures like secure containerization practices and access controls, as well as offensive security measures.
The latter is where red teaming comes in. Red teams can proactively address several aspects of AI model theft, such as:
- API attacks: By systematically querying black-box models in the same way adversaries would, red teams can identify vulnerabilities like suboptimal rate limiting or insufficient response filtering.
- Side-channel attacks: Red teams can also carry out side-channel analyses, in which they monitor metrics like CPU and memory usage in an attempt to glean information about the model size, architecture or parameters.
- Container and orchestration attacks: By assessing containerized AI dependencies like frameworks, libraries, models and applications, red teams can identify orchestration vulnerabilities, such as misconfigured permissions and unauthorized container access.
- Supply chain attacks: Red teams can probe entire AI supply chains spanning multiple dependencies hosted in different environments to ensure that only trusted components like plugins and third-party integrations are being used.
A thorough red teaming strategy can simulate the full scope of real-world attacks against AI infrastructure to reveal gaps in security and incident response plans that could lead to model theft.
Mitigating the problem of excessive agency in AI systems
Most AI systems have a degree of autonomy with regard to how they interface with different systems and respond to prompts. After all, that’s what makes them useful. However, if systems have too much autonomy, functionality or permissions — a concept OWASP calls “excessive agency” — they can end up triggering harmful or unpredictable outputs and processes or leaving gaps in security.
Boonen warns that components, such as optical character recognition (OCR) for PDF files and images which multimodal systems rely on to process inputs, “can introduce vulnerabilities if they’re not properly secured”.
Granting an AI system excessive agency also expands the attack surface unnecessarily, thus giving adversaries more potential entry points. Typically, AI systems designed for enterprise use are integrated into much broader environments spanning multiple infrastructures, plugins, data sources and APIs. Excessive agency is what happens when these integrations result in an unacceptable trade-off between security and functionality.
Let’s consider an example where an AI-powered personal assistant has direct access to an individual’s Microsoft Teams meeting recordings stored in OneDrive for Business, the purpose being to summarize content in those meetings in a readily accessible written format. However, let’s imagine that the plugin doesn’t only have the ability to read meeting recordings but also everything else stored in the user’s OneDrive account, in which many confidential information assets are also stored. Perhaps the plugin even has write capabilities, in which case a security flaw could potentially grant attackers an easy pathway for uploading malicious content.
Once again, red teaming can help identify flaws in AI integrations, especially in environments where many different plugins and APIs are in use. Their simulated attacks and comprehensive analyses will be able to identify vulnerabilities and inconsistencies in access permissions, as well as cases where access rights are unnecessarily lax. Even if they don’t identify any security vulnerabilities, they will still be able to provide insight into how to reduce the attack surface.
The post How red teaming helps safeguard the infrastructure behind AI models appeared first on Security Intelligence.
Stress-testing multimodal AI applications is a new frontier for red teams
Human communication is multimodal. We receive information in many different ways, allowing our brains to see the world from various angles and turn these different “modes” of information into a consolidated picture of reality.
We’ve now reached the point where artificial intelligence (AI) can do the same, at least to a degree. Much like our brains, multimodal AI applications process different types — or modalities — of data. For example, OpenAI’s ChatGPT 4.0 can reason across text, vision and audio, granting it greater contextual awareness and more humanlike interaction.
However, while these applications are clearly valuable in a business environment that’s laser-focused on efficiency and adaptability, their inherent complexity also introduces some unique risks.
According to Ruben Boonen, CNE Capability Development Lead at IBM: “Attacks against multimodal AI systems are mostly about getting them to create malicious outcomes in end-user applications or bypass content moderation systems. Now imagine these systems in a high-risk environment, such as a computer vision model in a self-driving car. If you could fool a car into thinking it shouldn’t stop even though it should, that could be catastrophic.”
Multimodal AI risks: An example in finance
Here’s another possible real-world scenario:
An investment banking firm uses a multimodal AI application to inform its trading decisions, processing both textual and visual data. The system uses a sentiment analysis tool to analyze text data, such as earnings reports, analyst insights and news feeds, to determine how market participants feel about specific financial assets. Then, it conducts a technical analysis of visual data, such as stock charts and trend analysis graphs, to offer insights into stock performance.
An adversary, a fraudulent hedge fund manager, then targets vulnerabilities in the system to manipulate trading decisions. In this case, the attacker launches a data poisoning attack by flooding online news sources with fabricated stories about specific markets and financial assets. Next, they launch an adversarial attack by making pixel-level manipulations — known as perturbations — to stock performance charts that are imperceptible to the human eye but enough to exploit the AI’s visual analysis abilities.
The result? Due to the manipulated input data and false signals, the system recommends buying orders at artificially inflated stock prices. Unaware of the exploit, the company follows the AI’s recommendations, while the attacker, holding shares in the target assets, sells them for an ill-gotten profit.
Getting there before adversaries
Now, let’s imagine that the attack wasn’t really carried out by a fraudulent hedge fund manager but was instead a simulated attack by a red team specialist with the goal of discovering the vulnerability before a real-world adversary could.
By simulating these complex, multifaceted attacks in safe, sandboxed environments, red teams can reveal potential vulnerabilities that traditional security systems are almost certain to miss. This proactive approach is essential for fortifying multimodal AI applications before they end up in a production environment.
According to the IBM Institute of Business Value, 96% of executives agree that the adoption of generative AI will increase the chances of a security breach in their organizations within the next three years. The rapid proliferation of multimodal AI models will only be a force multiplier of that problem, hence the growing importance of AI-specialized red teaming. These specialists can proactively address the unique risk that comes with multimodal AI: cross-modal attacks.
Cross-modal attacks: Manipulating inputs to generate malicious outputs
A cross-modal attack involves inputting malicious data in one modality to produce malicious output in another. These can take the form of data poisoning attacks during the model training and development phase or adversarial attacks, which occur during the inference phase once the model has already been deployed.
“When you have multimodal systems, they’re obviously taking input, and there’s going to be some kind of parser that reads that input. For example, if you upload a PDF file or an image, there’s an image-parsing or OCR library that extracts data from it. However, those types of libraries have had issues,” says Boonen.
Cross-modal data poisoning attacks are arguably the most severe since a major vulnerability could necessitate the entire model being retrained on an updated data set. Generative AI uses encoders to transform input data into embeddings — numerical representations of the data that encode relationships and meanings. Multimodal systems use different encoders for each type of data, such as text, image, audio and video. On top of that, they use multimodal encoders to integrate and align data of different types.
In a cross-modal data poisoning attack, an adversary with access to training data and systems could manipulate input data to make encoders generate malicious embeddings. For example, they might deliberately add incorrect or misleading text captions to images so that the encoder misclassifies them, resulting in an undesirable output. In cases where the correct classification of data is crucial, as it is in AI systems used for medical diagnoses or autonomous vehicles, this can have dire consequences.
Red teaming is essential for simulating such scenarios before they can have real-world impact. “Let’s say you have an image classifier in a multimodal AI application,” says Boonen. “There are tools that you can use to generate images and have the classifier give you a score. Now, let’s imagine that a red team targets the scoring mechanism to gradually get it to classify an image incorrectly. For images, we don’t necessarily know how the classifier determines what each element of the image is, so you keep modifying it, such as by adding noise. Eventually, the classifier stops producing accurate results.”
Vulnerabilities in real-time machine learning models
Many multimodal models have real-time machine learning capabilities, learning continuously from new data, as is the case in the scenario we explored earlier. This is an example of a cross-modal adversarial attack. In these cases, an adversary could bombard an AI application that’s already in production with manipulated data to trick the system into misclassifying inputs. This can, of course, happen unintentionally, too, hence why it’s sometimes said that generative AI is getting “dumber.”
In any case, the result is that models that are trained and/or retrained by bad data inevitably end up degrading over time — a concept known as AI model drift. Multimodal AI systems only exacerbate this problem due to the added risk of inconsistencies between different data types. That’s why red teaming is essential for detecting vulnerabilities in the way different modalities interact with one another, both during the training and inference phases.
Red teams can also detect vulnerabilities in security protocols and how they’re applied across modalities. Different types of data require different security protocols, but they must be aligned to prevent gaps from forming. Consider, for example, an authentication system that lets users verify themselves either with voice or facial recognition. Let’s imagine that the voice verification element lacks sufficient anti-spoofing measures. Chances are, the attacker will target the less secure modality.
Multimodal AI systems used in surveillance and access control systems are also subject to data synchronization risks. Such a system might use video and audio data to detect suspicious activity in real-time by matching lip movements captured on video to a spoken passphrase or name. If an attacker were to tamper with the feeds, resulting in a slight delay between the two, they could mislead the system using pre-recorded video or audio to gain unauthorized access.
Getting started with multimodal AI red teaming
While it’s admittedly still early days for attacks targeting multimodal AI applications, it always pays to take a proactive stance.
As next-generation AI applications become deeply ingrained in routine business workflows and even security systems themselves, red teaming doesn’t just bring peace of mind — it can uncover vulnerabilities that will almost certainly go unnoticed by conventional, reactive security systems.
Multimodal AI applications present a new frontier for red teaming, and organizations need their expertise to ensure they learn about the vulnerabilities before their adversaries do.
The post Stress-testing multimodal AI applications is a new frontier for red teams appeared first on Security Intelligence.
Insights from CISA’s red team findings and the evolution of EDR
A recent CISA red team assessment of a United States critical infrastructure organization revealed systemic vulnerabilities in modern cybersecurity. Among the most pressing issues was a heavy reliance on endpoint detection and response (EDR) solutions, paired with a lack of network-level protections.
These findings underscore a familiar challenge: Why do organizations place so much trust in EDR alone, and what must change to address its shortcomings?
EDR’s double-edged sword
A cornerstone of cyber resilience strategy, EDR solutions are prized for their ability to monitor endpoints for malicious activity. But as the CISA report demonstrated, this reliance can become a liability when paired with inadequate network defenses. Here’s why:
- Tunnel vision on endpoints: EDR excels at identifying threats on individual devices but struggles with network-wide attacks. This leaves gaps when hackers exploit lateral movement or unusual data transfers — activities that often require network-level visibility to detect.
- Playing catch-up with threats: Traditional EDR tools depend on recognizing known indicators of compromise (IOCs). Advanced attackers can easily sidestep these tools by using novel techniques or blending in with legitimate activity.
- Blind spots in legacy systems: Legacy environments often go unnoticed by EDR, giving attackers free rein. In the CISA case, these systems allowed the red team to persist for months undetected.
- Overwhelmed defenders: Even when EDR generates alerts, security teams can become desensitized by a flood of notifications. As seen in the CISA assessment, critical warnings can slip through the cracks simply because defenders are too stretched to respond.
Common EDR pain points
The challenges highlighted in the CISA report mirror broader issues organizations face with EDR:
- Detection without context: EDR tools often spot anomalies on endpoints but fail to connect the dots across the broader network. This lack of context can leave organizations blind to coordinated attacks.
- Weak network integration: Without network-layer defenses, EDR struggles to identify malicious activities like unusual traffic patterns or data exfiltration, key tactics in advanced breaches.
- Fragmented systems: Many organizations operate a patchwork of security tools, leaving critical gaps in coverage and making it harder to correlate data across endpoints, networks and cloud environments.
The next evolution of EDR
Recognizing these shortcomings, cybersecurity is rapidly evolving beyond traditional EDR. Here’s how:
- Extended detection and response (XDR): XDR takes EDR to the next level by integrating endpoint, network and cloud data into a single platform. This broader scope allows organizations to see the full attack picture and respond more effectively.
- AI-driven insights: Cutting-edge EDR solutions now harness machine learning to detect subtle behavioral anomalies. By identifying deviations from normal activity, these tools catch threats even when no IOCs exist.
- Zero trust security: Zero trust architectures take endpoint defense a step further by ensuring no device or user is trusted by default. This integration of endpoint, identity and network security reduces dependence on EDR alone.
- Network visibility: Modern EDR tools are incorporating network traffic analysis to close the gaps identified in the CISA report. Monitoring traffic for anomalies, such as unusual data flows or external connections, bolsters defenses.
- Cloud-native solutions: As businesses embrace hybrid and cloud environments, EDR is evolving to provide seamless coverage across on-premises and cloud systems, addressing vulnerabilities in these critical areas.
Why do gaps persist?
Even with these advancements, many organizations struggle to fully address EDR’s limitations:
- Resource strains: Small security teams often lack the bandwidth or expertise to implement and manage advanced solutions like XDR.
- Budget constraints: Upgrading to integrated platforms or modernizing legacy systems can be costly.
- Legacy challenges: Outdated environments remain vulnerable, acting as weak points that attackers can exploit.
- Leadership missteps: As the CISA report pointed out, organizations sometimes deprioritize known vulnerabilities, leaving critical gaps unaddressed.
Building a more resilient future
The CISA red team findings are a wake-up call: Endpoint protection alone is no longer enough. To outsmart today’s sophisticated adversaries, organizations must adopt a layered defense strategy that integrates endpoint, network and cloud security. Solutions like XDR, zero trust principles and advanced behavioral analysis offer a path forward — but they require strategic investments and cultural shifts.
The post Insights from CISA’s red team findings and the evolution of EDR appeared first on Security Intelligence.
Testing the limits of generative AI: How red teaming exposes vulnerabilities in AI models
With generative artificial intelligence (gen AI) on the frontlines of information security, red teams play an essential role in identifying vulnerabilities that others can overlook.
With the average cost of a data breach reaching an all-time high of $4.88 million in 2024, businesses need to know exactly where their vulnerabilities lie. Given the remarkable pace at which they’re adopting gen AI, there’s a good chance that some of those vulnerabilities lie in AI models themselves — or the data used to train them.
That’s where AI-specific red teaming comes in. It’s a way to test the resilience of AI systems against dynamic threat scenarios. This involves simulating real-world attack scenarios to stress-test AI systems before and after they’re deployed in a production environment. Red teaming has become vitally important in ensuring that organizations can enjoy the benefits of gen AI without adding risk.
IBM’s X-Force Red Offensive Security service follows an iterative process with continuous testing to address vulnerabilities across four key areas:
- Model safety and security testing
- Gen AI application testing
- AI platform security testing
- MLSecOps pipeline security testing
In this article, we’ll focus on three types of adversarial attacks that target AI models and training data.
Prompt injection
Most mainstream gen AI models have safeguards built in to mitigate the risk of them producing harmful content. For example, under normal circumstances, you can’t ask ChatGPT or Copilot to write malicious code. However, methods such as prompt injection attacks and jailbreaking can make it possible to work around these safeguards.
One of the goals of AI red teaming is to deliberately make AI “misbehave” — just as attackers do. Jailbreaking is one such method that involves creative prompting to get a model to subvert its safety filters. However, while jailbreaking can theoretically help a user carry out an actual crime, most malicious actors use other attack vectors — simply because they’re far more effective.
Prompt injection attacks are much more severe. Rather than targeting the models themselves, they target the entire software supply chain by obfuscating malicious instructions in prompts that otherwise appear harmless. For instance, an attacker might use prompt injection to get an AI model to reveal sensitive information like an API key, potentially giving them back-door access to any other systems that are connected to it.
Red teams can also simulate evasion attacks, a type of adversarial attack whereby an attacker subtly modifies inputs to trick a model into classifying or misinterpreting an instruction. These modifications are usually imperceptible to humans. However, they can still manipulate an AI model into taking an undesired action. For example, this might include changing a single pixel in an input image to fool the classifier of a computer vision model, such as one intended for use in a self-driving vehicle.
Explore X-Force Red Offensive Security ServicesData poisoning
Attackers also target AI models during training and development, hence it’s essential that red teams simulate the same attacks to identify risks that could compromise the whole project. A data poisoning attack happens when an adversary introduces malicious data into the training set, thereby corrupting the learning process and embedding vulnerabilities into the model itself. The result is that the entire model becomes a potential entry point for further attacks. If training data is compromised, it’s usually necessary to retrain the model from scratch. That’s a highly resource-intensive and time-consuming operation.
Red team involvement is vital from the very beginning of the AI model development process to mitigate the risk of data poisoning. Red teams simulate real-world data poisoning attacks in a secure sandbox environment air-gapped from existing production systems. Doing so provides insights into how vulnerable the model is to data poisoning and how real threat actors might infiltrate or compromise the training process.
AI red teams can proactively identify weaknesses in data collection pipelines, too. Large language models (LLMs) often draw data from a huge number of different sources. ChatGPT, for example, was trained on a vast corpus of text data from millions of websites, books and other sources. When building a proprietary LLM, it’s crucial that organizations know exactly where they’re getting their training data from and how it’s vetted for quality. While that’s more of a job for security auditors and process reviewers, red teams can use penetration testing to assess a model’s ability to resist flaws in its data collection pipeline.
Model inversion
Proprietary AI models are usually trained, at least partially, on the organization’s own data. For instance, an LLM deployed in customer service might use the company’s customer data for training so that it can provide the most relevant outputs. Ideally, models should only be trained based on anonymized data that everyone is allowed to see. Even then, however, privacy breaches may still be a risk due to model inversion attacks and membership inference attacks.
Even after deployment, gen AI models can retain traces of the data that they were trained on. For instance, the team at Google’s DeepMind AI research laboratory successfully managed to trick ChatGPT into leaking training data using a simple prompt. Model inversion attacks can, therefore, allow malicious actors to reconstruct training data, potentially revealing confidential information in the process.
Membership inference attacks work in a similar way. In this case, an adversary tries to predict whether a particular data point was used to train the model through inference with the help of another model. This is a more sophisticated method in which an attacker first trains a separate model – known as a membership inference model — based on the output of the model they’re attacking.
For example, let’s say a model has been trained on customer purchase histories to provide personalized product recommendations. An attacker may then create a membership inference model and compare its outputs with those of the target model to infer potentially sensitive information that they might use in a targeted attack.
In either case, red teams can evaluate AI models for their ability to inadvertently leak sensitive information directly or indirectly through inference. This can help identify vulnerabilities in training data workflows themselves, such as data that hasn’t been sufficiently anonymized in accordance with the organization’s privacy policies.
Building trust in AI
Building trust in AI requires a proactive strategy, and AI red teaming plays a fundamental role. By using methods like adversarial training and simulated model inversion attacks, red teams can identify vulnerabilities that other security analysts are likely to miss.
These findings can then help AI developers prioritize and implement proactive safeguards to prevent real threat actors from exploiting the very same vulnerabilities. For businesses, the result is reduced security risk and increased trust in AI models, which are fast becoming deeply ingrained across many business-critical systems.
The post Testing the limits of generative AI: How red teaming exposes vulnerabilities in AI models appeared first on Security Intelligence.