Pentesting web vs. pentesting LLM/AI: How do they differ and why does it matter?

Yesenia Trejo

2 minutes

min read

June 6, 2025

The rise of artificial intelligence has brought impressive advances, but it has also opened new doors to potential attack vectors. One of the most notable shifts we’re seeing is the emergence of a new testing surface: large language models (LLMs). Unlike traditional pentesting conducted on websites and applications, LLM/AI security assessments present entirely different challenges that security teams need to address urgently.

How do these two disciplines differ? In this blog, we’ll break down the main differences between them—from attack surfaces and methodologies to exploitation techniques and potential impacts. This analysis was developed by our Lead Striker, Yesenia Trejo, an expert in offensive security and AI.

What’s being tested?

The first significant difference between web pentesting and LLM pentesting lies in the attack surface.

For a traditional website or application, the focus is on technical components like:

API
Databases
Authentication mechanisms
Input forms
User sessions

The goal is straightforward: identify common vulnerabilities like SQL injections, XSS, or CSRF that can compromise the system.

In contrast, LLM security assessments target a completely different surface. The focus shifts to:

The model’s API
Prompts or instructions it receives
Training data used
Content generated in its responses
System commands that might be hidden

Here, flaws aren’t necessarily in the code itself, but in the model’s behavior, its exposure to sensitive data, or how it reacts to manipulated inputs.

Attacker goals

While both types of pentesting aim to identify vulnerabilities, the what and how are significantly different.

For web pentesting, typical goals include:

Finding exploitable technical flaws
Escalating privileges
Gaining unauthorized access to internal systems

For LLM/AI pentesting, the attacker’s goals might be to:

Inject malicious instructions through prompts
Extract confidential information the model may have learned
Manipulate biases or influence the model’s output
Bypass security controls set by the provider

Exploitation techniques

The methods of attack also adapt to the target.

For websites:

Fuzzing
Exploiting known CVEs
Session hijacking
Privilege escalation

For LLMs/AI:

Adversarial prompts (designed to evade restrictions)
Jailbreak attacks (unlocking hidden functionalities)
Indirect prompt injections
Manipulation of the model’s fine-tuning or training data
Model control protocol (MCP) attacks, among others

These techniques don’t aim to “break” the model itself but to make it behave in ways it shouldn’t: leaking data, generating disinformation, or executing unauthorized actions.

What if the attack succeeds?

The impacts also reflect the different nature of these systems.

For websites:

Unauthorized access
Data leaks
Defacements
Full server compromise

For LLMs/AI:

Leakage of sensitive information (such as data learned from previous sessions)
Generation of false or biased content
Compliance violations (exposing protected data)
Ethically risky behavior from the model
Potential backdoors in the supply chain via compromised training data

The last point is especially concerning: a poorly trained or compromised model can become a security threat for the entire organization.

How is each tested?

Web pentesting is supported by well-established standards, including:

OWASP Testing Guide
OWASP Top 10
PTES
NIST

For LLMs/AI, there is no globally recognized methodology yet. However, emerging frameworks like the OWASP Top 10 for LLMs are beginning to identify specific threats such as prompt injection and data exposure through poorly filtered outputs.

At Strike, we combine these emerging frameworks with proprietary techniques developed by our research team. We test leading AI models like ChatGPT, DeepSeek, and Ngrok, and actively participate in bug bounty programs to responsibly disclose vulnerabilities we uncover in these systems.

A rapidly evolving field

Unlike web pentesting—where many threats are already well documented—the security of language models is a newer, fast-moving field. It requires not only technical expertise but also a thorough understanding of how these models behave, their limitations, and the risks associated with their training and use.

This is why, at Strike, we apply internally developed evasion and jailbreak techniques that we keep confidential to ensure their effectiveness. Our mission is to stay at the forefront of research in this area and provide our clients with genuine protection in a threat environment that’s constantly changing.