Strumenti Red Teaming LLM e GenAI

Appendix B del progetto OWASP “Red Teaming LLM” presenta un elenco di strumenti e dataset, sviluppati e selezionati in base all’esperienza collettiva di operatori e autori coinvolti. Il catalogo comprende risorse progettate per il Red Teaming su GenAI e LLM. L’elenco non è esaustivo e viene aggiornato con nuove soluzioni selezionate. Le organizzazioni che desiderano includere nel catalogo strumenti specifici per il Red Teaming su GenAI devono contattare il team OWASP per proporne l’inserimento. L’uso dei tool provenienti da repository pubblici comporta rischi: è responsabilità degli utenti valutarne la sicurezza prima dell’adozione.

Strumenti per Red Teaming LLM e GenAI

ASCII Smuggler: tool per nascondere contenuti nei prompt.
https://embracethered.com/blog/ascii-smuggler.html (Open Source)
Adversarial Attacks and Defences in Machine Learning (AAD) Framework: framework Python per la difesa dei modelli ML da esempi avversari.
https://github.com/changx03/adversarial_attack_defence.git (Source disponibile)
Adversarial Robustness Toolbox (ART): libreria Python per la sicurezza ML.
https://github.com/TrustedAI/adversarial-robustnesstoolbox.git (MIT License)
Advertorch: toolbox Python per ricerche su robustness e avversarial attack in PyTorch.
https://github.com/BorealisAI/advertorch (GNU LGPL v3.0)
CleverHans: libreria Python per testare la vulnerabilità dei sistemi ML ad esempi avversari.
https://github.com/cleverhanslab/cleverhans.git (MIT License)
CyberSecEval: benchmark per quantificare rischi e capacità di sicurezza su LLM.
https://ai.meta.com/research/publications/cyberseceval-3-advancing-the-evaluation-of-cybersecurity-risks-and-capabilities-in-large-language-models/ (MIT License)
DeepEval: valutazione LLM, test unitari e metrica output multipla.
https://github.com/confident-ai/deepeval (Apache License 2.0)
Deep-pwning: framework leggero per valutare la robustness dei modelli ML contro avversari motivati.
https://github.com/cchio/deeppwning (MIT License)
Dioptra: piattaforma per testare l’affidabilità di sistemi AI.
https://pages.nist.gov/dioptra/index.html (CC BY 4.0)
Foolbox: tool per attacchi avversari e benchmarking robustness ML in PyTorch, TensorFlow e JAX.
https://github.com/bethgelab/foolbox (MIT License)
Garak: kit per red-teaming e assessment su GenAI.
https://garak.ai/ (Apache License 2.0)

https://github.com/NVIDIA/garak
Giskard: suite di test su ML e LLM.
https://www.giskard.ai/ (Apache License 2.0)
Generative Offensive Agent Tester (GOAT): sistema automatizzato che simula conversazioni avversarie per identificare vulnerabilità in LLM.
https://arxiv.org/abs/2410.01606
Gymnasium: libreria Python con API standard per test e sviluppo reinforcement learning.
https://github.com/FaramaFoundation/Gymnasium.git (MIT License)
Harmbench: framework open source scalabile per la valutazione di metodi automatizzati di Red Teaming e attacchi/difese su LLM.
https://github.com/centerforaisafety/HarmBench (MIT License)
HouYi: framework per attacchi tramite injection di prompt in applicativi LLM-integrated.
https://github.com/LLMSecurity/HouYi?tab=readme-ov-file (Apache License 2.0)
JailbreakingLLMs – PAIR: test di jailbreak per LLM con Prompt Automatic Iterative Refinement.
https://jailbreakingllms.github.io/ (MIT License)
Llamator: pentesting per applicativi RAG.
https://github.com/RomiconEZ/LLaMator (CC)
LLM Attacks: automatizzazione nella costruzione di attacchi avversari su LLM.
https://llm-attacks.org/ (MIT License)
LLM Canary: benchmarking e scoring su LLM.
https://github.com/LLMCanary/LLM-Canary (Apache License 2.0)
Modelscan: rilevamento di attacchi Model Serialization.
https://github.com/protectai/modelscan (Apache License 2.0)
MoonShot: tool modulare per valutare applicazioni LLM.
https://github.com/aiverifyfoundation/moonshot (Apache Software License 2)
Prompt Fuzzer: tool per test di sicurezza su prompt GenAI contro attacchi LLM dinamici.
https://github.com/promptsecurity/ps-fuzz (MIT License)
Promptfoo: Red Teaming, penetration testing e vulnerability scanning su LLM.
https://github.com/promptfoo/promptfoo (MIT License)
ps-fuzz: tool interattivo per sicurezza dei prompt GenAI.
https://github.com/promptsecurity/ps-fuzz (MIT License)
PromptInject: analisi quantitativa sulla robustezza LLM rispetto a prompt avversari.
https://github.com/agencyenterprise/PromptInject (MIT License)
Promptmap: prompt injection su istanze ChatGPT.
https://github.com/utkusen/promptmap (MIT License)
Python Risk Identification Toolkit (PyRIT): libreria Microsoft per valutare la robustezza di endpoint LLM in relazione a contenuti come hallucination, bias e proibiti.
https://github.com/Azure/PyRIT (MIT License)
SplxAI: Red Teaming automatizzato per Conversational AI.
https://splx.ai/
StrongREJECT: benchmark di jailbreak con metodologia di valutazione.
https://strongreject.readthedocs.io/en/latest/#license,
https://arxiv.org/abs/2402.10260 (MIT License)

Dataset per GenAI Red Teaming

AdvBench: attacchi avversari universali e trasferibili su modelli linguistici allineati.
https://api.semanticscholar.org/CorpusID:260202961 (Open Source)
BBQ Bias Benchmark for Question Answering: benchmark bias per task QA.
https://github.com/nyu-mll/BBQ (Open Source)
Bot Adversarial Dialogue Dataset: dataset di dialoghi avversari per bot.
https://github.com/facebookresearch/ParlAI/tree/main/parlai/tasks/bot_adversarial_dialogue (Open Source)
HarmBench: framework standard per Red Teaming automatizzato e robust refusal.
https://api.semanticscholar.org/CorpusID:267499790 (Open Source)
JailbreakBench: benchmark open per robustness dei LLM ai jailbreaking.
https://api.semanticscholar.org/CorpusID:268857237 (Open Source)
HAP: modelli efficienti per il rilevamento di odio, abuso e profanità.
https://arxiv.org/abs/2402.05624 (Open Source)

Risorse aggiuntive

Il progetto OWASP evidenzia inoltre la AI Security Solutions Landscape, una risorsa che raccoglie controlli di sicurezza, sia tradizionali che emergenti, per affrontare rischi LLM e Generative AI mappati nell’OWASP Top 10.

Conclusioni

L’appendice elenca strumenti e dataset per identificare e valutare criticità sui LLM e GenAI, inclusi rischi come prompt injection, bias, tossicità e data leakage. Le risorse segnalate supportano le attività di Red Teaming con un approccio sistematico e aggiornabile.

Strumenti e Dataset per Red Teaming LLM e GenAI

Strumenti per Red Teaming LLM e GenAI

Dataset per GenAI Red Teaming

Risorse aggiuntive

Conclusioni