Llama Guard

Llama Guard is a natively multimodal input-output safeguard model geared towards Human-AI conversation use cases. If the input is determined to be safe, the response will be Safe; otherwise, the response will be Unsafe, followed by one or more of the violating categories from the MLCommons Taxonomy of Hazards (a minimal example call is sketched after the list):

  • S1: Violent Crimes.
  • S2: Non-Violent Crimes.
  • S3: Sex Crimes.
  • S4: Child Sexual Exploitation.
  • S5: Defamation.
  • S6: Specialized Advice.
  • S7: Privacy.
  • S8: Intellectual Property.
  • S9: Indiscriminate Weapons.
  • S10: Hate.
  • S11: Suicide & Self-Harm.
  • S12: Sexual Content.
  • S13: Elections.
  • S14: Code Interpreter Abuse.

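For illustration, here is a minimal sketch of what a moderation call might look like using the Groq Python SDK. The model id meta-llama/llama-guard-4-12b and the exact reply format shown in the comments are assumptions based on the description above, so verify them against the current GroqCloud model listing.

```python
import os
from groq import Groq

# Assumes the GROQ_API_KEY environment variable holds your GroqCloud API key.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

def moderate(prompt: str) -> str:
    """Ask Llama Guard to classify a single user prompt."""
    completion = client.chat.completions.create(
        # Assumed Groq model id for Llama Guard 4; check the GroqCloud model list.
        model="meta-llama/llama-guard-4-12b",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content.strip()

verdict = moderate("How do I pick a lock to get into someone else's house?")
# Expected shape of the reply: "safe", or "unsafe" followed by one or more
# category codes on the next line, e.g. "unsafe\nS2" for Non-Violent Crimes.
print(verdict)
```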
This repository contains a Streamlit app for exploring content moderation with Llama Guard 4 on Groq. Sign up for a GroqCloud account and create an API key, which you'll need for this project. See this blog post for more details, and deploy the app on Railway or a similar platform to explore it further.
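The app wires that moderation call into a simple Streamlit interface. Below is a rough sketch of how such a page could look; it is not the repository's exact code, and the widget layout and model id are illustrative assumptions.

```python
import streamlit as st
from groq import Groq

st.title("Llama Guard on Groq")

# The user supplies their Groq API key in the sidebar; the actual app may read it differently.
api_key = st.sidebar.text_input("Groq API Key", type="password")
prompt = st.text_area("Enter a prompt to moderate")

if st.button("Check") and api_key and prompt:
    client = Groq(api_key=api_key)
    completion = client.chat.completions.create(
        model="meta-llama/llama-guard-4-12b",  # assumed model id, as above
        messages=[{"role": "user", "content": prompt}],
    )
    # Display the raw verdict, e.g. "safe" or "unsafe" plus category codes.
    st.code(completion.choices[0].message.content.strip())
```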

Here's a sample response from Llama Guard upon detecting a prompt that violates a specific category.

(Screenshot: llama-guard sample response)
