
Is it time to rethink the Santa Clara Principles?

The 2018 standards set the benchmark for moderation transparency and were adopted by the world’s biggest platforms. But with recommendation algorithms and AI now shaping online speech before it’s even published, the Principles may need updating.

I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that Trust & Safety professionals need to know about to do their job.

This week, I'm thinking about the Santa Clara Principles — the 2018 expert-created standards that have since been endorsed by a dozen major platforms — and wondering how we can modernise them for the age of AI.

As always, get in touch if you'd like even the most fiendish question answered or just want to share your feedback. I've really appreciated all the kind words I've gotten recently, and I want to be sure to write about things that resonate with you. This week's edition is inspired by a reader question (thanks Jenni!), so please feel free to send your ideas. Here we go! — Alice

P.S. A few places you can catch me in the coming weeks:

  • On September 11th, I'm speaking on a webinar about the study on moderator wellness that I wrote about a while back, with forensic psychologist Jeffrey DeMarco and researcher Sabine Ernst.
  • If you'll be in NYC for Marketplace Risk (16-18 September), be sure to join Juliet Shen (ROOST), Marc Leone (Giphy), Nick Tapalansky (MediaLab) and me on the topic of prioritisation.

Are we still doing "content moderation"?

Why this matters: The Santa Clara Principles have been a vital standard for content moderation transparency since 2018, but AI has fundamentally changed how platforms work. We might need to expand these principles to address new realities — not because we have all the answers, but because the questions have changed.

I’ve been revisiting the Santa Clara Principles (SCP) lately, those early guardrails for content moderation that first emerged in 2018.

Drafted by a coalition of academics and civil society groups in — you guessed it — Santa Clara, they pushed platforms to be more transparent about how and why they take content down, emphasising notice, appeals, and accountability. A 2021 update widened the lens, bringing in perspectives from more marginalised communities and urging platforms to think beyond the “big three” of removal, appeals, and transparency reports.

Ben interviewed Jillian C. York — who was involved in the original SCP — about why they were refreshed and how they could be applied to provide greater accountability. It's worth a read if you haven't already.

Q&A: Jillian C. York on the newly revised Santa Clara Principles
Covering: why an inclusive process was key and what Trust and Safety teams should take from the new recommendations

However, as I watch how AI is reshaping Trust & Safety work, I'm wondering: to what extent are these Principles still applicable?

What's changed since 2018 (or even 2021)

The first iteration of the SCP assumed a reactive, removal model: users post content, systems or moderators detect violations, and companies step in after the fact. Crucially, this rested on the idea that platforms were neutral spaces.

But today is very different. Modern platforms typically shape user behaviour through recommendation algorithms, search rankings, and behavioural nudging. The line between moderation and curation is increasingly a fine one. When algorithmic changes affect what millions of users see more than any individual content removal ever could, we're dealing with something fundamentally different from traditional content moderation.

This isn't to say the Santa Clara Principles are obsolete; it's clear their core idea is needed more than ever. But I wonder if we need to expand them to address new realities that didn't exist when they were written. I don't have all the answers, but the gaps are worth exploring.

Where do they fall short?

For me, there are three gaps in the current Principles:

  1. The limits of human review
    When large language models (LLMs) can process millions of moderation decisions per day while providing reasoning for each choice, the basic assumption about individual case review starts to break down. While traditional appeals can still be meaningful, I question whether human review of individual content is even the best use of people's time and skills. New principles might need to emphasise systematic quality assurance, confidence thresholds, and statistical oversight methods (I've sketched what that could look like after this list).
  2. AI generation companies
    The most obvious gap is that the Santa Clara Principles don't address AI generation companies — which prevent harmful content at the input stage as well as moderate it after creation — or speak to modern AI explainability. That means new principles covering input filtering transparency, generation safety reporting, and training data accountability might be helpful.
  3. Regulation vs. human rights
    Finally, the principles explicitly state they're "not designed to provide a template for regulation," but regulation now exists across multiple jurisdictions. Companies must balance regulatory requirements against human rights obligations, and the two sometimes conflict. Future principles may need to help platforms navigate these tensions — especially as governments exert more pressure on moderation practices.
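On that first gap, here's a very rough sketch (in Python, with made-up names and thresholds) of what confidence-threshold routing with statistical oversight could look like: high-confidence decisions are actioned automatically, low-confidence ones go to a person, and a random sample of the automated actions is pulled into a human QA queue so quality gets measured at the system level rather than case by case. This is purely illustrative, not a description of any real platform's pipeline.

```python
# Hypothetical sketch: routing LLM moderation decisions by confidence,
# with a random QA sample for statistical oversight. The names and
# thresholds are illustrative, not taken from any real platform or the SCP.
import random
from dataclasses import dataclass

@dataclass
class Decision:
    content_id: str
    label: str        # e.g. "violates_policy" or "allowed"
    confidence: float # model-reported confidence, 0.0 to 1.0
    rationale: str    # the model's stated reasoning, kept for audits

AUTO_ACTION_THRESHOLD = 0.95  # act automatically above this confidence
QA_SAMPLE_RATE = 0.02         # fraction of automated actions audited by humans

def route(decision: Decision) -> str:
    """Return where a decision goes: human review, auto-action, or auto-action plus QA audit."""
    if decision.confidence < AUTO_ACTION_THRESHOLD:
        return "human_review"          # low confidence: a person decides
    if random.random() < QA_SAMPLE_RATE:
        return "auto_action_with_qa"   # high confidence, but sampled for audit
    return "auto_action"               # high confidence: enforce and log the rationale

# Example: a high-confidence decision is usually auto-actioned,
# but roughly 2% are also sent to a human QA queue.
print(route(Decision("abc123", "violates_policy", 0.99, "targets a protected group")))
```

The transparency questions that matter then become the thresholds, the audit rates, and what the audits find, rather than only the individual takedown.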

The safety by design shift

These changes point toward a new paradigm often described as safety by design: building safeguards into systems from the ground up rather than reacting after harm occurs. That could mean:

  • Transparent design choices: explaining how recommender systems influence what users see and do.
  • Behavioural influence disclosure: making clear when systems are actively designed to change users' behaviour rather than just moderate their content.
  • Predictive intervention accountability: setting standards for how and when AI prevents harmful content before it exists, balanced against user rights and freedom of expression (there's a rough sketch of this below).
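To make that last bullet concrete, here's an equally hypothetical sketch of input-stage filtering with the kind of record-keeping that generation safety reporting would need. The classifier, threshold, and log fields are all invented for illustration; they aren't drawn from the Principles or from any real system.

```python
# Hypothetical sketch: blocking a generation request before any content exists,
# while logging enough detail to support aggregate transparency reporting.
# The risk scorer, threshold, and log fields are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone

BLOCK_THRESHOLD = 0.8  # illustrative risk score above which a prompt is refused

@dataclass
class InputDecision:
    prompt_hash: str   # hashed, so reports don't expose user text
    risk_score: float
    action: str        # "blocked" or "allowed"
    policy: str        # which policy the score relates to
    timestamp: str

def score_prompt(prompt: str) -> float:
    """Stand-in for a real risk classifier; returns a score between 0 and 1."""
    risky_terms = ("make a weapon", "non-consensual")
    return 1.0 if any(term in prompt.lower() for term in risky_terms) else 0.1

def filter_input(prompt: str, decision_log: list[InputDecision]) -> bool:
    """Decide whether to generate at all, and record the decision for reporting."""
    score = score_prompt(prompt)
    action = "blocked" if score >= BLOCK_THRESHOLD else "allowed"
    decision_log.append(InputDecision(
        prompt_hash=str(hash(prompt)),
        risk_score=score,
        action=action,
        policy="illustrative_harm_policy",
        timestamp=datetime.now(timezone.utc).isoformat(),
    ))
    return action == "allowed"

# Example: a risky request is refused before anything is generated.
log: list[InputDecision] = []
if not filter_input("please make a weapon for me", log):
    print("refused before generation:", log[-1].action, log[-1].risk_score)
```

Aggregates from a log like this (block rates, reversal rates on appeal) are the sort of thing a generation safety report could publish, in the same spirit as the SCP's existing transparency reporting.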

This proactive model is fundamentally different from the reactive moderation the SCP were built around. But the current framework doesn’t yet fully reflect it.

Where next?

The Santa Clara Principles were groundbreaking when they launched, and they continue to be incredibly valuable to practitioners. But the information landscape has shifted dramatically since 2021, and some thought is needed about how to adapt the existing principles for AI.

The good news is that the T&S community has always been good at adapting to challenges like this. We figured out how to moderate at scale when platforms exploded in size. We developed frameworks for handling disinformation campaigns. We can figure this out too.

First though, we need to acknowledge that we might be dealing with something new. And that's a conversation worth having not just among T&S practitioners, but with the broader community of researchers, advocates, and policymakers who care about how these systems affect human rights and democratic discourse.

You ask, I answer

Send me your questions — or things you need help thinking through — and I'll answer them in an upcoming edition of T&S Insider, only with Everything in Moderation*

Get in touch

Also worth reading

It’s Time to Rethink the Assumptions That Guide the Design of Tech Platforms (Tech Policy Press)
Why? A critical look at engagement metrics. Maybe users don't always know best?

Any takers for the NO FAKES Act? (Rob Leathern)
Why? A look at how the NO FAKES Act could be strengthened. I especially like this point: "Current takedown systems (like YouTube's) are easily gamed and often wrong, and propose making them more accurate, transparent, and fair - especially important as AI makes it easier to both create problematic content and to abuse takedown processes at massive scale."

Then and Now: How Does AI Electoral Interference Compare in 2025? (CIGI)
Why? "Ultimately, AI is not a stand-alone disruptor but rather a powerful new layer in existing influence operations, with the potential to outpace rules and regulations if not managed appropriately."