
Your questions, answered

Debunking myths, working with AI — and working against it!

I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that trust and safety professionals need to know about to do their job. 

This week, I put an AMA out on LinkedIn. Here’s a selection of what you asked, along with my answers. Remember, this is never just a one-time thing. You can email me at any time with any T&S questions you may have.

Here we go! — Alice


What are the myths in content moderation using AI, if any? Which are true, which are false, what could have been done better? | Anna Truong

The first myth is that using AI for T&S is new. Machine learning is a type of AI, and T&S teams have been using ML models to automate T&S queues for a long time now. But let’s assume you mean using LLMs for T&S. LLMs are a much newer technology and so come with their own unique myths.

One myth I see a lot is that LLMs can’t be good at moderation because they won’t always come up with the exact same answer every single time: they’re probabilistic, not deterministic. The thing is, humans are not super consistent in their decision-making either, and they’re often held up as the gold standard for moderation. There have been plenty of times that I’ve made a moderation call, then gone back and looked at it again, and realized I was wrong. While LLMs won’t always have the exact same answer for every single moderation decision, they do come up with the same answer most of the time. And when they don’t, they can give a reason, which often gives clues as to what the issue might be. From there, you can change the policy and steer the model toward the right answer far more consistently. A fixed ML model, by contrast, will keep making the same mistake until it gets retrained.

Another myth I hear a lot is that LLMs are biased, so they’re not appropriate for content moderation. If you ask an LLM a broad question, it will give you the most probable answer, which could well be biased, since the LLM is trained on biased human data. But as mentioned above, LLMs are not rigid tools; you can tweak and steer them to work for your specific policies, and avoid or disregard the learned bias. LLMs have also been trained on a lot of material around equity and bias mitigation, so when steered in that direction, they can do very well.

The final myth is that with LLM-optimized policies, you can kind of set-it-and-forget-it. This would only be true if the LLM landscape wasn’t constantly shifting: models change and drift, new models come out all the time (and others are no longer supported), and policy needs to change as you find more edge cases or world events happen. Keeping track of all of that can be complicated, especially when more than one person is doing prompt engineering and optimization.

Do you see variability in human moderation decisions as a significant challenge in Trust & Safety? Rather than making decisions itself, could AI play a role in identifying inconsistencies, surfacing similar precedent cases, or highlighting differences in reviewer reasoning to improve calibration and decision quality? | Anushka Gautam

Yes! As I described above, humans aren’t deterministic, and neither are LLMs. It’s really hard (and expensive) to get high accuracy from a moderation team at scale. I’ve heard of BPO contracts that are only guaranteeing 70% or 80% accuracy! That shouldn’t be acceptable. The thing that everyone is still figuring out in tech is where people’s judgement is really needed. I would never, ever, advocate for a fully automated moderation system that never has human checks and balances. And, to your point, having AI checks and balances against human decisions can also be hugely helpful. One metric you can use, whether you have a mostly human or mostly AI system, is “agreement rate” between all human and AI moderators: how often they reach the same decision on the same piece of content. You’ll really quickly see where the inconsistencies are and how policy is breaking down.
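
For anyone who wants to try this, here is one rough way an agreement rate could be computed. It's a minimal sketch, not a standard definition: I'm assuming agreement means the share of reviewer pairs (human or AI) that gave the same label to the same item, grouped by policy. The function name, the tuple layout, and the sample data are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def agreement_rates(decisions):
    """Return {policy: share of reviewer pairs that agreed on an item}.

    `decisions` is a list of (item_id, policy, reviewer, label) tuples.
    This layout is an assumption for illustration, not a standard schema.
    """
    labels_by_item = defaultdict(dict)
    policy_of = {}
    for item_id, policy, reviewer, label in decisions:
        labels_by_item[item_id][reviewer] = label
        policy_of[item_id] = policy

    agreed = defaultdict(int)
    compared = defaultdict(int)
    for item_id, labels in labels_by_item.items():
        policy = policy_of[item_id]
        # Compare every pair of reviewers who saw the same item.
        for a, b in combinations(sorted(labels), 2):
            compared[policy] += 1
            if labels[a] == labels[b]:
                agreed[policy] += 1

    return {policy: agreed[policy] / compared[policy] for policy in compared}


# Made-up sample decisions: two human reviewers and one LLM.
decisions = [
    ("post-1", "harassment", "human_a", "remove"),
    ("post-1", "harassment", "human_b", "remove"),
    ("post-1", "harassment", "llm", "remove"),
    ("post-2", "harassment", "human_a", "remove"),
    ("post-2", "harassment", "human_b", "allow"),
    ("post-2", "harassment", "llm", "remove"),
    ("post-3", "spam", "human_a", "remove"),
    ("post-3", "spam", "llm", "remove"),
]

rates = agreement_rates(decisions)
for policy, rate in sorted(rates.items()):
    print(f"{policy}: {rate:.0%} agreement")
```

In this toy data, the harassment policy comes out lower than spam because one human reviewer disagreed with everyone else on a single post. A low rate on one policy is a hint that the policy wording, not the reviewers, may need attention. Raw agreement doesn't correct for agreement by chance, so treat it as a starting point rather than a final quality score.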

How are AI trends including new and evolving content threats and AI integration into T&S workflows shifting the balance between online and offline harm prevention? | James Gresham

This is a really complex question which I will do my best to answer here! AI is certainly driving a lot of new content threats, like scaled romance scams and deepfake-enabled sextortion targeting teenagers. On the other hand, AI also helps us combat those harms faster and better than we’ve ever been able to before. We have classifier and LLM moderation, faster policy iteration, better triage so human reviewers spend time on the hardest calls, and agentic tools that can investigate networks of bad actors rather than just scoring individual pieces of content.

But defenders and attackers aren't symmetric. Attackers only have to succeed once, while defenders have to be right consistently, across every surface, for every user. So even while new technology is unlocking so much for T&S teams, we’re still seeing the same “cat and mouse” game that has always existed. 

Every T&S leader I’ve talked to wants to use more AI in T&S workflows so that they can redeploy their moderation team (and other resources) to be more proactive and prevent harm sooner, or prevent it from spreading from online to off. Obviously there’s always so much to do and never enough resources, and AI is one way to help solve that problem. But, as we’ve seen with layoffs in the tech industry, AI can also be used as an excuse to cut costs, and take the humans out of these processes, even while there’s still plenty of risk. So I’d say AI can and should be used to help shift more power to T&S teams, but that may not always be how it works in practice in every company. When I write about how AI is great for T&S and fraud, I’m never advocating for replacing human judgement and human teams entirely. And I work with a lot of companies that are doing really great things with AI in a responsible way. 

The final thing I’ll say is that the work of offline harm prevention has to also take place off-platform. This means working with advocacy organizations and non-profits, doing community outreach and education, and working with law enforcement. Platforms can’t solve social problems in a silo. Platforms are also just one small surface area for addressing social needs. We as a society need to support mental health programs and effective sexual assault prosecution. And we need to do the real work, as a society, to stop discrimination and racism and transphobia. The harms that we see online aren’t created in a vacuum; what T&S teams do to protect users is just one part of what makes up a larger societal effort to live in a world that is kind, fair, and enjoyable.

You ask, I answer

Send me your questions — or things you need help to think through — and I'll answer them in an upcoming edition of T&S Insider, only with Everything in Moderation*
