
Will we fix AI bias against LGBTQ+ users?

A new report shows how AI systems are already failing LGBTQ+ users. The thing is: AI may also be the best way to fix moderation issues that traditional systems never managed to address.

I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that trust and safety professionals need to know about to do their job.

I shared a post this week shouting out some amazing women making things happen in T&S, online governance and AI. And I've been thinking since about the many individual contributions that practitioners of all kinds — including many EiM readers — make to help the field move forwards.

If you've done something you're proud of recently, hit reply and share it with me — I'd love to share more widely in future editions of T&S Insider.

GLAAD's new report builds on the idea that it's people like you and me who will decide whether AI repeats old moderation dynamics or fixes them. That's heartening, at least to me. So, without further ado, here we go! — Alice


Fixed classifiers meet steerability

Why this matters: We know all about AI bias against LGBTQ+ users, and the GLAAD report is a helpful reminder of how far we have to go. I wonder whether we're underplaying the opportunity for LLMs — when developed with LGBTQ+ communities in mind — to finally address moderation issues that fixed classifiers made almost impossible to solve.

If we’ve known implicitly for a while that AI systems are at risk of perpetuating anti-LGBTQ+ bias in online moderation, a new report from GLAAD makes it explicit. 

Build for Everyone, which was published last week, is a careful account of how AI systems are failing LGBTQ+ people. (Disclaimer: I gave feedback on it and I'm in the acknowledgements, so I'm not a neutral party here.)

The report itself and the early reporting on it have, understandably, led with the harm that can be caused by AI systems that aren’t built with marginalised communities in mind. That’s incredibly important, and the recommendations in the report are really good. You should go read the whole thing. 

However, there's a second part to this story that is getting less attention: AI is the best chance the LGBTQ+ community has had in years to fix moderation problems that traditional moderation systems could never solve.

Bias isn't new, and that's the point

Part of what makes Build for Everyone a good read is the way it chronicles LGBTQ+ moderation failures over the years: Meta's Llama 4 recommending conversion "therapy" in response to user queries, Grok generating non-consensual intimate imagery, and the long-running problem of automated systems suppressing legitimate LGBTQ+ content through shadowbanning, demonetization, and wrongful removals.

It's easy to read all of this as a story about AI introducing bias into moderation, but I’d argue that isn't quite right. The bias that GLAAD documents isn’t new; it’s part and parcel of human review and it also lives in the fixed machine learning classifiers that replaced some of that human work.

I wrote about this back in 2024, in a Pride-season guide to protecting LGBTQ+ users. A lot of what I flagged then was about human moderators: the need for anti-bias training, the importance of internal documentation of reclaimed language and the pattern where trans users get flagged as "fake" by bigots and disproportionately banned. None of that was an AI problem. It was a people problem that automation then inherited, and I've written separately about how these systems learn the wrong shortcuts and perpetuate familiar patterns.

So it’s not that AI made moderation biased against LGBTQ+ people. Bias was already there, and AI made it both more visible and, for the first time, more directly addressable.

The promise of "steerability"

The property that makes large language models risky — that they absorbed the internet's bias along with everything else — is the same one that lets you instruct them away from it. It’s called “steerability”.

In practice, steerability means it’s possible to steer LLMs towards the better-informed parts of their training and away from the biased parts. For example, you can tell a system, in plain language, that reclaimed language used by in-group members isn't a slur, that talking about discrimination isn't itself discriminatory, and that LGBTQ+ content shouldn't be held to a stricter sexualization standard than anyone else's. LLMs will know what that means.

You couldn't do that with a fixed classifier. Correcting a bias there meant retraining data labellers, then relabeling the data, and finally retraining the model, which is a process expensive enough that it rarely got prioritised for smaller communities. The bias just stayed in production.

With an LLM, the same correction is a simple change to a prompt. For the first time, a marginalised community's context can be written directly into the instructions a moderation system follows.
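
To make "a simple change to a prompt" concrete, here is a minimal sketch in Python of how community context might be layered onto a base moderation instruction. Everything named here (`BASE_POLICY`, `ModerationPrompt`, `build_prompt`, the ALLOW/REVIEW labels) is hypothetical and invented for illustration; it isn't taken from the GLAAD report or from any particular platform, and it only assembles the instruction text rather than calling a model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical base instruction; a real policy would be far more detailed.
BASE_POLICY = (
    "You are a content moderation assistant. "
    "Label each post ALLOW or REVIEW under the platform's "
    "hate speech and sexual content policies."
)


@dataclass
class ModerationPrompt:
    """Holds a base policy plus plain-language community context notes."""

    base_policy: str
    context_notes: List[str] = field(default_factory=list)

    def add_note(self, note: str) -> None:
        # Each correction is one more sentence, not a retraining cycle.
        self.context_notes.append(note.strip())

    def render(self) -> str:
        # With no notes, the prompt is just the base policy.
        if not self.context_notes:
            return self.base_policy
        bullets = "\n".join(f"- {note}" for note in self.context_notes)
        return f"{self.base_policy}\n\nCommunity context to apply:\n{bullets}"


def build_prompt() -> str:
    """Assemble the instruction text using the three examples from the article."""
    prompt = ModerationPrompt(BASE_POLICY)
    prompt.add_note("Reclaimed language used by in-group members is not a slur.")
    prompt.add_note("Talking about discrimination is not itself discriminatory.")
    prompt.add_note(
        "Do not hold LGBTQ+ content to a stricter sexualization standard "
        "than other content."
    )
    return prompt.render()
```

The design choice worth noticing is that each correction is a single `add_note` call that a policy specialist could draft and review. Whether a given model actually follows those notes is a separate question, and one that still has to be tested.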

AI with and for the LGBTQ+ community

Let’s get one thing straight: Build for Everyone doesn’t pretend any of this will happen by magic. It is clear-eyed about how the opportunity to address longstanding moderation bias only exists if someone does the unglamorous, resource-intensive work of building golden datasets, red-teaming with people who actually represent the affected communities and auditing for disparities. 
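
As a rough illustration of what "auditing for disparities" can look like, here is a short Python sketch that compares false positive rates across groups on a golden dataset. The record format, the group names and the function names are all assumptions made up for this example; the report doesn't prescribe a specific metric, and a real audit would use far more data and more than one measure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (group, gold_says_violation, system_flagged)
Record = Tuple[str, bool, bool]


def false_positive_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Per group: share of benign posts (per gold labels) the system flagged."""
    benign = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for group, gold_violation, flagged in records:
        if gold_violation:
            continue  # only benign posts can be false positives
        benign[group] += 1
        if flagged:
            wrongly_flagged[group] += 1
    return {group: wrongly_flagged[group] / benign[group] for group in benign}


def disparity_ratio(rates: Dict[str, float], group: str, reference: str) -> float:
    """How many times more often benign posts from `group` are flagged."""
    if rates[reference] == 0:
        return float("inf") if rates[group] > 0 else 1.0
    return rates[group] / rates[reference]


# Made-up golden dataset, for illustration only.
SAMPLE = [
    ("lgbtq", False, True),
    ("lgbtq", False, False),
    ("lgbtq", True, True),
    ("general", False, False),
    ("general", False, False),
    ("general", False, True),
    ("general", False, False),
]
```

On this made-up sample, benign LGBTQ+ posts are flagged at twice the rate of the reference group. That is the kind of gap an audit exists to surface before a system ships.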

That work costs time and money, and right now the political climate, particularly in the United States, is pushing platform investment in the opposite direction. Internal teams aren't prioritising this issue and the report's concern about bias being baked into widely-used LLMs is well-founded.

That said, the report respondents who are worried about bias in content moderation are also hopeful. 65% think AI could help fight harassment and hate speech, and 73% think it could improve access to accurate information about identity, health, and legal rights. That optimism is conditional, and the condition is that AI gets built with this community rather than around it, which is exactly where the opportunity exists.

LGBTQ adults in the US are hopeful about what AI could enable