AI replaces moderators (again), governments replace nuance and Kim Cameron was right
Hello and welcome to Everything in Moderation's Week in Review, your need-to-know news and analysis about platform policy, content moderation and internet regulation. It's written by me, Ben Whitelaw and supported by paid members like you.
TrustCon — a mix of Glastonbury and Davos for the Trust & Safety industry — is fast approaching and I'm excited to be taking part after a year away. If you're going to be there, hit reply so we can grab a coffee or, if it's more your bag, engage in some "fiber-maxxing". I'm not the only one looking forward to it, as today's Posts of Note suggest.
Talking of Ctrl-Alt-Speech, I'm joined this week by Cori Crider, executive director of the Future of Technology Institute and co-founder of Foxglove. She explains why better social media bans are coming and how she's surprised Meta has any moderators left to fire. Patreon listeners can tune in now and it'll be live on public feeds later today.
Thanks for forgoing the lure of Korea’s dopamine sites to read today’s edition. This is your EiM Week in Review — BW
While hash-matching catches known content, generative AI has unleashed a wave of novel, un-fingerprinted CSAM and non-consensual intimate imagery (NCII) that reactive workflows cannot handle.
On July 9th, Resolver is hosting an expert panel featuring the Internet Watch Foundation (IWF), SWGfL (StopNCII.org), and Reality Defender to map out the future of proactive platform defense.
Discover what a proactive detection architecture looks like in practice, how to intercept AI-generated material early, and how to align your workflows with shifting UK, US, and EU regulatory mandates.
Policies
New and emerging internet policy and online speech regulation
Australia has announced it will beef up the penalties for breaches of its social media ban, after claims that the biggest platforms “aren’t doing enough to comply with the law”. Prime minister Anthony Albanese doubled the proposed fine from $49.5m to $99m and gave the country’s regulator, the eSafety Commission, greater powers to investigate platforms for breaches of the law. It comes just weeks after research published by the British Medical Journal found 85% of teens were still using at least one social media platform three months after the law came into force.
Hectic metrics: The BMJ research has placed further doubt on the effectiveness of teen bans but there’s a broader question emerging: at what point in time should we measure whether a ban is working or has worked? One academic writing for The Conversation has said “the full effects may not be clear for a decade” and the greatest opportunity is “with children under eight who have not yet started using social media”.
That might be true but a) that’s no use to the current teens and parents who worked hard to advocate for a ban and b) that’s not how politicians messaged the ban when it was announced. Anthony Albanese, for example, called it a way to “change lives for this and future generations”. (EiM #318)
Refusing to be left out of the child safety conversation, the US House of Representatives passed a package of bills this week that critics called “a mess” and warned would lead to “restrictive age-checking practices across their entire platforms”. The KIDS Act, which includes a revised version of the controversial Kids Online Safety Act, instead focuses on specific design features and safeguards — which is a concern for youth-led organisations who believe removing the “duty of care” clause lets platforms off the hook.
Meanwhile, new public polling says a slim majority — 56% — of US adults would support a social media ban for teens. What's interesting to me is just how many — a further 23% — are unsure. A ban would likely run aground when tested against the First Amendment, but that undecided bloc could prove influential as lawmakers push towards regulating platforms in some form.
Also in this section...
- Social media bans go global: big tech faces a reckoning after Australia’s crackdown (The Guardian)
- EU funded project in Estonia aims to make the internet safer for children (European Commission)
Products
Features, functionality and technology shaping online speech
Almost three weeks after the US government put in place export controls, Anthropic has redeployed its Claude Fable 5 model, playing down the vulnerabilities uncovered by Amazon researchers that got it taken offline. The blogpost — which is worth reading in full — explains that other less capable models were able to identify the same vulnerabilities as those claimed in the Amazon report and that Fable 5 has “no such unique offensive capabilities”.
The greater good? Alongside the relaunch, Anthropic published a framework for assessing the severity of AI jailbreaks and invited other frontier labs to adopt it. It’s either a genuine attempt to create industry consensus on AI safety — seemingly the goal of the new UN commission that features top execs including Anthropic’s Jack Clark — or a naked attempt to bolster its reputation on safety. Perhaps it’s both.
Platforms
Social networks and the application of content guidelines
There must be something in the platform water. Just a week after Meta announced it was replacing moderation staff with generative AI (EiM #341), TikTok announced it would cut 300 jobs in its Dublin office. It comes almost 12 months to the day after German workers protested against redundancy threats brought on by AI (EiM #299) and four months after workers protested outside its London offices after 400 lost their jobs (EiM #323). If you're an EiM reader who was affected, or you know someone who wants to discuss it, drop me a line.
Same story: The consistent thread from the reporting on these redundancies is that workers don’t believe AI is “ready” to be rolled out. One worker said TikTok’s models produce false positives such as mistaking an outstretched hand for a gun or stains on a wall for blood. A cynic might say: those workers would say that, their jobs are on the line. But automated decision-making has been a concern of civil society groups and non-profits for years. What if they’re right?
A shocking related story from India: Meta’s “proactive detection technology” hasn’t stopped dozens of ads from appearing on Instagram promoting child sexual abuse material and linking users to Telegram channels where they can buy abuse videos for less than a dollar. The BBC reported a number of the ads to Meta, only to be told that they did not “violate its community guidelines”. Sigh.
Also in this section...
- Many Child Safety Features on Social Apps Don't Work, Report finds (New York Times)
- Meta Contractors Posed as Teens to Prompt Rival Chatbots About Suicide, Sex, and Drugs (Wired)
- What 20 million bans reveal about the strain on Wikipedia’s volunteers (The Conversation)
People
Those impacting the future of online safety and moderation
Long before AI agents were a thing, Kim Cameron was among a group of researchers warning us all that the internet was missing a trick. As Microsoft’s then-Architect of Identity (what a title, btw), Cameron argued in his paper The Laws of Identity that the web was never built to verify who or what users were interacting with — and that it would be something we’d come to need.
I came across Cameron’s work in a great new essay by researcher and writer Renée DiResta in Noema, which lays out — using Cameron’s research as an origin point — that proving “personhood” is the internet’s next great obstacle. Developing such digital credentials would “verify humanness without disclosing anything else and would be limited to one per person per credentialer, with unlinkable pseudonymity”.
Companies like World ID (built by Sam Altman’s Tools for Humanity) and Humanity Protocol (the decentralised identity project, which suffered a massive hack last month) are already building these systems but they rarely feature in T&S conversations. I’m guessing that won’t be the case for long.
Posts of note (TrustCon countdown)
Handpicked posts that caught my eye this week
NB: Mike and I will be joined at TrustCon 2026 by special guests Kat Duffy and Zoe Darme for a special live edition of Ctrl-Alt-Speech. Join us IRL if you're coming and, if you can't, fear not — you can tune in wherever you get your podcasts.
- “I'm not kidding. I see a lot of value and a lot of problem happening in our industry, and I want to hear, learn, share, align, commiserate, and plan.” - Modsquad’s Izzy Neis isn’t letting the opportunity pass.
- “This is my fifth one #TrustconOGs and I still can't believe how quickly my schedule has filled up at and around the conference. Catch me at one of these sessions, or let's try and grab coffee if schedules permit!” - Vaishnavi J is set to make it five out of five TrustCons.
- “The idea is simple: no presentations, no formal agenda, just an open, interactive conversation where practitioners can share experiences, discuss real-world challenges, exchange ideas, and learn directly from one another” - Ravi Bhalla from Concentrix is leading a session with a difference.