Facebook has released a progress report on the Civil Rights Audit conducted by Laura Murphy together with a team from law firm Relman, Dane & Colfax. The report offers insight into the policy changes that Facebook has already adopted to improve its content moderation policies, particularly in relation to hate speech. The auditors also make recommendations in their report. These include widening Facebook's definition of hate speech to also cover region-based attacks, and no longer using 'humor' as an exception to Facebook's hate speech policy:

Hate Speech

  • National Origin: Facebook defines hate speech to include attacks against people based on their national origin. Facebook’s current conception of national origin is focused on attacks based on one’s country of origin. However, hate speech based on one’s ancestry or location of origin often does not track country borders; instead it can include attacks based on larger regions or continents. Dehumanizing attacks based on regional origins (e.g., people from Central America or the Middle East) or continental ancestry should not be treated differently simply because the geographical location at issue extends beyond a single country. The Audit Team recommends that Facebook revise the definition of national origin in Facebook’s Community Standards to include continents and regions larger than a single country where used to attack people from that region or continent.

  • Humor: Facebook’s hate speech policy currently contains an exception for humor. However, what qualifies as humor is not standard, objective, or clearly defined; what one user (or content reviewer) may find humorous may be perceived as a hateful, personal attack by another. Nor does labeling something as a joke or accompanying it with an emoji make it any less offensive or harmful. Because of the inability to precisely define and consistently identify a subjective and amorphous category like humor, identifying humor as an exception to Facebook’s hate speech policy likely contributes to enforcement inconsistencies and runs the risk of allowing the exception to swallow the rule.

The Audit Team recommends that Facebook remove humor as an exception to the hate speech policy (and ensure that any future humor-related carve-outs are limited and precisely and objectively defined).

Specifically regarding white nationalist expressions on Facebook, Murphy and her team recommend also removing statements that do not explicitly use the terms "white nationalism" and "white separatism", but that are nonetheless white nationalist in nature.

The Auditors believe that Facebook’s current white nationalism policy is too narrow because it prohibits only explicit praise, support, or representation of the terms “white nationalism” or “white separatism.” The narrow scope of the policy leaves up content that expressly espouses white nationalist ideology without using the term “white nationalist.” As a result, content that would cause the same harm is permitted to remain on the platform.

The Audit Team recommends that Facebook expand the white nationalism policy to prohibit content which expressly praises, supports, or represents white nationalist ideology even if it does not explicitly use the terms “white nationalism” or “white separatism.”

That, of course, requires content moderators to give greater consideration to the intent of the content and the context of the expression at hand.

The content moderation part of the report focuses very much on Facebook's policies for content moderation. What is hate speech? When should it be removed? And so on. However, no attention is paid to the algorithms that are at work to detect hate speech and other types of illegal content. The report only mentions that a great deal of hate speech is removed after "proactive detection using technology". But how do these technologies work, and how effective are they?

Content moderation starts with finding content that could be unlawful or in breach of the platform's policies, and technologies play an important role in that process. Yet these technologies are not perfect. The algorithms used to detect illegal content may reliably catch explicit expressions, but they may be less effective at detecting more implicit types of hate speech. The algorithms may also be biased, disproportionately flagging content by certain groups of people while leaving expressions of other groups untouched. Of course, there are human reviewers going over all flagged content. But they must make decisions quickly and - as the report also notes - they may not have all the relevant information to correctly assess the flagged content. By focusing on content moderation policies and the decision-making process once hate speech has been detected, the report overlooks an important aspect of content moderation: what influence do the content detection algorithms have? And that creates a blind spot in the evaluation of Facebook's content moderation policies and practices.
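To make that blind spot concrete, here is a minimal, purely illustrative sketch of a pattern-based detection step. It is not based on anything Facebook has disclosed about its systems; the patterns and example posts are invented for illustration. The point is simply that explicit phrasing is easy to flag automatically, while implicit or 'humorous' phrasing slips past the detector and never even reaches a human reviewer.

```python
import re

# Hypothetical blocklist of explicitly hateful phrasings (placeholders, not a
# real policy list). Real systems are far more sophisticated, but the failure
# mode is similar: detection keys on explicit signals.
EXPLICIT_PATTERNS = [
    re.compile(r"\ball \w+ people are (vermin|subhuman)\b", re.IGNORECASE),
    re.compile(r"\bgo back to your country\b", re.IGNORECASE),
]

def flag_for_review(post: str) -> bool:
    """Return True if the post matches an explicit pattern and should be
    queued for human review."""
    return any(pattern.search(post) for pattern in EXPLICIT_PATTERNS)

posts = [
    "All those people are vermin.",                              # explicit: flagged
    "You know how 'those people' always are.",                   # implicit/coded: not flagged
    "Funny how they keep 'forgetting' the rules, right? ;)",     # framed as 'humor': not flagged
]

for post in posts:
    print(flag_for_review(post), "-", post)
```

Only the first post would be flagged in this toy example; the other two express a similar sentiment but are invisible to the detector. Whatever the real detection technology looks like, its strengths and weaknesses shape which content human reviewers ever get to see, which is exactly why it deserves scrutiny alongside the policies themselves.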

In the next few months, I'll be researching the role and influence of algorithms in content moderation as part of a bigger WODC research project into algorithmic decision-making. I intend to write short updates now and then.