Gender and content

First Monday, the peer-reviewed journal that delves into all things digital, has published a paper about the success of an algorithm that can tell males and females apart purely by the content they write on Twitter and LinkedIn.

In itself this isn’t a problem, but a quick glance at the inner workings of the algorithm makes it apparent that it is based on some pretty ‘traditional’ views on what differentiates the sexes. Take this précis of research on the topic:

In summary, the language of men uses patterns that are characterised by more marked expressions of independence and assertions of hierarchical power. This language includes strongly assertive, aggressive and self–promoting features as well as rhetorical questions, authoritative orientation and challenges. In contrast, women tend to express themselves with a more emotional language, including the use of more frequently emotionally intensive adverbs and affective adjectives, such as really, quite, adorable, charming and lovely. Women use more attenuated assertions, apologies, questions, personal orientation and support.

…This research exploits both content–based features — words related to specific feelings that can act as markers of emotional, psychological and cognitive states — and traditional features — markers of writing styles — that act as strong gender indicators.

Interestingly, the paper lists a number of differentiators that would strike many as odd: for instance, the proportion of white-space characters, lower-case letters, abbreviations and exclamation marks, among many others. (Frustratingly, the paper’s authors shed no light on which gender these indicators point to.)
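
To give a feel for what such stylistic markers look like in practice, here is a minimal Python sketch that computes three of them for a piece of text. It is an illustration only, not the paper's implementation: the function name `style_features` and the choice to divide each count by the total number of characters are assumptions, and abbreviations are left out because they would need a word list that isn't given here.

```python
def style_features(text: str) -> dict[str, float]:
    """Return simple character-level proportions for a piece of text.

    Hypothetical helper: dividing by the total character count is an
    assumption, not the definition used in the paper.
    """
    total = len(text)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {"whitespace": 0.0, "lowercase": 0.0, "exclamation": 0.0}
    return {
        # Share of characters that are spaces, tabs or line breaks.
        "whitespace": sum(c.isspace() for c in text) / total,
        # Share of characters that are lower-case letters.
        "lowercase": sum(c.islower() for c in text) / total,
        # Share of characters that are exclamation marks.
        "exclamation": text.count("!") / total,
    }


features = style_features("Hi there!!")
print(features)
```

Numbers like these would then be fed to a classifier alongside the content-based features; on their own they say nothing about which gender a given value points to.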

Of course, the algorithm is intended to be successful, rather than to engage in a polemic about why the differences occur in the first place. When tested, it was 92.2% accurate for Twitter users and 98.4% accurate for LinkedIn users.

