Work Header

Language: English
Series: Part 29 of Fandom Stats
Stats:
Published: 2014-09-04
Words: 1,310
Chapters: 1/1
Comments: 2
Kudos: 24
Bookmarks: 1
Hits: 583

[Fandom stats] Chatty tags, canonical tags, and synonyms on AO3

Summary:

This started out as an attempt to estimate the number of long, chatty tags on AO3, and to see how much chattiness varied by fandom. But it turned out my analysis didn't provide much insight into that question -- however, it spawned a super geeky (and interesting, to me) conversation about the details of AO3 tag wrangling. This analysis includes replies from multiple AO3 wranglers.

Notes:

Originally posted on Tumblr. Thanks to wrangletangle and melannen for letting me archive their contributions to the thread!

Work Text:

I initially posted:

In wrangletangle's post about chatty AO3 tags, there was a linked AO3 admin post from samjohnssonvt with data about tags (from Oct 2012). Guess what I like to do with data? If you guessed "make brightly colored graphs", you are right. :)

Sam J's data shows, for a sample of fandoms as of Oct 2012, how many of the "additional tags" (the ones that aren't one of the special Fandom/Character/Relationship/Category/Warning/Rating tags) were canonical -- that is, how many had been wrangled by AO3 wranglers. "Canonical" doesn't necessarily mean "non-chatty" -- as wrangletangle points out above, fricking awesome popular chatty tags like "Up all night to get Bucky" and "Everyone Is Poly Because Avengers" can eventually get canonized. But this does give a sense of the proportion of tags authors use that are rare or one-offs (many of the non-canonical tags will fall into those categories).

Here are the relative numbers of canonical vs. non-canonical tags for the fandoms Sam J listed that had more than 1K fanworks -- you can see that the ratio varies a lot by fandom:

[image: graph of canonical vs. non-canonical tag counts by fandom]

And we can also look at the proportion of non-canonical tags:

[image: graph of the proportion of non-canonical tags by fandom]

Note -- this is not how many USES of tags there are, but just how many unique tags there are in the fandom that are canonical vs. not.
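For anyone who wants to reproduce that proportion, here is a minimal sketch in Python. The function name and the counts are made up for illustration (they are not Sam J's numbers), and the inputs are assumed to be counts of unique tags, not tag uses.

```python
def noncanonical_share(canonical: int, noncanonical: int) -> float:
    """Fraction of a fandom's unique additional tags that are not canonical."""
    total = canonical + noncanonical
    if total == 0:
        raise ValueError("fandom has no additional tags")
    return noncanonical / total


# Hypothetical counts of UNIQUE tags (not uses of tags) for placeholder fandoms.
example_counts = {
    "Fandom A": {"canonical": 200, "noncanonical": 800},
    "Fandom B": {"canonical": 150, "noncanonical": 150},
}

shares = {
    name: noncanonical_share(counts["canonical"], counts["noncanonical"])
    for name, counts in example_counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of unique tags are non-canonical")
```

Because each tag is counted once regardless of how often it is used, a fandom with a handful of hugely popular canonical tags and many one-off tags would still show a high non-canonical share.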

We can also look at the total number of tags compared to the total number of fanworks in each fandom...  That is, how many unique tags are generated per 1000 fanworks in each fandom:

[image: graph of unique tags per 1000 fanworks by fandom]

This possibly (?) roughly corresponds to the "chattiness" of each fandom on AO3.
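The per-1000-fanworks figure is just a normalized ratio. A rough sketch, with a hypothetical function name and invented numbers, might look like this:

```python
def tags_per_thousand_works(unique_tags: int, fanworks: int) -> float:
    """Unique additional tags per 1000 fanworks in a fandom."""
    if fanworks <= 0:
        raise ValueError("fanworks must be positive")
    return 1000 * unique_tags / fanworks


# Invented example: 4,500 unique tags across 3,000 fanworks.
rate = tags_per_thousand_works(unique_tags=4500, fanworks=3000)
print(f"{rate:.0f} unique tags per 1000 fanworks")
```

Normalizing by fanwork count makes large and small fandoms comparable, though it still measures unique tags generated and not how often each tag gets used.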

I'm guessing the variance we see here is a function of multiple factors, including:

  • Number of wranglers working in the fandom 
  • Age of the fandom (more tags will have had time to become popular and get canonized in older fandoms -- plus there's been a trend to use more tags over time on AO3)
  • More chatty vs. more utilitarian approaches to tagging differing across fandoms

AO3 wrangling folks & other staff, any other ideas about what causes this variance?  (And/or any more data to play with here?  ;D)

As the wranglers say, it's not the chatty tags that put a strain on the servers at AO3 or make life difficult for the volunteers.  But it is really interesting to see how tagging practices differ across different fandoms!

wrangletangle then replied:

You are the best. :DDD

So, things that can affect these numbers!

  1. These are canonicals in red, not synonyms. So if users in the fandom tend to make many tags for the same idea, rather than using the canonical, then the blue:red ratio will go up.
  2. Relatedly, if the wranglers start canonizing late in the process (due to, say, a fandom exploding in May of 2012 and not having enough bodies on hand to keep up with more than just characters and relationships….), there will tend to be more syns per canonical, and the same thing will happen.
  3. The number of characters in the fandom and variety of work and style types can affect tagging, as character-related tags will be extremely diverse in type and number. Ditto if there are a wide variety of popular pairings. (I honestly expected Homestuck’s ratio to be even more dramatic for that reason.)
  4. How quickly tags are canonized in the fandom may affect the numbers, because users may grab the canonicals. But I think it’s more likely that fandom tagging conventions will affect them. ETA: By which I mean, if the fandom convention is to use canonicals, more people will do that, whereas if the fandom convention is to create one’s own tags, more people will do that. It’s partly a matter of fannish culture, as you said.
  5. Whether the fandom keeps shared/generic canonized tags in the fandom or not has a big impact. If the fandom keeps, say, several hundred tags for single-name common characters or common concepts, those will bump the canonical numbers, whereas fandoms like One Direction kick out all of these tags, meaning they show as a very low ratio even though their wranglers are canonizing freeforms daily.
  6. Probably other things I’ve missed. Other wranglers could probably point out more reasons.

melannen added:

It’s actually a lot easier to pull the synonym data now than it was in 2012, so since I can never resist the call of GRAPHs, I went and made similar ones with today’s data for the same fandoms.

Here’s the one with canonicity of tags. Quick overview of the three tag types I’m graphing here: “Canonical” are the tags that appear in the dropdowns and sidebar filters. “Synonym” are tags that aren’t canonical but are close enough in meaning to canonical tags that they are combined with them - things like misspellings, plural vs. singular, or, yes, actual synonyms. “Combined Unfilterable” are tags that the wranglers have decided not to link to any canonical (it combines the internal categories of “tags that may be canonical someday” and “tags that will never ever ever be canonical” but that makes no difference to users, and tags move between those two categories a lot, so I’ve combined them in the graph.) “Chatty” tags can be in any of the three categories, but most of them do spend at least some time as unfilterables.

The first thing you’ll note is that there are almost ten times as many total tags in the most popular fandoms as there were in 2012. (If you’re wondering why the filters get stressed, it really is the sheer number of tags, not the number of any particular kind of tag.) I also used the current equivalent fandom names for the 2012 names; some of these fandoms have been recombined or renamed since. You can also see just how much work goes into making synonyms of canonical tags - most of the large fandoms have five times as many synonyms as canonicals, some of them more.
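To make the three-way breakdown concrete, here is a small sketch of how the category shares and the synonyms-per-canonical ratio could be computed. The class, its field names, and the counts are all assumptions for illustration; none of this is AO3's data model or real fandom data.

```python
from dataclasses import dataclass


@dataclass
class TagCounts:
    """Unique tag counts for one fandom, split into the three graphed types."""
    canonical: int
    synonym: int
    unfilterable: int  # the two internal unfilterable categories, combined

    @property
    def total(self) -> int:
        return self.canonical + self.synonym + self.unfilterable

    def shares(self) -> dict[str, float]:
        """Each category as a fraction of the fandom's total unique tags."""
        if self.total == 0:
            raise ValueError("fandom has no tags")
        return {
            "canonical": self.canonical / self.total,
            "synonym": self.synonym / self.total,
            "unfilterable": self.unfilterable / self.total,
        }

    def synonyms_per_canonical(self) -> float:
        if self.canonical == 0:
            raise ValueError("fandom has no canonical tags")
        return self.synonym / self.canonical


# Invented counts for a placeholder fandom.
example = TagCounts(canonical=1000, synonym=5000, unfilterable=4000)
print(example.shares())
print(example.synonyms_per_canonical())
```

Keeping synonyms separate from unfilterables matters here: a fandom can have a low canonical share simply because its wranglers have attached many synonyms to each canonical tag.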

@wrangletangle did a good job of covering some of the things that might affect the differences in proportions showing here; I have a few more to add:

7. For pre-AO3 fandoms (on this list, I’d include Buffy, HP, and Naruto), a significant number of works may have been imported from other archives or websites, either via Open Doors or manually by users; most other archives don’t have any user-entered freeform tags, and most users haven’t gone back to old works and added a bunch of new tags, so that drops the numbers a lot even if new works in those fandoms are very chatty.

8. Some of these fandoms are part of larger fandom structures (MCU and Sherlock, I’m looking at you) and the way the unfilterable tags are divided up between the subfandoms in the larger structure may give misleading numbers. Since the fandom an unfilterable tag is in has no effect on the user experience, each wrangling team divides them up differently.

9. When it comes to the “chatty” tags that sparked off this discussion, the vast majority of them wouldn’t show up on these charts anyway - most “chatty” tags don’t have anything fandom-specific in them, and those tags get sent to “No Fandom” (and adding No Fandom to the graphs would make even MCU’s numbers too small to see in comparison). Some fandoms are disproportionately likely to keep “chatty” tags in their fandom, either because of the wranglers’ preference or because of the nature of the fandom. For example, an SF fandom with a lot of worldbuilding, like Homestuck, is more likely to have something fandom-specific show up in a freeform than a modern AU fandom like Sherlock, so I suspect that skews the data if you try to extract overall chattiness from these numbers.

…and I may do another post later with more numbers and charts but this is long enough for a tumblr reply that I can’t put a readmore in, I think. ^_^
