Where Are We Now? (II)
We’ve been quiet on this blog for far too long, and I thought it was time that we let you in on what the What Were Comics? team is up to as we approach the end of 2016.
In a nutshell, the main activity these days is constructing our data-set. As you may recall, our goal is to code two per cent of the comic books published in the United States between 1934 and 2014. To do this we need to complete two distinct tasks: first, we need to compile a list of all the comic books published during this period, and then we need to draw our two per cent sample from that list.
Compiling the sampling frame of all the comic books published between 1934 and 2014 has been an incredibly onerous task, far more time-consuming than we originally envisioned, and it has been the main reason for our silence all year. Essentially, our updates would all have read “processing”.
Rather than settle a priori on a potentially biased definition of what a “comic book” is, we decided at the beginning to outsource that definition. We had originally thought that we would define a “comic book” as any publication appearing in at least two of three existing data-sets: the Grand Comic Book Database (GCD), the Overstreet Price Guide, and mycomicshop.com. We recently revised that, however, so that a publication needs to appear in all three of these databases, which we think makes the rule more consistent and fair. I may write something on the topic of comic books that got demoted from our data-set based on this rule (spoiler: a lot of regional publications and give-aways so far).
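The revised inclusion rule is, in effect, a set intersection across the three sources. The Python sketch below is illustrative only: it assumes each source has already been reduced to a set of hypothetical (title, issue) keys, which glosses over the real work of matching entries across databases that record titles and publishers differently. The function names are invented for the example.

```python
from collections import Counter
from typing import Set, Tuple

# Hypothetical key: (title, issue). Real matching across GCD, Overstreet and
# mycomicshop.com would need far more care than an exact string match.
Item = Tuple[str, str]


def in_all_sources(gcd: Set[Item], overstreet: Set[Item], mycomicshop: Set[Item]) -> Set[Item]:
    """Revised rule: keep a publication only if all three sources list it."""
    return gcd & overstreet & mycomicshop


def in_at_least(k: int, *sources: Set[Item]) -> Set[Item]:
    """Original rule generalised: keep publications listed by at least k sources."""
    counts = Counter(item for source in sources for item in source)
    return {item for item, n in counts.items() if n >= k}


# Toy data with placeholder titles, not real listings.
gcd = {("Title A", "1"), ("Title B", "1"), ("Title C", "1")}
overstreet = {("Title A", "1"), ("Title C", "1")}
mycomicshop = {("Title A", "1"), ("Title B", "1")}

print(sorted(in_all_sources(gcd, overstreet, mycomicshop)))  # [('Title A', '1')]
print(len(in_at_least(2, gcd, overstreet, mycomicshop)))     # 3
```

Under the old two-of-three rule all three toy titles survive; under the stricter rule only Title A does, because each of the others is missing from one source.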
To produce our master list, we exported the data from the Grand Comic Book Database: over 600,000 individual entries. These were sorted by year in an Excel spreadsheet, and then we unleashed our research assistants on that data. We needed them to clean up two things: undated entries (of which there were several hundred thousand) and obvious duplicates. Undated entries plagued us, as we needed to hand-date them from our other two databases and then enter them into the appropriate year in the Excel sheet. Duplicates are far easier to deal with. For example, we don’t count the newsstand and direct market editions of the same 1980s Marvel comic book as different items, so it is just a matter of hitting delete thousands of times. By the time we reach the variant covers of the 1990s, that adds up to a whole lot of duplication.
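The research assistants removed duplicates by hand, but the rule they applied can be stated precisely. As a hypothetical sketch, assuming each exported row has publisher, title, issue, year and edition fields (the field names and the `Entry` type are invented here, not the GCD's export format), collapsing editions amounts to keying on everything except the edition:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    # Hypothetical fields; the real GCD export is structured differently.
    publisher: str
    title: str
    issue: str
    year: Optional[int]  # None for an undated entry
    edition: str = ""    # e.g. "newsstand", "direct market", "variant cover"


def drop_edition_duplicates(entries: Iterable[Entry]) -> List[Entry]:
    """Keep the first entry for each (publisher, title, issue, year), ignoring edition."""
    seen = set()
    kept = []
    for entry in entries:
        key = (entry.publisher, entry.title, entry.issue, entry.year)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


rows = [
    Entry("Publisher A", "Title X", "1", 1985, "newsstand"),
    Entry("Publisher A", "Title X", "1", 1985, "direct market"),
    Entry("Publisher B", "Title Y", "1", 1953),
    Entry("Publisher C", "Title Y", "1", 1953),
]
print(len(drop_edition_duplicates(rows)))  # 3
```

Keeping the publisher in the key is deliberate: two books with the same title and issue number from different publishers survive as separate entries, leaving that case for a human to judge.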
So, we are extremely grateful to our group of research assistants who undertook that first step this year. It wound up taking most of the year to accomplish.
Currently we are (well, I am) going through each year chronologically and doing a final cull. Beginning with the dataset provided by the GCD, I first confirm that each comic appears on MyComicShop. When I find one that does not, it is deleted. When I find an odd-sounding one, I flag it and then look it up by hand in this year’s Overstreet (my copy is very well-thumbed already). If it’s not in Overstreet, then it is also deleted.
There are two main reasons that I’ve chosen to do this fairly boring task myself (each year of data takes a couple of hours to run through – I’m listening to a lot of podcasts). The first is that it’s not that boring. Simply reading a list of comic books published year by year is quite enlightening. You can literally see changes to the industry as they appear over time and it serves as an immediate historical corrective to certain longstanding assumptions. Read a list of comic books published during the Second World War and you will quickly come to realize that comics studies has dramatically overemphasized the role of the superhero during that period, for instance.
The second reason is that there are still a lot of decisions to be made, and I’d prefer that a team lead, rather than a research assistant, make them. Take this as but one example. This morning I opened the spreadsheet to 1953: entry number 166 is Animal Adventures #1, and number 167 is Animal Adventures #1. An obvious duplicate, right? Well, not so fast. The first of these is listed as published by Timor and the second by Premier Magazines. Did two different publishers release a comic book with the same name in the same year? It’s certainly possible.
Turning to MyComicShop.com, I note that there are two titles called Animal Adventures. The first, beginning in 1953, is from Timor (#166 in our dataset), and the second, from Accepted Publications, is dated 1955. Now we have an issue. Are Premier Magazines and Accepted Publications the same company? And even if they are, did their version of Animal Adventures come out in 1953 (as the GCD claims) or 1955 (as mycomicshop claims)? Clicking through on mycomicshop brings up the covers of both versions.
So the Accepted version is a reprint of the Timor edition. Going back to the GCD, it turns out that the Accepted version reprints the first three issues (in fact, the only three issues) of the Timor series. This makes it a different comic, and it must be included in our dataset. But here’s the problem: the GCD says that the Accepted version is undated. So what year? Well, mycomicshop says 1955. Turning to Overstreet, their listing shows the three Timor issues from 1953 and 1954, plus a reprint of all three in a single issue from Accepted, again with no date. So in the end we have two votes for no date and one vote for 1955.
Over at ComicBookPlus, there are copies of the first and third issues of this series from Timor, and the first issue is clearly dated December 1953 in the indicia. They don’t have the reprint, so that adds no further clarity. So, now the judgement call: the Accepted version has to be in our dataset, as it is in all three sources, but where does it go? We don’t have an undated area (though perhaps we should?), and so, given only one data point, I’ve moved it to 1955, where mycomicshop believes it belongs. All that work just to move one entry from 1953 to 1955. And there is only a two per cent chance that the comic even winds up in our sample (though, at this point, having invested the labour, I’m rooting for it).
When that process is complete (a couple of hours per year), I re-sort the spreadsheet and run the sample.
To do the sampling, we are relying on the random integer generator at random.org. I enter the number of comic books for the year (for 1952: 2,895) and then ask it to generate a list of, in this case, 58 entries (two per cent, rounded up). That gives me a random list of numbers that correspond to rows in the spreadsheet, and I swap each number for the comic in that row. Thus, row 2878 becomes Young Romance #42. As a final step, I check ComicBook+ and the Digital Comics Museum to see whether scans of those comics are available. Generally about forty to sixty per cent are, but that is about to end as we move past the public-domain material those sites are able to host.
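The sampling step is simple enough to sketch. The Python below is a stand-in, not the team's tool: it uses the standard library's pseudo-random `random.Random` where the team uses random.org, and the function names (`sample_size`, `draw_rows`, `rows_to_titles`) are hypothetical. It computes two per cent rounded up with integer arithmetic (for 2,895 comics, 57.9 rounds up to 58) and draws distinct row numbers, so no row can be picked twice.

```python
import random
from typing import List, Optional, Sequence


def sample_size(n_comics: int, percent: int = 2) -> int:
    """Two per cent of the year's comics, rounded up.

    Integer ceiling division sidesteps floating-point rounding surprises.
    """
    return (n_comics * percent + 99) // 100


def draw_rows(n_comics: int, seed: Optional[int] = None) -> List[int]:
    """Draw distinct spreadsheet row numbers from 1..n_comics, without replacement."""
    rng = random.Random(seed)  # seeded only so that a draw can be reproduced
    return sorted(rng.sample(range(1, n_comics + 1), sample_size(n_comics)))


def rows_to_titles(rows: Sequence[int], sheet: Sequence[str]) -> List[str]:
    """Swap each row number for the comic in that row (row 1 is sheet[0])."""
    return [sheet[row - 1] for row in rows]


print(sample_size(2895))  # 58
rows = draw_rows(2895, seed=1952)
print(len(rows), len(set(rows)))  # 58 58

# Placeholder sheet; a real one would hold the year's cleaned list of titles.
sheet = [f"Comic {i}" for i in range(1, 2896)]
print(rows_to_titles([2878], sheet))  # ['Comic 2878']
```

Sampling without replacement matters here: a generator that can repeat a number would occasionally yield fewer than 58 distinct comics for the year.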
At this point we’ve completed about one quarter of our sampling, and we hope to be finished by the end of January. Once that is done, we will begin the process of acquiring the rest of the comics that we don’t have, and then we will move into beta-testing our app. We expect to take a couple of months in 2017 to iron out issues with the app and with the coding protocols and, with luck, by spring 2017 we will begin entering data in earnest. Then the fun will really begin.
I hope to post a few other updates over the next couple of weeks, including a report on some preliminary findings that were shared at the Superhero Identities conference in Melbourne at the beginning of the month. In the meantime, I’ve been amusing myself while I slog through the data by posting oddities from the database on Twitter. Follow us at @whatwerecomics if you’re interested in seeing bizarre public service comics about tree farms show up in your feed!