Getting Physical

The primary task of the What Were Comics? team over the past two months has been acquisitions. There are 3,563 comic books in our corpus, ranging from Bringing Up Father (1934) to Young Avengers #14 (2014), representing a randomly sampled two per cent of the comic books published in the United States from 1934 to 2014 inclusive. To date, we have worked with three comics shops (The Beguiling in Toronto, Lone Star Comics in Arlington, and Mile High Comics in Denver) to acquire physical copies of our corpus. Just over two-thirds of the collection is now in our greedy little hands.

Even though we have a fairly substantial budget line item for comic book acquisitions, one thing that has quickly become clear to us is that we will not be able to acquire all 3,563 comic books that we are studying with our current funding envelope. One reason for this is that many of the comics in the corpus are bank-busters. Even with the remit of buying the lowest grade copies we can find, a comic book like Action Comics #17 is prohibitively expensive. Lone Star, for example, has an incomplete copy of this item (coverless, and missing parts of its pages) on sale for $444. That’s just too rich for us, even if it were complete. The second reason, of course, is that some of our sampled comics we simply cannot locate for sale at any price. We are on the lookout for four different issues of Gulf Funny Weekly, and we haven’t found any issues for sale.

 Too rich for us, alas...


Needless to say, the combination of expense and scarcity has hit us particularly hard in the earliest part of our study. Currently, we own physical copies of three comic books from 1940 and none at all from the 1930s. As we move closer to the present our success rate improves dramatically. Without actually running the numbers, we probably have greater than 98% of the comics we want from the 2010s, and greater than 95% from the 2000s. Basically, we have picked almost all of the low-hanging fruit and what remains to be bought is material from the first forty years of our study. Particularly notable gaps in our collection fall in the areas of early romance comic books and horror comics, while the most surprising gaps may be children’s comics from the past two decades (think Nickelodeon).

So, what to do about these gaps?

Well, in the first instance, we will do provisional work on comic books that we can find digitally. While we may only own physical copies of about fifteen per cent of our comic books from the 1940s, we own electronic scans of probably eighty per cent or more of them.

Initially, I was quite content to work from scans that are available on ComicBookPlus or the Digital Comics Museum. Not only would we save funds, but they are easy to handle and easy to share with our research assistants. Over the course of the past two months, however, I’ve really changed my mind. I now see the scans as a back-up and a tool of last resort. Let me explain why.

As our comic books have rolled in, we have been cataloguing them and tagging them with a unique identifying number. This week they were all arranged chronologically and deposited on industrial shelves in the WWC office. It only takes a glance to see the importance of dealing with physical objects rather than electronic copies. For instance, simply looking at the books on the shelves highlights the way that squarebound comic books were much more common in the 1960s than may be generally assumed. Moreover, look at the spine for Lone Ranger #47, with its factoids about jaguars. In the words of Nick Sousanis, “it’s almost a random encyclopedic entry and nonsense poetry all at once”. We don’t have an electronic copy of this book, but if we did and the spine weren’t scanned, I think we would be missing something important.


More generally, the physical arrangement of the comic books on the shelves tells its own story: you can actually watch the books shrink in physical size, for example. Moreover, we wind up with somewhat wacky looking shelves like this one:

 Magazine-sized comic books stand out like a sore thumb in the 1970s


Here we see a series of magazine-sized comic books all arriving in a short time span: Mads, Crackeds, Crazys, Spirits, Eeries, and even Planet of the Apes. While we have earlier magazine-sized comics (including a Humbug on the top shelf), nowhere else do we have this many of them, and by the mid-1990s they almost completely disappear except for a couple of late-era Mads.

Similarly, we can see at a glance the rising move towards graphic novels and collected editions that begin to show up in volume only on our final three shelves, where we find several books of several hundred pages that are hell-bent on skewing our page count averages.

 Our comics get thicker as the years go by...


We’re still on the hunt for more books from our corpus, of course. Sometime soon we should have a better sense of which comics might be impossible to find in any form – physical or digital – and we will then need to address that issue through a selective re-sampling. This is the main reason that we have not yet published the corpus list to our site – we know that this is likely to be only the semi-final version. Nonetheless, we’re excited to be sharing the corpus document here in the near future and we will write about it in greater detail at that time.

Acquisitions Mode

We've been horribly negligent in terms of updating this blog, and I'm hoping to write something a little more substantial in the next few days. We (Ben Woo and Bart Beaty) provided an update to the comics scholars community earlier this month at ICAF in Seattle that is certainly worth expanding on here for the record. This isn't that post.

In a nutshell: This past summer we concluded the work of assembling the What Were Comics? sampling frame. This meant cross-checking our three data sources (the Grand Comics Database, the Overstreet Price Guide, and the database at MyComicShop) for works that are in all three. The resulting Excel spreadsheet was then organized by year, and we randomly sampled to generate a corpus that amounts to two per cent of the comics in our sampling frame, by year and rounded up. This is the What Were Comics? corpus and it amounts to 3,563 comic books.

 Hand-checked copy of Overstreet required seven highlighters...


So. We are now in the process of acquiring those comic books. Ideally, we would like to have physical and digital copies of each of them. Realistically, this may not be possible. For instance, our corpus includes four different issues of Gulf Funny Weekly from 1934 to 1940, and we have yet to find any copies for sale. It also includes things like Action Comics #17. MyComicShop currently has two of those for sale, both of which are incomplete and which cost in the hundreds of dollars. Budget realities will work against us.

 Off to the library with this lot, space cleared for the What Were Comics? corpus


That said, we are making progress. This month, I donated my collection of American comic books to Special Collections at the University of Calgary, but before I did I rifled through it for comics that appear in our corpus. At the same time, we have begun purchasing what we can - including buying more than 1,500 of the comics today in one fell swoop.

This has been an interesting exercise in comics buying. Notably, the vast majority of what we have bought so far has cost us less than $2 per book. It turns out that random Black Lightning comics from the 1970s are not that valuable. We are currently focussed on the lowest hanging fruit, acquiring as much as possible at prices below $20 per book. Once the dust settles, we will see about getting the older works that we need.

I think that the most interesting thing that I have noticed so far is that two categories of books are trickiest to find: the first is reasonably contemporary children's comics (like Nickelodeon books), which are probably not in high demand and which don't have a big secondary market. The second is almost any kind of romance comic, which is causing huge gaps in the Excel spreadsheet. War comics from the 1950s? Easy. Western comics from the 1950s? Here you go! Romance comics from the 1950s? No way. There is certainly something substantial to be written here about gender and collecting...

Speaking of collecting, the one thing I was telling people at ICAF was that it was striking how the random sample turned up so few "important" comics. No Fantastic Four #1, no Detective Comics #27, and so on. This is mostly good news for the budget conscious researcher. However, yesterday I went to buy Marvel Super Heroes (2nd series) #8 from 1992, fully expecting it to cost about $1.70. Imagine my surprise, therefore, to learn that it is a "historically significant" comic book:

 Squirrel Girl is REALLY popular!


So, the first appearance of current fan favourite Squirrel Girl is highly collectible. We've passed on this comic at those prices - just can't mentally justify it. We'll have to hope her stock goes down before the end of our project.

So. Sampling frame = Done. Corpus = Complete. Acquisitions = Ongoing. Coding Protocols = Drafted. Data Entry = About to begin beta-testing.

Back soon to talk about a couple of the larger issues. If you'd like to donate a copy of Marvel Super Heroes #8 to comics research, hit us up!

Where Are We Now? (II)

We’ve been quiet on this blog for far too long, and I thought it was time that we let you in on what the What Were Comics? team is up to as we approach the end of 2016.

In a nutshell, the main activity these days is constructing our data-set. As you may recall, our goal is to code two per cent of comic books published in the United States between 1934 and 2014. To do this we need to complete two distinct tasks: first, we need to compile a list of all the comic books published during this period, and then we need to select that two per cent.

To compile the sampling frame of all the comic books published between 1934 and 2014 has been an incredibly onerous task – far more time-consuming than we originally envisioned, and this has been the main reason for our silence all year. Essentially our updates would have read “processing”.


Rather than settle on a potentially biased definition of what a “comic book” is a priori, we decided at the beginning to outsource that definition. We had originally thought that we would define a “comic book” as any publication appearing in two of three existing data-sets: the Grand Comics Database, the Overstreet Price Guide, and MyComicShop. We recently revised that, however, so that a publication needs to appear in all three of these databases, which we think makes it more consistent and fair. I may write something on the topic of comic books that got demoted from our data set based on this rule (spoiler: a lot of regional publications and give-aways so far).
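As a rough illustration of the difference between the two rules, here is a minimal Python sketch. It assumes each source has already been reduced to a set of comparable title strings, which is a big assumption: in practice the three sources do not share identifiers, so matching needs hand-checking. The function name, the toy entries, and the "Regional Giveaway" title are all hypothetical.

```python
from collections import Counter

def build_sampling_frame(sources, min_sources):
    """Return the titles that appear in at least `min_sources` of the sources."""
    counts = Counter()
    for titles in sources.values():
        counts.update(set(titles))  # set() so a title counts once per source
    return {title for title, n in counts.items() if n >= min_sources}

# Hypothetical toy data; the real sources hold hundreds of thousands of entries.
sources = {
    "gcd": {"Young Romance #42", "Tippy Teen #21", "Regional Giveaway #1"},
    "overstreet": {"Young Romance #42", "Tippy Teen #21"},
    "mycomicshop": {"Young Romance #42", "Tippy Teen #21", "Regional Giveaway #1"},
}

two_of_three = build_sampling_frame(sources, min_sources=2)  # original rule
all_three = build_sampling_frame(sources, min_sources=3)     # revised rule
demoted = two_of_three - all_three  # entries lost by tightening the rule
```

In this toy data the only demoted entry is the give-away, which appears in two sources but not in Overstreet.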

To produce our master list, we exported the data from the Grand Comics Database – over 600,000 individual entries. These were sorted in an Excel spreadsheet by year and then we unleashed our research assistants on that data. We needed them to clean up two things: entries that were undated (which was several hundred thousand items) and obvious duplicates. Undated entries plagued us, as we needed to hand-date them from our other two databases and then enter them into the appropriate year in the Excel sheet. Duplicates are far easier to note. We don’t count newsstand and direct market editions of the same Marvel comic book from the 1980s as different items, for example, so it is just a matter of hitting delete thousands of times. By the time we get to variant covers in the 1990s that is a whole lot of duplication.
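The two clean-up jobs can be sketched in Python as follows. This is only an illustration of the logic, not the process we actually used (that was research assistants working by hand in Excel). The `Entry` fields, the `clean` function, and the "Hypothetical Hero" and "Undated Giveaway" records are invented for the example. The sketch treats two records as duplicates only when series, issue, publisher, and year all match, so editions that differ only in their variant label collapse into one while same-named books from different publishers survive.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    series: str
    issue: str
    publisher: str
    year: Optional[int]  # None when the source gives no date
    variant: str = ""    # e.g. "newsstand", "direct"

def clean(entries):
    """Drop duplicate editions, then split into dated and undated records."""
    seen = set()
    dated, undated = [], []
    for e in entries:
        key = (e.series, e.issue, e.publisher, e.year)  # variant is ignored
        if key in seen:
            continue  # same book, different edition or cover
        seen.add(key)
        (undated if e.year is None else dated).append(e)
    dated.sort(key=lambda e: (e.year, e.series, e.issue))
    return dated, undated

# Hypothetical export rows.
entries = [
    Entry("Hypothetical Hero", "1", "Marvel", 1984, "newsstand"),
    Entry("Hypothetical Hero", "1", "Marvel", 1984, "direct"),
    Entry("Animal Adventures", "1", "Timor", 1953),
    Entry("Animal Adventures", "1", "Premier Magazines", 1953),
    Entry("Undated Giveaway", "1", "Unknown", None),
]

dated, undated = clean(entries)
```

The undated pile still has to be dated by hand from the other sources, and a key match is only a candidate duplicate until someone has checked it.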

So, we are extremely grateful to our group of research assistants who undertook that first step this year. It wound up taking most of the year to accomplish.

Currently we are, well, I am going through each year chronologically and doing a final cull. Beginning with the dataset provided by GCD I am first confirming that the comics appear in MyComicShop. When I find one that does not, it is deleted. When I find an odd-sounding one, I mark that and then look it up by hand in this year’s Overstreet (my copy is very well-thumbed already). If it’s not in Overstreet then it is also deleted.

There are two main reasons that I’ve chosen to do this fairly boring task myself (each year of data takes a couple of hours to run through – I’m listening to a lot of podcasts). The first is that it’s not that boring. Simply reading a list of comic books published year by year is quite enlightening. You can literally see changes to the industry as they appear over time and it serves as an immediate historical corrective to certain longstanding assumptions. Read a list of comic books published during the Second World War and you will quickly come to realize that comics studies has dramatically overemphasized the role of the superhero during that period, for instance.

The second reason is that there are still a lot of decisions to be made, and I’d prefer a team lead do this work as opposed to a research assistant. Take this as but one example. This morning I opened the spreadsheet to 1953 and entry number 166 is Animal Adventures #1 and number 167 is Animal Adventures #1. An obvious duplicate, right? Well, not so fast. The first of these is listed as Timor and the second is Premier Magazines. Did two different publishers release a comic book with the same name in the same year? It’s certainly possible.

Turning to mycomicshop, I note that there are two titles called Animal Adventures. The first, beginning in 1953, is from Timor (#166 in our dataset) and the second, from Accepted Publications, is dated 1955. Now we have an issue. Are Premier Magazines and Accepted Publications the same company? And even if they are, did their version of Animal Adventures come out in 1953 (as GCD claims) or 1955 (as mycomicshop claims)? Clicking through on mycomicshop gives us these two covers:


So, the Accepted version is a reprint of the Timor edition. Going back to the GCD, it turns out that the Accepted version is actually a reprint of the first three issues (actually, the only three issues) of the Timor series. This makes it a different comic, and it must be included in our dataset. But here’s the problem: GCD says that the Accepted version is undated. So what year? Well, mycomicshop says 1955. Turning to Overstreet, their listing shows the three Timor issues from 1953 and 1954, plus an undated reprint of all three in a single issue from Accepted and, again, no date. So in the end we have two votes for no date, and one vote for 1955.

Turning to ComicBookPlus, they have copies of the first and third issues of this series from Timor, and the first issue is clearly dated December 1953 in the indicia. They don’t have the reprint, so no more clarity is added. So, now the judgement call: the Accepted version has to be in our dataset as it is in all three sources, but where does it go? We don’t have an undated area (though perhaps we should?) and so, given one data point, I’ve moved it to 1955 where mycomicshop believes it belongs. All that work to remove one entry from the year and add one entry to another year. And there is only a two per cent chance that the comic even winds up in our sample (though, at this point, having invested the labour, I’m rooting for it).

When that process is completed (a couple of hours per year) I re-sort the spreadsheet and run the sample.

To do the sampling we are relying on an online random integer generator. I enter the number of comic books for the year (for 1952: 2,895) and then ask it to generate a list of, in this case, 58 entries (two per cent, rounded up). That gets me a random list of numbers that correspond to rows in the spreadsheet and I swap one for the other. Thus, 2878 becomes Young Romance #42. As a final step, I take a look at ComicBookPlus and the Digital Comics Museum to see if scans of those comics are available. Generally about forty to sixty per cent are, but that is about to end as we reach the end of the fair use exemption for those sites.
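For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the per-year draw. It swaps in Python's `random` module for the online generator, and it assumes the draw is made without replacement so that no row is picked twice. The function names and the seed are hypothetical.

```python
import random

def sample_size(n_comics, percent=2):
    """Comics to draw for one year: `percent` of the total, rounded up.

    Integer arithmetic avoids floating-point surprises when rounding.
    """
    return (n_comics * percent + 99) // 100

def draw_rows(n_comics, percent=2, seed=None):
    """Pick distinct spreadsheet row numbers (1-based) for one year."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    k = sample_size(n_comics, percent)
    return sorted(rng.sample(range(1, n_comics + 1), k))

# 1952 has 2,895 entries in the sampling frame, so we draw 58 rows.
rows_1952 = draw_rows(2895, seed=1952)
```

Each number in `rows_1952` is then swapped for the comic on that row of the year's sorted spreadsheet. Because every year is rounded up separately, the full corpus comes out slightly above two per cent of the frame.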

At this point we’ve completed about one quarter of our sampling, and we hope to be finished by the end of January. At that point we will begin the process of acquiring the rest of the comics that we don’t have and then we will move into beta-testing our app. We expect to take a couple of months in 2017 to iron out issues with the app and with the coding protocols, and, with luck, by spring 2017 we will begin entering data in earnest. Then the fun will really begin.


I hope to post a couple of other updates over the next couple of weeks, including a report on some preliminary findings that were shared at the Superhero Identities conference in Melbourne at the beginning of the month. In the meantime, I’ve been amusing myself while I slog through the data by posting oddities from the database on Twitter. Follow us at @whatwerecomics if you’re interested in seeing bizarre public service comics about tree farms show up in your feed!

Where Are We Now?

 Tippy Teen, as drawn by the great Harry Lucey, finds a problem in the coding protocols...


We've gone two months without updating this blog, but I would hate to let you think that we haven't been working all summer. Here's where we are at right now:

1. Data curation. Having exported the GCD database we are trying to determine the scope of comic book publishing between 1934 and 2014 so that we can develop our random sample. This requires eliminating a lot of items that are included in GCD that we don't want included. One example (that doesn't actually impact us because it falls one month outside our timeframe): Star Wars #1 shipped in January this year with, I believe, seventy different variant covers. We don't count variants as different comics, so this would need to show up in our database once, not seventy times. In part that's as simple as selecting and deleting sixty-nine instances of the book from an Excel spreadsheet. The problem is that the comics industry loves variants (or has loved them in the past) and there are literally tens of thousands of them and we have to confirm each one by hand. Slow-going. We now have four students working on this issue, and we hope to be done in the next few weeks.

2. Finalizing the Sampling Frame. Related to this, for our purposes a comic book is only a comic book if it appears in two of three databases (GCD, mycomicshop, or the Overstreet Guide). We have been cross-referencing GCD and mycomicshop all summer, but in late August we will have to cross-reference against Overstreet. Fortunately, I am expecting two copies of the new Overstreet Guide to arrive this week, so we can finally move on to this task.

3. Data entry tool. As this is a big data project, we need a consistent way for our coders to input data that will be usable in the long term. To this end, we have hired a computer programmer to build us a unique tool that will allow each of our coders in the next step to connect to the database and input findings. We are in the final stages of this project, which has been slowed somewhat by the research team's constant "what if we added this function?" requests. We are the type of people who drive our programmers crazy.

4. Coding protocols. This has been the big one. Last week, Ben Woo came to Calgary for four days of meetings. This was the first time that the entire team had been able to sit down in person to discuss the project. Over the course of a couple of days we drafted a fifteen-page coding guide and a supplemental booklet of illustrations and examples that ran to another hundred pages. We then shared this with several students and asked them to read it and code one comic book: Tippy Teen #21 (which you can read online here). We then met at the Visualization Studio at the University of Calgary to compare notes on the data. Discussing the coding with the students made it very clear which parts of our protocols need to be more precise, and we are now revising those instructions. We will likely run a second round of tests in August. Ideally we would like the instructions to be clear enough that any student could step in and read a comic book and code it the same as any other student. The reality is that people are people and this will never be the case, but we're trying to eliminate as much of the guess work as possible.
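One simple way to put a number on "code it the same as any other student" is the share of panels on which two coders gave the same label. The Python sketch below is only an illustration under that assumption; it is not a measure we have settled on, and the function name and the per-panel labels are hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Fraction of items that two coders labelled identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same number of items")
    if not coder_a:
        raise ValueError("nothing to compare")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Hypothetical labels from two students for the same five panels.
coder_a = ["close-up", "medium", "medium", "long", "close-up"]
coder_b = ["close-up", "medium", "long", "long", "medium"]

agreement = percent_agreement(coder_a, coder_b)  # 3 of 5 panels match
```

Raw agreement does not correct for matches that happen by chance, so it is best read as a quick way to spot which parts of the protocol need tightening.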

In our initial test we encountered two significant areas of confusion. First, we are trying to define "shot lengths" (to borrow a term from film studies). That is, at what distance are the characters from the reader? Close-up, medium length, extremely long? These terms are simple to define, but in comics they are not so easy to code. In film, at least in the pre-CGI era, coding shot lengths is relatively easy because you identify what is in sharpest focus - if Cary Grant is in the centre of the screen with people at a party behind him that are not in focus, you code for Cary Grant. In comics, though, everything can be "in focus" because there is no technological limitation on the drawing. This is something that we need to define. Ideas like "the element that is the focal point of the panel" have proved far too vague.

Second, the overall design of the page has proved to be a total bear. We probably argued for three hours about this one day, came to a resolution, and then argued about it for three hours again the next day. Notions of the grid, and, more importantly, types of grids have already become our bête noire. We are planning a longer post on this topic - and possibly a journal article - coming soon. Suffice to say: the entire concept of the "tier" in comics studies is ridiculously fraught because of the myriad innovative ways that cartoonists find to lay out their pages for maximum visual interest.

So that is where we are at as of the end of July 2015. We are optimistic that we will have our coding book, data collection tool, and random sample ready to go for September, which is when the real work will begin.