Acquisitions Mode

We've been horribly negligent in terms of updating this blog, and I'm hoping to write something a little more substantial in the next few days. We (Ben Woo and Bart Beaty) provided an update to the comics scholars community earlier this month at ICAF in Seattle that is certainly worth expanding on here for the record. This isn't that post.

In a nutshell: This past summer we concluded the work of assembling the What Were Comics? sampling frame. This meant cross-checking our three data sources (the Grand Comics Database, the Overstreet Price Guide, and the database at MyComicShop.com) for works that are in all three. The resulting Excel spreadsheet was then organized by year, and we randomly sampled to generate a corpus that amounts to two per cent of the comics in our sampling frame, by year and rounded up. This is the What Were Comics? corpus and it amounts to 3,563 comic books.

Hand-checked copy of Overstreet required seven highlighters...

So. We are now in the process of acquiring those comic books. Ideally, we would like to have physical and digital copies of each of them. Realistically, this may not be possible. For instance, our corpus includes four different issues of Gulf Funny Weekly from 1934 to 1940, and we have yet to find any copies for sale. It also includes things like Action Comics #17. MyComicShop currently has two of those for sale, both of which are incomplete and which cost in the hundreds of dollars. Budget realities will work against us.

Off to the library with this lot, space cleared for the What Were Comics? corpus

That said, we are making progress. This month, I donated my collection of American comic books to Special Collections at the University of Calgary, but before I did I rifled through it for comics that appear in our corpus. At the same time, we have begun purchasing what we can - including buying more than 1,500 of the comics today in one fell swoop.

This has been an interesting exercise in comics buying. Notably, the vast majority of what we have bought so far has cost us less than $2 per book. It turns out that random Black Lightning comics from the 1970s are not that valuable. We are currently focussed on the lowest hanging fruit, acquiring as much as possible at prices below $20 per book. Once the dust settles, we will see about getting the older works that we need.

I think that the most interesting thing that I have noticed so far is that two categories of books are trickiest to find. The first is reasonably contemporary children's comics (like Nickelodeon books), which are probably not in high demand and which don't have a big secondary market. The second is almost any kind of romance comic, and these are causing huge gaps in the Excel spreadsheet. War comics from the 1950s? Easy. Western comics from the 1950s? Here you go! Romance comics from the 1950s? No way. There is certainly something substantial to be written here about gender and collecting...

Speaking of collecting, the one thing I was telling people at ICAF was that it was striking how the random sample turned up so few "important" comics. No Fantastic Four #1, no Detective Comics #27, and so on. This is mostly good news for the budget-conscious researcher. However, yesterday I went to buy Marvel Super Heroes (2nd series) #8 from 1992, fully expecting it to cost about $1.70. Imagine my surprise, therefore, to learn that it is a "historically significant" comic book:

Squirrel Girl is REALLY popular!

So, the first appearance of current fan favourite Squirrel Girl is highly collectible. We've passed on this comic at those prices - just can't mentally justify it. We'll have to hope her stock goes down before the end of our project.

So. Sampling frame = Done. Corpus = Complete. Acquisitions = Ongoing. Coding Protocols = Drafted. Data Entry = About to begin beta-testing.

Back soon to talk about a couple of the larger issues. If you'd like to donate a copy of Marvel Super Heroes #8 to comics research, hit us up!

Where Are We Now? (II)

We’ve been quiet on this blog for far too long, and I thought it was time that we let you in on what the What Were Comics? team is up to as we approach the end of 2016.

In a nutshell, the main activity these days is constructing our data-set. As you may recall, our goal is to code two per cent of comic books published in the United States between 1934 and 2014. To do this we need to complete two distinct tasks: first, we need to compile a list of all the comic books published during this period, and then we need to select that two per cent.

Compiling the sampling frame of all the comic books published between 1934 and 2014 has been an incredibly onerous task – far more time-consuming than we originally envisioned, and this has been the main reason for our silence all year. Essentially our updates would have read “processing”.

 

Rather than settle on a potentially biased definition of what a “comic book” is a priori, we decided at the beginning to outsource that definition. We had originally thought that we would define a “comic book” as any publication appearing in two of three existing data-sets: the Grand Comics Database, the Overstreet Price Guide, and mycomicshop.com. We recently revised that, however, so that a publication needs to appear in all three of these databases, which we think makes it more consistent and fair. I may write something on the topic of comic books that got demoted from our data set based on this rule (spoiler: a lot of regional publications and give-aways so far).
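The revised rule, and the list of titles it demotes, can be expressed with a few set operations. What follows is a minimal sketch in Python, not the team's actual Excel workflow; the function name `split_by_rule` and the placeholder entries are assumptions made purely for illustration.

```python
def split_by_rule(gcd, overstreet, mycomicshop):
    """Split entries into those in all three sources and those 'demoted'.

    'Demoted' here means: would have qualified under the old two-of-three
    rule, but does not appear in all three sources.
    """
    all_three = gcd & overstreet & mycomicshop
    two_or_more = (
        (gcd & overstreet) | (gcd & mycomicshop) | (overstreet & mycomicshop)
    )
    return all_three, two_or_more - all_three

# Placeholder entries (hypothetical, not real catalogue data).
gcd = {"A", "B", "C"}
overstreet = {"A", "B"}
mycomicshop = {"A", "C", "D"}

kept, demoted = split_by_rule(gcd, overstreet, mycomicshop)
print(sorted(kept), sorted(demoted))
```

In this toy example only "A" stays in the sampling frame, while "B" and "C" are the kind of entries that the tighter rule drops.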

To produce our master list, we exported the data from the Grand Comics Database – over 600,000 individual entries. These were sorted in an Excel spreadsheet by year and then we unleashed our research assistants on that data. We needed them to clean up two things: entries that were undated (which was several hundred thousand items) and obvious duplicates. Undated entries plagued us, as we needed to hand-date them from our other two databases and then enter them into the appropriate year in the Excel sheet. Duplicates are far easier to note. We don’t count newsstand and direct market editions of the same Marvel comic book from the 1980s as different items, for example, so it is just a matter of hitting delete thousands of times. By the time we get to variant covers in the 1990s that is a whole lot of duplication.
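For readers who think in code, the duplicate clean-up looks roughly like the sketch below. It is an illustration only: the work was actually done by hand in Excel, and the field names (`title`, `issue`, `publisher`, `edition`) and sample rows are hypothetical.

```python
def drop_variant_duplicates(rows):
    """Keep the first row seen for each (title, issue, publisher) key.

    Edition and cover variants share a key, so they collapse to one row.
    The publisher is part of the key so that same-named series from
    different publishers are not merged by accident.
    """
    seen = set()
    unique = []
    for row in rows:
        key = (row["title"], row["issue"], row["publisher"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical rows, for illustration only.
rows = [
    {"title": "Example Comic", "issue": 1, "publisher": "P1", "edition": "newsstand"},
    {"title": "Example Comic", "issue": 1, "publisher": "P1", "edition": "direct market"},
    {"title": "Example Comic", "issue": 1, "publisher": "P2", "edition": "newsstand"},
]
deduped = drop_variant_duplicates(rows)
print(len(deduped))
```

A key this simple cannot tell a variant from a genuine reprint edition, which is why the judgement calls were left to people.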

So, we are extremely grateful to our group of research assistants who undertook that first step this year. It wound up taking most of the year to accomplish.

Currently we are, well, I am going through each year chronologically and doing a final cull. Beginning with the dataset provided by GCD I am first confirming that the comics appear in MyComicShop. When I find one that does not, it is deleted. When I find an odd-sounding one, I mark that and then look it up by hand in this year’s Overstreet (my copy is very well-thumbed already). If it’s not in Overstreet then it is also deleted.

There are two main reasons that I’ve chosen to do this fairly boring task myself (each year of data takes a couple of hours to run through – I’m listening to a lot of podcasts). The first is that it’s not that boring. Simply reading a list of comic books published year by year is quite enlightening. You can literally see changes to the industry as they appear over time and it serves as an immediate historical corrective to certain longstanding assumptions. Read a list of comic books published during the Second World War and you will quickly come to realize that comics studies has dramatically overemphasized the role of the superhero during that period, for instance.

The second reason is that there are still a lot of decisions to be made, and I’d prefer a team lead do this work as opposed to a research assistant. Take this as but one example. This morning I opened the spreadsheet to 1953 and entry number 166 is Animal Adventures #1 and number 167 is Animal Adventures #1. An obvious duplicate, right? Well, not so fast. The first of these is listed as Timor and the second is Premier Magazines. Did two different publishers release a comic book with the same name in the same year? It’s certainly possible.

Turning to MyComicShop.com I note that there are two titles called Animal Adventures. The first, beginning in 1953, is from Timor (#166 in our dataset) and the second, from Accepted Publications, is dated 1955. Now we have an issue. Are Premier Magazines and Accepted Publications the same company? And even if they are, did their version of Animal Adventures come out in 1953 (as GCD claims) or 1955 (as mycomicshop claims)? Clicking through on mycomicshop gives us these two covers:

 

So, the Accepted version is a reprint of the Timor edition. Going back to the GCD, it turns out that the Accepted version is actually a reprint of the first three issues (actually, the only three issues) of the Timor series. This makes it a different comic, and it must be included in our dataset. But here’s the problem: GCD says that the Accepted version is undated. So what year? Well, mycomicshop says 1955. Turning to Overstreet, their listing shows the three Timor issues from 1953 and 1954, plus an undated reprint of all three in a single issue from Accepted and, again, no date. So in the end we have two votes for no date, and one vote for 1955.

Turning to ComicBookPlus, they have copies of the first and third issues of this series from Timor, and the first issue is clearly dated December 1953 in the indicia. They don’t have the reprint, so no more clarity is added. So, now the judgement call: the Accepted version has to be in our dataset as it is in all three sources, but where does it go? We don’t have an undated area (though perhaps we should?) and so, given one data point, I’ve moved it to 1955 where mycomicshop believes it belongs. All that work to remove one entry from the year and add one entry to another year. And there is only a two per cent chance that the comic even winds up in our sample (though, at this point, having invested the labour, I’m rooting for it).
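The judgement call above follows a simple pattern: ignore the sources that give no date, and go with whatever year remains. Here is a hedged sketch of that pattern in Python. The function `resolve_year` is hypothetical, and the majority-vote behaviour when sources disagree is my assumption, not a rule the project has adopted.

```python
from collections import Counter

def resolve_year(reported):
    """Pick a year from per-source reports, where None means 'undated'.

    Returns the most frequently reported year, or None if every source
    is undated. Ties go to whichever year was reported first.
    """
    years = [year for year in reported.values() if year is not None]
    if not years:
        return None
    return Counter(years).most_common(1)[0][0]

# The Accepted reprint of Animal Adventures, as described above.
accepted_reprint = {"GCD": None, "Overstreet": None, "mycomicshop": 1955}
print(resolve_year(accepted_reprint))
```

With two sources silent and one saying 1955, the function lands where I did; a real disagreement between dated sources would still need a human to look at the indicia.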

When that process is completed (a couple of hours per year) I re-sort the spreadsheet and run the sample.

To do the sampling we are relying on the random integer generator at random.org. I enter the number of comic books for the year (for 1952: 2,895) and then ask it to generate a list of, in this case, 58 entries (two per cent, rounded up). That gets me a random list of numbers that correspond to rows in the spreadsheet and I swap one for the other. Thus, 2878 becomes Young Romance #42. As a final step, I take a look at ComicBook+ and the Digital Comics Museum to see if scans of those comics are available. Generally about forty to sixty per cent are, but that is about to end as we reach the end of the fair use exemption for those sites.
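The same draw can be sketched in a few lines of Python. This is a stand-in and not what we actually ran: it uses Python's built-in pseudo-random generator instead of random.org, it assumes row numbers are drawn without repeats, and the function name `sample_rows` is made up for the example.

```python
import random

def sample_rows(n_comics, percent=2, rng=None):
    """Draw ceil(percent% of n_comics) distinct row numbers from 1..n_comics."""
    rng = rng or random.Random()
    # Integer arithmetic for the round-up avoids floating-point surprises.
    k = (n_comics * percent + 99) // 100
    return sorted(rng.sample(range(1, n_comics + 1), k))

# 1952 had 2,895 comic books in the frame, so two per cent rounds up to 58.
rows_1952 = sample_rows(2895, rng=random.Random(1952))
print(len(rows_1952))
```

Each number in `rows_1952` would then be swapped for the comic sitting in that row of the year's spreadsheet. The fixed seed is there only so the example is repeatable.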

At this point we’ve completed about one quarter of our sampling, and we hope to be finished by the end of January. At that point we will begin the process of acquiring the rest of the comics that we don’t have and then we will move into beta-testing our app. We expect to take a couple of months in 2017 to iron out issues with the app and with the coding protocols, and, with luck, by spring 2017 we will begin entering data in earnest. Then the fun will really begin.

--

I hope to post a couple of other updates over the next couple of weeks, including a report on some preliminary findings that were shared at the Superhero Identities conference in Melbourne at the beginning of the month. In the meantime, I’ve been amusing myself while I slog through the data by posting oddities from the database on Twitter. Follow us at @whatwerecomics if you’re interested in seeing bizarre public service comics about tree farms show up in your feed!

Where Are We Now?

Tippy Teen, as drawn by the great Harry Lucey, finds a problem in the coding protocols...


We've gone two months without updating this blog, but I would hate to let you think that we haven't been working all summer. Here's where we are at right now:

1. Data curation. Having exported the GCD database we are trying to determine the scope of comic book publishing between 1934 and 2014 so that we can develop our random sample. This requires eliminating a lot of items that are included in GCD that we don't want included. One example (that doesn't actually impact us because it falls one month outside our timeframe): Star Wars #1 shipped in January this year with, I believe, seventy different variant covers. We don't count variants as different comics, so this would need to show up in our database once, not seventy times. In part that's as simple as selecting and deleting sixty-nine instances of the book from an Excel spreadsheet. The problem is that the comics industry loves variants (or has loved them in the past) and there are literally tens of thousands of them and we have to confirm each one by hand. Slow-going. We now have four students working on this issue, and we hope to be done in the next few weeks.

2. Finalizing the Sampling Frame. Related to this, for our purposes a comic book is only a comic book if it appears in two of three databases (GCD, mycomicshop.com, or the Overstreet Guide). We have been cross-referencing GCD and mycomicshop all summer, but in late August we will have to cross-reference against Overstreet. Fortunately, I am expecting two copies of the new Overstreet Guide to arrive this week, so we can finally move on to this task.

3. Data entry tool. As this is a big data project, we need a consistent way for our coders to input data that will be usable in the long term. To this end, we have hired a computer programmer to build us a unique tool that will allow each of our coders in the next step to connect to the database and input findings. We are in the final stages of this project, which has been slowed somewhat by the research team's constant "what if we added this function?" requests. We are the type of people who drive our programmers crazy.

4. Coding protocols. This has been the big one. Last week, Ben Woo came to Calgary for four days of meetings. This was the first time that the entire team had been able to sit down in person to discuss the project. Over the course of a couple of days we drafted a fifteen-page coding guide and a supplemental booklet of illustrations and examples that ran to another hundred pages. We then shared this with several students and asked them to read it and code one comic book: Tippy Teen #21 (which you can read online here). We then met at the Visualization Studio at the University of Calgary to compare notes on the data. Discussing the coding with the students made it very clear which parts of our protocols need to be more precise, and we are now revising those instructions. We will run a second round of tests likely in August. Ideally we would like the instructions to be clear enough that any student could step in and read a comic book and code it the same as any other student. The reality is that people are people and this will never be the case, but we're trying to eliminate as much of the guesswork as possible.

In our initial test we encountered two significant areas of confusion. First, we are trying to define "shot lengths" (to borrow a term from film studies). That is, at what distance are the characters from the reader? Close-up, medium length, extremely long? These terms are simple to define, but in comics they are not so easy to code. In film, at least in the pre-CGI era, coding shot lengths is relatively easy because you identify what is in sharpest focus - if Cary Grant is in the centre of the screen with people at a party behind him that are not in focus, you code for Cary Grant. In comics, though, everything can be "in focus" because there is no technological limitation on the drawing. This is something that we need to define. Ideas like "the element that is the focal point of the panel" have proved far too vague.

Second, the overall design of the page has proved to be a total bear. We probably argued for three hours about this one day, came to a resolution, and then argued about it for three hours again the next day. Notions of the grid, and, more importantly, types of grids have already become our bête noire. We are planning a longer post on this topic - and possibly a journal article - coming soon. Suffice to say: the entire concept of the "tier" in comics studies is ridiculously fraught because of the myriad innovative ways that cartoonists find to lay out their pages for maximum visual interest.

So that is where we are at as of the end of July 2015. We are optimistic that we will have our coding book, data collection tool, and random sample ready to go for September, which is when the real work will begin.

First Data

When I typed the title “First Data” above, initially it came out as “First Date”. I almost kept that in, because in some ways diving into our data set for the first time is exactly like getting to know someone - there is some excitement, some trepidation, some sense of not wanting to move too fast, or to screw anything up.

Yesterday, thanks to some help from the great Rodrigo Baeza, we were able to look at our master list for the first time. If you’re just joining us, in our attempt to sample two per cent of all “American comic books” published between 1934 and 2014, we have worked out a preliminary definitional system that something is an “American comic book” if it was published in the United States and if it appears in at least two of these three sources: the Grand Comics Database; the Overstreet Price Guide; and mycomicshop.com. For the underground period, we are also using Jay Kennedy’s price guide, since Overstreet intentionally omits underground comix. 

So far, so good.

Our first step has been to use the MySQL data dump from the GCD and to sort it in Excel. This is primitive, to be sure, but all we are interested in at this point in time is making a series of eighty-one annual lists that can be cross-checked against the other reference sources and then randomized. 

At first glance, our data dump includes 293,381 items. Some of these were cut immediately: things published before 1934, and things published in 2015 (if we do add 2015, it will be after the year has concluded). Simple. 

Sorting by year and moving comics to individual pages was also a cut and dried process for the most part. Interestingly, just moving all those issues electronically provides a good historical overview of the rises and falls in publishing volume over time. We’ll have more to share on this point when we get precise data.

The first big stumbling block came when I looked at all of the comics that have no dates attached to them in the GCD. That number was much higher than I anticipated: 69,630 - or more than twenty-three per cent of all the titles indexed lack this basic piece of information. 

What to do? Well, the obvious answer is that we need to generate that by hand ourselves. Simply beginning at the first entry I began typing titles into the mycomicshop.com database to see what is there. Fortunately, quite a lot is there, including good dating information. An added bonus is that this has already demonstrated that mycomicshop.com is not simply taking data from GCD and duplicating it, which was a back of the mind worry. They are independent of each other, so can be used to corroborate each other. Items with dates in mycomicshop.com are being added to our master list, and items that are absent are being flagged as absent so that they can be checked against Overstreet for inclusion or exclusion. The bad news? It’s a time-consuming and dull process. I checked about 300 records in an hour. At that pace, it will take a research assistant six and a half weeks of full time work just to put dates on entries that are missing them, and even then we will need to check the validity of the 230,000 comics that already had dates. It will be a long summer.
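For anyone who wants to check the arithmetic behind that estimate, here it is as a short Python sketch. The counts and the checking pace come from this post; the 35-hour working week is my assumption, chosen because it roughly reproduces the six-and-a-half-week figure.

```python
undated = 69_630          # entries with no date in the GCD dump
total = 293_381           # all entries in the dump
records_per_hour = 300    # observed checking pace
hours_per_week = 35       # assumption: length of a full-time week

share = undated / total
hours = undated / records_per_hour
weeks = hours / hours_per_week

print(f"{share:.1%} undated, about {hours:.0f} hours, about {weeks:.1f} weeks")
```

That works out to roughly 232 hours of checking; a 40-hour week would bring the estimate down to a little under six weeks.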

One other issue has already raised its head: GCD indexes some material that seems very instinctively to be not comics, like Toyfare Magazine and Amazing Heroes, and mycomicshop.com also lists them for sale. We may need to modify our search parameters based on the stated GCD rule that: “The individual issues of this series are each less than 50% comics. Only comics sequences are indexed and cover scans are accepted only if the issue has 10% indexed comics content.” This would mean restricting our searches to items that are “cover scan eligible” from GCD. Thoughts?

Defining "Comic Books"

Previously I outlined our first four goals with this project this way:

  1. Define what we mean by a “comic book”
  2. Find out how many “comic books” were published in each year
  3. Randomly select two per cent of those to form a data set
  4. Track down electronic or physical copies that are in the data set

Yesterday the three investigators on this project sat down to talk about this for the first time since we received our grant funding from the Social Sciences and Humanities Research Council of Canada. It was an epic four-hour Skype call, much of which addressed that first point.

Here’s the sticking point - this is an interdisciplinary team. Bart Beaty is trained in Communication Studies but works in an English department. Ben Woo is trained in Communications and works in Communications. Nick Sousanis is trained in Art Education and is hoping to continue in that area. Nick is a practicing artist, Ben was an aspiring artist, Bart can’t draw a lick. So we all come to this project with slightly different backgrounds and interests. We all want to use the data set that we will produce to do slightly different things. This is, we believe, one of the strengths of the team-based approach. But it also means that we have to initially get on the same page before we can even take the first steps.

So: what are we actually studying? “American comic books”. Great. But what are those exactly? Are we referring to a publishing format? If so, we need to start from the fact that the “American comic book” has been 64 pages, 52 pages, 36 pages, even occasionally 80 or 100 pages, and that its dimensions are not constant. If we focus on format we lose magazine-sized comics from Warren and Fantagraphics and we lose digest-sized comics from Archie, even though all of those things seem, intuitively, to be comics. Moreover, we predetermine a good deal of our results. Format, distribution channel, publisher - all of these were considered and rejected as inadequate starting points. 

In the end we returned to something that I wrote in my book Comics Versus Art:

By conceptualizing comics as the products of a particular social world, rather than as a set of formal strategies, it is possible to highlight the various conventions that are frequently used by comics artists. While most theorists of comics have come to identify certain traits (sequential images, word/text relations, continuing characters, reproducibility, word balloons, and so on) as essential to the comics form, given the difficulties presented by border cases, of which children's illustrated picture books and artist’s book are but two, it seems much more productive to say that these traits are merely conventions of the comics form, rather than defining elements of it.

The key for us moving forward is to study the “social world” of “American comic books”. In a tautological way, you might say that we want to study the “American comic books” that are recognized as “American comic books” by people who create, sell, and read “American comic books”.

So how to do that? 

In our preliminary work so far we have been working with the data in the Grand Comics Database. This has been an incredibly useful launching point, but it is not without problems for us because they seem to be more expansive in their definitions than we might be. One solution would simply be to defer to them and use everything that they index, including Ballyhoo and other collections of gag cartoons, and the Spirit Supplements. This seemed to us to be ceding too much authority to a group that has a slightly different mandate than we do.

The solution that we arrived at was to use a three-pronged approach. We will include in our master list any “American comic book” that is included in at least two of these three reference sites: the Grand Comics Database; the Overstreet Price Guide; and MyComicShop.com.

The rationale for these selections is as follows:

The GCD is the most comprehensive public database attempting to catalogue comics production around the world.

The Overstreet Price Guide has been the standard reference work for comic book collectors for more than forty years. While it is admittedly more selective than the GCD, it has had an indelible influence on the self-image of the “comics world” for decades.

MyComicShop.com was selected because it is an extensive site that lists items for sale even when they are not in stock. It also has a very friendly interface.

We are very interested in hearing suggestions about other sites or indices that seek to be roughly comprehensive descriptors of comic book culture. We did consider The Standard Catalog of Comic Books, but as it has not been updated since 2005 we ruled it out.

So, essentially, we will be making a master list of everything that appears in at least two of these three sources. Relative to our previous posts, this would exclude Ballyhoo (only found in GCD) but would include the Spirit Supplements (found in all three). So that argument is concluded for now.
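Stated as code, the working rule is just a count of how many sources list an item. The following is a minimal Python sketch; the function name `in_master_list` and the `minimum` parameter are hypothetical, and the two calls use the Ballyhoo and Spirit Supplements cases exactly as described above.

```python
SOURCES = {"GCD", "Overstreet", "MyComicShop"}

def in_master_list(found_in, minimum=2):
    """Return True if an item appears in at least `minimum` of the three sources."""
    return len(set(found_in) & SOURCES) >= minimum

ballyhoo = in_master_list({"GCD"})
spirit_supplements = in_master_list({"GCD", "Overstreet", "MyComicShop"})
print(ballyhoo, spirit_supplements)
```

Keeping the threshold as a parameter makes it easy to see what a stricter rule would do: with `minimum=3`, only items found in all three sources would qualify.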

A few other notes on our initial assumptions:

On reprints. We will be including reprints as separate items. This was contentious, and the question became: Are we studying what was created in the industry at various moments in time, or what was made available to readers? Take Watchmen, for example. We could count Watchmen #1 as a 1986 release and then exclude all of its many reprintings. That would allow us to focus on what was created and when. Yet it seems very important that Watchmen #1 has been reprinted as a trade paperback, a hardcover, an Absolute Edition, as a DC Comics Essentials, particularly as that helps us track the shape of “the comics world” in terms of reprint strategies. Watchmen will just keep turning up. That said, we are only using new editions rather than immediate reprintings to satisfy demand on under-ordered titles.

On translations. We will be including these, where they appear in two of three sources. To exclude them seemed as if it would misrepresent the shape of the field (particularly with the rise of manga). Coding them as a distinct category will allow us to do some interesting things.

On “American”. One of the oddities, given where our funding comes from, is the fact that we are excluding Cerebus and comics published by Drawn and Quarterly. We are not going to include work from Canadian publishers so as to make our data set coherent. One thing that we all agreed on: we hope that we will be able to recruit a graduate student who will use our tools to run a comparative study focusing on Canadian comics. It would be fascinating to see the results.

So, we are beginning with a working definition of our first question, above. We are defining a “comic book” as something that is included in at least two of the three most commonly used databases that define “comic books”.

We are particularly interested in thoughts on this because if we have this step fundamentally wrong it would be better for us to know that before we move on to step two rather than after.

Our next step is to actually compile a master list, from which we will derive our randomized two per cent sample set. That will be a lot of manual labor on the part of our research assistants!

Perpetually Falling Horse

Or, Lost in the Jungle.

This morning I spent some time on the Digital Comics Museum looking at comic books from the 1930s and 1940s that are now in the public domain in advance of thinking through some of our coding protocols. Looking specifically at Jungle Comics 38 (Fiction House, 1943) I came across this page that initially flummoxed me. Here's the page:

You can imagine the story: the man is Kaanga, the blond king of the jungle, and here he is rescuing a woman named Julie who has foolishly tried to ride across a river just upstream from a waterfall.

What is unusual here is the second panel, which stretches across two tiers. Double-height panels are not at all uncommon in comic books of this period, but they seem less likely to appear in the middle of three columns than on either side (something, I suppose, that we'll have to verify empirically). Anyway, my initial presumption was that the page read in this direction:

Confronted with the tall panel in the middle, I assumed that the first move was to go down, then across. Yet if you read the captions, it is clear that this is not the case at all. The reading order is actually like this:

This reading pattern was surprising to me because of the long diagonal move back to the beginning of the second tier that moves through panel two again. Further, it is clear that the reader seems meant to skip the waterfall panel on the second left-to-right read through, otherwise that horse is simply hanging in midair forever. I suppose that might be one explanation for why Kaanga and Julie don't land on the dead horse at the bottom...

I'm not sure, precisely, how we would be meant to read this in terms of the typology that Scott McCloud provides in Understanding Comics. If we are not intended to re-read the waterfall, then the panel itself becomes a sort of over-sized gutter in an action-to-action transition. But that seems completely erroneous. Indeed, it seems that we are meant to go through the waterfall a second time - though maybe without the horse falling on us. 

It's a dangerous business.

Excluding Comics

Given that the central drive of our research project is to define what was “typical” of American comic books over the past eight decades, we need to begin with a working definition of what is an “American comic book”. We have already signalled that problem with a discussion of the Spirit supplements that began in the 1940s, but there are plenty of other occasions where this issue raises its definitionally ugly head.

In order to define typicality, it is our goal to randomly sample two per cent of the comic books published in the United States. We are not interested in “the best” works but wish to work from a completely neutral sample set. This requires several steps:

1. Define what we mean by a “comic book”
2. Determine how many “comic books” were published in each year
3. Randomly select two per cent of those to form a data set
4. Track down electronic or physical copies that are in the data set

One interesting thing that I’ve already discovered is that it is somewhat easier to do step two than it is to do step one.

Right now, we are extremely fortunate that a tool like the Grand Comics Database exists. This volunteer project has been cataloguing comics in the United States and around the world for a long time, and it is a remarkably flexible and searchable database. Over the past few days I have been running some experiments on the GCD data using their Advanced Search functionality, where I can set country of publication to the United States and delimit an annual date range very easily. Let’s consider an example:

Setting the date range to 1970 returns 1,635 items. Even right there on the first page of results, however, we run into a couple of issues. Scrolling down we see Archie and the Generation Gap and Archie’s Girls, both published by Bantam Books. Since I own both of these as a result of writing Twelve-Cent Archie, I can tell you that these are not “comic books” in the sense that we are using that term - they are books that we would like to exclude.

So how to best do that? The simplest way would be to refine our search based on format, by narrowing it to comics that are “saddle stitched” (to use a GCD term). This reduces our count from 1,635 to 1,432 and it eliminates the Bantam Books. Success! But, what if we have eliminated too much with this search? After all, many giant size comics (like Annuals) were square bound. Indeed, a search with “square bound” as the format turns up 411 comics (certain titles, like Action Comics, are listed with both formats). That’s not ideal. Similarly, searching with “cardstock” as the paper stock catches the Bantam Books and others (44 in total). At first glance, it looks like all of those should probably be excluded.

Beginning with “saddle stitched” isn’t a great deal of help because the GCD likely includes magazines that we might want to exclude. We could refine by searching on “standard silver age size” - this brings up 1,065 results - but that makes me wonder how we could have possibly excluded such an enormous proportion of our initial search (almost 600 issues). The GCD has a more expansive definition of comics than what we are using, which is great, but finding exactly the information that we need becomes a challenge.
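The trade-off between these filters can be shown with a toy example. The three records and their field names below are hypothetical stand-ins, not actual GCD entries or the GCD's real schema. The point is only that filtering on binding and filtering on paper stock exclude different things.

```python
# Hypothetical records; labels and field names are illustrative only.
records = [
    {"label": "regular monthly issue", "binding": "saddle stitched", "paper": "newsprint"},
    {"label": "giant-size annual", "binding": "square bound", "paper": "newsprint"},
    {"label": "paperback reprint book", "binding": "square bound", "paper": "cardstock"},
]

# Filter 1: keep only saddle-stitched items (drops the paperback AND the annual).
saddle_only = [r["label"] for r in records if r["binding"] == "saddle stitched"]

# Filter 2: drop cardstock items instead (drops the paperback, keeps the annual).
not_cardstock = [r["label"] for r in records if r["paper"] != "cardstock"]

print(saddle_only)
print(not_cardstock)
```

Neither rule is offered as the answer. Each is a proxy for the definition we are after, and each will misclassify some real entries.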

There is one obvious solution: Review all the data by hand. Simply begin with the largest number and then check the entries and do the exclusions manually. I did that, for example, for 1935.

Running our basic search for 1935, we get 41 results. This seemed intuitively high to me based on my knowledge of comics history. Indeed, some results needed to be scrubbed right away. The Seventh New Yorker Album is a hardcover collection of New Yorker gag cartoons and is not what we would conventionally call a comic book. It’s out. The strip collections published by David McKay (Popeye, Henry, Little Annie Rooney) similarly don’t fit our conventional definition - they’re out. The Tijuana Bibles that have been included here should be out. Finally, the first result is eight issues of Ballyhoo, a Dell-published humour magazine that was an important forerunner of Mad. It’s a fascinating magazine (read about it here), but it doesn’t fit our initial working definition.

In the end we are left with New Fun, New Comics, Mickey Mouse Magazine (a semi-borderline case because of the heavier use of non-comics pieces), Famous Funnies, and Big Book of Fun Comics (which has a cardboard cover, but which otherwise seems to fit). That is 24 “comic books” out of an initial set of 41.

In the end, this result doesn’t change much. Two per cent of 41 is less than one book, and so is two per cent of 24 (we will round up to one for the year). Eliminating titles by hand wasn’t difficult, but the sample size is tiny. By contrast, for 2014 the GCD returns 8,793 results - that will take a lot longer to clean by hand. The very first result shows one of our problems: the first issue of the Image comic book ’68 Homefront is listed three times due to variant covers. That’s a really simple item to scrub out by putting “variant” into the search (an astonishing 1,799 results!). Still, that only brings the initial set down to roughly 7,000 entries - an awful lot to cull by hand.
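The arithmetic here is easy to check. The sketch below assumes that every one of the 1,799 “variant” results falls inside the 2014 search, which the numbers above imply but which has not been verified. The helper name quota is made up for illustration.

```python
def quota(count, percent=2):
    """Smallest whole number of issues that is at least `percent` per cent of `count`."""
    return -(-count * percent // 100)  # integer ceiling, no floating point

cases = [
    ("1935, raw search", 41),
    ("1935, hand-cleaned", 24),
    ("2014, raw search", 8793),
    ("2014, minus variants", 8793 - 1799),  # assumes all variant hits are in this set
]
for label, count in cases:
    print(f"{label}: {count} entries -> sample of {quota(count)}")
```

For 1935 the cleaning makes no difference to the sample size, since both counts round up to a single book. For 2014 it changes the quota by a few dozen issues.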

You’ll note that even after all of this work we still won’t have defined “American comic book” format either. Still a lot of work to do before we can even begin!

Bechdel's Tiers

One of the things that I’ve already noticed about this project is how much it has impacted not only the way I’ve been reading comics, but also the way that I’ve been teaching them. This afternoon my undergraduate class wraps up its study of Alison Bechdel’s memoir Fun Home, one of the most remarkable (and most studied!) comics of the past decade.

When we discussed this book in our last meeting, there was a pretty clear consensus in the class that pages 220 and 221 were the “most important” pages in the book. Certainly that is easy to see narratively (it is Alison’s last important conversation with her father), thematically (it is the scene in which the facts are laid bare), and even formally. What is striking here is the use of the four-tiered grid, the regularity of the panel sizes, and the density of the panels, all of which seem unusual given the previous 219 pages of the book.

Throughout Fun Home, Bechdel uses a small number of single tiered pages (splash pages) as chapter titles:

Most of the book relies on pages with two or three tiers, with the three tier page predominating:

The counting of these tiers will be one of our most important undertakings in this project. In our shorthand while talking about this project we’ve often said we’re counting pages, then panels per page, then balloons per panel, then words per balloon. That’s crude and inaccurate, but it at least gave a general sense of the first stage of our project. Yet even the most cursory glance at a comic book page will demonstrate that before the panel comes the tier - it is central to the geography of the comics page. Moreover, it is not simple. The pages above are all relatively uncontroversial in terms of tier count. But what about this one?

When that first panel stretches over two tiers, do we count this as three? I think that most critics would say that we should, that this page has three tiers but that one panel happens to be in two of them. That’s more of a coding issue than an analytic one. 
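One way to handle that coding issue is to record, for each panel, the tier where it starts and how many tiers it spans. The sketch below is a hypothetical scheme, not our actual coding protocol, and the page it describes is invented rather than transcribed from Fun Home.

```python
from dataclasses import dataclass

@dataclass
class Panel:
    first_tier: int      # 1-based index of the topmost tier the panel occupies
    tier_span: int = 1   # how many tiers the panel stretches over

def tier_count(panels):
    """Number of tiers on a page: the lowest tier reached by any panel."""
    return max(p.first_tier + p.tier_span - 1 for p in panels)

# Hypothetical page: a tall panel covering tiers 1-2, one panel beside it in
# each of those tiers, and a single panel in tier 3.
page = [Panel(1, 2), Panel(1), Panel(2), Panel(3)]
print(tier_count(page), len(page))
```

Under this scheme the tall panel is still counted once as a panel, while the page is still counted as three tiers.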

But what about this?:

If the previous page was three tiers, this is clearly four. Yet there is something about this page that makes it seem like three - probably the fact that only one of the first two panels has a caption, which makes the lower one seem like it could be part of the upper one. Regardless, I think that this has to be counted as four tiers. That would make this the first four tier page in the book. It also makes the conversation in the car slightly less formally anomalous. Similarly, this page probably needs to be counted as having four tiers:

One of the interesting things about Fun Home is that once Bechdel introduces the four-tiered page, she begins to use it more frequently. I think that an argument could be made about this page being either four or five-tiered:

Nonetheless, it is this conversation between Alison and Bruce that most leaps out at us. Perhaps not because of the number of tiers, but because of the regularity of the panel sizes and the density. There are no other pages that have anywhere near the number of panels that Bechdel uses here.

After the conversation, which comes extremely late in the book, the four-tiered page becomes a semi-regular feature:

Given the importance of this scene, I think that any analysis that doesn’t take into consideration the increasing importance of the tier as a unit of page design is likely going to be on shaky ground.

Bechdel’s book also poses some interesting conundrums for our coding (if it is randomly selected as part of our sample), but probably not much more than many others. I think that some people presume that the count of panels, for instance, will be relatively straightforward, while the coding of panel transitions (a la Scott McCloud) will be more interpretive. That is certainly true, but panels themselves can cause problems. Bechdel frequently uses panels without images, just text:

To my mind there is no doubt that the panel beginning “It could have been” is, indeed, a panel. Yet what about the other text? What about “He would cultivate”? Bechdel’s captions are overwhelmingly in the gutters in Fun Home (interestingly, this is a technique that she does not much use in her follow-up, Are You My Mother?). They seem intuitively connected to the images below them in a way that “It could have been” does not seem connected to the drawing of Alison watching It’s A Wonderful Life. Yet if the text had appeared above that image, and the image were stretched page-width, we probably would count this as a five panel page rather than a six panel page. There is something about the tier that determines the panel.

Let’s look again:

This seems to be a three-tiered page, with one panel made up entirely of text. Why is it, then, that “Wearing a black velvet dress” seems to not be a panel, while the text beside it does? Is it simply the presence of the panel border?

Finally, there are examples like the first tier here:

That caption spread across the two panels becomes not so much an analytic issue as a coding one (it is akin to a voice-over in film that stretches over multiple shots). This is a common comics technique (though not common in Fun Home). It is expertly deployed here - that is a terrific composition. But it is the type of thing that will begin to give our databases fits. There’s a lot to work on for such a seemingly commonplace rhetorical trope. 

One thing that leaps out at me now as I look at comics is just how much they lack standardization. When I mentally picture an Archie comic of the 1960s, for instance, I think of a three tiered page - but I know that there are so many pages that stray from that patterning. I think that by paying strict attention to the geography of the page as it changes over time or even within individual works we are going to learn a great deal.

 

Is The Spirit a comic book?

Previously we indicated that “we are proposing to study a randomly generated sample of American comic books produced between 1934 and 2014”. Already you can probably see that this has a number of problems. The first, of course, is the sheer scope of production. We have talked about trying to sample two per cent of these comics; that would be a dataset of more than 5,000 comics. That would allow us to do a lot of interesting analyses, but the sheer scope may be too vast to take on.

This afternoon I began the process of experimenting with the numbers, trying to gauge the scope of comics production over those eight decades. The preliminary numbers are fascinating in themselves, and I look forward to having something concrete to share once we’ve made certain that our data is reliable. 

As I ran some trials today, one thing that kept coming up was our need to define the term “comic books”. I ran a number of trials today on 1948 because it is a year that I’ve previously researched for a different project and I have a reasonably good sense of what was being published that year. The preliminary number that I came up with was 1,858 comic books published that year (a large leap from 1947, but only about two-thirds of the early peak year of 1952). If we sample two per cent of 1948’s production we would have to study thirty-seven issues. When I ran a first attempt at randomizing that year something immediately leapt out at me: I got three issues of The Spirit.

The Spirit, of course, was a weekly publication in an era when everything else was monthly (or less frequent than that). There were fifty-two issues of The Spirit in 1948, so it should show up more than, say, Action Comics. Indeed, The Spirit represents nearly three per cent of all the comic books published in 1948. But only if The Spirit is a comic book.
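A quick calculation shows how unusual a draw of three Spirit issues would be. The sketch assumes a simple random sample of thirty-seven issues drawn without replacement from the preliminary count of 1,858, so the number of Spirit issues drawn follows a hypergeometric distribution. The variable names are illustrative.

```python
from math import comb

TOTAL_1948 = 1858   # preliminary count of comic books for 1948
SPIRIT = 52         # weekly issues of The Spirit that year
SAMPLE = 37         # two per cent of the year's production

share = SPIRIT / TOTAL_1948          # Spirit's share of the year's issues
expected = SAMPLE * share            # expected Spirit issues in one sample

def prob_exactly(k):
    """Hypergeometric probability of drawing exactly k Spirit issues."""
    return (comb(SPIRIT, k) * comb(TOTAL_1948 - SPIRIT, SAMPLE - k)
            / comb(TOTAL_1948, SAMPLE))

prob_three_or_more = 1 - sum(prob_exactly(k) for k in range(3))
print(f"share {share:.1%}, expected {expected:.2f}, P(3 or more) {prob_three_or_more:.3f}")
```

On these assumptions a sample should contain about one issue of The Spirit on average, so three in a single draw is on the high side but not impossible.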

That’s the core question: is a sixteen-page insert into twenty Sunday newspapers a comic book? If it isn’t sold in the same channels as the other comic books, is it still a comic book? If it is released weekly at a time when no other comic books are produced that frequently, is it still a comic book? Or is it something else? A comic book/comic strip hybrid? This is a question to be resolved before we begin.

If we include The Spirit it will show up frequently because there are so many of them, and they will slightly skew our numbers (sixteen pages rather than the more typical page lengths of the era). If we exclude The Spirit we need to do so before we start, and recalculate our sample size by subtracting all those issues from the initial data pool.

You can also see how this goes: Do we exclude magazine-sized comics (Mad, Creepy, Eerie) or are those also comic books?

Decisions will have to be made. I’d be interested to hear your thoughts especially on whether The Spirit belongs in a study of comic books.

Just what is it that we're doing here?

If you take a look at the “About” tab on our site you can find the single page summary of the SSHRC grant proposal that is the primary driver of this research project. That’s all well and good, but what are we doing really?

To help explain this project to the 136 students in my ENGL 388 (Comics and Graphic Novels) class the other day I drew on a pair of examples from two of the books that we have read over the course of this term. The first of these was Watchmen. We all know, for example, that the structural basis of Watchmen is the nine-panel grid (three tiers of three panels) that regularizes the layout of the entire work. Here’s an example of that layout from the first issue. 

But let’s recall that Watchmen only seems to rely on a nine-panel grid. The first page, for example, has seven panels, while the second has eight. Indeed, the first nine-panel page doesn’t appear until page five (above). In the first issue, only nine of the twenty-six pages have nine panels. None has more than nine, but the vast majority - almost two out of three - contain fewer. While it is true that you can distinguish the alignment of the grid even in pages that don’t use it due to the choices that Dave Gibbons makes about panel width and height, Watchmen is predominantly not a comic featuring a nine-panel grid, although it is frequently discussed as if it were. Our impression may be that Watchmen is mostly nine panels, but the data shows that only 133 of the 336 pages of Watchmen have nine panels (about 39.6%). This project is a first step in discussing these kinds of comic book stylistics through the use of a large data set.

With this project, we are proposing to study a randomly generated sample of American comic books produced between 1934 and 2014. Specifically, we will study a statistically significant sample from each of those eighty years. In the first phase, we will code a series of relevant pieces of data (number of pages per issue; number of stories per issue; number of panels per page; number of word balloons per panel; number of words per balloon). During the second phase we will be looking at data that is more subjective and more difficult to quantify (for instance, typologies of panel transitions). In the final phase, we will draw upon the data set to author a study of the evolution of comic book styles over time.
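The first-phase counts nest naturally: issues contain pages, pages contain panels, panels contain balloons. The sketch below is one hypothetical way to hold that data. The class and field names are invented for illustration, the sample issue is made up, and nothing here is a finished coding protocol.

```python
from dataclasses import dataclass, field

@dataclass
class Balloon:
    text: str
    def word_count(self):
        return len(self.text.split())

@dataclass
class Panel:
    balloons: list = field(default_factory=list)

@dataclass
class Page:
    story: int                                   # which story in the issue this page belongs to
    panels: list = field(default_factory=list)

@dataclass
class Issue:
    title: str
    year: int
    pages: list = field(default_factory=list)

    def stories(self):
        return len({page.story for page in self.pages})

    def panels_per_page(self):
        return [len(page.panels) for page in self.pages]

    def share_of_pages_with(self, n_panels):
        counts = self.panels_per_page()
        return sum(c == n_panels for c in counts) / len(counts)

# Hypothetical two-page issue, invented for illustration.
issue = Issue("Example Comics", 1960, [
    Page(1, [Panel([Balloon("Hello there!")]), Panel()]),
    Page(1, [Panel([Balloon("One two three")]) for _ in range(9)]),
])
print(len(issue.pages), issue.stories(), issue.panels_per_page(), issue.share_of_pages_with(9))
```

A figure like the share of nine-panel pages in Watchmen would then come from a call such as share_of_pages_with(9) rather than from a separate hand count.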

Let’s return, for example, to the Watchmen example. If we note that throughout the series the nine-panel grid typically runs about 40% of all pages, what can we make of this chart?:

That enormous spike in issue six (“The Abyss Gazes Also”) immediately leaps out at us. This is the chapter in which Dr. Malcolm Long examines Rorschach and we learn the back story of the book’s protagonist. Indeed, the preceding chapter (“Fearful Symmetry”, the most formally inventive chapter in the book) is also primarily, though not exclusively, about Rorschach. The seventh chapter (“A Brother to Dragons”, the first to have pages with more than nine panels, during the sex scenes) primarily focusses on Dan and Laurie. Certainly, then, an interpretation that suggests a connection between the non-powered heroes of the book and the nine-panel grid becomes an enticing route for thematic investigation. This would be supported by the fact that the alien-infected final chapter has the lowest number of nine-panel grids (only four out of the thirty-two pages), and is the only chapter that uses traditional single panel splash pages.

Watchmen provides a very obvious example of how a study like this might work because it is so formally structured. We are not particularly interested, however, in analyzing individual works like Watchmen; our aim is to chart the evolution of the comic book format over time. We are interested in the typical comic books of various periods, not the ones that have been proclaimed the best. Indeed, it is more likely than not that Watchmen would not even be randomly selected into our sample set! Examining changes to comics over time is one of the things we are most interested in.

Take, for example, Saga. I taught my students this book shortly after teaching Watchmen. One of the things that my students immediately noted is that the first issue contains six splash pages, and that the splash in general is one of the visual hallmarks of the work. Moreover, they noted how much more quickly it reads than does Watchmen because the volume of text is quite reduced. I think it is a commonplace to note that contemporary superhero comics contain less text than do comics from thirty years ago, and that those contain less text than comics from thirty years prior to that. Yet we lack the tools to actually demonstrate this assumption that is so routinely taken for granted. 

Moreover, we throw a spanner into the works when we talk only about superhero comics, or, worse, only certain types of superhero comics (that is, comics from “the big two”). Watchmen is often praised as realist because it lacks thought balloons and sound effects, and it is noted that in the wake of Watchmen many comic book artists abandoned these story-telling elements. Can we actually demonstrate that with data? Moreover, when we step further back we might note that most Archie comic books of the 1960s contained no thought balloons or sound effects. Is Watchmen radical for borrowing the formal stylistics of Archie?

This is very much only the tip of the iceberg, but, I hope, it gives a small sense of our goals. In the coming weeks I hope to blog a bit about the theory behind what we are doing, and then, once term ends, we will begin working on our data set and coding protocols in earnest. We will be sharing our thoughts on these topics as we go, and we really hope to encourage feedback wherever possible: better to identify a misdirection early in the process than to have to go back and begin again.

-BB