We've gone two months without updating this blog, but I would hate to let you think that we haven't been working all summer. Here's where we are at right now:
1. Data curation. Having exported the GCD database we are trying to determine the scope of comic book publishing between 1934 and 2014 so that we can develop our random sample. This requires eliminating a lot of items that are included in GCD that we don't want included. One example (that doesn't actually impact us because it falls one month outside our timeframe): Star Wars #1 shipped in January this year with, I believe, seventy different variant covers. We don't count variants as different comics, so this would need to show up in our database once, not seventy times. In part that's as simple as selecting and deleting sixty-nine instances of the book from an Excel spreadsheet. The problem is that the comics industry loves variants (or has loved them in the past) and there are literally tens of thousands of them and we have to confirm each one by hand. Slow-going. We now have four students working on this issue, and we hope to be done in the next few weeks.
2. Finalizing the Sampling Frame. Related to this, for our purposes a comic book is only a comic book if it appears in two of three databases (GCD, mycomicshop.com, or the Overstreet Guide). We have been cross-referencing GCD and mycomicshop all summer, but in late August we will have to cross-reference against Overstreet. Fortunately, I am expecting two copies of the new Overstreet Guide to arrive this week, so we can finally move on to this task.
3. Data entry tool. As this is a big data project, we need a consistent way for our coders to input data that will be usable in the long term. To this end, we have hired a computer programmer to build us a unique tool that will allow each of our coders in the next step to connect to the database and input findings. We are in the final stages of this project, which has been slowed somewhat by the research team's constant "what if we added this function?" requests. We are the type of people who drive our programmers crazy.
4. Coding protocols. This has been the big one. Last week, Ben Woo came to Calgary for four days of meetings. This was the first time that the entire team had been able to sit down in person to discuss the project. Over the course of a couple of days we drafted a fifteen page coding guide and a supplemental booklet of illustrations and examples that ran to another hundred pages. We then shared this with several students and asked them to read it and code one comic book: Tippy Teen #21 (which you can read online here). We then met at the Visualization Studio at the University of Calgary to compare notes on the data. Discussing the coding with the students made it very clear which parts of our protocols need to be more precise, and we are now revising those instructions. We will run a second round of tests likely in August. Ideally we would like the instructions to be clear enough that any student could step in and read a comic book and code it the same as any other student. The reality is that people are people and this will never be the case, but we're trying to eliminate as much of the guess work as possible.
In our initial test we encountered two significant areas of confusion. First, we are trying to define "shot lengths" (to borrow a term from film studies). That is, at what distance are the characters from the reader? Close-up, medium length, extremely long? These terms are simple to define, but in comics they are not so easy to code. In film, at least in the pre-CGI era, coding shot lengths is relatively easy because you identify what is in sharpest focus - it Cary Grant is in the centre of the screen with people at a party behind him that are not in focus, you code for Cary Grant. In comics, though, everything can be "in focus" because there is no technological limitation on the drawing. This is something that we need to define. Ideas like "the element that is the focal point of the panel" have proved far too vague.
Second, the overall design of the page has proved to be a total bear. We probably argued for three hours about this one day, came to a resolution, and then argued about it for three hours again the next day. Notions of the grid, and, more importantly, types of grids have already become our bête noir. We are planning a longer post on this topic - and possibly a journal article - coming soon. Suffice to say: the entire concept of the "tier" in comics studies is ridiculously fraught because of the myriad innovative ways that cartoonists find to layout their pages for maximum visual interest.
So that is where we are at as of the end of July 2015. We are optimistic that we will have our coding book, data collection tool, and random sample ready to go for September, which is when the real work will begin.