Statistics and Gender in A Midsummer Night’s Dream

Honestly, out of the “romantic” Shakespeare plays, Midsummer is probably not one of my favorites. In my opinion, there are far more interesting character dynamics and themes in Twelfth Night, for example, than in this one. Still, that does leave me with some questions: namely, how does the play portray gender roles? Even though I’m a relatively recent convert, it’s already apparent to me that there’s a tendency for women to be talked at or, in romantic contexts, talked about, rather than being allowed to speak for themselves. While my Shakespeare 1 professor might disagree with this, I think that it’s just a reality that many of Will’s plays recreate the gender roles of his society, one in which this problem was far more pervasive. One good way to decide would be to look at something that can’t be argued about: the data of the text itself.

For the text, I used an XML version of the play provided by my professor and originally marked up by Jon Bosak (and written by William Shakespeare, obviously). I would upload that file for you here, but WordPress doesn’t allow XML files to be uploaded for security reasons. 🙁

For this analysis, my primary tools were Voyant Tools to gather data from the text and Microsoft Excel to make graphs and tables of the data. However, since Voyant Tools didn’t run well on my computer, on top of being rather unintuitive to use, I don’t have a single corpus to embed in this page. Instead, I’ve got the closest thing I could make in the tool itself, which you can find below this paragraph.

Side note: yes, both of these are centered. No, I don’t know how to get the graph to not be to the right like that. Also, the nine sections of the graph don’t correspond to the nine scenes, as Voyant doesn’t have an option for setting it to anything other than equal chunks of the text. This is why I abandoned doing any analysis in Voyant halfway through the process: the sheer lack of easily available options.

My primary question of interest is whether I can find a discernible difference in how Shakespeare uses male and female characters. To do this, I’m going to look at two things. First, I want to see if there’s a difference in word correlation (the words that most often appear near other words) between men and women.

Uninterestingly, Hermia, Helena, and Lysander are most strongly correlated with the four Athenian lovers and the word “love”. However, Demetrius breaks this pattern, correlating with the other three plus “enter” and “die”. “Enter”, of course, is the stage direction that brings characters on stage, which is written next to Demetrius quite often, as he is always written first amongst the four Athenians. “Die” is unfortunately also a fluke, as Demetrius speaks right after the line in which Bottom, as Pyramus, repeats the word five times.

Bottom’s correlated words are Fairy, Quince, and Pyramus, which are all fairly predictable given the plot of his story. Theseus and Quince both just have the names of other people in their realms, the denizens of the court and the rest of the acting troupe respectively. What is interesting is that, for Titania and Hippolyta, the “train” they are accompanied by at all times is so present in their characters that it is given as one of their correlated words, something none of the others have (at least in a way that makes sense). Also present in both are the names of their attendants, as both always have a man at their side to assist them. While this is an interesting observation about gender roles, that the two supposedly powerful women must be surrounded at all times by men, I’m more interested in evidence of a more subconscious bias on the part of the author.

Now, we have the main event. I want to see if there’s a difference between how often male and female characters get to speak vs how often they are spoken about. To do so, I first needed to get Voyant to see the difference between the speakers and spoken words. Fortunately, that wasn’t too difficult, as I was able to simply recreate the corpus using the following XPath expressions, which select the line and speaker tags in the XML:

//SPEECH/LINE
//SPEECH/SPEAKER

This was a great thing in that it allowed me to gather data for the two categories rather easily. This was a bad thing in that I now have two corpora to deal with, which will become a bigger issue later. From these new corpora, I used the find word function to see all versions of a word in the text (including plurals, possessives, etc.), and made three graphs.
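For anyone who would rather skip Voyant entirely, here is a rough Python sketch of the same two counts. It is not what I actually ran. It assumes the file follows the usual shape of Bosak's markup (SPEECH elements containing SPEAKER and LINE children), and the function names, the tiny embedded sample, and the regex used to catch plurals and possessives are all my own stand-ins for Voyant's find word function.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny fragment in the assumed shape of the markup, used only for illustration.
# For the real file, something like ET.parse("dream.xml").getroot() would replace it
# (the filename is hypothetical).
SAMPLE = """<PLAY>
<ACT><TITLE>ACT I</TITLE>
<SCENE><TITLE>SCENE I.  Athens. The palace of THESEUS.</TITLE>
<SPEECH>
<SPEAKER>THESEUS</SPEAKER>
<LINE>Now, fair Hippolyta, our nuptial hour</LINE>
<LINE>Draws on apace; four happy days bring in</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>HIPPOLYTA</SPEAKER>
<LINE>Four days will quickly steep themselves in night;</LINE>
</SPEECH>
</SCENE>
</ACT>
</PLAY>"""


def tally_speakers(root):
    """Equivalent of //SPEECH/SPEAKER: count speeches and lines per speaker."""
    speeches, lines = Counter(), Counter()
    for speech in root.iter("SPEECH"):
        n_lines = len(speech.findall("LINE"))
        for speaker in speech.findall("SPEAKER"):
            if speaker.text:
                name = speaker.text.strip().upper()
                speeches[name] += 1
                lines[name] += n_lines
    return speeches, lines


def count_name_mentions(root, names):
    """Equivalent of //SPEECH/LINE: count how often each name is spoken aloud."""
    mentions = Counter({name: 0 for name in names})
    for line in root.iter("LINE"):
        # itertext() also picks up text nested inside child tags of a LINE.
        text = "".join(line.itertext())
        for name in names:
            # Assumed matching rule: the name, optionally followed by 's or s.
            pattern = r"\b" + re.escape(name) + r"(?:['’]s|s)?\b"
            mentions[name] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return mentions


root = ET.fromstring(SAMPLE)
speeches, lines = tally_speakers(root)
mentions = count_name_mentions(root, ["Theseus", "Hippolyta"])
print(speeches, lines, mentions)
```

Doing it this way keeps both counts in one place instead of two separate corpora. Whether Voyant's speaker numbers correspond to speeches or to individual lines is something I can't say for sure, which is why the sketch returns both.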

However, me being me, I decided I should look at how this breakup differs by scene. After trying to get the XML tag filter to work, only for Voyant to crash for a few hours, I eventually decided to manually chop up the original file into 9 parts, one for each scene. I then imported each into Voyant twice, once for lines and once for speakers, to produce two more graphs of the spoken names and speaker names by scene. I’m not sure how useful this was. It didn’t help my analysis much, other than reminding me that some characters aren’t as physically present as others, but it was too much of an effort for me to not mention here.
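The manual chopping could probably be avoided by walking the scenes in code. The sketch below is only an illustration under the same assumption as before, namely that the file nests SCENE elements (each with a TITLE) inside ACT elements; the helper name and the two-scene sample are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two-scene fragment in the assumed ACT > SCENE > SPEECH shape, for illustration only.
SAMPLE = """<PLAY>
<ACT><TITLE>ACT I</TITLE>
<SCENE><TITLE>SCENE I.  Athens. The palace of THESEUS.</TITLE>
<SPEECH>
<SPEAKER>THESEUS</SPEAKER>
<LINE>Now, fair Hippolyta, our nuptial hour</LINE>
</SPEECH>
</SCENE>
<SCENE><TITLE>SCENE II.  Athens. QUINCE'S house.</TITLE>
<SPEECH>
<SPEAKER>QUINCE</SPEAKER>
<LINE>Is all our company here?</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>BOTTOM</SPEAKER>
<LINE>You were best to call them generally, man by man,</LINE>
</SPEECH>
</SCENE>
</ACT>
</PLAY>"""


def speeches_by_scene(root):
    """Return a list of (scene label, Counter of speeches per speaker), in play order."""
    rows = []
    for act in root.iter("ACT"):
        act_title = (act.findtext("TITLE") or "").strip()
        for scene in act.findall("SCENE"):
            scene_title = (scene.findtext("TITLE") or "").strip()
            counts = Counter(
                s.text.strip().upper() for s in scene.iter("SPEAKER") if s.text
            )
            rows.append((act_title + " / " + scene_title, counts))
    return rows


rows = speeches_by_scene(ET.fromstring(SAMPLE))
for label, counts in rows:
    print(label, dict(counts))
```

Each row can then be pasted into a spreadsheet column, one per scene, which is roughly what the by-scene graphs need.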

The five graphs are given below. They’re too small to see in the page, so please ctrl+click them to open in another tab.

“Name Frequency” refers to the total number of times the character’s name is mentioned in the full text. “Name Spoken Aloud” means their name is present in a, well, line an actor would speak in the play. Finally, “Spoken Lines” refers to how many lines the character gets in the play. There are two versions of the latter, one broken up by scene and one for the whole play.

The characters that made it onto the graph were chosen based on whether their name was mentioned more than 20 times in total. While Pyramus and Thisby aren’t real characters in Midsummer, instead being characters within the play-within-a-play put on by Bottom and his crew, their names are mentioned quite frequently, so they made the cut. In case it wasn’t clear, men are in blue and women are in red.

To start out with the easiest ones: obviously Pyramus and Thisby are spoken about far more than they get to speak, as they are fictional characters and their “lines” are given only during the play in the last scene. Three of our four main protagonists get quite a bit of focus in both lines and being spoken about, those being Lysander, Demetrius, and Hermia. However, Helena is a different case, as there is a notable difference between how much she speaks compared to how much she is spoken about, 36 vs 24. This also appears to be a big issue with Hippolyta, with a gap of 14 to 6.

However, some characters get more lines than others. It would probably be a good idea to remove this factor as much as possible by comparing total lines to total times names are spoken, instead of raw counts. To do this, some more Excel formulas were invoked to eventually produce this graph, with Pyramus and Thisby removed as they significantly altered the results.

A positive y-axis value means the name was spoken a greater amount than they had lines, and vice versa. Here, the gender difference seems less noticeable. Characters like Theseus, who get little screen time compared to how often characters speak of them, have a large value here, while the protagonists, who speak a lot, have far more lines than mentions of their name.
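One way to get a value with that sign convention is to compare shares rather than raw counts. To be clear, this is an assumed formula and not necessarily the one in my spreadsheet: each character's share of all name mentions minus their share of all spoken lines. The character keys and counts below are placeholders, not numbers from the play.

```python
def share_gap(mentions, lines):
    """Share of total name mentions minus share of total lines, per character.

    Positive: the character is talked about more than they talk.
    Negative: the character talks more than they are talked about.
    (Assumed normalization, chosen only to match the sign convention above.)
    """
    total_mentions = sum(mentions.values())
    total_lines = sum(lines.values())
    if total_mentions == 0 or total_lines == 0:
        raise ValueError("need at least one mention and one line in total")
    names = sorted(set(mentions) | set(lines))
    return {
        name: mentions.get(name, 0) / total_mentions - lines.get(name, 0) / total_lines
        for name in names
    }


# Placeholder counts, invented purely to show the arithmetic.
example_mentions = {"CHAR_A": 10, "CHAR_B": 30}
example_lines = {"CHAR_A": 30, "CHAR_B": 10}
gaps = share_gap(example_mentions, example_lines)
print(gaps)
```

In the placeholder data, CHAR_A has a quarter of the mentions but three quarters of the lines, so the value comes out negative, and CHAR_B is the mirror image. Because every value is a difference of shares, a character with a huge mention count shifts everyone else's share, which is the same kind of distortion that led me to drop Pyramus and Thisby.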

Well, it looks like William has got me again. I couldn’t find evidence that he’s treating his female characters differently from his male ones in terms of the raw statistics, at least in this play. I would be very curious to see where this sort of analysis would lead looking at all of Shakespeare’s plays, possibly even how it trends over time. I’d guess that the historical plays would see a larger trend than the comedies and tragedies, but that’s just speculation. I would also look more into the correlated words, with my next idea being to filter for a specific character’s dialogue to see what their most commonly spoken words are. However, doing just what I’ve shown here took long enough. This analysis was fairly interesting, if frustrating on the technical side (if Voyant gives me an error box with no error in it again, I’m going to scream), and overall I’m glad I chose to go this route with the midterm.

My Excel file used to generate the graphs is available below, for transparency.
