Research Blog

A Very Akhmatova Thanksgiving

Elise Thorsen - 24 November 2012

The Akhmatova group's Thanksgiving hiatus from one another began a little early because one of their members was awol. Thus, as we mentioned last Friday, we've kind of set off in our individual directions to get around to supplying the end data that we'll analyze and present together.

The data we would like to begin writing up and representing visually by the end of Thanksgiving weekend are:

  • alliteration: repetition of sounds at the beginnings of words (with an ambition, but not an expectation, of expanding that to consonance in general, which is slightly more complicated because it can refer to clusters that include vowels or can permutate),
  • rhyme: repetition of sound at the end of words (as Sam wrote, a very interesting visualization of rhyme is available at The Sonneteer--what would be useful would be if we could rope all of the things that fall under "sound repetition" into one visualization, although perhaps with rhyme marked apart, given the extra attention and development it has received in the history of poetry)
  • meter and rhythm: Russian meter is syllabotonic--a combination of the number of syllables and the pattern of stressed syllables within the set (very much like English, actually). What we would like to do is determine the rhythm (the actual pattern of stressed and unstressed vowels in a line/stanza/poem), derive the meter (the general pattern to which this rhythm adheres, though in binary meters there are typically unrealized feet, because unstressed vowels outnumber stressed vowels in Russian by more than 1:1), and use those elements to find patterns in failure to realize meter. An aberration so typical as to be a general rule in iambic meters is Regressive Dissimilation, where the final foot is always stressed, the penultimate foot is stressed far less (between a third and two thirds of the time, depending on the period), and the antepenultimate foot is stressed less than 100% of the time, but more often than the penultimate foot. This becomes particularly interesting again when authors break this "rule."
  • change of the above elements over time
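
As a rough illustration of the rhythm and meter items above, here is a minimal Python sketch (not our XSLT, and not code we have actually run on the corpus). It assumes stress is marked by a combining acute accent placed right after the stressed vowel, and that the lines are iambic, so the even-numbered syllables are the strong positions. The names rhythm and ictus_profile and the sample rhythm strings are invented for the example.

```python
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
ACUTE = "\u0301"  # combining acute accent; assumed here to mark stress


def rhythm(line):
    """Return one character per vowel: 'S' if stressed, 'u' if unstressed."""
    marks = []
    for i, ch in enumerate(line):
        if ch in VOWELS:
            # A vowel counts as stressed when the next character is the accent.
            stressed = line[i + 1 : i + 2] == ACUTE
            marks.append("S" if stressed else "u")
    return "".join(marks)


def ictus_profile(rhythms):
    """For iambic lines, the share of lines in which each foot is realized.

    Each rhythm is a string of 'S'/'u'; foot k is realized when the
    syllable at index 2*k + 1 (an even-numbered syllable) is stressed.
    """
    n_feet = min(len(r) for r in rhythms) // 2
    profile = []
    for foot in range(n_feet):
        hits = sum(r[2 * foot + 1] == "S" for r in rhythms)
        profile.append(hits / len(rhythms))
    return profile


# One stress-marked word, built explicitly so the accent is visible in code.
print(rhythm("больна" + ACUTE))  # uS

# Made-up rhythm strings (not real Akhmatova lines), just to show the shape.
print(ictus_profile(["uSuSuuuS", "uSuuuSuS", "uSuSuuuS"]))
```

A profile like this, computed per poem or per period, is the kind of number in which the penultimate-foot dip should show up, and where a poem that breaks the "rule" would stand out.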

Essentially, these are developments on data we have generated already from the basic xml: stress and phonetic realization. The XSL transformations continue to be challenging, but we have spent a lot of time, particularly when we have David and Eric as resources, thinking through our algorithms and how to describe what we want in mark-up, and those seem sound.

With our mastery of SVG this week (I am looking at Sam and Erin when I say "mastery," because I may still be working through a few things), we know the tools we have for visualization of our data.

In other words, there are more full plates this weekend than just those holding stuffing and gravy. But... sally forth, onwards and upwards, выше и выше, excelsior!


Akhmatova Pre-Thanksgiving Update

Sam Depretis - 16 November 2012

This week we focused on cleaning up some of our markup and continued progress towards automated mark up of various elements of Akhmatova’s poetry. Now that we have added more elements to the mark up, we need to return to the schema and change a few things to make sure everything is consistent.

After learning how to use SSI, we realized that some of the elements of our website are inefficient, such as every page having a separate but exact copy of the title and menu bar. Hopefully by the end of this weekend this will be fixed, as well as some other small details like the incomplete separator bars under the html poems section, left align for the poems, and the addition of the Russian titles of each poem. When this is done, adding our research and SVG (when completed) should be pretty simple.

For SVG, we were inspired by the character frequency chart in the Karolina Pavlova project

http://pavlova.obdurodon.org/character_chart.xhtml

Because we are analyzing properties of her poems over time, it seems logical to chart how our variables change over a period of time. We have also found an example of rhyme association here

http://cocoon.lis.illinois.edu:8080/lis590dpl/wapiez/Sonneteer/sonnets/m...

that we like and may use as a guide. We agreed during our meeting that something seems strange about it, however, so we hope to make improvements in our version. This is all assuming that we get far enough into rhyme to have this as an option, of course. We think some of the strangeness of this model is that the connections are at the end of the line and that they are a good distance away from the poem itself. We do understand why they are at the end of the line, though, since that is where rhyme generally occurs. If anyone has any advice on how to improve on this, please let us know!


Akhmatova: Positioning for the Last Month of Coding

Elise Thorsen - 9 November 2012

For the first time in a while, you the ordinary member of our class can see a lot of what we've been up to in the last week. We have all of the texts we want for this "proof of concept" project marked up; you can see these poems on our website. If anything jumps out at you in reading them, or in your user experience with the site, we welcome your feedback. In general, the site has had a few additions recently, and we would like to know how comfortable you feel navigating what we have so far.

Meanwhile, in addition to this presentational stuff, we've been working on marking up the texts for analysis--trying to think about how computers will make reading poetry easier, and allow us to look at the use of devices over time. Our major breakthroughs in this sense were threefold. First, Erin took advantage of one of our XQuery assignments this week to look at alliteration (the repetition of beginning sounds in words) in Akhmatova, which is definitely something we want to track in this project. Second, we got a significant way towards analyzing rhyme by converting Akhmatova's Russian from its orthographic presentation to a phonetic representation, that is, how it actually sounds when one reads the words out loud. The stylesheets for that can be found at our resources page, where we will share some of our project files (as well as the promise of a bibliography about what all of this Russian poetry and analysis stuff is, anyways). Finally, we've been working on computationally obtaining the meter of a given line or stanza of poetry--we've worked through a significant amount of the pseudocode for that, and we're looking forward to being able to analyze patterns in meter soon.
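
To make the alliteration idea concrete, here is a small Python sketch. It is not Erin's XQuery; it is just the same idea in another language, and the function names are invented for the example. It works on initial letters rather than sounds, so it would give better results run over the phonetic representation than over the spelling. The sample line is a common Russian tongue-twister, not Akhmatova.

```python
import re
from collections import Counter


def initial_letters(line):
    """Count how often each letter begins a word in the line."""
    words = re.findall(r"[а-яё]+", line.lower())
    return Counter(word[0] for word in words)


def alliterations(line, min_count=2):
    """Keep only the initial letters that repeat at least min_count times."""
    return {
        letter: count
        for letter, count in initial_letters(line).items()
        if count >= min_count
    }


print(alliterations("Шла Саша по шоссе и сосала сушку"))
```

Raising min_count, or counting across a stanza instead of a line, would be the obvious knobs to turn once we see how noisy the results are.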

In the next week, we look forward to finishing and tweaking our automated assessments of the two fundamental components of poetry: rhyme and meter. These, and other elements like alliteration analysis, will finally position us to do analysis and see how individual components of poetry (e.g., meter) change over time, and whether their relationships to other components, devices, and themes (Mallet might come in useful for this!) change over time in Akhmatova's poetry, and maybe to use R productively, as well. We would also like to put together a schematron file that will mark when "rules" of metric generation are broken, so that we can mark those--these may well be the spots where interesting changes of attitude or irony occur, something we were interested in from the beginning!

Also in connection with R and representing our data, given that we are covering SVG in class next week, the exercises for this section will provide us with opportunity to think about our visual representations of our results. We hope to have a concrete plan for representing at least some of our data graphically by the end of next week.


Akhmatova Update

Erin Harrington - 1 November 2012

This week we took a little breather after the midterm. Elise made excellent progress last week with creating XSLT stylesheets to mark stressed and unstressed vowels as well as voiced and unvoiced consonants. After all these are marked up we can give this information to David, who has access to the dictionary that marks stress, and he can generate the output. This is very important for when we look at rhyme because Russian is a very inflectional language. This means that stress can switch and that certain letters make different sounds depending on where they are in the word. As a result, words that don't appear to end the same way can rhyme.

Here is a copy of our to-do list for last week:

Pseudocode for orthoToPhono
Notes:
1. Slavic has its own transliteration system, which probably will serve for the phonetic presentation at the moment. But I’m not sure how that will affect our ability to examine distinguishing features (or whatever it was that David kept nattering at the tongue-twister group to examine).
2. We should agree on a convention for representing schwa that is not schwa. The soft signs can be replaced by ‘j’ or by a prime (′). Unstressed vowels are also characterized by their shortness; maybe that’s something we can mark so as not to find ourselves using lots of special characters.
3. David knows these rules as a linguist, not just as rules of thumb for L1 English learners of Russian. If he has suggestions, I’m open to them.

Pseudo-Code

1. Mark all unstressed vowels as <unstressed> in the <w>/<str> elements
2. Create a new element <phono> within <w>
a. For all <w>, <xsl:apply-templates select="ortho"/>, <xsl:apply-templates select="str"/> and <phono><xsl:apply-templates select="str"/></phono>

3. Analyze-string tasks
a. Replace all palatalized vowels with their component sounds: j + unpalatalized vowel; replace all ь with j
i. (replace())
b. For all consonants that are always hard, remove mark of softness after
i. Regex-group using something like (([А-Яа-я]*[жшцкгх])j)*([а-я]*); concat() the regex groups. I’m not sure this can work
c. For all consonants that are always soft, add mark of softness after
d. Reduce vowels (transform(), but how does one pay attention to position and preceding softness, which matters?)
i. ‘A’ after a soft consonant → ‘i’
ii. All unstressed “o” and “a” at absolute first position and in syllable immediately before stress to [ɐ] (short “a”)
iii. All unstressed “o” in other positions to schwa (short “e”).
iv. All unstressed “э” before the stressed vowel to “i”
v. All unstressed “э” after the stressed vowel to schwa.
e. Assimilate clitics to following word
f. Devoice voiced final consonants and voiced consonants followed by final soft sign.
i. Actually, this is easy enough to do with regex-group as well, I suspect.
g. Change тся and ться to ца; сч to щ
h. Change щ to šč; make one-to-one replacements into latin alphabet.
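
Two of the simpler steps above (f and g) can be sketched in Python to show that they really are just regex work. This is an illustration only, not the XSLT we will use; the function names are invented, each function handles a single word, and only the very last consonant is devoiced, so final clusters are not treated properly.

```python
import re

# Voiced consonants and their voiceless partners, in matching order.
DEVOICE = str.maketrans("бвгджз", "пфктшс")


def devoice_final(word):
    """Step f: devoice a word-final voiced consonant, also before a final ь."""
    match = re.fullmatch(r"(.*?)([бвгджз])(ь?)", word)
    if not match:
        return word
    stem, consonant, soft_sign = match.groups()
    return stem + consonant.translate(DEVOICE) + soft_sign


def simplify_clusters(word):
    """Step g: тся and ться become ца; сч becomes щ."""
    word = re.sub(r"ть?ся$", "ца", word)
    return word.replace("сч", "щ")


print(devoice_final("хлеб"))          # хлеп
print(devoice_final("кровь"))         # крофь
print(simplify_clusters("смеяться"))  # смеяца
print(simplify_clusters("счастье"))   # щастье
```

The vowel-reduction steps in (d) are harder precisely because they need to know the position of stress and the softness of the preceding consonant, which a word-by-word replace like this cannot see.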

To create:
Meter
Caesurae
Rhyme
Start Analysis
These poems are associated with these devices (metric, rhythmic, rhyming)
Library
PHP
Cache and check the date

This week Elise put the finishing touch on phonetics, and Sam and I will start making a style sheet to mark meter. We are also interested in using Mallet to find high frequency words and themes in our poems. This has several potential drawbacks. First, since we have a small corpus we may get inaccurate results. Second, Russian words take different endings depending on their function in the sentence. For example, the word "dog" (sobaka) has a different ending depending on whether it is the subject (i.e. sobaka) or direct object (i.e. sobaku) of the sentence. This poses a problem because Mallet will mark them as different words.
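
The inflection problem is easy to see with a plain word count. The Python sketch below is not Mallet; it only imitates the counting step, and the LEMMAS table is a tiny hand-made lookup invented for the example (a real one would have to come from a dictionary or a lemmatizer).

```python
from collections import Counter

# Hypothetical lookup from inflected form to dictionary form.
LEMMAS = {"собака": "собака", "собаку": "собака", "собаки": "собака"}

tokens = ["собака", "лает", "вижу", "собаку"]

# Counting raw tokens treats each inflected form as its own word.
raw_counts = Counter(tokens)

# Mapping each token to its dictionary form first merges the forms.
lemma_counts = Counter(LEMMAS.get(token, token) for token in tokens)

print(raw_counts["собака"], raw_counts["собаку"])  # 1 1
print(lemma_counts["собака"])                      # 2
```

If we do use Mallet, some normalization step of this kind would presumably have to run before the texts are handed over.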

Eventually we will start using XQuery in our project but first we need to fully mark up our xml documents. Please comment with any advice for accomplishing these tasks or if you would like to see how we configured our xslt style sheets.


Akhmatova Update

Sam Depretis - 26 October 2012

We have made a substantial amount of progress over the last week! As of yesterday, we have completed our structural mark up of the poems that we will be using. In addition, we have created the website template that will (soon) house all of our poems and research. We also have an xslt stylesheet that is nearly finished that will allow us to translate our structurally marked up poems to html to display on our site.

What we are currently working on is a system of stylesheets for orthographic to phonetic representation of the poems. We are doing this by creating xslt stylesheets that tag stressed and unstressed vowels, tag consonants as hard or soft (starting with the consonants that are always hard or always soft), and mark the vowels that reduce. After completing this, we will be moving on to marking the meter of Akhmatova’s poems and will hopefully have a lot of progress on this to share for next week.

During our meeting with Professor Birnbaum, he informed us about data types for dates, which can be used to help compare time differences. After reading Michael Kay's book, I am not sure if these data types would be worth using. They seem to be easily converted into other, simpler data types that are more practical, but the conversation renewed our focus on one of our main research questions, which is the change of Akhmatova's poems over time.


Akhmatova Weekly Update

Erin Harrington - 19 October 2012

The dictionary-based stress lookup is handy because of the way Russian orthography is related to Russian phonetics. In Russian, as in English, the place of stress in a word cannot be predicted reliably or consistently unless one knows the word. But if one does know the place of stress, it is much easier to determine the pronunciation of a Russian word than an English word. Russian doesn't have silent vowels and it doesn't have instances where two vowel letters represent a single vowel sound, which means that you can calculate the number of syllables in a line by counting the vowels, and if you know where the stress is, you can therefore calculate the meter. Furthermore, the other rules for Russian pronunciation are much more reliably associated with orthography than is the case in English, so it's also possible to determine how the word is pronounced. This means that rhyme can be calculated from a combination of place of stress plus pronunciation.
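
The syllable-counting claim is simple enough to show directly. This is a minimal Python sketch of the idea, not part of our actual toolchain, and the function name is invented for the example.

```python
VOWELS = set("аеёиоуыэюя")


def count_syllables(text):
    """Count syllables in Russian text by counting vowel letters.

    This relies on the point made above: Russian has no silent vowels
    and no two-letter spellings of a single vowel sound.
    """
    return sum(ch in VOWELS for ch in text.lower())


print(count_syllables("больна"))        # 2
print(count_syllables("больна, одна"))  # 4
```

Note that й is deliberately left out of the vowel set, since it does not form a syllable of its own.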

The Hard Part, though, is that Russian, like English, is normally written without marking stress, and Russian speakers know where the stress falls the same way English speakers do. We could mark stress by hand, but that's laborious and error-prone. If we could determine the place of stress automatically, though, we could get from a text in normal Russian orthography (that is, a regular plain-text electronic file) to something that would support the machine-assisted analysis of formal poetic features.

Elsewhere in the Slavic Department Oscar Swan and Nicholas Reimer have built a Russian dictionary that includes stress information for 30,000 words. They are permitting us to use their files to build an automated stressing tool: we feed it a Russian text in normal Russian orthography, and it gives us back the same text with stresses marked. Preliminary output is temporarily at http://www.obdurodon.org/~djb/report.html, which has the input poem at the top (no stresses) and a report of which words are stressed and which aren't below.

In the report, the blue cells are words not found in the dictionary. In some cases that's because the dictionary contains only 30,000 words, and it doesn't contain proper nouns. Additionally, at the moment we're looking up only nouns, verbs, and adjectives, so other parts of speech are not being retrieved. The pink cells return multiple hits. It's not uncommon for Russian words to have multiple inflected forms (forms with grammatical endings) that have the same ending as far as spelling is concerned, but different stress. At the moment our routine returns all possible stresses, even though only one can be correct for the actual wordform in context. Disambiguating such situations in, say, newspapers is tough because it depends on contextual analysis that may require semantic interpretation. In poetry, though, if the meter is regular, we can use the dominant meter in the poem to predict which of two competing potential stresses is more likely to be correct in a specific context.
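
The last point, using the dominant meter to choose between competing stresses, can be sketched as follows. This is a guess at the logic in Python, not the routine that produced the report; the function names and arguments are invented for the example, and an iambic poem is assumed, so the even-numbered syllables of the line are the strong positions.

```python
def is_strong_iambic(position):
    """In an iambic line, the even-numbered syllables (1-based) are strong."""
    return position % 2 == 0


def pick_stress(candidates, syllables_before, is_strong=is_strong_iambic):
    """Choose among possible stress positions for one word in a line.

    candidates: 1-based syllable numbers within the word that the
        dictionary lists as possible places of stress.
    syllables_before: how many syllables precede the word in the line.
    Returns the single candidate that lands on a strong position,
    or None when the meter does not settle the question.
    """
    fitting = [c for c in candidates if is_strong(syllables_before + c)]
    return fitting[0] if len(fitting) == 1 else None


# A two-syllable word that may be stressed on either syllable:
print(pick_stress([1, 2], syllables_before=2))  # 2
print(pick_stress([1, 2], syllables_before=1))  # 1
# Both candidates land on strong positions, so the meter cannot decide:
print(pick_stress([1, 3], syllables_before=1))  # None
```

Because binary meters often leave feet unrealized, a word whose candidates all miss the strong positions is not necessarily an error, which is why the sketch returns None rather than guessing.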


Transforming and Viewing Akhmatova

Elise Thorsen - 12 October 2012

We envision a multi-tiered view of the Akhmatova poems that we are marking up.

At the most basic level, we would like to have a library of traditional readable views of the poems: recognizable stanzas and lines framed by a title and maybe a date. The page with this view will probably benefit from a table of contents. In principle, we could already create an XSL transformation that would do that for the poems we have marked up at the most basic level. Having such a page up by the end of next week seems like a reasonable goal, once we have a chance to address how to get these translation schemes uploaded.

In conjunction with this basic level, we would like to have a style sheet associated with this basic view that will allow us to eyeball connections (by means of a color code, for example) among poems: rhyme, alliteration, word boundaries, deviations from meter, etc. In this view, the temporal dimension that we are interested in mapping is understated. However, given that temporal mapping will inevitably, I think, lead to some amount of abstraction (patterns become more important than the words?), it is useful to let color and styling serve as a touchstone or point of reference between the verbal text and the more abstracted version.

It may be useful to have an interlinear view of poems--presumably the mechanisms for this will be clearer when we read about the TEI critical apparatus module. This could be useful for the translation aspect of this project, or to present allusions or other draft versions (such as in "Poem without a Hero," which exists in four versions). Or, it could also be generated based on lines in other Akhmatova poems that are connected to a given line by any number of possible connections.

Finally, we would certainly like to have graphic means of presenting patterns in the poems we are looking at, with graphs and charts (this will take exploring additional graphic translations schemes like SVG, I expect). This could include bar graphs, which would be particularly useful as far as providing a stable visual axis to represent time. It would also be interesting to see if we couldn't create word clouds with Force-based algorithm graphs that could draw the strongest relationships of any word to any or all of the other words, either at a particular time or during a given range of time.


Akhmatova Schema Update

Sam Depretis - 5 October 2012

This week we have finished our schema for the structural aspects of Akhmatova’s poems and are beginning to mark up each text according to that schema. The current schema includes structural elements like title, section titles (intro, I, II etc), section text, lines of those sections, and rhyme. We decided that for now, it is best to mark up the structure of these poems, then go back and add our research later. We hope by Friday to have the structure of some of our poems that we will definitely be using marked up, begin to narrow down and set a complete list of poems to mark up, and create a rough draft version of our home page for our website.

We used the TEI as a guide but with one small change. In the TEI, they suggest using “text” as the root element. Because we are dealing only with poems, we decided to use poem as the root element to avoid confusion between the root text, and the text attributes.

To mark up the rhyme, we decided to use <rhyme> to mark the rhyming elements in each line. For example, in a pair of lines in Akhmatova's "Requiem", the first line ends with the word "больна" (bol'na), and the second line ends with the word "одна" (adna). To show this relation, we will mark up rhyme like this:

First line text боль<rhyme>на</rhyme>
Second line text од<rhyme>на</rhyme>
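
Once rhyme is marked this way, pulling the rhyming lines back out is straightforward. The Python sketch below uses the standard library's xml.etree.ElementTree; the <poem> root follows our schema, but the <l> line element and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A two-line sample using the words from the example above.
# The <l> element name for a line is an assumption.
SAMPLE = (
    "<poem>"
    "<l>боль<rhyme>на</rhyme></l>"
    "<l>од<rhyme>на</rhyme></l>"
    "</poem>"
)


def rhyme_groups(xml_text):
    """Map each marked rhyme string to the numbers of the lines ending in it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for number, line in enumerate(root.iter("l"), start=1):
        rhyme = line.find("rhyme")
        if rhyme is not None and rhyme.text:
            groups[rhyme.text].append(number)
    return dict(groups)


print(rhyme_groups(SAMPLE))  # {'на': [1, 2]}
```

Grouping on the spelled rhyme only works when the rhyming endings are spelled alike; for the general case the grouping key would need to be the phonetic form.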


Akhmatova

Erin Harrington - 27 September 2012

We have decided on two major research questions for our Akhmatova project. One of our goals is visualization of change over time. The changes that we are interested in are changes between words, for example rhyme or on a more sophisticated level irony. Therefore we want to both analyze changes within the text on a minute level, the relationship between words, and then work outward from there. For example, we would like to look at the relationship between different parts of the poem in order to record differences in rhyme, irony and tone. We are also trying to measure both linear and nonlinear concordances between words. Furthermore, on a grand scale, we would like to look at the relationship between poems over time. After this is all completed, if we have time, we will analyze differences between translations as a sort of capstone question.

Sam has already started writing a schema for “Requiem” using TEI as a guideline. Our goal is to have one schema for all 20 poems. One of the problems he ran into was whether or not to name the poem “text” as TEI suggests, as it could be confused with the text elements. TEI also suggested using “div” to mark up divisions between the poems but this is also a reserved word. We have also decided that the best way to mark up parts of the poem is as strings.

We also plan to incorporate several attribute tags and attach them to a CSS document. This way elements with attributes that rhyme can be red, elements with attributes that contain irony can be yellow, and elements that contain both attributes can be orange. Essentially we would like the text to be visually engaging although we will hide a lot of the markup, for example part of speech, from the user unless the user enables the CSS color for parts of speech.

One of the major aspects of our project is that we are essentially asking the question without having a clear idea what the answer might be. Therefore we will probably mark up everything we can think of in the first couple of poems to see if there are any similarities between texts. Also we understand that sometimes we may have to look at different aspects of each poem on an individual basis because some poems are very ironic and some are not ironic at all.


Akhmatova General Outline

Elise Thorsen - 21 September 2012

We think it is a reasonable goal for the remainder of the semester to mark up selected poems by Anna Akhmatova, in particular her long poems "Requiem" and "Poem without a Hero." In addition to verse elements such as rhyme and meter, we would like to mark up other ways that linkages and associations are created in verse. It may be possible to represent and visualize relationships between words within the poems through proximity and phonetics, which can then be interpreted, perhaps in terms of tone and irony, perhaps to the effect of revealing new patterns. Similarly, translations of Akhmatova's poems into English may reveal something about the semantic range of these poems.

Our intention in the next week is to establish a common schema for sections of "Requiem," with reference to the TEI guidelines on verse and linking. Once we have negotiated what we want for a project standard, we will divide our texts and each mark them up according to the standard schema. This may take place in two stages: one a matter of laying out the overall structure of headings and stanzas, the next of propagating links between elements and creating alignments between originals and English translations (which, judging by the TEI guidelines, don't tend to interfere with hierarchies for presentation). The texts are generally short enough to tolerate touching twice.

Once we have some texts with at least presentation mark-up, we'd like to get to work on XSLT to transform the XML into HTML, on an overall site design, and on building the CSS to support that design (all of these have yet to be broached). This line has a lot of work condensed into it, but I think we were all enthusiastic about each contributing something to all of these elements, so the workload should not be onerous for any one person on this front.

One element of the site design we had already considered was a timeline as a stable element somewhere on any given page, which could serve as a browser of the poems as well as biographical and historical information.

Finally, with reference to what we'll do with all of our mark-up besides display the poems, we are looking at ways to represent visually the relationships and patterns among words--including and beyond linear concordances. We are not sure what the results of this activity will be, but in principle, it will mean the ability to visualize the "semantic halo" of any given word or set of words within a poem or this body of poems, which will be a boon to future interpretive endeavors.

Although I have laid this out in a linear fashion, and we will probably begin each step in this order, the scale of each step is such that we will probably end up working on more than one step simultaneously, as the texts and our progression through class topics allows.

If you have ideas about what further applications we could find for data about translations and relationships among words within poems, we would love to hear them while the project is still young.


Sam/Elise/Erin Project Update

Sam Depretis - 14 September 2012

For our project, we have clarified which aspects of Anna Akhmatova’s poems we find most interesting to use as focus points of our work. The first important aspect of her poems is her word usage. This will include her use of rhyme, alliteration, and word association. By marking up these factors we hope to be able to quantify literary aspects of her poems such as tone and irony.

We feel that to be able to truly analyze her works, it is important to study her poems in Russian. When deciding in which language the poems would be best studied, we came up with an important question: why would it be best to study the original Russian versions of these poems? It feels more natural to study her works in the language in which they were originally written, but we will attempt to work out why this is, and what aspects of her poetry get lost in the translation from Russian to English.

Word association is another characteristic of her poetry that we believe really stands out. To research her use of word association, it is important to find which words are associated with others, if the association is good or bad, and the frequency of these associated words. We found that a scatter plot with lines connecting associated words would be a great visual tool to display this information. This graph will display the frequency of the associated words used, the connotations of the words being used, and lines connecting them to display their association with each other.

The last main point we would like to touch on is the overall change in the style of her poetry over her lifetime. For example, how do her pre-1940s works compare to her works during and after the Siege of Leningrad? We plan to study around twenty of her poems ranging from before the revolution until into the 1960s, focusing on her two longer works “Poem without a Hero” and “Requiem”.