As is well known at this point across the Internets, GPT-3 can do many amazing things. Among his examples of amazing things GPT-3 can do, Gwern includes a “translation” of a Chinese poem from a LessWrong user. The user in question (username: “oceaninthemiddleofanisland”, henceforth “Ocean”) effuses with breathless praise for GPT-3’s astounding ability, supposedly, to “produce idiomatically and culturally accurate translations of Chinese poetry”, which someone who appears to be the same person, albeit under a different username, describes on Reddit as a “notoriously difficult task”.
Translating poetry is indeed notoriously difficult in general, and Classical Chinese poetry such as that of the Tang and Song dynasties is possibly some of the hardest material of all: pithy in the extreme, often laden with symbolism and allusions, and making heavy use of sonic and rhythmic elements inherent to the language (elements partially lost even in some modern Chinese dialects, especially the northern ones). Translating it into English means balancing the competing priorities of conveying content, connotation, and aural characteristics.
而此非彼比！(Translation: However, this cannot be compared to that! See how that was only five characters? That’s Classical Chinese.) What we have here is a poem written in a poetic register of modern vernacular Chinese that is actually very close to typical contemporary Chinese prose; to analogize to an English-language context, it is closer to free verse than to a Shakespearean sonnet. This kind of text presents some hurdles, for sure, but overall not many more than, say, a contemporary French poem would. Most importantly, though, just because GPT-3 created something that feels poetic doesn’t necessarily mean it did a great job.
First things first: did GPT-3 actually translate the poem? You might imagine that this isn’t even a question, given that the response seems to revolve around how well it translated the poem. Well, it is indeed a question, and the answer is no (in my humble opinion). Ocean notes that upon attempting to feed the text of the “translation” back through GPT-3 a third time, the model “began producing poems that — despite being on topic and more beautiful than its second draft — bore little resemblance to the original”. I would be curious to see the first draft, as I consider the preceding in fact quite a good description of this second, published draft — generally on topic, aesthetically pleasing, but only vaguely resembling the original.
Even for a reader with no Chinese, a cursory comparison to the higher-fidelity output of Google Translate shows wide divergences. Not that the Google Translate rendition is perfect either, but, as one might expect from a commercial translation system, it does a decent job, for the most part, of neither subtracting nor adding content relative to the original, and of accurately, if boringly, translating the words it was given. “Yellow is enough to show the value of its pure gold” is no ringing lyric gleaned from the mellifluous murmurs of the Muses, but it is a correct translation of “黃色就足以展示其純金的價值”. I won’t bore you with other examples of this sort.
The flaws Google Translate displays are somewhat interesting in their own right, and serve as a useful point of comparison for GPT-3’s. It forgets that in the context of this poem, Wugong (武功 Wu3gong1) is the name of a township in Shaanxi rather than a reference to martial arts. There are some weird artifacts later on (“Nuannuankengtou, Lili dream” for 暖暖坑頭，離離夢想) where it didn’t know what to make of one of the more Classical Chinese-style turns of phrase. It misses that 藍田 Lan2tian2 refers to Lantian county, also in Shaanxi, and thus misinterprets “Lantian jade” (藍田玉 Lan2tian2 yu4) as a (fairly plausible) personal name, “Lan Tianyu”. It mangles the (uncommon) idiom 仙山瓊閣 xian1shan1qiong2ge2, “crystal palaces in the fairy mountain”, interpreting the latter half as a proper noun “Qiongge” (tbh that might have been my guess too without consulting a dictionary, so ¯\_(ツ)_/¯), ostensibly the name of the fairy mountain. Where it does subtract, it often excises the poetic frills, such as rendering 挺挺的白楊 ting3ting3 de bai2yang2, “the poplars, standing ramrod straight”, as simply “the poplars”. And so on.
How does GPT-3 do with these passages? It’s impossible to say, because none of them show up in any recognizable form in GPT-3’s output. Elements of the original certainly glimmer through throughout the text from GPT-3, but only as if refracted through a mosaic of prisms, by turns distorted, elided, duplicated, displaced or diffused. Line 14 of the original for instance, 早已化作原上泥土的芳香 (“… have all long ago turned into the fragrance of the original topsoil”), seems to have influenced two lines in GPT-3’s text — “Does it not bring with it the fragrance of a thousand herbs?” as well as “have long since become a wisp of smoke”. GPT-3 liked the “fairy palaces in/of the mountains” (仙山瓊閣) so much it decided to play that card twice, instead of only once as in the original. Other lines, such as “Then suddenly, your solitude is broken; you glow with an inner vitality” are fabricated from whole cloth, with no clear connection to the original at all, besides perhaps a vaguely similar tone of triumph against adversity.
How about the annotations? Here Google Translate has no hope of competing, of course. In the context of the long-term trajectory of natural language processing technologies, it is astounding that a computer system can now do anything remotely like explaining the nuances of its own translation of a text. The capability GPT-3 displays here is even, amazingly, almost somewhat useful. Almost. The problem is twofold: first and most damningly, what GPT-3 is annotating cannot really be called a translation of the original text, as discussed above; and second, although superficially convincing, the quality of the annotations is extremely mixed.
Some of the annotations seem totally superfluous. I don’t think any literate human reader of English needs clarification that “value” and “worth” are synonyms, as GPT-3 provides, noting that “價值 means literally ‘price’ or ‘value’ but in the context of this poem refers to ‘worth’”. Meanwhile, it neglects what would be genuinely useful contextual knowledge for many readers, such as the fact that both “the descendants of Shennong” and “the children of Yanhuang” are mythologically-tinged epithets for the Chinese people.

Some annotations are also factually inaccurate, either grossly or subtly, such as both references to “the last four characters” in two different lines, each of which actually refers to approximately four of the last six or seven characters in the given line. In one of these cases GPT-3 also delves into some (incorrect) explanation of 春節 Chun1jie2, literally “Spring Festival” but better known in English as “Chinese New Year” (or, more transnationally, “Lunar New Year”), by far the most important holiday in the Chinese calendar. The problem is, 春節, i.e. Spring Festival, doesn’t show up at all in the original. It appears that the presence of the words 拔節 ba2jie2 “jointing” and 立春 Li4chun1 “Lichun” (a two-week period in the traditional Chinese calendar marking the beginning of spring), both logically associated with springtime, and whose respective second characters make up 春節, caused the model to grasp onto the latter concept. Within the small amount of Chinese text the model was trained on (for the most part, non-English text was filtered out before training), there is no doubt in my mind that Spring Festival showed up at least two orders of magnitude more frequently than either jointing or Lichun, so in a sense this is “a good guess”.
I suspect this kind of recombination of two words or concepts into another, more common one happens much more frequently in low-training-data domains than in domains where the model has been exposed to more extensive data.
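To make the character arithmetic concrete, here is a toy sketch of the surface-level coincidence described above (an illustration of the overlap in the text, not a claim about anything GPT-3 actually computes internally):

```python
# Each of the two spring-related terms in the original poem contributes
# its second character to the far more common term the model latched onto.
bajie = "拔節"    # ba2jie2, "jointing", a stage of crop growth
lichun = "立春"   # Li4chun1, the solar term marking the start of spring

chunjie = lichun[1] + bajie[1]   # 春 + 節
print(chunjie)   # → 春節, "Spring Festival", which never appears in the poem
```

A model that has seen 春節 orders of magnitude more often than either rarer term has an easy statistical path to this (wrong but understandable) association.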
Nevertheless, we should give credit where credit is due: GPT-3 does re-create a few lines with panache unmatched by Google Translate. The “divine aura of the Great Tang” is distinctly better than Google Translate’s “the charm of Datang” for “大唐的神韻”. Impressively, GPT-3 correctly intuits (presumably based on the overall nation-exalting tone of the piece) that “大唐” refers to the widely admired Tang dynasty. Although debatable, I would personally take “divine aura” for 神韻 (shen2yun4, “the lingering charm of a dignified or charismatic spirit”; notably, the second character 韻 yun4 literally means “rhyme”, alluding to the immortal poetic legacy of the Tang dynasty and giving the term in this context a possible second reading as “divine rhymes”) over Google Translate’s blandly accurate “charm”. Neither system, thankfully, mistakes this wonderful term for the Falun Gong-linked performance group which has adopted it as its name — yes, that one.
This example of GPT-3’s virtuosities and deficiencies is fairly frivolous. It does exemplify, however, what is to my mind an important weakness of the zero- or few-shot translation abilities of GPT models. The output sounds good. Because it generally sounds good, uses the right style and good grammar, and so on, it seems convincing. Wow, you think, look at how far machine translation has come. It certainly is a lot better than what a machine could have spat out in the past, across many dimensions. But we should not get carried away with giddiness and declare a spade an industrial-scale excavator (although perhaps “jackhammer” would not be an analogue too far). Even more importantly, we should consider whether the dimensions on which this system shines are actually the dimensions we care about.
Why might people who enjoy reading poetry want to read a poem translated from a foreign language such as Chinese? Probably because they are interested in various characteristics of the original poem: the aesthetics, the content, the way of viewing the world from which the author composed it. For someone who doesn’t care about poetry terribly much for its own sake, an ability to translate and explicate poetry may also be exciting for its implication that it could enable better translation of difficult bits of more pragmatic language use, such as a speech by a political leader featuring erudite allusions and poetic diction.
In neither of these cases is GPT-3 nearly as much of an improvement over Google Translate as Ocean seems to think. For the poetry fan, well, reading GPT-3’s text is really reading a different poem. What’s worse, it’s a different poem that is masquerading as the poem you actually wanted to read, while also throwing in some quasi-stereotypical elements prevalent in English-language discourse about the culture you wanted to come to know better from its own perspective (among other additions, GPT-3 mentions “fabled immortals” and “inner vitality” — if this were a human I was speaking with, I would have to respond: “you know not all Chinese people talk about cultivating their 氣 qi4 all the time, right… ?”). The conference participant who hoped to unpack the subtle rhetorical implications of some utterance from, oh I don’t know, one of the many Chinese leaders who in harmonious collective collaboration form the political elite of the PRC, will probably be disappointed when, instead of discussing the Belt and Road Initiative, GPT-3 goes off on a tangent about the beautiful sashes with which countries around the world are festooning their container ports.
At the same time, though, this is what GPT-3 can do out of the box, never having been trained as a machine translation system, or even intentionally exposed to non-English text at all. Some issues would fundamentally remain without significant architectural modifications (ideally, such a system would know where it was making debatable editorial choices and could proactively identify and defend each choice alongside a slate of alternatives, the way a human translator can). Still, the field of potential applications for GPT-3, not as a finished product itself but as a component or raw material with which to build other systems, is vast.
So, considering that all of GPT-3’s translation capabilities are a by-product of its extremely simple training objective, the ebullient enthusiasm is understandable. Beyond just being able to convert back and forth between “yellow” and “黃色” (and perhaps getting to the core of what set Ocean’s heart aflutter), at its best, I don’t think it would be an exaggeration to call some of GPT-3’s creations “inspired”. Is not inspiration precisely the surfacing of low-probability options in some problem space which nevertheless fit, possibly much better than more predictable or conventional alternatives? The kaleidoscopic recombination of concepts within and between domains which underlies this process comes naturally to deep neural networks. Of course, when “inspired” choices don’t fit, which with systems like GPT-3 is currently most of the time, we would probably describe the results as insanity. But the prospect of inspiration on the cheap, at machine speed and scale, with an expert human in the loop separating the wheat from the chaff, is immensely exciting for the future of human science and culture alike.
此僅始也。This, as they say, is only the beginning.