Monday, October 28 2002

Subject: Re: dc language in rss

Date: Mon, 28 Oct 2002 08:24:08 -0500 (EST)

From: Aaron Straup Cope

To: Bill Kearney

On Fri, 25 Oct 2002, Bill Kearney wrote:

> That would indeed be a problem.  You could actually mark up those sections, even

> down to the paragraphs or even words with span tags.  I shudder at the thought

> of what most environments would DO with that data, but it's certainly possible.

If I were a better person, I(would(learn(lisp))) and write an Emacs

minor-mode to do that. (Sadly(,(lisp(scares(me))))).

> Well, the problem is what does that element mean?  What purpose is it being used

> for?  I daresay outside of Syndic8's listing of feeds by language, not much is

> paying attention to it.  So my question to you is what would you have a reader

> program DO with multiple languages?

The short answer is : I have no idea.

The longer answer is : Who cares?

There are two issues here :

The first falls into the Foofy Grand Unifying Principles category - the

people who invented the Internet didn't know what it was going to be used

for. Why should RSS, and its tool set, presume something as basic and

often controversial as language?

The second falls into the Dueling Shakespeare category - RFC 1766 states

that :

"In some contexts, it is possible to have information in more than one

language, or it might be possible to provide tools for assisting in the

understanding of a language (like dictionaries).

"A prerequisite for any such function is a means of labelling the

information content with an identifier for the language in which it is

written."

But in the absence of multiple language tags, the correct answer when

prigs like me start fussing is :

<quote src = "rfc1766">

The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is

         described in ISO 639)

    -    Dialect or variant information, such as no-nynorsk or en-

         cockney

    -    Languages not listed in ISO 639 that are not variants of

         any listed language, which can be registered with the i-

         prefix, such as i-cherokee

    -    Script variations, such as az-arabic and az-cyrillic

</quote>

Which doesn't solve everyone's problem, but can be adapted to deal with

the problem of Quebec. I chose en-quebecois, because I like the sound of

it. Sovereigntists, on the other hand, will probably opt for 'en-qc' since

it implies nationhood.
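
As a rough illustration of the tag syntax discussed above, here is a minimal

Python sketch that splits a tag into its primary tag and subtags. It assumes

only the RFC 1766 grammar (a primary tag and any number of hyphen-separated

subtags, each one to eight letters). The name split_language_tag is

hypothetical, and the check is purely syntactic: it does not consult ISO 639

or any registry.

```python
import re

# RFC 1766 syntax: Primary-tag *( "-" Subtag ), each part 1 to 8 letters.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def split_language_tag(tag):
    """Return (primary, subtags) if tag is well-formed, else None.

    Hypothetical helper for illustration. Tags are case-insensitive,
    so the parts are lowercased before being returned.
    """
    if TAG_RE.fullmatch(tag) is None:
        return None
    primary, *subtags = tag.lower().split("-")
    return primary, subtags
```

Under that grammar 'en-qc' and 'i-cherokee' split cleanly, while

'en-quebecois' is rejected, since its subtag runs to nine letters.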

Then, of course, there is the question of how to deal with representing a

weblog written by the province's allophone population (translation:

persons whose mother tongue is neither English nor French and who, in my

limited experience, often speak upward of 4-6 languages). What then?


