Monday, October 28 2002

Subject: Re: dc language in rss

Date: Mon, 28 Oct 2002 08:24:08 -0500 (EST)

From: Aaron Straup Cope

To: Bill Kearney

On Fri, 25 Oct 2002, Bill Kearney wrote:

> That would indeed be a problem.  You could actually mark up those sections, even

> down to the paragraphs or even words with span tags.  I shudder at the thought

> of what most environments would DO with that data, but it's certainly possible.

If I were a better person, I(would(learn(lisp))) and write an Emacs

minor-mode to do that. (Sadly(,(lisp(scares(me))))).

> Well, the problem is what does that element mean?  What purpose is it being used

> for?  I daresay outside of Syndic8's listing of feeds by language, not much is

> paying attention to it.  So my question to you is what would you have a reader

> program DO with multiple languages?

The short answer is : I have no idea.

The longer answer is : Who cares?

There are two issues here :

The first falls into the Foofy Grand Unifying Principles category - the

people who invented the Internet didn't know what it was going to be used

for. Why should RSS, and its tool set, presume something as basic and

often controversial as language?

The second falls into the Dueling Shakespeare category - RFC 1766 states

that :

"In some contexts, it is possible to have information in more than one

language, or it might be possible to provide tools for assisting in the

understanding of a language (like dictionaries).

"A prerequisite for any such function is a means of labelling the

information content with an identifier for the language in which it is written."


But in the absence of multiple language tags, the correct answer when

prigs like me start fussing is :

<quote src = "rfc1766">

The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is

         described in ISO 639)

    -    Dialect or variant information, such as no-nynorsk or en-

         cockney

    -    Languages not listed in ISO 639 that are not variants of

         any listed language, which can be registered with the i-

         prefix, such as i-cherokee

    -    Script variations, such as az-arabic and az-cyrillic

</quote>


Which doesn't solve everyone's problem, but can be adapted to deal with

the problem of Quebec. I chose en-quebecois, because I like the sound of

it. Sovereigntists, on the other hand, will probably opt for 'en-qc' since

it implies nationhood.
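
To make the subtag rules quoted above concrete, here is a minimal Python sketch of how a tool might split and sanity-check an RFC 1766 tag. It assumes only the RFC 1766 grammar (a primary tag plus optional subtags, each one to eight ASCII letters, joined by hyphens). The function names `parse_language_tag` and `describe_first_subtag` are hypothetical and not part of any RSS or Dublin Core library, and the sketch does not look anything up in the ISO 639, ISO 3166 or IANA lists.

```python
import re

# RFC 1766 grammar: a primary tag followed by zero or more subtags,
# each made of 1 to 8 ASCII letters, separated by hyphens.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Split an RFC 1766 language tag into (primary, subtags), lowercased.

    Raises ValueError if the tag does not match the RFC 1766 grammar.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError("not a valid RFC 1766 language tag: %r" % tag)
    primary, *subtags = tag.lower().split("-")
    return primary, subtags


def describe_first_subtag(tag):
    """Hypothetical helper: report how RFC 1766 reads the first subtag."""
    primary, subtags = parse_language_tag(tag)
    if primary in ("i", "x"):
        return "registered (i-) or private-use (x-) tag"
    if not subtags:
        return "no subtag"
    if len(subtags[0]) == 2:
        return "read as an ISO 3166 country code"
    if len(subtags[0]) >= 3:
        return "3 to 8 letters: expected to be registered with IANA"
    return "single letter: not covered by this sketch"
```

Under these assumptions 'en-qc' parses, and its two-letter subtag is read as a country code (whether 'qc' is actually assigned is a separate lookup this sketch does not attempt). 'en-quebecois' is rejected outright, because its subtag runs to nine letters and the grammar allows at most eight.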

Then, of course, there is the question of how to deal with representing a

weblog written by the province's allophone population (translation:

persons whose mother tongue is neither English nor French and who, in my

limited experience, often speak upward of 4-6 languages). What then?



Me : 0.34

Added support for the rssUpdate method (XML-RPC only, so far) and a bunch of wonkish niggling in the black box. It will take a while for the CPAN listings to update so, until then, you can grab a copy over here. see also : docs.


Simon Willison : "I've put together an XML-RPC proxy for the [W3C Validator]."


Me : 0.1

see also : docs


Le Québec en images

via afroginthevally


Mina Naguib :

Since everyone seems to think I actually care what they're listening to when they post to their weblog, I think I might have to start telling them what the weather's like when I post to mine. Clear and one degree Celsius, in Montreal.

