NAME

ASCOPE::Article::Collection - Aaron's widget for munging a specialized subset of the DocBook format


SYNOPSIS

 use ASCOPE::Article::Collection;
 my $collection = ASCOPE::Article::Collection->new("/path/to/articles.xml");
 $collection->mk_html({htroot => "/htdocs/news",
                       uri    => "http://127.0.0.1/news",
                       flush  => 1});


DESCRIPTION

Aaron's widget for munging a specialized subset of the DocBook format used to record one or more articles.


DOCUMENT STRUCTURE

Book widgets contain the following elements :

 + book
   + bookinfo
     - title
     - pubdate
   + part (+)
     + partinfo
       - title
     + xref (+)
       @linkend
     + article (+)

Article widgets contain the following elements :

 + article
   @id
   + articleinfo
     - title
     + keywordset (or not)
       - keyword
     + indexterm (or not)
       @type
       - primary
     + authorgroup
       - corpauthor
       OR
       + author
         - surname
         - firstname
         - othername (or not)
     + abstract
       - para
     - pubdate
     + copyright (or not, but should be)
       - year
       - holder
   - para (+)
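
As a rough illustration of the outline above, here is a minimal sketch of a one-article document written to a temporary file. Every title, name, id and date in it is invented, the type="org" value is an assumption borrowed from the labels shown in the stats example further down, and the date format is a guess. Only core modules are used; the final call to new() is left commented out because it needs the module itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: every title, name, id and date below is invented
# for illustration and simply follows the element outline above.
my $xml = <<'XML';
<?xml version="1.0"?>
<book>
  <bookinfo>
    <title>Example News Collection</title>
    <pubdate>2004-09-18</pubdate>
  </bookinfo>
  <part>
    <partinfo>
      <title>Sports</title>
    </partinfo>
    <xref linkend="article-1" />
    <article id="article-1">
      <articleinfo>
        <title>Example Headline</title>
        <keywordset>
          <keyword>Baseball</keyword>
        </keywordset>
        <indexterm type="org">
          <primary>New York Mets</primary>
        </indexterm>
        <authorgroup>
          <corpauthor>Example News Service</corpauthor>
        </authorgroup>
        <abstract>
          <para>A one-sentence summary.</para>
        </abstract>
        <pubdate>2004-09-18</pubdate>
        <copyright>
          <year>2004</year>
          <holder>Example News Service</holder>
        </copyright>
      </articleinfo>
      <para>Body text of the article.</para>
    </article>
  </part>
</book>
XML

# Write the sample to a temporary file so that its path can be handed to new().
my ($fh, $path) = tempfile(SUFFIX => ".xml", UNLINK => 1);
print {$fh} $xml;
close($fh) or die "Unable to close $path: $!";

# With the module installed, the file could then be loaded like this:
# my $collection = ASCOPE::Article::Collection->new($path);
```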


TOPICS, KEYWORDS, INDEXTERMS AND RELATIONS

Individual articles may contain one or more keywordset or indexterm elements. When these relations are indexed by any of the methods below, they will be returned with the following labels :

keywords

A keyword is labeled as an ``idea''.

indexterms

Index terms are labeled as follows - where the type attribute is :


PACKAGE METHODS

__PACKAGE__->new($path)

Returns an ASCOPE::Article::Collection object. Woot!


OBJECT METHODS

$obj->load_collection($path)

$obj->pubdate()

Returns a string.

$obj->mk_html(\%args)

Render a collection of articles as an HTML tree.

Returns true or false.

$obj->authors()

Returns a hash reference. Each key is the name of an author (or news service). Each value is an array reference of URIs.
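
A minimal sketch of walking that structure follows. The hash below is a hypothetical stand-in for the return value of authors(); the names and URIs are invented and the real URI layout may well differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the return value of $collection->authors();
# the names and URIs are invented for illustration.
my $authors = {
    "Example News Service" => [
        "http://127.0.0.1/news/article-1.html",
        "http://127.0.0.1/news/article-2.html",
    ],
    "Doe, Jane" => [
        "http://127.0.0.1/news/article-3.html",
    ],
};

# Order the authors by number of articles (most first), ties broken by name.
my @names = sort {
    scalar(@{ $authors->{$b} }) <=> scalar(@{ $authors->{$a} }) || $a cmp $b
} keys %$authors;

# Build one report line per author.
my @report;
foreach my $name (@names) {
    my $count = scalar @{ $authors->{$name} };
    push @report, sprintf("%s (%d)", $name, $count);
}

print "$_\n" for @report;
```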

$obj->articles()

Returns a list.

$obj->articles_for_tag($type,$tag)

Returns a list of URLs.

$obj->tags_for_article($uri)

Returns a hash reference.

$obj->index_plucene(Plucene::Simple)

Index a collection with Plucene::Simple.

Returns true or false.

$obj->index_contextgraph(Search::ContextGraph)

Index a collection with Search::ContextGraph.

$obj->index_delicious(ASCOPE::Delicious)

Index a collection with ASCOPE::Delicious.

$obj->stats()

Returns a hash reference of relations and their number of referents organized by topic. e.g. :

 geo => %
     Baltimore (Md) => 1
     Brazil => 1
     Brooklyn (NYC) => 1
     ...
 org => %
     Air Canada => 1
     Al Jazeera => 1
     Al Qaeda => 2

This method shares a degree of overlap with the weigh_relations method that remains to be reconciled.
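
A short sketch of reading the structure returned by stats(), assuming it is a plain two-level hash shaped like the example above. The data is copied from that example rather than produced by the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the return value of $collection->stats(),
# using the values from the example above.
my $stats = {
    geo => { "Baltimore (Md)" => 1, "Brazil" => 1, "Brooklyn (NYC)" => 1 },
    org => { "Air Canada" => 1, "Al Jazeera" => 1, "Al Qaeda" => 2 },
};

my %totals;
foreach my $topic (sort keys %$stats) {
    my $counts = $stats->{$topic};

    # Total number of referents recorded under this topic.
    my $total = 0;
    $total += $_ for values %$counts;
    $totals{$topic} = $total;

    print "$topic ($total)\n";

    # Most-referenced relations first, ties broken alphabetically.
    my @names = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
    print "  $_ => $counts->{$_}\n" for @names;
}
```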

$obj->relations()

Returns a hash reference of all the relations and their topics and referents. e.g. :

 Johnson, Magic => %
       keywords => %
            Automobile Racing => 1
       org => %
            National Assn of Stock Car Auto Racing => 1
 Leiter, Al => %
       keywords => %
            Baseball => 1
       org => %
            New York Mets => 1

$obj->weigh_relations(\%relations)

Returns a hash reference of all topics and the number of references pointing to each.

  my $relations = $collection->relations();
  my $weights   = $collection->weigh_relations($relations);

The hash reference contains a nested hash whose key is __topics and which maps topic name to topic type. e.g. :

 %
     Accounting and Accountants => 3
     Air Canada => 2
     Air Pollution => 3
     Airlines and Airplanes => 2
     Al Jazeera => 4
     ...
 __topics => %
     Taxation => keywords 
     ...

This method shares a degree of overlap with the stats method that remains to be reconciled.
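
One plausible reading of the output above is sketched below: walk the relations hash, add up the counts recorded for each topic, and note each topic's type under the __topics key shown in the example. The function name sketch_weigh_relations is hypothetical and this is a guess at the behaviour, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: tally references per topic
# and remember which type (keywords, org, ...) each topic was filed under.
sub sketch_weigh_relations {
    my ($relations) = @_;
    my %weights = ("__topics" => {});

    foreach my $relation (keys %$relations) {
        my $types = $relations->{$relation};

        foreach my $type (keys %$types) {
            my $topics = $types->{$type};

            foreach my $topic (keys %$topics) {
                $weights{$topic} += $topics->{$topic};
                $weights{"__topics"}{$topic} = $type;
            }
        }
    }

    return \%weights;
}

# Sample input copied from the relations() example above.
my $relations = {
    "Johnson, Magic" => {
        keywords => { "Automobile Racing" => 1 },
        org      => { "National Assn of Stock Car Auto Racing" => 1 },
    },
    "Leiter, Al" => {
        keywords => { "Baseball" => 1 },
        org      => { "New York Mets" => 1 },
    },
};

my $weights = sketch_weigh_relations($relations);

foreach my $topic (sort grep { $_ ne "__topics" } keys %$weights) {
    print "$topic => $weights->{$topic} ($weights->{'__topics'}{$topic})\n";
}
```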

$obj->heaviest_relations(\%weights)

Returns an array reference of one or more topics with the highest number of references.

  my $relations = $collection->relations();
  my $weights   = $collection->weigh_relations($relations);
  my $heaviest  = $collection->heaviest_relations($weights);
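
To make "one or more" concrete: if several topics tie for the highest count, all of them would be returned. The sketch below shows that selection on a weights hash shaped like the weigh_relations example. The function name sketch_heaviest is hypothetical, the sorted return order is an added convenience, and the module's own logic may differ.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper, not part of the module: return every topic that
# shares the highest reference count.
sub sketch_heaviest {
    my ($weights) = @_;

    # Skip the bookkeeping key; every other key is assumed to be a topic name.
    my @topics = grep { $_ ne "__topics" } keys %$weights;
    return [] unless @topics;

    my $top      = max(map { $weights->{$_} } @topics);
    my @heaviest = grep { $weights->{$_} == $top } @topics;

    # Sorted only so that the result is stable from run to run.
    return [ sort @heaviest ];
}

# Sample input copied from the weigh_relations example above.
my $weights = {
    "Accounting and Accountants" => 3,
    "Air Canada"                 => 2,
    "Air Pollution"              => 3,
    "Airlines and Airplanes"     => 2,
    "Al Jazeera"                 => 4,
    "__topics"                   => { "Taxation" => "keywords" },
};

my $heaviest = sketch_heaviest($weights);
print join(", ", @$heaviest), "\n";
```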

$obj->graph_relations(GraphViz,\@subset)

Graph all relations, or a user-defined subset of keywords or indexterms passed as an array reference.

  my $gv = GraphViz->new(...);
  my $relations = $collection->relations();
  my $weights   = $collection->weigh_relations($relations);
  my $heaviest  = $collection->heaviest_relations($weights);
  $collection->graph_relations($gv,$heaviest);
  $gv->as_svg(...);

$obj->hr_articleinfo(XML::LibXML::Node)

Utility method to return the articleinfo for an article as a hash reference.


VERSION

1.6


DATE

$Date: 2004/09/18 21:24:50 $


AUTHOR

Aaron Straup Cope <ascope@cpan.org>


LICENSE

Copyright (c) 2003-2004, Aaron Straup Cope. All Rights Reserved.

This is free software, you may use it and distribute it under the same terms as Perl itself.