project-cdsware-users@cern.ch archives



enriching the url namespace


  • From: Frédéric Gobry <frederic.gobry@xxxxxxx>
  • Subject: enriching the url namespace
  • Date: Fri, 15 Jul 2005 08:32:20 +0200

Hi,

The instance of CDSware at EPFL is harvested by lots of external search
engines (and we want it to be). We would like to improve the way these
engines are navigating in the site.

 - One issue is that most of the URLs in the system are /search.py plus
   arguments. This makes it difficult to restrict harvesting to
   specific parts of the site, for instance via a robots.txt. Having
   the robots follow every link to every export format and to the
   "similar documents" query wastes resources and adds noise to the
   engines' search results.

   Ideally, I'd like to see specialized URL namespaces, such as:

     /collection/...  to replace the ?c=... URLs

     /details/123456  for detailed records

     /author/....     for author links; these pages are important
                      entry points

   (these names are just suggestions)


 - Another change that might improve the visibility of the pages in
   external engines is more descriptive <title>s. By default, detailed
   records have their recid in the title instead of the actual document
   title, which is not very helpful when the page shows up in Google,
   for instance. Author search pages have the same problem: they are
   all titled "Search results".
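The URL namespace proposal in the first point could be sketched as a small mapping from the clean paths onto the existing /search.py queries. This is a minimal Python sketch, not CDSware code: rewrite_path is a hypothetical helper, and apart from c=, the search.py argument names (recid, p, f) are assumptions. It also uses the standard library's robots.txt parser to show the payoff: a single Disallow line keeps crawlers off raw /search.py URLs while /details/... stays crawlable.

```python
from typing import Optional
from urllib.parse import unquote, urlencode
from urllib.robotparser import RobotFileParser


def rewrite_path(path: str) -> Optional[str]:
    """Map a clean URL path onto an equivalent /search.py query.

    Hypothetical helper: the namespaces and the search.py argument
    names (except c) are assumptions. Returns None for other paths.
    """
    namespace, _, rest = path.strip("/").partition("/")
    rest = unquote(rest)
    if not rest:
        return None
    if namespace == "collection":
        args = {"c": rest}
    elif namespace == "details" and rest.isdigit():
        args = {"recid": rest}  # assumed argument name
    elif namespace == "author":
        args = {"p": rest, "f": "author"}  # assumed argument names
    else:
        return None
    return "/search.py?" + urlencode(args)


# With browsable pages under their own prefixes, robots.txt can exclude
# every raw /search.py URL (export formats, "similar documents", ...)
# while the clean namespaces remain crawlable.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /search.py",
]

robots = RobotFileParser()
robots.parse(ROBOTS_TXT)

print(rewrite_path("/details/123456"))  # /search.py?recid=123456
print(robots.can_fetch("*", "/details/123456"))  # True
print(robots.can_fetch("*", "/search.py?recid=123456&of=xm"))  # False
```

The same mapping could equally live in the web server's rewrite rules; keeping /search.py itself working means existing links keep resolving for human visitors.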
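For the <title> point, a hypothetical helper could prefer the document title and fall back to the recid only when no title is available, and name author pages after the author. The function names, the "Documents by ..." wording and the site_name parameter are illustrative assumptions, not CDSware's templates; the text is HTML-escaped because titles may contain characters such as "&".

```python
from html import escape
from typing import Optional


def record_title(recid: int, doc_title: Optional[str], site_name: str) -> str:
    """Build the <title> element for a detailed record page (hypothetical)."""
    if doc_title and doc_title.strip():
        # Collapse stray whitespace and newlines from the metadata.
        label = " ".join(doc_title.split())
    else:
        # Fall back to the record id only when there is no usable title.
        label = f"Record #{recid}"
    return f"<title>{escape(label)} - {escape(site_name)}</title>"


def author_title(author: str, site_name: str) -> str:
    """Build the <title> element for an author search page (hypothetical)."""
    return f"<title>Documents by {escape(author)} - {escape(site_name)}</title>"


print(record_title(42, None, "Example Library"))
print(author_title("Gobry, F", "Example Library"))
```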

We have already tried to address these points locally, and the changes
might be of interest to others.

Frédéric