enriching the url namespace
- From: Frédéric Gobry <frederic.gobry@xxxxxxx>
- Subject: enriching the url namespace
- Date: Fri, 15 Jul 2005 08:32:20 +0200
The CDSware instance at EPFL is harvested by many external search
engines (and we want it to be). We would like to improve the way these
engines navigate the site.
- one issue is that most of the URLs in the system are /search.py +
arguments. This makes it difficult to limit harvesting to specific
parts of the tree, via a robots.txt for instance. Having the robots
follow every link to every export format and to the "similar
documents" query is a waste of resources and leads to more noise in
the search results.
So I'd ideally like to see specialized URL namespaces, like:
/collection/... which would replace the ?c=... URLs
/details/123456 for detailed records
/author/.... for links coming from authors. These pages are
important as entry points
(these names are just suggestions)
- another point that might improve the visibility of the pages in
external engines is the use of more detailed <title>s for the pages.
By default, detailed records have their recid in the title instead
of the actual document title. This is not very helpful when one sees
the page in Google, for instance. The same problem applies to links on
author searches, which are titled "Search results".
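
To make the namespace idea concrete, here is a rough sketch in Python of
how /search.py URLs could be mapped onto the suggested paths. It is only
an illustration, not how CDSware does it: the function name
namespaced_path is made up, and the query arguments "recid", "f" and "p"
are assumptions (only ?c=... is taken from the description above). With
such a mapping in place, a one-rule robots.txt would be enough to keep
robots away from the raw search URLs.

```python
from urllib.parse import urlsplit, parse_qs, quote

# Hypothetical robots.txt: once records, collections and authors live
# under their own paths, the raw search URLs can be excluded wholesale.
ROBOTS_TXT = "User-agent: *\nDisallow: /search.py\n"


def namespaced_path(legacy_url):
    """Map a /search.py?... URL to one of the suggested namespaces.

    Returns None when the URL does not fit any namespace. The argument
    names "recid", "f" and "p" are assumptions made for this sketch.
    """
    parts = urlsplit(legacy_url)
    if parts.path != "/search.py":
        return None
    args = parse_qs(parts.query)
    if "recid" in args:
        # Detailed record page
        return "/details/" + quote(args["recid"][0], safe="")
    if args.get("f") == ["author"] and "p" in args:
        # Author search, an important entry point
        return "/author/" + quote(args["p"][0], safe="")
    if "c" in args:
        # Collection page, replacing the ?c=... form
        return "/collection/" + quote(args["c"][0], safe="")
    return None
```

For example, namespaced_path("/search.py?recid=123456") gives
"/details/123456", while a URL that only selects an export format maps
to None and stays behind the Disallow rule.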
We have already tried to address these points locally, but the changes
might be of interest to others.
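
For the <title> point, a minimal sketch of the kind of logic meant, in
Python. The function names, the parameters and the exact wording of the
titles are all assumptions for illustration, not existing CDSware code:
prefer the document title, fall back to the author name for author
searches, and only use the recid when nothing better is known.

```python
from html import escape


def page_title(site_name, document_title=None, author=None, recid=None):
    """Pick the most descriptive title available for a page.

    All parameter names here are hypothetical.
    """
    if document_title:
        return f"{document_title} - {site_name}"
    if author:
        return f"Documents by {author} - {site_name}"
    if recid is not None:
        # Last resort: the behaviour described above as the default
        return f"Record {recid} - {site_name}"
    return site_name


def title_tag(text):
    """Wrap a title in a <title> element, escaping HTML special characters."""
    return "<title>" + escape(text) + "</title>"
```

A record with a known title would then show that title in the search
engine result, instead of the bare recid or "Search results".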