GoogleSitemaps
Summary: How to submit a complete list of web pages to Google
Version:
Prerequisites:
Status:
Maintainer:
Search engines, and Google in particular, are a major source of visitors for many if not most websites. Thorough indexing of a site by the search engine's spider (for example Googlebot) is key to achieving good search engine results. A spider visits a web page, indexes it, and then crawls on by following the links on that page. PmWiki ensures proper linkage between the wiki pages and makes it easy to generate a sitemap with the (:pagelist:) directive. Still, because a spider indexes a website step by step, it can take a while before a site is fully indexed, and a while longer before added or changed pages are re-spidered.

Google recently introduced a new way to have a website indexed: Google Sitemaps, as usual as a beta program. Google Sitemaps allows a webmaster to submit a complete list of web pages to Google. Several content management systems already provide a way to use Google Sitemaps. I think it's time for PmWiki as well.

Using RSS
One method to provide a (partial) index to Google Sitemaps is to use the RSS feed provided by PmWiki, based on, for example, Main.AllRecentChanges:
Do not use a URL like ..../pmwiki.php/Main/AllRecentChanges?action=rss. Why? From Google:

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://yoursite.com/catalog/sitemap.xml can include any URLs starting with http://yoursite.com/catalog/ but cannot include URLs starting with http://yoursite.com/images/.

Thus a feed at the URL above would not add .../pmwiki.php/Cookbook/... to the index.

Set parameters for a more complete list
It can be useful to tweak the RSS feed a little, because by default the feed only lists the last 20 changes:

if ($action == "sitemap") {
    $RssMaxItems = 50000;   # maximum items to display
    $RssSourceSize = 0;     # max size to build desc from
    $RssDescSize = 0;       # max desc size
    $action = "rss";        # let the regular RSS handler serve the request
}
include_once("scripts/rss.php");
Set .htaccess to overcome directory layout restrictions
Google is quite strict about the directory layout: the sitemap URL must be in the top directory of your website. Redirects are accepted, however, so a little tweak in the .htaccess file can overcome that restriction:

Redirect /sitemap.rss http://gnuada.sourceforge.net/index.php/Site/AllRecentChanges?action=sitemap

Now use a syntax like:
Submit this link to Google Sitemaps using the ping link or the web form (see the Google pages for details).

Using XML-Sitemap
Google provides a special XML scheme for this purpose. The benefit of using the XML-Sitemap scheme lies in these tags:
priority: how important this page is (relative to the other pages on the site)
changefreq: how often the page is updated
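
As a rough illustration of the two tags listed above, here is a minimal shell sketch that prints a sitemap with a priority and a changefreq for each page. The function names and the example.com URLs are made up for illustration, and the namespace is the one from the current sitemaps.org protocol, which may differ from the schema Google published during the beta. It is not the cookbook script itself.

```shell
#!/bin/sh
# Sketch only: function names and sample URLs are illustrative assumptions.

# Escape the characters that are special in XML text (& first, then < and >).
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one <url> entry. Arguments: page URL, changefreq, priority.
sitemap_entry() {
  printf '  <url>\n'
  printf '    <loc>%s</loc>\n' "$(xml_escape "$1")"
  printf '    <changefreq>%s</changefreq>\n' "$2"
  printf '    <priority>%s</priority>\n' "$3"
  printf '  </url>\n'
}

# Print a complete sitemap containing two sample pages.
build_sitemap() {
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
  sitemap_entry 'http://www.example.com/pmwiki.php/Main/HomePage' daily 1.0
  sitemap_entry 'http://www.example.com/pmwiki.php/Cookbook/GoogleSitemaps' weekly 0.5
  printf '</urlset>\n'
}

build_sitemap
```

Redirecting the output to a file (for example sitemap.xml in the webroot) would give something that can be submitted in the same way as the RSS link.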
The changefreq could be derived from the page history. I'm not sure yet how to get the priority of a page, probably by using some pattern array. Any thoughts are welcome.

A Basic script
Changelog
Comments
Older Comments
solved in version 1.7
Actually, pages like RecentChanges are not included in the sitemap. Since the sitemap already includes change times, having RecentChanges in the sitemap is not necessary.
1] It's not clear how to generate the .gz sitemap. I have set $SitemapDelay=0, made a wiki edit, and I still don't see the file. The XML is shown correctly in the browser. I temporarily made the pmwiki directory writable by all, with no success (ref http://www.mr2wiki.com/?action=sitemap). DaveG

Here's my hack: adding a script on a Linux or OS X system as a (daily? hourly?) cronjob. Say I make a bash script called "makesitemap" for each wiki on my system and put it in the webroot for the site.

#!/bin/bash
# Fetch the sitemap XML generated by the wiki (URL quoted because of the "?").
curl -o sitemap.xml "http://www.myurl.org/index.php?action=sitemap"
# Remove the old archive first; -f keeps the first run from failing.
rm -f sitemap.xml.gz
gzip sitemap.xml
chmod 644 sitemap.xml.gz

I had to remove the old sitemap, or the gzip command asks for overwrite verification. Now I just need a cronjob to run it. Most advanced cPanel-type webhosts give you a user crontab. No, this won't work for everyone, but people worried about Google sitemaps are already getting a bit advanced :) XES

Okay... I can't run bash on my server, so I figured there had to be a way of doing the same thing with PHP. So I gleaned the net and came up with the following by hacking other people's code, because I am not a programmer by any means. ARNOLD

<?php