Author: Stefan Boddie, Managing Director, DL Consulting Ltd.
Digital collections, and particularly digitized newspapers, are unlike other websites in that they comprise massive amounts of content on wide-ranging and disparate topics. Proper SEO to enable suitable indexing is the only way to ensure that the content can be discovered by search engines like Google.
Joanna DiPasquale, Digital Project Librarian at Vassar College, has been in charge of the Vassar College Digital Newspaper Archives project since 2011. In her work with the collection, she noticed that Google and other search engines were not discovering the site. Joanna did some initial investigations and, working with the team at Veridian, determined that the site was not optimized for Google indexing. Within a few days, the Veridian team enhanced the site for indexing through a software upgrade. The results were impressive. In the 3 months prior to the SEO enhancements, total site visits were 447, or about 5 per day. In the 3 months after the indexing enhancements, total site visits jumped to 12,552, or about 140 per day!
With over 50,000 pages of rich content, the Vassar College Digital Newspaper Archives posed a significant challenge for the “Googlebot” crawler before appropriate SEO was set up. There was simply too much data to get through without the proper technical parameters in place. Adding a sitemap and a robots.txt file directs the crawler through the content pages and prevents it from getting lost in pages whose information would be irrelevant to search queries. As the crawler passes through the content pages, it indexes the metadata, which now informs Google search results, matching queries directly to the content within the collection.
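To make the sitemap and robots.txt idea concrete, here is a minimal Python sketch of how such files could be generated. It is an illustration only, not Veridian's actual implementation: the function names, the example.org URLs, and the disallowed paths are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap XML string listing each content page URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in page_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url, disallowed_paths):
    """Return robots.txt text that points crawlers at the sitemap and
    steers them away from pages that add nothing to search results."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical content pages and paths, for illustration only.
    pages = [
        "https://example.org/newspapers/1914-02-01/page-1",
        "https://example.org/newspapers/1914-02-01/page-2",
    ]
    print(build_sitemap(pages))
    print(build_robots_txt("https://example.org/sitemap.xml",
                           ["/search", "/login"]))
```

The sitemap protocol caps a single sitemap file at 50,000 URLs, so a collection with more than 50,000 pages would need to split its URLs across several sitemap files and list them in a sitemap index file.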
The SEO enhancements created for and lessons learned from Vassar College and other similar projects are now an integral part of Veridian. Every Veridian installation is now carefully configured and monitored to ensure it is indexed properly by Google, Bing, and other major web search engines.