How to design SEO-friendly pagination with HTML5

Web designers and developers spend a lot of time striving to perfect the search engine optimisation ("SEO") of their websites and applications. This may include promoting their brand across various social media, implementing the latest programming techniques such as HTML5 Semantic Elements, or reducing the time their website's pages take to load.

A lot of energy can be spent researching SEO, and rightly so. Quality SEO can drive large amounts of traffic to your website/application, widening your audience and resulting in more conversions for your content/services.
I have witnessed many designers and developers going to exceptional lengths to improve their SEO by the tiniest amounts. However, many overlook key elements, with devastating results for their SEO.

WEBSITE PAGINATION

Website pagination is a key part of many websites/applications and comes in many shapes and sizes.
I have seen some incredibly innovative paginations, and amazing designs, which have greatly increased usability.

Well-designed pagination is great for your human audience but will not help your SEO on its own, since search engines do not browse your website in the same way as your human audience. Search engines unfortunately do not care for the beautiful graphics or fancy transitions used on your website's pagination.
Search engines commonly use "web crawlers" (sometimes called "spiders") to crawl your website for content.
An example of a well-known crawler is "Googlebot", used by Google to index websites.
It is vital that these crawlers can navigate and understand the content of your website.
Fortunately, some features available in HTML5 make it a lot easier to point crawlers in the right direction.

Pagination with rel="next" and rel="prev"

Scenario:

1. Let's say, for example, you have a post on your blog that spans multiple pages.
2. You have pagination links connecting the pages so your audience can navigate from one page to the next.
3. Your URL layout is similar to the following:

http://www.example.com/articles/story/page1.html
http://www.example.com/articles/story/page2.html
http://www.example.com/articles/story/page3.html
http://www.example.com/articles/story/page4.html
http://www.example.com/articles/story/page5.html

In the above example we have one article, "story", that spans five separate pages. If the crawler lands on page2.html, we want to let it know that page1.html, page2.html, page3.html, page4.html and page5.html are related, for two main reasons:

1. Search Engine Indexing:
Hinting at the relationship between the five pages to the search engine can improve indexing of your content and reduce the risk of duplicate content being indexed.
2. Helping the crawler understand where the content begins:
It would be preferable if the search engine served page1.html to searchers rather than page2.html, so that your audience lands at the beginning of your content rather than halfway through.

Solution:

By using the HTML link relations rel="next" and rel="prev" we can help search engine crawlers understand the relationship between our pages.
On the first page in the above scenario, http://www.example.com/articles/story/page1.html, we add a <link> element to the <head> section with rel="next" pointing to page2.html. The same pattern then continues through the rest of the series:
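
As a minimal sketch, assuming the five-page URL layout from the scenario above, the <head> section of page1.html could contain a line like this:

```html
<!-- page1.html: first page of the series, so only rel="next" is declared -->
<link rel="next" href="http://www.example.com/articles/story/page2.html" />
```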

• The first page in the series includes only rel="next" mark-up as there is no previous page in the series. This helps the crawler identify this page's relationship as the first page of the series.
• The following pages in the series (pages 2-4) are double linked with rel="next" and rel="prev" mark-up as these pages have a previous page and next page in relation to their position in the series.
• The last page of the series only includes rel="prev" mark-up as there is no next page in the series. This helps the crawler identify this page's relationship as the last in the series.
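
The middle and last pages described in the list above might look like the following sketch. It assumes the same example URLs as the scenario, and each commented group belongs in the <head> section of a different page:

```html
<!-- page2.html: a middle page is linked in both directions -->
<link rel="prev" href="http://www.example.com/articles/story/page1.html" />
<link rel="next" href="http://www.example.com/articles/story/page3.html" />

<!-- page5.html: last page of the series, so only rel="prev" is declared -->
<link rel="prev" href="http://www.example.com/articles/story/page4.html" />
```

Pages 3 and 4 follow the same pattern as page 2, each pointing to the page before and the page after it.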

Additional Tips for rel="prev" and rel="next" usage

• rel="previous" can be used as a syntactic variant of rel="prev" and is recognised by many major search engines.
• The href values used with rel="prev" and rel="next" can be absolute or relative URLs.
• rel="canonical" can be declared on the same page as rel="prev" and rel="next"; they are independent concepts.
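
To illustrate the last two tips together, here is a hedged sketch of the <head> section of page2.html from the earlier scenario. The relative href values resolve against the page's own URL, and the self-referencing canonical line is an assumption added purely for illustration:

```html
<!-- page2.html: relative URLs resolve to /articles/story/page1.html and page3.html -->
<link rel="prev" href="page1.html" />
<link rel="next" href="page3.html" />

<!-- Assumed for illustration: this page names itself as the preferred URL -->
<link rel="canonical" href="http://www.example.com/articles/story/page2.html" />
```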

How to use rel="canonical" to avoid duplicate content indexing

Scenario:

1. Let's say for example you have multiple URLs that link to the same page:
http://www.example.com/contact
http://www.example.com/contact.php?fontsize=16
http://www.example.com/contact.php?fontsize=20
2. Search engines may be indexing this page multiple times under the different URLs.
3. This may cause a search engine to penalize your website for duplicate content.

Solution:

Using rel="canonical", we can notify search engine crawlers of the "canonical" or "preferred" version of a page to avoid duplicate content issues.

In our above scenario we have three URLs that all link to the same page at http://www.example.com/contact.

Search engine crawlers may index these URLs as three unrelated pages.

By adding the following line to the <head> section of all three pages, we can declare a "canonical" or "preferred" page for the crawler to index:

<link rel="canonical" href="http://www.example.com/contact"/>

The crawler can now identify all three URLs as our preferred page, http://www.example.com/contact, and should not index duplicate content under the alternative URLs.

Using rel="canonical" to index a View-All Page

User testing has taught us that searchers prefer a view-all single page version of content over a component page showing only one portion of the same information with arbitrary breaks.

Many websites include a view-all page as well as paginated content. They may offer a paginated version of the content because a single page would load too slowly for users or because there is a large amount of content.

You may wish to index your View-All page rather than your individual component pages to deliver a better user experience for searchers.

Scenario:

1. Let's say for example you have the following component pages and a view-all page:
http://www.example.com/articles/story/page1.html
http://www.example.com/articles/story/page2.html
http://www.example.com/articles/story/page3.html
http://www.example.com/articles/story/view-all.html

2. This may cause duplicate content to be indexed as the content in your view-all.html page is identical to the content split amongst your component pages page1.html, page2.html and page3.html.

3. You may want search engines to index your view-all page instead of the component pages so searchers are directed to your view-all page from searches.

Solution:

Using rel="canonical", we can direct crawlers from our component pages to our "canonical" or "preferred" view-all page.

By adding the following line to the <head> section of the component pages page1.html, page2.html and page3.html, we can declare the view-all page as the "canonical" or "preferred" page for the crawler to index:

<link rel="canonical" href="http://www.example.com/articles/story/view-all.html" />

We can also declare rel="prev" and rel="next" in the <head> section of the component pages, as these are independent of rel="canonical".
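
As a rough sketch, assuming the three component pages and the view-all page from this scenario, the <head> section of page2.html could combine both ideas like this:

```html
<!-- page2.html: name the view-all page as the preferred version to index -->
<link rel="canonical" href="http://www.example.com/articles/story/view-all.html" />

<!-- page2.html: still describe this page's position in the series -->
<link rel="prev" href="http://www.example.com/articles/story/page1.html" />
<link rel="next" href="http://www.example.com/articles/story/page3.html" />
```

page1.html and page3.html would carry the same canonical line, with only rel="next" or only rel="prev" respectively.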

Using Meta Tag NOINDEX to tell a crawler not to index a page

There may be a scenario where you do not want a search engine to index one of your pages. For example, if you would prefer the search engine to index your component pages rather than a view-all page, you can use the META tag NOINDEX on your view-all page to tell the crawler not to index it. Note that the crawler still has to fetch the page in order to read the tag, so NOINDEX controls indexing rather than crawling.

Include the following line in your page's <head> section to declare to the crawler not to index your page:

<META NAME="ROBOTS" CONTENT="NOINDEX">

*Crawlers/robots can ignore your <META> tag. Malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers, in particular, will pay no attention to it.
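
For the view-all scenario mentioned above, a minimal sketch of the <head> section of view-all.html might look like this. The title text is a placeholder assumed for illustration, and the lowercase spelling is equivalent to the uppercase form shown earlier, since HTML tag and attribute names are not case-sensitive:

```html
<head>
  <meta charset="utf-8">
  <!-- Placeholder title, assumed for this example -->
  <title>Story (view all)</title>
  <!-- Ask compliant crawlers not to index this view-all page -->
  <meta name="robots" content="noindex">
</head>
```

In this arrangement the component pages would not declare view-all.html as their canonical page, since the aim is for the component pages themselves to be indexed.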

You can also add the value "NOFOLLOW" to your META tag to declare to the crawler not to follow any links on your page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

*If another page or website links to the same URL as a link on this page, a crawler/robot may still visit that URL by following the link from the other page or website.