The Imperative: Successful Site Architecture

Posted In SEO

By Matt Peterson

It seems that lately, I’ve spent most of my time as an SEO delivering site audits and diagnoses. It never ceases to amaze me how small tweaks in removing duplicate content and internal link optimization can really snowball the good work done across other classic SEO efforts.

Needless to say, “The Imperative: Successful Site Architecture” is an SES New York session I’ve been looking forward to for weeks.

SES advisory board member and Beyond Ink founder Anne Kennedy moderated the session. Up first is Shari Thurow, who is the founder and SEO director of Omni Marketing Interactive.

When we talk about site architecture, everyone has a different response. Information architecture is an ongoing process and part of a continuum, and it comes before content development and visual design.

SEOs define architecture as crawlability and indexation, often to the confusion of users and others within the organization. Shari notes: You don’t create information architecture for Google, you create it for users.

The problem is, usability professionals often don’t know how to treat optimization architecture, while SEOs often don’t know how users actually interact with information architecture.

Contrary to what you may believe, site architecture is not about the perceptions of your CEO, your marketing department, or your SEOs. Site architecture is about the users.

Information Architecture vs Technical Architecture
Information architecture is about how files are arranged on a web server. Is a tomato a fruit or a vegetable? Why do people put it in the vegetable category? The same goes for turtles being filed as amphibians: you need to be able to cross-reference multiple categories.

Once the site architecture is settled, you can design the user interface. Search engine magic happens when these are in step. Order your navigation around the way your users want it ordered and around what you’re trying to sell. One visualization tactic is to take away your masthead and ask yourself, “What page am I viewing?”

Site Architecture and Links
Understand that page interlinking is not the same as link development, which is really about external links. How you link to your own content helps search engines determine what’s important on your site, both in vertical hierarchy and horizontal hierarchy.

With vertical hierarchy, the home page “looks” like the most important page on the site, even though your content pages may really be the most important pages. It’s important for your content pages to link to each other in a keyword-friendly way. The Food Network does an excellent job of vertical linking: on each recipe page, they keyword-focus additional links to related recipes.

URL structure can also communicate information architecture, although Shari downplays the effect it has on rankings except for navigational queries. Users with navigational queries rarely look past the first three results in search engine results pages.

Ask These Questions About Your URL Structure

Which URL is your target audience most likely to remember? Make sure your URLs are not difficult to read.

Which is better for search engine visibility, one or two subdirectories? Shari gives cookies to those who say “it really doesn’t matter.” Focus primarily on interlinking and keywords.

What is better, a subdomain or subdirectory? Cookies again for those who say “it doesn’t matter.” As long as they crosslink purposefully, they both communicate the same information.

From the outset you need to look at user goals, business goals, and search engine goals and see where they align. You need your keywords in the navigation, the breadcrumbs, and in the footer. After you implement these, then you design the site, and more importantly you begin usability testing.

How often do you actually test your website usability and information architecture with real humans before launching?

The biggest mistake on the planet is jumping straight to what a site looks like. It’s very difficult to retool information architecture. “PageRank sculpting” is actually not a desirable practice. Spend the time getting your architecture right at the outset rather than rolling out complicated conditional nofollow schemes.

Unfortunately, SEOs often resort to building doorway pages because management doesn’t want changes to the website, so they ask SEOs to do things that don’t affect the look and feel. That’s where you get invisible layers and PageRank sculpting, but in the end they are still just workarounds, not solutions. Eventually the site needs to be fixed for long-term results.

Next was Alan Perkins, Head of Search Marketing at SilverDisc Limited. Alan apparently has two presentations, but after a bit of deliberation, he decides to go with the “long presentation”.

Why focus on successful site architecture?
Everything else you do in SEO is built on site architecture, and mistakes are always harder to correct later. It’s important to all of your marketing efforts. Also, understand that your site doesn’t exist in isolation. Think about your site within the whole web.

Alan notes that technical architecture comprises a hardware side, which includes routers, caches, etc., as well as a software side, which includes protocols and databases.

The Core of the Web & PageRank
You want to be in the core of the web, and to get there you need links from pages in the core and you need to link to pages in the core. PageRank hoarding positions you on the outside of the core, so it actually does you damage. Search engines ultimately want pages that are very meshed into the web.

Understand the Information Architecture of Your Linking Partners
What is their platform, language, location, and vertical/theme? For success, link your technical architecture to the services and partners you want to feature.

The mission of your website should match the searcher’s mission, and to understand the searcher’s mission, you need to do keyword research. Keyword research reveals the high-volume, high-competition head keywords and the low-volume, low-competition tail keywords.

Your Website’s Skeleton
Get your website’s skeleton right; this consists of organization, navigation and labeling. In Alan’s opinion, the best website skeleton is hierarchical. Look at Yahoo directories for example.

A hierarchical skeleton provides easier site management and better defined categories. Spiders employ a breadth-first crawling algorithm, and a hierarchy lets you better see how spiders crawl. It’s also easier to create deep links into other sites and directories; linking partners with a niche can link into a niche page on your site.
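To make the breadth-first point concrete, here is a minimal sketch of the order in which such a crawler would reach the pages of a small hierarchical site. The link map and its paths are invented for illustration, and real spiders weigh many other signals, so treat this as a picture of the traversal order only.

```python
from collections import deque

# Hypothetical site structure: each page maps to the pages it links down to.
SITE_LINKS = {
    "/": ["/mens-footwear", "/womens-footwear"],
    "/mens-footwear": ["/mens-footwear/running-shoes"],
    "/womens-footwear": [],
    "/mens-footwear/running-shoes": ["/mens-footwear/running-shoes/nike-air"],
    "/mens-footwear/running-shoes/nike-air": [],
}


def breadth_first_crawl(links, start="/"):
    """Return (page, depth) pairs in the order a breadth-first crawl visits them."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append((page, depth))
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append((target, depth + 1))
    return order


for page, depth in breadth_first_crawl(SITE_LINKS):
    print(depth, page)
```

Every page at one level is visited before any page at the next level down, which is why the depth of a page in the hierarchy gives a rough idea of when a spider gets to it.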

The Power of Breadcrumbs
Alan shows off a typical hierarchical taxonomy by defining a niche within that site, a vertical slice.

My Store > Men’s Footwear > Running Shoes > NikeAir

Imagine the breadcrumb trails for this hierarchy; examine how the keywords in the breadcrumbs are used in the links and page text. Focus on your primary keywords for the most important pages at each level of the hierarchy. The breadcrumb trail provides a means of channeling keyword power through your site hierarchy.
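As a rough illustration of those trails, the sketch below builds the breadcrumb for every level of the example hierarchy. The function name and the " > " separator are assumptions made for this example, not something shown in the session.

```python
def breadcrumb_trails(path):
    """Return the breadcrumb trail for each level of a category path."""
    return [" > ".join(path[: i + 1]) for i in range(len(path))]


# The vertical slice from the example hierarchy.
hierarchy = ["My Store", "Men's Footwear", "Running Shoes", "NikeAir"]

for trail in breadcrumb_trails(hierarchy):
    print(trail)
```

Each deeper page repeats the category keywords of the levels above it, which is the channeling effect described here.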

Information Architecture on Your Site
Link to items directly from the home page if you think they will be of particular interest to most visitors.
By the same token, bury things that tend to be of more niche or long-tail interest. Build a long-term marketing spine to capture the more competitive head terms.

Alan advises that a site map should be an aid to humans as well as search engines.

As a general rule of thumb: The more important you consider a piece of content/item/product for your users, the higher you place it in your marketing information architecture.

Canonicalization
A canon is simply a body of rules, and we all canonicalize frequently without thinking. We apply a set of rules to the items we want to buy. The rules for a t-shirt purchase could be defined by size, color, etc. Ultimately you find the one that best satisfies these rules, the one you like the most. That’s canonicalization in a nutshell, and that’s exactly what search engines do.

More specifically, canonicalization in a search engine is where multiple URLs are treated as the same URL. This one URL represents the set, and this happens most often with duplicate content and redirects.

Duplicate Content
This generally occurs when the same content is available at different URLs. It can occur on dynamic websites and also occurs when the same site is available under different domains. For example:

www.example.com
http://example.com
http://www.example.com/index.php

All may be the same page. How do search engines determine which page to pick? Often, it is the page with the highest PageRank, the shortest URL, or the most inbound links.

Keep in mind that you don’t want to leave canonicalization up to search engines. Take control by telling search engines your preference with robots.txt.

Know that you can be penalized for accidental duplicate content. The search engine spider spends its time reading the same content over again, and that redundant crawling means it may miss new content.

In the end, these duplicate pages end up competing against each other, rather than you controlling the “champion” page for each keyword, where you channel all the specific keyword energy and power into one page.

Redirects
Redirects help get the right content indexed at the right address. Server-side redirects include HTTP 301s, 302s, etc. Try to separate in your mind the address of content from the content itself.

Temporary redirects are good for sending a home page to its dynamic rendition.
Permanent redirects are good for sending non-www versions of a page to the www versions.
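The permanent redirect case can be sketched as a small function that decides whether a requested address should answer with a 301 to its www form. This is an illustrative sketch only: the function name is hypothetical, it assumes simple host names without user info, and on a real site the rule would normally live in the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url):
    """Return (301, target) when a non-www URL should redirect to www, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.startswith("www."):
        return None  # already on the preferred host, or no host to rewrite
    # Assumes a plain host[:port] netloc with no user:password@ prefix.
    target = urlunsplit(
        (parts.scheme, "www." + parts.netloc, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(www_redirect("http://example.com/shoes?id=1"))
print(www_redirect("http://www.example.com/shoes?id=1"))
```

The path and query string are carried over untouched, so only the address changes and the content behind it stays the same.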

Normalization
Similar to canonicalization, normalization determines the standard form of a single URL. Search engines typically normalize all URLs before queuing them for indexing. Normalization involves scrubbing up a URL to make it look nice to the outside world. Ultimately, what you’re trying to achieve is for each piece of content on your site to be indexed once, on the best possible URL.
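A minimal sketch of what such scrubbing might look like is shown below. The specific rules (lowercasing the host, dropping default ports, fragments, and a trailing index file) are assumptions chosen for illustration; they are not a description of what any particular search engine does.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
INDEX_FILES = ("index.php", "index.html")  # assumed default documents


def normalize_url(url):
    """Return one standard form for a URL under the assumed rules above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # strip the file name, keep the slash
            break
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_url("HTTP://WWW.Example.com:80/index.php"))
print(normalize_url("http://www.example.com"))
```

Both calls produce the same string, so the two spellings would be queued as a single address.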

Location
To appear in country-specific versions of major search engines, use an appropriate domain extension and/or host in the specific country. If possible, get links from sites in your desired country.

JavaScript & AJAX

Search engines don’t crawl links written by JavaScript or links that are JavaScript.

Search engines don’t index content that is written by JavaScript or pulled in by AJAX.

To solve this, use plain links inside a noscript tag.

Search engines don’t support cookies, so don’t rely on cookies for marketing pages. Cookies are mostly important for checkout process pages, which don’t always have indexing value anyway.

The Four Robots Standards

1. Robots.txt
A plain text file that stops content from being crawled and indexed. It is placed in the root folder of the domain.

2. Robots Meta Tag
Placed in the <head> of an HTML file, it stops content from being indexed, but not from being crawled. The header has to be read in the first place, right? It only works on HTML content, but it is useful for retrofitting a site with existing problems.

3. Rel=Canonical
The newest standard, placed in the <head> of an HTML file. It specifies a preferred canonical URL, and it’s good for removing affiliate codes and tracking codes from a URL. Note that you only need it for canonicalization; it does not help you with normalization.

4. Rel=Nofollow
Works at the link level and is placed in anchor tags, e.g. <a href="link" rel="nofollow">. Google wants you to use it to label paid links, but most people use it for channeling PageRank or to deter blog spam.
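For readers who want to see where each of the four standards lives, here is a small sketch that reads all of them using only Python’s standard library. The robots.txt rules, the sample page, and every URL in it are made up for the example, and the class name is hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# 1. robots.txt: hypothetical rules that keep checkout pages out of the crawl.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /checkout/"])


class PageDirectives(HTMLParser):
    """Collect the robots meta tag, the rel=canonical URL, and nofollow links."""

    def __init__(self):
        super().__init__()
        self.meta_robots = None
        self.canonical = None
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.meta_robots = attrs.get("content")  # 2. robots meta tag
        elif tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")  # 3. rel=canonical
        elif tag == "a" and "nofollow" in rel:
            self.nofollow_links.append(attrs.get("href"))  # 4. rel=nofollow


# Made-up page used only to exercise the parser.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="http://www.example.com/running-shoes">
</head><body>
<a href="http://partner.example.org/" rel="nofollow">Sponsored link</a>
<a href="/mens-footwear">Footwear</a>
</body></html>"""

page = PageDirectives()
page.feed(PAGE)

print(robots.can_fetch("*", "http://www.example.com/checkout/cart"))
print(page.meta_robots)
print(page.canonical)
print(page.nofollow_links)
```

Note the difference in scope: robots.txt is consulted before a page is fetched, while the other three can only be seen once the HTML has been downloaded and read.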

Tools of the Trade for Successful Site Architecture:
Google Webmaster Tools
Yahoo Site Explorer
Live Search Webmaster Center