Many SEOs have mined data from Google Webmaster Tools (GWT) over the years; it’s a vital, constantly evolving part of our daily analysis work. Without GWT, we would be navigating much more in the dark, with far less information at our fingertips. Thankfully, Google has kept GWT from stagnating, both by adding new functionality (the URL Parameters, Index Status, and Structured Data charts) and by removing or relocating existing functionality (the Site Latency chart recently moved to Google Analytics).
One area we often turn to when assessing the health of a site is the "Crawl Errors" section of GWT. Extremely valuable information can be found here, including possible systemic issues, specters of website iterations past, and current, pertinent troubles that need to be addressed. Furthermore, Google recently added the ability to see the incoming links that point to a page returning a status code other than 200. While this data is very useful when cleaning up possible losses of external link equity, it also raises some interesting questions in another area.
After looking at the "Other" section on a client report recently, we inquired about the origin of some links. The client, however, had never seen those kinds of URLs on their site.
As SEOs, we can fix a problem when we’re able to see where links originate, especially when a client links to pages from the past. In this example, all the links were coming from the same directory.
<a href="DOMAIN.com">XXX XXX Indoor Planters Organize It All Planters | COMPANY</a></b><br />Free shipping on orders over $75! COMPANY carries many XXX XXX indoor planters organize it all planters<BR /><span>www.domain.com/XXX-XXX-indoor-planters-organize-it-all...</span>
GWT picked up the link that appears between the <span> tags (on-page content) as the source of the URL. The link above displayed as:
In this specific example, the same issue occurred many times: the only reference to the page's URL appeared outside the <a> tags, as plain text.
The takeaway is that links can show up as regular content with no <a> tag in the vicinity:
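To audit a page for this pattern, here is a minimal Python sketch that lists URL-like strings appearing in a page's text outside any <a> tag. The `URL_LIKE` pattern and the `find_plain_text_urls` helper are our own illustrative assumptions; this is a rough approximation and not how GWT actually discovers URLs.

```python
import re
from html.parser import HTMLParser

# Assumed, simplified pattern: optional scheme, a dotted hostname, then a path.
# Real-world URL detection is messier; treat this as a rough heuristic.
URL_LIKE = re.compile(
    r"\b(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+/[^\s<>\"']*",
    re.IGNORECASE,
)


class PlainTextUrlFinder(HTMLParser):
    """Collects URL-like text that is NOT inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # > 0 while we are inside an <a> ... </a>
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Only inspect text that sits outside anchor tags.
        if self.anchor_depth == 0:
            self.found.extend(URL_LIKE.findall(data))


def find_plain_text_urls(html):
    """Hypothetical helper: return URL-like strings found outside <a> tags."""
    parser = PlainTextUrlFinder()
    parser.feed(html)
    parser.close()
    return parser.found


# The snippet from the client example above, with its placeholders intact.
snippet = (
    '<a href="DOMAIN.com">XXX XXX Indoor Planters Organize It All Planters '
    "| COMPANY</a></b><br />Free shipping on orders over $75! COMPANY carries "
    "many XXX XXX indoor planters organize it all planters<BR />"
    "<span>www.domain.com/XXX-XXX-indoor-planters-organize-it-all...</span>"
)
print(find_plain_text_urls(snippet))
```

Run against the pages listed as link sources in the "Crawl Errors" report, a check like this can help confirm whether a mystery URL comes from plain on-page text, such as a truncated display URL, and not from an actual link.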
In fact, we were tipped off about this nearly two years ago in Google’s response to a question about “bizarro URLs that never existed.”
Q: Most of my 404s are for bizarro URLs that never existed on my site. What’s up with that? Where did they come from?
Special thanks to RKG's Craig Zagurski for editing my posts and making them understandable to human eyeballs.