Sep 24, 2006

Web Signatures and Soft Cookies: Guessing The Identity of Anonymous Site Visitors

Padmanabhan and Yang published an interesting paper called Click Tracks On The Web: Are There Signatures In Web Browsing Data?

Their answer: yes, there seem to be such signatures.

What's a web browsing signature? And why would online retailers care?

Just as people have characteristic walking patterns, typing patterns, and online writing patterns, it turns out that people have characteristic ways of using websites.

And just as aggregate patterns of whorls and arches in a fingerprint can sometimes identify an individual, statistical measures of web usage can sometimes identify an online visitor.

In short: even if someone doesn't log in to your website, you might be able to make a good guess as to who they are from the time they visit, how long they stay, and how they click across your site.

You could think of a click signature as a "soft cookie" -- you're not 100% sure of the user's identity, but you might have a reasonable guess.

Not this year, however. Today's sites aren't advanced enough, and clickstreams aren't well enough understood (or even stored). But maybe, some day, in the far distant future -- I'd predict about five years out -- sites might be able to make reasonable guesses as to who is using the site, just from their browsing behavior.

How could browsing signatures help an online retailer?

  • Fraud. P & Y propose this approach as a way to detect possible fraud: "Alert! The user in session 12345 is attempting to check out using Jane Smith's credit card, yet the user in session 12345 uses our site very differently than Jane Smith typically does. Investigate for potential credit card fraud."
  • Targeting. P & Y don't discuss targeting, but a site could guess the identity of an anonymous visitor to serve relevant offers: "Alert! The user in session 67890 has a similar click signature to Bob Jones, who often buys fishing equipment from us -- serve fishing ads across the site to this user."
  • Multichannel tracking. P & Y don't discuss multichannel, but a retailer might be able to use click signatures to match anonymous web visits back to known web users, and then back to offline marketing: "Attention! The user in session 456789 has a similar click signature to Mary Thompson, to whom we just mailed a fall test catalog. That version seems particularly effective at getting older women to our site."
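
To make the fraud bullet concrete, here is a toy sketch of the comparison step. It is not P & Y's method (they used a classification tree); it swaps in a much simpler idea, the average z-score of a session's metrics against a known user's stored profile. The function names, the profile layout, and the threshold of 3.0 are all hypothetical, chosen only for illustration.

```python
def signature_distance(session, profile):
    """Average absolute z-score of a session against a user's profile.

    session: {metric_name: value} for one visit.
    profile: {metric_name: (mean, std_dev)} for a known user (assumed layout).
    """
    total = 0.0
    for name, (mu, sigma) in profile.items():
        # Floor sigma so a perfectly constant metric doesn't divide by zero.
        total += abs(session[name] - mu) / max(sigma, 1e-9)
    return total / len(profile)


def looks_unlike(session, profile, threshold=3.0):
    """True if the session is 'far' from the profile (threshold is a guess)."""
    return signature_distance(session, profile) > threshold


# Hypothetical profile for "Jane Smith": ~5 minute visits, ~10 pages.
jane = {"duration_sec": (300.0, 60.0), "pages_viewed": (10.0, 2.0)}

typical_visit = {"duration_sec": 330.0, "pages_viewed": 9.0}
odd_visit = {"duration_sec": 30.0, "pages_viewed": 40.0}

print(looks_unlike(typical_visit, jane))  # False
print(looks_unlike(odd_visit, jane))      # True
```

A real system would need far more care than this: time of day wraps around midnight, metrics are correlated, and a flag like this should trigger a review, not a rejection.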

Again, this software doesn't really exist yet, and won't for a long time.

I predict the major search engines will be among the first to try out this technology: "The searcher in session 33445566 isn't logged in but we suspect it is 123456789 so we can maximize paid click revenue by favoring ads in categories 998, 776, and 554."

It will be several more years before click signatures become standard enough to be rolled into e-commerce platforms for retailers. (mod_softcookie standard in Apache 4.0, perhaps?)

Still, it is intriguing to consider the long-term implications of click signatures. And these patterns remind us, yet again, how non-anonymous our web sessions are.

(Tech details: Using ComScore panel data, P & Y looked at 50,000 users interacting with the "5 most popular online sites" in the data across all of 2004. P & Y characterized each session using five crude metrics -- session duration, session pages viewed, session average time per page, session start time of day, session start day of week -- and characterized each user-site pair by the mean, median, variance, min and max of these metrics. P & Y used binary search to find the best level of aggregation for a J4.8 classification tree. What I find most amazing is that this very crude five-metric approach worked at all -- I'd've thought they'd need much more detailed click stream information to get any predictive power.)
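
For the curious, the feature construction described above might look roughly like this. This is my own minimal sketch, not P & Y's code: the `Session` record, the field names, the encoding of time of day as fractional hours, and the use of population variance are all assumptions, and the classification-tree step is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median, pvariance


@dataclass
class Session:
    """One visit by one user to one site (hypothetical record layout)."""
    start: datetime
    duration_sec: float
    pages_viewed: int  # assumed to be at least 1


def session_metrics(s):
    """The five per-session metrics named in the post."""
    return {
        "duration_sec": s.duration_sec,
        "pages_viewed": float(s.pages_viewed),
        "avg_sec_per_page": s.duration_sec / max(s.pages_viewed, 1),
        "start_hour": s.start.hour + s.start.minute / 60,  # assumed encoding
        "start_weekday": float(s.start.weekday()),         # Monday = 0
    }


def user_site_signature(sessions):
    """Summarize each metric over a user's sessions on one site."""
    rows = [session_metrics(s) for s in sessions]
    signature = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        signature[name] = {
            "mean": mean(values),
            "median": median(values),
            "variance": pvariance(values),  # population variance: an assumption
            "min": min(values),
            "max": max(values),
        }
    return signature


visits = [
    Session(datetime(2004, 1, 5, 9, 0), 120.0, 4),
    Session(datetime(2004, 1, 6, 21, 30), 300.0, 10),
]
sig = user_site_signature(visits)
print(sig["duration_sec"])
```

That gives 5 metrics times 5 summary statistics, or 25 numbers per user-site pair, which is the sort of row you could then hand to a tree learner.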
