    Google’s User Data Empire

    November 24th, 2008 | admin

    I’ve been holding off on doing this entry for a bit, but with the introduction of SearchWiki their aims are so clear to me, I just can’t hold off anymore. Google’s problems over the past 2 years have been the result of an algorithm overly based on links. They’ve finally hit their wall. With the latest batch of link buying platforms, their options for truly detecting it are dying out. One can call Google many things, but ignorant of the marketplace and SEOs is not one of those things. So they needed a response. Their response? User data. Lots of fucking user data.
    I know I’ve covered a similar topic before (how Google is essentially creating its own internet), but I wanted to do one specifically on user data.

    The Basic Layout of the Google User Data Empire

    • Google Adsense - Google Adsense has the unique ability to track without fear of repercussion. Why? Because any data they send back can be used and archived in their eternal battle against click fraud. This means they transmit everything from screen resolution to ability/version of Flash (things that arguably have nothing to do with click fraud). Either way, it’s a window they have into millions and millions of hits on the internet daily. It’s targeted towards informational sites though, and not commercial sites (Google’s true interest).
    • Google Analytics - This is Google’s window into non-informational sites. It tracks an absolutely obscene amount of user data (actually, more than you can see/use in their analytics panel). Without this, they’d have no window into sales-based sites that would give the competition traffic if they ran Adsense. Webmasters flock to this tool, not realizing the danger of feeding Google all that information. Here’s a hint: it tracks conversion rates. Now, Google is currently taking anywhere from 2-5x the amount of Adsense revenue they’re giving to the website owner, which means if you do PPC you’re more or less at their mercy for how much you’re paying per click. Them knowing how much you’re making per click via their conversion tracking could (in theory) allow them to adjust your PPC expenses up while you still remain profitable. But once again, the real gold here is the ability to track the users.
    • Google Chrome - Google Chrome is an interesting creation. Google is a public company. That means they cannot create something like Chrome without a significant financial reason. The trick is they’re already propping up Firefox via $59.5-70 million a year in donations (85% of Firefox’s revenue) to keep them as the default search. $70 million is jack shit to Google, so they definitely wouldn’t create Chrome simply to save on that, and they’re already getting the ad revenue from Firefox searches, so that itself doesn’t make sense. So why would they create Chrome?
      • Unique Identifier - Chrome generates a unique ID whether or not you agree to send your data to Google. If you agree to send it, this ID gets transmitted. So what does that do? It makes it so they can identify you regardless of where your computer is, and regardless of cookies. It’s truly the perfect information gatherer.
      • [Partially] Closed Source - I’m no open source junkie, but let’s not kid ourselves. The one primary difference between Firefox and Chrome is that Chrome is partially closed source. It’s based off of Chromium, a BSD licensed piece of software. A BSD license means you don’t have to open source your modifications to the code (unlike the GPL). This means one has to run a sniffer to see the data Chrome is sending out; you can’t simply open the source code. While initial versions don’t send out an excessive amount of data, I’m willing to bet user adoption will change that.
      • Typing Tracking - I just sniffed a Chrome request (opted in to transmit data). The page I was going to was completely blank except for a fake 404 error. Magically, it created 2 requests to Google. One was a “google suggest” style query (which means yes, Google Suggest is used for tracking). The other was a curious query, as it transmitted events (it used generic names, so I don’t know what each stood for), a unique ID, and interestingly enough a variable called “rep”, presumably implying a user reputation level. A single type-in of a domain created 3 of these “events”. I wonder what they are.
    • Google Checkout - One of a few ways Google is moving to be able to identify real people. That is to say it’s a way to be able to tie an IP and a cookie/username to a real, 100% legit name. This is worth more than most could ever imagine. Not only is that person identified as someone with a credit card, but the billing address itself gives you a region the person is from, and a probable demographic. Also used to tie back to a real identity is the much debated Google Health, which can store medical information on an individual.
    • Google Toolbar - Fantastic for identifying webmasters, the Google toolbar is among the most powerful methods of getting user data. How long do you think it will be before they turn users into unknowing cloaking checkers(click search results, omgz this pagerank request isn’t for the right domain)? Every single webpage you access, private or not, gets sent to Google for their page rank check.
    • Google Android - The one set of data they couldn’t access properly before: phone habits. Note how aggressively they’ve pursued the cell phone market (iPhone, anyone?).
    • SearchWiki - Google’s latest addition to let you reorganize the search results. They say the data is used only for the user that changes it. Fun fact? That makes no sense. Google already has bookmarks, and if you are logged in and click “Web History” (and are opted in) it will show you the searches you’ve made and the results you’ve clicked. So there is absolutely no reason for the creation of this other than to alter search results, and more importantly gauge users’ reactions to commercial vs. informational sites.
    • Other Obvious Sources - Gmail(your contacts, your interests), the actual search results, and many more.
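    The conversion-tracking worry in the Google Analytics bullet above comes down to simple arithmetic. Here is a rough sketch of it; the conversion rate, profit per sale, and click prices are made-up numbers for illustration, and nothing here describes how Google actually prices clicks.

```python
def revenue_per_click(conversion_rate: float, profit_per_sale: float) -> float:
    """Expected profit generated by one paid click, before ad spend."""
    return conversion_rate * profit_per_sale


def margin_per_click(conversion_rate: float, profit_per_sale: float, cpc: float) -> float:
    """What the advertiser keeps per click after paying the cost per click."""
    return revenue_per_click(conversion_rate, profit_per_sale) - cpc


# Hypothetical advertiser: 2% of clicks convert, each sale nets $40.
ceiling = revenue_per_click(0.02, 40.0)
print(f"break-even CPC: ${ceiling:.2f}")
print(f"margin at $0.30 CPC: ${margin_per_click(0.02, 40.0, 0.30):.2f}")
print(f"margin at $0.70 CPC: ${margin_per_click(0.02, 40.0, 0.70):.2f}")
```

    Anyone who can see the conversion rate and the sale value knows the break-even figure, which is exactly the number you’d normally keep to yourself when bidding.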

    Google justifies all of this on the idea that a lot of other companies have been gathering this data for some time. But there’s a difference. Those companies only had data from one source at a time. For Google, it’s different. Their specialty is organizing information. They have access to more avenues for user data than any other company in the history of the world, and the ability to connect every aspect of every person’s life. Log into Gmail on Android? Congrats, your phone number can now be tied to your home IP. Don’t search using Google? Between Adsense and Analytics, you’ve probably got a 35-50% chance of sending data to Google anyways with every page load. Did you buy something through an ad served by Google? With conversion tracking, they know you bought, and can tie that back to everything else.

    Why I’m Scared as a User
    I’m really beginning to get scared here. Even ignoring Google’s less than benevolent intentions, can anyone imagine a data breach? No company is truly secure. 4 years ago the entire member database of the largest porn network on the planet was available (including passwords) for 1 grand. Over 500,000 records. There have been data breaches at pharmaceutical companies, leaking millions of customer records, down to the pill they took and when the prescription was up. Government servers get compromised, credit bureaus get compromised. So why would Google be any different?

    Why I’m Scared as a Webmaster
    Google has an interesting issue. They have more userdata than they can allow adwords advertisers to target. This is an absolutely insane amount of information. So they’re left with 3 options.

    1. Enter the CPA Market - With their Google Affiliate Network, this seems like a likely path. Imagine a massive in-house program that can get clicks for dirt cheap (remember, Google takes a HUGE cut out of Adsense revenue; surrendering that, they can afford conversion rates that would make normal PPCers cringe).
    2. Not Use the Data - Google is a publicly traded company. Their responsibility is to stockholders. So regardless of how warm and fuzzy they act to the internet community at large, this option is not viable. Their privacy policies contradict the filth they spew towards the consumer about how the data will and won’t be used. And guess which one is legally the reality? The privacy policy. They’re using the data, folks.
    3. Take Control from Advertisers - They can’t let me target based on all the data they have, so the alternative is to make the decisions for me based on what they think is best. Well, sort of. Remember that Google automatically optimizes not for conversions, but for click through and profit on their end.

    I don’t understand how prominent geeks normally so paranoid over spyware and whatnot can ignore Google. They function on a higher level than any spyware company in history, and do it all by winking at the webmaster community and acting like they’ll look out for us. “Don’t Be Evil” is the motto of a private company. Not a public company. It’s the antithesis of the free market economy. What is good for the consumer is not good for the company, and that is especially true with an advertising company that has access to so much data.

    Until next time,
    XMCP

    PS: Edited the entry to indicate that chrome is partially closed source. Though the open source aspects are chromium for the most part. To clarify, here’s a line from Chrome’s TOS: 10.2 You may not (and you may not permit anyone else to) copy, modify, create a derivative work of, reverse engineer, decompile or otherwise attempt to extract the source code of the Software or any part thereof, unless this is expressly permitted or required by law, or unless you have been specifically told that you may do so by Google, in writing.

    The Election Algorithm and Black PR

    November 6th, 2008 | admin

    For as long as I can remember, I’ve loved elections. I get tired of the repetitive commercials as fast as anyone, but looking at anything on the broad scale is interesting to me not only as a marketer, but also as an SEO. I’m going to try and keep this as balanced as I can. I obviously have my own biases, but whenever I mention one campaign doing something, I’ll try and find a similar example from the other. Lord knows they’re all guilty to some extent.

    The Black PR Machine
    Think of the American people as links, and the whole thing breaks down perfectly like an algorithm, with the ‘big players’ and news hubs ironically like a blackhat setup. The methods (both) campaigns used directly and indirectly to spread black PR were absolutely fascinating, and not dissimilar to the processes used to attract legitimate links via less legitimate ones to build overall authority in (at least my version of) internet marketing. Each side had their ‘blackhat’ (non-reliable) sites. For the dems this would be dailykos, for the republicans freerepublic and the drudge report. These take snippets of other media (like a scraper), then spin them to their respective bias. None are legitimate enough themselves to gain the attention of truly mainstream media, or be cited in any significant publication.

    So what do they do? Launder the information. If you can take the political message pushed out by one of the propaganda sites, and get it to be rapidly picked up by sites that lean towards the same bias(but not as extreme), it generates a buzz not only amongst the supporting sites, but creates conflict amongst the partisan sites on the other side of the political spectrum. From the moment this conflict is created, it’s given credibility. Newspapers and stations(generally) want to avoid stories that have only one visible side(whether they represent both is a crapshoot) because anything else reeks of a stealthy PR release. So having successfully spread the concept from extreme to (more) moderate sources, it spreads into the mainstream. Fantastic, eh?

    Why does it have to be so involved? It’s because the candidates themselves don’t want to have their own names tied to the dirty message. For example(for better or worse is your opinion) you’ll notice a lot of the comments that Palin was saying(socialist, terrorist, marxist, etc) were more or less restricted from John McCain’s speeches. This also happened at the same time as her “going rogue”. See the subtle disconnection that relieves him of responsibility for her? She’s “going rogue”, not relaying “his message”, even though that message happens to benefit him.

    To be fair, the same thing was done by the democrats, but with them it was done through more substantial blogger/youth support; an excellent means for spreading a message disconnected from the candidate themselves. And much less traceable. So it’s tricky to find a comparison here.

    Similar things happen in the SEO world constantly. We are taking the buzz we can build up through our own tightly connected networks of friends and sites, and attempting to get that buzz spread into the mainstream. Chances are news direct from our site(essentially a press release) won’t have an impact, and neither will sites too closely tied to ours(implied bias). But as that message/link spreads further and further from our niche, it becomes more credible and accepted.

    Now to the offline world of elections. They’re getting efficient: breaking down the demographics of their potential voters (a niche), finding out if they have a chance to control that demographic (ranking), and then attempting to saturate it with branding. A prime example of this is Obama taking out advertisements in popular video games. The youth demographic was identified as a group that has a lot of pull on the internet, is interested, and is easier to convert than their older counterparts. So they were targeted hard. Fun fact: For 18-25 year old males on Facebook there was always at least 1 Barack Obama ad in the top 3 most common ads, and another 2 not far behind.

    So looking at that from an SEO perspective, we’ll think of each demographic of voters as a niche to control, and each vote as a link. On an individual basis, each niche must be built up to have any substantial effect. For example: all the video game and Facebook ads in the world won’t reach a substantial number of senior citizens. So they must be approached in a different way, but feeding back into the same concept.

    The Niche Breakup
    I like to think of the election demographic breakdown as an attempt to recreate wikipedia, but with linkbait. The whole (or 51%) must be interested in the same platform (wikipedia). Now considering people’s different biases and the different emphasis they place on different things, this must be done by altering the message to each group (like a landing page that changes based on referring keyword).

    Each demographic in each region gets an assigned significance. For example, Barack Obama went hard after the youth vote, taking out ads in major video games, several ads on Facebook for the male demographic, and a variety of other means. This is because part of Barack’s strategy was saturation of media and control of message. The youth/bloggers are harder to control, but also have enough of a degree of separation from the campaign that it is not necessarily held responsible for their actions. This creates the demand and market for the mainstream media to cash in on. With the constant buzz surrounding the candidate, it makes sense to them to run stories about him.

    McCain on the other hand had a difficult task. He had to deliver different messages to different demographics. He had to appease the religious conservatives (hence the introduction of Palin), and not scare off the centrists/independents. By largely keeping Palin off of media aside from places that appealed to her niche demographic (think Fox News and more southern sources) he was more able to send an economic message to the less religious areas without interference. Note that towards the end of the campaign that last part did change.

    To me, this obviously speaks heavily to PPC and traditional banner buys, but beyond that it is absolutely fascinating to look at as a linkbait scheme. It’s using the same resources to convey different messages from different sources that benefit the same (central) source. Almost like a site network? Related Site/CandidateA pushes view X to control demographic Y, then funnels its power back into the money site/campaign. At the same time, Related Site/VP Candidate pushes view Z to an entirely different demographic, but benefiting the same money site. Absolutely fantastic.

    The Authority Sites
    As with SEO, your power and ultimately your authority is determined largely by your acceptance level by the power players in that industry. Instead of buying links, support is bought via pandering and promises. But the support from that person isn’t the big deal. The support from the visitors/supporters of the supporter is the significance. They have the ability to fuel the black PR mentioned at the beginning of this entry via essentially untraceable means. This was actually used by Rev. Falwell against McCain when he ran against Bush in the primaries the first time around. Remember the ‘illegitimate black baby’ rumor? That was orchestrated by Falwell (who supported Bush) and disseminated through his ranks as a whisper campaign. The same thing was done this time around with Obama being Muslim.

    So once again, how to apply this to SEO? Well, first pick apart the definition of “authority”. An authority link for these purposes is not just a powerful link, but rather a powerful link that controls a niche. The higher visibility of your link within a certain demographic is worth more than its weight in the algorithm (or an endorsement on paper). It’s the reach it gives you, and your ability to use it to pull more power into your site.

    Bringing it All Together as an SEO/Marketing Campaign
    Ok. So we obviously don’t have quite the audience of a candidate, but that doesn’t change the fact that we can learn a lot from how they run their campaigns.

    So the first thing of theirs I’d hit on as significant is how to create your own citations. You put something up on a blog or on a site and are looking to try and get that message out. The first thing to look at would be the other sites that would agree/disagree with your idea/product/buzz, and find out which ones are most closely connected/more referenced by less niche and more mainstream media. These are going to be the goal. Saturating or getting mentioned in these sites is the best bet to get propelled into the mainstream. Find out the sites they mention closest to your own, and start advertising. Links, guest posts, whatever. Anything to get recognized by those that agree with you that are closest to the goal.

    Another thing I would get out of this is the benefit of anonymity. The ability to have sockpuppets, or closely controlled “media” outlets. The ability to start e-mail chainletters that are seen by millions, but never have a tie back to any entity. The ability to control not only the message, but the source. Source is everything.

    Hopefully I’ve managed to keep my own bias out of this(I’m a lib), but it’s pretty tricky to do.

    -XMCP
    PS: Am I correct to assume there’s a market here? You don’t generally see black PR companies in the yellow pages.

    The Super Linkable (and Defensible) Site Elements

    November 2nd, 2008 | admin

    Hey there.
    Finally got a few ideas down for entries, been a bit slow getting them out of the box. I went to the Scary SEO conference in October though, and it really got me thinking again(been so insanely busy lately I’ve hardly had time to just sit and think). So today we’re going to cover site structures/elements, and how to arrange them to milk the internet for as many links as possible using them. This is going to be a surprisingly whitehat post methinks, but I’ll add in some odd tricks to fill my shady quota. Massive thanks to Jeff Quipp for getting me to take Web 2.0 seriously.

    The Concept
    No matter how pretty your commercial site is, people are not going to be inclined to give it that true *organic* link that we so desire. By building out non-commercial (read: Web 2.0) aspects of it, we can increase the likelihood of links. With a little bit of thought, we can use those to funnel their respective link juice into our mothership (commercial) site. There are also a few off-site sites that we can build that would survive nearly any manual review, but can over time get some decent power behind them.

    The Commercial Site
    Ok. So purchased links to commercial sites always stand out because in reality, people don’t find a miscellaneous productX store to link to when they mention productX. So what we’re going to do is keep the homepage to having strictly defensible links. Links that are not questionable, (rarely if ever) purchased, and are just generally high quality. We’re setting it up this way because we’re not trying to drive the homepage up with external links: of all the pages on a site it has probably the least amount of plausible deniability. For it, think high-quality directories, press releases/media mentions (if possible), perhaps a few discussions of it around misc. related messageboards, nothing heavy duty at all.

    The Articles
    A few basic articles linked to from the homepage/menu constantly. These are going to be targeted to top search terms, generally answering a specific question. Think of them as the sub-keywords we’re going after. More content on these is better, since we’re going to be driving a substantial amount of our link juice to these at the expense of a lot of other pages (so we want to get some serious longtail action going). Get your <h#> tags under control, and really optimize these pages.

    The Blog
    That’s right folks. The articles aren’t the end of the content nightmare. The blog is going to be the primary source of links. That said, we’re going to keep it relatively weak itself. It can be used to weigh in on controversy, or if you have a commercial-friendly niche, can be used for linkbait. Sound boring? Not done yet.
    Strip down the theme so it has as few internal links as possible. Essentially a link to the blog homepage, some nofollowed categories, then the “Articles” mentioned above, and of course your home page. Outbound links are acceptable, but this is going to try and hoard and target the link juice flow internally, hopefully using as little nofollow as possible. This means “next/previous post”, etc. are all a no go. Obviously you do have to factor in (bleh) user experience, but try and maintain a simple blog structure if possible. Any post that wishes to survive and rank can do so on its own.

    Optionally, you can have the blog show a couple “hot” entries or something like that, if you want to cash in on some current controversy within the topic.
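    If you want to check that a stripped-down theme really is stripped down, a quick script can list the followed internal links a rendered post still carries. This is only a sketch using Python’s standard library; the host name `shop.example` and the sample markup are placeholders, not anything from a real theme.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FollowedInternalLinks(HTMLParser):
    """Collects hrefs of internal links that are not marked rel="nofollow"."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        parsed = urlparse(href)
        if parsed.scheme not in ("", "http", "https"):
            return  # skip mailto:, javascript:, and similar
        rel = (attr.get("rel") or "").lower().split()
        internal = parsed.netloc in ("", self.site_host)  # relative URLs count as internal
        if internal and "nofollow" not in rel:
            self.links.append(href)


def followed_internal_links(html: str, site_host: str) -> list:
    parser = FollowedInternalLinks(site_host)
    parser.feed(html)
    return parser.links


# Hypothetical rendered post from a stripped-down theme.
page = """
<a href="/">Home</a>
<a href="/blog/">Blog</a>
<a href="/articles/widget-guide/">Widget guide</a>
<a href="/category/news/" rel="nofollow">News</a>
<a href="http://other.example/post">Outbound</a>
"""
for href in followed_internal_links(page, "shop.example"):
    print(href)
```

    Anything that shows up here beyond the blog homepage, the articles, and the home page is a leak worth looking at.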

    The Gallery
    People like pictures. Basic fact, true, and can be applied anywhere. Screw flickr, put your own gallery on your own domain, and take the link juice yourself. Don’t disable hotlinking, and watermark the images. Prepare for some serious bandwidth damage, but it’s worth it. Social icons, pre-prepared text to link and display the image(both html/bbcode), the works. Don’t just throw any old images up here. If you don’t have any of your own and don’t mind scraping, do so(be prepared to take down images if requested though: it will happen).
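    The pre-prepared link text is easy to generate per image. A minimal sketch, assuming each image has its own page on your domain; the function name and the `shop.example` URLs are placeholders. The point is that the snippet wraps the hotlinked image in a link back to your page, so whoever pastes it hands over a link as well as bandwidth.

```python
from html import escape


def embed_snippets(image_url: str, page_url: str, title: str) -> dict:
    """Return ready-to-paste HTML and BBCode that show the image and link back to its page."""
    html_code = (
        f'<a href="{escape(page_url, quote=True)}">'
        f'<img src="{escape(image_url, quote=True)}" alt="{escape(title, quote=True)}"></a>'
    )
    bbcode = f"[url={page_url}][img]{image_url}[/img][/url]"
    return {"html": html_code, "bbcode": bbcode}


# Hypothetical gallery entry.
snippets = embed_snippets(
    "http://shop.example/gallery/img/42.jpg",
    "http://shop.example/gallery/42",
    'Blue "widget"',
)
print(snippets["html"])
print(snippets["bbcode"])
```

    Drop both strings into read-only text boxes under each image so copying them takes one click.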

    Most niches have a weakness for some kind of image. Stoner/Marijuana sites for example, will almost always have an image gallery (sometimes with voting) for different buds/plants. The ever-dirty “Make Money Online” niche seems to have a check fetish. You get the idea. This is another section of the site that is driving juice, not trying to rank itself (though image search can be fan-tastic when properly handled).

    If you’re really feeling masochistic(this requires real administration) you can even set up an image host for others in your niche(Niche-flickr kind of). Just make sure you actually watch what’s going up there to an extent. Spam e-mailers and kiddy porn junkies do so love free image hosts.

    The Software/Scripts
    This is one of the largely overlooked elements in link building. The application doesn’t even have to be good. It can be random and pointless even. Why? Because you can still get accepted on a bunch of different download sites, and you get links. The more platforms you port it to, the more places you can submit it to, and the more links you get. Think about it. You write a piece of software, and pay a bit of cash to get a PC/Mac/Linux version, an iPhone version, a PHP version, maybe even a stripped down open source version.

    That opens up a bunch of different niche sites you could never get a link from otherwise, that will gladly(and normally freely) give you a link simply for a piece of software. Sticking to the iBeer example(linked to as “random and pointless”), it has managed to get over 2000 links and it’s only available on the iPhone. Or for a more useful script, the SEOBook keyword tool, which has over 3000 links now. Just make sure there’s some reason to link to your actual site, and not just the executable. FAQs, installation guides, anything like that is tremendously useful.

    The Off-Site Linkbait Blog
    Having issues with linkbait due to a blatantly commercial domain? Pick up a new, non-commercial domain for it. 301 redirect it after a given amount of time to a page on your real site, with the content still in place. SearchEnginePeople has a redirect plugin for WordPress that can let you schedule when it’s time to redirect each post over.
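    The scheduling logic behind a delayed redirect is tiny. The sketch below is not how the SearchEnginePeople plugin works (I haven’t looked at its code); it just shows the idea, and the 90-day delay and the `shop.example` URL are arbitrary placeholders.

```python
from datetime import datetime, timedelta


def redirect_response(published: datetime, now: datetime, target_url: str,
                      delay: timedelta = timedelta(days=90)):
    """Return (status, headers): serve the post until `delay` has passed, then 301 to the real site."""
    if now - published >= delay:
        # Permanent redirect: the linkbait URL now points at the commercial site's copy.
        return 301, {"Location": target_url}
    return 200, {}


# Hypothetical post published on the off-site blog.
published = datetime(2008, 11, 2)
target = "http://shop.example/linkbait/"
print(redirect_response(published, datetime(2008, 12, 1), target))
print(redirect_response(published, datetime(2009, 3, 1), target))
```

    Set the delay per post rather than per site, so each piece gets its own window to collect links before it moves.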

    The Employee (Fake or Real) Blog
    If you have employees, get them set up with personal blogs. If not, make fake employees and just pay someone to update the blogs. These don’t have to pass much juice(it’s hard to get links into them) but can be used as links that are normally above question. Put them on different IPs if possible.

    Adding a Bit of Shadiness

    • The real beauty of all these different features is that they get so much more latitude than a flat commercial homepage in terms of the links they can get.
      • Linkbait for example, can be dropped in dozens of places under different names or through different associates and it looks like a legitimate viral effect.
      • Imagehosts and galleries can both be watermarked and used by paid posters in different forums (find people actually interested in the topic; not a clueless pro-poster). The image provides the hotlink, the watermark gets the site name out.

    Ok. So no, I’m not going soft on everyone (I surprised myself as I typed the words “user experience”), but more and more the trick is becoming how to look like the cleanest site in a niche, while putting in the minimum amount of time. The elements I’ve described above get much more latitude, and can have nearly any BH tactic you want(except perhaps cloaking) applied to them in an intelligent way, and they will survive much longer than any other method I’ve seen so far.

    -XMCP

    © Slightly Shady SEO, All Rights Reserved. Scrape me, and I will eat your soul.