0:09
Hey everybody, this is Matt Cutts coming to you from the Googleplex.
0:14
And, why are you watching this talk?
0:16
Well, if you weren't able to attend the WebmasterWorld PubCon conference in Las Vegas, I thought that I might sort of recreate the talk that I gave, which was a "State of the Index."
0:27
So, what has Google done in 2008, what does the outlook look like for 2009, and then that way, you know, you guys could see what you might have missed at the conference.
0:36
Other things you might have missed included a lot of SEOs and webmasters talking about search.
0:42
So I thought I'd give the director's cut version in this talk. If you don't want to stick around for the whole thing, you don't have to.
0:51
If you want to make it more fun, turn it into a drinking game. You know, you're at home, you're in the privacy of your own home.
0:57
If you want to take a drink every time I say "webmaster" or "backlinks," feel free. I'm not going to get offended.
1:03
So let's jump right into it. So, talking about the state of the index.
1:07
One unusual aspect of this talk is that I was on a panel, it was a super search engine smack-down panel.
1:15
We don't usually smack each other down; we're usually very polite.
1:17
But the Ask Jeeves guy didn't show up. We had one guy from Microsoft, and one guy from Yahoo!.
1:24
And the Yahoo! guy was Sean Suchter, who evidently has just left Yahoo!, and might be going to Microsoft.
1:30
So we didn't know that at the time, but just a sort of unusual occurrence.
1:35
But it doesn't affect what I'm going to talk about today, which is what has Google done in 2008, and what are we probably looking at in 2009 as far as search trends?
1:43
One of the big pieces of news is Google Chrome. I'm using it right now on this laptop.
1:48
I've been using it for months, it's wicked fast. If you haven't given it a try, I definitely recommend that you give it a try.
1:54
The Chrome team, they update the binary, especially if you go for the developer's release, like once a week.
2:00
So it just keeps getting better and better and better. They don't wait six months in between releases.
2:04
It's a really high-quality piece of software. Another release that we had is an operating system.
2:12
Now, it happens to be an operating system that's primarily for mobile phones at first, but Android is pretty exciting.
2:17
And you can use it to write your own applications. I wrote my own "Hello, world!" in Java. It was about ten lines of Java.
2:25
And so it's pretty neat that you can just have a little mini, almost Linux/Unix computer in your pocket. And so you can run SSH and all that sort of stuff, and you can run any application you want to.
2:36
As far as other stuff that Google's done in 2008, better machine translation, so that's kind of nice.
2:43
You can translate from different pairs of languages. Also, better voice recognition.
2:48
And I was sort of hoping that Google might release its iPhone Google Mobile App which does voice recognition in time for the talk, so I could talk about it.
2:57
But now you're watching this a little bit later, by now it's already come out.
3:01
So it works really well. I would recommend that you give it a try.
3:05
You can do all sorts of stuff by voice, and it recognizes your voice much better than a lot of other applications do.
3:10
We've rolled out Google Suggest. Took us some time to make sure that we get the interface just right, that it doesn't affect the user experience, that it doesn't slow people down.
3:18
And another thing, which I believe is just launching today, it's on Techmeme as I'm recording this with the help of Wysz, is SearchWiki.
3:26
And so SearchWiki is something that lets you sort of move the result up to the top of the search results, or you can click on a little X and this thing just disappears completely, which is kind of fun.
3:37
And just to reiterate, SearchWiki affects only your current, you know, only your search results. It doesn't affect anybody else's search results.
3:47
And so it's kinda fun, but please don't try to go spamming it and all that sort of stuff. But SearchWiki is definitely very nice.
3:53
We've also continued to improve personalization and universal search.
3:57
We had a ton of small things. I couldn't begin to scratch the surface of all the small neat fun things we've done.
4:03
One particular favorite of mine was a search index that we revived from 2001.
4:08
So data from when we were crawling the web in 2001, and I have to say it was so much fun to do a search for let's say like [Levitra] and [Cialis], and those products didn't exist back then, so there was no webspam for them back then.
4:23
And so I was just pining for the old days. Oh, I only had to clean up these queries, and oh, we only had to figure out the algorithms to improve search quality on these sets of queries.
4:32
So it's kind of fun to go back in time, like to 2001.
4:35
Video and voice chat in Gmail, and one of my favorites is if you look at this next slide, the ability to track the flu by looking at people who have typed in "flu," "flu symptoms," "Am I coming down with the flu," "cold symptoms," things like that.
4:50
And so in the slide I highlighted that Kentucky, which is my home state, has a slightly elevated risk of getting the flu right now.
4:57
And it really just works because you have a lot of different search data, much of which can be used for really useful, good purposes like this.
5:04
But not everybody realizes that if you go to the next slide that we provide that data to a lot of different people in an aggregated anonymous form.
5:13
So, we have a product called Google Trends. And in this one, I took Obama and McCain.
5:19
And you could see the search volume for Obama and McCain. And you could see that Obama has had a big spike in search volume, whereas McCain hasn't had quite as big of a spike in search volume.
5:29
And you can do this for anything. You can look at, you know, people searching for swimsuits, people searching for flowers.
5:35
You'll see little spikes around Valentine's Day, and you'll see little spikes around Mother's Day. And so you can learn all sorts of things about how you can make a better website or be a better marketer, or just learn more about what people are thinking by looking at this anonymous aggregate data.
5:51
And it's not just for queries, either. We have something called Google Trends for Websites, and so you can type in two or three or four different websites, and you can sort of compare how much traffic they're getting according to Google.
6:05
So this conference was put on by WebmasterWorld, it's called PubCon. So I, in this slide, was making the point that WebmasterWorld is totally trashing me on traffic. A forum will get a lot more traffic than a blog a lot of the time.
6:18
But then I was actually getting a lot more traffic than pubcon.com.
6:22
And so you can do that for any sort of website. And it will also show you related sites.
6:27
So if you were to type in my site, mattcutts.com, you'd see people that went to my site also went to, and you'd get a list of like five or ten other websites that you could check out.
6:35
So that can be a really useful tool.
6:37
As far as tools for marketers in general, I had a couple slides, I'll go through them quickly, I don't want to be too salesy.
6:45
But Google Ad Manager is something that a lot of people like. I believe it's free, and it lets you manage all of the ads on your site, including whatever third-party ads you want to have.
6:54
And so it's just a very nice way that you can sort of get started and have very professional ads.
6:58
And Google Ad Planner lets you slice and dice demographic data. So you can say okay, I'm interested in targeting sites that are primarily visited by women who are over 40, you know, and all these sorts of demographic categories.
7:11
And it's a lot of fun. Once you find those sites you could approach them directly to advertise, you could advertise through Google, but it just shows you the power of advertising and ROI and how well you can use that.
7:24
But, people are not just advertisers and marketers, let's talk about webmasters. Webmasters are asking, "What have you done for me lately in 2008?"
7:32
So let's break some of those down. One thing that is really kind of cool is that we've introduced optical character recognition in some PDFs.
7:40
So some PDFs actually have, you know, text that you can extract out. But some PDFs are nothing but a snapshot of what the page would look like.
7:49
And Google has taken its optical character recognition systems and made it so that we can take those pictures of text, run optical character recognition on them, and then index those just like they're regular text.
8:01
So that's really handy if you have a PDF that wasn't accessible before.
8:04
Another big change is better crawling of Flash. And I'm not the world expert on this, but as I understand it, Adobe essentially provides an API such that a search engine can march through the different states, you know, pretending to be a user and clicking on various things,
8:18
and iterate through those states in a Flash file, and then at any point can say, okay, what is the state, so they can pull text out, for example, and index it.
8:26
Now a lot of people see this, and they say, great, I don't have to worry about Flash and search engines, and life is good, I can just make all of my stuff in Flash.
8:35
And I showed one slide, which, I didn't want to be a wet blanket, but I do want people to consider all the different aspects of having Flash.
8:42
I'm a little bit of a yogurt fanatic, I actually went to five yogurt places in one day at one time. I was trying out a bunch of different spots.
8:51
So, Pinkberry is a yogurt place that's very popular down in Los Angeles. Red Mango is a place that has red yogurt, has regular yogurt up here in the Bay Area.
9:02
And this slide shows Pinkberry's website on an iPhone versus Red Mango's website on an iPhone.
9:09
Pinkberry detects that you're on an iPhone, probably by the user agent, and shows you something that's specific to that iPhone. Which is great, you get a very useful thing where you can look up Pinkberry locations.
9:17
On Red Mango, you sort of get this blue cube of Flash, and that's all you can see. And so you have to think about the fact that some of your users come from different places around the world, and they don't all run Flash.
9:29
So it's always good to have at least some accessible links or static HTML that you can use so that search engines and other things like an iPhone can get some useful content out of your website.
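As a rough illustration of the user-agent check described above, here is a minimal Python sketch. It is not Pinkberry's actual code; the function names and template file names are hypothetical, and it assumes the server can read the request's User-Agent header as a plain string.

```python
def is_iphone(user_agent: str) -> bool:
    """Return True if the User-Agent string appears to come from an iPhone."""
    # Substring matching is a simple heuristic, not a guarantee:
    # any client can send whatever User-Agent it likes.
    return "iphone" in user_agent.lower()


def choose_page(user_agent: str) -> str:
    """Pick a page for the visitor (template names are hypothetical)."""
    if is_iphone(user_agent):
        # A lightweight page that works without Flash.
        return "store_locator_mobile.html"
    # Static HTML default that other browsers and crawlers can also read.
    return "home_static.html"
```

Whatever the detection does, the default branch here is still plain HTML, so a visitor or crawler that is not recognized still gets usable content.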
9:41
So let's keep drilling on other things that Google has done for webmasters.
9:45
In terms of webspam, we've worked hard on getting better at keyword spam and gibberish, especially in a lot of different languages. We've targeted a lot of languages for improvements in 2008.
9:56
And we've also, this is an addition to the director's cut, I forgot to mention this in the original talk, but we're doing better processing of JavaScript.
10:04
So you'll see some people saying, oh is Google able to process some JavaScript, and webspam and our crawl and indexing teams are doing much better at trying to find URLs and process JavaScript.
10:14
And then the last couple things, I thought I'd give a quick demonstration on the slides, you can say, okay, of all the people who link to me, there are some people who get it wrong. Right?
10:25
There's always a few people who link to a 404 page on your site. And if you could just find out those people and write to them and say, hey thanks for the link, but you linked to something that doesn't exist. Would you mind kind of changing your link to be the right page?
10:40
That's almost like free links. And so we have an interface which you can see on this next slide, there are actually 28 different sites which link to mattcutts.com/blog/asdfasdfasdf.
10:54
And that was deliberate on my part, I was using it as an example for a 404 post. But if that were a real broken 404, I could write to those 28 people, or maybe just the top two or three that I think oh, I've heard of them, ZDNet, Google Blogscoped, can you guys point to my real site, instead of this broken URL?
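The idea of finding who links to your missing pages can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is a hypothetical list of (linking page, linked URL) pairs, such as you might export from a backlink report, plus a set of paths known to exist on your site. It does not use any Google API.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def broken_inbound_links(links, valid_paths):
    """Group linking pages by the nonexistent path they point to.

    links: iterable of (source_url, target_url) pairs (hypothetical export).
    valid_paths: set of paths that actually exist on the site.
    """
    broken = defaultdict(list)
    for source_url, target_url in links:
        path = urlsplit(target_url).path or "/"
        if path not in valid_paths:
            broken[path].append(source_url)
    return dict(broken)
```

Each key in the result is a broken path, and its value is the list of pages you might write to and ask for a corrected link.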
11:10
The other thing that we've done with 404s is we've made it really easy to improve your 404s.
11:16
Sometimes people just have a simple 404 page and they kind of lose out on good traffic. Because you don't help people find good pages on the site, other than, "Sorry, 404."
11:27
So this slide shows an example with 14 lines of JavaScript. There's only a couple of parameters you want to set.
11:34
A language, and what your site name is. And then you put that in your 404 template.
11:39
So when anybody lands on your page and gets a 404, this JavaScript can run, and can suggest other pages on your site that might be useful.
11:52
I think Microsoft has something like this, but I think it has to run on IIS. It can't run on any kind of server like Apache, for example.
12:00
So this is just a simple little thing that fits in your template, it's just 14 lines of JavaScript, and if you search on the Webmaster Central Blog you can find out more information about how to install this 404 JavaScript.
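To give a feel for what "suggest other pages" can mean, here is a small Python sketch. It is not the Google widget (that one is JavaScript and its internals are not described in the talk). This is a hypothetical server-side stand-in that assumes you have a list of your site's real paths and simply picks the ones that look most like the path that was not found.

```python
import difflib


def suggest_pages(missing_path, known_paths, limit=3):
    """Return up to `limit` existing paths that resemble the missing one."""
    # get_close_matches ranks candidates by string similarity and drops
    # anything below the cutoff, so unrelated pages are not suggested.
    return difflib.get_close_matches(missing_path, known_paths, n=limit, cutoff=0.6)
```

For a mistyped path, a 404 template could list these suggestions instead of showing only "Sorry, 404."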
12:12
So just drilling through a few other things that we've done for webmasters. It's not all just, you know, getting links or buying advertising.
12:22
If you're gonna be a smart marketer, you also want to think about analytics. And you can use any analytics package you want, Google has a free package, Google Analytics, that works quite well.
12:33
And they introduced advanced segmentation this year. So you can do things like, I want to compare how my paid clicks have converted or performed in terms of return on investment versus my organic traffic.
12:44
And you can slice and dice in all kinds of really useful ways.
12:47
In green, I have highlighted on-demand indexing for Google Custom Search Engines. And that was announced 10:30 in the morning of this particular talk.
12:58
And the idea is you can for free get, you know, a double-digit number of pages where you tell Google, I want to have this page indexed. Hey, you know, index this page for me.
13:08
And Google will go and fetch those pages, and within 24 hours, and usually I believe much faster, index those pages.
13:14
Now they don't affect the main Google index, they affect the Google Custom Search Engine, but if you have content that changes pretty rapidly, this can be really really helpful for you.
13:22
Webmaster APIs, you can get access to your data whether you're a webhost, you can use GData.
13:29
And then a translation gadget, which is kind of handy. I know at least one charity, their website is in English but they have a lot of Chinese people, because it's a charity for Chinese children.
13:41
And so they have a translation widget right there, because they don't want to translate every page into Chinese, but if someone can just click and very quickly using Google translate a page, hey, you know, all those people visiting from China have a better chance to read your content.
13:57
Totally free, everybody's welcome to use it.
14:00
And then we have a slide about webmaster communication. Because we have done a lot of webmaster communication in 2008.
14:08
We've done more videos, mmm? Mmm? As you can see. And we've also done all kinds of webmaster chats.
14:13
In the last webmaster chat we had over 700 people show up for the last call. And just to give you a quick little secret, a lot of times we'll have an agenda where we do a big talk and three little talks. And then we'll have a Q and A at the end.
14:26
And at least I always try to show up and stay for a nice long time to do Q and A. So we used Google Moderator to vote questions up or down.
14:35
We had like a hundred people from Google, a hundred questions get answered by people from Google. And so we skirted around the top popular questions, we looked through the random ones.
14:45
And that worked pretty well, so we'll probably continue to do Q and A for the webmaster chats.
14:50
We've started a series of blog posts about what it's like being in search quality. So in case you can't make it all the way to the Googleplex you can find out about how we do some evaluation, how international works, how we think about poor search quality.
15:03
And we've also started to blog about webmaster issues in other languages. So we've done lots of posts in lots of languages, but we specifically have a German, and a Spanish, and a Chinese webmaster blog.
15:15
So we can take some of our best greatest hits posts to educate people and translate them into those languages, and we have a lot of country-specific posts as well.
15:23
A few other things, you might not have noticed this, but if you register for Webmaster Central, we will keep messages waiting for you until you register.
15:33
So it used to be that if you had never registered, and maybe your site had been hacked, if you signed up, only from that point forward would you start to get messages.
15:43
So we changed it earlier this year so that those messages would be waiting for you as soon as you sign up. So that can be really really handy for people who didn't know about Webmaster Central, and then when they sign up they find really useful information for them.
15:55
We started to identify some sites that are hackable. We started with a version of WordPress that's known to be hackable, and as we scan and crawl across the web if we see a little signature that makes us think your site might be vulnerable, we'll just drop you a little message in the Message Center, which is another reason why you might want to try out Google's webmaster console.
16:13
And then another thing that we did for this conference is we produced a 22-page PDF talking about SEO. And it's sort of an SEO beginner's guide, a 101 sort of level.
16:24
But a lot of the times people need something they can just give to somebody and say okay, here's where to start.
16:30
And so whenever we get queries from other Google properties, this is the PDF that we give to them and say okay, here's where you can read about SEO.
16:38
So we wanted to produce something that we could use internally, but that other people outside of Google could use as well.
16:44
Brandon Falls wrote that. He did a really great job. One thing that I love is that it sort of pops this myth that Google hates SEO.
16:51
SEO is like, you know, how do you polish your résumé? You want to have a good résumé, you never want to lie on a résumé, but there's nothing wrong with trying to figure out how to put your best foot forward, or how to interview well.
17:02
And in a lot of ways, a good SEO can help you do that. So a lot of people think Google hates SEO, or all SEO is wrong, and that's not the case.
17:11
And that PDF guide, 22 pages of it, helps to prove that.
17:16
And then, now we've talked about what's been going on in 2008 with Google, especially with webmasters, let's talk a little bit about 2009 and what we expect to see.
17:27
If you watched my previous video about the Web 2.0 Summit, you'll notice that I cribbed the next three slides, so if you want to take a little drinking break or you know, bathroom break or whatever and come back in three or four minutes, you won't be missing much.
17:41
This is a slide that shows that Ask Jeeves had a site called jeevesretirement.com where Jeeves had decided to sort of take a boat off into the sunset or something like that.
17:52
Unfortunately, Jeeves didn't renew his domain name. And so a porn spammer got it.
17:57
And so one of the big trends that we see in 2009 is continuing black hat trends, and just the normal sort of stuff where people grab domain names and put porn spam on it.
18:06
I gave you an example of what it looked like, but I changed all the nasty porn swear words into like "shiny ponies" and "rainbows" and "smurfs" and all that sort of stuff. But you don't want to go to that site.
18:17
It'll give you a headache if you go there.
18:19
So that will continue. You know, you'll continue to see people grab expired domain names, put porn on it, you know, the normal sort of black hat stuff.
18:26
But you'll also see black hat start to get more and more malicious. So I wanted to throw up a couple examples, or throw out a couple examples of that for you.
18:35
I met a Googler, he was a brand-new Googler, a Noogler as we call them. And he set up a meeting, and so I decided to Google him using the Google search engine to find out some information about him before I got to meet him.
18:47
And you can see this on the slide what I saw, the snippet for his name for his website was "xxx simpsons yourdirtymind 3d sex comics dadadadada" and so I got to go to the meeting and say, hey, nice to meet you, welcome to Google, glad you're here, but your site's been hacked.
19:06
And that was a really kind of, it's stressful. It's traumatic whenever your site gets hacked. And unfortunately, it's going to be more common.
19:14
This guy hadn't installed any software whatsoever. So as best as we could tell, it was his web host who had had some sort of security problem.
19:21
And you can see if you go to the next slide, they weren't pointing to any of their own domains, they were pointing to other pages that they had taken over or other sites that they had hacked.
19:30
And so the trend is these spammers will just try to get links, sometimes they'll install malware. They won't even buy their own domain names.
19:38
They'll just attack these other domains, parasitic hosting, whatever you want to call it, and just try to get links from all these hacking exploits.
19:45
And so you might ask yourself, you know, what are you willing to do to get rankings? Will you, you know, will you do blog spam? Will you do guestbook spam? Will you do referrer spam? Will you grab expired domain names and put porn on them? Will you hack other people's sites, which is where we start to move into the actively illegal area.
20:03
Or, and this is one of my favorite slides, there's no pictures on this slide, but in my head, I call it "three felonies on one slide."
20:14
So let's walk through exactly what was going on here. There was a black hat spammer who took a normal domain name, in this case it was artandstyle.com.au, and they did DNS subdomain hijacking.
20:27
So there's, you know, some percentage of the web is vulnerable to this attack. And what you do is you try to do what's known as poisoning a DNS cache, where you try to take control of another person's subdomain.
20:38
And it's, as I understand it, probably pretty darn illegal. This guy did it with thousands and thousands of subdomains, and then those hacked subdomains linked to keywordspy.com.
20:52
And if you look in the URL of keywordspy.com, they were trying to take advantage of a cross-site scripting flaw.
20:59
So whenever you load that, that particular URL, it would actually include a frameset from a completely different webpage. So now, exploiting cross-site scripting holes, also really bad thing to do, because I'm pretty sure that's illegal too.
21:13
So if you visited the keywordspy.com page, it would load this IP address. And what's the payload on that IP address? It tries to install malware on your machine.
21:24
This guy was doing three different things, all of which were really really scuzzy. And at that point, you kind of have to ask yourself, if this is what it takes to compete in the black hat SEO world, if I have to be illegal in two or three different ways, if I have to deface all of these different sites and run a much higher risk, is it worth it?
21:43
Or, should I concentrate on trying to make sure that my sites are more white hat, more legitimate, and are more likely to stand the test of time, not just last for a few hours or a few days before it gets caught?
21:53
So I think if we look forward, if we look at the conclusions, what is the state of the index? Google's done a lot to communicate with webmasters.
22:01
We're going to keep rolling out fun products. We're going to keep improving our products a lot.
22:05
But, there's a really interesting trend if you look towards 2009. And that is that black hats will continue to veer towards doing more and more illegal things.
22:13
And so if you're an SEO or a webmaster, you need to decide what is your risk tolerance.
22:18
You know, are you willing to do the illegal sort of things that it might take to compete in some of these really really scary areas?
22:25
One thing that is not up for debate is that I believe Google will keep communicating with webmasters. All the people who want to make legitimate content, white hat content, good content that stands the test of time, we want to help you market that, make it discoverable, get visitors to your site.
22:40
And we're going to keep looking at ways to provide tools to webmasters. So one of the things we've heard is people worrying about duplicate content.
22:47
Canonicalization, and that's a very long word, a fancy word that essentially just means if you have two URLs, which URL should be the preferred URL, or the canonical URL?
22:57
So we'll keep looking at tools that we can provide such that webmasters are able to decide that sort of stuff and are able to pick out the URLs that they want on their own to rank well.
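Here is a minimal Python sketch of what picking a canonical URL can look like on the site owner's side. The preferred host, the alias list, and the rule that "/index.html" is the same page as "/" are all assumptions for illustration. It does not describe how Google chooses a canonical URL.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"            # hypothetical preferred host
ALIASES = {"example.com", "www.example.com"}  # hosts serving the same site


def canonical_url(url: str) -> str:
    """Map equivalent URLs for one site onto a single preferred form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname drops any port number
    if host in ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"
    if path.endswith("/index.html"):
        # Assumption: the directory URL and its index.html are duplicates.
        path = path[: -len("index.html")]
    # The fragment is dropped; the query string is kept as-is.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

With a function like this, every duplicate form of a URL can be redirected to, or internally linked as, the one version you want to rank.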
23:06
Okay, well if you made it all the way through that talk, all 20-plus minutes of it, congratulations. It may have been a little dry, but it was the director's cut.
23:17
You got the full version, plus a few extra tidbits that you wouldn't have gotten if you'd gone to PubCon itself.
23:23
However, if you thought it was a little boring, and you were playing the webmaster drinking game, then just like Steve Ballmer used to throw in his developers stuff, I'll throw it in, webmasters, webmasters, webmasters!
23:36
Thanks again for sticking around.