0:09
Hi everybody. My name is Peter Linsley, I'm
a product manager at Google, working on image
0:14
search. Now what we thought we'd do today
was to run over some slides that I presented
0:19
at SMX West in February 2009, just a couple
of weeks ago. And the slides were
0:27
sort of a high-level introduction to image
search. So first of all I thought we'd run
0:31
through the presentation that I gave at SMX
West and then afterwards I'll run through
0:35
some of the questions that came up that seemed
like topics of interest to the webmasters.
0:39
Okay, so I'll start the presentation.
0:42
So first of all, our mission with Google Image
Search is to organize the world's images.
0:49
We put a lot of focus on satisfying the end
users. So when they come with a query, and
0:54
they have an image that they're looking for,
our goal is to provide relevant and useful
0:58
images for that query. And of course the theory
here being that if they find what they're
1:02
looking for and they enjoy their experience
they'll come back and use us again. What I
1:06
wanted to get out of this talk as well was
to start to engage a little bit more with
1:11
the webmaster community. If we look at what
has come out of various conferences like this,
1:17
where web search representatives from different
companies have gone out and had conversations
1:22
with the webmasters and found out what their
pain points were, we found that this sort of
1:26
ad hoc consortium came together and came up
with things like the Sitemaps standard, or
1:30
they came up with rel="nofollow", or they
came up with robots.txt wildcards and things
1:34
along those lines. So one of our hopes in
Image Search is that we can try and start
1:40
this dialogue and find out what sort of pain
points you guys might have as webmasters,
1:45
where we, you know, the likes of Google and
also other search engine companies can try
1:49
and come together and try and enhance the
end user experience by finding easier ways
1:54
for you guys to get your images both indexed
and ranked.
1:59
So I'm just going to move on to the first
slide that I had. I wanted to paint a little
2:03
bit of a picture of image searchers: what they
do and why they might be slightly different
2:07
to the kind of audience you might be used
to with Web Search. First of all Image Search
2:13
appears in a lot of places beyond images.google.com.
You've probably seen images appear in Universal
2:22
Search, so whenever you do a query like "pictures
of San Francisco" there might well be a portion
2:27
of the results page that's dedicated to showing
images for that result. And the theory here
2:31
is very much in line with our goal in Image
Search which is that we're going to show you
2:36
those results when we believe that they are
very useful and informative and relevant to
2:39
the query. Images also appear in other places
like on Maps. You might have seen a little
2:46
row of images in Maps which come from our
Panoramio.com property, which is a really
2:50
cool product if you haven't seen it before.
So images appear everywhere across all of
2:55
our properties and we're really just trying
to align it with when they match the user
3:00
intent or they enhance the user experience.
Image searchers also have a very unique search
3:05
behavior. They are very different animals
to web searchers. If you think about the paradigm
3:11
when they do a query, it's not so much about
what's the first result. We don't really have
3:15
this sort of I'm feeling lucky paradigm. It's
more about saying here's a query, well here's
3:19
20 images that you might like. And users can
consume those images in a heartbeat. And if
3:24
the image they happen to like is at the bottom
left hand corner or the bottom right hand
3:28
corner, then so be it. They'll see that, there'll
be something about that image that attracts
3:32
them and they'll click through.
3:34
The other thing that they do is that they
search a lot of images. So there's a lot of
3:38
next paging going on, they'll go very deep
looking for the image that they like. One
3:42
of the reasons why this happens is that a
lot of queries we see are very subjective
3:46
in nature. So if you see a query like "waterfalls"
then the waterfall that you like and the waterfall
3:52
that I like might be on two very different
pages and there's no way as a search engine
3:55
we can figure out what you're looking for.
So there's a lot of next-paging, users can
4:00
consume results very quickly and it's just
interesting to think about what that might
4:04
mean for you guys as marketers: it's
not all about being in first position on the
4:08
first page.
4:10
There's also a lot of novel use cases in Image
Search which might not be apparent. Users
4:16
use image search for inspiration. They want
to get a haircut or a tattoo and they're looking
4:20
for ideas. So "Tattoo ideas" and then they
go through the pages looking for some inspiration.
4:26
Quite often they'll refine their query, there's
a lot of sort of exploring and this sort of
4:30
browse with intent. Users also use image search
for shopping. They use it for research, health
4:37
queries, or sometimes they use it just to
kill time, just for the fun of it. Another
4:42
really interesting use case that we've seen
is using Image Search as a visual dictionary.
4:47
So there's a Googler in Germany who's learning
German and, if he hears a noun or a word
4:54
that he's not too sure about, he'll type
it in. And he knows exactly what the word
4:57
means, even though he doesn't look it up in
a text dictionary.
5:01
Okay, so this is the slide on How Image
Search Works. Simply put, as a webmaster you'll
5:08
see Googlebot come along and download the
HTML as normal. Then what happens is, we parse
5:15
through your page and we look for references
to images. And typically references to images
5:19
can come in one of two forms. It's either
an href where you're linking to an image directly,
5:24
or it's an inline image, in an image
source (img src) tag. Then what happens is we come along
5:30
and crawl the images and then go through this
process of classifying them. Now what we're
5:35
trying to do here is to figure out how to
bucketize this image correctly and one classification
5:42
we do is to work out if it's a photograph or
not. Another one might be "does it contain
5:47
a face?" Other buckets might be things like,
is it line art? Is it black and white? Or
5:53
is it an unsavory image that we want to put
under Safe Search, and only show when Safe Search
5:59
is disabled. So this sort of classification
goes on and the reason we do that is we found
6:05
that image searchers really like to slice
and dice their results. They like to do a
6:10
query and look at it and say "Well, these
images are sort of nice, but I really wanted
6:13
to see just images with faces in them." So
if you've seen across the top of the results
6:17
page there's a blue bar which contains some
drop-downs where you can actually filter the
6:23
results down to just photographs or just faces
or just line art and so on and so forth. And
6:30
these filters tend to get used quite heavily,
so we like to try and bucketize things off
6:36
so that they're shown in a more relevant context.
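To make the first step of that pipeline concrete, here is a minimal sketch of finding the two kinds of image reference in a page: a direct link to an image, and an inline image. It is an illustration only, not how Googlebot actually does it. The class name, the function name and the extension list are assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption for illustration: treat links ending in these extensions as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class ImageReferenceParser(HTMLParser):
    """Collects inline images (img src) and direct links to images (a href)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.inline = []  # images embedded in the page
        self.linked = []  # images the page links to directly

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page URL.
            self.inline.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self.linked.append(urljoin(self.base_url, href))


def find_image_references(html, base_url):
    """Return (inline_images, linked_images) found in the HTML."""
    parser = ImageReferenceParser(base_url)
    parser.feed(html)
    return parser.inline, parser.linked
```

A real crawler would look at the content type of the linked file rather than its extension; the extension check just keeps the sketch short.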
6:40
Finally of course the images are indexed.
And that's where we squirrel them away and
6:44
we have an index of the image with all of
the text that we associate with it, with that
6:48
particular image. Another part of this process
is about identifying the duplicates. So if
6:54
you think about the way images are typically
deployed online, you might put an image up
6:59
and a particular page will refer to it, another
page might refer to it, you might have other
7:04
pages on your site that refer to it. Every
now and again an image will get copied. And
7:09
maybe it gets copied as is or maybe it gets
transformed ever so slightly. But as far as
7:13
the user is concerned it's still very much
the same image. So the next process we go
7:17
through is one of trying to cluster all of
the very similar or identical images and try
7:22
and treat them as one. And this is very much
the same as the way things are done in web,
7:27
when web pages are analyzed for duplicates
and then one sort of canonical winner is picked
7:34
out of that entire group. So the same thing
happens with image search. We try and identify
7:38
all of the duplicates, and again, the main
reason for us doing this is that when somebody
7:42
comes in and types blue widgets we really
don't want to be showing them exactly the
7:45
same blue widget twenty times. We want to
try and cluster those together and say, well,
7:48
here's one interpretation and here's another
one. So there are multiple images, our goal
7:55
is to try and cluster these and figure out
which is the best one. And at the same time
7:58
we have multiple pages that are including
that image. And another task is at run time,
8:03
at query time, to try and figure out which
one of these referrers makes the most sense
8:07
for this particular image that we've chosen.
And the answer to this is we try and choose
8:10
the best one. We try to choose the best image
that meets the user's intent more accurately
8:18
and maybe it's about size or maybe it's about
quality or something like that. And the referring
8:25
page that includes that image is selected
based on how good it is essentially. And that
8:30
could be one of many things such as its relevance
to the actual query itself.
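The clustering and selection steps just described can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual system: the `fingerprint` field is hypothetical and is assumed to have been computed elsewhere so that identical or near-identical images share the same value, "best image" is reduced to pixel area, and "best referrer" is reduced to word overlap with the query.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ImageCopy:
    image_url: str
    page_url: str     # the referring page that includes the image
    page_text: str    # text found on the referring page
    fingerprint: str  # hypothetical: equal for identical or near-identical images
    width: int
    height: int


def cluster_by_fingerprint(copies):
    """Group copies of the same image so they can be treated as one result."""
    clusters = defaultdict(list)
    for copy in copies:
        clusters[copy.fingerprint].append(copy)
    return list(clusters.values())


def best_image(cluster):
    """Pick one image to represent the cluster (here: the largest by area)."""
    return max(cluster, key=lambda c: c.width * c.height)


def best_referrer(cluster, query):
    """Pick the referring page whose text shares the most words with the query."""
    terms = set(query.lower().split())

    def overlap(copy):
        return len(terms & set(copy.page_text.lower().split()))

    return max(cluster, key=overlap)
```

Note that the two choices are independent: the image that represents the cluster and the page chosen as its referrer can come from different copies.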
8:37
And finally, ranking is performed on a whole
lot of signals. And typically we don't go
8:42
into the details of the signals but it's very
much like web, there's more than one signal
8:48
that we use to try and figure out what the
most relevant image would be. So the next
8:53
slide, is on best practices. You're sitting
there thinking, "That sounds great, I've got
9:00
good images that I think are going to be useful
for your users. What can I do about it?" Probably
9:05
the best bit of advice we can give is to really
focus on the user. Now you might be thinking
9:10
what exactly does that mean? What can I go
out and do tomorrow to focus on the user?
9:15
The answer is pretty simple. If you think
of a user that comes to Google Image Search
9:20
and what they might be looking for, and if
we take one use case such as coloring pages.
9:24
Maybe they're looking for a site that has
a lot of coloring pages and they choose to
9:27
use Image Search to get there. Then the first
thing they're going to do is come along and
9:32
type in "coloring pages" and they're going
to look at the results. And maybe they see
9:37
something they like, maybe they don't, they
might hit next page a few times. And all of
9:42
a sudden one image will catch their eye and
they like it for some reason. Maybe it's just
9:46
the quality of the image itself, or maybe
it's the snippet, maybe there's something about
9:50
the size or host name that sort of draws their
attention. Maybe it's coloringpages.com: "I
9:56
know that site! I'm going to click through,
I trust it." Then they land on your page
10:01
and the question is: What sort of experience
are you immersing them into? What sort of
10:05
experience are they getting now that they've
come to your page, given that they were looking
10:09
for coloring pages. Do they see the coloring
page they just clicked on above the fold?
10:15
Is it large enough? It's one thing to send
people to a coloring page page where you show
10:21
them very small thumbnails and it's another
thing to say this is what you just clicked
10:24
on, here it is, here's some descriptive text,
here are some related pictures, here are some
10:33
comments from the users, ratings and all sorts
of things. It's really about immersing the
10:37
user into a very image-centric experience.
These are the kinds of landing pages and the
10:43
kinds of images we've observed that our users
tend to like. And again, our intent is to
10:49
try and match up the intent of the end-user.
So focus on the user, and high quality images
10:55
are always good. If you're taking photographs
to put on your site go and buy a digital SLR
11:00
and learn how to use it, get a good lens, take
really nice high quality pictures. You don't
11:06
necessarily have to take up the whole
screen with the photograph or the image, but
11:10
just, you know, large enough is usually what
the users like. Above the fold, and plenty
11:16
of descriptive text. And fundamentally the
impetus for all of Image Search is a text query,
11:24
and the extent to which you have a lot of
descriptive text that is on topic and talks
11:28
about what is in the image, maybe you want
to expose EXIF data, maybe you want
11:33
to talk about when the image was taken, maybe
it has a nice title across the top. All those
11:37
sorts of things are really good clues for
us to figure out when an image is relevant
11:41
or not. But more importantly it's useful for
the end user. They can read the description,
11:47
read the caption and learn a lot more about
the image.
11:51
On the last slide I talk about resources. We
have Webmaster Help Centers where you can
11:55
go and read a lot more about Image Search.
We also have forums where you can post questions
12:00
about Image Search. We really really encourage
webmasters to come to these forums and post
12:04
all of their questions or concerns. We monitor
these very closely, we pick up these concerns,
12:11
and we'll take a look at them. There's also
Web Search Help and Forum for end users so
12:20
if you're an end user of image search and
you have questions you can leave them there.
12:24
The other thing is to monitor the Google official
blog because that's where we typically put
12:28
our announcements of new features and changes
and news and what have you, specifically around
12:33
image search. So that, in a nutshell, was the presentation
that I gave at SMX.
12:40
So at the end of the presentation we had Q&A
and I wanted to pick up on a couple of questions
12:45
that came up during that time. The first question
was "Hey, you guys mentioned large images
12:52
are a good best practice but I have a concern
with that because I don't want to load up
12:56
the really really large version of the image
that I have because it takes the page a whole
13:00
lot of time to load up. How do I manage the
trade off there?" So I think the answer to
13:07
that question is to show an image that is
large enough. Typically two-thirds of a screen
13:12
is one sort of rule of thumb. The point here
being that users tend to like to actually
13:17
be able to see the image as opposed to it
being a very small thumbnail. So a good way
13:22
to get around this, to allow users to see
the larger version if they want to, is to
13:28
turn your image into a link to either the
large image itself or another HTML page that
13:33
includes a larger version of the image. Ultimately,
nobody really wants to see an image that is
13:37
larger than the browser size. So, another
question that came up was about Analytics.
13:44
Somebody was saying "Hey, can I get Analytics
information from the traffic that's coming
13:49
from Image Search?" The answer is "Absolutely".
In the referrer string that we send across,
13:55
well, that the browser sends across, both
the query that ranked the image and the image
14:03
itself are sent in that string. So one slight
difference with image search, of course, is
14:07
that we're not necessarily sending people
to that page as much as we're sending people
14:11
to your image on the page and there could
of course be more than one image. So by parsing
14:16
the referrer string you should get all
the analytics you need to know: what query
14:20
sent the user there and what image sent the
user there.
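As a rough sketch of what pulling those two values out of the referrer could look like on the webmaster's side: the talk does not spell out the format of the referrer string, so the parameter names `q` (the query) and `imgurl` (the image) below are assumptions for illustration, as are the function name and the example host. Check your own server logs for the actual format before relying on this.

```python
from urllib.parse import urlparse, parse_qs


def parse_image_search_referrer(referrer):
    """Extract the search query and the clicked image from a referrer URL.

    Assumes (for illustration only) that the query is carried in a `q`
    parameter and the image URL in an `imgurl` parameter.
    """
    parsed = urlparse(referrer)
    params = parse_qs(parsed.query)
    return {
        "host": parsed.netloc,
        # parse_qs returns a list per key; take the first value if present.
        "query": params.get("q", [None])[0],
        "image_url": params.get("imgurl", [None])[0],
    }


# Hypothetical referrer, made up for this example.
example = (
    "http://images.example.com/imgres"
    "?imgurl=http://www.example.org/img/page1.gif&q=coloring+pages"
)
info = parse_image_search_referrer(example)
```

With both values in hand you can count visits per query and per image, which matters when one landing page carries several images.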
14:24
So that about wraps it up. I hope you found
this useful. By all means drop all of your
14:30
questions in the Webmaster and Help forums and
we'll swing by and take a look. Thanks.