Build a Site Search with the Google Search API
Published on Wednesday 16th of November, 2005
Google provides several APIs, including one for
their web-based search, another for their desktop toolbar, and one
for their AdWords program. Here we look at creating a custom site
search with PHP and the search API.
A quick note before getting into the meat of
the article. If you happened to make it through last week's lengthy instalment, you should be well
prepared for this article. I'm going to do my best to make this one
a bit more concise, so that readers can get to playing with the
API. Apologies if it seems that I gloss over anything. Code is
available for downloading at the end of the article.
The formalities: Getting started with the Google APIs
Develop Your Own Applications Using Google is the homepage for Google's
APIs. From this page you can get started by registering for an API
key, downloading their developer's kit with some example code, and
heading over to their terms of service and some help and FAQs
(there is also a Google Group).
Visions of Dollar Signs, Dancing
Before anyone gets too worked up over this API, note the following
from their terms of service:
Can I develop commercial applications using Google Web
APIs?
You can develop any application you want, but you must abide by
the Google Web APIs terms of service. One condition is you cannot
create a commercial service using Google Web APIs without first
obtaining written consent from Google. Another is that you can only
create one account for your personal use.
The Google Search API
Much like our look at the Yahoo! Search API, we will be focusing on
the Search API to build a site search.
For their main search API, Google offers us
doGoogleSearch(), doGetCachedPage() and
doSpellingSuggestion(). These allow you to do what
you would expect:
- Given search terms, return results
- Given a web page, return Google's cached copy of it
- Given a word, return spelling suggestions
Getting creative, you could allow a user to do a search query,
then fetch the results and the resulting cache pages, take an
automated screenshot of the cached pages (trimming out the Google
cache info from the top of the page) and provide a set of linked
images with the search results.
Steps to building our site search
Much like the last instalment, we will be taking input, in the
form of search terms from a user, and building a request that we
will send to the API server. We will then take the response,
unserialize it and format the results into some HTML.
Google uses the SOAP protocol (specs, W3Schools,
Wikipedia) for sending data between its API server and
your application. PHP 5 can handle SOAP natively; for PHP 4, however,
we need to use an external library to send, receive, and
unserialize our communications. As you can see, this is much the
same process as with the REST powered Yahoo! example from last
week.
There are many SOAP classes available for use, and since we'll
be using PHP (4) in this example, I've chosen to use NuSOAP. You may also want to check out the PEAR SOAP
class.
Note: if you are playing
with NuSOAP on PHP 5 with the bundled SOAP extension enabled, PHP will
throw an error, as the NuSOAP client class has the same name as PHP 5's
native SOAP client: soapclient (class names in PHP are not case
sensitive).
Step 1: Request and receive
The first thing we need to do is build a request to send to
Google. We will do this by setting our search parameters in an
array:
    // Build an array with the parameters we want to use:
    $params = array(
        'key'        => 'yourGoogleAPIKeyHere',
        'q'          => 'Search Terms Here',
        'start'      => 0,
        'maxResults' => 10,
        'filter'     => true,
    );
Site search
In order to make a site search, a couple of things need to take
place.
- A form that passes the search terms must be used, and when
submitted those terms must be passed into the value for
q in our array above.
- As Google doesn't provide a parameter for a site search, a
value of
site:www.yoursite.com must be embedded into
the value for q.
Taking into account those two points, the line for q would be
'q' => 'site:www.yoursite.com search terms'.
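The two points above can be sketched as a pair of small helpers. This is only an illustration, not the code from the download: the function names build_site_query() and build_search_params() and the form field name "terms" are assumptions made for this example.

```php
<?php
// Hypothetical helper: prefix the user's terms with the site: operator.
function build_site_query($site, $terms)
{
    // Collapse runs of whitespace and trim, so an empty form adds no terms.
    $terms = trim(preg_replace('/\s+/', ' ', (string) $terms));
    return trim('site:' . $site . ' ' . $terms);
}

// Hypothetical helper: fill in the parameter array from Step 1.
function build_search_params($key, $site, $terms, $start = 0)
{
    return array(
        'key'        => $key,
        'q'          => build_site_query($site, $terms),
        'start'      => (int) $start,
        'maxResults' => 10,
        'filter'     => true,
    );
}

// Usage, assuming the search form posts a field named "terms":
$terms  = isset($_POST['terms']) ? $_POST['terms'] : '';
$params = build_search_params('yourGoogleAPIKeyHere', 'www.yoursite.com', $terms);
```

Keeping the site: prefix in one place means the form only ever has to supply the user's own words.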
Moving forward: request and receive in one easy step
This next part moves quite fast, condensing a few of the steps
done in last week's Yahoo! example into just a few lines of code.
Rather than opening a file with PHP, we will be passing the URL and
the parameters directly to the NuSOAP class:
    // include the class:
    include('nusoap.php');

    // instantiate a new soap client:
    $soapclient = new soapclient("http://api.google.com/search/beta2");

    // send the query off to the server, with our
    // parameters and using the 'doGoogleSearch' method
    $searchresults = $soapclient->call("doGoogleSearch", $params,
        "urn:GoogleSearch", "urn:GoogleSearch");
First we include the class, then we instantiate a new
'soapclient', passing it the URI for the API server. From this
point we can call the server as outlined in the example.
NuSOAP returns the data from the server to us in the form of an
array. Compared to last week's example, getting from the request
to an array of data was certainly much simpler (granted, NuSOAP
is made up of a lot of lines of code, and it did all of the heavy
lifting).
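Before using the returned value, it may be worth checking that it really is a set of results. The sketch below is a guess at a reasonable guard, not part of the original script. It assumes that a failed call hands back either false or an array carrying a 'faultcode' entry, so verify that against the NuSOAP version you are using. The function name search_succeeded() is made up for this example.

```php
<?php
// Hypothetical helper: does $searchresults look like a usable response?
// Assumption: a failed call yields false or an array with a 'faultcode' key.
function search_succeeded($searchresults)
{
    if (!is_array($searchresults)) {
        return false; // nothing usable came back
    }
    if (isset($searchresults['faultcode'])) {
        return false; // the server answered with a SOAP fault
    }
    // A normal response carries the resultElements entry.
    return isset($searchresults['resultElements']);
}
```

A page could call this right after the request and show a friendly "search is unavailable" message when it returns false.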
A look at the data
If you were to print_r($searchresults) at this
point, you would see something similar to the following:
    Array
    (
        [directoryCategories] => Array
            (
            )
        [documentFiltering] =>
        [endIndex] => 2
        [estimateIsExact] =>
        [estimatedTotalResultsCount] => 190
        [resultElements] => Array
            (
                [0] => Array
                    (
                        [URL] => http://www.fiftyfoureleven.com/sandbox/weblog/2004/jun/the-definitive-css-gzip-method/
                        [cachedSize] => 18k
                        [directoryCategory] => Array
                            (
                                [fullViewableName] =>
                                [specialEncoding] =>
                            )
                        [directoryTitle] =>
                        [hostName] =>
                        [relatedInformationPresent] => 1
                        [snippet] => This post is the source for the most definitive/recent/tested version of gzipping your CSS.
                        [summary] =>
                        [title] => The Definitive Post on Gzipping your CSS
                    )
                [1] => Array
                    (
                        [URL] => http://www.fiftyfoureleven.com/weblog/web-development/css/applied-css-management-and-optimization
                        [cachedSize] => 44k
                        [directoryCategory] => Array
                            (
                                [fullViewableName] =>
                                [specialEncoding] =>
                            )
                        [directoryTitle] =>
                        [hostName] =>
                        [relatedInformationPresent] => 1
                        [snippet] => Building on the previous discussion about managing CSS files, this post looks at the practical solutions in use to help offset the results of some of the ...
                        [summary] =>
                        [title] => Applied CSS Management and Optimization
                    )
            )
        [searchComments] =>
        [searchQuery] => site:www.fiftyfoureleven.com css
        [searchTime] => 0.093227
        [searchTips] =>
        [startIndex] => 1
    )
Looking at the array above, we can see that the estimated total number of
search results can be taken from
$searchresults['estimatedTotalResultsCount'], and that
our results are held in
$searchresults['resultElements']. Each of those elements
holds a result, with the associated title, URL, cached file size and
a snippet from the page. Further down, you can find the search query
and the start index, among some other details.
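As a rough sketch of reading those two entries safely, the helper below falls back to defaults when something is missing. The function name extract_results() and the trimmed-down sample array are assumptions for illustration; the sample only mimics the shape of the output shown above.

```php
<?php
// Hypothetical helper: pull out the total and the result rows, with
// defaults for anything missing from the response.
function extract_results($searchresults)
{
    $total = isset($searchresults['estimatedTotalResultsCount'])
        ? (int) $searchresults['estimatedTotalResultsCount'] : 0;
    $rows = (isset($searchresults['resultElements'])
        && is_array($searchresults['resultElements']))
        ? $searchresults['resultElements'] : array();
    return array('total' => $total, 'rows' => $rows);
}

// Trimmed-down sample in the shape of the print_r() output.
$sample = array(
    'estimatedTotalResultsCount' => 190,
    'resultElements' => array(
        array('URL' => 'http://www.example.com/', 'cachedSize' => '18k',
              'snippet' => 'A snippet', 'title' => 'A title'),
    ),
);
$found = extract_results($sample);
echo $found['total'] . " results (estimated)\n";
```

Note that the count is only Google's estimate, so it is better suited to a "about N results" line than to exact arithmetic.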
Step 3: Presenting the results
Now that the data is held in an array, it will be quite easy to
loop through $searchresults['resultElements'] and
present the data however you see fit. I won't elaborate an example
here, but in the downloadable code at the bottom an example is
given.
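For readers who want a starting point before opening the download, a minimal loop might look something like the following. It is a sketch, not the downloadable example: render_results() and clean_text() are made-up names, the markup is just one possible layout, and the code assumes a current PHP release (the fourth argument to htmlspecialchars() is not available in PHP 4). It also assumes titles and snippets may arrive with markup in them, so tags are stripped before printing.

```php
<?php
// Hypothetical helper: make a string safe to print inside HTML.
// Assumption: titles and snippets may carry markup, so tags are stripped first.
function clean_text($value)
{
    // The final "false" avoids re-encoding entities that are already encoded.
    return htmlspecialchars(strip_tags((string) $value), ENT_QUOTES, 'UTF-8', false);
}

// Hypothetical renderer: turn the resultElements array into an ordered list.
function render_results($rows)
{
    if (!is_array($rows) || count($rows) === 0) {
        return "<p>No results found.</p>\n";
    }
    $html = "<ol>\n";
    foreach ($rows as $row) {
        $url     = isset($row['URL']) ? $row['URL'] : '';
        $title   = isset($row['title']) ? $row['title'] : $url;
        $snippet = isset($row['snippet']) ? $row['snippet'] : '';
        $html .= '<li><a href="' . clean_text($url) . '">' . clean_text($title)
               . '</a><br />' . clean_text($snippet) . "</li>\n";
    }
    return $html . "</ol>\n";
}
```

Calling echo render_results($searchresults['resultElements']); would then print the list, or a short message when nothing came back.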
A short discussion
This is the second search API that has been examined, and we can
see some similarities despite the use of REST for Yahoo! and SOAP
for Google. The data transfer is done with XML - apart from the
initial request when using Yahoo! - and we have unserialized the
data to arrays in both instances.
Personally, I prefer to move the data to an array in this case,
because after some "array normalization", I could pass either the
Yahoo! or Google results array to the same HTML processing
function.
Not only would this be an efficient setup if I decide to change
the format later, but in this manner I could provide three separate
sets of results (Yahoo! + GOOG + MSN) in the site search for a
site. Or, I could develop a search website (gada.be anyone?) and
have one single function deal with processing the data into a
view.
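The "array normalization" idea can be sketched roughly as follows. The function name normalize_results() and the common keys (title, url, snippet) are assumptions for this example. The Google field names come from the print_r() output earlier in the article; the Yahoo! names are a guess and should be adjusted to whatever your unserialized Yahoo! array actually contains.

```php
<?php
// Hypothetical helper: copy each result into a common shape
// (title, url, snippet) using a per-engine map of field names.
function normalize_results($rows, $map)
{
    $normalized = array();
    foreach ($rows as $row) {
        $item = array();
        foreach ($map as $common => $engineKey) {
            // Missing fields become empty strings rather than notices.
            $item[$common] = isset($row[$engineKey]) ? $row[$engineKey] : '';
        }
        $normalized[] = $item;
    }
    return $normalized;
}

// Google field names as they appear in the print_r() output.
$googleMap = array('title' => 'title', 'url' => 'URL', 'snippet' => 'snippet');
// Assumed Yahoo! field names; adjust to match your own array.
$yahooMap  = array('title' => 'Title', 'url' => 'Url', 'snippet' => 'Summary');
```

With both result sets reduced to the same three keys, a single HTML function only ever has to know about title, url and snippet.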
One last note: pagination is possible here, just as with Yahoo!, and
with Google it may be more of a necessity, as you can only have 10
results returned at a time.
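The arithmetic for pagination is small enough to sketch here. The helper names page_to_start() and page_count() are made up for this example, and the code assumes the 'start' parameter is zero-based, as in the parameter array from Step 1.

```php
<?php
// Hypothetical helper: zero-based 'start' value for a one-based page number.
function page_to_start($page, $perPage = 10)
{
    $page = max(1, (int) $page); // treat anything below 1 as the first page
    return ($page - 1) * $perPage;
}

// Hypothetical helper: number of pages implied by the estimated total.
function page_count($estimatedTotal, $perPage = 10)
{
    return (int) ceil(max(0, (int) $estimatedTotal) / $perPage);
}
```

The value from page_to_start() would go into the 'start' entry of the parameter array, while page_count() fed with estimatedTotalResultsCount gives a rough number of page links to show. Since the total is an estimate, the last few pages may turn out to be empty.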
Download some code!
Here is an example script that pulls this whole article
together. When using, remember:
- To change the file extension to .php.
- That you need a server with PHP installed to run it.
- That you will need a Google Application Key to use it.
That's it!
As usual, questions or comments invited below, and stay tuned
for the MSN search API - which will be similar work to this one -
coming up later this week. Next week we move away from search
APIs...