Skip to content

dvallance/google-site-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gem: google-site-search

Description

This gem was created to aid in the querying and parsing of the Google Site Search api.

In the simplest use case it will query your google site search for a term and supply you with an object containing the results. However I’ve built this gem with the intention that you will want to explicitly handle what and how the specific results are stored.

Installation

Add the following to your projects Gemfile.

gem 'google-site-search', :git => "[email protected]:dvallance/google-site-search.git"

Require the code if necessary (note: some frameworks like rails are set to auto-require gems for you by default)

require 'google-site-search'

Usage

The simpliest way to use the gem is by providing just a search query term and your search engine unique id code (e.g. looks like this 00255077836266642015:u-scht7a-8i and is located in your google site search control panel)

# just assign the query to an object
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")

# object has search attributes like
puts search.next_results_url
puts search.previous_results_url
puts search.xml
puts search.spelling
puts search.spelling_url

# object has an array of each specific result that contains title, description and its link by default
search.results.each do |result|
  puts result.title
  puts result.description
  puts result.link
end

The query method expects a valid url so if you wanted to supply your own you can! However I have created a builder class to help with proper url creation and to help do some of the work for you.

Multiple Search

Since google only allows a max of 20 returned results I have added a method that will capture up to n number of results.

# the array will be up to 5 search objects if the query actually has that many results.
# has soon as a search doesn't have a next_results_url the method stops.
array_of_search_results = GoogleSiteSearch.query_multiple(5, *YOUR_URL*)

Caching

To aid in caching I added a a caching_key method which sorts the query parameters, removes some unique parameters that google adds (i.e. an ie parameter which has something to do with related searchs), and compresses the result. It should be a unique representation of a search url, and used to cache results.

Blocks (and a caching example)

Both the query and query_multiple methods can take a block which executes for each Search object found.

# here is a rails example using caching and supplying a block
url = GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")

@search = Rails.cache.fetch(GoogleSiteSearch::caching_key(url.to_s), :expires => 1.day) do |search|
  GoogleSiteSearch.query(url, SearchResult) do |search|
    # I can do something with the search objects.
    # possibly some custom analytics for our searchs?
    search.url # we can access the object
  end
end

Advanced Usage

An important requirement for this gem was to be able to use structured data for:

  • querying the search api itself (i.e. filtering and sorting )

  • displaying specific information in views (i.e. display a specific field like’author’, or ‘product_type’)

Therefore I allow the developer to supply his own “Results” class to the query and allow them to parse each result xml element explicitly.

The default Result class is as follows:

class Result
  attr_reader :title, :link, :description
    def initialize(node)
    @title = node.find_first("T").content
    @link = node.find_first("UE").content
    @description = node.find_first("S").content
  end
end

As you can see it is very simple. Your class simply needs an initialize method that will recieve an xml node, which it can then do with as it pleases. After it is initialized it is added to the search.results array as shown previously.

See

Pagination

The google search api does the work of pagination for us, supplying the next and previous urls. The urls are relative paths and contain the search engine id parameter. Since this is a security concern I strip out the search engine id when I store them in the Search.next_results_url and Search.previous_results_url methods. This makes them safe to put in links on views and it is why you must supply the search engine id again on the paginate call; so the full url can be rebuilt for the query call.

search2 = GoogleSiteSearch.query(GoogleSiteSearch.paginate(search1.next_results_url, "00255077836266642015:u-scht7a-8i"))

Pagination Simple Example

This works and is fairly straight forward.

In your controller:

if params[:move]
  @search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(params[:move], "00255077836266642015:u-scht7a-8i"))
else
  @search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i", :num => 5))
end

In your view:

<% if @search.previous_results_url %>
  <%= link_to "Previous", search_url(:move => @search.previous_results_url) %>
<% end %>
<% if @search.next_results_url %>
  <%= link_to "More", search_url(:move => @search.next_results_url) %>
<% end %>

Escaping

If you start passing around the url’s in parameters you may run into issues if you don’t escape/unescape the url. If so try…

View adds escape:

<%= link_to "Previous", search_url(:move => CGI::escape(@search.previous_results_url)) %>

Controller unescapes:

@search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(CGI::unescape(params[:move]), "00255077836266642015:u-scht7a-8i"))

Filtering and Sorting

See Filtering and sorting search results.

Filtering

Google expects filtering to be on the “search query” itself. However I feel my end users won’t and shouldn’t be aware of all the possible filtering options (most of my filtering will be based off of dataobject values I supply myself). So I try and keep the filters and actual “search term” separate.

From the google reference link above an example filter search query is halloween more:pagemap:document-author:lisamorton

# using the example above would look like this.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter =>  "more:pagemap:document-author:lisamorton")

Separate Search Term From Filters

The full “search query” is returned by google’s api and stored in the Search object in a few spots. (i.e @search.search_query method and @search.spelling_q).

To separate the search term from the filter use:

search_term, filters = GoogleSiteSearch.separate_search_term_from_filters(@search.search_query)

Sorting

Sorting would also be done by specifing a sort option.

search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter =>  "more:pagemap:document-author:lisamorton", :sort => "data-sdate")

Other Params

Any [param=value] query string additions you want to add can be assigned like the sorting above. For example to limit the search results return, to 5, would look like…

# get only 5 search results with the filtering and sorting from above still applyed.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween more:pagemap:document-author:lisamorton", "00255077836266642015:u-scht7a-8i", :sort => "date-sdate", :num => "5" )

Author

David Vallance

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages