Assignment #1 Program Description

How it works

This system is based on server-side technologies provided by Sun Microsystems and Google. The Sun product used is JavaServer Pages, or JSP for short. With JSP as the server-side scripting language, I implemented a search using the Google API, which is provided free of charge for up to 1000 queries per day.

When the user first comes to my search page, they are greeted with a form that asks them to submit a query. After the request is sent to the Tomcat server, the JSP script runs. The first thing the script does is try to locate the domains.txt file, which lists all of the domains the search is restricted to. If it cannot find the file, the JSP script throws an error and the user is told that the file could not be accessed and the script cannot continue. If the file is found and is usable, the search is run for each of the domains listed in domains.txt.
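As a rough sketch of this first step, the code below shows one way the domain list could be read. It assumes domains.txt holds one domain per line, which the description above does not actually state. The names DomainList, parse, and load are made up for illustration and are not taken from the real script.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real JSP script may read the file differently.
final class DomainList {

    // Keeps every non-blank line, trimmed of surrounding whitespace.
    // Assumption: the file lists one domain per line.
    static List<String> parse(List<String> lines) {
        List<String> domains = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                domains.add(trimmed);
            }
        }
        return domains;
    }

    // Throws IOException when the file is missing or unreadable, so the
    // page can catch it and tell the user the search cannot continue.
    static List<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }
}
```

In a JSP page, the call to load would sit inside a try/catch block so that the error message can be shown in place of the results.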

The interesting part is how the data gets from Google's main servers to the page the user sees. When the script runs, it makes calls into a provided googleapi.jar file that is stored on the Tomcat server. Through this jar file the JSP script can “tap into” the search and caching resources that Google has online: it submits a query to the Google search libraries and gets results back as if Google itself were doing the searching. The advantage is that I can restrict which domains are searched and format the output in whatever way suits the application.
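One plausible way to restrict a query to a single domain is Google's `site:` query operator. The sketch below builds such a query string from the user's input and one entry of domains.txt. Whether the actual script restricts the search this way is an assumption, and the names RestrictedQuery, hostOf, and build are invented for the example. The call into googleapi.jar itself is left out, since it needs the jar and a license key to run.

```java
import java.net.URI;

// Hypothetical helper; not part of googleapi.jar.
final class RestrictedQuery {

    // Reduces an entry such as "http://www.washington.edu/" or
    // "www.washington.edu" to the bare host name.
    static String hostOf(String domainEntry) {
        String entry = domainEntry.trim();
        if (!entry.contains("://")) {
            entry = "http://" + entry; // URI only reports a host when a scheme is present
        }
        String host = URI.create(entry).getHost();
        if (host == null) {
            throw new IllegalArgumentException("Not a usable domain: " + domainEntry);
        }
        return host;
    }

    // Appends a site: restriction to the user's search terms.
    static String build(String userQuery, String domainEntry) {
        return userQuery.trim() + " site:" + hostOf(domainEntry);
    }
}
```

The resulting string would then be handed to the search call once per domain.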

After the search object is passed back to my JSP script, it is ready to be formatted to my liking. For each of the domains in the domains.txt file, an array of Google search result elements is created, which the JSP later turns into HTML-formatted text. Each domain's results are kept in their own group. So, all results from http://www.washington.edu/ are grouped together, while all results from another domain are grouped under that domain.
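The grouping described above can be pictured as a map from each domain to its list of results. This is only a sketch: the Hit class stands in for whatever result element type the Google jar returns, and GroupedResults is a hypothetical name. A LinkedHashMap is used so that the groups come out in the same order as the domains were added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container; the real script works with arrays from the Google jar.
final class GroupedResults {

    // Stand-in for one search result (title, link, and excerpt).
    static final class Hit {
        final String title;
        final String url;
        final String snippet;

        Hit(String title, String url, String snippet) {
            this.title = title;
            this.url = url;
            this.snippet = snippet;
        }
    }

    // LinkedHashMap keeps the domains in insertion order.
    private final Map<String, List<Hit>> byDomain = new LinkedHashMap<>();

    // Registers a domain so it still appears even when it has no results.
    void addDomain(String domain) {
        byDomain.computeIfAbsent(domain, key -> new ArrayList<>());
    }

    // Files a result under its domain, creating the group if needed.
    void add(String domain, Hit hit) {
        byDomain.computeIfAbsent(domain, key -> new ArrayList<>()).add(hit);
    }

    Map<String, List<Hit>> asMap() {
        return byDomain;
    }
}
```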

At most 10 results are listed for any one domain, in an HTML ordered list. A domain may have no results at all, and a domain with more than 10 matches still shows only 10. At this point the user has a set of results in front of them and can click on a title, which is a link to the site, or read a brief excerpt from the page they are about to visit. The search terms are also highlighted (bolded, in this case) to make them easier to find within the excerpt.
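The formatting step could look roughly like the sketch below, which writes one domain's results as an ordered list, caps the list at 10 entries, and wraps the search terms in bold tags. It assumes the title and excerpt arrive as plain text; if the API already marks up the terms, the bolding step would not be needed. All names here (ResultRenderer, Hit, boldTerms, render) are invented for the example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical renderer; not the actual JSP code.
final class ResultRenderer {
    static final int MAX_PER_DOMAIN = 10;

    // Stand-in for one search result.
    static final class Hit {
        final String title;
        final String url;
        final String snippet;

        Hit(String title, String url, String snippet) {
            this.title = title;
            this.url = url;
            this.snippet = snippet;
        }
    }

    // Escapes the characters that are special in HTML text and attributes.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wraps every case-insensitive occurrence of a search term in <b> tags.
    // All terms are matched in a single pass so inserted tags are never re-matched.
    static String boldTerms(String text, List<String> terms) {
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term.isEmpty()) continue;
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(term));
        }
        if (alternation.length() == 0) return escape(text);

        Matcher m = Pattern.compile(alternation.toString(), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start())));
            out.append("<b>").append(escape(m.group())).append("</b>");
            last = m.end();
        }
        return out.append(escape(text.substring(last))).toString();
    }

    // Renders at most MAX_PER_DOMAIN hits as an HTML ordered list.
    static String render(List<Hit> hits, List<String> terms) {
        StringBuilder html = new StringBuilder("<ol>\n");
        int shown = Math.min(hits.size(), MAX_PER_DOMAIN);
        for (int i = 0; i < shown; i++) {
            Hit hit = hits.get(i);
            html.append("<li><a href=\"").append(escape(hit.url)).append("\">")
                .append(escape(hit.title)).append("</a><br>")
                .append(boldTerms(hit.snippet, terms)).append("</li>\n");
        }
        return html.append("</ol>\n").toString();
    }
}
```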

Features

One of the nice features of this approach is the ability to present the information in whatever format the developer sees fit. They can lay out the lists in a specific manner, or they can show only a limited number of results to the user. Really, the possibilities are limited only by what the developer can think of.

Another nice feature of the Google API is that it can be used for a broad range of searches. If you ran a site and wanted to offer search on it, you could use the Google API instead of spending time developing your own search system, and then restrict the results to whichever site you want. This could come in really handy for very large commercial sites, where information can easily get lost or be hard to find.

Limitations

Obviously the main limitation, at least in the free version, is the limit of 1000 queries per day. Anyone using this for search on a busy site would soon find that searching stops working once the day's 1000 queries have been used up. The other limitations are minor. The current API only covers Google's main search catalogue, not the specialty catalogues offered on their site. Whether an API for those would be useful is still open to debate, but if you wanted to search them, that would be a limitation of the current API.
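To put the daily limit in perspective, here is a small sketch of the arithmetic. It assumes, as the description of the script suggests, that every user search sends one API query per domain in domains.txt. The class name QueryBudget and the example figure of 8 domains are made up for illustration.

```java
// Hypothetical helper for estimating how far the free quota stretches.
final class QueryBudget {
    static final int DAILY_LIMIT = 1000; // free-tier limit stated above

    // Number of complete user searches per day, assuming one API query per domain.
    static int searchesPerDay(int domainCount) {
        if (domainCount <= 0) {
            throw new IllegalArgumentException("domainCount must be positive");
        }
        return DAILY_LIMIT / domainCount;
    }
}
```

Under that assumption, a list of 8 domains would allow 1000 / 8 = 125 user searches per day before the limit is reached.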

Scalability

Looking at how this would work as part of a larger project, I could see this being a highly scalable product. Google has basically left the reins of control with the developer to do what they want with the information Google returns. The API places almost no restrictions on formatting, layout, or even which content is provided to the end user, so it could be used in a multitude of settings with a variety of implementations. In my view, there is very little limit on what a developer can do with the information that comes from this API and its calls to and from Google's main computers.