Stop Words
To save disk space and to reduce calculation times for search responses, search engines handle common words that are unlikely to affect the relevance of a search in one of three ways:
- they do not index (record) them,
- or they ignore them in their search response calculations,
- or both.
These words are known as "stop words." Examples: "this", "that", "those", "the".
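As a rough illustration of the first option (not indexing stop words), here is a minimal Python sketch. The stop list is just the four example words above, and the function name build_index, the sample pages, and the punctuation handling are assumptions made for this example; real search engines use their own lists and far more elaborate indexes.

```python
from collections import defaultdict

# Hypothetical stop list, taken from the examples in the text.
STOP_WORDS = {"this", "that", "those", "the"}

def build_index(pages):
    """Map each non-stop word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            word = word.strip('.,;:!?"')  # drop surrounding punctuation
            if word and word not in STOP_WORDS:
                index[word].add(page_id)
    return index

# Made-up sample pages for illustration only.
pages = {1: "The piano player", 2: "Those who play the piano"}
index = build_index(pages)
print(sorted(index))
```

Stop words never get an entry in the index, so they take up no space there and cannot be matched later.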
Saving Space on Servers
Consider this sentence:
The way to the school is long and hard when walking in the rain.
The word "the" appears three times. To save space, a search engine might replace each occurrence with a short placeholder called a marker. The sentence would then be stored like this:
* way to * school is long and hard when walking in * rain.
This explanation is simplified, but the point is that using markers can save a lot of disk space. The sentence retains most of its relevance, and the space saved can be used to store more web pages.
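The marker idea can be sketched in a few lines of Python. This is only a toy version of the simplified explanation above: the function name replace_with_marker and the choice of "*" as the marker are assumptions for illustration, not how any particular search engine stores text.

```python
import re

MARKER = "*"  # placeholder character, as in the example sentence

def replace_with_marker(text, word="the"):
    """Replace every whole-word, case-insensitive occurrence of `word` with MARKER."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.sub(MARKER, text)

sentence = "The way to the school is long and hard when walking in the rain."
stored = replace_with_marker(sentence)
print(stored)
print(len(sentence) - len(stored), "characters saved")
```

Each three-letter "the" shrinks to a one-character marker, so this single sentence is six characters shorter. Across millions of pages, savings like this add up.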
Speeding up Searches
Some search engines store every word on a web page but, to save time, skip certain words when searching. Consider a search for: the piano player
The search engine has to make three passes to find matches (again, this is oversimplified).
First it looks for all matches of "the", then all matches of "piano", then all matches of "player".
Chances are, just looking for the last two words is enough to find relevant pages. So to save time, the search engine skips a small set of short, common words when searching. It won't "stop" to look for them.
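Skipping stop words at search time can be sketched as a simple filter on the query. The stop list below reuses the example words from this article, and the function name search_terms is hypothetical. The fallback for queries made up entirely of stop words is also an assumption added here so that such a query does not end up empty.

```python
# Hypothetical stop list, taken from the examples in the text.
STOP_WORDS = {"this", "that", "those", "the"}

def search_terms(query):
    """Return the words the engine would actually look up for a query."""
    terms = query.lower().split()
    kept = [t for t in terms if t not in STOP_WORDS]
    # Assumption: if every word is a stop word, search for all of them
    # rather than searching for nothing.
    return kept or terms

print(search_terms("the piano player"))
```

For the query "the piano player", only "piano" and "player" are looked up, so the engine makes two passes instead of three.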