Google Search
From Wikipedia, the free encyclopedia, by MultiMedia
Google is a search engine owned by Google Inc., whose mission statement is
to "organize the world's information and make it universally accessible and
useful." The largest search engine on the web, Google receives over 200
million queries each day through its various services.
In addition to its tool for searching web pages, Google also provides
services for searching images, Usenet newsgroups, news websites, videos,
maps, and items for sale online, as well as for searching by locality. As of
June 2005, Google had indexed 8.05 billion web pages, 1.3 billion images,
and over one billion Usenet messages; in total, approximately 10.4 billion
items. It also caches much of the content that it indexes. Google operates
other tools and services, including Google News, Google Suggest, Froogle,
and Google Desktop Search. See list of Google services and tools for a
complete list.
![Google's main page](./modules/Google_Guide-MM/images/Google23.jpg)
The unusually spartan design, uncluttered appearance, and quick loading time
of Google's main page have contributed greatly to the site's mass appeal.
History
The Google search engine began as a research project in early 1996 by
Larry Page and Sergey Brin, two Stanford University graduate students who
developed the theory that a search engine based on a mathematical analysis
of the relationships between websites would produce better results than the
basic techniques then in use. It was originally nicknamed "BackRub" because
the system checked backlinks to estimate a site's importance.
Convinced that the pages with the most links to them from other highly
relevant web pages must be the most relevant ones, Page and Brin decided to
test their thesis as part of their studies, and laid the foundation for
their search engine. The web site called "Google!" (with an exclamation
mark) went live at the domain name google.com. They formally founded their
company of the same name, Google Inc., on September 7, 1998 in a friend's
garage in Menlo Park, California. Brin's lack of interest in writing HTML
code used for designing web pages meant that the site's design used a
minimal interface.
Google introduced advertisements in 2000, which were sold by the keyword so
that they would be more relevant to the end user, and the ads were
text-based in order to reduce loading time and to keep the page uncluttered.
In September 2001, Google's ranking mechanism, PageRank, was awarded a U.S.
patent. The patent was officially awarded to Stanford University and lists
Lawrence Page as the inventor. At its peak in early 2004, Google handled
upwards of 80 percent of all search requests on the Internet through its
website and clients like Yahoo!, AOL, and CNN [1]. Google's share of web
search fell in 2004 when Yahoo! dropped Google's search technology in favor
of its own.
The Google search site includes humorous features such as cartoon
modifications (called "Google Doodles") of its logo for special occasions,
the option to display the site in fictional or humorous languages such as
Klingon and Leet, and April Fools' Day jokes about the company.
It has been conjectured that Google's future lies in personalized search,
using the data gathered from its Orkut, Gmail, and Froogle products to give
results based on an individual's previous actions. In fact, there is a
Personalized Google Search Beta in Google Labs, the experimental section of
the site [2].
The name "Google"
Etymology
The name "Google" is an accidental misspelling of the word googol, which
was coined in 1938 by Milton Sirotta, nephew of mathematician Edward Kasner,
to refer to the number represented by 1 followed by a hundred zeros, 10^100.
Google's use of the term reflects the company's mission to organize the
immense amount of information available on the Web.
Trademark and domain names
"To google," as a verb, has come to mean "to search for something on
Google"; because of Google's popularity (52 percent of all web searches in
January 2005 [3], and as high as 80 percent at its peak), it has also
generically come to mean "to search the web." Google officials have
discouraged this usage of the company's name out of fear of trademark
dilution, as it could lead to the name becoming a genericized trademark.
To prevent domain hijacking by unaffiliated third parties, Google has
purchased the redirecting rights to several similar-sounding domain names
like gogle.com, googel.com, etc. See external links below for other domain
names owned by Google. The registration of other domain names to prevent
hijacking and for humorous purposes is by no means restricted to Google.
The search engine
Index size
- ~ 1998: ~ 25,000,000
- August 2000: 1,060,000,000
- January 2002: 2,073,000,000
- February 2003: 3,083,000,000
- September 2004: 4,285,000,000
- November 2004: 8,058,044,651 web pages, 880,000,000 images,
845,000,000 Usenet messages, 4,500 news sources
- June 2005: 8,058,044,651 web pages, 1,187,630,000 images, 1 billion
Usenet messages, 6,600 print catalogs, 4,500 news sources
(source: Internet Archive [4], GoogleBlog [5], Google Groups [6], Google
Catalogs [7])
Physical structure
Google employs data centers full of low-cost commodity computers running
a custom Red Hat Linux in several locations around the world to respond to
search requests and to index the web. The server farms in the data centers
are built using a shared-nothing architecture. The indexing is performed by
a program named Googlebot, which periodically requests new copies of web
pages it already knows about. The more often a page updates, the more often
Googlebot will visit. The links in these pages are examined to discover new
pages to be added to its internal database of the web. This index database
and web page cache are several terabytes in size. Google has developed its
own file system, called Google File System, for storing all this data.
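The link-discovery step described above can be sketched in a few lines. This
is only an illustration using Python's standard library, not Googlebot's
actual code; the names LinkExtractor and discover_new_pages, and the
example.com URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def discover_new_pages(base_url, html, known_urls):
    """Return links found in `html` that are not yet in `known_urls`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    new = []
    for url in parser.links:
        if url not in known_urls and url not in new:
            new.append(url)
    return new


# Hypothetical example: one link is already known, one is new.
known = {"http://example.com/", "http://example.com/about"}
page = '<a href="/about">About</a> <a href="news.html">News</a>'
print(discover_new_pages("http://example.com/", page, known))
```

A real crawler would also need to fetch pages, respect robots.txt, and
schedule revisits by how often each page changes; none of that is shown here.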
See Google platform for the number of Google's servers and details of their
hardware and software.
Programming technology
Google uses its own approach to distributing the task of processing
collected data. Chunks from the Google File System, typically 64 MB each,
are processed by the MapReduce framework. This framework makes it possible
to apply the map and reduce concepts from functional programming languages
across the data stored in the GFS. First a function is mapped across the
collected data, then the result is reduced. For example, a function
extracting the hostname of the URL can be mapped across all pages; the
output is then sorted and reduced, yielding a count of how many times each
hostname has occurred. All mapping and reducing is massively parallelized
across the nodes and is fault tolerant, so if nodes crash or misbehave
during a MapReduce job, work is moved over to another machine.
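The hostname-counting example can be written as a single-machine sketch of
the map, sort, and reduce phases. This only illustrates the programming
model; it is not Google's MapReduce API, it does no parallelization or fault
handling, and all function names and URLs here are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_hostname(url):
    """Map step: emit a (hostname, 1) pair for one page URL."""
    # URLs without a hostname are grouped under the empty string.
    return (urlparse(url).hostname or "", 1)


def shuffle(pairs):
    """Sort the emitted pairs and group their values by key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce step: sum the values emitted for one hostname."""
    return (key, sum(values))


def count_hostnames(urls):
    mapped = [map_hostname(u) for u in urls]
    grouped = shuffle(mapped)
    return dict(reduce_count(k, v) for k, v in grouped.items())


# Hypothetical input: three page URLs on two hosts.
urls = ["http://a.example/x", "http://b.example/", "http://a.example/y"]
print(count_hostnames(urls))
```

In a distributed setting, each map call and each reduce call could run on a
different machine, because neither depends on shared state.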
Google uses an algorithm called PageRank to rank web pages that match a
given search string. The PageRank algorithm computes a recursive figure of
merit for web pages, based on the weighted sum of the PageRanks of the pages
linking to them. The PageRank thus derives from human-generated links, and
correlates well with human concepts of importance. Previous keyword-based
methods of ranking search results, used by many search engines that were
once more popular than Google, would rank pages by how often the search
terms occurred in the page, or how strongly associated the search terms were
within each resulting page. In addition to PageRank, Google also uses other
secret criteria for determining the ranking of pages on result lists.
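The recursive definition can be approximated by repeated iteration. The
sketch below is a textbook-style version, not Google's production ranking:
the damping factor of 0.85, the fixed iteration count, and the handling of
pages with no outgoing links are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c"; "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph "c" ends up ranked highest because two pages link to it, and
"a" outranks "b" because its only inbound link comes from the highly ranked
"c".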
Google not only indexes and caches HTML files but also 13 other file types
[8], which include PDF, Word documents, Excel spreadsheets and plain text
files. Except in the case of text files, the cached version is a conversion
to HTML, allowing those without the corresponding viewer application to read
the file.
Google may have difficulty indexing some websites, in particular those that
use frames, links embedded within JavaScript or Java, or complex URLs with
more than six variables in the query string. Google offers an explanation
of why some web pages have not been included [8].
Users can customize the search engine somewhat. They can set a default
language, use "SafeSearch" filtering technology (which is set to 'moderate'
by default), and set the number of results shown on each page. Google has
been criticized for placing long-term cookies on users' machines to store
these preferences, a tactic which also enables it to track a user's search
terms over time. For any query (of which only the first 32 keywords are
taken into account), up to the first 1000 results can be shown, with a
maximum of 100 displayed per page.
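The three limits just mentioned interact in a simple way, which the
following sketch spells out. The constants come from the text; the function
names, the whitespace-based keyword split, and the default of 10 results per
page are assumptions for illustration only.

```python
MAX_KEYWORDS = 32    # only the first 32 keywords are considered
MAX_RESULTS = 1000   # at most the first 1000 results can be shown
MAX_PER_PAGE = 100   # at most 100 results per page


def effective_keywords(query):
    """Return the keywords that would be taken into account."""
    return query.split()[:MAX_KEYWORDS]


def page_bounds(page, per_page=10):
    """Return (start, end) result positions for a 0-based page number.

    `end` is exclusive. Returns None when the page lies beyond the
    first 1000 results.
    """
    per_page = max(1, min(per_page, MAX_PER_PAGE))
    start = page * per_page
    if start >= MAX_RESULTS:
        return None
    return (start, min(start + per_page, MAX_RESULTS))


print(len(effective_keywords(" ".join(["word"] * 40))))
print(page_bounds(9, per_page=100))
print(page_bounds(10, per_page=100))
```

At 100 results per page, the tenth page (number 9) is therefore the last one
available.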
Despite the immense size of Google's index, a considerable amount of data
resides in databases that are accessible from websites by means of queries
but not by links. This so-called deep web is minimally covered by Google and
contains, for example, library catalogues, official legislative documents of
governments, and phone books.
As an April Fools' parody of PageRank, Google introduced an explanation of
something called "PigeonRank" [10].
Google optimization
Since Google is the most popular search engine, many webmasters have
become eager to influence their websites' Google rankings. An industry of
consultants has arisen to help websites raise their rankings on Google and
on other search engines. This field, called search engine optimization,
attempts to discern patterns in search engine listings, and then develop a
methodology for improving rankings.
![The webpage that shows the results of a search for Miserable failure](./modules/Google_Guide-MM/images/Google24.jpg)
The webpage that shows the results of a search for "Miserable failure". This
is an example of Google bombing.
One of Google's chief challenges is that as its algorithms and results
have gained the trust of web users, the profit to be gained by a commercial
web site in subverting those results has increased dramatically. Some search
engine optimization firms have attempted to inflate specific Google rankings
by various artifices, and thereby draw more searchers to their clients'
sites. Google has managed to weaken some of these attempts by reducing the
ranking of sites known to use them.
Search engine optimization encompasses both "on page" factors (like body
copy, title tags, H1 heading tags and image alt attributes) and "off page"
factors (like anchor text and PageRank). The general idea is to affect
Google's relevance algorithm by incorporating the keywords being targeted in
various places "on page," in particular the title tag and the body copy
(note: the higher up in the page, the better its keyword prominence and thus
the ranking). Too many occurrences of the keyword, however, cause the page
to look suspect to Google's spam checking algorithms.
One "off page" technique that works particularly well is Google bombing, in
which websites link to another site using a particular phrase in the anchor
text, in order to give the site a high ranking when that phrase is searched
for.
Google publishes a set of guidelines for website owners who would like to
raise their rankings with the help of legitimate optimization consultants
[11]. The New Zealand DMA offers a more comprehensive guide to SEO ethics
standards [12].
Main article: List of Google services and tools
Google offers a number of tools and services.
Some, such as Google's calculator, stock quotes, and weather results, are
integrated into what Google calls the "OneBox", meaning they appear in-line
with other search results [13]. The name is based on the ideal of all
information being available from the one search box.
Many of Google's other services are based on applying search technology to
other sources of data. Examples of this are Google Image Search, Google
News, and Google Video, as well as Froogle, its catalog searching service.
However, many of these services have become integrated as OneBox results and
now appear in normal search results as well as having their own pages [14].
Google also provides other services that are not directly related to
searching. These include its AdSense and AdWords targeted text advertising
services, Gmail, the Blogger web-logging service, and Google Web Portal, a
beta web service similar to My Yahoo.
Lastly, there are a number of tools written by Google to interact with its
search and services. As of February 2005, these have been written
exclusively for the Microsoft Windows operating system. Such tools include
Google Desktop, Google Deskbar, Google Toolbar (for IE and also as a Firefox
extension), Gmail Notifier, Google Earth and Google Talk.
Jargon
SEO
Search Engine Optimization
To google
to search for something using Google (also, to seek information on
someone by entering their full name or other information)
Googler
a person who uses Google's features very efficiently and mostly uses the
"I'm Feeling Lucky" button when searching; a fan of Google. "Googler" is
sometimes also used for "expert online searcher". Also, a full-time Google
employee.
Noogler
New Googler
Googlosophy
The science of Google
Googlenym, Googlonym, Memomark, Google URL
A mental bookmark expressed as Google search ("go to my site by entering
'John Doe Chicago' into Google"). A phrase or group of random key words for
which a Google search returns a corresponding page.
SERPs
Search Engine Result Pages
Nigritude ultramarine, SERPs, Seraphim Proudleduck, Mangeur de cigogne
SEO competitions
Blackhat SEO
search engine optimization using dirty tricks such as linkfarms, wiki or
guestbook spamming, and so on
Googledork
A person who accidentally exposes information to the web by placing it
into a location spidered by Google.
Whitehat SEO
search engine optimization using enhanced content, improved
accessibility and usability, unique page titles, non-JavaScript linking
methods, and so on
Google-proof
a search phrase that delivers exactly the intended result when searching
with Google
Sandbox Effect
The name given to the phenomenon in which Google filters (from its
results) websites created after March 2004.
Google bomb
An attempt to influence the ranking of a given site in results returned
by the Google search engine. Also known as Google wash.
Blue Red Yellow Blue Green Red
synonym of Google (from the colors of their logo)
Googlewhack
A search using two dictionary-valid (underlined by Google) words that
results in only one hit.
Games with Google
- In Googlewhack, you attempt to find two words that produce exactly one
search result.
- In Google Talk Game, Google searches are used to complete the beginning of
a sentence with words, leading to amusing or interesting results.
- In Googlefight, you pit two keywords against each other to find which one
has more results.
- In Guess The Google, you attempt to guess which search term resulted in
the displayed images.
Books
- Google Hacks from O'Reilly is a book containing tips about using Google
effectively, now in its second edition. ISBN 0596008570
- Google: The Missing Manual by Sarah Milstein and Rael Dornfest
(O'Reilly, 2004). ISBN 0596006136
- How to Do Everything with Google by Fritz Schneider, Nancy Blachman,
and Eric Fredricksen (McGraw-Hill Osborne Media, 2003). ISBN 0072231742
- Google Power by Chris Sherman (McGraw-Hill Osborne Media, 2005).
ISBN 0072257873
See also
References
External links
Google Guide made by MultiMedia | Free content and software
This guide is licensed under the GNU Free Documentation License. It uses
material from Wikipedia.