26 February 2012

If you need to buy a product online but do not know which website offers it, what do you do? Like most people, you turn to a search engine, and within moments the relevant web pages are in front of you. That is the work a search engine does. It is not confined to web pages either: it also searches images, videos and much more across the World Wide Web.


Search Engines


There are billions of web pages on the World Wide Web. Many of them hold useful information, but browsing through them one by one to find the page you need is a daunting and impractical task. A search engine makes it easy. Search engines are the first stop for smart internet users looking for specific information or a website. Over the last decade, as the number of web pages has grown rapidly, search engines have become an inseparable part of the World Wide Web.
If you are familiar with the internet and websites, you have almost certainly used a search engine. A search engine can be defined as “a sophisticated piece of software that is accessed through a web browser, accepts a search query from its users, and returns a list of web pages related to that query.” Its algorithms do all this in a fraction of a second, but the process is much more complex than it seems.
How Do Search Engines Work?
A search engine relies on a number of processes to produce the desired results. These can be broadly divided into three tasks:
1. Web Crawling
Before a search engine can display a web page in its results, it has to know what that page contains. How does it find out? Web crawling is the process a search engine uses to gather information about the web pages on the internet. The program that does this is known as a spider or crawler.
Crawlers continuously visit web pages so that their content can be listed in the search engine's database. As there are billions of web pages, a search engine generally runs many crawlers in parallel, and they keep visiting the World Wide Web day and night. On each web page, a crawler reads the text and stores it in a database known as the search engine index. If it finds a hyperlink during crawling, it visits the linked page too. In this way crawlers travel from one page to another. In fact they never stop: they keep revisiting pages they have already crawled to look for updates.
When search engine crawlers visit an HTML page, they store the text of the page in the index, leaving out very frequently used words such as a, an, the, is, on and at. Generally these two things are stored:
· The text (keywords) found on the page
· The link (URL) of the web page where each keyword was found
Crawlers give extra significance to keywords found in the title and meta tags of the web page.
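The crawling steps described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the three pages in `FAKE_WEB`, the short `STOP_WORDS` list and the function names are all made up for the example, and an in-memory dictionary stands in for fetching pages over the network.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical stop-word list; real engines use much longer ones.
STOP_WORDS = {"a", "an", "the", "is", "on", "at"}

# A tiny in-memory "web" standing in for real HTTP fetches (made-up data).
FAKE_WEB = {
    "page1": '<title>Dog care</title><p>The dog is on a mat.</p><a href="page2">next</a>',
    "page2": '<title>Cat care</title><p>A cat at home.</p>'
             '<a href="page1">back</a><a href="page3">more</a>',
    "page3": '<title>Ball games</title><p>The ball is red.</p>',
}


class PageParser(HTMLParser):
    """Collects the words and the outgoing links of one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember every hyperlink so the crawler can follow it later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep every word except the frequently used stop words.
        for word in data.lower().split():
            word = word.strip(".,!?;:")
            if word and word not in STOP_WORDS:
                self.words.append(word)


def crawl(start_url, fetch):
    """Visit a page, store its words, then follow its links (breadth first)."""
    store = {}                      # url -> words found on that page
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in store:
            continue                # already crawled
        html = fetch(url)
        if html is None:
            continue                # link points to a page that does not exist
        parser = PageParser()
        parser.feed(html)
        parser.close()
        store[url] = parser.words
        queue.extend(parser.links)
    return store


crawled = crawl("page1", FAKE_WEB.get)
print(sorted(crawled))
```

Starting from `page1`, the sketch reaches all three pages by following links. Unlike a real crawler, it stops once every reachable page has been visited instead of revisiting pages to look for updates.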

Search Engine Crawling and Indexing

2. Indexing
As crawlers work through web pages, the information they gather is stored in a repository known as the index. This process is generally referred to as indexing. The index is the database used to produce the search results. It stores keywords and the links of the web pages where each keyword was found. For example, suppose that 4 keywords were found across 5 web pages. In its simplest form, the index would look something like this:
Keyword    Web Pages
Apple      2, 4, 5
Ball       1, 4
Cat        3, 5
Dog        1, 3, 4
Where 1, 2, 3, 4, 5 are the URLs of web pages.
Now if a user searches for the term dog, web pages 1, 3 and 4 will be displayed on the search result page. Their order will depend on various factors, including the frequency of the keyword on each page.
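The keyword table above is what is usually called an inverted index. Here is a minimal sketch of building one and looking up a term. The page numbers stand in for URLs as in the table, and the names `PAGE_KEYWORDS`, `build_index` and `search` are invented for this example.

```python
from collections import defaultdict

# Keywords found on each page; the numbers stand in for URLs (example data).
PAGE_KEYWORDS = {
    1: ["ball", "dog"],
    2: ["apple"],
    3: ["cat", "dog"],
    4: ["apple", "ball", "dog"],
    5: ["apple", "cat"],
}


def build_index(page_keywords):
    """Turn page -> keywords into keyword -> pages (an inverted index)."""
    index = defaultdict(list)
    for page in sorted(page_keywords):
        for keyword in page_keywords[page]:
            index[keyword].append(page)
    return dict(index)


index = build_index(PAGE_KEYWORDS)


def search(term):
    """Return the pages that contain the term, or an empty list."""
    return index.get(term.lower(), [])


print(search("dog"))    # [1, 3, 4]
```

Looking up a keyword is then a single dictionary access, which is why a search engine can answer from its index instead of rereading every page.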
To give its users better results, a search engine stores more about each web page than just keywords and links. It can also store the frequency of each keyword, the location of the keyword on the page, links to the page from other pages that are associated with that keyword, and so on. Each time a web page is crawled, the index is refreshed.
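As a rough illustration of storing more than just "keyword and page", the sketch below also records where each keyword occurs, so its frequency is simply the number of recorded positions. The sample pages and the function name `build_positional_index` are assumptions made for this example.

```python
from collections import defaultdict


def build_positional_index(pages):
    """Map each keyword to {url: [word positions on that page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index


# Made-up sample pages.
pages = {
    "url1": "dog chases ball dog",
    "url2": "cat chases dog",
}

positional_index = build_positional_index(pages)

# Location of the keyword on each page.
print(positional_index["dog"])

# Frequency of the keyword on one page = number of stored positions.
print(len(positional_index["dog"]["url1"]))
```

Re-running `build_positional_index` after a page changes would correspond to the index being refreshed when that page is crawled again.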
3. Responding to a Query
When you search for a single keyword or a combination of keywords (called a key phrase), the index is looked up to find the web pages associated with that keyword or phrase. The web pages are then sorted according to their weightage. Every search engine applies different criteria to determine the weightage, which is why different search engines display different results for the same query. Search engines also let you build more complex queries using the following operators:
· AND: all the terms joined with AND must appear in the web page.
· QUOTATION MARKS: a search term in quotation marks is treated as a complete phrase, and only web pages that contain that exact phrase are displayed.
· OR: at least one of the terms joined with OR must appear in the web page.
· NOT: displays web pages that do not contain the term that follows NOT.
· FOLLOWED BY: the first term must be followed by the second one in the web page.
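Most of these operators map naturally onto set operations. The sketch below is one possible reading of them over five made-up pages. The function names are invented for the example, and FOLLOWED BY is assumed to mean "the second term appears somewhere after the first on the same page", which a real search engine may define differently.

```python
# Five made-up pages; the numbers stand in for URLs.
PAGES = {
    1: "the dog chased the ball",
    2: "an apple a day",
    3: "the cat saw the dog",
    4: "apple ball dog",
    5: "the cat ate an apple",
}


def tokens(text):
    return text.lower().split()


def pages_with(term):
    """Set of pages that contain the term."""
    return {page for page, text in PAGES.items() if term.lower() in tokens(text)}


def search_and(a, b):
    return pages_with(a) & pages_with(b)      # both terms present


def search_or(a, b):
    return pages_with(a) | pages_with(b)      # at least one term present


def search_not(a, b):
    return pages_with(a) - pages_with(b)      # a present, b absent


def search_phrase(phrase):
    """Pages containing the words of the phrase next to each other, in order."""
    want = tokens(phrase)
    if not want:
        return set()
    n = len(want)
    hits = set()
    for page, text in PAGES.items():
        words = tokens(text)
        if any(words[i:i + n] == want for i in range(len(words) - n + 1)):
            hits.add(page)
    return hits


def search_followed_by(first, second):
    """Pages where `second` occurs anywhere after the first `first` (assumed meaning)."""
    first, second = first.lower(), second.lower()
    hits = set()
    for page, text in PAGES.items():
        words = tokens(text)
        if first in words and second in words[words.index(first) + 1:]:
            hits.add(page)
    return hits


print(search_and("dog", "ball"))          # {1, 4}
print(search_not("dog", "ball"))          # {3}
print(search_phrase("the dog"))           # {1, 3}
print(search_followed_by("dog", "ball"))  # {1}
```

For simplicity the sketch scans the page text directly. A real engine would answer these queries from its index, using stored keyword positions for the phrase and FOLLOWED BY cases.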
Factors Affecting Search Results:
A search engine builds its search engine result pages (commonly called SERPs) from its index. Pages that carry a higher weightage for the searched term appear at a higher rank in the SERPs. Different search engines decide this weightage using different criteria; however, these are some common factors that most search engines use:
· Domain name: this is an important factor for search engines. If the searched key phrase matches the domain name, the page is ranked higher for that keyword or key phrase.
· Page title: web pages with the searched term in their page title are ranked higher.
· URL: the crawler looks for the term in the URL itself and, if it is found, the weightage of that web page increases.
· Meta description, meta keywords and content: whether the searched term appears in each of these.
· Backlinks: links to the web page that are found on other web pages.
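To make the idea of weightage concrete, here is a toy scoring function that combines the factors listed above. The weights in `WEIGHTS`, the `Page` fields and the sample pages are all invented for illustration. Real search engines do not publish their formulas, and theirs are far more involved.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical weights, chosen only to make the example readable.
WEIGHTS = {
    "domain": 5.0,
    "title": 4.0,
    "url": 3.0,
    "meta": 2.0,
    "content": 1.0,     # per occurrence of the term in the content
    "backlink": 0.5,    # per backlink
}


@dataclass
class Page:
    url: str
    title: str
    meta_description: str
    content: str
    backlinks: int


def score(page, term):
    """Add up the weight of every factor that matches the searched term."""
    term = term.lower()
    total = 0.0
    if term in urlparse(page.url).netloc.lower():
        total += WEIGHTS["domain"]
    if term in page.title.lower():
        total += WEIGHTS["title"]
    if term in page.url.lower():
        total += WEIGHTS["url"]
    if term in page.meta_description.lower():
        total += WEIGHTS["meta"]
    total += WEIGHTS["content"] * page.content.lower().split().count(term)
    total += WEIGHTS["backlink"] * page.backlinks
    return total


def rank(pages, term):
    """Order pages from highest to lowest score for the term."""
    return sorted(pages, key=lambda page: score(page, term), reverse=True)


sample_pages = [
    Page("http://news.example.com/", "News", "Daily news", "a dog story", 0),
    Page("http://pets.example.com/dog-care", "Pet tips", "General pet advice", "dog cat", 10),
    Page("http://dogs.example.com/", "All about dogs", "Dog breeds and care", "dog dog food", 2),
]

for page in rank(sample_pages, "dog"):
    print(page.url, score(page, "dog"))
```

With these made-up weights, the page whose domain, title, URL and description all contain the term comes first, even though another page has more backlinks. Changing the weights changes the order, which is one way to picture why different search engines show different results for the same query.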

Some Facts about Search Engines:
1. Search Engine Optimization is a large business in which professionals work to move web pages higher up the search engine result pages.
2. Search engines frequently change their algorithms in order to show quality sites at the top.
3. Google now also lets users search by image or by voice.
4. Approximately 93% of consumers use a search engine to find and access websites.
5. 85% of users do not look beyond the first page of results.
6. The first result gets an average of 50% of the traffic, the second gets 15-20%, and the tenth gets only 2-3%.