Friday, January 12, 2018

How Search Engine Companies Depend on Data Mining (Basic Idea)

Introduction

The World Wide Web (WWW) offers many types of information, such as page links, web pages, accessible documents, images, videos, and many other kinds of content, so the volume of data grows continually. The WWW has added an abundance of data, and that information has become complex. With such a large and complex volume of information, it is not easy to find relevant information in a short time. Data mining addresses this problem: it is a process by which previously unknown information and patterns are extracted from large quantities of data. In this post I try to describe the basic idea of search engines and data mining.

Search engine:

Because of the large volume of data on the internet, it is difficult to find and extract the information you need. It has been said that if you spent only one minute per page, 10 hours a day, it would take about four and a half years to explore only 1 million web pages. So data mining is a real necessity. There are many search utilities, such as Google, Bing, Ask, AOL, WebCrawler, etc. Every search engine has a large database.
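The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the names are illustrative, and a 365-day year with no days off is assumed.

```python
# Rough check of the browsing-time estimate; all names here are illustrative.
PAGES = 1_000_000        # web pages to explore
MINUTES_PER_PAGE = 1     # time spent on each page
HOURS_PER_DAY = 10       # browsing hours per day
DAYS_PER_YEAR = 365      # assumption: browsing every day of the year


def years_to_browse(pages, minutes_per_page, hours_per_day):
    """Return how many years it takes to view `pages` pages at the given pace."""
    total_minutes = pages * minutes_per_page
    minutes_per_day = hours_per_day * 60
    days = total_minutes / minutes_per_day
    return days / DAYS_PER_YEAR


years = years_to_browse(PAGES, MINUTES_PER_PAGE, HOURS_PER_DAY)
print(f"{years:.2f} years")
```

One million minutes at 600 minutes per day is about 1,667 days, which is where the figure of roughly four and a half years comes from.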
A search engine database typically contains information such as:
1.      The title of the page
2.      The URL
3.      A short abstract of the content
4.      Keywords to help the search engine
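As a rough illustration, one entry of such a database could be modeled as a small record with these four fields. The class name `PageRecord`, the `matches` helper, and the sample URL are hypothetical and only show the shape of the data, not how any real search engine stores it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageRecord:
    """One entry in a hypothetical search engine database."""
    title: str
    url: str
    abstract: str
    keywords: List[str] = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Return True if `term` is in the title or equals a keyword (case-insensitive)."""
        term = term.lower()
        in_title = term in self.title.lower()
        in_keywords = any(term == keyword.lower() for keyword in self.keywords)
        return in_title or in_keywords


# Example record; the URL is a placeholder.
record = PageRecord(
    title="Introduction to Data Mining",
    url="https://example.com/data-mining-intro",
    abstract="A short overview of data mining concepts.",
    keywords=["data mining", "KDD", "search engine"],
)
print(record.matches("kdd"))
```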
Web sites are indexed, scored, and ranked by each search engine. Ranking algorithms work from web site usability and the search frequency of keywords.
For example: suppose that out of 15 users, 10 search for “Data mining” and the other 5 search for “Data mining and search engine”. The first 10 users are also interested in search-engine-related results. Here the frequency of “Data mining” related web pages increases. So the next time anyone types “Data Mining” to get results, the most-browsed web sites will be shown first.
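The example above can be sketched as a toy frequency count. The query log, the site names, and the functions `query_frequency` and `rank_sites` are all made up for illustration; real ranking algorithms use far more signals than this.

```python
from collections import Counter

# Hypothetical query log: (query text, site the user opened).
# 10 of the 15 users search "data mining", 5 search "data mining and search engine".
query_log = (
    [("data mining", "site-a.example")] * 6
    + [("data mining", "site-b.example")] * 4
    + [("data mining and search engine", "site-b.example")] * 3
    + [("data mining and search engine", "site-c.example")] * 2
)


def query_frequency(log):
    """Count how many times each query text was searched."""
    return Counter(query for query, _ in log)


def rank_sites(log, query):
    """Rank sites by how often they were opened for queries containing `query`."""
    query = query.lower()
    visits = Counter(site for q, site in log if query in q.lower())
    return [site for site, _ in visits.most_common()]


print(query_frequency(query_log))
print(rank_sites(query_log, "Data Mining"))
```

Here the most-browsed site for queries containing "data mining" comes first in the returned list, which mirrors the idea that frequently visited pages are shown first.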

Data Mining:

Data mining extracts relevant data for you from large databases through KDD (Knowledge Discovery in Databases).
KDD can be applied to many kinds of data sources:
1.      Database
2.      Relational database
3.      Structured database
4.      Unstructured database
5.      Flat file
6.      Transactional database
7.      Object Oriented database
8.      Data Warehouse
9.      Multimedia database
10.  Time series database
You can use association and cluster analysis in a search engine algorithm to extract the required results.
1.    Association Analysis:
Association analysis discovers patterns that describe strongly associated features in the data. For example: those who search for the text “data mining” would most likely also enjoy “data mining and search engine” related results.
2.    Cluster Analysis:
Cluster analysis seeks to find groups of closely related observations, so that observations belonging to the same cluster are more similar to each other than to observations in other clusters.
For example: search results for data mining and data science may be closely related.
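Both ideas can be sketched on toy data. In the sketch below, the sessions, the page keyword sets, the 0.5 threshold, and all function names are assumptions made for illustration. The association part computes the standard support and confidence of a rule; the clustering part is a deliberately simple greedy grouping by Jaccard similarity, not a standard algorithm such as k-means.

```python
# Association analysis: each session is the set of topics one user searched.
sessions = [
    {"data mining", "search engine"},
    {"data mining", "search engine"},
    {"data mining", "search engine"},
    {"data mining"},
    {"search engine"},
    {"cooking"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction also containing `consequent`."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)


# Cluster analysis: group pages whose keyword sets overlap strongly.
pages = {
    "data mining": {"data", "mining", "patterns", "analysis"},
    "data science": {"data", "science", "patterns", "analysis"},
    "cooking recipes": {"food", "recipes", "kitchen"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(items, threshold):
    """Put each item in the first cluster that has a member at least `threshold` similar."""
    clusters = []
    for name, keywords in items.items():
        for group in clusters:
            if any(jaccard(keywords, items[other]) >= threshold for other in group):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


print(support({"data mining", "search engine"}, sessions))
print(confidence({"data mining"}, {"search engine"}, sessions))
print(greedy_clusters(pages, threshold=0.5))
```

In this toy data, three of the four sessions that contain "data mining" also contain "search engine", so the rule has a confidence of 0.75, and the "data mining" and "data science" pages land in the same group because they share three of their five distinct keywords.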
