How To Create a Simple Search Engine

According to Wikipedia, a web search engine is a software system designed to search for information on the World Wide Web. This article covers how to create a simple search engine that works, in principle, like Google.

What You Need To Create A Simple Search Engine

To create a search engine, you need two main parts:

  1. Web crawler, to collect information on the internet.
  2. Search platform, to search the collected data, in this case web pages.

To set up a web crawler, you can use Apache Nutch. You can check my previous article on creating a web crawler using Apache Nutch. Once the web crawler is working, you need a search platform to index and display the information crawled by Apache Nutch. The search platform that we will use is Apache Solr.

Apache Solr is a search platform built on top of Apache Lucene. It is a very powerful search platform because it provides full-text search, dynamic clustering, database integration, rich document handling, and much more.

How To Install Apache Solr

Follow these steps to install Apache Solr:

1. Download Apache Solr from Apache's website.

2. Extract the downloaded file using the following commands:

These commands extract all of Apache Solr's files into the destination folder.
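As a rough sketch, the extraction step might look like the following. The archive name `solr-x.y.z.tgz` and the install folder `$HOME/solr` are assumed placeholders, not values from this article; substitute the file you actually downloaded and the folder you prefer.

```shell
# Placeholders (assumptions): replace with the archive you downloaded
# and the folder where you want Solr to live.
SOLR_ARCHIVE="${SOLR_ARCHIVE:-solr-x.y.z.tgz}"
SOLR_INSTALL_DIR="${SOLR_INSTALL_DIR:-$HOME/solr}"

# Create the destination folder if it does not exist yet.
mkdir -p "$SOLR_INSTALL_DIR"

# Unpack the gzip-compressed tarball into the destination folder.
if [ -f "$SOLR_ARCHIVE" ]; then
  tar -xzf "$SOLR_ARCHIVE" -C "$SOLR_INSTALL_DIR"
else
  echo "Archive $SOLR_ARCHIVE not found; download it first." >&2
fi
```

The guard around `tar` only avoids a confusing error when the archive is missing; it is not required.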

3. Open your ~/.bashrc file (for example, by typing gedit ~/.bashrc in a terminal) and add the following configuration to it:

This creates an environment variable called SOLR_HOME, which is required for Apache Solr to run.
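A minimal sketch of such a line is shown here. The path is an assumption: it points at the `solr` folder inside the example directory of a distribution extracted under `$HOME/solr`, with `solr-x.y.z` standing in for your actual version folder.

```shell
# Assumed location (hypothetical path): the "solr" folder inside the
# example directory of the extracted distribution. Adjust to match
# your own install path.
export SOLR_HOME="$HOME/solr/solr-x.y.z/example/solr"
```

After saving the file, run `source ~/.bashrc` or open a new terminal so the variable takes effect.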

4. Test your Apache Solr installation by navigating to the example directory of Apache Solr and typing the following command to start Apache Solr:
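In older Solr releases that ship an example directory containing `start.jar` (such as the 4.x line), the server is typically started with `java -jar start.jar`. The sketch below assumes that layout; the install path is a hypothetical placeholder.

```shell
# Assumed path to the example directory of the extracted distribution.
SOLR_EXAMPLE_DIR="${SOLR_EXAMPLE_DIR:-$HOME/solr/solr-x.y.z/example}"

# start.jar launches the Jetty server bundled with the Solr example.
if [ -f "$SOLR_EXAMPLE_DIR/start.jar" ]; then
  cd "$SOLR_EXAMPLE_DIR" && java -jar start.jar
else
  echo "start.jar not found in $SOLR_EXAMPLE_DIR" >&2
fi
```

Started this way, the server runs in the foreground; press Ctrl+C to stop it.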

If Solr starts correctly, it prints its startup messages to the terminal.

5. Verify that Apache Solr is running by browsing to the following URL:
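A Solr instance started from the example directory normally listens on port 8983, which matches the http://localhost:8983/solr/ address used in the crawl command in this article. Assuming that address, the same check can also be sketched from the terminal with curl:

```shell
# Assumed address: Solr's default port on the local machine.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/}"

# --fail makes curl return a non-zero status on HTTP errors;
# --max-time stops the check from hanging if nothing is listening.
if curl --silent --fail --location --max-time 5 --output /dev/null "$SOLR_URL"; then
  SOLR_STATUS="up"
else
  SOLR_STATUS="down"
fi
echo "Solr at $SOLR_URL is $SOLR_STATUS"
```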

The page should show that Apache Solr is running.

6. At this point, both Apache Nutch and Apache Solr are installed correctly. The next step is to integrate Apache Solr with Apache Nutch.

Integrate Apache Solr with Apache Nutch

Integration is required so that the URLs crawled by Apache Nutch are indexed by Apache Solr. Once Apache Nutch is done crawling, the information is indexed by Apache Solr. To integrate Apache Solr with Apache Nutch, follow these steps:

1. Copy the schema.xml file from the Apache Nutch conf directory (Apache Nutch directory/conf) into the conf directory of Apache Solr.

2. Enter the following command to copy schema.xml:
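The copy might be sketched as follows. Both paths are assumptions: `NUTCH_DIR` is a hypothetical Nutch install folder, and the Solr path assumes a 4.x-style layout in which the default core's configuration lives in `example/solr/collection1/conf`. Other Solr versions may use a different folder.

```shell
# Assumed placeholders: adjust both paths to your own installation.
NUTCH_DIR="${NUTCH_DIR:-$HOME/apache-nutch}"
SOLR_CONF_DIR="${SOLR_CONF_DIR:-$HOME/solr/solr-x.y.z/example/solr/collection1/conf}"

# Copy Nutch's schema over the one Solr ships with, so that Solr knows
# the fields Nutch sends when indexing.
if [ -f "$NUTCH_DIR/conf/schema.xml" ] && [ -d "$SOLR_CONF_DIR" ]; then
  cp "$NUTCH_DIR/conf/schema.xml" "$SOLR_CONF_DIR/"
else
  echo "Check NUTCH_DIR and SOLR_CONF_DIR; schema.xml was not copied." >&2
fi
```

Because this overwrites Solr's own schema.xml, you may want to keep a copy of the original file first.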

3. Navigate to the example directory and type the following command to restart Apache Solr:

4. Now you can start Apache Nutch by using these commands:

cd <Apache Nutch’s directory>/runtime
bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2

You now have a simple search engine. Apache Nutch provides many parameters that you can adjust according to your requirements.
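Once a crawl has finished, one way to confirm that pages reached the index is to query Solr directly. The sketch below assumes Solr is reachable at http://localhost:8983/solr/ and that the default core answers at the `select` path; depending on your Solr version you may need to include a core name in the URL.

```shell
# Assumed address of the Solr select handler on the local machine.
SOLR_SELECT_URL="${SOLR_SELECT_URL:-http://localhost:8983/solr/select}"

# q=*:* matches every indexed document, rows limits the result count,
# and wt=json asks for a JSON response.
if ! curl --silent --fail --max-time 5 --get \
     --data-urlencode "q=*:*" \
     --data-urlencode "rows=5" \
     --data-urlencode "wt=json" \
     "$SOLR_SELECT_URL"; then
  echo "Could not query $SOLR_SELECT_URL; is Solr running?" >&2
fi
```

Replacing `*:*` with a search term returns only the documents that match it.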
