Indexing using Data Import Handler

The following steps help you configure Data Import Handler for indexing databases:

  1. Create the data configuration file for the type of import that you need:

    • For full import: Create the following data-config.xml to define the mapping of database table columns to Solr fields:

      <dataConfig>
          <!-- The JDBC URL must belong to the same database as the driver class -->
          <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="user"
                      password="user"/>
          <document name="products">
              <entity name="item" query="select * from item">
                  <field column="ID" name="id"/>
                  <field column="NAME" name="name"/>
              </entity>
          </document>
      </dataConfig>
    • For delta import: Create the following data-config.xml:

      <dataConfig>
          <!-- Replace the driver, URL, and credentials with those of your database -->
          <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql:/temp/example/ex"
                      user="user" password="password"/>
          <document name="rrr">
              <!-- deltaQuery finds the changed rows; deltaImportQuery reloads each of them -->
              <entity name="item" pk="ID"
                      query="select ID,NAME,PRICE,WEIGHT,last_modified from item"
                      deltaImportQuery="select ID,NAME,PRICE,WEIGHT,last_modified from item where ID='${dataimporter.delta.id}'"
                      deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
                  <field column="ID" name="uid"/>
                  <field column="NAME" name="name_t"/>
                  <field column="PRICE" name="price_f"/>
                  <field column="WEIGHT" name="weight_d"/>
                  <entity name="feature" pk="ITEM_ID"
                          query="select description as features from feature where item_id='${item.ID}'">
                      <field name="features_t" column="features"/>
                  </entity>
                  <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
                          query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
                      <entity name="category" pk="ID"
                              query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
                          <field column="cat" name="cat_t"/>
                      </entity>
                  </entity>
              </entity>
          </document>
      </dataConfig>

    For details about these attributes, see "Schema for the data config" in the section "Configuration in data-config.xml" at http://wiki.apache.org/solr/DataImportHandler.

    • Ensure that last_modified is a column in the table that you index and that it holds a time stamp for each row.

    • Without this column, delta import fails.

    • The time stamp of the latest import is written to the dataimport.properties file in the collection location. Delta import compares last_modified against this value, which is available in queries as ${dataimporter.last_index_time}.

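  2. Save the file in the conf directory of the collection that you created. The file name must match the value of the config parameter in solrconfig.xml (data-config.xml in the section shown in step 3).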

  3. In solrconfig.xml (in the same conf directory), uncomment the following section:

    <!-- 
      <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> 
        <lst name="defaults"> 
          <str name="config">data-config.xml</str> 
        </lst> 
      </requestHandler> 
      -->

    This enables the Data Import Handler.

  4. Reload the collection.

  5. Run cfindex with one of the following actions: fullImport, deltaImport, status, or abort.
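
    The cfindex actions in step 5 might be used as in the following minimal sketch. It assumes a collection named products (a hypothetical name) that is configured as described in the preceding steps. It also assumes that the status attribute of cfindex receives a structure with the result of the action. Check the cfindex reference for your ColdFusion version before relying on either assumption.

    ```cfml
    <!--- Hypothetical collection name: replace "products" with your own collection. --->
    <!--- Index every row returned by the entity queries in the data configuration file. --->
    <cfindex action="fullImport" collection="products" status="importResult">

    <!--- Later, index only the rows that deltaQuery reports as changed. --->
    <cfindex action="deltaImport" collection="products" status="importResult">

    <!--- Check on an import; the status attribute is assumed to return a structure. --->
    <cfindex action="status" collection="products" status="importResult">
    <cfdump var="#importResult#">
    ```

    A full import is typically run once to build the index. Later delta imports pick up the rows whose last_modified value is later than the time stamp recorded by the previous import.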