This repository was archived by the owner on Jun 13, 2025. It is now read-only.
Sivasurya Santhanam edited this page Jun 21, 2018 · 8 revisions

This project gives a general overview of how to configure the data import for KnowledgeFinder.

The first section describes how to configure it on your system. The remaining sections give a deeper understanding of how this configuration works.

Use this project

In data-conf.xml, replace /path/to/your/workspace in the baseDir attribute of the entity "file" and in the filePrefix attribute of the field "filePath" with your checkout location.

data-conf.xml

<dataSource type="FileDataSource" />
    <document>
        <!-- the same transformers need to be defined for all entities -->
        <!-- transformers are executed in the order in which they are listed.
            The order of the fields is not important -->
        <entity name="file" processor="FileListEntityProcessor"
            baseDir="/path/to/your/workspace/knowledgefinder-config-example/dataset/metadata"
            fileName=".*xml" rootEntity="false" datasource="null"
            transformer="de.dlr.knowledgefinderII.dataimport.utils.transformer.FilePathTransformer">
            <field column="filePath"
                filePrefix="/path/to/your/workspace/knowledgefinder-config-example/dataset/documents"
                fileSuffix=".pdf" oldFileSuffix=".xml" srcColName="file"/>
            <!-- import file content -->
            <entity name="metadataImport" processor="XPathEntityProcessor"
                forEach="/documents/document" url="${file.fileAbsolutePath}"
                ...

When using the webservice, you have to tell it where to find Solr: edit the file webservice.properties and set the correct values. If you have your own frontend theme, you can use its CSS classes by setting them in the file facetFilterConfig.json of the portlet configuration.
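For orientation, the filePath field in data-conf.xml above can be read as a simple path mapping: the metadata file name has its oldFileSuffix replaced by fileSuffix and is placed under filePrefix. The Python sketch below illustrates that reading. It is an assumption based on the attribute names, not the actual FilePathTransformer code; the function name related_document_path and the file name paper1.xml are hypothetical.

```python
from pathlib import PurePosixPath


def related_document_path(metadata_file: str, file_prefix: str,
                          old_suffix: str = ".xml",
                          new_suffix: str = ".pdf") -> str:
    """Guess where the document for a metadata file lives.

    Assumed behaviour only: swap old_suffix for new_suffix on the file
    name and place the result under file_prefix.
    """
    # Only the file name matters; any directory part is dropped.
    name = PurePosixPath(metadata_file).name
    if old_suffix and name.endswith(old_suffix):
        name = name[: -len(old_suffix)]
    return str(PurePosixPath(file_prefix) / (name + new_suffix))


# Hypothetical metadata file name, combined with the filePrefix from data-conf.xml.
prefix = "/path/to/your/workspace/knowledgefinder-config-example/dataset/documents"
print(related_document_path("paper1.xml", prefix))
```

If the computed path does not point at an existing PDF after an import, the filePrefix value is the first thing to check.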

Example Data

This part supplies 5 papers and their corresponding metadata in XML format.

Solr configuration

  • This part of the project provides an example Solr configuration that works with the example data and the transformers provided by the utils in knowledgefinder-dataimport.

  • The files data-conf.xml and schema.xml are the important ones for the data import.

  • The schema.xml file describes how Solr should store the data (see: Schema File).

  • The data-conf.xml file describes how the data is imported into the schema (see: Data Import Handler). The example configuration uses a FileDataSource. The EntityProcessors used are FileListEntityProcessor, XPathEntityProcessor and CustomTikaEntityProcessor.

  • The FileListEntityProcessor finds all XML files in the given directory and tells the other processors where to find them. It also invokes transformers to compute the ID of the entry and the path to the related PDF document.

  • The XPathEntityProcessor reads the metadata from the XML file and maps it to the schema. It invokes transformers to create a proper Java date object and to split comma-separated strings into lists.

  • Finally, the CustomTikaEntityProcessor passes the content of the PDF to Solr so that it can be parsed for full-text search. The list of file formats supported by Apache Tika can be found in the Tika documentation.
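The metadata step described above can be sketched in a few lines of Python. This only illustrates what the XPathEntityProcessor and its transformers achieve, not how Solr implements it. The element names title, date and keywords, the date format and the sample record are assumptions; only the /documents/document structure is taken from the forEach attribute in data-conf.xml.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical metadata file; the real element names may differ.
SAMPLE = """<documents>
  <document>
    <title>Example paper</title>
    <date>2018-06-21</date>
    <keywords>solr, tika, search</keywords>
  </document>
</documents>"""


def read_records(xml_text: str) -> list:
    """Return one dict per /documents/document element."""
    root = ET.fromstring(xml_text)
    if root.tag != "documents":
        return []
    records = []
    for doc in root.findall("document"):
        date_text = doc.findtext("date")
        keywords = doc.findtext("keywords", default="")
        records.append({
            "title": doc.findtext("title", default="").strip(),
            # Date string -> date object (assumed format YYYY-MM-DD).
            "date": datetime.strptime(date_text.strip(), "%Y-%m-%d")
                    if date_text else None,
            # Comma-separated string -> list of trimmed values.
            "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        })
    return records


for record in read_records(SAMPLE):
    print(record)
```

Running a metadata file through a sketch like this before importing can help spot dates or lists that will not parse as expected.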

The default field used by Solr is defined in the file solrconfig.xml. In this case, "full_text_search" is used as the default field (df) for searching.

solrconfig.xml

...
<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        ...
        <str name="df">full_text_search</str>
        ...
    </lst>
    ...
</requestHandler>

Thus, every attribute that should be found by a default search (e.g. description) has to be copied into this field:

schema.xml

...
<fields>
    ...
    <copyField source="description" dest="full_text_search" />
    ...
</fields>
...

Now the configuration has to be reloaded in order to regenerate the index.
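One way to do this is over HTTP. The sketch below only builds the two URLs involved: Solr's CoreAdmin RELOAD action, and a full-import command for the Data Import Handler, since reloading the configuration does not by itself re-run the import. The base URL and the core name example are assumptions taken from the sample webservice.properties, and the handler path dataimport is an assumption that depends on how the handler is registered in solrconfig.xml.

```python
from urllib.parse import urlencode


def reload_core_url(solr_base: str, core: str) -> str:
    """URL of the CoreAdmin RELOAD action for one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def full_import_url(solr_base: str, core: str,
                    handler: str = "dataimport") -> str:
    """URL that asks the Data Import Handler for a full import.

    The handler name is an assumption; use the name under which the
    DataImportHandler is registered in solrconfig.xml.
    """
    query = urlencode({"command": "full-import"})
    return f"{solr_base.rstrip('/')}/{core}/{handler}?{query}"


# Assumed values, matching the sample webservice.properties.
base = "http://localhost:8983/solr"
print(reload_core_url(base, "example"))
print(full_import_url(base, "example"))
```

The printed URLs can be opened in a browser or requested with any HTTP client while Solr is running.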

Webservice configuration

This project holds the information about where the Solr server is expected, plus some access control files.

The connection to Solr is configured in the file webservice.properties and may look like this:

/knowledgefinder-config-example/webservice-config/src/main/resources/webservice.properties

##
## Solr connection
##
solr.scheme=http
solr.host=localhost
solr.port=8983
# solr.username=username
# solr.password=password
solr.core=solr/example

The other configuration files define restrictions and default values for Solr queries. For more information, see Webservice - Configuration.
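To check the connection values in webservice.properties, it can help to see the address they add up to. The sketch below reads the properties and joins them as scheme://host:port/core. That join is an assumption about how the webservice uses the values, not taken from its code. The parser is deliberately simplified: it handles only key=value lines and comments, not the full Java properties syntax.

```python
SAMPLE = """
##
## Solr connection
##
    solr.scheme=http
    solr.host=localhost
    solr.port=8983
    # solr.username=username
    # solr.password =password
    solr.core=solr/example
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def solr_core_url(props: dict) -> str:
    """Join the connection settings into one URL (assumed layout)."""
    core = props["solr.core"].strip("/")
    return f"{props['solr.scheme']}://{props['solr.host']}:{props['solr.port']}/{core}"


print(solr_core_url(parse_properties(SAMPLE)))
```

If the resulting address does not answer in a browser, the webservice will most likely not reach Solr either.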

Portlet configuration

Here you can find the required files for the portlet configuration. No changes are needed.

The file server.properties defines where Liferay expects the webservice and which URLs should be called to access it.

knowledgefinder-config-example/portlet-config/src/main/resources/server.properties

host=http://localhost:8080
url=/api/jsonws/KnowledgeFinderWebservice.knowledgefinder/
urlDocuments=get-documents/
urlNodes=get-nodes/

The file resultListConfig.json defines which attributes are displayed in the result set, how many results are displayed per page and which sort orders are available. The file detailedViewConfig.json defines which attributes are displayed when clicking on "more information". The file facetFilterConfig.json describes the possible Solr facets; only extended facets will be displayed.
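Returning to server.properties above, the values presumably combine into two webservice endpoints. The sketch below shows that combination under the assumption that host, url and the endpoint-specific part are simply concatenated. The helper name endpoint is hypothetical, and the portlet's real logic may differ.

```python
# Values copied from the server.properties example above.
SERVER = {
    "host": "http://localhost:8080",
    "url": "/api/jsonws/KnowledgeFinderWebservice.knowledgefinder/",
    "urlDocuments": "get-documents/",
    "urlNodes": "get-nodes/",
}


def endpoint(props: dict, key: str) -> str:
    """Concatenate host, url and one endpoint path (assumed behaviour)."""
    base = props["host"].rstrip("/") + "/" + props["url"].strip("/") + "/"
    return base + props[key].lstrip("/")


print(endpoint(SERVER, "urlDocuments"))
print(endpoint(SERVER, "urlNodes"))
```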