Home
This project gives a general overview of how to configure the data import for KnowledgeFinder.
The first section describes how to configure it on your system. The other sections give you a deeper understanding of how this configuration works.
In the entity "file", replace baseDir="/path/to/your/workspace", and in the field "filePath", replace filePrefix="/path/to/your/workspace", with your checkout location.
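The replacement described above can also be scripted. The following is a minimal sketch, not part of the project: the function name `point_config_at_checkout` and the checkout directory `/home/alice/git` are hypothetical, and the sample text only mirrors the two attributes mentioned above.

```python
# Placeholder used in the example data-conf.xml.
PLACEHOLDER = "/path/to/your/workspace"


def point_config_at_checkout(config_text: str, checkout_dir: str) -> str:
    """Return config_text with every placeholder replaced by checkout_dir."""
    # Strip a trailing slash so the resulting paths do not contain "//".
    return config_text.replace(PLACEHOLDER, checkout_dir.rstrip("/"))


# Two attribute values as they appear in the example configuration.
sample = (
    'baseDir="/path/to/your/workspace/knowledgefinder-config-example/dataset/metadata"\n'
    'filePrefix="/path/to/your/workspace/knowledgefinder-config-example/dataset/documents"\n'
)

# "/home/alice/git/" is a hypothetical checkout location.
updated = point_config_at_checkout(sample, "/home/alice/git/")
print(updated)
```

To apply this to the real file, read data-conf.xml into a string, pass it through the function, and write the result back.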
data-conf.xml
<dataSource type="FileDataSource" />
<document>
<!-- the same transformers need to be defined for all entities -->
<!-- the transformers are executed in the order in which they are listed.
The order of the fields is not important -->
<entity name="file" processor="FileListEntityProcessor"
baseDir="/path/to/your/workspace/knowledgefinder-config-example/dataset/metadata"
fileName=".*xml" rootEntity="false" datasource="null"
transformer="de.dlr.knowledgefinderII.dataimport.utils.transformer.FilePathTransformer">
<field column="filePath"
filePrefix="/path/to/your/workspace/knowledgefinder-config-example/dataset/documents"
fileSuffix=".pdf" oldFileSuffix=".xml" srcColName="file"/>
<!-- import file content -->
<entity name="metadataImport" processor="XPathEntityProcessor"
forEach="/documents/document" url="${file.fileAbsolutePath}"
When using the webservice, you have to tell it where to find Solr: edit the file webservice.properties and set the correct values.
If you have your own frontend theme, you can use its CSS classes by setting them in the file facetFilterConfig.json of the portlet configuration.
This part supplies five papers and their corresponding metadata in XML format.
- This part of the project provides an example Solr configuration that works with the example data and the transformer provided by the utils in knowledgefinder-dataimport.
- The important files for the data import are data-conf.xml and schema.xml.
- The schema.xml file describes how Solr should store the data (see: Schema File).
- The data-conf.xml file describes how data is imported into the schema (see: Data Import Handler). The example configuration uses a FileDataSource. The EntityProcessors used are FileListEntityProcessor, XPathEntityProcessor and CustomTikaEntityProcessor.
- The FileListEntityProcessor finds all XML files in the given directory and tells the other processors where to find them. It also invokes transformers that compute the ID of the entry and the path to the related PDF document.
- The XPathEntityProcessor reads the metadata from the XML file and maps it to the schema. It invokes transformers to obtain a proper Java date object and to split comma-separated strings into lists.
- Finally, the CustomTikaEntityProcessor passes the content of the PDF to Solr so that it can be parsed for full-text search. The list of file formats supported by Apache Tika can be found in the Tika documentation.
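The first two steps of this chain can be imitated outside Solr to see what each one contributes. The sketch below is only an illustration under assumptions: the field names `title`, `date` and `keywords`, the date format, and the directory `/data/documents` are hypothetical, and the path rewriting is inferred from the attribute names filePrefix, fileSuffix and oldFileSuffix rather than taken from the FilePathTransformer source. The Tika text extraction step is left out.

```python
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path


def list_metadata_files(base_dir: Path) -> list[Path]:
    # Step 1 (what FileListEntityProcessor does): find the XML files in base_dir.
    return sorted(base_dir.glob("*.xml"))


def related_pdf_path(xml_file: Path, file_prefix: str) -> str:
    # Assumed transformer behaviour: swap ".xml" for ".pdf" and
    # prepend the documents directory.
    return f"{file_prefix}/{xml_file.stem}.pdf"


def read_documents(xml_file: Path) -> list[dict]:
    # Step 2 (what XPathEntityProcessor does): one record per /documents/document.
    root = ET.parse(xml_file).getroot()  # root element is <documents>
    records = []
    for doc in root.findall("document"):
        raw_keywords = doc.findtext("keywords", default="")
        records.append({
            "title": doc.findtext("title", default=""),
            # Convert the date string into a date object (format is assumed).
            "date": datetime.strptime(doc.findtext("date"), "%Y-%m-%d"),
            # Split a comma-separated string into a clean list.
            "keywords": [k.strip() for k in raw_keywords.split(",") if k.strip()],
        })
    return records


# Demo with a temporary metadata directory containing one hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    metadata_dir = Path(tmp)
    (metadata_dir / "paper1.xml").write_text(
        "<documents><document>"
        "<title>Example paper</title>"
        "<date>2015-03-01</date>"
        "<keywords>solr, tika ,search</keywords>"
        "</document></documents>",
        encoding="utf-8",
    )
    files = list_metadata_files(metadata_dir)
    pdf_path = related_pdf_path(files[0], "/data/documents")
    records = read_documents(files[0])

print(pdf_path)
print(records)
```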
The default field used by Solr is defined in the file solrconfig.xml. In this case, "full_text_search" is used as the default field (df) for searching:
solrconfig.xml
...
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
...
<str name="df">full_text_search</str>
...
</lst>
...
</requestHandler>
Thus we have to define a copy of the attribute (e.g. description) into this field:
schema.xml
...
<fields>
...
<copyField source="description" dest="full_text_search" />
...
</fields>
...
Now the configuration has to be reloaded in order to regenerate the index.
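As a rough sketch of that step, the snippet below only builds the URLs involved and does not send any request. It assumes a Solr instance at http://localhost:8983/solr with a core named example (taken from the sample webservice.properties on this page) and a Data Import Handler registered under /dataimport, which depends on your solrconfig.xml. Because copyField is applied when documents are indexed, the sketch also assumes that a full import is run after the reload.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr location
CORE = "example"                          # assumed core name


def core_reload_url() -> str:
    # CoreAdmin RELOAD makes Solr re-read schema.xml and solrconfig.xml.
    return f"{SOLR_BASE}/admin/cores?" + urlencode({"action": "RELOAD", "core": CORE})


def full_import_url() -> str:
    # Assumes the Data Import Handler is registered as /dataimport.
    return f"{SOLR_BASE}/{CORE}/dataimport?" + urlencode({"command": "full-import"})


def select_url(query: str) -> str:
    # A query without a field name is searched in the default field (df).
    return f"{SOLR_BASE}/{CORE}/select?" + urlencode({"q": query, "wt": "json"})


print(core_reload_url())
print(full_import_url())
print(select_url("climate"))
```

Opening the first two URLs in that order reloads the core and re-imports the data; the third shows a query that relies on the default field.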
This project holds the information about where the Solr server is expected, along with some access control files.
The connection to Solr is configured in the file webservice.properties and may look like this:
/knowledgefinder-config-example/webservice-config/src/main/resources/webservice.properties
##
## Solr connection
##
solr.scheme=http
solr.host=localhost
solr.port=8983
# solr.username=username
# solr.password=password
solr.core=solr/example
The other configuration files define restrictions and default values for Solr queries. For more information, see Webservice - Configuration.
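To check that the connection values fit together, the properties can be parsed and joined into a single URL. This is a minimal sketch: how the webservice itself assembles the address is an assumption, and `parse_properties` and `solr_base_url` are hypothetical helper names.

```python
SAMPLE = """\
##
## Solr connection
##
solr.scheme=http
solr.host=localhost
solr.port=8983
# solr.username=username
# solr.password=password
solr.core=solr/example
"""


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def solr_base_url(props: dict[str, str]) -> str:
    # Assumed composition: scheme://host:port/core
    return (
        f"{props['solr.scheme']}://{props['solr.host']}:"
        f"{props['solr.port']}/{props['solr.core']}"
    )


props = parse_properties(SAMPLE)
print(solr_base_url(props))
```

If the printed address opens your Solr core in a browser, the values are most likely correct.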
Here you can find the required files for the portlet configuration. No changes are needed.
The file server.properties defines where Liferay expects the webservice and which URLs should be called to access it.
knowledgefinder-config-example/portlet-config/src/main/resources/server.properties
host=http://localhost:8080
url=/api/jsonws/KnowledgeFinderWebservice.knowledgefinder/
urlDocuments=get-documents/
urlNodes=get-nodes/
The file resultListConfig.json defines which attributes are displayed in the result set, how many results are displayed per page, and which sort orders are available. The file detailedViewConfig.json defines which attributes are displayed when clicking on "more information". The file facetFilterConfig.json describes the possible Solr facets; only extended facets are displayed.
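For the server.properties values shown above, the sketch below illustrates how the entries might combine into complete endpoint addresses. Plain concatenation of host, url and the endpoint suffix is an assumption about the portlet, and `endpoint` is a hypothetical helper.

```python
# Values copied from the example server.properties.
SERVER_PROPERTIES = {
    "host": "http://localhost:8080",
    "url": "/api/jsonws/KnowledgeFinderWebservice.knowledgefinder/",
    "urlDocuments": "get-documents/",
    "urlNodes": "get-nodes/",
}


def endpoint(props: dict[str, str], key: str) -> str:
    # Assumed composition: host + webservice base path + endpoint suffix.
    return props["host"] + props["url"] + props[key]


documents_url = endpoint(SERVER_PROPERTIES, "urlDocuments")
nodes_url = endpoint(SERVER_PROPERTIES, "urlNodes")
print(documents_url)
print(nodes_url)
```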