A new Solr multilingual indexing and search architecture that supports indexing and searching text in multiple languages at the same time in the same field.

Feature

Configuration

To use this plugin, configure the field and fieldType in schema.xml, and configure language detection in solrconfig.xml.

FieldType

In addition to all of Solr's default fieldType attributes, such as sortMissingLast and positionIncrementGap, MultiLangField supports several custom attributes.

The language-detection plugin supports the following language codes:

"af", "ar", "bg", "bn", "cs", "da", "de", "el", "en", "es", "et", "fa", "fi", "fr", "gu",
"he", "hi", "hr", "hu", "id", "it", "ja", "kn", "ko", "lt", "lv", "mk", "ml", "mr", "ne",
"nl", "no", "pa", "pl", "pt", "ro", "ru", "sk", "sl", "so", "sq", "sv", "sw", "ta", "te",
"th", "tl", "tr", "uk", "ur", "vi", "zh-cn", "zh-tw"

The following is an example fieldType definition for schema.xml:

<fieldType name="multi_lang"
           class="com.pleasecode.solr.schema.MultiLangField"
           sortMissingLast="true"
           removeDuplicates="true"
           defaultFieldType="text_general"
           fieldTypeMappings="en:text_en, zh-cn:text_ik, ja:text_ja"/>
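
The names used in defaultFieldType and fieldTypeMappings presumably have to refer to fieldTypes that are defined elsewhere in schema.xml. As a minimal sketch under that assumption, the fallback and English types could look like the following. The analyzer chains are illustrative choices built from standard Solr factories and are not prescribed by this plugin.

```xml
<!-- Assumed fallback type: a plain TextField with Solr's standard tokenizer. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed English type: the same chain with Porter stemming added. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

text_ik and text_ja would need analogous definitions with analyzers suited to Chinese and Japanese.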

Field

Set the type attribute of the field to the name of the MultiLangField fieldType.

For example:

<field name="content" type="multi_lang" indexed="true" stored="true" omitNorms="true"/>
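
The language identification example in this README lists both subject and content in multi-langid.fl, but only content is declared above. If subject should be handled the same way, a matching declaration might look like this. It is a sketch that simply mirrors the content field, and the attribute choices are assumptions.

```xml
<!-- Hypothetical second multilingual field, mirroring the content field. -->
<field name="subject" type="multi_lang" indexed="true" stored="true" omitNorms="true"/>
```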

Language Identification

The following is an example update processor chain and update request handler for solrconfig.xml:

<updateRequestProcessorChain name="multi-langid">
    <processor class="com.pleasecode.solr.langdetect.MultiLangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
            <str name="multi-langid">true</str>
            <str name="multi-langid.fl">subject,content</str>
            <str name="multi-langid.whitelist">zh-cn,ja,en</str>
            <str name="multi-langid.fallback">en</str>
            <str name="multi-langid.threshold">0.8</str>
            <str name="multi-langid.hidePreLangs">false</str>
        </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="invariants">
        <str name="update.chain">multi-langid</str>
    </lst>
</requestHandler>
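
With the configuration above in place, a document posted to /update in Solr's standard XML update format should pass through the multi-langid chain. The message below is a sketch for trying this out. It assumes the schema has a uniqueKey field named id and that subject and content are both multi_lang fields. The document ID and the sample text are made up for illustration.

```xml
<add>
  <doc>
    <!-- Assumed uniqueKey field; adjust to match your schema. -->
    <field name="id">doc-1</field>
    <field name="subject">Multilingual search</field>
    <!-- Mixed English, Japanese, and Chinese text in a single field. -->
    <field name="content">Solr is a search platform. Solr は検索プラットフォームです。Solr 是一个搜索平台。</field>
  </doc>
</add>
```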

License

This plugin is released under the MIT license.