Danish analyzer for Lucene.Net

I've been looking for a Danish analyzer for Lucene.Net to use with Sitecore ContentSearch.

The original Java version of Lucene has one, but it hasn't been ported to Lucene.Net, the .NET port of Lucene, yet.

Actually, that's not completely true. The Danish analyzer seems to have been ported in October 2016, but there hasn't been a new release of Lucene.Net since version 3.0.3 in October 2012.

So, I decided to "create" one myself. "Create" because it's basically just combining a tokenizer, filters and the (already ported) Danish stemmer.
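To give an idea of what "combining" means here, below is a minimal sketch of such an analyzer against the Lucene.Net 3.0.3 API. It is not the code from the NuGet package: the class name SketchDanishAnalyzer, the exact filter order and the tiny stop word list are my assumptions for illustration, and the SnowballFilter and DanishStemmer types are assumed to come from the Lucene.Net contrib Snowball assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Snowball;
using Lucene.Net.Analysis.Standard;
using SF.Snowball.Ext;
using Version = Lucene.Net.Util.Version;

// Hypothetical class name; not the analyzer shipped in the NuGet package.
public class SketchDanishAnalyzer : Analyzer
{
    // Tiny illustrative stop word list (an assumption); a real analyzer would use a fuller one.
    private static readonly ISet<string> StopWords =
        new HashSet<string> { "og", "i", "at", "det", "en", "den", "til", "er" };

    private readonly Version _version;

    public SketchDanishAnalyzer(Version version)
    {
        _version = version;
    }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // 1. Split the text into tokens.
        TokenStream stream = new StandardTokenizer(_version, reader);

        // 2. Normalize the tokens and lowercase them.
        stream = new StandardFilter(stream);
        stream = new LowerCaseFilter(stream);

        // 3. Drop common Danish words that carry little meaning.
        stream = new StopFilter(
            StopFilter.GetEnablePositionIncrementsVersionDefault(_version),
            stream,
            StopWords);

        // 4. Reduce each remaining token to its stem with the already ported Danish stemmer.
        stream = new SnowballFilter(stream, new DanishStemmer());

        return stream;
    }
}
```

The constructor takes a Lucene version so that the tokenizer and stop filter can behave consistently with the rest of the index; something like new SketchDanishAnalyzer(Version.LUCENE_30) would be the typical way to create it.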

You can see the source code on GitHub where you can also report issues or request new features.

There are two NuGet packages:

Sitecore ContentSearch

The NuGet package meant for Sitecore also contains the config file below, which adds the Danish analyzer to the culture execution contexts for the Danish culture da.

With this, the Danish analyzer is used when indexing items in the language da, producing better terms that should lead to better search results.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>

          <analyzer>
            <param>
              <map>

                <!-- Add Danish analyzer for 'da' culture -->
                <mapEntry patch:before="*[1]" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzerMapEntry, Sitecore.ContentSearch.LuceneProvider">
                  <param hint="executionContext" type="Sitecore.ContentSearch.CultureExecutionContext, Sitecore.ContentSearch">
                    <param hint="cultureInfo" type="System.Globalization.CultureInfo, mscorlib">
                      <param hint="name">da</param>
                    </param>
                  </param>
                  <param hint="analyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
                    <param hint="defaultAnalyzer" type="Additio.Lucene.Analyzers.DanishAnalyzer, Additio.Lucene.Analyzers">
                      <param hint="version">Lucene_30</param>
                    </param>
                  </param>
                </mapEntry>

              </map>
            </param>
          </analyzer>

        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Remember to also use the CultureExecutionContext when searching, so your query gets analyzed the same way as the item's fields were when they were indexed.

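using System.Globalization;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

// Replace INDEX_NAME with the name of your index.
using (var context = ContentSearchManager.GetIndex("INDEX_NAME").CreateSearchContext())
{
    // Passing the Danish culture context makes the query use the Danish analyzer.
    var cultureExecutionContext = new CultureExecutionContext(new CultureInfo("da"));
    var query = context.GetQueryable<SearchResultItem>(cultureExecutionContext);

    // configure query and get results
}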