Extra Queries and Filters

The plan is for this to include any extra queries, filters, native scripts, score functions, and anything else we end up creating to make search nice for Wikimedia. It contains several different plugins:

extra

The extra plugin contains utilities that are generally useful.

Queries:

  • source_regex - An nGram-accelerated regular expression filter that is generally much faster than sequentially checking all documents.
  • token_count_router - Simple query wrapper that evaluates some conditions based on the number of tokens of the input query.
  • simswitcher - Simple query wrapper that allows overriding similarity settings at query time (expert: use with caution).
  • term_freq - Simple term query with filtering based on term frequency.

Native Scripts:

  • super_detect_noop - Like detect_noop but supports configurable sloppiness. New in 1.5.0, 1.4.1, and 1.3.1.

Analysis:

  • preserve_original - A token filter that wraps a filter chain to keep and emit the original term at the same position. New in 2.3.4.
  • term_freq - A token filter to populate the term frequency from the input string. New in 5.5.2.6.

extra-analysis-homoglyph

Analysis:

  • homoglyph_norm - A token filter that will provide additional single-script tokens for multi-script tokens that contain homoglyphs.

extra-analysis-khmer

Analysis:

  • khmer_syll_reorder - A character filter that will replace deprecated Khmer characters and attempt to canonically reorder Khmer orthographic syllables.

extra-analysis-slovak

This plugin contains a Slovak stemmer.

Analysis:

  • slovak_stemmer - A token filter that provides stemming for the Slovak language. New in 5.5.2.4.

extra-analysis-textify

This plugin contains miscellaneous text mungers.

Analysis:

  • acronym_fixer - A character filter that removes periods from acronym-like contexts.
  • camelCase_splitter - A character filter that splits camelCase words.
  • icu_token_repair - A token filter that rejoins tokens split asunder by the ICU tokenizer.
  • limited_mapping - A character filter that is limited to changing or deleting single characters.

extra-analysis-turkish

Analysis:

extra-analysis-ukrainian

These filters are provided to allow for unpacking the monolithic Elasticsearch Ukrainian analyzer, which is a wrapper around the monolithic Lucene Ukrainian analyzer. This version of the Ukrainian stemmer uses a slightly newer version of the Morfologik Ukrainian stemming dictionary than the parallel version in Elastic/Lucene.

Analysis:

  • ukrainian_stop - A stopword token filter for Ukrainian.

  • ukrainian_stemmer - A token filter that provides stemming for the Ukrainian language.

Installation

Extra Queries and Filters Plugin    Elasticsearch
6.3.1.2, master branch              6.3.1
5.5.2.7                             5.5.2
5.5.2                               5.5.2
5.3.2                               5.3.2
5.2.2                               5.2.2
5.2.1                               5.2.1
5.2.0                               5.2.0
5.1.2                               5.1.2
2.4.1, 2.4 branch                   2.4.1
2.4.0                               2.4.0
2.3.5, 2.3 branch                   2.3.5
2.3.4                               2.3.4

Install it like so for Elasticsearch x.y.z:

<= 2.4.1

./bin/plugin --install org.wikimedia.search/extra/x.y.z

>= 5.1.2

./bin/elasticsearch-plugin install org.wikimedia.search:extra:x.y.z
./bin/elasticsearch-plugin install org.wikimedia.search:extra-analysis-slovak:x.y.z
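
The choice between the two command forms above can be sketched as a small shell helper. This is only an illustration: the function name extra_install_cmd is hypothetical, and it assumes the major version number alone decides which CLI applies (the text above only states "<= 2.4.1" and ">= 5.1.2", so anything in between is a guess). It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the install command for a given plugin version.
# Usage: extra_install_cmd VERSION [ARTIFACT]
# ARTIFACT defaults to "extra".
# Assumption: major version >= 5 means the newer elasticsearch-plugin CLI;
# anything lower means the older ./bin/plugin CLI.
extra_install_cmd() {
    version="$1"
    artifact="${2:-extra}"
    major="${version%%.*}"   # text before the first dot, e.g. "5" from "5.5.2.7"
    if [ "$major" -ge 5 ]; then
        echo "./bin/elasticsearch-plugin install org.wikimedia.search:$artifact:$version"
    else
        echo "./bin/plugin --install org.wikimedia.search/$artifact/$version"
    fi
}
```

For example, extra_install_cmd 5.5.2.7 extra-analysis-slovak prints the elasticsearch-plugin form with the Slovak plugin coordinates, which can then be reviewed and run from the Elasticsearch home directory.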

Build

Spotbugs is run during the verify phase of the build to find common issues. The build will break if any issue is found. The issues will be reported on the console.

To run just the check, use mvn spotbugs:check on a project that was already compiled (mvn compile). mvn spotbugs:gui will provide a graphical UI that might be easier to read.
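
The two steps above (compile first, then check) can be wrapped in a pair of shell functions. This is a minimal sketch: the function names and the MVN override variable are hypothetical conveniences, not part of the project; only the mvn goals themselves come from the text above.

```shell
#!/bin/sh
# Hypothetical wrappers around the goals named above.
# MVN can be overridden (e.g. to point at a Maven wrapper); defaults to "mvn".

# Compile the project, then run only the Spotbugs check.
spotbugs_check() {
    "${MVN:-mvn}" compile && "${MVN:-mvn}" spotbugs:check
}

# Open the graphical UI, which may be easier to read than console output.
spotbugs_gui() {
    "${MVN:-mvn}" spotbugs:gui
}
```

Neither function runs until it is called, so the file can be sourced safely from a shell profile or a project script.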

Like all tools, spotbugs is much dumber than you. If you find a false positive, you can ignore it with the @SuppressFBWarnings annotation. You can provide a justification to document why this rule should be ignored in this specific case. Some rules don't make sense for this project and they can be ignored via src/dev-tools/spotbugs-excludes.xml.