Scoring Function Formulas

Introduction

Scoring functions are defined by mathematical formulas that combine data from the document, data from the query, and the textual relevance to assign a score to each document matching a query. The resulting scores determine the order of the results when searching the index.

You can modify these formulas in real time, and they can be as complex as you need them to be.

When writing a formula, keep in mind that:

  • Formulas must be well formed. A missing parenthesis, an unknown function or variable name, or an undefined operator results in a syntax error when editing the function.
  • Variable values and the resulting score are float numbers.
  • All expressions except conditions are float expressions. In any context where you can use a variable you can also use a function or a literal scalar value ("1", "0.5", "-5").

Operators

Formulas allow the following operators to work with expressions: +, -, *, /

All of these are binary operators. In addition, "-" can be used as a unary operator to negate an expression.

Variables

Scoring function formulas are computed for each document matching a given query. Hence there is a list of variables related to the document or the query that can be used in the formula:

Textual Relevance

Description: For each document matching a query, a textual relevance (how relevant the document's text is to the query) is calculated. You may or may not use this value in your formula, and you decide how much weight it carries in the final calculation (e.g.: if you want to sort your results just by creation time, you can leave this variable out of your scoring function).

Syntax: relevance

Short syntax: rel or r or R

Values: Relevance is a non-negative float number. It is normally positive, but because of precision issues it can be zero.

Sample:

-age * relevance
(sorts documents considering how new and
how relevant to the query the document is equally)
                            
Document's Age

Description: When indexed, every document is assigned a timestamp, an integer value which usually describes its creation time. The larger the value, the newer the document. The timestamp field can be provided when adding the document. Otherwise, IndexDen automatically assigns a value representing the number of seconds since Unix Epoch (00:00:00 UTC on 1 January 1970) until the moment the document was indexed.

When writing formulas you can use the document's age, which is the result of subtracting the document's timestamp from the number of seconds since Unix Epoch until the moment the query was executed. When using UNIX time for the documents' timestamps, this variable represents the age of the document in seconds.

Syntax: doc.age

Short syntax: age or a or A

Values: Since there are no restrictions on the timestamps provided for documents, age can be negative (for example, when a timestamp is later than the query time).

Sample:

-age * relevance
(sorts documents considering how new and
how relevant to the query the document is equally)
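The age computation described above can be sketched in Python. This is an illustration only, not IndexDen's implementation; the function name document_age and the sample timestamps are hypothetical.

```python
import time


def document_age(doc_timestamp, query_time=None):
    """Return the age: query time minus the document's timestamp."""
    if query_time is None:
        # Seconds since the Unix Epoch at the moment of the query.
        query_time = int(time.time())
    return query_time - doc_timestamp


# A document indexed one day before the query is 86400 seconds old.
print(document_age(1_700_000_000, query_time=1_700_086_400))  # 86400

# A timestamp later than the query time yields a negative age.
print(document_age(1_700_000_100, query_time=1_700_000_000))  # -100
```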
                            
Document's Variables

Description: When a document is indexed, it is possible to assign numeric (float) variables to it. These variables may represent rapidly changing numeric values that have some implication on the document's possible valuation in a sorting function (number of positive and negative votes, number of comments, user generated score, review score, number of visits, etc.). Once a document is indexed, its variables can be changed any time it is necessary with no cost other than the one related to the HTTP communication.

The variables are identified by an integer number from zero to the variables' limit minus one. The maximum number of variables available for each document depends on the package of the account.

Syntax: doc.var[n] (where n is the variable's integer identifier)

Short syntax: d[n] or D[n]

Values: Any float entered when the document was indexed or afterwards. Note that in the sample below the variable is passed to log: for negative values, log returns NaN (not a number), and for a zero value it returns negative infinity.

Sample:

log(doc.var[0]) - age/86400
(sorts documents considering natural
logarithm of the variable #0 of the document minus its age in days)
                            
Query's Variables

Description: When performing a search in the index, it is possible to pass float variables along with the query (check the searching documentation). These variables can be later used in the scoring function's formula.

Syntax: query.var[n] (where n is the variable's integer identifier)

Short syntax: q[n] or Q[n]

Values: Any float passed as a query variable.

Sample:

miles(query.var[0], query.var[1], doc.var[0], doc.var[1])
(sorts documents considering the distance between doc and a point passed in the query)
                            

Available functions

There is a set of mathematical functions available for writing formulas:

Natural logarithm

Description: Calculates the natural logarithm of an expression. The logarithm function is useful when you need to consider the order of magnitude of an expression instead of its actual value (for example, it is comparable to considering the number of digits of the value). Mathematically speaking, it is the inverse of the exponential function.

Syntax: log(val)

Arguments:
val: a float expression to which the logarithm is applied. For negative values, NaN (not a number) will be returned.

Sample:

log(doc.var[0]) - age/86400
(sorts documents considering natural logarithm
of the variable #0 of the document minus its age in days)
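As a rough illustration of the edge cases documented on this page (NaN for negative values, negative infinity for zero), the sample formula can be emulated in Python. This is a sketch under assumptions, not the engine's code: formula_log and sample_score are hypothetical names, and Python's own math.log raises an error for non-positive input, so those cases are handled explicitly.

```python
import math


def formula_log(val):
    """Natural logarithm with the documented edge cases."""
    if math.isnan(val) or val < 0:
        return math.nan    # negative input: NaN
    if val == 0:
        return -math.inf   # zero input: negative infinity
    return math.log(val)


def sample_score(doc_var0, age_seconds):
    """Evaluate log(doc.var[0]) - age/86400 (age converted to days)."""
    return formula_log(doc_var0) - age_seconds / 86400


# 1000 votes on a two-day-old document: log(1000) minus 2 days.
print(sample_score(1000.0, 2 * 86400))
```

Because the logarithm grows slowly, a tenfold increase in the variable adds only about 2.3 to the score, roughly the penalty of 2.3 days of age in this formula.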
                            
Power

Description: Raises a given float expression to a given power (integer). The power function can be used to create exponential functions or to weight different factors (make one more important than another in a product).

Syntax: pow(base, exponent)

Arguments:
base: a float expression, the base.
exponent: an integer expression, the exponent. Zero and negative values can be used. Float expressions (variables, function results) can also be used, but they are truncated toward zero (for example, 2.7 becomes 2 and -2.7 becomes -2).


Sample:

pow(doc.var[0], 3) * doc.var[1]
(sorts documents considering variable #0 three times as important as variable #1)
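The truncation of the exponent can be sketched in Python as follows. The name formula_pow is hypothetical and the code only mirrors the behavior described above; it is not the engine's implementation.

```python
import math


def formula_pow(base, exponent):
    """Raise base to exponent after truncating the exponent toward zero."""
    return base ** math.trunc(exponent)


print(formula_pow(2.0, 3))     # 8.0
print(formula_pow(2.0, 3.9))   # 8.0, since 3.9 is truncated to 3
print(formula_pow(2.0, -1.5))  # 0.5, since -1.5 is truncated to -1
print(formula_pow(5.0, 0))     # 1.0
```

The source does not say what pow returns for a zero base with a negative exponent; in this Python sketch that case raises ZeroDivisionError.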
                            
Max

Description: Returns the greater of two values.

Syntax: max(a, b)

Arguments:
a: a float expression.
b: a float expression.

Sample:

max(doc.var[0], doc.var[1])
(sorts documents considering variable #0 or variable #1 whichever is greater)
                            
Min

Description: Returns the smaller of two values.

Syntax: min(a, b)

Arguments:
a: a float expression.
b: a float expression.

Sample:

min(doc.var[0], doc.var[1])
(sorts documents considering variable #0 or variable #1 whichever is smaller)
                            
Absolute

Description: Returns the absolute value of a float expression. For positive values, the argument is returned. For negative values, the negation of the value is returned.

Syntax: abs(value)

Arguments:
value: a float expression, zero, positive, or negative.

Sample:

abs(doc.var[0])
(sorts documents considering variable #0 equally when its value is 1 or -1)
                            
Square root

Description: Calculates the square root of a float expression. This function is a variant of the power function that covers one case of a non-integer exponent (1/2).

Syntax: sqrt(value)

Arguments:
value: a float expression. For negative values, NaN (not a number) will be returned.

Sample:

sqrt(doc.var[0])
(sorts documents considering the square root of variable #0)
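A minimal Python sketch of the documented behavior follows. The name formula_sqrt is hypothetical; Python's math.sqrt raises an error for negative input, so the NaN case is handled explicitly.

```python
import math


def formula_sqrt(value):
    """Square root that returns NaN for negative input, as documented."""
    if math.isnan(value) or value < 0:
        return math.nan
    return math.sqrt(value)


print(formula_sqrt(9.0))   # 3.0
print(formula_sqrt(-4.0))  # nan

# Equivalent to raising to the non-integer exponent 1/2.
print(math.isclose(formula_sqrt(2.0), 2.0 ** 0.5))  # True
```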
                            
If clause

Description: Evaluates a condition and returns the corresponding expression. This function takes three arguments: a boolean condition, the expression to evaluate when the condition is met and the expression to consider when it is not.

The expressions are regular float expressions (a variable, the result of a function, the result of an operation, a literal).

The boolean condition is expressed by comparing two float expressions (no boolean operations allowed) with one of these comparators:

  • a == b: true when both expressions are equal.
  • a <= b: true when expression a is smaller than or equal to expression b.
  • a >= b: true when expression a is greater than or equal to expression b.
  • a < b: true when expression a is smaller than expression b.
  • a > b: true when expression a is greater than expression b.
  • a != b: true when expression a and b are not equal.

Syntax: if(cond, true, false)

Arguments:
cond: the boolean condition comparing two expressions.
true: the expression to evaluate when cond is met.
false: the expression to evaluate when cond is not met.

Sample:

if(doc.var[0] < 1, doc.var[0], rel)
(sorts documents considering variable #0
while its value is less than 1, otherwise considering textual relevance)
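The if clause and its six comparators can be emulated in Python as shown below. The names COMPARATORS, formula_if and sample_score are hypothetical, and the sketch makes no claim about how the engine evaluates the two branches.

```python
import operator

# The six comparators the if clause accepts, mapped to Python operators.
COMPARATORS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "!=": operator.ne,
}


def formula_if(a, comparator, b, when_true, when_false):
    """Return when_true if `a comparator b` holds, otherwise when_false."""
    return when_true if COMPARATORS[comparator](a, b) else when_false


def sample_score(doc_var0, relevance):
    """Evaluate if(doc.var[0] < 1, doc.var[0], rel)."""
    return formula_if(doc_var0, "<", 1, doc_var0, relevance)


print(sample_score(0.5, 3.2))  # 0.5, variable #0 is below 1
print(sample_score(1.0, 3.2))  # 3.2, falls back to relevance
```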
                            
Kms/miles calculator

Description: Calculates the distance between two geographical points expressed as latitude/longitude coordinates. The distance can be expressed in kilometers or in miles.

Syntax: km(lat1, long1, lat2, long2) or miles(lat1, long1, lat2, long2)

Arguments:
lat1: latitude of point 1.
long1: longitude of point 1.
lat2: latitude of point 2.
long2: longitude of point 2.

All coordinates are float values expressed in degrees (non-integer values ARE considered).

Sample:

miles(query.var[0], query.var[1], doc.var[0], doc.var[1])
(sorts documents considering the distance between doc and a point passed in the query)
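The source does not state which distance formula km and miles use. As an illustration only, the sketch below assumes the haversine great-circle formula on a spherical Earth; the radius constant and the Python functions themselves are assumptions, not the engine's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant
KM_PER_MILE = 1.609344       # length of the international mile in km


def km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometers; coordinates in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(long2 - long1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def miles(lat1, long1, lat2, long2):
    """Same distance as km(), converted to miles."""
    return km(lat1, long1, lat2, long2) / KM_PER_MILE


# Query point in q[0], q[1]; document location in d[0], d[1].
query_vars = [40.7128, -74.0060]  # hypothetical query coordinates
doc_vars = [34.0522, -118.2437]   # hypothetical document coordinates
print(miles(query_vars[0], query_vars[1], doc_vars[0], doc_vars[1]))
```

Note that the raw distance grows as documents get farther away, so a formula that should rank nearby documents first would typically negate it, for example -miles(...).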
                            
