ConfigNaiveBayes

ByJG\TextClassifier\NaiveBayes\ConfigNaiveBayes controls the smoothing behaviour of the NaiveBayes classifier. Both parameters are set through the constructor.

Constructor

new ConfigNaiveBayes(float $robS = 1.0, float $robX = 0.5)

Parameters

| Parameter | Constructor arg | Default | Type | Description |
| --- | --- | --- | --- | --- |
| robS | First positional | 1.0 | float | Robinson smoothing weight. Controls how strongly rare tokens are pulled toward the neutral prior robX. |
| robX | Second positional | 0.5 | float | Neutral prior probability. The assumed score for a token/category pair with no training data. |

Usage

use ByJG\TextClassifier\NaiveBayes\ConfigNaiveBayes;
use ByJG\TextClassifier\NaiveBayes\NaiveBayes;

// Default config
$nb = new NaiveBayes($storage, $lexer);

// Custom smoothing
$nb = new NaiveBayes($storage, $lexer, new ConfigNaiveBayes(robS: 2.0, robX: 0.5));

Tuning guidance

robS

Higher values make the classifier more conservative: tokens with few observations are pulled strongly toward robX (0.5 by default) and have less influence on the final score. Lower values give rare tokens more influence, which can cause instability with small training sets.

| robS value | Behaviour |
| --- | --- |
| 0.1 to 0.5 | Aggressive: rare tokens matter more |
| 1.0 (default) | Balanced |
| 2.0 to 5.0 | Conservative: rare tokens matter less |
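
The effect of robS can be illustrated with Robinson's smoothing formula, f = (robS * robX + n * p) / (robS + n), where p is the raw token probability and n is the number of observations. The sketch below assumes the classifier applies this standard form; the helper robinsonSmooth and its arguments are hypothetical and not part of the library's API.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of the library: standard Robinson smoothing.
// $raw is the unsmoothed probability of a token for one category,
// $n is how many times the token was observed in training.
function robinsonSmooth(float $raw, int $n, float $robS = 1.0, float $robX = 0.5): float
{
    // Weighted average of the prior (weight robS) and the data (weight n).
    // Assumes robS + n > 0.
    return ($robS * $robX + $n * $raw) / ($robS + $n);
}

// A token seen once with a raw probability of 0.9:
foreach ([0.1, 1.0, 5.0] as $robS) {
    printf("robS=%.1f -> %.3f\n", $robS, robinsonSmooth(0.9, 1, $robS));
}
// robS=0.1 -> 0.864
// robS=1.0 -> 0.700
// robS=5.0 -> 0.567
```

With a single observation, a small robS leaves the score close to the raw 0.9, while a large robS keeps it near the prior of 0.5.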

robX

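The neutral prior. A value of 0.5 means "no information": a token never seen in any category contributes nothing to any category's score. Changing this biases the classifier: values above 0.5 treat unseen tokens as weak evidence for a category, and values below 0.5 treat them as weak evidence against it.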
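
As a rough illustration of how robX behaves, the sketch below assumes the standard Robinson form (robS * robX + n * p) / (robS + n); smoothedScore is a hypothetical helper, not a library function. Under that assumption, a token with zero observations scores exactly robX, and the prior's pull fades as observations accumulate.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (assumed standard Robinson smoothing, not library code).
function smoothedScore(float $raw, int $n, float $robS, float $robX): float
{
    // Assumes robS + n > 0.
    return ($robS * $robX + $n * $raw) / ($robS + $n);
}

// Unseen token (n = 0): the raw value is irrelevant, the score is robX.
printf("%.2f\n", smoothedScore(0.0, 0, 1.0, 0.5)); // 0.50
printf("%.2f\n", smoothedScore(0.0, 0, 1.0, 0.7)); // 0.70

// Token seen 20 times with raw probability 0.2: robX barely matters.
printf("%.3f\n", smoothedScore(0.2, 20, 1.0, 0.5)); // 0.214
printf("%.3f\n", smoothedScore(0.2, 20, 1.0, 0.7)); // 0.224
```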

Getters

| Method | Returns |
| --- | --- |
| getRobS() | float |
| getRobX() | float |

Comparison with ConfigBinaryClassifier

ConfigNaiveBayes uses a different robS default than ConfigBinaryClassifier because the algorithms differ; the robX default is the same in both:

| Parameter | ConfigBinaryClassifier default | ConfigNaiveBayes default |
| --- | --- | --- |
| robS | 0.3 | 1.0 |
| robX | 0.5 | 0.5 |

NaiveBayes uses a higher robS because the one-vs-rest approach with multiple categories is more sensitive to sparse data than the binary Fisher test used by BinaryClassifier.