NaiveBayes Class

ByJG\TextClassifier\NaiveBayes\NaiveBayes is the multi-class Naive Bayes text classifier.

Constructor

new NaiveBayes(
    StorageInterface $storage,
    LexerInterface $lexer,
    ConfigNaiveBayes $config = new ConfigNaiveBayes(),
    ?LlmInterface $llm = null,
    ?ConfigLlm $configLlm = null,
)

| Parameter | Type | Description |
|---|---|---|
| $storage | ByJG\TextClassifier\NaiveBayes\Storage\StorageInterface | Persistence backend |
| $lexer | ByJG\TextClassifier\Lexer\LexerInterface | Tokeniser |
| $config | ConfigNaiveBayes | Smoothing parameters (optional) |
| $llm | ?LlmInterface | Optional LLM for low-confidence escalation |
| $configLlm | ?ConfigLlm | LLM escalation thresholds (defaults apply when null) |

Methods

train()

public function train(string $text, string $category): void

Trains the classifier with $text as an example of $category. The category is created automatically if it does not exist.

  • Increments the document count for $category by 1
  • Increments the count of each distinct token in $text by the number of times that token occurs
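
The bookkeeping described above can be sketched in plain PHP. This is a toy model, not the library's implementation: the toyTrain() function, the array layout, the per-category token counts, and the whitespace tokeniser are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Toy illustration of the counting that train() is described as doing.
 * NOT the library's code: the storage layout and the tokeniser are
 * assumptions made for this sketch.
 */
function toyTrain(array $counts, string $text, string $category): array
{
    // The category is created on first use; its document count goes up by 1.
    $counts['docs'][$category] = ($counts['docs'][$category] ?? 0) + 1;

    // Hypothetical tokeniser: lowercase, then split on whitespace.
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    // array_count_values() maps each distinct token to its occurrence count.
    foreach (array_count_values($tokens) as $token => $occurrences) {
        $counts['tokens'][$category][$token] =
            ($counts['tokens'][$category][$token] ?? 0) + $occurrences;
    }

    return $counts;
}

$counts = ['docs' => [], 'tokens' => []];
$counts = toyTrain($counts, 'php loves php', 'tech');

print_r($counts['docs']);           // tech => 1
print_r($counts['tokens']['tech']); // php => 2, loves => 1
```

Here 'php' appears twice in one document, so its count rises by 2 while the document count rises by only 1.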

untrain()

public function untrain(string $text, string $category): void

Reverses a previous train() call. Decrements document and token counts. If a category's document count reaches zero, it is removed from storage.
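
The reversal can be sketched the same way. The snippet below is a toy model under stated assumptions, not the library's code: toyUntrain(), the array layout, the whitespace tokeniser, the choice to drop tokens whose count reaches zero, and the choice to ignore unknown categories are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Toy inverse of the training bookkeeping. NOT the library's code:
 * the storage layout and the tokeniser are assumptions for this sketch.
 */
function toyUntrain(array $counts, string $text, string $category): array
{
    if (!isset($counts['docs'][$category])) {
        return $counts; // assumption: nothing to undo for an unknown category
    }

    // Hypothetical tokeniser: lowercase, then split on whitespace.
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    foreach (array_count_values($tokens) as $token => $occurrences) {
        $remaining = ($counts['tokens'][$category][$token] ?? 0) - $occurrences;
        if ($remaining > 0) {
            $counts['tokens'][$category][$token] = $remaining;
        } else {
            // Assumption: tokens that fall to zero are dropped entirely.
            unset($counts['tokens'][$category][$token]);
        }
    }

    // Remove the whole category once its last document is gone.
    if (--$counts['docs'][$category] <= 0) {
        unset($counts['docs'][$category], $counts['tokens'][$category]);
    }

    return $counts;
}

// State after training 'php loves php' and 'php rocks' as 'tech'.
$counts = [
    'docs'   => ['tech' => 2],
    'tokens' => ['tech' => ['php' => 3, 'loves' => 1, 'rocks' => 1]],
];

$afterFirst = toyUntrain($counts, 'php loves php', 'tech');
// docs: tech => 1; tokens: php => 1, rocks => 1

$afterSecond = toyUntrain($afterFirst, 'php rocks', 'tech');
// 'tech' is gone from both 'docs' and 'tokens'
```

Untraining is only a true inverse when it is called with the same text and category that were trained earlier; the sketch does not check that.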

getCategories()

public function getCategories(): array<string>

Returns the list of categories currently present in storage.

classify()

public function classify(string $text): ?ClassificationResult

Classifies $text and returns a ClassificationResult, or null when fewer than two categories have been trained.

| Return value | Meaning |
|---|---|
| ClassificationResult | Classification succeeded |
| null | No categories trained, or only one category exists (one-vs-rest requires ≥ 2) |

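$result = $nb->classify('machine learning algorithms');

// classify() may return null, so check before reading the fields
if ($result !== null) {
    echo $result->choice; // e.g. 'tech'
    echo $result->score;  // e.g. 0.91
}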

When an LLM is injected and autoLearn=true, the model is retrained on the LLM decision before returning. statScores always reflects the raw score before any LLM involvement.

ClassificationResult fields

| Field | Type | Description |
|---|---|---|
| choice | string | Winning category name |
| score | float | Final score of the winning category, between 0.0 and 1.0 |
| scores | array<string, float> | All final scores, sorted descending |
| statScores | array<string, float> | Raw statistical scores before any LLM escalation |
| llmDecision | ?string | LLM's label if consulted, otherwise null |
| escalated | bool | true when the LLM was invoked |
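
One way to consume these fields, including the null return, is sketched below. To keep the snippet runnable without the library, it uses a hypothetical stand-in class, ToyClassificationResult, that mirrors the documented field names; the real ClassificationResult may differ in constructor shape and namespace. The describeResult() helper and all scores are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Stand-in that mirrors the documented fields. NOT the library's class.
 */
final class ToyClassificationResult
{
    public function __construct(
        public string $choice,
        public float $score,
        public array $scores,
        public array $statScores,
        public ?string $llmDecision = null,
        public bool $escalated = false,
    ) {
    }
}

/** Summarise a result, handling the documented null return. */
function describeResult(?ToyClassificationResult $result): string
{
    if ($result === null) {
        return 'no classification (fewer than two trained categories)';
    }

    $line = sprintf('%s (%.2F)', $result->choice, $result->score);

    if ($result->escalated) {
        // statScores keeps the pre-LLM score, useful for comparison.
        $line .= sprintf(
            ', LLM said %s, statistical score was %.2F',
            $result->llmDecision ?? 'nothing',
            $result->statScores[$result->choice] ?? 0.0
        );
    }

    return $line;
}

// Illustrative values only.
$plain = new ToyClassificationResult(
    'tech',
    0.91,
    ['tech' => 0.91, 'animals' => 0.09],
    ['tech' => 0.91, 'animals' => 0.09]
);
$escalated = new ToyClassificationResult(
    'tech',
    0.80,
    ['tech' => 0.80, 'animals' => 0.20],
    ['tech' => 0.55, 'animals' => 0.45],
    'tech',
    true
);

echo describeResult($plain), PHP_EOL;     // tech (0.91)
echo describeResult($escalated), PHP_EOL; // tech (0.80), LLM said tech, statistical score was 0.55
echo describeResult(null), PHP_EOL;
```

With the real library, the same pattern would apply to the value returned by classify(): check for null first, then read escalated before relying on llmDecision.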

Usage example

use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\NaiveBayes\NaiveBayes;
use ByJG\TextClassifier\NaiveBayes\Storage\Memory;

$nb = new NaiveBayes(
    new Memory(),
    new StandardLexer(new ConfigLexer())
);

$nb->train('PHP is a programming language', 'tech');
$nb->train('The dog ran in the park', 'animals');

$result = $nb->classify('programming language');
echo $result->choice; // 'tech'
echo $result->score; // e.g. 0.87