Classifying Text

classify() returns a ClassificationResult object, or null when it cannot produce a classification, for example when no categories have been trained yet (see Edge cases).

public function classify(string $text): ?ClassificationResult

Return value

$result = $nb->classify('programming language Python');

$result->choice; // 'tech'
$result->score; // 0.94
$result->scores; // ['tech' => 0.94, 'politics' => 0.51, 'animals' => 0.48]

ClassificationResult fields

Field | Type | Description
choice | string | Winning category name
score | float | Score of the winning category, 0.0 to 1.0 (final, after any LLM retraining)
scores | array<string, float> | All final category scores, sorted descending
statScores | array<string, float> | Raw statistical scores before any LLM escalation
llmDecision | string|null | LLM's label if it was consulted, otherwise null
escalated | bool | true when the LLM was invoked
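
To show how the escalation fields relate to each other, here is a minimal sketch of a helper that summarizes how a result was decided. describeDecision() is a hypothetical name, not part of the library, and it assumes the statistical pick is simply the highest entry in statScores.

```php
<?php
// Hypothetical helper, not part of the library.
// $result is any object exposing the fields listed above.
function describeDecision(object $result): string
{
    if (!$result->escalated) {
        return sprintf('%s (%.2f), statistical only', $result->choice, $result->score);
    }

    // Assumption: the statistical pick is the highest raw score.
    $statScores = $result->statScores;
    arsort($statScores);
    $statChoice = array_key_first($statScores);

    $verdict = $result->llmDecision === $statChoice
        ? 'LLM agreed with'
        : 'LLM overrode';

    return sprintf(
        '%s (%.2f), %s statistical pick "%s"',
        $result->choice,
        $result->score,
        $verdict,
        $statChoice
    );
}
```

Calling describeDecision($nb->classify($text)) requires a null check first, since classify() may return null.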

Getting the top category

$result = $nb->classify($text);
echo $result?->choice; // null-safe when no categories trained yet

Confidence threshold

Scores close to 0.5 mean the classifier is uncertain. A score near 1.0 means strong evidence for that category:

$result = $nb->classify($text);
if ($result === null) {
    echo "No categories trained yet";
} elseif ($result->score >= 0.8) {
    echo "Confident: {$result->choice}";
} elseif ($result->score >= 0.6) {
    echo "Likely: {$result->choice}";
} else {
    echo "Uncertain: consider retraining";
}

One-vs-rest scoring

Each category is scored independently using a one-vs-rest approach: the classifier asks "how likely is this text to belong to category X versus all other categories combined?" This means scores across categories do not sum to 1.0.
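
The toy scorer below illustrates why one-vs-rest scores need not sum to 1.0. It is only a sketch: the library's actual formula is not documented here, so the add-one smoothing, the averaging step, and the function name oneVsRestScores() are all assumptions made for illustration.

```php
<?php
// Toy one-vs-rest scorer for illustration only; the library's real formula may differ.
// $tokenCounts maps category => [token => count].
function oneVsRestScores(array $tokenCounts, array $tokens): array
{
    $scores = [];
    foreach (array_keys($tokenCounts) as $category) {
        $probabilities = [];
        foreach ($tokens as $token) {
            $inCategory = $tokenCounts[$category][$token] ?? 0;

            // Pool every other category into a single "rest" class.
            $inRest = 0;
            foreach ($tokenCounts as $other => $counts) {
                if ($other !== $category) {
                    $inRest += $counts[$token] ?? 0;
                }
            }

            // Add-one smoothing on both sides: unseen tokens land on exactly 0.5.
            $probabilities[] = ($inCategory + 1) / ($inCategory + $inRest + 2);
        }

        // Average the per-token evidence; an empty token list is treated as neutral.
        $scores[$category] = $probabilities
            ? array_sum($probabilities) / count($probabilities)
            : 0.5;
    }

    arsort($scores);
    return $scores;
}

$counts = [
    'tech'     => ['python' => 8, 'language' => 3],
    'politics' => ['election' => 6, 'language' => 2],
    'animals'  => ['python' => 1, 'snake' => 5],
];
$scores = oneVsRestScores($counts, ['python', 'language']);
// 'tech' ranks first, and array_sum($scores) is not 1.0.
```

Because each category is compared against its own pooled "rest" class, the scores are not forced to share a single probability budget.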

Categories with no overlap

If a token appears only in one category, it becomes a strong signal for that category. Tokens shared across many categories carry less discriminative weight.
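
As a rough illustration of discriminative weight, the sketch below compares a token seen in only one category with a token spread evenly across three. tokenSignal() is a hypothetical function using a simple smoothed ratio, not the library's actual weighting.

```php
<?php
// Illustrative only: a simple smoothed ratio, not the library's actual weighting.
// $countsByCategory maps category => how often one token was seen in that category.
// Returns a value from 0.0 (evidence against $category) to 1.0 (evidence for it).
function tokenSignal(array $countsByCategory, string $category): float
{
    $inCategory = $countsByCategory[$category] ?? 0;
    $inRest = array_sum($countsByCategory) - $inCategory;

    return ($inCategory + 1) / ($inCategory + $inRest + 2);
}

// Token seen only in 'tech': 10/11, far from the neutral 0.5.
$exclusive = tokenSignal(['tech' => 9, 'politics' => 0, 'animals' => 0], 'tech');

// Token seen equally everywhere: 4/11, much closer to 0.5.
$shared = tokenSignal(['tech' => 3, 'politics' => 3, 'animals' => 3], 'tech');
```

The distance from 0.5 is what matters here: the exclusive token moves the score far more than the shared one.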

Edge cases

Situation | Result
No trained categories | Returns null
Text with no known tokens | Category scores stay near 0.5; order is unpredictable
Only one category trained | Returns null (one-vs-rest requires at least 2 categories with docs)
Empty or non-string input | Returns null silently (lexer returns no tokens)
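
Since several of these cases return null, callers may want a single place that turns null into a fallback label. The wrapper below is a minimal sketch under that assumption; classifyOrDefault() and the 'unknown' default are hypothetical names, and it needs PHP 8.0 or later for the mixed type and the null-safe operator.

```php
<?php
// Hypothetical wrapper, not part of the library.
// $classify is any callable with the same contract as classify():
// it returns an object with a ->choice property, or null.
function classifyOrDefault(callable $classify, mixed $text, string $default = 'unknown'): string
{
    // Guard the input ourselves so empty or non-string values never reach the classifier.
    if (!is_string($text) || trim($text) === '') {
        return $default;
    }

    $result = $classify($text);

    // Covers the remaining null cases, such as too few trained categories.
    return $result?->choice ?? $default;
}
```

With a classifier instance in $nb, this could be called as classifyOrDefault([$nb, 'classify'], $input).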