
LLM-Assisted Classification

When a statistical classifier is uncertain, you can escalate to an LLM for a definitive decision. The LLM result is optionally fed back as training data (active learning), so the statistical model improves over time.

The LLM is injected directly into BinaryClassifier and NaiveBayes via their constructors — no wrapper classes needed.


Spam filter with LLM fallback

use ByJG\TextClassifier\BinaryClassifier;
use ByJG\TextClassifier\ClassificationResult;
use ByJG\TextClassifier\ConfigBinaryClassifier;
use ByJG\TextClassifier\Degenerator\ConfigDegenerator;
use ByJG\TextClassifier\Degenerator\StandardDegenerator;
use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Llm\ConfigLlm;
use ByJG\TextClassifier\Llm\OpenAiLlmClient;
use ByJG\TextClassifier\Storage\Rdbms;
use ByJG\Util\Uri;

// 1. Build the OpenAI/Ollama client
$openai = OpenAI::client(getenv('OPENAI_API_KEY'));
$llm = new OpenAiLlmClient($openai, 'gpt-4o-mini');

// 2. Configure escalation thresholds (optional — these are the defaults)
$config = (new ConfigLlm())
    ->setLowerBound(0.35) // escalate when score >= 0.35 …
    ->setUpperBound(0.65) // … and score <= 0.65
    ->setAutoLearn(true); // feed LLM decision back as training data

// 3. Build the classifier with LLM injected
$lexer = new StandardLexer(new ConfigLexer());
$degenerator = new StandardDegenerator(new ConfigDegenerator());
$storage = new Rdbms(new Uri('sqlite:///tmp/spam.db'), $degenerator);
$storage->createDatabase();

$classifier = new BinaryClassifier(
    new ConfigBinaryClassifier(),
    $storage,
    $lexer,
    llm: $llm,
    configLlm: $config,
);

// 4. Classify — LLM escalation happens internally when the score is uncertain
$result = $classifier->classify('You have won a free prize! Click here now.');

if ($result instanceof ClassificationResult) {
    echo $result->choice, PHP_EOL; // 'spam' or 'ham'
    echo $result->score, PHP_EOL; // final score
    echo $result->escalated ? 'LLM was consulted' : 'stat model was sufficient', PHP_EOL;
}

When the statistical score lands in the uncertain zone [lowerBound, upperBound] (both bounds inclusive), the LLM is asked to decide. With autoLearn=true, the decision is fed back via learn() and the classifier re-scores the text with the updated model, so score in the result holds the re-scored value. statScores always holds the original statistical scores from before any LLM involvement.
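
The interval check described above can be sketched as a small predicate. This is only an illustration of the rule, not the library's internal code: the function name shouldEscalate is hypothetical, and its default arguments simply reuse the 0.35 and 0.65 bounds from the example.

```php
<?php
// Hypothetical helper, not part of the library: mirrors the rule
// "escalate when lowerBound <= score <= upperBound".
function shouldEscalate(float $score, float $lowerBound = 0.35, float $upperBound = 0.65): bool
{
    return $score >= $lowerBound && $score <= $upperBound;
}

var_dump(shouldEscalate(0.50)); // bool(true): inside the uncertain zone
var_dump(shouldEscalate(0.92)); // bool(false): the statistical model is confident
var_dump(shouldEscalate(0.35)); // bool(true): the bounds are inclusive
```

Narrowing the interval reduces LLM calls at the cost of trusting more borderline statistical scores; widening it does the opposite.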


Multi-class classifier with LLM fallback

use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Llm\ConfigLlm;
use ByJG\TextClassifier\Llm\OpenAiLlmClient;
use ByJG\TextClassifier\NaiveBayes\NaiveBayes;
use ByJG\TextClassifier\NaiveBayes\Storage\Memory;

// 1. Build the OpenAI/Ollama client
$openai = OpenAI::client(getenv('OPENAI_API_KEY'));
$llm = new OpenAiLlmClient($openai, 'gpt-4o-mini');

$config = (new ConfigLlm())
    ->setMinConfidence(0.65) // escalate when top score < 0.65
    ->setMinMargin(0.15) // escalate when top − second < 0.15
    ->setAutoLearn(true);

// 2. Build NaiveBayes with LLM injected
$storage = new Memory();
$nb = new NaiveBayes(
    $storage,
    new StandardLexer(new ConfigLexer()),
    llm: $llm,
    configLlm: $config,
);

// 3. Train some examples
$nb->train('PHP is a server-side programming language', 'tech');
$nb->train('The cat sat on the mat', 'animals');

// 4. Classify — LLM escalation happens internally when confidence is low
$result = $nb->classify('Python machine learning algorithms');

echo $result->choice, PHP_EOL; // 'tech'
echo $result->score, PHP_EOL; // final score after any LLM retraining
echo $result->escalated ? 'true' : 'false', PHP_EOL; // whether the LLM was consulted

Escalation triggers when any of the following holds:

  • classify() returns null (no categories trained yet)
  • The top score is below minConfidence
  • The gap between the top and second score is below minMargin
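
The three triggers above can be sketched as one predicate over the score map. This is a minimal illustration and not the library's implementation: needsEscalation is a hypothetical name, the default arguments reuse the 0.65 and 0.15 values from the example, and treating a missing runner-up as 0.0 is an assumption.

```php
<?php
/**
 * Hypothetical helper, not part of the library.
 *
 * @param array<string, float>|null $scores category => score, or null when nothing is trained
 */
function needsEscalation(?array $scores, float $minConfidence = 0.65, float $minMargin = 0.15): bool
{
    if ($scores === null || $scores === []) {
        return true; // nothing to rank: let the LLM decide
    }

    $ranked = array_values($scores);
    rsort($ranked); // highest score first

    $top = $ranked[0];
    $second = $ranked[1] ?? 0.0; // assumption: a single category has no runner-up

    return $top < $minConfidence || ($top - $second) < $minMargin;
}

var_dump(needsEscalation(null));                                // bool(true)
var_dump(needsEscalation(['tech' => 0.90, 'animals' => 0.10])); // bool(false)
var_dump(needsEscalation(['tech' => 0.70, 'animals' => 0.60])); // bool(true): margin is below 0.15
```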

Reading the ClassificationResult

Both classifiers return a ClassificationResult with the same fields:

| Field | Type | Description |
|---|---|---|
| choice | string | Final winning category |
| score | float | Final score of the winning category |
| scores | array<string, float> | All final scores, sorted descending |
| statScores | array<string, float> | Raw statistical scores before LLM escalation |
| llmDecision | string\|null | What the LLM chose, or null if not consulted |
| escalated | bool | true when the LLM was invoked |

$result = $nb->classify($text);
if ($result === null) {
    // no categories trained yet, so there is nothing to report
    return;
}

printf(
    "Choice: %s (%.0f%%) stat: %.0f%% LLM: %s\n",
    $result->choice,
    $result->score * 100,
    (array_values($result->statScores)[0] ?? 0.0) * 100, // 0 when no statistical score exists
    $result->escalated ? $result->llmDecision : '-',
);
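
One use of statScores and llmDecision together is checking whether the LLM disagreed with the statistical model. The sketch below works on plain values so it runs on its own. llmOverrodeStat is a hypothetical helper, not a library function, and it assumes statScores maps each category to its score as described in the table.

```php
<?php
/**
 * Hypothetical helper: true when the LLM picked a different label than
 * the one the statistical scores favoured.
 *
 * @param array<string, float> $statScores
 */
function llmOverrodeStat(array $statScores, ?string $llmDecision): bool
{
    if ($llmDecision === null || $statScores === []) {
        return false; // LLM not consulted, or nothing to compare against
    }

    arsort($statScores); // highest statistical score first, keys preserved
    $statChoice = array_key_first($statScores);

    return $statChoice !== $llmDecision;
}

var_dump(llmOverrodeStat(['spam' => 0.55, 'ham' => 0.45], 'ham'));  // bool(true)
var_dump(llmOverrodeStat(['spam' => 0.55, 'ham' => 0.45], 'spam')); // bool(false)
var_dump(llmOverrodeStat(['spam' => 0.55, 'ham' => 0.45], null));   // bool(false)
```

With a real result it would be called as llmOverrodeStat($result->statScores, $result->llmDecision). Tracking how often this returns true gives a rough sense of how much the statistical model still relies on the LLM.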

Bringing your own LLM

Implement LlmInterface to connect any LLM:

use ByJG\TextClassifier\Llm\LlmInterface;

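class MyCustomLlm implements LlmInterface
{
    public function classify(string $text, array $categories): string
    {
        // $categories is the full list of allowed labels.
        // Call your model here; this placeholder just picks the first label.
        $answer = $categories[0];

        // Return exactly one value from $categories.
        return $answer;
    }
}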

The single classify() method is used for both binary (categories = ['spam', 'ham']) and multi-class scenarios.
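
Because classify() must return exactly one value from $categories, a custom client usually needs to clean up the model's free-text reply first. The following is a minimal sketch of that step: normalizeLlmAnswer is a hypothetical helper, and throwing on an unrecognised reply is an assumption, since the source does not say how the classifiers handle errors from the LLM.

```php
<?php
/**
 * Hypothetical helper: map a raw model reply onto one of the allowed labels.
 *
 * @param string[] $categories allowed labels, as passed to classify()
 */
function normalizeLlmAnswer(string $raw, array $categories): string
{
    // Strip surrounding whitespace, quotes, backticks and full stops, then compare case-insensitively.
    $cleaned = strtolower(trim($raw, " \t\n\r\"'.`"));

    foreach ($categories as $category) {
        if (strtolower($category) === $cleaned) {
            return $category; // return the label exactly as the caller spelled it
        }
    }

    throw new UnexpectedValueException("LLM reply '$raw' is not one of: " . implode(', ', $categories));
}

echo normalizeLlmAnswer(" \"Spam\".\n", ['spam', 'ham']), PHP_EOL; // spam
```

Inside a custom classify() implementation, the raw reply from the model would be passed through a helper like this before being returned.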


See also