Pular para o conteúdo principal

Storage: RDBMS (NaiveBayes)

ByJG\TextClassifier\NaiveBayes\Storage\Rdbms persists NaiveBayes training data to a relational database using direct SQL via byjg/anydataset-db. Supports SQLite, MySQL, and PostgreSQL.

Setup

SQLite

use ByJG\TextClassifier\NaiveBayes\Storage\Rdbms;
use ByJG\Util\Uri;

$storage = new Rdbms(new Uri('sqlite:///var/data/classifier.db'));

// Run once to create the schema
$storage->createDatabase();

MySQL

$storage = new Rdbms(new Uri('mysql://user:password@localhost/mydb'));
$storage->createDatabase();

PostgreSQL

$storage = new Rdbms(new Uri('pgsql://user:password@localhost/mydb'));
$storage->createDatabase();

Use with NaiveBayes

use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\NaiveBayes\NaiveBayes;

$nb = new NaiveBayes($storage, new StandardLexer(new ConfigLexer()));

$nb->train('Python is a programming language', 'tech');
$result = $nb->classify('machine learning');

createDatabase()

Runs byjg/migration against the db/ directory, applying the 00002.sql migration which creates:

CREATE TABLE nb_internals (
category VARCHAR(255) NOT NULL,
doc_count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (category)
);

CREATE TABLE nb_wordlist (
token VARCHAR(255) NOT NULL,
category VARCHAR(255) NOT NULL,
count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (token, category)
);

Note: createDatabase() also creates the tc_wordlist table (migration 00001) because the migration runs all scripts in order. This is harmless — the BinaryClassifier spam filter tables are simply unused by NaiveBayes.

Sharing a database with BinaryClassifier

If you use both the spam filter and the multi-class classifier in the same application, they can share the same database. Both engines use different tables (tc_wordlist vs nb_wordlist/nb_internals) and do not interfere.

$uri = new Uri('sqlite:///var/data/shared.db');

// Run createDatabase() from either Rdbms implementation — both apply the full migration set
$storage = new \ByJG\TextClassifier\NaiveBayes\Storage\Rdbms($uri);
$storage->createDatabase();

$spamStorage = new \ByJG\TextClassifier\Storage\Rdbms($uri, $degenerator);
// No createDatabase() needed — already applied above