
Database Schema

The RDBMS storage backends for the BinaryClassifier spam filter and for NaiveBayes share one database, managed by byjg/migration. Migration scripts are located in db/migrations/.

Migration files

| File | Version | Description |
|------|---------|-------------|
| db/migrations/up/00001.sql | 1 | Creates tc_wordlist and seeds internal variables |
| db/migrations/up/00002.sql | 2 | Creates nb_internals and nb_wordlist |
| db/migrations/down/00001.sql | | Drops tc_wordlist |
| db/migrations/down/00002.sql | | Drops nb_wordlist and nb_internals |

Tables

tc_wordlist — BinaryClassifier spam filter tokens

CREATE TABLE tc_wordlist (
    token VARCHAR(255) NOT NULL,
    count_ham INTEGER DEFAULT NULL,
    count_spam INTEGER DEFAULT NULL,
    PRIMARY KEY (token)
);

| Column | Type | Description |
|--------|------|-------------|
| token | VARCHAR(255) | The word or URI fragment |
| count_ham | INTEGER | Times this token appeared in ham training texts |
| count_spam | INTEGER | Times this token appeared in spam training texts |

Internal rows (seeded by migration):

| token | count_ham | count_spam | Purpose |
|-------|-----------|------------|---------|
| tc*dbversion | 3 | NULL | Schema version check |
| tc*texts | 0 | 0 | Total ham / spam text counts |
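Because the internal rows live in the same table as ordinary tokens, a query over tc_wordlist has to tell the two apart. The following is a minimal sketch using plain PDO with an in-memory SQLite database, not the library's storage class. The token `offer` and its counts are made up for illustration, and filtering on the `tc*` prefix is an assumption based on the two rows listed above.

```php
<?php
// Illustration only: plain PDO + in-memory SQLite, not the library's Rdbms class.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE tc_wordlist (
    token VARCHAR(255) NOT NULL,
    count_ham INTEGER DEFAULT NULL,
    count_spam INTEGER DEFAULT NULL,
    PRIMARY KEY (token)
)');

// Internal rows, with the values shown in the table above.
$pdo->exec("INSERT INTO tc_wordlist VALUES ('tc*dbversion', 3, NULL)");
$pdo->exec("INSERT INTO tc_wordlist VALUES ('tc*texts', 0, 0)");
// Hypothetical trained token (made-up counts).
$pdo->exec("INSERT INTO tc_wordlist VALUES ('offer', 1, 4)");

// Read the ham / spam text totals from the internal row.
$stmt = $pdo->prepare('SELECT count_ham, count_spam FROM tc_wordlist WHERE token = ?');
$stmt->execute(['tc*texts']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$totals = ['ham' => (int) $row['count_ham'], 'spam' => (int) $row['count_spam']];

// List ordinary tokens only. In SQL LIKE, "*" is a literal character and "%" is the wildcard.
$tokens = $pdo
    ->query("SELECT token FROM tc_wordlist WHERE token NOT LIKE 'tc*%' ORDER BY token")
    ->fetchAll(PDO::FETCH_COLUMN);
```

Here `$totals` holds the seeded text counts and `$tokens` contains only the trained token, with both internal rows filtered out.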

nb_internals — NaiveBayes document counts

CREATE TABLE nb_internals (
    category VARCHAR(255) NOT NULL,
    doc_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (category)
);

| Column | Type | Description |
|--------|------|-------------|
| category | VARCHAR(255) | Category name (user-defined) |
| doc_count | INTEGER | Number of texts trained under this category |

A row is created the first time train() is called for a category, and it is deleted once every sample in that category has been untrained.
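The row lifecycle described above can be sketched with plain PDO and an in-memory SQLite database. This is not the library's implementation: `adjustDocCount()` and `docCount()` are hypothetical helpers, and the category name `sports` is made up. The sketch only shows one way a row could appear on the first increment and disappear when the count returns to zero.

```php
<?php
// Illustration only: hypothetical helpers, not the library's storage code.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE nb_internals (
    category VARCHAR(255) NOT NULL,
    doc_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (category)
)');

// Returns the stored document count, or null when the category has no row.
function docCount(PDO $pdo, string $category): ?int
{
    $stmt = $pdo->prepare('SELECT doc_count FROM nb_internals WHERE category = ?');
    $stmt->execute([$category]);
    $value = $stmt->fetchColumn();   // false when no row matches
    $stmt->closeCursor();
    return $value === false ? null : (int) $value;
}

// Adds $delta to a category: +1 models a train, -1 models an untrain.
// A real implementation would likely wrap this in a transaction.
function adjustDocCount(PDO $pdo, string $category, int $delta): void
{
    $current = docCount($pdo, $category);
    $new = ($current ?? 0) + $delta;

    if ($new <= 0) {
        // Nothing left in this category: remove the row entirely.
        $pdo->prepare('DELETE FROM nb_internals WHERE category = ?')->execute([$category]);
    } elseif ($current === null) {
        // First sample for this category: create the row.
        $pdo->prepare('INSERT INTO nb_internals (category, doc_count) VALUES (?, ?)')
            ->execute([$category, $new]);
    } else {
        $pdo->prepare('UPDATE nb_internals SET doc_count = ? WHERE category = ?')
            ->execute([$new, $category]);
    }
}

adjustDocCount($pdo, 'sports', 1);
adjustDocCount($pdo, 'sports', 1);
$afterTraining = docCount($pdo, 'sports');

adjustDocCount($pdo, 'sports', -1);
adjustDocCount($pdo, 'sports', -1);
$afterUntraining = docCount($pdo, 'sports');
```

After two increments the row holds a count of 2. After two matching decrements the row no longer exists, so `docCount()` returns null.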

nb_wordlist — NaiveBayes token counts

CREATE TABLE nb_wordlist (
    token VARCHAR(255) NOT NULL,
    category VARCHAR(255) NOT NULL,
    count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (token, category)
);

| Column | Type | Description |
|--------|------|-------------|
| token | VARCHAR(255) | The word token |
| category | VARCHAR(255) | Category name |
| count | INTEGER | Cumulative occurrence count of this token in this category |
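The composite primary key means the same token can appear once per category, but never twice within one category. A small sketch with plain PDO and in-memory SQLite shows the effect. The tokens, categories, and counts are made up, and this is not how the library itself reads or writes the table.

```php
<?php
// Illustration only: made-up data, plain PDO + in-memory SQLite.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE nb_wordlist (
    token VARCHAR(255) NOT NULL,
    category VARCHAR(255) NOT NULL,
    count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (token, category)
)');

$insert = $pdo->prepare('INSERT INTO nb_wordlist (token, category, count) VALUES (?, ?, ?)');
$rows = [
    ['goal', 'sports', 7],
    ['goal', 'business', 2],    // same token, different category: allowed
    ['market', 'business', 5],
];
foreach ($rows as $row) {
    $insert->execute($row);
}

// Fetch one token's counts as a category => count map.
$stmt = $pdo->prepare('SELECT category, count FROM nb_wordlist WHERE token = ? ORDER BY category');
$stmt->execute(['goal']);
$perCategory = array_map('intval', $stmt->fetchAll(PDO::FETCH_KEY_PAIR));

// A second row for an existing (token, category) pair violates the primary key.
$duplicateRejected = false;
try {
    $insert->execute(['goal', 'sports', 1]);
} catch (PDOException $e) {
    $duplicateRejected = true;
}
```

`$perCategory` maps each category to the count stored for `goal`, and the duplicate insert is rejected by the key, so updating an existing pair requires an UPDATE rather than a second INSERT.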

Running migrations

Migrations are applied automatically by createDatabase():

// BinaryClassifier spam filter
$storage = new \ByJG\TextClassifier\Storage\Rdbms($uri, $degenerator);
$storage->createDatabase();

// NaiveBayes
$storage = new \ByJG\TextClassifier\NaiveBayes\Storage\Rdbms($uri);
$storage->createDatabase();

Both methods call Migration::reset(), which:

  1. Drops the migration version table (if it exists)
  2. Runs all up scripts in numerical order
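The two steps above can be sketched in plain PHP to make the ordering concrete. This is not byjg/migration's code: the version table name `migration_version`, the temporary directory, and the two placeholder scripts are assumptions for illustration only.

```php
<?php
// Illustration only: a hand-rolled stand-in for the two steps, not byjg/migration itself.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Placeholder "up" scripts, written out of order on purpose.
$dir = sys_get_temp_dir() . '/tc_migrations_' . uniqid();
mkdir($dir);
file_put_contents("$dir/00002.sql", 'CREATE TABLE second_table (id INTEGER)');
file_put_contents("$dir/00001.sql", 'CREATE TABLE first_table (id INTEGER)');

// Step 1: drop the version table if it exists (table name is an assumption).
$pdo->exec('DROP TABLE IF EXISTS migration_version');

// Step 2: run every up script in numerical order of its file name.
$files = glob("$dir/*.sql");
usort($files, fn ($a, $b) => (int) basename($a, '.sql') <=> (int) basename($b, '.sql'));

$applied = [];
foreach ($files as $file) {
    $pdo->exec(file_get_contents($file));
    $applied[] = basename($file);
}

// Clean up the temporary scripts.
array_map('unlink', $files);
rmdir($dir);
```

Comparing the file names as integers keeps the order correct even if the zero padding is inconsistent, for example `2.sql` next to `00010.sql`.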

Warning: createDatabase() is destructive. Do not call it on a database that already contains training data.
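One way to avoid calling createDatabase() by accident is to check whether the schema is already present. The sketch below is an assumption, not part of the library: `tableIsReadable()` is a hypothetical helper that works on a separate PDO connection to the same database, and it treats any query failure as "table not usable". The `LIMIT 1` syntax works on SQLite, MySQL, and PostgreSQL but not on every database.

```php
<?php
// Hypothetical helper, not part of the library.
function tableIsReadable(PDO $pdo, string $table): bool
{
    // Table names cannot be bound as parameters, so only accept plain identifiers.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $table)) {
        throw new InvalidArgumentException("Unsafe table name: $table");
    }
    try {
        $pdo->query("SELECT 1 FROM {$table} LIMIT 1");
        return true;
    } catch (PDOException $e) {
        // Missing table, missing permission, or any other query failure.
        return false;
    }
}

// Demonstration against an in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$before = tableIsReadable($pdo, 'tc_wordlist');   // table not created yet
$pdo->exec('CREATE TABLE tc_wordlist (
    token VARCHAR(255) NOT NULL,
    count_ham INTEGER DEFAULT NULL,
    count_spam INTEGER DEFAULT NULL,
    PRIMARY KEY (token)
)');
$after = tableIsReadable($pdo, 'tc_wordlist');

// Intended use, with $storage built as in the examples above:
// if (!tableIsReadable($pdo, 'tc_wordlist')) {
//     $storage->createDatabase();
// }
```

The check reports false before the table exists and true afterwards, so the guarded call would run only on an empty database.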

Shared database

Both engines can coexist in the same database. Their tables are independent. Run createDatabase() once from either implementation — the migration applies all scripts regardless of which class calls it.