
Storage Backends

Both the BinaryClassifier spam filter and the NaiveBayes classifier use pluggable storage backends. This page compares the available options.

BinaryClassifier (spam filter) backends

| Backend | Class | Persistence | External dependency |
|---|---|---|---|
| RDBMS | ByJG\TextClassifier\Storage\Rdbms | Database | byjg/micro-orm (bundled) |
| GDBM | ByJG\TextClassifier\Storage\Dba | File | ext-dba PHP extension |

NaiveBayes backends

| Backend | Class | Persistence | External dependency |
|---|---|---|---|
| Memory | ByJG\TextClassifier\NaiveBayes\Storage\Memory | Optional (JSON file) | None |
| RDBMS | ByJG\TextClassifier\NaiveBayes\Storage\Rdbms | Database | byjg/anydataset-db (bundled) |

Feature comparison

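| Feature | BinaryClassifier Rdbms | BinaryClassifier Dba | NaiveBayes Memory | NaiveBayes Rdbms |
|---|---|---|---|---|
| Persistent by default | Yes | Yes | No (opt-in via save()) | Yes |
| Safe for multiple processes | Yes | No | No | Yes |
| External server required | Optional | No | No | Optional |
| SQLite support | Yes | N/A | N/A | Yes |
| MySQL support | Yes | N/A | N/A | Yes |
| PostgreSQL support | Yes | N/A | N/A | Yes |
| createDatabase() | Yes | Yes | Not needed | Yes |
| Schema migrations | Yes | No | No | Yes |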

Choosing a backend

Use ByJG\TextClassifier\Storage\Rdbms when:

  • You have an existing relational database
  • Training data needs to survive process restarts
  • Multiple processes or servers share the same filter
  • You want SQL-level inspection of token data

Use ByJG\TextClassifier\Storage\Dba when:

  • No database server is available
  • You want a simple, self-contained file
  • Only a single process accesses the filter

Use NaiveBayes\Storage\Memory when:

  • You are prototyping or testing
  • The model is trained once and then loaded in each process
  • You want low-overhead inference without database connections

Use NaiveBayes\Storage\Rdbms when:

  • The model is updated from multiple processes
  • You need durable, consistent storage
  • You want to share the database with BinaryClassifier

Implementing a custom storage backend

For BinaryClassifier

Implement ByJG\TextClassifier\Storage\StorageInterface. The key methods are:

public function storageOpen(): void;
public function storageClose(): void;
public function storageRetrieve(array|string $tokens): array; // returns Word[]
public function storagePut(Word $word): void;
public function storageUpdate(Word $word): void;
public function storageDel(string $token): void;

Extend ByJG\TextClassifier\Storage\Base to inherit the getInternals(), getTokens(), and processText() implementations.
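As a minimal sketch, assuming PHP 8.0+, the backend below keeps every token in a plain PHP array so you can see what each of the six methods is responsible for. To stay self-contained it neither extends Base nor implements the library's interface: `Word` here is a hypothetical stand-in (its `token`, `countHam`, and `countSpam` properties are assumptions, not the library's actual fields), `BinaryStorageSketch` is a local copy of the methods listed above, and `ArrayStorage` is an illustrative name. Keying the retrieved Words by token is also an assumption. A real backend would extend ByJG\TextClassifier\Storage\Base and use the library's own Word class.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's Word class; the real class may differ.
final class Word
{
    public function __construct(
        public string $token,
        public int $countHam = 0,
        public int $countSpam = 0,
    ) {}
}

// Local copy of the key methods listed above (the real interface may declare more).
interface BinaryStorageSketch
{
    public function storageOpen(): void;
    public function storageClose(): void;
    public function storageRetrieve(array|string $tokens): array;
    public function storagePut(Word $word): void;
    public function storageUpdate(Word $word): void;
    public function storageDel(string $token): void;
}

// Holds every Word in memory, keyed by token; nothing outlives the object.
final class ArrayStorage implements BinaryStorageSketch
{
    /** @var array<string, Word> */
    private array $words = [];
    private bool $open = false;

    public function storageOpen(): void
    {
        $this->open = true;
    }

    public function storageClose(): void
    {
        $this->open = false;
    }

    /** @return array<string, Word> Stored Words for the requested tokens; unknown tokens are skipped. */
    public function storageRetrieve(array|string $tokens): array
    {
        $this->assertOpen();
        $found = [];
        foreach ((array) $tokens as $token) {
            if (isset($this->words[$token])) {
                $found[$token] = $this->words[$token];
            }
        }
        return $found;
    }

    public function storagePut(Word $word): void
    {
        $this->assertOpen();
        $this->words[$word->token] = $word;
    }

    public function storageUpdate(Word $word): void
    {
        // For an array, overwriting the entry is both insert and update.
        $this->storagePut($word);
    }

    public function storageDel(string $token): void
    {
        $this->assertOpen();
        unset($this->words[$token]);
    }

    private function assertOpen(): void
    {
        if (!$this->open) {
            throw new LogicException('Storage is not open');
        }
    }
}
```

Because the data disappears with the object, this sketch matches neither Rdbms nor Dba in durability; in an SQL-backed implementation storagePut() and storageUpdate() would most likely become separate INSERT and UPDATE statements.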

For NaiveBayes

Implement ByJG\TextClassifier\NaiveBayes\Storage\StorageInterface directly:

public function getCategories(): array;
public function getDocCount(string $category): int;
public function getTotalDocCount(): int;
public function incrementDocCount(string $category): void;
public function decrementDocCount(string $category): void;
public function getTokenCount(string $token, string $category): int;
public function getTotalTokenCount(string $token): int;
public function getTokenCounts(array $tokens): array;
public function incrementToken(string $token, string $category, int $count = 1): void;
public function decrementToken(string $token, string $category, int $count = 1): void;
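The following is a minimal array-backed sketch of these ten methods, assuming PHP 8.0+. `NaiveBayesStorageSketch` (a local copy of the listed signatures) and `ArrayNaiveBayesStorage` are hypothetical names. Two behaviours are assumptions, because the listing above does not specify them: getTokenCounts() returns `[token => [category => count]]` with unknown tokens omitted, and decrements never drop a count below zero. Check the library's own Memory backend before relying on either.

```php
<?php
declare(strict_types=1);

// Local copy of the methods listed above.
interface NaiveBayesStorageSketch
{
    public function getCategories(): array;
    public function getDocCount(string $category): int;
    public function getTotalDocCount(): int;
    public function incrementDocCount(string $category): void;
    public function decrementDocCount(string $category): void;
    public function getTokenCount(string $token, string $category): int;
    public function getTotalTokenCount(string $token): int;
    public function getTokenCounts(array $tokens): array;
    public function incrementToken(string $token, string $category, int $count = 1): void;
    public function decrementToken(string $token, string $category, int $count = 1): void;
}

final class ArrayNaiveBayesStorage implements NaiveBayesStorageSketch
{
    /** @var array<string, int> category => number of training documents */
    private array $docCounts = [];
    /** @var array<string, array<string, int>> token => [category => occurrences] */
    private array $tokenCounts = [];

    public function getCategories(): array
    {
        return array_keys($this->docCounts);
    }

    public function getDocCount(string $category): int
    {
        return $this->docCounts[$category] ?? 0;
    }

    public function getTotalDocCount(): int
    {
        return array_sum($this->docCounts);
    }

    public function incrementDocCount(string $category): void
    {
        $this->docCounts[$category] = $this->getDocCount($category) + 1;
    }

    public function decrementDocCount(string $category): void
    {
        // Ignore unknown categories and clamp at zero.
        if (isset($this->docCounts[$category])) {
            $this->docCounts[$category] = max(0, $this->docCounts[$category] - 1);
        }
    }

    public function getTokenCount(string $token, string $category): int
    {
        return $this->tokenCounts[$token][$category] ?? 0;
    }

    public function getTotalTokenCount(string $token): int
    {
        return array_sum($this->tokenCounts[$token] ?? []);
    }

    public function getTokenCounts(array $tokens): array
    {
        // Keep only the requested tokens that have been seen in training.
        return array_intersect_key($this->tokenCounts, array_flip($tokens));
    }

    public function incrementToken(string $token, string $category, int $count = 1): void
    {
        $this->tokenCounts[$token][$category] = $this->getTokenCount($token, $category) + $count;
    }

    public function decrementToken(string $token, string $category, int $count = 1): void
    {
        $remaining = $this->getTokenCount($token, $category) - $count;
        if ($remaining > 0) {
            $this->tokenCounts[$token][$category] = $remaining;
            return;
        }
        if (!isset($this->tokenCounts[$token])) {
            return;
        }
        // Drop zeroed entries so untrained tokens do not linger.
        unset($this->tokenCounts[$token][$category]);
        if ($this->tokenCounts[$token] === []) {
            unset($this->tokenCounts[$token]);
        }
    }
}
```

In a typical multinomial Naive Bayes scorer, getDocCount() divided by getTotalDocCount() would give a category's prior, while getTokenCounts() lets the classifier fetch all per-category token counts for a document in one call rather than one getTokenCount() call per token and category.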