
Storage: GDBM (DBA)

ByJG\TextClassifier\Storage\Dba persists the word list to a GDBM file through PHP's dba extension (the dba_* functions). GDBM is a fast, embedded key-value store, so this backend needs no external database server.

Requirements

The ext-dba PHP extension must be installed and the gdbm handler must be available:

# Ubuntu / Debian
sudo apt-get install php-dba

# Verify gdbm is available
php -r "print_r(dba_handlers(true));"
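
The same check can be made from PHP at application start-up, which fails fast on hosts where the handler is missing. This is a minimal sketch: gdbmHandlerAvailable() is a hypothetical helper name, not part of the library.

```php
<?php
// Hypothetical helper (not part of ByJG\TextClassifier): reports whether
// ext-dba is loaded and was built with the gdbm handler.
function gdbmHandlerAvailable(): bool
{
    if (!extension_loaded('dba')) {
        return false;
    }
    // Called without arguments, dba_handlers() returns a plain list of names.
    return in_array('gdbm', dba_handlers(), true);
}
```

A caller could throw or log when gdbmHandlerAvailable() returns false, before constructing the storage object.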

Setup

use ByJG\TextClassifier\Degenerator\ConfigDegenerator;
use ByJG\TextClassifier\Degenerator\StandardDegenerator;
use ByJG\TextClassifier\Storage\Dba;

$storage = new Dba(
'/var/data/wordlist.db',
new StandardDegenerator(new ConfigDegenerator())
);

// Run once to create the database file and seed internal variables
$storage->createDatabase();

createDatabase()

Creates a new gdbm file at the specified path using dba_open($path, 'c', 'gdbm') and inserts the two required internal variables:

key            value
tc*dbversion   3
tc*texts       0 0 (ham count, spam count)
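
For orientation, the described behaviour can be reproduced with raw dba_* calls. This is only a sketch of the documented steps: seedWordlist() is a hypothetical name, and the library's own implementation may differ in detail.

```php
<?php
// Sketch only: mirrors the documented behaviour with raw dba_* calls.
// seedWordlist() is a hypothetical name, not a library function.
function seedWordlist(string $path): bool
{
    if (!extension_loaded('dba') || !in_array('gdbm', dba_handlers(), true)) {
        return false; // gdbm handler unavailable
    }
    // Mode 'c': read/write access, creating the file if it does not exist.
    $db = dba_open($path, 'c', 'gdbm');
    if ($db === false) {
        return false;
    }
    // dba_insert() returns false instead of overwriting an existing key.
    $ok = dba_insert('tc*dbversion', '3', $db)
        && dba_insert('tc*texts', '0 0', $db); // "ham spam" text counters
    dba_close($db);
    return $ok;
}
```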

Call this once, on a path where no database exists yet. Calling it against an existing, non-empty database file fails.
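
A deployment script can guard the call so that it only runs for a fresh path. The helper below is a hypothetical sketch; it assumes a missing or zero-byte file counts as "new", which is an interpretation of the rule above rather than documented library behaviour.

```php
<?php
// Hypothetical helper: true when $path is missing or a zero-byte file,
// i.e. when it is assumed safe to call createDatabase().
function isFreshDatabasePath(string $path): bool
{
    clearstatcache(true, $path); // avoid stale file-size information
    return !file_exists($path) || filesize($path) === 0;
}

// Usage (assumes the objects from the Setup section):
//   $fresh = isFreshDatabasePath('/var/data/wordlist.db');
//   $storage = new Dba('/var/data/wordlist.db', $degenerator);
//   if ($fresh) { $storage->createDatabase(); }
```

The check is evaluated before the storage object is constructed, in case the constructor itself touches the file.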

File path

Pass an absolute filesystem path:

$storage = new Dba('/absolute/path/to/wordlist.db', $degenerator);
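
It can help to validate the path before handing it to the constructor. The sketch below is an illustration under stated assumptions: checkedDbPath() is a hypothetical helper, and it assumes POSIX-style paths that begin with a slash.

```php
<?php
// Hypothetical helper: rejects relative paths and unwritable directories.
// Assumes POSIX-style paths; adjust the check for Windows drive letters.
function checkedDbPath(string $path): string
{
    if ($path === '' || $path[0] !== '/') {
        throw new InvalidArgumentException("Not an absolute path: $path");
    }
    $dir = dirname($path);
    if (!is_dir($dir) || !is_writable($dir)) {
        throw new RuntimeException("Directory not writable: $dir");
    }
    return $path;
}
```

The return value can be passed straight through, for example new Dba(checkedDbPath($path), $degenerator).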

The .db extension is conventional but not required.

Data format

Each token is stored as a key-value pair:

key:   "some_word"
value: "count_ham count_spam" e.g. "12 5"
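
The value format is simple enough to encode and decode by hand, for instance when inspecting a file with the raw dba_* functions. The two helpers below are hypothetical and only illustrate the "count_ham count_spam" layout described here.

```php
<?php
// Hypothetical helpers illustrating the "count_ham count_spam" value layout.
function encodeCounts(int $ham, int $spam): string
{
    return $ham . ' ' . $spam;
}

function decodeCounts(string $value): array
{
    // Expect exactly two non-negative integers separated by one space.
    if (preg_match('/^(\d+) (\d+)$/', $value, $m) !== 1) {
        throw new InvalidArgumentException("Malformed count value: $value");
    }
    return ['ham' => (int) $m[1], 'spam' => (int) $m[2]];
}
```

With these definitions, decodeCounts("12 5") yields 12 ham and 5 spam occurrences, and encodeCounts(12, 5) produces the original string.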

Internal variables use a tc* prefix (tc*dbversion, tc*texts) to avoid collisions with real tokens.
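
When iterating over every key in the file, internal entries usually need to be skipped. A minimal sketch, with the prefix passed in rather than hard-coded; isInternalKey() is a hypothetical helper, not a library method.

```php
<?php
// Hypothetical helper: tells internal bookkeeping keys apart from real tokens.
function isInternalKey(string $key, string $prefix): bool
{
    return strncmp($key, $prefix, strlen($prefix)) === 0;
}
```

For the keys listed under createDatabase(), isInternalKey('tc*texts', 'tc*') is true, while an ordinary token such as 'texts' is not matched.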

When to use this backend

  • Single-process applications where no external database is available
  • Embedded deployments where minimising dependencies is important
  • High read throughput with infrequent writes

Limitations

  • Not safe for concurrent writes from multiple processes
  • Not suitable for distributed or shared-access setups
  • File size grows unboundedly as tokens accumulate; there is no built-in compaction
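
If several processes on one host must write occasionally, one common workaround is to serialise writers with an advisory lock on a sidecar file. This is a sketch, not a library feature: withWriteLock() and the ".lock" suffix are hypothetical, and the lock only helps if every writer goes through it.

```php
<?php
// Sketch: serialise writers with an advisory lock on a sidecar file.
// withWriteLock() and the ".lock" suffix are hypothetical names.
function withWriteLock(string $dbPath, callable $write)
{
    $fh = fopen($dbPath . '.lock', 'c'); // create if missing, never truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open lock file for $dbPath");
    }
    try {
        if (!flock($fh, LOCK_EX)) { // blocks until the lock is granted
            throw new RuntimeException("Cannot lock $dbPath");
        }
        return $write(); // run the caller's write while holding the lock
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

A training step would then be wrapped in the callback, so that only one process at a time modifies the file. This does not make the backend suitable for distributed or network-shared setups.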