envoymediagroup/columna
=======================

Columnar analytics for PHP - a pure PHP library to read and write simple columnar files in a performant way.

Version 1.0.7, MIT license, requires PHP ^7.3 || ^8.0.

[Source](https://github.com/envoymediagroup/columna) | [Packagist](https://packagist.org/packages/envoymediagroup/columna)

Columnar Analytics (in pure PHP)
================================

On GitHub: <https://github.com/envoymediagroup/columna>

About the project
-----------------

### What does it do?

This library allows you to write and read a simple columnar file format in a performant way with a lightweight, pure PHP implementation.

### Why columnar analytics in PHP?

This library started as a scratch-our-own-itch project at [Envoy Media Group](https://www.envoymediagroup.com/). We needed fast, columnar analytics that would work well with our all-PHP stack, but found PHP's support and performance for mainstream columnar formats (Parquet, ORC, etc.) to be lacking. So we rolled our own simple columnar format with its own speedy writer and reader.

### How battle tested is it?

This library has been in production use as the backbone of Envoy's analytics and business intelligence since early 2022. It processes hundreds of thousands of reads and writes per day, serving both custom reports for business users and automated requests for monitoring and machine learning applications. Bug fixes, feature adds, and improvements are ongoing based on our experience using this library every day in production.

Installation
------------

Add this library to your project using [Composer](https://getcomposer.org/):

```
composer require envoymediagroup/columna
```

File format
-----------

What file format does this library use to store data? The file extension `.scf` stands for Simple Columnar Format, and it is simple: all the metadata about the file, including its columns and their definitions and offsets, is stored on line 1 in a JSON header. The rest of the file is CSV-like data in a columnar arrangement (each column corresponds to one line in the file), compressed with run-length encoding (RLE) using a Record Separator character as the RLE delimiter. Some extra escaping is applied to strings to increase the range of valid values that can be stored and retrieved. [See a sample file here.](https://github.com/envoymediagroup/columna/blob/main/tests/fixtures/clicks--has_data_no_csort.scf)
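
Because the header sits on line 1 as JSON, you can inspect a file's metadata without the library. The following is a minimal sketch in plain PHP, not the library's Reader: `readScfHeader()` is a hypothetical helper name, and the sketch assumes nothing about the header's keys beyond it being valid JSON.

```php
<?php
// Hypothetical helper, not part of the library's API: returns the decoded
// JSON header stored on line 1 of an .scf file.
function readScfHeader(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        // Line 1 holds all of the metadata; the column data starts on line 2.
        $first_line = fgets($handle);
    } finally {
        fclose($handle);
    }
    if ($first_line === false) {
        throw new RuntimeException("{$path} is empty");
    }
    $header = json_decode($first_line, true);
    if (!is_array($header)) {
        throw new RuntimeException("Line 1 of {$path} is not a JSON object or array");
    }
    return $header;
}
```

Calling `print_r(array_keys(readScfHeader('clicks.scf')))` on a real file lists whatever top-level keys the library writes. The file name here is only a placeholder.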

Usage
-----

### Writer

Each columnar file is specific to one date and one metric, with any number of dimensions. For this example, we will assume a metric named `clicks` and three dimensions named `platform_id`, `site_id`, and `url`. Note that we provide the headers and values as separate inputs to the Writer. This makes sense when working with large data sets, because it saves memory by not duplicating associative string keys on every array item.
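
To illustrate the headers-plus-values layout, here is a small sketch in plain PHP. The sample rows and the position of `clicks` among the headers are assumptions for illustration only. The sketch shows the data shape, not the Writer's actual signature.

```php
<?php
// Hypothetical sample data; the column names come from the example above.
$headers = ['platform_id', 'site_id', 'url', 'clicks'];
$rows = [
    [1, 10, 'https://example.com/a', 5],
    [1, 10, 'https://example.com/b', 3],
    [2, 11, 'https://example.com/a', 7],
];

// Look up each column's position once instead of keying every row by name.
$index = array_flip($headers);

$total_clicks = 0;
foreach ($rows as $row) {
    $total_clicks += $row[$index['clicks']];
}
echo $total_clicks, PHP_EOL; // 15
```

Each row is a plain positional list, so the column names are stored once in `$headers` rather than repeated as keys in every row.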

#### Data Types

Currently supported data types include strings, ints, floats, bools, and a special "datetime" type. Datetimes are treated as strings except when evaluating query conditions, where they are parsed with `strtotime()` and compared as integers using `>`, `<`, `=`, etc. Nested data is not currently supported. While it is possible to store JSON or other serializations in the string type, these values will not be unserialized by the engine and so cannot be evaluated for nested values. Each column definition includes an empty value that is always used in place of nulls in the data set, so null is never stored in the files or returned when reading a file.
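
The two behaviors described above can be sketched in plain PHP. Both function names are hypothetical and this is not the library's internal code. It only mirrors the described approach: parse datetimes with `strtotime()` and compare the resulting integers, and substitute a column's empty value for null.

```php
<?php
// Hypothetical helper: compares two datetime strings by parsing them with
// strtotime() and comparing the resulting integer timestamps.
function datetimeIsBefore(string $left, string $right): bool
{
    $left_ts = strtotime($left);
    $right_ts = strtotime($right);
    if ($left_ts === false || $right_ts === false) {
        throw new InvalidArgumentException('Unparseable datetime string');
    }
    return $left_ts < $right_ts;
}

// Hypothetical helper: swaps null for a column's configured empty value.
// Non-null values, including 0, false, and '', pass through unchanged.
function withEmptyValue($value, $empty_value)
{
    return $value ?? $empty_value;
}

var_dump(datetimeIsBefore('2022-03-01 09:00:00', '2022-03-01 17:30:00')); // bool(true)
var_dump(withEmptyValue(null, '')); // string(0) ""
var_dump(withEmptyValue(0, -1));    // int(0)
```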

#### Usage

Let's walk through using the Writer in the comments below:

```
