

fabiomez/data-extractor
=======================

A library for extracting data from common sources, such as strings or CSV rows read from files.

Version 1.0.0 (5y ago) · MIT license · PHP

Since Mar 25 · Pushed 5y ago · 1 watcher

[Source](https://github.com/Fabiomez/Php-Data-Extractor) · [Packagist](https://packagist.org/packages/fabiomez/data-extractor)

Changelog entries: 1 · Dependencies: 2 · Versions: 2 · Used by: 0

PHP Data Extractor
==================


Why
---


Every time I had to parse a file such as a CNAB, an EDI or any CSV file, the process was very similar and always returned an array whose indexes I could never fully trust. Sometimes the application broke because some data was missing. So I wrote this library, which uses objects as models for the extracted data. How each value is extracted is written in the docblock of the model's attributes. This gives developers auto-completion on those attributes, a proper description of how each value was extracted and, optionally, a description of what the data means: exactly what I missed with arrays.

Installation
------------


Run `composer require fabiomez/data-extractor`.

Or add `"fabiomez/data-extractor": "*"` to the `require` section of your `composer.json` file.

Usage
-----


The Data Extractor fills model attributes. A model can be any class with public attributes whose docblock contains the `@extractable` tag.

In version 1, the Data Extractor ships three types of value getters: substring, array and regex.

Each value getter requires its own docblock tags, written as subtags of the `@extractable` tag.

### Substring


The substring value getter works like PHP's `substr()` function, where:

- `{@start}` is the initial position, zero-based.
- `{@length}` is the length of the desired text.

```
/**
 * @extractable
 *   {@start integer}
 *   {@length integer}
 */
```
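To make the `{@start}`/`{@length}` semantics concrete, here is a minimal sketch in plain PHP that slices the sample line used later in this README. It is not the library's own `SubstringValueGetter`; the helper name `extractSubstring` is hypothetical.

```php
<?php
// Sketch only: a substring getter driven by a start offset and a length,
// interpreted exactly as PHP's substr() interprets them.
function extractSubstring(string $source, int $start, int $length): string
{
    // Starting past the end yields an empty string
    return substr($source, $start, $length);
}

$line = 'First dataSecond Data';

$first = extractSubstring($line, 0, 10);   // 'First data'
$second = extractSubstring($line, 10, 11); // 'Second Data'

echo $first, PHP_EOL;
echo $second, PHP_EOL;
```

Because positions are zero-based, the second field starts at offset 10, right after the 10 characters of the first one.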

### Array


The array value getter reads a single array index, for example a column of a CSV row. Both numeric and associative indexes are supported.

```
/**
 * @extractable
 *   {@index mixed}
 */
```
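A rough sketch of what an index lookup amounts to, assuming the CSV line has already been split with PHP's `str_getcsv()`. The `extractIndex` helper, the sample row and the header names are hypothetical, and the sketch needs PHP 8.0+ for the union and `mixed` types.

```php
<?php
// Sketch only: an "array getter" boils down to reading one index from a
// row that has already been parsed. Requires PHP 8.0+ (union and mixed types).
function extractIndex(array $row, int|string $index): mixed
{
    // A missing index yields null instead of an "Undefined array key" warning
    return $row[$index] ?? null;
}

// Split one CSV line; separator, enclosure and escape are passed explicitly
$row = str_getcsv('42,"Doe, Jane",jane@example.com', ',', '"', '\\');

// Pair the columns with hypothetical header names for associative access
$assoc = array_combine(['id', 'name', 'email'], $row);

echo extractIndex($row, 1), PHP_EOL;         // Doe, Jane
echo extractIndex($assoc, 'email'), PHP_EOL; // jane@example.com
```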

### Regex


The regex value getter uses a pattern to match the desired data, where:

- `{@pattern}` must be a valid pattern that matches the desired data.
- `{@index}` is the numeric index of the matched data to return (for example, a capture group number).

```
/**
 * @extractable
 *   {@pattern string}
 *   {@index integer}
 */
```
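The following sketch assumes `{@index}` selects an entry of the `$matches` array filled by `preg_match()` (0 for the whole match, 1 and up for capture groups). The `extractMatch` helper and the sample invoice line are hypothetical, not the library's `RegexValueGetter`.

```php
<?php
// Sketch only: run the pattern once and return one entry of $matches
// (0 = whole match, 1+ = capture groups).
function extractMatch(string $source, string $pattern, int $index): ?string
{
    if (preg_match($pattern, $source, $matches) !== 1) {
        return null; // no match, or a pattern error (preg_match returns false)
    }
    // An index beyond the available groups also yields null
    return $matches[$index] ?? null;
}

$line = 'Invoice #2024-0153 issued on 2021-03-25';

echo extractMatch($line, '/#(\d{4})-(\d{4})/', 0), PHP_EOL;       // #2024-0153
echo extractMatch($line, '/#(\d{4})-(\d{4})/', 2), PHP_EOL;       // 0153
echo extractMatch($line, '/on (\d{4}-\d{2}-\d{2})/', 1), PHP_EOL; // 2021-03-25
```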

### The Model


Write a class with public attributes whose docblocks describe how the extractor should fill them:

```
class MyModel
{
    /**
     * @extractable
     *    {@start 0}
     *    {@length 10}
     * @otherTag from prop 1
     */
    public $prop1;

    /**
     * @extractable
     *    {@start 10}
     *    {@length 11}
     * @otherTag from prop 2
     */
    public $prop2;
}
```
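To show how docblock tags like these can drive extraction, here is a self-contained sketch that reads each public property's docblock through PHP's Reflection API and applies the substring rules. It is an illustration under assumptions, not the library's `DocBlockParser`: `fillBySubstring` is a hypothetical helper, the `@otherTag` lines are omitted, and `str_contains()` requires PHP 8.0+.

```php
<?php
// Sketch only: read {@start N} and {@length N} from each public property's
// docblock and fill the property with the matching substring.
class MyModel
{
    /**
     * @extractable
     *    {@start 0}
     *    {@length 10}
     */
    public $prop1;

    /**
     * @extractable
     *    {@start 10}
     *    {@length 11}
     */
    public $prop2;
}

function fillBySubstring(object $model, string $source): object
{
    $reflection = new ReflectionObject($model);
    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $doc = $property->getDocComment();
        // Skip properties without a docblock or without the @extractable tag
        if ($doc === false || !str_contains($doc, '@extractable')) {
            continue;
        }
        // Capture the integer values of the inline {@start N} and {@length N} tags
        if (preg_match('/\{@start\s+(\d+)\}/', $doc, $start) === 1
            && preg_match('/\{@length\s+(\d+)\}/', $doc, $length) === 1) {
            $property->setValue($model, substr($source, (int) $start[1], (int) $length[1]));
        }
    }
    return $model;
}

$model = fillBySubstring(new MyModel(), 'First dataSecond Data');
echo $model->prop1, PHP_EOL; // First data
echo $model->prop2, PHP_EOL; // Second Data
```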

### Getting the extractor


The extractor can be instantiated directly or created through a factory.

#### Directly


```
use Fabiomez\DataExtractor\Extractor;
use Fabiomez\DataExtractor\DocBlockParser;
use Fabiomez\DataExtractor\ValueGetters\ArrayValueGetter;
use Fabiomez\DataExtractor\ValueGetters\RegexValueGetter;
use Fabiomez\DataExtractor\ValueGetters\SubstringValueGetter;

//Array extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new ArrayValueGetter()
);

//Regex extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new RegexValueGetter()
);

//Substring extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new SubstringValueGetter()
);
```
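Each of these constructions pairs a docblock parser with a single value getter, which is essentially the strategy pattern. The sketch below illustrates that composition with hypothetical names (`ValueGetter`, `SubstringGetter`, `ArrayGetter`, `SketchExtractor`) and passes the parsed subtags by hand instead of reading docblocks; it is not the library's API and requires PHP 8.0+.

```php
<?php
// Hypothetical names throughout: they illustrate the composition only and
// are not the library's classes.
interface ValueGetter
{
    /** @param array<string, mixed> $schema subtag values parsed from a docblock */
    public function getValue(mixed $source, array $schema): mixed;
}

final class SubstringGetter implements ValueGetter
{
    public function getValue(mixed $source, array $schema): mixed
    {
        // Same semantics as {@start} and {@length}
        return substr((string) $source, (int) $schema['start'], (int) $schema['length']);
    }
}

final class ArrayGetter implements ValueGetter
{
    public function getValue(mixed $source, array $schema): mixed
    {
        // Same semantics as {@index}; a missing index yields null
        return $source[$schema['index']] ?? null;
    }
}

final class SketchExtractor
{
    // The getter is injected, so one extractor class serves every source format
    public function __construct(private ValueGetter $getter)
    {
    }

    /** @param array<string, array<string, mixed>> $schemas property name => subtags */
    public function extract(object $model, mixed $source, array $schemas): object
    {
        foreach ($schemas as $property => $schema) {
            $model->{$property} = $this->getter->getValue($source, $schema);
        }
        return $model;
    }
}

$fixedWidth = (new SketchExtractor(new SubstringGetter()))->extract(
    new stdClass(),
    'First dataSecond Data',
    ['prop1' => ['start' => 0, 'length' => 10], 'prop2' => ['start' => 10, 'length' => 11]]
);

$csvRow = (new SketchExtractor(new ArrayGetter()))->extract(
    new stdClass(),
    ['42', 'Jane Doe'],
    ['prop1' => ['index' => 0], 'prop2' => ['index' => 1]]
);

echo $fixedWidth->prop2, PHP_EOL; // Second Data
echo $csvRow->prop2, PHP_EOL;     // Jane Doe
```

With this design, supporting a new source format means writing a new getter rather than a new extractor.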

#### Via Factory


```
use Fabiomez\DataExtractor\ExtractorFactory;

$factory = new ExtractorFactory();

//Array extractor
$extractor = $factory->createArrayExtractor();

//Regex extractor
$extractor = $factory->createRegexExtractor();

//Substring extractor
$extractor = $factory->createSubstringExtractor();
```

### Extracting the data from source


The extraction can receive either the model's class name or an instance:

```
// By class name
$extractedModel = $extractor->extract(MyModel::class, 'First dataSecond Data');

// By instance
$extractedModel = $extractor->extract(new MyModel(), 'First dataSecond Data');

echo $extractedModel->prop1; // prints 'First data'
echo $extractedModel->prop2; // prints 'Second Data'
```
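Accepting either a class name or an instance only requires normalising the first argument. Here is a minimal sketch, assuming a string is instantiated through a no-argument constructor; `resolveModel` is a hypothetical helper, not part of the library, and the union type requires PHP 8.0+.

```php
<?php
// Sketch only: a string is treated as a class name and instantiated with
// no arguments; an object is returned unchanged.
function resolveModel(string|object $model): object
{
    if (is_string($model)) {
        if (!class_exists($model)) {
            throw new InvalidArgumentException("Unknown model class: {$model}");
        }
        return new $model();
    }
    return $model;
}

class MyModel
{
    public $prop1;
    public $prop2;
}

$fromName = resolveModel(MyModel::class);
$existing = new MyModel();
$fromInstance = resolveModel($existing);

var_dump($fromName instanceof MyModel); // bool(true)
var_dump($fromInstance === $existing);  // bool(true)
```

In a design like this, passing an instance is useful when the model needs constructor arguments or pre-filled values.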

Optionally, a callback can be passed as the third parameter to modify the model after extraction. As the example shows, it receives the model and the schema parsed from each property's docblock:

```
$extractedModel = $extractor->extract(
    MyModel::class,
    'First dataSecond Data',
    function ($model, $propertiesSchema) {
        foreach ($propertiesSchema as $property => $schema) {
            $model->{$property} .= $schema['otherTag'];
        }
    }
);

echo $extractedModel->prop1; // prints 'First data from prop 1'
echo $extractedModel->prop2; // prints 'Second Data from prop 2'
```
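A hypothetical sketch of how such a hook can be wired in: the properties are filled first, then the callable receives the model and the per-property schema. The `extractWithHook` function and the schema array shape (`start`, `length`, `otherTag` keys) are assumptions for illustration. Unlike the example above, this callback prepends a space explicitly, because a trimmed tag value would otherwise be glued to the extracted text.

```php
<?php
// Sketch only: fill the properties, then hand the model and schema to an
// optional callback that may modify the model in place.
function extractWithHook(object $model, string $source, array $schemas, ?callable $after = null): object
{
    foreach ($schemas as $property => $schema) {
        $model->{$property} = substr($source, $schema['start'], $schema['length']);
    }
    if ($after !== null) {
        // Objects are passed by handle, so changes made by the callback persist
        $after($model, $schemas);
    }
    return $model;
}

$schemas = [
    'prop1' => ['start' => 0, 'length' => 10, 'otherTag' => 'from prop 1'],
    'prop2' => ['start' => 10, 'length' => 11, 'otherTag' => 'from prop 2'],
];

$model = extractWithHook(new stdClass(), 'First dataSecond Data', $schemas,
    function (object $model, array $schemas): void {
        foreach ($schemas as $property => $schema) {
            // Separate the extracted text from the appended tag value
            $model->{$property} .= ' ' . $schema['otherTag'];
        }
    }
);

echo $model->prop1, PHP_EOL; // First data from prop 1
echo $model->prop2, PHP_EOL; // Second Data from prop 2
```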

### Health Score

**23** (Low; better than 27% of packages)

- Maintenance: 20 (infrequent updates; may be unmaintained)
- Popularity: 5 (limited adoption so far)
- Community: 7 (small or concentrated contributor base)
- Maturity: 52 (maturing project, gaining track record)
- Bus factor: 1 (top contributor holds 100% of commits; single point of failure)

How is this calculated?

- **Maintenance (25%)**: last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: project age, version count, PHP version support, and release stability.
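As a quick consistency check, assuming the overall score is the weighted sum of the four sub-scores listed above, the stated weights reproduce the published score of 23:

```math
0.25 \cdot 20 + 0.30 \cdot 5 + 0.15 \cdot 7 + 0.30 \cdot 52 = 5 + 1.5 + 1.05 + 15.6 = 23.15 \approx 23
```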

### Release Activity

- Cadence: unknown
- Total releases: 1
- Last release: 1879d ago

### Community

Maintainers: [fabiomezini](/maintainers/fabiomezini)

Top contributors: [Fabiomez](https://github.com/Fabiomez) (1 commit)

Tags: phpdoc, data extraction

### Code Quality

Tests: PHPUnit


### Alternatives

- [phpdocumentor/phpdocumentor](/packages/phpdocumentor-phpdocumentor): Documentation Generator for PHP
- [phpdocumentor/reflection](/packages/phpdocumentor-reflection): Reflection library to do Static Analysis for PHP Projects
- [cognesy/instructor-php](/packages/cognesy-instructor-php): The complete AI toolkit for PHP: unified LLM API, structured outputs, agents, and coding agent control
- [code-lts/doctum](/packages/code-lts-doctum): Doctum, a PHP API documentation generator. Fork of Sami
- [clean/phpdoc-md](/packages/clean-phpdoc-md): Parse PHP classes and writes documentation to markdown files
- [pronamic/wp-documentor](/packages/pronamic-wp-documentor): Documentation Generator for WordPress.

PHPackages © 2026

