
Andach's Laravel Extract Data Package
=====================================


This is a package to extract and copy data from various sources, including CSVs and SQL tables. It provides a simple, programmatic API for connecting to data sources, inspecting their datasets, and synchronizing them to local database tables.

The package is built with robustness in mind, featuring:

- **Schema Versioning:** Safely manage changes to your source data or local mappings over time. The package can detect schema drift and prevent data corruption.
- **Transactional Syncs:** Guarantees that a sync operation either completes fully or fails completely, preventing partially-synced, inconsistent data.
- **Resilient Connections:** Automatically retries connections to external sources with exponential backoff, making syncs resilient to temporary network issues.
- **Flexible Schema Mapping:** Full control over how source columns are named, typed, or excluded in your local database.
- **Extensible Strategies:** A strategy pattern for handling different sync scenarios.

Installation
------------


You can install the package via Composer:

```
composer require andach/laravel-extract-and-transform
```

You can publish and run the migrations with:

```
php artisan vendor:publish --provider="Andach\ExtractAndTransform\ExtractAndTransformServiceProvider"
php artisan migrate
```

Usage
-----


The following example demonstrates the complete workflow: defining a source, creating a versioned sync profile, running a sync, and handling a schema change by creating a new version.

### Simple CSV Synchronisation


This example connects to a CSV file. The package automatically detects and creates columns from the file unless `mapColumns()` is specified.

```
use Andach\ExtractAndTransform\Facades\ExtractAndTransform;

$path = storage_path('app/products.csv');
$source = ExtractAndTransform::createSource('My Source', 'csv', ['path' => $path]);

$run = $source->sync($path)
    ->withStrategy('full_refresh')
    ->mapColumns(['id' => 'remote_id', 'name' => 'product_name'])   // Optional
    ->toTable('products_v1')                                        // Optional
    ->run();
```

### SQL Database Synchronisation


This example demonstrates connecting to an external MySQL database and syncing two different tables (`users` and `orders`) from that single source.

```
use Andach\ExtractAndTransform\Facades\ExtractAndTransform;

// 1. Create the Source (Connection) once
$source = ExtractAndTransform::createSource('Legacy DB', 'sql', [
    'driver' => 'mysql',
    'host' => '192.168.1.50',
    'database' => 'legacy_app',
    'username' => 'readonly_user',
    'password' => 'secret',
]);

// 2. Sync the 'users' table
$source->sync('users')
    ->withStrategy('full_refresh')
    ->toTable('legacy_users')
    ->run();

// 3. Sync the 'orders' table
$source->sync('orders')
    ->withStrategy('full_refresh')
    ->mapColumns(['order_id' => 'id', 'total_amount' => 'amount'])
    ->toTable('legacy_orders')
    ->run();
```

Data Transformations
--------------------


The package includes a powerful, database-agnostic DSL for transforming your data *after* it has been extracted. This allows you to perform complex operations like joins (lookups), string manipulation, and math entirely within your database, ensuring high performance.

### Basic Usage


Use the `transform()` method to define a transformation pipeline. The `Expr` class provides a fluent API for building expressions.

```
use Andach\ExtractAndTransform\Transform\Expr;

ExtractAndTransform::transform('Clean Products')
    ->from('raw_products')
    ->select([
        // Simple Rename
        'sku' => 'remote_id',

        // Concatenation
        'full_name' => Expr::concat(Expr::col('brand'), ' ', Expr::col('name')),

        // Static Mapping (Case Statement)
        'is_active' => Expr::map('status', ['live' => 1])->default(0),

        // Math Operations
        'tax_amount' => Expr::col('price')->multiply(0.2),

        // Lookups (Left Joins)
        'category_name' => Expr::lookup('categories', 'cat_id', 'id', 'name'),
    ])
    ->toTable('clean_products')
    ->run();
```

### Chaining Transformations


You can chain multiple operations together to build complex logic.

```
ExtractAndTransform::transform('Advanced Transform')
    ->from('raw_products')
    ->select([
        // Chain math: (price * 1.2) + 5
        'final_price' => Expr::col('price')->multiply(1.2)->add(5),

        // Chain string functions: LOWER(REPLACE(name, ' ', '-'))
        'slug' => Expr::col('name')->replace(' ', '-')->lower(),

        // Combine concatenation and string functions
        'report_name' => Expr::concat(Expr::col('brand'), ': ', Expr::col('name'))->upper(),
    ])
    ->toTable('advanced_products')
    ->run();
```

### Available Expressions


- **`Expr::col(string $column)`**: Selects a column.
- **`Expr::concat(...$parts)`**: Concatenates columns and strings.
- **`Expr::map(string $column, array $mapping)`**: Creates a `CASE WHEN` statement.
- **`Expr::lookup(string $table, string $localKey, string $foreignKey, string $targetColumn)`**: Performs a `LEFT JOIN` to fetch a value from another table.

### Chainable Methods


- **Numeric:** `add()`, `subtract()`, `multiply()`, `divide()`
- **String:** `upper()`, `lower()`, `trim()`, `replace($search, $replace)`

### Data-Driven Transformations (GUI Support)


The transformation engine is designed to be fully serializable, making it easy to build a GUI on top of it. You can save the transformation configuration as a JSON object in the database and execute it later without writing any PHP code.

**Example JSON Configuration:** This JSON corresponds to a transformation that lowercases a reference code, calculates a total, and maps a status value to a boolean flag.

```
{
    "selects": {
        "ref": {
            "type": "string_function",
            "function": "LOWER",
            "column": {
                "type": "column",
                "column": "order_ref"
            },
            "arguments": []
        },
        "total": {
            "type": "math",
            "operator": "+",
            "left": {
                "type": "column",
                "column": "subtotal"
            },
            "right": {
                "type": "column",
                "column": "tax"
            }
        },
        "is_paid": {
            "type": "map",
            "column": "status",
            "mapping": { "paid": 1 },
            "default": 0
        }
    }
}
```

**Executing the Saved Transformation:**

```
use Andach\ExtractAndTransform\Models\Transformation;
use Andach\ExtractAndTransform\Services\TransformationService;

// 1. Load the transformation model (which contains the JSON config)
$transformation = Transformation::where('name', 'My Saved Transform')->first();

// 2. Run it
app(TransformationService::class)->run($transformation);
```

The service automatically rehydrates the JSON configuration into executable expressions.
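The rehydration step can be pictured as a small recursive compiler walking the JSON tree and emitting a SQL fragment for each node. The sketch below is purely illustrative and is not the package's actual implementation; it only handles the node types shown in the JSON example above, and `compileExpr` is a hypothetical helper name.

```php
<?php
// Minimal sketch (assumed, not the package's real code) of turning a
// serialized expression node back into a SQL fragment.
function compileExpr(array $node): string
{
    switch ($node['type']) {
        case 'column':
            return $node['column'];
        case 'string_function':
            // Compile the wrapped column, then append any literal arguments.
            $args = array_merge([compileExpr($node['column'])], array_map('strval', $node['arguments']));
            return $node['function'] . '(' . implode(', ', $args) . ')';
        case 'math':
            return '(' . compileExpr($node['left']) . ' ' . $node['operator'] . ' ' . compileExpr($node['right']) . ')';
        case 'map':
            $sql = 'CASE';
            foreach ($node['mapping'] as $value => $result) {
                $sql .= " WHEN {$node['column']} = '{$value}' THEN {$result}";
            }
            return $sql . " ELSE {$node['default']} END";
        default:
            throw new InvalidArgumentException("Unknown node type: {$node['type']}");
    }
}

$ref = [
    'type' => 'string_function', 'function' => 'LOWER',
    'column' => ['type' => 'column', 'column' => 'order_ref'],
    'arguments' => [],
];
echo compileExpr($ref); // LOWER(order_ref)
```

Because every node carries an explicit `type`, a GUI only needs to build this JSON; no user-supplied PHP ever has to be evaluated.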

Core Concepts
-------------


Understanding the distinction between a **Source** and a **Sync** is key to using this package effectively.

### 1. Source (The Connection)


A **Source** represents the configuration required to connect to an external system.

- **Examples:** A MySQL database connection, a Shopify API credential, or a path to a CSV file.
- **Role:** It knows *how* to connect, but not *what* to fetch.

### 2. Sync (The Dataset)


A **Sync** represents a specific dataset retrieved from a Source.

- **Examples:** The `users` table from a database, the `orders` endpoint from an API, or the contents of a CSV file.
- **Role:** It holds the state (mappings, schema versions, run history) and the strategy for that specific dataset.

### Sync Strategies


The package supports multiple strategies for synchronizing data, allowing you to choose the best approach for your specific use case.

- **`full_refresh`**: The simplest strategy. It truncates the local table and re-imports all data from the source. Best for small datasets or when the source does not support incremental updates.
- **`watermark`**: Efficient for large, append-only or mutable datasets. It uses a "watermark" column (e.g., `updated_at` or `id`) to fetch only rows that have changed or been added since the last sync.
    - **Modes:** Supports `append_only` (inserts new rows) and `upsert` (updates existing rows based on primary key).
- **`content_hash`**: Useful when the source lacks a reliable `updated_at` column. It calculates a hash of the row's content to detect changes. It can also detect and handle deletions (soft deletes) by comparing source and local hashes.
- **`id_diff`**: Fetches all IDs from the source and compares them with local IDs. It then fetches full rows only for new IDs. Useful for detecting new records and deletions when fetching the full dataset is too expensive, but fetching a list of IDs is cheap.
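The idea behind the `content_hash` strategy can be sketched in a few lines of plain PHP. This is an illustration of the technique, not the package's internals: hash each source row's content, compare against the hashes stored locally from the previous run, and treat locally-known IDs absent from the source as soft-delete candidates.

```php
<?php
// Sketch of content-hash change detection (assumed logic, for illustration).
function rowHash(array $row): string
{
    ksort($row); // stable column order so identical rows hash identically
    return hash('sha256', json_encode($row));
}

// Local state: source id => hash stored during the previous sync run.
$local = [
    1 => rowHash(['id' => 1, 'name' => 'Widget']),
    2 => rowHash(['id' => 2, 'name' => 'Gadget']),
    3 => rowHash(['id' => 3, 'name' => 'Gizmo']),
];

// Rows fetched from the source on this run; id 3 is gone from the source.
$source = [
    ['id' => 1, 'name' => 'Widget'],     // unchanged
    ['id' => 2, 'name' => 'Gadget Pro'], // changed
];

$changed = [];
$seen = [];
foreach ($source as $row) {
    $seen[$row['id']] = true;
    if (($local[$row['id']] ?? null) !== rowHash($row)) {
        $changed[] = $row['id']; // new or modified row
    }
}
$deleted = array_keys(array_diff_key($local, $seen)); // soft-delete candidates
// $changed is [2], $deleted is [3]
```

This is why the strategy works even without an `updated_at` column: the hash itself is the change signal.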

### Metadata Columns


The package automatically adds several reserved metadata columns to your local tables to manage synchronization state and history. You should **not** use these names for your own mapped columns.

- **`__id`**: The local primary key (BigInteger).
- **`__source_id`**: The original ID from the source (nullable).
- **`__content_hash`**: A hash of the row's content, used by the `content_hash` strategy.
- **`__is_deleted`**: A boolean flag indicating if the row has been deleted in the source (used by `content_hash` and `id_diff` strategies).
- **`__last_synced_at`**: The timestamp when the row was last synced/updated in the local table.

### Simple vs. Complex Workflows


- **Simple (1-to-1):** For a single CSV file, the Source and Sync might seem redundant. You define the Source (file path) and Sync (file content) together.
- **Complex (1-to-Many):** For a Database, you define the **Source** once (the connection). You then define multiple **Syncs**—one for each table you want to extract. This keeps your credentials centralized while allowing independent management of each table's sync schedule and schema.

Handling Schema Changes
-----------------------


When you change your column mapping (e.g., adding a new column), the package automatically detects this change and creates a **new schema version**.

### Automatic Table Naming (Recommended)


If you do **not** explicitly specify a table name using `toTable()`, the package will automatically handle the table creation for you.

- **Version 1:** Creates table `csv_my_source_products_v1`
- **Version 2 (Schema Change):** Automatically creates `csv_my_source_products_v2`

This ensures that your sync never fails due to missing columns, and you retain the old data in the `v1` table.
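Based on the example names above, the generated name appears to follow the pattern `{source_type}_{source_name}_{sync_name}_v{version}`, with names slugged to valid identifiers. A rough sketch of that convention (the exact slugging rules are an assumption, and `versionedTable` is a hypothetical helper, not a package function):

```php
<?php
// Sketch of the auto-naming convention inferred from the examples above.
function versionedTable(string $type, string $source, string $sync, int $version): string
{
    // Lowercase and collapse non-alphanumeric runs into underscores.
    $slug = fn (string $s): string => strtolower(preg_replace('/[^a-z0-9]+/i', '_', trim($s)));
    return $slug($type) . '_' . $slug($source) . '_' . $slug($sync) . '_v' . $version;
}

echo versionedTable('csv', 'My Source', 'products', 1); // csv_my_source_products_v1
```

Bumping only the version number yields `csv_my_source_products_v2`, which is how the old data in `v1` stays untouched.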

### Explicit Table Naming


If you explicitly specify a table name using `toTable('products_v1')`, you must be careful when changing the schema.

If you add a column to `mapColumns` but keep `toTable('products_v1')`, the sync will fail with a **Column not found** error because the package will try to insert the new column into the existing `products_v1` table (which doesn't have it).

**Correct Approach:** When changing the schema, you should either:

1. **Remove `toTable()`** to let the package auto-generate the next version name.
2. **Update the table name** manually (e.g., change to `toTable('products_v2')`).

Testing
-------


```
docker compose exec -e XDEBUG_MODE=coverage app ./vendor/bin/phpunit --coverage-html phpunit_reports
```
