kjenney/php-webminer
====================

A php client that uses WebDriver and Querypath

Latest version: 0.5. License: Apache-2.0. Requires PHP >= 5.4.0.

php-webminer -- Extract data using Selenium, QueryPath and PHP
==============================================================

DESCRIPTION
-----------

The goal of this project is to create an extensible system for extracting data from web pages. It currently uses Selenium WebDriver (via php-webdriver), QueryPath, and a configuration file that specifies which components to extract and how to output the results.

### Job File

The "job" configuration file describes the system (database, infrastructure), the target web site, and the data you want to extract.

The file is written in XML and follows these rules:

1. The child element "site" must be defined.
2. The child element "steps" is recommended, because steps drive the actions.

**Database**

Currently a single MySQL database is supported. If elements are defined, the output XML is imported into the database table specified in the configuration file.

**Actions**

1. Click
2. Type
3. Captcha

**Elements**

1. Input - the CSS selector QueryPath uses to pull data from a web page
2. Output - the element name to use in the output XML

Samples are included in the /examples folder.

### Outputs XML

The element definitions in the configuration file determine how the output is formatted, that is, which element names appear in the output XML.
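
The input/output pairing described under Elements can be illustrated with a small, dependency-free sketch. This is not php-webminer's actual code: the function name `extractToXml`, the `record` root element, and the sample HTML are all hypothetical, and XPath queries from PHP's built-in DOM extension stand in for the CSS selectors that QueryPath would evaluate.

```php
<?php
// Hypothetical mapping: output element name => query that locates the value.
// XPath is used here as a stand-in for QueryPath CSS selectors.
$elements = [
    'title' => '//h1[@class="title"]',
    'price' => '//span[@class="price"]',
];

// Sample page content (an assumption, for illustration only).
$html = '<html><body><h1 class="title">Blue Widget</h1>'
      . '<span class="price">9.99</span></body></html>';

function extractToXml($html, array $elements)
{
    // Parse the page; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $page = new DOMDocument();
    $page->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($page);

    // Build the output with one child element per configured entry.
    $out = new DOMDocument('1.0', 'UTF-8');
    $root = $out->appendChild($out->createElement('record'));
    foreach ($elements as $name => $query) {
        $nodes = $xpath->query($query);
        // Use the first match, or an empty string when nothing matches.
        $value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
        $child = $out->createElement($name);
        $child->appendChild($out->createTextNode($value));
        $root->appendChild($child);
    }

    // Serialize only the root node, without the XML declaration.
    return $out->saveXML($root);
}

echo extractToXml($html, $elements), "\n";
```

Each key in `$elements` becomes an element name in the output, which mirrors how the configuration is described as controlling the output format. The real job file format may differ; see the samples in the /examples folder.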

INSTALLING
----------

**GET THE CODE**

### Github

```
git clone git@github.com:kjenney/php-webminer.git

```

### Packagist

Add the dependency to your composer.json.

```
{
  "require": {
    "kjenney/php-webminer": "dev-master"
  }
}

```

**BUILD WITH DEPENDENCIES**

Download the composer.phar

```
curl -sS https://getcomposer.org/installer | php

```

Install the library.

```
php composer.phar install

```

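Install the PHP5 extensions (tidy and mysqlnd), using the package manager for your distribution.

```
# Debian/Ubuntu
apt-get install php5-tidy
apt-get install php5-mysqlnd

# RHEL/CentOS
yum install php-tidy
```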

Install Tesseract (optional)

```
apt-get install tesseract-ocr

```

GETTING STARTED
---------------

- All you need as the server for this client is the selenium-server-standalone-#.jar file provided here:
- Download and run that file, replacing # with the current server version.

    ```
    java -jar selenium-server-standalone-#.jar

    ```
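
Before running a job it can help to confirm that the server started. The sketch below only checks whether something accepts TCP connections on the port. It assumes the server runs on localhost and uses Selenium's default port 4444, and the function name `isServerListening` is hypothetical. It does not verify that the listener is actually Selenium.

```php
<?php
// Assumption: the Selenium server runs locally on its default port, 4444.
function isServerListening($host = '127.0.0.1', $port = 4444, $timeout = 2.0)
{
    $errno = 0;
    $errstr = '';
    // Try to open a TCP connection; @ suppresses the warning on failure.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo isServerListening()
    ? "A server is listening on port 4444\n"
    : "Nothing is listening on port 4444\n";
```

If the check fails, make sure the `java -jar` command above is still running and that no firewall blocks the port.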

Support
-------

- Wiki -

Contributing
------------

- There's still a lot of work that needs to be done, but I welcome any help and/or suggestions.
- Feel free to create issues and recommend features.

