nguyenanhung/htmlawed
=====================

Official htmLawed PHP library for HTML filtering

Version: v1.2.10 | License: GPL-2.0-or-later | Requires: PHP >=4.4


[Source](https://github.com/nguyenanhung/HTMLawed) | [Packagist](https://packagist.org/packages/nguyenanhung/htmlawed)


```
/*
htmLawed_README.txt, 5 November 2022
htmLawed 1.2.10
Copyright Santosh Patnaik
Dual licensed with LGPL 3 and GPL 2+
A PHP Labware internal utility - https://bioinformatics.org/phplabware/internal_utilities/htmLawed
*/

==  Content =========================================================

1  About htmLawed
  1.1  Example uses
  1.2  Features
  1.3  History
  1.4  License & copyright
  1.5  Terms used here
  1.6  Availability
2  Usage
  2.1  Simple
  2.2  Configuring htmLawed using the '$config' argument
  2.3  Extra HTML specifications using the '$spec' argument
  2.4  Performance time & memory usage
  2.5  Some security risks to keep in mind
  2.6  Use with 'kses()' code
  2.7  Tolerance for ill-written HTML
  2.8  Limitations & work-arounds
  2.9  Examples of usage
3  Details
  3.1  Invalid/dangerous characters
  3.2  Character references/entities
  3.3  HTML elements
    3.3.1  HTML comments & 'CDATA' sections
    3.3.2  Tag-transformation for better compliance with standards
    3.3.3  Tag balancing & proper nesting
    3.3.4  Elements requiring child elements
    3.3.5  Beautify or compact HTML
    3.3.6  Custom elements
  3.4  Attributes
    3.4.1  Auto-addition of XHTML-required attributes
    3.4.2  Duplicate/invalid 'id' values
    3.4.3  URL schemes & scripts in attribute values
    3.4.4  Absolute & relative URLs
    3.4.5  Lower-cased, standard attribute values
    3.4.6  Transformation of deprecated attributes
    3.4.7  Anti-spam & 'href'
    3.4.8  Inline style properties
    3.4.9  Hook function for tag content
  3.5  Simple configuration directive for most valid XHTML
  3.6  Simple configuration directive for most `safe` HTML
  3.7  Using a hook function
  3.8  Obtaining `finalized` parameter values
  3.9  Retaining non-HTML tags in input with mixed markup
4  Other
  4.1  Support
  4.2  Known issues
  4.3  Change-log
  4.4  Testing
  4.5  Upgrade, & old versions
  4.6  Comparison with 'HTMLPurifier'
  4.7  Use through application plug-ins/modules
  4.8  Use in non-PHP applications
  4.9  Donate
  4.10  Acknowledgements
5  Appendices
  5.1  Characters discouraged in HTML
  5.2  Valid attribute-element combinations
  5.3  CSS 2.1 properties accepting URLs
  5.4  Microsoft Windows 1252 character replacements
  5.5  URL format
  5.6  Brief on htmLawed code

== 1  About htmLawed ================================================

  htmLawed is a PHP script to process text with HTML markup to make it more compliant with HTML standards and with administrative policies. It works by making HTML well-formed with balanced and properly nested tags, neutralizing code that introduces a security vulnerability or is used for cross-site scripting (XSS) attacks, allowing only specified HTML tags and attributes, and so on. Such `lawing in` of HTML code ensures that it is in accordance with the aesthetics, safety and usability requirements set by administrators.

  htmLawed is highly customizable, and fast with low memory usage. Its free and open-source code is in one small file. It does not require extensions or libraries, and works in older versions of PHP as well. It is a good alternative to the HTML Tidy:- http://tidy.sourceforge.net application.

-- 1.1  Example uses ------------------------------------------------

  *  Filtering of text submitted as comments on blogs to allow only certain HTML elements

  *  Making RSS newsfeed items standard-compliant: often one uses an excerpt from an HTML document for the content, and with unbalanced tags, non-numerical entities, etc., such excerpts may not be XML-compliant

  *  Beautifying or pretty-printing HTML code

  *  Text processing for stricter XML standard-compliance: e.g., lowercasing the 'x' in hexadecimal numeric entities becomes necessary if an HTML document with MathML content needs to be served as 'application/xml'

  *  Scraping text from web-pages

  *  Transforming an HTML element to another

-- 1.2  Features ----------------------------------------------------

  Key: '*' security feature, '^' standard compliance, '~' requires setting the right options

  htmLawed:

  *  makes input more *secure* and *standard-compliant* for HTML as well as generic *XML* documents  ^
  *  supports markup for *HTML 5*, *custom elements*, and *microdata, ARIA, Ruby, custom attributes*, etc.  ^
  *  can *beautify* or *compact* HTML  ~
  *  works with input of almost any *character encoding* and does not affect it
  *  has good *tolerance for ill-written HTML*

  *  can enforce *restricted use of elements*  *~
  *  ensures proper closure of empty elements like 'img'  ^
  *  *transforms deprecated elements* like 'font'  ^~
  *  can permit HTML *comments* and *CDATA* sections  ^~
  *  can permit all elements, including 'script', 'object' and 'form'  ~

  *  can *restrict attributes by element*  ^~
  *  removes *invalid attributes*  ^
  *  lower-cases element and attribute names  ^
  *  provides *required attributes*, like 'alt' for 'img'  ^
  *  *transforms deprecated attributes*  ^~
  *  ensures attributes are *declared only once*  ^
  *  permits *custom*, non-standard attributes as well as custom rules for standard attributes  ~

  *  declares value for `empty` (`minimized` or `boolean`) attributes like 'checked'  ^
  *  checks for potentially dangerous attribute values  *~
  *  ensures *unique* 'id' attribute values  ^~
  *  *double-quotes* attribute values  ^
  *  lower-cases *standard attribute values* like 'password'  ^

  *  can restrict *URL protocol/scheme by attribute*  *~
  *  can disable *dynamic expressions* in 'style' values  *~

  *  neutralizes invalid named *character entities*  ^
  *  converts hexadecimal numeric entities to decimal ones, or vice versa  ^~
  *  converts named entities to numeric ones for generic XML use  ^~

  *  removes *null* characters  *
  *  neutralizes potentially dangerous proprietary Netscape *Javascript entities*  *
  *  replaces potentially dangerous *soft-hyphen* character in URL-accepting attribute values with spaces  *

  *  removes common *invalid characters* not allowed in HTML or XML  ^
  *  replaces *characters from Microsoft applications* like 'Word' that are discouraged in HTML or XML  ^~
  *  neutralizes entities for characters invalid or discouraged in HTML or XML  ^
  *  appropriately neutralize '
