Cloneable, Instantiable
Public Methods
public static convert(string $html, $options = []) : string
 

Tries to convert the given HTML into plain text, a format best suited for e-mail display and similar uses.

  • param string $html the input HTML
  • param bool|array<string,bool|string> $options if a boolean, whether to ignore XML parsing errors; otherwise an array of options, with defaults ['ignore_errors' => false, 'drop_links' => false, 'char_set' => 'auto']
  • return string the HTML converted, as best as possible, to text
  • throws \Html2TextException if the HTML could not be loaded as a {@link \DOMDocument}
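The boolean-or-array form of $options described above could be normalized roughly as follows. This is a minimal sketch, not the library's own code: the function name normalizeOptionsSketch is hypothetical, and the defaults are copied from the parameter description.

```php
<?php
/**
 * Hypothetical helper (not part of the documented class): turns the
 * bool|array $options argument into a complete options array.
 */
function normalizeOptionsSketch($options = []): array
{
    // Defaults as listed in the parameter description.
    $defaults = [
        'ignore_errors' => false,
        'drop_links' => false,
        'char_set' => 'auto',
    ];

    // A bare boolean is treated as shorthand for the 'ignore_errors' flag.
    if (is_bool($options)) {
        $options = ['ignore_errors' => $options];
    }

    // Caller-supplied keys override the defaults.
    return array_merge($defaults, $options);
}
```

Under this sketch, passing true is equivalent to passing ['ignore_errors' => true], and any key left out falls back to its default.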
public static defaultOptions() : array
 
  • return array<string,bool|string>
public static fixNewlines(string $text) : string
 

Unify newlines: first \r\n becomes \n, and then any remaining \r becomes \n. As a result, all newline styles (Unix, Windows, classic Mac) become \n.

  • param string $text text with any number of \r, \r\n and \n combinations
  • return string the fixed text
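The two-step replacement described above can be sketched as follows. The function name unifyNewlinesSketch is hypothetical; this illustrates the technique and is not necessarily how the class implements it.

```php
<?php
/**
 * Hypothetical stand-in illustrating newline unification.
 */
function unifyNewlinesSketch(string $text): string
{
    // Handle Windows line endings first, so "\r\n" does not become two newlines.
    $text = str_replace("\r\n", "\n", $text);

    // Any remaining lone "\r" (classic Mac) becomes "\n".
    return str_replace("\r", "\n", $text);
}
```

The order matters: replacing lone \r first would turn every \r\n into \n\n and introduce spurious blank lines.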
public static isOfficeDocument(string $html) : bool
 

Guesses whether this HTML was generated by Microsoft Office.
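One plausible heuristic for such a guess is to look for the Office XML namespace that Office-exported HTML usually declares. This is an assumption for illustration only: the page does not say which marker the class checks, and the function name looksLikeOfficeHtmlSketch is hypothetical.

```php
<?php
/**
 * Hypothetical heuristic: treats HTML as Office-generated when it
 * mentions the Microsoft Office XML namespace.
 */
function looksLikeOfficeHtmlSketch(string $html): bool
{
    // Assumed marker; the documented method may use a different test.
    return strpos($html, 'urn:schemas-microsoft-com:office') !== false;
}
```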

public static isWhitespace(string $text) : bool
public static nbspCodes() : array
 
  • return string[]
public static processWhitespaceNewlines(string $text) : string
 

Remove leading or trailing spaces and excess empty lines from the provided multiline text.

  • param string $text multiline text with any number of leading or trailing spaces or excess lines
  • return string the fixed text
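A cleanup of this kind could look roughly like the sketch below. It assumes newlines are already unified to \n and that "excess" means more than one consecutive blank line; both are assumptions, and tidyWhitespaceSketch is a hypothetical name rather than the documented method.

```php
<?php
/**
 * Hypothetical stand-in illustrating per-line trimming and
 * blank-line collapsing. Assumes "\n" line endings.
 */
function tidyWhitespaceSketch(string $text): string
{
    // Strip spaces and tabs at the start and end of every line.
    $text = preg_replace('/^[ \t]+|[ \t]+$/m', '', $text);

    // Collapse runs of three or more newlines into a single blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    // Drop blank lines at the very start and end of the text.
    return trim($text, "\n");
}
```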
public static zwnjCodes() : array
 
  • return string[]
Private Methods
private static getDocument(string $html, array $options) : DOMDocument
 

Parse HTML into a DOMDocument.

  • param string $html the input HTML
  • param array<string,bool|string> $options
  • return \DOMDocument the parsed document tree
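Parsing HTML into a DOMDocument while optionally suppressing parser warnings can be sketched with PHP's standard DOM and libxml functions. The function name parseHtmlSketch, the $ignoreErrors parameter, and the use of RuntimeException are assumptions for illustration; the documented method takes an $options array and its internals are not shown on this page.

```php
<?php
/**
 * Hypothetical stand-in illustrating HTML parsing with DOMDocument.
 * Requires the DOM extension.
 */
function parseHtmlSketch(string $html, bool $ignoreErrors = false): DOMDocument
{
    $doc = new DOMDocument();

    // When ignoring errors, collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors($ignoreErrors);

    try {
        $loaded = $doc->loadHTML($html);
    } finally {
        // Discard collected errors and restore the earlier libxml setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }

    if (!$loaded) {
        throw new RuntimeException('Could not load HTML into a DOMDocument');
    }

    return $doc;
}
```

Restoring the previous libxml setting in a finally block keeps the sketch from changing error handling for unrelated code.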
private static iterateOverNode(DOMNode $node, ?string $prevName, bool $in_pre, bool $is_office_document, array $options) : string
 
  • param array<string,bool|string> $options
private static nextChildName(?DOMNode $node) : ?string
private static renderText(string $text) : string
 

Replace any special characters with simple text versions, to prevent output issues:

  • Convert non-breaking spaces to regular spaces; and
  • Convert zero-width non-joiners to '' (nothing).

This is to match our goal of rendering documents as they would be rendered by a browser.
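The two replacements listed above can be sketched as a single str_replace call. This assumes UTF-8 input and handles only the literal characters U+00A0 and U+200C; the actual lists returned by nbspCodes() and zwnjCodes() are not shown here and may include other encodings. The name simplifySpecialCharsSketch is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in illustrating the special-character cleanup.
 * Assumes the text is UTF-8 encoded.
 */
function simplifySpecialCharsSketch(string $text): string
{
    $nbsp = "\u{00A0}"; // U+00A0 NO-BREAK SPACE
    $zwnj = "\u{200C}"; // U+200C ZERO WIDTH NON-JOINER

    // Non-breaking spaces become regular spaces; non-joiners are removed.
    return str_replace([$nbsp, $zwnj], [' ', ''], $text);
}
```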

© 2024 Bruce Wells