Docs For Class rcube_html2text

rcube_html2text

Description

Converts HTML to formatted plain text

Located in /lib/Roundcube/rcube_html2text.php (line 98)

Direct descendants

Class	Description
html2text	Converts HTML to formatted plain text

Variable Summary

string $allowed_tags

array $callback_search

string $charset

array $ent_replace

array $ent_search

string $html

array $pre_replace

array $pre_search

array $replace

array $search

string $text

string $url

integer $width

boolean $_converted

boolean $_do_links

array $_link_list

Method Summary

rcube_html2text __construct ([string $source = ''], [boolean $from_file = false], [boolean $do_links = true], [integer $width = 75], [string $charset = 'UTF-8'])

void blockquote_citation_ballback ( $m)

string get_text ()

string pre_preg_callback (array $matches)

void print_text ()

void set_allowed_tags ([string $allowed_tags = ''])

void set_base_url ([string $url = ''])

void set_html (string $source, [boolean $from_file = false])

string tags_preg_callback (array $matches)

void _build_link_list (string $link, string $display)

void _convert ()

void _converter (string &$text)

void _convert_blockquotes (string &$text)

void _convert_pre (string &$text)

Variables

string $allowed_tags = '' (line 285)

Contains a list of HTML tags to allow in the resulting text.

see: rcube_html2text::set_allowed_tags()
access: protected

array $callback_search = array(
'/<(a) [^>]*href=("|\')([^"\']+)\2[^>]*>(.*?)<\/a>/i', // <a href="">
'/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i', // h1 - h6
'/<(b)( [^>]*)?>(.*?)<\/b>/i', // <b>
'/<(strong)( [^>]*)?>(.*?)<\/strong>/i', // <strong>
'/<(th)( [^>]*)?>(.*?)<\/th>/i', // <th> and </th>
) (line 242)

List of preg* regular expression patterns to search for and replace using a callback function.

access: protected

string $charset = 'UTF-8' (line 129)

Target character encoding for output text

access: protected

array $ent_replace = array(
' ', // Non-breaking space
'"', // Double quotes
"'", // Single quotes
'>',
'<',
'(c)',
'(tm)',
'(R)',
'--',
'-',
'*',
'£',
'EUR', // Euro sign. € ?
'|+|amp|+|', // Ampersand: see _converter()
' ', // Runs of spaces, post-handling
) (line 218)

List of pattern replacements corresponding to patterns searched.

see: rcube_html2text::$ent_search
access: protected

array $ent_search = array(
'/&(nbsp|#160);/i', // Non-breaking space
'/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i', // Double quotes
'/&(apos|rsquo|lsquo|#8216|#8217);/i', // Single quotes
'/&gt;/i', // Greater-than
'/&lt;/i', // Less-than
'/&(copy|#169);/i', // Copyright
'/&(trade|#8482|#153);/i', // Trademark
'/&(reg|#174);/i', // Registered
'/&(mdash|#151|#8212);/i', // mdash
'/&(ndash|minus|#8211|#8722);/i', // ndash
'/&(bull|#149|#8226);/i', // Bullet
'/&(pound|#163);/i', // Pound sign
'/&(euro|#8364);/i', // Euro sign
'/&(amp|#38);/i', // Ampersand: see _converter()
'/[ ]{2,}/', // Runs of spaces, post-handling
) (line 193)

List of preg* regular expression patterns to search for, used in conjunction with $ent_replace.

see: rcube_html2text::$ent_replace
access: protected
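The pairing of $ent_search with $ent_replace can be sketched as below. This is a minimal illustration, not the class's code: it uses only a subset of the patterns, the function name decode_entities_sketch() is hypothetical, and the final steps (decoding remaining entities, then swapping the '|+|amp|+|' placeholder back to '&') are an assumption about what _converter() does, inferred from the "see _converter()" comments.

```php
<?php
// Hypothetical helper illustrating the $ent_search / $ent_replace pairing.
function decode_entities_sketch($text)
{
    // Subset of the patterns listed above; same index = same rule.
    $ent_search = array(
        '/&(nbsp|#160);/i',
        '/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i',
        '/&(copy|#169);/i',
        '/&(mdash|#151|#8212);/i',
        '/&(amp|#38);/i',
        '/[ ]{2,}/',
    );
    $ent_replace = array(' ', '"', '(c)', '--', '|+|amp|+|', ' ');

    $text = preg_replace($ent_search, $ent_replace, $text);

    // Assumption: any remaining entities are decoded here. Because "&amp;"
    // is already a placeholder, "&amp;lt;" is not decoded a second time.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Assumption: the placeholder becomes a literal ampersand last.
    return str_replace('|+|amp|+|', '&', $text);
}

$plain   = decode_entities_sketch('Fish &amp; chips &mdash; &copy;&nbsp;2013 &ldquo;Acme&rdquo;');
$escaped = decode_entities_sketch('Write &amp;lt; for less-than');
```

With the placeholder in place, an escaped entity such as "&amp;lt;" ends up as the literal text "&lt;" rather than as "<".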

string $html (line 105)

Contains the HTML content to convert.

access: protected

array $pre_replace = array(
'<br>',
'&nbsp;&nbsp;&nbsp;&nbsp;',
'&nbsp;',
'',
''
) (line 271)

List of pattern replacements corresponding to patterns searched for PRE body.

see: rcube_html2text::$pre_search
access: protected

array $pre_search = array(
"/\n/",
"/\t/",
'/ /',
'/<pre[^>]*>/',
'/<\/pre>/'
) (line 257)

List of preg* regular expression patterns to search for in PRE body, used in conjunction with $pre_replace.

see: rcube_html2text::$pre_replace
access: protected
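A rough sketch of the PRE rewrite follows. It assumes the whitespace replacements in $pre_replace are the '&nbsp;' entity (four of them for a tab), so that line breaks and indentation are encoded as markup before the general newline/tab rule in $search runs. The sample input is made up.

```php
<?php
// Patterns and replacements for the PRE body, paired by index.
$pre_search  = array("/\n/", "/\t/", '/ /', '/<pre[^>]*>/', '/<\/pre>/');
// Assumption: whitespace is encoded as &nbsp; entities.
$pre_replace = array('<br>', '&nbsp;&nbsp;&nbsp;&nbsp;', '&nbsp;', '', '');

$pre = "<pre>if (x)\n\treturn 1;</pre>";
$out = preg_replace($pre_search, $pre_replace, $pre);

// The "/[\n\t]+/" rule from $search now has nothing left to collapse.
$after_collapse = preg_replace("/[\n\t]+/", ' ', $out);
```

After this step the PRE body contains no raw newlines, tabs or spaces, so the later whitespace rule cannot flatten its layout.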

array $replace = array(
'', // Non-legal carriage return
' ', // Newlines and tabs
'', // <head>
'', // <script>s -- which strip_tags supposedly has problems with
'', // <style>s -- which strip_tags supposedly has problems with
"\n\n", // <P>
"\n", // <br>
'_\\1_', // <i>
'_\\1_', // <em>
"\n\n", // <ul> and </ul>
"\n\n", // <ol> and </ol>
"\t* \\1\n", // <li> and </li>
"\n\t* ", // <li>
"\n-------------------------\n", // <hr>
"<div>\n", // <div>
"\n\n", // <table> and </table>
"\n", // <tr> and </tr>
"\t\t\\1\n", // <td> and </td>
) (line 165)

List of pattern replacements corresponding to patterns searched.

see: rcube_html2text::$search
access: protected

array $search = array(
"/\r/", // Non-legal carriage return
"/[\n\t]+/", // Newlines and tabs
'/<head[^>]*>.*?<\/head>/i', // <head>
'/<script[^>]*>.*?<\/script>/i', // <script>s -- which strip_tags supposedly has problems with
'/<style[^>]*>.*?<\/style>/i', // <style>s -- which strip_tags supposedly has problems with
'/<p[^>]*>/i', // <P>
'/<br[^>]*>/i', // <br>
'/<i[^>]*>(.*?)<\/i>/i', // <i>
'/<em[^>]*>(.*?)<\/em>/i', // <em>
'/(<ul[^>]*>|<\/ul>)/i', // <ul> and </ul>
'/(<ol[^>]*>|<\/ol>)/i', // <ol> and </ol>
'/<li[^>]*>(.*?)<\/li>/i', // <li> and </li>
'/<li[^>]*>/i', // <li>
'/<hr[^>]*>/i', // <hr>
'/<div[^>]*>/i', // <div>
'/(<table[^>]*>|<\/table>)/i', // <table> and </table>
'/(<tr[^>]*>|<\/tr>)/i', // <tr> and </tr>
'/<td[^>]*>(.*?)<\/td>/i', // <td> and </td>
) (line 138)

List of preg* regular expression patterns to search for, used in conjunction with $replace.

see: rcube_html2text::$replace
access: protected
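The way $search and $replace work together can be sketched with PHP's preg_replace(), which accepts parallel arrays and applies each pattern in order to the result of the previous one. The snippet uses a subset of the pairs listed above and a made-up input string; it is an illustration of the mechanism, not the class's own code.

```php
<?php
// Subset of the $search / $replace pairs; same index = same rule.
$search = array(
    "/\r/",                      // Non-legal carriage return
    "/[\n\t]+/",                 // Newlines and tabs
    '/<p[^>]*>/i',               // <P>
    '/<br[^>]*>/i',              // <br>
    '/<em[^>]*>(.*?)<\/em>/i',   // <em>
    '/<li[^>]*>(.*?)<\/li>/i',   // <li> and </li>
    '/<hr[^>]*>/i',              // <hr>
);
$replace = array(
    '',
    ' ',
    "\n\n",
    "\n",
    '_\\1_',
    "\t* \\1\n",
    "\n-------------------------\n",
);

$html = "<p>Hello <em>world</em><br>\n<li>one</li><li>two</li>";
$text = preg_replace($search, $replace, $html);
```

Order matters: raw newlines are collapsed to a space before "<br>" and "<p>" are turned into newlines, so only markup-driven line breaks survive.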

string $text (line 112)

Contains the converted, formatted text.

access: protected

string $url (line 292)

Contains the base URL that relative links should resolve to.

access: protected

integer $width = 70 (line 122)

Maximum width of the formatted text, in columns.

Set this value to 0 (or less) to disable word wrapping, so that text is not constrained to a fixed-width column. Note that the constructor's $width argument defaults to 75, not 70.

access: protected
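The effect of $width can be sketched as follows, assuming the class wraps with PHP's wordwrap() and skips wrapping for non-positive widths. The helper name wrap_sketch() is hypothetical.

```php
<?php
// Hypothetical helper: wrap only when a positive width is configured.
function wrap_sketch($text, $width)
{
    if ($width > 0) {
        // wordwrap() breaks at spaces and inserts "\n" by default.
        return wordwrap($text, $width);
    }
    return $text;
}

$wrapped   = wrap_sketch('The quick brown fox', 10);
$unwrapped = wrap_sketch('The quick brown fox', 0);
```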

boolean $_converted = false (line 300)

Indicates whether content in the $html variable has been converted yet.

see: rcube_html2text::$html, rcube_html2text::$text
access: protected

boolean $_do_links = true (line 316)

Boolean flag, true if a table of link URLs should be listed after the text.

see: rcube_html2text::__construct()
access: protected

array $_link_list = array() (line 308)

Contains URL addresses from links to be rendered in plain text.

see: rcube_html2text::_build_link_list()
access: protected

Methods

Constructor __construct (line 330)

Constructor.

If the HTML source string (or file) is supplied, the class will instantiate with that source loaded; all that remains is to call get_text().

rcube_html2text __construct ([string $source = ''], [boolean $from_file = false], [boolean $do_links = true], [integer $width = 75], [string $charset = 'UTF-8'])

string $source: HTML content
boolean $from_file: Indicates $source is a file to pull content from
boolean $do_links: Indicates whether a table of link URLs is desired
integer $width: Maximum width of the formatted text, 0 for no limit
string $charset: Target character encoding for output text
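A typical call sequence might look like the sketch below. It assumes the Roundcube library has already been loaded so that rcube_html2text is defined; the wrapper html_to_plain_sketch() is a hypothetical name and returns null when the class is not available.

```php
<?php
// Hypothetical wrapper around the constructor and get_text().
function html_to_plain_sketch($html, $width = 75)
{
    if (!class_exists('rcube_html2text')) {
        return null; // Roundcube library not loaded
    }

    // Arguments: source, from_file, do_links, width, charset.
    $converter = new rcube_html2text($html, false, true, $width, 'UTF-8');

    return $converter->get_text();
}

$result = html_to_plain_sketch('<p>Hello <b>world</b></p>');
```

Passing the source to the constructor avoids a separate set_html() call; set_html() remains useful when one instance is reused for several documents.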

blockquote_citation_ballback (line 625)

Callback function to correctly add citation markers for blockquote contents

access: public

void blockquote_citation_ballback ( $m)
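One plausible shape for such a callback is sketched below. The regular expression, the layout of $m and the function name cite_line_sketch() are all assumptions made for illustration; the documentation above only states that the callback adds citation markers.

```php
<?php
// Hypothetical stand-in for the citation callback.
// Assumed layout: $m[1] = line start plus existing ">" markers,
//                 $m[2] = rest of the line.
function cite_line_sketch(array $m)
{
    $line = ltrim($m[2]);
    // No extra space if the text itself starts with another marker.
    $space = (isset($line[0]) && $line[0] === '>') ? '' : ' ';

    return $m[1] . '>' . $space . $line;
}

$body  = "first line\n> already quoted\nlast line";
$cited = preg_replace_callback('/((?:^|\n)>*)([^\n]*)/', 'cite_line_sketch', $body);
```

Each line gains one ">" marker, so text that was already quoted once ends up at the next quoting level.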

get_text (line 366)

Returns the text, converted from HTML.

return: Plain text

string get_text ()

pre_preg_callback (line 661)

Callback function for preg_replace_callback use in PRE content handler.

access: public

string pre_preg_callback (array $matches)

array $matches: PREG matches

print_text (line 378)

Prints the text, converted from HTML.

void print_text ()

set_allowed_tags (line 388)

Sets the allowed HTML tags to pass through to the resulting text.

Tags should be in the form "<p>", with no corresponding closing tag.

void set_allowed_tags ([string $allowed_tags = ''])

string $allowed_tags: List of HTML tags to allow in the resulting text
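The tag format matches the second argument of PHP's strip_tags(). Assuming the class passes $allowed_tags to that function when it strips remaining markup, the effect can be sketched like this (sample HTML is made up):

```php
<?php
$html         = '<span>Keep <u>this</u></span> <font color="red">text</font>';
$allowed_tags = '<u>'; // opening form only; </u> is kept as well

// Everything except the allowed tags is removed; the text content stays.
$text = strip_tags($html, $allowed_tags);
```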

set_base_url (line 398)

Sets a base URL to handle relative links.

void set_base_url ([string $url = ''])

string $url: Base URL that relative links should resolve to

set_html (line 349)

Loads source HTML into memory, either from $source string or a file.

void set_html (string $source, [boolean $from_file = false])

string $source: HTML content
boolean $from_file: Indicates $source is a file to pull content from

tags_preg_callback (line 638)

Callback function for preg_replace_callback use.

access: public

string tags_preg_callback (array $matches)

array $matches: PREG matches
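Every pattern in $callback_search captures the tag name as group 1 and the element body as group 3, which lets a single callback dispatch on the tag. The sketch below shows that mechanism with preg_replace_callback(). The upper-casing behaviour and the function name tags_callback_sketch() are assumptions for illustration, not documented behaviour.

```php
<?php
// Subset of $callback_search: group 1 = tag name, group 3 = body.
$callback_search = array(
    '/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i',
    '/<(b)( [^>]*)?>(.*?)<\/b>/i',
    '/<(strong)( [^>]*)?>(.*?)<\/strong>/i',
);

// Hypothetical callback: dispatch on the captured tag name.
function tags_callback_sketch(array $matches)
{
    switch (strtolower($matches[1])) {
        case 'b':
        case 'strong':
            return strtoupper($matches[3]);
        case 'h':
            return "\n\n" . strtoupper($matches[3]) . "\n\n";
    }
    return $matches[3];
}

$html = '<h1>Title</h1>Some <b>bold</b> and <strong class="x">strong</strong> text';
$text = preg_replace_callback($callback_search, 'tags_callback_sketch', $html);
```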

_build_link_list (line 509)

Helper function called by preg_replace() on link replacement.

Maintains an internal list of links to be displayed at the end of the text, with numeric indices marking the point in the text at which each link appeared. Also makes an effort to identify and handle absolute and relative links.

access: protected

void _build_link_list (string $link, string $display)

string $link: URL of the link
string $display: Part of the text to associate number with
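The bookkeeping described above can be sketched with a small stand-alone class. LinkListSketch, its methods, the "[n]" marker format and the rule used to detect absolute URLs are all hypothetical choices for this example; the real method is protected and works on the $_link_list and $url properties.

```php
<?php
// Hypothetical stand-alone model of the link list.
class LinkListSketch
{
    private $base_url;
    private $links = array();

    public function __construct($base_url)
    {
        $this->base_url = rtrim($base_url, '/');
    }

    // Registers a link and returns the display text with its index.
    public function add($link, $display)
    {
        if (preg_match('!^[a-z][a-z0-9.+-]+:!i', $link)) {
            $url = $link; // has a scheme: treat as absolute
        } else {
            $url = $this->base_url . '/' . ltrim($link, '/');
        }

        $index = array_search($url, $this->links);
        if ($index === false) {
            $index = count($this->links);
            $this->links[] = $url;
        }

        return $display . ' [' . ($index + 1) . ']';
    }

    // Renders the table of links to append after the text.
    public function render()
    {
        $out = "Links:\n";
        foreach ($this->links as $i => $url) {
            $out .= '[' . ($i + 1) . '] ' . $url . "\n";
        }
        return $out;
    }
}

$list = new LinkListSketch('http://example.com/');
$a = $list->add('/docs/index.html', 'Docs');
$b = $list->add('https://roundcube.net/', 'Roundcube');
$c = $list->add('docs/index.html', 'Docs again');
$table = $list->render();
```

Reusing the index for a repeated URL keeps the trailing table short when the same target is linked several times.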

_convert (line 421)

Workhorse function that does actual conversion (calls _converter() method).

access: protected

void _convert ()

_converter (line 453)

Workhorse function that does actual conversion.

First performs custom tag replacement specified by $search and $replace arrays. Then strips any remaining HTML tags, reduces whitespace and newlines to a readable format, and word wraps the text to $width characters.

access: protected

void _converter (string &$text)

string &$text: Reference to HTML content string

_convert_blockquotes (line 572)

Helper function for BLOCKQUOTE body conversion.

access: protected

void _convert_blockquotes (string &$text)

string &$text: HTML content

_convert_pre (line 544)

Helper function for PRE body conversion.

access: protected

void _convert_pre (string &$text)

string &$text: HTML content

Documentation generated on Fri, 03 May 2013 12:45:00 +0200 by phpDocumentor 1.4.4