Class rcube_html2text

Description

Converts HTML to formatted plain text

Located in /lib/Roundcube/rcube_html2text.php (line 98)

Direct descendants
Class Description
 class html2text Converts HTML to formatted plain text
Variable Summary
 string $allowed_tags
 array $callback_search
 string $charset
 array $ent_replace
 array $ent_search
 string $html
 array $pre_replace
 array $pre_search
 array $replace
 array $search
 string $text
 string $url
 integer $width
 boolean $_converted
 boolean $_do_links
 array $_link_list
Method Summary
 rcube_html2text __construct ([string $source = ''], [boolean $from_file = false], [boolean $do_links = true], [integer $width = 75], [string $charset = 'UTF-8'])
 void blockquote_citation_ballback ( $m)
 string get_text ()
 string pre_preg_callback (array $matches)
 void print_text ()
 void set_allowed_tags ([string $allowed_tags = ''])
 void set_base_url ([string $url = ''])
 void set_html (string $source, [boolean $from_file = false])
 string tags_preg_callback (array $matches)
 void _build_link_list (string $link, string $display)
 void _convert ()
 void _converter (string &$text)
 void _convert_blockquotes (string &$text)
 void _convert_pre (string &$text)
Variables
string $allowed_tags = '' (line 285)

Contains a list of HTML tags to allow in the resulting text.

array $callback_search = array(
'/<(a) [^>]*href=("|\')([^"\']+)\2[^>]*>(.*?)<\/a>/i', // <a href="">
'/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i', // h1 - h6
'/<(b)( [^>]*)?>(.*?)<\/b>/i', // <b>
'/<(strong)( [^>]*)?>(.*?)<\/strong>/i', // <strong>
'/<(th)( [^>]*)?>(.*?)<\/th>/i', // <th> and </th>
)
(line 242)

List of preg* regular expression patterns to search for and replace using a callback function.

  • access: protected
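Each $callback_search pattern captures the tag name in group 1 and the inner text in group 3, so a single callback can dispatch on the tag name. The sketch below shows that idea with PHP's preg_replace_callback(), which accepts an array of patterns. The function demo_tags_callback() and the choice to upper-case headings and bold text are illustrative assumptions, not necessarily what tags_preg_callback() does.

```php
<?php
// Patterns copied from $callback_search (the <a> and <th> entries are
// left out to keep the sketch short).
$callback_search = array(
    '/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i', // h1 - h6
    '/<(b)( [^>]*)?>(.*?)<\/b>/i',                 // <b>
    '/<(strong)( [^>]*)?>(.*?)<\/strong>/i',       // <strong>
);

// Hypothetical callback: group 1 is the tag name, group 3 the inner text.
// Upper-casing is an assumed rendering choice for this sketch.
function demo_tags_callback(array $matches)
{
    switch (strtolower($matches[1])) {
        case 'h':
            return "\n\n" . strtoupper($matches[3]) . "\n\n";
        case 'b':
        case 'strong':
            return strtoupper($matches[3]);
    }
    return $matches[0]; // leave anything else untouched
}

$out = preg_replace_callback(
    $callback_search,
    'demo_tags_callback',
    '<h2>Title</h2><b>bold</b> and <strong>strong</strong>'
);
// $out === "\n\nTITLE\n\nBOLD and STRONG"
```

The patterns run in array order, so each one sees the output of the previous pattern.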
string $charset = 'UTF-8' (line 129)

Target character encoding for output text

  • access: protected
array $ent_replace = array(
' ', // Non-breaking space
'"', // Double quotes
"'", // Single quotes
'>',
'<',
'(c)',
'(tm)',
'(R)',
'--',
'-',
'*',
'£',
'EUR', // Euro sign. € ?
'|+|amp|+|', // Ampersand: see _converter()
' ', // Runs of spaces, post-handling
)
(line 218)

List of pattern replacements corresponding to patterns searched.

array $ent_search = array(
'/&(nbsp|#160);/i', // Non-breaking space
'/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i', // Double quotes
'/&(apos|rsquo|lsquo|#8216|#8217);/i', // Single quotes
'/&gt;/i', // Greater-than
'/&lt;/i', // Less-than
'/&(copy|#169);/i', // Copyright
'/&(trade|#8482|#153);/i', // Trademark
'/&(reg|#174);/i', // Registered
'/&(mdash|#151|#8212);/i', // mdash
'/&(ndash|minus|#8211|#8722);/i', // ndash
'/&(bull|#149|#8226);/i', // Bullet
'/&(pound|#163);/i', // Pound sign
'/&(euro|#8364);/i', // Euro sign
'/&(amp|#38);/i', // Ampersand: see _converter()
'/[ ]{2,}/', // Runs of spaces, post-handling
)
(line 193)

List of preg* regular expression patterns to search for, used in conjunction with $ent_replace.
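The two entity arrays are meant to be passed together to preg_replace(), which pairs each pattern with the replacement at the same index and applies them in order. A minimal sketch with a subset of the entries follows. The final str_replace() that turns the |+|amp|+| placeholder back into "&" is an assumption based on the "see _converter()" comment, not code taken from the class.

```php
<?php
// A subset of $ent_search / $ent_replace, in the original order.
$ent_search = array(
    '/&(nbsp|#160);/i',        // Non-breaking space
    '/&(copy|#169);/i',        // Copyright
    '/&(mdash|#151|#8212);/i', // mdash
    '/&(amp|#38);/i',          // Ampersand
    '/[ ]{2,}/',               // Runs of spaces
);
$ent_replace = array(' ', '(c)', '--', '|+|amp|+|', ' ');

// With array arguments, preg_replace() applies the patterns in order,
// each one paired with the replacement at the same index.
$text = preg_replace($ent_search, $ent_replace, '&copy;&nbsp;2013 &mdash;  A&amp;B');
// $text === '(c) 2013 -- A|+|amp|+|B'

// Assumed final step: turn the ampersand placeholder back into '&'.
$text = str_replace('|+|amp|+|', '&', $text);
// $text === '(c) 2013 -- A&B'
```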

string $html (line 105)

Contains the HTML content to convert.

  • access: protected
array $pre_replace = array(
'<br>',
'&nbsp;&nbsp;&nbsp;&nbsp;',
'&nbsp;',
'',
''
)
(line 271)

List of pattern replacements corresponding to patterns searched for PRE body.

array $pre_search = array(
"/\n/",
"/\t/",
'/ /',
'/<pre[^>]*>/',
'/<\/pre>/'
)
(line 257)

List of preg* regular expression patterns to search for in PRE body, used in conjunction with $pre_replace.
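Applied in order, the PRE arrays rewrite literal whitespace as markup, presumably so that the "/[\n\t]+/" collapse in $search cannot flatten preformatted text. A minimal sketch on a one-line sample, using the arrays exactly as listed:

```php
<?php
$pre_search  = array("/\n/", "/\t/", '/ /', '/<pre[^>]*>/', '/<\/pre>/');
$pre_replace = array('<br>', '&nbsp;&nbsp;&nbsp;&nbsp;', '&nbsp;', '', '');

// Newlines become <br>, a tab becomes four &nbsp;, each space one &nbsp;,
// and the <pre> wrapper is dropped.
$out = preg_replace($pre_search, $pre_replace, "<pre>a\tb c\nd</pre>");
// $out === 'a&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;c<br>d'
```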

array $replace = array(
'', // Non-legal carriage return
' ', // Newlines and tabs
'', // <head>
'', // <script>s -- which strip_tags supposedly has problems with
'', // <style>s -- which strip_tags supposedly has problems with
"\n\n", // <P>
"\n", // <br>
'_\\1_', // <i>
'_\\1_', // <em>
"\n\n", // <ul> and </ul>
"\n\n", // <ol> and </ol>
"\t* \\1\n", // <li> and </li>
"\n\t* ", // <li>
"\n-------------------------\n", // <hr>
"<div>\n", // <div>
"\n\n", // <table> and </table>
"\n", // <tr> and </tr>
"\t\t\\1\n", // <td> and </td>
)
(line 165)

List of pattern replacements corresponding to patterns searched.

array $search = array(
"/\r/", // Non-legal carriage return
"/[\n\t]+/", // Newlines and tabs
'/<head[^>]*>.*?<\/head>/i', // <head>
'/<script[^>]*>.*?<\/script>/i', // <script>s -- which strip_tags supposedly has problems with
'/<style[^>]*>.*?<\/style>/i', // <style>s -- which strip_tags supposedly has problems with
'/<p[^>]*>/i', // <P>
'/<br[^>]*>/i', // <br>
'/<i[^>]*>(.*?)<\/i>/i', // <i>
'/<em[^>]*>(.*?)<\/em>/i', // <em>
'/(<ul[^>]*>|<\/ul>)/i', // <ul> and </ul>
'/(<ol[^>]*>|<\/ol>)/i', // <ol> and </ol>
'/<li[^>]*>(.*?)<\/li>/i', // <li> and </li>
'/<li[^>]*>/i', // <li>
'/<hr[^>]*>/i', // <hr>
'/<div[^>]*>/i', // <div>
'/(<table[^>]*>|<\/table>)/i', // <table> and </table>
'/(<tr[^>]*>|<\/tr>)/i', // <tr> and </tr>
'/<td[^>]*>(.*?)<\/td>/i', // <td> and </td>
)
(line 138)

List of preg* regular expression patterns to search for, used in conjunction with $replace.
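A minimal sketch of the tag-replacement pass, using a subset of $search and $replace in their original order and then PHP's strip_tags() to remove leftovers such as </p>:

```php
<?php
// A subset of $search / $replace, in the original order.
$search = array(
    "/[\n\t]+/",             // Newlines and tabs
    '/<p[^>]*>/i',           // <P>
    '/<br[^>]*>/i',          // <br>
    '/<i[^>]*>(.*?)<\/i>/i', // <i>
);
$replace = array(' ', "\n\n", "\n", '_\\1_');

$html = "<p>Hello\n<i>world</i><br>Line two</p>";
// Tag replacement first, then strip_tags() removes the remaining tags.
$text = strip_tags(preg_replace($search, $replace, $html));
// $text === "\n\nHello _world_\nLine two"
```

The order matters: without the s modifier, ".*?" does not match newlines, so collapsing newlines and tabs to spaces first lets patterns such as the <i> rule match elements that span several source lines.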

string $text (line 112)

Contains the converted, formatted text.

  • access: protected
string $url (line 292)

Contains the base URL that relative links should resolve to.

  • access: protected
integer $width = 70 (line 122)

Maximum width of the formatted text, in columns.

Set this value to 0 (or less) to ignore word wrapping and not constrain text to a fixed-width column.

  • access: protected
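The width rule maps naturally onto PHP's built-in wordwrap(). The sketch below is an assumption about how such a limit could be applied; wrap_text() is a hypothetical helper, not a method of the class.

```php
<?php
// Hypothetical helper mirroring the documented $width rule.
function wrap_text($text, $width)
{
    if ($width <= 0) {
        return $text; // 0 or less: no fixed-width column
    }
    // Break at spaces only; long words are not cut (4th argument false).
    return wordwrap($text, $width, "\n", false);
}

$wrapped = wrap_text('The quick brown fox', 10); // "The quick\nbrown fox"
$flat    = wrap_text('The quick brown fox', 0);  // unchanged
```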
boolean $_converted = false (line 300)

Indicates whether content in the $html variable has been converted yet.
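The $_converted flag suggests a convert-once pattern: get_text() converts only when needed and otherwise returns the stored $text. A minimal sketch of that pattern follows; the class LazyConverter, the flag reset in set_html() and the strip_tags() stand-in for the real conversion are assumptions for illustration.

```php
<?php
// Hypothetical class illustrating the convert-once pattern behind $_converted.
class LazyConverter
{
    protected $html = '';
    protected $text = '';
    protected $converted = false;
    public $conversions = 0; // counts real conversions, for the demo only

    public function set_html($html)
    {
        $this->html = $html;
        $this->converted = false; // new input invalidates the cached text
    }

    public function get_text()
    {
        if (!$this->converted) {
            $this->convert();
        }
        return $this->text;
    }

    protected function convert()
    {
        $this->conversions++;
        $this->text = trim(strip_tags($this->html)); // stand-in for the real work
        $this->converted = true;
    }
}

$c = new LazyConverter();
$c->set_html('<b>Hi</b> there');
$first  = $c->get_text();
$second = $c->get_text(); // served from the cached text, no second conversion
```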

boolean $_do_links = true (line 316)

Boolean flag, true if a table of link URLs should be listed after the text.

array $_link_list = array() (line 308)

Contains URL addresses from links to be rendered in plain text.

Methods
Constructor __construct (line 330)

Constructor.

If an HTML source string (or file) is supplied, the class is instantiated with that source already loaded, so all that remains is to call get_text().

rcube_html2text __construct ([string $source = ''], [boolean $from_file = false], [boolean $do_links = true], [integer $width = 75], [string $charset = 'UTF-8'])
  • string $source: HTML content
  • boolean $from_file: Indicates $source is a file to pull content from
  • boolean $do_links: Indicates whether a table of link URLs is desired
  • integer $width: Maximum width of the formatted text, 0 for no limit
  • string $charset: Target character encoding for output text
blockquote_citation_ballback (line 625)

Callback function to correctly add citation markers for blockquote contents

  • access: public
void blockquote_citation_ballback ( $m)
  • $m
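In plain-text mail, citation markers are conventionally a ">" at the start of each quoted line, doubled for nested quotes. The sketch below assumes that convention; quote_block() is a hypothetical helper, and the exact spacing may differ from what blockquote_citation_ballback() produces.

```php
<?php
// Hypothetical helper: prefix every line of blockquote text with '>'.
// Lines that are already quoted get '>' with no space, so nesting reads
// as '>>', while other lines get '> '.
function quote_block($text)
{
    $lines = explode("\n", trim($text));
    foreach ($lines as $i => $line) {
        if ($line === '' || $line[0] === '>') {
            $lines[$i] = '>' . $line;
        } else {
            $lines[$i] = '> ' . $line;
        }
    }
    return implode("\n", $lines);
}

$quoted = quote_block("Hello\n\n> earlier reply\n");
// "> Hello\n>\n>> earlier reply"
```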
get_text (line 366)

Returns the text, converted from HTML.

  • return: Plain text
string get_text ()
pre_preg_callback (line 661)

Callback function for preg_replace_callback() use in the PRE content handler.

  • access: public
string pre_preg_callback (array $matches)
  • array $matches: PREG matches
print_text (line 378)

Prints the text, converted from HTML.

void print_text ()
set_allowed_tags (line 388)

Sets the allowed HTML tags to pass through to the resulting text.

Tags should be in the form "<p>", with no corresponding closing tag.

void set_allowed_tags ([string $allowed_tags = ''])
  • string $allowed_tags: Tags to allow, each in the form "<p>"
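The "<p>" form matches the second argument of PHP's strip_tags(), which is presumably where the value ends up when the remaining tags are stripped. A minimal illustration of that built-in behaviour:

```php
<?php
// strip_tags() takes the allowed tags as a string of opening tags such
// as '<p>'; both <p> and </p> are then kept, everything else is removed.
$allowed = '<p>';
$out = strip_tags('<p>Hi <b>there</b></p>', $allowed);
// $out === '<p>Hi there</p>'
```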
set_base_url (line 398)

Sets a base URL to handle relative links.

void set_base_url ([string $url = ''])
  • string $url: Base URL that relative links should resolve to
set_html (line 349)

Loads source HTML into memory, either from $source string or a file.

void set_html (string $source, [boolean $from_file = false])
  • string $source: HTML content
  • boolean $from_file: Indicates $source is a file to pull content from
tags_preg_callback (line 638)

Callback function for preg_replace_callback use.

  • access: public
string tags_preg_callback (array $matches)
  • array $matches: PREG matches
_build_link_list (line 509)

Helper function called by preg_replace() on link replacement.

Maintains an internal list of links to be displayed at the end of the text, with numeric indices to the original point in the text they appeared. Also makes an effort at identifying and handling absolute and relative links.

  • access: protected
void _build_link_list (string $link, string $display)
  • string $link: URL of the link
  • string $display: Part of the text to associate number with
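The numbered-reference scheme described above can be sketched as follows. build_link() and render_link_list() are hypothetical helpers; the "[n]" marker format, the "Links:" footer, the reuse of an index for a repeated URL and the scheme test used to tell absolute from relative URLs are all assumptions rather than the class's actual output.

```php
<?php
// Hypothetical stand-in for _build_link_list(): records the URL and
// returns the display text with a numeric marker.
function build_link(array &$links, $url, $display, $base_url = '')
{
    // Treat URLs with a scheme (http:, https:, mailto:, ...) as absolute.
    if (!preg_match('!^[a-z][a-z0-9+.-]*:!i', $url)) {
        $url = rtrim($base_url, '/') . '/' . ltrim($url, '/');
    }
    // Reuse the existing index if the same URL appeared before.
    $index = array_search($url, $links, true);
    if ($index === false) {
        $links[] = $url;
        $index = count($links) - 1;
    }
    return $display . ' [' . ($index + 1) . ']';
}

// Renders the collected URLs as a table appended after the text.
function render_link_list(array $links)
{
    $out = "\n\nLinks:\n------\n";
    foreach ($links as $i => $url) {
        $out .= '[' . ($i + 1) . '] ' . $url . "\n";
    }
    return $out;
}

$links = array();
$a = build_link($links, 'http://example.com/', 'Example');      // "Example [1]"
$b = build_link($links, '/docs', 'Docs', 'http://example.com'); // "Docs [2]"
$c = build_link($links, 'http://example.com/', 'Again');        // "Again [1]"
$footer = render_link_list($links);
```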
_convert (line 421)

Workhorse function that performs the actual conversion (calls the _converter() method).

  • access: protected
void _convert ()
_converter (line 453)

Workhorse function that performs the actual conversion.

First performs the custom tag replacement specified by the $search and $replace arrays, then strips any remaining HTML tags, reduces whitespace and newlines to a readable format, and word-wraps the text to $width characters.

  • access: protected
void _converter (string &$text)
  • string &$text: Reference to HTML content string
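The order described above can be condensed into a small pipeline. mini_html2text() is a hypothetical function: its pattern lists are subsets of the documented arrays, it maps "&amp;" straight to "&" instead of using the placeholder, and the blank-line cleanup in step 4 is an assumed way of reducing whitespace and newlines to a readable format.

```php
<?php
// Hypothetical condensed pipeline in the order described for _converter().
function mini_html2text($html, $width = 70)
{
    // 1. Custom tag replacement (subset of $search / $replace).
    $text = preg_replace(
        array("/[\n\t]+/", '/<p[^>]*>/i', '/<br[^>]*>/i'),
        array(' ', "\n\n", "\n"),
        $html
    );
    // 2. Strip any remaining HTML tags.
    $text = strip_tags($text);
    // 3. Entities (subset of $ent_search / $ent_replace, no placeholder).
    $text = preg_replace(
        array('/&(nbsp|#160);/i', '/&(amp|#38);/i', '/[ ]{2,}/'),
        array(' ', '&', ' '),
        $text
    );
    // 4. Collapse whitespace-only lines and runs of blank lines.
    $text = preg_replace("/\n\s+\n/", "\n\n", $text);
    $text = preg_replace("/\n{3,}/", "\n\n", trim($text));
    // 5. Word-wrap unless the width is 0 or less.
    return $width > 0 ? wordwrap($text, $width) : $text;
}

$plain   = mini_html2text('<p>Fish &amp; chips</p><p>are   tasty</p>', 0);
$wrapped = mini_html2text('<p>The quick brown fox</p>', 10);
```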
_convert_blockquotes (line 572)

Helper function for BLOCKQUOTE body conversion.

  • access: protected
void _convert_blockquotes (string &$text)
  • string &$text: HTML content
_convert_pre (line 544)

Helper function for PRE body conversion.

  • access: protected
void _convert_pre (string &$text)
  • string &$text: HTML content

Documentation generated on Fri, 03 May 2013 12:45:00 +0200 by phpDocumentor 1.4.4