How to truncate rich text strings with HTML tags in PHP

Free de bike 2022-05-14 14:40:53

Preface

Truncating a string is a common operation in development. In PHP it is very convenient: the mb_substr function is all you need.
But that only works for plain strings. To truncate a rich text string that contains HTML tags, you cannot simply call this function.
Most HTML tags come in pairs. The cut must not leave a pair of tags unclosed, and it must not fall inside a tag itself; otherwise the resulting markup is broken.
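
To make the problem concrete, here is a minimal sketch of what goes wrong with a naive cut. The sample markup is hypothetical and only serves as an illustration; it assumes the mbstring extension is available.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<p>Hello <strong>world</strong></p>';

// Naive approach: mb_substr counts the characters of the tags as well,
// so the cut lands in the middle of the <strong> tag.
$naive = mb_substr($html, 0, 12);
echo $naive, PHP_EOL; // <p>Hello <st

// The visible text is much shorter than the markup that carries it.
$visible = strip_tags($html);
echo mb_strlen($visible), PHP_EOL; // 11
```

The truncated result has an unclosed `<p>` and a broken `<strong>` tag, which is exactly the situation a tag-aware truncation has to avoid.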

Code

To solve this problem, I use the DOMDocument class (which requires the libxml extension) to truncate HTML strings. The code is as follows:

<?php
class HtmlText
{
    /**
     * Recursively walk all descendants of a node in document order,
     * calling $callable on each of them.
     */
    private static function iterateDOMNodes(DOMNode $domNode, callable $callable)
    {
        foreach ($domNode->childNodes as $node) {
            $callable($node);
            if ($node->hasChildNodes()) {
                static::iterateDOMNodes($node, $callable);
            }
        }
    }

    /**
     * Truncate a string that contains HTML tags.
     *
     * @param string $string String containing HTML tags
     * @param int    $limit  Number of characters to keep
     * @param array  $option Options
     *  - ellipsis: string appended after the cut, default '...'
     *  - strip_attr: whether to remove tag attributes, default false
     *
     * @return string Truncated string
     *
     * @throws DOMException
     */
    public static function truncate(string $string, int $limit, array $option = []): string
    {
        $default = [
            'ellipsis' => '...',
            'strip_attr' => false,
        ];
        $option = $option + $default;
        $oriDoc = new DOMDocument();
        // Convert the string to HTML-ENTITIES so that Chinese text is not garbled.
        // Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
        $convertedString = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');
        @$oriDoc->loadHTML($convertedString, LIBXML_HTML_NODEFDTD);
        $newDoc = new DOMDocument();
        $newDocXPath = new DOMXPath($newDoc);
        // Copy nodes into $newDoc one by one until the character budget is used up.
        static::iterateDOMNodes($oriDoc, function ($oriNode) use ($newDoc, $newDocXPath, $option, &$limit) {
            if ($limit <= 0) {
                return;
            }
            switch ($oriNode->nodeType) {
                case XML_TEXT_NODE:
                    $oriNodeVal = $oriNode->nodeValue;
                    // Skip text nodes that contain only whitespace.
                    if (preg_match('/^[\s\xa0]+$/u', $oriNodeVal)) {
                        return;
                    }
                    $oriNodeVal = str_replace(["\r\n", "\n"], '', $oriNodeVal);
                    // Leading and trailing whitespace does not count toward the limit.
                    if (preg_match('/^([\s\xa0]*)(\S.*\S)([\s\xa0]*)$/u', $oriNodeVal, $matched)) {
                        $preBlank = $matched[1];
                        $strNeedCut = $matched[2];
                        $sufBlank = $matched[3];
                    } else {
                        $preBlank = '';
                        $strNeedCut = $oriNodeVal;
                        $sufBlank = '';
                    }
                    $strLength = mb_strlen($strNeedCut);
                    if ($strLength >= $limit) {
                        // The budget runs out in this node: cut and append the ellipsis.
                        $strNeedCut = mb_substr($strNeedCut, 0, $limit);
                        $strNeedCut = "{$strNeedCut}{$option['ellipsis']}";
                        $sufBlank = '';
                    }
                    $limit -= $strLength;
                    $tmp = "{$preBlank}{$strNeedCut}{$sufBlank}";
                    $newNode = $newDoc->createTextNode($tmp);
                    break;
                case XML_ELEMENT_NODE:
                    $newNode = $newDoc->createElement($oriNode->nodeName);
                    if (!$option['strip_attr'] && $oriNode->hasAttributes()) {
                        foreach ($oriNode->attributes as $attr) {
                            $newAttr = new DOMAttr($attr->nodeName, $attr->nodeValue);
                            $newNode->setAttributeNode($newAttr);
                        }
                    }
                    break;
                default:
                    return;
            }
            if (!$newDoc->hasChildNodes()) {
                $newDoc->appendChild($newNode);
            } else {
                // The parent's XPath is the "directory" part of the node's own path.
                $oriParentNodePath = pathinfo($oriNode->getNodePath(), PATHINFO_DIRNAME);
                $parentNode = $newDocXPath->query($oriParentNodePath)->item(0);
                $parentNode->appendChild($newNode);
            }
        });
        $newString = html_entity_decode($newDoc->saveHTML());
        // Remove tags that the parser added automatically.
        $checkTags = ['html', 'body', 'head', 'p'];
        foreach ($checkTags as $tag) {
            if (stripos($string, "<$tag>") === false
                && stripos($newString, "<$tag>") !== false) {
                $newString = str_replace(["<$tag>", "</$tag>"], '', $newString);
            }
        }
        return $newString;
    }
}
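
The least obvious step in the code above is how a copied node finds its parent: the XPath returned by getNodePath() is treated like a file path, and pathinfo() strips the last segment. The following is a minimal sketch of that trick in isolation. The fragment is hypothetical, and for brevity the parent is looked up in the same document rather than in a second one.

```php
<?php
// Hypothetical fragment, used only for illustration.
$doc = new DOMDocument();
@$doc->loadHTML('<div><p>one</p><p>two</p></div>', LIBXML_HTML_NODEFDTD);

// Take the text node inside the second <p>.
$textNode = $doc->getElementsByTagName('p')->item(1)->firstChild;

// getNodePath() returns an XPath expression that locates the node.
$nodePath = $textNode->getNodePath();

// Treating that expression like a file path, its "directory" is the parent.
$parentPath = pathinfo($nodePath, PATHINFO_DIRNAME);

// Resolve the parent path with XPath.
$xpath = new DOMXPath($doc);
$parent = $xpath->query($parentPath)->item(0);

echo $nodePath, PHP_EOL;
echo $parentPath, PHP_EOL;
echo $parent->nodeName, PHP_EOL;
```

The lookup only works across two documents when the copy has the same element structure as the original up to that point, which is why the elements are copied in document order.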

Test cases

Test case 1:

<?php
$text = <<<EOT
<div>
    <script src="jquery-2.1.1.min.js"></script>
    <p style="color: red;">
        <a href="#">床前明月光</a>
    </p>
    <p>疑是地上霜</p>
    <img src="jquery-2.1.1.min.js" alt=""/>
    <h2>举头望明月</h2>
</div>
EOT;
// Truncate to 8 characters
echo HtmlText::truncate($text, 8);

Output:

<div>
<script src="jquery-2.1.1.min.js"></script>
<p style="color: red;">
<a href="#">床前明月光</a>
</p>
<p>疑是地...</p>
</div>

Exactly 8 characters are kept (床前明月光 plus 疑是地), no HTML tag is cut, and the ... ellipsis is appended at the end. This is the expected behavior.

Test case 2:

$text = '这是一段没有标签的文本';
echo HtmlText::truncate($text, 8); // Output: 这是一段没有标签...

It also works on text without any HTML tags.

Test case 3:

$text = '这是一段只有一个<p>标签的文本';
echo HtmlText::truncate($text, 8); // Output: <p>这是一段只有一个...</p>

Uh… there is a problem with this test case: the <p> tag in the middle is gone, and a pair of p tags has been added around the text instead. It is not a big problem, though…
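
The result above comes from the way libxml's HTML parser repairs incomplete markup before any truncation happens. The following is a small sketch for inspecting the tree that the parser builds from a fragment with one stray <p> tag. The input string is hypothetical, and the exact repaired markup may differ between libxml versions, so no output is shown.

```php
<?php
// Hypothetical input: plain text with one stray <p> tag in the middle.
$doc = new DOMDocument();
@$doc->loadHTML('first part<p>second part', LIBXML_HTML_NODEFDTD);

// Print the markup that the parser actually built from the fragment.
$html = $doc->saveHTML();
echo $html;

// List the element names in document order.
foreach ($doc->getElementsByTagName('*') as $element) {
    echo $element->nodeName, PHP_EOL;
}
```

Because the wrapper elements are added by the parser, the truncate method can only guess afterwards which tags were present in the original string, and a lone <p> defeats that guess.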

Summary

After testing, most test cases produce correct results, which is good enough for my needs. The remaining small problems can be fixed later.

In addition, the open source framework CakePHP also provides a utility for truncating HTML rich text: the truncate method of its Text class. See GitHub for details.

Copyright notice: this article was written by [Free de bike]. Please include a link to the original when reposting. Thank you. https://qdmana.com/2022/134/202205141436001109.html