Alex (SirShurf) Frenkel's Blog

A web log of a PHP professional

Problem with UTF, Excerpts, AND GOOD CODING PRACTICES!

I am sorry if what I am about to write offends anybody, but after more than 2 days of debugging… I am getting frustrated with this.

I have no problems with plugins not working with Hebrew, since this language is not that common, but for crying out loud, PLEASE pretty PLEASE use common sense when programming!

I will show a simple example of what I mean. It is only an example; unfortunately the same problem exists in other plugins (I have seen the same general programming problems in Ultimate Facebook too).

This plugin creates the description text for the “META-Description” tag.
The problem I face in the current version is that it does not strip out “Caption” tags, and it splits multi-byte strings in the middle of a letter (it uses substr, which is not multibyte safe).
I am not even asking WHY the programmer did not use a built-in function to get an excerpt of a set length, or add ANY hooks to his plugin!!!

Here is the code used:

First, let's go to wds_onepage.php, which is responsible for printing the meta tags to the client. In it we find the following code:

if (is_singular()) {
    $metadesc = wds_get_value('metadesc');
    if ($metadesc == '' || !$metadesc) {
        $metadesc = wds_replace_vars($wds_options['metadesc-'.$post->post_type], (array) $post );
    }
}

You can see that we first try to get the metadesc value, and if it is empty we build it from the metadesc option for the post type.
Let's trace “wds_replace_vars” to its source; we find it in wds_core.php.
I will not paste the full function here, to save some space, but here is the interesting part:
'%%excerpt%%' => !empty($r['post_excerpt']) ? apply_filters('get_the_excerpt', $r['post_excerpt']) : substr(wp_trim_excerpt($r['post_content']), 0, 155),

We check IF an excerpt exists; if not, we create one using “wp_trim_excerpt” and cut it with substr (not MB safe!!!).
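To make the substr problem concrete, here is a minimal sketch in plain PHP (no WordPress needed). It assumes the mbstring extension is loaded and that the file is saved as UTF-8; the sample text and the cut length are made up for the demonstration.

```php
<?php
// Hebrew sample text; each Hebrew letter takes 2 bytes in UTF-8.
$text = 'שלום עולם';

// Byte-based cut: 5 bytes ends in the middle of the third letter.
$byteCut = substr($text, 0, 5);

// Character-based cut: 5 characters, always on a letter boundary.
$charCut = mb_substr($text, 0, 5, 'UTF-8');

// Compare bytes against characters for the same string.
var_dump(strlen($text), mb_strlen($text, 'UTF-8'));

// Check whether each cut is still well-formed UTF-8.
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```

The byte-based cut leaves half a letter at the end of the string, which is exactly what happens to a Hebrew description cut at a fixed byte offset.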
The first part of the problem is easily fixed: all I need to do is hook into WP's “wp_trim_excerpt” filter:

add_filter( 'wp_trim_excerpt', array($objClass,'seo'));
...
function seo($strText){

    $strText = strip_shortcodes( $strText );
    $strText = strip_tags ( $strText );

    return $strText;
}

That way we get text only, and not all of the HTML junk the plugin is feeding GOOGLE.

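But now I am getting a blank description, so let's trace it further. We have stripped the text and cut it, not MB safe, to 155 bytes. The next thing we hit is the line that prints the tag, which is essentially:

echo "\t".'<meta name="description" content="'.esc_attr(strip_tags($metadesc)).'" />'."\n";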

OUCH. Now we are stripping tags and calling esc_attr on our cut text. The first is pointless, since the HTML was cut at 155 bytes, possibly in the middle of a tag, so strip_tags cannot do its job properly; but that I have fixed using the hook above (I am stripping tags on the original text, before the substr call).
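The difference between the two orders can be shown without WordPress. This is only a sketch: the markup and the 30-byte limit are invented for the demonstration, and only the built-in substr and strip_tags are used.

```php
<?php
// Hypothetical post content with markup in front of the visible text.
$html = '<p class="intro"><a href="http://example.com/some/long/path">Shalom</a> world</p>';
$limit = 30; // byte limit, kept small for the demonstration

// Order used by the plugin: cut first, strip afterwards.
// The cut lands inside the <a ...> tag, before any visible text is reached.
$cutThenStrip = strip_tags(substr($html, 0, $limit));

// Order used in the hook above: strip first, cut afterwards.
$stripThenCut = substr(strip_tags($html), 0, $limit);

var_dump($cutThenStrip);
var_dump($stripThenCut);
```

When the cut comes first, the whole byte budget is spent on markup, so the visible text never makes it into the description.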

But the biggest problem is the esc_attr call, since it checks that the text is VALID UTF-8 and returns an empty string if it is not. Since we cut the text in the middle of a letter, it is NOT valid UTF-8, and that is why the description comes out blank!
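Here is a rough model of that failure in plain PHP. The function escape_or_blank is a hypothetical stand-in I made up for illustration; it is NOT WordPress code, only a simplified imitation of an escaping helper that rejects malformed UTF-8.

```php
<?php
/**
 * Hypothetical stand-in for an escaping helper that rejects invalid UTF-8.
 * Not WordPress code; a simplified model for illustration only.
 */
function escape_or_blank(string $text): string
{
    // With the /u modifier, preg_match() fails on a malformed UTF-8 subject
    // instead of returning 1.
    if (preg_match('//u', $text) !== 1) {
        return '';
    }
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$description = 'תיאור קצר של הפוסט';

// 7 bytes: three whole letters plus half of the fourth one.
$brokenCut = substr($description, 0, 7);

// 6 bytes: exactly three letters, so the string stays valid.
$cleanCut = substr($description, 0, 6);

var_dump(escape_or_blank($brokenCut)); // the whole value is thrown away
var_dump(escape_or_blank($cleanCut));  // the three letters survive
```

One stray byte at the end is enough to lose the whole description, not just the last letter.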

To fix that, let's change the hook above:

add_filter( 'wp_trim_excerpt', array($objClass,'seo'));
...
function seo($strText){

    // Remove shortcodes and HTML before anything gets cut.
    $strText = strip_shortcodes( $strText );
    $strText = strip_tags ( $strText );
    $strText = apply_filters('the_content', $strText);
    $strText = str_replace(']]>', ']]&gt;', $strText);
    // Shorten to whole words (the limit is a word count).
    $excerpt_length = apply_filters('excerpt_length', 55);
    $excerpt_more = apply_filters('excerpt_more', ' ' . '[...]');
    $strText = wp_trim_words( $strText, $excerpt_length, $excerpt_more );

    return $strText;
}

This time I am responsible for shortening the text myself, using wp_trim_words in order to have only full words in the description (Google doesn't like partial words). Note that wp_trim_words counts words, not characters, so the 55 here is a word limit; it has to be low enough for the result to fit in the 155 bytes that will be cut later.
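Since a word limit does not guarantee a byte limit, one option is to keep whole words only while they fit in the byte budget. The helper below is a hypothetical sketch in plain PHP (fit_words_to_bytes is my own name, not a WordPress function), assuming the later substr cut is at 155 bytes.

```php
<?php
/**
 * Hypothetical helper: keep whole words from $text while the result
 * still fits in $maxBytes bytes.
 */
function fit_words_to_bytes(string $text, int $maxBytes): string
{
    // Split on whitespace; the /u modifier treats the input as UTF-8.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return ''; // malformed UTF-8 input, nothing safe to return
    }

    $result = '';
    foreach ($words as $word) {
        $candidate = ($result === '') ? $word : $result . ' ' . $word;
        // strlen() counts bytes, which is what substr() will count later.
        if (strlen($candidate) > $maxBytes) {
            break;
        }
        $result = $candidate;
    }
    return $result;
}

// Small budget for the demonstration; the plugin's cut is at 155 bytes.
$short = fit_words_to_bytes('שלום עולם זה תיאור', 20);
var_dump($short, strlen($short));
```

Inside the hook this could run on the output of wp_trim_words with a budget of 155, so the later substr call has nothing left to cut. If even the first word is longer than the budget, the helper returns an empty string.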

This has taken care of the first part, the client side. But what about the Admin part?

Well, this is the one that got me to write this post: in the admin part, for some reason, the text is not created with the same function at all!!!!

The parameter here is $desc in wds-core-metabox.php, and it is obtained by this code:

$desc = wds_get_value('metadesc');
if (empty($desc))
    $desc = substr(strip_tags($post->post_content), 0, 130).' ...';
if (empty($desc))
    $desc = 'temp description';

Here there are NO hooks and no methods for me to catch.
As of writing this post, I have not yet found a way to bypass this, and ANY ideas will be welcome.

In any case, my fix for this plugin will be available at this location as soon as wordpress.org approves it:
http://wordpress.org/extend/plugins/wpmu-dev-seo-addon/

Written by Alex (Shurf) Frenkel

February 20, 2012 at 5:22 pm
