Online OCR

I was recently thinking about how to intelligently speed up the creation of image maps with my online utility. Often users try to map images like this:

The user will select "Home" and try to map it, giving it a URL and adding an alt attribute. Now what if we could prefill the alt attribute with the text "Home"? Optical character recognition (OCR) in JavaScript? Wow, that would rock!
Well, technically we can access pixel data with the advent of the canvas element, but since OCR is a processor-heavy operation, as far as I can see today (or better to say, as far as Google can see), no one has spent the time to write a JavaScript-based OCR yet. :)
Now what if we use some server-backed online service?
Technically speaking again, it is possible to crop one part of the image, get the image data, post it to the online OCR, and parse the result. Unfortunately these online services didn't produce very good results with the test image above. I know, it is a noisy JPG and too small, but still, it is a real-life example.
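
To make the "crop one part of the image" step a bit more concrete, here is a minimal sketch. It assumes the pixels are in the RGBA layout that the canvas getImageData() call returns (4 bytes per pixel, row by row); the cropPixels name and the shape of the rect argument are made up for illustration. Posting the result to an OCR service is left out on purpose, since that part depends entirely on the service.

```javascript
// Hypothetical helper: copy a rectangular region out of RGBA pixel data laid
// out the way canvas getImageData() returns it (4 bytes per pixel, row by row).
// `data` is expected to be a typed array such as Uint8ClampedArray.
function cropPixels(data, width, height, rect) {
  const { x, y, w, h } = rect;
  if (x < 0 || y < 0 || w <= 0 || h <= 0 || x + w > width || y + h > height) {
    throw new RangeError('crop rectangle is outside the image');
  }
  const out = new Uint8ClampedArray(w * h * 4);
  for (let row = 0; row < h; row++) {
    // Offset of the first byte of this row inside the source image.
    const start = ((y + row) * width + x) * 4;
    out.set(data.subarray(start, start + w * 4), row * w * 4);
  }
  return out;
}
```

In a browser, getImageData() can be given the selection rectangle directly and does the same job; the loop above only shows what the crop amounts to.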

Anyway, here comes the list of online OCR services that I found, in order from best to worst, with the result of the process in brackets:


The slow echo

...so I was about to profile a web application written in PHP, and as all of us do when doing so, I scattered echoes of microtime differences around the code. (OK, OK, some of you might use a log instead.)
To my great surprise, the part where the application showed a significant slowdown in some cases was a simple echo statement. Well, I won't be able to optimize that, will I? Why can this be happening?
After a bit of googling, I came across this site, where they suggest splitting the echo into smaller chunks to avoid network fragmentation. Hmm, I gave it a try, but no difference. It is a rather old post, so I scrolled down through the comments (most of them get caught up in the question of determining the MTU and completely miss the point) to see if people still face this problem, and what their solutions are. The very last comment (at that time) suggested simply using output buffering.

Well, my application was already using ob, but as I inspected the code more closely, I found that one of the modules included early in the page generation simply switched ob off!
Damn, switching it back on sped up the application again!

So what is happening in the background?
I am not sure about the implementation of echo, but I guess that when echoing a large string over a slow network without output buffering, echo waits until it has sent the last byte. In other words, the network latency comes down to the PHP level.
However, if you use output buffering, you can generate the whole page quickly and then have the web server send it to the client over the network; all the latency is offloaded from PHP and will not show up in the execution time.
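
The guess above can be put into a toy model. This is not PHP and says nothing about how echo is really implemented; it is a small JavaScript sketch in which every write() on a made-up "slow sink" stands for one point where the script would have to wait on the network. All names here (makeSlowSink, makeOutputBuffer, renderPage) are hypothetical.

```javascript
// Hypothetical model of a slow client connection: every write() call stands
// for one point where the script has to wait on the network.
function makeSlowSink() {
  const received = [];
  return {
    write(chunk) { received.push(chunk); },
    get blockingWrites() { return received.length; },
    get body() { return received.join(''); },
  };
}

// Hypothetical output buffer: collects chunks in memory and hands them to the
// sink in a single write when flush() is called.
function makeOutputBuffer(sink) {
  let parts = [];
  return {
    write(chunk) { parts.push(chunk); },
    flush() { sink.write(parts.join('')); parts = []; },
  };
}

// Page generation that "echoes" three chunks to whatever writer it is given.
function renderPage(out) {
  for (const chunk of ['<html>', '<body>Hello</body>', '</html>']) {
    out.write(chunk);
  }
}

const unbuffered = makeSlowSink();
renderPage(unbuffered);            // three separate waits during generation

const buffered = makeSlowSink();
const ob = makeOutputBuffer(buffered);
renderPage(ob);                    // no waits while the page is generated
ob.flush();                        // one hand-off at the very end

console.log(unbuffered.blockingWrites, buffered.blockingWrites); // prints "3 1"
```

Both sinks end up with the same page; the only difference is how many times page generation had to stop and wait.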

To sum up, output buffering is your true friend.


SugarCRM redundant indexes

...OK, so I was doing some database tuning recently to speed up one of our deployed instances of SugarCRM, when I became aware of an interesting phenomenon.

The accounts table not only had a primary key (id), which is normal, but also had a composite key (id, deleted) named idx_accnt_id_del. Now wait a minute! Even if you perform a query with criteria on both id AND deleted, id already points to a single record in the table, since it is a unique primary key. Thus the composite key makes absolutely no sense!
Mysqlperformanceblog, which I have been reading a lot lately, confirms my findings.
I didn't try any of the tools they mention in the comments section, but quickly searching for "id_del" in the Sugar vardef files reveals that the following tables have the same problem:
- roles
- acl_actions
- acl_roles
- fields_meta_data

These tables are usually small and don't have frequent insert operations, so it is not a big performance issue, but the aforementioned accounts table can grow big, and maintaining an unnecessary index with a length of 38 bytes is just a burden on the database. The redundant index is present in an older Sugar 4.5 and in the latest 5.0.0f, too.
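
The reasoning about the composite key can be written down as a tiny check. This is a deliberately simplified sketch, not one of the tools mentioned on the blog: it only flags an index when the columns of a unique key form a prefix of it, and it ignores other reasons an index might exist. The isRedundantIndex name and its arguments are made up for illustration.

```javascript
// Hypothetical helper: an index is treated as redundant when the columns of a
// unique key (for example the primary key) form a proper prefix of the index
// columns, because the unique prefix already pins down a single row.
function isRedundantIndex(indexColumns, uniqueKeys) {
  return uniqueKeys.some((key) =>
    key.length > 0 &&
    key.length < indexColumns.length &&
    key.every((column, i) => indexColumns[i] === column)
  );
}

const primaryKey = ['id'];
console.log(isRedundantIndex(['id', 'deleted'], [primaryKey]));   // true
console.log(isRedundantIndex(['name', 'deleted'], [primaryKey])); // false
```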

UPDATE: I posted it on the Sugar forums, let's see the feedback :)


Web applications going offline

...so you still remember the times when the Internet was abuzz with desktop applications going online, huh? Email was one of the earliest obvious implementations, slowly followed by the more difficult applications: online document, spreadsheet and presentation editing, agenda, todo list, contact management, image editing, imagemap editing ;), etc. And you started to use one or more of these online wonders, and you were happy. Obviously, sooner or later you realized you are pretty much stuck when you are offline. Application developers also had this in mind, and now in 2008 we have a few handy tools to make some of these applications work in offline mode.

The trend might seem ironic, as applications that were desperate to go online now try to crawl back to your desktop, but in reality this only means progress. We benefit hugely from the interaction of the web and the desktop. I played for some hours with Adobe AIR (Gears and BrowserPlus are still waiting to be discovered, but they also look great), and as I see it, most applications can gain a lot from options like drag and drop, file operations, and local storage.

Currently Gears and AIR offer rather different approaches to offline working. Which one will be the winner? Only the future can tell; both seem very strong at the moment.



HAI! Seriously, did you know there is a programming language called LOLCODE? I did not, but apparently it is very popular; among the implementations we can even find PHP and JavaScript parsers.

Hello Hai World Example:


Firefox 3 changes in file input

...so try to access a file input's value property in FFX3. What do you get? The filename. What do you get in IE and FFX2? The filename AND the path.
Well, if your script has been relying on this behaviour, you might find yourself in trouble. So far the only solution I found hides in the comments section of this post (which, by the way, is also a great showcase of the new functionality).

To save you the time, you have to do the following:
- go to about:config page
- swear that you won't make any trouble :)
- set signed.applets.codebase_principal_support to true (search for "applet" in the quick search box)

And in your script something like this:

<form name="myform">
<input name="uploadfile" type="file" onchange="
  if (document.myform.uploadfile.files) {
    // FFX3: try to get access to the full path
    try {
      netscape.security.PrivilegeManager.enablePrivilege('UniversalFileRead');
    } catch (err) {
      // signed.applets.codebase_principal_support needs to be set to true
    }
  }
  // copy whatever the browser reports into the hidden field
  document.myform.file_path.value = document.myform.uploadfile.value;">
<input name="file_path" type="hidden">
</form>

Now whenever the onchange script runs, FFX3 will display a dialog window where you can confirm access to the full file path.
(Sorry about the formatting, I am still examining what the best way is to post code in Blogger.)

UPDATE: Another approach can be found here.
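
If a script only needs the filename and not the path, the difference between the browsers can also be smoothed over on the script side. A minimal sketch, assuming the value is either a bare filename or a path with Windows or Unix separators; fileNameOf is a made-up name:

```javascript
// Hypothetical helper: return only the filename part of a file input's value,
// whether the browser reported a bare name or a full Windows/Unix path.
function fileNameOf(value) {
  const lastSeparator = Math.max(value.lastIndexOf('/'), value.lastIndexOf('\\'));
  // lastSeparator is -1 when there is no path, so slice(0) keeps the whole value.
  return value.slice(lastSeparator + 1);
}

console.log(fileNameOf('C:\\Documents\\photo.jpg')); // photo.jpg
console.log(fileNameOf('/home/user/photo.jpg'));     // photo.jpg
console.log(fileNameOf('photo.jpg'));                // photo.jpg
```

This does not bring back the full path in FFX3, it only makes the three browsers agree on the part they all provide.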


PHP RSS parsers

...so the other day I figured out that the parser I had used so far (LastRss) was not able to read Atom feeds. Since our beloved Blogger only publishes Atom feeds, I needed to find a cure for the problem.
One easy solution is to ask FeedBurner to convert between RSS and Atom, which works pretty well, but better not to mess with the Gods; let's do it the proper way.
Then I came across this blog entry, which compared some of the existing solutions. In the end I chose SimplePie, which has worked brilliantly ever since. To give an overview of the solutions, here is my little comparison:

+ simple
+ fast
+ small footprint
- no support for Atom
- website looks interesting recently (says: It works!)

+ part of Zend Framework
+ supports RSS and Atom
+ well documented
- supports PHP5 only

+ supports RSS and Atom ("with few exceptions" :))
+ long time out there
- long time out there :) seems like it has never really grown up

+ seems like it supports everything
+ well documented
+ fresh and stable project
- one heavy include file, if it bothers you (350K)

XML_RSS (Pear)
- I am not sure, but I think it only supports RSS feeds
- Pear dependencies
- "Oldest open bug: 326 days"

- supports RSS only
- PHP 5 only (I know, I know, we should all live in a PHP5 world, but we don't)
- "A commercial version (v3) of the RSS Parser / XML Parser for PHP [rss_php] is now released and available for download. This release fixes a couple of tiny bugs and adds far more functionality...Our original version (RSS_PHP v1) is still freely available." - so you can get a buggy version for free, while the commercial version costs 15 USD.
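
The RSS versus Atom question that started all this comes down to the root element of the feed: RSS 0.9x/2.0 feeds start with an rss element, RSS 1.0 feeds with rdf:RDF, and Atom feeds with feed. A rough sketch of telling them apart before picking a parser; detectFeedType is a made-up name, and matching text like this is only an illustration, not how any of the parsers above work:

```javascript
// Hypothetical helper: guess the feed format from the root element of the XML.
// A sketch only; a real parser would parse the document instead of matching text.
function detectFeedType(xml) {
  // Drop the XML declaration, processing instructions and comments, then
  // capture the name of the first real element (a DOCTYPE is skipped because
  // "<!" does not start an element name).
  const cleaned = xml.replace(/<\?[\s\S]*?\?>|<!--[\s\S]*?-->/g, '');
  const match = /<([A-Za-z_][\w:.-]*)/.exec(cleaned);
  if (!match) return 'unknown';
  switch (match[1].toLowerCase()) {
    case 'rss': return 'rss';       // RSS 0.9x / 2.0
    case 'rdf:rdf': return 'rss';   // RSS 1.0 (RDF based)
    case 'feed': return 'atom';     // Atom
    default: return 'unknown';
  }
}

console.log(detectFeedType('<?xml version="1.0"?><rss version="2.0"></rss>'));        // rss
console.log(detectFeedType('<feed xmlns="http://www.w3.org/2005/Atom"></feed>'));     // atom
```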


Evolution of Gmail chat

Just a short overview of how Gmail chat has evolved since its first appearance. And it has to be said, it works very well. However, I have become a fan of the Google Talk Labs edition. I don't know since when it has had this feature, but I just love that it fetches YouTube videos and Picasa albums straight into the chat window, so I don't have to visit the site itself. Very well done!