Wednesday, November 30, 2011

Array extras and Objects

When Array extras landed in JavaScript 1.6 I had, probably together with many other developers, one of those HOORRAYYY moments ...
What many libraries and frameworks out there still implement is this sort of universal each method that is supposed to be compatible with both Arrays and Objects.

A Bit Messed Up

What I have never liked much about these each methods is that we always have to know in advance whether the object we are passing is an Array, an ArrayLike one, or an Object.
In the latter case, the callback passed as second argument will receive the key, and not the index, as its own second argument, which simply means we cannot trust a generic callback unless it checks the type of the second argument for each iterated item, or unless we don't care about the second argument at all.
In any case, I have always found this bad design. If we think about events, for example, it's totally natural to expect a single argument as the event object and then act accordingly.
This lets us reuse callbacks for similar purposes and keep our code DRY.
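
As a trivial illustration of the point, the same single-argument handler can serve different event sources (the button and form references here are hypothetical):

// one handler, one argument, many sources
function onUserAction(event) {
  console.log(event.type, event.target);
}

button.addEventListener("click", onUserAction, false);
form.addEventListener("submit", onUserAction, false);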

Need For An Object#forEach

All implementations of each, with the only exception, as far as I know, of jQuery, which makes things even more complicated since we generally have to completely ignore the first argument, carry this sort of natural confusion inside the method.
If you take the underscore.js library, for example, you will note that there are two aliases for the each method, each itself and forEach, so it's more than clear to me that JS developers are missing an Array#forEach-like method to iterate over objects, rather than lists.
It must also be underlined that all these methods are somewhat error prone: what if the object we are passing has a length property that does not necessarily point to the number of items stored by index, as it would in an Array?
You may consider this an edge case, or an anti-pattern, but then you have to remember that functions in JavaScript are first-class objects.
Most likely all these methods will fail with functions passed as objects, whenever you decide that your function can be used as an object too.

var whyNot = function (obj) {
  /* marvelous stuff here */
  this.calls++;
  return this.doStuff(obj);
};
whyNot.calls = 0;
whyNot.doStuff = function (obj) {
  /* kick-ass method */
};

// the unexpected but allowed
whyNot = whyNot.bind(whyNot);

whyNot.length; // 1
whyNot[0]; // undefined

By design, the length of any function in JavaScript is read-only and means nothing in terms of Array iteration: it is simply the number of arguments the function declared in its declaration or definition as expression.
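
To make the failure concrete, here is a sketch of the kind of naive each many libraries more or less ship (hypothetical code, not taken from any specific library):

// a hypothetical, naive each that trusts the length property
function each(obj, callback) {
  if (typeof obj.length == "number") {
    // ArrayLike branch: indexes 0 ... length - 1
    for (var i = 0; i < obj.length; i++) {
      callback(obj[i], i, obj);
    }
  } else {
    // generic object branch: enumerable keys
    for (var key in obj) {
      callback(obj[key], key, obj);
    }
  }
}

var fn = function (wait) {};
fn.calls = 0;

each(fn, function (value, indexOrKey) {
  // one iteration only: value is undefined, indexOrKey is 0
  // fn.calls is never visited, because the length branch won
});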

WTF

Whether the above examples make sense or not, I am pro pattern exploration, and when a common method is not compatible with all scenarios, I simply think something went wrong or is missing in the language.
Thank gosh JS is freaking flexible, and with ES5 we can define prototype methods without affecting for(in) loops, hopefully simplifying our daily stuff.
Remember? With underscore or others we still have to know in advance if the passed object is an Array, an ArrayLike, or a generic object ... so what would stop us from simply choosing accordingly?

// Array or ArrayLike
[].forEach.call(genericArrayLike, callbackForArrays);

// generic object to iterate
({}).forEach.call(object, callbackForObjects);

An explicit choice in the above case is the fastest and most reliable way we have to do things properly. A DOM collection, as well as any array or arrayLike object, will use the native forEach, but we can still recycle callbacks designed to deal with value, key, and object, rather than value and index. This is the little experiment:

Object extras

The concept of each callback is exactly the same as the original, native Array callbacks, except everything is based on native functions available in all ES5-compatible desktop browsers and basically all mobile browsers, and easy to shim in all the others too old to deal with JS 1.6 or higher.
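
To give an idea of the approach, here is a minimal sketch of an Object#forEach, not the original gist, defined as non-enumerable and based on Object.keys so that for(in) loops are not affected:

// a minimal sketch, not the original gist
Object.defineProperty(Object.prototype, "forEach", {
  value: function forEach(callback, context) {
    // own enumerable keys only, courtesy of Object.keys
    Object.keys(this).forEach(function (key) {
      callback.call(context, this[key], key, this);
    }, this);
  }
});

The other extras presumably follow the same pattern, delegating to the corresponding Array method over Object.keys(this).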

Here are a couple of examples:

var o = {a: "a", b: "b", c: ""};

// know if all values are strings
o.every(function (value, key, object) {
  return typeof value == "string";
}); // true

// filter by content, no empty strings
var filtered = o.filter(function (value, key, object) {
  return value.length;
}); // {a:"a", b:"b"} // original object preserved

// loop through all values (plus checks)
o.forEach(function (value, key, object) {
  object === o; // true
  this === o; // true
  if (key.charAt(0) != "_") {
    doSomethingWithThisValue(value);
  }
}, o); // NOTE: all these methods respect the Array extras signature

// map a new object
var mapped = o.map(function (value, key, object) {
  return value + 1;
}); // {a:"a1", b:"b1", c:"1"} // original object preserved

// know if any value is "a"
o.some(function (value, key, object) {
  return value === "a";
}); // true

The reason reduce and reduceRight are not in the list is simple: which key should be preserved, the first of the list? There is no such thing as a "predefined for/in order" in JavaScript, plus these methods are more Array-related, so they are out of this experiment.

As Summary

Once minified and gzipped the gist weighs about 296 bytes, which is a ridiculous size compared with any application we deal with on a daily basis.
Especially forEach, but probably the others too, may become extremely handy and ... of course, since it uses the Object.keys method internally, this is gonna be compatible with Arrays too, but hey, the whole point was to make a clear distinction ;)


[edited]

The Misleading Signature

I don't know how many times I have spoken with jQuery developers, just because they are so common, who were convinced that native Array#forEach accepts the value as second argument.
I have always considered inverted signatures, whatever the API, bad both for performance, since there is no possibility to fall back on some native method, and for the learning curve, where newcomers learn that a generic each method must have the index as first argument.
Bear in mind that whenever we loop we are most likely interested in the value at that index or key, so the value should be the first, and if you like the only, argument passed through the procedure.
A completely ignored first argument is, once again and in my opinion, bad design for an API: we are stuck without native power, and it teaches that argument order is not relevant.
Well, the latter point in particular would be true if we had named arguments, but in JS nothing has been planned so far, and in ES6 the way we are going to name arguments is still under discussion.
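
Just to visualize the difference (the callbacks here are purely illustrative):

// native Array#forEach: value first, index second
[1, 2, 3].forEach(function (value, index, list) {
  // value is the item, index is its position
});

// jQuery.each: index first, value second
jQuery.each([1, 2, 3], function (index, value) {
  // inverted order: callbacks cannot be shared with native forEach
});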

Have fun with JS

Sunday, November 27, 2011

About Felix's Style Guide

Style guides are good as long as they are meaningful and a bit "open minded", because if one style is recognized as an anti-pattern then it must be possible to update or change it.

Felix's Node.js Style Guide is surely a meaningful one, but I hope it's open minded too.

Why This Post

Because I sometimes write node.js code as well and I would like to write code accepted by the community. Since developers may get too religious about guides, I don't want to explain all these points for each present or future node.js module I am going to write ... I post this once, I will eventually update it, but it's here and I can simply point at it any time.

Style Guide Side Effect

As has happened before with the older guide or the linter, a style guide may mislead developers, and if they are too religious and unable to think about some point, the quality of the resulting code may be worse rather than better.
I have already talked too much about JSLint in this blog so ... let's keep the topic on track and see what Felix may reconsider.

Quotes

Current status: Use single quotes, unless you are writing JSON
A common mistake for JavaScript developers, especially those with a PHP, Ruby, or other scripting language background, is to think that single quotes are different from double quotes ... sorry to destroy this myth, guys, but in JavaScript there is no difference between single and double quotes.
Once you get this, you might think that since by standard definition JSON uses double quotes, there is no reason to prefer single quotes over double.
What should this difference tell us, that one is JSON and one is JavaScript for node.js? Such an assumption is a bit ridiculous to me since JSON is JavaScript, nothing more, nothing less.
Moreover, I am not a big fan of HTML strings inside JS files, because of separation of concerns, and I'll come back to that, but in English, as well as in many other languages, the single quote inside a string is quite common, isn't it?
Accordingly, are you really planning to prefer 'it\'s OK' over "it's OK"?
And if the editor suggested in the guide itself is one with syntax highlighting support, shouldn't we simply trust the highlighting to recognize a string, rather than distinguish between JSON and non-JSON strings?
Shouldn't we write comfortably without silly rules, as Felix says in another point, so that we can choose whatever string delimiter we want according to the content of the string?
Last, but not least, for those convinced that in HTML there is any sort of difference between single and double quotes, I am sorry to destroy this myth once again: there is no difference between quotes in HTML either ... so, once again, choose what you want and focus on something other than the distinction between JSON and non-JSON strings ...
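
A quick check, if the myth needs busting in code too:

// single and double quotes build identical strings
'abc' === "abc";              // true
"it's OK" === 'it\'s OK';     // true
typeof 'abc' == typeof "abc"; // true, both are primitive strings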

Variable declarations

Current status: Declare one variable per var statement, it makes it easier to re-order the lines. Ignore Crockford on this, and put those declarations wherever they make sense.
If there is one thing I have always agreed with Crockford on, it is this one: variables declared at the beginning of the function, with the exception of function declarations, which should come before variable declarations, not after.
The reason is quite simple: wherever we declare a variable, it will be available from the very beginning of the scope, so what place could be better than the beginning of the scope itself to instantly recognize all local variables, rather than scrolling the whole closure till the end in order to find the name that completely screwed up the logic in the middle because it is shadowing something else inside or outside the scope?
It's that easy: I don't want to read potentially thousands of lines of code to understand which variable has been defined where, I want a single bloody place to understand what is local and what is not.
The only exception to this rule may be some temporary variable, such as the classic i used in for loops, or some tmp reference still common in for loops.

// the exception ... in the middle of the code

for (var i = 0, length = something.length, tmp; i < length; ++i) {
  tmp = something[i];
  // do something meaningful
}

Chances that the above loop will destroy our logic because of a shadowed i or tmp reference are really rare, and we should use meaningful variable names in any case.
To summarize, there are only advantages to declaring variables on top, and until we have let statements it is only risky, and dangerous for our business, to define variables in the wild.

// how you may write code
(function () {
  return true;
  var inTheWild = 123;
  function checkTheWild() {
    return inTheWild === 123;
  }
}());

// how JS works in any case
(function () {

  // function declarations instantly available in the scope
  function checkTheWild() {
    return inTheWild === 123;
  }

  // any var declaration instantly available as undefined
  var inTheWild;

  return true;

  // no matter where or whether we assign the value,
  // inTheWild is already available everywhere,
  // with undefined as default value,
  // even before this assignment
  inTheWild = 123;

  // shadows should be easy to track
  // the top of the function is the right place to track them
}());


Constants

Current status: Constants should be declared as regular variables or static class properties, using all uppercase letters.

Node.js / V8 actually supports mozilla's const extension, but unfortunately that cannot be applied to class members, nor is it part of any ECMA standard.

I could not disagree more here ... the meaning of a constant is something immutable, and the reason we use constants is to trust that they are immutable.
Why on earth avoid a language feature when it provides exactly what we need? Don't we want more reliable code? And what about security?
In PHP we cannot define a constant object property either, and that does not mean we should avoid using the define function, does it?
Use const as much as you want, as long as the code does not have to run cross-platform (browsers included), and if you want to define constants on objects, use the JavaScript equivalent to do that.

function define(object, property, value) {
  Object.defineProperty(object, property, {
    get: function () {
      return value;
    }
    // if you really want to throw an error
    // , set: function () {throw "unable to redefine " + property;}
  });
}

var o = {};
define(o, "TEST", 123);
o.TEST = 456;
o.TEST; // 123

I have personally proposed cross-platform ways to do the same, both with and without touching the objects themselves.
As we can see, there are many ways to define constants, and if the argument is that const is not standard, it's a weak one, because nobody is planning to remove this keyword from V8 or SpiderMonkey. Most likely every other browser will adopt this keyword in the future, but if that doesn't happen, finding and replacing it with var only where necessary is a better way to go: security and features are things we should encourage rather than ignore, imho.

Equality operator

Current status: Programming is not about remembering stupid rules. Use the triple equality operator as it will work just as expected.
No, it doesn't, because coercion is not only about equality.

alert("2" < 3 && 1 < "2");

Programming IS about remembering rules; that's the reason we need to know the syntax in order to code something ... right?
Coercion must be understood rather than ignored, and the rules are pretty simple and widely explained in the specs, and even more simplified in this old post of mine.
If any of you developers still do this idiotic check, you are doing it wrong.

// WRONG!!!!!!!!!!!!!
arg === null || arg === undefined

// THE SAFER EXACT EQUIVALENT
arg == null
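
The only values loosely equal to null are null and undefined themselves, which is why the shorter check is safe:

null == null;       // true
undefined == null;  // true
0 == null;          // false
"" == null;         // false
false == null;      // false
NaN == null;        // false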

You should rather avoid equality against undefined in any piece of code you write, unless you created an undefined reference by yourself.
To summarize, a good developer does not ignore problems; a good developer understands them and only then decides which way is the best one.
Felix is a good developer and he should promote deeper knowledge ... this specific point was really bad because it's like saying "avoid regular expressions because you don't understand them and you don't want to learn them".

Extending prototypes

Current status: Do not extend the prototypes of any objects, especially native ones
Read this article and stick an "it depends" on your screen. Nobody wants to limit JS potential, and Object.defineProperty/ies is a robust way to do things properly.
Sure, we must understand that even a method like Array#empty could have different meanings ... e.g.

var arr = Array(3);
// is arr empty ?
// length is 3 but no index has been addressed

If we pollute a native prototype we must agree on the meaning, or we must prefix our extension, so that nobody can complain about whether that empty is good enough or just a wrong concept.

// a meaningful Array#empty
Object.defineProperty(Array.prototype, "empty", {
  value: function empty() {
    for (var i in this) return false;
    return true;
  }
});

alert([
  [].empty(), // true
  Array(3).empty(), // true
  [null].empty() // false
]);


Conditions

Current status: Any non-trivial conditions should be assigned to a descriptive variable
I often write empty conditions on purpose, both to avoid superfluous re-assignment and to speed up the code.

// WRONG
object.property = object.property || defaultValue;

// better
object.property || (object.property = defaultValue);

// RIGHT
"property" in object || (object.property = defaultValue);

Why on earth would I assign that? Similarly, I would not do it if the condition is not reused at least twice.

if (done.something() && done.succeed) {
  // done.something() && done.succeed useful
  // here and nowhere else in the code
}

In accordance with the fact that spread-out variable declarations are a bad practice, I cannot think of creating flags all over the place for this kind of one-shot check.

Function length

Current status: Keep your functions short. A good function fits on a slide that the people in the last row of a big room can comfortably read. So don't count on them having perfect vision and limit yourself to ~10 lines of code per function.
I agree on this point, but JavaScript is all about functions.
I understand the require approach does not need a closure to surround the whole module, since it is already somehow sandboxed, but cross-platform code and private functions or variables are really common in JS, so a module may be entirely written inside a closure.
A closure is a function, so as long as this point is flexible it's fine to me, and as long as the code is still readable and elegant ... I can already imagine developers writing 10-line functions using all 80 characters per line in order to respect this point ...

... now, that would be lame.

Object.freeze, Object.preventExtensions, Object.seal, with, eval

Current status: Crazy shit that you will probably never need. Stay away from it.
Felix here puts language features together with "language mistakes".
With the "use strict" directive, the one Felix should have put at the very beginning of his guidelines, the with statement is not even allowed.
eval has nothing to do with Object.freeze and the others, and these methods are part of the ES5 specification.
It is true you may not need them, but to define them as something to stay away from, especially coming from somebody who said there are no constants for objects, is kinda limiting and a sort of nonsense.
Learn these new methods, and use them if you think/want/need.
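
For instance, a quick taste of Object.freeze:

var point = Object.freeze({x: 1, y: 2});

point.x = 3;    // silently ignored (it throws in "use strict" code)
point.z = 4;    // new properties cannot be added either
delete point.y; // and existing ones cannot be removed

point.x;                // 1
point.z;                // undefined
Object.isFrozen(point); // true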

And that's all folks

Saturday, November 26, 2011

JSONH New schema Argument

The freaking fast and bandwidth-saving JSONH Project finally has a new schema argument, added at the end of every method, in order to make nested Homogeneous Collections automatically "packable". Here is an example:

var
  // nested objects b property
  // have same homogeneous collections
  // in properties c and d
  schema = ["b.c", "b.d"],

  // test case
  test = [
    { // homogeneous collections in c and d
      b: {
        c: [
          {a: 1},
          {a: 2}
        ],
        d: [
          {a: 3},
          {a: 4}
        ]
      }
    }, {
      a: 1,
      // same homogeneous collections in c and d
      b: {
        c: [
          {a: 5},
          {a: 6}
        ],
        d: [
          {a: 7},
          {a: 8}
        ]
      }
    }
  ]
;

The JSONH.pack(test, schema) output will be the equivalent of this string:

[{"b":{"c":[1,"a",1,2],"d":[1,"a",3,4]}},{"a":1,"b":{"c":[1,"a",5,6],"d":[1,"a",7,8]}}]


How Schema Works

It does not matter whether the structure is a single object or a list of objects, and it does not matter whether it has nested properties.
As soon as there is a homogeneous collection somewhere deep in the nested chain, common to all items, the schema is able to reach that property and optimize it directly.
Objects inside objects do not need to be identical or homogeneous; they can simply have a property that is common to all items, and this is enough to take advantage of the schema argument, which can be a single string or an array of strings.
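
In code, assuming the variables defined in the example above, the round trip looks like this:

// pack and unpack, with the schema as last argument
var packed = JSONH.pack(test, schema);

JSON.stringify(packed); // the string shown above

var restored = JSONH.unpack(packed, schema);
// restored is structurally equivalent to the original test array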

Experimental

Not because it does not work, I have added tests indeed, but simply because I am not 100% sure this implementation covers all possible cases. I would rather keep it simple and let developers deal with more complex scenarios via manual parsing through JSONH.pack/unpack without the schema ... this is still possible, as it has always been.
Let me know what you think about the schema; if accepted, I will implement it in Python and PHP too, thanks.