<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Tag Clouds: See How Noisy Your Code Is</title>
	<atom:link href="http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/feed/" rel="self" type="application/rss+xml" />
	<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/</link>
	<description>Repeat after me: Data is code, code is data.</description>
	<pubDate>Sat, 20 Mar 2010 03:24:27 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: Jonathan Feinberg</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1776</link>
		<dc:creator>Jonathan Feinberg</dc:creator>
		<pubDate>Sun, 07 Jun 2009 01:13:47 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1776</guid>
		<description>I came for the post about Clojure in GAE, and stumbled upon this one. I thought you might be amused to see the Wordle that results from concatenating the Java source for the core Wordle layout code (as distinct from the Wordle applet, which is 95% UI code). Notice, and ponder, the prevalence of boilerplate copyright blocks ("IBM", "Corp", "Copyright").

http://www.wordle.net/gallery/wrdl/921427/Wordle_Core_Source_Code

Also, if anyone is still interested, there's a command-line version of Wordle available here:

http://www.alphaworks.ibm.com/tech/wordcloud</description>
		<content:encoded><![CDATA[<p>I came for the post about Clojure in GAE, and stumbled upon this one. I thought you might be amused to see the Wordle that results from concatenating the Java source for the core Wordle layout code (as distinct from the Wordle applet, which is 95% UI code). Notice, and ponder, the prevalence of boilerplate copyright blocks (&#8220;IBM&#8221;, &#8220;Corp&#8221;, &#8220;Copyright&#8221;).</p>
<p><a href="http://www.wordle.net/gallery/wrdl/921427/Wordle_Core_Source_Code" rel="nofollow">http://www.wordle.net/gallery/wrdl/921427/Wordle_Core_Source_Code</a></p>
<p>Also, if anyone is still interested, there&#8217;s a command-line version of Wordle available here:</p>
<p><a href="http://www.alphaworks.ibm.com/tech/wordcloud" rel="nofollow">http://www.alphaworks.ibm.com/tech/wordcloud</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henrique</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1742</link>
		<dc:creator>Henrique</dc:creator>
		<pubDate>Sun, 03 May 2009 17:20:14 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1742</guid>
		<description>Very interesting idea, thank you!</description>
		<content:encoded><![CDATA[<p>Very interesting idea, thank you!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabe da Silveira</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1738</link>
		<dc:creator>Gabe da Silveira</dc:creator>
		<pubDate>Thu, 30 Apr 2009 16:32:15 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1738</guid>
		<description>This is definitely a cool visualization, but let me play devil's advocate:

I think the main thing you're measuring here (aside from language verbosity which you already mentioned) is just how well the variables and methods are named.  Even then, a good architecture might have everything namespaced inside a very important module.  If the module is cleanly decoupled it may not appear high in the word count at all even if it's the most important word.  Low-level methods and local variables will be favored, which (especially the latter) are the easiest part of the code to understand and have no bearing at all on whether the architecture is good. 

As far as trading in standard language keywords for DSLs, that always sounds good in theory, but what you are doing is trading standard language semantics for something domain-specific which may or may not be confusing, leaky, or non-obvious.  With standard language semantics you have a huge ready-made pool of talent.  For a DSL to shine the domain should naturally lend itself to a formal description, rather than shoehorning it in for the sake of aesthetics.

When I think of what goes into a good architecture, neither over-engineered nor under-engineered, well-tested, de-coupled, maintainable and performant, I have a hard time seeing where optimizing the word counts would not push out some more important design criterion.  I think it's a good exercise to create these word (they're not tag) clouds and make observations, but the moment we start using it as a metric then people will optimize for it, which I don't think would be a win per se.</description>
		<content:encoded><![CDATA[<p>This is definitely a cool visualization, but let me play devil&#8217;s advocate:</p>
<p>I think the main thing you&#8217;re measuring here (aside from language verbosity which you already mentioned) is just how well the variables and methods are named.  Even then, a good architecture might have everything namespaced inside a very important module.  If the module is cleanly decoupled it may not appear high in the word count at all even if it&#8217;s the most important word.  Low-level methods and local variables will be favored, which (especially the latter) are the easiest part of the code to understand and have no bearing at all on whether the architecture is good. </p>
<p>As far as trading in standard language keywords for DSLs, that always sounds good in theory, but what you are doing is trading standard language semantics for something domain-specific which may or may not be confusing, leaky, or non-obvious.  With standard language semantics you have a huge ready-made pool of talent.  For a DSL to shine the domain should naturally lend itself to a formal description, rather than shoehorning it in for the sake of aesthetics.</p>
<p>When I think of what goes into a good architecture, neither over-engineered nor under-engineered, well-tested, de-coupled, maintainable and performant, I have a hard time seeing where optimizing the word counts would not push out some more important design criterion.  I think it&#8217;s a good exercise to create these word (they&#8217;re not tag) clouds and make observations, but the moment we start using it as a metric then people will optimize for it, which I don&#8217;t think would be a win per se.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phillip Calçado "Shoes"</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1737</link>
		<dc:creator>Phillip Calçado "Shoes"</dc:creator>
		<pubDate>Thu, 30 Apr 2009 10:10:24 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1737</guid>
		<description>Rob,

I concatenated the code in a similar fashion (aggregated using ack) and just pasted code straight from emacs. It was around 64K lines for each example and I had no trouble. I use Safari 4.0 and Java 5 (for the applet).

I thought about using the advanced tools to mark in different colours the multiple domains -technical and business in this case- but was too lazy to actually do this myself.</description>
		<content:encoded><![CDATA[<p>Rob,</p>
<p>I concatenated the code in a similar fashion (aggregated using ack) and just pasted code straight from emacs. It was around 64K lines for each example and I had no trouble. I use Safari 4.0 and Java 5 (for the applet).</p>
<p>I thought about using the advanced tools to mark in different colours the multiple domains -technical and business in this case- but was too lazy to actually do this myself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob Hunter</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1736</link>
		<dc:creator>Rob Hunter</dc:creator>
		<pubDate>Thu, 30 Apr 2009 09:03:02 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1736</guid>
		<description>I wanted to run this myself but had trouble running the Wordle app from local input.

How did you do it?

I started by concatenating all the code files together[1] and pasting the results into the Wordle &lt;a href="http://www.wordle.net/create" rel="nofollow"&gt;"Paste a bunch of words here"&lt;/a&gt;.

I eventually discovered the &lt;a href="http://www.wordle.net/advanced" rel="nofollow"&gt;Wordle Advanced Tools&lt;/a&gt; and wrote a monster command-line[2] to find the top 200 words in a codebase.

In my sample of two:
 * A Rails codebase (just the "app" folder) -- relatively domainy, relatively few "programmer" words like "def"
 * The major section of a Java codebase: almost no domain words at all in the top 50 :-(

I believe the result is an accurate reflection of how generally expressive the code is in each codebase. The Java one makes heavy use of "String" and "get" and other machinery words.


-- 
[1] Concatenate all Java files together
&lt;code&gt;
find . -name '*.java' &#124; xargs cat
&lt;/code&gt;

[2] Identify the 200 most-used words across all Java files (treating camelCase as two words)
&lt;code&gt;
find . -name '*.java' &#124; xargs cat &#124; tr -s '[[:blank:][:cntrl:][:punct:]]' '\n' &#124; grep '[[:alpha:]]' &#124; sed s/'\([[:lower:]]\)\([[:upper:]]\)'/'\1 \2'/g &#124; tr '[[:blank:]]' '\n' &#124; sort &#124; uniq -c &#124; sort -n &#124; sed s/'^\s*\([0-9]*\) \(.*\)$'/'\2:\1'/g &#124; tail -n 200
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>I wanted to run this myself but had trouble running the Wordle app from local input.</p>
<p>How did you do it?</p>
<p>I started by concatenating all the code files together[1] and pasting the results into the Wordle <a href="http://www.wordle.net/create" rel="nofollow">&#8220;Paste a bunch of words here&#8221;</a>.</p>
<p>I eventually discovered the <a href="http://www.wordle.net/advanced" rel="nofollow">Wordle Advanced Tools</a> and wrote a monster command-line[2] to find the top 200 words in a codebase.</p>
<p>In my sample of two:<br />
 * A Rails codebase (just the &#8220;app&#8221; folder) &#8212; relatively domainy, relatively few &#8220;programmer&#8221; words like &#8220;def&#8221;<br />
 * The major section of a Java codebase: almost no domain words at all in the top 50 <img src='http://fragmental.tw/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
<p>I believe the result is an accurate reflection of how generally expressive the code is in each codebase. The Java one makes heavy use of &#8220;String&#8221; and &#8220;get&#8221; and other machinery words.</p>
<p>&#8211;<br />
[1] Concatenate all Java files together<br />
<code><br />
find . -name '*.java' | xargs cat<br />
</code></p>
<p>[2] Identify the 200 most-used words across all Java files (treating camelCase as two words)<br />
<code><br />
find . -name '*.java' | xargs cat | tr -s '[[:blank:][:cntrl:][:punct:]]' '\n' | grep '[[:alpha:]]' | sed s/'\([[:lower:]]\)\([[:upper:]]\)'/'\1 \2'/g | tr '[[:blank:]]' '\n' | sort | uniq -c | sort -n | sed s/'^\s*\([0-9]*\) \(.*\)$'/'\2:\1'/g | tail -n 200<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fun with Wordle &#187; Code Musings and Such</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1735</link>
		<dc:creator>Fun with Wordle &#187; Code Musings and Such</dc:creator>
		<pubDate>Wed, 29 Apr 2009 13:20:31 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1735</guid>
		<description>[...] saw a post on proggit about using tag clouds to detect code noise and thought I'd have some Wordle fun of my [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] saw a post on proggit about using tag clouds to detect code noise and thought I&#8217;d have some Wordle fun of my [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phillip Calçado "Shoes"</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1734</link>
		<dc:creator>Phillip Calçado "Shoes"</dc:creator>
		<pubDate>Wed, 29 Apr 2009 11:53:12 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1734</guid>
		<description>About the filtering of Java keywords:

There are -at least- two different ways to use the cloud. The first one is what I did, to try to get the ratio between noise caused by the language and signal in the code. This is useful to show how DSLs attack noise, for example. If you remove the keywords you won't find it.

The other approach I see is to do what you guys said and filter out known keywords and maybe other known entities - framework classes, for example. This is extremely useful in a different way: it tells you what the ubiquitous language looks like. Using this approach you can even try to study how Bounded Contexts are used in an application and how cohesive a system is.

I am working right now on trying to write something about those multiple uses. The hardest part is to find some good public code bases to explore.

@Ivan
I'm writing code too, maybe we should try to do something together. Still need some prototyping time alone, though, but I'm following your repo.</description>
		<content:encoded><![CDATA[<p>About the filtering of Java keywords:</p>
<p>There are -at least- two different ways to use the cloud. The first one is what I did, to try to get the ratio between noise caused by the language and signal in the code. This is useful to show how DSLs attack noise, for example. If you remove the keywords you won&#8217;t find it.</p>
<p>The other approach I see is to do what you guys said and filter out known keywords and maybe other known entities - framework classes, for example. This is extremely useful in a different way: it tells you what the ubiquitous language looks like. Using this approach you can even try to study how Bounded Contexts are used in an application and how cohesive a system is.</p>
<p>I am working right now on trying to write something about those multiple uses. The hardest part is to find some good public code bases to explore.</p>
<p>@Ivan<br />
I&#8217;m writing code too, maybe we should try to do something together. Still need some prototyping time alone, though, but I&#8217;m following your repo.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alberto Souza</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1733</link>
		<dc:creator>Alberto Souza</dc:creator>
		<pubDate>Wed, 29 Apr 2009 11:44:42 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1733</guid>
		<description>Great post!!!!!</description>
		<content:encoded><![CDATA[<p>Great post!!!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rafael Peixoto de Azevedo</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1732</link>
		<dc:creator>Rafael Peixoto de Azevedo</dc:creator>
		<pubDate>Wed, 29 Apr 2009 10:54:43 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1732</guid>
		<description>Thanks for this outstanding post!

Great idea: highly useful, effective and simple application for word tags.

Congratulations!</description>
		<content:encoded><![CDATA[<p>Thanks for this outstanding post!</p>
<p>Great idea: highly useful, effective and simple application for word tags.</p>
<p>Congratulations!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ícaro Medeiros</title>
		<link>http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is/#comment-1731</link>
		<dc:creator>Ícaro Medeiros</dc:creator>
		<pubDate>Wed, 29 Apr 2009 10:50:52 +0000</pubDate>
		<guid isPermaLink="false">http://fragmental.tw/?p=141#comment-1731</guid>
		<description>Why not exclude reserved words from the cloud?</description>
		<content:encoded><![CDATA[<p>Why not exclude reserved words from the cloud?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
