<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Lupi on Software</title>
	<atom:link href="http://blog.lupi-software.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.lupi-software.com</link>
	<description></description>
	<lastBuildDate>Wed, 23 May 2012 06:01:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.lupi-software.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Lupi on Software</title>
		<link>http://blog.lupi-software.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.lupi-software.com/osd.xml" title="Lupi on Software" />
	<atom:link rel='hub' href='http://blog.lupi-software.com/?pushpress=hub'/>
		<item>
		<title>Particle Sketches / 2 &#8211; working around the central limit theorem</title>
		<link>http://blog.lupi-software.com/2012/05/22/particle-sketches-2-working-around-the-central-limit-theorem/</link>
		<comments>http://blog.lupi-software.com/2012/05/22/particle-sketches-2-working-around-the-central-limit-theorem/#comments</comments>
		<pubDate>Tue, 22 May 2012 19:56:02 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[particle sketches]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1202</guid>
		<description><![CDATA[If you read my previous post about Particle Sketches and are literate in statistics, you may have already dismissed my ideas as flawed, convinced that they will shatter against the [...]]]></description>
			<content:encoded><![CDATA[<p>If you read <a title="Particle Sketches, a way to estimate a very large number of Bayesian models in sub-linear space and time" href="http://blog.lupi-software.com/2012/05/16/particle-sketches-a-way-to-model-a-very-large-number-of-bayesian-models-in-sub-linear-space-and-time/">my previous post about Particle Sketches</a> and are literate in statistics, you may have already dismissed my ideas as flawed, convinced that they will shatter against the hard cliffs of classical statistics and its mightiest peak, the <a class="zem_slink" title="Central limit theorem" href="http://en.wikipedia.org/wiki/Central_limit_theorem" rel="wikipedia" target="_blank">central limit theorem</a>.</p>
<p>It is not necessarily so. Don&#8217;t get me wrong, I don&#8217;t pretend to be able to circumvent it in any way or outsmart the math giants of the past. The naive way to put together <em>count-min sketches</em> and particle filters <strong>is</strong> bound to hit this wall, but if we are a little bit smart we can use the central limit theorem itself to guide us around these perilous shores.</p>
<p>What is the problem? The core concept of count-min sketches is to use a set of hash functions to distribute hits on various counters, relying on the fact that good hash functions will have few collisions. Among the counters that a given item hashes to, the one with the lowest number of hits is an upper bound on the true frequency of that item, because collisions can only inflate a counter.</p>
<p>Count-Mean-Min Sketches are a bit smarter and incorporate the number of expected collisions in the estimation itself.</p>
<p>When we replace counters with statistical models, like particle filters, we are replacing integers with random variables.</p>
<p>What&#8217;s the deal with adding together many independent random variables? Under very mild conditions, the suitably normalized sum tends to the <a class="zem_slink" title="Normal distribution" href="http://en.wikipedia.org/wiki/Normal_distribution" rel="wikipedia" target="_blank">normal distribution</a> (if the variance of the independent variables is finite) or to one of the other well-known and well-studied stable distributions that you can find in the literature (if the variance is infinite, as with <a class="zem_slink" title="Power law" href="http://en.wikipedia.org/wiki/Power_law" rel="wikipedia" target="_blank">power laws</a>).</p>
<p>The key here is the word <strong>independent</strong>. That&#8217;s surely the case if we choose a set of random hash functions in the same way we would do with normal <em>count-min sketches</em>. We can be smarter than that. If our items are not opaque labels for the object they stand for, but instead have an internal structure, we can use that internal structure to drive the selection of which <a title="Particle filter" href="http://en.wikipedia.org/wiki/Particle_filter" rel="wikipedia" target="_blank">particle filter</a> to update.</p>
<p>In simpler terms, we can use the attributes of our items to cluster them into specific, non-random categories. Let these attributes drive our hash functions: if they are effective properties around which we can cluster our items, we&#8217;ll see non-uniform distributions arise among the various buckets. If not, we&#8217;ll end up with very similar distributions along a single line.</p>
<p><a href="http://allupo.files.wordpress.com/2012/05/particle-filters-demo.png"><img class="size-full wp-image-1215 alignright" title="particle-filters-demo" src="http://allupo.files.wordpress.com/2012/05/particle-filters-demo.png?w=470&h=293" alt="" width="470" height="293" /></a></p>
<p>Let&#8217;s see a concrete example. A tweet is not just the text that the user sees; it includes a lot of metadata: the user&#8217;s location and timezone, the user&#8217;s bio, the number of followers the user has and the number of users they follow, how many times the tweet has been retweeted, and the kind of tweet (reply, retweet, plain text, with a link, with hashtags, with user mentions).</p>
<p>We can maintain a separate set of buckets for each of these dimensions, using hash functions that map closely related items in the domain to the same slots in the co-domain.</p>
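<p>As a minimal sketch of what such attribute-driven hash functions could look like, assume a hypothetical tweet record: the field names, the list of tweet kinds and the bucket widths below are illustrative assumptions, not Twitter&#8217;s API. Each dimension gets its own function that sends similar values to the same slot:</p>

```python
# Hypothetical tweet records; the field names are assumptions for this sketch.
tweet = {"followers": 1250, "utc_offset_hours": 2, "kind": "retweet"}
similar_tweet = {"followers": 1900, "utc_offset_hours": 1, "kind": "retweet"}

# Assumed, illustrative list of tweet kinds.
KINDS = ["plain", "reply", "retweet", "link", "hashtag", "mention"]


def followers_bucket(t):
    # Order of magnitude of the follower count (number of digits minus one).
    return len(str(max(t["followers"], 1))) - 1


def timezone_bucket(t):
    # Neighbouring UTC offsets share a slot: three-hour-wide bands.
    return ((t["utc_offset_hours"] + 12) % 24) // 3


def kind_bucket(t):
    # One slot per kind of tweet.
    return KINDS.index(t["kind"])


# One structure-aware "hash function" per dimension.
DIMENSIONS = {
    "followers": followers_bucket,
    "timezone": timezone_bucket,
    "kind": kind_bucket,
}


def buckets_for(t):
    """Return the slot to update in each dimension's set of buckets."""
    return {name: bucket(t) for name, bucket in DIMENSIONS.items()}


print(buckets_for(tweet))
print(buckets_for(similar_tweet))
```

<p>Both example tweets land in the same slots in every dimension, which is the non-random clustering described above.</p>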
<p>If the processes underlying the phenomenon we are tracking do not change, we can expect these distributions not to change. In which case, we can discard these dimensions as not interesting.</p>
<p>If the processes do evolve, we can use a key strength of particle filters to still keep the computing load manageable: we can dynamically reduce the number of particles used to track the uninteresting dimensions, allocating resources to the more interesting ones. Over time, what is interesting and uninteresting may change and we&#8217;ll just have to focus (i.e. have more particles) where the real action is at the moment. That&#8217;s one reason I like particle filters over other statistical methods. The other reason is that they make it possible to estimate any kind of phenomenon: you don&#8217;t have to commit to a specific statistical model up front.</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/programming/'>Programming</a> Tagged: <a href='http://blog.lupi-software.com/tag/particle-sketches/'>particle sketches</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2012/05/22/particle-sketches-2-working-around-the-central-limit-theorem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://allupo.files.wordpress.com/2012/05/particle-filters-demo.png?w=150" />
		<media:content url="http://allupo.files.wordpress.com/2012/05/particle-filters-demo.png?w=150" medium="image">
			<media:title type="html">particle-filters-demo</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://allupo.files.wordpress.com/2012/05/particle-filters-demo.png" medium="image">
			<media:title type="html">particle-filters-demo</media:title>
		</media:content>
	</item>
		<item>
		<title>We&#8217;re unveiling Squirro at CloudForce!</title>
		<link>http://blog.lupi-software.com/2012/05/22/were-unveiling-squirro-at-cloudforce/</link>
		<comments>http://blog.lupi-software.com/2012/05/22/were-unveiling-squirro-at-cloudforce/#comments</comments>
		<pubDate>Tue, 22 May 2012 06:09:36 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Everything else]]></category>
		<category><![CDATA[squirro]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[CloudForce]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1194</guid>
		<description><![CDATA[Come and meet our little beast at CloudForce in London today. Squirro is the personal digital research app. Broader than feeds and more specific than search, Squirro scans multiple sources from Internet channels [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://allupo.files.wordpress.com/2012/05/d4a55744a37211e1abd612313810100a_7.jpg"><img class="alignright" title="d4a55744a37211e1abd612313810100a_7" src="http://allupo.files.wordpress.com/2012/05/d4a55744a37211e1abd612313810100a_7.jpg?w=470&h=470" alt="" width="470" height="470" /></a>Come and meet our little beast at <a href="https://www.salesforce.com/uk/events/details/cf12-london/?d=70130000000sQEi&amp;internal=true">CloudForce in London</a> today.</p>
<p><a title="Squirro, the personal digital research app" href="http://www.squirro.com/">Squirro</a> is <strong>the personal digital research app. </strong>Broader than feeds and more specific than search, Squirro scans multiple sources from Internet channels and social media, private databases and even internal systems such as <a class="zem_slink" title="NYSE: CRM" href="http://www.google.com/finance?q=NYSE:CRM" rel="googlefinance" target="_blank">Salesforce.com</a> and <a class="zem_slink" title="NYSE: SAP" href="http://www.google.com/finance?q=NYSE:SAP" rel="googlefinance" target="_blank">SAP</a> to find the most relevant information on your topic of interest, then updates it continuously and automatically. The result is a living collection of curated content you can save, synthesize and share with friends and colleagues in your own private workspace.</p>
<p>Squirro gives you timely, relevant information to navigate fast changing business relationships.</p>
<p><span style="text-align:center; display: block;"><a href="http://blog.lupi-software.com/2012/05/22/were-unveiling-squirro-at-cloudforce/"><img src="http://img.youtube.com/vi/DfdM1Gz_2us/2.jpg" alt="" /></a></span></p>
<p>When I joined Nektoon in January, Squirro was still diagrams on a drawing board and some minimal proof of concept code. Now it&#8217;s a powerful, responsive web application that runs on your phone, tablet and computer. It supports thousands of users and can track hundreds of thousands of sources.</p>
<p>It&#8217;s amazing how far and how fast we progressed. <strong>A tribute to the talent of my colleagues: I am humbled every day by how good you are, friends, and energized by how it feels to work with you</strong>. Great vibes!</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/everything-else/'>Everything else</a> Tagged: <a href='http://blog.lupi-software.com/tag/cloudforce/'>CloudForce</a>, <a href='http://blog.lupi-software.com/tag/salesforce-com/'>Salesforce.com</a>, <a href='http://blog.lupi-software.com/tag/sap/'>SAP</a>, <a href='http://blog.lupi-software.com/tag/squirro/'>squirro</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2012/05/22/were-unveiling-squirro-at-cloudforce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://allupo.files.wordpress.com/2012/05/d4a55744a37211e1abd612313810100a_7.jpg?w=150" />
		<media:content url="http://allupo.files.wordpress.com/2012/05/d4a55744a37211e1abd612313810100a_7.jpg?w=150" medium="image">
			<media:title type="html">d4a55744a37211e1abd612313810100a_7</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://allupo.files.wordpress.com/2012/05/d4a55744a37211e1abd612313810100a_7.jpg" medium="image">
			<media:title type="html">d4a55744a37211e1abd612313810100a_7</media:title>
		</media:content>
	</item>
		<item>
		<title>Particle Sketches, a way to estimate a very large number of Bayesian models in sub-linear space and time</title>
		<link>http://blog.lupi-software.com/2012/05/16/particle-sketches-a-way-to-model-a-very-large-number-of-bayesian-models-in-sub-linear-space-and-time/</link>
		<comments>http://blog.lupi-software.com/2012/05/16/particle-sketches-a-way-to-model-a-very-large-number-of-bayesian-models-in-sub-linear-space-and-time/#comments</comments>
		<pubDate>Tue, 15 May 2012 23:02:41 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[particle sketches]]></category>

		<guid isPermaLink="false">https://allupo.wordpress.com/?p=1182</guid>
		<description><![CDATA[I have been designing an interesting, novel (as far as I know) data structure. Its goal is to estimate a very large number of Bayesian models in the smallest possible space [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://allupo.files.wordpress.com/2012/05/particle-sketches.jpg"><img class="alignright size-full wp-image-1185" title="Particle-Sketches" src="http://allupo.files.wordpress.com/2012/05/particle-sketches.jpg?w=470&h=470" alt="" width="470" height="470" /></a>I have been designing an interesting, novel (as far as I know) data structure. Its goal is to <em>estimate a very large number of Bayesian models</em> in the smallest possible space and time. <em>Sublinear complexity in space and time as well as online updates</em> are key requirements.</p>
<p>The key idea is to combine particle filters and sketches in a single powerful tool. That&#8217;s why I call them <strong>Particle Sketches</strong>.</p>
<p>I devised them as a tool to track the posting behavior of a very large set of Twitter users, in order to decide who to follow using Twitter&#8217;s streaming API and who to follow using the REST API, based on their expected tweeting frequency and its distribution across the day and week. The data structure is, however, totally generic and suitable for all kinds of large-scale estimation problems: I can surely see applications in high-frequency trading and website traffic forecasting.</p>
<p><span id="more-1182"></span></p>
<p>Let&#8217;s look at the building blocks first.</p>
<p><strong>Particle Filters</strong> (otherwise known as Sequential Monte Carlo methods) are model estimation techniques based on simulations. They are used to estimate Bayesian models in which the latent variables are connected in a Markov chain, typically when the state space of the latent variables is continuous and not sufficiently restricted to make exact inference tractable.</p>
<p>In practice, they work by maintaining a set of differently weighted samples, called <em>particles</em>, that represent the expected distribution of the latent variable. You can then &#8220;filter&#8221;, which in this context means determining the distribution at a specific time, by estimating how the particle swarm should have evolved under the observed variable (e.g. time).</p>
<p>On update, a new set of weighted particles is generated, influenced by the new observations. For details, you can check Wikipedia&#8217;s page on <a href="http://en.wikipedia.org/wiki/Particle_filter">particle filters</a> or the multitude of pages about them available on the Internet.</p>
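<p>To make the update step concrete, here is a minimal sketch of one cycle of a bootstrap particle filter. The model is a hypothetical one chosen only for illustration (a latent value that follows a Gaussian random walk and is observed with Gaussian noise); the function name and the <code>process_std</code> and <code>obs_std</code> parameters are assumptions of this example, not part of any library.</p>

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         process_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: let every particle follow the (assumed) random-walk dynamics.
    moved = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: multiply by the Gaussian likelihood of the new observation.
    scaled = [w * math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
              for p, w in zip(moved, weights)]
    total = sum(scaled)
    uniform = [1.0 / len(moved)] * len(moved)
    if total == 0.0:
        # Every likelihood underflowed: keep the swarm and reset the weights.
        return moved, uniform
    normalized = [w / total for w in scaled]
    # Resample: draw a new, equally weighted swarm in proportion to the weights.
    resampled = rng.choices(moved, weights=normalized, k=len(moved))
    return resampled, uniform


rng = random.Random(42)
# Start from a broad prior: 500 equally weighted particles.
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for observation in [3.1, 2.9, 3.3, 3.0, 3.2]:
    particles, weights = particle_filter_step(particles, weights, observation, rng)

# The weighted mean of the swarm is the point estimate of the latent value.
estimate = sum(p * w for p, w in zip(particles, weights))
print(round(estimate, 2))
```

<p>Resampling after every observation resets the weights to uniform; keeping unequal weights and resampling only occasionally is an equally valid design.</p>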
<p>The key fact here is that particle filters are composed of a set of discrete, importance-weighted particles and their distribution is effectively the sum of these particles.</p>
<p>We can thus split up a particle filter into multiple smaller filters by dividing its particles among them. If we pick the particles at random, we should end up with essentially the same distribution in all of them.</p>
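<p>A quick way to check this claim is to deal the particles of one filter at random into a few sub-filters and compare their weighted means. The sketch below assumes a hypothetical filter of 4000 equally weighted particles; <code>split_filter</code> and <code>weighted_mean</code> are names made up for this example.</p>

```python
import random


def weighted_mean(particles):
    """Weighted mean of a list of (value, weight) particles."""
    total = sum(weight for _, weight in particles)
    return sum(value * weight for value, weight in particles) / total


def split_filter(particles, parts, rng):
    """Deal (value, weight) particles at random into `parts` sub-filters."""
    shuffled = list(particles)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]


rng = random.Random(7)
# Hypothetical filter: 4000 equally weighted particles centred on 5.0.
full = [(rng.gauss(5.0, 2.0), 1.0) for _ in range(4000)]
subs = split_filter(full, 4, rng)

print(round(weighted_mean(full), 2))
print([round(weighted_mean(sub), 2) for sub in subs])
```

<p>Each sub-filter is a random sample of the original swarm, so its mean (and the rest of its distribution) stays close to that of the full filter, up to sampling error.</p>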
<p>If we add noise to a few of these sub-filters, the cumulative distribution of the sum of all of them will still be fairly close to the original estimated distribution. It will be even more true if we can account for the noise and remove its effect.</p>
<p><strong>Count-Min (CM) Sketches</strong> are probabilistic, memory-efficient data structures that allow one to estimate frequency-related properties of a data set: estimating the frequencies of particular items, finding the top-K most frequent elements, and performing range frequency queries (i.e. finding the sum of the frequencies of the elements within a given range), for example to estimate percentiles.</p>
<p>They are conceptually quite simple: they maintain a two-dimensional array [<em>w, d</em>] of integer counters. A set of <em>d</em> random hash functions, each with co-domain from 0 to <em>w</em> - 1, is used to choose which counters to update whenever a new sample arrives.</p>
<p>Given an item, we can estimate its frequency by looking at the <em>d</em> buckets it hashes to in the two-dimensional array (one per hash function) and choosing the one with the lowest value.</p>
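<p>The update and query steps can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: salting SHA-256 with the row number stands in for the <em>d</em> random hash functions, which is an assumption of this example.</p>

```python
import hashlib


class CountMinSketch:
    """d rows of w integer counters, with one hash function per row."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slot(self, row, item):
        # Assumption of this sketch: one strong hash salted with the row
        # number plays the role of d independent random hash functions.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.rows[row][self._slot(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over the
        # d counters is the tightest upper bound on the true frequency.
        return min(self.rows[row][self._slot(row, item)]
                   for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
for word in ["a"] * 5 + ["b"] * 2 + ["c"]:
    sketch.add(word)

print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

<p>The estimates never fall below the true counts (5, 2 and 1 here); they exceed them only when an item collides with other items in every one of its rows.</p>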
<p>CM sketches are extremely memory efficient: for example, 48 KB are sufficient to calculate the top-100 most frequent elements in a 40 MB dataset with 4% error.</p>
<p>CM sketches tend to perform better with highly-skewed distributions, such as those generated by power laws or Zipf-like distributions.</p>
<p>In case of low or moderately skewed data, a more careful estimation can be done by accounting for the noise due to hash collisions: for example, by estimating and removing the noise for each hash function (calculated as the average value of all remaining counters) and choosing the median instead of the minimum as the estimated frequency. This variation is called <strong>Count-Mean-Min Sketch</strong>.</p>
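<p>A minimal sketch of that estimator, following the description above: for each row, the noise is taken as the average of all the other counters in that row and subtracted from the counter of the item, and the median of these residues is returned. The salted SHA-256 hash and the function names are assumptions of this example.</p>

```python
import hashlib
from statistics import median


def slot(row, item, width):
    # Assumption of this sketch: SHA-256 salted with the row number stands
    # in for one random hash function per row.
    digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % width


def add(rows, item, count=1):
    width = len(rows[0])
    for row, counters in enumerate(rows):
        counters[slot(row, item, width)] += count


def count_mean_min(rows, item):
    """Median over the rows of (counter - estimated collision noise)."""
    width = len(rows[0])
    residues = []
    for row, counters in enumerate(rows):
        counter = counters[slot(row, item, width)]
        total = sum(counters)  # every row sums to the number of samples seen
        noise = (total - counter) / (width - 1)  # mean of the other counters
        residues.append(counter - noise)
    return median(residues)


# A deliberately crowded sketch: 20 items with 10 hits each in 32 slots.
rows = [[0] * 32 for _ in range(5)]
for i in range(200):
    add(rows, f"item-{i % 20}")

print(count_mean_min(rows, "item-3"))
```

<p>With flat data like this, many counters hold more than one item, so the plain minimum tends to overshoot; subtracting the expected noise pulls the estimate back toward the true count of 10.</p>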
<p>There are many other interesting properties of CM Sketches, such as various algebraic operations that they support or a way to measure how different two sketches are — calculating the cosine distance between them.</p>
<p>For a good explanation of CM Sketches with sample code, check <a href="http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/">this nice post on highlyscalable</a>. For deeper knowledge, check <a href="https://sites.google.com/site/countminsketch/">this site</a>.</p>
<p><strong>Particle Sketches</strong> replace the counters with particle filters. On each new observation, a set of filters is selected using hash functions and, for each of them, only a random subset of the particles is updated.</p>
<p>To obtain a distribution for a given item, we put together all the particle filters that are associated with that item. For better performance, we can estimate and remove the expected noise due to collisions in a way analogous to Count-Mean-Min sketches.</p>
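<p>What follows is only a rough sketch of how these two steps might fit together, under loudly stated assumptions: each cell holds equally weighted particles over a single real value, observations are scored with a Gaussian likelihood, and the collision-noise correction is left out. The class name, its parameters and the salted SHA-256 hash are all hypothetical choices for illustration, not the design I ended up with.</p>

```python
import hashlib
import math
import random


class ParticleSketch:
    """A depth x width grid of small particle filters (illustrative only)."""

    def __init__(self, width, depth, particles_per_cell, rng,
                 subset_fraction=0.5, obs_std=1.0, jitter=0.1):
        self.width = width
        self.depth = depth
        self.rng = rng
        self.subset_size = max(1, int(particles_per_cell * subset_fraction))
        self.obs_std = obs_std
        self.jitter = jitter
        # Every cell starts from a broad, assumed prior over the value.
        self.cells = [[[rng.uniform(0.0, 10.0) for _ in range(particles_per_cell)]
                       for _ in range(width)]
                      for _ in range(depth)]

    def _slot(self, row, item):
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, item, observation):
        for row in range(self.depth):
            cell = self.cells[row][self._slot(row, item)]
            # Only a random subset of the particles sees this observation.
            chosen = self.rng.sample(range(len(cell)), self.subset_size)
            values = [cell[i] for i in chosen]
            # Gaussian likelihood; the tiny floor keeps the weights usable
            # even when every particle is far from the observation.
            likelihoods = [
                math.exp(-0.5 * ((observation - v) / self.obs_std) ** 2) + 1e-300
                for v in values
            ]
            survivors = self.rng.choices(values, weights=likelihoods, k=len(chosen))
            for i, value in zip(chosen, survivors):
                cell[i] = value + self.rng.gauss(0.0, self.jitter)

    def particles_for(self, item):
        # Pool the particles of every filter associated with the item.
        pooled = []
        for row in range(self.depth):
            pooled.extend(self.cells[row][self._slot(row, item)])
        return pooled


rng = random.Random(1)
sketch = ParticleSketch(width=64, depth=3, particles_per_cell=100, rng=rng)
for _ in range(30):
    sketch.update("alice", rng.gauss(2.0, 0.3))
    sketch.update("bob", rng.gauss(8.0, 0.3))

alice = sketch.particles_for("alice")
bob = sketch.particles_for("bob")
print(round(sum(alice) / len(alice), 2), round(sum(bob) / len(bob), 2))
```

<p>When two items share a cell, their particles mix there. That mixing is the collision noise that a Count-Mean-Min style correction would have to estimate and remove.</p>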
<p>The nice thing is that many cool properties of CM Sketches are preserved. For example, by putting all particle filters together we have an estimation of the cumulative distribution of the whole population. Using techniques similar to those used to estimate the top-K elements in a set, we can calculate the cumulative particle filters of a sub-population and so on.</p>
<p>It&#8217;s still a work in progress: I am still exploring the characteristics of these models and the best way to merge multiple particle filters together while discarding the peculiar noise generated by the structure (pardon the repetition) of this data structure.</p>
<p><em>Possible improvements</em>: by stacking together particle sketches at different resolutions, it should be possible to decompose these models in a way analogous to wavelet decomposition. That should allow an even more compact representation or, dually, a more faithful model with less noise for the same number of bits.</p>
<p><strong>NEXT:</strong> <a title="Particle Sketches / 2 – working around the central limit theorem" href="http://blog.lupi-software.com/2012/05/22/particle-sketches-2-working-around-the-central-limit-theorem/">Particle Sketches / 2 — working around the central limit theorem</a></p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/programming/'>Programming</a> Tagged: <a href='http://blog.lupi-software.com/tag/cloud/'>cloud</a>, <a href='http://blog.lupi-software.com/tag/data-structures/'>data structures</a>, <a href='http://blog.lupi-software.com/tag/particle-sketches/'>particle sketches</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2012/05/16/particle-sketches-a-way-to-model-a-very-large-number-of-bayesian-models-in-sub-linear-space-and-time/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://allupo.files.wordpress.com/2012/05/particle-sketches.jpg?w=150" />
		<media:content url="http://allupo.files.wordpress.com/2012/05/particle-sketches.jpg?w=150" medium="image">
			<media:title type="html">Particle-Sketches</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://allupo.files.wordpress.com/2012/05/particle-sketches.jpg" medium="image">
			<media:title type="html">Particle-Sketches</media:title>
		</media:content>
	</item>
		<item>
		<title>My new job</title>
		<link>http://blog.lupi-software.com/2012/02/04/my-new-job/</link>
		<comments>http://blog.lupi-software.com/2012/02/04/my-new-job/#comments</comments>
		<pubDate>Sat, 04 Feb 2012 11:29:33 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Everything else]]></category>
		<category><![CDATA[squirro]]></category>

		<guid isPermaLink="false">https://allupo.wordpress.com/?p=1176</guid>
		<description><![CDATA[I decided at the end of last year that I was working in the wrong place. The company was nice, the people were great, but the development practices and project [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 260px"><a href="http://www.crunchbase.com/company/memonic"><img class="zemanta-img-inserted zemanta-img-configured" title="Image representing Memonic as depicted in Crun..." src="http://www.crunchbase.com/assets/images/resized/0008/6923/86923v1-max-250x250.png" alt="Image representing Memonic as depicted in Crun..." width="250" height="199" /></a><p class="wp-caption-text">Image via CrunchBase</p></div>
<p>I decided at the end of last year that I was working in the wrong place. The company was nice, the people were great, but the development practices and project management felt like they were on the wrong track. I quit.</p>
<p>I looked all over Europe for a new interesting adventure. London, Berlin, Amsterdam — I was all over the place meeting new people, screening companies and interviewing with the few interesting choices.</p>
<p>In the process, I clarified my goals. I want:</p>
<ul>
<li><strong>to remain sane</strong>, so I need a company that understands and applies agile software development practices</li>
<li><strong>to learn cloud computing</strong>, so I need a company that has expertise in that area and has at least one successful deployment or product in the area of large-scale computing</li>
<li><strong>to improve my data mining skills</strong>, so I need a company that tackles interesting problems</li>
<li><strong>to spend most of my time developing software</strong>, so I will avoid mixed half-management, half-development positions, particularly in large corporations.</li>
</ul>
<p>You can imagine how delighted I was when I found a company that meets these goals here in Zurich. I couldn&#8217;t believe that to be possible. I thought it so improbable that I had already given up my rented apartment here and was about to move somewhere else. Now I am stuck in a rented room in a shared flat, while I search for a new place.</p>
<p><span id="more-1176"></span></p>
<p>The company I joined is Nektoon AG, a startup built by the core team of <a href="http://www.local.ch/">local.ch</a>, a Swiss white pages site. Nektoon has already shipped a successful product, <a href="http://www.memonic.com">Memonic</a>, an Evernote-like tool geared toward teams. Memonic predates Evernote, but when you have 100 times the money it&#8217;s easy to beat the competition.</p>
<p>Memonic is built entirely on <a class="zem_slink" title="Amazon Web Services" href="http://aws.amazon.com/" rel="homepage">Amazon Web Services</a>, is written in my old friend Python and it&#8217;s quite an interesting tool. You can find some details on its architecture on our <a href="http://blog.memonic.com">Memonic blog</a>.</p>
<h2 id="squirro">Squirro</h2>
<p>I, however, did not join Nektoon to work on Memonic. I am working on a shiny new product called <a href="http://www.squirro.com">Squirro</a>. Memonic and Squirro attack the same problem, they help you pick valuable needles from the Internet haystack, but they do it in quite a different way.</p>
<p>Memonic sits at your side while you browse and helps you save interesting leads, shortlist relevant information and curate it for your peers. Searching for the right stuff, scoring sources and keeping up to date with new content is still a manual process.</p>
<p>Squirro does that and more for you: give our little squirrel a topic and he will find the best acorn trees and harvest the tastiest nuts for you. He will keep coming back with better and better fruits, because he learns your taste over time.</p>
<p>He can do much more, he&#8217;s a sucker for serendipity: do you work at a Big Pharma company and have a meeting with Dr. Bob next Thursday about that promising new molecule for Alzheimer&#8217;s? Squirro will come up with relevant studies and papers that you haven&#8217;t yet read, will tell you about unexpected weather conditions/road blocks/strikes at the place where you&#8217;re going to meet and will give you the latest spicy gossip about FCZ Zurich… because he knows Bob is a big fan of that soccer team.</p>
<p>Search is the past, Google&#8217;s business is telling you about what you are already thinking about. Social is the present, Facebook and Twitter tell you what others around you are thinking right now. Squirro is the future, <em>we&#8217;re in the business of telling you what you are not yet thinking about</em>.</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/everything-else/'>Everything else</a> Tagged: <a href='http://blog.lupi-software.com/tag/squirro/'>squirro</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2012/02/04/my-new-job/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://allupo.files.wordpress.com/2012/02/squirro.png?w=150" />
		<media:content url="http://allupo.files.wordpress.com/2012/02/squirro.png?w=150" medium="image">
			<media:title type="html">Squirro</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://www.crunchbase.com/assets/images/resized/0008/6923/86923v1-max-250x250.png" medium="image">
			<media:title type="html">Image representing Memonic as depicted in Crun...</media:title>
		</media:content>
	</item>
		<item>
		<title>MS-SQL: Cleaning up duplicates with Common Table Expressions</title>
		<link>http://blog.lupi-software.com/2011/12/27/ms-sql-cleaning-up-duplicates-with-common-table-expressions/</link>
		<comments>http://blog.lupi-software.com/2011/12/27/ms-sql-cleaning-up-duplicates-with-common-table-expressions/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 09:10:16 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ms-sql]]></category>
		<category><![CDATA[oracle]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">https://allupo.wordpress.com/?p=1130</guid>
		<description><![CDATA[It is a common pattern to prefer non-semantic primary keys — auto-incremented integers or UUIDs — over semantic ones. It&#8217;s advisable to define unique indexes on the semantic identity fields, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1130&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>It is a common pattern to prefer non-semantic primary keys — auto-incremented integers or UUIDs — over semantic ones. It&#8217;s advisable to define unique indexes on the semantic identity fields, but sometimes you avoid it for performance reasons or you just forget.</p>
<p>When things screw up, you end up with multiple records relating to the same entity or relation. How do you clean things up?</p>
<p>Let&#8217;s take the simple case of a many-to-many relation, implemented as a linking table:</p>
<p><pre class="brush: sql;">
CREATE TABLE First (
	First_ID INT IDENTITY(1,1),
	-- more fields
	PRIMARY KEY (First_ID)
);

CREATE TABLE Second (
	Second_ID INT IDENTITY(1,1),
	-- more fields
	PRIMARY KEY (Second_ID)
);

CREATE TABLE First_Second(
	First_Second_ID INT IDENTITY(1,1),
	First_ID INT FOREIGN KEY REFERENCES First,
	Second_ID INT FOREIGN KEY REFERENCES Second
);
</pre></p>
<p>If we end up with duplicate <code>First_Second</code> records, we can get rid of them using this code:</p>
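<p><pre class="brush: plain;">
WITH records (First_Second_ID, RecNo) AS (
  SELECT
    First_Second_ID,
    -- number the rows inside each (First_ID, Second_ID) pair; the oldest row gets 1
    ROW_NUMBER() OVER (PARTITION BY First_ID, Second_ID ORDER BY First_Second_ID) AS RecNo
  FROM First_Second
)
-- keep the first row of each pair and delete the rest
DELETE FROM records WHERE RecNo &gt; 1;
</pre></p>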
<p>Doing the same without a CTE is possible, but less clean and straightforward. It involves temporary tables, cursors or nested subqueries.</p>
<p>I learned this trick with Microsoft SQL Server. PostgreSQL, DB2, Oracle and a bunch of <a href="http://en.wikipedia.org/wiki/Common_table_expression">other databases support CTEs as well</a>.</p>
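<p>Deleting through the CTE name works in SQL Server; other engines may want the <code>DELETE</code> to target the table itself. As a rough sketch of that variant, the Python script below runs the same idea against an in-memory SQLite database. It assumes an SQLite build with window functions (3.25 or later), and the sample rows are made up for illustration.</p>
<p><pre class="brush: python;">
import sqlite3

# In-memory database with made-up sample rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE First_Second (
        First_Second_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        First_ID INTEGER,
        Second_ID INTEGER
    );
    INSERT INTO First_Second (First_ID, Second_ID)
    VALUES (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1);
""")

# Number the rows inside each (First_ID, Second_ID) pair, then delete
# every row except the first one of its pair. The DELETE targets the
# table through its key instead of going through the CTE.
conn.execute("""
    WITH records AS (
        SELECT First_Second_ID,
               ROW_NUMBER() OVER (
                   PARTITION BY First_ID, Second_ID
                   ORDER BY First_Second_ID
               ) AS RecNo
        FROM First_Second
    )
    DELETE FROM First_Second
    WHERE First_Second_ID IN (
        SELECT First_Second_ID FROM records WHERE RecNo > 1
    )
""")
conn.commit()

# One row per (First_ID, Second_ID) pair should be left.
remaining = conn.execute(
    "SELECT First_Second_ID, First_ID, Second_ID "
    "FROM First_Second ORDER BY First_Second_ID"
).fetchall()
print(remaining)
</pre></p>
<p>Ordering by <code>First_Second_ID</code> inside the window keeps the oldest row of each pair, which is an arbitrary but predictable choice.</p>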
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/programming/'>Programming</a> Tagged: <a href='http://blog.lupi-software.com/tag/ms-sql/'>ms-sql</a>, <a href='http://blog.lupi-software.com/tag/oracle/'>oracle</a>, <a href='http://blog.lupi-software.com/tag/postgresql/'>postgresql</a>, <a href='http://blog.lupi-software.com/tag/sql/'>sql</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/27/ms-sql-cleaning-up-duplicates-with-common-table-expressions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>
	</item>
		<item>
		<title>Cone of Silence</title>
		<link>http://blog.lupi-software.com/2011/12/14/cone-of-silence/</link>
		<comments>http://blog.lupi-software.com/2011/12/14/cone-of-silence/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 14:43:21 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Business & Management]]></category>

		<guid isPermaLink="false">https://allupo.wordpress.com/?p=1064</guid>
		<description><![CDATA[Once upon a time, I worked for a small company that had a tiny office. It was just a big room where all the developers, designers and creatives assembled, plus [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1064&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="http://en.wikipedia.org/wiki/File:Me-_cone.JPG"><img class="zemanta-img-inserted zemanta-img-configured" title="Me- cone" src="http://upload.wikimedia.org/wikipedia/en/thumb/8/89/Me-_cone.JPG/300px-Me-_cone.JPG" alt="Me- cone" width="300" height="225" /></a><p class="wp-caption-text">Image via Wikipedia</p></div>
<p>Once upon a time, I worked for a small company that had a tiny office. It was just a big room where all the developers, designers and creatives assembled, plus a couple of small ones where the owner and his secretary spent most of their time on the phone, pitching to new clients or trying to get existing ones to pay overdue bills.</p>
<p>Italy is full of old buildings with high ceilings, hard to heat and with terrible acoustics. This one was no different. The &#8220;Production&#8221; room was cramped and noisy. It was much like cubicle-land, but worse. Being a bunch of nerds, we devised a nerdy solution.</p>
<p>Whenever someone deemed the room too noisy, he would cast a Cone of Silence, embodied in a physical paper cone placed in the middle of the central desk. It magically turned the noisy office into a silent library. Meetings, chit-chat, even working together were banned for a few hours. Like any respectable spell, you couldn&#8217;t cast it in rapid succession. You could cast it once per week, then you had to wait and recharge, to prevent abuse and let others finish their work.</p>
<p>All in all, <em>Cone of Silence</em> worked really well.</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/business-management/'>Business &amp; Management</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/14/cone-of-silence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/en/thumb/8/89/Me-_cone.JPG/300px-Me-_cone.JPG" medium="image">
			<media:title type="html">Me- cone</media:title>
		</media:content>
	</item>
		<item>
		<title>@rethinkdb dog challenge</title>
		<link>http://blog.lupi-software.com/2011/12/13/rethinkdb-dog-challenge/</link>
		<comments>http://blog.lupi-software.com/2011/12/13/rethinkdb-dog-challenge/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 19:10:00 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1044</guid>
		<description><![CDATA[Rethinkdb recently published a cat challenge. Well, it&#8217;s not really a challenging challenge if you can solve it by looking up Vigenère encryption from Wikipedia and following instructions. I don&#8217;t [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1044&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://rethinkdb.com/">RethinkDB</a> recently published a cat challenge. Well, it&#8217;s not really a challenging challenge if you can solve it by looking up <a href="http://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher">Vigenère encryption</a> on Wikipedia and following instructions.</p>
<p>I don&#8217;t want to debate the pros and cons of the remainder operator in <a class="zem_slink" title="ANSI C" href="http://en.wikipedia.org/wiki/ANSI_C" rel="wikipedia">ANSI C</a> and <a class="zem_slink" title="CPython" href="http://www.python.org/" rel="homepage">CPython</a>: the result takes the sign of the divisor in one and of the dividend in the other, while the original K&amp;R C left it platform-dependent, if I remember correctly. I am not much of a systems hacker and I am on the wrong side of the Atlantic to submit my application, unless they&#8217;d consider sponsoring people from Europe, which I doubt.</p>
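<p>Back to the remainder operator for a moment, here is a small Python sketch that makes the sign difference concrete. Python&#8217;s <code>%</code> follows the divisor&#8217;s sign, while <code>math.fmod</code> follows the dividend&#8217;s, which is also what C99 specifies for <code>%</code> on integers. The <code>shift_letter</code> helper is a made-up name, there only to show why the distinction matters for a Vigenère-style shift.</p>
<p><pre class="brush: python;">
import math

# Python: the result of % takes the sign of the divisor.
print(-7 % 3)             # 2
print(7 % -3)             # -2

# math.fmod keeps the sign of the dividend, like C99's % on integers.
print(math.fmod(-7, 3))   # -1.0
print(math.fmod(7, -3))   # 1.0


def shift_letter(letter, offset):
    """Shift an uppercase letter by offset places, wrapping around the alphabet."""
    # Hypothetical helper: with Python's % a negative offset still lands in 0..25.
    return chr((ord(letter) - ord("A") + offset) % 26 + ord("A"))


print(shift_letter("A", -1))  # Z
</pre></p>
<p>In C the same expression can go negative when decrypting, so a port would typically add 26 before taking the remainder.</p>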
<p>I do, however, find their <a href="http://1.61803398874.com/canine/">additional challenge</a> interesting: the <strong>dog</strong> command is really a sort of glorified, asynchronous version of <strong>tee</strong>.</p>
<p>I am preparing for an interview in Berlin tomorrow, so I don&#8217;t have the time to write a full-blown dog command today, but I did some preliminary research.</p>
<p>Writing non-blocking, <a class="zem_slink" title="Asynchronous I/O" href="http://en.wikipedia.org/wiki/Asynchronous_I/O" rel="wikipedia">asynchronous I/O</a> is still somewhat platform-dependent:</p>
<ul>
<li>On Windows, your best bet is <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/aa365747(v=vs.85).aspx">overlapped I/O</a> with <strong>FILE_FLAG_OVERLAPPED</strong>.</li>
<li>On POSIX, you can use <a href="http://pwet.fr/man/linux/conventions/posix/aio_h">aio.h</a>.</li>
<li>If you are specifically on Linux, you can go even faster by avoiding copies in and out of user space with <a href="http://linux.die.net/man/2/tee">tee(2)</a>, <a href="http://linux.die.net/man/2/splice">splice(2)</a> and <a href="http://linux.die.net/man/2/vmsplice">vmsplice(2)</a>.</li>
</ul>
<p>Given that they wrote a man page for <strong>dog</strong>, I&#8217;d go with the POSIX or Linux solution.</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/programming/'>Programming</a> Tagged: <a href='http://blog.lupi-software.com/tag/c/'>C</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/13/rethinkdb-dog-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>
	</item>
		<item>
		<title>Functional vs. Object Oriented, an unusual point of view</title>
		<link>http://blog.lupi-software.com/2011/12/13/functional-vs-object-oriented-an-unusual-point-of-view/</link>
		<comments>http://blog.lupi-software.com/2011/12/13/functional-vs-object-oriented-an-unusual-point-of-view/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 09:37:46 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[Object-oriented programming]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1040</guid>
		<description><![CDATA[While searching for a job, I met a proprietary trading company that uses functional languages for their own systems. Jane Street is a private equity firm, specializing in statistical arbitrage. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1040&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://player.vimeo.com/video/14317442" width="470" height="353" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe></p>
<p>While searching for a job, I met a proprietary trading company that uses functional languages for its own systems. <a href="http://www.janestreet.com/">Jane Street</a> specializes in statistical arbitrage: they use algorithms to take advantage of the small inefficiencies in (or between) financial markets.</p>
<p>Their main needs are <strong>correctness</strong>, <strong>agility</strong> and <strong>performance</strong>. Their <a href="http://www.janestreet.com/technology/ocaml.php">tool of choice</a> is <a href="http://caml.inria.fr/">OCaml</a>.</p>
<p>They trade 1 to 2 million shares a day, and 2 to 4 billion dollars flow through their system. There is no faster way to run yourself out of business than making lots of bad decisions automatically in a tight loop. Algorithmic trading is an arms race: people compete on milliseconds in placing orders to get an edge, but the best way to win is to outsmart adversaries.</p>
<p><span id="more-1040"></span></p>
<p>Jane Street was a pretty conservative shop a decade ago. They were smaller, Excel and VBA satisfied their needs. As they outgrew them, they looked at the usual tools in the enterprise market and found them lacking.</p>
<p>This firm cares deeply about the quality of its software: they are betting their own money, billions of dollars, on it. It’s no wonder that senior partners committed to reviewing each and every line of code that goes into production for critical components.</p>
<p><em>Jane Street considered rewriting their systems in C#, but abandoned it in favor of OCaml</em>. They cite verbosity and lack of clarity as important reasons in this choice.</p>
<p><strong>Verbosity</strong> is kind of obvious for languages like C# or Java, although in recent years there have been mild improvements. An interesting point that Yaron Minsky (Jane Street’s managing director) made is that these languages, when applied to large systems, tend to make people cut and paste, reinventing the wheel over and over, because they are not expressive enough to capture high-level variability and invariants properly. You can’t pay enough people to code review dull boilerplate. The tenth time they see the same-looking code block, they skim over it and miss important bugs.</p>
<p><em>I do agree</em>: although C# and Java have improved greatly in recent years, they are nowhere close to the expressiveness of OCaml’s polymorphic variants, pattern matching and module functors.</p>
<p>I expect F# and Scala (the former an OCaml cousin running on the .NET platform, the latter a functional/object-oriented language running on the Java VM) to slowly make inroads in the enterprise in the next few years.</p>
<p>It is a matter of both technology and people. Functional languages have different needs from object-oriented ones, and platforms need time to adapt; and we need a new generation of coders before enterprises will truly grasp functional programming. It’s a kind of catch-22 situation that slows down adoption.</p>
<p><strong>Lack of clarity</strong> is due to the “spooky action at a distance” thing in OOP called inheritance. Senior partners at Jane Street, bright people but not professional programmers, found it hard to understand what was going on and what code was actually being executed. They are right.</p>
<p>As the OOP community has finally figured out, inheritance is often a bad way to get polymorphism, but <em>I don’t fully buy into Minsky’s argument</em>: in order to write concise programs in functional languages, you have to introduce the same kind of indirection through the use of higher-order functions and module functors. It’s only slightly less confusing.</p>
<p>It is true that type systems, when used properly, will help you write highly readable, higher-order code better in functional languages than in object-oriented ones, but that is only because the functional languages are younger and come from an academic background, so they benefit from more recent research.</p>
<p>There are languages —Eiffel, for an example that is not so cutting edge— that let you reason about the correctness of your programs even when a high degree of polymorphism is involved.</p>
<p><strong>Agility</strong> is the ability to adapt code to new requirements. Being able to refactor code fast and without errors is key to this goal.</p>
<p>Minsky cites algebraic data types and the semantics of the <code>match</code> expression in OCaml as important tools in this regard. The same can be achieved in object-oriented code, using double dispatch or, if you have a sane enough language, multi-method dispatch.</p>
<p>A great gem from Mr. Minsky is: “you have to think about the type system as a tool of its own”. You have to <strong>encode the invariants of your domain in your types</strong>. This is key to using static languages properly, no matter if they are functional or object oriented.</p>
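<p>As a rough illustration of encoding an invariant in a type, here is a sketch in Python rather than OCaml. The <code>Market</code> and <code>Limit</code> order types and the <code>describe</code> function are hypothetical names, not taken from Minsky&#8217;s talk. The invariant is that only a limit order carries a price, so code handling a market order has no price field to misuse.</p>
<p><pre class="brush: python;">
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Market:
    """Market order: executes at the available price, so it has no price field."""
    shares: int


@dataclass(frozen=True)
class Limit:
    """Limit order: cannot be constructed without a limit price."""
    shares: int
    limit_price: float


# Hypothetical domain type: an order is exactly one of the two variants.
Order = Union[Market, Limit]


def describe(order: Order) -> str:
    # Each branch only touches the fields its variant really has.
    if isinstance(order, Market):
        return f"buy {order.shares} at market"
    if isinstance(order, Limit):
        return f"buy {order.shares} at {order.limit_price:.2f} or better"
    # Reached only if a new variant is added and not handled above.
    raise TypeError(f"unhandled order variant: {order!r}")


print(describe(Market(100)))
print(describe(Limit(100, 12.5)))
</pre></p>
<p>Plain Python only notices a forgotten variant at run time, unless a separate type checker is used; in OCaml the compiler itself warns about a <code>match</code> that misses a case.</p>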
<p><strong>Performance</strong> is the last big theme that nudged Jane Street to choose OCaml.</p>
<p>They think that ML languages sit in a sweet spot between expressiveness and performance, a kind of local equilibrium where you get a lot on both axes. If you move either way, you have to give up something on the other one.</p>
<p>That is true: OCaml, in particular, has quite a good native code compiler. It produces fast code without requiring esoteric optimization techniques. It’s easy to reason about its output and it’s relatively easy to fine-tune its garbage collector, facts that help a lot in the kind of soft realtime arena where Jane Street plays.</p>
<p>Minsky cites <em>UIs, Concurrency, External libraries and Programming in the Large as weak points in OCaml</em>. Because of the particular needs of his firm, these are less of a problem than in other situations.</p>
<p>In my opinion, these are actually the key strengths that make Scala or F# better functional languages for common situations than niche players like OCaml.</p>
<h6 class="zemanta-related-title" style="font-size:1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://ocaml.janestreet.com/?q=node/61">Caml Trading talk at CMU</a> (ocaml.janestreet.com)</li>
</ul>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/programming/'>Programming</a> Tagged: <a href='http://blog.lupi-software.com/tag/functional-programming/'>Functional programming</a>, <a href='http://blog.lupi-software.com/tag/object-oriented-programming/'>Object-oriented programming</a>, <a href='http://blog.lupi-software.com/tag/objective-caml/'>Objective Caml</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/13/functional-vs-object-oriented-an-unusual-point-of-view/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>
	</item>
		<item>
		<title>Being an effective project manager</title>
		<link>http://blog.lupi-software.com/2011/12/10/being-an-effective-project-manager/</link>
		<comments>http://blog.lupi-software.com/2011/12/10/being-an-effective-project-manager/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 13:42:29 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Business & Management]]></category>
		<category><![CDATA[Project management]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1033</guid>
		<description><![CDATA[Lesson learned: Being an effective project manager is not about doing the work yourself, it is about making sure the right resource is applied to the right problem. (from The [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1033&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Lesson learned:</p>
<blockquote><p>
Being an effective project manager is not about doing the work yourself, it is about making sure the right resource is applied to the right problem.
</p></blockquote>
<p>(from <a href="http://thegorillaisnamedhogarth.blogspot.com/2011/02/responsible-authority-gorilla.html">The gorilla is named Hogarth</a>)</p>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/business-management/'>Business &amp; Management</a> Tagged: <a href='http://blog.lupi-software.com/tag/project-management/'>Project management</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/10/being-an-effective-project-manager/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>
	</item>
		<item>
		<title>Try Harder</title>
		<link>http://blog.lupi-software.com/2011/12/08/try-harder/</link>
		<comments>http://blog.lupi-software.com/2011/12/08/try-harder/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 16:31:56 +0000</pubDate>
		<dc:creator>Roberto Lupi</dc:creator>
				<category><![CDATA[Business & Management]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[pragpub]]></category>

		<guid isPermaLink="false">http://blog.lupi-software.com/?p=1027</guid>
		<description><![CDATA[I have found this little gem in PragPub, May 2011: A depressing theme with many of today’s software shops is the need to only make two kinds of hires. The [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.lupi-software.com&#038;blog=5339219&#038;post=1027&#038;subd=allupo&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I found this little gem in the May 2011 issue of <a title="PragPub" href="http://pragpub.com/magazines">PragPub</a>:</p>
<blockquote><p>A depressing theme with many of today’s software shops is the need to only make two kinds of hires. The first is a developer. After all, a developer codes, and that makes money! The second hire is an MBA-style manager. This manager is an HR-type who handles budgets, spreadsheets, and politics.</p>
<div class="wp-caption alignright" style="width: 310px"><a href="http://commons.wikipedia.org/wiki/File:American_soldier_in_Iraq_going_through_concertina_wire.jpg"><img class="zemanta-img-inserted zemanta-img-configured " src="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a1/American_soldier_in_Iraq_going_through_concertina_wire.jpg/300px-American_soldier_in_Iraq_going_through_concertina_wire.jpg" alt="" width="300" height="201" /></a><p class="wp-caption-text">Is Your Software Project in the Trenches? (Image via Wikipedia)</p></div>
<p>Then someone like me comes along. I’ve got a development background and I’ve managed as well, but today I don’t do either. Instead I work with a team to see where the problems are. I sit with them and look for the areas that have become blind spots, and then find ways to solve those problems. I’ve saved large organizations substantial amounts of money by improving how their teams work. But I’ve usually done this by hiring in as a developer or manager. It’s a rare company that hires someone to improve their process. They’d much rather sit in the trenches and inspire their soldiers to leap out in the face of concertina wire and machine guns, sure that with the right mix of courage and moral fiber, this time they’ll finally ship that product!</p>
<p>As we look around, this attitude seems so… stupid. They seem to think that by trying harder they’ll succeed. How many of our favorite sports teams don’t have coaches, but ask their professional athletes to try harder?</p>
<p style="font-weight:bold;">From “Is Your Software Project in the Trenches?” by Jared Richardson</p>
</blockquote>
<br />Filed under: <a href='http://blog.lupi-software.com/category/featured-categories/business-management/'>Business &amp; Management</a> Tagged: <a href='http://blog.lupi-software.com/tag/agile-2/'>agile</a>, <a href='http://blog.lupi-software.com/tag/pragpub/'>pragpub</a>]]></content:encoded>
			<wfw:commentRss>http://blog.lupi-software.com/2011/12/08/try-harder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/78e13892b6611a140af58dbff95eeaea?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Kitten Lulu</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a1/American_soldier_in_Iraq_going_through_concertina_wire.jpg/300px-American_soldier_in_Iraq_going_through_concertina_wire.jpg" medium="image" />
	</item>
	</channel>
</rss>
