
Teach yourself Git in 2 minutes

Git is very simple. It's very powerful, but fundamentally very logical and very simple. If you try to learn everything you can do with git, then the information will flood your brain and drown you. That's true of any powerful tool like Photoshop and Unix.

But if you just want to use git to back up your code changes, develop new branches, and share your source, git is actually as straightforward as SVN. Avoid complex and dangerous commands like git-rebase. I've worked on large codebases with distributed teams and I've never needed anything more than basic commit, branch/merge, and push/pull. Git also has useful log, diff, and grep tools for quickly finding out information about your code.

Git Flow: Git for Humans

To use git without brain augmentation surgery, you should make a simple, consistent system for yourself with a handful of commands. Or just use my system.

You want to commit often, so it's good to create bash shortcuts that make git quick to type. The 2-letter shortcuts encourage you to commit more often and keep everyone's code up to date. You'll never be afraid of losing code.

Here's my main workflow, commit and push:

$ # make changes, fix bugs...

$ cm "fixed bug 214 in the UI"

$ ph

I'm constantly checking the status to see if I forgot to add files or commit something:

$ sl

# On branch master

nothing to commit (working directory clean)

If I'm branching, I create a branch, make my changes, and then merge.

$ ct -b newfeature

# make changes

$ ct master

$ me newfeature

And then I can push my changes and delete the branch.

$ ph

$ bh -d newfeature

You want to commit often, so always cm (git commit -a -m) and ph (git push) after even small changes.

The codes are easy to remember because they are consistent. The codes are always 2 letters, composed of precisely the first letter of the command and the last letter (including all of the options). By using the last letter of the command including options, the shortcut tricks your mind into thinking of the full command every time you type it. Normally, you forget the full command when an abbreviation uses only its first letters, but with my system you remember the whole command every time, so you can still use git on other systems and other people's computers.
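The naming rule is mechanical enough to check in code. Here is a minimal Python sketch, where the function name `shortcut` and the `COMMANDS` list are illustrative names and not part of the original setup. It derives each 2-letter code from the full command and prints the matching alias line:

```python
def shortcut(command):
    """Return the 2-letter code: the first letter of the git subcommand
    plus the last letter of the full command, options included."""
    words = command.split()
    if words and words[0] == "git":
        words = words[1:]  # the leading "git" is not part of the code
    return words[0][0] + words[-1][-1]


# The full commands behind the aliases used in this post.
COMMANDS = [
    "git add",
    "git pull",
    "git push",
    "git commit -a -m",
    "git status -uall",
    "git log",
    "git grep",
    "git diff --ignore-space-change",
    "git merge",
    "git branch",
    "git checkout",
]

for command in COMMANDS:
    # Emits lines of the form: alias cm='git commit -a -m'
    print(f"alias {shortcut(command)}='{command}'")
```

If you add an option to a command, the rule changes its code, which is the point: the code always reflects the full command.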

Git Flow Examples

Where is this variable myVar declared (git grep)?

$ gp myVar

How is my branch different from master (git diff --ignore-space-change)?

$ de master

I forgot, did I commit all my changes, what files did I forget to add (git status -uall)?

$ sl

How do I make a new branch (git checkout -b)?

$ ct -b mybranch

How do I merge it back (git merge)?

$ #ensure you've committed all changes in your branch

$ ct master

$ me mybranch

How do I delete my branch after I've merged changes (git branch -d)?

$ bh -d mybranch

How do I pull and push my changes (git pull, git push)?

$ pl

$ ph

What changes were made recently (git log)?

$ lg

What branches exist and which branch am I on (git branch)?

$ bh

What if I screwed up and want to remove all the code in my branch without merging (git branch -D, since caps are harder)?

$ bh -D mybranch

How do I make a new repository?

$ git init

I just added new files to my code, how do I add them to my git repository?

$ ad .

Below are my bash aliases. Add these to your ~/.bashrc file so that you can use these shortcuts too:

alias ad='git add'

alias pl='git pull'

alias ph='git push'

alias cm='git commit -a -m'

alias sl='git status -uall'

alias lg='git log'

alias gp='git grep'

alias de='git diff --ignore-space-change'

alias me='git merge'

alias bh='git branch'

alias ct='git checkout'

over 5 years ago on January 6 at 1:12 am by Joseph Perla in tech, hacks


Don't write on the whiteboard

I recently interviewed at a major technology company. I won't mention the name because, honestly, I can't remember whether I signed an NDA, much less how strong it was.

I did well. Mostly because of luck. I normally trip over myself when I interview. I guess I've improved over the years. Here are a few tips to ace your own interview.

1. Don't write on the whiteboard

When I interviewed at Palantir around 5 years ago, I had a lot of trouble with this. Yes, I knew next to nothing about computer science then, but I should have been able to answer many of those questions. For example, Palantir asked me to write an API for a hash table, and I forgot set key and get key, the most basic operations. The alien situation of the whiteboard contributed to my nervousness. I didn't get an offer from Palantir.

Most people think you have to write on the whiteboard. Steve Yegge recommends that you practice writing code on a whiteboard and even buy and bring your own marker to the interview. That's pretty extreme and truly conveys the capriciousness of modern-day tech interviewing.

The interviewer started by asking me to code up a simple recursive calculation, using any language I wanted. "I don't like to write on whiteboards," I said. "It feels unnatural and distracting. I'd prefer to write on paper." "Okay," he shrugged.

The interviewers don't care. Use paper.

2. Bring your own paper and pen

So I asked for some paper and a pen. But there was no paper around, only some post-it notes. My mistake.

You should always have paper and pen anyway to write down ideas. On the subway, in line for movie tickets. Or you can keep a few sheets of paper in the folder with your resume that you brought to the interview (you did that, right?). Moleskines are excellent notebooks.

Some of the best programmers figure out the high-level overview on paper before they write a single line of new code.

3. Use Python

Even if you are a C++ systems guru. Even if you only know how to use Eclipse to program Java. Learn Python and use it during your interview. Python's philosophy is very simple and consistent. It's largely composed of a subset of the ideas in Java and C++. 80% of Python is based around the dictionary (HashMap). It will take you a few days to learn and not much longer to master.

In other languages, you will waste a lot of time writing string manipulation code and initialization code that you can do in one line of Python. Get to the algorithm.
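To make the one-line claim concrete, here is a small illustrative sketch. The input strings are invented for the example and are not from the interview. Each task below usually needs a loop plus some setup code in Java or C++:

```python
from collections import Counter

line = "the quick brown fox jumps over the lazy dog the end"

# Split the input and count every word, in one line.
counts = Counter(line.split())

# Reverse the order of the words, in one line.
reversed_line = " ".join(reversed(line.split()))

# Parse a comma-separated list of integers, in one line.
numbers = [int(x) for x in "3, 1, 2".split(",")]

print(counts.most_common(1))
print(reversed_line)
print(sorted(numbers))
```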

All I had was a post-it note, a tiny amount of space. So I wrote down the algorithm in Python line-by-line on the post-it note.

4. Write short algorithms, then make them half as long, then make them shorter, then ask an expert how they would make it even shorter

Five minutes later, I had my algorithm. It took up only a few lines. He looked at it. "Yeah, that looks correct," he said. "Normally people write it in Java and it takes them a while and it takes up a lot more space on the whiteboard. They spend a lot of time manipulating the input string."

Since my first interview at Palantir, I had done a lot of practice problems.

The highest value problems I know of are on Project Euler. The site posts a sequence of problems of increasing difficulty which you have to solve with increasing efficiency. To become a great hacker, just do those problems in order (in Python!). Then take your solution and do it in half the lines. Now, read more of the Python docs (maybe read about generators and list comprehensions and decorators) and make it even shorter. Finally, look at the solutions posted on the Project Euler site. Stand in awe of the 1-line solutions. Weep in joy over the solutions of the guy who answered the problems using just pen and paper and his brain.
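As an illustration of the shortening exercise, take the first Project Euler problem: sum all the multiples of 3 or 5 below 1000. The two versions below are a sketch of the process, not solutions taken from the site, and the function names are made up for the example:

```python
def sum_multiples_long(limit):
    """First attempt: an explicit loop with an accumulator."""
    total = 0
    for n in range(limit):
        if n % 3 == 0 or n % 5 == 0:
            total += n
    return total


def sum_multiples_short(limit):
    """Same calculation as a single generator expression."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


# Both versions must agree before the shorter one replaces the longer one.
assert sum_multiples_long(1000) == sum_multiples_short(1000)
print(sum_multiples_short(10))  # the problem statement's own example: 3 + 5 + 6 + 9
```

The pen-and-paper crowd goes further and replaces the loop with arithmetic series, which is the kind of solution worth studying once your own version works.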

The highest value courses I took are algorithms and advanced algorithms courses. I was lucky enough to study under Robert Tarjan. But I also did every problem in CLRS, the standard (and very well-written) algorithms textbook.

If you do all that, you won't be nervous at your big interview. You'll be bored.

5. Write tests on your own code, sometimes

I say sometimes instead of always because, first, it is impossible to test every case. 100% test coverage is a myth. Any non-trivial program is going to have too many edge cases to check, computationally. You need to test the high-value parts. You need to test the parts that you keep breaking.

Finally, as you finish, the interviewer will look at your code and ask you to write tests for it. So, pre-empt him and describe the tests that you would write yourself. Write edge case tests. Run the tests in your mind. Does your algorithm work? Remember, the interviewer will ask you to do this anyway, so just do it yourself and you will be one step ahead and score well.
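Here is a rough sketch of what pre-empting the interviewer might look like. The recursive `factorial` function is a stand-in chosen for the example, not the problem from the interview. The tests cover the base cases, one ordinary case, and one invalid input:

```python
def factorial(n):
    """Recursive factorial; rejects negative input."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial(n - 1)


def test_factorial():
    # Edge cases first: the two base cases of the recursion.
    assert factorial(0) == 1
    assert factorial(1) == 1
    # One ordinary case that exercises the recursive step.
    assert factorial(5) == 120
    # Invalid input should fail loudly instead of recursing forever.
    try:
        factorial(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative input")


test_factorial()
```

On paper you would only describe these cases and trace them by hand, but the list is the same.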

The highest value book I read which taught me practical programming techniques is Programming Pearls. It also teaches you the importance of tests and how to write them in a pain-free way. Read this book, it's very short.

He asked me to write tests for my code, find corner cases. He then asked me 3 other problems. They were Dan Tunkelang type problems. He ran out of problems and there were 15 minutes left. "Normally there's not enough time to ask more than 1 or 2", he said. So we just talked about VMs for 15 minutes. He taught me a lot about virtual machines. This brings up lesson 6:

6. Understand what you don't know, why you don't know it, have an interest in it

Read random Wikipedia articles. You don't have to understand it all, just know enough to be able to ask someone who is knowledgeable. Usually, they can teach you a lot from the seed of what you read. But you need that seed, that germ of interest.

This will make your questions good and your conversations interesting. People like to talk about themselves. Always carry some knowledge and some ignorance as fuel for them to talk about themselves and teach you. The interviewer will leave with a positive impression of you.

Don't just read blogs. Read research papers published by successful companies. Read Google's MapReduce, GFS, and BigTable papers. Read Yahoo's Hadoop and PNUTS papers. Read Amazon's Dynamo paper. Big companies have big systems, and they will expect at least some familiarity with how they work. These papers are hard. If you have no systems experience, it may take you a day to read through a single one. In the end, you will understand not just how these systems work, but how to think about these systems and design one yourself.

7. Use esoteric tricks you know, teach the interviewer

You're not supposed to use libraries in these coding exercises. But if you know something cool, just use it. In the worst case, the interviewer will tell you to rewrite it. In the best case, he will be interested and ask more questions about it. You will teach him.

I taught the next interviewer about the memoization decorator in Python. Memoization takes a complicated dynamic programming problem and makes it blazingly fast. He asked me to solve a problem. I wrote an O(N^2) algorithm, then made it O(N) with just one more line of code on my post-it note (still no paper).
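The post does not show the decorator or the interview problem, so the following is only a minimal sketch of the idea, using Fibonacci as a stand-in problem. The name `memoized` follows the post; the implementation is an assumption. The standard library's `functools.lru_cache` gives you the same behavior without writing it yourself.

```python
import functools


def memoized(func):
    """Cache results by positional arguments so each input is computed once."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoized  # this one line turns the exponential recursion into a linear one
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))
```

Without the decorator line, `fib` recomputes the same subproblems over and over. With it, every value of `n` is computed once and then looked up.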

I taught him about how I often write a file-backed cached version of @memoized that writes to a file so that I can persist the quick results between runs from the shell. He's a C++ guy, so I taught him about Python decorators as well.
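A file-backed variant could look roughly like the sketch below. This is not the author's actual code: the name `file_memoized`, the JSON file format, and the rewrite-on-every-miss strategy are all assumptions chosen to keep the example short. It only works for arguments and results that JSON can represent.

```python
import functools
import json
import os
import tempfile


def file_memoized(path):
    """Decorator factory: memoize a function and persist the cache at `path`."""

    def decorator(func):
        cache = {}
        if os.path.exists(path):
            with open(path) as f:
                cache = json.load(f)  # results saved by an earlier run

        @functools.wraps(func)
        def wrapper(*args):
            key = json.dumps(args)  # JSON object keys must be strings
            if key not in cache:
                cache[key] = func(*args)
                with open(path, "w") as f:
                    json.dump(cache, f)  # simple, not fast: rewrites the whole file
            return cache[key]

        return wrapper

    return decorator


# Demo in a temporary directory so nothing is left behind in the working directory.
cache_path = os.path.join(tempfile.mkdtemp(), "slow_square.json")


@file_memoized(cache_path)
def slow_square(n):
    return n * n


print(slow_square(12))
```

A second process that decorates the same function with the same path starts with the saved results already loaded, which is what makes repeated runs from the shell quick.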

8. Think how you think, go with it.

Studies on creativity have shown that if you tell people to be more creative, they end up less creative. If you split two painters up and tell them that the most creative person wins a prize, the paintings will be boring or the same. People forced into creativity think in the same way.

So, don't think outside the box. Just think how you think to the extreme. Go with it.

The third interviewer asked me to design a game API. He wanted a low-level design, but I misunderstood, so I started adding artificial intelligence features that would suggest moves for you to make. I wasn't sure what he was asking, and I kind of tried to clarify, but then I just went with it. My main interest is AI and machine learning. He was impressed by the originality, and I eventually answered his original questions too. You can usually tell when someone is being genuinely sincere by the manner in which they go out of their way to tell you, versus when they are just mouthing the words. He honestly implored, "it was really great talking to you."

9. Give reasons for your opinions, not just opinions

I asked the next interviewer how much he liked mobile development and he said he liked it. I was learning iPhone development. I played with Android too, but I told him I found the XML for UI distasteful. He asked, "Why?" I said you always want to put as much as you can into code because it gives you power. For example, you want to create and manage a repetitive UI element in code to avoid repetition. Avoiding repetition is the whole point of good coding.

He said that Android has a solution for the repetition in the form of XML templates or something. "Oh, I didn't know that," I said. I thought about it more. "Yeah," I observed, "the solution to the problem of XML, in enterprise, is often more XML." He chortled.

At a previous interview, a CTO was looking for an employee #1. He was just leaving Google to start a new startup, and he asked me what my thoughts were about Amazon Web Services. I said, "they're good."

I didn't realize that he did not have much experience with them (having been at Google), and wanted some real constructive feedback. I didn't realize that he was testing to see my experience and familiarity with AWS. More importantly, he was testing my reasoning ability and judgment. I have been running an EC2 instance continuously to run this blog for the past 5 years. It scales well and is very simple. I've used nearly every single one of their technologies since they came out. I invest all of my savings in Amazon because I know this technology will net them billions of dollars. I should have said this. Instead, all I said was "they're good."

10. Interview the interviewer

The last interviewer asked me some simple questions that reminded me of a harder problem I had in one of the companies I started. So I asked him how he would solve my problem.

I was trying to find the phrases in a document that correspond to scientific terms in order to link them to Wikipedia automatically. The vocabulary is composed of words and sequences of words like "DNA", "p53ase", "phospholipid bilayer", and "congenital determined myoglasia peptide". Documents are research papers, which can be long. There are a lot of terms (a hundred thousand biological terms, and many more if we include other sciences). How would you find and label all of the phrases in a document efficiently and what is the big-O running time?

He got the O(vocabulary size * document size) algorithm pretty easily, but I told him that there is an O(document size) solution. Can you solve it? Try it out. It's a fun, practical problem.

I pushed him a little bit but he didn't get it. I interviewed the interviewer and stumped him (although I'm sure he would get it if he thought about it longer).
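Spoiler warning if you want to solve it yourself. The post does not give the solution, so the sketch below is only one possible approach and may not be the one the author had in mind. It stores the vocabulary in a trie keyed by words, then walks the document once, taking the longest matching phrase at each position. The running time does not depend on the vocabulary size. It is proportional to the document size times the length of the longest phrase, which is a small constant for scientific terms. Tokenizing by whitespace and lowercasing are simplifying assumptions.

```python
def build_trie(phrases):
    """Build a word-level trie. The key None marks the end of a phrase."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[None] = phrase  # words are strings, so None cannot collide
    return root


def find_terms(document, trie):
    """Return (start, end, phrase) word spans, longest match first."""
    words = document.lower().split()
    matches = []
    i = 0
    while i < len(words):
        node, j, longest = trie, i, None
        # Follow the trie as far as the document's words allow.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if None in node:
                longest = (i, j, node[None])
        if longest:
            matches.append(longest)
            i = longest[1]  # continue after the matched phrase
        else:
            i += 1
    return matches


vocabulary = ["DNA", "phospholipid", "phospholipid bilayer"]
trie = build_trie(vocabulary)
print(find_terms("The phospholipid bilayer protects DNA", trie))
```

A production version would also need to handle punctuation and overlapping phrases, which this sketch ignores.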

11. Mention your projects and passions first

After finishing up all of the problems in 40 minutes, I had 5 minutes left with the last interviewer. I started telling him about my minimalist python web framework and he said, "That's so interesting, that's what we should have been talking about instead of going over these questions."

I got the offer this time, but I would much rather do a PhD instead. I want to learn how to push the boundaries of knowledge, not just apply what I learned in these books.

over 5 years ago on January 2 at 1:12 am by Joseph Perla in tech, hacks


Sentiment analysis using transfer learning from reviews to news

I'm going to describe some failed experiments in my research in sentiment analysis. I am using LDA and supervised LDA. I will be developing other custom models that incorporate blog comments, and I will be training using stochastic optimization in future iterations.

Data

NYT: A corpus of medium-length (500-3000 words) articles from the New York Times. It contains nearly every article from January 1, 2008 through September 2011. It contains 115,586 documents and 118,028,937 words. It contains over 100,000 unique words.

YELP: A corpus of short (10-500 words) local business reviews, almost exclusively restaurants, from the Yelp.com website. Each review is labeled with 1, 2, 3, 4, or 5 stars by the author of the review to indicate the quality of the restaurant the text describes. It contains 152,327 documents and 19,753,615 words. It also contains over 100,000 unique words, many of which are misspellings.

Experiments

I ran several experiments to figure out what information can be extracted about sentiments in the New York Times articles dataset NYT.

I created a vocabulary nytimes_med_common based on the NYT dataset using words that appear in less than 40% of the documents and more than 0.1% of the documents. This removes very common words and very rare words, neither of which is informative about the document collection in general.
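The filtering step can be sketched as follows. This is an illustration of document-frequency filtering, not the script used for these experiments. The function name, the whitespace tokenization, and the lowercasing are assumptions.

```python
from collections import Counter


def build_vocabulary(documents, min_df=0.001, max_df=0.40):
    """Keep words whose document frequency lies strictly between the bounds.

    The defaults correspond to "more than 0.1% and less than 40% of documents".
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document, however often it repeats there.
        doc_freq.update(set(doc.lower().split()))
    return {w for w, count in doc_freq.items() if min_df < count / n_docs < max_df}


# Toy corpus with looser bounds so the effect is visible on five documents.
docs = ["the cat", "the dog", "the cat sat", "the bird", "the fish"]
print(build_vocabulary(docs, min_df=0.25, max_df=0.9))
```

On the toy corpus, "the" is dropped for being in every document and the words that appear only once are dropped for being too rare.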

First, I ran LDA on the NYT dataset using the nytimes_med_common vocabulary. On the most recent 2000 articles, I extracted 40 topics represented below. The topics closely follow the lines of politics, education, international news, and so on. They closely model the different sections of the newspaper. (lda_c_2011_10_16).

I ran sLDA on the YELP dataset using the nytimes_med_common vocabulary. This excludes many features of the YELP dataset which are specific to restaurant reviews, as well as misspellings (e.g. "terrrrrible"). On the first 10000 reviews of the dataset, I extracted 50 topics. The topics computed include a few topics which describe negative words. Many of the topics generally describe specific kinds of restaurants (ice cream shops, Thai food) in detail in generally neutral or positive terms. There is a Chinese food topic with generally negative terms. The topics with the most extreme coefficients do seem to give a good sense of the polarity of the words contained within. Based on informal analysis, it looks like the topics would have good word intrusion and document intrusion properties. (yelp_slda_2011_10_17)

I ran LDA on the NYT dataset starting from the model and the topics extracted from the sLDA on the YELP dataset. This did not work very well, and got about the same topics as LDA from scratch. Perhaps a better experiment would be to take the topics with the most predictive coefficients, the 5-10 of them, and run LDA starting with those. (yelptopics_nytimes_lda_c_2011_10_17).

More interestingly, I created a lexicon of the words with high coefficients for predicting the polarity of Yelp reviews using Naive Bayes (yelp_lexicon and yelp_lexicon_small). I ran LDA on the NYT dataset using the yelp_lexicon as a vocabulary. This brought out a few topics that did not strictly follow along with the newspaper sections. For example, there is an epidemic/disease topic. There is a "corrections" topic with words like the following: incorrectly, misidentified, erroneously, incorrect, correction. The topic on employment reveals a strong motivator: paid, contract, negotiations, wages, executives, employees, unions, manager, compensation. Many of the topics do match up, like baseball and football and music and food and books, but it is just a much noisier set of topics. It is easier to find the same section topics when that section uses a lot of review-filled words (like food, music, and book reviews). Many of the topics are unidentifiable; perhaps I used too many topics. But some are interesting, such as topic 029 using yelp_lexicon_small: winner, favorite, amazing, perfect, fantastic, outstanding, with other words in various sections of the newspaper.
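One way such a lexicon could be built is sketched below. The details are assumptions, not a record of what was actually run: reviews are reduced to a positive/negative label, words are scored by the difference of their smoothed Naive Bayes log-likelihoods under the two classes, and the most extreme words on each side form the lexicon.

```python
import math
from collections import Counter


def polarity_lexicon(reviews, top_k=3, smoothing=1.0):
    """reviews: iterable of (text, is_positive) pairs.

    Returns (most_negative_words, most_positive_words).
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, is_positive in reviews:
        (pos_counts if is_positive else neg_counts).update(text.lower().split())

    vocab = set(pos_counts) | set(neg_counts)
    # Laplace smoothing keeps the logarithms finite for one-sided words.
    pos_total = sum(pos_counts.values()) + smoothing * len(vocab)
    neg_total = sum(neg_counts.values()) + smoothing * len(vocab)

    scores = {
        w: math.log((pos_counts[w] + smoothing) / pos_total)
        - math.log((neg_counts[w] + smoothing) / neg_total)
        for w in vocab
    }
    ranked = sorted(scores, key=scores.get)  # most negative first
    return ranked[:top_k], ranked[-top_k:]


toy_reviews = [
    ("amazing food amazing service", True),
    ("terrible food terrible service", False),
]
negative, positive = polarity_lexicon(toy_reviews, top_k=1)
print(negative, positive)
```

Words like "food" and "service" that appear equally on both sides score zero and stay out of the lexicon, which is the filtering effect the experiment relies on.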

I ran a final experiment on the Yelp dataset using the nytimes_med_common vocabulary. I ran sLDA on the Yelp dataset to generate topics with coefficients. I then ran inference on the news articles using these generated topics and coefficients. The distribution of predicted ratings looks Gaussian with mean 3.5 and standard deviation 0.25. Nearly all the documents are clustered to be labeled between 3 and 4 stars, with less than 5% below 3 or over 4. Even at the extremes, the documents with the highest predicted label have many death-related and terrorism-related articles. The negative extremes are also not consistent.

My next experiment will be to try to isolate topics which relate specifically to sentiment, independent of domain. One idea I have relates to fixing topics when training (an idea Chong Wang introduced to me). My idea is to run LDA on the Yelp dataset to generate domain topics. Then, I will run sLDA with those topics fixed plus 2-10 extra topics which are unfixed. The fixed topics will act as background with middling coefficients, I predict, and the remaining trained topics will end up with extreme coefficients and will contain strong sentiment words independent of topics in the domains.

over 5 years ago on October 21 at 5:41 am by Joseph Perla in tech, hacks, research


Your website is unviral

Your website is probably unviral.

Everybody wants his or her website to go viral. As web designers and entrepreneurs it is our goal to create buzz; an unstoppable avalanche of traffic; a self-feeding hurricane.

For many entrepreneurs PayPal, YouTube and Facebook are the alpha and omega of marketing and strategic growth. The goal is to emulate the distinguishing characteristics of these products in an attempt to achieve similar heights.

Some websites succeed and grow on the same trajectory or faster. They usually exist in fundamentally social businesses like email, payment services or social networking in places without such networks.

However, there is also a special cadre of websites which lies in a no-man's land untouched by virality. Not only is it difficult for these sites to become viral, but the nature of the business actively fights its own online growth, behaving much like a tumor suppressor gene. These businesses never experience exponential growth and all of their growth paths, even if strong, are linear. Some examples quickly come to mind: male enhancement pills, adult diapers, schizophrenia medicine.

The online world is in the habit of thinking of itself as viral by nature. Virality, some think, is built into the very fabric of the Internet. It is not. In fact, one of the most profitable online businesses is unviral: online dating.

Dating websites carry such a stigma that some couples successfully matched online invent a fictionalized romantic encounter to conceal the fact that they were mouse-selected by a filtering process and a geographic search. Although dating websites provide immense value, arguably more than almost any other online service, users will often strive to hide their enrollment from even their closest friends. Maybe especially their closest friends.

Dating websites are unviral. They do not spread by word of mouth. As a matter of fact, they actively suppress this form of growth due to the nature of their service. Many an experienced entrepreneur has made the mistake of underestimating the unvirality of online dating.

Your website may be unviral too, although it may be less than obvious. Perhaps it only demonstrates certain elements of unvirality; for instance, would your users tell all of their friends about your service, or only some? Would they actively deny using or even knowing about your website to certain friends, acquaintances, or co-workers if asked? Is it embarrassing? That would be pretty bad for your virality.

If this is the case you must face the facts: your website is unviral.

Look at examples of purely viral websites: PayPal, the old Hotmail, YouTube and the current Facebook (not the old version that limited itself to college students) all display or once displayed growth without any symptoms of unvirality. I told everyone about Hotmail when it was launched: cousins, teachers, pen-pals. Users of contagious sites like this may not actively evangelize to literally everyone (as they do with, say, YouTube) but they certainly wouldn't avoid a discussion or hold back praise once the topic was broached. In contrast, there are many sites that quickly provoke responses of "yeah, that's weird," when mentioned. In these instances unvirality dominates and kills growth.

Though unvirality is not a death sentence, it does limit the potential for greater growth. In some instances, unvirality is inherent to the service or structure of the business. But if this is the case, why should an entrepreneur involve himself with it and how can he manage to save it from the depths of unvirality?

Two Answers: Anonymity and Covert Transformation

  1. The Internet supports anonymity which allows users to praise a product they love to others--thousands or even millions of other strangers--while avoiding the embarrassment and reluctance to share a product or website that often results in unvirality. Anonymous forums and reviews abound for even the most unviral of products.
  2. Secondly, the web entrepreneur can covertly transform an unviral business into a viral one by euphemizing or disguising its true purpose. For example, create a website with dating tools but market it as a social network. Facebook at Harvard worked this way with its subtle but pivotal "relationship status." Give users a guise under which to refer their friends and thus avoid the focus on unviral traits, such as the embarrassment of endorsing a dating site, that would otherwise prevent the expansion of your user base.

Utility vs. Virality

Many people confuse utility with virality, but these are actually very independent qualities. Entrepreneurs may believe that if they have developed a good product it will naturally become viral. This could not be further from the truth.

Often, people will want to tell their friends about a product they find useful, but this is not necessarily so. Some of the most useful online services actually discourage such talk. Something can be useful and viral (such as Facebook), useful and not viral (dating websites; most products ever created), not useful and viral (lolcatz; chia pets; almost all 4chan memes) and something can most definitely be neither useful nor viral (almost everything). Utility and virality are therefore orthogonal. They represent two different dimensions which can intersect but do not necessarily do so.

In fact, something extremely useful--something that you and many others may pay thousands of dollars for to bring yourselves joy for a lifetime--may be strictly unviral. As I mentioned earlier, people will actively go out of their way to not talk about some very useful products. Many medical products fall into this category. Utility is one dimension of a service and it is clearly distinct from virality, though by no means mutually exclusive. The realm of influence between utility and virality is vast and depends mostly on the nature of the business.

The lesson: just because your website is useful does not mean it will be viral. And if it is viral but not useful, it will die once the virus finishes spreading. First, solve the utility problem: build something useful. Then, you have to solve the distribution problem, and I gave 2 techniques for doing so: anonymity and covert transformations. Do you have other ideas?

over 5 years ago on October 4 at 4:42 am by Joseph Perla in tech, hacks, entrepreneurship


How to hack Silicon Valley, meet CEOs, make your own adventure

It was my sophomore year. Everyone was making plans for fall break. What are you going to do? You don't know? There are only 4 days left before break.

Before this particular fall break, I was busy with classes and had thus neglected to make plans. Some students were going skiing, others on class trips, others to homes nearby. Where are you going? I had no idea.

However, around this time, I was reading a lot about California. I read work by entrepreneur and essayist, Paul Graham, in which he says that the San Francisco Bay Area is the best place to start a company. He described the energy, but I couldn't palpate it. If I were to take his word, it's an ethereal, magical place.

That day, James Currier, internet entrepreneur, stood before me and a packed class full of eager students. His eyes were shot open, a purple glaze lit them afire. His wavy hair burst out atop his skinny head. Gaunt and fearless, he embraced the air as he swung his arms widely to make his point.

“Silicon Valley is absolutely the place to be,” he said. “It’s where all technology happens. It’s where Google started, it’s where Apple, Yahoo, Intel, Oracle, and so many other technology companies started. Some of the smartest people in the world lived there at Stanford, Berkeley, and Xerox PARC. It is a magical forever-sunny wonderland where dreams come true and it rains investments and acquisitions.”

He went on to make even more grandiose claims. Startups? Risky? Not at all when you do things right. Moreover, they are nothing compared to the risks of a financial job.

Everyone laughed. This room in Princeton was filled with students who had already accepted offers at investment banks or who would be applying soon. In 2006, finance was booming with big bonuses and strong growth prospects. Derivatives opened up whole new worlds for trading and speculation. Operations research quants donned their glasses in pride. They were respected.

So everyone laughed. He said, "No, really. They can fire you any time. They don't care about you. The market can turn the other way in a heartbeat. You have no job security. Your firm can go bankrupt."

To the students at the time, this all seemed ludicrous. They all envied these corporate finance jobs, never mind that many would lose their finance jobs less than 2 years later.

He inspired. He didn't have charm so much as hurricane-force energy. He was insightful and learned. He talked about his great times. He talked about his learning moments.

And so he inspired me to see it. I had to see it. What is so special about Silicon Valley, the San Francisco Bay Area? How can it actually be that great? What exactly gives the air such power to breathe life into world-changing tech empires?

I knew what I had to do, but Fall Break was just two days away. How would I fly there without paying outrageous fees? Where would I stay? What would I do there? How would I meet the minds behind these great startups? I was a sophomore from Florida. I had no network in California.

I searched online for cheap tickets, no luck--that is until I noticed an ad for Hotwire. If you have yet to try this site, Hotwire buys leftover seats in bulk, and then sells them to users blind such that they don't know exactly which flight on which airline at what time until they buy. I snagged a very cheap ticket for 3 days later.

Now, where would I stay? I knew exactly three people from my high school in the Bay Area, 2 at Berkeley, and 1 at Stanford. I sent them all an email and hoped they would get back to me in time. I could always get a hotel somewhere.

Finally, how could I reach the top startups in Silicon Valley and find entrepreneurs who could meet with me on such short notice? I didn’t know any CEOs. How do I hack Silicon Valley itself?

TechCrunch always covers the hottest new funded startups. Every day, they publish dozens of new articles on the latest technology. I should just pick a few of the best and email them. But how do I choose the best? What if they don't get back to me? Will I waste a trip?

I noticed that about half of the articles listed the location of each company. Some in California, some not. So I enumerated every city in the Bay Area: Redwood City, Palo Alto, Berkeley, San Francisco, Menlo Park, etc. I wrote a program to crawl all of the TechCrunch archives to find all of the articles about companies in one of these cities. I then parsed out the name and URL automatically as well.
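The original crawler is not shown, so here is only a rough sketch of the filtering step. It assumes the articles have already been downloaded and parsed into dictionaries. The field names `company`, `url`, and `text`, and the sample articles, are hypothetical.

```python
BAY_AREA_CITIES = [
    "Redwood City",
    "Palo Alto",
    "Berkeley",
    "San Francisco",
    "Menlo Park",
]


def bay_area_companies(articles):
    """Return (company, url) for every article that mentions a listed city."""
    found = []
    for article in articles:
        text = article["text"].lower()
        if any(city.lower() in text for city in BAY_AREA_CITIES):
            found.append((article["company"], article["url"]))
    return found


# Hypothetical, already-parsed articles standing in for the crawled archive.
sample_articles = [
    {"company": "ExampleCo", "url": "http://example.com",
     "text": "ExampleCo, based in Palo Alto, launched today."},
    {"company": "FarAwayInc", "url": "http://example.org",
     "text": "FarAwayInc is headquartered in Boston."},
]
print(bay_area_companies(sample_articles))
```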

I looked through the list of companies and I picked the most interesting. Some invented a new technology, and others just came out of a new incubator called YCombinator.

These days, you can do this easily yourself with Crunchbase, a useful database of every startup in existence.

I wrote another script to send an email to every single one: jobs@azureus.com, jobs@youos.com, and so on. In each email, I wrote: I am a student who will be graduating soon, and I would be very interested in learning more about your startup since I saw it in Techcrunch and I think your Company is very innovative. I'm from Princeton and I'd be interested in potentially working for your company. I am visiting California next week, can we please meet?

I sent out dozens of emails, and then I waited. Not everyone replied, but many did. I flew in, finished my homework on the plane, and crashed with my friends (all three through the week).

I met with many CEOs. In startup land the companies are all very small. Everyone in the company has to wear many different hats. Therefore, when you send an email to jobs@startup.com, the CEO reads it. Few people realize that you can easily get direct access to startup CEOs.

One company I reached out to was YouOS. YouOS was in the first class of YCombinator. They are incredibly good hackers. We just talked over pizza and they joked about how they've written and rewritten servers from scratch so many times that they can do it in 5 minutes while sleeping. YouOS did not work out, but the founders continued innovating. A couple of them made and then sold Project Wedding. Another went on to create thesixtyone.com and Aweditorium, two of the most innovative music apps in the world.

Walking around Palo Alto, I saw several startups on each block. If you can imagine another planet where the Internet is turned into physical locations with storefronts, with Facebook next to Dropbox next to Shopkick, then you have a pretty good idea of what Silicon Valley looks like.

You see zetok, jlingo, and any conceivable combination of letters plastered everywhere. I noticed a small blue frog in one corner; it looked familiar. I tried the door and walked upstairs. The offices were in fact the offices of Azureus, the BitTorrent app that made torrents popular. I went up to the exhausted man walking hurriedly by the front desk, and I began with: hi. I'm Joseph Perla. I am a student looking for a job. I am visiting just for a week, can I talk to you for just a few minutes?

He was taken aback at first, a little flustered. He said, yes, sure, but not today, I'm a little stressed because I'm signing papers. We’re raising 4 million dollars right now. Can you come tomorrow?

Absolutely.

The next day, I spent 3 hours talking to the CEO one-on-one about Azureus, raising money, Silicon Valley, BitTorrent, technology, France (he's French), and the French technology industry.

I ended up meeting with half a dozen other startup founders. I toured the Golden Gate Bridge and many parts of the Bay Area, Berkeley, and Stanford.

The Valley is very friendly, and everyone does everything they can to help you because, at some point, someone definitely went out of their way to help them succeed. I started building a network from nothing. I directly used the connections I made on my spontaneous trip to start my next company, Labmeeting.

David Tisch, of Techstars NYC, made a great point recently. Startups are very difficult. The odds are against you. Your competitors are twofold. On the one hand you compete with the biggest companies in the world. Even more difficult, you compete with inertia and ignorance and apathy. Everyone in the startup industry knows how hard it is, so we all do what we can to help each other to beat the odds. That's the only way it works at all. That's how we succeed against all odds. Silicon Valley is one big mega-commune of startup capitalists.

Make the most of what you have (friends in new places), trust in people, and find out what the ethos of Silicon Valley is really like. I know scores of startups who would love to have smart students, especially students looking for jobs, visit their offices and see what they have built. I can point you in the right direction. CEOs love to tell their stories more than you like to listen to them! Let me know if you plan to make your own adventure, and please tell me about your trip when you get back.

over 5 years ago on October 3 at 12:00 pm by Joseph Perla in tech, hacks, entrepreneurship


Write bug-free javascript with Pebbles

Github: https://github.com/jperla/pebbles

We actively seek contributors!

Goals of Pebbles

  • so easy that designers and non-programmers use it to write complicated AJAX!
  • 0 lines of javascript
  • complicated ajax websites
  • 0 lines of javascript
  • no bugs
  • 0 lines of javascript
  • very fast speed and optimality.
  • backwards compatibility with clients who have javascript off (this was more important 4 years ago when I first made this)
  • Memrise loves it!

Plus, you don't even have to write one line of javascript!

The basic idea is that almost every complicated AJAX interaction can be reduced to a handful of fundamental actions which can be composed (remind you of UNIX?). So, all you have to do with this library is add a few lines of HTML to elements of a page to describe the Pebbles response that happens when someone clicks that element. Maybe you submit a form, maybe you fetch some content and update part of the page.

Most current websites write and rewrite slightly different versions of these same basic patterns in javascript. This separates the HTML which has information about AJAX interactions and the Javascript which has other information. But you want it all in one place!

Pebbles uses the jQuery.live function. Very heavy pages with tens of thousands of elements take 0 time to load, since almost no javascript is executed.

Javascript can be tricky to write even for an experienced programmer. Moreover, a lot of this stuff is repeated, and it shouldn't be. Pebbles brings more of a descriptive style programming (a la Haskell, Prolog) to the web in the simplest of ways.

FAQ

Couldn't you just write javascript functions that you call that do the same thing?

You might but then you introduce the opportunity of syntax and other programming errors, thus not achieving 0 bugs. You would also have to figure out how to make it fast yourself. In practice, this library is so straightforward to use that once you define a complicated action, which only takes a few seconds, you can move it around and it just always works.

Moreover, it's easier to auto-generate correct readable html (e.g. from Django templates). Many of your pages won't need *any* javascript even if highly dynamic. All the custom logic is in one place rather than spread over the html and the javascript. Basically, writing javascript is harder than what amounts to a DSL in HTML.

I need more complicated action-handlers than just these 3, can you please make them?

The code is open source and on Github on jperla/pebbles. Feel free to add your own enhancements. Be careful because you want to keep your app simple, and, in my experience, these 3 actions comprise the vast majority of user ajax paradigms. With a little thinking you can probably do what you want using either "form-submit" or "replace" with the right response html.

Technical Documentation

Pebbles accepts a spinner URL (pointing to an animated GIF of a spinner shown during waits). Pebbles sets up a live listener on divs with the class "actionable".

Classes of type actionable contain a hidden div which has class "kwargs".

.actionable .kwargs { display: none; }

kwargs div contains a number of <input> html elements, each with a name and value. The name is the key name, the value is the value for that key. In this way, in HTML, we specify a dictionary of keyword arguments to the actionable.
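Since this markup is so regular, it is easy to generate from the server side. Here is a minimal Python sketch that renders an actionable following the description above. The helper name, the hidden input type, and the example URL and selector are my assumptions for illustration, not part of Pebbles itself.

```python
from html import escape


def actionable(label, **kwargs):
    """Render an actionable div whose hidden kwargs div holds the arguments.

    Python keyword names cannot contain hyphens, so underscores are turned
    into hyphens (target_type -> target-type).
    """
    inputs = "".join(
        '<input type="hidden" name="%s" value="%s">'
        % (escape(name.replace("_", "-")), escape(str(value)))
        for name, value in kwargs.items()
    )
    return ('<div class="actionable"><div class="kwargs">%s</div>%s</div>'
            % (inputs, escape(label)))


# Hypothetical URL and selector, just to show the shape of the output.
print(actionable("Full Text", type="replace",
                 url="/papers/42/fulltext", target="#abstract"))
```

Because the names and values are escaped, a stray quote in a URL cannot break the surrounding HTML.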

Here are some self-explanatory examples:


It fails loudly if misconfigured. It's hard to write buggy code and not notice in quick testing. It is easy to do everything right and it is easy for you to write a complex ajax website with no extra javascript code.

Full arguments are below:
===========================
Arguments:
  type: replace, open-close, submit-form
        replace replaces the target with the url
        open-close will toggle hide/display the target, 
                which also may dynamically lazily load content from an url
        submit-form submits a form via ajax which is a child of the actionable,
                or may be specified in form argument; 
                the response of the ajax replaces target

  url: url string of remote page contents

  target: CSS3 selector of element on page to update

  target-type: absolute, parent, sibling, closest, or child-of
                Absolute just executes the target selector in jQuery.
                Parent executes target selector on jQuery.parents().
                Sibling the same on siblings.
                Closest looks at children and children of children and so on.
                child-of looks at target's children

  closest: selector used in combination with target-type:child-of to get target's children
  form: selector used in combination with type:submit-form to find the form

If you use the open-close type, then the actionable can have two child divs with classes "when-open" and "when-closed". Fill when-open with what the actionable looks like when the target is toggled open (for example, a minus sign), and fill when-closed with what it looks like when the target is toggled closed (for example, a plus sign).

over 5 years ago on September 22 at 5:41 am by Joseph Perla in tech, hacks


How to launch in a month, scale to a million users

These are case studies.

I will talk about my last two startups, where I used a number of techniques to build quickly and scale up. The techniques are quite simple, but someone who is not familiar with building systems may find them useful for building his or her own scalable site.

This is based on the outline of a paper by Lampson with more modern web-based examples:

http://research.microsoft.com/en-us/um/people/blampson/33-Hints/WebPage.html

Labmeeting was a search engine for biomedical literature and a social network for scientists. http://www.crunchbase.com/company/labmeeting.

The same principles in Lampson can be applied to any large system such as Google or http://www.turntable.fm.

Functionality

Keep it simple.

We built APIs before making the website at Labmeeting. That means that the design of data access, security, and data flow happens long before the first interfaces are created. Simplicity in a small interface is key, with well-defined and single-purpose functions coming from each module and submodule of the API. The whole front-end interface uses fewer than 30 methods in 5 modules of the API, and nothing else.

Get it right.

From day one, we built automated tests into Labmeeting to catch any conceivable and subtle bugs that we may introduce during development. In advance, we knew that it would be a complex, dynamic site with hard to reproduce state. This made it all the more important that each simple method in the API performed exactly as it needed to both in the edge cases and in the normal case. We had individual function tests, module level tests, and full integration tests that automatically started a full chatserver and tested real requests. The tests were run on every commit and no bugs were allowed to persist before writing new code.

Don't hide power

Labmeeting was a very dynamic website using a lot of AJAX to speed up page requests and minimize initial page download size. You could click on a fragment of an abstract, or a button that said Full Text, which then made a request to the server and replaced information on the page with the response. A lot of javascript is repetitive, and can be very buggy. We implemented a library that could create complicated AJAX interactions by writing 0 javascript, instead just adding a few extra HTML tags to code. The library virtually eliminated bugs and increased speed on the site by eliminating javascript execution time and centralizing code. Despite just requiring HTML tags, the library allows for maximum flexibility by the user to submit full CSS3 jQuery selectors as arguments if desired. Despite normalizing an interface, it does not hide the power of jQuery. Memrise.com now uses this library enthusiastically.

You can see the docs and use this library yourself at the Pebbles introduction.

Use procedure arguments to provide flexibility in an interface

We created a system for filtering through news articles. The system has many basic parameters that can be passed that are very simple, but the parameters are simply procedures. Therefore, if someone had a special complicated need, they could write their own function that returned a boolean value of whether to filter the news and pass that through the interface.
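As a rough illustration (not the actual Labmeeting code), an interface like that might look as follows. The function name, the `score` and `source` fields, and the sample articles are all made up for the example.

```python
def filter_news(articles, keep=None, min_score=0.0):
    """Return articles that pass the simple parameters and, if given,
    the caller-supplied predicate `keep` (a function returning a boolean)."""
    results = []
    for article in articles:
        if article["score"] < min_score:
            continue
        if keep is not None and not keep(article):
            continue
        results.append(article)
    return results


# Made-up sample data.
articles = [
    {"title": "New CRISPR result", "score": 0.9, "source": "journal"},
    {"title": "Conference gossip", "score": 0.4, "source": "blog"},
]

# A special need expressed as a procedure passed through the interface.
journal_only = filter_news(articles, keep=lambda a: a["source"] == "journal")
```

The common cases stay simple (`min_score`), while the unusual cases do not require any change to the interface.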

Leave it to the client

The interface at Labmeeting was very simple, and we expect the client to perform complicated manipulations of the many elements of the interface and keep track of all those states. This allowed the backend to be developed very quickly, although it meant that frontends, like an iPad or Android app, take a little longer to develop.

Continuity

Keep basic interfaces stable. Keep a place to stand if you do have to change interfaces.

The API of Labmeeting and Stickybits is versioned. They can thus offer full compatibility with previous functionality, but enhancements and changes can be made in newer versions.

Making implementations work

Plan to throw one away.

Many of the routines in the initial prototypes were written very quickly and with an eye to throwing them out once in full production mode. For example, the first version of the PDF search feature pulled in the whole user's collection and did a search in memory. A fully optimized version would be a little more complicated, but many routines were designed that way with an eye to throwing the inner part of the function out and rewriting once the bottlenecks are identified.

Keep secrets of the implementation

We built Labmeeting up from separate siloed modules. This decreased performance a bit, but it allowed the modules to operate independently and with maximum flexibility to respond to changes in the requirements of the interface. For example, the PDF manager knew nothing about how users were stored or queried. The User API could store users in memory, on disk, in Postgres, or halfway across the world. Lab groups only knew that they could call the same external API methods used to look up a user or set of users.
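A small sketch of the idea, with hypothetical class and method names (the real API is not shown here): the lab group code depends only on the public lookup method, so the storage behind it can be swapped freely.

```python
class InMemoryUserStore:
    """One possible backend. A Postgres-backed class could expose the
    same methods without any caller noticing the difference."""

    def __init__(self):
        self._users = {}

    def add_user(self, user_id, name):
        self._users[user_id] = {"id": user_id, "name": name}

    def get_users(self, user_ids):
        # Unknown ids are skipped rather than raising an error.
        return [self._users[uid] for uid in user_ids if uid in self._users]


class LabGroup:
    """Knows only the public methods of the store, not how users are stored."""

    def __init__(self, user_store, member_ids):
        self._store = user_store
        self._member_ids = list(member_ids)

    def member_names(self):
        return [user["name"] for user in self._store.get_users(self._member_ids)]
```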

Use a good idea again instead of generalizing it

At Labmeeting, we had to extract author names from PDFs. We realized that we could do decently well at extracting the names using machine learning techniques, but never perfectly. However, by indexing a gazette, a complete database of every possible PDF, we could simply make some guesses (possibly using machine learning) and then just look up those guesses in the gazette to see if there is a match. It becomes a problem of efficient enumeration. We didn't generalize it, and used the idea again in a slightly different context. Each PDF has a scientific abstract with various complicated terms from biology and physics. We wanted to identify those important terms to allow further exploration. Again, some indicators could point us in the right direction, but we did not get everything. So, we crawled Wikipedia to compile a gazette of biological terms, then kept only those terms in the abstracts that appear in the gazette, excluding very frequent words like DNA. This was highly accurate again. We linked these extracted entities to Wikipedia to provide further information for the curious.
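Here is a minimal sketch of the lookup step for the abstract terms, assuming the gazette is already loaded as a set of lowercase phrases. The function name and the three-word limit are assumptions, and punctuation handling is left out to keep it short.

```python
def extract_known_terms(text, gazette, stopwords=frozenset(), max_words=3):
    """Return phrases from `text` that appear in `gazette`.

    At each position the longest matching phrase (up to `max_words` words)
    wins. Phrases listed in `stopwords` are ignored even if the gazette
    contains them.
    """
    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in gazette and candidate not in stopwords:
                found.append(candidate)
                i += n  # continue after the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move on one word
    return found
```

The guessing is deliberately dumb: every short run of words is a candidate, and the gazette decides which candidates are real terms.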

Handle all the cases

Handle normal and worst cases separately as a rule

At Labmeeting, we analyzed PDFs to extract the title, publication date, and other information. The special case of an encrypted, unparseable PDF from which no text can be extracted went straight to a separate method. This case could possibly be handled by a more general-purpose text extraction algorithm that happens to produce the right answer for it, but it is more straightforward to handle it separately. Anyone reading the code could see it plainly, rather than having to think through the special case in more complicated parsing code.

Speed

Split resources in a fixed way if in doubt

At Labmeeting, we put the database index on a separate machine from the Solr search index. We had millions of search queries coming into the search system, and we didn't want those queries to slow down the db, and thus normal operation of the site. Writes take much longer than reads, and are more important for logged in users. External users of the site using the search engine just hit the index, performing exclusively reads on the index. This allowed us to scale up the search index independently from the database.

Use static analysis if you can

At Labmeeting, before every commit, I had a version of PyFlakes run on all of my new code. PyFlakes is a static analysis tool for Python that finds common errors that can be detected before run-time. For example, PyFlakes can find improper number of arguments to a function call and references to variable names that are not in scope (like typos). Static analysis finds a lot of bugs that might appear in production only rarely in edge cases. It is most useful in a language like Python that is dynamic and thus doesn't have a lot of the normal safety features available to a statically typed language.

Cache answers to expensive computations

Obviously, we did this all of the time at Labmeeting. For example, we performed a document similarity search to find "Related Papers" when we showed one individual paper, to recommend other papers a scientist may want to read. The vector computation and search for this is quite expensive, so we cached the results for a month. Another example: we had to open up a PDF file which has a research publication, perform text extraction, and then do an information extraction step from the text to analyze the title, authors, publication date and other information. This is a difficult problem and involves searching a gazette of 30 million documents and querying the PubMed database at least once. Once this process was completed for a PDF, we saved the result to the paper metadata so that we would not have to calculate it again. The flip side of caching is that for quickly changing data one needs to be careful about cache invalidation.
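A cache with an expiry time, like the month-long one for related papers, can be sketched in a few lines. This is an illustration with made-up names, not the Labmeeting implementation; a production site would more likely use something like memcached.

```python
import time


class TTLCache:
    """Cache results of expensive computations for a fixed time to live."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so that tests can fake time
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for `key`, calling `compute()` on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key early, for data that changed before the entry expired."""
        self._entries.pop(key, None)
```

Usage would look like `cache.get_or_compute(paper_id, lambda: find_related(paper_id))`, where `find_related` stands for whatever expensive function you want to avoid repeating.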

When in doubt, use brute force

We wanted to get the first version of Labmeeting finished very quickly. There are many ways to optimize a system to improve performance, but they come at the cost of decreasing modularity, making more assumptions, and, most directly costly, developer time. The first implementations of the pdf search algorithm used brute force linear search by pulling the name of every scientific paper and then searching each one for the substring. This takes a few minutes to write and does not require a complicated separate hosted index. Moreover, for the small number of papers used during testing, it ends up being much faster than doing a network query to a search index!
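For a sense of scale, the brute force version is roughly this small. The function name is mine, and the case-insensitive matching is an assumption.

```python
def search_titles(titles, query):
    """Linear scan: return every title that contains `query` as a substring."""
    needle = query.lower()
    return [title for title in titles if needle in title.lower()]
```

There is no index to build, host, or keep in sync, which is the whole appeal until the collection grows large enough for the scan to become the bottleneck.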

Compute in background when possible

After a user uploads a PDF to Labmeeting, a process must go through the PDF, analyze it and normalize it, perhaps convert it to a standard format, extract the metadata, and deduplicate it. This process can take a while, so we keep it from blocking the web server by pushing it to a queue. When the queued job completes, it sends a message back to the user, and the PDF is added to the person's collection.
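The pattern can be sketched with a worker thread and a queue from the standard library. This is only an illustration: a real deployment would use a separate worker process or a hosted job queue, and `process_pdf`, the paths, and the user ids here are stand-ins.

```python
import queue
import threading


def process_pdf(path):
    """Stand-in for the slow analysis step; returns fake metadata."""
    return {"path": path, "title": path.rsplit("/", 1)[-1]}


def worker(jobs, on_done):
    """Pull jobs off the queue forever; a None job means stop."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        user_id, path = job
        on_done(user_id, process_pdf(path))
        jobs.task_done()


jobs = queue.Queue()
collections = {}


def add_to_collection(user_id, metadata):
    collections.setdefault(user_id, []).append(metadata)


thread = threading.Thread(target=worker, args=(jobs, add_to_collection),
                          daemon=True)
thread.start()

# The request handler only enqueues the job and returns immediately.
jobs.put(("user-1", "/uploads/paper.pdf"))
jobs.join()  # only for this demo: wait until the worker has finished
```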

over 5 years ago on September 21 at 5:41 am by Joseph Perla in tech, hacks, entrepreneurship


Log Reader 3000

I wanted to show off another python script today.  I think it’s pretty cool.  It’s kind of like a very rudimentary version of something you might see in Iron Man.  And, of course, anything in Iron Man is cool.

I dub it Log Reader 3000.  Its purpose?  It helps me monitor logs.  How?  Well, sometimes I need to follow a log in real time as it is written, but I can quickly get bored.  The log scrolls by endlessly while, very often, little new information spits itself out.  I can quickly lose focus, or at the very least, damage my vision after staring at a screen intently for extended periods.

Ideally, I want the log to simply flow through me, and if my subconscious notices something odd, then I can act on it.  If the log is read aloud to me, then I can work on other tasks and let my auditory memory and auditory processing take note of oddities on which I need to act.

So, I made a python script to read the log out to me as it is written. It is my first Python 2.6 script.  I take advantage of the new multiprocessing module built into the standard library.  I also use the open-source festival text-to-speech tool.

First, install festival.  sudo apt-get install festival in Ubuntu.  You probably want to set it up to work with ALSA or ESD sound.  By default, festival uses /dev/dsp, which means that you can’t use festival and any other program that uses audio (like Skype) at the same time.   Fortunately, and as usual, Ubuntu provides detailed, simple instructions to set up festival with ALSA: https://help.ubuntu.com/community/TextToSpeech .

Finally, just find an appropriate use case.  Note that most log monitoring applications would not be improved with Log Reader 3000.  If you just want to be notified of errors, you should have a program email you when an error appears in a log.  If you want to understand the log output of a program that has already run, understand that Log Reader 3000 is meant for live-running programs.  Yes, Log Reader 3000 can be modified to read any text file line-by-line.  But, you will find that reading ends up being much faster than listening to a slow automated voice, so I recommend that you just try to skim a completed program’s output with VIM.

So then why ever use Log Reader 3000?  It is useful for applications which fit all of the following criteria:

  1. you want to monitor a live running program
  2. and the debugging information is nuanced and you need a human to interpret it (i.e. it cannot be filtered programmatically) and/or you want to be able to intervene while the program is running to keep it doing what it ought to be doing in real time.

Applications:

  • Say that you are spidering the web, and what the spider should and should not be spidering is not yet well-defined, but a human knows, then the Log Reader 3000 can read aloud where the spider is, and the human can correct course as he or she notices the spider going astray.
  • Or, say that you are working on some kind of artificial intelligence.  Perhaps, the AI program can reason aloud and a human can correct or redirect the machine’s reasoning as it goes along.  I have no idea how or why an AI would do that.
  • Maybe you want to protect against bot attacks, but your aggressor is particularly clever and seems to avoid looking like a bot in all of the obvious ways.  You can pipe the output of your log into Log Reader 3000 and notice new kinds of suspicious patterns live while reclining in your chair or surfing the web.
  • You run a securities trading program.  You have numerous checks and double-checks to ensure that everything works correctly.  Nevertheless, you need to have a human monitoring the system as a whole continuously anyway, so you have Log Reader 3000 read aloud total portfolio value, or live trades, or trading efficiency, or fast-moving securities, or all of the above.
  • The lobby of your startup has a TV screen with graphs of user growth and interaction on the site.  You want to increase the coolness factor by having a computer voice read aloud some of the searches or conversations happening on your site live.
  • You make a living by selling cool techy art projects which blend absurdity with electronics.  You read aloud live google searches, or live wikipedia edits, or inane YouTube comments out of what looks like a spinning vinyl record.  Passersby whisper of your genius.

Once you have the application, just tail -f the log, parse out the parts you want the log reader to read (you can use awk for that, for example, or maybe a simple python script), and pipe that into the Log Reader 3000.

tail -f output.log | awk '{ print $1 }' | ./log_reader_3000.py

How does Log Reader 3000 work?  The main process reads in one line at a time.  As it reads in each line from stdin, it sends it to the processing queue.  The child process reads the last item in the queue (it discards the items at the top of the queue because those are old and we need to catch up with the latest output line) and then calls a function to say() the line.  The say() function simply uses the subprocess module to call festival in a separate process and then blocks until it is done saying it aloud.

Because having a computer voice read aloud a sentence takes a while, the log probably outputs many more lines than can be read aloud.  That is why a multiprocess queue is needed, and that is why Log Reader 3000 only reads out the most recent line which has been output, which is why it is most useful for specific applications.

Here is the script, log_reader_3000.py:

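#!/usr/bin/env python2.6
import sys
import subprocess
from multiprocessing import Process, Queue


def say(line):
    """Read one line aloud with festival; blocks until festival finishes."""
    # Escape backslashes and double quotes so the Scheme string stays valid.
    text = line.strip().replace('\\', '\\\\').replace('"', '\\"')
    command = '(SayText "%s")' % text
    echo = subprocess.Popen(['echo', command], stdout=subprocess.PIPE)
    subprocess.call(['festival'], stdin=echo.stdout)
    echo.wait()


def listen_to_lines(queue):
    """Child process: say only the newest line, skipping any backlog."""
    line = 'I am Log Reader 3000.  The world is beautiful.'
    while line is not None:
        say(line)
        line = queue.get()  # block until a new line arrives
        while line is not None and not queue.empty():
            line = queue.get()  # older lines are stale; jump to the latest


if __name__ == '__main__':
    queue = Queue()
    p = Process(target=listen_to_lines, args=(queue,))
    p.start()
    while True:
        line = sys.stdin.readline()
        if not line:  # EOF: the log stream has closed
            break
        sys.stdout.write(line)
        queue.put(line)
    queue.put(None)  # tell the child process to stop
    p.join()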

over 8 years ago on November 21 at 4:33 am by Joseph Perla in art, hacks, technology


A Clean Python Shell Script

Guido van Rossum, the creator of Python, recently wrote a post on his blog about how Python makes great shell scripts, even (especially?) compared to shell scripts traditionally created in Bash and using purely shell commands.

Guido is absolutely correct.  Shell scripts birth themselves painfully from my fingertips.  Bash’s kludgy syntax irks my orderly sensibilities.  Typos frequent my unreadable scripts.  iffi?  Who invented this?

On the other hand, Python sticks to just a handful of language constructs, so the language does not force me to google how to create an else statement every time I need to do it.  Python just makes sense.

One of the comments asks if Guido can show him some “really beautiful” python shell scripts.  I don’t mean to brag, and by no means do I think that my scripts in particular invite the light of the heavens to shine upon them, but I think I follow PEP 8 fairly closely and I pay attention to the brevity and clarity of my language.  I find that I can quickly debug my scripts because of their transparency at run-time and their concise self-annotating source code.

So, below I show a script which I call merge_branch.py. Do

chmod +x merge_branch.py 

so that you can run it from the command line with a simple ./merge_branch.py.  I alias it in ~/.bashrc.

alias mb='../path/to/merge_branch.py'

The script simplifies what I would have to do manually in git many times a day.  You see, git exemplifies a great version control system for keeping track of source code.  I create branches instantly.  This encourages me to work on bug fixes and new features separate from the main code base.  As others make changes, I can easily integrate their changes by merging the changed branch into my branch.  Git’s merge algorithm embarrasses any other I have ever used.  Some of Subversion’s merges still haunt my nightmares to this day.

So, git rocks.  Unfortunately, conflicts sometimes do occur.  So, the proper procedure for merging a branch needs to be followed carefully.  People are bad at doing things carefully. That’s okay.  We should spend more of our time being creative, mistakes and all.  That is why we invented computers to do work for us.

For best results, merge the latest changes people made to master into your branch as often as possible.  Small changes incrementally will probably mean small merge fixes.  One big change will probably cause you major pains.  So, merge from master into your branch often.

If you use GitHub or another central repository with a number of other people, then you must do a number of things.  First, make sure that all the commits that you wanted to make are committed to your branch and that you didn’t leave any files out that you have not explicitly git-ignored.  This happens a lot in SVN and Git if you are not careful, and it is the greatest source of frustration for anyone who uses a system like this.  To human is to err, c’est la vie.  Then, git-checkout master.  Make sure that origin has the latest changes from your master.  Then git-pull the latest changes from origin (github) into master.  Then git-checkout your branch again.  Then git-merge master into your branch.  If there are any errors, fix them and commit, otherwise you are done.

Also, when you complete all the changes in your branch and all the tests pass, then you need to git-merge your branch into master and git-push it back up to origin (possibly github).  You need to follow the procedure above to ensure the latest master changes are included in your branch (preferably before you run the tests).  Then, you check out master and git-merge the branch into master (this will be clean since it should just be a fast forward because your branch already has the latest master).  git-push the changes to origin.  Finally, delete the branch that you just completed.

This tedium rotted my brain for weeks.  Finally, I resolved to write a script to solve the tedious parts, but bring possible errors to my attention if they occur.

Please let me describe to you a few features of the script.  First, I try to follow PEP 8 as much as I can.  I have read it at least times; you should too. Also, recite the Zen of Python every night before you go to bed.

Notice how I start the script with a shebang line which says /usr/bin/env python, the preferred way to start Python since it is most flexible.  For example, I can use Python 2.6, or my own local version of Python.

I use the logging module which is part of Python’s very large standard library.  Logging gives you so much for free.  For example, instead of commenting out print statements, just change the default logging level threshold.  Always use the logging module instead of using print for everything.  Always.  It’s as easy as import logging; logging.error('…').  Also, the logging.basicConfig(…) I use here is the same one I use everywhere.  Logging the time that a message appeared saves hours and hours when I debug long-running scripts.

Use the optparse module in every Python shell script you write (getopt is too weak and will end up being much, much less simple by the end of a non-trivial program).  Again, you get so much for free, like a -h help command.  The documentation for optparse explains how to do everything in detail.  Make sure you set the usage parameter.  Also, make sure you call parser.error() for input option errors instead of raising an exception yourself.

Write utility functions.  Use the power and simplicity of Python to your favor.  Here, I use call_command().  I use it throughout the script and it makes the code so clean and clear.

Finally, I like to put the main() function of scripts at the top.  That makes the most sense to me.  If I open a file, I want to instantly read what it does, not read what its utility functions do.  I put the utility functions below.  Of course, at the bottom, after everything else has loaded, I place the if __name__ == "__main__" check and then call main().   This way, I can import this as a module (in, for example, py.test or IPython) to test the utility functions without running the actual script.  (Warning:  Do not put anything except the call to main() at the bottom.  Otherwise, you may not realize under what circumstances you call main() and with what parameters.)

Here is the script:

#!/usr/bin/env python
import os
import re
import subprocess
import logging
import optparse

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

def main():
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-m", "--merge-master", dest="merge_master",
                    action="store_true",
                    default=False,
                    help="Merges the latest master into the current branch")
    parser.add_option("-B", "--merge-branch", dest="merge_branch",
                    action="store_true",
                    default=False,
                    help="Merge the current branch into master; forces -m")
    options, args = parser.parse_args()

    if not options.merge_master and not options.merge_branch:
        parser.error('Must choose one-- try -m or -B')

    # Merging branch requires latest merged master
    if options.merge_branch:
        options.merge_master = True

    if options.merge_master:
        output,_ = call_command('git status')
        match = re.search(r'# On branch ([^\s]*)', output)
        branch = None
        if match is None:
            raise Exception('Could not get status')
        elif match.group(1) == 'master':
            raise Exception('You must be in the branch that you want to merge, not master')
        else:
            branch = match.group(1)
            logging.info('In branch %s' % branch)

        if output.endswith('nothing to commit (working directory clean)\n'):
            logging.info('Directory clean in branch: %s' % branch)
        else:
            raise Exception('Directory not clean, must commit:\n%s' % output)

        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git pull')
        logging.info('Pulled latest changes from origin into master')
        logging.info('Ensuring master has the latest changes')
        output,_ = call_command('git pull')
        if 'up-to-date' not in output:
            raise Exception('Local copy was not up to date:\n%s' % output)
        else:
            logging.info('Local copy up to date')

        logging.info('Switching back to branch: %s' % branch)
        output,_ = call_command('git checkout %s' % branch)

        output,_ = call_command('git merge master')
        logging.info('Merged latest master changes into branch: %s' % branch)
        logging.info('Ensuring latest master changes in branch: %s' % branch)
        output,_ = call_command('git merge master')
        if 'up-to-date' not in output:
            raise Exception('Branch %s not up to date:\n%s' % (branch, output))
        else:
            logging.info('Branch %s up to date' % branch)

        logging.info('Successfully merged master into branch %s' % branch)

    if options.merge_branch:
        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git merge %s' % branch)
        logging.info('Merged into master latest branch changes: %s' % branch)

        output,_ = call_command('git branch -d %s' % branch)
        logging.info('Deleted safely branch: %s' % branch)

        call_command('git push')
        logging.info('Pushed master up to origin')
        logging.info('Ensuring that origin has latest master')
        stdout,stderr = call_command('git push')
        if stderr == 'Everything up-to-date\n':
            logging.info('Remote repository up to date: %s' % branch)
        else:
            raise Exception('Remote repository not up to date:\n%s' % stderr)

        logging.info('Successfully merged branch %s into master and pushed to origin' % branch)


def call_command(command):
    process = subprocess.Popen(command.split(' '),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    return process.communicate()


if __name__ == "__main__":
    main()
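
One fragile spot in the script above is that it matches the exact human-readable text of git status. Newer versions of git print "On branch master" without the leading "# " and say "working tree clean" instead of "working directory clean". Here is a small sketch of a parser that tolerates both wordings. The helper name parse_status is hypothetical, and the sketch targets Python 3.

```python
import re


def parse_status(output):
    """Return (branch, is_clean) from the text printed by `git status`."""
    # Accept both "# On branch x" (older git) and "On branch x" (newer git).
    match = re.search(r'^(?:# )?On branch (\S+)', output, re.MULTILINE)
    if match is None:
        raise ValueError('Could not find a branch name in git status output')

    # Accept "nothing to commit (working directory clean)" and
    # "nothing to commit, working tree clean".
    clean = re.search(
        r'nothing to commit[, (]+working (?:directory|tree) clean', output)
    return match.group(1), clean is not None
```

For a script that has to survive git upgrades, the machine-readable git status --porcelain output is probably a sturdier thing to parse than either wording; I have not rewritten the script around it here.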

over 8 years ago on November 17 at 11:00 pm by Joseph Perla in hacks, technology


Switch windows quickly

I want to avoid RSI.  So, I want to use the mouse as little as I possibly can while I focus on my keyboard.  Unfortunately, alt+tabbing between windows takes far too long and annoys me with how much I have to think to move between the windows.

I often have several windows open.  I want to switch between them very quickly.  So, I created a program using Python and Xlib that generates another program.

The program moves my mouse automatically to a specific location and then clicks automatically.

The program also creates a keyboard shortcut in Gnome.  So, I can type something like Alt+q which calls the program above, moves the mouse to a specific location, and then clicks.

Finally, the program also generates a program which, when called, opens up many windows at specific locations in a grid:

If I combine all of these programs, I can focus between windows very easily. I can move the mouse between 12 spots in a 4×3 grid using the keyboard letters Q,W,E,R, etc like below:

To move to the Top-Left window, I hit Alt+q, which moves my mouse to the top left and clicks, focusing the top-left window.  To move to the bottom right, I press Alt+v, then the mouse moves to the bottom right and clicks, focusing into the bottom right window.
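The key-to-position mapping is simple arithmetic. Here is a minimal sketch, not the code from the repository: it only computes where the mouse would go and leaves out the Xlib calls that move and click. It assumes the click lands in the center of each grid cell (the post does not say exactly where) and that the three key rows are QWER, ASDF, and ZXCV. The name grid_targets is hypothetical.

```python
KEY_ROWS = ["qwer", "asdf", "zxcv"]  # 3 rows of 4 keys: a 4x3 grid


def grid_targets(screen_width, screen_height, key_rows=KEY_ROWS):
    """Map each key to the (x, y) pixel at the center of its grid cell."""
    rows = len(key_rows)
    targets = {}
    for row, keys in enumerate(key_rows):
        cols = len(keys)
        for col, key in enumerate(keys):
            # Center of cell (col, row): half a cell in from its top-left corner.
            x = (2 * col + 1) * screen_width // (2 * cols)
            y = (2 * row + 1) * screen_height // (2 * rows)
            targets[key] = (x, y)
    return targets
```

On a 1600 by 1200 screen, grid_targets(1600, 1200)["q"] is (200, 200) in the top-left cell and grid_targets(1600, 1200)["v"] is (1400, 1000) in the bottom-right cell, matching the Alt+q and Alt+v examples.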

Basically, when I’m programming, I never have to reach for my mouse to switch between windows.  I can open up 12 files at once, and switch between them swiftly and deftly.  When you work on a large application, this becomes very useful.

I’ve open-sourced the program here: http://github.com/jperla/mouse-focus-shortcuts/tree/master .

over 8 years ago on September 18 at 8:19 am by Joseph Perla in hacks, technology


Heart Rate

In PE in middle and high school, I always remember taking my pulse and having a very high resting heart rate relative to others.  I forget the exact number, but it might have reached 80 beats per minute or higher.  I always thought I counted incorrectly or double-counted.

I’ve started running and walking almost daily recently.  I feel better and more energized, especially after a run.  I would like to know if my overall health improved since I started about 4-6 weeks ago.  The first days, I could barely go around the block.  I would come back into the house and gulp down a gallon of water in between deep panting heaves.  After a couple of weeks I could run/walk a mile or two without gasping for breath by the end.  Yesterday, I estimate I ran/walked 5 miles (I know that I walk 4 miles an hour) without a problem.

So, in terms of endurance, my fitness improved.  But what about my heart rate?  I checked yesterday and today.  It’s down to about 63 beats per minute.  I made a graph using a short Python script, pygooglechart, and the Google Chart API.  I will be plotting more data points every day:

Heart Rate

Lance Armstrong has a resting heart rate of 32 beats per minute!  On the other hand, some quick research online tells me that resting heart rate correlates poorly with fitness, and that recovery rate would be a better measure.  I’ll start charting my recovery rate once I figure out how to measure it easily.

over 8 years ago on July 24 at 7:09 pm by Joseph Perla in hacks, life, personal


Y Combinator Application Guide

Y Combinator, a kind of mini-venture capital firm, invests tens of thousands of dollars ($$$) into very early seed stage start-up companies run by smart technology hackers.  They wanted to fund me in Summer 2008.

I applied to Y Combinator two times.  The first time, when I applied with my friend Mason for the Summer 2007 round, I arrogantly presumed that Paul would lavish on us praise and beg us to fly to California to work with him.  I spent no more than an hour on the application.  We had no passion for the idea we presented.  Our projects list hinted at nothing particularly remarkable or unique.  Our analysis of the idea and our competitors delved only into the shallowest parts of a deep lagoon.

The second time, when I applied alone in Summer 2008, in an inspired moment I sat down in Starbucks for a solid few hours to work on the application.  I strived for excellence, not perfection.  A few months prior, I had briefly glimpsed the semi-successful application of Liz Jobson and Danielle Fong.  I recalled their deep detail and thoughtful writing, so I imitated that kind of deep analysis which shows off one’s mastery of logic and breadth of experience.

I wish I had known how to write a good application the first time.  So, taking my cue from Brian Lash’s recent question on Hacker News, I helped him out.  I write here a slightly expanded version to help out anybody else who wants Paul Graham & co. to fund his or her startup.

If I were to advise myself in 2007, I would recommend that I write briefly but write a lot.  This advice seems contradictory, but I mean it in a very specific way.  My first application, I kept brief.  I did not want to swamp YC with a tome of text. I saved many of my accomplishments for the interview.  Do not do this.  Write, write, and write some more.  Write everything interesting and unique about yourself.  If you have doubts about a statement you made about a competitor, qualify it.  Don’t vacillate, but at the same time don’t seem shallow, ignorant, and inexperienced.

Of course, once you’ve written all that, you have a very long application.  Now, take out filler words.  Compress ideas that take up two sentences when you can use just one. If you waste two words in a sentence, delete the whole sentence and write it again from scratch.  If you see a phrase that you think an investment banker might use on his resume, nuke it.  Achieve a high density.  In my experience, the YC crew truly pores over these applications to understand all of the meat of it.  They do not skim your application when it has rich content.  Cut, cut, and cut some more.

Now, step back and look at your application.  If you have very little real content left, then you may not be the best fit for Y Combinator this year.  That’s okay.  It’s good you know now.  Take this year off and work on some interesting, hard projects that nobody has done before.  Bounce your idea off of the smartest person you know.  Hell, micro-test the idea.  Then, repeat this process.

Step back, look at your tight list of accomplishments.  If it’s long, that’s great, since everyone loves to read something long but rich in content.  The length indicates strength.  In my limited experience, I think this is how I made my application successful.

Here’s some of my application below (I elided some less relevant parts). I was accepted for Summer 2008 but decided to pass this time for a variety of reasons.

----------------------------------------

What is your company going to make?

I’m open to anything. Here’s one idea:

--------------

Have you ever scanned a document before? How was that experience?

It was terrible for me, too. Everyone I have ever asked has agreed that it is physically painful. But, there is a solution, one based on understanding actual human needs. What is wrong with the scanners of today?:

* slow (takes time to heat up)

* slow (scanning at a high dpi takes a long time)

* complicated (please select the dpi, now select bla, now bal[sic]…)

* cumbersome (files generated at high dpi are huge, slow down system)

* cumbersome (OCR’ing a document is a whole other rigamarole)

What do people really need?  Simply a decent, readable scan of the document. This should be as easy as holding the paper up to face the monitor.

Imagine that.

I propose that I sell a device which is basically just a decent-resolution CCD chip with a special lens which connects to a computer (wired at first, but v2 wireless). Scanning a document is as simple as holding the camera up to a document and clicking. In my tests, scanning a whole textbook takes 5-10 minutes. This is a game-changer. I’ve worked with an IP lawyer to file the provisional patent on this and a few other aspects of the designs.

[BY THE WAY, IF ONE OF YOU WANTS TO HELP ME BUILD THIS, I'M ALL EARS. I'M AN AI HACKER NOT A HARDWARE HACKER. OH, BY THE WAY, I USED A DIFFERENT IDEA IN THE INTERVIEW ROUND, NOT THIS ONE SINCE I'M SKEPTICAL OF THE MARKET FOR THIS PRODUCT AT THIS POINT. NEVERTHELESS, IT'S VERY COOL. I WANT TO BUILD THIS FOR MYSELF!]

For each founder, please list: name, age, YC username, email address, personal url (if any), and current employer and title or school and major. List the main contact first. Separate founders with blank lines. Put an asterisk before the name of anyone not able to move to Boston June through August.

….. [Be sure to put your blog here. Don't have a blog? Make one. Blog about whatever is on your mind. Blog about your hacking.

To be honest, an Ivy League pedigree probably helped.  Also, my computer science degree (as opposed to Economics or Business one) probably encouraged YC's faith in me.]

Please tell us in one or two sentences about something impressive that each founder has built or achieved.

Looking at some things in ~/projects folder: ……..

[Here I mention a few of my projects, with links to open source code, web pages, anything I can publicly show. I didn't spend more than one or two sentences describing any one project, but I listed many of my most interesting projects and why I worked on them. YC likes to see you working on real problems, so I talked about problems I solved for myself and for others directly.

They want to see that you think creatively and that you actually finish things.

It goes without saying that you should list projects which uniquely describe you.  Many people probably build a toy language in a Programming Languages class.  Yes, it may have taken you a long time, and you may have learned a lot, but you do not necessarily stand out.  Few people write, or can write, a CAPTCHA solver to hack Digg.]

Please tell us about the time you, ljlolel, most successfully hacked some (non-computer) system to your advantage.

…… [I talked about my shotgun email to dozens of startups here in Silicon Valley which gave me the opportunity to meet a lot of cool entrepreneurs.  I'll probably blog about this at some point in the future.]

Please tell us about an interesting project, preferably outside of class or work, that two or more of you created together. Include urls if possible.

(see above) [I applied alone, so group projects inapplicable.]

How long have the founders known one another and how did you meet? Have any of the founders not met in person?

n/a [Again, I was a sole founder.]

What’s new about what you’re doing? What are people forced to do now because what you plan to make doesn’t exist yet?

(see above) Basically, nobody ever scans anything because it takes forever and doesn’t really do what you want (you just want a readable, small image and for the document to be searchable).

What do you understand about your business that other companies in it just don’t get?

Scanner manufacturers try to pack in the highest dpi they possibly can. They focus on resolution, when they should be focusing on the user experience. Speed is what they should optimize, but I see no scanner manufacturer doing that.

Who are your competitors, and who might become competitors? Who do you fear most?

HP, Xerox, etc, also ScanR, Qipit, Evernote …… [I go on to be brutally honest about the difficulty and vulnerability of my position as a hardware startup in a crowded field. Remember, you are writing for some very, very smart people. They want to see your analytical thinking skills here. They want to see you be realistic, not delusional.]

……. more questions, answer analytically deeply, answer honestly to the best of your ability ……

If you had any other ideas you considered applying with, feel free to list them. One may be something we’ve been waiting for.

…….. [I always think of new ideas and discuss them with friends. I chose 4 and listed them here. I crisply described each in no more than 2 brief sentences.]

over 8 years ago on July 20 at 10:37 pm by Joseph Perla in entrepreneurship, hacks, life, money, personal, technology, ycombinator


Make your own chimes

I just bought a Mac Mini.  I love it.  Apple spent a lot of time polishing OS X.

I configured OS X to create a chime, a 21st-century chime.  In the Date & Time settings, I selected it to tell me the time every hour.  At 6pm, a voice from the computer says, “It’s 6 o’clock.”  At 10pm when South Park comes on, it reminds me by saying “It’s 10 o’clock.”  The chime keeps me conscious of the time passing when I’m online.

I dual-boot Ubuntu on this Mac Mini.  Ubuntu, unfortunately, does not have this chiming feature. However, I set it up in minutes.  I installed festival, the free open-source text-to-speech synthesizer, as well as an American voice (I struggled to understand the British voice).

sudo apt-get install festival festvox-kallpc16k

Then, in crontab, I added one line:

0 * * * * echo "(SayText \"Its`date +\%l` oclock\")" | festival
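
If you would rather drive the chime from Python, the only real logic is turning the hour into the expression that Festival reads on its standard input. A tiny sketch, where chime_expression is a hypothetical helper name and the spelling "Its ... oclock" simply mirrors the crontab line:

```python
def chime_expression(hour24):
    """Build the Festival SayText expression for an hour in 24-hour time."""
    hour12 = hour24 % 12 or 12  # 0 and 12 both read as "12"
    return '(SayText "Its %d oclock")' % hour12
```

A cron-run script could then pipe that string to festival, for example with subprocess.run(['festival'], input=chime_expression(hour), text=True), in the same way the echo does.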

Now I easily keep track of the time.

over 8 years ago on April 17 at 1:12 am by Joseph Perla in hacks, technology


How to check email two times a day

Tim Ferriss popularized the idea that you should limit the amount of time you spend checking email every day.  He espouses a philosophy of life called the low-information diet.  By following these guidelines, you get more done and, more importantly, feel less stressed.

One of his suggestions about email spread across the blogosphere very quickly because of its simplicity and practicality.  He recommends that you check email only twice a day (or preferably less often) and strictly adhere to that rule.  I started following these guidelines a few days ago, but I easily relapse.  Nevertheless, I do a few things to try to stay on the wagon:

  • Add a message to all outgoing emails:
EXPERIMENT: I will be checking email 2 times a day at 1pm and 6pm pacific time.
If you need me earlier, then please contact me below.

And of course I put my contact information below.  With this signature, I do not worry about missing out on important and urgent information or replies.

  • Delete all links, shortcuts, and bookmarks to GMail
  • Set up a script to automatically open up GMail at 1pm and 6pm every day.  In Ubuntu, I write just one line in crontab:
0 13,18 * * *  export DISPLAY=:0 && firefox https://mail.google.com/

Linux makes hard things easy.

over 8 years ago on April 15 at 9:02 pm by Joseph Perla in hacks, life, technology


Stanford’s Entrepreneurial Thought Leaders Series (now on your iPhone!)

The Stanford Technology Ventures Program runs a well-developed incubator for tech businesses at Stanford University.

STVP offers some very cool resources free to the world. For example, I have been listening to their Entrepreneurial Thought Leaders audio podcast. The program brings in some of the greatest entrepreneurial forces in Silicon Valley today. Some of the speakers include

and so on.

While I was listening to Mike Maples and Ron Conway give their talks about angel investing, I had trouble following along and knowing who was saying what. I had subscribed to the talks through iTunes. When I visited the site, I noticed that they published videos as well, but in a Flash format, and not available as a podcast. I followed the videos much more easily.

So, in accordance with their fair use license, I decided to scrape the video metadata, download the video files, transcode them into an iPod-acceptable format, and republish them in a simple video podcast format. I now have scores of these talks in my iPhone and on my Mac Mini to view at my leisure.
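The republishing step is the least mysterious part: a video podcast is just an RSS 2.0 feed whose items carry an enclosure element pointing at the media file. Here is a rough sketch of that step using only the standard library. It is not the code I released; build_feed and the dictionary keys are hypothetical names for illustration, and the scraping and transcoding steps are left out.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, episodes):
    """Return an RSS 2.0 document (as a string) with one item per episode.

    `episodes` is a list of dicts with the hypothetical keys
    'title', 'url', 'size_bytes', and 'mime_type'.
    """
    rss = ET.Element('rss', version='2.0')
    channel = ET.SubElement(rss, 'channel')
    ET.SubElement(channel, 'title').text = title
    ET.SubElement(channel, 'link').text = link
    ET.SubElement(channel, 'description').text = title

    for episode in episodes:
        item = ET.SubElement(channel, 'item')
        ET.SubElement(item, 'title').text = episode['title']
        # The enclosure is what a podcatcher actually downloads.
        ET.SubElement(item, 'enclosure',
                      url=episode['url'],
                      length=str(episode['size_bytes']),
                      type=episode['mime_type'])
    return ET.tostring(rss, encoding='unicode')
```

Feeding it a list of transcoded files and serving the result at a stable URL should be enough for a podcatcher to subscribe, though iTunes also understands extra tags that this sketch omits.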

Now, you can download them to your iPod, too. Just subscribe through iTunes or any other podcatcher.

P.S. I’ve open-sourced the code (AGPLv3), in case you ever want to make a podcast yourself.

over 8 years ago on April 5 at 7:43 pm by Joseph Perla in entrepreneurship, hacks, technology


Learn 100 digits of pi at lightning speed

Learn 100 digits of pi at lightning speed.

In a previous post, I wrote about the Secret to Pi.  I wrote about the method I used to learn 100 digits of pi in under an hour and remember them days later without extra practice.

While memorizing the digits of pi using this method, I realized that I was spending most of my time trying to think up words that would translate to the digits.  I tried to think of the longest word I could.  Sometimes I would screw up and use a word that did not translate to the correct digits.  I spent 2/3rds of my time just thinking of good words, images, vivid pictures.  It was hard and slow.

So, I decided to make a computer program to find the words and optimize everything for me.  I did, and I’m releasing the code under the Affero GPL.  Of course, all the code is Python 2.5.  Please allow me to describe it to you.  With the words precomputed, I can learn pi as quickly as I can tell a story!

At the top above, I linked to a page which I generated automatically using these libraries I’m releasing.

There are a few libraries.  They all require NLTK.  NLTK is an excellently-designed, well-developed, actively-maintained open-source natural language parsing library.  It has many (nearly 1GB of) corpora.

First, generate_nouns.py is a script.  We need to automatically generate a good, long list of concrete nouns for you to have strong images and remember the story of pi visually. It uses the CMUDict Pronunciation Corpus which is in nltk.corpus.cmudict.  It also uses the wordnet corpus in nltk.wordnet.  The script does some intelligent processing to filter out archaic words, curse words, and abstract nouns.  Run generate_nouns.py at the command line to create a nouns.csv file, or just download my copy in the repo.  50-75% are very good, concrete, vivid nouns for this purpose.  If you can help me get a higher percentage/more good nouns, please tell me.

Second, there is soundmap.py.  Soundmap.py is a library (import soundmap) that you can use to convert a word or phrase into the corresponding digits.  To be perfectly flexible, it loads a file which describes how to match which sounds to which digits.  I provided the sounds.csv file which is the one I use.  I haven’t tried to figure out what would be the optimal configuration yet, but maybe you can :) .  This also uses the CMUDict Pronunciation corpus (of course).  Call soundmap.convert_to_digits(phrase) to have it return a string of digits.
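To show the idea without installing NLTK, here is a self-contained sketch of the sound-to-digit conversion. It is not soundmap.py itself. The table is the common "major system" key, which I am assuming matches the spirit of sounds.csv, and the two pronunciations are hand-written stand-ins for CMUDict lookups, written in the ARPAbet phoneme symbols that CMUDict uses. The name phrase_to_digits is hypothetical.

```python
# Consonant phonemes (ARPAbet) mapped to digits; an assumed major-system key.
SOUND_TO_DIGIT = {
    'S': '0', 'Z': '0',
    'T': '1', 'D': '1',
    'N': '2',
    'M': '3',
    'R': '4',
    'L': '5',
    'JH': '6', 'CH': '6', 'SH': '6', 'ZH': '6',
    'K': '7', 'G': '7',
    'F': '8', 'V': '8',
    'P': '9', 'B': '9',
}

# Hand-written stand-in for a pronunciation dictionary (illustrative entries).
PRONUNCIATIONS = {
    'treadwheel': ['T', 'R', 'EH1', 'D', 'W', 'IY2', 'L'],
    'banjo': ['B', 'AE1', 'N', 'JH', 'OW2'],
}


def phrase_to_digits(phrase, pronunciations=PRONUNCIATIONS):
    """Convert a phrase to digits, skipping vowels and unmapped sounds."""
    digits = []
    for word in phrase.lower().split():
        for phoneme in pronunciations[word]:
            digit = SOUND_TO_DIGIT.get(phoneme)
            if digit is not None:
                digits.append(digit)
    return ''.join(digits)
```

With these entries, phrase_to_digits('treadwheel banjo') gives '1415926'. Vowels carry a stress number such as EH1, so they never match the table, and W is simply left out of it.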

Finally, there is mapwords.py.  Mapwords.py is a library that takes in a string of digits (such as the digits in pi) and uses the nouns.csv list of nouns and soundmap.py to figure out the optimal sequence of words for people to remember that sequence of digits.  It also has a couple hundred digits of the famed constant inside the library: mapwords.pi.  Simply call mapwords.get_best_mapping(mapwords.pi) for it to return a list of words.
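For the curious, here is one way such a search could work, as a sketch rather than the actual mapwords.py. I am assuming "optimal" means "fewest words", which the description above does not spell out, and the function name best_mapping and its dictionary argument are hypothetical. It is a small dynamic program over prefixes of the digit string.

```python
def best_mapping(digits, word_digits):
    """Split `digits` into the fewest words, or return None if impossible.

    `word_digits` maps each candidate word to its digit string.
    """
    # Index words by their digits; keep the first word seen for each code.
    by_digits = {}
    for word, code in word_digits.items():
        if code:
            by_digits.setdefault(code, word)
    max_len = max((len(code) for code in by_digits), default=0)

    # best[i] holds the shortest word list that spells digits[:i].
    best = [None] * (len(digits) + 1)
    best[0] = []
    for end in range(1, len(digits) + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = by_digits.get(digits[start:end])
            if prefix is None or word is None:
                continue
            if best[end] is None or len(prefix) + 1 < len(best[end]):
                best[end] = prefix + [word]
    return best[len(digits)]
```

Given a toy dictionary such as {'treadwheel': '1415', 'banjo': '926', 'tire': '14', 'tail': '15', 'pen': '92', 'shoe': '6'}, best_mapping('1415926', ...) picks the two long words over the four short ones. Longer words mean fewer images to remember, which is why I spent so long hunting for them by hand.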

You can put all these together and quickly learn thousands of digits of pi.  Here’s a great page with many digits to throw into the program.

over 8 years ago on March 14 at 12:01 am by Joseph Perla in art, hacks, science, technology


The Secret to Pi

NEW: I created a website designed to teach you dozens of digits of pi in minutes using this secret method. If you want more digits, I also open-sourced the code.

Tomorrow, around the country, schools and universities will be celebrating the ratio pi (π).  Students and professors will eat blueberry pies, talk about math, and hold contests.  The pie-eating contests look like fun.


(photo by becw)

Another large part of the festivities is the pi recitation contest.  In these competitions, students face off against one another attempting to see who can recite the most consecutive digits of pi starting at 3.14.  A number of schools have them.  The Daily Princetonian, from my alma mater, reported on a contest a few years ago where one student recited a couple hundred digits of pi.  Harvard’s Crimson reported on a student last year who recited more than 1000 digits of pi in one sitting.  Even elementary school kids have these contests on March 14th.

Last year, FOX News interviewed some students about the significance of these contests.  “Bryan Owens, an MIT senior, says the ability to recite pi is a sort of bragging right, a coin of the realm.”  Math geeks wear the number of digits of pi they know as a symbol of pride. Some people become obsessed: “In 2004 Umile read the digits of pi into a tape recorder. He did it a thousand at a time and gave it a rhythm - some numbers high-toned, some low. He listened to the tape constantly. This went on for two years. A two-year trance.”

Two years.  I say, a waste.  I know the Secret to Pi. I have learned 100 digits of pi in under an hour with perfect recall a day later.  That includes overhead of learning the secret, so now I can learn at the rate of 100 digits of pi every 20-30 minutes.

The secret:

First, you must leverage human psychology.  Understand that you can only remember so many things.  Furthermore, human beings evolved an incredible capacity for remembering some pieces of information, but not others.  We have a weak capacity for numbers, strings of digits.  They are unnatural.  However, we have a strong capability for remembering images, and a large number of images.  You can close your eyes and see your childhood home, even imagine walking through it clearly.  In particular, we remember vivid, unique images.  You will remember and recognize the picture above with the pied woman’s face, because you do not see that every day.

You also have good short-term auditory memory.  That is why it might be easy for you to learn 10 digits of pi very quickly, since you just replay your auditory memory of the digits, but you forget them just a day (or an hour) later.  Stories and images, however, stick with you.  You can imagine most of Harry Potter’s long journey from beginning to end.  How do we take advantage of our natural gifts?

Simple.

  • Associate words and hard, concrete images with numbers.
    Take any word with a strong visual image. For each hard consonant sound in the word, associate a digit to that sound. Ignore the vowel sounds. Use the key that is on Tim Ferriss’s excellent blog post on the topic.
  • Associate the digits of pi with a list of words. [3.]1415 926 becomes TREADWHEEL BANJO.
  • Make a story with the words, in order. Make it wacky. The more offbeat, the easier to remember. Imagine the story vividly.
    (Or use the method of loci; just don’t run out of locations!).
  • Recite pi: simply remember the words in order, then translate each word into the corresponding digits.

So, instead of trying to remember chunks of numbers, and how chunks of numbers relate to other chunks of numbers, you just remember a story.  Imagine you are on a farm, pushing a TREADWHEEL very hard with your own hands (for some reason; remember, the zanier the story, the easier it is to recall, and the dull is forgettable).  You’re sweating from the work, but the sweet farm fresh air keeps you going.  Suddenly, somebody comes in playing a BANJO, playing your favorite song.  And so on.

Instead of imagining a story, you can use the method of loci. This method involves imagining vividly a location you already know very well, such as your home.  Imagine walking through, very clearly, and identify objects one by one.  For example, imagine your kitchen.  You start walking in, the first thing on your left is a CLOCK, then a CUTTING BOARD, then a REFRIGERATOR, then a MICROWAVE, etc.  Finally, associate each word in your list of words with each object in your kitchen, in order.  So imagine a TREADWHEEL CLOCK, then a BANJO where you can cut vegetables like a CUTTING BOARD.  Now, to recite pi, just imagine walking through your kitchen.  You will first see a CLOCK, but you will also immediately think of a TREADWHEEL.  Then you take a step and see your CUTTING BOARD, reminding you of BANJO.  You can use this method to remember any kind of list.

Finally, use the key to translate a word into the sequence of digits in pi.  TREADWHEEL is 1415, and BANJO is 926, so those are the numbers in 3.1415926… And just continue on that way.

Not only do you learn the digits more easily, but also you remember them for weeks without a refresher, and re-learning becomes an easy game of recalling the story just once for a few minutes.

Moreover, this method is overall less taxing on your brain.  You use up less of your mental space (although I think you have more than you will ever need).  To memorize 100 digits of pi, you need only remember 1 story, composed of 40 words, instead of 100 digits.

Finally, learning to recite pi this way becomes useful for remembering other numbers or lists of numbers.  If you ever have to remember a very long number, just break it up into a few words and remember those.

I want to demystify pi memorization.  Although people may treat it that way, it is no more a measure of real intelligence than the SAT or Stanford-Binet.  You can learn a few hundred digits of pi tomorrow morning, go to a pi recitation contest, and blow everyone out of the water.  When people say how smart you are, just tell them how easily you learned hundreds in a few hours, and how they can too.

over 8 years ago on March 13 at 11:25 pm by Joseph Perla in hacks, science


Stuff

Stuff sucks life out of you. Americans are brainwashed to consume more and more every day. Unfortunately, this consumerism is spreading around the world. It should stop. Although I’ve wanted to describe it here, I could not figure out a clear way to express the weightlessness I feel after having eliminated nearly all my stuff.

Someone has written a fairly good article describing these ideas for me:

http://www.getrichslowly.org/blog/2007/11/21/the-hidden-costs-of-stuff/

Paul Graham wrote about this very topic recently as well. He does not seem to follow the idea to its conclusion, nor is the writing particularly well suited for the non-technical, but it’s another opinion:

http://www.paulgraham.com/stuff.html

over 9 years ago on November 26 at 11:32 am by Joseph Perla in hacks, life


Tiles

When someone searches for your name on Google, you don’t know what they might find. They might find an article or two in which you were featured, or they might find a random post you made to an open source project. Or, they might find someone else and not know he is not you. The scattered information evaporates and confounds your online persona. You have a blog, but it might not point to your LinkedIn account, or your home-made balancing robot video.

What I need to be able to do is manage my online identity easily. Moreover, I don’t want my data to be hostage to a single company. But, that’s exactly the opposite goal of a company like Facebook or Microsoft. Ideally, companies want to trap your data entirely on their platforms so that you must use their services, even as they become inferior due to lack of competition.

Now, a knowledgeable guy like me could set up, for example, his own identity server and RDF Friend-Of-A-Friend (FOAF) page. However, most people cannot, of course. Moreover, such a technical, intellectual exercise would seem to serve no purpose. FOAF has not the following of Facebook.

I propose a service which I would value today, even if I were the only person using it. It may even improve as more people use it. Am I proposing another social network? No. Please, no. Facebook wouldn’t want you to link to your MySpace or LinkedIn accounts. No, Mark Zuckerberg wants to be MySpace and LinkedIn. He even wants to be Windows.

I propose an elegant identity aggregator. You constantly create yourself and reveal yourself online. You want to make sure that people can see all that you created and all that you are. Additionally, you want to authoritatively say which sites are about you, so the imposters can stay hidden. You want this to be easy. You want this to be elegant. You want it to be open. You don’t want to sign up for another social network.

I envision a beautiful, simple way to aggregate all of your online personas onto a single page which summarizes and links to all of the others. Onto a single page, a single tile. Your tile shows small thumbnails and links to your Flickr photos, knows who you are on Facebook, MySpace, and LinkedIn, points out and highlights the best articles about your accomplishments.

It happens simply: you go to OneTile and tell it about your blog. Then, it intelligently searches for more about you. It guesses that you might be this user on Digg, and that user on Yahoo Answers. Eventually, it picks up enough information to create a personalized, flashy tile for you. The best content about you stands out, but it’s all reachable. Now, when someone searches your name, your tile comes up. It’s impressive, concise, and accurate. Plus, all the data you have collated is easily downloaded through RDF, not locked into OneTile.

As more and more people start using the service, it gains notoriety. I imagine a sea of tiles for everyone online. This is what I imagine the OneTile homepage could be:
Tiles Homepage Mockup

It highlights the most fascinating tiles with the most interesting content. You can zoom in to have a closer look:

Tiles Zooming Mockup

Slick animation zooms through the sea of tiles as you browse through people’s public online lives. Finally, you can choose one tile to examine closely.

Tiles OneTile Mockup

When choosing a specific tile, one person’s tile, you can see their name and all the tiles that are theirs: their flickr pictures, their photo albums, their rss reader on the right, maybe some friends, links to social networks on the left, some research tiles in the middle.

Some people link to blogs when they mention their friends. I think OneTile would be more complete.

My friend Dan O’Shea sparked the idea for me. He described to me a similar vision, and I had this one. We might start building this soon. As I said, if I can make this nice, it would be useful for me myself. If I can help others, that’s just extra pasta.

over 9 years ago on November 21 at 12:48 am by Joseph Perla in art, entrepreneurship, hacks, technology


Telecommunications

Telecom companies (AT&T, Verizon, Telefonica, and so on) undermine capitalism. They always act either monopolistically or oligopolistically. If there were any real competition in this space, then our Internet capabilities would blaze at gigabit speeds, text messages would be free, our phones would have myriad more features, and calling anywhere in the world would cost nearly nothing.

Some trailblazers are pushing the limits of what they can do within the system, leveraging the Internet to provide the useful services that telecoms should have provided a decade ago.

For example, I now have a few phone numbers. My personal cell phone number I give out only to people I trust. But every call eventually gets rerouted to my cell phone. Right now, I am out of the country, so my US cell number would be dead under the telecom’s schemes. It isn’t.

If you call my cell phone, the call gets rerouted to my GrandCentral number which I use as my main business line. If I don’t know you, GrandCentral asks you to record your name once so that I can screen the call. In fact, with GrandCentral, I can send you to voicemail, and listen to you record your voicemail live. If you are leaving me an important message and I realize that I want to interrupt and talk to you directly, I just hit a button. Telecoms should have provided this feature ages ago. GrandCentral is free.

That’s not the end of it. I cannot answer a GrandCentral call directly; GrandCentral redirects calls to another number that you can answer. In my case, I reroute to a number I have in San Francisco. It’s my SkypeIn number which I bought with a discount through my Skype Pro annual membership. Skype Pro also comes with voicemail.

The SkypeIn number in San Francisco forwards to my Skype account, so I can answer your calls on my laptop anywhere in the world. The calls can last for hours, and they cost me nothing. Of course, I would prefer a free live video chat through Skype.

Moreover, if I am not at my computer, then Skype will forward your call to my foreign phone at cheap Skype Out rates. There’s the magic of the Internet. All of these features, all either free or very cheap. It lets me connect with you as I normally would, (nearly) anywhere in the world.
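
The chain of redirects above can be pictured as a small lookup table. This is only a sketch of the idea, not anything GrandCentral or Skype exposes: the endpoint labels and the `route_call` function are made up for illustration.

```python
# Each endpoint forwards to the next one in the chain (labels are hypothetical).
FORWARDING = {
    "us-cell": "grandcentral",
    "grandcentral": "skypein-sf",
    "skypein-sf": "skype-laptop",
    "skype-laptop": "foreign-phone",  # SkypeOut fallback when I am not at my computer
}


def route_call(start, can_answer, rules=FORWARDING):
    """Follow forwarding rules from `start` until some endpoint can answer.

    Returns (path, endpoint); endpoint is None if nobody picks up.
    """
    path = [start]
    current = start
    while current not in can_answer:
        next_hop = rules.get(current)
        if next_hop is None or next_hop in path:  # end of chain, or a loop
            return path, None
        path.append(next_hop)
        current = next_hop
    return path, current


# At my laptop: the call stops at Skype.
print(route_call("us-cell", {"skype-laptop"}))
# Away from the laptop: the call continues to the foreign phone.
print(route_call("us-cell", {"foreign-phone"}))
```

If nothing in the chain can answer, the function returns `None` for the endpoint, which is where voicemail would take over.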

The telecoms don’t provide these features because they can get away with it. There is no competition which can enter and provide better services. I’m lucky that GrandCentral and Skype exist at all.

over 9 years ago on November 18 at 11:05 am by Joseph Perla in hacks, technology


Memory

I see many people taking copious notes when studying. Sometimes they copy and recopy the notes in an effort to learn, to memorize. I read that studious individuals often write prodigious amounts of notes to help them remember.

UPDATE: I misunderstood Matthew Cornell’s recording methodology. It is more a method for organizing new ideas and questions that come up while reading a text. I’m a big fan of this method of keeping track of my creativity and curiosity.

These methods are counterproductive. Writing notes, rather than increasing comprehension and recall, degrades memory. The notes become a crutch. Because the notes exist, because the ideas exist in writing, your mind freely forgets them.

Look at phone numbers today. Many people do not know even their own cell phone numbers. They can look up their numbers easily if needed, so their minds see no need to remember. People used to know their phone numbers. The choice is not conscious, it just happens.

Gordon Bell, a researcher at Microsoft, records every second of his life. He can quickly search through any document or web site he has read digitally based on time and date. He can browse through continuous audio recordings and continual snapshots of his surroundings. When he tries to remember something, he immediately looks to his LifeBrowser. But, in knowing that they exist, the recordings are his mind’s crutch. He suspects that these recordings “might be slowly degrading his real, carbon-based brain’s ability to remember clearly.”

Furthermore, the barrage of notes that people write is rarely re-read. Instead, people remember the ideas for a short time, refer to the notes for a short time, and then discard the notes while simultaneously forgetting the ideas. Even if someone does keep all of his notes, then the stewing garbage heap of notes becomes too large to be useful. It becomes impossible to find and learn from such a large, unstructured set of data.

Some people today want to create better tools to search through these notes. I propose a better solution. Never take notes of things that you actually just want to know. That means never, for example, taking notes in class. Never take notes of a book that should be teaching you.

I stopped taking notes sophomore year of high school. It began as an experiment. I never had a good memory. I did much better in classes where a few simple concepts and reasoning were most important like math. I did poorly (not as well) in history and other detail-filled classes. I would write copious notes for hours on end, sometimes rewriting the notes to help practice the ideas. But I would quickly forget them. I’d leave my future self with little knowledge and a pile of notes too large to manage.

So I decided to stop wasting my hours taking notes. Just stop completely. I wanted to see if my mind would make up the difference. At first, I just forgot everything I read. But as I relied solely on my mind to remember the details and concepts in everything I read, and as teachers and tests pressured me to remember, I did. I remembered more and more of what I read. After a couple of years of grueling AP classes in history (I took all of them), I could read something and remember all the concepts effortlessly. I can listen to a lecture and remember every salient point. I can’t say that I have a photographic memory, recalling every tiny detail in order, but I did improve from a terrible memory to a decent one.

I think anyone (without an amnesic disorder of course) can benefit by not taking notes, no matter how poor a person claims their memory to be. The improvement simply requires a fundamental change in approach. I support fundamental shifts in thought or behavior to make true, major, order of magnitude improvements. The improvements are many: less effort, less time wasted, better retention, longer-lasting retention, fewer hand cramps, less paper to buy, less paper to store or throw out, etc.

Finally, there is a big distinction between not taking notes and not writing anything down at all. I advocate not taking notes, in the sense that you should not write down anything that you should know or understand in your mind. Anything relating to classes, anything in non-reference books, and so on are in this category. You want this knowledge available to you at any time for the rest of your life. Do not relegate the ideas to paper. What should be recorded and stored are things that shouldn’t really clutter your thoughts: most references, tedious equations (not ones fundamental to understanding), and errands to run some time this week.

I think the latter is very important to consider. You won’t remember any notes you write down easily, but you can use this limitation to your advantage. You don’t want the stress of an errand to clutter thoughts of more important things. David Allen, author of Getting Things Done, is a major advocate of making complete lists of responsibilities and action items. The purpose is to get them out of your head to free your mind. It works; I totally support this idea. Keep ideas of paper due-dates, project deadlines, and groceries out of your mind and on paper; keep ideas of Platonic ideals, search engine optimization, and fluid mechanics off paper–in your mind.

over 9 years ago on October 15 at 3:13 pm by Joseph Perla in hacks


Orders of Magnitude

Never go for the 1% or 2% or even 10% improvements. Although these improvements require time and effort to achieve and maintain, these small gains are tiny, often imperceptible, to a person.

No, I say waste no effort even considering any kind of cost-benefit exchange. Ignore these small fruit within your grasp. You will be picking berries all your life because it is easy, because you can see the nearby goal. Instead, spend a little more effort looking for and finding the watermelons. The larger goal might take more creativity to achieve and the end may not easily be in sight, but in the end you spend orders of magnitude less time and effort. Much more importantly, you employ your natural human faculties to create. You enjoy imagining and innovating much more than the tedious and repetitive.

Unfortunately, the world focuses on picking berries. Although we naturally love to reason and think and be creative, for some reason, we also love to accumulate the small improvements. I think the quick satisfaction and low chance of failure cause us to avoid a much more fruitful but less clear end. Established companies depend on squeezing out small improvements in efficiency, at the cost of quality and employee morale. These big companies grew large due to a several order of magnitude innovation, but, now large, the executives do not want to risk innovation. They prefer to squeeze out whatever profit they can from their established product until the company smothers itself under its own morass of details.

Even in your day-to-day tasks, you focus on small details which seem significant. They, however, pale in comparison to real efforts you can contribute to create orders of magnitude of improvement. Environmentalism is a great example. Some people go through great pains to recycle what is left after consuming something. But what if they don’t consume it in the first place? Would their life be materially more unhappy? Not at all by any scientific survey. Not at all by any experiment I have done. In fact, if you really sit down to think about it, the next thing you buy you almost definitely do not need and you probably don’t actually want. It just takes effort to go out and buy, to store, to maintain, to consume, and to dispose of. Sometimes it makes you feel guilty (e.g. many snack foods). What actually affects you most is who you are with and how you think about life, others, and yourself. Changing your thinking brings orders of magnitude of change. Buying more baubles does nothing.

People often hate their jobs because they pick berries. At a career fair the other day, I overheard many students asking the recruiters what kind of work they would be doing: would they be “creating new models” or something else? No matter what job you have, in whatever industry, in a large company you will be making small improvements to the existing process. That’s why they hire you and others. The guys at the top found the berries, and they are hiring you to pick them. It doesn’t matter if you are in the research department or if you are the manager of trading. If you are lucky, your job is minimally creative.

But that does not have to be the case. I think it’s possible to move away from this model. The company just needs to make a solid commitment to not make small improvements. All changes, all work, should not only be quantifiable, but easily qualitatively perceptible in value to the outside world. Demand that no efficiency improve things by less than an order of magnitude. Let’s define an order of magnitude as doubling, twice the original. So, ten times better would be a little more than 3 orders of magnitude (2^3 = 8). A thousand-fold improvement is about 10 OMs (2^10 = 1024). There are few 10 OM changes in reach but many 3 OM changes.
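
With an order of magnitude defined as a doubling, the number of OMs in any improvement is just its base-2 logarithm. A quick sketch of that arithmetic (the function name `orders_of_magnitude` is mine, chosen for illustration):

```python
import math


def orders_of_magnitude(factor: float) -> float:
    """Number of doublings needed to reach `factor` (one OM = one doubling)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.log2(factor)


for factor in (2, 10, 1000):
    print(f"{factor}x improvement = {orders_of_magnitude(factor):.2f} OM")
# 2x improvement = 1.00 OM
# 10x improvement = 3.32 OM
# 1000x improvement = 9.97 OM
```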

So, don’t figure out how to improve department communication by 10%. Figure out how to eliminate the 90% of the communications, documentation, and emails which are not only useless but also suffocating. A strong tenfold improvement, a little over 3 OM.

Usually, these major improvements are so large and fundamentally revolutionary, the improvements are barely measurable:

Don’t increase staff by 50% so you can have more people indexing the Internet by hand. Instead, build an automated search engine that intelligently uses links already online to figure out which web pages are the best. Immeasurable improvement that actually scales.

Don’t stick a slightly faster computer chip and more RAM into your video game system. Instead, design a completely different game controller which couples motion and your whole body tightly with the game. Qualitatively improve the way people experience and invest themselves in your games.

The orders of magnitude improvements are always there. You can always find them, as long as you do not occupy any of your time with the small ones.

over 9 years ago on October 7 at 1:40 pm by Joseph Perla in entrepreneurship, hacks, technology


Capturing frames from a webcam on Linux

Not many people are trying to capture images from their webcam using Python under Linux and blogging about it. In fact, I could find nobody who did that. I found people capturing images using Python under Windows, people capturing images using C under Linux, and finally some people capturing images with Python under Linux but not blogging about it. I wrote this instructional post to help people who want to start processing images from a webcam using the great Python language and a stable Linux operating system.

There is a very good library for capturing images in Windows called VideoCapture. It works, and a number of people blogged about using it. I was jealous for a long time.

There are a number of very old libraries which were meant to help with capturing images on Linux: libfg, two separate versions of pyv4l, and pyv4l2. But the first doesn’t work on my computer, the two versions of pyv4l cause segfaults because they are so old and not updated, and the last has no code written.

Finally, I learned that OpenCV has an interface to V4L/V4L2. OpenCV is Intel’s Open Source Computer Vision library. It’s excellent, extensive, and has a good community behind it. V4L is Linux’s standard abstraction for reading in video data. V4L2 is the newer version which Ubuntu also has installed.

Plus, OpenCV has very complete Python bindings. Unfortunately, these bindings and how to use them properly to capture images from a webcam are not documented. Only after careful searching on the sizable OpenCV mailing list did I finally find the answer.

Below is code that reads in up to 30 frames per second from a webcam while simultaneously displaying what it reads in. It’s very cool. It uses OpenCV’s camera acquisition abstraction, PIL, and pygame for speed in the looping. Note that with the images read into Python, you and I can now do arbitrary things with the image. We can flip it, track objects, draw markers, or do really anything.

This is example utility code. It is not a well structured program. Much of the code I use below is from techlists.org.


import pygame
import Image  # old-style PIL import
from pygame.locals import *
import sys

import opencv
# this is important for capturing/displaying images
from opencv import highgui

# open the first camera the system knows about (device 0)
camera = highgui.cvCreateCameraCapture(0)

def get_image():
    """Grab one frame from the camera and return it as a PIL image."""
    im = highgui.cvQueryFrame(camera)
    # Add the line below if you need it (Ubuntu 8.04+)
    #im = opencv.cvGetMat(im)
    # convert Ipl image to PIL image
    return opencv.adaptors.Ipl2PIL(im)

fps = 30.0
pygame.init()
window = pygame.display.set_mode((640,480))
pygame.display.set_caption("WebCam Demo")
screen = pygame.display.get_surface()

while True:
    # quit on window close or any key press
    events = pygame.event.get()
    for event in events:
        if event.type == QUIT or event.type == KEYDOWN:
            sys.exit(0)
    im = get_image()
    # hand the raw pixel data to pygame (newer Pillow releases call this tobytes())
    pg_img = pygame.image.frombuffer(im.tostring(), im.size, im.mode)
    screen.blit(pg_img, (0,0))
    pygame.display.flip()
    # wait roughly one frame interval before grabbing the next frame
    pygame.time.delay(int(1000 * 1.0/fps))
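
The listing above targets the old `opencv` bindings. Later OpenCV releases ship a `cv2` module instead, so here is a rough sketch of the same loop in that style. It assumes the `opencv-python` and `pygame` packages are installed and that device 0 is your webcam; the function names `frame_delay_ms` and `run_viewer` are mine, and I have not run this against the setup described in this post.

```python
import sys


def frame_delay_ms(fps: float) -> int:
    """Milliseconds to wait between frames for a target frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(1000.0 / fps)


def run_viewer(device: int = 0, fps: float = 30.0) -> None:
    """Show webcam frames in a window until a key is pressed or it is closed."""
    # Imported here so frame_delay_ms works without the third-party packages.
    import cv2
    import pygame

    camera = cv2.VideoCapture(device)
    ok, frame = camera.read()  # frame is a BGR numpy array
    if not ok:
        camera.release()
        sys.exit("could not read a frame from camera %d" % device)

    height, width = frame.shape[:2]
    pygame.init()
    screen = pygame.display.set_mode((width, height))
    pygame.display.set_caption("WebCam Demo")
    try:
        while True:
            for event in pygame.event.get():
                if event.type in (pygame.QUIT, pygame.KEYDOWN):
                    return
            if ok:
                # pygame expects RGB byte order, OpenCV delivers BGR
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                surface = pygame.image.frombuffer(rgb.tobytes(), (width, height), "RGB")
                screen.blit(surface, (0, 0))
                pygame.display.flip()
            pygame.time.delay(frame_delay_ms(fps))
            ok, frame = camera.read()
    finally:
        camera.release()
        pygame.quit()
```

Call `run_viewer()` to start it. The frame is an ordinary array between `camera.read()` and the blit, so that is the place to flip it, track objects, or draw markers.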

over 9 years ago on September 26 at 12:41 pm by Joseph Perla in hacks, technology


Amazon EC2

Amazon provides a great resource to anyone with its Elastic Compute Cloud (EC2). Right now, the service is in limited Beta, but it should grow and become more open soon.

The purpose of EC2 is to let Amazon create and host virtual computers which you can start and stop at any time. You pay by the hour, with no minimum cost. So, if you have a large computing project that needs lots of computers but only for a short time, then, instead of buying lots of computers for this one task, you just create a number of virtual computers on EC2.

Some companies use EC2 as normal web servers. The hard disks on the computers are virtual, so, if a virtual computer crashes, then you lose all the data stored on it. You will have to take serious backup precautions if you want to use EC2 for serving web pages. Then again, you should be doing so anyway. I think a really good application of EC2 is for serving a non-stateful service, such as a PDF converter.

You can use Amazon Simple Storage Service (S3) for backing-up some data, or just for doing logging.

To test the service, I am running one virtual computer at jperla.homeip.net. It is lightning fast. I can pull up an SSH terminal into it from anywhere. It runs the latest version of Ubuntu Linux, so I feel at home.

James Gardner provides a very thorough walk-through of EC2 using Boto.

over 9 years ago on September 24 at 12:13 am by Joseph Perla in hacks, technology


If You Approach Your Startup Like Building a Ferrari You Will FAIL

I disagree that if you try to perfect your product, you will necessarily succeed. In fact, I contend that you will necessarily fail.

Perfection is impossible. You will never achieve it. Your startup can get 90% of the way there with just a little effort. To get that last 10%, to be close enough to caress perfection, would take orders of magnitude more work. You never launch.

Take the example of designing your homepage. It is incredibly complex and dynamic, with dozens if not hundreds of individual components. Of course, it is also the most important page of your entire site. Now, consider just one tiny aspect of designing this page: where to align the text. You end up spending hours deciding how many pixels from the left side of the screen your text should start. Yes, there is a perfect distance. You can make focus groups and do scientifically-rigorous experiments, then ask a consultant to run all the data you collected through a quadratic optimizer. Or, you can ask the guy next to you if it’s ugly or not. Saves you money, saves you time, and you can actually get the valuable part to the end-user.

Better yet, at regular intervals, just run through a quick usability test with someone, anybody. Sit next to her, and ask her to find some information. If she hesitates, then you found an important problem. If she finds it quickly, then congratulations! Your site is ten times better than nine out of ten sites online. If she says that the orange on your homepage should be just a little more red, then you need a different tester. Color, spacing, width, etc. are not important. They are means to an end. Don’t spend time measuring and testing and perfecting the means. Spend your time excelling at the ends.

Ferrari is a great brand. But it also doesn’t make the kind of money that Google makes. They only make any amount of money because they are an old company, and because they advertise to people that they spend a lot of time “perfecting” their engines. I’d still bet Toyota engines last longer. Sure, Ferraris can accelerate faster and have a higher top speed, but a lot of Ferrari owners don’t race and instead just buy the cars for their cachet.

So, I can probably create a car company overnight which would have as much or more cachet, but without the wasted effort in trying to “perfect” design. All I have to do is take a Toyota, put a slightly different frame on it (Lotus?), and perhaps stud it with diamonds and layer solid gold on everything. Instantaneous excellence. On top of everything, I’d probably also make more money than Ferrari.

over 9 years ago on September 19 at 12:08 pm by Joseph Perla in entrepreneurship, hacks, technology


Word Frequency

Tim Ferriss recommends efficiency in learning a new language. He suggests that you be skeptical of the traditional language education plan. One of his methods for learning a new language is to focus on the most common words which you will encounter.

I started to search for lists of the most common words in some languages, but I could not find many. I decided to find out for myself. I downloaded almost two thousand news articles on Google News (all English) over a period of a couple of weeks. Of course, I can get a better view of word frequency if I take a larger sample over a longer, more varied period of time from sources other than online newspapers. Nevertheless, I think this sample is good enough to get the broad-stroke view of the written language. Just ignore only recently common words like “iraqi.” Also note that this is going to be biased by the lexicon of newspapers today. Speech and even written fiction have different lexicons which will yield different word distributions.
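
The counting itself is simple. The sketch below is not the script I used; it is a minimal version that assumes the articles are already downloaded as plain-text strings, and the names `WORD_RE` and `count_words` are made up for this example. It lowercases everything and keeps only words of at least four letters, as in the lists in this post.

```python
import re
from collections import Counter

# A run of letters, optionally joined by hyphens or apostrophes
# (so "e-mail", "don't", and "c’est" each count as one word).
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def count_words(texts, min_length=4):
    """Count lowercase words of at least `min_length` characters across texts."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Two made-up sentences standing in for downloaded articles.
articles = [
    "The president said the report was about health.",
    "People don't read the report; the president does.",
]
counts = count_words(articles)
for word, n in counts.most_common(5):
    print(f"{word}: {n}")
```

Running this over raw downloaded pages rather than clean article text would also count navigation links and footers, so stripping those first would give a cleaner list.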

Here is the list of a hundred or so most common English words with at least 4 letters, in order, along with the number of times I counted them across those articles. I put this hack together very quickly, so there are some minor editing mistakes which aren’t a big enough problem to go back and correct.

English:
about: 4975
hours: 3860
business: 3531
health: 3480
after: 3386
their: 3070
people: 3009
world: 2754
august: 2722
would: 2693
sports: 2603
search: 2599
google: 2548
other: 2407
which: 2191
times: 2042
first: 2035
report: 2009
contact: 1762
video: 1728
states: 1696
state: 1682
privacy: 1675
online: 1642
president: 1622
national: 1608
article: 1604
entertainment: 1602
insurance: 1585
could: 1540
services: 1535
e-mail: 1525
service: 1522
united: 1519
there: 1511
local: 1459
washington: 1450
terms: 1443
policy: 1427
print: 1421
company: 1407
»: 1398
years: 1359
press: 1347
estate: 1343
travel: 1338
government: 1334
mortgage: 1329
copyright: 1321
before: 1299
technology: 1275
related: 1269
today: 1266
email: 1262
rights: 1252
agoby: 1247
stories: 1240
politics: 1226
media: 1209
million: 1199
comments: 1194
south: 1172
wednesday: 1164
story: 1164
against: 1147
international: 1146
money: 1144
comment: 1131
three: 1126
military: 1114
house: 1100
canada: 1092
reuters: 1076
should: 1073
reports: 1068
while: 1057
still: 1053
subscribe: 1041
blogs: 1037
those: 1034
center: 1032
tuesday: 1030
political: 1027
loans: 1022
because: 1019
information: 1018
since: 1014
during: 1004
police: 999
american: 997
being: 983
percent: 976
group: 973
mobile: 967
special: 962
children: 961
version: 957
popular: 957
weather: 956
former: 952
security: 945
iraqi: 938
general: 937
music: 935
articles: 935
where: 929
officials: 925
according: 917
white: 911
through: 891
opinion: 888
content: 882
newsletters: 876
federal: 869
public: 865
posted: 863
found: 847
court: 846
killed: 845
daily: 844
games: 843
think: 841
going: 833
friday: 828
latest: 827
craig: 826
between: 826
education: 825
classifieds: 817
place: 809
another: 801
reserved: 794
under: 788
death: 771
second: 767
science: 766
guide: 752
support: 751
attorney: 751
statement: 750
space: 749
nation: 748
events: 742
finance: 738
troops: 732
share: 726
market: 722
without: 720
network: 719
markets: 708
chief: 707
photos: 702
companies: 702
thursday: 701
right: 700
america: 699
advertise: 695
global: 687
party: 683
credit: 680
edition: 678
marketing: 675
least: 674
these: 673
advertising: 671
sales: 669
don’t: 668
including: 665
country: 665
features: 664
michael: 663
headlines: 661
financial: 660
number: 655
released: 654
associated: 654
county: 652
india: 651

I also did this for the two other languages I know. The Spanish is taken from Spain and Peru’s Google News sites.

Spanish:
noticias: 7620
google: 4092
madrid: 3909
agosto: 3785
sobre: 3622
gobierno: 3608
entre: 3399
deportes: 3380
internacional: 3280
presidente: 3077
desde: 2755
portada: 2651
todos: 2631
servicios: 2527
nacional: 2499
contra: 2483
salud: 2418
tiempo: 2227
millones: 2104
publicidad: 2103
foros: 2103
otros: 2089
ayuda: 2082
hasta: 2057
derechos: 1991
espana: 1984
mundo: 1923
diario: 1888
mas: 1873
nueva: 1871
resultados: 1852
cultura: 1846
españa: 1837
partido: 1835
sociedad: 1819
parte: 1804
personas: 1795
estado: 1792
durante: 1729
todas: 1728
tiene: 1725
nuevo: 1721
prensa: 1714
enviar: 1689
barcelona: 1669
cuando: 1654
primera: 1594
segun: 1571
cerrar: 1567
grupo: 1566
digital: 1566
ministro: 1563
ciudad: 1469
pasado: 1463
primer: 1428
radio: 1414
especiales: 1407
sevilla: 1406
hoteles: 1404
horas: 1404
perú: 1391
internet: 1388
donde: 1384
menos: 1365
tambien: 1362
reservados: 1354
terremoto: 1352
centro: 1326
porque: 1316
puede: 1315
total: 1305
fecha: 1301
general: 1295
semana: 1292
economia: 1258
fotos: 1255
otras: 1244
cuatro: 1243
noticia: 1236
pregunta: 1217
quien: 1213
puerta: 1199
julio: 1193
politica: 1191
navegador: 1188
inicio: 1185
gente: 1182
texto: 1176
estados: 1175
javascript: 1173
antonio: 1173
imprimir: 1170
acerca: 1170
soporta: 1168
medio: 1160
momento: 1146
copyright: 1146
forma: 1142
votadas: 1139
acuerdo: 1132
equipo: 1125
martes: 1121
enviadas: 1117
aunque: 1097
hemeroteca: 1094
datos: 1090
actualidad: 1086
seguridad: 1079
unidos: 1076
leídas: 1066
antes: 1063
titulares: 1059
terra: 1057
francia: 1054
ahora: 1051
mientras: 1041
buscar: 1031
zapatero: 1023
economía: 1003
director: 1000
ciento: 978
mundial: 976
europa: 974
aviso: 974
mayor: 964
ordenar: 956
mejor: 956
muertos: 954
legal: 952
motor: 949
viernes: 935
popular: 931
fueron: 931
verano: 929
incendios: 927
vivienda: 925
principal: 922
empleo: 921
juegos: 915
euros: 906
despues: 906
archivo: 906
tecnologia: 905
futbol: 894
comunidad: 888
cualquier: 859
videos: 858
años: 849
ministerio: 848
politica: 845
clasificados: 840
blogs: 839
privacidad: 829
medios: 829
trabajo: 826
grupos: 825
jornada: 820
comentarios: 817
incendio: 816
economia: 815
mujer: 814
(reuters): 814
civil: 803
bolsa: 800
cinco: 799
servicio: 797
jueves: 794
pisco: 792
sitio: 787
registro: 784
alertas: 776
secciones: 775
perú: 771
expresa: 771
opinion: 770
hecho: 767
informe: 766
punto: 762
impresa: 762
final: 761
atentado: 761
horasel: 758
domingo: 749
frente: 747
hombre: 733
correo: 731
muerte: 729
meses: 726
latina: 722
mismo: 720
habia: 719
crisis: 714
segundo: 713
relacionados: 709
ademis: 709
hacer: 706
país: 705
tanto: 703
opinion: 703
tienda: 700
edicion: 699
búsqueda: 699
estos: 698
contacto: 694
barca: 693
empresas: 688
haber: 687
cursos: 687
segunda: 685
septiembre: 676
lugar: 673
futbol: 673
artículos: 673
interior: 672
enlaces: 672
mediapro: 670
tickets: 669
carlos: 667
ediciones: 664
sigue: 663
puntos: 662
europea: 657
queda: 656

French:
france: 4608
août: 3328
monde: 2825
sarkozy: 2463
cette: 2341
google: 2179
comme: 2145
votre: 1911
politique: 1909
c’est: 1887
commentaire: 1714
premier: 1679
depuis: 1652
paris: 1622
ministre: 1574
aout: 1555
aussi: 1535
figaro: 1511
contre: 1487
faire: 1476
avait: 1415
groupe: 1359
entre: 1349
avoir: 1332
selon: 1313
d’une: 1284
articles: 1281
propos: 1232
culture: 1229
trois: 1191
nicolas: 1155
sports: 1142
été: 1078
encore: 1077
nouveau: 1074
toutes: 1065
qu’il: 1055
mercredi: 1054
recherche: 1039
gouvernement: 1008
economie: 993
sport: 988
samedi: 987
mardi: 987
international: 987
toute: 967
blogs: 957
ligue: 942
presse: 940
moins: 930
droits: 930
archives: 927
services: 923
nouvelle: 921
football: 921
notre: 916
ecrit: 914
alors: 909
autres: 903
avant: 888
millions: 883
jours: 878
jeudi: 872
dimanche: 843
suite: 834
dernier: 826
toujours: 823
musique: 822
reagir: 818
article: 813
ligne: 812
passe: 811
apres: 805
immobilier: 800
après: 786
vendredi: 781
emploi: 774
journal: 770
internet: 770
accueil: 770
parti: 767
savoir: 764
l’article: 763
leurs: 734
temps: 718
infos: 715
quand: 711
gauche: 709
bourse: 707
liens: 705
conditions: 702
personnes: 697
reuters: 696
president: 696
images: 663
n’est: 658
resultats: 656
retour: 641
dossiers: 641
semaine: 628
crise: 627
président: 626
programmes: 624
service: 623
centre: 612
grand: 609
royal: 601
forums: 596
compte: 592
ainsi: 592
notamment: 591
droit: 591
quelques: 589
justice: 589
actualite: 588
societe: 586
place: 586
reste: 585
lundi: 578
sites: 577
europe: 577
s’est: 575
commentaires: 574
signaler: 567
point: 567
envoyer: 563
voyages: 559
nouvelles: 549
coupe: 548
francais: 545
vacances: 537
forum: 536
annonces: 536
plusieurs: 534
actualités: 531
résultats: 528
septembre: 523
match: 523
cours: 522
saison: 519
euros: 519
livres: 518
incendies: 516
prendre: 513
devant: 513
demande: 510
minutes: 509
pouvoir: 508
derniers: 502
contacts: 499
classement: 499
rugby: 493
français: 492
barre: 492
afrique: 490
copyright: 487
devrait: 484
être: 476
high-tech: 475
loisirs: 474
actualite: 472
nokia: 471
pourrait: 467
produits: 466
juillet: 466
trouve: 465
newsletters: 464
photo: 460
publicite: 458
photos: 457
entreprises: 455
question: 454
raymond: 453
santé: 451
autre: 449
moment: 445
contact: 445
people: 444
même: 444
matin: 442
magazine: 440
direct: 435
aurait: 434
partenaires: 430
otages: 428
hollande: 428
turquie: 427
dossier: 426
fusion: 422
index: 421
a-t-il: 421
seront: 417
rencontres: 417

over 9 years ago on September 1 at 4:02 pm by Joseph Perla in hacks


Howdy, my name is Joseph Perla. Former VP of Technology, founding team, Turntable.fm. Entrepreneur. Actor. Writer. Art historian. Economist. Investor. Comedian. Researcher. EMT. Philosophe

Twitter: @jperla
