All posts by Jack Schultz

Asking for help

Generally, and I’m pretty sure exclusively so far on this blog, I’ve written about technical topics. This time though, I’m going to switch it up and write about an issue that I’ve been dealing with at work: asking for help.

Being a relatively inexperienced developer working on a dinosaur of an app (still running Rails 2), I run across a bunch of different issues every day. Most of these issues could be solved by the more experienced devs in minutes. Turns out it helps to have years of experience working with the app in question. My problem is balancing the amount of headbanging I go through against how annoying it is for those guys to stop what they’re doing and help me.

Now there isn’t an exact answer to this question, but I have come up with one way to know if it is finally time to call in reinforcements when I can’t come up with the solution on my own (which is still the desirable outcome).

All I do is imagine that the person I’m going to ask is already standing next to me, and that I have to justify everything I tried and have an answer to each of their questions. Obviously this doesn’t always work. If the issue is truly something you don’t know, you can’t phantom-prepare for it, and you aren’t going to get it right. But going through a phantom question session makes sure you’ve covered all your bases, and it lets the other guy know that you’ve given it a good effort.

 

Favorite Vim Commands

My current vim syntax highlighting (maximum awesome) shows a few things that my coworkers’ editors don’t. Mainly, it shows tabs that others have typed (since maximum awesome expands my own tabs to spaces, those won’t show up) and trailing whitespace. I’m a huge fan of this since it makes sure that the code I’m writing doesn’t have unnecessary characters.

But working on old code that’s been written and rewritten by a bunch of different people who really had no interest in keeping to syntax standards, these show up as ugly splotches on what should be a clean vim window. Luckily, as is often the case, there are two vim commands that can turn a mess of a file into something a little better.

:retab

Retab takes all the tabs in the file and “redoes” them in whatever format is defined in your vimrc. If you’re using maximum awesome, that means the expandtab option is on, which, like I mentioned, expands a tab into the corresponding number of spaces (the tab width is set to 2 in maximum awesome). So :retab changes all the garbage tabs into spaces, 2 per tab stop.

:%s/\s\+$//

A little funky syntax, but this command (found by googling) finds and removes all the trailing whitespace.
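For anyone puzzled by the pattern: in Vim’s regex syntax, \s matches a space or tab, \+ means “one or more”, and $ anchors the match to the end of the line. The empty replacement then deletes whatever matched. Here is a rough Python equivalent of the same idea. It’s only a sketch to show what the pattern does, not what Vim runs internally, and the function name strip_trailing_whitespace is made up for illustration.

```python
import re


def strip_trailing_whitespace(text):
    # [ \t]+ is one or more spaces or tabs; with re.MULTILINE, $ matches at
    # the end of every line, so only whitespace at a line end is removed.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


messy = "def foo():  \n\treturn 1\t\n"
print(repr(strip_trailing_whitespace(messy)))
```

Leading tabs (the indentation) are left alone, which is why :retab is still needed as a separate step.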

With these two commands, and a little more formatting to make sure that the indentation is correct, a file that looks confusing because of poor syntax becomes something much more understandable. And it helps out everyone in the future who has to work on the file. Both outcomes make it worth the time and effort.

Deploying a node application to Nodejitsu

tl;dr – Nodejitsu + custom MongoHQ database are really good to get something out the door.

Running and dealing with your own server is, let’s be honest, a pain in the ass. I don’t want to be dealing with that initially (maybe ever really) when all I’m trying to do is test out an application. With all this in mind, I looked into using a PaaS to deploy my little application I’ve been working on.

Relatively arbitrarily, I ended up going with Nodejitsu (over the others listed here) because, why not? I needed to pick one and this seemed decent. I also like that it is node specific, unlike Heroku for example. That means they care only about node applications. And at least off the bat, I’m happy with that decision.

Following this guide, I was up and running in under 10 minutes, most of that time being server setup time on their end. I was able to pick a subdomain, and hit my landing page and see the app deployed on the web. Considering I had never used a PaaS before, I was more than happy with the result. Unfortunately, I hadn’t set up a database yet, so actually registering wasn’t an option.

Nodejitsu provides a database creation service from both their command line interface and a web interface. To start, I created a MongoHQ db from the web, and copied the uri into my configuration file. I’ll take a second to say that I was also surprised that the NODE_ENV variable was already set to production. So since I had my config file set up, I didn’t have to do anything special. Also, Nodejitsu has an interface where you’re able to set any environment variables you want from their website.

I tried connecting to their database using the credentials that Nodejitsu gave me, but the server seemed unresponsive. After trying a few different ways to connect, I ditched going through their db management, and went straight to MongoHQ itself.

Over there, I was able to create a database and a prod_test user along with a password. I was able to connect to that db right away with those credentials, so I put the new uri into my config file, redeployed, and 20 seconds later, I was interacting with a full database!

So with about 30 minutes of fussing around, I had a fully db backed node application running on the internet. This is way less time and complication than I was expecting considering I had tried to run my own server up until now.

Environments in a node.js application

One of the important issues in all web applications is having different environments for different uses. In particular, you’ll probably want to have different environments for development, testing, and production. I ran into this issue when I first started writing tests for an app I’m working on (sidenote: turns out testing is fun). At the time, the database connection was going to the development db, which isn’t what you want for contained testing. After searching around for a while, and after finding a bunch of different ways to handle the configuration, I settled on the following implementation.

// One settings object per environment.
var production = {
    database: {
        url: 'mongodb://localhost/worldpic'
    }
};
var development = {
    database: {
        url: 'mongodb://localhost/worldpic'
    }
};
var test = {
    database: {
        url: 'mongodb://localhost/worldpic_test'
    }
};

// Return the settings that match NODE_ENV, falling back to development.
module.exports = function () {
    switch (process.env.NODE_ENV) {
        case 'production':
            return production;
        case 'development':
            return development;
        case 'test':
            return test;
        default:
            return development;
    }
};

This file I called config.js and I put that in the config folder (so config/config.js). The other thing you could do, and something that I might do in the future, is split the settings for each environment into their own files in the config folder and require them in the config.js file.

With this done, you can get a config object by running the following in app.js (or server.js if that’s your style)

var Config = require('./config/config');
// Config is a plain function that returns the settings object, so no `new` is needed.
var config = Config();

From there, you can access the desired database url by running

config.database.url

The same syntax would be used for any environment variable that you might want, such as external api keys.

The last part is actually setting the NODE_ENV variable. This is done from the command line when you run node or mocha (for testing)

NODE_ENV=test mocha

for example, or

NODE_ENV=development npm start

You don’t actually need to specify the env if you’re using development since it defaults to that, but being specific is always good.

And that’s it! Structured configuration for node that’ll make your life way easier.

Rebooting the blog!

After a considerable amount of time off, I figured it was time to reboot the old blog. Unfortunately, in the time since, I ended up shutting down the server that was hosting it without saving my previous posts. At that time, I had probably accumulated around 20-30ish posts, a considerable loss. Luckily, I was able to search web.archive.org and found that it had done two archives of the old site! So all the posts below are copied and pasted from the archive and were able to retain most of their formatting.

Unfortunately, I know there are some posts that web archive missed, both some from the beginning of when I started writing and a few from the end. (I know I did one analysis about home court advantage in the NBA which was unfortunately deleted. Maybe someday I’ll re-write that, but I do remember it was about 3 points). Not much to do about the missing posts, but at least I could recover some.

Another note, there’s a good chance that some of the links on the older posts are no longer working. Streakflow and Yttogether are both offline as of now, but the code (as naive as it is) is still up at my GitHub account.

With that, welcome back, and hopefully I can provide some new insights for the future.

The Importance of sentences in articles

(Originally posted August 28, 2013)

I’ve always been interested in Natural Language Processing (NLP), so I wanted to try my hand at a simple article summarizer. The basic idea is that we want to boil down the article to only its most important sentences. Disregard the fluff, and return the ones with the most information. Sounds simple, but the determination wasn’t exactly obvious off the bat. Even trying to rank sentences on my own was tough. After a few hours of research, I came across this post which had a very clever way of determining the important sentences. The important sentences in the article should be those that share the most words with other sentences. To get an idea of this, we realize that an important sentence should have information, and supporting sentences should explain the parts of the main one.

To calculate this, we create a connected graph between sentences where each link is the number of words in common between the sentences, normalized by length. We represent this graph as a matrix and simply loop through the sentences and compare the words. This is the naive approach that the post’s author takes, but he also gives a few suggestions for improvement, such as stemming the words and removing stopwords. Stemming deals with removing pluralizations and other non-root endings from words. For example, stemming turns “roots” into “root”. Stopwords are just common words such as ‘and’ or ‘or’ which shouldn’t provide much information about the topic of the sentence. For these techniques, Python’s Natural Language Toolkit is fantastic and provides this out of the box.

After ranking all the sentences, the final step is to determine how to display the shortened article. The way the post’s author did this was by picking the best sentences from each paragraph. I wanted to be able to shorten the length arbitrarily, so I decided to, at least at the moment, display the most informative X sentences in the order they were written, where X is arbitrary.
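To make the ranking and selection steps concrete, here is a minimal sketch in plain Python. It is not the code from the post I found, and it isn’t exactly what I ran either: it skips stemming and stopword removal, uses a very naive sentence splitter, and assumes “normalized by length” means dividing the shared-word count by the average number of distinct words in the two sentences. All of the function names are just for illustration.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    # Lowercased set of word tokens (no stemming, no stopword removal).
    return set(re.findall(r"[a-z']+", sentence.lower()))


def overlap(a, b):
    # Shared words, normalized by the average size of the two word sets.
    if not a or not b:
        return 0.0
    return len(a & b) / ((len(a) + len(b)) / 2.0)


def summarize(text, x):
    sentences = split_sentences(text)
    bags = [words(s) for s in sentences]
    n = len(sentences)
    # matrix[i][j] is the link weight between sentence i and sentence j.
    matrix = [[overlap(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    # A sentence's score is the sum of its links to every other sentence.
    scores = [sum(row) for row in matrix]
    # Keep the x best-scoring sentences, then restore the written order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:x]
    return [sentences[i] for i in sorted(top)]


article = ("Cats chase mice. Mice fear cats. "
           "The weather is nice today. Cats and mice are animals.")
for sentence in summarize(article, 2):
    print(sentence)
```

In the toy article, the weather sentence shares no words with the others, so it scores zero and is the first one to get dropped.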

At the moment, there are still many improvements to be done. The algorithm does well for those “stock” articles with just information. Opinion pieces are a little tougher to boil down to just the main points. By modifying some of the pieces or the ranking algorithm, it should be able to perform well no matter what the content.

Edit:

After running the above article through the algorithm, I got the following 8 sentences:

Sent 1: The basic idea is that we want to boil down the article to only its most important sentences.
Sent 5: After a few hours of research, I came across this post which had a very clever way of determining the important sentences.
Sent 6: The important sentences in the article should be those who share the most words with other sentences.
Sent 7: To get an idea of this, we realize that an important sentences should have information, and supporting sentences should explain the parts of the main one.
Sent 8: To calculate this, we create a connected graph between sentences where each link is the number of words in common between the sentences, normalized by length.
Sent 9: We represent this graph as a matrix and simply loop through the sentences and compare the words.
Sent 15: After ranking all the sentences, the final step is to determine how to determine how to display the shortened article.
Sent 16: The way the post s author did this was by picking the best sentences from each paragraph.

Not bad, but could probably do a little better. We’ll see how it goes.

Angular.js Show and simplicity

(Originally posted August 22, 2013)

When mapping Streakflow to a mobile app, I decided to use Trigger.io and their forge toolchain. Considering that I’ve never done mobile development, that I wanted to deploy on iOS and Android out of the box, and that there have been many successful apps using Trigger.io, it felt like a great fit. Doing this meant that I needed to learn a JavaScript MVC framework. There were a few choices, but I’ve been hearing so much buzz about Angular.js that I went with it, and it turned out to be a great decision.

Among other niceties, Angular’s ng-show directive is particularly simple to work with. In the app, I’ve been using it as a makeshift if statement in the templates, though it turns out to work nicer. I’ll go through two different ways I’ve used ng-show in the app.

The first example is the simplest, and will show how easy ng-show is. As a side note, I didn’t think this method would work when I saw the example online.

<div ng-show="var_in_scope">
  <h1>Variable is true!</h1>
</div>

When this variable changes in your controller, either

//h1 will be visible
$scope.var_in_scope = true;
//h1 invisible
$scope.var_in_scope = false;

The div in the HTML will appear or disappear to match. That’s all there is to it. Note that the variable in the HTML does not need the curly brackets since it is inside an Angular directive.

The other way I’ve used ng-show is by calling a function. The syntax is exactly the same, but it just calls the function from the directive.

<div ng-show="var_in_scope()">
  <h1>Variable is true!</h1>
</div>

and the javascript this time is

$scope.var_in_scope = function() {
  // The comparison already yields true or false, so return it directly.
  return Math.random() > 0.5;
};

Since ng-show evaluates the expression and shows the HTML if it comes back “truthy”, this works as well. Very simple and clear to anyone reading the code.

Django and Celery

(Originally posted August 19, 2013)

While working on Streakflow, it quickly became evident that email reminders would be important. Not only because they would hopefully bring back users, but also because a little nagging is a good way to get people to finish goals. This email system should email users daily, at a time selected by them. This is exactly the type of thing that is perfect for Celery. In this case, we’re going to be using the scheduling portion of Celery. Jumping into something new and unknown may be overwhelming, but I’ll show you how simple it is to configure Celery for use in Django, and run it in production.

The first step is to install django-celery,

$ pip install django-celery

Then add djcelery to your installed apps,

INSTALLED_APPS += ("djcelery", )

And then migrate your database (assuming you’re using South, which you should):

$ python manage.py migrate djcelery

Finally, add the following two lines to your settings.py file.

import djcelery
djcelery.setup_loader()

That’s as far as the django-celery docs front page gets you, and it really is (almost) that simple. From here, we want to add functionality for scheduled tasks. We’ll deal with the code first, and the production setup at the end.

Go back to your settings.py file and add the following.

BROKER_URL = 'amqp://guest:guest@localhost:5672//'

CELERY_TIMEZONE = 'UTC'
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
    'check-goals-complete': {
        'task': 'streakflow.apps.members.tasks.reminder_emails',
        'schedule': crontab(minute='*/30'),
    },
}

BROKER_URL is what celery uses to connect to the broker (something like Redis or RabbitMQ). We’ll get to that later.

CELERY_TIMEZONE is used so celery can keep track of the time correctly. No reason here to not have it be utc.

We then import crontab for use in the schedule. Here we do two things. We give the scheduled task an arbitrary name, and then tell celery where it should look for the task and on what schedule it should run the code. The ‘task’ parameter is a dotted path, in the same vein as an import, and the crontab definition here is set up for every 30 minutes because some timezones are at half hour offsets. This code is very much straight from the docs.
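To see why the 30 minute cadence matters, here is a small sketch of the kind of check a task could make on each tick. This is not Streakflow’s actual code. It assumes, purely for illustration, that each user stores a UTC offset in minutes and a whole-hour reminder time, and the name is_reminder_due is made up.

```python
from datetime import datetime, timedelta


def is_reminder_due(now_utc, utc_offset_minutes, reminder_hour):
    # Convert the tick time (UTC) into the user's local wall-clock time.
    local = now_utc + timedelta(minutes=utc_offset_minutes)
    # Due only on the tick that lands on the top of the chosen local hour.
    return local.hour == reminder_hour and local.minute == 0


# A user at UTC+5:30 who wants a 9:00 reminder is due at 03:30 UTC.
print(is_reminder_due(datetime(2013, 8, 19, 3, 30), 330, 9))
# On the hour, their local time is 8:30, so nothing is due yet.
print(is_reminder_due(datetime(2013, 8, 19, 3, 0), 330, 9))
```

With an hourly schedule the 03:30 tick would never happen, so a user at a half hour offset would never hit the top of their local hour.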

In our tasks.py file where we define the task, we just need to make sure that we have the following pieces.

from celery import task

@task
def reminder_emails():
  ...
  ...
  send_email(.....)

The logic would replace the …s. As long as this code matches the path from settings, you’ll be fine.

At this point, that’s all the code/setup you need, so now we’re going to shift over to getting this thing to run in production. For me, I used the smallest instance at Digital Ocean. They’re very cheap, and also come with many tutorials to get you started, which is fantastic. Highly recommended. After firing up a default Ubuntu box and installing everything else, which isn’t the topic of this post, we want to install the queue.

$ sudo apt-get install rabbitmq-server

All we want to do with this is make sure that it is running.

$ sudo rabbitmqctl status

It should be running, but if it isn’t, you can always,

$ sudo service rabbitmq-server start

To monitor the services, I like to use supervisor. It’s intuitive and simple to hook up to all the Django pieces, like gunicorn. For celery, we’re going to use two supervisor programs. This is because we need to use celery beat for the scheduler, as well as have a celery worker to execute the code. The code for both will be almost identical. For each, we need two pieces of code. One is a bash script that contains the code that we want to run when we start the celery process. This looks like the following for the worker.

#!/bin/bash

source /path/to/env/bin/activate
cd /path/to/django/proj/
exec python manage.py celery worker --loglevel=INFO

This activates the virtualenv, changes directories to where manage.py is, and runs the celery command through manage.py like it says in the docs. The only difference for the beat is that we want to execute “celerybeat” instead of “celery worker”. Make sure to chmod u+x this script!

Finally, we want to create a supervisor conf file for each. The location for this should be in /etc/supervisor/conf.d/ along with your supervisor conf file for gunicorn if you’re using that too.

[program:celery_worker]
command=/path/to/bin/celery_worker_start
stdout_logfile=/path/to/logs/celery_worker.log
redirect_stderr=true

Again, the difference for celery beat is just changing the worker to beat and making sure they match up.

To load the celery things into supervisor,

$ sudo supervisorctl reread
$ sudo supervisorctl update

You can then check the status of both of them by running

$ sudo supervisorctl status

At this point you should see both of them running! If you check the logs, you can see that the celery beat program is waking up and checking how things are, and then sleeping until it is time to run the mailer. The celery worker just sits there until it gets a task. Anything that you print from the task will show up in the log. Also, you can change the log level from INFO to DEBUG if you want more information.

Unexpected outcomes

(Originally posted August 16, 2013)

Just a quick note on unexpected outcomes. The NFL’s collective bargaining agreement with the players’ union changed so that there are fewer offseason requirements for workouts. This was meant to give the players more rest and time off. Seems like a great idea, especially considering how violent the NFL is and how long lasting many of the injuries are.

Now obviously correlation is not causation, and I wasn’t able to find statistics, but right at the beginning of training camp there were, according to NFL reporters, an inordinate number of ACL injuries. The reporters, many of whom are former players, felt that part of this may have come from the players’ tendons not being warmed up enough to handle the stress of football since they had been away from it for longer than normal. In the second week, these injuries seem to have wound down, which fits the theory because the players have now had time to adjust.

Now there’s no real way to check the veracity of this since it is only one year and statistical outliers can occur at any time, but it is an interesting reminder to realize that good intentions can sometimes have unintended consequences.

Landing pages — Show don’t Tell

(Originally posted August 15, 2013)

Effective landing pages should perform two functions. The first, and obvious one, is to convince the visitor that the service is worthwhile enough to go through the process of signing up. The other function that it should perform is to show the user that the interface is simple and manageable. This is slightly less obvious, but just as important. No matter how interesting and awesome the service might be, if it is complicated, no one will want to use it.

I’m going to assume that you can all write the amazing copy filled with buzzwords and flashy text that convinces the user that what you made is important. As for showing that the interface is simple, this is a lot tougher.

The most common method is using images that show off the interface. Whether it’s a large image at the top of the page, or little images that coincide with specific pieces of text, images do a fine job of showing the app off. But static is boring, and you can do better.

Another method that is seen is a video that walks through the important points of the app. This is an improvement over static because it is less work for the user since all they have to do is sit and watch, and it allows you to curate exactly what you want the user to see. But getting a visitor to click on the video, especially with sound, is not easy.

A combination of the first two options is to use a series of images that show the usage of the app, but without the sound. This is becoming more popular recently as it still captures the visitor’s attention, and is already playing when they scroll to that location. With this, you can show off the flow of the app without being as intrusive as a video would be, and it doesn’t require the user to fire up the video player.

These are all good options, but they still only tell the visitor about the service. You want to show them.

When I was building Streakflow, I needed another way to convince new visitors that the service is worthwhile. At first, I went through all of the methods I mentioned above. But none of those could really capture the essence of how to use the site. I realized that instead of just trying to explain the app, I could let them do the discovery themselves. It seemed simple, especially considering that the main functionality was on one page. The result of that is the following demo.

The demo does two things. It shows the simplicity of the interface, and it acts as a slight teaser that makes the visitor want to sign up and use it for real. Plus, it has a game-like interface, which is always a plus for the user.

The entire thing is a state machine built in client side JavaScript. There’s no backend, and all actions are kept locally. And with that code, it brings the app to life, without the user having to do anything initially.

For one page apps it is pretty simple to envision a demo, but multipage or more complicated apps may have a harder time. Twitter could have a split screen where you have two users. You would be able to follow and unfollow and tweet fake things. If you’re following the other guy, then you’ll see their tweets in your feed. Unfollow and they won’t show up. It’s simple, shows the functionality, and would make you want to do it for real.

You could even get more complicated, but still have the coolness factor. Something like Foursquare could have a mini city where you move your stick avatar to different places and check in there. This shows the functionality, and would be kind of fun, and again, makes you want to do it for real.

When you don’t have traction, doing whatever you can to get the visitor over the hill and signed up is important. The standard way of doing this on the landing page is by using writing and images. Telling doesn’t do anything. Showing, by having an interactive aspect, can be very beneficial.