Ben McCann

Co-founder of Connectifier.
Investor at C3 Ventures.
Google and CMU alum.

SSL on localhost with nginx

11/14/2011

Install nginx if it’s not already installed:

sudo apt-get install nginx

nginx must be built with the SSL module. The nginx docs say it is not built by default, but the Ubuntu package includes it. You can verify by running nginx -V and looking for --with-http_ssl_module.
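That check can be scripted. The following is a minimal sketch: nginx -V writes its build flags to stderr, so the output has to be redirected before it can be searched. The helper name has_ssl_module is made up for this example.

```shell
# has_ssl_module (hypothetical helper) reads `nginx -V` output on stdin
# and succeeds only if the SSL module flag is present.
has_ssl_module() {
  grep -q -- '--with-http_ssl_module'
}

# nginx -V prints to stderr, hence the 2>&1. Skipped if nginx is absent.
if command -v nginx >/dev/null 2>&1; then
  if nginx -V 2>&1 | has_ssl_module; then
    echo "SSL module present"
  else
    echo "SSL module missing"
  fi
fi
```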

Next up is generating the SSL certs. Follow the Slicehost docs for this step.
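For local testing, a self-signed certificate is usually enough. The sketch below is one common way to make one with openssl and is not necessarily the procedure the Slicehost docs describe. The scratch directory CERT_DIR and the subject name are assumptions; the file names match the ones used in the nginx config further down.

```shell
# Generate a self-signed certificate and an unencrypted key in a
# scratch directory (CERT_DIR is an assumption for this example).
CERT_DIR="$(mktemp -d)"
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -subj "/CN=local.yourdomain.com" \
  -keyout "$CERT_DIR/myssl.key" \
  -out "$CERT_DIR/myssl.crt" 2>/dev/null

# Print the subject and validity dates of the new certificate.
openssl x509 -in "$CERT_DIR/myssl.crt" -noout -subject -dates

# To use the files with nginx, copy them into place:
#   sudo cp "$CERT_DIR/myssl.crt" /etc/ssl/certs/myssl.crt
#   sudo cp "$CERT_DIR/myssl.key" /etc/ssl/private/myssl.key
```

Browsers will warn about a self-signed certificate, which is expected on a development machine.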

Now you’ll need to update your /etc/nginx/nginx.conf file:

  upstream backend {
    server 127.0.0.1:9000;
  }

  server {
    server_name www.yourdomain.com yourdomain.com;
    rewrite ^(.*) https://www.yourdomain.com$1 permanent;
  }

  server {
    server_name local.yourdomain.com;
    rewrite ^(.*) https://local.yourdomain.com$1 permanent;
  }

  server {
    listen               443;
    ssl                  on;
    ssl_certificate      /etc/ssl/certs/myssl.crt;
    ssl_certificate_key  /etc/ssl/private/myssl.key;
    keepalive_timeout    70;
    server_name www.yourdomain.com local.yourdomain.com;
    location / {
      proxy_pass  http://backend;
    }
  }

Then reload nginx so it picks up the new configuration:

sudo nginx -s reload

Finally, in /etc/hosts put:

127.0.0.1   local.yourdomain.com

This will allow you to visit https://local.yourdomain.com/ which will be served up by the server that you have running on port 9000, the port named in the upstream block.

Embedded Tomcat

08/28/2011

Earlier in the year, I posted a quick writeup on how to run an embedded Jetty instance. Today, I’m posting basically the same code showing how to run an embedded Tomcat instance. The embedded Tomcat API is much nicer since it closely matches the web.xml syntax. However, the embedded Tomcat instance takes much longer to start up.

package com.benmccann.webtemplate.frontend.server;

import java.net.URL;

import org.apache.catalina.Context;
import org.apache.catalina.core.AprLifecycleListener;
import org.apache.catalina.core.StandardServer;
import org.apache.catalina.deploy.FilterDef;
import org.apache.catalina.deploy.FilterMap;
import org.apache.catalina.startup.Tomcat;
import org.apache.struts2.dispatcher.ng.filter.StrutsPrepareAndExecuteFilter;

import com.beust.jcommander.JCommander;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceFilter;

/**
 * @author Ben McCann (benmccann.com)
 */
public class WebServer {

  private final FrontendSettings webServerSettings;
  private final GuiceListener guiceListener;
  private final Tomcat tomcat;

  @Inject
  public WebServer(
      FrontendSettings webServerSettings,
      GuiceListener guiceListener) {
    this.webServerSettings = webServerSettings;
    this.guiceListener = guiceListener;
    this.tomcat = new Tomcat();
  }

  private FilterDef createFilterDef(String filterName, String filterClass) {
    FilterDef filterDef = new FilterDef();
    filterDef.setFilterName(filterName);
    filterDef.setFilterClass(filterClass);
    return filterDef;
  }
  
  private FilterMap createFilterMap(String filterName, String urlPattern) {
    FilterMap filterMap = new FilterMap();
    filterMap.setFilterName(filterName);
    filterMap.addURLPattern(urlPattern);
    return filterMap;
  }
  
  public void run() throws Exception {
    String appBase = ".";
    tomcat.setPort(webServerSettings.getPort());

    tomcat.setBaseDir("webapp");
    tomcat.getHost().setAppBase(appBase);

    String contextPath = "/";

    // Add AprLifecycleListener to give native speed boost
    // sudo apt-get install libtcnative-1
    StandardServer server = (StandardServer)tomcat.getServer();
    AprLifecycleListener listener = new AprLifecycleListener();
    server.addLifecycleListener(listener);

    Context context = tomcat.addWebapp(contextPath, appBase);
    context.addFilterDef(createFilterDef("guice", GuiceFilter.class.getName()));
    FilterDef struts2FilterDef = createFilterDef("struts2",
        StrutsPrepareAndExecuteFilter.class.getName());
    struts2FilterDef.addInitParameter("struts.devMode",
        Boolean.toString(webServerSettings.isDevModeEnabled()));
    context.addFilterDef(struts2FilterDef);
    context.addFilterMap(createFilterMap("guice", "/*"));
    context.addFilterMap(createFilterMap("struts2", "/*"));
    
    tomcat.start();
    tomcat.getServer().await();
  }

  public static void main(String[] args) throws Exception {
    FrontendSettings webServerSettings = new FrontendSettings();
    new JCommander(webServerSettings, args);
    
    // Build a single injector from FrontendModule so that the parsed
    // settings are the ones injected into WebServer.
    Injector injector = Guice.createInjector(
        new FrontendModule(webServerSettings));
    
    WebServer server = injector.getInstance(WebServer.class);
    server.run();
  }

}

Installing CUDA 5.0 and Theano on Ubuntu 12.04 Precise

07/09/2011

Theano is a very interesting Python library developed mainly for deep learning, which can run calculations on some NVIDIA GPUs by using the CUDA library.  Setting up Theano to use the GPU can be a little tricky and take a bit of work.

Install the pre-reqs

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Next, create a symlink to libglut, which will allow you to install the CUDA samples as described on Utkarsh Jaiswal’s blog

sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so.3 /usr/lib/libglut.so

Install CUDA

Download CUDA from the NVIDIA site and then install it:

sudo apt-get remove --purge 'nvidia*'
chmod +x cuda_5.0.35_linux_64_ubuntu11.10-1.run
sudo service lightdm stop
sudo ./cuda_5.0.35_linux_64_ubuntu11.10-1.run

Install Theano

Get the latest released version of Theano:

sudo apt-get install python-dev libopenblas-dev liblapack-dev gfortran
sudo pip install --upgrade Theano

Create a ~/.theanorc file to enable the GPU:

[global]
floatX = float32
device = gpu

Test it out

Now run the sample program under “Testing Theano with GPU” in the Theano tutorial. It will hopefully tell you that it used your GPU.

A good benchmark to test out the speed of your setup is to run /usr/local/lib/python2.7/dist-packages/theano/misc/check_blas.py

Credits

Thanks to the Theano developers for providing this awesome library and to Andrew Ng, Samy Bengio, and the other Googlers who have been taking their time to teach the rest of us more machine learning concepts.

Getting started with Git

04/11/2011

I’ve recently started using Git, which I’ve found I much prefer to Subversion for two reasons. The first is that it’s really fast since almost all commands are run locally. The second reason is that Subversion litters your source code with .svn directories and should you accidentally delete or move one then you’re in for a world of hurt. Git also handles ignored files in a much easier manner.

There are two downsides with Git. The first is that there’s no central server to store the code base. GitHub or BitBucket can fulfill this role if you don’t mind someone else hosting your source code. If you want to set up a central server yourself it seems the best solution is gitolite. The documentation isn’t for beginners, but I found a decent tutorial on setting up gitolite.

The other downside with git is that the commands can be a bit bizarre.

git aliases

You can set aliases using git config --global.  E.g. git config --global alias.dt "difftool --no-prompt" makes git dt act the same as git difftool --no-prompt. These aliases are saved in ~/.gitconfig. My ~/.gitconfig looks like:

[user]
	name = Ben McCann
	email = ben@benmccann.com
[alias]
	cam = commit -am
	dt = difftool --no-prompt
	dtm = !meld .
	pending = !clear && git status
	rev = checkout --
	revall = reset --hard HEAD
[push]
	default = current
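To try out aliases without touching your real ~/.gitconfig, you can point git config at a scratch file. This is only a sketch: the variable name GITCONFIG_DEMO is an assumption, and --file is used here in place of --global.

```shell
# Use a scratch config file (an assumption for this demo) instead of
# --global, so the real ~/.gitconfig is left alone.
GITCONFIG_DEMO="$(mktemp)"
git config --file "$GITCONFIG_DEMO" alias.dt "difftool --no-prompt"
git config --file "$GITCONFIG_DEMO" alias.cam "commit -am"

# Read one alias back, then list everything in the alias section.
git config --file "$GITCONFIG_DEMO" alias.dt
git config --file "$GITCONFIG_DEMO" --get-regexp '^alias\.'
```

Swapping --file "$GITCONFIG_DEMO" for --global writes the same entries to ~/.gitconfig.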

Reverting to a previous version

$ git reset --hard YOUR_CHANGESET_HERE
$ git reset --soft @{1}
$ git commit -a
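What the three commands do is easier to see in a throwaway repository. The sketch below assumes a scratch directory and made-up file contents; it ends with a new commit on top of the existing history whose content equals the old changeset.

```shell
# Work in a throwaway repository so nothing real is touched.
REPO="$(mktemp -d)"
git -C "$REPO" init -q
git -C "$REPO" config user.name "Example"
git -C "$REPO" config user.email "example@example.com"

echo "v1" > "$REPO/file.txt"
git -C "$REPO" add file.txt
git -C "$REPO" commit -q -m "first"
GOOD="$(git -C "$REPO" rev-parse HEAD)"   # the changeset to return to

echo "v2" > "$REPO/file.txt"
git -C "$REPO" commit -q -am "second"

# 1. Move HEAD, the index and the working tree back to the old changeset.
git -C "$REPO" reset -q --hard "$GOOD"
# 2. Move HEAD back to where the branch was before step 1 (reflog entry
#    @{1}); --soft leaves the index and working tree at the old content.
git -C "$REPO" reset -q --soft "@{1}"
# 3. Record the old content as a new commit on top of the history.
git -C "$REPO" commit -q -a -m "Revert to first"

cat "$REPO/file.txt"
git -C "$REPO" log --oneline
```

Nothing is rewritten, so this is safe on a branch that has already been pushed.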

Sed Cookbook

03/31/2011

The Linux sed command is a stream editor. What that means is basically that you can do a regex operation on each line of a file or a piped stream. You can also use perl like sed.

By default, sed does not use the extended regex syntax (GNU sed accepts -r or -E to turn it on). Sed regex reminders:

  • You need a backslash before parens in a regex grouping
  • You refer to matched regex groups using \1, \2, etc.
  • The + regex operator does not work unescaped. In GNU sed write \+, or spell it out as xx*
  • Non-greedy quantifiers don’t work. For example, .*? will not work
  • The output is printed to standard out by default. You need the -i option if you want to edit a file with sed.
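A few one-liners make these reminders concrete. This is a small sketch using the default (basic) syntax; the function names are hypothetical and exist only to label each expression.

```shell
# Hypothetical helper names; each wraps one sed expression.

# Escaped parens capture groups; \1 and \2 refer back to them.
swap_pair() { sed 's/\(.*\)=\(.*\)/\2=\1/'; }

# No bare + in basic syntax: "one or more a" is written aa*.
squash_as() { sed 's/aa*/X/'; }

# No non-greedy .*?: a negated class stops at the first comma instead.
drop_first_field() { sed 's/^[^,]*,//'; }

printf 'key=value\n' | swap_pair         # prints value=key
printf 'aaab\n'      | squash_as         # prints Xb
printf 'a,b,c\n'     | drop_first_field  # prints b,c
```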

Remove all but the first column in a .tsv stream

sed 's/\([^\t]*\).*/\1/'

Edit a .tsv file in place (file.tsv is a placeholder name) by removing all but the first column

sed -i 's/\([^\t]*\).*/\1/' file.tsv

Remove the first line of a stream

sed '1d'

Strip trailing whitespace from a file (file.txt is a placeholder name)

sed -i -e 's/[ \t]*$//' file.txt

Recursively replace tabs with spaces

grep -Plr '\t' src/ | xargs sed -i 's/\t/  /g'

Replace @inheritDoc with @override after marking for edit

grep -l -r @inheritDoc java/com/benmccann | xargs p4 edit
grep -l -r @inheritDoc java/com/benmccann | xargs sed -i 's/\(.*\)@inheritDoc/\1@override/g'

Replace @inheritDoc with @override in JS files after marking for edit

find java/com/benmccann -name '*.js' -print0 | xargs -0 grep -l @inheritDoc | xargs p4 edit
find java/com/benmccann -name '*.js' -print0 | xargs -0 grep -l @inheritDoc | xargs sed -i 's/\(.*\)@inheritDoc/\1@override/g'

Using the Guice Struts 2 plugin

03/29/2011

Guice 3.0 was released a few days ago!  One of the easiest ways to use it in your web server is to use Struts 2 with the Struts 2 plugin, which is available in the central Maven repository.

This tutorial assumes familiarity with Guice and Struts 2.

In order to use the plugin, your injector must be created with a Struts2GuicePluginModule:

Injector injector = Guice.createInjector(
    new com.google.inject.servlet.ServletModule(),
    new com.google.inject.struts2.Struts2GuicePluginModule(),
    new MyModule());

You must then define a GuiceServletContextListener to provide the injector to the Struts 2 plugin. I injected the Injector because I’m using embedded Jetty. However, if you’re using a standard servlet container, you’d probably just create the injector in the class itself.

package com.benmccann.example;

import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;

/**
 * @author benmccann.com
 */
public class GuiceListener extends GuiceServletContextListener {

  private final Injector injector;

  @Inject
  public GuiceListener(Injector injector) {
    this.injector = injector;
  }

  @Override
  public Injector getInjector() {
    return injector;
  }

}

You must then wire it up in your web.xml:

  <listener>
    <listener-class>com.benmccann.example.GuiceListener</listener-class>
  </listener>  

  <filter>
    <filter-name>guice</filter-name>
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
  </filter>

  <filter-mapping>
    <filter-name>guice</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

There’s also an example in the Guice source code repository.

Enjoy!

Latent Dirichlet Allocation with Mallet

03/10/2011

We recently had a PhD candidate from UCI come in to speak to the AI club at Google Irvine about her research on Latent Dirichlet Allocation (LDA). LDA is a topic model that groups words into topics, where each article is composed of a mixture of topics. I was interested to play around with this a bit, so I downloaded Mallet and wrote up some quick code to try making my own LDA model.

package com.benmccann.topicmodel;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceLowercase;
import cc.mallet.pipe.TokenSequenceRemoveStopwords;
import cc.mallet.pipe.iterator.ArrayIterator;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Alphabet;
import cc.mallet.types.IDSorter;
import cc.mallet.types.InstanceList;

import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;

public class Lda {

  @Inject private com.benmccann.topicmodel.TextProvider textProvider;

  InstanceList createInstanceList(List<String> texts) throws IOException {
    ArrayList<Pipe> pipes = new ArrayList<Pipe>();
    pipes.add(new CharSequence2TokenSequence());
    pipes.add(new TokenSequenceLowercase());
    pipes.add(new TokenSequenceRemoveStopwords());
    pipes.add(new TokenSequence2FeatureSequence());
    InstanceList instanceList = new InstanceList(new SerialPipes(pipes));
    instanceList.addThruPipe(new ArrayIterator(texts));
    return instanceList;
  }

  private ParallelTopicModel createNewModel() throws IOException {
    List<String> texts = textProvider.getTexts();
    InstanceList instanceList = createInstanceList(texts);
    int numTopics = instanceList.size() / 5;
    ParallelTopicModel model = new ParallelTopicModel(numTopics);
    model.addInstances(instanceList);
    model.estimate();
    return model;
  }

  ParallelTopicModel getOrCreateModel() throws Exception {
    return getOrCreateModel("model");
  }

  private ParallelTopicModel getOrCreateModel(String directoryPath)
      throws Exception {
    File directory = new File(directoryPath);
    if (!directory.exists()) {
      directory.mkdir();
    }
    File file = new File(directory, "mallet-lda.model");
    ParallelTopicModel model = null;
    if (!file.exists()) {
      model = createNewModel();
      model.write(file);
    } else {
      model = ParallelTopicModel.read(file);
    }
    return model;
  }

  public void printTopics() throws Exception {
    ParallelTopicModel model = getOrCreateModel();
    Alphabet alphabet = model.getAlphabet();
    for (TreeSet<IDSorter> set : model.getSortedWords()) {
      System.out.print("TOPIC: ");
      for (IDSorter s : set) {
        System.out.print(alphabet.lookupObject(s.getID()) + ", ");
      }
      System.out.println();
    }
  }

  public static void main(String[] args) throws Exception {
    Injector injector = Guice.createInjector();
    Lda lda = injector.getInstance(Lda.class);
    lda.printTopics();
  }

}

One of the things I found interesting was that you have to specify a number of topics. This is where the ‘art’ of machine learning comes in. With some training data this parameter could be tuned to perform better than my random guesses.

Remote Java debugging in Eclipse

03/08/2011

To debug a Java program being run on the command line from Eclipse you can start the Java program in remote debugging mode:

java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y -jar myProgram.jar

The program will wait for you to attach the Eclipse debugger to it. Open Eclipse and choose:

Run > Debug Configurations... > Remote Java Application > New

Make sure to enter the same port that you chose on the command line. The default is port 8000. Now hit “Debug” and you’re off!
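On Java 5 and later the same options can also be passed with the -agentlib:jdwp flag, which supersedes the -Xdebug -Xrunjdwp pair. Here is a small sketch that builds the flag for a given port; the helper name jdwp_opts is made up for this example.

```shell
# jdwp_opts (hypothetical helper) prints the JVM debug flag for the
# given port, defaulting to 8000 to match the command above.
jdwp_opts() {
  port="${1:-8000}"
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}"
}

# Show the flag; to launch the program with it you would run:
#   java "$(jdwp_opts 8000)" -jar myProgram.jar
jdwp_opts 8000
```

Setting suspend=n instead lets the program start right away and accept a debugger later.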

Security Lockdown for Linux

02/11/2011

Automatic updates

If you’re using Ubuntu you can configure automatic updates by editing /etc/apt/apt.conf.d/50unattended-upgrades. Running out-of-date packages with security holes is a good way to get your machine pwnd.

Remove unused software

Every piece of software installed on your system provides one more attack point for malicious users. You should inventory your system and remove anything you don’t need. E.g. to remove Ubuntu One from your system:

sudo apt-get purge 'ubuntuone*'

Secure SSH

Edit /etc/ssh/sshd_config:

PermitRootLogin no
AllowUsers bmccann nx gitolite

You may also disable password authentication and replace it with public key authentication:

PasswordAuthentication no
PubkeyAuthentication yes

Restart the SSH daemon:

sudo service ssh restart

or

sudo /etc/init.d/ssh restart

With these settings, password logins are rejected and only public/private key pairs are accepted. To set up key authentication, run ssh-keygen on the client and append the client’s ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the server.
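The key setup can be rehearsed locally before touching a real server. In this sketch the scratch directory KEY_DIR stands in for both machines, and the empty passphrase is an assumption made to keep the example non-interactive; on a real client you would normally let ssh-keygen write to ~/.ssh and choose a passphrase.

```shell
# Generate a key pair in a scratch directory (KEY_DIR is an assumption;
# -N "" sets an empty passphrase so the example runs unattended).
KEY_DIR="$(mktemp -d)"
ssh-keygen -q -t rsa -b 2048 -N "" -f "$KEY_DIR/id_rsa"

# On the server the public key is appended to ~/.ssh/authorized_keys.
# Here a local file plays that role.
AUTH_KEYS="$KEY_DIR/authorized_keys"
cat "$KEY_DIR/id_rsa.pub" >> "$AUTH_KEYS"
chmod 600 "$AUTH_KEYS"

# Show the fingerprint of the key that was just authorized.
ssh-keygen -l -f "$KEY_DIR/id_rsa.pub"
```

While password logins are still enabled, ssh-copy-id user@server performs the append step on the server for you.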

Sometimes while messing around with SSH settings, you’ll lock yourself out. In this case it’s nice to use the -v option with the ssh client to see where the connection fails.

You can also setup shortcuts in ~/.ssh/config. E.g. the shortcut below turns ssh gitolite into an alias for ssh -l gitolite -p 77777 bensdynamicdns.getmyip.com.

Host gitolite
   User gitolite
   Hostname bensdynamicdns.getmyip.com
   Port 77777
   IdentityFile ~/.ssh/id_rsa

Secure NX

If you’d like to setup NX in a secure manner, you can follow these instructions.

Secure MySQL

Run mysql_secure_installation

Install fail2ban

  • Install fail2ban, which locks out users who repeatedly try to access your system by guessing passwords: sudo apt-get install fail2ban
  • Make your own copy of the configuration file: sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
  • Check if fail2ban is running properly: sudo fail2ban-client status

More
Andrew Ault and CyberCiti wrote good articles as well.
The NSA has a comprehensive guide to securing a Linux system

Google GXP Struts 2 Plugin

02/02/2011

Google GXP is a replacement for JSP that provides compile-time type safety. This article is a quick introduction on how to use GXP with Struts 2.

1. Download the jar. It’s not in Maven yet because it’s still unreleased.

2. Install the jar in Maven or otherwise put it on your classpath. You’ll also need the Google GXP jar and the Google Collections jar:

  <dependency>
    <groupId>com.google.gxp</groupId>
    <artifactId>gxp-plugin</artifactId>
    <version>2.2.2-SNAPSHOT</version>
    <scope>system</scope>
    <systemPath>${basedir}/lib/struts2-gxp-plugin-2.2.2-SNAPSHOT.jar</systemPath>
  </dependency>
  <dependency>
    <groupId>com.google.gxp</groupId>
    <artifactId>google-gxp</artifactId>
    <version>0.2.4-beta</version>
  </dependency>
  <dependency>
    <groupId>com.google.collections</groupId>
    <artifactId>google-collections</artifactId>
    <version>1.0</version>
  </dependency>

3. Call the GXP compiler. E.g.

java -cp lib/gxp-0.2.4-beta.jar com.google.gxp.compiler.cli.Gxpc --output_language java com/benmccann/example/web/gxp/*.gxp

4. Add a result type of gxp to your struts.xml:

  <package name="test" extends="gxp-default">
    <action name="TestAction" class="com.benmccann.example.web.action.TestAction">
      <result type="gxp">com/benmccann/example/web/gxp/Index.gxp</result>
    </action>
  </package>