We often use asynchronous execution of writes to cassandra tables. These methods return as soon as the call to cassandra is made which makes testing them a pain. I was resorting to adding a half second sleep after the method call before verifying the result in an integration test, but that was very fragile and would fail occassionally in our CI envrionment.

Example Cassandra Method

1
2
3
4
5
6
7
8
9
void deleteSubscriptions(EventSource subscriberSource, String subscriberId) {
      log.debug("Deleting subscriptions by subscriber subscriberType: ${subscriberSource}, sourceId: ${subscriberId}")

      BoundStatement boundStatement = deleteBySubscriber.bind()
      boundStatement.setString("subscriberType", subscriberSource.name())
      boundStatement.setString("subscriberId", subscriberId)

      CQLSessionService.session.executeAsync(boundStatement)
}

Test Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
    when: "the events are deleted"
    cassandraEventSubscriptionService.deleteSubscriptions(
            eventSubscription.subscriberType,
            eventSubscription.subscriberId)

    and: "the subscriptions are retrieved by source"
    // this may be brittle because the delete is happening asynchronously
    sleep(500)

    Observable<EventSubscription> deletedBySource = cassandraEventSubscriptionService.findAllBySource(
            eventSubscription.sourceType,
            eventSubscription.sourceId,
            eventSubscription.data)

    and: "the subscriptions are retrieved by subscriber"
    Observable<EventSubscription> deletedBySubscriber = cassandraEventSubscriptionService.findAllBySubscriber(
            eventSubscription.subscriberType,
            eventSubscription.subscriberId)

    then: "the persisted subscription list has the original event subscription persisted"
    deletedBySubscriber.toList().toBlocking().first() == []
    deletedBySource.toList().toBlocking().first() == [eventSubscription2]
}

Solution

I wanted a way to retry test conditions without retrying the entire test. At first I thought spock async conditions might help, but I misunderstood how they worked. So I decided to write a retriable condition.

This allows you to retry specific then conditionals. Pass a closure to the retry method along with a number of retries to execute and optionally an amount of time to pause between executions. The closure must make explicit calls to assert. If all attempts and trying the conditions fail, the final assertion failure will bubble up as the test failure reason.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
public class RetriableCondition {
  private final ConcurrentLinkedQueue<Throwable> exceptions = new ConcurrentLinkedQueue<Throwable>();

  public RetriableCondition() {
  }

  /**
  * Evaluates the specified block, which is expected to contain
  * one or more explicit conditions (i.e. assert statements).
  * Any caught exception will be rethrown.</tt>.
  *
  * @param block the code block to evaluate
  */
  @ConditionBlock
  public void retry(int times, int pauseInMilliseconds, Runnable block)
          throws SpockAssertionError, InterruptedException {
      for(int i=0;i<times;i++) {
          try {
              block.run();
              return;
          } catch (Throwable t) {
              exceptions.add(t);
              sleep(pauseInMilliseconds);
          }
      }

      String msg = String.format("Retried block %d times and it failed every time,", times);
      throw new SpockAssertionError(msg, exceptions.poll());
  }

  /**
  * Evaluates the specified block, which is expected to contain
  * one or more explicit conditions (i.e. assert statements).
  * Any caught exception will be rethrown.</tt>.
  *
  * @param block the code block to evaluate
  */
  @ConditionBlock
  public void retry(int times, Runnable block) throws SpockAssertionError, InterruptedException {
      retry(times, 0, block);
  }
}

Updated Test Condition

1
2
3
4
5
6
7
8
9
10
11
12
13
14
 then: "the persisted subscription list has the original event subscription persisted"
        condition.retry(5) {
          Observable<EventSubscription> deletedBySource = cassandraEventSubscriptionService.findAllBySource(
                  eventSubscription.sourceType,
                  eventSubscription.sourceId,
                  eventSubscription.data)

          Observable<EventSubscription> deletedBySubscriber = cassandraEventSubscriptionService.findAllBySubscriber(
                  eventSubscription.subscriberType,
                  eventSubscription.subscriberId)

          assert deletedBySubscriber.toList().toBlocking().first() == []
          assert deletedBySource.toList().toBlocking().first() == [eventSubscription2]
        }

Example failure message

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| Running 1 integration test... 1 of 1
| Failure:  can save, find and delete cassandra subscriptions(physicalgraph.event.cassandra.CassandraEventSubscriptionServiceIntegrationSpec)
|  Retried block 5 times and it failed every time,
    at physicalgraph.RetriableCondition.retry(RetriableCondition.java:33)
    at physicalgraph.event.cassandra.CassandraEventSubscriptionServiceIntegrationSpec.can save, find and delete cassandra subscriptions(CassandraEventSubscriptionServiceIntegrationSpec.groovy:50)
Caused by: Condition not satisfied:

bySource.toList().toBlocking().first() != [eventSubscription, eventSubscription2]
|        |        |            |       |   |                  |
|        |        |            |       |   |                  physicalgraph.event.cassandra.EventSubscription(DEVICE, 123, DEVICE, 789, switch.on, methodToCall, true, 2015-06-26T15:27:29.606-05:00)
|        |        |            |       |   physicalgraph.event.cassandra.EventSubscription(DEVICE, 123, DEVICE, 456, switch.on, methodToCall, true, 2015-06-26T15:27:29.589-05:00)
|        |        |            |       false
|        |        |            [physicalgraph.event.cassandra.EventSubscription(DEVICE, 123, DEVICE, 456, switch.on, methodToCall, true, 2015-06-26T15:27:29.589-05:00), physicalgraph.event.cassandra.EventSubscription(DEVICE, 123, DEVICE, 789, switch.on, methodToCall, true, 2015-06-26T15:27:29.606-05:00)]
|        |        rx.observables.BlockingObservable@10089d0a
|        rx.Observable@54e33280
rx.Observable@5588048e

    at physicalgraph.event.cassandra.CassandraEventSubscriptionServiceIntegrationSpec.can save, find and delete cassandra subscriptions_closure1(CassandraEventSubscriptionServiceIntegrationSpec.groovy:59)
    at physicalgraph.RetriableCondition.retry(RetriableCondition.java:25)
    ... 1 more
| Completed 1 integration test, 1 failed in 0m 1s

This may or may not have been yak shaving, but I wanted to find out what the weather was like and I ended up getting groovy’s GVM and go’s GVM tools to work together.

Both GVMs default to being installed at ~/.gvm and define a bash function named gvm. There is an issue on github that recomends modifying the source of Go’s GVM but I wanted to be able to keep up to date with both tools without much (any) work.

Here’s how to do it:

  • I’m assuming you already have Groovy’s GVM installed locally already.
  • Download the installation script for Go locally. You’ll want to edit it to change the installation directory:
1
2
3
4
5
6
BRANCH=${1:-master}
GVM_DEST=${2:-$HOME}
GVM_NAME="govm"
SRC_REPO=${SRC_REPO:-https://github.com/moovweb/gvm.git}

[ "$GVM_DEST" = "$HOME" ] && GVM_NAME=".govm"
  • Run the installation script that you’ve saved locally.
  • Modify your ~/.bashrc to rename the gvm() function created by Go’s GVM before the gvm function is recreated by Groovy’s GVM.
1
2
3
4
5
6
7
8
9
10
11
copy_function() {
    declare -F $1 > /dev/null || return 1
    eval "$(echo "${2}()"; declare -f ${1} | tail -n +2)"
}

#THIS MUST BE AT THE END OF THE FILE FOR GVM TO WORK!!!
[[ -s "/home/vagrant/.govm/scripts/gvm" ]] && source "/home/vagrant/.govm/scripts/gvm"

copy_function gvm govm

[[ -s "/home/vagrant/.gvm/bin/gvm-init.sh" ]] && source "/home/vagrant/.gvm/bin/gvm-init.sh"

The copy_function() function came from stack overflow.

That’s it. Now you have gvm from groovy and govm for go.

Groovy

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
vagrant@vagrant-ubuntu-trusty-64:~$ gvm

Usage: gvm <command> <candidate> [version]
       gvm offline <enable|disable>

   commands:
       install   or i    <candidate> [version]
       uninstall or rm   <candidate> <version>
       list      or ls   <candidate>
       use       or u    <candidate> [version]
       default   or d    <candidate> [version]
       current   or c    [candidate]
       version   or v
       broadcast or b
       help      or h
       offline           <enable|disable>
       selfupdate        [force]
       flush             <candidates|broadcast|archives|temp>

   candidate  :  asciidoctorj, crash, gaiden, glide, gradle, grails, griffon, groovy, groovyserv, jbake, lazybones, springboot, vertx
   version    :  where optional, defaults to latest stable if not provided

eg: gvm install groovy

go

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
vagrant@vagrant-ubuntu-trusty-64:~$ govm
Usage: gvm [command]

Description:
  GVM is the Go Version Manager

Commands:
  version    - print the gvm version number
  get        - gets the latest code (for debugging)
  use        - select a go version to use
  diff       - view changes to Go root
  implode    - completely remove gvm
  install    - install go versions
  uninstall  - uninstall go versions
  cross      - install go cross compilers
  linkthis   - link this directory into GOPATH
  list       - list installed go versions
  listall    - list available versions
  alias      - manage go version aliases
  pkgset     - manage go packages sets
  pkgenv     - edit the environment for a package set

Background

A few weeks ago my friend Rob and I relaunched a small beer website we took over. We had rewritten the entire site in ratpack over the previous two months and were excited to show it off to the world. Alomst immediately upon launching our login and password reset pages were slowing to a crawl and timing out for most requests. After several hours of debugging we determined two main causes:

  • I had picked a woefully inadequate instance type for our application. (Because I’m cheap.)
  • We had compounded the problem by picking a log rounds value for password hashing that took a very long time in ec2. Particularly on the instance type I selected.

After moving to a sane ec2 instance type and decreasing the log rounds our performance issues have gone away.

This experience made me wonder if anyone had benchmarked this in ec2 before and without finding anything in a quick google search I decided to do it myself.

Methodology

JMH (Java Micro-benchmarking Harness) is a framework for writing micro-benchmarks in java and I’ve been wanting to try it for a while. JMH handles the tricky parts of benchmarking code on the JVM by handling things like JVM warmup and garbage collection. It provides annotations to generate the benchmarking code as well as the benchmark runner. The JMH documentation isn’t fantastic. I preferred this blog post, and this set of samples.

For this case I wanted to benchmark the amount of time it would take to hash a string using a salt generated by bcrypt over various number of log rounds. The higher the number of log rounds used, the more time it takes to hash the password and is more secure. However, the time to hash the password is going to be the lower bound on the amount of time it takes to authenticate a user. Pick a number that is too high and users could be waiting a very long time. (Particularly if your server has 1 cpu and a slow clock speed.) The method being benchmarked in this case is BCrypt.hashpw using various sizes of salts. The salt is generated by BCrypt.gensalt(logRounds) and log rounds varies from 6 to 12, increasing by 2 each time.

After I wrote the JMH benchmark, I needed an easy way to create ec2 instances, copy the jar file to it, execute the benchmark and capture the results. And I wanted to be able run this on several different ec2 instance types and compare the results. I didn’t find anything that could do this already so I wrote a small groovy script that leverages the Gramazon library to help out.

Results

MacBook Pro – i7 2.5gHz

Benchmark (logRounds) Mode Samples Score Error Units
b.b.LogRoundsBenchmark.hashPw 6 avgt 200 5.194 ± 0.015 ms/op
b.b.LogRoundsBenchmark.hashPw 8 avgt 200 20.570 ± 0.075 ms/op
b.b.LogRoundsBenchmark.hashPw 10 avgt 200 82.559 ± 0.319 ms/op
b.b.LogRoundsBenchmark.hashPw 12 avgt 200 330.919 ± 1.509 ms/op

m3.large

Benchmark (logRounds) Mode Samples Score Error Units
b.b.LogRoundsBenchmark.hashPw 6 avgt 200 6.219 ± 0.002 ms/op
b.b.LogRoundsBenchmark.hashPw 8 avgt 200 24.471 ± 0.017 ms/op
b.b.LogRoundsBenchmark.hashPw 10 avgt 200 97.416 ± 0.050 ms/op
b.b.LogRoundsBenchmark.hashPw 12 avgt 200 388.989 ± 0.225 ms/op

c3.large

Benchmark (logRounds) Mode Samples Score Error Units
b.b.LogRoundsBenchmark.hashPw 6 avgt 200 5.811 ± 0.002 ms/op
b.b.LogRoundsBenchmark.hashPw 8 avgt 200 22.859 ± 0.015 ms/op
b.b.LogRoundsBenchmark.hashPw 10 avgt 200 90.972 ± 0.031 ms/op
b.b.LogRoundsBenchmark.hashPw 12 avgt 200 363.898 ± 0.653 ms/op

Although we didn’t have these numbers before we launched, it would have been really useful. We sarted on an m3.medium with a log rounds value of 16. I didn’t even include it in this test but it took multiple seconds. And there was only a single vCPU so our server was quickly overwhelmed when we sent out the email saying a big new release had been launched.

Having these numbers allows us to make an informed choice about the tradeoffs between security and performance.

Code

All of the code can be found on github of course. First the JMH test itself:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
package benchmarks.bcrypt;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import org.mindrot.jbcrypt.BCrypt;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
public class LogRoundsBenchmark {

  @Param({"6", "8", "10", "12"})
  public int logRounds;

  int i;
  
  @Benchmark
  public void hashPw() {
      BCrypt.hashpw("password", BCrypt.gensalt(logRounds));
  }
}

This is all the code you have to write for a simple JMH test. Additionally I used gradle to build it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
plugins {
  id "java"
  id "me.champeau.gradle.jmh" version "0.1.3"
}

sourceCompatibility = 1.7

repositories {
    mavenCentral()
}

dependencies {
  jmh 'org.mindrot:jbcrypt:0.3m'
}

The other piece required was a script to execute the benchmark on an ec 2instance. A lot could be done to make this script much more generic but it fit the bill for what I needed. To try this on your own you would need an amazon access and secret key, as well as the pem file generated for the security group used to create the ec2 instance.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
@GrabResolver(name='oss', root='https://oss.sonatype.org/content/repositories/snapshots/')
@Grab('com.aestasit.infrastructure.aws:gramazon:0.3.6-SNAPSHOT')
@Grab(group='com.jcraft', module='jsch', version='0.1.51')

import com.aestasit.infrastructure.aws.*
import com.aestasit.infrastructure.aws.model.Instance
import com.jcraft.jsch.*

def ec2 = new EC2Client('us-east-1')
System.setProperty("aws.accessKeyId", 'ACCESSKEY')
System.setProperty("aws.secretKey", 'SECRETKEY')

println "Starting ec2 instance"

Instance instance = ec2.startInstance(
  'kyle-key-pair',
  'ami-52a3153a',
  'default',
  'm3.xlarge',
  true)

println instance.host
println instance.instanceId

sshToEc2Instance(instance) { Session session ->
  installJava(session)
  scpJarToConnection(session)
  runBenchmark(session)
}

println "Stopping ec2 instance"

ec2.terminateInstance(instance.instanceId)

println "ALL DONE"

Lets take the script in pieces. First is the script that uses Gramazon and Jsch for the heavy lifting. All we’re left with is starting the instance, then sshing to the instance and running several commands. Finally we stop the instance which is important if you don’t want to continue to pay for it once you’re finished. Each time I wanted to try a new instance type I would just change the m3.large string to the next value.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void sshToEc2Instance(Instance instance, Closure closure) {
  try {
      JSch jsch=new JSch()
      jsch.addIdentity("/Users/kboon/Documents/workspace/bcrypt-jmh-benchmark/kyle-key-pair.pem")
      jsch.setConfig("StrictHostKeyChecking", "no")

      Session session=jsch.getSession("ubuntu", instance.host, 22)
      session.connect()
      closure(session)
      session.disconnect()
  } catch (all) {
      println all.message
      all.printStackTrace()
  }
}

This SSHes to the instance and then executes the provided closure, passing the ssh session to it.

1
2
3
4
5
6
7
8
9
void scpJarToConnection(Session session) {
  Channel channel = session.openChannel("sftp")
    channel.connect()
    channelSftp = (ChannelSftp)channel
    File fileToTransfer = new File('/Users/kboon/Documents/workspace/bcrypt-jmh-benchmark/build/libs/bcrypt-jmh-benchmark-jmh.jar')
    channelSftp.put(new FileInputStream(fileToTransfer), fileToTransfer.getName())
    channel.disconnect()
    println "Done scping file."
}

This copies the Jar file built with gradle. This should probably be parameterized so the jar file isn’t hard coded.

1
2
3
4
5
6
void installJava(Session session) {
  //run stuff
  String command = "sudo apt-get update;sudo apt-get -y install default-jre"
  executeCommand(command, session)  
    println "Done installing java."
}

The AMI I selected doesn’t have java installed so this takes care of that. I could select a different AMI or build my own if that was important enough.

1
2
3
4
5
void runBenchmark(Session session) {
  String command = "java -jar bcrypt-jmh-benchmark-jmh.jar"
  executeCommand(command, session)  
  println "Done running benchmark."
}  

Executes the JMH benchmark harness. This takes about 20-30 minutes as it runs multiple iterations to get accurate numbers.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
void executeCommand(String command, Session session) {
  Channel channel = session.openChannel("exec")
      channel.setCommand(command)
      channel.setErrStream(System.err)
      channel.connect()

      InputStream input = channel.getInputStream()
      //start reading the input from the executed commands on the shell
      byte[] tmp = new byte[1024]
      while (true) {
          while (input.available() > 0) {
              int i = input.read(tmp, 0, 1024)
              if (i < 0) break;
                  print(new String(tmp, 0, i))
          }
          if (channel.isClosed()){
              println("exit-status: " + channel.getExitStatus())
              break
          }
          sleep(1000)
      }

      channel.disconnect()
}  

This is just a bit of jsch to execute a command via the shell and stream the output back to standard out. I think I cut and pasted this from an example somewhere.