Weka Data Mining with Singularity

By Staff

Sep 26, 2018 | Blog, How To Guides

Weka is a widely used suite of machine learning algorithms for data mining, written in Java. We’ve developed a Singularity container so that your Weka environment and data can be moved between systems on demand, with all the benefits of the Singularity Image Format (SIF).

Recipe (weka.def):

BootStrap: docker
From: ubuntu:16.04

%post
    apt-get -y update
    apt-get -y install curl
    apt-get -y install unzip
    apt-get install -y openjdk-8-jre
    curl -sSL "https://prdownloads.sourceforge.net/weka/weka-3-8-3.zip" > weka.zip
    unzip weka.zip -d / && rm -f weka.zip*
    echo 'export CLASSPATH=/weka-3-8-3/weka.jar' >> /environment
    apt-get clean

To build the Weka container, we run:

$ sudo singularity build weka.sif weka.def

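Weka needs no further setup once the image is built. Its basic usage is to run a classifier class with java, where "object" below stands for the class you want (for example, functions.MultilayerPerceptron):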

$ singularity exec weka.sif java weka.classifiers.object

Toy datasets are included with this install. Let’s test one with the MultilayerPerceptron classifier:

$ singularity exec weka.sif java weka.classifiers.functions.MultilayerPerceptron \
    -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -t /weka-3-8-3/data/breast-cancer.arff
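
The single-letter options in that command are easy to misread, so here is a minimal sketch that spells each one out as a named shell variable. The variable names, the IMAGE default, and the mlp_command helper are illustrative assumptions, not part of Weka or Singularity; the option meanings follow Weka's MultilayerPerceptron documentation. The script only prints the command so you can inspect it first.

```shell
#!/bin/sh
# Sketch: the MultilayerPerceptron run with each option named.
# IMAGE, the variable names and mlp_command are illustrative assumptions.
IMAGE=${IMAGE:-weka.sif}
DATASET=/weka-3-8-3/data/breast-cancer.arff

LEARNING_RATE=0.3   # -L  learning rate
MOMENTUM=0.2        # -M  momentum
EPOCHS=500          # -N  number of training epochs
VALIDATION_PCT=0    # -V  percentage of data held out for validation
SEED=0              # -S  random seed
MAX_VAL_ERRORS=20   # -E  consecutive validation errors allowed before stopping
HIDDEN_LAYERS=a     # -H  'a' means (attributes + classes) / 2 hidden nodes

# Print the full command instead of running it, so it can be inspected first.
mlp_command() {
    echo "singularity exec $IMAGE java" \
        "weka.classifiers.functions.MultilayerPerceptron" \
        "-L $LEARNING_RATE -M $MOMENTUM -N $EPOCHS -V $VALIDATION_PCT" \
        "-S $SEED -E $MAX_VAL_ERRORS -H $HIDDEN_LAYERS -t $DATASET"
}

mlp_command
```

To actually run the printed command, pipe it to a shell (mlp_command | sh). Changing one variable at a time is a simple way to see how each setting affects the reported accuracy.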

Weka also comes with many other classifiers. Try BayesNet on the iris dataset:

$ singularity exec weka.sif java weka.classifiers.bayes.BayesNet -t /weka-3-8-3/data/iris.arff -D \
  -Q weka.classifiers.bayes.net.search.local.K2 -- -P 2 -S ENTROPY \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0
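
If you want to try one classifier against several of the bundled datasets, a small loop saves retyping. The sketch below is hypothetical: weka_batch, IMAGE, and CLASSIFIER are names made up for illustration, and it assumes the datasets live under /weka-3-8-3/data as in the commands above. It prints one command per dataset rather than running anything.

```shell
#!/bin/sh
# Sketch: build one Weka command per bundled dataset.
# weka_batch, IMAGE and CLASSIFIER are illustrative assumptions.
IMAGE=${IMAGE:-weka.sif}
CLASSIFIER=${CLASSIFIER:-weka.classifiers.bayes.BayesNet}
DATA_ROOT=/weka-3-8-3/data

# Usage: weka_batch NAME... (dataset names without the .arff suffix)
weka_batch() {
    for name in "$@"; do
        echo "singularity exec $IMAGE java $CLASSIFIER -t $DATA_ROOT/$name.arff"
    done
}

# The two datasets used elsewhere in this post.
weka_batch iris breast-cancer
```

Pipe the output to sh to execute the runs one after another. With no extra options, each classifier runs with its default settings.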

Of course, when you run Weka you’ll want to use your own data. Add the -B flag to bind your data directory into the container. Note that binding onto /weka-3-8-3/data hides the bundled toy datasets for that run:

$ singularity exec -B path/to/data:/weka-3-8-3/data weka.sif java weka.classifiers.functions.[function here] \
    -t /weka-3-8-3/data/yourdataset/file.arff [args]
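
A bind mount that points at the wrong directory usually shows up only as a "file not found" error from Weka. As a rough sketch, the helper below resolves the host directory to an absolute path and checks that the dataset exists before printing the command. The weka_bind_cmd name and the IMAGE default are assumptions for illustration, and it assumes paths without spaces.

```shell
#!/bin/sh
# Sketch: check a host data directory, then print the bind-mount command.
# weka_bind_cmd and IMAGE are illustrative assumptions; paths with spaces
# are not handled.
IMAGE=${IMAGE:-weka.sif}
CONTAINER_DATA=/weka-3-8-3/data

# Usage: weka_bind_cmd HOST_DIR CLASSIFIER_CLASS FILE_RELATIVE_TO_HOST_DIR
weka_bind_cmd() {
    host_dir=$1
    classifier=$2
    rel_file=$3

    # Resolve to an absolute path; fail if the directory is missing.
    abs_dir=$(cd "$host_dir" 2>/dev/null && pwd) || {
        echo "no such directory: $host_dir" >&2
        return 1
    }

    # Fail early if the dataset is not where the container will look for it.
    [ -f "$abs_dir/$rel_file" ] || {
        echo "no such file: $abs_dir/$rel_file" >&2
        return 1
    }

    echo "singularity exec -B $abs_dir:$CONTAINER_DATA $IMAGE java $classifier -t $CONTAINER_DATA/$rel_file"
}

# Example (hypothetical paths):
#   weka_bind_cmd path/to/data weka.classifiers.trees.J48 yourdataset/file.arff
```

The function prints the command only when both checks pass, so its output can be piped to sh safely.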

For more information about Weka, visit the Weka home page.
