CAMI FAQ - Frequently Asked Questions

What is a binning method?

A binning method assigns an identifier to every sequence in a sequence sample, where the total number of identifiers is ideally less than the total number of sequences. Thus the act of binning places the sequences into broader categories. A bin includes all the sequences with the same identifier. If these identifiers identify taxa from a taxonomy, the method is a taxonomic binning method.
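As a minimal sketch of this definition, the Python snippet below groups sequences by the identifier assigned to them. The sequence names, bin identifiers and the function name group_into_bins are hypothetical and chosen only for illustration; the snippet does not read or write the CAMI binning file format.

```python
from collections import defaultdict


def group_into_bins(assignments):
    """Group sequence IDs by the bin identifier assigned to each of them."""
    bins = defaultdict(list)
    for sequence_id, bin_id in assignments.items():
        bins[bin_id].append(sequence_id)
    return dict(bins)


# Hypothetical output of a binning method: one identifier per sequence.
assignments = {
    "read_1": "bin_A",
    "read_2": "bin_B",
    "read_3": "bin_A",
    "read_4": "bin_B",
}

bins = group_into_bins(assignments)
for bin_id, members in bins.items():
    print(bin_id, members)
```

Here four sequences fall into two bins, so the number of identifiers is smaller than the number of sequences, as described above.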

What is a profiling method?

A profiling method returns an estimate for the frequencies of different taxa in a sequenced microbial community based on analysis of the sequence sample. The main output is a vector with relative abundances for the different sample taxa. The relative abundances of taxa from the same 'rank' of the taxonomy (e.g. superkingdom, including archaea, bacteria and eukaryotes) cannot sum up to more than 1.
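The constraint on relative abundances can be checked with a few lines of Python. This is only a sketch: the tuple layout (rank, taxon, abundance), the example values and the function names are assumptions made for illustration, and abundances are given as fractions of 1 as in the text above, not in the CAMI profiling file format.

```python
from collections import defaultdict


def abundance_sums_by_rank(profile):
    """Sum the relative abundances of all taxa that share a rank."""
    sums = defaultdict(float)
    for rank, _taxon, abundance in profile:
        sums[rank] += abundance
    return dict(sums)


def is_valid_profile(profile, tolerance=1e-9):
    """Return True if no rank has abundances summing to more than 1."""
    sums = abundance_sums_by_rank(profile)
    return all(total <= 1.0 + tolerance for total in sums.values())


# Hypothetical profile: (rank, taxon, relative abundance).
profile = [
    ("superkingdom", "Bacteria", 0.7),
    ("superkingdom", "Archaea", 0.2),
    ("phylum", "Proteobacteria", 0.5),
    ("phylum", "Firmicutes", 0.2),
]

print(is_valid_profile(profile))
```

A sum below 1 at a given rank is allowed; it simply means that part of the sample was not assigned at that rank.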

What is an assembly method?

An assembly method returns longer nucleotide sequences derived by puzzling together individual sequencing reads. These sequences are assumed to represent contiguous stretches from one genome included in the microbiome sample that was sequenced.

Is CAMI open and transparent?

Yes! The development process is done in an open community. Everybody is invited to participate. People who intend to participate will not be involved in any part of the development process that provides any information or advantage for the actual competition.

What about reproducibility?

CAMI encourages participants to submit reproducible results by providing their executable software in docker containers, along with submission of their predictions (see details below). CAMI has worked together with bioboxes to define formats for a standardized setup and execution of profiling, binning and assembly tools in docker containers. This will make the results of these tools reproducible, and also facilitate a continuous monitoring of their performance on future test data sets.

Where do I find the relevant information when I want to participate in the CAMI contest?

You can find all information on the CAMI page entitled participate.

Why should I participate in the contest?

There are various reasons why you might want to participate.

  • Receive feedback on your method's performance on a larger number of data sets.
  • Facilitated benchmarking. By submitting your software in a docker container, you will be able to continuously monitor its performance against other software that has been submitted this way. It will eventually be possible to automatically run all software on new datasets with the CAMI benchmarking portal and compare the performance using a range of common performance metrics. If you update your software, you simply have to update your docker container that you submitted.
  • Coauthorship. As in the first contest, participants who would like to disclose their identities and deliver reproducible results have the option to be authors on a joint CAMI evaluation publication.
  • If you are working on CAMI data sets, you can apply for a short talk at the CAMI session of the Microbiome COSI track at ISMB in Chicago this year.
  • Interact with the computational metagenomics community in workshops, hackathons and at meetings. CAMI will be organizing several events where you can help define the most relevant evaluation metrics, bioboxing software and questions to assess in detail for the second CAMI challenge.

When will the contest start?

The second CAMI challenge will start in spring of 2018.

How long will the second CAMI challenge last?

The challenge will be open for 4 months overall. The first eight weeks are for assembly methods and for methods that process read data sets. After eight weeks, a gold standard assembly will be provided for methods that work on assembled data sets. At this point, the assembly competition will end.

How large are the data sets?

Please check the details of the individual data sets on the CAMI data portal. Some of the data sets will be very large, reaching up to 1 TB. You will also be able to download subsets of samples for testing.

What are the data access policies for the CAMI challenge?

CAMI toy datasets are generated from published genomes. These data sets do not have any restrictions.

CAMI challenge data sets have been generated from unpublished genome data provided by multiple laboratories who want to control the data release date. Therefore, the CAMI data sets will become fully available, together with the gold standard, after the competition has ended and once the data contributors have released their data. Until then, a data recipient is not allowed to publish the data, deposit the sequences in any database or use the data for any purpose other than participating in the CAMI challenge. The data recipient agrees to delete the data in all forms once the CAMI challenge has ended, and not to retain it during the period before its official release.

Why do I have to register with the CAMI data portal for download of the real challenge datasets?

Registration allows us to contact the contestant if there is a problem with the submitted files. At a later stage of the competition, we will show evaluation metrics under an anonymized name for the results (assembly, profiling, binning) provided by the contestant. Participants can choose to have their results displayed anonymously only, or to reveal their identities for participation in a joint publication.

If I do not have access to sufficient compute time on my own, what are my options?

The Pittsburgh Supercomputing Center (PSC) is making compute time and storage available on their system to run genome assemblies for the CAMI challenge. If you request these resources during registration, PSC will contact you to determine your software and computing needs and to set up your account on the PSC system.

What kind of data sets will be provided?

CAMI is providing simulated metagenome data sets created from hundreds of predominantly unpublished (contributed) isolate genome sequences, with as much realism built in as we could manage. For instance, they will include multiple strains from the same species.

Why are we providing new simulated data sets and not using existing public ones?

For real metagenome samples, we do not know which read comes from which genome, and the same should hold in our simulations. All public simulated samples have the correct solution available as well, which would make the contest less realistic. Furthermore, we want to simulate different relevant scenarios that are used in metagenomics and for which no simulated data sets exist. The gold standards for these data sets will be provided after the contest has ended.

How can I download data?

You have to do the following steps.

  1. Login or Register
  2. Go to the Up- and Download section.
  3. Find the dataset you want to download.
  4. If you want to download datasets from the restricted download section, you will have to read and accept the terms and conditions. After accepting the terms you are allowed to download the samples. If you want to download a dataset from the public download section, you can click on the "public download section" button and download any sample you want.

I have problems finding out which output format I should use, what should I do?

You can contact us to get advice.

How do I submit my results?

The best way is to provide your tool along with the prediction files. The tool should be installed in a docker container following our instructions on https://data.cami-challenge.org/#dockerInfo, so that it writes its output to a file with a specific name in a specific directory. You can upload this docker container to the contest website and we will rerun your tool to see if the results are entirely reproducible. If your tool requires standard reference databases to run, many of them will be provided on the website. How you link to these is explained in the instructions.

Should I use the "all samples" or the "specific sample" option?

We recommend that you submit a co-assembly for all samples of a data set. If you cannot co-assemble for technical or methodological reasons, please submit individual assemblies for all samples using the "specific sample" option.

Can I submit the results without providing my tool?

If you want to submit your results without making them reproducible, you can do so and receive the feedback for your own information. In this case, your tool will only be considered for inclusion in a joint result publication if it is a publicly accessible web tool.

How can I submit my results without providing my tool?

You have to do the following steps after the download of a dataset and the execution of your pipeline on it.

  1. Login or Register
  2. Go to the Up- and Download section.
  3. Find the dataset you want to submit for.
  4. Click the Add Assembly, Add Binning or Add Profiling button.
  5. Fill in the form. For the fingerprint field you have to download our cami client jar (See next step).
  6. Run the cami client: java -jar camiClient.jar with the parameter -af for assembly files, -bf for binning files, or -pf for profiling files. Then enter the returned fingerprint in the input field. Note! You have to run the cami client jar and submit the form again every time you change the file.
  7. After submitting the form you will get a credentials file that allows you to upload the file within the next 36 hours. If you want to upload your file at a later time, just submit the form again and you will get new credentials. Upload your file with: java -jar camiClient.jar -u credentials_file file_to_upload. You can upload until 36 hours after the end of the competition.
  8. (Optional) Register a docker container for reproducibility. How to build/submit a docker container?
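
For scripted submissions, the client calls from steps 6 and 7 can be assembled programmatically. The Python sketch below only builds the command lines and does not run them. The flags -af, -bf, -pf and -u and the extra taxonomy database argument for binning and profiling files are taken from the client usage text in this FAQ; the helper names, the default jar path and all file names are hypothetical.

```python
import shlex

# Flags as listed in the camiClient usage text of this FAQ.
FINGERPRINT_FLAGS = {"assembly": "-af", "binning": "-bf", "profiling": "-pf"}


def fingerprint_command(kind, result_file, taxonomy_db_path=None,
                        jar="camiClient.jar"):
    """Build the command that computes the fingerprint of a result file."""
    if kind not in FINGERPRINT_FLAGS:
        raise ValueError(f"unknown result type: {kind!r}")
    command = ["java", "-jar", jar, FINGERPRINT_FLAGS[kind], result_file]
    if kind in ("binning", "profiling"):
        # Per the usage text, these two modes also take the path to the
        # extracted taxonomy database.
        if taxonomy_db_path is None:
            raise ValueError(f"{kind} files need a taxonomy database path")
        command.append(taxonomy_db_path)
    return command


def upload_command(credentials_file, file_to_upload, jar="camiClient.jar"):
    """Build the command that uploads a file with a credentials file."""
    return ["java", "-jar", jar, "-u", credentials_file, file_to_upload]


# Hypothetical file names, for illustration only.
print(shlex.join(fingerprint_command("assembly", "contigs.fasta")))
print(shlex.join(fingerprint_command("binning", "bins.tsv", "taxonomy_db")))
print(shlex.join(upload_command("credentials.txt", "contigs.fasta")))
```

The returned lists can be passed to subprocess.run once Java and the client jar are available. Remember that the fingerprint has to be recomputed and the form resubmitted whenever the file changes.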

How to use the CAMI Client?

The Java-based CAMI client jar validates and uploads assembly, binning and profiling files. It requires a Unix-like operating system (e.g. Linux, OSX, Solaris, *BSD) and Java 7 (e.g. Oracle JDK 7, OpenJDK 7).

usage:
java -jar camiClient.jar [-af < assembly_file >]
                         [-bf < binning_file extracted_taxonomy_db_path >]
                         [-d < url destination >]
                         [-pf < profiling_file extracted_taxonomy_db_path >]
                         [-u < credentials_file file_to_upload >]
                         [-version]
                         [-h]

Validates and uploads binning, profiling and assembly files.

Command Description
-af,--assemblyFingerprint assembly_file
Computes fingerprint of an assembly file.
-bf,--binningFingerprint binning_file extracted_taxonomy_db_path
Validates binning file and computes fingerprint. Download the taxonomy_db from the databases section of https://data.cami-challenge.org/participate.
-pf,--profilingFingerprint profiling_file extracted_taxonomy_db_path
Validates profiling file and computes fingerprint. Download the taxonomy_db from the databases section of https://data.cami-challenge.org/participate.
-u,--upload credentials_file file_to_upload
You can get the credentials file from the cami website. File to upload is the assembly, binning or profiling file you want to upload.
-d,--download url destination
Downloads data from S3.
-h,--help
Print the help of the application.
-version,--v
Print the version of the application.

Where do I get the fingerprint?

You can get the fingerprint with our cami client jar. Run the cami client: java -jar camiClient.jar with the parameter -af for assembly files, -bf for binning files, or -pf for profiling files. Your files must be in a specific format, which is explained in the following questions:
What is the format of the CAMI assembly file?
What is the format of the CAMI binning file?
What is the format of the CAMI profiling file?

Note! You have to execute the CAMI client jar and submit the form every time you change the file.

What is the format of the CAMI assembly file?

The CAMI assembly format is a FASTA-formatted contig and scaffold file as specified in our CAMI Github repository.
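
Before submitting, it can help to check that an assembly file at least parses as FASTA. The Python sketch below reads generic FASTA records and reports contig lengths. It assumes only the common FASTA conventions (header lines starting with ">", sequence possibly split over several lines) and does not implement any additional rules from the CAMI specification; the function name and the example records are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict mapping record name to sequence."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header_fields = line[1:].split()
            if not header_fields:
                raise ValueError("FASTA header without a name")
            # Use the first whitespace-delimited token as the record name.
            name = header_fields[0]
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data before the first header")
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical two-contig assembly, for illustration only.
example = """>contig_1 length=8
ACGT
ACGT
>contig_2
GGCC
""".splitlines()

contig_lengths = {name: len(seq) for name, seq in read_fasta(example).items()}
print(contig_lengths)
```

For a real file, pass an open file handle instead of the example lines, for instance read_fasta(open("contigs.fasta")). The authoritative validation remains the specification in the CAMI Github repository.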

What is the format of the CAMI binning file?

The CAMI format for binning is specified in our CAMI Github repository.

What is the format of the CAMI profiling file?

The CAMI format for profiling is specified in our CAMI Github repository.

I cannot get the docker installation to work, what should I do?

Please contact us, we are happy to help.

Can I use special hardware for a Docker container (such as GPUs)?

Yes, all hardware platforms are supported. If you don't have access to the required hardware, please contact us regarding compute resources that the Pittsburgh Supercomputing Center could provide to you. This Stack Overflow question describes how to run CUDA code in a Docker container: http://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-cont...

How does the speed of a method contribute to the result? Is the contest mainly focused on precision and recall?

CAMI has worked with community members in defining multiple evaluation metrics for assessing performance of different method categories (see Sczyrba et al. Nature Methods 2017). Definition of the most relevant ones is an ongoing effort and will be continued in a public meeting of users and developers after the second CAMI challenge.
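
For readers unfamiliar with the two terms in the question, the Python sketch below computes precision and recall in their generic textbook form for a single predicted set. It is an illustration only and not the definition of any CAMI evaluation metric; the function name and the example read sets are hypothetical.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of a predicted set against a true set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true items that were found.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: reads placed in a bin versus reads that belong there.
predicted_reads = {"read_1", "read_2", "read_3", "read_4"}
true_reads = {"read_1", "read_2", "read_5"}

precision, recall = precision_recall(predicted_reads, true_reads)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four predicted reads are correct, and two of the three true reads are recovered.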

I don't want to upload my unpublished tool to Docker Hub, what should I do?

We can provide a private Docker Hub, so that only the CAMI team has access to the unpublished tool. Please contact support@cami-challenge.org and we will tell you the next steps.

Do I have to submit results for binning, assembly and profiling?

You are free to submit for one or multiple tasks; there is no need to submit results for all of them.

Do I have to submit my code in any particular language?

You can submit your code or software in any language you like. It should be installed and ready to execute in a docker container. We will make the installation instructions available soon. We will also offer help with this via Skype, if needed.

How can I compare the results of a binning or profiling tool with the results of the first CAMI challenge?

You can reproduce binning and profiling comparisons, as well as compute metrics for the results of other tools, using the assessment packages AMBER (genome binning) and OPAL (profiling). Gold standards and the results of participating binners and profilers are available on the CAMI data portal.

How can I hear the latest news from CAMI?

  • You can register for the CAMI newsletter on the ISCB COSI Microbiome site.
  • You can follow CAMI on twitter (@cami_challenge)

What is the purpose of the CAMI google+-group?

We used it for video hangouts while planning the first CAMI contest, because we work with people from around the world. We also used it to gather feedback and to invite tool developers to help the CAMI team define the contest during its development phase. We recommend interacting with us via the CAMI newsletter or twitter.