
General questions regarding the reproducibility challenge

Hello, I have a few questions regarding the reproducibility challenge for project 2:

  1. Do you provide any recommendations regarding which paper to choose? Given that there are thousands of papers to choose from, finding a paper that can be reasonably re-implemented is quite difficult, especially given that many papers require a significant amount of compute in order to reproduce experiments.

  2. Which guidelines should we follow for the report: the course's guidelines or the RC's guidelines? For example, the project 2 description asks for a report of maximum 4 pages (with no specific template), while the RC asks for a report of maximum 8 pages, using the NeurIPS template (source).

Thank you!

  1. Do you provide any recommendations regarding which paper to choose? Given that there are thousands of papers to choose from, finding a paper that can be reasonably re-implemented is quite difficult, especially given that many papers require a significant amount of compute in order to reproduce experiments.

We recommend looking for papers that do not require huge amounts of compute and do not use proprietary datasets. In short, papers with simple experiments are preferred.

Note that reproducing part of the results is fine, as long as you do most experiments that are within reach.

Also, we recommend trying to reproduce papers (e.g. accepted NeurIPS 2020 papers) that have NO codebase available (typically on GitHub) at the start of project 2. This makes your reproducibility effort more "novel" and is generally a better learning experience for you as well.
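To make the compute recommendation concrete, here is a minimal back-of-envelope sketch (all numbers are hypothetical and not taken from any particular paper) for estimating how many GPU-hours a reproduction would need before you commit to a paper:

```python
# Illustrative back-of-envelope estimate (hypothetical numbers):
# total compute needed to reproduce a paper's main experiments.
reported_hours_per_run = 12     # training time for one run, as stated in the paper
num_configurations = 6          # ablations / settings you plan to reproduce
seeds_per_configuration = 3     # repeated runs for error bars

total_gpu_hours = reported_hours_per_run * num_configurations * seeds_per_configuration
print(f"Estimated compute: {total_gpu_hours} GPU-hours")  # 216 GPU-hours in this example

# If this total is far beyond what you can realistically access during the
# project, pick a different paper or a subset of its experiments.
```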

  2. Which guidelines should we follow for the report: the course's guidelines or the RC's guidelines? For example, the project 2 description asks for a report of maximum 4 pages (with no specific template), while the RC asks for a report of maximum 8 pages, using the NeurIPS template (source).

For this project you can follow the reproducibility challenge guidelines.

To get an idea, here are some examples of published reproducibility challenge reports from EPFL CS433 students in previous years:

Do we need to replicate the images in the appendix?


In the previous answer: "Note that reproducing part of the results is fine, as long as you do most experiments that are within reach."

So, if a figure in the appendix is a crucial part of the experiments that you reproduce, then yes. But in general, (extra) experiments in the appendix can be left out (unless you are really up for a challenge or think something might be wrong in those experiments).

Also note that the reproducibility challenge revolves around reproducing results, not exact figures/images or tables. You play the role of a scientific investigator who verifies whether the paper is reproducible (at least in part).
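For instance, here is a minimal sketch of what "verifying results" can look like in practice; the metric names, values, and the 2% tolerance are purely illustrative assumptions, not part of the official guidelines:

```python
# Minimal sketch (hypothetical values): compare your reproduced numbers
# against the values reported in the paper, rather than trying to match
# figures pixel for pixel.
reported = {"accuracy": 0.912, "f1": 0.874}    # taken from the paper's tables
reproduced = {"accuracy": 0.905, "f1": 0.881}  # your re-implementation's results

for metric, ref in reported.items():
    ours = reproduced[metric]
    rel_diff = abs(ours - ref) / ref
    status = "consistent" if rel_diff < 0.02 else "discuss in report"
    print(f"{metric}: reported={ref:.3f}, reproduced={ours:.3f}, "
          f"relative difference={rel_diff:.1%} -> {status}")
```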

can we choose a paper that has already been claimed?


From https://paperswithcode.com/rc2020/registration:

We encourage you to select papers which are yet to be claimed, however, you can also claim a paper which has been claimed by a team.

Besides the recommendation to take papers without available code (see post above), for CS433 we follow the official guidelines.

I have read the announcement, website description and their FAQ, but there doesn't seem to be much information about how our submissions will be graded and based on what criteria.

For example, can we expect a good grade (not just a passing 4/6 grade) if we put in our best effort, but some of the results cannot be replicated? I realize this was partly answered in a previous comment here, but whether the received grade is merely 4/6 or higher (up to 6/6) is very unclear.

Could you please also share the general grading guidelines you'll be using? I realize there are some in the project 2 description PDF, but most of these do not apply to option C.

I have read the announcement, website description and their FAQ, but there doesn't seem to be much information about how our submissions will be graded and based on what criteria.

Copied from Aswin in Discord:

[We grade] mainly on how well you could reproduce the results of the original paper (or show with evidence that the results are not reproducible), while taking into account the difficulty (we understand that papers vary widely in terms of ease of reproducibility). The reviewing criteria here https://paperswithcode.com/rc2020/registration can give you additional guidelines.

Also note that this project 2 choice (C) is one of the harder ones depending on the paper (you will be reproducing a cutting-edge research paper, and reporting on it scientifically as learned in project 1). To get a sense of the level that is expected, please refer to successful submissions & reviews from previous years here:

For example, can we expect a good grade (not just a passing 4/6 grade) if we put in our best effort, but some of the results cannot be replicated? I realize this was partly answered in a previous comment here, but whether the received grade is merely 4/6 or higher (up to 6/6) is very unclear.

It is very much possible that the results cannot be reproduced, despite your best efforts. That is what the reproducibility challenge is for in the first place: to act as a verifier. This can still earn a full grade, as long as your findings are well supported. For example, picking a paper that requires years of GPU time (please don't) and then stating that you didn't have the resources is not sufficient.

A ReScience C journal publication level (ReScience C is the journal that publishes accepted reproducibility challenge papers), i.e. an average review score of 6+ without major flaws, leans towards an EPFL 6/6. A clear reject (average review score of 4 or below) is in the range of an EPFL 4/6.
