Commit 9b3a246 (1 parent: f108d53): 1 changed file with 6 additions and 14 deletions.

@@ -1,18 +1,13 @@
# Paloma

The Paloma benchmark makes use of this repo to run evaluation inference. This readme explains everything you need to know to get results on Paloma and make a submission to our benchmark. In addition to the dataset hosted here, Paloma introduces guidelines for making perplexity results comparable across models, along with code that implements these guidelines with specific experimental controls. This page walks you through how to apply these standards to your experiments.

Whether you are evaluating an off-the-shelf model or preparing to conduct your own pretraining experiment from scratch, we recommend that you employ as much of our standardized code as possible to ensure the greatest level of comparability with existing results.

Links:

[Data](https://huggingface.co/datasets/allenai/paloma)

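If you want to inspect the evaluation data itself, it can be pulled down with the Hugging Face `datasets` library. The sketch below is only illustrative: the config name, split name, and access requirements are assumptions, so consult the dataset card linked above for the authoritative details.

```python
from datasets import load_dataset

# Load one of Paloma's sources from the Hugging Face hub.
# NOTE: the config name ("c4_en") and split ("val") are illustrative guesses;
# check the dataset card for the exact names, and you may need to accept the
# dataset's terms and authenticate (e.g. `huggingface-cli login`) first.
paloma_c4 = load_dataset("allenai/paloma", "c4_en", split="val")
print(paloma_c4[0])
```
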
## Getting existing results from the benchmark

Paloma is first and foremost a suite of results from the research community, organized by comparability. These are formatted as *.jsonl.gz files recording the perplexity per domain over our 585 domains, as well as additional metrics discussed in our paper. These files are the same type of results that are output by running the code in this repo for a given model.

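As a concrete (hypothetical) example of working with such a file, the snippet below decompresses a results file and prints one metric per domain. The path and the field names `domain` and `ppl_primary` are placeholders rather than the actual schema; inspect a real results file or the paper for the exact keys.

```python
import gzip
import json

# Sketch: scan a Paloma results file and print a perplexity metric per domain.
# The file path and field names are assumptions for illustration only.
with gzip.open("results/example-model.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record.get("domain"), record.get("ppl_primary"))
```
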
We are also building out a website to allow interactive inspection of these multi-dimensional results. Until then, please contact us by emailing the first author of Paloma if you would like access to the raw benchmark results.

So far, the models evaluated by the benchmark are the 6 baseline 1B parameter models that we release with Paloma, as well as `EleutherAI/pythia-160m`, `EleutherAI/pythia-1B`, and `EleutherAI/pythia-6.9b`.

## Setup

Start by following the installation instructions for this repo in this [readme](../README.md).

@@ -39,22 +34,19 @@ tango --settings tango.yml run configs/example_paloma_config.jsonnet --workspace
## Pretraining your model

If you are pretraining from scratch, we recommend that you adopt several experimental controls that allow the greatest level of comparability for your results. Note that if you want to make a submission to our benchmark, you must choose whether to opt in to these controls so that your submission can be marked accordingly. In this section we detail how to accomplish each of these experimental controls.

### Decontaminating your pretraining data

Our decontamination approach is implemented in the Dolma Tooling repo. This allows you to remove any documents from your pretraining data that are contaminated with respect to Paloma.

To do this, please follow the instructions [here](https://github.com/allenai/dolma/blob/decon-instructions/docs/paloma_decontamination.md) to decontaminate your own pretraining data.

### Fixing the training data order

Our approach for fixing the training data order requires the use of [the same OLMo training code](https://github.com/allenai/OLMo/tree/1f2f02052d2a5ecba82ff45bbfc731651b1e7d29) that we employ to train our 1B parameter baselines. Contemporary LMs train on instances that are maximum-sequence-length concatenations of training documents, so we must fix the order of the concatenated instances. We do this by fixing the tokenization, maximum sequence length, and random seed, as well as by providing dataloading code whose order is invariant to the number of devices.

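As a rough illustration of the idea (this is not the OLMo dataloading code itself), the sketch below concatenates tokenized documents, chunks them into fixed-length instances, and shuffles them with a fixed seed so that the resulting order does not depend on how many devices later consume the batches.

```python
import random

def build_instances(tokenized_docs, max_seq_len=2048, seed=0):
    """Illustrative only: produce fixed-length training instances whose order
    is fully determined by the tokenization, max_seq_len, and seed."""
    # Concatenate all tokenized documents into one token stream.
    stream = [tok for doc in tokenized_docs for tok in doc]
    # Chunk the stream into maximum-sequence-length instances.
    instances = [
        stream[i : i + max_seq_len]
        for i in range(0, len(stream) - max_seq_len + 1, max_seq_len)
    ]
    # Shuffle with a fixed seed; the order is independent of device count.
    random.Random(seed).shuffle(instances)
    return instances
```
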
### Fixing the vocabulary

We ask that submissions that do not investigate changes in vocabulary opt in to our standardized vocabulary to enable the greatest level of comparability. That vocabulary is available from the tokenizer hosted on the HuggingFace hub as `allenai/gpt-neox-olmo-dolma-v1_5`.

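For example, assuming you work with the `transformers` library, the standardized vocabulary can be loaded directly from the hub:

```python
from transformers import AutoTokenizer

# Load the standardized vocabulary used by the Paloma 1B baselines.
tokenizer = AutoTokenizer.from_pretrained("allenai/gpt-neox-olmo-dolma-v1_5")
print(tokenizer.vocab_size)
```
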
## Making a submission

At present we are building out an automatic submission process that will soon be available. Until then, please reach out to us by emailing `[email protected]` if you would like to submit results to the benchmark.

## Citation

```bibtex
@@ -66,4 +58,4 @@ At present we are building out an automatic submission process that will soon be
  volume={abs/2312.10523},
  url={https://api.semanticscholar.org/CorpusID:266348815}
}
```