Merge development branch into main #19

Merged
merged 142 commits into from
Jan 23, 2025
142 commits
e835bf6
Add a draft notebook
KoalaQin Sep 27, 2024
c9f0e22
Add more use cases in notebook
KoalaQin Sep 30, 2024
a44e72d
Add example to get AF for one ancestry for one variant
KoalaQin Sep 30, 2024
4bc936c
Add table of contents and screenshot
KoalaQin Oct 1, 2024
2f1156e
Format gnomad_methods in requirements.txt
KoalaQin Oct 1, 2024
ff64d86
Remove setup.py
KoalaQin Oct 1, 2024
bc85e6f
Remove setup.py
KoalaQin Oct 1, 2024
e083fbf
Reorganize the notebook
KoalaQin Oct 1, 2024
b5101ba
Function to import data by version
KoalaQin Oct 29, 2024
2bd494e
Modify import_data function to match gnomad_methods repo version by d…
KoalaQin Oct 29, 2024
3cfb1bd
Formatting
KoalaQin Oct 29, 2024
8d904cf
Formatting
KoalaQin Oct 29, 2024
49b2cb0
Formatting
KoalaQin Oct 29, 2024
a7c8e53
Formatting
KoalaQin Oct 29, 2024
6d3471f
Formatting
KoalaQin Oct 29, 2024
1b847b4
Formatting
KoalaQin Oct 29, 2024
22c40f7
Formatting docstring block
KoalaQin Oct 29, 2024
66770ea
Function to extract callstats for one variant in one genetic ancestry…
KoalaQin Oct 29, 2024
71a33bd
Function to extract callstats for genetic ancestry groups
KoalaQin Oct 29, 2024
80b1f91
indent correctly
KoalaQin Oct 29, 2024
8dcce6c
Add functions to filter variants by gene symbol, interval and csqs
KoalaQin Nov 1, 2024
34fab76
Correct small errors
KoalaQin Nov 2, 2024
0ccdb5b
Update notebook
KoalaQin Nov 2, 2024
c183470
- Restructure files
jkgoodrich Dec 9, 2024
095a234
- Restructure files
jkgoodrich Dec 9, 2024
5fe3010
- Modifications to support setting a default data_type and version
jkgoodrich Dec 9, 2024
5330ea6
More clean-up of notebooks and functions
jkgoodrich Dec 11, 2024
b853fc2
Add notebooks to git
jkgoodrich Dec 11, 2024
710bdcf
Fix unterminated string error
KoalaQin Dec 13, 2024
4bfc305
Apply suggestions from code review
jkgoodrich Dec 18, 2024
ecb664b
format
jkgoodrich Dec 18, 2024
06de980
Put back setup.py
KoalaQin Dec 18, 2024
d09953c
Merge pull request #5 from broadinstitute/jg/draft_toolbox
jkgoodrich Dec 18, 2024
c37ad6c
move setup to the correct location
jkgoodrich Dec 18, 2024
ef83b20
Merge pull request #4 from broadinstitute/qh/draft_toolbox
jkgoodrich Dec 18, 2024
24b78ab
Add instructions to set up the toolbox and use notebooks
KoalaQin Dec 18, 2024
1b9f98d
small edit
KoalaQin Dec 18, 2024
62b7fd8
use updated freq_bin_expr in gnomad_methods
jkgoodrich Dec 19, 2024
8f49b09
Reorganize steps
KoalaQin Dec 19, 2024
1d811e6
small edit
KoalaQin Dec 19, 2024
18d0a85
small edit 2
KoalaQin Dec 19, 2024
4ddde29
Use `parse_variant` in gnomad_methods
jkgoodrich Dec 19, 2024
a68b7fd
Specify Java version
KoalaQin Dec 19, 2024
7248a81
Undo weird changes to notebooks
KoalaQin Dec 19, 2024
a138ce3
Merge pull request #6 from broadinstitute/jg/move_freq_bin_expr
jkgoodrich Dec 19, 2024
e542c1b
Uncomment
jkgoodrich Dec 19, 2024
056ab1c
Use the filter CDS function from gnomad_methods
KoalaQin Dec 19, 2024
752c123
small edits
KoalaQin Dec 19, 2024
7776c23
Formatting
KoalaQin Dec 19, 2024
7a29ba5
Change `filter_by_csqs` to `filter_by_consequence_category` and clean…
jkgoodrich Dec 20, 2024
0350630
Merge pull request #8 from broadinstitute/jg/use_gnomad_methods_varia…
jkgoodrich Dec 20, 2024
bc2120a
Fix other filter in `filter_by_consequence_category`
jkgoodrich Dec 20, 2024
f482dba
Draft code
KoalaQin Dec 20, 2024
09551f8
Small docstring fix
jkgoodrich Dec 20, 2024
91ecb7d
Merge pull request #10 from broadinstitute/jg/update_filter_by_csqs
jkgoodrich Dec 20, 2024
df9eb61
Add to-do
KoalaQin Dec 20, 2024
aa4e8e5
Move to-do
KoalaQin Dec 20, 2024
f3d459b
Change param name
KoalaQin Dec 20, 2024
64139d8
Merge branch 'development' into qh/filter_by_gene_symbol
jkgoodrich Dec 20, 2024
7557b87
Merge pull request #9 from broadinstitute/qh/filter_by_gene_symbol
jkgoodrich Dec 20, 2024
80b1af4
Get relevant coverage table for each version
KoalaQin Jan 2, 2025
532fcc8
Revert notebook changes
KoalaQin Jan 3, 2025
c6de768
Merge branch 'development' into qh/get_pLOFs
KoalaQin Jan 3, 2025
80d33be
reformat
KoalaQin Jan 3, 2025
e03253e
Modify the code to use browser tables
KoalaQin Jan 6, 2025
a6e9ac4
Merge remote-tracking branch 'origin/development' into qh/filter_by_g…
KoalaQin Jan 6, 2025
427217c
Add extra filter steps to match Browser
KoalaQin Jan 7, 2025
fa0bcfd
Removed unused imports
KoalaQin Jan 7, 2025
319d9fa
Make the big functions to smaller ones
KoalaQin Jan 8, 2025
93184af
Move gnomad_methods to setup.py
KoalaQin Jan 10, 2025
b271529
update notebook
KoalaQin Jan 10, 2025
bdb41d2
Move modified gnomad_methods back to requirements.txt
KoalaQin Jan 10, 2025
bf593fc
rename to gnomad
KoalaQin Jan 10, 2025
0490362
Update filter_by_intervals and filter_by_gene_symbol to use more gnom…
jkgoodrich Jan 10, 2025
19ace51
Fix use of filter_by_intervals
jkgoodrich Jan 10, 2025
fef0c8e
Merge remote-tracking branch 'origin/development' into qh/readme
KoalaQin Jan 10, 2025
f44a79f
Add a name for the gnomad_methods main branch so it can be used to se…
KoalaQin Jan 10, 2025
ae2acbc
Updates to loading data to support more data types and inclusion of c…
jkgoodrich Jan 14, 2025
8eb7296
Fixes during testing
jkgoodrich Jan 14, 2025
87906ee
Fix GnomADSession init
jkgoodrich Jan 14, 2025
95daf52
Fix table in `get_gnomad_release` docstring
jkgoodrich Jan 14, 2025
2853978
Fix use of `data_type` in `_get_dataset`
jkgoodrich Jan 14, 2025
23e0fe2
split with comma instead of tab
jkgoodrich Jan 14, 2025
41c6739
Use correct browser resource, browser_variant
jkgoodrich Jan 14, 2025
93cc5cd
Add additional datasets to the data exploration
jkgoodrich Jan 14, 2025
d9c0a6b
Update gnomad_toolbox/load_data.py
jkgoodrich Jan 14, 2025
82b9dbb
Add links for data download to notebook summary table
jkgoodrich Jan 14, 2025
1d081ea
browser_variant -> browser in notebook table
jkgoodrich Jan 14, 2025
8700831
Changes to some headers and added links to the notebook
jkgoodrich Jan 15, 2025
8833771
Add liftover and internal links to the notebook
jkgoodrich Jan 15, 2025
397c14f
Apply suggestions from code review
jkgoodrich Jan 15, 2025
b143d27
Add comment explaining why we support a subset of the versions available
jkgoodrich Jan 15, 2025
5a71f8c
Apply suggestions from code review
jkgoodrich Jan 15, 2025
836994d
Filter to only canonical in filter_to_plofs
jkgoodrich Jan 15, 2025
e56464b
Don't need the MANE and canonical constraint config
jkgoodrich Jan 15, 2025
e3496e4
Merge branch 'jg/get_pLOFs' of https://github.com/broadinstitute/gnom…
jkgoodrich Jan 15, 2025
de240b5
Fix table in `get_gnomad_release` docstring
jkgoodrich Jan 15, 2025
ffbe8e8
Run explore_release_data.ipynb all the way through
jkgoodrich Jan 15, 2025
799e304
Remove unneeded warning
jkgoodrich Jan 15, 2025
970b2dc
Fix browser change log link
jkgoodrich Jan 15, 2025
93414d4
Merge pull request #14 from broadinstitute/jg/filter_by_gene_symbol
jkgoodrich Jan 15, 2025
e8c5d30
Merge pull request #12 from broadinstitute/qh/filter_by_gene_symbol
KoalaQin Jan 15, 2025
bacd5b5
Merge pull request #16 from broadinstitute/jg/get_pLOFs
jkgoodrich Jan 15, 2025
4e7b02a
Move pLOF function to contraint
KoalaQin Jan 15, 2025
2949779
Merge from development
KoalaQin Jan 15, 2025
3a2a8b5
Merge pull request #11 from broadinstitute/qh/get_pLOFs
KoalaQin Jan 15, 2025
e2f1d53
Update notebooks to use new functions
KoalaQin Jan 15, 2025
4455bd5
Rename the notebook
KoalaQin Jan 15, 2025
119e4cd
Correct interval function in variant.py
KoalaQin Jan 15, 2025
6f41221
Rerun the filtering notebook with updated function
KoalaQin Jan 15, 2025
233704c
Apply suggestions
KoalaQin Jan 15, 2025
3e276be
Merge pull request #17 from broadinstitute/qh/update_notebooks
KoalaQin Jan 15, 2025
306daf7
Add jupyter server version limit to avoid jupyter notebook 403 error
KoalaQin Jan 15, 2025
f7312eb
Merge branch 'development' of https://github.com/broadinstitute/gnoma…
jkgoodrich Jan 16, 2025
aabf574
Add changes to the README.md and support for jupyter configs
jkgoodrich Jan 21, 2025
b9b5095
Add jupyter configs
jkgoodrich Jan 21, 2025
a1eece8
Move jupyter configs
jkgoodrich Jan 21, 2025
18b1a96
Modify the Jupyter config file to set the notebook directory.
jkgoodrich Jan 21, 2025
d9f1039
Add nbconfig
jkgoodrich Jan 21, 2025
9ea6b53
Use recursive glob
jkgoodrich Jan 22, 2025
c22aac0
I don't think the MANIFEST.in is needed
jkgoodrich Jan 22, 2025
6123429
Small changes in README.md
jkgoodrich Jan 22, 2025
25516a3
Make sure to install hail
jkgoodrich Jan 22, 2025
957c149
Add resources to README.md
jkgoodrich Jan 22, 2025
b7e6cfa
Change to use the Cloud Storage Connector
jkgoodrich Jan 22, 2025
f67bd31
Add image for README.md
jkgoodrich Jan 22, 2025
87ca520
Use correct name for the jupyter notebook -- run all cells
jkgoodrich Jan 22, 2025
8e1f73b
Formatting and clean-up of the README.md
jkgoodrich Jan 22, 2025
47d54fa
A bit more README.md clean-up
jkgoodrich Jan 22, 2025
efe069b
Update the Java prereq section
jkgoodrich Jan 22, 2025
f8fac36
Wrap lines for easier reading
jkgoodrich Jan 22, 2025
3527120
Apply suggestions from code review
jkgoodrich Jan 22, 2025
d662909
Add more infor to java install
jkgoodrich Jan 22, 2025
e68e579
Merge branch 'jg/readme_changes_and_add_configs' of https://github.co…
jkgoodrich Jan 22, 2025
fc766e3
Add Zulip to resources
jkgoodrich Jan 22, 2025
a749531
Add more info about running notebooks locally
jkgoodrich Jan 22, 2025
a4866dc
Small format addition
jkgoodrich Jan 22, 2025
6207d60
Apply suggestions from code review
jkgoodrich Jan 22, 2025
25f108b
Align comments in repo structure
jkgoodrich Jan 22, 2025
bc46c3b
Remove toc from config, we have toc2
jkgoodrich Jan 22, 2025
85e5128
Merge pull request #18 from broadinstitute/jg/readme_changes_and_add_…
jkgoodrich Jan 23, 2025
f574cc1
Merge pull request #7 from broadinstitute/qh/readme
jkgoodrich Jan 23, 2025
230 changes: 228 additions & 2 deletions README.md
@@ -1,2 +1,228 @@
# gnomad-toolbox
This repository provides a set of Python functions to simplify working with gnomAD Hail Tables. It includes tools for data access, filtering, and analysis.
# gnomad-toolbox: Simplifying Access and Analysis of gnomAD Data

![License](https://img.shields.io/github/license/broadinstitute/gnomad-toolbox)

The Genome Aggregation Database ([gnomAD](https://gnomad.broadinstitute.org/)) is a widely used resource for understanding genetic variation, offering large-scale data on millions of variants across diverse populations. This toolbox is a Python package designed to streamline use of gnomAD data, simplifying tasks like loading, filtering, and analysis, to make it more accessible to researchers.

> **Disclaimer:** This package is in its early stages of development, and we are actively working on improving it.
> There may be bugs, and the API is subject to change. Feedback and contributions are highly encouraged.

---

## Repository Structure

The package is organized as follows:

```
gnomad_toolbox/
├── load_data.py          # Functions to load gnomAD release Hail Tables.
├── filtering/            # Modules for filtering gnomAD data.
│   ├── constraint.py     # Filter by constraint metrics (e.g., observed/expected ratios).
│   ├── coverage.py       # Filter by coverage thresholds.
│   ├── frequency.py      # Filter by allele frequency thresholds.
│   ├── pext.py           # Filter by proportion expressed across transcripts (pext) scores.
│   ├── variant.py        # Filter specific variants or sets of variants.
│   ├── vep.py            # Filter by VEP (Variant Effect Predictor) annotations.
├── analysis/             # Analysis functions.
│   ├── general.py        # General-purpose analyses, such as summarizing variant statistics.
├── notebooks/            # Example Jupyter notebooks.
│   ├── explore_release_data.ipynb               # Guide to loading gnomAD release data.
│   ├── intro_to_filtering_variant_data.ipynb    # Introduction to filtering gnomAD variants.
│   ├── dive_into_secondary_analyses.ipynb       # Secondary analyses using gnomAD data.
```

---

## Set Up Your Environment for Hail and gnomAD Toolbox

This section provides step-by-step instructions to set up a working environment for using [Hail](https://hail.is/) and the gnomAD Toolbox.

> We provide this guide to help you set up your environment, but we cannot guarantee that it will work on all systems.
> If you encounter any issues, you can reach out to us on the [gnomAD Forum](https://discuss.gnomad.broadinstitute.org),
> and if it is something that we have come across before, we will try to help you out.

### Prerequisites

Before installing the toolbox, ensure the following:
- Administrator access to install software.
- A working internet connection.
- Java **11**.
- Check your Java version:
```commandline
java -version
```
- If you do not have Java 11 installed:
- For Linux, use `apt-get` or `yum` to install OpenJDK 11.
- For macOS, [Hail recommends](https://hail.is/docs/0.2/install/macosx.html) using [Homebrew](https://brew.sh/):
```commandline
brew install --cask temurin@11
```
or using a packaged installation from [Azul](https://www.azul.com/downloads/?version=java-11-lts&os=macos&package=jdk&show-old-builds=true).
> Ensure you choose a Java installation that matches your system architecture (found in **Apple Menu > About This Mac**).
> - For Apple M1/M2 chips, select an **arm64** Java package.
> - For Intel-based Macs, choose an **x86_64** Java package.
>
> You may also need to set the `JAVA_HOME` environment variable to the path of the installed Java version. For example:
> ```commandline
> export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home
> export PATH=$JAVA_HOME/bin:$PATH
> ```
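
The Java check above can also be scripted. The following is a minimal sketch, not part of the toolbox: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `version "…"` banner printed by OpenJDK-style `java -version` output.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the Java major version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse its output; None if `java` is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # Most JDKs print the version banner to stderr, so check both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java major version:", installed_java_major(), "(this guide expects 11)")
```

If this prints anything other than 11, revisit the installation and `JAVA_HOME` steps above before installing the toolbox.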

### Install Miniconda

Miniconda is a lightweight distribution of Conda.

1. Download Miniconda from the [official website](https://docs.anaconda.com/miniconda/install/).
2. Follow the installation instructions described on the download page for your operating system.
3. Verify installation:
```commandline
conda --version
```

### Set Up a Conda Environment

Create and activate a new environment with a specified Python version for the gnomAD Toolbox:
```commandline
conda create -n gnomad-toolbox python=3.11
conda activate gnomad-toolbox
```

### Install gnomAD Toolbox
- To install from PyPI:
```commandline
pip install gnomad-toolbox
```
- To install the latest development version from GitHub:
```commandline
pip install git+https://github.com/broadinstitute/gnomad-toolbox@main
```

> **Troubleshooting:** If you encounter an error such as `Error: pg_config executable not found`, install the
> `postgresql` package:
> ```commandline
> conda install postgresql
> ```


### Verify the Installation

Start a Python shell and ensure that Hail and the gnomAD Toolbox are set up correctly:
```python
import hail as hl
import gnomad_toolbox
hl.init()
print("Hail and gnomad_toolbox setup is complete!")
```
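
To see which versions were actually installed without starting Hail, the package metadata can be queried directly. This is a small sketch using only the standard library; it assumes the distribution names are `hail` and `gnomad-toolbox` (the latter taken from the `pip install` command above), and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust them if your installation differs.
    for name in ("hail", "gnomad-toolbox"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Including these versions when reporting a problem on the forum usually makes the issue easier to reproduce.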

---

## Available Example Notebooks

The gnomAD Toolbox includes Jupyter notebooks to help you get started with gnomAD data:

- **Explore Release Data:**
- Learn how to load and inspect gnomAD release data.
- Notebook: `explore_release_data.ipynb`

- **Filter Variants:**
- Understand how to filter variants using different criteria.
- Notebook: `intro_to_filtering_variant_data.ipynb`

- **Perform Secondary Analyses:**
- Explore more advanced analyses using gnomAD data.
- Notebook: `dive_into_secondary_analyses.ipynb`

---

## Run the Example Notebooks Locally
> If you already have experience with Google Cloud and using Jupyter notebooks, you can skip this section and use the
> notebooks in your preferred environment.

Hail can be [initialized](https://hail.is/docs/0.2/api.html#hail.init) with different backends depending on
where you want to run your analysis. For analyses that require a lot of computational resources, a cloud-based
environment will be most suitable.

However, the gnomAD Toolbox example notebooks can be run locally using the
`local` backend. At the beginning of each notebook, Hail is initialized with the `local` backend using:
```python
hl.init(backend="local")
```

To run the example notebooks locally, there are a few additional steps needed to set up your environment:

### Install the Cloud Storage Connector
The gnomAD Hail tables are stored in Google Cloud Storage, and in order to avoid downloading the entire dataset to your local machine,
we recommend using the [Google Cloud Storage Connector](https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage)
to access the data.

The easiest way to install the connector is to use the `install-gcs-connector` script provided by the Broad Institute:
```commandline
curl -sSL https://broad.io/install-gcs-connector | python3 - --auth-type UNAUTHENTICATED
```

### Copy and Open the Notebooks

1. Copy the notebooks to a directory of your choice:
```commandline
copy-gnomad-toolbox-notebooks /path/to/your/notebooks
```
> If the specified directory already exists, either provide a different path or add the `--overwrite` flag to
> overwrite the existing directory:
> ```commandline
> copy-gnomad-toolbox-notebooks /path/to/your/notebooks --overwrite
> ```

2. Start Jupyter with gnomad-toolbox specific configurations:
- For Jupyter Notebook:
```commandline
gnomad-toolbox-jupyter notebook
```
- For Jupyter Lab:
```commandline
gnomad-toolbox-jupyter lab
```

> These commands will start a Jupyter notebook/lab server and open a new tab in your default web browser. The
> notebook directory containing the example notebooks will be displayed.

3. Open the `explore_release_data.ipynb` notebook to learn how to load gnomAD release data:
- Run all cells by clicking on the >> button in the toolbar (shown in the image below) or by selecting "Run All" from the "Cell" menu.
![jupyter notebook -- run all cells](images/jupyter_run_all.png)

4. Explore the other notebooks described above.

5. Try adding your own queries to the notebooks to explore the data further.
> **WARNING:** Avoid running queries on the full dataset as it may take a long time.

---

## Resources

### gnomAD:
- [gnomAD Toolbox Documentation](https://broadinstitute.github.io/gnomad-toolbox/)
- [gnomAD Browser](https://gnomad.broadinstitute.org/)
- [gnomAD Download Page](https://gnomad.broadinstitute.org/downloads)
- [gnomAD Forum](https://discuss.gnomad.broadinstitute.org)

### Hail:
- [Hail Documentation](https://hail.is/docs/0.2/index.html)
- [Hail Discussion Forum](https://discuss.hail.is/)
- [Hail Zulip Chat](https://hail.zulipchat.com/)

---

## Contributing

We welcome contributions to the gnomAD Toolbox! See the [CONTRIBUTING.md](CONTRIBUTING.md) file for more information.

---

## License

This project is licensed under the BSD 3-Clause License. See the [LICENSE](LICENSE) file for details.
45 changes: 14 additions & 31 deletions docs/generate_api_reference.py
@@ -21,16 +21,6 @@

EXCLUDE_PACKAGES = ["tests"]

EXCLUDE_TOP_LEVEL_PACKAGES = []
"""
List of packages/modules to exclude from API reference documentation.

This should be a list of strings where each string is the full name (from the top level)
of a package or module to exclude. For example, if 'gnomad_toolbox' includes a
'example_notebooks' that you want to exclude, you would add
'gnomad_toolbox.example_notebooks' to this list.
"""

PACKAGE_DOC_TEMPLATE = """{title}

{package_doc}
@@ -119,7 +109,11 @@ def write_module_doc(module_name):
    write_file(doc_path, doc)


def write_package_doc(package_name):
def write_package_doc(
    package_name,
    package_doc=None,
    doc_path=None,
):
    """Write API reference documentation file for a package."""
    package = importlib.import_module(package_name)

@@ -139,32 +133,21 @@ def write_package_doc(package_name):

    doc = PACKAGE_DOC_TEMPLATE.format(
        title=format_title(package_name),
        package_doc=package.__doc__ or "",
        package_doc=package_doc or package.__doc__ or "",
        module_links="\n ".join(module_links),
    )

    doc_path = package_doc_path(package)
    doc_path = doc_path or package_doc_path(package)
    write_file(doc_path, doc)


if __name__ == "__main__":
    packages = setuptools.find_namespace_packages(
        where=REPOSITORY_ROOT_PATH, include=["gnomad_toolbox.*"]
    )
    top_level_packages = [
        pkg for pkg in packages if pkg.count(".") == 1 and pkg not in EXCLUDE_TOP_LEVEL_PACKAGES
    ]

    for pkg in top_level_packages:
        write_package_doc(pkg)

    root_doc = PACKAGE_DOC_TEMPLATE.format(
        title=format_title("gnomad_toolbox"),
        package_doc="",
        module_links="\n ".join(
            f"{pkg.split('.')[1]} <{pkg.split('.')[1]}/index>"
            for pkg in top_level_packages
    write_package_doc(
        "gnomad_toolbox",
        package_doc=(
            "This repository provides a set of Python functions to simplify working "
            "with gnomAD Hail Tables. It includes tools for data access, filtering, "
            "and analysis."
        ),
        doc_path=os.path.join(DOCS_DIRECTORY, "api_reference", "index.rst"),
    )

    write_file(os.path.join(DOCS_DIRECTORY, "api_reference", "index.rst"), root_doc)
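
The new `write_package_doc` parameters rely on `or` chains to fall back to defaults. The sketch below isolates that pattern in a standalone, hypothetical helper (`resolve_package_doc` is not in the repository) to show one consequence: any falsy override, including an empty string, falls through to the next candidate.

```python
def resolve_package_doc(override, module_doc):
    """Pick the first truthy value, mirroring `override or module_doc or ""`."""
    return override or module_doc or ""


if __name__ == "__main__":
    # An explicit override wins over the module docstring.
    print(resolve_package_doc("Custom intro.", "Module docstring."))
    # None falls back to the module docstring.
    print(resolve_package_doc(None, "Module docstring."))
    # An empty-string override is falsy, so it also falls back.
    print(resolve_package_doc("", "Module docstring."))
```

If a caller ever needs to force an empty description, an explicit `is None` check would be required instead of `or`.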
1 change: 1 addition & 0 deletions gnomad_toolbox/analysis/__init__.py
@@ -0,0 +1 @@
# noqa: D104
60 changes: 60 additions & 0 deletions gnomad_toolbox/analysis/general.py
@@ -0,0 +1,60 @@
"""Set of general functions for gnomAD analysis."""

from typing import Dict, List, Optional, Tuple, Union

import hail as hl
from gnomad.assessment.summary_stats import freq_bin_expr

from gnomad_toolbox.load_data import _get_dataset


def get_variant_count_by_freq_bin(
    af_cutoffs: List[float] = [0.001, 0.01],
    singletons: bool = False,
    doubletons: bool = False,
    pass_only: bool = True,
    **kwargs,
) -> Dict[str, int]:
    """
    Count variants by frequency bin.

    By default, this function counts PASS variants that are AC0, AF < 0.1%, and
    AF 0.1% - 1%.

    The function can also include counts of singletons and doubletons, with or
    without passing filters.

    .. note::

        This function works for gnomAD exomes and genomes data types, not yet for gnomAD
        joint data type, since the HT schema is slightly different.

    :param af_cutoffs: List of allele frequency cutoffs.
    :param singletons: Include singletons.
    :param doubletons: Include doubletons.
    :param pass_only: Include only PASS variants.
    :param kwargs: Keyword arguments to pass to `_get_dataset`. Includes 'ht',
        'data_type', and 'version'.
    :return: Dictionary with counts.
    """
    # Load the Hail Table if not provided
    ht = _get_dataset(dataset="variant", **kwargs)

    # Filter to PASS variants.
    if pass_only:
        ht = ht.filter(hl.len(ht.filters) == 0)

    # Initialize allele count cutoffs with AC0.
    ac_cutoffs = [(0, "AC0")]

    if singletons:
        ac_cutoffs.append((1, "singletons"))

    if doubletons:
        ac_cutoffs.append((2, "doubletons"))

    freq_expr = freq_bin_expr(
        ht.freq, ac_cutoffs=ac_cutoffs, af_cutoffs=af_cutoffs, upper_af=None
    )

    return ht.aggregate(hl.agg.counter(freq_expr))
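
The binning idea behind `get_variant_count_by_freq_bin` can be illustrated without Hail. The sketch below is an assumption-laden stand-in, not the `freq_bin_expr` implementation from gnomad_methods: it assumes allele-count bins match exactly (AC == cutoff), that the remaining variants fall into half-open AF ranges, and the bin labels and the helper name `freq_bin` are invented for illustration.

```python
from collections import Counter


def freq_bin(ac, af, ac_cutoffs, af_cutoffs):
    """Assign one variant to a bin: exact AC matches first, then AF ranges."""
    # Assumption: AC bins are exact matches, e.g. (1, "singletons") means AC == 1.
    for cutoff, label in ac_cutoffs:
        if ac == cutoff:
            return label
    lower = 0.0
    for upper in sorted(af_cutoffs):
        if af < upper:
            return f"{lower:.1%} - {upper:.1%}"
        lower = upper
    return f">= {lower:.1%}"


if __name__ == "__main__":
    # Made-up (AC, AF) pairs, purely to exercise the binning logic.
    variants = [(0, 0.0), (1, 1e-5), (2, 2e-5), (30, 0.0005), (400, 0.005), (9000, 0.2)]
    counts = Counter(
        freq_bin(ac, af, [(0, "AC0"), (1, "singletons")], [0.001, 0.01])
        for ac, af in variants
    )
    print(dict(counts))
```

In the real function the same counting is done by `hl.agg.counter` over the whole table, so the result is a dictionary keyed by whatever labels `freq_bin_expr` produces.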
7 changes: 7 additions & 0 deletions gnomad_toolbox/configs/jupyter_notebook_config.json
@@ -0,0 +1,7 @@
{
"NotebookApp": {
"nbserver_extensions": {
"jupyter_nbextensions_configurator": true
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
"numberingH1": false,
"includeOutput": false,
"syncCollapseState": false
}
7 changes: 7 additions & 0 deletions gnomad_toolbox/configs/nbconfig/notebook.json
@@ -0,0 +1,7 @@
{
"load_extensions": {
"nbextensions_configurator/config_menu/main": true,
"contrib_nbextensions_help_item/main": true,
"toc2/main": true
}
}
5 changes: 5 additions & 0 deletions gnomad_toolbox/configs/nbconfig/tree.json
@@ -0,0 +1,5 @@
{
"load_extensions": {
"nbextensions_configurator/tree_tab/main": true
}
}
1 change: 1 addition & 0 deletions gnomad_toolbox/filtering/__init__.py
@@ -0,0 +1 @@
# noqa: D104