Mabel edited this page Sep 19, 2022 · 88 revisions

Welcome to the Conversion Rate metrics-models wiki!

This is the homepage for CHAOSS' new draft metrics model for Conversion Rate.

From Issue #305: The conversion rate seeks to answer the question: What are the rates at which new contributors become more sustained contributors? The conversion rate metric is primarily aimed at identifying how new community members become more sustained contributors over time. However, the conversion rate metric can also help understand the changing roles of contributors, how a community is growing or declining, and paths to maintainership within an open source community.

Objectives (why)

  • Observe if new members are becoming more involved with an open source project
  • Observe if new members are taking on leadership roles within an open source project
  • Observe if outreach efforts are generating new contributors to an open source project
  • Observe if outreach efforts are impacting roles of existing community members
  • Observe if community conflict results in changing roles within an open source community
  • Identify casual, regular, and core contributors

Conversion Rate - Roadmap

This repository currently contains the code for the conversion rate metric model itself and a proposed testing framework for metric models. The model is currently available as an optional, standalone plugin to the GrimoireLab stack. It will be integrated into CHAOSS Compass (https://github.com/open-metrics-code), along with other metric models, once work on the CHAOSS Compass foundations is completed; this integration is planned for the near future. Before integration, it should be cross-checked against the parallel implementation by @Tieway59.

The current aims are to make the workflow smoother for GitHub, create a more flexible and fully featured visualization, and flesh out the documentation for the config file. After that, the plan is to add more data sources and a manual data input, and possibly a conversion rate config file generator tool. There are also some non-essential bugs to fix; they do not affect the conversion rate as it stands and are largely just quirks. Next, we will add a few more contribution types, including ones that distinguish leadership roles, both manually and by trace data (which also allows more developer levels).

The code can be easily tweaked to work with CHAOSS' Augur stack in the future. A sortinghat tools module may have to be bundled with the metric model. Currently, each run of the metric model code calculates ONE type of conversion rate (for example, level X to level Y only).

Conversion Rate - Types of Contributions Supported (by platform)

  1. Github: Creation of issues, creation of PRs, various actions taken on "Github Issues" (that is, either a PR or an issue), and comments on both issues and PRs. Some, but not all, of the events listed at https://docs.github.com/en/developers/webhooks-and-events/events are supported. Each supported event can be assigned to one or more "developer funnel" levels in the conf.yaml file for flexible analysis. Note that in projects.json, the data source must be set to github to collect any of this data. The event types currently tracked by the metric model are:

    • 'AddedToProjectEvent'
    • 'ClosedEvent'
    • 'CreatedEvent'
    • 'CreatedPREvent'
    • 'CrossReferencedEvent'
    • 'LabeledEvent'
    • 'MergedEvent'
    • 'MovedColumnsInProjectEvent'
    • 'PullRequestReview'
    • 'RemovedFromProjectEvent'
    • 'UnlabeledEvent'
    • 'UpdatedCommentOnIssueEvent'
    • 'UpdatedCommentOnPREvent'
  2. Others coming soon
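
As a rough illustration of assigning tracked events to funnel levels, the sketch below uses a plain Python dictionary. The level names `d1` and `d2`, the particular assignment of events, and the helper `levels_for_event` are all hypothetical; the real conf.yaml keys and layout may differ.

```python
# Hypothetical assignment of tracked GitHub event types to funnel levels.
# This is NOT the actual conf.yaml schema, only an illustration of the idea
# that one event type can count toward one or more levels.
LEVEL_EVENTS = {
    "d1": {"CreatedEvent", "UpdatedCommentOnIssueEvent", "LabeledEvent"},
    "d2": {"CreatedPREvent", "PullRequestReview", "MergedEvent", "LabeledEvent"},
}


def levels_for_event(event_type, level_events=LEVEL_EVENTS):
    """Return the sorted list of funnel levels an event type counts toward."""
    return sorted(
        level for level, events in level_events.items() if event_type in events
    )


print(levels_for_event("LabeledEvent"))  # assigned to both levels here
print(levels_for_event("MergedEvent"))   # assigned to d2 only
print(levels_for_event("ClosedEvent"))   # tracked, but unassigned in this sketch
```

An event type left out of every level simply does not count toward any funnel level in this sketch.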

Conversion Rate - Calculation Modes Possible

  • Cutoff-based: Use this mode for numerical-based conversion rate calculation. This is the method most similar to that seen in CHAOSS metrics such as New Contributors and Occasional Contributors.
    • Example: Level d1 requirement: $1 \leq c < 10$ where $c =$ number of contributions in specified denominator tracking interval. Level d2 requirement: $10 \leq c < \infty$ where $c =$ number of contributions in specified numerator tracking interval.
  • Action-based: Use this mode to assign contributors to funnel levels based on the types of contributions they make. If the cutoff-based method is not also used in conjunction (which is recommended), the contribution types assigned to the upper level must NOT be a superset of those assigned to the lower level; otherwise the calculation does not make sense. The two methods can also be used together, but the results may be harder to interpret because there are multiple variables.
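
The two modes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the plugin's actual code: it assumes the rate is the share of contributors who meet the d1 cutoff in the denominator interval and also meet the d2 cutoff in the numerator interval, and the function names `conversion_rate` and `upper_is_superset` are hypothetical.

```python
import math
from collections import Counter


def conversion_rate(denominator_events, numerator_events,
                    d1=(1, 10), d2=(10, math.inf)):
    """Cutoff-based rate for one pair of tracking intervals.

    Each argument is an iterable of contributor uuids, one entry per
    contribution made in that interval. The cutoffs are half-open ranges
    [low, high), matching the example 1 <= c < 10 and 10 <= c < infinity.
    """
    denom_counts = Counter(denominator_events)
    numer_counts = Counter(numerator_events)
    # Contributors who satisfy the d1 requirement in the denominator interval.
    d1_members = {u for u, c in denom_counts.items() if d1[0] <= c < d1[1]}
    # Of those, contributors who satisfy the d2 requirement in the numerator interval.
    converted = {u for u in d1_members if d2[0] <= numer_counts[u] < d2[1]}
    if not d1_members:
        return None  # rate is undefined when nobody qualifies for d1
    return len(converted) / len(d1_members)


def upper_is_superset(lower_events, upper_events):
    """Action-based sanity check: True if the upper level's event types
    include every event type of the lower level."""
    return set(upper_events) >= set(lower_events)


denominator = ["a"] * 2 + ["b"] * 3 + ["c"] * 12  # c exceeds the d1 cutoff
numerator = ["a"] * 10 + ["b"] * 4
print(conversion_rate(denominator, numerator))  # a converts, b does not
print(upper_is_superset({"CreatedEvent"}, {"CreatedEvent", "MergedEvent"}))
```

A check like `upper_is_superset` could be run on the configured levels before an action-based calculation that does not also use cutoffs.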

Conversion Rate - Constraints on Multiple Conversions

  1. Multiple conversions of the same uuid over time are not permitted unless the option allow_multiple_conversions is set to True. The recommended value is False because of the following scenario. (It does not matter whether January - June is the first collection interval for the metrics model or whether there are earlier ones.) Assume that person 'a6ad4707985ba75c0c932b23f5a1b4c7d6ed5bbd' made 2 qualifying D1 contributions during May, which falls within the collection interval of January - June (inclusive). By the end of July, the same person had made 3 qualifying D2 contributions, so this counts as a conversion for the month of July. However, when the collection interval moves to February - July, the same 2 D1 and 3 D2 contributions are counted again, and another conversion is recorded in August for the same person and the same actions. When allow_multiple_conversions is set to False, only ONE conversion (the first) is recorded for a given uuid.
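
The effect of the option can be sketched as a simple filter over candidate conversions. This is an illustrative sketch only: the function `record_conversions` and the `(period, uuid)` tuple layout are assumptions, and only the option name allow_multiple_conversions comes from the description above.

```python
def record_conversions(candidates, allow_multiple_conversions=False):
    """Filter candidate conversions, given in chronological order.

    candidates: iterable of (period, uuid) pairs, where period is a label
    such as "2022-07". With allow_multiple_conversions=False, only the first
    conversion seen for each uuid is kept.
    """
    seen = set()
    recorded = []
    for period, uuid in candidates:
        if not allow_multiple_conversions and uuid in seen:
            continue  # same person already converted in an earlier period
        seen.add(uuid)
        recorded.append((period, uuid))
    return recorded


person = "a6ad4707985ba75c0c932b23f5a1b4c7d6ed5bbd"
candidates = [("2022-07", person), ("2022-08", person)]

# Default behaviour: the repeated August conversion is dropped.
print(record_conversions(candidates))
# With the option enabled, both entries are kept.
print(record_conversions(candidates, allow_multiple_conversions=True))
```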

Conversion Rate - Levels of Analysis

  1. Repo: To be used on a single repo.
  2. Project: To be used on a collection of repos that comprise a project.
  3. Community: To be used on entire communities or organizations [to be implemented]. The keywords project and community can refer to the same thing in some cases, but not when there are multiple levels (see Kubernetes, where a SIG is an example of such a level).

Conversion Rate - Metric Model Workflow

[Figure: metric model workflow (Screen Shot 2022-09-07 at 11 10 57 PM)]

Conversion Rate - Example Calculation Intervals Explained

[Figure: example calculation intervals (Screen Shot 2022-08-26 at 6 11 24 PM)]

The month offset between the numerator and denominator intervals allows time for "conversions" to happen.
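
One way to picture the offset is to generate pairs of sliding intervals. The sketch below is an assumption-heavy illustration: the window length of six months, the one-month offset, the one-month step, and the names `month_add` and `tracking_intervals` are all hypothetical and may not match the intervals the metric model actually uses.

```python
def month_add(year, month, delta):
    """Shift a (year, month) pair by delta months, handling year rollover."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def tracking_intervals(start_year, start_month,
                       window_months=6, offset_months=1, steps=3):
    """Yield (denominator, numerator) intervals for a running calculation.

    Each interval is ((year, month) of first month, (year, month) of last
    month), inclusive. The numerator interval is the denominator interval
    shifted forward by offset_months; both slide by one month per step.
    """
    for step in range(steps):
        d_start = month_add(start_year, start_month, step)
        d_end = month_add(*d_start, window_months - 1)
        n_start = month_add(*d_start, offset_months)
        n_end = month_add(*d_end, offset_months)
        yield (d_start, d_end), (n_start, n_end)


for denominator, numerator in tracking_intervals(2022, 1, steps=2):
    print("denominator:", denominator, "numerator:", numerator)
```

Under these assumptions, a January - June denominator interval is paired with a February - July numerator interval, and the next step moves both forward by one month.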

Functionality Developed as a part of GSoC

Over the course of Summer 2022, I developed a metric model workflow that takes as input Grimoire ELK enriched data from the Github, GithubQL, and Github2 enrichers, performs a combine phase (secondary enrichment), and finally computes a conversion rate time series that is specific to the configuration settings chosen by the user in conf.yaml. This last "enrichment" writes its data to an Elasticsearch or OpenSearch database for visualization. The project was divided into 4 phases: understanding the enrichers and the GrimoireLab codebase, writing code for the secondary enrichment, writing code for calculating the conversion rate, and then incorporating more features as described in the conf.yaml. Developing the algorithm that calculates the conversion rate time series (over a running interval) was more complex and had more caveats than the proposal anticipated, so it took several weeks to settle on a flexible implementation that handles the edge cases and a wide variety of use cases. Pilot studies and testing were run at the repo level with the GrimoireLab and Augur repositories.