PRACTICE BRIDGE A methane emissions reduction equivalence framework for alternative leak detection and repair programs



Introduction
The Intergovernmental Panel on Climate Change (IPCC) recently underscored the importance of reducing methane emissions to keep global warming below 1.5°C (IPCC, 2018). However, global natural gas production is increasing, and an estimated 1.7-2.3% of total production (primarily methane) escapes directly to the atmosphere (Alvarez et al., 2018; International Energy Agency, 2017). Leak detection and repair (LDAR) programs are the most common regulatory tool for mitigating fugitive methane emissions (leaks) from upstream oil and gas operations. Historically, LDAR programs have relied on a variety of close-range methods, implemented through the U.S. Environmental Protection Agency's (EPA) Method 21 or the Alternative Work Practice, to perform component-level surveys. Although effective, these approved methods remain labour-intensive (ICF International, 2014, 2015). Recently, new methane-sensing technologies have emerged that promise faster, cheaper, or more effective leak detection (Fox et al., 2019). In response, regulators in Canada and the U.S. have created opportunities for flexible LDAR programs that permit new approaches to detection. However, operators wanting to move from a 'regulatory' to an 'alternative' LDAR program are typically required to demonstrate equivalence in emissions mitigation (Government of Canada, 2018). Regulatory approval of new technologies will be effective only if there is a transparent framework that operators and solution providers can use to demonstrate equivalence.
A broad spectrum of candidate technologies exists for potential integration with alternative LDAR programs. The most common alternative technology classes include handheld instruments, mobile ground labs (Caulton et al., 2018; Yacovitch et al., 2015), unmanned aerial vehicles (Nathan et al., 2015; Barchyn et al., 2017; Golston et al., 2018), stationary sensors (Coburn et al., 2018), manned aircraft (Frankenberg et al., 2016; Terry et al., 2017), and satellites (Jacob et al., 2016). Although initiatives such as the Environmental Defense Fund/Stanford Mobile Monitoring Challenge and the ARPA-E MONITOR program have sought to evaluate some of these technologies, little progress has been made in systematically comparing alternative and conventional LDAR programs. Here, we propose a framework for demonstrating emissions reduction equivalence between LDAR programs.

Framework development
The framework was originally developed at a multi-stakeholder workshop and has since been publicly reviewed at two additional workshops and during a 30-day public comment period. The development process was designed to be transparent and inclusive of diverse opinions and interests. Each of the workshops followed Chatham House rules.
On 25 July 2018, approximately 50 scientists, regulators, operators, consultants, and non-profit organizations gathered at the University of Calgary in Alberta, Canada, to discuss and solicit perspectives on how to demonstrate equivalence for alternative leak detection and repair (LDAR) programs (Figure 1). The workshop was organized into three sets of presentations followed by break-out sessions. Presentations were used to establish a common understanding of the equivalence challenge, including regulatory context, industry needs, and scientific knowledge. During break-out sessions, mixed stakeholder groups of 8-10 participants engaged in semistructured discussions around three themes: (a) Thinking about equivalence, (b) Developing a common framework, and (c) Applying equivalence to specific technologies. The following day, a committee of 8 experts met to distill these conversations into a draft framework. The resulting white paper was publicly distributed, and comments were solicited for 30 days to enable contributions from those unable to attend the workshop.
On 8 January 2019, a second two-day workshop was held at Colorado State University in Fort Collins. The draft framework was presented to 68 stakeholders from Canada and the U.S., proposed amendments were debated, two modifications were made, and the framework was collectively approved. A third workshop, geared specifically towards LDAR solution providers, was held on 14 February 2019; this group was not invited to the earlier workshops to prevent bias and to manage workshop attendance. Participants at Workshop 3 were supportive of the framework, and no further modifications were proposed.

Definitions
In the context of equivalence, it is important to distinguish among technologies, methods, and programs: A technology is a gas sensing instrument, optionally configured with a deployment platform and/or ancillary instruments (e.g. anemometers, positioning), that can be used to gather data on emissions.
A method combines a technology, a work practice, and analytics for use in an LDAR program. A method must clearly state any mandatory actions to be performed as part of the work practice, along with suitable operating conditions for the technology. These can include environmental conditions, limitations on facility types, technology configurations, and survey procedures.
An LDAR program is the systematic implementation of one or more methods across a collection of assets. The program describes the method, or combination of methods, to be used for each facility, along with survey frequency, repair response, and reporting standards. Ultimately, it is the LDAR program that results in emissions mitigation, not the technologies or methods in isolation.
The frequently used term 'technology equivalence' is a misnomer, as no two technologies can be shown to have equivalent mitigation potential outside the context of a method and/or program. Although mitigation equivalence may be demonstrated among methods, it is most generally demonstrated at the program level, for three reasons. First, multiple methods may be used simultaneously in a program. Assessing equivalence for multi-method programs is not as simple as aggregating the mitigation from individual methods, because the same leaks may be detected by more than one method. Second, mitigation is a function of survey frequency, which is typically part of the program, not the method. For example, EPA's Method 21 is a method that can be implemented at different frequencies (e.g. monthly, quarterly, annually) as part of a program to achieve different targets. Third, depending on regulatory language, 'alternative LDAR' does not necessarily require adoption of new technology. Operators may want to use approved methods while otherwise adjusting the program (e.g. definition of a leak, type of equipment surveyed, repair requirements, survey frequency). Operators may even propose different survey protocols for different asset types or locations. Despite using existing methods, these alternative programs may also need to demonstrate equivalence.
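The detection-overlap point can be illustrated with a minimal sketch. It assumes, purely for illustration, that each method detects a given leak independently with a fixed per-survey probability; the function names and probability values are hypothetical and are not drawn from any approved method.

```python
"""Illustration: per-method detection cannot simply be summed in a multi-method program."""


def naive_sum(probabilities):
    # Adding per-method probabilities double-counts leaks that more than one method would find.
    return sum(probabilities)


def combined_detection(probabilities):
    # Probability that at least one method finds the leak, ASSUMING independent detection.
    p_missed = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        p_missed *= 1.0 - p
    return 1.0 - p_missed


if __name__ == "__main__":
    methods = [0.6, 0.5]  # hypothetical screening method and close-range method
    print(f"naive sum: {naive_sum(methods):.2f}")          # 1.10, not a valid probability
    print(f"combined:  {combined_detection(methods):.2f}")  # 0.80
```

In practice, detections by different methods are likely correlated (for example, several methods may all miss very small leaks), so real overlap would differ from this independence assumption; that uncertainty is one reason program-level evaluation is preferred.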

Equivalence framework
We define an equivalence framework as a scientifically rigorous and transparent process that uses a combination of empirical data and modeling to estimate emission reductions from the implementation of an LDAR program and compares this estimate to mitigation from an approved program or a defined target. The reference mitigation achieved by the approved program and the spatial scale of comparison must be specified by the regulator. The proposed framework was designed to be of general interest to regulators developing alternative LDAR policy for conventional and unconventional oil and gas regions and does not account for specific jurisdictional contexts. It consists of five stages (Figure 2):
1. Method identification to assemble and define new methods
2. Controlled testing to evaluate the performance of new methods
3. Simulation modeling to predict the performance of new programs
4. Field trials to establish the operational efficacy of new programs
5. Full approval of the alternative LDAR program
Stages 1 and 2 focus on methods, while subsequent stages require a program. The five stages will require engagement from multiple stakeholders including solution providers, operators, independent evaluators, and regulators. Stakeholders may wish to use the results of one stage to inform progression through the framework. An adaptive feedback process would help transfer experience and knowledge among stages.
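The final comparison in this definition can be written down explicitly. The sketch below is an illustration under stated assumptions rather than a prescribed test: mitigation is expressed as the fractional reduction relative to an unmitigated baseline, the reference can be either an approved program's mitigation or a regulator-defined target, and the optional `margin` parameter is hypothetical.

```python
def mitigation_fraction(baseline_emissions, program_emissions):
    """Fractional emissions reduction achieved by a program relative to an unmitigated baseline."""
    if baseline_emissions <= 0:
        raise ValueError("baseline emissions must be positive")
    return 1.0 - program_emissions / baseline_emissions


def is_equivalent(alternative_fraction, reference_fraction, margin=0.0):
    """True if the alternative program achieves at least the reference mitigation.

    reference_fraction may come from an approved program or a defined target;
    margin is a hypothetical tolerance a regulator might choose (0 means none).
    """
    return alternative_fraction >= reference_fraction - margin


if __name__ == "__main__":
    # Hypothetical annual emissions over the same assets and period (tonnes CH4).
    baseline, reference_program, alternative_program = 1000.0, 400.0, 380.0
    ref = mitigation_fraction(baseline, reference_program)    # 0.60
    alt = mitigation_fraction(baseline, alternative_program)  # 0.62
    print(is_equivalent(alt, ref))  # True
```

Both fractions share the same baseline, spatial scale, and period, so the comparison is like-for-like; in practice each estimate would carry uncertainty that a single point comparison like this one ignores.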

Stage 1: Method identification
Clear method identification is critical to developing effective protocols for testing and evaluation. During Stage 1, applications should be solicited for new methods seeking to demonstrate equivalence. Clusters of similar methods can then be organized and defined to (a) identify common features and constraints, and (b) establish protocols for controlled testing. Specifically, Stage 1 will set group-wise performance metrics that link controlled testing with modeling. Standardized evaluation of multiple methods could improve comparisons, especially during controlled testing, and enable synchronized controlled testing to reduce costs. If the performance of approved methods is unknown, these methods should also be moved through Stages 2-4 to establish a reference mitigation target.

Stage 2: Controlled testing
Standardized controlled testing is necessary to understand method performance. Single-blind controlled field testing, administered by independent experts, should be used to develop performance metrics across a range of operational conditions. In addition to generating performance metrics, controlled field testing can contribute to the development and promulgation of clear and reproducible testing standards. Development of testing protocols should be led by neutral experts, supported by comments from the broader user, developer, and regulatory community. Performance metrics should be carefully selected to ensure valid empirical inputs for the subsequent modeling stage. For each method, detection probabilities should be established under a range of conditions. Additional metrics should also be developed, depending on the method, and could include quantification accuracy, false positive rates, and spatial resolution. A geographically dispersed network of test sites could facilitate testing under different environmental conditions and facility types. Member sites should collectively adhere to testing and reporting standards recognized by all stakeholder groups. Joint funding of this network by stakeholders standing to benefit from emerging methods could minimize the financial burden on solution providers while maintaining the independence of the testing. In addition to testing results, technology developers would also benefit from a testing process that identifies ways to improve their technology, work practice, and analytics. At this early stage, operators may not need to be involved, but may wish to partner with solution providers to develop methods that fit their needs.
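As an illustration of how detection probabilities might be tabulated from single-blind controlled releases, the sketch below groups trials by a test condition and a release-rate bin and reports the fraction detected together with the sample size. The trial tuple layout, condition labels, bin edges, and rate units are hypothetical; a binned estimate is only one option, and a fitted detection curve could be used instead.

```python
from collections import defaultdict


def detection_probability(trials, bin_edges):
    """Estimate detection probability per (condition, release-rate bin).

    trials: iterable of (condition, release_rate, detected) tuples from blind tests.
    bin_edges: ascending release-rate edges; rates outside them are ignored.
    Returns {(condition, low, high): (fraction_detected, n_trials)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [detections, trials]
    for condition, rate, detected in trials:
        for low, high in zip(bin_edges, bin_edges[1:]):
            if low <= rate < high:
                tally = tallies[(condition, low, high)]
                tally[0] += int(detected)
                tally[1] += 1
                break  # bins do not overlap, so stop at the first match
    return {key: (hits / n, n) for key, (hits, n) in sorted(tallies.items())}


if __name__ == "__main__":
    # Hypothetical single-blind results: (wind class, release rate, detected?)
    trials = [
        ("calm", 0.5, False), ("calm", 0.8, True), ("calm", 3.0, True),
        ("windy", 0.6, False), ("windy", 2.0, False), ("windy", 4.0, True),
        ("windy", 12.0, True),
    ]
    for key, (pod, n) in detection_probability(trials, [0, 1, 5, 20]).items():
        print(key, f"POD={pod:.2f}", f"n={n}")
```

With so few trials per cell these estimates would be very uncertain, which is why reporting the number of trials alongside each probability, and planning enough releases per condition, matters for the modeling stage.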

Stage 3: Simulation modeling
Given the highly skewed and stochastic nature of fugitive emissions from upstream oil and gas, fully representative testing of methods and programs would be prohibitively expensive. Simulation modeling is a fast and cost-effective way to evaluate and explore a range of possible LDAR program configurations, forecast performance over long periods, and develop programs with cost- or mitigation-optimized deployment of different methods across a collection of assets. Stage 3 would use the performance metrics developed in Stage 2 as inputs for simulating the mitigation effectiveness of an alternative program. Whereas Stage 2 evaluates LDAR methods for detection effectiveness, Stage 3 evaluates LDAR programs for aggregate mitigation impact. The simulations will estimate total emissions detected over a reporting period, which translate into emissions reductions when repairs are conducted.
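A toy Monte Carlo sketch may help show how Stage 2 outputs could feed Stage 3. It is not FEAST or any established model; every parameter (daily leak onset probability, lognormal leak-size distribution, step-shaped detection function, survey interval, repair delay) is a hypothetical placeholder. Leaks arise at random, surveys detect them with the supplied detection-probability function, and detected leaks stop emitting after a repair delay. Separate random streams keep the leak history identical across program configurations, so differences in emissions reflect the program rather than chance.

```python
import random


def step_pod(rate, threshold=1.0, p_detect=0.9):
    """Hypothetical detection probability: p_detect above a rate threshold, zero below it."""
    return p_detect if rate >= threshold else 0.0


def simulate(survey_interval=None, pod=step_pod, n_sites=100, days=365,
             leak_prob=0.002, repair_delay=14, seed=1):
    """Total emissions (rate units x days) for one program; survey_interval=None is the no-LDAR baseline."""
    leak_rng = random.Random(seed)        # drives leak onset and size only
    detect_rng = random.Random(seed + 1)  # drives detection outcomes only
    leaks = []  # each leak is [emission_rate, scheduled_repair_day or None]
    total = 0.0
    for day in range(days):
        for _ in range(n_sites):  # each site may develop a new leak each day
            if leak_rng.random() < leak_prob:
                leaks.append([leak_rng.lognormvariate(0.0, 1.6), None])
        # Surveys fall on the last day of each interval (e.g. days 89, 179, ... for 90).
        if survey_interval and (day + 1) % survey_interval == 0:
            for leak in leaks:
                if leak[1] is None and detect_rng.random() < pod(leak[0]):
                    leak[1] = day + repair_delay
        # Leaks whose repair day has arrived stop emitting; the rest emit for this day.
        leaks = [leak for leak in leaks if leak[1] is None or leak[1] > day]
        total += sum(leak[0] for leak in leaks)
    return total


if __name__ == "__main__":
    baseline = simulate()
    for interval in (180, 90, 30):
        mitigated = 1.0 - simulate(survey_interval=interval) / baseline
        print(f"survey every {interval:3d} days: mitigation = {mitigated:.0%}")
```

Under these assumptions, more frequent surveys tend to yield greater mitigation; a jurisdiction's own leak-size distributions and activity factors, and method-specific detection functions from Stage 2, would replace the placeholders.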
Controlled testing standards, protocols, and outputs must be designed with modeling needs in mind, and models should strive to represent new technologies accurately. To date, modeling tools such as FEAST have been used to compare the costs and mitigation effectiveness of LDAR programs (Kemp et al., 2016). Extending such tools to demonstrate LDAR program equivalence may require new functionality. Simulation models must balance fidelity, data input requirements, accessibility, and resilience to gaming. Developers of prospective methods should be encouraged to experiment with models before entering Stage 1, both to estimate performance requirements and to avoid committing resources to testing before their methods are ready. By the end of Stage 3, technology developers could decide whether to adjust their method and perform more testing before moving forward in the framework. In most cases, modeling results would need to show improved emissions mitigation and/or reduced costs to be attractive to operators, who are typically responsible for submitting alternative LDAR program applications. Regulators can help guide the development of modeling and reporting requirements. Jurisdictions should provide representative inputs (e.g. baseline emissions, leak-size distributions, and activity factors) that best reflect the assets under their control, because high-quality empirical inputs generate more accurate outputs that can improve decision-making.
The modeling stage is focused primarily on emissions mitigation and is unlikely to incorporate cost. For a proposed program to be attractive, operators and solution providers will need to develop cost models in parallel to determine whether the program warrants further work.

Stage 4: Field trials
Controlled testing and modeling may fail to capture the full scope of real-world performance, including unforeseen human and environmental factors that may only become apparent during deployment. Data from the field may also help improve models and build confidence in their predictions. We therefore recommend field trials to evaluate performance in operational conditions and demonstrate the efficacy of candidate programs.
To initiate Stage 4, operators and technology developers would work together to develop an alternative LDAR program application for submission to the regulator. The submission would include results from Stages 2 and 3, methods to be used, survey frequencies, reporting protocols, a field trial plan, and other information relevant to the alternative program. The duration of the field trial (in time or number of surveys) and the number of assets involved should be specified in the application to ensure a representative sample size. The regulator would review the application package and, if satisfactory, may approve the proposed field trial.
During Stage 4, the alternative program would be implemented on a specified proportion of the operator's assets. Evaluating field trial effectiveness could take several forms. At minimum, a brief field trial should be required to troubleshoot unanticipated issues not accounted for or predicted by modeling. The bigger goal, evaluating mitigation effectiveness in the field, is complicated because (a) true emission rates are unknown, (b) most quantification methods are highly uncertain, and (c) obtaining a representative sample that includes 'super-emitters' (a small number of high-emitting sources) may be cost-prohibitive. Limited insights into mitigation effectiveness may be gleaned by deploying alternative technologies alongside conventional LDAR and a range of component- to facility-scale quantification techniques. Selecting a field trial approach may depend on the jurisdictional policy context. For example, Canadian regulatory language considers operators to be in compliance during field trials. However, in other jurisdictions, a full regulatory LDAR program must be implemented in conjunction with field trials.

Stage 5: Full approval
Results from all stages would be communicated to the regulator by the applicant for evaluation and full approval. Program auditing and compliance details may be included in the application, along with the program scope. Upon full approval, operators could substitute the new method for existing (approved) LDAR methods. Regulators should consider developing commensurate frameworks for approvals, allowing alternative programs approved in one jurisdiction to be more easily approved in another, with necessary adjustments for geographical constraints, gas compositions, weather, and other relevant factors.

Exceptions
Should a regulator choose to adopt a version of this framework, they may wish to identify situations in which exceptions are warranted. We provide three examples, but others likely exist:
1. Operators may wish to implement an alternative LDAR program using approved methods. In such a case, Stages 1-2 of the framework could be skipped.
2. A novel method may be proposed that is very similar to an approved method. In such a case, a reduced suite of controlled testing scenarios may be warranted.
3. A company may want to implement an alternative practice specific to an individual facility or type of facility. In such a case, it may be reasonable to skip directly to field trials.
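One way a regulator might encode these examples is as a simple decision rule. The sketch below is hypothetical: the flag names, the `Pathway` record, and especially the precedence among exceptions are assumptions, since the framework does not specify how overlapping exceptions should interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    stages: tuple          # framework stages still to be completed
    reduced_testing: bool  # whether a reduced suite of controlled tests may suffice


def pathway(approved_methods_only=False, similar_to_approved=False, facility_specific=False):
    """Hypothetical mapping from the three example exceptions to framework stages.

    The precedence order below is an assumption; a regulator would define its own.
    """
    if facility_specific:
        # Exception 3: practice specific to one facility or facility type; go to field trials.
        return Pathway(stages=(4, 5), reduced_testing=False)
    if approved_methods_only:
        # Exception 1: only approved methods are used, so Stages 1-2 may be skipped.
        return Pathway(stages=(3, 4, 5), reduced_testing=False)
    # Default path; Exception 2 keeps every stage but may shrink Stage 2 testing.
    return Pathway(stages=(1, 2, 3, 4, 5), reduced_testing=similar_to_approved)


if __name__ == "__main__":
    print(pathway(similar_to_approved=True))
```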

Challenges
Several unresolved challenges must be considered before adopting this framework. We arrange these challenges into five broad categories: controlled testing, modeling, scale and source disambiguation, human factors, and logistics.

Controlled testing
Each method has a unique set of environmental and operational variables that influence its performance. These variables must be identified and incorporated into testing, and the results must be used to inform when, where, and how new methods should be deployed. Omitting critical variables from testing can cause two distinct problems. First, a method could be tested beyond its optimal use case. As a hypothetical example, a method may work at only 10% of sites but detect 90% of emissions at those sites. Tested across all sites without this context, the method could be characterized as detecting only 9% of emissions (10% × 90%). Similar considerations may apply to meteorology, daylight, topography, work practice, or other method-specific factors. Second, if performance metrics are developed under optimal measurement conditions, a method may fail to achieve the anticipated mitigation when deployed. A potential solution is to develop 'operational envelopes' that define the measurement conditions under which a method can be deployed. Operational envelopes would reflect testing conditions and could expand over time with additional testing to access new markets. Criteria for expanding operational envelopes must balance the cost and scope of testing, as fully representative testing under all environmental conditions is not practical.
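The hypothetical example above, and the operational-envelope remedy, can be expressed in a few lines. The sketch assumes emissions are spread evenly across sites, so that the share of sites inside the envelope equals the share of emissions; the `Envelope` fields (a wind-speed range and a daylight requirement) are illustrative choices, not conditions proposed by the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operational envelope for one method (illustrative fields only)."""
    min_wind_m_s: float
    max_wind_m_s: float
    daylight_required: bool

    def permits(self, wind_m_s, is_daylight):
        # A survey counts only if conditions fall inside the tested envelope.
        if self.daylight_required and not is_daylight:
            return False
        return self.min_wind_m_s <= wind_m_s <= self.max_wind_m_s


def blended_detection(share_in_envelope, detection_in, detection_out=0.0):
    """Emission-weighted detection if the method is applied everywhere, inside and outside its envelope."""
    return share_in_envelope * detection_in + (1.0 - share_in_envelope) * detection_out


if __name__ == "__main__":
    print(f"{blended_detection(0.10, 0.90):.0%}")  # 9%: tested beyond its use case
    envelope = Envelope(min_wind_m_s=1.0, max_wind_m_s=8.0, daylight_required=True)
    print(envelope.permits(wind_m_s=4.5, is_daylight=True))   # True
    print(envelope.permits(wind_m_s=4.5, is_daylight=False))  # False
```

Reporting the 90% figure together with its envelope, rather than the blended 9%, would let deployment be restricted to conditions where performance has actually been measured.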

Modeling
Critical modeling challenges include establishing baseline emissions distributions, specifying functionality, and validating models. Emission rates in leak simulation models are sampled from empirical distributions. Several distributions exist, but they differ by basin and represent only a snapshot in time. For most producing regions, data are incomplete, dated, or nonexistent. Acquiring detailed baseline measurements is time-consuming and expensive, and the measurements can suffer from bias or uncertainty. Sample sizes must be large to capture super-emitters; recent studies have demonstrated that the top 5% of sources typically contribute 50% of total emissions. Chance variability in the presence and magnitude of super-emitters can therefore produce markedly different distributions, which may favour certain methods over others during modeling.
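The effect of super-emitters on sampled distributions can be explored with a short simulation. It assumes, for illustration only, that source emission rates follow a lognormal distribution with log-scale standard deviation σ = 1.645; for a lognormal, the largest 5% of sources then contribute Φ(σ - 1.645) = Φ(0) = 50% of the total, matching the figure above. Real basin distributions need not be lognormal, and the function names are hypothetical.

```python
import random
import statistics


def sample_emissions(n, sigma=1.645, seed=None):
    """Draw n hypothetical source emission rates from a lognormal distribution (median 1)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]


def top_share(rates, top_fraction=0.05):
    """Share of total emissions contributed by the largest top_fraction of sources."""
    ordered = sorted(rates, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    print(f"top 5% share (n=100000): {top_share(sample_emissions(100_000, seed=0)):.2f}")
    # Replicate small and large surveys to see how much the estimated mean rate moves.
    for n in (50, 5_000):
        means = [statistics.mean(sample_emissions(n, seed=s)) for s in range(10)]
        print(f"n={n:>5}: mean rate ranges from {min(means):.2f} to {max(means):.2f}")
```

The population mean under these assumptions is exp(σ²/2) ≈ 3.87; small replicate surveys tend to scatter widely around it, largely depending on whether a few very large sources happen to be sampled.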
Optimizing model functionality and accessibility is an unresolved challenge. On one hand, modeling tools that are intuitive, accessible, and transparent will be widely used and accepted. On the other hand, these tools must be able to accommodate a broad diversity of methods, environments, and policy contexts. With insufficient functionality, methods could be excluded or poorly represented. Finally, model validation is difficult. While detection modules for individual methods may be validated in the field, validation of long-term programs with multiple methods will be challenging.

Scale and source disambiguation
Method 21 and handheld cameras are well established in LDAR because they enable component-level detection, which can often lead to immediate diagnosis and repair. However, many emerging methods propose rapid screening for aggregate emissions, often at the facility scale. If anomalous emissions are detected during screening, close-range methods must be deployed to confirm and diagnose the source. Screening-based programs must therefore articulate when and how repair teams will respond to detection events. Screening is also sensitive to confounding sources. For example, most sites are permitted to vent within legal limits, and these vented emissions can be difficult to distinguish from fugitive emissions. The possible presence of confounding sources may increase the rate of false positives, leading to unnecessary follow-up surveys.
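A screening-based program therefore needs an explicit follow-up rule. The sketch below shows one hypothetical form: a close-range survey is dispatched when a facility-scale measurement exceeds the site's expected (permitted) venting by more than a threshold. The threshold, the venting estimate, and the function names are assumptions for illustration; applying the rule to sites known to be leak-free gives a rough false-positive rate.

```python
def needs_follow_up(measured_rate, expected_venting, threshold):
    """Dispatch a close-range survey if screening exceeds expected venting by more than threshold."""
    return measured_rate - expected_venting > threshold


def false_positive_rate(leak_free_measurements, expected_venting, threshold):
    """Share of leak-free sites that would still trigger follow-up (e.g. from unusually high venting)."""
    if not leak_free_measurements:
        raise ValueError("need at least one measurement")
    flagged = sum(needs_follow_up(m, expected_venting, threshold) for m in leak_free_measurements)
    return flagged / len(leak_free_measurements)


if __name__ == "__main__":
    # Hypothetical facility-scale screening results at leak-free sites (same rate units).
    leak_free = [1.0, 2.0, 3.5, 4.0]
    for threshold in (1.0, 2.5):
        print(threshold, false_positive_rate(leak_free, expected_venting=2.0, threshold=threshold))
```

Raising the threshold reduces unnecessary follow-up surveys but lets more small leaks go undetected, so the chosen value would itself need to be represented when the program is simulated.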

Human factors
Human dimensions of LDAR have not been studied. In the context of equivalence, the biggest challenges are system gaming and post-approval incentivization. If the framework is recognized, it will become a barrier to market access for emerging solution providers, who will face pressure to succeed. Simulation models must be protected from directed attacks, such as modifying source code to improve results or selectively editing input data sets. Less nefarious deception, such as selective reporting of data or results, could occur. These and other temptations must be thoroughly considered, and preventative measures implemented. Model results should also be reviewed for accidental misapplication (e.g. inappropriate selection of input data).
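One possible preventative measure against silent modification of model code or input data is to record a cryptographic fingerprint of both with every submitted result, so that an evaluator can rerun the model and check that nothing has changed. This is an assumption about how a program administrator might operate, not a requirement of the framework, and it does nothing to prevent selective reporting; the function name and example inputs are hypothetical.

```python
import hashlib
import json


def fingerprint(model_source: bytes, inputs: dict) -> str:
    """SHA-256 fingerprint of model code plus canonicalized input data."""
    # Canonical JSON (sorted keys, fixed separators) so equivalent inputs hash identically.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":")).encode("utf-8")
    combined = hashlib.sha256()
    # Hash each part separately so bytes cannot shift between code and inputs undetected.
    combined.update(hashlib.sha256(model_source).digest())
    combined.update(hashlib.sha256(canonical).digest())
    return combined.hexdigest()


if __name__ == "__main__":
    source = b"def simulate(...): ..."  # placeholder for the registered model code
    inputs = {"leak_size_distribution": "basin_A_2019", "survey_interval_days": 90}
    registered = fingerprint(source, inputs)
    tampered = dict(inputs, survey_interval_days=30)
    print(registered == fingerprint(source, inputs))    # True
    print(registered == fingerprint(source, tampered))  # False
```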
The framework faces two broad incentivization challenges: program improvement and program compliance. First, solution providers may be disincentivized to improve work practices, technologies, and analytics once they are approved if modifications risk voiding approval. To prevent stagnation, efficient approval mechanisms for updates should be implemented. Second, service providers must be incentivized to abide by approved programs as defined, particularly when human intervention is required to complete the approved process. For example, technicians may face pressures from employers, operators, or unpleasant working conditions (e.g. excessive cold or heat) that may decrease program effectiveness relative to the approved program. Performance may also vary with user experience, and method-specific training protocols should be defined and implemented.

Logistics
Several logistical challenges remain unresolved. Stakeholder roles and responsibilities must be defined, including funding sources, oversight of controlled testing, and development, management and administration of the simulation model. Transparency of the demonstration process must be established, including whether performance results are made public, standards for protection of intellectual property, and whether approved methods and programs are made available for all operators. These challenges may be resolved according to jurisdictional differences in regulatory language. Jurisdictions should work together to ensure that definitions and approval standards are consistent to minimize redundant bureaucratic barriers to approval.

Conclusions
This framework is a first step towards encouraging adoption of alternative LDAR programs. Implementation should strive to balance rigor and practicality. If hurdles are too great, operators will avoid alternative methods and settle for regulatory LDAR, which may lower investment, stifle innovation, and limit our ability to reduce emissions and learn about new methods through deployment. However, rigor in framework implementation is necessary, as failure to curb fugitive emissions may not be evident due to the challenge of tracking baseline emissions. Moving forward, work will be required to consolidate, refine, and execute the framework. First, relevant stakeholders should be identified, and their roles clearly defined. Regulators may want to take responsibility for leading the development of a formalized framework with clear and detailed criteria for demonstration. A collaborative network of controlled testing sites should be developed, with broad geographical representation, reliable funding, and independent operation. New methods must be formalized, and testing protocols developed. Open-source simulation models should be developed to be flexible, transparent, robust, and accessible. Communication networks among regulators should be established to facilitate inter-jurisdictional translatability of methods and programs. Finally, the framework should evolve to have specific guidelines for each stage, providing all stakeholders a clear understanding of the resources required to develop and implement new methods and programs.