Domain Editor-in-Chief: Detlev Helmig; University of Colorado Boulder, US
Guest Editor: Paul Palmer; The University of Edinburgh, UK

## 1. Introduction

Inversion of atmospheric tracers (Tarantola 2005) is a widely used method to determine the surface fluxes of greenhouse gases (GHGs) such as carbon dioxide (CO2) (Tans et al. 1990; Enting and Mansbridge 1989) using observations from various observing platforms such as towers (Richardson et al. 2012, Andrews et al. 2014, Miles et al. 2016, Richardson et al. 2016), aircraft (e.g. Gerbig et al. 2003), ground-based remote sensing (e.g. Wunch et al. 2010) and more recently satellite measurements (Crisp et al. 2004). Inverse methods have the potential to support both global- and local-scale GHG emissions monitoring (Pacala et al., 2010), pending adequate accuracy and precision of the inverse flux estimates. Inverse GHG flux estimate methods require an accurate representation of the physical relationship between the GHG mole fractions measured by these various observing platforms and the original surface fluxes of GHGs (Enting 2002). If atmospheric transport uncertainty is large, the observational data have limited impact in determining the true GHG surface fluxes. However, atmospheric transport errors are not typically rigorously quantified for inverse flux estimates, but research suggests that transport errors have a large impact on atmospheric GHG mole fraction estimates (Lin and Gerbig 2005, Díaz-Isaac et al. 2014) and flux estimates (Baker et al 2006, Lauvaux et al. 2009, Lauvaux and Davis 2014). Few studies have attempted to assimilate jointly meteorological observations and atmospheric greehouse gas measurements (e.g. Kang et al., 2012; Agustí-Panareda et al., 2016). Therefore, improving atmospheric transport model performance should substantially improve the accuracy and precision of inverse GHG flux estimates.

An objective of the Indianapolis Flux Experiment (INFLUX) is to quantify and improve the precision and accuracy of atmospheric inverse methods for determining urban CO2 emissions (Davis et al. 2016, this issue). Improving the accuracy and precision of atmospheric transport model solutions is thus a critical element of this study. Methods for improving transport model solutions can be divided into four broad classifications, increasing model resolution, improving model physical parameterizations, improving input data used to drive the simulation, and assimilation of meteorological observations.

Law et al. (2003) attempted to reduce transport model errors by increasing model resolution, potentially reducing the representation errors affecting global scale models (Ahmadov et al. 2007, Carouge et al. 2010). Higher resolutions such as regional applications of inverse modeling (e.g. Schuh et al., 2010; 2013; Lauvaux et al. 2016), demand greater atmospheric transport fidelity due to highly dynamic continental meteorology (Law et al. 2008), and the presence of observations obtained largely within the planetary boundary layer (PBL) and in close proximity to strong sources and sinks (e.g. Peters et al. 2007, Miles et al. 2016, this issue). Simply increasing resolution is thus unlikely to solve the issue of transport model accuracy since resolution and complexity increase jointly. In addition, increased resolution does not ensure accuracy in model physical parameterizations or input data. Urban inversions are thus likely to suffer from significant transport errors similar to regional and global inversions (Feng et al. 2016).

The Weather Research and Forecasting (WRF) model is a state-of-the-science community-supported numerical weather prediction (NWP) and atmospheric simulation system. WRF has been used worldwide for both research and operational applications (Skamarock et al. 2008). Its ability to simulate atmospheric processes relevant to atmospheric transport and dispersion has been tested widely (Cintineo et al. 2014, Clark et al. 2015, Coniglio et al. 2013, Hariprasad et al. 2014, Rogers et al. 2013, Lauvaux et al. 2013, Karion et al. 2015). In addition to its advanced numerical scheme and continuously upgraded array of model physics parameterizations, WRF has a four dimensional data assimilation (FDDA) capability implemented by Penn State University (Deng et al. 2009) that allows meteorological observations to be continuously assimilated, allowing WRF to produce dynamic analyses at user-desired resolution. Rogers et al. (2013) investigated the effect of various FDDA strategies on the accuracy of the WRF-simulated mesoscale features over the Central Valley of California. The optimal FDDA configuration identified in Rogers et al. (2013) has been used in many recent modeling studies involving studying GHGs (Lauvaux et al. 2013, Karion et al. 2015).

Towards that end, we test the impact of assimilating various meteorological data on the transport model accuracy for the INFLUX region, and the effect of assimilating these meteorological data sources on the inverse CO2 fluxes derived for the city. Since wind speed, wind direction, and the depth of mixing in the PBL are expected to be critical to accurate and precise GHG flux estimation, the regional operational weather observation network was enhanced with a commercial compact Halo Streamline Doppler lidar (Pearson et al. 2009) capable of continuous observations of those key atmospheric variables. The lidar data complement the standard WMO rawinsondes (only available at 00 and 12 UTC), and aircraft measurements from the commercial airline program Aircraft Communications Addressing and Reporting System (ACARS, Anderson 2010). We evaluate the performance of the inversion system over 2 months (September-October 2013) using WRF atmospheric simulations that assimilate different combinations of data sources from WMO surface stations, the lidar, and in-situ data from ACARS, and assess the overall impact on WRF performance for each of the meteorological instrumentation used in our assimilation system. Finally, we conduct several sensitivity studies to assess the impact of these different meteorological data assimilation choices on the inverse CO2 flux estimates obtained over the two-month period. Section 2 briefly describes the WRF model used in this study. Model configurations and experimental design are discussed in Section 3. Model results and discussions are given in Section 4, and a summary and conclusions are given in Section 5. This study complements a companion study by Sarmiento et al (2016, this issue) that explores the dependence of model performance on both the choice of land surface and PBL parameterizations, and the input data used to describe the urban surface.

## 2. Model description

The modeling system used in this research consists of a mesoscale atmospheric modeling component that handles forward transport and dispersion, a Lagrangian Particle Dispersion Model (LPDM, Uliasz 1994) that is run backward in time and driven by the flow fields simulated by the atmospheric transport model, WRF, and a Bayesian the inversion modeling system to compute the posterior fluxes given the prior fluxes, the transport adjoint solutions, and the estimates of prior emissions errors, transport model errors and measurement errors. The mesoscale atmospheric model used here is the Weather Research and Forecasting (WRF) model, a mesoscale atmospheric modeling system that has been used worldwide for both research and operational applications (Skamarock et al. 2008). WRF’s development is supported by the broad scientific community, along with very active participation of university scientists worldwide. WRF has a flexible, portable code that runs efficiently in computing environments ranging from massively parallel supercomputers and clusters to laptops.

WRF is a non-hydrostatic, fully compressible three dimensional (3D) primitive equation model with a terrain-following, hydrostatic pressure vertical coordinate, and is designed for simulating atmospheric phenomena across scales ranging from large eddies (~100 m) to mesoscale circulations and waves (~1 km to 100 km) to synoptic-scale weather systems (~1000 km). These applications include real-time NWP, model physics research, regional climate simulation, hazard prediction modeling and, with the addition of a chemistry module (WRF-Chem, Grell et al. 2005), air-quality studies.

The WRF model includes a complete suite of atmospheric physical processes that interact with the model’s dynamics and thermodynamics core. These physical processes include cloud microphysics (MP), cumulus parameterization needed on coarser grids (Δx > O(10 km)) for representing the un-resolved atmospheric convection, atmospheric radiation, PBL/turbulence physics, and land surface models (LSMs). Selection of the model physics suite in this research is based on previous research (e.g., Lauvaux et al. 2013, Rogers et al. 2013). For cloud microphysics, this study uses the WRF single-moment five-class (WSM5) simple ice scheme (Hong et al. 2004) that assumes no mixed-phase conditions. For cumulus parameterization, the Kain–Fritsch scheme (Kain and Fritsch 1990, 1993; Kain 2004) is used on the 9-km grid (see next section for grid configuration). For atmospheric radiation, the Rapid Radiative Transfer Model (RRTM; Mlawer et al. 1997) longwave (LW)/Dudhia shortwave (SW; Dudhia 1989) scheme is used. For PBL turbulent processes, the turbulent kinetic energy (TKE)-predicting Mellor-Yamada Nakanishi Niino (MYNN) Level 2.5 turbulent closure scheme (Nakanishi and Niino 2006) is used, along with the MYNN surface layer scheme to preserve consistency. The decision of selecting the MYNN PBL scheme is based on our experiences and previous studies where MYNN appeared to produce the most accurate PBL temperature and moisture profiles (Cintineo et al. 2014, Clark et al. 2015) as well as the most accurate PBL depth (Coniglio et al. 2013, Hariprasad et al. 2014) in simulations of the PBL over land, all of which are highly important to simulating transport and dispersion of surface emissions. For land surface processes, the Noah LSM (Chen and Dudhia 2001, Tewari et al. 2004) is used. The Noah LSM is a four-layer soil temperature and moisture scheme and includes plant root zone, evapotranspiration, soil drainage, and runoff, taking into account vegetation categories, monthly vegetation fraction, soil texture, and snow and ice cover.

The WRF modeling system has four dimensional data assimilation (FDDA) capabilities that allow continuous assimilation of meteorological observations during the model simulation, unlike intermittent approaches used in variational data assimilation techniques. For retrospective applications, FDDA can be used in numerical models to produce accurate dynamic analyses at the desired temporal and spatial resolution. FDDA has been widely used in studying atmospheric transport and dispersion processes (Deng et al. 2004, 2006, Rogers et al. 2013, Lauvaux et al. 2013, Karion et al. 2015). The version of FDDA used in this research was originally developed for MM5 and was enhanced and implemented into WRF (Deng et al. 2009, Rogers et al. 2013). Further enhancements to the observation nudging technique in WRF have brought more flexibility to control how surface observations influence meteorology aloft. WRF users have freedom to choose different vertical weighting functions for the surface observations (Rogers et al. 2013). Unlike Rogers et al. (2013) in which various FDDA strategies were evaluated to identify the optimal FDDA settings to produce the most accurate model solutions to represent the meteorological conditions, this research uses the findings from Rogers et al. (2013) to focus on exploring the effect of assimilating various meteorological observation types on the model solution as well as on the inverse flux solutions driven by the WRF-simulated meteorological solutions. The FDDA strategy used in this research includes using analysis nudging and observation nudging on the 9-km coarse grid, and only observation nudging on the 3- and 1-km fine grids, with similar multiscale configurations and parameter settings to those used in Rogers et al. (2013), Lauvaux et al. (2013), and Karion et al. (2015).

To solve the inverse problem, surface fluxes are related to the observed atmospheric mole fraction via a representation of atmospheric dynamics referred to as the observation operator (Lauvaux et al. 2012). This linear operator is then inverted to allow for the optimization of fluxes given prior flux estimates, atmospheric observations and the associated uncertainties. Whereas the WRF model represents the projection of the surface fluxes into atmospheric mole fraction, the inverse of the operator is still required in order to solve for the inverse problem. To represent the inverse operator, also called influence function, we use the Lagrangian Particle Dispersion Model (Uliasz 1994) driven by the hourly mean wind fields, potential temperature, and the turbulence simulated by the WRF model as discussed above. The mean wind fields and the TKE will drive the LPDM backward in time so that areas of influence at the surface for any given tower observation can be represented by the model solution. LPDM has been used extensively in the past for various applications (e.g. Pielke and Uliasz, 1993, Schuh et al. 2010, Lauvaux et al. 2012), and more recently to perform urban inversions of CO2 over Indianapolis (Lauvaux et al. 2016). Lauvaux et al. (2016) provides a complete description of the coupling between WRF and LPDM used here.

The inversion system is solving for the exact solution of the Bayesian inverse problem, i.e. producing the analytical flux solution with its associated error covariances (i.e. Kalman method) (Tarantola 2005). Prior CO2 emissions estimates within the inner domain are drawn from the Hestia 2012 emissions product (Gurney et al. 2012). Hestia is a high-resolution (i.e., building-level) inventory-based data product created to estimate anthropogenic CO2 emissions for the Indianapolis area. It combines several datasets such as energy consumption, traffic data, industrial productivity, and electricity generation from the power plant, with models such as a building energy model. Forwards-in-time atmospheric CO2 simulations based on emissions from the Hestia dataset within WRF-Chem are only used to compare to the mole fractions estimated with WRF-LPDM. Time periods with large discrepancies between forwards and adjoint atmospheric CO2 mole fractions reveal unusually difficult transport conditions, and these times are excluded from the inverse analyses.

The inverse system for Indianapolis has been described in Lauvaux et al. (2016) for the period September 2012 to May 2013. Here, we perform five-day emission inversions for September and October 2013 using the reference configuration as described in Lauvaux et al. (2016). The minimzation is performed using a Kalman filter producing the analytical solution to the inverse problem (Tarantola et al., 2005). Similar approaches have been used at global (e.g. Law et al., 2003), continental (Carouge et al., 2010), or regional (Lauvaux et al., 2008). The prior error structures prescribed in the inversion follow the urban land mask and a correlation length scale of 4 km, both convolved to generate the prior emission covariance matrix similar to Carouge et al. (2010) and Lauvaux et al. (2012). The control vector to be optimized in the system includes only the fossil fuel emissions, ignoring biogenic fluxes for this time of year. Turnbull et al. (2015) demonstrated the limited influence of biogenic signals during October to April (less than 5%) relatively too small to be optimized over that time of year. The background conditions were determined using the optimal tower location depending on wind speed, as detailed in Lauvaux et al. (2016). Of most importance, we use the wind direction to define the optimal background sites for each observation time and perform multiple inversions using identical assumptions only swapping the influence functions from the different WRF-FDDA simulations. This comparison allows us to isolate and quantify the sensitivity of the inverse estimate of CO2 emissions to the different transport solutions.

## 3. Model configuration, meteorological observations, and experiment design

### 3.1. Model configuration

The WRF modeling system used in this research is based on WRF V3.5.1, released in September 2013. The INFLUX WRF configuration consists of three nested grids with 9-/3-/1-km horizontal resolutions (Figure 1a), with the focus on the 1-km grid. The topographic and landuse database needed to initialize the WRF model is based on the U.S. Geological Survey (USGS) 30-second terrain and 24-category landuse. As indicated by the landuse distributions (Figure 1b, c, d), a significant fraction of the 1-km grid is characterized as urban (Figure 1d). In the vertical, fifty nine (59) vertical terrain-following layers are used, with the center point of the lowest model layer located ~7 m above ground level (AGL). The thickness of the layers increases gradually with height, with 24 layers below 850 hPa (~1550 m above sea level). The top of the model is set at 100 hPa.

The WRF-Chem system was configured to run for a two-month period (Sept.–Oct. 2013), in five-day segments with a 12-hour overlapping time-window. The WRF model solutions are then used to drive a LPDM that calculates the CO2 footprints for each of the CO2 tower observations (Lauvaux et al., 2016). The footprints or influence functions are used in the inversion system to compute the updated posterior CO2 fluxes.

Figure 1

WRF grid configuration. a) WRF 9-/3-/1-km resolution grids, b) 9-km grid landuse, c) 3-km grid landuse, and d) 1-km grid landuse. Graph b corresponds to the full areas shown in graph a, and graphs c and d are enlargements corresponding to the areas indicated by the black boxes in graph a. Color notation for the landuse panels: Black: urban; Light Yellow: dryland cropland and pasture; Yellow: mixed dryland-irrigated cropland and pasture; Light Green: irrigated cropland and pasture; Green: forest; Blue: water bodies. DOI: https://doi.org/10.1525/elementa.133.f1

### 3.2. Meteorological observations

The meteorological observations assimilated in WRF-FDDA include the standard measurements of 10-m wind, 2-m temperature and moisture fields from World Meteorological Organization (WMO) surface stations at hourly intervals and radiosondes at 12-houly intervals, as shown in Figure 2. There are five upper-air observations spread across the 9-km grid. However, none of them are located on the 3- and 1-km grids and only three WMO stations are available on the 1-km grid.

Figure 2

Distribution of the assimilated observations, for 21 UTC 15 October 2013. a) National Weather Service surface (open circles) and radiosonde (solid circles) observations, b) LIDAR and ACARS at 935 hPa level, c) LIDAR and ACARS at 885 hPa level, and d) ACARS at 250 hPa level. Note that star symbols denote the ACARS observations and solid square denotes the Halo lidar observations. DOI: https://doi.org/10.1525/elementa.133.f2

The 20-minute wind profiles measured by the lidar are used in this study. The lidar is deployed in Lawrence, IN, about 15 km to the northeast of downtown Indianapolis. The lidar has operated and provided data continually since April 2013, with a 6-month gap at the end of 2015 when the system was temporarily removed and upgraded. Additional information about the deployment and data are available online (http://esrl.noaa.gov/csd/projects/influx/). Pearson et al. (2009) provides a complete description of the system and its operating principles. For data presented and used here, range gates (i.e., spatial resolution) were set to be equally spaced at 38.4 m apart. Thus, scales of motion and turbulence larger than the range gate size were explicitly resolved by the lidar.

The lidar directly measures range resolved line-of-sight velocities and backscatter intensity from aerosols and other scatterers. The lidar performs a sequence of velocity azimuth display (VAD) scans at multiple elevations, range height indicator (RHI) scans, and stares. This suite of scans that repeats every 20 min is used to make profiles of the wind speed, wind direction, velocity variances, and backscatter intensity. These profiles are used together to estimate the PBL depth. The 20-minute wind profiles measured by the lidar are also available for assimilation into WRF.

In addition to the WMO and lidar observations, the winds, temperatures and moisture fields observed from the ACARS are also assimilated. The ACARS observations are available at times when observations are taken from commercial aircraft, and are distributed in the vertical from near-surface to cruising altitudes above the tropopause, and in the horizontal along the flight tracks. PBL data density is highest near major airports as aircraft ascend or descend.

### 3.3. Experimental design

In order to evaluate the effect of assimilating various observations, as shown in Table 1, four different WRF configurations (or experiments) are conducted and results of both meteorological fields and posterior CO2 fluxes are compared among the four experiments: 1) NOFDDA – No data assimilation of any form is applied, and WRF is purely driven by the North America Regional Reanalysis (NARR), an analysis product that is a combination of model background and observations but on a coarse spatial and temporal scale (i.e., 32 km); 2) FDDA_WMO – Only standard WMO hourly surface (winds) and 12-hourly upper-air observations (winds, temperature and water vapor) are assimilated; 3) FDDA_WMO_Lidar – In addition to WMO observations, wind profiles from the local INFLUX lidar are also assimilated; and 4) FDDA_WMO_Lidar_ACARS – In addition to the WMO and lidar observations, the ACARS observations are also assimilated. Since all five WMO sondes are outside the 3- and 1-km grids, the effect of assimilating WMO sondes can only propagate into the 3- and 1-km through the grid boundaries between the 3- and 9-km grids. Since in FDDA the impact of surface observations is limited to the lowest portion of atmosphere up to the model-simulated PBL top (Rogers et al. 2013), it is anticipated that assimilating additional upper air observations such as Lidar and ACARS observations (Figure 2b, 2c, 2d) can cause additional improvements in the WRF model solutions so that the transport error in the inversion system is further reduced. During model integration, observations are continuously assimilated using the WRF FDDA technique that is described in Deng et al. (2009) and Rogers et al. (2013). Similar to previous studies (i.e., Deng et al. 2004, 2006, Rogers et al. 2013, Lauvaux et al. 2013, Karion et al. 2015), assimilation of temperature and moisture observations are only allowed above the model-predicted PBL top so that the internal model PBL physics may operate without interference from the data assimilation, while winds are assimilated through the entire atmosphere.

Table 1

FDDA configuration for WRF simulations. DOI: https://doi.org/10.1525/elementa.133.t1

Exp. Name Data assimilated

NOFDDA No meteorological observations
FDDA_WMO WMO surface and upper-air sonde observations of winds, temperature and water vapor
FDDA_WMO_Lidar Same as above with addition of INFLUX Halo Doppler lidar winds
FDDA_WMO_Lidar_ACARS Same as above with addition of ACARS wind, temperature and water vapor

All experiments use identical model physics for all grids except the cumulus parameterization scheme (that is: 4-layer Noah LSM, the 2.5-level MYNN PBL scheme, and the RRTM longwave/Dudhia shortwave scheme are used on all grids, while the KF cumulus scheme is used only on the 9-km grid). For the preliminary evaluations conducted in this paper, each WRF simulation segment of the four experiments is five days long, and was initialized with 3-hourly NARR at 32-km × 32-km resolution for the initial conditions and lateral boundary conditions (ICs/LBCs). The NARR analyses were downloaded from the Research Data Archive (RDA) maintained by the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR).

In addition to assimilating observations during the model integration, the initial condition fields are further enhanced by sonde and surface data through the WRF objective analysis process, Obsgrid, using a modified Cressman analysis method (Deng et al. 2009, Rogers et al. 2013). The three-dimensional (3D) analyses and the surface analysis fields used for analysis FDDA are also enhanced by the objective analysis process and are defined at three-hour intervals (Deng et al. 2009), which means that WRF solutions are nudged towards more accurate analyses than NARR on the WRF grid by including observations.

### 3.4. Model evaluation methods

The WRF-simulated meteorological fields are evaluated quantitatively by comparing the error statistics of the model-simulated wind speed, wind direction, and temperature. Evaluation is performed on the 1-km grid only since the high-resolution grid is our primary interest. Mean absolute error (MAE) and root mean squared error (RMSE) are calculated to measure how close the model values are to the observed values. Mean error (ME) is calculated to measure the model bias for a given variable. MAE and ME are computed for both the surface and upper air observation locations. For the surface, the WRF (2-m temperature and 10-m wind) values derived from the lowest model layer using similarity theory are compared with the surface observations. For the upper air, the model values are interpolated onto the observation locations in both horizontal and vertical pressure space, and are then compared with the observations. A calm wind threshold was used in this study to remove situations with very light winds (less than or equal to 1 m s–1) for the wind direction statistics calculation because the wind direction for near-calm wind is undefined. Note that due to the limited size of the 1-km grid, there are only three WMO surface stations and no sondes available within the grid, thus reserving a separate set of WMO observations for independent verification (as done in Rogers et al. 2013) is not possible. Since the WRF FDDA technique uses a relaxation term in the model tendency equations rather than conform the model solution to an observation, one should not expect data assimilation to produce an exact match between the modeled and observed values, and spatial correlations in the nudging coefficients limit the extent of the observation influence. As shown in Rogers et al. (2013), the accuracy of a simulation using FDDA scheme highly depends on the accuracy of the model dynamics. Therefore, the model statistics (i.e. MAE and ME) should remain similar with either assimilated or independent data. In addition, as shown in Rogers et al. (2013), meteorological observations within a distance (e.g., ~60 km) are correlated. For these two reasons, the complete set of meteorological observations including WMO, lidar and ACARS was used in model verification. However, the ACARS data could be considered to be independent observations for the first three numerical experiments since they are not assimilated in them; similarly, lidar is not assimilated in the first two experiments and so is independent of those simulations.

In addition to wind speed and wind direction, we present the evaluation of PBL depth which impacts directly the transport and dispersion of trace gases near the surface, and therefore the estimation of surface fluxes in the inversion. Note that PBL depth is not assimilated, thus it is an independent variable for validation. WRF model can produce a diagnosed PBL depth from its PBL sub-model or PBL scheme. Several methods have been used to diagnose the PBL depth within the different PBL schemes that have been proposed in the literature, usually based on either the vertical thermal profile or the vertical profile of TKE as predicted by the specific PBL scheme. Here, we used the 2.5-level MYNN scheme (Nakanishi and Niino 2006) which is a TKE-predictive scheme. We compare the model-predicted PBL-depth with the observed PBL depth to measure how well the model represents PBL processes. Nighttime comparisons of PBL depth were not performed due to the fact that modeled PBL depth in stable conditions is often less reliable and harder to define, and thus nighttime tower data are not currently utilized in our inverse CO2 flux estimates.

To evaluate the impact of the different model configurations on the CO2 emissions from the inversion, we compute the ratios of the Bayes Factors (BF) which represent the ratios of the marginal likelihoods for the different model configurations. BFs relate to the relative values of posterior conditional probabilities for a given model and can be expressed as follows:

$-2\mathit{\text{log}}\left(\mathit{\text{BF}}\right)=\mathit{\text{log}}\left(|\mathbit{\text{HBH}}{}^{\mathbit{\text{T}}}+\mathbit{\text{R}}|\right)+{\left(y-\mathbit{\text{Hx}}{}_{\mathbit{\text{b}}}\right)}^{\mathbit{\text{T}}}×{\left(\mathbit{\text{HPH}}{}^{\mathbit{\text{T}}}+\mathbit{\text{R}}\right)}^{-1}×\left(y-\mathbit{\text{Hx}}{}_{\mathbit{\text{b}}}\right)$

where H is the Jacobian, B and R the prior and data covariances, y the observations and xb the prior estimate. Larger values of BF correspond to more probable models. By calculating the ratios of BF across our model configurations, we can evaluate the improvement relative to one another. Ratios of BF greater than 10 suggest a strong evidence for improvement, moderate evidence between 3 and 10, and anecdotal evidence between 1 and 3.

## 4. Results

### 4.1. Meteorological evaluation

Table 2 shows the ME and MAE of the WRF-predicted 10-m wind direction, wind speed and 2-m temperature over the 1-km grid verified hourly (both day and night) against three WMO surface measurements, averaged over the period between 00 UTC 27 August and 00 UTC 3 November 2013. Comparing the model surface MAE and ME scores among all four numerical experiments, we notice that the greatest error reductions occur between experiments NOFDDA and FDDA_WMO. Surface wind direction MAE (ME) is reduced from 30 to 19 (6 to 2) degrees, and surface wind speed MAE (ME) is reduced from 1.0 to 0.8 (0.2 to 0.1) m s–1. Since lidar and ACARS observations are all above the surface, assimilating lidar and ACARS does not directly improve the surface MAE and ME scores for experiments FDDA_WMO_Lidar and FDDA_WMO_Lidar_ACARS. The MAE and ME scores for both experiments remain similar to the FDDA_WMO experiment (e.g., 19-degree wind direction MAE, 0.8 m s–1 -wind speed MAE for both experiments). Although some slight degradation in wind speed and temperature ME scores is seen in the FDDA_WMO_Lidar_ACARS experiment as compared to the FDDA_WMO experiment, the FDDA_WMO_Lidar_ACARS experiment has the overall smallest MAE scores out of all four experiments. Surface temperature improvements are minimal since temperature assimilation is only allowed above the model-predicted PBL.

Table 2

Mean error and mean absolute error of the WRF-predicted 10-m wind direction, wind speed and 2-m temperature on the 1-km grid, averaged for the three WMO surface measurements and for all times (hourly) over the period between 00 UTC 27 August and 00 UTC 3 November 2013. DOI: https://doi.org/10.1525/elementa.133.t2

NOFDDA FDDA_WMO FDDA_WMO_Lidar FDDA_WMO_Lidar_ACARS

Wind Direction (Degree) ME 6 2 2 1
MAE 30 19 19 19
Wind Speed (m s–1) ME 0.2 0.1 0 –0.2
MAE 1.0 0.8 0.8 0.8
Temperature (K) ME –1.0 –0.8 –0.9 –1.4
MAE 2.3 2.3 2.4 2.2

Table 3 shows ME and MAE of the WRF-predicted wind direction, wind speed, and temperature over the 1-km grid verified hourly (both day and night) against the lower tropospheric (below 2 km AGL) INFLUX lidar measurements (winds only) and ACARS measurements (winds and temperatures), averaged over the period between 00 UTC 27 August and 00 UTC 3 November 2013. The information shown in Table 3 is the same as Table 2 except that validations are now performed against all lower-tropospheric (below 2 km AGL) measurements excluding the three surface stations. Now we clearly see error magnitudes that gradually decrease as additional observations are assimilated into the WRF model from NOFDDA to the best model performance with FDDA_WMO_Lidar_ACARS.

Table 3

Same as Table 2, but verified against the upper-air observations, INFLUX lidar measurements (winds only) and ACARS measurements (winds and temperatures) below 2 km AGL. DOI: https://doi.org/10.1525/elementa.133.t3

NOFDDA FDDA_WMO FDDA_WMO_Lidar FDDA_WMO_Lidar_ACARS

Wind Direction (Degree) ME 4 2 –1 0
MAE 26 24 15 14
Wind Speed (m s–1) ME 0.2 –0.2 –0.2 –0.2
MAE 2.0 2.0 1.3 1.2
Temperature (K) ME 0.8 1.0 1.0 0.5
MAE 1.3 1.4 1.4 0.8

As shown in Table 3, clear wind error reductions, both MAE and ME, are achieved by assimilating the INFLUX lidar wind measurements. For example, there is a 9-degree reduction in wind direction MAE and 0.7 m s–1 reduction in wind speed MAE. There are no temperature improvements since no temperature observations are available from the lidar. Assimilation of ACARS observations further reduces model MAE and ME error consistently in both wind and temperature fields except the wind speed ME which is already very small. For example, there is a 0.5 °C ME reduction and 0.6 °C MAE reduction in temperature error when ACARS observations are assimilated. Similar to the surface layer error statistics, the FDDA_WMO_Lidar_ACARS case has the smallest overall errors.

To demonstrate and summarize the effect of data assimilation more directly, Figure 3 and 4 show the scatterplots comparing the simulation without assimilating observations (i.e., NOFDDA) and the best simulation with assimilating all the available observations (i.e., FDDA_WMO_Lidar_ACARS) for the INFLUX project. The improvement due to assimilating observations is evident and the error reduction is significant, especially for wind direction (e.g., nearly 40% reduction in the surface wind direction MAE). Assimilation of upper air wind observations from the INFLUX lidar and ACARS, and temperatures from ACARS clearly improves model performance, especially for wind speed and wind direction (e.g., nearly 50% reduction in upper-air wind direction MAE).

Figure 3

Surface layer model performance. WRF-predicted versus observed wind direction (a: NOFDDA, b: FDDA_WMO_Lidar_ACARS), wind speed (c: NOFDDA, d: FDDA_WMO_Lidar_ACARS) and temperature (e: NOFDDA, f: FDDA_WMO_Lidar_ACARS) on the 1-km resolution grid, between 00 UTC 27 August and 00 UTC 3 November 2013, comparing between NOFDDA (left) and FDDA_WMO_Lidar_ACARS (right) experiments. Note that x-axis represents the model value and y-axis represents the observed value, and each data point represents a model-observation pair averaged for three National Weather Service surface stations inside the 1-km resolution grid at a given hour. The dashed line in the figure represents the y = x line that indicate a perfect model-observation match; thus the points above y = x line represents the situation where model underpredicts and points below represents the situation where model overpredicts. Note that ME represents mean error and MAE represents mean absolute error. DOI: https://doi.org/10.1525/elementa.133.f3

Figure 4

Model performances for the lower troposphere within 2 km AGL. Same as Fig. 3 but for the lower troposphere. Each data point represents a model-observation pair averaged for all the available observations (i.e., Halo LIDAR and ACARS) at a given hour. DOI: https://doi.org/10.1525/elementa.133.f4

Model errors averaged in the vertical and over the entire two-month period do not represent the temporal and spatial error distributions, both of which are important for the inverse flux corrections at 1-km, five-day resolution. Figure 5 shows the model error diurnal variation of the WRF-simulated wind direction and wind speed, averaged over the two-month period. It is shown that the wind direction MAEs for all experiments do not show apparent diurnal variations (Figure 5a), while wind speed MAEs are larger in daytime than in nighttime (Figure 5b), likely due to larger wind speed in the daytime. It is also possible that the larger daytime errors are due to the effect of PBL large eddies that is not well represented in the model, while during the nighttime winds are more influenced by large-scale weather patterns that are controlled by the large-scale dynamics. FDDA reduces the model MAE quite significantly due to assimilating the three WMO surface observations (e.g., nearly 40% error reduction in the surface wind direction). Note that assimilating lidar in FDDA_WMO_Lidar and ACARS in FDDA_WMO_Lidar_ACARS is not expected to further reduce the model surface errors since they are upper-air observations.

Figure 5

Model surface errors vs UTC time of a day. Daily time series of model errors averaged over the two-month period, for the 1-km grid, for a) mean absolute error for wind direction, b) mean absolute error for wind speed, c) mean error for wind direction, and d) mean error for wind speed, for all four experiments: NOFDDA (MAE1/ME1), FDDA_WMO (MAE2/ME2), FDDA_WMO_Lidar (MAE3/ME3) and FDDA_WMO_Lidar_ACARS (MAE4/ME4). Note that ME represents mean error and MAE represents mean absolute error. DOI: https://doi.org/10.1525/elementa.133.f5

Similar to the MAEs, the wind direction MEs (Figure 5c) do not show diurnal variations. Although the wind direction MEs are quite small (< 10 degrees), error reductions are noticeable in the three FDDA simulations. For wind speed (Figure 5d), like its MAEs, the MEs are larger and positive (model is faster than the observations) in daytime, and FDDA tends to reduce the daytime biases, although the overall MEs are quite small (< 0.6 m s–1). Note that assimilating upper-air observations from INFLUX lidar and ACARS pushed the model bias slightly to the negative side. Since the lidar and ACARS observations do not directly influence the surface errors and the magnitudes of the MEs are quite small (Figure 5d and Table 2), these small negative biases are likely the artifact of imbalances introduced by insertion of lidar and ACARS observations, or possibly due to the behavior of the model vertical mixing.

Figure 6 shows vertical MAE and ME distributions for WRF-predicted wind direction (Figure 6a and c) and wind speed (Figure 6b and d) within the lowest 2.5 km AGL, averaged over the two-month period, comparing among four numerical experiments listed in Table 1. For both wind speed and wind direction, NOFDDA has the largest error through the entire atmosphere below ~2 km AGL (except for the wind speed near the 1.5-km level where FDDA_WMO appears to have slightly larger error), with a ~30-degree wind direction error and 2–2.5 m s–1 wind speed error at the surface. Model errors generally decrease with height. Comparing the NOFDDA and FDDA_WMO experiments, we see that assimilating only WMO surface observations improves the model-predicted winds. Although not significant, the improvement over a vertical layer is expected since the FDDA sub-model in WRF allows the influence of the surface observations to spread to the entire depth of the PBL depending on the stability regime (Deng et al. 2009, Roger et al. 2013); however, the magnitudes of improvements, especially for wind direction, gradually decreases with height. The addition of the INFLUX lidar winds to the data assimilation further reduces model error. Consistent with Tables 2 and 3, the large gap between the FDDA_WMO and FDDA_WMO_Lidar shows that addition of upper air observations creates substantially improved model solutions beyond what can be obtained from assimilating surface observations alone. It is clear that addition of ACARS observations further reduces model errors although the further improvement is not large (likely because after assimilating the lidar data the model errors are already quite small).

Figure 6

Vertical distribution of model errors. Vertical profile of model errors averaged over the two-month period, for the 1-km grid, for a) mean absolute error for wind direction, b) mean absolute error for wind speed, c) mean error for wind direction, and d) mean error for wind speed, for all four experiments: NOFDDA (MAE1/ME1), FDDA_WMO (MAE2/ME2), FDDA_WMO_Lidar (MAE3/ME3) and FDDA_WMO_Lidar_ACARS (MAE4/ME4). Note that ME represents mean error and MAE represents mean absolute error. DOI: https://doi.org/10.1525/elementa.133.f6

The wind direction biases (Figure 6c) are quite small for the low levels (< 5 degrees), and biases increases with height. It is apparent that FDDA reduces the biases. Wind speed biases (Figure 6d) in NOFDDA demonstrate large positive biases within the lowest 1 km although the magnitudes of biases decrease with height, and becomes somewhat negative in mid- and upper- PBL. Assimilating surface observation slightly reduces the low-level biases, and the error reduction appears to be limited to below 1 km, while assimilating lidar and ACARS observations more significantly reduces model biases.

### 4.2. Evaluation of model-predicted PBL depth using the Halo Doppler Lidar data

The PBL depth for this study is manually estimated from the lidar observations for each 20-min time period, identified by large gradients in Signal-to-Noise Ratio (SNR) and the height where the vertical velocity variance becomes small (less than ~0.1 m2 s–2). NOAA is currently implementing and testing an algorithm to automate the PBL depth estimation process for the INFLUX data set. As an example, a comparison of PBL structures between WRF and the INFLUX lidar observations at Indianapolis for 29 and 30 August 2013 is shown in Figure 7. Generally, the model-predicted TKE structures are highly correlated with the lidar-observed vertical velocity variances and large gradients in SNR. The diurnal variation of the PBL structure can be clearly seen within both model output and observations. However, differences in the vertical extent of the PBL depth between the model output and observations are apparent.

Figure 7

Planetary Boundary Layer verification. Comparison of Planetary Boundary Layer structures between WRF and the INFLUX lidar observations at Indianapolis for 28 and 30 August 2013: a) WRF-predicted turbulent kinetic energy from NOFDDA, b) WRF-predicted turbulent kinetic energy from FDDA_WMO_Lidar_ACARS, c) Lidar-observed vertical velocity variances, and d) Lidar-observed Signal-to-Noise Ratio (SNR). DOI: https://doi.org/10.1525/elementa.133.f7

Table 4 compares the MAE and ME of the WRF-predicted PBL depth on the 1-km grid verified hourly against the lidar estimates of PBL depth at the lidar site in Indianapolis. The evaluation of PBL depth is conducted only for the daytime period between 17 and 22 UTC when a well-mixed PBL is fully developed and quasi-stationary, for the 2-month period between 00 UTC 27 August and 00 UTC 3 November 2013. Our results show that for all experiments the MAE of model-predicted PBL depth is quite similar, in a range between 223 and 272 m.

Table 4

Mean error and mean absolute error of the WRF-predicted PBL depth on the 1-km grid verified against the Indianapolis INFLUX lidar measurements, averaged for all times between 17 and 22 UTC each day for the period between 00 UTC 27 August and 00 UTC 3 November 2013. DOI: https://doi.org/10.1525/elementa.133.t4

NOFDDA FDDA_WMO FDDA_WMO_Lidar FDDA_WMO_Lidar_ACARS

ME (m) 25 103 83 –23
MAE (m) 259 272 254 223

### 4.3. Evaluation of inverse emissions

To evaluate the impact of the different WRF simulations on the posterior fluxes estimated from the inversion system, we coupled the WRF-FDDA modeled variables (mean winds and turbulence) to generate the corresponding LPDM tower footprints, or influence functions. Using the different WRF simulations discussed above and their corresponding LPDM tower footprints, the five-day inverse emissions were computed for whole-city emissions using a Bayesian inversion system at 1-km resolution over the urban area of Indianapolis, and the CO2 mole fraction from the 11 of the 12 towers from the INFLUX tower network (Figure 8), for the entire two-month period (Sept–Oct 2013). Inverse CO2 emissions were computed over five-day periods and the configuration was similar to Lauvaux et al. (2016) and kept identical across the four inversions. Therefore, the differences in the inverse emissions correspond to the impact of different transport model realizations, and more precisely the impact of the FDDA systems, assimilating various data sources.

Figure 8

Observation network showing 12 INFLUX towers. Observation network showing location of 12 INFLUX towers. Particular site details and coordinates are given in Miles et al. (2016) and Richardson et al. (2016). DOI: https://doi.org/10.1525/elementa.133.f8

Figure 9 illustrates the different influence functions (or tower footprints) for the 12 tower locations for one single observation time. The extent and the main axes of the tower footprints vary significantly depending on the WRF-FDDA model results, which translates into varying spatial attributions of the observed atmospheric CO2 mole fractions. Overall, four atmospheric inversions were performed for the period September-October 2013 producing five-day emissions estimates at 1 km resolution over the domain. Figure 10 shows the relative differences among the different transport solutions (influence functions) in space over the simulation domain (equivalent to the 1-km grid of WRF) represented by average pairwise differences. The maximum differences, up to 50%, are located at some of the tower locations emphasizing the importance of the near-field contribution and the high sensitivity of the footprints to wind direction changes. Differences average around 15% across the domain. Once combined with prior information, the impact of different transport fields is decreased as inverse emissions are also constrained by prior fluxes and their associated error covariances. Here, similar to Lauvaux et al. (2016), we assume that the error variances scale with the prior emissions, which amplifies the beltway, for example, as a likely locations for corrections in the optimization procedure. Figure 11 shows the maps of the differences across the four different CO2 inverse flux estimates. As expected, differences are distributed following the error variances and not the spatial differences in the influence functions. As shown in Lauvaux et al. (2016), prior emission errors impact significantly the spatial distribution and the whole-city inverse emissions. Because large spatial gradients are present in the emissions and in the emissions error variances, the minimzation will attribute differences to high-variance areas, lessening the impact of the transport. Here, varying influence functions remains less influential on our inverse solutions than the constraints imposed by structures in the prior emission error covariances. The magnitudes of the flux corrections are as high as 15%, similar to the influence functions. However, the local maxima observed previously at the tower locations are not visible in the CO2 flux differences.

Figure 9

Influence functions. Influence functions over the INFLUX 1-km resolution domain for 10 of the 12 tower locations of the INFLUX network using the Lagrangian Particle Dispersion Model, averaged for 26–30 October 2013 (corresponding to the observation time 17–22UTC) driven by the meteorological variables from the four different WRF configurations: NOFDDA (Upper left), FDDA_WMO (Upper right), FDDA_WMO_Lidar (Lower left), and FDDA_WMO_Lidar_ACARS (Lower right), in log scale (ppm hour m2 g–1). Note that numbers 1–12 indicate the tower locations as detailed in Figure 8 and two towers were not operational during the time period Oct 26–30. DOI: https://doi.org/10.1525/elementa.133.f9

Figure 10

Spread of influence functions. Spatial distribution of influence function (averaged pairwise differences) over the two months (September–October 2013, 17–22UTC) representing the variability in the tower footprints across the different WRF-FDDA experiments. Note that numbers 1–12 indicate the tower locations as detailed in Figure 8 and two towers were not operational during the time period Oct 26–30. DOI: https://doi.org/10.1525/elementa.133.f10

Figure 11

Spread of the inverse emission. Relative differences among the four inversion configurations (in % – averaged pairwise differences) representing the variability in the inverse emissions due to the transport fields, averaged over the two-month period. The assimilation of different meteorological data types impacts the inverse CO2 emissions by up to 15% in the downtown area and around the beltway. Tower indices are similar to the site locations in Figure 8, except for Site 13 which was not operational in 2013. Note that numbers 1–12 indicate the tower locations as detailed in Figure 8 and two towers were not operational during the time period Oct 26–30. DOI: https://doi.org/10.1525/elementa.133.f11

Finally, we show the aggregated impact over the domain for each five-day periods of the two-month inversion. Figure 12 shows that the total impact of meteorological data assimilation on each five-day segment is relatively small compared to the correction by the inversion (shown by the distance to the prior in blue). With the exception of few five-day periods showing larger differences (e.g. September 6–11, 2013), most inversion periods correspond to similar inverse fluxes; this is explained by the compensation of higher and lower emissions distributed spatially across the inversion domain. Differences remain small at the five-day time scale (less than 10%) and over the two months (less than 5% change of the total 2-month emissions). We conclude based on the inversion results that all the transport configurations do not introduce any significant bias in the solution, which confirms that the low systematic errors in the meteorological variables across the four simulations match the small differences in total posterior emissions. Random errors in meteorological variables propagates into the flux solution as additional posterior uncertainties but do not create any systematic errors in the fluxes. While this ensemble of simulations is not calibrated to meteorological observations (e.g. Grimit and Mass, 2007), and therefore does not necessarily represent the true transport errors in the simulation, we show here that for this experiment, the different observations used in the WRF-FDDA simulations do affect the spatial attribution of flux corrections but have a limited impact on total inverse emissions over the entire domain. Furthermore, we computed the Bayes Factors for each model configurations. Results are presented in Table 5. The ratios between the different WRF-FDDA configurations show strong evidences (ratio > 10) that the last two configurations (i.e. with lidar and ACARS observations) better represent the transport compared to the original FDDA (i.e. WMO surface stations only) or no assimilation of any data. The ratio between the last two configurations remains small (about 2.6) which suggests only a moderate evidence of an improvement. These results agree with the meteorological evaluation, allowing us to conclude that the improvement of the transport is also noticeable and statistically significant in the CO2 flux inverse solutions.

Figure 12

Inverse emissions. Inverse emissions in ktC per 5-day periods for the entire 1-km resolution INFLUX domain (87 km × 87 km) from the four WRF configurations, respectively WRF (no FDDA), WRF-FDDA with WMO data, WRF-FDDA with WMO and Lidar data, and WRF-FDDA with WMO, Lidar, and ACARS data. The prior emissions (from Hestia) are indicated in blue. DOI: https://doi.org/10.1525/elementa.133.f12

Table 5

Median ratio of the Bayes Factors derived from the Kalman matrix inversion when comparing the different model configurations for the period between 00 UTC 27 August and 00 UTC 3 November 2013. DOI: https://doi.org/10.1525/elementa.133.t5

NOFDDA FDDA_WMO FDDA_WMO_Lidar FDDA_WMO_Lidar_ACARS

NOFDDA N/A 2.14 0.12 0.03
FDDA_WMO 0.5 N/A 0.31 0.09
FDDA_WMO_Lidar 8.3 3.2 N/A 0.5
FDDA_WMO_Lidar_ACARS 30.3 11.5 2.6 N/A

## 5. Conclusions

Atmospheric transport model errors limit the accuracy and precision of inverse flux estimates. Assimilation of meteorological observations is a well-established method for reducing transport model errors. This paper represents a first quantification of the impact of meteorological data assimilation on urban-scale CO2 inverse flux estimates.

As expected, meteorological data assimilation significantly reduces transport model error. For example, ~40% error reduction in the surface wind direction and ~50% error reduction in upper-air wind direction are achieved due to data assimilation. This paper demonstrates that observations of winds throughout the PBL have significantly greater value in reducing random errors (or mean absolute error, MAE) in wind speed and wind direction than surface layer observations. Random errors in wind speed and wind direction in the PBL were reduced by approximately a factor of two when PBL wind measurements were assimilated. The transport model showed very small biases in wind speed and direction, and PBL depth prior to meteorological data assimilation, and these small biases were either reduced or unchanged with data assimilation, with the exception that the assimilation of only surface layer observations did cause a noticeable increase in the PBL depth bias. We expect that if our initial meteorological simulation had been biased, that meteorological data assimilation would have reduced these errors as well. Sarmiento et al., (2016, this issue), for example, show that for Indianapolis model-data biases can be significant at different times of year for this configuration, and that biases vary across choices of model configurations and land surface data.

Dedicated observational systems such as the lidar deployed for INFLUX, capable of continuous profiling of PBL winds, are recommended as a straightforward and potent means of minimizing transport errors for urban inversions. The relatively small domains for urban inversions make direct measurements of regional wind fields feasible. We also show, however, that commercial aircraft data from ACARS have a similarly potent influence on atmospheric transport errors. This is promising since most major urban centers are collocated with major airports, thus as long as commercial aircraft meteorological observations are recorded and reported, these data can be used to improve atmospheric transport model performance.

We were unable to explore the quantitative impact of data assimilation on the performance of a biased meteorological model configuration. Further, the data assimilated here did not significantly improve the random errors in PBL depth, and might not have a significant impact on a model configuration with a PBL depth bias. Lidar and ACARS observations, however, could clearly identify such biases, and additional data assimilation approaches could be adopted to address this issue. Improvements to model physics and input data (e.g. Sarmiento et al. 2016, this issue) represent another important approach to improving transport model performance.

Inverse emissions from the four simulations show a significant improvement when using the transport configurations assimilating vertical observations (with lidar and aircraft observations) as demonstrated by the Bayes Factors. The differences in space were directly related to the quality of the transport simulations, with local differences of about 15% in the emission corrections after inversion. However, the whole-city inverse emissions remained similar across the different model configurations (less than 5% change over the two months). This result is in agreement with the low biases in the different meteorological variables before and after meteorological data assimilation. It is reasonable to expect that if model meteorology was initially biased, meteorological data assimilation could have a substantial impact on the time-integrated, whole-city emissions as well.

In summary, this work shows the benefit of meteorological data assimilation on urban transport model performance, especially in reducing PBL wind speed and direction MAE when assimilating PBL wind profile observations. This reduction in the transport errors for wind speed, direction, and PBL height will provide more robust CO2 inverse emissions at the city-scale, by improving the spatial attribution of the emission corrections. We expect that similar results would be achieved for biases given transport simulations with initially biased wind speed and wind direction.

## Data Accessibility Statement

The INFLUX tower data is available at sites.psu.edu/INFLUX, and model results can be made available upon request.