API Reference
This page documents objects and functions provided by flux-data-qaqc.
Data
- class fluxdataqaqc.Data(config)[source]
-
An object for interfacing flux-data-qaqc with input metadata (config) and time series input. It provides methods and attributes for parsing, temporal analysis, visualization, and filtering data.
A Data object is initialized from a config file (see Setting up a config file) with metadata for an eddy covariance tower or other dataset containing time series meteorological data. It serves as a starting point in the Python API of the energy balance closure analysis and data validation routines that are provided by flux-data-qaqc.
Manual pre-filtering of data based on user-defined quality is aided by the Data.apply_qc_flags method. Weighted or non-weighted means of variables with multiple sensors/recordings are computed upon initialization if these options are declared in the config file. The Data class also includes the Data.df property, which returns the time series data as a pandas.DataFrame object for custom workflows.
Data inherits line and scatter plot methods from Plot, which allows for the creation of interactive visualizations of input time series data.
- climate_file
Absolute path to climate input file.
- Type:
- config
Config parser instance created from the data within the config.ini file.
- header
Header as found in input climate file.
- Type:
- inv_map
Dictionary with input climate file names as keys and internal names as values. May only include pairs when they differ.
- Type:
- out_dir
Default directory to save output of QaQc.write or QaQc.plot methods.
- Type:
- plot_file
Path to plot file once it is created/saved by Data.plot.
- Type:
pathlib.Path or None
- soil_var_weight_pairs
Dictionary with names and weights for weighted averaging of soil heat flux or soil moisture variables.
- Type:
- qc_var_pairs
Dictionary with variable names as keys and QC value columns (numeric or character) as values.
- Type:
- units
Dictionary with internal variable names as keys and units as found in config as values.
- Type:
- variables
Dictionary with internal names for variables as keys and names as found in the input data as values.
- Type:
- variable_names_dict
Dictionary with internal variable names as keys and keys in config.ini file as values.
- Type:
- apply_qc_flags(threshold=None, flag=None, threshold_inequality='lt')[source]
Apply user-provided QC values or flags for climate variables to filter poor-quality data by converting them to null values; updates Data.df.
Specifically, where the QC value is < threshold, change the variable’s value for that date-time to null. The other option is to use a column of flags, e.g. ‘x’ for data values to be filtered out. The threshold value or flag may be specified in the config file’s METADATA section; otherwise they should be assigned as keyword arguments here.
Specification of which QC (flag or numeric threshold) columns should be applied to which variables is set in the DATA section of the config file. For datasets with QC value columns with names identical to the variable they correspond to with the suffix “_QC” the QC column names for each variable do not need to be specified in the config file.
- Keyword Arguments:
threshold (float) – default None. Threshold for QC values; if the QC value is below the threshold, replace that variable’s value with null.
flag (str, list, or tuple) – default None. Character flag signifying data to filter out. Can be a list or tuple of multiple flags.
threshold_inequality (str) – default ‘lt’. ‘lt’ for filtering values that are less than the threshold value, ‘gt’ for filtering values that are greater.
- Returns:
Example
If the input time series file has a column with numeric quality values named “LE_QC” which signify the data quality for latent energy measurements, then in the config.ini file’s DATA section the following must be specified:
[DATA]
latent_heat_flux_qc = LE_QC
...
Now you must specify the threshold for this column below which data are filtered out when using Data.apply_qc_flags. For example, if you want to remove all data entries of latent energy where the “LE_QC” value is below 5, then the threshold value would be 5. The threshold can either be set in the config file or passed as an argument. If it is set in the config file, i.e.:
[METADATA]
qc_threshold = 0.5
Then you would simply call the method and this threshold would be applied to all QC columns specified in the config file,
>>> from fluxdataqaqc import Data
>>> d = Data('path/to/config.ini')
>>> d.apply_qc_flags()
Alternatively, if the threshold is not defined in the config file or if you would like to use a different value, then pass it in,
>>> d.apply_qc_flags(threshold=2.5)
Lastly, this method can also filter based on a single character flag or a list of them, e.g. “x” or “bad”, given that the column containing these is specified in the config file for whichever variable they are to be applied to. For example, if a flag column contains multiple flags signifying different data quality control info and two in particular signify poor quality data, say “b” and “a”, then apply them either in the config file:
[METADATA]
qc_flag = b,a
Or within Python
>>> d.apply_qc_flags(flag=['b', 'a'])
For more explanation and examples see the “Configuration Options” section of the online documentation.
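The threshold and flag filtering described above can be pictured with plain pandas. The following is a minimal sketch of the idea only, not the library's implementation; the helper `mask_by_qc`, the column names `LE` and `LE_QC`, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def mask_by_qc(df, var, qc_col, threshold=None, flag=None, inequality="lt"):
    """Return a copy of df with `var` set to NaN where its QC column fails.

    Hypothetical helper for illustration; it is not part of flux-data-qaqc.
    """
    out = df.copy()
    bad = pd.Series(False, index=out.index)
    if threshold is not None:
        # numeric QC values: compare against the threshold
        qc = pd.to_numeric(out[qc_col], errors="coerce")
        bad |= (qc < threshold) if inequality == "lt" else (qc > threshold)
    if flag is not None:
        # character flags: accept a single flag or a list/tuple of flags
        flags = [flag] if isinstance(flag, str) else list(flag)
        bad |= out[qc_col].isin(flags)
    out.loc[bad, var] = np.nan
    return out


df = pd.DataFrame({"LE": [10.0, 12.0, 9.0, 11.0], "LE_QC": [7, 3, 5, 1]})
# values whose QC is strictly below 5 become NaN; a QC of exactly 5 is kept
filtered = mask_by_qc(df, "LE", "LE_QC", threshold=5)
```

Working on a copy mirrors the advice given for Data.df: the original frame stays untouched until the filtered result is explicitly reassigned.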
- property df
Pull variables out of the config and climate time series files and load them into a datetime-indexed pandas.DataFrame.
Metadata about input time series file format: “missing_data_value”, “skiprows”, and “date_parser” are utilized when first loading the df into memory. Also, weighted and non-weighted averaging of multiple measurements of the same climatic variable occurs on the first call of Data.df, if these options are declared in the config file. For more details and example uses of these config options please see the “Configuration Options” section of the online documentation.
- Returns:
df (pandas.DataFrame)
Examples
You can utilize the df property as with any pandas.DataFrame object. However, if you would like to make changes to the data you must first make a copy, then make the changes, and then reassign it to Data.df, e.g. if you wanted to add 5 degrees to air temp.
>>> from fluxdataqaqc import Data
>>> d = Data('path_to_config.ini')
>>> df = d.df.copy()
>>> df['air_temp_column'] = df['air_temp_column'] + 5
>>> d.df = df
The functionality shown above allows for user-controlled preprocessing and modification of any time series data in the initial dataset. It also allows for adding new columns, but if they are variables used by flux-data-qaqc, e.g. Rn or other energy balance variables, be sure to also update Data.variables and Data.units with the appropriate entries. New or modified values will be used in any further analysis/plotting routines within flux-data-qaqc.
By default the names of variables as found within input data are retained in QaQc.df. However, you can use the same naming scheme as flux-data-qaqc, which can be viewed in Data.variable_names_dict, by using the Data.inv_map dictionary, which maps from user-defined names to internal names (as opposed to Data.variables, which maps from internal names to user-defined names). For example, if your input data had the following names for LE, H, Rn, and G set in your config:
[DATA]
net_radiation_col = Net radiation, W/m2
ground_flux_col = Soil-heat flux, W/m2
latent_heat_flux_col = Latent-heat flux, W/m2
sensible_heat_flux_col = Sensible-heat flux, W/m2
Then the Data.df will utilize the same names, e.g.
>>> # d is a Data instance
>>> d.df.head()
produces:
date             Net radiation, W/m2   Latent-heat flux, W/m2   Sensible-heat flux, W/m2   Soil-heat flux, W/m2
10/1/2009 0:00   -54.02421778          0.70761                  0.95511                    -40.42365926
10/1/2009 0:30   -51.07744708          0.04837                  -1.24935                   -33.35383253
10/1/2009 1:00   -50.99438925          0.68862                  1.91101                    -43.17900525
10/1/2009 1:30   -51.35032377          -1.85829                 -15.4944                   -40.86201497
10/1/2009 2:00   -51.06604228          -1.80485                 -19.1357                   -39.80936855
Here is how you could rename your dataframe using flux-data-qaqc internal names,
>>> d.df.rename(columns=d.inv_map).head()
date             Rn             LE         H          G
10/1/2009 0:00   -54.02421778   0.70761    0.95511    -40.42365926
10/1/2009 0:30   -51.07744708   0.04837    -1.24935   -33.35383253
10/1/2009 1:00   -50.99438925   0.68862    1.91101    -43.17900525
10/1/2009 1:30   -51.35032377   -1.85829   -15.4944   -40.86201497
10/1/2009 2:00   -51.06604228   -1.80485   -19.1357   -39.80936855
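The relationship between the two mappings can be reproduced without the library. In this sketch the `variables` dictionary is a hand-written stand-in for Data.variables and `inv_map` is built by inverting it; both are assumptions for illustration and are not read from a real config file.

```python
import pandas as pd

# Hypothetical stand-in for Data.variables: internal name -> input column name
variables = {
    "Rn": "Net radiation, W/m2",
    "G": "Soil-heat flux, W/m2",
    "LE": "Latent-heat flux, W/m2",
    "H": "Sensible-heat flux, W/m2",
}
# Inverse mapping, analogous to Data.inv_map: input column name -> internal name
inv_map = {user: internal for internal, user in variables.items()}

df = pd.DataFrame(
    {
        "Net radiation, W/m2": [-54.02421778, -51.07744708],
        "Latent-heat flux, W/m2": [0.70761, 0.04837],
        "Sensible-heat flux, W/m2": [0.95511, -1.24935],
        "Soil-heat flux, W/m2": [-40.42365926, -33.35383253],
    },
    index=pd.to_datetime(["2009-10-01 00:00", "2009-10-01 00:30"]),
)
df.index.name = "date"

# rename returns a new frame; df itself keeps the user-defined names
renamed = df.rename(columns=inv_map)
```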
A minor note on variable naming: if your input data variables use exactly the same names used by flux-data-qaqc, they will be renamed by adding the prefix “input_”, e.g. “G” becomes “input_G”, the first time the data is read from disk, i.e. the first time Data.df is accessed.
Note
The temporal frequency of the input data is retained, unlike QaQc.df, which automatically resamples time series data to daily frequency.
- hourly_ASCE_refET(reference='short', anemometer_height=None)[source]
Calculate hourly ASCE standardized short (ETo) or tall (ETr) reference ET from input data and wind measurement height.
If the input data’s temporal frequency is finer than hourly, the input data will be resampled to hourly and the output reference ET time series will be returned as a datetime-indexed pandas.Series object. If the input data is already hourly, then the resulting time series will automatically be merged into the Data.df dataframe, named “ASCE_ETo” or “ASCE_ETr” respectively.
- Keyword Arguments:
reference (str) – default “short”, calculate tall or short ASCE reference ET.
anemometer_height (float or None) – wind measurement height in meters, default None. If None, then look for the “anemometer_height” entry in the METADATA section of the config.ini; if not there, then print a warning and use 2 meters.
- Returns:
Hint
The input variables needed to run this method are: vapor pressure, wind speed, incoming shortwave radiation, and average air temperature. If vapor pressure deficit and average air temperature exist, the actual vapor pressure will automatically be calculated.
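As a rough illustration of how actual vapor pressure can be recovered from vapor pressure deficit and air temperature, the sketch below uses the widely used FAO-56 style saturation vapor pressure curve. Whether flux-data-qaqc uses exactly this form is an assumption; the function names are hypothetical.

```python
import math


def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def actual_vapor_pressure(vpd_kpa, t_avg_c):
    """Actual vapor pressure (kPa): saturation vapor pressure minus the deficit."""
    return saturation_vapor_pressure(t_avg_c) - vpd_kpa


# hypothetical inputs: a deficit of 1 kPa at an average air temperature of 20 deg C
ea = actual_vapor_pressure(vpd_kpa=1.0, t_avg_c=20.0)
```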
- plot(ncols=1, output_type='save', out_file=None, suptitle='', plot_width=1000, plot_height=450, sizing_mode='fixed', merge_tools=False, link_x=True, **kwargs)[source]
Creates a series of interactive diagnostic line and scatter plots of input data in whichever temporal frequency it is in.
The main interactive features of the plots include: pan, selection and scroll zoom, hover tool that shows paired variable values including date, and linked x-axes that pan/zoom together for daily and monthly time series plots.
It is possible to change the format of the output plots including adjusting the dimensions of subplots, defining the number of columns of subplots, setting a super title that accepts HTML, and other options. If variables are not present for plots they will not be created and a warning message will be printed. There are two options for output: open a temporary file for viewing or save a copy to QaQc.out_dir.
A list of all potential time series plots created:
energy balance components
radiation components
multiple soil heat flux measurements
air temperature
vapor pressure and vapor pressure deficit
wind speed
precipitation
latent energy
multiple soil moisture measurements
- Keyword Arguments:
ncols (int) – default 1. Number of columns of subplots.
output_type (str) – default “save”. How to output plots: “save”, “show” in browser, “notebook” for Jupyter Notebooks, “return_figs” to return a list of Bokeh bokeh.plotting.figure.Figure objects, or “return_grid” to return the bokeh.layouts.gridplot.
out_file (str or None) – default None. Path to save output file; if None, save output to Data.out_dir with the name [site_id]_input_plots.html where [site_id] is Data.site_id.
suptitle (str or None) – default None. Super title to go above plots, accepts HTML/CSS syntax.
plot_width (int) – default 1000. Width of subplots in pixels.
plot_height (int) – default 450. Height of subplots in pixels, note for subplots the height will be forced as the same as plot_width.
sizing_mode (str) – default “fixed”. Bokeh option to scale dimensions of bokeh.layouts.gridplot.
merge_tools (bool) – default False. Merges all subplots’ toolbars into a single location if True.
link_x (bool) – default True. If True, link x axes of daily time series plots and monthly time series plots so that when zooming or panning on one plot they all zoom accordingly; the axes will also be of the same length.
Example
Starting from a correctly formatted config.ini and climate time series file, this example shows how to read in the data and produce a series of plots of input data as it is found in the input data file (unlike QaQc.plot, which produces plots at daily and monthly temporal frequency). This example also shows how to display a title at the top of the plot with the site’s location and site ID.
>>> from fluxdataqaqc import Data
>>> d = Data('path/to/config.ini')
>>> # create plot title from site ID and location in N. America
>>> title = "<b>Site:</b> {}; <b>Lat:</b> {}N; <b>Long:</b> {}W".format(
...     d.site_id, d.latitude, d.longitude
... )
>>> d.plot(
...     ncols=2, output_type='show', plot_width=500, suptitle=title
... )
Note, we specified the width of plots to be smaller than default because we want both columns of subplots to be viewable on the screen.
Tip
To reset all subplots at once, refresh the page with your web browser.
Note
Additional keyword arguments that are recognized by bokeh.layouts.gridplot are also accepted by Data.plot.
See also
- variable_names_dict = {'G': 'ground_flux_col', 'H': 'sensible_heat_flux_col', 'H_user_corr': 'sensible_heat_flux_corrected_col', 'LE': 'latent_heat_flux_col', 'LE_user_corr': 'latent_heat_flux_corrected_col', 'Rn': 'net_radiation_col', 'date': 'datestring_col', 'lw_in': 'longwave_in_col', 'lw_out': 'longwave_out_col', 'ppt': 'precip_col', 'rh': 'rel_humidity_col', 'sw_in': 'shortwave_in_col', 'sw_out': 'shortwave_out_col', 'sw_pot': 'shortwave_pot_col', 't_avg': 'avg_temp_col', 'vp': 'vap_press_col', 'vpd': 'vap_press_def_col', 'wd': 'wind_dir_col', 'ws': 'wind_spd_col'}
QaQc
- class fluxdataqaqc.QaQc(data=None, drop_gaps=True, daily_frac=1.0, max_interp_hours=2, max_interp_hours_night=4)[source]
-
Numerical routines for correcting daily energy balance closure for eddy covariance data and other data analysis tools.
Two routines are provided for improving energy balance closure by adjusting turbulent fluxes, latent energy and sensible heat, the Energy Balance Ratio method (modified from FLUXNET) and the Bowen Ratio method.
The QaQc object also has multiple tools for temporal frequency aggregation and resampling, estimation of climatic and statistical variables (e.g. ET and potential shortwave radiation), downloading gridMET reference ET, managing data and metadata, interactive validation plots, and managing a structure for input and output data files. Input data is expected to be a Data instance or a pandas.DataFrame.
- Keyword Arguments:
drop_gaps (bool) – default True. If True, automatically filter variables on days where the fraction of available sub-daily measurements is less than daily_frac.
daily_frac (float) – default 1.00. Fraction of sub-daily data required; otherwise the daily value will be filtered out if drop_gaps is True. E.g. if daily_frac = 0.5 and the input data is hourly, then data on days with less than 12 hours of data will be forced to null within QaQc.df. This is important because systematic diurnal gaps will affect the automatic resampling that occurs when creating a QaQc instance, and the daily data is used in closure corrections, other calculations, and plots. If sub-daily linear interpolation is applied to energy balance variables, the gaps are counted after the interpolation.
max_interp_hours (None or float) – default 2. Length of largest gap to fill with linear interpolation in energy balance variables if the input data’s temporal frequency is less than daily. This value will be used to fill gaps when \(Rn > 0\) or \(Rn\) is missing during each day.
max_interp_hours_night (None or float) – default 4. Length of largest gap to fill with linear interpolation in energy balance variables if the input data’s temporal frequency is less than daily when \(Rn < 0\) within 12:00PM-12:00PM daily intervals.
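The effect of daily_frac can be sketched with pandas resampling. This is an illustration under simplifying assumptions (hourly data, a single variable, no prior interpolation), not the routine used inside QaQc; the variable names and values are hypothetical.

```python
import numpy as np
import pandas as pd

daily_frac = 0.5          # fraction of sub-daily samples required per day
n_samples_per_day = 24    # hourly input data

idx = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
le = pd.Series(100.0, index=idx, name="LE")
le.iloc[24:40] = np.nan   # second day: 16 of 24 hourly values are missing

daily_mean = le.resample("D").mean()
# fraction of the expected samples that are actually present on each day
coverage = le.resample("D").count() / n_samples_per_day
# keep the daily mean only where enough sub-daily data exist
daily_filtered = daily_mean.where(coverage >= daily_frac)
```

Without the filter, the second day would still report a mean of 100 computed from only eight hours, which is the kind of bias from systematic diurnal gaps that the daily_frac option guards against.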
- agg_dict
Dictionary with internal variable names as keys and method of temporal resampling (e.g. “mean” or “sum”) as values.
- Type:
- config
Config parser instance created from the data within the config.ini file.
- config_file
Absolute path to config.ini file used for initialization of the fluxdataqaqc.Data instance used to create the QaQc instance.
- Type:
- corrected
False until an energy balance closure correction has been run by calling QaQc.correct_data.
- Type:
- corr_methods
List of Energy Balance Closure correction routines usable by QaQc.correct_data.
- Type:
- gridMET_exists
True if path to matching gridMET time series file exists on disk and has time series for reference ET and precipitation and the dates for these fully overlap with the energy balance variables, i.e. the date index of QaQc.df.
- Type:
- gridMET_meta
Dictionary with information for gridMET variables that may be downloaded using QaQc.download_gridMET.
- Type:
- inv_map
Dictionary with input climate file names as keys and internal names as values. May only include pairs when they differ.
- Type:
- out_dir
Default directory to save output of QaQc.write or QaQc.plot methods.
- Type:
- n_samples_per_day
If initial time series temporal frequency is less than daily then this value will be updated to the number of samples detected per day, useful for post-processing based on the count of sub-daily gaps in energy balance variables, e.g. “LE_subday_gaps”.
- Type:
- plot_file
Path to plot file once it is created/saved by QaQc.plot.
- Type:
pathlib.Path or None
- temporal_freq
Temporal frequency of initial (as found in input climate file) data as determined by pandas.infer_freq.
- Type:
- units
Dictionary with internal variable names as keys and units as found in config as values.
- Type:
- variables
Dictionary with internal variable names as keys and names as found in the input data as values.
- Type:
Note
Upon initialization of a QaQc instance the temporal frequency of the input data is checked using pandas.infer_freq, which does not always correctly parse datetime indices. If it is not able to correctly determine the temporal frequency, the time series will be resampled to daily frequency, but if it is in fact already at daily frequency the data will be unchanged. In this case QaQc.temporal_freq will be set to “na”.
- agg_dict = {'ASCE_ETo': 'sum', 'ASCE_ETr': 'sum', 'ET': 'sum', 'ET_corr': 'sum', 'ET_fill': 'sum', 'ET_fill_val': 'sum', 'ET_gap': 'sum', 'ET_user_corr': 'sum', 'G': 'mean', 'G_subday_gaps': 'sum', 'H': 'mean', 'H_corr': 'mean', 'H_subday_gaps': 'sum', 'H_user_corr': 'mean', 'LE': 'mean', 'LE_corr': 'mean', 'LE_subday_gaps': 'sum', 'LE_user_corr': 'mean', 'Rn': 'mean', 'Rn_subday_gaps': 'sum', 'br': 'mean', 'ebr': 'mean', 'ebr_5day_clim': 'mean', 'ebr_corr': 'mean', 'ebr_user_corr': 'mean', 'energy': 'mean', 'flux': 'mean', 'flux_corr': 'mean', 'gridMET_ETo': 'sum', 'gridMET_ETr': 'sum', 'gridMET_prcp': 'sum', 'lw_in': 'mean', 'lw_out': 'mean', 'ppt': 'sum', 'ppt_corr': 'sum', 'rh': 'mean', 'rso': 'mean', 'sw_in': 'mean', 'sw_out': 'mean', 'sw_pot': 'mean', 't_avg': 'mean', 't_dew': 'mean', 't_max': 'mean', 't_min': 'mean', 'vp': 'mean', 'vpd': 'mean', 'ws': 'mean'}
- corr_methods = ('ebr', 'br', 'lin_regress')
- correct_data(meth='ebr', et_gap_fill=True, y='Rn', refET='ETr', x=['G', 'LE', 'H'], fit_intercept=False)[source]
Correct turbulent fluxes to improve energy balance closure using an Energy Balance Ratio method modified from FLUXNET.
Currently three correction options are available: ‘ebr’ (Energy Balance Ratio), ‘br’ (Bowen Ratio), and ‘lin_regress’ (least squares linear regression). If you use one method followed by another, the corrected versions of LE, H, ET, ebr, etc. will be overwritten with the most recently used approach.
This method also computes potential clear sky radiation (saved as “rso”) using an ASCE approach based on station elevation and latitude. ET is calculated from raw and corrected LE using daily air temperature to correct the latent heat of vaporization; if air temperature is not available in the input data, then air temperature is assumed to be 20 degrees Celsius.
Corrected or otherwise newly calculated variables are named using the following suffixes to distinguish them:
uncorrected LE, H, etc. from input data have no suffix
_corr uses adjusted LE, H, etc. from the correction method used
_user_corr uses corrected LE, H, etc. found in data file (if provided)
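The conversion from latent energy to ET mentioned above can be sketched as follows. The linear temperature dependence of the latent heat of vaporization used here is a common approximation and is an assumption; the source does not state which expression flux-data-qaqc applies, and the function names are hypothetical.

```python
def latent_heat_of_vaporization(t_avg_c=20.0):
    """Latent heat of vaporization (MJ/kg) as a linear function of air temperature (deg C)."""
    return 2.501 - 0.002361 * t_avg_c


def le_to_et(le_w_m2, t_avg_c=20.0):
    """Convert daily mean latent energy (W/m2) to ET (mm/day)."""
    joules_per_day = le_w_m2 * 86400.0                 # J m-2 day-1
    lam = latent_heat_of_vaporization(t_avg_c) * 1e6   # J/kg
    return joules_per_day / lam                        # kg m-2 day-1, i.e. mm/day of water


# 100 W/m2 of latent energy at the fallback air temperature of 20 deg C
et = le_to_et(100.0)
```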
- Parameters:
y (str) – name of dependent variable for regression, must be in QaQc.variables keys, or a user-added variable. Only used if meth='lin_regress'.
x (str or list) – name or list of independent variables for regression, names must be in QaQc.variables keys, or a user-added variable. Only used if meth='lin_regress'.
- Keyword Arguments:
meth (str) – default ‘ebr’. Method to correct energy balance.
et_gap_fill (bool) – default True. If True, fill any remaining gaps in corrected ET with ETr * ETrF, where ETr is alfalfa reference ET from gridMET and ETrF is the filtered, smoothed (7 day moving avg. min 2 days) and linearly interpolated crop coefficient. The number of days in each month for which corrected ET is filled is provided in QaQc.monthly_df as the column “ET_gap”.
refET (str) – default “ETr”. Which gridMET reference product to use for ET gap filling, “ETr” or “ETo” are valid options.
fit_intercept (bool) – default False. Fit intercept for regression or set to zero if False. Only used if meth='lin_regress'.
apply_coefs (bool) – default False. If True, then apply fitted coefficients to their respective variables for the linear regression correction method, and rename the variables with the suffix “_corr”.
- Returns:
Example
Starting from a correctly formatted config.ini and climate time series file, this example shows how to read in the data and apply the energy balance ratio correction without gap-filling with gridMET ETr x ETrF.
>>> from fluxdataqaqc import Data, QaQc
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> q.corrected
False
Now apply the energy balance closure correction
>>> q.correct_data(meth='ebr', et_gap_fill=False)
>>> q.corrected
True
Note
If et_gap_fill is set to True (default), the gap filled days of corrected ET will be used to recalculate LE_corr for those days with the gap filled values, i.e. LE_corr will also be gap-filled.
Note
The ebr_corr variable, or energy balance closure ratio, is calculated from the corrected versions of LE and H independent of the method. When using the ‘ebr’ method the energy balance correction factor (what is applied to the raw H and LE) is left as calculated (inverse of ebr) and saved as ebc_cf.
See also
For explanation of the linear regression method see the QaQc.lin_regress method. Calling that method with the keyword argument apply_coefs=True and \(Rn\) as the y variable and the other energy balance components as the x variables will give the same result as the default inputs to this function when meth='lin_regress'.
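To make the relationship between ebr, the correction factor, and ebr_corr concrete, here is a deliberately simplified sketch using the raw daily ratio. The library's method, modified from FLUXNET, involves additional steps that are not reproduced here; the sample values are hypothetical and only the column names follow the internal naming shown on this page.

```python
import pandas as pd

df = pd.DataFrame(
    {"Rn": [150.0, 120.0], "G": [10.0, 20.0], "LE": [80.0, 50.0], "H": [40.0, 30.0]}
)

# energy balance ratio: turbulent fluxes divided by available energy
df["ebr"] = (df["H"] + df["LE"]) / (df["Rn"] - df["G"])
# correction factor is the inverse of the ratio (simplified: raw daily value)
df["ebc_cf"] = 1.0 / df["ebr"]
# apply the factor to both turbulent fluxes
df["LE_corr"] = df["LE"] * df["ebc_cf"]
df["H_corr"] = df["H"] * df["ebc_cf"]
# closure ratio recomputed from the corrected fluxes
df["ebr_corr"] = (df["H_corr"] + df["LE_corr"]) / (df["Rn"] - df["G"])
```

With the unsmoothed factor the corrected ratio closes exactly; in practice any filtering or smoothing of the factor means ebr_corr will generally differ from one.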
- daily_ASCE_refET(reference='short', anemometer_height=None)[source]
Calculate daily ASCE standardized short (ETo) or tall (ETr) reference ET from input data and wind measurement height.
The resulting time series will automatically be merged into the Data.df dataframe, named “ASCE_ETo” or “ASCE_ETr” respectively.
- Keyword Arguments:
reference (str) – default “short”, calculate tall or short ASCE reference ET.
anemometer_height (float or None) – wind measurement height in meters, default None. If None, then look for the “anemometer_height” entry in the METADATA section of the config.ini; if not there, then print a warning and use 2 meters.
- Returns:
Note
If the hourly ASCE variables were previously calculated from a Data instance they will be overwritten, as they are saved with the same names.
- property df
See fluxdataqaqc.Data.df, as the only difference is that the QaQc.df is first resampled to daily frequency.
- download_gridMET(variables=None)[source]
Download reference ET (alfalfa and grass) and precipitation from gridMET for all days in flux station time series by default.
Also has the ability to download other specific gridMET variables by passing a list of gridMET variable names. Possible variables and their long form can be found in QaQc.gridMET_meta.
Upon download, the gridMET time series for the nearest gridMET cell will be merged into the instance’s dataframe attribute QaQc.df, and all gridMET variable names will have the prefix “gridMET_” for identification.
The gridMET time series file will be saved to a subdirectory called “gridMET_data” within the directory that contains the config file for the current QaQc instance, and named with the site ID and gridMET cell centroid lat and long coordinates in decimal degrees.
- Parameters:
variables (None, str, list, or tuple) – default None. List of gridMET variable names to download; if None, download ETr and precipitation. See the keys of the QaQc.gridMET_meta dictionary for a list of all variables that can be downloaded by this method.
- Returns:
Note
Any previously downloaded gridMET time series will be overwritten when calling the method; however, if using the gap filling method of the “ebr” correction routine, the download will not overwrite currently existing data so long as gridMET reference ET and precipitation are on disk and the path is properly set in the config file.
- classmethod from_dataframe(df, site_id, elev_m, lat_dec_deg, var_dict, drop_gaps=True, daily_frac=1.0, max_interp_hours=2, max_interp_hours_night=4)[source]
Create a QaQc object from a pandas.DataFrame object.
- Parameters:
df (pandas.DataFrame) – DataFrame of climate variables with datetime index named ‘date’
site_id (str) – site identifier such as station name
lat_dec_deg (float) – latitude of site in decimal degrees
var_dict (dict) – dictionary that maps flux-data-qaqc variable names to user’s columns in df, e.g. {‘Rn’: ‘netrad’, …}; see fluxdataqaqc.Data.variable_names_dict for a list of flux-data-qaqc variable names
- Returns:
None
Note
When using this method, any output files (CSVs, plots) will be saved to a directory named “output” in the current working directory.
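A DataFrame and var_dict suitable for the parameters described above might be prepared along these lines. The column names, site identifier, elevation, latitude, and data values are all hypothetical, and the final call is left commented out because it requires flux-data-qaqc to be installed.

```python
import numpy as np
import pandas as pd

# 30-minute datetime index named 'date', as the df parameter requires
idx = pd.date_range(
    "2020-06-01", periods=48, freq=pd.Timedelta(minutes=30), name="date"
)
df = pd.DataFrame(
    {
        "netrad": np.linspace(-50.0, 400.0, 48),
        "g_flux": np.linspace(-20.0, 60.0, 48),
        "le_flux": np.linspace(0.0, 200.0, 48),
        "h_flux": np.linspace(-10.0, 120.0, 48),
    },
    index=idx,
)

# internal flux-data-qaqc names -> the user's column names in df
var_dict = {"Rn": "netrad", "G": "g_flux", "LE": "le_flux", "H": "h_flux"}

# sanity check before handing the frame over: every mapped column must exist
missing = [col for col in var_dict.values() if col not in df.columns]

# from fluxdataqaqc import QaQc
# q = QaQc.from_dataframe(df, "my_site", 1200.0, 40.0, var_dict)
```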
- gridMET_meta = {'ETr': {'name': 'daily_mean_reference_evapotranspiration_alfalfa', 'nc_suffix': 'agg_met_etr_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_ETr', 'units': 'mm'}, 'pet': {'name': 'daily_mean_reference_evapotranspiration_grass', 'nc_suffix': 'agg_met_pet_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_ETo', 'units': 'mm'}, 'pr': {'name': 'precipitation_amount', 'nc_suffix': 'agg_met_pr_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_prcp', 'units': 'mm'}, 'sph': {'name': 'daily_mean_specific_humidity', 'nc_suffix': 'agg_met_sph_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_q', 'units': 'kg/kg'}, 'srad': {'name': 'daily_mean_shortwave_radiation_at_surface', 'nc_suffix': 'agg_met_srad_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_srad', 'units': 'w/m2'}, 'tmmn': {'name': 'daily_minimum_temperature', 'nc_suffix': 'agg_met_tmmn_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_tmin', 'units': 'K'}, 'tmmx': {'name': 'daily_maximum_temperature', 'nc_suffix': 'agg_met_tmmx_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_tmax', 'units': 'K'}, 'vs': {'name': 'daily_mean_wind_speed', 'nc_suffix': 'agg_met_vs_1979_CurrentYear_CONUS.nc#fillmismatch', 'rename': 'gridMET_u10', 'units': 'm/s'}}
- lin_regress(y, x, fit_intercept=False, apply_coefs=False)[source]
Least squares linear regression on single or multiple independent variables.
For example, if the dependent variable (y) is \(Rn\) and the independent variables (x) are \(LE\) and \(H\), then the linear regression will solve for the best fit coefficients of \(Rn = c_0 + c_1 LE + c_2 H\). Any number of variables in QaQc.variables can be used for x, and one for y.
If the variables chosen for regression are part of the energy balance components, i.e. \(Rn, G, LE, H\), and apply_coefs=True, then the best fit coefficients will be applied to their respective variables with consideration of the energy balance equation, i.e. the signs of the coefficients will be corrected according to \(Rn - G = LE + H\). For example, if y=H and x=['Rn','G','LE'], i.e. solving \(H = c_0 + c_1 Rn + c_2 G + c_3 LE\), then the coefficients \(c_2\) and \(c_3\) will be multiplied by -1 before applying them to correct \(G\) and \(LE\) according to the energy balance equation.
This method returns a pandas.DataFrame object containing results of the linear regression including the coefficient values, the number of data pairs used in the regression, the root-mean-square-error, and the coefficient of determination. This table can also be retrieved from the QaQc.lin_regress_results instance attribute.
- Arguments:
- y (str): name of dependent variable for regression, must be in QaQc.variables keys, or a user-added variable.
- x (str or list): name or list of independent variables for regression, names must be in QaQc.variables keys, or a user-added variable.
- Keyword Arguments:
- fit_intercept (bool): default False. Fit intercept for regression or set to zero if False.
- apply_coefs (bool): default False. If True then apply fitted coefficients to their respective variables, rename the variables with the suffix “_corr”.
- Returns:
- Example:
Let’s say we wanted to compute the linear relationship between net radiation and the other energy balance components, which may be useful if, for example, we have strong confidence in net radiation measurements. The resulting coefficients of regression would give us an idea of whether the other components were “under-measured” or “over-measured”. Then, starting with a Data instance:
>>> Q = QaQc(Data_instance)
>>> Q.lin_regress(y='Rn', x=['G','H','LE'], fit_intercept=True)
>>> Q.lin_regress_results
This would produce something like the following,
SITE_ID  Y (dependent var.)  c0 (intercept)    c1 (coef on G)  c2 (coef on LE)  c3 (coef on H)  RMSE (w/m2)  r2 (coef. det.)  n (sample count)
a_site   Rn                  6.99350781229883  1.552           1.054            0.943           18.25        0.91             3386
In this case the intercept is telling us that there may be a systematic or constant error in the independent variables and that \(G\) is “under-measured” at the sensor by over 50 percent, etc., if we assume daily \(Rn\) is accurate as measured.
- Tip:
You may also use multiple linear regression to correct energy balance components using the QaQc.correct_data method by passing the meth='lin_regress' keyword argument.
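The regression of \(Rn\) on the other energy balance components with an intercept can be sketched with NumPy's least squares solver. The data below are synthetic and noise-free with made-up coefficients, so this only illustrates the fitting step and the RMSE calculation; it is not the internals of QaQc.lin_regress.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = rng.normal(20.0, 5.0, n)
LE = rng.normal(80.0, 20.0, n)
H = rng.normal(40.0, 10.0, n)
# synthetic "truth" with hypothetical coefficients and no noise
Rn = 5.0 + 1.5 * G + 1.0 * LE + 0.9 * H

# design matrix with a leading column of ones for the intercept c0
X = np.column_stack([np.ones(n), G, LE, H])
coefs, *_ = np.linalg.lstsq(X, Rn, rcond=None)
c0, c1, c2, c3 = coefs

# root-mean-square error of the fit
pred = X @ coefs
rmse = float(np.sqrt(np.mean((Rn - pred) ** 2)))
```

Dropping the column of ones from the design matrix corresponds to fit_intercept=False, which forces the fitted relationship through the origin.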
- property monthly_df
Temporally resample time series data to monthly frequency using monthly means or sums as defined in QaQc.agg_dict; provides data as a pandas.DataFrame.
Note that monthly means or sums are forced to null values if more than 20 percent of a month’s days are missing in the daily data (QaQc.df). Also, for variables that are summed (e.g. ET or precipitation), missing days (if less than 20 percent of the month) will be filled with the month’s daily mean value before summation.
If a QaQc instance has not yet run an energy balance correction, i.e. QaQc.corrected = False, before accessing monthly_df, then the default routine of data correction (energy balance ratio method) will be conducted.
Utilize the QaQc.monthly_df property the same way as fluxdataqaqc.Data.df; see its API documentation for examples.
Tip
If you have additional variables in QaQc.df or would like to change the aggregation method for the monthly time series, adjust the instance attribute QaQc.agg_dict before accessing QaQc.monthly_df.
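The monthly aggregation rules can be sketched in pandas. This illustration assumes that a month is nulled when more than 20 percent of its days are missing and that summed variables are filled with the month's daily mean before summation; it is not the library's implementation, and the two-variable `agg_dict` and data values are hypothetical.

```python
import numpy as np
import pandas as pd

agg_dict = {"ET": "sum", "LE": "mean"}   # small subset in the style of QaQc.agg_dict
max_missing_frac = 0.2

idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
daily = pd.DataFrame({"ET": 2.0, "LE": 100.0}, index=idx)
daily.loc["2020-01-01":"2020-01-03", "ET"] = np.nan   # 3 of 31 days missing
daily.loc["2020-02-01":"2020-02-10", "ET"] = np.nan   # 10 of 29 days missing

monthly = {}
for var, how in agg_dict.items():
    by_month = daily[var].groupby(pd.Grouper(freq="MS"))
    # fraction of days in each month with no value
    missing_frac = 1.0 - by_month.count() / by_month.size()
    if how == "sum":
        # fill missing days with that month's daily mean before summing
        filled = daily[var].fillna(by_month.transform("mean"))
        result = filled.groupby(pd.Grouper(freq="MS")).sum()
    else:
        result = by_month.mean()
    # null out months with too many missing days
    monthly[var] = result.where(missing_frac <= max_missing_frac)

monthly_df = pd.DataFrame(monthly)
```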
- plot(ncols=1, output_type='save', out_file=None, suptitle='', plot_width=1000, plot_height=450, sizing_mode='fixed', merge_tools=False, link_x=True, **kwargs)[source]
Creates a series of interactive diagnostic line and scatter plots of input and computed daily and monthly aggregated data.
The main interactive features of the plots include: pan, selection and scroll zoom, hover tool that shows paired variable values including date, and linked x-axes that pan/zoom together for daily and monthly time series plots.
It is possible to change the format of the output plots including adjusting the dimensions of subplots, defining the number of columns of subplots, setting a super title that accepts HTML, and other options. If variables are not present for plots they will not be created and a warning message will be printed. There are two options for output: open a temporary file for viewing or save a copy to QaQc.out_dir.
A list of all potential time series plots created:
energy balance components
radiation components
incoming shortwave radiation with ASCE potential clear sky (daily only)
multiple soil heat flux measurements
air temperature
vapor pressure and vapor pressure deficit
wind speed
station precipitation and gridMET precipitation
initial and corrected latent energy
initial, corrected, gap filled, and reference evapotranspiration
crop coefficient and smoothed and interpolated crop coefficient
initial and corrected energy balance ratio
multiple soil moisture measurements
A list of all potential scatter plots created:
radiative versus turbulent fluxes, initial and corrected
initial versus corrected latent energy
initial versus corrected evapotranspiration
- Keyword Arguments:
ncols (int) – default 1. Number of columns of subplots.
output_type (str) – default “save”. How to output plots: “save”, “show” in browser, “notebook” for Jupyter Notebooks, “return_figs” to return a list of bokeh.plotting.figure.Figure objects, or “return_grid” to return the bokeh.layouts.gridplot.
out_file (str or None) – default None. Path to save output file; if None, save output to QaQc.out_dir with the name [site_id]_plots.html where [site_id] is QaQc.site_id.
suptitle (str or None) – default '' (empty string). Super title to go above plots, accepts HTML/CSS syntax.
plot_width (int) – default 1000. Width of subplots in pixels.
plot_height (int) – default 450. Height of subplots in pixels, note for subplots the height will be forced as the same as plot_width.
sizing_mode (str) – default “fixed”. Bokeh option to scale dimensions of bokeh.layouts.gridplot.
merge_tools (bool) – default False. Merges all subplots toolbars into a single location if True.
link_x (bool) – default True. If True, link the x axes of daily time series plots and monthly time series plots so that zooming or panning on one plot moves them all accordingly; the axes will also be of the same length.
Example
Starting from a correctly formatted config.ini and climate time series file, this example shows how to read in the data and produce the default series of plots for viewing, with the addition of text at the top of the plot that states the site’s location and ID.
>>> from fluxdataqaqc import Data, QaQc
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> q.correct_data()
>>> # create plot title from site ID and location in N. America
>>> title = "<b>Site:</b> {}; <b>Lat:</b> {}N; <b>Long:</b> {}W".format(
...     q.site_id, q.latitude, q.longitude
... )
>>> q.plot(
...     ncols=2, output_type='show', plot_width=500, suptitle=title
... )
Note, we specified the width of plots to be smaller than default because we want both columns of subplots to be viewable on one page.
Tip
To reset all subplots at once, refresh the page with your web browser.
Note
Additional keyword arguments that are recognized by bokeh.layouts.gridplot are also accepted by QaQc.plot.
- write(out_dir=None, use_input_names=False)[source]
Save daily and monthly time series of initial and “corrected” data in CSV format.
Note, if the energy balance closure correction (QaQc.correct_data) has not been run, this method will run it with default options before saving time series files to disk.
The default location for saving output time series files is within an “output” subdirectory of the parent directory containing the config.ini file that was used to create the fluxdataqaqc.Data and QaQc objects. The names of the files will start with the site_id and have either the “daily_data” or “monthly_data” suffix.
- Keyword Arguments:
out_dir (str or None) – default None. Directory to save CSVs; if None, save to the out_dir instance variable (typically the “output” directory where the config.ini file exists).
use_input_names (bool) – default False. If False, use flux-data-qaqc variable names in the output file header; if True, use the user’s input variable names where possible (for variables that were read in and not modified or calculated by flux-data-qaqc).
- Returns:
Example
Starting from a config.ini file,
>>> from fluxdataqaqc import Data, QaQc
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> # note no energy balance closure correction has been run
>>> q.corrected
False
>>> q.write()
>>> q.corrected
True
Note
To save data created by multiple correction routines, be sure to run each correction and then save to a different output directory; otherwise output files will be overwritten by the most recently used correction option.
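The default output location described above can be sketched with pathlib. This is only an illustration under stated assumptions: the helper default_output_paths is hypothetical and not part of flux-data-qaqc, and the exact file names (site_id followed by "_daily_data.csv" or "_monthly_data.csv") are a guess based on the prefix and suffix described in the text.

```python
from pathlib import Path


def default_output_paths(config_path, site_id):
    """Hypothetical helper: guess where the daily and monthly CSVs would land.

    Assumes an "output" folder next to the config file and file names of the
    form <site_id>_daily_data.csv and <site_id>_monthly_data.csv.
    """
    out_dir = Path(config_path).parent / "output"
    return {
        "daily": out_dir / f"{site_id}_daily_data.csv",
        "monthly": out_dir / f"{site_id}_monthly_data.csv",
    }


# "my_site" is a made-up site ID used only for this example
paths = default_output_paths("path/to/config.ini", "my_site")
print(paths["daily"].as_posix())
print(paths["monthly"].as_posix())
```

Passing a different out_dir to QaQc.write for each correction routine would keep one such pair of files per routine instead of overwriting them.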
Plot
- class fluxdataqaqc.Plot[source]
Bases:
object
Container of plot routines of fluxdataqaqc including static methods that can be used to create and update interactive line and scatter plots from an arbitrary pandas.DataFrame instance.
Note
The Data and QaQc objects both inherit all methods of Plot, which allows them to be easily used for custom interactive time series plots of input data (in fluxdataqaqc.Data.df) and of daily and monthly data in fluxdataqaqc.QaQc.df and QaQc.monthly_df.
- static add_lines(fig, df, plt_vars, colors, x_name, source, labels=None, **kwargs)[source]
Add multiple time series to a bokeh.plotting.figure.Figure object using data from a datetime indexed pandas.DataFrame, with an interactive hover tool.
Interactive hover shows the values of all time series data and the date that is added to the figure.
- Parameters:
df (pandas.DataFrame) – pandas.DataFrame containing time series data.
plt_vars (list) – list of data columns in df to plot.
colors (list) – list of line colors for variables in plt_vars.
x_name (str) – name of the x-axis variable, e.g. the datetime index, in the pandas.DataFrame (df) containing data to plot.
source (bokeh.models.sources.ColumnDataSource) – column data source created from the pandas.DataFrame with data to plot, i.e. df.
labels (list or None) – default None. Labels for each plot variable in plt_vars.
- Returns:
None if none of the variables in plt_vars are found in df, otherwise the updated figure.
- Return type:
ret (None or bokeh.plotting.figure.Figure)
Example
Similar to Plot.line_plot, we first need to create a bokeh.models.sources.ColumnDataSource from a pandas.DataFrame. This example shows how to plot two variables, daily corrected latent energy and sensible heat, on the same plot.
>>> from fluxdataqaqc import Data, QaQc, Plot
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> q.correct_data()
Now the QaQc instance should have the “LE_corr” (corrected latent energy) and “H_corr” (corrected sensible heat) columns. We can now make a bokeh.models.sources.ColumnDataSource from fluxdataqaqc.QaQc.df or fluxdataqaqc.QaQc.monthly_df,
>>> from bokeh.plotting import ColumnDataSource, figure, show
>>> df = q.df
>>> plt_vars = ['LE_corr', 'H_corr']
>>> colors = ['blue', 'red']
>>> labels = ['LE', 'H']
>>> source = ColumnDataSource(df)
>>> fig = figure(
...     x_axis_label='date', y_axis_label='Corrected Turbulent Flux'
... )
>>> Plot.add_lines(
...     fig, df, plt_vars, colors, 'date', source, labels=labels
... )
>>> show(fig)
- static line_plot(fig, x, y, source, color, label=None, x_axis_type='date', **kwargs)[source]
Add a single time series to a bokeh.plotting.figure.Figure object using data from a datetime indexed pandas.DataFrame, with an interactive hover tool.
Interactive hover shows the values of all time series data and the date that is added to the figure.
- Parameters:
fig (bokeh.plotting.figure.Figure) – a figure instance to add the line to.
x (str) – name of the datetime index or column in the pandas.DataFrame containing data to plot.
y (str) – name of the column in the pandas.DataFrame to plot.
source (bokeh.models.sources.ColumnDataSource) – column data source created from the pandas.DataFrame with data to plot.
color (str) – color of plot line, see Bokeh for color options.
label (str or None) – default None. Label for plot legend (for y).
x_axis_type (str or None) – default ‘date’. If “date” then the x-axis will be formatted as month-day-year.
- Returns:
Example
To use the Plot.line_plot function we first need to create a bokeh.models.sources.ColumnDataSource from a pandas.DataFrame. Let’s say we want to plot the monthly time series of corrected latent energy, starting from a config.ini file,
>>> from fluxdataqaqc import Data, QaQc, Plot
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> q.correct_data()
Now the QaQc instance should have the “LE_corr” (corrected latent energy) column. We can now make a bokeh.models.sources.ColumnDataSource from fluxdataqaqc.QaQc.df or fluxdataqaqc.QaQc.monthly_df,
>>> from bokeh.plotting import ColumnDataSource, figure, show
>>> source = ColumnDataSource(q.monthly_df)
>>> # create the figure before using line_plot
>>> fig = figure(x_axis_label='date', y_axis_label='Corrected LE')
>>> Plot.line_plot(
...     fig, 'date', 'LE_corr', source, color='red', line_width=3
... )
>>> show(fig)
Notice, line_width is not an argument to Plot.line_plot but it is an acceptable keyword argument to bokeh.plotting.figure.Figure and therefore will work as expected.
- static scatter_plot(fig, x, y, source, color, label='', lsrl=True, date_name='date', **kwargs)[source]
Add paired time series data to an interactive Bokeh scatter plot (bokeh.plotting.figure.Figure).
Handles missing data points (gaps) by masking out indices in x and y where one or both are null. The lsrl option adds the best fit least squares linear regression line with y-intercept through zero and reports the slope of the line in the figure legend. Interactive hover shows the values of all paired (x,y) data and the date that is added to the figure.
- Returns:
minimum and maximum x and minimum and maximum y values of the paired data, which can be used for adding a one to one line to the figure or for other uses.
- Return type:
(tuple)
Example
Let’s say that we wanted to run the energy balance ratio closure correction including gap filling with reference ET * crop coefficient and then plot corrected ET versus the calculated ET from reference ET (named “ET_fill” in flux-data-qaqc), which is calculated on all days, even those without gaps. Similar to Plot.line_plot, we first need to create a bokeh.models.sources.ColumnDataSource from a pandas.DataFrame.
>>> from fluxdataqaqc import Data, QaQc
>>> d = Data('path/to/config.ini')
>>> q = QaQc(d)
>>> q.correct_data()
Now the QaQc instance should have the “ET_corr” (corrected ET) and “ET_fill” (ET calculated from reference ET and crop coefficient) columns. We can now make a bokeh.models.sources.ColumnDataSource from fluxdataqaqc.QaQc.df or fluxdataqaqc.QaQc.monthly_df,
>>> from bokeh.plotting import ColumnDataSource, figure, show
>>> df = q.df
>>> source = ColumnDataSource(df)
>>> fig = figure(
...     x_axis_label='ET, corrected', y_axis_label='ET, gap fill'
... )
>>> # note, we are calling this plot method from a QaQc instance
>>> q.scatter_plot(
...     fig, 'ET_corr', 'ET_fill', source, 'red', label='lslr'
... )
>>> show(fig)
The label keyword argument will be used in the legend, and since the least squares linear regression line between x and y is being calculated, the slope of the line will also be printed in the legend. In this case, if the slope of the regression line is 0.94 then the legend will read “lslr, slope=0.94”.
Note
Extra keyword arguments (accepted by bokeh.plotting.figure.Figure) will be passed to the scatter plot but not to the least squares regression line plot.
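The gap masking and zero-intercept fit described for the lsrl option can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper origin_slope is hypothetical, the sample values are made up, and missing values are assumed to be None or NaN. For a line forced through the origin, the least squares slope is sum(x*y) / sum(x*x).

```python
import math


def origin_slope(x, y):
    """Slope of the least squares line through the origin for paired data.

    Hypothetical helper for illustration; pairs where either value is
    missing (None or NaN) are masked out before fitting.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a) and not is_missing(b)]
    sxx = sum(a * a for a, _ in pairs)
    if sxx == 0:
        # no usable pairs, or all x values are zero
        return None
    sxy = sum(a * b for a, b in pairs)
    return sxy / sxx


# made-up values: the third and fourth pairs each have one missing member
et_corr = [1.0, 2.0, None, 4.0]
et_fill = [0.9, 2.1, 3.0, float('nan')]

slope = origin_slope(et_corr, et_fill)
label = 'lslr'
legend = "{}, slope={:.2f}".format(label, slope)
print(legend)
```

Only the first two pairs survive the mask here, so the slope is computed from those alone; the legend string follows the “label, slope=value” pattern described above.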
utility classes and functions
Collection of utility objects and functions for the fluxdataqaqc
module.
- class fluxdataqaqc.util.Convert[source]
Bases:
object
Tools for unit conversions for the flux-data-qaqc module.
- allowable_units = {'G': ['w/m2', 'mj/m2'], 'H': ['w/m2', 'mj/m2'], 'LE': ['w/m2', 'mj/m2'], 'Rn': ['w/m2', 'mj/m2'], 'lw_in': ['w/m2', 'mj/m2'], 'lw_out': ['w/m2', 'mj/m2'], 'ppt': ['mm', 'in', 'm'], 'sw_in': ['w/m2'], 'sw_out': ['w/m2', 'mj/m2'], 't_avg': ['c', 'f', 'k'], 't_max': ['c', 'f', 'k'], 't_min': ['c', 'f', 'k'], 'vp': ['kpa', 'hpa', 'pa'], 'vpd': ['kpa', 'hpa', 'pa'], 'ws': ['m/s', 'mph']}
- classmethod convert(var_name, initial_unit, desired_unit, df)[source]
Given a valid initial and desired unit for a variable within a pandas.DataFrame, make the conversion and return the updated pandas.DataFrame.
For a list of variables that require certain units within flux-data-qaqc, see Convert.allowable_units (names of allowable options for input variable units) and Convert.required_units (the mandatory units of certain variables before running QaQc calculations).
- Parameters:
var_name (str) – name of variable to convert in df.
initial_unit (str) – name of initial unit of variable, must be valid from Convert.allowable_units.
desired_unit (str) – name of units to convert to, also must be valid.
df (pandas.DataFrame) – pandas.DataFrame containing variable to be converted, i.e. with var_name in columns.
- Returns:
updated dataframe with specified variable’s units converted
- Return type:
df (pandas.DataFrame)
Note
Many potential units may not be provided for automatic conversion; if so, you may need to update your variable units manually, e.g. within a Data.df before creating a QaQc instance. Unit conversions are required for variables that can potentially be used in calculations within Data or QaQc.
- pretty_unit_names = {'c': 'C', 'f': 'F', 'hpa': 'hPa', 'k': 'K', 'kpa': 'kPa', 'pa': 'Pa'}
- required_units = {'G': 'w/m2', 'H': 'w/m2', 'LE': 'w/m2', 'Rn': 'w/m2', 'lw_in': 'w/m2', 'lw_out': 'w/m2', 'ppt': 'mm', 'sw_in': 'w/m2', 'sw_out': 'w/m2', 't_avg': 'c', 't_max': 'c', 't_min': 'c', 'vp': 'kpa', 'vpd': 'kpa', 'ws': 'm/s'}
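The idea behind Convert, mapping an allowable input unit to the required unit, can be sketched with a small lookup table. This is an illustration under stated assumptions and not the library's code: the CONVERSIONS table and convert_values helper are hypothetical, they work on plain lists instead of a pandas.DataFrame, and only temperature, pressure, precipitation, and wind speed conversions are covered (energy flux conversions are left out of this sketch).

```python
# Hypothetical conversion table keyed by (initial_unit, desired_unit);
# unit names follow the lowercase style of Convert.allowable_units.
CONVERSIONS = {
    ('f', 'c'): lambda v: (v - 32.0) * 5.0 / 9.0,
    ('k', 'c'): lambda v: v - 273.15,
    ('hpa', 'kpa'): lambda v: v / 10.0,
    ('pa', 'kpa'): lambda v: v / 1000.0,
    ('in', 'mm'): lambda v: v * 25.4,
    ('m', 'mm'): lambda v: v * 1000.0,
    ('mph', 'm/s'): lambda v: v * 0.44704,
}


def convert_values(values, initial_unit, desired_unit):
    """Convert a list of numbers between two units (illustrative helper)."""
    initial_unit = initial_unit.lower()
    desired_unit = desired_unit.lower()
    if initial_unit == desired_unit:
        return list(values)
    try:
        func = CONVERSIONS[(initial_unit, desired_unit)]
    except KeyError:
        raise ValueError(
            'no conversion from {} to {}'.format(initial_unit, desired_unit)
        ) from None
    return [func(v) for v in values]


# made-up sample values
t_avg_c = convert_values([32.0, 212.0], 'F', 'c')
vp_kpa = convert_values([1013.0], 'hpa', 'kpa')
print(t_avg_c, vp_kpa)
```

An unsupported pair raises ValueError in this sketch, which mirrors the note above that some units must be converted manually before creating a QaQc instance.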
- fluxdataqaqc.util.monthly_resample(df, cols, agg_str, thresh=0.75)[source]
Resample dataframe to monthly frequency while excluding months missing more than a specified percentage of days of the month.
- Parameters:
df (pandas.DataFrame) – datetime indexed DataFrame instance
cols (list) – list of columns in df to resample to monthly frequency
agg_str (str) – resample function as string, e.g. ‘mean’ or ‘sum’
- Keyword Arguments:
thresh (float) – default 0.75. Threshold (decimal fraction) of how many days in a month must exist for it to be temporally resampled, otherwise the monthly value for the month will be null.
- Returns:
datetime indexed DataFrame that has been resampled to monthly time frequency.
- Return type:
ret (pandas.DataFrame)
Note
If taking monthly totals (agg_str = ‘sum’), missing days will be filled with the month’s daily mean before summation.
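The threshold rule and the mean-filling of missing days can be sketched without pandas. This is a simplified stand-in and not the library's implementation: monthly_aggregate is a hypothetical helper that takes a dict of dates rather than a DataFrame, and it assumes a month is kept when the fraction of days present is greater than or equal to thresh.

```python
import calendar
from collections import defaultdict
from datetime import date


def monthly_aggregate(daily, agg='mean', thresh=0.75):
    """Aggregate daily values to months (illustrative helper).

    daily maps datetime.date -> float, with None marking a missing day.
    Returns a dict keyed by (year, month); months with too few days are None.
    """
    present = defaultdict(list)
    months = set()
    for day, value in daily.items():
        months.add((day.year, day.month))
        if value is not None:
            present[(day.year, day.month)].append(value)

    result = {}
    for year, month in sorted(months):
        values = present.get((year, month), [])
        n_days = calendar.monthrange(year, month)[1]
        if not values or len(values) / n_days < thresh:
            result[(year, month)] = None
            continue
        mean = sum(values) / len(values)
        if agg == 'sum':
            # filling each missing day with the daily mean and then summing
            # is the same as multiplying the mean by the days in the month
            result[(year, month)] = mean * n_days
        else:
            result[(year, month)] = mean
    return result


# made-up data: June has 24 of 30 days, July has 10 of 31 days
daily = {date(2020, 6, d): (2.0 if d <= 24 else None) for d in range(1, 31)}
daily.update(
    {date(2020, 7, d): (1.0 if d <= 10 else None) for d in range(1, 32)}
)

monthly_sum = monthly_aggregate(daily, agg='sum')
monthly_mean = monthly_aggregate(daily, agg='mean')
print(monthly_sum)
```

In this example June passes the threshold (24/30 = 0.8) and its six missing days are filled with the daily mean before summing, while July falls below the threshold and is set to null.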
- fluxdataqaqc.util.write_configs(meta_df, data_dict, out_dir=None)[source]
Write multiple config files based on collection of site metadata and a dictionary containing variable information.
Useful for creating config files for flux-data-qaqc for batches of flux stations that utilize the same naming conventions and formatting.
- Parameters:
meta_df (pandas.DataFrame) – dataframe that contains the following columns (or more) that describe metadata for multiple climate stations: ‘site_id’, ‘climate_file_path’, ‘station_longitude’, ‘station_elevation’, ‘station_latitude’, and ‘missing_data_value’. Elevation should be in meters and latitude in decimal degrees. Additional metadata columns will be added to the config file for each site, e.g. ‘QC_flag’, ‘anemometer_height’, and any others.
data_dict (dict) – dictionary that maps flux-data-qaqc config names to the user’s column names in the input file’s header, e.g. {‘net_radiation_col’: ‘netrad’, ‘net_radiation_units’: ‘w/m2’}. Anything that can appear in the “DATA” section of a flux-data-qaqc config file can be present here, including QC flag names and multiple soil moisture names and weights.
- Keyword Arguments:
out_dir (str or None) – default None. Directory to save config files, if None then save to current working directory.
- Returns:
list of pathlib.Path objects of full paths to each config file written.
- Return type:
configs (list)
- Raises:
Exception – if one of the mandatory metadata columns does not exist in meta_df.
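The batch workflow described for write_configs can be sketched with the standard library. This is not the library's implementation: write_site_configs is a hypothetical helper, a list of dicts stands in for the rows of meta_df, the “METADATA” section name and the [site_id]_config.ini file name are assumptions (only the “DATA” section is named in the text above), and all sample values are made up.

```python
import configparser
import tempfile
from pathlib import Path

# mandatory metadata columns listed in the parameter description above
REQUIRED = ('site_id', 'climate_file_path', 'station_latitude',
            'station_longitude', 'station_elevation', 'missing_data_value')


def write_site_configs(meta_rows, data_dict, out_dir=None):
    """Write one config file per site and return the paths written."""
    out_dir = Path(out_dir) if out_dir is not None else Path.cwd()
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in meta_rows:
        missing = [key for key in REQUIRED if key not in row]
        if missing:
            raise KeyError('missing mandatory metadata: {}'.format(missing))
        config = configparser.ConfigParser(interpolation=None)
        config.optionxform = str  # keep the case of names such as QC_flag
        # assumed section name for site metadata
        config['METADATA'] = {k: str(v) for k, v in row.items()}
        config['DATA'] = {k: str(v) for k, v in data_dict.items()}
        # assumed file naming pattern
        path = out_dir / '{}_config.ini'.format(row['site_id'])
        with open(path, 'w') as handle:
            config.write(handle)
        written.append(path)
    return written


# made-up metadata for two sites sharing one naming convention
meta_rows = [
    {'site_id': 'site_a', 'climate_file_path': 'site_a.csv',
     'station_latitude': 40.0, 'station_longitude': -105.0,
     'station_elevation': 1600, 'missing_data_value': -9999},
    {'site_id': 'site_b', 'climate_file_path': 'site_b.csv',
     'station_latitude': 41.5, 'station_longitude': -106.2,
     'station_elevation': 2100, 'missing_data_value': -9999},
]
data_dict = {'net_radiation_col': 'netrad', 'net_radiation_units': 'w/m2'}

with tempfile.TemporaryDirectory() as tmp:
    config_paths = write_site_configs(meta_rows, data_dict, out_dir=tmp)
    names = [p.name for p in config_paths]
    parsed = configparser.ConfigParser()
    parsed.read(config_paths[0])
    net_rad_col = parsed['DATA']['net_radiation_col']
print(names, net_rad_col)
```

Sharing one data_dict across all rows is what makes this approach practical for batches of stations that use the same column naming conventions.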