Docstrings for savvy modules
Included below are the docstrings for all modules and functions included in savvy.
sensitivity_tools.py
Functions for setting up and carrying out a Sobol sensitivity analysis on your model.
This module requires the package SALib. If you don’t have SALib you can use the other functionality in savvy but will not be able to perform new sensitivity analyses.
-
sensitivity_tools.analyze_sensitivity(problem, Y, column, delimiter, order, name, parallel=False, processors=4)
Perform the sensitivity analysis after you have run your model with all the parameters from gen_params(). This is done from the command line because it is faster and gives the option to specify the column of the results file to analyze. Parallel processing is supported. Results are saved to a file using the name parameter.
Parameters: - problem (str) – the path to the saparams* file that contains the problem definition.
- Y (str) – the path to the results file. Results should be in a file without a header. Each line of the file must contain results that correspond to the same line of the param_sets generated in gen_params().
- column (int) – integer specifying the column number of the results to analyze (zero indexed).
- delimiter (str) – string specifying the column delimiter used in the results.
- order (int) – the maximum order of sensitivity indices [1 or 2].
- name (str) – the name of the output measure to use when saving the sensitivity analysis results to a file.
- parallel (bool, optional) – boolean indicating whether to use parallel processing.
- processors (int, optional) – if parallel is True, this is an integer specifying the number of processors to use.
Return type: None
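analyze_sensitivity() expects a headerless results file in which line i holds the outputs for row i of the param_sets array, and column selects one zero-indexed field. The sketch below uses only the standard library and a hypothetical helper name (read_results_column) to show that layout; it is an illustration of the file format, not savvy's own implementation.

```python
import csv
import io


def read_results_column(lines, column, delimiter):
    """Return one zero-indexed column of a headerless results file as floats.

    Hypothetical helper for illustration only: it mirrors the layout that
    analyze_sensitivity() expects (no header, one line per parameter set).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    # Row order must match the row order of param_sets from gen_params();
    # blank lines are skipped.
    return [float(row[column]) for row in reader if row]


# Three model runs, two output measures each, separated by spaces.
results = io.StringIO("0.12 3.4\n0.15 2.9\n0.11 3.8\n")
print(read_results_column(results, column=1, delimiter=" "))
```

Passing column=1 picks the second output measure, matching the zero-indexed convention of the column parameter.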
-
sensitivity_tools.gen_params(num_vars, names, bounds, n, save_loc, second_ord=True)
Generate the parameter sets for the Sobol sensitivity analysis. Saves a file with the information required for the analysis that will be performed later.
Parameters: - num_vars (int) – the number of parameters you will vary.
- names (list) – list of strings with the names of the parameters
- bounds (list) – list of lists, where each inner list contains the lower and upper bounds for a given parameter.
- n (int) – number of initial samples to generate from the quasi-random Sobol sequence. n parameter sets are generated from the Sobol sequence, and the Saltelli cross-sampling method is then applied, giving a total of 2n(num_vars+1) parameter sets to run if second_ord=True.
- save_loc (str) – path to the directory where you would like to save the parameters.
- second_ord (bool, optional) – a boolean indicating whether or not to calculate second order sensitivity indices. If False, only first and total order indices are calculated and n(num_vars+2) parameter sets are generated.
Returns: param_sets – an ndarray where each row is one set of parameter values. You must run your model (in whatever environment is appropriate) with each of the sets of parameters from this array. The output must be stored in the same order as given in this parameter set array (one row of results for each row of parameters).
Return type: numpy ndarray
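The number of model runs grows with both n and num_vars. The sketch below computes the counts given in the n and second_ord descriptions and shows a bounds list with one [lower, upper] pair per parameter, the form SALib uses. The helper name num_param_sets and the example bounds are hypothetical; the parameter names are taken from the examples used elsewhere in these docs.

```python
def num_param_sets(n, num_vars, second_ord=True):
    """Number of parameter sets (model runs) produced by Saltelli sampling."""
    if second_ord:
        # First, total and second order indices: 2n(num_vars + 1) runs.
        return 2 * n * (num_vars + 1)
    # First and total order indices only: n(num_vars + 2) runs.
    return n * (num_vars + 2)


# Hypothetical three-parameter problem; each inner list is [lower, upper].
names = ["Tmax", "C", "k2"]
bounds = [[300.0, 800.0], [0.0, 1.0], [1e-3, 1e-1]]

print(num_param_sets(1000, len(names)))                    # 8000
print(num_param_sets(1000, len(names), second_ord=False))  # 5000
```

Because each run can be expensive, this count is worth checking before choosing n.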
data_processing.py
Tools for reading and processing the sensitivity analysis data files. Some of the files are specific to our project (input_parameters.csv and results.csv), but the sensitivity analysis results are in the same format that SALib produces for any Sobol analysis.
Our data files are stored outside this repository because they are too large, so users need to specify the path to their data.
-
data_processing.find_unimportant_params(header='ST', path='.')
This function finds which parameters have sensitivities and confidence intervals equal to exactly 0.0, which means those parameters have no role in influencing the output variance for any of the calculated output measures.
These parameters could be considered for removal from the model (although it is possible they might play a role in other, unsaved outputs).
Parameters: - header (str, optional) – string of the column header for the sensitivity index you choose.
- path (str, optional) – string with the path to the folder where your analysis files are located.
Returns: unimportant – a list of the parameters that don’t matter for these outputs.
Return type: list
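The rule described above can be sketched as follows: a parameter is reported only if both its index and its confidence interval are exactly 0.0 for every output measure. The input layout (output name mapped to parameter mapped to a (sensitivity, confidence) pair) and the function name are assumptions for illustration; savvy reads this information from the analysis files on disk instead.

```python
def unimportant_params(indices):
    """Return parameters whose index and confidence are 0.0 for every output.

    Hypothetical layout: indices maps output name -> {parameter: (sensitivity, confidence)}.
    """
    outputs = list(indices.values())
    if not outputs:
        return []
    candidates = set(outputs[0])
    for per_output in outputs:
        # Keep only parameters that are exactly zero (value and CI) in this output too.
        candidates &= {p for p, (s, conf) in per_output.items() if s == 0.0 and conf == 0.0}
    return sorted(candidates)


indices = {
    "yield": {"k1": (0.0, 0.0), "k2": (0.41, 0.05), "k3": (0.0, 0.0)},
    "char": {"k1": (0.0, 0.0), "k2": (0.22, 0.03), "k3": (0.01, 0.004)},
}
print(unimportant_params(indices))  # ['k1']
```

Here k3 is not reported because it has a small nonzero index for one output.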
-
data_processing.get_params(path='./input_parameters.csv', numrows=None, drop=['End_time', 'Oxygen'])
NOTE: This function is specific to our lignin modeling dataset and is not needed for the visualization features of savvy.
Returns a pandas dataframe with all the parameters analyzed in the sensitivity analysis, but not additional parameters like end time and oxygen content. If you would like all of the parameters (even those not analyzed for sensitivity) then pass drop=False.
Parameters: - path (str, optional) – string containing the path to the parameters csv.
- numrows (int, optional) – the number of rows of the input_parameters file to read (default is to read all rows).
- drop (list, optional) – a list of strings for which parameters you do not want to include in the returned dataframe. If you want all params then pass drop=False.
Return type: pandas dataframe
-
data_processing.get_results(path='./results.csv', numrows=None, drop=['light_aromatic_C-C', 'light_aromatic_methoxyl'])
NOTE: This function is specific to our lignin modeling dataset and is not needed for the visualization features of savvy.
Returns a pandas dataframe with the results of running all of the simulations for the parameter sets in input_parameters.csv. By default this function drops two unused functional groups (those listed in drop) from the results.
Parameters: - path (str, optional) – the path to the results csv file.
- numrows (int, optional) – the number of rows of the results file to read (default is to read all rows).
- drop (list, optional) – a list of strings for which output measures to drop from the returned dataframe. If you want all outputs use drop=False.
Return type: pandas dataframe
-
data_processing.get_sa_data(path='.')
This function reads and processes all the sensitivity analysis results in a specified folder and returns a dictionary with the corresponding dataframes for first/total order sensitivity indices and second order indices (if present).
Sensitivity analysis results should be in the default SALib output format and must start with the word ‘analysis’.
NOTE: there are two lines of code at the beginning of this function (the filenames.remove lines) that are specific to our lignin modeling dataset. Future users can remove or modify these lines to use with other datasets.
Parameters: path (str, optional) – String containing the relative or absolute path of the directory where analysis_*.txt files are stored. There cannot be any files or folders within this directory that start with ‘analysis’ except those generated by the SALib sensitivity analysis. All analysis* files in this path should correspond to outputs from one sensitivity analysis project, and if second order sensitivity indices are included in any of the files they should be present in all the others.
Returns: sens_dfs – Dictionary where keys are the names of the various output measures (one output measure per analysis file in the folder specified by path). Dictionary values are a list of pandas dataframes. sens_dfs[‘key’][0] is a dataframe with the first and total order indices of all the parameters with respect to the “key” output variable.
sens_dfs[‘key’][1] is a dataframe with the second order indices for pairs of parameters (if second order indices are present in the analysis file). If there are no second order results in the analysis file then this value is a boolean, False.
Return type: dict
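Because sens_dfs[key][1] is either a dataframe or the boolean False, code that consumes the dictionary should test for False by identity rather than relying on truthiness (a dataframe has no single truth value). The sketch below builds a small stand-in for the return value of get_sa_data(); the values are made up, and the column names (S1, S1_conf, ST, ST_conf) and the use of parameter names as the index are assumptions based on the SALib Sobol output format.

```python
import pandas as pd


def outputs_with_second_order(sens_dfs):
    """Names of output measures whose analysis files contained second order indices."""
    # sens_dfs[key][1] is a DataFrame when second order indices exist, else False.
    return [key for key, (_, second) in sens_dfs.items() if second is not False]


# Hand-built stand-in for get_sa_data() output; values are illustrative only.
first_total = pd.DataFrame(
    {"S1": [0.30, 0.05], "S1_conf": [0.02, 0.01], "ST": [0.45, 0.10], "ST_conf": [0.03, 0.02]},
    index=["k2", "Tmax"],
)
sens_dfs = {"yield": [first_total, False]}

print(outputs_with_second_order(sens_dfs))  # []
# Parameter with the largest total order index for the "yield" output.
print(sens_dfs["yield"][0]["ST"].idxmax())  # k2
```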
-
data_processing.read_file(path, numrows=None, drop=False, sep=', ')
Reads a file of input parameters or model results and returns a pandas dataframe with its contents. The first line of the input should contain headers corresponding to the column names.
Parameters: - path (str) – the complete filename, including absolute or relative path.
- numrows (int, optional) – number of rows of the file to read. If you don’t specify this parameter all rows will be read.
- drop (list, optional) – list of strings indicating which (if any) of the named columns you do not want to include in the resulting dataframe (e.g. [‘cats’, ‘dogs’]; the default is not to drop any columns).
- sep (str, optional) – string indicating the column separator in the file (default = ‘,’).
Returns: df – A pandas dataframe with the contents of the file, limited to the number of rows specified and without the columns named in “drop”.
Return type: pandas dataframe
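A minimal pandas sketch of the behavior described above, assuming it maps onto pandas.read_csv: nrows limits the rows read, and DataFrame.drop removes the named columns unless drop is False. The function name read_file_sketch is hypothetical, and this is not savvy's actual code.

```python
import io

import pandas as pd


def read_file_sketch(path, numrows=None, drop=False, sep=","):
    """Read a headed, delimited file into a DataFrame, optionally dropping columns."""
    # nrows=None makes pandas read every row; the first line supplies the headers.
    df = pd.read_csv(path, sep=sep, nrows=numrows)
    if drop:
        # drop=False (the default) keeps every column.
        df = df.drop(columns=drop)
    return df


csv_text = "Tmax,C,k2,Oxygen\n500,0.1,0.01,1\n650,0.4,0.05,0\n720,0.9,0.02,1\n"
df = read_file_sketch(io.StringIO(csv_text), numrows=2, drop=["Oxygen"])
print(df.shape)  # (2, 3)
```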
plotting.py
This module creates plots for visualizing sensitivity analysis dataframes.
make_plot() creates a radial plot of the first and total order indices.
make_second_order_heatmap() creates a square heat map showing the second order interactions between model parameters.
-
plotting.make_plot(dataframe=pandas.DataFrame(), highlight=[], top=100, minvalues=0.01, stacked=True, lgaxis=True, errorbar=True, showS1=True, showST=True)
Basic method to plot first and total order sensitivity indices.
This method generates a Bokeh plot similar to the Burtin example on the Bokeh website. For clarification, parameters refer to the model inputs being measured (Tmax, C, k2, etc.) and stats refer to the first or total order sensitivity index.
Parameters: - dataframe (pandas dataframe) – Dataframe containing sensitivity analysis results to be plotted.
- highlight (list, optional) – List of strings indicating which parameter wedges will be highlighted.
- top (int, optional) – Integer indicating the number of parameters with the highest sensitivity values to display, after the minimum cutoff is applied.
- minvalues (float, optional) – Minimum sensitivity a parameter must have to be plotted. The cutoff applies to the total order index only.
- stacked (bool, optional) – Boolean indicating if bars should be stacked for each parameter (True) or unstacked (False).
- lgaxis (bool, optional) – Boolean indicating if log axis should be used (True) or if a linear axis should be used (False).
- errorbar (bool, optional) – Boolean indicating if error bars are shown (True) or are omitted (False).
- showS1 (bool, optional) – Boolean indicating whether 1st order sensitivity indices will be plotted (True) or omitted (False).
- showST (bool, optional) – Boolean indicating whether total order sensitivity indices will be plotted (True) or omitted (False). Note that if showS1 and showST are both False, the plot defaults to showing ST data only instead of a blank plot.
Returns: p – A Bokeh figure of the data to be plotted
Return type: bokeh figure
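The interaction between minvalues and top can be sketched with pandas: the cutoff is applied to the total order column first, and only then are the top largest ST values kept. Whether savvy compares with >= or > at the cutoff is an assumption here, as is the helper name select_for_plot; the plotting itself is omitted.

```python
import pandas as pd


def select_for_plot(df, top=100, minvalues=0.01):
    """Apply the total order cutoff, then keep the `top` largest ST values."""
    # The cutoff is applied to ST only; S1 values are not filtered.
    kept = df[df["ST"] >= minvalues]
    return kept.nlargest(top, "ST")


df = pd.DataFrame(
    {"S1": [0.30, 0.001, 0.02, 0.10], "ST": [0.45, 0.004, 0.05, 0.20]},
    index=["k2", "C", "Tmax", "k1"],
)
print(list(select_for_plot(df, top=2).index))  # ['k2', 'k1']
```

Parameter C is excluded by the cutoff before top is considered, which is why a large top can still show fewer parameters than requested.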
-
plotting.make_second_order_heatmap(df, top=10, name='', mirror=True, include=[])
Plot a heat map of the second order sensitivity indices from a given dataframe. If you are choosing a high value of top then making this plot gets expensive and it is recommended to set mirror to False.
Parameters: - df (pandas dataframe) – dataframe with second order sensitivity indices. This dataframe should be formatted in the standard output format from a Sobol sensitivity analysis in SALib.
- top (int, optional) – integer specifying the number of parameter interactions to plot (those with the ‘top’ greatest values are displayed).
- name (str, optional) – string indicating the name of the output measure you are plotting.
- mirror (bool, optional) – boolean indicating whether you would like to plot the mirror image (reflection across the diagonal). This mirror image contains the same information as plotted already, but will increase the computation time for large dataframes.
- include (list, optional) – a list of parameters that you would like to make sure are shown on the heat map (even if they are not in the top subset).
Returns: p – A Bokeh figure to be plotted
Return type: bokeh figure
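The selection step behind the heat map, and why mirror adds cost, can be sketched with plain dictionaries: the top interactions are ranked by value, and mirroring adds the reflected (b, a) cell for every selected (a, b) cell, doubling the cells to draw. The dictionary layout, the function name, and the reading of include as "always appear on the axes" are assumptions; savvy works on the SALib second order dataframe and draws with Bokeh.

```python
def heatmap_cells(s2, top=10, mirror=True, include=()):
    """Select cells and axis labels for a second order heat map.

    Hypothetical layout: s2 maps (param_a, param_b) -> second order index.
    """
    ranked = sorted(s2.items(), key=lambda item: item[1], reverse=True)[:top]
    cells = dict(ranked)
    if mirror:
        # The reflection across the diagonal repeats the same values.
        cells.update({(b, a): v for (a, b), v in ranked})
    # Axis labels: every parameter in a selected interaction, plus any in `include`.
    params = sorted({p for pair in cells for p in pair} | set(include))
    return params, cells


s2 = {("k1", "k2"): 0.08, ("k1", "C"): 0.01, ("k2", "Tmax"): 0.12}
params, cells = heatmap_cells(s2, top=2, mirror=False, include=["C"])
print(params)  # ['C', 'Tmax', 'k1', 'k2']
```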
interactive_plots.py
This module adds interactivity to the plots in plotting.py through Bokeh tabs and IPython widgets.
Dependencies: plotting.py, data_processing.py, matplotlib, numpy, pandas, os, bokeh, ipywidgets, collections
-
interactive_plots.interact_with_plot_all_outputs(sa_dict, demo=False, manual=True)
This function adds the ability to interactively adjust all of the plotting.make_plot() arguments.
Parameters: - sa_dict (dict) – a dictionary with all the sensitivity analysis results.
- demo (bool, optional) – plot only a few outcomes, for demonstration purposes.
Return type: interactive widgets to control the plot
-
interactive_plots.plot_all_outputs(sa_dict, demo=False, min_val=0.01, top=100, stacked=True, error_bars=True, log_axis=True, highlighted_parameters=[])
This function calls plotting.make_plot() for all the sensitivity analysis output files and lets you choose which output to view using tabs.
Parameters: - sa_dict (dict) – a dictionary with all the sensitivity analysis results.
- demo (bool, optional) – plot only two outcomes instead of all outcomes, for demonstration purposes.
- min_val (float, optional) – a float indicating the minimum sensitivity value to be shown.
- top (int, optional) – integer indicating the number of parameters to display (highest sensitivity values).
- stacked (bool, optional) – Boolean indicating if bars should be stacked for each parameter.
- error_bars (bool, optional) – Boolean indicating if error bars are shown (True) or are omitted (False).
- log_axis (bool, optional) – Boolean indicating if log axis should be used (True) or if a linear axis should be used (False).
- highlighted_parameters (list, optional) – List of strings indicating which parameter wedges will be highlighted.
Returns: p – a Bokeh plot generated with plotting.make_plot() that includes tabs for all the possible outputs.
Return type: bokeh plot
-
interactive_plots.plot_all_second_order(sa_dict, top=5, mirror=True, include=[])
This function calls plotting.make_second_order_heatmap() for all the sensitivity analysis output files and lets you choose which output to view using tabs.
Parameters: - sa_dict (dict) – a dictionary with all the sensitivity analysis results.
- top (int, optional) – the number of parameters to display (highest sensitivity values).
- include (list, optional) – a list of parameters you would like to include even if they are not among the top values.
Returns: p – a Bokeh plot that includes tabs for all the possible outputs.
Return type: bokeh plot
network_tools.py
This module contains functions to create and display network graphs of the sensitivity analysis results. It is included as an independent module in this package because graph-tool is an uncommon package that is more involved to install than typical conda- or pip-installable packages. All the other visualization functionality of savvy is available through the more readily installed Bokeh plots.
The plots generated in this module offer a good visualization of which parameters have the highest sensitivities, and which are connected by second order interactions. Relative sizes of vertices on these plots are not very good representations of the actual difference in magnitude between sensitivities (a value of 0.02 appears similar to a value of 0.2). The bokeh visualizations offer better insight into these relative magnitudes.
-
network_tools.build_graph(df_list, sens='ST', top=410, min_sens=0.01, edge_cutoff=0.0)
Initializes and constructs a graph where vertices are the parameters selected from the first dataframe in ‘df_list’, subject to the constraints set by ‘sens’, ‘top’, and ‘min_sens’. Edges are the second order sensitivities of the interactions between those vertices, with sensitivities greater than ‘edge_cutoff’.
Parameters: - df_list (list) – A list of two dataframes, as collected by the function data_processing.get_sa_data(). The first dataframe should contain the first/total order sensitivities and the second the corresponding second order sensitivities.
- sens (str, optional) – A string with the name of the sensitivity that you would like to use for the vertices (‘ST’ or ‘S1’).
- top (int, optional) – An integer specifying the number of vertices to display (the top sensitivity values).
- min_sens (float, optional) – A float with the minimum sensitivity to allow in the graph.
- edge_cutoff (float, optional) – A float specifying the minimum second order sensitivity to show as an edge in the graph.
Returns: g – a graph-tool graph object of the network described above. Each vertex has properties ‘param’, ‘sensitivity’, and ‘confidence’ corresponding to the name of the parameter, the value of the sensitivity index, and its confidence interval. The only edge property is ‘second_sens’, the second order sensitivity index for the interaction between the two vertices it connects.
Return type: graph-tool object
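The vertex and edge selection rules of build_graph() can be sketched without graph-tool, using plain dictionaries: vertices are parameters whose chosen index (sens) is at least min_sens, limited to the top largest; edges join two selected vertices and must have a second order index strictly greater than edge_cutoff. The data layout and the function name are hypothetical, and the sketch does not build a graph-tool object.

```python
def build_graph_sketch(first_total, second, sens="ST", top=410, min_sens=0.01, edge_cutoff=0.0):
    """Return (vertices, edges) following the selection rules of build_graph().

    Hypothetical layout: first_total maps parameter -> {"S1": ..., "ST": ...};
    second maps (param_a, param_b) -> second order index.
    """
    eligible = {p: v[sens] for p, v in first_total.items() if v[sens] >= min_sens}
    ranked = sorted(eligible, key=eligible.get, reverse=True)[:top]
    vertices = {p: eligible[p] for p in ranked}
    # Edges only connect selected vertices and must exceed edge_cutoff.
    edges = {
        (a, b): s2
        for (a, b), s2 in second.items()
        if a in vertices and b in vertices and s2 > edge_cutoff
    }
    return vertices, edges


first_total = {
    "k1": {"S1": 0.10, "ST": 0.30},
    "k2": {"S1": 0.20, "ST": 0.50},
    "C": {"S1": 0.00, "ST": 0.005},
}
second = {("k1", "k2"): 0.04, ("k1", "C"): 0.02}
vertices, edges = build_graph_sketch(first_total, second)
print(edges)  # {('k1', 'k2'): 0.04}
```

The ("k1", "C") interaction is dropped because C falls below min_sens and never becomes a vertex.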
-
network_tools.plot_network_circle(g, inline=True, filename=None, scale=300.0)
Display a plot of the network, g, with the vertices placed around the edge of a circle. Vertices are the model parameters and they are connected by edges whose thickness indicates the value of the second order sensitivity.
Parameters: - g (graph-tool graph) – The graph to plot.
- inline (bool, optional) – Boolean indicating whether the plot should be shown inline in an IPython notebook. If False, the plot is created in its own window and is somewhat interactive.
- filename (str, optional) – If you would like to save the plot to a file specify a filename (with an extension of pdf or png).
- scale (float, optional) – If you would like to resize the vertices you can change the value of this float.
Return type: graph-tool plot
-
network_tools.plot_network_random(g, inline=True, filename=None, scale=300.0)
Display a plot of the network, g, with the vertices placed in an unstructured, apparently random layout. Vertices are the model parameters and they are connected by edges whose thickness indicates the value of the second order sensitivity.
Parameters: - g (graph-tool graph) – The graph to plot.
- inline (bool, optional) – Boolean indicating whether the plot should be shown inline in an IPython notebook. If False, the plot is created in its own window and is somewhat interactive.
- filename (str, optional) – If you would like to save the plot to a file specify a filename (with an extension of pdf or png).
- scale (float, optional) – If you would like to resize the vertices you can change the value of this float.
Return type: graph-tool plot