Scientific measurements are often associated with uncertainties, so error analysis requires accounting for them accurately. Wherever possible, these uncertainties should be indicated on plots and graphs; showing the error effectively makes a plot convey much more complete information. A dedicated Matplotlib method, errorbar(), allows this (for full documentation, see here). Error bars commonly represent quantities such as:
- Standard Deviation
- Confidence Interval
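For instance, for repeated measurements of a single quantity, the error bar size might be the sample standard deviation or an approximate 95% confidence interval of the mean. A minimal sketch with simulated data (the values are purely illustrative):

import numpy as np

# Simulated repeated measurements of one quantity (illustrative values)
samples = np.random.normal(loc=5.0, scale=0.3, size=100)

std = samples.std(ddof=1)          # sample standard deviation
sem = std / np.sqrt(samples.size)  # standard error of the mean
ci95 = 1.96 * sem                  # approximate 95% confidence half-width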
import matplotlib.pyplot as plt
import numpy as np

errorbar(x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, capthick)

- x and y: the arrays of N data points to be plotted
- yerr and xerr: determine the size of the error bars
- fmt: controls the format of the data markers and connecting lines, as in plt.plot
- ecolor: sets the color of the error bars
- elinewidth: changes the thickness of the error lines
- capsize: sets the length of the error bar caps in points (defaults to 0: no error bar caps)
- capthick: the thickness of the error bar caps
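As a quick sketch of the call signature in use (the data below are made up purely for illustration):

import matplotlib.pyplot as plt
import numpy as np

# Five illustrative points with a constant uncertainty of 0.5
x = np.arange(5)
y = x ** 2
plt.errorbar(x, y, yerr=0.5, fmt='o', ecolor='gray', elinewidth=1,
             capsize=3, capthick=1)
plt.show()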
A basic errorbar plot can be created with a single Matplotlib function call. The errorbar function has many options for fine-tuning the output; additional options such as color and capsize make it easy to customize the aesthetics of the plot.
import matplotlib.pyplot as plt
import numpy as np
# The data is y = np.sin(x) + dy * np.random.randn(10)
dy = 0.1
x = np.linspace(0, 5, 10)
y = np.sin(x) + dy * np.random.randn(10)
xerr = x * 0.1
plt.errorbar(x, y, yerr=dy, xerr=xerr, fmt='o', color='black', capsize=4)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Error bar plot with noise")
plt.savefig('Error bar with noise.svg', bbox_inches='tight')
plt.show()

All experimental measurements have some degree of uncertainty, often referred to as an error. Here we'll visualize such errors on measurements of Ohm's Law.
The current that flows through most substances is directly proportional to the applied voltage:

$$ I = \frac{V}{R} $$

where $I$ is the current, $V$ is the voltage, and $R$ is the resistance. For example, a 1 kΩ resistor with 5 V across it carries 5 mA.
import matplotlib.pyplot as plt
import numpy as np
#set default font and figure sizes
plt.rcParams['figure.figsize'] = (6,4)
plt.rcParams['font.size'] = 12
plt.rcParams['savefig.bbox'] = 'tight'
# Load a Sample data set
# run cell to download data file
V, I_mA, Ierr_mA = np.loadtxt('data_with_errors.txt', unpack=True)
# Basic Errorbars
plt.errorbar(V, I_mA, Ierr_mA, fmt='.k')
plt.title("Measurements of Ohm's Law")
plt.xlabel("Voltage (V)")
plt.ylabel("Current (mA)")
plt.savefig('Error bar Ohm Law.svg', bbox_inches='tight')
plt.show()

Let's assume that the Ohm's Law experiment was conducted with a 1 kΩ resistor and a voltmeter with an uncertainty of 0.1 V in each reading.
import matplotlib.pyplot as plt
import numpy as np
# Load a Sample data set
# run cell to download data file
V, I_mA, Ierr_mA = np.loadtxt('data_with_errors.txt', unpack=True)
# convert I_mA to A
I = I_mA * 1e-3
Ierr = Ierr_mA * 1e-3
R = 1_000
I_Theory = V/R
Verr = np.ones_like(V) * 0.1
# create the errorbars
plt.errorbar(V, I, xerr=Verr, yerr=Ierr, capsize=3, fmt='o', markersize=5, label='Measurements')
plt.plot(V, I_Theory, label='Theory')
plt.title("Measurements of Ohm's Law")
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
plt.legend()
plt.savefig('Error bar Ohm Law bothdimensions.svg', bbox_inches='tight')
plt.show()

In this work, we will try a simple Gaussian process regression using the Scikit-Learn API (for details, see Introducing Scikit-Learn). This method fits a very flexible non-parametric function to data and provides a continuous measure of the uncertainty.
Gaussian process regression (GPR) is a supervised learning method used to solve regression and probabilistic classification problems. GPR has been applied to several different types of real-world problems, including ones in materials science, chemistry, physics, and biology.
The Scikit-Learn version is implemented mainly on top of NumPy; it is simple and easy to use, but has limited hyperparameter tuning options.
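For example, tuning is mostly a matter of choosing the kernel and bounding its hyperparameters, which the optimizer respects when maximizing the log-marginal likelihood. A minimal sketch (the bound values here are illustrative, not recommendations):

from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# An amplitude (constant) kernel times an RBF kernel, each with explicit
# (lower, upper) search bounds for the hyperparameter optimizer.
kernel = ConstantKernel(1.0, constant_value_bounds=(1e-3, 1e3)) \
    * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))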
To illustrate how kernels work in GPR, we will look at an intentionally simple dataset. We start by generating a synthetic dataset whose true generative process is defined by $$ f(x) = x \sin(x) $$
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
import numpy as np
import matplotlib.pyplot as plt

- GaussianProcessRegressor: the main class for Gaussian process regression in scikit-learn
- RBF: the radial basis function kernel (squared exponential kernel)
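For reference, the RBF kernel measures the similarity of two points as $$ k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\ell^2}\right) $$ where $\ell$ is the length scale hyperparameter that the fit optimizes.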
model = lambda x: x * np.sin(x)

- model: a simple function implementing the true generative process $f(x) = x \sin(x)$
kernel = RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)

- kernel: defines the covariance function of the Gaussian process
- n_restarts_optimizer: the number of optimizer restarts used when maximizing the log-marginal likelihood, to avoid poor local optima
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
import numpy as np
import matplotlib.pyplot as plt
# Define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8, 10, 12])
ydata = model(xdata)
# Define kernel (RBF with initial length scale)
kernel = RBF(length_scale=1.0)
# Compute the Gaussian process fit
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
# Fit the model to the observed data
gp.fit(xdata[:, np.newaxis], ydata)
xfit = np.linspace(start=0, stop=10, num=1000)
yfit, ystd = gp.predict(xfit[:, np.newaxis], return_std=True)
# Visualize the result
plt.plot(xdata, ydata, 'or', label=r'$f(x) = x \sin(x)$')
plt.plot(xfit, yfit, color='gray', label='Gaussian Prediction', linestyle='dotted')
plt.fill_between(xfit, yfit - 2*ystd, yfit + 2*ystd, color='gray', alpha=0.2, label='95% confidence interval')
plt.xlim(0, 10)
plt.legend()
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.title('Gaussian process regression')
plt.savefig('Scikit-Learn Gaussian.svg', bbox_inches='tight')
plt.show()
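After fitting, the hyperparameters chosen by the optimizer can be inspected; a short sketch, assuming the gp object fitted above:

# gp.kernel_ is the fitted kernel with its optimized length scale;
# gp.kernel is the initial, unfitted kernel.
print(gp.kernel_)
print(gp.log_marginal_likelihood_value_)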