smoothing hydrographic profiles

Abstract.Methods are presented for smoothing hydrographic profiles.

1. Introduction

Smoothing hydrographic profiles with conventional time-series methods is problematic for two reasons: (a) the data are commonly not equi-spaced in depth and (b) the data seldom lack trends in depth. The issues and their solutions are illustrated without much discussion here.

2. Methods

library(oce)
library(signal)
data(ctd)
S <- ctd[['salinity']]
p <- ctd[['pressure']]
## must create equispaced data for filtering to make sense
dp <- median(diff(p))
pp <- seq(min(p), max(p), dp)
S0 <- approx(p, S, pp)$y
W <- dp / 2                            # critical frequency
f1 <- butter(1, W)
f2 <- butter(2, W)

par(mfrow=c(1, 3))

## filter raw profile
plotProfile(ctd, xtype="salinity", type='l')
S0f1 <- filtfilt(f1, S0)
S0f2 <- filtfilt(f2, S0)
lines(S0f1, pp, col='red')
lines(S0f2, pp, col='blue')
mtext("(a) ", side=3, adj=1, line=-5/4, cex=3/4)
## filter detrended profile
plotProfile(ctd, xtype="salinity", type='l')
Sd <- detrend(pp, S0)
S1f1 <- filtfilt(f1, Sd$Y) + Sd$a + Sd$b * pp
S1f2 <- filtfilt(f2, Sd$Y) + Sd$a + Sd$b * pp
lines(S1f1, pp, col='red')
lines(S1f2, pp, col='blue')
mtext("(b) ", side=3, adj=1, line=-5/4, cex=3/4)

## smooth-spline raw profile
spline <- smooth.spline(pp, S0, df=3/W) # suggestion: try different df values
S2 <- predict(spline)$y
plotProfile(ctd, xtype="salinity", type='l')
lines(S2, pp, col='red')
mtext("(c) ", side=3, adj=1, line=-5/4, cex=3/4)

3. Results

The first order filter is less prone to isolated wiggles than the second order one, but both behave poorly near the top and bottom of the profile, unless the data are detrended. With detrending, filtered signals are similar to those calculated with the smoothing spline.

Smoothing a hydrographic profile

A salinity profile (black line) with various smoothing models. (a) Smoothing by first-order Butterworth filter (red) and second-order Butterworth filter (blue) on raw data. (b) As (a) but using detrended data. (c) Spline smoothing.

4. Discussion and conclusions

With detrending, the filtered results can be made similar to those of smoothing spline. However, splines have some practical advantages: (1) they do not require equispaced data, (2) they do not require detrending, (3) the results are inherently smooth (by construction of the spline) and (4) calculating derivatives is easy with a spline. These advantages explain why splines are used to calculate buoyancy frequency in the Oce package.

inferring halocline depth

Abstract. A method given for inferring halocline depths based on derivatives calculated with a smoothing spline.

1. Introduction.

There are no agreed-upon methods for inferring halocline depth, but a reasonable method might involve locating the depth at which dS/dp is largest, where S is salinity and p is pressure (Kelley 2014 chapter 5).  Calculating the derivative using e.g. diff(S)/diff(p) can be problematic because of sensitivity to noise, especially for data that have not been bin-averaged. Achieving smoothness with conventional filtering has problems at the end-points, which is particularly troublesome for a near-surface halocline (see the next blog entry). A possible solution to such problems is to calculate the derivative with a smoothing spline.

2. Methods.

Pasted below is test code that does this with the ctd dataset in the oce package. The function returns the pressure at which the smoothing spline has highest salinity derivative, and it can also plot the results (which is recommended). The parameter named deltap is used to set the value of df (degrees of freedom) for the spline. One might think of deltap as the thickness (in dbar) of the smoothing interval for each of the sub-components of the spline.

library(oce)
 
findHalocline <- function(ctd, deltap=5, plot=TRUE)
{
    S <- ctd[['salinity']]
    p <- ctd[['pressure']]
    n <- length(p)
    ## trim df to be no larger than n/2 and no smaller than 3.
    N <- deltap / median(diff(p))
    df <- min(n/2, max(3, n / N))
    spline <- smooth.spline(S~p, df=df)
    SS <- predict(spline, p)
    dSSdp <- predict(spline, p, deriv=1)
    H <- p[which.max(dSSdp$y)]
    if (plot) {
        par(mar=c(3, 3, 1, 1), mgp=c(2, 0.7, 0))
        plotProfile(ctd, xtype="salinity")
        lines(SS$y, SS$x, col='red')
        abline(h=H, col='blue')
        mtext(sprintf("%.2f m", H), side=4, at=H, cex=3/4, col='blue')
        mtext(sprintf(" deltap: %.0f, N: %.0f, df: %.0f", deltap, N, df),
              side=1, line=-1, adj=0, cex=3/4)
    }
    return(H)
}
 
# Plot two panels to see influence of deltap.
par(mfrow=c(1, 2))
data(ctd)
findHalocline(ctd)
findHalocline(ctd, 1)

3. Results.

The graph shows results for a test dataset provided with the oce package, using two values for deltap.

4. Discussion.

Lacking further information about the data or a statistical model of the variation, the choice of deltap is somewhat open, and one interpretation of the results would be to state that the halocline depth is likely to be between 4 and 5 m.

A spline-based method for halocline detection.  Left: with default parameters.  Right: setting the smoothing interval to 1 dbar (roughly 1 m).

A spline-based method for halocline detection. Left: with default parameters. Right: setting the smoothing interval to 1 dbar (roughly 1 m).

5. Conclusions.

Further work is required to test and calibrate the method with other data.

References.

  1. Kelley, Dan, 2014. Oceanographic Analysis with R. Springer-Verlag. In preparation.

Pisa 2012 scores

The Guardian Newspaper has an interesting article about the Pisa (Program for International Student Assessment) scores for 2012, and it includes data. Since I was interested to see how my own region scored, I downloaded the data into a file called PISA-summary-2012.csv and created a plot summarizing scores in all the sampled regions, with Canada highlighted.

Summary graph, ranked in three categories

Summary of Pisa 2012 scores, broken down into category.

Summary of Pisa 2012 scores, broken down into category.

R code that creates the graph

The header length is unlikely to be the same in other years, nor the column names, so this code is brittle across similar datasets, but the necessary modifications for similar data should be obvious to anyone with passing familiarity with R.

regionHighlight <- "Canada"
d <- read.csv('PISA-summary-2012.csv', skip=16, header=FALSE,
              col.names=c("rank","region",
                          "math","mathLow","mathHigh","mathChange",
                          "reading",'readingChange',
                          'science','scienceChange'))
n <- length(d$math)
par(mar=c(0.5, 3, 0.5, 0.5), mgp=c(2, 0.7, 0))
range <- range(c(d$math, d$reading, d$science))
plot(c(0, 6), range,
     type='n', xlab="", axes=FALSE,
     ylab="PISA Score (2012)")
axis(2)
box()
dy <- diff(par('usr')[3:4]) / 50 # vertical offset

x0 <- 0
dx <- 1
cex <- 0.65

## Math
o <- order(d$math, decreasing=TRUE)
y <- approx(1:n, seq(range[2],range[1],length.out=n), 1:n)$y
segments(rep(x0, n), d$math[o], rep(x0+dx, n), y, 
     col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$math))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
     col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Maths", pos=4, cex=1.2)

## Reading
x0 <- x0 + 2 * dx 
o <- order(d$reading, decreasing=TRUE)
segments(rep(x0, n), d$reading[o], rep(x0+dx, n), y, 
     col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$reading))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
     col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Reading", pos=4, cex=1.2)

## Science 
x0 <- x0 + 2 * dx 
o <- order(d$science, decreasing=TRUE)
segments(rep(x0, n), d$science[o], rep(x0+dx, n), y, 
     col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$science))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
     col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Science", pos=4, cex=1.2)

Contents of the PISA-summary-2012.csv data file

,,"Mean score
in PISA 2012, MATHS","Share
of low achievers
in mathematics
(Below Level 2)","Share
of top performers
in mathematics
(Level 5 or 6)","Annualised
change
in score points"," Mean score
in PISA 2012, READING","Annualised
change
in score points","Mean score
in PISA 2012, SCIENCE","Annualised
change
in score points"
1,Shanghai-China,613,3.8,55.4,4.2,570,4.6,580,1.8
3,Hong Kong-China,561,8.5,33.7,1.3,545,2.3,555,2.1
2,Singapore,573,8.3,40,3.8,542,5.4,551,3.3
7,Japan,536,11.1,23.7,0.4,538,1.5,547,2.6
12,Finland,519,12.3,15.3,-2.8,524,-1.7,545,-3
11,Estonia,521,10.5,14.6,0.9,516,2.4,541,1.5
5,Korea,554,9.1,30.9,1.1,536,0.9,538,2.6
17,Vietnam,511,14.2,13.3,m,508,m,528,m
14,Poland,518,14.4,16.7,2.6,518,2.8,526,4.6
13,Canada,518,13.8,16.4,-1.4,523,-0.9,525,-1.5
8,Liechtenstein,535,14.1,24.8,0.3,516,1.3,525,0.4
16,Germany,514,17.7,17.5,1.4,508,1.8,524,1.4
4,Taiwan,560,12.8,37.2,1.7,523,4.5,523,-1.5
20,Ireland,501,16.9,10.7,-0.6,523,-0.9,522,2.3
10,Netherlands,523,14.8,19.3,-1.6,511,-0.1,522,-0.5
19,Australia,504,19.7,14.8,-2.2,512,-1.4,521,-0.9
6,Macao-China,538,10.8,24.3,1,509,0.8,521,1.6
23,New Zealand,500,22.6,15,-2.5,512,-1.1,516,-2.5
9,Switzerland,531,12.4,21.4,0.6,509,1,515,0.6
26,United Kingdom,494,21.8,11.8,-0.3,499,0.7,514,-0.1
21,Slovenia,501,20.1,13.7,-0.6,481,-2.2,514,-0.8
24,Czech Republic,499,21,12.9,-2.5,493,,508,-1
18,Austria,506,18.7,14.3,0,490,-0.2,506,-0.8
15,Belgium,515,18.9,19.4,-1.6,509,0.1,505,-0.8
28,Latvia,491,19.9,8,0.5,489,1.9,502,2
-,OECD average,494,23.1,12.6,-0.3,496,0.3,501,0.5
25,France,495,22.4,12.9,-1.5,505,0,499,0.6
22,Denmark,500,16.8,10,-1.8,496,0.1,498,0.4
36,United States,481,25.8,8.8,0.3,498,-0.3,497,1.4
33,Spain,484,23.6,8,0.1,488,-0.3,496,1.3
37,Lithuania,479,26,8.1,-1.4,477,1.1,496,1.3
30,Norway,489,22.3,9.4,-0.3,504,0.1,495,1.3
32,Italy,485,24.7,9.9,2.7,490,0.5,494,3
39,Hungary,477,28.1,9.3,-1.3,488,1,494,-1.6
29,Luxembourg,490,24.3,11.2,-0.3,488,0.7,491,0.9
40,Croatia,471,29.9,7,0.6,485,1.2,491,-0.3
31,Portugal,487,24.9,10.6,2.8,488,1.6,489,2.5
34,Russian Federation,482,24,7.8,1.1,475,1.1,486,1
38,Sweden,478,27.1,8,-3.3,483,-2.8,485,-3.1
27,Iceland,493,21.5,11.2,-2.2,483,-1.3,478,-2
35,Slovak Republic,482,27.5,11,-1.4,463,-0.1,471,-2.7
41,Israel,466,33.5,9.4,4.2,486,3.7,470,2.8
42,Greece,453,35.7,3.9,1.1,477,0.5,467,-1.1
44,Turkey,448,42,5.9,3.2,475,4.1,463,6.4
48,United Arab Emirates,434,46.3,3.5,m,442,m,448,m
47,Bulgaria,439,43.8,4.1,4.2,436,0.4,446,2
43,Serbia,449,38.9,4.6,2.2,446,7.6,445,1.5
51,Chile,423,51.5,1.6,1.9,441,3.1,445,1.1
50,Thailand,427,49.7,2.6,1,441,1.1,444,3.9
45,Romania,445,40.8,3.2,4.9,438,1.1,439,3.4
46,Cyprus,440,42,3.7,m,449,m,438,m
56,Costa Rica,407,59.9,0.6,-1.2,441,-1,429,-0.6
49,Kazakhstan,432,45.2,0.9,9,393,0.8,425,8.1
52,Malaysia,421,51.8,1.3,8.1,398,-7.8,420,-1.4
55,Uruguay,409,55.8,1.4,-1.4,411,-1.8,416,-2.1
53,Mexico,413,54.7,0.6,3.1,424,1.1,415,0.9
54,Montenegro,410,56.6,1,1.7,422,5,410,-0.3
61,Jordan,386,68.6,0.6,0.2,399,-0.3,409,-2.1
59,Argentina,388,66.5,0.3,1.2,396,-1.6,406,2.4
58,Brazil,391,67.1,0.8,4.1,410,1.2,405,2.3
62,Colombia,376,73.8,0.3,1.1,403,3,399,1.8
60,Tunisia,388,67.7,0.8,3.1,404,3.8,398,2.2
57,Albania,394,60.7,0.8,5.6,394,4.1,397,2.2
63,Qatar,376,69.6,2,9.2,388,12,384,5.4
64,Indonesia,375,75.7,0.3,0.7,396,2.3,382,-1.9
65,Peru,368,74.6,0.6,1,384,5.2,373,1.3

installing the oce package for the R language

Several of the blog items have used the oce package.  The official version of this can be installed from within R by

install.packages("oce")

and more up-to-date versions can be installed using the devtools package written by Hadley Wickham, which is itself installed with

install.packages("devtools")

after which installing the latest development version of oce is accomplished with

library(devtools)
install_github('ocedata', 'dankelley', 'master')
install_github('oce', 'dankelley', 'develop')

Note that ocedata does not need to be updated frequently, as it only updated when new datasets are added to oce. The official version of oce is only updated every few months, but the branch named develop (used above) may be updated several times a day, if the author is adding new features or fixing bugs.

For more on oce, see the oce website on github.

R interface to WebTide (tidal prediction)

The previous posting explained how to install WebTide on an OSX machine. This one shows how to hook up to an installed WebTide database, so that R code can get tidal predictions.

The following code in the R language will produce a graph in which the top panel mimics the tidal-elevation graph produced by WebTide itself (see previous blog posting for comparison).

library(oce)
tStart <- as.POSIXct("2013-12-29 14:21:00", tz="UTC")
tEnd <- as.POSIXct("2014-01-13 15:21:00", tz="UTC")
time<-seq(tStart, tEnd, by=15, units="minutes")
prediction <- webtide("predict", lon=-65.06747, lat=45.36544, time=time)
A webtide prediction using the oce package of the R library

A webtide prediction using the oce package of the R library

One of the advantages of accessing the tidal prediction from within oce is to make it easier to undertake further analysis, e.g. a node nearer Halifax has a mixed tide, with the following illustrating in terms of velocity and a so-called progressive vector

p <- webtide("predict", node=14569)
par(mfrow=c(2,1), mar=c(3, 3, 1, 1), mgp=c(2, 0.7, 0))
plot(p$u, p$v, asp=1, type="o")
dt <- diff(as.numeric(p$time[1:2]))
x <- dt * cumsum(p$u)
y <- dt * cumsum(p$v)
plot(x, y, asp=1, type="o")
Velocity covariance and progressive vector diagram for a mixed tide.

Velocity covariance and progressive vector diagram for a mixed tide.

WebTide (tidal prediction) installation (OSX)

Abstract.This blog item explains how to install WebTide, a tidal prediction application, on an OSX machine. The work requires moderate expertise, being carried out in the console, and requiring that a C compiler be found on the host machine.

1. Introduction

WebTide comes with a friendly graphical user interface that makes it easy to get tidal predictions at various locations around the world. Anyone with an interest in oceanography is likely to find it a useful application. Although the interface is slightly quirky (particularly for map navigation), it only takes a few minutes to learn. (An upcoming blog entry will explain how to avoid the WebTide interface, using the oce R package to access the data that underly WebTide.)

2. Installation

  1. Download the “linux without Java VM” version from the Bedford Institute of Oceanography website, and store the results in the ~/Downloads directory.

  2. Open a terminal or console window, and type the following to enter the WebTide directory and start the process of installing WebTide.

    cd ~/Downloads
    sh WebTide_unix_0_7.sh
    

    After a moment, a dialog box will pop up, asking where WebTide is to be installed. To install a version just for you, supply /Users/USERNAME/WebTide, where USERNAME is your username. To install a version for all accounts on the computer, use the default, which is /usr/local/WebTide. A second dialog box addresses the issue of symlinks. Click the box indicating that these are not to be used. After a few moments, a final dialog box will appear, stating that the work has been completed. Click the button named “Finish”.

  3. Next comes the OSX-specific part of the work. You will need the gcc compiler to do this. It is available for free on the Apple website, as the Xcode development tools. (If you have been using the computer for science, it is quite likely you have already installed a compiler, so just try the steps below, before bothering with a re-install of Xcode.) In a console, type the following to enter the source directory (here assumed to be configured for a single user) and to compile two programs.

    cd ~/WebTide/Tidecor_src
    gcc -O2 -o tidecor webtidecor2.5.1.c -lm
    gcc -O2 -o constituentinterpolator constituentinterpolator1.1.c -lm
    

    In the two gcc lines shown above, it may be necessary to change the names of the ".c" files, to match whatever you find in this tidecor_src directory. You may ignore the warning messages that are likely to appear, but if errors appear, you will need to interrupt this tutorial to find out what is wrong. Possibly leaving a comment on this site would encourage another reader to offer a solution; possibly the scientists at the Bedford Institute of Oceanography (named on the website) could help.

  4. Finally, move these newly-compiled programs from the source directory into the bin directory, by typing

    mv tidecor ../bin
    mv constituentinterpolator ../bin
    

3. Using WebTide

If you installed WebTide to your home directory, launch WebTide by typing the following in a console window

~/WebTide/WebTide

and use the GUI from that point on. If you installed it to /usr/local/WebTide, type the following instead

/usr/local/WebTide/WebTide

The main WebTide window looks like the following.

A WebTide map, showing a marker in the Bay of Fundy (see also the sea-level elevation diagram)

A WebTide map, showing a marker in the Bay of Fundy (see also the sea-level elevation diagram)

There is little point in explaining the details of the menu and mouse actions here, since WebTide provides its own documentation. In most cases, the action will be to click the mouse on spot on the map, to set a so-called “marker” for a location at which calculations are to be done. The diagram shown here has a marker in the Bay of Fundy. Once a marker has been set, the user may wish to click a menu item (see at the right-hand side) to get a tidal prediction, as shown in the screenshot below.

A WebTide prediction of sea-level elevation

A WebTide prediction of sea-level elevation

sundial

After experimenting with calculations for what I eventually came to realize were analemma-based sundials (with shadow cast by a vertical pole), I remembered that the common sundial has a wedge as the shadow-maker. A bit of research told me that the wedge is called a gnomon. It is a right triangle with one vertex (the “centre” vertex, shall we say) having angle equal to the local latitude. If this wedge is placed upright on a horizontal plane with the centre vertex aligned south and the 90deg vertex aligned north, then the shadow will produce a line that indicates the hour of the day. This works throughout the year, with approximately quarter-hour adjustments being required through the seasons.

The R code given below the diagram creates an outline of the expected edge of the shadow of the gnomon. To illustrate the variation in angle through the year (which relates to the “equation of time”), colours are used to indicate four significant times during the year.

Printed at full scale, the diagram might be suitable for laying out the horizontal scale for a sundial. Naturally, readers must alter the latitude and longitude if they do not live in Halifax, Nova Scotia.

A few notes:

  1. The gnomon hypotenuse will point to the pole star (Polaris) when the apparatus is aligned properly towards the north.
  2. Calling the function with debug=1 will show dots along the radial lines. These are the shadows of virtual points lying along the hypotenuse of the gnomon, and provide a check against errors in the calculation (since they should lie along a line if the gnomon angle matches the latitude).
  3. Noon is not aligned with North because the longitude is not an even multiple of 15 degrees.
  4. The length of the shadow provides extra information, but here this information is not shown (the lengths are normalized in lines 35 to 37 of the code.)

Image

## gnonom style sundial
if (!interactive())
    png("sundial_with_gnomon.png", width=7, height=6, unit="in", 
        res=200, pointsize=13)

library(oce)

sundial <- function(lon=-63.60, lat=44.65,
                    days=c("2014-03-20", "2014-06-20", "2014-09-23", "2014-12-21"),
                    keys=c("Spring equinox", "Summer solstice",
                           "Autumn equinox", "Winter solstice"),
                    debug=0)
{
    col <- 1:4
    glwd <- 8
    timezone <- floor(0.5 + lon / 15)
    L <- 1                           # horiz gnomon length (arbitrary)
    nhours <- 24
    first <- TRUE
    for (season in 1:4) {
        hours <- seq.POSIXt(as.POSIXct(days[season], tz="UTC"),
                            by="1 hour", length.out=nhours)
        for (hour in seq_along(hours)) {
            t <- hours[hour]
            tlocal <- t + 3600 * timezone
            sa <- sunAngle(t, lon=lon, lat=lat)
            gy <- seq(0, L, length.out=10)
            gx <- rep(0, length(gy))
            gz <- gy * tan(lat * pi / 180)
            R <- gz / tan(-sa$altitude * pi / 180) # radius of shadow
            theta <- (90 - sa$azimuth) * pi / 180
            par(mar=rep(2, 4))
            x <- gx + R * cos(theta)
            y <- gy + R * sin(theta)
            len <- max(sqrt(x^2 + y^2))
            x <- x / len * L
            y <- y / len * L
            if (sa$altitude > 0) {
                if (first) {
                    first <- FALSE
                    D <- L * 5
                    plot(x, y, type='n', pch=20, asp=1,
                         xlim=c(-L, L), ylim=c(-L/5, L),
                         axes=FALSE, col=col[season], xlab="", ylab="")
                    ## Draw gnomon as a gray bar
                    segments(0, 0, 0, L, lwd=glwd, col='gray')
                    grid()
                    abline(h=0, lwd=1.5*par('lwd'), lty='dotted')
                    abline(v=0, lwd=1.5*par('lwd'), lty='dotted')
                    mtext("South", side=1, at=0)
                    mtext("West", side=2, at=0)
                    mtext("North", side=3, at=0)
                    mtext("East", side=4, at=0)
                    legend("topright", lwd=glwd, col="gray",
                           legend="Gnomon")
                    legend("bottomright",
                           legend=sprintf("%.3fE %.3fN", lon, lat))
                    legend("topleft", lwd=1, col=1:4, cex=3/4,
                           legend=keys)
                    box()
                    points(0, 0, pch=20, cex=2)
                    segments(0, 0, x, y, col=col[season])
                } else {
                    segments(0, 0, x, y, col=col[season])
                    if (debug)
                        points(x, y)
                }
                if (season==2) {
                    xend <- tail(x, 1)
                    yend <- tail(y, 1)
                    text(1.05*xend, 1.05*yend, format(tlocal, "%H"))
                }

            }
        }
    }
}
sundial()
if (!interactive())
    dev.off()