A histogram is a graphical representation of the frequency distribution of a set of data.
Part 1 provides an introduction to frequency tables and histogram chart. The table is setup by counting bin values, then using the Excel Analysis Toolpak and finally, the Excel FREQUENCY function. The Excel version is available in the Part 1
worksheet of the associated file.
We start with a list of stock prices and returns for a major Australian retailer over one month period for November 2012 (Figure 1). The daily closing prices Cprice
are in column B, and the daily log returns LogR
in column C are returned by the Excel LN function.
Histogram/Frequency Distribution Table & Graph. To get a frequency distribution graph from the above frequency distribution table, first, select any cell within the table. Click on the Insert tab. In the Charts group of commands, you see there is a command named PivotChart.
Descriptive statistics are shown in the range E7:F15
. Each statistic is returned by an Excel function. A formula is used for the arithmetic Range statistic in cell F9
.
Figure 1. Returns and summary statistics.
In addition to the data summary provided by the descriptive statistics, an analyst might be interested in the number of returns above or below the average, or within plus or minus one standard deviation.
A frequency table tells us how often values occur in a table. In finance, it is often assumed that the stock returns series is normally distributed.
In this example, four bins are used to count the frequency values:
LogR
≤ (μ - 1σ)LogR
< μ LogR
< (μ + 1σ)LogR
To count the frequencies in the LogR
vector (see Figure 1):
E17:F19
).values
in the LogR
vector (to column I).values
into ascending order.This task can be more easily performed with the histogram tool from the Excel Analysis ToolPak
The Analysis Tools should be available on the Excel Ribbon under Data > Analysis > Data Analysis. If it is not there, then you need to install the Analysis Toolpak (ATP) AddIn.
To use the Histogram tool:
K5:L8
shown in Figure 2). This is an array formula linking the range E17:19
from Figure 1.Figure 2.The histogram bins in column L for plus and minus one standard deviation. The bins range links to E17:F19
in Figure 1.
L5
has been omitted.Figure 3.The ATP Histogram dialog box. The label in cell L5
was not selected.
The output is shown in Figure 4. The output table (range M5:N9
) has bin labels for the first three bins, whilst the last bin is given the default label of More
.
The chart output option generates the column chart for the four bins. Its format can be edited as required.
Figure 4. The ATP histogram output with frequencies si shown in column N, and a column chart on the right.
The frequency table values in Figure 4 have no link to the source data. Each value in the range N6:N9
is a constant as shown by the formula bar entry for cell N6
.
This means that the Histogram tool must be run each time the table or the chart need updating.
The FREQUENCY function provides a way of linking the frequency table to the source data, and also allows use of dynamic tables and charts used in dashboard type management reports in Part 2 of this document.
The FREQUENCY function:
To use the FREQUENCY function:
K13:L16
shown in Figure 5).M14:M17
.FREQUENCY(LogR,L14:L17)
, see cell M14
in Figure 5. The empty cell at L17
, corresponds to bin number 4, labeled More at cell L9
in Figure 4.Figure 5. A frequency table created with the Excel FREQUENCY function. The blank cells L17:M17
are equivalent to the More bin in Figure 4.
Figure 6. The return values for the FREQUENCY array function in the range M14:M17
.
The data used is a sample of share price data for the calendar year 2012, from the Australian retailer, Woolworths Limited, ASX code WOW. The data source is the Yahoo Finance web site, and the data has been imported to the WOW worksheet.
To do this (see the range A:C
in Figure 7).
A
).B
, use the Excel VLOOKUP function to map the price data in the WOW worksheet to each date in the master list of column A
. The formula is shown in the formula bar.C
, insert a vector of daily log returns using the Excel LN function. Be careful of the date vector direction.Figure 7. The stock return vectors on the left, and the Stock analyser: selection panel on the right.
The stock analyser selection panel is shown in the range E3:H7
of Figure 7. This provides an easy way to change the sample analysis date and number of observations.
The analyser selection panel cells link to the return data for the 2012 calendar year. Firstly, the analyst selects an Analysis date (cell G5
), then the length of the analysis sample is selected from the Sample (days) list (cell G6
). Satisfactory operation requires that the selection falls within the 2012 data window.
The year 2012 had 253 trading days, and suppose that the analyser permits a sample length (Sample (days)
) of 20 to 90 trading days within this 1 year window. The setup task is simplified with named ranges, and a Workarea of helper cells.
This means that the earliest analysis date (the furthest back in time) will be 11 May 2012. If 31 December 2012 is day 1, then 11 May 2012 is day 164, thus providing a maximum of 90 analysis days including 11 May 2012. [164 + 90 - 1 (because of the date common to both periods) = 253].
To setup the date vectors, create the following names:
='Part2 Analysis and Charts'!$A$6:$A$258
. (253 rows).='Part2 Analysis and Charts'!$A$6:$A$169
. (164 rows).The range G5:G6
contains two data validation cells. To do add the Data Validation items:
G5
=DateShort
{20,21,...,90}
. Name this vector DayVector
.G6
=DayVector
='Part2 Analysis and Charts'!$G$5
Sample (days)
. ='Part2 Analysis and Charts'!$G$6
The stock analyser selector panel controls the dynamic vector named LogRVector
. In turn the LogRVector
uses a nested vector named DateVector
as one of its arguments. Here are the details:
Add the Names (several names are setup in section 2.2.3:
=OFFSET(INDEX(DateShort,1,1),PositionTD-1,0,Height)
=OFFSET(DateVector,0,2)
The argument names come from the labels shown in Figure 8.
Other Names have been used in the range N6:O27
shown by the red border in Figure 8. The labels in column N
has been applied as names to the values in column 0
. All cell formulae for the Workarea as shown in the shaded section P6:Q27
of Figure 8. The Holidays
name refers to rows 31 and 32.
Figure 8. Workarea range names and formulae, with details of the LogRVector.
In this example, the dynamic histogram uses a fixed number of bins, but allows the intervals and bins widths to vary with the data. Optimal bin numbers and intervals is discussed in Cimbala (2013), Doane (1976) - Doane's formula, Scott (1979) - Scott's normal reference rule, and Sturges (1926) - Sturges' formula.
Given the assumption of a normal distribution in log returns, six symmetrical to the mean bins are used. Three below the mean, and three above the mean. Remember that the bin_array
argument to the FREQUENCY function is the upper limit to the values in a particular bin. So bin 3, labeled -1 in Figure 9 has an upper limit of Mu.
To achieve symmetry around the mean, we need to determine the Max{|Min - Mu|, |Max - Mu|}. This statistic is denoted m and is used to setup the bins values as shown in Figure 9.
We calculate the m statistic in cell Q35. The Excel ABS function returns the absolute value. The six bins are labeled {-3, -2, ... , -1} in column Q. The -4 bin is a place holder to achieve symmetry in the left tail of the normal curve.
The frequencies in column R
use the same method as the FREQUENCY function table in Figure 6. In column S
, each frequency value is divided by the vector total to return the relative frequency.
Figure 9. Worksheet showing dynamic bins (column Q
), frequencies (column R
), and relative frequencies (column S
).
The values for the normal probability density funciton in column T of Figure 8 are returned by the Excel NORMDIST function.
The NORMDIST function:
To scale the frequency and bell curve values, the Relative Normal frequency is calculated in column U
.
Looking at Figure 8, the three area marked 1, 2, and 3 are the series for the charts in Figures 10, 11, and 12. First item 2, the RelFreq
, and 3, the RelNormal
are graphed as an Excel column chart (Figure 10).
Figure 10. Column chart
Then the item 2 series column chart is converted to a line chart (Figure 11).Figure 11. Normal curve as a line chart.
Finally, the normal cuve line chart is smoothed by selecting the series, then Format Data Series > Marker Line Style, then ticking Smoothed line.
Figure 12. Normal curve after smoothing.
Associated Excel file: histogramwithnormalcurveoverlay.xlsx
.Cimbala, J.M., (2013), 'Histograms' Penn State University, Unpublished Manuscript.
Doane D.P., (1976), 'Aesthetic frequency classiication' American Statistician, pp.181-183.
Scott D.W., (1979), 'On optimal andThe choice of a class interval' Journal of the Amercian Statistical Association, pp.65-66.