Problems with spreadsheets
By McDonald, T. | Updated 24th of March 2020
Small errors in spreadsheets have been the cause of major problems in real life. Examining the kinds of common errors in spreadsheets helps us understand the general problems working with data can present. In order to understand these problems, it is a good idea to look at what a spreadsheet is and how equations can go wrong.
A spreadsheet consists of one or more sheets:
- A sheet is a two-dimensional (array) of cells in which each cell is referenced by an index consisting of its column letter and row number.
- A cell can contain data or a calculation (known as a formula).
- There is a text box, known as a formula bar, in which the contents of a cell are displayed, and which is used for creating and editing the contents of the cell.
What is a spreadsheet?
A sheet can contain one or more:
- Tables (sheets)
- A table is a two-dimensional array of data; some data is numeric, some is text.
- The table obviously has columns and rows.
- These need to be meaningful.
- These are used much like comments in code and must be meaningful. For example, they could
- Other information.
Spreadsheets have cells
Each of the values are in a separate cell. The value in a cell is called a datum. The value that you see is called a view because it may not be what is actually in the cell. For instance, the cell may contain a function, but the cell displays the result of the function and is what you view.
|Cell in spreadsheet.|
Cells have an address so you can find them. Down the left hand side are numbers and across the top are letters. The number are the rows while the letters represent the columns. The cell address in this example is B8.
Cells can be auto updated by a formulaCells can be added together using =SUM(Cell Address 1+ Cell Address 2) or just =W9+W12.
A range of cells can be added together using =SUM(Cell Address 1: Cell Address 2).
The equals sign indicates a formula is to follow. What follows the equals sign is called the formula.
Whenever you change a value of one or more of the cells mentioned in a calculation, the result will automatically change as a result. This ease of use is a potential weakness as you will see later.
Formulas can be copied to a range of other cells in the row starting from the selected cell by clicking on the handle in the bottom right corner, holding it and dragging it to the desired cell. Clicking on any of these cells will show the same formula.
|A selected range of cells|
An example of relative addressing is W8 or A10 (no $ sign). The cell with the relative address is relative to another cell address. When a cell in a formula is copied, it will keep its relative address relative to the other address.
Absolute addressingAn example of absolute addressing is $W$8 or $A$10.
This will keep the cell address as W8 or A10 regardless of whether rows are deleted or added, which would change the address of the cell W8 to say W9 if a row before it was added.
W$8, however, will allow the column to change to say F, but it will always refer to row 8.
Formatting CellsThis allows the cells to take on the form of decimals, accounting, scientific etc.
Built-in Statistical Functions
There are many of these. Here are some examples.
- Will calculate the average value from cell B14 to k14.
- This is standard deviation.
- This will count ONLY cells with something in; blank cells don’t get counted.
- =COUNTIF(J3:J14, "Pass").
- This will count every cell if it contains ‘Pass’ from cells J3 to J14.
- This function is case sensitive.
- =AVERAGE(Cell: Cell) and =AVERAGEA(Cell: Cell)
- =AVERAGE calculates the numerical average value in a dataset, ignoring text. This function does not count the text values when dividing the added cells.
- =AVERAGEA calculates the numerical average value including text. This function does count the text values. It adds the numerical values together and then divides by the total number of cells including the text cells.
Creating simple graphs
Selecting multiple cells: first click on the first cell and then hold shift and select the last cell you want in the range.
Next, click on insert and select a chart type. I have chosen 2D column chart (bar chart).
If you chose a line chart you will have to select chart and then select switch row/column.
Excel can use IF statements
=IF(I3>=50, "Pass", "Fail").
How spreadsheets scaleSpreadsheets do not scale very well. As they become larger and more complicated, they become prone to mistake that can break them.
Problems with spreadsheets
- There are significant risks in using spreadsheets for solving real-world problems
- Financial disasters
- Technique disasters
- Complicated spreadsheets might be hard to find errors in it. The more complicated a spreadsheet becomes the more difficult it is to find errors.
- Formulas are not immediately obvious (magic numbers).
- Human error can break the spreadsheet; copying data is easy to do but is fraught with danger Copied data (whether within the same spreadsheet or in several spreadsheets). Not using relative addressing and absolute address in the correct way and for the correct reasons can lead to inaccurate calculations.
- Magic numbers. These are difficult to understand what they mean or stand for. For this reason, it is best to avoid them.
- Missing data
- Human error in general.
- Fraud – because of the mixture of code and data it is a great environment for fraud.
- Users can become overconfident because spreadsheet users do not look for errors. This includes being overconfident in the output of the spreadsheet.
- Translation of a business problem into a spreadsheet domain.
- Poor archiving leads to weaknesses in spreadsheet (version) control
- The underlining problem is that spreadsheet users have a good idea of the problem and motivation to solve it, but don’t run tests to check for validity because they see it as a waste of time. Consequently, large amounts of spreadsheets have errors in them, which can then lead to errors in linked spreadsheets. Data scientists and IT professionals know how to build robust systems that avoid these common problems.
Now we know what the problems are what is the solution?The follow can help but will not solve all problems:
- Avoid copying data (use relative addressing, if possible).
- Don't put 'magic numbers' in formulas.
- Ensure that, if a value is likely to change, any other value that depends on that value should be calculated via a formula.
- Document any cells that are used for holding variables occurring in formulas.
- Use appropriate functions when there are missing data.
- Avoid hiding data so that, if someone is new to the spreadsheet, they will be able to see all data.
- Document the spreadsheet using meaningful headings and notes.
Hope you liked my blog. If you did, leave a comment and please like.