Medical Statistics from Scratch: An Introduction for Health Professionals
Ebook · 809 pages · 6 hours


About this ebook

Correctly understanding and using medical statistics is a key skill for all medical students and health professionals.

In an informal and friendly style, Medical Statistics from Scratch provides a practical foundation for everyone whose first interest is probably not medical statistics. Keeping the level of mathematics to a minimum, it clearly illustrates statistical concepts and practice with numerous real world examples and cases drawn from current medical literature.

This fully revised and updated third edition includes new material on: 

  • missing data, random allocation and concealment of data
  • intra-class correlation coefficient
  • effect modification and interaction
  • diagnostic testing and the ROC curve
  • standardisation

Medical Statistics from Scratch is an ideal learning partner for all medical students and health professionals needing an accessible introduction, or a friendly refresher, to the fundamentals of medical statistics.

Language: English
Release date: Aug 7, 2014
ISBN: 9781118519394

    Book preview

    Medical Statistics from Scratch - David Bowers

    This edition first published 2014 © 2014 by John Wiley & Sons Ltd.

    Second edition © 2008 by John Wiley & Sons Ltd.

    First edition © 2002 by John Wiley & Sons Ltd.

    Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

    Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK

    The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

    111 River Street, Hoboken, NJ 07030-5774, USA

    For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell

    The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

    Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting a specific method, diagnosis, or treatment by health science practitioners for any particular patient. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. Readers should consult with a specialist where appropriate. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

    Library of Congress Cataloging-in-Publication Data

    Bowers, David, 1938- author.

    Medical statistics from scratch : an introduction for health professionals / David Bowers.— Third edition.

    p. ; cm.

    Includes bibliographical references and index.

    ISBN 978-1-118-51938-7 (pbk.)

    I. Title.

    [DNLM: 1. Biometry. 2. Statistics as Topic--methods. WA 950]

    RA409

    610.72′7— dc23

    2014020550

    A catalogue record for this book is available from the British Library.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    Cover image courtesy of the author.

    Preface to the 3rd Edition

    The 1st edition of this book was published in 2002 and the 2nd edition in 2008. I was surprised when I discovered it was quite such a long time ago. Where did the time go! Anyway, over the course of the last five years, I have received many favourable comments from readers of my book, which of course is immensely gratifying. I must be doing something right then.

    This edition contains a completely new chapter (on diagnostic tests), there is quite a lot of new material and most of the chapters have received an extensive re-write. I have also updated virtually all of the examples drawn from the journals and added many new exercises. I hope that this gives the book a fresh feel, as well as a new lease of life.

    The book should appeal, as before, to everybody in health care (students and professionals alike) including nurses, doctors, health visitors, physiotherapists, midwives, radiographers, dieticians, speech therapists, health educators and promoters, chiropodists and all those other allied and auxiliary professionals. It might possibly also be of interest to veterinary surgeons, one of whom reviewed my proposal fairly enthusiastically.

    My thanks to Jon Peacock and all the others at Wiley who have shepherded me along in the past and no doubt will do so in the future. I must also thank Barbara Noble, who patiently acted as my first-line copyeditor. She read through my manuscript, discovered quite a few errors of various sorts and made many valuable suggestions to improve readability. Any remaining mistakes are of course mine.

    I also want to acknowledge my great debt to Susanne, who always encourages me, enthusiastically, in everything I attempt.

    Finally, I would like to mention another book which might be of interest to any readers who are thinking of embarking on research for the first time—Getting Started in Health Research, Bowers et al., Wiley, 2012. This book covers both quantitative and qualitative research. It will guide you through the research process, from the very first idea to the interpretation of your results and your conclusions.

    David Bowers, 2013

    Preface to the 2nd Edition

    This book is a ‘not-too-mathematical’ introduction to medical statistics. It should appeal to anyone training or working in the health care arena—whatever his or her particular discipline is—who wants either a simple introduction to the subject or a gentle reminder of stuff that they might have forgotten. I have aimed the book at:

    • students doing either a first degree or a diploma in clinical and health care courses
    • students doing post-graduate clinical and health care studies
    • health care professionals doing professional and membership examinations
    • health care professionals who want to brush up on some medical statistics generally or who want a simple reminder of a particular topic
    • anybody else who wants to know a bit of what medical statistics is about.

    The most significant change in this edition is the addition of two new chapters, one on measuring survival and the other on systematic review and meta-analysis. The ability to understand the principles of survival analysis is important, not least because of its popularity in clinical research and consequently in the clinical literature. Similarly, the increasing importance of evidence-based clinical practice means that systematic review and meta-analysis also demand a place. In addition, I have taken the opportunity to correct and freshen the text in a few places, as well as adding a small number of new examples. My thanks to Lucy Sayer, my editor at John Wiley & Sons, for her enthusiastic support, to Liz Renwick and Robert Hambrook and all the other people in Wiley for their invaluable help and my special thanks to my copyeditor Barbara Noble for her truly excellent work and enthusiasm (of course, any remaining errors are mine).

    I am happy to get any comments from you. You can e-mail me at: [email protected].

    Preface to the 1st Edition

    This book is intended to be an introduction to medical statistics but one which is not too mathematical; in fact, it has the absolute minimum of maths. The exceptions, however, are Chapters 17 and 18, which have maths on linear and logistic regressions. It is really impossible to provide material on these procedures without some maths, and I hesitated about including them at all. However, they are such useful and widely used techniques, particularly logistic regression and its production of odds ratios, that I felt they must go in. Of course, you do not have to read them. The book should appeal to anyone training or working in the health care arena, whatever his or her particular discipline is, who wants a simple, not-too-technical introduction to the subject. I have aimed the book at:

    • students doing either a first degree or a diploma in health care-related courses
    • students doing post-graduate health care studies
    • health care professionals doing professional and membership examinations
    • health care professionals who want to brush up on some medical statistics generally or who want a simple reminder of a particular topic
    • anybody else who wants to know a bit of what medical statistics is about.

    I originally intended to make this book an amalgam of two previous books of mine, Statistics from Scratch for Health Care Professionals and Statistics Further from Scratch. However, although it covers a lot of the same material as those two books, this is in reality a completely new book, with a lot of extra stuff, particularly on linear and logistic regressions. I am happy to get any comments and criticisms from you. You can e-mail me at: [email protected].

    Introduction

    My purpose in writing this book is to offer a guide to all those health care students and professionals out there, who either want to get started in medical statistics or who would like (or need) to refresh their understanding of one or more medical statistics topics. I have tried to keep the mathematics to a minimum, although this is a bit more difficult with the somewhat challenging material on modelling in later chapters.

    I have used lots of appropriate examples drawn from clinical journals to illustrate the ideas and lots of exercises which the readers may wish to work through to consolidate their understanding of the material covered (the solutions are at the end of this book).

    I have included some outputs from SPSS and Minitab which I hope will help the readers interpret the results from these statistical programmes.

    Finally, for any tutors who are using this book to introduce their students to medical statistics, I am always very pleased to receive any comments or criticisms they may have which will help me improve the book in the future editions. My e-mail address is [email protected].

    Part I

    Some Fundamental Stuff

    Chapter 1

    First things first—the nature of data

    Learning objectives

    When you have finished this chapter, you should be able to:

    • Explain the difference between nominal, ordinal and metric data.
    • Identify the type of any given variable.
    • Explain the non-numeric nature of ordinal data.

    Variables and data

    Let's start with some numbers. Have a look at Figure 1.1.

    Figure 1.1 Some numbers. Actually, the birthweight (g) of a sample of 100 babies. Data from the Born in Bradford Cohort Study. Born in Bradford, Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust

    These numbers are actually the birthweights of a sample of 100 babies (measured in grams). We call these numbers sample data. These data arise from the variable birthweight. To state the blindingly obvious, a variable is something whose value can vary. Other variables could be blood type, age, parity and so on; the values of these variables can change from one individual to the other. When we measure a variable, we get data—in this case, the variable birthweight produces birthweight data.

    Figure 1.2 contains more sample data, in this case, for the gender of the same 100 babies.

    Figure 1.2 The gender of the sample of babies in Figure 1.1

    Moreover, Figure 1.3 contains sample data for the variable smoked while pregnant.

    Figure 1.3 The variable ‘smoked while pregnant?’ for the mothers of the babies in Figure 1.1

    The data in Figures 1.1, 1.2 and 1.3 are known as raw data because they have not been organised or arranged in any way. This makes it difficult to see what interesting characteristics or features the data might contain. The data cannot tell its story, if you like. For example, it is not easy to observe how many babies had a low birthweight (less than 2500 g) from Figure 1.1, or what proportion of the babies were female from Figure 1.2. Moreover, this is for only 100 values. Imagine how much more difficult it would be for 500 or 5000 values. In the next four chapters, we will discuss a number of different ways that we can organise data so that it can tell its story. Then, we can see more easily what is going on.
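
    As a rough illustration of what organising raw data makes possible, the sketch below counts the low-birthweight babies and works out the proportion of females in a handful of made-up values. The numbers and the names `birthweights_g` and `genders` are assumptions for illustration only; they are not the Born in Bradford data shown in Figures 1.1 and 1.2.

```python
# Hypothetical raw data: NOT the values from Figures 1.1 and 1.2.
birthweights_g = [3450, 2380, 4010, 2960, 2490, 3720, 3105, 2650]
genders = ["M", "F", "F", "M", "F", "M", "M", "F"]

LOW_BIRTHWEIGHT_G = 2500  # threshold quoted in the text (less than 2500 g)

# Count the babies whose birthweight falls below the threshold.
n_low = sum(1 for w in birthweights_g if w < LOW_BIRTHWEIGHT_G)

# Proportion of female babies in the sample.
prop_female = genders.count("F") / len(genders)

print(f"Low birthweight: {n_low} of {len(birthweights_g)} babies")
print(f"Proportion female: {prop_female:.2f}")
```

    With only eight values the answers could be found by eye; the point is that the same two lines of counting work unchanged for 500 or 5000 values.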

    Exercise 1.1

    Why do you think that the data in Figures 1.1, 1.2 and 1.3 are referred to as ‘sample data’?

    Exercise 1.2

    What percentage of mothers smoked during their pregnancy? How does your value contrast with the evidence which suggests that about 20 per cent of mothers in the United Kingdom smoked when pregnant?

    Of course, we gather data not because it is nice to look at or we've got nothing better to do but because we want to answer a question. A question such as ‘Do the babies of mothers who smoked while pregnant have a different (we're probably guessing lower) birthweight than the babies of mothers who did not smoke?’ or ‘On average, do male babies have the same birthweight as female babies?’ Later in the book, we will deal with methods which you can use to answer such questions (and ones more complex); however, for now, we need to stick with variables and data.

    Where are we going…?

    • This book is an introduction to medical statistics.
    • Medical statistics is about doing things with data.
    • We get data when we determine the value of a variable.
    • We need data in order to answer a question.
    • What we can do with data depends on what type of data it is.

    The good, the bad, and the ugly—types of variables

    There are two major types of variable—categorical variables and metric variables; each of them can be further divided into two subtypes, as shown in Figure 1.4.


    Figure 1.4 Types of variables

    Each of these variable types produces a different type of data. The differences in these data types are of great importance—some statistical methods are appropriate for some types of data but not for others, and applying an inappropriate procedure can result in a misleading outcome. It is therefore critical that you identify the sort of variable (and data) you are dealing with before you begin any analysis, and we need therefore to examine the differences in data types in a bit more detail. From now on, I will be using the word ‘data’ rather than ‘variable’ because it is the data we will be working with—but remember that data come from variables. We'll start with categorical data.

    Categorical data

    Nominal categorical data

    Consider the gender data shown in Figure 1.2. These data are nominal categorical data (or just nominal data for short).

    These data are ‘nominal’ because they usually relate to named things, such as occupation, blood type, or ethnicity. In particular, they are not numeric. They are ‘categorical’ because we allocate each value to a specific category. For example, we allocate each M value in Figure 1.2 to the category Male and each F value to the category Female. If we do this for all 100 values, we get:

    Male    56

    Female  44

    Notice two things about this data, which is typical of all nominal data:

    • The data do not have any units of measurement.¹

    • The ordering of the categories is arbitrary. In other words, the categories cannot be ordered in any meaningful way.² We could just as easily have written the number of males and females in the order:

    Female  44

    Male    56

    By the way, allocating values to categories by hand is pretty tedious as well as error-prone, more so if there are a lot of values. In practice, you would use a computer to do this.
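
    A minimal sketch of how a computer might do this tallying is given below, using Python's standard `collections.Counter`. The ten gender codes and the `labels` mapping are hypothetical values chosen for illustration, not the data in Figure 1.2.

```python
from collections import Counter

# Hypothetical gender codes for a small sample (not the Figure 1.2 data).
genders = ["M", "F", "M", "M", "F", "M", "F", "M", "F", "M"]

# Map each code to its named category, then count the values per category.
labels = {"M": "Male", "F": "Female"}
tally = Counter(labels[g] for g in genders)

for category, count in tally.items():
    print(f"{category:<8}{count}")
```

    The order in which the categories are printed carries no meaning, which is consistent with the arbitrary ordering of nominal categories described above.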

    Exercise 1.3

    Suggest a few nominal variables.

    Ordinal categorical data

    Let's now consider data from the Glasgow Coma Scale (GCS) (which some of you may be familiar with). As the name suggests, this scale is used to assess the level of consciousness after head injury. A patient's GCS score is judged by the sum of responses in three areas: eye opening response, verbal response, and motor response. Notice particularly that these responses are assessed rather than measured (as weight, height or temperature would be). The GCS score can vary from 3 (deeply unconscious) to 15 (fully conscious). In other words, there are 13 possible categories of consciousness.³

    Suppose that we have two motor-cyclists, let us call them Wayne and Kylie, who have been admitted to the Emergency Department with head injuries following a road traffic accident. Wayne has a GCS of 5 and Kylie a GCS of 10. We can say that Wayne's level of consciousness is less than that of Kylie (so we can order the values) but we can't say exactly by how much. We certainly cannot say that Wayne is exactly half as conscious as Kylie. Moreover, the levels of consciousness between adjacent scores are not necessarily the same; for example, the difference in the levels of consciousness between two patients with GCS scores of 10 and 11 may not be the same as that between patients with scores of 11 and 12. It's therefore important to recognise that we cannot quantify these differences.

    GCS data is ordinal categorical (or just ordinal) data. It is ordinal because the values can be meaningfully ordered, and it is categorical because each value is assigned to a specific category. Notice two things about this variable, which is typical of all ordinal variables:

    • The data do not have any units of measurement (so the same as that for nominal variables).
    • The ordering of the categories is not arbitrary, as it is with nominal variables.

    The seemingly numeric values of ordinal data, such as GCS scores, are not in fact real numbers but only numeric labels which we attach to category values (usually for convenience or for data entry to a computer). The reason is of course (to re-emphasise this important point) that GCS data, and the data generated by most other scales, are not properly measured but assessed in some way by a clinician or a researcher, working with the individual concerned.⁴ This is a characteristic of all ordinal data.

    Because ordinal data are not real numbers, it is not appropriate to apply any of the rules of basic arithmetic to this sort of data. You should not add, subtract, multiply or divide ordinal values. This limitation has marked implications for the sorts of analyses that we can do with such data—as you will see later in this book. Finally, we should note that ordinal data are almost always integer, that is, they have whole number values.
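
    One way to picture this restriction is a small sketch in which ordinal scores can be compared and ordered but deliberately support no arithmetic. The class name `OrdinalScore` is a hypothetical construct for illustration; it is not part of any statistical package, and the two scores are those of Wayne and Kylie from the example above.

```python
from functools import total_ordering


@total_ordering
class OrdinalScore:
    """A score that can be ordered and compared but not used in arithmetic."""

    def __init__(self, value: int):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.value < other.value


wayne = OrdinalScore(5)
kylie = OrdinalScore(10)

# Ordering is meaningful: Wayne's level of consciousness is lower.
print(wayne < kylie)

# Arithmetic is not: no __add__ is defined, so Python raises TypeError.
try:
    wayne + kylie
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print(arithmetic_allowed)
```

    In everyday software, ordinal scores are usually stored as plain integers, so nothing stops you from adding them; the sketch simply makes the rule explicit.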


    Exercise 1.4

    Suggest a few more scales with which you may be familiar from your clinical work.

    Exercise 1.5

    Explain why it would not really make sense to calculate an average GCS for a group of head injury patients.

    Metric data

    Discrete metric data

    Consider the data in Figure 1.5. This shows the parity⁵ of the mothers of the babies whose birthweights are shown in Figure 1.1.

    Figure 1.5 Parity data (number of viable pregnancies) for the mothers whose babies' birthweights are shown in Figure 1.1

    Discrete metric data, such as that shown in Figure 1.5, comes from counting. Counting is a form of measurement—hence the name ‘metric’. The data is ‘discrete’ because the values are in discrete steps; for example, 0, 1, 2, 3 and so on. Parity data comes from counting—probably by asking the mother or by looking at records. Other examples of discrete metric data would include number of deaths, number of pressure sores, number of angina attacks, number of hospital visits and so on. The data produced are real numbers, and in contrast to ordinal data, this means that the difference between parities of 1 and 2 is exactly the same as the difference between parities of 2 and 3, and a parity of 4 is exactly twice a parity of 2.

    In short:

    • Metric discrete variables can be counted and can have units of measurement—‘numbers of things’.
    • They produce data which are real numbers and are invariably integers (i.e. whole numbers).

    Continuous metric data

    Look back at Figure 1.1—the birthweight data.

    Birthweight is a metric continuous variable because it can be measured. For example, if we want to know someone's weight, we can use a weighing machine; we don't have to look at the individual and make a guess (which would be approximate) or ask them how heavy they are (very unreliable). Similarly, if we want to know their diastolic blood pressure, we can use a sphygmomanometer.⁶ Guessing or asking is not necessary. But, what do we mean by ‘continuous’? Compare a digital clock with a more old-fashioned analogue clock. With a digital clock, the seconds are indicated in discrete steps: 1, 2, 3 and so on. With the analogue clock, the hand sweeps around the dial in a smooth, continuous movement. In the same way, weight is a continuous variable because the values form a continuum; weight does not increase in steps of 1 g.

    Because they can be properly measured, these data are real numbers. In contrast to ordinal values, the difference between any pair of adjacent values, say 4000 g and 4001 g, is exactly the same as the difference between 4001 g and 4002 g, and a baby who weighs 4000 g is exactly twice as heavy as a baby of 2000 g. Some other examples of metric continuous data include blood pressure (mmHg), blood cholesterol (µg/ml), waiting time (minutes), body mass index (kg/m²), peak expiratory flow (l per min) and so on. Notice that all of these variables have units of measurement attached to them. This is a characteristic of all metric continuous data.

    Because metric data values are real numbers, you can apply all of the usual mathematical operations to them. This opens up a much wider range of analytic possibilities than is possible with either nominal or ordinal data—as you will see later.
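
    The arithmetic claims made above for metric data can be checked directly. The short sketch below uses only the birthweight and parity values quoted in the text; the variable names are arbitrary.

```python
# Birthweights in grams, taken from the examples in the text.
a, b, c = 4000, 4001, 4002

# Differences between adjacent values are equal (1 g in each case).
diff_ab = b - a
diff_bc = c - b
print(diff_ab == diff_bc)

# Ratios are meaningful: a 4000 g baby is exactly twice as heavy as a 2000 g baby.
weight_ratio = 4000 / 2000
print(weight_ratio)

# Discrete metric data behave the same way: a parity of 4 is twice a parity of 2.
parity_ratio = 4 / 2
print(parity_ratio)
```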

    To sum up:

    • Metric continuous data result from measurement and they have units of measurement.
    • The data are real numbers.

    These properties of both types of metric data are markedly different from the characteristics of nominal and ordinal data.

    Exercise 1.6

    Suggest a few continuous metric variables which you are familiar with. What is the difference between assessing the value of something and measuring it?

    Exercise 1.7

    Suggest a few discrete metric variables which you are familiar with.

    Exercise 1.8

    What is the difference between continuous and discrete metric data? Somebody shows you a six-pack egg carton. List (a) the possible number of eggs that the carton could contain; and (b) the number of possible values for the weight of the empty carton. What do you conclude?

    How can I tell what type of variable I am dealing with?

    The easiest way to tell whether data is metric is to check whether it has units attached to it, such as g, mm, °C, µg/cm³, number of pressure sores and number of deaths. If not, it may be ordinal or nominal—the former if the values can be put in any meaningful order. Figure 1.6 is an aid to variable-type recognition.


    Figure 1.6 An algorithm to help identify data type
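
    The rules of thumb described in the paragraph above can be written as a small function. This is a sketch based only on that paragraph and the earlier definitions, not a reproduction of Figure 1.6; the function name `identify_data_type` and its three yes/no parameters are assumptions made for illustration.

```python
def identify_data_type(has_units: bool, from_counting: bool = False,
                       can_be_ordered: bool = False) -> str:
    """Return a data-type label following the rules of thumb in the text."""
    if has_units:
        # Metric data: counted values are discrete, measured values continuous.
        return "discrete metric" if from_counting else "continuous metric"
    # No units: categorical data, ordinal only if the order is meaningful.
    return "ordinal" if can_be_ordered else "nominal"


print(identify_data_type(has_units=True))                        # birthweight (g)
print(identify_data_type(has_units=True, from_counting=True))    # parity
print(identify_data_type(has_units=False, can_be_ordered=True))  # GCS score
print(identify_data_type(has_units=False))                       # blood type
```

    The function only encodes the decision steps; answering the three questions correctly for a given variable still needs judgement.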

    Exercise 1.9

    Four migraine patients are asked to assess the severity of their migraine pain one hour after the first symptoms of an attack by marking a point on a horizontal line 100 mm long. The line is marked ‘No pain’ at the left-hand end and ‘Worst possible pain’ at the right-hand end. The distance of each patient's mark from the left-hand end is subsequently measured with an mm rule, and their scores are 25 mm, 44 mm, 68 mm and 85 mm. What sort of data is this? Can you calculate the average pain of these four patients? Note that this form of measurement (using a line and getting subjects to mark it) is known as a visual analogue scale (VAS).

    The baseline table

    When you are reading a research report or a journal paper, you will want to know something about the participants in the study. In most published papers, the authors will provide the reader with a summary table describing the basic characteristics of the participants in the study. This will contain some basic demographic information, together with relevant clinical details. This table is called the baseline table or the table of basic characteristics. In the following three exercises, we make use of the baseline tables provided by the authors.

    Exercise 1.10

    Figure 1.7 contains the basic characteristics of cases and controls from a case–control study into stressful life events and the risk of breast cancer in women. Identify the type of each variable in the table.

    * Two sample t test.

    † Data for one case missing.

    ‡ χ² test for trend.

    § χ² test.

    ¶ No data for one control.

    Figure 1.7 Basic characteristics of cases and controls from a case–control study into stressful life events as risk factors for breast cancer in women. Values are mean (SD) unless stated otherwise. Source: Protheroe et al. (1999). Reproduced by permission of BMJ Publishing Group Ltd

    Exercise 1.11

    Figure 1.8 is from a cross-sectional study to determine the incidence of pregnancy-related venous thromboembolic events and their relationship to selected risk factors, such as maternal age, parity, smoking, and so on. Identify the type of each variable in the table.

    Exercise 1.12

    Figure 1.9 is from a study to compare two lotions, malathion and d-phenothrin, in the treatment of head lice in 193 schoolchildren. Ninety-five children were given malathion and 98 d-phenothrin. Identify the type of each variable in the table.

    Data presented as n (%).

    OR, odds ratio; CI, confidence interval.

    Figure 1.8 Table of baseline characteristics from a cross-sectional study of thrombotic risk during pregnancy.

    Source: Lindqvist et al. (1999). Reproduced by permission of Wolters Kluwer Health

    The two groups were similar at baseline except for a significant difference for the length of hair (p = 0.02; chi-square).

    *One value missing in the d-phenothrin group.

    Figure 1.9 Baseline characteristics of the Pediculus humanus capitis-infested schoolchildren assigned to receive either malathion or d-phenothrin lotion.

    Source: Chosidow et al. (1994). Reproduced by permission of Elsevier

    At the end of each chapter, you should look again at the chapter objectives and satisfy yourself that you have achieved them.

    ¹ For example, cm, seconds, ccs, or kg, etc.

    ² We are excluding trivial arrangements such as alphabetic.

    ³ The scale is now used by first responders, paramedics and doctors, as being applicable to all acute medical and trauma patients.

    ⁴ There are some scales which may involve some degree of proper measurement, but these still produce ordinal values if even one part of the score is determined by a non-measured element.

    ⁵ Number of pregnancies carried to a viable gestational age—24 weeks in the United Kingdom, 20 weeks in the United States.

    ⁶ We call the device that we use to obtain the measured value, for example, a weighing scale, a sphygmomanometer, or a tape measure, etc. a measuring instrument.

    ⁷ Do not worry about the different types of study; I will discuss them in detail in Chapter 6.

    Part II

    Descriptive Statistics

    Chapter 2

    Describing data with tables

    Learning objectives

    When you have read this chapter, you should be able to:

    • Explain what a frequency distribution is.
    • Construct a frequency table from raw data.
    • Construct relative frequency, cumulative frequency and relative cumulative frequency tables.
    • Construct grouped frequency tables.
    • Construct a cross-tabulation table.
    • Explain what a contingency table is.
    • Rank data.

    Descriptive statistics. What can we do with raw data?

    As we saw in Chapter 1, when we have a lot of raw data, for example, as in Figure 1.1 (birthweight) or Figure 1.2 (gender), it is not easy for us to answer questions that we may have; for example, what percentage of babies had a low birthweight or what proportion of babies were male. This is because the data have not been arranged or structured in any way. If there are any interesting features in the data, they remain hidden from us. We said then that the data could not tell their story, and of course, the more data there are, the harder this becomes. Samples of many hundreds or thousands are not uncommon.

    In this chapter, and the four following, we are going to describe some methods for organising and presenting the data, so that we can answer more easily the questions of interest—essentially to enable us to see what's going on. Collectively, these methods are called descriptive statistics. These methods are a set of procedures that we can apply to raw data, so that its principal characteristics and main features are revealed. This might include sorting the data by size, putting it into a table, presenting it as a chart, or summarising it numerically.
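
    To give a flavour of these procedures, the sketch below sorts a few raw values by size and puts them into a simple frequency table with relative and cumulative frequencies. The ten parity values are hypothetical (they are not the data in Figure 1.5), and the column layout is only one possible choice.

```python
from collections import Counter

# Hypothetical parity values for ten mothers (not the Figure 1.5 data).
parity = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]

counts = Counter(parity)
n = len(parity)

rows = []
cumulative = 0
for value in sorted(counts):  # sort the distinct values by size
    freq = counts[value]
    cumulative += freq
    # (value, frequency, relative frequency in %, cumulative frequency)
    rows.append((value, freq, 100 * freq / n, cumulative))

print("Parity  Freq  Rel%  Cum")
for value, freq, rel_pct, cum in rows:
    print(f"{value:<8}{freq:<6}{rel_pct:<6.0f}{cum}")
```

    Even for ten values, the table shows at a glance something the raw list hides: here, that a parity of 0 is the most common value.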

    An important consideration in this process is the type of data you are working with. Some types of data
