eDiaMoND: A grid-enabled federated database of annotated mammograms, M Brady, DJ Gavaghan, AC Simpson

Tags: FEDERATED DATABASE, MICHAEL BRADY, image formation, UK, epidemiological studies, mammograms, images, database, robust software systems, interesting, United States, cancer, medical image analysis, MIAS, mammogram, breast cancer, data mining, applications, patient data, knowledge services, development, medical technologies, hospital information systems, quantitative analysis, image processing, Healthcare Informatics, cervical cancer, decision support systems, grid applications, brain tumour, healthcare, technical information, image compression, medical images, United Kingdom, Grid Computing, electronic medical records, database consistency, Oxford, United Kingdom, information technology, European Community, image segmentation, erroneous results, segmentation algorithms, automatic exposure control, Advanced Knowledge Technologies, UK Inter-disciplinary Research Consortium, Triple Assessment, medical application, secure storage, storage space requirements, Grid technology, demographic data, federated databases, benefit, Medical Research Council, Derek Hill, representation, University College, London, image analysis, Grid services, initial focus, David Hawkes, Open Grid Services Architecture, Imperial College of Science, Technology and Medicine, Anne Trefethen, Engineering and Physical Sciences Research Council, data fusion
Content: 41 eDiamond: a Grid-enabled federated database of annotated mammograms

Michael Brady,1 David Gavaghan,2 Andrew Simpson,3 Miguel Mulet Parada,3 and Ralph Highnam3
1Oxford University, Oxford, United Kingdom, 2Computing Laboratory, Oxford, United Kingdom, 3Oxford Centre for Innovation, Oxford, United Kingdom

Grid Computing: Making the Global Infrastructure a Reality, edited by F. Berman, G. Fox and T. Hey. 2002 John Wiley & Sons, Ltd.

41.1 INTRODUCTION

This chapter introduces a project named eDiamond, which aims to develop a Grid-enabled federated database of annotated mammograms, built at a number of sites (initially in the United Kingdom), and which ensures database consistency and reliable image processing. A key feature of eDiamond is that images are 'standardised' prior to storage. Section 41.3 describes what this means, and why it is a fundamental requirement for numerous grid applications, particularly in medical image analysis, and especially in mammography. The eDiamond database will be developed with two particular applications in mind: teaching and supporting diagnosis. There are several other applications for such a database, as Section 41.4 discusses, which are the subject of related projects. The remainder of
MICHAEL BRADY ET AL.
this section discusses the ways in which information technology (IT) is impacting on the provision of health care, a subject that in Europe is called Healthcare Informatics. Section 41.2 outlines some of the issues concerning medical images, and then Section 41.3 describes mammography as an important special case. Section 41.4 is concerned with medical image databases, as a prelude to the description in Section 41.5 of the eDiamond e-Science project. Section 41.6 relates the eDiamond project to a number of other efforts currently under way, most notably the US NDMA project. Finally, we draw some conclusions in Section 41.7.

All Western societies are confronting similar problems in providing effective healthcare at an affordable cost, particularly as the baby boomer generation nears retirement, as the cost of litigation spirals, and as there is a continuing surge of developments in often expensive pharmaceuticals and medical technologies. Interestingly, IT is now regarded as the key to meeting this challenge, unlike the situation as little as a decade ago when IT was regarded as a part of the problem.
Of course, some of the reasons for this change in attitude to IT are generic, rather than being specific to healthcare:
· The massive and continuing increase in the power of affordable computing, and the consequent widespread use of PCs in the home, so that much of the population now regards computers and the Internet as aspects of modern living that are as indispensable as owning a car or a telephone;
· The miniaturisation of electronics, which has made computing devices ubiquitous, in phones and personal organisers;
· The rapid deployment of high-bandwidth communications, key for transmitting large images and other patient data between centres quickly;
· The development of the global network, increasingly transitioning from the Internet to the Grid; and
· The development of software methodologies that enable large, robust software systems to be developed, maintained and updated.
In addition, there are a number of factors that contribute to the changed attitude to IT which are specific to healthcare:
· The increasing number of implementations of hospital information systems, including electronic medical records;
· The rapid uptake of Picture Archiving and Communication Systems (PACS), which enable images and signals to be communicated and accessed at high bandwidth around a hospital, enabling clinicians to store images and signals in databases and then to view them at whichever networked workstation is most appropriate;
· Growing evidence that advanced decision support systems can have a dramatic impact on the consistency and quality of care;
· Novel imaging and signalling systems (see Section 41.2), which provide new ways to see inside the body, and to monitor disease processes non-invasively;
· Miniaturisation of mechatronic systems, which enables minimally invasive surgery, and which in turn benefits the patient by reducing recovery time and the risk of complications, at the same time massively driving down costs for the health service provider;
eDIAMOND: A GRID-ENABLED FEDERATED DATABASE OF ANNOTATED MAMMOGRAMS
· Digitisation of information, which means that the sites at which signals, images and other patient data are generated, analysed, and stored need not be the same, as increasingly they are not;1 and, by no means least,
· The increased familiarity with, and utilisation of, PCs by clinicians. As little as five years ago, few consultant physicians would use a PC in their normal workflow; now almost all do.
Governments have recognised these benefits and have launched a succession of initiatives, for example, the UK Government's widely publicised commitment to electronic delivery of healthcare by 2008, and its National Cancer Plan, in which IT features strongly. However, these technological developments have also highlighted a number of major challenges. First, the increasing range of imaging modalities, allied to fear of litigation,2 means that clinicians are drowning in data. We return to this point in Section 41.2. Second, in some areas of medicine, most notably mammography, there are far fewer skilled clinicians than are needed. As we point out in Section 41.3, this offers an opportunity for the Grid to contribute significantly to developing teleradiology, allowing the skilled clinician to be geographically separated from a less-skilled colleague and that colleague's patient whilst improving diagnostic capability.
1 This technological change, together with the spread of PACS systems, has provoked turf battles between different groups of medical specialists as to who 'owns' the patient at which stage of diagnosis and treatment. The emergence of the Grid will further this restructuring.
2 It is estimated that fully 12% of malpractice suits filed in the USA concern mammography, with radiologists overwhelmingly heading the 'league table' of clinical specialties that are sued.

41.2 MEDICAL IMAGES

Röntgen's discovery of X rays in the last decade of the nineteenth century was the first of a continuing stream of technologies that enabled clinicians to see inside the body without first opening it up. Since bones are calcium-rich, and since calcium attenuates X rays about 26 times more strongly than soft tissues, X-radiographs were quickly used to reveal the skeleton, in particular, to show fractures. X rays are normally used in transmission mode: the two-dimensional spatial distribution is recorded for a given (known) source flux. A variety of reconstruction techniques, for example, based on the Radon transform, have been developed to combine a series of two-dimensional projection images taken from different directions (normally on a circular orbit) to form a three-dimensional 'tomographic' volume. Computed tomography (CT) is nowadays one of the tools most widely used in medicine. Of course, X rays are intrinsically ionising radiation, so in many applications the energy has to be very carefully controlled, kept as low as possible, and passed through the body for as short a time as possible, with the inevitable result that the signal-to-noise ratio (SNR) of the image/volume is greatly reduced. X rays of the appropriate energies were used increasingly from the 1930s to reveal the properties of soft tissues, and from the 1960s onwards to discover small, non-palpable tumours for which the prognosis is very good. This is most highly developed for mammography, to
which we return in the next section; but it remains the case that X rays are inappropriate for distinguishing many important classes of soft tissues, for example, white and grey matter in the brain. The most exquisite images of soft tissue are currently produced using magnetic resonance imaging (MRI); see Westbrook and Kaut [1] for a good introduction to MRI. However, to date, no pulse sequence is capable of distinguishing cancerous tissue from normal tissue, except when using a contrast agent such as the paramagnetic chelate of gadolinium, gadopentetate dimeglumine, abbreviated as DTPA. In contrast-enhanced MRI to detect breast cancer, the patient lies on her front with the breasts pendulous in a special radio-frequency (RF) receiver coil; one or more image volumes are taken prior to bolus injection of DTPA and then image volumes are taken as fast as possible, for up to ten minutes. In a typical clinical setting, this generates 12 image volumes, each comprising 24 slice images, each 256 × 256 pixels, a total of 18 MB per patient per visit. This is not large by medical imaging standards; certainly it is small compared to mammography. Contrast-enhanced MRI is important for detecting cancer because it highlights the neoangiogenesis, a tangled mass of millions of micron-thick leaky blood vessels, grown by a tumour to feed its growth. This is essentially physiological (functional) rather than anatomical information [2]. Nuclear medicine modalities such as positron-emission tomography (PET) and single photon emission computed tomography (SPECT) currently have the highest sensitivity and specificity for cancer, though PET remains relatively scarce, because of the associated capital and recurrent costs, not least of which involve a cyclotron to produce the necessary quantities of radiopharmaceuticals.
Finally, in this very brief tour (see Webb (1988), Fitzpatrick and Sonka (2000) for more details about medical imaging), ultrasound image analysis has seen major developments over the past decade, with Doppler, second harmonic, contrast agents, three-dimensional probes, and so on; but image quality, particularly for cancer, remains sufficiently poor to offset its price advantages.

Generally, medical images are large and depict anatomical and pathophysiological information of staggering variety both within a single image and across a population of images. Worse, it is usually the case that clinically significant information is quite subtle. For example, Figure 41.1 shows a particularly straightforward example of a mammogram. Microcalcifications, the small white spots shown in Figure 41.1, are deposits of calcium or magnesium salts that are smaller than 1 mm. Clusters of microcalcifications are often the earliest sign of non-palpable breast cancer, though it must be stressed that benign clusters are often found, and that many small white dots do not correspond to microcalcifications (see Highnam and Brady [3] for an introduction to the physics of mammography and to microcalcifications). In order to retain the microcalcifications that a skilled radiologist can detect, it is usual to digitise mammograms to a resolution of 50 to 100 µm. It has been found that the densities in a mammogram need to be digitised to a resolution of 14 to 16 bits, yielding 2 bytes per pixel. An A4-sized mammogram digitised at the appropriate resolution gives an image that is typically 4000 × 4000 pixels, that is 32 MB. Generally, two views, craniocaudal (CC, head to toe) and mediolateral oblique (MLO, shoulder to opposite hip), are taken of each of the breasts, giving 128 MB per patient per visit, approximately an order of magnitude greater than that from a contrast-enhanced MRI
Figure 41.1 A patient aged 61 years presented with a breast lump. Mammography reveals a 2 cm tumour and extensive microcalcifications, as indicated by the arrows. Diagnostically, this is straightforward.
examination. Note that the subtlety of clinical signs means that in practice only lossless image compression can be used. Medical images have poor SNR relative to good quality charge-coupled device (CCD) images (the latter nowadays have less than 1% noise, a factor of 5 to 10 better than most medical images). It is important to realise that there are distortions of many kinds in medical images. As well as high-frequency noise (that is rarely Gaussian), there are degrading effects, such as the 'bias field', a low-frequency distortion due to imperfections in the MRI receiver coil. Such a degradation of an image may appear subtle, and may be discounted by the (expert) human eye; but it can massively distort the results of automatic tissue classification and segmentation algorithms, and give wildly erroneous results for algorithms attempting quantitative analysis of an image.

Over the past fifteen years there has been substantial effort aimed at medical image analysis; the interested reader is referred to journals such as IEEE Transactions on Medical Imaging or Medical Image Analysis, as well as conference proceedings such as MICCAI (Medical Image Computing and Computer-Assisted Intervention). There has been particular effort expended upon image segmentation to detect regions of interest, shape analysis, motion analysis, and non-rigid registration of data, for example, from different patients. To be deployed in clinical practice, an algorithm has to work 24/7 with extremely high sensitivity and specificity. This is a tough specification to achieve even for images of relatively simple shapes and in cases for which the lighting and camera-subject pose can be controlled; it is doubly difficult for medical images, for which none of these simplifying considerations applies.
There is, in fact, a significant difference between image analysis that uses medical images to illustrate the performance of an algorithm, and medical image analysis, in which application-specific information is embedded in algorithms in order to meet the demanding performance specifications.
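The storage figures quoted earlier in this section can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope calculation: the function names are illustrative, and the one byte per MRI voxel is an assumption (it is not stated in the text, but it is the value that reproduces the quoted 18 MB).

```python
# Back-of-envelope storage estimates using the figures quoted in the text.
# Assumption (not stated in the text): one byte per MRI voxel, which is the
# value that reproduces the quoted total of 18 MB.

MIB = 1024 * 1024  # bytes in one binary megabyte


def mri_bytes(volumes=12, slices=24, rows=256, cols=256, bytes_per_voxel=1):
    """Raw size in bytes of one contrast-enhanced MRI examination."""
    return volumes * slices * rows * cols * bytes_per_voxel


def mammogram_bytes(rows=4000, cols=4000, bytes_per_pixel=2, views=4):
    """Raw size in bytes of one visit: CC and MLO views of each breast."""
    return rows * cols * bytes_per_pixel * views


if __name__ == "__main__":
    mri = mri_bytes()
    mammo = mammogram_bytes()
    print(f"MRI examination: {mri / MIB:.1f} binary MB")
    print(f"Mammography visit: {mammo / 1e6:.0f} decimal MB")
    print(f"Mammography / MRI ratio: {mammo / mri:.1f}")
```

Under these assumptions the ratio comes out somewhat below ten, which is consistent with the text's "approximately an order of magnitude" only in a loose sense.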
We noted in the previous section that clinicians often find themselves drowning in data. One potential solution is data fusion, the integration of diverse data sets in a single cohesive framework, which provides the clinician with information rather than data. For example, as we noted above, PET and SPECT can help identify the microvasculature grown by a tumour. However, the spatial resolution of PET is currently relatively poor (e.g. 3 to 8 mm voxels), too poor to be the basis for planning (say) radiotherapy. On the other hand, CT has excellent spatial resolution; but it does not show soft tissues such as grey matter, white matter, or a brain tumour. Data fusion relates information in the CT with that in the PET image, so that the clinician knows not only that there is a tumour but also where it is. Examples of data fusion can be found by visiting the Website: http://www.mirada-solutions.com

PACS systems have encouraged the adoption of standards in file format, particularly the DICOM (Digital Imaging and Communications in Medicine) standards. In principle, apart from the raw image data, DICOM specifies the patient identity and the time and place at which the image was taken, gives certain technical information (e.g. pulse sequence, acquisition time), specifies the region imaged, and gives information such as the number of slices, and so on. Such is the variety of imaging types and the rate of progress in the field that DICOM is currently an often frustrating, emerging set of standards.
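To make the data-fusion idea above concrete, the following is a deliberately minimal one-dimensional sketch. It assumes that the two data sets are already registered (they share an origin and orientation) and differ only in sample spacing; in practice, establishing that registration is the hard part and is not shown here. The function name `fuse_1d` and the sample values are hypothetical.

```python
def fuse_1d(ct, pet, ct_spacing_mm, pet_spacing_mm):
    """Pair each fine CT sample with the coarse PET sample that covers it.

    Assumes both profiles start at the same physical position and are
    already aligned; only the sample spacing differs.
    """
    fused = []
    for i, ct_value in enumerate(ct):
        # Physical position of the centre of CT sample i.
        position_mm = (i + 0.5) * ct_spacing_mm
        # Index of the PET sample containing that position (clamped).
        j = min(int(position_mm // pet_spacing_mm), len(pet) - 1)
        fused.append((ct_value, pet[j]))
    return fused


if __name__ == "__main__":
    ct_profile = [100, 102, 101, 99, 140, 142, 141, 139]  # 1 mm samples
    pet_profile = [5, 60]                                 # 4 mm samples
    for ct_value, pet_value in fuse_1d(ct_profile, pet_profile, 1.0, 4.0):
        print(ct_value, pet_value)
```

Each fused pair carries the fine anatomical position from the CT profile together with the functional reading from the PET profile, which is the sense in which fusion tells the clinician both that something is there and where it is.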
41.3 MAMMOGRAPHY

41.3.1 Breast cancer facts

Breast cancer is a major problem for public health in the Western world, where it is the most common cancer among women. In the European Community, for example, breast cancer represents 19% of cancer deaths and fully 24% of all cancer cases. It is diagnosed in a total of 348 000 cases annually in the United States and the European Community and kills almost 115 000 annually. Approximately 1 in 8 women will develop breast cancer during the course of their lives, and 1 in 28 will die of the disease. There were 900 000 new cases worldwide in 1997 (World Health Organization, 1997). Such grim statistics are now being replicated in eastern countries as diets and environment become more like their western counterparts. During the past sixty years, female death rates in the United States from breast cancer stayed remarkably constant while those from almost all other causes declined. The sole exception is lung cancer death rates, which increased sharply from 5 to 26 per 100 000. It is interesting to compare the figures for breast cancer with those for cervical cancer, for which mortality rates declined by 70% after the cervical smear gained widespread acceptance.

The earlier a tumour is detected the better the prognosis. A tumour that is detected when its size is just 0.5 cm has a favourable prognosis in about 99% of cases, since it is highly unlikely to have metastasized. Few women can detect a tumour by palpation (breast self-examination) when it is smaller than 1 cm, by which time (on average) the tumour will have been in the breast for 6 to 8 years. The five-year survival rate
for localized breast cancer is 97%; this drops to 77% if the cancer has spread by the time of diagnosis and to 22% if distant metastases are found (Journal of the National Cancer Institute). This is the clear rationale for screening, which is currently based entirely on X-ray mammography (though see below). The United Kingdom was the first country to develop a national screening programme, the UK Breast Screening Programme (BSP), which began in 1987. Several other countries have since established such programmes: Sweden, Finland, The Netherlands, Australia, and Ireland; France, Germany and Japan are now following suit. Currently, the BSP invites women between the ages of 50 and 64 for breast screening every three years. If a mammogram displays any suspicious signs, the woman is invited back to an assessment clinic where other views and other imaging modalities are utilized. Currently, 1.3 million women are screened annually in the United Kingdom. There are 92 screening centres with 230 radiologists, each radiologist reading on average 5000 cases per year, but some read up to 20 000.

The restriction of the BSP to women aged 50 and above stems from the fact that the breasts of pre-menopausal women, particularly younger women, are composed primarily of milk-bearing tissue that is calcium-rich; this milk-bearing tissue involutes to fat during the menopause, and fat is transparent to X rays. So, while a mammogram of a young woman appears like a white-out, the first signs of tumours can often be spotted in those of post-menopausal women. In essence, the BSP defines the menopause to be substantially complete by age 50! The UK programme resulted from the Government's acceptance of the report of the committee chaired by Sir Patrick Forrest.
The report was quite bullish about the effects of a screening programme: 'by the year 2000 the screening programme is expected to prevent about 25% of deaths from breast cancer in the population of women invited for screening . . . On average each of the women in whom breast cancer is prevented will live about 20 years more. Thus by the year 2000 the screening programme is expected to result in about 25 000 extra years of life gained annually in the UK.' To date, the BSP has screened more than eleven million women and has detected over 65 000 cancers. Research published in the BMJ in September 2000 demonstrated that the National Health Service (NHS) Breast Screening Programme is saving at least 300 lives per year. The figure is set to rise to 1250 by 2010. More precisely, Moss (British Medical Journal 16/9/2000) demonstrated that the NHS Breast Screening Programme, begun in 1987, resulted in substantial reductions in mortality from breast cancer by 1998. In 1998, mortality was reduced by an average of 14.9% in those aged 50 to 54 and 75 to 79, a reduction attributed to treatment improvements. In the age groups also affected by screening (55 to 69), the reduction in mortality was 21.3%. Hence, the estimated direct contribution from screening was 6.4%. Recent studies suggest that the rate at which interval cancers appear between successive screening rounds is turning out to be considerably larger than predicted in the Forrest Report. Increasingly, there are calls for mammograms to be taken every two years and for both a CC and MLO image to be taken of each breast.
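The 6.4% figure above is simply the difference between the two reported mortality reductions. The short sketch below spells out that subtraction; the function name is illustrative, and treating the reduction in the unscreened age groups as the treatment-only baseline is the reading implied by the text rather than a statement of the original study's method.

```python
def screening_contribution(reduction_screened_pct, reduction_baseline_pct):
    """Reduction attributable to screening, in percentage points.

    The baseline is the reduction seen in age groups not affected by
    screening, which the text attributes to treatment improvements.
    """
    return round(reduction_screened_pct - reduction_baseline_pct, 1)


if __name__ == "__main__":
    # 21.3% in ages 55 to 69 (screened), 14.9% in ages 50 to 54 and 75 to 79.
    print(screening_contribution(21.3, 14.9))
```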
Currently, some 26 million women are screened in the United States annually (approximately 55 million worldwide). In the United States there are 10 000 mammography-accredited units. Of these, 39% are community and/or public hospitals, 26% are private radiology practices, and 13% are private hospitals. Though there are 10 000 mammography centres, there are only 2500 mammography-specific radiologists; there is a worldwide shortage of radiologists and radiologic technologists (the term in the United Kingdom is radiographers). Huge numbers of mammograms are still read by non-specialists, contravening recommended practice, with average throughput rates between 5 and 100 per hour. Whereas expert radiologists have cancer detection rates of 76 to 84%, generalists have rates that vary from 8 to 98% (with varying numbers of false positives). The number of cancers that are deemed to be visible in retrospect, that is, when the outcome is known, approaches 70% (American Journal of Roentgenology 1993). Staff shortages in mammography seem to stem from the perception that it is 'boring but risky': as we noted earlier, 12% of all malpractice lawsuits in the United States are against radiologists, with the failure to diagnose breast cancer becoming one of the leading reasons for malpractice litigation (AJR 1997 and Clark 1992). The shortage of radiologists is driving the development of specialist centres and technologies (computers) that aspire to replicate their skills. Screening environments are ideally suited to computers, as they are repetitive and require objective measurements.

As we have noted, screening has already produced encouraging results. However, there is much room for improvement. For example, it is estimated that a staggering 25% of cancers are missed at screening.
It has been demonstrated empirically that double reading greatly improves screening results; but this is too expensive and in any case there are too few screening radiologists. Indeed, recall rates drop by 15% when using 2 views of each breast (British Medical Journal, 1999). Double reading of screening mammograms has been shown to halve the number of cancers missed. However, a study at Yale of board-certified radiologists showed that they disagreed 25% of the time about whether a biopsy was warranted and 19% of the time in assigning patients to 1 of 5 diagnostic categories. Recently, it has been demonstrated that single screening plus the use of computer-aided diagnosis (CAD) tools (image analysis algorithms that aim to detect microcalcifications and small tumours) also greatly improves screening effectiveness, perhaps by as much as 20%.

Post-screening, the patient may be assessed by other modalities such as palpation, ultrasound and, increasingly, MRI. Between 5 and 10% of those screened have this extended 'work-up'. Post work-up, around 5% of patients have a biopsy. In light of the number of tumours that are missed at screening (which reflects the complexity of diagnosing the disease from a mammogram), it is not surprising that clinicians err on the side of caution and order a large number of biopsies. In the United States, for example, there are over one million biopsies performed each year: a staggering 80% of these reveal benign (non-cancerous) disease. It has been reported that, between screenings, 22% of previously taken mammograms are unavailable or difficult to find, mostly because they have been misfiled in large film archives, and that 50% were obtained only after major effort (Bassett et al., American Journal of Roentgenology, 1997). Lost films are a daily headache for radiologists around the world.
41.3.2 Mammographic images and standard mammogram form (SMF)

Figure 41.2 is a schematic of the formation of a (film-screen) mammogram. A collimated beam of X rays passes through the breast, which is compressed (typically to a force of 14 N) between two Lucite plates. The X-ray photons that emerge from the lower plate pass through the film before being converted to light photons, which then expose the film, which is subsequently scanned (i.e. converted to electrons) at a resolution (typically) of 50 µm. In the case of full-field digital mammography, the X-ray photons are converted directly to electrons by an amorphous silicon sensor that replaces the film screen. As Figure 41.2 also shows, a part of the X-ray flux passes in a straight line through the breast, losing a proportion of less energetic photons en route as they are attenuated by the tissue that is encountered. The remaining X-ray photon flux is scattered and arrives at the sensor surface from many directions (which are, in practice, reduced by an anti-scatter grid, which has the side-effect of approximately doubling the exposure of the breast). Full details of the physics of image acquisition, including many of the distorting effects, and the way in which image analysis algorithms can be developed to undo these distortions, are presented in Highnam and Brady [3].

For the purposes of this article, it suffices to note that though radiologic technologists are well trained, the control over image formation is intrinsically weak. This is illustrated in Figure 41.3, which shows the same breast imaged with two different exposure times. The images appear very different. There are many parameters p that affect the appearance of a mammogram, including: tube voltage, film type, exposure time, and placement of an automatic exposure control. If these were to vary freely for the same compressed breast, there would be huge variation in image brightness and contrast.
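The dependence of image brightness on acquisition parameters can be illustrated with a very crude model. The sketch below uses the Beer-Lambert law for a single photon energy and ignores scatter, glare and film response entirely, so it is not the physics model of Highnam and Brady [3]; the attenuation coefficients and exposure values are assumed numbers chosen only for illustration.

```python
import math

# Illustrative linear attenuation coefficients in 1/cm. These are assumed
# values chosen only to show the effect; they are not taken from the text.
MU_FAT = 0.5
MU_NON_FAT = 0.8


def transmitted(exposure, non_fat_cm, fat_cm):
    """Primary signal reaching the detector under the Beer-Lambert law.

    Monoenergetic and scatter-free: a strong simplification of real
    mammographic image formation.
    """
    return exposure * math.exp(-(MU_NON_FAT * non_fat_cm + MU_FAT * fat_cm))


if __name__ == "__main__":
    tissue_column = (3.0, 3.0)  # cm of non-fat tissue and of fat
    for exposure in (1.0, 2.0):  # two exposure settings, arbitrary units
        print(exposure, transmitted(exposure, *tissue_column))
```

Even in this toy model, the same column of tissue yields a different detector signal for each exposure setting, so any algorithm that applies a fixed brightness threshold would respond to the acquisition settings rather than to the anatomy.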
Of course, it would be ethically unacceptable to perform that experiment on a living breast: the accumulated radiation dose would be far too high. However, it is possible to develop a mathematical model of the formation of a mammogram, for example, the Highnam-Brady physics model. With such a model in hand, the variation in image appearance can be simulated. This is the basis of the teaching system VirtualMammo developed by Mirada Solutions Limited in association with the American Society of Radiologic Technologists (ASRT).

Figure 41.2 Schematic of the formation of a mammogram. [Labels in the figure: Primary; X-ray target; Collimator; Compression plate; Scattered photon; Glare; Film screen cassette, anti-scatter grid, and intensifier.]

Figure 41.3 Both sets of images are of the same pair of breasts, but the left pair is scanned with a shorter exposure time than the right pair, an event that can easily happen in mammography. Image processing algorithms that search for 'bright spots' will be unable to deal with such changes.

The relatively weak control on image formation, coupled with the huge change in image appearance, at which Figure 41.3 can only hint, severely limits the usefulness of the (huge) databases that are being constructed: images submitted to the database may tell more about the competence of the technologists who took the image, or the state of the equipment on which the image was formed, than about the patient anatomy/physiology, which is the reason for constructing the database in the first place! It is precisely this problem that the eDiamond project aims to address.

In the course of developing an algorithm to estimate, and correct for, the scattered radiation shown in Figure 41.2, Highnam and Brady [3] made an unexpected discovery: it is possible to estimate, accurately, the amount of non-fat tissue in each pixel column of the mammogram. More precisely, first note that the X-ray attenuation coefficients of normal, healthy tissue and cancerous tissue are very nearly equal, but are quite different from that of fat. Fat is clinically uninteresting, so normal healthy and cancerous tissues are collectively referred to as 'interesting'. Highnam and Brady's method estimates, in millimetres, the amount of interesting tissue in each pixel column, as is illustrated in Figure 41.4. The critical point to note is that the interesting tissue representation refers only to (projected) anatomical structures: the algorithm has estimated and eliminated the particular parameters p(I) that were used to form this image I. In short, the image can be regarded as standardised.
Images in standardised form can be included in a database without the confounding effect of the (mostly irrelevant; see below) image formation parameters. This greatly increases the utility of that database. Note also that the interesting tissue representation is quantitative: measurements are in millimetres, not in arbitrary contrast units that have no absolute meaning.
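Why a thickness of interesting tissue is independent of the acquisition settings can be seen in a toy model. The sketch below is not Highnam and Brady's algorithm: it assumes a single photon energy, no scatter, a known incident signal and a known plate separation, and it uses assumed attenuation coefficients. Under those assumptions the Beer-Lambert law can be inverted in closed form, since the fat thickness is the plate separation minus the interesting-tissue thickness.

```python
import math

# Assumed linear attenuation coefficients in 1/cm (illustrative only).
MU_FAT = 0.5
MU_INTERESTING = 0.8


def detected_signal(incident, h_int_cm, plate_separation_cm):
    """Forward model: signal behind a column of interesting tissue and fat."""
    h_fat_cm = plate_separation_cm - h_int_cm
    return incident * math.exp(-(MU_INTERESTING * h_int_cm + MU_FAT * h_fat_cm))


def interesting_thickness(incident, detected, plate_separation_cm):
    """Invert the forward model for the interesting-tissue thickness in cm.

    ln(incident / detected) = MU_INTERESTING * h + MU_FAT * (H - h),
    so h = (ln(incident / detected) - MU_FAT * H) / (MU_INTERESTING - MU_FAT).
    """
    total_attenuation = math.log(incident / detected)
    return (total_attenuation - MU_FAT * plate_separation_cm) / (
        MU_INTERESTING - MU_FAT
    )


if __name__ == "__main__":
    for incident in (1.0, 3.0):  # two different exposures of the same column
        signal = detected_signal(incident, 4.75, 6.5)
        print(signal, interesting_thickness(incident, signal, 6.5))
```

The detected signals differ between the two exposures, but the recovered thickness is the same, and it is expressed in physical units rather than in arbitrary grey levels.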
Figure 41.4 The 'interesting tissue' representation developed by Highnam and Brady [3]. Tumours and normal glandular tissue together form the 'interesting' tissue class. The algorithm estimates, for each column, the amount of interesting tissue. If, for example, the separation between the Lucite plates is 6.5 cm, the amount of interesting tissue at a location (x,y) might be 4.75 cm, implying 1.75 cm of fat. [Labels in the figure: Tumour; Glandular tissue; Fatty tissue; Compression plates; 1 cm; Hint 1.0 cm.]

Figures 41.5 and 41.6 show two different depictions of the interesting tissue representation: one as a surface, one as a standardised image. The information content of these two depictions is precisely the same; whether one chooses to work with a surface depiction, which is useful in some cases, or with an image depiction, which is useful in others, depends only upon the particular application and the user's preference. The information that is recorded in the database is the same in both cases, and it is freed of the confounding effects of weakly controlled image formation.

It should now be clear why the first eDiamond database has been based on mammography: (1) there are compelling social and healthcare reasons for choosing breast cancer and (2) the interesting tissue representation provides a standardisation that is currently almost unique in medical imaging. Larry Clarke, Chief of Biomedical Imaging at the National Cancer Institute in Washington DC, recently said: 'I believe standardisation is crucial to biomedical image processing.'

41.4 MEDICAL DATABASES

Medical image databases represent both huge challenges and huge opportunities. The challenges are many. First, as we have noted earlier, medical images tend to be large, variable across populations, contain subtle clinical signs, have a requirement for lossless compression, require extremely fast access, have variable quality, and have privacy as
934
MICHAEL BRADY ET AL.
(a)
(b)
Figure 41.5 Depicting the interesting tissue representation as a surface. In this case, the amounts of interesting tissue shown in Figure 41.4 are regarded as heights (b), encouraging analyses of the surface using, for example, local differential geometry to estimate the characteristic `slopes' of tumours. This depiction is called the hint image.
(a)
(b)
Figure 41.6 Depicting the interesting tissue representation as an image. In this case a standard set of imaging parameters p(S) are chosen and a fresh image is formed (b). Note that this is fundamentally different from applying an enhancement algorithm such as histogram equalisation, whose result would be unpredictable. This depiction is called the standard mammogram form SMF .
eDIAMOND: A GRID-ENABLED FEDERATED DATABASE OF ANNOTATED MAMMOGRAMS
935
a major concern. More precisely, medical images often involve 3D+t image sequences taken with multiple imaging protocols. Imaging information aside, metadata concerning personal and clinical information is usually needed: age, gender, prior episodes, general health, incidence of the disease in close relatives, disease status, and so on. Medical records, including images, are subject to strict ethical, legal and clinical protocols that govern what information can be used for what purposes. More prosaically, there are practical problems of multiple, ill-defined data formats, particularly for images. All of this means that the data is difficult to manage, and is often stored in a form in which it is only useful to individuals who know, independently, what it is, how it was acquired, from whom, and with what consent. In addition to these challenges are the facts that inter- and even intra-variability amongst clinicians is often 30 to 35%, and that the ground truth of diagnosis is hard to come by. These issues are becoming a major issue in view of the RAPID GROWTH of on-line storage of medical records, particularly images, and particularly in the United States. (In the United States, each individual owns their records, as is natural in a society that enables individuals to seek second and third opinions, and to approach medical specialists directly.) In the United Kingdom, by way of contrast, the NHS decrees that medical records are owned by the individual's clinician (primary care or specialist), and there are constraints on changing clinicians and on seeking second opinions. This situation is in a state of flux as healthcare provision is being increasingly privatised, complicating even further the development of a data archive that is designed to have lasting value. 
As we noted in the previous section, acquiring any kind of medical image involves setting a large number of parameters, which, as often as not, reflect the individual preferences of the clinician and whose effects are confounded in the appearance of the image. The practical impact is that, even at large medical centres, the number of images gathered over any given time interval (say, a year) is usually insufficient to avoid statistical biases such as clinician preference. In practice, this may be one of the most important contributions of the Grid: to facilitate epidemiological studies using numbers of images that have sufficient statistical power to overcome biases. In a federated database, for example, one might develop a `leave-one-out' protocol to test each contributing institution against the pooled set of all other institutions. A variant of the leave-one-out methodology has, in principle, the potential for automatic monitoring of other variations between institutions, such as quality control. This is currently being studied in a complementary European project, Mammogrid, which also involves Oxford University and Mirada Solutions.

Section 41.3 also pointed out that in the United States there are currently far fewer mammography specialist radiologists (2500) than there are mammogram readers (25 000) or mammography machines (15 000). This has re-awakened the idea of teleradiology: shipping at least the problematical images to centres of expertise, an idea that demands the bandwidth of the Grid. In a related vein, clinicians and technologists alike in the United States are subject to strict continuing education (CE) requirements. Mirada Solutions' VirtualMammo is the first of what is sure to be a stream of computer-based teaching and CE credit systems for which answers can be submitted over the net and marked remotely. The ways in which a large database of the type that eDiamond envisages can be used to further teaching (for technologists, medical physicists, and radiologists alike) will be a major part of the scientific development of the eDiamond project.

Finally, there is a whole range of diagnostic uses for a huge database such as the one eDiamond is building. These range from training classification algorithms on instances for which the `ground truth' (i.e. diagnosis, confirmed or not by pathology) is known, through to data mining applications. To this latter end, Mirada Solutions has developed a data mining package called FindOneLikeIt, in which the clinician identifies a region of interest in a mammogram and the data mining system trawls the database to find the 10 most similar image fragments, together with associated metadata including diagnoses. The current system has been developed for a static database comprising 20 000 images; the challenge for eDiamond is to further develop both the algorithm and the database search when the database is growing rapidly (so that the statistical weight associated with a particular feature is changing rapidly).
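The `leave-one-out' protocol suggested earlier in this section for a federated database can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by eDiamond or Mammogrid: the site names, the per-image measurements, and the choice of a z-score as the comparison statistic are all hypothetical.

```python
from statistics import mean, stdev


def leave_one_out(measurements_by_site):
    """Compare each site with the pooled data of all the other sites.

    measurements_by_site maps a (hypothetical) site name to a list of
    per-image measurements, for example a quality score.
    Returns, per site, (site mean - pooled mean) / pooled standard deviation.
    """
    scores = {}
    for site, own in measurements_by_site.items():
        # Pool every measurement that did NOT come from the site under test.
        pooled = [
            value
            for other, values in measurements_by_site.items()
            if other != site
            for value in values
        ]
        scores[site] = (mean(own) - mean(pooled)) / stdev(pooled)
    return scores


# Illustrative numbers only; they are not taken from any real archive.
sites = {
    "site_a": [10.0, 11.0, 9.0, 10.0],
    "site_b": [10.5, 9.5, 10.0, 10.0],
    "site_c": [14.0, 15.0, 13.0, 14.0],
}
z = leave_one_out(sites)
outlier = max(z, key=lambda name: abs(z[name]))
```

With these made-up values, the site whose measurements sit furthest from the pool of the others is flagged as the outlier. A real study would need a statistic suited to the measurement in question and far larger samples.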
41.5 eDIAMOND

41.5.1 Introduction

The eDiamond project is designed to deliver the following major benefits, initially for mammography in the United Kingdom, but eventually more widely, both geographically and for other image modalities and diseases. First, the project aims to use the archive to evaluate innovative software based on the SMF standardisation process to compute the quality of each mammogram as it is sent to the archive. This will ensure that any centre feeding into the archive will be automatically checked for quality. A result of the eDiamond project will be an understanding of automated methods, based on standardisation, for assessing quality control. Second, the eDiamond archive will provide a huge teaching and training resource. Third, radiologists faced with difficult cases will be able to use the archive as a resource for comparing those cases with previous cases, both benign and malignant. This has the potential to reduce the number of biopsies for benign disease, thus reducing trauma and cost to the NHS. Fourth, it is intended that eDiamond will be able to provide a huge resource for epidemiological studies such as those relating to hormone replacement therapy (HRT) [4], to breast cancer risk, and to parenchymal patterns in mammograms [5]. Finally, as regards computer-aided detection (CADe), eDiamond contributes to the development of a technology that appears able to increase detection rates, and thus save lives.

The project offers the following stakeholder benefits:

· Patients will benefit from secure storage of films, better and faster patient record access, better opinions, and a lower likelihood of requiring a biopsy. They will begin to `own' their mammography medical records.
· Radiologists will benefit from computer assistance, massively reduced storage space requirements, instant access to mammograms without the risk of their being lost, improved early diagnosis (because of improved image quality), and greater all-round efficiency leading to a reduction in waiting time for assessment. Radiologists will also benefit by applying data mining technology to the database to seek out images that are `similar' to the one under consideration, and for which the diagnosis is already known.
· Administrative staff, although their initial workload will increase, will subsequently benefit from significantly faster image archiving and retrieval.
· Hospital Trust managers will benefit from the overall reduced cost of providing a better quality service.
· Hospital IT managers will benefit from the greatly reduced burden on their already over-stretched resources. The national consolidation in storage implied by eDiamond promises to reduce their costs drastically through reduced equipment and staffing requirements and support contracts.
· Researchers will benefit, as the eDiamond database, together with associated computing tools, will provide an unparalleled resource for epidemiological studies based on images.
· The Government will benefit, as it is intended that eDiamond will be the basis of an improved service at greatly reduced cost. Furthermore, eDiamond will be a pilot and proving ground for other image-based databases and other areas where secure on-line data storage is important.
41.5.2 e-Science challenges

This project requires that several generic e-Science challenges be addressed both by leveraging existing Grid technology and by developing novel middleware solutions. Key issues include the development of each of the following:

· Ontologies and metadata for the description of demographic data, the physics underpinning the imaging process, key features within images, and relevant clinical information.
· Large, federated databases both of metadata and images.
· Data compression and transfer.
· Effective ways of combining Grid-enabled databases of information that must be protected and which will be based in hospitals that are firewall-protected.
· Very rapid data mining techniques.
· A secure Grid infrastructure for use within a clinical environment.
41.5.3 Objectives

It is currently planned to construct a large federated database of annotated mammograms at St George's and Guy's Hospitals in London, the John Radcliffe Hospital in Oxford, and the Breast Screening Centres in Edinburgh and Glasgow. All mammograms entering the database will be SMF standardised prior to storage to ensure database consistency and reliable image processing. Applications for teaching, aiding detection, and aiding diagnosis will be developed.

There are three main objectives to the initial phase of the project: the development of the Grid technology infrastructure to support federated databases of huge images (and related information) within a secure environment; the design and construction of the Grid-connected workstation and database of standardised images; and the development, testing and validation of the system on a set of important applications. We consider each of these in turn.
41.5.3.1 Development of the Grid infrastructure

There are a number of aspects to the development of a Grid infrastructure for eDiamond. The first such aspect is security: ensuring secure file transfer and tackling the security issues involved in having patient records stored on-line, allowing access to authorised persons and also, potentially, to the patients themselves at some time in the future, are all key issues. The design and implementation of Grid-enabled federated databases for both the metadata and the images is another such aspect, as is the issue of data transfer: typically, each image is 30 Mb, or 120 Mb for a set of 4 images, which is the usual number for a complete case. Issues here revolve around lossless data compression and very rapid and secure file transfer. Data mining issues revolve around speed of access: we aim for eDiamond to return the ten most similar cases within 8 to 10 s. Teaching tools that test students by production of `random' cases will need to work at the speed of current breast screening, which means the next case will need to be displayed within seconds after the `next' button is hit.
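To make the `ten most similar cases' target concrete, the following Python sketch retrieves the k nearest cases to a query by a brute-force scan. It is an assumption-laden illustration, not the FindOneLikeIt algorithm: the case identifiers, the two-element feature vectors, and the use of Euclidean distance are all hypothetical.

```python
import heapq
import math


def most_similar(query, cases, k=10):
    """Return the ids of the k cases whose feature vectors are nearest to query.

    cases maps a case id to a feature vector (a list of floats).
    Euclidean distance is an assumption made purely for illustration.
    """
    def distance(case_id):
        return math.dist(query, cases[case_id])

    # nsmallest keeps only the k best candidates while scanning every case once.
    return heapq.nsmallest(k, cases, key=distance)


# A toy archive of 100 hypothetical cases with made-up features.
cases = {f"case_{i}": [float(i), float(i % 3)] for i in range(100)}
top = most_similar([5.0, 2.0], cases, k=3)
```

A linear scan of this kind touches every stored case, so its cost grows with the size of the archive; meeting a fixed response time on a rapidly growing database would presumably call for some form of indexing, which this sketch does not attempt.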
41.5.3.2 Database construction and data mining

Ontologies are being developed for description of patient and demographic data, together with descriptions of the image parameters and of features within images. Database design is tailored to the needs of rapid search and retrieval of images, and the database is being built within the DB2 framework. The needs of future epidemiological studies and information related to high risk, such as family history and known breast-cancer-causing mutations, are also being incorporated. Some prototype data mining tools have already been developed within the context of a single database. Again, speed of access is crucial: the database architecture is being determined according to the frequency of requests for particular image types.
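As a rough illustration of the kind of record such ontologies might describe, the sketch below groups patient and demographic data, image formation parameters, and image features into Python dataclasses. Every class and field name here is hypothetical; the actual eDiamond ontologies and DB2 schema are not given in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImagingParameters:
    """Image formation settings (hypothetical field names)."""
    tube_voltage_kvp: float
    exposure_mas: float
    compressed_thickness_cm: float


@dataclass
class ImageFeature:
    """A feature marked within an image (hypothetical field names)."""
    kind: str  # e.g. "mass" or "microcalcification cluster"
    x: int
    y: int


@dataclass
class MammogramRecord:
    """One annotated mammogram with its associated metadata."""
    patient_pseudonym: str
    age_years: int
    view: str  # e.g. "CC" or "MLO"
    imaging: ImagingParameters
    family_history: bool = False
    known_mutation: Optional[str] = None
    features: List[ImageFeature] = field(default_factory=list)
    diagnosis: Optional[str] = None


# Illustrative values only.
record = MammogramRecord(
    patient_pseudonym="p-0001",
    age_years=57,
    view="MLO",
    imaging=ImagingParameters(28.0, 80.0, 6.5),
    family_history=True,
)
record.features.append(ImageFeature("mass", 412, 1033))
```

Keeping the image formation parameters in their own structure mirrors the separation the text draws between clinical information and the physics of the imaging process.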
41.5.3.3 System testing and validation

Each of the clinical partners is being provided with a specialised breast imaging workstation to allow access to the Grid-enabled system. This will allow validation of all aspects of the development process, with the following aspects being focused upon: teaching and education, which will include the ability to undertake random testing from the database; epidemiological studies, which will involve the analysis of breast patterns using various quantification techniques; and diagnosis, which will use the database to study the benefits of computer-aided diagnosis.

41.5.4 Project structure

The overall structure of the eDiamond architecture is illustrated below.
[Figure: the eDiamond architecture. Labels recovered from the diagram: Server Complex (firewall, security server, Web services/WebSphere server, file servers and DB2 server with SAN-attached disks, SAN switch, tape library forming the backup storage hierarchy, Gigabit Ethernet); local location co-located with the Server Complex (staging server, workstation); satellite locations, e.g. Edinburgh, St George's, Guy's, John Radcliffe (staging server and workstation on 100 Mbps Ethernet, behind a firewall); development location (development server, development PCs and workstation on 100 Mbps Ethernet, behind a firewall); movable test/demo equipment; test/demo equipment at IBM, connected through a firewall and VPN.]
The system consists of a number of different types of location: the server complex, the satellite locations, the development locations, and the local workstation locations. We consider each in turn.

In the eDiamond project, IBM Hursley will provide technical consultancy in areas such as understanding user scenarios; architecture and design; the use of products such as WebSphere Application Server (WAS) and DB2; security and privacy (including encryption and watermarking); and optimisation for performance. In addition, IBM will provide selected software by means of the IBM Scholars Program, including IBM DB2 Universal Database, IBM DiscoveryLink, IBM WebSphere Application Server, MQSeries Technology Release, and Java and XML tools on Linux and other platforms. Finally, IBM will provide access to IBM Research conferences and papers relevant to this project. A sister project being conducted with Dr Steve Heisig (IBM Watson Lab, New York), looking at the generic problem of Workload Management in a Grid environment, is the subject of a separate (submitted) application to the DTI from the Oxford e-Science Centre (OeSC). This project will use the eDiamond project as its primary application test bed.

The server complex will be relatively conventional, and will consist of: a high-speed switched network; a connection to the Wide Area Network (WAN) through which the remote locations connect to the server complex; a firewall; two file servers with disks attached via a Storage Area Network (SAN); two database servers with the database contents stored on SAN-attached disks; a small automatic tape library; a security server; and a Web Services/WebSphere Application Server machine.

The satellite locations are hospital and/or teaching locations where data is created for loading onto the system, and also where authorised users are able to access the system. Each will have a standard system user workstation and a `staging server' connected to each other and to the external link via a 100 Mbps Ethernet network. Consideration should be given to making this internal network Gigabit Ethernet as well, to maximise the performance of image loading. The staging server is included in the configuration to allow for the pre-loading of images and to avoid delays that might occur if all images had to be retrieved over the WAN.

The local workstation location comprises a workstation and staging server, identical to those deployed at the satellite locations, co-located with the server complex, that is, on the same high-speed network. The purpose is to explore the performance characteristics of accessing image files when the network performance is as good as possible.
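The case for a faster internal network and for pre-loading images on a staging server can be seen from a back-of-envelope calculation. The Python sketch below assumes that the `Mb' image sizes quoted in Section 41.5.3.1 denote megabytes, and that the link delivers its full nominal rate with no protocol overhead; both are assumptions, so the results are best-case figures only.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time: megabytes -> megabits, divided by the link rate."""
    return size_megabytes * 8 / link_megabits_per_second


# A complete case: four images of 30 Mb each, read here as megabytes (assumption).
CASE_MEGABYTES = 4 * 30

fast_ethernet_s = transfer_seconds(CASE_MEGABYTES, 100)   # 100 Mbps Ethernet
gigabit_s = transfer_seconds(CASE_MEGABYTES, 1000)        # Gigabit Ethernet
```

Under these assumptions a complete case takes about 9.6 s on a 100 Mbps link and just under 1 s on Gigabit Ethernet, which is why a teaching tool that must show the next case within seconds benefits from images already being held locally.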
41.6 RELATED PROJECTS

The eDiamond project overlaps (in most cases, deliberately) with several other grid projects, particularly in mammography, and more generally in medical image analysis. First, the project has strong links to the US NDMA project, which is exploring the use of Grid technology to enable a database of directly digitised (as opposed to film-screen) mammograms. IBM is also the main industrial partner for the US NDMA project, and has provided a Shared University Research (SUR) grant to create the NDMA Grid under the leadership of the University of Pennsylvania. Now in Phase II of deployment, the project connects hospitals in Pennsylvania, Chicago, North Carolina, and Toronto. The architecture of the NDMA Grid leverages the strengths of IBM eServer clusters, running AIX and Linux, with open protocols from Globus. The data volumes will exceed 5 petabytes per year, with network traffic at 28 terabytes per day. In addition, privacy mandates that all image transmissions and information concerning patient data be encrypted across secure public networks. Teams from the University of Pennsylvania and IBM have worked together to implement a prototype fast-access, very large capacity DB2 Universal Database to serve as the secure, highly available index to the digitised X-ray data. Operation of this system is enhanced by DB2 parallel technology that is capable of providing multi-gigabyte performance. This technology also enables scalable performance on large databases by breaking the processing into separate execution components that can be run concurrently on multiple processors. The eDiamond project will be able to draw on the considerable body of practical experience gained in the development of the NDMA Grid. It is expected that eDiamond and NDMA will collaborate increasingly closely.
A critical difference between eDiamond and the NDMA project will be that eDiamond will use standardisation techniques prior to image storage in the database.

Second, there is a complementary European project, Mammogrid, which also involves Oxford University and Mirada Solutions, together with CERN, Cambridge University, the University of the West of England, and Udine. As noted in the previous section, Mammogrid will concentrate on three different applications: the use of a federated database for quality control, for training and testing a system to detect microcalcification clusters, and to initiate work on using the grid to support epidemiological studies.

Third, we are just beginning work on a project entitled `Grid-Enabled Knowledge Services: Collaborative Problem Solving Environments in Medical Informatics'. This is a programme of work between the UK Inter-disciplinary Research Consortium (IRC) entitled MIAS [footnote 3] (from Medical Images and Signals to Clinical Information) and one of the other computer-centric IRCs, the Advanced Knowledge Technologies (AKT). The domain of application is collaborative medical problem solving using knowledge services provided via the e-Science Grid infrastructure. In particular, the initial focus will be Triple Assessment in symptomatic focal breast disease. The domain has been chosen because it has a number of characteristics that make it especially valuable as a common focus for the IRC e-Science research agenda and because it is typical of medical multi-disciplinary team-based evidential reasoning about images, signals, and patient data. Triple assessment is a loosely regulated cooperative decision-making process that involves radiologists, oncologists, pathologists, and breast surgeons. The aim is to determine, often rapidly, the most appropriate management course for a patient: chemotherapy, neo-adjuvant chemotherapy, exploratory biopsy, surgery (at varying levels of severity), or continued surveillance with no treatment.

The vision of the Grid that is taking shape in the UK e-Science community consists of a tier of services ranging across the computational fabric, information and data management, and the use of information in particular problem solving contexts and other knowledge intensive tasks. This project regards the e-Science infrastructure as a set of services that are provided by particular institutions for consumption by others, and, as such, it adopts a service-oriented view of the Grid. Moreover, this view is based upon the notion of various entities providing services to one another under various forms of contract, and it provides one of the main research themes being investigated: agent-oriented delivery of knowledge services on the Grid.
The project aims to extend the research ambitions of the AKT and MIAS IRCs to include researching the provision of AKT and MIS technologies in a Grid Services context. Although this work focuses on a medical application, the majority of the research has generic applicability to many e-Science areas. The project aims at the use of the Grid to solve a pressing, and typical, medical problem rather than seeking primarily to develop the Grid architecture and software base. However, it seeks to provide information enrichment and knowledge services that run in a Grid environment, and so demonstrate applications at the information and knowledge tiers.

The project will complement other projects under development within the Oxford e-Science Centre, and also within the UK e-Science community more generally. Within Oxford, several projects involve the building and interrogation of large databases, whilst others involve methods of interrogating video images, and large-scale data compression for visualisation. Within the wider UK community, Grid-enabling database technologies is a fundamental component of several of the EPSRC Pilot Projects, and is anticipated to be one of the United Kingdom's primary contributions to the proposed Open Grid Services Architecture (OGSA). In particular, the MIAS IRC is also developing a Dynamic Brain Atlas in which a set of features (age, sex, medical history) is determined by a clinician to be relevant to a given patient, and a patient-specific atlas is created from those images in a database of brain images that match the features (see References 6 and 7 for more details).

IBM is involved fully in this work, and is participating actively in the development of Data Access and Integration (DAI) for relational and XML databases (the OGSA-DAI project) that can be used in conjunction with future releases of Globus, including OGSA features. Additionally, IBM's strategic database, DB2, provides XML and Web services support within the product, and it is anticipated that this support will evolve to incorporate any significant developments in the evolution of OGSA. IBM is also actively involved in the European DataGrid Project, which will provide middleware to enable next-generation exploitation of datasets of the order of petabytes. Some of the technologies being developed in that project are relevant to this project; overlap between these projects will help drive the integration of the range of different Grid technologies available today.

Footnote 3: MIAS is directed by Michael Brady and includes the Universities of Oxford, Manchester, King's College, London, University College, London, and Imperial College of Science, Technology and Medicine, London. It is supported by EPSRC and MRC.
41.7 CONCLUSIONS

eDiamond is an initial exploration of developing a grid-enabled, federated database to support clinical practice throughout the United Kingdom. One of its key innovations is that images will be standardised, removing as much as possible of the irrelevant image formation information that is confounded in the appearance of the image. Of course, eDiamond is only a start, both for medical image databases in general and even for the construction of a federated database of breast cancer images, which will, in the future, need to incorporate MRI, PET, ultrasound and other image types. This in turn will necessitate non-rigid registration of datasets and, perhaps, the development of an appropriate coordinate frame for the breast to complement those such as the Talairach atlas for the brain. Finally, clinical diagnosis of breast cancer relies as much on non-image data as on images: individual records about a patient's medical history, diet, lifestyle, and the incidence of breast cancer in the patient's family, workplace, and region. There has been excellent work done on the use of such non-image data, but it remains a significant challenge to integrate image and non-image data, as is done effortlessly by the skilled clinician. Plainly, there is much to do.
ACKNOWLEDGEMENTS

This project would not have happened without the vision, drive and support of Tony Hey, Anne Trefethen, and Ray Browne. Paul Jeffreys of the Oxford e-Science Centre continues to be a strong supporter, and we acknowledge a debt to the drive of Roberto Amendolia, who leads the Mammogrid project. Numerous other colleagues in the UK's IRCs, particularly David Hawkes, Derek Hill, Nigel Shadbolt, and Chris Taylor, have had a major influence on the project. Finally, the project could not have taken place without the support of the Engineering and Physical Sciences Research Council and the Medical Research Council. JMB acknowledges many fruitful conversations with Paul Taylor, John Fox, and Andrew Todd-Pokropek of UCL.
REFERENCES

1. Westbrook, C. and Kaut, C. (1998) MRI in Practice. Cambridge, MA: Blackwell Science.
2. Armitage, P. A., Behrenbruch, C. P., Brady, M. and Moore, N. (2002) Extracting and visualizing physiological parameters using dynamic contrast-enhanced magnetic resonance imaging of the breast. IEEE Transactions on Medical Imaging.
3. Highnam, R. and Brady, M. (1999) Mammographic Image Analysis. Kluwer.
4. Marias, K., Highnam, R., Brady, M., Parbhoo, S. and Seifalian, A. (2002) Assessing the role of quantitative analysis of mammograms in describing breast density changes in women using HRT, To appear in Proc. Int. Workshop on Digital Mammography. Springer-Verlag.
5. Marias, K., Petroudi, S., English, R., Adams, R. and Brady, M. (2002) Subjective and computer-based characterisation of mammographic patterns, To appear in Proc. Int. Workshop on Digital Mammography. Springer-Verlag.
6. Hartkens, T., Hill, D. L. G., Hajnal, J. V., Rueckert, D., Smith, S. M. and McLeish, K. (2002) Dynamic brain atlas, McGraw Hill Year Book of Science and Technology. New York: McGraw-Hill; in press.
7. Hill, D. L. G., Hajnal, J. V., Rueckert, D., Smith, S. M., Hartkens, T. and McLeish, K. (2002) A dynamic brain atlas, Proc. MICCAI 2002, Japan. Springer Lecture Notes in Computer Science; in press.

M Brady, DJ Gavaghan, AC Simpson

File: ediamond-a-grid-enabled-federated-database-of-annotated-mammograms.pdf
Title: c41.dvi
Author: M Brady, DJ Gavaghan, AC Simpson
Published: Wed Nov 13 15:45:24 2002
Pages: 22
File size: 0.63 Mb

