Conceptualising research quality in medicine for evaluative bibliometrics


FACULTY OF HUMANITIES
UNIVERSITY OF COPENHAGEN
PhD thesis
Jens Peter Andersen
Conceptualising research quality in medicine for evaluative bibliometrics
Academic advisor: Jesper Wiborg Schneider
Submitted: 26/07/2013


Conceptualising research quality in medicine for evaluative bibliometrics
Jens Peter Andersen
PhD thesis from the Research Programme for Information Studies, Faculty of Humanities, University of Copenhagen

CIP - Cataloging in publication
Andersen, Jens Peter
Conceptualising research quality in medicine for evaluative bibliometrics / Jens Peter Andersen. - Aalborg: Research Programme for Information Studies, Faculty of Humanities, Copenhagen University, Denmark, x. 398 p. Includes appendix. ISBN:

Konceptualisering af forskningskvalitet indenfor medicin (Conceptualising research quality in medicine)
Jens Peter Andersen
PhD thesis from the Research Programme for Information Studies, Faculty of Humanities, University of Copenhagen


For Linnea and Anette


Acknowledgments
A great number of people have helped in the creation of this dissertation, and I am thankful to each and every one of them. Without their help and support, this dissertation and the process of writing it would not have been the same. My first and foremost thanks go to my advisor and mentor, Jesper W. Schneider. You have played a major role in shaping this dissertation, and your subtle way of guiding me in the right direction has allowed me to grow with the task. This is not only true of my work on the dissertation; your trust and guidance have also helped me in the many other areas that are part of becoming a researcher. Before I even started on this path, a number of people inspired and supported me to go in this direction at all: Pia Borlund offered her help from the very beginning and, together with Conni Skrubbeltrang and Hans Gregersen, was invaluable in creating the initial proposals and a platform for the actual PhD position. Birgitta Olander and Fredrik Åström also offered their counsel and trust in those initial stages. I would like to thank the NORSLIS research school and the ISSI doctoral forums for their fruitful PhD workshops and forums, and in particular for the significant feedback from Dick Klavans, Paul Wouters and Peter Ingwersen. In addition to the curriculum, I have met many exciting people through both institutions. Björn Hammarfelt in particular has been a good friend and collaborator; I hope we will have more opportunities to work together in the future. I would also like to thank the many people I have met at the ISSI and STI conferences during my time as a PhD student. You have been most welcoming, and it is always interesting to discuss bibliometrics with you. In particular, I would like to thank Ludo Waltman, Thed van Leeuwen, Rodrigo Costas, Dag Aksnes, Carolin Michels, Pei-Shan Chi, Truyken Ossenblok, Stefanie Haustein and Grant Lewison for their advice, discussions and company.
A key element of this dissertation is the input from medical researchers in both interview sessions and an online survey. All the participants in these studies have been incredibly important for the study, and the dialogues during the interviews were inspiring and eye-opening. I am humbled by the interest and time these participants have devoted to my project. My colleagues at the medical library at Aalborg University Hospital have all been very understanding and supportive. Conni, Hanne, Jakob, Jette, Marianne, Louise, Pernille, Kristin and Tenna: you make me look forward to going to work every day. My other colleagues at Aalborg Hospital Science and Innovation Center have also been incredibly supportive, and I appreciate the advice and help I have received from all of you. My final appreciation goes to my partner in life, the universe and everything: Anette. I could not imagine a better friend, wife and lover than you. You have been there for me and our daughter, challenged me, and made me want to try harder, aspire to more and never give up.


Abstract
The use of the term research quality in bibliometric research assessment is a problematic, yet common, practice. While the concept can be operationalised as a matter of successfully executing and publishing research (e.g. Smart, 2005; Andras, 2011), there are several other elements of research which could be considered qualities. Such elements could be, e.g., the practical implications of research, its effects on society or its uptake by other research areas. Traditional evaluative bibliometrics often measure the impact of research, which is seen as an aspect of research quality. However, impact can also be interpreted in different ways, yet there is a semantic relationship between the interpretation of research quality and impact; a relationship which is most likely an overlap between certain aspects of research quality and bibliometric quantities. The aim of this dissertation was to conceptualise research quality through a description of the qualities of the concept relating to the formalised dissemination of research, and a description of the dimensions of the concept. The purpose of this conceptualisation was to articulate the interface between research quality and evaluative bibliometrics. The study is delimited to the medical field; a field with high productivity, an established scientific community and internal perceptions (formal as well as informal) of research quality. A combination of qualitative and quantitative methods was used to investigate the concept of research quality. In the initial phase, interviews were used to obtain qualitative statements on how research quality is perceived by researchers and practitioners in the medical field. Fourteen people from Aalborg University Hospital participated in these interviews, and codified statements were extracted from transcripts of the interviews. These were validated through an online survey sent to medical researchers from most of Europe, North America and Australia.
A total of 279 complete responses to the survey were collected, and factor analysis was used to analyse the underlying structure of the included variables. On this basis, important variables, factors and interactions were identified. The factors were further qualified by relating them back to the qualitative data from which they were originally derived. The resulting interlinked, narrated factors were used to create two models of research quality: one in a research process context, and one highlighting the descriptions of dimensions and quantitations of research quality. In both models, the central elements are three dimensions of research quality: dissemination, policy effects and health effects. Each of these dimensions, or impact types, provides a partial answer to the definition of research quality. The results of the present dissertation contribute to the progression of bibliometric research through an elucidation of the research quality concept and its interface with bibliometric methodology. The dissertation also adds to the debate about bibliometric terminology by elaborating the impact aspect of research quality. Finally, it calls for further research on different document types, as well as citation context and citation path studies.


Abstract in Danish
The use of the word research quality in bibliometric research evaluation is a problematic, yet common, practice. Although the concept can be operationalised as the successful execution and publication of research (e.g. Smart, 2005; Andras, 2011), there are several other elements of research that can be regarded as qualities. Such elements could be, e.g., the practical significance of research, its societal significance and its uptake by other research areas. Traditional evaluative bibliometrics often measure what is termed the impact of research, which is seen as an aspect of research quality. Impact can, however, also be interpreted differently, but there is nonetheless a semantic relationship between the interpretation of research quality and impact. This relationship can probably be described as an overlap between certain aspects of research quality and bibliometric measures. The aim of this dissertation was to conceptualise research quality through a description of those qualities of the concept that relate to the formalised dissemination of research, and a description of the concept's dimensions. The purpose of this conceptualisation was to articulate the interface between research quality and bibliometrics. The study is delimited to the medical field; a field characterised by high productivity, an established scientific community and an internal perception (formal as well as informal) of research quality. A combination of qualitative and quantitative methods was used to investigate the concept of research quality. In the initial phase, interviews were used to collect qualitative statements on how research quality is perceived by researchers and practitioners in the medical field. Fourteen people from Aalborg University Hospital took part in these interviews, and coded statements were extracted from the interview transcriptions.
These were subsequently validated through an online questionnaire sent to medical researchers from most of Europe, North America and Australia. In total, 279 complete responses to the questionnaire were collected, and factor analysis was performed to examine the underlying structure of the included variables. On this basis, key variables, factors and the interactions between them were identified. The factors were further qualified by holding them up against the qualitative data from which they were originally derived. The final, interrelated, described factors were used to construct two models of research quality. One model places research quality in a research process context, while the other illuminates the description of the dimensions and quantitations of the research quality concept. In both models, the central elements are three dimensions of research quality: dissemination, policy effects and health effects. Each of these dimensions, or impact types, contributes part of the answer to a definition of research quality. The results of the present dissertation contribute to the progression of bibliometric research through a clarification of the research quality concept and its interface with bibliometric methodology. The dissertation also contributes to the debate on bibliometric terminology by elaborating the impact aspect of research quality. Finally, it calls for further research on different document types, as well as citation context and citation path studies.


Contents
1 Introduction
Objectives of the dissertation
Research questions
Structure of the dissertation
Research quality
The research quality concept
Research quality and impact assessment
Negative findings, fraud and retractions
Peer assessment
Summary
Measuring research quality
Quantitation & measurement
Quantitation of quality
Measurement & metrics
Evaluative bibliometrics
Bibliometric foundation
Bibliometric units
Citation analysis
Impact
Referencing and citation theory
Research quality in medicine
A very brief history of medicine
Evidence-based medicine and scientific communication
Positioning the dissertation
6 Methods and materials
Study design
Interview study
Pilot interview study
Script revision
Main interview study
Online survey
Factor analysis
Demographic analysis
Demographic variables summarised
Creating narratives
Results
Data analysis
Raw survey data
Calibration
Optimal coordinates for factor analysis
Factor analysis
Model confirmation
Factor loadings and descriptions
Factor evaluation
Article-level and demographic analysis
Summary
Factor narration and conceptualisation
Narrated factors
Factor 1 - Journal prestige
Factor 2 - Clinical guidelines
Factor 3 - Referencing behaviour
Factor 4 - Method section
Factor 5 - Subjective quality
Factor 6 - Basic to applied
Factor 7 - Author
Factor 8 - Citation meaning
Factor 9 - Citation quality
Factor 10 - Innovation stunt
Factor 11 - Skepticism
Factor 12 - Propriety
Summary
Assessment of participants' articles
Conceptual model
Citation impact
Journal Impact Factor
Citation paths
Health and policy effects
8.3 Summary
Discussion
Research question summaries
Implications of results
Final summary
References
Appendices
A Interview data
A.1 Pilot study transcriptions
A.1.1 Interview reference: P
A.1.2 Interview reference: P
A.1.3 Interview reference: P
A.1.4 Interview reference: P
A.1.5 Interview reference: P
A.1.6 Interview reference: P
References, pilot study
A.2 Main study transcriptions
A.2.1 Interview reference: M
A.2.2 Interview reference: M
A.2.3 Interview reference: M
A.2.4 Interview reference: M
A.2.5 Interview reference: M
A.2.6 Interview reference: M
A.2.7 Interview reference: M
A.2.8 Interview reference: M
A.2.9 Interview reference: M
A.2.10 Interview reference: M
A.2.11 Interview reference: M
A.2.12 Interview reference: M
A.2.13 Interview reference: M
A.2.14 Interview reference: M
References, main study
B Interview invitation
B.1 Original letter of invitation
B.2 Translated letter of invitation
C Interview documents
C.1 Interview script, pilot study
C.2 Interview script, main study
C.3 Declaration of consent
D Initial statement codes
E Initial survey questions
E.1 Personal details
E.2 Statements on research quality
E.3 Importance of article-specific factors
F Final survey
F.1 Visual design
F.2 Drop-down response options
G invitation to survey
H Data cleaning

List of Figures
4.1 Model of interactions in medical research. Adapted from Lewison (2004)
A model of scientific communication as a global distributed information system. Adapted from Björk (2007)
Records (all publication types) in the PubMed MEDLINE database published between 1945 and 2010, as of late
Model of study design
Frequency of response codes ordered by rank of occurrence
Number of unique new codes added by each participant, ranked decreasingly
Correlation between professional experience, measured as years since completion of formal medical education, and age in years. Linear regression is illustrated as a red line
Number of participants from different countries. Only countries with more than two participants are displayed individually
Correlation between the number of participants per country and the corresponding number of e-mails sent to domains from those countries
Boxplots showing the distribution of main variables measured on 5-point scales (top) and 11-point scales (bottom)
Histogram of variance ranges for 5-point (left) and 11-point (right) scale variables
Empirical cumulative density distribution of 5-point (left) and 11-point (right) scale variables. All variables are combined into the two groups, creating a collective overview
Pure frequencies of response selections for article-specific variables
Histograms showing frequencies of score ranges for index-calibration (left) and standard-calibration (right)
7.6 Empirical cumulative density distributions for all variables combined, using index-calibration (left) and standard-calibration (right)
Distribution of optimal coordinates (oc) with 250 tests, using 500 repetitions in each test, for both calibration types. To the left, boxplots show the overall distribution with quantile ranges; to the right, the empirical cumulative density distributions for index-calibration (orange) and standard-calibration (blue) show the exact differences between the resulting optimal coordinates for the two calibrations
p-value for testing the number of factors, as a function of the number of factors, using Bartlett's test and varimax rotation. The red line marks the 0.05 boundary
Dendrogram showing the similarity of observations, as similarity of Bartlett scores. Similarity is shown as distance, i.e. values close to 0 are very similar. Clusters are shown as red outlines
Heatmap showing the average Bartlett scores for each cluster on each factor
Histograms of index scores for variables associated with factor 1 - Journal prestige
Histograms of index scores for variables associated with factor 2 - Clinical guidelines
Histograms of index scores for variables associated with factor 3 - Referencing behaviour
Histograms of index scores for variables associated with factor 4 - Method section
Histograms of index scores for variables associated with factor 5 - Subjective quality
Histograms of index scores for variables associated with factor 6 - Basic to applied
Histograms of index scores for variables associated with factor 7 - Author
Histograms of index scores for variables associated with factor 8 - Citation meaning
Histograms of index scores for variables associated with factor 9 - Citation quality
Histogram of index scores for the variable associated with factor 10 - Innovation stunt
Histograms of index scores for variables associated with factor 11 - Skepticism
Histograms of index scores for variables associated with factor 12 - Propriety
Relationship between citations to selected articles and comparable articles. TC = times cited for selected articles, MCJ = mean citations to articles of the same type, published the same year in the same journal
Conceptual model of research quality in medical research, from an evaluative bibliometrics perspective
9.1 The clinrel variable as a function of the primary job categories "research" or "clinical practice"
Model of impact types and quantitated elements


List of Tables
3.1 Classification scheme for measurement instruments, from Geisler (2000)
Basic statistics on item ratings for the Sternberg & Gordeeva (1996) impact questionnaire items. Means are calculated from ratings between 1 (low) and 6 (high)
List of variables, question wording, response types and data types in the online survey
Distribution of gender
Distribution of age and experience among participants. Experience is measured as years since completion of formal medical education. Both variables were recorded as exact years but are grouped here into five-year intervals
Primary job category
Primary workplace
Distribution of medical specialties among participants. Specialties represented by fewer than five participants were combined in the "other" category
Descriptive statistics for main variables. SD = standard deviation, var = variance
Correlations between factors. Factor numbers are used as labels. As the correlation matrix is symmetrical, only the upper half is displayed
Proportion of total variance (Pvar) explained by individual factors
Factor loadings for main variables. Only loadings ≥ 0.20 or ≤ -0.20 are displayed for ease of reading. Salient loadings are marked with bold text
Low- and high-scoring factors in each cluster, using thresholds of ±
List of papers collected from main study participants, containing quality-related statements
8.2 List of papers collected from main study participants. PY = publication year, TC = times cited in Web of Science, total from publication until retrieval (09 Sep, 2011), MCJ = mean citations for papers of type=article, published in the same journal in the same year, with the same citation window as TC. References in appendix. One paper omitted as it could not be retrieved (M18_1)
D.1 Initial statement codes and unique IDs, as well as frequency (f) of occurrence

1. Introduction
Research quality is an elusive but widely used phrase, used by researchers, policy makers and society as a whole. But what is research quality? In medical research, it could be claimed that there is a connection between the quality of research and the overall effect of that research on the state of public health. If medical researchers produce high-quality research, this should be reflected in better treatments, or in policies which result in less illness; but this is a very generalised, cursory view of medicine, health, research and research quality. For instance, much basic biomedical research is not directly translatable into health improvements, but through further research iterations it may result in a new drug or therapy that is clinically tested and shown to cure a specific disease. That could be considered one of the purposes of basic research, but such a view would also trivialise central aspects of the idea of basic research, such as pure knowledge generation, while making its quality less apparent. The same is true of many other levels and types of research. The complexity increases even further if we wish to measure quality, as quality is intangible and complex in nature and contains numerous dimensions. This does not make quality measurement irrelevant, however; funding agencies, government bodies and research managers quite naturally want to invest in the best possible research projects. This is where research assessment plays an important role, as evaluating the previous research of a given researcher, research group or university might hint at their potential future achievements. These assessments can be divided into two main classes: peer assessment and metrics. Peer assessment is regarded by many as the gold standard but is also criticised for different types of bias and subjectivity, while metrics are usually regarded as more objective but also unable to grasp the entire picture (e.g. Butler & McAllister, 2009; Clerides, Pashardes, & Polycarpou, 2011; Goldstein, 2011; Kenna & Berche, 2011; Taylor, 2011). One major problem of metrics is that they measure some very specific objects, such as publication or citation counts, which may be related to research quality, and used as proxies for it, but in no way reflect all dimensions or properties of the entire concept (J. R. Cole & Cole, 1971; Zuckerman, 1987). Even so, metrics are a relevant tool for research assessment, as they may quantify certain aspects of productivity or impact in the research community that are hard to grasp and easily influenced in peer assessments. But if we want to achieve a good quantitation of the research quality concept, it is necessary to first discover the dimensions and properties, or qualities, of the concept. That is the primary aim of this dissertation. In this introductory chapter, we will look further at the motivation for this subject and illustrate briefly how we will seek to reach the goal.

The theoretical framework of this dissertation is found in the bibliometric field, the science of measuring attributes of documents, very often scientific documents. It is a fairly young scientific discipline, closely related to information science and to scientometrics, the science of measuring science. Scientometrics and bibliometrics may at times be regarded as synonymous, and at other times one may be regarded as a specialised case of the other. However, the aims of the two disciplines are not always the same. Scientometrics is chiefly concerned with the assessment of science (and technology) using various indicators, such as the funding a university receives, the number of researchers employed at a department or the success rate of research grant applications, as well as with historical or predictive studies of science, scientific communication and growth, and the sociology of science. Bibliometrics is concerned with similar indicators, such as the productivity and impact of research departments, but always based on information derived from their published items, such as journal articles and monographs. Bibliometrics may, however, also be used for other purposes, for instance domain analysis (Hjørland, 2002), thesaurus construction (Schneider, 2004), indexing (Salton, 1963; Garfield, 1964; Schneider, 2006) and information retrieval (White, 2007a, 2007b).
This dissertation is concerned with research performance assessment, i.e. the assessment of how well individual researchers and research groups perform on a global and local scale. As already stated, there are several indicators of such performance, but the focus of this dissertation will be on those related to scientific publishing performance. Put simply, bibliometric methods for research performance assessment are about measuring the publication output of a research unit and assessing the impact of these publications in one way or another. As discussed above, the connection between impact and research quality is not necessarily straightforward, and impact does not reflect all aspects of research quality. Creating a connection between the measurements and the conceptualisation of research quality is a question of what can be measured, what makes sense to measure and what the measurements mean. The famous quote attributed to Albert Einstein, "Not everything that can be counted counts, and not everything that counts can be counted", comes to mind in this context and is important to keep in mind when performing bibliometric research assessments. The ability to measure publication counts, citations and other quantifiable aspects of science and research does not necessarily mean it is meaningful to do so in any given setting. Nonetheless, when quantitative tools are worth using at all to measure scientific output, it is because we can see patterns in the outputs and deduce meaning from these patterns. If all scientists were to produce the same number of research papers per year and citations were given arbitrarily, a measurement of these units would be of little consequence to anything, as their interpretation would be just as arbitrary as their nature. The opposite is the case, however: the productivity of researchers and the way citations are received follow regularities resembling natural laws (Price, 1963), although far from all bibliometric research agrees on the validity of such purely statistical measurements of science (cf. Gilbert, 1977; Cozzens, 1981; Luukkonen, 1997). These first, broad analyses of science, and in particular of scientific publications, e.g. by Price, showed that science and research may be measured and assessed using metrics and statistics, and gave legitimacy to the emerging bibliometric field.

Since then, the fields of scientometrics and bibliometrics have developed a number of metrics, indicators and indices that measure and assess various aspects of science and research. This development has accelerated, particularly in recent years, as many governments and university directors have implemented local and national assessment exercises and funding schemes based on these. This recent focus on bibliometric assessments of scientific productivity has intensified the debate about what these methods and metrics actually measure, and whether it is fair to fund research based on such indicators (e.g. Williams, 1998; Elton, 2000; Clarke, 2005; Kostoff & Geisler, 2007). This is not a new debate for bibliometricians; for some time there has been a discussion about the lack of methodological consensus (Glänzel & Schoepflin, 1994), the lack of a common language (Glänzel, 1996), database inconsistencies (Glänzel, 1996; Bar-Ilan, 2008), abuse of metrics (Seglen, 1992; Moed & van Leeuwen, 1995; Seglen, 1997), method redundancy (e.g. Bollen, van de Sompel, Hagberg, & Chute, 2009; Leydesdorff, 2009), validity issues (van Leeuwen, 2008), method robustness (e.g. Lehmann, Jackson, & Lautrup, 2006, 2008) and related problems.
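To make the idea of measuring publication output and assessing its impact more concrete, the Python sketch below computes a few simple indicators for one research unit: the number of publications (P), the total times cited (TC), mean citations per publication, and the mean ratio between each article's TC and MCJ, the mean citations of articles published in the same journal in the same year (the baseline used in the list of tables above). This is a minimal illustration under stated assumptions, not the dissertation's own procedure: the `Paper` record, the function names and the sample data are hypothetical, all citation counts are assumed to share one citation window, and the journal-year baseline is computed only from the records passed in, so those records are assumed to cover every article in each journal-year.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Paper:
    """Hypothetical publication record; field names are illustrative only."""
    unit: str         # research unit credited with the paper
    journal: str
    year: int
    doc_type: str     # e.g. "article" or "review"
    times_cited: int  # TC, assumed counted in the same citation window for all papers


def journal_year_baselines(papers):
    """MCJ per (journal, year): mean TC of papers of type 'article'.

    Assumes `papers` contains every article in each journal-year,
    not only the papers of the unit being assessed.
    """
    groups = defaultdict(list)
    for p in papers:
        if p.doc_type == "article":
            groups[(p.journal, p.year)].append(p.times_cited)
    return {key: mean(counts) for key, counts in groups.items()}


def unit_indicators(papers, unit):
    """Output (P) and simple citation-impact indicators for one research unit."""
    own = [p for p in papers if p.unit == unit]
    baselines = journal_year_baselines(papers)
    # TC/MCJ ratios for the unit's articles; skipped where the baseline is zero.
    ratios = [
        p.times_cited / baselines[(p.journal, p.year)]
        for p in own
        if p.doc_type == "article" and baselines[(p.journal, p.year)] > 0
    ]
    total = sum(p.times_cited for p in own)
    return {
        "P": len(own),
        "TC": total,
        "mean_TC": total / len(own) if own else 0.0,
        "mean_TC_MCJ": mean(ratios) if ratios else None,
    }


if __name__ == "__main__":
    sample = [  # invented example data
        Paper("A", "J1", 2010, "article", 10),
        Paper("B", "J1", 2010, "article", 2),
        Paper("A", "J2", 2010, "article", 3),
        Paper("B", "J2", 2010, "article", 3),
    ]
    print(unit_indicators(sample, "A"))
```

Dividing by a journal-year baseline makes papers from journals with different citation levels more comparable, but a ratio near 1 only says that a paper is cited about as often as its journal neighbours; as argued above, such an indicator captures only one part of research quality.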
In addition to the above, there is a debate, or even different paradigms, with regard to what a citation, the main measure of scientific impact, is and how it can be used. To see this in a broader perspective, we can also ask what research quality is. Citations to publications are often used as an indicator of the quality of the research contained within the publication. This shows that research quality can be operationalised, and in such cases can be well represented by bibliometric indicators (Smart, 2005; Andras, 2011). However, while this may be pragmatic and useful, there are a number of issues with the use of citations as a quality indicator (some of which are listed above), and it should also be clear that citations can only reflect a specific part of the quality and impact of research (e.g. Waltman, van Eck, & Wouters, 2013; Wouters, in press). A better understanding of the different dimensions of the research quality concept should improve our communication about the different aspects of research quality assessments, and thereby also clarify which aspects can be measured bibliometrically.

Objectives of the dissertation
The main objective of this dissertation is to investigate the connection between the research quality concept and bibliometric methodology. This will result in a better and more explicit understanding of what is being measured when bibliometricians speak
