Measures of central tendency. Norms and standardisation. Raw scores. Normalized scores (standard scores)

STATISTICAL CONCEPTS

Measures of central tendency
MODE: the most common/frequently occurring score. Ex (1, 2, 2, 5, 6, 7, 9): mode = 2
MEAN: the average score. Ex (1, 2, 2, 5, 6, 7, 9): mean = 4.57
MEDIAN: the middle score (50th percentile). Ex (1, 2, 2, 5, 6, 7, 9): median = 5

Norms and standardisation
Norms are data that reflect test results from the general population or from a specific group (e.g. by gender or age) and that make it possible to interpret a person's test results relative to others, i.e. they show where a person is placed in relation to other people.
The quality of the norms depends on the sample size, how representative the sample is (also at the extremes), where the norms come from (e.g. which country) and how old they are.
Different kinds of norms:
- Percentiles, based on an ordinal scale. Ex: a person with a score at the 75th percentile has a higher score than 75 % of the people in the sample.
- Means and standard deviations
- Normalized scores, ex T scores
Standardisation includes:
- Standard methods for administering the test
- Standard methods for scoring the test
- The development of norms for the test

Raw scores
The first result from a psychological test, i.e. before the score is normalized to e.g. T scores.

Normalized scores (standard scores)
Normalized scores are raw scores that have been converted to fit a normal distribution and that can serve as a common standard scale across different tests.
z scores: the most basic standard score, which can also be used to calculate other standard scores. Mean = 0, SD = 1.
T scores (ex SCL-90, MMPI and NEO-PI): Mean = 50, SD = 10. Usually range from 20 to 80.
Sten scores: Mean = 5.5, SD = 2. Range from 1 to 10.
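The measures of central tendency and the linear standard scores described above can be checked with a few lines of Python. This is a minimal sketch, not part of the original notes: the function names are hypothetical, the norm values (mean 100, SD 15) are only an illustration, and the rounding and clipping used for sten scores is one common convention among several.

```python
import math
import statistics

# The example data set used in the notes.
scores = [1, 2, 2, 5, 6, 7, 9]

sample_mode = statistics.mode(scores)      # most frequent score
sample_mean = statistics.mean(scores)      # arithmetic average (32 / 7)
sample_median = statistics.median(scores)  # middle score of the sorted list


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    """T scale: mean 50, SD 10."""
    return 50 + 10 * z


def sten_score(z):
    """Sten scale: mean 5.5, SD 2, reported as whole numbers from 1 to 10.

    Assumption: round half up, then clip to the 1-10 range.
    """
    rounded = math.floor(5.5 + 2 * z + 0.5)
    return min(10, max(1, rounded))


# Illustrative raw score of 112 on a scale with norm mean 100 and SD 15.
z = z_score(112, 100, 15)   # 0.8
print(sample_mode, round(sample_mean, 2), sample_median)
print(round(z, 2), round(t_score(z), 1), sten_score(z))
```

Because every scale here is a linear function of z, a score can be moved between tests with different raw-score ranges as long as the norm mean and SD of each test are known.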

Stanine scores are a further standard-score scale: Mean = 5, SD = 2. Range from 1 to 9.

Scales
Nominal: People are classified into categories using numbers. Only a few arithmetic operations are possible. The categories are not ordered in any way and cannot be compared quantitatively. Ex: male = 1, female = 2.
Ordinal: People are rank-ordered with respect to a particular variable, e.g. from first to last or from low to high in a competition. Disadvantage: the scale does not show people's absolute position, only their position relative to the others in the set. It is not possible to derive the real differences between two people, only their relative placement in the rank order. Ex: the Likert scale (a large number of favourable and unfavourable items are developed and administered to a large group of people, who rate them on a continuum from 1 to 5, sometimes 1 to 7, from like to dislike).
Interval: Numbers are assigned to an individual and show whether he or she is >, < or = others, but also represent how large the difference between individuals is. The lack of a true zero point means that the absolute level of what is being measured cannot be known. Ex: a depression scale from 1 to 10, Celsius temperatures, IQ etc.
Ratio: Like the interval scale, but with a natural/absolute zero point. The most ideal and meaningful scale. Ex: metres, kilos, reaction time, number of correct answers etc.
Psychological tests often use ordinal or interval scales.

Reliability
Reliability reflects the degree to which a test gives the same result for the same people every time they take it, i.e. whether the test measures a stable property/trait/factor. Reliability can also be used to assess whether individual differences in test results are caused by real differences between individuals or by random variation, cf. Classic Test Theory, where X (test score) = T (true score) + E (error). The reliability coefficient can therefore be used to calculate the Standard Error of Measurement.
Measures of reliability:
- Test-retest (stability of test results over time; correlation)

- Parallel form (correlation)
- Split-half (whether the items measure the same trait; correlation with the Spearman-Brown prophecy formula)
- Inter-item (internal reliability and item homogeneity, i.e. whether different items have the same properties; Cronbach's alpha)
- Scorer/inter-rater (kappa)
For individual test results, a reliability of > 0.7 is usually required. For research purposes, lower reliabilities can be accepted.

Spearman-Brown prophecy
A formula that relates the length of a test to its reliability and that is often used to predict changes in reliability when the test length is changed. (A test's reliability decreases when there are few items.)

Cronbach's alpha (internal reliability)
Related to split-half correlations and dependent on test length (i.e. number of items). Measures internal reliability, i.e. the degree to which the different items in a test or a particular scale measure the same trait/ability/factor/construct. The average of all possible inter-item reliabilities, i.e. all the ways in which the different items in a test can be compared. Cronbach's alpha increases when the number of items in a test increases (as does split-half reliability).

Validity
Validity concerns whether a test measures what it claims to measure, but also how, when, under which circumstances and for which purposes a test can give meaningful results.
Types of validity:
a. Content validity: Does the test cover all aspects of the area/trait/factor it deals with? Are there irrelevant items? Content validity is a property of the test itself and of how well it has been constructed. It is established by having e.g. experts or test users judge that the test covers the area it measures well enough.
b. Criterion-related validity: Whether test results can be shown to be related to an external criterion, e.g. how well one performs on another test, one's job competences, level of dementia etc. Useful for evaluating whether a test can be used to make predictions. It is usually quantitative, i.e. expressed as a number: correlations and regression when the criterion is continuous (level of dementia), and sensitivity/specificity when a cut-off score is used and the criterion is either-or (schizophrenic or not).
   Predictive validity: Used to predict future behaviour (e.g. results on another test). Of great importance in occupational settings, where test results are used to predict future job performance.
   Concurrent validity: Compares a test with other existing assessments, e.g. results on an occupational test are compared with recent appraisals of one's job performance. Depends strongly on the quality of the assessments the test is compared with.
c. Construct validity: The most important type of validity, which embraces and overlaps with the other forms of validity. Concerns the evidence that the test really measures the area/trait/factor it claims to. Built up through evidence gathering over a longer period and trials of the test in different areas (unlike predictive validity, which can be established in a single experiment).
   o Convergent-discriminant/divergent validity: When a test's results converge/correlate with other tests that measure the same area/trait/factor and diverge from tests that measure something entirely different. Often assessed using the multitrait-multimethod approach.
Other validities:
Face validity: does the test look appropriate to the person being tested?
Faith validity: does the tester believe in the test?
Ecological validity

The multitrait-multimethod approach
A detailed way of establishing convergent-discriminant validity by correlating/comparing different assessment methods for different areas/traits/factors in a matrix. Evidence of convergent validity can be established by observing whether consistent measurements emerge when one area/trait/factor is measured with different methods/test instruments.

Sensitivity and specificity
Sensitivity: How often the test identifies the patient type (e.g. demented). Low sensitivity leads to more false negatives.
Specificity: How often the test identifies the non-patient (e.g. not demented). Low specificity leads to more false positives.
Cut-off score, sensitivity and specificity: Sensitivity can be increased at the expense of specificity by adjusting the cut-off score, and vice versa.
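The trade-off between sensitivity and specificity when the cut-off score is moved can be illustrated with a small sketch. The data below are invented for illustration, the function name is hypothetical, and it is assumed that a score at or above the cut-off counts as a positive test result.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a given cut-off score.

    Assumption: a score >= cutoff is classified as test-positive.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, ill in zip(scores, has_condition):
        positive = score >= cutoff
        if ill and positive:
            true_pos += 1
        elif ill:
            false_neg += 1      # a patient the test missed
        elif positive:
            false_pos += 1      # a non-patient the test flagged
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical screening scores for 5 non-patients and 5 patients.
scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
has_condition = [False, False, False, False, True,
                 False, True, True, True, True]

for cutoff in (5, 6, 7):
    sens, spec = sensitivity_specificity(scores, has_condition, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.1f}, specificity {spec:.1f}")
```

In this toy data set, lowering the cut-off from 7 to 5 catches every patient but flags more non-patients, which is the pattern described in the notes.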
Standard Error of Measurement
Can be used when the reliability is known.
SEm = SD × √(1 − r)
It indicates how much inaccuracy there is in a test score because of its less-than-perfect reliability.
Classic test theory defines reliability as the proportion of variance which is not caused by random error, given that
Observed score = True score +/- Measurement error
Ex: The Wechsler Intelligence Scale, SD = 15, r = 0.9
SEm = 15 × √(1 − 0.9) ≈ 4.74
This means that there is a 68 % probability (1 SEm on both sides of the score) that a person's true score lies within an interval of about +/- 5 points around the person's obtained score, and a 95 % probability (2 SEm on both sides of the score) that the true score lies within about +/- 10 points around the obtained score. (= Confidence limits/intervals)

Confidence interval
Based on SEm, as in the Wechsler example above.

Criterion-keying
Criterion-keyed construction selects items based on their discriminability between a criterion group and a control. From a large item pool with various items, those items which can discriminate between a criterion group and a control are selected, regardless of their content. Item content is irrelevant: the approach is purely empirical and atheoretical, which is a limitation.
Ex MMPI (Minnesota Multiphasic Personality Inventory), CPI (California Personality Inventory). Based upon true-false or yes-no items.
Limitations:
No understanding of why such a test works, due to the lack of theory behind item selection.
Likely to produce scales of poor reliability, as scales would be measuring a mix of different possible attitudes.
The test would be specific only to the group used in construction.

Classic test theory
In construction using classic test theory the aim is to generate a pool of items of various difficulty measuring the same thing (item homogeneity), which is checked by item/total correlations.
Observed score = True score +/- Measurement error
A test is generated by analysing items, ex by factor analysis, in a pilot sample to make sure they measure only one factor. The end result is a pool of items of differing difficulty which measure only one thing. This is done by item-whole/total correlation (correlating every single item with total scores) and item analysis.
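The Standard Error of Measurement and the confidence bands derived from it can be recomputed for the Wechsler example (SD = 15, r = 0.9). This is a minimal sketch using the standard form SEm = SD × √(1 − r); the function names are hypothetical, the obtained score of 100 is only an illustration, and the multiplier 1.96 for the 95 % band is the usual normal-distribution value rather than the rounded "2" used in the notes.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEm = SD * sqrt(1 - r), with r the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(obtained_score, sem, multiplier=1.0):
    """Interval of +/- multiplier * SEm around an obtained score."""
    half_width = multiplier * sem
    return obtained_score - half_width, obtained_score + half_width


sem = standard_error_of_measurement(15, 0.9)      # about 4.74
band_68 = confidence_band(100, sem, multiplier=1.0)
band_95 = confidence_band(100, sem, multiplier=1.96)

print(f"SEm = {sem:.2f}")
print(f"68 % band: {band_68[0]:.1f} to {band_68[1]:.1f}")
print(f"95 % band: {band_95[0]:.1f} to {band_95[1]:.1f}")
```

Rounded to whole points, the bands come out at roughly +/- 5 and +/- 9 to 10 points, in line with the figures quoted for the Wechsler scale.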

Two criteria for items (classic test theory): item homogeneity (they all measure the same thing) and a difficulty indicator, ex item discrimination analysis.

IRT (Item Response Theory) & RASCH models
Using IRT and Rasch scaling, items first undergo factor analysis to ensure they all measure the same trait and are then analysed with a focus on item difficulty and how well the item facilitates elicitation of the wanted trait.
A very large sample is needed. Items are constructed and an initial factor analysis is carried out to make sure that all items measure a single trait. In order to compute the Rasch parameters, the sample data is split into groups of high and low scorers, providing different levels of difficulty. If the facility of an item for eliciting the trait is the same for both groups, it is seen as conforming to the model and is chosen for the test. On completion of item selection, an item-free measurement of all individuals is needed, ex by checking whether sub-groups of items (the hardest vs. the easiest) generate the same scores for each person. If items fit the model, each individual will score the same on both tests.
EX ITEM CHARACTERISTIC CURVES

Factor analysis
Factor analysis is a multivariate data reduction tool which enables us to simplify correlations between sets of variables. It is based on analysis of correlations between variables. In test construction, factor analysis is used to check whether items relate to a common trait/theme/factor, and to generate assessments which measure only one factor.
Oblique or orthogonal analysis: with oblique analysis, factors are correlated. With orthogonal analysis, factors are not related by correlation, i.e. factors are independent.
Ex 16PF, Eysenck Personality Inventory.
Limitations: Needs large samples. Complex technical problems, therefore good knowledge of its procedures is highly important.

Item Discrimination Analysis
Testing that the people who get any one item right have got more of all the remaining items right than those who got the item wrong.
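The item discrimination check described above can be sketched directly: for each item, compare the score on the remaining items for people who got the item right with the same score for people who got it wrong. The response matrix below is invented for illustration, and the function name and the use of a simple difference in means as the discrimination index are assumptions, not a procedure taken from the notes.

```python
from statistics import mean


def item_discrimination(responses, item):
    """Mean rest-score of those who passed the item minus those who failed.

    responses: one list of 0/1 item scores per person.
    Returns None when everybody passed or everybody failed the item.
    """
    passed, failed = [], []
    for person in responses:
        rest_score = sum(person) - person[item]   # score on remaining items
        if person[item] == 1:
            passed.append(rest_score)
        else:
            failed.append(rest_score)
    if not passed or not failed:
        return None
    return mean(passed) - mean(failed)


# Hypothetical responses of 6 people to 4 items (1 = right, 0 = wrong).
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

for item in range(4):
    print(f"item {item}: discrimination {item_discrimination(responses, item):+.2f}")
```

In this toy data set, the first three items come out positive, while the last item comes out negative: people who got it right did slightly worse on the rest of the test, so it would be a candidate for removal.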

Wechsler's intelligence tests
WISC, WAIS, WPPSI, WIAT

WAIS-IV (Article by Weiss, L. G. et al. (2009))
FACTS:
Mean = 100, SD = 15, SEm = 5
Ages 16-90 (89 for WAIS-III)
15 subtests (5 of which are supplemental), 4 index scales
Takes approx. 67 min. to complete
PURPOSE: WAIS is a complex test for measuring and assessing intellectual ability.
THEORY: Wechsler's definition of intelligence includes aspects from both Spearman (the overall g factor) and Thorndike (qualitatively different abilities). That is, WAIS builds on an assumption of intelligence as a global entity comprising a number of different, mutually independent functions (incl. cognitive and non-cognitive abilities, e.g. motivation, persistence, temperament), which together provide a basis for assessing intelligence.
DEVELOPMENT AND REVISIONS:
4 subtests dropped: Object Assembly, Picture Arrangement, Coding Recall and Coding Copy
New subtests: Visual Puzzles (instead of Picture Completion), Figure Weights and Cancellation
Psychometric improvements: updated norms, extended FSIQ range, improved floors and ceilings
Improved user-friendliness, e.g. reduced testing time (from 80 to 67 min.) and revised instructions
From Verbal IQ and Performance IQ to index scores (VCI, PRI, WMI and PSI): The most important difference between WAIS-III and WAIS-IV is that index scores have become the primary level for interpreting results, instead of Verbal and Performance IQ, which were based on the old Army Alpha and Army Beta. More weight is thus placed on Verbal Conceptualisation, Perceptual Reasoning, Working Memory and Processing Speed, as part of a more differentiated description of intelligence test results that is more in line with current knowledge about cognition.
SCALES AND ITEMS: 15 subtests (5 of which are supplemental), 4 index scales
Verbal Comprehension Index: Similarities, Vocabulary, Comprehension, Information, Word Reasoning
Perceptual Reasoning Index: Block Design, Picture Concepts, Matrix Reasoning, Picture Completion
Working Memory Index: Digit Span, Letter-Number Sequencing, Arithmetic
Processing Speed Index: Coding, Symbol Search, Cancellation
NORMS AND STANDARDISATION:
Danish standardisation: 340 people aged 17-70.
American standardisation: 2450 people, stratified by gender, age, education and sociodemographic data. Covers ages 16-89.
VALIDITY AND RELIABILITY (WAIS-III):
Reliability: Very good, tested by split-half and test-retest. The reliability of the subtests (between 0.7 and 0.8, with the exception of Vocabulary and Information at 0.9) is lower than the FSIQ reliability, which has been calculated at 0.97.
Validity: Good to very good convergent-discriminant validity coefficients (construct validity), measured by correlations between VCI/PRI and other Wechsler tests (ex VIQ and PIQ on the WISC-III). Good criterion-related validity, e.g. in relation to other Wechsler tests and other IQ tests.
SCORING:
Primary scores: Subtest score, mean = 10 and SD = 3. Index score and FSIQ, mean = 100 and SD = 15.
Clinical differences between index scores: A difference of 12 points or more indicates clinical significance (14 points for PSI).
Composite scores: General Ability Index (GAI): VCI + PRI, represents the most g-loaded subtests. Cognitive Proficiency Index (CPI): WMI + PSI, represents efficient …

INTERPRETATION:
Verbal Comprehension Index (Vocabulary, Information, Similarities, Comprehension): The ability to understand verbal stimuli, work with semantic material and communicate thoughts and ideas in words. Crystallized knowledge. Depends partly on a person's level of education and general life experience, but also on the ability to understand learned knowledge and use it appropriately. The Vocabulary subtest has the highest g load and is the best indicator of overall intelligence.
Perceptual Reasoning Index (Block Design, Matrix Reasoning, Visual Puzzles, Figure Weights, Picture Completion): Measures fluid intelligence as well as perceptual organisation (since fluid reasoning cannot be measured separately but requires an object). Quantitative, non-verbal fluid intelligence and the ability to hold a visual image in mind while mentally manipulating it.
Working Memory Index (Digit Span, Arithmetic, Letter-Number Sequencing): Measures attention, concentration and working memory (Baddeley's model), i.e. the ability for mental control: holding information in mind (briefly) while performing some form of mental manipulation on it. Note that differences between scores on e.g. Digit Symbol or Letter-Number Sequencing and Arithmetic may reflect that the test taker has not learned the necessary mathematics rather than that he/she has specific learning difficulties.
Processing Speed Index (Coding, Symbol Search, Cancellation): Measures the speed of mental processing using visual and graphomotor skills, and is related to the efficient use of other cognitive abilities. PSI interacts with other higher-order cognitive functions and can affect general cognitive functioning, new learning, reasoning and everyday performance. Note that PSI is closely related to age, i.e. it declines with age.
OTHER REMARKS:
A BRIEF HISTORY OF INTELLIGENCE TEST INTERPRETATION
1. The first wave: Quantification of general level (Focus: The global IQ and practical considerations regarding the need to classify people into separate groups. Ex Stanford-Binet Scale and Spearman's g factor.)
2. The second wave: Clinical profile analysis (Focus: Patterns of high and low subtest scores, which could presumably reveal diagnostic and psychotherapeutic considerations.)
3. The third wave: Psychometric profile analysis (Focus: Psychometric precision and methods in profile analysis, rather than the loose interpretative attempts of clinical profile analysis. However, the lack of empirical support and a theoretical background makes this approach controversial and lacking in validity.)
4. The fourth wave: Application of theory (Focus: Grounding intelligence testing and interpretation of scores on a theoretical basis. The most popular theory in test development and interpretation is the CHC (Cattell-Horn-Carroll) theory.)
Fluid and crystallized intelligence (Cattell)
Fluid intelligence is the ability to think logically and solve problems in unfamiliar situations, independently of learned knowledge. The ability to analyse new problems and identify patterns and relationships.
Crystallized intelligence is the ability to use skills, knowledge and experience. It is not the same as memory, but depends on access to long-term memory. It is made up of one's lifetime of intellectual achievements, shown e.g. through vocabulary or general knowledge of world events.

WISC-IV (Article by Flanagan, D. P. & Kaufman, A. S. (2009))
DEVELOPMENT AND REVISIONS:
Structural changes from WISC-III to WISC-IV:
Deleted subtests = Picture Arrangement, Object Assembly and Mazes.
New subtests = Word Reasoning, Matrix Reasoning, Picture Concepts, Letter-Number Sequencing and Cancellation.
VIQ and PIQ dropped and replaced by 4 indexes: WMI, PSI, VCI and PRI.
o WHY? Because the difference/discrepancy between the two was overused, and its meaningfulness and clinical utility were never made clear in the literature.
FSIQ has changed dramatically in content and concept and now consists of merely 5 (out of 10) subtests (Similarities, Comprehension, Vocabulary, Block Design and Coding).
Norms updated.
Items added to improve floors and ceilings.
SCALES AND ITEMS: WISC-IV consists of 15 subtests: 10 core-battery subtests and 5 supplemental subtests.
G-loadings: VCI subtests generally have the highest g-loadings at every age, followed by the PRI, WMI and PSI subtests, except Arithmetic, which loads more like VCI.
STANDARDISATION: Sample = 2200 children resembling the 2002 Census data on the variables of age, gender, geographic region, ethnicity and socioeconomic status. The sample was divided into 11 age groups, each containing 200 children, and was split equally between boys and girls.
RELIABILITY:

Average internal consistency:
of subtests: ranges from 0.72 (Coding, for ages 6-7) to 0.94 (Vocabulary, for age 15).
of indexes: VCI = 0.94, PRI = 0.92, WMI = 0.92 and PSI = 0.88.
of Full Scale IQ: FSIQ = 0.97.
Average test-retest coefficients: VCI = 0.93, PRI = 0.89, WMI = 0.89, PSI = 0.86 and FSIQ = 0.93.
Practice effects: In general, practice effects are greatest for ages 6-7 and become smaller with increasing age. Coding and Symbol Search showed the largest gains (ages 6-7).
Floors and ceilings for all WISC-IV subtests are excellent, which means that WISC-IV can be used with confidence in testing individuals who are functioning in either the gifted or the mentally retarded range.
Item gradients refer to the spacing between items on a subtest. Generally these range from good to excellent at all ages in the WISC-IV. This means that the spacing between items is generally small enough to allow for reliable discrimination between individuals on the latent trait measured by the subtest.
VALIDITY:
Structural validity is supported by factor-analytic studies.
Investigations (Keith et al., 2006) of whether the WISC-IV measures the same constructs across its 11-year age span (children aged 6-16) gave positive results. The nature of these constructs was also investigated, and it was concluded that the WISC-IV measures Crystallized Ability, Visual Processing, Fluid Reasoning, Short-Term Memory and Processing Speed.
Good to excellent convergent-discriminant validity when considering VCI and PRI.
OTHER REMARKS: Ipsative/intraindividual interpretations: interpretation/analysis of an individual's profile.

Raven's Progressive Matrices (Article by Raven)
SPM+ / MHV
FACTS:
Brief nonverbal (SPM+) and verbal (MHV) screening measures of general ability.
For use in educational and clinical settings
Group or individual administration
Allows for comparison with peers
SPM+ and MHV can be administered together or on their own
PURPOSE: Developed to assess aspects of the g factor (as described by Spearman, 1927) and its two sub-components, eductive and reproductive ability.

THEORY: Based on the theory of eductive and reproductive intelligence, which corresponds to fluid and crystallized intelligence (ex SPM measures fluid/eductive intelligence, MHV measures crystallized/reproductive intelligence).
DEVELOPMENT: Raven's Progressive Matrices have been in use for more than 70 years. The first series were based on a test used by Spearman.
1938 Standard Progressive Matrices (SPM)
1941 Advanced Progressive Matrices (APM): the most difficult version, for the brightest 20 %
1947 Coloured Progressive Matrices (CPM): for children aged 5-10
The SPM+: Developed to address the Flynn effect, which was very pronounced for Raven's and produced, for example, some strong ceiling effects. The new version of the SPM therefore had the easiest items removed and some harder ones added.
SCALES AND ITEMS:
SPM+: Consists of 5 sets of 12 multiple-choice problems/items (non-verbal stimuli, e.g. visual patterns and shapes) arranged in a cyclical format, i.e. each set starts with a relatively easy and obvious problem and continues with harder and harder items. If administered according to the standard procedure, Raven's thus contains a built-in training programme and also assesses the test taker's ability to learn from experience.
RASCH models: Raven's does not immediately fit Rasch models well, since people can guess the right answer (because of the multiple-choice structure).
MHV: Consists of 2 sets totalling 88 words to be defined by the test taker. In one set the test taker writes the word definitions him/herself, while the other set is multiple-choice.
NORMS AND STANDARDISATION:
o Good norms from 924 children, 7-18 years old, 2008
The SPM+ has been standardised numerous times on different populations. The majority of standardisations have been completed on the Classic Form, from which the SPM+ has been developed.
RELIABILITY:
SPM+ reliability
- Split-half reliability: r = 0.936, n = 924
- Test-retest reliability: r = 0.833, n = 105
SPM+ Standard Error of Measurement and confidence intervals
- SEM = 3.79 (standardised scores)
- 95% confidence interval: ±7 (standardised scores)
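The SPM+ figures above are consistent with the classical formula SEM = SD * sqrt(1 - r). The sketch below is only a check under two assumptions that the notes do not state: that the standardised scores have SD = 15 (IQ-style metric) and that the split-half coefficient (r = 0.936) was the reliability used. The function names are made up for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def ci_half_width(sem, z=1.96):
    """Half-width of a confidence interval around an observed score (z = 1.96 for 95%)."""
    return z * sem

# Assumption: standardised scores with SD = 15; split-half r = 0.936 from the notes.
spm_sem = standard_error_of_measurement(15.0, 0.936)
spm_ci = ci_half_width(spm_sem)

print(round(spm_sem, 2))  # 3.79
print(round(spm_ci))      # 7
```

Under these assumptions the result reproduces the reported SEM of 3.79 and a 95% interval of roughly ±7 standardised points around an observed score.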

MHV reliability
- Test-retest reliability = 0.916
- Parallel forms reliability = 0.929
MHV Standard Error of Measurement and confidence intervals
- SEM = 3.99
- 95% confidence interval: ±8
(Evidence of SPM+ validity also comes from SPM-C data, as the two are similar in form and content.)
IN SUM: GOOD RELIABILITY for the SPM (split-half and test-retest) and the MHV (test-retest and parallel forms). Raven's scores can be converted to IQ scores, where the SEM (Standard Error of Measurement) is about 4 points for both the SPM and the MHV.
VALIDITY:
Content validity
- Item analysis shows that the properties of SPM+ are relatively stable.
- SPM-C has face validity in cross-cultural settings, i.e. its form is not culturally biased.
Criterion-related validity
- Concurrent validation between SPM+ and SPM-C/SPM-P shows a pooled correlation between 0.80 and 0.83.
- In general, concurrent and predictive validity of the SPM-C varies with age, possibly sex, homogeneity of the sample etc.
- Reliable correlations between the SPM-C and the Stanford-Binet and Wechsler scales.
- Correlations between the SPM-C and performance on achievement and scholastic aptitude tests have generally been lower and more variable than correlations with intelligence tests.
- Lower correlations (concurrent validity) between the SPM-C and measures of verbal and language abilities than with measures of maths and science skills.
Construct validity
- Evidence of age-related validity: raw scores increase regularly as children get older.
- Evidence from Item Characteristic Curves shows that the items all measure a common factor and that the abilities required to solve the problems form part of a continuum, i.e. it is generally not possible to solve the more difficult problems without the abilities needed to solve the easier ones (e.g. a Rasch model).
- Raven's is generally described as one of the best measures of g and fluid intelligence (e.g. factor analysis and cross-cultural studies).

- However, some factor analytic studies also suggest that Raven's measures other factors in addition to g. In particular, there has been evidence of a spatial component.
IN SUM:
Content validity
- Item analysis shows that the properties of SPM+ are relatively stable
- Face validity in cross-cultural settings
Criterion validity
- Correlates highly with the Stanford-Binet and Wechsler scales (.54-.86)
- Correlations with achievement, scholastic and occupational measures (lower than correlations with intelligence tests)
Construct validity
- Age-related validity
- Evidence from Item Characteristic Curves shows that the items all measure a common factor and that the abilities required to solve the problems form part of a continuum
- Described as one of the best measures of g and fluid intelligence, with evidence from factor analytic and cross-cultural studies, all revealing high g loadings
ADMINISTRATION:
SCORING:
INTERPRETATION:
OTHER REMARKS: The Flynn Effect. Large rises in mean scores since initial publication (also seen in other psychometric tests, e.g. the WISC). Much work on the Flynn Effect has come from analysis of Raven's Progressive Matrices. Flynn showed that, on average, IQ scores increased by 0.3 IQ points every year and had been doing so throughout most of the 20th century. He argues that the rise is due to the increasing influence of scientific ways of thinking. The Flynn Effect appears to be universal, with similar results being reported in over 14 countries.

360 Feedback (Brett & Atwater, MRG)
OCCUPATIONAL TESTS
FACTS: A process in which subordinates, peers and bosses provide anonymous feedback to managers, who also rate their own performance.
The LEA was not designed to be used as:
- a measure of personality
- a basis for termination
- a direct measure of manager/leader performance

PURPOSE: The profile provides information about the focus person's view of his own leadership role and about how his boss, colleagues and employees perceive his behavior, all in order to increase the organizational effectiveness of the individual leader and the individual management team.
THEORY: Intended to provide developmental feedback, which can improve performance by creating awareness and motivating individuals to change behaviour, e.g. if ratings from others are lower than self-ratings. Leadership "sets" are the theoretical basis for the LEA 360. Def.: A "set" indicates the probability that a leader will behave consistently across a broad range of managerial challenges.
DEVELOPMENT: Based on empirical studies of leaders and leadership behavior. A role-based behavioral analysis. Originally 35 leadership sets, later reduced to the current 22 sets.
SCALES AND ITEMS: The profile is a web-based analysis of behavior that is conducted among the manager himself, his boss, peers and direct reports. It is measured and analyzed in the following main areas:
- Creating a vision
- Developing followers
- Implementing the vision
- Following through
- Achieving results
- Team playing
VALIDITY AND RELIABILITY:
Test-retest: Average test-retest reliability of 0.77-0.80.
Inter-rater: Extensive inter-rater reliability studies used the ratings of 1068 bosses, 2592 peers and 2544 direct reports. Intra-class correlation coefficients were used to assess inter-rater reliability.
- Boss ratings: coefficients ranged from 0.58 (2 raters) to 0.80 (4 raters).
- Peer ratings: coefficients ranged from 0.67 (4 raters) to 0.80 (8 raters).
- Direct report ratings: coefficients ranged from 0.66 (4 raters) to 0.79 (8 raters).
Internal consistency measures were not conducted, as they are not appropriate for the (semi-ipsative) format used in the LEA.
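The inter-rater coefficients above rise as more raters are averaged. A rough way to see why is the Spearman-Brown formula, sketched below. This is an illustration only: the notes say intra-class correlations were used, not that Spearman-Brown was applied, and the reported values are ranges across leadership sets, so the projection is not expected to reproduce them exactly. The function names are hypothetical.

```python
def single_rater_reliability(r_k, k):
    """Invert Spearman-Brown: reliability of one rater, given reliability r_k of the mean of k raters."""
    return r_k / (k - (k - 1) * r_k)

def spearman_brown(r_1, k):
    """Spearman-Brown: reliability of the mean of k raters, given single-rater reliability r_1."""
    return k * r_1 / (1 + (k - 1) * r_1)

# Starting point taken from the notes: boss ratings, 0.58 with 2 raters.
r_one_boss = single_rater_reliability(0.58, 2)

# Projection (assumption: raters are interchangeable) for 4 boss raters.
r_four_bosses = spearman_brown(r_one_boss, 4)

print(round(r_one_boss, 2), round(r_four_bosses, 2))
```

The projected value for 4 raters is higher than the 2-rater coefficient, which matches the direction of the reported figures for all three rater groups.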