DIGHUMLAB IDENTIFICATION OF SERVICES AND RESPONSIBILITIES


By Birte Christensen-Dalsgaard, DIGHUMLAB
Draft version 1.0 (indlejring_servicedefinition_udkast), April 20, 2016

Version history

Version no. | Date | Author | Status | Changes
1.0 | 2016-04-21 | BCD | Draft for theme leaders |

CONTENTS

Introduction
Characteristics of the elements of DIGHUMLAB
Characteristics of tools and services being part of DIGHUMLAB
Characteristics of collections of digital objects being part of DIGHUMLAB
Characteristics of tutorials being part of DIGHUMLAB
Characteristics of experts' contribution to DIGHUMLAB
Existing tools and services in DIGHUMLAB
DIGHUMLAB datasets
DIGHUMLAB tutorials, courses and workshops
Online tutorial/case studies
Tutorials/cases under preparation
Workshops
Workshops/courses under preparation
International activities
Appendix 1: Tools
Clarin
CMDI metadata & PID workflow (internal tool)
Metadata Updater (internal tool)
CMDI Component Registry Editor front-end web application (CLARIN)
CMDI toolkit (CLARIN)
Incorporated tools
Larm
CHAOS technology stack
CHAOS structure
CHAOS diagram
CHAOS network
CHAOS API
Running of CHAOS and LARM
Statistics & logging of LARM
Documentation and community
The data for Larm
Data: Radio/TV-samlingen, SB
Tool: Mediestream


INTRODUCTION

In the profile paper <1>, DIGHUMLAB (DHL) is defined as an ecosystem sustained by a partnership among institutions with a shared vision of stimulating humanities and social science through a shared infrastructure, DIGHUMLAB, supporting the work with and on digital objects. Quoting from the profile paper:

DIGHUMLAB is a digital ecosystem advancing research in digital humanities and consisting of:

[Figure: the digital ecosystem and its areas: Objects, Tools, Practice, Tutorials, Community, Experts]

A tool area consisting of
- Software (preferably open source) with well-described APIs and with metadata for discovery
- Selected external (and internal) services, which can be invoked according to standards
- Capture tools

A digital object area consisting of
- Licensed data and their associated metadata
- Open data adhering to standards and with metadata for discovery (preferably adhering to the standards behind Linked (Open) Data)
- Proprietary data with restricted access

Tutorials consisting of
- Video/demonstrators illustrating the use
- Workflow descriptions
- Online material for workshops and course material
- Training courses

A shared policies and practice area consisting of
- Information on ethical questions (DIGETIK)
- Network for best practice in rights and privacy (find correct title)

A community area of
- Special interest groups addressing emerging themes
- Participation in national networks
- Communities on the Web and on Facebook
- Collaboration through international fora

And finally, experts agreeing to support the research (and educational) community through activities such as
- A technical helpdesk
- Advice on similar problems (e.g. purchase of video capture tools)

The individual cells of the ecosystem will be grouped according to properties such as the nature of the dataset, the research activity supported by the tool and the underlying technique used.

An additional concept introduced in DIGHUMLAB is "in context": no tool and no dataset will be exposed without being set in a context, which can happen via a case story, via a paper based on the dataset and/or tools, or via a description of a workflow. This connectivity in context can be illustrated as follows:

[Figure: connectivity in context]

The objective of this paper is to clarify the definitions in four areas in particular: tools, objects, tutorials and experts. The paper focuses on definitions and descriptions, often by identifying the boundaries to connected activities.

CHARACTERISTICS OF THE ELEMENTS OF DIGHUMLAB

Focusing on the four circles (tools, objects, tutorials and experts), I list below the (some?) requirements which need to be satisfied. These all relate to the overall idea of tools being applicable for more research groups, of presenting information in context and of strong support for usage.

CHARACTERISTICS OF TOOLS AND SERVICES BEING PART OF DIGHUMLAB

In the following I use this distinction between tools and services:

Tools are computer software (programs, apps) or hardware (such as cameras), which can be used standalone (like Word) or be initiated/connected in/via a software program (like Gephi in a JavaScript programme). Often, but not always, a tool acts on data and produces data or visualisations (incl. video) in a new form.

Services are deployed tools which have been combined to serve one or more specific tasks and are often predefined to act on specific objects, producing data or visualisations in a new form.

Examples of tools are Word, HTTrack, a lemmatiser, a camera, software to control a camera. Examples of services are Google Docs and Larm.

A number of conditions need to be satisfied in order for tools and services to be part of the DIGHUMLAB ecosystem. These relate to relevance, documentation, ease of re-use and broader interest.

For tools the following always apply:
- Discoverable
  o Tools should be described using the TaDiRAH taxonomy, and their applicability must be placed in the research life cycle
- Easy to use
  o There must be a clear description of how to use the tool
  o If possible and relevant, there should be a well-described API
  o All elements must be part of the helpdesk infrastructure
- In context
  o The usage of tools should be addressed in a course, via a tutorial or as part of a case study
- Well-defined responsibility for maintenance
  o All elements must have well-described procedures for running and maintenance
  o The use of tools, services and digital objects should be monitored, and there must be access to information on usage
- Reusable
  o Be useful for a broader community (i.e. not specifically developed for a single researcher's special needs)

For open source tools we have additionally:
- Should adhere to an open source license
- Should be posted (e.g. on GitHub or SourceForge) together with a readme file
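The requirements listed above lend themselves to a simple checklist. The following is a minimal sketch, not part of DIGHUMLAB itself: the `ToolRecord` class, its field names and the `missing_requirements` function are hypothetical names introduced here only to illustrate how a tool description could be checked against the conditions before it is admitted to the ecosystem.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """Hypothetical description of a tool proposed for DIGHUMLAB."""
    name: str
    tadirah_terms: list = field(default_factory=list)  # TaDiRAH labels
    usage_description_url: str = ""   # where the "how to use" text lives
    in_helpdesk: bool = False         # covered by the helpdesk infrastructure
    context_refs: list = field(default_factory=list)   # course, tutorial, case study
    maintenance_procedure: str = ""   # who runs and maintains it, and how
    usage_monitored: bool = False     # usage information is collected
    broad_interest: bool = False      # useful beyond one researcher's needs
    open_source: bool = False
    license: str = ""                 # only checked for open source tools
    repository_url: str = ""          # only checked for open source tools


def missing_requirements(tool: ToolRecord) -> list:
    """Return the names of the requirements the record does not yet meet."""
    checks = {
        "discoverable (TaDiRAH description)": bool(tool.tadirah_terms),
        "easy to use (usage description)": bool(tool.usage_description_url),
        "easy to use (helpdesk)": tool.in_helpdesk,
        "in context (course, tutorial or case study)": bool(tool.context_refs),
        "maintenance procedure": bool(tool.maintenance_procedure),
        "usage monitoring": tool.usage_monitored,
        "reusable (broader community)": tool.broad_interest,
    }
    if tool.open_source:
        # The two additional conditions for open source tools.
        checks["open source license"] = bool(tool.license)
        checks["posted in a public repository"] = bool(tool.repository_url)
    return [name for name, ok in checks.items() if not ok]


# Illustrative values only; they do not describe an actual DIGHUMLAB entry.
lemmatiser = ToolRecord(
    name="Example lemmatiser",
    tadirah_terms=["Annotating"],
    usage_description_url="https://example.org/lemmatiser/howto",
    in_helpdesk=True,
    open_source=True,
)
print(missing_requirements(lemmatiser))
```

A record that returns an empty list meets every condition named above; anything else gives the tool owner a concrete list of what is still missing.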

Services developed as part of DIGHUMLAB:
- Services should be open and free to use for all (if commercial rights permit)
- The usage of the services should be addressed in a course, via a tutorial or as part of a case study
- All elements must be part of the helpdesk infrastructure
- All elements must have well-described procedures for running and maintenance
- The use of tools, services and digital objects should be monitored, and there must be access to information on usage

CHARACTERISTICS OF COLLECTIONS OF DIGITAL OBJECTS BEING PART OF DIGHUMLAB

Digital objects have a slightly different role than tools. Objects are often grouped in collections (such as the radio archive, the newspaper collection, the map collection etc.), and the relation is to the collection, not to the individual elements. The collection must be usable via some of the tools that are part of DIGHUMLAB. Furthermore, the following requirements should be satisfied:
- The datasets should be open to as many as possible (access may be restricted by copyright and privacy)
- The metadata and the form of the object should adhere to standards
- Access to the metadata and the objects should adhere to international standards for exchange (e.g. OAI-PMH) or be well defined via an API for using the data
- The license for use must be defined (Creative Commons)
- Access to and use of the objects should be supported by the helpdesk infrastructure
- The use of digital objects should be monitored, and there must be access to information on usage

CHARACTERISTICS OF TUTORIALS BEING PART OF DIGHUMLAB

The tutorials that are part of DIGHUMLAB all relate directly to the use of tools or to how data are used (and as such require a tool). General educational tutorials and workshops, which address DIGHUMLAB in general, are as such not part of DIGHUMLAB unless they relate to tools or data that are part of DIGHUMLAB.

For tutorials the following apply:
- Online tutorials are freely available for all
- Workshops and in-person consultation are only available for members of the DIGHUMLAB consortium
- The tutorial must be announced via the DIGHUMLAB website
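The exchange requirement for collections (metadata access via a standard such as OAI-PMH) can be made concrete with a small sketch. The code below builds a `ListRecords` request and reads identifiers, Dublin Core titles and the resumption token from a response. The base URL `https://example.org/oai` and the sample response are assumptions made for illustration; they do not describe an actual DIGHUMLAB endpoint, and a real harvester would also fetch the URL and handle protocol errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = next((e.text for e in rec.iter(DC_NS + "title")), None)
        records.append({"identifier": identifier, "title": title})
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, (token or None)


# Hand-written sample response, used instead of a network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample text resource</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(list_records_url("https://example.org/oai"))
print(records, token)
```

A harvester would repeat the request with the returned token until the response no longer carries one, which is how OAI-PMH pages through large collections.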

CHARACTERISTICS OF EXPERTS' CONTRIBUTION TO DIGHUMLAB

Two categories of experts form an important part of DIGHUMLAB:
- DH knowledge: advising on how tools are used to address Digital Humanities research questions or how they can be used in Digital Humanities teaching.
- IT knowledge: supporting the use of tools and acting as IT developers on specific research questions.

The conditions for using the services are:
- The research question makes use of either a DIGHUMLAB digital object collection or DIGHUMLAB tools or services
- The service is available for all members of the DIGHUMLAB collaboration

The experts will be organised via a knowledge network accessible via the DIGHUMLAB website or via the different centres' own websites. As experts support researchers and students from all participating universities, a proper digital communication tool should be in place. Experts (especially IT developers) can be affiliated with projects for longer periods. The organisation of this offer needs to be clarified.

EXISTING TOOLS AND SERVICES IN DIGHUMLAB

DIGHUMLAB offers access to four services, which support building corpora consisting of sections of digital objects and refining and annotating these. These are:

The CLARIN workbench supporting:
- Common search and retrieval among objects deposited in Clarin.dk (?)
- Support for authentication (WAYF)
- Building of selected corpora (basket)
- Access to tools to annotate selected corpora (?)

The Clarin workflow planner contains tools to transcribe and annotate resources, which can be combined to form workflows. The workflow planner is implemented as a web service.

The workflow planner allows users to use the following tools (the individual tools are described in more detail in appendix 1):
  o OCR tools (CuneiForm, Tesseract-OCR)
  o Text extraction/conversion tools such as PDFMiner, LibreOffice, html2text, CST's RTFreader and Flat text to CBF converter
  o Linguistic annotation (CST's Name recogniser, OpenNLP tools PosTagger, Brill's PoS tagger, CST-Lemmatiser, Bohnet's parser)
  o Conversion: CST's paragraph and sentence segmenter for Danish and English, TEI P5 segmenter, TEI P5 tokeniser/sentence extractor, CoNLL converter, espeak

The LARM workbench/platform with access to Audio/Video/Television/Newspapers and supporting:

- common search and retrieval
- support for selecting and sharing corpora via projects
- support for authentication (WAYF)
- support for playing the multimedia objects
- support for annotating the multimedia objects

The metadata resides on the LARM platform; the AV material resides at SB. The connection to the objects uses SB's Mediestream, which does not have a well-defined API.

The Netlab workbench for work with material from webarkivet (SB and KB):
- support for retrieval based on URL and on search
- support for selecting a corpus
- support for word analysis (Shine, an n-gram type of tool)?

DIGHUMLAB has two videolabs, one in Aalborg (VILA) and one in Kolding (name?). These are equipped as follows:

VILA:
- Xxx
- yyy

Kolding:
- xxx
-

DIGHUMLAB DATASETS

Most of the data is open access; however, a fair amount of the contemporary material is closed, as privacy hinders open access to the material.

CLARIN

The collection consists of:
- The DK-CLARIN Language for Special Purposes (LSP) corpus, which consists of texts from seven selected domains. It comprises 11 M tokens from the period 2000-2010, complementing the existing Danish general language corpora. A description of the corpus and its making can be found at http://cst.ku.dk/resurser/fagsprogligtkorpus/dkclarin_lspcorpus_documentation_01012013.docx. The size and the basis for the collection are shown below:

Domain (sources) | Number of words
Health 1 (netpatient.dk, Søfartsstyrelsen, Sundhedsstyrelsen, regionh, Libris, Aktuel Naturvidenskab) | 2,087,183
Health 2 (sundhed.dk) | 3,003,409
Agriculture (Danmarks JordbrugsForskning) | 2,376,029
Environment (Hovedland, Danske Miljøundersøgelser, Det økologiske Råd, Aktuel Naturvidenskab (via DMI)) | 1,460,644
Economy (SKAT, Finanstilsynet, Erhvervs- og Selskabsstyrelsen) | 1,351,169
IT (Libris, Open Office, Aktuel Naturvidenskab) | 1,098,587
Construction (Statens Byggeforskningsinstitut, Erhvervs- og Byggestyrelsen, Murerfagets Oplysningsråd) | 577,392
Nanotechnology (iNANO (Interdisciplinary Nanoscience Center, AU), Nano (DTU), Niels Bohr Institutet, Forskningscenter Risø, Ministeriet for Sundhed og Forebyggelse (via DTU), Miljøstyrelsen, Aktuel Naturvidenskab) | 358,144

- Other text collections (7,725 files)
- A series of three different material types related to
  o Talk bank conversations (7 sound files, 13 video files and 21 annotated collections)
  o Interviews with young students (14 sound files, 46 video files and 16 annotated collections)
  o Conversation among students (16 sound files, 21 video files and 4 annotated collections)
- Structured data (owl, csv)
- Jydsk ordbog (TEI P5)

Access/API etc.
All the data in Clarin are stored according to the Clarin license. All metadata can be harvested via OAI-PMH.

Netarchive

Content:
The Netarchive has harvested the Danish Internet since 2005, when the Danish Legal Deposit Law was changed to include this type of material. The task is undertaken by the two legal deposit libraries in Denmark: the State and University Library, Aarhus, and the Royal Library in Copenhagen. The Netarchive contains more than 10,000,000,000 documents.

Access/API:
All data are protected, and access requires an application.

Radio- and television collection

Content

The State and University Library, Aarhus, hosts the national media collection. Due to the library's focus on digitisation, it now contains close to 2 million programmes:
- Continuous harvesting of radio and television since Jan. 1, 2006
- Retrodigitisation of television (MPEG-1, MPEG-2, H.264)
- Retrodigitisation of DR radio tapes from approx. 1920 (in collaboration with DR) (WAV, BWF, mp3)
- 52,000 commercials

Access:
For material newer than 100 years, access requires that the home institution has paid for the Copydan AV package. Access to other material is free (which license?). Access is provided via Mediestream, a portal developed by the State and University Library. No API is provided. Metadata may be exchanged via agreement (which is the case for Larm.fm).

DIGHUMLAB TUTORIALS, COURSES AND WORKSHOPS

Each of the themes has developed, or is in the process of developing, tutorials and courses. The experience up to now has been that few will sign up for scheduled courses; therefore, most of these are on-demand, where the exact content will be tailored to the specific situation. The following on-demand courses and workshops are available:

ONLINE TUTORIAL/CASE STUDIES

Each entry lists, where given: topic, name of tutorial, reference to online material, name of digital objects, name of tool(s), and name of expert/responsible institution.

- Topic: Basic work with corpora
  o Tutorial: Kom godt i gang (Getting started)
  o Online material: http://info.clarin.dk/kom-godt-igang/soeg-i-tekst/
  o Digital objects: Saxos Gesta Danorum; Den ældste danske viseoverlevering
  o Expert: Senior adviser Claus Povlsen, cpovlsen@hum.ku.dk; info@clarin.dk

- Topic: Use in humanities research
  o Tutorial: Eksempler på brug (Examples of use)
  o Online material: http://info.clarin.dk/showcases/
  o Digital objects: DanNet (and FinnWordNet, TEKsaurus, Swesaurus, plwordnet)
  o Tool(s): Wordties
  o Expert: Professor Bolette Sandford Pedersen, bspedersen@hum.ku.dk

- Topic: Depositing texts
  o Tutorial: Nemmere deponering af tekster i TEI-format (Easier depositing of texts in TEI format)
  o Online material: https://clarin.dk/clarindk/toolsupload.jsp: select "Klargøring tekstresurser" (the URL will change)
  o Tool(s): Nem deponering (Easy depositing)
  o Expert: ?????

- Case: Assyrian texts: a test case on the clarin.dk platform
  o Description: A standard TEI encoding (XML) was created for a sample set of Assyrian transcriptions. The custom TEI encoding used elements not typically required or used in normal text resources on clarin.dk; this test case therefore required a custom transformation (XSL) to produce text and symbol/indicator output on the resource webpage. In addition, a relational link was provided to a corresponding translation of the Assyrian text. The transcription and the translation were made viewable in the same screen/interface, with text aligned on line segments.
  o Source code: clarin.dk (not publicly available)
  o Expert: Bart Jongejan, Dorte Haltrup Hansen

- Topic: Interface for reading/understanding Latin
  o Tutorial: Locus Classicus
  o Online material: http://www.locus-classicus.org; introductory video: http://www.locusclassicus.org/#introvideo
  o Description: Source texts, commentary notes and translations are converted into TEI/XML files, imported and integrated with a front-end web application built with Meteor.js (JavaScript) and using an open-source project, eXist-db, as the XML database/RESTXQ backend.
  o Source code: https://git.sc.ku.dk/kuhumcst/latin-read-demo (KU only)

- Topic: Use of web archives
  o Tutorial: Course material for workshop and list of tools
  o Online material: http://www.netlab.dk/wpcontent/uploads/2015/08/forskerbrug-af-webarkiver-en-kortindfoering.pdf
  o Digital objects: Netarkivet (KB+SB); Internet Archive
  o Tool(s): Wayback Machine
  o Expert: nb@cc.au.dk

- Topic: Basic editing of video
  o Tutorial: Final Cut Pro X Basics
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/video/final-cut-pro-xbasics/
  o Tool(s): Final Cut Pro X
  o Expert: Academic staff members Max Roald Eckardt (mrec@sdu.dk) and Julia Ruser (jurus@sdu.dk)

- Topic: Basic editing of video
  o Tutorial: Premiere Pro Basics
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/video/premiere-pro-basics/
  o Tool(s): Premiere Pro
  o Expert: Academic staff members Max Roald Eckardt (mrec@sdu.dk) and Julia Ruser (jurus@sdu.dk)

- Topic: Document
  o Tutorial: InDesign
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/document/indesign/
  o Tool(s): InDesign
  o Expert: Julia Ruser (jurus@sdu.dk)

- Topic: Transcription
  o Tutorial: CLAN
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/transcription/clan/
  o Tool(s): CLAN
  o Expert: Julia Ruser (jurus@sdu.dk)

- Topic: Transcription
  o Tutorial: ELAN
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/transcription/elan/
  o Tool(s): ELAN
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

- Topic: Transcription
  o Tutorial: PRAAT
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/transcription/praat/
  o Tool(s): PRAAT
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

- Topic: Recording
  o Tutorial: GOPRO
  o Online material: https://mobilelabtutorials.wordpress.com/mobilelabs/recording/gopro/
  o Tool(s): GOPRO
  o Expert: Academic staff members Max Roald Eckardt (mrec@sdu.dk) and Julia Ruser (jurus@sdu.dk)

- Tutorial: Transana
  o Online material: http://www.vila.aau.dk/resources/guides%2c+resources+and+references/
  o Tool(s): Transana
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

TUTORIALS/CASES UNDER PREPARATION

- Topic: Use of NLP tools
  o Tutorial: Annotering af tekster (Annotation of texts)
  o Deadline for completion: Summer 2016
  o Expert: Sussi Olsen

- Topic: Text analytics
  o Tutorial: Basal Text Analytics (Basic text analytics)
  o Deadline for completion: September 2016
  o Expert: Lene Offersgaard, Dorte Haltrup Hansen

- Topic: Introduction to LARM
  o Tutorial: Kom i gang med Larm (Getting started with Larm)
  o Deadline for completion: Summer 2016
  o Digital objects: Radio and television from SB's media archive
  o Tool(s): The LARM platform
  o Expert: Iben Have

WORKSHOPS

- Topic: Use of web archives
  o Workshop: NetLab workshop om webarkivering (NetLab workshop on web archiving)
  o Course description: http://www.netlab.dk/wpcontent/uploads/2016/02/workshop_brochure_v3.pdf
  o Course material: http://www.netlab.dk/wpcontent/uploads/2015/08/forskerbrug-af-webarkiver-en-kortindfoering.pdf
  o Digital objects: Netarkivet (SB+KB); Internet Archive
  o Tool(s): Wayback Machine, Web Snapper, Paparazzi, Video Download Helper, Musicbox, WireTap Studio, Videobox, SnagIT, HTTrack
  o Expert: nb@cc.au.dk

- Topic: TEI workshop
  o Workshop: Workshop om brug af TEI, nr. 3 (Workshop on the use of TEI, no. 3)
  o Online material: To be held on 3 May; the programme is almost ready and will be posted at http://info.clarin.dk/kurser/
  o Expert: Lene Offersgaard

- Topic: Video based interaction analysis
  o Workshop: Video-Based Interaction Research: Technical course in data collection, video editing, transcription and sharing
  o Online material: http://www.kommunikation.aau.dk/arrangementer/arrangement/video-based-interactionresearch--technical-course-indata-collection--video-editing--transcription-andsharing.cid197927
  o Digital objects: Video
  o Tool(s): ELAN, CLAN, Transana, Adobe Premiere Pro
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

- Topic: Multimodal video analysis
  o Workshop: Multimodal video analysis
  o Online material: http://www.kommunikation.aau.dk/arrangementer/arrangement/research-seminar--multimodal-video-analysis.cid176466
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk), Professor Pirkko Raudaskoski

- Topic: Adobe Premiere
  o Workshop: Adobe Premiere
  o Online material: http://www.kommunikation.aau.dk/arrangementer/arrangement/adobe-premiereworkshop.cid165628
  o Tool(s): Adobe Premiere
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

- Topic: Ethnomethodology
  o Workshop: Seminar with Ken Liberman
  o Online material: http://www.kommunikation.aau.dk/arrangementer/arrangement/seminar-with-kenliberman.cid165591

- Topic: ELAN
  o Workshop: Introduktion til ELAN (Introduction to ELAN), 2nd semester, Kommunikation og Digitale Medier (Aalborg), 2015 and 2016
  o Tool(s): ELAN
  o Expert: Assistant Professor Jacob Davidsen (jdavidsen@hum.aau.dk)

WORKSHOPS/COURSES UNDER PREPARATION

- Topic: CLARIN Digital Humanities
  o Workshop/course: Digital Humanities seminar with special focus on CLARIN
  o Deadline for development: Summer 2016
  o Expert: Bente Maegaard, Lene Offersgaard, and DeIC

- Topic: Text analytics
  o Workshop/course: Workshop: Introduction to text analytics
  o Deadline for development: August 2016
  o Expert: Lene Offersgaard

- Topic: Old newspapers and language technology
  o Deadline for development: proposed November 2016
  o Digital objects: Newspapers from SB
  o Expert: KU and SB (to be agreed)

- Topic: Linguistic studies in the NetLab corpus (workshop)
  o Deadline for development: proposed December 2016
  o Digital objects: Netarkivet
  o Expert: KU theme 1 and NetLab

- Topic: Working with AV material
  o Deadline for development: Q4
  o Digital objects: Radio/TV from SB
  o Tool(s): Larm
  o Expert: Iben Have

INTERNATIONAL ACTIVITIES

DIGHUMLAB is the Danish partner in DARIAH and CLARIN and participates actively in IIPC and TELEARC. Below is a matrix of international relations. Each entry gives the organisation/initiative/project and the activity, followed by the Danish organisation and participant, how the Danish initiative/DIGHUMLAB contributes, and how the initiative contributes to DIGHUMLAB/Danish Digital Humanities.

International, scientific organisations

CLARIN ERIC
- National Coordinators' Forum (KU, Bente Maegaard)
  o DIGHUMLAB contributes: Contribute to development and decision-making
  o Contributes to DIGHUMLAB: Knowledge about what happens in other countries, possibilities for collaboration
- General Assembly (KU, Bente Maegaard)
  o Contributes to DIGHUMLAB: Relevant information on different activities
- Standing Committee for CLARIN Technical Centres (KU, Lene Offersgaard)
  o Contributes to DIGHUMLAB: Updates on activities in CLARIN countries about centres and technical collaboration in CLARIN ERIC
- CLARIN Standards Committee (KU, Claus Povlsen)
  o DIGHUMLAB contributes: Contribution to new developments within standards and formats
  o Contributes to DIGHUMLAB: Updated knowledge about standards and formats
- CLARIN Legal Issues Committee (KU, Sussi Olsen)
  o DIGHUMLAB contributes: Contribute to clarification and development
  o Contributes to DIGHUMLAB: Updated knowledge about licenses and IPR
- CLARIN Assessment Committee (KU, Lene Offersgaard)
  o DIGHUMLAB contributes: Assessment of data centres
  o Contributes to DIGHUMLAB: Detailed knowledge about European data centres with focus on language
- Board of Directors (KU, Bente Maegaard)
  o DIGHUMLAB contributes: Knowledge about RI, governance, measuring progress, etc.
  o Contributes to DIGHUMLAB: Deep insight into RI, CLARIN ERIC, ERICs etc.

DARIAH.EU ERIC
- National Coordinators Committee (AU, Marianne Huang)
  o DIGHUMLAB contributes: Contribution to development and decision-making, reporting on DARIAH-DK in-kind
  o Contributes to DIGHUMLAB: Creating visibility for DK communities of practice and liaising for DK collaboration, specifically on AV, Cultural Big Data, creative impact
- General Assembly (AU, Marianne Huang)
  o DIGHUMLAB contributes: Strategic knowledge on DK-EU collaborations, practice and national in-kind for national representative
  o Contributes to DIGHUMLAB: Relevant knowledge on European strategies
- Joint Research Committee (AU, Marianne Huang)
  o DIGHUMLAB contributes: Strategies for cross-section collaboration
  o Contributes to DIGHUMLAB: Knowledge on cross-section collaboration
- DARIAH Research & Education, co-chair (AU, Marianne Huang)
  o DIGHUMLAB contributes: Knowledge specifically on AV and Cultural Big Data as well as European educational organisations and frameworks, coordinating working groups and dissemination of services
  o Contributes to DIGHUMLAB: Project collaboration for communities of practice (dariahteach, Dariah Open Humanities, DARIAH Humanities at Scale)
- DARIAH Advocacy: impact (AU, Marianne Huang)
  o DIGHUMLAB contributes: Furthering DK impact policy of DH in education and industrial partnerships (creative industries)
  o Contributes to DIGHUMLAB: Creating visibility for DK communities of practice and liaising for DK collaboration, specifically on AV, Cultural Big Data, creative impact

Europeana
- Europeana Research Advisory Board (AU, Marianne Ping Huang)
  o DIGHUMLAB contributes: Knowledge of collaboration between research communities and GLAM, as well as knowledge of industry brokering in the cultural creative domain
  o Contributes to DIGHUMLAB: Visibility for DK projects and communities as well as collaboration on building transnational ecosystems
- Task force for Smart Cities (AU, Marianne Ping Huang)
  o DIGHUMLAB contributes: Knowledge on research collaboration in citizens' participation on iculture as well as open data impact on creative neighbourhoods. Liaising with European Capitals of Culture
  o Contributes to DIGHUMLAB: Visibility for DK projects and communities as well as collaboration on building transnational ecosystems, specifically for open cultural big data and iculture

Hydra
- Open source community (KB, Anders Conrad)
  o DIGHUMLAB contributes: Development of digital repository solutions for TEI documents and scanned images. Attending community activities
  o Contributes to DIGHUMLAB: Repository and related technologies, as well as an international knowledge network comprising research libraries, media and cultural institutions

IIIF
- Community around standards for interoperability of images (KB, Anders Conrad)
  o DIGHUMLAB contributes: Digital images from KB exposed through the IIIF protocol for use in humanities research
  o Contributes to DIGHUMLAB: APIs for image exchange and interoperability, developed in an international consortium of national and university libraries

TELEARC
- President of TELEARC (AAU, Lone Dirckinck-Holmfeld)
  o DIGHUMLAB contributes: DIGHUMLAB should take the lead in order to establish an ESFRI infrastructure within TEL (Technology Enhanced Learning)
  o Contributes to DIGHUMLAB: TELEARC gathers leading TEL labs in Europe. TELEARC can act as the basic organisation for establishing an ESFRI infrastructure. Through the consortium, a joint Marie Curie application was worked on last year, and I believe they still hold an annual meeting, maintain online resources and award a TEL PhD prize

Justice through education in the Nordic countries (JustED) affiliated AAU Kathrin Otrell-Cass exchange information about storage of videos Nordic Network of Interaction Studies on Communication Impairment (NISCI) Digital Humaniora i Norden board member AAU Pirkko Raudaskoski Attending workshops on the topic Information and publications Initiator KU, AAU Bente Maegaard, Lone Dirckinck-Holmfeld ADHO SIG member AU Marianne Ping Huang IIPC Open Source Wayback Machine KB, SB, AU Education/training/conferences Ulrich Have member of the IIPC working group on Open Wayback; Bjarne - Anders - KB/SB involved Initiated network, active in planning of first conference Ulrich contributes knowledge from NetLab that is used in preparing documentation for Open Wayback; from the research side we contribute state-of-the-art research-based and research-infrastructure-related input to IIPC. NetLab is among the few environments internationally that are well advanced in developing research infrastructure for web archive research, based on close collaboration between researchers and web archives; this is in itself a significant contribution to the international community Support knowledge exchange between Nordic countries IIPC is the leading international forum for work on web archives, primarily from a technical angle, but IIPC increasingly also involves the research communities working with the archived web. NetLab gains invaluable knowledge of the latest international technical and research developments, as well as a network and a forum in which to present our work, both technical and research-related.

European Advanced Workshop on Mobility and Social Interaction (MOBSIN) The Networked Learning Conference Relation to other projects founding member AAU Paul McIlvenny Arrange workshops on relevant topics Co-chair AAU Thomas Byberg Arrange conference Information from workshops BUDDAH PARTHENOS Academic consultant seconded member of consortium AU Niels Brügger NB has contributed to the project with theoretical and methodological knowledge of, and experience with, using the archived web in research projects KU, KB Bente Maegaard, Anders Conrad, Lina Henriksen, Sussi Olsen, Bart Jongejan, Claus Povlsen BM works with international collaboration, LO works with data management, BJ works with tool integration NetLab has gained an invaluable network into an internationally leading community of researchers and web archives, and several of the technologies developed in BUDDAH, primarily free-text search (Shine), have subsequently been introduced and further developed in the Danish Netarkivet New opportunities for international collaboration Language Technology Observatory CLARIN-PLUS Nordic CLARIN Network Digital Humaniora i Norden (association) seconded through CLARIN ERIC UCPH is consortium partner UCPH is coordinator member of interim Board KU KU KU Bente Maegaard, Lina Henriksen, Sussi Olsen, Claus Povlsen Bente Maegaard, Lina Henriksen, Sussi Olsen, Claus Povlsen Bente Maegaard and CLARIN-DK bring CLARIN ERIC and language resource knowledge to industry contribute to the development of CLARIN ERIC Meet Nordic colleagues, exchange ideas, methods, tools, resources KU Bente Maegaard DIGHUMLAB ideas about Digital Humanities better understanding of industry needs more sustainable CLARIN ERIC Meet Nordic colleagues, exchange ideas, methods, tools, resources Follow and influence Digital Humanities in Nordic countries and beyond, be visible

Vetenskapsrådet (SE), committee on infrastructure for the humanities, social sciences and health. Member (the only international one). KU, Bente Maegaard. Knowledge about RI for humanities. Knowledge about RI for humanities and social sciences, incl. health, new developments.

A Research Infrastructure for the Study of Archived Web, RESAW. Initiator, coordinator. AU, Niels Brügger. RESAW had not been 'invented' when DIGHUMLAB was launched; it is rather an international offshoot of it. RESAW aims to build a transnational European research infrastructure for research on national web archives. NB was a co-initiator of the project and has coordinated the work since 2013; it will result in a Horizon 2020 research infrastructure application to be submitted in March 2016. In addition, DIGHUMLAB contributed support to the international conference that the RESAW group organised in Aarhus last year, and NetLab has supported several of the group's seminars and workshops. The RESAW group has been invaluable for the work of NetLab: 1) it has served as a unique international network in which the leading European national web archives, international and European researchers working on web archives and, not least, IT developers have been able to meet in countless settings (talks, joint conference panels, researcher meetings, workshops, conferences...); 2) it has formed the core of the consortium behind the Horizon 2020 application that is about to be submitted; 3) it has been a springboard for NetLab's participation in large national research projects in other countries (BUDDAH, De #jesuischarlie à #offenturen).

De #jesuischarlie à #offenturen: archives et archivage du patrimoine nativement numérique face aux attentats. Researcher. AU, Niels Brügger. The project focuses on the use of web archives in connection with suddenly occurring societal events (the shootings in Paris). NB, together with Eld Zierau (KB) and Ditte Laursen (SB), participates as IT specialist and as researchers. NetLab gains access to a very interesting project, which we expect can contribute to NetLab in terms of both research and research infrastructure.

Political influence ECIU ESFRI Strategic Working Group Social and Cultural Innovation Member (more roles) AAU Vice-chancellor actively involved in shaping and carrying out AAU's lobbying efforts towards the EU Delegate for DK KU Bente Maegaard Knowledge about RI for humanities, knowledge about DK DHL and the various research groups influence the EU framework programme and ESFRI through active efforts to shape the infrastructure calls in H2020, which may in the longer term secure European funding Knowledge about RI for humanities and social sciences, new needs, new policies. Influence on the roadmap H2020 reference group Member AU Johnny Laursen Nordiske Humanistiske Dekanforum CreoDK Member (more roles) Member (more roles) KU Dekanatet, KU Presentation of DHL, opportunities and development potential between the Nordic universities, and a Nordic focus on the area. Backing from the Nordic deans for the Nordic organisation and increased collaboration in the area KU Afdelingen for Strategi og forskningsstøtte actively involved in shaping KU's lobbying efforts towards the EU

APPENDIX 1: TOOLS

CLARIN
Prepared by CST (Bente Maegaard, Lene Offersgaard, and others)

CMDI METADATA & PID WORKFLOW (INTERNAL TOOL)
A custom workflow tool that processes existing resources (items) on clarin.dk (eSciDoc/Fedora Commons repository) to create a new, valid CMDI (Component Metadata Infrastructure) metadata instance according to a valid CMDI profile schema, and to apply PIDs (Persistent Identifiers) to the content data and the current version release of the resource. The tool is written in Node.js JavaScript; the source code is available from a Git repository and released under an MIT license. It uses existing REST services, including the eSciDoc PID Manager REST service (production) and the eSciDoc middleware Item REST service (production), to create PIDs and to apply production changes to the main eSciDoc repository. Once items have a valid CMDI instance, they are harvested via the OAI-Provider and made available on the CLARIN VLO. eSciDoc software and tools are released under an ESCIDOC CDDL licence.

METADATA UPDATER (INTERNAL TOOL)
The eSciDoc sub-resource update service can be used to edit metadata resources in our eSciDoc repository directly, using the intermediary REST service provided by eSciDoc under their own ESCIDOC CDDL license. The updater REST service source code is available on GitHub at https://github.com/escidoc/escidoc-metadata-updater/. This additional update service has been a useful tool in the normalisation of existing metadata resources in our eSciDoc repository: an internal updater tool, written in Node.js JavaScript with a MongoDB database, uses this REST service to make individual or bulk changes to metadata resources in the repository. The service is also used when resolving PID handles with the '@md=cmdi' part identifier to their CMDI metadata instance. The source code for the metadata updater tool is available from a Git repository under an MIT license.
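As an illustration of the '@md=cmdi' part identifier mentioned above, the following minimal Python sketch builds a resolver URL for a handle. It is not the internal Node.js tool: the use of the public hdl.handle.net proxy, the function name and the example handle are assumptions made for the illustration.

```python
# Assumption: handles are resolved through the public Handle.net proxy.
HANDLE_PROXY = "https://hdl.handle.net/"


def cmdi_handle_url(handle: str, part: str = "md=cmdi") -> str:
    """Return a resolver URL for `handle` with a part identifier appended.

    `handle` may be given with or without a leading "hdl:" scheme.
    The default part identifier is the '@md=cmdi' form named in the text.
    """
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return f"{HANDLE_PROXY}{handle}@{part}"


if __name__ == "__main__":
    # Hypothetical handle, used only to show the shape of the result.
    print(cmdi_handle_url("hdl:20.500.12345/example-item"))
```

Keeping the part identifier as a parameter makes it easy to request other representations of the same item, should the repository define them.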
CMDI COMPONENT REGISTRY EDITOR FRONT-END WEB APPLICATION (CLARIN)
The CMDI Component Registry Editor is a front-end online web application for viewing, creating and modifying CMDI Profiles and Components. The original front-end is a Flex (Flash runtime-based) application. In 2015 a new HTML5 front-end web application was developed in React.js/Flux; it is currently in beta release. The software is open source, available on GitHub (https://github.com/clarin-eric/react-webpack-comp-reg) and released under a GPLv3 license. The current back-end software is available from the publicly available CLARIN SVN repository https://svn.clarin.eu/componentregistry/branches/componentregistry-2.0/.

CMDI TOOLKIT (CLARIN)
The CMDI toolkit is a Component Metadata Infrastructure toolkit containing a unit-testing framework for evaluating CMDI metadata schemas and instances and for testing their conversion between CMDI specification versions (1.1 and 1.2). The source code is available in the CLARIN SVN repository https://svn.clarin.eu/metadata/trunk/toolkit/. As the new 1.2 specification is developed and finalised, the original toolkit has been moved to a branch in the SVN. The toolkit provides useful conversions and unit tests that help us evaluate our CMDI metadata schemas and resources.

INTEGRATED TOOLS
Tools for transforming and annotating resources can be combined into workflows using the clarin.dk workflow planner. The workflow planner analyses the nature of the input (file format, the presence of text and possibly other indicators) together with the user's specification of the output, and pieces together a chain of tools that satisfies these conditions. The planner may also report that the request cannot be met with the tools currently integrated, or that several workflows satisfy the conditions. In the latter case the user has to make a choice. The workflow planner, which like the other modules (deliver, deposit and search) is implemented as a web service, can be downloaded from https://github.com/kuhumcst/dk-ClarinTools. The program is implemented in a combination of Java and Bracmat. See http://kyoto.let.vu.nl/clin26_presentations/paper7.pdf for a short description of Bracmat. The integrated tools do not run on CLARIN-DK's servers but as web services on one of CST's servers, although this is not a requirement. The integrated tools currently activated are:

CuneiForm
An open-source OCR (optical character recognition) program. It supports Bulgarian, Danish, English, Estonian, French, Italian, Croatian, Latvian, Lithuanian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Serbian, Slovenian, Spanish, Czech, German, Turkish, Ukrainian and Hungarian.

Tesseract-OCR
An open-source OCR program. In the CLARIN-DK context it supports Danish, English and Greek.

PDFMiner
An open-source program that extracts text from PDF documents and is particularly suited to text analysis. (Another program, pdf2htmlex, is also integrated in CLARIN-DK but deactivated, because it focuses on preserving the layout at the expense of text interpretation.)
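The way a workflow planner can chain tools from an input format to a requested output can be sketched in a few lines of Python. This is only an illustration of the idea, not the planner itself (which is implemented in Java and Bracmat): the format labels ("image", "pdf", "text", "teip5", "tokens") and the reduction of each tool to a single input/output pair are simplifying assumptions.

```python
from typing import List, Set, Tuple

Tool = Tuple[str, str, str]  # (tool name, input format, output format)

# Tool names come from the list in this section; the format labels are
# illustrative assumptions, not the planner's real feature descriptions.
TOOLS: List[Tool] = [
    ("CuneiForm", "image", "text"),
    ("Tesseract-OCR", "image", "text"),
    ("PDFMiner", "pdf", "text"),
    ("html2text", "html", "text"),
    ("Flat text to CBF converter", "text", "teip5"),
    ("TEIP5-tokeniser", "teip5", "tokens"),
]


def find_workflows(tools: List[Tool], source: str, target: str) -> List[List[str]]:
    """Return every chain of tool names that turns `source` into `target`.

    An empty result means the request cannot be met; more than one
    result means the user has to choose between workflows.
    """
    workflows: List[List[str]] = []

    def extend(fmt: str, chain: List[str], seen: Set[str]) -> None:
        if fmt == target:
            workflows.append(chain)
            return
        for name, tool_in, tool_out in tools:
            # Never revisit a format, so the search always terminates.
            if tool_in == fmt and tool_out not in seen:
                extend(tool_out, chain + [name], seen | {tool_out})

    extend(source, [], {source})
    return workflows


if __name__ == "__main__":
    for workflow in find_workflows(TOOLS, "image", "tokens"):
        print(" -> ".join(workflow))
```

With these assumed formats, a PDF input has exactly one route to tokens, while an image input has two (one per OCR program), which corresponds to the case where the user must choose.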

LibreOffice
LibreOffice is an open-source office suite comparable to, for example, Microsoft Office. In CLARIN-DK, LibreOffice is used as a conversion tool from various Office formats (doc, docx, etc.) to text.

html2text
An open-source program that converts HTML to plain text.

CST's RTFreader
This program extracts plain text from an RTF file. It segments and tokenises; tokenisation is optional. The program can also segment and tokenise plain text. OCR-scanned text often contains noise characters around the edges, where the scanner has cut letters in half; CST's RTFreader tries to eliminate these. The program supports most languages that handle whitespace and full stops, in particular, in the same way as Danish. Source code: https://github.com/kuhumcst/rtfreader

Flat text to CBF converter
This program converts plain text to Clarin Base Format (TEIP5-DKCLARIN).

CST paragraph and sentence segmenter for Danish, CST paragraph and sentence segmenter for English
These two programs create TEIP5-DKCLARIN-ANNOTATION annotation layers for sentences and paragraphs for a base text in TEIP5-DKCLARIN format.

TEIP5-segmenter
Reads TEIP5-DKCLARIN-ANNOTATION annotation layers for tokens and sentences and produces segment annotations in which the segments refer to the token annotation rather than to the tokens in the base text (in TEIP5-DKCLARIN).

TEIP5-tokeniser/sentence extractor
Reads TEIP5 and produces token and sentence annotations. The annotations refer to the base text but also contain the tokens or sentences themselves as readable text.

CST's Name recogniser
CST's name recogniser classifies names as personal names, place names and other names. It works on Danish texts only.

OpenNLP tools PosTagger
An open-source POS tagger that supports many languages. We have selected Danish and English for the CLARIN-DK platform.

Brill's PoS-tagger
CST's POS tagger is an extended version of the Brill tagger, with additions for handling XML and for improved handling of capitalised words in, for example, headings. In the CLARIN-DK environment, Brill's PoS-tagger supports Danish and English. The source code is available at https://github.com/kuhumcst/taggerxml

CST-Lemmatiser
A lemmatiser for Bulgarian, Danish, English, Estonian, Farsi, French, Greek, Icelandic, Italian, Latin, Macedonian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Serbian, Slovenian, Spanish, Czech, German, Ukrainian and Hungarian. The program accepts input in XML or plain text format. It is described in http://aclweb.org/anthology/p/p09/p09-1017.pdf. The source code is on GitHub: https://github.com/kuhumcst/cstlemma. The accompanying training program is also on GitHub: https://github.com/kuhumcst/affixtrain. The trained linguistic resources (24 languages) are temporarily hosted at http://cst.dk/download/cstlemma/ until it becomes technically possible to deposit them in the CLARIN-DK repository. The training data come from various sources. A large portion was obtained from the Slovenian CLARIN website http://www.clarin.si (MULTEXT-East lexicons for Bulgarian, Estonian, Farsi, Macedonian, Romanian, Slovak, Slovenian, Czech, Ukrainian and Hungarian).

Bohnet's parser
Bohnet's parser, which is distributed under the name 'mate-tools', supports many languages. We have selected Danish and English for the CLARIN-DK platform. (Supporting more languages currently requires too many server resources.)

CoNLL converter
This utility converts input from TEIP5-DKCLARIN-ANNOTATION to CoNLL 2007 format.

espeak
This is open-source TTS (text-to-speech) software. The voice is reminiscent of the 1990s. The following languages are supported: Afrikaans, Albanian, Armenian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Georgian, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kannada, Kurdish, Latin, Latvian, Macedonian, Malayalam, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish,

LARM
Written by Ivan Dehn, (lightly) edited by Birte Christensen-Dalsgaard

The current version of LARM is a (technological) upgrade of an earlier version, the so-called Silverlight version. The ambition was that the move from Silverlight to HTML5 should result in an application with the same functionality as the original.

Tools and functionality in the LARM Silverlight version
LARM is essentially a tool that gives access to just under 900,000 broadcast radio programmes, approx. 26,000 programme schedules as OCR-scanned PDF and, after the upgrade to HTML, approx. 1,140,000 TV programmes. Access is through an online portal where the user is verified via WAYF. In the Silverlight version the user must be logged in to access both metadata and content. During the development of LARM in 2010-2013 we identified the following tool requirements from the LARM participants through a series of workshops held at DR:
- free-text search with limited Boolean operators
- quoted (phrase) search
- filtering of searches to assets with attached files and with annotations
- searches restricted to specific periods
- "drill-down" search through the material by year, month and day
- attaching one's own metadata sheets to assets
- annotation on the timeline of individual assets
- saving assets in one's own "folders" and inviting other users to share them
- the option to upload a binary file attached to an asset

Data collection in the LARM Silverlight version
Data can be entered in three places in LARM:
- As annotations on the timeline of an audio item
  o Annotations are public and can be seen by all LARM users
- Custom metadata schemas for assets
  o These must be created specifically for each project
- Upload of files to specific assets

Tools and functionality in the LARM HTML version
In the HTML version users can search all material straight away, without being logged in through WAYF. The tools are essentially the same as in LARM Silverlight, except for:
- free-text search with limited Boolean operators
  o Search in LARM HTML was prioritised according to need during the development process. Search requirements have changed since the Silverlight version, and there is now a proposal for upgrades that will bring search in the HTML version to a level above the Silverlight version (see appendix: LARM_T-20160111-1_Tasks.pdf)
- search for assets with annotations only
  o This was not prioritised for LARM HTML
- the option to upload a binary file attached to an asset
  o This was not prioritised for LARM HTML
  o It is, however, functionality built into EZarchive, which is used within DigHumLab, and since LARM builds on the same code base it can be integrated relatively easily.
- search for assets with attached files only
  o Dropped, since upload is currently not possible in LARM HTML

Additions in LARM HTML
- annotation on all assets, incl. PDF
- LARM.fm can now be accessed through the API that has been developed for CHAOS v6. The documentation is available online:
  o http://chaos-community.github.io/chaos-api-documentation/v6/
  and a PDF has been prepared for distribution (see appendix: archives.dighumlab.org Documentation.pdf)

Data collection in the LARM HTML version
Data can be entered in three places in LARM:
- As annotations on the timeline of an audio item
  o Annotations are public and can be seen by all LARM users
- Annotation on all assets, incl. PDF
- Custom metadata schemas for assets
  o These must be created specifically for each project

CHAOS TECHNOLOGY STACK
The vision for CHAOS was to offer flexible solutions through an API, together with the option of adding extensions vertically, so that the platform could, for example, be extended to handle special tasks (as was done for CoSound). A further aim was to replace Microsoft products with open source, such as MySQL and Solr, wherever possible. CHAOS consists of Portal Core, which is written in C# on the .NET framework, and Portal Modules, which are written in C# and Java but are in principle independent of the choice of language. Microsoft .NET is now open source and cross-platform on Windows, Mac and Linux.

CHAOS technology stack in 2012

CHAOS diagram

CHAOS STRUCTURE
CHAOS is divided into four levels:

1: Portal Http Handler
PHH is at the top of the hierarchy. It is coupled directly to IIS in order to limit overhead, and it translates all HTTP calls into Portal Core's data model and all responses from Portal Core's data model back into HTTP.

2: Portal Core
Portal Core is the central part of CHAOS. It acts as service host and is the library for the basic functionality shared with the portal modules, such as sessions, authentication, user/group management, error handling, index, cache and more. Portal Core holds a list of the extensions registered in the modules, so that it can forward a call to the right portal module.

3: Portal Modules
Modules deliver the specific functionality in CHAOS, e.g. LARM and Octopus (asynchronous content handling). Modules are built and compiled as dynamic link libraries (DLLs), which are used for modularity and code reuse. The modules hold information about the Portal Core library, so that they can use its basic functionality. Modules can also use each other's functionality: API calls in LARM, for example, use functionality in EZarchive, which in turn uses functionality in MCM for, e.g., metadata updates. It gives great flexibility in the development of CHAOS that dependencies can exist between core and modules, between modules, and between modules and "other" components. In other words, one is not forced to use functionality that already exists inside a given module, but can choose freely among the functionality of all modules attached to the individual instance. Modules are called via endpoints and views. Endpoints define the input and output calls to the functionality. Views handle indexing and consist of at least two parts: an index call for data input and a query call for data output. Modules that handle metadata, e.g. LARM, contain both endpoints and views. Modules designed for, e.g., job handling, such as Octopus, contain endpoints only. CHAOS can easily be extended with modules: they are registered in Portal Core, which can then activate them when they are called.

4: Data sources / external resources
The data sources are the external resources attached to CHAOS, such as index search through Solr, data caching via Couchbase and more. Global resources such as Solr and Couchbase are shared through Portal Core, while global/specific resources such as MySQL can be shared both through Portal Core and through the individual modules. CHAOS can be extended with external resources, e.g. libraries for content handling or existing AWS platform services.

CHAOS ENTITY/RELATIONSHIP DIAGRAM
The CHAOS data model is designed to be modular and flexible. The core of CHAOS is an "empty" object to which metadata and content are attached. Several different sets of metadata can be attached to each object via XML sheets and schemas. The same applies to content. Objects are defined through an object type, and the relations between objects through object relations. Rights are managed either through users' and groups' access to folders in MCM, or through roles in EZArchive, of which there are four: Anonymous, User, Contributor and Administrator. CHAOS data is indexed in Solr and stored in Couchbase. A search in the index returns the IDs of the results from Solr; the IDs are then looked up in Couchbase and the objects returned. This keeps the space used by the Solr index to a minimum.

CHAOS data model
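The "empty object" model and the two-step search (the index returns IDs only, the objects are then fetched from the store) can be sketched as follows. This is a simplified illustration in Python, not CHAOS code: the class and function names are hypothetical, and in-memory dictionaries stand in for Solr and Couchbase.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChaosObject:
    """An 'empty' object to which metadata and content are attached."""
    object_id: str
    object_type: str
    metadata: Dict[str, str] = field(default_factory=dict)  # schema name -> XML sheet
    content: List[str] = field(default_factory=list)        # references to files


class IdOnlyIndex:
    """Stand-in for the search index: maps terms to object IDs, nothing else."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[str]] = {}

    def add(self, obj: ChaosObject, text: str) -> None:
        # Index each distinct term once per object.
        for term in set(text.lower().split()):
            self._postings.setdefault(term, []).append(obj.object_id)

    def query(self, term: str) -> List[str]:
        return list(self._postings.get(term.lower(), []))


def search(index: IdOnlyIndex, store: Dict[str, ChaosObject], term: str) -> List[ChaosObject]:
    """Step 1: ask the index for IDs. Step 2: look the IDs up in the store."""
    return [store[object_id] for object_id in index.query(term)]
```

Because the index keeps only terms and IDs, the full objects live in one place (the store), which is the space-saving effect described above.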

CHAOS NETWORK
CHAOS is deployed 100% on AWS and uses AWS core services such as S3 storage, EC2 CPU, RDS databases and more. CHAOS uses AWS Identity and Access Management (IAM) and routing services. By using S3 and EC2, CHAOS gains access to the other resources that the AWS platform services offer, e.g. Hadoop via Elastic MapReduce, real-time streaming via Amazon Kinesis, Elastic Transcoder (which is used in EZarchive, for example) and more. CHAOS is typically deployed on the following AWS EC2 instances:
- 1 NAT, Linux server
- 1 VPN, Linux server
- 1 web server, Linux server
- 1 Portal API, Windows server (can be deployed on Linux once Microsoft releases an official .NET runtime for Linux)
- 1 Couchbase, Linux server
- 1 Solr, Linux server
S3 storage is attached to the extent necessary.

CHAOS deployment on AWS

CHAOS CALL STRUCTURE
CHAOS offers a RESTful interface for uploading, downloading, manipulating and organising data. The API can be used on its own or to integrate CHAOS with other projects. The API uses GET or POST calls. POST is used for all calls that either upload or save data. The return format is JSON and XML. A session is authenticated with login and password or with a previously created token. When the CHAOS API is called, the call is received by the Portal HttpHandler, which translates the HTTP call into Portal Core's data model. If the call targets part of Portal Core's own functionality, e.g. session, Portal Core answers the call itself; otherwise it looks the call up in its library of Portal Modules and forwards it. When a Portal Module receives the call, it is executed by the requested functionality. This is illustrated in the diagram below, which shows a search in Eurovision Archive.

A search GET call to Eurovision Archive.

OPERATION OF CHAOS AND LARM:
In connection with the 2015 budget for LARM in HTML, the agreement was adjusted in an SLA between AU and CHAOS Insight (see appendix: Service Level Agreement LARM 2015.docx). The SLA was adjusted in January 2016 after the transition to operating the HTML version (see appendix: Service Level Agreement LARM 20160112.pdf).
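The call flow described in the CHAOS call structure section (Portal Core answers calls to its own functionality and forwards everything else to the registered module) can be sketched in Python as below. This is a rough illustration only: CHAOS itself is written in C#, and the "Extension/Action" path format, the class and function names, and the example extension are assumptions rather than the real API.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def http_method(stores_data: bool) -> str:
    """POST for calls that upload or save data, GET otherwise."""
    return "POST" if stores_data else "GET"


class PortalCore:
    """Answers core calls itself and forwards the rest to registered modules."""

    def __init__(self) -> None:
        # Hypothetical core function; stands in for session handling.
        self._core: Dict[str, Handler] = {
            "Session/Create": lambda params: {"session": "demo-session"},
        }
        self._modules: Dict[str, Dict[str, Handler]] = {}

    def register(self, extension: str, action: str, handler: Handler) -> None:
        """Register a module action so that calls can be forwarded to it."""
        self._modules.setdefault(extension, {})[action] = handler

    def handle(self, path: str, params: dict) -> dict:
        if path in self._core:
            return self._core[path](params)           # core functionality
        extension, _, action = path.partition("/")
        handler = self._modules.get(extension, {}).get(action)
        if handler is None:
            return {"error": f"unknown call: {path}"}
        return handler(params)                         # forwarded to a module


if __name__ == "__main__":
    core = PortalCore()
    # "Demo/Search" is a made-up extension used only for this example.
    core.register("Demo", "Search", lambda p: {"hits": [p["q"]]})
    print(http_method(stores_data=False), core.handle("Demo/Search", {"q": "radio"}))
```

Registering handlers in a lookup table mirrors the description of modules being registered in Portal Core and activated when called; a new module only needs a `register` call.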