Text mining at MAN Diesel
Stine Fangel, SAS Institute
COPYRIGHT 2009, SAS INSTITUTE INC ALL RIGHTS RESERVED
What will you take away from this talk?
An example of text and data mining applied in practice
Knowledge of how a text and data mining project is carried out
What is text mining?
Text mining is data mining on unstructured data (Word documents, PDF files, e-mails, blogs, etc.).
Until now, analysts have been limited to structured data (tidy databases).
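Before ordinary data mining methods can be used, unstructured text has to be turned into structured rows. Below is a minimal pure-Python sketch of one common approach, a term-document matrix of word counts. It assumes naive lower-case tokenization, and the report texts are invented for illustration. It does not show how SAS Text Miner parses documents.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case and keep runs of letters/digits; a deliberately crude tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(docs: dict[str, str]) -> tuple[list[str], dict[str, list[int]]]:
    """Turn free-text documents into one structured row of term counts per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    vocabulary = sorted(set().union(*counts.values()))
    rows = {doc_id: [c[term] for term in vocabulary] for doc_id, c in counts.items()}
    return vocabulary, rows


# Hypothetical service-report snippets (invented for illustration)
reports = {
    "R1": "Fuel pump leaking, fuel pump replaced",
    "R2": "Piston ring wear observed",
}
vocab, matrix = term_document_matrix(reports)
print(vocab)          # ['fuel', 'leaking', 'observed', 'piston', 'pump', 'replaced', 'ring', 'wear']
print(matrix["R1"])   # [2, 1, 0, 0, 2, 1, 0, 0]
```

Each document becomes a row and each term becomes a column. This is the same shape as the structured tables analysts already work with.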
Agenda
MAN Diesel
The problem
The process in the MAN text and data mining project
Results
Wrap-up
MAN Group
Globally active supplier of vehicles, engines and machinery
Approx. 15.5 billion in sales, over 55,000 employees in 120 countries
Four leading business areas:
Commercial Vehicles: Trucks, Engines, Buses, Services
Diesel Engines: 2-stroke, 4-stroke, Turbochargers, Propulsion Packages, Services
Turbomachinery: Compressors, Reactors, Turbines, Services
Industrial Services: Contracting, Logistics, Service platform
MAN Diesel SE 2009/05/26
Diesel Engine Programme
World's Largest Diesel Engine: 14K98ME-C
Power: 97,300 kW (~ 1,000 midsize car engines)
Fuel consumption: 346 tonnes/day
Large Engines for Large Ships
50% of World Trade is Powered by MAN Diesel Engines!
Modeling and business value
The balance between value and methods: modeling vs. business value
Data sources in a text mining project
Identify the available data sources
The process in a data and text mining project
Modeling: development vs. application (yes/no response)
In model development, the relationship between existing knowledge about the characteristics and the response is established.
In model application, that existing knowledge about characteristics and relationships is used to calculate an expected future response.
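Below is a minimal sketch of the two phases, assuming a one-variable logistic regression fitted by plain gradient descent. The feature ("report mentions a leak") and the response ("fuel pump failure followed") are hypothetical and invented for illustration. Development uses cases whose response is already known. Application scores a new case whose response is not yet known.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def develop(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Model development: learn weights linking known characteristics X to the observed response y."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of the log-loss with respect to the linear score
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def apply_model(w: list[float], x: list[float]) -> float:
    """Model application: expected response (probability) for a case whose outcome is unknown."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical history: X1 = 1 if the report mentions a leak; y = 1 if a fuel pump failure followed
X_hist = [[1], [1], [1], [0], [0], [0]]
y_hist = [1, 1, 0, 0, 0, 1]
weights = develop(X_hist, y_hist)             # development: responses known
print(round(apply_model(weights, [1]), 2))    # application: a new report mentioning a leak
```

With a single binary input, the fitted probabilities converge to the observed failure rate in each group (2/3 with a leak mention, 1/3 without).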
The process in a data and text mining project
Data structure for predictive modeling: ID, X1, ..., X100, Target
Data mining is about finding the Xs that are significant for the target, as well as the correlations among the Xs.
Data: service reports
Methods: logistic regression, decision trees, neural networks
In addition, business rules are integrated.
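Below is a small sketch of the ID / X / Target layout, together with one way to rank the Xs by how much they say about the target. Instead of a statistical significance test, it uses information gain, the split criterion behind many decision trees. The table values are invented, all inputs are assumed to be binary, and correlations among the Xs are not addressed.

```python
import math


def entropy(labels: list[int]) -> float:
    """Binary entropy (in bits) of a list of 0/1 target values."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(rows: list[dict], x: str, target: str = "Target") -> float:
    """Entropy reduction in the target when rows are split on binary input x."""
    n = len(rows)
    left = [r[target] for r in rows if r[x] == 1]
    right = [r[target] for r in rows if r[x] == 0]
    return entropy([r[target] for r in rows]) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


# Hypothetical table in the ID / X1..Xn / Target layout (values invented)
rows = [
    {"ID": 1, "X1": 1, "X2": 0, "Target": 1},
    {"ID": 2, "X1": 1, "X2": 1, "Target": 1},
    {"ID": 3, "X1": 0, "X2": 1, "Target": 0},
    {"ID": 4, "X1": 0, "X2": 0, "Target": 0},
]
ranking = sorted(["X1", "X2"], key=lambda x: information_gain(rows, x), reverse=True)
print(ranking)  # ['X1', 'X2']
```

Here X1 separates the target perfectly (gain of 1 bit), while X2 carries no information about it (gain 0).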
The process in a data and text mining project
Data mining methodology: SEMMA
Sample: sampling?
Explore: visual exploration, text mining, data reduction
Modify: grouping, subsetting, transformation
Model: neural networks, decision trees, regression techniques
Assess: model comparison, new questions
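The five SEMMA stages can be chained as a simple pipeline. In the pure-Python skeleton below, each stage is a deliberately trivial stand-in. In particular, a one-rule classifier replaces the neural networks, trees and regression named on the slide, and the data is synthetic. It is only meant to show how the stages hand data to one another, not how SAS Enterprise Miner implements SEMMA.

```python
import random


def sample(rows, train_fraction=0.7, seed=1):
    """Sample: shuffle and partition the rows into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def explore(rows):
    """Explore: a first look at the data; here only the share of positive targets."""
    return sum(r["Target"] for r in rows) / len(rows)


def modify(rows, inputs):
    """Modify: drop inputs that never vary, since they cannot explain the target."""
    return [x for x in inputs if len({r[x] for r in rows}) > 1]


def model(rows, inputs):
    """Model: toy one-rule classifier predicting Target = X for the input X that agrees most often."""
    return max(inputs, key=lambda x: sum(r[x] == r["Target"] for r in rows))


def assess(rows, chosen):
    """Assess: accuracy of the chosen rule on held-out rows."""
    return sum(r[chosen] == r["Target"] for r in rows) / len(rows)


# Synthetic data: X1 mirrors the target, X2 is unrelated, X3 is constant
data = [{"ID": i, "X1": i % 2, "X2": (i // 2) % 2, "X3": 0, "Target": i % 2} for i in range(20)]
train, valid = sample(data)
print(explore(train))
inputs = modify(train, ["X1", "X2", "X3"])
chosen = model(train, inputs)
print(inputs, chosen, assess(valid, chosen))
```

The Assess stage scores the model on rows it never saw during modeling. That held-out score is what allows candidate models to be compared.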
Validity: ROC
What is a good value for the area under the curve?
(Figure: ROC curve with baseline)
Source: http://bja.oxfordjournals.org/cgi/content/full/93/5/623/fig2
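The area under the ROC curve (AUC) has a convenient reading. It is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. A value of 0.5 corresponds to the diagonal baseline (random guessing), and 1.0 to perfect separation. The sketch below computes AUC directly from that definition. The scores are made up, and the O(n²) pairwise loop is for illustration only.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0  (perfect separation)
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5  (the diagonal baseline)
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75 (3 of 4 positive/negative pairs ordered correctly)
```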
Results: top 10 failure types
Failure type          Actual   Predicted
Fuel Pump             27%      27%
Piston                25%      26%
Bearings              18%      18%
Roller Guide          16%      14%
Cylinder Lubricator   15%      14%
Cylinder Liner        11%      11%
Camshaft              11%      9%
Chain Drive           10%      9%
Man Syst              10%      10%
Cylinder Cover        9%       10%
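One simple way to summarize how closely the predicted shares track the actual ones is the mean absolute difference across the ten failure types. The short sketch below computes it from the figures on the results slide. The choice of this metric is illustrative and is not made in the presentation.

```python
# Top-10 failure types from the results slide: (actual %, predicted %)
results = {
    "Fuel Pump": (27, 27),
    "Piston": (25, 26),
    "Bearings": (18, 18),
    "Roller Guide": (16, 14),
    "Cylinder Lubricator": (15, 14),
    "Cylinder Liner": (11, 11),
    "Camshaft": (11, 9),
    "Chain Drive": (10, 9),
    "Man Syst": (10, 10),
    "Cylinder Cover": (9, 10),
}

abs_errors = {name: abs(actual - predicted) for name, (actual, predicted) in results.items()}
mae = sum(abs_errors.values()) / len(abs_errors)
print(mae)  # 0.8 (percentage points)
print([name for name, e in abs_errors.items() if e == max(abs_errors.values())])  # ['Roller Guide', 'Camshaft']
```

On average, the predicted share differs from the actual share by 0.8 percentage points. No failure type is off by more than 2 points.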
Wrap-up
Creates value from unstructured data that already exists
Can reveal previously unknown patterns
Provides insight into the company's unwritten knowledge
Can be applied to many types of tasks
Other applications of text mining
Analytics conference A2009, 1-2 July, Copenhagen:
Stefan Hunziker, Director Medical Informatics, Hospital Lucerne, Switzerland: Analysis of medical records
Kerem Tezic, Statistician, AFA Swedish Labour Market Insurances, Sweden: Threats and violence as a precursor to occupational injury. Text mining of insurance-based information on police officers and security guards in Sweden, 2004-2007
Stine Fangel
StineFangel@sdksascom