Data Science profession and education
Professor Yuri Demchenko,
System and Network Engineering Research Group
University of Amsterdam, Netherlands
Data Science is an emerging field of science that requires a multi-disciplinary approach and is closely linked to Big Data and data-driven technologies, which have a strong transformational impact on all research and industry domains. Their sustainable development requires rethinking and redesigning both traditional educational models and existing courses.
This talk will present the research and coordination activities carried out within the EU-funded EDISON project to establish the new profession of Data Scientist for European research and industry [1, 2]. The EDISON project specifically addresses data-related skills and capacity building for the European Open Science Cloud (EOSC) and the European Digital Single Market (DSM), in particular Data Stewardship, Research Data Management, research repeatability, and general data literacy.
The talk will provide an overview of related research and activities to develop consistent and interoperable Data Science curricula that would empower future graduates and professionals to build successful career paths as Data Scientists or in other Data Science enabled professions. It will also refer to the Data Science champion universities community and its related conference [3].
II. Data Science Professional definition
There is no well-established definition of the Data Scientist or of the Data Science profession, because of the wide range of competences and skills expected from these specialists. The EDISON project proposed a community-endorsed definition that builds on the definition in the NIST SP 1500-1 publication [4] and extends it with the essential characteristic of delivering value to the organisation: “A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle, till the delivery of expected scientific and business value to science or industry.”
A qualified Data Scientist should be capable of working in different roles in different projects and organisations, such as Data Steward, Data Analyst, Data Architect, or Data Engineer, and should possess the skills needed to effectively operate the components of complex data infrastructure and processing applications through all stages of the data lifecycle, up to the delivery of the expected scientific and business value to science and/or industry.
III. EDISON Data Science framework and Data Science Competences
The EDISON vision for building the Data Science profession will be enabled through the creation of a comprehensive framework for Data Science education and training, the EDISON Data Science Framework (EDSF), which includes the following components: the Data Science Competence Framework (CF-DS), the Data Science Body of Knowledge (DS-BoK), the Data Science Model Curriculum (MC-DS), and the Data Science Professional Profiles (DSPP) [5].
The CF-DS includes the common competences required for successful work of Data Scientists in different work environments in industry and in research, and throughout the whole career path. The competences are organised into the following groups:
- Data Analytics including statistical methods, Machine Learning and Business Analytics
- Data Science Engineering: software and infrastructure
- Subject Domain competences and knowledge
- Data Management, Curation, Preservation
- Research Methods and Project Management
The DS-BoK defines the Knowledge Areas (KA) for building tailored Data Science curricula that support the required Data Science competences. The DS-BoK is organised by Knowledge Area Groups (KAG) that correspond to the CF-DS competence groups. It incorporates best practices from Computer Science and domain-specific BoKs, and includes KAs defined on the basis of the ACM Computing Classification System (CCS2012), components taken from other BoKs, and newly proposed KAs that cover new technologies used in Data Science and their recent developments.
The MC-DS is built on the CF-DS and the DS-BoK: Learning Outcomes are defined on the basis of the CF-DS competences, and Learning Units are mapped to Knowledge Units in the DS-BoK. Three mastery (or proficiency) levels are defined for each Learning Outcome to allow for flexible curriculum development and profiling for different Data Science professional profiles.
IV. Data Science curriculum design using EDSF
The EDSF can be used for designing customizable Data Science curricula for a target group of students or learners. In practice, a new curriculum should be designed for a specific group of learners that can be described as a target professional group or professional profile, with the corresponding competence profile as defined in the DSPP. Competences are used to define the Learning Outcomes (LO), which map to the Knowledge Units (KU) of the DS-BoK and the Learning Units (LU) of the MC-DS. Together, the LUs and KUs advise the teacher or instructor on the content of the customized courses or programs. The decision remains with the course developer, who can use this advice to adapt the curriculum to the specific groups or resources available.
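The mapping described above, from a competence profile through Learning Outcomes to Learning Units and Knowledge Units, can be sketched as a small data model. This is only an illustrative sketch, not part of the EDSF itself: the class and function names, the LU/KU identifiers, the numeric mastery levels 1 to 3, the example profile, and the rule that a required level also pulls in the lower levels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningOutcome:
    competence: str                    # CF-DS competence group the outcome derives from
    mastery_level: int                 # assumed encoding of the three mastery levels: 1..3
    learning_units: tuple[str, ...]    # hypothetical MC-DS Learning Unit identifiers
    knowledge_units: tuple[str, ...]   # hypothetical DS-BoK Knowledge Unit identifiers


# Hypothetical catalogue; real entries would come from MC-DS and DS-BoK.
CATALOGUE = [
    LearningOutcome("Data Analytics", 1, ("LU-A1",), ("KU-A1",)),
    LearningOutcome("Data Analytics", 2, ("LU-A2",), ("KU-A2", "KU-A3")),
    LearningOutcome("Data Management", 1, ("LU-M1",), ("KU-M1",)),
    LearningOutcome("Data Management", 3, ("LU-M3",), ("KU-M3",)),
    LearningOutcome("Data Science Engineering", 1, ("LU-E1",), ("KU-E1",)),
]


def design_curriculum(profile, catalogue):
    """Suggest LUs and KUs for a competence profile.

    `profile` maps a competence group to the required mastery level.
    Assumption: outcomes at or below the required level are all included.
    """
    selected = [
        lo for lo in catalogue
        if lo.mastery_level <= profile.get(lo.competence, 0)
    ]
    return {
        "learning_units": sorted({lu for lo in selected for lu in lo.learning_units}),
        "knowledge_units": sorted({ku for lo in selected for ku in lo.knowledge_units}),
    }


# Hypothetical competence profile for a Data Steward (levels are illustrative).
data_steward = {"Data Management": 3, "Data Analytics": 1}
plan = design_curriculum(data_steward, CATALOGUE)
print(plan["learning_units"])
print(plan["knowledge_units"])
```

The output is only a suggestion list; in line with the text above, the course developer would still decide which of the suggested units fit the group and the available resources.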
The EDSF also provides effective tools for assessing knowledge, competences, and skills as part of an education or certification process. It can also be used for Data Science team building and organizational skills management.
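One simple way to picture such an assessment is as a comparison between the mastery levels a profile requires and the levels a person or team has demonstrated. The sketch below is an assumption-laden illustration rather than an EDSF tool: the function names, the numeric levels 1 to 3 (with 0 meaning "not assessed"), and the example data are hypothetical.

```python
def competence_gaps(required, assessed):
    """Return, per competence group, how many mastery levels are still missing.

    Both arguments map a competence group name to a mastery level.
    A competence absent from `assessed` is treated as level 0 (assumption).
    """
    gaps = {}
    for competence, level in required.items():
        missing = level - assessed.get(competence, 0)
        if missing > 0:
            gaps[competence] = missing
    return gaps


def team_gaps(required, team):
    """Gaps for a team, taking the highest level any member holds per competence."""
    best = {}
    for member_levels in team.values():
        for competence, level in member_levels.items():
            best[competence] = max(best.get(competence, 0), level)
    return competence_gaps(required, best)


# Hypothetical required profile and assessment results.
required = {"Data Analytics": 2, "Data Management": 3}
candidate = {"Data Analytics": 2, "Data Management": 1}
team = {
    "member_a": {"Data Analytics": 3},
    "member_b": {"Data Management": 2},
}

print(competence_gaps(required, candidate))
print(team_gaps(required, team))
```

Taking the best level per competence across members is one possible team-level rule; an organisation might instead require a minimum number of members at each level.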
References
[1] EDISON Project: Building Data Science Profession [online] http://www.edison-project.eu/
[2] Yuri Demchenko, et al., EDISON Data Science Framework: A Foundation for Building Data Science Profession For Research and Industry, 3rd IEEE STC CC and RDA Workshop on Curricula and Teaching Methods in Cloud Computing, Big Data, and Data Science (DTW2016).
[3] Data Science champion universities community and conference [online] http://edison-project.eu/third-edison-champions-conference-warsaw-poland
[4] NIST SP 1500-1, NIST Big Data Interoperability Framework (NBDIF): Volume 1: Definitions, Sept 2015 [online] http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf
[5] EDISON Data Science Framework [online] http://edison-project.eu/edison/edison-data-science-framework-edsf