Data scientist

Description

Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of data sets, and create visualisations to aid in understanding data. They build mathematical models using data, present and communicate data insights and findings to specialists and scientists in their team and, if required, to a non-expert audience, and recommend ways to apply the data.

Excludes people performing engineering and programming activities.
Excludes people performing managerial and general focus research activities.

Other titles

The following job titles also refer to data scientist:

data research scientist
research data scientist
data scientists
data expert
data engineer

Minimum qualifications

A master’s degree is generally required to work as a data scientist. However, this requirement may differ in some countries.

ISCO skill level

ISCO skill level is defined as a function of the complexity and range of tasks and duties to be performed in an occupation. It is measured on a scale from 1 to 4, with 1 the lowest level and 4 the highest, by considering:

  • the nature of the work performed in an occupation in relation to the characteristic tasks and duties
  • the level of formal education required for competent performance of the tasks and duties involved and
  • the amount of informal on-the-job training and/or previous experience in a related occupation required for competent performance of these tasks and duties.

Data scientist is a Skill level 4 occupation.

Data scientist career path

Similar occupations

These occupations, although different, require much of the same knowledge and skills as the data scientist role.

data analyst
data quality specialist
computer scientist
chief data officer
ICT information and knowledge manager

Long term prospects

These occupations require some of the skills and knowledge of a data scientist. They also require other skills and knowledge, but at a higher ISCO skill level, meaning these occupations are accessible from a data scientist position with significant experience and/or extensive training.

Essential knowledge and skills

Essential knowledge

This knowledge should be acquired through learning to fulfill the role of data scientist.

Data mining: The methods of artificial intelligence, machine learning, statistics and databases used to extract content from a dataset.
Visual presentation techniques: The visual representation and interaction techniques, such as histograms, scatter plots, surface plots, tree maps and parallel coordinate plots, that can be used to present abstract numerical and non-numerical data, in order to reinforce the human understanding of this information.
Information extraction: The techniques and methods used for eliciting and extracting information from unstructured or semi-structured digital documents and sources.
Statistics: The study of statistical theory, methods and practices such as collection, organisation, analysis, interpretation and presentation of data. It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments in order to forecast and plan work-related activities.
Information categorisation: The process of classifying the information into categories and showing relationships between the data for some clearly defined purposes.
Resource description framework query language: The query languages such as SPARQL which are used to retrieve and manipulate data stored in Resource Description Framework format (RDF).
Query languages: The field of standardised computer languages for retrieval of information from a database and of documents containing the needed information.
Online analytical processing: The online tools which analyse, aggregate and present multi-dimensional data enabling users to interactively and selectively extract and view data from specific points of view.
Data models: The techniques and existing systems used for structuring data elements and showing relationships between them, as well as methods for interpreting the data structures and relationships.
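
As a rough illustration of the query languages and multi-dimensional aggregation described above, the following minimal sketch uses SQL through Python's built-in sqlite3 module. The sales table, its columns and the sales_by_region function are hypothetical and exist only for this example; a dedicated OLAP tool would offer far richer slicing than a single GROUP BY.

```python
import sqlite3


def sales_by_region(rows):
    """Aggregate hypothetical (region, product, amount) rows by region with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # View the data from one "point of view": total amount per region.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    example_rows = [("north", "a", 10.0), ("south", "a", 5.0), ("north", "b", 2.5)]
    print(sales_by_region(example_rows))
```

Grouping by product instead of region would give a different view of the same data, which is the basic idea behind interactive analytical processing.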

Essential skills and competences

These skills are necessary for the role of data scientist.

Normalise data: Reduce data to their accurate core form (normal forms) in order to achieve such results as minimisation of dependency, elimination of redundancy, increase of consistency.
Interpret current data: Analyse data gathered from sources such as market data, scientific papers, customer requirements and questionnaires which are current and up-to-date in order to assess development and innovation in areas of expertise.
Establish data processes: Use ICT tools to apply mathematical, algorithmic or other data manipulation processes in order to create information.
Execute analytical mathematical calculations: Apply mathematical methods and make use of calculation technologies in order to perform analyses and devise solutions to specific problems.
Handle data samples: Collect and select a set of data from a population by a statistical or other defined procedure.
Build recommender systems: Construct recommendation systems based on large data sets using programming languages or computer tools to create a subclass of information filtering system that seeks to predict the rating or preference a user gives to an item.
Perform data cleansing: Detect and correct corrupt records from data sets, and ensure that the data become and remain structured according to guidelines.
Design database scheme: Draft a database scheme by following the Relational Database Management System (RDBMS) rules in order to create a logically arranged group of objects such as tables, columns and processes.
Implement data quality processes: Apply quality analysis, validation and verification techniques on data to check data quality integrity.
Collect ICT data: Gather data by designing and applying search and sampling methods.
Manage data collection systems: Develop and manage methods and strategies used to maximise data quality and statistical efficiency in the collection of data, in order to ensure the gathered data are optimised for further processing.
Deliver visual presentation of data: Create visual representations of data such as charts or diagrams for easier understanding.
Report analysis results: Produce research documents or give presentations to report the results of a conducted research and analysis project, indicating the analysis procedures and methods which led to the results, as well as potential interpretations of the results.
Develop data processing applications: Create customised software for processing data by selecting and using the appropriate computer programming language in order for an ICT system to produce demanded output based on expected input.
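
Two of the skills above, performing data cleansing and handling data samples, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (name, value) record layout, the cleansing rules and the function names cleanse and draw_sample are hypothetical, and real guidelines for what counts as a corrupt record vary by organisation.

```python
import random


def cleanse(records):
    """Drop incomplete or corrupt (name, value) records, normalise, and deduplicate."""
    seen = set()
    cleaned = []
    for name, value in records:
        if name is None or value is None:
            continue  # incomplete record
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # corrupt value that cannot be read as a number
        key = (name.strip().lower(), number)
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(key)
    return cleaned


def draw_sample(population, k, seed=0):
    """Simple random sample of k items without replacement, reproducible via seed."""
    rng = random.Random(seed)
    return rng.sample(population, k)


if __name__ == "__main__":
    raw = [(" Alice ", "3.5"), ("alice", 3.5), ("Bob", None), ("Carol", "n/a"), ("Dave", 2)]
    tidy = cleanse(raw)
    print(tidy)
    print(draw_sample(tidy, 1))
```

Fixing the seed makes the sample repeatable, which helps when an analysis has to be reproduced later.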

Optional knowledge and skills

Optional knowledge

This knowledge is sometimes, but not always, required for the role of data scientist. However, mastering this knowledge allows you to have more opportunities for career development.

Business intelligence: The tools used to transform large amounts of raw data into relevant and helpful business information.
MDX: The computer language MDX is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Microsoft.
XQuery: The computer language XQuery is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the international standards organisation World Wide Web Consortium.
SPARQL: The computer language SPARQL is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the international standards organisation World Wide Web Consortium.
LDAP: The Lightweight Directory Access Protocol (LDAP) is a protocol for querying and modifying information held in directory services.
Unstructured data: The information that is not arranged in a pre-defined manner or does not have a pre-defined data model and is difficult to understand and find patterns in without using techniques such as data mining.
LINQ: The computer language LINQ is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Microsoft.
Data quality assessment: The process of revealing data issues using quality indicators, measures and metrics in order to plan data cleansing and data enrichment strategies according to data quality criteria.
N1QL: The computer language N1QL is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Couchbase.
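
To make the idea of quality indicators in data quality assessment more concrete, here is a minimal sketch that computes two common ones, completeness and uniqueness, over a list of records. The function name quality_indicators, the dictionary-based record layout and the exact definitions of the two ratios are assumptions made for this example rather than a standard.

```python
def quality_indicators(records, required_fields):
    """Return completeness and uniqueness ratios for a list of dict records.

    completeness: share of records where every required field is filled in.
    uniqueness:   share of distinct records, judged on the required fields.
    """
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "uniqueness": 0.0}
    complete = sum(
        all(rec.get(field) not in (None, "") for field in required_fields)
        for rec in records
    )
    distinct = len({tuple(rec.get(field) for field in required_fields) for rec in records})
    return {"completeness": complete / total, "uniqueness": distinct / total}


if __name__ == "__main__":
    sample_records = [
        {"id": 1, "email": "a@x.org"},
        {"id": 2, "email": ""},
        {"id": 1, "email": "a@x.org"},
        {"id": 3, "email": None},
    ]
    print(quality_indicators(sample_records, ["id", "email"]))
```

Low scores on either indicator would then feed into planning the cleansing or enrichment work.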

Optional skills and competences

These skills and competences are sometimes, but not always, required for the role of data scientist. However, mastering these skills and competences allows you to have more opportunities for career development.

Manage ICT data classification: Oversee the classification system an organisation uses to organise its data. Assign an owner to each data concept or bulk of concepts and determine the value of each item of data.
Perform data mining: Explore large datasets to reveal patterns using statistics, database systems or artificial intelligence and present the information in a comprehensible way.
Integrate ICT data: Combine data from sources to provide a unified view of the set of these data.
Manage ICT data architecture: Oversee regulations and use ICT techniques to define the information systems architecture and to control data gathering, storing, consolidation, arrangement and usage in an organisation.
Define data quality criteria: Specify the criteria by which data quality is measured for business purposes, such as inconsistencies, incompleteness, usability for purpose and accuracy.
Manage data: Administer all types of data resources through their lifecycle by performing data profiling, parsing, standardisation, identity resolution, cleansing, enhancement and auditing. Ensure the data is fit for purpose, using specialised ICT tools to fulfil the data quality criteria.
Create data models: Use specific techniques and methodologies to analyse the data requirements of an organisation’s business processes in order to create models for these data, such as conceptual, logical and physical models. These models have a specific structure and format.
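
The skill of integrating ICT data, combining sources into a unified view, might look like the following minimal sketch. The customer and order records, their field names and the integrate function are hypothetical and chosen only to illustrate a key-based join; production work would more likely rely on a database or a data-frame library.

```python
def integrate(customers, orders):
    """Attach each customer's summed order amount to the customer record.

    Customers without any orders are kept with a total of 0.0 (a left join).
    """
    totals = {}
    for order in orders:
        customer_id = order["customer_id"]
        totals[customer_id] = totals.get(customer_id, 0.0) + order["amount"]
    return [
        {**customer, "order_total": totals.get(customer["id"], 0.0)}
        for customer in customers
    ]


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    orders = [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 1, "amount": 5.5},
    ]
    print(integrate(customers, orders))
```

Keeping customers without orders in the result is a design choice: it avoids silently losing records when one source is incomplete.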

ISCO group and title

2511 – Systems analysts

References
  1. Data scientist – ESCO
Last updated on August 8, 2022