Description
Big data archive librarians classify, catalogue and maintain libraries of digital media. They also evaluate and comply with metadata standards for digital content and update obsolete data and legacy systems.
Other titles
The following job titles also refer to big data archive librarian:
documentation archivist
computer tape librarian
digital documentation archivist
archive librarian
digital archivist
Minimum qualifications
An associate’s degree is generally required to work as a big data archive librarian.
ISCO skill level
ISCO skill level is defined as a function of the complexity and range of tasks and duties to be performed in an occupation. It is measured on a scale from 1 to 4, with 1 the lowest level and 4 the highest, by considering:
- the nature of the work performed in an occupation in relation to the characteristic tasks and duties
- the level of formal education required for competent performance of the tasks and duties involved and
- the amount of informal on-the-job training and/or previous experience in a related occupation required for competent performance of these tasks and duties.
Big data archive librarian is a Skill level 3 occupation.
Big data archive librarian career path
Similar occupations
These occupations, although different, require many of the same skills and knowledge as big data archive librarian.
data centre operator
data entry supervisor
statistical assistant
call centre analyst
medical transcriptionist
Long term prospects
These occupations require some of the skills and knowledge of a big data archive librarian, but also other skills and knowledge at a higher ISCO skill level. This means they are accessible from a position of big data archive librarian with significant experience and/or extensive training.
database integrator
chief data officer
database administrator
data scientist
data quality specialist
Essential knowledge and skills
Essential knowledge
This knowledge should be acquired through learning in order to fulfil the role of big data archive librarian.
- Business intelligence: The tools used to transform large amounts of raw data into relevant and helpful business information.
- Database: The classification of databases, which includes their purpose, characteristics, terminology, models and use, such as XML databases, document-oriented databases and full-text databases.
- Resource description framework query language: The query languages, such as SPARQL, which are used to retrieve and manipulate data stored in Resource Description Framework (RDF) format; see the SPARQL sketch after this list.
- Query languages: The field of standardised computer languages for retrieval of information from a database and of documents containing the needed information.
- Data extraction, transformation and loading tools: The tools for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure; a hand-rolled sketch of this pattern follows this list.
- Database development tools: The methodologies and tools used for creating the logical and physical structure of databases, such as logical data structures, diagrams, modelling methodologies and entity relationships.
- Database management systems: The tools for creating, updating and managing databases, such as Oracle, MySQL and Microsoft SQL Server.
- Data models: The techniques and existing systems used for structuring data elements and showing relationships between them, as well as methods for interpreting the data structures and relationships.
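To make the RDF query-language entry concrete, here is a minimal SPARQL sketch using the open-source Python library rdflib. The sample graph and the Dublin Core title predicate are illustrative assumptions, not part of this profile.

```python
# Minimal SPARQL sketch with rdflib; the sample graph is illustrative.
from rdflib import Graph

turtle_data = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/item/1> dc:title "Annual report 2020" .
<http://example.org/item/2> dc:title "Sensor archive, Q3" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# SELECT every archived item and its title from the RDF graph.
results = g.query("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?item ?title
    WHERE { ?item dc:title ?title }
""")
for item, title in results:
    print(item, title)
```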
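The extract, transform and load entry describes a pattern that dedicated products automate; the same three steps can be sketched by hand. This Python sketch uses only the standard library; the CSV file name and its columns are assumptions.

```python
# Hand-rolled extract-transform-load sketch using only the standard library.
# The CSV file name and its columns are illustrative assumptions.
import csv
import sqlite3

conn = sqlite3.connect("archive.db")
conn.execute("CREATE TABLE IF NOT EXISTS media (id TEXT PRIMARY KEY, title TEXT)")

with open("legacy_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Transform: trim whitespace and normalise the identifier to lower case.
        conn.execute(
            "INSERT OR REPLACE INTO media (id, title) VALUES (?, ?)",
            (row["id"].strip().lower(), row["title"].strip()),
        )
conn.commit()
conn.close()
```

Dedicated tools such as the ones listed under optional knowledge typically add scheduling, lineage tracking and error handling on top of this basic loop.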
Essential skills and competences
These skills are necessary for the role of big data archive librarian.
- Manage ICT data classification: Oversee the classification system an organisation uses to organise its data. Assign an owner to each data concept or bulk of concepts and determine the value of each item of data.
- Manage database: Apply database design schemes and models, define data dependencies, use query languages and database management systems (DBMS) to develop and manage databases.
- Comply with legal regulations: Ensure you are properly informed of the legal regulations that govern a specific activity and adhere to its rules, policies and laws.
- Analyse big data: Collect and evaluate numerical data in large quantities, especially for the purpose of identifying patterns between the data.
- Write database documentation: Develop documentation containing information about the database that is relevant to end users.
- Maintain data entry requirements: Uphold conditions for data entry. Follow procedures and apply data program techniques.
- Manage content metadata: Apply content management methods and procedures to define and use metadata concepts, such as the date of creation, in order to describe, organise and archive content such as documents, video and audio files, applications and images; see the sqlite3 sketch after this list.
- Manage data: Administer all types of data resources through their lifecycle by performing data profiling, parsing, standardisation, identity resolution, cleansing, enhancement and auditing. Ensure the data is fit for purpose, using specialised ICT tools to fulfil the data quality criteria.
- Manage archive users guidelines: Establish policy guidelines on public access to a (digital) archive and the cautious use of present materials. Communicate the guidelines to archive visitors.
- Maintain database performance: Calculate values for database parameters. Implement new releases and execute regular maintenance tasks such as establishing backup strategies and eliminating index fragmentation. Evaluate hardware products and operating systems.
- Maintain database security: Master a wide variety of information security controls in order to pursue maximal database protection.
- Manage digital archives: Create and maintain computer archives and databases, incorporating the latest developments in electronic information storage technology.
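Two of the skills above, managing a database and managing content metadata, can be illustrated together. Below is a minimal sketch using Python's built-in sqlite3 module; the table layout and field names are illustrative assumptions.

```python
# Sketch of archiving content with descriptive metadata in SQLite.
# Table layout and field names are illustrative assumptions.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("digital_archive.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS content (
        id INTEGER PRIMARY KEY,
        file_name TEXT NOT NULL,
        media_type TEXT NOT NULL,  -- e.g. document, video, audio, image
        created TEXT NOT NULL      -- date of creation, ISO 8601
    )
""")
conn.execute(
    "INSERT INTO content (file_name, media_type, created) VALUES (?, ?, ?)",
    ("oral_history_042.wav", "audio", datetime.now(timezone.utc).isoformat()),
)
conn.commit()

# Query-language side of the same skill: list the audio holdings.
for row in conn.execute(
    "SELECT file_name, created FROM content WHERE media_type = ?", ("audio",)
):
    print(row)
conn.close()
```

Storing the date of creation as an ISO 8601 string keeps the metadata sortable and queryable with plain string comparison.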
Optional knowledge and skills
Optional knowledge
This knowledge is sometimes, but not always, required for the role of big data archive librarian. However, mastering this knowledge allows you to have more opportunities for career development.
- Information structure: The type of infrastructure which defines the format of data: semi-structured, unstructured and structured.
- Informatica PowerCenter: The computer program Informatica PowerCenter is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Informatica.
- IBM InfoSphere DataStage: The computer program IBM InfoSphere DataStage is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company IBM.
- MDX: The computer language MDX is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Microsoft.
- Oracle Warehouse Builder: The computer program Oracle Warehouse Builder is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Oracle.
- Visual presentation techniques: The visual representation and interaction techniques, such as histograms, scatter plots, surface plots, tree maps and parallel coordinate plots, that can be used to present abstract numerical and non-numerical data in order to reinforce human understanding of this information; a minimal histogram sketch follows this list.
- Oracle Data Integrator: The computer program Oracle Data Integrator is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Oracle.
- DB2: The computer program IBM DB2 is a tool for creating, updating and managing databases, developed by the software company IBM.
- Microsoft Access: The computer program Access is a tool for creating, updating and managing databases, developed by the software company Microsoft.
- XQuery: The computer language XQuery is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the international standards organisation World Wide Web Consortium.
- Pentaho Data Integration: The computer program Pentaho Data Integration is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Pentaho.
- OpenEdge Database: The computer program OpenEdge Database is a tool for creating, updating and managing databases, developed by the software company Progress Software Corporation.
- ObjectStore: The computer program ObjectStore is a tool for creating, updating and managing databases, developed by the software company Object Design, Incorporated.
- MySQL: The computer program MySQL is a tool for creating, updating and managing databases, currently developed by the software company Oracle.
- SPARQL: The computer language SPARQL is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the international standards organisation World Wide Web Consortium.
- SQL Server Integration Services: The computer program SQL Server Integration Services is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Microsoft.
- Statistics: The study of statistical theory, methods and practices such as collection, organisation, analysis, interpretation and presentation of data. It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments in order to forecast and plan work-related activities.
- IBM InfoSphere Information Server: The software program IBM InfoSphere Information Server is a platform for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company IBM.
- IBM Informix: The computer program IBM Informix is a tool for creating, updating and managing databases, developed by the software company IBM.
- QlikView Expressor: The computer program QlikView Expressor is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company Qlik.
- Information confidentiality: The mechanisms and regulations which allow for selective access control and guarantee that only authorised parties (people, processes, systems and devices) have access to data, the way to comply with confidential information and the risks of non-compliance.
- LDAP: The computer language LDAP is a query language for retrieval of information from a database and of documents containing the needed information.
- PostgreSQL: The computer program PostgreSQL is a free and open-source software tool for creating, updating and managing databases, developed by the PostgreSQL Global Development Group.
- FileMaker (database management systems): The computer program FileMaker is a tool for creating, updating and managing databases, developed by the software company FileMaker Inc.
- SQL Server: The computer program SQL Server is a tool for creating, updating and managing databases, developed by the software company Microsoft.
- Linq: The computer language LINQ is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Microsoft.
- Data quality assessment: The process of revealing data issues using quality indicators, measures and metrics in order to plan data cleansing and data enrichment strategies according to data quality criteria; see the quality-indicator sketch after this list.
- Teradata database: The computer program Teradata Database is a tool for creating, updating and managing databases, developed by the software company Teradata Corporation.
- Oracle relational database: The computer program Oracle Rdb is a tool for creating, updating and managing databases, developed by the software company Oracle.
- N1QL: The computer language N1QL is a query language for retrieval of information from a database and of documents containing the needed information. It is developed by the software company Couchbase.
- SAP data services: The computer program SAP Data Services is a tool for integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure, developed by the software company SAP.
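To illustrate the data quality assessment entry, the sketch below computes two of the simplest indicators, completeness and key uniqueness, over a handful of records. The record layout is an illustrative assumption.

```python
# Sketch of two simple data quality indicators: completeness and uniqueness.
# The record layout is an illustrative assumption.
records = [
    {"id": "A1", "title": "Census microdata", "created": "2019-04-02"},
    {"id": "A2", "title": "", "created": "2020-11-30"},
    {"id": "A2", "title": "Web crawl snapshot", "created": None},
]

total = len(records)
complete = sum(1 for r in records if all(r.values()))
unique_ids = len({r["id"] for r in records})

print(f"completeness: {complete / total:.0%}")     # share of fully populated records
print(f"id uniqueness: {unique_ids / total:.0%}")  # duplicate keys signal identity issues
```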
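For the visual presentation techniques entry, a histogram is the easiest of the listed techniques to demonstrate. This sketch uses the open-source matplotlib library; the sample file sizes are assumptions.

```python
# Minimal histogram sketch with matplotlib; the sample data are illustrative.
import matplotlib.pyplot as plt

file_sizes_mb = [1.2, 3.4, 0.8, 2.1, 5.6, 3.3, 0.9, 4.2, 2.8, 1.7]

plt.hist(file_sizes_mb, bins=5)
plt.xlabel("File size (MB)")
plt.ylabel("Number of archived files")
plt.title("Distribution of file sizes in the archive")
plt.show()
```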
Optional skills and competences
These skills and competences are sometimes, but not always, required for the role of big data archive librarian. However, mastering these skills and competences allows you to have more opportunities for career development.
- Migrate existing data: Apply migration and conversion methods for existing data, in order to transfer or convert data between formats, storage or computer systems.
- Normalise data: Reduce data to their accurate core form (normal forms) in order to achieve such results as minimisation of dependency, elimination of redundancy and increase of consistency; see the normalisation sketch after this list.
- Digitise documents: Load analog documents by converting them into a digital format, using specialised hardware and software.
- Integrate ICT data: Combine data from sources to provide a unified view of the set of these data.
- Design database scheme: Draft a database scheme by following the Relational Database Management System (RDBMS) rules in order to create a logically arranged group of objects such as tables, columns and processes.
- Develop ICT workflow: Create repeatable patterns of ICT activity within an organisation which enhance the systematic transformation of products, informational processes and services through their production.
- Monitor technology trends: Survey and investigate recent trends and developments in technology. Observe and anticipate their evolution, according to current or future market and business conditions.
- Apply information security policies: Implement policies, methods and regulations for data and information security in order to respect confidentiality, integrity and availability principles.
- Manage data collection systems: Develop and manage methods and strategies used to maximise data quality and statistical efficiency in the collection of data, in order to ensure the gathered data are optimised for further processing.
- Design database backup specifications: Specify procedures to be performed on databases which ensure the copying and archiving of data for possible restoration in case of a data loss event.
- Give live presentation: Deliver a speech or talk in which a new product, service, idea, or piece of work is demonstrated and explained to an audience.
- Perform backups: Implement backup procedures to back up data and systems and ensure permanent and reliable system operation. Execute data backups in order to secure information by copying and archiving, ensuring integrity during system integration and after data loss; a minimal backup sketch follows this list.
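The normalise data skill can be shown on a tiny in-memory table: repeated collection and curator values are split out so each fact is stored once. Field names are illustrative assumptions.

```python
# Normalisation sketch: split a redundant flat table into two related tables.
# Field names are illustrative assumptions.
flat = [
    {"file": "report.pdf",  "collection": "Finance", "curator": "J. Smith"},
    {"file": "ledger.xlsx", "collection": "Finance", "curator": "J. Smith"},
    {"file": "promo.mp4",   "collection": "Media",   "curator": "A. Jones"},
]

# Each collection (and its curator) is stored once, removing the redundancy.
collection_curators = {}
for row in flat:
    collection_curators.setdefault(row["collection"], row["curator"])

# Each file now references its collection by key instead of repeating it.
files = [(row["file"], row["collection"]) for row in flat]

print(collection_curators)  # {'Finance': 'J. Smith', 'Media': 'A. Jones'}
print(files)
```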
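The perform backups skill (and the backup specifications entry above it) can be illustrated with SQLite's online backup API, available in Python 3.7+ as sqlite3.Connection.backup. The database file names are assumptions.

```python
# Backup sketch using sqlite3's built-in online backup API (Python 3.7+).
# Database file names are illustrative assumptions.
import sqlite3

source = sqlite3.connect("digital_archive.db")
target = sqlite3.connect("digital_archive_backup.db")

with target:
    source.backup(target)  # copies all pages of the source database

source.close()
target.close()
```

Because the backup API copies pages inside its own transaction, the source database can remain in use while the copy runs.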
ISCO group and title
3433 – Gallery, museum and library technicians
References
- Big data archive librarian – ESCO