Monitoring Data Quality Metrics
Expert-defined terms from the Professional Certificate in Data Quality Assurance using AI in Education course at Greenwich School of Business and Finance. Free to read, free to share, paired with a globally recognised certification pathway.
Accuracy #
Accuracy
: The degree to which data correctly describes the real-world object or event it… #
High accuracy indicates that the data closely matches the actual value.
Challenge #
Ensure data accuracy in large datasets with various sources and formats.
Example #
A student's address data with 95% accuracy would have five incorrectly recorded addresses out of a hundred.
Completeness #
Completeness
: The extent to which data is populated in a dataset, meaning there are no missi… #
: The extent to which data is populated in a dataset, meaning there are no missing or null values.
Challenge #
Maintaining completeness in data entry processes and ensuring consistency across data sources.
Example #
A dataset of student test scores with 80% completeness would have twenty scores missing out of a hundred.
Consistency #
Consistency
: The degree to which data is presented in a uniform manner and follows a specif… #
: The degree to which data is presented in a uniform manner and follows a specific format or pattern.
Challenge #
Ensuring consistency when multiple users or sources contribute data.
Example #
A dataset of student ages should consistently use the format "XX years" (e.g., "15 years," not "15 yrs" or "15Y").
Data governance #
Data governance
Challenge #
Implementing effective data governance to ensure high-quality data in a rapidly changing technological landscape.
Example #
Data governance policies may include rules for data access, data sharing, and data security.
Data lineage #
Data lineage
: The life cycle of data, including its origin, any transformations, and its cur… #
: The life cycle of data, including its origin, any transformations, and its current location.
Challenge #
Tracking data lineage in complex systems with multiple data sources and transformations.
Example #
Data lineage tools can help trace data from its original source through various transformations to its final destination.
Data quality #
Data quality
: The overall condition of data, encompassing its accuracy, completeness, consis… #
: The overall condition of data, encompassing its accuracy, completeness, consistency, timeliness, and relevance.
Challenge #
Ensuring high data quality in large, dynamic datasets.
Example #
A data quality dashboard can provide real-time insights into data quality metrics to help identify and address issues.
Data quality metrics #
Data quality metrics
: Measurable attributes of data that indicate its overall quality, such as accur… #
: Measurable attributes of data that indicate its overall quality, such as accuracy, completeness, consistency, and timeliness.
Challenge #
Defining and tracking appropriate data quality metrics for specific use cases.
Example #
Data quality metrics for student performance data might include accuracy, completeness, consistency, and timeliness.
Data quality report #
Data quality report
: A document or visualization that summarizes data quality metrics, often includ… #
: A document or visualization that summarizes data quality metrics, often including trends, issues, and recommendations for improvement.
Challenge #
Creating meaningful and actionable data quality reports that effectively communicate data quality issues and solutions.
Example #
A data quality report for a school district might include a dashboard with data quality metrics, trends, and comparisons to benchmarks.
Data validation #
Data validation
: The process of checking data for errors, inconsistencies, or missing values #
: The process of checking data for errors, inconsistencies, or missing values.
Challenge #
Implementing automated data validation processes to ensure high data quality and efficiency.
Example #
Data validation rules might include checks for data type, format, range, and completeness.
Precision #
Precision
: The degree to which data is free from random errors and closely matches the ac… #
: The degree to which data is free from random errors and closely matches the actual value.
Challenge #
Ensuring high precision in data collection and analysis processes.
Example #
Precision in student test scores might be measured as the standard deviation of scores around the mean.
Relevance #
Relevance
: The degree to which data is applicable, useful, and meaningful for a specific… #
: The degree to which data is applicable, useful, and meaningful for a specific purpose or context.
Challenge #
Ensuring data relevance in a rapidly changing environment with evolving data needs.
Example #
A dataset of student demographics might be relevant for analyzing achievement gaps, but not for predicting student success.
Standardization #
Standardization
: The process of defining and applying consistent data formats, structures, and… #
: The process of defining and applying consistent data formats, structures, and conventions.
Challenge #
Implementing standardization across multiple data sources and stakeholders.
Example #
Standardization might include defining consistent data formats for dates, addresses, and phone numbers.
Timeliness #
Timeliness
: The degree to which data is available and up-to-date for its intended use #
: The degree to which data is available and up-to-date for its intended use.
Challenge #
Ensuring timely data availability in large, complex systems.
Example #
Timeliness in student performance data might be measured as the time between data collection and data availability for analysis.
Validity #
Validity
: The degree to which data conforms to established rules, conventions, or standa… #
: The degree to which data conforms to established rules, conventions, or standards.
Challenge #
Ensuring data validity in complex data environments with multiple sources and formats.
Example #
Validity in student data might include checks for data type, format, and completeness.