Coding Like a Data Miner: A Culturally Relevant Data Analytics Intervention for High School Students
The "Coding Like a Data Miner" project is a data science-based computer science (CS) curriculum funded by the National Science Foundation that uses culturally relevant and responsive pedagogies to constructively center diverse learners in STEM. The curriculum was co-created through participatory curriculum design sessions with educators and youth stakeholders, resulting in activities that begin with highly scaffolded learning to increase accessibility and eventually transition to free inquiry, allowing learners to pursue topics along their personal interests, cultural backgrounds, and/or sociopolitical histories.
The curriculum emphasizes student access to datasets from the social media platform Twitter. These data sources are both familiar to social media natives and expansive across topics and opinions, enabling learners to participate along trajectories couched in their personally and culturally relevant interests. The curriculum was structured around the use of Twitter's Application Programming Interface (API), which provides an authentic sandbox-like environment (sandbox data science). This environment retains material, personal, and disciplinary agency for learners, letting them go beyond their typical roles as consumers of information and actively serve as producers of knowledge on their own terms with real, complex, and messy data. The use of API access also provides an opportunity for practical computer science skill development. To read more about our project development, check out our publications and presentations repository.
The curriculum consists of 17 modules that iteratively guide learners through authentic data science practices, organized by the amount of scaffolding and support offered: Introduction (7 modules), Guided Inquiry (5 modules), Scaffolded Inquiry (3 modules), and Free Inquiry (2 modules). The curriculum overview and learning objectives detailed below also align with the College Board AP Computer Science Principles course. For teacher resources, check out our repository of lesson plans associated with each module.
Curriculum Overview and Learning Objectives
|
Module |
Objectives |
Introduction |
1.1 Curriculum Overview |
Provides an overview of curriculum structure and goals |
1.2 Introduction to Data Visualization |
Students will be able to create simple visualizations in order to better understand data and identify trends |
|
Module 1.2.1 |
Understand different types of data |
|
Module 1.2.2 |
Visualize data using Google Earth |
|
Module 1.2.3 |
Visualize data by creating a word cloud |
|
Module 1.2.4 |
Visualize one-dimensional data using Excel |
|
Module 1.2.5 |
Visualize two-dimensional data using Excel |
|
1.3 Introduction to Data Analysis |
Students will be able to perform basic data analysis in order to better understand data, its types, and structures |
|
Module 1.3.1 |
Introduction to data analysis |
|
Module 1.3.2 |
Introduction to basic data analysis techniques |
|
Module 1.3.3 |
Introduction to basic quantitative data analysis |
|
Module 1.3.4 |
Introduction to basic qualitative data analysis |
|
1.4 Introduction to Data Gathering |
Students will be able to import and export data using Python in order to manipulate data using different tools |
|
Module 1.4.1 |
Introduction to data collection methods |
|
Module 1.4.2 |
Import and export data using existing code |
|
Module 1.4.3 |
Collect data using Python |
|
1.5 Introduction to Statistics |
Students will be able to use and manipulate existing Python code in order to extract basic statistical information from data |
|
Module 1.5.1 |
Introduction to basic statistics |
|
Module 1.5.2 |
Introduction to descriptive statistics |
|
Module 1.5.3 |
Execute code fragments to calculate descriptive statistics |
|
Module 1.5.4 |
Introduction to data variability |
|
Module 1.5.5 |
Execute code to extract data range and interquartile range |
|
Module 1.5.6 |
Execute code to draw a boxplot diagram |
|
1.6 Introduction to Data Pre-processing |
Students will be able to perform basic data preprocessing in order to address missing data, inconsistent data, and noisy data |
|
Module 1.6.1 |
Introduction to data pre-processing and its importance |
|
Module 1.6.2 |
Identify missing data and how to deal with it |
|
Module 1.6.3 |
Identify noisy data and how to deal with it |
|
Module 1.6.4 |
General data pre-processing methods: a. Data normalization b. Data attribute selection c. Data reduction |
|
1.7 Introduction to Coding |
Students will be able to understand basic Python code in order to use Python to manipulate, analyze, and visualize data |
|
Module 1.7.1 |
Introduction to Google Colab |
|
Module 1.7.2 |
Introduction to basic coding in Python |
|
Module 1.7.3 |
Introduction to basic data types in Python (integer, float, dictionary) |
|
Module 1.7.4 |
Introduction to basic condition structures |
|
Module 1.7.5 |
Introduction to basic loop structures |
|
Guided Inquiry |
2.1 Let’s Gather Some Data |
Students will be able to manipulate code and run queries against Twitter in order to extract data |
Module 2.1.1 |
Understand different data gathering sources |
|
Module 2.1.2 |
Create a developer account on Twitter |
|
Module 2.1.3 |
Create an application on Twitter |
|
Module 2.1.4 |
Set up a connection to Twitter |
|
Module 2.1.5 |
Set up a search query |
|
Module 2.1.6 |
Extract data from Twitter |
|
2.2 Let’s Do Preprocessing |
Students will be able to understand and manipulate Python code in order to perform basic data preprocessing steps |
|
Module 2.2.1 |
Use Python code to identify missing data |
|
Module 2.2.2 |
Use Python code to identify noisy data |
|
Module 2.2.3 |
Use Python to perform data normalization |
|
Module 2.2.4 |
Use Python to perform data reduction |
|
|
2.3 Let’s Analyze |
Students will be able to apply descriptive statistics in order to analyze and manipulate datasets |
|
Module 2.3.1 |
Perform basic quantitative data analysis |
|
Module 2.3.2 |
Apply variability analysis |
|
Module 2.3.3 |
Conduct an outlier analysis |
|
Module 2.3.4 |
Conduct variance analysis |
|
2.4 Let’s Do Statistics |
Students will be able to manipulate existing Python code in order to generate statistical visualizations |
|
Module 2.4.1 |
Manipulate code to visualize statistical distributions on numerical data |
|
Module 2.4.2 |
Manipulate code to visualize statistical distributions on categorical data |
|
Module 2.4.3 |
Manipulate code to visualize a boxplot |
|
Module 2.4.4 |
Manipulate code to visualize statistical relationship among variables |
|
2.5 Let’s Visualize |
Students will be able to use Python libraries and the IBM Watson Community Portal in order to visualize data using several methods |
|
Module 2.5.1 |
Create data visualizations using advanced Excel charts |
|
Module 2.5.2 |
Create and visualize a pivot table |
|
Module 2.5.3 |
Introduction to visualizations using Python Library 1 (Matplotlib) |
|
Module 2.5.4 |
Introduction to visualizations using Python Library 2 (Seaborn library) |
Scaffolded Inquiry |
3.1 Let’s Mine Twitter Data |
Students will be able to set up a connection to the Twitter API in order to extract datasets |
Module 3.1.1 |
Create a developer account on Twitter |
|
Module 3.1.2 |
Create an application on Twitter |
|
Module 3.1.3 |
Set up a connection to Twitter |
|
Module 3.1.4 |
Set up a search query |
|
Module 3.1.5 |
Extract data from Twitter |
|
Module 3.1.6 |
Summarize data with a worksheet |
|
3.2 Let’s Analyze Twitter Data |
Students will be able to identify and apply descriptive statistics and data variability analysis in order to objectively answer questions |
|
Module 3.2.1 |
Identify and apply descriptive analysis to answer questions |
|
Module 3.2.2 |
Identify and apply variability analysis to answer questions |
|
Module 3.2.3 |
Examine data to determine whether outliers exist |
|
3.3 Let’s Visualize Twitter Data |
Students will be able to use advanced features of the Matplotlib and Seaborn data visualization libraries in order to construct advanced visualizations |
|
Module 3.3.1 |
Visualize Twitter data using a word cloud |
|
Module 3.3.2 |
Visualize Twitter data using the Matplotlib library |
|
Module 3.3.3 |
Visualize Twitter data using the Seaborn library |
|
Free Inquiry |
4.1 Choose a Topic for Mining Twitter Data |
Students will be able to independently design, implement and report on a line of inquiry based on unique primary source data sets |
Module 4.1.1 |
Select a topic or domain |
|
Module 4.1.2 |
Identify questions |
|
Module 4.1.3 |
Mine datasets that pertain to the questions |
|
Module 4.1.4 |
Analyze datasets in order to objectively answer questions |
|
4.2 Topic Sharing and Group Discussion |
Students will be able to present findings for a line of inquiry based on a primary data source drawn from a social media platform |
|
Module 4.2.1 |
Open discussion on topics |
|
|
Module 4.2.2 |
Open discussion on datasets |
|
Module 4.2.3 |
Open discussion on questions |
|
Module 4.2.4 |
Open discussion on produced results |
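
To give a sense of the descriptive statistics, variability, and outlier analysis named in modules 1.5, 2.3, and 3.2, here is a minimal Python sketch. It is an illustration only and is not taken from the curriculum's own notebooks. The function name `summarize`, the variable `post_lengths`, and the sample numbers are hypothetical, and the 1.5 × IQR rule for flagging outliers is one common convention assumed here rather than the method the modules necessarily teach.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics and IQR-based outliers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    # Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "range": ordered[-1] - ordered[0],
        "iqr": iqr,
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Hypothetical data: character counts of ten collected posts (made up).
post_lengths = [42, 58, 61, 65, 70, 72, 75, 80, 88, 240]
summary = summarize(post_lengths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample, the single very long post pulls the mean well above the median, which is the kind of observation the variability and outlier modules ask students to notice and explain.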