Data Miners Group

BPC-DP Coding Like a Data Miner

A Culturally Relevant Data Analytics Intervention for High School Students


Coding Like a Data Miner: A Culturally Relevant Data Analytics Intervention for High School Students

The "Coding Like a Data Miner" project is a data science-based computer science (CS) curriculum funded by the National Science Foundation that uses culturally relevant and responsive pedagogies to constructively center diverse learners in STEM. The curriculum was co-created through participatory curriculum design sessions with educators and youth stakeholders, resulting in activities that begin with highly scaffolded learning to increase accessibility and eventually transition to free inquiry, allowing learners to pursue topics along their personal interests, cultural backgrounds, and/or sociopolitical histories.

The curriculum emphasizes student access to datasets from the social media platform Twitter. These data sources are familiar to social media natives and span a wide range of topics and opinions, so learners can participate along trajectories rooted in their personally and culturally relevant interests. The curriculum is structured around Twitter's Application Programming Interface (API), which provides an authentic sandbox-like environment (sandbox data science). This environment retains material, personal, and disciplinary agency for learners, letting them go beyond their typical roles as consumers of information and actively serve as producers of knowledge on their own terms with real, complex, and messy data. Using the API also provides an opportunity for practical computer science skill development. To read more about our project development, check out our publications and presentations repository.

The curriculum consists of 17 modules that iteratively guide learners through authentic data science practices, organized by the amount of scaffolding and support offered: Introduction (7 modules), Guided Inquiry (5 modules), Scaffolded Inquiry (3 modules), and Free Inquiry (2 modules). The curriculum overview and learning objectives detailed below also align with the College Board's AP Computer Science Principles course. For teacher resources, check out our repository of lesson plans associated with each module.

 

Curriculum Overview and Learning Objectives

 

 

Module

Objectives

Introduction

1.1 Curriculum Overview

Provides an overview of curriculum structure and goals

1.2 Introduction to Data Visualization

Students will be able to create simple visualizations in order to better understand data and identify trends

Module 1.2.1

Understand different types of data

Module 1.2.2

Visualize data using Google Earth

Module 1.2.3

Visualize data by creating a word cloud

Module 1.2.4

Visualize one-dimensional data using Excel

Module 1.2.5

Visualize two-dimensional data using Excel

1.3 Introduction to Data Analysis

Students will be able to perform basic data analysis in order to better understand data, its types, and structures

Module 1.3.1

Introduction to data analysis

Module 1.3.2

Introduction to basic data analysis techniques

Module 1.3.3

Introduction to basic quantitative data analysis

Module 1.3.4

Introduction to basic qualitative data analysis

1.4 Introduction to Data Gathering

Students will be able to import and export data using Python in order to manipulate data using different tools

Module 1.4.1

Introduction to data collection methods

Module 1.4.2

Import and export data using existing code

Module 1.4.3

Collect data using Python

1.5 Introduction to Statistics

Students will be able to use and manipulate existing Python code in order to extract basic statistical information from data

Module 1.5.1

Introduction to basic statistics

Module 1.5.2

Introduction to descriptive statistics

Module 1.5.3

Execute code fragments to calculate descriptive statistics

Module 1.5.4

Introduction to data variability

Module 1.5.5

Execute code to extract data range and interquartile range

Module 1.5.6

Execute code to draw a boxplot diagram

1.6 Introduction to Data Pre-processing

Students will be able to perform basic data preprocessing in order to address missing data, inconsistent data, and noisy data

Module 1.6.1

Introduction to data pre-processing and its importance

Module 1.6.2

Identify missing data and how to deal with it

Module 1.6.3

Identify noisy data and how to deal with it

Module 1.6.4

General data pre-processing methods:

a. Data normalization

b. Data attribute selection

c. Data reduction

1.7 Introduction to Coding

Students will be able to understand basic Python code in order to use Python to manipulate, analyze, and visualize data

Module 1.7.1

Introduction to Google Colab

Module 1.7.2

Introduction to basic coding in Python

Module 1.7.3

Introduction to basic data types in Python (integers, floats, dictionaries)

Module 1.7.4

Introduction to basic condition structures

Module 1.7.5

Introduction to basic loop structures

Guided Inquiry

2.1 Let’s Gather Some Data

Students will be able to manipulate code to run queries against Twitter in order to extract data

Module 2.1.1

Understand different data gathering sources

Module 2.1.2

Create a developer account on Twitter

Module 2.1.3

Create an application on Twitter

Module 2.1.4

Set up a connection to Twitter

Module 2.1.5

Set up a search query

Module 2.1.6

Extract data from Twitter

2.2 Let’s Do Preprocessing

Students will be able to understand and manipulate Python code in order to perform basic data preprocessing steps

Module 2.2.1

Use Python code to identify missing data

Module 2.2.2

Use Python code to identify noisy data

Module 2.2.3

Use Python to perform data normalization

Module 2.2.4

Use Python to perform data reduction

 

2.3 Let’s Analyze

Students will be able to apply descriptive statistics in order to analyze and manipulate datasets

 

Module 2.3.1

Perform basic quantitative data analysis

 

Module 2.3.2

Apply variability analysis

 

Module 2.3.3

Conduct an outlier analysis

 

Module 2.3.4

Conduct variance analysis

 

2.4 Let’s Do Statistics

Students will be able to manipulate existing Python code in order to generate statistical visualizations

 

Module 2.4.1

Manipulate code to visualize statistical distributions on numerical data

 

Module 2.4.2

Manipulate code to visualize statistical distributions on categorical data

 

Module 2.4.3

Manipulate code to visualize a boxplot

 

Module 2.4.4

Manipulate code to visualize statistical relationship among variables

 

2.5 Let’s Visualize

Students will be able to use Python libraries and the IBM Watson Community Portal in order to visualize data using several methods

 

Module 2.5.1

Create data visualizations using advanced Excel charts

 

Module 2.5.2

Create and visualize a pivot table

 

Module 2.5.3

Introduction to visualizations using Python Library 1 (Matplotlib)

 

Module 2.5.4

Introduction to visualizations using Python Library 2 (Seaborn)

Scaffolded Inquiry

3.1 Let’s Mine Twitter Data

Students will be able to set up a connection to the Twitter API in order to extract datasets

Module 3.1.1

Create a developer account on Twitter

Module 3.1.2

Create an application on Twitter

Module 3.1.3

Set up a connection to Twitter

Module 3.1.4

Set up a search query

Module 3.1.5

Extract data from Twitter

Module 3.1.6

Summarize data with a worksheet

3.2 Let’s Analyze Twitter Data

Students will be able to identify and apply descriptive statistics and data variability analysis in order to objectively answer questions

Module 3.2.1

Identify and apply descriptive analysis to answer questions

Module 3.2.2

Identify and apply variability analysis to answer questions

Module 3.2.3

Examine data to determine whether outliers exist

3.3 Let’s Visualize Twitter Data

Students will be able to use advanced features of the Matplotlib and Seaborn data visualization libraries in order to construct advanced visualizations

Module 3.3.1

Visualize Twitter data using word cloud

Module 3.3.2

Visualize Twitter data using the Matplotlib library

Module 3.3.3

Visualize Twitter data using the Seaborn library

Free Inquiry

4.1 Choose a Topic for Mining Twitter Data

Students will be able to independently design, implement, and report on a line of inquiry based on unique primary-source datasets

Module 4.1.1

Select a topic or domain

Module 4.1.2

Identify questions

Module 4.1.3

Mine datasets that pertain to the questions

Module 4.1.4

Analyze datasets in order to objectively answer questions

4.2 Topic Sharing and Group Discussion

Students will be able to present findings for a line of inquiry based on a primary data source drawn from a social media platform

Module 4.2.1

Open discussion on topics

 

Module 4.2.2

Open discussion on datasets

 

Module 4.2.3

Open discussion on questions

 

Module 4.2.4

Open discussion on produced results
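To give a sense of the kind of analysis the statistics and analysis modules (1.5, 1.6, 2.3, and 3.2) build toward, here is a minimal sketch in Python. It is not taken from the curriculum materials: the `like_counts` values are made-up sample data, the `summarize` helper is a hypothetical name, and the 1.5 × IQR fence is one common convention for flagging outliers, assumed here for illustration. The sketch drops missing values, then reports the median, range, interquartile range, and any outliers.

```python
import statistics

# Hypothetical like counts for ten posts; None marks a missing value.
like_counts = [12, 7, None, 15, 9, 220, 11, 8, None, 14]


def summarize(values):
    """Drop missing entries, then compute median, range, IQR, and outliers."""
    # Pre-processing step: keep only the entries that are actually present.
    clean = sorted(v for v in values if v is not None)

    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1

    # Assumed convention: values beyond 1.5 * IQR from the quartiles
    # are treated as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    return {
        "count": len(clean),
        "missing": len(values) - len(clean),
        "median": q2,
        "range": clean[-1] - clean[0],
        "iqr": iqr,
        "outliers": [v for v in clean if v < low_fence or v > high_fence],
    }


summary = summarize(like_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this sample data, the two missing entries are excluded before any statistics are computed, and the value 220 is the only one flagged as an outlier. The same summary values are what a boxplot draws, which is why the curriculum pairs variability analysis with boxplot visualization.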

 


Data Miners Group, University of Texas at El Paso