CheckTick requires scheduled tasks for data governance operations, housekeeping, and external data syncing. This guide explains how to set up these tasks on different hosting platforms.
Overview
CheckTick uses six scheduled tasks:
1. Data Governance (Required for GDPR)
The process_data_governance management command runs daily to:
- Send deletion warnings - Email notifications 30 days, 7 days, and 1 day before automatic deletion
- Soft-delete expired surveys - Automatically soft-delete surveys that have reached their retention period
- Hard-delete surveys - Permanently delete surveys 30 days after soft deletion
Legal Requirement: This task is required for GDPR compliance. Failure to run it may result in data being retained longer than legally allowed.
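You can preview what a run would change at any time with the dry-run flag (see the Command Reference below for all options):

```bash
# Safe preview - reports pending warnings and deletions without changing anything
python manage.py process_data_governance --dry-run
```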
2. Survey Progress Cleanup (Recommended)
The cleanup_survey_progress management command runs daily to:
- Delete expired progress records - Removes incomplete survey progress older than 30 days
- Free up database storage - Keeps the database lean by removing stale session data
Recommended: While not legally required, this prevents database bloat and improves performance. Progress records are only needed while users are actively completing surveys.
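As with data governance, a dry run reports what would be removed without deleting anything:

```bash
# Safe preview - counts expired progress records without deleting them
python manage.py cleanup_survey_progress --dry-run
```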
3. External Dataset Sync (Recommended)
The sync_external_datasets management command runs daily to:
- Update hospital lists - Syncs hospitals, NHS trusts, and other organizational data from the RCPCH API
- Maintain accuracy - Ensures dropdown options reflect current NHS organizational structure
- Enable offline use - Stores data locally so surveys work without API dependency
Recommended: Run this daily to keep external datasets fresh. Datasets are used in prefilled dropdown fields. Without periodic sync, data may become stale (hospitals close, trusts merge, etc.).
Initial Setup: Just run the sync command once to populate datasets (creates records automatically):
# One-time: Create datasets and populate data (takes 2-3 minutes)
python manage.py sync_external_datasets
4. NHS Data Dictionary Sync (Recommended)
The sync_nhs_dd_datasets management command runs weekly to:
- Scrape standardized lists - Fetches codes and values from NHS Data Dictionary website
- Update existing datasets - Keeps NHS DD datasets current with official sources
- Maintain accuracy - Ensures dropdown options match authoritative NHS standards
Recommended: Run this weekly to keep NHS DD datasets fresh. See NHS Data Dictionary Datasets for the full list of scraped datasets.
Initial Setup: Just run the sync command once to populate datasets (creates records automatically):
# One-time: Create datasets and scrape data (takes 1-2 minutes)
python manage.py sync_nhs_dd_datasets
Maintenance:
# Check which datasets would be synced
python manage.py sync_nhs_dd_datasets --dry-run
# Force re-scrape all datasets (even if recently scraped)
python manage.py sync_nhs_dd_datasets --force
# Sync a single dataset
python manage.py sync_nhs_dd_datasets --dataset smoking_status_code
5. Global Question Group Templates Sync (Recommended)
The sync_global_question_group_templates management command runs daily to:
- Import global templates - Syncs question group templates from the `docs/question-group-templates/` folder
- Update existing templates - Keeps templates current with repository changes
- Enable template library - Makes reusable question groups available for import
Recommended: Run this daily if you maintain custom global templates in your repository. If you're using the standard CheckTick distribution without custom templates, this task is optional.
Initial Setup: Run the sync command once to populate templates (creates records automatically):
# One-time: Create templates from markdown files
python manage.py sync_global_question_group_templates
Maintenance:
# Check which templates would be synced
python manage.py sync_global_question_group_templates --dry-run
# Force update all templates (even if unchanged)
python manage.py sync_global_question_group_templates --force
6. Recovery Time Delay Processing (Required if Recovery Enabled)
The process_recovery_time_delays management command runs frequently to:
- Check expired time delays - Finds recovery requests where the mandatory waiting period has elapsed
- Update status - Transitions requests from "In Time Delay" to "Ready for Execution"
- Send notifications - Alerts administrators that a recovery is ready to execute
- Create audit entries - Logs the automatic transition for regulatory compliance
Required for Recovery: If you use the ethical key recovery feature, this command ensures recovery requests proceed after their mandatory waiting period (24-48 hours). Without it, approved recovery requests will remain stuck in the time delay state.
Schedule: Run every 5 minutes (*/5 * * * *) to ensure timely processing. The command is idempotent and lightweight - it only processes requests whose time delays have expired.
# Check for expired time delays (dry-run)
python manage.py process_recovery_time_delays --dry-run
# Process expired time delays
python manage.py process_recovery_time_delays --verbose
Prerequisites
- CheckTick deployed and running
- Email configured (for sending deletion warnings)
- Access to your hosting platform's scheduling features
Platform-Specific Setup
Northflank (Recommended)
Northflank provides native cron job support, making this the simplest option.
1. Create Data Governance Cron Job
- Go to your Northflank project
- Click "Add Service" โ "Cron Job"
- Configure the job:
- Name:
checktick-data-governance - Docker Image: Use the same image as your web service (e.g.,
ghcr.io/eatyourpeas/checktick:latest) - Schedule:
0 2 * * *(runs at 2 AM UTC daily) - Command:
python manage.py process_data_governance
2. Create Survey Progress Cleanup Cron Job
- Click "Add Service" โ "Cron Job" again
- Configure the job:
- Name:
checktick-progress-cleanup - Docker Image: Use the same image as your web service
- Schedule:
0 3 * * *(runs at 3 AM UTC daily, after data governance) - Command:
python manage.py cleanup_survey_progress
3. Create External Dataset Sync Cron Job
- Click "Add Service" โ "Cron Job" again
- Configure the job:
- Name:
checktick-dataset-sync - Docker Image: Use the same image as your web service
- Schedule:
0 4 * * *(runs at 4 AM UTC daily) - Command:
python manage.py sync_external_datasets
4. Create NHS DD Sync Cron Job
- Click "Add Service" โ "Cron Job" again
- Configure the job:
- Name:
checktick-nhs-dd-sync - Docker Image: Use the same image as your web service
- Schedule:
0 5 * * 0(runs at 5 AM UTC every Sunday - weekly) - Command:
python manage.py sync_nhs_dd_datasets
5. Create Global Templates Sync Cron Job
- Click "Add Service" โ "Cron Job" again
- Configure the job:
- Name:
checktick-templates-sync - Docker Image: Use the same image as your web service
- Schedule:
0 6 * * *(runs at 6 AM UTC daily) - Command:
python manage.py sync_global_question_group_templates
6. Create Recovery Time Delay Cron Job
- Click "Add Service" โ "Cron Job" again
- Configure the job:
- Name:
checktick-recovery-time-delays - Docker Image: Use the same image as your web service
- Schedule:
*/5 * * * *(runs every 5 minutes) - Command:
python manage.py process_recovery_time_delays
7. Copy Environment Variables
All cron jobs need the same environment variables as your web service:
- In Northflank, go to your web service → Environment
- Copy all environment variables
- Go to each cron job service → Environment
- Paste the variables
Critical variables needed:
- `DATABASE_URL`
- `SECRET_KEY`
- `EMAIL_HOST`, `EMAIL_PORT`, `EMAIL_HOST_USER`, `EMAIL_HOST_PASSWORD` (for data governance)
- `DEFAULT_FROM_EMAIL` (for data governance)
- `SITE_URL` (for email links in data governance)
- `EXTERNAL_DATASET_API_URL`, `EXTERNAL_DATASET_API_KEY` (for dataset sync - optional, defaults to the RCPCH API)
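For reference, a sketch of these in dotenv form - every value below is a placeholder:

```bash
DATABASE_URL=postgres://checktick:password@db-host:5432/checktick
SECRET_KEY=replace-with-a-long-random-string
EMAIL_HOST=smtp.example.com
EMAIL_PORT=587
EMAIL_HOST_USER=mailer@example.com
EMAIL_HOST_PASSWORD=replace-me
DEFAULT_FROM_EMAIL=noreply@example.com
SITE_URL=https://checktick.example.com
# Optional - defaults to the RCPCH API when unset
EXTERNAL_DATASET_API_URL=https://api.example.com/datasets
EXTERNAL_DATASET_API_KEY=replace-me
```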
8. Deploy and Test
- Deploy all cron job services
- Test them manually via the Northflank dashboard: Jobs → Run Now
- Check logs to verify successful execution
- Monitor the History tab for scheduled runs
Important: For external dataset sync, run the initial setup command once:
# In your web service terminal (via Northflank shell or kubectl exec)
python manage.py sync_external_datasets
Northflank Advantages:
- ✅ No extra containers running 24/7
- ✅ Easy manual testing via UI
- ✅ Built-in logging and monitoring
- ✅ No additional cost (same compute as web service, but only active for ~1-5 minutes daily)
Docker Compose (Local/VPS)
If you're self-hosting with Docker Compose on a VPS or dedicated server, use the system's cron.
1. Create Cron Scripts
Create /usr/local/bin/checktick-data-governance.sh:
#!/bin/bash
# CheckTick Data Governance Cron Job
# Runs daily at 2 AM UTC
# Set working directory
cd /path/to/your/checktick-app
# Run the management command
docker compose exec -T web python manage.py process_data_governance >> /var/log/checktick/data-governance.log 2>&1
# Exit with the command's exit code
exit $?
Create /usr/local/bin/checktick-progress-cleanup.sh:
#!/bin/bash
# CheckTick Survey Progress Cleanup Cron Job
# Runs daily at 3 AM UTC
# Set working directory
cd /path/to/your/checktick-app
# Run the cleanup command
docker compose exec -T web python manage.py cleanup_survey_progress >> /var/log/checktick/progress-cleanup.log 2>&1
# Exit with the command's exit code
exit $?
Create /usr/local/bin/checktick-dataset-sync.sh:
#!/bin/bash
# CheckTick External Dataset Sync Cron Job
# Runs daily at 4 AM UTC
# Set working directory
cd /path/to/your/checktick-app
# Run the sync command
docker compose exec -T web python manage.py sync_external_datasets >> /var/log/checktick/dataset-sync.log 2>&1
# Exit with the command's exit code
exit $?
Create /usr/local/bin/checktick-templates-sync.sh:
#!/bin/bash
# CheckTick Global Templates Sync Cron Job
# Runs daily at 6 AM UTC
# Set working directory
cd /path/to/your/checktick-app
# Run the sync command
docker compose exec -T web python manage.py sync_global_question_group_templates >> /var/log/checktick/templates-sync.log 2>&1
# Exit with the command's exit code
exit $?
Make them executable:
chmod +x /usr/local/bin/checktick-data-governance.sh
chmod +x /usr/local/bin/checktick-progress-cleanup.sh
chmod +x /usr/local/bin/checktick-dataset-sync.sh
chmod +x /usr/local/bin/checktick-templates-sync.sh
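The scripts above cover four of the six tasks. If you also run the NHS Data Dictionary sync (task 4) and recovery time delay processing (task 6), create analogous scripts following the same pattern - a sketch, assuming the same paths and log directory (the file names here are suggestions):

Create `/usr/local/bin/checktick-nhs-dd-sync.sh`:

```bash
#!/bin/bash
# CheckTick NHS Data Dictionary Sync Cron Job
# Runs weekly on Sundays at 5 AM UTC
cd /path/to/your/checktick-app
docker compose exec -T web python manage.py sync_nhs_dd_datasets >> /var/log/checktick/nhs-dd-sync.log 2>&1
exit $?
```

Create `/usr/local/bin/checktick-recovery-time-delays.sh`:

```bash
#!/bin/bash
# CheckTick Recovery Time Delay Cron Job
# Runs every 5 minutes
cd /path/to/your/checktick-app
docker compose exec -T web python manage.py process_recovery_time_delays >> /var/log/checktick/recovery-time-delays.log 2>&1
exit $?
```

Make these executable with `chmod +x` as above.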
2. Add to System Crontab
sudo crontab -e
Add these lines:
# CheckTick Data Governance - Daily at 2 AM UTC
0 2 * * * /usr/local/bin/checktick-data-governance.sh
# CheckTick Survey Progress Cleanup - Daily at 3 AM UTC
0 3 * * * /usr/local/bin/checktick-progress-cleanup.sh
# CheckTick External Dataset Sync - Daily at 4 AM UTC
0 4 * * * /usr/local/bin/checktick-dataset-sync.sh
# CheckTick Global Templates Sync - Daily at 6 AM UTC
0 6 * * * /usr/local/bin/checktick-templates-sync.sh
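If you created the two additional scripts above, add matching entries (schedules taken from the Overview):

```bash
# CheckTick NHS Data Dictionary Sync - Weekly on Sundays at 5 AM UTC
0 5 * * 0 /usr/local/bin/checktick-nhs-dd-sync.sh

# CheckTick Recovery Time Delays - Every 5 minutes
*/5 * * * * /usr/local/bin/checktick-recovery-time-delays.sh
```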
3. Create Log Directory
sudo mkdir -p /var/log/checktick
sudo chown $USER:$USER /var/log/checktick
4. Test the Scripts
# Test data governance
/usr/local/bin/checktick-data-governance.sh
tail -f /var/log/checktick/data-governance.log
# Test progress cleanup
/usr/local/bin/checktick-progress-cleanup.sh
tail -f /var/log/checktick/progress-cleanup.log
# Test dataset sync
/usr/local/bin/checktick-dataset-sync.sh
tail -f /var/log/checktick/dataset-sync.log
# Test templates sync
/usr/local/bin/checktick-templates-sync.sh
tail -f /var/log/checktick/templates-sync.log
Important: For external dataset sync, run the initial setup once:
cd /path/to/your/checktick-app
docker compose exec web python manage.py sync_external_datasets
Kubernetes
If you're running CheckTick in Kubernetes, use a CronJob resource.
Create a CronJob manifest:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: checktick-data-governance
  namespace: checktick
spec:
  schedule: "0 2 * * *" # 2 AM UTC daily
  concurrencyPolicy: Forbid # Don't run if previous job still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: data-governance
              image: ghcr.io/eatyourpeas/checktick:latest
              command:
                - python
                - manage.py
                - process_data_governance
              envFrom:
                - configMapRef:
                    name: checktick-config
                - secretRef:
                    name: checktick-secrets
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "100m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"
Apply the manifest:
kubectl apply -f checktick-cronjob.yaml
Test manually:
# Trigger a manual run
kubectl create job --from=cronjob/checktick-data-governance manual-test-1
# Check logs
kubectl logs -l job-name=manual-test-1
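The manifest above covers only the data governance task; the other jobs follow the same pattern, changing just the name, schedule, and command. For example, a sketch for the progress cleanup job (resource limits omitted for brevity):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: checktick-progress-cleanup
  namespace: checktick
spec:
  schedule: "0 3 * * *" # 3 AM UTC daily
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: progress-cleanup
              image: ghcr.io/eatyourpeas/checktick:latest
              command: ["python", "manage.py", "cleanup_survey_progress"]
              envFrom:
                - configMapRef:
                    name: checktick-config
                - secretRef:
                    name: checktick-secrets
```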
AWS ECS/Fargate
Use AWS EventBridge (CloudWatch Events) to trigger scheduled ECS tasks.
1. Create EventBridge Rule
aws events put-rule \
--name checktick-data-governance-daily \
--schedule-expression "cron(0 2 * * ? *)" \
--description "Run CheckTick data governance tasks daily at 2 AM UTC"
2. Add ECS Task as Target
aws events put-targets \
--rule checktick-data-governance-daily \
--targets "Id"="1","Arn"="arn:aws:ecs:region:account:cluster/checktick-cluster","RoleArn"="arn:aws:iam::account:role/ecsEventsRole","EcsParameters"="{TaskDefinitionArn=arn:aws:ecs:region:account:task-definition/checktick-web:latest,LaunchType=FARGATE,NetworkConfiguration={awsvpcConfiguration={Subnets=[subnet-xxx],SecurityGroups=[sg-xxx],AssignPublicIp=ENABLED}},TaskCount=1,PlatformVersion=LATEST}"
3. Override Task Command
Override the container command when the scheduled task runs:
{
  "overrides": {
    "containerOverrides": [
      {
        "name": "checktick-web",
        "command": ["python", "manage.py", "process_data_governance"]
      }
    ]
  }
}
Heroku
Heroku provides the Scheduler add-on for running periodic tasks.
1. Add Scheduler Add-on
heroku addons:create scheduler:standard
2. Configure Job
heroku addons:open scheduler
In the web interface:
- Command: python manage.py process_data_governance
- Frequency: Daily at 2:00 AM (UTC)
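Add one Scheduler job per task you need, using the schedules from the Overview. Note that Heroku Scheduler offers only every-10-minutes, hourly, and daily frequencies: schedule the recovery time delay task every 10 minutes (rather than the recommended 5), and run the normally weekly NHS DD sync daily - without `--force` it skips recently scraped datasets, so daily runs are harmless. For example:

- Command: `python manage.py cleanup_survey_progress` - Frequency: Daily at 3:00 AM (UTC)
- Command: `python manage.py sync_external_datasets` - Frequency: Daily at 4:00 AM (UTC)
- Command: `python manage.py process_recovery_time_delays` - Frequency: Every 10 minutes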
Railway
Railway doesn't have native cron support, so use an external service or run a background worker.
Option 1: Use GitHub Actions (Recommended)
Create .github/workflows/data-governance-cron.yml:
name: Data Governance Cron

on:
  schedule:
    - cron: '0 2 * * *' # 2 AM UTC daily
  workflow_dispatch: # Allow manual trigger

jobs:
  run-data-governance:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Railway Command
        run: |
          # NOTE: illustrative only - adapt the request body to Railway's current API
          curl -X POST \
            -H "Authorization: Bearer ${{ secrets.RAILWAY_API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"command": "python manage.py process_data_governance"}' \
            https://backboard.railway.app/graphql/v2
Option 2: Use EasyCron or Similar Service
- Sign up for EasyCron or similar
- Create a webhook endpoint in your CheckTick app
- Schedule the webhook to run daily
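If you take the webhook route, a minimal sketch of such an endpoint (hypothetical - CheckTick does not ship this view; the header name and `CRON_WEBHOOK_TOKEN` variable are illustrative):

```python
# Hypothetical webhook view - a sketch, not part of CheckTick
import os

from django.core.management import call_command
from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def data_governance_webhook(request):
    # Guard the endpoint with a shared secret so only your cron service can trigger it
    if request.headers.get("X-Cron-Token") != os.environ.get("CRON_WEBHOOK_TOKEN"):
        return HttpResponseForbidden()
    # Run the same management command the other platforms schedule directly
    call_command("process_data_governance")
    return HttpResponse("ok")
```

Wire it into your URLconf and point the external scheduler at it over HTTPS.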
Command Reference
Data Governance Command
# Run data governance tasks
python manage.py process_data_governance
# Dry-run mode (show what would be done without making changes)
python manage.py process_data_governance --dry-run
# Verbose output (detailed logging)
python manage.py process_data_governance --verbose
Example Output:
Starting data governance processing at 2024-10-26 02:00:00
--- Deletion Warnings ---
Sent 3 deletion warnings:
- 30-day warnings: 2
- 7-day warnings: 1
- 1-day warnings: 0
--- Automatic Deletions ---
Soft deleted: 1 surveys
Hard deleted: 0 surveys
⚠️ 1 surveys were deleted. Check audit logs for details.
Data governance processing completed at 2024-10-26 02:00:15
Survey Progress Cleanup Command
# Clean up expired survey progress records
python manage.py cleanup_survey_progress
# Dry-run mode (show what would be deleted without making changes)
python manage.py cleanup_survey_progress --dry-run
# Verbose output (detailed logging)
python manage.py cleanup_survey_progress --verbose
Example Output:
Starting survey progress cleanup...
Found 15 expired progress records (older than 30 days)
Deleted 15 expired survey progress records
Cleanup completed successfully
What gets deleted:
- Anonymous user progress (session-based) older than 30 days
- Authenticated user progress older than 30 days
- Token-based progress older than 30 days
- Only incomplete surveys (completed submissions are already deleted on submission)
Testing
Test in Development
Data Governance:
# Test with dry-run (safe, no changes)
docker compose exec web python manage.py process_data_governance --dry-run --verbose
# Test for real (only if you have test data)
docker compose exec web python manage.py process_data_governance --verbose
Survey Progress Cleanup:
# Test with dry-run (safe, no changes)
docker compose exec web python manage.py cleanup_survey_progress --dry-run --verbose
# Test for real (only if you have test data)
docker compose exec web python manage.py cleanup_survey_progress --verbose
Test in Production
- First, use dry-run mode:

  ```bash
  # Data governance
  python manage.py process_data_governance --dry-run --verbose

  # Progress cleanup
  python manage.py cleanup_survey_progress --dry-run --verbose
  ```

- Review the output - it will show what surveys/progress records would be affected
- Run for real (on your scheduled platform)
- Monitor logs after the first scheduled run
Monitoring
Check Execution Logs
Northflank:
- Go to your cron job service → History → View logs
Docker Compose:
# Data governance logs
tail -f /var/log/checktick/data-governance.log
# Progress cleanup logs
tail -f /var/log/checktick/progress-cleanup.log
Kubernetes:
# Data governance
kubectl logs -l job-name=checktick-data-governance --tail=100
# Progress cleanup
kubectl logs -l job-name=checktick-progress-cleanup --tail=100
Audit Trail
Survey Deletions:
All deletions are logged in the database. You can inspect them from the Django shell:
# In Django shell
python manage.py shell
from checktick_app.surveys.models import Survey
from django.utils import timezone
from datetime import timedelta
# Check recently soft-deleted surveys
Survey.objects.filter(
    deleted_at__gte=timezone.now() - timedelta(days=7)
).values('name', 'deleted_at', 'deletion_date')

# Check surveys due for deletion soon
Survey.objects.filter(
    deletion_date__lte=timezone.now() + timedelta(days=7),
    deleted_at__isnull=True
).values('name', 'deletion_date', 'retention_months')
Progress Cleanup:
Monitor the SurveyProgress table size:
# In Django shell
from checktick_app.surveys.models import SurveyProgress
from django.utils import timezone
from datetime import timedelta
# Count total progress records
print(f"Total progress records: {SurveyProgress.objects.count()}")
# Count expired records (ready for cleanup)
expired = SurveyProgress.objects.filter(
    expires_at__lt=timezone.now()
)
print(f"Expired records: {expired.count()}")
# Count by type
authenticated = SurveyProgress.objects.filter(user__isnull=False).count()
anonymous = SurveyProgress.objects.filter(session_key__isnull=False).count()
token_based = SurveyProgress.objects.filter(access_token__isnull=False).count()
print(f"Authenticated: {authenticated}, Anonymous: {anonymous}, Token-based: {token_based}")
Troubleshooting
No Emails Being Sent
Check email configuration:
# Test email from Django shell
python manage.py shell
>>> from django.core.mail import send_mail
>>> send_mail('Test', 'Testing', '[email protected]', ['[email protected]'])
Common issues:
- Missing EMAIL_HOST or EMAIL_PORT environment variables
- Incorrect SMTP credentials
- Missing SITE_URL (emails include survey links)
Surveys Not Being Deleted
Check for legal holds:
from checktick_app.surveys.models import Survey, LegalHold
# Find surveys with active legal holds
Survey.objects.filter(legal_hold__removed_at__isnull=True)
Surveys with active legal holds are intentionally excluded from automatic deletion.
Command Fails Silently
Run with verbose output:
python manage.py process_data_governance --verbose
Check for common causes:
- Database connection issues
- Missing environment variables
- Permission problems
Security Considerations
Environment Variables
The cron job needs access to:
- Database credentials (via DATABASE_URL)
- Email credentials (for sending notifications)
- Django SECRET_KEY (for encryption/signing)
Never log sensitive environment variables!
Execution Isolation
- Cron jobs should run in the same network as your database
- Use read-only database credentials if possible for reporting
- Consider separate logging for scheduled tasks
FAQ
Q: What happens if the cron job fails?
A: The next day's run will process any missed deletions. Surveys won't be deleted prematurely - only those past their deletion_date.
Q: Can I change the schedule?
A: Yes, but daily at 2 AM UTC is recommended for:
- Off-peak hours (less load)
- Predictable timing for users
- Overnight processing before business hours
Q: What timezone does the schedule use?
A: All schedules use UTC. Django's deletion_date is also stored in UTC, so the system is timezone-aware.
Q: How long does the command take to run?
A: Typically 30-60 seconds for most deployments. Scales with:
- Number of surveys approaching deletion
- Email sending speed
- Database query performance
Q: Can I disable automatic deletion?
A: No - automatic deletion is required for GDPR compliance. However, you can:
- Apply legal holds to prevent specific surveys from being deleted
- Extend retention periods (up to 24 months)
- Export data before deletion
Q: Why do I need the progress cleanup job?
A: The cleanup job prevents database bloat by removing old progress records. Without it:
- Your database will grow indefinitely with incomplete survey sessions
- Performance may degrade over time
- Storage costs will increase
The cleanup is safe because it only removes progress for incomplete surveys older than 30 days.
Q: What if someone was working on a survey and their progress gets deleted?
A: Progress records expire after 30 days of inactivity. This is intentional:
- Most surveys are completed within hours or days, not months
- After 30 days, the session is likely abandoned
- Users can still start a new submission if needed
- Only the progress draft is deleted, not any completed submissions
Q: Do all of the cron jobs need to run?
A:
- Data governance: YES - legally required for GDPR compliance
- Recovery time delays: YES, if the ethical key recovery feature is enabled
- Progress cleanup, dataset sync, NHS DD sync, and templates sync: RECOMMENDED - they prevent stale data and database bloat, but are not legally required
Next Steps
- Survey Progress Tracking - Learn about the progress feature
- Data Governance Overview - Understand the retention policy
- Data Governance Retention - Learn about retention periods
- Self-Hosting Backup - Set up automated backups
- Email Notifications - Configure email delivery