CheckTick requires scheduled tasks for data governance operations, housekeeping, and external data syncing. This guide explains how to set up these tasks on different hosting platforms.

Overview

CheckTick uses six scheduled tasks:

  1. Data governance (process_data_governance) - daily, required for GDPR
  2. Survey progress cleanup (cleanup_survey_progress) - daily, recommended
  3. External dataset sync (sync_external_datasets) - daily, recommended
  4. NHS Data Dictionary sync (sync_nhs_dd_datasets) - weekly, recommended
  5. Global question group template sync (sync_global_question_group_templates) - daily, optional
  6. Recovery time delay processing (process_recovery_time_delays) - every 5 minutes, required if recovery is enabled

1. Data Governance (Required for GDPR)

The process_data_governance management command runs daily to:

  1. Send deletion warnings - Email notifications 30 days, 7 days, and 1 day before automatic deletion
  2. Soft-delete expired surveys - Automatically soft-delete surveys that have reached their retention period
  3. Hard-delete surveys - Permanently delete surveys 30 days after soft deletion

Legal Requirement: This task is required for GDPR compliance. Failure to run it may result in data being retained longer than legally allowed.
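
You can preview a run without making any changes (see the Command Reference below for all flags):

# Preview actions without making changes
python manage.py process_data_governance --dry-run --verbose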

2. Survey Progress Cleanup (Recommended)

The cleanup_survey_progress management command runs daily to:

  1. Delete expired progress records - Removes incomplete survey progress older than 30 days
  2. Free up database storage - Keeps the database lean by removing stale session data

Recommended: While not legally required, this prevents database bloat and improves performance. Progress records are only needed while users are actively completing surveys.
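
To preview which progress records a run would remove:

# Preview deletions without making changes
python manage.py cleanup_survey_progress --dry-run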

3. External Dataset Sync (Recommended)

The sync_external_datasets management command runs daily to:

  1. Update hospital lists - Syncs hospitals, NHS trusts, and other organizational data from RCPCH API
  2. Maintain accuracy - Ensures dropdown options reflect current NHS organizational structure
  3. Enable offline use - Stores data locally so surveys work without API dependency

Recommended: Run this daily to keep external datasets fresh. Datasets are used in prefilled dropdown fields. Without periodic sync, data may become stale (hospitals close, trusts merge, etc.).

Initial Setup: Just run the sync command once to populate datasets (creates records automatically):

# One-time: Create datasets and populate data (takes 2-3 minutes)
python manage.py sync_external_datasets

4. NHS Data Dictionary Sync (Recommended)

The sync_nhs_dd_datasets management command runs weekly to:

  1. Scrape standardized lists - Fetches codes and values from NHS Data Dictionary website
  2. Update existing datasets - Keeps NHS DD datasets current with official sources
  3. Maintain accuracy - Ensures dropdown options match authoritative NHS standards

Recommended: Run this weekly to keep NHS DD datasets fresh. See NHS Data Dictionary Datasets for the full list of scraped datasets.

Initial Setup: Just run the sync command once to populate datasets (creates records automatically):

# One-time: Create datasets and scrape data (takes 1-2 minutes)
python manage.py sync_nhs_dd_datasets

Maintenance:

# Check which datasets would be synced
python manage.py sync_nhs_dd_datasets --dry-run

# Force re-scrape all datasets (even if recently scraped)
python manage.py sync_nhs_dd_datasets --force

# Sync a single dataset
python manage.py sync_nhs_dd_datasets --dataset smoking_status_code

5. Global Question Group Templates Sync (Optional)

The sync_global_question_group_templates management command runs daily to:

  1. Import global templates - Syncs question group templates from the docs/question-group-templates/ folder
  2. Update existing templates - Keeps templates current with repository changes
  3. Enable template library - Makes reusable question groups available for import

Recommended: Run this daily if you maintain custom global templates in your repository. If you're using the standard CheckTick distribution without custom templates, this task is optional.

Initial Setup: Run the sync command once to populate templates (creates records automatically):

# One-time: Create templates from markdown files
python manage.py sync_global_question_group_templates

Maintenance:

# Check which templates would be synced
python manage.py sync_global_question_group_templates --dry-run

# Force update all templates (even if unchanged)
python manage.py sync_global_question_group_templates --force

6. Recovery Time Delay Processing (Required if Recovery Enabled)

The process_recovery_time_delays management command runs frequently to:

  1. Check expired time delays - Finds recovery requests where the mandatory waiting period has elapsed
  2. Update status - Transitions requests from "In Time Delay" to "Ready for Execution"
  3. Send notifications - Alerts administrators that a recovery is ready to execute
  4. Create audit entries - Logs the automatic transition for regulatory compliance

Required for Recovery: If you use the ethical key recovery feature, this command ensures recovery requests proceed after their mandatory waiting period (24-48 hours). Without it, approved recovery requests will remain stuck in the time delay state.

Schedule: Run every 5 minutes (*/5 * * * *) to ensure timely processing. The command is idempotent and lightweight - it only processes requests whose time delays have expired.

# Check for expired time delays (dry-run)
python manage.py process_recovery_time_delays --dry-run

# Process expired time delays
python manage.py process_recovery_time_delays --verbose

Prerequisites

  • CheckTick deployed and running
  • Email configured (for sending deletion warnings)
  • Access to your hosting platform's scheduling features

Platform-Specific Setup

Northflank

Northflank provides native cron job support, making this the simplest option.

1. Create Data Governance Cron Job

  1. Go to your Northflank project
  2. Click "Add Service" → "Cron Job"
  3. Configure the job:
     • Name: checktick-data-governance
     • Docker Image: Use the same image as your web service (e.g., ghcr.io/eatyourpeas/checktick:latest)
     • Schedule: 0 2 * * * (runs at 2 AM UTC daily)
     • Command: python manage.py process_data_governance

2. Create Survey Progress Cleanup Cron Job

  1. Click "Add Service" → "Cron Job" again
  2. Configure the job:
     • Name: checktick-progress-cleanup
     • Docker Image: Use the same image as your web service
     • Schedule: 0 3 * * * (runs at 3 AM UTC daily, after data governance)
     • Command: python manage.py cleanup_survey_progress

3. Create External Dataset Sync Cron Job

  1. Click "Add Service" → "Cron Job" again
  2. Configure the job:
     • Name: checktick-dataset-sync
     • Docker Image: Use the same image as your web service
     • Schedule: 0 4 * * * (runs at 4 AM UTC daily)
     • Command: python manage.py sync_external_datasets

4. Create NHS DD Sync Cron Job

  1. Click "Add Service" → "Cron Job" again
  2. Configure the job:
     • Name: checktick-nhs-dd-sync
     • Docker Image: Use the same image as your web service
     • Schedule: 0 5 * * 0 (runs at 5 AM UTC every Sunday - weekly)
     • Command: python manage.py sync_nhs_dd_datasets

5. Create Global Templates Sync Cron Job

  1. Click "Add Service" → "Cron Job" again
  2. Configure the job:
     • Name: checktick-templates-sync
     • Docker Image: Use the same image as your web service
     • Schedule: 0 6 * * * (runs at 6 AM UTC daily)
     • Command: python manage.py sync_global_question_group_templates

6. Create Recovery Time Delay Cron Job

  1. Click "Add Service" → "Cron Job" again
  2. Configure the job:
     • Name: checktick-recovery-time-delays
     • Docker Image: Use the same image as your web service
     • Schedule: */5 * * * * (runs every 5 minutes)
     • Command: python manage.py process_recovery_time_delays

7. Copy Environment Variables

All cron jobs need the same environment variables as your web service:

  1. In Northflank, go to your web service → Environment
  2. Copy all environment variables
  3. Go to each cron job service → Environment
  4. Paste the variables

Critical variables needed:

  • DATABASE_URL
  • SECRET_KEY
  • EMAIL_HOST, EMAIL_PORT, EMAIL_HOST_USER, EMAIL_HOST_PASSWORD (for data governance)
  • DEFAULT_FROM_EMAIL (for data governance)
  • SITE_URL (for email links in data governance)
  • EXTERNAL_DATASET_API_URL, EXTERNAL_DATASET_API_KEY (for dataset sync - optional, defaults to RCPCH API)
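
For reference, a minimal environment for the cron jobs might look like this (all values are placeholders - copy the real ones from your web service):

# Placeholder values only
DATABASE_URL=postgres://user:password@db-host:5432/checktick
SECRET_KEY=change-me
EMAIL_HOST=smtp.example.com
EMAIL_PORT=587
EMAIL_HOST_USER=notifications@example.com
EMAIL_HOST_PASSWORD=change-me
DEFAULT_FROM_EMAIL=notifications@example.com
SITE_URL=https://checktick.example.com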

8. Deploy and Test

  1. Deploy all cron job services
  2. Test them manually via the Northflank dashboard: Jobs → Run Now
  3. Check logs to verify successful execution
  4. Monitor the History tab for scheduled runs

Important: For external dataset sync, run the initial setup command once:

# In your web service terminal (via Northflank shell or kubectl exec)
python manage.py sync_external_datasets

Northflank Advantages:

  • ✅ No extra containers running 24/7
  • ✅ Easy manual testing via UI
  • ✅ Built-in logging and monitoring
  • ✅ No additional cost (same compute as web service, but only active for ~1-5 minutes daily)

Docker Compose (Local/VPS)

If you're self-hosting with Docker Compose on a VPS or dedicated server, use the system's cron.

1. Create Cron Scripts

Create /usr/local/bin/checktick-data-governance.sh:

#!/bin/bash
# CheckTick Data Governance Cron Job
# Runs daily at 2 AM UTC

# Set working directory
cd /path/to/your/checktick-app

# Run the management command
docker compose exec -T web python manage.py process_data_governance >> /var/log/checktick/data-governance.log 2>&1

# Exit with the command's exit code
exit $?

Create /usr/local/bin/checktick-progress-cleanup.sh:

#!/bin/bash
# CheckTick Survey Progress Cleanup Cron Job
# Runs daily at 3 AM UTC

# Set working directory
cd /path/to/your/checktick-app

# Run the cleanup command
docker compose exec -T web python manage.py cleanup_survey_progress >> /var/log/checktick/progress-cleanup.log 2>&1

# Exit with the command's exit code
exit $?

Create /usr/local/bin/checktick-dataset-sync.sh:

#!/bin/bash
# CheckTick External Dataset Sync Cron Job
# Runs daily at 4 AM UTC

# Set working directory
cd /path/to/your/checktick-app

# Run the sync command
docker compose exec -T web python manage.py sync_external_datasets >> /var/log/checktick/dataset-sync.log 2>&1

# Exit with the command's exit code
exit $?

Create /usr/local/bin/checktick-templates-sync.sh:

#!/bin/bash
# CheckTick Global Templates Sync Cron Job
# Runs daily at 6 AM UTC

# Set working directory
cd /path/to/your/checktick-app

# Run the sync command
docker compose exec -T web python manage.py sync_global_question_group_templates >> /var/log/checktick/templates-sync.log 2>&1

# Exit with the command's exit code
exit $?

Make them executable:

chmod +x /usr/local/bin/checktick-data-governance.sh
chmod +x /usr/local/bin/checktick-progress-cleanup.sh
chmod +x /usr/local/bin/checktick-dataset-sync.sh
chmod +x /usr/local/bin/checktick-templates-sync.sh

2. Add to System Crontab

sudo crontab -e

Add these lines:

# CheckTick Data Governance - Daily at 2 AM UTC
0 2 * * * /usr/local/bin/checktick-data-governance.sh

# CheckTick Survey Progress Cleanup - Daily at 3 AM UTC
0 3 * * * /usr/local/bin/checktick-progress-cleanup.sh

# CheckTick External Dataset Sync - Daily at 4 AM UTC
0 4 * * * /usr/local/bin/checktick-dataset-sync.sh

# CheckTick Global Templates Sync - Daily at 6 AM UTC
0 6 * * * /usr/local/bin/checktick-templates-sync.sh
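
The NHS Data Dictionary sync (weekly) and recovery time delay processing (every 5 minutes) follow the same pattern. If you'd rather not create two more wrapper scripts, you can inline the commands; the log file names below are illustrative:

# CheckTick NHS DD Sync - Weekly at 5 AM UTC on Sundays
0 5 * * 0 cd /path/to/your/checktick-app && docker compose exec -T web python manage.py sync_nhs_dd_datasets >> /var/log/checktick/nhs-dd-sync.log 2>&1

# CheckTick Recovery Time Delays - Every 5 minutes
*/5 * * * * cd /path/to/your/checktick-app && docker compose exec -T web python manage.py process_recovery_time_delays >> /var/log/checktick/recovery-time-delays.log 2>&1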

3. Create Log Directory

sudo mkdir -p /var/log/checktick
sudo chown $USER:$USER /var/log/checktick

4. Test the Scripts

# Test data governance
/usr/local/bin/checktick-data-governance.sh
tail -f /var/log/checktick/data-governance.log

# Test progress cleanup
/usr/local/bin/checktick-progress-cleanup.sh
tail -f /var/log/checktick/progress-cleanup.log

# Test dataset sync
/usr/local/bin/checktick-dataset-sync.sh
tail -f /var/log/checktick/dataset-sync.log

# Test templates sync
/usr/local/bin/checktick-templates-sync.sh
tail -f /var/log/checktick/templates-sync.log

Important: For external dataset sync, run the initial setup once:

cd /path/to/your/checktick-app
docker compose exec web python manage.py sync_external_datasets

Kubernetes

If you're running CheckTick in Kubernetes, use a CronJob resource.

Create a CronJob manifest:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: checktick-data-governance
  namespace: checktick
spec:
  schedule: "0 2 * * *"  # 2 AM UTC daily
  concurrencyPolicy: Forbid  # Don't run if previous job still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: data-governance
            image: ghcr.io/eatyourpeas/checktick:latest
            command:
            - python
            - manage.py
            - process_data_governance
            envFrom:
            - configMapRef:
                name: checktick-config
            - secretRef:
                name: checktick-secrets
            resources:
              requests:
                memory: "256Mi"
                cpu: "100m"
              limits:
                memory: "512Mi"
                cpu: "500m"

Apply the manifest:

kubectl apply -f checktick-cronjob.yaml

Test manually:

# Trigger a manual run
kubectl create job --from=cronjob/checktick-data-governance manual-test-1

# Check logs
kubectl logs -l job-name=manual-test-1
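
Create one CronJob per task, varying only the name, schedule, and command (for example, schedule: "*/5 * * * *" with process_recovery_time_delays for recovery processing). To pause a CronJob during maintenance, a quick sketch:

# Suspend and later resume a CronJob
kubectl patch cronjob checktick-data-governance -n checktick -p '{"spec":{"suspend":true}}'
kubectl patch cronjob checktick-data-governance -n checktick -p '{"spec":{"suspend":false}}'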

AWS ECS/Fargate

Use AWS EventBridge (CloudWatch Events) to trigger scheduled ECS tasks.

1. Create EventBridge Rule

aws events put-rule \
  --name checktick-data-governance-daily \
  --schedule-expression "cron(0 2 * * ? *)" \
  --description "Run CheckTick data governance tasks daily at 2 AM UTC"

2. Add ECS Task as Target

aws events put-targets \
  --rule checktick-data-governance-daily \
  --targets "Id"="1","Arn"="arn:aws:ecs:region:account:cluster/checktick-cluster","RoleArn"="arn:aws:iam::account:role/ecsEventsRole","EcsParameters"="{TaskDefinitionArn=arn:aws:ecs:region:account:task-definition/checktick-web:latest,LaunchType=FARGATE,NetworkConfiguration={awsvpcConfiguration={Subnets=[subnet-xxx],SecurityGroups=[sg-xxx],AssignPublicIp=ENABLED}},TaskCount=1,PlatformVersion=LATEST}"

3. Override Task Command

Pass a command override when the task is triggered (for example, via the EventBridge target input):

{
  "overrides": {
    "containerOverrides": [
      {
        "name": "checktick-web",
        "command": ["python", "manage.py", "process_data_governance"]
      }
    ]
  }
}
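
Before wiring this up, you can verify the image and command override with a one-off run. A sketch, with placeholder cluster, subnet, and security group values:

# One-off manual test run (values are placeholders)
aws ecs run-task \
  --cluster checktick-cluster \
  --task-definition checktick-web \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}" \
  --overrides '{"containerOverrides":[{"name":"checktick-web","command":["python","manage.py","process_data_governance","--dry-run"]}]}'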

Heroku

Heroku provides the Scheduler add-on for running periodic tasks.

1. Add Scheduler Add-on

heroku addons:create scheduler:standard

2. Configure Job

heroku addons:open scheduler

In the web interface:

  • Command: python manage.py process_data_governance
  • Frequency: Daily at 2:00 AM (UTC)
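
Add a separate Scheduler job for each task you need (for example, cleanup_survey_progress daily at 3:00 AM). Note that Scheduler's finest granularity is every 10 minutes, so process_recovery_time_delays will run less often than the recommended 5-minute interval. You can test any command first with a one-off dyno:

# Manual test run before scheduling
heroku run python manage.py process_data_governance --dry-run --verbose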


Railway

Railway doesn't have native cron support, so use an external service or run a background worker.

Option 1: Use GitHub Actions

Create .github/workflows/data-governance-cron.yml:

name: Data Governance Cron

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  workflow_dispatch:  # Allow manual trigger

jobs:
  run-data-governance:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Railway Command
        run: |
          curl -X POST \
            -H "Authorization: Bearer ${{ secrets.RAILWAY_API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"command": "python manage.py process_data_governance"}' \
            https://backboard.railway.app/graphql/v2

Option 2: Use EasyCron or Similar Service

  1. Sign up for EasyCron or similar
  2. Create a webhook endpoint in your CheckTick app
  3. Schedule the webhook to run daily
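
CheckTick does not ship a webhook endpoint for this, so you would need to implement one yourself and protect it with a secret. The scheduled call might then look like this (the URL and header are hypothetical):

# Hypothetical webhook trigger - the endpoint and secret are yours to implement
curl -X POST \
  -H "Authorization: Bearer $WEBHOOK_SECRET" \
  https://your-app.up.railway.app/tasks/data-governance/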

Command Reference

Data Governance Command

# Run data governance tasks
python manage.py process_data_governance

# Dry-run mode (show what would be done without making changes)
python manage.py process_data_governance --dry-run

# Verbose output (detailed logging)
python manage.py process_data_governance --verbose

Example Output:

Starting data governance processing at 2024-10-26 02:00:00

--- Deletion Warnings ---
Sent 3 deletion warnings:
  - 30-day warnings: 2
  - 7-day warnings: 1
  - 1-day warnings: 0

--- Automatic Deletions ---
Soft deleted: 1 surveys
Hard deleted: 0 surveys

⚠️  1 surveys were deleted. Check audit logs for details.

Data governance processing completed at 2024-10-26 02:00:15

Survey Progress Cleanup Command

# Clean up expired survey progress records
python manage.py cleanup_survey_progress

# Dry-run mode (show what would be deleted without making changes)
python manage.py cleanup_survey_progress --dry-run

# Verbose output (detailed logging)
python manage.py cleanup_survey_progress --verbose

Example Output:

Starting survey progress cleanup...
Found 15 expired progress records (older than 30 days)
Deleted 15 expired survey progress records
Cleanup completed successfully

What gets deleted:

  • Anonymous user progress (session-based) older than 30 days
  • Authenticated user progress older than 30 days
  • Token-based progress older than 30 days
  • Only progress for incomplete surveys (progress records for completed submissions are removed at submission time)

Testing

Test in Development

Data Governance:

# Test with dry-run (safe, no changes)
docker compose exec web python manage.py process_data_governance --dry-run --verbose

# Test for real (only if you have test data)
docker compose exec web python manage.py process_data_governance --verbose

Survey Progress Cleanup:

# Test with dry-run (safe, no changes)
docker compose exec web python manage.py cleanup_survey_progress --dry-run --verbose

# Test for real (only if you have test data)
docker compose exec web python manage.py cleanup_survey_progress --verbose

Test in Production

  1. First, use dry-run mode:

# Data governance
python manage.py process_data_governance --dry-run --verbose

# Progress cleanup
python manage.py cleanup_survey_progress --dry-run --verbose

  2. Review the output - it will show what surveys/progress records would be affected

  3. Run for real (on your scheduled platform)

  4. Monitor logs after the first scheduled run


Monitoring

Check Execution Logs

Northflank:

  • Go to your cron job service → History → View logs

Docker Compose:

# Data governance logs
tail -f /var/log/checktick/data-governance.log

# Progress cleanup logs
tail -f /var/log/checktick/progress-cleanup.log

Kubernetes:

# List the jobs created by each CronJob
kubectl get jobs -n checktick

# View logs for a specific job (job names carry a timestamp suffix)
kubectl logs job/checktick-data-governance-<suffix> -n checktick

Audit Trail

Survey Deletions:

All deletions are logged in the database. Check via Django admin:

# In Django shell
python manage.py shell

from checktick_app.surveys.models import Survey
from django.utils import timezone
from datetime import timedelta

# Check recently soft-deleted surveys
Survey.objects.filter(
    deleted_at__gte=timezone.now() - timedelta(days=7)
).values('name', 'deleted_at', 'deletion_date')

# Check surveys due for deletion soon
Survey.objects.filter(
    deletion_date__lte=timezone.now() + timedelta(days=7),
    deleted_at__isnull=True
).values('name', 'deletion_date', 'retention_months')

Progress Cleanup:

Monitor the SurveyProgress table size:

# In Django shell
from checktick_app.surveys.models import SurveyProgress
from django.utils import timezone
from datetime import timedelta

# Count total progress records
print(f"Total progress records: {SurveyProgress.objects.count()}")

# Count expired records (ready for cleanup)
expired = SurveyProgress.objects.filter(
    expires_at__lt=timezone.now()
)
print(f"Expired records: {expired.count()}")

# Count by type
authenticated = SurveyProgress.objects.filter(user__isnull=False).count()
anonymous = SurveyProgress.objects.filter(session_key__isnull=False).count()
token_based = SurveyProgress.objects.filter(access_token__isnull=False).count()
print(f"Authenticated: {authenticated}, Anonymous: {anonymous}, Token-based: {token_based}")

Troubleshooting

No Emails Being Sent

Check email configuration:

# Test email from Django shell
python manage.py shell
>>> from django.core.mail import send_mail
>>> send_mail('Test', 'Testing', '[email protected]', ['[email protected]'])

Common issues:

  • Missing EMAIL_HOST or EMAIL_PORT environment variables
  • Incorrect SMTP credentials
  • Missing SITE_URL (emails include survey links)

Surveys Not Being Deleted

Check for legal holds:

from checktick_app.surveys.models import Survey, LegalHold

# Find surveys with active legal holds
Survey.objects.filter(legal_hold__removed_at__isnull=True)

Surveys with active legal holds are intentionally skipped from automatic deletion.

Command Fails Silently

Run with verbose output:

python manage.py process_data_governance --verbose

Common causes:

  • Database connection issues
  • Missing environment variables
  • Permission problems


Security Considerations

Environment Variables

The cron jobs need access to:

  • Database credentials (via DATABASE_URL)
  • Email credentials (for sending notifications)
  • Django SECRET_KEY (for encryption/signing)

Never log sensitive environment variables!

Execution Isolation

  • Cron jobs should run in the same network as your database
  • Use read-only database credentials if possible for reporting
  • Consider separate logging for scheduled tasks

FAQ

Q: What happens if the cron job fails?

A: The next day's run will process any missed deletions. Surveys won't be deleted prematurely - only those past their deletion_date.

Q: Can I change the schedule?

A: Yes, but daily at 2 AM UTC is recommended for:

  • Off-peak hours (less load)
  • Predictable timing for users
  • Overnight processing before business hours

Q: What timezone does the schedule use?

A: All schedules use UTC. Django's deletion_date is also stored in UTC, so the system is timezone-aware.

Q: How long does the command take to run?

A: Typically 30-60 seconds for most deployments. Scales with:

  • Number of surveys approaching deletion
  • Email sending speed
  • Database query performance

Q: Can I disable automatic deletion?

A: No - automatic deletion is required for GDPR compliance. However, you can:

  • Apply legal holds to prevent specific surveys from being deleted
  • Extend retention periods (up to 24 months)
  • Export data before deletion

Q: Why do I need the progress cleanup job?

A: The cleanup job prevents database bloat by removing old progress records. Without it:

  • Your database will grow indefinitely with incomplete survey sessions
  • Performance may degrade over time
  • Storage costs will increase

The cleanup is safe because it only removes progress for incomplete surveys older than 30 days.

Q: What if someone was working on a survey and their progress gets deleted?

A: Progress records expire after 30 days of inactivity. This is intentional:

  • Most surveys are completed within hours or days, not months
  • After 30 days, the session is likely abandoned
  • Users can still start a new submission if needed
  • Only the progress draft is deleted, not any completed submissions

Q: Do both cron jobs need to run?

A:

  • Data governance: YES - legally required for GDPR compliance
  • Progress cleanup: RECOMMENDED - prevents database bloat, but not legally required

Next Steps