Date: Saturday, March 21, 2026
Hello! My name is Kimberly Wright and I teach Python to adult learners in public agencies with Evaluation + Learning Consulting. LLMs like ChatGPT or Claude and AI copilots (Google, Microsoft, Snowflake, Databricks) can speed up cleaning, but here's the inconvenient truth: AI can't go back in time and collect the right data, in the right format, with the right definitions.
In evaluation, Python’s continued value is control: schemas, reproducible pipelines, and auditable decisions. Skip structure and documentation, and you end up with results you can’t defend.
Take data cleaning. AI can fix errors, handle missing data, remove duplicates, and enforce data types. But sloppy practices can still create bias (missing data tied to outcomes) and double counting (duplicate IDs, inconsistent keys).
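Those steps can be sketched in a few lines of pandas. The column names (`participant_id`, `visits`) and the tiny inline dataset are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical intake data; column names are illustrative.
df = pd.DataFrame({
    "participant_id": ["A1", "A2", "A2", "A3"],
    "visits": ["3", "1", "1", None],
})

# Enforce types, inspect missingness, and remove duplicates explicitly,
# so each decision is visible in code rather than made silently by a tool.
df["visits"] = pd.to_numeric(df["visits"], errors="coerce")
n_missing = df["visits"].isna().sum()  # inspect missingness before imputing
df = df.drop_duplicates(subset="participant_id", keep="first")
```

The point is not the three lines themselves but that each one is a documented, reviewable decision.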
Before you write a line of code, confirm the objective with stakeholders and state the unit of analysis out loud: is it one row per participant, per visit, per case, or per month? Knowing the question you are trying to answer and the unit of analysis you have helps you determine the necessary structure of your data, how to aggregate it, what the denominator should be, what counts as a duplicate, and which joins are valid.
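You can make that stated grain testable. A minimal sketch, assuming the intended unit of analysis is one row per participant (the `participant_id` column is illustrative):

```python
import pandas as pd

# Illustrative data where the intended grain is one row per participant.
df = pd.DataFrame({
    "participant_id": ["A1", "A2", "A2"],
    "score": [10, 12, 15],
})

# Check the grain before any analysis; a failure here means the data must
# be aggregated (or the question restated) before computing any rates.
dupes = df[df["participant_id"].duplicated(keep=False)]
is_one_row_per_participant = dupes.empty
```

Here the check fails (participant A2 appears twice), which is exactly the conversation you want to have before the denominator is wrong.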
Reshaping is where "quick analysis" quietly breaks. Reshaping means structuring data so each row matches the evaluation question. Skip this and you invite double counting from many-to-many merges, wrong denominators, and biased estimates, especially when merge uncertainty isn't quantified. A Harvard-led probabilistic record linkage study highlights how deterministic merges cannot quantify uncertainty and can introduce bias.
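pandas can enforce the merge shape you claimed. A sketch with hypothetical participant and visit tables, using `validate=` to catch accidental many-to-many joins and `indicator=` to surface unmatched rows:

```python
import pandas as pd

# Hypothetical participant and visit tables.
participants = pd.DataFrame({"pid": ["A1", "A2"], "site": ["N", "S"]})
visits = pd.DataFrame({"pid": ["A1", "A1", "A2"], "visit": [1, 2, 1]})

# validate= makes pandas raise MergeError if the join is not the shape you
# claimed, catching many-to-many double counting at merge time.
merged = participants.merge(visits, on="pid", validate="one_to_many")

# indicator= flags rows that failed to match, so denominator gaps surface.
audit = participants.merge(visits, on="pid", how="left", indicator=True)
```

If `participants` had a duplicate `pid`, the first merge would fail loudly instead of silently doubling every visit.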
Standardizing data keeps measures comparable (formats, units, codes). If you don’t standardize, you risk misclassification and unit/interface mismatches—NASA’s Mars Climate Orbiter is the cautionary tale.
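A minimal sketch of both kinds of standardization, with invented program codes and unit columns (`duration_weeks`, `duration_days`) as stand-ins:

```python
import pandas as pd

# Illustrative: the same program recorded in three inconsistent codes.
df = pd.DataFrame({
    "program": ["Job Training", "job training ", "JOB TRAINING"],
    "duration_weeks": [4, 4, 4],
})

# Standardize text codes once, in code, so the rule is auditable.
df["program"] = df["program"].str.strip().str.lower()

# Convert units explicitly and name the result's unit (the Orbiter lesson).
df["duration_days"] = df["duration_weeks"] * 7
```

Because the conversion lives in code with the unit in the column name, a reviewer can verify it rather than guess.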
Finally, be sure to capture the dataset’s “instructions”: data dictionaries, cohort definitions, timing, provenance, revisions, and limitations, so nobody answers the wrong question confidently.
Tidy data—each variable is a column, each observation is a row, each unit is its own table—reduces guessing for humans and AI. For de-duplication/participant matching, the CDC best-practices report is a strong reference.
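Wide, report-style spreadsheets are the most common violation of tidy data, and `melt` is the usual fix. A sketch with invented quarterly score columns:

```python
import pandas as pd

# Wide, report-style data: one column per quarter (illustrative names).
wide = pd.DataFrame({
    "participant_id": ["A1", "A2"],
    "q1_score": [10, 8],
    "q2_score": [12, 9],
})

# Tidy it: each variable is a column, each observation
# (one participant-quarter) is a row.
tidy = wide.melt(id_vars="participant_id", var_name="quarter",
                 value_name="score")
```

The long shape makes grouping, plotting, and joining straightforward for both humans and AI assistants.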
Reproducible ETL is the coder's answer to "I really did do the same steps last time!": version-controlled transformations (often SQL and Python) that others can rerun and audit.
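At its smallest, that can be three named functions in a versioned script, so "the same steps" are literally the same code. The function names and the hard-coded data below are a sketch, not a prescription:

```python
import pandas as pd

def extract() -> pd.DataFrame:
    # In practice this reads from a database or file; hard-coded for the sketch.
    return pd.DataFrame({"pid": ["A1", "A1", "A2"], "served": [1, 1, 1]})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Every cleaning decision lives here, reviewable in a version-control diff.
    return df.drop_duplicates(subset="pid")

def load(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for writing to a warehouse or report.
    return df.reset_index(drop=True)

result = load(transform(extract()))
```

Because each step is a plain function, it can be unit-tested, rerun, and audited independently.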
OCR/Document AI also needs guardrails. Without validation, you risk automation bias and systematic extraction errors. A Texas court automation project highlighted in the National Center for State Courts' AI Readiness report improved from roughly 60% to 95% accuracy once human review was added before the AI was scaled.
Keep a simple table: field name, definition, allowed values, units, and source—plus a log of when/why/who changed it. Add docstrings and inline comments so the “why” and “what” survive.
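The dictionary and its change log can themselves be small, versioned tables. The field definitions below are invented examples of the shape, not real program metadata:

```python
import pandas as pd

# A data dictionary as a small, versioned table (illustrative entries).
data_dictionary = pd.DataFrame([
    {"field": "participant_id", "definition": "Unique person identifier",
     "allowed_values": "string, non-null", "units": "n/a",
     "source": "intake form"},
    {"field": "visits", "definition": "Completed visits in reporting period",
     "allowed_values": ">= 0", "units": "count", "source": "case system"},
])

# A change log recording when/why/who for every definition change.
change_log = [
    {"when": "2026-03-01", "who": "KW",
     "why": "Clarified that 'visits' excludes no-shows"},
]
```

Kept next to the code and data, these two tables are what lets a future analyst (or an AI assistant) answer the right question.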
AI can suggest and automate steps, but the failure modes are still human problems of structure and controls: units, schema, merge logic, reproducibility, and metadata. These don't disappear with AI, so use it as an assistive layer and keep a human in the loop to ensure that Python-based analysis includes audit-ready, testable code and explicit metadata governance.
S. Kimberly Wright is a data analytics professional and Data Analytics instructor with experience teaching government employees and adult learners how to apply Python for large-scale data analysis. She leads adult continuing education courses at LaGuardia Community College and hands-on workshops introducing practical programming concepts and tools such as Python, SQL, Pandas, and data visualization libraries to support evidence-based decision making. With a background in strategic consulting, finance, and data-driven research, Kimberly specializes in translating complex technical information and analytical workflows into accessible learning experiences for non-technical professionals. Her work combines technical instruction with applied problem-solving, helping students build real-world skills in data analysis, reporting, and automation.
Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post so that we may enrich our community of practice. Would you like to submit an AEA365 tip? Please send a note of interest to aea365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.