HRIS File Cleaner
🔍 Problem Statement
A client’s enrollment application began blocking users from completing benefit selections. The issue stemmed from international phone number formatting (e.g., +1) in HRIS files. Since the product team declined to alter the validation logic, we built an internal solution to clean the phone numbers at the data level.
⚙️ Project Overview
This Python-based tool:
- Extracts the most recent HRIS file
- Cleans and validates phone numbers
- Tracks changes with reason and action
- Outputs a cleaned file and Excel audit report
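The "most recent file" step could be sketched roughly as below. This is a minimal illustration, not the tool's actual code: the function name `find_latest_file`, the `*.csv` pattern, and the use of modification time to decide what "most recent" means are all assumptions.

```python
from pathlib import Path
from typing import Optional


def find_latest_file(folder: Path, pattern: str = "*.csv") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: assumes "most recent" means latest modification time.
    """
    candidates = [p for p in folder.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the largest (newest) modification timestamp.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If the HRIS vendor encodes a date in the file name, sorting on that parsed date would likely be more reliable than file-system timestamps.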
🎥 Demo
A walkthrough showing how the tool:
- Extracts the latest HRIS file from a folder
- Cleans phone numbers in real time
- Logs invalid or unfixable entries
- Outputs the altered file
- Generates an Excel audit report
💡 My Role
I built this as a solo developer after encountering a real-world defect in an enterprise HRIS environment. I handled everything end-to-end:
- Extract logic
- Data transformation rules
- Audit reporting
- Logging
- Output file generation
- Project documentation (README, changelog, logging strategy, etc.)
This project reflects my ability to translate production-level issues into scalable, modular Python solutions.
⚙️ What It Does
The HRIS File Cleaner is a modular Python application that:
- Scans HRIS data for invalid U.S. phone numbers
- Removes formatting issues (e.g., +1, dashes, parentheses)
- Standardizes all entries to a clean 10-digit format
- Tracks which records were changed and why
- Outputs a cleaned CSV and a detailed Excel audit report
- Supports future transformation logic via plug-and-play structure
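The cleaning rules in the list above can be illustrated with a short sketch. The function name `clean_phone` and every reason string except "Stripped leading 1" are assumptions made for this example; the real tool may apply additional validation.

```python
import re
from typing import Optional, Tuple


def clean_phone(raw: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (cleaned 10-digit number or None, reason).

    Hypothetical helper illustrating the rules described above.
    """
    # Drop everything that is not a digit: "+", dashes, parentheses, spaces.
    digits = re.sub(r"\D", "", raw or "")
    # An 11-digit value starting with 1 is treated as a U.S. country code prefix.
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:], "Stripped leading 1"
    if len(digits) == 10:
        if digits == raw:
            return digits, "Already valid"
        return digits, "Removed formatting characters"
    return None, "Not a 10-digit U.S. number"
```

For example, `clean_phone("+1 (555) 123-4567")` returns `("5551234567", "Stripped leading 1")`, while `clean_phone("12345")` returns `(None, "Not a 10-digit U.S. number")`.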
🧪 Key Features
- Phone Number Cleaning: Removes +1, special characters, and validates 10-digit format
- Audit Trail: Tracks each change with action (Changed, Removed) and reason (e.g., “Stripped leading 1”)
- Scalable Framework: Can be extended to handle name cleaning, state-based exclusions, or DOB formatting
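To show what an audit trail with an action and a reason per change might look like, here is a minimal sketch. The `AuditEntry` fields, the function name `clean_with_audit`, and the choice to blank out unfixable numbers are assumptions for illustration; only the action labels (Changed, Removed) and the reason "Stripped leading 1" come from the description above.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AuditEntry:
    row: int        # 1-based position of the record in the input
    original: str   # value as it appeared in the HRIS file
    cleaned: str    # value written to the cleaned file ("" if removed)
    action: str     # "Changed" or "Removed"
    reason: str     # human-readable explanation


def clean_with_audit(phones: List[str]) -> Tuple[List[str], List[AuditEntry]]:
    """Clean a column of phone numbers and record every modification."""
    cleaned_values: List[str] = []
    audit: List[AuditEntry] = []
    for row, original in enumerate(phones, start=1):
        if not original.strip():
            # Blank values are passed through without an audit entry.
            cleaned_values.append(original)
            continue
        digits = re.sub(r"\D", "", original)
        reason = "Removed formatting characters"
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
            reason = "Stripped leading 1"
        if len(digits) != 10:
            cleaned_values.append("")
            audit.append(AuditEntry(row, original, "", "Removed",
                                    "Not a valid 10-digit number"))
        elif digits != original:
            cleaned_values.append(digits)
            audit.append(AuditEntry(row, original, digits, "Changed", reason))
        else:
            # Already clean: keep the value, nothing to log.
            cleaned_values.append(digits)
    return cleaned_values, audit
```

The resulting list of entries could then be written to an Excel sheet with a library such as pandas or openpyxl; that export step is left out here to keep the sketch dependency-free.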
🧾 Documentation
🙌 Why This Matters
This tool was born from a real production problem:
Employees couldn’t complete their benefit enrollment due to phone numbers with +1 formatting, something the product UI didn’t tolerate and the vendor didn’t have the capacity to patch quickly.
Rather than wait on product development, I created a lightweight ETL solution that allows ops teams to fix data quality issues upstream. This gives us more control over data hygiene without relying on slow or unavailable engineering support.
The system is designed for reuse across other transformation scenarios, such as name sanitization or geographic filtering, which makes it a sustainable, long-term investment in data quality.
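One way such a reusable, plug-and-play structure could be organized is a simple registry of transformation functions. This is only a sketch under assumptions: the names `register`, `TRANSFORMS`, and `run_pipeline`, the two example transforms, and the column names `first_name` and `state` are hypothetical and not taken from the actual project.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Transform = Callable[[Row], Row]

# Registry of transformations, applied in registration order.
TRANSFORMS: List[Transform] = []


def register(func: Transform) -> Transform:
    """Decorator that adds a transformation to the pipeline."""
    TRANSFORMS.append(func)
    return func


@register
def strip_name_whitespace(row: Row) -> Row:
    # Hypothetical name-sanitization rule: trim surrounding whitespace.
    updated = dict(row)
    updated["first_name"] = updated.get("first_name", "").strip()
    return updated


@register
def uppercase_state(row: Row) -> Row:
    # Hypothetical geographic rule: normalize state codes to upper case.
    updated = dict(row)
    updated["state"] = updated.get("state", "").upper()
    return updated


def run_pipeline(row: Row) -> Row:
    """Apply every registered transformation to a single record."""
    for transform in TRANSFORMS:
        row = transform(row)
    return row
```

With this layout, adding a new rule means writing one decorated function; the extract, audit, and output steps would not need to change.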