I’m a Data Engineer with a passion for turning complex data into actionable insights. Over the past 4+ years, I’ve built scalable, real-time data pipelines and designed analytics solutions that drive business outcomes across industries like financial services and higher education.
My expertise lies in leveraging cloud-native technologies, primarily AWS, to automate ETL workflows, ensure data quality, and build powerful data infrastructures. I specialize in creating end-to-end data pipelines using AWS Glue, PySpark, and Apache Airflow, along with deploying real-time monitoring solutions to keep data discrepancies in check.
From developing automated data validation frameworks to streamlining financial reconciliation processes, I thrive on finding innovative ways to reduce manual work, improve accuracy, and deliver valuable business insights. My work in data governance and regulatory compliance has contributed to reducing risk and saving significant costs.
Sep 2024 - Present
• Spearheaded the aggregation of property inspection data from 1,200+ hurricane-impacted sites and integrated datasets from FEMA, NOAA, and US Census Pulse Survey to provide a unified foundation for community recovery planning.
• Automated data ingestion workflows using AWS Glue and implemented transformation and modeling logic with dbt in Amazon Redshift, processing 10K+ records weekly to ensure accurate and timely data availability.
• Applied Great Expectations to enforce data quality by validating schema integrity, null checks, and duplicate detection, reducing downstream reporting errors by 25% and increasing confidence in analytics outputs.
• Led a team of 3 analysts in validating and cross-checking field and public data sources, improving the consistency and accuracy of data flowing into internal reports and external recovery dashboards.
• Built and maintained Power BI dashboards to enable real-time monitoring of inspection coverage, claim processing status, and funding eligibility, accelerating operational decision-making by 20% and helping prioritize high-need areas for aid distribution.
AWS · Python · SQL · PowerBI · DBT

Feb 2023 - May 2024
• Cleaned and analyzed over 5,000 student records using SQL and Excel to reconcile tuition payment and financial aid disbursement data, improving semester-end reporting accuracy.
• Collaborated with the Student Financial Services team to define KPIs and data validation rules, ensuring alignment between aid disbursement timelines and billing cycles.
• Designed SQL-based reports and ad hoc queries to flag anomalies in student billing data, helping identify and resolve over 100 inconsistencies across three semesters.
• Developed interactive Tableau dashboards to visualize key metrics including payment completion rates, average aid disbursement timing, and per-student outstanding balances, resulting in a 20% reduction in delinquency-related follow-ups.
• Supported testing and documentation of API integrations between Ellucian Banner and TouchNet, validating transaction status sync and error handling logic to enhance audit readiness.
Tableau · API Integration · Data Analysis · KPI Tracking

Feb 2021 - Aug 2022
• Leveraged Databricks to scale the processing of 500K+ monthly invoice records from Generali’s P&C Claims System, enabling efficient downstream analysis and reporting.
• Transformed complex, nested claims data into partitioned Parquet files by applying currency normalization to USD and EUR, significantly improving data usability for reconciliation.
• Built and automated ETL workflows using AWS Glue, Lambda, and Apache Airflow to orchestrate ingestion, transformation, and load processes across multiple data sources.
• Developed real-time anomaly detection pipelines using CloudWatch and SNS to monitor data quality and flag issues such as currency mismatches and incomplete invoice fields, leading to a 10% improvement in error identification.
• Designed and deployed Power BI dashboards to visualize KPIs including claims discrepancies, policy duplication rates, and property valuations, reducing overpaid claims by $200K annually through better-informed decisions.
Databricks · AWS · SQL · Airflow · PowerBI · Python

Jun 2019 - Aug 2019
• Assisted in designing and normalizing database schemas to improve data retrieval efficiency and reduce redundancy.
• Wrote optimized SQL queries to extract data from relational databases such as MySQL for use in internal reporting tools.
MySQL · Python · ERD · Database Management

Sep 2017 - May 2018
• Led a team of designers to create visually compelling marketing materials and event banners.
• Coordinated with sponsors and stakeholders to ensure brand consistency and effective communication.
• Organized and managed creative workshops and presentations to enhance team skills and project outcomes.
Illustrator · Photoshop

Aug 2016 - Apr 2017
• Organized and coordinated various workshops and events focused on cyber security and ethical hacking.
• Collaborated with industry experts to deliver engaging and informative sessions.
• Managed event logistics, including scheduling, venue setup, and participant registration.
Event Management · Team Coordination

• Leveraged Terraform to define and manage the infrastructure required for the application on AWS.
• Containerized the frontend and backend using Docker, and orchestrated the deployment and management of these containers with Kubernetes.
• Implemented a CI/CD pipeline using AWS CodePipeline and CodeBuild to automate the build, testing, and deployment of application code changes.
• Integrated AI capabilities using services like Amazon Bedrock and OpenAI to enhance functionalities such as product recommendations and customer support.
• Utilized resources and services across multiple cloud providers, including AWS, Google Cloud, and Azure.
Terraform · Docker · Kubernetes · MultiCloud · AI · CI/CD

• Extracted and processed large-scale Spotify data by integrating custom Python libraries with the Spotify API and storing it in AWS S3.
• Automated data workflows with AWS CloudWatch and Lambda, significantly reducing manual intervention and improving operational efficiency.
• Transformed and optimized data using Apache Spark and Parquet format, enhancing storage efficiency and data processing capabilities.
• Queried and analyzed data using SQL on Athena and visualized insights on AWS QuickSight through detailed trend analysis and KPI tracking.
AWS · Python · SQL · Spark

• Designed and implemented an ETL pipeline to process messages from Amazon SQS, ensuring secure handling of Personally Identifiable Information (PII) and storing data in a PostgreSQL database.
• Achieved data security, integrity, and scalability, enabling seamless handling of increasing data volumes with efficient deployment using Docker Compose.
AWS SQS · Python · PostgreSQL · Docker

• Built a web scraping application to extract articles from news websites, capturing key data such as titles and summaries with high accuracy.
• Cleaned and preprocessed the scraped text to reduce noise, and created interactive visualizations to analyze the most frequently used words in the content.
HTML · Jupyter NB · Python

Jan 2023 - Present
• Actively participated in organizing and hosting developer meetups, workshops, and hackathons, fostering a vibrant tech community among students and professionals.
• Gained hands-on experience with Google technologies, tools, and APIs by contributing to knowledge-sharing sessions and coding initiatives.
• Developed leadership skills by coordinating events, managing logistics, and collaborating with industry experts, peers, and tech enthusiasts to expand professional networks.
Jan 2018 - Dec 2018
• Conducted weekly tutoring sessions in English and mathematics for students in grades 2-6, promoting academic development.
• Developed and delivered interactive workshops and activities, creating an engaging and dynamic learning environment.
• Contributed to the organization’s annual fundraising event, playing a key role in securing funds to support its mission and programs.
© 2025 Parth Dodia