Internship: Testing Data Migrations with Synthetic Data: An AI-Powered Approach
- 02 May 2026
- 100%
- Pully
Job summary
Join our team for an exciting internship in data migration! Work in a dynamic, collaborative environment while making a real impact.
Tasks
- Design a strategy for migration script testing and dataset generation.
- Implement a proof-of-concept system for realistic test datasets.
- Explore multi-agent architectures for efficient data generation.
Skills
- Strong Python skills, SQL knowledge, and familiarity with data security are essential.
- Experience with LLMs and agentic systems is a plus.
- Clear technical writing and problem-solving mindset are required.
About the job
Description
Data platform migrations are common in enterprise environments, moving from legacy systems to modern infrastructure while preserving business logic. The technical challenge isn't just syntax translation; it's validation. When developers migrate SQL scripts or data pipelines between platforms, they face different execution environments, modified data access permissions, and no safe way to test against production data.
This internship tackles synthetic data generation for migration script testing. You'll design and implement a system that generates realistic test datasets mirroring production structure and behavior without exposing sensitive information. There are different possible approaches: the result could be a small dataset living in a git repository, or a fully-fledged synthetic data warehouse. Either way, the data must be realistic enough to catch real bugs.
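As a loose illustration of the "small dataset living in a git repository" end of that spectrum, the sketch below generates rows from a hand-written column specification. All column names, value ranges, and distributions here are invented for the example; a real system would derive them from the production schema and observed data:

```python
import random
import string

# Hypothetical column specification: column name -> generator of plausible
# values. In practice these would be inferred from the production schema.
SCHEMA = {
    "customer_id": lambda: random.randint(1, 10_000),
    "country": lambda: random.choice(["CH", "FR", "DE", "IT"]),
    "balance": lambda: round(random.uniform(0, 5_000), 2),
    "email": lambda: "".join(random.choices(string.ascii_lowercase, k=8))
    + "@example.com",
}


def generate_rows(n: int) -> list[dict]:
    """Generate n synthetic rows matching the schema's shape."""
    return [{col: gen() for col, gen in SCHEMA.items()} for _ in range(n)]


rows = generate_rows(100)
print(len(rows), sorted(rows[0].keys()))
```

A dataset this small can be committed to git and loaded in CI, but catching real bugs usually requires richer distributions and cross-column constraints than this stub models.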
The challenge goes beyond simple data mocking. You'll need to decide whether to generate from real data (with anonymization risks), from query analysis alone (which requires good documentation), or through a hybrid approach. Should categorical values match production exactly, or can we substitute them and adapt the scripts? Can we extend unit testing to end-to-end testing, and what would be the required dataset properties?
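One way to substitute categorical values without breaking the scripts is deterministic pseudonymization, sketched below. Equal inputs map to equal tokens, so joins and group-bys behave as they would on production data; the salt and token format are purely illustrative, not a recommendation for a real secret-management scheme:

```python
import hashlib


def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    # Deterministic substitution: the same input always yields the same
    # token, preserving referential integrity across tables while hiding
    # the original value. The salt here is a placeholder for the example.
    digest = hashlib.sha256((salt + value).encode()).hexdigest()[:8]
    return f"cat_{digest}"


# Joins and group-bys survive: equal inputs give equal tokens.
a = pseudonymize("ACME Corp")
b = pseudonymize("ACME Corp")
assert a == b and a != "ACME Corp"
print(a)
```

Note that deterministic schemes leak equality patterns, which is exactly the kind of anonymization trade-off the internship would need to weigh.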
Part of the work involves establishing an evaluation methodology—potentially collecting a reference set of migration scripts and their expected behaviors to measure how well different synthetic data approaches catch real issues. There's potential to explore multi-agent architectures where specialized agents handle different aspects: schema analysis, constraint extraction, data generation, anonymization verification, and test validation. This is applied research with immediate production impact.
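The evaluation idea above, a reference set of scripts with expected behaviors, could be prototyped as something like the toy harness below. The table, rows, and test cases are all invented for the example; the point is that a discrepancy between a query's result and its recorded expectation counts as a caught issue:

```python
import sqlite3


def evaluate(cases, rows):
    """Run each (sql, expected) case against a synthetic dataset and
    count the discrepancies the data manages to expose."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    caught = 0
    for sql, expected in cases:
        got = conn.execute(sql).fetchone()[0]
        if got != expected:
            caught += 1  # the synthetic data exposed a behavioral difference
    return caught


# NULLs are a classic source of migration bugs, so the synthetic rows
# deliberately include one.
rows = [(1, 10.0), (2, 20.0), (3, None)]
cases = [
    ("SELECT COUNT(*) FROM orders", 3),
    ("SELECT COUNT(amount) FROM orders", 3),  # wrong: COUNT(col) skips NULLs
]
print(evaluate(cases, rows))  # -> 1 discrepancy caught
```

A metric like "discrepancies caught per reference case" would then let different synthetic data strategies be compared on equal footing.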
Objectives
- Design a strategy for migration script testing that balances realism, anonymization, and practical constraints
- Implement a proof-of-concept system that generates test datasets from schema documentation, existing queries, or (carefully) sampled production data
- Define testing strategies: unit tests vs. end-to-end tests, minimum viable data sizes, etc.
- Develop an evaluation methodology to measure the effectiveness of different synthetic data generation approaches
- Explore multi-agent architectures for decomposing the generation pipeline into specialized components (schema analysis, constraint satisfaction, validation)
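The decomposition named in the last objective could start as plain functions before any LLM agents are involved; every stage below is an illustrative stub (toy DDL parsing, a trivial constraint, a one-line validator), not a proposed design:

```python
def schema_analysis(ddl: str) -> list[str]:
    # Toy "agent": extract column names from a simple CREATE TABLE statement.
    inner = ddl.split("(", 1)[1].rsplit(")", 1)[0]
    return [part.strip().split()[0] for part in inner.split(",")]


def constraint_satisfaction(columns: list[str]) -> dict[str, str]:
    # Toy "agent": attach a placeholder NOT NULL constraint to each column.
    return {col: "not_null" for col in columns}


def validation(constraints: dict[str, str], row: dict) -> bool:
    # Toy "agent": check a generated row against the extracted constraints.
    return all(row.get(col) is not None for col in constraints)


cols = schema_analysis("CREATE TABLE t (id INTEGER, name TEXT)")
cons = constraint_satisfaction(cols)
print(cols, validation(cons, {"id": 1, "name": "x"}))
```

Swapping any one stage for an LLM-backed agent without touching the others is the kind of modularity a multi-agent architecture would aim for.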
Our offer
- A dynamic, collaborative work environment with a highly motivated, multicultural team across international sites
- The chance to make a difference in people's lives by building innovative solutions
- Various internal coding events (hackathons, brownbags); see our technical blog
- Monthly after-works organized at each location
Skills required
- Strong Python programming: data processing, testing patterns, CI/CD integration
- Understanding of relational databases, SQL, and data modeling concepts
- Experience with LLMs and agentic systems: prompting, tool use, multi-agent orchestration
- Familiarity with data security and data anonymization concepts
- Problem-solving mindset: comfort with ambiguous requirements and making justified technical trade-offs
- Clear technical writing and documentation skills
About the company
Pully