Audio turnover: exporting an AAF from the picture edit to the sound mix

What conformation is, why AAF beats OMF, how many handle frames to leave, and the 2-pop and picture reference that let the mixer confirm sync in seconds.

By Hanna Eng · Video editor, Free Conservatory of French Cinema

Updated 2 June 2026 · 9 min read
Part of: Video post-production

An AAF is the file that carries an edited timeline from the picture edit to the sound mixer's session, with the audio clips, their positions and their fades. Export it with handles of several seconds (many mixers ask for a few seconds up to about six; the exact length is set per project), stereo broken out to mono, at 48 kHz, alongside a video reference and a 2-pop, so the mix conforms to the picture exactly.

When the picture edit is finished, the sound does not travel as a flattened mixdown. It travels as the edit itself: every clip, in position, with its fades, rebuilt inside the sound mixer's session. That handoff is the turnover, and getting the export right is the difference between a mixer who starts work immediately and one who spends the first hour fixing sync and chasing missing media. Here is how to do it cleanly.

AAF vs OMF vs XML/EDL for the sound handoff

Format | What it carries | Use
AAF | Audio clips, edit points, fades, volume automation and track names | The modern standard for turning a timeline over to Pro Tools
OMF | Audio clips and edits, but automation and names are unreliable; embedded files are capped near 2 GB (an OMF-specific limit) | Legacy fallback when AAF is not available
XML / EDL | Edit decisions only, with no embedded audio media | Conforming and relinking, not an audio media handoff

Sources: Production Expert; Avid

What "turnover" and "conformation" mean

Turnover is the moment the picture edit is handed to the sound department. Conformation (conformation in French, and often simply called the conform in English) is what happens next: the mixer rebuilds the exact edit inside Pro Tools from the file you send, so the audio sits frame-for-frame against the locked picture.

It only works cleanly once the picture is locked, meaning the cut, timing and shot order are final. Conform against an unlocked edit and every later picture change forces the sound to be re-conformed, which is the most expensive kind of rework in post.

AAF, OMF or XML: which format to send

AAF (Advanced Authoring Format) is the modern standard for the sound turnover. It carries the audio clips, their edit points and fades, clip volume automation and track names, which is exactly what the mixer needs to rebuild the edit. OMF (Open Media Framework) is the older format: it works, but it carries automation and track names unreliably, so it is a fallback, not a first choice. Embedded OMF media is also capped near 2 GB. AAF has no such hard cap, but for large sessions linking the media instead of embedding it is still the safest route.

XML and EDL are a different tool. They describe the edit decisions but embed no audio media, so they are used to conform or relink, not to hand a full audio session over. For a sound mix, send an AAF. The table above lays out the trade-offs.

Export with handles, and why a frame or two is not enough

Handles are extra frames exported beyond each clip's In and Out points, so the mixer can extend an edit, add a crossfade or smooth a transition without the audio simply running out. A frame or two is not enough: many mixers ask for a few seconds up to about six seconds of handle on every clip, and the exact length is set per project rather than fixed by a single standard.

Without handles, every fade the mixer wants to lengthen hits a hard edge and the clip has no more audio to give. Generous handles cost a little file size and save the mix from being boxed in.
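To see what a handle length means in frames, the arithmetic can be sketched as below. This is a minimal illustration, not a tool from any editing system: the function name `handle_frames` is hypothetical, and the frame rates shown are only examples.

```python
from fractions import Fraction

def handle_frames(handle_seconds, fps):
    """Return the handle length as a whole number of frames.

    fps may be an int (24, 25) or a Fraction for fractional rates
    such as 24000/1001 (the rate usually written as 23.976).
    """
    return round(Fraction(handle_seconds) * Fraction(fps))

# A six-second handle at a few common frame rates (illustrative values).
for label, fps in [("24", 24), ("25", 25), ("23.976", Fraction(24000, 1001))]:
    print(f"{label} fps: {handle_frames(6, fps)} frames of handle per side")
```

Set against those numbers, a one- or two-frame handle is a tiny fraction of what the mixer is asking for.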

Stereo, mono and sample rate

Deliver the audio at the project sample rate, which for video is 48 kHz rather than the 44.1 kHz used for CD and consumer audio, typically at 24-bit in the Broadcast Wave (BWF) format. A file whose sample rate is misread rather than converted plays back at the wrong speed and pitch in the mix session.
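The size of that speed and pitch error is simple arithmetic. The sketch below assumes the worst case, where the samples of a 44.1 kHz file are played as if they were 48 kHz with no sample-rate conversion; the function name `misread_effect` is hypothetical.

```python
import math

def misread_effect(file_rate_hz, session_rate_hz, duration_s):
    """Speed factor, pitch shift in semitones, and resulting duration when
    a file is played at the session rate without being converted."""
    speed = session_rate_hz / file_rate_hz
    semitones = 12 * math.log2(speed)
    return speed, semitones, duration_s / speed

speed, semitones, new_duration = misread_effect(44_100, 48_000, 60)
print(f"speed x{speed:.4f}, pitch +{semitones:.2f} semitones, "
      f"60 s clip now lasts {new_duration:.3f} s")
```

Under that assumption the clip runs about 9% fast and roughly a semitone and a half sharp, which is why it drifts out of sync almost immediately.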

Break stereo clips out to separate mono tracks so the mixer controls each channel independently, and decide between embedded media (audio inside the AAF) or linked media (the AAF plus a folder of audio files). Embedded is simpler to hand over; linked keeps the AAF small.
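A rough size estimate helps with the embedded-or-linked decision. The sketch below counts uncompressed PCM only and assumes every mono track is filled for the whole duration, so it is an upper bound rather than what an export will actually write. The function name and the exact byte value used for "2 GB" are assumptions for illustration.

```python
TWO_GB = 2 * 1024**3  # assumed byte value for the "roughly 2 GB" OMF ceiling

def pcm_bytes(seconds, mono_channels, sample_rate=48_000, bit_depth=24):
    """Uncompressed PCM size in bytes for the given amount of audio."""
    return seconds * mono_channels * sample_rate * bit_depth // 8

# Example: a 90-minute program with 8 mono tracks of continuous audio.
size = pcm_bytes(90 * 60, 8)
print(f"{size / 1e9:.2f} GB of audio media")
print("exceeds the embedded OMF ceiling" if size > TWO_GB else "fits under the ceiling")
```

One mono track at 48 kHz and 24-bit costs 144,000 bytes per second, so a feature-length timeline passes 2 GB quickly and linked media becomes the practical choice.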

The sync safety net: a 2-pop and a picture reference

A 2-pop is a single frame of 1 kHz tone, placed exactly two seconds before the first frame of program, where the "2" would appear on a countdown, with a matching tail pop after the last frame. Its level follows the delivery's alignment reference: -20 dBFS on SMPTE/US deliverables (where -20 dBFS = 0 VU) and -18 dBFS on EBU/European ones, so confirm the target level with the deliverable spec. It gives the mixer one unambiguous sync point to check the whole session against.
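The head pop's position can be worked out from the program start timecode. The sketch below assumes non-drop-frame timecode and an integer nominal frame rate (24 for material shot at 23.976); the function names are hypothetical and it does not handle drop-frame counting.

```python
def tc_to_frames(tc, fps):
    """Convert non-drop 'HH:MM:SS:FF' timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def frames_to_tc(frames, fps):
    """Convert a frame count back to non-drop 'HH:MM:SS:FF' timecode."""
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

def two_pop_tc(program_start_tc, fps):
    """Timecode of the head pop: two seconds before first frame of program."""
    pop = tc_to_frames(program_start_tc, fps) - 2 * fps
    if pop < 0:
        raise ValueError("program starts too early to fit a 2-pop before it")
    return frames_to_tc(pop, fps)

print(two_pop_tc("01:00:00:00", 24))    # program starting on the hour, 24 fps
print(48_000 // 24, "samples of tone")  # one frame of audio at 48 kHz, 24 fps
```

For a program that starts at 01:00:00:00, the pop lands at 00:59:58:00, and at 24 fps its single frame of tone is 2,000 samples long at 48 kHz.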

Send it alongside a video reference, usually a QuickTime export of the locked picture with the same start timecode and the same 2-pop burned in. With both, the mixer confirms the AAF lines up against the picture in seconds, instead of discovering a one-frame slip at the end of the day.

Exporting from DaVinci Resolve, and what the mixer needs

DaVinci Resolve exports an AAF directly, and so do the other editing systems a project might come from, each with its own handle-length and mono-breakout options. Set the handles, break out to mono, choose embedded or linked media, and export at 48 kHz.

The full turnover package is more than the AAF: the AAF, a QuickTime picture reference with matching timecode and 2-pop, the production audio if the media is linked, and a short notes file listing frame rate, sample rate, handle length and anything unusual. That package is what lets the mix conform on the first try.
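The notes file does not need to be elaborate. As a minimal sketch, the function below builds a plain-text version with the fields named above; the function name, field labels and layout are assumptions, not a format any mixer or tool requires.

```python
def build_notes(frame_rate, sample_rate_hz, handle_seconds, unusual=()):
    """Return the text of a short turnover notes file."""
    lines = [
        f"Frame rate: {frame_rate} fps",
        f"Sample rate: {sample_rate_hz} Hz",
        f"Handle length: {handle_seconds} s",
        "Unusual: " + ("; ".join(unusual) if unusual else "nothing to report"),
    ]
    return "\n".join(lines) + "\n"

print(build_notes(25, 48_000, 6, unusual=["reel 2 temp music is a stereo mixdown"]))
```

Saved next to the AAF, a few lines like these answer the first questions the mixer would otherwise have to ask by email.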

Turnover checklist and common mistakes

The expensive mistakes are all avoidable: turning over before picture lock, handles too short to fade, no 2-pop, audio at 44.1 kHz instead of 48, embedded OMF media that blows past its roughly 2 GB limit, or forgetting the QuickTime reference so the mixer cannot check sync.

A clean turnover is a short checklist: picture locked, AAF with multi-second handles, mono breakout, 48 kHz, a 2-pop, a matching picture reference, and a notes file. Tick those and the mix starts on minute one.
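That checklist can be written down as a small pre-flight check. This is a sketch under stated assumptions: the `Turnover` fields and the `preflight` function are hypothetical names, and the two-second minimum handle is a placeholder to be replaced by whatever length the project agrees with the mixer.

```python
from dataclasses import dataclass

@dataclass
class Turnover:
    picture_locked: bool
    handle_seconds: float
    mono_breakout: bool
    sample_rate_hz: int
    has_two_pop: bool
    has_picture_reference: bool
    has_notes_file: bool

def preflight(t, min_handle_seconds=2.0):
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    if not t.picture_locked:
        problems.append("picture is not locked")
    if t.handle_seconds < min_handle_seconds:
        problems.append(f"handles are {t.handle_seconds} s, below {min_handle_seconds} s")
    if not t.mono_breakout:
        problems.append("stereo is not broken out to mono")
    if t.sample_rate_hz != 48_000:
        problems.append(f"sample rate is {t.sample_rate_hz} Hz, not 48000 Hz")
    if not t.has_two_pop:
        problems.append("no 2-pop")
    if not t.has_picture_reference:
        problems.append("no picture reference")
    if not t.has_notes_file:
        problems.append("no notes file")
    return problems

draft = Turnover(True, 0.08, True, 44_100, True, True, True)
for problem in preflight(draft):
    print("FIX:", problem)
```

Run against the example draft, it flags the two-frame handles and the 44.1 kHz audio, two of the expensive mistakes listed above.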

Frequently asked questions

What is audio conformation?

It is the sound mixer rebuilding the picture editor's exact timeline inside Pro Tools from the file you send, so every audio clip sits frame-for-frame against the locked picture before the mix begins.

What is the difference between AAF and OMF?

Both carry audio clips and edits to the mixer, but AAF reliably keeps volume automation and track names, while OMF is older and carries automation and names unreliably. Send AAF and keep OMF as a fallback. OMF caps embedded media near 2 GB; AAF has no such hard cap, but for large sessions it is still safest to link the media rather than embed it.

How do you export an AAF for Pro Tools?

From DaVinci Resolve (or another editing system), export an AAF with handles of several seconds, stereo broken out to mono, at 48 kHz, choosing embedded or linked media, then send it with a picture reference and a 2-pop.

How many handle frames should an AAF have?

Not a frame or two. Handles of several seconds are the norm: many mixers ask for a few seconds up to about six seconds on every clip so they can extend edits and add crossfades, and the exact length is set per project rather than fixed by a single standard.

What is a 2-pop?

A single frame of 1 kHz tone placed exactly two seconds before the first frame of program, with a matching tail pop after the last frame. Its level follows the delivery's alignment reference (commonly -20 dBFS on SMPTE/US deliverables, -18 dBFS on EBU/European ones), so confirm it with the deliverable spec. It is a sync reference the mixer checks the whole session against.

What sample rate should audio be for video?

48 kHz, not the 44.1 kHz used for CD and consumer audio, and usually 24-bit in the Broadcast Wave (BWF) format. The wrong sample rate plays back at the wrong speed and pitch.

What should you send to the sound mixer?

An AAF with handles and mono breakout, a QuickTime picture reference with matching timecode and a 2-pop, the production audio if the media is linked, and a notes file listing frame rate, sample rate and handle length.
