Merge two subtitles of different languages into one

Posted on March 9, 2013

Using ted2mkv, I can download a TED talk together with all of its available subtitles.

I have the English subtitle file en.srt:

1
00:00:15,330 --> 00:00:17,330
We grew up

2
00:00:17,330 --> 00:00:20,330
interacting with the physical objects around us.

and the Chinese subtitle file zh-cn.srt:

1
00:00:15,330 --> 00:00:17,330
我们生长在

2
00:00:17,330 --> 00:00:20,330
和周围物体互动的环境里

What I want is a merged file like this:

1
00:00:15,330 --> 00:00:17,330
We grew up
我们生长在

2
00:00:17,330 --> 00:00:20,330
interacting with the physical objects around us.
和周围物体互动的环境里

There are probably hundreds of ways to do this. The simplest one I have found is:

diff --new-line-format "%L" en.srt zh-cn.srt > zh-en.srt
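To try the command without downloading a talk, here is a minimal sketch that recreates the two sample files above and runs it. It assumes GNU diff (from diffutils), since other diff implementations may not support the line format options, and UTF-8 encoded files. diff exits with status 1 whenever the files differ, which is the expected case here, so the script only treats status 2 as an error.

```shell
#!/bin/sh
# Recreate the two sample subtitle files from this post.
printf '%s\n' \
  '1' '00:00:15,330 --> 00:00:17,330' 'We grew up' '' \
  '2' '00:00:17,330 --> 00:00:20,330' 'interacting with the physical objects around us.' \
  > en.srt
printf '%s\n' \
  '1' '00:00:15,330 --> 00:00:17,330' '我们生长在' '' \
  '2' '00:00:17,330 --> 00:00:20,330' '和周围物体互动的环境里' \
  > zh-cn.srt

# Merge. diff exits 0 (identical), 1 (different) or 2 (trouble);
# only status 2 is a real error here.
status=0
diff --new-line-format '%L' en.srt zh-cn.srt > zh-en.srt || status=$?
if [ "$status" -gt 1 ]; then
  echo "diff failed" >&2
  exit "$status"
fi

# Show the merged subtitles.
cat zh-en.srt
```

One caveat: diff lines the text up correctly only because both files share the same cue numbers and timestamps. If a translation was re-timed, or a caption is byte-for-byte identical in both languages (diff then prints it once, as an unchanged line), the merged file is worth checking by hand.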

The key part is --new-line-format. Without any options, diff gives this result:

3c3
< We grew up
---
> 我们生长在
7c7
< interacting with the physical objects around us.
---
> 和周围物体互动的环境里

We don't need the 3c3, <, > and --- lines. After consulting the diff man page, we find this:

--LTYPE-line-format=LFMT

format LTYPE input lines with LFMT
These format options provide fine-grained control over the output of diff, generalizing -D/--ifdef.

The LTYPE part can be old, new or unchanged.
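Since LTYPE has only these three values, the merge can also be written with every line type set explicitly. The sketch below, assuming GNU diff, checks on the first sample cue that the short command from this post, an explicit three-option version, and the --line-format shorthand (which sets all three line types at once) produce the same file. %L prints a line as-is, trailing newline included.

```shell
#!/bin/sh
# First cue of each sample file.
printf '%s\n' '1' '00:00:15,330 --> 00:00:17,330' 'We grew up' > en.srt
printf '%s\n' '1' '00:00:15,330 --> 00:00:17,330' '我们生长在' > zh-cn.srt

# Short form from the post: only the "new" line type is set.
# (Each diff below exits with status 1 because the inputs differ.)
diff --new-line-format '%L' en.srt zh-cn.srt > short.srt

# Long form: every LTYPE (old, new, unchanged) set to %L explicitly.
diff --old-line-format '%L' \
     --new-line-format '%L' \
     --unchanged-line-format '%L' \
     en.srt zh-cn.srt > long.srt

# Shorthand: --line-format sets all three line types at once.
diff --line-format '%L' en.srt zh-cn.srt > all.srt

cmp short.srt long.srt && cmp short.srt all.srt && echo "all three outputs are identical"
```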

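The LFMT part can contain, among other directives,

%L  contents of the line, including its trailing newline
%l  contents of the line, excluding any trailing newline

Once any line format is given, diff stops printing the 3c3, < and > markup. Line types left unset default to %l followed by a newline, and a changed hunk prints its old lines before its new ones. That is why setting only --new-line-format is enough: each English line is printed as-is, immediately followed by its Chinese counterpart.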

Note: srtmerger is an online tool that merges two subtitle files and lets you choose a different color for each language. Coloring is not difficult to do with diff either!
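With diff, coloring one language comes down to changing the format for that line type. A minimal sketch, assuming GNU diff, Unix (LF) line endings, and a player that honours the <font color="..."> tag many players accept in SRT files (not all do); the color #ffff00 is an arbitrary choice, and this is not necessarily how srtmerger does it. %l prints the line without its newline so the tag closes on the same line, and %c'\012' (octal 012, a newline) then ends the line.

```shell
#!/bin/sh
# First cue of each sample file.
printf '%s\n' '1' '00:00:15,330 --> 00:00:17,330' 'We grew up' > en.srt
printf '%s\n' '1' '00:00:15,330 --> 00:00:17,330' '我们生长在' > zh-cn.srt

# Wrap each Chinese ("new") line in a font tag; English ("old") and
# unchanged lines keep the default format. Inside the double quotes,
# \" yields a literal quote and \\ a literal backslash, so diff receives
# the format:  <font color="#ffff00">%l</font>%c'\012'
diff --new-line-format "<font color=\"#ffff00\">%l</font>%c'\\012'" \
     en.srt zh-cn.srt > zh-en-color.srt

cat zh-en-color.srt
```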