Cloud storage for recordings

Status
Not open for further replies.

yukon

Member
Oct 3, 2016
138
14
18
Hi all,

Not sure if HA & Load Balancing is the right place to ask this question, so if not please move it :)

I was wondering if anyone here uses any cloud storage for call recordings. AWS, Google Storage, etc? If so what are your results? Suggestions/tips on how you have it configured?

I feel like you wouldn't want to use this for things like voicemail, IVR recordings, etc but call recordings would probably be fine since they would be used a lot less.

Thoughts?
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
The problem with storing recordings in this way is the way that call recordings are displayed within the CDR records. It can be done; I have played with this myself before. It would take a rewrite of the CDR display, because the way it currently determines whether or not there is a recording for a CDR is to check whether the file exists on the actual filesystem. This makes it very, very slow when using any type of S3 storage.
 
Dec 1, 2016
92
8
8
45
inside-out.xyz
This shouldn't be in HA. It's not related. Anyway, I had a customer who requested that, and of course the speed issue popped up.

The way we resolved this was by having a mount point on another path. Using bind mounts and scripting, we kept the useful records in the storage and recordings directories, and everything else lived on the external mount point.

Btw, I used Google Drive for this. I will write more in my blog and send you a link here.
 

simcard

Member
Jan 22, 2017
49
4
8
Has anyone attempted recordings and/or voicemail on S3?

Is it just too slow?

S3FS might overcome the CDR issue DigitalDaz mentioned for recordings, but our concern is the speed. Recordings not so much, as they would be seldom used and people could put up with a bit of delay in retrieval, but we're keen to know whether voicemail is out of the question on a reasonably sized and active cluster.

We thought about running up a test environment with S3 for vm/recordings and RDS for the database. Are we wasting our time?
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
If you want to go for it then please do and let us know your findings, but I do fear it will be too slow. When you say "Recordings not so much, as they would be seldom used and people could put up with a bit of delay in retrieval", bear in mind that this speed issue presents itself more than anything when displaying the CDR records. Whether or not a client is using recordings, the filesystem will still be checked, possibly up to 9 times per CDR, to see if a recording exists.
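To put a rough number on that, here is a back-of-the-envelope sketch. Only the "up to 9 checks per CDR" figure comes from the post above; the page size and per-request latency are assumed values picked for illustration, not measurements.

```shell
#!/bin/sh
# Rough worst-case estimate of file-existence checks for one CDR page.
# rows_per_page and latency_ms are assumptions for illustration only.
rows_per_page=50     # hypothetical number of CDR rows shown per page
checks_per_row=9     # "possibly up to 9 times per CDR", per the post above
latency_ms=10        # hypothetical round trip to the object store

total_checks=$((rows_per_page * checks_per_row))
total_ms=$((total_checks * latency_ms))

# Prints: checks=450 added_delay_ms=4500
echo "checks=${total_checks} added_delay_ms=${total_ms}"
```

Under those assumptions a single page view could add several seconds of waiting if the checks run one after another, before any recording is actually played.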
 

simcard

Member
Jan 22, 2017
49
4
8
Thanks DD. We're eventually building the ability to see CDRs from the FPBX platform via our customer portal, and will eventually remove users' access to see the CDRs in the FPBX portal, as we prefer to show rated CDRs rather than the raw ones. It may be a non-issue for users (but still slow for superusers)?

My concern with the speed issue was around voicemail retrieval via a handset.

Luis, what was your experience with speed/latency around vm retrieval from google drive?
 
Dec 1, 2016
92
8
8
45
inside-out.xyz
Google is slow; you will need to put it on an alternate mount point and sync periodically.
Don't get me wrong, it is the easiest way I have seen to share things: you just share the directory and automagically all the contents there are shared.
 

simcard

Member
Jan 22, 2017
49
4
8
So we had a go at voicemail with S3. I won't bore you with the detail, but in summary;

What we wanted to achieve;
- Store & receive voicemail recordings on S3, reliably and without a considerable delay.

- Allow all servers in the cluster to access a voicemail, irrespective of what server it was recorded on, without keeping local copies on the server (keep servers 'light').

- Allow customers to access voicemail recordings through our customer toolbox (we don't intend to give customers access to the FPBX GUI; why? security, end users changing things they shouldn't, etc).

The set-up;
- 2 x VPS running FS 1.6 & FPBX 4.2
- Poor man's cluster; database sharing setup between two servers (no proxy etc)
- S3 via ap-southeast-2 zone (servers were on the same AS/transit path: ~10ms to aws)

We tried two fuse based file systems (S3FS & Goofys);
-- S3FS worked out of the box for us.

-- Goofys looked the goods on paper (supposed to be faster and there's a 'cheap' setting which reduces the number of put/get/list requests); we had some permission issues to sort out first, but soon realised it doesn't handle appending to files on s3 very well; given the way FS creates a file first then appends/saves the data to the created file, it didn't end up working for us. Would have liked to run the tests with Goofys but it just wouldn't play ball.


General

Using S3FS, FS had no issue storing and retrieving the files. Our test servers were a few non-aws 2GB VPS' we spun up that were relatively close (~10ms) from the ap-southeast-2 (Sydney) zone.

We saw a slight delay in retrieving voicemail via a handset (and by slight, I mean we could notice it because we were looking for it, but I doubt an end user would notice it compared to a local filesystem).

The set-up was a poor man's cluster with just database sharing. Both servers were able to access the voicemail recordings without any issue (test process: forced the user to register to the 1st server to create the voicemail recording, then retrieved it on the 1st server; forced the user to register to the 2nd server and retrieved the voicemail. All successful).

DigitalDaz's comment about viewing voicemails through the FPBX GUI was spot on: it took quite a while to load and consumed a significant number of get/list requests (we went through our AWS free tier allowance in a day with only a handful of voicemails).


Pros
-- Not too difficult to set up
-- Worked fine in a test environment. We want to put more effort into testing (under load, multi-reads, bigger latency), but as a proof of concept, it does work and the results aren't as bad as we were expecting
-- Most FUSE-based filesystems that work with S3 offer a caching component, which may be useful (i.e. store locally in cache for one hour, in the event the user needs to retrieve again; reduces calls to S3)
-- s3 offers cross region replication for a 'higher availability' offering, however given how often you hear of aws s3 being unavailable, I'm not sure the cost is justified.
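
As an illustration of the caching point above, here is a sketch of what an S3FS mount with caching enabled might look like. The bucket name, mount point, and cache directory are made up for the example, and the snippet only prints the command rather than running it. As far as I know, use_cache keeps copies of file contents in a local directory, while stat_cache_expire (in seconds) only controls how long cached metadata is kept, so this is not quite the "cache for one hour" behaviour described above. Treat it as a starting point, not a tested configuration.

```shell
#!/bin/sh
# Print (not run) an s3fs mount command with local caching enabled.
# Bucket name, mount point, and cache directory are hypothetical.
bucket="example-voicemail-bucket"
mount_point="/mnt/s3-voicemail"
cache_dir="/var/cache/s3fs"

s3fs_mount_cmd() {
    # use_cache: local directory for cached file contents
    # stat_cache_expire: seconds to keep cached metadata (here one hour)
    # allow_other: let users other than the mounting user read the mount
    printf 's3fs %s %s -o use_cache=%s -o stat_cache_expire=3600 -o allow_other\n' \
        "$bucket" "$mount_point" "$cache_dir"
}

s3fs_mount_cmd
```

Credentials are left out on purpose; s3fs normally reads them from a password file.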

Cons
-- If you intend on using the Voicemail section of FPBX with S3, expect it to generate a significant number of put/get/list requests, which could get expensive if you have a busy FPBX setup, and expect it to be slow to the point that it is almost unusable. There are other object/block storage offerings out there that don't charge for put/get/list requests, which may be an option, but the speed would still be an issue. We're not particularly fussed about this as we don't offer customers access to the FPBX GUI. This may cause issues for us internally, but it is worth the effort to find a workaround.

-- If there's a network connectivity issue, or the network is congested it will certainly have an impact. We intended to run servers outside the AWS network, including across multiple AS's and different transit/peering routes, which brings its own complexities as we don't own that connectivity.


What to watch out for?
-- Proximity of your server to the chosen AWS S3 zone. If ~10ms on average gives a small delay in reading/writing, the worst you'd want to consider is probably 20-30ms before the delay in reading/writing becomes noticeable. Our intent is to look into this with FPBX instances placed further away when we get time.

-- We used VPSs for the test environment as they were easy to set up, but they obviously lack the processing power of bare metal. We might see performance improve slightly on bare metal, however we doubt by a lot, as the VPSs weren't under any load (i.e. this testing was the only thing we were doing in that environment at the time).


Other alternatives?

-- GlusterFS; although we're not interested in running our own clustered filesystem and liked the idea of the S3 model - someone else takes care of it with an SLA. We use S3 in another part of the business, so it made sense to look at it due to familiarity.

-- Other Storage as a Service offerings, however we assume they'll all perform near enough the same (AWS, GCP, Azure etc) for Australian zones.

-- Per Luis, create a mount-point and sync when a recording is generated. We didn't explore this in any detail, but could be considered by those not happy with the performance from s3

-- NFS; but don't like the idea of NFS across the internet


Overall;

We'll continue to look at this option in more detail. We're happy with the results thus far and we satisfied our initial query around access times when retrieving a voicemail.

We'll also try and spin up another two servers in slightly more distant locations to see what those results look like, which would be more comparable to the cluster we have in operation at the moment. If we remember, we'll also keep an eye on the put/get/list volumes in a bit more detail.
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
simcard, nice work and thanks for bringing us these results.

I don't know what your eventual scale will be, but if you are looking at availability, give LeoFS a look. It would be a bigger job setting it up, but you would then have your S3 datastore without the costs. You would of course need 3 servers in the same DC, as it uses multicast. I will speak with mcrane because if the db stored the path to the file consistently, we could make the CDR

That is where I would eventually like to be.
 

MTR

Member
Oct 25, 2017
181
9
18
45
Does anyone know how to set it up to email the recordings after the phone call?
 

SlimJim

New Member
Feb 6, 2018
11
1
3
34
Northern Indiana
I would like to share the way I have implemented this across several servers. We already use seafile for file sync and sharing. It utilises s3 in the backend but also supports file versioning. (More on why this excites me in a bit.) They offer a client called seadrive. Basically, it is fuse.

I used this guide to install seadrive.
https://www.seafile.com/en/help/drive_client_linux/

It took me a while to figure out that there is an option you have to enable so apache and FreeSWITCH can access the recordings, sounds, voicemail, faxes, provisioning, etc.
An example command that I ran:

seadrive -c ~/seadrive.conf -f -o allow_other -d data-directory [-l logfile] virtual-drive-dir

The -o allow_other allows for multi-user access without permission denied.

I create symlinks from all the FreeSWITCH directories to the virtual-drive-dir.
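
For anyone wanting to try the symlink step, a minimal sketch follows. The function name and the example paths in the comments are hypothetical, and the choice to rename an existing local directory to a ".local" copy instead of deleting it is my own assumption, not something described in the post.

```shell
#!/bin/sh
# link_into_drive LOCAL_DIR DRIVE_DIR
# Replaces LOCAL_DIR with a symlink to DRIVE_DIR/<basename of LOCAL_DIR>.
# Any existing LOCAL_DIR is kept next to it with a ".local" suffix.
link_into_drive() {
    local_dir="${1%/}"
    drive_dir="${2%/}"
    target="$drive_dir/$(basename "$local_dir")"

    # Already a symlink: nothing to do.
    if [ -L "$local_dir" ]; then
        return 0
    fi

    mkdir -p "$target" || return 1

    # Keep existing local data rather than deleting it.
    if [ -e "$local_dir" ]; then
        mv "$local_dir" "$local_dir.local" || return 1
    fi

    ln -s "$target" "$local_dir"
}

# Example call (hypothetical paths, adjust to your install):
#   link_into_drive /var/lib/freeswitch/recordings /mnt/seadrive/pbx
```

Anything left in the ".local" copy still has to be copied into the drive by hand before the old data shows up through the link.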

Some cool things: with file versioning, if I ever have a client delete a voicemail, fax, or anything by accident, seafile keeps the file for x amount of days. I can recover it with a simple click.

You can also choose the cache size, thus how much is actually stored on your server. In theory, you could have thousands of gigs, on a server with only a few hundred gigs of actual storage.

I have only tested this on servers with a hundred EUs, and I have not had any issues YET. Speed does not seem to be an issue, I can choose a random voicemail that has not been loaded into cache yet, play the file, and it will play as if it was locally stored. I am no expert at this, just a solution that is working for me. Of course, this adds another server to the mix, but for us, we need that server anyway. You can configure the seafile server for HA as well.
 

KonradSC

Active Member
Mar 10, 2017
166
99
28
Hey Luis,

I know this is an old post, but did you ever post your script for moving your extra stuff to an external mount point? I'm getting ready to write something that will move call recordings to an external mount point and also update the recording location in v_call_recordings & v_xml_cdr. Just wanted to see what you came up with before I started.

Thanks!
 

KonradSC

Active Member
Mar 10, 2017
166
99
28
That made me laugh! ...and then I realized that I'm going to have to write the shell script from scratch. I think I'm going to walk out to my car to have an angry cry.
 

socom

New Member
May 11, 2018
8
0
1
55
@KonradSC and @DigitalDaz I'm about to write a bash script that runs daily via cron and moves vm recordings and faxes to Vultr's S3-type object storage if the content in those dirs is older than 90 days. After the s3cmd put has completed, it will verify, then delete locally, then create a symbolic link in its place pointing to the new location, e.g. /s3fs-rc/vm-id (basically pointing to where it can be found on a mounted s3fs point). I don't plan on tampering with the database. I'll share my results if anyone is interested.

About 3-4 months ago I did set up Vultr's object storage to handle all the FS content, and it worked at low volume, though it didn't take much to feel the pains of S3. However, here I am revisiting it, but from an archival perspective. If I recall correctly, I'm shoving CDRs in the DB; I did so in an effort to make call recovery work (no success). I'm planning on moving CDRs to the file system and, if > 90 days old, to S3 along with recs and faxes.

Ideally, I'd like to move the Fusion search executions and data storage off the switches and have them hit 90 days of block, searches farther out would be hitting s3 objects.

Quick and dirty one or two liner:

# ${ARCHDIR} must be an absolute path so that /mnt/s3rc{} mirrors it
find "${ARCHDIR}" -type f -name "*.wav" -mtime +"${DELTA}" -exec rsync -axHAWXS --relative --remove-source-files {} /mnt/s3rc/ \; -exec ln -s /mnt/s3rc{} {} \; <-- this may not work.

If not, then:

find "${ARCHDIR}" -type f -name "*.wav" -exec rsync -axHAWXS --relative {} /mnt/s3rc/ \;
find "${ARCHDIR}" -type f -name "*.wav" -mtime +"${DELTA}" -exec rm -f {} \; -exec ln -s /mnt/s3rc{} {} \;
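
The copy, verify, delete, symlink sequence described at the top of this post could also be sketched as a small script. This is only a sketch under assumptions: it uses a plain cp onto an already-mounted archive path as a stand-in for the s3cmd put, the function names are hypothetical, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# archive_file SRC DEST_ROOT
# Copies SRC (absolute path) under DEST_ROOT, verifies the copy
# byte-for-byte, then replaces SRC with a symlink to the archived copy.
archive_file() {
    src="$1"
    dest="$2$src"
    mkdir -p "$(dirname "$dest")" || return 1
    cp -p "$src" "$dest" || return 1
    if cmp -s "$src" "$dest"; then
        rm -f "$src" && ln -s "$dest" "$src"
    else
        echo "verify failed, keeping local copy: $src" >&2
        return 1
    fi
}

# archive_older_than DIR DEST_ROOT DAYS
# Archives every *.wav under DIR that was modified more than DAYS days ago.
# Files that are already symlinks are skipped by -type f.
archive_older_than() {
    dir="$(cd "$1" && pwd)" || return 1   # make the path absolute
    dest_root="${2%/}"
    days="$3"
    find "$dir" -type f -name '*.wav' -mtime +"$days" |
        while IFS= read -r f; do
            archive_file "$f" "$dest_root"
        done
}

# Example call (hypothetical source path; /mnt/s3rc as in the post):
#   archive_older_than /var/lib/freeswitch/recordings /mnt/s3rc 90
```

A file is only removed after cmp confirms the archived copy matches, so a failed or partial upload leaves the local recording in place.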
 

KonradSC

Active Member
Mar 10, 2017
166
99
28
For CDRs, I recommend using a separate Postgres DB server to archive older CDRs. The functionality to access the old records using this method is already built into FusionPBX. If you want to save space, then I'd recommend clearing out the data in the v_xml_cdr.json column.
https://docs.fusionpbx.com/en/latest/additional_information/cdr_archive.html?highlight=archive

That's an interesting idea to move the recordings and replace them with links.

I've always been a little hesitant to put voicemails on S3, due to them being accessed by FreeSWITCH using Lua. I've been burned before by FreeSWITCH accessing the file system and causing issues. I'm pretty aggressive with clients about not letting them keep old voicemails, so the data set is pretty small, like less than 5 gigs of voicemails. Call recordings, on the other hand, are only accessed through the web interface, so it made sense to me to move them to S3.
 

ad5ou

Active Member
Jun 12, 2018
892
204
43
Just a reminder as it relates to this subject: https://www.pbxforums.com/threads/yet-another-take-on-call-recordings-storage-aws-s3.3002/

Our available primary storage space has stayed pretty consistent by using cdr archive database and limiting retention of fax files, voicemails, etc.

Have only recently acquired a client wishing to retain call recordings indefinitely. I already convert recordings nightly to save space and move to dedicated storage volume but plan on using a bit of this info to transfer older files to object storage.
 

KonradSC

Active Member
Mar 10, 2017
166
99
28
Thanks @ad5ou! I knew I wrote a post on that stuff!

We also have a client that wants long-term storage of call recordings. They were fine with managing their own storage and needed a very simple way to access the files, so we just SCP the call recordings to them nightly using an Expect script (it's a Windows server...ugh) and dump them in folders by date.

I'm hijacking the thread but I'm feeling dangerous. Here's the expect script....

Code:
#!/usr/bin/expect -f

#Connection settings
set timeout -1
set ip_address 5.x.x.1
set user ricky@domain
set pass "securepassword"
set domain customer.fusiondomain.com

#Date Variables
set year_month [clock format [clock seconds] -format {%Y_%m}]
set year [clock format [clock seconds] -format {%Y}]
set month [clock format [clock seconds] -format {%b}]
set day [clock format [clock seconds] -format {%d}]

#touch a blank file so we can create a folder on the Windows server
#(exec waits for each command to finish; spawn would not)
exec mkdir -p /tmp/$year_month/
exec touch /tmp/$year_month/blank.mp3

#create the folder
spawn scp -l 1024 -r /tmp/$year_month/ $user@$ip_address:/C:\\Users\\ricky.domain\\Documents\\Recordings
expect {
        password: {send "$pass\r" ; exp_continue}
}

#copy the call recordings
spawn scp -l 1024 -r /var/s3/recordings/$domain/archive/$year/$month/$day/ $user@$ip_address:/C:\\Users\\ricky.domain\\Documents\\Recordings\\$year_month

expect {
        password: {send "$pass\r" ; exp_continue}
        eof exit
}
 

ad5ou

Active Member
Jun 12, 2018
892
204
43
and to think I was going to drop cygwin on my client's windows server to accomplish a similar end result. :eek:
 