Knowledge Dump
==============

Debugging encoder/class issues tends to be pretty specific to the situation. Here are the core things to keep in mind that help solve most issues:

Database stuff
--------------

- `classes` table
    - `recording_status` generally reflects the `status` field for the encoder that recorded the original stream, but *not* for follow-up edits
    - `vod_status` is modified in two ways: either by the studio, who can set it to `Needs Edits`, `Awaiting Approval`, or `Approved`; or when we start a new edit for a class (which creates a new video file), in which case it's set to `Rendering` when the edit starts and `Awaiting Approval` when it completes.
        - *if the edit fails, the class will be stuck in the `Rendering` state*. this is infrequent and is an obvious sign of a bug (or possibly an AWS issue, but to my knowledge it has always reflected something we can fix/update on our end)
    - the `approved` boolean is deprecated. I *think* it can be removed, but I'd do a quick check throughout the code to make sure there aren't any lingering uses. this field has been replaced by the `vod_status` field, where `vod_status = 'Approved'` is equivalent to `approved = true`.
    - the `cancelled` field is checked at the class's `start_time` to determine whether we should run the class (see studio-manager code for more details)
- `encoders` table
    - the `status` field is very useful: it's what the encoder sends to the studio manager to update its state as it progresses through a stream/edit/transcode
    - the `task` field tells you whether the encoder is for a stream, edit, or transcode
    - the `guid` field can be cross-referenced to the `vmID` tag for the corresponding EC2 instance
    - the `url` field is an internal reference for the studio manager that reflects the private url of the encoder within the AWS network, and is therefore *not publicly accessible*.
      see the 'Debugging' section under 'Encoder' below for more info
- `video_files` table
    - `start_time` and `end_time` are NOT absolute times -- they are empty strings for streams, and strings of the format `HH:MM:SS` for edits + transcodes. they are the relative times to trim the source video at, and are passed directly into ffmpeg. a non-empty value is an easy way to tell that a video file came from an edit rather than a stream.
        - one thing worth considering is that we don't currently store the source GUID for an edit. so, for example, if we edit the original recording of a class (call the result V1), then edit V1 into a new version V2, then the source video file for V2 is V1, not the original. this means the `start_time` and `end_time` values are ambiguous, since they're relative to the source video file for the edit and we don't know whether the source was V1 or the original. this would be an easy thing to add into the database when an edit is started, we just haven't done so.
    - the `name` field is old/unused; the `title` in the `classes` table is actually what users see.
      *can likely be deleted without issue*
    - we build the on-demand URL on the fly now based on the `room_id`, `format`, and `guid` fields (*the `url` column used to be for that, but it's deprecated/probably fine to delete*)
- The relations between `classes` and `video_files`
    - one class -> many video files
    - each row in `video_files` points to a class in `classes` via the `class_id` column
    - `classes` have a `selected_video_file_id` field to show the active video file for that class (the one that we'd show to users if the `vod_status` of that class was `Approved`)
- The relation between `video_files` and `encoders`
    - each row in `video_files` has a corresponding row in `encoders`, linked by the `guid`
    - the `status` column for a video file should reflect the `status` of its encoder (I don't think there's ever a case where they're different, unless we fail to update the db in some way)

Encoder
-------

### Startup

- the launch template `encoder-v2` is used to start the encoder -- to see the commands it runs, navigate to the launch template in EC2, click on 'Advanced details', and scroll down to the 'User data' section. this sets up the go environment, updates ssh, pulls the most recent updates for the encoder repo (which is already in the encoder AMI), builds the encoder, and starts it via systemd.
- an encoder's config lives at `~ec2-user/encoder-v2.conf`. this starts out as a mostly-finished config with placeholder values for the private IP (within the AWS network) and VM ID
    - the script `~ec2-user/query_vm_id.sh` on the encoder uses the AWS CLI to grab the private IP and VM ID values to fill in the config.
      this script is called by the encoder itself, in the `Setup()` function in `util/setup.go`
- if any of the files on the encoder need to be changed, you have to create a new AMI and update the launch template to use it
- when the launch template is updated to a new version in ec2, you also have to update the launch template version number in the vm-manager config

### Debugging

- if a class is having an issue, find it in the DB (the title tends to be a good filter, as well as `start_time < now()` or something like `start_time < now() + interval '1 day'`, followed by `ORDER BY start_time desc`)
- once you have the class ID, get its corresponding `video_files` (`SELECT * FROM video_files WHERE class_id = <id>;`)
- find the video file + `guid` you care about, and check out the encoder state with `SELECT * FROM encoders WHERE guid = '<guid>';`
- the `guid` can then be used to reference the corresponding EC2 instance by its `vmID` tag. I generally just do this from the ec2 dashboard, but the aws CLI could be used too
- from the EC2 dashboard, get the public IP of the encoder. log in with `ssh -i $HOME/.ssh/inspire.pem "ec2-user@<public_ip>"`. I use this bash function:

  ```
  # usage: encoder_ssh <public_ip> [extra ssh args / remote command]
  encoder_ssh() {
      local addr="$1" && shift
      # bail out if no address was given
      [[ -z "$addr" ]] && return 1
      # quote "$@" so extra arguments containing spaces are passed through intact
      ssh -i "$HOME/.ssh/inspire.pem" "ec2-user@$addr" "$@"
  }
  alias essh='encoder_ssh'
  ```

- if you want to quickly check the state of an encoder, its video file, and corresponding class, you can use the `scripts/manage_encoder.sh` script in the studio manager repo. run it as `manage_encoder.sh <guid>`. when you run this script, it'll show you the database state for the encoder + video file + class. I initially used this script to fix various recurring issues that used to be more common, so after it displays the database state it checks some of the column values to see if any weird case has been reached. the checks aren't perfect, but it won't do anything until you confirm the update.
  so if you just want to quickly check on the encoder, you can run the script, then answer N if it asks whether you want to run a particular change. see the script itself for more info on the types of fixes it can run.

### Developing

Developing on the encoder can be a bit cumbersome without some helper functions/scripts. I recently made this much easier on myself with the commands in `encoder-dev.sh` in the `feat/rtmp-failover` branch of the `encoder-v2` repo.

To develop, I spin up a new encoder from the ec2 dashboard using the encoder-v2 launch template, ssh in, then copy/paste the contents of `encoder-dev.sh` into the terminal on the encoder instance (this script is *not* meant to be run as a script, despite the extension, I just copy/paste it all in each time). You can select the branch you want it to load with the `BRANCH` environment variable near the top.

When you copy/paste those commands in, it'll log in as root, check out the branch you want to develop on, and load functions useful for testing. `start <room_id>` will start the encoder using its dummy guid of `abcdefg` and have it pull from the RTSP stream for the provided room number. `stop` will stop the encoder -- before starting again, you probably want to run `rm -f /tmp/abcdefg/* /var/log/encoder/error.log` to clean up from the last run. If you push updates to the branch, run the `update` command before running `start` to pull them down. Running `log` will `tail` the log for you, and `vi $log` will open the log in an editor for easier searching/etc. I'll often open a second ssh session, copy in just the `log` var and `log()` function, and dedicate that shell to watching the log while running start/stop/etc. in the first shell.
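The cleanup between runs is easy to forget, so it could be wrapped in a small helper. This is only a sketch and is not part of `encoder-dev.sh`: the function name `clean_run` and its two optional arguments are hypothetical, while the default paths are the ones mentioned above.

```shell
# Hypothetical helper (not part of encoder-dev.sh): clear leftovers from the
# previous dev run. The defaults are the paths mentioned above; both can be
# overridden, which is handy for trying it out somewhere harmless first.
clean_run() {
    local work_dir="${1:-/tmp/abcdefg}"
    local error_log="${2:-/var/log/encoder/error.log}"

    # Same effect as `rm -f /tmp/abcdefg/* /var/log/encoder/error.log`:
    # removes the files inside the work dir (the directory itself is kept)
    # plus the error log. -f keeps it quiet when there is nothing to remove.
    rm -f -- "$work_dir"/* "$error_log"
}
```

With that loaded, a restart would look like `stop`, then `clean_run`, then `start <room_id>`.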
Once you understand how the encoder works you can streamline the development a bit (you also don't always have to run it on a dedicated box, it'll work locally in many cases if you have a studio manager it can connect to, but it's good to make sure things work on an actual instance before deployment).

For example, if you want to play with the command for uploading to the RTMP servers, you *could* update the code, push to the branch, run `update` on the encoder, and `start <room_id>`. But you can also just run the encoder once, look at the logs, and find the `ffmpeg` command (just search for e.g. `bin/ffmpeg` and you'll see all of the ffmpeg commands in a log block together). Then, as long as there are files in `/tmp/abcdefg` to pull from, you can copy that `ffmpeg` command into the terminal and modify it/test until you figure out what you need.

Once you're comfortable with the changes, you can usually test in production. If you're only updating the RTMP upload ffmpeg commands, you can test by running a class (live = false) and watching the class stream in the broadcast dashboard. Know that the studio is regularly running edits as well, so be careful about when you do updates and make sure they're aware, since seemingly orthogonal changes can still mess with their workflow (something I've made mistakes on, but you guys likely won't haha).
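The log search described above (looking for `bin/ffmpeg`) could be wrapped in a small helper so the commands are easy to copy out. This is a minimal sketch: the function name `ffmpeg_cmds` is hypothetical, and it assumes each ffmpeg command is logged on a single line, which may not hold for every log entry.

```shell
# Hypothetical helper: print every ffmpeg invocation recorded in an encoder log,
# each prefixed with its line number in the log.
# Assumption: each command sits on one line containing "bin/ffmpeg".
ffmpeg_cmds() {
    local log_file="$1"
    if [ -z "$log_file" ] || [ ! -r "$log_file" ]; then
        echo "usage: ffmpeg_cmds <log_file>" >&2
        return 1
    fi
    # -n adds the line number so you can jump to the same spot in an editor.
    # grep exits non-zero when the log contains no ffmpeg commands.
    grep -n 'bin/ffmpeg' "$log_file"
}
```

On a dev encoder with the `encoder-dev.sh` commands loaded, this would be run as `ffmpeg_cmds "$log"`.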