March 18, 2024 • 2915 words

Building a Full Stack Video Platform

Recently, the idea of self-hosting was so intriguing that I decided to code a full-stack video platform.

Cloud Object Storage Selection

There are many object storage providers, including Amazon AWS, Akamai, DigitalOcean, Cloudflare, and Alibaba Cloud. Nearly all of them are compatible with the AWS S3 API. I previously tried self-hosting Nextcloud and file storage with s3fs, but both were slow on the frontend, capping at less than 1 MB/s, which is too slow for the storage to be usable.

Any of them would be fine, but I chose Cloudflare because Cloudflare R2 is very fast and efficient, and I am already using Cloudflare services. It is slightly more expensive than Dropbox, but self-hosting is much more fun and controllable than using a proprietary service. Dropbox is generally good enough to use, except for the few occasions when the Dropbox frontend UI dies.

Cloudflare is good because it is really cheap and offers many other services, like a proxy that protects servers against DDoS attacks, and it is also a domain registrar. Cloudflare has everything except VPS hosting.

I tried contacting Alibaba and got a response; unfortunately, I use neither of those apps (I had just deleted Telegram). Alicloud is probably the cheapest for 1 PB of storage, at 132120 dollars a year (while Cloudflare is 15 dollars/month for 1 TB and Amazon is around 22 dollars/month for 1 TB, except for its Glacier archives). Dropbox charges about 10 dollars/month for 2 TB, but I reckon that if people actually used all of their Dropbox storage, the company would go bankrupt. I figure that if I wanted to start a business (in China), Alicloud could be a viable choice.
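
As a rough sanity check, the prices quoted above can be normalized to dollars per TB per month. This is a minimal sketch that only uses the figures from this post and assumes 1 PB = 1000 TB; the names are illustrative, and real bills also include request and egress fees that are ignored here.

```python
# Normalize the storage prices quoted in this post to USD per TB per month.
# Assumption: 1 PB = 1000 TB; request and egress fees are ignored.

TB_PER_PB = 1000


def per_tb_month(total_usd: float, terabytes: float, months: float) -> float:
    """Return the price in USD for storing one TB for one month."""
    return total_usd / terabytes / months


prices = {
    "Alibaba Cloud (1 PB/year)": per_tb_month(132120, TB_PER_PB, 12),
    "Cloudflare R2": per_tb_month(15, 1, 1),
    "Amazon S3 (standard)": per_tb_month(22, 1, 1),
    "Dropbox (2 TB plan)": per_tb_month(10, 2, 1),
}

# Print the cheapest option first.
for name, usd in sorted(prices.items(), key=lambda item: item[1]):
    print(f"{name:28s} ${usd:6.2f} per TB-month")
```

By these numbers the Alibaba quote works out to about 11 dollars per TB-month, which is cheaper than R2 but not dramatically so.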

Designing the Backend

After that, I initialized PostgreSQL and added user and video schemas.

I started coding from scratch in Go, then got busy for about a week and accidentally threw away the draft, so I started over. It's not like I had written much anyway.

The second time I switched to Crystal, mainly because Invidious uses it. Using Node.js would have been totally feasible; I just wanted to try something new for fun.

I didn't know Ruby beforehand, but Crystal was still very easy to pick up, and it claims to be lightning fast (as fast as C).

Designing the backend is pretty easy, as I am aiming for a small-scale project. Before a file is uploaded to object storage, its details are added to the local database; after that, everything is retrieved from the local database.

The current backend looks like this:

├── config.yml
├── LICENSE
├── shard.lock
├── shard.yml
└── src
    ├── backend.cr
    ├── config.cr
    ├── controllers
    │   ├── auth_controller.cr
    │   ├── storage_controller.cr
    │   ├── user_controller.cr
    │   └── video_controller.cr
    ├── extensions
    │   └── context_extension.cr
    ├── middlewares
    │   └── auth_middleware.cr
    ├── routes.cr
    └── storage
        ├── check_existence.cr
        ├── delete_file.cr
        ├── generate_upload_url.cr
        ├── list_files.cr
        └── upload_url.cr

Shards is Crystal's dependency manager. I added a middleware that handles authorization by signing tokens with a private key stored in config.yml. The routes look like this (much like in Node.js):

get "/api/user/:id", &->UserController.get_user(HTTP::Server::Context)
post "/api/auth/signup", &->AuthController.sign_up_user(HTTP::Server::Context)

The syntax isn't hard to learn coming from a Node.js background, and ChatGPT knows most of it. The backend mainly exists to interact with PostgreSQL.

Trying to Generate the Signature Myself

There is an S3 library for Crystal, https://github.com/taylorfinnell/awscr-s3, which I tried to use to generate a presigned URL for users to upload videos. I don't want users to upload to my server; I want them to upload directly to the cloud. But the built-in method always generated an incorrect URL, and there wasn't anything I could do about it. I tried coding the generation myself with ChatGPT, but that didn't work either, and the server kept answering that the signature didn't match.

Basically, the signature is an HMAC over the date, time, and request details, keyed with the S3 secret access key, and the exact format they require is very nitty-gritty. The AWS SDKs for Python (boto) and Node.js have built-in methods for generating it.
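
The scheme described above appears to be AWS Signature Version 4, where the secret key is never used directly: it is chained through several HMAC-SHA256 steps over the date, region, and service name. Here is a minimal sketch using only the Python standard library. The credential and string-to-sign values are placeholders, and the region "auto" is what Cloudflare documents for R2 as far as I know, so treat it as an assumption. Building the canonical request that feeds the string to sign is not shown.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 step, returning raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; `date` is YYYYMMDD in UTC."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that goes into X-Amz-Signature."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values for illustration only (not real credentials).
key = derive_signing_key("EXAMPLE_SECRET", "20240318", "auto", "s3")
signature = sign(
    key,
    "AWS4-HMAC-SHA256\n20240318T000000Z\n20240318/auto/s3/aws4_request\n"
    "<sha256 hex of the canonical request>",
)
print(signature)
```

Because the date, region, and service are baked into the key, and the host header is part of the signed request, a signer that hardcodes an AWS region or an s3.amazonaws.com host will produce a signature that an R2 endpoint rejects. That is one plausible explanation for the mismatch errors, not a confirmed diagnosis.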

I found another repo, https://github.com/iksteen/aws-request-signer, and it did generate a signature, but that didn't work either, probably because the library is fairly old and Amazon has since moved to another signing method.

Part of the problem may be that my endpoint is Cloudflare, while most signing code assumes an s3.amazonaws.com endpoint, though I thought both should work.

I spent a whole dawn and morning trying to do this and failed, so I gave up and called boto from Crystal, since Crystal can execute shell commands.
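
For reference, the boto route can be sketched as below. This is not the script used in the project: the function names, bucket, and key are hypothetical, boto3 has to be installed separately, and the endpoint format and the region "auto" follow Cloudflare's R2 documentation as I understand it.

```python
def r2_endpoint(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"


def presign_upload(account_id, access_key, secret_key, bucket, key, expires=3600):
    """Return a presigned PUT URL that lets a browser upload straight to R2."""
    import boto3  # third-party: pip install boto3

    client = boto3.client(
        "s3",
        endpoint_url=r2_endpoint(account_id),
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="auto",  # assumption: R2 expects the region "auto"
    )
    return client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,  # lifetime of the URL in seconds
    )


# Hypothetical usage (requires real credentials):
# url = presign_upload("ACCOUNT_ID", "ACCESS_KEY", "SECRET_KEY", "videos", "clip.mp4")
```

A Crystal backend could run a script like this as a subprocess and read the URL from its standard output.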

Still, an upload through a presigned URL is limited to about 5 GB per file.

Designing the React Frontend

Since this project is still in alpha, I just fetch everything from the backend. There isn't much data in the backend and there isn't any traffic, so processing everything in frontend code is straightforward.

I jumped from Claude Opus to ChatGPT, and the frontend got good enough very quickly. The frontend just includes signup, login, upload, and viewing videos right now, although I plan to add more features. The backend isn't mature enough right now.

Overall I spent a little over 2 full days of the weekend at home building the crap. It seems good enough for a raw implementation of a frontend, though, and I deployed it to anony.tube and anonytube.jimchen.me.

Cloudflare generally offers good speed, and there weren't many problems with uploading and downloading videos, since I am communicating directly with Cloudflare.

Then came managing CORS and HTTPS, which is really tiresome: some requests just won't work from the deployed frontend. After some trial and error I pretty much enabled CORS for everything, which isn't good for security but is good enough for normal usage.

I store the token in a cookie that expires after 7 days. On entering the webpage, I check whether the user has the correct token in the cookie; if they do, they are automatically logged in. They are also redirected from the sign-in page to the main page if they already have credentials.

More Features to Add

First, I am adding an NSFW detector, since I am aiming for legal hosting. I am also aiming for no copyrighted content and no DMCA issues. I also hope to add different resolutions for videos. Anyway, I am pretty tired for now.

Cost

It's actually surprisingly cheap for a small video platform: the server costs 5 dollars a month on a Linode Nanode, and storage costs 15 dollars a month for 1 TB, which I probably won't need for now. Cloudflare does most of the job for me.

Lack of Utilization

I put the background worker on a separate Linode with 8 GB of RAM, but it sits idle 90% of the time (since my website doesn't have many videos, and it's only me right now), so the machine is underutilized. The worker can't run smoothly on lower-end Linodes, though, so I guess I just have to leave it like this. Speaking of costs, storing 5 TB in object storage would cost about 100 dollars a month on popular platforms like AWS or Google Cloud, so the worker's cost isn't very significant by comparison.

Cloudflare

When a video is uploaded to Cloudflare R2, it takes a while before it plays smoothly. Right after my first upload, I couldn't fully utilize my Internet connection speed in Hong Kong, though the speed was good enough in the UK. It took about half a day before I could watch the video easily from anywhere in the world.

About Public Video Platforms

Many video platforms have awfully problematic features that I don't really like. They try every method possible to stop downloading, which only results in slower speeds. I also hate the login requirement (and especially the deceptive pop-ups forcing me to log in). Granted, these platforms have to comply with local laws, so some control is understandable. But I am aiming for liberty (within the law) on my platform, mainly hoping to improve the experience of a guest user. I think another reason might be that they wrote a large pile of junky code 10 years ago that is now too messy to debug or fix.

There are lots of existing platforms like PeerTube and YouPHPTube, but I just want to design one myself for the fun of it.

But the problem is that probably no one will visit this website, so in the end it will probably end up as a Dropbox for my personal use. Many people probably like those video platforms anyway, since a determined person who hates them would probably have sought out self-hosted options like I did, and I do not consider myself technically advanced. So don't believe it when people complain, because chances are they don't actually mean it.

20240328 Misconceptions and Observations

Development vs Production

For example, in React, sudo npm start starts a development web server, while a production deployment compiles the code into static HTML, JavaScript, and CSS files in a build directory.

Use of Nginx

Nginx can act as a reverse proxy, so there is no need for separate machines for different addresses. There is no need for separate frontend and backend Linodes; just set up a reverse proxy that forwards a specific route to the backend.

Systemctl Jobs

Actually, pointing nginx.conf at the frontend build directory is enough to serve the website, so there is no need for a systemctl job to keep npm start or anything similar running.

Object Storage (S3) vs Volumes

S3-style object storage is popular for storing large files (e.g., video hosting), while volumes are SSD storage attached to the machine alongside the operating system. Volume space is expensive, costing about 0.1 dollars a month per GB, while object storage like Cloudflare R2 is $0.015 / GB-month. So it's more scalable to host media in object storage than locally. Object storage might be slower to access, but a built-in CDN solves that problem.

SaaS

Some popular SaaS services include hosted MongoDB.

Anyway, MongoDB is open source and can be run locally. The hosted service on the official website only provides a more stable and scalable option; installing MongoDB with yay and connecting to it from Node.js is straightforward.

MongoDB Security

Never leave MongoDB exposed! It will get hacked. Never change the bind IP to 0.0.0.0 without configuring a password. The data will be deleted, with only a Bitcoin address left for recovery (which itself disappeared within 12 hours). Beyond that, don't store anything valuable in the MongoDB database, and hash (or encrypt) everything sensitive.

SSH Security

Similarly, use strong passwords for SSH.

DbaaS Latency

DBaaS (Database as a Service) tends to have higher latency, but MongoDB Atlas generally works fine. It reduces the time spent maintaining the database and offers backups (though I can also back up the database manually).

Cloudflare AI Worker

On Cloudflare's AI worker, the Llama model seems really good, quick, and cheap, but OpenAI Whisper isn't working; it keeps returning an upstream service error.

Increasing Temporary Space for PyTorch

To install the CPU build of PyTorch, I needed to increase the tmp space by creating a mount unit with sudo nano /etc/systemd/system/tmp.mount:

[Unit]
Description=Temporary Directory
After=local-fs.target
[Mount]
What=tmpfs
Where=/tmp
Type=tmpfs
Options=mode=1777,size=8G
[Install]
WantedBy=multi-user.target

Then reload systemd, enable the mount, and reboot:

sudo systemctl daemon-reload
sudo systemctl enable tmp.mount
sudo reboot

After the reboot, /tmp is mounted as a tmpfs with an 8 GB limit.

Python Script Killed

Use dmesg to find the problem.

If it's running out of memory, add swap space on the Linode. Not a good solution, but enough for scripts running 24/7.

Use separate Linodes for the worker and the server, because merging the two might make the server unresponsive (this could be managed with cpulimit, but that takes a lot of time).

Temp Files in Python

temp_dir = tempfile.mkdtemp(dir=os.getcwd())

Then use shutil.rmtree(temp_dir) to remove it.
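
A slightly fuller sketch of the same pattern, with cleanup guaranteed even when processing fails. The function name and the scratch file are illustrative; tempfile.TemporaryDirectory is a standard-library alternative that removes the directory automatically.

```python
import os
import shutil
import tempfile


def process_in_temp_dir() -> str:
    """Create a scratch directory in the current directory, use it, then delete it."""
    temp_dir = tempfile.mkdtemp(dir=os.getcwd())
    try:
        # Stand-in for real work such as extracting audio or writing subtitles.
        scratch = os.path.join(temp_dir, "scratch.txt")
        with open(scratch, "w") as handle:
            handle.write("intermediate data")
    finally:
        # Runs even if the work above raises, so no directories are left behind.
        shutil.rmtree(temp_dir, ignore_errors=True)
    return temp_dir


removed = process_in_temp_dir()
print(os.path.exists(removed))

# Equivalent with automatic cleanup when the with-block exits:
with tempfile.TemporaryDirectory(dir=os.getcwd()) as auto_dir:
    print(os.path.isdir(auto_dir))
```

Creating the directory under the current working directory rather than /tmp keeps large intermediate files off a size-limited tmpfs.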

Video Upscaling

Too expensive to run; outdated projects throw an error: No module named 'torchvision.transforms.functional_tensor', and waifu2x upscaling is not that good. It's slightly better than the original, but far from native 4K video (blurry in previously clear parts like black and white stripes). Lanczos is far worse.

Face upscaling seems OK, but it only offers face enhancement. More akin to Web Glow.

Environment Problems

Use a builduser with yay; you can remove that user's password (with PermitEmptyPasswords set to no for SSH). Then use that user to install things with yay. For Python libraries, either install them with yay (which sometimes takes a long time to compile) or simply use venv.

I like Arch Linux because it's DIY, and any problem comes from the user. On Ubuntu, packages often need to be compiled or downloaded manually to get an up-to-date version that works.

Translation Tools

There are many translation tools available, including Google's T5, Facebook's NLLB Distilled, Helsinki-NLP, etc. Just download the model from Hugging Face and use Transformers, and it runs pretty smoothly when translating VTT files, although it still takes a really long time. Then there's Argos Translate, built on OpenNMT models and CTranslate2.
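
Translating a VTT file mostly means translating the cue text while leaving the header, cue numbers, and timestamps untouched. A minimal sketch follows, assuming a simple VTT file without styling blocks; translate_vtt and its translate callback are hypothetical names, and the callback is where a Transformers model would be plugged in.

```python
from typing import Callable


def translate_vtt(vtt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the cue text lines of a WebVTT document."""
    output = []
    in_cue = False  # True for the text lines that follow a timestamp line
    for line in vtt_text.splitlines():
        if "-->" in line:
            in_cue = True           # timestamp line: keep as-is
            output.append(line)
        elif not line.strip():
            in_cue = False          # a blank line ends the cue
            output.append(line)
        elif in_cue:
            output.append(translate(line))
        else:
            output.append(line)     # header, cue numbers, notes
    return "\n".join(output)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:00.000 --> 00:00:02.000\nhello world\n\n"
    "2\n00:00:02.000 --> 00:00:04.000\nsecond line\n"
)

# str.upper stands in for a real model. With Transformers installed, the
# callback could wrap something like
#   pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
# (the model choice is an assumption, not what this project uses).
translated = translate_vtt(sample, str.upper)
print(translated)
```

Translating line by line is simple but loses context across cues; batching several cues per model call is the usual way to cut the running time.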

Speed Comparison

Laptops are far worse than cloud servers in terms of CPU performance. My laptop, though high-end, takes about the same time to process a video as an 8 GB Linode with 4 CPUs (it's not utilizing all 16 CPUs on my laptop, but you get the idea). Cloud servers also offer much faster internet speed.

Running Costs

Running a GPU constantly is way too expensive for me; prices range from 0.2 to 5 dollars an hour. Even a single-core, very low-end 0.2 dollar/hour GPU, typically an RTX 3000 or 4000 series, costs about 150 dollars a month. I kind of understand why these websites want to charge people money: they have all those GPUs running in the background. The raw costs and the maintenance are really expensive, and you have to keep much more capacity available than the normal workload needs. Those companies don't turn their GPUs on and off like I did when I was doing homework for customers, though, because the service has to be readily available.

Then there are third-party API services, like Whisper APIs, but those are still very expensive: transcribing an hour of audio costs around 0.2 dollars. Still, that is much cheaper than freaking Veed Premium, which is about 20 dollars a month for around 100 minutes of transcription! Those commercial companies wrap everything up in fancy frontend frameworks, and you never know how much they are marking things up!

Transcribing services

Running OpenAI Whisper on CPU is so slow! It goes on forever for transcribing 30 seconds. It's just very, very slow and my computer's fan was complaining all the time.

Distilled Whisper doesn't support other languages, only English. I was like: I don't need to transcribe English, I can hear the words clearly anyway.

So I chose Faster-Whisper, which is much faster, and generally accurate enough for me.
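
As a rough illustration of how Faster-Whisper output can become subtitles, here is a minimal sketch that converts transcription segments into a VTT file. The helper names are hypothetical, the model size and int8 compute type are assumptions rather than the settings used on my worker, and the faster-whisper package has to be installed separately.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)


def transcribe_to_vtt(audio_path: str, model_size: str = "small") -> str:
    """Transcribe an audio file on CPU and return the result as WebVTT."""
    from faster_whisper import WhisperModel  # third-party: pip install faster-whisper

    # int8 quantization keeps memory and CPU usage low (an assumed setting).
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return segments_to_vtt((s.start, s.end, s.text) for s in segments)


# The formatting helpers work without the model installed:
print(segments_to_vtt([(0.0, 2.5, " Hello there."), (2.5, 5.0, "Second cue.")]))
```

Keeping the formatting separate from the model call makes it easy to test the subtitle output without downloading any weights.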

There's also "Insanely Fast Whisper," except it's not available on CPU. I have no idea why, since the point of quantization is for models to run on fewer resources.

To be honest, Whisper strangely sometimes doesn't catch a single word in a song, but Faster-Whisper does well with songs.

A Linode 8 GB costs $0.072 an hour, and running Faster-Whisper on it for an hour can transcribe about a 2-hour video, which is much cheaper than the API services, although one might argue it's not that scalable. Given 100 hours of audio, it takes about 50 hours to transcribe on a single Linode, whereas an API (since it's designed to scale) produces the results with one call. It's probably 5 to 10 times cheaper than the third-party stuff, though I don't think the third-party companies use quantization.
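
The comparison can be worked through with the numbers from this post. This is only a back-of-the-envelope sketch: the 0.2 dollars per audio hour for APIs is the rough figure quoted earlier, the throughput is what I observed on one Linode, and the function names are illustrative.

```python
LINODE_USD_PER_HOUR = 0.072        # Linode 8 GB hourly price
AUDIO_HOURS_PER_COMPUTE_HOUR = 2   # observed Faster-Whisper throughput on CPU
API_USD_PER_AUDIO_HOUR = 0.2       # rough third-party API price


def self_hosted_hours(audio_hours: float) -> float:
    """Wall-clock hours needed on a single Linode."""
    return audio_hours / AUDIO_HOURS_PER_COMPUTE_HOUR


def self_hosted_cost(audio_hours: float) -> float:
    """Compute cost in USD for transcribing on the Linode."""
    return self_hosted_hours(audio_hours) * LINODE_USD_PER_HOUR


def api_cost(audio_hours: float) -> float:
    """Cost in USD for the same audio through a third-party API."""
    return audio_hours * API_USD_PER_AUDIO_HOUR


audio = 100  # hours of audio
print(f"Linode: {self_hosted_hours(audio):.0f} h of compute, ${self_hosted_cost(audio):.2f}")
print(f"API:    ${api_cost(audio):.2f}")
print(f"Ratio:  {api_cost(audio) / self_hosted_cost(audio):.1f}x")
```

The ratio depends heavily on the API price assumed, and the self-hosted figure ignores the hours the worker sits idle.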

There's also Dropbox, which offers free transcription for uploaded videos, not that I like Dropbox or anything. In fact, many companies offer this nowadays, like YouTube's default transcription (which is weird and I still haven't figured it out: sometimes it is good, and other times it is disabled or very bad).

React Theme

There is a Dark Reader npm library for implementing different themes, and it's extremely easy to use and popular! I wish I had used that instead of manually implementing a dark theme with React's Context API for my personal website.

But the iframe was big trouble. It was my first time implementing iframes, so they were a big headache. Passing props doesn't work for a page loaded inside an iframe, and I spent about 2 hours trying to manipulate the iframe with my previous methods and failed. Then I used the postMessage API and it worked well: basically, send a message to the iframe and let it handle the message and toggle the theme.

Conclusion

Self-hosting is awfully addictive and fun, and it aims to get rid of "necessary" proprietary, commercial, bloated platforms. But it's kind of limited as well, and maintenance takes a lot of effort. For example, the bugs in the React frontend are a constant headache (I actually hate frontend work now). Still, I can enjoy videos with subtitles and watch them anywhere.

I can still remember June 2023, when I was 2 weeks into learning full stack and first tried to deploy a web server on Linode and Cloudflare. I was sitting on the floor, listening to indie music well past midnight. I tried Heroku by putting the full frontend and backend into it, and it didn't work, so I thought PaaS wasn't for me. I connected to the web server with TigerVNC and changed every env var from localhost:3000 to the domain name, as well as the MongoDB URI in the db file. Then I finally started the server with npm run dev, and I remember seeing it work on port 3000. Then I tried pointing my domain to port 3000 and wondered why it didn't work. Finally, since I didn't know how to set up HTTPS, I used the Cloudflare proxy. Anyway, things finally worked, but it was so stupid.