ON SELF HOSTING, PART 1
The cycle is endless. I get email, or I get physical mail, I put it on the living room TV table, and there it accumulates for a few days until we are forced to sort through what’s spam (90% of it is) and what is not. The documents that are not spam are often notices and/or receipts to keep; rarely, something is immediately actionable. The solution so far? Maybe take a photo of it via Google’s Stack app if needed, chuck it into a plastic box under the couch, make sure it’s not in the way of the robovac, and go on about our days.
This…isn’t exactly optimal. I’ve been thinking about a proper document management system for a while (and considering buying a scanner as well). My current setup is a Samba share on my home server that automagically uploads to a Nextcloud instance, with a cron job that runs every now and then to back up the changes. It works, but it’s not exactly searchable, and the categories overlap: an “Invoice” can be “Healthcare” or “University” as well.
Paperless-ngx looks like the obvious candidate, but it provides a challenge of its own. It has a “Consume” folder: if I copy my documents into it, Paperless deletes them from the consume folder and stores the PDFs somewhere else, using its own logic to name and move the files, which I can then tag in the web interface. There is, however, a problem. I don’t want to have to copy the same PDF twice: once into my Samba //DOCUMENTS share, and another time into the ../
I can’t just use rsync for a one-way sync directly: once Paperless has consumed and deleted a file, rsync sees it as missing from the destination and copies it again, and Paperless tries to consume it again, over and over. There has to be a solution.
The solution is…to keep a filter. After the one-way rsync runs, the cron job writes a filtered_files.txt (passed as exlist in the command below) that lists all the files currently in the //DOCUMENTS share, and the next rsync run takes this file as its exclude list.
rsync -raz --progress --exclude-from=exlist <Documents> <Paperless Consoom Folder>
should do the job.
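As a minimal sketch, the cron job could look something like the function below. The function name sync_to_consume and its three arguments are my own placeholders, and I'm assuming the exclude list lives outside both folders. Each path is written with a leading "/" so rsync anchors it to the top of the share. One caveat: rsync reads every line as a pattern, so filenames containing *, ? or [ would need escaping first.

```shell
#!/bin/sh
# Hypothetical cron helper: one-way sync from the documents share into the
# Paperless consume folder, skipping files handed over on earlier runs.
# Names and argument order are illustrative assumptions, not a fixed layout.

sync_to_consume() {
    src=$1      # documents share (source)
    dest=$2     # Paperless consume folder (destination)
    exlist=$3   # exclude list written by the previous run

    # The very first run has no list yet, so start with an empty one.
    [ -f "$exlist" ] || : > "$exlist"

    # Copy only the files that are not on the exclude list.
    rsync -a --exclude-from="$exlist" "$src"/ "$dest"/ || return 1

    # List every file now in the share as "/relative/path" so the next
    # run skips it. Write to a temp file first, then swap it into place.
    (cd "$src" && find . -type f | sed 's|^\.||') > "$exlist.tmp" &&
        mv "$exlist.tmp" "$exlist"
}

# Example call (paths are placeholders):
# sync_to_consume /path/to/documents /path/to/consume /path/to/exlist
```

A side effect of this design: a file that is edited in the share after its first sync stays on the list and will not be consumed again. That is probably fine for scanned PDFs, but worth knowing.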
…I’ll update on how it goes.