Going Paperless
TL;DR
Paperless-ng
Here is how you can set up a Paperless-ng Server with Docker + Docker Compose.
- Install Docker + Docker Compose
- Create a folder where all the Paperless-ng files and folders can live:
```shell
mkdir -p ~/Docker/paperless-ng
```
- Now create the `docker-compose.env` and `docker-compose.yaml` files in that directory:
```shell
touch ~/Docker/paperless-ng/docker-compose.env
touch ~/Docker/paperless-ng/docker-compose.yaml
```
- Copy the contents below into the `docker-compose.env` file (source on GitLab: `docker-compose.example.env`):
```
# Environment variables to set for Paperless
# Commented out variables will be replaced with a default within Paperless.
#
# In addition to what you see here, you can also define any values you find in
# paperless.conf.example here. Values like:
#
# * PAPERLESS_PASSPHRASE
# * PAPERLESS_CONSUMPTION_DIR
# * PAPERLESS_CONSUME_MAIL_HOST
#
# ...are all explained in that file but can be defined here, since the Docker
# installation doesn't make use of paperless.conf.

# New to paperless-ng
COMPOSE_PROJECT_NAME=paperless

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
TZ=America/New_York
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_ALL=en_US.UTF-8

# This setting was found by searching the GitHub issues
# Set the date format for dates within scanned documents
PAPERLESS_DATE_ORDER=MDY

# You can change the default user and group id to a custom one
USERMAP_UID=1001
USERMAP_GID=1001

# Additional languages to install for text recognition. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# default language used when guessing the language from the OCR output.
PAPERLESS_OCR_LANGUAGES=deu heb por spa

# Set Paperless to use SSL for the web interface.
# Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
#PAPERLESS_USE_SSL=false

# Sample paperless.conf
# As this file contains passwords it should only be readable by the user
# running paperless.

###############################################################################
####                          Paths & Folders                              ####
###############################################################################

# This is where your documents should go to be consumed. Make sure that it exists
# and that the user running the paperless service can read/write its contents
# before you start Paperless.
#PAPERLESS_CONSUMPTION_DIR="/consume"

# You can specify where you want the SQLite database to be stored instead of
# the default location of /data/ within the install directory.
#PAPERLESS_DBDIR=/path/to/database/file

# Override the default MEDIA_ROOT here. This is where all files are stored.
# The default location is /media/documents/ within the install folder.
#PAPERLESS_MEDIADIR=/path/to/media

# Override the default STATIC_ROOT here. This is where all static files
# created using "collectstatic" manager command are stored.
#PAPERLESS_STATICDIR=""

# Override the MEDIA_URL here. Unless you're hosting Paperless off a subdomain
# like /paperless/, you probably don't need to change this.
#PAPERLESS_MEDIA_URL="/media/"

# Override the STATIC_URL here. Unless you're hosting Paperless off a
# subdomain like /paperless/, you probably don't need to change this.
#PAPERLESS_STATIC_URL="/static/"

# These values are required if you want paperless to check a particular email
# box every 10 minutes and attempt to consume documents from there. If you
# don't define a HOST, mail checking will just be disabled.
#PAPERLESS_CONSUME_MAIL_HOST=""
#PAPERLESS_CONSUME_MAIL_PORT=""
#PAPERLESS_CONSUME_MAIL_USER=""
#PAPERLESS_CONSUME_MAIL_PASS=""

# Override the default IMAP inbox here. If not set Paperless defaults to
# "INBOX".
#PAPERLESS_CONSUME_MAIL_INBOX="INBOX"

# Any email sent to the target account that does not contain this text will be
# ignored.
#PAPERLESS_EMAIL_SECRET=""

# Specify a filename format for the document (directories are supported)
# Use the following placeholders:
# * {correspondent}
# * {title}
# * {created}
# * {added}
# * {tags[KEY]} If your tags conform to key_value or key-value
# * {tags[INDEX]} If your tags are strings, select the tag by index
# Uniqueness of filenames is ensured, as an incrementing counter is attached
# to each filename.
#PAPERLESS_FILENAME_FORMAT=""

###############################################################################
####                              Security                                 ####
###############################################################################

# Controls whether django's debug mode is enabled. Disable this on production
# systems. Debug mode is enabled by default.
PAPERLESS_DEBUG="false"

# Paperless can be instructed to attempt to encrypt your PDF files with GPG
# using the PAPERLESS_PASSPHRASE specified below. If however you're not
# concerned about encrypting these files (for example if you have disk
# encryption locally) then you don't need this and can safely leave this value
# un-set.
#
# One final note about the passphrase. Once you've consumed a document with
# one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and
# can't properly export documents that were encrypted with an old passphrase if
# you've since changed it to a new one.
#
# The default is to not use encryption at all.
#PAPERLESS_PASSPHRASE="secret"

# The secret key has a default that should be fine so long as you're hosting
# Paperless on a closed network. However, if you're putting this anywhere
# public, you should change the key to something unique and verbose.
#PAPERLESS_SECRET_KEY="change-me"

# If you're planning on putting Paperless on the open internet, then you
# really should set this value to the domain name you're using. Failing to do
# so leaves you open to HTTP host header attacks:
# https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
#
# Just remember that this is a comma-separated list, so "example.com" is fine,
# as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
#PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"

# If you decide to use the Paperless API in an ajax call, you need to add your
# servers to the list of allowed hosts that can do CORS calls. By default
# Paperless allows calls from localhost:8080, but you'd like to change that,
# you can set this value to a comma-separated list.
#PAPERLESS_CORS_ALLOWED_HOSTS="localhost:8080,example.com,localhost:8000"

# To host paperless under a subpath url like example.com/paperless you set
# this value to /paperless. No trailing slash!
#
# https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
#PAPERLESS_FORCE_SCRIPT_NAME=""

# If you are using alternative authentication means or are just using paperless
# as a single user on a small private network, this option allows you to disable
# user authentication if you set it to "true"
#PAPERLESS_DISABLE_LOGIN="true"

###############################################################################
####                           Software Tweaks                             ####
###############################################################################

# After a document is consumed, Paperless can trigger an arbitrary script if
# you like. This script will be passed a number of arguments for you to work
# with. The default is blank, which means nothing will be executed. For more
# information, take a look at the docs:
# http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
#PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"

# By default, when clicking on a document within the web interface, the
# browser will prompt the user to save the document to disk. By setting this to
# "true", the document will instead be opened in the browser, if possible.
#PAPERLESS_INLINE_DOC="false"

# By default, paperless will check the document text for document date information.
# Uncomment the line below to enable checking the document filename for date
# information. The date order can be set to any option as specified in
# https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
# checked first, and if nothing is found, the document text will be checked
# as normal.
#PAPERLESS_FILENAME_DATE_ORDER="YMD"

# Sometimes devices won't create filenames which can be parsed properly
# by the filename parser (see
# https://paperless.readthedocs.io/en/latest/guesswork.html).
#
# This setting allows to specify a list of transformations
# in regular expression syntax, which are passed in order to re.sub.
# Transformation stops after the first match, so at most one transformation
# is applied.
#
# Syntax is a JSON array of dictionaries containing "pattern" and "repl"
# as keys.
#
# The example below transforms filenames created by a Brother ADS-2400N
# document scanner in its standard configuration `Name_Date_Count', so that
# count is used as title, name as tag and date can be parsed by paperless.
#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]

#
# The following values use sensible defaults for modern systems, but if you're
# running Paperless on a low-resource device (like a Raspberry Pi), modifying
# some of these values may be necessary.
#
# By default, Paperless will attempt to use all available CPU cores to process
# a document, but if you would like to limit that, you can set this value to
# an integer:
PAPERLESS_OCR_THREADS=2

# Customize the default language that tesseract will attempt to use when
# parsing documents. It should be a 3-letter language code consistent with ISO
# 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
PAPERLESS_OCR_LANGUAGE=eng

# On smaller systems, or even in the case of Very Large Documents, the consumer
# may explode, complaining about how it's "unable to extend pixel cache". In
# such cases, try setting this to a reasonably low value, like 32000000. The
# default is to use whatever is necessary to do everything without writing to
# disk, and units are in megabytes.
#
# For more information on how to use this value, you should probably search
# the web for "MAGICK_MEMORY_LIMIT".
#PAPERLESS_CONVERT_MEMORY_LIMIT=0

# Similar to the memory limit, if you've got a small system and your OS mounts
# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
# /home/your_user/tmp or something. ImageMagick will use this as scratch space
# when crunching through very large documents.
#
# For more information on how to use this value, you should probably search
# the web for "MAGICK_TMPDIR".
#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

# By default the conversion density setting for documents is 300DPI, in some
# cases it has proven useful to configure a lesser value.
# This setting has a high impact on the physical size of tmp page files,
# the speed of document conversion, and can affect the accuracy of OCR
# results. Individual results can vary and this setting should be tested
# thoroughly against the documents you are importing to see if it has any
# impacts either negative or positive.
# Testing on limited document sets has shown a setting of 200 can cut the
# size of tmp files by 1/3, and speed up conversion by up to 4x
# with little impact to OCR accuracy.
PAPERLESS_CONVERT_DENSITY=300

# (This setting is ignored on Linux where inotify is used instead of a
# polling loop.)
# The number of seconds that Paperless will wait between checking
# PAPERLESS_CONSUMPTION_DIR. If you tend to write documents to this directory
# rarely, you may want to use a higher value than the default (10).
#PAPERLESS_CONSUMER_LOOP_TIME=10

# By default Paperless stops consuming a document if no language can be
# detected. Set to true to consume documents even if the language detection
# fails.
PAPERLESS_FORGIVING_OCR="true"

# By default Paperless does not OCR a document if the text can be retrieved from
# the document directly. Set to true to always OCR documents.
PAPERLESS_OCR_ALWAYS="true"

###############################################################################
####                              Interface                                ####
###############################################################################

# Override the default UTC time zone here.
# See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
# for details on how to set it.
PAPERLESS_TIME_ZONE=America/New_York

# If set, Paperless will show document filters per financial year.
# The dates must be in the format "mm-dd", for example "07-15" for July 15.
#PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
#PAPERLESS_FINANCIAL_YEAR_END="mm-dd"

# The number of items on each page in the web UI. This value must be a
# positive integer, but if you don't define one in paperless.conf, a default of
# 100 will be used.
#PAPERLESS_LIST_PER_PAGE=100

# The number of years for which a correspondent will be included in the recent
# correspondents filter.
#PAPERLESS_RECENT_CORRESPONDENT_YEARS=1

###############################################################################
####                        Third-Party Binaries                           ####
###############################################################################

# There are a few external software packages that Paperless expects to find on
# your system when it starts up. Unless you've done something creative with
# their installation, you probably won't need to edit any of these. However,
# if you've installed these programs somewhere where simply typing the name of
# the program doesn't automatically execute it (ie. the program isn't in your
# $PATH), then you'll need to specify the literal path for that program here.

# Convert (part of the ImageMagick suite)
#PAPERLESS_CONVERT_BINARY=/usr/bin/convert

# Ghostscript
#PAPERLESS_GS_BINARY = /usr/bin/gs

# Unpaper
#PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper

# Optipng (for optimising thumbnail sizes)
#PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
```
- Copy the contents below into the `docker-compose.yaml` file (source on GitLab: `docker-compose.example.yaml`):
```yaml
---
# ##############################################################
# #                                                            #
# #                         Paperless                          #
# #                                                            #
# ##############################################################
# https://hub.docker.com/r/jonaswinkler/paperless-ng
# https://hub.docker.com/r/linuxserver/paperless-ng
# NOTE:
# https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
# - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
# - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents
version: "2.1"
networks:
  paperless:
    external: false
services:
  paperlessweb:
    image: "linuxserver/paperless-ng:1.5.0"
    env_file: docker-compose.env
    environment:
      PUID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      PGID: 1001 # Open a terminal, and type "id", and set this number to the gid value
    volumes:
      - ./paperless-config/:/config/
      - ./paperless-data/:/data/
    # Default Paperless port: 8000
    ports:
      - "8000:8000"
    container_name: paperless-vonawesomeweb
    restart: unless-stopped
    networks:
      - paperless
```
- Now open up your terminal again:
```shell
cd ~/Docker/paperless-ng
docker-compose up -d
```
- Open your favorite browser (Firefox 😉), navigate to http://localhost:8000, and log in for the first time with the username `admin` and the password `admin`.
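The compose file above hard-codes `PUID: 1001` and `PGID: 1001`, and its comments ask you to run `id` and copy your own uid and gid in by hand. If you would rather script that edit, here is a minimal sketch. `set_compose_ids` is a hypothetical helper, not part of Paperless-ng or Docker Compose; it assumes the `KEY: value` layout used in the compose files in this post and a `sed` that understands `-E` (GNU and BSD sed both do). It also rewrites `FTP_USER_UID`/`FTP_USER_GID` when present (as in the FTP variant), and it leaves `USERMAP_UID`/`USERMAP_GID` in `docker-compose.env` alone.

```shell
# Hypothetical helper, not part of Paperless-ng: write a uid/gid into a
# compose file in place of the hard-coded 1001 values.
# Usage: set_compose_ids FILE UID GID
set_compose_ids() {
  file=$1
  uid=$2
  gid=$3
  # Refuse anything that is not a plain non-negative integer.
  for n in "$uid" "$gid"; do
    case $n in
      ''|*[!0-9]*)
        echo "set_compose_ids: '$n' is not a number" >&2
        return 1 ;;
    esac
  done
  # Replace only the number after PUID:/FTP_USER_UID: and PGID:/FTP_USER_GID:;
  # indentation and trailing "# ..." comments are kept as they are.
  sed -E \
    -e "s/^([[:space:]]*(PUID|FTP_USER_UID):[[:space:]]*)[0-9]+/\1${uid}/" \
    -e "s/^([[:space:]]*(PGID|FTP_USER_GID):[[:space:]]*)[0-9]+/\1${gid}/" \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

From `~/Docker/paperless-ng`, running `set_compose_ids docker-compose.yaml "$(id -u)" "$(id -g)"` before `docker-compose up -d` would apply your own ids. The function writes to a temporary file and moves it over the original instead of using `sed -i`, because GNU and BSD `sed` disagree on the syntax of that flag.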
Paperless-ng + FTP
Here is how you can set up a Paperless-ng + FTP Server with Docker + Docker Compose.
- Create a folder where all the Paperless-ng files and folders can live:
```shell
mkdir -p ~/Docker/paperless-ng
```
- Now create the `docker-compose.env` and `docker-compose.yaml` files in that directory:
```shell
touch ~/Docker/paperless-ng/docker-compose.env
touch ~/Docker/paperless-ng/docker-compose.yaml
```
- Copy the same contents into the `docker-compose.env` file as in the Paperless-ng section above (source on GitLab: `docker-compose.example.env`); the FTP setup uses that file unchanged.
- Copy the contents below into the `docker-compose.yaml` file (source on GitLab: `docker-compose.ftp-example.yaml`).
  Important: make sure you change the line `PUBLICHOST: "192.168.10.7"` to use your computer's IP address.
---
# ##############################################################################################
# #                                                                                            #
# #                                      Paperless + FTP                                       #
# #                                                                                            #
# ##############################################################################################

# https://hub.docker.com/r/jonaswinkler/paperless-ng
# https://hub.docker.com/r/linuxserver/paperless-ng

# NOTE:
# https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
# - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
# - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents

version: "2.1"

networks:
  paperless:
    external: false
  paperlessftp:
    external: false

services:
  paperlessweb:
    image: "linuxserver/paperless-ng:1.5.0"
    env_file: docker-compose.env
    environment:
      PUID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      PGID: 1001 # Open a terminal, and type "id", and set this number to the gid value
    volumes:
      - ./paperless-config/:/config/
      - ./paperless-data/:/data/
    # Default Paperless port: 8000
    ports:
      - "8000:8000"
    container_name: paperless-vonawesomeweb
    restart: unless-stopped
    networks:
      - paperless

  paperlesspureftpd:
    # https://github.com/stilliard/docker-pure-ftpd
    image: "stilliard/pure-ftpd:buster-latest"
    environment:
      PUBLICHOST: "192.168.10.7" # Change this to the IP address of **your** machine
      FTP_USER_NAME: paperless # Can be whatever you want
      FTP_USER_PASS: Super-Secret-Password-that-you-should-change
      FTP_USER_UID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      FTP_USER_GID: 1001 # Open a terminal, and type "id", and set this number to the gid value
      FTP_USER_HOME: /home/paperless/ftp-files
    ports:
      - "21:21"
      - "30000-30009:30000-30009"
    volumes:
      - ./paperless-data/consume/:/home/paperless/ftp-files/
    networks:
      - paperlessftp
    container_name: paperless-vonawesomepureftpd
    restart: unless-stopped
    depends_on:
      - paperlessweb
- Now open up your terminal again:
cd ~/Docker/paperless-ng
docker-compose up -d
You can now scan documents directly from your network printer, and Paperless-ng will pick up the file, process it, and add it to your document library.
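A mistyped PUBLICHOST is the easiest way to break FTP scanning. Here is an optional Python sketch to double-check it before running docker-compose up. It assumes PUBLICHOST sits on a single line, as in the example docker-compose.yaml, and check_publichost is a name I made up for illustration. It pulls the value out with a regular expression and accepts it only if it is a private (LAN) IPv4 address, not a loopback or public one:

```python
import ipaddress
import re

# Matches a line such as:   PUBLICHOST: "192.168.10.7" # Change this ...
PUBLICHOST_RE = re.compile(r'^\s*PUBLICHOST:\s*"?([^"\s#]+)"?', re.MULTILINE)


def check_publichost(compose_text: str) -> str:
    """Return PUBLICHOST from docker-compose.yaml text if it is a private, non-loopback IPv4 address.

    Raises ValueError when the line is missing or the value is unsuitable.
    """
    match = PUBLICHOST_RE.search(compose_text)
    if match is None:
        raise ValueError("no PUBLICHOST line found")
    addr = ipaddress.ip_address(match.group(1))  # raises ValueError if it is not an IP at all
    if addr.version != 4:
        raise ValueError(f"{addr} is not an IPv4 address")
    if addr.is_loopback or not addr.is_private:
        raise ValueError(f"{addr} is not a LAN address your printer can reach")
    return str(addr)
```

To use it, save it as check_publichost.py in ~/Docker/paperless-ng and run python3 -c "import check_publichost as c; print(c.check_publichost(open('docker-compose.yaml').read()))". If it raises a ValueError, the message tells you what to fix.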
The Rest of the Story
Background
So it’s like this. I was looking for a way to archive the mail (snail mail) that I was receiving, and stumbled upon The Paperless Project. That was two years ago, and Paperless has been running solidly ever since.
More recently, I was checking for updates and discovered that the original developer of The Paperless Project had stepped down, and that the project now lives on as Paperless-ng (ng for “next generation”?). I was very pleased to see the continuation of this project!
And the icing on the cake: the LinuxServer.io team picked up the Paperless-ng project and provides its own Open Container Initiative (OCI) image for it. Whenever I want to run an OCI container application, either with Docker or Podman, I always check fleet.linuxserver.io first, and I highly recommend you check out the other images they offer! :-)
Outcome Vision
The outcome vision for this post is to provide an example Docker Compose file that you can use, along with the setup steps to get your very own Paperless-ng server running. You will then be able to tweak the files to your liking. The documentation for Paperless-ng is great; however, I hope to distill it down even further in order to jump-start your Paperless-ng server setup.
Prerequisites
Paperless-ng can be installed in multiple ways, but I’m going to focus on the OCI image setup using Docker.
Docker Installation
So hopefully it’s obvious that you’ll need to install Docker and Docker Compose on your Linux Mint (or other Ubuntu-based) machine.
- Open up a terminal, and add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
- Install the Docker repository:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  focal stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Note: You’ll notice focal stable above. Linux Mint 20.x is based on Ubuntu 20.04 (codename focal), but lsb_release -cs on Mint returns Mint’s own codename, which Docker’s repository does not know about. If you are using vanilla Ubuntu, you can change focal stable to $(lsb_release -cs) stable.
- Install Docker and Docker Compose:
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose
- Add your Linux user to the docker group so you do not have to type sudo when using Docker commands:
sudo groupadd docker
sudo usermod -aG docker $USER
- Log out, and log back in to apply the group changes to your Linux User account.
Source: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
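Here’s why focal is hard-coded. On Linux Mint, lsb_release -cs prints Mint’s own codename (una on Mint 20.3, for example). Both Mint and Ubuntu, however, record the underlying Ubuntu release as UBUNTU_CODENAME in /etc/os-release. The small Python sketch below reads that value and rebuilds the repository line. The helper names are mine and purely illustrative, and the amd64 default is an assumption, so pass whatever dpkg --print-architecture reports:

```python
def parse_os_release(text: str) -> dict:
    """Parse /etc/os-release style KEY=value lines into a dict, dropping surrounding quotes."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def ubuntu_codename(info: dict) -> str:
    """Prefer UBUNTU_CODENAME (present on Ubuntu and Mint), else fall back to VERSION_CODENAME."""
    return info.get("UBUNTU_CODENAME") or info.get("VERSION_CODENAME", "")


def docker_repo_line(codename: str, arch: str = "amd64") -> str:
    """Build the same apt source line that the echo command writes to docker.list."""
    return (
        f"deb [arch={arch} signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] "
        f"https://download.docker.com/linux/ubuntu {codename} stable"
    )
```

For example, running with open("/etc/os-release") as fh: print(docker_repo_line(ubuntu_codename(parse_os_release(fh.read())))) on Mint 20.x should print the focal line.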
Starting up Your Paperless Server
Thanks to the magic of containerization, getting started is quite simple.
Do me a favor, and set things up exactly as I show below. Then, once you see things working, tweak the heck out of it. 😇️
- Open up your terminal, and let’s create a home where all the Paperless-ng files and folders can live:
mkdir -p ~/Docker/paperless-ng
- Now create the docker-compose.env and docker-compose.yaml files in that directory:
touch ~/Docker/paperless-ng/docker-compose.env
touch ~/Docker/paperless-ng/docker-compose.yaml
- Copy the contents below into the docker-compose.env file:
GitLab: docker-compose.example.env
# Environment variables to set for Paperless
# Commented out variables will be replaced with a default within Paperless.
#
# In addition to what you see here, you can also define any values you find in
# paperless.conf.example here. Values like:
#
# * PAPERLESS_PASSPHRASE
# * PAPERLESS_CONSUMPTION_DIR
# * PAPERLESS_CONSUME_MAIL_HOST
#
# ...are all explained in that file but can be defined here, since the Docker
# installation doesn't make use of paperless.conf.

# New to paperless-ng
COMPOSE_PROJECT_NAME=paperless

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
TZ=America/New_York
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_ALL=en_US.UTF-8

# This setting was found by searching the GitHub issues
# Set the date format for dates within scanned documents
PAPERLESS_DATE_ORDER=MDY

# You can change the default user and group id to a custom one
USERMAP_UID=1001
USERMAP_GID=1001

# Additional languages to install for text recognition. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# default language used when guessing the language from the OCR output.
PAPERLESS_OCR_LANGUAGES=deu heb por spa

# Set Paperless to use SSL for the web interface.
# Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
#PAPERLESS_USE_SSL=false

# Sample paperless.conf
# As this file contains passwords it should only be readable by the user
# running paperless.

###############################################################################
#### Paths & Folders ####
###############################################################################

# This is where your documents should go to be consumed. Make sure that it exists
# and that the user running the paperless service can read/write its contents
# before you start Paperless.
#PAPERLESS_CONSUMPTION_DIR="/consume"

# You can specify where you want the SQLite database to be stored instead of
# the default location of /data/ within the install directory.
#PAPERLESS_DBDIR=/path/to/database/file

# Override the default MEDIA_ROOT here. This is where all files are stored.
# The default location is /media/documents/ within the install folder.
#PAPERLESS_MEDIADIR=/path/to/media

# Override the default STATIC_ROOT here. This is where all static files
# created using "collectstatic" manager command are stored.
#PAPERLESS_STATICDIR=""

# Override the MEDIA_URL here. Unless you're hosting Paperless off a subdomain
# like /paperless/, you probably don't need to change this.
#PAPERLESS_MEDIA_URL="/media/"

# Override the STATIC_URL here. Unless you're hosting Paperless off a
# subdomain like /paperless/, you probably don't need to change this.
#PAPERLESS_STATIC_URL="/static/"

# These values are required if you want paperless to check a particular email
# box every 10 minutes and attempt to consume documents from there. If you
# don't define a HOST, mail checking will just be disabled.
#PAPERLESS_CONSUME_MAIL_HOST=""
#PAPERLESS_CONSUME_MAIL_PORT=""
#PAPERLESS_CONSUME_MAIL_USER=""
#PAPERLESS_CONSUME_MAIL_PASS=""

# Override the default IMAP inbox here. If not set Paperless defaults to
# "INBOX".
#PAPERLESS_CONSUME_MAIL_INBOX="INBOX"

# Any email sent to the target account that does not contain this text will be
# ignored.
#PAPERLESS_EMAIL_SECRET=""

# Specify a filename format for the document (directories are supported)
# Use the following placeholders:
# * {correspondent}
# * {title}
# * {created}
# * {added}
# * {tags[KEY]} If your tags conform to key_value or key-value
# * {tags[INDEX]} If your tags are strings, select the tag by index
# Uniqueness of filenames is ensured, as an incrementing counter is attached
# to each filename.
#PAPERLESS_FILENAME_FORMAT=""

###############################################################################
#### Security ####
###############################################################################

# Controls whether django's debug mode is enabled. Disable this on production
# systems. Debug mode is enabled by default.
PAPERLESS_DEBUG="false"

# Paperless can be instructed to attempt to encrypt your PDF files with GPG
# using the PAPERLESS_PASSPHRASE specified below. If however you're not
# concerned about encrypting these files (for example if you have disk
# encryption locally) then you don't need this and can safely leave this value
# un-set.
#
# One final note about the passphrase. Once you've consumed a document with
# one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and
# can't properly export documents that were encrypted with an old passphrase if
# you've since changed it to a new one.
#
# The default is to not use encryption at all.
#PAPERLESS_PASSPHRASE="secret"

# The secret key has a default that should be fine so long as you're hosting
# Paperless on a closed network. However, if you're putting this anywhere
# public, you should change the key to something unique and verbose.
#PAPERLESS_SECRET_KEY="change-me"

# If you're planning on putting Paperless on the open internet, then you
# really should set this value to the domain name you're using. Failing to do
# so leaves you open to HTTP host header attacks:
# https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
#
# Just remember that this is a comma-separated list, so "example.com" is fine,
# as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
#PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"

# If you decide to use the Paperless API in an ajax call, you need to add your
# servers to the list of allowed hosts that can do CORS calls. By default
# Paperless allows calls from localhost:8080, but if you'd like to change that,
# you can set this value to a comma-separated list.
#PAPERLESS_CORS_ALLOWED_HOSTS="localhost:8080,example.com,localhost:8000"

# To host paperless under a subpath url like example.com/paperless you set
# this value to /paperless. No trailing slash!
#
# https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
#PAPERLESS_FORCE_SCRIPT_NAME=""

# If you are using alternative authentication means or are just using paperless
# as a single user on a small private network, this option allows you to disable
# user authentication if you set it to "true"
#PAPERLESS_DISABLE_LOGIN="true"

###############################################################################
#### Software Tweaks ####
###############################################################################

# After a document is consumed, Paperless can trigger an arbitrary script if
# you like. This script will be passed a number of arguments for you to work
# with. The default is blank, which means nothing will be executed. For more
# information, take a look at the docs:
# http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
#PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"

# By default, when clicking on a document within the web interface, the
# browser will prompt the user to save the document to disk. By setting this to
# "true", the document will instead be opened in the browser, if possible.
#PAPERLESS_INLINE_DOC="false"

# By default, paperless will check the document text for document date information.
# Uncomment the line below to enable checking the document filename for date
# information. The date order can be set to any option as specified in
# https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
# checked first, and if nothing is found, the document text will be checked
# as normal.
#PAPERLESS_FILENAME_DATE_ORDER="YMD"

# Sometimes devices won't create filenames which can be parsed properly
# by the filename parser (see
# https://paperless.readthedocs.io/en/latest/guesswork.html).
#
# This setting allows you to specify a list of transformations
# in regular expression syntax, which are passed in order to re.sub.
# Transformation stops after the first match, so at most one transformation
# is applied.
#
# Syntax is a JSON array of dictionaries containing "pattern" and "repl"
# as keys.
#
# The example below transforms filenames created by a Brother ADS-2400N
# document scanner in its standard configuration 'Name_Date_Count', so that
# count is used as title, name as tag and date can be parsed by paperless.
#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]

#
# The following values use sensible defaults for modern systems, but if you're
# running Paperless on a low-resource device (like a Raspberry Pi), modifying
# some of these values may be necessary.
#

# By default, Paperless will attempt to use all available CPU cores to process
# a document, but if you would like to limit that, you can set this value to
# an integer:
PAPERLESS_OCR_THREADS=2

# Customize the default language that tesseract will attempt to use when
# parsing documents. It should be a 3-letter language code consistent with ISO
# 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
PAPERLESS_OCR_LANGUAGE=eng

# On smaller systems, or even in the case of Very Large Documents, the consumer
# may explode, complaining about how it's "unable to extend pixel cache". In
# such cases, try setting this to a reasonably low value, like 32000000. The
# default is to use whatever is necessary to do everything without writing to
# disk, and units are in megabytes.
#
# For more information on how to use this value, you should probably search
# the web for "MAGICK_MEMORY_LIMIT".
#PAPERLESS_CONVERT_MEMORY_LIMIT=0

# Similar to the memory limit, if you've got a small system and your OS mounts
# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
# /home/your_user/tmp or something. ImageMagick will use this as scratch space
# when crunching through very large documents.
#
# For more information on how to use this value, you should probably search
# the web for "MAGICK_TMPDIR".
#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

# By default the conversion density setting for documents is 300DPI, in some
# cases it has proven useful to configure a lesser value.
# This setting has a high impact on the physical size of tmp page files,
# the speed of document conversion, and can affect the accuracy of OCR
# results. Individual results can vary and this setting should be tested
# thoroughly against the documents you are importing to see if it has any
# impacts either negative or positive.
# Testing on limited document sets has shown a setting of 200 can cut the
# size of tmp files by 1/3, and speed up conversion by up to 4x
# with little impact to OCR accuracy.
PAPERLESS_CONVERT_DENSITY=300

# (This setting is ignored on Linux where inotify is used instead of a
# polling loop.)
# The number of seconds that Paperless will wait between checking
# PAPERLESS_CONSUMPTION_DIR. If you tend to write documents to this directory
# rarely, you may want to use a higher value than the default (10).
#PAPERLESS_CONSUMER_LOOP_TIME=10

# By default Paperless stops consuming a document if no language can be
# detected. Set to true to consume documents even if the language detection
# fails.
PAPERLESS_FORGIVING_OCR="true"

# By default Paperless does not OCR a document if the text can be retrieved from
# the document directly. Set to true to always OCR documents.
PAPERLESS_OCR_ALWAYS="true"

###############################################################################
#### Interface ####
###############################################################################

# Override the default UTC time zone here.
# See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
# for details on how to set it.
PAPERLESS_TIME_ZONE=America/New_York

# If set, Paperless will show document filters per financial year.
# The dates must be in the format "mm-dd", for example "07-15" for July 15.
#PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
#PAPERLESS_FINANCIAL_YEAR_END="mm-dd"

# The number of items on each page in the web UI. This value must be a
# positive integer, but if you don't define one in paperless.conf, a default of
# 100 will be used.
#PAPERLESS_LIST_PER_PAGE=100

# The number of years for which a correspondent will be included in the recent
# correspondents filter.
#PAPERLESS_RECENT_CORRESPONDENT_YEARS=1

###############################################################################
#### Third-Party Binaries ####
###############################################################################

# There are a few external software packages that Paperless expects to find on
# your system when it starts up. Unless you've done something creative with
# their installation, you probably won't need to edit any of these. However,
# if you've installed these programs somewhere where simply typing the name of
# the program doesn't automatically execute it (i.e. the program isn't in your
# $PATH), then you'll need to specify the literal path for that program here.

# Convert (part of the ImageMagick suite)
#PAPERLESS_CONVERT_BINARY=/usr/bin/convert

# Ghostscript
#PAPERLESS_GS_BINARY=/usr/bin/gs

# Unpaper
#PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper

# Optipng (for optimising thumbnail sizes)
#PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
- Copy the contents below into the docker-compose.yaml file:
GitLab: docker-compose.example.yaml
---
# ##############################################################################################
# #                                                                                            #
# #                                         Paperless                                          #
# #                                                                                            #
# ##############################################################################################

# https://hub.docker.com/r/jonaswinkler/paperless-ng
# https://hub.docker.com/r/linuxserver/paperless-ng

# NOTE:
# https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
# - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
# - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents

version: "2.1"

networks:
  paperless:
    external: false

services:
  paperlessweb:
    image: "linuxserver/paperless-ng:1.5.0"
    env_file: docker-compose.env
    environment:
      PUID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      PGID: 1001 # Open a terminal, and type "id", and set this number to the gid value
    volumes:
      - ./paperless-config/:/config/
      - ./paperless-data/:/data/
    # Default Paperless port: 8000
    ports:
      - "8000:8000"
    container_name: paperless-vonawesomeweb
    restart: unless-stopped
    networks:
      - paperless
- Now open up your terminal again:
cd ~/Docker/paperless-ng
docker-compose up -d
- Monitor the initialization process:
docker-compose logs -f
In the output, you should see:
...
paperless-vonawesomeweb | [2021-12-29 00:50:41 -0500] [365] [INFO] Server is ready. Spawning workers
paperless-vonawesomeweb | [2021-12-29 00:50:41,917] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume
...
paperless-vonawesomeweb | [2021-12-29 00:51:12,451] [INFO] [paperless.sanity_checker] Sanity checker detected no issues.
...
- Press Ctrl+C to exit from the “docker-compose logs” output.
- Open up your favorite (Firefox 😉️) browser, navigate to http://localhost:8000, and log in for the first time with the username admin and password admin.
Upon a successful login, you’ll be greeted with the Paperless-ng Dashboard.
- If you want to change your password (a good idea, since admin/admin is the default), you can do that in the admin section.
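Back at the “Monitor the initialization process” step, you don’t have to eyeball the log stream. A small filter can read it and stop once the three messages from the sample output above have all appeared. This is only a sketch: the marker strings are copied from that sample and may differ in other Paperless-ng versions, and wait_until_ready is a hypothetical helper name:

```python
from typing import Iterable, Tuple

# Messages taken from the sample "docker-compose logs -f" output; adjust if your version logs differently.
READY_MARKERS: Tuple[str, ...] = (
    "Server is ready",
    "Using inotify to watch directory",
    "Sanity checker detected no issues",
)


def wait_until_ready(lines: Iterable[str], markers: Tuple[str, ...] = READY_MARKERS) -> bool:
    """Consume log lines until every marker has been seen; return False if the stream ends first."""
    pending = set(markers)
    for line in lines:
        # Drop every marker that appears in this line.
        pending = {marker for marker in pending if marker not in line}
        if not pending:
            return True
    return False
```

Save it as wait_for_paperless.py next to your compose files. You could then run docker-compose logs -f | python3 -c "import sys, wait_for_paperless as w; sys.exit(0 if w.wait_until_ready(sys.stdin) else 1)", which exits with status 0 once the server looks ready.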
Your First Upload
Ok, you’ve logged in, you’re at the Dashboard. Now try uploading a file.
The end result is that the file is “processed” and added to your document library. Over time you’ll add more and more documents, and thanks to the Optical Character Recognition (OCR) built into Paperless-ng, you’ll be able to find a specific document amongst the haystack of all the others with a simple text search.
Let the uploading begin!
- On the Dashboard, either drag and drop the file you would like to upload, or click Browse files under the Upload new documents section.
- Wait for your file to upload.
- And that’s it! Document uploaded and processed. You can click on Documents in the menu on the left side of the dashboard to view, edit, or search your documents.
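The web page isn’t the only way in. As I understand the Paperless-ng API documentation, you can also upload by POSTing a multipart form to /api/documents/post_document/ with the file in a field named document. Treat that endpoint, HTTP basic authentication with your login, and the server at http://localhost:8000 as assumptions to verify against your version. Here is a standard-library Python sketch (the helper names are hypothetical):

```python
import base64
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data request body."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def upload_document(path: str, user: str, password: str,
                    base_url: str = "http://localhost:8000") -> int:
    """POST a file to the (assumed) post_document endpoint and return the HTTP status code."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as fh:
        body = build_multipart("document", os.path.basename(path), fh.read(), boundary)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status
```

For example, upload_document("scan.pdf", "admin", "admin") returning a 2xx status means the upload was accepted. The file should then show up in your library just like a web upload. urlopen raises an HTTPError for 4xx/5xx responses.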
Bonus: Upload via FTP
So instead of uploading everything via the web interface, how would you like to go directly from your scanner to Paperless?
Well, you’re in luck, there’s a Docker Compose for that! 🤓️
Prerequisites
- You’ll need a printer that can send files via File Transfer Protocol (FTP), and your printer will need to be connected to your Local Area Network (LAN).
- Also, make a note of your computer’s IP address; you’ll need it later.
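If you aren’t sure of your computer’s IP address, hostname -I in a terminal will list it. A common Python alternative is to “connect” a UDP socket toward an outside address, then read back the local address the OS chose for that route. Connecting a UDP socket sends no packets. In this sketch, 192.0.2.1 (a reserved documentation address) is an arbitrary probe target and lan_ip is a hypothetical name:

```python
import socket


def lan_ip(probe: str = "192.0.2.1", port: int = 9) -> str:
    """Return the IPv4 address of the interface the OS would use to reach `probe`.

    connect() on a UDP socket only selects a route; no datagram is sent.
    Falls back to 127.0.0.1 when there is no usable route (e.g. offline).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((probe, port))
            address = sock.getsockname()[0]
        except OSError:
            address = "127.0.0.1"
    return address
```

print(lan_ip()) then gives you a candidate value for PUBLICHOST. On a machine with several network interfaces it reports the one holding the default route, which may not be the one your printer can see.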
Starting up Your Paperless Server with FTP
- Complete steps 1-3 in the Starting up Your Paperless Server section above, then move on to the next step below.
- Copy the contents below into the docker-compose.yaml file.
Important: Make sure you change the line PUBLICHOST: "192.168.10.7" to use your computer’s IP address.
GitLab: docker-compose.ftp-example.yaml
---
# ##############################################################################################
# #                                                                                            #
# #                                      Paperless + FTP                                       #
# #                                                                                            #
# ##############################################################################################

# https://hub.docker.com/r/jonaswinkler/paperless-ng
# https://hub.docker.com/r/linuxserver/paperless-ng

# NOTE:
# https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
# - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
# - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents

version: "2.1"

networks:
  paperless:
    external: false
  paperlessftp:
    external: false

services:
  paperlessweb:
    image: "linuxserver/paperless-ng:1.5.0"
    env_file: docker-compose.env
    environment:
      PUID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      PGID: 1001 # Open a terminal, and type "id", and set this number to the gid value
    volumes:
      - ./paperless-config/:/config/
      - ./paperless-data/:/data/
    # Default Paperless port: 8000
    ports:
      - "8000:8000"
    container_name: paperless-vonawesomeweb
    restart: unless-stopped
    networks:
      - paperless

  paperlesspureftpd:
    # https://github.com/stilliard/docker-pure-ftpd
    image: "stilliard/pure-ftpd:buster-latest"
    environment:
      PUBLICHOST: "192.168.10.7" # Change this to the IP address of **your** machine
      FTP_USER_NAME: paperless # Can be whatever you want
      FTP_USER_PASS: Super-Secret-Password-that-you-should-change
      FTP_USER_UID: 1001 # Open a terminal, and type "id", and set this number to the uid value
      FTP_USER_GID: 1001 # Open a terminal, and type "id", and set this number to the gid value
      FTP_USER_HOME: /home/paperless/ftp-files
    ports:
      - "21:21"
      - "30000-30009:30000-30009"
    volumes:
      - ./paperless-data/consume/:/home/paperless/ftp-files/
    networks:
      - paperlessftp
    container_name: paperless-vonawesomepureftpd
    restart: unless-stopped
    depends_on:
      - paperlessweb
- Now open up your terminal again:
cd ~/Docker/paperless-ng
docker-compose up -d
- Monitor the initialization process:
docker-compose logs -f
Again, look for output similar to this:
...
paperless-vonawesomeweb | [2021-12-29 00:50:41 -0500] [365] [INFO] Server is ready. Spawning workers
paperless-vonawesomeweb | [2021-12-29 00:50:41,917] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume
...
paperless-vonawesomeweb | [2021-12-29 00:51:12,451] [INFO] [paperless.sanity_checker] Sanity checker detected no issues.
...
Additionally, make sure the FTP server has started up. You should see output similar to this:
paperless-vonawesomepureftpd | Creating user...
paperless-vonawesomepureftpd | Password:
paperless-vonawesomepureftpd | Enter it again:
paperless-vonawesomepureftpd | root user give /home/paperless/ftp-files directory 1001 owner
paperless-vonawesomepureftpd | Setting default port range to: 30000:30009
paperless-vonawesomepureftpd | Setting default max clients to: 5
paperless-vonawesomepureftpd | Setting default max connections per ip to: 5
paperless-vonawesomepureftpd | Starting Pure-FTPd:
paperless-vonawesomepureftpd | pure-ftpd -l puredb:/etc/pure-ftpd/pureftpd.pdb -E -j -R -P 192.168.10.7 -p 30000:30009 -c 5 -C 5
- Press Ctrl+C to exit from the “docker-compose logs” output.
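Why PUBLICHOST and the 30000-30009 port range matter: in FTP passive mode, which scanners typically use, the server tells the client where to open the data connection. It does so in a reply of the form 227 Entering Passive Mode (h1,h2,h3,h4,p1,p2), where the port is p1 × 256 + p2. The -P 192.168.10.7 -p 30000:30009 flags in the pure-ftpd startup line above control what goes into that reply. A wrong PUBLICHOST, or a data port that isn’t published, leaves the printer unable to connect. Here is a small decoder for such a reply (parse_pasv is a hypothetical helper; Python’s ftplib does the equivalent internally):

```python
import re
from typing import Tuple

# The six comma-separated numbers inside the parentheses of a 227 reply.
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply: str) -> Tuple[str, int]:
    """Decode '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' into (host, port)."""
    if not reply.startswith("227"):
        raise ValueError(f"not a PASV reply: {reply!r}")
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError(f"no address in reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    host = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2  # the data port is sent as high byte, low byte
    return host, port
```

To see a real reply from your running stack, connect with ftp = ftplib.FTP("192.168.10.7") (using your own address) and call ftp.login(user, password). Then print(ftp.sendcmd("PASV")) and feed the result to parse_pasv. The host should be your PUBLICHOST and the port should fall within 30000-30009.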
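Once your printer’s scan-to-FTP settings point at your computer’s IP address on port 21, using the FTP_USER_NAME and FTP_USER_PASS values from docker-compose.yaml, you can scan documents directly from your network printer. Paperless-ng will pick up the file, process it, and add it to your document library.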
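Before fiddling with printer menus, you can test the whole path (FTP login, write into the consume folder, pick-up by Paperless-ng) from your computer. Just upload a PDF yourself the way the printer would. Here is a sketch using Python’s ftplib. The host, credentials, and file name are placeholders matching the example docker-compose.yaml, and upload_via_ftp is a hypothetical name:

```python
import ftplib
import os


def upload_via_ftp(path: str, host: str, user: str, password: str,
                   port: int = 21, timeout: float = 10.0) -> bool:
    """Upload one file the way a scan-to-FTP printer would; return True on success."""
    try:
        ftp = ftplib.FTP(timeout=timeout)
        ftp.connect(host, port)
        try:
            ftp.login(user, password)
            ftp.set_pasv(True)  # passive mode, served from the published 30000-30009 range
            with open(path, "rb") as fh:
                # The file lands in FTP_USER_HOME, which is mounted as Paperless' consume folder.
                ftp.storbinary(f"STOR {os.path.basename(path)}", fh)
        finally:
            ftp.close()
    except ftplib.all_errors:  # socket errors, FTP error replies, missing local file
        return False
    return True
```

For example, call upload_via_ftp("test.pdf", "192.168.10.7", "paperless", "Super-Secret-Password-that-you-should-change"), then watch docker-compose logs -f for the consumer picking up test.pdf. If this works but the printer doesn’t, the problem is in the printer’s FTP settings.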
Conclusion
There is so much more you can do with Paperless-ng, like automatically tagging or labeling documents to help you better organize them.
However, this post is just meant to get you set up and started. The documentation for Paperless-ng is phenomenal, so if you want to dig deeper, check that out.
As always, questions or comments welcome.
I hope you find this helpful! :-)
End of Line.