
Going Paperless

TL;DR

Paperless-ng

Here is how you can set up a Paperless-ng Server with Docker + Docker Compose.

  1. Install Docker + Docker Compose
  2. Create a folder where all the Paperless-ng files and folders can live:
    
    mkdir -p ~/Docker/paperless-ng
    
  3. Now create the docker-compose.env and docker-compose.yaml files in that directory:
    
    touch ~/Docker/paperless-ng/docker-compose.env
    
    touch ~/Docker/paperless-ng/docker-compose.yaml
    
  4. Copy the contents below into the docker-compose.env file:
    
    # Environment variables to set for Paperless
    # Commented out variables will be replaced with a default within Paperless.
    #
    # In addition to what you see here, you can also define any values you find in
    # paperless.conf.example here.  Values like:
    #
    # * PAPERLESS_PASSPHRASE
    # * PAPERLESS_CONSUMPTION_DIR
    # * PAPERLESS_CONSUME_MAIL_HOST
    #
    # ...are all explained in that file but can be defined here, since the Docker
    # installation doesn't make use of paperless.conf.
    
    # New to paperless-ng
    COMPOSE_PROJECT_NAME=paperless
    
    # Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
    TZ=America/New_York
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_ALL=en_US.UTF-8
    
    # This setting was found by searching the GitHub issues.
    # Set the date format for dates within scanned documents
    PAPERLESS_DATE_ORDER=MDY
    
    # You can change the default user and group id to a custom one
    USERMAP_UID=1001
    USERMAP_GID=1001
    
    # Additional languages to install for text recognition.  Note that this is
    # different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
    # default language used when guessing the language from the OCR output.
    PAPERLESS_OCR_LANGUAGES=deu heb por spa
    
    # Set Paperless to use SSL for the web interface.
    # Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
    #PAPERLESS_USE_SSL=false
    
    
    # Sample paperless.conf
    # As this file contains passwords it should only be readable by the user
    # running paperless.
    
    
    ###############################################################################
    ####                         Paths & Folders                               ####
    ###############################################################################
    
    # This is where your documents should go to be consumed.  Make sure that it exists
    # and that the user running the paperless service can read/write its contents
    # before you start Paperless.
    #PAPERLESS_CONSUMPTION_DIR="/consume"
    
    
    # You can specify where you want the SQLite database to be stored instead of
    # the default location of /data/ within the install directory.
    #PAPERLESS_DBDIR=/path/to/database/file
    
    
    # Override the default MEDIA_ROOT here.  This is where all files are stored.
    # The default location is /media/documents/ within the install folder.
    #PAPERLESS_MEDIADIR=/path/to/media
    
    
    # Override the default STATIC_ROOT here.  This is where all static files
    # created using the "collectstatic" management command are stored.
    #PAPERLESS_STATICDIR=""
    
    
    # Override the MEDIA_URL here.  Unless you're hosting Paperless off a subdomain
    # like /paperless/, you probably don't need to change this.
    #PAPERLESS_MEDIA_URL="/media/"
    
    # Override the STATIC_URL here.  Unless you're hosting Paperless off a
    # subdomain like /paperless/, you probably don't need to change this.
    #PAPERLESS_STATIC_URL="/static/"
    
    
    # These values are required if you want paperless to check a particular email
    # box every 10 minutes and attempt to consume documents from there.  If you
    # don't define a HOST, mail checking will just be disabled.
    #PAPERLESS_CONSUME_MAIL_HOST=""
    #PAPERLESS_CONSUME_MAIL_PORT=""
    #PAPERLESS_CONSUME_MAIL_USER=""
    #PAPERLESS_CONSUME_MAIL_PASS=""
    
    # Override the default IMAP inbox here. If not set Paperless defaults to
    # "INBOX".
    #PAPERLESS_CONSUME_MAIL_INBOX="INBOX"
    
    # Any email sent to the target account that does not contain this text will be
    # ignored.
    #PAPERLESS_EMAIL_SECRET=""
    
    # Specify a filename format for the document (directories are supported)
    # Use the following placeholders:
    # * {correspondent}
    # * {title}
    # * {created}
    # * {added}
    # * {tags[KEY]} If your tags conform to key_value or key-value
    # * {tags[INDEX]} If your tags are strings, select the tag by index
    # Uniqueness of filenames is ensured, as an incrementing counter is attached
    # to each filename.
    #PAPERLESS_FILENAME_FORMAT=""
    
    ###############################################################################
    ####                              Security                                 ####
    ###############################################################################
    
    # Controls whether Django's debug mode is enabled. Disable this on production
    # systems. Debug mode is enabled by default.
    PAPERLESS_DEBUG="false"
    
    
    # Paperless can be instructed to attempt to encrypt your PDF files with GPG
    # using the PAPERLESS_PASSPHRASE specified below.  If however you're not
    # concerned about encrypting these files (for example if you have disk
    # encryption locally) then you don't need this and can safely leave this value
    # un-set.
    #
    # One final note about the passphrase.  Once you've consumed a document with
    # one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and
    # can't properly export documents that were encrypted with an old passphrase if
    # you've since changed it to a new one.
    #
    # The default is to not use encryption at all.
    #PAPERLESS_PASSPHRASE="secret"
    
    
    # The secret key has a default that should be fine so long as you're hosting
    # Paperless on a closed network.  However, if you're putting this anywhere
    # public, you should change the key to something unique and verbose.
    #PAPERLESS_SECRET_KEY="change-me"
    
    
    # If you're planning on putting Paperless on the open internet, then you
    # really should set this value to the domain name you're using.  Failing to do
    # so leaves you open to HTTP host header attacks:
    # https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
    #
    # Just remember that this is a comma-separated list, so "example.com" is fine,
    # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
    #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"
    
    # If you decide to use the Paperless API in an AJAX call, you need to add your
    # servers to the list of allowed hosts that can make CORS calls. By default
    # Paperless allows calls from localhost:8080, but if you'd like to change that,
    # you can set this value to a comma-separated list.
    #PAPERLESS_CORS_ALLOWED_HOSTS="localhost:8080,example.com,localhost:8000"
    
    # To host paperless under a subpath url like example.com/paperless you set
    # this value to /paperless. No trailing slash!
    #
    # https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
    #PAPERLESS_FORCE_SCRIPT_NAME=""
    
    # If you are using alternative authentication means or are just using paperless
    # as a single user on a small private network, this option allows you to disable
    # user authentication if you set it to "true"
    #PAPERLESS_DISABLE_LOGIN="true"
    
    ###############################################################################
    ####                          Software Tweaks                              ####
    ###############################################################################
    
    # After a document is consumed, Paperless can trigger an arbitrary script if
    # you like.  This script will be passed a number of arguments for you to work
    # with.  The default is blank, which means nothing will be executed.  For more
    # information, take a look at the docs:
    # http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
    #PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
    
    # By default, when clicking on a document within the web interface, the
    # browser will prompt the user to save the document to disk. By setting this to
    # "true", the document will instead be opened in the browser, if possible.
    #PAPERLESS_INLINE_DOC="false"
    
    # By default, paperless will check the document text for document date information.
    # Uncomment the line below to enable checking the document filename for date
    # information. The date order can be set to any option as specified in
    # https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
    # checked first, and if nothing is found, the document text will be checked
    # as normal.
    #PAPERLESS_FILENAME_DATE_ORDER="YMD"
    
    # Sometimes devices won't create filenames which can be parsed properly
    # by the filename parser (see
    # https://paperless.readthedocs.io/en/latest/guesswork.html).
    #
    # This setting allows you to specify a list of transformations
    # in regular expression syntax, which are passed in order to re.sub.
    # Transformation stops after the first match, so at most one transformation
    # is applied.
    #
    # Syntax is a JSON array of dictionaries containing "pattern" and "repl"
    # as keys.
    #
    # The example below transforms filenames created by a Brother ADS-2400N
    # document scanner in its standard configuration `Name_Date_Count`, so that
    # count is used as title, name as tag and date can be parsed by paperless.
    #PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]
    
    #
    # The following values use sensible defaults for modern systems, but if you're
    # running Paperless on a low-resource device (like a Raspberry Pi), modifying
    # some of these values may be necessary.
    #
    
    
    # By default, Paperless will attempt to use all available CPU cores to process
    # a document, but if you would like to limit that, you can set this value to
    # an integer:
    PAPERLESS_OCR_THREADS=2
    
    
    # Customize the default language that tesseract will attempt to use when
    # parsing documents.  It should be a 3-letter language code consistent with ISO
    # 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
    PAPERLESS_OCR_LANGUAGE=eng
    
    
    # On smaller systems, or even in the case of Very Large Documents, the consumer
    # may explode, complaining about how it's "unable to extend pixel cache".  In
    # such cases, try setting this to a reasonably low value, like 32000000.  The
    # default is to use whatever is necessary to do everything without writing to
    # disk, and units are in megabytes.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_MEMORY_LIMIT".
    #PAPERLESS_CONVERT_MEMORY_LIMIT=0
    
    
    # Similar to the memory limit, if you've got a small system and your OS mounts
    # /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    # /home/your_user/tmp or something.  ImageMagick will use this as scratch space
    # when crunching through very large documents.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_TMPDIR".
    #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
    
    
    # By default the conversion density setting for documents is 300 DPI; in some
    # cases it has proven useful to configure a lower value.
    # This setting has a high impact on the physical size of tmp page files,
    # the speed of document conversion, and can affect the accuracy of OCR
    # results. Individual results can vary and this setting should be tested
    # thoroughly against the documents you are importing to see if it has any
    # impacts either negative or positive.
    # Testing on limited document sets has shown a setting of 200 can cut the
    # size of tmp files by 1/3, and speed up conversion by up to 4x
    # with little impact to OCR accuracy.
    PAPERLESS_CONVERT_DENSITY=300
    
    
    # (This setting is ignored on Linux where inotify is used instead of a
    # polling loop.)
    # The number of seconds that Paperless will wait between checking
    # PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
    # rarely, you may want to use a higher value than the default (10).
    #PAPERLESS_CONSUMER_LOOP_TIME=10
    
    
    # By default Paperless stops consuming a document if no language can be
    # detected. Set to true to consume documents even if the language detection
    # fails.
    PAPERLESS_FORGIVING_OCR="true"
    
    
    # By default Paperless does not OCR a document if the text can be retrieved from
    # the document directly. Set to true to always OCR documents.
    PAPERLESS_OCR_ALWAYS="true"
    
    
    ###############################################################################
    ####                            Interface                                  ####
    ###############################################################################
    
    # Override the default UTC time zone here.
    # See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
    # for details on how to set it.
    PAPERLESS_TIME_ZONE=America/New_York
    
    
    # If set, Paperless will show document filters per financial year.
    # The dates must be in the format "mm-dd", for example "07-15" for July 15.
    #PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
    #PAPERLESS_FINANCIAL_YEAR_END="mm-dd"
    
    
    # The number of items on each page in the web UI.  This value must be a
    # positive integer, but if you don't define one in paperless.conf, a default of
    # 100 will be used.
    #PAPERLESS_LIST_PER_PAGE=100
    
    
    # The number of years for which a correspondent will be included in the recent
    # correspondents filter.
    #PAPERLESS_RECENT_CORRESPONDENT_YEARS=1
    
    ###############################################################################
    ####                     Third-Party Binaries                              ####
    ###############################################################################
    
    # There are a few external software packages that Paperless expects to find on
    # your system when it starts up.  Unless you've done something creative with
    # their installation, you probably won't need to edit any of these.  However,
    # if you've installed these programs somewhere where simply typing the name of
    # the program doesn't automatically execute it (i.e. the program isn't in your
    # $PATH), then you'll need to specify the literal path for that program here.
    
    # Convert (part of the ImageMagick suite)
    #PAPERLESS_CONVERT_BINARY=/usr/bin/convert
    
    # Ghostscript
    #PAPERLESS_GS_BINARY=/usr/bin/gs
    
    # Unpaper
    #PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper
    
    # Optipng (for optimising thumbnail sizes)
    #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
    
    GitLab: docker-compose.example.env
  5. Copy the contents below into the docker-compose.yaml file:
    
    ---
    # ##############################################################################################
    # #                                                                                            #
    # #                                        Paperless                                           #
    # #                                                                                            #
    # ##############################################################################################
    
    # https://hub.docker.com/r/jonaswinkler/paperless-ng
    # https://hub.docker.com/r/linuxserver/paperless-ng
    
    # NOTE:
    # https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
    # - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
    # - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents
    
    version: "2.1"
    
    networks:
      paperless:
        external: false
    
    services:
    
      paperlessweb:
        image: "linuxserver/paperless-ng:1.5.0"
        env_file: docker-compose.env
        environment:
          PUID: 1001  # Run "id" in a terminal and set this to your uid value
          PGID: 1001  # Run "id" in a terminal and set this to your gid value
        volumes:
          - ./paperless-config/:/config/
          - ./paperless-data/:/data/
        # Default Paperless port: 8000
        ports:
          - "8000:8000"
        container_name: paperless-vonawesomeweb
        restart: unless-stopped
        networks:
          - paperless
    
    GitLab: docker-compose.example.yaml
  6. Now open up your terminal again:
    
    cd ~/Docker/paperless-ng
    
    docker-compose up -d
    
  7. Open your favorite browser (Firefox 😉️), navigate to http://localhost:8000, and log in for the first time with the username admin and the password admin.
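
Steps 2 and 3 can also be rolled into one small shell function. This is a sketch, not part of the original setup: the name `init_paperless_dir` is made up for illustration, and it additionally pre-creates the `paperless-config` and `paperless-data` folders that docker-compose.yaml bind-mounts. If those folders don't exist, Docker usually creates them itself, owned by root, which can cause permission problems for the PUID/PGID user inside the container.

```shell
# Hypothetical helper: create the Paperless-ng project folder, the two
# Compose files, and the folders docker-compose.yaml bind-mounts.
init_paperless_dir() (
  dir="$1"
  # Refuse to run without a target directory.
  if [ -z "$dir" ]; then
    echo "usage: init_paperless_dir DIR" >&2
    exit 1
  fi
  # -p creates missing parents and is a no-op for folders that already exist.
  mkdir -p "$dir/paperless-config" "$dir/paperless-data" || exit 1
  # touch only creates missing files; existing content is left untouched.
  touch "$dir/docker-compose.env" "$dir/docker-compose.yaml"
)
```

Usage: `init_paperless_dir ~/Docker/paperless-ng`. Because `mkdir -p` and `touch` never overwrite anything, re-running it after you've filled in the files is harmless.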
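
The PUID and PGID comments in docker-compose.yaml ask you to run `id` and copy the uid and gid values. A hypothetical helper like the one below prints ready-to-paste lines instead, using `id -u` and `id -g`, which print only the numeric user ID and primary group ID:

```shell
# Hypothetical helper: print PUID/PGID lines for the "environment:" section
# of docker-compose.yaml, using the numeric ids of the current user.
print_compose_ids() {
  # id -u / id -g print only the numeric uid / primary gid.
  printf 'PUID: %s\n' "$(id -u)"
  printf 'PGID: %s\n' "$(id -g)"
}
```

Run it as the user who owns `~/Docker/paperless-ng` and paste the output under `environment:`, keeping the YAML indentation of the existing lines.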
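
If Paperless will be reachable from anywhere beyond a closed network, the env file comments say to change PAPERLESS_SECRET_KEY to something unique and verbose. One option (an assumption here, since any long random string will do) is 48 random bytes from `/dev/urandom` encoded as base64, which always comes out to exactly 64 characters with no `=` padding:

```shell
# Hypothetical helper: generate a random value for PAPERLESS_SECRET_KEY.
# 48 bytes -> 64 base64 characters; tr strips the trailing newline.
gen_secret_key() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}
```

For example: `echo "PAPERLESS_SECRET_KEY=$(gen_secret_key)" >> ~/Docker/paperless-ng/docker-compose.env`, then restart the stack with `docker-compose up -d`.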
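
The PAPERLESS_ALLOWED_HOSTS comment spells out a few formatting rules: a comma-separated list, with no leading space and no trailing comma. The function below is a hypothetical check for exactly those rules; it only looks at spaces and commas and does not validate the domain names themselves.

```shell
# Hypothetical check for a PAPERLESS_ALLOWED_HOSTS value, following the rules
# in the env file comment: comma-separated, no spaces, no empty entries.
# Returns 0 if the value passes, 1 otherwise. Domain names are not validated.
check_allowed_hosts() {
  case "$1" in
    "")          return 1 ;;  # nothing set
    *" "*)       return 1 ;;  # e.g. " example.com" or "a.com, b.com"
    ,*|*,|*,,*)  return 1 ;;  # leading comma, trailing comma, or ",,"
  esac
  return 0
}
```

For example, `check_allowed_hosts "example.com,www.example.com" && echo ok` prints `ok`, while `"example.com,"` and `" example.com"` are rejected.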
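
To see what the example PAPERLESS_FILENAME_PARSE_TRANSFORMS entry does before enabling it, you can approximate it on the command line. Paperless itself applies the pattern with Python's `re.sub`; this sketch uses `sed -E` instead, so the JSON's `\\d` becomes `[0-9]`. The pattern expects names like `name_YYYYMMDD_HHMMSS_count.pdf`; the sample filename in the usage note is hypothetical.

```shell
# Hypothetical preview of the Brother ADS-2400N transform from the env file:
# name_YYYYMMDD_HHMMSS_count.ext -> YYYYMMDDHHMMSSZ - count - name.ext
# Filenames that don't match the pattern are printed unchanged.
preview_brother_transform() {
  printf '%s\n' "$1" |
    sed -E 's/^([a-z]+)_([0-9]{8})_([0-9]{6})_([0-9]+)\./\2\3Z - \4 - \1./'
}
```

For example, `preview_brother_transform scan_20210415_134512_000001.pdf` prints `20210415134512Z - 000001 - scan.pdf`: the count becomes the title, the name becomes a tag, and the date can be parsed by Paperless.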
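
Docker Compose expects each line of an env file to be `VAR=VAL`, with `#` lines treated as comments, so a stray space around `=` (for example `PAPERLESS_GS_BINARY = /usr/bin/gs`) is easy to miss once you uncomment a setting. A hypothetical linter that prints any line that is not blank, a comment, or a plain `KEY=VALUE` pair:

```shell
# Hypothetical linter for a Compose env file: print (with line numbers) every
# line that is not blank, not a comment, and not a plain KEY=VALUE pair.
# Returns 0 if the file looks clean, 1 if something was flagged, 2 if the
# file cannot be read.
lint_env_file() {
  [ -r "$1" ] || return 2
  # -v keeps only lines matching NEITHER allowed form; grep exits 0 when it
  # printed at least one such line.
  if grep -nvE '^[[:space:]]*(#.*)?$|^[A-Za-z_][A-Za-z0-9_]*=' "$1"; then
    return 1
  fi
  return 0
}
```

Run `lint_env_file ~/Docker/paperless-ng/docker-compose.env` after editing; no output means every active line has the `KEY=VALUE` shape.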

Paperless-ng + FTP

Here is how you can set up a Paperless-ng + FTP Server with Docker + Docker Compose.

  1. Install Docker + Docker Compose

  2. Create a folder where all the Paperless-ng files and folders can live:

    mkdir -p ~/Docker/paperless-ng
    
  3. Now create the docker-compose.env and docker-compose.yaml files in that directory:

    touch ~/Docker/paperless-ng/docker-compose.env
    
    touch ~/Docker/paperless-ng/docker-compose.yaml
    
  4. Copy the contents below into the docker-compose.env file:

    # Environment variables to set for Paperless
    # Commented out variables will be replaced with a default within Paperless.
    #
    # In addition to what you see here, you can also define any values you find in
    # paperless.conf.example here.  Values like:
    #
    # * PAPERLESS_PASSPHRASE
    # * PAPERLESS_CONSUMPTION_DIR
    # * PAPERLESS_CONSUME_MAIL_HOST
    #
    # ...are all explained in that file but can be defined here, since the Docker
    # installation doesn't make use of paperless.conf.
    
    # New to paperless-ng
    COMPOSE_PROJECT_NAME=paperless
    
    # Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
    TZ=America/New_York
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_ALL=en_US.UTF-8
    
    # This setting was found by searching the GitHub issues.
    # Set the date format for dates within scanned documents
    PAPERLESS_DATE_ORDER=MDY
    
    # You can change the default user and group id to a custom one
    USERMAP_UID=1001
    USERMAP_GID=1001
    
    # Additional languages to install for text recognition.  Note that this is
    # different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
    # default language used when guessing the language from the OCR output.
    PAPERLESS_OCR_LANGUAGES=deu heb por spa
    
    # Set Paperless to use SSL for the web interface.
    # Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
    #PAPERLESS_USE_SSL=false
    
    
    # Sample paperless.conf
    # As this file contains passwords it should only be readable by the user
    # running paperless.
    
    
    ###############################################################################
    ####                         Paths & Folders                               ####
    ###############################################################################
    
    # This is where your documents should go to be consumed.  Make sure that it exists
    # and that the user running the paperless service can read/write its contents
    # before you start Paperless.
    #PAPERLESS_CONSUMPTION_DIR="/consume"
    
    
    # You can specify where you want the SQLite database to be stored instead of
    # the default location of /data/ within the install directory.
    #PAPERLESS_DBDIR=/path/to/database/file
    
    
    # Override the default MEDIA_ROOT here.  This is where all files are stored.
    # The default location is /media/documents/ within the install folder.
    #PAPERLESS_MEDIADIR=/path/to/media
    
    
    # Override the default STATIC_ROOT here.  This is where all static files
    # created using the "collectstatic" management command are stored.
    #PAPERLESS_STATICDIR=""
    
    
    # Override the MEDIA_URL here.  Unless you're hosting Paperless off a subdomain
    # like /paperless/, you probably don't need to change this.
    #PAPERLESS_MEDIA_URL="/media/"
    
    # Override the STATIC_URL here.  Unless you're hosting Paperless off a
    # subdomain like /paperless/, you probably don't need to change this.
    #PAPERLESS_STATIC_URL="/static/"
    
    
    # These values are required if you want paperless to check a particular email
    # box every 10 minutes and attempt to consume documents from there.  If you
    # don't define a HOST, mail checking will just be disabled.
    #PAPERLESS_CONSUME_MAIL_HOST=""
    #PAPERLESS_CONSUME_MAIL_PORT=""
    #PAPERLESS_CONSUME_MAIL_USER=""
    #PAPERLESS_CONSUME_MAIL_PASS=""
    
    # Override the default IMAP inbox here. If not set Paperless defaults to
    # "INBOX".
    #PAPERLESS_CONSUME_MAIL_INBOX="INBOX"
    
    # Any email sent to the target account that does not contain this text will be
    # ignored.
    #PAPERLESS_EMAIL_SECRET=""
    
    # Specify a filename format for the document (directories are supported)
    # Use the following placeholders:
    # * {correspondent}
    # * {title}
    # * {created}
    # * {added}
    # * {tags[KEY]} If your tags conform to key_value or key-value
    # * {tags[INDEX]} If your tags are strings, select the tag by index
    # Uniqueness of filenames is ensured, as an incrementing counter is attached
    # to each filename.
    #PAPERLESS_FILENAME_FORMAT=""
    
    ###############################################################################
    ####                              Security                                 ####
    ###############################################################################
    
    # Controls whether Django's debug mode is enabled. Disable this on production
    # systems. Debug mode is enabled by default.
    PAPERLESS_DEBUG="false"
    
    
    # Paperless can be instructed to attempt to encrypt your PDF files with GPG
    # using the PAPERLESS_PASSPHRASE specified below.  If however you're not
    # concerned about encrypting these files (for example if you have disk
    # encryption locally) then you don't need this and can safely leave this value
    # un-set.
    #
    # One final note about the passphrase.  Once you've consumed a document with
    # one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and
    # can't properly export documents that were encrypted with an old passphrase if
    # you've since changed it to a new one.
    #
    # The default is to not use encryption at all.
    #PAPERLESS_PASSPHRASE="secret"
    
    
    # The secret key has a default that should be fine so long as you're hosting
    # Paperless on a closed network.  However, if you're putting this anywhere
    # public, you should change the key to something unique and verbose.
    #PAPERLESS_SECRET_KEY="change-me"
    
    
    # If you're planning on putting Paperless on the open internet, then you
    # really should set this value to the domain name you're using.  Failing to do
    # so leaves you open to HTTP host header attacks:
    # https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
    #
    # Just remember that this is a comma-separated list, so "example.com" is fine,
    # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
    #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"
    
    # If you decide to use the Paperless API in an AJAX call, you need to add your
    # servers to the list of allowed hosts that can make CORS calls. By default
    # Paperless allows calls from localhost:8080, but if you'd like to change that,
    # you can set this value to a comma-separated list.
    #PAPERLESS_CORS_ALLOWED_HOSTS="localhost:8080,example.com,localhost:8000"
    
    # To host paperless under a subpath url like example.com/paperless you set
    # this value to /paperless. No trailing slash!
    #
    # https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
    #PAPERLESS_FORCE_SCRIPT_NAME=""
    
    # If you are using alternative authentication means or are just using paperless
    # as a single user on a small private network, this option allows you to disable
    # user authentication if you set it to "true"
    #PAPERLESS_DISABLE_LOGIN="true"
    
    ###############################################################################
    ####                          Software Tweaks                              ####
    ###############################################################################
    
    # After a document is consumed, Paperless can trigger an arbitrary script if
    # you like.  This script will be passed a number of arguments for you to work
    # with.  The default is blank, which means nothing will be executed.  For more
    # information, take a look at the docs:
    # http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
    #PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
    
    # By default, when clicking on a document within the web interface, the
    # browser will prompt the user to save the document to disk. By setting this to
    # "true", the document will instead be opened in the browser, if possible.
    #PAPERLESS_INLINE_DOC="false"
    
    # By default, paperless will check the document text for document date information.
    # Uncomment the line below to enable checking the document filename for date
    # information. The date order can be set to any option as specified in
    # https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
    # checked first, and if nothing is found, the document text will be checked
    # as normal.
    #PAPERLESS_FILENAME_DATE_ORDER="YMD"
    
    # Sometimes devices won't create filenames which can be parsed properly
    # by the filename parser (see
    # https://paperless.readthedocs.io/en/latest/guesswork.html).
    #
    # This setting allows you to specify a list of transformations
    # in regular expression syntax, which are passed in order to re.sub.
    # Transformation stops after the first match, so at most one transformation
    # is applied.
    #
    # Syntax is a JSON array of dictionaries containing "pattern" and "repl"
    # as keys.
    #
    # The example below transforms filenames created by a Brother ADS-2400N
    # document scanner in its standard configuration 'Name_Date_Count', so that
    # count is used as title, name as tag and date can be parsed by paperless.
    #PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]
    
    #
    # The following values use sensible defaults for modern systems, but if you're
    # running Paperless on a low-resource device (like a Raspberry Pi), modifying
    # some of these values may be necessary.
    #
    
    
    # By default, Paperless will attempt to use all available CPU cores to process
    # a document, but if you would like to limit that, you can set this value to
    # an integer:
    PAPERLESS_OCR_THREADS=2
    
    
    # Customize the default language that tesseract will attempt to use when
    # parsing documents.  It should be a 3-letter language code consistent with ISO
    # 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
    PAPERLESS_OCR_LANGUAGE=eng
    
    
    # On smaller systems, or even in the case of Very Large Documents, the consumer
    # may explode, complaining about how it's "unable to extend pixel cache".  In
    # such cases, try setting this to a reasonably low value, like 32.  The
    # default is to use whatever is necessary to do everything without writing to
    # disk, and units are in megabytes.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_MEMORY_LIMIT".
    #PAPERLESS_CONVERT_MEMORY_LIMIT=0
    
    
    # Similar to the memory limit, if you've got a small system and your OS mounts
    # /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    # /home/your_user/tmp or something.  ImageMagick will use this as scratch space
    # when crunching through very large documents.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_TMPDIR".
    #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
    
    
    # By default the conversion density setting for documents is 300 DPI; in some
    # cases it has proven useful to configure a lesser value.
    # This setting has a high impact on the physical size of tmp page files,
    # the speed of document conversion, and can affect the accuracy of OCR
    # results. Individual results can vary and this setting should be tested
    # thoroughly against the documents you are importing to see if it has any
    # impacts either negative or positive.
    # Testing on limited document sets has shown a setting of 200 can cut the
    # size of tmp files by 1/3, and speed up conversion by up to 4x
    # with little impact to OCR accuracy.
    PAPERLESS_CONVERT_DENSITY=300
    
    
    # (This setting is ignored on Linux where inotify is used instead of a
    # polling loop.)
    # The number of seconds that Paperless will wait between checking
    # PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
    # rarely, you may want to use a higher value than the default (10).
    #PAPERLESS_CONSUMER_LOOP_TIME=10
    
    
    # By default Paperless stops consuming a document if no language can be
    # detected. Set to true to consume documents even if the language detection
    # fails.
    PAPERLESS_FORGIVING_OCR="true"
    
    
    # By default Paperless does not OCR a document if the text can be retrieved from
    # the document directly. Set to true to always OCR documents.
    PAPERLESS_OCR_ALWAYS="true"
    
    
    ###############################################################################
    ####                            Interface                                  ####
    ###############################################################################
    
    # Override the default UTC time zone here.
    # See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
    # for details on how to set it.
    PAPERLESS_TIME_ZONE=America/New_York
    
    
    # If set, Paperless will show document filters per financial year.
    # The dates must be in the format "mm-dd", for example "07-15" for July 15.
    #PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
    #PAPERLESS_FINANCIAL_YEAR_END="mm-dd"
    
    
    # The number of items on each page in the web UI.  This value must be a
    # positive integer, but if you don't define one in paperless.conf, a default of
    # 100 will be used.
    #PAPERLESS_LIST_PER_PAGE=100
    
    
    # The number of years for which a correspondent will be included in the recent
    # correspondents filter.
    #PAPERLESS_RECENT_CORRESPONDENT_YEARS=1
    
    ###############################################################################
    ####                     Third-Party Binaries                              ####
    ###############################################################################
    
    # There are a few external software packages that Paperless expects to find on
    # your system when it starts up.  Unless you've done something creative with
    # their installation, you probably won't need to edit any of these.  However,
    # if you've installed these programs somewhere where simply typing the name of
    # the program doesn't automatically execute it (i.e. the program isn't in your
    # $PATH), then you'll need to specify the literal path for that program here.
    
    # Convert (part of the ImageMagick suite)
    #PAPERLESS_CONVERT_BINARY=/usr/bin/convert
    
    # Ghostscript
    #PAPERLESS_GS_BINARY=/usr/bin/gs
    
    # Unpaper
    #PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper
    
    # Optipng (for optimising thumbnail sizes)
    #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
    
    GitLab: docker-compose.example.env

  5. Copy the contents below into the docker-compose.yaml file:

    Important: Make sure you change the line PUBLICHOST: "192.168.10.7" to use your computer’s IP address.

        
    ---
    # ##############################################################################################
    # #                                                                                            #
    # #                                     Paperless + FTP                                        #
    # #                                                                                            #
    # ##############################################################################################
    
    # https://hub.docker.com/r/jonaswinkler/paperless-ng
    # https://hub.docker.com/r/linuxserver/paperless-ng
    
    # NOTE:
    # https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
    # - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
    # - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents
    
    version: "2.1"
    
    networks:
      paperless:
        external: false
      paperlessftp:
        external: false
    
    
    services:
    
      paperlessweb:
        image: "linuxserver/paperless-ng:1.5.0"
        env_file: docker-compose.env
        environment:
          PUID: 1001  # Open a terminal, and type "id", and set this number to the uid value
          PGID: 1001  # Open a terminal, and type "id", and set this number to the gid value
        volumes:
          - ./paperless-config/:/config/
          - ./paperless-data/:/data/
        # Default Paperless port: 8000
        ports:
          - "8000:8000"
        container_name: paperless-vonawesomeweb
        restart: unless-stopped
        networks:
          - paperless
    
      paperlesspureftpd:
        # https://github.com/stilliard/docker-pure-ftpd
        image: "stilliard/pure-ftpd:buster-latest"
        environment:
          PUBLICHOST: "192.168.10.7" # Change this to the IP address of **your** machine
          FTP_USER_NAME: paperless  # Can be whatever you want
          FTP_USER_PASS: Super-Secret-Password-that-you-should-change
          FTP_USER_UID: 1001 # Open a terminal, and type "id", and set this number to the uid value
          FTP_USER_GID: 1001 # Open a terminal, and type "id", and set this number to the gid value
          FTP_USER_HOME: /home/paperless/ftp-files
        ports:
          - "21:21"
          - "30000-30009:30000-30009"
        volumes:
          - ./paperless-data/consume/:/home/paperless/ftp-files/
        networks:
          - paperlessftp
        container_name: paperless-vonawesomepureftpd
        restart: unless-stopped
        depends_on:
          - paperlessweb
        
    GitLab: docker-compose.ftp-example.yaml

  6. Now open up your terminal again:

    
    cd ~/Docker/paperless-ng
    
    
    docker-compose up -d
    

You can now scan documents directly from your network printer (assuming it can scan to an FTP server), and Paperless-ng will pick up the file, process it, and add it to your document library.
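If your scanner names its files like the Brother ADS-2400N example in the env file (the commented-out PAPERLESS_FILENAME_PARSE_TRANSFORMS line), you can preview what that transform would produce before enabling it. The sketch below is only an approximation: Paperless applies the pattern with Python's re.sub, while this uses sed -E, so \d is spelled [0-9] and the doubled backslashes (which are just JSON escaping in the env value) are collapsed. The preview_transform function and the sample filename are made up for illustration.

```shell
# Hypothetical helper: approximate the example PAPERLESS_FILENAME_PARSE_TRANSFORMS
# entry with sed -E. POSIX extended regexes have no \d, so [0-9] is used instead.
# Name_Date_Time_Count becomes "DateTimeZ - Count - Name.", matching the "repl" value.
preview_transform() {
  printf '%s\n' "$1" |
    sed -E 's/^([a-z]+)_([0-9]{8})_([0-9]{6})_([0-9]+)\./\2\3Z - \4 - \1./'
}

preview_transform "scan_20210415_093012_0001.pdf"
# prints: 20210415093012Z - 0001 - scan.pdf
```

Because the pattern is anchored with ^ and expects lowercase letters followed by an underscore, a name like Invoice.pdf comes back unchanged.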

The Rest of the Story

Background

So it’s like this. I was looking for a way to archive the mail (snail mail) I was receiving, and stumbled upon The Paperless Project. That was two years ago, and Paperless has been running solidly ever since.

More recently, while checking for updates, I discovered that the original developer of The Paperless Project had stepped down, and that the project now continues as Paperless-ng (ng, as in "next generation"). I was very pleased to see the project carry on!

And the icing on the cake: the Linuxserver.io team picked up Paperless-ng and provides its own Open Container Initiative (OCI) image for it. Whenever I want to run an OCI container application, either with Docker or Podman, I always check fleet.linuxserver.io, and I highly recommend you check out the other images they offer! :-)

Outcome Vision

The outcome vision for this post is to provide an example Docker Compose file you can use, along with the setup steps to get your very own Paperless-ng server running. You will then be able to tweak the files to your liking. The documentation for Paperless-ng is great; however, I hope to distill it down even further to jump-start your Paperless-ng server setup.

Prerequisites

Paperless-ng can be installed multiple ways, but I’m going to focus on the OCI image setup using Docker.

Docker Installation

So hopefully it’s obvious that you’ll need to install Docker and Docker Compose on your Linux Mint (or other Ubuntu-based) machine.

  1. Open up a terminal, and add Docker's official GPG key:
    
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    
  2. Add the Docker repository:
    
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
      focal stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    
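    Note: focal stable is hard-coded above because Linux Mint 20.x is based on Ubuntu 20.04 (focal), and lsb_release -cs on Mint returns Mint's own codename rather than Ubuntu's. If you are using vanilla Ubuntu, you can change it to $(lsb_release -cs) stable.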
  3. Install Docker and Docker Compose:
    
    sudo apt update
    sudo apt install docker-ce docker-ce-cli containerd.io docker-compose
    
  4. Add your Linux user to the docker group so you do not have to type sudo when running Docker commands:
    
    sudo groupadd docker
    sudo usermod -aG docker $USER
    
  5. Log out and log back in to apply the group change to your Linux user account.

Source: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
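If you would rather not hard-code focal in step 2, a small sketch like the one below reads the Ubuntu base release from /etc/os-release instead. It assumes your distribution sets UBUNTU_CODENAME there, as Ubuntu and Linux Mint do, and falls back to VERSION_CODENAME otherwise; the function name ubuntu_codename is my own invention, not part of Docker's tooling.

```shell
# Hypothetical helper (not from Docker's instructions): print the Ubuntu release
# codename to use in docker.list. The body runs in a subshell "( ... )" so the
# variables sourced from os-release don't leak into your terminal session.
ubuntu_codename() (
  os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] || exit 1          # no readable os-release file: give up
  unset UBUNTU_CODENAME VERSION_CODENAME  # ignore any values already in the environment
  . "$os_release"
  # Linux Mint sets UBUNTU_CODENAME to its Ubuntu base (focal on Mint 20.x).
  printf '%s\n' "${UBUNTU_CODENAME:-$VERSION_CODENAME}"
)

ubuntu_codename   # on Linux Mint 20.x this prints focal
```

After pasting the function into the same terminal, you could use $(ubuntu_codename) stable in place of focal stable in the echo command from step 2.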

Starting up Your Paperless Server

Thanks to the magic of containerization, getting started is quite simple.

Do me a favor, and set things up exactly as I show below. Then, once you see things working, tweak the heck out of it. 😇️
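The one value you shouldn't copy blindly is 1001, which the files below use for PUID/PGID in docker-compose.yaml and USERMAP_UID/USERMAP_GID in docker-compose.env. The comments there ask you to run id and read off the uid and gid values; as a minimal sketch, id -u and id -g print just those numbers. The function name print_compose_ids is made up for illustration.

```shell
# Hypothetical helper: print the numbers to use for PUID/PGID
# (and USERMAP_UID/USERMAP_GID). `id -u` and `id -g` print the same
# values as the uid= and gid= fields in the output of plain `id`.
print_compose_ids() {
  printf 'PUID: %s\n' "$(id -u)"
  printf 'PGID: %s\n' "$(id -g)"
}

print_compose_ids
```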

  1. Open up your terminal, and let’s create a home where all the Paperless-ng files and folders can live:

    
    mkdir -p ~/Docker/paperless-ng
    
  2. Now create the docker-compose.env and docker-compose.yaml files in that directory:

    
    touch ~/Docker/paperless-ng/docker-compose.env
    
    
    touch ~/Docker/paperless-ng/docker-compose.yaml
    
  3. Copy the contents below into the docker-compose.env file:

    
    # Environment variables to set for Paperless
    # Commented out variables will be replaced with a default within Paperless.
    #
    # In addition to what you see here, you can also define any values you find in
    # paperless.conf.example here.  Values like:
    #
    # * PAPERLESS_PASSPHRASE
    # * PAPERLESS_CONSUMPTION_DIR
    # * PAPERLESS_CONSUME_MAIL_HOST
    #
    # ...are all explained in that file but can be defined here, since the Docker
    # installation doesn't make use of paperless.conf.
    
    # New to paperless-ng
    COMPOSE_PROJECT_NAME=paperless
    
    # Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
    TZ=America/New_York
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_ALL=en_US.UTF-8
    
    # I found this setting by searching the GitHub issues
    # Set the date format for dates within scanned documents
    PAPERLESS_DATE_ORDER=MDY
    
    # You can change the default user and group id to a custom one
    USERMAP_UID=1001
    USERMAP_GID=1001
    
    # Additional languages to install for text recognition.  Note that this is
    # different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
    # default language used when guessing the language from the OCR output.
    PAPERLESS_OCR_LANGUAGES=deu heb por spa
    
    # Set Paperless to use SSL for the web interface.
    # Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
    #PAPERLESS_USE_SSL=false
    
    
    # Sample paperless.conf
    # As this file contains passwords it should only be readable by the user
    # running paperless.
    
    
    ###############################################################################
    ####                         Paths & Folders                               ####
    ###############################################################################
    
    # This is where your documents should go to be consumed.  Make sure that it exists
    # and that the user running the paperless service can read/write its contents
    # before you start Paperless.
    #PAPERLESS_CONSUMPTION_DIR="/consume"
    
    
    # You can specify where you want the SQLite database to be stored instead of
    # the default location of /data/ within the install directory.
    #PAPERLESS_DBDIR=/path/to/database/file
    
    
    # Override the default MEDIA_ROOT here.  This is where all files are stored.
    # The default location is /media/documents/ within the install folder.
    #PAPERLESS_MEDIADIR=/path/to/media
    
    
    # Override the default STATIC_ROOT here.  This is where all static files
    # created using "collectstatic" manager command are stored.
    #PAPERLESS_STATICDIR=""
    
    
    # Override the MEDIA_URL here.  Unless you're hosting Paperless under a subpath
    # like /paperless/, you probably don't need to change this.
    #PAPERLESS_MEDIA_URL="/media/"
    
    # Override the STATIC_URL here.  Unless you're hosting Paperless under a
    # subpath like /paperless/, you probably don't need to change this.
    #PAPERLESS_STATIC_URL="/static/"
    
    
    # These values are required if you want paperless to check a particular email
    # box every 10 minutes and attempt to consume documents from there.  If you
    # don't define a HOST, mail checking will just be disabled.
    #PAPERLESS_CONSUME_MAIL_HOST=""
    #PAPERLESS_CONSUME_MAIL_PORT=""
    #PAPERLESS_CONSUME_MAIL_USER=""
    #PAPERLESS_CONSUME_MAIL_PASS=""
    
    # Override the default IMAP inbox here. If not set Paperless defaults to
    # "INBOX".
    #PAPERLESS_CONSUME_MAIL_INBOX="INBOX"
    
    # Any email sent to the target account that does not contain this text will be
    # ignored.
    #PAPERLESS_EMAIL_SECRET=""
    
    # Specify a filename format for the document (directories are supported)
    # Use the following placeholders:
    # * {correspondent}
    # * {title}
    # * {created}
    # * {added}
    # * {tags[KEY]} If your tags conform to key_value or key-value
    # * {tags[INDEX]} If your tags are strings, select the tag by index
    # Uniqueness of filenames is ensured, as an incrementing counter is attached
    # to each filename.
    #PAPERLESS_FILENAME_FORMAT=""
    
    ###############################################################################
    ####                              Security                                 ####
    ###############################################################################
    
    # Controls whether django's debug mode is enabled. Disable this on production
    # systems. Debug mode is enabled by default.
    PAPERLESS_DEBUG="false"
    
    
    # Paperless can be instructed to attempt to encrypt your PDF files with GPG
    # using the PAPERLESS_PASSPHRASE specified below.  If however you're not
    # concerned about encrypting these files (for example if you have disk
    # encryption locally) then you don't need this and can safely leave this value
    # un-set.
    #
    # One final note about the passphrase.  Once you've consumed a document with
    # one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and
    # can't properly export documents that were encrypted with an old passphrase if
    # you've since changed it to a new one.
    #
    # The default is to not use encryption at all.
    #PAPERLESS_PASSPHRASE="secret"
    
    
    # The secret key has a default that should be fine so long as you're hosting
    # Paperless on a closed network.  However, if you're putting this anywhere
    # public, you should change the key to something unique and verbose.
    #PAPERLESS_SECRET_KEY="change-me"
    
    
    # If you're planning on putting Paperless on the open internet, then you
    # really should set this value to the domain name you're using.  Failing to do
    # so leaves you open to HTTP host header attacks:
    # https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
    #
    # Just remember that this is a comma-separated list, so "example.com" is fine,
    # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
    #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"
    
    # If you decide to use the Paperless API in an ajax call, you need to add your
    # servers to the list of allowed hosts that can do CORS calls. By default
    # Paperless allows calls from localhost:8080, but if you'd like to change that,
    # you can set this value to a comma-separated list.
    #PAPERLESS_CORS_ALLOWED_HOSTS="localhost:8080,example.com,localhost:8000"
    
    # To host paperless under a subpath url like example.com/paperless you set
    # this value to /paperless. No trailing slash!
    #
    # https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
    #PAPERLESS_FORCE_SCRIPT_NAME=""
    
    # If you are using alternative authentication means or are just using paperless
    # as a single user on a small private network, this option allows you to disable
    # user authentication if you set it to "true"
    #PAPERLESS_DISABLE_LOGIN="true"
    
    ###############################################################################
    ####                          Software Tweaks                              ####
    ###############################################################################
    
    # After a document is consumed, Paperless can trigger an arbitrary script if
    # you like.  This script will be passed a number of arguments for you to work
    # with.  The default is blank, which means nothing will be executed.  For more
    # information, take a look at the docs:
    # http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
    #PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
    
    # By default, when clicking on a document within the web interface, the
    # browser will prompt the user to save the document to disk. By setting this to
    # "true", the document will instead be opened in the browser, if possible.
    #PAPERLESS_INLINE_DOC="false"
    
    # By default, paperless will check the document text for document date information.
    # Uncomment the line below to enable checking the document filename for date
    # information. The date order can be set to any option as specified in
    # https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
    # checked first, and if nothing is found, the document text will be checked
    # as normal.
    #PAPERLESS_FILENAME_DATE_ORDER="YMD"
    
    # Sometimes devices won't create filenames which can be parsed properly
    # by the filename parser (see
    # https://paperless.readthedocs.io/en/latest/guesswork.html).
    #
    # This setting allows you to specify a list of transformations
    # in regular expression syntax, which are passed in order to re.sub.
    # Transformation stops after the first match, so at most one transformation
    # is applied.
    #
    # Syntax is a JSON array of dictionaries containing "pattern" and "repl"
    # as keys.
    #
    # The example below transforms filenames created by a Brother ADS-2400N
    # document scanner in its standard configuration 'Name_Date_Count', so that
    # count is used as title, name as tag and date can be parsed by paperless.
    #PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]
    
    #
    # The following values use sensible defaults for modern systems, but if you're
    # running Paperless on a low-resource device (like a Raspberry Pi), modifying
    # some of these values may be necessary.
    #
    
    
    # By default, Paperless will attempt to use all available CPU cores to process
    # a document, but if you would like to limit that, you can set this value to
    # an integer:
    PAPERLESS_OCR_THREADS=2
    
    
    # Customize the default language that tesseract will attempt to use when
    # parsing documents.  It should be a 3-letter language code consistent with ISO
    # 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
    PAPERLESS_OCR_LANGUAGE=eng
    
    
    # On smaller systems, or even in the case of Very Large Documents, the consumer
    # may explode, complaining about how it's "unable to extend pixel cache".  In
    # such cases, try setting this to a reasonably low value, like 32.  The
    # default is to use whatever is necessary to do everything without writing to
    # disk, and units are in megabytes.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_MEMORY_LIMIT".
    #PAPERLESS_CONVERT_MEMORY_LIMIT=0
    
    
    # Similar to the memory limit, if you've got a small system and your OS mounts
    # /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    # /home/your_user/tmp or something.  ImageMagick will use this as scratch space
    # when crunching through very large documents.
    #
    # For more information on how to use this value, you should probably search
    # the web for "MAGICK_TMPDIR".
    #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
    
    
    # By default the conversion density setting for documents is 300 DPI; in some
    # cases it has proven useful to configure a lesser value.
    # This setting has a high impact on the physical size of tmp page files,
    # the speed of document conversion, and can affect the accuracy of OCR
    # results. Individual results can vary and this setting should be tested
    # thoroughly against the documents you are importing to see if it has any
    # impacts either negative or positive.
    # Testing on limited document sets has shown a setting of 200 can cut the
    # size of tmp files by 1/3, and speed up conversion by up to 4x
    # with little impact to OCR accuracy.
    PAPERLESS_CONVERT_DENSITY=300
    
    
    # (This setting is ignored on Linux where inotify is used instead of a
    # polling loop.)
    # The number of seconds that Paperless will wait between checking
    # PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
    # rarely, you may want to use a higher value than the default (10).
    #PAPERLESS_CONSUMER_LOOP_TIME=10
    
    
    # By default Paperless stops consuming a document if no language can be
    # detected. Set to true to consume documents even if the language detection
    # fails.
    PAPERLESS_FORGIVING_OCR="true"
    
    
    # By default Paperless does not OCR a document if the text can be retrieved from
    # the document directly. Set to true to always OCR documents.
    PAPERLESS_OCR_ALWAYS="true"
    
    
    ###############################################################################
    ####                            Interface                                  ####
    ###############################################################################
    
    # Override the default UTC time zone here.
    # See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
    # for details on how to set it.
    PAPERLESS_TIME_ZONE=America/New_York
    
    
    # If set, Paperless will show document filters per financial year.
    # The dates must be in the format "mm-dd", for example "07-15" for July 15.
    #PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
    #PAPERLESS_FINANCIAL_YEAR_END="mm-dd"
    
    
    # The number of items on each page in the web UI.  This value must be a
    # positive integer, but if you don't define one in paperless.conf, a default of
    # 100 will be used.
    #PAPERLESS_LIST_PER_PAGE=100
    
    
    # The number of years for which a correspondent will be included in the recent
    # correspondents filter.
    #PAPERLESS_RECENT_CORRESPONDENT_YEARS=1
    
    ###############################################################################
    ####                     Third-Party Binaries                              ####
    ###############################################################################
    
    # There are a few external software packages that Paperless expects to find on
    # your system when it starts up.  Unless you've done something creative with
    # their installation, you probably won't need to edit any of these.  However,
    # if you've installed these programs somewhere where simply typing the name of
    # the program doesn't automatically execute it (i.e. the program isn't in your
    # $PATH), then you'll need to specify the literal path for that program here.
    
    # Convert (part of the ImageMagick suite)
    #PAPERLESS_CONVERT_BINARY=/usr/bin/convert
    
    # Ghostscript
    #PAPERLESS_GS_BINARY=/usr/bin/gs
    
    # Unpaper
    #PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper
    
    # Optipng (for optimising thumbnail sizes)
    #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
    
    GitLab: docker-compose.example.env

  4. Copy the contents below into the docker-compose.yaml file:

    
    ---
    # ##############################################################################################
    # #                                                                                            #
    # #                                        Paperless                                           #
    # #                                                                                            #
    # ##############################################################################################
    
    # https://hub.docker.com/r/jonaswinkler/paperless-ng
    # https://hub.docker.com/r/linuxserver/paperless-ng
    
    # NOTE:
    # https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
    # - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
    # - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents
    
    version: "2.1"
    
    networks:
      paperless:
        external: false
    
    services:
    
      paperlessweb:
        image: "linuxserver/paperless-ng:1.5.0"
        env_file: docker-compose.env
        environment:
          PUID: 1001  # Open a terminal, and type "id", and set this number to the uid value
          PGID: 1001  # Open a terminal, and type "id", and set this number to the gid value
        volumes:
          - ./paperless-config/:/config/
          - ./paperless-data/:/data/
        # Default Paperless port: 8000
        ports:
          - "8000:8000"
        container_name: paperless-vonawesomeweb
        restart: unless-stopped
        networks:
          - paperless
    
    Gitlab: docker-compose.example.yaml

  5. Now open up your terminal again:

    1
    
    cd ~/Docker/paperless-ng
    
    1
    
    docker-compose up -d
    
  6. Monitor the initialization process:

    1
    
    docker-compose logs -f
    

    In the output, you should see:

    1
    2
    3
    4
    5
    6
    
    ...
    paperless-vonawesomeweb | [2021-12-29 00:50:41 -0500] [365] [INFO] Server is ready. Spawning workers
    paperless-vonawesomeweb | [2021-12-29 00:50:41,917] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume
    ...
    paperless-vonawesomeweb | [2021-12-29 00:51:12,451] [INFO] [paperless.sanity_checker] Sanity checker detected no issues.
    ...
    
  7. Press Ctrl+c to exit from the “docker-compose logs” output

  8. Open up your favorite (Firefox 😉️) browser, and navigate to http://localhost:8000, and use the username admin and password admin to log in for the first time.

    Upon a successful login, you’ll be greeted with the Paperless-ng Dashboard:

    /posts/going-paperless/01-paperless-dashboard.png
    Paperless-ng Dashboard

  9. If you want to change your password, you can do that in the admin section:

    /posts/going-paperless/02-admin.png
    Admin Section
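If you’d rather script the log check than eyeball it, here’s a minimal Python sketch (standard library only). `missing_markers` looks for the two readiness messages from the example log output above, and `wait_for_http` polls http://localhost:8000 until the web UI answers. Both are hypothetical helper names, and the marker strings are copied from that example output, so they may differ in other Paperless-ng versions.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request

# Substrings copied from the example log output above; assumed (not guaranteed)
# to stay the same across Paperless-ng versions.
READY_MARKERS = (
    "Server is ready. Spawning workers",
    "Sanity checker detected no issues.",
)


def missing_markers(log_text: str) -> list[str]:
    """Return the readiness markers that do not appear in `log_text` yet."""
    return [marker for marker in READY_MARKERS if marker not in log_text]


def wait_for_http(url: str = "http://localhost:8000",
                  timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server sends any HTTP response, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # An error status (401, 500, ...) still proves the web server is answering.
            return True
        except OSError:
            # Connection refused/reset or timed out (URLError is an OSError subclass):
            # the container is probably still starting, so wait and retry.
            time.sleep(interval)
    return False
```

For example, save the logs with `docker-compose logs --no-color > paperless.log`; `missing_markers(open("paperless.log").read())` returns an empty list once both messages have appeared, and `wait_for_http()` returns `True` once the login page is reachable.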

Your First Upload

OK, you’ve logged in and you’re at the Dashboard. Now try uploading a file.

The end result is that the file is “processed” and added to your document library. Over time you’ll add more and more documents, and thanks to the Optical Character Recognition (OCR) built into Paperless-ng, you’ll be able to find a specific document among the haystack of all the others with a simple text search.

Let the uploading begin!

  1. On the Dashboard, either drag and drop the file you would like to upload, or click Browse files under the Upload new documents section:

    /posts/going-paperless/03-upload-a-file.png
    Upload a File

  2. Wait for your file to upload

    /posts/going-paperless/04-upload-a-file.png
    File Processing...

    /posts/going-paperless/05-upload-a-file.png
    Upload Finished

  3. And that’s it! Your document is uploaded and processed. Click Documents in the dashboard menu to view, edit, or search your documents:

    /posts/going-paperless/06-view-documents.png
    View Documents

    /posts/going-paperless/07-view-documents.png
    Document Library
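Drag and drop is fine for a few files, but Paperless-ng also has a REST API. Here’s a minimal Python sketch (standard library only) that uploads a file without the browser, assuming the `/api/documents/post_document/` endpoint with a multipart form field named `document` (plus an optional `title` field) and HTTP basic authentication, as described in the Paperless-ng API documentation; double-check those details against the docs for your version. `build_upload_request` and `upload` are hypothetical helper names.

```python
from __future__ import annotations

import base64
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(base_url: str, username: str, password: str,
                         filename: str, content: bytes,
                         title: str | None = None) -> urllib.request.Request:
    """Build a multipart/form-data POST for the (assumed) Paperless-ng upload endpoint."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b""
    if title is not None:
        # Optional form field; assumed to set the document title.
        body += (f"--{boundary}\r\n"
                 'Content-Disposition: form-data; name="title"\r\n\r\n'
                 f"{title}\r\n").encode()
    # The file itself goes in the "document" field (assumed field name).
    body += (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
             f"Content-Type: {content_type}\r\n\r\n").encode()
    body += content + f"\r\n--{boundary}--\r\n".encode()
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Basic {credentials}",
        },
    )


def upload(path: str, base_url: str = "http://localhost:8000",
           username: str = "admin", password: str = "admin") -> str:
    """Upload one file and return the server's response body as text."""
    file = Path(path)
    request = build_upload_request(base_url, username, password,
                                   file.name, file.read_bytes())
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode()
```

For example, `upload("invoice.pdf")` sends the file with the default admin/admin login; pass your own password once you’ve changed it.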

Bonus: Upload via FTP

So instead of uploading everything via the web interface, how would you like to go directly from your scanner to Paperless?

Well, you’re in luck, there’s a Docker Compose for that! 🤓️

Prerequisites

  1. You’ll need a printer (or scanner) that can send files via File Transfer Protocol (FTP), and it will need to be connected to your Local Area Network (LAN).
  2. Also, make a note of your computer’s IP address; you’ll need it later.
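Not sure what your IP address is? Your operating system’s network settings (or `ip addr` on Linux, `ipconfig` on Windows) will tell you, and here’s a small standard-library Python sketch as an alternative. Connecting a UDP socket sends no packets; it only makes the operating system pick the local address it would route from. The probe address 192.0.2.1 is a reserved documentation address (TEST-NET-1) used purely as a routing target, and `guess_lan_ip` / `looks_like_lan_address` are hypothetical helper names.

```python
import ipaddress
import socket


def guess_lan_ip(probe_host: str = "192.0.2.1", probe_port: int = 9) -> str:
    """Return the local IPv4 address used to reach `probe_host`, or 127.0.0.1."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # UDP connect() only selects a route and a local address; nothing is sent.
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. offline) or sockets unavailable: fall back to loopback.
        return "127.0.0.1"


def looks_like_lan_address(ip: str) -> bool:
    """True for private, non-loopback addresses such as 192.168.x.x or 10.x.x.x."""
    address = ipaddress.ip_address(ip)
    return address.is_private and not address.is_loopback
```

If `guess_lan_ip()` returns something like 192.168.x.x, that’s probably the address you want; with a VPN, Docker bridges, or both Wi-Fi and Ethernet active, double-check it against your network settings.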

Starting up Your Paperless Server with FTP

  1. Complete steps 1 - 3 in the Starting up Your Paperless Server section above, then move on to the next step below.

  2. Copy the contents below into the docker-compose.yaml file:

    Important: Make sure you change the line PUBLICHOST: "192.168.10.7" to use your computer’s IP address, and replace the FTP_USER_PASS value with a password of your own.

    ---
    # ##############################################################################################
    # #                                                                                            #
    # #                                     Paperless + FTP                                        #
    # #                                                                                            #
    # ##############################################################################################
    
    # https://hub.docker.com/r/jonaswinkler/paperless-ng
    # https://hub.docker.com/r/linuxserver/paperless-ng
    
    # NOTE:
    # https://paperless.readthedocs.io/en/latest/utilities.html#re-running-your-tagging-and-correspondent-matchers
    # - To rescan documents for new tags, shell into the container and run --> python3 manage.py document_retagger
    # - To rescan documents for new correspondents, shell into the container and run --> python3 manage.py document_correspondents
    
    version: "2.1"
    
    networks:
      paperless:
        external: false
      paperlessftp:
        external: false
    
    
    services:
    
      paperlessweb:
        image: "linuxserver/paperless-ng:1.5.0"
        env_file: docker-compose.env
        environment:
          PUID: 1001  # Open a terminal, and type "id", and set this number to the uid value
          PGID: 1001  # Open a terminal, and type "id", and set this number to the gid value
        volumes:
          - ./paperless-config/:/config/
          - ./paperless-data/:/data/
        # Default Paperless port: 8000
        ports:
          - "8000:8000"
        container_name: paperless-vonawesomeweb
        restart: unless-stopped
        networks:
          - paperless
    
      paperlesspureftpd:
        # https://github.com/stilliard/docker-pure-ftpd
        image: "stilliard/pure-ftpd:buster-latest"
        environment:
          PUBLICHOST: "192.168.10.7" # Change this to the IP address of **your** machine
          FTP_USER_NAME: paperless  # Can be whatever you want
          FTP_USER_PASS: Super-Secret-Password-that-you-should-change
          FTP_USER_UID: 1001 # Open a terminal, and type "id", and set this number to the uid value
          FTP_USER_GID: 1001 # Open a terminal, and type "id", and set this number to the gid value
          FTP_USER_HOME: /home/paperless/ftp-files
        ports:
          - "21:21"
          - "30000-30009:30000-30009"
        volumes:
          - ./paperless-data/consume/:/home/paperless/ftp-files/
        networks:
          - paperlessftp
        container_name: paperless-vonawesomepureftpd
        restart: unless-stopped
        depends_on:
          - paperlessweb
    
    Gitlab: docker-compose.ftp-example.yaml

  3. Now open up your terminal again:

    cd ~/Docker/paperless-ng
    
    docker-compose up -d
    
  4. Monitor the initialization process:

    docker-compose logs -f
    

    Again, look for output similar to the following:

    ...
    paperless-vonawesomeweb | [2021-12-29 00:50:41 -0500] [365] [INFO] Server is ready. Spawning workers
    paperless-vonawesomeweb | [2021-12-29 00:50:41,917] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume
    ...
    paperless-vonawesomeweb | [2021-12-29 00:51:12,451] [INFO] [paperless.sanity_checker] Sanity checker detected no issues.
    ...
    

    Additionally, make sure the FTP server has started up. You should see output similar to this:

    paperless-vonawesomepureftpd | Creating user...
    paperless-vonawesomepureftpd | Password:
    paperless-vonawesomepureftpd | Enter it again:
    paperless-vonawesomepureftpd |  root user give /home/paperless/ftp-files directory 1001 owner
    paperless-vonawesomepureftpd | Setting default port range to: 30000:30009
    paperless-vonawesomepureftpd | Setting default max clients to: 5
    paperless-vonawesomepureftpd | Setting default max connections per ip to: 5
    paperless-vonawesomepureftpd | Starting Pure-FTPd:
    paperless-vonawesomepureftpd |   pure-ftpd  -l puredb:/etc/pure-ftpd/pureftpd.pdb -E -j -R -P 192.168.10.7   -p 30000:30009 -c 5 -C 5
    
  5. Press Ctrl+c to exit from the “docker-compose logs” output.
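The pure-ftpd startup line in those logs is worth a second look. Its `-P` value is the address pure-ftpd hands to clients for passive-mode data connections, and `-p` is the passive port range; if `-P` does not match your computer’s IP address, or the range differs from the ports published in docker-compose.yaml, a printer can typically log in but then fail to transfer the file. This Python sketch pulls both values out of the log text, assuming the log format shown above; `parse_pureftpd_startup` and `passive_mode_problems` are hypothetical helper names.

```python
def parse_pureftpd_startup(log_text: str) -> dict:
    """Extract the -P (public host) and -p (passive ports) values from pure-ftpd's startup line.

    Assumes the format shown in the example log output above.
    """
    for line in log_text.splitlines():
        # Strip the "container-name |" prefix that docker-compose adds, if present.
        args = line.split("|", 1)[-1].split()
        if not args or args[0] != "pure-ftpd":
            continue
        settings = {}
        # Pair each token with the next one, so "-P 192.168.10.7" -> ("-P", "192.168.10.7").
        for flag, value in zip(args, args[1:]):
            if flag == "-P":
                settings["public_host"] = value
            elif flag == "-p":
                low, high = value.split(":")
                settings["passive_ports"] = (int(low), int(high))
        return settings
    return {}


def passive_mode_problems(settings: dict, expected_host: str,
                          published_ports: tuple = (30000, 30009)) -> list:
    """List mismatches between pure-ftpd's settings and what docker-compose.yaml publishes."""
    problems = []
    if settings.get("public_host") != expected_host:
        problems.append(f"PUBLICHOST is {settings.get('public_host')!r}, expected {expected_host!r}")
    if settings.get("passive_ports") != tuple(published_ports):
        problems.append(f"passive ports are {settings.get('passive_ports')!r}, "
                        f"but docker-compose.yaml publishes {tuple(published_ports)!r}")
    return problems
```

For example, pass it the text of `docker-compose logs --no-color paperlesspureftpd`; an empty list from `passive_mode_problems` means the advertised address and port range line up with your settings.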

You can now scan documents directly from your network printer, and Paperless-ng will pick up the file, process it, and add it to your document library.
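Before wiring up the printer, you can test the FTP path end to end from your computer. This Python sketch uses the standard library’s `ftplib` to log in with the FTP_USER_NAME and FTP_USER_PASS values from docker-compose.yaml and store a file in the FTP user’s home directory, which is mounted onto Paperless’s consume folder. `send_file` and `ftp_upload` are hypothetical helper names, and it is an assumption that the FTP user lands in FTP_USER_HOME right after login.

```python
import ftplib
import io
from pathlib import Path


def send_file(ftp, filename: str, data: bytes) -> str:
    """Store `data` as `filename` in the current directory of a logged-in FTP session."""
    return ftp.storbinary(f"STOR {filename}", io.BytesIO(data))


def ftp_upload(host: str, user: str, password: str, path: str) -> str:
    """Upload the file at `path` to the FTP server and return the server's reply."""
    file = Path(path)
    # ftplib uses passive mode by default, which relies on the 30000-30009
    # port range published for the pure-ftpd container.
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login(user, password)
        # Assumed: after login the user is in FTP_USER_HOME, i.e. the folder
        # mounted onto ./paperless-data/consume/.
        return send_file(ftp, file.name, file.read_bytes())
```

For example, `ftp_upload("192.168.10.7", "paperless", "Super-Secret-Password-that-you-should-change", "test.pdf")` uses the placeholder values from the compose file; swap in your own IP address and password. If the file shows up in your document library a little later, the FTP route works.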

Conclusion

There is so much more you can do with your documents, like tagging or labeling them automatically to keep them better organized.

However, this post is just meant to get you set up and started. The documentation for Paperless-ng is phenomenal, so if you want to dig deeper, check that out.

As always, questions and comments are welcome.

I hope you find this helpful! :-)

End of Line.