Mender Production update from v2.2.0 to v2.3.0

It looks like something has gone wrong during the update.

My recommendation is to save your work, check out a fresh 2.3.0 branch, and see if it works.

If this does work, start by adding your own yaml [override files](https://docs.docker.com/compose/extends/).
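For illustration, a minimal override file might look like this (a sketch only; the image tag is an assumption, and the service name must match the one in your docker-compose.yml):

```yaml
# docker-compose.override.yml -- example only.
# Service names must match those in the main docker-compose.yml.
version: '2.1'
services:
    mender-workflows-server:
        image: mendersoftware/workflows:mender-2.3.0b1
```

Running `docker-compose up -d` then merges this file on top of the base configuration without editing the base file itself.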

This should give a clearer indication, and give us a clean base to debug this from, I think :slight_smile:

Hi
I have tried to bring up a server from scratch on branch 2.3.0b1, and I have the same problem as the one I reported.

I am using Ubuntu 18.04 with 16 GB RAM on VMware.

The mender-driver and minio containers continue to report healthy, and I get the same problem when uploading files for a release: some uploads finish correctly, and others do not.

I await your suggestions :slight_smile:

@blackdevice can you please provide us with the logs from all the containers (if possible), or in particular from create-artifact-worker and workflows-server?
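If it helps, here is a sketch of one way to collect them, assuming the stack is managed with docker-compose and run from the directory holding your compose files:

```shell
# Save the logs of every service in the composition, one file per service.
mkdir -p mender-logs
for svc in $(docker-compose ps --services); do
    docker-compose logs --no-color "$svc" > "mender-logs/$svc.log"
done
```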

Hi,

Of course, thank you very much for the help!

Here you can download a zip of the container logs:
http://cloud.interactionfactory.es/index.php/s/Y381QdRc9kO2tAE

Thank you!

@blackdevice the logs are not consistent, I believe; the create-artifact-worker logs show two artifact generation requests:

    1. 2020-02-27T10:36:46Z
    2. 2020-02-27T10:59:27Z

The logs from the other containers (I’m interested in api-gateway and deployments) are from the 2nd of March, though.

Sorry,
I have captured the logs again over a longer time span, to see if it helps you find the problem.
Thank you!

http://cloud.interactionfactory.es/index.php/s/QgwpmeHF5L2U5K3

@blackdevice as I can see from the logs, the first generation request completed successfully and the resulting artifact was uploaded to the deployments service. The second request completed as well, but I cannot see any upload to the deployments service.

May I ask you to check the “workflows” mongo database, and the “jobs” collection in particular? It contains the result of the asynchronous job execution with actual outputs.
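For example, something along these lines should print the stored job documents (a sketch: the container name is an assumption, and I am projecting on `workflow_name` and `status`, which the collection is indexed on):

```shell
# Print the job documents from the workflows database.
# "mender-mongo" is an assumed container name; check `docker ps` for yours.
docker exec -it mender-mongo \
    mongo workflows --eval 'db.jobs.find({}, {workflow_name: 1, status: 1}).pretty()'
```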

Hi,
Thanks for your Help,
I’m not sure how I can get that level of detail;
once connected to the db, I can list the collections, but I don’t know how to inspect the documents in them.
MongoDB Conf file

root@7f1c1645616f:/etc# cat mongod.conf.orig
# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1


# how the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo

#security:

#operationProfiling:

#replication:

#sharding:

## Enterprise-Only Options:

#auditLog:

#snmp:

root@7f1c1645616f:/etc# cd /var/log/mongodb/
no data in this directory..

MongoDB workflows jobs collections stats

> db.jobs.stats()
{
        "ns" : "workflows.jobs",
        "size" : 13429,
        "count" : 2,
        "avgObjSize" : 6714,
        "storageSize" : 32768,
        "capped" : false,
        "wiredTiger" : {
                "metadata" : {
                        "formatVersion" : 1
                },
                "creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
                "type" : "file",
                "uri" : "statistics:table:collection-24--3612238366615292736",
                "LSM" : {
                        "bloom filter false positives" : 0,
                        "bloom filter hits" : 0,
                        "bloom filter misses" : 0,
                        "bloom filter pages evicted from cache" : 0,
                        "bloom filter pages read into cache" : 0,
                        "bloom filters in the LSM tree" : 0,
                        "chunks in the LSM tree" : 0,
                        "highest merge generation in the LSM tree" : 0,
                        "queries that could have benefited from a Bloom filter that did not exist" : 0,
                        "sleep for LSM checkpoint throttle" : 0,
                        "sleep for LSM merge throttle" : 0,
                        "total size of bloom filters" : 0
                },
                "block-manager" : {
                        "allocations requiring file extension" : 0,
                        "blocks allocated" : 0,
                        "blocks freed" : 0,
                        "checkpoint size" : 4096,
                        "file allocation unit size" : 4096,
                        "file bytes available for reuse" : 12288,
                        "file magic number" : 120897,
                        "file major version number" : 1,
                        "file size in bytes" : 32768,
                        "minor version number" : 0
                },
                "btree" : {
                        "btree checkpoint generation" : 0,
                        "column-store fixed-size leaf pages" : 0,
                        "column-store internal pages" : 0,
                        "column-store variable-size RLE encoded values" : 0,
                        "column-store variable-size deleted values" : 0,
                        "column-store variable-size leaf pages" : 0,
                        "fixed-record size" : 0,
                        "maximum internal page key size" : 368,
                        "maximum internal page size" : 4096,
                        "maximum leaf page key size" : 2867,
                        "maximum leaf page size" : 32768,
                        "maximum leaf page value size" : 67108864,
                        "maximum tree depth" : 0,
                        "number of key/value pairs" : 0,
                        "overflow pages" : 0,
                        "pages rewritten by compaction" : 0,
                        "row-store internal pages" : 0,
                        "row-store leaf pages" : 0
                },
                "cache" : {
                        "bytes currently in the cache" : 227,
                        "bytes read into cache" : 51,
                        "bytes written from cache" : 0,
                        "checkpoint blocked page eviction" : 0,
                        "data source pages selected for eviction unable to be evicted" : 0,
                        "eviction walk passes of a file" : 0,
                        "eviction walk target pages histogram - 0-9" : 0,
                        "eviction walk target pages histogram - 10-31" : 0,
                        "eviction walk target pages histogram - 128 and higher" : 0,
                        "eviction walk target pages histogram - 32-63" : 0,
                        "eviction walk target pages histogram - 64-128" : 0,
                        "eviction walks abandoned" : 0,
                        "eviction walks gave up because they restarted their walk twice" : 0,
                        "eviction walks gave up because they saw too many pages and found no candidates" : 0,
                        "eviction walks gave up because they saw too many pages and found too few candidates" : 0,
                        "eviction walks reached end of tree" : 0,
                        "eviction walks started from root of tree" : 0,
                        "eviction walks started from saved location in tree" : 0,
                        "hazard pointer blocked page eviction" : 0,
                        "in-memory page passed criteria to be split" : 0,
                        "in-memory page splits" : 0,
                        "internal pages evicted" : 0,
                        "internal pages split during eviction" : 0,
                        "leaf pages split during eviction" : 0,
                        "modified pages evicted" : 0,
                        "overflow pages read into cache" : 0,
                        "page split during eviction deepened the tree" : 0,
                        "page written requiring cache overflow records" : 0,
                        "pages read into cache" : 1,
                        "pages read into cache after truncate" : 0,
                        "pages read into cache after truncate in prepare state" : 0,
                        "pages read into cache requiring cache overflow entries" : 0,
                        "pages requested from the cache" : 0,
                        "pages seen by eviction walk" : 0,
                        "pages written from cache" : 0,
                        "pages written requiring in-memory restoration" : 0,
                        "tracked dirty bytes in the cache" : 0,
                        "unmodified pages evicted" : 0
                },
                "cache_walk" : {
                        "Average difference between current eviction generation when the page was last considered" : 0,
                        "Average on-disk page image size seen" : 0,
                        "Average time in cache for pages that have been visited by the eviction server" : 0,
                        "Average time in cache for pages that have not been visited by the eviction server" : 0,
                        "Clean pages currently in cache" : 0,
                        "Current eviction generation" : 0,
                        "Dirty pages currently in cache" : 0,
                        "Entries in the root page" : 0,
                        "Internal pages currently in cache" : 0,
                        "Leaf pages currently in cache" : 0,
                        "Maximum difference between current eviction generation when the page was last considered" : 0,
                        "Maximum page size seen" : 0,
                        "Minimum on-disk page image size seen" : 0,
                        "Number of pages never visited by eviction server" : 0,
                        "On-disk page image sizes smaller than a single allocation unit" : 0,
                        "Pages created in memory and never written" : 0,
                        "Pages currently queued for eviction" : 0,
                        "Pages that could not be queued for eviction" : 0,
                        "Refs skipped during cache traversal" : 0,
                        "Size of the root page" : 0,
                        "Total number of pages currently in cache" : 0
                },
                "compression" : {
                        "compressed pages read" : 0,
                        "compressed pages written" : 0,
                        "page written failed to compress" : 0,
                        "page written was too small to compress" : 0,
                        "raw compression call failed, additional data available" : 0,
                        "raw compression call failed, no additional data available" : 0,
                        "raw compression call succeeded" : 0
                },
                "cursor" : {
                        "bulk-loaded cursor-insert calls" : 0,
                        "close calls that result in cache" : 0,
                        "create calls" : 0,
                        "cursor operation restarted" : 0,
                        "cursor-insert key and value bytes inserted" : 0,
                        "cursor-remove key bytes removed" : 0,
                        "cursor-update value bytes updated" : 0,
                        "cursors reused from cache" : 0,
                        "insert calls" : 0,
                        "modify calls" : 0,
                        "next calls" : 0,
                        "open cursor count" : 0,
                        "prev calls" : 0,
                        "remove calls" : 0,
                        "reserve calls" : 0,
                        "reset calls" : 0,
                        "search calls" : 0,
                        "search near calls" : 0,
                        "truncate calls" : 0,
                        "update calls" : 0
                },
                "reconciliation" : {
                        "dictionary matches" : 0,
                        "fast-path pages deleted" : 0,
                        "internal page key bytes discarded using suffix compression" : 0,
                        "internal page multi-block writes" : 0,
                        "internal-page overflow keys" : 0,
                        "leaf page key bytes discarded using prefix compression" : 0,
                        "leaf page multi-block writes" : 0,
                        "leaf-page overflow keys" : 0,
                        "maximum blocks required for a page" : 0,
                        "overflow values written" : 0,
                        "page checksum matches" : 0,
                        "page reconciliation calls" : 0,
                        "page reconciliation calls for eviction" : 0,
                        "pages deleted" : 0
                },
                "session" : {
                        "object compaction" : 0
                },
                "transaction" : {
                        "update conflicts" : 0
                }
        },
        "nindexes" : 3,
        "totalIndexSize" : 98304,
        "indexSizes" : {
                "_id_" : 32768,
                "status" : 32768,
                "workflow_name" : 32768
        },
        "ok" : 1
}
> exit

Thanks for your help!
When I try to upload a file to create an artifact, I get this issue on the mongo container:

2020-03-16T19:25:12.006+0000 I COMMAND [conn4] command workflows.jobs command: insert { insert: "jobs", ordered: true, lsid: { id: UUID("841dd009-dda6-4e96-9621-0523d672cb51") }, $db: "workflows" } ninserted:0 writeConflicts:1 exception: E11000 duplicate key error collection: workflows.jobs index: _id_ dup key: { : "5e6fd297b0e223cea63b0b98" } code:DuplicateKey numYields:0 reslen:196 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_msg 134ms

I think this issue is similar to mine:

@blackdevice this is unfortunately a bug; may I ask you to run the latest master of the workflows service to verify if it solves your issue?

image: mendersoftware/workflows:master_218ef4159f6ff21361d58d4267b07e107f8a3e35

Ok I understand.

Yes, I can try that version, but I’m not sure how to do it.

If you indicate the steps to follow, I will try it!

Is it enough to do docker pull + stop + rm?

@blackdevice

edit docker-compose.yaml, and replace the mender-workflows-server image here:

#
# mender-workflows-server
#
mender-workflows-server:
    image: mendersoftware/workflows:master_218ef4159f6ff21361d58d4267b07e107f8a3e35

Then, tear down your docker composition and start it again.
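Assuming you drive the stack with plain docker-compose (if you use the production setup's ./run wrapper, substitute it for docker-compose below), that would be roughly:

```shell
docker-compose pull mender-workflows-server   # fetch the new image
docker-compose down                           # stop and remove the containers
docker-compose up -d                          # recreate the composition, detached
```

Data stored in named volumes survives `down`; only `down -v` would remove it.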

Thank you very much for your help!

Done the change,

Apparently I don’t see any change in the problem; the same thing keeps happening when trying to upload releases of more than 1 MB. The file upload reaches 100%, but it stays there.
I’ve tried uploading almost everything: it’s fine below 1 MB, but everything above that size doesn’t work.


@blackdevice I think there are two separate issues here. If you have a mender artifact that you’re trying to upload, and it reaches 100% but doesn’t complete, try this patch:

That fixed the upload issue for me. I think the other issue is with uploading files to create artifacts using the web UI, which is what @tranchitella was trying to fix with the updated workflows image. I tried that image as well, and I still couldn’t create an artifact from a file on the web UI; I’m going to try to collect the logs and send them when I have time.

Hi,
Thanks! Unfortunately, the proxy modification doesn’t fix the problem for me. I have only tried modifying the value in the common file, without making it permanent, but it doesn’t seem to be related to my upload problem.
Apparently the same thing happens: above 1 MB, the upload reaches 100% but is not effective.

@blackdevice, can you please describe your setup? Are you using a load balancer or reverse proxy in front of the Mender back-end? If so, you need to increase the idle timeout of the load balancer / reverse proxy in front of the API gateway to at least 300 seconds to process big artifacts.
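For reference, if the component in front were nginx, the timeout bump would look something like this (values and the upstream name are illustrative, not from your setup):

```nginx
# Example nginx reverse-proxy settings for large artifact uploads.
location / {
    proxy_pass           https://mender-api-gateway;
    proxy_read_timeout   300s;  # idle timeout waiting for the backend
    proxy_send_timeout   300s;
    client_max_body_size 0;     # do not cap the upload size at the proxy
}
```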

This behavior is something we are going to improve/fix in the next Mender release (2.3) due early June, with beta release at the beginning of May.

Hi tranchitella, I’m happy to read your answer :slight_smile:
We have no balancer at the moment, nor a proxy in front of the server.

Our configuration for now is simple: we have Mender self-hosted, with SSL certificates to make it secure (this has never given us any major problem, and the boards connect perfectly).

As I say, the problem is that for files above 1 MB, the upload reaches 100% but the file does not arrive in minio, nor is it reflected in the web panel.

@blackdevice does the problem affect uploads of ready-to-use mender artifacts (.mender file extension)? We are releasing the new Mender beta in a few days; it contains several improvements related to file uploads, and hopefully it will solve your issue.

Hi Tranchitella,
No, the problem does not affect .mender artifacts;
it only happens with any other type of file!

You can upload anything below 1 MB,
but above 1 MB everything uploads to 100% and is not reflected in minio or in the artifacts web panel!

If you want us to try a beta before you release it, to see if it solves the problem,
that is no problem for us!

Thanks for your help.