[SOLVED] Indexing problem: files larger than 10 MB

Hello

I'm using Seafile Pro 6.1.4 with office preview and indexing enabled.
The system runs quite well, but PDFs larger than 10 MB won't be indexed.

Here is some output from Elasticsearch:

[09/04/2017 09:24:18] extracting 6529a6e2-1309-47a2-a395-237d90d236aa /Zeitschriften/CT/2014/ct1406.pdf…
Syntax Error: Invalid XRef entry
Internal Error: xref num 3317 not found but needed, try to reconstruct
Syntax Error: Invalid XRef entry
Syntax Error: Top-level pages object is wrong type (null)
Command Line Error: Wrong page range given: the first page (1) can not be after the last page (0).
[09/04/2017 09:24:18] successfully extracted /Zeitschriften/CT/2014/ct1406.pdf

My seafevents.conf:

[DATABASE]
type = sqlite3
path = /home/seafile/seafile/pro-data/seafevents.db
[INDEX FILES]
enabled = true
interval = 20m
index_office_pdf = true
lang = german
[OFFICE CONVERTER]
enabled = true
workers = 2
max-pages = 500
max-size = 50
[SEAHUB EMAIL]
enabled = true
interval = 30m

Smaller PDFs can be indexed. Is there a limitation?

There is no such limitation. Judging from the error log, the problem is likely with that specific PDF file.

I don't think so. For some tests I created a PDF with PDF24 Creator (a free tool), and whenever the size is larger than 10 MB the indexing fails.
If the file is compressed further and ends up smaller than 10 MB, the indexing works.
Another test with this PDF (15 MB) also failed; see the logs below.

[09/05/2017 07:49:29] extracting 6529a6e2-1309-47a2-a395-237d90d236aa /Zeitschriften/lightroom_reference.pdf…
Syntax Error: Couldn’t find trailer dictionary
Syntax Error: Couldn’t find trailer dictionary
Syntax Error: Couldn’t read xref table

Is the file now searchable? I'm not sure whether these errors necessarily lead to an unindexed file.

The file isn't searchable. Is the file searchable on your system? Is there a way to get more details or logs?

Having a look at my home server, search also doesn't find any files larger than 10 MB there.

So it actually looks like files larger than 10 MB won't be indexed at all.
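
To see which files would be affected by such a limit, something like the following could help. It is only a rough sketch: it assumes you have a locally synced copy of the library as plain files, and the function name `find_large_pdfs` is made up for illustration, not part of Seafile.

```python
import os

TEN_MB = 10 * 1024 * 1024  # the size at which indexing seems to stop


def find_large_pdfs(root, limit_bytes=TEN_MB):
    """Return a sorted list of (path, size) for PDFs under root larger than limit_bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits)
```

Calling `find_large_pdfs("/path/to/synced/library")` returns the PDFs above the threshold, which you can then compare against what the search actually finds.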

(The provided Lightroom file is not being indexed either.)

Thanks for your reply.
Does anybody know how to change this?

Modify the seafes config file and set office_size_limit to more than 10 MB. That worked for me.

I updated seafevents.conf in the [INDEX FILES] section, but it doesn't take effect.

[INDEX FILES]
enabled = true
interval = 60m
office_size_limit = 50

You need to update the seafes module's config file, not the seafevents one.

class SeafesConfig(..):
    def __init__(self):

Thanks zming!!
I've updated

seafile/seafile-pro-server-6.1.4/pro/python/seafes/config.py

changing the line from

self.office_size_limit = 10 * 1024 * 1024 # 10 MB

to

self.office_size_limit = 40 * 1024 * 1024 # 40 MB

It works like a charm! Many thanks
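
For anyone wondering why a 15 MB PDF was skipped before and gets indexed now, here is a minimal sketch of how such a byte limit would behave. This is not the actual seafes code: the function `should_extract_text` is hypothetical, and whether the real comparison is `<=` or `<` is an assumption.

```python
# Hypothetical sketch, not the real seafes implementation.
DEFAULT_OFFICE_SIZE_LIMIT = 10 * 1024 * 1024  # 10 MB, the original value in config.py
RAISED_OFFICE_SIZE_LIMIT = 40 * 1024 * 1024   # 40 MB, the patched value


def should_extract_text(file_size, limit):
    """Return True if a file of file_size bytes is within the limit (assumed check)."""
    return file_size <= limit


size_15mb = 15 * 1024 * 1024  # roughly the size of the Lightroom test PDF

print(should_extract_text(size_15mb, DEFAULT_OFFICE_SIZE_LIMIT))  # False
print(should_extract_text(size_15mb, RAISED_OFFICE_SIZE_LIMIT))   # True
```

Note that the edit lives inside the versioned seafile-pro-server-6.1.4 directory, so it probably has to be repeated after an upgrade.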

@daniel.pan Could you add this option to be defined in the main config file (it’s easier and better for updates) or increase the default limit?

Yes, we will add such a config item.
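
Just to illustrate what such a config item could look like from the Python side, here is a rough sketch using the standard `configparser` module. The option name `office_size_limit`, its unit (MB), and the helper `read_office_size_limit` are assumptions for illustration; the name and section the developers end up using may differ.

```python
import configparser

DEFAULT_LIMIT_MB = 10  # matches the hard-coded default discussed in this thread


def read_office_size_limit(conf_text, section="INDEX FILES", option="office_size_limit"):
    """Return the size limit in bytes, falling back to 10 MB if the option is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    limit_mb = parser.getint(section, option, fallback=DEFAULT_LIMIT_MB)
    return limit_mb * 1024 * 1024


sample_conf = """
[INDEX FILES]
enabled = true
interval = 60m
office_size_limit = 50
"""

print(read_office_size_limit(sample_conf))  # 52428800
```

With the fallback in place, existing config files that don't set the option would keep the current 10 MB behaviour.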
