AI knowledge base: document vectorization fails on import

return0 · December 29, 2025, 6:49am

For the AI knowledge base, I import documents through the API, with a call like this:

app.rags.integratedManagementOfTheKnowled.addDocumentByBusinessId('aa', [{"fileName": 'aa', "size": size, "url": 'https://wd.hbxtxzl.cn/group1/M00/02/3E/CsiNamj4RdmAdH19ABkvYlzOmJY995.pdf', "type": ''}], {"chunkSeparator": ["\n"], "chunkSize": 1024, "chunkOverlap": 100}, {"chunkCleaning": True})
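For readers unfamiliar with the chunking options passed in the call above, here is a rough illustration of what `chunkSeparator`, `chunkSize`, and `chunkOverlap` usually mean. The platform's actual splitter is not shown in this thread, so `split_text` below is a hypothetical helper and only a generic character-based sketch, not the knowledge base's implementation.

```python
def split_text(text, separator="\n", chunk_size=1024, chunk_overlap=100):
    """Greedy character-based chunking (illustrative only, not the platform's code).

    Pieces obtained by splitting on `separator` are packed into chunks of at
    most `chunk_size` characters. Each new chunk starts with the last
    `chunk_overlap` characters of the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            # Close the current chunk and seed the next one with its tail.
            chunks.append(current)
            tail = current[-chunk_overlap:] if chunk_overlap else ""
            current = tail + separator + piece if tail else piece
        else:
            current = piece
        # A single piece may itself exceed chunk_size: hard-split it.
        while len(current) > chunk_size:
            chunks.append(current[:chunk_size])
            current = current[chunk_size - chunk_overlap:]
    if current:
        chunks.append(current)
    return chunks
```

With the values from the call above, no chunk would exceed 1024 characters and consecutive chunks would share up to 100 characters of context. None of this applies, of course, if the loader cannot extract any text from the file in the first place.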

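Recently I noticed that some documents imported this way never show up in the knowledge base.

So I tried downloading the document and importing it from the local copy instead.

With that approach, vectorization fails.

The error message is as follows: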

文档 processing failed: {‘code’: 60101, ‘message’: ‘Document loading failed [{file_path}https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf\]: {\‘code\’: 60101, \‘message\’: \‘Document loading failed [https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf{file_url}\]: {“code”: 20005, “reason”: "Element rags.NormalType.services.RemoteLoader [HTTP Document Download Service] failed. Error: {\\\‘code\\\’: 60101, \\\‘message\\\’: \\\‘Document loading failed [/tmp/tmp_4wba758.pdf{file_url}]: {“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’, \\\‘description\\\’: \\\‘Unable to load specified document file\\\’, \\\‘file_path\\\’: \\\’/tmp/tmp_4wba758.pdf\\\’, \\\‘error\\\’: \\\‘{“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’}“}\‘, \‘description\’: \‘Unable to load specified document file\’, \‘file_path\’: \‘https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf\’, \‘error\’: \’{“code”: 20005, “reason”: “Element rags.NormalType.services.RemoteLoader [HTTP Document Download Service] failed. Error: {\\\‘code\\\’: 60101, \\\‘message\\\’: \\\‘Document loading failed [/tmp/tmp_4wba758.pdf{file_url}]: {“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’, \\\‘description\\\’: \\\‘Unable to load specified document file\\\’, \\\‘file_path\\\’: \\\‘/tmp/tmp_4wba758.pdf\\\’, \\\‘error\\\’: \\\‘{“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. 
Error: Could not read Boolean object”}\\\’}”}\‘}’, ‘description’: ‘Unable to load specified document file’, ‘file_url’: ‘https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf’, ‘error’: ‘{\‘code\’: 60101, \‘message\’: \‘Document loading failed [https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf{file_url}\]: {“code”: 20005, “reason”: "Element rags.NormalType.services.RemoteLoader [HTTP Document Download Service] failed. Error: {\\\‘code\\\’: 60101, \\\‘message\\\’: \\\‘Document loading failed [/tmp/tmp_4wba758.pdf{file_url}]: {“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’, \\\‘description\\\’: \\\‘Unable to load specified document file\\\’, \\\‘file_path\\\’: \\\’/tmp/tmp_4wba758.pdf\\\’, \\\‘error\\\’: \\\‘{“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’}”}\‘, \‘description\’: \‘Unable to load specified document file\’, \‘file_path\’: \‘https://jit.hbxtxzl.cn/api/xtjit/hbxt/storages/services/StorageSvc/preview?file=4ff40de7673e07a70777b9b868fa51b0.pdf\’, \‘error\’: \’{“code”: 20005, “reason”: “Element rags.NormalType.services.RemoteLoader [HTTP Document Download Service] failed. Error: {\\\‘code\\\’: 60101, \\\‘message\\\’: \\\‘Document loading failed [/tmp/tmp_4wba758.pdf{file_url}]: {“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’, \\\‘description\\\’: \\\‘Unable to load specified document file\\\’, \\\‘file_path\\\’: \\\‘/tmp/tmp_4wba758.pdf\\\’, \\\‘error\\\’: \\\‘{“code”: 20005, “reason”: “Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. Error: Could not read Boolean object”}\\\’}”}\‘}’}
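The message above wraps the same failure several times, which makes the root cause hard to spot. As a reading aid, here is a minimal sketch that pulls out the chain of failing elements and the innermost error text. The helper names (`failing_elements`, `leaf_errors`) are hypothetical, the regular expressions are guesses based only on the message shape seen in this thread, and the sample string is an abbreviated version of that message.

```python
import re

# "Element <name> [<description>] failed. Error: ..."
_FAILED = re.compile(r"Element\s+(\S+)\s+\[([^\]]+)\]\s+failed\.\s+Error:\s+")

# An "Error:" whose text does not open another nested dict, i.e. a leaf message.
# The excluded characters are braces, backslash, and straight/curly quotes.
_LEAF = re.compile(r"""Error:\s+([^{}\\"'\u201c\u201d\u2018\u2019]+)""")


def failing_elements(message):
    """Return distinct (element, description) pairs in order of first appearance."""
    seen = []
    for match in _FAILED.finditer(message):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


def leaf_errors(message):
    """Return the distinct innermost error texts found in the message."""
    seen = []
    for match in _LEAF.finditer(message):
        text = match.group(1).strip()
        if text and text not in seen:
            seen.append(text)
    return seen


# Abbreviated sample modeled on the error message in this thread.
sample = (
    "Element rags.NormalType.services.RemoteLoader [HTTP Document Download Service] failed. "
    "Error: {'code': 60101, 'message': 'Document loading failed [/tmp/tmp_4wba758.pdf]: "
    "Element rags.NormalType.services.PdfLoader [PDF Document Loader Service] failed. "
    "Error: Could not read Boolean object'}"
)

for name, description in failing_elements(sample):
    print(name, "-", description)
print(leaf_errors(sample))
```

Read this way, the download step (`RemoteLoader`) only fails because the PDF loader (`PdfLoader`) underneath it fails with `Could not read Boolean object`, so the problem lies in parsing the file rather than in fetching it.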

So I would like to ask what causes this and how to fix it.

zjfjiayou · December 29, 2025, 7:42am

The PDF file is malformed or in an incompatible format. Send me the PDF privately and I'll take a look.

zjfjiayou · December 29, 2025, 8:52am

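I looked at the file. PDF parsing currently uses the pypdf library, which does not support your file's format. I also tried pymupdf, and it still could not extract any text. This is probably a design-style PDF brochure whose text was rasterized when the design software exported it, so no parsing library can get at the text, and it therefore cannot be vectorized. I suggest finding a docx version or the plain text of this content and vectorizing that. Alternatively, run OCR on the PDF to convert it to text, then vectorize the result.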
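To catch files like this before import, a local pre-check along the following lines may help. It is a minimal sketch, assuming pypdf is installed (`pip install pypdf`). The helper names and the `min_chars_per_page` threshold are arbitrary choices for illustration, and this is not how the knowledge base itself validates documents.

```python
def extract_page_texts(pdf_path):
    """Return one string per page, using pypdf (third-party dependency)."""
    from pypdf import PdfReader  # imported lazily so the helpers below work without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def has_text_layer(page_texts, min_chars_per_page=20):
    """Heuristic: True if the pages carry a meaningful amount of extractable text.

    The threshold is an arbitrary assumption. Rasterized brochures usually
    yield almost no characters, while text-based PDFs yield many per page.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return bool(page_texts) and total >= min_chars_per_page * len(page_texts)


def precheck(pdf_path):
    """Classify a PDF as 'ok', 'no text layer', or 'unreadable' before import."""
    try:
        texts = extract_page_texts(pdf_path)
    except Exception as exc:  # parse errors, missing file, or pypdf not installed
        return "unreadable: %s" % exc
    if has_text_layer(texts):
        return "ok"
    return "no text layer (use OCR or a docx/text source instead)"
```

Calling `precheck("brochure.pdf")` (a hypothetical path) before `addDocumentByBusinessId` would flag both cases discussed here: a file the parser cannot read at all, and a file that parses but contains only images of text.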