Changed BaseSiganetSpider class and added ma_sao_luis #887

Closed
wants to merge 2 commits into from


valeriow
Contributor

WHEN OPENING a Pull Request for a new scraper (spider), mark each item of the checklist below with an X.
DO NOT OPEN a new Pull Request before completing all the items below.

Checklist - New spider

  • You ran a full extraction with the spider locally and the returned data was correct.
  • You ran a period extraction (start_date and end_date set) at least once and the returned data was correct.
  • You checked that there are no errors in the logs (log_count/ERROR equal to zero).
  • You set the start_date class attribute in your spider to the date of the oldest Diário Oficial (official gazette) available on the city's page.
  • You made sure that every field that could be extracted was extracted, according to the documentation.

Description

Changed the BaseSiganetSpider class (siganet.py) so it supports both the site version used by ma_coroata and ma_sao_jose_dos_basilios (without breaking those spiders) and the version used by ma_sao_luis. Added a spider for ma_sao_luis.

I ran ma_sao_luis, ma_coroata and ma_sao_jose_dos_basilios successfully. Some comments on the WARNINGs that occurred while running São Luís/MA:

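  1. Duplicate checksums (identical files, same checksum):
    The database's UNIQUE constraint covers territory_id, date and file_checksum, so these rejections mean the same file was yielded more than once for the same date. This appears to be an error on the source site that produced duplicated files.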

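  2. File size larger than Scrapy's default warning threshold:
    Some files exceeded 32 MiB (33554432 bytes), the default value of Scrapy's DOWNLOAD_WARNSIZE setting. These are only warnings: the downloads continued, since the hard limit, DOWNLOAD_MAXSIZE, defaults to 1024 MiB. Consider adjusting these values in the settings.
    https://doc.scrapy.org/en/latest/topics/settings.html?highlight=download_max#std-reqmeta-download_maxsize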
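To pick a new threshold, the sizes reported in the log can be collected automatically. The sketch below is a hypothetical helper (`oversized_downloads` and `suggested_warnsize` are illustrative names, not part of the project or of Scrapy): it parses the "Expected response size" warnings and suggests a value for DOWNLOAD_WARNSIZE equal to the largest observed size plus an assumed 25% headroom, rounded up to a whole MiB.

```python
import re

# Matches Scrapy's warning when Content-Length exceeds DOWNLOAD_WARNSIZE.
# Group 1: announced size in bytes; group 2: requested URL.
EXPECTED_SIZE = re.compile(
    r"Expected response size \((\d+)\) larger than download warn size \(\d+\)"
    r" in request <GET ([^>]+)>"
)

MIB = 1024 * 1024


def oversized_downloads(log_lines):
    """Return (url, size_in_bytes) pairs for every 'Expected response size' warning."""
    results = []
    for line in log_lines:
        match = EXPECTED_SIZE.search(line)
        if match:
            results.append((match.group(2), int(match.group(1))))
    return results


def suggested_warnsize(log_lines, headroom=1.25):
    """Hypothetical helper: largest reported size times headroom, rounded up to a whole MiB.

    Returns None when the log has no size warnings.
    """
    sizes = [size for _, size in oversized_downloads(log_lines)]
    if not sizes:
        return None
    target = int(max(sizes) * headroom)
    return -(-target // MIB) * MIB  # ceiling division to the next MiB boundary
```

The resulting value could then be assigned to DOWNLOAD_WARNSIZE in the project settings or in the spider's `custom_settings`. DOWNLOAD_MAXSIZE does not need to change for these files: the largest one in the log (233851735 bytes, about 223 MiB) is below its 1024 MiB default.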

Here is the log with all the warnings:

2023-06-19 17:08:01 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (36065267) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/dom-prefeitura-municipal-de-sao-luis-ano-xliii-edicao-0337-assinado.pdf>.
2023-06-19 17:08:09 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/dom-prefeitura-municipal-de-sao-luis-ano-xliii-edicao-0337-assinado.pdf>.
2023-06-19 17:08:14 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (217620867) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-006-S-86F1.pdf>.
2023-06-19 17:08:25 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-006-S-86F1.pdf>.
2023-06-19 17:09:09 [ma_sao_luis] WARNING: Something wrong has happened when adding the gazette in the database. Date: 2015-02-23. File Checksum: dca5588767f19c8d775dd2e317cb2da6. Details: ('(sqlite3.IntegrityError) UNIQUE constraint failed: gazettes.territory_id, gazettes.date, gazettes.file_checksum',)
2023-06-19 17:09:11 [ma_sao_luis] WARNING: Something wrong has happened when adding the gazette in the database. Date: 2015-02-24. File Checksum: af33a169e8eaa7d7684e3c1002ae8bdd. Details: ('(sqlite3.IntegrityError) UNIQUE constraint failed: gazettes.territory_id, gazettes.date, gazettes.file_checksum',)
2023-06-19 17:09:13 [ma_sao_luis] WARNING: Something wrong has happened when adding the gazette in the database. Date: 2015-02-25. File Checksum: bbfebba5e3e64cc583ec987401896f5c. Details: ('(sqlite3.IntegrityError) UNIQUE constraint failed: gazettes.territory_id, gazettes.date, gazettes.file_checksum',)
2023-06-19 17:09:32 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (34254118) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-051-N-1B33.pdf>.
2023-06-19 17:09:34 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (37101525) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2016-053-S-00F8.pdf>.
2023-06-19 17:09:40 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-051-N-1B33.pdf>.
2023-06-19 17:09:43 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2016-053-S-00F8.pdf>.
2023-06-19 17:11:19 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (168516559) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-100-S-D6D3.pdf>.
2023-06-19 17:11:25 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-100-S-D6D3.pdf>.
2023-06-19 17:11:37 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (109342702) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2020-109-S-AFFD.pdf>.
2023-06-19 17:11:45 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2020-109-S-AFFD.pdf>.
2023-06-19 17:12:23 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (45330644) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-123-S-870D.pdf>.
2023-06-19 17:12:28 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-123-S-870D.pdf>.
2023-06-19 17:12:43 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (36148669) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-127-N-D58C.pdf>.
2023-06-19 17:12:46 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (34337939) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2021-128-S-5212.pdf>.
2023-06-19 17:12:52 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (34830033) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-130-N-6691.pdf>.
2023-06-19 17:12:52 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-127-N-D58C.pdf>.
2023-06-19 17:12:58 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-130-N-6691.pdf>.
2023-06-19 17:13:00 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2021-128-S-5212.pdf>.
2023-06-19 17:13:15 [ma_sao_luis] WARNING: Something wrong has happened when adding the gazette in the database. Date: 2013-07-22. File Checksum: 63d64bca1e92e700dcdcf5062a144711. Details: ('(sqlite3.IntegrityError) UNIQUE constraint failed: gazettes.territory_id, gazettes.date, gazettes.file_checksum',)
2023-06-19 17:14:04 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (90352492) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-165-S-IEVD.pdf>.
2023-06-19 17:14:09 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-165-S-IEVD.pdf>.
2023-06-19 17:14:11 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (69845184) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-169-N-GFTR.pdf>.
2023-06-19 17:14:19 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2022-169-N-GFTR.pdf>.
2023-06-19 17:15:52 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (233851735) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-236-S-04F1.pdf>.
2023-06-19 17:15:57 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-236-S-04F1.pdf>.
2023-06-19 17:15:58 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (85478923) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-239-S-0DC5.pdf>.
2023-06-19 17:16:05 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2018-239-S-0DC5.pdf>.
2023-06-19 17:16:14 [scrapy.core.downloader.handlers.http11] WARNING: Expected response size (50316556) larger than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2017-242-S-9BAE.pdf>.
2023-06-19 17:16:24 [scrapy.core.downloader.handlers.http11] WARNING: Received more bytes than download warn size (33554432) in request <GET https://painel.siganet.net.br/upload/0000000485/cms/publicacoes/diario/DOM-2017-242-S-9BAE.pdf>.
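The IntegrityError warnings in the log come from inserting a gazette whose (territory_id, date, file_checksum) already exists. One way to keep these out of the log is to skip repeated keys before the database insert. The sketch below is a hypothetical generator (`drop_duplicate_gazettes` is an illustrative name, and the item field names are assumed to match the constraint columns); in a Scrapy project the same check would more likely live in an item pipeline that raises `scrapy.exceptions.DropItem`.

```python
def drop_duplicate_gazettes(items):
    """Yield only the first gazette for each (territory_id, date, file_checksum) key.

    Mirrors the UNIQUE constraint reported in the log, so repeated entries are
    skipped instead of failing at insert time. Field names are assumptions.
    """
    seen = set()
    for item in items:
        key = (item["territory_id"], item["date"], item["file_checksum"])
        if key in seen:
            continue  # same file already seen for this territory and date
        seen.add(key)
        yield item
```

Keeping the first occurrence matches what the database already does today; the difference is only that the duplicate is discarded silently instead of producing a warning.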

valeriow mentioned this pull request Jun 19, 2023
trevineju linked an issue Jun 20, 2023 that may be closed by this pull request
trevineju self-requested a review January 7, 2025 19:37
@trevineju
Member

I picked up this PR to review it and noticed that the municipality has moved to a new website; the new integration is being done in #1353

Development

Successfully merging this pull request may close these issues.

São Luís-MA
2 participants