Package: repboxArt 0.1

Sebastian Kranz

repboxArt: Converting articles from PDF to text and managing and extracting basic information including tables

Authors: Sebastian Kranz

Source: repboxArt_0.1.tar.gz
Windows binaries: repboxArt_0.1.zip (r-4.5, r-4.4, r-4.3)
macOS binaries: repboxArt_0.1.tgz (r-4.4-any, r-4.3-any)
Linux binaries: repboxArt_0.1.tar.gz (r-4.5-noble, r-4.4-noble)
WebAssembly binaries: repboxArt_0.1.tgz (r-4.4-emscripten, r-4.3-emscripten)
Manual: repboxArt.pdf | repboxArt.html
API: repboxArt/json

# Install 'repboxArt' in R:
install.packages('repboxArt', repos = c('https://repboxr.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/repboxr/repboxart/issues

Score: 3.73 | 1 star | 2 packages | 6 scripts | 130 exports | 27 dependencies

Last updated 17 days ago from commit 46c18eb666 (on main). Checks: OK: 1, WARNING: 6. Indexed: yes.

Target             Result    Date
Doc / Vignettes    OK        Nov 06 2024
R-4.5-win          WARNING   Nov 06 2024
R-4.5-linux        WARNING   Nov 06 2024
R-4.4-win          WARNING   Nov 06 2024
R-4.4-mac          WARNING   Nov 06 2024
R-4.3-win          WARNING   Nov 06 2024
R-4.3-mac          WARNING   Nov 06 2024

Exports: activate_art_route, art_ensure_correct_dirs, art_extract_paren_type_from_tab_notes, art_extract_pdf_raw_tabs, art_extract_pdf_tabs, art_extract_regstats, art_get_html_files, art_get_pdf_files, art_has_html, art_has_pdf, art_has_two_col, art_html_tab_standardize, art_html_to_parts, art_load_tab_df, art_load_tabs, art_load_text_parts, art_load_txt_pages, art_locate_col_refs, art_locate_sentences, art_locate_tab_fig_refs, art_pdf_pages_to_parts, art_pdf_to_txt_pages, art_phrase_analysis, art_refs_analysis, art_reg_save_repdb, art_reg_stats_phrases, art_repair_two_col, art_repair_two_col_aer_pandp, art_save_basic_info, art_save_repdb_tab, art_tab_phrase_analysis, art_tabs_to_regs, art_text_parts_phrase_analysis, art_update_project, bind_rows_with_parent_fields, cell_df_find_num_paren_pairs, cell_df_to_tabhtml, check_and_repair_footnote_candidates, combine_short_paragraphs, combine_text_lines, ecta_parse_html, ecta_parse_html_table, ends.with.text, ensure_empty_types, example, extract_all_to_index_df, extract_num_from_sequence_text, extract_order_num_from_sequence_text, find_stars_str, first_repair_art_pdf_text, first.non.null, from_to, get_art_route, get_art_tab_cell_with_reg_info, get_phrases_def, guess_journ, html_tab_cell_row_panel_df, html_table_cells_from_all_tr, html_table_cells_from_tr, identify_figure_lines_on_page, is_aer_pandp, is_really_a_note_line, is.true, jpe_parse_html, jpe_parse_html_table, keep.overlapping.loc, left_join_overlap, line_df_find_figures, line_df_find_footnotes, line_df_find_junk_lines, line_df_find_page_header_footer, line_df_find_section_cands, line_df_find_sections, line_df_to_parts_df, lines_to_pages, lines_to_plines, load_art_route_parcels, load_phrases_def, loc_sep_lines, loc_to_df, locate_all_as_df, make_art_small_reg, make_phrases_def, map_loc_to_parent_loc, match_overlap, most.common, ms_parse_html, ms_parse_html_table, my_rank, na.false, na.remove, na.val, old.match_stat_to_reg_df, pdf_to_txt_pages, plines_to_lines, readRDS.or.null, refine_cell_df_and_add_panel_info, remove_nested_html_elements, remove.cols, remove.overlapping.loc, repbox_art_opts, repbox_journ_list, repbox.extract.pdf.images, restat_parse_html, restud_parse_html, restud_parse_html_table, rle_block, rle_cummax_block, rle_table, route_art_tab_finish_route, route_art_tab_set, save_rds_create_dir, sentences_merge_with_next, seq_rows, set_art_route, show_cell_df_html, substitute_wrong_pdf_txt_chars, tab_df_to_cell_df, tab_df_to_row_df, tabname_to_tabid, tabtitle_to_tabid, text_df_add_section_cols, text_df_standardize, text_parts_tab_fig_references, text_parts_to_loc, txt_locate_keywords, txt_locate_rx_keywords, txt_locate_typed_keywords, txt_phrase_analysis, version_repbox_art

Dependencies: cli, cpp11, data.table, digest, dplyr, ExtractSciTab, fansi, generics, glue, lifecycle, magrittr, pillar, pkgconfig, purrr, R6, repboxUtils, restorepoint, rlang, stringi, stringr, stringtools, tibble, tidyr, tidyselect, utf8, vctrs, withr