
Read Tables From Html Page By Changing The Id Using Python

I am using the html link below to read the table in the page: http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin=2016664 The last part of the link(al

Solution 1:

To fetch every page for each ID in a list, you can use the following example:

import requests
import pandas as pd
from io import StringIO

# allbin takes the ID; allcount is advanced by 25 for each following page
url = "http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin={}&allcount={}"


def get_info(ID, page=1):
    out = []
    while True:
        try:
            print("ID: {} Page: {}".format(ID, page))
            t = requests.get(url.format(ID, page), timeout=1).text
            # the results are in the fourth table; skip its first row
            df = pd.read_html(StringIO(t))[3].loc[1:, :]
            if len(df) == 0:
                # an empty table means there are no more pages for this ID
                break
            df.columns = ["NUMBER", "NUMBER", "TYPE", "FILE DATE"]
            df["ID"] = ID
            out.append(df)
            page += 25
        except requests.exceptions.ReadTimeout:
            # retry the same page after a timeout
            print("Timeout...")
            continue
    return out


list_of_ids = [2016664, 4257909, 4138920, 4533715]

dfs = []
for ID in list_of_ids:
    dfs.extend(get_info(ID))

df = pd.concat(dfs)
print(df)
df.to_csv("data.csv", index=None)

Prints:

            NUMBER  NUMBER                      TYPE   FILE DATE       ID
1    ALT 1469-1890     NaN                ALTERATION  00/00/0000  2016664
2    ALT 1313-1874     NaN                ALTERATION  00/00/0000  2016664
3      BN 332-1938     NaN           BUILDING NOTICE  00/00/0000  2016664
4      BN 636-1916     NaN           BUILDING NOTICE  00/00/0000  2016664
5  CO NB 1295-1923   (PDF)  CERTIFICATE OF OCCUPANCY  00/00/0000  2016664

...

And saves the result to data.csv.
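
Note that the loop in get_info retries a timed-out page without any limit, so an unresponsive server would keep it running forever. A minimal sketch of a bounded retry follows. The helper name fetch_text and the values chosen for retries, timeout, and backoff are assumptions for illustration, not part of the original answer; the get parameter exists only so the function can be tried without network access.

```python
import time

import requests


def fetch_text(url, retries=3, timeout=5, backoff=0.5, get=requests.get):
    """Return the response body, giving up after `retries` timed-out attempts.

    Hypothetical helper: the name and default values are illustrative only.
    """
    for attempt in range(1, retries + 1):
        try:
            return get(url, timeout=timeout).text
        except requests.exceptions.Timeout:
            if attempt == retries:
                # out of attempts: let the caller see the timeout
                raise
            # wait a little longer before each new attempt
            time.sleep(backoff * attempt)
```

Inside get_info, the requests.get(...).text call could then be swapped for fetch_text(url.format(ID, page)), with the except branch deciding whether to skip the ID or stop.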

Solution 2:

The code below extracts all the tables in a web page:

import pandas as pd

url = 'http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin=2016664'

df_list = pd.read_html(url)  # returns a list of DataFrames, one per table on the page

print(len(df_list))  # print the number of DataFrames

for df in df_list:  # loop through the list to print all tables
    print(df)
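
Printing every table works, but often only one of them is wanted. pd.read_html accepts a match argument that keeps only the tables whose text matches a string or regular expression. The sketch below uses a small made-up HTML snippet rather than the real page, so the markup and the choice of "FILE DATE" as the search text are assumptions; on the real page, which may nest tables, the match could return more than one result.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for a page with several tables
html = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>NUMBER</th><th>TYPE</th><th>FILE DATE</th></tr>
  <tr><td>ALT 1469-1890</td><td>ALTERATION</td><td>00/00/0000</td></tr>
  <tr><td>BN 332-1938</td><td>BUILDING NOTICE</td><td>00/00/0000</td></tr>
</table>
"""

# keep only the tables that contain the text "FILE DATE"
tables = pd.read_html(StringIO(html), match="FILE DATE")
actions = tables[0]
print(actions)
```

This avoids relying on a fixed position such as [3], which breaks if the page layout gains or loses a table.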
