pip install openai
Python interpreter will be restarted.
Collecting openai
Downloading openai-0.27.6-py3-none-any.whl (71 kB)
Collecting tqdm
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Requirement already satisfied: requests>=2.20 in /databricks/python3/lib/python3.9/site-packages (from openai) (2.26.0)
Collecting aiohttp
Downloading aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Requirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.9/site-packages (from requests>=2.20->openai) (3.2)
Requirement already satisfied: charset-normalizer~=2.0.0 in /databricks/python3/lib/python3.9/site-packages (from requests>=2.20->openai) (2.0.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.7)
Requirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.9/site-packages (from requests>=2.20->openai) (2021.10.8)
Collecting frozenlist>=1.1.1
Downloading frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)
Collecting yarl<2.0,>=1.0
Downloading yarl-1.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (269 kB)
Collecting async-timeout<5.0,>=4.0.0a3
Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting aiosignal>=1.1.2
Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting multidict<7.0,>=4.5
Downloading multidict-6.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
Requirement already satisfied: attrs>=17.3.0 in /databricks/python3/lib/python3.9/site-packages (from aiohttp->openai) (21.2.0)
Installing collected packages: multidict, frozenlist, yarl, async-timeout, aiosignal, tqdm, aiohttp, openai
Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 frozenlist-1.3.3 multidict-6.0.4 openai-0.27.6 tqdm-4.65.0 yarl-1.9.2
Python interpreter will be restarted.
pip install pdfplumber
Python interpreter will be restarted.
Collecting pdfplumber
Downloading pdfplumber-0.9.0-py3-none-any.whl (46 kB)
Collecting Wand>=0.6.10
Downloading Wand-0.6.11-py2.py3-none-any.whl (143 kB)
Collecting Pillow>=9.1
Downloading Pillow-9.5.0-cp39-cp39-manylinux_2_28_x86_64.whl (3.4 MB)
Collecting pdfminer.six==20221105
Downloading pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Requirement already satisfied: charset-normalizer>=2.0.0 in /databricks/python3/lib/python3.9/site-packages (from pdfminer.six==20221105->pdfplumber) (2.0.4)
Collecting cryptography>=36.0.0
Downloading cryptography-40.0.2-cp36-abi3-manylinux_2_28_x86_64.whl (3.7 MB)
Requirement already satisfied: cffi>=1.12 in /databricks/python3/lib/python3.9/site-packages (from cryptography>=36.0.0->pdfminer.six==20221105->pdfplumber) (1.14.6)
Requirement already satisfied: pycparser in /databricks/python3/lib/python3.9/site-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six==20221105->pdfplumber) (2.20)
Installing collected packages: cryptography, Wand, Pillow, pdfminer.six, pdfplumber
Attempting uninstall: cryptography
Found existing installation: cryptography 3.4.8
Not uninstalling cryptography at /databricks/python3/lib/python3.9/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-d3129cee-e864-4618-9c69-e1f62321ba74
Can't uninstall 'cryptography'. No files were found to uninstall.
Attempting uninstall: Pillow
Found existing installation: Pillow 8.4.0
Not uninstalling pillow at /databricks/python3/lib/python3.9/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-d3129cee-e864-4618-9c69-e1f62321ba74
Can't uninstall 'Pillow'. No files were found to uninstall.
Successfully installed Pillow-9.5.0 Wand-0.6.11 cryptography-40.0.2 pdfminer.six-20221105 pdfplumber-0.9.0
Python interpreter will be restarted.
import os import openai import pdfplumber import re import pandas as pd openai.api_key = "sk-uCplXmEuLWmxxxxxxxxxxxxxZqpcU7hXqhgWBbLaBj" results = [] #Open File an search for "Technical specifications" pdf_file = "/Workspace/Users/chhirsh@gmail.com/Alimak-Scando-650-Brochure-w-max-lift-logo.pdf" pattern = re.compile(r"Technical specifications", flags=re.IGNORECASE) with pdfplumber.open(pdf_file) as pdf: pages = pdf.pages for page_nr, pg in enumerate(pages, 0): content = pg.extract_text() for match in pattern.finditer(content): print('Keyword:',match.group(), '| Page:', page_nr, '| Location: ',content.index(match.group())) foundPageNum = page_nr
Keyword: TECHNICAL SPECIFICATIONS | Page: 3 | Location: 765
table_page = pdf.pages[foundPageNum] testouput01 = table_page.extract_text_simple(x_tolerance=3, y_tolerance=3) testouput01
Out[2]: 'ALIMAK SCANDO 650\nMain characteristics\n• Robust design and excellent comfort \n• Modular car design with extensions and multiple\ndoor and ramp options\n• Single and dual car configurations \n• Triple entrance possible at ground level and at any\nlanding level\n• A high-efficiency helical gearbox provides lower\npower consumption and reduced operational costs\n• ALC-II collective control system with group control\nand internal fault diagnostics system. Control panel\nmovable between two sides \n• Built-in stainless steel electrical cabinet maximises\ninternal hoist dimensions for increased lifting\ncapacity \n• Wide range of optional equipment and functions\n• Remote Monitoring System — A3\n• Fully compliant with EN, ANSI, AS, PB and the\nmajority of all national regulations\nTECHNICAL SPECIFICATIONS\nALIMAK SCANDO 650 DOL FC\nMotor control Direct-on-line (DOL) Frequency control (FC)\nMax. payload capacity 1,500–3,200 kg (3,300–7,054 lbs) * 1,500–3,200 kg (3,300–7,054 lbs) *\nMax. travelling speed 38 m/min. (125 fpm) 66 m/min.(216 fpm)\nMax. lifting height 250 / 400 m (825 / 1300 ft) * 250 / 400 m (825 / 1,300 ft) *\nCar width, internal 1.5 m (4’-11”) 1.5 m (4’-11”)\nCar length, internal 2.8–5.0 m (9\'- 2 1⁄4"-16\'- 4 3⁄4") 2.8–5.0 m (9\'- 2 1⁄4"-16\'- 4 3⁄4")\nCar height, internal 2.3 m (7’-6 1/2”) 2.3 m (7’-6 1/2”)\nMax. load space per car 17 m3(610 ft3) 17 m3(610 ft3)\nNo. of motors 2 x 11 kW 2 or 3 x 11 kW\nSafety device type ALIMAK GFD ALIMAK GFD\nPower supply range 400–500 V, 50 or 60 Hz, 3 phase 400–500 V, 50 or 60 Hz, 3 phase\nFuse ratings From 60 A From 63 A \nType of mast 650, tubular steel with integrated rack 650, tubular steel with integrated rack\nLength mast section 1.508 m (4’-11 3⁄8”) 1.508 m (4’-11 3⁄8”)\nWeight mast section with 1 rack 118 kg (260 lbs) 118 kg (260 lbs)\nRack module 5 5 \n* Increased payload capacity and lifting height and on request.\nFor other demands or specifications, please consult your Alimak representative.\n9\n1\n0\nwww.alimak.com 2\nb. \ne\nF\nN/\nE\n7 \nPictures are illustrative only and do not necessarily show the configuration of products on the market at a given point in time. Products must be used in conformity with safe practice and applicable 21\nstatutes, regulations, codes and ordinances. Specifications of products and equipment shown herein are subject to change without notice. Copyright © 2019 Alimak Group. All rights reserved. 1\nAlimak and Scando are registered trademarks of Alimak Group. '
response = openai.Completion.create( model="text-davinci-003", prompt=f"Please extract all mechanical measurment contianing numbers into the table format [section | Spec] from the input text.\n\n\"\"\"{page_text}\"\"\"", temperature=0.7, max_tokens=3000, top_p=1, frequency_penalty=0, presence_penalty=0 ) output = response.choices[0].text.split('\n') output
Out[2]: ['',
'',
'Section | Spec',
'Motor control | Direct-on-line (DOL) Frequency control (FC)',
'Max. payload capacity | 1,500–3,200 kg (3,300–7,054 lbs)',
'Max. travelling speed | 38 m/min. (125 fpm) 66 m/min.(216 fpm)',
'Max. lifting height | 250 / 400 m (825 / 1300 ft)',
'Car width, internal | 1.5 m (4’-11”)',
'Car length, internal | 2.8–5.0 m (9\'- 2 1⁄4"-16\'- 4 3⁄4")',
'Car height, internal | 2.3 m (7’-6 1/2”)',
'Max. load space per car | 17 m3(610 ft3)',
'No. of motors | 2 x 11 kW 2 or 3 x 11 kW',
'Safety device type | ALIMAK GFD',
'Power supply range | 400–500 V, 50 or 60 Hz, 3 phase',
'Fuse ratings | From 60 A',
'Type of mast | 650, tubular steel with integrated rack',
'Length mast section | 1.508 m (4’-11 3⁄8”)',
'Weight mast section with 1 rack | 118 kg (260 lbs)',
'Rack module | 5']
Showcases OpenAIs ability to parse out relevent information from an entire PDF page.