%pip install pandas_profiling
Python interpreter will be restarted.
Collecting pandas_profiling
Downloading pandas_profiling-3.6.6-py2.py3-none-any.whl (324 kB)
Collecting ydata-profiling
Downloading ydata_profiling-4.2.0-py2.py3-none-any.whl (352 kB)
Collecting multimethod<2,>=1.4
Downloading multimethod-1.9.1-py3-none-any.whl (10 kB)
Collecting typeguard<3,>=2.13.2
Downloading typeguard-2.13.3-py3-none-any.whl (17 kB)
Requirement already satisfied: numpy<1.24,>=1.16.0 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (1.20.3)
Requirement already satisfied: pandas!=1.4.0,<2,>1.1 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (1.3.4)
Requirement already satisfied: jinja2<3.2,>=2.11.1 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (2.11.3)
Collecting tqdm<5,>=4.48.2
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Requirement already satisfied: requests<3,>=2.24.0 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (2.26.0)
Collecting phik<0.13,>=0.11.1
Downloading phik-0.12.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (679 kB)
Collecting wordcloud>=1.9.1
Downloading wordcloud-1.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (460 kB)
Requirement already satisfied: seaborn<0.13,>=0.10.1 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (0.11.2)
Collecting visions[type_image_path]==0.7.5
Downloading visions-0.7.5-py3-none-any.whl (102 kB)
Collecting dacite>=1.8
Downloading dacite-1.8.1-py3-none-any.whl (14 kB)
Collecting imagehash==4.3.1
Downloading ImageHash-4.3.1-py2.py3-none-any.whl (296 kB)
Requirement already satisfied: matplotlib<4,>=3.2 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (3.4.3)
Collecting htmlmin==0.1.12
Downloading htmlmin-0.1.12.tar.gz (19 kB)
Collecting statsmodels<1,>=0.13.2
Downloading statsmodels-0.14.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.1 MB)
Collecting pydantic<2,>=1.8.1
Downloading pydantic-1.10.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Requirement already satisfied: scipy<1.11,>=1.4.1 in /databricks/python3/lib/python3.9/site-packages (from ydata-profiling->pandas_profiling) (1.7.1)
Collecting PyYAML<6.1,>=5.0.0
Downloading PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB)
Requirement already satisfied: pillow in /databricks/python3/lib/python3.9/site-packages (from imagehash==4.3.1->ydata-profiling->pandas_profiling) (8.4.0)
Collecting PyWavelets
Downloading PyWavelets-1.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.9 MB)
Collecting tangled-up-in-unicode>=0.0.4
Downloading tangled_up_in_unicode-0.2.0-py3-none-any.whl (4.7 MB)
Requirement already satisfied: attrs>=19.3.0 in /databricks/python3/lib/python3.9/site-packages (from visions[type_image_path]==0.7.5->ydata-profiling->pandas_profiling) (21.2.0)
Collecting networkx>=2.4
Downloading networkx-3.1-py3-none-any.whl (2.1 MB)
Requirement already satisfied: MarkupSafe>=0.23 in /databricks/python3/lib/python3.9/site-packages (from jinja2<3.2,>=2.11.1->ydata-profiling->pandas_profiling) (2.0.1)
Requirement already satisfied: python-dateutil>=2.7 in /databricks/python3/lib/python3.9/site-packages (from matplotlib<4,>=3.2->ydata-profiling->pandas_profiling) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /databricks/python3/lib/python3.9/site-packages (from matplotlib<4,>=3.2->ydata-profiling->pandas_profiling) (1.3.1)
Requirement already satisfied: pyparsing>=2.2.1 in /databricks/python3/lib/python3.9/site-packages (from matplotlib<4,>=3.2->ydata-profiling->pandas_profiling) (3.0.4)
Requirement already satisfied: cycler>=0.10 in /databricks/python3/lib/python3.9/site-packages (from matplotlib<4,>=3.2->ydata-profiling->pandas_profiling) (0.10.0)
Requirement already satisfied: six in /databricks/python3/lib/python3.9/site-packages (from cycler>=0.10->matplotlib<4,>=3.2->ydata-profiling->pandas_profiling) (1.16.0)
Requirement already satisfied: pytz>=2017.3 in /databricks/python3/lib/python3.9/site-packages (from pandas!=1.4.0,<2,>1.1->ydata-profiling->pandas_profiling) (2021.3)
Requirement already satisfied: joblib>=0.14.1 in /databricks/python3/lib/python3.9/site-packages (from phik<0.13,>=0.11.1->ydata-profiling->pandas_profiling) (1.0.1)
Collecting typing-extensions>=4.2.0
Downloading typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Requirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.9/site-packages (from requests<3,>=2.24.0->ydata-profiling->pandas_profiling) (3.2)
Requirement already satisfied: charset-normalizer~=2.0.0 in /databricks/python3/lib/python3.9/site-packages (from requests<3,>=2.24.0->ydata-profiling->pandas_profiling) (2.0.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.9/site-packages (from requests<3,>=2.24.0->ydata-profiling->pandas_profiling) (1.26.7)
Requirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.9/site-packages (from requests<3,>=2.24.0->ydata-profiling->pandas_profiling) (2021.10.8)
Collecting packaging>=21.3
Downloading packaging-23.1-py3-none-any.whl (48 kB)
Requirement already satisfied: patsy>=0.5.2 in /databricks/python3/lib/python3.9/site-packages (from statsmodels<1,>=0.13.2->ydata-profiling->pandas_profiling) (0.5.2)
Building wheels for collected packages: htmlmin
Building wheel for htmlmin (setup.py): started
Building wheel for htmlmin (setup.py): finished with status 'done'
Created wheel for htmlmin: filename=htmlmin-0.1.12-py3-none-any.whl size=27098 sha256=30de3840291857c6e7021f28eab72f4d8f79f8eac820d95400c573a2b2a42e8d
Stored in directory: /root/.cache/pip/wheels/1d/05/04/c6d7d3b66539d9e659ac6dfe81e2d0fd4c1a8316cc5a403300
Successfully built htmlmin
Installing collected packages: tangled-up-in-unicode, PyWavelets, networkx, multimethod, visions, typing-extensions, packaging, imagehash, wordcloud, typeguard, tqdm, statsmodels, PyYAML, pydantic, phik, htmlmin, dacite, ydata-profiling, pandas-profiling
Attempting uninstall: typing-extensions
Found existing installation: typing-extensions 3.10.0.2
Not uninstalling typing-extensions at /databricks/python3/lib/python3.9/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-f4e4a333-a81c-4e06-8f61-38cec448a697
Can't uninstall 'typing-extensions'. No files were found to uninstall.
Attempting uninstall: packaging
Found existing installation: packaging 21.0
Not uninstalling packaging at /databricks/python3/lib/python3.9/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-f4e4a333-a81c-4e06-8f61-38cec448a697
Can't uninstall 'packaging'. No files were found to uninstall.
Attempting uninstall: statsmodels
Found existing installation: statsmodels 0.12.2
Not uninstalling statsmodels at /databricks/python3/lib/python3.9/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-f4e4a333-a81c-4e06-8f61-38cec448a697
Can't uninstall 'statsmodels'. No files were found to uninstall.
Successfully installed PyWavelets-1.4.1 PyYAML-6.0 dacite-1.8.1 htmlmin-0.1.12 imagehash-4.3.1 multimethod-1.9.1 networkx-3.1 packaging-23.1 pandas-profiling-3.6.6 phik-0.12.3 pydantic-1.10.9 statsmodels-0.14.0 tangled-up-in-unicode-0.2.0 tqdm-4.65.0 typeguard-2.13.3 typing-extensions-4.6.3 visions-0.7.5 wordcloud-1.9.2 ydata-profiling-4.2.0
Python interpreter will be restarted.
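The log above shows that pandas_profiling 3.6.6 pulls in ydata-profiling 4.2.0, the package's newer name. A minimal sketch, assuming you want the notebook to work under either name: check which module is importable before importing from it. The helper `first_installed` and the variable `PROFILER_MODULE` are hypothetical names introduced here for illustration, not part of either library.

```python
import importlib.util
from typing import Optional, Sequence


def first_installed(candidates: Sequence[str]) -> Optional[str]:
    """Return the first top-level module name in `candidates` that can be imported."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer the newer package name and fall back to the legacy one.
PROFILER_MODULE = first_installed(["ydata_profiling", "pandas_profiling"])
print(PROFILER_MODULE)
```

If `PROFILER_MODULE` is not `None`, `importlib.import_module(PROFILER_MODULE).ProfileReport` should give the report class under whichever name is installed.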
import os
import uuid
import shutil
import pandas as pd
from pandas_profiling import ProfileReport
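# ProfileReport needs a pandas DataFrame, so convert the Spark table with toPandas().
# Note: toPandas() collects every row onto the driver.
testoutput = spark.read.table("default.silver_pittsburgh_rev_exp").toPandas()
# infer_dtypes=False tells the profiler to use the DataFrame's existing dtypes rather than inferring them.
df_profile = ProfileReport(testoutput, title="Pittsburgh Revenue and Spending", infer_dtypes=False)
# Render the report to an HTML string and display it inline in the notebook.
profile_html = df_profile.to_html()
displayHTML(profile_html)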
Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]
Generate report structure: 0%| | 0/1 [00:00<?, ?it/s]
Render HTML: 0%| | 0/1 [00:00<?, ?it/s]
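The cell imports os, uuid, and shutil but does not use them. One plausible use, sketched here as an assumption rather than the author's method, is to persist the rendered HTML string under a unique file name. The function `save_report_html` and its parameters are hypothetical.

```python
import os
import shutil
import tempfile
import uuid


def save_report_html(html: str, dest_dir: str, prefix: str = "profile") -> str:
    """Write `html` to a uniquely named temp file, then move it into `dest_dir`."""
    # A random hex suffix avoids clobbering reports from earlier runs.
    file_name = f"{prefix}_{uuid.uuid4().hex}.html"
    tmp_path = os.path.join(tempfile.gettempdir(), file_name)
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write(html)
    # Create the destination if needed, then move the finished file into place.
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, file_name)
    shutil.move(tmp_path, final_path)
    return final_path
```

In the notebook this could be called as `save_report_html(profile_html, dest_dir)`, where `dest_dir` is whatever location you choose to keep reports in.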