data-analysis

Differences

This shows you the differences between two versions of the page.

Old: data-analysis [2023/04/15 00:43], created by dblume
New: data-analysis [2024/01/15 09:06] (current), edit to section GnuPlot by dblume
Line 25: Line 25:
 Here's an example command, given the following two files, ''data.csv'' and ''gnuplot_instructions.gpi'':
  
-  gnuplot -e "f='data.csv'; t='Rating'" gnuplot_instructions.gpi
 +  gnuplot -e "f='data.csv'" gnuplot_instructions.gpi
  
 <file csv data.csv>
 +date,col1,col2
 1992-01-01,5,14
 1992-02-01,4,15
Line 58: Line 59:
 set xdata time
 set xlabel 'Date'
-set ylabel 'Value'
 +set xtics "1992-01-01", 2629746  # start, increment in seconds
 +#set ylabel 'Value'
  
 #
Line 70: Line 72:
 #
 set datafile sep ','
 +set key autotitle columnhead  # use the first line as title
 +firstrow = system('head -1 '.f. ' | tr "_," " "')
 +set xlabel word(firstrow, 1)
 +set ylabel word(firstrow, 2)
  
 #
Line 77: Line 83:
 #
 #plot f using 1:4 with lines, f using 1:3 with linespoints
-#plot f using 1:2 with lines title t, f using 1:3 with linespoints title 'Legend 2'
 +#plot f using 1:2 with lines, f using 1:3 with linespoints title 'Legend 2'
-plot f using 1:2 with linespoints title t
 +plot f using 1:2 with linespoints
 +</file> 
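The ''xtics'' increment of 2629746 seconds looks like the mean length of a Gregorian month (365.2425 days divided by 12, about 30.44 days). The page doesn't say so, so treat that as an assumption. A quick check in Python, using exact fractions so no floating-point rounding creeps in:

<code python>
from fractions import Fraction

# Assumption: the xtics increment is the mean Gregorian month (365.2425 days / 12).
DAYS_PER_YEAR = Fraction("365.2425")  # exact rational value, no float rounding
SECONDS_PER_DAY = 24 * 60 * 60        # 86400

seconds_per_month = DAYS_PER_YEAR * SECONDS_PER_DAY / 12
print(seconds_per_month)  # 2629746
</code>

Because real months run 28 to 31 days, ticks placed at a fixed increment in seconds will likely drift a little away from the 1st of each month over a long date range.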
 + 
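The ''firstrow'' lines read the CSV header with ''head -1'', turn underscores and commas into spaces with ''tr'', and pick labels out by position with gnuplot's ''word()'', which counts from 1. A rough Python equivalent can help you check which label each axis will get. The function names are made up for this sketch, and returning an empty string for an out-of-range position is a choice made here, not a claim about gnuplot:

<code python>
def header_words_from_line(line: str) -> list[str]:
    """Mimic `tr "_," " "` followed by splitting on whitespace."""
    cleaned = line.translate(str.maketrans("_,", "  "))
    return cleaned.split()

def header_words(path: str) -> list[str]:
    """Like `head -1 FILE`: read only the first line of the CSV."""
    with open(path, encoding="utf-8") as fh:
        return header_words_from_line(fh.readline())

def word(words: list[str], n: int) -> str:
    """1-based lookup; this sketch returns '' when n is out of range."""
    return words[n - 1] if 1 <= n <= len(words) else ""

first = header_words_from_line("date,col1,col2\n")
print(word(first, 1), word(first, 2))  # date col1
</code>

Replacing underscores probably keeps gnuplot's enhanced-text mode from rendering ''_'' as a subscript, but it also splits a name like ''unit_price'' into two words, which shifts the position of every later column.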
 +If you're making a "histogram" (actually a box chart of X,Y points, styled like a histogram), the file changes like this:
 +<file gnuplot gnuplot_instructions.gpi> 
 +# Mostly the same as above, until... 
 + 
 +# Set the histogram and box fill styles
 +set style histogram clustered gap 1 
 +set style fill solid border -1
  
 +# Finally, plot with boxes
 +plot f using 1:2 with boxes
 </file>
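The boxes plot draws one box per X,Y row, so it's only a true histogram if the CSV already holds a count per bin. If you start from raw values, a sketch like this could bin them first and write a two-column CSV for the same ''plot f using 1:2 with boxes'' line. The bin width and sample values are arbitrary, and the function names are hypothetical:

<code python>
from collections import Counter

def bin_counts(values, width):
    """Count values into fixed-width bins; returns sorted (bin_start, count) pairs."""
    counts = Counter((v // width) * width for v in values)
    return sorted(counts.items())

def to_csv(pairs, header=("bin", "count")):
    """Render (bin_start, count) pairs as CSV text with a header row."""
    lines = [",".join(header)]
    lines += [f"{start},{n}" for start, n in pairs]
    return "\n".join(lines) + "\n"

values = [3, 5, 4, 14, 15, 11]  # arbitrary sample data
print(to_csv(bin_counts(values, 5)), end="")
</code>

The header row matters because the script uses ''set key autotitle columnhead''. By default gnuplot centers each box on its X value, so you may prefer to write bin midpoints instead of bin starts.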
  
Line 91: Line 108:
         * NumPy: Fundamental, the other projects rely on it.
         * Matplotlib: Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
 +    * Plotly: Generates interactive JavaScript plots
  
 Here's [[https://github.com/Manishms18/Air-Passengers-Time-Series-Analysis/blob/master/Air_Passenger_with_explanations.ipynb|an example analysis of AirPassengers over time]].
Line 111: Line 129:
  
 **Get this**. NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, and random simulation.
 +
 +==== Plotly ====
 +
 +Undecided whether to use this; need to experiment. See [[https://towardsdatascience.com/matplotlib-vs-plotly-lets-decide-once-and-for-all-dc3eca9aa011|Matplotlib vs. Plotly: Let’s Decide Once and for All]]. plotly.py is an interactive, open-source, JavaScript-based graphing library for Python. Built on top of plotly.js, it is a high-level, declarative charting library with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, and financial charts.
  
 ==== Matplotlib ====
Line 116: Line 138:
 **Get this**. Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
  
 +====== Case Study: Time Series ======
 +
 +Data: [[https://www.kaggle.com/datasets/rakannimer/air-passengers|AirPassengers.csv]]
 +
 +===== VisiData =====
 +
 +  vd AirPassengers.csv
 +
 +^ Key ^ Action ^
 +| @ | Set column 1 to the date type |
 +| ! | Mark column 1 as a key column, so it's used for the X axis |
 +| l | Move to column 2 |
 +| # | Set column 2 to the integer type |
 +| . | Plot the current column against the key column (scatterplot) |
 +
 +{{:general:visidata_airpassengers.png?400|}}
 +
 +**Pros**: Super fast and easy.
 +**Cons**: Needs a terminal font that supports Braille characters. It's a scatterplot without lines.
 +
 +===== GnuPlot =====
 +
 +<file gnuplot AirPassengers.gpi>
 +# For ASCII on one full screen
 +#set term dumb `tput cols` `tput lines`*9/10
 +
 +# For a PNG file.
 +set term png size 900,400; set output 'AirPassengers.png'
 +
 +set timefmt '%Y-%m'
 +set xdata time
 +set format x '%Y'
 +set key autotitle columnhead  # use the first line for titles in legend
 +set xlabel 'Year'             # except for X, where we show Year not Month
 +set datafile sep ','
 +
 +# You can use: lines, points, linespoints
 +plot 'AirPassengers.csv' using 1:2 with lines
 +</file>
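The ''set timefmt'' line tells gnuplot to read the Month column as ''%Y-%m'' (for example ''1949-01''), and ''autotitle columnhead'' takes the legend title (''#Passengers'' in the text plot) from the header row. A stdlib Python sketch that parses the same shape of file with the same format string may help when debugging ''timefmt''. The function name is made up, and the two sample rows are only for illustration:

<code python>
import csv
import io
from datetime import datetime

def load_series(text: str, fmt: str = "%Y-%m"):
    """Return (header, [(datetime, int), ...]) from Month,value CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row holds the titles, like columnhead
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        month, value = row
        rows.append((datetime.strptime(month, fmt), int(value)))
    return header, rows

sample = "Month,#Passengers\n1949-01,112\n1949-02,118\n"
header, rows = load_series(sample)
print(header[1], rows[0][0].date(), rows[0][1])  # #Passengers 1949-01-01 112
</code>

For the real file you would pass in the text of ''AirPassengers.csv'' instead of ''sample''.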
 +
 +  gnuplot AirPassengers.gpi && explorer.exe AirPassengers.png
 +
 +{{:general:airpassengers.png?400|}}
 +
 +When you change ''term'' to ''dumb'', you get output like this (the exact size depends on your terminal):
 +
 +<code>
 +700 +-------------------------------------------------------------------------------+
 +    |      +          +      +          +      +          +      +          |
 +    |                                                           #Passengers ******* |
 +    |                                                                            *  |
 +600 |-+                                                                          *+-|
 +    |                                                                      *    * * |
 +    |                                                                     **    * * |
 +500 |-+                                                                 **    * *-|
 +    |                                                              **    * *    * * |
 +    |                                                        *     **    * *    * |
 +    |                                                       * *    **    *  ***    *|
 +400 |-+                                               **    * *    **     ***   +*|
 +    |                                          *     * *    * *  **  * **         |
 +    |                                          **    * *  *** *****  ***            |
 +    |                                          **   **  ***    * *    *             |
 +300 |-+                                  *     * ****   ***                       +-|
 +    |                             **    * *  **  ***    *                           |
 +    |                      **   ** *  *** ****    *                                 |
 +200 |-+              *     **  *    ***    *                                      +-|
 +    |         **  * * * ***  **     * *                                             |
 +    |  **   ** * ***  **                                                            |
 +    |*********  **      +      +          +      +          +      +          |
 +100 +-------------------------------------------------------------------------------+
 +  1949   1950  1951   1952   1953  1954   1955   1956  1957   1958   1959  1960   1961
 +                                          Year
 +</code>
 +
 +
 +
 +**Pros**: Fast and easy. Renders to text or PNG without much effort. Its text output is sometimes better than VisiData's.
 +**Cons**: Not very pretty without customization. The GPI file takes some tweaking.
 +
 +===== Matplotlib =====
 +
 +<code python>
 +import pandas as pd
 +import matplotlib.pyplot as plt
 +
 +# Load the CSV and turn Month values like "1949-01" into real dates
 +data = pd.read_csv('AirPassengers.csv')
 +data['Month'] = pd.to_datetime(data['Month'])
 +data = data.set_index('Month')  # the dates become the X axis
 +
 +plt.figure(figsize=(10, 5))
 +plt.xlabel("Year")
 +plt.ylabel("Airline Passengers")
 +plt.plot(data)
 +plt.show()
 +</code>
 +
 +{{:general:matplotlib_airpassengers.png?400|}}
 +
 +**Pros**: There's [[https://github.com/Manishms18/Air-Passengers-Time-Series-Analysis/blob/master/Air_Passenger_with_explanations.ipynb|so much more you can do]].
 +**Cons**: Heavyweight.
data-analysis.1681544624.txt.gz · Last modified: 2023/04/15 00:43 by dblume